├── standalone
│   ├── Makefile
│   ├── bpf_insn.h
│   └── load_and_control_filter.c
├── .gitignore
├── bpf-program-only
│   ├── my-ping.service
│   ├── cgroup-sock-drop-filter.service
│   ├── cgroup-sock-drop.c
│   └── Makefile
├── port-firewall
│   ├── my-filtered-ping.service
│   ├── bpf-make.service
│   ├── bpf-firewall@.service
│   ├── Makefile
│   ├── bpf_elf.h
│   ├── service-with-filter@.service
│   ├── port-firewall.c
│   └── bpf_api.h
└── README.md

/standalone/Makefile:
--------------------------------------------------------------------------------
load_and_control_filter: load_and_control_filter.c
	gcc -o $@ $< -lbpf -lncurses
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
bpf-program-only/cgroup-sock-drop.S
bpf-program-only/cgroup-sock-drop.o
standalone/load_and_control_filter
port-firewall/port-firewall.o
--------------------------------------------------------------------------------
/bpf-program-only/my-ping.service:
--------------------------------------------------------------------------------
[Unit]
Description=my ping service
Requires=cgroup-sock-drop-filter.service
After=cgroup-sock-drop-filter.service

[Service]
ExecStart=ping 127.0.0.1
IPIngressFilterPath=/sys/fs/bpf/cgroup-sock-drop-filter
--------------------------------------------------------------------------------
/bpf-program-only/cgroup-sock-drop-filter.service:
--------------------------------------------------------------------------------
[Unit]
Description=cgroup socket drop filter

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/path/to/bpftool prog load /path/to/cgroup-sock-drop.o /sys/fs/bpf/cgroup-sock-drop-filter type cgroup/skb
ExecStop=rm /sys/fs/bpf/cgroup-sock-drop-filter
LimitMEMLOCK=infinity
--------------------------------------------------------------------------------
/bpf-program-only/cgroup-sock-drop.c:
--------------------------------------------------------------------------------
/* cgroup/skb BPF prog */
#include

#ifndef __section
# define __section(NAME) \
	__attribute__((section(NAME), used))
#endif

__section("filter")
int cgroup_socket_drop(struct __sk_buff *skb)
{
	/* analyze skb content here */
	return 0; /* 0 = drop, 1 = forward */
}

char __license[] __section("license") = "GPL";
--------------------------------------------------------------------------------
/port-firewall/my-filtered-ping.service:
--------------------------------------------------------------------------------
[Unit]
Description=my egress-filtered ping service
Requires=bpf-make.service
After=bpf-make.service

[Service]
ExecStart=ping 127.0.0.1
IPEgressFilterPath=/sys/fs/bpf/port-firewall
# If you don't have systemd v243 you can use this instead of the above line:
# ExecStartPre=/path/to/bpftool cgroup attach /sys/fs/cgroup/unified/system.slice/%n egress pinned /sys/fs/bpf/port-firewall multi
--------------------------------------------------------------------------------
/port-firewall/bpf-make.service:
--------------------------------------------------------------------------------
[Unit]
Description=BPF port-firewall load service

[Service]
Type=oneshot
RemainAfterExit=yes
# If bpftool is not installed system-wide use: Environment="PATH=/bin:/usr/bin:/path/to/bpftool-folder"
Environment='FILTER=icmp || (udp && dst_port == 53) || (tcp && dst_port == 80)'
ExecStart=/usr/bin/make -C /path/to/repo/bpf-cgroup-filter/port-firewall
ExecStop=rm /sys/fs/bpf/port-firewall
LimitMEMLOCK=infinity
--------------------------------------------------------------------------------
/port-firewall/bpf-firewall@.service:
--------------------------------------------------------------------------------
[Unit]
Description=BPF port-firewall load service template for filter: %I

[Service]
Type=oneshot
RemainAfterExit=yes
# If bpftool is not installed system-wide use: Environment="PATH=/bin:/usr/bin:/path/to/bpftool-folder"
Environment='FILTER=%I'
Environment='BPFNAME=%i'
ExecStart=/usr/bin/make -C /path/to/repo/bpf-cgroup-filter/port-firewall
ExecStop=/usr/bin/make -C /path/to/repo/bpf-cgroup-filter/port-firewall remove
LimitMEMLOCK=infinity
--------------------------------------------------------------------------------
/bpf-program-only/Makefile:
--------------------------------------------------------------------------------
cgroup-sock-drop.o: cgroup-sock-drop.c
	clang -O2 -S -Wall -target bpf -c cgroup-sock-drop.c -o cgroup-sock-drop.S
	llvm-mc -triple bpf -filetype=obj -o cgroup-sock-drop.o cgroup-sock-drop.S

load:
	@which bpftool || ( echo "Install bpftool as package or compile and copy it to your PATH: cd linux-source-x.xx/tools/bpf/bpftool ; make bpftool ; cp bpftool ~/.local/bin/"; exit 1 )
	sudo sh -c "rm /sys/fs/bpf/cgroup-sock-drop || true"
	sudo `which bpftool` prog load cgroup-sock-drop.o /sys/fs/bpf/cgroup-sock-drop type cgroup/skb
	@echo "Success"
--------------------------------------------------------------------------------
/port-firewall/Makefile:
--------------------------------------------------------------------------------
# Always recompile because the variable might have changed
BPFNAME ?= port-firewall
all: compile load cleanup

.PHONY: compile
compile:
ifeq ($(FILTER),)
	@printf "To configure which packets to forward invoke as:\n FILTER=… make\nor\n make FILTER=…\n"
	@printf "With FILTER being a valid C expression over the boolean variables [udp, tcp, icmp, ip, ipv6]\n"
	@printf "and the integers [dst_port, src_port], e.g.:\n"
	@printf " FILTER='icmp || (udp && dst_port == 53) || (tcp && dst_port == 80)'\n"
	@printf " FILTER='!udp || dst_port == 53'\n"
	@exit 1
endif
	clang -O2 -Wall -target bpf -c port-firewall.c -o '${BPFNAME}.o' -D "FILTER=${FILTER}"

.PHONY: load
load:
	@which bpftool || ( echo "Install bpftool as package or compile and copy it to your PATH: cd linux-source-x.xx/tools/bpf/bpftool ; make bpftool ; cp bpftool ~/.local/bin/"; exit 1 )
	sudo sh -c "rm '/sys/fs/bpf/${BPFNAME}' || true"
	sudo `which bpftool` prog load '${BPFNAME}.o' '/sys/fs/bpf/${BPFNAME}' type cgroup/skb
	@echo "Success"
	@echo 'Now use it with Systemd v243 and the option IP(Ingress|Egress)FilterPath= or attach it manually to a cgroup with:'
	@echo " sudo `which bpftool` cgroup attach /sys/fs/cgroup/unified/… ingress|egress pinned '/sys/fs/bpf/${BPFNAME}' multi"
	@echo 'You can make a new cgroup with "sudo systemd-run --scope -S" or "systemd-run --user --scope -S" (-S can be replaced with a command)'

.PHONY: cleanup
cleanup:
	rm '${BPFNAME}.o'

.PHONY: remove
remove:
	sudo rm '/sys/fs/bpf/${BPFNAME}'
--------------------------------------------------------------------------------
/port-firewall/bpf_elf.h:
--------------------------------------------------------------------------------
/* Copied from https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/include/bpf_elf.h
 * (copyrights belong to the iproute2 community)
 */
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_ELF__
#define __BPF_ELF__

#include

/* Note:
 *
 * Below ELF section names and bpf_elf_map structure definition
 * are not (!) kernel ABI. It's rather a "contract" between the
 * application and the BPF loader in tc. For compatibility, the
 * section names should stay as-is. Introduction of aliases, if
 * needed, are a possibility, though.
 */

/* ELF section names, etc */
#define ELF_SECTION_LICENSE	"license"
#define ELF_SECTION_MAPS	"maps"
#define ELF_SECTION_PROG	"prog"
#define ELF_SECTION_CLASSIFIER	"classifier"
#define ELF_SECTION_ACTION	"action"

#define ELF_MAX_MAPS		64
#define ELF_MAX_LICENSE_LEN	128

/* Object pinning settings */
#define PIN_NONE		0
#define PIN_OBJECT_NS		1
#define PIN_GLOBAL_NS		2

/* ELF map definition */
struct bpf_elf_map {
	__u32 type;
	__u32 size_key;
	__u32 size_value;
	__u32 max_elem;
	__u32 flags;
	__u32 id;
	__u32 pinning;
	__u32 inner_id;
	__u32 inner_idx;
};

#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
	struct ____btf_map_##name { \
		type_key key; \
		type_val value; \
	}; \
	struct ____btf_map_##name \
	    __attribute__ ((section(".maps." #name), used)) \
		____btf_map_##name = { }

#endif /* __BPF_ELF__ */
--------------------------------------------------------------------------------
/port-firewall/service-with-filter@.service:
--------------------------------------------------------------------------------
[Unit]
Description=my egress-filtered ping service template
# To avoid specifying the FILTER here twice and below again,
# this service file is a template and the FILTER has to be
# passed as an argument (referenced with %i) when instantiating
# the service via `systemctl start "service-with-filter@FILTER.service"`
Requires=bpf-firewall@%i.service
After=bpf-firewall@%i.service
# The alternative is to not use a template file and include the argument directly
# here as bpf-firewall@ESCAPED.service with ESCAPED being the output
# of `systemd-escape "FILTER"`.
# Then you can make this file here a regular service file without the @.

[Service]
ExecStart=ping 127.0.0.1
IPEgressFilterPath=/sys/fs/bpf/%i

# If you don't have systemd v243 you can use this instead of the above line:
# ExecStartPre=/path/to/bpftool cgroup attach /sys/fs/cgroup/unified/system.slice/system-service\x5cx2dwith\x5cx2dfilter.slice/%n egress pinned /sys/fs/bpf/%i multi
# Cannot use %p here; instead use 'service\x5cx2dwith\x5cx2dfilter' (escaped twice with systemd-escape), because the slice name in the cgroup filesystem path still contains the escaping, and a single level of escaping here would be reverted (and thus removed) when the unit is loaded.

# Again, if this file is not a template, instead of %i use
# IPEgressFilterPath=/sys/fs/bpf/ESCAPED
# Without systemd v243 it would become
# ExecStartPre=/path/to/bpftool cgroup attach /sys/fs/cgroup/unified/system.slice/%n egress pinned /sys/fs/bpf/ESCAPEDTWICE multi
# with ESCAPEDTWICE being the output of `systemd-escape "ESCAPED"`.
--------------------------------------------------------------------------------
/port-firewall/port-firewall.c:
--------------------------------------------------------------------------------
/* Copyright 2019 Kai Lüke
 * SPDX-License-Identifier: GPL-2.0
 *
 * Minimal configurable packet filter, parses IP/IPv6 packets, ICMP, UDP ports,
 * and TCP ports. The forward rule is a C expression passed as FILTER variable
 * to the compiler with -D. The expression can use the boolean variables
 * [udp, tcp, icmp, ip, ipv6] and the integers [dst_port, src_port].
 * If the expression evaluates to 0 (false), the packet will be dropped.
 */

/* Workaround for "/usr/include/gnu/stubs.h:7:11: fatal error: 'gnu/stubs-32.h' file not found" */
#define __x86_64__

#include
#include "bpf_api.h"
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#ifndef __section
# define __section(NAME) \
	__attribute__((section(NAME), used))
#endif

/* cgroup/skb BPF prog */
__section("filter")
int port_firewall(struct __sk_buff *skb) {
	__u8 udp = 0, tcp = 0, icmp = 0, ip = 0, ipv6 = 0;
	__u16 dst_port = 0;
	__u16 src_port = 0;

	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;

	ip = skb->protocol == htons(ETH_P_IP);
	ipv6 = skb->protocol == htons(ETH_P_IPV6);

	if (ip) {
		if (data + sizeof(struct iphdr) > data_end) { return 0; }
		struct iphdr *ip = data;
		/* IP fragmentation does not need to be handled here for cgroup skbs */
		icmp = ip->protocol == IPPROTO_ICMP;
		tcp = ip->protocol == IPPROTO_TCP;
		udp = ip->protocol == IPPROTO_UDP;
		if (udp || tcp) {
			__u8 *ihlandversion = data;
			__u8 ihlen = (*ihlandversion & 0xf) * 4;
			if (data + ihlen + sizeof(struct tcphdr) > data_end) { return 0; }
			struct tcphdr *tcp = data + ihlen;
			src_port = ntohs(tcp->source);
			dst_port = ntohs(tcp->dest);
		}
	} else if (ipv6) {
		struct ipv6hdr *ipv6 = data;
		__u16 ihlen = sizeof(struct ipv6hdr); /* length of the current header */
		if (((void *) ipv6) + ihlen > data_end) { return 0; }
		__u8 proto = ipv6->nexthdr;
		#pragma unroll
		for (int i = 0; i < 8; i++) { /* max 8 extension headers */
			icmp = proto == IPPROTO_ICMPV6;
			tcp = proto == IPPROTO_TCP;
			udp = proto == IPPROTO_UDP;
			if (udp || tcp) {
				if (((void *) ipv6) + ihlen + sizeof(struct tcphdr) > data_end) { return 0; }
				struct tcphdr *tcp = ((void *) ipv6) + ihlen;
				src_port = ntohs(tcp->source);
				dst_port = ntohs(tcp->dest);
			}
			if (icmp || udp || tcp) {
				break;
			}
			if (proto == IPPROTO_FRAGMENT || proto == IPPROTO_HOPOPTS ||
			    proto == IPPROTO_ROUTING || proto == IPPROTO_AH || proto == IPPROTO_DSTOPTS) {
				__u8 ext = proto; /* type of the extension header we step into */
				if (((void *) ipv6) + ihlen + 2 > data_end) { return 0; }
				ipv6 = ((void *) ipv6) + ihlen;
				proto = *((__u8 *) ipv6); /* its Next Header field */
				/* the length depends on the type of this header, not of the next one */
				if (ext == IPPROTO_FRAGMENT) {
					ihlen = 8; /* fixed size */
				} else if (ext == IPPROTO_AH) {
					/* AH Payload Len is in 4-octet units, minus 2 */
					ihlen = (*(((__u8 *) ipv6) + 1) + 2) * 4;
				} else {
					/* Hdr Ext Len is in 8-octet units, not counting the first 8 octets */
					ihlen = (*(((__u8 *) ipv6) + 1) + 1) * 8;
				}
				if (((void *) ipv6) + ihlen > data_end) { return 0; }
			} else {
				break;
			}
		}
	}

	if (FILTER) {
		return 1; /* 1 = forward */
	}
	return 0; /* 0 = drop */
}

char __license[] __section("license") = "GPL";
--------------------------------------------------------------------------------
/standalone/bpf_insn.h:
--------------------------------------------------------------------------------
/* Copied from linux-source-4.19/samples/bpf/bpf_insn.h
 * (copyrights belong to the Linux kernel community)
 */
/* SPDX-License-Identifier: GPL-2.0 */
/* eBPF instruction mini library */
#ifndef __BPF_INSN_H
#define __BPF_INSN_H

struct bpf_insn;

/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */

#define BPF_ALU64_REG(OP, DST, SRC) \
	((struct bpf_insn) { \
		.code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = 0, \
		.imm = 0 })

#define BPF_ALU32_REG(OP, DST, SRC) \
	((struct bpf_insn) { \
		.code = BPF_ALU | BPF_OP(OP) | BPF_X, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = 0, \
		.imm = 0 })

/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */

#define BPF_ALU64_IMM(OP, DST, IMM) \
	((struct bpf_insn) { \
		.code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
		.dst_reg = DST, \
		.src_reg = 0, \
		.off = 0, \
		.imm = IMM })

#define BPF_ALU32_IMM(OP, DST, IMM) \
	((struct bpf_insn) { \
		.code = BPF_ALU | BPF_OP(OP) | BPF_K, \
		.dst_reg = DST, \
		.src_reg = 0, \
		.off = 0, \
		.imm = IMM })

/* Short form of mov, dst_reg = src_reg */

#define BPF_MOV64_REG(DST, SRC) \
	((struct bpf_insn) { \
		.code = BPF_ALU64 | BPF_MOV | BPF_X, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = 0, \
		.imm = 0 })

#define BPF_MOV32_REG(DST, SRC) \
	((struct bpf_insn) { \
		.code = BPF_ALU | BPF_MOV | BPF_X, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = 0, \
		.imm = 0 })

/* Short form of mov, dst_reg = imm32 */

#define BPF_MOV64_IMM(DST, IMM) \
	((struct bpf_insn) { \
		.code = BPF_ALU64 | BPF_MOV | BPF_K, \
		.dst_reg = DST, \
		.src_reg = 0, \
		.off = 0, \
		.imm = IMM })

#define BPF_MOV32_IMM(DST, IMM) \
	((struct bpf_insn) { \
		.code = BPF_ALU | BPF_MOV | BPF_K, \
		.dst_reg = DST, \
		.src_reg = 0, \
		.off = 0, \
		.imm = IMM })

/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
#define BPF_LD_IMM64(DST, IMM) \
	BPF_LD_IMM64_RAW(DST, 0, IMM)

#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
	((struct bpf_insn) { \
		.code = BPF_LD | BPF_DW | BPF_IMM, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = 0, \
		.imm = (__u32) (IMM) }), \
	((struct bpf_insn) { \
		.code = 0, /* zero is reserved opcode */ \
		.dst_reg = 0, \
		.src_reg = 0, \
		.off = 0, \
		.imm = ((__u64) (IMM)) >> 32 })

#ifndef BPF_PSEUDO_MAP_FD
# define BPF_PSEUDO_MAP_FD 1
#endif

/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
#define BPF_LD_MAP_FD(DST, MAP_FD) \
	BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)


/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */

#define BPF_LD_ABS(SIZE, IMM) \
	((struct bpf_insn) { \
		.code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
		.dst_reg = 0, \
		.src_reg = 0, \
		.off = 0, \
		.imm = IMM })

/* Memory load, dst_reg = *(uint *) (src_reg + off16) */

#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
	((struct bpf_insn) { \
		.code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = OFF, \
		.imm = 0 })

/* Memory store, *(uint *) (dst_reg + off16) = src_reg */

#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
	((struct bpf_insn) { \
		.code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = OFF, \
		.imm = 0 })

/* Atomic memory add, *(uint *)(dst_reg + off16) += src_reg */

#define BPF_STX_XADD(SIZE, DST, SRC, OFF) \
	((struct bpf_insn) { \
		.code = BPF_STX | BPF_SIZE(SIZE) | BPF_XADD, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = OFF, \
		.imm = 0 })

/* Memory store, *(uint *) (dst_reg + off16) = imm32 */

#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
	((struct bpf_insn) { \
		.code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
		.dst_reg = DST, \
		.src_reg = 0, \
		.off = OFF, \
		.imm = IMM })

/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */

#define BPF_JMP_REG(OP, DST, SRC, OFF) \
	((struct bpf_insn) { \
		.code = BPF_JMP | BPF_OP(OP) | BPF_X, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = OFF, \
		.imm = 0 })

/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */

#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
	((struct bpf_insn) { \
		.code = BPF_JMP | BPF_OP(OP) | BPF_K, \
		.dst_reg = DST, \
		.src_reg = 0, \
		.off = OFF, \
		.imm = IMM })

/* Raw code statement block */

#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
	((struct bpf_insn) { \
		.code = CODE, \
		.dst_reg = DST, \
		.src_reg = SRC, \
		.off = OFF, \
		.imm = IMM })

/* Program exit */

#define BPF_EXIT_INSN() \
	((struct bpf_insn) { \
		.code = BPF_JMP | BPF_EXIT, \
		.dst_reg = 0, \
		.src_reg = 0, \
		.off = 0, \
		.imm = 0 })

#endif
--------------------------------------------------------------------------------
/standalone/load_and_control_filter.c:
--------------------------------------------------------------------------------
/*
 * Copyright 2019 Kai Lüke
 * SPDX-License-Identifier: GPL-2.0
 * Structure adapted from linux-source-4.19/samples/bpf/test_cgrp2_attach.c
 * authored by Daniel Mack, Sargun Dhillon, Joe Stringer, Alexei Starovoitov, and Jakub Kicinski
 *
 * Loads a BPF cgroup ingress/egress filter bytecode that filters based on the packet size.
 * It pins the BPF filter at a given location in /sys/fs/bpf/.
 * Through the +/- keys the MTU can be changed interactively (this changes the value in the BPF map).
 * Optionally the initial MTU value can be specified on startup.
 * The program can also attach the BPF filter to a cgroup by specifying the cgroup by its path.
 * The BPF filter stays loaded when the program exits and has to be deleted manually.
 *
 * The BPF filter is hardcoded below as BPF assembly instructions and does not use a BPF compiler.
 */
#define _GNU_SOURCE

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include "bpf_insn.h"

const int DEFAULT_MTU = 1500;

enum {
	MAP_KEY_PACKETS_DROPPED,
	MAP_KEY_PACKETS_FORWARDED,
	MAP_KEY_MTU,
};

char bpf_log_buf[BPF_LOG_BUF_SIZE];

static int prog_load(int map_fd)
{
	struct bpf_insn prog[] = {
		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = skb */

		BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_MTU), /* (arg) r0 = mtu_key */
		BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* (arg r1) = fd */
		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(fp-4) = r0 */
		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */
		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* (arg) r2 -= 4 */
		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
		BPF_MOV64_IMM(BPF_REG_7, 0), /* r7 = 0 (fallback mtu is 0) */
		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), /* may be null, skip mtu read */
		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_0, 0), /* r7 = *r0 (i.e. mtu) */

		BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_6, offsetof(struct __sk_buff, len)), /* r8 = *(r6+offset) (i.e. skb->len) */

		BPF_MOV64_IMM(BPF_REG_9, 1), /* r9 = 1 (forward decision) */
		BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_PACKETS_FORWARDED), /* r0 = forward_key */
		BPF_JMP_REG(BPF_JLE, BPF_REG_8, BPF_REG_7, 2), /* jmp to pc+2(update) if r8 <= r7 */
		BPF_MOV64_IMM(BPF_REG_9, 0), /* r9 = 0 (drop decision) */
		BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_PACKETS_DROPPED), /* r0 = drop_key */
		/* update: */
		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* (arg) r0 = forward/drop_key */
		BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* (arg r1) = fd */
		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(fp-4) = r0 */
		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */
		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* (arg) r2 -= 4 */
		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* may be null, skip count write */
		BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
		BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */

		BPF_MOV64_REG(BPF_REG_0, BPF_REG_9), /* r0 = r9 (set forward/drop decision) */
		BPF_EXIT_INSN(),
	};
	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);

	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
				prog, insns_cnt, "GPL", 0,
				bpf_log_buf, BPF_LOG_BUF_SIZE);
}

static int attach_filter(int cg_fd, int type, long long mtu, char* file)
{
	int prog_fd, map_fd, ret, key;
	long long pkt_cnt_dropped, pkt_cnt_forwarded;
	char input;
	struct timeval tv;
	fd_set fds;

	map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY,
				sizeof(key), sizeof(pkt_cnt_dropped),
				256, 0);
	if (map_fd < 0) {
		printf("Failed to create map: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	key = MAP_KEY_MTU;
	assert(bpf_map_update_elem(map_fd, &key, &mtu, BPF_EXIST) == 0);

	prog_fd = prog_load(map_fd);
	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);

	if (prog_fd < 0) {
		printf("Failed to load prog: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	if (bpf_obj_pin(prog_fd, file)) {
		printf("Failed to pin prog to '%s': '%s'\n", file, strerror(errno));
		return EXIT_FAILURE;
	}

	if (cg_fd != -1) {
		/* allow multiple filters (invocation is ordered!) */
		ret = bpf_prog_attach(prog_fd, cg_fd, type, BPF_F_ALLOW_MULTI);
		if (ret < 0) {
			printf("Failed to attach prog to cgroup: '%s'\n", strerror(errno));
			return EXIT_FAILURE;
		}
	}

	initscr();
	timeout(1000);
	noecho();

	while (1) {
		key = MAP_KEY_PACKETS_DROPPED;
		assert(bpf_map_lookup_elem(map_fd, &key, &pkt_cnt_dropped) == 0);

		key = MAP_KEY_PACKETS_FORWARDED;
		assert(bpf_map_lookup_elem(map_fd, &key, &pkt_cnt_forwarded) == 0);

		key = MAP_KEY_MTU;
		assert(bpf_map_lookup_elem(map_fd, &key, &mtu) == 0);

		clear();
		printw("cgroup dropped %lld packets, forwarded %lld packets, MTU is %lld bytes (Press +/- to change)\n",
		       pkt_cnt_dropped, pkt_cnt_forwarded, mtu);

		input = getch();
		switch (input) {
		case '+':
			mtu += 50;
			if (mtu < 0)
				mtu = 0;
			key = MAP_KEY_MTU;
			assert(bpf_map_update_elem(map_fd, &key, &mtu, BPF_EXIST) == 0);
			break;
		case '-':
			mtu -= 50;
			if (mtu < 0)
				mtu = 0;
			key = MAP_KEY_MTU;
			assert(bpf_map_update_elem(map_fd, &key, &mtu, BPF_EXIST) == 0);
			break;
		default:
			break;
		}
	}

	return EXIT_SUCCESS;
}

static int usage(char *argv0)
{
	printf("Usage: %s [-m MTU] [-c ] [-t ] \n", argv0);
	printf(" -c PATH Attach program to control group in PATH (usually /sys/fs/cgroup/…)\n");
	printf(" -m MTU Set MTU value (default %d)\n", DEFAULT_MTU);
	printf(" -t type Attach program as egress/ingress filter (default ingress)\n");
	return EXIT_FAILURE;
}

int main(int argc, char **argv)
{
	int mtu = DEFAULT_MTU;
	enum bpf_attach_type type = BPF_CGROUP_INET_INGRESS;
	int opt, cg_fd = -1;

	while ((opt = getopt(argc, argv, "c:m:t:")) != -1) {
		switch (opt) {
		case 'c':
			cg_fd = open(optarg, O_DIRECTORY | O_RDONLY);
			if (cg_fd < 0) {
				printf("Failed to open cgroup path: '%s'\n", strerror(errno));
				return EXIT_FAILURE;
			}
			break;
		case 'm':
			mtu = atoi(optarg);
			if (mtu < 0)
				mtu = 0;
			break;
		case 't':
			if (strcmp(optarg, "ingress") == 0)
				type = BPF_CGROUP_INET_INGRESS;
			else if (strcmp(optarg, "egress") == 0)
				type = BPF_CGROUP_INET_EGRESS;
			else
				return usage(argv[0]);
			break;
		default:
			return usage(argv[0]);
		}
	}

	if (argc - optind < 1)
		return usage(argv[0]);

	return attach_filter(cg_fd, type, mtu, argv[optind]);
}
--------------------------------------------------------------------------------
/port-firewall/bpf_api.h:
--------------------------------------------------------------------------------
/* Copied from https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/include/bpf_api.h
 * (copyrights belong to the iproute2 community)
 */
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_API__
#define __BPF_API__

/* Note:
 *
 * This file can be included into eBPF kernel programs. It contains
 * a couple of useful helper functions, map/section ABI (bpf_elf.h),
 * misc macros and some eBPF specific LLVM built-ins.
 */

#include

#include
#include
#include

#include

#include "bpf_elf.h"

/** Misc macros. */

#ifndef __stringify
# define __stringify(X)		#X
#endif

#ifndef __maybe_unused
# define __maybe_unused		__attribute__((__unused__))
#endif

#ifndef offsetof
# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
#endif

#ifndef likely
# define likely(X)		__builtin_expect(!!(X), 1)
#endif

#ifndef unlikely
# define unlikely(X)		__builtin_expect(!!(X), 0)
#endif

#ifndef htons
# define htons(X)		__constant_htons((X))
#endif

#ifndef ntohs
# define ntohs(X)		__constant_ntohs((X))
#endif

#ifndef htonl
# define htonl(X)		__constant_htonl((X))
#endif

#ifndef ntohl
# define ntohl(X)		__constant_ntohl((X))
#endif

#ifndef __inline__
# define __inline__		__attribute__((always_inline))
#endif

/** Section helper macros. */

#ifndef __section
# define __section(NAME) \
	__attribute__((section(NAME), used))
#endif

#ifndef __section_tail
# define __section_tail(ID, KEY) \
	__section(__stringify(ID) "/" __stringify(KEY))
#endif

#ifndef __section_xdp_entry
# define __section_xdp_entry \
	__section(ELF_SECTION_PROG)
#endif

#ifndef __section_cls_entry
# define __section_cls_entry \
	__section(ELF_SECTION_CLASSIFIER)
#endif

#ifndef __section_act_entry
# define __section_act_entry \
	__section(ELF_SECTION_ACTION)
#endif

#ifndef __section_lwt_entry
# define __section_lwt_entry \
	__section(ELF_SECTION_PROG)
#endif

#ifndef __section_license
# define __section_license \
	__section(ELF_SECTION_LICENSE)
#endif

#ifndef __section_maps
# define __section_maps \
	__section(ELF_SECTION_MAPS)
#endif

/** Declaration helper macros. */

#ifndef BPF_LICENSE
# define BPF_LICENSE(NAME) \
	char ____license[] __section_license = NAME
#endif

/** Classifier helper */

#ifndef BPF_H_DEFAULT
# define BPF_H_DEFAULT	-1
#endif

/** BPF helper functions for tc. Individual flags are in linux/bpf.h */

#ifndef __BPF_FUNC
# define __BPF_FUNC(NAME, ...) \
	(* NAME)(__VA_ARGS__) __maybe_unused
#endif

#ifndef BPF_FUNC
# define BPF_FUNC(NAME, ...) \
	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
#endif

/* Map access/manipulation */
static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
static int BPF_FUNC(map_update_elem, void *map, const void *key,
		    const void *value, uint32_t flags);
static int BPF_FUNC(map_delete_elem, void *map, const void *key);

/* Time access */
static uint64_t BPF_FUNC(ktime_get_ns);

/* Debugging */

/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
 * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
 * It would require ____fmt to be made const, which generates a reloc
 * entry (non-map).
 */
static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);

#ifndef printt
# define printt(fmt, ...) \
	({ \
		char ____fmt[] = fmt; \
		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
	})
#endif

/* Random numbers */
static uint32_t BPF_FUNC(get_prandom_u32);

/* Tail calls */
static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
		     uint32_t index);

/* System helpers */
static uint32_t BPF_FUNC(get_smp_processor_id);
static uint32_t BPF_FUNC(get_numa_node_id);

/* Packet misc meta data */
static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);

static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);

/* Packet redirection */
static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
		    uint32_t flags);

/* Packet manipulation */
static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
		    void *to, uint32_t len);
static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
		    const void *from, uint32_t len, uint32_t flags);

static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
		    uint32_t from, uint32_t to, uint32_t flags);
static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
		    uint32_t from, uint32_t to, uint32_t flags);
static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
		    const void *to, uint32_t to_size, uint32_t seed);
static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);

static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
		    uint32_t flags);
static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
		    uint32_t flags);

static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);

/* Event notification */
static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
		      uint64_t index, const void *data, uint32_t size) =
		      (void *) BPF_FUNC_perf_event_output;

/* Packet vlan encap/decap */
static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
		    uint16_t vlan_tci);
static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);

/* Packet tunnel encap/decap */
static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
		    const struct bpf_tunnel_key *from, uint32_t size,
		    uint32_t flags);

static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
		    void *to, uint32_t size);
static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
		    const void *from, uint32_t size);

/** LLVM built-ins, mem*() routines work for constant size */

#ifndef lock_xadd
# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
#endif

#ifndef memset
# define memset(s, c, n)	__builtin_memset((s), (c), (n))
#endif

#ifndef memcpy
# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
#endif

#ifndef memmove
# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
#endif

/* FIXME: __builtin_memcmp() is not yet fully useable unless llvm bug
 * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
 * this one would generate a reloc entry (non-map), otherwise.
249 | */ 250 | #if 0 251 | #ifndef memcmp 252 | # define memcmp(a, b, n) __builtin_memcmp((a), (b), (n)) 253 | #endif 254 | #endif 255 | 256 | unsigned long long load_byte(void *skb, unsigned long long off) 257 | asm ("llvm.bpf.load.byte"); 258 | 259 | unsigned long long load_half(void *skb, unsigned long long off) 260 | asm ("llvm.bpf.load.half"); 261 | 262 | unsigned long long load_word(void *skb, unsigned long long off) 263 | asm ("llvm.bpf.load.word"); 264 | 265 | #endif /* __BPF_API__ */ 266 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Examples for cgroup socket ingress/egress BPF filters with systemd 2 | 3 | These are examples for my [blog post about custom BPF firewalls for systemd services](https://kailueke.gitlab.io/systemd-custom-bpf-firewall/), 4 | a feature that I implemented in [this commit](https://github.com/systemd/systemd/commit/fab347489fcfafbc8367c86afc637ce1b81ae59e). 5 | The blog post contains more explanations and usage examples. 6 | 7 | ## Update: Port-based BPF firewall compiled with clang 8 | 9 | The [port-firewall folder](port-firewall/) contains a 10 | small configurable packet filter that parses IP/IPv6 packets, ICMP, UDP ports, 11 | and TCP ports. 12 | The forward rule is a C expression that is passed to the compiler as the `FILTER` macro 13 | with `-D`. 14 | 15 | The expression can use the boolean variables `udp`, `tcp`, `icmp`, `ip`, and `ipv6`, which denote the packet type, and the integers `dst_port` and `src_port` for the UDP/TCP ports. 16 | If the expression evaluates to 0 (false), the packet is dropped. 17 | Examples of valid filters are `FILTER='icmp || (udp && dst_port == 53) || (tcp && dst_port == 80)'` and `FILTER='!udp || dst_port == 53'`. 18 | 19 | The Makefile requires the filter to be passed when building the program: `make FILTER='…'`.
20 | `make load` then loads the bytecode and pins it as a BPF program at `/sys/fs/bpf/port-firewall` in the special BPF filesystem. 21 | 22 | From there you can use it with the systemd options `IP(Ingress|Egress)FilterPath=` or attach 23 | it manually to a cgroup. 24 | 25 | The [folder](port-firewall/) also includes a `bpf-make.service` systemd unit file to configure and load the firewall 26 | and an example `my-filtered-ping.service` file that uses the loaded firewall. 27 | `my-filtered-ping.service` includes a workaround you can use if you don't have systemd v243. 28 | 29 | Instead of writing a custom loader unit for every service, you can also use the systemd unit template 30 | `bpf-firewall@.service`, as done in the `service-with-filter@.service` file. 31 | That file is itself a unit template, started with `systemctl start "service-with-filter@icmp || (udp && dst_port == 53).service"`, 32 | but it could also be a regular service without the `@`. The disadvantage is that you then have to specify the filter more than 33 | once in the file, but the advantage is that you don't use templates unnecessarily and don't have the filter as an (ugly) 34 | part of the service name. 35 | Ideally I would store the filter in a BPF map so that it can be set after the BPF program is loaded, allowing it to be 36 | a simple `ExecStartPre` line in the service file. 37 | 38 | The next section shows how to load and use a simple dropping filter as a template for your own filters if you don't want to use this one. 39 | 40 | ## Simple dropping filter compiled with clang 41 | 42 | The [bpf-program-only folder](bpf-program-only/) contains a 43 | minimal BPF cgroup filter that drops all packets. 44 | You can build it with `make` and then run 45 | `make load`, which uses 46 | `bpftool` to load the program and pin it 47 | in the BPF filesystem at 48 | `/sys/fs/bpf/cgroup-sock-drop`. 49 | You can use this to specify an `IPIngressFilterPath` or `IPEgressFilterPath` 50 | for systemd services (>= 243).
51 | Here is an example with ping running in a temporary systemd system scope (or service) 52 | with an ingress filter but no egress filter. You can also use user scopes without `sudo` by passing `--user`. 53 | 54 | ``` 55 | $ sudo systemd-run -p IPIngressFilterPath=/sys/fs/bpf/cgroup-sock-drop --scope ping 127.0.0.1 56 | Running scope as unit: run-re62ba1c….scope 57 | PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 58 | ^C # cancel since it will not get responses 59 | --- 127.0.0.1 ping statistics --- 60 | 8 packets transmitted, 0 received, 100% packet loss, time 186ms 61 | ``` 62 | 63 | You can also find the example service `my-ping.service`, configured with a filter 64 | that is loaded by its dependency `cgroup-sock-drop-filter.service` (note 65 | the `LimitMEMLOCK=infinity` entry in the unit that loads the filter). If you want 66 | your service to start at boot, you need to add, e.g., 67 | `After=network.target` and an `[Install]` section with `WantedBy=multi-user.target`. 68 | 69 | If systemd 243 is not available, you can also try the interactive MTU filter program described further below, which has an option to attach the filter to a cgroup. 70 | 71 | ### Workaround when systemd 243 is not available 72 | Use `systemd-run` to spawn a shell in a new cgroup, either as a system scope or as a user scope (a transient unit). `-S` can be replaced with a concrete binary if you don't want to start a shell. 73 | 74 | ``` 75 | $ sudo systemd-run --scope -S 76 | $ # or: 77 | $ systemd-run --user --scope -S 78 | Running scope as unit: run-r63de6b74621b4ae3877d4fa86b54be75.scope 79 | ``` 80 | 81 | This prints the unit name, which is also the name of the cgroup. The full cgroup path for the system scope shell is `/sys/fs/cgroup/unified/system.slice/NAME`. For the user scope shell the path is `/sys/fs/cgroup/unified/user.slice/user-1000.slice/user@1000.service/NAME`, with `1000` replaced by your UID if it differs.
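The path rule above can be sketched in C. This is a hypothetical helper (`scope_cgroup_path` is not part of this repository); it assumes the hybrid cgroup layout used throughout this README, with the unified hierarchy mounted at `/sys/fs/cgroup/unified`, and that user scopes sit directly below `user@UID.service`, which may differ on other systemd versions.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not part of this repository): builds the cgroup v2
 * path of a transient scope created with systemd-run.
 * Assumes the hybrid layout used in this README, i.e. the unified
 * hierarchy is mounted at /sys/fs/cgroup/unified, and that user scopes
 * live directly below user@UID.service.
 * Returns 0 on success, -1 on error or if the path does not fit into buf. */
int scope_cgroup_path(char *buf, size_t len, const char *scope_name,
                      int user_scope, uid_t uid)
{
    int n;

    if (user_scope)
        n = snprintf(buf, len,
                     "/sys/fs/cgroup/unified/user.slice/user-%u.slice/"
                     "user@%u.service/%s",
                     (unsigned int) uid, (unsigned int) uid, scope_name);
    else
        n = snprintf(buf, len, "/sys/fs/cgroup/unified/system.slice/%s",
                     scope_name);

    /* snprintf returns the length it wanted to write; >= len means truncation */
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

For a user scope you would pass `getuid()` (from `<unistd.h>`) as `uid` together with the unit name that `systemd-run` prints.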
82 | 83 | Then attach the BPF program to the cgroup: 84 | 85 | ``` 86 | $ sudo $(which bpftool) cgroup attach /sys/fs/cgroup/unified/user.slice/user-1000.slice/user@1000.service/run-rfaa93ac79de2482d8ef1870fd6b508cd.scope egress pinned /sys/fs/bpf/cgroup-sock-drop multi 87 | ``` 88 | 89 | Choose either `ingress` or `egress` to filter incoming or outgoing packets. You can attach the same filter for both `ingress` and `egress`, and you can attach multiple different filters per direction (which is also true when using the systemd v243 options above). 90 | If you enable `IPAccounting` in `systemd-run`, you need to enable `Delegate` as well to allow multiple BPF programs. 91 | 92 | ## Interactive MTU filter 93 | 94 | The [standalone folder](standalone/) contains an interactive program 95 | that loads a BPF filter and controls its behavior. 96 | 97 | _From the source code comment:_ 98 | Loads a BPF cgroup ingress/egress filter bytecode that filters based on the packet size. 99 | It loads the BPF filter to a given location in `/sys/fs/bpf/`. 100 | Through the +/- keys the MTU can be changed interactively (changes values in the BPF map). 101 | Optionally the initial MTU value can be specified on startup. 102 | The program can also attach the BPF filter to a cgroup by specifying the cgroup by its path. 103 | The BPF filter stays loaded when the program exits and has to be deleted manually. 104 | 105 | It does not use a BPF compiler; instead, it embeds the BPF code in the final program 106 | as hardcoded BPF assembly instructions. This is not very accessible for hacking, 107 | but for me it was interesting to see how BPF instructions work 108 | and what needs to be done to comply with the verifier.
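To give an idea of what hardcoded BPF instructions look like, here is a minimal sketch (not the repository's actual bytecode) of a cgroup/skb filter that forwards packets up to a fixed size and drops larger ones. It uses the raw `struct bpf_insn` layout and opcode constants from `<linux/bpf.h>` instead of the helper macros in `bpf_insn.h`. The MTU is a hypothetical compile-time constant `SKETCH_MTU`, whereas the real program reads it from a BPF map so that +/- can change it at runtime. Loading the array (via `bpf(BPF_PROG_LOAD)` with `BPF_PROG_TYPE_CGROUP_SKB`) is not shown.

```c
#include <stddef.h>
#include <linux/bpf.h>

/* Hypothetical constant; the repository's program keeps the MTU in a BPF map. */
#define SKETCH_MTU 100

/* cgroup/skb filter: return 1 (forward) if skb->len <= SKETCH_MTU,
 * otherwise 0 (drop). On entry R1 holds the struct __sk_buff *ctx. */
static const struct bpf_insn mtu_filter[] = {
    /* r2 = *(u32 *)(r1 + offsetof(struct __sk_buff, len)) */
    { .code = BPF_LDX | BPF_MEM | BPF_W, .dst_reg = BPF_REG_2,
      .src_reg = BPF_REG_1, .off = offsetof(struct __sk_buff, len) },
    /* r0 = 0 (drop); R0 must be set on every path before exit */
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
    /* if r2 > SKETCH_MTU, skip the next instruction */
    { .code = BPF_JMP | BPF_JGT | BPF_K, .dst_reg = BPF_REG_2, .off = 1,
      .imm = SKETCH_MTU },
    /* r0 = 1 (forward) */
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 1 },
    /* return r0 */
    { .code = BPF_JMP | BPF_EXIT },
};
```

Both branches end with R0 initialized to 0 or 1, which is the kind of property the verifier checks before it accepts a program.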
109 | 110 | ### With systemd >= 243 111 | In one terminal you can run the interactive filter: 112 | 113 | ``` 114 | $ sudo ./load_and_control_filter -m 100 -t ingress /sys/fs/bpf/ingressfilter 115 | cgroup dropped 0 packets, forwarded 0 packets, MTU is 100 bytes (Press +/- to change) 116 | … # keeps running 117 | ``` 118 | 119 | It loads the BPF filter to `/sys/fs/bpf/ingressfilter`, which you can then use for a 120 | systemd service. 121 | 122 | In another terminal you can, for example, again run ping as root in a temporary 123 | systemd scope and specify our filter as `IPIngressFilterPath`: 124 | 125 | ``` 126 | $ sudo systemd-run -p IPIngressFilterPath=/sys/fs/bpf/ingressfilter --scope ping 127.0.0.1 127 | Running scope as unit: run-….scope 128 | PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 129 | 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.086 ms 130 | 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.069 ms 131 | … 132 | ``` 133 | 134 | When you switch back to the first terminal and press `-`, the new MTU is 50 bytes 135 | and you can see the dropped packet count increase. 136 | In the ping terminal you will see no new responses because they are all dropped. 137 | 138 | ### Workaround when systemd 243 is not available 139 | 140 | The `load_and_control_filter` program can be told to attach the filter to a cgroup 141 | of a systemd service. 142 | 143 | Systemd uses a BPF filter for its IP accounting and firewalling based on IP addresses. 144 | If such a filter is present but no others, systemd attaches it without the flag that allows multiple BPF filters per cgroup, so another filter cannot be attached alongside it. 145 | As a workaround when, e.g., IP accounting is enabled, you can tell systemd that the cgroup is managed externally (`Delegate=yes`). 146 | This means that systemd attaches the IP accounting BPF filter with the flag that allows multiple BPF filters 147 | instead of without it.
148 | 149 | ``` 150 | $ sudo systemd-run -p IPAccounting=yes -p Delegate=yes --scope ping 127.0.0.1 151 | Running scope as unit: run-r9f31b3947f4c4a11a24babf5517fe025.scope 152 | PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 153 | 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.086 ms 154 | 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.069 ms 155 | … 156 | ``` 157 | 158 | You can see the scope name in the first output line. 159 | This is also the last part of the cgroup path you have to use as argument 160 | in order to attach the filter to the cgroup. 161 | 162 | ``` 163 | $ sudo ./load_and_control_filter -m 100 -c /sys/fs/cgroup/unified/system.slice/run-r9f31b3947f4c4a11a24babf5517fe025.scope -t ingress /sys/fs/bpf/myfilter 164 | cgroup dropped 0 packets, forwarded 4 packets, MTU is 100 bytes (Press +/- to change) 165 | … # keeps running and increases the forward count 166 | ``` 167 | 168 | Now hit `-` to reduce the MTU and observe the packet drop count increasing while no ping responses can be seen. 169 | --------------------------------------------------------------------------------
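Why pressing `-` makes every ping response disappear can be checked with a little arithmetic. `ping` reports `56(84) bytes`: 56 bytes of ICMP payload plus an 8-byte ICMP header and a 20-byte IPv4 header give 84 bytes. The sketch below models the filter decision in plain userspace C. It assumes, as a simplification, that the filter forwards a packet whose length is at most the MTU; the exact comparison and which length the real BPF program checks are defined in `load_and_control_filter.c` and may differ. The function names are hypothetical.

```c
#include <stdint.h>

/* Header sizes for an ICMP echo request over IPv4 without IP options. */
#define IPV4_HDR_LEN 20u
#define ICMP_HDR_LEN 8u

/* Length of the IPv4 packet that ping sends for a given payload size
 * (ping's default payload is 56 bytes, hence "56(84) bytes"). */
uint32_t ping_packet_len(uint32_t payload)
{
    return IPV4_HDR_LEN + ICMP_HDR_LEN + payload;
}

/* Userspace model of the MTU filter verdict, assuming packets up to and
 * including the MTU are forwarded: returns 1 (forward) or 0 (drop), the
 * same convention as a cgroup/skb program's return value. */
int mtu_verdict(uint32_t pkt_len, uint32_t mtu)
{
    return pkt_len <= mtu ? 1 : 0;
}
```

With the default payload, `mtu_verdict(ping_packet_len(56), 100)` returns 1 (84 <= 100, forwarded) and `mtu_verdict(ping_packet_len(56), 50)` returns 0 (dropped), which matches the behavior described above.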