├── Express_Data_Path.pdf ├── README.md ├── XDP_Inside_and_Out.pdf ├── bpf-internals-1.md ├── bpf-internals-2.md ├── bpf_helpers.rst ├── bpf_llvm_2015aug19.pdf ├── bpf_netdev_conference_2016Feb12.pdf ├── bpf_netdev_conference_2016Feb12_report.pdf ├── bpf_netdev_conference_2016Oct07.pdf ├── bpf_netdev_conference_2016Oct07_tcws.pdf ├── bpf_netvirt_2015aug21.pdf ├── bpf_network_examples_2015aug20.pdf ├── bpftrace_public_template_jun2019.odp ├── bpftrace_public_template_jun2019.pdf ├── eBPF.md ├── ebpf_excerpt_20Aug2015.pdf ├── ebpf_http_filter.pdf ├── meetups └── 2015-09-21 │ └── iovisor-bcc-intro.pdf ├── netconf_2016feb.pdf ├── openstack ├── 2015-10-29 │ └── iovisor-mesos-demo.pdf └── 2016-04-25 │ └── OpenStackSummitAustin2016_iovisor_v1.0.pdf ├── p4 ├── 2015-11-18 │ └── iovisor-p4-workshop-nov-2015.pdf └── p4toEbpf-bcc.pdf ├── p4AbstractSwitch.pdf ├── tsc-meeting-minutes ├── 2015-09-02 │ └── eBPF_to_IOV_Module.pptx └── 2015-09-16 │ ├── iomodules-slides.pdf │ └── iovisor-odl-gbp-module.pdf └── university ├── eBPF_IOVisor_academic_research.pdf └── sigcomm-ccr-InKev-2016.pdf /Express_Data_Path.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/Express_Data_Path.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # bpf-docs 2 | Presentations and docs 3 | -------------------------------------------------------------------------------- /XDP_Inside_and_Out.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/XDP_Inside_and_Out.pdf -------------------------------------------------------------------------------- /bpf-internals-1.md: 
-------------------------------------------------------------------------------- 1 | # BPF Internals - I 2 | 3 | *by Suchakra Sharma* 4 | 5 | A recent [post by Brendan Gregg](http://www.brendangregg.com/blog/2015-05-15/ebpf-one-small-step.html) inspired me 6 | to write my own blog post about my findings on how Berkeley Packet Filter (BPF) evolved, its interesting history 7 | and the immense powers it holds - the way Brendan calls it, 'brutal'. I came across this while studying interpreters 8 | and small process virtual machines like the proposed KTap VM. I was looking at some known papers on 9 | [register vs stack based VMs](https://www.usenix.org/legacy/events/vee05/full_papers/p153-yunhe.pdf), their performance and the various code dispatch mechanisms used in these small VMs. The review of the state of the art soon moved to native code compilation, and a [discussion on LWN](http://lwn.net/Articles/598545/) caught my eye. The benefits of JIT were too good to be overlooked, and BPF's application in things like filtering, tracing and seccomp (used in Chrome as well) made me interested. I knew that the kernel devs were on to something here. This is when I started digging through the BPF background. 10 | 11 | ## Background 12 | 13 | Network packet analysis requires an interesting bunch of tech, right from the time a packet reaches the embedded 14 | controller on the network hardware in your PC (hardware/data link layer) to the point it does something useful in 15 | your system, such as displaying something in your browser (application layer). For the connected systems evolving these 16 | days, the amount of data transferred is huge, and the support infrastructure for network analysis needed a way to 17 | filter things out pretty fast. The initial concept of packet filtering developed with such needs in mind, and many 18 | strategies were discussed around filters such as the CMU/Stanford Packet Filter (CSPF), Sun's NIT filter and so on. 
19 | For example, some earlier filtering approaches used a tree-based model (in CSPF) to represent filters and filter 20 | packets out using predicate-tree walking. This earlier approach was also inherited by the Linux kernel's old filter in the 21 | net subsystem. 22 | 23 | Consider an engineer's need for a simple (and admittedly unrealistic) filter on network packets with the predicates 24 | P1, P2, P3 and P4: 25 | 26 | [![equation](https://suchakra.files.wordpress.com/2015/05/equation.png?w=660)](https://suchakra.files.wordpress.com/2015/05/equation.png) 27 | 28 | A filtering approach like that of CSPF would have represented this filter in an expression tree structure as follows: 29 | 30 | [![tree](https://suchakra.files.wordpress.com/2015/05/tree2.png?w=127)](https://suchakra.files.wordpress.com/2015/05/tree2.png) 31 | 32 | It is then trivial to walk the tree, evaluating each expression and performing operations on each of them. But this 33 | means there can be extra costs associated with evaluating predicates which do not necessarily have to be 34 | evaluated. For example, what if the packet is neither an ARP packet nor an IP packet? Knowing that the P1 35 | and P2 predicates are untrue, we need not evaluate the other two predicates and perform two more boolean operations 36 | on them to determine the outcome. 37 | 38 | In 1992-93, McCanne et al. proposed the [BSD Packet Filter](http://www.tcpdump.org/papers/bpf-usenix93.pdf) with a 39 | new CFG-bytecode based filter design. This was an in-kernel approach where a tiny interpreter would evaluate 40 | expressions represented as BPF bytecodes. Instead of simple expression trees, they proposed a CFG-based filter design. 
41 | One control flow graph representation of the same filter can be: 42 | 43 | [![cfg](https://suchakra.files.wordpress.com/2015/05/cfg.png?w=167)](https://suchakra.files.wordpress.com/2015/05/cfg.png) 44 | 45 | The evaluation starts from P1; the right edge is for FALSE and the left for TRUE, with each predicate being 46 | evaluated in this fashion until the evaluation reaches the final result of TRUE or FALSE. The CFG has an inherent property of 47 | 'remembering': i.e., if P1 and P2 are false, the fact that the path reaches a final FALSE is remembered, and P3 and P4 48 | need not be evaluated. This was then easy to represent in bytecode form, where a minimal BPF VM can be designed to 49 | evaluate these predicates with jumps to TRUE or FALSE targets. 50 | 51 | ### The BPF Machine 52 | 53 | A pseudo-instruction representation of the same filter described above, for earlier versions of BPF in the Linux kernel, can be shown as, 54 | 55 | ```ASM 56 | l0: ldh [12] 57 | l1: jeq #0x800, l3, l2 58 | l2: jeq #0x805, l3, l8 59 | l3: ld [26] 60 | l4: jeq #SRC, l5, l8 61 | l5: ld len 62 | l6: jlt 0x400, l7, l8 63 | l7: ret #0xffff 64 | l8: ret #0 65 | ``` 66 | 67 | To know how to read these BPF instructions, look at the 68 | [filter documentation](https://www.kernel.org/doc/Documentation/networking/filter.txt) in the kernel source and see 69 | what each line does. Each of these instructions is actually just a bytecode which the BPF machine interprets. Like 70 | all real machines, this requires a definition of what the VM internals look like. In the version of the 71 | BPF-based in-kernel filtering technique the Linux kernel adopted, there were initially just 2 important registers, A and 72 | X, with another 16-register 'scratch space' M[0-15]. 
The instruction format and some sample instructions for this 73 | earlier version of BPF are shown below: 74 | 75 | ```C 76 | /* Instruction format: { OP, JT, JF, K } 77 | * OP: opcode, 16 bit 78 | * JT: Jump target for TRUE 79 | * JF: Jump target for FALSE 80 | * K: 32 bit constant 81 | */ 82 | 83 | /* Sample instructions */ 84 | { 0x28, 0, 0, 0x0000000c }, /* 0x28 is opcode for ldh */ 85 | { 0x15, 1, 0, 0x00000800 }, /* if A == 0x800, skip the next instr (jt = 1) */ 86 | { 0x15, 0, 5, 0x00000805 }, /* jump to FALSE (offset 5) if A != 0x805 */ 87 | .. 88 | ``` 89 | 90 | There were some **radical changes made to the BPF infrastructure recently** - extensions to its instruction set and 91 | registers, the addition of things like BPF maps, etc. We shall discuss those changes in detail, probably in the 92 | next post in this series. For now we'll just see the good ol' way of how BPF worked. 93 | 94 | ### Interpreter 95 | 96 | Each of the instructions seen above is represented as an array of these 4 values, and each program is an array of 97 | such instructions. The BPF interpreter looks at each opcode and performs the operations on the registers or data 98 | accordingly, after the program has gone through a verifier for a sanity check to make sure the filter code is secure and would not 99 | cause harm. The program, which consists of these instructions, then passes through a dispatch routine. As an example, 100 | here is a small snippet from the BPF instruction dispatch for the 'add' instruction, before it was restructured from 101 | Linux kernel v3.15 onwards, 102 | 103 | ```C 104 | 127 u32 A = 0; /* Accumulator */ 105 | 128 u32 X = 0; /* Index Register */ 106 | 129 u32 mem[BPF_MEMWORDS]; /* Scratch Memory Store */ 107 | 130 u32 tmp; 108 | 131 int k; 109 | 132 110 | 133 /* 111 | 134 * Process array of filter instructions. 
112 | 135 */ 113 | 136 for (;; fentry++) { 114 | 137 #if defined(CONFIG_X86_32) 115 | 138 #define K (fentry->k) 116 | 139 #else 117 | 140 const u32 K = fentry->k; 118 | 141 #endif 119 | 142 120 | 143 switch (fentry->code) { 121 | 144 case BPF_S_ALU_ADD_X: 122 | 145 A += X; 123 | 146 continue; 124 | 147 case BPF_S_ALU_ADD_K: 125 | 148 A += K; 126 | 149 continue; 127 | 150 .. 128 | ``` 129 | 130 | The above snippet is taken from net/core/filter.c in Linux kernel v3.14. Here, `fentry` points to the current `sock_filter` 131 | instruction, and the filter is applied to the `sk_buff` data element. The dispatch loop (136) runs till all the 132 | instructions are exhausted. The dispatch is basically a huge switch-case, with each opcode being tested (143) 133 | and the necessary action being taken. For example, here an 'add' operation on registers would compute A + X and store it in A. 134 | Yes, this is simple, isn't it? Let us take it a level above. 135 | 136 | ### JIT Compilation 137 | 138 | This is nothing new. JIT compilation of bytecodes has been around for a long time. I think it is one of those eventual steps taken once an interpreted language decides to optimize bytecode execution speed. Interpreter dispatch can be a bit costly once the size of the filter/code and the execution time increase. With high-frequency packet filtering, we need to save as much time as possible, and a good way is to convert the bytecode to native machine code by Just-In-Time compiling it and then executing the native code from the code cache. For BPF, JIT was first discussed in the [BPF+ research paper](http://dl.acm.org/citation.cfm?id=316214) by Begel et al. in 1999. Along with other optimizations (redundant predicate elimination, peephole optimizations etc.), a JIT assembler for BPF bytecodes was also discussed. They showed improvements from 3.5x to 9x in certain cases. I quickly started checking whether the Linux kernel had done something similar. 
And behold, here is what the JIT looks like for the 'add' instruction we discussed before (Linux kernel v3.14), 139 | 140 | ```C 141 | 288 switch (filter[i].code) { 142 | 289 case BPF_S_ALU_ADD_X: /* A += X; */ 143 | 290 seen |= SEEN_XREG; 144 | 291 EMIT2(0x01, 0xd8); /* add %ebx,%eax */ 145 | 292 break; 146 | 293 case BPF_S_ALU_ADD_K: /* A += K; */ 147 | 294 if (!K) 148 | 295 break; 149 | 296 if (is_imm8(K)) 150 | 297 EMIT3(0x83, 0xc0, K); /* add imm8,%eax */ 151 | 298 else 152 | 299 EMIT1_off32(0x05, K); /* add imm32,%eax */ 153 | 300 break; 154 | ``` 155 | 156 | As seen above in arch/x86/net/bpf_jit_comp.c for v3.14, instead of performing operations during the code dispatch 157 | directly, the JIT compiler [emits](http://lxr.free-electrons.com/source/arch/x86/net/bpf_jit_comp.c?v=3.14#L40) 158 | the native code to a memory area and keeps it ready for execution. The JITed filter image is built like a function 159 | call, so we add some prologue and epilogue to it as well, 160 | 161 | ```C 162 | /* JIT image prologue */ 163 | 221 EMIT4(0x55, 0x48, 0x89, 0xe5); /* push %rbp; mov %rsp,%rbp */ 164 | 222 EMIT4(0x48, 0x83, 0xec, 96); /* subq $96,%rsp */ 165 | ``` 166 | 167 | There are rules to BPF (such as no loops, etc.) which the verifier checks before the image is built, as we are 168 | now in the dangerous waters of executing external machine code inside the Linux kernel. In those days, all this 169 | would have been done by [bpf_jit_compile](http://lxr.free-electrons.com/source/arch/x86/net/bpf_jit_comp.c?v=3.14#L181) 170 | which, upon completion, would point the filter function to the filter image, 171 | 172 | ```C 173 | 774 fp->bpf_func = (void *)image; 174 | ``` 175 | Smooooooth... Upon execution of the filter function, instead of interpreting, the filter will now start executing 176 | the native code. 
Even though things have changed a bit recently, this has indeed been a fun way to learn how 177 | interpreters and JIT compilers work in general and the kind of optimizations that can be done. In the next part of 178 | this post series, I will look into what changes have been done recently - the restructuring and extension efforts to 179 | BPF, its evolution to eBPF along with BPF maps, and the very recent and ongoing efforts in 180 | [hist-triggers](https://lwn.net/Articles/639992/). I will discuss my experimental userspace eBPF library, 181 | its use for LTTng's UST event filtering, and its comparison to LTTng's bytecode interpreter. 182 | Brendan's [blog-post](http://www.brendangregg.com/blog/2015-05-15/ebpf-one-small-step.html) is highly recommended, 183 | and so are the links to 'More Reading' in that post. Thanks to Alexei Starovoitov, Eric Dumazet and all the other 184 | kernel contributors to BPF that I may have missed. They are doing awesome work and are the direct source of my 185 | learnings as well. Looking at the versatility of eBPF, its adoption in newer tools like 186 | [shark](http://www.sharkly.io/), and Brendan's views and 187 | [first experiments](https://github.com/brendangregg/BPF-tools), this may indeed be the next big thing in tracing. 188 | -------------------------------------------------------------------------------- /bpf-internals-2.md: -------------------------------------------------------------------------------- 1 | # BPF Internals - II 2 | *by Suchakra Sharma* 3 | 4 | Continuing from where I left off [before](https://suchakra.wordpress.com/2015/05/18/bpf-internals-i/), in this post we will see some of the major changes in BPF that have happened recently - how it is evolving into a very stable and accepted in-kernel VM, and can probably be the next big thing - not just in filtering but beyond. 
From what I observe, the most attractive feature of BPF is the access it gives developers to execute dynamically compiled code within the kernel - in a limited context, but still securely. This in itself is a valuable asset. 5 | 6 | As we have seen already, the use of BPF is not just limited to filtering network packets; it extends to seccomp, tracing, etc. The eventual step for BPF in such a scenario was to evolve and grow out of its use in the network filtering world. To improve the architecture and bytecode, lots of additions have been proposed. I started a bit late, when I saw Alexei's patches for kernel version 3.17-rcX. Perhaps [this](https://lkml.org/lkml/2013/9/30/627) was the relevant mail by Alexei that got me interested in the upcoming changes. So, here is a summary of the major changes that have occurred. We will be seeing each of them in sufficient detail. 7 | 8 | ### Architecture 9 | 10 | The classic BPF we discussed in the last post had two 32-bit registers - A and X. All arithmetic operations were supported and performed using these two registers. The newer BPF, called extended BPF or eBPF, has ten 64-bit registers and supports arbitrary loads/stores. It also contains new instructions like `BPF_CALL` which can be used to call new kernel-side helper functions. We will look into this in detail a bit later as well. The new eBPF follows calling conventions which are more like those of modern machines (x86_64). Here is the mapping of the new eBPF registers to x86 registers: 11 | 12 | ``` 13 | R0  - rax   return value from function 14 | R1  - rdi   1st argument 15 | R2  - rsi   2nd argument 16 | R3  - rdx   3rd argument 17 | R4  - rcx   4th argument 18 | R5  - r8    5th argument 19 | R6  - rbx   callee saved 20 | R7  - r13   callee saved 21 | R8  - r14   callee saved 22 | R9  - r15   callee saved 23 | R10 - rbp   frame pointer 24 | ``` 25 | 26 | The closeness to the machine ABI also ensures that unnecessary register spilling/copying can be avoided. 
The R0 register stores the return value from the eBPF program, and the eBPF program context can be loaded through register R1. Earlier, there used to be just two jump targets, i.e. jump to either the TRUE or the FALSE target. Now, there can be arbitrary jump targets - true or fall-through. Another aspect of the eBPF instruction set is its ease of use with the in-kernel JIT compiler. eBPF registers and most instructions now map one-to-one to machine registers and instructions. This makes emitting these eBPF instructions from any external compiler (in userspace) not such a daunting task. Of course, prior to any execution, the generated bytecode is passed through a verifier in the kernel to check its sanity. The verifier in itself is a very interesting and important piece of code, and probably a story for another day. 27 | 28 | ### Building BPF Programs 29 | 30 | From a user's perspective, the new eBPF bytecode can now be another headache to generate. But fear not, an LLVM-based backend now supports generating instructions for the BPF pseudo-machine type directly. It is being 'graduated' from just being an experimental backend and can hit the shelves anytime soon. In the meantime, you can always use [this script](https://gist.github.com/tuxology/357d8826e97eb72c9277) to set up the BPF-supported LLVM yourself. But then what next? A BPF program (not necessarily just a filter anymore) can be written in two parts - a kernel part (the BPF bytecode which will get loaded in the kernel) and a userspace part (which may, if needed, gather data from the kernel part). Currently you can specify an eBPF program in a restricted C-like language. For example, here is a program in the restricted C which returns true if the first argument of the input program context is 42. 
Nothing fancy: 31 | 32 | ```C 33 | #include 34 | 35 | int answer(struct bpf_context *ctx) 36 | { 37 | int life; 38 | life = ctx->arg1; 39 | 40 | if (life == 42) { 41 | return 1; 42 | } 43 | return 0; 44 | } 45 | ``` 46 | 47 | This C-like syntax generates a BPF binary which can then be loaded in the kernel. Here is what it looks like in the BPF 'assembly' representation as generated by the LLVM backend (supplied with LLVM 3.4): 48 | 49 | ```ASM 50 | .text 51 | .globl answer 52 | .align 8 53 | answer: # @answer 54 | # BB#0: 55 | ldw r1, 0(r1) 56 | mov r0, 1 57 | mov r2, 42 58 | jeq r1, r2 goto .LBB0_2 59 | # BB#1: 60 | mov r0, 0 61 | .LBB0_2: 62 | andi r0, 1 63 | ret 64 | ``` 65 | 66 | If you are adventurous enough, you can also probably write complete and valid [BPF programs in assembly](http://lxr.free-electrons.com/source/samples/bpf/test_verifier.c#L36) in a single go - right from your userspace program. I do not know if this is of any use these days. I did this some time back for a moderately elaborate trace filtering program though. It is also not that effective, because I think at this point in human history, LLVM can generate assembly better and more efficiently than a human. 67 | 68 | What we discussed just now is probably not a relevant program anymore. An [example by Alexei](http://lxr.free-electrons.com/source/samples/bpf/tracex1_kern.c) is what is more relevant these days. With the integration of Kprobes with BPF, a BPF program can be run at any valid dynamically instrumentable function in the kernel. So now, we can just use pt_regs as the context and get individual register values each time the probe is hit. As of now, some helper functions are available in BPF as well, which can get the current timestamp. You can have a very cheap tracing tool right there :) 69 | 70 | ### BPF Maps 71 | 72 | I think one of the most interesting features in this new eBPF is the BPF maps. 
A BPF map looks like an abstract data type - initially a hash table, but from kernel 3.19 onwards, support for array maps seems to have been added as well. These bpf_maps can be used to store data generated from an eBPF program being executed. You can see the implementation details in [arraymap.c](http://lxr.free-electrons.com/source/kernel/bpf/arraymap.c) or [hashtab.c](http://lxr.free-electrons.com/source/kernel/bpf/hashtab.c). Let's pause for a while and see some more magic added in eBPF - especially the BPF syscall, which forms the primary interface for the user to interact with and use eBPF. The reason we want to know more about this syscall is to learn how to work with these cool BPF maps. 73 | 74 | #### BPF Syscall 75 | 76 | Another nice thing about eBPF is the new syscall that has been added to make life easier while dealing with BPF programs. In an [article](https://lwn.net/Articles/603983/) on LWN last year, Jonathan Corbet discussed the use of the BPF syscall. For example, to load a BPF program you could call 77 | 78 | ```C 79 | syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr)); 80 | ``` 81 | 82 | with, of course, the corresponding bpf_attr structure being filled in beforehand: 83 | 84 | ```C 85 | union bpf_attr attr = { 86 | .prog_type = prog_type, /* kprobe filter? socket filter? */ 87 | .insns = ptr_to_u64((void *) insns), /* complete bpf instructions */ 88 | .insn_cnt = prog_len / sizeof(struct bpf_insn), /* how many? 
*/ 89 | .license = ptr_to_u64((void *) license), /* GPL maybe */ 90 | .log_buf = ptr_to_u64(bpf_log_buf), /* log buffer */ 91 | .log_size = LOG_BUF_SIZE, 92 | .log_level = 1, 93 | }; 94 | ``` 95 | 96 | Yes, this may seem cumbersome to some, so for now there are some [wrapper functions](http://lxr.free-electrons.com/source/samples/bpf/bpf_load.c#L33) in [bpf_load.c](http://lxr.free-electrons.com/source/samples/bpf/bpf_load.c) and [libbpf.c](http://lxr.free-electrons.com/source/samples/bpf/libbpf.c) released to help folks out, where you need not give too many details about your compiled BPF program. Much of what happens in the BPF syscall is determined by the arguments supported [here](http://lxr.free-electrons.com/source/kernel/bpf/syscall.c#L551). To elaborate more, let's see how to load the BPF program we wrote before. Assuming that we have the [sample program](http://lxr.free-electrons.com/source/samples/bpf/tracex1_kern.c) in its BPF bytecode form generated, and now we want to load it up, we take the help of the wrapper function [load_bpf_file()](http://lxr.free-electrons.com/source/samples/bpf/bpf_load.c#L190) which parses the BPF ELF file and extracts the BPF bytecode from the relevant section. It also iterates over all ELF sections to get license info, map info, etc. Eventually, as per the type of BPF program - kprobe/kretprobe or socket program - and the info and bytecode just gathered from the ELF parsing, the bpf_attr attribute structure is filled and the actual syscall is made. 97 | 98 | #### Creating and accessing BPF maps 99 | 100 | Coming back to the maps: apart from this simple syscall to load the BPF program, there are many more actions that can be taken based on just the arguments. Have a look at [bpf/syscall.c](http://lxr.free-electrons.com/source/kernel/bpf/syscall.c). From the userspace side, the new BPF syscall comes to the rescue and allows most of these operations on bpf_maps to be performed! 
From the kernel side, however, with some special helper functions and the use of the BPF_CALL instruction, the values in these maps can be updated/deleted/accessed, etc. These [helpers](http://lxr.free-electrons.com/source/kernel/bpf/helpers.c) in turn call the actual function according to the type of map - hash map or array. For example, here is a BPF program that just creates an array map and does nothing else, 101 | 102 | ```C 103 | #include 104 | #include "bpf_helpers.h" 105 | #include 106 | 107 | struct bpf_map_def SEC("maps") sample_map = { 108 | .type = BPF_MAP_TYPE_ARRAY, 109 | .key_size = sizeof(u32), 110 | .value_size = sizeof(unsigned int), 111 | .max_entries = 1000, 112 | }; 113 | 114 | char _license[] SEC("license") = "GPL"; 115 | u32 _version SEC("version") = LINUX_VERSION_CODE; 116 | ``` 117 | 118 | When loaded in the kernel, the array map is created. From the userspace, we can then initialize the map with some values with a function that looks like this (here `map_fd[]` and `value1` come from the surrounding sample code), 119 | 120 | ```C 121 | static void init_array() 122 | { 123 | int key; 124 | for (key = 0; key < 1000; key++) { 125 | bpf_update_elem(map_fd[0], &key, &value1, BPF_ANY); 126 | } 127 | } 128 | ``` 129 | 130 | where the `bpf_update_elem()` wrapper in turn calls the BPF syscall with the proper arguments and attributes as, 131 | 132 | ```C 133 | syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); 134 | ``` 135 | 136 | This in turn calls [`map_update_elem()`](http://lxr.free-electrons.com/source/kernel/bpf/syscall.c#L205) which securely copies the key and value using `copy_from_user()` and then calls the [specialized function](http://lxr.free-electrons.com/source/kernel/bpf/arraymap.c#L94) for updating the value of the array map at the specified index. Similar things happen for reading/deleting/creating hash or array maps from userspace. 
137 | 138 | So probably things will start falling into place now, from the [earlier post](http://www.brendangregg.com/blog/2015-05-15/ebpf-one-small-step.html) by Brendan Gregg where he was updating a map from the BPF program (using the `BPF_CALL` instruction which calls the internal [kernel helpers](http://lxr.free-electrons.com/source/kernel/bpf/helpers.c)) and then concurrently accessing it from userspace to generate a beautiful histogram (through the syscall I just mentioned above). BPF maps are indeed a very powerful addition to the system. You can also check out more detailed and complete [examples](https://github.com/iovisor/bcc/tree/master/examples) now that you know what is going on. To summarize, this is how an example BPF program written in restricted C for the kernel part (`foo_kern.c`) and normal C for the userspace part (`foo_user.c`) would run these days: 139 | 140 | [![img](https://suchakra.files.wordpress.com/2015/08/ebpf-session.png)](https://suchakra.files.wordpress.com/2015/08/ebpf-session.png) 141 | 142 | In the next BPF post, I will discuss the eBPF verifier in detail. This is the most crucial part of BPF and deserves detailed attention, I think. There is also something cool happening these days on the PLUMgrid side - the [BPF Compiler Collection](https://github.com/iovisor/bcc). There was a very interesting demo using such tools and the power of eBPF at the recent Red Hat Summit. I got BCC working and tried out some examples with probes - where I could easily compile and load BPF programs from my Python scripts! How cool is that :) Also, I have been digging through LTTng's interpreter lately, so another post detailing how the BPF and LTTng interpreters work would be nice. That's all for now. Run BPF. 143 | -------------------------------------------------------------------------------- /bpf_helpers.rst: -------------------------------------------------------------------------------- 1 | .. 
Copyright (C) All BPF authors and contributors from 2014 to present. 2 | .. See git log include/uapi/linux/bpf.h in kernel tree for details. 3 | .. 4 | .. %%%LICENSE_START(VERBATIM) 5 | .. Permission is granted to make and distribute verbatim copies of this 6 | .. manual provided the copyright notice and this permission notice are 7 | .. preserved on all copies. 8 | .. 9 | .. Permission is granted to copy and distribute modified versions of this 10 | .. manual under the conditions for verbatim copying, provided that the 11 | .. entire resulting derived work is distributed under the terms of a 12 | .. permission notice identical to this one. 13 | .. 14 | .. Since the Linux kernel and libraries are constantly changing, this 15 | .. manual page may be incorrect or out-of-date. The author(s) assume no 16 | .. responsibility for errors or omissions, or for damages resulting from 17 | .. the use of the information contained herein. The author(s) may not 18 | .. have taken the same level of care in the production of this manual, 19 | .. which is licensed free of charge, as they might when working 20 | .. professionally. 21 | .. 22 | .. Formatted or processed versions of this manual, if unaccompanied by 23 | .. the source, must acknowledge the copyright and authors of this work. 24 | .. %%%LICENSE_END 25 | .. 26 | .. Please do not edit this file. It was generated from the documentation 27 | .. located in file include/uapi/linux/bpf.h of the Linux kernel sources 28 | .. (helpers description), and from scripts/bpf_helpers_doc.py in the same 29 | .. repository (header and footer). 
30 | 31 | =========== 32 | BPF-HELPERS 33 | =========== 34 | ------------------------------------------------------------------------------- 35 | list of eBPF helper functions 36 | ------------------------------------------------------------------------------- 37 | 38 | :Manual section: 7 39 | 40 | DESCRIPTION 41 | =========== 42 | 43 | The extended Berkeley Packet Filter (eBPF) subsystem consists of programs 44 | written in a pseudo-assembly language, then attached to one of the several 45 | kernel hooks and run in reaction to specific events. This framework differs 46 | from the older, "classic" BPF (or "cBPF") in several aspects, one of them being 47 | the ability to call special functions (or "helpers") from within a program. 48 | These functions are restricted to a white-list of helpers defined in the 49 | kernel. 50 | 51 | These helpers are used by eBPF programs to interact with the system, or with 52 | the context in which they work. For instance, they can be used to print 53 | debugging messages, to get the time since the system was booted, to interact 54 | with eBPF maps, or to manipulate network packets. Since there are several eBPF 55 | program types, and they do not run in the same context, each program type 56 | can only call a subset of those helpers. 57 | 58 | Due to eBPF conventions, a helper cannot have more than five arguments. 59 | 60 | Internally, eBPF programs call directly into the compiled helper functions 61 | without requiring any foreign-function interface. As a result, calling helpers 62 | introduces no overhead, thus offering excellent performance. 63 | 64 | This document is an attempt to list and document the helpers available to eBPF 65 | developers. They are sorted in chronological order (the oldest helpers in the 66 | kernel at the top). 
67 | 68 | HELPERS 69 | ======= 70 | 71 | **void \*bpf_map_lookup_elem(struct bpf_map \***\ *map*\ **, const void \***\ *key*\ **)** 72 | Description 73 | Perform a lookup in *map* for an entry associated to *key*. 74 | Return 75 | Map value associated to *key*, or **NULL** if no entry was 76 | found. 77 | 78 | **int bpf_map_update_elem(struct bpf_map \***\ *map*\ **, const void \***\ *key*\ **, const void \***\ *value*\ **, u64** *flags*\ **)** 79 | Description 80 | Add or update the value of the entry associated to *key* in 81 | *map* with *value*. *flags* is one of: 82 | 83 | **BPF_NOEXIST** 84 | The entry for *key* must not exist in the map. 85 | **BPF_EXIST** 86 | The entry for *key* must already exist in the map. 87 | **BPF_ANY** 88 | No condition on the existence of the entry for *key*. 89 | 90 | Flag value **BPF_NOEXIST** cannot be used for maps of types 91 | **BPF_MAP_TYPE_ARRAY** or **BPF_MAP_TYPE_PERCPU_ARRAY** (all 92 | elements always exist), the helper would return an error. 93 | Return 94 | 0 on success, or a negative error in case of failure. 95 | 96 | **int bpf_map_delete_elem(struct bpf_map \***\ *map*\ **, const void \***\ *key*\ **)** 97 | Description 98 | Delete entry with *key* from *map*. 99 | Return 100 | 0 on success, or a negative error in case of failure. 101 | 102 | **int bpf_probe_read(void \***\ *dst*\ **, u32** *size*\ **, const void \***\ *src*\ **)** 103 | Description 104 | For tracing programs, safely attempt to read *size* bytes from 105 | address *src* and store the data in *dst*. 106 | Return 107 | 0 on success, or a negative error in case of failure. 108 | 109 | **u64 bpf_ktime_get_ns(void)** 110 | Description 111 | Return the time elapsed since system boot, in nanoseconds. 112 | Return 113 | Current *ktime*. 114 | 115 | **int bpf_trace_printk(const char \***\ *fmt*\ **, u32** *fmt_size*\ **, ...)** 116 | Description 117 | This helper is a "printk()-like" facility for debugging. 
It 118 | prints a message defined by format *fmt* (of size *fmt_size*) 119 | to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if 120 | available. It can take up to three additional **u64** 121 | arguments (as with eBPF helpers in general, the total number of arguments is 122 | limited to five). 123 | 124 | Each time the helper is called, it appends a line to the trace. 125 | The format of the trace is customizable, and the exact output 126 | one will get depends on the options set in 127 | *\/sys/kernel/debug/tracing/trace_options* (see also the 128 | *README* file under the same directory). However, it usually 129 | defaults to something like: 130 | 131 | :: 132 | 133 | telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg> 134 | 135 | In the above: 136 | 137 | * ``telnet`` is the name of the current task. 138 | * ``470`` is the PID of the current task. 139 | * ``001`` is the CPU number on which the task is 140 | running. 141 | * In ``.N..``, each character refers to a set of 142 | options (whether irqs are enabled, scheduling 143 | options, whether hard/softirqs are running, level of 144 | preempt_disabled respectively). **N** means that 145 | **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED** 146 | are set. 147 | * ``419421.045894`` is a timestamp. 148 | * ``0x00000001`` is a fake value used by BPF for the 149 | instruction pointer register. 150 | * ``<formatted msg>`` is the message formatted with 151 | *fmt*. 152 | 153 | The conversion specifiers supported by *fmt* are similar to, but 154 | more limited than, those of printk(). They are **%d**, **%i**, 155 | **%u**, **%x**, **%ld**, **%li**, **%lu**, **%lx**, **%lld**, 156 | **%lli**, **%llu**, **%llx**, **%p**, **%s**. No modifier (size 157 | of field, padding with zeroes, etc.) is available, and the 158 | helper will return **-EINVAL** (but print nothing) if it 159 | encounters an unknown specifier. 160 | 161 | Also, note that **bpf_trace_printk**\ () is slow, and should 162 | only be used for debugging purposes.
For this reason, a notice 163 | block (spanning several lines) stating that the helper should 164 | not be used "for production use" is printed to kernel logs 165 | the first time this helper is used (or more precisely, when 166 | **trace_printk**\ () buffers are allocated). For passing values 167 | to user space, perf events should be preferred. 168 | Return 169 | The number of bytes written to the buffer, or a negative error 170 | in case of failure. 171 | 172 | **u32 bpf_get_prandom_u32(void)** 173 | Description 174 | Get a pseudo-random number. 175 | 176 | From a security point of view, this helper uses its own 177 | pseudo-random internal state, and cannot be used to infer the 178 | seed of other random functions in the kernel. However, it is 179 | essential to note that the generator used by the helper is not 180 | cryptographically secure. 181 | Return 182 | A random 32-bit unsigned value. 183 | 184 | **u32 bpf_get_smp_processor_id(void)** 185 | Description 186 | Get the SMP (symmetric multiprocessing) processor id. Note that 187 | all programs run with preemption disabled, which means that the 188 | SMP processor id is stable throughout the execution of the 189 | program. 190 | Return 191 | The SMP id of the processor running the program. 192 | 193 | **int bpf_skb_store_bytes(struct sk_buff \***\ *skb*\ **, u32** *offset*\ **, const void \***\ *from*\ **, u32** *len*\ **, u64** *flags*\ **)** 194 | Description 195 | Store *len* bytes from address *from* into the packet 196 | associated to *skb*, at *offset*. *flags* are a combination of 197 | **BPF_F_RECOMPUTE_CSUM** (automatically recompute the 198 | checksum for the packet after storing the bytes) and 199 | **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\ 200 | **->swhash** and *skb*\ **->l4hash** to 0). 201 | 202 | A call to this helper may change the underlying 203 | packet buffer.
Therefore, at load time, all checks on pointers 204 | previously done by the verifier are invalidated and must be 205 | performed again, if the helper is used in combination with 206 | direct packet access. 207 | Return 208 | 0 on success, or a negative error in case of failure. 209 | 210 | **int bpf_l3_csum_replace(struct sk_buff \***\ *skb*\ **, u32** *offset*\ **, u64** *from*\ **, u64** *to*\ **, u64** *size*\ **)** 211 | Description 212 | Recompute the layer 3 (e.g. IP) checksum for the packet 213 | associated to *skb*. Computation is incremental, so the helper 214 | must know the former value of the header field that was 215 | modified (*from*), the new value of this field (*to*), and the 216 | number of bytes (2 or 4) for this field, stored in *size*. 217 | Alternatively, it is possible to store the difference between 218 | the previous and the new values of the header field in *to*, by 219 | setting *from* and *size* to 0. For both methods, *offset* 220 | indicates the location of the IP checksum within the packet. 221 | 222 | This helper works in combination with **bpf_csum_diff**\ (), 223 | which does not update the checksum in-place, but offers more 224 | flexibility and can handle sizes larger than 2 or 4 for the 225 | checksum to update. 226 | 227 | A call to this helper may change the underlying 228 | packet buffer.
Computation is incremental, so the 239 | helper must know the former value of the header field that was 240 | modified (*from*), the new value of this field (*to*), and the 241 | number of bytes (2 or 4) for this field, stored in the lowest 242 | four bits of *flags*. Alternatively, it is possible to store 243 | the difference between the previous and the new values of the 244 | header field in *to*, by setting *from* and the four lowest 245 | bits of *flags* to 0. For both methods, *offset* indicates the 246 | location of the checksum within the packet. In addition to 247 | the size of the field, actual flags can be added to *flags* 248 | (with a bitwise OR). With **BPF_F_MARK_MANGLED_0**, a null checksum is left 249 | untouched (unless **BPF_F_MARK_ENFORCE** is added as well), and 250 | for updates resulting in a null checksum the value is set to 251 | **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates 252 | the checksum is to be computed against a pseudo-header. 253 | 254 | This helper works in combination with **bpf_csum_diff**\ (), 255 | which does not update the checksum in-place, but offers more 256 | flexibility and can handle sizes larger than 2 or 4 for the 257 | checksum to update. 258 | 259 | A call to this helper may change the underlying 260 | packet buffer. Therefore, at load time, all checks on pointers 261 | previously done by the verifier are invalidated and must be 262 | performed again, if the helper is used in combination with 263 | direct packet access. 264 | Return 265 | 0 on success, or a negative error in case of failure. 266 | 267 | **int bpf_tail_call(void \***\ *ctx*\ **, struct bpf_map \***\ *prog_array_map*\ **, u32** *index*\ **)** 268 | Description 269 | This special helper is used to trigger a "tail call", or in 270 | other words, to jump into another eBPF program. The same stack 271 | frame is used (but values on stack and in registers for the 272 | caller are not accessible to the callee).
This mechanism allows 273 | for program chaining, either for raising the maximum number of 274 | available eBPF instructions, or for executing given programs in 275 | conditional blocks. For security reasons, there is an upper 276 | limit to the number of successive tail calls that can be 277 | performed. 278 | 279 | Upon call of this helper, the program attempts to jump into a 280 | program referenced at index *index* in *prog_array_map*, a 281 | special map of type **BPF_MAP_TYPE_PROG_ARRAY**, and passes 282 | *ctx*, a pointer to the context. 283 | 284 | If the call succeeds, the kernel immediately runs the first 285 | instruction of the new program. This is not a function call, 286 | and it never returns to the previous program. If the call 287 | fails, then the helper has no effect, and the caller continues 288 | to run its subsequent instructions. A call can fail if the 289 | destination program for the jump does not exist (i.e. *index* 290 | is greater than or equal to the number of entries in *prog_array_map*), or 291 | if the maximum number of tail calls has been reached for this 292 | chain of programs. This limit is defined in the kernel by the 293 | macro **MAX_TAIL_CALL_CNT** (not accessible to user space), 294 | which is currently set to 32. 295 | Return 296 | 0 on success, or a negative error in case of failure. 297 | 298 | **int bpf_clone_redirect(struct sk_buff \***\ *skb*\ **, u32** *ifindex*\ **, u64** *flags*\ **)** 299 | Description 300 | Clone and redirect the packet associated to *skb* to another 301 | net device of index *ifindex*. Both ingress and egress 302 | interfaces can be used for redirection. The **BPF_F_INGRESS** 303 | value in *flags* is used to make the distinction (ingress path 304 | is selected if the flag is present, egress path otherwise). 305 | This is the only flag supported for now.
306 | 307 | In comparison with the **bpf_redirect**\ () helper, 308 | **bpf_clone_redirect**\ () has the associated cost of 309 | duplicating the packet buffer, but it is executed directly from within 310 | the eBPF program. Conversely, **bpf_redirect**\ () is more 311 | efficient, but it is handled through an action code where the 312 | redirection happens only after the eBPF program has returned. 313 | 314 | A call to this helper may change the underlying 315 | packet buffer. Therefore, at load time, all checks on pointers 316 | previously done by the verifier are invalidated and must be 317 | performed again, if the helper is used in combination with 318 | direct packet access. 319 | Return 320 | 0 on success, or a negative error in case of failure. 321 | 322 | **u64 bpf_get_current_pid_tgid(void)** 323 | Return 324 | A 64-bit integer containing the current tgid and pid, and 325 | created as such: 326 | *current_task*\ **->tgid << 32 \|** 327 | *current_task*\ **->pid**. 328 | 329 | **u64 bpf_get_current_uid_gid(void)** 330 | Return 331 | A 64-bit integer containing the current GID and UID, and 332 | created as such: *current_gid* **<< 32 \|** *current_uid*. 333 | 334 | **int bpf_get_current_comm(char \***\ *buf*\ **, u32** *size_of_buf*\ **)** 335 | Description 336 | Copy the **comm** attribute of the current task into *buf* of 337 | *size_of_buf*. The **comm** attribute contains the name of 338 | the executable (excluding the path) for the current task. The 339 | *size_of_buf* must be strictly positive. On success, the 340 | helper makes sure that the *buf* is NUL-terminated. On failure, 341 | it is filled with zeroes. 342 | Return 343 | 0 on success, or a negative error in case of failure. 344 | 345 | **u32 bpf_get_cgroup_classid(struct sk_buff \***\ *skb*\ **)** 346 | Description 347 | Retrieve the classid for the current task, i.e. for the net_cls 348 | cgroup to which *skb* belongs.
349 | 350 | This helper can be used on the TC egress path, but not on ingress. 351 | 352 | The net_cls cgroup provides an interface to tag network packets 353 | based on a user-provided identifier for all traffic coming from 354 | the tasks belonging to the related cgroup. See also the related 355 | kernel documentation, available from the Linux sources in file 356 | *Documentation/cgroup-v1/net_cls.txt*. 357 | 358 | The Linux kernel has two versions for cgroups: there are 359 | cgroups v1 and cgroups v2. Both are available to users, who can 360 | use a mixture of them, but note that the net_cls cgroup is for 361 | cgroup v1 only. This makes it incompatible with BPF programs 362 | run on cgroups, which is a cgroup-v2-only feature (a socket can 363 | only hold data for one version of cgroups at a time). 364 | 365 | This helper is only available if the kernel was compiled with 366 | the **CONFIG_CGROUP_NET_CLASSID** configuration option set to 367 | "**y**" or to "**m**". 368 | Return 369 | The classid, or 0 for the default unconfigured classid. 370 | 371 | **int bpf_skb_vlan_push(struct sk_buff \***\ *skb*\ **, __be16** *vlan_proto*\ **, u16** *vlan_tci*\ **)** 372 | Description 373 | Push a *vlan_tci* (VLAN tag control information) of protocol 374 | *vlan_proto* to the packet associated to *skb*, then update 375 | the checksum. Note that if *vlan_proto* is other than 376 | **ETH_P_8021Q** or **ETH_P_8021AD**, it is considered to 377 | be **ETH_P_8021Q**. 378 | 379 | A call to this helper may change the underlying 380 | packet buffer. Therefore, at load time, all checks on pointers 381 | previously done by the verifier are invalidated and must be 382 | performed again, if the helper is used in combination with 383 | direct packet access. 384 | Return 385 | 0 on success, or a negative error in case of failure.
386 | 387 | **int bpf_skb_vlan_pop(struct sk_buff \***\ *skb*\ **)** 388 | Description 389 | Pop a VLAN header from the packet associated to *skb*. 390 | 391 | A call to this helper may change the underlying 392 | packet buffer. Therefore, at load time, all checks on pointers 393 | previously done by the verifier are invalidated and must be 394 | performed again, if the helper is used in combination with 395 | direct packet access. 396 | Return 397 | 0 on success, or a negative error in case of failure. 398 | 399 | **int bpf_skb_get_tunnel_key(struct sk_buff \***\ *skb*\ **, struct bpf_tunnel_key \***\ *key*\ **, u32** *size*\ **, u64** *flags*\ **)** 400 | Description 401 | Get tunnel metadata. This helper takes a pointer *key* to an 402 | empty **struct bpf_tunnel_key** of *size*, that will be 403 | filled with tunnel metadata for the packet associated to *skb*. 404 | The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which 405 | indicates that the tunnel is based on the IPv6 protocol instead of 406 | IPv4. 407 | 408 | The **struct bpf_tunnel_key** is an object that generalizes the 409 | principal parameters used by various tunneling protocols into a 410 | single struct. This way, it can be used to easily make a 411 | decision based on the contents of the encapsulation header, 412 | "summarized" in this struct. In particular, it holds the IP 413 | address of the remote end (IPv4 or IPv6, depending on the case) 414 | in *key*\ **->remote_ipv4** or *key*\ **->remote_ipv6**. Also, 415 | this struct exposes the *key*\ **->tunnel_id**, which is 416 | generally mapped to a VNI (Virtual Network Identifier), making 417 | it programmable together with the **bpf_skb_set_tunnel_key**\ 418 | () helper.
419 | 420 | Let's imagine that the following code is part of a program 421 | attached to the TC ingress interface, on one end of a GRE 422 | tunnel, and is supposed to filter out all messages coming from 423 | remote ends with IPv4 address other than 10.0.0.1: 424 | 425 | :: 426 | 427 | int ret; 428 | struct bpf_tunnel_key key = {}; 429 | 430 | ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0); 431 | if (ret < 0) 432 | return TC_ACT_SHOT; // drop packet 433 | 434 | if (key.remote_ipv4 != 0x0a000001) 435 | return TC_ACT_SHOT; // drop packet 436 | 437 | return TC_ACT_OK; // accept packet 438 | 439 | This interface can also be used with all encapsulation devices 440 | that can operate in "collect metadata" mode: instead of having 441 | one network device per specific configuration, the "collect 442 | metadata" mode only requires a single device where the 443 | configuration can be extracted from this helper. 444 | 445 | This can be used together with various tunnels such as VXLan, 446 | Geneve, GRE or IP in IP (IPIP). 447 | Return 448 | 0 on success, or a negative error in case of failure. 449 | 450 | **int bpf_skb_set_tunnel_key(struct sk_buff \***\ *skb*\ **, struct bpf_tunnel_key \***\ *key*\ **, u32** *size*\ **, u64** *flags*\ **)** 451 | Description 452 | Populate tunnel metadata for packet associated to *skb.* The 453 | tunnel metadata is set to the contents of *key*, of *size*. The 454 | *flags* can be set to a combination of the following values: 455 | 456 | **BPF_F_TUNINFO_IPV6** 457 | Indicate that the tunnel is based on IPv6 protocol 458 | instead of IPv4. 459 | **BPF_F_ZERO_CSUM_TX** 460 | For IPv4 packets, add a flag to tunnel metadata 461 | indicating that checksum computation should be skipped 462 | and checksum set to zeroes. 463 | **BPF_F_DONT_FRAGMENT** 464 | Add a flag to tunnel metadata indicating that the 465 | packet should not be fragmented. 
466 | **BPF_F_SEQ_NUMBER** 467 | Add a flag to tunnel metadata indicating that a 468 | sequence number should be added to the tunnel header before 469 | sending the packet. This flag was added for GRE 470 | encapsulation, but might be used with other protocols 471 | as well in the future. 472 | 473 | Here is a typical usage on the transmit path: 474 | 475 | :: 476 | 477 | struct bpf_tunnel_key key; 478 | populate key ... 479 | bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0); 480 | bpf_clone_redirect(skb, vxlan_dev_ifindex, 0); 481 | 482 | See also the description of the **bpf_skb_get_tunnel_key**\ () 483 | helper for additional information. 484 | Return 485 | 0 on success, or a negative error in case of failure. 486 | 487 | **u64 bpf_perf_event_read(struct bpf_map \***\ *map*\ **, u64** *flags*\ **)** 488 | Description 489 | Read the value of a perf event counter. This helper relies on a 490 | *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of 491 | the perf event counter is selected when *map* is updated with 492 | perf event file descriptors. The *map* is an array whose size 493 | is the number of available CPUs, and each cell contains a value 494 | relative to one CPU. The value to retrieve is indicated by 495 | *flags*, that contains the index of the CPU to look up, masked 496 | with **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to 497 | **BPF_F_CURRENT_CPU** to indicate that the value for the 498 | current CPU should be retrieved. 499 | 500 | Note that before Linux 4.13, only hardware perf events can be 501 | retrieved. 502 | 503 | Also, be aware that the newer helper 504 | **bpf_perf_event_read_value**\ () is recommended over 505 | **bpf_perf_event_read**\ () in general. The latter has some ABI 506 | quirks where error and counter value are used as a return code 507 | (which is wrong to do since ranges may overlap).
This issue is 508 | fixed with **bpf_perf_event_read_value**\ (), which at the same 509 | time provides more features over the **bpf_perf_event_read**\ 510 | () interface. Please refer to the description of 511 | **bpf_perf_event_read_value**\ () for details. 512 | Return 513 | The value of the perf event counter read from the map, or a 514 | negative error code in case of failure. 515 | 516 | **int bpf_redirect(u32** *ifindex*\ **, u64** *flags*\ **)** 517 | Description 518 | Redirect the packet to another net device of index *ifindex*. 519 | This helper is somewhat similar to **bpf_clone_redirect**\ 520 | (), except that the packet is not cloned, which provides 521 | increased performance. 522 | 523 | Except for XDP, both ingress and egress interfaces can be used 524 | for redirection. The **BPF_F_INGRESS** value in *flags* is used 525 | to make the distinction (ingress path is selected if the flag 526 | is present, egress path otherwise). Currently, XDP only 527 | supports redirection to the egress interface, and accepts no 528 | flag at all. 529 | 530 | The same effect can be attained with the more generic 531 | **bpf_redirect_map**\ (), which requires specific maps to be 532 | used but offers better performance. 533 | Return 534 | For XDP, the helper returns **XDP_REDIRECT** on success or 535 | **XDP_ABORTED** on error. For other program types, the values 536 | are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on 537 | error. 538 | 539 | **u32 bpf_get_route_realm(struct sk_buff \***\ *skb*\ **)** 540 | Description 541 | Retrieve the realm of the route, that is to say the 542 | **tclassid** field of the destination for the *skb*. The 543 | identifier retrieved is a user-provided tag, similar to the 544 | one used with the net_cls cgroup (see the description of the 545 | **bpf_get_cgroup_classid**\ () helper), but here this tag is 546 | held by a route (a destination entry), not by a task.
547 | 548 | Retrieving this identifier works with the clsact TC egress hook 549 | (see also **tc-bpf(8)**), or alternatively on conventional 550 | classful egress qdiscs, but not on the TC ingress path. In the case of 551 | the clsact TC egress hook, this has the advantage that, internally, 552 | the destination entry has not been dropped yet in the transmit 553 | path. Therefore, the destination entry does not need to be 554 | artificially held via **netif_keep_dst**\ () for a classful 555 | qdisc until the *skb* is freed. 556 | 557 | This helper is available only if the kernel was compiled with 558 | the **CONFIG_IP_ROUTE_CLASSID** configuration option. 559 | Return 560 | The realm of the route for the packet associated to *skb*, or 0 561 | if none was found. 562 | 563 | **int bpf_perf_event_output(struct pt_regs \***\ *ctx*\ **, struct bpf_map \***\ *map*\ **, u64** *flags*\ **, void \***\ *data*\ **, u64** *size*\ **)** 564 | Description 565 | Write raw *data* blob into a special BPF perf event held by 566 | *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This perf 567 | event must have the following attributes: **PERF_SAMPLE_RAW** 568 | as **sample_type**, **PERF_TYPE_SOFTWARE** as **type**, and 569 | **PERF_COUNT_SW_BPF_OUTPUT** as **config**. 570 | 571 | The *flags* are used to indicate the index in *map* for which 572 | the value must be put, masked with **BPF_F_INDEX_MASK**. 573 | Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU** 574 | to indicate that the index of the current CPU core should be 575 | used. 576 | 577 | The value to write, of *size*, is passed through eBPF stack and 578 | pointed by *data*. 579 | 580 | The context of the program, *ctx*, also needs to be passed to the 581 | helper. 582 | 583 | In user space, a program willing to read the values needs to 584 | call **perf_event_open**\ () on the perf event (either for 585 | one or for all CPUs) and to store the file descriptor into the 586 | *map*.
This must be done before the eBPF program can send data 587 | into it. An example is available in the file 588 | *samples/bpf/trace_output_user.c* in the Linux kernel source 589 | tree (the eBPF program counterpart is in 590 | *samples/bpf/trace_output_kern.c*). 591 | 592 | **bpf_perf_event_output**\ () achieves better performance 593 | than **bpf_trace_printk**\ () for sharing data with user 594 | space, and is much better suited to streaming data from eBPF 595 | programs. 596 | 597 | Note that this helper is not restricted to tracing use cases 598 | and can be used with programs attached to TC or XDP as well, 599 | where it allows for passing data to user space listeners. Data 600 | can be: 601 | 602 | * Only custom structs, 603 | * Only the packet payload, or 604 | * A combination of both. 605 | Return 606 | 0 on success, or a negative error in case of failure. 607 | 608 | **int bpf_skb_load_bytes(const struct sk_buff \***\ *skb*\ **, u32** *offset*\ **, void \***\ *to*\ **, u32** *len*\ **)** 609 | Description 610 | This helper was provided as an easy way to load data from a 611 | packet. It can be used to load *len* bytes from *offset* from 612 | the packet associated to *skb*, into the buffer pointed by 613 | *to*. 614 | 615 | Since Linux 4.7, usage of this helper has mostly been replaced 616 | by "direct packet access", enabling packet data to be 617 | manipulated with *skb*\ **->data** and *skb*\ **->data_end** 618 | pointing respectively to the first byte of packet data and to 619 | the byte after the last byte of packet data. However, it 620 | remains useful if one wishes to read large quantities of data 621 | at once from a packet into the eBPF stack. 622 | Return 623 | 0 on success, or a negative error in case of failure. 624 | 625 | **int bpf_get_stackid(struct pt_regs \***\ *ctx*\ **, struct bpf_map \***\ *map*\ **, u64** *flags*\ **)** 626 | Description 627 | Walk a user or a kernel stack and return its id.
To achieve 628 | this, the helper needs *ctx*, which is a pointer to the context 629 | on which the tracing program is executed, and a pointer to a 630 | *map* of type **BPF_MAP_TYPE_STACK_TRACE**. 631 | 632 | The last argument, *flags*, holds the number of stack frames to 633 | skip (from 0 to 255), masked with 634 | **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set 635 | a combination of the following flags: 636 | 637 | **BPF_F_USER_STACK** 638 | Collect a user space stack instead of a kernel stack. 639 | **BPF_F_FAST_STACK_CMP** 640 | Compare stacks by hash only. 641 | **BPF_F_REUSE_STACKID** 642 | If two different stacks hash into the same *stackid*, 643 | discard the old one. 644 | 645 | The stack id retrieved is a 32-bit integer handle which 646 | can be further combined with other data (including other stack 647 | ids) and used as a key into maps. This can be useful for 648 | generating a variety of graphs (such as flame graphs or off-cpu 649 | graphs). 650 | 651 | For walking a stack, this helper is an improvement over 652 | **bpf_probe_read**\ (), which can be used with unrolled loops 653 | but is not efficient and consumes a lot of eBPF instructions. 654 | Instead, **bpf_get_stackid**\ () can collect up to 655 | **PERF_MAX_STACK_DEPTH** kernel and user frames. Note that 656 | this limit can be controlled with the **sysctl** program, and 657 | that it should be manually increased in order to profile long 658 | user stacks (such as stacks for Java programs). To do so, use: 659 | 660 | :: 661 | 662 | # sysctl kernel.perf_event_max_stack=<new value> 663 | 664 | Return 665 | The positive or null stack id on success, or a negative error 666 | in case of failure.
667 | 668 | **s64 bpf_csum_diff(__be32 \***\ *from*\ **, u32** *from_size*\ **, __be32 \***\ *to*\ **, u32** *to_size*\ **, __wsum** *seed*\ **)** 669 | Description 670 | Compute a checksum difference, from the raw buffer pointed by 671 | *from*, of length *from_size* (that must be a multiple of 4), 672 | towards the raw buffer pointed by *to*, of size *to_size* 673 | (same remark). An optional *seed* can be added to the value 674 | (this can be cascaded, the seed may come from a previous call 675 | to the helper). 676 | 677 | This is flexible enough to be used in several ways: 678 | 679 | * With *from_size* == 0, *to_size* > 0 and *seed* set to 680 | checksum, it can be used when pushing new data. 681 | * With *from_size* > 0, *to_size* == 0 and *seed* set to 682 | checksum, it can be used when removing data from a packet. 683 | * With *from_size* > 0, *to_size* > 0 and *seed* set to 0, it 684 | can be used to compute a diff. Note that *from_size* and 685 | *to_size* do not need to be equal. 686 | 687 | This helper can be used in combination with 688 | **bpf_l3_csum_replace**\ () and **bpf_l4_csum_replace**\ (), to 689 | which one can feed in the difference computed with 690 | **bpf_csum_diff**\ (). 691 | Return 692 | The checksum result, or a negative error code in case of 693 | failure. 694 | 695 | **int bpf_skb_get_tunnel_opt(struct sk_buff \***\ *skb*\ **, u8 \***\ *opt*\ **, u32** *size*\ **)** 696 | Description 697 | Retrieve tunnel options metadata for the packet associated to 698 | *skb*, and store the raw tunnel option data to the buffer *opt* 699 | of *size*. 700 | 701 | This helper can be used with encapsulation devices that can 702 | operate in "collect metadata" mode (please refer to the related 703 | note in the description of **bpf_skb_get_tunnel_key**\ () for 704 | more details). 
A particular example where this can be used is 705 | in combination with the Geneve encapsulation protocol, where it 706 | allows for pushing (with the **bpf_skb_set_tunnel_opt**\ () helper) 707 | and retrieving arbitrary TLVs (Type-Length-Value headers) from 708 | the eBPF program. This allows for full customization of these 709 | headers. 710 | Return 711 | The size of the option data retrieved. 712 | 713 | **int bpf_skb_set_tunnel_opt(struct sk_buff \***\ *skb*\ **, u8 \***\ *opt*\ **, u32** *size*\ **)** 714 | Description 715 | Set tunnel options metadata for the packet associated to *skb* 716 | to the option data contained in the raw buffer *opt* of *size*. 717 | 718 | See also the description of the **bpf_skb_get_tunnel_opt**\ () 719 | helper for additional information. 720 | Return 721 | 0 on success, or a negative error in case of failure. 722 | 723 | **int bpf_skb_change_proto(struct sk_buff \***\ *skb*\ **, __be16** *proto*\ **, u64** *flags*\ **)** 724 | Description 725 | Change the protocol of the *skb* to *proto*. Currently 726 | supported are transitions from IPv4 to IPv6, and from IPv6 to 727 | IPv4. The helper takes care of the groundwork for the 728 | transition, including resizing the socket buffer. The eBPF 729 | program is expected to fill the new headers, if any, via 730 | **bpf_skb_store_bytes**\ () and to recompute the checksums with 731 | **bpf_l3_csum_replace**\ () and **bpf_l4_csum_replace**\ 732 | (). The main case for this helper is to perform NAT64 733 | operations out of an eBPF program. 734 | 735 | Internally, the GSO type is marked as dodgy so that headers are 736 | checked and segments are recalculated by the GSO/GRO engine. 737 | The size for GSO target is adapted as well. 738 | 739 | All values for *flags* are reserved for future usage, and must 740 | be left at zero. 741 | 742 | A call to this helper may change the underlying 743 | packet buffer.
Therefore, at load time, all checks on pointers 744 | previously done by the verifier are invalidated and must be 745 | performed again, if the helper is used in combination with 746 | direct packet access. 747 | Return 748 | 0 on success, or a negative error in case of failure. 749 | 750 | **int bpf_skb_change_type(struct sk_buff \***\ *skb*\ **, u32** *type*\ **)** 751 | Description 752 | Change the packet type for the packet associated to *skb*. This 753 | comes down to setting *skb*\ **->pkt_type** to *type*, except 754 | the eBPF program does not have write access to *skb*\ 755 | **->pkt_type** other than through this helper. Using a helper here allows 756 | for graceful handling of errors. 757 | 758 | The major use case is to change incoming *skb*s to 759 | **PACKET_HOST** in a programmatic way instead of having to 760 | recirculate via **bpf_redirect**\ (..., **BPF_F_INGRESS**), for 761 | example. 762 | 763 | Note that *type* only allows certain values. At this time, they 764 | are: 765 | 766 | **PACKET_HOST** 767 | Packet is for us. 768 | **PACKET_BROADCAST** 769 | Send packet to all. 770 | **PACKET_MULTICAST** 771 | Send packet to group. 772 | **PACKET_OTHERHOST** 773 | Send packet to someone else. 774 | Return 775 | 0 on success, or a negative error in case of failure. 776 | 777 | **int bpf_skb_under_cgroup(struct sk_buff \***\ *skb*\ **, struct bpf_map \***\ *map*\ **, u32** *index*\ **)** 778 | Description 779 | Check whether *skb* is a descendant of the cgroup2 held by 780 | *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*. 781 | Return 782 | The return value depends on the result of the test, and can be: 783 | 784 | * 0, if the *skb* failed the cgroup2 descendant test. 785 | * 1, if the *skb* succeeded the cgroup2 descendant test. 786 | * A negative error code, if an error occurred. 787 | 788 | **u32 bpf_get_hash_recalc(struct sk_buff \***\ *skb*\ **)** 789 | Description 790 | Retrieve the hash of the packet, *skb*\ **->hash**.
If it is 791 | not set, in particular if the hash was cleared due to mangling, 792 | recompute this hash. Later accesses to the hash can be done 793 | directly with *skb*\ **->hash**. 794 | 795 | Calling **bpf_set_hash_invalid**\ (), changing a packet 796 | protocol with **bpf_skb_change_proto**\ (), or calling 797 | **bpf_skb_store_bytes**\ () with the 798 | **BPF_F_INVALIDATE_HASH** flag are actions susceptible to clear 799 | the hash and to trigger a new computation for the next call to 800 | **bpf_get_hash_recalc**\ (). 801 | Return 802 | The 32-bit hash. 803 | 804 | **u64 bpf_get_current_task(void)** 805 | Return 806 | A pointer to the current task struct. 807 | 808 | **int bpf_probe_write_user(void \***\ *dst*\ **, const void \***\ *src*\ **, u32** *len*\ **)** 809 | Description 810 | Attempt in a safe way to write *len* bytes from the buffer 811 | *src* to *dst* in memory. It only works for threads that are in 812 | user context, and *dst* must be a valid user space address. 813 | 814 | This helper should not be used to implement any kind of 815 | security mechanism because of TOC-TOU attacks, but rather to 816 | debug, divert, and manipulate execution of semi-cooperative 817 | processes. 818 | 819 | Keep in mind that this feature is meant for experiments, and it 820 | has a risk of crashing the system and running programs. 821 | Therefore, when an eBPF program using this helper is attached, 822 | a warning including PID and process name is printed to kernel 823 | logs. 824 | Return 825 | 0 on success, or a negative error in case of failure. 826 | 827 | **int bpf_current_task_under_cgroup(struct bpf_map \***\ *map*\ **, u32** *index*\ **)** 828 | Description 829 | Check whether the probe is being run in the context of a given 830 | subset of the cgroup2 hierarchy. The cgroup2 to test is held by 831 | *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
832 | Return 833 | The return value depends on the result of the test, and can be: 834 | 835 | * 0, if the current task belongs to the cgroup2. 836 | * 1, if the current task does not belong to the cgroup2. 837 | * A negative error code, if an error occurred. 838 | 839 | **int bpf_skb_change_tail(struct sk_buff \***\ *skb*\ **, u32** *len*\ **, u64** *flags*\ **)** 840 | Description 841 | Resize (trim or grow) the packet associated to *skb* to the 842 | new *len*. The *flags* are reserved for future usage, and must 843 | be left at zero. 844 | 845 | The basic idea is that the helper performs the needed work to 846 | change the size of the packet, then the eBPF program rewrites 847 | the rest via helpers like **bpf_skb_store_bytes**\ (), 848 | **bpf_l3_csum_replace**\ (), **bpf_l4_csum_replace**\ () 849 | and others. This helper is a slow path utility intended for 850 | replies with control messages. Because it is targeted for the 851 | slow path, the helper itself can afford to be slow: it 852 | implicitly linearizes, unclones and drops offloads from the 853 | *skb*. 854 | 855 | A call to this helper is susceptible to change the underlying 856 | packet buffer. Therefore, at load time, all checks on pointers 857 | previously done by the verifier are invalidated and must be 858 | performed again, if the helper is used in combination with 859 | direct packet access. 860 | Return 861 | 0 on success, or a negative error in case of failure. 862 | 863 | **int bpf_skb_pull_data(struct sk_buff \***\ *skb*\ **, u32** *len*\ **)** 864 | Description 865 | Pull in non-linear data in case the *skb* is non-linear and not 866 | all of *len* are part of the linear section. Make *len* bytes 867 | from *skb* readable and writable. If a zero value is passed for 868 | *len*, then the whole length of the *skb* is pulled. 869 | 870 | This helper is only needed for reading and writing with direct 871 | packet access.
872 | 873 | For direct packet access, testing that offsets to access 874 | are within packet boundaries (test on *skb*\ **->data_end**) is 875 | susceptible to fail if offsets are invalid, or if the requested 876 | data is in non-linear parts of the *skb*. On failure the 877 | program can just bail out, or in the case of a non-linear 878 | buffer, use a helper to make the data available. The 879 | **bpf_skb_load_bytes**\ () helper is a first solution to access 880 | the data. Another one consists in using **bpf_skb_pull_data**\ () 881 | to pull in the non-linear parts once, then retesting and 882 | eventually accessing the data. 883 | 884 | At the same time, this also makes sure the *skb* is uncloned, 885 | which is a necessary condition for direct write. As this needs 886 | to be an invariant for the write part only, the verifier 887 | detects writes and adds a prologue that is calling 888 | **bpf_skb_pull_data**\ () to effectively unclone the *skb* from 889 | the very beginning in case it is indeed cloned. 890 | 891 | A call to this helper is susceptible to change the underlying 892 | packet buffer. Therefore, at load time, all checks on pointers 893 | previously done by the verifier are invalidated and must be 894 | performed again, if the helper is used in combination with 895 | direct packet access. 896 | Return 897 | 0 on success, or a negative error in case of failure. 898 | 899 | **s64 bpf_csum_update(struct sk_buff \***\ *skb*\ **, __wsum** *csum*\ **)** 900 | Description 901 | Add the checksum *csum* into *skb*\ **->csum** in case the 902 | driver has supplied a checksum for the entire packet into that 903 | field. Return an error otherwise. This helper is intended to be 904 | used in combination with **bpf_csum_diff**\ (), in particular 905 | when the checksum needs to be updated after data has been 906 | written into the packet through direct packet access. 907 | Return 908 | The checksum on success, or a negative error code in case of 909 | failure.
910 | 911 | **void bpf_set_hash_invalid(struct sk_buff \***\ *skb*\ **)** 912 | Description 913 | Invalidate the current *skb*\ **->hash**. It can be used after 914 | mangling on headers through direct packet access, in order to 915 | indicate that the hash is outdated and to trigger a 916 | recalculation the next time the kernel tries to access this 917 | hash or when the **bpf_get_hash_recalc**\ () helper is called. 918 | 919 | 920 | **int bpf_get_numa_node_id(void)** 921 | Description 922 | Return the id of the current NUMA node. The primary use case 923 | for this helper is the selection of sockets for the local NUMA 924 | node, when the program is attached to sockets using the 925 | **SO_ATTACH_REUSEPORT_EBPF** option (see also **socket(7)**), 926 | but the helper is also available to other eBPF program types, 927 | similarly to **bpf_get_smp_processor_id**\ (). 928 | Return 929 | The id of the current NUMA node. 930 | 931 | **int bpf_skb_change_head(struct sk_buff \***\ *skb*\ **, u32** *len*\ **, u64** *flags*\ **)** 932 | Description 933 | Grow the headroom of the packet associated to *skb* and adjust the 934 | offset of the MAC header accordingly, adding *len* bytes of 935 | space. It automatically extends and reallocates memory as 936 | required. 937 | 938 | This helper can be used on a layer 3 *skb* to push a MAC header 939 | for redirection into a layer 2 device. 940 | 941 | All values for *flags* are reserved for future usage, and must 942 | be left at zero. 943 | 944 | A call to this helper is susceptible to change the underlying 945 | packet buffer. Therefore, at load time, all checks on pointers 946 | previously done by the verifier are invalidated and must be 947 | performed again, if the helper is used in combination with 948 | direct packet access. 949 | Return 950 | 0 on success, or a negative error in case of failure.
951 | 952 | **int bpf_xdp_adjust_head(struct xdp_buff \***\ *xdp_md*\ **, int** *delta*\ **)** 953 | Description 954 | Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that 955 | it is possible to use a negative value for *delta*. This helper 956 | can be used to prepare the packet for pushing or popping 957 | headers. 958 | 959 | A call to this helper is susceptible to change the underlying 960 | packet buffer. Therefore, at load time, all checks on pointers 961 | previously done by the verifier are invalidated and must be 962 | performed again, if the helper is used in combination with 963 | direct packet access. 964 | Return 965 | 0 on success, or a negative error in case of failure. 966 | 967 | **int bpf_probe_read_str(void \***\ *dst*\ **, int** *size*\ **, const void \***\ *unsafe_ptr*\ **)** 968 | Description 969 | Copy a NUL-terminated string from an unsafe address 970 | *unsafe_ptr* to *dst*. The *size* should include the 971 | terminating NUL byte. In case the string length is smaller than 972 | *size*, the target is not padded with further NUL bytes. If the 973 | string length is larger than *size*, just *size*-1 bytes are 974 | copied and the last byte is set to NUL. 975 | 976 | On success, the length of the copied string is returned. This 977 | makes this helper useful in tracing programs for reading 978 | strings, and more importantly to get their length at runtime. See 979 | the following snippet: 980 | 981 | :: 982 | 983 | SEC("kprobe/sys_open") 984 | void bpf_sys_open(struct pt_regs *ctx) 985 | { 986 | char buf[PATHLEN]; // PATHLEN is defined to 256 987 | int res = bpf_probe_read_str(buf, sizeof(buf), 988 | ctx->di); 989 | 990 | // Consume buf, for example push it to 991 | // userspace via bpf_perf_event_output(); we 992 | // can use res (the string length) as event 993 | // size, after checking its boundaries.
994 | } 995 | 996 | In comparison, using the **bpf_probe_read**\ () helper here instead 997 | to read the string would require estimating the length at 998 | compile time, and would often result in copying more memory 999 | than necessary. 1000 | 1001 | Another use case is when parsing individual process 1002 | arguments or individual environment variables, navigating 1003 | *current*\ **->mm->arg_start** and *current*\ 1004 | **->mm->env_start**: using this helper and the return value, 1005 | one can quickly iterate at the right offset of the memory area. 1006 | Return 1007 | On success, the strictly positive length of the string, 1008 | including the trailing NUL character. On error, a negative 1009 | value. 1010 | 1011 | **u64 bpf_get_socket_cookie(struct sk_buff \***\ *skb*\ **)** 1012 | Description 1013 | If the **struct sk_buff** pointed by *skb* has a known socket, 1014 | retrieve the cookie (generated by the kernel) of this socket. 1015 | If no cookie has been set yet, generate a new cookie. Once 1016 | generated, the socket cookie remains stable for the life of the 1017 | socket. This helper can be useful for monitoring per socket 1018 | networking traffic statistics as it provides a unique socket 1019 | identifier per namespace. 1020 | Return 1021 | An 8-byte long non-decreasing number on success, or 0 if the 1022 | socket field is missing inside *skb*. 1023 | 1024 | **u32 bpf_get_socket_uid(struct sk_buff \***\ *skb*\ **)** 1025 | Return 1026 | The owner UID of the socket associated to *skb*. If the socket 1027 | is **NULL**, or if it is not a full socket (i.e. if it is a 1028 | time-wait or a request socket instead), **overflowuid** value 1029 | is returned (note that **overflowuid** might also be the actual 1030 | UID value for the socket). 1031 | 1032 | **u32 bpf_set_hash(struct sk_buff \***\ *skb*\ **, u32** *hash*\ **)** 1033 | Description 1034 | Set the full hash for *skb* (set the field *skb*\ **->hash**) 1035 | to value *hash*.
1036 | Return 1037 | 0 1038 | 1039 | **int bpf_setsockopt(struct bpf_sock_ops \***\ *bpf_socket*\ **, int** *level*\ **, int** *optname*\ **, char \***\ *optval*\ **, int** *optlen*\ **)** 1040 | Description 1041 | Emulate a call to **setsockopt()** on the socket associated to 1042 | *bpf_socket*, which must be a full socket. The *level* at 1043 | which the option resides and the name *optname* of the option 1044 | must be specified; see **setsockopt(2)** for more information. 1045 | The option value of length *optlen* is pointed to by *optval*. 1046 | 1047 | This helper actually implements a subset of **setsockopt()**. 1048 | It supports the following *level*\ s: 1049 | 1050 | * **SOL_SOCKET**, which supports the following *optname*\ s: 1051 | **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**, 1052 | **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**. 1053 | * **IPPROTO_TCP**, which supports the following *optname*\ s: 1054 | **TCP_CONGESTION**, **TCP_BPF_IW**, 1055 | **TCP_BPF_SNDCWND_CLAMP**. 1056 | * **IPPROTO_IP**, which supports *optname* **IP_TOS**. 1057 | * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**. 1058 | Return 1059 | 0 on success, or a negative error in case of failure. 1060 | 1061 | **int bpf_skb_adjust_room(struct sk_buff \***\ *skb*\ **, u32** *len_diff*\ **, u32** *mode*\ **, u64** *flags*\ **)** 1062 | Description 1063 | Grow or shrink the room for data in the packet associated to 1064 | *skb* by *len_diff*, and according to the selected *mode*. 1065 | 1066 | There is a single supported mode at this time: 1067 | 1068 | * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer 1069 | (room space is added or removed below the layer 3 header). 1070 | 1071 | All values for *flags* are reserved for future usage, and must 1072 | be left at zero. 1073 | 1074 | A call to this helper is susceptible to change the underlying 1075 | packet buffer.
Therefore, at load time, all checks on pointers 1076 | previously done by the verifier are invalidated and must be 1077 | performed again, if the helper is used in combination with 1078 | direct packet access. 1079 | Return 1080 | 0 on success, or a negative error in case of failure. 1081 | 1082 | **int bpf_redirect_map(struct bpf_map \***\ *map*\ **, u32** *key*\ **, u64** *flags*\ **)** 1083 | Description 1084 | Redirect the packet to the endpoint referenced by *map* at 1085 | index *key*. Depending on its type, this *map* can contain 1086 | references to net devices (for forwarding packets through other 1087 | ports), or to CPUs (for redirecting XDP frames to another CPU; 1088 | but this is only implemented for native XDP (with driver 1089 | support) as of this writing). 1090 | 1091 | All values for *flags* are reserved for future usage, and must 1092 | be left at zero. 1093 | 1094 | When used to redirect packets to net devices, this helper 1095 | provides a significant performance increase over **bpf_redirect**\ (). 1096 | This is due to various implementation details of the underlying 1097 | mechanisms, one of which is the fact that **bpf_redirect_map**\ 1098 | () tries to send packets to the device in bulk. 1099 | Return 1100 | **XDP_REDIRECT** on success, or **XDP_ABORTED** on error. 1101 | 1102 | **int bpf_sk_redirect_map(struct bpf_map \***\ *map*\ **, u32** *key*\ **, u64** *flags*\ **)** 1103 | Description 1104 | Redirect the packet to the socket referenced by *map* (of type 1105 | **BPF_MAP_TYPE_SOCKMAP**) at index *key*. Both ingress and 1106 | egress interfaces can be used for redirection. The 1107 | **BPF_F_INGRESS** value in *flags* is used to make the 1108 | distinction (ingress path is selected if the flag is present, 1109 | egress path otherwise). This is the only flag supported for now. 1110 | Return 1111 | **SK_PASS** on success, or **SK_DROP** on error.
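As a usage illustration for **bpf_redirect_map**\ (), here is a minimal XDP sketch written in the style of the kernel's *samples/bpf/* programs. The map name *tx_port*, the section names and the single-entry layout are illustrative choices, not part of the API; user space is expected to store the egress ifindex at index 0 before attaching the program.

```c
/* Illustrative sketch only: depends on kernel headers and on the
 * bpf_helpers.h wrappers shipped with the kernel sources. */
#include <linux/bpf.h>
#include "bpf_helpers.h"

/* Single-slot devmap; slot 0 holds the egress ifindex, filled from
 * user space with bpf_map_update_elem(). */
struct bpf_map_def SEC("maps") tx_port = {
    .type = BPF_MAP_TYPE_DEVMAP,
    .key_size = sizeof(int),
    .value_size = sizeof(int),
    .max_entries = 1,
};

SEC("xdp_redirect")
int xdp_redirect_prog(struct xdp_md *ctx)
{
    /* flags must be 0 as of this writing; returns XDP_REDIRECT on
     * success, XDP_ABORTED on error. */
    return bpf_redirect_map(&tx_port, 0, 0);
}

char _license[] SEC("license") = "GPL";
```

A user space loader then attaches the resulting object to the ingress device, as done by the redirect samples under *samples/bpf/*.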
1112 | 1113 | **int bpf_sock_map_update(struct bpf_sock_ops \***\ *skops*\ **, struct bpf_map \***\ *map*\ **, void \***\ *key*\ **, u64** *flags*\ **)** 1114 | Description 1115 | Add an entry to, or update a *map* referencing sockets. The 1116 | *skops* is used as a new value for the entry associated to 1117 | *key*. *flags* is one of: 1118 | 1119 | **BPF_NOEXIST** 1120 | The entry for *key* must not exist in the map. 1121 | **BPF_EXIST** 1122 | The entry for *key* must already exist in the map. 1123 | **BPF_ANY** 1124 | No condition on the existence of the entry for *key*. 1125 | 1126 | If the *map* has eBPF programs (parser and verdict), those will 1127 | be inherited by the socket being added. If the socket is 1128 | already attached to eBPF programs, this results in an error. 1129 | Return 1130 | 0 on success, or a negative error in case of failure. 1131 | 1132 | **int bpf_xdp_adjust_meta(struct xdp_buff \***\ *xdp_md*\ **, int** *delta*\ **)** 1133 | Description 1134 | Adjust the address pointed by *xdp_md*\ **->data_meta** by 1135 | *delta* (which can be positive or negative). Note that this 1136 | operation modifies the address stored in *xdp_md*\ **->data**, 1137 | so the latter must be loaded only after the helper has been 1138 | called. 1139 | 1140 | The use of *xdp_md*\ **->data_meta** is optional and programs 1141 | are not required to use it. The rationale is that when the 1142 | packet is processed with XDP (e.g. as DoS filter), it is 1143 | possible to push further meta data along with it before passing 1144 | to the stack, and to give the guarantee that an ingress eBPF 1145 | program attached as a TC classifier on the same device can pick 1146 | this up for further post-processing. Since TC works with socket 1147 | buffers, it remains possible to set from XDP the **mark** or 1148 | **priority** pointers, or other pointers for the socket buffer. 
1149 | Having this scratch space generic and programmable allows for 1150 | more flexibility as the user is free to store whatever meta 1151 | data they need. 1152 | 1153 | A call to this helper is susceptible to change the underlying 1154 | packet buffer. Therefore, at load time, all checks on pointers 1155 | previously done by the verifier are invalidated and must be 1156 | performed again, if the helper is used in combination with 1157 | direct packet access. 1158 | Return 1159 | 0 on success, or a negative error in case of failure. 1160 | 1161 | **int bpf_perf_event_read_value(struct bpf_map \***\ *map*\ **, u64** *flags*\ **, struct bpf_perf_event_value \***\ *buf*\ **, u32** *buf_size*\ **)** 1162 | Description 1163 | Read the value of a perf event counter, and store it into *buf* 1164 | of size *buf_size*. This helper relies on a *map* of type 1165 | **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf event 1166 | counter is selected when *map* is updated with perf event file 1167 | descriptors. The *map* is an array whose size is the number of 1168 | available CPUs, and each cell contains a value relative to one 1169 | CPU. The value to retrieve is indicated by *flags*, which 1170 | contains the index of the CPU to look up, masked with 1171 | **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to 1172 | **BPF_F_CURRENT_CPU** to indicate that the value for the 1173 | current CPU should be retrieved. 1174 | 1175 | This helper behaves in a way close to the 1176 | **bpf_perf_event_read**\ () helper, save that instead of 1177 | just returning the value observed, it fills the *buf* 1178 | structure. This allows for additional data to be retrieved: in 1179 | particular, the enabled and running times (in *buf*\ 1180 | **->enabled** and *buf*\ **->running**, respectively) are 1181 | copied. In general, **bpf_perf_event_read_value**\ () is 1182 | recommended over **bpf_perf_event_read**\ (), which has some 1183 | ABI issues and provides less functionality.
1184 | 1185 | These values are interesting, because hardware PMU (Performance 1186 | Monitoring Unit) counters are limited resources. When there are 1187 | more PMU-based perf events opened than available counters, the 1188 | kernel will multiplex these events so that each event gets a certain 1189 | percentage (but not all) of the PMU time. When 1190 | multiplexing happens, the number of samples or the counter value 1191 | will not reflect what it would have been with no multiplexing, 1192 | which makes comparison between different runs difficult. 1193 | Typically, the counter value should be normalized before 1194 | comparing it to other experiments. The usual normalization is done 1195 | as follows. 1196 | 1197 | :: 1198 | 1199 | normalized_counter = counter * t_enabled / t_running 1200 | 1201 | where *t_enabled* is the time enabled for the event and *t_running* is 1202 | the time running for the event since the last normalization. The 1203 | enabled and running times are accumulated since the perf event 1204 | open. To achieve the scaling factor between two invocations of an 1205 | eBPF program, users can use the CPU id as the key (which is 1206 | typical for the perf array usage model) to remember the previous 1207 | value and do the calculation inside the eBPF program. 1208 | Return 1209 | 0 on success, or a negative error in case of failure. 1210 | 1211 | **int bpf_perf_prog_read_value(struct bpf_perf_event_data \***\ *ctx*\ **, struct bpf_perf_event_value \***\ *buf*\ **, u32** *buf_size*\ **)** 1212 | Description 1213 | For an eBPF program attached to a perf event, retrieve the 1214 | value of the event counter associated to *ctx* and store it in 1215 | the structure pointed by *buf* and of size *buf_size*. Enabled 1216 | and running times are also stored in the structure (see 1217 | description of helper **bpf_perf_event_read_value**\ () for 1218 | more details). 1219 | Return 1220 | 0 on success, or a negative error in case of failure.
1221 | 1222 | **int bpf_getsockopt(struct bpf_sock_ops \***\ *bpf_socket*\ **, int** *level*\ **, int** *optname*\ **, char \***\ *optval*\ **, int** *optlen*\ **)** 1223 | Description 1224 | Emulate a call to **getsockopt()** on the socket associated to 1225 | *bpf_socket*, which must be a full socket. The *level* at 1226 | which the option resides and the name *optname* of the option 1227 | must be specified; see **getsockopt(2)** for more information. 1228 | The retrieved value is stored in the structure pointed by 1229 | *optval* and of length *optlen*. 1230 | 1231 | This helper actually implements a subset of **getsockopt()**. 1232 | It supports the following *level*\ s: 1233 | 1234 | * **IPPROTO_TCP**, which supports *optname* 1235 | **TCP_CONGESTION**. 1236 | * **IPPROTO_IP**, which supports *optname* **IP_TOS**. 1237 | * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**. 1238 | Return 1239 | 0 on success, or a negative error in case of failure. 1240 | 1241 | **int bpf_override_return(struct pt_regs \***\ *regs*\ **, u64** *rc*\ **)** 1242 | Description 1243 | Used for error injection, this helper uses kprobes to override 1244 | the return value of the probed function, and to set it to *rc*. 1245 | The first argument is the context *regs* on which the kprobe 1246 | works. 1247 | 1248 | This helper works by setting the PC (program counter) 1249 | to an override function which is run in place of the original 1250 | probed function. This means the probed function is not run at 1251 | all. The replacement function just returns with the required 1252 | value. 1253 | 1254 | This helper has security implications, and thus is subject to 1255 | restrictions. It is only available if the kernel was compiled 1256 | with the **CONFIG_BPF_KPROBE_OVERRIDE** configuration 1257 | option, and in this case it only works on functions tagged with 1258 | **ALLOW_ERROR_INJECTION** in the kernel code.
1259 | 1260 | Also, the helper is only available for the architectures having 1261 | the **CONFIG_FUNCTION_ERROR_INJECTION** option. As of this writing, 1262 | the x86 architecture is the only one to support this feature. 1263 | Return 1264 | 0 1265 | 1266 | **int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops \***\ *bpf_sock*\ **, int** *argval*\ **)** 1267 | Description 1268 | Attempt to set the value of the **bpf_sock_ops_cb_flags** field 1269 | for the full TCP socket associated to *bpf_sock* to 1270 | *argval*. 1271 | 1272 | The primary use of this field is to determine if there should 1273 | be calls to eBPF programs of type 1274 | **BPF_PROG_TYPE_SOCK_OPS** at various points in the TCP 1275 | code. A program of the same type can change its value, per 1276 | connection and as necessary, when the connection is 1277 | established. This field is directly accessible for reading, but 1278 | this helper must be used for updates in order to return an 1279 | error if an eBPF program tries to set a callback that is not 1280 | supported in the current kernel. 1281 | 1282 | The supported callback values that *argval* can combine are: 1283 | 1284 | * **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out) 1285 | * **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission) 1286 | * **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change) 1287 | 1288 | Here are some examples of where one could call such an eBPF 1289 | program: 1290 | 1291 | * When RTO fires. 1292 | * When a packet is retransmitted. 1293 | * When the connection terminates. 1294 | * When a packet is sent. 1295 | * When a packet is received. 1296 | Return 1297 | Code **-EINVAL** if the socket is not a full TCP socket; 1298 | otherwise, a positive number containing the bits that could not 1299 | be set is returned (which comes down to 0 if all bits were set 1300 | as required).
1301 | 1302 | **int bpf_msg_redirect_map(struct sk_msg_buff \***\ *msg*\ **, struct bpf_map \***\ *map*\ **, u32** *key*\ **, u64** *flags*\ **)** 1303 | Description 1304 | This helper is used in programs implementing policies at the 1305 | socket level. If the message *msg* is allowed to pass (i.e. if 1306 | the verdict eBPF program returns **SK_PASS**), redirect it to 1307 | the socket referenced by *map* (of type 1308 | **BPF_MAP_TYPE_SOCKMAP**) at index *key*. Both ingress and 1309 | egress interfaces can be used for redirection. The 1310 | **BPF_F_INGRESS** value in *flags* is used to make the 1311 | distinction (ingress path is selected if the flag is present, 1312 | egress path otherwise). This is the only flag supported for now. 1313 | Return 1314 | **SK_PASS** on success, or **SK_DROP** on error. 1315 | 1316 | **int bpf_msg_apply_bytes(struct sk_msg_buff \***\ *msg*\ **, u32** *bytes*\ **)** 1317 | Description 1318 | For socket policies, apply the verdict of the eBPF program to 1319 | the next *bytes* (number of bytes) of message *msg*. 1320 | 1321 | For example, this helper can be used in the following cases: 1322 | 1323 | * A single **sendmsg**\ () or **sendfile**\ () system call 1324 | contains multiple logical messages that the eBPF program is 1325 | supposed to read and for which it should apply a verdict. 1326 | * An eBPF program only cares to read the first *bytes* of a 1327 | *msg*. If the message has a large payload, then setting up 1328 | and calling the eBPF program repeatedly for all bytes, even 1329 | though the verdict is already known, would create unnecessary 1330 | overhead. 1331 | 1332 | When called from within an eBPF program, the helper sets a 1333 | counter internal to the BPF infrastructure, that is used to 1334 | apply the last verdict to the next *bytes*. 
If *bytes* is 1334 | smaller than the current data being processed from a 1335 | **sendmsg**\ () or **sendfile**\ () system call, the first 1336 | *bytes* will be sent and the eBPF program will be re-run with 1337 | the pointer for start of data pointing to byte number *bytes* 1338 | **+ 1**. If *bytes* is larger than the current data being 1339 | processed, then the eBPF verdict will be applied to multiple 1340 | **sendmsg**\ () or **sendfile**\ () calls until *bytes* are 1341 | consumed. 1342 | 1343 | Note that if a socket closes with the internal counter holding 1344 | a non-zero value, this is not a problem because data is not 1345 | being buffered for *bytes* and is sent as it is received. 1346 | Return 1347 | 0 1348 | 1349 | **int bpf_msg_cork_bytes(struct sk_msg_buff \***\ *msg*\ **, u32** *bytes*\ **)** 1350 | Description 1351 | For socket policies, prevent the execution of the verdict eBPF 1352 | program for message *msg* until *bytes* (number of bytes) have been 1353 | accumulated. 1354 | 1355 | This can be used when one needs a specific number of bytes 1356 | before a verdict can be assigned, even if the data spans 1357 | multiple **sendmsg**\ () or **sendfile**\ () calls. The extreme 1358 | case would be a user calling **sendmsg**\ () repeatedly with 1359 | 1-byte long message segments. Obviously, this is bad for 1360 | performance, but it is still valid. If the eBPF program needs 1361 | *bytes* bytes to validate a header, this helper can be used to 1362 | prevent the eBPF program from being called again until *bytes* have 1363 | been accumulated. 1364 | Return 1365 | 0 1366 | 1367 | **int bpf_msg_pull_data(struct sk_msg_buff \***\ *msg*\ **, u32** *start*\ **, u32** *end*\ **, u64** *flags*\ **)** 1368 | Description 1369 | For socket policies, pull in non-linear data from user space 1370 | for *msg* and set pointers *msg*\ **->data** and *msg*\ 1371 | **->data_end** to *start* and *end* byte offsets into *msg*, 1372 | respectively.
1374 | 1375 | If a program of type **BPF_PROG_TYPE_SK_MSG** is run on a 1376 | *msg*, it can only parse data that the (**data**, **data_end**) 1377 | pointers have already consumed. For **sendmsg**\ () hooks this 1378 | is likely the first scatterlist element. But for calls relying 1379 | on the **sendpage** handler (e.g. **sendfile**\ ()) this will 1380 | be the range (**0**, **0**) because the data is shared with 1381 | user space and by default the objective is to avoid allowing 1382 | user space to modify data while (or after) the eBPF verdict is 1383 | being decided. This helper can be used to pull in data and to 1384 | set the start and end pointer to given values. Data will be 1385 | copied if necessary (i.e. if data was not linear and if start 1386 | and end pointers do not point to the same chunk). 1387 | 1388 | A call to this helper is susceptible to change the underlying 1389 | packet buffer. Therefore, at load time, all checks on pointers 1390 | previously done by the verifier are invalidated and must be 1391 | performed again, if the helper is used in combination with 1392 | direct packet access. 1393 | 1394 | All values for *flags* are reserved for future usage, and must 1395 | be left at zero. 1396 | Return 1397 | 0 on success, or a negative error in case of failure. 1398 | 1399 | **int bpf_bind(struct bpf_sock_addr \***\ *ctx*\ **, struct sockaddr \***\ *addr*\ **, int** *addr_len*\ **)** 1400 | Description 1401 | Bind the socket associated to *ctx* to the address pointed by 1402 | *addr*, of length *addr_len*. This allows for making outgoing 1403 | connections from the desired IP address, which can be useful for 1404 | example when all processes inside a cgroup should use a 1405 | single IP address on a host that has multiple IP addresses configured. 1406 | 1407 | This helper works for IPv4 and IPv6, TCP and UDP sockets. The 1408 | domain (*addr*\ **->sa_family**) must be **AF_INET** (or 1409 | **AF_INET6**).
Looking for a free port to bind to can be 1410 | expensive, therefore binding to a port is not permitted by the 1411 | helper: *addr*\ **->sin_port** (or **sin6_port**, respectively) 1412 | must be set to zero. 1413 | Return 1414 | 0 on success, or a negative error in case of failure. 1415 | 1416 | **int bpf_xdp_adjust_tail(struct xdp_buff \***\ *xdp_md*\ **, int** *delta*\ **)** 1417 | Description 1418 | Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is 1419 | only possible to shrink the packet as of this writing, 1420 | therefore *delta* must be a negative integer. 1421 | 1422 | A call to this helper is susceptible to change the underlying 1423 | packet buffer. Therefore, at load time, all checks on pointers 1424 | previously done by the verifier are invalidated and must be 1425 | performed again, if the helper is used in combination with 1426 | direct packet access. 1427 | Return 1428 | 0 on success, or a negative error in case of failure. 1429 | 1430 | **int bpf_skb_get_xfrm_state(struct sk_buff \***\ *skb*\ **, u32** *index*\ **, struct bpf_xfrm_state \***\ *xfrm_state*\ **, u32** *size*\ **, u64** *flags*\ **)** 1431 | Description 1432 | Retrieve the XFRM state (IP transform framework, see also 1433 | **ip-xfrm(8)**) at *index* in XFRM "security path" for *skb*. 1434 | 1435 | The retrieved value is stored in the **struct bpf_xfrm_state** 1436 | pointed by *xfrm_state* and of length *size*. 1437 | 1438 | All values for *flags* are reserved for future usage, and must 1439 | be left at zero. 1440 | 1441 | This helper is available only if the kernel was compiled with 1442 | the **CONFIG_XFRM** configuration option. 1443 | Return 1444 | 0 on success, or a negative error in case of failure. 1445 | 1446 | **int bpf_get_stack(struct pt_regs \***\ *regs*\ **, void \***\ *buf*\ **, u32** *size*\ **, u64** *flags*\ **)** 1447 | Description 1448 | Return a user or a kernel stack in a buffer provided by the bpf program.
1449 | To achieve this, the helper needs *regs*, which is a pointer 1450 | to the context on which the tracing program is executed. 1451 | To store the stack trace, the BPF program provides *buf*, with 1452 | a non-negative *size*. 1453 | 1454 | The last argument, *flags*, holds the number of stack frames to 1455 | skip (from 0 to 255), masked with 1456 | **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set 1457 | the following flags: 1458 | 1459 | **BPF_F_USER_STACK** 1460 | Collect a user space stack instead of a kernel stack. 1461 | **BPF_F_USER_BUILD_ID** 1462 | Collect buildid+offset instead of ips for user stack, 1463 | only valid if **BPF_F_USER_STACK** is also specified. 1464 | 1465 | **bpf_get_stack**\ () can collect up to 1466 | **PERF_MAX_STACK_DEPTH** kernel and user frames, subject 1467 | to a sufficiently large buffer size. Note that 1468 | this limit can be controlled with the **sysctl** program, and 1469 | that it should be manually increased in order to profile long 1470 | user stacks (such as stacks for Java programs). To do so, use: 1471 | 1472 | :: 1473 | 1474 | # sysctl kernel.perf_event_max_stack= 1475 | 1476 | Return 1477 | a non-negative value equal to or less than *size* on success, or 1478 | a negative error in case of failure. 1479 | 1480 | 1481 | EXAMPLES 1482 | ======== 1483 | 1484 | Example usage for most of the eBPF helpers listed in this manual page is 1485 | available within the Linux kernel sources, at the following locations: 1486 | 1487 | * *samples/bpf/* 1488 | * *tools/testing/selftests/bpf/* 1489 | 1490 | LICENSE 1491 | ======= 1492 | 1493 | eBPF programs can have an associated license, passed along with the bytecode 1494 | instructions to the kernel when the programs are loaded. The format for that 1495 | string is identical to the one in use for kernel modules (dual licenses, such 1496 | as "Dual BSD/GPL", may be used).
Some helper functions are only accessible to 1497 | programs that are compatible with the GNU General Public License (GPL). 1498 | 1499 | In order to use such helpers, the eBPF program must be loaded with the correct 1500 | license string passed (via **attr**) to the **bpf**\ () system call, and this 1501 | generally translates into the C source code of the program containing a line 1502 | similar to the following: 1503 | 1504 | :: 1505 | 1506 | char ____license[] __attribute__((section("license"), used)) = "GPL"; 1507 | 1508 | IMPLEMENTATION 1509 | ============== 1510 | 1511 | This manual page is an effort to document the existing eBPF helper functions. 1512 | But as of this writing, the BPF sub-system is under heavy development. New eBPF 1513 | program or map types are added, along with new helper functions. Some helpers 1514 | are occasionally made available for additional program types. So in spite of 1515 | the efforts of the community, this page might not be up-to-date. If you want to 1516 | check by yourself what helper functions exist in your kernel, or what types of 1517 | programs they can support, here are some files in the kernel tree that you 1518 | may be interested in: 1519 | 1520 | * *include/uapi/linux/bpf.h* is the main BPF header. It contains the full list 1521 | of all helper functions, as well as many other BPF definitions, including most 1522 | of the flags, structs, and constants used by the helpers. 1523 | * *net/core/filter.c* contains the definitions of most network-related helper 1524 | functions, and the list of program types from which they can be used. 1525 | * *kernel/trace/bpf_trace.c* is the equivalent for most helpers related to 1526 | tracing programs. 1527 | * *kernel/bpf/verifier.c* contains the functions used to check that valid types 1528 | of eBPF maps are used with a given helper function. 1529 | * *kernel/bpf/* is a directory containing other files in which additional helpers are 1530 | defined (for cgroups, sockmaps, etc.).
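The helper list in *include/uapi/linux/bpf.h* lends itself to mechanical extraction: helpers are enumerated there as **FN(**\ *name*\ **)** entries of the **__BPF_FUNC_MAPPER** macro, each corresponding to a **bpf_**\ *name*\ () function described in this page. The sketch below (Python, not part of this manual page; it runs on an inline excerpt in the style of the real header rather than on the header file itself) shows the idea:

```python
import re

# Excerpt written in the style of the __BPF_FUNC_MAPPER macro from
# include/uapi/linux/bpf.h (the real header has many more entries).
BPF_H_EXCERPT = """
#define __BPF_FUNC_MAPPER(FN)   \\
    FN(unspec),                 \\
    FN(map_lookup_elem),        \\
    FN(map_update_elem),        \\
    FN(map_delete_elem),        \\
    FN(probe_read),
"""

def helper_names(header_text):
    """Collect helper names from FN(...) entries; each FN(name) entry
    corresponds to a bpf_<name>() helper function."""
    return ["bpf_" + m for m in re.findall(r"FN\((\w+)\)", header_text)]

print(helper_names(BPF_H_EXCERPT))
```

On a real system, feeding the contents of your kernel's *include/uapi/linux/bpf.h* to this function lists every helper your headers know about.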
1531 | 1532 | Compatibility between helper functions and program types can generally be found 1533 | in the files where the helper functions are defined. Look for the **struct 1534 | bpf_func_proto** objects and for the functions returning them: these functions 1535 | contain the list of helpers that a given program type can call. Note that the 1536 | **default:** label of the **switch ... case** used to filter helpers can call 1537 | other functions, themselves allowing access to additional helpers. The 1538 | GPL license requirement is also encoded in those **struct bpf_func_proto** objects. 1539 | 1540 | Compatibility between helper functions and map types can be found in the 1541 | **check_map_func_compatibility**\ () function in file *kernel/bpf/verifier.c*. 1542 | 1543 | Helper functions that invalidate the checks on the **data** and **data_end** 1544 | pointers for network processing are listed in the function 1545 | **bpf_helper_changes_pkt_data**\ () in file *net/core/filter.c*. 1546 | 1547 | SEE ALSO 1548 | ======== 1549 | 1550 | **bpf**\ (2), 1551 | **cgroups**\ (7), 1552 | **ip**\ (8), 1553 | **perf_event_open**\ (2), 1554 | **sendmsg**\ (2), 1555 | **socket**\ (7), 1556 | **tc-bpf**\ (8) 1557 | -------------------------------------------------------------------------------- /bpf_llvm_2015aug19.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpf_llvm_2015aug19.pdf -------------------------------------------------------------------------------- /bpf_netdev_conference_2016Feb12.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpf_netdev_conference_2016Feb12.pdf -------------------------------------------------------------------------------- /bpf_netdev_conference_2016Feb12_report.pdf:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpf_netdev_conference_2016Feb12_report.pdf -------------------------------------------------------------------------------- /bpf_netdev_conference_2016Oct07.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpf_netdev_conference_2016Oct07.pdf -------------------------------------------------------------------------------- /bpf_netdev_conference_2016Oct07_tcws.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpf_netdev_conference_2016Oct07_tcws.pdf -------------------------------------------------------------------------------- /bpf_netvirt_2015aug21.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpf_netvirt_2015aug21.pdf -------------------------------------------------------------------------------- /bpf_network_examples_2015aug20.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpf_network_examples_2015aug20.pdf -------------------------------------------------------------------------------- /bpftrace_public_template_jun2019.odp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpftrace_public_template_jun2019.odp -------------------------------------------------------------------------------- /bpftrace_public_template_jun2019.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/bpftrace_public_template_jun2019.pdf -------------------------------------------------------------------------------- /eBPF.md: -------------------------------------------------------------------------------- 1 | # Unofficial eBPF spec 2 | 3 | The [official documentation for the eBPF instruction set][1] is in the 4 | Linux repository. However, while it is concise, it isn't always easy to 5 | use as a reference. This document lists each valid eBPF opcode. 6 | 7 | [1]: https://www.kernel.org/doc/Documentation/networking/filter.txt 8 | 9 | ## Instruction encoding 10 | 11 | An eBPF program is a sequence of 64-bit instructions. This document assumes each 12 | instruction is encoded in host byte order, but the byte order is not relevant 13 | to this spec. 14 | 15 | All eBPF instructions have the same basic encoding: 16 | 17 | msb lsb 18 | +------------------------+----------------+----+----+--------+ 19 | |immediate |offset |src |dst |opcode | 20 | +------------------------+----------------+----+----+--------+ 21 | 22 | From least significant to most significant bit: 23 | 24 | - 8 bit opcode 25 | - 4 bit destination register (dst) 26 | - 4 bit source register (src) 27 | - 16 bit offset 28 | - 32 bit immediate (imm) 29 | 30 | Most instructions do not use all of these fields. Unused fields should be 31 | zeroed. 32 | 33 | The low 3 bits of the opcode field are the "instruction class". 34 | This groups together related opcodes. 35 | 36 | LD/LDX/ST/STX opcode structure: 37 | 38 | msb lsb 39 | +---+--+---+ 40 | |mde|sz|cls| 41 | +---+--+---+ 42 | 43 | The `sz` field specifies the size of the memory location. The `mde` field is 44 | the memory access mode. uBPF only supports the generic "MEM" access mode.
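The field layout above can be exercised with a short decoder. The following sketch (not part of the original spec; plain Python, assuming the host-byte-order encoding described above) splits a 64-bit instruction word into its five fields plus the instruction class:

```python
def decode(insn):
    """Split a 64-bit eBPF instruction word into its five fields.

    Layout, from least to most significant bit:
    opcode:8, dst:4, src:4, offset:16 (signed), imm:32 (signed).
    """
    opcode = insn & 0xFF
    dst = (insn >> 8) & 0xF
    src = (insn >> 12) & 0xF
    offset = (insn >> 16) & 0xFFFF
    if offset & 0x8000:              # sign-extend the 16-bit offset
        offset -= 0x10000
    imm = (insn >> 32) & 0xFFFFFFFF
    if imm & 0x80000000:             # sign-extend the 32-bit immediate
        imm -= 0x100000000
    cls = opcode & 0x07              # low 3 bits: instruction class
    return {"opcode": opcode, "dst": dst, "src": src,
            "offset": offset, "imm": imm, "class": cls}

# Example: opcode 0xb7 is "mov dst, imm" (see the ALU tables below);
# this word encodes "mov r1, 42".
fields = decode(0x0000002A000001B7)
assert fields["opcode"] == 0xB7 and fields["dst"] == 1 and fields["imm"] == 42
```

Going the other direction (packing fields into a word) is the same shifts in reverse, which is essentially what an eBPF assembler does.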
45 | 46 | ALU/ALU64/JMP opcode structure: 47 | 48 | msb lsb 49 | +----+-+---+ 50 | |op |s|cls| 51 | +----+-+---+ 52 | 53 | If the `s` bit is zero, then the source operand is `imm`. If `s` is one, then 54 | the source operand is `src`. The `op` field specifies which ALU or branch 55 | operation is to be performed. 56 | 57 | ## ALU Instructions 58 | 59 | ### 64-bit 60 | 61 | Opcode | Mnemonic | Pseudocode 62 | -------|---------------|----------------------- 63 | 0x07 | add dst, imm | dst += imm 64 | 0x0f | add dst, src | dst += src 65 | 0x17 | sub dst, imm | dst -= imm 66 | 0x1f | sub dst, src | dst -= src 67 | 0x27 | mul dst, imm | dst *= imm 68 | 0x2f | mul dst, src | dst *= src 69 | 0x37 | div dst, imm | dst /= imm 70 | 0x3f | div dst, src | dst /= src 71 | 0x47 | or dst, imm | dst \|= imm 72 | 0x4f | or dst, src | dst \|= src 73 | 0x57 | and dst, imm | dst &= imm 74 | 0x5f | and dst, src | dst &= src 75 | 0x67 | lsh dst, imm | dst <<= imm 76 | 0x6f | lsh dst, src | dst <<= src 77 | 0x77 | rsh dst, imm | dst >>= imm (logical) 78 | 0x7f | rsh dst, src | dst >>= src (logical) 79 | 0x87 | neg dst | dst = -dst 80 | 0x97 | mod dst, imm | dst %= imm 81 | 0x9f | mod dst, src | dst %= src 82 | 0xa7 | xor dst, imm | dst ^= imm 83 | 0xaf | xor dst, src | dst ^= src 84 | 0xb7 | mov dst, imm | dst = imm 85 | 0xbf | mov dst, src | dst = src 86 | 0xc7 | arsh dst, imm | dst >>= imm (arithmetic) 87 | 0xcf | arsh dst, src | dst >>= src (arithmetic) 88 | 89 | ### 32-bit 90 | 91 | These instructions use only the lower 32 bits of their operands and zero the 92 | upper 32 bits of the destination register. 
93 | 94 | Opcode | Mnemonic | Pseudocode 95 | -------|-----------------|------------------------------ 96 | 0x04 | add32 dst, imm | dst += imm 97 | 0x0c | add32 dst, src | dst += src 98 | 0x14 | sub32 dst, imm | dst -= imm 99 | 0x1c | sub32 dst, src | dst -= src 100 | 0x24 | mul32 dst, imm | dst *= imm 101 | 0x2c | mul32 dst, src | dst *= src 102 | 0x34 | div32 dst, imm | dst /= imm 103 | 0x3c | div32 dst, src | dst /= src 104 | 0x44 | or32 dst, imm | dst \|= imm 105 | 0x4c | or32 dst, src | dst \|= src 106 | 0x54 | and32 dst, imm | dst &= imm 107 | 0x5c | and32 dst, src | dst &= src 108 | 0x64 | lsh32 dst, imm | dst <<= imm 109 | 0x6c | lsh32 dst, src | dst <<= src 110 | 0x74 | rsh32 dst, imm | dst >>= imm (logical) 111 | 0x7c | rsh32 dst, src | dst >>= src (logical) 112 | 0x84 | neg32 dst | dst = -dst 113 | 0x94 | mod32 dst, imm | dst %= imm 114 | 0x9c | mod32 dst, src | dst %= src 115 | 0xa4 | xor32 dst, imm | dst ^= imm 116 | 0xac | xor32 dst, src | dst ^= src 117 | 0xb4 | mov32 dst, imm | dst = imm 118 | 0xbc | mov32 dst, src | dst = src 119 | 0xc4 | arsh32 dst, imm | dst >>= imm (arithmetic) 120 | 0xcc | arsh32 dst, src | dst >>= src (arithmetic) 121 | 122 | ### Byteswap instructions 123 | 124 | Opcode | Mnemonic | Pseudocode 125 | -----------------|----------|------------------- 126 | 0xd4 (imm == 16) | le16 dst | dst = htole16(dst) 127 | 0xd4 (imm == 32) | le32 dst | dst = htole32(dst) 128 | 0xd4 (imm == 64) | le64 dst | dst = htole64(dst) 129 | 0xdc (imm == 16) | be16 dst | dst = htobe16(dst) 130 | 0xdc (imm == 32) | be32 dst | dst = htobe32(dst) 131 | 0xdc (imm == 64) | be64 dst | dst = htobe64(dst) 132 | 133 | ## Memory Instructions 134 | 135 | Opcode | Mnemonic | Pseudocode 136 | -------|-----------------------|-------------------------------- 137 | 0x18 | lddw dst, imm | dst = imm (wide instruction: occupies two 64-bit slots; the second slot carries the upper 32 bits of the 64-bit immediate) 138 | 0x20 | ldabsw src, dst, imm | See kernel documentation 139 | 0x28 | ldabsh src, dst, imm | ... 140 | 0x30 | ldabsb src, dst, imm | ...
141 | 0x38 | ldabsdw src, dst, imm | ... 142 | 0x40 | ldindw src, dst, imm | ... 143 | 0x48 | ldindh src, dst, imm | ... 144 | 0x50 | ldindb src, dst, imm | ... 145 | 0x58 | ldinddw src, dst, imm | ... 146 | 0x61 | ldxw dst, [src+off] | dst = *(uint32_t *) (src + off) 147 | 0x69 | ldxh dst, [src+off] | dst = *(uint16_t *) (src + off) 148 | 0x71 | ldxb dst, [src+off] | dst = *(uint8_t *) (src + off) 149 | 0x79 | ldxdw dst, [src+off] | dst = *(uint64_t *) (src + off) 150 | 0x62 | stw [dst+off], imm | *(uint32_t *) (dst + off) = imm 151 | 0x6a | sth [dst+off], imm | *(uint16_t *) (dst + off) = imm 152 | 0x72 | stb [dst+off], imm | *(uint8_t *) (dst + off) = imm 153 | 0x7a | stdw [dst+off], imm | *(uint64_t *) (dst + off) = imm 154 | 0x63 | stxw [dst+off], src | *(uint32_t *) (dst + off) = src 155 | 0x6b | stxh [dst+off], src | *(uint16_t *) (dst + off) = src 156 | 0x73 | stxb [dst+off], src | *(uint8_t *) (dst + off) = src 157 | 0x7b | stxdw [dst+off], src | *(uint64_t *) (dst + off) = src 158 | 159 | ## Branch Instructions 160 | 161 | Opcode | Mnemonic | Pseudocode 162 | -------|---------------------|------------------------ 163 | 0x05 | ja +off | PC += off 164 | 0x15 | jeq dst, imm, +off | PC += off if dst == imm 165 | 0x1d | jeq dst, src, +off | PC += off if dst == src 166 | 0x25 | jgt dst, imm, +off | PC += off if dst > imm 167 | 0x2d | jgt dst, src, +off | PC += off if dst > src 168 | 0x35 | jge dst, imm, +off | PC += off if dst >= imm 169 | 0x3d | jge dst, src, +off | PC += off if dst >= src 170 | 0xa5 | jlt dst, imm, +off | PC += off if dst < imm 171 | 0xad | jlt dst, src, +off | PC += off if dst < src 172 | 0xb5 | jle dst, imm, +off | PC += off if dst <= imm 173 | 0xbd | jle dst, src, +off | PC += off if dst <= src 174 | 0x45 | jset dst, imm, +off | PC += off if dst & imm 175 | 0x4d | jset dst, src, +off | PC += off if dst & src 176 | 0x55 | jne dst, imm, +off | PC += off if dst != imm 177 | 0x5d | jne dst, src, +off | PC += off if dst != src 178 | 0x65 | jsgt 
dst, imm, +off | PC += off if dst > imm (signed) 179 | 0x6d | jsgt dst, src, +off | PC += off if dst > src (signed) 180 | 0x75 | jsge dst, imm, +off | PC += off if dst >= imm (signed) 181 | 0x7d | jsge dst, src, +off | PC += off if dst >= src (signed) 182 | 0xc5 | jslt dst, imm, +off | PC += off if dst < imm (signed) 183 | 0xcd | jslt dst, src, +off | PC += off if dst < src (signed) 184 | 0xd5 | jsle dst, imm, +off | PC += off if dst <= imm (signed) 185 | 0xdd | jsle dst, src, +off | PC += off if dst <= src (signed) 186 | 0x85 | call imm | Function call 187 | 0x95 | exit | return r0 188 | -------------------------------------------------------------------------------- /ebpf_excerpt_20Aug2015.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/ebpf_excerpt_20Aug2015.pdf -------------------------------------------------------------------------------- /ebpf_http_filter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/ebpf_http_filter.pdf -------------------------------------------------------------------------------- /meetups/2015-09-21/iovisor-bcc-intro.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/meetups/2015-09-21/iovisor-bcc-intro.pdf -------------------------------------------------------------------------------- /netconf_2016feb.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/netconf_2016feb.pdf -------------------------------------------------------------------------------- /openstack/2015-10-29/iovisor-mesos-demo.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/openstack/2015-10-29/iovisor-mesos-demo.pdf -------------------------------------------------------------------------------- /openstack/2016-04-25/OpenStackSummitAustin2016_iovisor_v1.0.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/openstack/2016-04-25/OpenStackSummitAustin2016_iovisor_v1.0.pdf -------------------------------------------------------------------------------- /p4/2015-11-18/iovisor-p4-workshop-nov-2015.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/p4/2015-11-18/iovisor-p4-workshop-nov-2015.pdf -------------------------------------------------------------------------------- /p4/p4toEbpf-bcc.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/p4/p4toEbpf-bcc.pdf -------------------------------------------------------------------------------- /p4AbstractSwitch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/p4AbstractSwitch.pdf -------------------------------------------------------------------------------- /tsc-meeting-minutes/2015-09-02/eBPF_to_IOV_Module.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/tsc-meeting-minutes/2015-09-02/eBPF_to_IOV_Module.pptx -------------------------------------------------------------------------------- 
/tsc-meeting-minutes/2015-09-16/iomodules-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/tsc-meeting-minutes/2015-09-16/iomodules-slides.pdf -------------------------------------------------------------------------------- /tsc-meeting-minutes/2015-09-16/iovisor-odl-gbp-module.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/tsc-meeting-minutes/2015-09-16/iovisor-odl-gbp-module.pdf -------------------------------------------------------------------------------- /university/eBPF_IOVisor_academic_research.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/university/eBPF_IOVisor_academic_research.pdf -------------------------------------------------------------------------------- /university/sigcomm-ccr-InKev-2016.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iovisor/bpf-docs/0b9f8ab13f1d2e946325c179f961563ea6e23e65/university/sigcomm-ccr-InKev-2016.pdf --------------------------------------------------------------------------------