├── README.md
├── annotate_ebpf_helpers.py
├── annotate_relocations.py
├── bpf_helper_enum.h
├── ebpf.py
├── helper_annotation
│   ├── generate_helper_lookup.sh
│   └── parse_helper_header.py
├── img
│   └── bpf_ida.png
└── samples
    ├── ebpfkit
    │   ├── bootstrap.o
    │   └── main.o
    └── libbpf-bootstrap
        ├── bootstrap.bpf.o
        └── minimal.bpf.o

/README.md:
--------------------------------------------------------------------------------
# eBPF IDA Proc

This is an IDA processor module, plus supporting scripts, that can be used to
disassemble eBPF bytecode. It was originally developed for a challenge, but has
since been expanded and updated. It still needs testing against more eBPF ELF
files and comparison with the output of other tools, such as an eBPF-capable
objdump and bpftool.

It was developed primarily against eBPF ELFs produced by libbpf toolchains,
where the ELF is opened and loaded by libbpf. If your ELF differs from the
conventions that libbpf expects, you may get inaccurate results or other
failures.

## Requirements

IDA 7.4+ with Python 3 is currently required.

The `pyelftools` Python package is required for annotating map references.

## Installation

You just need to place `ebpf.py` in your `IDA_ROOT\procs` folder.

If you want map relocation annotation to work, you also need to install the
`pyelftools` package from pip and make sure IDA's Python can find it (inspect
`sys.path` in IDA's Python interpreter, and try `import elftools`).

## Use

1. Open the eBPF ELF file in IDA using the standard ELF loader, but select the eBPF processor
2. Wait for autoanalysis to complete
3. Go to File > Script file ... (Alt + F7) to select a script file to run
4. Select the "annotate_ebpf_helpers.py" script
5. Wait for it to finish
6. Following the same process, run the "annotate_relocations.py" script
7. Wait for it to finish

Autoanalysis should at least mark bytes in code segments as instructions and
disassemble them, though it may not mark them as proper functions. Currently the
BPF helper annotation script only inspects instructions that belong to functions,
not all instructions present in the program, so you may need to define functions
manually for the helper annotation script to see them.

The map annotation script requires the original ELF file, because it needs the
section headers, relocation sections, and string and symbol tables. As mentioned
above, it also depends on the `pyelftools` package.

Once both scripts have run, your eBPF programs are disassembled, helper calls are
annotated with the helper's full signature, and data references to maps have
been added, along with repeatable comments marking where maps are referenced.

## Testing

This has been tested against eBPF ELF objects from
https://github.com/vbpf/ebpf-samples,
https://github.com/libbpf/libbpf-bootstrap, and
https://github.com/Gui774ume/ebpfkit

A small selection of these eBPF ELF objects has been included in the `samples`
directory for convenience. The libbpf-bootstrap examples are very simple; the
ebpfkit samples are quite complicated.

This should be a good starting point for making sure we can handle reasonably
real-world eBPF ELF files, but it could easily miss more specialized programs
that use less common instructions, or that have a custom loading process that
differs significantly from libbpf's.

Currently all instructions in these eBPF ELF files are recognized and
disassembled. IDA's built-in ELF loader does an acceptable job of loading these
files, but does not interpret some eBPF-specific sections like BTF and maps.
We've included scripts for annotating helper calls and map references, but these
have been less rigorously tested.

If you think anything is amiss, compare the output you're seeing against the
output of something like `llvm-objdump -dr ebpf_elf_object.o`. The instruction
syntax differs but should convey the same information; a more significant
difference may indicate a problem. A relatively recent LLVM is needed to
disassemble eBPF and handle eBPF-specific features, but you should already have
one if you can build libbpf projects.

## Issues

A number of instructions are unsupported simply because they have not yet been
encountered during development and testing. If you run across an unsupported
instruction, autoanalysis will likely break, leaving sections of code as `db` or
`dq` bytes. Manually marking unrecognized instruction bytes as code fails with a
"MakeCode failed"-style error.

There is no custom loader, so BTF-related sections are not handled. The map
relocation annotation script tries to replicate how IDA lays out sections, so if
IDA maps your object differently, the map annotation will give wrong results.

Global and static variable references are not currently annotated.
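To check the section-layout assumption described in the Issues section, the rule the map annotation script relies on can be reproduced outside IDA: sections are taken in file order starting at address 0, each start is rounded up to its `sh_addralign`, and a zero-size section is bumped by one byte so the next section cannot share its start address. The sketch below is a hypothetical standalone helper, `assumed_layout`, that is not part of this repository; it mirrors the arithmetic in `get_program_sections_with_address_ranges` from `annotate_relocations.py`, with the added assumption that an `sh_addralign` of 0 means "no alignment". Pass it the `SHT_PROGBITS`/`SHT_NOBITS` sections of your object (excluding `.BTF*`) and compare the ranges with IDA's Segments window. It does not model the extern segment IDA also creates.

```python
# Hypothetical sketch: recompute the section layout that annotate_relocations.py
# assumes IDA uses. Input is (name, sh_size, sh_addralign) tuples in
# section-header order; output ranges are half-open [begin, end).
from typing import List, Tuple

Section = Tuple[str, int, int]  # (name, sh_size, sh_addralign)


def assumed_layout(sections: List[Section]) -> List[Tuple[str, int, int]]:
    layout = []
    cur_addr = 0
    for name, size, align in sections:
        align = align or 1  # assumption: sh_addralign 0 means no constraint
        if cur_addr % align:
            cur_addr += align - (cur_addr % align)  # round up to alignment
        layout.append((name, cur_addr, cur_addr + size))
        if size == 0:
            cur_addr += 1  # keep an empty section from overlapping the next one
        cur_addr += size
    return layout


if __name__ == "__main__":
    # Hypothetical section sizes, for illustration only.
    example = [(".text", 0, 4), ("tp/syscalls/sys_enter_write", 0x68, 8),
               ("license", 0xD, 1), (".maps", 0x20, 8)]
    for name, begin, end in assumed_layout(example):
        print(f"[{begin:#x}, {end:#x}) {name}")
```

With the example input, the empty `.text` occupies `[0x0, 0x0)`, so the program section starts at `0x8` rather than `0x0`, and `.maps` is pushed from `0x7d` up to `0x80` by its 8-byte alignment.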
## Author

- Original author: Clément Berthaux - clement (dot) berthaux (at) synacktiv (dot) com
- Fixes, Expansions & Updates: Michael Zandi - the (dot) zandi (at) gmail (dot) com


![Example of filter opened in IDA](img/bpf_ida.png)

--------------------------------------------------------------------------------
/annotate_ebpf_helpers.py:
--------------------------------------------------------------------------------
# IDA Python script to annotate all eBPF call instructions
# which call eBPF helpers with the helper function's signature
# Developed against Python3 and IDA 7.6

# started with the instruction enumeration example from the IDA Book

from idaapi import *
import idautils
import idc

# created using our generate_helper_lookup.sh script.
# will need to periodically update this as new helpers are added.

helper_id_to_signature = {1: 'void *bpf_map_lookup_elem(void *map, const void *key)', 2: 'long bpf_map_update_elem(void *map, const void *key, const void *value, __u64 flags)', 3: 'long bpf_map_delete_elem(void *map, const void *key)', 4: 'long bpf_probe_read(void *dst, __u32 size, const void *unsafe_ptr)', 5: '__u64 bpf_ktime_get_ns(void)', 6: 'long bpf_trace_printk(const char *fmt, __u32 fmt_size, ...)', 7: '__u32 bpf_get_prandom_u32(void)', 8: '__u32 bpf_get_smp_processor_id(void)', 9: 'long bpf_skb_store_bytes(struct __sk_buff *skb, __u32 offset, const void *from, __u32 len, __u64 flags)', 10: 'long bpf_l3_csum_replace(struct __sk_buff *skb, __u32 offset, __u64 from, __u64 to, __u64 size)', 11: 'long bpf_l4_csum_replace(struct __sk_buff *skb, __u32 offset, __u64 from, __u64 to, __u64 flags)', 12: 'long bpf_tail_call(void *ctx, void *prog_array_map, __u32 index)', 13: 'long bpf_clone_redirect(struct __sk_buff *skb, __u32 ifindex, __u64 flags)', 14: '__u64 bpf_get_current_pid_tgid(void)', 15: '__u64 bpf_get_current_uid_gid(void)', 16: 'long
bpf_get_current_comm(void *buf, __u32 size_of_buf)', 17: '__u32 bpf_get_cgroup_classid(struct __sk_buff *skb)', 18: 'long bpf_skb_vlan_push(struct __sk_buff *skb, __be16 vlan_proto, __u16 vlan_tci)', 19: 'long bpf_skb_vlan_pop(struct __sk_buff *skb)', 20: 'long bpf_skb_get_tunnel_key(struct __sk_buff *skb, struct bpf_tunnel_key *key, __u32 size, __u64 flags)', 21: 'long bpf_skb_set_tunnel_key(struct __sk_buff *skb, struct bpf_tunnel_key *key, __u32 size, __u64 flags)', 22: '__u64 bpf_perf_event_read(void *map, __u64 flags)', 23: 'long bpf_redirect(__u32 ifindex, __u64 flags)', 24: '__u32 bpf_get_route_realm(struct __sk_buff *skb)', 25: 'long bpf_perf_event_output(void *ctx, void *map, __u64 flags, void *data, __u64 size)', 26: 'long bpf_skb_load_bytes(const void *skb, __u32 offset, void *to, __u32 len)', 27: 'long bpf_get_stackid(void *ctx, void *map, __u64 flags)', 28: '__s64 bpf_csum_diff(__be32 *from, __u32 from_size, __be32 *to, __u32 to_size, __wsum seed)', 29: 'long bpf_skb_get_tunnel_opt(struct __sk_buff *skb, void *opt, __u32 size)', 30: 'long bpf_skb_set_tunnel_opt(struct __sk_buff *skb, void *opt, __u32 size)', 31: 'long bpf_skb_change_proto(struct __sk_buff *skb, __be16 proto, __u64 flags)', 32: 'long bpf_skb_change_type(struct __sk_buff *skb, __u32 type)', 33: 'long bpf_skb_under_cgroup(struct __sk_buff *skb, void *map, __u32 index)', 34: '__u32 bpf_get_hash_recalc(struct __sk_buff *skb)', 35: '__u64 bpf_get_current_task(void)', 36: 'long bpf_probe_write_user(void *dst, const void *src, __u32 len)', 37: 'long bpf_current_task_under_cgroup(void *map, __u32 index)', 38: 'long bpf_skb_change_tail(struct __sk_buff *skb, __u32 len, __u64 flags)', 39: 'long bpf_skb_pull_data(struct __sk_buff *skb, __u32 len)', 40: '__s64 bpf_csum_update(struct __sk_buff *skb, __wsum csum)', 41: 'void bpf_set_hash_invalid(struct __sk_buff *skb)', 42: 'long bpf_get_numa_node_id(void)', 43: 'long bpf_skb_change_head(struct __sk_buff *skb, __u32 len, __u64 flags)', 44: 'long 
bpf_xdp_adjust_head(struct xdp_md *xdp_md, int delta)', 45: 'long bpf_probe_read_str(void *dst, __u32 size, const void *unsafe_ptr)', 46: '__u64 bpf_get_socket_cookie(void *ctx)', 47: '__u32 bpf_get_socket_uid(struct __sk_buff *skb)', 48: 'long bpf_set_hash(struct __sk_buff *skb, __u32 hash)', 49: 'long bpf_setsockopt(void *bpf_socket, int level, int optname, void *optval, int optlen)', 50: 'long bpf_skb_adjust_room(struct __sk_buff *skb, __s32 len_diff, __u32 mode, __u64 flags)', 51: 'long bpf_redirect_map(void *map, __u32 key, __u64 flags)', 52: 'long bpf_sk_redirect_map(struct __sk_buff *skb, void *map, __u32 key, __u64 flags)', 53: 'long bpf_sock_map_update(struct bpf_sock_ops *skops, void *map, void *key, __u64 flags)', 54: 'long bpf_xdp_adjust_meta(struct xdp_md *xdp_md, int delta)', 55: 'long bpf_perf_event_read_value(void *map, __u64 flags, struct bpf_perf_event_value *buf, __u32 buf_size)', 56: 'long bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct bpf_perf_event_value *buf, __u32 buf_size)', 57: 'long bpf_getsockopt(void *bpf_socket, int level, int optname, void *optval, int optlen)', 58: 'long bpf_override_return(struct pt_regs *regs, __u64 rc)', 59: 'long bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int argval)', 60: 'long bpf_msg_redirect_map(struct sk_msg_md *msg, void *map, __u32 key, __u64 flags)', 61: 'long bpf_msg_apply_bytes(struct sk_msg_md *msg, __u32 bytes)', 62: 'long bpf_msg_cork_bytes(struct sk_msg_md *msg, __u32 bytes)', 63: 'long bpf_msg_pull_data(struct sk_msg_md *msg, __u32 start, __u32 end, __u64 flags)', 64: 'long bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int addr_len)', 65: 'long bpf_xdp_adjust_tail(struct xdp_md *xdp_md, int delta)', 66: 'long bpf_skb_get_xfrm_state(struct __sk_buff *skb, __u32 index, struct bpf_xfrm_state *xfrm_state, __u32 size, __u64 flags)', 67: 'long bpf_get_stack(void *ctx, void *buf, __u32 size, __u64 flags)', 68: 'long bpf_skb_load_bytes_relative(const void *skb, 
__u32 offset, void *to, __u32 len, __u32 start_header)', 69: 'long bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, __u32 flags)', 70: 'long bpf_sock_hash_update(struct bpf_sock_ops *skops, void *map, void *key, __u64 flags)', 71: 'long bpf_msg_redirect_hash(struct sk_msg_md *msg, void *map, void *key, __u64 flags)', 72: 'long bpf_sk_redirect_hash(struct __sk_buff *skb, void *map, void *key, __u64 flags)', 73: 'long bpf_lwt_push_encap(struct __sk_buff *skb, __u32 type, void *hdr, __u32 len)', 74: 'long bpf_lwt_seg6_store_bytes(struct __sk_buff *skb, __u32 offset, const void *from, __u32 len)', 75: 'long bpf_lwt_seg6_adjust_srh(struct __sk_buff *skb, __u32 offset, __s32 delta)', 76: 'long bpf_lwt_seg6_action(struct __sk_buff *skb, __u32 action, void *param, __u32 param_len)', 77: 'long bpf_rc_repeat(void *ctx)', 78: 'long bpf_rc_keydown(void *ctx, __u32 protocol, __u64 scancode, __u32 toggle)', 79: '__u64 bpf_skb_cgroup_id(struct __sk_buff *skb)', 80: '__u64 bpf_get_current_cgroup_id(void)', 81: 'void *bpf_get_local_storage(void *map, __u64 flags)', 82: 'long bpf_sk_select_reuseport(struct sk_reuseport_md *reuse, void *map, void *key, __u64 flags)', 83: '__u64 bpf_skb_ancestor_cgroup_id(struct __sk_buff *skb, int ancestor_level)', 84: 'struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, __u32 tuple_size, __u64 netns, __u64 flags)', 85: 'struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, __u32 tuple_size, __u64 netns, __u64 flags)', 86: 'long bpf_sk_release(void *sock)', 87: 'long bpf_map_push_elem(void *map, const void *value, __u64 flags)', 88: 'long bpf_map_pop_elem(void *map, void *value)', 89: 'long bpf_map_peek_elem(void *map, void *value)', 90: 'long bpf_msg_push_data(struct sk_msg_md *msg, __u32 start, __u32 len, __u64 flags)', 91: 'long bpf_msg_pop_data(struct sk_msg_md *msg, __u32 start, __u32 len, __u64 flags)', 92: 'long bpf_rc_pointer_rel(void *ctx, __s32 rel_x, __s32 rel_y)', 93: 'long 
bpf_spin_lock(struct bpf_spin_lock *lock)', 94: 'long bpf_spin_unlock(struct bpf_spin_lock *lock)', 95: 'struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)', 96: 'struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)', 97: 'long bpf_skb_ecn_set_ce(struct __sk_buff *skb)', 98: 'struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)', 99: 'struct bpf_sock *bpf_skc_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, __u32 tuple_size, __u64 netns, __u64 flags)', 100: 'long bpf_tcp_check_syncookie(void *sk, void *iph, __u32 iph_len, struct tcphdr *th, __u32 th_len)', 101: 'long bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, unsigned long buf_len, __u64 flags)', 102: 'long bpf_sysctl_get_current_value(struct bpf_sysctl *ctx, char *buf, unsigned long buf_len)', 103: 'long bpf_sysctl_get_new_value(struct bpf_sysctl *ctx, char *buf, unsigned long buf_len)', 104: 'long bpf_sysctl_set_new_value(struct bpf_sysctl *ctx, const char *buf, unsigned long buf_len)', 105: 'long bpf_strtol(const char *buf, unsigned long buf_len, __u64 flags, long *res)', 106: 'long bpf_strtoul(const char *buf, unsigned long buf_len, __u64 flags, unsigned long *res)', 107: 'void *bpf_sk_storage_get(void *map, void *sk, void *value, __u64 flags)', 108: 'long bpf_sk_storage_delete(void *map, void *sk)', 109: 'long bpf_send_signal(__u32 sig)', 110: '__s64 bpf_tcp_gen_syncookie(void *sk, void *iph, __u32 iph_len, struct tcphdr *th, __u32 th_len)', 111: 'long bpf_skb_output(void *ctx, void *map, __u64 flags, void *data, __u64 size)', 112: 'long bpf_probe_read_user(void *dst, __u32 size, const void *unsafe_ptr)', 113: 'long bpf_probe_read_kernel(void *dst, __u32 size, const void *unsafe_ptr)', 114: 'long bpf_probe_read_user_str(void *dst, __u32 size, const void *unsafe_ptr)', 115: 'long bpf_probe_read_kernel_str(void *dst, __u32 size, const void *unsafe_ptr)', 116: 'long bpf_tcp_send_ack(void *tp, __u32 rcv_nxt)', 117: 'long bpf_send_signal_thread(__u32 sig)', 118: '__u64 
bpf_jiffies64(void)', 119: 'long bpf_read_branch_records(struct bpf_perf_event_data *ctx, void *buf, __u32 size, __u64 flags)', 120: 'long bpf_get_ns_current_pid_tgid(__u64 dev, __u64 ino, struct bpf_pidns_info *nsdata, __u32 size)', 121: 'long bpf_xdp_output(void *ctx, void *map, __u64 flags, void *data, __u64 size)', 122: '__u64 bpf_get_netns_cookie(void *ctx)', 123: '__u64 bpf_get_current_ancestor_cgroup_id(int ancestor_level)', 124: 'long bpf_sk_assign(void *ctx, void *sk, __u64 flags)', 125: '__u64 bpf_ktime_get_boot_ns(void)', 126: 'long bpf_seq_printf(struct seq_file *m, const char *fmt, __u32 fmt_size, const void *data, __u32 data_len)', 127: 'long bpf_seq_write(struct seq_file *m, const void *data, __u32 len)', 128: '__u64 bpf_sk_cgroup_id(void *sk)', 129: '__u64 bpf_sk_ancestor_cgroup_id(void *sk, int ancestor_level)', 130: 'long bpf_ringbuf_output(void *ringbuf, void *data, __u64 size, __u64 flags)', 131: 'void *bpf_ringbuf_reserve(void *ringbuf, __u64 size, __u64 flags)', 132: 'void bpf_ringbuf_submit(void *data, __u64 flags)', 133: 'void bpf_ringbuf_discard(void *data, __u64 flags)', 134: '__u64 bpf_ringbuf_query(void *ringbuf, __u64 flags)', 135: 'long bpf_csum_level(struct __sk_buff *skb, __u64 level)', 136: 'struct tcp6_sock *bpf_skc_to_tcp6_sock(void *sk)', 137: 'struct tcp_sock *bpf_skc_to_tcp_sock(void *sk)', 138: 'struct tcp_timewait_sock *bpf_skc_to_tcp_timewait_sock(void *sk)', 139: 'struct tcp_request_sock *bpf_skc_to_tcp_request_sock(void *sk)', 140: 'struct udp6_sock *bpf_skc_to_udp6_sock(void *sk)', 141: 'long bpf_get_task_stack(struct task_struct *task, void *buf, __u32 size, __u64 flags)', 142: 'long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, __u32 len, __u64 flags)', 143: 'long bpf_store_hdr_opt(struct bpf_sock_ops *skops, const void *from, __u32 len, __u64 flags)', 144: 'long bpf_reserve_hdr_opt(struct bpf_sock_ops *skops, __u32 len, __u64 flags)', 145: 'void *bpf_inode_storage_get(void *map, void *inode, void 
*value, __u64 flags)', 146: 'int bpf_inode_storage_delete(void *map, void *inode)', 147: 'long bpf_d_path(struct path *path, char *buf, __u32 sz)', 148: 'long bpf_copy_from_user(void *dst, __u32 size, const void *user_ptr)', 149: 'long bpf_snprintf_btf(char *str, __u32 str_size, struct btf_ptr *ptr, __u32 btf_ptr_size, __u64 flags)', 150: 'long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, __u32 ptr_size, __u64 flags)', 151: '__u64 bpf_skb_cgroup_classid(struct __sk_buff *skb)', 152: 'long bpf_redirect_neigh(__u32 ifindex, struct bpf_redir_neigh *params, int plen, __u64 flags)', 153: 'void *bpf_per_cpu_ptr(const void *percpu_ptr, __u32 cpu)', 154: 'void *bpf_this_cpu_ptr(const void *percpu_ptr)', 155: 'long bpf_redirect_peer(__u32 ifindex, __u64 flags)', 156: 'void *bpf_task_storage_get(void *map, struct task_struct *task, void *value, __u64 flags)', 157: 'long bpf_task_storage_delete(void *map, struct task_struct *task)', 158: 'struct task_struct *bpf_get_current_task_btf(void)', 159: 'long bpf_bprm_opts_set(struct linux_binprm *bprm, __u64 flags)', 160: '__u64 bpf_ktime_get_coarse_ns(void)', 161: 'long bpf_ima_inode_hash(struct inode *inode, void *dst, __u32 size)', 162: 'struct socket *bpf_sock_from_file(struct file *file)', 163: 'long bpf_check_mtu(void *ctx, __u32 ifindex, __u32 *mtu_len, __s32 len_diff, __u64 flags)', 164: 'long bpf_for_each_map_elem(void *map, void *callback_fn, void *callback_ctx, __u64 flags)', 165: 'long bpf_snprintf(char *str, __u32 str_size, const char *fmt, __u64 *data, __u32 data_len)'}

# modules used by the scan below
import ida_funcs
import ida_kernwin  # warning(); the builtin Warning() only builds an exception object and shows nothing
import ida_ua

print("Scanning all known function instructions for eBPF helper call instructions")
for ea in idautils.Functions():
    func = get_func(ea)
    if not func:
        # check before dereferencing func below
        continue
    func_name = ida_funcs.get_func_name(func.start_ea)
    print(f"scanning function: {func_name}")
    for i in idautils.FuncItems(func.start_ea):
        insn = ida_ua.insn_t()
        if ida_ua.decode_insn(insn, i):
            feature = insn.get_canon_feature()
            if feature & CF_CALL:
                # TODO: check that we're a helper call (src_reg == 0), not a BPF-to-BPF call
                # (src_reg == BPF_PSEUDO_CALL), whose imm is a relative offset rather than a helper ID.
                # Tail calls are fine here: bpf_tail_call is itself helper 12.
                try:
                    helper_signature = helper_id_to_signature[insn[0].value]
                except KeyError:
                    helper_signature = "UNKNOWN. Update list of helper function signatures"
                    ida_kernwin.warning(f"Unknown eBPF helper {insn[0].value:#x}. Check for updates, or use generate_helper_lookup.sh with a newer kernel version and manually update this script.")

                idc.set_cmt(i, helper_signature, False)

--------------------------------------------------------------------------------
/annotate_relocations.py:
--------------------------------------------------------------------------------
# IDA Python script to annotate all references to maps
# with a comment of the map name.
#
# requires pyelftools, since that does all the ELF parsing for us
#
# Also requires the eBPF ELF file to examine directly

# general strategy: parse ELF to determine relocation info & locations,
# replicate how IDA maps sections to addresses, use this replicated
# loading/addressing to add drefs (ida_xref.add_dref) from each
# relocated location (an instruction) to a defined map.
#
# Then, add repeatable comments on each defined map. This should at least
# cause those repeatable comments to appear alongside instructions which
# have relocations applied to them for those particular maps

from idaapi import *
import ida_xref
import ida_nalt

# just copied these, may not need them
import idautils
import idc

# elftools are required
from elftools.elf.elffile import ELFFile
from elftools.elf.relocation import RelocationSection
from elftools.elf.sections import SymbolTableSection, StringTableSection
from elftools.elf.enums import ENUM_SH_TYPE_BASE

def get_symtab_strtab(elffile):
    symtab = None
    strtab = None
    for s in elffile.iter_sections():
        if isinstance(s, SymbolTableSection):
            symtab = s
        elif isinstance(s, StringTableSection):
            strtab = s
    return (symtab, strtab)

# find all map sections, returning id/name tuples
# we've seen map sections named ".maps", "maps", and "maps/[name]"
# ".maps" contains multiple maps, "maps" likely does as well, "maps/[name]" may be individual maps
def get_map_sections(elffile):
    i = 0
    map_sections = []
    for s in elffile.iter_sections():
        if s.name.startswith(".maps") \
                or s.name.startswith("maps/") \
                or s.name.startswith("maps"):
            map_sections.append((i, s.name))
        i += 1
    return map_sections

# determine which symbols refer to maps, return an array of them
def get_maps(elffile):
    maps = []
    (symtab, strtab) = get_symtab_strtab(elffile)
    map_sections = get_map_sections(elffile)

    map_section_ids = [s[0] for s in map_sections]
    for sym in symtab.iter_symbols():
        # check if the symbol is in a map section
        if sym['st_shndx'] in map_section_ids:
            maps.append(sym)
    return maps

# constants in libbpf-bootstrap binaries have GLOBAL binding and OBJECT type, and a section index pointing to the resident section (.bss/.rodata).
# ebpfkit's constants (http_server_port, ebpfkit_pid) are different, with NOTYPE type and no defined resident section
# IDA appears to treat these like extern symbols, since they're referenced in the symbol table & relocations, but aren't given a section to reside in
# name: 4228 -> ebpfkit_pid, info: Container({'bind': 'STB_GLOBAL', 'type': 'STT_NOTYPE'}), value: 0x0, size: 0x0, resident section: SHN_UNDEF
def get_possible_globals(elffile):
    (symtab, strtab) = get_symtab_strtab(elffile)

    maps = get_maps(elffile)
    map_names = [s.name for s in maps]

    possible_globals = []

    for s in symtab.iter_symbols():
        if s['st_info']['bind'] == 'STB_GLOBAL' and (s['st_info']['type'] == 'STT_OBJECT' or s['st_info']['type'] == 'STT_NOTYPE'):
            if s.name != 'LICENSE' and s.name not in map_names:
                # guess: want global binding, type of 'object'
                possible_globals.append(s)

    return possible_globals

# get program sections and their associated address ranges
# This will make dealing with relocations easier, since we
# can match a relocation section to its program section, and
# have the 'loaded' address range for the section to determine
# the full address of the bytes which are to be relocated,
# based on the offset within the section
#
# for convention, tuples are (section, base_address, end_address)
# where the address range is [base_address, end_address)
#
# We do our best to replicate how IDA maps these sections
# into memory, and it seems to be correct, but this may be
# fragile. Future fix is to do this in the loader, where we'll
Future fix is to do this in the loader, where we'll 101 | # also control how sections are mapped into memory 102 | def get_program_sections_with_address_ranges(elffile): 103 | i=0 104 | program_sections = [] 105 | (symtab, strtab) = get_symtab_strtab(elffile) 106 | cur_addr = 0 107 | for s in elffile.iter_sections(): 108 | if s['sh_type'] == 'SHT_PROGBITS' or s['sh_type'] == 'SHT_NOBITS': 109 | # let's try our hand at 'mapping' sections into memory similar to how IDA does. 110 | # it seems to just be linear starting at 0, and up to alignment 111 | # Also, no overlapping sections; if .text is 0-length, the next section doesn't start at 0 as well 112 | # This algorithm seems to match IDA, though IDA additionally creates an 'extern' section 113 | # note: PROGBITS have bits to be loaded in. NOBITS don't have bits, the loader supplies 0 bytes. 114 | section_name = strtab.get_string(s['sh_name']) 115 | if section_name.startswith(".BTF"): 116 | continue # skip BTF map; IDA doesn't map it 117 | 118 | if s['sh_addr'] != 0: 119 | print("WARNING: non-zero address in section, this interferes with our loading assumptions") 120 | 121 | if cur_addr % s['sh_addralign']: 122 | cur_addr += (s['sh_addralign'] - (cur_addr % s['sh_addralign'])) 123 | 124 | program_sections.append((s, cur_addr, cur_addr + s['sh_size'])) 125 | 126 | # gross hack so the next section can't overlap us, and will likely be fixed up for alignment reasons 127 | if s['sh_size'] == 0: 128 | cur_addr += 1 129 | 130 | cur_addr += s['sh_size'] 131 | i+=1 132 | return program_sections 133 | 134 | # print sections containing programs (ideally they'll also have relocations) 135 | # note: they don't seem to have an address, it seems like IDA assumes that PROGBITS sections 136 | # (that have the alloc flag?) 
are allocated linearly based on alignment (often 8; width of nearly all eBPF instructions) 137 | def print_program_sections(elffile): 138 | 139 | (symtab, strtab) = get_symtab_strtab(elffile) 140 | program_sections = get_program_sections_with_address_ranges(elffile) 141 | 142 | for (s, base_addr, end_addr) in program_sections: 143 | section_name = strtab.get_string(s['sh_name']) 144 | print(f"\t[{base_addr:#8x}, {end_addr:#8x}): align {s['sh_addralign']:#8x} size {s['sh_size']:#8x} {section_name}") 145 | 146 | # Process map and global relocations. 147 | # Maps have a repeatable comment put on their definition, and a dref added to the relocation area 148 | # Globals are processed the same if their definition resides in another section. 149 | # If not, they're an extern symbol, and we just comment each individual without a xref. 150 | # This is a lazy hack because the IDA loader creates and extern section for these symbols, and I 151 | # just don't feel like replicating that addressing for adding those drefs right now 152 | def process_relocations(elffile): 153 | # first, get symbol/string tables, we'll use them a lot 154 | # next, collect info on which symbols are maps 155 | # copy the whole symbol object, build other metadata/lookup objects 156 | # next, collect info on symbols which may be globals 157 | # next, collect info on address ranges for program sections 158 | # need section's name, and correlated address range. Other info currently irrelevant (align, etc.) 
159 | # name to match relocation sections to the program section they apply to, and address for offset + address calculation 160 | # next, iterate through relocation sections for each program section 161 | # combine info on relocation's offset in its section, with the symbol relocation, to print address of map relocations 162 | # Additionally if a relocation is for a global, add a dref or comment as appropriate 163 | 164 | # get symbol & string tables 165 | (symtab, strtab) = get_symtab_strtab(elffile) 166 | 167 | # get possible global symbols 168 | possible_globals = get_possible_globals(elffile) 169 | possible_global_names = [s.name for s in possible_globals] 170 | 171 | # get symbols which are maps 172 | maps = get_maps(elffile) 173 | map_sections = get_map_sections(elffile) 174 | map_section_ids = [s[0] for s in map_sections] 175 | 176 | # get program section info 177 | program_sections = get_program_sections_with_address_ranges(elffile) 178 | program_section_names = [s[0].name for s in program_sections] 179 | program_sections_by_name = {s[0].name: s for s in program_sections} 180 | 181 | # determine address for map definitions 182 | map_location_by_name = {} 183 | for sym in maps: 184 | sec = elffile.get_section(sym['st_shndx']) 185 | (_, begin_addr, _) = program_sections_by_name[sec.name] 186 | print(f"{begin_addr + sym['st_value']:#8x}: map '{sym.name}'") 187 | map_location_by_name[sym.name] = begin_addr + sym['st_value'] 188 | idc.set_cmt(map_location_by_name[sym.name], f"map {sym.name}", True) 189 | 190 | # determine address for global definitions, if they exist 191 | global_location_by_name = {} 192 | for sym in possible_globals: 193 | if sym['st_shndx'] != 'SHN_UNDEF': 194 | # definition exists in this binary. 
Easy case, use repeatable comment 195 | sec = elffile.get_section(sym['st_shndx']) 196 | (_, begin_addr, _) = program_sections_by_name[sec.name] 197 | def_location = begin_addr + sym['st_value'] 198 | print(f"global definition location: {def_location:#8x}") 199 | global_location_by_name[sym.name] = def_location 200 | idc.set_cmt(def_location, f"{sym.name} (possible global)", True) 201 | del(sym) # got bit by an annoying silly bug 202 | 203 | # get each program section's corresponding relocation section (if it exists) 204 | # and process the relocations, looking only for map or global relocations 205 | i = 0 206 | for section in elffile.iter_sections(): 207 | if isinstance(section, RelocationSection): 208 | if ".BTF" in section.name: 209 | # skip BTF related things for now, deal with that can of worms later 210 | break 211 | 212 | # relocation sections are named ".rel[section]" where [section] is the section name 213 | # they contain relocations for 214 | relocated_section_name = section.name[4:] 215 | if not relocated_section_name in program_section_names: 216 | # only do program sections, our probes live there 217 | break 218 | 219 | print(f"{i}: {section.name} at {section['sh_offset']:#8x} has {section.num_relocations()} relocations for {relocated_section_name}") 220 | for r in section.iter_relocations(): 221 | if not r.is_RELA(): # haven't seen any RELA yet 222 | symbol = symtab.get_symbol(r['r_info_sym']) 223 | if symbol: 224 | resident_section_ndx = symbol['st_shndx'] 225 | if resident_section_ndx in map_section_ids: 226 | # found a map relocation 227 | resident_section = elffile.get_section(resident_section_ndx) 228 | # get base address of relocated section, apply relocation offset 229 | (s, begin_addr, end_addr) = program_sections_by_name[relocated_section_name] 230 | relocated_address = begin_addr + r['r_offset'] # note: subject to relocation type, may be calculated differently 231 | print(f"\tmap relocation at {relocated_address:#8x}: {symbol.name} -> 
{map_location_by_name[symbol.name]:#8x}") 232 | ida_xref.add_dref(relocated_address, map_location_by_name[symbol.name], ida_xref.dr_R) 233 | elif symbol.name in possible_global_names: 234 | # found possible global relocation 235 | (s, begin_addr, end_addr) = program_sections_by_name[relocated_section_name] 236 | relocated_address = begin_addr + r['r_offset'] # note: subject to relocation type, may be calculated differently 237 | print(f"\tpossible global relocation {relocated_address:#8x} -> {symbol.name}") 238 | # if there is a resident section, add dref and count on the repeated comment. If no section, extern, directly comment 239 | if symbol['st_shndx'] == 'SHN_UNDEF': 240 | # extern, just comment location 241 | idc.set_cmt(relocated_address, f"{symbol.name} (possible global)", False) 242 | else: 243 | # have a resident section, add data xref to link 244 | ida_xref.add_dref(relocated_address, global_location_by_name[symbol.name], ida_xref.dr_R) 245 | else: 246 | print(f"ERROR: relocation has no symbol?") 247 | else: 248 | print("ERROR: RELA type relocation unsupported; only REL supported") 249 | i+=1 250 | pass 251 | 252 | def process_file(elf_filename): 253 | with open(elf_filename, 'rb') as f: 254 | elffile = ELFFile(f) 255 | 256 | # just printing info 257 | print("PROGBITS Sections with our assumed mapping") 258 | print(" If this differs from how IDA maps sections, annotation will give incorrect results!") 259 | print_program_sections(elffile) 260 | 261 | # convert to actually creating xrefs and making comments 262 | print("Adding repeatable comments to map definitions, adding drefs for map relocations") 263 | print("Also adding repeatable comments to possible global definitions and drefs for their relocations") 264 | print(" (If a possible global definition is an extern, we just comment the relocations, no xrefs are added)") 265 | process_relocations(elffile) 266 | 267 | print("Done. 
Happy Reversing!") 268 | 269 | 270 | source_file = ida_nalt.get_input_file_path() 271 | process_file(source_file) 272 | -------------------------------------------------------------------------------- /bpf_helper_enum.h: -------------------------------------------------------------------------------- 1 | // originally taken from kernel source v4.20; the helper list has since been updated through v5.13 2 | // just trying to see if we can easily get the actual 3 | // integer values that identify helpers in call instructions, 4 | // so we can decode them into helper names instead of raw numbers 5 | // 6 | // run just the pre-processor on this file with gcc -E, 7 | // then use vim (or something else) to transform the resulting 8 | // enum definition into a python array of strings of bpf helper names, 9 | // then put into ebpf.py. 10 | // 11 | // Could be automated, but worry about that later. 12 | // 13 | // Should be periodically redone to update the ebpf.py list of helpers 14 | // with newly added helper functions 15 | 16 | #define __BPF_FUNC_MAPPER(FN) \ 17 | FN(unspec), \ 18 | FN(map_lookup_elem), \ 19 | FN(map_update_elem), \ 20 | FN(map_delete_elem), \ 21 | FN(probe_read), \ 22 | FN(ktime_get_ns), \ 23 | FN(trace_printk), \ 24 | FN(get_prandom_u32), \ 25 | FN(get_smp_processor_id), \ 26 | FN(skb_store_bytes), \ 27 | FN(l3_csum_replace), \ 28 | FN(l4_csum_replace), \ 29 | FN(tail_call), \ 30 | FN(clone_redirect), \ 31 | FN(get_current_pid_tgid), \ 32 | FN(get_current_uid_gid), \ 33 | FN(get_current_comm), \ 34 | FN(get_cgroup_classid), \ 35 | FN(skb_vlan_push), \ 36 | FN(skb_vlan_pop), \ 37 | FN(skb_get_tunnel_key), \ 38 | FN(skb_set_tunnel_key), \ 39 | FN(perf_event_read), \ 40 | FN(redirect), \ 41 | FN(get_route_realm), \ 42 | FN(perf_event_output), \ 43 | FN(skb_load_bytes), \ 44 | FN(get_stackid), \ 45 | FN(csum_diff), \ 46 | FN(skb_get_tunnel_opt), \ 47 | FN(skb_set_tunnel_opt), \ 48 | FN(skb_change_proto), \ 49 | FN(skb_change_type), \ 50 | FN(skb_under_cgroup), \ 51 | FN(get_hash_recalc), \ 52 | FN(get_current_task), \ 53 |
FN(probe_write_user), \ 54 | FN(current_task_under_cgroup), \ 55 | FN(skb_change_tail), \ 56 | FN(skb_pull_data), \ 57 | FN(csum_update), \ 58 | FN(set_hash_invalid), \ 59 | FN(get_numa_node_id), \ 60 | FN(skb_change_head), \ 61 | FN(xdp_adjust_head), \ 62 | FN(probe_read_str), \ 63 | FN(get_socket_cookie), \ 64 | FN(get_socket_uid), \ 65 | FN(set_hash), \ 66 | FN(setsockopt), \ 67 | FN(skb_adjust_room), \ 68 | FN(redirect_map), \ 69 | FN(sk_redirect_map), \ 70 | FN(sock_map_update), \ 71 | FN(xdp_adjust_meta), \ 72 | FN(perf_event_read_value), \ 73 | FN(perf_prog_read_value), \ 74 | FN(getsockopt), \ 75 | FN(override_return), \ 76 | FN(sock_ops_cb_flags_set), \ 77 | FN(msg_redirect_map), \ 78 | FN(msg_apply_bytes), \ 79 | FN(msg_cork_bytes), \ 80 | FN(msg_pull_data), \ 81 | FN(bind), \ 82 | FN(xdp_adjust_tail), \ 83 | FN(skb_get_xfrm_state), \ 84 | FN(get_stack), \ 85 | FN(skb_load_bytes_relative), \ 86 | FN(fib_lookup), \ 87 | FN(sock_hash_update), \ 88 | FN(msg_redirect_hash), \ 89 | FN(sk_redirect_hash), \ 90 | FN(lwt_push_encap), \ 91 | FN(lwt_seg6_store_bytes), \ 92 | FN(lwt_seg6_adjust_srh), \ 93 | FN(lwt_seg6_action), \ 94 | FN(rc_repeat), \ 95 | FN(rc_keydown), \ 96 | FN(skb_cgroup_id), \ 97 | FN(get_current_cgroup_id), \ 98 | FN(get_local_storage), \ 99 | FN(sk_select_reuseport), \ 100 | FN(skb_ancestor_cgroup_id), \ 101 | FN(sk_lookup_tcp), \ 102 | FN(sk_lookup_udp), \ 103 | FN(sk_release), \ 104 | FN(map_push_elem), \ 105 | FN(map_pop_elem), \ 106 | FN(map_peek_elem), \ 107 | FN(msg_push_data), \ 108 | FN(msg_pop_data), \ 109 | FN(rc_pointer_rel), \ 110 | FN(spin_lock), \ 111 | FN(spin_unlock), \ 112 | FN(sk_fullsock), \ 113 | FN(tcp_sock), \ 114 | FN(skb_ecn_set_ce), \ 115 | FN(get_listener_sock), \ 116 | FN(skc_lookup_tcp), \ 117 | FN(tcp_check_syncookie), \ 118 | FN(sysctl_get_name), \ 119 | FN(sysctl_get_current_value), \ 120 | FN(sysctl_get_new_value), \ 121 | FN(sysctl_set_new_value), \ 122 | FN(strtol), \ 123 | FN(strtoul), \ 124 | 
FN(sk_storage_get), \ 125 | FN(sk_storage_delete), \ 126 | FN(send_signal), \ 127 | FN(tcp_gen_syncookie), \ 128 | FN(skb_output), \ 129 | FN(probe_read_user), \ 130 | FN(probe_read_kernel), \ 131 | FN(probe_read_user_str), \ 132 | FN(probe_read_kernel_str), \ 133 | FN(tcp_send_ack), \ 134 | FN(send_signal_thread), \ 135 | FN(jiffies64), \ 136 | FN(read_branch_records), \ 137 | FN(get_ns_current_pid_tgid), \ 138 | FN(xdp_output), \ 139 | FN(get_netns_cookie), \ 140 | FN(get_current_ancestor_cgroup_id), \ 141 | FN(sk_assign), \ 142 | FN(ktime_get_boot_ns), \ 143 | FN(seq_printf), \ 144 | FN(seq_write), \ 145 | FN(sk_cgroup_id), \ 146 | FN(sk_ancestor_cgroup_id), \ 147 | FN(ringbuf_output), \ 148 | FN(ringbuf_reserve), \ 149 | FN(ringbuf_submit), \ 150 | FN(ringbuf_discard), \ 151 | FN(ringbuf_query), \ 152 | FN(csum_level), \ 153 | FN(skc_to_tcp6_sock), \ 154 | FN(skc_to_tcp_sock), \ 155 | FN(skc_to_tcp_timewait_sock), \ 156 | FN(skc_to_tcp_request_sock), \ 157 | FN(skc_to_udp6_sock), \ 158 | FN(get_task_stack), \ 159 | FN(load_hdr_opt), \ 160 | FN(store_hdr_opt), \ 161 | FN(reserve_hdr_opt), \ 162 | FN(inode_storage_get), \ 163 | FN(inode_storage_delete), \ 164 | FN(d_path), \ 165 | FN(copy_from_user), \ 166 | FN(snprintf_btf), \ 167 | FN(seq_printf_btf), \ 168 | FN(skb_cgroup_classid), \ 169 | FN(redirect_neigh), \ 170 | FN(per_cpu_ptr), \ 171 | FN(this_cpu_ptr), \ 172 | FN(redirect_peer), \ 173 | FN(task_storage_get), \ 174 | FN(task_storage_delete), \ 175 | FN(get_current_task_btf), \ 176 | FN(bprm_opts_set), \ 177 | FN(ktime_get_coarse_ns), \ 178 | FN(ima_inode_hash), \ 179 | FN(sock_from_file), \ 180 | FN(check_mtu), \ 181 | FN(for_each_map_elem), \ 182 | FN(snprintf), \ 183 | 184 | /* integer value in 'imm' field of BPF_CALL instruction selects which helper 185 | * function eBPF program intends to call 186 | */ 187 | #define __BPF_ENUM_FN(x) BPF_FUNC_ ## x 188 | enum bpf_func_id { 189 | __BPF_FUNC_MAPPER(__BPF_ENUM_FN) 190 | __BPF_FUNC_MAX_ID, 191 | }; 192 | 
#undef __BPF_ENUM_FN 193 | -------------------------------------------------------------------------------- /ebpf.py: -------------------------------------------------------------------------------- 1 | # ---------------------------------------------------------------------------- 2 | # "THE BEER-WARE LICENSE" (Revision 42): 3 | # wrote this file. As long as you 4 | # retain this notice you can do whatever you want with this stuff. If we meet 5 | # some day, and you think this stuff is worth it, you can buy me a beer in 6 | # return. Clement Berthaux 7 | # ---------------------------------------------------------------------------- 8 | import ctypes # used to sign-extend 16-bit jump offsets 9 | from idaapi import * 10 | from idc import * 11 | 12 | # 'manually' crafted from include/uapi/linux/bpf.h header from kernel v5.13 13 | # will need to periodically update this as new helpers are added. 14 | # 15 | # run just the preprocessor (gcc -E) on the snippet defining the `bpf_func_id` enum, 16 | # then format the names into an array, preserving order (some search/replace in vim) 17 | # this makes the `helper_names` array, which we then use for everything else. 18 | # It's critical the order of names is not changed from how they appear in the processed 19 | # source, because enums assign integer values in order.
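# Hypothetical sketch, not part of the original module and not used below:
# one way to automate the "gcc -E + search/replace" step described above.
# It scans the FN(...) entries of the __BPF_FUNC_MAPPER macro (as found in
# bpf_helper_enum.h or include/uapi/linux/bpf.h) in source order and numbers
# them the way the enum does. Assumption: newer kernels (around v6.1) spell
# entries as FN(name, id, ##ctx) with an explicit id; when one is present it is
# used instead of the running count. Unlike helper_names, the result uses bare
# names ("unspec", not "BPF_FUNC_unspec") and has no __BPF_FUNC_MAX_ID entry.
import re

# group 1: helper name; group 2: optional explicit id (assumed newer-kernel format).
# The leading \b keeps "__BPF_ENUM_FN(x)" and "MAPPER(FN)" from matching.
_FN_ENTRY = re.compile(r'\bFN\(\s*(\w+)\s*(?:,\s*(\d+))?')

def parse_bpf_func_mapper(header_text):
    """Return a {helper_id: helper_name} dict from __BPF_FUNC_MAPPER source text."""
    id_to_name = {}
    next_id = 0  # implicit enum numbering: previous value + 1
    for match in _FN_ENTRY.finditer(header_text):
        name, explicit_id = match.group(1), match.group(2)
        helper_id = int(explicit_id) if explicit_id is not None else next_id
        id_to_name[helper_id] = name
        next_id = helper_id + 1
    return id_to_name

# e.g. parse_bpf_func_mapper(open("bpf_helper_enum.h").read())[1] == "map_lookup_elem"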
20 | 21 | helper_names = [ "BPF_FUNC_unspec", "map_lookup_elem", "map_update_elem", "map_delete_elem", "probe_read", "ktime_get_ns", "trace_printk", "get_prandom_u32", "get_smp_processor_id", "skb_store_bytes", "l3_csum_replace", "l4_csum_replace", "tail_call", "clone_redirect", "get_current_pid_tgid", "get_current_uid_gid", "get_current_comm", "get_cgroup_classid", "skb_vlan_push", "skb_vlan_pop", "skb_get_tunnel_key", "skb_set_tunnel_key", "perf_event_read", "redirect", "get_route_realm", "perf_event_output", "skb_load_bytes", "get_stackid", "csum_diff", "skb_get_tunnel_opt", "skb_set_tunnel_opt", "skb_change_proto", "skb_change_type", "skb_under_cgroup", "get_hash_recalc", "get_current_task", "probe_write_user", "current_task_under_cgroup", "skb_change_tail", "skb_pull_data", "csum_update", "set_hash_invalid", "get_numa_node_id", "skb_change_head", "xdp_adjust_head", "probe_read_str", "get_socket_cookie", "get_socket_uid", "set_hash", "setsockopt", "skb_adjust_room", "redirect_map", "sk_redirect_map", "sock_map_update", "xdp_adjust_meta", "perf_event_read_value", "perf_prog_read_value", "getsockopt", "override_return", "sock_ops_cb_flags_set", "msg_redirect_map", "msg_apply_bytes", "msg_cork_bytes", "msg_pull_data", "bind", "xdp_adjust_tail", "skb_get_xfrm_state", "get_stack", "skb_load_bytes_relative", "fib_lookup", "sock_hash_update", "msg_redirect_hash", "sk_redirect_hash", "lwt_push_encap", "lwt_seg6_store_bytes", "lwt_seg6_adjust_srh", "lwt_seg6_action", "rc_repeat", "rc_keydown", "skb_cgroup_id", "get_current_cgroup_id", "get_local_storage", "sk_select_reuseport", "skb_ancestor_cgroup_id", "sk_lookup_tcp", "sk_lookup_udp", "sk_release", "map_push_elem", "map_pop_elem", "map_peek_elem", "msg_push_data", "msg_pop_data", "rc_pointer_rel", "spin_lock", "spin_unlock", "sk_fullsock", "tcp_sock", "skb_ecn_set_ce", "get_listener_sock", "skc_lookup_tcp", "tcp_check_syncookie", "sysctl_get_name", "sysctl_get_current_value", "sysctl_get_new_value", 
"sysctl_set_new_value", "strtol", "strtoul", "sk_storage_get", "sk_storage_delete", "send_signal", "tcp_gen_syncookie", "skb_output", "probe_read_user", "probe_read_kernel", "probe_read_user_str", "probe_read_kernel_str", "tcp_send_ack", "send_signal_thread", "jiffies64", "read_branch_records", "get_ns_current_pid_tgid", "xdp_output", "get_netns_cookie", "get_current_ancestor_cgroup_id", "sk_assign", "ktime_get_boot_ns", "seq_printf", "seq_write", "sk_cgroup_id", "sk_ancestor_cgroup_id", "ringbuf_output", "ringbuf_reserve", "ringbuf_submit", "ringbuf_discard", "ringbuf_query", "csum_level", "skc_to_tcp6_sock", "skc_to_tcp_sock", "skc_to_tcp_timewait_sock", "skc_to_tcp_request_sock", "skc_to_udp6_sock", "get_task_stack", "load_hdr_opt", "store_hdr_opt", "reserve_hdr_opt", "inode_storage_get", "inode_storage_delete", "d_path", "copy_from_user", "snprintf_btf", "seq_printf_btf", "skb_cgroup_classid", "redirect_neigh", "per_cpu_ptr", "this_cpu_ptr", "redirect_peer", "task_storage_get", "task_storage_delete", "get_current_task_btf", "bprm_opts_set", "ktime_get_coarse_ns", "ima_inode_hash", "sock_from_file", "check_mtu", "for_each_map_elem", "snprintf", "__BPF_FUNC_MAX_ID" ] 22 | 23 | helper_id_to_name = {i: helper_names[i] for i in range(len(helper_names))} 24 | 25 | # BPF ALU defines from uapi/linux/bpf_common.h 26 | # Mainly using these for disassembling atomic instructions 27 | BPF_ADD = 0x00 28 | BPF_SUB = 0x10 29 | BPF_MUL = 0x20 30 | BPF_DIV = 0x30 31 | BPF_OR = 0x40 32 | BPF_AND = 0x50 33 | BPF_LSH = 0x60 34 | BPF_RSH = 0x70 35 | BPF_NEG = 0x80 36 | BPF_MOD = 0x90 37 | BPF_XOR = 0xa0 38 | 39 | # and these atomic-specific constants from include/uapi/linux/bpf.h 40 | # /* atomic op type fields (stored in immediate) */ 41 | BPF_FETCH = 0x01 # /* not an opcode on its own, used to build others */ 42 | BPF_XCHG = (0xe0 | BPF_FETCH) # /* atomic exchange */ 43 | BPF_CMPXCHG = (0xf0 | BPF_FETCH) # /* atomic compare-and-write */ 44 | 45 | # being lazy, we only use this for 
atomic ops so far 46 | bpf_alu_string = {BPF_ADD: 'add', BPF_AND: 'and', BPF_OR: 'or', BPF_XOR: 'xor'} 47 | 48 | def dump_helpers(): 49 | print("bpf helpers id -> name") 50 | for k, v in helper_id_to_name.items(): 51 | print(f"{k} -> {v}") 52 | 53 | def lookup_helper(helper_id: int) -> str : 54 | return helper_id_to_name[helper_id] 55 | 56 | class DecodingError(Exception): 57 | pass 58 | 59 | class INST_TYPES(object): 60 | pass 61 | 62 | class EBPFProc(processor_t): 63 | id = 0xeb7f 64 | flag = PR_ASSEMBLE | PR_SEGS | PR_DEFSEG32 | PR_USE32 | PRN_HEX | PR_RNAMESOK | PR_NO_SEGMOVE 65 | cnbits = 8 66 | dnbits = 8 67 | psnames = ['EBPF'] 68 | plnames = ['EBPF'] 69 | segreg_size = 0 70 | instruc_start = 0 71 | assembler = { 72 | 'flag': ASH_HEXF3 | AS_UNEQU | AS_COLON | ASB_BINF4 | AS_N2CHR, 73 | "uflag": 0, 74 | "name": "wut", 75 | "origin": ".org", 76 | "end": ".end", 77 | "cmnt": ";", 78 | "ascsep": '"', 79 | "accsep": "'", 80 | "esccodes": "\"'", 81 | "a_ascii": "db", 82 | "a_byte": "db", 83 | "a_word": "dw", 84 | 'a_dword': "dd", 85 | 'a_qword': "dq", 86 | "a_bss": "dfs %s", 87 | "a_seg": "seg", 88 | "a_curip": "PC", 89 | "a_public": "", 90 | "a_weak": "", 91 | "a_extrn": ".extern", 92 | "a_comdef": "", 93 | "a_align": ".align", 94 | "lbrace": "(", 95 | "rbrace": ")", 96 | "a_mod": "%", 97 | "a_band": "&", 98 | "a_bor": "|", 99 | "a_xor": "^", 100 | "a_bnot": "~", 101 | "a_shl": "<<", 102 | "a_shr": ">>", 103 | "a_sizeof_fmt": "size %s", 104 | 105 | } 106 | 107 | def __init__(self): 108 | processor_t.__init__(self) 109 | 110 | self.init_instructions() 111 | self.init_registers() 112 | 113 | def init_instructions(self): 114 | # there is a logic behind the opcode values but I chose to ignore it 115 | self.OPCODES = { 116 | # ALU 117 | 0x07:('add', self._ana_reg_imm, CF_USE1 | CF_USE2), 118 | 0x0f:('add', self._ana_2regs, CF_USE1|CF_USE2), 119 | 0x17:('sub', self._ana_reg_imm, CF_USE1 | CF_USE2), 120 | 0x1f:('sub', self._ana_2regs, CF_USE1|CF_USE2), 121 | 
0x27:('mul', self._ana_reg_imm, CF_USE1|CF_USE2), 122 | 0x2f:('mul', self._ana_2regs, CF_USE1|CF_USE2), 123 | 0x37:('div', self._ana_reg_imm, CF_USE1|CF_USE2), 124 | 0x3f:('div', self._ana_2regs, CF_USE1|CF_USE2), 125 | 0x47:('or', self._ana_reg_imm, CF_USE1|CF_USE2), 126 | 0x4f:('or', self._ana_2regs, CF_USE1|CF_USE2), 127 | 0x57:('and', self._ana_reg_imm, CF_USE1|CF_USE2), 128 | 0x5f:('and', self._ana_2regs, CF_USE1|CF_USE2), 129 | 0x67:('lsh', self._ana_reg_imm, CF_USE1|CF_USE2), 130 | 0x6f:('lsh', self._ana_2regs, CF_USE1|CF_USE2), 131 | 0x77:('rsh', self._ana_reg_imm, CF_USE1|CF_USE2), 132 | 0x7f:('rsh', self._ana_2regs, CF_USE1|CF_USE2), 133 | 0x87:('neg', self._ana_1reg, CF_USE1), 134 | 0x97:('mod', self._ana_reg_imm, CF_USE1|CF_USE2), 135 | 0x9f:('mod', self._ana_2regs, CF_USE1|CF_USE2), 136 | 0xa7:('xor', self._ana_reg_imm, CF_USE1|CF_USE2), 137 | 0xaf:('xor', self._ana_2regs, CF_USE1|CF_USE2), 138 | 0xb7:('mov', self._ana_reg_imm, CF_USE1 | CF_USE2), 139 | 0xbf:('mov', self._ana_2regs, CF_USE1 | CF_USE2), 140 | 0xc7:('arsh', self._ana_reg_imm, CF_USE1 | CF_USE2), 141 | 0xcf:('arsh', self._ana_2regs, CF_USE1 | CF_USE2), 142 | 143 | # TODO: ALU 32 bit opcodes 144 | 145 | # Byteswap Instructions 146 | # 1 register operand (destination), 1 immediate. 147 | # imm == 16 | 32 | 64, indicating width 148 | # The immediate is not printed as an operand; ev_out_insn appends it to the 149 | # mnemonic as a decimal width suffix, producing 'le16', 'be32', etc. 150 | # (hence only CF_USE1 below) 151 | 0xd4:('le', self._ana_reg_imm, CF_USE1), 152 | 0xdc:('be', self._ana_reg_imm, CF_USE1), 153 | 154 | # MEM 155 | # special-case quad-word load 156 | 0x18:('lddw', self._ana_reg_imm, CF_USE1|CF_USE2), 157 | 158 | # Direct skb access loads (skb implied).
Legacy cBPF, but we should still disassemble correctly 159 | # linux kernel disassembles this like "r0 = *(u32 *)skb[26]" 160 | # Here, r0 is the hardcoded destination and no source register is used. The immediate 161 | # determines the offset into the skb 162 | 0x20:('ldaw', self._ana_phrase_imm, CF_USE1|CF_USE2), 163 | 0x28:('ldah', self._ana_phrase_imm, CF_USE1|CF_USE2), 164 | 0x30:('ldab', self._ana_phrase_imm, CF_USE1|CF_USE2), 165 | 0x38:('ldadw', self._ana_phrase_imm, CF_USE1|CF_USE2), 166 | 167 | # indirect loads are basically in the same boat as the absolute loads above 168 | 0x40:('ldinw', self._ana_reg_regdisp, CF_USE1|CF_USE2), 169 | 0x48:('ldinh', self._ana_reg_regdisp, CF_USE1|CF_USE2), 170 | 0x50:('ldinb', self._ana_reg_regdisp, CF_USE1|CF_USE2), 171 | 0x58:('ldindw', self._ana_reg_regdisp, CF_USE1|CF_USE2), 172 | 173 | 0x61:('ldxw', self._ana_reg_regdisp, CF_USE1|CF_USE2), 174 | 0x69:('ldxh', self._ana_reg_regdisp, CF_USE1|CF_USE2), 175 | 0x71:('ldxb', self._ana_reg_regdisp, CF_USE1|CF_USE2), 176 | 0x79:('ldxdw', self._ana_reg_regdisp, CF_USE1|CF_USE2), 177 | 0x62:('stw', self._ana_regdisp_reg, CF_USE1|CF_USE2), 178 | 0x6a:('sth', self._ana_regdisp_reg, CF_USE1|CF_USE2), 179 | 0x72:('stb', self._ana_regdisp_reg, CF_USE1|CF_USE2), 180 | 0x7a:('stdw', self._ana_regdisp_reg, CF_USE1|CF_USE2), 181 | 0x63:('stxw', self._ana_regdisp_reg, CF_USE1|CF_USE2), 182 | 0x6b:('stxh', self._ana_regdisp_reg, CF_USE1|CF_USE2), 183 | 0x73:('stxb', self._ana_regdisp_reg, CF_USE1|CF_USE2), 184 | 0x7b:('stxdw', self._ana_regdisp_reg, CF_USE1|CF_USE2), 185 | 186 | # LOCK instructions 187 | # These are handled a bit differently than typical instructions, see 188 | # how the linux kernel disassembles the atomic instructions here 189 | # https://elixir.bootlin.com/linux/v5.13.4/source/kernel/bpf/disasm.c#L163 190 | # 0xdb: BPF_STX class, BPF_DW size, BPF_ATOMIC mode (imm indicates op type) 191 | # The actual operation is in the immediate, so we need to analyze this 192 | # 
to unpack the immediate into a 'virtual' 3rd operand, but this virtual 193 | # 3rd operand isn't directly printed. We inspect it in the output phase specifically for 194 | # these lock instructions to determine which operation to print as 195 | # an optional suffix with the mnemonic 196 | 0xc3:('lock', self._ana_regdisp_reg_atomic, CF_USE1|CF_USE2), 197 | 0xdb:('lock', self._ana_regdisp_reg_atomic, CF_USE1|CF_USE2), 198 | 199 | # BRANCHES 200 | 0x05:('ja', self._ana_jmp, CF_USE1|CF_JUMP), 201 | 0x15:('jeq', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 202 | 0x1d:('jeq', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 203 | 0x25:('jgt', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 204 | 0x2d:('jgt', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 205 | 0x35:('jge', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 206 | 0x3d:('jge', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 207 | 0x45:('jset', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 208 | 0x4d:('jset', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 209 | 0x55:('jne', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 210 | 0x5d:('jne', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 211 | 0x65:('jsgt', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 212 | 0x6d:('jsgt', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 213 | 0x75:('jsge', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 214 | 0x7d:('jsge', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 215 | 216 | 0xa5:('jlt', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 217 | 0xad:('jlt', self._ana_cond_jmp_reg_reg, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 218 | # TODO: do we have to add any extra info here to handle differences with signed/unsigned
conditional jumps? 219 | 0xc5:('jslt', self._ana_cond_jmp_reg_imm, CF_USE1 | CF_USE2 | CF_USE3 | CF_JUMP), 220 | 221 | 0x85:('call', self._ana_call, CF_USE1|CF_CALL), 222 | 223 | 0x95:('ret', self._ana_nop, CF_STOP) 224 | } 225 | 226 | Instructions = [{'name':x[0], 'feature':x[2]} for x in self.OPCODES.values()] 227 | self.inames = {v[0]:k for k,v in self.OPCODES.items()} 228 | self.instruc_end = 0xff 229 | self.instruc = [({'name':self.OPCODES[i][0], 'feature':self.OPCODES[i][2]} if i in self.OPCODES else {'name':'unknown_opcode', 'feature':0}) for i in range(0xff)] 230 | 231 | # self.icode_return = 0x95 232 | 233 | def init_registers(self): 234 | self.reg_names = ['r0', 'r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10', 'CS', 'DS'] 235 | 236 | self.reg_cs = self.reg_names.index('CS') # virtual segment registers: indices into reg_names 237 | self.reg_ds = self.reg_names.index('DS') 238 | 239 | self.reg_first_sreg = self.reg_cs 240 | self.reg_last_sreg = self.reg_ds 241 | 242 | self.reg_code_sreg = self.reg_cs 243 | self.reg_data_sreg = self.reg_ds 244 | 245 | def ev_ana_insn(self, insn): 246 | try: 247 | return self._ana(insn) 248 | except DecodingError: 249 | return 0 250 | 251 | # XXX: NOTE: we never set offb for any operands, should we? 252 | def _ana(self, insn): 253 | self.opcode = insn.get_next_byte() 254 | registers = insn.get_next_byte() 255 | 256 | self.src = (registers >> 4) & 15 257 | self.dst = registers & 15 258 | 259 | # TODO: should we just handle the 16-bit signed stuff here?
260 | self.off = insn.get_next_word() 261 | 262 | # if self.off & 0x8000: 263 | # self.off -= 0x10000 264 | 265 | self.imm = insn.get_next_dword() 266 | 267 | # lddw (0x18) is the only 16-byte instruction: the upper 32 bits of its immediate live in a second 8-byte slot 268 | if self.opcode == 0x18: 269 | insn.get_next_dword() # skip the second slot's opcode, registers and offset (unused, zero) 270 | imm2 = insn.get_next_dword() 271 | self.imm += imm2 << 32 272 | 273 | insn.itype = self.opcode 274 | 275 | if self.opcode not in self.OPCODES: 276 | raise DecodingError(f"unknown opcode {self.opcode:#x}") 277 | 278 | self.OPCODES[self.opcode][1](insn) 279 | 280 | return insn.size 281 | 282 | def _ana_nop(self, insn): 283 | pass 284 | 285 | def _ana_reg_imm(self, insn): 286 | insn[0].type = o_reg 287 | insn[0].dtype = dt_dword 288 | insn[0].reg = self.dst 289 | 290 | insn[1].type = o_imm 291 | # special quad-word load 292 | if self.opcode == 0x18: 293 | insn[1].dtype = dt_qword 294 | else: 295 | insn[1].dtype = dt_dword 296 | 297 | insn[1].value = self.imm 298 | 299 | def _ana_1reg(self, insn): 300 | insn[0].type = o_reg 301 | insn[0].dtype = dt_dword 302 | insn[0].reg = self.dst 303 | 304 | def _ana_2regs(self, insn): 305 | insn[0].type = o_reg 306 | insn[0].dtype = dt_dword 307 | insn[0].reg = self.dst 308 | 309 | insn[1].type = o_reg 310 | insn[1].dtype = dt_dword 311 | insn[1].reg = self.src 312 | 313 | def _ana_call(self, insn): 314 | insn[0].type = o_imm 315 | insn[0].value = self.imm 316 | insn[0].dtype = dt_dword 317 | 318 | def _ana_jmp(self, insn): 319 | insn[0].type = o_near 320 | # need to treat offset as a signed 16-bit integer to properly support backwards jumps, 321 | # which are allowed in more recent eBPF 322 | offset = ctypes.c_int16(self.off).value 323 | if offset < 0: 324 | #print("[_ana_jmp] backwards jump") 325 | pass 326 | insn[0].addr = 8*offset + insn.ea + 8 327 | #print(f"[_ana_jmp] off: {self.off:#8x}, ea: {insn.ea:#8x}, addr: {insn[0].addr:#8x}") 328 | # 0x05 (ja): the signed 16-bit offset counts 8-byte instructions, relative to the next instruction 329 | insn[0].dtype = dt_word # 16-bit offset 330 | 331 | def
_ana_cond_jmp_reg_imm(self, insn): 332 | insn[0].type = o_reg 333 | insn[0].dtype = dt_dword 334 | insn[0].reg = self.dst 335 | 336 | insn[1].type = o_imm 337 | insn[1].value = self.imm 338 | insn[1].dtype = dt_dword 339 | 340 | offset = ctypes.c_int16(self.off).value 341 | if offset < 0: 342 | #print("[_ana_cond_jmp_reg_imm] backwards jump") 343 | pass 344 | insn[2].type = o_near 345 | insn[2].addr = 8 * offset + insn.ea + 8 346 | insn[2].dtype = dt_dword 347 | 348 | def _ana_cond_jmp_reg_reg(self, insn): 349 | insn[0].type = o_reg 350 | insn[0].dtype = dt_dword 351 | insn[0].reg = self.dst 352 | 353 | insn[1].type = o_reg 354 | insn[1].dtype = dt_dword 355 | insn[1].reg = self.src 356 | 357 | offset = ctypes.c_int16(self.off).value 358 | if offset < 0: 359 | #print("[_ana_cond_jmp_reg_reg] backwards jump") 360 | pass 361 | insn[2].type = o_near 362 | insn[2].addr = 8 * offset + insn.ea + 8 363 | insn[2].dtype = dt_dword 364 | 365 | def _ana_regdisp_reg(self, insn): 366 | # all cases of this instruction have a 16-bit offset 367 | # eg: stxdw [dst+off], src 368 | insn[0].type = o_displ 369 | insn[0].dtype = dt_word 370 | insn[0].value = self.off 371 | insn[0].phrase = self.dst 372 | 373 | insn[1].type = o_reg 374 | insn[1].dtype = dt_dword 375 | insn[1].reg = self.src 376 | 377 | def _ana_regdisp_reg_atomic(self, insn): 378 | insn[0].type = o_displ 379 | insn[0].dtype = dt_word 380 | insn[0].value = self.off 381 | insn[0].phrase = self.dst 382 | 383 | insn[1].type = o_reg 384 | insn[1].dtype = dt_dword 385 | insn[1].reg = self.src 386 | 387 | # operation is conveyed by immediate value, but not literally used as an operand 388 | insn[2].type = o_imm 389 | insn[2].dtype = dt_dword 390 | insn[2].value = self.imm 391 | 392 | def _ana_reg_regdisp(self, insn): 393 | insn[0].type = o_reg 394 | insn[0].dtype = dt_dword 395 | insn[0].reg = self.dst 396 | 397 | insn[1].type = o_displ 398 | insn[1].dtype = dt_word 399 | insn[1].value = self.off 400 | insn[1].phrase = self.src 
401 | 402 | # indirect skb loads have hardcoded r0 as destination, but use src + imm to offset 403 | # into an implicit skb 404 | if self.opcode in [0x40, 0x48, 0x50, 0x58]: 405 | insn[0].reg = 0 # hardcoded r0 destination 406 | insn[1].value = self.imm # use imm not offset for displacement 407 | insn[1].dtype = dt_dword # imm are 32-bit, off are 16-bit. 408 | 409 | 410 | # Only actually used for absolute loads, which are hardcoded to r0 destination 411 | def _ana_phrase_imm(self, insn): 412 | insn[0].type = o_reg 413 | insn[0].dtype = dt_dword 414 | insn[0].reg = 0 # hardcode destination to r0 415 | 416 | insn[1].type = o_phrase 417 | insn[1].dtype = dt_dword 418 | insn[1].value = self.imm 419 | 420 | 421 | def ev_emu_insn(self, insn): 422 | Feature = insn.get_canon_feature() 423 | 424 | if Feature & CF_JUMP: 425 | dst_op_index = 0 if insn.itype == 0x5 else 2 426 | #print("[ev_emu_insn] jump detected: 0x{:x} -> 0x{:x}".format(insn[dst_op_index].offb, insn[dst_op_index].addr)) 427 | insn.add_cref(insn[dst_op_index].addr, insn[dst_op_index].offb, fl_JN) 428 | remember_problem(cvar.PR_JUMP, insn.ea) # PR_JUMP ignored? 
429 | 430 | # TODO: see what stack emulation we need to do when operating on/with r10 431 | if insn[0].type == o_displ or insn[1].type == o_displ: 432 | op_ind = 0 if insn[0].type == o_displ else 1 433 | if may_create_stkvars(): 434 | # annoying problem: we can properly display 16-bit offsets in the out stage, 435 | # but this step gets them highlighted in red as if they were invalid 436 | # Disable until we can do this correctly 437 | #insn.create_stkvar(insn[op_ind], insn[op_ind].value, STKVAR_VALID_SIZE) 438 | #op_stkvar(insn.ea, op_ind) 439 | pass 440 | 441 | # TODO: Determine difference between calling helper and tail-calling other BPF program 442 | # TODO: use FLIRT/whatever to make nice annotations for helper calls, like we get for typical PEs 443 | # if Feature & CF_CALL: 444 | # ua_add_cref(self.cmd[0].offb, self.cmd[0].addr, fl_CN) 445 | if Feature & CF_CALL: 446 | # call into eBPF helper 447 | #helper_name = lookup_helper(insn[0].value) 448 | #print(f"[ev_emu_insn] call helper: {helper_name}") 449 | #print("[ev_emu_insn] (0x{:x}) call offb: {} addr: {} value: {}".format(insn.ea, insn[0].offb, insn[0].addr, insn[0].value)) 450 | pass 451 | 452 | # continue execution flow unless this is a stop instruction (exit, shown as 'ret') or an unconditional jump (ja) 453 | flow = (Feature & CF_STOP) == 0 and not insn.itype == 0x5 454 | 455 | if flow: 456 | insn.add_cref(insn.ea + insn.size, 0, fl_F) 457 | 458 | return True 459 | 460 | def ev_out_insn(self, ctx): 461 | cmd = ctx.insn 462 | ft = cmd.get_canon_feature() 463 | buf = ctx.outbuf 464 | 465 | # handle byteswap instruction suffix encoded in immediate, don't print immediate 466 | if cmd.itype == 0xd4 or cmd.itype == 0xdc: 467 | # directly use immediate as suffix in decimal 468 | # analysis function sets second operand as immediate 469 | if cmd.ops[1].type == o_imm: 470 | ctx.out_mnem(15, f"{cmd.ops[1].value}") 471 | else: 472 | print("[ev_out_insn] analysis error: invalid 2nd operand type for byteswap instruction") 473 | # special
handling for atomic instruction, mnemonic is determined by immediate, not opcode 474 | elif cmd.itype == 0xdb or cmd.itype == 0xc3: 475 | atomic_alu_ops = [BPF_ADD, BPF_AND, BPF_OR, BPF_XOR] 476 | atomic_alu_fetch_ops = [op | BPF_FETCH for op in atomic_alu_ops] 477 | if cmd.ops[2].type == o_imm: 478 | # TODO: add size/width to disassembly? 479 | if cmd.ops[2].value in atomic_alu_ops: 480 | # first case; 'lock' instruction we first came across 481 | ctx.out_mnem(15, f" {bpf_alu_string[cmd.ops[2].value]}") 482 | elif cmd.ops[2].value in atomic_alu_fetch_ops: 483 | print("[ev_out_insn] untested case for atomic instruction: ALU fetch op") 484 | ctx.out_mnem(15, f" fetch {bpf_alu_string[cmd.ops[2].value & ~BPF_FETCH]}") # clear the FETCH bit to look up the ALU op name 485 | elif cmd.ops[2].value == BPF_CMPXCHG: 486 | print("[ev_out_insn] untested case for atomic instruction: CMPXCHG") 487 | ctx.out_mnem(15, " cmpxchg") 488 | elif cmd.ops[2].value == BPF_XCHG: 489 | print("[ev_out_insn] untested case for atomic instruction: XCHG") 490 | ctx.out_mnem(15, " xchg") 491 | else: 492 | print("[ev_out_insn] invalid operation type in immediate for atomic instruction") 493 | else: 494 | print("[ev_out_insn] analysis error: 3rd parameter for atomic instruction must be o_imm. debug me!") 495 | else: 496 | ctx.out_mnem(15) 497 | 498 | if ft & CF_USE1: 499 | if ft & CF_CALL: 500 | try: 501 | #TODO: This is probably better done elsewhere. Remove once that's figured out. 502 | helper_name = lookup_helper(cmd[0].value) 503 | #print(f"[ev_out_insn] calling helper {helper_name}") 504 | except KeyError: 505 | print(f"[ev_out_insn] unknown bpf helper {cmd[0].value:#x}.
You need to update the processor's list of helper functions using a newer Linux kernel source (include/uapi/linux/bpf.h).") 506 | ctx.out_one_operand(0) 507 | if ft & CF_USE2: 508 | ctx.out_char(',') 509 | ctx.out_char(' ') 510 | ctx.out_one_operand(1) 511 | if ft & CF_USE3: 512 | ctx.out_char(',') 513 | ctx.out_char(' ') 514 | ctx.out_one_operand(2) 515 | cvar.gl_comm = 1 516 | ctx.flush_outbuf() 517 | 518 | def ev_out_operand(self, ctx, op): 519 | if op.type == o_reg: 520 | ctx.out_register(self.reg_names[op.reg]) 521 | 522 | # It appears that all uses of immediates are signed, hardcode treating them as signed. 523 | elif op.type == o_imm: 524 | if op.dtype == dt_qword: 525 | ctx.out_value(op, OOF_SIGNED|OOFW_IMM|OOFW_64) 526 | elif op.dtype == dt_dword: 527 | ctx.out_value(op, OOF_SIGNED|OOFW_IMM|OOFW_32) 528 | else: 529 | print(f"[ev_out_operand] immediate operand, unhandled dtype: {op.dtype:#8x}") 530 | ctx.out_value(op, OOF_SIGNED|OOFW_IMM|OOFW_32) # TODO: improve default case/handle all cases 531 | 532 | elif op.type in [o_near, o_mem]: 533 | ok = ctx.out_name_expr(op, op.addr, BADADDR) 534 | if not ok: 535 | ctx.out_tagon(COLOR_ERROR) 536 | ctx.out_long(op.addr, 16) 537 | ctx.out_tagoff(COLOR_ERROR) 538 | # TODO: figure out how to get this operand's instruction's address to remember this problem 539 | #remember_problem(PR_NONAME, insn.ea) 540 | 541 | elif op.type == o_phrase: 542 | # phrase operands are only encountered in absolute loads (eg: 0x20) which are implicitly 543 | # in reference to a skb, which is how the linux kernel disassembles it 544 | ctx.out_printf('skb') # text color is a bit off. fix later. 545 | ctx.out_symbol('[') 546 | ctx.out_value(op, OOF_SIGNED|OOFW_IMM|OOFW_32) # "OpDecimal" fails on this, figure out why & fix it. 547 | ctx.out_symbol(']') 548 | 549 | # All uses of displacement operands I've found so far are 16-bit signed. 
550 | elif op.type == o_displ: 551 | #print(f"[ev_out_operand] displacement dtype: {op.dtype:#8x} addr: {op.addr:#8x} value: {op.value:#8x}") 552 | if op.dtype == dt_dword: 553 | # must be indirect load to be using 32-bit imm as phrase operand; skb implied 554 | ctx.out_printf('skb') 555 | ctx.out_symbol('[') 556 | ctx.out_register(self.reg_names[op.phrase]) 557 | if op.value: 558 | if op.dtype == dt_word: 559 | ctx.out_value(op, OOFS_NEEDSIGN|OOF_SIGNED|OOFW_IMM|OOFW_16) 560 | elif op.dtype == dt_dword: 561 | ctx.out_value(op, OOFS_NEEDSIGN|OOF_SIGNED|OOFW_IMM|OOFW_32) 562 | else: 563 | print(f"[ev_out_operand] unexpected displacement dtype: {op.dtype:#8x}") 564 | ctx.out_value(op, OOFS_NEEDSIGN|OOF_SIGNED|OOFW_IMM) 565 | ctx.out_symbol(']') 566 | else: 567 | return False 568 | return True 569 | 570 | def PROCESSOR_ENTRY(): 571 | return EBPFProc() 572 | -------------------------------------------------------------------------------- /helper_annotation/generate_helper_lookup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # get source files if we don't have them already 4 | # defaulting to v5.13, the most recent non-rc release at time of writing. 5 | # If you need a newer version, specify it via the KERNEL_VERSION variable. 6 | # For example, running KERNEL_VERSION="v5.14-rc1" ./generate_helper_lookup.sh fetches the files from that tag 7 | # (delete any previously downloaded bpf_doc.py and bpf.h first, since existing files are reused) 8 | # Unless you know what you're doing, stick with stable kernel releases 9 | 10 | KERNEL_VERSION="${KERNEL_VERSION:-v5.13}" 11 | 12 | if [ ! -f bpf_doc.py ]; then 13 | curl -o bpf_doc.py https://raw.githubusercontent.com/torvalds/linux/${KERNEL_VERSION}/scripts/bpf_doc.py 14 | fi 15 | 16 | if [ !
-f bpf.h ]; then 17 | curl -o bpf.h https://raw.githubusercontent.com/torvalds/linux/${KERNEL_VERSION}/include/uapi/linux/bpf.h 18 | fi 19 | 20 | echo "put this dictionary definition in the bpf helper annotation python script:" 21 | echo 22 | 23 | python3 ./bpf_doc.py --filename bpf.h --header | grep '^static' | ./parse_helper_header.py 24 | 25 | -------------------------------------------------------------------------------- /helper_annotation/parse_helper_header.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # 3 | # takes signatures generated by the linux kernel source bpf_doc.py script with the --header 4 | # flag for c syntax header. Parses to build python dictionary suitable for looking up 5 | # helper function signature by id, which is supplied directly as the immediate in the 6 | # eBPF call instruction 7 | # 8 | # run this, then put the dictionary in the bpf helper annotation script 9 | 10 | import re 11 | import sys 12 | 13 | if __name__ == '__main__': 14 | # split deref patterns so we only remove the first * 15 | deref_pattern_1 = re.compile('[()]') 16 | deref_pattern_2 = re.compile('\*') 17 | 18 | split_pattern = re.compile('^static (.+) = \(void \*\) (\d+);') 19 | 20 | helpers = {} 21 | 22 | for l in sys.stdin: 23 | # split up signature and identifier integer 24 | m = split_pattern.match(l) 25 | 26 | # transform '(*name)' to 'name' to make function pointer into bare signature 27 | s = m.group(1) 28 | s2 = deref_pattern_1.sub('', s, count=2) 29 | signature = deref_pattern_2.sub('', s2, count=1) 30 | 31 | identifier = int(m.group(2).strip()) 32 | 33 | # add this signature to lookup dict 34 | helpers[identifier] = signature 35 | 36 | 37 | # don't bother putting in unspec/max fields, we can just let the exception/warning happen 38 | # skip any 'verification' steps against actual enum in source, since the bpf_doc.py script in 39 | # kernel source can provide id values with the signatures in 
its output. If we get wrong results, 40 | # it's a bug in that script, not this one :) 41 | 42 | print(f"helper_id_to_signature = {repr(helpers)}") 43 | 44 | -------------------------------------------------------------------------------- /img/bpf_ida.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cylance/eBPF_processor/c5295de3054f859f9a41c2f1a7ef61e895804908/img/bpf_ida.png -------------------------------------------------------------------------------- /samples/ebpfkit/bootstrap.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cylance/eBPF_processor/c5295de3054f859f9a41c2f1a7ef61e895804908/samples/ebpfkit/bootstrap.o -------------------------------------------------------------------------------- /samples/ebpfkit/main.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cylance/eBPF_processor/c5295de3054f859f9a41c2f1a7ef61e895804908/samples/ebpfkit/main.o -------------------------------------------------------------------------------- /samples/libbpf-bootstrap/bootstrap.bpf.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cylance/eBPF_processor/c5295de3054f859f9a41c2f1a7ef61e895804908/samples/libbpf-bootstrap/bootstrap.bpf.o -------------------------------------------------------------------------------- /samples/libbpf-bootstrap/minimal.bpf.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cylance/eBPF_processor/c5295de3054f859f9a41c2f1a7ef61e895804908/samples/libbpf-bootstrap/minimal.bpf.o --------------------------------------------------------------------------------
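The signature transformation in `parse_helper_header.py` can be checked without fetching kernel sources by feeding it a couple of lines in the shape that `bpf_doc.py --header` emits. This is a minimal sketch: `parse_helper_line` is a hypothetical helper that reuses the script's three regular expressions, and the two sample lines are written by hand in the style of libbpf's `bpf_helper_defs.h` rather than copied from a real run. Unlike the original loop, which would fail with an `AttributeError` on `None` for a non-matching line, it raises an explicit `ValueError`.

```python
import re

# Same patterns as parse_helper_header.py, written as raw strings.
SPLIT_PATTERN = re.compile(r'^static (.+) = \(void \*\) (\d+);')
PARENS = re.compile(r'[()]')
STAR = re.compile(r'\*')


def parse_helper_line(line: str) -> tuple[int, str]:
    """Turn one 'bpf_doc.py --header' line into (helper id, bare signature)."""
    m = SPLIT_PATTERN.match(line)
    if m is None:
        raise ValueError(f"not a helper definition: {line!r}")
    # '(*name)' -> '*name': drop only the first two parentheses...
    s = PARENS.sub('', m.group(1), count=2)
    # ...then drop the first '*', which belonged to the function pointer.
    signature = STAR.sub('', s, count=1)
    return int(m.group(2)), signature


if __name__ == '__main__':
    # Hand-written sample lines (assumed shape of bpf_doc.py --header output).
    lines = [
        "static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;",
        "static long (*bpf_map_update_elem)(void *map, const void *key, const void *value, __u64 flags) = (void *) 2;",
    ]
    helpers = dict(parse_helper_line(l) for l in lines)
    for helper_id, sig in helpers.items():
        print(helper_id, sig)
```

The first sample shows why the script only strips the *first* `*`: for a helper returning a pointer, the capture is `void *(*bpf_map_lookup_elem)(...)`, and removing every `*` would also destroy the `void *` return type and the parameter types.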
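`ev_out_operand` in `ebpf.py` prints immediates and displacements as signed values, and prints 64-bit immediates for `dt_qword`. The sketch below shows the raw encoding behind those choices, outside IDA: each eBPF instruction is 8 bytes laid out like the kernel's `struct bpf_insn` (`u8 code`, two 4-bit register fields, `s16 off`, `s32 imm`), and `lddw` spreads one 64-bit immediate across the `imm` fields of two consecutive slots. It assumes a little-endian object file, where `dst_reg` is the low nibble of the register byte. `decode_insn`, `lddw_imm64` and `format_displ` are hypothetical names, not part of the processor module, and the `skb[...]` load forms are not covered.

```python
import struct

# struct bpf_insn: u8 code; u8 dst_reg:4, src_reg:4; s16 off; s32 imm (little-endian, 8 bytes)
INSN = struct.Struct('<BBhi')


def decode_insn(raw: bytes) -> tuple[int, int, int, int, int]:
    """Split one 8-byte eBPF instruction into (code, dst, src, off, imm)."""
    code, regs, off, imm = INSN.unpack(raw)
    dst = regs & 0x0f  # low nibble on little-endian targets
    src = regs >> 4    # high nibble
    return code, dst, src, off, imm


def lddw_imm64(first: bytes, second: bytes) -> int:
    """Combine the imm fields of a 16-byte lddw into one signed 64-bit value."""
    lo = decode_insn(first)[4] & 0xffffffff
    hi = decode_insn(second)[4] & 0xffffffff
    value = (hi << 32) | lo
    # Reinterpret as signed, comparable to OOF_SIGNED|OOFW_64 in ev_out_operand.
    return value - (1 << 64) if value & (1 << 63) else value


def format_displ(reg: int, off: int) -> str:
    """Render a [reg+off] memory operand; a zero offset is omitted."""
    if off == 0:
        return f"[r{reg}]"
    return f"[r{reg}{off:+d}]"  # '+d' always prints the sign, like OOFS_NEEDSIGN


if __name__ == '__main__':
    # stxdw [r10-8], r1: code 0x7b, dst r10, src r1, off -8 (0xfff8)
    raw = bytes([0x7b, 0x1a, 0xf8, 0xff, 0x00, 0x00, 0x00, 0x00])
    code, dst, src, off, imm = decode_insn(raw)
    print(hex(code), format_displ(dst, off), f"r{src}")
```

Reading `off` with a signed format code (`h`) is what turns the stored `0xfff8` into `-8`. An unsigned read would show a stack offset of 65528 instead, which is why treating these fields as signed gives the conventional `[r10-8]` form.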