├── .gitignore ├── LICENSE ├── README.md ├── pyebpf ├── __init__.py ├── ebpf_wrapper.py ├── helpers.py └── normalizers.py └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # Pycharm 7 | .idea/ 8 | 9 | # C extensions 10 | *.so 11 | 12 | # Distribution / packaging 13 | .Python 14 | build/ 15 | develop-eggs/ 16 | dist/ 17 | downloads/ 18 | eggs/ 19 | .eggs/ 20 | lib/ 21 | lib64/ 22 | parts/ 23 | sdist/ 24 | var/ 25 | wheels/ 26 | *.egg-info/ 27 | .installed.cfg 28 | *.egg 29 | MANIFEST 30 | 31 | # PyInstaller 32 | # Usually these files are written by a python script from a template 33 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 34 | *.manifest 35 | *.spec 36 | 37 | # Installer logs 38 | pip-log.txt 39 | pip-delete-this-directory.txt 40 | 41 | # Unit test / coverage reports 42 | htmlcov/ 43 | .tox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | 53 | # Translations 54 | *.mo 55 | *.pot 56 | 57 | # Django stuff: 58 | *.log 59 | local_settings.py 60 | db.sqlite3 61 | 62 | # Flask stuff: 63 | instance/ 64 | .webassets-cache 65 | 66 | # Scrapy stuff: 67 | .scrapy 68 | 69 | # Sphinx documentation 70 | docs/_build/ 71 | 72 | # PyBuilder 73 | target/ 74 | 75 | # Jupyter Notebook 76 | .ipynb_checkpoints 77 | 78 | # pyenv 79 | .python-version 80 | 81 | # celery beat schedule file 82 | celerybeat-schedule 83 | 84 | # SageMath parsed files 85 | *.sage.py 86 | 87 | # Environments 88 | .env 89 | .venv 90 | env/ 91 | venv/ 92 | ENV/ 93 | env.bak/ 94 | venv.bak/ 95 | 96 | # Spyder project settings 97 | .spyderproject 98 | .spyproject 99 | 100 | # Rope project settings 101 | .ropeproject 102 | 103 | # mkdocs documentation 104 | /site 105 | 106 | # mypy 107 | .mypy_cache/ 108 | 
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Danny Shemesh 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## PyEBPF 2 | 3 | A bcc-based, eBPF (Extended-Berkeley-Packet-Filter) wrapper for Python. 4 | 5 | *Note*: Using this library requires a working installation of BCC, please refer to this [guide](https://github.com/iovisor/bcc/blob/master/INSTALL.md). 6 | 7 | This small library serves two main purposes: 8 | 9 | 1. It lets you attach BPF kernel probes without writing native code 10 | 2. 
It lets you write BPF routine callbacks in Python [1] 11 | 12 | You may still write, compile and use your own native routines, 13 | just as you would with bcc's BPF library, in case you need that functionality. 14 | 15 | [1] See 'How does this library work?' below 16 | 17 | ### What is eBPF 18 | 19 | Extended Berkeley Packet Filters are a superset of BPF filters (traditionally used for packet filtering) that let you 20 | write small kernel routines, using a dedicated eBPF instruction set. 21 | 22 | To use eBPF, one needs to compile a routine, call the bpf(2) syscall, and attach a kernel probe. 23 | 24 | bpf(2) takes your compiled routine, loads it into kernel space, and has the kernel 25 | statically verify and JIT-compile it for later use. 26 | 27 | You attach a probe to a kernel trace event (such as a syscall invocation), 28 | and once your probe is attached, your eBPF routine is invoked whenever that event fires. 29 | 30 | Sharing data between eBPF routines, or between an eBPF routine and user-space, is possible via eBPF maps, 31 | which are exposed through a file descriptor (FD) that lets the two ends communicate. 32 | 33 | ### What is IOVisor / BCC 34 | 35 | BCC (BPF Compiler Collection) is a toolkit that helps you generate and use BPF routines in a user-friendly manner. 36 | 37 | It abstracts some eBPF features (such as BPF shared data structures) via C macros, 38 | and lets you focus on your routine's logic and on gathering the appropriate metrics. 39 | 40 | Code generation is handled by LLVM, so you need an appropriate version installed. 41 | 42 | More about the project [here](https://github.com/iovisor/bcc). 43 | 44 | ### How does this library work? 45 | 46 | Given an event to attach a kernel probe to, this library will (in order): 47 | 48 | 1. Try to implicitly guess any extra parameters the event passes to your routine. 49 | This is done on a best-effort basis, by reading the */sys/kernel/debug/tracing/events/syscalls/sys_enter_/format* file. 
50 | This file contains a text description of the parameters the event trace may contain. 51 | 52 | 2. It will then generate a native data structure that is populated with relevant context, including: 53 | - Current time in nanoseconds (via bpf_ktime_get_ns) 54 | - PID and TID (via bpf_get_current_pid_tgid) 55 | - GID and UID (via bpf_get_current_uid_gid) 56 | - Process name (via bpf_get_current_comm) 57 | - Any extra implicitly guessed event-trace parameters 58 | e.g. for the *bind* syscall, the data structure will additionally contain a socket FD, a socket address and an address length 59 | 60 | 3. It will create an eBPF shared data structure (using the BPF_PERF_OUTPUT macro) that serves as the communication 61 | gateway to the user-mode routine 62 | 63 | 4. A dedicated polling daemon thread will be spawned, and for each output to the shared structure above, your Python 64 | callback will be invoked with a ctypes structure representing the native data structure 65 | 66 | Thus, on any event your kernel probe attaches to, an internal BPF routine is called, which in turn 67 | copies all relevant members into the constructed data structure and submits it to user mode via the shared BPF structure. 68 | Then, an internal Python thread polls said structure and calls the registered Python callback. 69 | 70 | ### Using this wrapper effectively 71 | 72 | First, install the library via: 73 | 74 | $> pip install pyebpf 75 | 76 | Next, import the EBPFWrapper object, instantiate it, and attach a function to an event. 77 | 78 | ```python 79 | # trace_fields.py bcc example, using pyebpf 80 | import time 81 | from pyebpf.ebpf_wrapper import EBPFWrapper 82 | b = EBPFWrapper() 83 | print 'PID MESSAGE' 84 | 85 | def hello(data, **kwargs): 86 | print '{pid} Hello, World!'.format(pid=data.process_id) 87 | 88 | b.attach_kprobe(event=b.get_syscall_fnname('clone'), fn=hello) 89 | 90 | while True: 91 | try: 92 | time.sleep(1) 93 | except KeyboardInterrupt: 94 | print 'Bye !' 
95 | break 96 | from socket import socket 97 | b = EBPFWrapper() 98 | print 'COMM PID SOCKETFD' 99 | 100 | def on_bind(data, **kwargs): 101 | print '{comm} {pid} {fd}'.format(comm=data.process_name, pid=data.process_id, fd=data.fd) 102 | 103 | b.attach_kprobe(event=b.get_syscall_fnname('bind'), fn=on_bind) 104 | 105 | # Will print 'python ' 106 | s = socket() 107 | s.bind(('0.0.0.0', 31337)) 108 | 109 | while True: 110 | try: 111 | time.sleep(1) 112 | except KeyboardInterrupt: 113 | print 'Bye !' 114 | break 115 | 116 | # Supplying a native routine 117 | 118 | from pyebpf.ebpf_wrapper import EBPFWrapper 119 | 120 | prog = ''' 121 | int hello(struct pt_regs* ctx) { 122 | bpf_trace_printk("Hello from eBPF routine!\\n"); 123 | return 0; 124 | } 125 | ''' 126 | b = EBPFWrapper(text=prog) 127 | b.attach_kprobe(event='sys_open', fn_name='hello') 128 | 129 | while True: 130 | try: 131 | print b.trace_fields() 132 | except KeyboardInterrupt: 133 | print 'Bye !' 134 | break 135 | ``` 136 | 137 | ### eBPF-related resources 138 | 139 | Here are a few eBPF-related resources that I found useful while writing this library: 140 | 141 | 1. http://www.brendangregg.com/ebpf.html 142 | 2. https://bolinfest.github.io/opensnoop-native 143 | 3. https://github.com/iovisor/bcc 144 | 4. 
https://lwn.net/Articles/740157/ -------------------------------------------------------------------------------- /pyebpf/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dany74q/pyebpf/fe51728894568ae49451da17eb56f49ea4d5723f/pyebpf/__init__.py -------------------------------------------------------------------------------- /pyebpf/ebpf_wrapper.py: -------------------------------------------------------------------------------- 1 | import ctypes as ct 2 | import logging 3 | import os 4 | import re 5 | import types 6 | # noinspection PyUnresolvedReferences 7 | from collections import OrderedDict 8 | # noinspection PyProtectedMember 9 | from threading import Thread, Event, _Event, current_thread 10 | 11 | from bcc import BPF 12 | 13 | from pyebpf.helpers import assert_type 14 | from pyebpf.normalizers import normalize_event 15 | 16 | 17 | class EBPFProgramDescriptor(object): 18 | """ A wrapper around a program text and its extra parameters as a ctypes data class """ 19 | 20 | def __init__(self, program_text, ctypes_data_class): 21 | # type: (types.StringTypes, types.TypeType) -> None 22 | """ 23 | :param program_text: str - BPF-program text 24 | :param ctypes_data_class: ctypes.Structure - A Data class representing any extra parameters for the ebpf routine 25 | """ 26 | assert_type(types.StringTypes, program_text=program_text) 27 | assert_type(types.TypeType, ctypes_data_class=ctypes_data_class) 28 | 29 | self.program_text = program_text 30 | self.ctypes_data_class = ctypes_data_class 31 | 32 | 33 | class AttachedKProbeDescriptor(object): 34 | """ A wrapper around an event, a BPF object and an optional managed polling thread """ 35 | 36 | def __init__(self, event, bpf, polling_thread_event=None): 37 | # type: (types.StringTypes, BPF, [Event]) -> None 38 | 39 | """ 40 | :param event: str - An event name 41 | :param bpf: BPF - A BPF object with a compiled module 42 | :param 
polling_thread_event: Event - An event to set once the kprobe is detached 43 | """ 44 | 45 | assert_type(types.StringTypes, event=event) 46 | assert_type(BPF, bpf=bpf) 47 | 48 | if polling_thread_event is not None: 49 | assert_type(_Event, polling_thread_event=polling_thread_event) 50 | 51 | self.event = normalize_event(event) 52 | self.bpf = bpf 53 | self.polling_thread_event = polling_thread_event 54 | 55 | 56 | class NativeArgument(object): 57 | """ Represents an argument that is passed to a native function. """ 58 | 59 | def __init__(self, native_type_as_string, name, ctypes_type): 60 | # type: (types.StringTypes, types.StringTypes, types.TypeType) -> None 61 | 62 | """ 63 | :param native_type_as_string: str - A string representing the c-type of the argument, e.g. 'unsigned int' 64 | :param name: str - The name of the argument 65 | :param ctypes_type: The type represented as a ctypes type 66 | """ 67 | assert_type(types.StringTypes, native_type_as_string=native_type_as_string) 68 | assert_type(types.StringTypes, name=name) 69 | assert_type(types.TypeType, ctypes_type=ctypes_type) 70 | 71 | self.native_type_as_string = native_type_as_string 72 | self.name = name 73 | self.ctypes_type = ctypes_type 74 | 75 | def is_string(self): 76 | # type: () -> types.BooleanType 77 | """ 78 | :return: True if native type is a char pointer 79 | :rtype: bool 80 | """ 81 | return NativeArgument.is_native_type_a_string(self.native_type_as_string) 82 | 83 | @staticmethod 84 | def is_native_type_a_string(native_type_as_string): 85 | # type: () -> types.BooleanType 86 | """ 87 | :param: native_type_as_string: str - A string representing a native type 88 | :return: True if native type is a char pointer 89 | :rtype: bool 90 | """ 91 | 92 | return 'char' in native_type_as_string and '*' in native_type_as_string 93 | 94 | def __repr__(self): 95 | return '{} {}'.format(self.native_type_as_string, self.name) 96 | 97 | def __str__(self): 98 | return self.__repr__() 99 | 100 | 101 | class 
EBPFWrapper(BPF): 102 | # The syscall format-file path template 103 | SYSCALL_FORMAT_FILE_TEMPLATE = os.getenv( 104 | 'SYSCALL_FORMAT_FILE_TEMPLATE', '/sys/kernel/debug/tracing/events/syscalls/sys_enter_{syscall_name}/format' 105 | ) 106 | 107 | # Max array size to be allocated when spotting strings as syscall arguments 108 | MAX_ARRAY_SIZE = int(os.getenv( 109 | 'MAX_ARRAY_SIZE', 128 110 | )) 111 | 112 | # eBPF arguments are capped, as they are passed via registers 113 | MAX_PASSED_ARGS = int(os.getenv('MAX_PASSED_ARGS', 6)) 114 | 115 | # Format-file fields at offsets below this value are skipped when parsing syscall arguments 116 | SKIP_SYSCALL_ARGS_TILL_OFFSET = int(os.getenv('SKIP_SYSCALL_ARGS_TILL_OFFSET', 16)) 117 | 118 | # Known syscall event aliases (e.g. open -> openat) 119 | SYSCALL_EVENT_ALIASES = { 120 | 'open': 'openat' 121 | } 122 | 123 | # Kprobe event syscall prefix 124 | KRPOBE_EVENT_SYSCALL_PREFIX = os.getenv('KRPOBE_EVENT_SYSCALL_PREFIX', 'do_') 125 | 126 | # Default data structure members 127 | # noinspection PyTypeChecker 128 | DEFAULT_DATA_STRUCTURE_MEMBERS = [ 129 | NativeArgument('u64', 'current_time_ns', ct.c_uint64), 130 | NativeArgument('u32', 'process_id', ct.c_uint32), 131 | NativeArgument('u32', 'thread_id', ct.c_uint32), 132 | NativeArgument('u32', 'group_id', ct.c_uint32), 133 | NativeArgument('u32', 'user_id', ct.c_uint32), 134 | NativeArgument('char*', 'process_name', (ct.c_char * 16)), 135 | ] 136 | 137 | # Maps a string representing a type, or a size in bytes, to a ctypes type 138 | C_TYPE_MAPPING = { 139 | 'char': ct.c_char, 140 | 'short': ct.c_short, 141 | 'int': ct.c_int, 142 | 'long': ct.c_long, 143 | 'float': ct.c_float, 144 | 'double': ct.c_double, 145 | 146 | 'umode_t': ct.c_ushort, 147 | 'unsigned char': ct.c_ubyte, 148 | 'unsigned short': ct.c_ushort, 149 | 'unsigned int': ct.c_uint, 150 | 'unsigned long': ct.c_ulong, 151 | 'unsigned long long': ct.c_ulonglong, 152 | 153 | 1: ct.c_char, 154 | 2: ct.c_short, 155 | 4: ct.c_int32, 156 | 8: ct.c_int64, 
157 | 16: ct.c_int64 158 | } 159 | 160 | # Dummy program that is used in order to initialize internal data structures of parent 161 | _DUMMY_PROGRAM = 'int dummy(struct pt_regs* ctx) { return 0; }' 162 | 163 | logger = logging.getLogger('ebpf_wrapper') 164 | 165 | # noinspection PyMissingConstructor 166 | def __init__(self, **kwargs): 167 | # type: (**types.ObjectType) -> None 168 | 169 | self._attached_kprobes = {} 170 | 171 | # Pop log_level so it is not forwarded to BPF.__init__, which does not accept it 172 | log_level = kwargs.pop('log_level', logging.INFO) 173 | self.logger.setLevel(log_level) 174 | if kwargs: 175 | self.logger.debug('Arguments were passed during init - will instantiate parent') 176 | super(EBPFWrapper, self).__init__(**kwargs) 177 | else: 178 | super(EBPFWrapper, self).__init__(text=self._DUMMY_PROGRAM) 179 | 180 | def detach_kprobe(self, event): 181 | # type: (types.StringTypes) -> None 182 | """ 183 | Detaches kprobes associated with the event name. 184 | 185 | :param event: str - Event name to detach kprobes of 186 | """ 187 | assert_type(types.StringTypes, event=event) 188 | 189 | event = normalize_event(event) 190 | 191 | if event not in self._attached_kprobes: 192 | self.logger.info('{} is not an attached kprobe'.format(event)) 193 | return 194 | 195 | kprobes_for_event = self._attached_kprobes.get(event, []) 196 | kprobes_for_event_copy = list(kprobes_for_event) 197 | 198 | for idx, descriptor in enumerate(kprobes_for_event_copy): 199 | # noinspection PyBroadException 200 | try: 201 | self.logger.info('Detaching kprobe with event={} bpf_idx={}'.format(event, idx)) 202 | if descriptor.bpf == self: 203 | # noinspection PyUnresolvedReferences 204 | super(EBPFWrapper, self).detach_kprobe(event) 205 | else: 206 | descriptor.bpf.detach_kprobe(event) 207 | 208 | if descriptor.polling_thread_event is not None: 209 | descriptor.polling_thread_event.set() 210 | # Remove by identity; indices into the copy go stale once the original list shrinks 211 | kprobes_for_event.remove(descriptor) 212 | except Exception: 213 | self.logger.exception('Failed detaching kprobe for event={} bpf_idx={}; Continuing'.format(event, 
idx)) 214 | 215 | if not kprobes_for_event: 216 | self._attached_kprobes.pop(event) 217 | 218 | # noinspection PyMethodOverriding 219 | def attach_kprobe(self, event, fn=None, implicitly_add_syscall_args=True, **kwargs): 220 | # type: (types.StringTypes, types.FunctionType, types.BooleanType, **types.ObjectType) -> None 221 | """ 222 | Attaches a kernel probe to an event, optionally invoking a given python function as its callback 223 | 224 | :param event: Name of the event to attach to 225 | :param fn: Python function to invoke; if omitted, the parent's attach_kprobe is used as-is 226 | :param implicitly_add_syscall_args: If True, will try to implicitly generate the syscall arguments, 227 | copying them from our ebpf routine back to user-space 228 | """ 229 | assert_type(types.StringTypes, event=event) 230 | if fn is not None: 231 | assert_type(types.FunctionType, fn=fn) 232 | 233 | event = normalize_event(event) 234 | attached_kprobe_descriptor = None 235 | 236 | try: 237 | if not fn: 238 | self.logger.debug('Function was not passed - falling back to default implementation') 239 | attached_kprobe_descriptor = AttachedKProbeDescriptor(event, self) 240 | return super(EBPFWrapper, self).attach_kprobe(event, **kwargs) 241 | 242 | event_without_syscall_prefix = None 243 | 244 | # noinspection PyUnresolvedReferences 245 | for syscall_prefix in self._syscall_prefixes: 246 | syscall_prefix = syscall_prefix.lower() 247 | if event.startswith(syscall_prefix) or \ 248 | event.startswith(self.KRPOBE_EVENT_SYSCALL_PREFIX + syscall_prefix): 249 | event_without_syscall_prefix = event \ 250 | .replace(syscall_prefix, '') \ 251 | .replace(self.KRPOBE_EVENT_SYSCALL_PREFIX, '') 252 | break 253 | 254 | if event_without_syscall_prefix is None: 255 | self.logger.debug('Event is not prefixed with a known syscall prefix - ' 256 | 'falling back to default implementation') 257 | attached_kprobe_descriptor = AttachedKProbeDescriptor(event, self) 258 | return super(EBPFWrapper, self).attach_kprobe(event, **kwargs) 259 | 260 | event_without_syscall_prefix = 
self._replace_event_alias_if_needed(event_without_syscall_prefix) 261 | attached_kprobe_descriptor = self._attach_kprobe_with_managed_polling_thread(event, 262 | event_without_syscall_prefix, 263 | fn, 264 | implicitly_add_syscall_args) 265 | finally: 266 | if attached_kprobe_descriptor is not None: 267 | self._attached_kprobes.setdefault(event, []).append(attached_kprobe_descriptor) 268 | 269 | def _replace_event_alias_if_needed(self, event_without_syscall_prefix): 270 | # type: (types.StringTypes) -> types.StringTypes 271 | """ 272 | Follows known event aliases (if any) and replaces them, or return the origin event 273 | 274 | :param event_without_syscall_prefix: str - The event without a syscall prefix 275 | :return: str - The dereferenced event from the alias, or the original event if no alias was found 276 | """ 277 | assert_type(types.StringTypes, event_without_syscall_prefix=event_without_syscall_prefix) 278 | 279 | event_alias = self.SYSCALL_EVENT_ALIASES.get(event_without_syscall_prefix, None) 280 | if event_alias is not None: 281 | self.logger.debug('Event is an alias, will replace "{}" with "{}"'.format(event_without_syscall_prefix, 282 | event_alias)) 283 | event_without_syscall_prefix = event_alias 284 | 285 | return event_without_syscall_prefix 286 | 287 | def _attach_kprobe_with_managed_polling_thread(self, event, event_without_syscall_prefix, fn, 288 | implicitly_add_syscall_args): 289 | # type: (types.StringTypes, types.StringTypes, types.FunctionType, types.BooleanType) -> AttachedKProbeDescriptor 290 | """ 291 | Attaches a kprobe with a given event, and spawns a daemon thread that polls on the kprobe map and calls 292 | the passed function as a callback. 
293 | 294 | :param event: str - An event 295 | :param event_without_syscall_prefix: str - An event without its syscall prefix 296 | :param fn: function - A function to call once the kprobe was invoked 297 | :param implicitly_add_syscall_args: If True, will try to implicitly generate the syscall arguments 298 | :return: An AttachedKProbeDescriptor object that wraps the event, the bpf object and the polling thread 299 | :rtype: AttachedKProbeDescriptor 300 | """ 301 | assert_type(types.StringTypes, event=event) 302 | assert_type(types.StringTypes, event_without_syscall_prefix=event_without_syscall_prefix) 303 | assert_type(types.FunctionType, fn=fn) 304 | 305 | function_name = fn.__name__ 306 | self.logger.debug('Attaching kprobe to event={}'.format(event)) 307 | program_descriptor = self._generate_program_descriptor(event_without_syscall_prefix, function_name, 308 | implicitly_add_syscall_args) 309 | bpf = BPF(text=program_descriptor.program_text) 310 | bpf.attach_kprobe(event=event, fn_name=function_name) 311 | 312 | detach_event = Event() 313 | t = Thread(target=self._read_buffer_pool, name='{}::{}_thread'.format(event, function_name), 314 | args=(bpf, fn, program_descriptor.ctypes_data_class, detach_event)) 315 | t.setDaemon(True) 316 | t.start() 317 | 318 | return AttachedKProbeDescriptor(event, bpf, detach_event) 319 | 320 | def _generate_program_descriptor(self, event, function_name, implicitly_add_syscall_args): 321 | # type: (types.StringTypes, types.StringTypes) -> EBPFProgramDescriptor 322 | """ 323 | :param event: str - An event / syscall name 324 | :param function_name: A function name (The ebpf function will use this name) 325 | :param implicitly_add_syscall_args: If True, will try to implicitly generate the syscall arguments 326 | :return: A program text that copies all of the syscall parameters back to user-space 327 | :rtype: BPFProgramDescriptor 328 | """ 329 | assert_type(types.StringTypes, event=event) 330 | assert_type(types.StringTypes, 
function_name=function_name) 331 | 332 | syscall_args = [] 333 | if not implicitly_add_syscall_args: 334 | self.logger.debug('Implicit syscall argument generation is disabled') 335 | else: 336 | syscall_args = self._get_syscall_arguments(event) 337 | 338 | program_text = ''' 339 | #include 340 | #include 341 | #include 342 | 343 | BPF_PERF_OUTPUT(events); 344 | ''' 345 | 346 | data_struct = self._generate_data_struct(syscall_args) 347 | program_text += '{data_struct}'.format(data_struct=data_struct) 348 | 349 | data_struct_copy_body = self._generate_data_struct_copy_body(syscall_args) 350 | function_signature = self._generate_function_signature(function_name, syscall_args) 351 | program_text += ''' 352 | %(func_signature)s { 353 | struct data_t data = {}; 354 | 355 | %(data_struct_copy_body)s 356 | events.perf_submit(ctx, &data, sizeof(data)); 357 | return 0; 358 | } 359 | ''' % dict(func_signature=function_signature, data_struct_copy_body=data_struct_copy_body) 360 | 361 | data_class = self._generate_data_class(syscall_args) 362 | return EBPFProgramDescriptor(program_text, data_class) 363 | 364 | def _get_syscall_arguments(self, syscall_name, skip_till_offset=SKIP_SYSCALL_ARGS_TILL_OFFSET): 365 | # type: (types.StringTypes, types.IntType) -> types.ListType 366 | """ 367 | 368 | :param syscall_name: str - The name of the syscall to get the arguments for 369 | :param skip_till_offset: An offset that all arguments (inclusive) till it, are skipped 370 | :return: A list of NativeArgument objects, representing the syscall arguments, or None, if parsing fails 371 | :rtype: list 372 | """ 373 | 374 | assert_type(types.StringTypes, syscall_name=syscall_name) 375 | assert_type(types.IntType, skip_till_offset=skip_till_offset) 376 | 377 | format_file_path = self.SYSCALL_FORMAT_FILE_TEMPLATE.format(syscall_name=syscall_name) 378 | self.logger.debug('Will try to read syscall format file={}'.format(format_file_path)) 379 | 380 | # noinspection PyBroadException 381 | try: 382 | 
args = [] 383 | 384 | with open(format_file_path) as f: 385 | for line in f: 386 | stripped = line.strip() 387 | if not stripped.startswith('field:'): 388 | continue 389 | 390 | format_parts = [x.strip() for x in stripped.replace('field:', '').split(';') if x.strip()] 391 | assert len(format_parts) == 4, 'Failed parsing syscall field format; ' \ 392 | 'Expected 4 parts separated by ";"' 393 | 394 | self.logger.debug('Parsing field line={}'.format(line)) 395 | 396 | offset = format_parts[1] 397 | assert offset.startswith('offset:'), 'Expected offset part to start with "offset:"' 398 | 399 | offset_match = re.findall(r'\d+', offset) 400 | assert offset_match and len(offset_match) == 1, 'Expected offset part to contain a single number' 401 | 402 | offset_num = int(offset_match[0]) 403 | if offset_num < skip_till_offset: 404 | self.logger.debug('Skipping field with offset={}'.format(offset_num)) 405 | continue 406 | 407 | size = format_parts[2] 408 | assert size.startswith('size:'), 'Expected size part to start with "size:"' 409 | 410 | size_match = re.findall(r'\d+', size) 411 | assert size_match and len(size_match) == 1, 'Expected size part to contain a single number' 412 | 413 | type_size = int(size_match[0]) 414 | 415 | type_and_name = format_parts[0] 416 | splat = type_and_name.split(' ') 417 | assert len(splat) >= 2, 'Expected type and name part to contain at least two space-separated tokens' 418 | 419 | c_type = ' '.join(splat[:-1]) 420 | name = splat[-1] 421 | 422 | native_arg = NativeArgument(c_type, name, self._resolve_ctype(c_type, type_size)) 423 | self.logger.debug('Parsed native arg={}'.format(native_arg)) 424 | args.append(native_arg) 425 | 426 | if len(args) > self.MAX_PASSED_ARGS: 427 | self.logger.warn("We've populated the maximum number of arguments ({}); " 428 | "Will return a partial argument list".format(self.MAX_PASSED_ARGS)) 429 | break 430 | except Exception as e: 431 | self.logger.error('Failed parsing syscall format file from path={} err={}; ' 432 | 'will 
return an empty list'.format(format_file_path, e.message)) 433 | return [] 434 | 435 | return args 436 | 437 | def _generate_data_struct(self, syscall_args): 438 | # type: (types.ListType) -> types.StringTypes 439 | """ 440 | Generates a string representing the data structure that is copied from kernel-space to user-space. 441 | 442 | :param syscall_args: List of syscall args 443 | :return: Data structure string to be shared from kernel space to user space (Could be empty) 444 | :rtype: str 445 | """ 446 | assert_type(types.ListType, syscall_args=syscall_args) 447 | 448 | data_struct_template = ''' 449 | struct data_t { 450 | %(syscall_args)s; 451 | }; 452 | 453 | ''' 454 | 455 | formatted_args = OrderedDict() 456 | for arg in (self.DEFAULT_DATA_STRUCTURE_MEMBERS + syscall_args): 457 | if arg.name in formatted_args: 458 | self.logger.warn('Duplicate arg found - will take the last match') 459 | 460 | if arg.is_string(): 461 | self.logger.debug('Spotted a string - implicitly converting to char array') 462 | # noinspection PyBroadException 463 | try: 464 | # noinspection PyUnresolvedReferences, PyProtectedMember 465 | array_size = arg.ctypes_type._length_ 466 | except Exception: 467 | # Best effort: the ctypes type exposes no array length, so fall back to the default size 468 | array_size = self.MAX_ARRAY_SIZE 469 | 470 | # noinspection PyTypeChecker 471 | arg = NativeArgument( 472 | 'char', 473 | '{arg_name}[{array_size}]'.format(arg_name=arg.name, array_size=array_size), 474 | ct.c_char * array_size 475 | ) 476 | formatted_args[arg.name] = str(arg) 477 | 478 | return data_struct_template % dict(syscall_args=';\n '.join(formatted_args.values())) 479 | 480 | def _generate_data_struct_copy_body(self, syscall_args): 481 | # type: (types.ListType) -> types.StringTypes 482 | """ 483 | Generates a string representing the copying of the syscall arguments to our shared data-structure 484 | 485 | :param syscall_args: List of syscall args 486 | :return: A string representing all copying to be made from syscall arguments 487 | 
(that are passed to our ebpf handler) back to user mode 488 | :rtype: str 489 | """ 490 | assert_type(types.ListType, syscall_args=syscall_args) 491 | 492 | body = ''' 493 | data.current_time_ns = bpf_ktime_get_ns(); 494 | data.process_id = bpf_get_current_pid_tgid() >> 32; 495 | data.thread_id = (u32) bpf_get_current_pid_tgid(); 496 | data.group_id = bpf_get_current_uid_gid() >> 32; 497 | data.user_id = (u32) bpf_get_current_uid_gid(); 498 | bpf_get_current_comm(&data.process_name, sizeof(data.process_name)); 499 | ''' 500 | 501 | for arg in syscall_args: 502 | if arg.is_string(): 503 | self.logger.debug('Spotted a string - will use bpf_probe_read to copy it') 504 | body += 'bpf_probe_read(&data.{arg_name}, sizeof(data.{arg_name}), (void*){arg_name});\n '.format( 505 | arg_name=arg.name) 506 | else: 507 | body += 'data.{arg_name} = {arg_name};\n '.format(arg_name=arg.name) 508 | 509 | return body 510 | 511 | def _generate_data_class(self, syscall_args): 512 | # type: (types.ListType) -> types.TypeType 513 | 514 | """ 515 | Generates a ctypes data-class given the syscall arguments 516 | 517 | :param syscall_args: list - A list of NativeArgument objects, representing syscall arguments 518 | that are passed to our ebpf routine 519 | :return: A ctypes class holding the members that are copied from our ebpf routine back to user-space 520 | :rtype: type 521 | """ 522 | assert_type(types.ListType, syscall_args=syscall_args) 523 | fields = [(arg.name, arg.ctypes_type) for arg in (self.DEFAULT_DATA_STRUCTURE_MEMBERS + syscall_args)] 524 | 525 | # noinspection PyUnresolvedReferences 526 | class Data(ct.Structure): 527 | _fields_ = fields 528 | 529 | self.logger.debug('Generated data class fields: {}'.format(fields)) 530 | 531 | return Data 532 | 533 | # noinspection PyTypeChecker 534 | def _resolve_ctype(self, type_as_string, type_size): 535 | # type: (types.StringTypes, types.IntType) -> types.TypeType 536 | 537 | """ 538 | Given a string representing a type, and the type size - 
resolve a ctypes type 539 | 540 | :param type_as_string: str - A string representing a native type 541 | :param type_size: The size of the underlying native type 542 | :return: A ctypes type representing the native type 543 | :rtype: type 544 | """ 545 | assert_type(types.StringTypes, type_as_string=type_as_string) 546 | assert_type(types.IntType, type_size=type_size) 547 | 548 | # Qualifiers are irrelevant 549 | type_as_string = type_as_string.replace('const', '').strip() 550 | 551 | if NativeArgument.is_native_type_a_string(type_as_string): 552 | self.logger.debug('Spotted a string - will use an array instead') 553 | return ct.c_char * self.MAX_ARRAY_SIZE 554 | else: 555 | return self.C_TYPE_MAPPING.get( 556 | type_as_string, 557 | self.C_TYPE_MAPPING.get(type_size, ct.c_int) 558 | ) 559 | 560 | @staticmethod 561 | def _generate_function_signature(function_name, syscall_args): 562 | # type: (types.StringTypes, types.ListType) -> types.StringTypes 563 | """ 564 | 565 | :param function_name: The name of the ebpf function 566 | :param syscall_args: List of syscall arguments 567 | :return: A string representing the ebpf function signature (Including the syscall arguments, and the mandatory 568 | registers context struct) 569 | :rtype: str 570 | """ 571 | assert_type(types.StringTypes, function_name=function_name) 572 | assert_type(types.ListType, syscall_args=syscall_args) 573 | 574 | syscall_args = map(lambda x: str(x), syscall_args) 575 | base_function_signature = 'int {function_name} (struct pt_regs* ctx'.format(function_name=function_name) 576 | if syscall_args: 577 | base_function_signature += ', {syscall_args}'.format(syscall_args=', '.join(syscall_args)) 578 | 579 | base_function_signature += ')' 580 | return base_function_signature 581 | 582 | def _read_buffer_pool(self, bpf, callback, data_class, detach_event): 583 | # type: (BPF, types.FunctionType, types.TypeType, _Event) -> None 584 | 585 | """ 586 | A routine that will be called from a separate thread, 
that will call a given python callback, till 587 | detach_event is set. 588 | 589 | :param bpf: A BPF instance 590 | :param callback: A callback to call whenever our kprobe is invoked 591 | :param data_class: A ctypes data class to pass to the routine as a kwarg 592 | :param detach_event: An event to check against, once this is set, our thread will stop polling 593 | """ 594 | assert_type(BPF, bpf=bpf) 595 | assert_type(types.FunctionType, callback=callback) 596 | assert_type(types.TypeType, data_class=data_class) 597 | assert_type(_Event, detach_event=detach_event) 598 | 599 | thread_name = current_thread().getName() 600 | 601 | def call_callback(cpu, data, size): 602 | # type: (types.IntType, types.TypeType, types.IntType) -> None 603 | 604 | """ 605 | On every BPF map poll, we'll call this wrapper, which will call the passed callback. 606 | 607 | :param cpu: CPU # 608 | :param data: Shared data class from our EBPF routine 609 | :param size: Size of data class, in bytes 610 | """ 611 | # noinspection PyUnresolvedReferences,PyBroadException 612 | try: 613 | # noinspection PyUnresolvedReferences 614 | data = ct.cast(data, ct.POINTER(data_class)).contents 615 | except Exception: 616 | # Best-effort cast - on failure, the raw data is passed to the callback as-is 617 | pass 618 | 619 | if not detach_event.is_set(): 620 | # noinspection PyBroadException 621 | try: 622 | callback(cpu=cpu, data=data, size=size) 623 | except Exception: 624 | # Best-effort callback 625 | self.logger.exception('Failed calling callback') 626 | 627 | # noinspection PyUnresolvedReferences 628 | bpf['events'].open_perf_buffer(call_callback) 629 | while not detach_event.is_set(): 630 | # noinspection PyBroadException 631 | try: 632 | # noinspection PyUnresolvedReferences 633 | bpf.perf_buffer_poll() 634 | except Exception: 635 | # Best-effort polling, if we fail - we should kill the thread 636 | self.logger.exception('Failed polling from bpf map') 637 | break 638 | 639 | self.logger.info('Polling thread is detached 
name={}'.format(thread_name)) 640 | -------------------------------------------------------------------------------- /pyebpf/helpers.py: -------------------------------------------------------------------------------- 1 | import types 2 | 3 | 4 | def assert_type(expected_type, **kwargs): 5 | # type: (types.TypeType|types.TupleType[types.TypeType], **types.ObjectType) -> None 6 | 7 | for k, v in kwargs.iteritems(): 8 | assert isinstance(v, expected_type), 'Expected {} to be of type {}, actually got {}'.format(k, 9 | expected_type, 10 | type(v)) 11 | -------------------------------------------------------------------------------- /pyebpf/normalizers.py: -------------------------------------------------------------------------------- 1 | import types 2 | 3 | 4 | def normalize_event(event): 5 | # type: (types.StringTypes) -> types.StringTypes 6 | return event.lower().strip() 7 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from io import open 2 | from os import path 3 | 4 | from setuptools import setup, find_packages 5 | 6 | here = path.abspath(path.dirname(__file__)) 7 | 8 | with open(path.join(here, 'README.md'), encoding='utf-8') as f: 9 | long_description = f.read() 10 | 11 | setup( 12 | name='pyebpf', 13 | version='1.0.4', 14 | description='A bcc-based Python eBPF (Extended-Berkeley-Packet-Filter) wrapper', 15 | long_description=long_description, # Optional 16 | long_description_content_type='text/markdown', # Optional (see note above) 17 | url='https://github.com/dany74q/pyebpf', 18 | author='Danny Shemesh (dany74q)', 19 | author_email='dany74q@gmail.com', 20 | classifiers=[ 21 | 'Development Status :: 4 - Beta', 22 | 'Intended Audience :: Developers', 23 | 'License :: OSI Approved :: MIT License', 24 | 'Programming Language :: Python :: 2', 25 | 'Programming Language :: Python :: 2.7', 26 | 'Programming Language :: Python :: 3', 27 
| 'Programming Language :: Python :: 3.4', 28 | 'Programming Language :: Python :: 3.5', 29 | 'Programming Language :: Python :: 3.6', 30 | 'Programming Language :: Python :: 3.7', 31 | ], 32 | keywords='bpf ebpf', 33 | packages=find_packages(), 34 | project_urls={ # Optional 35 | 'Bug Reports': 'https://github.com/dany74q/pyebpf/issues', 36 | 'Source': 'https://github.com/dany74q/pyebpf/', 37 | } 38 | ) 39 | --------------------------------------------------------------------------------
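The `_generate_data_class` and `call_callback` routines in `pyebpf/ebpf_wrapper.py` rely on one pattern: build a `ctypes.Structure` subclass at runtime from a list of `(name, ctype)` pairs, then reinterpret the raw pointer handed to the perf-buffer callback as that structure. The sketch below shows that pattern in isolation, without BCC or a kernel probe. The field list, `generate_data_class`, and `cast_raw_event` are hypothetical names chosen for illustration and are not part of pyebpf; the "kernel side" is simulated by filling a structure in user space.

```python
import ctypes as ct

# Hypothetical field list; in pyebpf the pairs come from NativeArgument objects.
FIELDS = [
    ('process_id', ct.c_uint32),
    ('process_name', ct.c_char * 16),
    ('flags', ct.c_int),
]


def generate_data_class(fields):
    """Build a ctypes Structure subclass whose memory layout follows `fields`."""
    class Data(ct.Structure):
        _fields_ = fields
    return Data


def cast_raw_event(raw_pointer, data_class):
    """Reinterpret an untyped pointer as an instance of `data_class`."""
    return ct.cast(raw_pointer, ct.POINTER(data_class)).contents


Data = generate_data_class(FIELDS)

# Simulate the producer: fill a structure and expose it as a void pointer,
# which is roughly what a perf-buffer callback receives as its `data` argument.
source = Data(process_id=1234, process_name=b'bash', flags=2)
raw = ct.cast(ct.pointer(source), ct.c_void_p)

event = cast_raw_event(raw, Data)
print(event.process_id, event.process_name, event.flags)
```

Because the structure layout is derived from the same field list that produces the C struct on the eBPF side, the two definitions need to agree on order and size; a mismatch would not raise an error, it would only yield misread fields.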