├── README.md ├── command-client.py ├── command-server.py ├── presentation ├── Debugging across pipes and sockets with strace.md └── Debugging across pipes and sockets with strace.pdf ├── unprivileged-dmesg.py └── unprivileged-ls.py /README.md: -------------------------------------------------------------------------------- 1 | # strace-pipes-presentation 2 | 3 | Presentation: _**Debugging across pipes and sockets with strace**_ 4 | 5 | See the `presentation` directory for Markdown and PDF slides. 6 | 7 | ## License 8 | 9 | * all code under [MIT](https://opensource.org/licenses/MIT) 10 | * presentation under [CC-BY](https://creativecommons.org/licenses/by/4.0/) 11 | -------------------------------------------------------------------------------- /command-client.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | 3 | from __future__ import print_function 4 | import socket 5 | import sys 6 | 7 | name = sys.argv[1] 8 | 9 | sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 10 | sock.connect(("localhost", 1234)) 11 | 12 | sock.sendall(name.encode('utf-8')) 13 | 14 | while True: 15 | data = sock.recv(100) 16 | if len(data) == 0: 17 | break 18 | sys.stdout.write(data) 19 | -------------------------------------------------------------------------------- /command-server.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | 3 | from __future__ import print_function 4 | import socket 5 | import subprocess 6 | from subprocess import PIPE 7 | 8 | serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 9 | serversocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) 10 | serversocket.bind(("localhost", 1234)) 11 | serversocket.listen(5) 12 | 13 | def run_command_for_client(command, clientsocket): 14 | if command in ["ls", "dmesg"]: 15 | p = subprocess.Popen([command], stdout=PIPE) 16 | p.wait() 17 | out = p.stdout.read() 18 | else: 
19 | out = b"command not allowed\n" 20 | clientsocket.sendall(out) 21 | 22 | # Server loop 23 | while True: 24 | (clientsocket, address) = serversocket.accept() 25 | 26 | command = clientsocket.recv(100).decode('utf-8') 27 | print("server got command: " + command) 28 | 29 | run_command_for_client(command, clientsocket) 30 | 31 | clientsocket.close() 32 | -------------------------------------------------------------------------------- /presentation/Debugging across pipes and sockets with strace.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Debugging across pipes and sockets with `strace` 4 | 5 | ## Niklas Hambüchen, _FP Complete_ 6 | 7 | 8 | 9 | 10 | --- 11 | 12 | # Scenario 13 | 14 | #### Privileged-server / unprivileged-client 15 | 16 | * Server, running as root, that runs a restricted set of commands (`ls`, `dmesg`) sent to it via a socket 17 | * Client that sends `argv[1]` to the server and prints the output 18 | 19 | #### Problem 20 | 21 | * `ls` case works fine 22 | * `dmesg` case just hangs without output 23 | 24 | #### Further complication 25 | 26 | * Assume this is hard to reproduce (it only happens every 1000th time), so you really want to debug it on the currently hanging system without restarting any processes. 
27 | 28 | --- 29 | 30 | ## Code 31 | 32 | Command server `command-server.py`: 33 | 34 | ```python 35 | #!/usr/bin/env python2 36 | 37 | from __future__ import print_function 38 | import socket 39 | import subprocess 40 | from subprocess import PIPE 41 | 42 | serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 43 | serversocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) 44 | serversocket.bind(("localhost", 1234)) 45 | serversocket.listen(5) 46 | ``` 47 | 48 | --- 49 | 50 | `command-server.py` continued: 51 | 52 | ```python 53 | def run_command_for_client(command, clientsocket): 54 | if command in ["ls", "dmesg"]: 55 | p = subprocess.Popen([command], stdout=PIPE) 56 | p.wait() 57 | out = p.stdout.read() 58 | else: 59 | out = b"command not allowed\n" 60 | clientsocket.sendall(out) 61 | 62 | # Server loop 63 | while True: 64 | (clientsocket, address) = serversocket.accept() 65 | 66 | command = clientsocket.recv(100).decode('utf-8') 67 | print("server got command: " + command) 68 | 69 | run_command_for_client(command, clientsocket) 70 | 71 | clientsocket.close() 72 | ``` 73 | 74 | --- 75 | 76 | Command client `command-client.py`: 77 | 78 | ```python 79 | #!/usr/bin/env python2 80 | 81 | from __future__ import print_function 82 | import socket 83 | import sys 84 | 85 | name = sys.argv[1] 86 | 87 | sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 88 | sock.connect(("localhost", 1234)) 89 | 90 | sock.sendall(name.encode('utf-8')) 91 | 92 | while True: 93 | data = sock.recv(100) 94 | if len(data) == 0: 95 | break 96 | sys.stdout.write(data) 97 | ``` 98 | 99 | --- 100 | 101 | Command wrappers: 102 | 103 | `unprivileged-ls.py` 104 | 105 | ```python 106 | #!/usr/bin/env python2 107 | 108 | from __future__ import print_function 109 | import subprocess 110 | from subprocess import PIPE 111 | import sys 112 | 113 | 114 | p = subprocess.Popen(['./command-client.py', "ls"], 115 | stdout=PIPE) 116 | 117 | sys.stdout.write(p.communicate()[0]) 118 | ``` 
119 | 120 | `unprivileged-dmesg.py` (same thing with `dmesg`) 121 | 122 | ```python 123 | .... 124 | p = subprocess.Popen(['./command-client.py', "dmesg"], ... 125 | ``` 126 | 127 | --- 128 | 129 | ## Program communication 130 | 131 | ``` 132 | ./unprivileged-dmesg.py 133 | | | 134 | | | stdout pipe 135 | | | 136 | ./command-client.py 137 | | 138 | | TCP socket 139 | | 140 | ./command-server.py 141 | | | 142 | | | stdout pipe 143 | | | 144 | dmesg executable 145 | ``` 146 | 147 | --- 148 | 149 | ## Outputs 150 | 151 | ```plain 152 | % ./command-server.py 153 | server got command: ls 154 | server got command: dmesg 155 | ``` 156 | 157 | ```plain 158 | % ./unprivileged-ls.py 159 | command-client.py 160 | command-server.py 161 | unprivileged-dmesg.py 162 | unprivileged-ls.py 163 | ``` 164 | 165 | ```plain 166 | % ./unprivileged-dmesg.py 167 | [hangs] 168 | ``` 169 | 170 | --- 171 | 172 | > 95% of computer problems can be solved with `strace`. 173 | 174 | _me_ 175 | 176 | --- 177 | 178 | # strace part 179 | 180 | --- 181 | 182 | ## `strace` (on the process that hangs) 183 | 184 | ``` 185 | % sudo strace -fp "$(pgrep -f 'unprivileged-dmesg')" 186 | strace: Process 23572 attached 187 | read(3, 188 | ``` 189 | 190 | Add `-y` (_prints paths associated with file descriptor arguments_): 191 | 192 | ``` 193 | % sudo strace -fp "$(pgrep -f 'unprivileged-dmesg')" -y 194 | strace: Process 23572 attached 195 | read(3<pipe:[139828372]>, 196 | ``` 197 | 198 | The pipe number it's trying to read from is `139828372`. 199 | 200 | Let's chase down that pipe. 201 | 202 | --- 203 | 204 | ## Chasing pipes with `lsof` 205 | 206 | ``` 207 | % lsof -n -P | grep --color '139828372' 208 | COMMAND PID .. FD TYPE DEVICE .. NODE NAME 209 | python2 23572 .. 3r FIFO 0,12 .. 139828372 pipe 210 | python2 23573 .. 1w FIFO 0,12 .. 139828372 pipe 211 | ``` 212 | 213 | `3r` means process `23572` has a read-end of the pipe open as file descriptor `3`. See: 214 | 215 | ``` 216 | % ls -l /proc/23572/fd/3 217 | ... 
/proc/23572/fd/3 -> pipe:[139828372] 218 | ``` 219 | 220 | `1w` means process `23573` has a write-end of the pipe open as file descriptor `1`. 221 | 222 | **So the only possible producer to unblock our `read(3,` in `strace` is process `23573`.** 223 | 224 | --- 225 | 226 | Let's `strace` the process that has the write end: 227 | 228 | ``` 229 | % sudo strace -fp 23573 -y 230 | strace: Process 23573 attached 231 | recvfrom(3<socket:[139825870]>, 232 | ``` 233 | 234 | It's blocked reading from a socket. 235 | 236 | Let's chase down that socket. 237 | 238 | --- 239 | 240 | ## Chasing sockets with `lsof` 241 | 242 | ``` 243 | % lsof -n -P | grep --color '139825870' 244 | COMMAND PID .. FD TYPE DEVICE .. NODE NAME 245 | python2 23573 .. 3u IPv4 139825870 .. TCP 127.0.0.1:39392->127.0.0.1:1234 (ESTABLISHED) 246 | ``` 247 | 248 | `TCP 127.0.0.1:39392->127.0.0.1:1234 (ESTABLISHED)` 249 | 250 | Follow that TCP connection (potentially on another machine; in our case `127.0.0.1` is the same machine): 251 | 252 | ``` 253 | % lsof -n -P | grep --color '\b1234\b' 254 | COMMAND PID NODE NAME 255 | python2 23270 .. TCP 127.0.0.1:1234 (LISTEN) 256 | python2 23270 .. TCP 127.0.0.1:1234->127.0.0.1:39392 (ESTABLISHED) 257 | python2 23573 .. TCP 127.0.0.1:39392->127.0.0.1:1234 (ESTABLISHED) 258 | ``` 259 | 260 | So process `23270` has the other end of that socket. 261 | 262 | Let's `strace` it. 263 | 264 | --- 265 | 266 | ## General approach 267 | 268 | Find what a process is blocked reading from / writing to with `strace -y`. 269 | 270 | * If it's a **file**, `strace -y` will show it inline. 271 | * If it's a **pipe**, look up its number in `lsof`. 272 | Find PID that has the other end, strace that one. 273 | * If it's a **socket**, look up its number in `lsof`. 274 | Find PID (or host IP) that has the other end, strace that one. 
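
The **pipe** step above can also be sketched without `lsof`, by reading `/proc` directly. This is a minimal Python 3 sketch and not part of the presentation's repository; the helper names `fd_target` and `holders` are made up for illustration. It assumes Linux and relies on both ends of a pipe showing up with the same `pipe:[inode]` link target under `/proc/PID/fd`, as in the `ls -l /proc/23572/fd/3` output. It does not follow TCP connections: the two ends of a socket have different inode numbers, so matching on the address and port as in the `lsof` slide is still needed there.

```python
#!/usr/bin/env python3
"""Sketch: list every process that holds the same pipe as a given PID/FD."""
import os
import sys


def fd_target(pid, fd):
    """Return the link target of /proc/PID/fd/FD, e.g. 'pipe:[139828372]'."""
    return os.readlink("/proc/%s/fd/%s" % (pid, fd))


def holders(target):
    """Yield (pid, fd) for every readable descriptor that points at `target`."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = "/proc/%s/fd" % pid
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or we lack permission to look
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    yield int(pid), int(fd)
            except OSError:  # descriptor was closed while we were scanning
                continue


# Command-line use (hypothetical): <script> PID FD
if __name__ == "__main__" and len(sys.argv) == 3:
    target = fd_target(sys.argv[1], sys.argv[2])
    print(target)
    for pid, fd in holders(target):
        print("pid %d has it open as fd %d" % (pid, fd))
```

Saved under a hypothetical name such as `chase-fd.py`, it would be run as `sudo python3 chase-fd.py 23572 3`; root is needed to inspect descriptors of other users' processes. Unlike `lsof`, it does not say which end is the read end and which is the write end.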
275 | 276 | --- 277 | 278 | strace'ing the process that has the other end of the TCP socket 279 | (PID `23270` is `python2 ./command-server.py`): 280 | 281 | ``` 282 | % sudo strace -fp 23270 -y 283 | strace: Process 23270 attached 284 | wait4(32084, 285 | ``` 286 | 287 | Blocked on `wait4()`, so that's probably this code from the server: 288 | 289 | ```python 290 | p = subprocess.Popen([command], stdout=PIPE) 291 | p.wait() 292 | ``` 293 | 294 | We've found the bug: this code is incorrect, as [the Python `Popen()` documentation says on `wait()`](https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait): 295 | 296 | > This will deadlock when using stdout=PIPE ... and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. 297 | > Use Popen.communicate() when using pipes to avoid that. 298 | 299 | `dmesg` creates more output than fits in the pipe buffer, so it blocks writing to the pipe while the server blocks in `wait()` and never reads from it. The output of `ls` fits, which is why that case works. 300 | 301 | --- 302 | 303 | This was easy to identify because there's only one `wait()` invocation in the server. 304 | 305 | But how would we find the location of the problem if there were hundreds of `wait()` invocations in the server software, across many files? 306 | 307 | **In general, after you've chased down the problematic process using a series of straces, how do you find the userspace location issuing the blocking syscall?** 308 | 309 | --- 310 | 311 | > 95% of computer problems can be solved with `strace`. 312 | > For the remaining 4% there's `gdb`. 
313 | 314 | _me_ 315 | 316 | --- 317 | 318 | # GDB part 319 | 320 | --- 321 | 322 | ## Inspecting Python with GDB 323 | 324 | From [this deleted StackOverflow question](https://stackoverflow.com/a/25297075/263061) and [here](https://web.archive.org/web/20131218105608/https://www.python.org/~jeremy/weblog/031003.html) we learn: 325 | 326 | Switch to the frame in the stack which has function: 327 | 328 | ```c 329 | PyEval_EvalFrameEx (or eval_frame) # For Python < 3 330 | ``` 331 | To get the **file name**: 332 | 333 | ```c 334 | x/s ((PyStringObject*)f->f_code->co_filename)->ob_sval 335 | ``` 336 | 337 | To get the **function name**: 338 | 339 | ```c 340 | x/s ((PyStringObject*)f->f_code->co_name)->ob_sval 341 | ``` 342 | 343 | To get the **line number**: 344 | 345 | ```c 346 | print f->f_lineno 347 | ``` 348 | 349 | --- 350 | 351 | ## GDB preparation 352 | 353 | If your GDB shows 354 | 355 | ``` 356 | Reading symbols from /usr/bin/python2.7... 357 | (no debugging symbols found)...done. 358 | ``` 359 | 360 | then install debugging symbols, e.g. 361 | 362 | ```bash 363 | sudo apt-get install python2.7-dbg 364 | ``` 365 | 366 | ## GDB and blocking syscalls 367 | 368 | When a process is blocked on a syscall, GDB drops you into a shell at that syscall, and you can ask for the `backtrace` to see how the program got there. 369 | 370 | --- 371 | 372 | ## GDB run 373 | 374 | ``` 375 | % sudo gdb -p $(pgrep -f command-server) 376 | Attaching to process 23270 377 | 378 | 0x00007f61090aff2a in __waitpid (pid=2946, stat_loc=0x7ffcf7e32a5c, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29 379 | 380 | (gdb) backtrace 381 | #0 0x00007f61090aff2a in __waitpid (pid=2946, ...) 382 | #1 0x0000000000576dd6 in posix_waitpid.lto_priv () 383 | #2 0x00000000004c30ce in ext_do_call (...) 384 | #3 PyEval_EvalFrameEx () 385 | #4 0x00000000004b9ab6 in PyEval_EvalCodeEx () 386 | #5 0x00000000004c1e6f in fast_function (...) 387 | #6 call_function (...) 
388 | #7 PyEval_EvalFrameEx () 389 | #8 0x00000000004c136f in fast_function (...) 390 | ... 391 | ``` 392 | 393 | OK, stuck in `__waitpid()`. We are in stack frame `#0`. 394 | Let's go `up` to `PyEval_EvalFrameEx` ... 395 | 396 | --- 397 | 398 | ``` 399 | (gdb) up 400 | #1 0x0000000000576dd6 in posix_waitpid.lto_priv (...) 401 | 402 | (gdb) up 403 | #2 0x00000000004c30ce in ext_do_call (...) 404 | 405 | (gdb) up 406 | #3 PyEval_EvalFrameEx () 407 | ``` 408 | 409 | ``` 410 | (gdb) x/s ((PyStringObject*)f->f_code->co_filename)->ob_sval 411 | 0x7f6109388eac: "/usr/lib/python2.7/subprocess.py" 412 | 413 | (gdb) print f->f_lineno 414 | $1 = 473 415 | 416 | (gdb) x/s ((PyStringObject*)f->f_code->co_name)->ob_sval 417 | 0x7f610932ac94: "_eintr_retry_call" 418 | ``` 419 | 420 | We're at the bottom of the Python standard library, in a wrapper that loops around the `wait()` syscall ([code of subprocess.py:473](https://github.com/python/cpython/blob/v2.7.12/Lib/subprocess.py#L473)). 421 | Let's go further `up` until we're in our application's code. 422 | 423 | --- 424 | 425 | ``` 426 | (gdb) up 427 | #4 0x00000000004b9ab6 in PyEval_EvalCodeEx () 428 | (gdb) up 429 | #5 0x00000000004c1e6f in fast_function (...) 430 | (gdb) up 431 | #6 call_function (...) 432 | (gdb) up 433 | #7 PyEval_EvalFrameEx () 434 | 435 | (gdb) x/s ((PyStringObject*)f->f_code->co_filename)->ob_sval 436 | 0x7f610933fa2c: "/usr/lib/python2.7/subprocess.py" 437 | 438 | (gdb) up 439 | #8 0x00000000004c136f in fast_function (...) 440 | (gdb) up 441 | #9 call_function () 442 | (gdb) up 443 | #10 PyEval_EvalFrameEx () 444 | 445 | (gdb) x/s ((PyStringObject*)f->f_code->co_filename)->ob_sval 446 | 0x7f61093924b4: "./command-server.py" 447 | ``` 448 | 449 | This is the first/lowest stack frame that's in our code. 
450 | 451 | --- 452 | 453 | ``` 454 | (gdb) x/s ((PyStringObject*)f->f_code->co_name)->ob_sval 455 | 0x7f879c0d4454: "run_command_for_client" 456 | 457 | (gdb) print f->f_lineno 458 | $1 = 13 459 | ``` 460 | 461 | So we've tracked down the precise location in our Python program's userspace: 462 | 463 | It's the `wait()` call in `command-server.py`, in the function `run_command_for_client()` which starts at line `13`. 464 | 465 | --- 466 | 467 | # Summary 468 | 469 | 1. Investigate issues on the running system via syscalls using `strace`. 470 | 2. Chase through pipes, sockets and across machines, with `strace` and `lsof`. 471 | 3. Find the origin of the final syscall in your userspace program using `gdb` or a similar debugger. 472 | 473 | You can do this to debug hard-to-reproduce problems in production, even when you know very little about the programs you are debugging. 474 | 475 | System calls are the universal inspection point on Linux. 476 | 477 | _**Thanks!**_ 478 | -------------------------------------------------------------------------------- /presentation/Debugging across pipes and sockets with strace.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nh2/strace-pipes-presentation/0f5862ac1717cd4b4cb18209081efb7232724fd8/presentation/Debugging across pipes and sockets with strace.pdf -------------------------------------------------------------------------------- /unprivileged-dmesg.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | 3 | from __future__ import print_function 4 | import subprocess 5 | from subprocess import PIPE 6 | import sys 7 | 8 | 9 | p = subprocess.Popen(['./command-client.py', "dmesg"], stdout=PIPE) 10 | 11 | sys.stdout.write(p.communicate()[0]) 12 | -------------------------------------------------------------------------------- /unprivileged-ls.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | 3 | from __future__ import print_function 4 | import subprocess 5 | from subprocess import PIPE 6 | import sys 7 | 8 | 9 | p = subprocess.Popen(['./command-client.py', "ls"], stdout=PIPE) 10 | 11 | sys.stdout.write(p.communicate()[0]) 12 | --------------------------------------------------------------------------------