├── .gitignore ├── LICENSE.txt ├── Makefile ├── README.md ├── example └── master.py ├── pyseidon ├── __init__.py ├── client │ ├── Makefile │ ├── __init__.py │ ├── lru_cache.py │ ├── pyseidon │ ├── pyseidon-client │ └── pyseidon.c └── handlers │ └── __init__.py ├── setup.cfg └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | /build/ 2 | /client/pyseidon 3 | *.egg-info 4 | /dist 5 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Greg Brockman 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 
22 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | upload: 2 | rm -rf dist 3 | python setup.py sdist 4 | twine upload dist/* 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pyseidon: A fork-based "load-once, run many times" Python server 2 | 3 | ![Pyseidon: A fork-based "load-once, run many times" Python server](https://i.imgur.com/DwC22BK.png) 4 | 5 | Pyseidon implements a "load once, run many times" paradigm for any 6 | Python workload. It's useful for speeding up load times where there's 7 | some slow fixed setup required each time you launch a process, 8 | followed by some dynamically changing workload. 9 | 10 | A common use-case is preloading some slow-to-load dependencies that 11 | don't change much; alternatively, you can use it to preload a big 12 | dataset once. 13 | 14 | Pyseidon works by launching a server process which performs one-time 15 | setup; you can then dynamically connect a client to it which causes 16 | the server to fork & run your workload in the resulting subprocess. 17 | 18 | As a user, it feels just like running a Python script, but the 19 | one-time setup happens exactly once. 20 | 21 | [Poseidon](https://github.com/stripe-ctf/poseidon) is a Ruby 22 | implementation of the same concept. 23 | 24 | # Installation 25 | 26 | You can install via `pip`: 27 | 28 | ``` 29 | pip install pyseidon 30 | ``` 31 | 32 | Or you can install from source: 33 | 34 | ``` 35 | git clone https://github.com/gdb/pyseidon 36 | pip install -e pyseidon 37 | cd pyseidon/pyseidon/client && make 38 | ``` 39 | 40 | # How to use 41 | 42 | - Create & run a Pyseidon server process: 43 | 44 | ```shell 45 | $ cat > server.py <<EOF 46 | import pyseidon 47 | import sys 48 | def handler(): 49 | print(f'Hi from worker. 
Your client ran with args: {sys.argv}') 50 | 51 | pyseidon = pyseidon.Pyseidon() 52 | pyseidon.run(handler) 53 | EOF 54 | $ python server.py 55 | ``` 56 | 57 | - Connect to the server process via the provided client: 58 | 59 | ```shell 60 | $ pyseidon a b c 61 | Hi from worker. Your client ran with args: ['a', 'b', 'c'] 62 | ``` 63 | 64 | # What's going on 65 | 66 | Pyseidon works by having the client send its stdin, stdout, and stderr 67 | file descriptors to a worker process spawned by the server. The 68 | workflow is the following: 69 | 70 | - The Pyseidon server accepts a connection from a client. 71 | - The client sends over its argv, its environment, its current working 72 | directory, and its stdin, stdout, and stderr file descriptors. 73 | - The server forks off a worker, which installs those file descriptors 74 | and that environment, cds to that working directory, and then executes 75 | the provided handler from the server. 76 | 77 | The worker, having forked from the server, has full copy-on-write 78 | access to all variables and code in the server's address space. This 79 | means it can access any data sets that were already loaded by the 80 | server (and can also modify that data or load new code without 81 | affecting the server). 82 | 83 | # TODO 84 | 85 | - Forward signals from the client 86 | - Rewrite the client in Rust 87 | - Find a less hacky way to compile the client 88 | -------------------------------------------------------------------------------- /example/master.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import time 4 | sys.path.append(os.path.join(os.path.dirname(__file__), '..')) 5 | 6 | import pyseidon 7 | 8 | pyseidon = pyseidon.Pyseidon() 9 | 10 | # Try running things like: 11 | # 12 | # pyseidon exit 1 13 | # pyseidon signal 15 14 | def handler(): 15 | print('Hello from Pyseidon worker! 
You sent the following arguments:', sys.argv) 16 | # time.sleep(100) 17 | 18 | if sys.argv[0] == 'exit': 19 | sys.exit(int(sys.argv[1])) 20 | elif sys.argv[0] == 'signal': 21 | os.kill(os.getpid(), int(sys.argv[1])) 22 | time.sleep(1) 23 | print('Huh, signal did not kill me') 24 | 25 | pyseidon.run(handler) 26 | -------------------------------------------------------------------------------- /pyseidon/__init__.py: -------------------------------------------------------------------------------- 1 | import array 2 | import atexit 3 | import errno 4 | import fcntl 5 | import _multiprocessing 6 | import os 7 | import select 8 | import signal 9 | import socket 10 | import struct 11 | import sys 12 | 13 | # Read a line up to a custom delimiter 14 | def _recvline(io, delim=b'\n'): 15 | buf = [] 16 | while True: 17 | byte = io.recv(1) 18 | buf.append(byte) 19 | 20 | # End of line reached! 21 | if byte == b'' or byte == delim: 22 | return b''.join(buf) 23 | 24 | def _recvfds(sock): 25 | msg, anc, flags, addr = sock.recvmsg(1, 4096) 26 | fds = [] 27 | for level, type, data in anc: 28 | fda = array.array('I') 29 | fda.frombytes(data) 30 | fds.extend(fda) 31 | return fds 32 | 33 | def _recvfd(sock): 34 | fds = _recvfds(sock) 35 | assert len(fds) == 1, 'Expected exactly one FD, but got: {}'.format(fds) 36 | return fds[0] 37 | 38 | class Pyseidon(object): 39 | def __init__(self, path='/tmp/pyseidon.sock'): 40 | self.path = path 41 | self.children = {} 42 | self.master_pid = os.getpid() 43 | 44 | r, w = os.pipe() 45 | # These are purely informational, so there's no point in 46 | # blocking on them. 
47 | fcntl.fcntl(r, fcntl.F_SETFL, fcntl.fcntl(r, fcntl.F_GETFL) | os.O_NONBLOCK) 48 | fcntl.fcntl(w, fcntl.F_SETFL, fcntl.fcntl(w, fcntl.F_GETFL) | os.O_NONBLOCK) 49 | self.loopbreak_reader = os.fdopen(r, 'rb', 0) 50 | self.loopbreak_writer = os.fdopen(w, 'wb', 0) 51 | 52 | def _run_event_loop(self): 53 | while True: 54 | conns = {} 55 | for child in self.children.values(): 56 | if not child['notified']: 57 | conns[child['conn']] = child 58 | 59 | try: 60 | # We want to detect when a client has hung up (so we 61 | # can tell the child about this). See 62 | # http://stefan.buettcher.org/cs/conn_closed.html for 63 | # another way of solving this problem with poll(2). 64 | candidates = [self.loopbreak_reader, self.sock] + list(conns.keys()) 65 | readers, _, _ = select.select(candidates, [], []) 66 | except select.error as e: 67 | if e.errno == errno.EINTR: 68 | # Probably just got a SIGCHLD. We'll forfeit this run 69 | # through the loop. 70 | continue 71 | else: 72 | raise 73 | 74 | for reader in readers: 75 | if reader == self.loopbreak_reader: 76 | # Drain the loopbreak reader 77 | self.loopbreak_reader.read() 78 | self._reap() 79 | elif reader == self.sock: 80 | argv = self._accept() 81 | # In the master, we'll just hit another cycle through 82 | # the loop. 
83 | if not self._is_master(): 84 | return argv 85 | elif reader in conns: 86 | child = conns[reader] 87 | data = self._socket_peek(reader) 88 | if data is None: 89 | raise RuntimeError('Socket unexpectedly showed up in readers list, but has nothing to read: child={}'.format(child['pid'])) 90 | elif len(data) == 0: 91 | self._notify_socket_dead(child) 92 | else: 93 | raise RuntimeError('Socket unexpectedly had available data: child={} data={}'.format(child['pid'], data)) 94 | 95 | def _socket_peek(self, sock): 96 | try: 97 | # See what data is available 98 | data = sock.recv(256, socket.MSG_PEEK | socket.MSG_DONTWAIT) 99 | except socket.error as e: 100 | if e.errno == errno.EAGAIN: 101 | # Socket is fine, and there's nothing to read. 102 | return None 103 | # Hm, something's wrong. 104 | raise 105 | else: 106 | return data 107 | 108 | def _notify_socket_dead(self, child): 109 | child['notified'] = True 110 | print('[{}] Client disconnected; sending HUP: child={}'.format(os.getpid(), child['pid']), file=sys.stderr) 111 | try: 112 | # HUP is about right for this. 113 | os.kill(child['pid'], signal.SIGHUP) 114 | except OSError as e: 115 | # ESRCH means the process is dead, and it'll get cleaned 116 | # up automatically. 117 | if e.errno != errno.ESRCH: 118 | raise 119 | 120 | def _listen(self): 121 | self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) 122 | 123 | # Make sure this socket is readable only by the current user, 124 | # since it'll give possibly arbitrary code execution to anyone 125 | # who can connect to it. 
126 | umask = os.umask(0o077) 127 | try: 128 | self.sock.bind(self.path) 129 | finally: 130 | os.umask(umask) 131 | atexit.register(self._remove_socket) 132 | 133 | self.sock.listen(1) 134 | print('[{}] Pyseidon master booted'.format(os.getpid()), file=sys.stderr) 135 | 136 | def _accept(self): 137 | conn, _ = self.sock.accept() 138 | 139 | # Note that these will be blocking, so a slow or misbehaving 140 | # client could in theory cause issues. We could solve this by 141 | # adding these to the event loop. 142 | argv = self._read_argv(conn) 143 | env = self._read_env(conn) 144 | cwd = self._read_cwd(conn) 145 | fds = self._read_fds(conn) 146 | 147 | pid = os.fork() 148 | if pid: 149 | # Master 150 | print('[{}] Spawned worker: pid={} argv={} cwd={}'.format(os.getpid(), pid, argv, cwd), file= sys.stderr) 151 | # Do not want these FDs 152 | for fd in fds: 153 | fd.close() 154 | self.children[pid] = {'conn': conn, 'pid': pid, 'notified': False} 155 | else: 156 | # Worker 157 | self._setup_env(conn, argv, env, cwd, fds) 158 | return argv 159 | 160 | def _setup_env(self, conn, argv, env, cwd, fds): 161 | # Close now-unneeded file descriptors 162 | conn.close() 163 | self.loopbreak_reader.close() 164 | self.loopbreak_writer.close() 165 | self.sock.close() 166 | 167 | print('[{}] cwd={} argv={} env_count={}'.format(os.getpid(), cwd, argv, len(env)), file=sys.stderr) 168 | 169 | # Python doesn't natively let you set your actual 170 | # procname. TODO: consider importing a library for that. 
171 | sys.argv = [a.decode('utf-8') for a in argv[1:]] 172 | env = {k.decode('utf-8'): v.decode('utf-8') for k, v in env.items()} 173 | 174 | # This changes the actual underlying environment 175 | os.environ.clear() 176 | os.environ.update(env) 177 | 178 | os.chdir(cwd) 179 | 180 | # Set up file descriptors 181 | stdin, stdout, stderr = fds 182 | os.dup2(stdin.fileno(), 0) 183 | os.dup2(stdout.fileno(), 1) 184 | os.dup2(stderr.fileno(), 2) 185 | stdin.close() 186 | stdout.close() 187 | stderr.close() 188 | 189 | def _is_master(self): 190 | return os.getpid() == self.master_pid 191 | 192 | 193 | def _remove_socket(self): 194 | # Don't worry about removing the socket if a worker exits 195 | if not self._is_master(): 196 | return 197 | 198 | try: 199 | os.unlink(self.path) 200 | except OSError: 201 | if os.path.exists(self.path): 202 | raise 203 | 204 | def _read_argv(self, conn): 205 | return self._read_array(conn) 206 | 207 | def _read_env(self, conn): 208 | env = {} 209 | kv_pairs = self._read_array(conn) 210 | for kv in kv_pairs: 211 | k, v = kv.split(b'=', 1) 212 | env[k] = v 213 | return env 214 | 215 | def _read_array(self, conn): 216 | argc_packed = conn.recv(4) 217 | argc, = struct.unpack('I', argc_packed) 218 | 219 | argv = [] 220 | for i in range(argc): 221 | line = _recvline(conn, b'\0') 222 | if line[-1:] != b'\0': 223 | raise RuntimeError("Corrupted array; not null terminated: {}".format(line)) 224 | argv.append(line.rstrip(b'\0')) 225 | 226 | return argv 227 | 228 | def _read_cwd(self, conn): 229 | line = _recvline(conn, b'\0') 230 | if line[-1:] != b'\0': 231 | raise RuntimeError("Corrupted cwd; not null terminated: {}".format(line)) 232 | return line.rstrip(b'\0') 233 | 234 | def _read_fds(self, conn): 235 | stdin = os.fdopen(_recvfd(conn)) 236 | stdout = os.fdopen(_recvfd(conn), 'w') 237 | stderr = os.fdopen(_recvfd(conn), 'w') 238 | return stdin, stdout, stderr 239 | 240 | def _break_loop(self, signum, stack): 241 | try: 242 | 
self.loopbreak_writer.write(b'a') 243 | except IOError as e: 244 | # The pipe is full. This is surprising, but could happen 245 | # in theory if we're being spammed with dying children. 246 | if e.errno == errno.EAGAIN: 247 | return 248 | else: 249 | raise 250 | 251 | def _reap(self): 252 | try: 253 | while True: 254 | pid, exitinfo = os.waitpid(-1, os.WNOHANG) 255 | if pid == 0: 256 | # Just means there's an extra child hanging around 257 | break 258 | 259 | signal = exitinfo % 2**8 260 | status = exitinfo >> 8 261 | if signal: 262 | print('[{}] Worker {} exited due to signal {}'.format(os.getpid(), pid, signal), file=sys.stderr) 263 | # In this case, we'll just have the client exit 264 | # with an arbitrary status 100. 265 | client_exit = 100 266 | else: 267 | if pid in self.children: 268 | print('[{}] Worker {} exited with status {}'.format(os.getpid(), pid, status), file=sys.stderr) 269 | else: 270 | print('[{}] Non-worker child process {} exited with status {}'.format(os.getpid(), pid, status), file=sys.stderr) 271 | continue 272 | client_exit = status 273 | conn = self.children[pid]['conn'] 274 | try: 275 | # TODO: make this non-blocking 276 | conn.send(struct.pack('I', client_exit)) 277 | except socket.error as e: 278 | # Shouldn't care if the client has died in the 279 | # meanwhile. Their loss! 
280 | if e.errno == errno.EPIPE: 281 | pass 282 | else: 283 | raise 284 | conn.close() 285 | del self.children[pid] 286 | except OSError as e: 287 | # Keep going until we run out of dead workers 288 | if e.errno == errno.ECHILD: 289 | return 290 | else: 291 | raise 292 | 293 | def run(self, callback): 294 | # Install SIGCHLD handler so we know when workers exit 295 | old = signal.signal(signal.SIGCHLD, self._break_loop) 296 | # Start listening on the UNIX socket 297 | self._listen() 298 | 299 | # And do the actual workhorse 300 | self._run_event_loop() 301 | 302 | # Get rid of that handler 303 | signal.signal(signal.SIGCHLD, old) 304 | # In theory we might add the ability for the master to 305 | # gracefully exit. 306 | if self._is_master(): 307 | return 308 | 309 | # Guess we're in a worker process. 310 | callback() 311 | sys.exit(0) 312 | -------------------------------------------------------------------------------- /pyseidon/client/Makefile: -------------------------------------------------------------------------------- 1 | CC = gcc 2 | all: 3 | $(CC) -std=gnu99 -o pyseidon-client pyseidon.c 4 | -------------------------------------------------------------------------------- /pyseidon/client/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdb/pyseidon/769bcb2e557c297f4febd75550903fd2e5edc78e/pyseidon/client/__init__.py -------------------------------------------------------------------------------- /pyseidon/client/lru_cache.py: -------------------------------------------------------------------------------- 1 | # Backport from Python 3, from 2 | # http://code.activestate.com/recipes/578078-py26-and-py30-backport-of-python-33s-lru-cache/. 
3 | 4 | from collections import namedtuple 5 | from functools import update_wrapper 6 | from threading import RLock 7 | 8 | _CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"]) 9 | 10 | class _HashedSeq(list): 11 | __slots__ = 'hashvalue' 12 | 13 | def __init__(self, tup, hash=hash): 14 | self[:] = tup 15 | self.hashvalue = hash(tup) 16 | 17 | def __hash__(self): 18 | return self.hashvalue 19 | 20 | def _make_key(args, kwds, typed, 21 | kwd_mark = (object(),), 22 | fasttypes = {int, str, frozenset, type(None)}, 23 | sorted=sorted, tuple=tuple, type=type, len=len): 24 | 'Make a cache key from optionally typed positional and keyword arguments' 25 | key = args 26 | if kwds: 27 | sorted_items = sorted(kwds.items()) 28 | key += kwd_mark 29 | for item in sorted_items: 30 | key += item 31 | if typed: 32 | key += tuple(type(v) for v in args) 33 | if kwds: 34 | key += tuple(type(v) for k, v in sorted_items) 35 | elif len(key) == 1 and type(key[0]) in fasttypes: 36 | return key[0] 37 | return _HashedSeq(key) 38 | 39 | def lru_cache(maxsize=100, typed=False): 40 | """Least-recently-used cache decorator. 41 | 42 | If *maxsize* is set to None, the LRU features are disabled and the cache 43 | can grow without bound. 44 | 45 | If *typed* is True, arguments of different types will be cached separately. 46 | For example, f(3.0) and f(3) will be treated as distinct calls with 47 | distinct results. 48 | 49 | Arguments to the cached function must be hashable. 50 | 51 | View the cache statistics named tuple (hits, misses, maxsize, currsize) with 52 | f.cache_info(). Clear the cache and statistics with f.cache_clear(). 53 | Access the underlying function with f.__wrapped__. 
54 | 55 | See: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used 56 | 57 | """ 58 | 59 | # Users should only access the lru_cache through its public API: 60 | # cache_info, cache_clear, and f.__wrapped__ 61 | # The internals of the lru_cache are encapsulated for thread safety and 62 | # to allow the implementation to change (including a possible C version). 63 | 64 | def decorating_function(user_function): 65 | 66 | cache = dict() 67 | stats = [0, 0] # make statistics updateable non-locally 68 | HITS, MISSES = 0, 1 # names for the stats fields 69 | make_key = _make_key 70 | cache_get = cache.get # bound method to lookup key or return None 71 | _len = len # localize the global len() function 72 | lock = RLock() # because linkedlist updates aren't threadsafe 73 | root = [] # root of the circular doubly linked list 74 | root[:] = [root, root, None, None] # initialize by pointing to self 75 | nonlocal_root = [root] # make updateable non-locally 76 | PREV, NEXT, KEY, RESULT = 0, 1, 2, 3 # names for the link fields 77 | 78 | if maxsize == 0: 79 | 80 | def wrapper(*args, **kwds): 81 | # no caching, just do a statistics update after a successful call 82 | result = user_function(*args, **kwds) 83 | stats[MISSES] += 1 84 | return result 85 | 86 | elif maxsize is None: 87 | 88 | def wrapper(*args, **kwds): 89 | # simple caching without ordering or size limit 90 | key = make_key(args, kwds, typed) 91 | result = cache_get(key, root) # root used here as a unique not-found sentinel 92 | if result is not root: 93 | stats[HITS] += 1 94 | return result 95 | result = user_function(*args, **kwds) 96 | cache[key] = result 97 | stats[MISSES] += 1 98 | return result 99 | 100 | else: 101 | 102 | def wrapper(*args, **kwds): 103 | # size limited caching that tracks accesses by recency 104 | key = make_key(args, kwds, typed) if kwds or typed else args 105 | with lock: 106 | link = cache_get(key) 107 | if link is not None: 108 | # record recent use of the key by moving it to 
the front of the list 109 | root, = nonlocal_root 110 | link_prev, link_next, key, result = link 111 | link_prev[NEXT] = link_next 112 | link_next[PREV] = link_prev 113 | last = root[PREV] 114 | last[NEXT] = root[PREV] = link 115 | link[PREV] = last 116 | link[NEXT] = root 117 | stats[HITS] += 1 118 | return result 119 | result = user_function(*args, **kwds) 120 | with lock: 121 | root, = nonlocal_root 122 | if key in cache: 123 | # getting here means that this same key was added to the 124 | # cache while the lock was released. since the link 125 | # update is already done, we need only return the 126 | # computed result and update the count of misses. 127 | pass 128 | elif _len(cache) >= maxsize: 129 | # use the old root to store the new key and result 130 | oldroot = root 131 | oldroot[KEY] = key 132 | oldroot[RESULT] = result 133 | # empty the oldest link and make it the new root 134 | root = nonlocal_root[0] = oldroot[NEXT] 135 | oldkey = root[KEY] 136 | oldvalue = root[RESULT] 137 | root[KEY] = root[RESULT] = None 138 | # now update the cache dictionary for the new links 139 | del cache[oldkey] 140 | cache[key] = oldroot 141 | else: 142 | # put result in a new link at the front of the list 143 | last = root[PREV] 144 | link = [last, root, key, result] 145 | last[NEXT] = root[PREV] = cache[key] = link 146 | stats[MISSES] += 1 147 | return result 148 | 149 | def cache_info(): 150 | """Report cache statistics""" 151 | with lock: 152 | return _CacheInfo(stats[HITS], stats[MISSES], maxsize, len(cache)) 153 | 154 | def cache_clear(): 155 | """Clear the cache and cache statistics""" 156 | with lock: 157 | cache.clear() 158 | root = nonlocal_root[0] 159 | root[:] = [root, root, None, None] 160 | stats[:] = [0, 0] 161 | 162 | wrapper.__wrapped__ = user_function 163 | wrapper.cache_info = cache_info 164 | wrapper.cache_clear = cache_clear 165 | return update_wrapper(wrapper, user_function) 166 | 167 | return decorating_function 168 | 
-------------------------------------------------------------------------------- /pyseidon/client/pyseidon: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import pyseidon 3 | import os 4 | import sys 5 | 6 | base = os.path.dirname(pyseidon.__file__) 7 | os.execv(os.path.join(base, 'client/pyseidon-client'), ['pyseidon-client'] + sys.argv[1:]) 8 | -------------------------------------------------------------------------------- /pyseidon/client/pyseidon-client: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdb/pyseidon/769bcb2e557c297f4febd75550903fd2e5edc78e/pyseidon/client/pyseidon-client -------------------------------------------------------------------------------- /pyseidon/client/pyseidon.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | extern char **environ; 13 | 14 | void handle_error(char *message) 15 | { 16 | fprintf(stderr, "Something went wrong! 
Error details:\n\n"); 17 | 18 | if (errno) { 19 | perror(message); 20 | } else { 21 | fprintf(stderr, "%s.\n", message); 22 | } 23 | 24 | fprintf(stderr, "\n"); 25 | 26 | char buf[1024]; 27 | char *errorfile = getenv("POSEIDON_ERRORFILE"); 28 | if (errorfile) { 29 | FILE *fr = fopen(errorfile, "r"); 30 | if (fr) { 31 | while(fgets(buf, sizeof(buf), fr) != NULL) { 32 | fprintf(stderr, "%s", buf); 33 | } 34 | fclose(fr); 35 | } 36 | } 37 | 38 | fprintf(stderr, "\n"); 39 | 40 | exit(200); 41 | } 42 | 43 | void pack_int(unsigned char bytes[4], unsigned long n) 44 | { 45 | bytes[0] = n & 0xFF; 46 | bytes[1] = (n >> 8) & 0xFF; 47 | bytes[2] = (n >> 16) & 0xFF; 48 | bytes[3] = (n >> 24) & 0xFF; 49 | } 50 | 51 | int unpack_int(unsigned char bytes[4]) 52 | { 53 | int output = 0; 54 | output += bytes[0]; 55 | output += bytes[1] << 8; 56 | output += bytes[2] << 16; 57 | output += bytes[3] << 24; 58 | return output; 59 | } 60 | 61 | 62 | void checked_send(int s, void *buffer, int length) 63 | { 64 | if (send(s, buffer, length, 0) < 0) { 65 | handle_error("Could not write bytes"); 66 | } 67 | } 68 | 69 | void checked_send_int(int s, int value) { 70 | unsigned char bytes[4]; 71 | pack_int(bytes, value); 72 | // WRITE: the value as 4 little-endian bytes 73 | checked_send(s, bytes, 4); 74 | } 75 | 76 | // Copied from http://code.swtch.com/plan9port/src/0e6ae8ed3276/src/lib9/sendfd.c 77 | int checked_sendfd(int s, int fd) 78 | { 79 | char buf[1]; 80 | struct iovec iov; 81 | struct msghdr msg; 82 | struct cmsghdr *cmsg; 83 | int n; 84 | char cms[CMSG_SPACE(sizeof(int))]; 85 | 86 | buf[0] = 0; 87 | iov.iov_base = buf; 88 | iov.iov_len = 1; 89 | 90 | memset(&msg, 0, sizeof msg); 91 | msg.msg_iov = &iov; 92 | msg.msg_iovlen = 1; 93 | msg.msg_control = (caddr_t)cms; 94 | msg.msg_controllen = CMSG_LEN(sizeof(int)); 95 | 96 | cmsg = CMSG_FIRSTHDR(&msg); 97 | cmsg->cmsg_len = CMSG_LEN(sizeof(int)); 98 | cmsg->cmsg_level = SOL_SOCKET; 99 | cmsg->cmsg_type = SCM_RIGHTS; 100 | memmove(CMSG_DATA(cmsg), &fd, sizeof(int)); 
101 | 102 | if((n=sendmsg(s, &msg, 0)) != iov.iov_len) { 103 | handle_error("Could not send file descriptors"); 104 | } 105 | return 0; 106 | } 107 | 108 | int main(int argc, char **argv) 109 | { 110 | char *sock_path = getenv("POSEIDON_SOCK"); 111 | if (!sock_path) 112 | sock_path = "/tmp/pyseidon.sock"; 113 | 114 | int s; 115 | if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { 116 | handle_error("Could not create socket"); 117 | } 118 | 119 | struct sockaddr_un remote; 120 | remote.sun_family = AF_UNIX; 121 | strncpy(remote.sun_path, sock_path, sizeof(remote.sun_path)-1); 122 | if (connect(s, (struct sockaddr *) &remote, sizeof(remote)) < 0) { 123 | handle_error("Could not connect to UNIX socket for master process"); 124 | } 125 | 126 | // WRITE: argument count 127 | checked_send_int(s, argc); 128 | 129 | for (int i = 0; i < argc; i++) { 130 | // WRITE: all arguments (including null terminators) 131 | checked_send(s, argv[i], strlen(argv[i]) + 1); 132 | } 133 | 134 | // WRITE: environment variable count 135 | int env_size = 0; 136 | while (environ[env_size] != NULL) { 137 | env_size++; 138 | } 139 | checked_send_int(s, env_size); 140 | 141 | for (int i = 0; i < env_size; i++) { 142 | // WRITE: all environment variables (as k=v, including null terminators) 143 | checked_send(s, environ[i], strlen(environ[i])+1); 144 | } 145 | 146 | // WRITE: cwd 147 | int size = 0; 148 | if ((size = pathconf(".", _PC_PATH_MAX)) < 0) { 149 | handle_error("Could not determine file system's maximum path length"); 150 | } 151 | 152 | char *buf; 153 | if ((buf = (char *)malloc((size_t)size)) == NULL) { 154 | handle_error("Could not allocate enough memory to store current working directory"); 155 | } 156 | 157 | if (getcwd(buf, size) == NULL) { 158 | handle_error("Could not determine current working directory"); 159 | } 160 | checked_send(s, buf, strlen(buf)+1); 161 | free(buf); 162 | 163 | // Finally, send over the FDs 164 | checked_sendfd(s, 0); 165 | checked_sendfd(s, 1); 166 | 
checked_sendfd(s, 2); 167 | 168 | int t, total = 0; 169 | unsigned char exitstatus[4]; 170 | 171 | // Maybe should replace this janky buffering with fread or 172 | // something. 173 | while (total < 4) { 174 | // Handle errors as well as closed other ends 175 | if ((t = recv(s, exitstatus + total, 4 - total, 0)) < 0) { 176 | handle_error("Could not receive exitstatus from master"); 177 | } else if (t == 0) { 178 | handle_error("Master hung up connection"); 179 | } 180 | total += t; 181 | } 182 | 183 | close(s); 184 | 185 | return unpack_int(exitstatus); 186 | } 187 | -------------------------------------------------------------------------------- /pyseidon/handlers/__init__.py: -------------------------------------------------------------------------------- 1 | import pyseidon 2 | import sys 3 | 4 | def handle_script(): 5 | """ 6 | Allow the client to run an arbitrary Python script. 7 | 8 | Here's sample usage: 9 | 10 | ``` 11 | def expensive_setup(): 12 | ... 13 | 14 | if __name__ == '__main__': 15 | expensive_setup() 16 | 17 | import pyseidon.handlers 18 | pyseidon.handlers.handle_script() 19 | ``` 20 | """ 21 | import runpy 22 | def handler(): 23 | if len(sys.argv) < 1: 24 | print('Must provide path to Python script to execute', file=sys.stderr) 25 | sys.exit(1) 26 | runpy.run_path(sys.argv[0], run_name='__main__') 27 | master = pyseidon.Pyseidon() 28 | master.run(handler) 29 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file = README.md 3 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | """A setuptools based setup module. 
2 | 3 | See: 4 | https://packaging.python.org/en/latest/distributing.html 5 | https://github.com/pypa/sampleproject 6 | """ 7 | 8 | # Always prefer setuptools over distutils 9 | from setuptools import setup, find_packages 10 | from distutils.command.build import build as DistutilsBuild 11 | # To use a consistent encoding 12 | from codecs import open 13 | from os import path 14 | 15 | import subprocess 16 | 17 | here = path.abspath(path.dirname(__file__)) 18 | 19 | class Build(DistutilsBuild): 20 | def run(self): 21 | subprocess.check_call(['make', '-C', 'pyseidon/client']) 22 | DistutilsBuild.run(self) 23 | 24 | setup( 25 | cmdclass={'build': Build}, 26 | 27 | name='pyseidon', 28 | 29 | # Versions should comply with PEP440. For a discussion on single-sourcing 30 | # the version across setup.py and the project code, see 31 | # https://packaging.python.org/en/latest/single_source_version.html 32 | version='0.1.3', 33 | 34 | description='A boot-once, run-many-times framework for Python', 35 | long_description='Pyseidon allows you to boot a Python master process, and then run clients that are forked directly from the master. This is particularly useful for completing a slow data-loading process once and then running many experiments.', 36 | 37 | # The project's main homepage. 38 | url='https://github.com/gdb/pyseidon', 39 | 40 | # Author details 41 | author='Greg Brockman', 42 | author_email='gdb@gregbrockman.com', 43 | 44 | # Choose your license 45 | license='MIT', 46 | 47 | # See https://pypi.python.org/pypi?%3Aaction=list_classifiers 48 | classifiers=[ 49 | # How mature is this project? 
Common values are 50 | # 3 - Alpha 51 | # 4 - Beta 52 | # 5 - Production/Stable 53 | 'Development Status :: 3 - Alpha', 54 | 55 | # Indicate who your project is intended for 56 | 'Intended Audience :: Developers', 57 | 58 | # Pick your license as you wish (should match "license" above) 59 | 'License :: OSI Approved :: MIT License', 60 | 61 | # Specify the Python versions you support here. In particular, ensure 62 | # that you indicate whether you support Python 2, Python 3 or both. 63 | 'Programming Language :: Python :: 3', 64 | ], 65 | 66 | # What does your project relate to? 67 | keywords='pyseidon', 68 | 69 | # You can just specify the packages manually here if your project is 70 | # simple. Or you can use find_packages(). 71 | packages=find_packages(exclude=['contrib', 'docs', 'tests*']), 72 | 73 | # To provide executable scripts, use entry points in preference to the 74 | # "scripts" keyword. Entry points provide cross-platform support and allow 75 | # pip to create the appropriate form of executable for the target platform. 76 | scripts=['pyseidon/client/pyseidon'], 77 | package_data={'pyseidon': ['client/Makefile', 'client/pyseidon.c', 'client/pyseidon-client']} 78 | ) 79 | --------------------------------------------------------------------------------
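
The client/server exchange described in the README's "What's going on" section, and implemented by `pyseidon.c` and `Pyseidon._read_array`, can be sketched in Python roughly as follows. This is a minimal illustration and not part of the package: the helper names (`encode_array`, `encode_request`, `send_fd`, `recv_fd`, `demo`) are hypothetical, and the `'<I'` format assumes the little-endian byte order produced by the C client's `pack_int` (the server unpacks with native `'I'`, which agrees on little-endian hosts).

```python
import array
import os
import socket
import struct


def encode_array(items):
    """Encode byte strings as a 4-byte count followed by NUL-terminated items."""
    # '<I' mirrors the C client's little-endian pack_int() (an assumption about
    # the host; the server itself uses native-order 'I').
    return struct.pack('<I', len(items)) + b''.join(item + b'\0' for item in items)


def encode_request(argv, env, cwd):
    """Build the bytes sent before the descriptors: argv, then env, then cwd."""
    env_items = [k + b'=' + v for k, v in env.items()]
    return encode_array(argv) + encode_array(env_items) + cwd + b'\0'


def send_fd(sock, fd):
    """Send one descriptor as SCM_RIGHTS ancillary data alongside a single byte."""
    fds = array.array('i', [fd])
    sock.sendmsg([b'\0'], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one descriptor; the counterpart of send_fd()."""
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(struct.calcsize('i')))
    _level, _kind, data = ancdata[0]
    fds = array.array('i')
    fds.frombytes(data[:fds.itemsize])
    return fds[0]


def demo():
    """Pass the write end of a pipe over a socketpair and write through the copy."""
    parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    send_fd(parent, w)
    received = recv_fd(child)
    os.write(received, b'hello')
    data = os.read(r, 5)
    for fd in (r, w, received):
        os.close(fd)
    parent.close()
    child.close()
    return data
```

`demo()` shows the mechanism in isolation: the descriptor received on the other end of the socket refers to the same pipe, so bytes written through it arrive at the original read end. The real client repeats the `send_fd` step three times (stdin, stdout, stderr) after sending the `encode_request`-style payload, then blocks until it reads a 4-byte exit status.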