├── .gitmodules ├── CMakeLists.txt ├── LICENSE ├── README.md ├── extra ├── lldb.py └── loader_diagram.png └── src ├── find_shared_function.cpp ├── loader.cpp ├── loader.h ├── main.cpp └── python_runner.cpp /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "third_party/fmt"] 2 | path = third_party/fmt 3 | url = https://github.com/fmtlib/fmt 4 | [submodule "third_party/pybind11"] 5 | path = third_party/pybind11 6 | url = https://github.com/pybind/pybind11 7 | -------------------------------------------------------------------------------- /CMakeLists.txt: -------------------------------------------------------------------------------- 1 | project(singleprocessmultiprocess) 2 | set(CMAKE_CXX_STANDARD 17) 3 | add_subdirectory(third_party/fmt) 4 | 5 | 6 | find_package (Python COMPONENTS Development) 7 | add_library(python_runner SHARED src/python_runner.cpp) 8 | target_include_directories(python_runner PRIVATE third_party/pybind11/include ${Python_INCLUDE_DIRS}) 9 | 10 | add_library(find_shared_function SHARED src/find_shared_function.cpp) 11 | target_link_libraries(find_shared_function PRIVATE fmt::fmt-header-only) 12 | 13 | add_executable(main src/main.cpp src/loader.cpp) 14 | target_compile_definitions(main PRIVATE PYTHON_SO_PATH=\"${Python_LIBRARIES}\") 15 | target_link_libraries(main PRIVATE dl pthread fmt::fmt-header-only) 16 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Zachary DeVito 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
30 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Multiple Python Interpreters via Custom Dynamic Loading
2 | =====================================
3 | 
4 | The Python interpreter is normally limited to running on a single thread at a time by Python's global interpreter lock, or GIL. This design keeps the interpreter simpler, but it makes it hard to write multithreaded Python programs. The most common workaround is Python's multiprocessing module, which provides a way to run multiple cooperative Python processes. However, multiprocessing has several downsides. First, managing multiple processes is much more complicated than managing a single process, with tasks like running a debugger being substantially more difficult in the multiprocessing case. Furthermore, sharing non-Python data between the processes can be tricky. For instance, in PyTorch we have a lot of CUDA-allocated Tensors that we would like to share between processes, but to do this we have to carefully allocate them in the system's "shared memory" to make sure all processes can see them, and we have to manage the lifetime of these objects ourselves.
5 | 
6 | Future versions of Python will [have a mechanism for allocating multiple separate interpreters](https://lwn.net/Articles/820424/) in a single process, but this will not be available until Python 3.10 at the earliest. Even then, lots of Python extension modules, PyTorch and NumPy included, assume that there is only one interpreter and one GIL. They will need to be modified before they can work with multiple interpreters.
7 | 
8 | But it is possible today to get multiple interpreters in a single process without modifying most extensions! It just requires loading the entire Python library multiple times.
9 | 
10 | Multiple Pythons via Custom Dynamic Loading
11 | -------------------------------------------
12 | 
13 | The reason having multiple Python interpreters in one process is hard is that CPython's API has a lot of global symbols and values, like the interpreter lock. By writing a custom shared library loader, we can arrange for multiple copies of Python _and its extension libraries_ to be loaded in a single process such that they cannot see each other. Nevertheless, data allocated in C/C++, such as PyTorch Tensors, can be shared across interpreters since it lives in the same process.
14 | 
15 | ![loader diagram](https://github.com/zdevito/custom_loader/blob/main/extra/loader_diagram.png?raw=true)
16 | 
17 | A shared library loader is the part of `libc`, accessed through `dlopen`, that reads shared libraries into memory. The normal Unix loader is inflexible: it will only ever load a library once, and it has a fixed method for linking the symbols of that library with the running process.
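For example, this minimal sketch (not part of this repo, and the `libpython` soname used here is just an assumption) shows that `dlopen` hands back the same library no matter how many times you open it, so the system loader alone cannot give us a second, independent copy of the interpreter:

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      // dlopen caches loaded objects: the second call just bumps a reference
      // count and returns the handle from the first call, so both "copies"
      // of Python would share one GIL and one set of global variables.
      void* a = dlopen("libpython3.8.so.1.0", RTLD_LOCAL | RTLD_LAZY);
      void* b = dlopen("libpython3.8.so.1.0", RTLD_LOCAL | RTLD_LAZY);
      std::printf("%p %p -> %s\n", a, b, a == b ? "same library" : "different");
    }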
However, nothing stops us from writing our own loader with a more flexible API for symbol resolution:
18 | 
19 |     // loader.h
20 |     struct SymbolProvider {
21 |       SymbolProvider() {}
22 |       virtual optional<Elf64_Addr> sym(const char* name) const = 0;
23 |       // for symbols referring to thread local state (TLS)
24 |       virtual optional<TLSIndex> tls_sym(const char* name) const = 0;
25 |       SymbolProvider(const SymbolProvider&) = delete;
26 |       SymbolProvider& operator=(const SymbolProvider&) = delete;
27 |       virtual ~SymbolProvider() {}
28 |     };
29 | 
30 |     // RAII wrapper around dlopen
31 |     struct SystemLibrary : public SymbolProvider {
32 |       // create a wrapper around an existing handle returned from dlopen
33 |       // if steal == true, then this will dlclose the handle when it is destroyed.
34 |       static std::shared_ptr<SystemLibrary> create(
35 |           void* handle = RTLD_DEFAULT,
36 |           bool steal = false);
37 |       static std::shared_ptr<SystemLibrary> create(const char* path, int flags);
38 |     };
39 | 
40 |     struct CustomLibrary : public SymbolProvider {
41 |       static std::shared_ptr<CustomLibrary> create(
42 |           const char* filename,
43 |           int argc = 0,
44 |           const char** argv = nullptr);
45 |       virtual void add_search_library(std::shared_ptr<SymbolProvider> lib) = 0;
46 |       virtual void load() = 0;
47 |     };
48 | 
49 | Here a `SymbolProvider` is an abstract object that can resolve symbol names (strings) to addresses (`Elf64_Addr`). The `SystemLibrary` class does this as a wrapper around the system loader using `dlopen` and `dlsym`. `CustomLibrary` is our custom loader's API. It resolves symbols by looking through a list of `SymbolProvider` objects, which can be backed by the system loader or by other custom libraries.
50 | 
51 | We can use this API to get multiple Pythons in the same process:
52 | 
53 |     std::vector<CustomLibraryPtr> python_libs_;
54 |     // RTLD_DEFAULT: the global symbols already loaded in the current process
55 |     auto global = SystemLibrary::create();
56 |     for (int i = 0; i < 2; ++i) {
57 |       auto p = CustomLibrary::create(PYTHON_SO_PATH);
58 |       p->add_search_library(global);
59 |       p->load();
60 |       python_libs_.push_back(p);
61 |     }
62 | 
63 | It's hard to use these interpreters directly because every Python API function would have to be looked up with the `sym` method. Instead, we can create another shared library, `libpython_runner.so`, that contains our interaction with the Python API:
64 | 
65 |     // python_runner.cpp
66 |     struct PythonGuard {
67 |       PythonGuard() {
68 |         Py_Initialize();
69 |         // this has to occur on the thread that calls finalize,
70 |         // otherwise 'assert tlock.locked()' fails
71 |         py::exec("import threading");
72 |         // release the GIL after startup; we will acquire it on each call to run
73 |         PyEval_SaveThread();
74 |       }
75 |       ~PythonGuard() {
76 |         PyGILState_Ensure();
77 |         Py_Finalize();
78 |       }
79 |     };
80 | 
81 |     static PythonGuard runner;
82 | 
83 |     extern "C" void run(const char* code) {
84 |       // use the pybind11 API to run some code
85 |       py::gil_scoped_acquire guard_;
86 |       py::exec(code);
87 |     }
88 | 
89 | Now we can link that against the Python library and use it to run code by exposing the `run` function, which compiles and executes a string of Python.
Wrapping it up in an object, we get:
90 | 
91 |     struct PythonAPI {
92 |       PythonAPI() {
93 |         auto global = SystemLibrary::create();
94 |         python_ = CustomLibrary::create(PYTHON_SO_PATH);
95 |         python_->add_search_library(global);
96 |         python_->load();
97 | 
98 |         python_runner_ = CustomLibrary::create("libpython_runner.so");
99 |         python_runner_->add_search_library(python_);
100 |         python_runner_->add_search_library(global);
101 |         python_runner_->load();
102 |       }
103 |       void run(const char* code) {
104 |         auto run = (void (*)(const char* code))python_runner_->sym("run").value();
105 |         run(code);
106 |       }
107 |       CustomLibraryPtr python_;
108 |       CustomLibraryPtr python_runner_;
109 |     };
110 | 
111 | 
112 | We can then make multiple copies of this object to get multiple interpreters. Let's time a single interpreter shared by two threads against two separate interpreters to show that we really have two separate GILs:
113 | 
114 | 
115 |     auto example_src = R"end(
116 |     print("I think None is", id(None))
117 |     from time import time
118 | 
119 |     def fib(x):
120 |         if x <= 1:
121 |             return 1
122 |         return fib(x - 1) + fib(x - 2)
123 | 
124 |     def do_fib():
125 |         s = time()
126 |         fib(30)
127 |         e = time()
128 |         print(e - s)
129 | 
130 |     )end";
131 | 
132 |     int main() {
133 |       PythonAPI a;
134 |       PythonAPI b;
135 |       a.run(example_src);
136 |       b.run(example_src);
137 | 
138 |       std::cout << "fib(30) for single interpreter\n";
139 |       std::thread t0([&] {
140 |         a.run("do_fib()");
141 |       });
142 |       std::thread t1([&] {
143 |         a.run("do_fib()");
144 |       });
145 |       t0.join();
146 |       t1.join();
147 | 
148 |       std::cout << "fib(30) for 2 interpreters\n";
149 |       std::thread t2([&] {
150 |         a.run("do_fib()");
151 |       });
152 |       std::thread t3([&] {
153 |         b.run("do_fib()");
154 |       });
155 |       t2.join();
156 |       t3.join();
157 |     }
158 | 
159 | When we run this we get:
160 | 
161 |     I think None is 139756875126544
162 |     I think None is 139756860208912
163 |     fib(30) for single interpreter
164 |     0.5423781871795654
165 |     0.5387833118438721
166 |     fib(30) for 2 interpreters
167 |     0.2851290702819824
168 |     0.28827738761901855
169 | 
170 | You can see that the interpreters really are distinct because each reports a different `id` for the singleton `None` object. Furthermore, running two threads in separate interpreters computes `fib` nearly twice as fast as a single interpreter shared by two threads.
171 | 
172 | Supporting C extensions
173 | -----------------------
174 | 
175 | We now have a way to get multiple Python interpreters in a process. A problem arises, however, when we try to use any extension library:
176 | 
177 |     a.run("import regex");
178 |     > terminate called after throwing an instance of 'pybind11::error_already_set'
179 |     >   what(): ImportError: [...]/unicodedata.cpython-38-x86_64-linux-gnu.so: undefined symbol: _PyUnicode_Ready
180 | 
181 | Because Python calls `dlopen` to load each C extension library, those libraries cannot see the Python symbols we loaded with the `CustomLibrary` object: the system loader knows nothing about them.
182 | 
183 | Internally, CPython calls `_PyImport_FindSharedFuncptr` to do this loading, so we can fix the problem by overriding it with our own implementation:
184 | 
185 |     // find_shared_function.cpp
186 |     extern "C" {
187 |     CustomLibraryPtr the_python_library;
188 |     }
189 | 
190 |     // note: intentionally leaking the vector so that
191 |     // dtors on the loaded libraries do not get called.
192 |     // this module will unload after python, so it is unsafe
193 |     // to destruct the loaded libraries then.
194 |     auto loaded = new std::vector<CustomLibraryPtr>;
195 | 
196 |     typedef void (*dl_funcptr)(void);
197 |     extern "C" dl_funcptr _PyImport_FindSharedFuncptr(
198 |         const char* prefix,
199 |         const char* shortname,
200 |         const char* pathname,
201 |         FILE* fp) {
202 |       std::cout << "CUSTOM LOAD SHARED LIBRARY " << pathname << "\n";
203 |       auto lib = CustomLibrary::create(pathname);
204 |       lib->add_search_library(SystemLibrary::create());
205 |       lib->add_search_library(the_python_library);
206 |       lib->load();
207 |       auto init_name = fmt::format("{}_{}", prefix, shortname);
208 |       auto result = (dl_funcptr)lib->sym(init_name.c_str()).value();
209 |       loaded->emplace_back(std::move(lib));
210 |       return result;
211 |     }
212 | 
213 | Note that we need a reference to `the_python_library`, the library we want to link the extensions against. Since this will be different for each interpreter, we need a separate copy of this function per interpreter. When you have a hammer, every problem looks like a nail, so we will generate those copies by using the custom loader to load this library multiple times. We can modify our `PythonAPI` object from before to put these pieces together:
214 | 
215 |     struct PythonAPI {
216 |       PythonAPI() {
217 |         auto global = SystemLibrary::create();
218 |     +   find_shared_function_ = CustomLibrary::create("libfind_shared_function.so");
219 |     +   find_shared_function_->add_search_library(global);
220 |     +   find_shared_function_->load();
221 | 
222 |         python_ = CustomLibrary::create(PYTHON_SO_PATH);
223 |     +   python_->add_search_library(find_shared_function_);
224 |         python_->add_search_library(global);
225 |         python_->load();
226 | 
227 |     +   auto find_shared_python_ref = (CustomLibraryPtr*)find_shared_function_->sym("the_python_library").value();
228 |     +   *find_shared_python_ref = python_;
229 | 
230 |         python_runner_ = CustomLibrary::create("libpython_runner.so");
231 |         python_runner_->add_search_library(python_);
232 |         python_runner_->add_search_library(global);
233 |         python_runner_->load();
234 |       }
235 |       void run(const char* code) {
236 |         auto run = (void (*)(const char* code))python_runner_->sym("run").value();
237 |         run(code);
238 |       }
239 |     + CustomLibraryPtr find_shared_function_;
240 |       CustomLibraryPtr python_;
241 |       CustomLibraryPtr python_runner_;
242 |     };
243 | 
244 | With this change, we can now dynamically load extension libraries into the Python interpreters, and each extension will correctly link against its own interpreter's API:
245 | 
246 |     a.run("import regex");
247 | 
248 |     > CUSTOM LOAD SHARED LIBRARY [...]/lib-dynload/_heapq.cpython-38-x86_64-linux-gnu.so
249 |     > CUSTOM LOAD SHARED LIBRARY [...]/lib-dynload/unicodedata.cpython-38-x86_64-linux-gnu.so
250 |     > CUSTOM LOAD SHARED LIBRARY [...]/site-packages/regex/_regex.cpython-38-x86_64-linux-gnu.so
251 | 
252 | This is pretty cool because it allows us to use extension libraries (and Python itself!) without modification.
253 | 
254 | We've tested this with `numpy`, and it appears to pass the `numpy.test()` test suite when run simultaneously on two separate threads (except for a few tests that directly call dlopen, which are expected to fail in this setup):
255 | 
256 |     auto run_numpy = R"end(
257 |     import numpy as np
258 |     print(np.arange(10)*10)
259 |     )end";
260 |     a.run(run_numpy);
261 |     > CUSTOM LOAD SHARED LIBRARY [...]/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so
262 |     > [...]
263 |     > CUSTOM LOAD SHARED LIBRARY [...]/numpy/random/_generator.cpython-38-x86_64-linux-gnu.so
264 |     > [ 0 10 20 30 40 50 60 70 80 90]
265 | 
266 | It might seem like loading multiple copies of Python and all of its extensions would take a lot of RAM. But most of each library is read-only program text and debug information. Since these sections are mmap'd into the process directly from the library file, the RAM for all the read-only pages is shared across all interpreters. Only the global variables and some relocation tables get duplicated.
267 | 
268 | The full [source code](https://github.com/fairinternal/dynamic_torchscript_experiments/tree/master/elftests) for these examples also contains the implementation of the custom loader. It is a heavily modified version of the dynamic loader from Android's bionic library. It is only a prototype, so it is limited to x86_64 code and ELF-format shared libraries (found on Linux and BSD, but not macOS or Windows).
269 | 
270 | We plan to integrate this custom loading approach into `torch::deploy` so that C extensions can be used in the private Python interpreters that deploy uses to run deep learning models, but we also wanted to put this independent example together since it may prove useful in other cases where multiple embedded Python interpreters would be beneficial.
271 | 
272 | Special thanks to the PyTorch Platform Team, especially Ailing Zhang and Will Constable, for getting `torch::deploy` integrated into PyTorch, and to Greg Clayton for helping figure out how to register custom loaded code with lldb for debugging.
273 | 
--------------------------------------------------------------------------------
/extra/lldb.py:
--------------------------------------------------------------------------------
1 | import lldb
2 | # load into an lldb instance with:
3 | #   command script import extra/lldb.py
4 | 
5 | target = lldb.debugger.GetSelectedTarget()
6 | bp = target.BreakpointCreateByRegex("__deploy_register_code")
7 | bp.SetScriptCallbackBody("""\
8 | process = frame.thread.GetProcess()
9 | target = process.target
10 | symbol_addr = frame.module.FindSymbol("__deploy_module_info").GetStartAddress()
11 | info_addr = symbol_addr.GetLoadAddress(target)
12 | e = lldb.SBError()
13 | ptr_size = 8
14 | str_addr = process.ReadPointerFromMemory(info_addr, e)
15 | file_addr = process.ReadPointerFromMemory(info_addr + ptr_size, e)
16 | file_size = process.ReadPointerFromMemory(info_addr + 2*ptr_size, e)
17 | load_bias = process.ReadPointerFromMemory(info_addr + 3*ptr_size, e)
18 | name = process.ReadCStringFromMemory(str_addr, 512, e)
19 | r = process.ReadMemory(file_addr, file_size, e)
20 | from tempfile import NamedTemporaryFile
21 | from pathlib import Path
22 | stem = Path(name).stem
23 | with NamedTemporaryFile(prefix=stem, suffix='.so', delete=False) as tf:
24 |     tf.write(r)
25 | print("torch_deploy registering debug information for ", tf.name)
26 | cmd1 = f"target modules add {tf.name}"
27 | # print(cmd1)
28 | lldb.debugger.HandleCommand(cmd1)
29 | cmd2 = f"target modules load -f {tf.name} -s {hex(load_bias)}"
30 | # print(cmd2)
31 | lldb.debugger.HandleCommand(cmd2)
32 | 
33 | return False
34 | """)
35 | 
--------------------------------------------------------------------------------
/extra/loader_diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zdevito/custom_loader/cf1a73de8c75f3c61a2842ab9d5a82dd77361044/extra/loader_diagram.png
-------------------------------------------------------------------------------- /src/find_shared_function.cpp: -------------------------------------------------------------------------------- 1 | 2 | #include "loader.h" 3 | #include 4 | 5 | #include 6 | 7 | using namespace loader; 8 | 9 | extern "C" { 10 | CustomLibraryPtr the_python_library; 11 | } 12 | 13 | // note: intentially leaking the vector so that 14 | // dtors on the loaded libraries do not get called. 15 | // this module will unload after python so it is unsafe 16 | // for destruct the loaded libraries then. 17 | auto loaded = new std::vector; 18 | 19 | typedef void (*dl_funcptr)(void); 20 | extern "C" dl_funcptr _PyImport_FindSharedFuncptr( 21 | const char* prefix, 22 | const char* shortname, 23 | const char* pathname, 24 | FILE* fp) { 25 | std::cout << "CUSTOM LOAD SHARED LIBRARY " << pathname << "\n"; 26 | auto lib = CustomLibrary::create(pathname); 27 | lib->add_search_library(SystemLibrary::create()); 28 | lib->add_search_library(the_python_library); 29 | lib->load(); 30 | auto init_name = fmt::format("{}_{}", prefix, shortname); 31 | auto result = (dl_funcptr)lib->sym(init_name.c_str()).value(); 32 | loaded->emplace_back(std::move(lib)); 33 | return result; 34 | } -------------------------------------------------------------------------------- /src/loader.cpp: -------------------------------------------------------------------------------- 1 | // Code in this file is a heavily modified version of the dynamic loader 2 | // from android's bionic library. Here is the license for that project: 3 | 4 | /* 5 | * Copyright (C) 2016 The Android Open Source Project 6 | * All rights reserved. 7 | * 8 | * Redistribution and use in source and binary forms, with or without 9 | * modification, are permitted provided that the following conditions 10 | * are met: 11 | * * Redistributions of source code must retain the above copyright 12 | * notice, this list of conditions and the following disclaimer. 13 | * * Redistributions in binary form must reproduce the above copyright 14 | * notice, this list of conditions and the following disclaimer in 15 | * the documentation and/or other materials provided with the 16 | * distribution. 17 | * 18 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 19 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 20 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 21 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 22 | * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 23 | * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 24 | * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS 25 | * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED 26 | * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT 28 | * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 29 | * SUCH DAMAGE. 30 | */ 31 | 32 | #include 33 | #include 34 | #include 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include 40 | #include 41 | #include 42 | #include 43 | #include 44 | #include 45 | #include 46 | #include 47 | #include 48 | #include 49 | #include 50 | #include 51 | #include 52 | #include 53 | #include 54 | // Get PAGE_SIZE and PAGE_MASK. 
55 | #include 56 | 57 | #include 58 | #include "loader.h" 59 | 60 | namespace loader { 61 | 62 | #define DEPLOY_ERROR(msg_fmt, ...) \ 63 | throw DeployLinkerError(fmt::format(msg_fmt, ##__VA_ARGS__)) 64 | 65 | #define DEPLOY_CHECK(cond, fmt, ...) \ 66 | if (!(cond)) { \ 67 | DEPLOY_ERROR(fmt, ##__VA_ARGS__); \ 68 | } 69 | 70 | std::vector split_path(const std::string& s, char delim) { 71 | const char* cur = s.c_str(); 72 | const char* end = cur + s.size(); 73 | if (cur == end) { 74 | return {}; 75 | } 76 | std::vector result; 77 | while (true) { 78 | // non-zero amount of chars 79 | const char* next = strchr(cur, delim); 80 | if (!next) { 81 | result.push_back(std::string(cur, end)); 82 | break; 83 | } 84 | result.push_back(std::string(cur, next)); 85 | cur = next + 1; 86 | } 87 | return result; 88 | } 89 | 90 | // https://stackoverflow.com/questions/23006930/the-shared-library-rpath-and-the-binary-rpath-priority/52647116#52647116 91 | void replace_all( 92 | std::string& str, 93 | const std::string& from, 94 | const std::string& to) { 95 | if (from.empty()) 96 | return; 97 | size_t start_pos = 0; 98 | while ((start_pos = str.find(from, start_pos)) != std::string::npos) { 99 | str.replace(start_pos, from.length(), to); 100 | start_pos += to.length(); // In case 'to' contains 'from', like replacing 101 | // 'x' with 'yx' 102 | } 103 | } 104 | 105 | std::string resolve_path(const std::string& origin, const std::string& t) { 106 | std::string result = t; 107 | replace_all(result, "$ORIGIN", origin); 108 | char buf[PATH_MAX]; 109 | char* resolved = realpath(result.c_str(), buf); 110 | if (!resolved) { 111 | return result; 112 | } 113 | return resolved; 114 | } 115 | 116 | std::string resolve_origin(const std::string& so_name) { 117 | char origin[PATH_MAX]; 118 | realpath(so_name.c_str(), origin); 119 | dirname(origin); 120 | return origin; 121 | } 122 | 123 | template 124 | std::string stringf(const char* format, Args... args) { 125 | int size_s = snprintf(nullptr, 0, format, args...); 126 | std::string result(size_s + 1, 0); 127 | snprintf((char*)result.data(), size_s + 1, format, args...); 128 | return result; 129 | } 130 | // Returns the address of the page containing address 'x'. 131 | #define PAGE_START(x) ((x)&PAGE_MASK) 132 | 133 | // Returns the offset of address 'x' in its page. 134 | #define PAGE_OFFSET(x) ((x) & ~PAGE_MASK) 135 | 136 | // Returns the address of the next page after address 'x', unless 'x' is 137 | // itself at the start of a page. 
138 | #define PAGE_END(x) PAGE_START((x) + (PAGE_SIZE - 1)) 139 | 140 | // from bionic 141 | // returns the size a shared library will take in memory 142 | size_t phdr_table_get_load_size( 143 | const Elf64_Phdr* phdr_table, 144 | size_t phdr_count, 145 | Elf64_Addr* out_min_vaddr, 146 | Elf64_Addr* out_max_vaddr) { 147 | Elf64_Addr min_vaddr = UINTPTR_MAX; 148 | Elf64_Addr max_vaddr = 0; 149 | 150 | bool found_pt_load = false; 151 | for (size_t i = 0; i < phdr_count; ++i) { 152 | const Elf64_Phdr* phdr = &phdr_table[i]; 153 | 154 | if (phdr->p_type != PT_LOAD) { 155 | continue; 156 | } 157 | found_pt_load = true; 158 | 159 | if (phdr->p_vaddr < min_vaddr) { 160 | min_vaddr = phdr->p_vaddr; 161 | } 162 | 163 | if (phdr->p_vaddr + phdr->p_memsz > max_vaddr) { 164 | max_vaddr = phdr->p_vaddr + phdr->p_memsz; 165 | } 166 | } 167 | if (!found_pt_load) { 168 | min_vaddr = 0; 169 | } 170 | 171 | min_vaddr = PAGE_START(min_vaddr); 172 | max_vaddr = PAGE_END(max_vaddr); 173 | 174 | if (out_min_vaddr != nullptr) { 175 | *out_min_vaddr = min_vaddr; 176 | } 177 | if (out_max_vaddr != nullptr) { 178 | *out_max_vaddr = max_vaddr; 179 | } 180 | return max_vaddr - min_vaddr; 181 | } 182 | 183 | #define MAYBE_MAP_FLAG(x, from, to) (((x) & (from)) ? (to) : 0) 184 | #define PFLAGS_TO_PROT(x) \ 185 | (MAYBE_MAP_FLAG((x), PF_X, PROT_EXEC) | \ 186 | MAYBE_MAP_FLAG((x), PF_R, PROT_READ) | \ 187 | MAYBE_MAP_FLAG((x), PF_W, PROT_WRITE)) 188 | 189 | // holds a pre-computed hash for a string that is used in a GNU-style hash 190 | // tables and also keeps track of the string length. 191 | struct GnuHash { 192 | GnuHash(const char* name) { 193 | uint32_t h = 5381; 194 | const uint8_t* name_bytes = reinterpret_cast(name); 195 | #pragma unroll 8 196 | while (*name_bytes != 0) { 197 | h += (h << 5) + 198 | *name_bytes++; // h*33 + c = h + h * 32 + c = h + h << 5 + c 199 | } 200 | hash = h; 201 | name_len = reinterpret_cast(name_bytes) - name; 202 | } 203 | uint32_t hash; 204 | uint32_t name_len; 205 | }; 206 | 207 | // this is a special builtin in the libc++ API used for telling C++ execption 208 | // frame unwinding about functions loaded from a pathway other than the libc 209 | // loader. it is passed a pointer to where the EH_FRAME section was loaded, 210 | // which appears to include frame information relative to that address. 211 | extern "C" void __register_frame(void*); 212 | 213 | // Memory maps a file into the address space read-only, and manages the lifetime 214 | // of the mapping. Used in the loader to read in initial image, and to inspect 215 | // ELF files for dependencies before callling dlopen. 
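// CustomLibraryImpl keeps its MemFile mapping alive for the lifetime of the loaded
// library; register_debug_info() points __deploy_module_info at this mapping so the
// lldb script can recover the original ELF image when registering debug info.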
216 | struct MemFile { 217 | MemFile(const char* filename_) : fd_(0), mem_(nullptr), n_bytes_(0) { 218 | fd_ = open(filename_, O_RDONLY); 219 | DEPLOY_CHECK( 220 | fd_ != -1, "failed to open {}: {}", filename_, strerror(errno)); 221 | struct stat s; 222 | if (-1 == fstat(fd_, &s)) { 223 | close(fd_); // destructors don't run during exceptions 224 | DEPLOY_ERROR("failed to stat {}: {}", filename_, strerror(errno)); 225 | } 226 | n_bytes_ = s.st_size; 227 | mem_ = mmap(nullptr, n_bytes_, PROT_READ, MAP_SHARED, fd_, 0); 228 | if (MAP_FAILED == mem_) { 229 | close(fd_); 230 | DEPLOY_ERROR("failed to mmap {}: {}", filename_, strerror(errno)); 231 | } 232 | } 233 | MemFile(const MemFile&) = delete; 234 | const char* data() const { 235 | return (const char*)mem_; 236 | } 237 | ~MemFile() { 238 | if (mem_) { 239 | munmap((void*)mem_, n_bytes_); 240 | } 241 | if (fd_) { 242 | close(fd_); 243 | } 244 | } 245 | size_t size() { 246 | return n_bytes_; 247 | } 248 | int fd() const { 249 | return fd_; 250 | } 251 | 252 | private: 253 | int fd_; 254 | void* mem_; 255 | size_t n_bytes_; 256 | }; 257 | 258 | typedef void (*linker_dtor_function_t)(); 259 | typedef void (*linker_ctor_function_t)(int, const char**, char**); 260 | 261 | // https://refspecs.linuxfoundation.org/LSB_2.1.0/LSB-Core-generic/LSB-Core-generic/ehframehdr.html 262 | // note that eh_frame_ptr can be different types based on eh_frame_ptr_enc but 263 | // we only support one sepecific encoding that is stored in a int32_t and an 264 | // offset relative to the start of this struct. 265 | struct EH_Frame_HDR { 266 | char version; 267 | char eh_frame_ptr_enc; 268 | char fde_count_enc; 269 | char table_enc; 270 | int32_t eh_frame_ptr; 271 | }; 272 | 273 | // this is the libc++ function called to lookup thread local state. 274 | // It is passed a pointer to an object of the same shape as TLSEntry 275 | // with the module_id and offset. 276 | extern "C" void* __tls_get_addr(void*); 277 | 278 | extern "C" int __cxa_thread_atexit_impl( 279 | void (*dtor)(void*), 280 | void* obj, 281 | void* dso_symbol); 282 | 283 | struct CustomLibraryImpl; 284 | 285 | struct TLSMemory { 286 | TLSMemory(std::shared_ptr file, size_t size) 287 | : file_(std::move(file)), mem_(malloc(size)) {} 288 | std::shared_ptr file_; 289 | void* mem_; 290 | ~TLSMemory() { 291 | free(mem_); 292 | } 293 | }; 294 | 295 | static void delete_TLSMemory(void* obj) { 296 | delete ((TLSMemory*)obj); 297 | } 298 | 299 | // This object performs TLS emulation for modules not loaded by dlopen. 300 | // Normally modules have a module_id that is used as a key in libc for the 301 | // thread local data for that module. However, there is no public API for 302 | // assigning this module id. Instead, for modules that we load, we set module_id 303 | // to a pointer to a TLSSegment object, and replace __tls_get_addr with a 304 | // function that calls `addr`. 305 | 306 | // libc module_id's are sequential, so we use the top bit as a flag to see 307 | // if we have a local TLSegment object instead. This will break if 308 | // someone creates 2^63 sequential objects, but it is hard to imagine 309 | // a system with enough RAM to do that. 310 | constexpr size_t TLS_LOCAL_FLAG = (1ULL << 63); 311 | 312 | static void* local__tls_get_addr(TLSIndex* idx); 313 | 314 | /* LLDB puts a breakpoint in this function, and reads __deploy_module_info to 315 | * get debug info from library. 
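 * See extra/lldb.py: it sets the breakpoint, reads the name, file image, and
 * load_bias recorded in __deploy_module_info, writes the image out to a temporary
 * .so file, and then issues `target modules add` / `target modules load -s <load_bias>`.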
*/ 316 | __attribute__((noinline)) void __deploy_register_code() { 317 | std::cout << ""; // otherwise the breakpoint doesn't get hit, not sure if 318 | // there is a more stable way of doing this. 319 | }; 320 | 321 | struct DeployModuleInfo { 322 | const char* name; 323 | Elf64_Addr file_addr; 324 | size_t file_size; 325 | Elf64_Addr load_bias; 326 | }; 327 | 328 | extern "C" { 329 | DeployModuleInfo __deploy_module_info; 330 | } 331 | 332 | // RAII wrapper around dlopen 333 | struct SystemLibraryImpl : public SystemLibrary { 334 | SystemLibraryImpl(void* handle, bool steal) 335 | : handle_(handle), own_handle_(steal && handle != RTLD_DEFAULT) {} 336 | 337 | optional sym(const char* name) const override { 338 | void* r = dlsym(handle_, name); 339 | if (!r) { 340 | return nullopt; 341 | } 342 | return (Elf64_Addr)r; 343 | } 344 | 345 | optional tls_sym(const char* name) const override; 346 | 347 | ~SystemLibraryImpl() override { 348 | if (own_handle_) { 349 | dlclose(handle_); 350 | } 351 | } 352 | 353 | private: 354 | void* handle_; 355 | bool own_handle_; 356 | }; 357 | 358 | std::shared_ptr SystemLibrary::create(void* handle, bool steal) { 359 | return std::make_shared(handle, steal); 360 | } 361 | std::shared_ptr SystemLibrary::create( 362 | const char* path, 363 | int flags) { 364 | void* handle = dlopen(path, flags); 365 | return SystemLibrary::create(handle, handle != nullptr); 366 | } 367 | 368 | // reads DT_NEEDED and DT_RUNPATH from an unloaded elf file so we can sort out 369 | // dependencies before calling dlopen 370 | std::pair> load_needed_from_elf_file( 371 | const char* filename, 372 | const char* data) { 373 | auto header_ = (Elf64_Ehdr*)data; 374 | auto program_headers = (Elf64_Phdr*)(data + header_->e_phoff); 375 | auto n_program_headers = header_->e_phnum; 376 | const Elf64_Dyn* dynamic = nullptr; 377 | for (size_t i = 0; i < n_program_headers; ++i) { 378 | const Elf64_Phdr* phdr = &program_headers[i]; 379 | if (phdr->p_type == PT_DYNAMIC) { 380 | dynamic = reinterpret_cast(data + phdr->p_offset); 381 | break; 382 | } 383 | } 384 | DEPLOY_CHECK( 385 | dynamic, 386 | "{}: could not load dynamic section for looking up DT_NEEDED", 387 | filename); 388 | 389 | const char* runpath = ""; 390 | std::vector needed; 391 | 392 | auto segment_headers = (Elf64_Shdr*)(data + header_->e_shoff); 393 | size_t n_segments = header_->e_shnum; 394 | const char* strtab = nullptr; 395 | 396 | const char* segment_string_table = 397 | data + segment_headers[header_->e_shstrndx].sh_offset; 398 | 399 | for (size_t i = 0; i < n_segments; ++i) { 400 | const Elf64_Shdr* shdr = &segment_headers[i]; 401 | if (shdr->sh_type == SHT_STRTAB && 402 | strcmp(".dynstr", segment_string_table + shdr->sh_name) == 0) { 403 | strtab = data + shdr->sh_offset; 404 | break; 405 | } 406 | } 407 | 408 | DEPLOY_CHECK(strtab, "{}: could not load dynstr for DT_NEEDED", filename); 409 | 410 | for (const Elf64_Dyn* d = dynamic; d->d_tag != DT_NULL; ++d) { 411 | switch (d->d_tag) { 412 | case DT_NEEDED: 413 | // std::cout << "NEEDED: '" << strtab + d->d_un.d_val << "'\n"; 414 | needed.push_back(strtab + d->d_un.d_val); 415 | break; 416 | case DT_RPATH: /* not quite correct, because this is a different order 417 | than runpath, 418 | but better than not processing it at all */ 419 | case DT_RUNPATH: 420 | // std::cout << "RUNPATH: '" << strtab + d->d_un.d_val << "'\n"; 421 | runpath = strtab + d->d_un.d_val; 422 | break; 423 | } 424 | } 425 | return std::make_pair(runpath, std::move(needed)); 426 | } 427 | 428 | // common 
mechanism for reading the elf symbol table, 429 | // and other information in the PT_DYNAMIC segment. 430 | struct ElfDynamicInfo { 431 | std::string name_; 432 | const Elf64_Dyn* dynamic_; 433 | Elf64_Addr load_bias_; 434 | const Elf64_Sym* symtab_ = nullptr; 435 | const char* strtab_ = nullptr; 436 | size_t strtab_size_ = 0; 437 | Elf64_Rela* plt_rela_ = nullptr; 438 | size_t n_plt_rela_ = 0; 439 | Elf64_Rela* rela_ = nullptr; 440 | size_t n_rela_ = 0; 441 | linker_ctor_function_t init_func_ = nullptr; 442 | linker_ctor_function_t* init_array_ = nullptr; 443 | linker_dtor_function_t fini_func_ = nullptr; 444 | linker_dtor_function_t* fini_array_ = nullptr; 445 | size_t n_init_array_ = 0; 446 | size_t n_fini_array_ = 0; 447 | size_t gnu_nbucket_; 448 | uint32_t* gnu_bucket_ = nullptr; 449 | uint32_t* gnu_chain_; 450 | uint32_t gnu_maskwords_; 451 | uint32_t gnu_shift2_; 452 | Elf64_Addr* gnu_bloom_filter_; 453 | std::string runpath_; 454 | std::vector needed_; 455 | 456 | const char* get_string(int idx) { 457 | return strtab_ + idx; 458 | } 459 | 460 | void initialize_from_dynamic_section( 461 | std::string name, 462 | Elf64_Dyn* dynamic, 463 | Elf64_Addr load_bias, 464 | bool check_absolute) { 465 | name_ = std::move(name); 466 | load_bias_ = load_bias; 467 | dynamic_ = dynamic; 468 | for (const Elf64_Dyn* d = dynamic_; d->d_tag != DT_NULL; ++d) { 469 | void* addr = (check_absolute && d->d_un.d_ptr > load_bias_) 470 | ? reinterpret_cast(d->d_un.d_ptr) 471 | : reinterpret_cast(load_bias_ + d->d_un.d_ptr); 472 | auto value = d->d_un.d_val; 473 | 474 | switch (d->d_tag) { 475 | case DT_SYMTAB: 476 | symtab_ = (Elf64_Sym*)addr; 477 | break; 478 | case DT_STRTAB: 479 | strtab_ = (const char*)addr; 480 | break; 481 | 482 | case DT_STRSZ: 483 | strtab_size_ = value; 484 | break; 485 | 486 | case DT_JMPREL: 487 | plt_rela_ = (Elf64_Rela*)addr; 488 | break; 489 | case DT_PLTRELSZ: 490 | n_plt_rela_ = value / sizeof(Elf64_Rela); 491 | break; 492 | case DT_RELA: 493 | rela_ = (Elf64_Rela*)addr; 494 | break; 495 | case DT_RELASZ: 496 | n_rela_ = value / sizeof(Elf64_Rela); 497 | break; 498 | 499 | case DT_INIT: 500 | init_func_ = reinterpret_cast( 501 | load_bias_ + d->d_un.d_ptr); 502 | break; 503 | 504 | case DT_FINI: 505 | fini_func_ = reinterpret_cast( 506 | load_bias_ + d->d_un.d_ptr); 507 | break; 508 | 509 | case DT_INIT_ARRAY: 510 | init_array_ = reinterpret_cast( 511 | load_bias_ + d->d_un.d_ptr); 512 | break; 513 | 514 | case DT_INIT_ARRAYSZ: 515 | n_init_array_ = 516 | static_cast(d->d_un.d_val) / sizeof(Elf64_Addr); 517 | break; 518 | 519 | case DT_FINI_ARRAY: 520 | fini_array_ = reinterpret_cast( 521 | load_bias_ + d->d_un.d_ptr); 522 | break; 523 | 524 | case DT_FINI_ARRAYSZ: 525 | n_fini_array_ = 526 | static_cast(d->d_un.d_val) / sizeof(Elf64_Addr); 527 | break; 528 | 529 | case DT_HASH: 530 | break; 531 | 532 | case DT_GNU_HASH: { 533 | gnu_nbucket_ = reinterpret_cast(addr)[0]; 534 | // skip symndx 535 | gnu_maskwords_ = reinterpret_cast(addr)[2]; 536 | gnu_shift2_ = reinterpret_cast(addr)[3]; 537 | gnu_bloom_filter_ = 538 | reinterpret_cast((Elf64_Addr)addr + 16); 539 | gnu_bucket_ = 540 | reinterpret_cast(gnu_bloom_filter_ + gnu_maskwords_); 541 | // amend chain for symndx = header[1] 542 | gnu_chain_ = 543 | gnu_bucket_ + gnu_nbucket_ - reinterpret_cast(addr)[1]; 544 | --gnu_maskwords_; 545 | } break; 546 | } 547 | } 548 | 549 | if (!gnu_bucket_) { 550 | std::cout << fmt::format( 551 | "{}: warning, no DT_GNU_HASH found, symbol lookups on this module will not find anything.\n", 
552 | name_); 553 | } 554 | 555 | // pass 2 for things that require the strtab_ to be loaded 556 | for (const Elf64_Dyn* d = dynamic_; d->d_tag != DT_NULL; ++d) { 557 | switch (d->d_tag) { 558 | case DT_NEEDED: 559 | needed_.push_back(get_string(d->d_un.d_val)); 560 | break; 561 | case DT_RPATH: /* not quite correct, because this is a different order 562 | than runpath, 563 | but better than not processing it at all */ 564 | case DT_RUNPATH: 565 | runpath_ = get_string(d->d_un.d_val); 566 | break; 567 | } 568 | } 569 | } 570 | 571 | optional sym( 572 | const char* name, 573 | GnuHash* precomputed_hash = nullptr) const { 574 | if (!gnu_bucket_) { 575 | return nullopt; // no hashtable was loaded 576 | } 577 | GnuHash hash_obj = precomputed_hash ? *precomputed_hash : GnuHash(name); 578 | auto hash = hash_obj.hash; 579 | auto name_len = hash_obj.name_len; 580 | constexpr uint32_t kBloomMaskBits = sizeof(Elf64_Addr) * 8; 581 | 582 | const uint32_t word_num = (hash / kBloomMaskBits) & gnu_maskwords_; 583 | const Elf64_Addr bloom_word = gnu_bloom_filter_[word_num]; 584 | const uint32_t h1 = hash % kBloomMaskBits; 585 | const uint32_t h2 = (hash >> gnu_shift2_) % kBloomMaskBits; 586 | 587 | if ((1 & (bloom_word >> h1) & (bloom_word >> h2)) != 1) { 588 | return nullopt; 589 | } 590 | 591 | uint32_t sym_idx = gnu_bucket_[hash % gnu_nbucket_]; 592 | if (sym_idx == 0) { 593 | return nullopt; 594 | } 595 | 596 | uint32_t chain_value = 0; 597 | const Elf64_Sym* sym = nullptr; 598 | 599 | do { 600 | sym = symtab_ + sym_idx; 601 | chain_value = gnu_chain_[sym_idx]; 602 | if ((chain_value >> 1) == (hash >> 1)) { 603 | if (static_cast(sym->st_name) + name_len + 1 <= strtab_size_ && 604 | memcmp(strtab_ + sym->st_name, name, name_len + 1) == 0) { 605 | // found the matching entry, is it defined? 606 | if (sym->st_shndx != 0) { 607 | return sym->st_value + 608 | ((ELF64_ST_TYPE(sym->st_info) == STT_TLS) ? 0 : load_bias_); 609 | } 610 | // symbol isn't defined 611 | return nullopt; 612 | } 613 | } 614 | ++sym_idx; 615 | } while ((chain_value & 1) == 0); 616 | return nullopt; 617 | } 618 | }; 619 | 620 | // for resolving TLS offsets we need to look through 621 | // libc's already loaded libraries. We do not have the whole 622 | // ELF file mapped in this case just a pointer to the program headers and 623 | // the load_bias (offset in memory) where the library was loaded. 624 | struct AlreadyLoadedSymTable { 625 | private: 626 | ElfDynamicInfo dyninfo_; 627 | 628 | public: 629 | AlreadyLoadedSymTable( 630 | const char* name, 631 | Elf64_Addr load_bias, 632 | const Elf64_Phdr* program_headers, 633 | size_t n_program_headers) { 634 | Elf64_Dyn* dynamic = nullptr; 635 | for (size_t i = 0; i < n_program_headers; ++i) { 636 | const Elf64_Phdr* phdr = &program_headers[i]; 637 | 638 | // Segment addresses in memory. 
639 | Elf64_Addr seg_start = phdr->p_vaddr + load_bias; 640 | if (phdr->p_type == PT_DYNAMIC) { 641 | dynamic = reinterpret_cast(seg_start); 642 | break; 643 | } 644 | } 645 | DEPLOY_CHECK( 646 | dynamic, "%s: couldn't find PT_DYNAMIC in already loaded table.", name); 647 | dyninfo_.initialize_from_dynamic_section(name, dynamic, load_bias, true); 648 | } 649 | 650 | optional sym(const char* name) { 651 | return dyninfo_.sym(name); 652 | } 653 | }; 654 | static int iterate_cb(struct dl_phdr_info* info, size_t size, void* data) { 655 | auto fn = (std::function*)data; 656 | return (*fn)(info, size); 657 | } 658 | 659 | // we need to find a TLS offset / module_id pair for a symbol which we cannot do 660 | // with a normal dlsym call. Instead we iterate through all loaded libraries and 661 | // check their symbol tables for the symbol. The value of the symbol is the TLS 662 | // offset. When we find the library we also get the module id. 663 | optional slow_find_tls_symbol_offset(const char* sym_name) { 664 | optional result = nullopt; 665 | std::function cb = 666 | [&](struct dl_phdr_info* info, size_t size) { 667 | // std::cout << "SEARCHING .. " << info->dlpi_name << "\n"; 668 | AlreadyLoadedSymTable symtable( 669 | info->dlpi_name, 670 | info->dlpi_addr, 671 | info->dlpi_phdr, 672 | info->dlpi_phnum); 673 | auto sym_addr = symtable.sym(sym_name); 674 | if (sym_addr) { 675 | // std::cout << "FOUND IT IN: " << info->dlpi_name << " it has modid: 676 | // " << info->dlpi_tls_modid << "\n"; 677 | result = TLSIndex{info->dlpi_tls_modid, *sym_addr}; 678 | return 1; 679 | } 680 | return 0; 681 | }; 682 | 683 | dl_iterate_phdr(iterate_cb, (void*)&cb); 684 | return result; 685 | } 686 | 687 | optional SystemLibraryImpl::tls_sym(const char* name) const { 688 | if (!sym(name)) { 689 | return nullopt; // before we do a bunch of slow lookups to find the 690 | // module_id, check that this even defines the symbol 691 | } 692 | if (handle_ == RTLD_DEFAULT) { 693 | return slow_find_tls_symbol_offset(name); 694 | } 695 | 696 | struct link_map* lm = 0; 697 | DEPLOY_CHECK( 698 | 0 == dlinfo(handle_, RTLD_DI_LINKMAP, &lm), "failed to query dlinfo"); 699 | std::cout << "TLS dlinfo LOOKUP " << lm->l_name << " " << name << " " 700 | << "\n"; 701 | 702 | ElfDynamicInfo info; 703 | info.initialize_from_dynamic_section(lm->l_name, lm->l_ld, lm->l_addr, true); 704 | auto r = info.sym(name); 705 | if (r) { 706 | size_t module_id = 0; 707 | DEPLOY_CHECK( 708 | 0 == dlinfo(handle_, RTLD_DI_TLS_MODID, &module_id), 709 | "failed to query dlinfo for module_id"); 710 | return TLSIndex{module_id, *r}; 711 | } 712 | return nullopt; 713 | } 714 | 715 | // dlopen does not accept additional search paths as an argument. 716 | // however, normal DT_NEEDED library load inherits the runpath of parents. 717 | // So we need to pre-find all the libraries and call dlopen on them directly to 718 | // get the same behavior. We can find the dependencies by reading the libraries 719 | // dynamic section for recursive DT_NEEED entries. 
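// libraries:        output list; keeps every dependency we dlopen alive
// origin_relative:  path of the library whose dependencies are being resolved
//                   (used to expand $ORIGIN in runpath entries)
// search_path:      stack of directories inherited from parent libraries; entries
//                   pushed here are popped again before this function returns
// runpath_template: this library's DT_RUNPATH/DT_RPATH value (colon-separated,
//                   may contain $ORIGIN)
// needed:           the DT_NEEDED names to locate and open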
720 | void resolve_needed_libraries( 721 | std::vector>& libraries, 722 | const std::string& origin_relative, 723 | std::vector& search_path, 724 | const std::string& runpath_template, 725 | const std::vector& needed) { 726 | size_t search_path_start_size = search_path.size(); 727 | 728 | std::string origin = resolve_origin(origin_relative); 729 | std::vector paths = split_path(runpath_template, ':'); 730 | // backwards because we want paths to be search in order but we search 731 | // search_path backward 732 | for (size_t i = paths.size(); i > 0; --i) { 733 | search_path.emplace_back(resolve_path(origin, paths[i - 1])); 734 | } 735 | 736 | for (const char* name : needed) { 737 | // std::cout << "ATTEMPTING FIND " << name << "\n"; 738 | if (strcmp(name, "libtorch_python.so") == 0) { 739 | // torchvision expects it... 740 | continue; 741 | } 742 | // find the library, either (1) it is already loaded, 743 | // (2) it is an absolute path that exists, 744 | // (3) we find it in the search path 745 | // (4) we can dlopen it 746 | 747 | // (1) the library is already loaded 748 | const int base_flags = RTLD_LAZY | RTLD_LOCAL; 749 | void* handle = dlopen(name, base_flags | RTLD_NOLOAD); 750 | if (handle) { 751 | // std::cout << "ALREADY LOADED " << name << "\n"; 752 | libraries.emplace_back(SystemLibrary::create(handle, true)); 753 | continue; 754 | } 755 | 756 | std::string library_path = ""; 757 | // (2) it is an absolute path 758 | if (strchr(name, '/') != nullptr) { 759 | library_path = name; 760 | } else { 761 | // (3) find it in the search path 762 | for (size_t i = search_path.size(); i > 0; --i) { 763 | std::stringstream ss; 764 | ss << search_path[i - 1] << "/" << name; 765 | if (access(ss.str().c_str(), F_OK) == 0) { 766 | library_path = ss.str(); 767 | break; 768 | } 769 | } 770 | } 771 | 772 | std::vector> 773 | sublibraries; // these need to say loaded until we open library_path 774 | // otherwise we might dlclose a sublibrary 775 | 776 | if (library_path != "") { 777 | // std::cout << "LOOKING FOR SUBLIBRARIES FOR FILE AT PATH " << 778 | // library_path << "\n"; we found the actual file, recursively load its 779 | // deps before opening it so we resolve their paths correctly 780 | MemFile image(library_path.c_str()); 781 | auto search = 782 | load_needed_from_elf_file(library_path.c_str(), image.data()); 783 | resolve_needed_libraries( 784 | sublibraries, library_path, search_path, search.first, search.second); 785 | } else { 786 | library_path = name; 787 | } 788 | 789 | // either we didn't find the file, or we have already loaded its deps 790 | // in both cases, we now try to call dlopen. In the case where we didn't 791 | // find the file, we hope that something like LD_LIBRARY_PATH knows where it 792 | // is. In the case where we found it, we know its deps are loaded and 793 | // resolved. 
794 | 795 | // std::cout << "OPENING " << library_path << "\n"; 796 | handle = dlopen(library_path.c_str(), base_flags); 797 | DEPLOY_CHECK( 798 | handle, "{}: could not load library, dlopen says: {}", name, dlerror()); 799 | libraries.emplace_back(SystemLibrary::create(handle, true)); 800 | } 801 | 802 | // unwind search_path stack 803 | search_path.erase( 804 | search_path.begin() + search_path_start_size, search_path.end()); 805 | } 806 | 807 | extern "C" void* __dso_handle; 808 | 809 | struct CustomLibraryImpl 810 | : public std::enable_shared_from_this, 811 | public CustomLibrary { 812 | CustomLibraryImpl(const char* filename, int argc, const char** argv) 813 | : contents_(filename), 814 | mapped_library_(nullptr), 815 | name_(filename), 816 | argc_(argc), 817 | argv_(argv) { 818 | pthread_key_create(&tls_key_, nullptr); 819 | data_ = contents_.data(); 820 | header_ = (Elf64_Ehdr*)data_; 821 | program_headers_ = (Elf64_Phdr*)(data_ + header_->e_phoff); 822 | n_program_headers_ = header_->e_phnum; 823 | } 824 | void add_search_library(std::shared_ptr lib) override { 825 | symbol_search_path_.emplace_back(std::move(lib)); 826 | } 827 | void reserve_address_space() { 828 | Elf64_Addr min_vaddr, max_vaddr; 829 | mapped_size_ = phdr_table_get_load_size( 830 | program_headers_, n_program_headers_, &min_vaddr, &max_vaddr); 831 | mapped_library_ = mmap( 832 | nullptr, mapped_size_, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); 833 | load_bias_ = 834 | (const char*)mapped_library_ - reinterpret_cast(min_vaddr); 835 | } 836 | 837 | void load_segments() { 838 | // from bionic 839 | for (size_t i = 0; i < n_program_headers_; ++i) { 840 | const Elf64_Phdr* phdr = &program_headers_[i]; 841 | 842 | // Segment addresses in memory. 843 | Elf64_Addr seg_start = phdr->p_vaddr + load_bias_; 844 | Elf64_Addr seg_end = seg_start + phdr->p_memsz; 845 | 846 | switch (phdr->p_type) { 847 | case PT_DYNAMIC: 848 | dynamic_ = reinterpret_cast(seg_start); 849 | break; 850 | case PT_GNU_EH_FRAME: 851 | eh_frame_hdr_ = reinterpret_cast(seg_start); 852 | assert(eh_frame_hdr_->eh_frame_ptr_enc == 0x1b); 853 | eh_frame_ = 854 | (void*)((int64_t)&eh_frame_hdr_->eh_frame_ptr + eh_frame_hdr_->eh_frame_ptr); 855 | break; 856 | case PT_TLS: 857 | tls_file_size_ = phdr->p_filesz; 858 | tls_mem_size_ = phdr->p_memsz; 859 | tls_initalization_image_ = (void*)seg_start; 860 | break; 861 | }; 862 | 863 | if (phdr->p_type != PT_LOAD) { 864 | continue; 865 | } 866 | 867 | Elf64_Addr seg_page_start = PAGE_START(seg_start); 868 | Elf64_Addr seg_page_end = PAGE_END(seg_end); 869 | 870 | Elf64_Addr seg_file_end = seg_start + phdr->p_filesz; 871 | 872 | // File offsets. 
873 | Elf64_Addr file_start = phdr->p_offset; 874 | Elf64_Addr file_end = file_start + phdr->p_filesz; 875 | 876 | Elf64_Addr file_page_start = PAGE_START(file_start); 877 | Elf64_Addr file_length = file_end - file_page_start; 878 | 879 | if (contents_.size() <= 0) { 880 | DEPLOY_ERROR( 881 | "\"{}\" invalid file size: {}", name_.c_str(), contents_.size()); 882 | } 883 | 884 | if (file_end > contents_.size()) { 885 | DEPLOY_ERROR( 886 | "invalid ELF file \"{}\" load segment[{}]:" 887 | " p_offset ({}) + p_filesz ({}) ( = {}) past end of file " 888 | "({})", 889 | name_.c_str(), 890 | i, 891 | reinterpret_cast(phdr->p_offset), 892 | reinterpret_cast(phdr->p_filesz), 893 | reinterpret_cast(file_end), 894 | contents_.size()); 895 | } 896 | 897 | if (file_length != 0) { 898 | int prot = PFLAGS_TO_PROT(phdr->p_flags); 899 | 900 | void* seg_addr = mmap64( 901 | reinterpret_cast(seg_page_start), 902 | file_length, 903 | prot, 904 | MAP_FIXED | MAP_PRIVATE, 905 | contents_.fd(), 906 | file_page_start); 907 | if (seg_addr == MAP_FAILED) { 908 | DEPLOY_ERROR( 909 | "couldn't map \"{}\" segment {}: {}", 910 | name_.c_str(), 911 | i, 912 | strerror(errno)); 913 | } 914 | } 915 | 916 | // if the segment is writable, and does not end on a page boundary, 917 | // zero-fill it until the page limit. 918 | if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0) { 919 | memset( 920 | reinterpret_cast(seg_file_end), 921 | 0, 922 | PAGE_SIZE - PAGE_OFFSET(seg_file_end)); 923 | } 924 | 925 | seg_file_end = PAGE_END(seg_file_end); 926 | 927 | // seg_file_end is now the first page address after the file 928 | // content. If seg_end is larger, we need to zero anything 929 | // between them. This is done by using a private anonymous 930 | // map for all extra pages. 
931 | if (seg_page_end > seg_file_end) { 932 | size_t zeromap_size = seg_page_end - seg_file_end; 933 | void* zeromap = mmap( 934 | reinterpret_cast(seg_file_end), 935 | zeromap_size, 936 | PFLAGS_TO_PROT(phdr->p_flags), 937 | MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE, 938 | -1, 939 | 0); 940 | if (zeromap == MAP_FAILED) { 941 | DEPLOY_ERROR( 942 | "couldn't zero fill \"{}\" gap: {}", 943 | name_.c_str(), 944 | strerror(errno)); 945 | } 946 | } 947 | } 948 | } 949 | size_t module_id() const { 950 | size_t this_as_number = (size_t)this; 951 | return this_as_number | TLS_LOCAL_FLAG; 952 | } 953 | 954 | void read_dynamic_section() { 955 | dyninfo_.initialize_from_dynamic_section( 956 | name_, dynamic_, load_bias_, false); 957 | std::vector empty_search_path; 958 | resolve_needed_libraries( 959 | symbol_search_path_, 960 | name_, 961 | empty_search_path, 962 | dyninfo_.runpath_, 963 | dyninfo_.needed_); 964 | } 965 | 966 | optional lookup_symbol(Elf64_Xword r_info) { 967 | const uint32_t r_type = ELF64_R_TYPE(r_info); 968 | const uint32_t r_sym = ELF64_R_SYM(r_info); 969 | 970 | if (r_sym == 0) { 971 | return (Elf64_Addr)0; 972 | } 973 | auto sym_st = dyninfo_.symtab_[r_sym]; 974 | const char* sym_name = dyninfo_.get_string(sym_st.st_name); 975 | if (r_type == R_X86_64_JUMP_SLOT && 976 | strcmp(sym_name, "__tls_get_addr") == 0) { 977 | return (Elf64_Addr)local__tls_get_addr; 978 | } 979 | for (const auto& sys_lib : symbol_search_path_) { 980 | auto r = sys_lib->sym(sym_name); 981 | if (r) { 982 | return r; 983 | } 984 | } 985 | auto r = sym(sym_name); 986 | if (r) { 987 | return r; 988 | } 989 | if (ELF64_ST_BIND(sym_st.st_info) != STB_WEAK) { 990 | DEPLOY_ERROR( 991 | "{}: '{}' symbol not found in ElfFile lookup", 992 | name_.c_str(), 993 | sym_name); 994 | } 995 | return nullopt; 996 | } 997 | 998 | optional tls_lookup_symbol(Elf64_Xword r_info) { 999 | const uint32_t r_type = ELF64_R_TYPE(r_info); 1000 | const uint32_t r_sym = ELF64_R_SYM(r_info); 1001 | 1002 | if (r_sym == 0) { 1003 | return TLSIndex{ 1004 | module_id(), 1005 | 0}; // note: offset is not queried when the symbol is blank 1006 | } 1007 | 1008 | auto sym_st = dyninfo_.symtab_[r_sym]; 1009 | const char* sym_name = dyninfo_.get_string(sym_st.st_name); 1010 | for (const auto& sys_lib : symbol_search_path_) { 1011 | auto r = sys_lib->tls_sym(sym_name); 1012 | if (r) { 1013 | return r; 1014 | } 1015 | } 1016 | auto r = tls_sym(sym_name); 1017 | if (r) { 1018 | return r; 1019 | } 1020 | 1021 | if (ELF64_ST_BIND(sym_st.st_info) != STB_WEAK) { 1022 | DEPLOY_ERROR( 1023 | "{}: '{}' symbol not found in ElfFile lookup", 1024 | name_.c_str(), 1025 | sym_name); 1026 | } 1027 | return nullopt; 1028 | } 1029 | 1030 | void relocate_one(const Elf64_Rela& reloc) { 1031 | const uint32_t r_type = ELF64_R_TYPE(reloc.r_info); 1032 | 1033 | if (r_type == 0) { 1034 | return; 1035 | } 1036 | 1037 | void* const rel_target = 1038 | reinterpret_cast(reloc.r_offset + load_bias_); 1039 | 1040 | // TLS relocations need to lookup symbols differently so we can get the 1041 | // module_id 1042 | if (r_type == R_X86_64_DTPMOD64 || r_type == R_X86_64_DTPOFF64) { 1043 | auto tls_index = tls_lookup_symbol(reloc.r_info); 1044 | if (!tls_index) { 1045 | return; // skip weak relocation that wasn't found 1046 | } 1047 | switch (r_type) { 1048 | case R_X86_64_DTPMOD64: 1049 | *static_cast(rel_target) = tls_index->module_id; 1050 | break; 1051 | case R_X86_64_DTPOFF64: 1052 | *static_cast(rel_target) = 1053 | tls_index->offset + reloc.r_addend; 1054 | break; 1055 | } 
1056 | return; 1057 | } 1058 | 1059 | auto sym_addr = lookup_symbol(reloc.r_info); 1060 | if (!sym_addr) { 1061 | return; // skip weak relocation that wasn't found 1062 | } 1063 | 1064 | switch (r_type) { 1065 | case R_X86_64_JUMP_SLOT: 1066 | case R_X86_64_64: 1067 | case R_X86_64_GLOB_DAT: { 1068 | const Elf64_Addr result = *sym_addr + reloc.r_addend; 1069 | *static_cast(rel_target) = result; 1070 | } break; 1071 | case R_X86_64_RELATIVE: { 1072 | // In practice, r_sym is always zero, but if it weren't, the linker 1073 | // would still look up the referenced symbol (and abort if the symbol 1074 | // isn't found), even though it isn't used. 1075 | const Elf64_Addr result = load_bias_ + reloc.r_addend; 1076 | *static_cast(rel_target) = result; 1077 | } break; 1078 | case R_X86_64_32: { 1079 | const Elf32_Addr result = *sym_addr + reloc.r_addend; 1080 | *static_cast(rel_target) = result; 1081 | } break; 1082 | case R_X86_64_PC32: { 1083 | const Elf64_Addr target = *sym_addr + reloc.r_addend; 1084 | const Elf64_Addr base = reinterpret_cast(rel_target); 1085 | const Elf32_Addr result = target - base; 1086 | *static_cast(rel_target) = result; 1087 | } break; 1088 | default: 1089 | DEPLOY_ERROR("unknown reloc type {} in \"{}\"", r_type, name_.c_str()); 1090 | break; 1091 | } 1092 | } 1093 | 1094 | void relocate() { 1095 | for (size_t i = 0; i < dyninfo_.n_rela_; ++i) { 1096 | relocate_one(dyninfo_.rela_[i]); 1097 | } 1098 | for (size_t i = 0; i < dyninfo_.n_plt_rela_; ++i) { 1099 | relocate_one(dyninfo_.plt_rela_[i]); 1100 | } 1101 | } 1102 | 1103 | void initialize() { 1104 | call_function(dyninfo_.init_func_); 1105 | for (size_t i = 0; i < dyninfo_.n_init_array_; ++i) { 1106 | call_function(dyninfo_.init_array_[i]); 1107 | } 1108 | initialized_ = true; 1109 | } 1110 | 1111 | void finalize() { 1112 | for (size_t i = dyninfo_.n_fini_array_; i > 0; --i) { 1113 | call_function(dyninfo_.fini_array_[i - 1]); 1114 | } 1115 | call_function(dyninfo_.fini_func_); 1116 | } 1117 | 1118 | void register_debug_info() { 1119 | // std::cout << "target modules add " << name_.c_str() << "\n"; 1120 | // std::cout << "target modules load -f " << name_.c_str() << " -s " 1121 | // << std::hex << "0x" << load_bias_ << "\n"; 1122 | __deploy_module_info.name = name_.c_str(); 1123 | __deploy_module_info.file_addr = (Elf64_Addr)contents_.data(); 1124 | __deploy_module_info.file_size = contents_.size(); 1125 | __deploy_module_info.load_bias = load_bias_; 1126 | // debugger script sets a breakpoint on this function, 1127 | // then reads __deploy_module_info to issue the target module commands. 
1128 |     __deploy_register_code();
1129 |   }
1130 | 
1131 |   void load() {
1132 |     reserve_address_space();
1133 |     load_segments();
1134 |     read_dynamic_section();
1135 |     relocate();
1136 |     __register_frame(eh_frame_);
1137 |     register_debug_info();
1138 |     initialize();
1139 |   }
1140 | 
1141 |   ~CustomLibraryImpl() {
1142 |     // std::cout << "LINKER IS UNLOADING: " << name_ << "\n";
1143 |     if (initialized_) {
1144 |       finalize();
1145 |     }
1146 |     if (mapped_library_) {
1147 |       munmap(mapped_library_, mapped_size_);
1148 |     }
1149 |   }
1150 |   void call_function(linker_dtor_function_t f) {
1151 |     if (f == nullptr || (int64_t)f == -1)
1152 |       return;
1153 |     f();
1154 |   }
1155 |   void call_function(linker_ctor_function_t f) {
1156 |     if (f == nullptr || (int64_t)f == -1)
1157 |       return;
1158 |     f(argc_, argv_, environ);
1159 |   }
1160 | 
1161 |   optional<Elf64_Addr> sym(const char* name) const override {
1162 |     return dyninfo_.sym(name);
1163 |   }
1164 | 
1165 |   optional<TLSIndex> tls_sym(const char* name) const override {
1166 |     auto r = dyninfo_.sym(name);
1167 |     if (r) {
1168 |       return TLSIndex{module_id(), *r};
1169 |     }
1170 |     return nullopt;
1171 |   }
1172 | 
1173 |   void* tls_addr(size_t offset) {
1174 |     // this was a TLS entry for one of our modules, so we use pthreads to
1175 |     // emulate thread local state.
1176 |     void* start = pthread_getspecific(tls_key_);
1177 |     if (!start) {
1178 |       auto tls_mem = new TLSMemory(shared_from_this(), tls_mem_size_);
1179 |       __cxa_thread_atexit_impl(delete_TLSMemory, tls_mem, &__dso_handle);
1180 |       start = tls_mem->mem_;
1181 |       memcpy(start, tls_initalization_image_, tls_file_size_);
1182 |       memset(
1183 |           (void*)((const char*)start + tls_file_size_),
1184 |           0,
1185 |           tls_mem_size_ - tls_file_size_);
1186 |       pthread_setspecific(tls_key_, start);
1187 |     }
1188 |     return (void*)((const char*)start + offset);
1189 |   }
1190 | 
1191 |  private:
1192 |   MemFile contents_;
1193 |   const char* data_;
1194 |   const Elf64_Ehdr* header_;
1195 |   const Elf64_Phdr* program_headers_;
1196 |   const EH_Frame_HDR* eh_frame_hdr_;
1197 |   void* eh_frame_;
1198 |   size_t n_program_headers_;
1199 |   void* mapped_library_;
1200 |   size_t mapped_size_;
1201 |   Elf64_Addr load_bias_;
1202 |   Elf64_Dyn* dynamic_;
1203 |   ElfDynamicInfo dyninfo_;
1204 |   std::string name_;
1205 |   int argc_;
1206 |   const char** argv_;
1207 |   bool initialized_ = false;
1208 | 
1209 |   pthread_key_t tls_key_;
1210 |   void* tls_initalization_image_;
1211 |   size_t tls_file_size_;
1212 |   size_t tls_mem_size_;
1213 | 
1214 |   std::vector<std::shared_ptr<SymbolProvider>> symbol_search_path_;
1215 | };
1216 | 
1217 | std::shared_ptr<CustomLibrary> CustomLibrary::create(
1218 |     const char* filename,
1219 |     int argc,
1220 |     const char** argv) {
1221 |   return std::make_shared<CustomLibraryImpl>(filename, argc, argv);
1222 | }
1223 | 
1224 | static void* local__tls_get_addr(TLSIndex* idx) {
1225 |   if ((idx->module_id & TLS_LOCAL_FLAG) != 0) {
1226 |     return ((CustomLibraryImpl*)(idx->module_id & ~TLS_LOCAL_FLAG))
1227 |         ->tls_addr(idx->offset);
1228 |   }
1229 |   return __tls_get_addr(idx);
1230 | }
1231 | 
1232 | } // namespace loader
--------------------------------------------------------------------------------
/src/loader.h:
--------------------------------------------------------------------------------
1 | #pragma once
2 | #include <dlfcn.h>
3 | #include <elf.h>
4 | #include <memory>
5 | #include <optional>
6 | 
7 | namespace loader {
8 | 
9 | using std::optional;
10 | using std::nullopt;
11 | 
12 | struct DeployLinkerError : public std::runtime_error {
13 |   using std::runtime_error::runtime_error;
14 | };
15 | 
16 | struct TLSIndex {
17 |   size_t module_id; // if module_id & TLS_LOCAL_FLAG, then module_id &
18 |                     // ~TLS_LOCAL_FLAG is a TLSMemory*;
19 |   size_t offset;
20 | };
21 | 
22 | struct SymbolProvider {
23 |   SymbolProvider() {}
24 |   virtual optional<Elf64_Addr> sym(const char* name) const = 0;
25 |   virtual optional<TLSIndex> tls_sym(const char* name) const = 0;
26 |   SymbolProvider(const SymbolProvider&) = delete;
27 |   SymbolProvider& operator=(const SymbolProvider&) = delete;
28 |   virtual ~SymbolProvider() {}
29 | };
30 | 
31 | // RAII wrapper around dlopen
32 | struct SystemLibrary : public SymbolProvider {
33 |   // create a wrapper around an existing handle returned from dlopen
34 |   // if steal == true, then this will dlclose the handle when it is destroyed.
35 |   static std::shared_ptr<SystemLibrary> create(
36 |       void* handle = RTLD_DEFAULT,
37 |       bool steal = false);
38 |   static std::shared_ptr<SystemLibrary> create(const char* path, int flags);
39 | };
40 | 
41 | struct CustomLibrary : public SymbolProvider {
42 |   static std::shared_ptr<CustomLibrary> create(
43 |       const char* filename,
44 |       int argc = 0,
45 |       const char** argv = nullptr);
46 |   virtual void add_search_library(std::shared_ptr<SymbolProvider> lib) = 0;
47 |   virtual void load() = 0;
48 | };
49 | 
50 | using SystemLibraryPtr = std::shared_ptr<SystemLibrary>;
51 | using CustomLibraryPtr = std::shared_ptr<CustomLibrary>;
52 | } // namespace loader
--------------------------------------------------------------------------------
/src/main.cpp:
--------------------------------------------------------------------------------
1 | #include "loader.h"
2 | #include <dlfcn.h>
3 | #include <iostream>
4 | #include <thread>
5 | 
6 | using namespace loader;
7 | 
8 | struct PythonAPI {
9 |   PythonAPI() {
10 |     auto global = SystemLibrary::create();
11 |     find_shared_function_ = CustomLibrary::create("libfind_shared_function.so");
12 |     find_shared_function_->add_search_library(global);
13 |     find_shared_function_->load();
14 | 
15 |     python_ = CustomLibrary::create(PYTHON_SO_PATH);
16 |     python_->add_search_library(find_shared_function_);
17 |     python_->add_search_library(global);
18 |     python_->load();
19 | 
20 |     auto find_shared_python_ref = (CustomLibraryPtr*) find_shared_function_->sym("the_python_library").value();
21 |     *find_shared_python_ref = python_;
22 | 
23 |     python_runner_ = CustomLibrary::create("libpython_runner.so");
24 |     python_runner_->add_search_library(python_);
25 |     python_runner_->add_search_library(global);
26 |     python_runner_->load();
27 |   }
28 |   void run(const char* code) {
29 |     auto run = (void(*)(const char* code)) python_runner_->sym("run").value();
30 |     run(code);
31 |   }
32 |   CustomLibraryPtr find_shared_function_;
33 |   CustomLibraryPtr python_;
34 |   CustomLibraryPtr python_runner_;
35 | };
36 | 
37 | auto example_src = R"end(
38 | print("I think None is", id(None))
39 | from time import time
40 | 
41 | def fib(x):
42 |     if x <= 1:
43 |         return 1
44 |     return fib(x - 1) + fib(x - 2)
45 | 
46 | def do_fib():
47 |     s = time()
48 |     fib(30)
49 |     e = time()
50 |     print(e - s)
51 | 
52 | )end";
53 | 
54 | auto run_numpy = R"end(
55 | import numpy as np
56 | print(np.arange(10)*10)
57 | )end";
58 | 
59 | int main(int argc, const char **argv) {
60 |   PythonAPI a;
61 |   PythonAPI b;
62 |   a.run(example_src);
63 |   b.run(example_src);
64 | 
65 |   std::cout << "fib(30) for single interpreter\n";
66 |   std::thread t0([&]{
67 |     a.run("do_fib()");
68 |   });
69 |   std::thread t1([&]{
70 |     a.run("do_fib()");
71 |   });
72 |   t0.join();
73 |   t1.join();
74 | 
75 |   std::cout << "fib(30) for 2 interpreters\n";
76 |   std::thread t2([&]{
77 |     a.run("do_fib()");
78 |   });
79 |   std::thread t3([&]{
80 |     b.run("do_fib()");
81 |   });
82 |   t2.join();
83 |   t3.join();
84 | 
85 |   a.run("import regex");
86 |   std::thread t4([&]{
87 |     a.run(run_numpy);
88 |   });
89 |   std::thread t5([&]{
90 |     b.run(run_numpy);
91 |   });
92 |   t4.join();
93 |   t5.join();
94 | 
95 | }
--------------------------------------------------------------------------------
/src/python_runner.cpp:
--------------------------------------------------------------------------------
1 | #include <Python.h>
2 | #include <pybind11/eval.h>
3 | #include <pybind11/pybind11.h>
4 | #include <pybind11/embed.h> // everything needed for embedding
5 | #include <iostream>
6 | 
7 | namespace py = pybind11;
8 | 
9 | 
10 | struct PythonGuard {
11 |   PythonGuard() {
12 |     Py_Initialize();
13 |     // this has to occur on the thread that calls finalize
14 |     // otherwise 'assert tlock.locked()' fails
15 |     py::exec("import threading");
16 |     // release GIL after startup, we will acquire on each call to run
17 |     PyEval_SaveThread();
18 |   }
19 |   ~PythonGuard() {
20 |     PyGILState_Ensure();
21 |     Py_Finalize();
22 |   }
23 | };
24 | 
25 | static PythonGuard runner;
26 | 
27 | extern "C" void run(const char * code) {
28 |   py::gil_scoped_acquire guard_;
29 |   py::exec(code);
30 | }
--------------------------------------------------------------------------------
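
For reference, `loader.h` above is the entire public surface of the custom loader, and `main.cpp` shows the intended call pattern: wrap the process's existing symbols in a `SystemLibrary`, create a `CustomLibrary`, add search libraries, call `load()`, then resolve symbols with `sym()`. The sketch below is a minimal host program written against that interface; the library name `libexample.so` and its exported C function `hello` are hypothetical placeholders, not files from this repository.

```cpp
// Minimal sketch of driving the API declared in src/loader.h.
// "libexample.so" and its exported C symbol "hello" are hypothetical
// placeholders; substitute any shared object with an exported function.
#include "loader.h"

int main(int argc, const char** argv) {
  using namespace loader;

  // Wrap the symbols already visible in this process (libc, libstdc++, ...).
  auto global = SystemLibrary::create();

  // Load a fresh, private copy of the library with the custom loader.
  auto lib = CustomLibrary::create("libexample.so", argc, argv);
  lib->add_search_library(global); // resolve its undefined symbols against the process
  lib->load();                     // map segments, relocate, run initializers

  // Look up an exported function by name and call it.
  auto hello = (void (*)())lib->sym("hello").value();
  hello();
  return 0;
}
```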