├── README.md ├── pipe.md └── tail.md /README.md: -------------------------------------------------------------------------------- 1 | # Customizing Python 2 | 3 | A series of step-by-step tutorials on customizing CPython. Each focuses on understanding CPython internals and highlights different parts of the interpreter. 4 | 5 | ## Pipe operator 6 | We will add an `|>` operator to the CPython interpreter to perform bash-like chained calls. 7 | 8 | ```python 9 | >>> [1,2] |> map(lambda x:x*2) |> list() 10 | 11 | [2, 4] 12 | ``` 13 | 14 | The tutorial covers Python grammar, parsing, and compilation to bytecode. 15 | 16 | **[Read the tutorial](/pipe.md)** 17 | 18 | ## Tail call optimization 19 | The goal is to implement tail-call optimization in the Python interpreter. 20 | ```python 21 | >>> def u(i): 22 | if i == 0: 23 | return 'a' 24 | return u(i-1) 25 | 26 | >>> u(10000) 27 | 'a' 28 | ``` 29 | The tutorial covers creating a new bytecode instruction, flowgraph optimization, and evaluation of Python programs. 30 | 31 | **[Read the tutorial](/tail.md)** 32 | 33 | --- 34 | Pavel Kolchanov, 2024 35 | -------------------------------------------------------------------------------- /pipe.md: -------------------------------------------------------------------------------- 1 | > [!WARNING] 2 | > I'm new to Python internals, so the tutorial may contain mistakes and clunky solutions. 3 | 4 | # Pipe operator 5 | 6 | The goal is to add an operator to CPython that performs bash-like chained calls: 7 | ```python 8 | >>> [1,2] |> map(lambda x:x*2) |> list() 9 | 10 | [2, 4] 11 | ``` 12 | 13 | ## Plan 14 | 15 | According to the [CPython devguide](https://devguide.python.org/internals/compiler/), the compilation of source code involves several steps: 16 | 17 | > 1. Tokenize the source code 18 | > 19 | > 2. Parse the stream of tokens into an Abstract Syntax Tree. 20 | > 21 | > 3. Transform the AST into an instruction sequence. 22 | > 23 | > 4.
Construct a Control Flow Graph and apply optimizations to it. 24 | > 25 | > 5. Emit bytecode based on the Control Flow Graph. 26 | 27 | To implement `|>`, we are going to change the first three steps: modify the parsing and compilation processes. 28 | 29 | ## Prerequisites 30 | Clone the CPython repo and check out a new branch. 31 | ```bash 32 | $ git clone git@github.com:python/cpython.git && cd cpython 33 | $ git checkout tags/v3.12.1 -b pipes 34 | ``` 35 | 36 | 37 | ## Parsing 38 | 39 | #### Tokenization 40 | Let's start with tokenization. Tokenization is the process of splitting an input string into a sequence of tokens. 41 | 42 | Add a new `PIPE` token to `Grammar/Tokens`: 43 | ```diff 44 | ELLIPSIS '...' 45 | COLONEQUAL ':=' 46 | EXCLAMATION '!' 47 | +PIPE '|>' 48 | ``` 49 | 50 | Next, run `make regen-token` to regenerate `pycore_token.h`, `Parser/token.c`, and `Lib/token.py`. 51 | 52 | For example, look at the changes in `Parser/token.c`. `regen-token` added a switch case to the parser code. 53 | 54 | ```diff 55 | case '|': 56 | switch (c2) { 57 | case '=': return VBAREQUAL; 58 | + case '>': return PIPE; 59 | } 60 | break; 61 | } 62 | ``` 63 | #### ASDL 64 | 65 | Next, move to `Parser/Python.asdl`. ASDL is the blueprint for Python AST nodes: each AST node is defined there. 66 | 67 | Let's add a new node, the `PipeOp` expression. It contains two children: the left and right expressions. 68 | 69 | ```diff 70 | expr = BoolOp(boolop op, expr* values) 71 | | NamedExpr(expr target, expr value) 72 | | BinOp(expr left, operator op, expr right) 73 | + | PipeOp(expr left, expr right) 74 | | UnaryOp(unaryop op, expr operand) 75 | | Lambda(arguments args, expr body) 76 | | IfExp(expr test, expr body, expr orelse) 77 | ``` 78 | 79 | Then run `make regen-ast` to regenerate `pycore_ast.h` and `Python/Python-ast.c`. 80 | 81 | Look at the changes in `Python/Python-ast.c`. `regen-ast` generated the constructor function for the pipe operator expression.
82 | ```c 83 | expr_ty 84 | _PyAST_PipeOp(expr_ty left, expr_ty right, int lineno, int col_offset, int 85 | end_lineno, int end_col_offset, PyArena *arena) 86 | { 87 | expr_ty p; 88 | if (!left) { 89 | PyErr_SetString(PyExc_ValueError, 90 | "field 'left' is required for PipeOp"); 91 | return NULL; 92 | } 93 | if (!right) { 94 | PyErr_SetString(PyExc_ValueError, 95 | "field 'right' is required for PipeOp"); 96 | return NULL; 97 | } 98 | p = (expr_ty)_PyArena_Malloc(arena, sizeof(*p)); 99 | if (!p) 100 | return NULL; 101 | p->kind = PipeOp_kind; 102 | p->v.PipeOp.left = left; 103 | p->v.PipeOp.right = right; 104 | p->lineno = lineno; 105 | p->col_offset = col_offset; 106 | p->end_lineno = end_lineno; 107 | p->end_col_offset = end_col_offset; 108 | return p; 109 | } 110 | ``` 111 | 112 | #### Grammar 113 | 114 | Now let's move to `Parser/python.gram`. 115 | 116 | This file contains the whole Python grammar. In short: it describes how to construct an abstract syntax tree using the grammar rules. 117 | 118 | Add a `pipe_op` expression definition. 119 | 120 | ```diff 121 | 122 | power[expr_ty]: 123 | | a=await_primary '**' b=factor { _PyAST_BinOp(a, Pow, b, EXTRA) } 124 | + | pipe_op 125 | + 126 | +pipe_op[expr_ty]: 127 | + | a=pipe_op '|>' b=await_primary { _PyAST_PipeOp(a, b, EXTRA) } 128 | | await_primary 129 | ``` 130 | 131 | It means: "When an `await_primary` is matched after a `|>` token, construct an AST node using the `_PyAST_PipeOp` function." 132 | 133 | Then run `make regen-pegen` to regenerate `Parser/parser.c`. 134 | 135 | #### Parsing from tokens to AST 136 | 137 | The whole AST parser is already generated by `regen-pegen`. 138 | 139 | We only need to update AST validation. Add a case for `PipeOp_kind` to `Python/ast.c/validate_expr`.
140 | 141 | ```diff 142 | + case PipeOp_kind: 143 | + if (exp->v.PipeOp.right->kind != Call_kind) { 144 | + PyErr_SetString(PyExc_TypeError, 145 | + "Pipe op arg must be a function call"); 146 | + return 0; 147 | + } 148 | + ret = validate_expr(state, exp->v.PipeOp.left, Load) && 149 | + validate_expr(state, exp->v.PipeOp.right, Load); 150 | + break; 151 | ``` 152 | 153 | 154 | #### Test parsing to AST 155 | 156 | Recompile CPython with `make -j4` and test the parser with the `ast` module: 157 | 158 | ```python 159 | >>> import ast 160 | >>> tree = ast.parse('1 |> f()') 161 | >>> ast.dump(tree) 162 | 163 | "Module(body=[Expr(value=PipeOp(left=Constant(value=1), right=Call(func=Name(id='f', ctx=Load()), args=[], keywords=[])))], type_ignores=[])" 164 | ``` 165 | 166 | Parsing is working, so we can move to the compilation part. 167 | 168 | ## Compilation from AST to bytecode 169 | 170 | The next stage is compilation. Now we need to translate the AST into a sequence of commands for the Python VM. 171 | 172 | #### Target bytecode 173 | 174 | `a |> f(b)` is another way of saying `f(b, a)`: the piped value is appended as the call's last argument, which is what makes signatures like `map(func, iterable)` line up. 175 | 176 | Let's examine how a two-argument call `f(a, b)` is compiled into bytecode instructions using the `dis` module: 177 | 178 | ```python 179 | >>> import dis 180 | >>> dis.dis("f(a, b)") 181 | 0 0 RESUME 0 182 | 183 | 1 2 PUSH_NULL 184 | 4 LOAD_NAME 0 (f) 185 | 6 LOAD_NAME 1 (a) 186 | 8 LOAD_NAME 2 (b) 187 | 10 CALL 2 188 | 18 RETURN_VALUE 189 | 190 | ``` 191 | 192 | These instructions are telling the interpreter to: 193 | 194 | 1. Load the function onto the stack using `LOAD_NAME` 195 | 2. Load the value of `a` onto the stack using `LOAD_NAME` 196 | 3. Load the value of `b` onto the stack using `LOAD_NAME` 197 | 4. Call the function using `CALL` with `2` arguments. 198 | 199 | So, to implement the pipe, we need to push an additional argument onto the stack, between the function load and the function call, during compilation.
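The rewrite we are about to wire into the compiler can be sketched in plain Python first. The `pipe` helper below is hypothetical (it is not part of the patch, just an illustration); it appends the piped value as the last positional argument, exactly like the bytecode we will emit:

```python
def pipe(value, func, *args):
    # value |> func(*args)  desugars to  func(*args, value)
    return func(*args, value)

doubled = pipe([1, 2], map, lambda x: x * 2)  # map(lambda x: x * 2, [1, 2])
print(pipe(doubled, list))  # [2, 4]
```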
200 | 201 | #### Compile.c 202 | The starting point of compilation is the `Python/compile.c` file. 203 | 204 | Let's look at `Python/compile.c/compiler_visit_expr1`. 205 | 206 | The function compiles an `expr_ty` AST node with a simple switch. For example, to compile a binary operation like `a + b`, the compiler recursively visits the left and right nodes, and adds the binary operation to the instruction sequence. 207 | 208 | ```c 209 | static int 210 | compiler_visit_expr1(struct compiler *c, expr_ty e) 211 | { 212 | location loc = LOC(e); 213 | switch (e->kind) { 214 | case NamedExpr_kind: 215 | VISIT(c, expr, e->v.NamedExpr.value); 216 | ADDOP_I(c, loc, COPY, 1); 217 | VISIT(c, expr, e->v.NamedExpr.target); 218 | break; 219 | case BoolOp_kind: 220 | return compiler_boolop(c, e); 221 | case BinOp_kind: 222 | VISIT(c, expr, e->v.BinOp.left); 223 | VISIT(c, expr, e->v.BinOp.right); 224 | ADDOP_BINARY(c, loc, e->v.BinOp.op); 225 | break; 226 | ... 227 | ``` 228 | 229 | Add a new case for `PipeOp_kind`. Let's start with a copy of a regular function call. 230 | 231 | ```diff 232 | + case PipeOp_kind: 233 | + return compiler_call(c, e->v.PipeOp.right); 234 | case Lambda_kind: 235 | return compiler_lambda(c, e); 236 | ``` 237 | 238 | Recompile CPython with `make -j4` and try the new operator: 239 | 240 | ```python 241 | >>> 1 |> f() 242 | 243 | Assertion failed: (scope || PyUnicode_READ_CHAR(name, 0) == '_'), function compiler_nameop, file compile.c, line 4209 244 | ``` 245 | 246 | Seems like the compiler can't resolve the `f` symbol. Let's fix that. 247 | 248 | #### Symbol tables 249 | 250 | The purpose of a symbol table is to resolve scopes. 251 | 252 | Look at `Python/symtable.c/symtable_visit_expr`. The function recursively visits the AST and builds symbol tables. 253 | ```c 254 | static int 255 | symtable_visit_expr(struct symtable *st, expr_ty e) 256 | { 257 | ...
258 | switch (e->kind) { 259 | case NamedExpr_kind: 260 | if (!symtable_raise_if_annotation_block(st, "named expression", e)) { 261 | VISIT_QUIT(st, 0); 262 | } 263 | if(!symtable_handle_namedexpr(st, e)) 264 | VISIT_QUIT(st, 0); 265 | break; 266 | case BoolOp_kind: 267 | VISIT_SEQ(st, expr, e->v.BoolOp.values); 268 | break; 269 | .... 270 | ``` 271 | 272 | Add a case for `PipeOp_kind`: 273 | 274 | ```diff 275 | case BinOp_kind: 276 | VISIT(st, expr, e->v.BinOp.left); 277 | VISIT(st, expr, e->v.BinOp.right); 278 | + break; 279 | + case PipeOp_kind: 280 | + VISIT(st, expr, e->v.PipeOp.left); 281 | + VISIT(st, expr, e->v.PipeOp.right); 282 | break; 283 | case UnaryOp_kind: 284 | VISIT(st, expr, e->v.UnaryOp.operand); 285 | ``` 286 | 287 | Recompile CPython with `make -j4` and test symbol resolution with the `symtable` module: 288 | 289 | ```python 290 | >>> import symtable 291 | >>> st = symtable.symtable('1 |> f()', "example.py", "exec") 292 | >>> [ (symbol.get_name(), symbol.is_global() ) for symbol in st.get_symbols() ] 293 | 294 | [('f', True)] 295 | ``` 296 | 297 | #### Compilation 298 | 299 | Move back to `Python/compile.c/compiler_visit_expr1` and redefine the `PipeOp_kind` case to use `compiler_pipe_call`: 300 | 301 | ```diff 302 | case PipeOp_kind: 303 | - return compiler_call(c, e->v.PipeOp.right); 304 | + return compiler_pipe_call(c, e); 305 | case Lambda_kind: 306 | return compiler_lambda(c, e); 307 | ``` 308 | 309 | And define a new `compiler_pipe_call` function.
Start with a modified copy of `compiler_call`, which compiles the right expression: 310 | 311 | ```c 312 | static int compiler_pipe_call(struct compiler *c, expr_ty e) { 313 | expr_ty func_e = e->v.PipeOp.right; 314 | expr_ty arg_e = e->v.PipeOp.left; 315 | 316 | RETURN_IF_ERROR(validate_keywords(c, func_e->v.Call.keywords)); 317 | int ret = maybe_optimize_method_call(c, func_e); 318 | if (ret < 0) { 319 | return ERROR; 320 | } 321 | if (ret == 1) { 322 | return SUCCESS; 323 | } 324 | 325 | RETURN_IF_ERROR(check_caller(c, func_e->v.Call.func)); 326 | VISIT(c, expr, func_e->v.Call.func); 327 | location loc = LOC(func_e->v.Call.func); 328 | ADDOP(c, loc, PUSH_NULL); 329 | loc = LOC(func_e); 330 | 331 | return compiler_call_helper(c, loc, 0, 332 | func_e->v.Call.args, 333 | func_e->v.Call.keywords); 334 | } 335 | 336 | ``` 337 | 338 | Recompile CPython and check that nothing is broken: 339 | ```python 340 | >>> f = lambda x:x 341 | >>> 1 |> f() 342 | 343 | TypeError: <lambda>() missing 1 required positional argument: 'x' 344 | ``` 345 | 346 | Fine.
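The call goes through, but the piped value is still ignored. What we need is the rewrite sketched below with the stock `ast` module: append the piped expression to the call's argument list. This is a pure-Python preview of what `compiler_pipe_call` is about to do with the asdl sequence:

```python
import ast

# Rewrite f(b) into f(b, 1): the piped value (here the constant 1) goes last
tree = ast.parse("f(b)", mode="eval")
call = tree.body
call.args.append(ast.Constant(value=1))  # append the piped expression
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # f(b, 1)
```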
Finally, we need to add `e->v.PipeOp.left` to `v.Call.args`. 347 | 348 | Let's create a new argument sequence using `_Py_asdl_expr_seq_new` and fill it with the original args and the left pipe expression: 349 | 350 | ```diff 351 | static int compiler_pipe_call(struct compiler *c, expr_ty e) { 352 | expr_ty func_e = e->v.PipeOp.right; 353 | expr_ty arg_e = e->v.PipeOp.left; 354 | 355 | RETURN_IF_ERROR(validate_keywords(c, func_e->v.Call.keywords)); 356 | int ret = maybe_optimize_method_call(c, func_e); 357 | if (ret < 0) { 358 | return ERROR; 359 | } 360 | if (ret == 1) { 361 | return SUCCESS; 362 | } 363 | 364 | RETURN_IF_ERROR(check_caller(c, func_e->v.Call.func)); 365 | VISIT(c, expr, func_e->v.Call.func); 366 | location loc = LOC(func_e->v.Call.func); 367 | ADDOP(c, loc, PUSH_NULL); 368 | loc = LOC(func_e); 369 | 370 | + Py_ssize_t original_len = asdl_seq_LEN(func_e->v.Call.args); 371 | + asdl_expr_seq *new_args = _Py_asdl_expr_seq_new(original_len + 1, c->c_arena); 372 | + if (new_args == NULL) { 373 | + return ERROR; 374 | + } 375 | + for (Py_ssize_t i = 0; i < original_len; i++) { 376 | + asdl_seq_SET(new_args, i, asdl_seq_GET(func_e->v.Call.args, i)); 377 | + } 378 | + asdl_seq_SET(new_args, original_len, arg_e); 379 | 380 | return compiler_call_helper(c, loc, 0, 381 | - func_e->v.Call.args, 382 | + new_args, 383 | func_e->v.Call.keywords); 384 | } 385 | 386 | ``` 387 | 388 | That's it! 389 | 390 | Recompile CPython with `make -j4` and try the new operator: 391 | 392 | 393 | ```python 394 | >>> [1,2] |> map(lambda x:x*2) |> list() 395 | 396 | [2, 4] 397 | ``` 398 | 399 | Yaaaaay! 400 | 401 | ### Final dis 402 | 403 | ```python 404 | >>> import dis 405 | >>> dis.dis("a |> f(b)") 406 | 407 | 0 RESUME 0 408 | 409 | 1 LOAD_NAME 0 (f) 410 | PUSH_NULL 411 | LOAD_NAME 1 (b) 412 | LOAD_NAME 2 (a) 413 | CALL 2 414 | RETURN_VALUE 415 | ``` 416 | 417 | The bytecode is the same as for a regular call `f(b, a)`: the piped value is loaded last.
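The chained example from the top of the tutorial can be double-checked on any stock interpreter by desugaring each step by hand, appending the piped value as the last argument:

```python
# Desugared form of:  [1, 2] |> map(lambda x: x * 2) |> list()
step1 = map(lambda x: x * 2, [1, 2])  # [1, 2] is piped in as the last argument
step2 = list(step1)                   # step1 is piped in as the last argument
print(step2)  # [2, 4]
```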
-------------------------------------------------------------------------------- /tail.md: -------------------------------------------------------------------------------- 1 | > [!WARNING] 2 | > I'm new to Python internals, so the tutorial may contain mistakes and clunky solutions. 3 | 4 | # Tail call optimization 5 | A tail call happens when a function calls another as its last action, so it has nothing else to do: 6 | ```python 7 | def g(): 8 | return f() 9 | ``` 10 | Tail call optimization eliminates the need to add a new stack frame to the call stack. This is useful for writing recursive functions: 11 | 12 | ```python 13 | def fact(n, acc): 14 | if n == 1: 15 | return acc 16 | return fact(n-1, acc*n) 17 | ``` 18 | 19 | Although Guido van Rossum [considers](https://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html) tail call optimization unpythonic, it's interesting to implement for educational purposes. Let's start. 20 | 21 | ## Plan 22 | We are going to modify three parts of the Python interpreter: 23 | 24 | 1. Introduce a new `TAIL_CALL` bytecode instruction to the Python VM 25 | 2. Add a compiler optimization that inserts `TAIL_CALL` into the code 26 | 3. Implement an interpreter for the new bytecode 27 | 28 | ## Prerequisites 29 | Clone the CPython repo and check out a new branch. 30 | ```bash 31 | $ git clone git@github.com:python/cpython.git && cd cpython 32 | $ git checkout tags/v3.12.1 -b tail-call 33 | ``` 34 | 35 | ## A new bytecode instruction 36 | 37 | #### About bytecodes 38 | Python source code is compiled into bytecode. Bytecode is a set of instructions for the Python VM.
For example, check how `f(a, b)` is represented: 40 | 41 | ```python 42 | >>> import dis 43 | >>> dis.dis("f(a, b)") 44 | 0 0 RESUME 0 45 | 46 | 1 2 PUSH_NULL 47 | 4 LOAD_NAME 0 (f) 48 | 6 LOAD_NAME 1 (a) 49 | 8 LOAD_NAME 2 (b) 50 | 10 CALL 2 51 | 18 RETURN_VALUE 52 | 53 | ``` 54 | 55 | These instructions are telling the interpreter to: 56 | 57 | 1. Load the function onto the value stack using `LOAD_NAME` 58 | 2. Load the value of `a` onto the value stack using `LOAD_NAME` 59 | 3. Load the value of `b` onto the value stack using `LOAD_NAME` 60 | 4. Call the function using `CALL` with `2` arguments. 61 | 62 | 63 | #### `TAIL_CALL` definition 64 | Let's introduce a new bytecode instruction. `Python/bytecodes.c` contains the definitions and interpretations of Python bytecodes. It's written in a custom syntax. 65 | 66 | Since we're interested in calls, let's introduce `TAIL_CALL` as a blank copy of the regular `CALL`: 67 | 68 | ```diff 69 | macro(CALL) = _SPECIALIZE_CALL + unused/2 + _CALL; 70 | + macro(TAIL_CALL) = _SPECIALIZE_CALL + unused/2 + _CALL; 71 | ``` 72 | 73 | Next, run `make regen-cases` to translate `Python/bytecodes.c` into proper C code. Let's have a look at what changed. 74 | 75 | First, it defined the new opcode. For example, in `Include/opcode_ids.h`: 76 | 77 | ```diff 78 | +#define TAIL_CALL 116 79 | ``` 80 | 81 | Second, it defined how to interpret the new bytecode in `Python/generated_cases.c.h`. This file contains the handler functions for the interpreter. 82 | 83 | ```diff 84 | + TARGET(TAIL_CALL) { 85 | + _Py_CODEUNIT *this_instr = frame->instr_ptr = next_instr; 86 | + next_instr += 4; 87 | + INSTRUCTION_STATS(TAIL_CALL); 88 | + PyObject **args; 89 | + PyObject *self_or_null; 90 | + PyObject *callable; 91 | + PyObject *res; 92 | ... 93 | ``` 94 | 95 | #### Importlib 96 | The other important step is to update importlib after introducing the new bytecode.
Since some Python libraries are frozen and linked into the Python interpreter, it is necessary to re-freeze them after any bytecode update. 97 | 98 | Change the `MAGIC_NUMBER` constant in `Lib/importlib/_bootstrap_external.py`. This will cause `.pyc` files with the old `MAGIC_NUMBER` to be recompiled by the interpreter on import. 99 | 100 | Then, run `make regen-importlib`. 101 | 102 | 103 | 104 | 105 | ## Flowgraph optimization 106 | 107 | According to the 108 | [CPython devguide](https://devguide.python.org/internals/compiler/#control-flow-graphs), a control flow graph is an intermediate result of Python source code compilation. CFGs are usually one step away from the final code output, and are a perfect place to perform code optimizations. 109 | 110 | Look at `Python/flowgraph.c/_PyCfg_OptimizeCodeUnit`. The function updates the code graph: removes unused consts, inserts superinstructions, etc. 111 | 112 | ```c 113 | int 114 | _PyCfg_OptimizeCodeUnit(cfg_builder *g, PyObject *consts, PyObject *const_cache, 115 | int nlocals, int nparams, int firstlineno) 116 | { 117 | ... 118 | RETURN_IF_ERROR(optimize_cfg(g, consts, const_cache, firstlineno)); 119 | RETURN_IF_ERROR(remove_unused_consts(g->g_entryblock, consts)); 120 | RETURN_IF_ERROR( 121 | add_checks_for_loads_of_uninitialized_variables( 122 | g->g_entryblock, nlocals, nparams)); 123 | insert_superinstructions(g); 124 | ... 125 | } 126 | ``` 127 | 128 | Insert a call to a new optimization, `optimize_tail_call`: 129 | 130 | ```diff 131 | RETURN_IF_ERROR( 132 | add_checks_for_loads_of_uninitialized_variables( 133 | g->g_entryblock, nlocals, nparams)); 134 | insert_superinstructions(g); 135 | + optimize_tail_call(g); 136 | ``` 137 | 138 | And define it.
The goal is to replace each CALL→RETURN_VALUE sequence with TAIL_CALL→RETURN_VALUE: 139 | 140 | ```c 141 | 142 | static void 143 | optimize_tail_call(cfg_builder *g) 144 | { 145 | for (basicblock *b = g->g_entryblock; b != NULL; b = b->b_next) { 146 | 147 | for (int i = 0; i < b->b_iused; i++) { 148 | cfg_instr *inst = &b->b_instr[i]; 149 | int nextop = i+1 < b->b_iused ? b->b_instr[i+1].i_opcode : 0; 150 | if (inst->i_opcode == CALL && nextop == RETURN_VALUE) { 151 | INSTR_SET_OP1(inst, TAIL_CALL, inst->i_oparg); 152 | } 153 | 154 | } 155 | } 156 | } 157 | ``` 158 | 159 | Let's check if the new optimization is working. 160 | 161 | Recompile CPython with `make -j6` 162 | 163 | And check the new optimization with the `dis` module: 164 | 165 | ```python 166 | >>> def f(): 167 | ... return g() 168 | 169 | >>> dis.dis(f) 170 | 171 | 1 RESUME 0 172 | 173 | 2 LOAD_GLOBAL 1 (g + NULL) 174 | TAIL_CALL 0 175 | RETURN_VALUE 176 | ``` 177 | 178 | Perfect. Let's move to the final step. 179 | 180 | ## Implement an interpreter for the new bytecode 181 | As previously mentioned, the `Python/bytecodes.c` file contains definitions and interpretations of Python bytecodes. 182 | Currently, we are using a blank copy of the `CALL` handler as `TAIL_CALL`. First, let's understand how it works. 183 | 184 | 185 | #### How `CALL` works 186 | A bit of terminology. A call frame is a structure that represents a function call's execution context: local variables, function arguments, etc. 187 | 188 | The value stack is a list of pointers to Python objects that instructions operate on. 189 | 190 | `Python/bytecodes.c/_CALL` manipulates both structures. In summary, it does three things: 191 | 1. Creates a new call frame and pushes it to the call stack 192 | 2. Consumes arguments from the current frame's value stack 193 | 3.
Passes control to the new frame 194 | 195 | ```c 196 | // Creates a new call frame and pushes it to the call stack 197 | int code_flags = ((PyCodeObject*)PyFunction_GET_CODE(callable))->co_flags; 198 | PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(callable)); 199 | _PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit( 200 | tstate, (PyFunctionObject *)callable, locals, 201 | args, total_args, NULL 202 | ); 203 | 204 | // Consumes arguments from the current frame's value stack 205 | STACK_SHRINK(oparg + 2); 206 | 207 | if (new_frame == NULL) { 208 | GOTO_ERROR(error); 209 | } 210 | // Updates the current frame's return offset 211 | frame->return_offset = (uint16_t)(next_instr - this_instr); 212 | // Passes control to the new frame 213 | DISPATCH_INLINED(new_frame); 214 | ``` 215 | 216 | Simple and easy. Let's move to the next step. 217 | #### `TAIL_CALL` interpreter 218 | To create a `TAIL_CALL` interpreter, we are going to change a few things in the regular `CALL`. 219 | 220 | We need to drop the current frame before creating a new call frame. However, because references to the arguments are stored in the current, dying frame, we need to stash them before dropping it, and clean them up after creating the new frame. 221 | 222 | Let's move to `Python/bytecodes.c/_TAIL_CALL`. 223 | 224 | First, save the args. Since CPython uses its own memory allocator, use `PyMem_Malloc` to allocate memory. 225 | 226 | ```diff 227 | // Check if the call can be inlined or not 228 | if (Py_TYPE(callable) == &PyFunction_Type && 229 | tstate->interp->eval_frame == NULL && 230 | ((PyFunctionObject *)callable)->vectorcall == _PyFunction_Vectorcall) 231 | { 232 | + PyObject **newargs = PyMem_Malloc(sizeof(PyObject*) * total_args); 233 | + if (newargs == NULL) { 234 | + GOTO_ERROR(error); 235 | + } 236 | 237 | + for (Py_ssize_t j = 0; j < total_args; j++) { 238 | + newargs[j] = args[j]; 239 | + } 240 | 241 | ``` 242 | 243 | Next, drop the current call frame.
The snippet is a copy of the `Python/bytecodes.c/POP_FRAME` instruction. 244 | ```diff 245 | + STACK_SHRINK(oparg + 2); 246 | + _Py_LeaveRecursiveCallPy(tstate); 247 | + _PyFrame_SetStackPointer(frame, stack_pointer); 248 | + _PyInterpreterFrame *dying = frame; 249 | + frame = tstate->current_frame = dying->previous; 250 | + _PyEval_FrameClearAndPop(tstate, dying); 251 | + LOAD_SP(); 252 | ``` 253 | 254 | Init a new frame using the callable and the stashed args. 255 | ```diff 256 | _PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit( 257 | tstate, (PyFunctionObject *)callable, locals, 258 | - args, total_args, NULL 259 | + newargs, total_args, NULL 260 | ); 261 | ``` 262 | Clean up the argument stash: 263 | 264 | ```diff 265 | + PyMem_Free(newargs); 266 | ``` 267 | And pass control to the new frame: 268 | 269 | ```c 270 | DISPATCH_INLINED(new_frame); 271 | ``` 272 | 273 | 274 | Recompile CPython with `make regen-cases && make regen-importlib && make -j6` and test the new instruction. 275 | 276 | 277 | ## Final check 278 | 279 | ```python 280 | >>> def fact(n, acc): 281 | ... if n == 1: 282 | ... return acc 283 | ... return fact(n-1, acc*n) 284 | 285 | >>> fact(1500, 1) 286 | 287 | 48119977967797748601669900935... 288 | ``` 289 | 290 | --------------------------------------------------------------------------------
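For contrast, here is what the same function does on an unpatched CPython, where every recursive call still pushes a new frame; that frame growth is exactly what `TAIL_CALL` eliminates (plain Python, runnable anywhere):

```python
import sys

def fact(n, acc):
    if n == 1:
        return acc
    return fact(n - 1, acc * n)

sys.setrecursionlimit(1000)  # the default limit
try:
    fact(1500, 1)
except RecursionError:
    print("RecursionError: maximum recursion depth exceeded")
```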