├── README.md ├── pipe.md └── tail.md /README.md: -------------------------------------------------------------------------------- 1 | # Customizing Python 2 | 3 | A series of step-by-step tutorials on customizing CPython. Each focuses on understanding CPython internals and highlights different parts of the interpreter. 4 | 5 | ## Pipe operator 6 | We will add an `|>` operator to the CPython interpreter to perform bash-like chained calls. 7 | 8 | ```python 9 | >>> [1,2] |> map(lambda x:x*2) |> list() 10 | 11 | [2, 4] 12 | ``` 13 | 14 | The tutorial covers Python grammar, parsing, and compilation to bytecode. 15 | 16 | **[Read the tutorial](/pipe.md)** 17 | 18 | ## Tail call optimization 19 | The goal is to implement tail-call optimization in the Python interpreter. 20 | ```python 21 | >>> def u(i): 22 | if i == 0: 23 | return 'a' 24 | return u(i-1) 25 | 26 | >>> u(10000) 27 | 'a' 28 | ``` 29 | The tutorial covers creating a new bytecode instruction, flowgraph optimization, and evaluation of Python programs. 30 | 31 | **[Read the tutorial](/tail.md)** 32 | 33 | --- 34 | Pavel Kolchanov, 2024 35 | -------------------------------------------------------------------------------- /pipe.md: -------------------------------------------------------------------------------- 1 | > [!WARNING] 2 | > I'm new to Python internals, so the tutorial may contain mistakes and clunky solutions. 3 | 4 | # Pipe operator 5 | 6 | The goal is to add an operator to CPython that performs bash-like chained calls: 7 | ```python 8 | >>> [1,2] |> map(lambda x:x*2) |> list() 9 | 10 | [2, 4] 11 | ``` 12 | 13 | ## Plan 14 | 15 | According to the [CPython devguide](https://devguide.python.org/internals/compiler/), the compilation of source code involves several steps: 16 | 17 | > 1. Tokenize the source code 18 | > 19 | > 2. Parse the stream of tokens into an Abstract Syntax Tree. 20 | > 21 | > 3. Transform the AST into an instruction sequence. 22 | > 23 | > 4.
Construct a Control Flow Graph and apply optimizations to it. 24 | > 25 | > 5. Emit bytecode based on the Control Flow Graph. 26 | 27 | To implement `|>`, we are going to change the first three steps: modify the parsing and compilation processes. 28 | 29 | ## Prerequisites 30 | Clone the CPython repo and check out a new branch. 31 | ```bash 32 | $ git clone git@github.com:python/cpython.git && cd cpython 33 | $ git checkout tags/v3.12.1 -b pipes 34 | ``` 35 | 36 | 37 | ## Parsing 38 | 39 | #### Tokenization 40 | Let's start with tokenization. Tokenization is the process of splitting an input string into a sequence of tokens. 41 | 42 | Add a new `PIPE` token to `Grammar/Tokens`: 43 | ```diff 44 | ELLIPSIS '...' 45 | COLONEQUAL ':=' 46 | EXCLAMATION '!' 47 | +PIPE '|>' 48 | ``` 49 | 50 | Next, run `make regen-token` to regenerate `pycore_token.h`, `Parser/token.c`, and `Lib/token.py`. 51 | 52 | For example, look at the changes in `Parser/token.c`. `regen-token` added a switch case to the parser code. 53 | 54 | ```diff 55 | case '|': 56 | switch (c2) { 57 | case '=': return VBAREQUAL; 58 | + case '>': return PIPE; 59 | } 60 | break; 61 | } 62 | ``` 63 | #### ASDL 64 | 65 | Next, move to `Parser/Python.asdl`. ASDL is the blueprint for Python AST nodes: each AST node is defined there. 66 | 67 | Let's add a new node, the `PipeOp` expression. It contains two children: the left and right expressions. 68 | 69 | ```diff 70 | expr = BoolOp(boolop op, expr* values) 71 | | NamedExpr(expr target, expr value) 72 | | BinOp(expr left, operator op, expr right) 73 | + | PipeOp(expr left, expr right) 74 | | UnaryOp(unaryop op, expr operand) 75 | | Lambda(arguments args, expr body) 76 | | IfExp(expr test, expr body, expr orelse) 77 | ``` 78 | 79 | Then run `make regen-ast` to regenerate `pycore_ast.h` and `Python/Python-ast.c`. 80 | 81 | Look at the changes in `Python/Python-ast.c`. `regen-ast` generated the constructor function for the pipe operator expression.
82 | ```c 83 | expr_ty 84 | _PyAST_PipeOp(expr_ty left, expr_ty right, int lineno, int col_offset, int 85 | end_lineno, int end_col_offset, PyArena *arena) 86 | { 87 | expr_ty p; 88 | if (!left) { 89 | PyErr_SetString(PyExc_ValueError, 90 | "field 'left' is required for PipeOp"); 91 | return NULL; 92 | } 93 | if (!right) { 94 | PyErr_SetString(PyExc_ValueError, 95 | "field 'right' is required for PipeOp"); 96 | return NULL; 97 | } 98 | p = (expr_ty)_PyArena_Malloc(arena, sizeof(*p)); 99 | if (!p) 100 | return NULL; 101 | p->kind = PipeOp_kind; 102 | p->v.PipeOp.left = left; 103 | p->v.PipeOp.right = right; 104 | p->lineno = lineno; 105 | p->col_offset = col_offset; 106 | p->end_lineno = end_lineno; 107 | p->end_col_offset = end_col_offset; 108 | return p; 109 | } 110 | ``` 111 | 112 | #### Grammar 113 | 114 | Now let's move to `Parser/python.gram`. 115 | 116 | This file contains the whole Python grammar. In short: it describes how to construct an abstract syntax tree using the grammar rules. 117 | 118 | Add a `pipe_op` expression definition. 119 | 120 | ```diff 121 | 122 | power[expr_ty]: 123 | | a=await_primary '**' b=factor { _PyAST_BinOp(a, Pow, b, EXTRA) } 124 | + | pipe_op 125 | + 126 | +pipe_op[expr_ty]: 127 | + | a=pipe_op '|>' b=await_primary { _PyAST_PipeOp(a, b, EXTRA) } 128 | | await_primary 129 | ``` 130 | 131 | It means: "When an `await_primary` is matched after a `|>` token, construct an AST node using the `_PyAST_PipeOp` function." 132 | 133 | Then run `make regen-pegen` to regenerate `Parser/parser.c`. 134 | 135 | #### Parsing from tokens to AST 136 | 137 | The whole AST parser is already generated by `regen-pegen`. 138 | 139 | We only need to update AST validation. Add a case for `PipeOp_kind` to `Python/ast.c/validate_expr`.
140 | 141 | ```diff 142 | + case PipeOp_kind: 143 | + if (exp->v.PipeOp.right->kind != Call_kind) { 144 | + PyErr_SetString(PyExc_TypeError, 145 | + "Pipe op arg must be a function call"); 146 | + return 0; 147 | + } 148 | + ret = validate_expr(state, exp->v.PipeOp.left, Load) && 149 | + validate_expr(state, exp->v.PipeOp.right, Load); 150 | + break; 151 | ``` 152 | 153 | 154 | #### Test parsing to AST 155 | 156 | Recompile CPython with `make -j4` and test the parser with the `ast` module: 157 | 158 | ```python 159 | >>> import ast 160 | >>> tree = ast.parse('1 |> f()') 161 | >>> ast.dump(tree) 162 | 163 | "Module(body=[Expr(value=PipeOp(left=Constant(value=1), right=Call(func=Name(id='f', ctx=Load()), args=[], keywords=[])))], type_ignores=[])" 164 | ``` 165 | 166 | Parsing is working, so we can move to the compilation part. 167 | 168 | ## Compilation from AST to bytecode 169 | 170 | The next stage is compilation. Now we need to translate the AST into a sequence of commands for the Python VM. 171 | 172 | #### Target bytecode 173 | 174 | `a |> f(b)` is another way of saying `f(b, a)`: the piped value is appended as the call's last argument, which is what makes signatures like `map(func, iterable)` line up. 175 | 176 | Let's examine how a two-argument call `f(a, b)` is compiled into bytecode instructions using the `dis` module: 177 | 178 | ```python 179 | >>> import dis 180 | >>> dis.dis("f(a, b)") 181 | 0 0 RESUME 0 182 | 183 | 1 2 PUSH_NULL 184 | 4 LOAD_NAME 0 (f) 185 | 6 LOAD_NAME 1 (a) 186 | 8 LOAD_NAME 2 (b) 187 | 10 CALL 2 188 | 18 RETURN_VALUE 189 | 190 | ``` 191 | 192 | These instructions are telling the interpreter to: 193 | 194 | 1. Load the function onto the stack using `LOAD_NAME` 195 | 2. Load the value of `a` onto the stack using `LOAD_NAME` 196 | 3. Load the value of `b` onto the stack using `LOAD_NAME` 197 | 4. Call the function using `CALL` with `2` arguments. 198 | 199 | So, to implement the pipe, we need to push an additional argument onto the stack, between the function load and the function call, during compilation.
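The rewrite we are about to wire into the compiler can be sketched in plain Python first. The `pipe` helper below is hypothetical (it is not part of the patch, just an illustration); it appends the piped value as the last positional argument, exactly like the bytecode we will emit:

```python
def pipe(value, func, *args):
    # value |> func(*args)  desugars to  func(*args, value)
    return func(*args, value)

doubled = pipe([1, 2], map, lambda x: x * 2)  # map(lambda x: x * 2, [1, 2])
print(pipe(doubled, list))  # [2, 4]
```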
200 | 201 | #### Compile.c 202 | The starting point of compilation is the `Python/compile.c` file. 203 | 204 | Let's look at `Python/compile.c/compiler_visit_expr1`. 205 | 206 | The function compiles an `expr_ty` AST node with a simple switch. For example, to compile a binary operation like `a + b`, the compiler recursively visits the left and right nodes, and adds the binary operation to the instruction sequence. 207 | 208 | ```c 209 | static int 210 | compiler_visit_expr1(struct compiler *c, expr_ty e) 211 | { 212 | location loc = LOC(e); 213 | switch (e->kind) { 214 | case NamedExpr_kind: 215 | VISIT(c, expr, e->v.NamedExpr.value); 216 | ADDOP_I(c, loc, COPY, 1); 217 | VISIT(c, expr, e->v.NamedExpr.target); 218 | break; 219 | case BoolOp_kind: 220 | return compiler_boolop(c, e); 221 | case BinOp_kind: 222 | VISIT(c, expr, e->v.BinOp.left); 223 | VISIT(c, expr, e->v.BinOp.right); 224 | ADDOP_BINARY(c, loc, e->v.BinOp.op); 225 | break; 226 | ... 227 | ``` 228 | 229 | Add a new case for `PipeOp_kind`. Let's start with a copy of a regular function call. 230 | 231 | ```diff 232 | + case PipeOp_kind: 233 | + return compiler_call(c, e->v.PipeOp.right); 234 | case Lambda_kind: 235 | return compiler_lambda(c, e); 236 | ``` 237 | 238 | Recompile CPython with `make -j4` and try the new operator: 239 | 240 | ```python 241 | >>> 1 |> f() 242 | 243 | Assertion failed: (scope || PyUnicode_READ_CHAR(name, 0) == '_'), function compiler_nameop, file compile.c, line 4209 244 | ``` 245 | 246 | Seems like the compiler can't resolve the `f` symbol. Let's fix that. 247 | 248 | #### Symbol tables 249 | 250 | The purpose of a symbol table is to resolve scopes. 251 | 252 | Look at `Python/symtable.c/symtable_visit_expr`. The function recursively visits the AST and builds symbol tables. 253 | ```c 254 | static int 255 | symtable_visit_expr(struct symtable *st, expr_ty e) 256 | { 257 | ...
258 | switch (e->kind) { 259 | case NamedExpr_kind: 260 | if (!symtable_raise_if_annotation_block(st, "named expression", e)) { 261 | VISIT_QUIT(st, 0); 262 | } 263 | if(!symtable_handle_namedexpr(st, e)) 264 | VISIT_QUIT(st, 0); 265 | break; 266 | case BoolOp_kind: 267 | VISIT_SEQ(st, expr, e->v.BoolOp.values); 268 | break; 269 | .... 270 | ``` 271 | 272 | Add a case for `PipeOp_kind`: 273 | 274 | ```diff 275 | case BinOp_kind: 276 | VISIT(st, expr, e->v.BinOp.left); 277 | VISIT(st, expr, e->v.BinOp.right); 278 | + break; 279 | + case PipeOp_kind: 280 | + VISIT(st, expr, e->v.PipeOp.left); 281 | + VISIT(st, expr, e->v.PipeOp.right); 282 | break; 283 | case UnaryOp_kind: 284 | VISIT(st, expr, e->v.UnaryOp.operand); 285 | ``` 286 | 287 | Recompile CPython with `make -j4` and test symbol resolution with the `symtable` module: 288 | 289 | ```python 290 | >>> import symtable 291 | >>> st = symtable.symtable('1 |> f()', "example.py", "exec") 292 | >>> [ (symbol.get_name(), symbol.is_global() ) for symbol in st.get_symbols() ] 293 | 294 | [('f', True)] 295 | ``` 296 | 297 | #### Compilation 298 | 299 | Move back to `Python/compile.c/compiler_visit_expr1` and redefine the `PipeOp_kind` case to use `compiler_pipe_call`: 300 | 301 | ```diff 302 | case PipeOp_kind: 303 | - return compiler_call(c, e->v.PipeOp.right); 304 | + return compiler_pipe_call(c, e); 305 | case Lambda_kind: 306 | return compiler_lambda(c, e); 307 | ``` 308 | 309 | And define a new `compiler_pipe_call` function.
Start with a modified copy of `compiler_call`, which compiles the right expression: 310 | 311 | ```c 312 | static int compiler_pipe_call(struct compiler *c, expr_ty e) { 313 | expr_ty func_e = e->v.PipeOp.right; 314 | expr_ty arg_e = e->v.PipeOp.left; 315 | 316 | RETURN_IF_ERROR(validate_keywords(c, func_e->v.Call.keywords)); 317 | int ret = maybe_optimize_method_call(c, func_e); 318 | if (ret < 0) { 319 | return ERROR; 320 | } 321 | if (ret == 1) { 322 | return SUCCESS; 323 | } 324 | 325 | RETURN_IF_ERROR(check_caller(c, func_e->v.Call.func)); 326 | VISIT(c, expr, func_e->v.Call.func); 327 | location loc = LOC(func_e->v.Call.func); 328 | ADDOP(c, loc, PUSH_NULL); 329 | loc = LOC(func_e); 330 | 331 | return compiler_call_helper(c, loc, 0, 332 | func_e->v.Call.args, 333 | func_e->v.Call.keywords); 334 | } 335 | 336 | ``` 337 | 338 | Recompile CPython and check that nothing is broken: 339 | ```python 340 | >>> f = lambda x:x 341 | >>> 1 |> f() 342 | 343 | TypeError: <lambda>() missing 1 required positional argument: 'x' 344 | ``` 345 | 346 | Fine.
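The call goes through, but the piped value is still ignored. What we need is the rewrite sketched below with the stock `ast` module: append the piped expression to the call's argument list. This is a pure-Python preview of what `compiler_pipe_call` is about to do with the asdl sequence:

```python
import ast

# Rewrite f(b) into f(b, 1): the piped value (here the constant 1) goes last
tree = ast.parse("f(b)", mode="eval")
call = tree.body
call.args.append(ast.Constant(value=1))  # append the piped expression
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # f(b, 1)
```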
Finally, we need to add `e->v.PipeOp.left` to `v.Call.args`. 347 | 348 | Let's create a new argument sequence using `_Py_asdl_expr_seq_new` and fill it with the original args and the left pipe expression: 349 | 350 | ```diff 351 | static int compiler_pipe_call(struct compiler *c, expr_ty e) { 352 | expr_ty func_e = e->v.PipeOp.right; 353 | expr_ty arg_e = e->v.PipeOp.left; 354 | 355 | RETURN_IF_ERROR(validate_keywords(c, func_e->v.Call.keywords)); 356 | int ret = maybe_optimize_method_call(c, func_e); 357 | if (ret < 0) { 358 | return ERROR; 359 | } 360 | if (ret == 1) { 361 | return SUCCESS; 362 | } 363 | 364 | RETURN_IF_ERROR(check_caller(c, func_e->v.Call.func)); 365 | VISIT(c, expr, func_e->v.Call.func); 366 | location loc = LOC(func_e->v.Call.func); 367 | ADDOP(c, loc, PUSH_NULL); 368 | loc = LOC(func_e); 369 | 370 | + Py_ssize_t original_len = asdl_seq_LEN(func_e->v.Call.args); 371 | + asdl_expr_seq *new_args = _Py_asdl_expr_seq_new(original_len + 1, c->c_arena); 372 | + if (new_args == NULL) { 373 | + return ERROR; 374 | + } 375 | + for (Py_ssize_t i = 0; i < original_len; i++) { 376 | + asdl_seq_SET(new_args, i, asdl_seq_GET(func_e->v.Call.args, i)); 377 | + } 378 | + asdl_seq_SET(new_args, original_len, arg_e); 379 | 380 | return compiler_call_helper(c, loc, 0, 381 | - func_e->v.Call.args, 382 | + new_args, 383 | func_e->v.Call.keywords); 384 | } 385 | 386 | ``` 387 | 388 | That's it! 389 | 390 | Recompile CPython with `make -j4` and try the new operator: 391 | 392 | 393 | ```python 394 | >>> [1,2] |> map(lambda x:x*2) |> list() 395 | 396 | [2, 4] 397 | ``` 398 | 399 | Yaaaaay! 400 | 401 | ### Final dis 402 | 403 | ```python 404 | >>> import dis 405 | >>> dis.dis("a |> f(b)") 406 | 407 | 0 RESUME 0 408 | 409 | 1 LOAD_NAME 0 (f) 410 | PUSH_NULL 411 | LOAD_NAME 1 (b) 412 | LOAD_NAME 2 (a) 413 | CALL 2 414 | RETURN_VALUE 415 | ``` 416 | 417 | The bytecode is the same as for a regular call `f(b, a)`: the piped value is loaded last.
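The chained example from the top of the tutorial can be double-checked on any stock interpreter by desugaring each step by hand, appending the piped value as the last argument:

```python
# Desugared form of:  [1, 2] |> map(lambda x: x * 2) |> list()
step1 = map(lambda x: x * 2, [1, 2])  # [1, 2] is piped in as the last argument
step2 = list(step1)                   # step1 is piped in as the last argument
print(step2)  # [2, 4]
```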
-------------------------------------------------------------------------------- /tail.md: -------------------------------------------------------------------------------- 1 | > [!WARNING] 2 | > I'm new to Python internals, so the tutorial may contain mistakes and clunky solutions. 3 | 4 | # Tail call optimization 5 | A tail call happens when a function calls another as its last action, so it has nothing else to do: 6 | ```python 7 | def g(): 8 | return f() 9 | ``` 10 | Tail call optimization eliminates the need to add a new stack frame to the call stack. This is useful for writing recursive functions: 11 | 12 | ```python 13 | def fact(n, acc): 14 | if n == 1: 15 | return acc 16 | return fact(n-1, acc*n) 17 | ``` 18 | 19 | Although Guido van Rossum [considers](https://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html) tail call optimization unpythonic, it's interesting to implement for educational purposes. Let's start. 20 | 21 | ## Plan 22 | We are going to modify three parts of the Python interpreter: 23 | 24 | 1. Introduce a new `TAIL_CALL` bytecode instruction to the Python VM 25 | 2. Add a compiler optimization that inserts `TAIL_CALL` into the code 26 | 3. Implement an interpreter for the new bytecode 27 | 28 | ## Prerequisites 29 | Clone the CPython repo and check out a new branch. 30 | ```bash 31 | $ git clone git@github.com:python/cpython.git && cd cpython 32 | $ git checkout tags/v3.12.1 -b tail-call 33 | ``` 34 | 35 | ## A new bytecode instruction 36 | 37 | #### About bytecodes 38 | Python source code is compiled into bytecode. Bytecode is a set of instructions for the Python VM.
For example, check how `f(a, b)` is represented: 40 | 41 | ```python 42 | >>> import dis 43 | >>> dis.dis("f(a, b)") 44 | 0 0 RESUME 0 45 | 46 | 1 2 PUSH_NULL 47 | 4 LOAD_NAME 0 (f) 48 | 6 LOAD_NAME 1 (a) 49 | 8 LOAD_NAME 2 (b) 50 | 10 CALL 2 51 | 18 RETURN_VALUE 52 | 53 | ``` 54 | 55 | These instructions are telling the interpreter to: 56 | 57 | 1. Load the function onto the value stack using `LOAD_NAME` 58 | 2. Load the value of `a` onto the value stack using `LOAD_NAME` 59 | 3. Load the value of `b` onto the value stack using `LOAD_NAME` 60 | 4. Call the function using `CALL` with `2` arguments. 61 | 62 | 63 | #### `TAIL_CALL` definition 64 | Let's introduce a new bytecode instruction. `Python/bytecodes.c` contains the definitions and interpretations of Python bytecodes. It's written in a custom syntax. 65 | 66 | Since we're interested in calls, let's introduce `TAIL_CALL` as a blank copy of the regular `CALL`: 67 | 68 | ```diff 69 | macro(CALL) = _SPECIALIZE_CALL + unused/2 + _CALL; 70 | + macro(TAIL_CALL) = _SPECIALIZE_CALL + unused/2 + _CALL; 71 | ``` 72 | 73 | Next, run `make regen-cases` to translate `Python/bytecodes.c` into proper C code. Let's have a look at what changed. 74 | 75 | First, it defined the new opcode. For example, in `Include/opcode_ids.h`: 76 | 77 | ```diff 78 | +#define TAIL_CALL 116 79 | ``` 80 | 81 | Second, it defined how to interpret the new bytecode in `Python/generated_cases.c.h`. This file contains the handler functions for the interpreter. 82 | 83 | ```diff 84 | + TARGET(TAIL_CALL) { 85 | + _Py_CODEUNIT *this_instr = frame->instr_ptr = next_instr; 86 | + next_instr += 4; 87 | + INSTRUCTION_STATS(TAIL_CALL); 88 | + PyObject **args; 89 | + PyObject *self_or_null; 90 | + PyObject *callable; 91 | + PyObject *res; 92 | ... 93 | ``` 94 | 95 | #### Importlib 96 | The other important step is to update importlib after introducing the new bytecode.
Since some Python libraries are frozen and linked into the Python interpreter, it is necessary to re-freeze them after any bytecode update. 97 | 98 | Change the `MAGIC_NUMBER` constant in `Lib/importlib/_bootstrap_external.py`. This will cause `.pyc` files with the old `MAGIC_NUMBER` to be recompiled by the interpreter on import. 99 | 100 | Then, run `make regen-importlib`. 101 | 102 | 103 | 104 | 105 | ## Flowgraph optimization 106 | 107 | According to the 108 | [CPython devguide](https://devguide.python.org/internals/compiler/#control-flow-graphs), a control flow graph is an intermediate result of Python source code compilation. CFGs are usually one step away from the final code output, and are a perfect place to perform code optimizations. 109 | 110 | Look at `Python/flowgraph.c/_PyCfg_OptimizeCodeUnit`. The function updates the code graph: removes unused consts, inserts superinstructions, etc. 111 | 112 | ```c 113 | int 114 | _PyCfg_OptimizeCodeUnit(cfg_builder *g, PyObject *consts, PyObject *const_cache, 115 | int nlocals, int nparams, int firstlineno) 116 | { 117 | ... 118 | RETURN_IF_ERROR(optimize_cfg(g, consts, const_cache, firstlineno)); 119 | RETURN_IF_ERROR(remove_unused_consts(g->g_entryblock, consts)); 120 | RETURN_IF_ERROR( 121 | add_checks_for_loads_of_uninitialized_variables( 122 | g->g_entryblock, nlocals, nparams)); 123 | insert_superinstructions(g); 124 | ... 125 | } 126 | ``` 127 | 128 | Insert a call to a new optimization, `optimize_tail_call`: 129 | 130 | ```diff 131 | RETURN_IF_ERROR( 132 | add_checks_for_loads_of_uninitialized_variables( 133 | g->g_entryblock, nlocals, nparams)); 134 | insert_superinstructions(g); 135 | + optimize_tail_call(g); 136 | ``` 137 | 138 | And define it.
The goal is to replace each CALL→RETURN_VALUE sequence with TAIL_CALL→RETURN_VALUE: 139 | 140 | ```c 141 | 142 | static void 143 | optimize_tail_call(cfg_builder *g) 144 | { 145 | for (basicblock *b = g->g_entryblock; b != NULL; b = b->b_next) { 146 | 147 | for (int i = 0; i < b->b_iused; i++) { 148 | cfg_instr *inst = &b->b_instr[i]; 149 | int nextop = i+1 < b->b_iused ? b->b_instr[i+1].i_opcode : 0; 150 | if (inst->i_opcode == CALL && nextop == RETURN_VALUE) { 151 | INSTR_SET_OP1(inst, TAIL_CALL, inst->i_oparg); 152 | } 153 | 154 | } 155 | } 156 | } 157 | ``` 158 | 159 | Let's check if the new optimization is working. 160 | 161 | Recompile CPython with `make -j6` 162 | 163 | And check the new optimization with the `dis` module: 164 | 165 | ```python 166 | >>> def f(): 167 | ... return g() 168 | 169 | >>> dis.dis(f) 170 | 171 | 1 RESUME 0 172 | 173 | 2 LOAD_GLOBAL 1 (g + NULL) 174 | TAIL_CALL 0 175 | RETURN_VALUE 176 | ``` 177 | 178 | Perfect. Let's move to the final step. 179 | 180 | ## Implement an interpreter for the new bytecode 181 | As previously mentioned, the `Python/bytecodes.c` file contains definitions and interpretations of Python bytecodes. 182 | Currently, we are using a blank copy of the `CALL` handler as `TAIL_CALL`. First, let's understand how it works. 183 | 184 | 185 | #### How `CALL` works 186 | A bit of terminology. A call frame is a structure that represents a function call's execution context: local variables, function arguments, etc. 187 | 188 | The value stack is a list of pointers to Python objects that instructions operate on. 189 | 190 | `Python/bytecodes.c/_CALL` manipulates both structures. In summary, it does three things: 191 | 1. Creates a new call frame and pushes it to the call stack 192 | 2. Consumes arguments from the current frame's value stack 193 | 3.
Passes control to the new frame 194 | 195 | ```c 196 | // Creates a new call frame and pushes it to the call stack 197 | int code_flags = ((PyCodeObject*)PyFunction_GET_CODE(callable))->co_flags; 198 | PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(callable)); 199 | _PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit( 200 | tstate, (PyFunctionObject *)callable, locals, 201 | args, total_args, NULL 202 | ); 203 | 204 | // Consumes arguments from the current frame's value stack 205 | STACK_SHRINK(oparg + 2); 206 | 207 | if (new_frame == NULL) { 208 | GOTO_ERROR(error); 209 | } 210 | // Updates the current frame's return offset 211 | frame->return_offset = (uint16_t)(next_instr - this_instr); 212 | // Passes control to the new frame 213 | DISPATCH_INLINED(new_frame); 214 | ``` 215 | 216 | Simple and easy. Let's move to the next step. 217 | #### `TAIL_CALL` interpreter 218 | To create a `TAIL_CALL` interpreter, we are going to change a few things in the regular `CALL`. 219 | 220 | We need to drop the current frame before creating a new call frame. However, because references to the arguments are stored in the current, dying frame, we need to stash them before dropping it, and clean them up after creating the new frame. 221 | 222 | Let's move to `Python/bytecodes.c/_TAIL_CALL`. 223 | 224 | First, save the args. Since CPython uses its own memory allocator, use `PyMem_Malloc` to allocate memory. 225 | 226 | ```diff 227 | // Check if the call can be inlined or not 228 | if (Py_TYPE(callable) == &PyFunction_Type && 229 | tstate->interp->eval_frame == NULL && 230 | ((PyFunctionObject *)callable)->vectorcall == _PyFunction_Vectorcall) 231 | { 232 | + PyObject **newargs = PyMem_Malloc(sizeof(PyObject*) * total_args); 233 | + if (newargs == NULL) { 234 | + GOTO_ERROR(error); 235 | + } 236 | 237 | + for (Py_ssize_t j = 0; j < total_args; j++) { 238 | + newargs[j] = args[j]; 239 | + } 240 | 241 | ``` 242 | 243 | Next, drop the current call frame.
The snippet is a copy of the `Python/bytecodes.c/POP_FRAME` instruction. 244 | ```diff 245 | + STACK_SHRINK(oparg + 2); 246 | + _Py_LeaveRecursiveCallPy(tstate); 247 | + _PyFrame_SetStackPointer(frame, stack_pointer); 248 | + _PyInterpreterFrame *dying = frame; 249 | + frame = tstate->current_frame = dying->previous; 250 | + _PyEval_FrameClearAndPop(tstate, dying); 251 | + LOAD_SP(); 252 | ``` 253 | 254 | Init a new frame using the callable and the stashed args. 255 | ```diff 256 | _PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit( 257 | tstate, (PyFunctionObject *)callable, locals, 258 | - args, total_args, NULL 259 | + newargs, total_args, NULL 260 | ); 261 | ``` 262 | Clean up the argument stash: 263 | 264 | ```diff 265 | + PyMem_Free(newargs); 266 | ``` 267 | And pass control to the new frame: 268 | 269 | ```c 270 | DISPATCH_INLINED(new_frame); 271 | ``` 272 | 273 | 274 | Recompile CPython with `make regen-cases && make regen-importlib && make -j6` and test the new instruction. 275 | 276 | 277 | ## Final check 278 | 279 | ```python 280 | >>> def fact(n, acc): 281 | ... if n == 1: 282 | ... return acc 283 | ... return fact(n-1, acc*n) 284 | 285 | >>> fact(1500, 1) 286 | 287 | 48119977967797748601669900935... 288 | ``` 289 | 290 | --------------------------------------------------------------------------------
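For contrast, here is what the same function does on an unpatched CPython, where every recursive call still pushes a new frame; that frame growth is exactly what `TAIL_CALL` eliminates (plain Python, runnable anywhere):

```python
import sys

def fact(n, acc):
    if n == 1:
        return acc
    return fact(n - 1, acc * n)

sys.setrecursionlimit(1000)  # the default limit
try:
    fact(1500, 1)
except RecursionError:
    print("RecursionError: maximum recursion depth exceeded")
```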