├── README.md
├── examples
    └── iteration_and_generators.md
├── machine.md
├── memory.gv
├── memory.gv.png
├── operations.md
├── overview.md
└── translation.md

/README.md:
--------------------------------------------------------------------------------
# python_formal_semantics

Partial specification, with examples, of a formal semantics for Python.

## Small step operational semantics

We aim to define the semantics of Python using [small step operational semantics](https://en.wikipedia.org/wiki/Operational_semantics#Small-step_semantics).

Conventionally, small step semantics are defined directly in terms of syntax.
However, given the complex semantics of some statements, such as the `try`-`finally` and `with` statements, we start by defining the operational semantics of an abstract machine using a set of operations. The semantics of each statement can then be defined by a translation from each syntactic element to a list of operations.

## The semantics

The semantics consists of four interrelated parts:

* A definition of an abstract machine
* A start state for that machine
* A halt state for that machine
* A transition function for that machine

Execution of a program consists of the sequence of transitions that begins at the start state and ends at the halt state.

### Semantics components

* [Top level](./overview.md)
* [Abstract machine](./machine.md)
* [Transition functions](./operations.md)


## Python implementation

As part of the definition we also aim to write an implementation in Python.
The ultimate goal is for this implementation to be able to run arbitrary Python code
and to serve as a reference implementation. While it is unlikely to displace CPython as the de facto reference implementation, it will hopefully prove useful.

--------------------------------------------------------------------------------
/examples/iteration_and_generators.md:
--------------------------------------------------------------------------------
# Iteration and generators examples

## Introduction

This is *not* the final formal semantics of iteration and generators, but it is close enough to be illustrative.

See `Core.md` for details of the VM state, abbreviations and low-level instructions.

## Iteration

The loop

```python
for it in seq:
    body
```

will compile into the following instruction sequence:

```

    GET_ITER
    FOR_ITER exit



exit:
    POP_TOP   # discard the return value from the generator
    POP_TOP   # discard the generator
```

### Explanation of instructions

#### GET_ITER

The `GET_ITER` instruction replaces TOS with a generator suitable for iterating over that value.
If TOS is already a generator then this is a no-op; otherwise:

1. Replace TOS with the result of the C-level call to the `tp_iter` slot of the type of TOS.
2. If TOS is not a generator, wrap it with the generator resulting from `wrap(TOS)`.
3. Increment the `instruction pointer`.

`wrap` is defined as:

```python
def wrap(obj):
    # Keep fetching items until the underlying iterator signals exhaustion
    while True:
        try:
            value = $ITERNEXT(obj)
        except StopIteration:
            return
        yield value
```

`$ITERNEXT(obj)` is not a call, and compiles to:

```

    ITERNEXT
```

#### ITERNEXT

The `ITERNEXT` instruction makes a C-level call to the `tp_iternext` slot of the type of TOS.
If NULL is returned but the C code has not set an exception, it sets the exception to `StopIteration`.

#### FOR_ITER

The `FOR_ITER` instruction does the following:

1. Set the `last` attribute of the current frame to the current instruction.
2. Set the `exit` attribute of the current frame to the instruction following the `exit:` label.
3. Push the frame of the generator in TOS to the current thread's frame stack.
4. Push `None` to the data stack of the generator's frame, which is now the current frame.
5. Set the generator frame's `state` to `EXECUTING`.
6. Set the `instruction pointer` to the instruction after the `last` attribute of the newly pushed frame.
7. `Continue`

## Generators

The generator function

```python
def gen():
    yield 1
    return "done"
```

will compile into the following instruction sequence:

```
    MAKE_GEN
    YIELD_VALUE            # hand the new generator back to the caller
    POP_TOP                # discard the None pushed when first resumed
    LOAD_CONSTANT 1
    YIELD_VALUE            # yield 1
    POP_TOP                # discard the value sent back in
    LOAD_CONSTANT "done"
    GEN_RETURN
exhausted:
    EXHAUSTED
    JUMP exhausted
```

### Explanation of instructions

#### MAKE_GEN

The `MAKE_GEN` instruction creates a generator from the current frame and pushes it to the stack.

It performs the following steps:

1. Create a generator object whose frame is the current frame.
2. Push that generator to the stack.
3. Increment the `instruction pointer`.
4. `Continue`

#### YIELD_VALUE

The `YIELD_VALUE` instruction does the following:

1. Store the `instruction pointer` in the `last` attribute of the current frame.
2. Set the frame's `state` to `SUSPENDED`.
3. Pop a value from the current frame's data stack.
4. Pop the current frame from the thread's frame stack.
5. Push the popped value to the data stack of the new current frame.
6. Set the `instruction pointer` to the instruction after the `last` attribute of the frame now on top of the frame stack.
7. `Continue`

#### GEN_RETURN

The `GEN_RETURN` instruction does the following:

1. Store the `instruction pointer` in the `last` attribute of the current frame.
2. Set the frame's `state` to `EXHAUSTED`.
3. Pop a value from the current frame's data stack.
4. Pop the current frame from the thread's frame stack.
5. Push the popped value to the data stack of the new current frame.
6. Set the `instruction pointer` to the `exit` attribute of the frame now on top of the frame stack.
7. `Continue`

#### EXHAUSTED

The `EXHAUSTED` instruction is equivalent to

```
LOAD_CONSTANT None
GEN_RETURN
```

but does not use the stack, which allows implementations to free the memory for the stack as soon as the generator is exhausted.

--------------------------------------------------------------------------------
/machine.md:
--------------------------------------------------------------------------------
# Abstract machine description

The following describes the components of the abstract machine, and how they are related.

See [operations](./operations.md) for a description of the set of operations that act on the abstract machine state.
See [translation](./translation.md) for a description of how Python source is translated into those operations.
See [objects](./objects.md) for a description of the core objects and classes necessary to implement Python.

The following model assumes a single interpreter; it should be easy to extend it to several communicating interpreters.
The current CPython implementation is a hybrid of these two models, but is attempting to move to the communicating-interpreters model.

## Components

### Interpreter

The interpreter has the following attributes:

* Lock (the GIL)
* Thread list (zero or more threads)
* Reference to the currently running thread (if any)
* Heap -- where objects are allocated. Once objects are no longer reachable, they are automatically reclaimed.

### Thread

Each thread has the following attributes:

* `instruction pointer`: pointer to the next instruction to be executed
* Frame stack

### Frame

Each frame has the following attributes:

* Data `stack`: used for computation
* `locals`: the local variables
* Reference to the global variable mapping
* Reference to the builtins dictionary
* `code`: a reference to the code object
* `state`: one of `CREATED`, `EXECUTING`, `SUSPENDED`, or `EXHAUSTED`
* `last`: index of the last instruction executed in this frame
* `exit`: index of the instruction to execute when exiting from a callee generator

### Instructions

Execution in the abstract machine proceeds by performing the operations described by instructions.
Each instruction has the following attributes:

* Name: name of the operation
* Operand: a name or number that specifies the exact behaviour of the operation
* Line number: the line number of the source code corresponding to this instruction

### Code

Each code object has the following attributes:

* List of arguments
* List of instructions
* Source filename
* Names of all local variables: for debugging purposes

## Execution

Each thread in the machine repeatedly executes the current operation until it halts.
In Python pseudo-code, this would be something like:

```python
while True:
    name, operand, line = get_current_instruction()  # fetch the next instruction
    if name == "halt":
        break                                         # the thread stops here
    execute_operation(name, operand)                  # perform its operation
```

### Running Python code

To run some Python code, either as a module or just a piece of source code,
the following steps must occur:

* Create an interpreter.
* Compile the Python code to a code object for that interpreter.
* Create a thread within that interpreter.
* Start the execution loop.

### Running a code object in a thread

To run a code object in a thread, the thread's frame stack must be set up; then execution proceeds as normal.
The frame stack is set up as follows:

* An entry frame is pushed to the thread's frame stack.
* A frame for the code object is created, and pushed to the thread's frame stack.

The entry frame is a minimal frame (no locals, globals, or builtins), whose code object contains the single instruction `halt`.
The entry frame's `last` is set to -1.

Normal execution will execute the code object. Once that returns, the `halt` instruction will be reached and execution will stop.

--------------------------------------------------------------------------------
/memory.gv:
--------------------------------------------------------------------------------
digraph G {
    rankdir = LR
    splines = "true"

    subgraph cluster_0 {
        node [shape=rectangle];
        label = "Interpreter 0";
        style=filled;
        color=lightgrey;

        istate0 [
            shape = "record"
            label = "<f0> Interpreter State | Lock | Allocator"
        ];
        subgraph cluster_heap0 {

            "frame_c" -> "frame_b" -> "frame_a"
            "frame_m" -> "frame_n"

            label = "Heap"
            style=filled;
            color="#bbeebb";
            shared0 [ label = "Shared memory reference" ]
            channels0 [label="Channel interface"]
            int0 [label="int"]
            type0 [label="type"]
            object0 [label="object"]
            int0 -> type0 [ style="dashed"]
            frame0 [label="frame class"]
            frame0 -> type0 [ style="dashed"]
            object0 -> type0 [ style="dashed"]
            frame_a -> frame0 [style="dashed"]
            frame_b -> frame0 [style="dashed"]
            frame_c -> frame0 [style="dashed"]
            frame_m -> frame0 [style="dashed"]
            frame_n -> frame0 [style="dashed"]
        }

        "thread0" [
            shape = "record"
            rankdir = TB
            label = "<f0> Thread 0 | <f1> frame | C stack | ... | ..."
        ];

        "thread1" [
            shape = "record"
            rankdir = TB
            label = "<f0> Thread 1 | <f1> frame | C stack | ... | ..."
        ];

        thread0:f0:e -> istate0
        thread0:f1 -> frame_c

        thread1:f0 -> istate0
        thread1:f1 -> frame_m

    }

    subgraph cluster_1 {
        node [shape=rectangle];

        label = "Interpreter 1";
        style=filled;
        color=lightgrey;

        istate1 [
            shape = "record"
            label = "<f0> Interpreter State | Lock | Allocator"
        ];

        subgraph cluster_heap1 {

            label = "Heap"

            "frame_p" -> "frame_q"
            "frame_x" -> "frame_y" -> "frame_z"

            style=filled;
            color="#bbeebb";
            channels1 [label="Channel interface", rank = 0]

            int1 [label="int"]
            type1 [label="type"]
            object1 [label="object"]
            int1 -> type1 [ style="dashed"]
            frame1 [label="frame class"]
            frame1 -> type1 [ style="dashed"]
            object1 -> type1 [ style="dashed"]
            frame_p -> frame1 [style="dashed"]
            frame_q -> frame1 [style="dashed"]
            frame_x -> frame1 [style="dashed"]
            frame_y -> frame1 [style="dashed"]
            frame_z -> frame1 [style="dashed"]
            shared1 [ label = "Shared memory reference" ]
        }

        "thread2" [
            shape = "record"
            rankdir = TB
            label = "<f0> Thread 2 | <f1> frame | C stack | ... | ..."
        ];

        "thread3" [
            shape = "record"
            rankdir = TB
            label = "<f0> Thread 3 | <f1> frame | C stack | ... | ..."
        ];

        thread3:f0 -> istate1:f0
        thread3:f1 -> frame_x
        thread2:f0 -> istate1:f0
        thread2:f1 -> frame_p

    }

    subgraph cluster_shared {
        rankdir = LR
        label = "Shared Memory"
        style=filled;
        color="#eebbbb";
        "shared" [ label = "Shared buffer", shape=rectangle ]
        "channel" [ label = "Channel", shape=rectangle ]
        shared -> channel [style="invis"]
    }

    channels0 -> channel
    channels1 -> channel
    channel -> channels0
    channel -> channels1
    shared0 -> shared
    shared1 -> shared

}

--------------------------------------------------------------------------------
/memory.gv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/markshannon/python_formal_semantics/d4a8c36298b8ec2ad52aa1bc7dee726add096571/memory.gv.png
--------------------------------------------------------------------------------
/operations.md:
--------------------------------------------------------------------------------
# Operations supported by the abstract machine

This file lists the operations on the abstract machine.

## Calls and helpers


### FFI_CALL

This opcode makes calls to builtin functions. The operand is the number of arguments.

```
builtin_func *args -> result
```

Pops the arguments and the builtin function, and makes the call using the foreign function interface.
If the call is successful, the result is pushed to the stack.
If the call fails, then unwinding occurs.

For the purposes of these semantics, builtin functions can take only a fixed number of positional arguments.
More complex interfaces are possible by wrapping the builtin function in a Python function.

### Calls to Python functions

Calls to Python functions are performed in a two-step process: creating the frame, then pushing it.

### MAKE_FRAME

This operation has no operand.

```
func args_tuple keyword_dict -> frame
```

The function, a tuple of positional arguments, and a dictionary of keyword arguments are popped from the stack.
The dictionary is on top of the stack, then the positional arguments, then the function.
The new frame is initialized as follows:

* If the length of the tuple is greater than the number of arguments, excluding keyword-only arguments, then fail.
* Move the values in the tuple into the locals, starting at position 0.
* For each key, value pair in the keyword arguments:
    * If the key is not a string, then fail.
    * If there is no argument of that name, then fail.
    * If that argument has already been assigned, then fail.
    * If that argument is positional-only, then fail.
    * Move the value to the named argument.
* For each argument so far unassigned:
    * Assign the default value, if there is one. Otherwise, fail.
* Copy the `globals`, `builtins`, and `code` attributes from the function.
* Set `state` to `CREATED`.
* Set `last` to -1.

If frame creation fails, then perform unwinding with a `TypeError`.

### ENTER_FRAME

```
frame ->
```

This operation has no operands.
Pops the frame from the data stack.
Sets the `last` attribute of the frame currently on top of the thread's frame stack (the caller) to the thread's `next` attribute.
Pushes the popped frame to the top of the current thread's frame stack.
Sets the thread's `next` attribute to the new frame's `last` attribute, plus one.

### RETURN

This operation has no operands.

Callee:
```
value ->
```

then caller:
```
-> value
```

Pops the value from the callee's data stack. Note: this should be the only value on that stack.
Pops the callee's frame from the thread's frame stack.
Pushes the value to the caller's data stack.
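The argument-binding rules listed for `MAKE_FRAME` can be sketched in ordinary Python. This is a minimal sketch, not part of the specification. The `Param` record, its `kind` strings, the `bind_arguments` name, and passing defaults as a name-to-value dict are all hypothetical choices made for illustration. A dict of bound names stands in for the frame's locals.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    kind: str  # hypothetical: "positional_only", "positional_or_keyword" or "keyword_only"


def bind_arguments(params, args, kwargs, defaults):
    """Map parameter names to values following the MAKE_FRAME rules; raise TypeError on failure."""
    positional = [p for p in params if p.kind != "keyword_only"]
    if len(args) > len(positional):
        raise TypeError("too many positional arguments")
    bound = {}
    # Move the tuple's values into the locals, starting at position 0.
    for param, value in zip(positional, args):
        bound[param.name] = value
    by_name = {p.name: p for p in params}
    for key, value in kwargs.items():
        if not isinstance(key, str):
            raise TypeError("keywords must be strings")
        param = by_name.get(key)
        if param is None:
            raise TypeError(f"unexpected keyword argument {key!r}")
        if key in bound:
            raise TypeError(f"multiple values for argument {key!r}")
        if param.kind == "positional_only":
            raise TypeError(f"{key!r} is positional-only")
        bound[key] = value
    # Fill any remaining parameters from the defaults, or fail.
    for param in params:
        if param.name not in bound:
            if param.name not in defaults:
                raise TypeError(f"missing argument {param.name!r}")
            bound[param.name] = defaults[param.name]
    return bound


params = [
    Param("a", "positional_only"),
    Param("b", "positional_or_keyword"),
    Param("c", "keyword_only"),
]
print(bind_arguments(params, (1,), {"c": 3}, {"b": 2}))  # {'a': 1, 'c': 3, 'b': 2}
```

The checks run in the same order as the bullet list, so the first rule that a call violates determines the failure.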

### Call Helpers

To implement calls, we need a few helper operations to build up the arguments.

### LIST_APPEND

```
list item -> list
```

Append `item` to `list`.

### DICT_INSERT_NO_DUPLICATE

```
dict key value -> dict
```

Insert the `key, value` pair into `dict`, raising an exception if `key` is already present in `dict`.

### LIST_TO_TUPLE

```
list -> tuple
```

Convert `list` to a `tuple`.

### MAPPING_TO_DICT

```
obj -> dict
```

Convert `obj` to a `dict`. Fail if `obj` is not a mapping.

## Type checking operations

### TYPE

Takes no operand.

```
obj -> cls
```

Pops the object on top of the stack and pushes its class.

### SUBTYPE

Takes no operand.

```
cls1 cls2 -> res
```

Pushes `True` if `cls1` is a subtype of `cls2` (including the case where `cls1` is `cls2`), and `False` otherwise,
considering nothing but `cls1`'s MRO.

```
res = cls2 in cls1.__mro__
```

## Control Flow

There are two local control flow operations:

* ``JUMP`` -- Jumps unconditionally to the target label.
* ``BRANCH`` -- Pops the top value on the stack and jumps to the target label if the value matches the condition.

And three non-local control flow operations:

* ``RETURN`` -- Returns from a call, see the section on calls.
* ``RAISE`` -- Starts unwinding, see the section on exceptions.
* ``HALT`` -- Stops the current thread.

In addition there is a `TO_BOOL` operation to convert any value to a boolean.
This is provided to avoid having to combine the conversion to boolean and the jump into a single operation.
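`BRANCH` compares its condition with `way` using `is`, so a condition that is merely truthy or falsy has to pass through `TO_BOOL` first. This is a minimal sketch of why the two operations are kept separate. It uses Python's built-in `bool()` as a stand-in for `TO_BOOL`, which is an assumption: the operation itself is defined below in terms of `__bool__`. The helper names are hypothetical.

```python
def to_bool(obj: object) -> bool:
    """Stand-in for TO_BOOL, using Python's built-in truth test."""
    return bool(obj)


def branch_taken(cond: object, way: bool) -> bool:
    """Stand-in for BRANCH's test: identity with `way`, no conversion."""
    return cond is way


cond = []                                    # falsy, but not the object False
print(branch_taken(cond, False))             # False: [] is not False
print(branch_taken(to_bool(cond), False))    # True: TO_BOOL first yields False
```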

### JUMP

```
->
```

This operation has one operand, the `offset`: a signed integer giving the position of the target instruction relative to the following instruction.

The `JUMP` instruction simply adds the `offset` to the `next` attribute of the current thread.

### BRANCH

```
cond ->
```

This operation has two operands: the `way` (a boolean), and the `offset`, a signed integer giving the position of the target instruction relative to the following instruction.

The `BRANCH` operation pops the value from the top of the stack and, if the value is the same object as `way`, adds the `offset` to the `next` attribute of the current thread.

```python
def BRANCH(cond, way, offset):
    # `is` comparison: cond must already be True or False (see TO_BOOL)
    if cond is way:
        JUMP(offset)
```

### TO_BOOL

```
obj -> bool
```

This operation has no operands.
It converts the object on top of the stack to a bool.
This operation is not strictly necessary, as it can be defined as:

```python
def TO_BOOL(obj):
    has_bool, to_bool = load_special!(obj, "__bool__")
    if has_bool:
        res = to_bool()
        if res is True or res is False:
            return res
        raise TypeError(...)
    # No __bool__: fall back to __len__, as Python does
    has_len, length = load_special!(obj, "__len__")
    if has_len:
        return length() != 0
    return True
```

### HALT

`HALT` has no operands.

A special operation, `HALT`, exists for terminating execution of a thread.
All other operations implicitly proceed to executing the next operation.
`HALT` does not; execution of the thread halts.

`HALT` removes the current thread from the interpreter's `threads` and `runnable-threads` sets.


## Exception Handling

There are three exception handling control operations:

### PUSH_HANDLER

```
->
```

Pushes an exception handler to the handler stack.
Takes one operand, the offset: a signed integer giving the position of the target instruction relative to the following instruction.

This instruction pushes the pair `(stackdepth, target)` to the handler stack,
where `stackdepth` is the current depth of the data stack.

### POP_HANDLER

Has no operand.

```
->
```

Pops the top pair from the handler stack.

### SWAP_EXCEPTION

Swaps the exception on top of the data stack with the current exception.

```
exc -> current_exc
```

Pops `exc`, pushes the previous `current_exc`, then sets `current_exc = exc`.

--------------------------------------------------------------------------------
/overview.md:
--------------------------------------------------------------------------------
# Top level semantics

The operational semantics consists of an abstract machine state, a start state,
a halt state and a transition function.

The abstract machine, described more fully [here](./machine.md),
consists of a single interpreter.
We expect to change the model to include several interpreters, as CPython supports that model of execution.

Each interpreter consists of a set of threads and a set of runnable threads, where

`runnable-threads ⊆ threads`

## The interpreter semantics

### Start state

The start state of an interpreter is as follows:

* There is exactly one thread and it is runnable.
* The code for that thread is the code for the interpreter.
* That thread is in its start state.

### Halt state

The halt state occurs when there are no runnable threads,
that is, when `runnable-threads = ∅`.

If there are no threads, `threads = ∅`, then the interpreter has terminated.
If there are threads, but none are runnable, then the interpreter has deadlocked.
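The halt conditions above depend only on the two thread sets. This is a minimal sketch of the classification, assuming threads are represented by hashable identifiers. The `interpreter_status` helper and its return strings are hypothetical.

```python
def interpreter_status(threads: set, runnable: set) -> str:
    """Classify an interpreter state using only its thread sets."""
    assert runnable <= threads, "runnable-threads must be a subset of threads"
    if runnable:
        return "running"      # not a halt state: some thread can take a step
    if not threads:
        return "terminated"   # threads = ∅
    return "deadlocked"       # threads exist, but none is runnable


print(interpreter_status({"t0"}, {"t0"}))  # running
print(interpreter_status({"t0"}, set()))   # deadlocked
print(interpreter_status(set(), set()))    # terminated
```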

### Execution steps

Execution of the abstract machine proceeds in a (possibly infinite) sequence of steps.
Each step causes the abstract machine to transition from one state to another.
Each step is defined by a transition function and is an atomic, indivisible operation.

#### Transition function

The transition function for the interpreter is:

* Fairly choose a thread from `runnable-threads`.
* Execute the transition function for that thread.

Here "fairly" means that, averaged over a large number of steps, the choice of thread tends towards random.

## Threads

The interpreter consists of a number of threads. Threads run concurrently,
but only one step is executed at a time per interpreter.

### Start state

To create the start state for a thread for a `code` object `c`:

* Start with an empty frame stack.
* Push a "halt frame": a frame with no locals, globals, or builtins, whose code object contains the single instruction `halt`, and whose `last` is set to -1.
* Push the frame for the `code` object `c`, as if `eval` were being called.

### Halt state

Threads do not have an explicit `halt` state, but when execution encounters a `halt` instruction, the thread is removed from the interpreter's `threads` set, and can thus never be run again.

### Transition function

The transition function of a thread is the core of the semantics.
It is similar to the execution of real hardware in its most general form: fetch, then execute.

Real hardware has a decode phase, but an abstract machine has no encoding, so there is no need to decode.

The fetch phase simply sets the `last` value in the topmost frame to `next`, reads the instruction from `top_frame.code.instructions[next]`, then increments `next` by one.
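The fetch phase just described, followed by an execute phase, can be sketched as a single `step` function. This is a minimal sketch under several stated assumptions:

* Instructions are stored directly on the frame as `(name, operand)` pairs instead of via `top_frame.code.instructions`.
* Only `JUMP`, `BRANCH` and `HALT` from operations.md are modelled.
* `LOAD_CONSTANT` is assumed to push its operand.
* `HALT` sets a flag rather than removing the thread from `threads`.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    instructions: list                  # (name, operand) pairs; hypothetical encoding
    stack: list = field(default_factory=list)
    last: int = -1


@dataclass
class Thread:
    frames: list
    next: int = 0
    halted: bool = False


def step(thread: Thread) -> None:
    """One thread transition: fetch, then execute."""
    frame = thread.frames[-1]
    # Fetch: record the instruction in `last`, read it, then advance `next`.
    frame.last = thread.next
    name, operand = frame.instructions[thread.next]
    thread.next += 1
    # Execute: only a few operations are modelled here.
    if name == "LOAD_CONSTANT":         # assumed: pushes its operand
        frame.stack.append(operand)
    elif name == "JUMP":                # offset is relative to the following instruction
        thread.next += operand
    elif name == "BRANCH":              # operand is (way, offset)
        way, offset = operand
        if frame.stack.pop() is way:
            thread.next += offset
    elif name == "HALT":                # stands in for removing the thread from `threads`
        thread.halted = True
    else:
        raise NotImplementedError(name)


def run(thread: Thread) -> None:
    while not thread.halted:
        step(thread)


program = [
    ("LOAD_CONSTANT", False),
    ("BRANCH", (False, 1)),             # taken: skips the next instruction
    ("LOAD_CONSTANT", "skipped"),
    ("LOAD_CONSTANT", "reached"),
    ("HALT", None),
]
frame = Frame(program)
run(Thread([frame]))
print(frame.stack)  # ['reached']
```

`next` is incremented before the operation runs, so the relative offsets used by `JUMP` and `BRANCH` are measured from the following instruction.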
The execute phase is described in [operations.md](./operations.md).

--------------------------------------------------------------------------------
/translation.md:
--------------------------------------------------------------------------------
# Translation from source code to a sequence of operations

In order to translate source code to a sequence of operations, it must first be parsed.
Parsing produces a tree representing the source code.
Correct parsing of the source code is complex, and out of scope for this documentation.

For this document we assume that the source has been parsed to form a tree, and that each syntactic part of the original source is represented by a subtree of the tree for the whole source.

We express the translation from source to a sequence of operations as a function on part of the AST that produces a sequence of operations. Translation is a recursive process: translating the AST for `a + b` requires the results of translating `a` and of translating `b`.

For convenience, the expression `translate(x)` means translate the AST for `x`.

### Terminology

#### Pseudo operations

In the following we will often describe the way one syntactic construct is translated by converting it to another one.
However, this is not always possible to do directly in Python source, so we need to include pseudo-operations.
We use a `!` to signify that a name is a pseudo-operation, not a Python variable.
For example, `load_attr` is just the Python variable "load_attr", but `load_attr!` means the "load_attr" pseudo-operation.

When describing translations and pseudo-operations, the pseudo-operations may be used in what looks like a normal Python call.
These should not be considered calls in Python, but rather like calls in a language like C, where there is no lookup and
execution enters the pseudo-operation directly.

#### emit

The function `emit` emits the instruction to the imaginary buffer we are using to compile the code.

```python
def emit(opcode, operand=None):
    ...
```

## Expressions

### Attributes

#### Load

The expression `a.b` is translated with help from the pseudo-operation `load_special!` and
a call to the object's class's `__getattribute__` method.

`a.b` is translated by:

```python
translate(a)
emit(load_attr!, "b")
```

where `load_attr!` is defined as:

```python
def load_attr!(obj, name):
    found, getattribute = load_special!(obj, "__getattribute__")
    assert found  # __getattribute__ is defined on object, so it is always found
    return getattribute(name)
```

### load_special!

Loading a special attribute from an object's class is a very common operation,
used to perform almost any operation.

```python
def load_special!(obj, name):
    cls = TYPE(obj)
    has_attr, descriptor = lookup_class_attr!(cls, name)
    if has_attr:
        desc_type = TYPE(descriptor)
        has_getter, getter = lookup_class_attr!(desc_type, '__get__')
        if has_getter:
            # getter is the raw __get__ function, so pass the descriptor explicitly
            return True, getter(descriptor, obj, cls)
        else:
            return True, descriptor
    return False, None
```

#### lookup_class_attr!

```python
def lookup_class_attr!(cls, name):
    # Search the class and its bases in MRO order; no descriptors are invoked here
    for scls in GET_MRO(cls):
        succ, result = LOOKUP_DICT(GET_DICT(scls), name)
        if succ:
            return True, result
    return False, None
```

### Calls

The call `f(*args, **kwargs)` is translated using the `call!` pseudo-operation:

```python
translate(f)
translate(args)
translate(kwargs)
emit(call!)
```

Most calls are not of the form `f(*args, **kwargs)`. To translate those calls,
all positional arguments must be merged to form a tuple, and all named arguments must be
merged to form a dictionary.

Translation then proceeds as follows:

1. Create an empty list.
2. For each positional or star argument:
    * Translate the argument.
    * If it is a star argument:
        * Extend the list.
    * Else:
        * Emit a `LIST_APPEND` operation.
3. Emit `LIST_TO_TUPLE`.
4. Create an empty dict.
5. For each named or double-star argument:
    * If it is a double-star argument:
        * Translate the argument, then merge it into the dict with `MERGE_DICT_NO_DUPLICATES`.
    * Else:
        * Translate the key.
        * Translate the value.
        * Emit a `DICT_INSERT_NO_DUPLICATE` operation.
6. Emit the `CALL` operation.


#### call!

The `call!` pseudo-operation is defined as follows:

```python
def call!(func, args, kwargs):
    while True:
        if TYPE(func) is types.FunctionType:
            frame = MAKE_FRAME(func, args, kwargs)
            ENTER_FRAME(frame)
            break
        if TYPE(func) is types.BuiltinFunctionType:
            if kwargs:
                raise TypeError(...)  # builtins take positional arguments only
            success, value = FFI_CALL(func, args)
            if success:
                PUSH(value)
            else:
                raise value
            break
        if TYPE(func) is types.MethodType:
            # Bound method: prepend self and call the underlying function
            args = (func.__self__,) + args
            func = func.__func__
            continue
        is_callable, func = load_special!(func, "__call__")
        if not is_callable:
            raise TypeError("Not callable")
```

#### Example

The translation of `f(a, *b, c, x=1, **y, z=2)` is:

```
LOAD_FAST 'f'
BUILD_LIST 0
LOAD_FAST 'a'
LIST_APPEND 1
LOAD_FAST 'b'
LIST_EXTEND
LOAD_FAST 'c'
LIST_APPEND 1
LIST_TO_TUPLE
NEW_DICT
LOAD_CONSTANT "x"
LOAD_CONSTANT 1
DICT_INSERT_NO_DUPLICATE
LOAD_FAST 'y'
MERGE_DICT_NO_DUPLICATES
LOAD_CONSTANT "z"
LOAD_CONSTANT 2
DICT_INSERT_NO_DUPLICATE
CALL
```

### Binary operations

Binary operations of the form `l op r` are all translated the same way, using the `binary_op!` pseudo-operation.

`l op r` is translated by:

```python
translate(l)
translate(r)
opname, oprname = binary_op_names[op]
emit(binary_op!, (opname, oprname))
```

where `binary_op_names` is a table defining the opnames.
Most opnames take the form `__xxx__`, `__rxxx__`, where xxx is the mnemonic form of the operator.
For example, the opnames for `+` are `__add__` and `__radd__`.

#### binary_op!

`binary_op!` is defined as follows:

```python
def binary_op!(l, r, name, rname):
    if SUBTYPE(TYPE(r), TYPE(l)) and TYPE(r) is not TYPE(l):  # strict subclass
        # The right operand's type is a strict subclass: try its reflected method first
        r, l = l, r
        name, rname = rname, name
    succ, op = load_special!(l, name)
    if succ:
        res = op(r)
        if res is not NotImplemented:
            return res
    succ, op = load_special!(r, rname)
    if succ:
        res = op(l)
        if res is not NotImplemented:
            return res
    raise TypeError(...)
```

## Statements

### Assignments

#### Simple assignment

The translation of the simple assignment `a = b` depends on the scope in which `a` resides.
In all cases the value `b` is translated first.

* If `a` is a function-local variable, then emit `store_local("a")`.
* If `a` is a class-local variable, then emit `store_name("a")`.
* If `a` is a global variable, then emit `store_global("a")`.
* If `a` is a non-local, that is, a variable in an outer function, then emit `store_nonlocal("a", n)`, where `n` is the nesting difference between the scope of the store and the scope of the variable.

Determining the scope of a variable requires symbol table analysis.

#### Attribute assignments

The statement `a.b = x` is translated with help from the pseudo-operation `load_special!` and
a call to the object's class's `__setattr__` method.

```
translate(x)
translate(a)
emit(store_attr!, "b")
emit(POP_TOP)  # discard the result of store_attr!
```

where `store_attr!` is defined as:

```python
def store_attr!(value, obj, name):
    succ, setattrfunc = load_special!(obj, "__setattr__")
    assert succ  # every class inherits __setattr__ from object, so the lookup always succeeds
    setattrfunc(name, value)
```

#### Indexed assignments

The statement `a[i] = x` is translated with help from the operation `load_special!` and a call to the object's class's `__setitem__` method.

```
translate(x)
translate(a)
translate(i)
emit(store_subscr!)
emit(POP_TOP)  # discard the result of store_subscr!
```

where `store_subscr!` is defined as:

```python
def store_subscr!(value, obj, key):
    succ, func = load_special!(obj, "__setitem__")
    if succ:
        func(key, value)
    else:
        raise TypeError(...)
```

### Augmented assignments

Augmented assignments, such as `a += b`, are implemented as if the operation and the assignment were separate, as in `a = a + b`, except that any subexpressions of the target `a` are evaluated only once and the in-place form of the operator is tried first.
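Both rules can be checked in plain Python. The following sketch models `inplace_op!` and its fallback to `binary_op!` with ordinary functions; `lookup`, `binary_op`, `inplace_op` and `IADD` are hypothetical names, and special methods are assumed to be looked up on the operand's type, as `load_special!` does. It also shows why the subscript target is kept in temporaries: `key()` runs only once.

```python
# Plain-Python model of `inplace_op!` falling back to `binary_op!`.
# `lookup`, `binary_op` and `inplace_op` are hypothetical stand-ins.


def lookup(obj, name):
    """Return type(obj)'s special method `name` bound to obj, or None."""
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name].__get__(obj, type(obj))
    return None


def binary_op(l, r, name, rname):
    # A strict subclass on the right gets the first chance (reflected method).
    if type(r) is not type(l) and issubclass(type(r), type(l)):
        l, r, name, rname = r, l, rname, name
    for obj, meth, arg in ((l, name, r), (r, rname, l)):
        op = lookup(obj, meth)
        if op is not None:
            res = op(arg)
            if res is not NotImplemented:
                return res
    raise TypeError("unsupported operand types")


def inplace_op(l, r, iname, lname, rname):
    # Try the in-place method first, then fall back to the binary operator.
    op = lookup(l, iname)
    if op is not None:
        res = op(r)
        if res is not NotImplemented:
            return res
    return binary_op(l, r, lname, rname)


IADD = ("__iadd__", "__add__", "__radd__")

# Model of `d[key()] += [2]`: `d` and `key()` are evaluated once and kept
# in temporaries, like the STORE_TEMP/LOAD_TEMP sequence in the translation.
calls = []


def key():
    calls.append("key")
    return "k"


d = {"k": [1]}
obj, index = d, key()
obj[index] = inplace_op(obj[index], [2], *IADD)
print(d, calls)  # {'k': [1, 2]} ['key']

# Tuples have no __iadd__, so `t += (2,)` falls back to __add__ and rebinds.
t = (1,)
print(inplace_op(t, (2,), *IADD), t)  # (1, 2) (1,)
```

Returning the result, rather than assigning it inside `inplace_op`, keeps the operation independent of the assignment target, which the translation stores separately.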

When `a` is a simple name, `a += b` is translated by:

```python
translate(a)
translate(b)
emit(inplace_op!, ("__iadd__", "__add__", "__radd__"))
translate_store(a)
```

but `a.x += b` is translated as:

```python
translate(a)
emit(STORE_TEMP, 'obj')
emit(LOAD_TEMP, 'obj')
emit(LOAD_ATTR, 'x')
translate(b)
emit(inplace_op!, ("__iadd__", "__add__", "__radd__"))
emit(LOAD_TEMP, 'obj')
emit(STORE_ATTR, 'x')
```

and `a[i] += b` is translated by:

```python
translate(a)
emit(STORE_TEMP, 'obj')
emit(LOAD_TEMP, 'obj')
translate(i)
emit(STORE_TEMP, 'index')
emit(LOAD_TEMP, 'index')
emit(LOAD_SUBSCR)
translate(b)
emit(inplace_op!, ("__iadd__", "__add__", "__radd__"))
emit(LOAD_TEMP, 'obj')
emit(LOAD_TEMP, 'index')
emit(STORE_SUBSCR)
```

where `inplace_op!` is defined as:

```python
def inplace_op!(l, r, iname, lname, rname):
    succ, op = load_special!(l, iname)
    if succ:
        res = op(r)
        if res is not NotImplemented:
            return res
    return binary_op!(l, r, lname, rname)
```

### Compound statements and control flow

#### The `if` statement

```
if test:
    body
```

is translated as:

```
translate(test)
emit(BRANCH, (False, end))
translate(body)
end:
```

```
if test:
    body
else:
    elsebody
```

is translated as:

```
translate(test)
emit(BRANCH, (False, orelse))
translate(body)
emit(JUMP, end)
orelse:
translate(elsebody)
end:
```

#### Control flow stack

In order to correctly translate loops, `try` statements, `with` statements and anything else with complex control flow,
we need to maintain a control flow stack: a stack of the control flow constructs currently being translated.
It exists only at translation time and has no operational equivalent in the abstract machine.
We define an ancillary function, `push_control(kind, data)`, where `data` holds whatever is needed to unwind the construct later, for example the `(loop, end)` labels of a loop or the body of a `finally` clause.
It is used as the context manager of a `with` statement, so the entry stays on the control stack only while the enclosed code is being translated.

#### The `while` statement

```
while test:
    body
```

is translated by:

```
loop:
translate(test)
emit(BRANCH, (False, end))
with push_control("loop", (loop, end)):
    translate(body)
emit(JUMP, loop)
end:
```

The `loop` label precedes the test so that the test is re-evaluated on every iteration, including after a `continue`.

#### The `for` statement

```
for var in seq:
    body
```

is translated as:

```
translate(seq)
emit(GET_ITER)
emit(STORE_TEMP, "iter")
loop:
emit(LOAD_TEMP, "iter")
emit(FOR_ITER, end)
translate_store(var)
with push_control("loop", (loop, end)):
    translate(body)
emit(JUMP, loop)
end:
```

#### The `try` `except` statement

```
try:
    body
except:
    handler
```

is translated as:

```
emit(PUSH_HANDLER, label)
with push_control("try", None):
    translate(body)
emit(POP_HANDLER)
emit(JUMP, end)
label:
emit(SWAP_EXCEPTION)
emit(PUSH_HANDLER, cleanup)
with push_control("handler", None):
    translate(handler)
emit(POP_HANDLER)
emit(SWAP_EXCEPTION)
emit(POP_TOP)
emit(JUMP, end)
cleanup:
emit(ROT_TWO)
emit(SWAP_EXCEPTION)
emit(POP_TOP)
emit(RERAISE)
end:
```

The statement

```
try:
    body
except ex as name:
    handler
```

is converted to:

```
try:
    body
except ex as name:
    handler
except:
    reraise!
```

which is translated as:

```
emit(PUSH_HANDLER, label)
with push_control("try", None):
    translate(body)
emit(POP_HANDLER)
emit(JUMP, end)
label:
emit(DUP_TOP)
emit(SWAP_EXCEPTION)
emit(ROT_TWO)
translate(ex)
emit(EXCEPTION_MATCHES)
emit(BRANCH_ON_FALSE, no_match)
emit(PUSH_HANDLER, cleanup)
translate_store(name)
with push_control("handler", name):
    translate(handler)
pop_handler!
clear_local!(name)
emit(JUMP, end)
cleanup:
emit(ROT_TWO)
emit(SWAP_EXCEPTION)
translate_clear(name)
emit(POP_TOP)
emit(RAISE)
no_match:
emit(RAISE)
end:
```

#### The `try` `finally` statement

```
try:
    body
finally:
    final_body
```

is translated as:

```
emit(PUSH_HANDLER, finally_label)
with push_control("finally", final_body):
    translate(body)
emit(POP_HANDLER)
translate(final_body)
emit(JUMP, end)
finally_label:
translate(final_body)
emit(RAISE)  # re-raise the exception that caused the handler to run
end:
```

#### The `with` statement

Following PEP 343, the statement

```
with cm:
    body
```

is translated as if the code were:

```
$exit_tmp = load_special!(cm, "__exit__")
load_special!(cm, "__enter__")()
try:
    body
except:
    if not $exit_tmp(*sys.exc_info()):
        raise
else:
    $exit_tmp(None, None, None)
```

```
with cm as var:
    body
```

is translated as if the code were:

```
$exit_tmp = load_special!(cm, "__exit__")
$enter_tmp = load_special!(cm, "__enter__")()
try:
    var = $enter_tmp
    body
except:
    if not $exit_tmp(*sys.exc_info()):
        raise
else:
    $exit_tmp(None, None, None)
```

### Continue, break and return statements

The `continue`, `break` and `return` statements transfer control, but care must be taken to ensure that `with` and `finally` statements are handled correctly.
To do so, the control stack must be traversed and the relevant cleanup code emitted for each entry.
For that we define the helper function `emit_control(name, data)`, which emits the code needed to clean up a single control stack entry.

NOTE:
* TO DO -- This needs to be double checked, the "handler" code is probably wrong.

```python
def emit_control(name, data):
    # Emit the code needed to leave the control structure described by (name, data).
    if name == "loop":
        pass
    elif name == "try":
        emit(POP_HANDLER)
    elif name == "finally":
        emit(POP_HANDLER)
        translate(data)
    elif name == "finally_cleanup":
        emit(POP)
        emit(POP_HANDLER)
        emit(SET_EXCEPTION)
    elif name == "handler":
        emit(POP_HANDLER)
        if data is not None:
            emit(POP_HANDLER)
        emit(POP_EXCEPT)
        if data is not None:
            emit(CLEAR_LOCAL, data)
```

### The `break` statement

For a `break` statement, the control stack is unwound, innermost entry first, until a "loop" entry is found.
Then a jump to that loop's exit label is emitted.

```python
for name, data in control_stack:  # innermost entry first
    emit_control(name, data)
    if name == "loop":
        loop_label, exit_label = data
        emit(JUMP, exit_label)
        break
else:
    raise SyntaxError("break not in loop")
```

### The `continue` statement

The `continue` statement is much like `break`, except that it jumps to the start of the loop rather than to its exit.
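Because `continue` unwinds the control stack exactly as `break` does and differs only in its jump target, the two can be modelled together. The runnable sketch below is an illustration under stated assumptions, not the specification: it assumes the control stack is a Python list with the innermost entry last (so it is walked with `reversed`), records emitted instructions as tuples, replaces `translate` with a placeholder, models only the `"loop"`, `"try"` and `"finally"` kinds of `emit_control`, and introduces a hypothetical helper `translate_jump` shared by both statements. `push_control` is modelled as a context manager taking a kind and a data value, matching how it is used in the translations above.

```python
# Executable model of how `break` and `continue` unwind the translation-time
# control stack. Assumptions: the stack is a list with the innermost entry
# last, instructions are recorded as tuples, and `translate` only records a
# placeholder. `translate_jump` is a hypothetical helper.
from contextlib import contextmanager

code = []           # emitted instructions
control_stack = []  # (kind, data) entries, innermost last


def emit(op, arg=None):
    code.append((op,) if arg is None else (op, arg))


@contextmanager
def push_control(kind, data):
    """Keep a control stack entry pushed for the duration of a `with` block."""
    control_stack.append((kind, data))
    try:
        yield
    finally:
        control_stack.pop()


def translate(stmt):
    emit(f"<{stmt}>")  # placeholder for the real translation of `stmt`


def emit_control(kind, data):
    # Subset of emit_control: only "loop", "try" and "finally" are modelled.
    if kind == "try":
        emit("POP_HANDLER")
    elif kind == "finally":
        emit("POP_HANDLER")
        translate(data)


def translate_jump(target, what):
    """Unwind from the innermost entry outwards, then jump to a loop label."""
    for kind, data in reversed(control_stack):
        emit_control(kind, data)
        if kind == "loop":
            loop_label, exit_label = data
            emit("JUMP", exit_label if target == "exit" else loop_label)
            return
    raise SyntaxError(f"'{what}' outside loop")


# Translating `break` inside:
#     while test:
#         try:
#             break
#         finally:
#             cleanup()
with push_control("loop", ("loop", "end")):
    with push_control("finally", "cleanup()"):
        translate_jump("exit", "break")
print(code)  # [('POP_HANDLER',), ('<cleanup()>',), ('JUMP', 'end')]
```

Walking the stack innermost-first is what makes a `break` nested inside several `try`/`finally` statements run each enclosing `finally` body, innermost first, before the jump. In the document's own notation, `continue` is translated as: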

```python
for name, data in control_stack:  # innermost entry first
    emit_control(name, data)
    if name == "loop":
        loop_label, exit_label = data
        emit(JUMP, loop_label)
        break
else:
    raise SyntaxError("continue not in loop")
```

### The `return` statement

For a `return` statement, the return value, already evaluated onto the stack, is saved in a temporary; the whole control stack is then unwound, popping handlers and running any enclosing `finally` bodies, before the value is reloaded and returned.

```python
# Save return value before unwinding
emit(STORE_TEMP, "return_value")
for name, data in control_stack:  # innermost entry first
    emit_control(name, data)
emit(LOAD_TEMP, "return_value")
emit(RETURN)
```
--------------------------------------------------------------------------------