├── .gitignore ├── README.rest ├── _Sidebar.md ├── common-insts.rest ├── hail.rest ├── instruction-set.rest ├── irbuilder.rest ├── memory-model.rest ├── muapi.h ├── native-interface-x64-unix.rest ├── native-interface.rest ├── overview.rest ├── portability.rest ├── scripts ├── extract_comminst_macros.py └── muapiparser.py ├── threads-stacks.rest ├── type-system.rest ├── uvm-client-interface.rest ├── uvm-ir-binary.rest ├── uvm-ir.rest └── uvm-memory.rest /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | *.py[co] 3 | *.swp 4 | .DS_Store 5 | __pycache__ 6 | -------------------------------------------------------------------------------- /README.rest: -------------------------------------------------------------------------------- 1 | ================ 2 | Mu Specification 3 | ================ 4 | 5 | This document aims to provide a detailed description of Mu, a micro virtual 6 | machine, including its architecture, instruction set and type system. 7 | 8 | NOTE: This branch uses the goto-with-values form. The previous branch using 9 | SSA form with PHI nodes is in the `phi 10 | `__ branch. 11 | 12 | Main specification: 13 | 14 | - `Overview `__ 15 | - `Intermediate Representation `__ 16 | - `Intermediate Representation Binary Form (deprecated) `__ 17 | - `Type System `__ 18 | - `Instruction Set `__ 19 | - `Common Instructions `__ 20 | - `Client Interface (a.k.a. The API) `__ 21 | - `Call-based IR Building API (work in progress) `__ 22 | - `Threads and Stacks `__ 23 | - `Memory and Garbage Collection `__ 24 | - `Memory Model `__ 25 | - `(Unsafe) Native Interface `__ 26 | - `Heap Allocation and Initialisation Language (HAIL) `__ 27 | - `Portability and Implementation Advices `__ 28 | 29 | Platform-specific parts: These extends the main specification. The main 30 | specification considers these parts as implementation-specific. 31 | 32 | - `AMD64 Unix Native Interface `__ 33 | 34 | .. vim: tw=80 35 | -------------------------------------------------------------------------------- /_Sidebar.md: -------------------------------------------------------------------------------- 1 | # Contents 2 | 3 | [[_TOC_]] 4 | 5 | Apparently GitHub Wiki does not support the Gollum tag `[[_TOC_]]` that automatically generates the table of content. 6 | 7 | I recommend the following browser add-ons as a workaround: 8 | 9 | * Firefox: [HeadingsMap](https://addons.mozilla.org/en-US/firefox/addon/headingsmap/) 10 | * Chrome: [HTML5 Outliner](https://chrome.google.com/webstore/detail/html5-outliner/afoibpobokebhgfnknfndkgemglggomo) (recommended) 11 | 12 | Other Add-ons/Extensions that may work: 13 | 14 | * Firefox: [Table of Contents](https://addons.mozilla.org/en-US/firefox/addon/table-of-contents/) (width not adjustable) 15 | * Firefox: [HTML5 Outliner](https://addons.mozilla.org/en-US/firefox/addon/html5_outliner/) (no hyperlinks) 16 | * Chrome: [Table-of-contents-crx](https://chrome.google.com/webstore/detail/table-of-contents-crx/eeknhipceeelbgdbcmchicoaoalfdnhi) (not expanding the ToC on start) 17 | 18 | -------------------------------------------------------------------------------- /common-insts.rest: -------------------------------------------------------------------------------- 1 | =================== 2 | Common Instructions 3 | =================== 4 | 5 | This document specifies Common Instructions. 6 | 7 | **Common Instructions** are instructions that have a common format and are 8 | used with the ``COMMINST`` super instruction. They have: 9 | 10 | 1. 
An ID and a name. (This means, they are *identified*. See ``__.) 11 | 2. A flag list. 12 | 3. A type parameter list. 13 | 4. A value parameter list. 14 | 5. An optional exception clause. 15 | 6. A possibly empty (which means optional) keep-alive clause. 16 | 17 | *Common instructions* are a mechanism to extend the Mu IR without adding new 18 | instructions or changing the grammar. 19 | 20 | NOTE: *Common instructions* were named "intrinsic function" in previous 21 | versions of this document. The name was borrowed from the LLVM. However, the 22 | common instructions in Mu are quite different from the usual concept of 23 | intrinsic functions. 24 | 25 | Intrinsic functions usually mean a kind a function that is understood 26 | directly by the compiler. The C function ``memcpy`` is considered an 27 | intrinsic function by some compilers. In JikesRVM, methods of the ``Magic`` 28 | class are a kind of intrinsic functions. They appear like ordinary functions 29 | in the language and bypass all front-end tools including the C parser and 30 | javac, but they are understood by the backend. Their purpose is to perform 31 | tasks that cannot be expressed by the high-level programming language, 32 | including direct raw memory access in Java. 33 | 34 | Common instructions only differ from ordinary Mu instructions in that they 35 | have a common format and are called by the ``COMMINST`` super instruction. 36 | The purpose is to add more instructions to the Mu IR without having to 37 | modify the parser. 38 | 39 | Common instructions are not Mu functions and cannot be called by the 40 | ``CALL`` instruction, nor can it be directly used from the high-level 41 | language that the client implements. The Mu client must understand common 42 | instructions because it is the only source of IR code of Mu. That is to say, 43 | *there is no way any higher-level program can express anything which Mu 44 | knows but the client does not*. For special high-level language functions 45 | that cannot be directly implemented in the high-level programming language, 46 | like the methods in the ``java.lang.Thread`` class, the client must 47 | implement those special high-level language functions in "ordinary" Mu IR 48 | code, which may or may not involve common instructions. For example, 49 | creating a thread is a "magic" in Java, but it is not more special than 50 | executing an instruction (``NEWTHREAD``) in Mu. Some Java libraries require 51 | Mu to make a ``CCALL`` to some C functions which are provided by the JVM, 52 | and they slip under the level of Mu. But Mu and the client always know the 53 | fact that "it call C function" and it is not magic. 54 | 55 | This document uses the following notation:: 56 | 57 | [id]@name [F1 F2 ...] < T1 T2 ... > <[ sig1 sig2 ... ]> ( p1:t1, p2:t2, ... ) excClause KEEPALIVE -> RTs 58 | 59 | - ``id`` is the ID and ``@name`` is the name. 60 | 61 | - ``[F1 F2 ...]`` is a list of flag parameters. 62 | 63 | - ``[T1 T2 ...]`` is a list of type parameters. The users pass types into the 64 | common instruction via this list. 65 | 66 | - ``<[ sig1 sig2 ... ]>`` is a list of function signature parameters. 67 | 68 | - ``(p1:t1, p2:t2, ...)`` is a list of pairs of symbolic name and type. It is 69 | the value parameter list with the type of each parameter. The user only passes 70 | values via this list, and the types are only parts of the documentation. 
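For example (a non-normative sketch; the authoritative grammar of the
``COMMINST`` super instruction is defined in the Instruction Set document, and
``%val``, ``%func`` and ``@foo_sig`` are assumed to be defined elsewhere), the
entries ``@uvm.tr64.from_int`` and ``@uvm.new_stack`` described later in this
document would be used roughly as::

    %tr = COMMINST @uvm.tr64.from_int (%val)             // value parameter list only
    %st = COMMINST @uvm.new_stack <[@foo_sig]> (%func)   // signature list and value parameter list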
71 | 72 | If any of the above list is omitted in this document, it means the respective 73 | common instruction does not take that kind of parameters. 74 | 75 | If ``excClause`` or ``KEEPALIVE`` are present, they mean that the common 76 | instruction accepts exception clause or keepalive clause, respectively. 77 | Otherwise the common instruction does not branch to exception destinations nor 78 | support any keep-alive variables. 79 | 80 | ``RTs`` are the return types. If the return type is omitted, it means it 81 | produces no results (equivalent to ``-> ()``). 82 | 83 | The names of many common instructions are grouped by prefixes, such as 84 | ``@uvm.tr64.``. In this document, their common prefixes may be omitted in their 85 | descriptions when unambiguous. 86 | 87 | Thread and Stack operations 88 | =========================== 89 | 90 | :: 91 | 92 | [0x201]@uvm.new_stack <[sig]> (%func: funcref) -> stackref 93 | 94 | Create a new stack with ``%func`` as the stack-bottom function. ``%func`` must 95 | have signature ``sig``. Returns the stack reference to the new stack. 96 | 97 | The stack-bottom frame is in the state **READY**, where *Ts* are the 98 | parameter types of ``%func``. 99 | 100 | This instruction continues exceptionally if Mu failed to create the stack. The 101 | exception parameter receives NULL. 102 | 103 | :: 104 | 105 | [0x202]@uvm.kill_stack (%s: stackref) 106 | 107 | Destroy the given stack ``%s``. The stack ``%s`` must be in the **READY** state 108 | and will enter the **DEAD** state. 109 | 110 | :: 111 | 112 | [0x203]@uvm.thread_exit 113 | 114 | Stop the current thread and kill the current stack. The current stack will enter 115 | the **DEAD** state. The current thread stops running. 116 | 117 | :: 118 | 119 | [0x204]@uvm.current_stack -> stackref 120 | 121 | Return the current stack. 122 | 123 | :: 124 | 125 | [0x205]@uvm.set_threadlocal (%ref: ref) 126 | 127 | Set the thread-local object reference of the current thread to ``%ref``. 128 | 129 | :: 130 | 131 | [0x206]@uvm.get_threadlocal -> ref 132 | 133 | Return the current thread-local object reference of the current thread. 134 | 135 | 64-bit Tagged Reference 136 | ======================= 137 | 138 | :: 139 | 140 | [0x211]@uvm.tr64.is_fp (%tr: tagref64) -> int<1> 141 | [0x212]@uvm.tr64.is_int (%tr: tagref64) -> int<1> 142 | [0x213]@uvm.tr64.is_ref (%tr: tagref64) -> int<1> 143 | 144 | - ``is_fp`` checks if ``%tr`` holds an FP number. 145 | - ``is_int`` checks if ``%tr`` holds an integer. 146 | - ``is_ref`` checks if ``%tr`` holds a reference. 147 | 148 | Return 1 or 0 for true or false. 149 | 150 | :: 151 | 152 | [0x214]@uvm.tr64.from_fp (%val: double) -> tagref64 153 | [0x215]@uvm.tr64.from_int (%val: int<52>) -> tagref64 154 | [0x216]@uvm.tr64.from_ref (%ref: ref, %tag: int<6>) -> tagref6 155 | 156 | - ``from_fp`` creates a ``tagref64`` value from an FP number ``%val``. 157 | - ``from_int`` creates a ``tagref64`` value from an integer ``%val``. 158 | - ``from_ref`` creates a ``tagref64`` value from a reference ``%ref`` and the 159 | integer tag ``%tag``. 160 | 161 | Return the created ``tagref64`` value. 162 | 163 | 164 | :: 165 | 166 | [0x217]@uvm.tr64.to_fp (%tr: tagref64) -> double 167 | [0x218]@uvm.tr64.to_int (%tr: tagref64) -> int<52> 168 | [0x219]@uvm.tr64.to_ref (%tr: tagref64) -> ref 169 | [0x21a]@uvm.tr64.to_tag (%tr: tagref64) -> int<6> 170 | 171 | - ``to_fp`` returns the FP number held by ``%tr``. 172 | - ``to_int`` returns the integer held by ``%tr``. 
173 | - ``to_ref`` returns the reference held by ``%tr``. 174 | - ``to_tag`` returns the integer tag held by ``%tr`` that accompanies the 175 | reference. 176 | 177 | They have undefined behaviours if ``%tr`` does not hold the value of the 178 | expected type. 179 | 180 | Math Instructions 181 | ================= 182 | 183 | TODO: Should provide enough math functions to support: 184 | 185 | 1. Ordinary arithmetic and logical operations that throw exceptions when 186 | overflow. Example: C# in checked mode, ``java.lang.Math.addOvf`` added in 187 | Java 1.8. 188 | 2. Floating point math functions. Example: trigonometric functions, testing 189 | NaN, fused multiply-add, ... 190 | 191 | It requires some work to decide a complete list of such functions. To work 192 | around the limitations for now, please call native functions in libc or 193 | libm using ``CCALL``. 194 | 195 | Futex Instructions 196 | ================== 197 | 198 | See ``__ for high-level descriptions about Futex. 199 | 200 | Wait 201 | ---- 202 | 203 | :: 204 | 205 | [0x220]@uvm.futex.wait (%loc: iref, %val: T) -> int<32> 206 | [0x221]@uvm.futex.wait_timeout (%loc: iref, %val: T, %timeout: int<64>) -> int<32> 207 | 208 | ``T`` must be an integer type. 209 | 210 | ``wait`` and ``wait_timeout`` verify if the memory location ``%loc`` still 211 | contains the value ``%val`` and then put the current thread to the waiting queue 212 | of memory location ``%loc``. If ``%loc`` does not contain ``%val``, return 213 | immediately. These instructions are atomic. 214 | 215 | - ``wait`` waits indefinitely. 216 | 217 | - ``wait_timeout`` has an extra ``%timeout`` parameter which is a 64-bit 218 | unsigned integer that represents a time in nanoseconds. It specifies the 219 | duration of the wait. 220 | 221 | Both instructions are allowed to spuriously wake up. 222 | 223 | They return a signed integer which indicates the result of this call: 224 | 225 | * 0: the current thread is woken. 226 | * -1: the memory location ``%loc`` does not contain the value ``%val``. 227 | * -2: spurious wakeup. 228 | * -3: timeout during waiting (``wait_timeout`` only). 229 | 230 | Wake 231 | ---- 232 | 233 | :: 234 | 235 | [0x222]@uvm.futex.wake (%loc: iref, %nthread: int<32>) -> int<32> 236 | 237 | ``T`` must be an integer type. 238 | 239 | ``wake`` wakes *N* threads in the waiting queue of the memory location ``%loc``. 240 | This instruction is atomic. 241 | 242 | *N* is the minimum value of ``%nthread`` and the actual number of threads in the 243 | waiting queue of ``%loc``. ``%nthread`` is signed. Negative ``%nthread`` has 244 | undefined behaviour. 245 | 246 | It returns the number of threads woken up. 247 | 248 | Requeue 249 | ------- 250 | 251 | :: 252 | 253 | [0x223]@uvm.futex.cmp_requeue (%loc_src: iref, %loc_dst: iref, %expected: T, %nthread: int<32>) -> int<32> 254 | 255 | ``T`` must be an integer type. 256 | 257 | ``cmp_requeue`` verifies if the memory location ``%loc_src`` still contains the 258 | value ``%expected`` and then wakes up *N* threads from the waiting queue of 259 | ``%loc_src`` and move all other threads in the waiting queue of ``%loc_src`` to 260 | the waiting queue of ``%loc_dst``. If ``%loc_src`` does not contain the value 261 | ``%expected``, return immediately. This instruction is atomic. 262 | 263 | *N* is the minimum value of ``%nthread`` and the actual number of threads in the 264 | waiting queue of ``%loc``. ``%nthread`` is signed. Negative ``%nthread`` has 265 | undefined behaviour. 266 | 267 | It returns a signed integer. 
When the ``%loc_src`` contains the value of 268 | ``%expected``, return the number of threads woken up; otherwise return -1. 269 | 270 | Miscellaneous Instructions 271 | ========================== 272 | 273 | :: 274 | 275 | [0x230]@uvm.kill_dependency (%val: T) -> T 276 | 277 | Return the same value as ``%val``, but ``%val`` does not carry a dependency to 278 | the return value. 279 | 280 | NOTE: This is supposed to free the compiler from keeping dependencies in 281 | some performance-critical cases. 282 | 283 | Native Interface 284 | ================ 285 | 286 | Object pinning 287 | -------------- 288 | 289 | :: 290 | 291 | [0x240]@uvm.native.pin (%opnd: T) -> uptr 292 | [0x241]@uvm.native.unpin (%opnd: T) 293 | 294 | *T* must be ``ref`` or ``iref`` for some U. 295 | 296 | - ``pin`` adds one instance of the reference ``%opnd`` to the pinning multiset 297 | of the current thread. Returns the mapped pointer to the bytes for the memory 298 | location. If *T* is ``ref``, it is equivalent to pinning the memory 299 | location of the whole object (as returned by the ``GETIREF`` instruction). If 300 | *opnd* is ``NULL``, the result is a null pointer whose address is 0. 301 | 302 | - ``unpin`` removes one instance of the reference ``%opnd`` from the pinning 303 | multiset of the current thread. It has undefined behaviour if no such an 304 | instance exists. 305 | 306 | Mu function exposing 307 | -------------------- 308 | 309 | :: 310 | 311 | [0x242]@uvm.native.expose [callconv] <[sig]> (%func: funcref, %cookie: int<64>) -> U 312 | 313 | *callconv* is a platform-specific calling convention flag. *U* is determined by 314 | the calling convention and *sig*. 315 | 316 | ``expose`` exposes a Mu function *func* as a value according to the calling 317 | convention *callConv* with cookie *cookie*. 318 | 319 | Example:: 320 | 321 | .funcdef @foo VERSION ... <@foo_sig> (...) { ... } 322 | 323 | %ev = COMMINST @uvm.native.expose [#DEFAULT] <[@foo_sig]> 324 | 325 | :: 326 | 327 | [0x243]@uvm.native.unexpose [callconv] (%value: U) 328 | 329 | *callconv* is a platform-specific calling convention flag. *U* is determined by 330 | the calling convention. 331 | 332 | ``unexpose`` removes the exposed value. 333 | 334 | :: 335 | 336 | [0x244]@uvm.native.get_cookie () -> int<64> 337 | 338 | If a Mu function is called via its exposed value, this instruction returns the 339 | attached cookie. Otherwise it returns an arbitrary value. 340 | 341 | Metacircular Client Interface 342 | ============================= 343 | 344 | These are additional instructions that enables Mu IR programs to behave like a 345 | client. 346 | 347 | Some types and signatures are pre-defined. They are always available. Note that 348 | the following are not strict text IR syntax because some types are defined in 349 | line:: 350 | 351 | .typedef @uvm.meta.bytes = hybrid int<8>> // ID: 0x260 352 | .typedef @uvm.meta.bytes.r = ref<@uvm.meta.bytes.r> // ID: 0x261 353 | .typedef @uvm.meta.refs = hybrid ref> // ID: 0x262 354 | .typedef @uvm.meta.refs.r = ref<@uvm.meta.refs.r> // ID: 0x263 355 | 356 | .funcsig @uvm.meta.trap_handler.sig = (stackref int<32> ref) -> () // ID: 0x264 357 | 358 | In ``bytes`` and ``refs``, the fixed part is the length of the variable part. 359 | ``bytes`` represents a byte array. ASCII strings are also represented this way. 
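For illustration only (a sketch, assuming the fixed part of ``@uvm.meta.bytes``
is a 64-bit length and its variable part holds ``int<8>`` elements, as
described above; ``$n`` is a hypothetical HAIL name), the ASCII name ``@main``
could be materialised as a ``@uvm.meta.bytes`` object in HAIL notation (see the
HAIL document)::

    .newhybrid $n <@uvm.meta.bytes> 5               // $n is a hypothetical name
    .init $n = {5 {0x40 0x6d 0x61 0x69 0x6e}}       // length 5, then the bytes of "@main"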
360 | 361 | ID/name conversion 362 | ------------------ 363 | 364 | :: 365 | 366 | [0x250]@uvm.meta.id_of (%name: @uvm.meta.bytes.r) -> int<32> 367 | [0x251]@uvm.meta.name_of (%id: int<32>) -> @uvm.meta.bytes.r 368 | 369 | - ``id_of`` converts a textual Mu name ``%name`` to the numerical ID. The name 370 | must be a global name. 371 | 372 | - ``name_of`` converts the ID ``%id`` to its corresponding name. If the name 373 | does not exist, it returns ``NULL``. The returned object must not be modified. 374 | 375 | They have undefined behaviours if the name or the ID in the argument do not 376 | exist, or ``%name`` is ``NULL``. 377 | 378 | Bundle/HAIL loading 379 | ------------------- 380 | 381 | :: 382 | 383 | [0x252]@uvm.meta.load_bundle (%buf: @uvm.meta.bytes.r) 384 | [0x253]@uvm.meta.load_hail (%buf: @uvm.meta.bytes.r) 385 | 386 | ``load_bundle`` and ``load_hail`` loads Mu IR bundles and HAIL scripts, 387 | respectively. ``%buf`` is the content. 388 | 389 | TODO: These comminsts should be made optional, and the IR Builder API should 390 | be provided as comminsts, too. 391 | 392 | Stack introspection 393 | ------------------- 394 | 395 | :: 396 | 397 | [0x254]@uvm.meta.new_cursor (%stack: stackref) -> framecursorref 398 | [0x255]@uvm.meta.next_frame (%cursor: framecursorref) 399 | [0x256]@uvm.meta.copy_cursor (%cursor: framecursorref) -> framecursorref 400 | [0x257]@uvm.meta.close_cursor (%cursor: framecursorref) 401 | 402 | In all cases, ``cursor`` and ``stack`` cannot be ``NULL``. 403 | 404 | - ``new_cursor`` allocates a frame cursor, referring to the top frame of 405 | ``%stack``. Returns the frame cursor reference. 406 | 407 | - ``next_frame`` moves the frame cursor so that it refers to the frame below its 408 | current frame. 409 | 410 | - ``copy_cursor`` allocates a frame cursor which refers to the same frame as 411 | ``%cursor``. Returns the frame cursor reference. 412 | 413 | - ``close_cursor`` deallocates the cursor. 414 | 415 | :: 416 | 417 | [0x258]@uvm.meta.cur_func (%cursor: framecursorref) -> int<32> 418 | [0x259]@uvm.meta.cur_func_Ver (%cursor: framecursorref) -> int<32> 419 | [0x25a]@uvm.meta.cur_inst (%cursor: framecursorref) -> int<32> 420 | [0x25b]@uvm.meta.dump_keepalives (%cursor: framecursorref) -> @uvm.meta.refs.r 421 | 422 | These functions operate on the frame referred by ``%cursor``. In all cases, 423 | ``%cursor`` cannot be ``NULL``. 424 | 425 | - ``cur_func`` returns the ID of the frame. Returns 0 if the frame is native. 426 | 427 | - ``cur_func_ver`` returns the ID of the current function version of the frame. 428 | Returns 0 if the frame is native, or the function of the frame is undefined. 429 | 430 | - ``cur_inst`` returns the ID of the current instruction of the frame. Returns 0 431 | if the frame is just created, its function is undefined, or the frame is 432 | native. 433 | 434 | - ``dump_keepalives`` dumps the values of the keep-alive variables of the 435 | current instruction in the frame. If the function is undefined, the arguments 436 | are the keep-alive variables. Cannot be used on native frames. The return 437 | value is a list of object references, each of which refers to an object which 438 | has type *T* and contains value *v*, where *T* and *v* are the type and the 439 | value of the corresponding keep-alive variable, respectively. 
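As a non-normative sketch (the exact ``COMMINST`` text syntax is defined in the
Instruction Set document, and ``%stack`` is assumed to be a valid stack
reference obtained elsewhere, e.g. inside a trap handler), a client-like Mu
program could inspect the top two frames of a stack as follows::

    %cur = COMMINST @uvm.meta.new_cursor (%stack)       // cursor referring to the top frame
    %fid = COMMINST @uvm.meta.cur_func (%cur)            // ID of that frame's function, 0 if native
    COMMINST @uvm.meta.next_frame (%cur)                  // move down to the frame below
    %kas = COMMINST @uvm.meta.dump_keepalives (%cur)      // keep-alive values of that frame
    COMMINST @uvm.meta.close_cursor (%cur)                // cursors must be closed when no longer needed

This mirrors the cursor-based stack introspection functions of the C-based
client API, but with statically typed arguments (see "Notes about dynamism"
below).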
440 | 441 | On-stack replacement 442 | -------------------- 443 | 444 | :: 445 | 446 | [0x25c]@uvm.meta.pop_frames_to (%cursor: framecursorref) 447 | [0x25d]@uvm.meta.push_frame <[sig]> (%stack: stackref, %func: funcref) 448 | 449 | ``%cursor``, ``%stack`` and ``%func`` must not be ``NULL``. 450 | 451 | - ``pop_frames_to`` pops all frames above ``%cursor``. 452 | 453 | - ``push_frame`` creates a new frame on top of the stack ``%stack`` for the 454 | current version of the Mu function ``%func``. ``%func`` must have the 455 | signature ``sig``. 456 | 457 | Watchpoint operations 458 | --------------------- 459 | 460 | :: 461 | 462 | [0x25e]@uvm.meta.enable_watchpoint (%wpid: int<32>) 463 | [0x25f]@uvm.meta.disable_watchpoint (%wpid: int<32>) 464 | 465 | - ``enable_watchpoint`` enables all watchpoints of watchpoint ID ``%wpid``. 466 | - ``disenable_watchpoint`` disables all watchpoints of watchpoint ID ``%wpid``. 467 | 468 | Trap handling 469 | ------------- 470 | 471 | :: 472 | 473 | [0x260]@uvm.meta.set_trap_handler (%handler: funcref<@uvm.meta.trap_handler.sig>, %userdata: ref) 474 | 475 | This instruction registers a trap handler. ``%handler`` is the function to be 476 | called and ``%userdata`` will be their last argument when called. 477 | 478 | This instruction overrides the trap handler registered via the C-based client 479 | API. 480 | 481 | A trap handler takes three parameters: 482 | 483 | 1. The stack where the trap takes place. 484 | 2. The watchpoint ID, or 0 if triggered by the ``TRAP`` instruction. 485 | 3. The user data, which is provided when registering. 486 | 487 | A trap handler is run by the same Mu thread that caused the trap and is executed 488 | on a new stack. 489 | 490 | A trap handler *usually* terminates by either executing the ``@uvm.thread_exit`` 491 | instruction (probably also kill the old stack before exiting), or ``SWAPSTACK`` 492 | back to another stack while killing the stack the trap handler was running on. 493 | 494 | Notes about dynamism 495 | -------------------- 496 | 497 | These additional instructions are not dynamic. Unlike the C-based API, these 498 | instructions do not use handles. Arguments, such as the additional arguments of 499 | ``push_frame`` are also statically typed. If the client needs dynamically typed 500 | handles, it can always make its own. For example, ``push_frame`` can be wrapped 501 | by a Mu function which takes a dynamic argument list, checks the argument types, 502 | and executes a static ``@uvm.meta.push_frame`` instruction on the unboxed 503 | values. 504 | 505 | Some dynamic lookups, such as looking up constants by ID, are not available, 506 | either. It can be worked around by maintaining a ``HashMap`` (in the 507 | form of Mu IR programs) which is updated with each bundle loading. In other 508 | words, if the client does not maintain such a map, Mu will have to maintain it 509 | for the client. 510 | 511 | .. vim: tw=80 512 | -------------------------------------------------------------------------------- /hail.rest: -------------------------------------------------------------------------------- 1 | =========================================== 2 | Heap Allocation and Initialisation Language 3 | =========================================== 4 | 5 | **HAIL may not be the best tool**. The most efficient way to initialise a micro 6 | VM is by building a boot image (this is implementation-specific). 
The most 7 | efficient way to load objects from a serialised file is to build the 8 | de-serialiser (such as the ".class" file parser) in Mu IR. 9 | 10 | The Heap Allocation and Initialisation Language (HAIL) is a Mu IR-like language 11 | that allocates heap objects and initialise Mu memory locations with values. 12 | 13 | It is designed to initialise language-specific objects, such as class 14 | meta-objects (e.g. the ``java.lang.Class`` objects and the virtual tables in JVM 15 | created during class loading), heap-allocated constant string objects, and 16 | language-level constants which are implemented as Mu-level global cells (because 17 | Mu does not allow "constant object references"). 18 | 19 | HAIL should be faster than initialising the memory via the client API, and more 20 | space-efficient than a naively implemented Mu function which creates and 21 | initialises objects by executing a (usually *very* long) sequence of ``NEW`` and 22 | ``STORE`` instructions. But keep in mind that is not the only efficient method. 23 | A well-written Mu program can read from a serialised file (e.g. the ".class" 24 | file in Java) and interpret it in a similar way the Mu micro VM interprets the 25 | HAIL script. The client can also rely on object pinning and initialise objects 26 | via pointers, bypassing the handle-based API. 27 | 28 | A **HAIL script** has a text format and a binary format. The text format is 29 | similar to the text-based Mu IR, and the binary format is similar to the binary 30 | form Mu IR. 31 | 32 | This document uses EBNF to define the text-form syntax. A non-terminal starts 33 | with a capital letter and a terminal starts with a lower-case letter. Literal 34 | characters are quoted within pairs of ``'`` or ``"``. ``*`` means repeating 0 or 35 | more times. ``+`` means repeating 1 or more times. ``?`` means optional. ``|`` 36 | means either its left-hand-side or its right-hand-side. ``|`` has the lowest 37 | precedence than simple concatenation; unary suffixes have the highest 38 | precedence. ``(`` and ``)`` group terms together to override the precedence. 39 | ``[`` and ``]`` denote a set of characters. 40 | 41 | Lexical Structures 42 | ================== 43 | 44 | Comments start with two slashes ``//`` and ends at the end of the line. White 45 | spaces between lexicons are ignored. 46 | 47 | In HAIL, a **HAIL name**, i.e. the name of a heap-allocated object, start with a 48 | dollar sign ``$`` followed by one or more characters in the set: 49 | ``[0-9a-zA-Z_-.]``:: 50 | 51 | hailName ::= '$' [0-9a-zA-Z_-.]+ 52 | 53 | The scope of a HAIL name is within a single HAIL script. In other words, they 54 | are temporary, and they become invalid as soon as the HAIL script is fully 55 | evaluated. Storing them to global cells is one way to keep references to the 56 | allocated objects. 57 | 58 | **Global name**, **integer literal**, **floating point literal** and **null 59 | literal** are defined the same way as in `Mu IR `__. They are 60 | denoted as ``globalName``, ``intLit``, ``floatLit``, ``'NULL'``, respectively. 61 | 62 | Expressions 63 | =========== 64 | 65 | An **LValue** specifies a memory location:: 66 | 67 | LValue ::= Name Index* 68 | 69 | Name ::= globalName | hailName 70 | 71 | Index ::= '[' IntExpr ']' 72 | 73 | IntExpr ::= intLit | globalName 74 | 75 | It is either a component of a global cell (when ``Name`` is a ``globalName``), 76 | or a component of a newly allocated heap object in the current HAIL script (when 77 | ``Name`` is a ``hailName``). 
If the name appears alone, the memory location is 78 | the global cell or the heap object itself. Its fields or elements can be 79 | selected using indices. The index can be an integer literal (``intLit``) or a 80 | global name of a Mu constant of ``int`` type of any ``n``, treated as 81 | unsigned integer. 82 | 83 | If a memory location ``l`` holds a struct or hybrid, then ``l[n]`` is its n-th 84 | field (n = 0, 1, 2...). Specifically, for hybrid, it means the n-th field in the 85 | fixed part. If n equals the number of fixed-part fields, it selects the variable 86 | part. In such cases, ``l[n][m]`` is the m-th element of the variable part. 87 | 88 | If a memory location ``l`` holds an array or a vector, then ``l[n]`` is its n-th 89 | element (n = 0, 1, 2...). 90 | 91 | An **RValue** specifies a Mu value:: 92 | 93 | RValue ::= globalName | intLit | floatLit | ``'NULL'`` 94 | | hailName | '&' LValue | '*' LValue | List 95 | 96 | List ::= '{' RValue* '}' 97 | 98 | It can be the address of an LValue, in which case the value is one of: 99 | 100 | - A Mu global SSA variable (constant, global cell, function, exposed function) 101 | (``globalName``) 102 | - An integer literal (``intLit``) 103 | - A floating point literal (``floatLit``) 104 | - The ``NULL`` value of an appropriate type (``'NULL'``) 105 | - An object reference to an object just created in HAIL (``hailName``) 106 | - An internal reference of an LValue (``'&' LValue``) 107 | - The current value held at the memory location of an LValue (``'*' LValue``) 108 | - A list of 0 or more RValue. 109 | 110 | See *memory initialisation* below for more details. 111 | 112 | Top-level Definitions 113 | ===================== 114 | 115 | Top-level definitions in HAIL include **fixed object allocation**, **hybrid 116 | allocation** and **memory initialisation**. All object allocations are evaluated 117 | before any memory initialisation definitions are evaluated. 118 | 119 | A *fixed object allocation* allocates a fixed-size object:: 120 | 121 | FixedAlloc ::= '.new' hailName '<' Type '>' 122 | 123 | Type ::= globalName 124 | 125 | 126 | where ``hailName`` is a HAIL name of the allocated object, and ``Type`` is the 127 | global name of the type of the object. ``Type`` must not be a ``hybrid`` type. 128 | 129 | A *hybrid allocation* allocates a hybrid:: 130 | 131 | HybridAlloc ::= '.newhybrid' hailName '<' Type '>' IntExpr 132 | 133 | where ``hailName`` and ``Type`` are the name and the type. ``IntExpr`` specifies 134 | the length of the variable part of the hybrid, which can either be an integer 135 | literal, or a Mu ``int`` constant of any ``n``, treated as unsigned integer. 136 | ``Type`` must be a ``hybrid`` type. 137 | 138 | A *memory initialisation* initialises a memory location:: 139 | 140 | MemInit ::= '.init' LValue = RValue 141 | 142 | ``RValue`` must be appropriate for the ``LValue`` type. Specifically, the star 143 | notation ``*LValue`` copies the value from the memory location of the ``LValue`` 144 | after the ``*``. It is applicable to all types as long as the type matches the 145 | ``LValue`` being written to. In addition: 146 | 147 | - If ``LValue`` is ``int``, ``uptr`` or ``ufuncptr``, then ``RValue`` 148 | can be an ``intLit``, a constant of the same ``LValue`` type. 149 | 150 | - If ``LValue`` is ``float`` or ``double``, then ``RValue`` can be a 151 | ``floatLit`` or a Mu constant of the same type as ``LValue``. 
152 | 153 | - If ``LValue`` is ``ref`` or ``weakref``, then ``RValue`` can be ``NULL`` 154 | or a HAIL name of a newly created object. 155 | 156 | - If ``LValue`` is ``iref``, then ``RValue`` can be ``NULL``, the global name 157 | of a global cell, or an ``LValue`` (with the ``&`` sign). Implicit ``REFCAST`` 158 | applies. 159 | 160 | - If ``LValue`` is ``funcref``, then ``RValue`` can be ``NULL`` or the 161 | global name of a Mu function. Implicit ``REFCAST`` applies. 162 | 163 | - If ``LValue`` is ``stackref`` or ``threadref``, then the only applicable 164 | ``RValue`` is ``NULL``. 165 | 166 | - If ``LValue`` is ``tagref64``, then ``RValue`` can be the appropriate value 167 | suitable for ``double``, ``int<52>`` or ``struct int<6>>``. 168 | 169 | - If ``LValue`` is a struct, hybrid, array or vector, then ``RValue`` must be a 170 | ``List`` of ``RValue`` items. Each item will initialise a field or element of 171 | the composite type. The entire variable part of a hybrid is treated as one 172 | additional field to its fixed part fields, and is treated as an array of the 173 | actual length. The list may have less fields/elements of the ``LValue``, in 174 | which case only the first fields/elements are initialised, and others remain 175 | their old values. (Note: All newly allocated memory locations, including heap 176 | objects, stack cells and global cells, have initial values: 0 or NULL.) 177 | 178 | When assigning to an LValue of ``ref``, ``weakref``, ``iref``, 179 | ``funcref``, ``uptr`` or ``ufuncptr`` types, if the RValue only 180 | differs in the ``T`` or ``sig`` parameter, then implicit ``REFCAST`` or 181 | ``PTRCAST`` are applied. ``weakref`` and ``ref`` can be assigned to each other. 182 | ``PTRCAST`` can only change the type/sig parameters ``T`` and ``sig``, but not 183 | the base type ``int``, ``uptr`` and ``ufuncptr``. (Note: This makes sub-class 184 | instances assignable to a location that refers to a super-class instance.) 185 | 186 | Multiple top-level definitions are applied in the order they appear in the HAIL 187 | script. In order to deal with cyclic references, it is advisable to put ``.new`` 188 | and ``.newhybrid`` before ``.init``. 189 | 190 | Memory Order 191 | ============ 192 | 193 | All loads (via ``*LValue``) and stores (via ``.init``) are non-atomic. In 194 | ``.init``, it has undefined behaviour if any values in the ``LValue`` in the 195 | left-hand-side of the ``=`` is accessed by the right-hand-side ``RValue``. 196 | 197 | NOTE: This is to say, don't load from the memory location being initialised 198 | because the implementation may write into the left-hand-side in any order. 199 | 200 | Example 201 | ======= 202 | 203 | Example 1:: 204 | 205 | // Assume the following definitions in a previously loaded Mu IR bundle. 
206 | // .typedef @i8 = int<8> 207 | // .typedef @i32 = int<32> 208 | // .typedef @i64 = int<64> 209 | // .typedef @float = float 210 | // .typedef @double = double 211 | // 212 | // .typedef @NakedArray = hybrid<@i8> // no fields in the fixed part 213 | // .typedef @LengthedArray = hybrid<@i32 @i8> // no fields in the fixed part 214 | // .typedef @JavaStyleArray = hybrid<@TID @i32 @i8> // two fields in the fiexed part 215 | // 216 | // .typedef @TID = int<64> 217 | // .typedef @SmallFloatArray = array<@float 4> 218 | // .typedef @irefi64 = iref<@i64> 219 | // .typedef @Object = struct<@TID @SmallFloatArray @double @irefi64> 220 | // 221 | // .typedef @vtable = hybrid<...> 222 | // .typedef @vtable_r = ref<@vtable> 223 | // .global @g_vtable <@vtable_r> 224 | // 225 | // .typedef @Object2 = struct<@vtable_r @i64> 226 | // 227 | // .typedef @LinkedList = struct<@i64 @LinkedList_r> 228 | // .typedef @LinkedList_r = ref<@LinkedList> 229 | // 230 | // .typedef @irefi64 = iref<@i64> 231 | // 232 | // .const @MAGICAL_NUMBER <@i64> = 42 233 | // .const @PI <@double> = 3.14d 234 | // 235 | // .global @my_global <@i64> 236 | // .global @a_global_iref_cell <@irefi64> 237 | // .global @another_global_iref_cell <@irefi64> 238 | // 239 | // .global @my_favourite_linked_list_node <@LinkedList> 240 | // 241 | 242 | 243 | .new $my_long_obj <@i64> 244 | .init $my_long_obj = 0x123456789abcdef0 245 | 246 | .newhybrid $my_array1 <@NakedArray> 4 247 | .newhybrid $my_array2 <@LengthedArray> 10000 248 | .newhybrid $my_array3 <@JavaStyleArray> @MAGICAL_NUMBER 249 | 250 | .init $my_array1 = {1 2 3 4} 251 | 252 | .init $my_array2 = {100 // claim length 100, while the capacity is really 10000 253 | {0 1 2 3 4}} // Only init 5 elems 254 | .init $my_array2[1][99] = 99 // Also init the 99-th elem 255 | 256 | .init $my_array3 = {1001 42 {1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 257 | 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 258 | 7 8 9 0 1 2}} 259 | 260 | .new $my_obj <@Object> 261 | 262 | .init $my_obj = {@MAGICAL_NUMBER {1.0f 2.0f 3.0f 4.0f} @PI @my_global} 263 | 264 | .new $my_obj2 <@Object2> 265 | 266 | // This object has a pointer to an existing v-table allocated before loading 267 | // this HAIL script. A reference to the v-table is held in the global cell @g_vtable. 268 | // The star notation *@g_vtable loads the value from an LValue and assign it 269 | // to the field. 270 | .init $my_obj2 = {*@g_vtable 42} 271 | 272 | .new $node0 <@LinkedList> 273 | .new $node1 <@LinkedList> 274 | .new $node2 <@LinkedList> 275 | 276 | .init $node0 = {0 $node1} // All objects are allocated before init. 277 | .init $node1 = {1 $node0} // so they can form a ring 278 | 279 | .init $node2 = {2 NULL} // Isolated node 280 | 281 | // Global cells can be initialised, too. 282 | .init @my_global = -1 283 | 284 | // @a_global_iref_cell will hold an iref<@i64> to the global cell @my_global 285 | .init @a_global_iref_cell = &@myglobal 286 | 287 | // Equivalent. The global variable @myglobal is already an iref. 288 | .init @a_global_iref_cell = @myglobal 289 | 290 | .new $foo <@i64> 291 | 292 | // This refers into the gut of $foo 293 | .init @another_global_iref_cell = &$foo 294 | 295 | // In fact, all objects except $node0 and $node1 will become garbages after 296 | // this HAIL script is fully evaluated. 297 | .init @my_favourite_linked_list_node = $node0 298 | 299 | Example 2: String constant initialisation. 
In order to keep references to these 300 | objects, we need to store them to global cells:: 301 | 302 | // Assume the following Java code: 303 | // System.out.println("Hello world!"); 304 | // 305 | // We want to create a String object for the string literal "Hello world!". 306 | // In a real JVM, more strings would be created for class names and method 307 | // names for reflection. 308 | // 309 | // We assume the Java class loader defines the String like this: 310 | // .typedef @RefString = ref <@String> 311 | // .typedef @String = struct <@TID @RefCharArray @i32 @i32> // tid, buf, begin, size 312 | // .typedef @RefCharArray = ref <@CharArray> 313 | // .typedef @CharArray = hybrid <@ArrayHeader @i16> // header, elements 314 | // .typedef @ArrayHeader = struct <@TID @i32> // tid, length 315 | // 316 | // It makes a global cell to store a reference to the String: 317 | // .global @const_hello_world <@RefString> 318 | // 319 | // Then we can create and initialise the string in HAIL: 320 | 321 | .new $hw <@String> // The String object 322 | .newhybrid $hwbuf <@CharArray> 12 // The underlying array 323 | 324 | .init $hw = {0xabcd $hwbuf 0 12} 325 | .init $hwbuf = {{0x1234 12} {0x48 0x65 0x6c 0x6c 0x6f 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21}} 326 | 327 | .init @const_hello_world = $hw // Store it to the global cell. 328 | 329 | // Then out.println("Hello world!") can be compiled to: 330 | // %1 = LOAD <@RefString> @const_hello_world 331 | // CALL <@sig1> @PrintStream.println (%out %1) // in the real world it may need dynamic dispatching 332 | 333 | Binary Form 334 | =========== 335 | 336 | A binary HAIL script starts with a 4-byte magic '\x7f' 'H' 'A' 'I', or 0x7f 0x48 337 | 0x41 0x49. 338 | 339 | HAIL IDs are the counterpart of HAIL names. HAIL IDs are 32-bit integers. 0 is 340 | an invalid HAIL ID. HAIL ID has a different namespace from Mu IDs, i.e. they 341 | refer to different things even if their values are equal. HAIL IDs only refer to 342 | heap-allocated objects in the current HAIL script. 343 | 344 | In the following paragraphs, binary types defined in `Mu IR Binary Form 345 | `__ are used. For convenience, we use "hID" for HAIL ID and "mID" 346 | for Mu ID. 347 | 348 | A *fixed object allocation* definition has the form: 349 | 350 | +------+-----+------+ 351 | | opct | idt | idt | 352 | +======+=====+======+ 353 | | 0x01 | hID | type | 354 | +------+-----+------+ 355 | 356 | *hID* is the HAIL ID of the object. *type* is the Mu ID of the type. 357 | 358 | A *variable-length object allocation* definition has the form: 359 | 360 | +------+-----+------+--------+ 361 | | opct | idt | idt | i64 | 362 | +======+=====+======+========+ 363 | | 0x02 | hID | type | length | 364 | +------+-----+------+--------+ 365 | 366 | *hID* is the HAIL ID of the object. *type* is the Mu ID of the type. *length* is 367 | the length of the variable part. 368 | 369 | A *memory initialisation* definition has the form: 370 | 371 | +------+--------+--------+ 372 | | opct | LValue | RValue | 373 | +======+========+========+ 374 | | 0x03 | LValue | RValue | 375 | +------+--------+--------+ 376 | 377 | LValue: 378 | 379 | +------+------+---------+-----+ 380 | | Name | lent | IntExpr | ... | 381 | +======+======+=========+=====+ 382 | | Name | n | IntExpr | ... | 383 | +------+------+---------+-----+ 384 | 385 | *n* is the number of *IntExpr* following. 
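For example (an illustrative sketch only: it uses the *Name* and *intLit*
encodings defined below, and the HAIL ID 7 is made up), the textual LValue
``$my_array2[1][99]`` from Example 1 could be encoded as::

    0x04 7      // Name: tag 0x04 (a HAIL ID), hID = 7 standing for $my_array2
    2           // lent: two indices follow
    0x12 1      // IntExpr: intLit 1
    0x12 99     // IntExpr: intLit 99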
386 | 387 | Name: 388 | 389 | +------+-----+ 390 | | opct | idt | 391 | +======+=====+ 392 | | tag | id | 393 | +------+-----+ 394 | 395 | If *tag* is 0x04, *id* is the HAIL ID; if *tag* is 0x05, *id* is the Mu ID. 396 | 397 | IntExpr can be intLit or a global name (Name with tag=2) 398 | 399 | intLit: 400 | 401 | +------+-----+ 402 | | opct | i64 | 403 | +======+=====+ 404 | | 0x12 | lit | 405 | +------+-----+ 406 | 407 | *lit* is the literal. There is currently no way to express integer literals 408 | longer than 64 bits. *lit* is truncated or zero-extended to the LValue length. 409 | 410 | *RValue* can be one of the following: 411 | 412 | 1. A Mu global SSA variable: 413 | 414 | +------+-----+ 415 | | opct | idt | 416 | +======+=====+ 417 | | 0x11 | gv | 418 | +------+-----+ 419 | 420 | gv it the ID of the global SSA variable. 421 | 422 | 2. An integer literal (see intLit above) 423 | 424 | 3. A 32-bit float literal 425 | 426 | +------+-------+ 427 | | opct | float | 428 | +======+=======+ 429 | | 0x13 | value | 430 | +------+-------+ 431 | 432 | 4. A 64-bit float literal 433 | 434 | +------+--------+ 435 | | opct | double | 436 | +======+========+ 437 | | 0x14 | value | 438 | +------+--------+ 439 | 440 | 5. A ``NULL`` literal 441 | 442 | +------+ 443 | | opct | 444 | +======+ 445 | | 0x15 | 446 | +------+ 447 | 448 | 6. An object reference to an object allocated in HAIL 449 | 450 | +------+-----+ 451 | | opct | idt | 452 | +======+=====+ 453 | | 0x16 | id | 454 | +------+-----+ 455 | 456 | *id* is the HAIL id. 457 | 458 | 7. An internal reference of an LValue 459 | 460 | +------+--------+ 461 | | opct | LValue | 462 | +======+========+ 463 | | 0x17 | LValue | 464 | +------+--------+ 465 | 466 | 8. A list of other values of any kinds. 467 | 468 | +------+--------+--------+--------+--------+ 469 | | opct | i64 | RValue | RValue | ... | 470 | +======+========+========+========+========+ 471 | | 0x18 | nelems | rv1 | rv2 | ... | 472 | +------+--------+--------+--------+--------+ 473 | 474 | *nelems* is the number of RValues following it. This structure is recursive. 475 | 476 | 9. A list of other values of the same kind of literals. 477 | 478 | +------+--------+------+---------+---------+--------+ 479 | | opct | i64 | opct | literal | literal | ... | 480 | +======+========+======+=========+=========+========+ 481 | | 0x19 | nelems | kind | lit1 | lit2 | ... | 482 | +------+--------+------+---------+---------+--------+ 483 | 484 | *nelems* is the number of literals following. *kind* can be one in the following 485 | table, and *kind* determines the following *literal* element type. 486 | 487 | =========== ============== 488 | opct literal type 489 | ----------- -------------- 490 | 0x12 i64 491 | 0x13 float 492 | 0x14 double 493 | 0x1a i8 494 | 0x1b i16 495 | 0x1c i32 496 | =========== ============== 497 | 498 | This allows more compact encoding of large arrays of simple elements. The 499 | literal type, however, does not need to match the actual type of the LValue, 500 | because implicit truncating or zero-extension always happen. 501 | 502 | Future Work 503 | =========== 504 | 505 | The binary format is not the most efficient format possible. HAIL is still an 506 | interpreted format, even when it is binary. It is designed to be convenient, 507 | reasonably efficient and platform-independent. 508 | 509 | There could be implementation-specific ways of serialising data faster than this 510 | portable interface. 511 | 512 | The native interface can also potentially outperform HAIL. 
Using object pinning 513 | and pointers, Mu IR programs can directly memcpy data from files, such as 514 | copying strings from ``.class`` files. 515 | 516 | .. vim: tw=80 517 | -------------------------------------------------------------------------------- /memory-model.rest: -------------------------------------------------------------------------------- 1 | ============ 2 | Memory Model 3 | ============ 4 | 5 | The Mu memory model basically follows the C11 memory model with a few 6 | modifications to make it suitable for Mu. 7 | 8 | Overview 9 | ======== 10 | 11 | Mu does not enforce any strong order, but trusts the client to correctly use the 12 | atomic and ordering mechanisms provided by Mu. Many choices of ordering, from 13 | relaxed to sequentially consistent, on each memory operation are given to the 14 | client. The client has the freedom to make its choice and has the responsibility 15 | to synchronise their multi-threaded programs. 16 | 17 | The most restricted form of memory accesses are sequentially consistent. Mu 18 | guarantees that they are atomic. They follow the well-known acquire-release 19 | memory model and there is a total order of all such memory accesses in all 20 | threads. 21 | 22 | A less-restricted form is acquire and release. It does not have the total 23 | order provided by sequential consistency, but atomicity and the synchronise-with 24 | relationship between release and acquire operations are provided. 25 | 26 | The "consume" order exploits the fact that some processors with relaxed memory 27 | order can figure out the dependencies between load operations in the hardware 28 | and will not reorder them even without memory fences. It can achieve some 29 | synchronisation requirements more efficiently than the "acquire" order. 30 | 31 | The "relaxed" order only guarantees atomicity but does not enforce any order. 32 | The most unrestricted form of memory access is not atomic. These operations 33 | allows the Mu implementation and the processor to maximise the throughput while 34 | relying on the programmer to correctly synchronise their programs. 35 | 36 | Notable differences from C11 37 | ---------------------------- 38 | 39 | The program order in Mu is a total order while the sequenced-before relationship 40 | in C is a partial order, because there are unspecified order of evaluations in 41 | C. 42 | 43 | There is no "atomic type" in Mu. Only operations make a difference between 44 | atomic and non-atomic accesses. Using both atomic and non-atomic operations on 45 | the same memory location is an undefined behaviour in Mu. 46 | 47 | The primitives for atomic accesses and fences are provided by the instruction 48 | set of Mu rather than the library. Mutex locks, however, have to be implemented 49 | on top of this memory model. 50 | 51 | Notable differences from LLVM 52 | ----------------------------- 53 | 54 | LLVM does not guarantee that all atomic writes to a memory location has a total 55 | order unless using ``monotonic`` or stronger memory order. It provides an 56 | ``unordered`` order which is atomic but not "monotonic". The ``unordered`` order 57 | is intended to support the Java memory model, but whether ``unordered`` is 58 | necessary and whether the ``relaxed`` order in C11 or the ``monotonic`` in LLVM 59 | is suitable for Java is not yet known. 
60 | 61 | In LLVM, an atomic operation can be labelled ``singlethread``, in which case it 62 | only synchronises with or participates in modification and ``seq_cst`` total 63 | orderings with other operations running in the same thread (for example, in 64 | signal handlers). C11 provides ``atomic_signal_fence`` for similar purposes. 65 | 66 | Concepts 67 | ======== 68 | 69 | data value 70 | See `Type System `__ 71 | 72 | SSA variable, instruction and evaluation 73 | See `Instruction Set `__ 74 | 75 | memory, initial value, load, store, access and conflict 76 | See `Mu and the Memory `__ 77 | 78 | thread 79 | A thread is the unit of CPU scheduling. In this memory model, threads 80 | include but are not limited to Mu threads. See `Threads and Stacks `__ for the 81 | definition of Mu threads. 82 | 83 | stack, stack binding, stack unbinding, swap-stack 84 | See `Threads and Stacks `__ 85 | 86 | futex, futex_wait, futex_wake 87 | See `Threads and Stacks `__ 88 | 89 | Comparison of Terminology 90 | ------------------------- 91 | 92 | The following table is a approximate comparison and may not strictly apply. 93 | 94 | =================== ================================ 95 | C Mu 96 | =================== ================================ 97 | value data value 98 | expression SSA variable 99 | object memory location 100 | memory location memory location of scalar type 101 | (N/A) object 102 | read load 103 | modify store 104 | =================== ================================ 105 | 106 | Operations 107 | ========== 108 | 109 | Operations include (but are not limited to) the following: 110 | 111 | load 112 | A memory load. May be atomic or not. 113 | 114 | store 115 | A memory store. May be atomic or not. 116 | 117 | atomic read-modify-write 118 | A load and (maybe conditionally) a store as one atomic action. It may 119 | contain both a load and a store operation, but may have special atomic 120 | properties. 121 | 122 | fence 123 | A fence introduces memory orders. 124 | 125 | stack binding 126 | Binding a thread to a stack. 127 | 128 | stack unbinding 129 | Unbinding a thread from a stack. 130 | 131 | swap-stack 132 | Unbinding a thread from a stack and bind that thread to another stack. 133 | 134 | futex wait 135 | Waiting on a memory location. 136 | 137 | futex wake 138 | Wake up threads waiting on a memory location. 139 | 140 | external operation 141 | Any other operation that may affect the state outside Mu. 142 | 143 | .. 144 | 145 | NOTE: Unlike the Java Memory Model, Mu memory model does not contain locks. 146 | 147 | Memory Operations 148 | ================= 149 | 150 | Some instructions and API functions perform memory operations. Specifically, 151 | 152 | - The ``LOAD`` instruction and the ``load`` API function perform a load 153 | operation. 154 | - The ``STORE`` instruction and the ``store`` API function perform a store 155 | operation. 156 | - The ``CMPXCHG`` instruction and the ``cmpxchg`` API function perform a 157 | compare-exchange operation, which is a kind of atomic read-modify-write 158 | operation. 159 | - The ``ATOMICRMW`` instruction and the ``atomicrmw`` API function perform an 160 | atomic read-modify-write operation. 161 | - The ``FENCE`` instruction and the ``fence`` API function are a fence. 162 | - A concrete implementation may have other ways to perform those instructions. 163 | 164 | .. 165 | 166 | NOTE: Programs in other languages (e.g. 
native programs or any other 167 | language a Mu implementation can interface with) can synchronise with Mu in 168 | an implementation-specific way. But the implementation must guarantee that 169 | those programs perform those operations in a way compatible with Mu. 170 | 171 | For example, there are more than one way to implement loads and stores of 172 | the SEQ_CST order (either put fences in the load or in the store). If the 173 | implementation interfaces with a C implementation (e.g. 174 | gcc+glibc+Linux+x86_64), then Mu should do the same thing as (or be 175 | compatible with) the C program. 176 | 177 | Load, store, atomic read-modify-write operations and fences have memory orders, 178 | which are the following: 179 | 180 | - NOT_ATOMIC 181 | - RELAXED 182 | - CONSUME 183 | - ACQUIRE 184 | - RELEASE 185 | - ACQ_REL (acquire and release) 186 | - SEQ_CST (sequentially consistent) 187 | 188 | All accesses that are not NOT_ATOMIC are atomic. Using both non-atomic 189 | operations and atomic operations on the same memory location is an undefined 190 | behaviour. 191 | 192 | - Load shall have NOT_ATOMIC, RELAXED, CONSUME, ACQUIRE or SEQ_CST order. 193 | - Store shall have NOT_ATOMIC, RELAXED, RELEASE or SEQ_CST order. 194 | - Compare-exchange shall have RELAXED, ACQUIRE, RELEASE, ACQ_REL or SEQ_CST on 195 | success and RELAXED, ACQUIRE or SEQ_CST on failure. 196 | - Other atomic read-modify-write operations shall have RELAXED, ACQUIRE, 197 | RELEASE, ACQ_REL or SEQ_CST order. 198 | - Fence shall have ACQUIRE, RELEASE, ACQ_REL or SEQ_CST order. 199 | 200 | =========== ======= ======= =============== =============== =========== ===== 201 | Order LOAD STORE CMPXCHG(succ) CMPXCHG(fail) ATOMICRMW FENCE 202 | =========== ======= ======= =============== =============== =========== ===== 203 | NOT_ATOMIC yes yes no no no no 204 | RELAXED yes yes yes yes yes no 205 | CONSUME yes no no no no no 206 | ACQUIRE yes no yes yes yes yes 207 | RELEASE no yes yes no yes yes 208 | ACQ_REL no no yes no yes yes 209 | SEQ_CST yes yes yes yes yes yes 210 | =========== ======= ======= =============== =============== =========== ===== 211 | 212 | - A load operation with ACQUIRE, ACQ_REL or SEQ_CST order performs a **acquire** 213 | operation on its specified memory location. 214 | - A load operation with CONSUME order performs a **consume** operation on its 215 | specified memory location. 216 | - A store operation with RELEASE, ACQ_REL or SEQ_CST order performs a 217 | **release** operation on its specified memory location. 218 | - A fence with ACQUIRE, ACQ_REL or SEQ_CST order is a **acquire fence**. 219 | - A fence with RELEASE, ACQ_REL or SEQ_CST order is a **release fence**. 220 | 221 | Orders 222 | ====== 223 | 224 | Program Order 225 | ------------- 226 | 227 | All evaluations performed by a Mu thread form a total order, in which the 228 | operations performed by each evaluation are **sequenced before** operations 229 | performed by its successor. 230 | 231 | All operations performed by a Mu client via a particular client context of the 232 | API form a total order, in which each operation is **sequenced before** its 233 | successor. 234 | 235 | Operations before a ``TRAP`` or ``WATCHPOINT`` are **sequenced before** the 236 | operations in the trap handler. Operations in the trap handler are **sequenced 237 | before** operations after that ``TRAP`` or ``WATCHPOINT``. 238 | 239 | The **program order** contains operations and their "sequenced before" 240 | relations. 
241 | 242 | NOTE: This means all Mu instructions plus all client operations done by the 243 | trap handler in a Mu thread still forms a total order. 244 | 245 | In C, the program order is a partial order even in a single thread because 246 | of unspecified order of evaluations. 247 | 248 | Modification Order 249 | ------------------ 250 | 251 | All atomic store operations on a particular memory location M occur in some 252 | particular total order, called the **modification order** of M. If A and B are 253 | atomic stores on memory location M, and A happens before B, then A shall precede 254 | B in the modification order of M. 255 | 256 | NOTE: This is to say, the modification order is consistent with the happens 257 | before order. 258 | 259 | NOTE: This reflects the mechanisms, including cache coherence, provided by 260 | some hardware that guarantees such a total order. 261 | 262 | A **release sequence** headed by a release operation A on a memory location M is 263 | a maximal contiguous sub-sequence of atomic store operations in the modification 264 | order M, where the first operation is A and every subsequent operation either is 265 | performed by the same thread that performed the release or is an atomic 266 | read-modify-write operation. 267 | 268 | NOTE: In Mu, when a memory location is accessed by both atomic and 269 | non-atomic operations, it is an undefined behaviour. So the release sequence 270 | only apply for memory locations only accessed by atomic operations. 271 | 272 | NOTE: Intuitively, there is a invisible fence before a release store (which 273 | is sometimes actually implemented as this). Seeing a store in the release 274 | sequence should imply seeing stores before the invisible fence. 275 | 276 | The Synchronises With Relation 277 | ------------------------------ 278 | 279 | An evaluation A **synchronises with** another evaluation B if: 280 | 281 | - A performs a release operation on memory location M, and, B performs an 282 | acquire operation on M, and, sees a value stored by an operation in the 283 | release sequence headed by A, or 284 | - A is a release fence, and, B is an acquire fence, and, there exist atomic 285 | operations X and Y, both operating on some memory location M, such that A is 286 | sequenced before X, X store into M, Y is sequenced before B, and Y sees the 287 | value written by X or a value written by any store operation in the 288 | hypothetical release sequence X would head if it were a release operation, or 289 | - A is a release fence, and, B is an atomic operation that performs an acquire 290 | operation on a memory location M, and, there exists an atomic operation X such 291 | that A is sequenced before X, X stores into M, and B sees the value written by 292 | X or a value written by any store operations in the hypothetical release 293 | sequence X would head if it were a release operation, or 294 | - A is an atomic operation that performs a release operation on M, and, B is an 295 | acquire fence, and, there exists some atomic operation X on M such that X is 296 | sequenced before B and sees the value written by A or a value written by any 297 | side effect in the release sequence headed by A, or 298 | - A is the creation of a thread and B is the beginning of the execution of the 299 | new thread. 300 | - A is a futex wake operation and B is the next operation after the futex wait 301 | operation of the thread woken up by A. 302 | 303 | .. 
304 | 305 | NOTE: A thread can be created by the ``NEWTHREAD`` instruction or the 306 | ``new_thread`` API function. 307 | 308 | NOTE: Since there is no explicit heap memory management in Mu, the 309 | "synchronises with" relation in C involving ``free`` and ``realloc`` does 310 | not apply in Mu. 311 | 312 | NOTE: Mu only provides very primitive threading support. The "synchronises 313 | with" relations involving ``call_once`` and ``thrd_join`` are not in the 314 | memory model, but can be implemented on a higher level. 315 | 316 | NOTE: The "synchronises with" relation between the futex wake and wait is 317 | necessary to ensure the visibility of values written by one thread to be 318 | visible immediately by the woken thread. If such relation does not exist, 319 | the woken thread may never see the memory change made by the other thread. 320 | For example:: 321 | 322 | // C pseudo code 323 | int shared_var = 42; 324 | atomic_int futex = 0; 325 | 326 | thread1 { 327 | shared_var = 43; 328 | futex = 1; // Op1 329 | futex_wake(&futex); // Op2 330 | } 331 | 332 | thread2 { 333 | while(futex == 0) { // Op4 334 | futex_wait(&futex, 0); // Op3 335 | } 336 | int local_var = shared_var; 337 | } 338 | 339 | If the "synchronises with" between Op2 and Op3 does not exist, then Op4 may 340 | never see the value written by Op1, and thread2 will loop indefinitely. 341 | 342 | Dependency 343 | ---------- 344 | 345 | An evaluation A **carries a dependency to** another evaluation B, or B *carries 346 | a dependency from* A, if: 347 | 348 | - the data value of A is used as a data argument of B unless: 349 | 350 | * A is used in the ``KEEPALIVE`` clause of B, or 351 | * B is a ``SELECT`` instruction and A is its ``cond`` argument or a is the 352 | ``iftrue`` or ``iffalse`` argument not selected by ``cond``, or 353 | * A is a comparing or ``INSERTVALUE`` instruction, or 354 | * B is a ``@uvm.kill_dependency``, ``CALL``, ``EXTRACTVALUE`` or ``CCALL`` 355 | instruction, or 356 | 357 | - there is a store operation X such that A is sequenced before X and X is 358 | sequenced before B, and, X stores the value of A to a memory location M, and, 359 | B performs a load operation from M, or 360 | - for some evaluation X, A carries a dependency to X and X carries a dependency 361 | to B. 362 | 363 | .. 364 | 365 | NOTE: The "carries a dependency to" relation together with the 366 | "dependency-ordered before"" relation exploits the fact that some 367 | processors, notably ARM and POWER, will not reorder load operations if the 368 | address used in the later in the program order depends on the result of the 369 | earlier load. On such processors, the earlier load can be implemented as an 370 | ordinary load without fences and still has "consume" semantic. 371 | 372 | NOTE: Processors including ARM and POWER only respects data dependency, not 373 | control dependency. The ``SELECT`` instruction and the comparing instruction 374 | are usually implemented by conditional moves or conditional flags, which 375 | would end up that the result is control-dependent on the argument rather 376 | than data dependent. 377 | 378 | NOTE: Operations involving ``struct`` types in Mu may be implemented as 379 | no-ops. 
Consider the following:: 380 | 381 | .typedef @i64 = int<64> 382 | .const @I64_0 <@i64> = 0 383 | 384 | .type @A = struct <@i64 @i64> 385 | .const @A_ZERO <@A> = {@I64_0 @I64_0} 386 | 387 | %v = LOAD CONSUME <@i64> %some_memory_location 388 | %x = INSERTVALUE <@A 0> @A_ZERO %v // {%v 0} 389 | %y = EXTRACTVALUE <@A 0> %x // %v 390 | %z = EXTRACTVALUE <@A 1> %x // 0 391 | 392 | Mu can alias ``%y`` with ``%v`` in the machine code, but ``%z`` is always a 393 | constant zero. 394 | 395 | NOTE: Dependencies may not always be carried across function calls. A 396 | function may return a constant and it is uncertain if any processor respect 397 | this order. 398 | 399 | An evaluation A is **dependency-ordered before** another evaluation B if any of 400 | the following is true: 401 | 402 | * A performs a release operation on a memory location M, and, in another thread, 403 | B performs a consume operation on M and sees a value stored by any store 404 | operations in the release sequence headed by A. 405 | * For some evaluation X, A is dependency-ordered before X and X carries a 406 | dependency to B. 407 | 408 | .. 409 | 410 | NOTE: The "dependency-ordered before" relation consists of a release/consume 411 | pair followed by zero or more "carries a dependency to" relations. If the 412 | consume sees the value of (or "later than") the release operation, then 413 | subsequent loads that depends on the consume operation should also see 414 | values stored before the release operation. 415 | 416 | .. 417 | 418 | TODO: The "carries a dependency to" relation is not well-defined for the 419 | client since it may be written in a different language. 420 | 421 | The Happens Before Relation 422 | --------------------------- 423 | 424 | An evaluation A **inter-thread happens before** an evaluation B if A 425 | synchronises with B, A is dependency-ordered before B, or, for some evaluation 426 | X: 427 | 428 | * A synchronises with X and X is sequenced before B, 429 | * A is sequenced before X and X inter-thread happens before B, or 430 | * A inter-thread happens before X and X inter-thread happens before B. 431 | 432 | .. 433 | 434 | NOTE: This basically allows any concatenations of "synchronises with", 435 | "dependency-ordered before" and "sequenced before" relations, but disallows 436 | ending with a "dependency-ordered before" relation followed by a "sequenced 437 | before" relation. It is disallowed because the consume load in the 438 | "dependency-ordered before" relation only respects later loads that works 439 | with a location that depends on the consume load, not arbitrary loads 440 | sequenced after it. It is only disallowed in the end because the release 441 | operation in a "synchronises with" relation or a "dependency-ordered before" 442 | relation will force the order between it and any preceding operations. 443 | 444 | NOTE: A sequence of purely "sequenced before" is not "inter-thread" and is 445 | also not allowed in the "inter-thread happens before" relation. 446 | 447 | An evaluation A **happens before** an evaluation B if A is sequenced before B or 448 | A inter-thread happens before B. 449 | 450 | Value Visibility 451 | ---------------- 452 | 453 | A load operation B from a memory location M shall see the initial value of M, 454 | the value stored by a store operation A sequenced before B, or other permitted 455 | values defined later. 
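    NOTE: As an illustration, consider the following C-like pseudo code, in the
    same style as the futex example above. ``store_release`` and
    ``load_acquire`` are not real functions; they only stand for a store with
    the RELEASE order and a load with the ACQUIRE order::

        // C pseudo code
        int data = 0;           // accessed non-atomically
        atomic_int flag = 0;    // accessed atomically

        thread1 {
            data = 42;                       // store A
            store_release(&flag, 1);         // release operation
        }

        thread2 {
            if (load_acquire(&flag) == 1) {  // acquire operation
                int r = data;                // load B
            }
        }

    If the acquire load sees the value 1, it synchronises with the release
    store, so the store A happens before the load B and B shall see 42. If the
    two accesses to ``data`` were not ordered by "happens before", they would
    form a data race (defined below) and the behaviour would be undefined.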
456 | 
457 | A **visible store operation** A to a memory location M with respect to a load 
458 | operation B from M satisfies the conditions: 
459 | 
460 | * A happens before B, and 
461 | * there is no other store operation X to M such that A happens before X and X 
462 | happens before B. 
463 | 
464 | A non-atomic load operation B from memory location M shall see the value stored 
465 | by the visible store operation A. 
466 | 
467 | NOTE: If there is ambiguity about which store operation is visible to a 
468 | non-atomic load operation, then there is a data race and the behaviour is 
469 | undefined. 
470 | 
471 | The **visible sequence of atomic store operations** to a memory location M with 
472 | respect to an atomic load operation B from M, is a maximal contiguous 
473 | sub-sequence of atomic store operations in the modification order of M, where 
474 | the first operation is visible with respect to B, and for every subsequent 
475 | operation, it is not the case that B happens before it. The atomic load 
476 | operation B sees the value stored by some atomic store operation in the visible 
477 | sequence of M. Furthermore, if an atomic load operation A from memory location M 
478 | happens before an atomic load operation B from M, and A sees a value stored by 
479 | an atomic store operation X, then the value B sees shall either equal the value 
480 | seen by A, or be the value stored by an atomic store operation Y, where Y 
481 | follows X in the modification order of M. 
482 | 
483 | NOTE: This means a load cannot see the value stored by an operation that 
484 | happens after it, or by a store operation separated from it by another store 
485 | in the happens-before relation. Furthermore, the later of two loads cannot 
486 | see an earlier value than that seen by the first load. 
487 | 
488 | The execution of a program contains a **data race** if it contains two 
489 | conflicting non-atomic memory accesses in different threads, neither of which 
490 | happens before the other. Any such data race results in undefined behaviour. 
491 | 
492 | NOTE: Using both atomic and non-atomic accesses on the same memory location 
493 | is already an undefined behaviour, whether in the same thread or not. 
494 | 
495 | Special Rules for SEQ_CST 
496 | ========================= 
497 | 
498 | There shall be a single total order S on all SEQ_CST operations, consistent with 
499 | the "happens before" order and modification orders for all affected memory 
500 | locations, such that each SEQ_CST load operation B from memory location M sees 
501 | one of the following values: 
502 | 
503 | * the result of the last store operation A to M that precedes B in S, if it 
504 | exists, or 
505 | * if A exists, the result of some store operation to M in the visible sequence 
506 | of atomic store operations with respect to B that is not SEQ_CST and does not 
507 | happen before A, or 
508 | * if A does not exist, the result of some store operation to M in the visible 
509 | sequence of atomic store operations with respect to B that is not SEQ_CST. 
510 | 
511 | For an atomic load operation B from a memory location M, if there is a SEQ_CST 
512 | fence X sequenced before B, then B observes either the last SEQ_CST store 
513 | operation of M preceding X in the total order S or a later store operation of M 
514 | in its modification order. 
515 | 516 | For atomic operations A and B on a memory location M, where A stores into M and 517 | B loads from M, if there is a SEQ_CST fence X such that A is sequenced before X 518 | and B follows X in S, then B observes either the effect of A or a later store 519 | operation of M in its modification order. 520 | 521 | For atomic operations A and B on a memory location M, where A stores into M and 522 | B loads from M, if there are SEQ_CST fences X and Y such that A is sequenced 523 | before X, Y is sequenced before B and X precedes Y in S, then B observes either 524 | the effect of A or a later store operation of M in its modification order. 525 | 526 | Special Rules for Atomic Read-modify-write Operations 527 | ===================================================== 528 | 529 | Atomic read-modify-write operations shall always see the last value (in the 530 | modification order) stored before the store operation associated with the 531 | read-modify-write operation. 532 | 533 | Special Rules for Stack Operations 534 | ================================== 535 | 536 | A swap-stack operation performs an unbinding operation followed by a binding 537 | operation. The former is sequenced before the latter. 538 | 539 | In the evaluation of a ``TRAP`` or ``WATCHPOINT`` instruction, the implied stack 540 | unbinding operation is sequenced before any operations performed by the client. 541 | If the client chooses to return and rebind the stack, the stack binding 542 | operation is sequenced after all operations performed by the client and the 543 | implied stack unbinding operation. 544 | 545 | Stack binding and unbinding operations are not atomic. If there is a pair of 546 | stack binding or unbinding operations on the same stack, but do not have a 547 | "happens before" relation, it has undefined behaviour. 548 | 549 | Special Rules for Futex 550 | ======================= 551 | 552 | The load operations performed by the ``@uvm.futex.wait``, 553 | ``@uvm.futex.wait_timeout`` and ``@uvm.futex.cmp_requeue`` on the memory 554 | location given by its argument are atomic. 555 | 556 | Special Rules for Functions and Function Redefinition 557 | ===================================================== 558 | 559 | The rules of memory access applies to functions as if 560 | 561 | * a function were a memory location that holds a function version, and 562 | 563 | * a creation of a frame for a function were an atomic load on that location of 564 | the RELAXED order, which sees a particular version, and 565 | 566 | * a function definition or redefinition during the load of a bundle were an 567 | atomic store on that location of the RELAXED order, which stores a new 568 | version. 569 | 570 | .. 571 | 572 | NOTE: A frame is created when: 573 | 574 | 1. calling a function by the ``CALL`` or ``TAILCALL`` instructions, or by 575 | native programs through exposed Mu functions, or 576 | 577 | 2. creating a new stack by the ``@uvm.new_stack`` instruction or the 578 | ``new_stack`` API, or 579 | 580 | 3. pushing a new frame by the ``push_frame`` API or the 581 | ``@uvm.meta.push_frame`` instruction. 582 | 583 | The order of definitions and redefinitions of a particular function is 584 | consistent with the order the bundles that contain the definitions are loaded. 585 | 586 | NOTE: This means synchronisation operations must be used to guarantee other 587 | threads other than the one which loads a bundle see the most recent version 588 | of a function. 
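    For example, in C-like pseudo code, ``load_bundle`` stands for whatever
    client action loads a bundle that redefines the function ``@foo``, and
    ``store_release``/``load_acquire`` stand for a RELEASE store and an ACQUIRE
    load::

        // C pseudo code
        atomic_int foo_v2_ready = 0;

        client_thread {
            load_bundle("foo_v2");            // redefinition: a RELAXED store of the new version
            store_release(&foo_v2_ready, 1);
        }

        other_thread {
            while (load_acquire(&foo_v2_ready) == 0) { /* wait */ }
            call_foo();                       // frame creation: a RELAXED load of the version
        }

    With the release/acquire pair, the redefinition happens before the frame
    creation, so the call activates the new version. Without it, the two
    RELAXED accesses are unordered and the call may activate either version.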
589 | 
590 | Out-of-thin-air or Speculative stores
591 | =====================================
592 | 
593 | TODO
594 | 
595 | .. vim: tw=80
596 | 
--------------------------------------------------------------------------------
/native-interface-x64-unix.rest:
--------------------------------------------------------------------------------
1 | ===========================
2 | AMD64 Unix Native Interface
3 | ===========================
4 | 
5 | This is the native interface for AMD64 on Unix-like operating systems.
6 | 
7 | Memory Layout
8 | =============
9 | 
10 | The memory layout of the native memory uses the `System V Application Binary
11 | Interface for x86-64 `__
12 | (referred to as "the AMD64 ABI" from now on) as a reference. It is recommended
13 | that the Mu memory also use this memory layout, but this is not required since
14 | the Mu memory is never externally visible unless explicitly pinned.
15 | 
16 | Data in the native memory uses the sizes and alignments of types listed in
17 | the following table. The unit of sizes and alignments is the byte, which is 8
18 | bits. All non-native-safe types have unspecified sizes and alignments.
19 | 
20 | .. table:: Mapping between Mu and C types
21 | 
22 | =========================== ======================= =============== ================
23 | Mu type                     C type                  Size            Alignment
24 | =========================== ======================= =============== ================
25 | ``int<8>``                  ``char``                1               1
26 | ``int<16>``                 ``short``               2               2
27 | ``int<32>``                 ``int``                 4               4
28 | ``int<64>``                 ``long``, ``long long`` 8               8
29 | ``float``                   ``float``               4               4
30 | ``double``                  ``double``              8               8
31 | ``vector<int<32> 4>``       ``__m128``              16              16
32 | ``vector<float 4>``         ``__m128``              16              16
33 | ``vector<double 2>``        ``__m128``              16              16
34 | ``uptr<T>``                 ``T *``                 8               8
35 | ``ufuncptr<sig>``           ``T (*) ()``            8               8
36 | ``ref<T>``                  N/A                     unspecified     unspecified
37 | ``iref<T>``                 N/A                     unspecified     unspecified
38 | ``weakref<T>``              N/A                     unspecified     unspecified
39 | ``tagref64``                N/A                     unspecified     unspecified
40 | ``funcref<sig>``            N/A                     unspecified     unspecified
41 | ``threadref``               N/A                     unspecified     unspecified
42 | ``stackref``                N/A                     unspecified     unspecified
43 | ``int<1>``                  N/A                     unspecified     unspecified
44 | ``int<6>``                  N/A                     unspecified     unspecified
45 | ``int<52>``                 N/A                     unspecified     unspecified
46 | ``int<n>``                  unspecified             unspecified     unspecified
47 | ``vector<T n>``             unspecified             unspecified     unspecified
48 | =========================== ======================= =============== ================
49 | 
50 | ..
51 | 
52 | NOTE: Although ``int<1>`` is required and ``int<6>`` and ``int<52>`` are
53 | required when ``tagref64`` is implemented, their memory layout is
54 | unspecified because memory access instructions ``LOAD``, ``STORE``, etc. are
55 | not required to support those types. It is not recommended to include those
56 | types in the memory because they may never be loaded or stored.
57 | 
58 | Although vectors of other lengths are not required by a Mu implementation,
59 | implementations are encouraged to support them in a way compatible with the
60 | AMD64 ABI.
61 | 
62 | The structure type ``struct<...>`` and the hybrid type ``hybrid<Fs V>`` are
63 | aligned to their most strictly aligned component. Each member is assigned to the
64 | lowest available offset with the appropriate alignment. This rule applies to
65 | hybrids as if the hybrid ``hybrid<Fs V>`` were a struct of fields ``Fs`` followed
66 | by a flexible array member (as in C99) ``V fam[];``. Arrays ``array<T n>`` use
67 | the same alignment as their elements.
68 | 
69 | NOTE: There are no union types in Mu.
Arrays do not have special rules of 70 | 16-byte alignment as the AMD64 ABI does. Mu arrays must be declared as an 71 | array of vectors (such as ``array 4> 100>``) to be eligible 72 | for vector access. 73 | 74 | Both integers and floating point numbers are little-endian (lower bytes in lower 75 | addresses). Signed integers use the 2's complement representation. Elements with 76 | lower indexes in a vector is stored in lower addresses in the memory. 77 | 78 | Calling Convention 79 | ================== 80 | 81 | The calling convention between Mu functions is implementation-defined. 82 | 83 | The Default Calling Convention 84 | ------------------------------ 85 | 86 | The *default* calling convention, denoted by the ``#DEFAULT`` flag in the IR, 87 | follows the AMD64 ABI in register usage, stack frame structure, parameter 88 | passing and returning. The parameter types and the return types are mapped to C 89 | types according to the above table. Functions in this calling convention can 90 | return at most one value. As a special case, if the native function signature 91 | returns void, the corresponding Mu signature returns no values ``(...) -> ()``. 92 | Mu ``struct`` and ``array`` types are mapped to C structs of corresponding 93 | members. ``array`` cannot be the type of parameters or return values because C 94 | disallows this, but arrays within structs and pointers to arrays are is allowed. 95 | 96 | Arguments and return values are passed in registers and the memory according to 97 | the AMD64 ABI, with the types of Mu arguments and the return type mapped to the 98 | corresponding C types. 99 | 100 | NOTE: This is to say, C programs can call Mu functions with a "compatible" 101 | signature (that is, parameters and return values match the above table). 102 | Even if the signature is not "perfectly" matching (for example, an int/long 103 | is passed when a pointer is expected, Mu must still interpret the incoming 104 | arguments strictly according to the ABI, i.e. interpreting the integer value 105 | in the register as an address). 106 | 107 | If a Mu function of signature *sig* is exposed with the *default* calling 108 | convention, the resulting value has ``ufuncptr`` type, i.e. it is a 109 | function pointer which can be called (by either Mu or native programs) with the 110 | *default* calling convention. 111 | 112 | It has undefined behaviour when the native program attempts to unwind Mu frames. 113 | 114 | NOTE: This means C ``longjmp`` and C++ exceptions must not go through Mu 115 | frames, but as long as they are handled **above** any Mu frames, it is safe. 116 | 117 | .. vim: tw=80 118 | -------------------------------------------------------------------------------- /native-interface.rest: -------------------------------------------------------------------------------- 1 | ================ 2 | Native Interface 3 | ================ 4 | 5 | This chapter defines the Mu native interface. 6 | 7 | NOTE: The term **foreign function interface** may have been used in many 8 | other places by experienced VM engineers to mean a heavy-weighted complex 9 | interface with another language. JikesRVM users use **foreign function 10 | interface** to refer to JNI and use **syscall** to refer to a light-weight 11 | unsafe mechanism to call arbitrary C functions with minimal overhead. 12 | 13 | The ``CCALL`` instruction is more similar to the latter. It has minimum 14 | overhead, but provides no protection to malicious code. So it must be used 15 | with care. 
16 | 17 | To reduce confusion, we use the term **unsafe native interface** or just 18 | **native interface** instead of *foreign function interface*. 19 | 20 | The **native interface** is a *light-weight* *unsafe* interface through which 21 | *Mu IR programs* communicate with *native programs*. 22 | 23 | NOTE: This has no direct relationship with the Mu client interface. 24 | 25 | * Native programs are usually written in C, C++ or other low-level languages 26 | and usually does not run on VMs. 27 | 28 | * A Mu client is not necessary a native program. The client can be written 29 | in a managed language, running in a VM, running in the same Mu VM as 30 | user-level programs (i.e. a "metacircular" client), or living in a 31 | different process or even a different computer, communicating with Mu 32 | using sockets. 33 | 34 | However, it does not rule out the possibility to implement the Mu client 35 | interface *for* native programs *via* this native interface. 36 | 37 | The main purpose of the native interface is 38 | 39 | 1. to interoperate with the operating system by invoking system libraries 40 | (including system calls), and 41 | 42 | 2. to interoperate with libraries written in other programming languages. 43 | 44 | .. 45 | 46 | NOTE: The purpose of the Mu client interface is to let the client control 47 | the Mu micro VM and handle events. The native interface is not about 48 | "controlling Mu". 49 | 50 | It is not a purpose to interface with *arbitrary* native libraries. This 51 | interface should be minimal but just enough to handle most *common* system calls 52 | (e.g. ``open``, ``read``, ``write``, ``close``, ...) and *common* native 53 | libraries. Complex data types and functions (e.g. those with unusual 54 | size/alignment requirements or calling conventions) may require wrapper code 55 | provided by the language implementer. 56 | 57 | The native interface is not required to be *safe*. The overhead of this 58 | interface should be as low as possible. It is the client's responsibility to 59 | implement things like JNI on top of this interface. 60 | 61 | For JikesRVM users: The native interface includes raw memory access which is 62 | similar to "vmmagic" and the ``CCALL`` instruction is more like the 63 | "syscall" mechanism. They are not safe, but highly efficient and should be 64 | used with care. 65 | 66 | .. 67 | 68 | NOTE: Directly making system calls from Mu and bypassing the C library 69 | (libc) is theoretically possible, but is not a mainstream way to do so. It 70 | has a lower priority in the design. 71 | 72 | Outline 73 | ======= 74 | 75 | This interface has several aspects: 76 | 77 | 1. **Raw memory access**: This interface provides pointer types and directly 78 | access the memory via pointers. 79 | 80 | 2. **Exposing Mu memory to the native world**: This allows native programs to 81 | access Mu memory in a limited fashion. 82 | 83 | 3. **Native function call**: This interface provides a mechanism to call a 84 | native function using a native calling convention. 85 | 86 | 4. **Callback from native programs**: This interface will enable calling back 87 | from the native program. 88 | 89 | 5. **Inline assembly**: Directly inserting machine-dependent instructions into 90 | a Mu IR function. 91 | 92 | Raw Memory Access 93 | ================= 94 | 95 | This section defines mechanisms for raw memory access. *Pointers* give Mu 96 | programs access to the native (raw) memory, while *pinning* gives native 97 | programs access to the Mu memory. 
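    NOTE: A typical use of this interface is passing the contents of a Mu
    object to a system call. In C-like pseudo code (the real code would be Mu
    IR using the ``@uvm.native.pin``, ``CCALL`` and ``@uvm.native.unpin``
    operations introduced below; ``write`` is the usual C library function)::

        // C pseudo code for what a Mu function would do
        void write_buffer(ref buf, int len) {
            char *p = pin(buf);     // pinning: the object will not be moved while pinned
            write(1, p, len);       // call the native function with the raw pointer
            unpin(buf);             // unpinning: the object may be moved again
        }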
98 | 
99 | Pointers
100 | --------
101 | 
102 | A **pointer** is an address in the memory space of the current process. A
103 | pointer can be a **data pointer** (type ``uptr<T>``) or a **function pointer**
104 | (type ``ufuncptr<sig>``). The former assumes a data value is stored in a region
105 | beginning with the address. The latter assumes a piece of executable machine
106 | code is located at the address.
107 | 
108 | ``uptr<T>``, ``ufuncptr<sig>`` and ``int<n>``, where ``T`` is a type and ``sig``
109 | is a function signature, can be cast to each other using the ``PTRCAST``
110 | instruction. The address is preserved and the ``int<n>`` type has the numerical
111 | value of the address. Type checking is not performed.
112 | 
113 | Potential problem: There may be machines where data pointers have a
114 | different size from function pointers, but I have never seen one.
115 | 
116 | For C users: The C spec never defines pointers as addresses. C pointers can
117 | point to either objects (regions of storage) or functions. Casting between
118 | object pointers, function pointers and integers has implementation-defined
119 | behaviours.
120 | 
121 | There are segmented architectures, including x86, whose "pointers" are
122 | segments + offsets. However, apparently the trend is to move to a "flat"
123 | memory space.
124 | 
125 | Pinning
126 | -------
127 | 
128 | A **pinning** operation takes either a ``ref`` value or an ``iref`` value
129 | as its parameter. The result is a data pointer. If it is an ``iref``, the data
130 | pointer can be used to access the memory location referred to by the ``iref``.
131 | Pinning a ``NULL`` ``iref`` returns a ``NULL`` pointer whose address is 0. If it
132 | is a ``ref``, it is equivalent to pinning the ``iref`` of the memory location of
133 | the object itself, or 0 if the ``ref`` itself is ``NULL``.
134 | 
135 | An **unpinning** operation also takes either a ``ref`` value or an
136 | ``iref`` value as its parameter, but returns ``void``.
137 | 
138 | In each thread, there is a conceptual "pinning multi-set" (may contain repeated
139 | elements). A pinning operation adds a ``ref`` or ``iref`` into this multi-set,
140 | and an unpinning operation removes one instance of the ``ref`` or ``iref`` from
141 | this multi-set. A memory location is pinned as long as there is at least one
142 | ``iref`` to that memory location in the pinning multi-set of any thread.
143 | 
144 | NOTE: This requires the Mu micro VM to perform somewhat complex
145 | book-keeping, but this gives Mu the opportunity for performance improvement
146 | over global Boolean pinning, where a pinned object can be unpinned instantly
147 | by an unpinning operation in any thread. The "pinning multi-set" can be
148 | implemented as a thread-local buffer. In this case, if GC never happens, no
149 | expensive atomic memory access or inter-thread synchronisation is performed.
150 | 
151 | Calling between Mu and Native Functions
152 | =======================================
153 | 
154 | Calling Conventions
155 | -------------------
156 | 
157 | The calling conventions involving native programs are platform-dependent and
158 | implementation-dependent. They should be defined by platform-specific binary
159 | interfaces (ABIs) as supplements to this Mu specification. Mu implementations
160 | should advertise which ABIs they implement.
161 | 
162 | Calling conventions are identified by flags (``#XXXXXX``) in the IR. Mu defines
163 | the flag ``#DEFAULT`` and its numerical value 0x00 for the default calling
164 | convention of platforms.
This flag is always available. Other calling 165 | conventions can be defined by implementations. 166 | 167 | The calling convention determines the type of value that are callable by the 168 | ``CCALL`` instruction (described below), and the type of the exposed value for 169 | Mu functions (described below). The type is usually a ``ufuncptr`` for C 170 | functions, which are called via their addresses. Other examples are: 171 | 172 | * If it is desired to make system calls directly from Mu, then the type can be 173 | an integer, i.e. the system call number. 174 | 175 | * If it is something like `a SWAP-STACK operation implemented as a calling 176 | convention `__, then the callee can 177 | be a stack pointer in the form of ``uptr``. 178 | 179 | Mu Functions Calling Native Functions 180 | ------------------------------------- 181 | 182 | The ``CCALL`` instruction calls a native function. Determined by calling 183 | conventions, the native function may be represented in different ways, and the 184 | arguments are passed in different ways. The return value of the call will be the 185 | return value of the ``CCALL`` instruction, which is a Mu SSA variable. 186 | 187 | Native Functions Calling Mu Functions 188 | ------------------------------------- 189 | 190 | A Mu function can be **exposed** as a native function pointer in three ways: 191 | 192 | 1. Statically, an ``.expose`` top-level definition exposes a Mu function as a 193 | native value according to the desired calling convention. For the default 194 | calling convention, the result is usually a function pointer. 195 | 196 | 2. Dynamically, the ``@uvm.native.expose`` common instructions can expose a Mu 197 | function, and the ``@uvm.native.unexpose`` common instruction deletes the 198 | exposed value. 199 | 200 | 3. Dynamically, the ``expose`` and ``unexpose`` API function do the same thing 201 | as the above instructions. 202 | 203 | A "cookie", which is a 64-bit integer value, can be attached to each exposed 204 | value. When a Mu function is called via one of its exposed value, the attached 205 | cookie can be retrieved by the ``@uvm.native.get_cookie`` common instruction in 206 | the callee, or 0 if called directly from Mu. 207 | 208 | NOTE: The purpose for the cookie is to support "closures". In some 209 | high-level languages, the programmer-accessible "functions" are actually 210 | closures, i.e. codes with attached data. Implemented on Mu, multiple 211 | different closures may share the same Mu function as their codes, but has 212 | different attached data. For example, in Lua:: 213 | 214 | function make_adder(y) 215 | return function(x) 216 | return x + y 217 | end 218 | end 219 | 220 | plus_one = make_adder(1) 221 | plus_two = make_adder(2) 222 | 223 | print(plus_one(3), plus_two(3)) -- 4 5 224 | 225 | ``plus_one`` and ``plus_two`` may probably share the same underlying Mu 226 | function as their common implementations, and they only differ by the 227 | different "up-value" ``y``. 228 | 229 | In C, any sane C programs that use call-backs should also have a ``void *`` 230 | as the "user data". For example, the ``pthread_create`` routine takes an 231 | extra ``void *arg`` parameter which will be passed to its ``start_routine`` 232 | as the argument. If the call-back is supposed to be a wrapper of a 233 | high-level language closure, the user data will be its context. 234 | 235 | However, different C programs support user data in different ways (if at 236 | all). 
For example, the UNIX signal handler function takes exactly one 237 | parameter which is the signal number: ``typedef void (*sig_t) (int)``. If a 238 | closure is supposed to handle UNIX signals, it must be able to identify its 239 | context by merely the exposed function pointer. 240 | 241 | One way to work around this problem is to generate a trampoline function 242 | which sets the cookie and jumps to the real callee. Many different 243 | trampolines can be made for a single Mu function, each of which supplies a 244 | different cookie. In this case, the cookie can identify the context for the 245 | closure. 246 | 247 | The simplest kind of cookie is an integer, but an object reference may also 248 | be a candidate. 249 | 250 | Since Mu programs need special contexts to execute (such as the thread-local 251 | memory allocation pool for the garbage collector, and the notion of the "current 252 | stack" for the SWAP-STACK operation), a native thread needs to attach itself to 253 | the Mu instance before calling any Mu functions. If a Mu thread calls native 254 | code from Mu, then it is already attached and can freely call back to Mu again. 255 | How to attach a thread to Mu is implementation-defined. 256 | 257 | For JVM users: The JNI invocation API function ``AttachCurrentThread()`` and 258 | ``DetachCurrentThread()`` are the counterpart of this requirement. 259 | 260 | Stack Sharing and Stack Introspection 261 | ------------------------------------- 262 | 263 | The callee may share the stack with the caller. 264 | 265 | When a Mu function "A" calls a native function which then calls back to another 266 | Mu function "B", Mu sees one single native frame between the frames for "A" and 267 | "B". When a Mu function is called from a native function without other Mu 268 | functions below, Mu consider the Mu function sitting on top of a native frame. 269 | 270 | Stack introspection can skip native frames and introspect other Mu frames below. 271 | 272 | NOTE: The requirement to "see through" native frames is partially required 273 | by exact garbage collection, in which case all references in the stack must 274 | be identified. 275 | 276 | However, throwing Mu exceptions into native frames has implementation-defined 277 | behaviour. Attempting to pop native frames via the API also has 278 | implementation-defined behaviour. 279 | 280 | NOTE: In general, it is not safe to force unwind native frames because 281 | native programs may need to clean up their own resources. Existing 282 | approaches, including JNI, models high-level (such as Java-level) exceptions 283 | as a query-able state rather than actual stack unwinding through native 284 | programs. 285 | 286 | Native exceptions thrown into Mu frames also have implementation-defined 287 | behaviours. 288 | 289 | NOTE: Similar to native frames, Mu programs may have even more necessary 290 | clean-up operations, such as GC barriers. 291 | 292 | Changes in the Mu IR and the API introduced by the native interface 293 | =================================================================== 294 | 295 | **New types**: 296 | 297 | * ``uptr < T >`` 298 | * ``ufuncptr < sig >`` 299 | 300 | See `Type System `__ 301 | 302 | **New top-level definitions**: 303 | 304 | * function exposing definition 305 | 306 | See `Mu IR `__. 
307 | 308 | **New instructions**: 309 | 310 | * ``PTRCAST`` 311 | * ``@uvm.native.pin`` 312 | * ``@uvm.native.unpin`` 313 | * ``@uvm.native.expose`` 314 | * ``@uvm.native.unexpose`` 315 | * ``@uvm.native.get_cookie`` 316 | 317 | See `Instruction Set `__ and `Common Instructions 318 | `__. 319 | 320 | **Modified instructions**: 321 | 322 | * Memory addressing: 323 | 324 | * ``GETFIELDIREF`` 325 | * ``GETELEMIREF`` 326 | * ``SHIFTIREF`` 327 | * ``GETVARPARTIREF`` 328 | * ``LOAD`` 329 | * ``STORE`` 330 | * ``CMPXCHG`` 331 | * ``ATOMICRMW`` 332 | 333 | * ``CCALL`` 334 | 335 | Memory addressing instructions take an additional ``PTR`` flag. If this flag is 336 | present, the location operand must be ``uptr`` rather than ``iref``. For 337 | example: 338 | 339 | * ``%new_ptr = GETFIELDIREF PTR <@some_struct 3> %ptr_to_some_struct`` 340 | * ``%new_ptr = GETELEMIREF PTR <@some_array @i64> %ptr_to_some_array @const1`` 341 | * ``%new_ptr = SHIFTIREF PTR <@some_elem @i64> %ptr_to_some_elem @const2`` 342 | * ``%new_ptr = GETVARPARTIREF PTR <@some_hybrid> %ptr_to_some_hybrid`` 343 | * ``%old_val = LOAD PTR SEQ_CST <@T> %ptr_to_T`` 344 | * ``%void = STORE PTR SEQ_CST <@T> %ptr_to_T %newval`` 345 | * ``%result = CMPXCHG PTR ACQ_REL ACQUIRE <@T> %ptr_to_T %expected %desired`` 346 | * ``%old_val = ATOMICRMW ADD PTR SEQ_CST <@T> %ptr_to_T %rhs`` 347 | 348 | See `Instruction Set `__. 349 | 350 | **New API functions**: 351 | 352 | * ``ptrcast`` 353 | * ``pin`` 354 | * ``unpin`` 355 | * ``expose`` 356 | * ``unexpose`` 357 | 358 | **Modified API functions**: 359 | 360 | The ``cur_func_ver`` function, in addition to returning the function version 361 | ID, it may also return 0 if the selected frame is a native frame. (Multiple 362 | native frames are counted as one between two Mu frames.) 363 | 364 | The ``pop_frames_to`` function has implementation-defined behaviours when 365 | popping native frames. 366 | 367 | When rebinding a thread to a stack with a value, and the top frame is on a call 368 | site (native or Mu), the value associated with the rebinding is the return value 369 | of the call. 370 | 371 | Future Works 372 | ============ 373 | 374 | TODO: Inline assembly 375 | 376 | .. vim: tw=80 377 | -------------------------------------------------------------------------------- /overview.rest: -------------------------------------------------------------------------------- 1 | ======== 2 | Overview 3 | ======== 4 | 5 | Mu is a micro virtual machine designed to support high-level programming 6 | languages. It focuses on three basic concerns: 7 | 8 | - garbage collection 9 | - concurrency 10 | - just-in-time compiling 11 | 12 | The Concept of Micro Virtual Machines 13 | ===================================== 14 | 15 | Many programming languages are implemented on virtual machines. 16 | 17 | There are many aspects in a language implementation. There are high-level 18 | aspects which are usually language-specific, including: 19 | 20 | * parsing high-level language programs, or loading high-level byte codes 21 | * object oriented programming, including classes, inheritance and polymorphism 22 | (if applicable) 23 | * functional programming, including high-order functions, pattern matching, etc. 
24 | (if applicable) 25 | * eager or lazy code/class loading 26 | * high-level optimisation 27 | * a comprehensive standard library 28 | 29 | as well as low-level aspects which are usually language-neutral, including: 30 | 31 | * an execution engine (for example, JIT compiling) 32 | * a model of threads and a memory model 33 | * garbage collection 34 | 35 | A "monolithic" VM implements everything listed above. JVM is one of such VMs. 36 | Creating such a VM is a huge amount of work. It takes two decades and billions 37 | of dollars for the JVM to have a high-quality implementation. Such man-power and 38 | investment is usually unavailable for other languages. 39 | 40 | We coined the term "**micro virtual machine**". It is an analogue to the term 41 | "microkernel" in the operating system context. A micro virtual machine, like a 42 | micro kernel, only does what absolutely needs to be done in the micro virtual 43 | machine, and pushes most high-level aspects to its client, the counterpart of 44 | the "services" of a microkernel. 45 | 46 | In a language implementation with the presence of a micro virtual machine, the 47 | micro virtual machine shall handle those three low-level aspects, namely 48 | concurrency, JIT and GC, and the client handles all other high-level aspects. 49 | 50 | Take JVM as an example. If JVM were implemented as a client of a micro virtual 51 | machine, it only needs to handle JVM-specific features, including the byte-code 52 | format, class loading and aspects of object-oriented programming. 53 | 54 | :: 55 | 56 | Traditional JVM 57 | +-------------------+ +---------------------------+ 58 | | | | | 59 | | *JVM* | | *Java Client* | 60 | | byte code format | | byte-code format | 61 | | class loading | | class loading | 62 | | OOP | | OOP | 63 | | GC | | | 64 | | concurrenty | +---------------------------+ 65 | | JIT compiling | | *micro virtual machine* | 66 | | | | GC,concurrency,JIT | 67 | +-------------------+ +---------------------------+ 68 | | *OS* | | *OS* | 69 | +-------------------+ +---------------------------+ 70 | 71 | The Mu Project 72 | ============== 73 | 74 | Mu is a concrete micro virtual machine. 75 | 76 | The main part of this project is this specification which defines the behaviour 77 | of Mu and the interaction with the client. This allows multiple compliant 78 | implementations. 79 | 80 | The specification mainly includes the type system, the instruction set and the 81 | Mu client interface (sometimes called "the API"). 82 | 83 | The Mu Architecture 84 | ------------------- 85 | 86 | The whole system is divided into a language-specific **client** and a 87 | language-neutral **micro virtual machine** (in this case, it is Mu). 88 | 89 | :: 90 | 91 | | source code or byte code 92 | v 93 | +-----------------+ 94 | | client | 95 | +-----------------+ 96 | | ^ 97 | Mu IR / | | traps/watchpoints/ 98 | API call | | other events 99 | v | 100 | +-------------------+ manages +---------+ 101 | | Mu (the micro VM) |----------->| Mu heap | 102 | +-------------------+ +---------+ 103 | 104 | A typical client implements a high-level language (e.g. Python or Lua). Such a 105 | client would be responsible for loading, parsing and executing the source code 106 | or byte code. 107 | 108 | The client submits programs to Mu in a language called **Mu Intermediate 109 | Representation**, a.k.a. **Mu IR**. The Mu IR code is then executed on Mu. 110 | 111 | The client can directly manipulate the states of Mu using the **Mu client 112 | Interface**, a.k.a. **the API**. 
The API can access the Mu memory (including the 113 | heap), create Mu threads, stacks, introspect stack states and so on. The Mu IR 114 | code mentioned above is submitted via the API, too. 115 | 116 | There are events which Mu cannot handle alone. These include lazy code loading, 117 | requesting for optimisation/deoptimisation and so on. In these cases, Mu 118 | generates events to be handled by the client. 119 | 120 | Mu handles garbage collection internally. Mu can identify all references held 121 | inside Mu and also tracks all references held by the client. So exact GC is 122 | possible in Mu without the intervention from the client. 123 | 124 | The Mu Type System 125 | ------------------- 126 | 127 | The Mu type system has scalar and vector integer and floating point types, 128 | aggregate types including structs, arrays and hybrids, as well as reference 129 | types. The type system is low level, similar to the level of C, but natively 130 | supports reference types. 131 | 132 | Mu is agnostic of the type hierarchy in high-level languages, but the client can 133 | implement its language-specific type system and run-time type information on top 134 | of the Mu type system. 135 | 136 | See `Type System `__ for more details. 137 | 138 | The Mu Instruction Set 139 | ----------------------- 140 | 141 | The Mu instruction set is similar to (and is actually inspired by) the `LLVM 142 | `__'s instruction set. There are primitive 143 | arithmetic/logical/relational/conversion operations and control flow 144 | instructions. 145 | 146 | Mu has its own exception handling, not depending on system libraries as C++ 147 | does. Mu IR programs can throw and catch exceptions, but the client needs to 148 | implement its own exception type hierarchy if applicable. 149 | 150 | There are garbage-collection-aware memory operations, including memory 151 | allocation, addressing and accessing. The client does not need to implement 152 | garbage collection algorithms; it only needs to use reference types and related 153 | instructions and Mu handles the rest. 154 | 155 | Trap instructions let Mu IR programs talk back to the client for events it 156 | cannot handle. 157 | 158 | There are also instructions for handling stack and threads. 159 | 160 | See `Instruction Set `__ and `Common Instructions 161 | `__ for more details. 162 | 163 | The Mu Client Interface 164 | ------------------------ 165 | 166 | The Mu client interface (API) allows the client to directly manipulate the state 167 | of Mu. 168 | 169 | The API can load Mu IR code. 170 | 171 | The API can create threads and stacks. The usual way to start a Mu program is 172 | to create a new stack with a function on the bottom of the stack and create a 173 | new thread on it to start execution. (The concept of threads and stacks are 174 | discussed later.) 175 | 176 | The API can directly allocate and access the Mu memory. References are 177 | indirectly exposed to the client as handles rather than raw pointers for the 178 | ease of garbage collection. (The JVM takes the same approach.) 179 | 180 | The client also handles trap events generated by the Mu IR code. The client can 181 | introspect the selected local variables on the stack and perform on-stack 182 | replacement (i.e. OSR. Discussed later.) 183 | 184 | See `Client Interface `__ for more details. 185 | 186 | Unsafe Native Interface 187 | ----------------------- 188 | 189 | The (unsafe) native interface is designed to directly interact with native 190 | programs. 
It gives Mu program direct access to the memory via pointers, and 191 | allows pinning Mu objects so that they can be accessed by native programs. It 192 | also allows Mu to call a native (usually C) function directly, and allows native 193 | programs to call back to selected Mu functions. (The .NET CLR takes similar 194 | approach, i.e. giving the high-level program "unsafe" access to the lower 195 | level.) 196 | 197 | This interface is different from the client API. The main purpose is to 198 | implement the system-interfacing part of the high-level language, such as the IO 199 | and the networking library. 200 | 201 | See `Native Interface `__ for more details. 202 | 203 | Multi-Threading 204 | --------------- 205 | 206 | Mu supports threads. Mu threads are usually implemented with native OS threads, 207 | but this specification does not enforce this. Multiple Mu threads may execute 208 | simultaneously. 209 | 210 | Mu has a C11/C++11-like memory model. There are atomic memory access with 211 | different memory orders. The client should generate code with the appropriate 212 | memory order for its high-level language. 213 | 214 | Mu provides a Futex-like mechanism similar to the counterpart provided by the 215 | Linux kernel. It is the client's responsibility to implement mutex locks, 216 | semaphores, conditions, barriers and so on using atomic memory accesses and the 217 | futex. 218 | 219 | See `Threads and Stacks `__ for details about threads and 220 | `Memory Model `__ for the Mu memory model. 221 | 222 | The Swap-stack Operation 223 | ------------------------ 224 | 225 | Mu distinguishes between threads and stack. In Mu, a thread is the unit of CPU 226 | scheduling and a stack is the context in which a thread executes. An analogy is 227 | "workers and jobs". 228 | 229 | A stack has multiple frames, each of which is a context of a function 230 | activation, including local variables and the current instruction. 231 | 232 | A *swap-stack* operation unbinds a thread from a stack (the old context) and 233 | bind to another stack (the new context). As a result, the old context of 234 | execution is paused and can be continued when another swap-stack operation binds 235 | another thread (may not be the same old thread) to that stack. This is similar 236 | to letting a worker stop doing one job and continue with another job. 237 | 238 | The swap-stack operation is essentially a model of symmetric coroutines. It 239 | allows the client to implement coroutines in high-level languages (e.g. Ruby, 240 | Lua, Go as well as Python and ECMAScript 6). 241 | 242 | It also allows the client to implement its own light-weight 243 | thread. This is particularly useful for languages with massively many threads 244 | (e.g. Erlang). 245 | 246 | See `Threads and Stacks `__ for details. 247 | 248 | Function Redefinition 249 | --------------------- 250 | 251 | It is a common strategy to use a fast compiler to compile high-level programs to 252 | suboptimal low-level code, and only optimise when the implementation decided at 253 | run time that a function (or loop) is hot. Then an optimised version is 254 | compiled. Optimising compilation usually takes longer, but the code runs faster. 255 | 256 | In Mu, a function can have zero or more versions. When a function is called, it 257 | always calls the newest version. 258 | 259 | The semantic of *function definition* in Mu is to create a new version of a 260 | function. 
What the client should do is to generate an optimised version of the 261 | high-level code in Mu IR and submit it to Mu. All call sites and function 262 | references are automatically updated. 263 | 264 | If a function has zero versions, it is "undefined". Calling such a function will 265 | "trap" to the client. Such functions behave like "stubs" and this gives the 266 | client a chance to implement lazy code/class loading. 267 | 268 | See `Intermediate Representation `__ for the definition of 269 | functions and versions, and see `Client Interface `__ for 270 | the code loading interface. 271 | 272 | On-stack Replacement 273 | -------------------- 274 | 275 | At the same time when an optimised version of a function is compiled, there are 276 | existing activations on the stack still running the old version. On-stack 277 | Replacement (OSR) is the operation to replace an existing stack frame with 278 | another frame. 279 | 280 | Mu provides two primitives in its API: 281 | 282 | 1. Pop the top frame of a stack. 283 | 2. Given a function and its arguments, create a new frame and push it on the top 284 | of a stack. 285 | 286 | Note that Mu is oblivious about whether the new version is "equivalent to" or 287 | "better than" the old version. The responsibility of optimisation is pushed to 288 | the client. 289 | 290 | See `Client Interface `__ for more details. 291 | 292 | Miscellaneous Topics 293 | -------------------- 294 | 295 | The `Memory `__ chapter provides more detail about garbage 296 | collection and memory allocation/accessing. 297 | 298 | The `Portability `__ chapter describes the requirements of 299 | implementations. It summarises corner cases which may result in different or 300 | undefined behaviours in different platforms. 301 | 302 | .. vim: tw=80 303 | -------------------------------------------------------------------------------- /portability.rest: -------------------------------------------------------------------------------- 1 | ============== 2 | Portability 3 | ============== 4 | 5 | As both a thin layer over the hardware and an abstraction over concurrency, JIT 6 | compiling and GC, Mu must strike a balance between portability and the ability 7 | to exploit platform-specific features. Thus Mu is designed in such a way that 8 | 9 | 1. There is a basic set of types and instructions that have common and defined 10 | behaviours and reasonably good performance on all platforms. 11 | 2. Mu also includes platform-specific instructions. These instructions are 12 | either defined by this Mu specification or extended by Mu implementations. 13 | 14 | In this chapter, **required** features must be implemented by Mu 15 | implementation and **optional** features may or may not be implemented. However, 16 | if an optional feature is implemented, it must behave as specified. 17 | 18 | NOTE: Although "behaving as specified", the implementation can still reject 19 | some inputs in a compliant way. For example, if an array type is too large, 20 | Mu still needs to accept the Mu IR that contains such a type, but may always 21 | refuse to allocate such a type in the memory. 22 | 23 | The platform-independent part of the native interface is a required component of 24 | the Mu spec, but the platform-dependent parts are not. There will be things 25 | which are "required but has implementation-defined behaviours". In this case, Mu 26 | must not reject IR programs that contain such constructs, but each 27 | implementation may do different things. 
28 | 29 | Type System 30 | =========== 31 | 32 | The ``int`` type of lengths 1, 8, 16, 32, and 64 are required. ``int`` of 6 and 33 | 52 bits are required if Mu also implements the ``tagref64`` type. Other lengths 34 | are optional. 35 | 36 | Both ``float`` and ``double`` are required. 37 | 38 | The vector types ``vector 4>``, ``vector`` and ``vector`` are required. Other vector types are optional. 40 | 41 | NOTE: Even though required to be accepted by Mu, they are not required to be 42 | implemented using hardware-provided vector instructions or vector registers 43 | of the exact same length. They can be implemented totally with scalar 44 | operations and general-purpose registers or memory, or implemented using 45 | different hardware vector sizes, larger or smaller. 46 | 47 | Reference types ``ref``, ``iref`` and ``weakref`` are required if ``T`` 48 | is implemented. Otherwise optional. 49 | 50 | A struct type ``struct<...>`` are required if it has at most 256 fields and all 51 | of its field types are implemented. Otherwise optional. 52 | 53 | An array type ``array`` is required if T is implemented and n is less than 54 | 2^64. Otherwise optional. 55 | 56 | NOTE: This implies Mu must accept array types of up to 2^64-1 elements. 57 | However, arrays must be in the memory. Whether such an array can be 58 | successfully allocated is a different story. 59 | 60 | A hybrid type ``hybrid`` is required if all fields in ``Fs`` and ``V`` are 61 | implemented. 62 | 63 | The void type ``void`` is required. 64 | 65 | A function type ``funcref`` is required if ``Sig`` is implemented. 66 | 67 | The opaque types ``threadref`` and ``stackref`` are both required. 68 | 69 | The tagged reference type ``tagref64`` is optional. 70 | 71 | A function signature ``R (P0 P1 ...)`` is required if all of its parameter types 72 | and its return type are implemented and there are at most 256 parameters. 73 | Otherwise optional. 74 | 75 | Pointer types ``uptr`` and ``ufuncptr`` are required for required and 76 | native-safe types ``T`` and signatures ``sig``. Both are represented as 77 | integers, but their lengths are implementation-defined. 78 | 79 | Constants 80 | ========= 81 | 82 | Integer constants of type ``int`` is required for all implemented n. 83 | 84 | Float and double constants are required. 85 | 86 | A struct constant is required if constants for all of its fields are 87 | implemented. 88 | 89 | All NULL constants are required. 90 | 91 | Pointer constants are required, but the implementation defines the length of 92 | them. 93 | 94 | Instructions 95 | ============ 96 | 97 | All integer binary operations and comparisons are required for ``int`` of length 98 | 8, 16, 32, 64 and required integer vector types, and optional for other integer 99 | lengths or integer vector types. All floating-point binary operations and 100 | comparisons are required for all floating point types and required floating 101 | point vector types, and optional for other floating point vector types. 102 | 103 | In the event of signed and unsigned integer overflow in binary operations, the 104 | result is truncated to the length of the operand type. 105 | 106 | Divide-by-zero caused by ``UDIV`` and ``SDIV`` results in exceptional control 107 | flows. The result of signed overflow by ``SDIV`` is the left-hand-side. 108 | 109 | NOTE: -0x80000000 / -1 == -0x80000000 for signed 32-bit int. 
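    More generally, with 32-bit operands (the values are written in C
    notation; this is an illustration of the rules above, not Mu syntax)::

        0x7fffffff + 1    // ADD: 0x80000000, the result is truncated to 32 bits
        0x80000000 / -1   // SDIV: 0x80000000, signed overflow yields the left-hand side
        1 / 0             // UDIV/SDIV: takes the exceptional control flow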
110 | 111 | For shifting instructions ``SHL``, ``LSHR`` and ``ASHR`` for integer type 112 | ``int``, only the lowest ``m`` bits of the right-hand-side are used, where 113 | ``m`` is the smallest integer that ``2^m`` >= ``n``. 114 | 115 | Conversions instructions are required between any two implemented types that can 116 | be converted. Specifically, given two types T1 and T2 and a conversion operation 117 | CONVOP, if both T1 and T2 are implemented and they satisfied the requirement of 118 | the ``T1`` and ``T2`` parameters of CONVOP (see ``__), then the 119 | CONVOP operation converting from T1 to T2 is required. 120 | 121 | Binary floating point operations round to nearest and round ties to even. 122 | Conversions involving floating point numbers round towards zero except 123 | converting from floating point to integer, in which case, round towards zero and 124 | the range is clamped to that of the result type and NaN is converted to 0. 125 | Binary operations, comparisons and conversions involving floating point numbers 126 | never raise exceptions or hardware traps. *[JVM behaviour]* 127 | 128 | Switch is required for the operand type of ``int`` of length 8, 16, 32 and 64. 129 | Otherwise optional. 130 | 131 | Calling a function whose value is ``NULL`` is undefined behaviour. 132 | 133 | Throwing exceptions across native frames has implementation-defined behaviour. 134 | 135 | Stack overflow when calling a function results in taking the exceptional control 136 | flow and the exceptional parameter instruction receives ``NULL``. 137 | 138 | ``EXTRACTELEMENT`` and ``INSERTELEMENT`` is required for all implemented vector 139 | types and integer types. ``SHUFFLEELEMENT`` is required if the source vector 140 | type, the mask vector type and the result vector type are all implemented. 141 | 142 | All memory allocation instructions ``NEW``, ``NEWHYBRID``, ``ALLOCA`` and 143 | ``ALLOCAHYBRID`` are allowed to result in error, in which case the exceptional 144 | control flow is taken. 145 | 146 | NOTE: This is for out-of-memory error and other errors. 147 | 148 | The ``GETELEMIREF`` and ``SHIFTIREF`` instructions accept any integer type as 149 | the index or offset type. The index and the offset are treated as signed. When 150 | these two instructions result in a reference beyond the size of the actual array 151 | in the memory, they have undefined behaviours. 152 | 153 | ``GETVARPARTIREF`` has undefined behaviour if the hybrid has zero elements in 154 | its variable part. 155 | 156 | All memory addressing instructions ``GETIREF``, ``GETFIELDIREF``, 157 | ``GETELEMIREF``, ``SHIFTIREF`` and ``GETVARPARTIREF`` give 158 | undefined behaviour when applied to ``NULL`` references. But when applied to 159 | pointers, these instructions calculates the result by calculating the offset 160 | according to the memory layout, which is implementation-defined. 161 | 162 | All memory access instructions ``LOAD``, ``STORE``, ``CMPXCHG`` and 163 | ``ATOMICRMW`` that access the ``NULL`` references take the exceptional control 164 | flow. If they access an invalid memory location (this include the case when the 165 | stack frame that contains a stack cell created by ``ALLOCA`` is popped and a 166 | reference to it becomes a dangling reference), then they have undefined 167 | behaviours. 168 | 169 | Accessing a memory location which represents a type different from the type 170 | expected by the instruction gives undefined behaviour. 
171 | 
172 | Accessing the memory via a pointer behaves as if accessing via an ``iref`` if
173 | the byte region represented by the pointer overlaps with the mapped (pinned)
174 | memory region of the Mu memory location. It behaves as if updating the memory
175 | byte-by-byte (not atomically) when all bytes in the byte region pointed to by the
176 | pointer are part of some Mu memory locations. Otherwise such memory access has
177 | implementation-defined behaviour.
178 | 
179 | The following types are required for both non-atomic and atomic ``LOAD`` and
180 | ``STORE`` for all implemented ``T`` and ``Sig``: ``int<8>``, ``int<16>``,
181 | ``int<32>``, ``int<64>``, ``float``, ``double``, ``ref<T>``, ``iref<T>``,
182 | ``weakref<T>``, ``funcref<Sig>``, ``threadref``, ``stackref``, ``uptr<T>`` and
183 | ``ufuncptr<Sig>``.
184 | 
185 | The following types are required for non-atomic ``LOAD`` and ``STORE``:
186 | ``vector<int<32> 4>``, ``vector<float 4>`` and ``vector<double 2>``.
187 | 
188 | ``int<32>``, ``int<64>``, ``ref<T>``, ``iref<T>``, ``weakref<T>``, ``funcref<Sig>``,
189 | ``threadref``, ``stackref``, ``uptr<T>`` and ``ufuncptr<Sig>`` are required for
190 | ``CMPXCHG`` and the ``XCHG`` operation of the ``ATOMICRMW`` instruction.
191 | 
192 | ``int<32>`` and ``int<64>`` are required for all ``ATOMICRMW`` operations.
193 | 
194 | If ``tagref64`` is implemented, it is required for both atomic and non-atomic
195 | ``LOAD`` and ``STORE``, and the ``XCHG`` operation of the ``ATOMICRMW``
196 | instruction.
197 | 
198 | Other types are optional for ``CMPXCHG`` and any subset of ``ATOMICRMW``
199 | operations.
200 | 
201 | One atomic Mu instruction does not necessarily correspond to exactly one
202 | machine instruction. Some atomic read-modify-write operations may therefore be
203 | implemented using ``CMPXCHG`` or load-linked/store-conditional constructs.
204 | 
205 | ``CCALL`` is required, but the behaviour is implementation-defined. The
206 | available calling conventions are implementation-defined.
207 | 
208 | ``@uvm.new_stack`` and ``NEWTHREAD`` are allowed to result in errors, in which
209 | case the exceptional control flow is taken.
210 | 
211 | The availability of ``COMMINST`` is specified in the next section.
212 | 
213 | In any case where an error occurs and control is expected to transfer
214 | to the exceptional control flow, but the exception clause is not supplied,
215 | the behaviour is undefined.
216 | 
217 | All instructions whose availability is not explicitly specified above are
218 | required for all types and signatures that are implemented and suitable.
219 | 
220 | Common Instructions
221 | ===================
222 | 
223 | Required only when ``tagref64`` is implemented:
224 | 
225 | * @uvm.tr64.is_fp
226 | * @uvm.tr64.is_int
227 | * @uvm.tr64.is_ref
228 | * @uvm.tr64.to_fp
229 | * @uvm.tr64.to_int
230 | * @uvm.tr64.to_ref
231 | * @uvm.tr64.to_tag
232 | * @uvm.tr64.from_fp
233 | * @uvm.tr64.from_int
234 | * @uvm.tr64.from_ref
235 | 
236 | All other common instructions are always required.
237 | 
238 | The Mu implementation may add common instructions.
239 | 
240 | .. vim: tw=80
241 | 
--------------------------------------------------------------------------------
/scripts/extract_comminst_macros.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | 
3 | """
4 | Extract comminst definitions from common-insts.rest into C macros.
5 | 6 | USAGE: python3 script/extract_comminst_macros.py < common-inst.rest 7 | """ 8 | 9 | import re 10 | import sys 11 | 12 | pat = re.compile(r'\[(0x[0-9a-f]+)\]@([a-zA-Z0-9_.]+)', re.MULTILINE) 13 | 14 | defs = [] 15 | longest = 0 16 | 17 | text = sys.stdin.read() 18 | 19 | for opcode, name in pat.findall(text): 20 | macro_name = "MU_CI_" + name.upper().replace(".", "_") 21 | opcode = opcode.upper() 22 | defs.append((macro_name, opcode)) 23 | longest = max(longest, len(macro_name)) 24 | 25 | for macro_name, opcode in defs: 26 | print("#define {} {}".format(macro_name.ljust(longest), opcode)) 27 | 28 | -------------------------------------------------------------------------------- /scripts/muapiparser.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse the muapi.h so that you can generate different bindings. 3 | 4 | The result will be a simple JSON object (dict of dicts). 5 | """ 6 | 7 | import re 8 | 9 | import injecttools 10 | 11 | r_commpragma = re.compile(r'///\s*MUAPIPARSER:(.*)$') 12 | r_comment = re.compile(r'//.*$', re.MULTILINE) 13 | r_decl = re.compile(r'(?P\w+\s*\*?)\s*\(\s*\*\s*(?P\w+)\s*\)\s*\((?P[^)]*)\)\s*;\s*(?:///\s*MUAPIPARSER\s+(?P.*)$)?', re.MULTILINE) 14 | r_param = re.compile(r'\s*(?P\w+\s*\*?)\s*(?P\w+)') 15 | 16 | r_define = re.compile(r'^\s*#define\s+(?P\w+)\s*\(\((?P\w+)\)(?P\w+)\)\s*$', re.MULTILINE) 17 | 18 | r_typedef = re.compile(r'^\s*typedef\s+(?P\w+\s*\*?)\s*(?P\w+)\s*;', re.MULTILINE) 19 | 20 | r_struct_start = re.compile(r'^struct\s+(\w+)\s*\{') 21 | r_struct_end = re.compile(r'^\};') 22 | 23 | def filter_ret_ty(text): 24 | return text.replace(" ","") 25 | 26 | def extract_params(text): 27 | params = [] 28 | for text1 in text.split(','): 29 | ty, name = r_param.search(text1).groups() 30 | ty = ty.replace(" ",'') 31 | params.append({"type": ty, "name": name}) 32 | 33 | return params 34 | 35 | def extract_pragmas(text): 36 | text = text.strip() 37 | if len(text) == 0: 38 | return [] 39 | else: 40 | return text.split(";") 41 | 42 | def extract_methods(body): 43 | methods = [] 44 | for ret, name, params, pragma in r_decl.findall(body): 45 | methods.append({ 46 | "name": name, 47 | "params": extract_params(params), 48 | "ret_ty": filter_ret_ty(ret), 49 | "pragmas": extract_pragmas(pragma), 50 | }) 51 | 52 | return methods 53 | 54 | def extract_struct(text, name): 55 | return injecttools.extract_lines(text, (r_struct_start, name), (r_struct_end,)) 56 | 57 | def extract_enums(text, typename, pattern): 58 | defs = [] 59 | for m in r_define.finditer(text): 60 | if m is not None: 61 | name, ty, value = m.groups() 62 | if pattern.search(name) is not None: 63 | defs.append({"name": name, "value": value}) 64 | return { 65 | "name": typename, 66 | "defs": defs, 67 | } 68 | 69 | _top_level_structs = ["MuVM", "MuCtx"] 70 | _enums = [(typename, re.compile(regex)) for typename, regex in [ 71 | ("MuTrapHandlerResult", r'^MU_(THREAD|REBIND)'), 72 | ("MuDestKind", r'^MU_DEST_'), 73 | ("MuBinOptr", r'^MU_BINOP_'), 74 | ("MuCmpOptr", r'^MU_CMP_'), 75 | ("MuConvOptr", r'^MU_CONV_'), 76 | ("MuMemOrd", r'^MU_ORD_'), 77 | ("MuAtomicRMWOptr", r'^MU_ARMW_'), 78 | ("MuCallConv", r'^MU_CC_'), 79 | ("MuCommInst", r'^MU_CI_'), 80 | ]] 81 | 82 | def extract_typedefs(text): 83 | typedefs = {} 84 | for m in r_typedef.finditer(text): 85 | expand_to, name = m.groups() 86 | typedefs[name] = expand_to.replace(" ","") 87 | 88 | return typedefs 89 | 90 | def parse_muapi(text): 91 | structs = [] 92 | 93 | for sn in _top_level_structs: 94 | b = 
extract_struct(text, sn) 95 | methods = extract_methods(b) 96 | structs.append({"name": sn, "methods": methods}) 97 | 98 | enums = [] 99 | 100 | for tn,pat in _enums: 101 | enums.append(extract_enums(text, tn, pat)) 102 | 103 | typedefs = extract_typedefs(text) 104 | 105 | return { 106 | "structs": structs, 107 | "enums": enums, 108 | "typedefs": typedefs, 109 | } 110 | 111 | if __name__=='__main__': 112 | import sys, pprint, shutil 113 | 114 | width = 80 115 | 116 | try: 117 | width, height = shutil.get_terminal_size((80, 25)) 118 | except: 119 | pass 120 | 121 | text = sys.stdin.read() 122 | pprint.pprint(parse_muapi(text), width=width) 123 | 124 | 125 | -------------------------------------------------------------------------------- /threads-stacks.rest: -------------------------------------------------------------------------------- 1 | ================== 2 | Threads and Stacks 3 | ================== 4 | 5 | One unique feature of Mu is the flexible relation between stacks and threads. A 6 | thread can swap between multiple stacks to achieve light-weighted context 7 | switch. This provides support for language features like co-routines and green 8 | threads. 9 | 10 | On the other hand, Mu does allow multiple simultaneous threads. Mu threads, by 11 | design, can be implemented as native OS threads and make use of parallel CPU 12 | resources. Mu also has a memory model. See the `Memory Model `__ 13 | chapter for more information. 14 | 15 | This chapter discusses Mu threads and Mu stacks. In this chapter, "thread" 16 | means "Mu thread" unless explicitly stated otherwise. 17 | 18 | Concepts 19 | ======== 20 | 21 | A **stack** is the context of nested or recursive activations of functions. 22 | 23 | NOTE: "Stack" here means the "control stack", or more precisely the 24 | "context" of execution. On a concrete machine, the context includes not only 25 | the stack, but also the CPU/register states. Mu abstracts the CPU state, 26 | modelling it as part of the state of the stack-top frame. 27 | 28 | A stack has many **frames**, each of which is the context of one function 29 | activation. A frame contains the states of all local variables (parameters and 30 | instructions), the program counter and alloca cells (see `Mu and the Memory 31 | `__). Each frame is associated with a *version* of a function. 32 | 33 | NOTE: Because Mu allows function redefinition, a function may be redefined 34 | by the client, and newly created function activations (newly called 35 | functions) will use the new definition. But any existing function 36 | activations will still use their old definitions, thus a frame is only bound 37 | to a particular version of a function, not just a function. This is very 38 | important because Mu cannot magically translate the state of any old 39 | function activation to a new one. A redefined function may even have 40 | completely different meaning from the old one. Mu allows the client to do 41 | crazy things like redefining a factorial function to a Fibonacci function. 42 | 43 | During on-stack replacement, the Mu client API can tell the client which 44 | version of which function any frame is executing and the value of KEEPALIVE 45 | variables. The client is responsible for translating the Mu-level states to 46 | the high-level language states. 47 | 48 | A **thread** is the unit of CPU scheduling. A thread can be **bound** to a 49 | stack, in which case the thread executes using the stack as its context. 
50 | The phrase "bind a stack to a thread" has the same meaning as "bind a thread to 51 | a stack". While a thread is executing on a stack, it changes the state of the 52 | stack, including changing the value of local variables by executing 53 | instructions, pushing or popping frames and allocating memory on the stack. 54 | 55 | A stack can be bound to at most one thread at any moment. 56 | 57 | A thread is always bound to one stack, with one exception: when executing a 58 | ``TRAP`` or ``WATCHPOINT`` instruction, it is temporarily unbound from its 59 | current stack. It either rebinds to a stack (may be the old stack or another 60 | stack) or terminates after returning from the trap handler. 61 | 62 | TODO: https://github.com/microvm/microvm-meta/issues/42 Extend the unbinding 63 | to undefined function handling. 64 | 65 | State of Threads 66 | ================ 67 | 68 | The state of a thread include: 69 | 70 | - the *stack* it is bound to (explained in this chapter) 71 | 72 | - a *thread-local object reference* (see below) 73 | 74 | - a thread-local *pinning multi-set* (see `object pinning 75 | `__) 76 | 77 | NOTE: Implementations may keep more thread-local states, such as the 78 | thread-local memory pool for the garbage collector. They are implementation 79 | details. 80 | 81 | The **thread-local object reference** is an arbitrary object reference, and can 82 | be ``NULL``. It is initialised when a thread is created. It can be read and 83 | modified by the thread itself. It can also be read and modified by the client in 84 | the trap handler, but the trap handler can only read and modify the thread-local 85 | object reference of the thread that triggered the trap. It cannot be read or 86 | modified in any other ways. 87 | 88 | NOTE: This design ensures that: 89 | 90 | 1. The access to the thread-local object reference itself is data-race-free. 91 | 92 | 2. It is only a single object reference, so the reference can fit in a 93 | machine register. In this way, if the implementation reserves a register 94 | for that reference, accessing fields in the object it refers to can be as 95 | efficient as register-indexed addressing. 96 | 97 | It also off-loads the responsibility of resizing or redefining the 98 | thread-local object to the client. If the client wishes to add more fields 99 | to that object (e.g. when more bundles are loaded), it can use watchpoints 100 | to stop existing threads and replace their thread-local object references in 101 | the trap handler. 102 | 103 | States of Stacks and Frames 104 | =========================== 105 | 106 | At any moment, **the state of a frame** is one of the following: 107 | 108 | READY 109 | (Ts = T1 T2 T3 ..., a list of types) The frame is ready to resume when 110 | values of types *T1 T2 T3 ...* are supplied. *Ts* can be an empty list. 111 | 112 | ACTIVE 113 | The current frame is the top of a stack and a thread is executing on the 114 | stack. 115 | 116 | DEAD 117 | The frame is dead. 118 | 119 | **The state of a stack** is the state of its top frame. In a bound stack, the 120 | top frame is in the **ACTIVE** state while all other frames are in the 121 | **READY** state; in an unbound stack, all frames are in the **READY** 122 | state, where *Ts* are specific to each frame. When killing a stack, all of its 123 | frames enter the **DEAD** state. 
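For example, a frame paused at a ``TRAP`` is in the **READY** state with the
*Ts* declared by that trap. The sketch below assumes the text-form ``TRAP``
syntax of the `instruction set <instruction-set.rest>`__ chapter; the names are
illustrative::

    // While the client's trap handler runs, the frame containing this
    // instruction is READY<@i64>: rebinding a thread to this stack must
    // either pass one @i64 value (which becomes %x) or raise an exception.
    %x = TRAP <@i64> KEEPALIVE (%a %b)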
124 | 125 | Calling, returning and exception throwing instructions change the state of 126 | frames, but since the stack is always running on the same stack, the state of 127 | the stack remains to be **ACTIVE**. ``CALL`` and ``CCALL`` change the state of 128 | the caller frame to **READY** where *Ts* is the return type of the callee, 129 | but a new frame is created for the callee, entering the **ACTIVE** state 130 | immediately. ``RET`` and ``THROW`` remove frames from the top of the current 131 | stack, but resume a lower frame and change its state to **ACTIVE**. 132 | 133 | Operations on remote stacks can change the state of stacks. The table below 134 | summarises important operations: 135 | 136 | ======================= =============================== ======================= ====================== 137 | Operation Current Stack New/Destination Stack Affected frames 138 | ======================= =============================== ======================= ====================== 139 | create new stack N/A READY creates new frame 140 | create new thread N/A READY -> ACTIVE top 141 | SWAPSTACK ACTIVE -> READY or DEAD READY -> ACTIVE both top frames 142 | killing a stack N/A READY -> DEAD all frames 143 | @uvm.thread_exit ACTIVE -> DEAD N/A all frames 144 | trap to client ACTIVE -> READY N/A top 145 | popping a frame READY -> READY N/A removes top frame 146 | pushing a frame READY -> READY N/A creates new frame 147 | ======================= =============================== ======================= ====================== 148 | 149 | Stack and Thread Creation 150 | ========================= 151 | 152 | Mu stacks and Mu threads can be created by Mu instructions ``@uvm.new_stack`` 153 | and ``NEWTHREAD``, or the API function ``new_stack`` and ``new_thread``. 154 | 155 | When a stack is created, a Mu function must be provided. The stack will contain 156 | a frame created for the current version of the function (as seen by the current 157 | thread because of concurrency and the memory model). This frame is called the 158 | **stack-bottom frame** and the function is called the **stack-bottom function**. 159 | 160 | NOTE: The stack-bottom frame is conceptually the last frame in a Mu stack 161 | and returning from that frame has undefined behaviour. But a concrete Mu 162 | implementation can still have its own frames or useful data below the 163 | stack-bottom frame. They are implementation-specific details. 164 | 165 | The stack-bottom frame (and also the stack) is in the **READY** state, where 166 | Ts are the parameter types of the stack-bottom function. The resumption point is 167 | the beginning of the function version. 168 | 169 | When a thread is created, a stack must be provided as its **initial stack**. 170 | Creating a thread binds the thread to the stack, passing values or raising 171 | exception to it (explained later), thus the top frame will enter the **ACTIVE** 172 | state after the thread is created. A newly created thread starts execution 173 | immediately. 174 | 175 | NOTE: Unlike Java, there is not a separate step to "start" a thread. A 176 | thread starts when it is created. 177 | 178 | Thread Termination 179 | ================== 180 | 181 | A thread is terminated when it executes the ``@uvm.thread_exit`` instruction, or 182 | the client orders the current thread to terminate in a trap handler. 183 | 184 | The ``@uvm.thread_exit`` instruction kills the current stack of the current 185 | thread. 
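Putting stack creation, thread creation and thread termination together, a
rough lifecycle sketch looks like the following. It assumes a function
``@worker`` with signature ``@worker_sig = (@i64) -> ()``; the exact clause
keywords of ``NEWTHREAD`` and ``COMMINST`` are those defined in the
`instruction set <instruction-set.rest>`__ chapter::

    // Create a stack whose stack-bottom function is @worker. The new stack
    // is READY<@i64>: it expects one @i64 value.
    %s = COMMINST @uvm.new_stack <[@worker_sig]> (@worker)

    // Create a thread bound to that stack, passing the expected value. The
    // top frame becomes ACTIVE and the new thread starts running at once;
    // it runs until @worker executes @uvm.thread_exit (or traps, swaps
    // stacks, and so on).
    %t = NEWTHREAD %s PASS_VALUES <@i64> (%arg)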
186 | 187 | Mu may change the value of ``threadref`` type to ``NULL`` if the thread it 188 | refers to is terminated. 189 | 190 | Binding of Stack and Thread 191 | =========================== 192 | 193 | Binding 194 | ------- 195 | 196 | Some actions, including the ``NEWTHREAD`` and the ``SWAPSTACK`` instruction, the 197 | ``new_thread`` API function and the trap handler, can bind a thread to a stack. 198 | 199 | When **binding** a thread to a stack, the state of its top frame changes from 200 | **READY** to **ACTIVE**. In this process, one of the following two actions 201 | shall be performed on the stack: 202 | 203 | - A binding operation can **pass values** of types *Ts* to the stack. In this 204 | case, the types *Ts* must match the expected types, and the stack **receives 205 | the values**. *Ts* can be an empty list. 206 | 207 | - A binding operation can **raise an exception** to the stack. In this case, the 208 | stack can be in **READY** with any *Ts* and it **receives the exception**. 209 | 210 | It gives undefined behaviour if the stack is not in the expected state. 211 | 212 | Resumption Point 213 | ---------------- 214 | 215 | A frame in the **READY** state has a **resumption point**. The resumption 216 | point determines how the received values and the received exception are 217 | processed when binding to a thread. 218 | 219 | For a Mu frame, the resumption point is either the beginning of a function 220 | version, or an OSR point instruction in the function version. 221 | 222 | - In the former case, the *Ts* in **READY** are the parameters of the 223 | function (also the entry block). Received values are bound to the parameters 224 | and the execution continues from the beginning of the entry block. Received 225 | exception is re-thrown. 226 | 227 | - In the latter case, the *Ts* types are determined by the concrete 228 | instructions. Specifically, *Ts* are the return types for ``CALL`` and 229 | ``CCALL``, and are explicitly specified for ``TRAP``, ``WATCHPOINT`` and 230 | ``SWAPSTACK``. The received values are bound to the results of the OSR point 231 | instruction. The received exception is handle by the instruction or re-thrown 232 | depending on the instruction. 233 | 234 | Undefined Mu functions behaves as defined in `Mu IR `__. 235 | 236 | Native frames can only enter the **READY** state when it calls back to Mu. 237 | Thus the resumption point is where it will continue after the native-to-Mu call 238 | returns. The received values are the return values to the native function. 239 | Throwing exceptions to native frames has implementation-defined behaviour. 240 | 241 | Mu gives the client a unified model of stack binding. The binding operation is 242 | only aware of the *Ts* types of the **READY** state, but oblivious of the 243 | resumption point. Therefore it can resume any **READY** stack in the same 244 | way, whether the resumption point is the beginning of a function, a call site, a 245 | trap or a swap-stack instruction, or a native frame. 246 | 247 | Note to the Mu implementers: swap-stack, Mu-to-Mu calls and Mu-native calls 248 | may all have different calling conventions, but the implementation must 249 | present a unified "resumption protocol" to the client: all stack-binding 250 | operations work on all OSR point instructions, as long as the *Ts* in 251 | *READY* match the passed values. In practice, some "adapter" frames may 252 | need to be inserted to convert one convention to another, but these frames 253 | must not be seen by the client. 
This implies that some API functions 254 | (especially stack introspection) must "lie" to the client about the presence 255 | of such frames. 256 | 257 | For example, on x86_64, assume ``SWAPSTACK`` passes values to the other 258 | stack via rdi and rsi, but ``RET`` returns values to the caller via rax and 259 | rdx. If, during OSR, a frame pausing on the ``CALL`` instruction becomes the 260 | top frame of a stack, then the Mu implementation must also create some glue 261 | code and an adapter frame above that frame, so that when a thread SWAP-STACK 262 | to this stack, values passed in rdi and rsi can be moved to rax and rdx, 263 | respectively. This "adapter" frame must also recover callee-saved registers. 264 | 265 | Unbinding 266 | --------- 267 | 268 | Some actions, including the ``@uvm.thread_exit``, ``TRAP``, ``WATCHPOINT`` and 269 | the ``SWAPSTACK`` instruction, can unbind a thread from a stack. 270 | 271 | When **unbinding** a thread from a stack, one of the following two actions shall 272 | be performed on the stack: 273 | 274 | An unbinding operation can **leave the stack** with a return types *Ts*. In this 275 | case, the state of its top frame changes from **ACTIVE** to **READY** for 276 | some given *Ts*. The instruction becomes the resumption point of the frame. 277 | 278 | An unbinding operation can **kill the stack**. In this case, the state of all 279 | frames of the stack changes from **ACTIVE** to **DEAD**. Specifically the 280 | ``@uvm.thread_exit`` kills the current stack and the ``SWAPSTACK`` instruction 281 | can do either option on the swapper. 282 | 283 | Executing a ``TRAP`` or an enabled ``WATCHPOINT`` instruction implies an 284 | unbinding operation, leaving the top frame in a **READY** state. 285 | 286 | Swap-stack 287 | ---------- 288 | 289 | **Swap-stack** is an operation that unbinds a thread from a stack and rebind 290 | that thread to a new stack. In a swap-stack operation, the stack to unbind from 291 | is called the **swapper** and the stack to bind to is called the **swappee**. 292 | 293 | The ``SWAPSTACK`` instruction (see ``__) performs a 294 | *swap-stack* operation. 295 | 296 | A trap handler can do similar things as *swap-stack* by re-binding the current 297 | thread to a different stack. 298 | 299 | Stack Destruction 300 | ================= 301 | 302 | The ``@uvm.kill_stack`` instruction, the ``kill_stack`` API function and all 303 | operations that perform unbinding operations can destroy a stack. Destroying a 304 | stack changes the state of all of its frames to **DEAD**. 305 | 306 | If a stack becomes unreachable from roots, the garbage collector may kill the 307 | stack. 308 | 309 | The Mu may change the value of ``stackref`` type to ``NULL`` if the stack it 310 | refers to is in the **DEAD** state. 311 | 312 | Stack Introspection 313 | =================== 314 | 315 | Stacks in the **READY** state can be introspected. Stacks in other states 316 | cannot. 317 | 318 | The stack introspection API uses **frame cursors**. A *frame cursor* is a 319 | mutable opaque structure allocated by Mu. It refers to a Mu frame, and also 320 | keeps implementation-dependent states necessary to iterate through frames in a 321 | stack. 322 | 323 | Note: The reason why it is mutable is that the cursor may be big. The states 324 | to be kept is specific to the implementation. Generally speaking, the more 325 | callee-saved registers there are, the bigger the cursor is. Allocating a new 326 | structure whenever moving down a frame may not scale for deep stacks. 
327 | 328 | The ``new_cursor`` API call allocates a frame cursor that refers to the top 329 | frame of a given stack, and returns a ``framecursorref`` that refers to the 330 | cursor. Then the client can use the ``next_frame`` API to move the cursor to the 331 | frame below. The ``copy_cursor`` copies the given frame cursor. The original 332 | frame cursor and the copied cursor can move down independently. This is useful 333 | when the client wishes to iterate through the stack in different paces. The 334 | ``close_cursor`` API closes the frame cursor and deallocates its resources. 335 | 336 | It has undefined behaviour if a stack is bound to a thread or a stack is killed 337 | while there are frame cursors to its frames not closed. It has undefined 338 | behaviour if ``next_frame`` goes below the bottom frame. 339 | 340 | Note: There are several reasons why it needs explicit closing. 341 | 342 | * It forces the client to avoid racing stack modification and stack 343 | introspection. 344 | 345 | * It will not force the Mu implementation to use a particular way to 346 | allocate such cursors. The Mu implementation can use malloc and free. If 347 | the implementation uses garbage collection for such cursors, it can still 348 | treat the ``close_cursor`` operation as a no-op. 349 | 350 | * An alternative is to close all related cursors automatically when the 351 | stack is re-bound. But that will involve one extra check for every 352 | swap-stack operation, which may be much more frequent than stack 353 | introspection, which is usually only used in exceptional cases. 354 | 355 | The ``cur_func``, ``cur_func_ver``, ``cur_inst`` and ``dump_keepalives`` API 356 | calls take a ``framecursorref`` as argument, and returns the ID of the function, 357 | function version or current instruction of the frame, or dumps the keep-alive 358 | variables of the current instruction of the frame. 359 | 360 | Multiple threads may introspect the stack concurrently as long as there is no 361 | concurrent modification using the on-stack replacement API (see below). However, 362 | it has undefined behaviour to operate on a closed frame cursor. 363 | 364 | These operations can also be performed by their equivalent common instructions 365 | ``@uvm.meta.*``. 366 | 367 | On-stack Replacement 368 | ==================== 369 | 370 | The client can pop and push frames from or to a stack. 371 | 372 | The ``pop_frames_to`` API function takes a ``framecursorref`` as argument. It 373 | will pop all frames above the frame cursor, and the frame of the cursor becomes 374 | the new top frame. 375 | 376 | Popping native frames has implementation-defined behaviour. It has undefined 377 | behaviour if a frame is popped but there are frame cursors referring to that 378 | frame. 379 | 380 | The ``push_frame`` API function takes a ``stackref`` and a ``funcref`` as 381 | arguments. It creates a new frame on the top of the stack, using the current 382 | version (as seen by the current thread) of the given function. The resumption 383 | point is the beginning of the function version. The return types of the function 384 | must match the *Ts* of the state of the previous frame, which must be 385 | **READY**. 386 | 387 | There are equivalent common instructions in the IR, too. 
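The sketch below shows how one introspection-plus-OSR step might look when
expressed with the ``@uvm.meta.*`` common instructions rather than the API. The
exact instruction names and parameter lists are those given in the common
instructions chapter; ``%stack`` and ``%newfunc`` are illustrative::

    %cur = COMMINST @uvm.meta.new_cursor (%stack)        // cursor at the top frame
           COMMINST @uvm.meta.next_frame (%cur)          // move the cursor down one frame
    %fid = COMMINST @uvm.meta.cur_func (%cur)            // ID of that frame's function
           COMMINST @uvm.meta.pop_frames_to (%cur)       // frames above the cursor are popped
           COMMINST @uvm.meta.close_cursor (%cur)        // close before re-binding the stack
           COMMINST @uvm.meta.push_frame (%stack %newfunc)  // push a frame for %newfunc

After such a sequence, the new top frame is **READY** for the parameter types
of ``%newfunc``, and the stack can be bound to a thread again.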
388 | 389 | It has undefined behaviour if 390 | 391 | - there are two API calls or equivalent instructions executed by two threads, 392 | and 393 | - one is ``new_cursor``, ``next_frame``, ``cur_func``, ``cur_func_ver``, 394 | ``cur_inst``, ``dump_keepalives``, ``pop_frames_to`` or ``push_frame``, and 395 | - the other is ``pop_frames_to`` or ``push_frame``, and 396 | - neither happens before the other. 397 | 398 | After popping or pushing frames, the state of the stack become the state of the 399 | new top frame, which must be **READY** for some *Ts*. The stack can be 400 | bound. 401 | 402 | NOTE: For the ease of Mu implementation, the new function must continue 403 | from the beginning rather than an arbitrary instruction in the middle. 404 | Continuing from the middle of a function demands too much power from the 405 | code generator. 406 | 407 | However, in most OSR scenarios, the desired behaviour is to continue from 408 | the point where the program left out for optimisation. The client can 409 | emulate the behaviour of continuing from the middle of a function by 410 | inserting a "prologue" in the high-level language in the beginning of the 411 | function. For example, in C, the client can add extra assignment expressions 412 | to initialise local variables to the previous context and use a goto 413 | statement to jump to the location to continue. Then the optimising compiler 414 | can remove unreachable code. As another example, if the client implements a 415 | JVM, it can insert ``Xstore`` instructions and a ``goto`` instruction to 416 | continue from the appropriate bytecode instruction. The optimising compiler 417 | can handle the rest. 418 | 419 | TODO: With the goto-with-values form defined, we can extend the IR and the 420 | API so that execution can continue from an arbitrary basic block of an 421 | arbitrary function version, rather than just the beginning of a function. 422 | 423 | Futex 424 | ===== 425 | 426 | Mu provides a mechanism similar to the Futex in the Linux kernel for 427 | implementing blocking locks and other synchronisation primitives. 428 | 429 | There is a waiting queue for all memory locations that has some integer types. 430 | (See ``__ for valid candidate types for Futex.) 431 | 432 | The ``@uvm.futex.wait`` and the ``@uvm.futex.wait_timeout`` instructions put the 433 | current thread into the waiting queue of a memory location. Both 434 | ``@uvm.futex.wake`` and ``@uvm.futex.cmp_requeue`` wakes up threads in a waiting 435 | queue of a memory location. 436 | 437 | NOTE: The term *memory location* is defined in Mu's sense and is abstract 438 | over physical memory or the virtual memory space given by the operating 439 | system. Even if a Mu implementation uses copying or replicating garbage 440 | collectors, the memory location in a heap object remains the same until the 441 | object is collected. 442 | 443 | The Mu Futex is designed to be easy to map to the ``futex`` system call on 444 | Linux. With the presence of copying garbage collector, Mu may internally 445 | perform ``FUTEX_REQUEUE`` or ``FUTEX_CMP_REQUEUE`` operations to compensate 446 | the effect of object movements. It may put barriers around Futex-related Mu 447 | instructions when the GC is concurrently re-queuing threads. 448 | 449 | When a thread is blocking on a futex, the state of its stack is still ACTIVE, 450 | making the impression that the thread is still "busy executing" the futex 451 | wait/wait_timeout instructions. 
Only the kernel knows whether it is doing an 452 | OS-level swap-stack, as it always does for context-switching. 453 | 454 | .. vim: tw=80 455 | -------------------------------------------------------------------------------- /type-system.rest: -------------------------------------------------------------------------------- 1 | =========== 2 | Type System 3 | =========== 4 | 5 | Overview 6 | ======== 7 | 8 | Mu has a comprehensive type system. It is close to the machine level, but also 9 | has reference types for exact garbage collection. 10 | 11 | In the Mu IR, a type is created by a (possibly recursive) combination of type 12 | constructors. 13 | 14 | By convention, types are written in lower cases. Parameters to types are written 15 | in angular brackets ``< >``. 16 | 17 | Type and Data Value 18 | =================== 19 | 20 | A Mu **type** defines a set where a **data value**, or **value** when 21 | unambiguous, is one of its elements. 22 | 23 | Both SSA variables and the Mu memory can hold values in this type system. Some 24 | restrictions can limit what type a variable or a memory location can hold. 25 | 26 | Types and Type Constructors 27 | =========================== 28 | 29 | A **type constructor** represents an **abstract type**. A **concrete type** is 30 | created by applying a type constructor and supplying necessary **parameters**. 31 | The following type constructors are available in Mu: 32 | 33 | - **int** < *length* > 34 | - **float** 35 | - **double** 36 | - **uptr** < *T* > 37 | - **ufuncptr** < *sig* > 38 | - **struct** < *T1* *T2* *...* > 39 | - **hybrid** < *F1* *F2* *...* *V* > 40 | - **array** < *T* *length* > 41 | - **vector** < *T* *length* > 42 | - **void** 43 | - **ref** < *T* > 44 | - **iref** < *T* > 45 | - **weakref** < *T* > 46 | - **tagref64** 47 | - **funcref** < *sig* > 48 | - **threadref** 49 | - **stackref** 50 | - **framecursorref** 51 | - **irnoderef** 52 | 53 | .. 54 | 55 | For C programmers: In Mu IR, types cannot be "inlined", i.e. all types 56 | referenced by other definitions (such as other types, constants, globals, 57 | functions, ...) must be defined at top-level. For example:: 58 | 59 | .typedef @refi64 = ref> // WRONG. Cannot write int<64> inside. 60 | 61 | .typedef @i64 = int<64> 62 | .typedef @refi64 = ref<@i64> // Right. 63 | 64 | %sum = FADD %a %b // WRONG. "double" is a type constructor, not a type 65 | 66 | .typedef @double = double 67 | %sum = FADD <@double> %a %b // Right. 68 | 69 | Parameters of a type are in the angular brackets. They can be integer literals, 70 | types and function signatures. In the text form, the latter two are global 71 | names (See ``__). 72 | 73 | There are several kinds of types. 74 | 75 | * ``float`` and ``double`` are **floating point types**. 76 | * ``ref`` and ``weakref`` are **object referenct types**. 77 | * ``ref``, ``iref`` and ``weakref`` are **reference types**. 78 | * ``funcref``, ``threadref``, ``stackref``, ``framecursorref`` and ``irnoderef`` 79 | are **opaque reference types**. 80 | * *Reference types* and *opaque reference types* are **general reference types**. 81 | * ``int``, ``float``, ``double``, *pointer types*, *general reference types* and 82 | ``tagref64`` are **scalar types**. 83 | * ``struct``, ``hybrid``, ``array`` and ``vector`` are **composite types**. 84 | 85 | * ``void`` is neither a *scalar type* nor a *composite type*. 86 | 87 | * ``hybrid`` is the only **variable-length type**. All other types are 88 | **fixed-length types**. 
89 | * ``uptr`` and ``ufuncptr`` are **pointer types**. 90 | * ``int``, *pointer types*, ``ref``, ``iref`` and *opaque reference types* are 91 | **EQ-comparable types**. 92 | * ``int``, ``iref`` and *pointer types* are **ULT-comparable types**. 93 | * ``ref`` is the **strong variant** of ``weakref``; ``weakref`` is the 94 | **weak variant** of ``ref``. All other types are the strong variant or weak 95 | variant of themselves. 96 | 97 | A **member** of a composite type T is either a field of T if T is a struct, or 98 | an element of T if T is an array or a vector, or a field in the fixed part or 99 | any element in the variable part if T is a hybrid. A **component** of type T is 100 | either itself or a member of any component of T. 101 | 102 | NOTE: This means a component is anything in a type, including itself and any 103 | arbitrarily nested members. 104 | 105 | The type parameter *T* in ``uptr`` and the return type and all parameter 106 | types of *sig* in ``ufuncptr`` must be **native-safe**. It is defined as 107 | following: 108 | 109 | * ``void``, ``int``, ``float`` and ``double`` are *native-safe*. 110 | 111 | * ``struct``, ``array``, ``vec`` and ``hybrid`` are native-safe if all of their type arguments *T1*, *T2*, ..., *T*, 113 | *F1*, *F2*, ..., *V* are native-safe. 114 | 115 | * ``uptr`` and ``ufuncptr`` are *native-safe* if *T* and the return type 116 | and all parameter types in *sig* are native-safe. Otherwise the ``uptr`` or 117 | the ``ufuncptr`` type is not well-formed. 118 | 119 | * All other types are not native-safe. (Specifically, they are all *general 120 | reference types* as well as ``struct``, ``array`` or ``hybrid`` that contains 121 | them.) 122 | 123 | Primitive Non-reference Types 124 | ============================= 125 | 126 | Integer Types 127 | ------------- 128 | 129 | ``int`` ``<`` *length* ``>`` 130 | 131 | length 132 | *integer literal*: The length of the integer in bits. 133 | 134 | ``int`` is an integer type of *length* bits. 135 | 136 | ``int`` is neutral to signedness. Negative numbers are represented in the 2's 137 | complement notation where the highest bit is the sign bit. 138 | 139 | ``int<1>`` is a Boolean type, in which case 1 means true and 0 means false. 140 | 141 | NOTE: The signedness of an ``int`` type is determined by the operations 142 | rather than the type. For example, ``UDIV`` treats both operands as unsigned 143 | numbers, ``SDIV`` treats both operands as signed numbers and ``ASHR`` treats 144 | the first operand as signed and the second operand as unsigned. 145 | 146 | .. 147 | 148 | NOTE: Although ``int<1>`` is required and ``int<6>`` and ``int<52>`` are 149 | also required when ``tagref64`` is implemented, they should not be part of 150 | any in-memory structure because their corresponding ``LOAD`` and ``STORE`` 151 | operations are not required for Mu implementations. ``int<1>`` is meant to 152 | represent register flags, such as the result of comparison and some overflow 153 | or carry flags. ``int<6>`` and ``int<52>`` are supposed to be used 154 | transiently when packing or unpacking a ``tagref64`` value. 155 | 156 | .. 157 | 158 | For LLVM users: these types are directly borrowed from LLVM. 159 | 160 | .. 
161 | 162 | Example:: 163 | 164 | .typedef @i1 = int<1> 165 | .typedef @i8 = int<8> 166 | .typedef @i16 = int<16> 167 | .typedef @i32 = int<32> 168 | .typedef @i64 = int<64> 169 | 170 | Floating Point Types 171 | -------------------- 172 | 173 | ``float`` 174 | 175 | ``double`` 176 | 177 | ``float`` and ``double`` are the IEEE754 single-precision and double-precision 178 | floating point number type, respectively. 179 | 180 | For LLVM users: these types are directly borrowed from LLVM. 181 | 182 | .. 183 | 184 | Example:: 185 | 186 | .typedef @float = float 187 | .typedef @double = double 188 | 189 | Pointer Types 190 | ------------- 191 | 192 | ``uptr < T >`` 193 | 194 | ``T`` 195 | *type*: The type of the referent. 196 | 197 | ``ufuncptr < sig >`` 198 | 199 | ``sig`` 200 | *function signature*: The signature of the pointed function. 201 | 202 | ``uptr`` and ``ufuncptr`` are (untraced) pointer types which are represented by 203 | the integral address of the referent. They are part of the (unsafe) native 204 | interface. The "u" in their names stand for "untraced". Their values are not 205 | affected by the garbage collection, even if their addresses are obtained from 206 | pinning heap objects which are later unpinned. 207 | 208 | ``uptr`` is the data pointer type. It points to a region in the native address 209 | space which represents the data type *T*. 210 | 211 | ``ufuncptr`` is the function pointer type. It points to a native function whose 212 | signature is *sig*. 213 | 214 | The type parameter *T* and both the return types and the parameter types of 215 | *sig* must be *native-safe*. It is implementation-defined whether multiple 216 | return values are allowed for a particular calling convention. 217 | 218 | For LLVM users: ``uptr`` is the counterpart of pointer types ``T*``. 219 | ``ufuncptr`` is the counterpart of function pointers ``R (P1 P2 ...)*``. 220 | The ``PTRCAST`` instruction can cast between different pointer types as well 221 | as integer types. 222 | 223 | For C users: Similar to LLVM, ``uptr`` and ``ufuncptr`` are the equivalent 224 | of C pointers to objects and functions, respectively. However, since Mu 225 | interfaces with the native world at the ABI level rather than the C 226 | programming language level, pointers are defined as addresses and casting 227 | between pointers and integers has semantics. 228 | 229 | .. 230 | 231 | Example:: 232 | 233 | .typedef @i32 = int<32> // int 234 | .typedef @i32_p = uptr<@i32> // int* 235 | 236 | // ssize_t write(int fildes, const void *buf, size_t nbyte); 237 | // See man (2) write. 238 | .typedef @void = void // void 239 | .typedef @void_p = uptr<@void> // void* 240 | .typedef @size_t = int<64> // size_t, ssize_t 241 | .funcsig @write_s = (@i32 @void_p @size_t) -> (@size_t) 242 | .typedef @write_fp = ufuncptr<@write_s> 243 | // @write_fp may point to the native function "write". 244 | 245 | // typedef void (*sig_t) (int); 246 | // sig_t signal(int sig, sig_t func); 247 | // See man (3) signal. 248 | .funcsig @sig_s = (@i32) -> () 249 | .typedef @sig_t = ufuncptr<@sig_s> 250 | .funcsig @signal_s = (@i32 @sig_t) -> (@sig_t) 251 | .typedef @signal_fp = ufuncptr<@signal_s> 252 | // @signal_fp may point to the native function "signal". 253 | 254 | Aggregate Types 255 | =============== 256 | 257 | Struct 258 | ------ 259 | 260 | ``struct`` ``<`` *T1* *T2* *...* ``>`` 261 | 262 | T1, T2, ... 263 | *type*: The type of fields. 264 | 265 | A ``struct`` is a Cartesian product type of several types. 
*T1*, *T2*, *...* are 266 | its **field types**. A ``struct`` must have at least one member. ``struct`` 267 | members cannot be ``void``. 268 | 269 | NOTE: For C programs: C does not allow empty structures, either, but many 270 | programmers create empty structures in practice. C++ does allow empty 271 | classes. g++ treats empty classes as having one ``char`` element. In Mu, if 272 | it is desired to allocate an empty unit in the heap, the appropriate type is 273 | ``void``. 274 | 275 | A ``struct`` cannot have itself as a component. 276 | 277 | NOTE: If it could, the struct would be infinitely large. However, a struct 278 | may contain a reference to the same struct type, since the size of 279 | references is not dictated by the thing it points to. For example:: 280 | 281 | .typedef @foo = struct <@i32 @foo> // WRONG. @foo is infinitely big. 282 | 283 | .typedef @foo = struct <@i32 @fooref> 284 | .typedef @fooref = ref<@foo> // Okay. It is a linked list. 285 | 286 | ``struct`` cannot be the type of an SSA variable if any of its field types 287 | cannot be the type of an SSA variable. 288 | 289 | .. 290 | 291 | NOTE: For example, a ``struct`` with a ``weakref`` field cannot be the type 292 | of an SSA variable. However, there can be references to such structs. 293 | 294 | .. 295 | 296 | For LLVM users: This is the same as LLVM's structure type, except structures 297 | with a "flexible array member" (a 0-length array as the last element) 298 | corresponds to the ``hybrid`` type in Mu. 299 | 300 | .. 301 | 302 | Example:: 303 | 304 | .typedef @byte = int<8> 305 | .typedef @short = int<16> 306 | .typedef @int = int<32> 307 | .typedef @f = float 308 | .typedef @d = double 309 | 310 | .typedef @struct1 = struct<@byte @short @int @f @d> 311 | .typedef @struct2 = struct<@f @f @struct1 @d @d> // nesting structs 312 | 313 | Hybrid 314 | ------ 315 | 316 | ``hybrid`` ``<`` *F1* *F2* *...* *V* ``>`` 317 | 318 | F1, F2, ... 319 | *list of types*: The types in the fixed part 320 | V 321 | *type*: The type of the elements of the variable part 322 | 323 | A hybrid is a combination of a fixed-length prefix, i.e. its ``fixed part``, and 324 | a variable-length array suffix, i.e. its ``variable part``, whose length is 325 | decided at allocation time. *F1* *F2* ... are the types of fields in the fixed 326 | part. *V* is the type of the *elements* of the variable part. 327 | 328 | NOTE: This is intended to play the part of "struct with flexible array 329 | member" in C99, i.e. ``struct { F1 f1; F2 f2; ... V v[]; }``. 330 | 331 | The fixed part may contain 0 fields. In this case, the fixed part is empty, and 332 | the variable part is in the beginning of this ``hybrid`` memory location, like a 333 | variable-length array without a header. The variable part cannot be omitted. 334 | Neither any fixed-part field nor *V* can be ``void``. 335 | 336 | ``hybrid`` cannot be the type of any SSA variable. 337 | 338 | NOTE: There can be references to hybrids. 339 | 340 | ``hybrid`` is the only type in Mu whose length is determined at allocation site 341 | rather than determined by the type itself. 342 | 343 | ``hybrid`` cannot be contained in any other composite types, including other 344 | hybrids. 345 | 346 | NOTE: Since the length of hybrids are only known at allocation time, 347 | allowing embedding hybrid members will make the size of other types 348 | variable. In Mu's design, hybrid is the only type whose length is determined 349 | at allocation site. 
350 | 351 | For C programmers: Just like ``struct { F f; V v[]; }`` cannot be embedded 352 | in other types, ``hybrid`` cannot, either. However, pointers/references to 353 | hybrids are allowed. 354 | 355 | .. 356 | 357 | Example:: 358 | 359 | .typedef @byte = int<8> 360 | .typedef @long = int<64> 361 | .typedef @double = double 362 | 363 | .typedef @struct1 = struct<@long @long @long> 364 | 365 | .typedef @hybrid1 = hybrid<@long @byte> // one int<64> followed by many int<8> 366 | .typedef @hybrid2 = hybrid<@long @long @long @double> // three int<64>, followed by many double 367 | .typedef @hybrid3 = hybrid<@struct1 @double> // similar to @hybrid2, but using struct as the header 368 | .typedef @hybrid4 = hybrid<@byte> // no fixed-part header. Just many int<8>. 369 | 370 | Array 371 | ----- 372 | 373 | ``array`` ``<`` *T* *length* ``>`` 374 | 375 | T 376 | *type*: The type of elements. 377 | length 378 | *integer literal*: The number of elements. 379 | 380 | An ``array`` is a sequence of values of the same type. *T* is its **element 381 | type**, i.e. the type of its elements, and *length* is the length of the array. 382 | *T* must not be ``void``. An array must have at least one element. 383 | 384 | An ``array`` cannot have itself as a component. 385 | 386 | It is not recommended to have SSA variables of ``array`` type. 387 | 388 | NOTE: The most useful feature of arrays is indexing by a variable index. 389 | But an SSA variable has more in common with registers than memory, and SSA 390 | variables are designed to be allocated in registers when possible. Mu 391 | implementations, as supposed to be minimal, may not be able to implement 392 | indexing more efficiently than storing the array to the memory and load 393 | back and element. 394 | 395 | Using arrays as an SSA variable is most useful when passing a ``struct`` 396 | value (not pointer) that contains an array member to a C function that 397 | requires such a parameter, although such C functions are, themselves, very 398 | rare. 399 | 400 | .. 401 | 402 | For LLVM users: Like LLVM arrays, a Mu array must have a size, but cannot 403 | have size 0. The closest counterpart of the "variable length array" (VLA) 404 | type in C is the ``hybrid`` type. 405 | 406 | .. 407 | 408 | Example:: 409 | 410 | .typedef @u8 = int<8> 411 | .typedef @real = double 412 | .typedef @cmpx = struct<@real @real> 413 | 414 | .typedef @array1 = array<@u8 4096> // array of 4096 bytes 415 | .typedef @array2 = array<@real 100> // array of 100 doubles 416 | .typedef @array3 = array<@cmpx 16> // array of 16 structs 417 | .typedef @array4 = array<@array2 1024> // array of 1024 nested arrays 418 | 419 | Vector Type 420 | ----------- 421 | 422 | ``vector < T length >`` 423 | 424 | ``T`` 425 | *type*: The type of elements. 426 | ``length`` 427 | *integer literal*: The number of elements. 428 | 429 | ``vector`` is the vector type for single-instruction multiple-data (SIMD) 430 | operations. A ``vector`` value is a packed value of multiple values of the same 431 | type. *T* is the type of its elements and *length* is the number of elements. 432 | *T* cannot be void. *length* must be at least one. 433 | 434 | It is allowed to have SSA variables of vector types. 435 | 436 | Only some primitive element types, such as ``int<32>``, ``float`` and 437 | ``double``, are `required `__ for implementations. If the 438 | implementation allows other types, then any vector cannot directly or indirectly 439 | contain itself as a member. 
440 | 441 | For LLVM users: This is the counterpart of the LLVM vector type. 442 | 443 | .. 444 | 445 | Example:: 446 | 447 | .typedef @i32 = int<32> 448 | .typedef @float = float 449 | .typedef @double = double 450 | 451 | .typedef @vector1 = vector<@i32 4> 452 | .typedef @vector2 = vector<@float 4> 453 | .typedef @vector3 = vector<@double 2> 454 | .typedef @vector4 = vector<@double 4> 455 | 456 | Void Type 457 | ========= 458 | 459 | ``void`` 460 | 461 | The ``void`` type has no value. 462 | 463 | It can only be used as the type of allocation units that do not store any 464 | values. This allows allocating ``void`` in the heap/stack/global memory. 465 | Particularly, the ``NEW`` instruction with the type ``void`` creates a new empty 466 | heap object which is not the same as any others. This is similar to the ``new 467 | Object()`` expression in Java. ``ref``, ``iref``, ``weakref`` 468 | and ``uptr`` are also allowed, which can refer/point to "anything". 469 | 470 | 471 | Reference Types and General Reference Types 472 | =========================================== 473 | 474 | Reference Types 475 | --------------- 476 | 477 | ``ref`` ``<`` *T* ``>`` 478 | 479 | ``iref`` ``<`` *T* ``>`` 480 | 481 | ``weakref`` ``<`` *T* ``>`` 482 | 483 | T 484 | *type*: The type of referent. 485 | 486 | ``ref`` is an object reference type. A ``ref`` value is a strong reference to a 487 | heap objects. 488 | 489 | ``iref`` is an internal reference type. An ``iref`` value is an internal 490 | reference to a memory location. 491 | 492 | ``weakref`` is a weak object reference type. It is the weak variant of ``ref``. 493 | A memory location of ``weakref`` holds a weak reference to a heap object and can 494 | be clear to ``NULL`` by the garbage collector when there is no strong references 495 | to the object the ``weakref`` value refers to. 496 | 497 | NOTE: There is no weak internal reference. 498 | 499 | The type parameter ``T`` is the referent type, which is the type of the heap 500 | object or memory location its value refers to. 501 | 502 | All reference types can have ``NULL`` value which does not refer to any heap 503 | object or memory location. 504 | 505 | Weakref can only be the type of a memory location, not an SSA variable. When a 506 | ``weakref`` location is loaded from, the result is a ``ref`` to the same object; 507 | when a ``ref`` is stored to a ``weakref`` location, the location holds a 508 | ``weakref`` to that object. 509 | 510 | .. 511 | 512 | NOTE: Allowing SSA variables to hold weak references may cause many 513 | problems. The semantic allows the GC to change it to ``NULL`` at any time 514 | as long as the GC decides the referent object is no longer reachable. For 515 | this reason, it is impossible to guarantee a weak reference is ``NULL`` 516 | before accessing. Consider the following program:: 517 | 518 | %entry(): 519 | %notnull = NE <@RefT> %weakref @NULLREF 520 | // Just at this moment, GC changed %weakref to NULL 521 | BRANCH2 %notnull %bb_cont(%weakref) %bb_abnormal(...) 522 | 523 | %bb_cont(<@WeakRefT> %weakref): 524 | %val = LOAD <@T> %weakref // null reference access 525 | ... 526 | 527 | GC may clear the weak reference right after the program decided it is not 528 | ``NULL``. 529 | 530 | Requiring an explicit conversion from ``weakref`` to ``ref`` is not 531 | very useful. In that case, the only operation allowed for ``weakref`` is 532 | to convert to ``ref``. 
533 | 534 | So letting this conversion happen implicitly during memory access is a 535 | natural choice, though not intuitive at all. In Mu's conceptual model, a 536 | memory load is like an IO operation: it does not simply "get" the value 537 | (such as an object reference) in the memory, but is a communication with the 538 | memory system and queries the global state. So it is natural for a load 539 | operation to return a different value each time executed. 540 | 541 | .. 542 | 543 | For LLVM users: there is no equivalence in LLVM. Mu guarantees that all 544 | references are identified both in the heap and in the stack and are subject 545 | to garbage collection. The closest counterpart in LLVM is the pointer type. 546 | Mu does not encourage the use of pointers, though pointer types will be 547 | introduced in Mu in the future. 548 | 549 | .. 550 | 551 | Example:: 552 | 553 | .typedef @i8 = int<8> 554 | .typedef @i16 = int<16> 555 | .typedef @i32 = int<32> 556 | .typedef @float = float 557 | .typedef @double = double 558 | .typedef @some_struct = struct<@i32 @i16 @i8 @double @float> 559 | .typedef @some_array = array<@i8 100> 560 | 561 | .typedef @ref1 = ref<@i32> 562 | .typedef @ref2 = ref<@some_struct> 563 | .typedef @ref3 = ref<@some_array> 564 | .typedef @iref1 = iref<@i32> 565 | .typedef @iref2 = iref<@some_struct> 566 | .typedef @iref3 = iref<@some_array> 567 | .typedef @weakref1 = weakref<@i32> 568 | .typedef @weakref2 = weakref<@some_struct> 569 | .typedef @weakref3 = weakref<@some_array> 570 | 571 | Tagged Reference 572 | ---------------- 573 | 574 | ``tagref64`` 575 | 576 | ``tagref64`` is a union type of ``double``, ``int<52>`` and ``struct 577 | int<6>``. It occupies 64 bits. A ``tagref64`` value holds both a state which 578 | identifies the type it is currently representing and a value of that type. 579 | 580 | 581 | NOTE: When a ``tagref64`` contains an object reference, it can hold an 582 | ``int<6>`` together as a user-defined tag. It is useful to store type 583 | information. 584 | 585 | When a ``tagref64`` represents a ``double`` NaN value, it does not preserve the 586 | bit-wise representation of the NaN. 587 | 588 | NOTE: This type is intended to reuse the NaN space of the IEEE754 double 589 | value to multiplex with integers and object references. For this reason, 590 | when storing NaN values, it will still be NaN, but may not have the same bit 591 | representation. 592 | 593 | .. 594 | 595 | NOTE: This type is only available on some architectures including x86-64 596 | with 48-bit addresses. 597 | 598 | Function Reference Type 599 | ----------------------- 600 | 601 | ``funcref`` ``<`` *sig* ``>`` 602 | 603 | sig 604 | *function signature*: The signature of the referred function. 605 | 606 | ``funcref`` is a function reference type. It is an opaque reference to a Mu 607 | function and is not interchangeable with reference types. *sig* is the signature 608 | of the function it refers to. 609 | 610 | A ``NULL`` value of a ``funcref`` type does not refer to any function. 611 | 612 | NOTE: The value of a ``funcref`` may refer to a function that is declared 613 | but not defined. The value of a ``funcref`` type does not change even the 614 | function it refers to becomes defined or redefined. 615 | 616 | .. 617 | 618 | For C and LLVM users: The ``funcref`` type is similar to the "pointer to 619 | function" type in C and LLVM, but it only refer to Mu functions. It is not a 620 | pointer (see the ``ufuncptr`` type). 
It may be implemented under the hood as 621 | a pointer to a function, which will be an implementation detail. 622 | 623 | .. 624 | 625 | Example:: 626 | 627 | .typedef @i64 = int<64> 628 | 629 | .funcsig @sig1 = (@i64 @i64) -> (@i64) 630 | .funcsig @sig2 = () -> () 631 | 632 | .typedef @func1 = funcref<@sig1> 633 | .typedef @func2 = funcref<@sig2> 634 | 635 | Other Opaque Reference Types 636 | ---------------------------- 637 | 638 | ``threadref`` 639 | 640 | ``stackref`` 641 | 642 | ``framecursorref`` 643 | 644 | ``irnoderef`` 645 | 646 | These types are opaque references to things within Mu. They are not 647 | interchangeable with reference types. Only some special instructions (e.g. 648 | ``@uvm.new_stack``, ``NEWTHREAD``, ``@uvm.meta.new_cursor``) or API calls can 649 | operate on them. 650 | 651 | All opaque reference values can be ``NULL``, which does not refer to anything. 652 | 653 | ``threadref`` and ``stackref`` refer to Mu Threads and Mu stacks, respectively. 654 | They are used to manipulate the `threads and stacks `__. In 655 | particular, ``stackref`` is used in the ``SWAPSTACK`` instruction. ``stackref`` 656 | is not a pointer to the top of the stack. It refers to the same stack even if 657 | frames are added or removed. 658 | 659 | ``framecursorref`` refers to to frame cursors. A frame cursor is an internal 660 | structure used by the stack introspection API to iterate through stack frames. 661 | Its content is mutable but opaque. See `Threads and Stacks 662 | `__ for more details. 663 | 664 | ``irnoderef`` refers to a Mu IR node being constructed by the `IR Builder API 665 | `__. 666 | 667 | .. vim: tw=80 668 | -------------------------------------------------------------------------------- /uvm-ir-binary.rest: -------------------------------------------------------------------------------- 1 | ========================================= 2 | Intermediate Representation (Binary Form) 3 | ========================================= 4 | 5 | This document describes the binary form of the Mu intermediate representation. 6 | For the text form, see ``__. 7 | 8 | **DEPRECATED**: The binary format is deprecated. As mentioned in `this ticket 9 | `__, we have come to the 10 | conclusion that the interface between the client and the micro VM should be a 11 | functional interface, i.e. constructing IR nodes by invoking API functions. This 12 | binary IR form is still a serialised data format that needs to be parsed. The 13 | text form, however, is still useful for debugging and for using in statically 14 | compiled implementations. 15 | 16 | Overview 17 | ======== 18 | 19 | The Mu IR Binary Form is similar to the `Text Form `__ in structure, 20 | but has notable differences. 21 | 22 | Numerical IDs are used exclusively instead of textual names. The binary form 23 | also provides a special "name binding" pseudo-top-level definition which 24 | associates IDs with names. 25 | 26 | Binary format 27 | ============= 28 | 29 | A bundle in the binary form consists of many numbers encoded in bytes. All 30 | numbers are encoded in **little endian** and are **tightly packed** which means 31 | there are no padding bytes between two adjacent numbers. For floating point 32 | numbers, it is equivalent to convert them bit-by-bit into integer types of the 33 | same length and convert to bytes in little-endian. 34 | 35 | Binary Types 36 | ------------ 37 | 38 | A sequence of bytes has a **binary type** which maps the bytes to the value they 39 | represent. 
Possible binary types are: 40 | 41 | i8, i16, i32, i64 42 | Integer types of the respective lengths. 43 | float, double 44 | Floating point types of 32 bits and 64 bits, respectively. 45 | idt 46 | Alias to i32. Used for IDs. 47 | lent 48 | Alias to i16. Used for lengths of variable-length structures including the 49 | number of fields in a struct and the number of items in a parameter list. 50 | aryszt 51 | Alias to i64. Used for the length of arrays. 52 | opct 53 | Alias to i8. Used for instruction opcodes, operations or flags. 54 | *other structures* 55 | One structure can contain other structures defined separately. 56 | 57 | A table is used to represent a contiguous structure. The first row is a list of 58 | binary types specifying the type of each column and the second row specifies for 59 | each column either a symbolic name for that field or the exact binary content 60 | expected. Such a structure consists of a sequence of numbers of the types of the 61 | first row. 62 | 63 | +-------+------------------+ 64 | | type1 | type2 | 65 | +=======+==================+ 66 | | num | or symbolic name | 67 | +-------+------------------+ 68 | 69 | Common Structures 70 | ================= 71 | 72 | Some structures are common in multiple structures. 73 | 74 | ID List 75 | ------- 76 | 77 | An ID list, denoted as **idList**, is a list of IDs. It has the general form: 78 | 79 | +------+-----+-----+-----+ 80 | | lent | idt | idt | ... | 81 | +======+=====+=====+=====+ 82 | | nIDs | id1 | id2 | ... | 83 | +------+-----+-----+-----+ 84 | 85 | ``nIDs`` specifies the number of IDs and there are ``nIDs`` IDs following it. 86 | 87 | Top-level Structure 88 | =================== 89 | 90 | A bundle starts with a 4-byte magic "\x7F' 'U' 'I' 'R', or 0x7F 0x55 0x49 91 | 0x52. Then there are many top-level definitions until the end of the bundle. 92 | 93 | Type Definition 94 | --------------- 95 | 96 | Type definition has the following form: 97 | 98 | +------+-----+------------------+ 99 | | opct | idt | type constructor | 100 | +======+=====+==================+ 101 | | 0x01 | id | cons | 102 | +------+-----+------------------+ 103 | 104 | ``id`` is the identifier of the defined type. A type constructor follows the 105 | opcode 0x01 and the ID. See ``__ for a complete list of type 106 | constructors. 107 | 108 | NOTE: this is equivalent to: ``.typedef id = cons``. 109 | 110 | Function Signature Definition 111 | ----------------------------- 112 | 113 | Function signature definition has the following form: 114 | 115 | +------+-----+----------+----------+ 116 | | opct | idt | idList | idList | 117 | +======+=====+==========+==========+ 118 | | 0x02 | id | paramtys | rettys | 119 | +------+-----+----------+----------+ 120 | 121 | ``id`` is the identifier of the defined function signature. ``paramtys`` is a 122 | list of IDs of its parameter types. ``rettys`` is a list of IDs of the return 123 | types. 124 | 125 | NOTE: this is equivalent to: ``.funcsig id = (paramtys) -> (rettys)`` 126 | 127 | Constant Definition 128 | ------------------- 129 | 130 | Constant definition has the following form: 131 | 132 | +------+-----+------+----------------------+ 133 | | opct | idt | idt | constant constructor | 134 | +======+=====+======+======================+ 135 | | 0x03 | id | type | cons | 136 | +------+-----+------+----------------------+ 137 | 138 | ``id`` is the identifier of the defined constant. ``type`` is the type of the 139 | constant and must match the constant constructor. 
A constant constructor follows 140 | the type. 141 | 142 | NOTE: this is equivalent to: ``.const id = cons`` 143 | 144 | Integer Constant Constructor 145 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 146 | 147 | An integer constant constructor has the following form: 148 | 149 | +------+--------+ 150 | | opct | i64 | 151 | +======+========+ 152 | | 0x01 | number | 153 | +------+--------+ 154 | 155 | ``number`` is the integer constant number. If the integer constant has a type 156 | with fewer bits, only the least significant bits are valid. The binary form 157 | cannot encode integer constants larger than 64 bits. 158 | 159 | NOTE: this is equivalent to an integer literal in the text form. 160 | 161 | Floating Point Constant Constructors 162 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 163 | 164 | A float constant constructor has the following form: 165 | 166 | +------+--------+ 167 | | opct | float | 168 | +======+========+ 169 | | 0x02 | number | 170 | +------+--------+ 171 | 172 | ``number`` is the float constant number. 173 | 174 | NOTE: this is equivalent to a float literal in the text form. 175 | 176 | A double constant constructor has the following form: 177 | 178 | +------+--------+ 179 | | opct | double | 180 | +======+========+ 181 | | 0x03 | number | 182 | +------+--------+ 183 | 184 | ``number`` is the double constant number. 185 | 186 | NOTE: this is equivalent to a double literal in the text form. 187 | 188 | List Constant Constructor 189 | ~~~~~~~~~~~~~~~~~~~~~~~~~ 190 | 191 | A list constant constructor has the following form: 192 | 193 | +------+---------+ 194 | | opct | idList | 195 | +======+=========+ 196 | | 0x04 | elems | 197 | +------+---------+ 198 | 199 | ``elems`` is a list of IDs, each of which refers to another constant which is 200 | the value of the corresponding field of the struct or element of array/vector. 201 | 202 | NOTE: this is equivalent to the struct literal ``{elems}`` in the text 203 | form. 204 | 205 | NULL Constant Constructor 206 | ~~~~~~~~~~~~~~~~~~~~~~~~~ 207 | 208 | A NULL constant constructor has the following form: 209 | 210 | +------+ 211 | | opct | 212 | +======+ 213 | | 0x05 | 214 | +------+ 215 | 216 | NOTE: this is equivalent to the ``NULL`` keyword in the text form. 217 | 218 | Global Cell Definition 219 | ---------------------- 220 | 221 | Global cell definition has the following form: 222 | 223 | +------+-----+------+ 224 | | opct | idt | idt | 225 | +======+=====+======+ 226 | | 0x04 | id | type | 227 | +------+-----+------+ 228 | 229 | ``id`` is the ID of the defined global cell. ``type`` is the type of the global 230 | cell. 231 | 232 | NOTE: this is equivalent to: ``.global id `` 233 | 234 | Function Definition and Declaration 235 | ----------------------------------- 236 | 237 | Function declaration has the following form: 238 | 239 | +------+-----+-----+ 240 | | opct | idt | idt | 241 | +======+=====+=====+ 242 | | 0x05 | id | sig | 243 | +------+-----+-----+ 244 | 245 | ``id`` is the ID of the declared function. ``sig`` is the function signature of 246 | it. 247 | 248 | NOTE: this is equivalent to: ``.funcdecl id `` 249 | 250 | Function definition has the following form: 251 | 252 | +------+-----+-------+-----+---------------+ 253 | | opct | idt | idt | idt | function body | 254 | +======+=====+=======+=====+===============+ 255 | | 0x06 | id | verid | sig | body | 256 | +------+-----+-------+-----+---------------+ 257 | 258 | ``id`` is the ID of the defined function. ``verid`` is the ID of the version of 259 | the function. 
``sig`` is its function signature. ``body`` is the function body.
261 |
262 | NOTE: this is equivalent to: ``.funcdef id <sig> VERSION verid {
263 | body }``
264 |
265 | Function Body
266 | =============
267 |
268 | A **function body** has the following form:
269 |
270 | +------+-------------+-------------+-----+
271 | | lent | basic block | basic block | ... |
272 | +======+=============+=============+=====+
273 | | nbbs | bb1 | bb2 | ... |
274 | +------+-------------+-------------+-----+
275 |
276 | ``nbbs`` is the number of basic blocks. ``bbx`` are the basic blocks.
277 |
278 | A **basic block** has the following form:
279 |
280 | +-----+---------+---------+-----+--------+-------------+-------------+-----+
281 | | idt | lent | idPairs | idt | lent | instruction | instruction | ... |
282 | +=====+=========+=========+=====+========+=============+=============+=====+
283 | | id | nparams | params | exc | ninsts | inst1 | inst2 | ... |
284 | +-----+---------+---------+-----+--------+-------------+-------------+-----+
285 |
286 | ``id`` is the ID of the basic block. Every basic block must have an ID, *even
287 | the entry block*. ``nparams`` is the number of parameters in ``params``, which
288 | is a list of pairs of IDs, each of which is:
289 |
290 | +------+-------+
291 | | idt | idt |
292 | +======+=======+
293 | | type | param |
294 | +------+-------+
295 |
296 | where ``type`` is the ID of the type of the parameter, and ``param`` is a
297 | parameter to the basic block.
298 |
299 | ``exc`` is the ID of the exceptional parameter. It is omitted when the ID is 0.
300 |
301 | ``ninsts`` is the number of instructions in the current basic block. There are
302 | ``ninsts`` instructions following the header.
303 |
304 | An **instruction** has the following form:
305 |
306 | +--------+-----+------------------+
307 | | idList | idt | instruction body |
308 | +========+=====+==================+
309 | | resIDs | id | instbody |
310 | +--------+-----+------------------+
311 |
312 | ``resIDs`` is a list of IDs for the results. ``id`` is the ID of the
313 | instruction. ``instbody`` is the instruction body, which is specific to each
314 | instruction. See ``__ for an exhaustive list.
315 |
316 | Function Exposing Definition
317 | ============================
318 |
319 | A function exposing definition has the following form:
320 |
321 | +------+-----+------+----------+--------+
322 | | opct | idt | idt | opct | idt |
323 | +======+=====+======+==========+========+
324 | | 0x07 | id | func | callConv | cookie |
325 | +------+-----+------+----------+--------+
326 |
327 | ``id`` is the ID of the exposed value. ``func`` is the ID of the function to
328 | expose. ``callConv`` is the calling convention flag. ``cookie`` is the cookie,
329 | the ID of an ``int<64>`` constant.
330 |
331 | Name Binding
332 | ============
333 |
334 | Name binding is a definition specific to the binary form. It binds a name to an
335 | ID. It is designed for debugging purposes and is optional. The name must be a
336 | valid textual global identifier (including the prefix '@').
337 |
338 | A name binding has the following form:
339 |
340 | +------+-----+--------+-------+-------+-----+
341 | | opct | idt | lent | i8 | i8 | ... |
342 | +======+=====+========+=======+=======+=====+
343 | | 0x08 | id | nbytes | byte1 | byte2 | ... |
344 | +------+-----+--------+-------+-------+-----+
345 |
346 | ``id`` is the ID to bind.
``nbytes`` is the number of bytes in the name and 347 | ``bytex`` is the value of each byte. 348 | 349 | The name is encoded in ASCII and must follow the rules of global names, local 350 | names and allowed characters as defined in ``__. 351 | 352 | .. vim: textwidth=80 353 | -------------------------------------------------------------------------------- /uvm-memory.rest: -------------------------------------------------------------------------------- 1 | ================= 2 | Mu and the Memory 3 | ================= 4 | 5 | Overview 6 | ======== 7 | 8 | Mu supports automatic memory management via garbage collection. There is a 9 | **heap** which contains garbage-collected **objects** as well as many **stacks** 10 | and a **global memory** which contain data that are not garbage-collected. 11 | 12 | The heap memory is managed by the garbage collector. 13 | 14 | Unlike C or C++, local SSA variables are not bound to memory locations. Stack 15 | memory must be allocated by the ``ALLOCA`` or ``ALLOCAHYBRID`` instructions or 16 | using the Mu client interface. 17 | 18 | This specification does not mandate any object layout, but it is recommended to 19 | layout common data types as in the platform application binary interface (ABI) 20 | so that the native interface is easier to implement. 21 | 22 | Basic Concepts 23 | ============== 24 | 25 | Mu Memory 26 | --------- 27 | 28 | There are three kinds of memory in Mu, namely the **heap memory**, the **stack 29 | memory** and the **global memory**. **Mu memory** means one of them. 30 | 31 | Memory is allocated in their respective **allocation units**. Every allocation 32 | unit has a **lifetime** which begins when the allocation unit is created and 33 | ends when it is destroyed. 34 | 35 | A **memory location** is a region of data storage in the memory which can hold 36 | data values. A memory location has a type and its value can only be of that 37 | type. 38 | 39 | NOTE: The Mu memory is defined without mentioning "address". There is no 40 | "size", "alignment" or "offset" of a memory location. The relation between a 41 | memory location and an address is only established when pinning (discussed 42 | later). Even when pinned, the address may or may not be the canonical 43 | address where the location is allocated in Mu. For example, Mu objects can 44 | be replicated. 45 | 46 | When allocating Mu memory locations (in the heap, stack or global memory), 47 | Mu guarantee the location can hold Mu values of a particular type and, as 48 | long as the type allows atomic access, the location can be accessed 49 | atomically. The implementation must ensure all memory locations are 50 | allocated in such a way. For example, it should not allocate an integer 51 | across page boundary, but it may choose to use locks for atomicity, which, 52 | in practice, is usually a bad idea. 53 | 54 | .. 55 | 56 | For C programmers: The word "object" in the C language is the counterpart of 57 | "memory location" in Mu. Mu does not have bit fields and a memory location 58 | is always an "object" in C's sense. In Mu's terminology, the word "object" 59 | is a synonym of "heap object" or "garbage-collected object". 60 | 61 | In C, the word "memory location" must have scalar types, but Mu uses the 62 | word for composite types, too. 63 | 64 | For a memory location L that represents type T, if c is a member (if applicable) 65 | or a component of T, it also has a memory location which is a **member** or a 66 | **component** of the memory location L, respectively. 
Memory location L1 67 | **contains** a memory location L2 if L2 is a component of L1. 68 | 69 | The **lifetime** of a memory location is the same as the allocation unit that 70 | contains it. 71 | 72 | As implementation details, when an allocation unit is destroyed and another 73 | allocation unit occupied the same or overlapping space as the former, they are 74 | different allocation units. Different allocation units contain no common memory 75 | locations. When a heap object is moved by the garbage collector, it is still the 76 | same object. Any memory locations within the same object remain the same. 77 | 78 | NOTE: This means the memory of Mu is an abstraction over the memory space of 79 | the process. 80 | 81 | Native Memory 82 | ------------- 83 | 84 | The **native memory** is not Mu memory. The native memory is an address space of 85 | a sequence of bytes, each can be addressed by an integral address. The size of 86 | the address is implementation-defined. 87 | 88 | A region of bytes in the native memory can be interpreted as Mu values in an 89 | implementation-dependent way. The bytes that represents a Mu value is the 90 | **bytes representation** of that Mu value. 91 | 92 | For C programmers: it is similar to the "object representation", but in Mu, 93 | unless a memory location is pinned, it may not be represented in such a way. 94 | 95 | A Mu memory location can be **pinned**. In this state, it is mapped to a 96 | (contiguous) region of bytes in the native memory which contains the bytes 97 | representation of the value the memory location holds. The beginning of the 98 | memory location is mapped to the lowest address of the region. Different 99 | components of a memory location which do not contain each other do not map to 100 | overlapping regions in the address space. 101 | 102 | For C programmers: 103 | 104 | * Mu assumes 8-bit bytes. 105 | 106 | * Mu does not have the bit-field type, but a client can implement bit-fields 107 | using integer types and bit operations. 108 | 109 | * Mu does not have union types. However, like C, directly casting an address 110 | to a pointer has implementation-defined behaviours. If a Mu program 111 | interfaces with native programs, it has to also depend on the platform. 112 | 113 | * Unlike C, Mu operations work on SSA variables rather than memory locations 114 | (the counterpart of objects in C). 115 | 116 | * Mu forces the 2's complement representation, though the byte order and 117 | alignment requirement are implementation-defined. 118 | 119 | See `Native Interface `__ for details about the pinning and 120 | unpinning operations. 121 | 122 | Memory Allocation and Deallocation 123 | ================================== 124 | 125 | An allocation unit in the heap memory is called a **heap object**, or **object** 126 | when unambiguous. It is created when executing the ``NEW`` or ``NEWHYBRID`` 127 | instructions or the ``new_fixed`` or ``new_hybrid`` API function. It is 128 | destroyed when the object is collected by the garbage collector. 129 | 130 | An allocation unit in the stack memory is called an **alloca cell**. It is 131 | created when executing the ``ALLOCA`` or ``ALLOCAHYBRID`` instruction. It is 132 | destroyed when the stack frame containing it is destroyed. 133 | 134 | An allocation unit in the global memory is called a **global cell**. One global 135 | cell is created for every ``.global`` declaration in a bundle submitted to Mu. 136 | Global cells are never destroyed. 
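
For example, the following text-form fragment creates one allocation unit of
each kind. It is an illustrative sketch only: the names (such as ``@g_counter``
and ``@alloc_demo``) are made up here, and the normative syntax of the
declarations and instructions is given in the IR and instruction set documents,
not in this section::

    .typedef @i64 = int<64>

    .global @g_counter <@i64>                // global cell: never destroyed

    .funcsig @sig_v_v = () -> ()
    .funcdef @alloc_demo VERSION %v1 <@sig_v_v> {
        %entry():
            %obj  = NEW    <@i64>            // heap object: destroyed only when collected by the GC
            %cell = ALLOCA <@i64>            // alloca cell: destroyed together with this frame
            RET ()
    }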
137 |
138 | Initial Values
139 | --------------
140 |
141 | The initial value of any memory location is defined as the following, according
142 | to the type of data value the memory location represents:
143 |
144 | * The initial value of ``int`` and pointer types is 0 (numerical value or
145 | address).
146 | * The initial value of floating point types is positive zero.
147 | * The initial value of ``ref``, ``iref``, ``weakref``, ``funcref``, ``stackref``
148 | and ``threadref`` is ``NULL``.
149 | * The initial value of ``tagref64`` is a floating point number which is
150 | positive zero.
151 | * The initial values of all fields or elements in ``struct``, ``array``,
152 | ``vector`` and the fixed and variable part of ``hybrid`` are the initial
153 | values according to their respective types.
154 |
155 | Garbage Collection
156 | ------------------
157 |
158 | A **root** is an object reference or internal reference in:
159 |
160 | * any global cell, or
161 | * any bound Mu stack, or
162 | * the thread-local object reference in any thread, or
163 | * any value held by any client context in the API.
164 |
165 | A live stack contains references in its alloca cells and live local SSA
166 | variables. A dead stack contains no references. A thread can strongly reach its
167 | bound stack unless it is temporarily unbound because of trapping.
168 |
169 | An object is **strongly reachable** if it can be reached by traversing only
170 | strong, stack and thread references from any root. An object is **weakly
171 | reachable** if it is not strongly reachable, but can be reached by traversing
172 | strong, stack, thread and weak references from any root. Otherwise an object is
173 | **unreachable**.
174 |
175 | The garbage collector can collect unreachable objects. It may also set weak
176 | references which refer to weakly reachable objects to ``NULL``.
177 |
178 | NOTE: Doing the latter may make weakly reachable objects become unreachable.
179 |
180 | The garbage collector may move objects.
181 |
182 | Memory Accessing
183 | ================
184 |
185 | Memory accessing operations include **load** and **store** operations. To
186 | **access** means to load or store. **Atomic read-modify-write** operations may
187 | have both a load and a store operation, but may have special atomic properties.
188 |
189 | NOTE: Instructions are named in capital letters: LOAD and STORE. The
190 | abstract operations are in lower case: load, store and access.
191 |
192 | Memory access operations can be performed by some Mu instructions (see
193 | `Instruction Set `__), API functions (see `Client Interface
194 | `__), native programs which access the pinned Mu memory,
195 | or in other implementation-specific ways.
196 |
197 | Two memory accesses **conflict** if one stores to a memory location and the
198 | other loads from or stores to the same memory location.
199 |
200 | Parameters and Semantics of Memory Operations
201 | ---------------------------------------------
202 |
203 | Generally speaking, load operations copy values from the memory and store
204 | operations copy values into the memory. The exact results are determined by the
205 | memory model. See `Memory Model `__.
206 |
207 | A **load** operation has parameters ``(ord, T, loc)``. *ord* is the memory order
208 | of the operation. *T* is the type of *loc*, a memory location. The result is a
209 | value of the strong variant of type *T*.
210 |
211 | A **store** operation has parameters ``(ord, T, loc, newVal)``.
*ord* is the
212 | memory order of the operation. *T* is the type of *loc*, a memory location.
213 | *newVal* is a value whose type is the strong variant of *T*. This operation does
214 | not produce a result value.
215 |
216 | A **compare exchange** operation is an atomic read-modify-write operation. Its
217 | parameters are ``(isWeak, ordSucc, ordFail, T, loc, expected, desired)``.
218 | *isWeak* is a Boolean parameter which indicates whether the compare exchange
219 | operation is weak or strong. *ordSucc* and *ordFail* are the memory orders of
220 | the operation when the comparison succeeds or fails, respectively. *T* is the
221 | type of the memory location *loc*. *expected* and *desired* are values whose
222 | type is the strong variant of *T*. The result is a pair ``(v, s)``, where *v*
223 | has the type of the strong variant of *T*, and *s* is a Boolean value.
224 |
225 | A compare exchange operation performs a load operation on *loc* and compares its
226 | result with *expected*. If the comparison is successful, it performs a store
227 | operation to location *loc* with *desired* as *newVal*.
228 |
229 | If the operation is strong, the comparison succeeds **if and only if** the
230 | result of the load equals *expected*. If it is weak, the comparison succeeds
231 | **only if** the result of the load equals the *expected* value, and it may
232 | spuriously fail, that is, it may fail even if the loaded value equals the
233 | *expected* value.
234 |
235 | The result *v* is the result of the initial load operation and *s* is whether
236 | the comparison is successful or not.
237 |
238 | An **atomic-x** operation is an atomic read-modify-write operation, where *x*
239 | can be one of (XCHG, ADD, SUB, AND, NAND, OR, XOR, MAX, MIN, UMAX, UMIN). Its
240 | parameters are ``(ord, T, loc, opnd)``. *ord* is the memory order of the
241 | operation. *T* is the type of the memory location *loc*. *opnd* is a value whose
242 | type is the strong variant of *T*. The result also has the type of the strong
243 | variant of *T*.
244 |
245 | An atomic-x operation performs a load operation on location *loc*. Then,
246 | according to *x*, it performs one of the binary operations below, with the
247 | result of the load operation as the left-hand-side operand and the value *opnd*
248 | as the right-hand-side operand. The result is:
249 |
250 | XCHG
251 | The value of *opnd*.
252 | ADD
253 | The sum of the two operands.
254 | SUB
255 | The difference of the two operands.
256 | AND
257 | The bitwise AND of the two operands.
258 | NAND
259 | The bitwise NOT of the bitwise AND of the two operands.
260 | OR
261 | The bitwise inclusive OR of the two operands.
262 | XOR
263 | The bitwise exclusive OR of the two operands.
264 | MAX
265 | The maximum value of the two operands, considering both operands as signed.
266 | MIN
267 | The minimum value of the two operands, considering both operands as signed.
268 | UMAX
269 | The maximum value of the two operands, considering both operands as unsigned.
270 | UMIN
271 | The minimum value of the two operands, considering both operands as unsigned.
272 |
273 | ..
274 |
275 | NOTE: In the C syntax, the semantics of NAND is ``~(op1 & op2)``.
276 |
277 | Then it performs a store operation to location *loc* with the result of the
278 | binary operation as *newVal*.
279 |
280 | The result of the atomic-x operation is the result of the initial load
281 | operation.
282 |
283 | All operators other than ``XCHG`` are only applicable for integer types.
284 | ``XCHG`` is allowed for any type.
However, a Mu implementation may only 284 | implement some combinations of operators and operand types according to the 285 | requirements specified in `Portability `__ 286 | 287 | Memory Operations on Pointers 288 | ----------------------------- 289 | 290 | Load, store, compare exchange and atomic-x operations can work with native 291 | memory in addition to Mu memory locations. In this case, the *loc* parameter of 292 | the above operations become a region of bytes in the native memory (usually 293 | represented as ``uptr``) rather than memory locations (usually represented as 294 | ``iref``). 295 | 296 | Only *native-safe* types can be accessed via pointers. 297 | 298 | When accessing the memory via pointers, if the bytes are mapped to a Mu memory 299 | location via pinning (see `Native Interface `__), then if the 300 | referent type of the pointer is the same as the Mu memory location, it has the 301 | same effect as accessing the corresponding Mu memory location. 302 | 303 | When non-atomically loading from or storing to a region *R* of bytes which is 304 | 305 | 1. not mapped to (i.e. not perfectly overlapping with) a particular Mu memory 306 | location, and 307 | 2. each byte in the region is part of any mapped byte region of any pinned Mu 308 | memory location, 309 | 310 | then such an operation loads or stores on a byte-by-byte basis. Specifically: 311 | 312 | * Such a load operation *L*: 313 | 314 | 1. for each address *A* of byte in the region *R*, performs a load operation 315 | on the (only) Mu memory location of scalar types (not composite types) 316 | whose mapped byte region *R2* contains address *A*, then extract the byte 317 | value *b* at address *A*, then 318 | 319 | 2. combine all results *b* from the previous step into a sequence of byte 320 | values, then interprets it as the bytes representation of a Mu value. 321 | This Mu value is the result of the load operation *L*. 322 | 323 | * Such a store operation *S*: 324 | 325 | 1. interprets its *newVal* argument as its bytes representation *B*, then 326 | 327 | 2. for each address *A* of byte in the region *R*, performs a load operation 328 | on the (only) Mu memory location of scalar types (not composite types) 329 | whose mapped byte region *R2* contains address *A*, then update the 330 | result by replacing the byte at address *A* with the byte in *B*, then 331 | perform a store operation on the same Mu memory location with the updated 332 | value as *newVal*. 333 | 334 | .. 335 | 336 | NOTE: This allows Mu to allocate a byte array and access (by itself or by 337 | native programs) it via pointers as if it is a struct or a union, and then 338 | interpret the written values as bytes. The requirement of each byte being 339 | mapped gives implementation-defined behaviours to accesses beyond the border 340 | of any Mu objects (such as array out-of-bound errors), or accessing padding 341 | bytes in Mu structs. 342 | 343 | Accessing native memory regions not mapped to Mu memory locations has 344 | implementation-defined behaviours. 345 | 346 | NOTE: Accessing the native memory may have all kinds of results: getting a 347 | previously-stored value, storing to one address and affect another address 348 | when two addresses are mapped to the same physical memory region/file, 349 | segmentation fault, bus error (especially on OSX), turning on/off the light 350 | by doing memory-mapped IO, launching nuclear missiles, summoning nasal 351 | demons, etc. Mu cannot make much guarantee. 
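
Returning to the byte-by-byte rule above, the following sketch pins a Mu byte
array and loads two of its bytes through an ``int<16>`` pointer. It is
illustrative only: treat the exact spellings of the ``@uvm.native.pin`` and
``@uvm.native.unpin`` common instructions, the ``PTR`` variants of ``LOAD`` and
``STORE``, and ``PTRCAST`` as assumptions to be checked against the common
instruction and instruction set documents, and note that the loaded value
depends on the implementation-defined byte order::

    .typedef @i8    = int<8>
    .typedef @i16   = int<16>
    .typedef @bytes = array<@i8 8>
    .typedef @refb  = ref<@bytes>
    .typedef @pb    = uptr<@bytes>
    .typedef @p16   = uptr<@i16>

    // inside a function version:
    %obj = NEW <@bytes>                              // 8 one-byte memory locations
    %pin = COMMINST @uvm.native.pin <@refb> (%obj)   // pin; yields a uptr<@bytes>
    %q   = PTRCAST <@pb @p16> %pin                   // reinterpret the address as uptr<@i16>
    %v   = LOAD PTR <@i16> %q                        // byte-by-byte load of the first two bytes
    COMMINST @uvm.native.unpin <@refb> (%obj)        // unpin when the address is no longer needed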
352 |
353 | Native programs can access pinned Mu memory locations in implementation-defined
354 | ways.
355 |
356 | NOTE: This means it requires effort from the implementations of both Mu
357 | and the native programs to obtain any defined semantics in mixed Mu-native
358 | programs. For C, it will involve the C language, the platform ABI and the Mu
359 | ABI of that platform.
360 |
361 | Memory Layout
362 | =============
363 |
364 | Whether or how Mu data of any type are represented in the native memory is
365 | implementation-defined. When an object is pinned, the layout is viewed from the
366 | native memory in a platform-dependent way.
367 |
368 | For Mu implementers, it is recommended to use the layout defined by the
369 | application binary interface of the platform in order to ease the data exchange
370 | via the native interface implementation.
371 |
372 | Mu has some rules about Mu memory locations which must always be preserved.
373 |
374 | Rules of Memory Locations
375 | =========================
376 |
377 | Every memory location has an associated type bound when the memory location is
378 | created and cannot be changed. The memory location can only hold values of that
379 | type.
380 |
381 | NOTE: The association between memory location and type is conceptual. This
382 | does not mean the Mu implementation has to keep metadata of the type of
383 | all memory locations at runtime. The implementation only needs to keep
384 | enough metadata to implement its garbage collector.
385 |
386 | A memory location has a **beginning** and an **end**. The value it holds is
387 | represented in that region. A non-NULL internal reference of type *T* refers to
388 | the memory location of type *T* at a specific beginning.
389 |
390 | NOTE: There can only be one such memory location.
391 |
392 | Specifically, there is a memory location of type ``void`` at the beginning of
393 | any other memory location.
394 |
395 | NOTE: This makes it legal to cast any ``iref<T>`` to ``iref<void>`` and
396 | back.
397 |
398 | Prefix Rule
399 | -----------
400 |
401 | NOTE: The prefix rule is designed to support having common language-specific
402 | object headers in objects. It also supports inheritance in object-oriented
403 | programming where a superclass is a prefix of a subclass.
404 |
405 | A component *C* is a **prefix** of a type *T* if any of the following is true.
406 |
407 | + *C* is *T* itself.
408 | + *T* is a ``struct`` and *C* is its first field.
409 | + *T* is a ``hybrid`` and *C* is the first field of its fixed part, or the fixed
410 | part of *T* has no fields and *C* is the first element of the variable part.
411 | + *T* is an ``array`` or ``vector`` of length n >= 1, and *C* is its first
412 | element.
413 | + *C* is a prefix of another prefix of *T*.
414 |
415 | A prefix of memory location *M* is the memory location that represents a prefix
416 | of the type of *M*.
417 |
418 | All prefixes of a memory location have the same beginning.
419 |
420 | The ``REFCAST`` instruction or the ``refcast`` API function preserves the
421 | beginning of the operand. If it casts ``iref<T1>`` to ``iref<T2>``, the result
422 | is an internal reference to the memory location of type ``T2`` at the same
423 | beginning. (see `Instruction Set `__)
424 |
425 | Array Rule
426 | ----------
427 |
428 | A **memory array** is defined as a contiguous memory location of components of
429 | the same type.
The ``array`` type, the ``vector`` type, as well as the variable
430 | part of a ``hybrid``, are all represented in the memory as memory arrays.
431 |
432 | Nested ``array``, ``vector`` and variable part of ``hybrid`` can be considered
433 | as a single memory array with the innermost element type of the nested type as
434 | the element type.
435 |
436 | Example: The variable part of ``hybrid<array<array<vector<float 4> 10>
437 | 20>>`` can be treated as:
438 |
439 | + a memory array of ``float``, or
440 | + a memory array of ``vector<float 4>``, or
441 | + a memory array of ``array<vector<float 4> 10>``, or
442 | + a memory array of ``array<array<vector<float 4> 10> 20>``.
443 |
444 | Internal references to an element of a memory array can be shifted to other
445 | elements in the same memory array using the ``SHIFTIREF`` instruction.
446 |
447 | NOTE: ``SHIFTIREF`` may cross the boundary of Mu types, but still remain in
448 | the memory array. For example, an internal reference to the first ``float``
449 | in the ``array<array<float 10> 10>`` array, which is a 10x10 matrix of
450 | float, can be shifted to other rows using the ``SHIFTIREF`` instruction and
451 | cross the 10-element boundary. Shifting by 12 elements from element (0,0)
452 | will reach the element at (1,2).
453 |
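The scenario in the note above can be written down directly in the text-form IR.
The fragment below is an illustrative sketch only (the constant and variable
names are made up; see the instruction set document for the normative syntax of
``GETELEMIREF`` and ``SHIFTIREF``): starting from an internal reference to
element (0,0) of a 10x10 ``float`` matrix, shifting by 12 ``float`` elements
yields element (1,2)::

    .typedef @i64    = int<64>
    .typedef @float  = float
    .typedef @row    = array<@float 10>
    .typedef @matrix = array<@row 10>                 // a 10x10 matrix of float

    .const @ZERO   <@i64> = 0
    .const @TWELVE <@i64> = 12

    // given %m of type iref<@matrix>:
    %row0 = GETELEMIREF <@matrix @i64> %m    @ZERO    // iref<@row>:   row 0
    %e00  = GETELEMIREF <@row    @i64> %row0 @ZERO    // iref<@float>: element (0,0)
    %e12  = SHIFTIREF   <@float  @i64> %e00  @TWELVE  // iref<@float>: element (1,2),
                                                      // crossing the 10-element row boundary

.. vim: tw=80

--------------------------------------------------------------------------------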