├── .gitignore ├── README.rest ├── _Sidebar.md ├── common-insts.rest ├── hail.rest ├── instruction-set.rest ├── irbuilder.rest ├── memory-model.rest ├── muapi.h ├── native-interface-x64-unix.rest ├── native-interface.rest ├── overview.rest ├── portability.rest ├── scripts ├── extract_comminst_macros.py └── muapiparser.py ├── threads-stacks.rest ├── type-system.rest ├── uvm-client-interface.rest ├── uvm-ir-binary.rest ├── uvm-ir.rest └── uvm-memory.rest /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | *.py[co] 3 | *.swp 4 | .DS_Store 5 | __pycache__ 6 | -------------------------------------------------------------------------------- /README.rest: -------------------------------------------------------------------------------- 1 | ================ 2 | Mu Specification 3 | ================ 4 | 5 | This document aims to provide a detailed description of Mu, a micro virtual 6 | machine, including its architecture, instruction set and type system. 7 | 8 | NOTE: This branch uses the goto-with-values form. The previous branch using 9 | SSA form with PHI nodes is in the `phi 10 | `__ branch. 11 | 12 | Main specification: 13 | 14 | - `Overview `__ 15 | - `Intermediate Representation `__ 16 | - `Intermediate Representation Binary Form (deprecated) `__ 17 | - `Type System `__ 18 | - `Instruction Set `__ 19 | - `Common Instructions `__ 20 | - `Client Interface (a.k.a. The API) `__ 21 | - `Call-based IR Building API (work in progress) `__ 22 | - `Threads and Stacks `__ 23 | - `Memory and Garbage Collection `__ 24 | - `Memory Model `__ 25 | - `(Unsafe) Native Interface `__ 26 | - `Heap Allocation and Initialisation Language (HAIL) `__ 27 | - `Portability and Implementation Advices `__ 28 | 29 | Platform-specific parts: These extends the main specification. The main 30 | specification considers these parts as implementation-specific. 31 | 32 | - `AMD64 Unix Native Interface `__ 33 | 34 | .. vim: tw=80 35 | -------------------------------------------------------------------------------- /_Sidebar.md: -------------------------------------------------------------------------------- 1 | # Contents 2 | 3 | [[_TOC_]] 4 | 5 | Apparently GitHub Wiki does not support the Gollum tag `[[_TOC_]]` that automatically generates the table of content. 6 | 7 | I recommend the following browser add-ons as a workaround: 8 | 9 | * Firefox: [HeadingsMap](https://addons.mozilla.org/en-US/firefox/addon/headingsmap/) 10 | * Chrome: [HTML5 Outliner](https://chrome.google.com/webstore/detail/html5-outliner/afoibpobokebhgfnknfndkgemglggomo) (recommended) 11 | 12 | Other Add-ons/Extensions that may work: 13 | 14 | * Firefox: [Table of Contents](https://addons.mozilla.org/en-US/firefox/addon/table-of-contents/) (width not adjustable) 15 | * Firefox: [HTML5 Outliner](https://addons.mozilla.org/en-US/firefox/addon/html5_outliner/) (no hyperlinks) 16 | * Chrome: [Table-of-contents-crx](https://chrome.google.com/webstore/detail/table-of-contents-crx/eeknhipceeelbgdbcmchicoaoalfdnhi) (not expanding the ToC on start) 17 | 18 | -------------------------------------------------------------------------------- /common-insts.rest: -------------------------------------------------------------------------------- 1 | =================== 2 | Common Instructions 3 | =================== 4 | 5 | This document specifies Common Instructions. 6 | 7 | **Common Instructions** are instructions that have a common format and are 8 | used with the ``COMMINST`` super instruction. They have: 9 | 10 | 1. 
An ID and a name. (This means, they are *identified*. See ``__.) 11 | 2. A flag list. 12 | 3. A type parameter list. 13 | 4. A value parameter list. 14 | 5. An optional exception clause. 15 | 6. A possibly empty (which means optional) keep-alive clause. 16 | 17 | *Common instructions* are a mechanism to extend the Mu IR without adding new 18 | instructions or changing the grammar. 19 | 20 | NOTE: *Common instructions* were named "intrinsic function" in previous 21 | versions of this document. The name was borrowed from the LLVM. However, the 22 | common instructions in Mu are quite different from the usual concept of 23 | intrinsic functions. 24 | 25 | Intrinsic functions usually mean a kind a function that is understood 26 | directly by the compiler. The C function ``memcpy`` is considered an 27 | intrinsic function by some compilers. In JikesRVM, methods of the ``Magic`` 28 | class are a kind of intrinsic functions. They appear like ordinary functions 29 | in the language and bypass all front-end tools including the C parser and 30 | javac, but they are understood by the backend. Their purpose is to perform 31 | tasks that cannot be expressed by the high-level programming language, 32 | including direct raw memory access in Java. 33 | 34 | Common instructions only differ from ordinary Mu instructions in that they 35 | have a common format and are called by the ``COMMINST`` super instruction. 36 | The purpose is to add more instructions to the Mu IR without having to 37 | modify the parser. 38 | 39 | Common instructions are not Mu functions and cannot be called by the 40 | ``CALL`` instruction, nor can it be directly used from the high-level 41 | language that the client implements. The Mu client must understand common 42 | instructions because it is the only source of IR code of Mu. That is to say, 43 | *there is no way any higher-level program can express anything which Mu 44 | knows but the client does not*. For special high-level language functions 45 | that cannot be directly implemented in the high-level programming language, 46 | like the methods in the ``java.lang.Thread`` class, the client must 47 | implement those special high-level language functions in "ordinary" Mu IR 48 | code, which may or may not involve common instructions. For example, 49 | creating a thread is a "magic" in Java, but it is not more special than 50 | executing an instruction (``NEWTHREAD``) in Mu. Some Java libraries require 51 | Mu to make a ``CCALL`` to some C functions which are provided by the JVM, 52 | and they slip under the level of Mu. But Mu and the client always know the 53 | fact that "it call C function" and it is not magic. 54 | 55 | This document uses the following notation:: 56 | 57 | [id]@name [F1 F2 ...] < T1 T2 ... > <[ sig1 sig2 ... ]> ( p1:t1, p2:t2, ... ) excClause KEEPALIVE -> RTs 58 | 59 | - ``id`` is the ID and ``@name`` is the name. 60 | 61 | - ``[F1 F2 ...]`` is a list of flag parameters. 62 | 63 | - ``[T1 T2 ...]`` is a list of type parameters. The users pass types into the 64 | common instruction via this list. 65 | 66 | - ``<[ sig1 sig2 ... ]>`` is a list of function signature parameters. 67 | 68 | - ``(p1:t1, p2:t2, ...)`` is a list of pairs of symbolic name and type. It is 69 | the value parameter list with the type of each parameter. The user only passes 70 | values via this list, and the types are only parts of the documentation. 
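For example (a non-normative sketch; the authoritative grammar of the
``COMMINST`` super instruction is defined in the Instruction Set document, and
``%val``, ``%func`` and ``@foo_sig`` are assumed to be defined elsewhere), the
entries ``@uvm.tr64.from_int`` and ``@uvm.new_stack`` described later in this
document would be used roughly as::

    %tr = COMMINST @uvm.tr64.from_int (%val)             // value parameter list only
    %st = COMMINST @uvm.new_stack <[@foo_sig]> (%func)   // signature list and value parameter list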
71 | 72 | If any of the above list is omitted in this document, it means the respective 73 | common instruction does not take that kind of parameters. 74 | 75 | If ``excClause`` or ``KEEPALIVE`` are present, they mean that the common 76 | instruction accepts exception clause or keepalive clause, respectively. 77 | Otherwise the common instruction does not branch to exception destinations nor 78 | support any keep-alive variables. 79 | 80 | ``RTs`` are the return types. If the return type is omitted, it means it 81 | produces no results (equivalent to ``-> ()``). 82 | 83 | The names of many common instructions are grouped by prefixes, such as 84 | ``@uvm.tr64.``. In this document, their common prefixes may be omitted in their 85 | descriptions when unambiguous. 86 | 87 | Thread and Stack operations 88 | =========================== 89 | 90 | :: 91 | 92 | [0x201]@uvm.new_stack <[sig]> (%func: funcref) -> stackref 93 | 94 | Create a new stack with ``%func`` as the stack-bottom function. ``%func`` must 95 | have signature ``sig``. Returns the stack reference to the new stack. 96 | 97 | The stack-bottom frame is in the state **READY**, where *Ts* are the 98 | parameter types of ``%func``. 99 | 100 | This instruction continues exceptionally if Mu failed to create the stack. The 101 | exception parameter receives NULL. 102 | 103 | :: 104 | 105 | [0x202]@uvm.kill_stack (%s: stackref) 106 | 107 | Destroy the given stack ``%s``. The stack ``%s`` must be in the **READY** state 108 | and will enter the **DEAD** state. 109 | 110 | :: 111 | 112 | [0x203]@uvm.thread_exit 113 | 114 | Stop the current thread and kill the current stack. The current stack will enter 115 | the **DEAD** state. The current thread stops running. 116 | 117 | :: 118 | 119 | [0x204]@uvm.current_stack -> stackref 120 | 121 | Return the current stack. 122 | 123 | :: 124 | 125 | [0x205]@uvm.set_threadlocal (%ref: ref) 126 | 127 | Set the thread-local object reference of the current thread to ``%ref``. 128 | 129 | :: 130 | 131 | [0x206]@uvm.get_threadlocal -> ref 132 | 133 | Return the current thread-local object reference of the current thread. 134 | 135 | 64-bit Tagged Reference 136 | ======================= 137 | 138 | :: 139 | 140 | [0x211]@uvm.tr64.is_fp (%tr: tagref64) -> int<1> 141 | [0x212]@uvm.tr64.is_int (%tr: tagref64) -> int<1> 142 | [0x213]@uvm.tr64.is_ref (%tr: tagref64) -> int<1> 143 | 144 | - ``is_fp`` checks if ``%tr`` holds an FP number. 145 | - ``is_int`` checks if ``%tr`` holds an integer. 146 | - ``is_ref`` checks if ``%tr`` holds a reference. 147 | 148 | Return 1 or 0 for true or false. 149 | 150 | :: 151 | 152 | [0x214]@uvm.tr64.from_fp (%val: double) -> tagref64 153 | [0x215]@uvm.tr64.from_int (%val: int<52>) -> tagref64 154 | [0x216]@uvm.tr64.from_ref (%ref: ref, %tag: int<6>) -> tagref6 155 | 156 | - ``from_fp`` creates a ``tagref64`` value from an FP number ``%val``. 157 | - ``from_int`` creates a ``tagref64`` value from an integer ``%val``. 158 | - ``from_ref`` creates a ``tagref64`` value from a reference ``%ref`` and the 159 | integer tag ``%tag``. 160 | 161 | Return the created ``tagref64`` value. 162 | 163 | 164 | :: 165 | 166 | [0x217]@uvm.tr64.to_fp (%tr: tagref64) -> double 167 | [0x218]@uvm.tr64.to_int (%tr: tagref64) -> int<52> 168 | [0x219]@uvm.tr64.to_ref (%tr: tagref64) -> ref 169 | [0x21a]@uvm.tr64.to_tag (%tr: tagref64) -> int<6> 170 | 171 | - ``to_fp`` returns the FP number held by ``%tr``. 172 | - ``to_int`` returns the integer held by ``%tr``. 
173 | - ``to_ref`` returns the reference held by ``%tr``. 174 | - ``to_tag`` returns the integer tag held by ``%tr`` that accompanies the 175 | reference. 176 | 177 | They have undefined behaviours if ``%tr`` does not hold the value of the 178 | expected type. 179 | 180 | Math Instructions 181 | ================= 182 | 183 | TODO: Should provide enough math functions to support: 184 | 185 | 1. Ordinary arithmetic and logical operations that throw exceptions when 186 | overflow. Example: C# in checked mode, ``java.lang.Math.addOvf`` added in 187 | Java 1.8. 188 | 2. Floating point math functions. Example: trigonometric functions, testing 189 | NaN, fused multiply-add, ... 190 | 191 | It requires some work to decide a complete list of such functions. To work 192 | around the limitations for now, please call native functions in libc or 193 | libm using ``CCALL``. 194 | 195 | Futex Instructions 196 | ================== 197 | 198 | See ``__ for high-level descriptions about Futex. 199 | 200 | Wait 201 | ---- 202 | 203 | :: 204 | 205 | [0x220]@uvm.futex.wait (%loc: iref, %val: T) -> int<32> 206 | [0x221]@uvm.futex.wait_timeout (%loc: iref, %val: T, %timeout: int<64>) -> int<32> 207 | 208 | ``T`` must be an integer type. 209 | 210 | ``wait`` and ``wait_timeout`` verify if the memory location ``%loc`` still 211 | contains the value ``%val`` and then put the current thread to the waiting queue 212 | of memory location ``%loc``. If ``%loc`` does not contain ``%val``, return 213 | immediately. These instructions are atomic. 214 | 215 | - ``wait`` waits indefinitely. 216 | 217 | - ``wait_timeout`` has an extra ``%timeout`` parameter which is a 64-bit 218 | unsigned integer that represents a time in nanoseconds. It specifies the 219 | duration of the wait. 220 | 221 | Both instructions are allowed to spuriously wake up. 222 | 223 | They return a signed integer which indicates the result of this call: 224 | 225 | * 0: the current thread is woken. 226 | * -1: the memory location ``%loc`` does not contain the value ``%val``. 227 | * -2: spurious wakeup. 228 | * -3: timeout during waiting (``wait_timeout`` only). 229 | 230 | Wake 231 | ---- 232 | 233 | :: 234 | 235 | [0x222]@uvm.futex.wake (%loc: iref, %nthread: int<32>) -> int<32> 236 | 237 | ``T`` must be an integer type. 238 | 239 | ``wake`` wakes *N* threads in the waiting queue of the memory location ``%loc``. 240 | This instruction is atomic. 241 | 242 | *N* is the minimum value of ``%nthread`` and the actual number of threads in the 243 | waiting queue of ``%loc``. ``%nthread`` is signed. Negative ``%nthread`` has 244 | undefined behaviour. 245 | 246 | It returns the number of threads woken up. 247 | 248 | Requeue 249 | ------- 250 | 251 | :: 252 | 253 | [0x223]@uvm.futex.cmp_requeue (%loc_src: iref, %loc_dst: iref, %expected: T, %nthread: int<32>) -> int<32> 254 | 255 | ``T`` must be an integer type. 256 | 257 | ``cmp_requeue`` verifies if the memory location ``%loc_src`` still contains the 258 | value ``%expected`` and then wakes up *N* threads from the waiting queue of 259 | ``%loc_src`` and move all other threads in the waiting queue of ``%loc_src`` to 260 | the waiting queue of ``%loc_dst``. If ``%loc_src`` does not contain the value 261 | ``%expected``, return immediately. This instruction is atomic. 262 | 263 | *N* is the minimum value of ``%nthread`` and the actual number of threads in the 264 | waiting queue of ``%loc``. ``%nthread`` is signed. Negative ``%nthread`` has 265 | undefined behaviour. 266 | 267 | It returns a signed integer. 
When the ``%loc_src`` contains the value of 268 | ``%expected``, return the number of threads woken up; otherwise return -1. 269 | 270 | Miscellaneous Instructions 271 | ========================== 272 | 273 | :: 274 | 275 | [0x230]@uvm.kill_dependency (%val: T) -> T 276 | 277 | Return the same value as ``%val``, but ``%val`` does not carry a dependency to 278 | the return value. 279 | 280 | NOTE: This is supposed to free the compiler from keeping dependencies in 281 | some performance-critical cases. 282 | 283 | Native Interface 284 | ================ 285 | 286 | Object pinning 287 | -------------- 288 | 289 | :: 290 | 291 | [0x240]@uvm.native.pin (%opnd: T) -> uptr 292 | [0x241]@uvm.native.unpin (%opnd: T) 293 | 294 | *T* must be ``ref`` or ``iref`` for some U. 295 | 296 | - ``pin`` adds one instance of the reference ``%opnd`` to the pinning multiset 297 | of the current thread. Returns the mapped pointer to the bytes for the memory 298 | location. If *T* is ``ref``, it is equivalent to pinning the memory 299 | location of the whole object (as returned by the ``GETIREF`` instruction). If 300 | *opnd* is ``NULL``, the result is a null pointer whose address is 0. 301 | 302 | - ``unpin`` removes one instance of the reference ``%opnd`` from the pinning 303 | multiset of the current thread. It has undefined behaviour if no such an 304 | instance exists. 305 | 306 | Mu function exposing 307 | -------------------- 308 | 309 | :: 310 | 311 | [0x242]@uvm.native.expose [callconv] <[sig]> (%func: funcref, %cookie: int<64>) -> U 312 | 313 | *callconv* is a platform-specific calling convention flag. *U* is determined by 314 | the calling convention and *sig*. 315 | 316 | ``expose`` exposes a Mu function *func* as a value according to the calling 317 | convention *callConv* with cookie *cookie*. 318 | 319 | Example:: 320 | 321 | .funcdef @foo VERSION ... <@foo_sig> (...) { ... } 322 | 323 | %ev = COMMINST @uvm.native.expose [#DEFAULT] <[@foo_sig]> 324 | 325 | :: 326 | 327 | [0x243]@uvm.native.unexpose [callconv] (%value: U) 328 | 329 | *callconv* is a platform-specific calling convention flag. *U* is determined by 330 | the calling convention. 331 | 332 | ``unexpose`` removes the exposed value. 333 | 334 | :: 335 | 336 | [0x244]@uvm.native.get_cookie () -> int<64> 337 | 338 | If a Mu function is called via its exposed value, this instruction returns the 339 | attached cookie. Otherwise it returns an arbitrary value. 340 | 341 | Metacircular Client Interface 342 | ============================= 343 | 344 | These are additional instructions that enables Mu IR programs to behave like a 345 | client. 346 | 347 | Some types and signatures are pre-defined. They are always available. Note that 348 | the following are not strict text IR syntax because some types are defined in 349 | line:: 350 | 351 | .typedef @uvm.meta.bytes = hybrid int<8>> // ID: 0x260 352 | .typedef @uvm.meta.bytes.r = ref<@uvm.meta.bytes.r> // ID: 0x261 353 | .typedef @uvm.meta.refs = hybrid ref> // ID: 0x262 354 | .typedef @uvm.meta.refs.r = ref<@uvm.meta.refs.r> // ID: 0x263 355 | 356 | .funcsig @uvm.meta.trap_handler.sig = (stackref int<32> ref) -> () // ID: 0x264 357 | 358 | In ``bytes`` and ``refs``, the fixed part is the length of the variable part. 359 | ``bytes`` represents a byte array. ASCII strings are also represented this way. 
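For illustration only (a sketch, assuming the fixed part of ``@uvm.meta.bytes``
is a 64-bit length and its variable part holds ``int<8>`` elements, as
described above; ``$n`` is a hypothetical HAIL name), the ASCII name ``@main``
could be materialised as a ``@uvm.meta.bytes`` object in HAIL notation (see the
HAIL document)::

    .newhybrid $n <@uvm.meta.bytes> 5               // $n is a hypothetical name
    .init $n = {5 {0x40 0x6d 0x61 0x69 0x6e}}       // length 5, then the bytes of "@main"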
360 | 361 | ID/name conversion 362 | ------------------ 363 | 364 | :: 365 | 366 | [0x250]@uvm.meta.id_of (%name: @uvm.meta.bytes.r) -> int<32> 367 | [0x251]@uvm.meta.name_of (%id: int<32>) -> @uvm.meta.bytes.r 368 | 369 | - ``id_of`` converts a textual Mu name ``%name`` to the numerical ID. The name 370 | must be a global name. 371 | 372 | - ``name_of`` converts the ID ``%id`` to its corresponding name. If the name 373 | does not exist, it returns ``NULL``. The returned object must not be modified. 374 | 375 | They have undefined behaviours if the name or the ID in the argument do not 376 | exist, or ``%name`` is ``NULL``. 377 | 378 | Bundle/HAIL loading 379 | ------------------- 380 | 381 | :: 382 | 383 | [0x252]@uvm.meta.load_bundle (%buf: @uvm.meta.bytes.r) 384 | [0x253]@uvm.meta.load_hail (%buf: @uvm.meta.bytes.r) 385 | 386 | ``load_bundle`` and ``load_hail`` loads Mu IR bundles and HAIL scripts, 387 | respectively. ``%buf`` is the content. 388 | 389 | TODO: These comminsts should be made optional, and the IR Builder API should 390 | be provided as comminsts, too. 391 | 392 | Stack introspection 393 | ------------------- 394 | 395 | :: 396 | 397 | [0x254]@uvm.meta.new_cursor (%stack: stackref) -> framecursorref 398 | [0x255]@uvm.meta.next_frame (%cursor: framecursorref) 399 | [0x256]@uvm.meta.copy_cursor (%cursor: framecursorref) -> framecursorref 400 | [0x257]@uvm.meta.close_cursor (%cursor: framecursorref) 401 | 402 | In all cases, ``cursor`` and ``stack`` cannot be ``NULL``. 403 | 404 | - ``new_cursor`` allocates a frame cursor, referring to the top frame of 405 | ``%stack``. Returns the frame cursor reference. 406 | 407 | - ``next_frame`` moves the frame cursor so that it refers to the frame below its 408 | current frame. 409 | 410 | - ``copy_cursor`` allocates a frame cursor which refers to the same frame as 411 | ``%cursor``. Returns the frame cursor reference. 412 | 413 | - ``close_cursor`` deallocates the cursor. 414 | 415 | :: 416 | 417 | [0x258]@uvm.meta.cur_func (%cursor: framecursorref) -> int<32> 418 | [0x259]@uvm.meta.cur_func_Ver (%cursor: framecursorref) -> int<32> 419 | [0x25a]@uvm.meta.cur_inst (%cursor: framecursorref) -> int<32> 420 | [0x25b]@uvm.meta.dump_keepalives (%cursor: framecursorref) -> @uvm.meta.refs.r 421 | 422 | These functions operate on the frame referred by ``%cursor``. In all cases, 423 | ``%cursor`` cannot be ``NULL``. 424 | 425 | - ``cur_func`` returns the ID of the frame. Returns 0 if the frame is native. 426 | 427 | - ``cur_func_ver`` returns the ID of the current function version of the frame. 428 | Returns 0 if the frame is native, or the function of the frame is undefined. 429 | 430 | - ``cur_inst`` returns the ID of the current instruction of the frame. Returns 0 431 | if the frame is just created, its function is undefined, or the frame is 432 | native. 433 | 434 | - ``dump_keepalives`` dumps the values of the keep-alive variables of the 435 | current instruction in the frame. If the function is undefined, the arguments 436 | are the keep-alive variables. Cannot be used on native frames. The return 437 | value is a list of object references, each of which refers to an object which 438 | has type *T* and contains value *v*, where *T* and *v* are the type and the 439 | value of the corresponding keep-alive variable, respectively. 
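As a non-normative sketch (the exact ``COMMINST`` text syntax is defined in the
Instruction Set document, and ``%stack`` is assumed to be a valid stack
reference obtained elsewhere, e.g. inside a trap handler), a client-like Mu
program could inspect the top two frames of a stack as follows::

    %cur = COMMINST @uvm.meta.new_cursor (%stack)       // cursor referring to the top frame
    %fid = COMMINST @uvm.meta.cur_func (%cur)            // ID of that frame's function, 0 if native
    COMMINST @uvm.meta.next_frame (%cur)                  // move down to the frame below
    %kas = COMMINST @uvm.meta.dump_keepalives (%cur)      // keep-alive values of that frame
    COMMINST @uvm.meta.close_cursor (%cur)                // cursors must be closed when no longer needed

This mirrors the cursor-based stack introspection functions of the C-based
client API, but with statically typed arguments (see "Notes about dynamism"
below).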
440 | 441 | On-stack replacement 442 | -------------------- 443 | 444 | :: 445 | 446 | [0x25c]@uvm.meta.pop_frames_to (%cursor: framecursorref) 447 | [0x25d]@uvm.meta.push_frame <[sig]> (%stack: stackref, %func: funcref) 448 | 449 | ``%cursor``, ``%stack`` and ``%func`` must not be ``NULL``. 450 | 451 | - ``pop_frames_to`` pops all frames above ``%cursor``. 452 | 453 | - ``push_frame`` creates a new frame on top of the stack ``%stack`` for the 454 | current version of the Mu function ``%func``. ``%func`` must have the 455 | signature ``sig``. 456 | 457 | Watchpoint operations 458 | --------------------- 459 | 460 | :: 461 | 462 | [0x25e]@uvm.meta.enable_watchpoint (%wpid: int<32>) 463 | [0x25f]@uvm.meta.disable_watchpoint (%wpid: int<32>) 464 | 465 | - ``enable_watchpoint`` enables all watchpoints of watchpoint ID ``%wpid``. 466 | - ``disenable_watchpoint`` disables all watchpoints of watchpoint ID ``%wpid``. 467 | 468 | Trap handling 469 | ------------- 470 | 471 | :: 472 | 473 | [0x260]@uvm.meta.set_trap_handler (%handler: funcref<@uvm.meta.trap_handler.sig>, %userdata: ref) 474 | 475 | This instruction registers a trap handler. ``%handler`` is the function to be 476 | called and ``%userdata`` will be their last argument when called. 477 | 478 | This instruction overrides the trap handler registered via the C-based client 479 | API. 480 | 481 | A trap handler takes three parameters: 482 | 483 | 1. The stack where the trap takes place. 484 | 2. The watchpoint ID, or 0 if triggered by the ``TRAP`` instruction. 485 | 3. The user data, which is provided when registering. 486 | 487 | A trap handler is run by the same Mu thread that caused the trap and is executed 488 | on a new stack. 489 | 490 | A trap handler *usually* terminates by either executing the ``@uvm.thread_exit`` 491 | instruction (probably also kill the old stack before exiting), or ``SWAPSTACK`` 492 | back to another stack while killing the stack the trap handler was running on. 493 | 494 | Notes about dynamism 495 | -------------------- 496 | 497 | These additional instructions are not dynamic. Unlike the C-based API, these 498 | instructions do not use handles. Arguments, such as the additional arguments of 499 | ``push_frame`` are also statically typed. If the client needs dynamically typed 500 | handles, it can always make its own. For example, ``push_frame`` can be wrapped 501 | by a Mu function which takes a dynamic argument list, checks the argument types, 502 | and executes a static ``@uvm.meta.push_frame`` instruction on the unboxed 503 | values. 504 | 505 | Some dynamic lookups, such as looking up constants by ID, are not available, 506 | either. It can be worked around by maintaining a ``HashMap`` (in the 507 | form of Mu IR programs) which is updated with each bundle loading. In other 508 | words, if the client does not maintain such a map, Mu will have to maintain it 509 | for the client. 510 | 511 | .. vim: tw=80 512 | -------------------------------------------------------------------------------- /hail.rest: -------------------------------------------------------------------------------- 1 | =========================================== 2 | Heap Allocation and Initialisation Language 3 | =========================================== 4 | 5 | **HAIL may not be the best tool**. The most efficient way to initialise a micro 6 | VM is by building a boot image (this is implementation-specific). 
The most 7 | efficient way to load objects from a serialised file is to build the 8 | de-serialiser (such as the ".class" file parser) in Mu IR. 9 | 10 | The Heap Allocation and Initialisation Language (HAIL) is a Mu IR-like language 11 | that allocates heap objects and initialise Mu memory locations with values. 12 | 13 | It is designed to initialise language-specific objects, such as class 14 | meta-objects (e.g. the ``java.lang.Class`` objects and the virtual tables in JVM 15 | created during class loading), heap-allocated constant string objects, and 16 | language-level constants which are implemented as Mu-level global cells (because 17 | Mu does not allow "constant object references"). 18 | 19 | HAIL should be faster than initialising the memory via the client API, and more 20 | space-efficient than a naively implemented Mu function which creates and 21 | initialises objects by executing a (usually *very* long) sequence of ``NEW`` and 22 | ``STORE`` instructions. But keep in mind that is not the only efficient method. 23 | A well-written Mu program can read from a serialised file (e.g. the ".class" 24 | file in Java) and interpret it in a similar way the Mu micro VM interprets the 25 | HAIL script. The client can also rely on object pinning and initialise objects 26 | via pointers, bypassing the handle-based API. 27 | 28 | A **HAIL script** has a text format and a binary format. The text format is 29 | similar to the text-based Mu IR, and the binary format is similar to the binary 30 | form Mu IR. 31 | 32 | This document uses EBNF to define the text-form syntax. A non-terminal starts 33 | with a capital letter and a terminal starts with a lower-case letter. Literal 34 | characters are quoted within pairs of ``'`` or ``"``. ``*`` means repeating 0 or 35 | more times. ``+`` means repeating 1 or more times. ``?`` means optional. ``|`` 36 | means either its left-hand-side or its right-hand-side. ``|`` has the lowest 37 | precedence than simple concatenation; unary suffixes have the highest 38 | precedence. ``(`` and ``)`` group terms together to override the precedence. 39 | ``[`` and ``]`` denote a set of characters. 40 | 41 | Lexical Structures 42 | ================== 43 | 44 | Comments start with two slashes ``//`` and ends at the end of the line. White 45 | spaces between lexicons are ignored. 46 | 47 | In HAIL, a **HAIL name**, i.e. the name of a heap-allocated object, start with a 48 | dollar sign ``$`` followed by one or more characters in the set: 49 | ``[0-9a-zA-Z_-.]``:: 50 | 51 | hailName ::= '$' [0-9a-zA-Z_-.]+ 52 | 53 | The scope of a HAIL name is within a single HAIL script. In other words, they 54 | are temporary, and they become invalid as soon as the HAIL script is fully 55 | evaluated. Storing them to global cells is one way to keep references to the 56 | allocated objects. 57 | 58 | **Global name**, **integer literal**, **floating point literal** and **null 59 | literal** are defined the same way as in `Mu IR `__. They are 60 | denoted as ``globalName``, ``intLit``, ``floatLit``, ``'NULL'``, respectively. 61 | 62 | Expressions 63 | =========== 64 | 65 | An **LValue** specifies a memory location:: 66 | 67 | LValue ::= Name Index* 68 | 69 | Name ::= globalName | hailName 70 | 71 | Index ::= '[' IntExpr ']' 72 | 73 | IntExpr ::= intLit | globalName 74 | 75 | It is either a component of a global cell (when ``Name`` is a ``globalName``), 76 | or a component of a newly allocated heap object in the current HAIL script (when 77 | ``Name`` is a ``hailName``). 
If the name appears alone, the memory location is 78 | the global cell or the heap object itself. Its fields or elements can be 79 | selected using indices. The index can be an integer literal (``intLit``) or a 80 | global name of a Mu constant of ``int`` type of any ``n``, treated as 81 | unsigned integer. 82 | 83 | If a memory location ``l`` holds a struct or hybrid, then ``l[n]`` is its n-th 84 | field (n = 0, 1, 2...). Specifically, for hybrid, it means the n-th field in the 85 | fixed part. If n equals the number of fixed-part fields, it selects the variable 86 | part. In such cases, ``l[n][m]`` is the m-th element of the variable part. 87 | 88 | If a memory location ``l`` holds an array or a vector, then ``l[n]`` is its n-th 89 | element (n = 0, 1, 2...). 90 | 91 | An **RValue** specifies a Mu value:: 92 | 93 | RValue ::= globalName | intLit | floatLit | ``'NULL'`` 94 | | hailName | '&' LValue | '*' LValue | List 95 | 96 | List ::= '{' RValue* '}' 97 | 98 | It can be the address of an LValue, in which case the value is one of: 99 | 100 | - A Mu global SSA variable (constant, global cell, function, exposed function) 101 | (``globalName``) 102 | - An integer literal (``intLit``) 103 | - A floating point literal (``floatLit``) 104 | - The ``NULL`` value of an appropriate type (``'NULL'``) 105 | - An object reference to an object just created in HAIL (``hailName``) 106 | - An internal reference of an LValue (``'&' LValue``) 107 | - The current value held at the memory location of an LValue (``'*' LValue``) 108 | - A list of 0 or more RValue. 109 | 110 | See *memory initialisation* below for more details. 111 | 112 | Top-level Definitions 113 | ===================== 114 | 115 | Top-level definitions in HAIL include **fixed object allocation**, **hybrid 116 | allocation** and **memory initialisation**. All object allocations are evaluated 117 | before any memory initialisation definitions are evaluated. 118 | 119 | A *fixed object allocation* allocates a fixed-size object:: 120 | 121 | FixedAlloc ::= '.new' hailName '<' Type '>' 122 | 123 | Type ::= globalName 124 | 125 | 126 | where ``hailName`` is a HAIL name of the allocated object, and ``Type`` is the 127 | global name of the type of the object. ``Type`` must not be a ``hybrid`` type. 128 | 129 | A *hybrid allocation* allocates a hybrid:: 130 | 131 | HybridAlloc ::= '.newhybrid' hailName '<' Type '>' IntExpr 132 | 133 | where ``hailName`` and ``Type`` are the name and the type. ``IntExpr`` specifies 134 | the length of the variable part of the hybrid, which can either be an integer 135 | literal, or a Mu ``int`` constant of any ``n``, treated as unsigned integer. 136 | ``Type`` must be a ``hybrid`` type. 137 | 138 | A *memory initialisation* initialises a memory location:: 139 | 140 | MemInit ::= '.init' LValue = RValue 141 | 142 | ``RValue`` must be appropriate for the ``LValue`` type. Specifically, the star 143 | notation ``*LValue`` copies the value from the memory location of the ``LValue`` 144 | after the ``*``. It is applicable to all types as long as the type matches the 145 | ``LValue`` being written to. In addition: 146 | 147 | - If ``LValue`` is ``int``, ``uptr`` or ``ufuncptr``, then ``RValue`` 148 | can be an ``intLit``, a constant of the same ``LValue`` type. 149 | 150 | - If ``LValue`` is ``float`` or ``double``, then ``RValue`` can be a 151 | ``floatLit`` or a Mu constant of the same type as ``LValue``. 
152 | 153 | - If ``LValue`` is ``ref`` or ``weakref``, then ``RValue`` can be ``NULL`` 154 | or a HAIL name of a newly created object. 155 | 156 | - If ``LValue`` is ``iref``, then ``RValue`` can be ``NULL``, the global name 157 | of a global cell, or an ``LValue`` (with the ``&`` sign). Implicit ``REFCAST`` 158 | applies. 159 | 160 | - If ``LValue`` is ``funcref``, then ``RValue`` can be ``NULL`` or the 161 | global name of a Mu function. Implicit ``REFCAST`` applies. 162 | 163 | - If ``LValue`` is ``stackref`` or ``threadref``, then the only applicable 164 | ``RValue`` is ``NULL``. 165 | 166 | - If ``LValue`` is ``tagref64``, then ``RValue`` can be the appropriate value 167 | suitable for ``double``, ``int<52>`` or ``struct int<6>>``. 168 | 169 | - If ``LValue`` is a struct, hybrid, array or vector, then ``RValue`` must be a 170 | ``List`` of ``RValue`` items. Each item will initialise a field or element of 171 | the composite type. The entire variable part of a hybrid is treated as one 172 | additional field to its fixed part fields, and is treated as an array of the 173 | actual length. The list may have less fields/elements of the ``LValue``, in 174 | which case only the first fields/elements are initialised, and others remain 175 | their old values. (Note: All newly allocated memory locations, including heap 176 | objects, stack cells and global cells, have initial values: 0 or NULL.) 177 | 178 | When assigning to an LValue of ``ref``, ``weakref``, ``iref``, 179 | ``funcref``, ``uptr`` or ``ufuncptr`` types, if the RValue only 180 | differs in the ``T`` or ``sig`` parameter, then implicit ``REFCAST`` or 181 | ``PTRCAST`` are applied. ``weakref`` and ``ref`` can be assigned to each other. 182 | ``PTRCAST`` can only change the type/sig parameters ``T`` and ``sig``, but not 183 | the base type ``int``, ``uptr`` and ``ufuncptr``. (Note: This makes sub-class 184 | instances assignable to a location that refers to a super-class instance.) 185 | 186 | Multiple top-level definitions are applied in the order they appear in the HAIL 187 | script. In order to deal with cyclic references, it is advisable to put ``.new`` 188 | and ``.newhybrid`` before ``.init``. 189 | 190 | Memory Order 191 | ============ 192 | 193 | All loads (via ``*LValue``) and stores (via ``.init``) are non-atomic. In 194 | ``.init``, it has undefined behaviour if any values in the ``LValue`` in the 195 | left-hand-side of the ``=`` is accessed by the right-hand-side ``RValue``. 196 | 197 | NOTE: This is to say, don't load from the memory location being initialised 198 | because the implementation may write into the left-hand-side in any order. 199 | 200 | Example 201 | ======= 202 | 203 | Example 1:: 204 | 205 | // Assume the following definitions in a previously loaded Mu IR bundle. 
206 | // .typedef @i8 = int<8> 207 | // .typedef @i32 = int<32> 208 | // .typedef @i64 = int<64> 209 | // .typedef @float = float 210 | // .typedef @double = double 211 | // 212 | // .typedef @NakedArray = hybrid<@i8> // no fields in the fixed part 213 | // .typedef @LengthedArray = hybrid<@i32 @i8> // no fields in the fixed part 214 | // .typedef @JavaStyleArray = hybrid<@TID @i32 @i8> // two fields in the fiexed part 215 | // 216 | // .typedef @TID = int<64> 217 | // .typedef @SmallFloatArray = array<@float 4> 218 | // .typedef @irefi64 = iref<@i64> 219 | // .typedef @Object = struct<@TID @SmallFloatArray @double @irefi64> 220 | // 221 | // .typedef @vtable = hybrid<...> 222 | // .typedef @vtable_r = ref<@vtable> 223 | // .global @g_vtable <@vtable_r> 224 | // 225 | // .typedef @Object2 = struct<@vtable_r @i64> 226 | // 227 | // .typedef @LinkedList = struct<@i64 @LinkedList_r> 228 | // .typedef @LinkedList_r = ref<@LinkedList> 229 | // 230 | // .typedef @irefi64 = iref<@i64> 231 | // 232 | // .const @MAGICAL_NUMBER <@i64> = 42 233 | // .const @PI <@double> = 3.14d 234 | // 235 | // .global @my_global <@i64> 236 | // .global @a_global_iref_cell <@irefi64> 237 | // .global @another_global_iref_cell <@irefi64> 238 | // 239 | // .global @my_favourite_linked_list_node <@LinkedList> 240 | // 241 | 242 | 243 | .new $my_long_obj <@i64> 244 | .init $my_long_obj = 0x123456789abcdef0 245 | 246 | .newhybrid $my_array1 <@NakedArray> 4 247 | .newhybrid $my_array2 <@LengthedArray> 10000 248 | .newhybrid $my_array3 <@JavaStyleArray> @MAGICAL_NUMBER 249 | 250 | .init $my_array1 = {1 2 3 4} 251 | 252 | .init $my_array2 = {100 // claim length 100, while the capacity is really 10000 253 | {0 1 2 3 4}} // Only init 5 elems 254 | .init $my_array2[1][99] = 99 // Also init the 99-th elem 255 | 256 | .init $my_array3 = {1001 42 {1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 257 | 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 258 | 7 8 9 0 1 2}} 259 | 260 | .new $my_obj <@Object> 261 | 262 | .init $my_obj = {@MAGICAL_NUMBER {1.0f 2.0f 3.0f 4.0f} @PI @my_global} 263 | 264 | .new $my_obj2 <@Object2> 265 | 266 | // This object has a pointer to an existing v-table allocated before loading 267 | // this HAIL script. A reference to the v-table is held in the global cell @g_vtable. 268 | // The star notation *@g_vtable loads the value from an LValue and assign it 269 | // to the field. 270 | .init $my_obj2 = {*@g_vtable 42} 271 | 272 | .new $node0 <@LinkedList> 273 | .new $node1 <@LinkedList> 274 | .new $node2 <@LinkedList> 275 | 276 | .init $node0 = {0 $node1} // All objects are allocated before init. 277 | .init $node1 = {1 $node0} // so they can form a ring 278 | 279 | .init $node2 = {2 NULL} // Isolated node 280 | 281 | // Global cells can be initialised, too. 282 | .init @my_global = -1 283 | 284 | // @a_global_iref_cell will hold an iref<@i64> to the global cell @my_global 285 | .init @a_global_iref_cell = &@myglobal 286 | 287 | // Equivalent. The global variable @myglobal is already an iref. 288 | .init @a_global_iref_cell = @myglobal 289 | 290 | .new $foo <@i64> 291 | 292 | // This refers into the gut of $foo 293 | .init @another_global_iref_cell = &$foo 294 | 295 | // In fact, all objects except $node0 and $node1 will become garbages after 296 | // this HAIL script is fully evaluated. 297 | .init @my_favourite_linked_list_node = $node0 298 | 299 | Example 2: String constant initialisation. 
In order to keep references to these 300 | objects, we need to store them to global cells:: 301 | 302 | // Assume the following Java code: 303 | // System.out.println("Hello world!"); 304 | // 305 | // We want to create a String object for the string literal "Hello world!". 306 | // In a real JVM, more strings would be created for class names and method 307 | // names for reflection. 308 | // 309 | // We assume the Java class loader defines the String like this: 310 | // .typedef @RefString = ref <@String> 311 | // .typedef @String = struct <@TID @RefCharArray @i32 @i32> // tid, buf, begin, size 312 | // .typedef @RefCharArray = ref <@CharArray> 313 | // .typedef @CharArray = hybrid <@ArrayHeader @i16> // header, elements 314 | // .typedef @ArrayHeader = struct <@TID @i32> // tid, length 315 | // 316 | // It makes a global cell to store a reference to the String: 317 | // .global @const_hello_world <@RefString> 318 | // 319 | // Then we can create and initialise the string in HAIL: 320 | 321 | .new $hw <@String> // The String object 322 | .newhybrid $hwbuf <@CharArray> 12 // The underlying array 323 | 324 | .init $hw = {0xabcd $hwbuf 0 12} 325 | .init $hwbuf = {{0x1234 12} {0x48 0x65 0x6c 0x6c 0x6f 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21}} 326 | 327 | .init @const_hello_world = $hw // Store it to the global cell. 328 | 329 | // Then out.println("Hello world!") can be compiled to: 330 | // %1 = LOAD <@RefString> @const_hello_world 331 | // CALL <@sig1> @PrintStream.println (%out %1) // in the real world it may need dynamic dispatching 332 | 333 | Binary Form 334 | =========== 335 | 336 | A binary HAIL script starts with a 4-byte magic '\x7f' 'H' 'A' 'I', or 0x7f 0x48 337 | 0x41 0x49. 338 | 339 | HAIL IDs are the counterpart of HAIL names. HAIL IDs are 32-bit integers. 0 is 340 | an invalid HAIL ID. HAIL ID has a different namespace from Mu IDs, i.e. they 341 | refer to different things even if their values are equal. HAIL IDs only refer to 342 | heap-allocated objects in the current HAIL script. 343 | 344 | In the following paragraphs, binary types defined in `Mu IR Binary Form 345 | `__ are used. For convenience, we use "hID" for HAIL ID and "mID" 346 | for Mu ID. 347 | 348 | A *fixed object allocation* definition has the form: 349 | 350 | +------+-----+------+ 351 | | opct | idt | idt | 352 | +======+=====+======+ 353 | | 0x01 | hID | type | 354 | +------+-----+------+ 355 | 356 | *hID* is the HAIL ID of the object. *type* is the Mu ID of the type. 357 | 358 | A *variable-length object allocation* definition has the form: 359 | 360 | +------+-----+------+--------+ 361 | | opct | idt | idt | i64 | 362 | +======+=====+======+========+ 363 | | 0x02 | hID | type | length | 364 | +------+-----+------+--------+ 365 | 366 | *hID* is the HAIL ID of the object. *type* is the Mu ID of the type. *length* is 367 | the length of the variable part. 368 | 369 | A *memory initialisation* definition has the form: 370 | 371 | +------+--------+--------+ 372 | | opct | LValue | RValue | 373 | +======+========+========+ 374 | | 0x03 | LValue | RValue | 375 | +------+--------+--------+ 376 | 377 | LValue: 378 | 379 | +------+------+---------+-----+ 380 | | Name | lent | IntExpr | ... | 381 | +======+======+=========+=====+ 382 | | Name | n | IntExpr | ... | 383 | +------+------+---------+-----+ 384 | 385 | *n* is the number of *IntExpr* following. 
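For example (an illustrative sketch only: it uses the *Name* and *intLit*
encodings defined below, and the HAIL ID 7 is made up), the textual LValue
``$my_array2[1][99]`` from Example 1 could be encoded as::

    0x04 7      // Name: tag 0x04 (a HAIL ID), hID = 7 standing for $my_array2
    2           // lent: two indices follow
    0x12 1      // IntExpr: intLit 1
    0x12 99     // IntExpr: intLit 99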
386 | 387 | Name: 388 | 389 | +------+-----+ 390 | | opct | idt | 391 | +======+=====+ 392 | | tag | id | 393 | +------+-----+ 394 | 395 | If *tag* is 0x04, *id* is the HAIL ID; if *tag* is 0x05, *id* is the Mu ID. 396 | 397 | IntExpr can be intLit or a global name (Name with tag=2) 398 | 399 | intLit: 400 | 401 | +------+-----+ 402 | | opct | i64 | 403 | +======+=====+ 404 | | 0x12 | lit | 405 | +------+-----+ 406 | 407 | *lit* is the literal. There is currently no way to express integer literals 408 | longer than 64 bits. *lit* is truncated or zero-extended to the LValue length. 409 | 410 | *RValue* can be one of the following: 411 | 412 | 1. A Mu global SSA variable: 413 | 414 | +------+-----+ 415 | | opct | idt | 416 | +======+=====+ 417 | | 0x11 | gv | 418 | +------+-----+ 419 | 420 | gv it the ID of the global SSA variable. 421 | 422 | 2. An integer literal (see intLit above) 423 | 424 | 3. A 32-bit float literal 425 | 426 | +------+-------+ 427 | | opct | float | 428 | +======+=======+ 429 | | 0x13 | value | 430 | +------+-------+ 431 | 432 | 4. A 64-bit float literal 433 | 434 | +------+--------+ 435 | | opct | double | 436 | +======+========+ 437 | | 0x14 | value | 438 | +------+--------+ 439 | 440 | 5. A ``NULL`` literal 441 | 442 | +------+ 443 | | opct | 444 | +======+ 445 | | 0x15 | 446 | +------+ 447 | 448 | 6. An object reference to an object allocated in HAIL 449 | 450 | +------+-----+ 451 | | opct | idt | 452 | +======+=====+ 453 | | 0x16 | id | 454 | +------+-----+ 455 | 456 | *id* is the HAIL id. 457 | 458 | 7. An internal reference of an LValue 459 | 460 | +------+--------+ 461 | | opct | LValue | 462 | +======+========+ 463 | | 0x17 | LValue | 464 | +------+--------+ 465 | 466 | 8. A list of other values of any kinds. 467 | 468 | +------+--------+--------+--------+--------+ 469 | | opct | i64 | RValue | RValue | ... | 470 | +======+========+========+========+========+ 471 | | 0x18 | nelems | rv1 | rv2 | ... | 472 | +------+--------+--------+--------+--------+ 473 | 474 | *nelems* is the number of RValues following it. This structure is recursive. 475 | 476 | 9. A list of other values of the same kind of literals. 477 | 478 | +------+--------+------+---------+---------+--------+ 479 | | opct | i64 | opct | literal | literal | ... | 480 | +======+========+======+=========+=========+========+ 481 | | 0x19 | nelems | kind | lit1 | lit2 | ... | 482 | +------+--------+------+---------+---------+--------+ 483 | 484 | *nelems* is the number of literals following. *kind* can be one in the following 485 | table, and *kind* determines the following *literal* element type. 486 | 487 | =========== ============== 488 | opct literal type 489 | ----------- -------------- 490 | 0x12 i64 491 | 0x13 float 492 | 0x14 double 493 | 0x1a i8 494 | 0x1b i16 495 | 0x1c i32 496 | =========== ============== 497 | 498 | This allows more compact encoding of large arrays of simple elements. The 499 | literal type, however, does not need to match the actual type of the LValue, 500 | because implicit truncating or zero-extension always happen. 501 | 502 | Future Work 503 | =========== 504 | 505 | The binary format is not the most efficient format possible. HAIL is still an 506 | interpreted format, even when it is binary. It is designed to be convenient, 507 | reasonably efficient and platform-independent. 508 | 509 | There could be implementation-specific ways of serialising data faster than this 510 | portable interface. 511 | 512 | The native interface can also potentially outperform HAIL. 
Using object pinning 513 | and pointers, Mu IR programs can directly memcpy data from files, such as 514 | copying strings from ``.class`` files. 515 | 516 | .. vim: tw=80 517 | -------------------------------------------------------------------------------- /memory-model.rest: -------------------------------------------------------------------------------- 1 | ============ 2 | Memory Model 3 | ============ 4 | 5 | The Mu memory model basically follows the C11 memory model with a few 6 | modifications to make it suitable for Mu. 7 | 8 | Overview 9 | ======== 10 | 11 | Mu does not enforce any strong order, but trusts the client to correctly use the 12 | atomic and ordering mechanisms provided by Mu. Many choices of ordering, from 13 | relaxed to sequentially consistent, on each memory operation are given to the 14 | client. The client has the freedom to make its choice and has the responsibility 15 | to synchronise their multi-threaded programs. 16 | 17 | The most restricted form of memory accesses are sequentially consistent. Mu 18 | guarantees that they are atomic. They follow the well-known acquire-release 19 | memory model and there is a total order of all such memory accesses in all 20 | threads. 21 | 22 | A less-restricted form is acquire and release. It does not have the total 23 | order provided by sequential consistency, but atomicity and the synchronise-with 24 | relationship between release and acquire operations are provided. 25 | 26 | The "consume" order exploits the fact that some processors with relaxed memory 27 | order can figure out the dependencies between load operations in the hardware 28 | and will not reorder them even without memory fences. It can achieve some 29 | synchronisation requirements more efficiently than the "acquire" order. 30 | 31 | The "relaxed" order only guarantees atomicity but does not enforce any order. 32 | The most unrestricted form of memory access is not atomic. These operations 33 | allows the Mu implementation and the processor to maximise the throughput while 34 | relying on the programmer to correctly synchronise their programs. 35 | 36 | Notable differences from C11 37 | ---------------------------- 38 | 39 | The program order in Mu is a total order while the sequenced-before relationship 40 | in C is a partial order, because there are unspecified order of evaluations in 41 | C. 42 | 43 | There is no "atomic type" in Mu. Only operations make a difference between 44 | atomic and non-atomic accesses. Using both atomic and non-atomic operations on 45 | the same memory location is an undefined behaviour in Mu. 46 | 47 | The primitives for atomic accesses and fences are provided by the instruction 48 | set of Mu rather than the library. Mutex locks, however, have to be implemented 49 | on top of this memory model. 50 | 51 | Notable differences from LLVM 52 | ----------------------------- 53 | 54 | LLVM does not guarantee that all atomic writes to a memory location has a total 55 | order unless using ``monotonic`` or stronger memory order. It provides an 56 | ``unordered`` order which is atomic but not "monotonic". The ``unordered`` order 57 | is intended to support the Java memory model, but whether ``unordered`` is 58 | necessary and whether the ``relaxed`` order in C11 or the ``monotonic`` in LLVM 59 | is suitable for Java is not yet known. 
60 | 61 | In LLVM, an atomic operation can be labelled ``singlethread``, in which case it 62 | only synchronises with or participates in modification and ``seq_cst`` total 63 | orderings with other operations running in the same thread (for example, in 64 | signal handlers). C11 provides ``atomic_signal_fence`` for similar purposes. 65 | 66 | Concepts 67 | ======== 68 | 69 | data value 70 | See `Type System `__ 71 | 72 | SSA variable, instruction and evaluation 73 | See `Instruction Set `__ 74 | 75 | memory, initial value, load, store, access and conflict 76 | See `Mu and the Memory `__ 77 | 78 | thread 79 | A thread is the unit of CPU scheduling. In this memory model, threads 80 | include but are not limited to Mu threads. See `Threads and Stacks `__ for the 81 | definition of Mu threads. 82 | 83 | stack, stack binding, stack unbinding, swap-stack 84 | See `Threads and Stacks `__ 85 | 86 | futex, futex_wait, futex_wake 87 | See `Threads and Stacks `__ 88 | 89 | Comparison of Terminology 90 | ------------------------- 91 | 92 | The following table is a approximate comparison and may not strictly apply. 93 | 94 | =================== ================================ 95 | C Mu 96 | =================== ================================ 97 | value data value 98 | expression SSA variable 99 | object memory location 100 | memory location memory location of scalar type 101 | (N/A) object 102 | read load 103 | modify store 104 | =================== ================================ 105 | 106 | Operations 107 | ========== 108 | 109 | Operations include (but are not limited to) the following: 110 | 111 | load 112 | A memory load. May be atomic or not. 113 | 114 | store 115 | A memory store. May be atomic or not. 116 | 117 | atomic read-modify-write 118 | A load and (maybe conditionally) a store as one atomic action. It may 119 | contain both a load and a store operation, but may have special atomic 120 | properties. 121 | 122 | fence 123 | A fence introduces memory orders. 124 | 125 | stack binding 126 | Binding a thread to a stack. 127 | 128 | stack unbinding 129 | Unbinding a thread from a stack. 130 | 131 | swap-stack 132 | Unbinding a thread from a stack and bind that thread to another stack. 133 | 134 | futex wait 135 | Waiting on a memory location. 136 | 137 | futex wake 138 | Wake up threads waiting on a memory location. 139 | 140 | external operation 141 | Any other operation that may affect the state outside Mu. 142 | 143 | .. 144 | 145 | NOTE: Unlike the Java Memory Model, Mu memory model does not contain locks. 146 | 147 | Memory Operations 148 | ================= 149 | 150 | Some instructions and API functions perform memory operations. Specifically, 151 | 152 | - The ``LOAD`` instruction and the ``load`` API function perform a load 153 | operation. 154 | - The ``STORE`` instruction and the ``store`` API function perform a store 155 | operation. 156 | - The ``CMPXCHG`` instruction and the ``cmpxchg`` API function perform a 157 | compare-exchange operation, which is a kind of atomic read-modify-write 158 | operation. 159 | - The ``ATOMICRMW`` instruction and the ``atomicrmw`` API function perform an 160 | atomic read-modify-write operation. 161 | - The ``FENCE`` instruction and the ``fence`` API function are a fence. 162 | - A concrete implementation may have other ways to perform those instructions. 163 | 164 | .. 165 | 166 | NOTE: Programs in other languages (e.g. 
native programs or any other 167 | language a Mu implementation can interface with) can synchronise with Mu in 168 | an implementation-specific way. But the implementation must guarantee that 169 | those programs perform those operations in a way compatible with Mu. 170 | 171 | For example, there are more than one way to implement loads and stores of 172 | the SEQ_CST order (either put fences in the load or in the store). If the 173 | implementation interfaces with a C implementation (e.g. 174 | gcc+glibc+Linux+x86_64), then Mu should do the same thing as (or be 175 | compatible with) the C program. 176 | 177 | Load, store, atomic read-modify-write operations and fences have memory orders, 178 | which are the following: 179 | 180 | - NOT_ATOMIC 181 | - RELAXED 182 | - CONSUME 183 | - ACQUIRE 184 | - RELEASE 185 | - ACQ_REL (acquire and release) 186 | - SEQ_CST (sequentially consistent) 187 | 188 | All accesses that are not NOT_ATOMIC are atomic. Using both non-atomic 189 | operations and atomic operations on the same memory location is an undefined 190 | behaviour. 191 | 192 | - Load shall have NOT_ATOMIC, RELAXED, CONSUME, ACQUIRE or SEQ_CST order. 193 | - Store shall have NOT_ATOMIC, RELAXED, RELEASE or SEQ_CST order. 194 | - Compare-exchange shall have RELAXED, ACQUIRE, RELEASE, ACQ_REL or SEQ_CST on 195 | success and RELAXED, ACQUIRE or SEQ_CST on failure. 196 | - Other atomic read-modify-write operations shall have RELAXED, ACQUIRE, 197 | RELEASE, ACQ_REL or SEQ_CST order. 198 | - Fence shall have ACQUIRE, RELEASE, ACQ_REL or SEQ_CST order. 199 | 200 | =========== ======= ======= =============== =============== =========== ===== 201 | Order LOAD STORE CMPXCHG(succ) CMPXCHG(fail) ATOMICRMW FENCE 202 | =========== ======= ======= =============== =============== =========== ===== 203 | NOT_ATOMIC yes yes no no no no 204 | RELAXED yes yes yes yes yes no 205 | CONSUME yes no no no no no 206 | ACQUIRE yes no yes yes yes yes 207 | RELEASE no yes yes no yes yes 208 | ACQ_REL no no yes no yes yes 209 | SEQ_CST yes yes yes yes yes yes 210 | =========== ======= ======= =============== =============== =========== ===== 211 | 212 | - A load operation with ACQUIRE, ACQ_REL or SEQ_CST order performs a **acquire** 213 | operation on its specified memory location. 214 | - A load operation with CONSUME order performs a **consume** operation on its 215 | specified memory location. 216 | - A store operation with RELEASE, ACQ_REL or SEQ_CST order performs a 217 | **release** operation on its specified memory location. 218 | - A fence with ACQUIRE, ACQ_REL or SEQ_CST order is a **acquire fence**. 219 | - A fence with RELEASE, ACQ_REL or SEQ_CST order is a **release fence**. 220 | 221 | Orders 222 | ====== 223 | 224 | Program Order 225 | ------------- 226 | 227 | All evaluations performed by a Mu thread form a total order, in which the 228 | operations performed by each evaluation are **sequenced before** operations 229 | performed by its successor. 230 | 231 | All operations performed by a Mu client via a particular client context of the 232 | API form a total order, in which each operation is **sequenced before** its 233 | successor. 234 | 235 | Operations before a ``TRAP`` or ``WATCHPOINT`` are **sequenced before** the 236 | operations in the trap handler. Operations in the trap handler are **sequenced 237 | before** operations after that ``TRAP`` or ``WATCHPOINT``. 238 | 239 | The **program order** contains operations and their "sequenced before" 240 | relations. 
241 | 242 | NOTE: This means all Mu instructions plus all client operations done by the 243 | trap handler in a Mu thread still forms a total order. 244 | 245 | In C, the program order is a partial order even in a single thread because 246 | of unspecified order of evaluations. 247 | 248 | Modification Order 249 | ------------------ 250 | 251 | All atomic store operations on a particular memory location M occur in some 252 | particular total order, called the **modification order** of M. If A and B are 253 | atomic stores on memory location M, and A happens before B, then A shall precede 254 | B in the modification order of M. 255 | 256 | NOTE: This is to say, the modification order is consistent with the happens 257 | before order. 258 | 259 | NOTE: This reflects the mechanisms, including cache coherence, provided by 260 | some hardware that guarantees such a total order. 261 | 262 | A **release sequence** headed by a release operation A on a memory location M is 263 | a maximal contiguous sub-sequence of atomic store operations in the modification 264 | order M, where the first operation is A and every subsequent operation either is 265 | performed by the same thread that performed the release or is an atomic 266 | read-modify-write operation. 267 | 268 | NOTE: In Mu, when a memory location is accessed by both atomic and 269 | non-atomic operations, it is an undefined behaviour. So the release sequence 270 | only apply for memory locations only accessed by atomic operations. 271 | 272 | NOTE: Intuitively, there is a invisible fence before a release store (which 273 | is sometimes actually implemented as this). Seeing a store in the release 274 | sequence should imply seeing stores before the invisible fence. 275 | 276 | The Synchronises With Relation 277 | ------------------------------ 278 | 279 | An evaluation A **synchronises with** another evaluation B if: 280 | 281 | - A performs a release operation on memory location M, and, B performs an 282 | acquire operation on M, and, sees a value stored by an operation in the 283 | release sequence headed by A, or 284 | - A is a release fence, and, B is an acquire fence, and, there exist atomic 285 | operations X and Y, both operating on some memory location M, such that A is 286 | sequenced before X, X store into M, Y is sequenced before B, and Y sees the 287 | value written by X or a value written by any store operation in the 288 | hypothetical release sequence X would head if it were a release operation, or 289 | - A is a release fence, and, B is an atomic operation that performs an acquire 290 | operation on a memory location M, and, there exists an atomic operation X such 291 | that A is sequenced before X, X stores into M, and B sees the value written by 292 | X or a value written by any store operations in the hypothetical release 293 | sequence X would head if it were a release operation, or 294 | - A is an atomic operation that performs a release operation on M, and, B is an 295 | acquire fence, and, there exists some atomic operation X on M such that X is 296 | sequenced before B and sees the value written by A or a value written by any 297 | side effect in the release sequence headed by A, or 298 | - A is the creation of a thread and B is the beginning of the execution of the 299 | new thread. 300 | - A is a futex wake operation and B is the next operation after the futex wait 301 | operation of the thread woken up by A. 302 | 303 | .. 
304 | 305 | NOTE: A thread can be created by the ``NEWTHREAD`` instruction or the 306 | ``new_thread`` API function. 307 | 308 | NOTE: Since there is no explicit heap memory management in Mu, the 309 | "synchronises with" relation in C involving ``free`` and ``realloc`` does 310 | not apply in Mu. 311 | 312 | NOTE: Mu only provides very primitive threading support. The "synchronises 313 | with" relations involving ``call_once`` and ``thrd_join`` are not in the 314 | memory model, but can be implemented on a higher level. 315 | 316 | NOTE: The "synchronises with" relation between the futex wake and wait is 317 | necessary to ensure the visibility of values written by one thread to be 318 | visible immediately by the woken thread. If such relation does not exist, 319 | the woken thread may never see the memory change made by the other thread. 320 | For example:: 321 | 322 | // C pseudo code 323 | int shared_var = 42; 324 | atomic_int futex = 0; 325 | 326 | thread1 { 327 | shared_var = 43; 328 | futex = 1; // Op1 329 | futex_wake(&futex); // Op2 330 | } 331 | 332 | thread2 { 333 | while(futex == 0) { // Op4 334 | futex_wait(&futex, 0); // Op3 335 | } 336 | int local_var = shared_var; 337 | } 338 | 339 | If the "synchronises with" between Op2 and Op3 does not exist, then Op4 may 340 | never see the value written by Op1, and thread2 will loop indefinitely. 341 | 342 | Dependency 343 | ---------- 344 | 345 | An evaluation A **carries a dependency to** another evaluation B, or B *carries 346 | a dependency from* A, if: 347 | 348 | - the data value of A is used as a data argument of B unless: 349 | 350 | * A is used in the ``KEEPALIVE`` clause of B, or 351 | * B is a ``SELECT`` instruction and A is its ``cond`` argument or a is the 352 | ``iftrue`` or ``iffalse`` argument not selected by ``cond``, or 353 | * A is a comparing or ``INSERTVALUE`` instruction, or 354 | * B is a ``@uvm.kill_dependency``, ``CALL``, ``EXTRACTVALUE`` or ``CCALL`` 355 | instruction, or 356 | 357 | - there is a store operation X such that A is sequenced before X and X is 358 | sequenced before B, and, X stores the value of A to a memory location M, and, 359 | B performs a load operation from M, or 360 | - for some evaluation X, A carries a dependency to X and X carries a dependency 361 | to B. 362 | 363 | .. 364 | 365 | NOTE: The "carries a dependency to" relation together with the 366 | "dependency-ordered before"" relation exploits the fact that some 367 | processors, notably ARM and POWER, will not reorder load operations if the 368 | address used in the later in the program order depends on the result of the 369 | earlier load. On such processors, the earlier load can be implemented as an 370 | ordinary load without fences and still has "consume" semantic. 371 | 372 | NOTE: Processors including ARM and POWER only respects data dependency, not 373 | control dependency. The ``SELECT`` instruction and the comparing instruction 374 | are usually implemented by conditional moves or conditional flags, which 375 | would end up that the result is control-dependent on the argument rather 376 | than data dependent. 377 | 378 | NOTE: Operations involving ``struct`` types in Mu may be implemented as 379 | no-ops. 
Consider the following:: 380 | 381 | .typedef @i64 = int<64> 382 | .const @I64_0 <@i64> = 0 383 | 384 | .type @A = struct <@i64 @i64> 385 | .const @A_ZERO <@A> = {@I64_0 @I64_0} 386 | 387 | %v = LOAD CONSUME <@i64> %some_memory_location 388 | %x = INSERTVALUE <@A 0> @A_ZERO %v // {%v 0} 389 | %y = EXTRACTVALUE <@A 0> %x // %v 390 | %z = EXTRACTVALUE <@A 1> %x // 0 391 | 392 | Mu can alias ``%y`` with ``%v`` in the machine code, but ``%z`` is always a 393 | constant zero. 394 | 395 | NOTE: Dependencies may not always be carried across function calls. A 396 | function may return a constant and it is uncertain if any processor respect 397 | this order. 398 | 399 | An evaluation A is **dependency-ordered before** another evaluation B if any of 400 | the following is true: 401 | 402 | * A performs a release operation on a memory location M, and, in another thread, 403 | B performs a consume operation on M and sees a value stored by any store 404 | operations in the release sequence headed by A. 405 | * For some evaluation X, A is dependency-ordered before X and X carries a 406 | dependency to B. 407 | 408 | .. 409 | 410 | NOTE: The "dependency-ordered before" relation consists of a release/consume 411 | pair followed by zero or more "carries a dependency to" relations. If the 412 | consume sees the value of (or "later than") the release operation, then 413 | subsequent loads that depends on the consume operation should also see 414 | values stored before the release operation. 415 | 416 | .. 417 | 418 | TODO: The "carries a dependency to" relation is not well-defined for the 419 | client since it may be written in a different language. 420 | 421 | The Happens Before Relation 422 | --------------------------- 423 | 424 | An evaluation A **inter-thread happens before** an evaluation B if A 425 | synchronises with B, A is dependency-ordered before B, or, for some evaluation 426 | X: 427 | 428 | * A synchronises with X and X is sequenced before B, 429 | * A is sequenced before X and X inter-thread happens before B, or 430 | * A inter-thread happens before X and X inter-thread happens before B. 431 | 432 | .. 433 | 434 | NOTE: This basically allows any concatenations of "synchronises with", 435 | "dependency-ordered before" and "sequenced before" relations, but disallows 436 | ending with a "dependency-ordered before" relation followed by a "sequenced 437 | before" relation. It is disallowed because the consume load in the 438 | "dependency-ordered before" relation only respects later loads that works 439 | with a location that depends on the consume load, not arbitrary loads 440 | sequenced after it. It is only disallowed in the end because the release 441 | operation in a "synchronises with" relation or a "dependency-ordered before" 442 | relation will force the order between it and any preceding operations. 443 | 444 | NOTE: A sequence of purely "sequenced before" is not "inter-thread" and is 445 | also not allowed in the "inter-thread happens before" relation. 446 | 447 | An evaluation A **happens before** an evaluation B if A is sequenced before B or 448 | A inter-thread happens before B. 449 | 450 | Value Visibility 451 | ---------------- 452 | 453 | A load operation B from a memory location M shall see the initial value of M, 454 | the value stored by a store operation A sequenced before B, or other permitted 455 | values defined later. 
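    NOTE: As an illustration, consider the following C-like pseudo code, in the
    same style as the futex example above. ``store_release`` and
    ``load_acquire`` are not real functions; they only stand for a store with
    the RELEASE order and a load with the ACQUIRE order::

        // C pseudo code
        int data = 0;           // accessed non-atomically
        atomic_int flag = 0;    // accessed atomically

        thread1 {
            data = 42;                       // store A
            store_release(&flag, 1);         // release operation
        }

        thread2 {
            if (load_acquire(&flag) == 1) {  // acquire operation
                int r = data;                // load B
            }
        }

    If the acquire load sees the value 1, it synchronises with the release
    store, so the store A happens before the load B and B shall see 42. If the
    two accesses to ``data`` were not ordered by "happens before", they would
    form a data race (defined below) and the behaviour would be undefined.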
456 | 
457 | A **visible store operation** A to a memory location M with respect to a load 
458 | operation B from M satisfies the conditions: 
459 | 
460 | * A happens before B, and 
461 | * there is no other store operation X to M such that A happens before X and X 
462 | happens before B. 
463 | 
464 | A non-atomic load operation B from memory location M shall see the value stored 
465 | by the visible store operation A. 
466 | 
467 | NOTE: If there is ambiguity about which store operation is visible to a 
468 | non-atomic load operation, then there is a data race and the behaviour is 
469 | undefined. 
470 | 
471 | The **visible sequence of atomic store operations** to a memory location M with 
472 | respect to an atomic load operation B from M, is a maximal contiguous 
473 | sub-sequence of atomic store operations in the modification order of M, where 
474 | the first operation is visible with respect to B, and for every subsequent 
475 | operation, it is not the case that B happens before it. The atomic load 
476 | operation B sees the value stored by some atomic store operation in the visible 
477 | sequence of M. Furthermore, if an atomic load operation A from memory location M 
478 | happens before an atomic load operation B from M, and A sees a value stored by 
479 | an atomic store operation X, then the value B sees shall either equal the value 
480 | seen by A, or be the value stored by an atomic store operation Y, where Y 
481 | follows X in the modification order of M. 
482 | 
483 | NOTE: This means a load cannot see the value stored by an operation that 
484 | happens after it, or by a store operation separated from it by another store 
485 | in the happens-before relation. Furthermore, the later of two loads cannot 
486 | see an earlier value than that seen by the first load. 
487 | 
488 | The execution of a program contains a **data race** if it contains two 
489 | conflicting non-atomic memory accesses in different threads, neither of which 
490 | happens before the other. Any such data race results in undefined behaviour. 
491 | 
492 | NOTE: Using both atomic and non-atomic accesses on the same memory location 
493 | is already an undefined behaviour, whether in the same thread or not. 
494 | 
495 | Special Rules for SEQ_CST 
496 | ========================= 
497 | 
498 | There shall be a single total order S on all SEQ_CST operations, consistent with 
499 | the "happens before" order and modification orders for all affected memory 
500 | locations, such that each SEQ_CST load operation B from memory location M sees 
501 | one of the following values: 
502 | 
503 | * the result of the last store operation A to M that precedes B in S, if it 
504 | exists, or 
505 | * if A exists, the result of some store operation to M in the visible sequence 
506 | of atomic store operations with respect to B that is not SEQ_CST and does not 
507 | happen before A, or 
508 | * if A does not exist, the result of some store operation to M in the visible 
509 | sequence of atomic store operations with respect to B that is not SEQ_CST. 
510 | 
511 | For an atomic load operation B from a memory location M, if there is a SEQ_CST 
512 | fence X sequenced before B, then B observes either the last SEQ_CST store 
513 | operation of M preceding X in the total order S or a later store operation of M 
514 | in its modification order. 
515 | 516 | For atomic operations A and B on a memory location M, where A stores into M and 517 | B loads from M, if there is a SEQ_CST fence X such that A is sequenced before X 518 | and B follows X in S, then B observes either the effect of A or a later store 519 | operation of M in its modification order. 520 | 521 | For atomic operations A and B on a memory location M, where A stores into M and 522 | B loads from M, if there are SEQ_CST fences X and Y such that A is sequenced 523 | before X, Y is sequenced before B and X precedes Y in S, then B observes either 524 | the effect of A or a later store operation of M in its modification order. 525 | 526 | Special Rules for Atomic Read-modify-write Operations 527 | ===================================================== 528 | 529 | Atomic read-modify-write operations shall always see the last value (in the 530 | modification order) stored before the store operation associated with the 531 | read-modify-write operation. 532 | 533 | Special Rules for Stack Operations 534 | ================================== 535 | 536 | A swap-stack operation performs an unbinding operation followed by a binding 537 | operation. The former is sequenced before the latter. 538 | 539 | In the evaluation of a ``TRAP`` or ``WATCHPOINT`` instruction, the implied stack 540 | unbinding operation is sequenced before any operations performed by the client. 541 | If the client chooses to return and rebind the stack, the stack binding 542 | operation is sequenced after all operations performed by the client and the 543 | implied stack unbinding operation. 544 | 545 | Stack binding and unbinding operations are not atomic. If there is a pair of 546 | stack binding or unbinding operations on the same stack, but do not have a 547 | "happens before" relation, it has undefined behaviour. 548 | 549 | Special Rules for Futex 550 | ======================= 551 | 552 | The load operations performed by the ``@uvm.futex.wait``, 553 | ``@uvm.futex.wait_timeout`` and ``@uvm.futex.cmp_requeue`` on the memory 554 | location given by its argument are atomic. 555 | 556 | Special Rules for Functions and Function Redefinition 557 | ===================================================== 558 | 559 | The rules of memory access applies to functions as if 560 | 561 | * a function were a memory location that holds a function version, and 562 | 563 | * a creation of a frame for a function were an atomic load on that location of 564 | the RELAXED order, which sees a particular version, and 565 | 566 | * a function definition or redefinition during the load of a bundle were an 567 | atomic store on that location of the RELAXED order, which stores a new 568 | version. 569 | 570 | .. 571 | 572 | NOTE: A frame is created when: 573 | 574 | 1. calling a function by the ``CALL`` or ``TAILCALL`` instructions, or by 575 | native programs through exposed Mu functions, or 576 | 577 | 2. creating a new stack by the ``@uvm.new_stack`` instruction or the 578 | ``new_stack`` API, or 579 | 580 | 3. pushing a new frame by the ``push_frame`` API or the 581 | ``@uvm.meta.push_frame`` instruction. 582 | 583 | The order of definitions and redefinitions of a particular function is 584 | consistent with the order the bundles that contain the definitions are loaded. 585 | 586 | NOTE: This means synchronisation operations must be used to guarantee other 587 | threads other than the one which loads a bundle see the most recent version 588 | of a function. 
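    For example, in C-like pseudo code, ``load_bundle`` stands for whatever
    client action loads a bundle that redefines the function ``@foo``, and
    ``store_release``/``load_acquire`` stand for a RELEASE store and an ACQUIRE
    load::

        // C pseudo code
        atomic_int foo_v2_ready = 0;

        client_thread {
            load_bundle("foo_v2");            // redefinition: a RELAXED store of the new version
            store_release(&foo_v2_ready, 1);
        }

        other_thread {
            while (load_acquire(&foo_v2_ready) == 0) { /* wait */ }
            call_foo();                       // frame creation: a RELAXED load of the version
        }

    With the release/acquire pair, the redefinition happens before the frame
    creation, so the call activates the new version. Without it, the two
    RELAXED accesses are unordered and the call may activate either version.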
589 | 
590 | Out-of-thin-air or Speculative stores
591 | =====================================
592 | 
593 | TODO
594 | 
595 | .. vim: tw=80
596 | 
--------------------------------------------------------------------------------
/native-interface-x64-unix.rest:
--------------------------------------------------------------------------------
1 | ===========================
2 | AMD64 Unix Native Interface
3 | ===========================
4 | 
5 | This is the native interface for AMD64 on Unix-like operating systems.
6 | 
7 | Memory Layout
8 | =============
9 | 
10 | The memory layout of the native memory uses the `System V Application Binary
11 | Interface for x86-64 `__
12 | (referred to as "the AMD64 ABI" from now on) as a reference. It is recommended
13 | that the Mu memory also use this memory layout, but this is not required since
14 | the Mu memory is never externally visible unless explicitly pinned.
15 | 
16 | Data in the native memory uses the sizes and alignments of types listed in
17 | the following table. The unit of sizes and alignments is the byte, which is 8
18 | bits. All non-native-safe types have unspecified sizes and alignments.
19 | 
20 | .. table:: Mapping between Mu and C types
21 | 
22 | =========================== ======================= =============== ================
23 | Mu type                     C type                  Size            Alignment
24 | =========================== ======================= =============== ================
25 | ``int<8>``                  ``char``                1               1
26 | ``int<16>``                 ``short``               2               2
27 | ``int<32>``                 ``int``                 4               4
28 | ``int<64>``                 ``long``, ``long long`` 8               8
29 | ``float``                   ``float``               4               4
30 | ``double``                  ``double``              8               8
31 | ``vector<int<32> 4>``       ``__m128``              16              16
32 | ``vector<float 4>``         ``__m128``              16              16
33 | ``vector<double 2>``        ``__m128``              16              16
34 | ``uptr<T>``                 ``T *``                 8               8
35 | ``ufuncptr<sig>``           ``T (*) ()``            8               8
36 | ``ref<T>``                  N/A                     unspecified     unspecified
37 | ``iref<T>``                 N/A                     unspecified     unspecified
38 | ``weakref<T>``              N/A                     unspecified     unspecified
39 | ``tagref64``                N/A                     unspecified     unspecified
40 | ``funcref<sig>``            N/A                     unspecified     unspecified
41 | ``threadref``               N/A                     unspecified     unspecified
42 | ``stackref``                N/A                     unspecified     unspecified
43 | ``int<1>``                  N/A                     unspecified     unspecified
44 | ``int<6>``                  N/A                     unspecified     unspecified
45 | ``int<52>``                 N/A                     unspecified     unspecified
46 | ``int<n>``                  unspecified             unspecified     unspecified
47 | ``vector<T n>``             unspecified             unspecified     unspecified
48 | =========================== ======================= =============== ================
49 | 
50 | ..
51 | 
52 | NOTE: Although ``int<1>`` is required and ``int<6>`` and ``int<52>`` are
53 | required when ``tagref64`` is implemented, their memory layout is
54 | unspecified because memory access instructions ``LOAD``, ``STORE``, etc. are
55 | not required to support those types. It is not recommended to include those
56 | types in the memory because they may never be loaded or stored.
57 | 
58 | Although vectors of other lengths are not required by a Mu implementation,
59 | implementations are encouraged to support them in a way compatible with the
60 | AMD64 ABI.
61 | 
62 | The structure type ``struct<...>`` and the hybrid type ``hybrid<Fs V>`` are
63 | aligned to their most strictly aligned component. Each member is assigned to the
64 | lowest available offset with the appropriate alignment. This rule applies to
65 | hybrids as if the hybrid ``hybrid<Fs V>`` were a struct of fields ``Fs`` followed
66 | by a flexible array member (as in C99) ``V fam[];``. Arrays ``array<T n>`` use
67 | the same alignment as their elements.
68 | 
69 | NOTE: There are no union types in Mu.
Arrays do not have special rules of 70 | 16-byte alignment as the AMD64 ABI does. Mu arrays must be declared as an 71 | array of vectors (such as ``array 4> 100>``) to be eligible 72 | for vector access. 73 | 74 | Both integers and floating point numbers are little-endian (lower bytes in lower 75 | addresses). Signed integers use the 2's complement representation. Elements with 76 | lower indexes in a vector is stored in lower addresses in the memory. 77 | 78 | Calling Convention 79 | ================== 80 | 81 | The calling convention between Mu functions is implementation-defined. 82 | 83 | The Default Calling Convention 84 | ------------------------------ 85 | 86 | The *default* calling convention, denoted by the ``#DEFAULT`` flag in the IR, 87 | follows the AMD64 ABI in register usage, stack frame structure, parameter 88 | passing and returning. The parameter types and the return types are mapped to C 89 | types according to the above table. Functions in this calling convention can 90 | return at most one value. As a special case, if the native function signature 91 | returns void, the corresponding Mu signature returns no values ``(...) -> ()``. 92 | Mu ``struct`` and ``array`` types are mapped to C structs of corresponding 93 | members. ``array`` cannot be the type of parameters or return values because C 94 | disallows this, but arrays within structs and pointers to arrays are is allowed. 95 | 96 | Arguments and return values are passed in registers and the memory according to 97 | the AMD64 ABI, with the types of Mu arguments and the return type mapped to the 98 | corresponding C types. 99 | 100 | NOTE: This is to say, C programs can call Mu functions with a "compatible" 101 | signature (that is, parameters and return values match the above table). 102 | Even if the signature is not "perfectly" matching (for example, an int/long 103 | is passed when a pointer is expected, Mu must still interpret the incoming 104 | arguments strictly according to the ABI, i.e. interpreting the integer value 105 | in the register as an address). 106 | 107 | If a Mu function of signature *sig* is exposed with the *default* calling 108 | convention, the resulting value has ``ufuncptr`` type, i.e. it is a 109 | function pointer which can be called (by either Mu or native programs) with the 110 | *default* calling convention. 111 | 112 | It has undefined behaviour when the native program attempts to unwind Mu frames. 113 | 114 | NOTE: This means C ``longjmp`` and C++ exceptions must not go through Mu 115 | frames, but as long as they are handled **above** any Mu frames, it is safe. 116 | 117 | .. vim: tw=80 118 | -------------------------------------------------------------------------------- /native-interface.rest: -------------------------------------------------------------------------------- 1 | ================ 2 | Native Interface 3 | ================ 4 | 5 | This chapter defines the Mu native interface. 6 | 7 | NOTE: The term **foreign function interface** may have been used in many 8 | other places by experienced VM engineers to mean a heavy-weighted complex 9 | interface with another language. JikesRVM users use **foreign function 10 | interface** to refer to JNI and use **syscall** to refer to a light-weight 11 | unsafe mechanism to call arbitrary C functions with minimal overhead. 12 | 13 | The ``CCALL`` instruction is more similar to the latter. It has minimum 14 | overhead, but provides no protection to malicious code. So it must be used 15 | with care. 
16 | 17 | To reduce confusion, we use the term **unsafe native interface** or just 18 | **native interface** instead of *foreign function interface*. 19 | 20 | The **native interface** is a *light-weight* *unsafe* interface through which 21 | *Mu IR programs* communicate with *native programs*. 22 | 23 | NOTE: This has no direct relationship with the Mu client interface. 24 | 25 | * Native programs are usually written in C, C++ or other low-level languages 26 | and usually does not run on VMs. 27 | 28 | * A Mu client is not necessary a native program. The client can be written 29 | in a managed language, running in a VM, running in the same Mu VM as 30 | user-level programs (i.e. a "metacircular" client), or living in a 31 | different process or even a different computer, communicating with Mu 32 | using sockets. 33 | 34 | However, it does not rule out the possibility to implement the Mu client 35 | interface *for* native programs *via* this native interface. 36 | 37 | The main purpose of the native interface is 38 | 39 | 1. to interoperate with the operating system by invoking system libraries 40 | (including system calls), and 41 | 42 | 2. to interoperate with libraries written in other programming languages. 43 | 44 | .. 45 | 46 | NOTE: The purpose of the Mu client interface is to let the client control 47 | the Mu micro VM and handle events. The native interface is not about 48 | "controlling Mu". 49 | 50 | It is not a purpose to interface with *arbitrary* native libraries. This 51 | interface should be minimal but just enough to handle most *common* system calls 52 | (e.g. ``open``, ``read``, ``write``, ``close``, ...) and *common* native 53 | libraries. Complex data types and functions (e.g. those with unusual 54 | size/alignment requirements or calling conventions) may require wrapper code 55 | provided by the language implementer. 56 | 57 | The native interface is not required to be *safe*. The overhead of this 58 | interface should be as low as possible. It is the client's responsibility to 59 | implement things like JNI on top of this interface. 60 | 61 | For JikesRVM users: The native interface includes raw memory access which is 62 | similar to "vmmagic" and the ``CCALL`` instruction is more like the 63 | "syscall" mechanism. They are not safe, but highly efficient and should be 64 | used with care. 65 | 66 | .. 67 | 68 | NOTE: Directly making system calls from Mu and bypassing the C library 69 | (libc) is theoretically possible, but is not a mainstream way to do so. It 70 | has a lower priority in the design. 71 | 72 | Outline 73 | ======= 74 | 75 | This interface has several aspects: 76 | 77 | 1. **Raw memory access**: This interface provides pointer types and directly 78 | access the memory via pointers. 79 | 80 | 2. **Exposing Mu memory to the native world**: This allows native programs to 81 | access Mu memory in a limited fashion. 82 | 83 | 3. **Native function call**: This interface provides a mechanism to call a 84 | native function using a native calling convention. 85 | 86 | 4. **Callback from native programs**: This interface will enable calling back 87 | from the native program. 88 | 89 | 5. **Inline assembly**: Directly inserting machine-dependent instructions into 90 | a Mu IR function. 91 | 92 | Raw Memory Access 93 | ================= 94 | 95 | This section defines mechanisms for raw memory access. *Pointers* give Mu 96 | programs access to the native (raw) memory, while *pinning* gives native 97 | programs access to the Mu memory. 
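    NOTE: A typical use of this interface is passing the contents of a Mu
    object to a system call. In C-like pseudo code (the real code would be Mu
    IR using the ``@uvm.native.pin``, ``CCALL`` and ``@uvm.native.unpin``
    operations introduced below; ``write`` is the usual C library function)::

        // C pseudo code for what a Mu function would do
        void write_buffer(ref buf, int len) {
            char *p = pin(buf);     // pinning: the object will not be moved while pinned
            write(1, p, len);       // call the native function with the raw pointer
            unpin(buf);             // unpinning: the object may be moved again
        }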
98 | 
99 | Pointers
100 | --------
101 | 
102 | A **pointer** is an address in the memory space of the current process. A
103 | pointer can be a **data pointer** (type ``uptr<T>``) or a **function pointer**
104 | (type ``ufuncptr<sig>``). The former assumes a data value is stored in a region
105 | beginning with the address. The latter assumes a piece of executable machine
106 | code is located at the address.
107 | 
108 | ``uptr<T>``, ``ufuncptr<sig>`` and ``int<n>``, where ``T`` is a type and ``sig``
109 | is a function signature, can be cast to each other using the ``PTRCAST``
110 | instruction. The address is preserved and the ``int<n>`` type has the numerical
111 | value of the address. Type checking is not performed.
112 | 
113 | Potential problem: There may be machines where data pointers have a
114 | different size from function pointers, but I have never seen one.
115 | 
116 | For C users: The C spec never defines pointers as addresses. C pointers can
117 | point to either objects (regions of storage) or functions. Casting between
118 | object pointers, function pointers and integers has implementation-defined
119 | behaviours.
120 | 
121 | There are segmented architectures, including x86, whose "pointers" are
122 | segments + offsets. However, apparently the trend is to move to a "flat"
123 | memory space.
124 | 
125 | Pinning
126 | -------
127 | 
128 | A **pinning** operation takes either a ``ref`` value or an ``iref`` value
129 | as its parameter. The result is a data pointer. If it is an ``iref``, the data
130 | pointer can be used to access the memory location referred to by the ``iref``.
131 | Pinning a ``NULL`` ``iref`` returns a ``NULL`` pointer whose address is 0. If it
132 | is a ``ref``, it is equivalent to pinning the ``iref`` of the memory location of
133 | the object itself, or 0 if the ``ref`` itself is ``NULL``.
134 | 
135 | An **unpinning** operation also takes either a ``ref`` value or an
136 | ``iref`` value as its parameter, but returns ``void``.
137 | 
138 | In each thread, there is a conceptual "pinning multi-set" (may contain repeated
139 | elements). A pinning operation adds a ``ref`` or ``iref`` into this multi-set,
140 | and an unpinning operation removes one instance of the ``ref`` or ``iref`` from
141 | this multi-set. A memory location is pinned as long as there is at least one
142 | ``iref`` to that memory location in the pinning multi-set of any thread.
143 | 
144 | NOTE: This requires the Mu micro VM to perform somewhat complex
145 | book-keeping, but this gives Mu the opportunity for performance improvement
146 | over global Boolean pinning, where a pinned object can be unpinned instantly
147 | by an unpinning operation in any thread. The "pinning multi-set" can be
148 | implemented as a thread-local buffer. In this case, if GC never happens, no
149 | expensive atomic memory access or inter-thread synchronisation is performed.
150 | 
151 | Calling between Mu and Native Functions
152 | =======================================
153 | 
154 | Calling Conventions
155 | -------------------
156 | 
157 | The calling conventions involving native programs are platform-dependent and
158 | implementation-dependent. They should be defined by platform-specific binary
159 | interfaces (ABIs) as supplements to this Mu specification. Mu implementations
160 | should advertise which ABIs they implement.
161 | 
162 | Calling conventions are identified by flags (``#XXXXXX``) in the IR. Mu defines
163 | the flag ``#DEFAULT`` and its numerical value 0x00 for the default calling
164 | convention of platforms.
This flag is always available. Other calling 165 | conventions can be defined by implementations. 166 | 167 | The calling convention determines the type of value that are callable by the 168 | ``CCALL`` instruction (described below), and the type of the exposed value for 169 | Mu functions (described below). The type is usually a ``ufuncptr`` for C 170 | functions, which are called via their addresses. Other examples are: 171 | 172 | * If it is desired to make system calls directly from Mu, then the type can be 173 | an integer, i.e. the system call number. 174 | 175 | * If it is something like `a SWAP-STACK operation implemented as a calling 176 | convention `__, then the callee can 177 | be a stack pointer in the form of ``uptr``. 178 | 179 | Mu Functions Calling Native Functions 180 | ------------------------------------- 181 | 182 | The ``CCALL`` instruction calls a native function. Determined by calling 183 | conventions, the native function may be represented in different ways, and the 184 | arguments are passed in different ways. The return value of the call will be the 185 | return value of the ``CCALL`` instruction, which is a Mu SSA variable. 186 | 187 | Native Functions Calling Mu Functions 188 | ------------------------------------- 189 | 190 | A Mu function can be **exposed** as a native function pointer in three ways: 191 | 192 | 1. Statically, an ``.expose`` top-level definition exposes a Mu function as a 193 | native value according to the desired calling convention. For the default 194 | calling convention, the result is usually a function pointer. 195 | 196 | 2. Dynamically, the ``@uvm.native.expose`` common instructions can expose a Mu 197 | function, and the ``@uvm.native.unexpose`` common instruction deletes the 198 | exposed value. 199 | 200 | 3. Dynamically, the ``expose`` and ``unexpose`` API function do the same thing 201 | as the above instructions. 202 | 203 | A "cookie", which is a 64-bit integer value, can be attached to each exposed 204 | value. When a Mu function is called via one of its exposed value, the attached 205 | cookie can be retrieved by the ``@uvm.native.get_cookie`` common instruction in 206 | the callee, or 0 if called directly from Mu. 207 | 208 | NOTE: The purpose for the cookie is to support "closures". In some 209 | high-level languages, the programmer-accessible "functions" are actually 210 | closures, i.e. codes with attached data. Implemented on Mu, multiple 211 | different closures may share the same Mu function as their codes, but has 212 | different attached data. For example, in Lua:: 213 | 214 | function make_adder(y) 215 | return function(x) 216 | return x + y 217 | end 218 | end 219 | 220 | plus_one = make_adder(1) 221 | plus_two = make_adder(2) 222 | 223 | print(plus_one(3), plus_two(3)) -- 4 5 224 | 225 | ``plus_one`` and ``plus_two`` may probably share the same underlying Mu 226 | function as their common implementations, and they only differ by the 227 | different "up-value" ``y``. 228 | 229 | In C, any sane C programs that use call-backs should also have a ``void *`` 230 | as the "user data". For example, the ``pthread_create`` routine takes an 231 | extra ``void *arg`` parameter which will be passed to its ``start_routine`` 232 | as the argument. If the call-back is supposed to be a wrapper of a 233 | high-level language closure, the user data will be its context. 234 | 235 | However, different C programs support user data in different ways (if at 236 | all). 
For example, the UNIX signal handler function takes exactly one 237 | parameter which is the signal number: ``typedef void (*sig_t) (int)``. If a 238 | closure is supposed to handle UNIX signals, it must be able to identify its 239 | context by merely the exposed function pointer. 240 | 241 | One way to work around this problem is to generate a trampoline function 242 | which sets the cookie and jumps to the real callee. Many different 243 | trampolines can be made for a single Mu function, each of which supplies a 244 | different cookie. In this case, the cookie can identify the context for the 245 | closure. 246 | 247 | The simplest kind of cookie is an integer, but an object reference may also 248 | be a candidate. 249 | 250 | Since Mu programs need special contexts to execute (such as the thread-local 251 | memory allocation pool for the garbage collector, and the notion of the "current 252 | stack" for the SWAP-STACK operation), a native thread needs to attach itself to 253 | the Mu instance before calling any Mu functions. If a Mu thread calls native 254 | code from Mu, then it is already attached and can freely call back to Mu again. 255 | How to attach a thread to Mu is implementation-defined. 256 | 257 | For JVM users: The JNI invocation API function ``AttachCurrentThread()`` and 258 | ``DetachCurrentThread()`` are the counterpart of this requirement. 259 | 260 | Stack Sharing and Stack Introspection 261 | ------------------------------------- 262 | 263 | The callee may share the stack with the caller. 264 | 265 | When a Mu function "A" calls a native function which then calls back to another 266 | Mu function "B", Mu sees one single native frame between the frames for "A" and 267 | "B". When a Mu function is called from a native function without other Mu 268 | functions below, Mu consider the Mu function sitting on top of a native frame. 269 | 270 | Stack introspection can skip native frames and introspect other Mu frames below. 271 | 272 | NOTE: The requirement to "see through" native frames is partially required 273 | by exact garbage collection, in which case all references in the stack must 274 | be identified. 275 | 276 | However, throwing Mu exceptions into native frames has implementation-defined 277 | behaviour. Attempting to pop native frames via the API also has 278 | implementation-defined behaviour. 279 | 280 | NOTE: In general, it is not safe to force unwind native frames because 281 | native programs may need to clean up their own resources. Existing 282 | approaches, including JNI, models high-level (such as Java-level) exceptions 283 | as a query-able state rather than actual stack unwinding through native 284 | programs. 285 | 286 | Native exceptions thrown into Mu frames also have implementation-defined 287 | behaviours. 288 | 289 | NOTE: Similar to native frames, Mu programs may have even more necessary 290 | clean-up operations, such as GC barriers. 291 | 292 | Changes in the Mu IR and the API introduced by the native interface 293 | =================================================================== 294 | 295 | **New types**: 296 | 297 | * ``uptr < T >`` 298 | * ``ufuncptr < sig >`` 299 | 300 | See `Type System `__ 301 | 302 | **New top-level definitions**: 303 | 304 | * function exposing definition 305 | 306 | See `Mu IR `__. 
307 | 308 | **New instructions**: 309 | 310 | * ``PTRCAST`` 311 | * ``@uvm.native.pin`` 312 | * ``@uvm.native.unpin`` 313 | * ``@uvm.native.expose`` 314 | * ``@uvm.native.unexpose`` 315 | * ``@uvm.native.get_cookie`` 316 | 317 | See `Instruction Set `__ and `Common Instructions 318 | `__. 319 | 320 | **Modified instructions**: 321 | 322 | * Memory addressing: 323 | 324 | * ``GETFIELDIREF`` 325 | * ``GETELEMIREF`` 326 | * ``SHIFTIREF`` 327 | * ``GETVARPARTIREF`` 328 | * ``LOAD`` 329 | * ``STORE`` 330 | * ``CMPXCHG`` 331 | * ``ATOMICRMW`` 332 | 333 | * ``CCALL`` 334 | 335 | Memory addressing instructions take an additional ``PTR`` flag. If this flag is 336 | present, the location operand must be ``uptr`` rather than ``iref``. For 337 | example: 338 | 339 | * ``%new_ptr = GETFIELDIREF PTR <@some_struct 3> %ptr_to_some_struct`` 340 | * ``%new_ptr = GETELEMIREF PTR <@some_array @i64> %ptr_to_some_array @const1`` 341 | * ``%new_ptr = SHIFTIREF PTR <@some_elem @i64> %ptr_to_some_elem @const2`` 342 | * ``%new_ptr = GETVARPARTIREF PTR <@some_hybrid> %ptr_to_some_hybrid`` 343 | * ``%old_val = LOAD PTR SEQ_CST <@T> %ptr_to_T`` 344 | * ``%void = STORE PTR SEQ_CST <@T> %ptr_to_T %newval`` 345 | * ``%result = CMPXCHG PTR ACQ_REL ACQUIRE <@T> %ptr_to_T %expected %desired`` 346 | * ``%old_val = ATOMICRMW ADD PTR SEQ_CST <@T> %ptr_to_T %rhs`` 347 | 348 | See `Instruction Set `__. 349 | 350 | **New API functions**: 351 | 352 | * ``ptrcast`` 353 | * ``pin`` 354 | * ``unpin`` 355 | * ``expose`` 356 | * ``unexpose`` 357 | 358 | **Modified API functions**: 359 | 360 | The ``cur_func_ver`` function, in addition to returning the function version 361 | ID, it may also return 0 if the selected frame is a native frame. (Multiple 362 | native frames are counted as one between two Mu frames.) 363 | 364 | The ``pop_frames_to`` function has implementation-defined behaviours when 365 | popping native frames. 366 | 367 | When rebinding a thread to a stack with a value, and the top frame is on a call 368 | site (native or Mu), the value associated with the rebinding is the return value 369 | of the call. 370 | 371 | Future Works 372 | ============ 373 | 374 | TODO: Inline assembly 375 | 376 | .. vim: tw=80 377 | -------------------------------------------------------------------------------- /overview.rest: -------------------------------------------------------------------------------- 1 | ======== 2 | Overview 3 | ======== 4 | 5 | Mu is a micro virtual machine designed to support high-level programming 6 | languages. It focuses on three basic concerns: 7 | 8 | - garbage collection 9 | - concurrency 10 | - just-in-time compiling 11 | 12 | The Concept of Micro Virtual Machines 13 | ===================================== 14 | 15 | Many programming languages are implemented on virtual machines. 16 | 17 | There are many aspects in a language implementation. There are high-level 18 | aspects which are usually language-specific, including: 19 | 20 | * parsing high-level language programs, or loading high-level byte codes 21 | * object oriented programming, including classes, inheritance and polymorphism 22 | (if applicable) 23 | * functional programming, including high-order functions, pattern matching, etc. 
24 | (if applicable) 25 | * eager or lazy code/class loading 26 | * high-level optimisation 27 | * a comprehensive standard library 28 | 29 | as well as low-level aspects which are usually language-neutral, including: 30 | 31 | * an execution engine (for example, JIT compiling) 32 | * a model of threads and a memory model 33 | * garbage collection 34 | 35 | A "monolithic" VM implements everything listed above. JVM is one of such VMs. 36 | Creating such a VM is a huge amount of work. It takes two decades and billions 37 | of dollars for the JVM to have a high-quality implementation. Such man-power and 38 | investment is usually unavailable for other languages. 39 | 40 | We coined the term "**micro virtual machine**". It is an analogue to the term 41 | "microkernel" in the operating system context. A micro virtual machine, like a 42 | micro kernel, only does what absolutely needs to be done in the micro virtual 43 | machine, and pushes most high-level aspects to its client, the counterpart of 44 | the "services" of a microkernel. 45 | 46 | In a language implementation with the presence of a micro virtual machine, the 47 | micro virtual machine shall handle those three low-level aspects, namely 48 | concurrency, JIT and GC, and the client handles all other high-level aspects. 49 | 50 | Take JVM as an example. If JVM were implemented as a client of a micro virtual 51 | machine, it only needs to handle JVM-specific features, including the byte-code 52 | format, class loading and aspects of object-oriented programming. 53 | 54 | :: 55 | 56 | Traditional JVM 57 | +-------------------+ +---------------------------+ 58 | | | | | 59 | | *JVM* | | *Java Client* | 60 | | byte code format | | byte-code format | 61 | | class loading | | class loading | 62 | | OOP | | OOP | 63 | | GC | | | 64 | | concurrenty | +---------------------------+ 65 | | JIT compiling | | *micro virtual machine* | 66 | | | | GC,concurrency,JIT | 67 | +-------------------+ +---------------------------+ 68 | | *OS* | | *OS* | 69 | +-------------------+ +---------------------------+ 70 | 71 | The Mu Project 72 | ============== 73 | 74 | Mu is a concrete micro virtual machine. 75 | 76 | The main part of this project is this specification which defines the behaviour 77 | of Mu and the interaction with the client. This allows multiple compliant 78 | implementations. 79 | 80 | The specification mainly includes the type system, the instruction set and the 81 | Mu client interface (sometimes called "the API"). 82 | 83 | The Mu Architecture 84 | ------------------- 85 | 86 | The whole system is divided into a language-specific **client** and a 87 | language-neutral **micro virtual machine** (in this case, it is Mu). 88 | 89 | :: 90 | 91 | | source code or byte code 92 | v 93 | +-----------------+ 94 | | client | 95 | +-----------------+ 96 | | ^ 97 | Mu IR / | | traps/watchpoints/ 98 | API call | | other events 99 | v | 100 | +-------------------+ manages +---------+ 101 | | Mu (the micro VM) |----------->| Mu heap | 102 | +-------------------+ +---------+ 103 | 104 | A typical client implements a high-level language (e.g. Python or Lua). Such a 105 | client would be responsible for loading, parsing and executing the source code 106 | or byte code. 107 | 108 | The client submits programs to Mu in a language called **Mu Intermediate 109 | Representation**, a.k.a. **Mu IR**. The Mu IR code is then executed on Mu. 110 | 111 | The client can directly manipulate the states of Mu using the **Mu client 112 | Interface**, a.k.a. **the API**. 
The API can access the Mu memory (including the 113 | heap), create Mu threads, stacks, introspect stack states and so on. The Mu IR 114 | code mentioned above is submitted via the API, too. 115 | 116 | There are events which Mu cannot handle alone. These include lazy code loading, 117 | requesting for optimisation/deoptimisation and so on. In these cases, Mu 118 | generates events to be handled by the client. 119 | 120 | Mu handles garbage collection internally. Mu can identify all references held 121 | inside Mu and also tracks all references held by the client. So exact GC is 122 | possible in Mu without the intervention from the client. 123 | 124 | The Mu Type System 125 | ------------------- 126 | 127 | The Mu type system has scalar and vector integer and floating point types, 128 | aggregate types including structs, arrays and hybrids, as well as reference 129 | types. The type system is low level, similar to the level of C, but natively 130 | supports reference types. 131 | 132 | Mu is agnostic of the type hierarchy in high-level languages, but the client can 133 | implement its language-specific type system and run-time type information on top 134 | of the Mu type system. 135 | 136 | See `Type System `__ for more details. 137 | 138 | The Mu Instruction Set 139 | ----------------------- 140 | 141 | The Mu instruction set is similar to (and is actually inspired by) the `LLVM 142 | `__'s instruction set. There are primitive 143 | arithmetic/logical/relational/conversion operations and control flow 144 | instructions. 145 | 146 | Mu has its own exception handling, not depending on system libraries as C++ 147 | does. Mu IR programs can throw and catch exceptions, but the client needs to 148 | implement its own exception type hierarchy if applicable. 149 | 150 | There are garbage-collection-aware memory operations, including memory 151 | allocation, addressing and accessing. The client does not need to implement 152 | garbage collection algorithms; it only needs to use reference types and related 153 | instructions and Mu handles the rest. 154 | 155 | Trap instructions let Mu IR programs talk back to the client for events it 156 | cannot handle. 157 | 158 | There are also instructions for handling stack and threads. 159 | 160 | See `Instruction Set `__ and `Common Instructions 161 | `__ for more details. 162 | 163 | The Mu Client Interface 164 | ------------------------ 165 | 166 | The Mu client interface (API) allows the client to directly manipulate the state 167 | of Mu. 168 | 169 | The API can load Mu IR code. 170 | 171 | The API can create threads and stacks. The usual way to start a Mu program is 172 | to create a new stack with a function on the bottom of the stack and create a 173 | new thread on it to start execution. (The concept of threads and stacks are 174 | discussed later.) 175 | 176 | The API can directly allocate and access the Mu memory. References are 177 | indirectly exposed to the client as handles rather than raw pointers for the 178 | ease of garbage collection. (The JVM takes the same approach.) 179 | 180 | The client also handles trap events generated by the Mu IR code. The client can 181 | introspect the selected local variables on the stack and perform on-stack 182 | replacement (i.e. OSR. Discussed later.) 183 | 184 | See `Client Interface `__ for more details. 185 | 186 | Unsafe Native Interface 187 | ----------------------- 188 | 189 | The (unsafe) native interface is designed to directly interact with native 190 | programs. 
It gives Mu program direct access to the memory via pointers, and 191 | allows pinning Mu objects so that they can be accessed by native programs. It 192 | also allows Mu to call a native (usually C) function directly, and allows native 193 | programs to call back to selected Mu functions. (The .NET CLR takes similar 194 | approach, i.e. giving the high-level program "unsafe" access to the lower 195 | level.) 196 | 197 | This interface is different from the client API. The main purpose is to 198 | implement the system-interfacing part of the high-level language, such as the IO 199 | and the networking library. 200 | 201 | See `Native Interface `__ for more details. 202 | 203 | Multi-Threading 204 | --------------- 205 | 206 | Mu supports threads. Mu threads are usually implemented with native OS threads, 207 | but this specification does not enforce this. Multiple Mu threads may execute 208 | simultaneously. 209 | 210 | Mu has a C11/C++11-like memory model. There are atomic memory access with 211 | different memory orders. The client should generate code with the appropriate 212 | memory order for its high-level language. 213 | 214 | Mu provides a Futex-like mechanism similar to the counterpart provided by the 215 | Linux kernel. It is the client's responsibility to implement mutex locks, 216 | semaphores, conditions, barriers and so on using atomic memory accesses and the 217 | futex. 218 | 219 | See `Threads and Stacks `__ for details about threads and 220 | `Memory Model `__ for the Mu memory model. 221 | 222 | The Swap-stack Operation 223 | ------------------------ 224 | 225 | Mu distinguishes between threads and stack. In Mu, a thread is the unit of CPU 226 | scheduling and a stack is the context in which a thread executes. An analogy is 227 | "workers and jobs". 228 | 229 | A stack has multiple frames, each of which is a context of a function 230 | activation, including local variables and the current instruction. 231 | 232 | A *swap-stack* operation unbinds a thread from a stack (the old context) and 233 | bind to another stack (the new context). As a result, the old context of 234 | execution is paused and can be continued when another swap-stack operation binds 235 | another thread (may not be the same old thread) to that stack. This is similar 236 | to letting a worker stop doing one job and continue with another job. 237 | 238 | The swap-stack operation is essentially a model of symmetric coroutines. It 239 | allows the client to implement coroutines in high-level languages (e.g. Ruby, 240 | Lua, Go as well as Python and ECMAScript 6). 241 | 242 | It also allows the client to implement its own light-weight 243 | thread. This is particularly useful for languages with massively many threads 244 | (e.g. Erlang). 245 | 246 | See `Threads and Stacks `__ for details. 247 | 248 | Function Redefinition 249 | --------------------- 250 | 251 | It is a common strategy to use a fast compiler to compile high-level programs to 252 | suboptimal low-level code, and only optimise when the implementation decided at 253 | run time that a function (or loop) is hot. Then an optimised version is 254 | compiled. Optimising compilation usually takes longer, but the code runs faster. 255 | 256 | In Mu, a function can have zero or more versions. When a function is called, it 257 | always calls the newest version. 258 | 259 | The semantic of *function definition* in Mu is to create a new version of a 260 | function. 
What the client should do is to generate an optimised version of the 261 | high-level code in Mu IR and submit it to Mu. All call sites and function 262 | references are automatically updated. 263 | 264 | If a function has zero versions, it is "undefined". Calling such a function will 265 | "trap" to the client. Such functions behave like "stubs" and this gives the 266 | client a chance to implement lazy code/class loading. 267 | 268 | See `Intermediate Representation `__ for the definition of 269 | functions and versions, and see `Client Interface `__ for 270 | the code loading interface. 271 | 272 | On-stack Replacement 273 | -------------------- 274 | 275 | At the same time when an optimised version of a function is compiled, there are 276 | existing activations on the stack still running the old version. On-stack 277 | Replacement (OSR) is the operation to replace an existing stack frame with 278 | another frame. 279 | 280 | Mu provides two primitives in its API: 281 | 282 | 1. Pop the top frame of a stack. 283 | 2. Given a function and its arguments, create a new frame and push it on the top 284 | of a stack. 285 | 286 | Note that Mu is oblivious about whether the new version is "equivalent to" or 287 | "better than" the old version. The responsibility of optimisation is pushed to 288 | the client. 289 | 290 | See `Client Interface `__ for more details. 291 | 292 | Miscellaneous Topics 293 | -------------------- 294 | 295 | The `Memory `__ chapter provides more detail about garbage 296 | collection and memory allocation/accessing. 297 | 298 | The `Portability `__ chapter describes the requirements of 299 | implementations. It summarises corner cases which may result in different or 300 | undefined behaviours in different platforms. 301 | 302 | .. vim: tw=80 303 | -------------------------------------------------------------------------------- /portability.rest: -------------------------------------------------------------------------------- 1 | ============== 2 | Portability 3 | ============== 4 | 5 | As both a thin layer over the hardware and an abstraction over concurrency, JIT 6 | compiling and GC, Mu must strike a balance between portability and the ability 7 | to exploit platform-specific features. Thus Mu is designed in such a way that 8 | 9 | 1. There is a basic set of types and instructions that have common and defined 10 | behaviours and reasonably good performance on all platforms. 11 | 2. Mu also includes platform-specific instructions. These instructions are 12 | either defined by this Mu specification or extended by Mu implementations. 13 | 14 | In this chapter, **required** features must be implemented by Mu 15 | implementation and **optional** features may or may not be implemented. However, 16 | if an optional feature is implemented, it must behave as specified. 17 | 18 | NOTE: Although "behaving as specified", the implementation can still reject 19 | some inputs in a compliant way. For example, if an array type is too large, 20 | Mu still needs to accept the Mu IR that contains such a type, but may always 21 | refuse to allocate such a type in the memory. 22 | 23 | The platform-independent part of the native interface is a required component of 24 | the Mu spec, but the platform-dependent parts are not. There will be things 25 | which are "required but has implementation-defined behaviours". In this case, Mu 26 | must not reject IR programs that contain such constructs, but each 27 | implementation may do different things. 
28 | 29 | Type System 30 | =========== 31 | 32 | The ``int`` type of lengths 1, 8, 16, 32, and 64 are required. ``int`` of 6 and 33 | 52 bits are required if Mu also implements the ``tagref64`` type. Other lengths 34 | are optional. 35 | 36 | Both ``float`` and ``double`` are required. 37 | 38 | The vector types ``vector 4>``, ``vector`` and ``vector`` are required. Other vector types are optional. 40 | 41 | NOTE: Even though required to be accepted by Mu, they are not required to be 42 | implemented using hardware-provided vector instructions or vector registers 43 | of the exact same length. They can be implemented totally with scalar 44 | operations and general-purpose registers or memory, or implemented using 45 | different hardware vector sizes, larger or smaller. 46 | 47 | Reference types ``ref``, ``iref`` and ``weakref`` are required if ``T`` 48 | is implemented. Otherwise optional. 49 | 50 | A struct type ``struct<...>`` are required if it has at most 256 fields and all 51 | of its field types are implemented. Otherwise optional. 52 | 53 | An array type ``array`` is required if T is implemented and n is less than 54 | 2^64. Otherwise optional. 55 | 56 | NOTE: This implies Mu must accept array types of up to 2^64-1 elements. 57 | However, arrays must be in the memory. Whether such an array can be 58 | successfully allocated is a different story. 59 | 60 | A hybrid type ``hybrid`` is required if all fields in ``Fs`` and ``V`` are 61 | implemented. 62 | 63 | The void type ``void`` is required. 64 | 65 | A function type ``funcref`` is required if ``Sig`` is implemented. 66 | 67 | The opaque types ``threadref`` and ``stackref`` are both required. 68 | 69 | The tagged reference type ``tagref64`` is optional. 70 | 71 | A function signature ``R (P0 P1 ...)`` is required if all of its parameter types 72 | and its return type are implemented and there are at most 256 parameters. 73 | Otherwise optional. 74 | 75 | Pointer types ``uptr`` and ``ufuncptr`` are required for required and 76 | native-safe types ``T`` and signatures ``sig``. Both are represented as 77 | integers, but their lengths are implementation-defined. 78 | 79 | Constants 80 | ========= 81 | 82 | Integer constants of type ``int`` is required for all implemented n. 83 | 84 | Float and double constants are required. 85 | 86 | A struct constant is required if constants for all of its fields are 87 | implemented. 88 | 89 | All NULL constants are required. 90 | 91 | Pointer constants are required, but the implementation defines the length of 92 | them. 93 | 94 | Instructions 95 | ============ 96 | 97 | All integer binary operations and comparisons are required for ``int`` of length 98 | 8, 16, 32, 64 and required integer vector types, and optional for other integer 99 | lengths or integer vector types. All floating-point binary operations and 100 | comparisons are required for all floating point types and required floating 101 | point vector types, and optional for other floating point vector types. 102 | 103 | In the event of signed and unsigned integer overflow in binary operations, the 104 | result is truncated to the length of the operand type. 105 | 106 | Divide-by-zero caused by ``UDIV`` and ``SDIV`` results in exceptional control 107 | flows. The result of signed overflow by ``SDIV`` is the left-hand-side. 108 | 109 | NOTE: -0x80000000 / -1 == -0x80000000 for signed 32-bit int. 
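    More generally, with 32-bit operands (the values are written in C
    notation; this is an illustration of the rules above, not Mu syntax)::

        0x7fffffff + 1    // ADD: 0x80000000, the result is truncated to 32 bits
        0x80000000 / -1   // SDIV: 0x80000000, signed overflow yields the left-hand side
        1 / 0             // UDIV/SDIV: takes the exceptional control flow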
110 | 111 | For shifting instructions ``SHL``, ``LSHR`` and ``ASHR`` for integer type 112 | ``int``, only the lowest ``m`` bits of the right-hand-side are used, where 113 | ``m`` is the smallest integer that ``2^m`` >= ``n``. 114 | 115 | Conversions instructions are required between any two implemented types that can 116 | be converted. Specifically, given two types T1 and T2 and a conversion operation 117 | CONVOP, if both T1 and T2 are implemented and they satisfied the requirement of 118 | the ``T1`` and ``T2`` parameters of CONVOP (see ``__), then the 119 | CONVOP operation converting from T1 to T2 is required. 120 | 121 | Binary floating point operations round to nearest and round ties to even. 122 | Conversions involving floating point numbers round towards zero except 123 | converting from floating point to integer, in which case, round towards zero and 124 | the range is clamped to that of the result type and NaN is converted to 0. 125 | Binary operations, comparisons and conversions involving floating point numbers 126 | never raise exceptions or hardware traps. *[JVM behaviour]* 127 | 128 | Switch is required for the operand type of ``int`` of length 8, 16, 32 and 64. 129 | Otherwise optional. 130 | 131 | Calling a function whose value is ``NULL`` is undefined behaviour. 132 | 133 | Throwing exceptions across native frames has implementation-defined behaviour. 134 | 135 | Stack overflow when calling a function results in taking the exceptional control 136 | flow and the exceptional parameter instruction receives ``NULL``. 137 | 138 | ``EXTRACTELEMENT`` and ``INSERTELEMENT`` is required for all implemented vector 139 | types and integer types. ``SHUFFLEELEMENT`` is required if the source vector 140 | type, the mask vector type and the result vector type are all implemented. 141 | 142 | All memory allocation instructions ``NEW``, ``NEWHYBRID``, ``ALLOCA`` and 143 | ``ALLOCAHYBRID`` are allowed to result in error, in which case the exceptional 144 | control flow is taken. 145 | 146 | NOTE: This is for out-of-memory error and other errors. 147 | 148 | The ``GETELEMIREF`` and ``SHIFTIREF`` instructions accept any integer type as 149 | the index or offset type. The index and the offset are treated as signed. When 150 | these two instructions result in a reference beyond the size of the actual array 151 | in the memory, they have undefined behaviours. 152 | 153 | ``GETVARPARTIREF`` has undefined behaviour if the hybrid has zero elements in 154 | its variable part. 155 | 156 | All memory addressing instructions ``GETIREF``, ``GETFIELDIREF``, 157 | ``GETELEMIREF``, ``SHIFTIREF`` and ``GETVARPARTIREF`` give 158 | undefined behaviour when applied to ``NULL`` references. But when applied to 159 | pointers, these instructions calculates the result by calculating the offset 160 | according to the memory layout, which is implementation-defined. 161 | 162 | All memory access instructions ``LOAD``, ``STORE``, ``CMPXCHG`` and 163 | ``ATOMICRMW`` that access the ``NULL`` references take the exceptional control 164 | flow. If they access an invalid memory location (this include the case when the 165 | stack frame that contains a stack cell created by ``ALLOCA`` is popped and a 166 | reference to it becomes a dangling reference), then they have undefined 167 | behaviours. 168 | 169 | Accessing a memory location which represents a type different from the type 170 | expected by the instruction gives undefined behaviour. 
171 | 
172 | Accessing the memory via a pointer behaves as if accessing via an ``iref`` if
173 | the byte region represented by the pointer overlaps with the mapped (pinned)
174 | memory region of the Mu memory location. It behaves as if updating the memory
175 | byte-by-byte (not atomically) when all bytes in the byte region pointed to by the
176 | pointer are part of some Mu memory locations. Otherwise such memory access has
177 | implementation-defined behaviour.
178 | 
179 | The following types are required for both non-atomic and atomic ``LOAD`` and
180 | ``STORE`` for all implemented ``T`` and ``Sig``: ``int<8>``, ``int<16>``,
181 | ``int<32>``, ``int<64>``, ``float``, ``double``, ``ref<T>``, ``iref<T>``,
182 | ``weakref<T>``, ``funcref<Sig>``, ``threadref``, ``stackref``, ``uptr<T>`` and
183 | ``ufuncptr<Sig>``.
184 | 
185 | The following types are required for non-atomic ``LOAD`` and ``STORE``:
186 | ``vector<int<32> 4>``, ``vector<float 4>`` and ``vector<double 2>``.
187 | 
188 | ``int<32>``, ``int<64>``, ``ref<T>``, ``iref<T>``, ``weakref<T>``, ``funcref<Sig>``,
189 | ``threadref``, ``stackref``, ``uptr<T>`` and ``ufuncptr<Sig>`` are required for
190 | ``CMPXCHG`` and the ``XCHG`` operation of the ``ATOMICRMW`` instruction.
191 | 
192 | ``int<32>`` and ``int<64>`` are required for all ``ATOMICRMW`` operations.
193 | 
194 | If ``tagref64`` is implemented, it is required for both atomic and non-atomic
195 | ``LOAD`` and ``STORE``, and the ``XCHG`` operation of the ``ATOMICRMW``
196 | instruction.
197 | 
198 | Other types are optional for ``CMPXCHG`` and any subset of ``ATOMICRMW``
199 | operations.
200 | 
201 | One atomic Mu instruction does not necessarily correspond to exactly one
202 | machine instruction. Some atomic read-modify-write operations may therefore be
203 | implemented using ``CMPXCHG`` or load-linked/store-conditional constructs.
204 | 
205 | ``CCALL`` is required, but the behaviour is implementation-defined. The
206 | available calling conventions are implementation-defined.
207 | 
208 | ``@uvm.new_stack`` and ``NEWTHREAD`` are allowed to result in errors, in which
209 | case the exceptional control flow is taken.
210 | 
211 | The availability of ``COMMINST`` is specified in the next section.
212 | 
213 | In any case where an error occurs and control is expected to transfer
214 | to the exceptional control flow, but the exception clause is not supplied,
215 | the behaviour is undefined.
216 | 
217 | All instructions whose availability is not explicitly specified above are
218 | required for all types and signatures that are implemented and suitable.
219 | 
220 | Common Instructions
221 | ===================
222 | 
223 | Required only when ``tagref64`` is implemented:
224 | 
225 | * @uvm.tr64.is_fp
226 | * @uvm.tr64.is_int
227 | * @uvm.tr64.is_ref
228 | * @uvm.tr64.to_fp
229 | * @uvm.tr64.to_int
230 | * @uvm.tr64.to_ref
231 | * @uvm.tr64.to_tag
232 | * @uvm.tr64.from_fp
233 | * @uvm.tr64.from_int
234 | * @uvm.tr64.from_ref
235 | 
236 | All other common instructions are always required.
237 | 
238 | The Mu implementation may add common instructions.
239 | 
240 | .. vim: tw=80
241 | 
--------------------------------------------------------------------------------
/scripts/extract_comminst_macros.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | 
3 | """
4 | Extract comminst definitions from common-insts.rest into C macros.
5 | 6 | USAGE: python3 script/extract_comminst_macros.py < common-inst.rest 7 | """ 8 | 9 | import re 10 | import sys 11 | 12 | pat = re.compile(r'\[(0x[0-9a-f]+)\]@([a-zA-Z0-9_.]+)', re.MULTILINE) 13 | 14 | defs = [] 15 | longest = 0 16 | 17 | text = sys.stdin.read() 18 | 19 | for opcode, name in pat.findall(text): 20 | macro_name = "MU_CI_" + name.upper().replace(".", "_") 21 | opcode = opcode.upper() 22 | defs.append((macro_name, opcode)) 23 | longest = max(longest, len(macro_name)) 24 | 25 | for macro_name, opcode in defs: 26 | print("#define {} {}".format(macro_name.ljust(longest), opcode)) 27 | 28 | -------------------------------------------------------------------------------- /scripts/muapiparser.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse the muapi.h so that you can generate different bindings. 3 | 4 | The result will be a simple JSON object (dict of dicts). 5 | """ 6 | 7 | import re 8 | 9 | import injecttools 10 | 11 | r_commpragma = re.compile(r'///\s*MUAPIPARSER:(.*)$') 12 | r_comment = re.compile(r'//.*$', re.MULTILINE) 13 | r_decl = re.compile(r'(?P\w+\s*\*?)\s*\(\s*\*\s*(?P\w+)\s*\)\s*\((?P[^)]*)\)\s*;\s*(?:///\s*MUAPIPARSER\s+(?P.*)$)?', re.MULTILINE) 14 | r_param = re.compile(r'\s*(?P\w+\s*\*?)\s*(?P\w+)') 15 | 16 | r_define = re.compile(r'^\s*#define\s+(?P\w+)\s*\(\((?P\w+)\)(?P\w+)\)\s*$', re.MULTILINE) 17 | 18 | r_typedef = re.compile(r'^\s*typedef\s+(?P\w+\s*\*?)\s*(?P\w+)\s*;', re.MULTILINE) 19 | 20 | r_struct_start = re.compile(r'^struct\s+(\w+)\s*\{') 21 | r_struct_end = re.compile(r'^\};') 22 | 23 | def filter_ret_ty(text): 24 | return text.replace(" ","") 25 | 26 | def extract_params(text): 27 | params = [] 28 | for text1 in text.split(','): 29 | ty, name = r_param.search(text1).groups() 30 | ty = ty.replace(" ",'') 31 | params.append({"type": ty, "name": name}) 32 | 33 | return params 34 | 35 | def extract_pragmas(text): 36 | text = text.strip() 37 | if len(text) == 0: 38 | return [] 39 | else: 40 | return text.split(";") 41 | 42 | def extract_methods(body): 43 | methods = [] 44 | for ret, name, params, pragma in r_decl.findall(body): 45 | methods.append({ 46 | "name": name, 47 | "params": extract_params(params), 48 | "ret_ty": filter_ret_ty(ret), 49 | "pragmas": extract_pragmas(pragma), 50 | }) 51 | 52 | return methods 53 | 54 | def extract_struct(text, name): 55 | return injecttools.extract_lines(text, (r_struct_start, name), (r_struct_end,)) 56 | 57 | def extract_enums(text, typename, pattern): 58 | defs = [] 59 | for m in r_define.finditer(text): 60 | if m is not None: 61 | name, ty, value = m.groups() 62 | if pattern.search(name) is not None: 63 | defs.append({"name": name, "value": value}) 64 | return { 65 | "name": typename, 66 | "defs": defs, 67 | } 68 | 69 | _top_level_structs = ["MuVM", "MuCtx"] 70 | _enums = [(typename, re.compile(regex)) for typename, regex in [ 71 | ("MuTrapHandlerResult", r'^MU_(THREAD|REBIND)'), 72 | ("MuDestKind", r'^MU_DEST_'), 73 | ("MuBinOptr", r'^MU_BINOP_'), 74 | ("MuCmpOptr", r'^MU_CMP_'), 75 | ("MuConvOptr", r'^MU_CONV_'), 76 | ("MuMemOrd", r'^MU_ORD_'), 77 | ("MuAtomicRMWOptr", r'^MU_ARMW_'), 78 | ("MuCallConv", r'^MU_CC_'), 79 | ("MuCommInst", r'^MU_CI_'), 80 | ]] 81 | 82 | def extract_typedefs(text): 83 | typedefs = {} 84 | for m in r_typedef.finditer(text): 85 | expand_to, name = m.groups() 86 | typedefs[name] = expand_to.replace(" ","") 87 | 88 | return typedefs 89 | 90 | def parse_muapi(text): 91 | structs = [] 92 | 93 | for sn in _top_level_structs: 94 | b = 
extract_struct(text, sn) 95 | methods = extract_methods(b) 96 | structs.append({"name": sn, "methods": methods}) 97 | 98 | enums = [] 99 | 100 | for tn,pat in _enums: 101 | enums.append(extract_enums(text, tn, pat)) 102 | 103 | typedefs = extract_typedefs(text) 104 | 105 | return { 106 | "structs": structs, 107 | "enums": enums, 108 | "typedefs": typedefs, 109 | } 110 | 111 | if __name__=='__main__': 112 | import sys, pprint, shutil 113 | 114 | width = 80 115 | 116 | try: 117 | width, height = shutil.get_terminal_size((80, 25)) 118 | except: 119 | pass 120 | 121 | text = sys.stdin.read() 122 | pprint.pprint(parse_muapi(text), width=width) 123 | 124 | 125 | -------------------------------------------------------------------------------- /threads-stacks.rest: -------------------------------------------------------------------------------- 1 | ================== 2 | Threads and Stacks 3 | ================== 4 | 5 | One unique feature of Mu is the flexible relation between stacks and threads. A 6 | thread can swap between multiple stacks to achieve light-weighted context 7 | switch. This provides support for language features like co-routines and green 8 | threads. 9 | 10 | On the other hand, Mu does allow multiple simultaneous threads. Mu threads, by 11 | design, can be implemented as native OS threads and make use of parallel CPU 12 | resources. Mu also has a memory model. See the `Memory Model `__ 13 | chapter for more information. 14 | 15 | This chapter discusses Mu threads and Mu stacks. In this chapter, "thread" 16 | means "Mu thread" unless explicitly stated otherwise. 17 | 18 | Concepts 19 | ======== 20 | 21 | A **stack** is the context of nested or recursive activations of functions. 22 | 23 | NOTE: "Stack" here means the "control stack", or more precisely the 24 | "context" of execution. On a concrete machine, the context includes not only 25 | the stack, but also the CPU/register states. Mu abstracts the CPU state, 26 | modelling it as part of the state of the stack-top frame. 27 | 28 | A stack has many **frames**, each of which is the context of one function 29 | activation. A frame contains the states of all local variables (parameters and 30 | instructions), the program counter and alloca cells (see `Mu and the Memory 31 | `__). Each frame is associated with a *version* of a function. 32 | 33 | NOTE: Because Mu allows function redefinition, a function may be redefined 34 | by the client, and newly created function activations (newly called 35 | functions) will use the new definition. But any existing function 36 | activations will still use their old definitions, thus a frame is only bound 37 | to a particular version of a function, not just a function. This is very 38 | important because Mu cannot magically translate the state of any old 39 | function activation to a new one. A redefined function may even have 40 | completely different meaning from the old one. Mu allows the client to do 41 | crazy things like redefining a factorial function to a Fibonacci function. 42 | 43 | During on-stack replacement, the Mu client API can tell the client which 44 | version of which function any frame is executing and the value of KEEPALIVE 45 | variables. The client is responsible for translating the Mu-level states to 46 | the high-level language states. 47 | 48 | A **thread** is the unit of CPU scheduling. A thread can be **bound** to a 49 | stack, in which case the thread executes using the stack as its context. 
50 | The phrase "bind a stack to a thread" has the same meaning as "bind a thread to 51 | a stack". While a thread is executing on a stack, it changes the state of the 52 | stack, including changing the value of local variables by executing 53 | instructions, pushing or popping frames and allocating memory on the stack. 54 | 55 | A stack can be bound to at most one thread at any moment. 56 | 57 | A thread is always bound to one stack, with one exception: when executing a 58 | ``TRAP`` or ``WATCHPOINT`` instruction, it is temporarily unbound from its 59 | current stack. It either rebinds to a stack (may be the old stack or another 60 | stack) or terminates after returning from the trap handler. 61 | 62 | TODO: https://github.com/microvm/microvm-meta/issues/42 Extend the unbinding 63 | to undefined function handling. 64 | 65 | State of Threads 66 | ================ 67 | 68 | The state of a thread include: 69 | 70 | - the *stack* it is bound to (explained in this chapter) 71 | 72 | - a *thread-local object reference* (see below) 73 | 74 | - a thread-local *pinning multi-set* (see `object pinning 75 | `__) 76 | 77 | NOTE: Implementations may keep more thread-local states, such as the 78 | thread-local memory pool for the garbage collector. They are implementation 79 | details. 80 | 81 | The **thread-local object reference** is an arbitrary object reference, and can 82 | be ``NULL``. It is initialised when a thread is created. It can be read and 83 | modified by the thread itself. It can also be read and modified by the client in 84 | the trap handler, but the trap handler can only read and modify the thread-local 85 | object reference of the thread that triggered the trap. It cannot be read or 86 | modified in any other ways. 87 | 88 | NOTE: This design ensures that: 89 | 90 | 1. The access to the thread-local object reference itself is data-race-free. 91 | 92 | 2. It is only a single object reference, so the reference can fit in a 93 | machine register. In this way, if the implementation reserves a register 94 | for that reference, accessing fields in the object it refers to can be as 95 | efficient as register-indexed addressing. 96 | 97 | It also off-loads the responsibility of resizing or redefining the 98 | thread-local object to the client. If the client wishes to add more fields 99 | to that object (e.g. when more bundles are loaded), it can use watchpoints 100 | to stop existing threads and replace their thread-local object references in 101 | the trap handler. 102 | 103 | States of Stacks and Frames 104 | =========================== 105 | 106 | At any moment, **the state of a frame** is one of the following: 107 | 108 | READY 109 | (Ts = T1 T2 T3 ..., a list of types) The frame is ready to resume when 110 | values of types *T1 T2 T3 ...* are supplied. *Ts* can be an empty list. 111 | 112 | ACTIVE 113 | The current frame is the top of a stack and a thread is executing on the 114 | stack. 115 | 116 | DEAD 117 | The frame is dead. 118 | 119 | **The state of a stack** is the state of its top frame. In a bound stack, the 120 | top frame is in the **ACTIVE** state while all other frames are in the 121 | **READY** state; in an unbound stack, all frames are in the **READY** 122 | state, where *Ts* are specific to each frame. When killing a stack, all of its 123 | frames enter the **DEAD** state. 
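For example, a frame paused at a ``TRAP`` is in the **READY** state with the
*Ts* declared by that trap. The sketch below assumes the text-form ``TRAP``
syntax of the `instruction set <instruction-set.rest>`__ chapter; the names are
illustrative::

    // While the client's trap handler runs, the frame containing this
    // instruction is READY<@i64>: rebinding a thread to this stack must
    // either pass one @i64 value (which becomes %x) or raise an exception.
    %x = TRAP <@i64> KEEPALIVE (%a %b)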
124 | 125 | Calling, returning and exception throwing instructions change the state of 126 | frames, but since the stack is always running on the same stack, the state of 127 | the stack remains to be **ACTIVE**. ``CALL`` and ``CCALL`` change the state of 128 | the caller frame to **READY** where *Ts* is the return type of the callee, 129 | but a new frame is created for the callee, entering the **ACTIVE** state 130 | immediately. ``RET`` and ``THROW`` remove frames from the top of the current 131 | stack, but resume a lower frame and change its state to **ACTIVE**. 132 | 133 | Operations on remote stacks can change the state of stacks. The table below 134 | summarises important operations: 135 | 136 | ======================= =============================== ======================= ====================== 137 | Operation Current Stack New/Destination Stack Affected frames 138 | ======================= =============================== ======================= ====================== 139 | create new stack N/A READY creates new frame 140 | create new thread N/A READY -> ACTIVE top 141 | SWAPSTACK ACTIVE -> READY or DEAD READY -> ACTIVE both top frames 142 | killing a stack N/A READY -> DEAD all frames 143 | @uvm.thread_exit ACTIVE -> DEAD N/A all frames 144 | trap to client ACTIVE -> READY N/A top 145 | popping a frame READY -> READY N/A removes top frame 146 | pushing a frame READY -> READY N/A creates new frame 147 | ======================= =============================== ======================= ====================== 148 | 149 | Stack and Thread Creation 150 | ========================= 151 | 152 | Mu stacks and Mu threads can be created by Mu instructions ``@uvm.new_stack`` 153 | and ``NEWTHREAD``, or the API function ``new_stack`` and ``new_thread``. 154 | 155 | When a stack is created, a Mu function must be provided. The stack will contain 156 | a frame created for the current version of the function (as seen by the current 157 | thread because of concurrency and the memory model). This frame is called the 158 | **stack-bottom frame** and the function is called the **stack-bottom function**. 159 | 160 | NOTE: The stack-bottom frame is conceptually the last frame in a Mu stack 161 | and returning from that frame has undefined behaviour. But a concrete Mu 162 | implementation can still have its own frames or useful data below the 163 | stack-bottom frame. They are implementation-specific details. 164 | 165 | The stack-bottom frame (and also the stack) is in the **READY** state, where 166 | Ts are the parameter types of the stack-bottom function. The resumption point is 167 | the beginning of the function version. 168 | 169 | When a thread is created, a stack must be provided as its **initial stack**. 170 | Creating a thread binds the thread to the stack, passing values or raising 171 | exception to it (explained later), thus the top frame will enter the **ACTIVE** 172 | state after the thread is created. A newly created thread starts execution 173 | immediately. 174 | 175 | NOTE: Unlike Java, there is not a separate step to "start" a thread. A 176 | thread starts when it is created. 177 | 178 | Thread Termination 179 | ================== 180 | 181 | A thread is terminated when it executes the ``@uvm.thread_exit`` instruction, or 182 | the client orders the current thread to terminate in a trap handler. 183 | 184 | The ``@uvm.thread_exit`` instruction kills the current stack of the current 185 | thread. 
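Putting stack creation, thread creation and thread termination together, a
rough lifecycle sketch looks like the following. It assumes a function
``@worker`` with signature ``@worker_sig = (@i64) -> ()``; the exact clause
keywords of ``NEWTHREAD`` and ``COMMINST`` are those defined in the
`instruction set <instruction-set.rest>`__ chapter::

    // Create a stack whose stack-bottom function is @worker. The new stack
    // is READY<@i64>: it expects one @i64 value.
    %s = COMMINST @uvm.new_stack <[@worker_sig]> (@worker)

    // Create a thread bound to that stack, passing the expected value. The
    // top frame becomes ACTIVE and the new thread starts running at once;
    // it runs until @worker executes @uvm.thread_exit (or traps, swaps
    // stacks, and so on).
    %t = NEWTHREAD %s PASS_VALUES <@i64> (%arg)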
186 | 187 | Mu may change the value of ``threadref`` type to ``NULL`` if the thread it 188 | refers to is terminated. 189 | 190 | Binding of Stack and Thread 191 | =========================== 192 | 193 | Binding 194 | ------- 195 | 196 | Some actions, including the ``NEWTHREAD`` and the ``SWAPSTACK`` instruction, the 197 | ``new_thread`` API function and the trap handler, can bind a thread to a stack. 198 | 199 | When **binding** a thread to a stack, the state of its top frame changes from 200 | **READY** to **ACTIVE**. In this process, one of the following two actions 201 | shall be performed on the stack: 202 | 203 | - A binding operation can **pass values** of types *Ts* to the stack. In this 204 | case, the types *Ts* must match the expected types, and the stack **receives 205 | the values**. *Ts* can be an empty list. 206 | 207 | - A binding operation can **raise an exception** to the stack. In this case, the 208 | stack can be in **READY** with any *Ts* and it **receives the exception**. 209 | 210 | It gives undefined behaviour if the stack is not in the expected state. 211 | 212 | Resumption Point 213 | ---------------- 214 | 215 | A frame in the **READY** state has a **resumption point**. The resumption 216 | point determines how the received values and the received exception are 217 | processed when binding to a thread. 218 | 219 | For a Mu frame, the resumption point is either the beginning of a function 220 | version, or an OSR point instruction in the function version. 221 | 222 | - In the former case, the *Ts* in **READY** are the parameters of the 223 | function (also the entry block). Received values are bound to the parameters 224 | and the execution continues from the beginning of the entry block. Received 225 | exception is re-thrown. 226 | 227 | - In the latter case, the *Ts* types are determined by the concrete 228 | instructions. Specifically, *Ts* are the return types for ``CALL`` and 229 | ``CCALL``, and are explicitly specified for ``TRAP``, ``WATCHPOINT`` and 230 | ``SWAPSTACK``. The received values are bound to the results of the OSR point 231 | instruction. The received exception is handle by the instruction or re-thrown 232 | depending on the instruction. 233 | 234 | Undefined Mu functions behaves as defined in `Mu IR `__. 235 | 236 | Native frames can only enter the **READY** state when it calls back to Mu. 237 | Thus the resumption point is where it will continue after the native-to-Mu call 238 | returns. The received values are the return values to the native function. 239 | Throwing exceptions to native frames has implementation-defined behaviour. 240 | 241 | Mu gives the client a unified model of stack binding. The binding operation is 242 | only aware of the *Ts* types of the **READY** state, but oblivious of the 243 | resumption point. Therefore it can resume any **READY** stack in the same 244 | way, whether the resumption point is the beginning of a function, a call site, a 245 | trap or a swap-stack instruction, or a native frame. 246 | 247 | Note to the Mu implementers: swap-stack, Mu-to-Mu calls and Mu-native calls 248 | may all have different calling conventions, but the implementation must 249 | present a unified "resumption protocol" to the client: all stack-binding 250 | operations work on all OSR point instructions, as long as the *Ts* in 251 | *READY* match the passed values. In practice, some "adapter" frames may 252 | need to be inserted to convert one convention to another, but these frames 253 | must not be seen by the client. 
This implies that some API functions 254 | (especially stack introspection) must "lie" to the client about the presence 255 | of such frames. 256 | 257 | For example, on x86_64, assume ``SWAPSTACK`` passes values to the other 258 | stack via rdi and rsi, but ``RET`` returns values to the caller via rax and 259 | rdx. If, during OSR, a frame pausing on the ``CALL`` instruction becomes the 260 | top frame of a stack, then the Mu implementation must also create some glue 261 | code and an adapter frame above that frame, so that when a thread SWAP-STACK 262 | to this stack, values passed in rdi and rsi can be moved to rax and rdx, 263 | respectively. This "adapter" frame must also recover callee-saved registers. 264 | 265 | Unbinding 266 | --------- 267 | 268 | Some actions, including the ``@uvm.thread_exit``, ``TRAP``, ``WATCHPOINT`` and 269 | the ``SWAPSTACK`` instruction, can unbind a thread from a stack. 270 | 271 | When **unbinding** a thread from a stack, one of the following two actions shall 272 | be performed on the stack: 273 | 274 | An unbinding operation can **leave the stack** with a return types *Ts*. In this 275 | case, the state of its top frame changes from **ACTIVE** to **READY** for 276 | some given *Ts*. The instruction becomes the resumption point of the frame. 277 | 278 | An unbinding operation can **kill the stack**. In this case, the state of all 279 | frames of the stack changes from **ACTIVE** to **DEAD**. Specifically the 280 | ``@uvm.thread_exit`` kills the current stack and the ``SWAPSTACK`` instruction 281 | can do either option on the swapper. 282 | 283 | Executing a ``TRAP`` or an enabled ``WATCHPOINT`` instruction implies an 284 | unbinding operation, leaving the top frame in a **READY** state. 285 | 286 | Swap-stack 287 | ---------- 288 | 289 | **Swap-stack** is an operation that unbinds a thread from a stack and rebind 290 | that thread to a new stack. In a swap-stack operation, the stack to unbind from 291 | is called the **swapper** and the stack to bind to is called the **swappee**. 292 | 293 | The ``SWAPSTACK`` instruction (see ``__) performs a 294 | *swap-stack* operation. 295 | 296 | A trap handler can do similar things as *swap-stack* by re-binding the current 297 | thread to a different stack. 298 | 299 | Stack Destruction 300 | ================= 301 | 302 | The ``@uvm.kill_stack`` instruction, the ``kill_stack`` API function and all 303 | operations that perform unbinding operations can destroy a stack. Destroying a 304 | stack changes the state of all of its frames to **DEAD**. 305 | 306 | If a stack becomes unreachable from roots, the garbage collector may kill the 307 | stack. 308 | 309 | The Mu may change the value of ``stackref`` type to ``NULL`` if the stack it 310 | refers to is in the **DEAD** state. 311 | 312 | Stack Introspection 313 | =================== 314 | 315 | Stacks in the **READY** state can be introspected. Stacks in other states 316 | cannot. 317 | 318 | The stack introspection API uses **frame cursors**. A *frame cursor* is a 319 | mutable opaque structure allocated by Mu. It refers to a Mu frame, and also 320 | keeps implementation-dependent states necessary to iterate through frames in a 321 | stack. 322 | 323 | Note: The reason why it is mutable is that the cursor may be big. The states 324 | to be kept is specific to the implementation. Generally speaking, the more 325 | callee-saved registers there are, the bigger the cursor is. Allocating a new 326 | structure whenever moving down a frame may not scale for deep stacks. 
327 | 328 | The ``new_cursor`` API call allocates a frame cursor that refers to the top 329 | frame of a given stack, and returns a ``framecursorref`` that refers to the 330 | cursor. Then the client can use the ``next_frame`` API to move the cursor to the 331 | frame below. The ``copy_cursor`` copies the given frame cursor. The original 332 | frame cursor and the copied cursor can move down independently. This is useful 333 | when the client wishes to iterate through the stack in different paces. The 334 | ``close_cursor`` API closes the frame cursor and deallocates its resources. 335 | 336 | It has undefined behaviour if a stack is bound to a thread or a stack is killed 337 | while there are frame cursors to its frames not closed. It has undefined 338 | behaviour if ``next_frame`` goes below the bottom frame. 339 | 340 | Note: There are several reasons why it needs explicit closing. 341 | 342 | * It forces the client to avoid racing stack modification and stack 343 | introspection. 344 | 345 | * It will not force the Mu implementation to use a particular way to 346 | allocate such cursors. The Mu implementation can use malloc and free. If 347 | the implementation uses garbage collection for such cursors, it can still 348 | treat the ``close_cursor`` operation as a no-op. 349 | 350 | * An alternative is to close all related cursors automatically when the 351 | stack is re-bound. But that will involve one extra check for every 352 | swap-stack operation, which may be much more frequent than stack 353 | introspection, which is usually only used in exceptional cases. 354 | 355 | The ``cur_func``, ``cur_func_ver``, ``cur_inst`` and ``dump_keepalives`` API 356 | calls take a ``framecursorref`` as argument, and returns the ID of the function, 357 | function version or current instruction of the frame, or dumps the keep-alive 358 | variables of the current instruction of the frame. 359 | 360 | Multiple threads may introspect the stack concurrently as long as there is no 361 | concurrent modification using the on-stack replacement API (see below). However, 362 | it has undefined behaviour to operate on a closed frame cursor. 363 | 364 | These operations can also be performed by their equivalent common instructions 365 | ``@uvm.meta.*``. 366 | 367 | On-stack Replacement 368 | ==================== 369 | 370 | The client can pop and push frames from or to a stack. 371 | 372 | The ``pop_frames_to`` API function takes a ``framecursorref`` as argument. It 373 | will pop all frames above the frame cursor, and the frame of the cursor becomes 374 | the new top frame. 375 | 376 | Popping native frames has implementation-defined behaviour. It has undefined 377 | behaviour if a frame is popped but there are frame cursors referring to that 378 | frame. 379 | 380 | The ``push_frame`` API function takes a ``stackref`` and a ``funcref`` as 381 | arguments. It creates a new frame on the top of the stack, using the current 382 | version (as seen by the current thread) of the given function. The resumption 383 | point is the beginning of the function version. The return types of the function 384 | must match the *Ts* of the state of the previous frame, which must be 385 | **READY**. 386 | 387 | There are equivalent common instructions in the IR, too. 
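The sketch below shows how one introspection-plus-OSR step might look when
expressed with the ``@uvm.meta.*`` common instructions rather than the API. The
exact instruction names and parameter lists are those given in the common
instructions chapter; ``%stack`` and ``%newfunc`` are illustrative::

    %cur = COMMINST @uvm.meta.new_cursor (%stack)        // cursor at the top frame
           COMMINST @uvm.meta.next_frame (%cur)          // move the cursor down one frame
    %fid = COMMINST @uvm.meta.cur_func (%cur)            // ID of that frame's function
           COMMINST @uvm.meta.pop_frames_to (%cur)       // frames above the cursor are popped
           COMMINST @uvm.meta.close_cursor (%cur)        // close before re-binding the stack
           COMMINST @uvm.meta.push_frame (%stack %newfunc)  // push a frame for %newfunc

After such a sequence, the new top frame is **READY** for the parameter types
of ``%newfunc``, and the stack can be bound to a thread again.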
388 | 389 | It has undefined behaviour if 390 | 391 | - there are two API calls or equivalent instructions executed by two threads, 392 | and 393 | - one is ``new_cursor``, ``next_frame``, ``cur_func``, ``cur_func_ver``, 394 | ``cur_inst``, ``dump_keepalives``, ``pop_frames_to`` or ``push_frame``, and 395 | - the other is ``pop_frames_to`` or ``push_frame``, and 396 | - neither happens before the other. 397 | 398 | After popping or pushing frames, the state of the stack become the state of the 399 | new top frame, which must be **READY** for some *Ts*. The stack can be 400 | bound. 401 | 402 | NOTE: For the ease of Mu implementation, the new function must continue 403 | from the beginning rather than an arbitrary instruction in the middle. 404 | Continuing from the middle of a function demands too much power from the 405 | code generator. 406 | 407 | However, in most OSR scenarios, the desired behaviour is to continue from 408 | the point where the program left out for optimisation. The client can 409 | emulate the behaviour of continuing from the middle of a function by 410 | inserting a "prologue" in the high-level language in the beginning of the 411 | function. For example, in C, the client can add extra assignment expressions 412 | to initialise local variables to the previous context and use a goto 413 | statement to jump to the location to continue. Then the optimising compiler 414 | can remove unreachable code. As another example, if the client implements a 415 | JVM, it can insert ``Xstore`` instructions and a ``goto`` instruction to 416 | continue from the appropriate bytecode instruction. The optimising compiler 417 | can handle the rest. 418 | 419 | TODO: With the goto-with-values form defined, we can extend the IR and the 420 | API so that execution can continue from an arbitrary basic block of an 421 | arbitrary function version, rather than just the beginning of a function. 422 | 423 | Futex 424 | ===== 425 | 426 | Mu provides a mechanism similar to the Futex in the Linux kernel for 427 | implementing blocking locks and other synchronisation primitives. 428 | 429 | There is a waiting queue for all memory locations that has some integer types. 430 | (See ``__ for valid candidate types for Futex.) 431 | 432 | The ``@uvm.futex.wait`` and the ``@uvm.futex.wait_timeout`` instructions put the 433 | current thread into the waiting queue of a memory location. Both 434 | ``@uvm.futex.wake`` and ``@uvm.futex.cmp_requeue`` wakes up threads in a waiting 435 | queue of a memory location. 436 | 437 | NOTE: The term *memory location* is defined in Mu's sense and is abstract 438 | over physical memory or the virtual memory space given by the operating 439 | system. Even if a Mu implementation uses copying or replicating garbage 440 | collectors, the memory location in a heap object remains the same until the 441 | object is collected. 442 | 443 | The Mu Futex is designed to be easy to map to the ``futex`` system call on 444 | Linux. With the presence of copying garbage collector, Mu may internally 445 | perform ``FUTEX_REQUEUE`` or ``FUTEX_CMP_REQUEUE`` operations to compensate 446 | the effect of object movements. It may put barriers around Futex-related Mu 447 | instructions when the GC is concurrently re-queuing threads. 448 | 449 | When a thread is blocking on a futex, the state of its stack is still ACTIVE, 450 | making the impression that the thread is still "busy executing" the futex 451 | wait/wait_timeout instructions. 
Only the kernel knows whether it is doing an 452 | OS-level swap-stack, as it always does for context-switching. 453 | 454 | .. vim: tw=80 455 | -------------------------------------------------------------------------------- /type-system.rest: -------------------------------------------------------------------------------- 1 | =========== 2 | Type System 3 | =========== 4 | 5 | Overview 6 | ======== 7 | 8 | Mu has a comprehensive type system. It is close to the machine level, but also 9 | has reference types for exact garbage collection. 10 | 11 | In the Mu IR, a type is created by a (possibly recursive) combination of type 12 | constructors. 13 | 14 | By convention, types are written in lower cases. Parameters to types are written 15 | in angular brackets ``< >``. 16 | 17 | Type and Data Value 18 | =================== 19 | 20 | A Mu **type** defines a set where a **data value**, or **value** when 21 | unambiguous, is one of its elements. 22 | 23 | Both SSA variables and the Mu memory can hold values in this type system. Some 24 | restrictions can limit what type a variable or a memory location can hold. 25 | 26 | Types and Type Constructors 27 | =========================== 28 | 29 | A **type constructor** represents an **abstract type**. A **concrete type** is 30 | created by applying a type constructor and supplying necessary **parameters**. 31 | The following type constructors are available in Mu: 32 | 33 | - **int** < *length* > 34 | - **float** 35 | - **double** 36 | - **uptr** < *T* > 37 | - **ufuncptr** < *sig* > 38 | - **struct** < *T1* *T2* *...* > 39 | - **hybrid** < *F1* *F2* *...* *V* > 40 | - **array** < *T* *length* > 41 | - **vector** < *T* *length* > 42 | - **void** 43 | - **ref** < *T* > 44 | - **iref** < *T* > 45 | - **weakref** < *T* > 46 | - **tagref64** 47 | - **funcref** < *sig* > 48 | - **threadref** 49 | - **stackref** 50 | - **framecursorref** 51 | - **irnoderef** 52 | 53 | .. 54 | 55 | For C programmers: In Mu IR, types cannot be "inlined", i.e. all types 56 | referenced by other definitions (such as other types, constants, globals, 57 | functions, ...) must be defined at top-level. For example:: 58 | 59 | .typedef @refi64 = ref> // WRONG. Cannot write int<64> inside. 60 | 61 | .typedef @i64 = int<64> 62 | .typedef @refi64 = ref<@i64> // Right. 63 | 64 | %sum = FADD %a %b // WRONG. "double" is a type constructor, not a type 65 | 66 | .typedef @double = double 67 | %sum = FADD <@double> %a %b // Right. 68 | 69 | Parameters of a type are in the angular brackets. They can be integer literals, 70 | types and function signatures. In the text form, the latter two are global 71 | names (See ``__). 72 | 73 | There are several kinds of types. 74 | 75 | * ``float`` and ``double`` are **floating point types**. 76 | * ``ref`` and ``weakref`` are **object referenct types**. 77 | * ``ref``, ``iref`` and ``weakref`` are **reference types**. 78 | * ``funcref``, ``threadref``, ``stackref``, ``framecursorref`` and ``irnoderef`` 79 | are **opaque reference types**. 80 | * *Reference types* and *opaque reference types* are **general reference types**. 81 | * ``int``, ``float``, ``double``, *pointer types*, *general reference types* and 82 | ``tagref64`` are **scalar types**. 83 | * ``struct``, ``hybrid``, ``array`` and ``vector`` are **composite types**. 84 | 85 | * ``void`` is neither a *scalar type* nor a *composite type*. 86 | 87 | * ``hybrid`` is the only **variable-length type**. All other types are 88 | **fixed-length types**. 
89 | * ``uptr`` and ``ufuncptr`` are **pointer types**. 90 | * ``int``, *pointer types*, ``ref``, ``iref`` and *opaque reference types* are 91 | **EQ-comparable types**. 92 | * ``int``, ``iref`` and *pointer types* are **ULT-comparable types**. 93 | * ``ref`` is the **strong variant** of ``weakref``; ``weakref`` is the 94 | **weak variant** of ``ref``. All other types are the strong variant or weak 95 | variant of themselves. 96 | 97 | A **member** of a composite type T is either a field of T if T is a struct, or 98 | an element of T if T is an array or a vector, or a field in the fixed part or 99 | any element in the variable part if T is a hybrid. A **component** of type T is 100 | either itself or a member of any component of T. 101 | 102 | NOTE: This means a component is anything in a type, including itself and any 103 | arbitrarily nested members. 104 | 105 | The type parameter *T* in ``uptr`` and the return type and all parameter 106 | types of *sig* in ``ufuncptr`` must be **native-safe**. It is defined as 107 | following: 108 | 109 | * ``void``, ``int``, ``float`` and ``double`` are *native-safe*. 110 | 111 | * ``struct``, ``array``, ``vec`` and ``hybrid`` are native-safe if all of their type arguments *T1*, *T2*, ..., *T*, 113 | *F1*, *F2*, ..., *V* are native-safe. 114 | 115 | * ``uptr`` and ``ufuncptr`` are *native-safe* if *T* and the return type 116 | and all parameter types in *sig* are native-safe. Otherwise the ``uptr`` or 117 | the ``ufuncptr`` type is not well-formed. 118 | 119 | * All other types are not native-safe. (Specifically, they are all *general 120 | reference types* as well as ``struct``, ``array`` or ``hybrid`` that contains 121 | them.) 122 | 123 | Primitive Non-reference Types 124 | ============================= 125 | 126 | Integer Types 127 | ------------- 128 | 129 | ``int`` ``<`` *length* ``>`` 130 | 131 | length 132 | *integer literal*: The length of the integer in bits. 133 | 134 | ``int`` is an integer type of *length* bits. 135 | 136 | ``int`` is neutral to signedness. Negative numbers are represented in the 2's 137 | complement notation where the highest bit is the sign bit. 138 | 139 | ``int<1>`` is a Boolean type, in which case 1 means true and 0 means false. 140 | 141 | NOTE: The signedness of an ``int`` type is determined by the operations 142 | rather than the type. For example, ``UDIV`` treats both operands as unsigned 143 | numbers, ``SDIV`` treats both operands as signed numbers and ``ASHR`` treats 144 | the first operand as signed and the second operand as unsigned. 145 | 146 | .. 147 | 148 | NOTE: Although ``int<1>`` is required and ``int<6>`` and ``int<52>`` are 149 | also required when ``tagref64`` is implemented, they should not be part of 150 | any in-memory structure because their corresponding ``LOAD`` and ``STORE`` 151 | operations are not required for Mu implementations. ``int<1>`` is meant to 152 | represent register flags, such as the result of comparison and some overflow 153 | or carry flags. ``int<6>`` and ``int<52>`` are supposed to be used 154 | transiently when packing or unpacking a ``tagref64`` value. 155 | 156 | .. 157 | 158 | For LLVM users: these types are directly borrowed from LLVM. 159 | 160 | .. 
161 | 162 | Example:: 163 | 164 | .typedef @i1 = int<1> 165 | .typedef @i8 = int<8> 166 | .typedef @i16 = int<16> 167 | .typedef @i32 = int<32> 168 | .typedef @i64 = int<64> 169 | 170 | Floating Point Types 171 | -------------------- 172 | 173 | ``float`` 174 | 175 | ``double`` 176 | 177 | ``float`` and ``double`` are the IEEE754 single-precision and double-precision 178 | floating point number type, respectively. 179 | 180 | For LLVM users: these types are directly borrowed from LLVM. 181 | 182 | .. 183 | 184 | Example:: 185 | 186 | .typedef @float = float 187 | .typedef @double = double 188 | 189 | Pointer Types 190 | ------------- 191 | 192 | ``uptr < T >`` 193 | 194 | ``T`` 195 | *type*: The type of the referent. 196 | 197 | ``ufuncptr < sig >`` 198 | 199 | ``sig`` 200 | *function signature*: The signature of the pointed function. 201 | 202 | ``uptr`` and ``ufuncptr`` are (untraced) pointer types which are represented by 203 | the integral address of the referent. They are part of the (unsafe) native 204 | interface. The "u" in their names stand for "untraced". Their values are not 205 | affected by the garbage collection, even if their addresses are obtained from 206 | pinning heap objects which are later unpinned. 207 | 208 | ``uptr`` is the data pointer type. It points to a region in the native address 209 | space which represents the data type *T*. 210 | 211 | ``ufuncptr`` is the function pointer type. It points to a native function whose 212 | signature is *sig*. 213 | 214 | The type parameter *T* and both the return types and the parameter types of 215 | *sig* must be *native-safe*. It is implementation-defined whether multiple 216 | return values are allowed for a particular calling convention. 217 | 218 | For LLVM users: ``uptr`` is the counterpart of pointer types ``T*``. 219 | ``ufuncptr`` is the counterpart of function pointers ``R (P1 P2 ...)*``. 220 | The ``PTRCAST`` instruction can cast between different pointer types as well 221 | as integer types. 222 | 223 | For C users: Similar to LLVM, ``uptr`` and ``ufuncptr`` are the equivalent 224 | of C pointers to objects and functions, respectively. However, since Mu 225 | interfaces with the native world at the ABI level rather than the C 226 | programming language level, pointers are defined as addresses and casting 227 | between pointers and integers has semantics. 228 | 229 | .. 230 | 231 | Example:: 232 | 233 | .typedef @i32 = int<32> // int 234 | .typedef @i32_p = uptr<@i32> // int* 235 | 236 | // ssize_t write(int fildes, const void *buf, size_t nbyte); 237 | // See man (2) write. 238 | .typedef @void = void // void 239 | .typedef @void_p = uptr<@void> // void* 240 | .typedef @size_t = int<64> // size_t, ssize_t 241 | .funcsig @write_s = (@i32 @void_p @size_t) -> (@size_t) 242 | .typedef @write_fp = ufuncptr<@write_s> 243 | // @write_fp may point to the native function "write". 244 | 245 | // typedef void (*sig_t) (int); 246 | // sig_t signal(int sig, sig_t func); 247 | // See man (3) signal. 248 | .funcsig @sig_s = (@i32) -> () 249 | .typedef @sig_t = ufuncptr<@sig_s> 250 | .funcsig @signal_s = (@i32 @sig_t) -> (@sig_t) 251 | .typedef @signal_fp = ufuncptr<@signal_s> 252 | // @signal_fp may point to the native function "signal". 253 | 254 | Aggregate Types 255 | =============== 256 | 257 | Struct 258 | ------ 259 | 260 | ``struct`` ``<`` *T1* *T2* *...* ``>`` 261 | 262 | T1, T2, ... 263 | *type*: The type of fields. 264 | 265 | A ``struct`` is a Cartesian product type of several types. 
*T1*, *T2*, *...* are 266 | its **field types**. A ``struct`` must have at least one member. ``struct`` 267 | members cannot be ``void``. 268 | 269 | NOTE: For C programs: C does not allow empty structures, either, but many 270 | programmers create empty structures in practice. C++ does allow empty 271 | classes. g++ treats empty classes as having one ``char`` element. In Mu, if 272 | it is desired to allocate an empty unit in the heap, the appropriate type is 273 | ``void``. 274 | 275 | A ``struct`` cannot have itself as a component. 276 | 277 | NOTE: If it could, the struct would be infinitely large. However, a struct 278 | may contain a reference to the same struct type, since the size of 279 | references is not dictated by the thing it points to. For example:: 280 | 281 | .typedef @foo = struct <@i32 @foo> // WRONG. @foo is infinitely big. 282 | 283 | .typedef @foo = struct <@i32 @fooref> 284 | .typedef @fooref = ref<@foo> // Okay. It is a linked list. 285 | 286 | ``struct`` cannot be the type of an SSA variable if any of its field types 287 | cannot be the type of an SSA variable. 288 | 289 | .. 290 | 291 | NOTE: For example, a ``struct`` with a ``weakref`` field cannot be the type 292 | of an SSA variable. However, there can be references to such structs. 293 | 294 | .. 295 | 296 | For LLVM users: This is the same as LLVM's structure type, except structures 297 | with a "flexible array member" (a 0-length array as the last element) 298 | corresponds to the ``hybrid`` type in Mu. 299 | 300 | .. 301 | 302 | Example:: 303 | 304 | .typedef @byte = int<8> 305 | .typedef @short = int<16> 306 | .typedef @int = int<32> 307 | .typedef @f = float 308 | .typedef @d = double 309 | 310 | .typedef @struct1 = struct<@byte @short @int @f @d> 311 | .typedef @struct2 = struct<@f @f @struct1 @d @d> // nesting structs 312 | 313 | Hybrid 314 | ------ 315 | 316 | ``hybrid`` ``<`` *F1* *F2* *...* *V* ``>`` 317 | 318 | F1, F2, ... 319 | *list of types*: The types in the fixed part 320 | V 321 | *type*: The type of the elements of the variable part 322 | 323 | A hybrid is a combination of a fixed-length prefix, i.e. its ``fixed part``, and 324 | a variable-length array suffix, i.e. its ``variable part``, whose length is 325 | decided at allocation time. *F1* *F2* ... are the types of fields in the fixed 326 | part. *V* is the type of the *elements* of the variable part. 327 | 328 | NOTE: This is intended to play the part of "struct with flexible array 329 | member" in C99, i.e. ``struct { F1 f1; F2 f2; ... V v[]; }``. 330 | 331 | The fixed part may contain 0 fields. In this case, the fixed part is empty, and 332 | the variable part is in the beginning of this ``hybrid`` memory location, like a 333 | variable-length array without a header. The variable part cannot be omitted. 334 | Neither any fixed-part field nor *V* can be ``void``. 335 | 336 | ``hybrid`` cannot be the type of any SSA variable. 337 | 338 | NOTE: There can be references to hybrids. 339 | 340 | ``hybrid`` is the only type in Mu whose length is determined at allocation site 341 | rather than determined by the type itself. 342 | 343 | ``hybrid`` cannot be contained in any other composite types, including other 344 | hybrids. 345 | 346 | NOTE: Since the length of hybrids are only known at allocation time, 347 | allowing embedding hybrid members will make the size of other types 348 | variable. In Mu's design, hybrid is the only type whose length is determined 349 | at allocation site. 
350 | 351 | For C programmers: Just like ``struct { F f; V v[]; }`` cannot be embedded 352 | in other types, ``hybrid`` cannot, either. However, pointers/references to 353 | hybrids are allowed. 354 | 355 | .. 356 | 357 | Example:: 358 | 359 | .typedef @byte = int<8> 360 | .typedef @long = int<64> 361 | .typedef @double = double 362 | 363 | .typedef @struct1 = struct<@long @long @long> 364 | 365 | .typedef @hybrid1 = hybrid<@long @byte> // one int<64> followed by many int<8> 366 | .typedef @hybrid2 = hybrid<@long @long @long @double> // three int<64>, followed by many double 367 | .typedef @hybrid3 = hybrid<@struct1 @double> // similar to @hybrid2, but using struct as the header 368 | .typedef @hybrid4 = hybrid<@byte> // no fixed-part header. Just many int<8>. 369 | 370 | Array 371 | ----- 372 | 373 | ``array`` ``<`` *T* *length* ``>`` 374 | 375 | T 376 | *type*: The type of elements. 377 | length 378 | *integer literal*: The number of elements. 379 | 380 | An ``array`` is a sequence of values of the same type. *T* is its **element 381 | type**, i.e. the type of its elements, and *length* is the length of the array. 382 | *T* must not be ``void``. An array must have at least one element. 383 | 384 | An ``array`` cannot have itself as a component. 385 | 386 | It is not recommended to have SSA variables of ``array`` type. 387 | 388 | NOTE: The most useful feature of arrays is indexing by a variable index. 389 | But an SSA variable has more in common with registers than memory, and SSA 390 | variables are designed to be allocated in registers when possible. Mu 391 | implementations, as supposed to be minimal, may not be able to implement 392 | indexing more efficiently than storing the array to the memory and load 393 | back and element. 394 | 395 | Using arrays as an SSA variable is most useful when passing a ``struct`` 396 | value (not pointer) that contains an array member to a C function that 397 | requires such a parameter, although such C functions are, themselves, very 398 | rare. 399 | 400 | .. 401 | 402 | For LLVM users: Like LLVM arrays, a Mu array must have a size, but cannot 403 | have size 0. The closest counterpart of the "variable length array" (VLA) 404 | type in C is the ``hybrid`` type. 405 | 406 | .. 407 | 408 | Example:: 409 | 410 | .typedef @u8 = int<8> 411 | .typedef @real = double 412 | .typedef @cmpx = struct<@real @real> 413 | 414 | .typedef @array1 = array<@u8 4096> // array of 4096 bytes 415 | .typedef @array2 = array<@real 100> // array of 100 doubles 416 | .typedef @array3 = array<@cmpx 16> // array of 16 structs 417 | .typedef @array4 = array<@array2 1024> // array of 1024 nested arrays 418 | 419 | Vector Type 420 | ----------- 421 | 422 | ``vector < T length >`` 423 | 424 | ``T`` 425 | *type*: The type of elements. 426 | ``length`` 427 | *integer literal*: The number of elements. 428 | 429 | ``vector`` is the vector type for single-instruction multiple-data (SIMD) 430 | operations. A ``vector`` value is a packed value of multiple values of the same 431 | type. *T* is the type of its elements and *length* is the number of elements. 432 | *T* cannot be void. *length* must be at least one. 433 | 434 | It is allowed to have SSA variables of vector types. 435 | 436 | Only some primitive element types, such as ``int<32>``, ``float`` and 437 | ``double``, are `required `__ for implementations. If the 438 | implementation allows other types, then any vector cannot directly or indirectly 439 | contain itself as a member. 
440 | 441 | For LLVM users: This is the counterpart of the LLVM vector type. 442 | 443 | .. 444 | 445 | Example:: 446 | 447 | .typedef @i32 = int<32> 448 | .typedef @float = float 449 | .typedef @double = double 450 | 451 | .typedef @vector1 = vector<@i32 4> 452 | .typedef @vector2 = vector<@float 4> 453 | .typedef @vector3 = vector<@double 2> 454 | .typedef @vector4 = vector<@double 4> 455 | 456 | Void Type 457 | ========= 458 | 459 | ``void`` 460 | 461 | The ``void`` type has no value. 462 | 463 | It can only be used as the type of allocation units that do not store any 464 | values. This allows allocating ``void`` in the heap/stack/global memory. 465 | Particularly, the ``NEW`` instruction with the type ``void`` creates a new empty 466 | heap object which is not the same as any others. This is similar to the ``new 467 | Object()`` expression in Java. ``ref``, ``iref``, ``weakref`` 468 | and ``uptr`` are also allowed, which can refer/point to "anything". 469 | 470 | 471 | Reference Types and General Reference Types 472 | =========================================== 473 | 474 | Reference Types 475 | --------------- 476 | 477 | ``ref`` ``<`` *T* ``>`` 478 | 479 | ``iref`` ``<`` *T* ``>`` 480 | 481 | ``weakref`` ``<`` *T* ``>`` 482 | 483 | T 484 | *type*: The type of referent. 485 | 486 | ``ref`` is an object reference type. A ``ref`` value is a strong reference to a 487 | heap objects. 488 | 489 | ``iref`` is an internal reference type. An ``iref`` value is an internal 490 | reference to a memory location. 491 | 492 | ``weakref`` is a weak object reference type. It is the weak variant of ``ref``. 493 | A memory location of ``weakref`` holds a weak reference to a heap object and can 494 | be clear to ``NULL`` by the garbage collector when there is no strong references 495 | to the object the ``weakref`` value refers to. 496 | 497 | NOTE: There is no weak internal reference. 498 | 499 | The type parameter ``T`` is the referent type, which is the type of the heap 500 | object or memory location its value refers to. 501 | 502 | All reference types can have ``NULL`` value which does not refer to any heap 503 | object or memory location. 504 | 505 | Weakref can only be the type of a memory location, not an SSA variable. When a 506 | ``weakref`` location is loaded from, the result is a ``ref`` to the same object; 507 | when a ``ref`` is stored to a ``weakref`` location, the location holds a 508 | ``weakref`` to that object. 509 | 510 | .. 511 | 512 | NOTE: Allowing SSA variables to hold weak references may cause many 513 | problems. The semantic allows the GC to change it to ``NULL`` at any time 514 | as long as the GC decides the referent object is no longer reachable. For 515 | this reason, it is impossible to guarantee a weak reference is ``NULL`` 516 | before accessing. Consider the following program:: 517 | 518 | %entry(): 519 | %notnull = NE <@RefT> %weakref @NULLREF 520 | // Just at this moment, GC changed %weakref to NULL 521 | BRANCH2 %notnull %bb_cont(%weakref) %bb_abnormal(...) 522 | 523 | %bb_cont(<@WeakRefT> %weakref): 524 | %val = LOAD <@T> %weakref // null reference access 525 | ... 526 | 527 | GC may clear the weak reference right after the program decided it is not 528 | ``NULL``. 529 | 530 | Requiring an explicit conversion from ``weakref`` to ``ref`` is not 531 | very useful. In that case, the only operation allowed for ``weakref`` is 532 | to convert to ``ref``. 
533 | 534 | So letting this conversion happen implicitly during memory access is a 535 | natural choice, though not intuitive at all. In Mu's conceptual model, a 536 | memory load is like an IO operation: it does not simply "get" the value 537 | (such as an object reference) in the memory, but is a communication with the 538 | memory system and queries the global state. So it is natural for a load 539 | operation to return a different value each time executed. 540 | 541 | .. 542 | 543 | For LLVM users: there is no equivalence in LLVM. Mu guarantees that all 544 | references are identified both in the heap and in the stack and are subject 545 | to garbage collection. The closest counterpart in LLVM is the pointer type. 546 | Mu does not encourage the use of pointers, though pointer types will be 547 | introduced in Mu in the future. 548 | 549 | .. 550 | 551 | Example:: 552 | 553 | .typedef @i8 = int<8> 554 | .typedef @i16 = int<16> 555 | .typedef @i32 = int<32> 556 | .typedef @float = float 557 | .typedef @double = double 558 | .typedef @some_struct = struct<@i32 @i16 @i8 @double @float> 559 | .typedef @some_array = array<@i8 100> 560 | 561 | .typedef @ref1 = ref<@i32> 562 | .typedef @ref2 = ref<@some_struct> 563 | .typedef @ref3 = ref<@some_array> 564 | .typedef @iref1 = iref<@i32> 565 | .typedef @iref2 = iref<@some_struct> 566 | .typedef @iref3 = iref<@some_array> 567 | .typedef @weakref1 = weakref<@i32> 568 | .typedef @weakref2 = weakref<@some_struct> 569 | .typedef @weakref3 = weakref<@some_array> 570 | 571 | Tagged Reference 572 | ---------------- 573 | 574 | ``tagref64`` 575 | 576 | ``tagref64`` is a union type of ``double``, ``int<52>`` and ``struct 577 | int<6>``. It occupies 64 bits. A ``tagref64`` value holds both a state which 578 | identifies the type it is currently representing and a value of that type. 579 | 580 | 581 | NOTE: When a ``tagref64`` contains an object reference, it can hold an 582 | ``int<6>`` together as a user-defined tag. It is useful to store type 583 | information. 584 | 585 | When a ``tagref64`` represents a ``double`` NaN value, it does not preserve the 586 | bit-wise representation of the NaN. 587 | 588 | NOTE: This type is intended to reuse the NaN space of the IEEE754 double 589 | value to multiplex with integers and object references. For this reason, 590 | when storing NaN values, it will still be NaN, but may not have the same bit 591 | representation. 592 | 593 | .. 594 | 595 | NOTE: This type is only available on some architectures including x86-64 596 | with 48-bit addresses. 597 | 598 | Function Reference Type 599 | ----------------------- 600 | 601 | ``funcref`` ``<`` *sig* ``>`` 602 | 603 | sig 604 | *function signature*: The signature of the referred function. 605 | 606 | ``funcref`` is a function reference type. It is an opaque reference to a Mu 607 | function and is not interchangeable with reference types. *sig* is the signature 608 | of the function it refers to. 609 | 610 | A ``NULL`` value of a ``funcref`` type does not refer to any function. 611 | 612 | NOTE: The value of a ``funcref`` may refer to a function that is declared 613 | but not defined. The value of a ``funcref`` type does not change even the 614 | function it refers to becomes defined or redefined. 615 | 616 | .. 617 | 618 | For C and LLVM users: The ``funcref`` type is similar to the "pointer to 619 | function" type in C and LLVM, but it only refer to Mu functions. It is not a 620 | pointer (see the ``ufuncptr`` type). 
It may be implemented under the hood as 621 | a pointer to a function, which will be an implementation detail. 622 | 623 | .. 624 | 625 | Example:: 626 | 627 | .typedef @i64 = int<64> 628 | 629 | .funcsig @sig1 = (@i64 @i64) -> (@i64) 630 | .funcsig @sig2 = () -> () 631 | 632 | .typedef @func1 = funcref<@sig1> 633 | .typedef @func2 = funcref<@sig2> 634 | 635 | Other Opaque Reference Types 636 | ---------------------------- 637 | 638 | ``threadref`` 639 | 640 | ``stackref`` 641 | 642 | ``framecursorref`` 643 | 644 | ``irnoderef`` 645 | 646 | These types are opaque references to things within Mu. They are not 647 | interchangeable with reference types. Only some special instructions (e.g. 648 | ``@uvm.new_stack``, ``NEWTHREAD``, ``@uvm.meta.new_cursor``) or API calls can 649 | operate on them. 650 | 651 | All opaque reference values can be ``NULL``, which does not refer to anything. 652 | 653 | ``threadref`` and ``stackref`` refer to Mu Threads and Mu stacks, respectively. 654 | They are used to manipulate the `threads and stacks `__. In 655 | particular, ``stackref`` is used in the ``SWAPSTACK`` instruction. ``stackref`` 656 | is not a pointer to the top of the stack. It refers to the same stack even if 657 | frames are added or removed. 658 | 659 | ``framecursorref`` refers to to frame cursors. A frame cursor is an internal 660 | structure used by the stack introspection API to iterate through stack frames. 661 | Its content is mutable but opaque. See `Threads and Stacks 662 | `__ for more details. 663 | 664 | ``irnoderef`` refers to a Mu IR node being constructed by the `IR Builder API 665 | `__. 666 | 667 | .. vim: tw=80 668 | -------------------------------------------------------------------------------- /uvm-ir-binary.rest: -------------------------------------------------------------------------------- 1 | ========================================= 2 | Intermediate Representation (Binary Form) 3 | ========================================= 4 | 5 | This document describes the binary form of the Mu intermediate representation. 6 | For the text form, see ``__. 7 | 8 | **DEPRECATED**: The binary format is deprecated. As mentioned in `this ticket 9 | `__, we have come to the 10 | conclusion that the interface between the client and the micro VM should be a 11 | functional interface, i.e. constructing IR nodes by invoking API functions. This 12 | binary IR form is still a serialised data format that needs to be parsed. The 13 | text form, however, is still useful for debugging and for using in statically 14 | compiled implementations. 15 | 16 | Overview 17 | ======== 18 | 19 | The Mu IR Binary Form is similar to the `Text Form `__ in structure, 20 | but has notable differences. 21 | 22 | Numerical IDs are used exclusively instead of textual names. The binary form 23 | also provides a special "name binding" pseudo-top-level definition which 24 | associates IDs with names. 25 | 26 | Binary format 27 | ============= 28 | 29 | A bundle in the binary form consists of many numbers encoded in bytes. All 30 | numbers are encoded in **little endian** and are **tightly packed** which means 31 | there are no padding bytes between two adjacent numbers. For floating point 32 | numbers, it is equivalent to convert them bit-by-bit into integer types of the 33 | same length and convert to bytes in little-endian. 34 | 35 | Binary Types 36 | ------------ 37 | 38 | A sequence of bytes has a **binary type** which maps the bytes to the value they 39 | represent. 
Possible binary types are: 40 | 41 | i8, i16, i32, i64 42 | Integer types of the respective lengths. 43 | float, double 44 | Floating point types of 32 bits and 64 bits, respectively. 45 | idt 46 | Alias to i32. Used for IDs. 47 | lent 48 | Alias to i16. Used for lengths of variable-length structures including the 49 | number of fields in a struct and the number of items in a parameter list. 50 | aryszt 51 | Alias to i64. Used for the length of arrays. 52 | opct 53 | Alias to i8. Used for instruction opcodes, operations or flags. 54 | *other structures* 55 | One structure can contain other structures defined separately. 56 | 57 | A table is used to represent a contiguous structure. The first row is a list of 58 | binary types specifying the type of each column and the second row specifies for 59 | each column either a symbolic name for that field or the exact binary content 60 | expected. Such a structure consists of a sequence of numbers of the types of the 61 | first row. 62 | 63 | +-------+------------------+ 64 | | type1 | type2 | 65 | +=======+==================+ 66 | | num | or symbolic name | 67 | +-------+------------------+ 68 | 69 | Common Structures 70 | ================= 71 | 72 | Some structures are common in multiple structures. 73 | 74 | ID List 75 | ------- 76 | 77 | An ID list, denoted as **idList**, is a list of IDs. It has the general form: 78 | 79 | +------+-----+-----+-----+ 80 | | lent | idt | idt | ... | 81 | +======+=====+=====+=====+ 82 | | nIDs | id1 | id2 | ... | 83 | +------+-----+-----+-----+ 84 | 85 | ``nIDs`` specifies the number of IDs and there are ``nIDs`` IDs following it. 86 | 87 | Top-level Structure 88 | =================== 89 | 90 | A bundle starts with a 4-byte magic "\x7F' 'U' 'I' 'R', or 0x7F 0x55 0x49 91 | 0x52. Then there are many top-level definitions until the end of the bundle. 92 | 93 | Type Definition 94 | --------------- 95 | 96 | Type definition has the following form: 97 | 98 | +------+-----+------------------+ 99 | | opct | idt | type constructor | 100 | +======+=====+==================+ 101 | | 0x01 | id | cons | 102 | +------+-----+------------------+ 103 | 104 | ``id`` is the identifier of the defined type. A type constructor follows the 105 | opcode 0x01 and the ID. See ``__ for a complete list of type 106 | constructors. 107 | 108 | NOTE: this is equivalent to: ``.typedef id = cons``. 109 | 110 | Function Signature Definition 111 | ----------------------------- 112 | 113 | Function signature definition has the following form: 114 | 115 | +------+-----+----------+----------+ 116 | | opct | idt | idList | idList | 117 | +======+=====+==========+==========+ 118 | | 0x02 | id | paramtys | rettys | 119 | +------+-----+----------+----------+ 120 | 121 | ``id`` is the identifier of the defined function signature. ``paramtys`` is a 122 | list of IDs of its parameter types. ``rettys`` is a list of IDs of the return 123 | types. 124 | 125 | NOTE: this is equivalent to: ``.funcsig id = (paramtys) -> (rettys)`` 126 | 127 | Constant Definition 128 | ------------------- 129 | 130 | Constant definition has the following form: 131 | 132 | +------+-----+------+----------------------+ 133 | | opct | idt | idt | constant constructor | 134 | +======+=====+======+======================+ 135 | | 0x03 | id | type | cons | 136 | +------+-----+------+----------------------+ 137 | 138 | ``id`` is the identifier of the defined constant. ``type`` is the type of the 139 | constant and must match the constant constructor. 
A constant constructor follows 140 | the type. 141 | 142 | NOTE: this is equivalent to: ``.const id = cons`` 143 | 144 | Integer Constant Constructor 145 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 146 | 147 | An integer constant constructor has the following form: 148 | 149 | +------+--------+ 150 | | opct | i64 | 151 | +======+========+ 152 | | 0x01 | number | 153 | +------+--------+ 154 | 155 | ``number`` is the integer constant number. If the integer constant has a type 156 | with fewer bits, only the least significant bits are valid. The binary form 157 | cannot encode integer constants larger than 64 bits. 158 | 159 | NOTE: this is equivalent to an integer literal in the text form. 160 | 161 | Floating Point Constant Constructors 162 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 163 | 164 | A float constant constructor has the following form: 165 | 166 | +------+--------+ 167 | | opct | float | 168 | +======+========+ 169 | | 0x02 | number | 170 | +------+--------+ 171 | 172 | ``number`` is the float constant number. 173 | 174 | NOTE: this is equivalent to a float literal in the text form. 175 | 176 | A double constant constructor has the following form: 177 | 178 | +------+--------+ 179 | | opct | double | 180 | +======+========+ 181 | | 0x03 | number | 182 | +------+--------+ 183 | 184 | ``number`` is the double constant number. 185 | 186 | NOTE: this is equivalent to a double literal in the text form. 187 | 188 | List Constant Constructor 189 | ~~~~~~~~~~~~~~~~~~~~~~~~~ 190 | 191 | A list constant constructor has the following form: 192 | 193 | +------+---------+ 194 | | opct | idList | 195 | +======+=========+ 196 | | 0x04 | elems | 197 | +------+---------+ 198 | 199 | ``elems`` is a list of IDs, each of which refers to another constant which is 200 | the value of the corresponding field of the struct or element of array/vector. 201 | 202 | NOTE: this is equivalent to the struct literal ``{elems}`` in the text 203 | form. 204 | 205 | NULL Constant Constructor 206 | ~~~~~~~~~~~~~~~~~~~~~~~~~ 207 | 208 | A NULL constant constructor has the following form: 209 | 210 | +------+ 211 | | opct | 212 | +======+ 213 | | 0x05 | 214 | +------+ 215 | 216 | NOTE: this is equivalent to the ``NULL`` keyword in the text form. 217 | 218 | Global Cell Definition 219 | ---------------------- 220 | 221 | Global cell definition has the following form: 222 | 223 | +------+-----+------+ 224 | | opct | idt | idt | 225 | +======+=====+======+ 226 | | 0x04 | id | type | 227 | +------+-----+------+ 228 | 229 | ``id`` is the ID of the defined global cell. ``type`` is the type of the global 230 | cell. 231 | 232 | NOTE: this is equivalent to: ``.global id `` 233 | 234 | Function Definition and Declaration 235 | ----------------------------------- 236 | 237 | Function declaration has the following form: 238 | 239 | +------+-----+-----+ 240 | | opct | idt | idt | 241 | +======+=====+=====+ 242 | | 0x05 | id | sig | 243 | +------+-----+-----+ 244 | 245 | ``id`` is the ID of the declared function. ``sig`` is the function signature of 246 | it. 247 | 248 | NOTE: this is equivalent to: ``.funcdecl id `` 249 | 250 | Function definition has the following form: 251 | 252 | +------+-----+-------+-----+---------------+ 253 | | opct | idt | idt | idt | function body | 254 | +======+=====+=======+=====+===============+ 255 | | 0x06 | id | verid | sig | body | 256 | +------+-----+-------+-----+---------------+ 257 | 258 | ``id`` is the ID of the defined function. ``verid`` is the ID of the version of 259 | the function. 
``sig`` is its function signature. ``body`` is the function body.
261 |
262 | NOTE: this is equivalent to: ``.funcdef id <sig> VERSION verid {
263 | body }``
264 |
265 | Function Body
266 | =============
267 |
268 | A **function body** has the following form:
269 |
270 | +------+-------------+-------------+-----+
271 | | lent | basic block | basic block | ... |
272 | +======+=============+=============+=====+
273 | | nbbs | bb1 | bb2 | ... |
274 | +------+-------------+-------------+-----+
275 |
276 | ``nbbs`` is the number of basic blocks. ``bbx`` are the basic blocks.
277 |
278 | A **basic block** has the following form:
279 |
280 | +-----+---------+---------+-----+--------+-------------+-------------+-----+
281 | | idt | lent | idPairs | idt | lent | instruction | instruction | ... |
282 | +=====+=========+=========+=====+========+=============+=============+=====+
283 | | id | nparams | params | exc | ninsts | inst1 | inst2 | ... |
284 | +-----+---------+---------+-----+--------+-------------+-------------+-----+
285 |
286 | ``id`` is the ID of the basic block. Every basic block must have an ID, *even
287 | the entry block*. ``nparams`` is the number of parameters in ``params``, which
288 | is a list of pairs of IDs, each of which is:
289 |
290 | +------+-------+
291 | | idt | idt |
292 | +======+=======+
293 | | type | param |
294 | +------+-------+
295 |
296 | where ``type`` is the ID of the type of the parameter, and ``param`` is a
297 | parameter to the basic block.
298 |
299 | ``exc`` is the ID of the exceptional parameter. It is omitted when the ID is 0.
300 |
301 | ``ninsts`` is the number of instructions in the current basic block. There are
302 | ``ninsts`` instructions following the header.
303 |
304 | An **instruction** has the following form:
305 |
306 | +--------+-----+------------------+
307 | | idList | idt | instruction body |
308 | +========+=====+==================+
309 | | resIDs | id | instbody |
310 | +--------+-----+------------------+
311 |
312 | ``resIDs`` is a list of IDs for the results. ``id`` is the ID of the
313 | instruction. ``instbody`` is the instruction body, which is specific to each
314 | instruction. See ``__ for an exhaustive list.
315 |
316 | Function Exposing Definition
317 | ============================
318 |
319 | A function exposing definition has the following form:
320 |
321 | +------+-----+------+----------+--------+
322 | | opct | idt | idt | opct | idt |
323 | +======+=====+======+==========+========+
324 | | 0x07 | id | func | callConv | cookie |
325 | +------+-----+------+----------+--------+
326 |
327 | ``id`` is the ID of the exposed value. ``func`` is the ID of the function to
328 | expose. ``callConv`` is the calling convention flag. ``cookie`` is the cookie,
329 | the ID of an ``int<64>`` constant.
330 |
331 | Name Binding
332 | ============
333 |
334 | Name binding is a definition specific to the binary form. It binds a name to an
335 | ID. It is designed for debugging purposes and is optional. The name must be a
336 | valid textual global identifier (including the prefix '@').
337 |
338 | A name binding has the following form:
339 |
340 | +------+-----+--------+-------+-------+-----+
341 | | opct | idt | lent | i8 | i8 | ... |
342 | +======+=====+========+=======+=======+=====+
343 | | 0x08 | id | nbytes | byte1 | byte2 | ... |
344 | +------+-----+--------+-------+-------+-----+
345 |
346 | ``id`` is the ID to bind.
``nbytes`` is the number of bytes in the name and 347 | ``bytex`` is the value of each byte. 348 | 349 | The name is encoded in ASCII and must follow the rules of global names, local 350 | names and allowed characters as defined in ``__. 351 | 352 | .. vim: textwidth=80 353 | -------------------------------------------------------------------------------- /uvm-memory.rest: -------------------------------------------------------------------------------- 1 | ================= 2 | Mu and the Memory 3 | ================= 4 | 5 | Overview 6 | ======== 7 | 8 | Mu supports automatic memory management via garbage collection. There is a 9 | **heap** which contains garbage-collected **objects** as well as many **stacks** 10 | and a **global memory** which contain data that are not garbage-collected. 11 | 12 | The heap memory is managed by the garbage collector. 13 | 14 | Unlike C or C++, local SSA variables are not bound to memory locations. Stack 15 | memory must be allocated by the ``ALLOCA`` or ``ALLOCAHYBRID`` instructions or 16 | using the Mu client interface. 17 | 18 | This specification does not mandate any object layout, but it is recommended to 19 | layout common data types as in the platform application binary interface (ABI) 20 | so that the native interface is easier to implement. 21 | 22 | Basic Concepts 23 | ============== 24 | 25 | Mu Memory 26 | --------- 27 | 28 | There are three kinds of memory in Mu, namely the **heap memory**, the **stack 29 | memory** and the **global memory**. **Mu memory** means one of them. 30 | 31 | Memory is allocated in their respective **allocation units**. Every allocation 32 | unit has a **lifetime** which begins when the allocation unit is created and 33 | ends when it is destroyed. 34 | 35 | A **memory location** is a region of data storage in the memory which can hold 36 | data values. A memory location has a type and its value can only be of that 37 | type. 38 | 39 | NOTE: The Mu memory is defined without mentioning "address". There is no 40 | "size", "alignment" or "offset" of a memory location. The relation between a 41 | memory location and an address is only established when pinning (discussed 42 | later). Even when pinned, the address may or may not be the canonical 43 | address where the location is allocated in Mu. For example, Mu objects can 44 | be replicated. 45 | 46 | When allocating Mu memory locations (in the heap, stack or global memory), 47 | Mu guarantee the location can hold Mu values of a particular type and, as 48 | long as the type allows atomic access, the location can be accessed 49 | atomically. The implementation must ensure all memory locations are 50 | allocated in such a way. For example, it should not allocate an integer 51 | across page boundary, but it may choose to use locks for atomicity, which, 52 | in practice, is usually a bad idea. 53 | 54 | .. 55 | 56 | For C programmers: The word "object" in the C language is the counterpart of 57 | "memory location" in Mu. Mu does not have bit fields and a memory location 58 | is always an "object" in C's sense. In Mu's terminology, the word "object" 59 | is a synonym of "heap object" or "garbage-collected object". 60 | 61 | In C, the word "memory location" must have scalar types, but Mu uses the 62 | word for composite types, too. 63 | 64 | For a memory location L that represents type T, if c is a member (if applicable) 65 | or a component of T, it also has a memory location which is a **member** or a 66 | **component** of the memory location L, respectively. 
Memory location L1 67 | **contains** a memory location L2 if L2 is a component of L1. 68 | 69 | The **lifetime** of a memory location is the same as the allocation unit that 70 | contains it. 71 | 72 | As implementation details, when an allocation unit is destroyed and another 73 | allocation unit occupied the same or overlapping space as the former, they are 74 | different allocation units. Different allocation units contain no common memory 75 | locations. When a heap object is moved by the garbage collector, it is still the 76 | same object. Any memory locations within the same object remain the same. 77 | 78 | NOTE: This means the memory of Mu is an abstraction over the memory space of 79 | the process. 80 | 81 | Native Memory 82 | ------------- 83 | 84 | The **native memory** is not Mu memory. The native memory is an address space of 85 | a sequence of bytes, each can be addressed by an integral address. The size of 86 | the address is implementation-defined. 87 | 88 | A region of bytes in the native memory can be interpreted as Mu values in an 89 | implementation-dependent way. The bytes that represents a Mu value is the 90 | **bytes representation** of that Mu value. 91 | 92 | For C programmers: it is similar to the "object representation", but in Mu, 93 | unless a memory location is pinned, it may not be represented in such a way. 94 | 95 | A Mu memory location can be **pinned**. In this state, it is mapped to a 96 | (contiguous) region of bytes in the native memory which contains the bytes 97 | representation of the value the memory location holds. The beginning of the 98 | memory location is mapped to the lowest address of the region. Different 99 | components of a memory location which do not contain each other do not map to 100 | overlapping regions in the address space. 101 | 102 | For C programmers: 103 | 104 | * Mu assumes 8-bit bytes. 105 | 106 | * Mu does not have the bit-field type, but a client can implement bit-fields 107 | using integer types and bit operations. 108 | 109 | * Mu does not have union types. However, like C, directly casting an address 110 | to a pointer has implementation-defined behaviours. If a Mu program 111 | interfaces with native programs, it has to also depend on the platform. 112 | 113 | * Unlike C, Mu operations work on SSA variables rather than memory locations 114 | (the counterpart of objects in C). 115 | 116 | * Mu forces the 2's complement representation, though the byte order and 117 | alignment requirement are implementation-defined. 118 | 119 | See `Native Interface `__ for details about the pinning and 120 | unpinning operations. 121 | 122 | Memory Allocation and Deallocation 123 | ================================== 124 | 125 | An allocation unit in the heap memory is called a **heap object**, or **object** 126 | when unambiguous. It is created when executing the ``NEW`` or ``NEWHYBRID`` 127 | instructions or the ``new_fixed`` or ``new_hybrid`` API function. It is 128 | destroyed when the object is collected by the garbage collector. 129 | 130 | An allocation unit in the stack memory is called an **alloca cell**. It is 131 | created when executing the ``ALLOCA`` or ``ALLOCAHYBRID`` instruction. It is 132 | destroyed when the stack frame containing it is destroyed. 133 | 134 | An allocation unit in the global memory is called a **global cell**. One global 135 | cell is created for every ``.global`` declaration in a bundle submitted to Mu. 136 | Global cells are never destroyed. 
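
For example, the following text-form fragment creates one allocation unit of
each kind. It is an illustrative sketch only: the names (such as ``@g_counter``
and ``@alloc_demo``) are made up here, and the normative syntax of the
declarations and instructions is given in the IR and instruction set documents,
not in this section::

    .typedef @i64 = int<64>

    .global @g_counter <@i64>                // global cell: never destroyed

    .funcsig @sig_v_v = () -> ()
    .funcdef @alloc_demo VERSION %v1 <@sig_v_v> {
        %entry():
            %obj  = NEW    <@i64>            // heap object: destroyed only when collected by the GC
            %cell = ALLOCA <@i64>            // alloca cell: destroyed together with this frame
            RET ()
    }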
137 |
138 | Initial Values
139 | --------------
140 |
141 | The initial value of any memory location is defined as the following, according
142 | to the type of data value the memory location represents:
143 |
144 | * The initial value of ``int`` and pointer types is 0 (numerical value or
145 | address).
146 | * The initial value of floating point types is positive zero.
147 | * The initial value of ``ref``, ``iref``, ``weakref``, ``funcref``, ``stackref``
148 | and ``threadref`` is ``NULL``.
149 | * The initial value of ``tagref64`` is a floating point number which is
150 | positive zero.
151 | * The initial values of all fields or elements in ``struct``, ``array``,
152 | ``vector`` and the fixed and variable part of ``hybrid`` are the initial
153 | values according to their respective types.
154 |
155 | Garbage Collection
156 | ------------------
157 |
158 | A **root** is an object reference or internal reference in:
159 |
160 | * any global cell, or
161 | * any bound Mu stack, or
162 | * the thread-local object reference in any thread, or
163 | * any value held by any client context in the API.
164 |
165 | A live stack contains references in its alloca cells and live local SSA
166 | variables. A dead stack contains no references. A thread can strongly reach its
167 | bound stack unless it is temporarily unbound because of trapping.
168 |
169 | An object is **strongly reachable** if it can be reached by traversing only
170 | strong, stack and thread references from any root. An object is **weakly
171 | reachable** if it is not strongly reachable, but can be reached by traversing
172 | strong, stack, thread and weak references from any root. Otherwise an object is
173 | **unreachable**.
174 |
175 | The garbage collector can collect unreachable objects. It may also set weak
176 | references which refer to weakly reachable objects to ``NULL``.
177 |
178 | NOTE: Doing the latter may make weakly reachable objects become unreachable.
179 |
180 | The garbage collector may move objects.
181 |
182 | Memory Accessing
183 | ================
184 |
185 | Memory accessing operations include **load** and **store** operations. To
186 | **access** means to load or store. **Atomic read-modify-write** operations may
187 | have both a load and a store operation, but may have special atomic properties.
188 |
189 | NOTE: Instructions are named in capital letters: LOAD and STORE. The
190 | abstract operations are in lower case: load, store and access.
191 |
192 | Memory access operations can be performed by some Mu instructions (see
193 | `Instruction Set `__), API functions (see `Client Interface
194 | `__), native programs which access the pinned Mu memory,
195 | or in other implementation-specific ways.
196 |
197 | Two memory accesses **conflict** if one stores to a memory location and the
198 | other loads from or stores to the same memory location.
199 |
200 | Parameters and Semantics of Memory Operations
201 | ---------------------------------------------
202 |
203 | Generally speaking, load operations copy values from the memory and store
204 | operations copy values into the memory. The exact results are determined by the
205 | memory model. See `Memory Model `__.
206 |
207 | A **load** operation has parameters ``(ord, T, loc)``. *ord* is the memory order
208 | of the operation. *T* is the type of *loc*, a memory location. The result is a
209 | value of the strong variant of type *T*.
210 |
211 | A **store** operation has parameters ``(ord, T, loc, newVal)``.
*ord* is the
212 | memory order of the operation. *T* is the type of *loc*, a memory location.
213 | *newVal* is a value whose type is the strong variant of *T*. This operation does
214 | not produce a result value.
215 |
216 | A **compare exchange** operation is an atomic read-modify-write operation. Its
217 | parameters are ``(isWeak, ordSucc, ordFail, T, loc, expected, desired)``.
218 | *isWeak* is a Boolean parameter which indicates whether the compare exchange
219 | operation is weak or strong. *ordSucc* and *ordFail* are the memory orders of
220 | the operation when the comparison succeeds or fails, respectively. *T* is the
221 | type of the memory location *loc*. *expected* and *desired* are values whose
222 | type is the strong variant of *T*. The result is a pair ``(v, s)``, where *v*
223 | has the type of the strong variant of *T*, and *s* is a Boolean value.
224 |
225 | A compare exchange operation performs a load operation on *loc* and compares its
226 | result with *expected*. If the comparison is successful, it performs a store
227 | operation to location *loc* with *desired* as *newVal*.
228 |
229 | If the operation is strong, the comparison succeeds **if and only if** the
230 | result of the load equals *expected*. If it is weak, the comparison succeeds
231 | **only if** the result of the load equals the *expected* value, and it may
232 | spuriously fail, that is, it may fail even if the loaded value equals the
233 | *expected* value.
234 |
235 | The result *v* is the result of the initial load operation and *s* is whether
236 | the comparison is successful or not.
237 |
238 | An **atomic-x** operation is an atomic read-modify-write operation, where *x*
239 | can be one of (XCHG, ADD, SUB, AND, NAND, OR, XOR, MAX, MIN, UMAX, UMIN). Its
240 | parameters are ``(ord, T, loc, opnd)``. *ord* is the memory order of the
241 | operation. *T* is the type of the memory location *loc*. *opnd* is a value whose
242 | type is the strong variant of *T*. The result also has the type of the strong
243 | variant of *T*.
244 |
245 | An atomic-x operation performs a load operation on location *loc*. Then,
246 | according to *x*, it performs one of the binary operations below, with the
247 | result of the load operation as the left-hand-side operand and the value *opnd*
248 | as the right-hand-side operand. The result is:
249 |
250 | XCHG
251 | The value of *opnd*.
252 | ADD
253 | The sum of the two operands.
254 | SUB
255 | The difference of the two operands.
256 | AND
257 | The bitwise AND of the two operands.
258 | NAND
259 | The bitwise NOT of the bitwise AND of the two operands.
260 | OR
261 | The bitwise inclusive OR of the two operands.
262 | XOR
263 | The bitwise exclusive OR of the two operands.
264 | MAX
265 | The maximum value of the two operands, considering both operands as signed.
266 | MIN
267 | The minimum value of the two operands, considering both operands as signed.
268 | UMAX
269 | The maximum value of the two operands, considering both operands as unsigned.
270 | UMIN
271 | The minimum value of the two operands, considering both operands as unsigned.
272 |
273 | ..
274 |
275 | NOTE: In the C syntax, the semantics of NAND is ``~(op1 & op2)``.
276 |
277 | Then it performs a store operation to location *loc* with the result of the
278 | binary operation as *newVal*.
279 |
280 | The result of the atomic-x operation is the result of the initial load
281 | operation.
282 |
283 | All operators other than ``XCHG`` are only applicable for integer types.
284 | ``XCHG`` is allowed for any type.
However, a Mu implementation may only 284 | implement some combinations of operators and operand types according to the 285 | requirements specified in `Portability `__ 286 | 287 | Memory Operations on Pointers 288 | ----------------------------- 289 | 290 | Load, store, compare exchange and atomic-x operations can work with native 291 | memory in addition to Mu memory locations. In this case, the *loc* parameter of 292 | the above operations become a region of bytes in the native memory (usually 293 | represented as ``uptr``) rather than memory locations (usually represented as 294 | ``iref``). 295 | 296 | Only *native-safe* types can be accessed via pointers. 297 | 298 | When accessing the memory via pointers, if the bytes are mapped to a Mu memory 299 | location via pinning (see `Native Interface `__), then if the 300 | referent type of the pointer is the same as the Mu memory location, it has the 301 | same effect as accessing the corresponding Mu memory location. 302 | 303 | When non-atomically loading from or storing to a region *R* of bytes which is 304 | 305 | 1. not mapped to (i.e. not perfectly overlapping with) a particular Mu memory 306 | location, and 307 | 2. each byte in the region is part of any mapped byte region of any pinned Mu 308 | memory location, 309 | 310 | then such an operation loads or stores on a byte-by-byte basis. Specifically: 311 | 312 | * Such a load operation *L*: 313 | 314 | 1. for each address *A* of byte in the region *R*, performs a load operation 315 | on the (only) Mu memory location of scalar types (not composite types) 316 | whose mapped byte region *R2* contains address *A*, then extract the byte 317 | value *b* at address *A*, then 318 | 319 | 2. combine all results *b* from the previous step into a sequence of byte 320 | values, then interprets it as the bytes representation of a Mu value. 321 | This Mu value is the result of the load operation *L*. 322 | 323 | * Such a store operation *S*: 324 | 325 | 1. interprets its *newVal* argument as its bytes representation *B*, then 326 | 327 | 2. for each address *A* of byte in the region *R*, performs a load operation 328 | on the (only) Mu memory location of scalar types (not composite types) 329 | whose mapped byte region *R2* contains address *A*, then update the 330 | result by replacing the byte at address *A* with the byte in *B*, then 331 | perform a store operation on the same Mu memory location with the updated 332 | value as *newVal*. 333 | 334 | .. 335 | 336 | NOTE: This allows Mu to allocate a byte array and access (by itself or by 337 | native programs) it via pointers as if it is a struct or a union, and then 338 | interpret the written values as bytes. The requirement of each byte being 339 | mapped gives implementation-defined behaviours to accesses beyond the border 340 | of any Mu objects (such as array out-of-bound errors), or accessing padding 341 | bytes in Mu structs. 342 | 343 | Accessing native memory regions not mapped to Mu memory locations has 344 | implementation-defined behaviours. 345 | 346 | NOTE: Accessing the native memory may have all kinds of results: getting a 347 | previously-stored value, storing to one address and affect another address 348 | when two addresses are mapped to the same physical memory region/file, 349 | segmentation fault, bus error (especially on OSX), turning on/off the light 350 | by doing memory-mapped IO, launching nuclear missiles, summoning nasal 351 | demons, etc. Mu cannot make much guarantee. 
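
Returning to the byte-by-byte rule above, the following sketch pins a Mu byte
array and loads two of its bytes through an ``int<16>`` pointer. It is
illustrative only: treat the exact spellings of the ``@uvm.native.pin`` and
``@uvm.native.unpin`` common instructions, the ``PTR`` variants of ``LOAD`` and
``STORE``, and ``PTRCAST`` as assumptions to be checked against the common
instruction and instruction set documents, and note that the loaded value
depends on the implementation-defined byte order::

    .typedef @i8    = int<8>
    .typedef @i16   = int<16>
    .typedef @bytes = array<@i8 8>
    .typedef @refb  = ref<@bytes>
    .typedef @pb    = uptr<@bytes>
    .typedef @p16   = uptr<@i16>

    // inside a function version:
    %obj = NEW <@bytes>                              // 8 one-byte memory locations
    %pin = COMMINST @uvm.native.pin <@refb> (%obj)   // pin; yields a uptr<@bytes>
    %q   = PTRCAST <@pb @p16> %pin                   // reinterpret the address as uptr<@i16>
    %v   = LOAD PTR <@i16> %q                        // byte-by-byte load of the first two bytes
    COMMINST @uvm.native.unpin <@refb> (%obj)        // unpin when the address is no longer needed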
352 |
353 | Native programs can access pinned Mu memory locations in implementation-defined
354 | ways.
355 |
356 | NOTE: This means it requires effort from the implementations of both Mu
357 | and the native programs to obtain any defined semantics in mixed Mu-native
358 | programs. For C, it will involve the C language, the platform ABI and the Mu
359 | ABI of that platform.
360 |
361 | Memory Layout
362 | =============
363 |
364 | Whether or how Mu data of any type are represented in the native memory is
365 | implementation-defined. When an object is pinned, the layout is viewed from the
366 | native memory in a platform-dependent way.
367 |
368 | For Mu implementers, it is recommended to use the layout defined by the
369 | application binary interface of the platform in order to ease the data exchange
370 | via the native interface implementation.
371 |
372 | Mu has some rules about Mu memory locations which must always be preserved.
373 |
374 | Rules of Memory Locations
375 | =========================
376 |
377 | Every memory location has an associated type bound when the memory location is
378 | created and cannot be changed. The memory location can only hold values of that
379 | type.
380 |
381 | NOTE: The association between memory location and type is conceptual. This
382 | does not mean the Mu implementation has to keep metadata of the type of
383 | all memory locations at runtime. The implementation only needs to keep
384 | enough metadata to implement its garbage collector.
385 |
386 | A memory location has a **beginning** and an **end**. The value it holds is
387 | represented in that region. A non-NULL internal reference of type *T* refers to
388 | the memory location of type *T* at a specific beginning.
389 |
390 | NOTE: There can only be one such memory location.
391 |
392 | Specifically, there is a memory location of type ``void`` at the beginning of
393 | any other memory location.
394 |
395 | NOTE: This makes it legal to cast any ``iref<T>`` to ``iref<void>`` and
396 | back.
397 |
398 | Prefix Rule
399 | -----------
400 |
401 | NOTE: The prefix rule is designed to support having common language-specific
402 | object headers in objects. It also supports inheritance in object-oriented
403 | programming where a superclass is a prefix of a subclass.
404 |
405 | A component *C* is a **prefix** of a type *T* if any of the following is true.
406 |
407 | + *C* is *T* itself.
408 | + *T* is a ``struct`` and *C* is its first field.
409 | + *T* is a ``hybrid`` and *C* is the first field of its fixed part, or the fixed
410 | part of *T* has no fields and *C* is the first element of the variable part.
411 | + *T* is an ``array`` or ``vector`` of length n >= 1, and *C* is its first
412 | element.
413 | + *C* is a prefix of another prefix of *T*.
414 |
415 | A prefix of memory location *M* is the memory location that represents a prefix
416 | of the type of *M*.
417 |
418 | All prefixes of a memory location have the same beginning.
419 |
420 | The ``REFCAST`` instruction or the ``refcast`` API function preserves the
421 | beginning of the operand. If it casts ``iref<T1>`` to ``iref<T2>``, the result
422 | is an internal reference to the memory location of type ``T2`` at the same
423 | beginning. (see `Instruction Set `__)
424 |
425 | Array Rule
426 | ----------
427 |
428 | A **memory array** is defined as a contiguous memory location of components of
429 | the same type.
The ``array`` type, the ``vector`` type, as well as the variable
430 | part of a ``hybrid``, are all represented in the memory as memory arrays.
431 |
432 | Nested ``array``, ``vector`` and variable part of ``hybrid`` can be considered
433 | as a single memory array with the innermost element type of the nested type as
434 | the element type.
435 |
436 | Example: The variable part of ``hybrid<array<array<vector<float 4> 10>
437 | 20>>`` can be treated as:
438 |
439 | + a memory array of ``float``, or
440 | + a memory array of ``vector<float 4>``, or
441 | + a memory array of ``array<vector<float 4> 10>``, or
442 | + a memory array of ``array<array<vector<float 4> 10> 20>``.
443 |
444 | Internal references to an element of a memory array can be shifted to other
445 | elements in the same memory array using the ``SHIFTIREF`` instruction.
446 |
447 | NOTE: ``SHIFTIREF`` may cross the boundary of Mu types, but still remain in
448 | the memory array. For example, an internal reference to the first ``float``
449 | in the ``array<array<float 10> 10>`` array, which is a 10x10 matrix of
450 | float, can be shifted to other rows using the ``SHIFTIREF`` instruction and
451 | cross the 10-element boundary. Shifting by 12 elements from element (0,0)
452 | will reach the element at (1,2).
453 |
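The scenario in the note above can be written down directly in the text-form IR.
The fragment below is an illustrative sketch only (the constant and variable
names are made up; see the instruction set document for the normative syntax of
``GETELEMIREF`` and ``SHIFTIREF``): starting from an internal reference to
element (0,0) of a 10x10 ``float`` matrix, shifting by 12 ``float`` elements
yields element (1,2)::

    .typedef @i64    = int<64>
    .typedef @float  = float
    .typedef @row    = array<@float 10>
    .typedef @matrix = array<@row 10>                 // a 10x10 matrix of float

    .const @ZERO   <@i64> = 0
    .const @TWELVE <@i64> = 12

    // given %m of type iref<@matrix>:
    %row0 = GETELEMIREF <@matrix @i64> %m    @ZERO    // iref<@row>:   row 0
    %e00  = GETELEMIREF <@row    @i64> %row0 @ZERO    // iref<@float>: element (0,0)
    %e12  = SHIFTIREF   <@float  @i64> %e00  @TWELVE  // iref<@float>: element (1,2),
                                                      // crossing the 10-element row boundary

.. vim: tw=80

--------------------------------------------------------------------------------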