├── watchdog-timer.md
├── history.md
├── contributing.md
├── README.md
├── memory-map.md
├── device-control-block.md
├── startup.md
├── rtos-support-features.md
├── hart-control-block.md
├── introduction.md
├── real-time-clock.md
├── interrupt-controller.md
├── system-clock.md
├── eabi.md
├── csrs.md
├── exceptions-and-interrupts.md
├── interrupts-use-cases.md
└── improvements-upon-privileged.md


/watchdog-timer.md:
--------------------------------------------------------------------------------
1 | # The Device Watchdog Timer (WDT)
2 | 
3 | TODO: define it. (consider E300 watchdog for inspiration)
4 | 
5 | 


--------------------------------------------------------------------------------
/history.md:
--------------------------------------------------------------------------------
 1 | # Appendix C: History
 2 | 
 3 | The open source project was created on GitHub in October 2017 (as
 4 | [The Embedded RISC-V Project](https://github.com/emb-riscv)),
 5 | but initially there was no content available.
 6 | 
 7 | Work on the first proposal of the specs started in late January 2018, with the
 8 | text formatted as markdown, and the preliminary version 0.1.1 was ready by the
 9 | end of February 2018, and submitted to selected readers for feedback.
10 | 


--------------------------------------------------------------------------------
/contributing.md:
--------------------------------------------------------------------------------
 1 | # Appendix D: Contributing
 2 | 
 3 | As with most open source projects, all contributions are welcome!
 4 | 
 5 | ## Bugs
 6 | 
 7 | Any mistakes that are identified, whether typos, logic errors, flawed
 8 | arguments, etc., should be reported as Bugs in the
 9 | [Issues](https://github.com/emb-riscv/specs-markdown/issues) section.
10 | 
11 | ## Enhancements
12 | 
13 | Clearly defined proposals should be submitted as Enhancements in the
14 | [Issues](https://github.com/emb-riscv/specs-markdown/issues) section,
15 | or, even better, as
16 | [Pull requests](https://github.com/emb-riscv/specs-markdown/pulls).
17 | 
18 | ### C/C++ use cases
19 | 
20 | Proposals should be accompanied by use cases in C/C++, and a solid argument for
21 | why the new solution is more efficient and/or easier to use than existing
22 | solutions.
23 | 
24 | Please don't forget that the mission statement is to "define a modern
25 | C/C++ friendly architecture", so solutions that cannot be expressed in
26 | C/C++ need a very strong justification to be seriously considered.
27 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # The RISC-V Microcontroller Profile
 2 | 
 3 | A proposal for a friendlier microcontroller architecture using the RISC-V instruction set.
 4 | 
 5 | Version: 0.2.1-pre
 6 | 
 7 | Editors:
 8 | * Liviu Ionescu
 9 | 
10 | Warning: This draft specification is in a preliminary phase and may change at any time. For the moment it is more like a wish list than a real specification document.
11 | 
12 | 
13 | ## Motto 
14 | 
15 | _"People are more expensive than transistors"._
16 | 
17 | ## Table of Contents
18 | 
19 | * [Introduction](introduction.md)
20 | * [Memory Map](memory-map.md)
21 | * [The Startup Process](startup.md)
22 | * [Exceptions and Interrupts](exceptions-and-interrupts.md)
23 | * [Control and Status Registers (CSRs)](csrs.md)
24 | * [Hart Control Block (`hcb`)](hart-control-block.md)
25 | * [Hart Interrupt Controller (`hic`)](interrupt-controller.md)
26 | * [Device Control Block (`dcb`)](device-control-block.md)
27 | * [Device Real-Time Clock (`rtclock`)](real-time-clock.md)
28 | * [Device System Clock (`sysclock`)](system-clock.md)
29 | * [Device Watchdog Timer (`wdog`)](watchdog-timer.md)
30 | * [Embedded ABI (EABI)](eabi.md)
31 | * [RTOS Support Features](rtos-support-features.md)
32 | * [Appendix A: Improvements upon RISC-V privileged](improvements-upon-privileged.md) <--- Read Me First!
33 | * [Appendix B: Interrupts use cases](interrupts-use-cases.md)
34 | * [Appendix C: History](history.md)
35 | * [Appendix D: Contributing](contributing.md)
36 | 
37 | TODO:
38 | 
39 | - add MPU definitions
40 | - add more details about the restrictions in user mode.
41 | 
42 | ## License
43 | 
44 | This document is released under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/legalcode) license.
45 | 


--------------------------------------------------------------------------------
/memory-map.md:
--------------------------------------------------------------------------------
 1 | # Memory map
 2 | 
 3 | Generally the RISC-V microcontroller memory map is implementation specific; the only reserved area
 4 | in the RISC-V microcontroller profile is a slice at the end of the memory space, called the
 5 | **system control area**.
 6 | 
 7 | Typical RISC-V microcontroller devices have:
 8 | 
 9 | - a **read-only code** area, usually at 0x00000000, but the actual address may be implementation
10 | specific (typically flash)
11 | - a **read-write data** area (typically RAM)
12 | - an implementation specific **peripheral** area.
13 | 
14 | Multi-hart devices can share certain memory areas (code or data), but can also have hart-specific
15 | memory areas, or both shared and specific areas.
16 | 
17 | ## The system control area
18 | 
19 | The system control area is a slice of 256 MiB at the end of the memory space. This area
20 | must have the execute permissions removed, and attempts to execute code from it must trigger
21 | an exception (instruction access fault).
22 | 
23 | For 32-bit devices, the system control area is **0xF0000000-0xFFFFFFFF**.
24 | 
25 | For 64-bit devices, the system control area is **0xFFFFFFFF'F0000000-0xFFFFFFFF'FFFFFFFF**.
26 | 
27 | The system control area is implemented as a set of memory-mapped address spaces, some providing control and status registers common for the entire
28 | device, and some providing control and status registers for the current hart:
29 | 
30 | | Base | Top | Name | Description |
31 | |:-----|:----|:-----|-------------|
32 | | 0xF000'0000 | 0xF000'0FFF | `dcb` | The Device Control Block. |
33 | | 0xF000'1000 | 0xF000'1FFF | `sysclock` | The Device System Clock. |
34 | | 0xF000'2000 | 0xF000'2FFF | `rtclock` | The Device Real-Time Clock. |
35 | | 0xF000'3000 | 0xF000'3FFF | `wdog` | The Device Watchdog Timer. |
36 | | | | | |
37 | | 0xF100'0000 | 0xF100'0FFF | `hcb` | The Hart Control Block. |
38 | | 0xF100'2000 | 0xF100'3FFF | `hic` | The Hart Interrupt Controller. |
39 | 
40 | Each hart has its own separate control block; all HCBs map to the same address, the internal
41 | logic being able to distinguish between them based on the ID of the hart requesting access;
42 | thus each hart can access only its own hart control block.
43 | 
44 | The same applies to the Hart Interrupt Controller.
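
The address ranges above follow from placing a 256 MiB slice at the very top of the address space. A minimal sketch of that arithmetic, assuming a hypothetical helper `sca_base()` (the name and the macro are illustrative, not part of the specification):

```c
#include <stdint.h>

/* Size of the system control area: 256 MiB. */
#define SCA_SIZE (UINT64_C(256) * 1024 * 1024)

/* Hypothetical helper: base address of the system control area for a
   given xlen (32 or 64), i.e. the last SCA_SIZE bytes of the space. */
static uint64_t sca_base(unsigned xlen)
{
  /* Last valid address of the memory space: 2^xlen - 1. */
  uint64_t top = (xlen == 64) ? UINT64_MAX : ((UINT64_C(1) << xlen) - 1);
  return top - (SCA_SIZE - 1);
}
```

With these definitions, `sca_base(32)` gives 0xF0000000 and `sca_base(64)` gives 0xFFFFFFFF'F0000000, matching the ranges listed for 32-bit and 64-bit devices.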
45 | 
46 | (The addresses are preliminary; more work is needed to find a layout that is easy to decode.)
47 | 
48 | 
49 | TODO [PA]: reserve space in the system control area
50 | for a debug program buffer, in case an implementation chooses to
51 | make the buffer memory mapped.
52 | 


--------------------------------------------------------------------------------
/device-control-block.md:
--------------------------------------------------------------------------------
 1 | # The Device Control Block (`dcb`)
 2 | 
 3 | The DCB includes system registers that are common to the entire device and are not
 4 | specific to any given hart.
 5 | 
 6 | ## Memory map
 7 | 
 8 | | Offset | Name | Width | Type | Reset | Description |
 9 | |:-------|:-----|:------|:-----|:------|-------------|
10 | | 0x0000 | hartidmax | 32b | ro | 0x00000NNN | The highest value for hart ID in the device. |
11 | | 0x0004 | vendorid | 32b | ro |  | Vendor ID. |
12 | | 0x0008 | archid | 32b | ro |  | Architecture ID. |
13 | | 0x000C | impid | 32b | ro |  | Implementation ID. |
14 | | 0x0010 | | | | | Reserved. |
15 | | 0x0020 | dcsr | | | | Debug CSR. |
16 | | 0x0028 | dpc | | | | Debug PC. |
17 | | 0x0030 | | | | | Reserved. |
18 | | 0x0100 | harts | | | | All harts interrupts. |
19 | 
20 | ## The highest hart ID
21 | 
22 | For multi-hart devices, reading this register returns the highest numerical value used as hart ID
23 | in the device. Single-hart devices must return 0.
24 | 
25 | ## All harts interrupts
26 | 
27 | For multi-hart devices, these registers allow one hart to pend interrupts to any other
28 | hart, and possibly to temporarily adjust the priority thresholds, to handle synchronization
29 | issues like priority inversion.
30 | 
31 | > <sup>Use case must be further investigated.</sup>
32 | 
33 | For single-hart devices, this area is reserved.
34 | 
35 | | Offset | Name | Width | Type | Reset | Description |
36 | |:-------|:-----|:------|:-----|:------|-------------|
37 | | 0x0000 | `hartid` | 32b | w | 0x00000000 | Access key. |
38 | | 0x0004 | `pendnum` | 32b | w |  | Hart interrupt pending bits. |
39 | | 0x0008 | `prioth` | 32b | rw | 0x00000000 | Hart priority threshold. |
40 | | 0x000C |  |  |  |  | Reserved. |
41 | 
42 | These registers have one additional bit of state. To prevent inadvertent interrupt
43 | pendings, all writes to this area (`pendnum` and `prioth`) must be preceded by an
44 | unlock operation to the `hartid` register. The value (0x51F15000 + (Hart ID)) must be
45 | written to the `hartid` register to set the hart id and the state bit before
46 | any write access to `pendnum` and `prioth`.
47 | The state bit is cleared at reset, and after any write to `pendnum` or `prioth` registers.
48 | 
49 | The `pendnum` register is write only and allows access to the hart identified
50 | by the write to `hartid`. Writing a small integer value pends the
51 | interrupt with the given number. The `pendnum` register must have enough bits to
52 | represent any interrupt number (at most 10).
53 | 
54 | The `prioth` register is read/write and allows access to the hart identified
55 | by the write to `hartid`. It must have enough bits to represent any interrupt
56 | priority.
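
The unlock-then-write sequence described above can be sketched in C as follows. This is only an illustration under stated assumptions: the struct name `dcb_harts_t`, the macro `DCB_HARTS_KEY` and the helper `dcb_pend_interrupt()` are hypothetical, and the register block is passed in as a pointer so the logic does not depend on the preliminary addresses.

```c
#include <stdint.h>

/* Layout of the "all harts interrupts" area, following the table above. */
typedef struct
{
  volatile uint32_t hartid;   /* 0x0000: access key (write only) */
  volatile uint32_t pendnum;  /* 0x0004: interrupt number to pend (write only) */
  volatile uint32_t prioth;   /* 0x0008: hart priority threshold */
  volatile uint32_t reserved; /* 0x000C: reserved */
} dcb_harts_t;

/* Key base taken from the text; the hart ID is added to it. */
#define DCB_HARTS_KEY UINT32_C(0x51F15000)

/* Hypothetical helper: pend interrupt `num` on hart `hid`. */
static void dcb_pend_interrupt(dcb_harts_t* harts, uint32_t hid, uint32_t num)
{
  /* Unlock: selects the target hart and sets the state bit. */
  harts->hartid = DCB_HARTS_KEY + hid;
  /* The write is accepted once; the state bit is then cleared again. */
  harts->pendnum = num;
}
```

On a 32-bit device using the preliminary memory map, the pointer would presumably be `(dcb_harts_t*)0xF0000100` (the `dcb` base plus the 0x0100 offset of `harts`); every write to `pendnum` or `prioth` needs its own preceding unlock.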
57 | 
58 | Warning: The key mechanism has synchronisation problems in case multiple harts access it
59 | simultaneously. Implementations can choose to allow access only from hart 0.
60 | 
61 | ## RISC-V compatibility CSRs
62 | 
63 | The RISC-V Volume I, Chapter 2.8, mandates the `rdtime` instruction. The time value can be
64 | retrieved from `dcb.rtclock.counter`.
65 | 
66 | Other RISC-V registers from RISC-V Volume II:
67 | 
68 | - mvendorid
69 | - marchid
70 | - mimpid
71 | 
72 | Other registers that might need attention:
73 | 
74 | - required for the debug module (are these per-hart or per-device?):
75 |   - dcsr
76 |   - dpc
77 | - optional for the debug module:
78 |   - dscratch0
79 |   - dscratch1
80 | 


--------------------------------------------------------------------------------
/startup.md:
--------------------------------------------------------------------------------
  1 | # Device startup
  2 | 
  3 | After reset, all harts in a RISC-V microcontroller start executing code, identified by a
  4 | per-hart **startup block**.
  5 | 
  6 | The location of the hart startup block is implementation specific. The typical
  7 | configuration with a single hart has the startup block located at the beginning
  8 | of the memory space (usually address 0x00000000).
  9 | 
 10 | If multiple harts share a memory area to fetch code (like a flash area), the
 11 | startup blocks are organised as an array located at the beginning of the shared
 12 | memory area. If different harts have different memory areas, the startup blocks
 13 | are located at the beginning of each area.
 14 | 
 15 | For a RISC-V hart, the minimum information required to start a hart is:
 16 | 
 17 | - a pointer to the startup routine
 18 | - a pointer to the main stack (`spm`)
 19 | - a pointer to the RISC-V global pointer (`gp`)
 20 | - a pointer to the exception table
 21 | 
 22 | All pointers are xlen bits.
 23 | 
 24 | For further extensions, a few words at the end of the startup area are reserved.
 25 | 
 26 | > <sup>The pointer to the exception table must be known by the hart before entering
 27 |   the startup code, to catch possible execution faults in the startup code.</sup>
 28 | 
 29 | ## Usage
 30 | 
 31 | With the above definition of a startup block, there is no need for any assembly
 32 | instructions; the entire startup code can be written in C/C++.
 33 | 
 34 | ```cpp
 35 | extern "C" {
 36 | typedef void (*riscv_exception_handler_t)(void);
 37 | 
 38 | typedef struct
 39 | {
 40 |   void (*startup)(void);
 41 |   void* main_stack_pointer;
 42 |   void* global_pointer;
 43 |   riscv_exception_handler_t* exception_handlers;
 44 |   void* reserved[4];
 45 | } riscv_startup_block_t;
 46 | 
 47 | // Startup routines, defined below.
 48 | [[noreturn]] void hart0_startup(void);
 49 | [[noreturn]] void hart1_startup(void);
 50 | 
 51 | // Assumed to be defined elsewhere (linker script, other source files).
 52 | extern char hart0_stack_pointer[], hart1_stack_pointer[];
 53 | extern char hart0_global_pointer[], hart1_global_pointer[];
 54 | extern riscv_exception_handler_t hart0_exception_handlers[];
 55 | extern riscv_exception_handler_t hart1_exception_handlers[];
 56 | 
 57 | riscv_startup_block_t
 58 | __attribute__((section(".startup_blocks")))
 59 | harts_startup_blocks[] = {
 60 |   { hart0_startup, hart0_stack_pointer, hart0_global_pointer,
 61 |     hart0_exception_handlers },
 62 |   { hart1_startup, hart1_stack_pointer, hart1_global_pointer,
 63 |     hart1_exception_handlers }
 64 | };
 65 | 
 66 | [[noreturn]] void
 67 | hart0_startup(void)
 68 | {
 69 |   // ...
 70 | }
 71 | 
 72 | [[noreturn]] void
 73 | hart1_startup(void)
 74 | {
 75 |   // ...
 76 | }
 77 | 
 78 | } // extern "C"
 79 | ```
 80 | 
 81 | ### Prerequisites
 82 | 
 83 | The linker script must allocate the `.startup_blocks` section at the implementation
 84 | specific address (usually 0x00000000).
 85 | 
 86 | ## Implementation
 87 | 
 88 | TODO: define a format to express the pseudocode. Possibly Scala?
 89 | 
 90 | After reset, each hart executes the equivalent of the following pseudocode, where `hid` is the hart ID:
 91 | 
 92 | ```
 93 | start_hart(int hid)
 94 | {
 95 |   // Identify the per-hart startup block.
 96 |   addr = (word_size * 8) * hid;
 97 | 
 98 |   // Clear all hart registers.
 99 |   hart[hid].x0 = 0;
100 |   hart[hid].x1 = 0;
101 |   // ...
102 |   // Store the exception pointer in the hart specific register.
103 |   hart[hid].excvta = *(addr + word_size * 3);
104 | 
105 |   // Load global pointer.
106 |   hart[hid].gp = *(addr + word_size * 2);
107 |   // Load main stack pointer.
108 |   hart[hid].sp = *(addr + word_size * 1);
109 |   // Load program counter; this will immediately pass control to the startup code.
110 |   hart[hid].pc = *(addr + word_size * 0);
111 | }
112 | ```
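
The pseudocode above can also be modelled in plain C, which may help when checking the block layout in a simulator or a unit test. This is a sketch only: the type `hart_reset_state_t` and the function `load_startup_block()` are hypothetical names, and the startup blocks are viewed as an array of xlen-sized words (8 words per block: 4 pointers plus 4 reserved words).

```c
#include <stddef.h>
#include <stdint.h>

#define STARTUP_BLOCK_WORDS 8 /* 4 pointers + 4 reserved words */

/* Hypothetical model of the hart state loaded at reset. */
typedef struct
{
  uintptr_t pc;     /* startup routine */
  uintptr_t sp;     /* main stack pointer */
  uintptr_t gp;     /* global pointer */
  uintptr_t excvta; /* exceptions vector table address */
} hart_reset_state_t;

/* Load the reset state of hart `hid` from an array of startup blocks. */
static hart_reset_state_t
load_startup_block(const uintptr_t* blocks, size_t hid)
{
  /* Identify the per-hart startup block. */
  const uintptr_t* block = blocks + STARTUP_BLOCK_WORDS * hid;
  hart_reset_state_t state;

  /* Same order as the pseudocode: exception table first, pc last. */
  state.excvta = block[3];
  state.gp = block[2];
  state.sp = block[1];
  state.pc = block[0];
  return state;
}
```

The word indices 0 to 3 correspond to the field order of `riscv_startup_block_t` in the Usage section.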
113 | 


--------------------------------------------------------------------------------
/rtos-support-features.md:
--------------------------------------------------------------------------------
 1 | # RTOS Support Features
 2 | 
 3 | The RISC-V microcontroller profile is designed not only to be C/C++ friendly, but also with RTOS support in mind.
 4 | 
 5 | To make RTOS implementations easier and more efficient, the following features are available:
 6 | 
 7 | ## Shadow thread stack pointer
 8 | 
 9 | Two stack pointers are available, the main stack pointer (MSP) and the thread stack pointer (TSP).
10 | 
11 | The main stack is the default stack available after reset, and all exceptions and interrupts create
12 | a stack frame on the main stack.
13 | 
14 | If the application switches to Thread mode, the hart switches to the TSP, while interrupts continue
15 | to use the MSP.
16 | 
17 | This solution has several advantages:
18 | 
19 | * the stack space for each thread needs to cover only the thread's own needs, without allowing for
20 | possible large stack usage in ISRs;
21 | * if a thread corrupts its stack, it is still likely that the stacks used by the interrupts and
22 | other threads are intact, thus improving system reliability.
23 | 
24 | ## Stack pointer limit
25 | 
26 | One of the most common failure cases that occurs while developing multi-threaded applications is
27 | for one thread to exceed its stack and damage the surrounding memory content.
28 | 
29 | The solution is to add a system register with a memory address used as lower limit for the stack.
30 | While pushing words on stack, the address is compared and if the limit is reached, an exception is
31 | raised.
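
The check can be illustrated with a small software model. This is a sketch under assumptions: the names `stack_model_t` and `stack_push()` are hypothetical, the stack is assumed to grow downwards, and the model treats a stack pointer that would fall below the limit as the faulting case (the text does not say whether reaching the limit exactly also faults).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a hart with a stack pointer and a stack limit. */
typedef struct
{
  uintptr_t sp;      /* current stack pointer (stack grows downwards) */
  uintptr_t splimit; /* lowest address the stack is allowed to use */
} stack_model_t;

/* Try to reserve `bytes` on the stack. Returns false, without changing
   the stack pointer, where the hardware would raise the exception. */
static bool stack_push(stack_model_t* s, uintptr_t bytes)
{
  if (bytes > s->sp || s->sp - bytes < s->splimit) {
    return false;
  }
  s->sp -= bytes;
  return true;
}
```

An RTOS would typically reload the limit on every context switch, so that each thread is checked against the bottom of its own stack.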
32 | 
33 | ## System clock timer
34 | 
35 | The system clock timer is intended to drive the RTOS scheduler, and to allow measuring durations
36 | (like timeouts) during normal system operations.
37 | 
38 | Having an architecture timer allows the RTOS to implement the scheduler code only once, without
39 | relying on device specific timers, which require separate initialisation and interrupt handlers for
40 | each specific device.
41 | 
42 | ## Real-time clock
43 | 
44 | The real-time clock is intended to provide the application a way of keeping track of time while the
45 | device is in sleep mode (and the system clock timer is shut down).
46 | 
47 | Having an architecture RTC allows the code that manages absolute time to be written only once inside the RTOS,
48 | without relying
49 | on device specific timers, which require separate initialisation and interrupt handlers for
50 | each specific device.
51 | 
52 | ## Context-Switch interrupt
53 | 
54 | The context switch interrupt is usually the lowest priority interrupt, and is used as the single point
55 | of handling context switches, allowing all other interrupt handlers to be written in C/C++ without
56 | dealing with context switches at all.
57 | 
58 | Without such a feature, all application interrupt handlers require an assembly part to handle the
59 | context switching prior to calling the C/C++ handler, which is a major hassle.
60 | 
61 | ## Interrupts priorities threshold
62 | 
63 | Having a mechanism to disable only interrupts below a certain threshold greatly improves the real-time
64 | characteristics of a system, by not having to disable all interrupts while handling the system
65 | data structures. By raising the priority threshold instead of completely disabling interrupts, it
66 | is possible to keep fast interrupts still active, regardless of how busy the RTOS itself is.
67 | 
68 | ## Hart soft reset
69 | 
70 | The hart soft reset is intended to reset the running hart from within.
71 | 
72 | ## Device soft reset
73 | 
74 | The device soft reset is intended to reset the entire device from within.
75 | 
76 | Having an architecture soft reset allows the code that resets the device to be written only once
77 | inside the RTOS, without relying on device specific code.
78 | 
79 | ## User mode
80 | 
81 | For applications with strict security requirements, the user mode can also be used in conjunction with the Memory
82 | Protection Unit (MPU), thus further enhancing the robustness of embedded systems.
83 | 
84 | ## Atomics
85 | 
86 | For multi-hart devices, the RISC-V 'A' Standard Extension for Atomic instructions contains
87 | instructions that atomically
88 | read-modify-write memory to support synchronization between multiple RISC-V harts running in
89 | the same memory space.
90 | 


--------------------------------------------------------------------------------
/hart-control-block.md:
--------------------------------------------------------------------------------
 1 | # The Hart Control Block (HCB)
 2 | 
 3 | For uniform access by software, in addition to CSRs, each hart maps its own status registers to the
 4 | same address in the memory space.
 5 | 
 6 | ## Memory Map
 7 | 
 8 | ### RV64 devices
 9 | 
10 | | Offset | Name | Width | Type | Reset | Description |
11 | |:-------|:-----|:------|:-----|:------|-------------|
12 | | 0x0000 | `excvta` | 64b | rw | Startup | Exceptions vector table address.  |
13 | | 0x0008 | `intvta` | 64b | rw | 0x00000000'00000000 | Interrupts vector table address.  |
14 | | 0x0010 | `intlast` | 64b | ro | | The index of the last interrupt in the HIC table.  |
15 | | 0x0018 | `sysclockcmp` | 64b | rw | 0x00000000'00000000 | System clock comparator. |
16 | | 0x0020 | `rtclockcmp` | 64b | rw | 0x00000000'00000000 | Real-time clock comparator. |
17 | | 0x0028 | | | | | Reserved.  |
18 | | 0x00F0 | `cyclecnt` | 64b | ro | 0x00000000'00000000 | Cycle count. |
19 | | 0x00F8 | `instcnt` | 64b | ro | 0x00000000'00000000 | Instructions count. |
20 | 
21 | ### RV32 devices
22 | 
23 | | Offset | Name | Width | Type | Reset | Description |
24 | |:-------|:-----|:------|:-----|:------|-------------|
25 | | 0x0000 | `excvta` | 32b | rw | Startup | Exceptions vector table address.  |
26 | | 0x0004 | | | | | Reserved.  |
27 | | 0x0008 | `intvta` | 32b | rw | 0x00000000 | Interrupts vector table address.  |
28 | | 0x000C | | | | | Reserved.  |
29 | | 0x0010 | `intlast` | 32b | ro | | The index of the last interrupt in the HIC table.  |
30 | | 0x0014 | | | | | Reserved.  |
31 | | 0x0018 | `sysclockcmpl` | 32b | rw | 0x00000000 | Low word of system clock comparator. |
32 | | 0x001C | `sysclockcmph` | 32b | rw | 0x00000000 | High word of system clock comparator. |
33 | | 0x0020 | `rtclockcmpl` | 32b | rw | 0x00000000 | Low word of real-time clock comparator. |
34 | | 0x0024 | `rtclockcmph` | 32b | rw | 0x00000000 | High word of real-time clock comparator. |
35 | | 0x0028 | | | | | Reserved.  |
36 | | 0x00F0 | `cyclecntl` | 32b | ro | 0x00000000 | Low word of cycle count. |
37 | | 0x00F4 | `cyclecnth` | 32b | ro | 0x00000000 | High word of cycle count. |
38 | | 0x00F8 | `instcntl` | 32b | ro | 0x00000000 | Low word of instructions count. |
39 | | 0x00FC | `instcnth` | 32b | ro | 0x00000000 | High word of instructions count. |
40 | 
41 | ## Exceptions vector table address (`excvta`)
42 | 
43 | An xlen-bit register that holds the address of the exceptions dispatch table.
44 | The table is an array of addresses
45 | (xlen size elements) pointing to exception handlers (C/C++ functions).
46 | 
47 | The register is initialised with the value fetched from the hart startup block.
48 | 
49 | If not set (i.e. 0x0) and an exception occurs, the behaviour is undefined.
50 | 
51 | ## Interrupts vector table address (`intvta`)
52 | 
53 | An xlen-bit register that holds the address of the interrupts dispatch table.
54 | The table is an array of addresses
55 | (xlen size elements) pointing to interrupt handlers (C/C++ functions).
56 | 
57 | If not set (i.e. 0x0) and an interrupt occurs, an exception is
58 | triggered (TODO: what exception?).
59 | 
60 | If the hart does not implement an interrupt controller, writing this register
61 | is ignored and reading always returns zero. This mechanism can also be used
62 | to determine at runtime if the hart implements an interrupt controller.
63 | 
64 | ## The highest interrupt number (`intmax`)
65 | 
66 | The `intmax` read-only register is 32-bit and reads the highest interrupt number; it is
67 | useful when iterating the Hart Interrupt Controller array.
68 | 
69 | ## The system clock comparator
70 | 
71 | See the Device System Clock page.
72 | 
73 | ## The real-time clock comparator
74 | 
75 | See the Device Real-Time Clock page.
76 | 
77 | ## Cycle count
78 | 
79 | The `cyclecnt` register is 64-bit wide and holds a count of the number of clock cycles
80 | executed by the core on which the hart is running (not the hart itself!) from an
81 | arbitrary start time in the past. In practice, the underlying 64-bit counter should never
82 | overflow between two samples. The rate at which the cycle counter advances will depend
83 | on the implementation and operating environment. The execution environment
84 | should provide a means to determine the current rate (cycles/second) at which
85 | the cycle counter is incrementing.
86 | 
87 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions.
88 | RV32 devices expose separate high/low 32-bit registers.
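
On RV32 the two halves cannot be read in one access, so the low word may carry into the high word between the two reads. The text does not say whether the hardware latches the high word, so the following sketch assumes it does not and uses the common high-low-high software technique; the function name `read_counter64()` is hypothetical.

```c
#include <stdint.h>

/* Read a 64-bit counter exposed as two 32-bit words (for example
   `cyclecntl`/`cyclecnth`). The read is retried if the high word
   changed in between, which indicates a carry out of the low word. */
static uint64_t read_counter64(const volatile uint32_t* lo,
                               const volatile uint32_t* hi)
{
  uint32_t h;
  uint32_t l;
  do {
    h = *hi;
    l = *lo;
  } while (h != *hi);
  return ((uint64_t)h << 32) | l;
}
```

The same helper would apply to the other split 64-bit registers, such as `instcntl`/`instcnth`.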
89 | 
90 | ## Instructions count
91 | 
92 | The `instcnt` register is 64-bit wide and counts the number of instructions executed
93 | by this hart from some arbitrary start point in the past.
94 | 
95 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions.
96 | RV32 devices expose separate high/low 32-bit registers.
97 | 
98 | 


--------------------------------------------------------------------------------
/introduction.md:
--------------------------------------------------------------------------------
  1 | # Chapter 1: Introduction
  2 | 
  3 | This is a draft of the **RISC-V microcontroller architecture** description document.
  4 | [Feedback](contributing.md) welcome.
  5 | 
  6 | ## Mission Statement
  7 | 
  8 | Define a **modern C/C++ friendly** microcontroller architecture based on the RISC-V
  9 | instruction set, that makes writing embedded software **easier** and **more productive**.
 10 | And... enjoy the process!
 11 | 
 12 | In technical terms, the mission statement can be rephrased as: define a set of
 13 | specifications for **RISC-V microcontrollers** intended for **embedded** **real-time**
 14 | / **low power** / **IoT** applications that do not require an operating system.
 15 | Favour **C/C++** multi-threaded **RTOS** systems.
 16 | 
 17 | A secondary goal is to improve the RISC-V microcontroller profile to the point where
 18 | it can be adopted by the RISC-V foundation as an alternate standard for microcontroller
 19 | devices.
 20 | 
 21 | ## Limitations
 22 | 
 23 | These specifications intentionally **do not** include application class devices which
 24 | use virtual memory and/or have supervisor/hypervisor modes which are intended to run
 25 | operating systems kernels. For this class of devices, see the "RISC-V Privileged
 26 | Architecture" specifications.
 27 | 
 28 | ## Sub-profiles
 29 | 
 30 | Since there are many microcontroller configurations, 3 classes were identified:
 31 | 
 32 | - **ES** (embedded small) **ES-RV32E** if possible, otherwise **ES-RV32I[M][C]**:
 33 | **low end**, single hart,
 34 | 32-bit, no floating point, no unprivileged mode (intended to support legacy PIC & AVR class
 35 | applications; comparable with Cortex-M0)
 36 | - **EM** (embedded medium) **EM-RV32IM[F[D]]C** / **EM-RV64IM[F[D]]C**:
 37 | **regular**, single hart, 32/64-bit, possibly with floating point
 38 | (intended to support common multi-threaded applications; comparable with
 39 | Cortex-M3/M4)
 40 | - **EL** (embedded large) **EL-RV32IMA[F[D]]C** / **EL-RV64IMA[F[D]]C**:
 41 | **high end**, multi-hart/multi-core, 32/64-bit, atomics, possibly with floating point
 42 | (intended to support hard real-time, high performance applications)
 43 | 
 44 | ## Benefits
 45 | 
 46 | One of the mantras used during the RISC-V design was "if it can be done
 47 | in software, it should not be done in hardware."
 48 | 
 49 | The microcontroller profile reconsidered the implementation of some
 50 | core features (like stack handling), and pushed them back to hardware,
 51 | where they belong.
 52 | 
 53 | 
 54 | 
 55 | Some of the benefits are:
 56 | 
 57 | - best interrupt latency, more appropriate for real-time applications
 58 | - improved robustness for multi-threaded applications
 59 | - much easier to use directly in C/C++
 60 | 
 61 | > <sup>The RISC-V microcontroller profile is created with developers in 
 62 |   mind, to make developers happy. Happy developers write better
 63 |   applications, making final users happy as well.</sup> 
 64 |   
 65 | ## Definitions
 66 | 
 67 | ### Hart
 68 | 
 69 | Hart is a contraction of _hardware thread_ and represents a hardware resource.
 70 | 
 71 | Technically, a hart is a resource abstraction representing an independently
 72 | advancing RISC-V execution context within a RISC-V execution environment.
 73 | 
 74 | A RISC-V execution context contains a full set of RISC-V architectural registers.
 75 | 
 76 | A hart executes its program independently from other harts in a RISC-V system.
 77 | "Execute independently" means that each hart will
 78 | eventually fetch and execute its next instruction in program order regardless
 79 | of the activity of other harts (at least at user level).
 80 | 
 81 | #### RISC-V microcontroller specifics
 82 | 
 83 | Harts are identified by a Hart ID, a small unsigned integer. Hart IDs are unique.
 84 | The rule used to assign hart IDs is implementation specific, but it is recommended
 85 | to keep it simple, preferably within a continuous small range. There should always
 86 | be a hart with ID=0, which will have slightly more duties, for example to process
 87 | the NMIs.
 88 | 
 89 | To help applications auto-configure themselves, the largest hart ID is stored in
 90 | a register in the Device Control Block (`dcb.hartidmax`).
 91 | 
 92 | ### Core
 93 | 
 94 | A RISC-V device can contain one or more RISC-V-compatible processing cores
 95 | together with other non-RISC-V-compatible cores.
 96 | 
 97 | A core is usually considered a purely physical thing.
 98 | 
 99 | A core implements one or more harts, where if there are multiple harts, they are
100 | time-multiplexing some common hardware components (e.g., instruction fetch,
101 | physical registers, ALUs, predictor state, etc.)
102 | 
103 | ### CSRs
104 | 
105 | Control and Status Registers (CSRs) are used for hart-specific state only. CSRs
106 | are not memory mapped - they are accessed by CSR instructions.
107 | 
108 | 


--------------------------------------------------------------------------------
/real-time-clock.md:
--------------------------------------------------------------------------------
  1 | # The Device Real-Time Clock (DRTC)
  2 | 
  3 | ## Overview
  4 | 
  5 | The **Device Real-Time Clock** is intended to support the implementation of the ISO/IEC 14882:2011
  6 | `system_clock` (§ 20.11.7.1) and `steady_clock` (§ 20.11.7.2) classes. Objects of class
  7 | `system_clock` represent wall clock time from the system-wide real-time clock. Objects of
  8 | class `steady_clock` represent clocks for which values of `time_point` never decrease as
  9 | physical time advances and for which values of `time_point` advance at a steady rate
 10 | relative to real time. That is, the clock may not be adjusted.
 11 | 
 12 | All harts in a RISC-V device share the same Device Real-Time Clock counter, but each hart may
 13 | have its own comparator.
 14 | 
 15 | Even when the device is halted in Debug state, the clock counter continues to be incremented.
 16 | 
 17 | The real-time clock is inspired by the `mtime`/`mtimecmp` definitions in the RISC-V privileged specs,
 18 | but it differs by having a control register and not being intended to drive the scheduler clock.
 19 | 
 20 | ## Power domain
 21 | 
 22 | To support full functionality, the real-time clock should run even when the
 23 | rest of the system is powered down, so it must be located in a different frequency/voltage
 24 | domain from the cores.
 25 | 
 26 | ## Clock input
 27 | 
 28 | The real-time clock input frequency is fixed to a device or application specific value.
 29 | 
 30 | To support low-power devices, the real-time clock input should be a low frequency oscillator; the actual
 31 | source is implementation specific.
 32 | 
 33 | > <sup>Common implementations use a 32768 Hz (32.768 kHz) quartz crystal or oscillator.
 34 |   Low frequency internal RC oscillators (for example 40 kHz) can also be used, but the application
 35 |   must calibrate the frequency using a higher accuracy source. With a typical 32 kHz input,
 36 |   the clock resolution
 37 |   is about 30 µs and it takes about 17 million years to overflow.</sup>
 38 | 
 39 | > <sup>The real-time clock is usually not suitable to drive the RTOS tick timer, since either
 40 |   it is not accurate enough, or its frequency does not allow the common 1000 Hz scheduler rate;
 41 |   use the system clock instead.</sup>
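
The resolution and overflow figures quoted in the note above can be checked with a short calculation. The sketch below assumes a 32768 Hz input and 365.25-day years; the macro and function names are illustrative only.

```c
#include <stdint.h>

#define RTC_HZ 32768.0 /* assumed input frequency: a common watch crystal */

/* Duration of one counter tick, in microseconds. */
static double rtc_resolution_us(void)
{
  return 1e6 / RTC_HZ;
}

/* Years until the 64-bit counter overflows, using 365.25-day years. */
static double rtc_overflow_years(void)
{
  const double ticks = 18446744073709551616.0; /* 2^64 */
  const double seconds_per_year = 365.25 * 24.0 * 3600.0;
  return ticks / RTC_HZ / seconds_per_year;
}
```

Under these assumptions one tick lasts roughly 30.5 µs and the counter overflows after roughly 17.8 million years, consistent with the approximate values given in the note.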
 42 | 
 43 | ## Memory map
 44 | 
 45 | RV64 devices
 46 | 
 47 | | Offset | Name | Width | Type | Reset | Description |
 48 | |:-------|:-----|:------|:-----|:------|-------------|
 49 | | 0x0000 | `ctrl` | 32b | rw | 0x00000003 | Control and status register. |
 50 | | 0x0008 | `cnt` | 64b | ro | Undefined | RTC timer counter. |
 51 | | 0x0010 | `cmp` | 64b | rw | Undefined | RTC comparator. |
 52 | 
 53 | RV32 devices
 54 | 
 55 | | Offset | Name | Width | Type | Reset | Description |
 56 | |:-------|:-----|:------|:-----|:------|-------------|
 57 | | 0x0000 | `ctrl` | 32b | rw | 0x00000003 | Control and status register. |
 58 | | 0x0008 | `cntl` | 32b | ro | Undefined | Low word of RTC counter. |
 59 | | 0x000C | `cnth` | 32b | ro | Undefined | High word of RTC counter. |
 60 | | 0x0010 | `cmpl` | 32b | rw | Undefined | Low word of RTC comparator. |
 61 | | 0x0014 | `cmph` | 32b | rw | Undefined | High word of RTC comparator. |
 62 | 
 63 | TODO: define the mechanism to clear the counter. at each enable?
 64 | 
 65 | ## The clock control and status register
 66 | 
 67 | Controls the RTC and provides status data.
 68 | 
 69 | By default, the RTC starts disabled; software must enable it during startup.
 70 | 
 71 | | Bits | Name | Type | Reset | Description |
 72 | |:-----|:-----|:-----|:------|-------------|
 73 | | [0] | `enable` | rw | 0 | Indicates the enabled status of the RTC counter: <br> 0 - Counter is disabled (default). <br> 1 - Counter is enabled. |
 74 | | [2-1] | `source` | rw | 0b11 | Indicates the clock source: <br> 0b00 - Implementation specific external reference clock. <br> 0b01 - Reserved. <br> 0b10 - Factory-trimmed on-chip oscillator. <br> 0b11 - External crystal oscillator (default). |
 75 | | [31-3] |||| Reserved. |
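
The bit layout in the table above can be turned into masks for use in startup code. The sketch below is only an illustration: all macro and function names are hypothetical, and the helper works on a plain value so that it can be checked without hardware.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical names; the values follow the bit layout in the table above.
#define RTCLOCK_CTRL_ENABLE        (UINT32_C(1) << 0)
#define RTCLOCK_CTRL_SOURCE_SHIFT  1
#define RTCLOCK_CTRL_SOURCE_MASK   (UINT32_C(3) << RTCLOCK_CTRL_SOURCE_SHIFT)

#define RTCLOCK_SOURCE_EXTERNAL_REF  UINT32_C(0) // 0b00
#define RTCLOCK_SOURCE_ONCHIP_OSC    UINT32_C(2) // 0b10
#define RTCLOCK_SOURCE_CRYSTAL       UINT32_C(3) // 0b11 (default)

// Return `ctrl` with the source field replaced and the enable bit set;
// the reserved bits are passed through unchanged.
static inline uint32_t
rtclock_ctrl_select_and_enable(uint32_t ctrl, uint32_t source)
{
  assert(source <= 3);
  ctrl &= ~RTCLOCK_CTRL_SOURCE_MASK;
  ctrl |= source << RTCLOCK_CTRL_SOURCE_SHIFT;
  return ctrl | RTCLOCK_CTRL_ENABLE;
}
```

On a device, startup code might then use something like `rtclock.ctrl = rtclock_ctrl_select_and_enable(rtclock.ctrl, RTCLOCK_SOURCE_CRYSTAL);`.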
 76 | 
 77 | 
 78 | ## The clock counter register
 79 | 
 80 | The real-time clock time point register is a 64-bit counter, common on all RV32 and RV64 devices.
 81 | 
 82 | To guarantee the steadiness characteristic of the clock, the register is read-only.
 83 | 
 84 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions.
 85 | RV32 devices expose separate high/low 32-bit registers.
 86 | 
 87 | ## The clock comparator register
 88 | 
 89 | In addition to keeping track of time, the real-time clock can also be used to
 90 | trigger periodic interrupts. Low-power devices
 91 | can use the real-time clock to wake up the entire RISC-V device from implementation
 92 | specific sleep modes.
 93 | 
 94 | The comparator register causes a `rtclock_cmp` interrupt to be posted when the
 95 | counter register
 96 | contains a value greater than or equal to the value in the comparator register.
 97 | The interrupt remains posted until it is cleared by writing to the comparator register.
 98 | 
 99 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions.
100 | RV32 devices expose separate high/low 32-bit registers.
101 | 
102 | ## Usage
103 | 
104 | The memory mapped registers are available via a set of structures, directly available in C/C++.
105 | 
106 | RV64 devices:
107 | 
108 | - `rtclock.ctrl`
109 | - `rtclock.cnt`
110 | - `hcb.rtclockcmp`
111 | 
112 | RV32 devices:
113 | 
114 | - `rtclock.ctrl`
115 | - `rtclock.cntl`
116 | - `rtclock.cnth`
117 | - `hcb.rtclockcmpl`
118 | - `hcb.rtclockcmph`
119 | 
120 | ```c
121 | uint64_t
122 | riscv_rtclock_read_cnt(void)
123 | {
124 | #if __riscv_xlen == 32
125 |   // Atomic read. The loop is taken once in most cases. Only when the
126 |   // value carries to the high word, two loops are performed.
127 |   while (true)
128 |     {
129 |       uint32_t hi = rtclock.cnth;
130 |       uint32_t lo = rtclock.cntl;
131 |       if (hi == rtclock.cnth)
132 |         {
133 |           return ((uint64_t) hi << 32) | lo;
134 |         }
135 |     }
136 | #else
137 |   return rtclock.cnt;
138 | #endif
139 | }
140 | 
141 | uint64_t
142 | riscv_rtclock_read_cmp(void)
143 | {
144 | #if __riscv_xlen == 32
145 |   return ((uint64_t) hcb.rtclockcmph << 32) | hcb.rtclockcmpl;
146 | #else
147 |   return hcb.rtclockcmp;
148 | #endif
149 | }
150 | 
151 | void
152 | riscv_rtclock_write_cmp(uint64_t value)
153 | {
154 | #if __riscv_xlen == 32
155 |   // Write low as max; no smaller than old value.
156 |   hcb.rtclockcmpl = UINT32_MAX;
157 |   // Write high; no smaller than old value.
158 |   hcb.rtclockcmph = ((uint32_t) (value >> 32));
159 |   // Write low as new value.
160 |   hcb.rtclockcmpl = ((uint32_t) value);
161 | #else
162 |   hcb.rtclockcmp = value;
163 | #endif
164 | }
165 | ```
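
On RV32 the comparator is written in three steps so that the 64-bit value seen by the hardware never drops below the smaller of the old and new values while the update is in progress. The sketch below traces that sequence; it models the two comparator words as plain variables, an assumption made only so the sequence can be followed on a host machine.

```c
#include <assert.h>
#include <stdint.h>

// Host-side model of the two RV32 comparator words (illustrative only).
typedef struct
{
  uint32_t cmpl;
  uint32_t cmph;
} cmp_model_t;

static uint64_t
cmp_model_value(const cmp_model_t* m)
{
  return ((uint64_t) m->cmph << 32) | m->cmpl;
}

// Apply the three-step write and record the comparator value that is
// visible after each step in trace[0..2].
static void
cmp_model_write(cmp_model_t* m, uint64_t value, uint64_t trace[3])
{
  assert(m && trace);
  m->cmpl = UINT32_MAX;               // Low word as max.
  trace[0] = cmp_model_value(m);
  m->cmph = (uint32_t) (value >> 32); // New high word.
  trace[1] = cmp_model_value(m);
  m->cmpl = (uint32_t) value;         // New low word.
  trace[2] = cmp_model_value(m);
}
```

For example, moving the comparator from `0x00000001'FFFFFFF0` to `0x00000002'00000010` passes through `0x00000001'FFFFFFFF` and `0x00000002'FFFFFFFF` before reaching the final value.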
166 | 


--------------------------------------------------------------------------------
/interrupt-controller.md:
--------------------------------------------------------------------------------
  1 | # The Hart Interrupt Controller (HIC)
  2 | 
  3 | The RISC-V microcontroller profile provides a nested vectored interrupt controller as part of
  4 | the common specifications.
  5 | 
  6 | Each hart may be able to process its own set of interrupts, independent from the other harts.
  7 | Only hart 0 is required to implement a HIC; additional interrupt controllers in all other
  8 | harts are optional and implementation specific.
  9 | 
 10 | > <sup>Hard real-time devices may dedicate separate harts to process fast interrupts.
 11 |   It is possible to wire all interrupts to all harts, and decide in software which interrupts
 12 |   are processed by each hart.</sup>
 13 | 
 14 | ## Features
 15 | 
 16 | The HIC supports the following features:
 17 | 
 18 | - HIC interrupts can be enabled and disabled by writing to their corresponding `status.enabled` or
 19 | `status.clearenabled` bit fields, using a write-1-to-enable and write-1-to-clear policy.
 20 | 
 21 |   When an interrupt is disabled, interrupt assertion causes the interrupt to become pending, but the interrupt
 22 | cannot become active. If an interrupt is active when it is disabled, it remains in the active state until
 23 | this is cleared by a reset or an exception return. Clearing the enable bit prevents any new activation of
 24 | the associated interrupt.
 25 | 
 26 |   An implementation can hard-wire interrupt enable bits to zero if the associated interrupt line does not
 27 | exist, or hard-wire them to one if the associated interrupt line cannot be disabled.
 28 | 
 29 | - the pending state of HIC interrupts can be set or removed by software using
 30 | the `status.pending` and `status.clearpending` bit fields. The registers use a write-1-to-set and
 31 | write-1-to-clear policy. Writing 1 to a bit in the `status.clearpending` bit field has no effect on the
 32 | execution status of an active interrupt.
 33 | 
 34 |   It is implementation specific for each interrupt line supported, whether an interrupt supports either or both
 35 | setting and clearing of the associated pending state under software control.
 36 | 
 37 | - status bits are provided to allow software to determine whether an interrupt is active, pending, or enabled.
 38 | - HIC interrupts are prioritized by updating a priority field. Priorities are maintained according to the RISC-V
 39 | prioritization scheme.
 40 | - HIC supports a maximum of 1024 interrupts.
 41 | 
 42 | ## Memory map
 43 | 
 44 | | Offset | Name | Width | Type | Reset | Description |
 45 | |:-------|:-----|:------|:-----|:------|-------------|
 46 | | 0x0000 | `interrupts[]` | 32b * 2 * N | rw | 0x00000000 | Array of interrupt control registers. |
 47 | 
 48 | The number of interrupts (N) is implementation specific, but no higher than 1024, including the system interrupts.
 49 | 
 50 | Total size: 0x2000.
 51 | 
 52 | ## Per interrupt registers
 53 | 
 54 | Each interrupt has a small per-hart set of status and configuration attributes:
 55 | 
 56 | * `enabled`: interrupts can either be disabled (default) or enabled
 57 | * `pending`: interrupts can either be pending (a request is waiting to be served) or not
 58 | pending
 59 | * `active`: interrupts can either be in an active (being served) or inactive state
 60 | * `prio`: interrupt priority
 61 | 
 62 | To store and control these attributes, each interrupt has two 32-bit registers:
 63 | 
 64 | | Offset | Name | Width | Type | Reset | Description |
 65 | |:-------|:-----|:------|:-----|:------|-------------|
 66 | | 0x0000 | `prio` | 32b | rw | 0x00000000 | The interrupt priority register. |
 67 | | 0x0004 | `status` | 32b | rw | 0x00000000 | The interrupt status and control register. |
 68 | 
 69 | The `prio` register has the following content:
 70 | 
 71 | | Bits | Name | Type | Reset | Description |
 72 | |:-----|:-----|:-----|:------|-------------|
 73 | | [N:0] | `prio` | rw | 0 | The interrupt priority. |
 74 | | [(xlen-1):(N+1)] | | | | Reserved. |
 75 | 
 76 | N is the number of bits required to store the maximum priority level, and is implementation
 77 | specific. It must match the number of bits used by the `iprioth` CSR.
 78 | 
 79 | The `status` register has the following content:
 80 | 
 81 | | Bits | Name | Type | Reset | Description |
 82 | |:-----|:-----|:-----|:------|-------------|
 83 | | [0] | `enabled` | rw1s | 0 | Enabled status bit; 1 if the interrupt is enabled.<br>When 1 is written, the `enabled` bit is set. |
 84 | | [1] | `pending` | rw1s | 0 | Pending status bit; 1 if the interrupt is pending.<br>When 1 is written, the `pending` bit is set. |
 85 | | [2] | `active` | r | 0 | Active status bit; 1 if the interrupt is active. |
 86 | | [3] |||| Reserved |
 87 | | [4] | `clearenabled` | w1c | | When 1 is written, the `enabled` status bit is cleared. |
 88 | | [5] | `clearpending` | w1c | | When 1 is written, the `pending` status bit is cleared. |
 89 | | [31:6] |||| Reserved |
 90 | 
 91 | > <sup>The alternative to packing all status and control bits related to an interrupt
 92 |   in two words would be to have separate multi-word fields with status, enable, disable,
 93 |   set pending, clear pending, active bits. It was considered that the packed solution
 94 |   is easier to use in software.</sup>
 95 | 
 96 | > <sup>[JB] Why use separate bits for enabling and disabling interrupts? Why not
 97 | use the same write-1-to-enable and write-0-to-clear? ... it is not the way
 98 | assignment works anywhere else in C. [ilg] C programmers are very much used
 99 |   to different semantics when accessing peripheral registers, actually most
100 |   real peripherals in modern devices use write-1-to-set/clear bits, so this
101 |   does not surprise anybody. As for keeping the language semantics, in C++ all
102 |   operators can be redefined, so they can implement any semantics that is
103 |   required. </sup>
104 | 
105 | ## Usage
106 | 
107 | Individual interrupts are enabled by setting the `status.enabled` bit and are disabled by writing 1 in the `status.clearenabled` bit. To be effective, interrupts must also have non-zero priorities.
108 | 
109 | ```c
110 | hic.interrupts[7].prio = 7;
111 | hic.interrupts[7].status = INTERRUPTS_SET_ENABLED;
112 | 
113 | hic.interrupts[7].status = INTERRUPTS_CLEAR_ENABLED;
114 | ```
115 | 
116 | Interrupts can be programmatically set to be pending by writing 1 in the `status.pending` field; the pending status can be cleared by writing 1 to the `status.clearpending` bit.
117 | 
118 | ```c
119 | hic.interrupts[7].status = INTERRUPTS_SET_PENDING;
120 | hic.interrupts[7].status = INTERRUPTS_CLEAR_PENDING;
121 | ```
122 | 
123 | To check the status bits:
124 | 
125 | ```c
126 | if (hic.interrupts[7].status & INTERRUPTS_STATUS_PENDING) {
127 |   // ...
128 | }
129 | ```
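
The usage snippets above rely on `INTERRUPTS_*` constants and a register layout that this section does not spell out. One possible set of definitions, derived from the `status` bit table, is sketched below. The mask values, the `INTERRUPTS_STATUS_ENABLED` and `INTERRUPTS_STATUS_ACTIVE` names, and the `hic_interrupt_t` type are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed write masks, derived from the `status` bit table.
#define INTERRUPTS_SET_ENABLED     (UINT32_C(1) << 0) // bit [0], rw1s
#define INTERRUPTS_SET_PENDING     (UINT32_C(1) << 1) // bit [1], rw1s
#define INTERRUPTS_CLEAR_ENABLED   (UINT32_C(1) << 4) // bit [4], w1c
#define INTERRUPTS_CLEAR_PENDING   (UINT32_C(1) << 5) // bit [5], w1c

// Assumed read masks for the status bits.
#define INTERRUPTS_STATUS_ENABLED  (UINT32_C(1) << 0)
#define INTERRUPTS_STATUS_PENDING  (UINT32_C(1) << 1)
#define INTERRUPTS_STATUS_ACTIVE   (UINT32_C(1) << 2)

// One entry of the `interrupts[]` array: two 32-bit registers.
typedef struct
{
  volatile uint32_t prio;   // offset 0x0000
  volatile uint32_t status; // offset 0x0004
} hic_interrupt_t;

static_assert(sizeof(hic_interrupt_t) == 8, "two 32-bit registers");
static_assert(offsetof(hic_interrupt_t, status) == 4, "status at 0x0004");
```

Since the set and clear bits only act when 1 is written, a plain assignment of a single mask is enough; no read-modify-write sequence is needed.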
130 | 
131 | ## Alternate proposal
132 | 
133 | [JB] Each hart has a fixed set of interrupt vectors. For each interrupt
134 | source, a register exists that defines which vector receives interrupts
135 | from that source. Effectively, the hart has some number of IRQ lines
136 | and interrupt sources are assigned manually to IRQ lines. The IRQ lines
137 | have a fixed priority, based on the interrupt number.
138 | 
139 | If multiple
140 | peripherals are assigned to the same vector, then the ISR for that
141 | vector must poll each of the peripherals assigned to that vector to
142 | determine the cause of the interrupt.
143 | 
144 | This also limits interrupt nesting, since only a higher-priority
145 | interrupt (or an exception) can interrupt an ISR, there can be at most
146 | 2\*PRIORITY_LEVELS nested interrupts, if every ISR is interrupted in an
147 | exception handler and exception handlers do not themselves raise
148 | exceptions.
149 | 
150 | > <sup>[ilg] The only advantage to be noted is that it limits nesting. The disadvantages are: increased software complexity, increased latency, more complicated to maintain (changing the priority in the first case requires only a write to the priority register, while in the second case it is also necessary to move the test and the call from one intermediate handler to the other), possible out-of-sync cases, when the test is not in the right handler.</sup>
151 | 


--------------------------------------------------------------------------------
/system-clock.md:
--------------------------------------------------------------------------------
  1 | # The Device System Clock (`sysclock`)
  2 | 
  3 | ## Overview
  4 | 
  5 | The **system clock** is intended to support the implementation of the ISO/IEC 14882:2011
  6 | `high_resolution_clock` (§ 20.11.7.3). Objects of class `high_resolution_clock` represent clocks
  7 | with the shortest tick period.
  8 | 
  9 | The system clock is also intended as:
 10 | 
 11 | - the RTOS tick timer that fires periodically at a programmable rate, for example 1000 Hz, to
 12 | measure time and to drive pre-emptive context switches
 13 | - a variable rate alarm or signal timer to handle timeouts and alarms
 14 | 
 15 | All harts in a RISC-V device share the same system clock counter, but each hart may have its
 16 | own comparator.
 17 | 
 18 | When the device is halted in Debug state, the clock counter is not incremented.
 19 | 
 20 | The system clock is inspired by the `mtime`/`mtimecmp` definitions in the RISC-V
 21 | privileged specs, but it differs by counting a higher frequency input, running only when
 22 | the device is powered and not counting during debug.
 23 | 
 24 | ## Power domain
 25 | 
 26 | The system clock is required to run only when the device is powered up, so it can be
 27 | located in the same frequency/voltage domain as the cores.
 28 | 
 29 | ## Clock input
 30 | 
 31 | The system clock source is a reference clock. Software can select whether the reference
 32 | clock is the core clock, the device high frequency reference clock or an implementation
 33 | specific external clock source. If an implementation uses an external clock, it must
 34 | document the relationship between the processor clock and the external reference.
 35 | 
 36 | > <sup>By default, the system clock uses the same source as the core clock, which is
 37 |   a common configuration.
 38 |   For example, with a 100 MHz core clock, the system clock resolution
 39 |   is 10 ns and it takes about 5800 years to overflow.</sup>
 40 | 
 41 | > <sup>A common RTOS tick frequency is 1000 Hz; in order to accurately achieve this,
 42 |   an input frequency multiple of the tick frequency is required.</sup>
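
The requirement in the note above can be checked with simple integer arithmetic. The helpers below are a sketch with hypothetical names; they compute the comparator increment for a given tick rate and report whether that rate can be produced without drift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: number of input clock cycles per scheduler tick.
static uint32_t
sysclock_cycles_per_tick(uint32_t input_freq_hz, uint32_t tick_freq_hz)
{
  assert(tick_freq_hz != 0 && input_freq_hz >= tick_freq_hz);
  return input_freq_hz / tick_freq_hz;
}

// True if the tick rate divides the input frequency exactly.
static bool
sysclock_tick_is_exact(uint32_t input_freq_hz, uint32_t tick_freq_hz)
{
  assert(tick_freq_hz != 0);
  return (input_freq_hz % tick_freq_hz) == 0;
}
```

A 100 MHz input gives exactly 100000 cycles per 1000 Hz tick. A 32768 Hz input gives 32 cycles with a remainder, so the effective tick rate would be 1024 Hz rather than 1000 Hz.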
 43 | 
 44 | > <sup>Low-power devices might need to vary the core frequency by changing implementation
 45 |   specific clock registers (like PLL registers). In this case the system clock software
 46 |   must be notified to use the same input frequency. Alternately, the system clock may
 47 |   be configured to use the high frequency clock reference (like the quartz
 48 |   oscillator), assumed to have a fixed frequency. </sup>
 49 | 
 50 | ## Memory map
 51 | 
 52 | ### RV64 devices
 53 | 
 54 | | Offset | Name | Width | Type | Reset | Description |
 55 | |:-------|:-----|:------|:-----|:------|-------------|
 56 | | 0x0000 | `ctrl` | 32b | rw | 0x00000003 | Control and status register. |
 57 | | 0x0008 | `cnt` | 64b | ro | 0x00000000'00000000 | System clock timer counter. |
 58 | 
 59 | Part of the Hart Control Block
 60 | 
 61 | | Offset | Name | Width | Type | Reset | Description |
 62 | |:-------|:-----|:------|:-----|:------|-------------|
 63 | | 0x0000 | `cmp` | 64b | rw | Undefined | System clock timer comparator. |
 64 | 
 65 | ### RV32 devices
 66 | 
 67 | | Offset | Name | Width | Type | Reset | Description |
 68 | |:-------|:-----|:------|:-----|:------|-------------|
 69 | | 0x0000 | `ctrl` | 32b | rw | 0x00000003 | Control and status register. |
 70 | | 0x0008 | `cntl` | 32b | ro | 0x00000000 | Low word of system clock timer counter. |
 71 | | 0x000C | `cnth` | 32b | ro | 0x00000000 | High word of system clock timer counter. |
 72 | 
 73 | Part of the Hart Control Block
 74 | 
 75 | | Offset | Name | Width | Type | Reset | Description |
 76 | |:-------|:-----|:------|:-----|:------|-------------|
 77 | | 0x0000 | `cmpl` | 32b | rw | Undefined | Low word of system clock timer comparator. |
 78 | | 0x0004 | `cmph` | 32b | rw | Undefined | High word of system clock timer comparator. |
 79 | 
 80 | ## The clock control and status register
 81 | 
 82 | Controls the system clock timer and provides status data.
 83 | 
 84 | By default, the system clock starts disabled; software must enable it during startup.
 85 | 
 86 | | Bits | Name | Type | Reset | Description |
 87 | |:-----|:-----|:-----|:------|-------------|
 88 | | [0] | `enable` | rw | 0b0 | Indicates the enabled status of the system clock counter: <br> 0 - Counter is disabled (default). <br> 1 - Counter is enabled. |
 89 | | [2-1] | `source` | rw | 0b11 | Indicates the clock source: <br> 0b00 - Implementation specific external reference clock. <br> 0b01 - Reserved. <br> 0b10 - High frequency clock reference. <br> 0b11 - Core clock (default). |
 90 | | [31-3] |||| Reserved. |
 91 | 
 92 | ## The clock counter register
 93 | 
 94 | The system clock time point register is a 64-bit counter, common on all RV32 and RV64 devices.
 95 | 
 96 | To guarantee the steadiness characteristic of the clock, the register is read-only. At reset, the register is cleared to 0.
 97 | 
 98 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions.
 99 | RV32 devices expose separate high/low 32-bit registers.
100 | 
101 | ## The clock comparator register
102 | 
103 | In addition to keeping track of time, the system clock can also be used to trigger
104 | interrupts at specific time points, either for periodic events (like driving
105 | pre-emption in an RTOS scheduler) or to trigger timeout events.
106 | 
107 | The comparator register causes a `sysclock_cmp` interrupt to be posted when the
108 | counter register
109 | contains a value greater than or equal to the value in the comparator register.
110 | The interrupt remains posted until it is cleared by writing to the comparator register.
111 | 
112 | The clock comparator register is specific to each hart and is part of the Hart Control Block.
113 | 
114 | Only hart 0 is required to have a comparator. If any other harts also have comparators,
115 | the `sysclock_cmp` interrupt is posted only to the local hart. For harts that do not have
116 | a comparator, this register always reads as 0 and writes are ignored.
117 | 
118 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions.
119 | RV32 devices expose separate high/low 32-bit registers.
120 | 
121 | ## Usage
122 | 
123 | The memory mapped registers are available via a set of structures, directly available in C/C++.
124 | 
125 | RV64 devices:
126 | 
127 | - `sysclock.ctrl`
128 | - `sysclock.cnt`
129 | - `hcb.sysclockcmp`
130 | 
131 | RV32 devices:
132 | 
133 | - `sysclock.ctrl`
134 | - `sysclock.cntl`
135 | - `sysclock.cnth`
136 | - `hcb.sysclockcmpl`
137 | - `hcb.sysclockcmph`
138 | 
139 | ```c
140 | uint64_t
141 | riscv_sysclock_read_cnt(void)
142 | {
143 | #if __riscv_xlen == 32
144 |   // Atomic read. The loop is taken once in most cases. Only when the
145 |   // value carries to the high word, two loops are performed.
146 |   while (true)
147 |     {
148 |       uint32_t hi = sysclock.cnth;
149 |       uint32_t lo = sysclock.cntl;
150 |       if (hi == sysclock.cnth)
151 |         {
152 |           return ((uint64_t) hi << 32) | lo;
153 |         }
154 |     }
155 | #else
156 |   return sysclock.cnt;
157 | #endif
158 | }
159 | 
160 | uint64_t
161 | riscv_sysclock_read_cmp(void)
162 | {
163 | #if __riscv_xlen == 32
164 |   return ((uint64_t) hcb.sysclockcmph << 32) | hcb.sysclockcmpl;
165 | #else
166 |   return hcb.sysclockcmp;
167 | #endif
168 | }
169 | 
170 | void
171 | riscv_sysclock_write_cmp(uint64_t value)
172 | {
173 | #if __riscv_xlen == 32
174 |   // Write low as max; no smaller than old value.
175 |   hcb.sysclockcmpl = UINT32_MAX;
176 |   // Write high; no smaller than old value.
177 |   hcb.sysclockcmph = ((uint32_t) (value >> 32));
178 |   // Write low as new value.
179 |   hcb.sysclockcmpl = ((uint32_t) value);
180 | #else
181 |   hcb.sysclockcmp = value;
182 | #endif
183 | }
184 | ```
185 | 
186 | A typical periodic tick counter:
187 | 
188 | ```c
189 | 
190 | uint64_t sysclock_cmp;
191 | uint32_t sysclock_increment;
192 | 
193 | void
194 | sysclock_init(void)
195 | {
196 |   // ...
197 |   sysclock_increment = INPUT_FREQ_HZ/SYSCLOCK_FREQ_HZ;
198 |   sysclock_cmp = riscv_sysclock_read_cnt() + sysclock_increment;
199 | 
200 |   // Ask for an interrupt after one tick interval.
201 |   // Since the comparator is not initialised at reset, it
202 |   // must be written before enabling interrupts.
203 |   riscv_sysclock_write_cmp(sysclock_cmp);
204 | 
205 |   // Assign a priority.
206 |   hic.interrupts[SYSCLOCK_CMP_INT_NUM].prio = SYSCLOCK_CMP_PRIO;
207 |   // Enable.
208 |   hic.interrupts[SYSCLOCK_CMP_INT_NUM].status = INTERRUPTS_SET_ENABLED;
209 | }
210 | 
211 | void
212 | interrupt_handle_sysclock_cmp(void)
213 | {
214 |   // Increment the clock tick counter and run tick actions.
215 |   sysclock_tick_increment();
216 | 
217 |   // Compute the next time point when the interrupt should come.
218 |   sysclock_cmp += sysclock_increment;
219 |   riscv_sysclock_write_cmp(sysclock_cmp);
220 | }
221 | 
222 | ```
223 | 


--------------------------------------------------------------------------------
/eabi.md:
--------------------------------------------------------------------------------
  1 | # Embedded ABI
  2 | 
  3 | The current RISC-V privileged ABI requires the caller to save the following registers:
  4 | `ra`, `t0`, `t1`, `t2`, `a0`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `t3`, `t4`,
  5 | `t5`, `t6`. This amounts
  6 | to 16 registers. If floating point is used, 20 more registers must be saved.
  7 | 
  8 | In order to be able to call a C/C++ function from the interrupt handler, all
  9 | these registers must be saved when entering interrupts, which impacts the
 10 | interrupt latency.
 11 | 
 12 | ## Proposal
 13 | 
 14 | The main goal of the RISC-V Embedded ABI is to balance a high performance for background code with a reduced interrupt latency.
 15 | 
 16 | As a secondary goal, if possible, it should remain consistent when applied to the reduced register set used by the RV32E devices.
 17 | 
 18 | ### RV32E EABI calling convention
 19 | 
 20 | For interrupt latency reasons, there should be no more than 7-8 caller 
 21 | saved registers. The table assumes the minimum of 7. If 8 are 
 22 | accepted, `x14` should be renamed as `a4`.
 23 | 
 24 | | Register | ABI Name | Description | Caller | Callee |
 25 | |:---------|:---------|:------------|--------|-------|
 26 | | `x0` | `zero` | Hard-wired zero |  |  |
 27 | | `x1` | `ra` | Return address | * |  |
 28 | | `x2` | `sp` | Stack pointer |  | * |
 29 | | `x3` | `gp` | Global pointer |  |  |
 30 | | `x4` | `tp` | Thread pointer |  |  |
 31 | | `x5` | `t1/al` | Temporary/alternate link register | * | |
 32 | | `x6` | `s3` | Saved register |  | * |
 33 | | `x7` | `s4(/sl)` | Saved register(/stack limit?) |  | * |
 34 | |||||
 35 | | `x8` | `s0/fp` | Saved register/frame pointer |  | * |
 36 | | `x9` | `s1` | Saved register |  | * |
 37 | | `x10,x11` | `a0,a1` | Function arguments/return values | ** |  |
 38 | | `x12` | `a2` | Function arguments | * |  |
 39 | | `x13` | `a3` | Function arguments | * |  |
 40 | | `x14` | `s2` (`a4`?) | Saved register | (\*?) | * |
 41 | | `x15` | `t0` | Temporary | * | |
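
To make the latency argument concrete, the caller saved sets can be written as bit masks and counted. The sketch below transcribes the masks from the register tables in this document (the RISC-V POSIX ABI and the proposed RV32E EABI with 7 caller saved registers); the names are hypothetical, and the frame size ignores any additional state an interrupt entry might have to save.

```c
#include <assert.h>
#include <stdint.h>

// Bit n set means register xn is caller saved (transcribed from the tables).
#define POSIX_ABI_CALLER_SAVED   UINT32_C(0xF003FCE2) // x1, x5-x7, x10-x17, x28-x31
#define EABI_RV32E_CALLER_SAVED  UINT32_C(0x0000BC22) // x1, x5, x10-x13, x15

// Count the registers selected by the mask.
static unsigned
count_registers(uint32_t mask)
{
  unsigned count = 0;
  for (; mask != 0; mask &= mask - 1) // Clear the lowest set bit.
    {
      ++count;
    }
  return count;
}

// Bytes needed to save the selected registers on interrupt entry.
static unsigned
context_frame_bytes(uint32_t mask, unsigned xlen)
{
  assert(xlen == 32 || xlen == 64);
  return count_registers(mask) * (xlen / 8);
}
```

On a 32-bit device this gives 16 registers (64 bytes) for the POSIX ABI against 7 registers (28 bytes) for the proposed EABI.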
 42 | 
 43 | TODO: check how this allocation matches the needs of C++ virtual function dispatch.
 44 | 
 45 | > <sup>[AW] For RVC compressibility, the most popular registers should 
 46 |   be `x8-x15`. So I suggest renumbering `x8/x9` to be `s0/s1` (as is the 
 47 |   case in the POSIX ABI).</sup>
 48 |   
 49 | > <sup>[AW] Per the ISA spec, `x5` serves as an alternate link register, 
 50 |   to drive hardware management of return-address stacks. It’s used by 
 51 |   things like the `-msave-restore` option, which reduces code size by 
 52 |   using millicode routines to implement prologues/epilogues. For this 
 53 |   to work, `x5` needs to be one of the t-registers, as is the case in 
 54 |   the POSIX ABI. I suggest either adding one more t-register at `x5`, 
 55 |   or moving an existing t-register to `x5`. (The former option is better 
 56 |   for code size and performance; the latter option is better for 
 57 |   interrupt latency.)</sup>
 58 |   
 59 | > <sup>[AW] For all the use cases I’ve encountered, two t-registers 
 60 |   is sufficient for linkage purposes.</sup>
 61 |   
 62 | > <sup>[BH] `jal[r]` and `jr` with `x5` are being baked into hardware as 
 63 |   function call/return, just as with `x1`, complete with a special 
 64 |   return address stack to accelerate the indirect jump for function 
 65 |   return. That's *especially* important with millicode register 
 66 |   save/restore which will primarily be used on microcontrollers. 
 67 |   So `x5` must be a t register. No choice. But maybe don't call it 
 68 |   `t0`, if at least one t register is in `x8-x15`.</sup>
 69 |   
 70 | > <sup>[BH] the lowest numbered registers of each class (s, a, t) should 
 71 |   fall somewhere inside the C-favoured registers `x8-x15` (if any  
 72 |   registers of that class fall in this range).</sup>
 73 |   
 74 | > <sup>Having the stack limit exposed as a general register
 75 |   would save an extra push/pop during RTOS context switches.</sup>
 76 | 
 77 | > <sup>[BH] I don't like the stack limit being in a register.
 78 |   Much better in a CSR. Harder to corrupt by accident.</sup>
 79 |  
 80 | > <sup>[jnk0le] Stack limit is not about to be frequently accessed by thread 
 81 |   code nor it is available from raw C/C++. Reserving another general purpose 
 82 |   register increases register pressure especially in RV32E which currently 
 83 |   has fewer available registers than armv7[M]. Stack limit can be corrupted 
 84 |   by code. Mapping another shadow register into GPRs and protecting it from 
 85 |   corruption by thread code, will increase hardware complexity. Saving 2 
 86 |   cycles on push/pop in context switch is a sign of premature optimization 
 87 |   of whole ABI for specific use case. Assuming 50MHz clockrate and 1000Hz 
 88 |   scheduler tickrate, those 2 cycles saved per context switch accounts for 
 89 |   0.004% of total cycles saved. Of course, only if rest of the code is 
 90 |   actually not starving from missing register.</sup>
 91 |  
 92 | [ilg] I agree that the stack limit register may be better available only 
 93 | as a CSR.
 94 | 
 95 | More details on the register allocation in the 
 96 | [SW Dev list](https://groups.google.com/a/groups.riscv.org/d/msg/sw-dev/Lp6ucrijap0/ZwVO5Ts-CQAJ).
 97 | 
 98 | ### RV32I/RV64I EABI calling convention
 99 | 
100 | | Register | ABI Name | Description | Caller | Callee |
101 | |:---------|:---------|:------------|--------|-------|
102 | | `x0` | `zero` | Hard-wired zero |  |  |
103 | | `x1` | `ra` | Return address | * |  |
104 | | `x2` | `sp` | Stack pointer |  | * |
105 | | `x3` | `gp` | Global pointer |  |  |
106 | | `x4` | `tp` | Thread pointer |  |  |
107 | | `x5` | `t1/al` | Temporary/alternate link register | * | |
108 | | `x6` | `s3` | Saved register |  | * |
109 | | `x7` | `s4(/sl)` | Saved register(/stack limit?) |  | * |
110 | |||||
111 | | `x8` | `s0/fp` | Saved register/frame pointer |  | * |
112 | | `x9` | `s1` | Saved register |  | * |
113 | | `x10,x11` | `a0,a1` | Function arguments/return values | ** |  |
114 | | `x12` | `a2` | Function arguments | * |  |
115 | | `x13` | `a3` | Function arguments | * |  |
116 | | `x14` | `s2` | Saved register |  | * |
117 | | `x15` | `t0` | Temporary | * | |
118 | |||||
119 | | `x16–x31` | `s5-s20` | Saved registers |  | * |
120 | |||||
121 | | `f0–f1` | `fa0-fa1` | FP arguments/return values | * |  |
122 | | `f2–f7` | `fa2-fa7` | FP arguments | * |  |
123 | | `f8–f15` | `ft0-ft7` | FP temporaries | * |  |
124 | | `f16–f31` | `fs0-fs15` | FP saved registers |  | * |
125 | 
126 | > <sup>To simplify the context push/pop code,
127 |   the floating point registers were reordered, to group
128 |   all the caller saved registers in one half of the set and the callee
129 |   saved registers in the other half.</sup>
130 | 
131 | ### Sizes of variables
132 | 
133 | - `long double` - 64 bits.
134 | 
135 | TODO: add all other
136 | 
137 | ## References
138 | 
139 | ## RISC-V POSIX ABI
140 | 
141 | Currently defined in Chapter 20, RISC-V Assembly Programmer’s Handbook, of "The RISC-V Instruction Set Manual Volume I: User-Level ISA, Document Version 2.2".
142 | 
143 | | Register | ABI Name | Description | Caller | Callee |
144 | |:---------|:---------|:------------|--------|-------|
145 | | `x0` | `zero` | Hard-wired zero |  |  |
146 | | `x1` | `ra` | Return address | * |  |
147 | | `x2` | `sp` | Stack pointer |  | * |
148 | | `x3` | `gp` | Global pointer |  |  |
149 | | `x4` | `tp` | Thread pointer |  |  |
150 | | `x5` | `t0` | Temporary/alternate link register | * |  |
151 | | `x6–x7` | `t1-t2` | Temporaries | * |  |
152 | | `x8` | `s0/fp` | Saved register/frame pointer |  | * |
153 | | `x9` | `s1` | Saved register |  | * |
154 | | `x10–x11` | `a0-a1` | Function arguments/return values | * |  |
155 | | `x12–x17` | `a2-a7` | Function arguments | * |  |
156 | | `x18–x27` | `s2-s11` | Saved registers |  | * |
157 | | `x28–x31` | `t3-t6` | Temporaries | * |  |
158 | |||||
159 | | `f0–f7` | `ft0-ft7` | FP temporaries | * |  |
160 | | `f8–f9` | `fs0-fs1` | FP saved registers |  | * |
161 | | `f10–f11` | `fa0-fa1` | FP arguments/return values | * |  |
162 | | `f12–f17` | `fa2-fa7` | FP arguments | * |  |
163 | | `f18–f27` | `fs2-fs11` | FP saved registers |  | * |
164 | | `f28–f31` | `ft8-ft11` | FP temporaries | * |  |
165 | 
166 | ## RISC-V RV32E ABI
167 | 
168 | Currently defined in the [RISC-V ELF psABI](https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-rv32e-calling-convention).
169 | 
170 | | Register | ABI Name | Description | Caller | Callee |
171 | |:---------|:---------|:------------|--------|-------|
172 | | `x0` | `zero` | Hard-wired zero |  |  |
173 | | `x1` | `ra` | Return address | * |  |
174 | | `x2` | `sp` | Stack pointer |  | * |
175 | | `x3` | `gp` | Global pointer |  |  |
176 | | `x4` | `tp` | Thread pointer |  |  |
177 | | `x5` | `t0` | Temporary/alternate link register | * |  |
178 | | `x6–x7` | `t1-t2` | Temporaries | * |  |
179 | | `x8` | `s0/fp` | Saved register/frame pointer |  | * |
180 | | `x9` | `s1` | Saved register |  | * |
181 | | `x10–x11` | `a0-a1` | Function arguments/return values | * |  |
182 | | `x12–x15` | `a2-a5` | Function arguments | * |  |
183 | 
184 | ## Links
185 | 
186 | - [Application Binary Interface for
187 | the ARM® Architecture](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0036b/IHI0036B_bsabi.pdf)
188 | - [Procedure Call Standard for the ARM® Architecture](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf)
189 | - [RISC-V ELF psABI](https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md)
190 | 
191 | 


--------------------------------------------------------------------------------
/csrs.md:
--------------------------------------------------------------------------------
  1 | # The Control and Status Registers (CSRs)
  2 | 
  3 | The RISC-V ISA defines a set of 4096 Control and Status Registers that can be accessed
  4 | via special `csr` instructions with immediate operands identifying the register.
  5 | 
  6 | For performance reasons, the RISC-V microcontroller profile uses only a small number
  7 | of core system registers via the CSR mechanism; the rest are available in the memory
  8 | mapped system area.
  9 | 
 10 | Unless otherwise mentioned, write access to the CSRs is limited to machine/privileged mode.
 11 | 
 12 | ## Hart ID Register (`hartid`)
 13 | 
 14 | The `hartid` CSR is an xlen-bit read-only register containing the integer ID of the
 15 | hart running the code. This register must be readable in any implementation.
 16 | In single-hart devices, it always reads 0. In multi-hart devices, the hart IDs might
 17 | not necessarily be numbered contiguously
 18 | (although it is preferable), but at least one hart must have a hart ID of zero.
 19 | 
 20 | | Bits | Name | Type | Reset | Description |
 21 | |:-----|:-----|:-----|:------|-------------|
 22 | | [N:0] | `hartid` | ro | | The integer ID of the hart. |
 23 | | [(xlen-1):(N+1)] | | | | Reserved. |
 24 | 
 25 | N is the number of bits required to store the maximum hart ID and is implementation specific.
 26 | 
 27 | This CSR is identical to `mhartid` in the RISC-V privileged profile.
 28 | 
 29 | ## Configuration and control (`ctrl`)
 30 | 
 31 | The `ctrl` CSR is an xlen-bit read/write register that controls several aspects of the hart
 32 | functionality.
 33 | 
 34 | | Bits | Name | Type | Reset | Description |
 35 | |:-----|:-----|:-----|:------|-------------|
 36 | | [0] | `sptena` | r | 0 | Thread stack enable: <br>- 0: always use `spm` as `sp`. <br>- 1: in **application** mode use `spt` as `sp`. |
 37 | | [1] | `stackalign` | rw | 1<sup>[1]</sup> | The context stack alignment: <br>- 0: 4-byte alignment guaranteed, no SP adjustment is performed.<br>- 1: 8-byte alignment guaranteed, SP adjusted if necessary.|
 38 | | [2] | `spadjusted` | r | 0 | Reserved bit used during context push/pop to remember if the stack required an extra alignment word. |
 39 | | [7:3] | | | | Reserved. |
 40 | | [8] | `fpena` | rw | 0 | Floating point enable: <br>- 0: if the FP unit is disabled.<br>- 1: if the FP unit is enabled. |
 41 | | [9] | `fpcxs` | rw | 1 | Floating point context save: <br>- 0: if the context stack should not save FP registers. <br>- 1: if the context stack should save FP registers. |
 42 | | [10] | `fplazy` | rw | 1 | Floating point lazy context save: <br>- 0: disable automatic lazy context save.<br>- 1: enable automatic lazy context save. |
 43 | | [(xlen-1):11] | | | | Reserved. |
 44 | 
 45 | <sup>[1]</sup>: The default value for `stackalign` is implementation specific; the
 46 | recommended default is 1.
 47 | 
 48 | TODO: decide if a `reset` bit (to reset the current hart) fits here, and
 49 | where a `sysreset` bit (to reset the entire device) should be located.
 50 | 
 51 | TODO: allocate a number for it.
 52 | 
 53 | ## Mode and status (`status`)
 54 | 
 55 | The `status` CSR is an xlen-bit read/write register that identifies the
 56 | current hart mode and status.
 57 | 
 58 | | Bits | Name | Type | Reset | Description |
 59 | |:-----|:-----|:-----|:------|-------------|
 60 | | [0] | `handler` | r | 0 | Hart is running:<br>- 0: application code.<br>- 1: handler code. |
 61 | | [1] | `user` | r | 0 | Application privileges:<br>- 0: machine/privileged mode.<br>- 1: user/unprivileged mode. |
 62 | | [7:2] | | | | Reserved. |
 63 | | [8] | `interrupt` | r | 0 | If `handler` is set, then<br>1 if in an interrupt, 0 if in an exception |
 64 | | [18:9] | `cause` | r | 0 | The exception or interrupt cause code. |
 65 | | [(xlen-1):19] | | | | Reserved. |
 66 | 
 67 | The handler code is always running in machine/privileged mode.
 68 | 
 69 | TODO: the bits in this register, as well as the entire mechanism to enter/exit
 70 | exceptions and traps, require a thorough analysis.
 71 | 
 72 | TODO: allocate a number for it.
 73 | 
 74 | ## Interrupt Enable (`iena`)
 75 | 
 76 | The `iena` CSR is an xlen-bit read/write register that controls whether interrupts
 77 | are enabled or not.
 78 | 
 79 | This register has a single bit on purpose. Access to the interrupt enable bit must be quite
 80 | fast since masking all interrupts is one of the methods used to implement critical sections.
 81 | 
 82 | | Bits | Name | Type | Reset | Description |
 83 | |:-----|:-----|:-----|:------|-------------|
 84 | | [0] | `iena` | rw | 0 | Interrupts Enable; 1 if interrupts are enabled. |
 85 | | [(xlen-1):1] | | | | Reserved. |
 86 | 
 87 | This CSR is specific to the RISC-V microcontroller profile.
 88 | 
 89 | TODO: allocate a number for it.
 90 | 
 91 | ## Interrupt Priority Threshold (`iprioth`)
 92 | 
 93 | The `iprioth` CSR is an xlen-bit read/write register that holds the interrupt priority threshold.
 94 | Only interrupt requests that have a priority strictly greater than the threshold will cause
 95 | an interrupt to become active. The threshold register must always be able to hold the value zero,
 96 | in which case, no interrupts are masked. The threshold register must also be able to hold
 97 | the maximum priority level, in which case all interrupts are masked (functionally equivalent
 98 | to disabling interrupts).
 99 | 
100 | This register is a CSR because handling the interrupts threshold is one of the methods used
101 | to implement critical sections, and this must be as fast as possible.
102 | 
103 | | Bits | Name | Type | Reset | Description |
104 | |:-----|:-----|:-----|:------|-------------|
105 | | [N:0] | `iprioth` | rw | 0x00 | The interrupt priority threshold. |
106 | | [(xlen-1):(N+1)] | | | | Reserved. |
107 | 
108 | N is the number of bits required to store the maximum priority level, and is implementation
109 | specific. It must match the number of bits used by the `prio` register in the interrupt
110 | controller.
111 | 
112 | All reserved bits read back as 0. To find out N at runtime, an application can write an
113 | 'all-1' pattern and read back the register.
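
The probing procedure above could be sketched as follows. Since no CSR number is allocated yet, the sketch uses a hypothetical software stand-in (`stub_write_iprioth()` and `stub_read_iprioth()`, assumed here to model a hart with 3 priority bits) instead of real `csr` instructions; none of these names are defined by the profile.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for the `iprioth` CSR of a hart that implements
// 3 priority bits; reserved bits ignore writes and read back as 0.
#define STUB_IPRIOTH_MASK 0x7u

uint32_t stub_iprioth;

void
stub_write_iprioth(uint32_t value)
{
  stub_iprioth = value & STUB_IPRIOTH_MASK;
}

uint32_t
stub_read_iprioth(void)
{
  return stub_iprioth;
}

// Returns the number of implemented threshold bits (N + 1 in the table
// above). The previous threshold is restored before returning.
unsigned int
probe_iprioth_bits(void)
{
  uint32_t saved = stub_read_iprioth();

  stub_write_iprioth(UINT32_MAX); // The 'all-1' pattern.
  uint32_t readback = stub_read_iprioth();
  stub_write_iprioth(saved);

  unsigned int bits = 0;
  while (readback != 0) {
    // The implemented bits are expected to be contiguous from bit 0.
    assert((readback & 1u) != 0);
    bits++;
    readback >>= 1;
  }
  return bits;
}
```

Writing the 'all-1' pattern temporarily raises the threshold to the maximum level, so such a probe is probably best done during startup, before interrupts are enabled.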
114 | 
115 | If the hart does not implement an interrupt controller, the whole register reads back as zero.
116 | 
117 | This CSR is specific to the RISC-V microcontroller profile.
118 | 
119 | TODO: allocate a number for it.
120 | 
121 | > <sup>[PA]: the truncation of priority bits should be done at the
122 |  least-significant end, to avoid
123 |  priority inversion. [ilg] this might be done for example by moving the bits to the
124 |  high end of the register, but this requires later handling the priority as
125 |  word/double word.</sup>
126 | 
127 | ## Interrupt Priority Threshold Increase (`ipriothinc`)
128 | 
129 | The `ipriothinc` CSR behaves like an xlen-bit read/write register, but in fact uses the
130 | same register as `iprioth`. The difference is that writes to this CSR are effective only
131 | if the new value is higher than the current value, in other words it guarantees that the
132 | interrupt threshold is not decreased.
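
The difference between the two CSRs can be illustrated with a small software model. This is only a sketch of the intended semantics, assuming that writes return the previous value (as in the usage examples of this chapter); `hart_model_t` and the `model_*` functions are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of the single register shared by the two CSRs.
typedef struct {
  uint32_t threshold;
} hart_model_t;

// Model of a write to `iprioth`: unconditional. Returns the previous value.
uint32_t
model_write_iprioth(hart_model_t* hart, uint32_t value)
{
  uint32_t previous = hart->threshold;
  hart->threshold = value;
  return previous;
}

// Model of a write to `ipriothinc`: effective only if it raises the
// threshold. Returns the previous value.
uint32_t
model_write_ipriothinc(hart_model_t* hart, uint32_t value)
{
  uint32_t previous = hart->threshold;
  if (value > previous) {
    hart->threshold = value;
  }
  return previous;
}

// Two nested critical sections; the inner one asks for a lower threshold.
// Returns the threshold observed inside the inner section.
uint32_t
model_nested_sections(hart_model_t* hart)
{
  uint32_t outer = model_write_ipriothinc(hart, 7);
  uint32_t inner = model_write_ipriothinc(hart, 3); // Does not lower it.
  uint32_t observed = hart->threshold;
  model_write_iprioth(hart, inner); // Leave the inner section.
  assert(hart->threshold == 7);     // Still protected by the outer section.
  model_write_iprioth(hart, outer); // Leave the outer section.
  return observed;
}
```

In this model, the inner section cannot accidentally unmask interrupts that the outer section relies on being masked.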
133 | 
134 | This register is a CSR because handling the interrupts threshold is one of the methods used
135 | to implement critical sections, and this must be as fast as possible.
136 | 
137 | | Bits | Name | Type | Reset | Description |
138 | |:-----|:-----|:-----|:------|-------------|
139 | | [N:0] | `ipriothinc` | rw | 0x00 | The interrupt priority threshold. |
140 | | [(xlen-1):(N+1)] | | | | Reserved. |
141 | 
142 | N is the number of bits required to store the maximum priority level, and is implementation
143 | specific.
144 | 
145 | This CSR is specific to the RISC-V microcontroller profile.
146 | 
147 | TODO: possibly find a better name. how about `ipriothup`?
148 | 
149 | TODO: allocate a number for it.
150 | 
151 | ## Main Stack Pointer (`spm`)
152 | 
153 | The `spm` CSR is an xlen-bit read-write register that holds the main stack pointer.
154 | It is always the default stack pointer after reset. Interrupts and exceptions always
155 | use this stack to store the exception frame.
156 | 
157 | | Bits | Name | Type | Reset | Description |
158 | |:-----|:-----|:-----|:------|-------------|
159 | | [0] | | | 0 | Reserved. |
160 | | [(xlen-1):1] | `spm` | rw | startup | The main stack pointer. |
161 | 
162 | This CSR is specific to the RISC-V microcontroller profile.
163 | 
164 | TODO: check if the stack has more strict alignment requirements.
165 | 
166 | TODO: allocate a number for it.
167 | 
168 | ## Main Stack Pointer Limit (`spmlimit`)
169 | 
170 | The `spmlimit` CSR is an xlen-bit read-write register that holds the lowest address
171 | the main stack can descend to.
172 | 
173 | | Bits | Name | Type | Reset | Description |
174 | |:-----|:-----|:-----|:------|-------------|
175 | | [0] | | | 0 | Reserved. |
176 | | [(xlen-1):1] | `spmlimit` | rw | startup | The main stack lower limit. |
177 | 
178 | If an operation using the main stack pointer attempts to write to an address below
179 | the limit, an exception is triggered and the operation is not performed.
180 | 
181 | This CSR is specific to the RISC-V microcontroller profile.
182 | 
183 | The `spmlimit` CSR is optional for the ES (small) sub-profile; in this case it
184 | must always read zero.
185 | 
186 | TODO: allocate a number for it.
187 | 
188 | ## Thread Stack Pointer (`spt`)
189 | 
190 | The `spt` CSR is an xlen-bit read-write register that holds the stack pointer used
191 | by the current application thread. It is intended for multi-threaded applications.
192 | 
193 | This register is a CSR because access to the stack pointer may occur in context switching
194 | routines and needs to be fast.
195 | 
196 | | Bits | Name | Type | Reset | Description |
197 | |:-----|:-----|:-----|:------|-------------|
198 | | [0] | | | 0 | Reserved. |
199 | | [(xlen-1):1] | `spt` | rw | Unknown | The thread stack pointer. |
200 | 
201 | This CSR is specific to the RISC-V microcontroller profile.
202 | 
203 | TODO: allocate a number for it.
204 | 
205 | ## Thread Stack Pointer Limit (`sptlimit`)
206 | 
207 | The `sptlimit` CSR is an xlen-bit read-write register that holds the lowest address
208 | the thread stack can descend to.
209 | 
210 | This register is a CSR because access to the stack pointer limit may occur in context switching
211 | routines and needs to be fast.
212 | 
213 | | Bits | Name | Type | Reset | Description |
214 | |:-----|:-----|:-----|:------|-------------|
215 | | [0] | | | 0 | Reserved. |
216 | | [(xlen-1):1] | `sptlimit` | rw | Unknown | The thread stack lower limit. |
217 | 
218 | If an operation using the thread stack pointer attempts to write to an address below
219 | the limit, an exception is triggered and the operation is not performed.
220 | 
221 | This CSR is specific to the RISC-V microcontroller profile.
222 | 
223 | The `sptlimit` CSR is optional for the ES (small) sub-profile; in this case it
224 | must always read zero.
225 | 
226 | TODO: allocate a number for it.
227 | 
228 | ## RISC-V compatibility CSRs
229 | 
230 | The RISC-V Volume I, Chapter 2.8, mentions two mandatory instructions, `rdcycle` and
231 | `rdinstret`; to implement them, two CSRs are required:
232 | 
233 | - cycle: available as `hcb.cyclecnt`
234 | - instret: available as `hcb.instcnt`
235 | 
236 | The RISC-V Volume II mentions other CSRs, but it is not clear which ones are mandatory,
237 | if any:
238 | 
239 | - mstatus: not needed; the only bit needed is mie, which is now in the `iena` CSR.
240 | - mcause: no longer needed, the cause is packed in the `status` CSR
241 | - mie: not needed, interrupts are enabled in the HIC registers
242 | - mip: not needed, interrupt pending bits are in the HIC registers
243 | - mtvec: not needed, there are two memory mapped registers, `excvta` and `intvta`
244 | - misa: can be safely migrated to memory mapped
245 | - mepc: not needed, pushed onto the stack
246 | - mtval: not decided; nested exceptions may override this. TODO: check thoroughly.
247 | 
248 | - mscratch: not decided if needed
249 | 
250 | TODO: add MPU registers
251 | 
252 | ## Usage
253 | 
254 | ### Interrupt critical sections
255 | 
256 | In a single hart device, the simplest way to implement a critical section is to
257 | fully disable interrupts, assuming the application does not need to keep any
258 | fast interrupts enabled.
259 | 
260 | ```c
261 | void
262 | f1(void)
263 | {
264 |   // ...
265 |   {
266 |     // Interrupts critical section.
267 |     xlenreg_t status = riscv_csr_write_iena(0);
268 |     // ...
269 |     riscv_csr_write_iena(status);
270 |   }
271 |   // ...
272 | }
273 | ```
274 | 
275 | Otherwise, if the application uses some fast interrupts, it can raise the
276 | interrupt threshold to a limit below the fast interrupts priority. Please
277 | note how entering the critical sections guarantees that the threshold is
278 | not lowered.
279 | 
280 | ```c
281 | void
282 | f2(void)
283 | {
284 |   // ...
285 |   {
286 |     // Interrupts critical section.
287 |     xlenreg_t status = riscv_csr_write_ipriothinc(7);
288 |     // ...
289 |     riscv_csr_write_iprioth(status);
290 |   }
291 |   // ...
292 | }
293 | ```
294 | 
295 | ### Performance issues
296 | 
297 | The reason for preferring CSRs over memory mapped registers is speed; accessing a CSR requires a single instruction, while a memory access takes two:
298 | 
299 | ```
300 | f(riscv_csr_read_mstatus(), SYSPERIPH->cmd); 
301 | 
302 | 20400238:	30002573           csrr	a0,mstatus 
303 | 
304 | 2040023c:	f00007b7           lui	a5,0xf0000 
305 | 20400240:	43cc               lw	a1,4(a5) 
306 | 
307 | 20400242:	307000ef           jal	ra,20400d48 <f(unsigned long, unsigned long)> 
308 | ```
309 | 
310 | For this reason, all registers needed in interrupt critical sections and context switches should be accessed with CSR instructions, while all other non critical registers can be memory mapped. 
311 | 


--------------------------------------------------------------------------------
/exceptions-and-interrupts.md:
--------------------------------------------------------------------------------
  1 | # Exceptions and Interrupts
  2 | 
  3 | Exceptions are unusual **conditions that occur at run time, associated with an
  4 | instruction** in the current RISC-V hart.
  5 | 
  6 | Interrupts are **events that occur asynchronously outside** any of the RISC-V harts.
  7 | 
  8 | > <sup>Other architectures define interrupts as a specific type of exceptions.
  9 |   However, for the RISC-V microcontroller profile, exceptions are specific
 10 |   for the architecture, and common to all devices, while interrupts are
 11 |   mostly specific to an implementation (except a few system interrupts,
 12 |   also common to all devices). Thus it looks more natural to define
 13 |   two separate vector tables, one for exceptions, to be implemented
 14 |   in the architecture software package, and one for interrupts, to be
 15 |   implemented in the device software package.</sup>
 16 | 
 17 | The mechanism to process exceptions and interrupts (vectored, nested, separate stack)
 18 | is one of the main improvements in the RISC-V microcontroller profile over the
 19 | privileged profile.
 20 | 
 21 | ## Exception and interrupt handlers in C/C++
 22 | 
 23 | The main feature is the ability to write the exception and interrupt handlers
 24 | as plain C/C++ functions that do not need any compiler attributes or assembly
 25 | code.
 26 | 
 27 | For this to be possible, there are two requirements:
 28 | 
 29 | - both the exception and the interrupt entry code must abide by the ABI requirements
 30 | and save the same caller registers as a regular C/C++ call
 31 | - a custom return address must be used, such that when the handler returns,
 32 | the core will trigger the exception return mechanism, without the need of explicit
 33 | assembly `mret` instructions.
 34 | 
 35 | ## Exceptions
 36 | 
 37 | Exceptions trigger a **synchronous transfer of control** to an exception handler
 38 | within the current hart.
 39 | 
 40 | Some exceptions cannot be disabled, and handlers to process them should always be installed.
 41 | 
 42 | Some exceptions are **resumable**, i.e. an execution can continue to the next
 43 | instruction (for example the illegal instruction handler can implement a custom
 44 | instruction and resume).
 45 | 
 46 | The RISC-V privileged specs define the following exceptions, in decreasing priority order:
 47 | 
 48 | * Instruction address misaligned
 49 | * Instruction access fault
 50 | * Illegal instruction
 51 | * Breakpoint
 52 | * Load address misaligned
 53 | * Load access fault
 54 | * Store/AMO address misaligned
 55 | * Store/AMO access fault
 56 | * Environment call from U-mode
 57 | * Environment call from M-mode
 58 | * Instruction page fault
 59 | * Load page fault
 60 | * Store/AMO page fault
 61 | 
 62 | TODO: rework for microcontrollers; define which ones have configurable priorities.
 63 | 
 64 | TODO: NMI? routed only to hart 0?
 65 | 
 66 | ### Exceptions vector table
 67 | 
 68 | The exceptions vector table is an array of addresses (xlen size elements) pointing to
 69 | exception handlers (C/C++ functions).
 70 | 
 71 | The address of the exceptions vector table is kept by each hart in `hcb.excvta`;
 72 | it is automatically initialised at startup with
 73 | the address provided in the hart startup block and can be later written by software.
 74 | 
 75 | ## Interrupts
 76 | 
 77 | Interrupts are generally **triggered by peripherals** to notify the application of a
 78 | given condition or event.
 79 | 
 80 | Interrupts trigger the transfer of control to an interrupt handler associated with
 81 | a hart.
 82 | 
 83 | In the RISC-V microcontroller profile, a hart can have up to **1024** interrupts,
 84 | including the system interrupts.
 85 | 
 86 | > <sup>This limit was chosen arbitrarily and is considered quite high.</sup>
 87 | 
 88 | ### Interrupt priorities
 89 | 
 90 | Interrupts have **programmable priorities**, defined as small unsigned numbers.
 91 | 
 92 | The **priority value 0** is reserved to mean
 93 | _'never interrupt'_ or _'disabled'_, and interrupt priorities increase with
 94 | the increasing integer value.
 95 | 
 96 | Interrupts with the same priority are processed in the order of their index
 97 | in the interrupt vector
 98 | table, with a higher index meaning a higher priority.
 99 | 
100 | For multi-hart devices, the interrupt wiring to harts is implementation specific;
101 | each interrupt
102 | may be wired to one or several harts; it is the responsibility
103 | of each hart to enable the interrupts it desires to process. For redundant systems,
104 | it is also
105 | possible for multiple harts to process the same interrupt.
106 | 
107 | ### Interrupt priority threshold
108 | 
109 | Each hart has an associated priority threshold, held in a hart-specific register.
110 | 
111 | Only interrupts that have a priority strictly greater than the threshold will
112 | cause an interrupt to be sent to the hart.
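
The priority rules described so far (priority 0 means disabled, only priorities strictly greater than the threshold are taken, ties are resolved in favour of the higher index) could be modelled in software as below. This is only an illustrative sketch; the function name, the array-based representation and the `NO_INTERRUPT` value are assumptions, not part of the profile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_INTERRUPT (-1)

// Returns the index of the pending interrupt to be taken, or NO_INTERRUPT.
// prio[i] is the programmed priority of interrupt i (0 means disabled);
// pending[i] is non-zero if interrupt i is pending.
int
select_interrupt(const uint8_t* prio, const uint8_t* pending, size_t count,
                 uint8_t threshold)
{
  assert(count <= 1024); // The per-hart limit stated above.

  int selected = NO_INTERRUPT;
  uint8_t best = threshold;
  for (size_t i = 0; i < count; i++) {
    // Priority 0 can never be strictly greater than the threshold,
    // so disabled interrupts are skipped here too.
    if (!pending[i] || prio[i] <= threshold) {
      continue;
    }
    // Use >= so that, for equal priorities, the higher index wins.
    if (prio[i] >= best) {
      best = prio[i];
      selected = (int)i;
    }
  }
  return selected;
}
```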
113 | 
114 | ### Priority bits
115 | 
116 | The actual number of bits used to store the interrupt priority is implementation
117 | specific, but must
118 | be at least 3 (i.e. at least 8 priority levels).
119 | 
120 | > <sup>Extra care must be considered when moving code to implementations with fewer
121 |   priority levels, since truncation could lead to priority inversions.
122 |   For example, when moving a program from devices
123 |   with 4-bit priorities to devices with 3-bit priorities, if the application
124 |   uses priority 9 for IRQ0 and priority 3
125 |   for IRQ1, IRQ0 is expected to have a higher
126 |   priority. But if the MSB is removed, IRQ0 will have priority 1 and be
127 |   lower than IRQ1.</sup>
128 | 
129 | > <sup>It is
130 |   recommended that software handling priorities know about the number of bits
131 |   and use asserts to validate the priority values.</sup>
132 | 
133 | > <sup>[PA]: the truncation of priority bits should be done at the
134 |  least-significant end, to avoid this kind of
135 |  priority inversion. [ilg] this translates into moving the bits to the other
136 |  end of the word/register, and possibly requiring byte/half-word accesses
137 |  to the NIC.</sup>
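
The two truncation strategies discussed in the notes above can be compared with a few lines of C. This is a sketch for illustration only; the function names are hypothetical, and the asserts follow the recommendation to validate priority values against the known number of bits.

```c
#include <assert.h>
#include <stdint.h>

// Drops the most-significant bits; this may reorder priorities.
uint8_t
truncate_msb(uint8_t prio, unsigned int from_bits, unsigned int to_bits)
{
  assert(to_bits >= 3 && to_bits <= from_bits && from_bits <= 8);
  assert(prio < (1u << from_bits));
  return (uint8_t)(prio & ((1u << to_bits) - 1u));
}

// Drops the least-significant bits; the relative order is preserved,
// but neighbouring levels may collapse into one.
uint8_t
truncate_lsb(uint8_t prio, unsigned int from_bits, unsigned int to_bits)
{
  assert(to_bits >= 3 && to_bits <= from_bits && from_bits <= 8);
  assert(prio < (1u << from_bits));
  return (uint8_t)(prio >> (from_bits - to_bits));
}
```

With the example above (4-bit to 3-bit), `truncate_msb()` maps priorities 9 and 3 to 1 and 3 (inverted), while `truncate_lsb()` maps them to 4 and 1 (order preserved). Note that in this sketch `truncate_lsb()` maps priority 1 to 0, which means 'disabled', so the lowest levels would need separate care.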
138 | 
139 | ### Interrupt preemption and nesting
140 | 
141 | If a hart is executing an interrupt handler and a higher priority interrupt
142 | occurs, the current interrupt handler is temporarily suspended and the higher
143 | priority interrupt handler is executed to completion, then the initial
144 | interrupt handler is resumed.
145 | 
146 | Each new interrupt creates a new context on the main stack, and removes it
147 | when the handler returns.
148 | 
149 | There is no limit for interrupt nesting, assuming the main stack is large enough.
150 | 
151 | ### System interrupts
152 | 
153 | System interrupts are generated by system peripherals, like `sysclock`, `rtclock`.
154 | 
155 | TBD
156 | 
157 | ### Interrupts vector table
158 | 
159 | The interrupts table is an **array of pointers** to interrupt handlers,
160 | implemented as **C/C++ functions**. The number of interrupts per hart is
161 | implementation specific but cannot exceed 1024 elements.
162 | 
163 | Each hart may have its own table, with handlers for the interrupts it can process.
164 | 
165 | The address of the array must be programmatically written by each hart to
166 | its `hcb.intvta` register before enabling interrupts, usually during startup.
167 | 
168 | The first 8 entries are reserved for system interrupts:
169 | 
170 | * `context_switch` (must have the lowest priority)
171 | * `rtclock_cmp`
172 | * `sysclock_cmp`
173 | * ... 5 more, reserved
174 | 
175 | ## Vector tables relocation
176 | 
177 | The starting address used by a RISC-V microcontroller is usually
178 | either a flash memory or a ROM device, and the value cannot be changed at run-time.
179 | However, some applications, like bootloaders or applications running in RAM,
180 | start with the vector tables at one
181 | address and later transfer control to the application located at a different
182 | address. For such cases it is useful to be able to modify or define vector tables
183 | at run-time. In order to handle this, the RISC-V microcontrollers support a feature
184 | called Vector Table Relocation.
185 | 
186 | For this, the `hcb.excvta` and `hcb.intvta` registers can be written at any time from
187 | code running in machine mode.
188 | 
189 | ## Context stack
190 | 
191 | When exceptions/interrupts are taken, they push a context on the current stack.
192 | The stack pointer must be xlen aligned. For RV32 harts with the D extension,
193 | an additional alignment to 8 may be required.
194 | 
195 | If the `stackalign` bit in the `ctrl` CSR is set, the stack is always aligned
196 | at 8. Although this is implementation specific, it usually allows faster context
197 | switches.
198 | 
199 | The RISC-V microcontroller profile uses a full-descending context stack, where:
200 | 
201 | - When pushing context, the hardware decrements the stack pointer to the end of the
202 | new stack frame before it stores data onto the stack.
203 | - When popping context, the hardware reads the data from the stack frame and then
204 | increments the stack pointer.
205 | 
206 | The current stack pointer is either `spt` (when in application mode and the
207 | `ctrl.sptena` is set),
208 | or `spm` otherwise (when already in handler mode or `ctrl.sptena` is not set).
209 | 
210 | In other words, regardless of how many nested interrupts occur, there is only one
211 | context pushed onto the thread stack, and all other nested contexts are pushed
212 | onto the main stack. Also,
213 | all handlers use the main stack and do not pollute the thread stacks, which
214 | do not need to reserve space for the interrupt handlers.
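
The stack selection rule can be expressed as a small model. This is a sketch under the assumptions stated in the text (the `handler` status bit and the `ctrl.sptena` bit decide the stack); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { STACK_MAIN, STACK_THREAD } context_stack_t;

// Selects the stack that receives the context when an exception/interrupt
// is taken, given the mode the hart is in at that moment.
context_stack_t
select_context_stack(bool handler_mode, bool sptena)
{
  return (!handler_mode && sptena) ? STACK_THREAD : STACK_MAIN;
}

// Counts how many of `depth` nested contexts land on the thread stack,
// starting from application mode.
unsigned int
thread_stack_contexts(unsigned int depth, bool sptena)
{
  bool handler_mode = false;
  unsigned int on_thread_stack = 0;
  for (unsigned int i = 0; i < depth; i++) {
    if (select_context_stack(handler_mode, sptena) == STACK_THREAD) {
      on_thread_stack++;
    }
    handler_mode = true; // Every entry switches the hart to handler mode.
  }
  assert(on_thread_stack <= 1);
  return on_thread_stack;
}
```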
215 | 
216 | For the current RISC-V Linux ABI, the stack context is, from high to low
217 | addresses:
218 | 
219 | - <-- original `sp` (`spt` or `spm`)
220 | - (optional padding)
221 | - `status` (CSR, the current mode when the exception/interrupt occurred)
222 | - `pc` (the next address to return from the exception/interrupt)
223 | - `x31/t6`
224 | - `x30/t5`
225 | - `x29/t4`
226 | - `x28/t3`
227 | - `x17/a7`
228 | - `x16/a6`
229 | - `x15/a5`
230 | - `x14/a4`
231 | - `x13/a3`
232 | - `x12/a2`
233 | - `x11/a1`
234 | - `x10/a0`
235 | - `x7/t2`
236 | - `x6/t1`
237 | - `x5/t0`
238 | - `x1/ra` <-- new `sp`, possibly align 8
239 | 
240 | With the new RISC-V EABI proposal, this would be reduced to a more
241 | reasonable context stack:
242 | 
243 | - <-- original `sp` (`spt` or `spm`)
244 | - (optional padding)
245 | - `status` (CSR, the current mode when the exception/interrupt occurred)
246 | - `pc` (the next address to return from the exception/interrupt)
247 | - `x15/a5`
248 | - `x14/a4`
249 | - `x13/a3`
250 | - `x12/a2`
251 | - `x11/a1`
252 | - `x10/a0`
253 | - `x1/ra` <-- new `sp`, possibly align 8
254 | 
255 | With floating point support added, the context stack for the current RISC-V
256 | Linux ABI is quite large, which is another good reason why the RISC-V
257 | microcontroller profile should use an optimised Embedded ABI.
258 | 
259 | - <-- original `sp` (`spt` or `spm`)
260 | - (optional padding)
261 | - `fcsr` (\*) <- for double, it must be aligned to 8
262 | - `f31/ft11` (\*)
263 | - `f30/ft10` (\*)
264 | - `f29/ft9` (\*)
265 | - `f28/ft8` (\*)
266 | - `f17/fa7` (\*)
267 | - `f16/fa6` (\*)
268 | - `f15/fa5` (\*)
269 | - `f14/fa4` (\*)
270 | - `f13/fa3` (\*)
271 | - `f12/fa2` (\*)
272 | - `f11/fa1` (\*)
273 | - `f10/fa0` (\*)
274 | - `f7/ft7` (\*)
275 | - `f6/ft6` (\*)
276 | - `f5/ft5` (\*)
277 | - `f4/ft4` (\*)
278 | - `f3/ft3` (\*)
279 | - `f2/ft2` (\*)
280 | - `f1/ft1` (\*)
281 | - `f0/ft0` (\*)
282 | - `status` (CSR, the current mode when the exception/interrupt occurred)
283 | - `pc` (the next address to return from the exception/interrupt)
284 | - `x31/t6`
285 | - `x30/t5`
286 | - `x29/t4`
287 | - `x28/t3`
288 | - `x17/a7`
289 | - `x16/a6`
290 | - `x15/a5`
291 | - `x14/a4`
292 | - `x13/a3`
293 | - `x12/a2`
294 | - `x11/a1`
295 | - `x10/a0`
296 | - `x7/t2`
297 | - `x6/t1`
298 | - `x5/t0`
299 | - `x1/ra` <-- new `sp`, possibly align 8
300 | 
301 | (\*) The floating point registers are saved only by devices that
302 | implement the
303 | F or D extensions and have the `ctrl.fpena` bit set.
304 | 
305 | To reduce latency, in parallel with saving the registers, the address of the
306 | exception/interrupt handler is fetched from the vector table.
307 | 
308 | After saving the context stack:
309 | 
310 | - the `handler` bit in the `status` register is set, to mark the handler-mode
311 | - the `ra` register is loaded with a special HANDLER_RETURN pattern,
312 | defined below
313 | - the `pc` register is loaded with the handler address; this is equivalent
314 | to calling the handler.
315 | 
316 | When the C/C++ function returns, the return code will load `pc` with the
317 | special HANDLER_RETURN value in `ra`.
318 | This will trigger the exception return mechanism, which will pop the context
319 | from the stack and return from the interrupt/exception.
320 | 
321 | TODO: define the detailed logic in pseudocode.
322 | 
323 | ## The HANDLER_RETURN pattern
324 | 
325 | The special HANDLER_RETURN pattern is an 'all-1' for the given xlen with
326 | some bits used to differentiate contexts.
327 | Since the RISC-V microcontroller profile reserves a slice at the very end
328 | of the memory space (0xF...), and this slice has the execute permissions
329 | removed, it does not create any confusion.
330 | 
331 | This value is generated at exception entrance and is stored in the return
332 | address register (`ra`).
333 | 
334 | The HANDLER_RETURN pattern bits:
335 | 
336 | | Bits | Value | Description |
337 | |:-----|:------|-------------|
338 | | [0] | 1 | Reserved. |
339 | | [1] | 0 | Reserved. |
340 | | [2] | - 0: main stack<br>- 1: thread stack | Stack that holds the context to pop. |
341 | | [3] | - 0: short, without FP<br>- 1: long, with FP | Stack frame type. |
342 | | [4] | - 0: Linux<br>- 1: Embedded | ABI |
343 | | [(xlen-1):5] | 1 | Reserved. |
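
For xlen = 32, the bit layout in the table above could be encoded and checked as in the following sketch. The macro and function names are hypothetical, and the sketch only mirrors the table; the actual value is generated by hardware at exception entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HR_STACK_THREAD (1u << 2) // 0: main stack, 1: thread stack
#define HR_FRAME_FP     (1u << 3) // 0: short frame, 1: long frame with FP
#define HR_ABI_EMBEDDED (1u << 4) // 0: Linux ABI, 1: Embedded ABI

// Reserved bits: [31:5] are 1, [1] is 0, [0] is 1.
#define HR_FIXED_MASK  0xFFFFFFE3u
#define HR_FIXED_VALUE 0xFFFFFFE1u

// Checks whether a return address matches the reserved bits of the pattern.
bool
handler_return_is_valid(uint32_t ra)
{
  return (ra & HR_FIXED_MASK) == HR_FIXED_VALUE;
}

// Builds the pattern for a given context.
uint32_t
handler_return_make(bool thread_stack, bool fp_frame, bool embedded_abi)
{
  uint32_t pattern = HR_FIXED_VALUE;
  if (thread_stack) {
    pattern |= HR_STACK_THREAD;
  }
  if (fp_frame) {
    pattern |= HR_FRAME_FP;
  }
  if (embedded_abi) {
    pattern |= HR_ABI_EMBEDDED;
  }
  assert(handler_return_is_valid(pattern));
  return pattern;
}
```

All values produced this way fall in the range `0xFFFFFFE1` to `0xFFFFFFFD`, inside the non-executable slice at the end of the memory space mentioned above.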
344 | 
345 | > <sup>The ABI bit is used mainly for compatibility reasons, until the EABI
346 |   is finalised and implemented by the compiler.</sup>
347 | 
348 | > <sup>The HANDLER_RETURN pattern does not include a bit defining the
349 |   resulting
350 |   application/handler mode, since it can be restored from the saved
351 |   `status` register. Saving this register is necessary not only for the
352 |   `handler` bit (which might have been added to HANDLER_RETURN), but for the
353 |   `cause` field, which otherwise may be overridden by nested interrupts.</sup>
354 | 
355 | > <sup>There is also a [proposal](https://github.com/emb-riscv/specs-markdown/issues/3) 
356 |   to use the lowest bits of the address and to slightly adjust JALR.</sup>
357 | 
358 | ## The FP lazy stacking mechanism
359 | 
360 | The large number of floating point registers takes a long time to copy
361 | during context push/pop on the stack.
362 | 
363 | One solution to optimize this is to save them only when needed, by using a
364 | lazy stacking mechanism.
365 | 
366 | TODO: define the details.
367 | 
368 | ## Tail chaining
369 | 
370 | When an exception/interrupt takes place while already in handler mode, and the
371 | priority does not require pre-emption, the new exception/interrupt will enter the
372 | pending state. When the hart finishes executing the current handler, it can then
373 | proceed to process the pending exception/interrupt request. Instead of restoring
374 | the registers back from the stack (unstacking) and then pushing them on to the
375 | stack again (stacking), the hart skips the unstacking and stacking steps and
376 | enters the new handler of the pending exception/interrupt as soon as possible.
377 | 
378 | TODO: define the details.
379 | 
380 | ## Usage
381 | 
382 | ```c
383 | extern "C" {
384 | 
385 | riscv_startup_block_t
386 | __attribute__((section(".startup_blocks")))
387 | harts_startup_blocks[] = {
388 |   {
389 |     hart_startup,
390 |     hart_stack_pointer,
391 |     hart_global_pointer,
392 |     hart_exception_handlers // <--
393 |   }
394 | };
395 | 
396 | // The exception vector table address is automatically set during startup.
397 | riscv_exception_handler_t
398 | hart_exception_handlers[] = {
399 |   exception_handle_address_misaligned,
400 |   exception_handle_address_fault,
401 |   exception_handle_illegal_instruction,
402 |   // ...
403 | };
404 | 
405 | // An example of an exception handler. Plain C function. May return.
406 | void
407 | exception_handle_address_misaligned(void)
408 | {
409 |   // ...
410 | }
411 | 
412 | // ...
413 | 
414 | [[noreturn]] void
415 | hart_startup(void)
416 | {
417 |   // ...
418 |   // Set the interrupt vector table address.
419 |   hcb.intvta = hart_interrupt_handlers;
420 |   // ...
421 | }
422 | 
423 | riscv_interrupt_handler_t
424 | hart_interrupt_handlers[] = {
425 |   interrupt_handle_context_switch,
426 |   interrupt_handle_rtclock_cmp,
427 |   interrupt_handle_sysclock_cmp,
428 |   // ...
429 | };
430 | 
431 | // ...
432 | 
433 | // An example of an interrupt handler. Plain C function.
434 | void
435 | interrupt_handle_sysclock_cmp(void)
436 | {
437 |   // ...
438 | 
439 |   // Simply returns without having to do anything special.
440 | }
441 | 
442 | } // extern "C"
443 | ```
444 | 


--------------------------------------------------------------------------------
/interrupts-use-cases.md:
--------------------------------------------------------------------------------
  1 | # Appendix B: Interrupts use cases
  2 | 
  3 | Regardless of how interrupts are implemented, any architecture design should
  4 | be checked for how well it accommodates the common use cases.
  5 | 
  6 | ## Peripherals vs scheduler interrupts
  7 | 
  8 | By design, old microcontroller architectures expected interrupts to be 
  9 | triggered only occasionally by peripherals.
 10 | 
 11 | Since Cortex-M, interrupts were extensively enhanced with features to
 12 | support the implementation of RTOSes, greatly simplifying context switches
 13 | and preemption. 
 14 | 
 15 | ### Peripheral interrupts
 16 | 
 17 | Although there were opinions that peripheral interrupts should be as simple
 18 | as possible to be fully inlined, a well structured application may use
 19 | drivers from a separate library/package, so the typical use case is to
 20 | have the interrupt handlers in files specific to the application
 21 | and call the driver interrupt service routine via a plain C/C++ call.
 22 | 
 23 | The traditional approach is to have the interrupt handler annotated 
 24 | with the `interrupt` attribute, which generates a fully functional 
 25 | interrupt handler, including preserving registers and returning
 26 | from interrupt.
 27 | 
 28 | ```c
 29 | #include "driver-xyz.h"
 30 | 
 31 | void __attribute__((interrupt))
 32 | interrupt_handle_xyz(void)
 33 | {
 34 |   driver_xyz_interrupt_service_routine();
 35 | }
 36 | ```
 37 | 
 38 | The problem with this approach is that on a RISC-V device, 
 39 | with the current POSIX ABI, the number of
 40 | registers to be saved by the caller is large, 
 41 | and the generated
 42 | code, with `-march=rv64gc -mabi=lp64d` looks like:
 43 | 
 44 | ```
 45 | .option nopic 
 46 | .text 
 47 | .align 1 
 48 | .globl interrupt_handle_xyz 
 49 | .type interrupt_handle_xyz, @function 
 50 | 
 51 | interrupt_handle_xyz: 
 52 | addi sp,sp,-288 
 53 | 
 54 | sd ra,280(sp) 
 55 | sd t0,272(sp) 
 56 | sd t1,264(sp) 
 57 | sd t2,256(sp) 
 58 | sd a0,248(sp) 
 59 | sd a1,240(sp) 
 60 | sd a2,232(sp) 
 61 | sd a3,224(sp) 
 62 | sd a4,216(sp) 
 63 | sd a5,208(sp) 
 64 | sd a6,200(sp) 
 65 | sd a7,192(sp) 
 66 | sd t3,184(sp) 
 67 | sd t4,176(sp) 
 68 | sd t5,168(sp) 
 69 | sd t6,160(sp) 
 70 | fsd ft0,152(sp) 
 71 | fsd ft1,144(sp) 
 72 | fsd ft2,136(sp) 
 73 | fsd ft3,128(sp) 
 74 | fsd ft4,120(sp) 
 75 | fsd ft5,112(sp) 
 76 | fsd ft6,104(sp) 
 77 | fsd ft7,96(sp) 
 78 | fsd fa0,88(sp) 
 79 | fsd fa1,80(sp) 
 80 | fsd fa2,72(sp) 
 81 | fsd fa3,64(sp) 
 82 | fsd fa4,56(sp) 
 83 | fsd fa5,48(sp) 
 84 | fsd fa6,40(sp) 
 85 | fsd fa7,32(sp) 
 86 | fsd ft8,24(sp) 
 87 | fsd ft9,16(sp) 
 88 | fsd ft10,8(sp) 
 89 | fsd ft11,0(sp) 
 90 | 
 91 | call driver_xyz_interrupt_service_routine 
 92 | 
 93 | ld ra,280(sp) 
 94 | ld t0,272(sp) 
 95 | ld t1,264(sp) 
 96 | ld t2,256(sp) 
 97 | ld a0,248(sp) 
 98 | ld a1,240(sp) 
 99 | ld a2,232(sp) 
100 | ld a3,224(sp) 
101 | ld a4,216(sp) 
102 | ld a5,208(sp) 
103 | ld a6,200(sp) 
104 | ld a7,192(sp) 
105 | ld t3,184(sp) 
106 | ld t4,176(sp) 
107 | ld t5,168(sp) 
108 | ld t6,160(sp) 
109 | fld ft0,152(sp) 
110 | fld ft1,144(sp) 
111 | fld ft2,136(sp) 
112 | fld ft3,128(sp) 
113 | fld ft4,120(sp) 
114 | fld ft5,112(sp) 
115 | fld ft6,104(sp) 
116 | fld ft7,96(sp) 
117 | fld fa0,88(sp) 
118 | fld fa1,80(sp) 
119 | fld fa2,72(sp) 
120 | fld fa3,64(sp) 
121 | fld fa4,56(sp) 
122 | fld fa5,48(sp) 
123 | fld fa6,40(sp) 
124 | fld fa7,32(sp) 
125 | fld ft8,24(sp) 
126 | fld ft9,16(sp) 
127 | fld ft10,8(sp) 
128 | fld ft11,0(sp) 
129 | 
130 | addi sp,sp,288 
131 | mret 
132 | 
133 | .size interrupt_handle_xyz, .-interrupt_handle_xyz 
134 | ```
135 | 
136 | Simpler devices, without hardware FP, have slightly shorter code,
137 | but still lots of registers (`-march=rv32i -mabi=ilp32`):
138 | 
139 | ```
140 | .option nopic 
141 | .text 
142 | .align 2 
143 | .globl interrupt_handle_xyz 
144 | .type interrupt_handle_xyz, @function 
145 | 
146 | interrupt_handle_xyz: 
147 | addi sp,sp,-64 
148 | 
149 | sw ra,60(sp) 
150 | sw t0,56(sp) 
151 | sw t1,52(sp) 
152 | sw t2,48(sp) 
153 | sw a0,44(sp) 
154 | sw a1,40(sp) 
155 | sw a2,36(sp) 
156 | sw a3,32(sp) 
157 | sw a4,28(sp) 
158 | sw a5,24(sp) 
159 | sw a6,20(sp) 
160 | sw a7,16(sp) 
161 | sw t3,12(sp) 
162 | sw t4,8(sp) 
163 | sw t5,4(sp) 
164 | sw t6,0(sp) 
165 | 
166 | call driver_xyz_interrupt_service_routine 
167 | 
168 | lw ra,60(sp) 
169 | lw t0,56(sp) 
170 | lw t1,52(sp) 
171 | lw t2,48(sp) 
172 | lw a0,44(sp) 
173 | lw a1,40(sp) 
174 | lw a2,36(sp) 
175 | lw a3,32(sp) 
176 | lw a4,28(sp) 
177 | lw a5,24(sp) 
178 | lw a6,20(sp) 
179 | lw a7,16(sp) 
180 | lw t3,12(sp) 
181 | lw t4,8(sp) 
182 | lw t5,4(sp) 
183 | lw t6,0(sp) 
184 | 
185 | addi sp,sp,64 
186 | mret 
187 | 
188 | .size interrupt_handle_xyz, .-interrupt_handle_xyz 
189 | ```
190 | 
191 | On the other hand, modern designs use plain C functions as interrupt 
192 | handlers, and in this case the generated code looks definitely
193 | better:
194 | 
195 | ```
196 | .option nopic 
197 | .text 
198 | .align 2 
199 | .globl interrupt_handle_xyz 
200 | .type interrupt_handle_xyz, @function 
201 | 
202 | interrupt_handle_xyz:  
203 | tail driver_xyz_interrupt_service_routine 
204 | 
205 | .size interrupt_handle_xyz, .-interrupt_handle_xyz 
206 | ```
207 | 
208 | For this style of handlers to work, it is still necessary to save/restore
209 | the ABI caller registers outside the handler; this can be done either in
210 | hardware, or, for cheap devices, in software.
211 | 
212 | ### Context switches
213 | 
214 | In a multi-threaded environment, a context switch is generally
215 | a sequence of operations performing the following steps:
216 | 
217 | - interrupt the current thread
218 | - save the state of the current thread in the thread control block (TCB)
219 | - select the next thread to run
220 | - restore the state of the new thread from the selected TCB
221 | - resume execution in the context of the new thread
222 | 
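As an illustration of the "save state in the TCB, select the next thread, restore from the selected TCB" steps, here is a minimal host-side sketch in C. It assumes a fixed array of thread control blocks and a plain round-robin policy; `THREADS_COUNT`, `thread_control_block_t`, `threads` and `current_thread` are hypothetical names introduced only for this example, and a real scheduler would consider priorities and thread states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THREADS_COUNT 3 // Hypothetical, fixed number of threads.

typedef uintptr_t stack_elem_t;

// Hypothetical thread control block; real ones keep much more state.
typedef struct {
  stack_elem_t* sp; // Stack pointer saved when the thread was interrupted.
} thread_control_block_t;

static thread_control_block_t threads[THREADS_COUNT];
static size_t current_thread = 0;

// Save the stack pointer of the interrupted thread in its TCB,
// select the next thread (round-robin, an assumption of this sketch)
// and return the stack pointer stored in the selected TCB.
stack_elem_t*
scheduler_select_next_thread(stack_elem_t* sp)
{
  assert(current_thread < THREADS_COUNT);

  threads[current_thread].sp = sp;
  current_thread = (current_thread + 1) % THREADS_COUNT;
  return threads[current_thread].sp;
}
```

The actual pushing and popping of registers is left out on purpose, since it requires architecture specific assembly code; the sketch only shows how the stack pointer travels through the TCBs.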
223 | #### Cooperative vs preemptive
224 | 
225 | In a cooperative environment, threads deliberately pass control to other
226 | threads either by directly issuing a `yield()` call, or indirectly
227 | by calling a system function that internally yields.
228 | 
229 | In a cooperative environment, user interrupt handlers are regular handlers, 
230 | they interrupt the current running code (thread or interrupt), 
231 | perform some operations, and return in exactly the same context.
232 | 
233 | Preemptive environments improve response time by extending some of
234 | the interrupt handlers with code that also performs context switches,
235 | such that the interrupt occurs in the context of one thread but 
236 | returns in the context of another thread.
237 | 
238 | Traditional interrupt handlers need to be changed from the simple 
239 | implementation that calls the peripheral ISR:
240 | 
241 | ```c
242 | void __attribute__((interrupt))
243 | interrupt_handle_xyz(void)
244 | {
245 |   driver_xyz_interrupt_service_routine();
246 | }
247 | ```
248 | 
249 | ... to something like this:
250 | 
251 | ```c
252 | static inline __attribute__((naked, always_inline))
253 | stack_elem_t*
254 | save_context(void)
255 | {
256 |   // Assembly code to push all registers onto the thread stack
257 |   // ...
258 |   return sp;
259 | }
260 | 
261 | static inline __attribute__((naked, always_inline))
262 | void
263 | restore_context(stack_elem_t* sp)
264 | {
265 |   // Assembly code to pop all registers from the thread stack
266 |   // ...
267 | }
268 | 
269 | 
270 | void __attribute__((naked))
271 | interrupt_handle_xyz(void)
272 | {
273 |   stack_elem_t* sp = save_context(); // Push all registers onto the thread stack
274 |   
275 |   driver_xyz_interrupt_service_routine();
276 |   
277 |   if (must_switch_context) 
278 |   {
279 |     sp = scheduler_select_next_thread(sp);
280 |   }
281 |   restore_context(sp); // Pop all registers from the thread stack
282 |   return_from_interrupt();
283 | }
284 | ```
285 | 
286 | The complexity varies from RTOS to RTOS, and in real life it must also include
287 | some critical sections, but the general framework is highly similar to the
288 | above; it requires significant changes in the user code and it is not simple.
289 | 
290 | #### Dedicated context switch interrupt
291 | 
292 | In modern RTOS friendly architectures, the context switch is delegated
293 | to a single dedicated interrupt, implemented in the system part, such
294 | that all user interrupt handlers no longer need to worry about this and
295 | can be written directly in C/C++:
296 | 
297 | ```c
298 | void
299 | interrupt_handle_xyz(void)
300 | {
301 |   driver_xyz_interrupt_service_routine();
302 | }
303 | ```
304 | 
305 | ... while the context switch is performed by an interrupt handler like:
306 | 
307 | ```c
308 | void
309 | __attribute__((naked))
310 | interrupt_handle_context_switch(void)
311 | {
312 |   stack_elem_t* sp = save_context(); // Push all registers onto the thread stack
313 |   
314 |   sp = scheduler_select_next_thread(sp);
315 |   
316 |   restore_context(sp); // Pop all registers from the stack
317 | }
318 | ```
319 | 
320 | For this to work, the context switch interrupt must be guaranteed to have the
321 | lowest priority, such that it is executed only after all other interrupts are
322 | completed, just before the hart/core returns to thread state.
323 | 
324 | #### Triggering a context switch
325 | 
326 | With such a dedicated interrupt, triggering a context switch is as simple 
327 | as pending a software interrupt:
328 | 
329 | ```c
330 | hcb.interrupts[CONTEXT_SWITCH_INTERRUPT_NUMBER].status = INTERRUPTS_SET_PENDING;
331 | ```
332 | 
333 | Pending a context switch interrupt can be performed either in other interrupt
334 | handlers (and in this case the switch occurs after all interrupts are 
335 | completed), or in thread mode, in the `yield()` function, and in this case
336 | the switch is performed as soon as interrupts are enabled.
337 | 
338 | ## Use cases
339 | 
340 | Once the mechanism to switch contexts via a dedicated interrupt is defined,
341 | it is easy to imagine that, with a preemptive scheduler,
342 | most peripheral interrupts can trigger context switches,
343 | so it becomes clear that both peripheral and context switch interrupts 
344 | should be given equal attention in the design.
345 | 
346 | In the most general case, a peripheral interrupt occurs and, while it is
347 | being processed, other interrupts with lower or equal priorities occur too;
348 | they wait their turn and are processed back-to-back.
349 | If one of those interrupts requests a context switch,
350 | the last interrupt in the chain is the context switch interrupt.
351 | 
352 | From simple to complex, the use cases are:
353 | 
354 | ### Single peripheral interrupt, no context switch
355 | 
356 | This is the simplest case, when the driver processes the peripheral
357 | data, but does not need to inform the associated thread of the
358 | change, so it does not request a context switch, and after the
359 | interrupt completes, execution returns to the same thread.
360 | 
361 | ### Single peripheral interrupt with context switch
362 | 
363 | If the driver decides to inform the thread that new data is available,
364 | for example by raising a semaphore, or pushing data onto a queue, it
365 | must pend the context switch interrupt, which will be executed
366 | back-to-back with the peripheral interrupt.
367 | 
368 | ### Multiple interrupts with context switch
369 | 
370 | If, during the execution of the peripheral interrupt, other
371 | interrupts with lower or equal priority occur, they do not
372 | preempt the current interrupt, but are remembered and, when the
373 | current interrupt completes, are executed in sequence, back-to-back,
374 | including the context switch interrupt, if requested.
375 | 
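The three use cases can be modelled on a host machine with a small C sketch. It is only a software model, under stated assumptions: the `pending` and `priority` arrays, the interrupt numbers, and the rule that every peripheral handler pends the context switch are hypothetical and chosen for illustration; they are not the register layout of the interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INTERRUPTS_COUNT 4
#define CONTEXT_SWITCH_INTERRUPT_NUMBER 0 // Hypothetical number.
#define TRACE_SIZE 16

// Hypothetical pending flags and priorities (higher value is more urgent);
// the context switch interrupt has the lowest priority.
static bool pending[INTERRUPTS_COUNT];
static const unsigned priority[INTERRUPTS_COUNT] = { 0, 3, 2, 2 };

// Order in which the handlers were executed.
static int trace[TRACE_SIZE];
static size_t trace_len;

// Return the pending interrupt with the highest priority, or -1 if none.
static int
select_pending(void)
{
  int best = -1;
  for (int i = 0; i < INTERRUPTS_COUNT; ++i) {
    if (pending[i] && (best < 0 || priority[i] > priority[best])) {
      best = i;
    }
  }
  return best;
}

// Model of a handler: record the call; peripheral handlers are assumed
// to wake a thread, so they pend the context switch interrupt.
static void
handle(int number)
{
  assert(trace_len < TRACE_SIZE);
  trace[trace_len++] = number;
  if (number != CONTEXT_SWITCH_INTERRUPT_NUMBER) {
    pending[CONTEXT_SWITCH_INTERRUPT_NUMBER] = true;
  }
}

// Execute all pending interrupts back-to-back; return how many ran.
size_t
run_back_to_back(void)
{
  size_t count = 0;
  int number;
  while ((number = select_pending()) >= 0) {
    pending[number] = false;
    handle(number);
    ++count;
  }
  return count;
}
```

With interrupts 1 and 2 pending, the model runs handler 1, then handler 2, then the context switch handler, which is the "multiple interrupts with context switch" case.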
376 | ## Tail chaining
377 | 
378 | Given the use cases presented, with virtually all
379 | peripheral interrupts requesting context switches,
380 | it follows that it is highly likely
381 | to have at least two back-to-back interrupts.
382 | 
383 | Old architectures that use interrupt handlers annotated 
384 | with the `interrupt` attribute, simply call the handlers
385 | in sequence, and each handler saves and restores all registers.
386 | 
387 | For back-to-back interrupts, the registers restored by
388 | the first interrupt have exactly the same values as those
389 | saved by the second interrupt, so the long list of
390 | register operations is practically useless, but the
391 | compiler does not know this, so the code is not efficient.
392 | 
393 | For the current RISC-V POSIX ABI, the behaviour is:
394 | 
395 | - process the top priority interrupt
396 |   - enter annotated handler 
397 |   - **save 16 general registers and 20 FP registers**
398 |   - call the C/C++ functions and return
399 |   - **restore 16 general registers and 20 FP registers**
400 |   - exit annotated handler
401 | - possibly process other interrupts with lower or similar 
402 | priority that occur while in interrupt mode, each of them doing (**N times**)
403 |   - enter annotated handler 
404 |   - **save 16 general registers and 20 FP registers**
405 |   - call the C/C++ functions and return
406 |   - **restore 16 general registers and 20 FP registers**
407 |   - exit annotated handler
408 | - process the `context_switch` interrupt (lowest possible priority)
409 |   - enter naked handler
410 |   - **save 32 general registers and 32 FP registers**
411 |   - save the SP in the current thread control block
412 |   - select the next thread to run
413 |   - load SP from the new thread control block
414 |   - **restore 32 general registers and 32 FP registers**
415 |   - exit naked handler
416 | - return from interrupt in the context of the new thread
417 | 
418 | In modern designs, which use plain C interrupt handlers,
419 | the registers are saved before entering the first handler
420 | and restored after the last handler,
421 | so the behaviour is significantly more efficient:
422 | 
423 | - reserve space for the FP registers, but do not save them
424 | - **save** the ABI caller registers
425 | - call the handler for top priority interrupt
426 | - possibly call other handlers for interrupts with lower or similar 
427 | priorities, that occur while in interrupt mode
428 | - call the `context_switch` handler (lowest possible priority)
429 |   - save the rest of the general registers (ABI callee)
430 |   - save the SP in the current thread control block
431 |   - select the next thread to run
432 |   - load the SP from the new thread control block
433 |   - restore the rest of the general registers (ABI callee)
434 |   - return from the handler
435 | - **restore** the ABI caller registers
436 | - return from interrupt in the context of the new thread
437 | 
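To make the difference concrete, the register traffic of the two sequences can be counted with a short C sketch. The counts reuse the figures quoted in the lists above (16 general and 20 FP caller registers, 32 general and 32 FP registers in total); the function names are hypothetical, and the second function assumes that no handler executes FP instructions, so the reserved FP space is never filled.

```c
#include <assert.h>

enum {
  CALLER_GENERAL = 16, // ABI caller general registers, as listed above.
  CALLER_FP = 20,      // ABI caller FP registers, as listed above.
  ALL_GENERAL = 32,    // Figure used above for the naked handler.
  ALL_FP = 32
};

// Register moves (saves plus restores) when `handlers` annotated
// peripheral handlers are followed by a naked context switch handler.
unsigned
annotated_register_moves(unsigned handlers)
{
  assert(handlers >= 1);
  return 2 * handlers * (CALLER_GENERAL + CALLER_FP)
      + 2 * (ALL_GENERAL + ALL_FP);
}

// Register moves with plain C handlers: the caller registers are moved
// once around the whole chain, and the context switch handler moves only
// the remaining general registers. Assumes no FP instruction is executed.
unsigned
plain_register_moves(unsigned handlers)
{
  assert(handlers >= 1);
  (void)handlers; // The count does not depend on the chain length.
  return 2 * CALLER_GENERAL + 2 * (ALL_GENERAL - CALLER_GENERAL);
}
```

Under these assumptions, two peripheral interrupts followed by a context switch cost 272 register moves with annotated handlers and 64 with plain C handlers, and the second figure stays constant regardless of how many interrupts are chained.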
438 | ## Lazy FP stacking
439 | 
440 | For devices with hardware FP units, the large number of FP
441 | registers may severely impact the interrupt latency.
442 | 
443 | Old architectures that use interrupt handlers annotated 
444 | with the `interrupt` attribute, must always save and
445 | restore all the FP registers, as seen in the example.
446 | 
447 | In modern designs, which use plain C interrupt handlers,
448 | and the registers are saved before entering the handlers,
449 | it is possible to use a more efficient mechanism, which
450 | only reserves the space onto the thread stack, but does
451 | the actual save only when the first FP instruction is
452 | executed.
453 | 
454 | Since most interrupt handlers do not use FP instructions,
455 | the saving/restoring of the FP registers is skipped 
456 | entirely, and the interrupt latency is not affected.
457 | 
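Since the details of the mechanism are not yet specified, the following C sketch is only a host-side model of the idea, not a description of the hardware. The `fp_frame_t` structure, the `fp_registers` array standing in for the hardware registers, and the three function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FP_CALLER_COUNT 20 // ABI caller FP registers, as in the example.

// Hypothetical frame reserved on the thread stack at interrupt entry.
typedef struct {
  double reserved[FP_CALLER_COUNT]; // Space reserved, initially unused.
  bool saved;                       // True once the registers were stored.
} fp_frame_t;

static double fp_registers[FP_CALLER_COUNT]; // Stand-in for hardware.
static unsigned fp_moves;                    // Counts register transfers.

// Interrupt entry: reserve the space, but do not save anything.
void
interrupt_enter(fp_frame_t* frame)
{
  assert(frame != NULL);
  frame->saved = false;
}

// Called when a handler is about to execute an FP instruction;
// only the first call performs the actual save.
void
first_fp_instruction(fp_frame_t* frame)
{
  assert(frame != NULL);
  if (!frame->saved) {
    memcpy(frame->reserved, fp_registers, sizeof fp_registers);
    frame->saved = true;
    fp_moves += FP_CALLER_COUNT;
  }
}

// Interrupt exit: restore only if the registers were actually saved.
void
interrupt_exit(fp_frame_t* frame)
{
  assert(frame != NULL);
  if (frame->saved) {
    memcpy(fp_registers, frame->reserved, sizeof fp_registers);
    fp_moves += FP_CALLER_COUNT;
  }
}
```

In this model an interrupt whose handlers never touch the FP unit performs no FP register transfers at all, while one that does pays for a single save and a single restore.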
458 | > <sup>More details on this mechanism to be added as a separate page. 
459 | The design should also consider the ABI callee registers,
460 | handled during context switches.</sup>
461 | 
462 | ## Conclusions
463 | 
464 | When designing a new architecture, the focus should be
465 | to optimise the most common use case, which is a sequence
466 | of 2 or more back-to-back interrupts that in most cases
467 | end with the context switch interrupt.
468 | 
469 | ### `interrupt` handlers are not efficient
470 | 
471 | Although traditional interrupt handlers annotated 
472 | with the `interrupt` attribute may seem a solution for
473 | fast interrupts, they are really fast only if everything
474 | is inlined and no other plain C function is called, otherwise
475 | the entire set of ABI caller registers must be saved and restored,
476 | including the FP registers, and it must be done repeatedly
477 | for each interrupt, the possibilities for tail chaining and
478 | lazy FP stacking not being realistic.
479 | 
480 | ### Plain C functions are recommended
481 | 
482 | Plain C interrupt handlers are much better suited for the
483 | common use cases and have the following benefits:
484 | 
485 | - easier to use in user code
486 | - allow tail chaining without user intervention
487 | - allow lazy FP stacking without user intervention
488 | - save only the ABI caller registers
489 | - save the ABI callee registers only if context switches are triggered
490 | 
491 | The preferred implementation is with hardware stacking/unstacking,
492 | but cheap devices can also choose to do the stacking/unstacking
493 | in software, together with vectoring, so from a user
494 | point of view they are similar, the interrupt handlers remain
495 | the same plain C functions.
496 | 


--------------------------------------------------------------------------------
/improvements-upon-privileged.md:
--------------------------------------------------------------------------------
  1 | # Appendix A: Improvements upon RISC-V privileged
  2 | 
  3 | ## Rationale
  4 | 
  5 | As mentioned in RISC-V Volume I, v2.2, _"RISC-V is a new instruction set architecture
  6 | (ISA) that was originally designed to support computer architecture research and education.
  7 | ... The RISC-V manual is structured in two volumes. This volume covers the user-level ISA
  8 | design, including optional ISA extensions. The second volume provides the privileged
  9 | architecture."_
 10 | 
 11 | The RISC-V Volume II, v1.10, mentions: _"... This document describes the RISC-V privileged
 12 | architecture, which covers all aspects of RISC-V systems beyond the user-level ISA,
 13 | including privileged instructions as well as additional functionality required for
 14 | running operating systems and attaching external devices."_
 15 | 
 16 | This is great news for the GNU/Linux community and for the academia, but attempts
 17 | to identify in the RISC-V specs how the new design meets the requirements of bare-metal
 18 | embedded devices were not very successful; browsing the two docs revealed only some
 19 | references to Tensilica and ARC (probably not the most successful embedded architectures),
 20 | and some incomplete specs for **the RV32E subset** (which halves the number of general
 21 | registers, does not support hardware floating point and makes some counter instructions
 22 | optional).
 23 | 
 24 | According to the privileged specs in Volume II, **RISC-V embedded systems share the
 25 | exact same definitions as systems running Unix-like operating systems, but they do
 26 | not include the "S" (Supervisor) mode features**.
 27 | 
 28 | This strategy does not work very well for real-time systems; for example, **in the
 29 | RISC-V interrupt model**, without special measures, **interrupts remain disabled
 30 | while executing interrupt handlers**. This may be acceptable for general purpose
 31 | Linux kernels, but for hard real-time systems this is generally a no-go, since
 32 | **interrupt latency** may end up well above tolerable limits.
 33 | 
 34 | ### The dividing line
 35 | 
 36 | Currently there is no clear understanding where the dividing line between RISC-V
 37 | general purpose and microcontroller devices should be.
 38 | 
 39 | One possible approach is to start by defining what microcontroller devices are not:
 40 | they definitely are not expected to run multi-process applications on top of
 41 | Unix-like operating systems. Although some
 42 | projects try to challenge this, it is generally agreed that **Unix-like operating
 43 | systems DO need virtual memory and supervisor modes** to properly run multi-process
 44 | applications.
 45 | 
 46 | After long considerations, the conclusion was that the common and logical dividing
 47 | line between the RISC-V privileged profile and a RISC-V microcontroller
 48 | profile is the capability to run a full-blown operating system, that uses virtual 
 49 | memory and supervisor modes (like Unix and derivatives); as such, **RISC-V
 50 | microcontrollers are devices
 51 | that do not implement a virtual memory system or supervisor modes** and are
 52 | intended to run single-process multi-threaded applications only (and are not 
 53 | intended to run Unix-like systems).
 54 | 
 55 | > <sup>[JB] Two more criteria may be used
 56 | for dividing microcontrollers and application processors: pipeline
 57 | complexity and memory latency. Microcontrollers use simpler, in-order
 58 | pipelines and have memory subsystems that are tightly synchronized to
 59 | the execution pipeline. ... Out-of-order and parallel
 60 | execution are becoming common features in application processors, but
 61 | are not used in microcontrollers, since the latter must have predictable
 62 | execution timing. [ilg] The pipeline complexity and memory latency should
 63 |   be implementation specific. Microcontrollers intended for applications that
 64 |   need predictable execution timings may decide not to implement
 65 |   out-of-order and parallel execution, or allow to disable them at run time.
 66 | </sup>
 67 | 
 68 | ## Improvements upon RISC-V privileged
 69 | 
 70 | The main 'pain-point' with the current RISC-V privileged specs
 71 | is the mechanism to handle interrupts, which is not suitable for real-time,
 72 | low power, bare-metal embedded applications.
 73 | 
 74 | The following issues were identified in the current RISC-V privileged specs when
 75 | used for bare-metal applications:
 76 | 
 77 | | RISC-V Privileged | RISC-V Microcontroller |
 78 | |-------------------|------------------------|
 79 | | Handlers run with interrupts disabled; low priority interrupts that take a long time to complete may delay high priority interrupts, affecting real-time capabilities. | The microcontroller profile allows nesting; high priority interrupts preempt low priority ones, being processed as fast as possible. |
 80 | | There is only a single trap handler, serving all interrupts and exceptions (the so called _vectored_ mode is so complicated to use that it is not even worth mentioning). | The microcontroller profile has an advanced vectored mode; interrupts are dispatched to separate handlers, via a simple array of pointers, easy to define in C/C++. |
 81 | | The interrupt code must be written in assembly, to perform the low level stacking/unstacking and return from exception; this code **is** complicated, a good example is the [Linux handler](https://github.com/torvalds/linux/blob/master/arch/riscv/kernel/entry.S), and the current Linux implementation does not even re-enable interrupts while in handler mode. | The microcontroller profile automatically performs the stacking/unstacking, allowing all application interrupt handlers to be written as C/C++ functions, with minimum latency. |
 82 | | The current ISA Volume I manual defines a common POSIX ABI to be used by all devices, but this ABI requires the caller to save a lot of registers, making interrupt stacking/unstacking very expensive and increasing latency. | Better adapted to real-time, the microcontroller profile defines a lighter Embedded ABI, reducing latency. |
 83 | | The privileged profile defines a few hundred CSRs, and encourages implementations to define even more custom CSRs; current debuggers do not have support for proprietary mechanisms like CSRs, and viewing/changing these registers requires unusual hacks. | The microcontroller profile uses a very limited set of CSRs and favours the use of memory mapped registers, which are very well supported by debuggers/IDEs, including via detailed peripheral register viewers. |
 84 | | The current RISC-V ISA does not explicitly define a stack (it is only mentioned in the POSIX ABI), and there is no separate stack for interrupts; in a multi-threaded application, interrupts can occur on any thread stack, thus when provisioning for thread stacks, the additional memory requirements of all interrupts must be added to all thread stacks, wasting precious RAM. | The microcontroller profile not only defines the stack pointer register, but also adds a shadow thread stack pointer, separate from the main stack used by the interrupts, improving RTOS implementations and reducing thread stack requirements for RTOS multi-threaded applications. |
 85 | | A common reason of crashes during embedded systems development is one of the threads running out of space; the specs do not provide a standard way of detecting stack overflows. | The microcontroller profile adds a stack limit register and stack overflows trigger exceptions. |
 86 | | The system clock runs from the low frequency real-time clock, which has low resolution and, at common 32768 Hz frequencies, does not allow accurate 1000 Hz scheduler clocks. | The microcontroller profile defines separate low-power real-time clock and high accuracy system clock, improving both general clock resolution and scheduler clock accuracy. |
 87 | | There is no explicit mechanism to trigger and implement context switches in a multi-threaded RTOS. | The microcontroller profile adds a dedicated interrupt, guaranteed with the lowest priority, to be used for all context switches, relieving all other interrupt handlers from this duty. |
 88 | || The microcontroller profile adds an architecture device reset mechanism. |
 89 | || The microcontroller profile adds an architecture resumable NMI. |
 90 | | The startup code also requires some assembly code, to set the stack pointer and the `gp` register. | The microcontroller profile adds a simplified device startup code, based on a table of standard C/C++  pointers, requiring no assembly code at all. |
 91 | 
 92 | ## Criticism
 93 | 
 94 | ### Fragmentation would break upward compatibility
 95 | 
 96 | While discussing the opportunity for a new RISC-V embedded profile, the most common
 97 | concern raised was that migrating an application written for a microcontroller to a
 98 | larger application class core would be more difficult.
 99 | 
100 | Well, yes, in theory it might be possible to design a board in such a way as to allow
101 | swapping in a bigger core, and to design the application in such a way as to ignore the MMU
102 | and the supervisor mode and continue to use an RTOS; the application will probably run
103 | faster due to the improved pipelines and core clock rates, but there are several
104 | practical issues:
105 | 
106 | - in industrial embedded applications the processor selection is not based on the
107 | architecture (which in the majority of cases is Cortex-M only), but on the available
108 | on-chip peripherals; it is very unlikely to find an application class core with the
109 | desired peripherals available on a microcontroller;
110 | - application class cores generally do not have internal flash/ram, requiring
111 | external chips; external memory chips require lots of address and data pins, which
112 | mean large BGA chips, larger & more complex PCBs, and generally higher costs.
113 | 
114 | So this concern is not realistic, and not accepting a distinct microcontroller
115 | profile, optimised for real-time applications simply for maintaining compatibility
116 | with the privileged specs is not a beneficial approach.
117 | 
118 | ### No need to, everything will run Linux in the future
119 | 
120 | > "In the future everything will run Linux, so defining separate non-Linux profiles
121 | is a futile exercise."
122 | 
123 | Yes. Sure. Eventually. No doubt about it. When waiting long enough many marvellous things can happen.
124 | 
125 | However, for those who are not ready to wait for kingdom come, having simpler
126 | devices for critical real-time applications is a requirement for today.
127 | 
128 | ### Automatic stacking/unstacking is evil
129 | 
130 | > "Automatic stacking/unstacking is fine for Cortex-M, but it is
131 | very objectionable for
132 | RISC-V. The difference is in ARM's MOVEM instruction. A Cortex-M
133 | already has the hardware to move multiple words to/from the stack
134 | because it has to implement the MOVEM instruction anyway. So doing
135 | this specially on trap entry/exit is a small addition to what is
136 | already required to execute the user instruction set. The story for
137 | RISC-V is different, as it has no MOVEM-like instruction. Therefore,
138 | having the hardware automatically push/pop a collection of registers on
139 | trap entry/exit is a larger addition for RISC-V than it was for ARM."
140 | 
141 | Well, that's a point of view. However, it must be noted that even for
142 | the tiny Cortex-M0, so economical in terms of transistors, ARM decided
143 | to do automatic stacking/unstacking, so the added complexity might not
144 | be that high.
145 | 
146 | Not to mention another detail: Cortex-M has 16 registers, and the
147 | EABI requires R0-R3, R12, R14, PC and xPSR
148 | to be stacked/unstacked automatically. On the other hand,
149 | the LDM/STM instructions, probably due to the tight encoding,
150 | are able to move only half of the registers,
151 | (R0-R7), so the logic to do the stacking/unstacking is
152 | definitely more capable than required by the instruction set.
153 | Cortex-M does not have automatic stacking/unstacking by accident,
154 | merely because support for LDM/STM was present anyway; it has it
155 | by design. As an exercise of reversed logic,
156 | it might also be argued that Cortex-M has the LDM/STM instructions
157 | because the logic for moving multiple words was already
158 | available from the automatic stacking/unstacking mechanism.
159 | 
160 | Also RISC-V having no MOVEM-like instructions may save a few
161 | transistors, but otherwise this is not exactly a feature,
162 | it simply makes saving contexts in multi-threaded environments more
163 | complicated and possibly less efficient.
164 | 
165 | ### Automatic stacking/unstacking the interrupt context increases latency
166 | 
167 | The current RISC-V ABI requires the caller to save the following registers:
168 | `ra`, `t0`, `t1`, `t2`, `a0`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `t3`,
169 | `t4`, `t5`, `t6`. This amounts to 16 registers. If floating point is used, 20
170 | more registers must be saved.
171 | 
172 | The current RISC-V privileged specs do not define a hardware stack and do not
173 | require the core to save any registers
174 | on the stack, delegating this to the assembly trap handler.
175 | 
176 | Some voices claim that this strategy allows the application to have a highly optimised
177 | assembly trap handler, and as such avoid pushing all registers.
178 | 
179 | Well, yes, in theory, for very simple (blinky) applications this might be so, but for
180 | real applications, regardless of how optimised the assembly trap handler is, at a certain
181 | point it'll need to call a C function (for example to access a system service, like
182 | posting to a semaphore), and at this point the entire ABI caller register set must be
183 | pushed onto the stack, and popped after the C function returns.
184 | 
185 | The result of this strategy is that the assembly trap handler will initially save a
186 | small number of registers (those known to be used by the handler), then save the rest
187 | of the register set to prepare for the C call, so the full register set must be
188 | saved anyway.
189 | 
190 | For the current ABI this still means 16 general registers (plus 20 FP registers), which is a lot. The real problem
191 | here is not the decision to save them automatically (which greatly simplifies the
192 | software), but the current ABI which is designed for user mode Unix applications.
193 | 
194 | The solution is a **separate Embedded ABI (EABI)**, optimised for embedded real-time
195 | applications, with a smaller caller register set.
196 | 
197 | ### Automatic stacking/unstacking should be replaced by compiler attribute
198 | 
199 | > "Better to have a “handler” function attribute that causes the compiler
200 | to save only and exactly the registers the function modifies. If a handler
201 | function calls a regular C function then it needs to save all the volatile
202 | registers first."
203 | 
204 | Well, yes, as argued before, for very simple applications
205 | it is possible to imagine interrupt handlers incrementing a
206 | variable and returning, but this is rare; the vast majority of
207 | interrupt handlers call C/C++ functions to perform system services,
208 | like posting a semaphore, pushing to a queue, or using any other
209 | synchronization mechanism.
210 | 
211 | So a custom prologue/epilogue, which may first save a few
212 | registers and then save all those required by the ABI, might
213 | end up saving more registers overall, thus
214 | further worsening the latency.
215 | 
216 | > "You consider it too much work to add  `__attribute__ ((interrupt))`
217 |   to appropriate C functions, as on non-Cortex M ARM32, ARM64, x86 etc?"
218 |   
219 | No, adding an attribute is only a minor nuisance and a possible reason 
220 | for incompatibilities between compilers.
221 | 
222 | However, the question is interesting, because it reveals a common
223 | mistake: expecting microcontrollers to share the same behaviour as
224 | general purpose application devices.
225 | 
226 | Unfortunately this is not the case; one major difference is that
227 | microcontrollers have lots of peripherals, each with one or more 
228 | interrupts; thus, an embedded application, with or without an RTOS,
229 | is mainly interrupt driven. 
230 | 
231 | And with the advent of fast devices, like USB, or QSPI, possibly with
232 | DMAs, the number of interrupts may be quite high, and the total time
233 | spent in interrupt mode may be significant.
234 | 
235 | Thus the need for a careful design to tackle efficiency issues.
236 | 
237 | The most general case is a sequence of interrupts of decreasing
238 | priorities, which most probably trigger a context switch, on a machine
239 | with hardware floating point.
240 | 
241 | With hardware stacking/unstacking and lazy FP, the expected behaviour 
242 | when an interrupt with a priority higher than the threshold occurs, is:
243 | 
244 | - reserve space for the FP registers, but do not save them
245 | - save the ABI caller registers
246 | - enter handler for top priority interrupt, which calls
247 | other C/C++ functions and finally returns
248 | - possibly enter other handlers for interrupts with lower or similar 
249 | priorities, that occur while in interrupt mode
250 | - enter the `context_switch` handler (lowest possible priority)
251 |   - save the rest of the general registers
252 |   - save the SP in the current thread control block
253 |   - select the top priority thread
254 |   - load the SP from the new thread control block
255 |   - restore the rest of the general registers
256 |   - return from the handler
257 | - restore the ABI caller registers
258 | - return from interrupt in the context of the new thread
259 | 
260 | If the interrupt routines do not use FP, the FP registers are not saved,
261 | having no impact on latency; they will be saved before the first
262 | FP instruction is executed.
263 | 
264 | Without hardware stacking/unstacking, without lazy FP and relying only
265 | on the compiler to save/restore the registers, for the current RISC-V
266 | POSIX ABI, the behaviour is:
267 | 
268 | - process the top priority interrupt
269 |   - enter annotated handler 
270 |   - **save 16 general registers and 20 FP registers**
271 |   - call the C/C++ functions and return
272 |   - **restore 16 general registers and 20 FP registers**
273 |   - exit annotated handler
274 | - possibly process other interrupts with lower or similar 
275 | priority that occur while in interrupt mode, each of them doing (**N times**)
276 |   - enter annotated handler 
277 |   - **save 16 general registers and 20 FP registers**
278 |   - call the C/C++ functions and return
279 |   - **restore 16 general registers and 20 FP registers**
280 |   - exit annotated handler
281 | - process the context_switch interrupt (lowest possible priority)
282 |   - enter naked handler
283 |   - **save 32 general registers and 32 FP registers**
284 |   - save the SP in the current thread control block
285 |   - select the top priority thread
286 |   - load SP from the new thread control block
287 |   - **restore 32 general registers and 32 FP registers**
288 |   - exit naked handler
289 | - return from interrupt in the context of the new thread
290 | 
291 | As can be seen, without special precautions, each interrupt
292 | must push/pop the ABI caller registers, even if interrupts
293 | are back to back and the popped registers are immediately
294 | pushed again, possibly several times in a row.
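
To put rough numbers on the two sequences, the following is a minimal sketch that only counts register stores, using the figures quoted in the lists above. The function names are hypothetical, and it assumes that "the rest of the general registers" in the hardware-assisted case means the 32 general registers minus the 16 caller registers.

```c
/* Register counts quoted in the text above. */
enum {
    CALLER_GPRS = 16, /* ABI caller-saved general registers */
    CALLER_FPRS = 20, /* ABI caller-saved FP registers */
    ALL_GPRS = 32,    /* full general register file, as counted in the text */
    ALL_FPRS = 32     /* full FP register file */
};

/* Registers stored when compiler-annotated handlers do the saving:
 * one top priority handler, `extra` further handlers, then the naked
 * context switch handler. Restores mirror the stores one for one. */
unsigned software_stores(unsigned extra)
{
    unsigned per_handler = CALLER_GPRS + CALLER_FPRS;
    return (1u + extra) * per_handler + (ALL_GPRS + ALL_FPRS);
}

/* Registers stored with hardware stacking, tail chaining and lazy FP,
 * when no handler touches FP. Assumption: the context switch saves
 * ALL_GPRS - CALLER_GPRS additional general registers. */
unsigned hardware_stores_no_fp(void)
{
    return CALLER_GPRS + (ALL_GPRS - CALLER_GPRS);
}
```

Under these assumptions, two extra back-to-back interrupts give 172 stores (and as many loads) on the compiler-only path, against 32 on the hardware-assisted path, and the latter does not grow with the number of chained interrupts.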
295 | 
296 | It might be possible to somehow further optimise this mechanism,
297 | but I seriously doubt that it can be more efficient than the hardware
298 | stacking/unstacking, with tail chaining and lazy FP.
299 | 
300 | Challenge: find a solution more efficient than the current proposal,
301 | in terms of time and/or ease of use. In practical terms, write a
302 | user interrupt handler that must be able to conditionally perform
303 | a context switch (sometimes it does, sometimes it does not, depending
304 | on the peripheral).
305 | 
306 | ### Assembly interrupt handlers should be ok, they reside in the system part
307 | 
308 | > "Why insist on having the interrupt handlers written in C, when they can be very well
309 | written in assembly, since they reside in the system part, written by the system
310 | programmers, not by the application programmer."
311 | 
312 | Well, this might be the case for Linux, where the kernel and the modules are
313 | indeed written by system gurus, but in embedded bare-metal applications the
314 | interrupt handlers are very application specific and cannot be part of
315 | the RTOS or drivers/libraries, so it is the application programmer who must
316 | write them, not someone else. Hence the need for interrupt handlers to be
317 | as easy to write as possible, the best choice being to have them defined as plain
318 | C/C++ functions.
319 | 
320 | > "Interrupt handlers do not need to be entirely in assembler, only
321 | the entry/exit millicode needs to be part of the system. That millicode
322 | *can* be written by system gurus, while the application ISRs, written by
323 | application programmers, are called via the millicode. Unless
324 | there is some faster memory access cycle that the hardware can use,
325 | automatic context save/restore (presumably in microcode) will be no
326 | faster than RISC-V millicode."
327 | 
328 | Well, reversing the logic, there is no best case scenario in which the millicode
329 | will be faster than the microcode; even when there is no faster memory access
330 | for the microcode, the millicode will still have to make a call to the actual
331 | interrupt handler, so the total timing cannot be better. The main difference
332 | is the ease of use: the application programmer will no longer need any guru to
333 | write the millicode.
334 | 
335 | 
336 | ### CSRs cannot be memory-mapped
337 | 
338 | Another almost 'religious' RISC-V issue is related to accessing the system registers.
339 | Before a more elaborate explanation, those who claim this should remember that
340 | the current privileged specs moved `mtime` and `mtimecmp` from CSRs to
341 | memory mapped, and the PLIC specs require all registers to be memory mapped.
342 | 
343 | Generally, with
344 | the exception of a very limited set of special cases, industry standard
345 | architectures map most of the system peripherals and registers to a memory area.
346 | Instead, the
347 | RISC-V ISA defines several special instructions that allow addressing up to 4096
348 | per-hart registers.
349 | 
350 | It is generally agreed that for application class devices, with complex out-of-order
351 | pipelines, the current mechanism has several
352 | advantages. Unfortunately, the RISC-V privileged specs abused this
353 | mechanism, and now there are several hundred registers defined in this proprietary
354 | space, some of them even read-only, and obviously creating no security threats (like
355 | `mvendorid`, `marchid`, etc).
356 | 
357 | From the point of view of a microcontroller profile, this mechanism of accessing
358 | the system registers has two main disadvantages:
359 | 
360 | - requires assembly code to access each individual register
361 | - it is not supported by current development tools (debuggers have no way of accessing
362 | these registers, IDEs have no special views for them, etc).
363 | 
364 | Mapping system registers in the memory space is perfectly possible, and the RISC-V
365 | privileged specs even mandate that some registers, like `mtime` and `mtimecmp`,
366 | not to mention the PLIC, be memory mapped.
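
As an illustration of what memory mapping buys the C programmer, here is a minimal sketch of a register block accessed through an ordinary struct pointer. The layout, the field names and `SYS_REGS` are hypothetical and not taken from any spec; the block is backed by plain storage so that the sketch also runs on a host.

```c
#include <stdint.h>

/* Hypothetical register block; names and layout are illustrative only. */
typedef struct {
    volatile uint32_t vendorid; /* identification, read-only on hardware */
    volatile uint32_t archid;
    volatile uint32_t status;   /* read/write control bits */
} sys_regs_t;

/* On real hardware this would be a fixed address taken from the memory
 * map. Here it points to ordinary storage (an assumption made so the
 * example can run anywhere). */
sys_regs_t sim_sys_regs;
#define SYS_REGS (&sim_sys_regs)

/* Plain C read-modify-write; no inline assembly is involved. */
void sys_status_set_bits(uint32_t mask)
{
    SYS_REGS->status |= mask;
}

uint32_t sys_status_get(void)
{
    return SYS_REGS->status;
}
```

By contrast, reaching the same state through the CSR space needs a dedicated instruction for each register, typically wrapped in inline assembly.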
367 | 
368 | However, from a technical point of view, for virtual memory systems, accessing system
369 | registers from code running in user mode requires 'punching' some holes into the
370 | virtual memory space to reach the special memory mapped registers, which adds some
371 | complexity, and may wreak havoc on the pipelines. But, since the specs require this
372 | mechanism for `mtime` and `mtimecmp`,
373 | it no longer matters if there are two or more such memory mapped registers.
374 | 
375 | Fortunately, microcontrollers running without an MMU do not have this problem;
376 | accessing memory mapped registers is common practice, and the cost of doing so is
377 | perfectly acceptable.
378 | 
379 | In addition, in the microcontroller profile there are _no_ hardware security
380 | boundaries, so the risk of attacks somehow exploiting the CSR-as-MMIO is a
381 | non-issue.
382 | 
383 | ### The hardware stack limit register is expensive
384 | 
385 | > "The stack limit register needs to be read and compared on every
386 | store via the stack register so it should have dedicated read circuit
387 | and comparator."
388 | 
389 | Yes, it is a small price to pay, but by far the most common cause of crashes
390 | in a multi-threaded device is stack overflow, so detecting this exception
391 | should be worth the extra price.
392 | 
393 | ### Microcontrollers do not need privilege levels
394 | 
395 | > If microcontrollers do not run a kernel, why have privilege levels?
396 | 
397 | It is true that microcontrollers do not run a 'unix kernel' (they run a 'scheduler').
398 | But for some security concerned applications, microcontrollers can run the
399 | application code in unprivileged mode and the scheduler/drivers in
400 | privileged mode.
401 | 
402 | ARM Cortex-M devices can run code in unprivileged mode, and new
403 | Cortex-M23/M33 devices even have a TrustZone security feature.
404 | Also most of the Cortex-M devices have an MPU, which prevents unprivileged
405 | code from accessing system memory/registers.
406 | 
407 | Using the unprivileged mode is not at all unusual,
408 | [ARM CMSIS](http://www.keil.com/pack/doc/CMSIS/General/html/index.html), the industry
409 | software standard for Cortex-M devices, includes a component called CMSIS RTOS, and
410 | the reference implementation is
411 | [Keil RTX](http://www.keil.com/pack/doc/CMSIS/RTOS/html/rtxImplementation.html),
412 | which by default runs application code in unprivileged mode.
413 | 
414 | It is true that, despite all the ARM marketing, RTX is not the most successful RTOS,
415 | but even FreeRTOS has a mode in which the MPU can be activated.
416 | 
417 | ### C embedded system programmers vs C embedded application programmers
418 | 
419 | > C embedded systems programmers might be used to accessing peripherals via
420 | registers, but C embedded application programmers are used to accessing peripherals
421 | via system calls. C embedded system programmers are also used to writing ISRs in assembly.
422 | 
423 | The distinction between system and application programmers stands perfectly true for
424 | Unix-like systems, where a small team of highly experienced system programmers write
425 | the low level kernel code and the device drivers, allowing millions of application
426 | programmers to access all required resources via system calls, without bothering
427 | with details.
428 | 
429 | However, in the embedded world, this distinction is almost non-existent; C embedded
430 | programmers are both application and system programmers. On one hand they need
431 | full and unlimited control of the hardware, and on the other hand they would
432 | like to have access from high level C/C++ code.
433 | 
434 | Having to use assembly code is definitely not a joy for modern embedded programmers,
435 | especially since Cortex-M came to market in 2004 and made it possible to write interrupt
436 | handlers directly in C/C++, without any assembly stubs, millicode or compiler
437 | attributes/pragmas.
438 | 
439 | ### Comparisons with ARM are meaningless
440 | 
441 | > "Arguments like 'ARM does this' are very weak for a feature in RISC-V"
442 | 
443 | Well, the creators of the RISC-V instruction set probably have every reason to be
444 | proud of their design, and many in academia may consider that application
445 | class devices based on RISC-V may very well make similar ARM devices irrelevant,
446 | but in the embedded space, ARM Cortex-M **is** the industry standard, and
447 | disregarding it is not beneficial.
448 | 
449 | Except for the automotive industry, which is more conservative and where old
450 | proprietary cores still have a significant (but shrinking) market share, and some
451 | very cost driven applications,
452 | where 8-bit microcontrollers are still considered good enough, the majority of the
453 | silicon vendors
454 | now sell microcontrollers based on Cortex-M cores; the trend is clear,
455 | it has been observed for more than 10 years, and the Cortex-M market share is
456 | expected to continue to increase in the years to come.
457 | 
458 | ARM tried to sell licenses for microcontrollers even before the Cortex-M family
459 | was created, but with very limited success. The devices were very similar to
460 | their application cores, and used the same solutions, for example a single
461 | interrupt handler, and lots of assembly code required to start and make use of
462 | the core.
463 | 
464 | There may be multiple reasons why Cortex-M was so successful, but the main one
465 | probably is the ease of use, and the C-friendliness, by design.
466 | 
467 | This lessened the need for a C system programmer to act as guru, and allowed
468 | C application programmers to fully take control of their applications.
469 | 
470 | > "RISC-V microcontrollers should compare to PIC or AVR devices"
471 | 
472 | This was probably true 10-15 years ago, but today it is no longer the case.
473 | Not only has the industry migrated to 32-bit cores, but the ecosystems around
474 | Cortex-M and its ease of use have made most of the other cores irrelevant.
475 | 
476 | There is also another fact to be noted: according to several studies, the
477 | worldwide population of programmers is doubling every few years (let's
478 | say N, less than 10). Assuming a constant share for the embedded programmers,
479 | statistically half of them have less than N years of experience, and most
480 | of these new (relatively inexperienced) programmers met Cortex-M as their
481 | first architecture (which is already 14 years old!). They may have heard of
482 | PIC and AVR, but never had to write assembly interrupt handlers, and
483 | asking them to do so will obviously be seen as a major step backward.
484 | 
485 | ### Microcontrollers should not be on networks
486 | 
487 | > "Generally, microcontrollers should probably not be on networks, except
488 | possibly for multi-core versions that can handle real-time tasks on one
489 | core and network latency on the other."
490 | 
491 | Yes, multi-hart devices would be excellent for hard real-time applications,
492 | by allocating separate harts for each critical task,
493 | but with nested, pre-emptive high priority interrupts, even a single hart device
494 | can handle multiple tasks very well, and if the real-time tasks are driven by ISRs,
495 | then the network stack can run at a lower priority.
496 | 
497 | ### 64-bit microcontrollers will never be needed
498 | 
499 | > "Why 64-bit? Any system big enough to need more
500 | than 32-bit addressing is probably already running an operating system
501 | like Linux."
502 | 
503 | That's a good question. Back when the 8051 was king, many questioned
504 | why anyone would think of 16-bit microcontrollers. While some were
505 | debating this, vendors gradually offered devices with 12 address
506 | bits, then 16 bits, even 24 bits. Cortex-M came boldly and provided
507 | 32-bit registers and a large (32-bit) linear address space.
508 | 
509 | Although a 4 GiB memory space may be enough for most current devices,
510 | it should be noted that 64-bit devices bring not only a wider memory
511 | space, but also 64-bit registers, and native atomic 64-bit accesses.
512 | 
513 | Applications with lots of integer arithmetic may benefit from 64-bit
514 | cores, and, indirectly, applications manipulating double precision floating point
515 | numbers may also benefit.
516 | 
517 | Also applications with large and fast timers benefit from atomic 64-bit
518 | accesses, which otherwise require a lot of juggling on a 32-bit platform
519 | (see the recommended RISC-V mechanism to access the timer registers on
520 | a 32-bit device).
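
The juggling mentioned above can be sketched as the usual high/low/high re-read loop, in the spirit of the sequence the RISC-V user-level spec shows for reading 64-bit counters on RV32. The timer below is simulated: `sim_ticks` and the two `sim_read_*` helpers are hypothetical stand-ins for the real word accesses, and the simulated counter advances by one on every word read so that a carry between the two halves can be provoked.

```c
#include <stdint.h>

/* Simulated free-running 64-bit counter (hypothetical, for illustration). */
uint64_t sim_ticks;

/* Each 32-bit access returns one half, then lets the counter advance. */
uint32_t sim_read_low(void)
{
    uint32_t v = (uint32_t)sim_ticks;
    sim_ticks += 1;
    return v;
}

uint32_t sim_read_high(void)
{
    uint32_t v = (uint32_t)(sim_ticks >> 32);
    sim_ticks += 1;
    return v;
}

/* Read both halves and retry if the high half changed in between,
 * which means the low half wrapped around during the sequence. */
uint64_t read_timer64(void)
{
    uint32_t hi, lo, hi2;
    do {
        hi = sim_read_high();
        lo = sim_read_low();
        hi2 = sim_read_high();
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
```

A single high-then-low read started at `0xFFFFFFFF` in this simulation would pair the old high word with the wrapped low word and return 0; the retry loop avoids that at the cost of extra reads. On a 64-bit core the whole sequence collapses into one load.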
521 | 
522 | ## Proposed steps to change the current RISC-V specs
523 | 
524 | It is not realistic to expect a new set of RISC-V microcontroller specs to be
525 | adopted overnight. However, given the expected ratification of the current specs by
526 | the RISC-V Foundation, it is quite urgent to ensure that this process will not block
527 | further developments in the embedded/microcontroller space.
528 | 
529 | This will probably require several steps, but the main ones are:
530 | 
531 | - acknowledge that microcontroller devices have different requirements
532 | compared to systems running Unix-like operating systems
533 | - acknowledge that the solutions provided by the current privileged mode
534 | specs are not optimal for real-time, low power, bare-metal embedded applications
535 | - acknowledge the need for changes in the current specs
536 | - relax the requirements for the privileged specs
537 | - create new specs for the microcontroller profile
538 | 
539 | ### Acknowledge the need for the changes
540 | 
541 | Given the current structure of the RISC-V Foundation, with most of the efforts
542 | focused on finalising the specifications required for running Unix-like
543 | operating systems, acknowledging that the specifications for general
544 | purpose devices do not work very well for real-time systems will be a challenge.
545 | 
546 | ### De-entangle the privileged specs
547 | 
548 | The RISC-V Volume II, v1.10, mentions: _"... the entire privileged-level design
549 | described in this document could be replaced with an entirely different
550 | privileged-level design without changing the user-level ISA, and possibly without
551 | even changing the ABI. In particular, this privileged specification was designed
552 | to run existing popular operating systems, and so embodies the conventional
553 | level-based protection model. Alternate privileged specifications could embody
554 | other more flexible protection-domain models."_
555 | 
556 | So, at least in theory, it should be possible to extend the specs, but in
557 | practice it is not clear how exactly this can be done. Ideally, **Volume
558 | I should not explicitly refer to Volume II**, or should refer to it as optional,
559 | leaving room for a complementary specification for other classes of devices,
560 | including microcontroller devices.
561 | 
562 | As an aside, the RISC-V ISA specs provide a very high degree of flexibility
563 | allowing for custom extensions for the instruction set, but they are still very
564 | rigid by insisting that all these devices should be able to run Unix-like
565 | operating systems.
566 | 
567 | ### Move all mandatory CSRs to the privileged specs
568 | 
569 | Apart from relaxing the need for the privileged specs, the instruction set defined
570 | by Volume I is generally acceptable for microcontroller devices.
571 | 
572 | The only notable exception is in Chapter 2.8: the `rdcycle`, `rdtime` and `rdinstret`
573 | instructions, which should be moved to Volume II.
574 | 
575 | Related to these instructions, the list of CSRs defined in Table 19.3 should be
576 | shortened, by moving the `cycle`, `time` and `instret` to Volume II, allowing for
577 | microcontrollers to define a more efficient set of mandatory registers.
578 | 
579 | ### Remove the POSIX ABI from Volume I
580 | 
581 | Another important issue with the current specs is the mandatory use of the POSIX ABI,
582 | which is too expensive for real-time devices.
583 | 
584 | The solution is to move it either to Volume II, or to a separate assembly
585 | programmer's handbook, and allow a microcontroller profile to define
586 | an EABI (Embedded ABI), as a lighter version of the POSIX ABI.
587 | 
588 | 


--------------------------------------------------------------------------------