├── watchdog-timer.md ├── history.md ├── contributing.md ├── README.md ├── memory-map.md ├── device-control-block.md ├── startup.md ├── rtos-support-features.md ├── hart-control-block.md ├── introduction.md ├── real-time-clock.md ├── interrupt-controller.md ├── system-clock.md ├── eabi.md ├── csrs.md ├── exceptions-and-interrupts.md ├── interrupts-use-cases.md └── improvements-upon-privileged.md /watchdog-timer.md: -------------------------------------------------------------------------------- 1 | # The Device Watchdog Timer (WDT) 2 | 3 | TODO: define it. (consider E300 watchdog for inspiration) 4 | 5 | -------------------------------------------------------------------------------- /history.md: -------------------------------------------------------------------------------- 1 | # Appendix C: History 2 | 3 | The open source project was created on GitHub in October 2017 (as 4 | [The Embedded RISC-V Project](https://github.com/emb-riscv)), 5 | but initially there was no content available. 6 | 7 | Work on the first proposal of the specs started in late January 2018, with the 8 | text formatted as markdown; the preliminary version 0.1.1 was ready by the 9 | end of February 2018 and was submitted to selected readers for feedback. 10 | -------------------------------------------------------------------------------- /contributing.md: -------------------------------------------------------------------------------- 1 | # Appendix D: Contributing 2 | 3 | As for most open source projects, all contributions are welcome! 4 | 5 | ## Bugs 6 | 7 | Any mistakes that are identified, whether typos, logic mistakes, or flawed 8 | arguments, should be reported as Bugs in the 9 | [Issues](https://github.com/emb-riscv/specs-markdown/issues) section. 
10 | 11 | ## Enhancements 12 | 13 | Clearly defined proposals should be submitted as Enhancements in the 14 | [Issues](https://github.com/emb-riscv/specs-markdown/issues) section, 15 | or, even better, as 16 | [Pull requests](https://github.com/emb-riscv/specs-markdown/pulls). 17 | 18 | ### C/C++ use cases 19 | 20 | Proposals should be accompanied by use cases in C/C++ and by solid arguments for 21 | why the new solution is more efficient and/or easier to use than existing 22 | solutions. 23 | 24 | Please don't forget that the mission statement is to "define a modern 25 | C/C++ friendly architecture", so solutions that cannot be expressed in 26 | C/C++ need a very strong justification to be seriously considered. 27 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # The RISC-V Microcontroller Profile 2 | 3 | A proposal for a friendlier microcontroller architecture using the RISC-V instruction set. 4 | 5 | Version: 0.2.1-pre 6 | 7 | Editors: 8 | * Liviu Ionescu 9 | 10 | Warning: This draft specification is in a preliminary phase and may change at any time. For the moment it is more like a wish list than a real specs document. 
11 | 12 | 13 | ## Motto 14 | 15 | _"People are more expensive than transistors"._ 16 | 17 | ## Table of Contents 18 | 19 | * [Introduction](introduction.md) 20 | * [Memory Map](memory-map.md) 21 | * [The Startup Process](startup.md) 22 | * [Exceptions and Interrupts](exceptions-and-interrupts.md) 23 | * [Control and Status Registers (CSRs)](csrs.md) 24 | * [Hart Control Block (`hcb`)](hart-control-block.md) 25 | * [Hart Interrupt Controller (`hic`)](interrupt-controller.md) 26 | * [Device Control Block (`dcb`)](device-control-block.md) 27 | * [Device Real-Time Clock (`rtclock`)](real-time-clock.md) 28 | * [Device System Clock (`sysclock`)](system-clock.md) 29 | * [Device Watchdog Timer (`wdog`)](watchdog-timer.md) 30 | * [Embedded ABI (EABI)](eabi.md) 31 | * [RTOS Support Features](rtos-support-features.md) 32 | * [Appendix A: Improvements upon RISC-V privileged](improvements-upon-privileged.md) <--- Read Me First! 33 | * [Appendix B: Interrupts use cases](interrupts-use-cases.md) 34 | * [Appendix C: History](history.md) 35 | * [Appendix D: Contributing](contributing.md) 36 | 37 | TODO: 38 | 39 | - add MPU definitions 40 | - add more details about the restrictions in user mode. 41 | 42 | ## License 43 | 44 | This document is released under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/legalcode) license. 45 | -------------------------------------------------------------------------------- /memory-map.md: -------------------------------------------------------------------------------- 1 | # Memory map 2 | 3 | Generally the RISC-V microcontroller memory map is implementation specific; the only reserved area 4 | in the RISC-V microcontroller profile is a slice at the end of the memory space, called the 5 | **system control area**. 
6 | 7 | Typical RISC-V microcontroller devices have: 8 | 9 | - a **read-only code** area (typically flash), usually at 0x00000000, although the actual address is implementation 10 | specific 11 | - a **read-write data** area (typically RAM) 12 | - an implementation specific **peripheral** area. 13 | 14 | Multi-hart devices can share certain memory areas (code or data), but can also have hart-specific 15 | memory areas, or both shared and specific areas. 16 | 17 | ## The system control area 18 | 19 | The system control area is a slice of 256 MiB at the end of the memory space. This area 20 | must have the execute permissions removed, and attempts to execute code from it must trigger 21 | an exception (instruction access fault). 22 | 23 | For 32-bit devices, the system control area is **0xF0000000-0xFFFFFFFF**. 24 | 25 | For 64-bit devices, the system control area is **0xFFFFFFFF'F0000000-0xFFFFFFFF'FFFFFFFF**. 26 | 27 | The system control area is implemented as a set of memory-mapped address spaces, some providing control and status registers common to the entire 28 | device, and some providing control and status registers for the current hart: 29 | 30 | | Base | Top | Name | Description | 31 | |:-----|:----|:-----|-------------| 32 | | 0xF000'0000 | 0xF000'0FFF | `dcb` | The Device Control Block. | 33 | | 0xF000'1000 | 0xF000'1FFF | `sysclock` | The Device System Clock. | 34 | | 0xF000'2000 | 0xF000'2FFF | `rtclock` | The Device Real-Time Clock. | 35 | | 0xF000'3000 | 0xF000'3FFF | `wdog` | The Device Watchdog Timer. | 36 | | | | | | 37 | | 0xF100'0000 | 0xF100'0FFF | `hcb` | The Hart Control Block. | 38 | | 0xF100'2000 | 0xF100'3FFF | `hic` | The Hart Interrupt Controller. | 39 | 40 | Each hart has its own separate control block; all HCBs map to the same address, the internal 41 | logic being able to distinguish between them based on the ID of the hart requesting access; 42 | thus each hart can access only its own hart control block. 
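
The address ranges in the system control area table can be captured in C as a small lookup table. This is only a sketch for 32-bit devices, using the preliminary addresses from the table; the names `riscv_sysctrl_region_t`, `riscv_sysctrl_regions` and `riscv_sysctrl_find_region` are hypothetical and not defined by the profile.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one block in the system control area. */
typedef struct
{
  const char* name;
  uint32_t base; /* First address of the block. */
  uint32_t top;  /* Last address of the block (inclusive). */
} riscv_sysctrl_region_t;

/* Preliminary RV32 addresses, copied from the table in this page. */
static const riscv_sysctrl_region_t riscv_sysctrl_regions[] = {
  { "dcb",      0xF0000000u, 0xF0000FFFu },
  { "sysclock", 0xF0001000u, 0xF0001FFFu },
  { "rtclock",  0xF0002000u, 0xF0002FFFu },
  { "wdog",     0xF0003000u, 0xF0003FFFu },
  { "hcb",      0xF1000000u, 0xF1000FFFu },
  { "hic",      0xF1002000u, 0xF1003FFFu },
};

/* Return the block that contains `addr`, or NULL if the address is
   outside all blocks listed above. */
static const riscv_sysctrl_region_t*
riscv_sysctrl_find_region(uint32_t addr)
{
  size_t count = sizeof(riscv_sysctrl_regions) / sizeof(riscv_sysctrl_regions[0]);
  for (size_t i = 0; i < count; ++i)
    {
      if (addr >= riscv_sysctrl_regions[i].base
          && addr <= riscv_sysctrl_regions[i].top)
        {
          return &riscv_sysctrl_regions[i];
        }
    }
  return NULL;
}
```

Such a table could be useful, for example, in a simulator or in a fault handler that reports which block a faulting address belongs to.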
43 | 44 | The Hart Interrupt Controller is mapped per hart in the same way as the HCB. 45 | 46 | (the addresses are preliminary; more work is needed to find a solution that is easy to decode) 47 | 48 | 49 | TODO [PA]: reserve space in the system control area 50 | for the debug program buffer, in case an implementation chooses to 51 | make the buffer memory mapped. 52 | -------------------------------------------------------------------------------- /device-control-block.md: -------------------------------------------------------------------------------- 1 | # The Device Control Block (`dcb`) 2 | 3 | The DCB includes system registers that are common to the entire device and are not 4 | specific to any given hart. 5 | 6 | ## Memory map 7 | 8 | | Offset | Name | Width | Type | Reset | Description | 9 | |:-------|:-----|:------|:-----|:------|-------------| 10 | | 0x0000 | hartidmax | 32b | ro | 0x00000NNN | The highest value for hart ID in the device. | 11 | | 0x0004 | vendorid | 32b | ro | | Vendor ID. | 12 | | 0x0008 | archid | 32b | ro | | Architecture ID. | 13 | | 0x000C | impid | 32b | ro | | Implementation ID. | 14 | | 0x0010 | | | | | Reserved. | 15 | | 0x0020 | dcsr | | | | Debug CSR. | 16 | | 0x0028 | dpc | | | | Debug PC. | 17 | | 0x0030 | | | | | Reserved. | 18 | | 0x0100 | harts | | | | All harts interrupts. | 19 | 20 | ## The highest hart ID 21 | 22 | For multi-hart devices, reading this register returns the highest numerical value used as hart ID 23 | in the device. Single-hart devices must return 0. 24 | 25 | ## All harts interrupts 26 | 27 | For multi-hart devices, these registers allow one hart to pend interrupts to any other 28 | hart, and possibly to temporarily adjust the priority thresholds, to handle synchronization 29 | issues like priority inversion. 30 | 31 | > Use case must be further investigated. 32 | 33 | For single-hart devices, this area is reserved. 
34 | 35 | | Offset | Name | Width | Type | Reset | Description | 36 | |:-------|:-----|:------|:-----|:------|-------------| 37 | | 0x0000 | `hartid` | 32b | w | 0x00000000 | Access key. | 38 | | 0x0004 | `pendnum` | 32b | w | | Hart interrupt pending bits. | 39 | | 0x0008 | `prioth` | 32b | rw | 0x00000000 | Hart priority threshold. | 40 | | 0x000C | | | | | Reserved. | 41 | 42 | These registers have one additional bit of state. To prevent interrupts from being pended 43 | inadvertently, all writes to this area (`pendnum` and `prioth`) must be preceded by an 44 | unlock operation to the `hartid` register. The value (0x51F15000 + (Hart ID)) must be 45 | written to the `hartid` register to set the hart ID and the state bit before 46 | any write access to `pendnum` and `prioth`. 47 | The state bit is cleared at reset, and after any write to the `pendnum` or `prioth` registers. 48 | 49 | The `pendnum` register is write only and allows access to the hart identified 50 | by the write to `hartid`. Writing a small integer value pends the 51 | interrupt with the given number. The `pendnum` register must have enough bits to 52 | represent any interrupt number (at most 10). 53 | 54 | The `prioth` register is read/write and allows access to the hart identified 55 | by the write to `hartid`. It must have enough bits to represent any interrupt 56 | priority. 57 | 58 | Warning: The key mechanism has synchronisation problems if multiple harts access it 59 | simultaneously. Implementations can choose to allow access only from hart 0. 60 | 61 | ## RISC-V compatibility CSRs 62 | 63 | The RISC-V Volume I, Chapter 2.8, mandates the `rdtime` instruction. The time value can be 64 | retrieved from the Device Real-Time Clock counter (`rtclock.cnt`). 
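
Returning to the `hartid` key mechanism described under "All harts interrupts": the unlock-then-write sequence could be wrapped in a small C helper, roughly as sketched below. The structure layout follows the register table in this page, but the names `riscv_dcb_harts_t`, `RISCV_DCB_HARTS_KEY` and `riscv_dcb_pend_interrupt` are hypothetical, and the helper takes the register block as a pointer so that the sketch does not depend on the (preliminary) absolute address.

```c
#include <stdint.h>

/* Hypothetical C view of the "all harts interrupts" registers. */
typedef struct
{
  volatile uint32_t hartid;  /* 0x0000: access key (write only). */
  volatile uint32_t pendnum; /* 0x0004: interrupt number to pend (write only). */
  volatile uint32_t prioth;  /* 0x0008: priority threshold. */
  uint32_t reserved;         /* 0x000C: reserved. */
} riscv_dcb_harts_t;

/* Key base value, as given in the text. */
#define RISCV_DCB_HARTS_KEY 0x51F15000u

/* Pend interrupt `int_num` on hart `hart_id`. */
static inline void
riscv_dcb_pend_interrupt(riscv_dcb_harts_t* harts, uint32_t hart_id,
                         uint32_t int_num)
{
  /* Unlock: selects the target hart and sets the state bit. */
  harts->hartid = RISCV_DCB_HARTS_KEY + hart_id;
  /* The write itself; per the text, the hardware clears the state bit
     after this access. */
  harts->pendnum = int_num;
}
```

On a device, the pointer would presumably be the DCB base plus the 0x0100 offset of the `harts` area. As the warning above notes, the two writes are not atomic with respect to other harts using the same key register.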
65 | 66 | Other RISC-V registers from RISC-V Volume II: 67 | 68 | - mvendorid 69 | - marchid 70 | - mimpid 71 | 72 | Other registers that might need attention: 73 | 74 | - required for the debug module (are these per-hart or per-device?): 75 | - dcsr 76 | - dpc 77 | - optional for the debug module: 78 | - dscratch0 79 | - dscratch1 80 | -------------------------------------------------------------------------------- /startup.md: -------------------------------------------------------------------------------- 1 | # Device startup 2 | 3 | After reset, all harts in a RISC-V microcontroller start executing code, identified by a 4 | per-hart **startup block**. 5 | 6 | The location of the hart startup block is implementation specific. The typical 7 | configuration with a single hart has the startup block located at the beginning 8 | of the memory space (usually address 0x00000000). 9 | 10 | If multiple harts share a memory area to fetch code (like a flash area), the 11 | startup blocks are organised as an array located at the beginning of the shared 12 | memory area. If different harts have different memory areas, the startup blocks 13 | are located at the beginning of each area. 14 | 15 | For a RISC-V hart, the minimum information required to start a hart is: 16 | 17 | - a pointer to the startup routine 18 | - a pointer to the main stack (`spm`) 19 | - a pointer to the RISC-V global pointer (`gp`) 20 | - a pointer to the exception table 21 | 22 | All pointers are xlen bits. 23 | 24 | For further extensions, a few words at the end of the startup area are reserved. 25 | 26 | > The pointer to the exception table must be known by the hart before entering 27 | the startup code, to catch possible execution faults in the startup code. 28 | 29 | ## Usage 30 | 31 | With the above definition of a startup block, there is no need for any assembly 32 | instructions, the entire startup code can be written in C/C++. 
33 | 34 | ```c 35 | 36 | extern "C" { 37 | 38 | typedef void (*riscv_exception_handler_t)(void); 39 | 40 | typedef struct 41 | { 42 | void (*startup)(void); 43 | void* main_stack_pointer; 44 | void* global_pointer; 45 | riscv_exception_handler_t* exception_handlers; 46 | void* reserved[4]; 47 | } riscv_startup_block_t; 48 | 49 | riscv_startup_block_t 50 | __attribute__((section(".startup_blocks"))) 51 | harts_startup_blocks[] = { 52 | { 53 | hart0_startup, 54 | hart0_stack_pointer, 55 | hart0_global_pointer, 56 | hart0_exception_handlers 57 | }, 58 | { 59 | hart1_startup, 60 | hart1_stack_pointer, 61 | hart1_global_pointer, 62 | hart1_exception_handlers 63 | } 64 | }; 65 | 66 | [[noreturn]] void 67 | hart0_startup(void) 68 | { 69 | // ... 70 | } 71 | 72 | [[noreturn]] void 73 | hart1_startup(void) 74 | { 75 | // ... 76 | } 77 | 78 | } // extern "C" 79 | ``` 80 | 81 | ### Prerequisites 82 | 83 | The linker script must allocate the `.startup_blocks` section at the implementation 84 | specific address (usually 0x00000000). 85 | 86 | ## Implementation 87 | 88 | TODO: define a format to express the pseudocode. Possibly Scala? 89 | 90 | After reset, each hart executes the following sequence, with `hid` being its hart ID: 91 | 92 | ``` 93 | start_hart(int hid) 94 | { 95 | // Identify the per-hart startup block. 96 | addr = (word_size * 8) * hid; 97 | 98 | // Clear all hart registers. 99 | hart[hid].x0 = 0; 100 | hart[hid].x1 = 0; 101 | // ... 102 | // Store the exception pointer in the hart specific register. 103 | hart[hid].excvta = *(addr + word_size * 3); 104 | 105 | // Load global pointer. 106 | hart[hid].gp = *(addr + word_size * 2); 107 | // Load main stack pointer. 108 | hart[hid].sp = *(addr + word_size * 1); 109 | // Load program counter; this will immediately pass control to the startup code. 
110 | hart[hid].pc = *(addr + word_size * 0); 111 | } 112 | ``` 113 | -------------------------------------------------------------------------------- /rtos-support-features.md: -------------------------------------------------------------------------------- 1 | # RTOS Support Features 2 | 3 | The RISC-V microcontroller profile is designed not only to be C/C++ friendly, but also with RTOS support in mind. 4 | 5 | To make RTOS implementations easier and more efficient, the following features are available: 6 | 7 | ## Shadow thread stack pointer 8 | 9 | Two stack pointers are available, the main stack pointer (MSP) and the thread stack pointer (TSP). 10 | 11 | The main stack is the default stack available after reset, and all exceptions and interrupts create 12 | a stack frame on the main stack. 13 | 14 | If the application switches to Thread mode, the hart switches to the TSP, while interrupts continue 15 | to use the MSP. 16 | 17 | This solution has several advantages: 18 | 19 | * the stack space for each thread needs to cover only the thread's own needs, without accounting for 20 | possibly large stack usage in ISRs; 21 | * if a thread corrupts its stack, it is still likely that the stacks used by the interrupts and 22 | other threads are intact, thus improving system reliability. 23 | 24 | ## Stack pointer limit 25 | 26 | One of the most common failure cases that occurs while developing multi-threaded applications is 27 | for one thread to exceed its stack and damage the surrounding memory content. 28 | 29 | The solution is to add a system register with a memory address used as the lower limit for the stack. 30 | While pushing words onto the stack, the address is compared against the limit and, if the limit is reached, an exception is 31 | raised. 32 | 33 | ## System clock timer 34 | 35 | The system clock timer is intended to drive the RTOS scheduler, and to allow measuring durations 36 | (like timeouts) during normal system operations. 
37 | 38 | Having an architecture timer allows the RTOS to implement the scheduler code only once, without 39 | relying on device specific timers, which require separate initialisation and interrupt handlers for 40 | each specific device. 41 | 42 | ## Real-time clock 43 | 44 | The real-time clock is intended to provide the application a way of keeping track of time while the 45 | device is in sleep mode (and the system clock timer is shut down). 46 | 47 | Having an architecture RTC allows the code that manages the absolute time to be written only once inside the RTOS, 48 | without relying 49 | on device specific timers, which require separate initialisation and interrupt handlers for 50 | each specific device. 51 | 52 | ## Context-Switch interrupt 53 | 54 | The context switch interrupt is usually the lowest priority interrupt, and is used as the single point 55 | of handling context switches, allowing all other interrupt handlers to be written in C/C++ without 56 | dealing with context switches at all. 57 | 58 | Without such a feature, all application interrupt handlers require an assembly part to handle the 59 | context switching prior to calling the C/C++ handler, which is a major hassle. 60 | 61 | ## Interrupts priorities threshold 62 | 63 | Having a mechanism to disable only interrupts below a certain threshold greatly improves the real-time 64 | characteristics of a system, by not having to disable all interrupts while handling the system 65 | data structures. By raising the priority threshold instead of completely disabling interrupts, it 66 | is possible to keep fast interrupts still active, regardless of how busy the RTOS itself is. 67 | 68 | ## Hart soft reset 69 | 70 | The hart soft reset is intended to reset the running hart from within. 71 | 72 | ## Device soft reset 73 | 74 | The device soft reset is intended to reset the entire device from within. 
75 | 76 | Having an architecture soft reset allows the device reset code to be written only once 77 | inside the RTOS, without relying on device specific code. 78 | 79 | ## User mode 80 | 81 | For security-critical applications, the user mode can also be used in conjunction with the Memory 82 | Protection Unit (MPU), thus further enhancing the robustness of embedded systems. 83 | 84 | ## Atomics 85 | 86 | For multi-hart devices, the RISC-V 'A' Standard Extension for Atomic instructions contains 87 | instructions that atomically 88 | read-modify-write memory to support synchronization between multiple RISC-V harts running in 89 | the same memory space. 90 | -------------------------------------------------------------------------------- /hart-control-block.md: -------------------------------------------------------------------------------- 1 | # The Hart Control Block (HCB) 2 | 3 | For uniform access by software, in addition to CSRs, each hart maps its own status registers to the 4 | same address in the memory space. 5 | 6 | ## Memory Map 7 | 8 | ### RV64 devices 9 | 10 | | Offset | Name | Width | Type | Reset | Description | 11 | |:-------|:-----|:------|:-----|:------|-------------| 12 | | 0x0000 | `excvta` | 64b | rw | Startup | Exceptions vector table address. | 13 | | 0x0008 | `intvta` | 64b | rw | 0x00000000'00000000 | Interrupts vector table address. | 14 | | 0x0010 | `intlast` | 64b | ro | | The index of the last interrupt in the HIC table. | 15 | | 0x0018 | `sysclockcmp` | 64b | rw | 0x00000000'00000000 | System clock comparator. | 16 | | 0x0020 | `rtclockcmp` | 64b | rw | 0x00000000'00000000 | Real-time clock comparator. | 17 | | 0x0028 | | | | | Reserved. | 18 | | 0x00F0 | `cyclecnt` | 64b | ro | 0x00000000'00000000 | Cycle count. | 19 | | 0x00F8 | `instcnt` | 64b | ro | 0x00000000'00000000 | Instructions count. 
| 20 | 21 | ### RV32 devices 22 | 23 | | Offset | Name | Width | Type | Reset | Description | 24 | |:-------|:-----|:------|:-----|:------|-------------| 25 | | 0x0000 | `excvta` | 32b | rw | Startup | Exceptions vector table address. | 26 | | 0x0004 | | | | | Reserved. | 27 | | 0x0008 | `intvta` | 32b | rw | 0x00000000 | Interrupts vector table address. | 28 | | 0x000C | | | | | Reserved. | 29 | | 0x0010 | `intlast` | 32b | ro | | The index of the last interrupt in the HIC table. | 30 | | 0x0014 | | | | | Reserved. | 31 | | 0x0018 | `sysclockcmpl` | 32b | rw | 0x00000000 | Low word of system clock comparator. | 32 | | 0x001C | `sysclockcmph` | 32b | rw | 0x00000000 | High word of system clock comparator. | 33 | | 0x0020 | `rtclockcmpl` | 32b | rw | 0x00000000 | Low word of real-time clock comparator. | 34 | | 0x0024 | `rtclockcmph` | 32b | rw | 0x00000000 | High word of real-time clock comparator. | 35 | | 0x0028 | | | | | Reserved. | 36 | | 0x00F0 | `cyclecntl` | 32b | ro | 0x00000000 | Low word of cycle count. | 37 | | 0x00F4 | `cyclecnth` | 32b | ro | 0x00000000 | High word of cycle count. | 38 | | 0x00F8 | `instcntl` | 32b | ro | 0x00000000 | Low word of instructions count. | 39 | | 0x00FC | `instcnth` | 32b | ro | 0x00000000 | High word of instructions count. | 40 | 41 | ## Exceptions vector table address (`excvta`) 42 | 43 | An xlen-bit register that holds the address of the exceptions dispatch table. 44 | The table is an array of addresses 45 | (xlen size elements) pointing to exception handlers (C/C++ functions). 46 | 47 | The register is initialised with the value fetched from the hart startup block. 48 | 49 | If not set (i.e. 0x0) and an exception occurs, the behaviour is undefined. 50 | 51 | ## Interrupts vector table address (`intvta`) 52 | 53 | An xlen-bit register that holds the address of the interrupts dispatch table. 54 | The table is an array of addresses 55 | (xlen size elements) pointing to interrupt handlers (C/C++ functions). 
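
As an illustration of what such a dispatch table might look like in C, the sketch below defines an array of handler pointers and a small software model of the dispatch step. The handler names, the assignment of interrupt numbers and the `example_dispatch` function are all hypothetical; on a device the dispatch is performed by hardware, and the table address would be written to `intvta`.

```c
#include <stddef.h>

typedef void (*riscv_interrupt_handler_t)(void);

/* Hypothetical handlers; each is a plain C function. */
static unsigned int example_timer_ticks;

static void
example_default_handler(void)
{
}

static void
example_timer_handler(void)
{
  ++example_timer_ticks;
}

/* The dispatch table: one pointer-sized entry per interrupt number.
   The numbering used here is an assumption made for this sketch. */
static riscv_interrupt_handler_t const example_interrupt_handlers[] = {
  example_default_handler, /* interrupt 0 */
  example_timer_handler,   /* interrupt 1 */
};

/* Software model of the dispatch: index the table by interrupt number
   and call the handler found there. */
static void
example_dispatch(riscv_interrupt_handler_t const* table, size_t int_num)
{
  table[int_num]();
}
```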
56 | 57 | If not set (i.e. 0x0) and an interrupt occurs, an exception is 58 | triggered (TODO: what exception?). 59 | 60 | If the hart does not implement an interrupt controller, writes to this register 61 | are ignored and reads always return zero. This mechanism can also be used 62 | to determine at runtime if the hart implements an interrupt controller. 63 | 64 | ## The highest interrupt number (`intlast`) 65 | 66 | The `intlast` read-only register is 32-bit and holds the highest interrupt number; it is 67 | useful when iterating the Hart Interrupt Controller array. 68 | 69 | ## The system clock comparator 70 | 71 | See the Device System Clock page. 72 | 73 | ## The real-time clock comparator 74 | 75 | See the Device Real-Time Clock page. 76 | 77 | ## Cycle count 78 | 79 | The `cyclecnt` register is 64-bit wide and holds a count of the number of clock cycles 80 | executed by the core on which the hart is running (not the hart itself!) from an 81 | arbitrary start time in the past. In practice, the underlying 64-bit counter should never 82 | overflow between two samples. The rate at which the cycle counter advances will depend 83 | on the implementation and operating environment. The execution environment 84 | should provide a means to determine the current rate (cycles/second) at which 85 | the cycle counter is incrementing. 86 | 87 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions. 88 | RV32 devices expose separate high/low 32-bit registers. 89 | 90 | ## Instructions count 91 | 92 | The `instcnt` register is 64-bit wide and counts the number of instructions executed 93 | by this hart from some arbitrary start point in the past. 94 | 95 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions. 96 | RV32 devices expose separate high/low 32-bit registers. 
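
Because RV32 devices expose the 64-bit counters as two 32-bit halves, software has to guard against a carry from the low word into the high word between the two reads. A common technique, sketched here with the hypothetical helper `riscv_read_counter64`, is to read high, low, then high again and retry if the high word changed. The function takes pointers to the two halves (for example `cyclecntl`/`cyclecnth`) so that it does not depend on any particular register address.

```c
#include <stdint.h>

/* Read a 64-bit counter exposed as two 32-bit registers.
   `lo` and `hi` point to the low and high words respectively. */
static inline uint64_t
riscv_read_counter64(const volatile uint32_t* lo, const volatile uint32_t* hi)
{
  uint32_t h;
  uint32_t l;
  do
    {
      h = *hi;
      l = *lo;
      /* If the high word changed, the low word wrapped between the two
         reads; sample both again. */
    }
  while (*hi != h);
  return ((uint64_t) h << 32) | l;
}
```

The loop normally runs once; a second iteration is needed only when the low word wraps between the reads.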
97 | 98 | -------------------------------------------------------------------------------- /introduction.md: -------------------------------------------------------------------------------- 1 | # Chapter 1: Introduction 2 | 3 | This is a draft of the **RISC-V microcontroller architecture** description document. 4 | [Feedback](contributing.md) welcome. 5 | 6 | ## Mission Statement 7 | 8 | Define a **modern C/C++ friendly** microcontroller architecture based on the RISC-V 9 | instruction set, that makes writing embedded software **easier** and **more productive**. 10 | And... enjoy the process! 11 | 12 | In technical terms, the mission statement can be rephrased as: define a set of 13 | specifications for **RISC-V microcontrollers** intended for **embedded** **real-time** 14 | / **low power** / **IoT** applications that do not require an operating system. 15 | Favour **C/C++** multi-threaded **RTOS** systems. 16 | 17 | A secondary goal is to improve the RISC-V microcontroller profile to the point where 18 | it can be adopted by the RISC-V foundation as an alternate standard for microcontroller 19 | devices. 20 | 21 | ## Limitations 22 | 23 | These specifications intentionally **do not** include application class devices which 24 | use virtual memory and/or have supervisor/hypervisor modes which are intended to run 25 | operating system kernels. For this class of devices, see the "RISC-V Privileged 26 | Architecture" specifications. 
27 | 28 | ## Sub-profiles 29 | 30 | Since there are many microcontroller configurations, three classes were identified: 31 | 32 | - **ES** (embedded small) **ES-RV32E** if possible, otherwise **ES-RV32I[M][C]**: 33 | **low end**, single hart, 34 | 32-bit, no floating point, no unprivileged mode (intended to support legacy PIC & AVR class 35 | applications; comparable with Cortex-M0) 36 | - **EM** (embedded medium) **EM-RV32IM[F[D]]C** / **EM-RV64IM[F[D]]C**: 37 | **regular**, single hart, 32/64-bit, possibly with floating point 38 | (intended to support common multi-threaded applications; comparable with 39 | Cortex-M3/M4) 40 | - **EL** (embedded large) **EL-RV32IMA[F[D]]C** / **EL-RV64IMA[F[D]]C**: 41 | **high end**, multi-hart/multi-core, 32/64-bit, atomics, possibly with floating point 42 | (intended to support hard real-time, high performance applications) 43 | 44 | ## Benefits 45 | 46 | One of the mantras used during the RISC-V design was "if it can be done 47 | in software, it should not be done in hardware." 48 | 49 | The microcontroller profile reconsidered the implementation of some 50 | core features (like stack handling), and pushed them back to hardware, 51 | where they belong. 52 | 53 | 54 | 55 | Some of the benefits are: 56 | 57 | - best interrupt latency, more appropriate for real-time applications 58 | - improved robustness for multi-threaded applications 59 | - much easier to use directly in C/C++ 60 | 61 | > The RISC-V microcontroller profile is created with developers in 62 | mind, to make developers happy. Happy developers write better 63 | applications, making final users happy as well. 64 | 65 | ## Definitions 66 | 67 | ### Hart 68 | 69 | Hart is a contraction of _hardware thread_ and represents a hardware resource. 70 | 71 | Technically, a hart is a resource abstraction representing an independently 72 | advancing RISC-V execution context within a RISC-V execution environment. 
73 | 74 | A RISC-V execution context contains a full set of RISC-V architectural registers. 75 | 76 | A hart executes its program independently from other harts in a RISC-V system. 77 | "Execute independently" means that each hart will 78 | eventually fetch and execute its next instruction in program order regardless 79 | of the activity of other harts (at least at user level). 80 | 81 | #### RISC-V microcontroller specifics 82 | 83 | Harts are identified by a Hart ID, a small unsigned integer. Hart IDs are unique. 84 | The rule used to assign hart IDs is implementation specific, but it is recommended 85 | to keep it simple, preferably within a contiguous small range. There should always 86 | be a hart with ID=0, which will have slightly more duties, for example to process 87 | the NMIs. 88 | 89 | To help applications auto-configure themselves, the largest hart ID is stored in 90 | a register in the Device Control Block (`dcb.hartidmax`). 91 | 92 | ### Core 93 | 94 | A RISC-V device can contain one or more RISC-V-compatible processing cores 95 | together with other non-RISC-V-compatible cores. 96 | 97 | A core is usually considered a purely physical thing. 98 | 99 | A core implements one or more harts, where if there are multiple harts, they are 100 | time-multiplexing some common hardware components (e.g., instruction fetch, 101 | physical registers, ALUs, predictor state, etc.) 102 | 103 | ### CSRs 104 | 105 | Control and Status Registers (CSRs) are used for hart-specific state only. CSRs 106 | are not memory mapped; they are accessed by CSR instructions. 
107 | 108 | -------------------------------------------------------------------------------- /real-time-clock.md: -------------------------------------------------------------------------------- 1 | # The Device Real-Time Clock (DRTC) 2 | 3 | ## Overview 4 | 5 | The **Device Real-Time Clock** is intended to support the implementation of the ISO/IEC 14882:2011 6 | `system_clock` (§ 20.11.7.1) and `steady_clock` (§ 20.11.7.2) classes. Objects of class 7 | `system_clock` represent wall clock time from the system-wide real-time clock. Objects of 8 | class `steady_clock` represent clocks for which values of `time_point` never decrease as 9 | physical time advances and for which values of `time_point` advance at a steady rate 10 | relative to real time. That is, the clock may not be adjusted. 11 | 12 | All harts in a RISC-V device share the same Device Real-Time Clock counter, but each hart may 13 | have its own comparator. 14 | 15 | Even when the device is halted in Debug state, the clock counter continues to be incremented. 16 | 17 | The real-time clock is inspired by the `mtime`/`mtimecmp` definitions in the RISC-V privileged specs, 18 | but it differs by having a control register and not being intended to drive the scheduler clock. 19 | 20 | ## Power domain 21 | 22 | To support full functionality, the real-time clock should run even when the 23 | rest of the system is powered down, so it must be located in a different frequency/voltage 24 | domain from the cores. 25 | 26 | ## Clock input 27 | 28 | The real-time clock input frequency is fixed to a device or application specific value. 29 | 30 | To support low-power devices, the real-time clock input should be a low frequency oscillator; the actual 31 | source is implementation specific. 32 | 33 | > Common implementations use a 32.768 kHz quartz crystal or oscillator. 
34 | Low frequency internal RC oscillators (for example 40 kHz) can also be used, but the application 35 | must calibrate the frequency using a higher accuracy source. With a typical 32 kHz input, 36 | the clock resolution 37 | is about 30 µs and it takes about 17 million years to overflow. 38 | 39 | > The real-time clock is usually not suitable for driving the RTOS tick timer, since either 40 | it is not accurate enough, or its frequency does not allow the common 1000 Hz scheduler rate; 41 | use the system clock instead. 42 | 43 | ## Memory map 44 | 45 | RV64 devices 46 | 47 | | Offset | Name | Width | Type | Reset | Description | 48 | |:-------|:-----|:------|:-----|:------|-------------| 49 | | 0x0000 | `ctrl` | 32b | rw | 0x00000003 | Control and status register. | 50 | | 0x0008 | `cnt` | 64b | ro | Undefined | RTC timer counter. | 51 | | 0x0010 | `cmp` | 64b | rw | Undefined | RTC comparator. | 52 | 53 | RV32 devices 54 | 55 | | Offset | Name | Width | Type | Reset | Description | 56 | |:-------|:-----|:------|:-----|:------|-------------| 57 | | 0x0000 | `ctrl` | 32b | rw | 0x00000003 | Control and status register. | 58 | | 0x0008 | `cntl` | 32b | ro | Undefined | Low word of RTC counter. | 59 | | 0x000C | `cnth` | 32b | ro | Undefined | High word of RTC counter. | 60 | | 0x0010 | `cmpl` | 32b | rw | Undefined | Low word of RTC comparator. | 61 | | 0x0014 | `cmph` | 32b | rw | Undefined | High word of RTC comparator. | 62 | 63 | TODO: define the mechanism to clear the counter. At each enable? 64 | 65 | ## The clock control and status register 66 | 67 | Controls the RTC and provides status data. 68 | 69 | By default, the RTC starts disabled; software must enable it during startup. 70 | 71 | | Bits | Name | Type | Reset | Description | 72 | |:-----|:-----|:-----|:------|-------------| 73 | | [0] | `enable` | rw | 0 | Indicates the enabled status of the RTC counter:
0 - Counter is disabled (default).
1 - Counter is enabled. | 74 | | [2:1] | `source` | rw | 0b11 | Indicates the clock source:
0b00 - Implementation specific external reference clock.
0b01 - Reserved.
0b10 - Factory-trimmed on-chip oscillator.
0b11 - External crystal oscillator (default). | 75 | | [31:3] |||| Reserved. | 76 | 77 | 78 | ## The clock counter register 79 | 80 | The real-time clock time point register is a 64-bit counter, common on all RV32 and RV64 devices. 81 | 82 | To guarantee the steadiness characteristic of the clock, the register is read-only. 83 | 84 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions. 85 | RV32 devices expose separate high/low 32-bit registers. 86 | 87 | ## The clock comparator register 88 | 89 | In addition to keeping track of time, the real-time clock can also be used to 90 | trigger periodic interrupts. Low-power devices 91 | can use the real-time clock to wake up the entire RISC-V device from implementation 92 | specific sleep modes. 93 | 94 | The comparator register causes an `rtclock_cmp` interrupt to be posted when the 95 | counter register 96 | contains a value greater than or equal to the value in the comparator register. 97 | The interrupt remains posted until it is cleared by writing to the comparator register. 98 | 99 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions. 100 | RV32 devices expose separate high/low 32-bit registers. 101 | 102 | ## Usage 103 | 104 | The memory mapped registers are available via a set of structures, directly available in C/C++. 105 | 106 | RV64 devices: 107 | 108 | - `rtclock.ctrl` 109 | - `rtclock.cnt` 110 | - `hcb.rtclockcmp` 111 | 112 | RV32 devices: 113 | 114 | - `rtclock.ctrl` 115 | - `rtclock.cntl` 116 | - `rtclock.cnth` 117 | - `hcb.rtclockcmpl` 118 | - `hcb.rtclockcmph` 119 | 120 | ```c 121 | uint64_t 122 | riscv_rtclock_read_cnt(void) 123 | { 124 | #if __riscv_xlen == 32 125 | // Consistent 64-bit read. The loop body usually runs once; it runs a 126 | // second time only if the low word carried into the high word between the reads.
127 | while (true) 128 | { 129 | uint32_t hi = rtclock.cnth; 130 | uint32_t lo = rtclock.cntl; 131 | if (hi == rtclock.cnth) 132 | { 133 | return ((uint64_t) hi << 32) | lo; 134 | } 135 | } 136 | #else 137 | return rtclock.cnt; 138 | #endif 139 | } 140 | 141 | uint64_t 142 | riscv_rtclock_read_cmp(void) 143 | { 144 | #if __riscv_xlen == 32 145 | return ((uint64_t) hcb.rtclockcmph << 32) | hcb.rtclockcmpl; 146 | #else 147 | return hcb.rtclockcmp; 148 | #endif 149 | } 150 | 151 | void 152 | riscv_rtclock_write_cmp(uint64_t value) 153 | { 154 | #if __riscv_xlen == 32 155 | // Write low as all ones; the result is no smaller than the old value. 156 | hcb.rtclockcmpl = UINT32_MAX; 157 | // Write high; the result is no smaller than the new value. 158 | hcb.rtclockcmph = ((uint32_t) (value >> 32)); 159 | // Write low; the comparator now holds the new value. 160 | hcb.rtclockcmpl = ((uint32_t) value); 161 | #else 162 | hcb.rtclockcmp = value; 163 | #endif 164 | } 165 | ``` 166 | -------------------------------------------------------------------------------- /interrupt-controller.md: -------------------------------------------------------------------------------- 1 | # The Hart Interrupt Controller (HIC) 2 | 3 | The RISC-V microcontroller profile provides a nested vectored interrupt controller as part of 4 | the common specifications. 5 | 6 | Each hart may be able to process its own set of interrupts, independent from the other harts. 7 | Only hart 0 is required to implement a HIC; additional interrupt controllers in all other 8 | harts are optional and implementation specific. 9 | 10 | > Hard real-time devices may dedicate separate harts to process fast interrupts. 11 | It is possible to wire all interrupts to all harts, and decide in software which interrupts 12 | are processed by each hart.
13 | 14 | ## Features 15 | 16 | The HIC supports the following features: 17 | 18 | - HIC interrupts can be enabled and disabled by writing to their corresponding `status.enabled` or 19 | `status.clearenabled` bit fields, using a write-1-to-enable and write-1-to-clear policy. 20 | 21 | When an interrupt is disabled, interrupt assertion causes the interrupt to become pending, but the interrupt 22 | cannot become active. If an interrupt is active when it is disabled, it remains in the active state until 23 | this is cleared by a reset or an exception return. Clearing the enable bit prevents any new activation of 24 | the associated interrupt. 25 | 26 | An implementation can hard-wire interrupt enable bits to zero if the associated interrupt line does not 27 | exist, or hard-wire them to one if the associated interrupt line cannot be disabled. 28 | 29 | - the pending state of HIC interrupts can be set or cleared by software using 30 | the `status.pending` and `status.clearpending` bit fields. The bit fields use a write-1-to-set and 31 | write-1-to-clear policy. Writing 1 to a bit in the `status.clearpending` bit field has no effect on the 32 | execution status of an active interrupt. 33 | 34 | It is implementation specific, for each interrupt line supported, whether the interrupt supports either or both 35 | setting and clearing of the associated pending state under software control. 36 | 37 | - status bits are provided to allow software to determine whether an interrupt is active, pending, or enabled. 38 | - HIC interrupts are prioritized by updating a priority field. Priorities are maintained according to the RISC-V 39 | prioritization scheme. 40 | - HIC supports a maximum of 1024 interrupts. 41 | 42 | ## Memory map 43 | 44 | | Offset | Name | Width | Type | Reset | Description | 45 | |:-------|:-----|:------|:-----|:------|-------------| 46 | | 0x0000 | `interrupts[]` | 32b * 2 * N | rw | 0x00000000 | Array of interrupt control registers.
| 47 | 48 | The number of interrupts (N) is implementation specific, but no higher than 1024, including the system interrupts. 49 | 50 | Total size: 0x2000 (1024 entries of 8 bytes each). 51 | 52 | ## Per interrupt registers 53 | 54 | Each interrupt has a small per-hart set of status and configuration attributes: 55 | 56 | * `enabled`: interrupts can either be disabled (default) or enabled 57 | * `pending`: interrupts can either be pending (a request is waiting to be served) or not 58 | pending 59 | * `active`: interrupts can either be in an active (being served) or inactive state 60 | * `prio`: interrupt priority 61 | 62 | To store and control these attributes, each interrupt has two 32-bit registers: 63 | 64 | | Offset | Name | Width | Type | Reset | Description | 65 | |:-------|:-----|:------|:-----|:------|-------------| 66 | | 0x0000 | `prio` | 32b | rw | 0x00000000 | The interrupt priority register. | 67 | | 0x0004 | `status` | 32b | rw | 0x00000000 | The interrupt status and control register. | 68 | 69 | The `prio` register has the following content: 70 | 71 | | Bits | Name | Type | Reset | Description | 72 | |:-----|:-----|:-----|:------|-------------| 73 | | [N:0] | `prio` | rw | 0 | The interrupt priority. | 74 | | [(xlen-1):(N+1)] | | | | Reserved. | 75 | 76 | N is the number of bits required to store the maximum priority level, and is implementation 77 | specific. It must match the number of bits used by the `iprioth` CSR. 78 | 79 | The `status` register has the following content: 80 | 81 | | Bits | Name | Type | Reset | Description | 82 | |:-----|:-----|:-----|:------|-------------| 83 | | [0] | `enabled` | rw1s | 0 | Enabled status bit; 1 if the interrupt is enabled.
When 1 is written, the `enabled` bit is set. | 84 | | [1] | `pending` | rw1s | 0 | Pending status bit; 1 if the interrupt is pending.
When 1 is written, the `pending` bit is set. | 85 | | [2] | `active` | r | 0 | Active status bit; 1 if the interrupt is active. | 86 | | [3] |||| Reserved | 87 | | [4] | `clearenabled` | w1c | | When 1 is written, the `enabled` status bit is cleared. | 88 | | [5] | `clearpending` | w1c | | When 1 is written, the `pending` status bit is cleared. | 89 | | [31:6] |||| Reserved | 90 | 91 | > The alternative to packing all status and control bits related to an interrupt 92 | in two words would be to have separate multi-word fields with status, enable, disable, 93 | set pending, clear pending, active bits. It was considered that the packed solution 94 | is easier to use in software. 95 | 96 | > [JB] Why use separate bits for enabling and disabling interrupts? Why not 97 | use the same write-1-to-enable and write-0-to-clear? ... it is not the way 98 | assignment works anywhere else in C. [ilg] C programmers are very much used 99 | to different semantics when accessing peripheral registers, actually most 100 | real peripherals in modern devices use write-1-to-set/clear bits, so this 101 | should not surprise anybody. As for keeping the language semantics, in C++ all 102 | operators can be redefined, so they can implement any semantics that is 103 | required. 104 | 105 | ## Usage 106 | 107 | Individual interrupts are enabled by writing 1 to the `status.enabled` bit and are disabled by writing 1 to the `status.clearenabled` bit. To be effective, interrupts must also have non-zero priorities, since only priorities strictly greater than the `iprioth` threshold are accepted. 108 | 109 | ```c 110 | hic.interrupts[7].prio = 7; 111 | hic.interrupts[7].status = INTERRUPTS_SET_ENABLED; 112 | 113 | hic.interrupts[7].status = INTERRUPTS_CLEAR_ENABLED; 114 | ``` 115 | 116 | Interrupts can be programmatically set to be pending by writing 1 to the `status.pending` bit; the pending status can be cleared by writing 1 to the `status.clearpending` bit.
117 | 118 | ```c 119 | hic.interrupts[7].status = INTERRUPTS_SET_PENDING; 120 | hic.interrupts[7].status = INTERRUPTS_CLEAR_PENDING; 121 | ``` 122 | 123 | To check the status bits: 124 | 125 | ```c 126 | if (hic.interrupts[7].status & INTERRUPTS_STATUS_PENDING) { 127 | // ... 128 | } 129 | ``` 130 | 131 | ## Alternate proposal 132 | 133 | [JB] Each hart has a fixed set of interrupt vectors. For each interrupt 134 | source, a register exists that defines which vector receives interrupts 135 | from that source. Effectively, the hart has some number of IRQ lines 136 | and interrupt sources are assigned manually to IRQ lines. The IRQ lines 137 | have a fixed priority, based on the interrupt number. 138 | 139 | If multiple 140 | peripherals are assigned to the same vector, then the ISR for that 141 | vector must poll each of the peripherals assigned to that vector to 142 | determine the cause of the interrupt. 143 | 144 | This also limits interrupt nesting, since only a higher-priority 145 | interrupt (or an exception) can interrupt an ISR, there can be at most 146 | 2\*PRIORITY_LEVELS nested interrupts, if every ISR is interrupted in an 147 | exception handler and exception handlers do not themselves raise 148 | exceptions. 149 | 150 | > [ilg] The only advantage to be noted is that it limits nesting. The disadvantages are: increased software complexity, increased latency, more complicated to maintain (changing the priority in the first case requires only a write to the priority register, while in the second case it is also necessary to move the test and the call from one intermediate handler to the other), possible out-of-sync cases, when the test is not in the right handler.
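
### Sketch: modelling the packed `status` register

As a hedged illustration of the packed `status` register described in this chapter, the following C sketch models in plain software how a store to the register could update the `enabled` and `pending` bits. The mask values follow the bit positions in the `status` table; the function `hic_model_status_write()` and the `INTERRUPTS_STATUS_ENABLED` and `INTERRUPTS_STATUS_ACTIVE` names are hypothetical, and the rule that a clear bit wins when set and clear bits are written together is an assumption, since the text does not define that case.

```c
#include <stdint.h>

/* Bit positions taken from the `status` table; the mask names are illustrative. */
#define INTERRUPTS_STATUS_ENABLED (1u << 0)
#define INTERRUPTS_STATUS_PENDING (1u << 1)
#define INTERRUPTS_STATUS_ACTIVE  (1u << 2)
#define INTERRUPTS_SET_ENABLED    (1u << 0)
#define INTERRUPTS_SET_PENDING    (1u << 1)
#define INTERRUPTS_CLEAR_ENABLED  (1u << 4)
#define INTERRUPTS_CLEAR_PENDING  (1u << 5)

/*
 * Hypothetical software model, not a hardware definition: returns the
 * value of `status` after `written` is stored to the register.
 */
static uint32_t
hic_model_status_write(uint32_t status, uint32_t written)
{
  /* Writing 1 to bit 0 or bit 1 sets the matching status bit. */
  status |= written & (INTERRUPTS_SET_ENABLED | INTERRUPTS_SET_PENDING);

  /* Writing 1 to bit 4 or bit 5 clears the status bit four positions below. */
  /* Assumption: when set and clear are written together, clear wins. */
  status &= ~((written & (INTERRUPTS_CLEAR_ENABLED | INTERRUPTS_CLEAR_PENDING)) >> 4);

  /* The `active` bit is read-only and is never changed by a store. */
  return status;
}
```

In this model, storing `INTERRUPTS_SET_ENABLED` into a zero register yields `INTERRUPTS_STATUS_ENABLED`, and writing zeros leaves every bit untouched, which is why no read-modify-write sequence is needed to change a single attribute.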
151 | -------------------------------------------------------------------------------- /system-clock.md: -------------------------------------------------------------------------------- 1 | # The Device System Clock (`sysclock`) 2 | 3 | ## Overview 4 | 5 | The **system clock** is intended to support the implementation of the ISO/IEC 14882.2011 6 | `high_resolution_clock` (§ 20.11.7.3). Objects of class `high_resolution_clock` represent clocks 7 | with the shortest tick period. 8 | 9 | The system clock is also intended as: 10 | 11 | - the RTOS tick timer that fires periodically at a programmable rate, for example 1000 Hz, to 12 | measure time and to drive pre-emptive context switches 13 | - a variable rate alarm or signal timer to handle timeouts and alarms 14 | 15 | All harts in a RISC-V device share the same system clock counter, but each hart may have its 16 | own comparator. 17 | 18 | When the device is halted in Debug state, the clock counter is not incremented. 19 | 20 | The system clock is inspired by the `mtime`/`mtimecmp` definitions in the RISC-V 21 | privileged specs, but it differs by counting a higher frequency input, running only when 22 | the device is powered and not counting during debug. 23 | 24 | ## Power domain 25 | 26 | The system clock is required to run only when the device is powered up, so it can be 27 | located in the same frequency/voltage domain as the cores. 28 | 29 | ## Clock input 30 | 31 | The system clock source is a reference clock. Software can select whether the reference 32 | clock is the core clock, the device high frequency reference clock or an implementation 33 | specific external clock source. If an implementation uses an external clock, it must 34 | document the relationship between the processor clock and the external reference. 35 | 36 | > By default, the system clock uses the same source as the core clock, which is 37 | a common configuration. 
38 | For example, with a 100 MHz core clock, the system clock resolution 39 | is 10 ns and it takes about 5800 years to overflow. 40 | 41 | > A common RTOS tick frequency is 1000 Hz; in order to accurately achieve this, 42 | an input frequency that is an integer multiple of the tick frequency is required. 43 | 44 | > Low-power devices might need to vary the core frequency by changing implementation 45 | specific clock registers (like PLL registers). In this case the system clock software 46 | must be notified of the new input frequency. Alternatively, the system clock may 47 | be configured to use the high frequency clock reference (like the quartz 48 | oscillator), assumed to have a fixed frequency. 49 | 50 | ## Memory map 51 | 52 | ### RV64 devices 53 | 54 | | Offset | Name | Width | Type | Reset | Description | 55 | |:-------|:-----|:------|:-----|:------|-------------| 56 | | 0x0000 | `ctrl` | 32b | rw | 0x00000006 | Control and status register. | 57 | | 0x0008 | `cnt` | 64b | ro | 0x00000000'00000000 | System clock timer counter. | 58 | 59 | Part of the Hart Control Block 60 | 61 | | Offset | Name | Width | Type | Reset | Description | 62 | |:-------|:-----|:------|:-----|:------|-------------| 63 | | 0x0000 | `cmp` | 64b | rw | Undefined | System clock timer comparator. | 64 | 65 | ### RV32 devices 66 | 67 | | Offset | Name | Width | Type | Reset | Description | 68 | |:-------|:-----|:------|:-----|:------|-------------| 69 | | 0x0000 | `ctrl` | 32b | rw | 0x00000006 | Control and status register. | 70 | | 0x0008 | `cntl` | 32b | ro | 0x00000000 | Low word of system clock timer counter. | 71 | | 0x000C | `cnth` | 32b | ro | 0x00000000 | High word of system clock timer counter. | 72 | 73 | Part of the Hart Control Block 74 | 75 | | Offset | Name | Width | Type | Reset | Description | 76 | |:-------|:-----|:------|:-----|:------|-------------| 77 | | 0x0000 | `cmpl` | 32b | rw | Undefined | Low word of system clock timer comparator.
| 78 | | 0x0004 | `cmph` | 32b | rw | Undefined | High word of system clock timer comparator. | 79 | 80 | ## The clock control and status register 81 | 82 | Controls the system clock timer and provides status data. 83 | 84 | By default, the system clock starts disabled; software must enable it during startup. 85 | 86 | | Bits | Name | Type | Reset | Description | 87 | |:-----|:-----|:-----|:------|-------------| 88 | | [0] | `enable` | rw | 0b0 | Indicates the enabled status of the system clock counter:
0 - Counter is disabled (default).
1 - Counter is enabled. | 89 | | [2:1] | `source` | rw | 0b11 | Indicates the clock source:
0b00 - Implementation specific external reference clock.
0b01 - Reserved.
0b10 - High frequency clock reference.
0b11 - Core clock (default). | 90 | | [31:3] |||| Reserved. | 91 | 92 | ## The clock counter register 93 | 94 | The system clock time point register is a 64-bit counter, common on all RV32 and RV64 devices. 95 | 96 | To guarantee the steadiness characteristic of the clock, the register is read-only. At reset, the register is cleared to 0. 97 | 98 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions. 99 | RV32 devices expose separate high/low 32-bit registers. 100 | 101 | ## The clock comparator register 102 | 103 | In addition to keeping track of time, the system clock can also be used to trigger 104 | interrupts at specific time points, either for periodic events (like driving 105 | pre-emption in an RTOS scheduler) or to trigger timeout events. 106 | 107 | The comparator register causes a `sysclock_cmp` interrupt to be posted when the 108 | counter register 109 | contains a value greater than or equal to the value in the comparator register. 110 | The interrupt remains posted until it is cleared by writing to the comparator register. 111 | 112 | The clock comparator register is specific to each hart and is part of the Hart Control Block. 113 | 114 | Only hart 0 is required to have a comparator. If any other harts also have comparators, 115 | the `sysclock_cmp` interrupt is posted only to the local hart. For harts that do not have 116 | a comparator, this register always reads as 0 and writes are ignored. 117 | 118 | RV64 devices expose a single 64-bit register, accessible with 64-bit instructions. 119 | RV32 devices expose separate high/low 32-bit registers. 120 | 121 | ## Usage 122 | 123 | The memory mapped registers are available via a set of structures, directly available in C/C++.
124 | 125 | RV64 devices: 126 | 127 | - `sysclock.ctrl` 128 | - `sysclock.cnt` 129 | - `hcb.sysclockcmp` 130 | 131 | RV32 devices: 132 | 133 | - `sysclock.ctrl` 134 | - `sysclock.cntl` 135 | - `sysclock.cnth` 136 | - `hcb.sysclockcmpl` 137 | - `hcb.sysclockcmph` 138 | 139 | ```c 140 | uint64_t 141 | riscv_sysclock_read_cnt(void) 142 | { 143 | #if __riscv_xlen == 32 144 | // Consistent 64-bit read. The loop body usually runs once; it runs a 145 | // second time only if the low word carried into the high word between the reads. 146 | while (true) 147 | { 148 | uint32_t hi = sysclock.cnth; 149 | uint32_t lo = sysclock.cntl; 150 | if (hi == sysclock.cnth) 151 | { 152 | return ((uint64_t) hi << 32) | lo; 153 | } 154 | } 155 | #else 156 | return sysclock.cnt; 157 | #endif 158 | } 159 | 160 | uint64_t 161 | riscv_sysclock_read_cmp(void) 162 | { 163 | #if __riscv_xlen == 32 164 | return ((uint64_t) hcb.sysclockcmph << 32) | hcb.sysclockcmpl; 165 | #else 166 | return hcb.sysclockcmp; 167 | #endif 168 | } 169 | 170 | void 171 | riscv_sysclock_write_cmp(uint64_t value) 172 | { 173 | #if __riscv_xlen == 32 174 | // Write low as all ones; the result is no smaller than the old value. 175 | hcb.sysclockcmpl = UINT32_MAX; 176 | // Write high; the result is no smaller than the new value. 177 | hcb.sysclockcmph = ((uint32_t) (value >> 32)); 178 | // Write low; the comparator now holds the new value. 179 | hcb.sysclockcmpl = ((uint32_t) value); 180 | #else 181 | hcb.sysclockcmp = value; 182 | #endif 183 | } 184 | ``` 185 | 186 | A typical periodic tick counter: 187 | 188 | ```c 189 | 190 | uint64_t sysclock_cmp; 191 | uint32_t sysclock_increment; 192 | 193 | void 194 | sysclock_init(void) 195 | { 196 | // ... 197 | sysclock_increment = INPUT_FREQ_HZ/SYSCLOCK_FREQ_HZ; 198 | sysclock_cmp = riscv_sysclock_read_cnt() + sysclock_increment; 199 | 200 | // Ask for an interrupt after one tick interval. 201 | // Since the comparator is not initialised at reset, it 202 | // must be written before enabling interrupts.
203 | riscv_sysclock_write_cmp(sysclock_cmp); 204 | 205 | // Assign a priority. 206 | hic.interrupts[SYSCLOCK_CMP_INT_NUM].prio = SYSCLOCK_CMP_PRIO; 207 | // Enable. 208 | hic.interrupts[SYSCLOCK_CMP_INT_NUM].status = INTERRUPTS_SET_ENABLED; 209 | } 210 | 211 | void 212 | interrupt_handle_sysclock_cmp(void) 213 | { 214 | // Increment the clock tick counter and run tick actions. 215 | sysclock_tick_increment(); 216 | 217 | // Compute the next time point when the interrupt should come. 218 | sysclock_cmp += sysclock_increment; 219 | riscv_sysclock_write_cmp(sysclock_cmp); 220 | } 221 | 222 | ``` 223 | -------------------------------------------------------------------------------- /eabi.md: -------------------------------------------------------------------------------- 1 | # Embedded ABI 2 | 3 | The current RISC-V privileged ABI requires the caller to save the following registers: 4 | `ra`, `t0`, `t1`, `t2`, `a0`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `t3`, `t4`, 5 | `t5`, `t6`. This amounts 6 | to 16 registers. If floating point is used, 20 more registers must be saved. 7 | 8 | In order to be able to call a C/C++ function from the interrupt handler, all 9 | these registers must be saved when entering interrupts, which impacts the 10 | interrupt latency. 11 | 12 | ## Proposal 13 | 14 | The main goal of the RISC-V Embedded ABI is to balance a high performance for background code with a reduced interrupt latency. 15 | 16 | As a secondary goal, if possible, it should remain consistent when applied to the reduced register set used by the RV32E devices. 17 | 18 | ### RV32E EABI calling convention 19 | 20 | For interrupt latency reasons, there should be no more than 7-8 caller 21 | saved registers. The table assumes the minimum of 7. If 8 are 22 | accepted, `x14` should be renamed as `a4`. 
23 | 24 | | Register | ABI Name | Description | Caller | Callee | 25 | |:---------|:---------|:------------|--------|-------| 26 | | `x0` | `zero` | Hard-wired zero | | | 27 | | `x1` | `ra` | Return address | * | | 28 | | `x2` | `sp` | Stack pointer | | * | 29 | | `x3` | `gp` | Global pointer | | | 30 | | `x4` | `tp` | Thread pointer | | | 31 | | `x5` | `t1/al` | Temporary/alternate link register | * | | 32 | | `x6` | `s3` | Saved register | | * | 33 | | `x7` | `s4(/sl)` | Saved register(/stack limit?) | | * | 34 | ||||| 35 | | `x8` | `s0/fp` | Saved register/frame pointer | | * | 36 | | `x9` | `s1` | Saved register | | * | 37 | | `x10,x11` | `a0,a1` | Function arguments/return values | ** | | 38 | | `x12` | `a2` | Function arguments | * | | 39 | | `x13` | `a3` | Function arguments | * | | 40 | | `x14` | `s2` (`a4`?) | Saved register | (\*?) | * | 41 | | `x15` | `t0` | Temporary | * | | 42 | 43 | TODO: check how this allocation matches the needs of C++ virtual function dispatch. 44 | 45 | > [AW] For RVC compressibility, the most popular registers should 46 | be `x8-x15`. So I suggest renumbering `x8/x9` to be `s0/s1` (as is the 47 | case in the POSIX ABI). 48 | 49 | > [AW] Per the ISA spec, `x5` serves as an alternate link register, 50 | to drive hardware management of return-address stacks. It’s used by 51 | things like the `-msave-restore` option, which reduces code size by 52 | using millicode routines to implement prologues/epilogues. For this 53 | to work, `x5` needs to be one of the t-registers, as is the case in 54 | the POSIX ABI. I suggest either adding one more t-register at `x5`, 55 | or moving an existing t-register to `x5`. (The former option is better 56 | for code size and performance; the latter option is better for 57 | interrupt latency.) 58 | 59 | > [AW] For all the use cases I’ve encountered, two t-registers 60 | is sufficient for linkage purposes. 
61 | 62 | > [BH] `jal[r]` and `jr` with `x5` are being baked into hardware as 63 | function call/return, just as with `x1`, complete with a special 64 | return address stack to accelerate the indirect jump for function 65 | return. That's *especially* important with millicode register 66 | save/restore which will primarily be used on microcontrollers. 67 | So `x5` must be a t register. No choice. But maybe don't call it 68 | `t0`, if at least one t register is in `x8-x15`. 69 | 70 | > [BH] the lowest numbered registers of each class (s, a t) should 71 | fall somewhere inside the C-favoured registers `x8-x15` (if any 72 | registers of that class fall in this range). 73 | 74 | > Having the stack limit exposed as a general register 75 | would save an extra push/pop during RTOS context switches. 76 | 77 | > [BH] I don't like the stack limit being in a register. 78 | Much better in a CSR. Harder to corrupt by accident. 79 | 80 | > [jnk0le] Stack limit is not about to be frequently accessed by thread 81 | code nor it is available from raw C/C++. Reserving another general purpose 82 | register increases register pressure especially in RV32E which currently 83 | have less available registers than armv7[M]. Stack limit can be corrupted 84 | by code. Mapping another shadow register into GPRs and protecting it from 85 | corruption by thread code, will increase hardware complexity. Saving 2 86 | cycles on push/pop in context switch is a sign of premature optimization 87 | of whole ABI for specific use case. Assuming 50MHz clockrate and 1000Hz 88 | scheduler tickrate, those 2 cycles saved per context switch accounts for 89 | 0,004% of total cycles saved. Of course, only if rest of the code is 90 | actually not starving from missing register. 91 | 92 | [ilg] I agree that the stack limit register may be better available only 93 | as a CSR. 
94 | 95 | More details on the register allocation in the 96 | [SW Dev list](https://groups.google.com/a/groups.riscv.org/d/msg/sw-dev/Lp6ucrijap0/ZwVO5Ts-CQAJ). 97 | 98 | ### RV32I/RV64I EABI calling convention 99 | 100 | | Register | ABI Name | Description | Caller | Callee | 101 | |:---------|:---------|:------------|--------|-------| 102 | | `x0` | `zero` | Hard-wired zero | | | 103 | | `x1` | `ra` | Return address | * | | 104 | | `x2` | `sp` | Stack pointer | | * | 105 | | `x3` | `gp` | Global pointer | | | 106 | | `x4` | `tp` | Thread pointer | | | 107 | | `x5` | `t1/al` | Temporary/alternate link register | * | | 108 | | `x6` | `s3` | Saved register | | * | 109 | | `x7` | `s4(/sl)` | Saved register(/stack limit?) | | * | 110 | ||||| 111 | | `x8` | `s0/fp` | Saved register/frame pointer | | * | 112 | | `x9` | `s1` | Saved register | | * | 113 | | `x10,x11` | `a0,a1` | Function arguments/return values | ** | | 114 | | `x12` | `a2` | Function arguments | * | | 115 | | `x13` | `a3` | Function arguments | * | | 116 | | `x14` | `s2` | Saved register | | * | 117 | | `x15` | `t0` | Temporary | * | | 118 | ||||| 119 | | `x16–x31` | `s5-s20` | Saved registers | | * | 120 | ||||| 121 | | `f0–f1` | `fa0-fa1` | FP arguments/return values | * | | 122 | | `f2–f7` | `fa2-fa7` | FP arguments | * | | 123 | | `f8–f15` | `ft0-ft7` | FP temporaries | * | | 124 | | `f16–f31` | `fs0-fs15` | FP saved registers | | * | 125 | 126 | > To simplify the context push/pop code, 127 | the floating point registers were reordered, to group 128 | all the caller register in one half of the set and the callee 129 | saved registers in the other half. 130 | 131 | ### Sizes of variables 132 | 133 | - `long double` - 64 bits. 134 | 135 | TODO: add all other 136 | 137 | ## References 138 | 139 | ## RISC-V POSIX ABI 140 | 141 | Currently defined in Chapter 20, RISC-V Assembly Programmer’s Handbook, of the "The RISC-V Instruction Set Manual Volume I: User-Level ISA, Document Version 2.2". 
142 | 143 | | Register | ABI Name | Description | Caller | Callee | 144 | |:---------|:---------|:------------|--------|-------| 145 | | `x0` | `zero` | Hard-wired zero | | | 146 | | `x1` | `ra` | Return address | * | | 147 | | `x2` | `sp` | Stack pointer | | * | 148 | | `x3` | `gp` | Global pointer | | | 149 | | `x4` | `tp` | Thread pointer | | | 150 | | `x5` | `t0` | Temporary/alternate link register | * | | 151 | | `x6–x7` | `t1-t2` | Temporaries | * | | 152 | | `x8` | `s0/fp` | Saved register/frame pointer | | * | 153 | | `x9` | `s1` | Saved register | | * | 154 | | `x10–x11` | `a0-a1` | Function arguments/return values | * | | 155 | | `x12–x17` | `a2-a7` | Function arguments | * | | 156 | | `x18–x27` | `s2-s11` | Saved registers | | * | 157 | | `x28–x31` | `t3-t6` | Temporaries | * | | 158 | ||||| 159 | | `f0–f7` | `ft0-ft7` | FP temporaries | * | | 160 | | `f8–f9` | `fs0-fs1` | FP saved registers | | * | 161 | | `f10–f11` | `fa0-fa1` | FP arguments/return values | * | | 162 | | `f12–f17` | `fa2-fa7` | FP arguments | * | | 163 | | `f18–f27` | `fs2-fs11` | FP saved registers | | * | 164 | | `f28–f31` | `ft8-ft11` | FP temporaries | * | | 165 | 166 | ## RISC-V RV32E ABI 167 | 168 | Currently defined in the [RISC-V ELF psABI](https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-rv32e-calling-convention). 
169 | 170 | | Register | ABI Name | Description | Caller | Callee | 171 | |:---------|:---------|:------------|--------|-------| 172 | | `x0` | `zero` | Hard-wired zero | | | 173 | | `x1` | `ra` | Return address | * | | 174 | | `x2` | `sp` | Stack pointer | | * | 175 | | `x3` | `gp` | Global pointer | | | 176 | | `x4` | `tp` | Thread pointer | | | 177 | | `x5` | `t0` | Temporary/alternate link register | * | | 178 | | `x6–x7` | `t1-t2` | Temporaries | * | | 179 | | `x8` | `s0/fp` | Saved register/frame pointer | | * | 180 | | `x9` | `s1` | Saved register | | * | 181 | | `x10–x11` | `a0-a1` | Function arguments/return values | * | | 182 | | `x12–x15` | `a2-a5` | Function arguments | * | | 183 | 184 | ## Links 185 | 186 | - [Application Binary Interface for 187 | the ARM® Architecture](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0036b/IHI0036B_bsabi.pdf) 188 | - [Procedure Call Standard for the ARM® Architecture](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf) 189 | - [RISC-V ELF psABI](https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md) 190 | 191 | -------------------------------------------------------------------------------- /csrs.md: -------------------------------------------------------------------------------- 1 | # The Control and Status Registers (CSRs) 2 | 3 | The RISC-V ISA defines a set of 4096 Control and Status Registers that can be accessed 4 | via special `csr` instructions with immediate operands identifying the register. 5 | 6 | For performance reasons, the RISC-V microcontroller profile uses only a small number 7 | of core system registers via the CSR mechanism; the rest are available in the memory 8 | mapped system area. 9 | 10 | Unless otherwise mentioned, write access to the CSRs is limited to machine/privileged mode. 11 | 12 | ## Hart ID Register (`hartid`) 13 | 14 | The `hartid` CSR is an xlen-bit read-only register containing the integer ID of the 15 | hart running the code. 
This register must be readable in any implementation. 16 | In single-hart devices, it always reads 0. In multi-hart devices, the hart IDs might 17 | not necessarily be numbered contiguously 18 | (although it is preferable), but at least one hart must have a hart ID of zero. 19 | 20 | | Bits | Name | Type | Reset | Description | 21 | |:-----|:-----|:-----|:------|-------------| 22 | | [N:0] | `hartid` | ro | | The integer ID of the hart. | 23 | | [(xlen-1):(N+1)] | | | | Reserved. | 24 | 25 | N is the number of bits required to store the maximum hart ID and is implementation specific. 26 | 27 | This CSR is identical to `mhartid` in the RISC-V privileged profile. 28 | 29 | ## Configuration and control (`ctrl`) 30 | 31 | The `ctrl` CSR is an xlen-bit read/write register that controls several aspects of the hart 32 | functionality. 33 | 34 | | Bits | Name | Type | Reset | Description | 35 | |:-----|:-----|:-----|:------|-------------| 36 | | [0] | `sptena` | r | 0 | Thread stack enable:
- 0: always use `spm` as `sp`.
- 1: in **application** mode use `spt` as `sp`. | 37 | | [1] | `stackalign` | rw | 1[1] | The context stack alignment:
- 0: 4-bytes alignment guaranteed, no SP adjustment is performed.
- 1: 8-bytes alignment guaranteed, SP adjusted if necessary.| 38 | | [2] | `spadjusted` | r | 0 | Reserved bit used during context push/pop to remember if the stack required an extra alignment word. | 39 | | [7:3] | | | | Reserved. | 40 | | [8] | `fpena` | rw | 0 | Floating point enable:
- 0: the FP unit is disabled.
- 1: the FP unit is enabled. | 41 | | [9] | `fpcxs` | rw | 1 | Floating point context save:
- 0: the context stack does not save FP registers.
- 1: the context stack saves FP registers. | 42 | | [10] | `fplazy` | rw | 1 | Floating point lazy context save:
- 0: disable automatic lazy context save.
- 1: enable automatic lazy context save. | 43 | | [(xlen-1):11] | | | | Reserved. | 44 | 45 | *1: The default value for the `stackalign` is implementation specific; the 46 | recommended default is 1. 47 | 48 | TODO: decide if a `reset` bit (to reset the current hart) fits here, and 49 | where should be a `sysreset` to reset the entire device. 50 | 51 | TODO: allocate a number for it. 52 | 53 | ## Mode and status (`status`) 54 | 55 | The `status` CSR is an xlen-bit read/write register that identifies the 56 | current hart mode and status. 57 | 58 | | Bits | Name | Type | Reset | Description | 59 | |:-----|:-----|:-----|:------|-------------| 60 | | [0] | `handler` | r | 0 | Hart is running:
- 0: application code.
- 1: handler code. | 61 | | [1] | `user` | r | 0 | Application privileges:
- 0: machine/privileged mode.
- 1: user/unprivileged mode. | 62 | | [7:2] | | | | Reserved. | 63 | | [8] | `interrupt` | r | 0 | If `handler` is set, then
1 if in an interrupt, 0 if in an exception. | 64 | | [18:9] | `cause` | r | 0 | The exception or interrupt cause code. | 65 | | [(xlen-1):19] | | | | Reserved. | 66 | 67 | The handler code is always running in machine/privileged mode. 68 | 69 | TODO: the bits in this register, as well as the entire mechanism to enter/exit 70 | exceptions and traps, require a thorough analysis. 71 | 72 | TODO: allocate a number for it. 73 | 74 | ## Interrupt Enable (`iena`) 75 | 76 | The `iena` CSR is an xlen-bit read/write register that controls whether interrupts 77 | are enabled or not. 78 | 79 | This register has a single bit on purpose. Access to the interrupt enable bit must be quite 80 | fast since masking all interrupts is one of the methods used to implement critical sections. 81 | 82 | | Bits | Name | Type | Reset | Description | 83 | |:-----|:-----|:-----|:------|-------------| 84 | | [0] | `iena` | rw | 0 | Interrupts Enable; 1 if interrupts are enabled. | 85 | | [(xlen-1):1] | | | | Reserved. | 86 | 87 | This CSR is specific to the RISC-V microcontroller profile. 88 | 89 | TODO: allocate a number for it. 90 | 91 | ## Interrupt Priority Threshold (`iprioth`) 92 | 93 | The `iprioth` CSR is an xlen-bit read/write register that holds the interrupt priority threshold. 94 | Only interrupt requests that have a priority strictly greater than the threshold will cause 95 | an interrupt to become active. The threshold register must always be able to hold the value zero, 96 | in which case no interrupts are masked. The threshold register must also be able to hold 97 | the maximum priority level, in which case all interrupts are masked (functionally equivalent 98 | to disabling interrupts). 99 | 100 | This register is a CSR because handling the interrupt threshold is one of the methods used 101 | to implement critical sections, and this must be as fast as possible.
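
The masking rule above can be restated as a small predicate. This is only an illustrative software model, not part of the specification: the helper name `irq_is_unmasked` and the `PRIO_MAX` value (3 priority bits, levels 0 to 7) are assumptions made for the example, and the real comparison is performed by the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumed for illustration only: 3 priority bits, so levels 0..7.
#define PRIO_MAX 7u

// Hypothetical model of the threshold rule: an interrupt request becomes
// active only if its priority is strictly greater than the threshold.
static bool
irq_is_unmasked(uint32_t priority, uint32_t threshold)
{
  return priority > threshold;
}
```

With a threshold of zero every non-zero priority passes the test, and with a threshold of `PRIO_MAX` nothing does, which matches the two boundary cases required of the register.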
102 | 103 | | Bits | Name | Type | Reset | Description | 104 | |:-----|:-----|:-----|:------|-------------| 105 | | [N:0] | `iprioth` | rw | 0x00 | The interrupt priority threshold. | 106 | | [(xlen-1):(N+1)] | | | | Reserved. | 107 | 108 | N is the number of bits required to store the maximum priority level, and is implementation 109 | specific. It must match the number of bits used by the `prio` register in the interrupt 110 | controller. 111 | 112 | All reserved bits read back as 0. To find out N at runtime, an application can write an 113 | 'all-1' pattern and read back the register. 114 | 115 | If the hart does not implement an interrupt controller, the whole register reads back as zero. 116 | 117 | This CSR is specific to the RISC-V microcontroller profile. 118 | 119 | TODO: allocate a number for it. 120 | 121 | > [PA]: the truncation of priority bits should be done at the 122 | least-significant end, to avoid 123 | priority inversion. [ilg] this might be done for example by moving the bits to the 124 | high end of the register, but this requires later handling the priority as 125 | word/double word. 126 | 127 | ## Interrupt Priority Threshold Increase (`ipriothinc`) 128 | 129 | The `ipriothinc` CSR behaves like an xlen-bit read/write register, but in fact uses the 130 | same register as `iprioth`. The difference is that writes to this CSR are effective only 131 | if the new value is higher than the current value, in other words it guarantees that the 132 | interrupt threshold is not decreased. 133 | 134 | This register is a CSR because handling the interrupts threshold is one of the methods used 135 | to implement critical sections, and this must be as fast as possible. 136 | 137 | | Bits | Name | Type | Reset | Description | 138 | |:-----|:-----|:-----|:------|-------------| 139 | | [N:0] | `ipriothinc` | rw | 0x00 | The interrupt priority threshold. | 140 | | [(xlen-1):(N+1)] | | | | Reserved. 
| 141 | 142 | N is the number of bits required to store the maximum priority level, and is implementation 143 | specific. 144 | 145 | This CSR is specific to the RISC-V microcontroller profile. 146 | 147 | TODO: possibly find a better name. How about `ipriothup`? 148 | 149 | TODO: allocate a number for it. 150 | 151 | ## Main Stack Pointer (`spm`) 152 | 153 | The `spm` CSR is an xlen-bit read-write register that holds the main stack pointer. 154 | It is always the default stack pointer after reset. Interrupts and exceptions always 155 | use this stack to store the exception frame. 156 | 157 | | Bits | Name | Type | Reset | Description | 158 | |:-----|:-----|:-----|:------|-------------| 159 | | [0] | | | 0 | Reserved. | 160 | | [(xlen-1):1] | `spm` | rw | startup | The main stack pointer. | 161 | 162 | This CSR is specific to the RISC-V microcontroller profile. 163 | 164 | TODO: check if the stack has stricter alignment requirements. 165 | 166 | TODO: allocate a number for it. 167 | 168 | ## Main Stack Pointer Limit (`spmlimit`) 169 | 170 | The `spmlimit` CSR is an xlen-bit read-write register that holds the lowest address 171 | the main stack can descend to. 172 | 173 | | Bits | Name | Type | Reset | Description | 174 | |:-----|:-----|:-----|:------|-------------| 175 | | [0] | | | 0 | Reserved. | 176 | | [(xlen-1):1] | `spmlimit` | rw | startup | The main stack lower limit. | 177 | 178 | If an operation using the main stack pointer attempts to write to an address below 179 | the limit, an exception is triggered and the operation is not performed. 180 | 181 | This CSR is specific to the RISC-V microcontroller profile. 182 | 183 | The `spmlimit` CSR is optional for the ES (small) sub-profile; in this case it 184 | must always read zero. 185 | 186 | TODO: allocate a number for it. 187 | 188 | ## Thread Stack Pointer (`spt`) 189 | 190 | The `spt` CSR is an xlen-bit read-write register that holds the stack pointer used 191 | by the application's current thread.
It is intended for multi-threaded applications. 192 | 193 | This register is a CSR because access to the stack pointer may occur in context switching 194 | routines and needs to be fast. 195 | 196 | | Bits | Name | Type | Reset | Description | 197 | |:-----|:-----|:-----|:------|-------------| 198 | | [0] | | | 0 | Reserved. | 199 | | [(xlen-1):1] | `spt` | rw | Unknown | The thread stack pointer. | 200 | 201 | This CSR is specific to the RISC-V microcontroller profile. 202 | 203 | TODO: allocate a number for it. 204 | 205 | ## Thread Stack Pointer Limit (`sptlimit`) 206 | 207 | The `sptlimit` CSR is an xlen-bit read-write register that holds the lowest address 208 | the thread stack can descend to. 209 | 210 | This register is a CSR because access to the stack pointer limit may occur in context switching 211 | routines and needs to be fast. 212 | 213 | | Bits | Name | Type | Reset | Description | 214 | |:-----|:-----|:-----|:------|-------------| 215 | | [0] | | | 0 | Reserved. | 216 | | [(xlen-1):1] | `sptlimit` | rw | Unknown | The thread stack lower limit. | 217 | 218 | If an operation using the thread stack pointer attempts to write to an address below 219 | the limit, an exception is triggered and the operation is not performed. 220 | 221 | This CSR is specific to the RISC-V microcontroller profile. 222 | 223 | The `sptlimit` CSR is optional for the ES (small) sub-profile; in this case it 224 | must always read zero. 225 | 226 | TODO: allocate a number for it. 227 | 228 | ## RISC-V compatibility CSRs 229 | 230 | The RISC-V Volume I, Chapter 2.8, mentions two mandatory instructions, `rdcycle` and 231 | `rdinstret`; to implement them, two CSRs are required: 232 | 233 | - cycle: available as `hcb.cyclecnt` 234 | - instret: available as `hcb.instcnt` 235 | 236 | The RISC-V Volume II mentions other CSRs, but it is not clear which ones are mandatory, 237 | if any: 238 | 239 | - mstatus: not needed, there is only one bit needed, mie, which is now in the `iena` CSR.
 240 | - mcause: no longer needed, the cause is packed in the `status` CSR 241 | - mie: not needed, interrupts are enabled in the HIC registers 242 | - mip: not needed, interrupt pending bits are in the HIC registers 243 | - mtvec: not needed, there are two memory mapped registers, `excvta` and `intvta` 244 | - misa: can be safely migrated to memory mapped 245 | - mepc: not needed, pushed onto the stack 246 | - mtval: not decided; nested exceptions may override this. TODO: check thoroughly. 247 | 248 | - mscratch: not decided if needed 249 | 250 | TODO: add MPU registers 251 | 252 | ## Usage 253 | 254 | ### Interrupt critical sections 255 | 256 | In a single hart device, the simplest way to implement a critical section is to 257 | fully disable interrupts, assuming the application does not need to keep any 258 | fast interrupts enabled. 259 | 260 | ```c 261 | void 262 | f1(void) 263 | { 264 | // ... 265 | { 266 | // Interrupts critical section. 267 | xlenreg_t status = riscv_csr_write_iena(0); 268 | // ... 269 | riscv_csr_write_iena(status); 270 | } 271 | // ... 272 | } 273 | ``` 274 | 275 | Otherwise, if the application uses some fast interrupts, it can raise the 276 | interrupt threshold to a limit below the priority of the fast interrupts. Please 277 | note how entering the critical section guarantees that the threshold is 278 | not lowered. 279 | 280 | ```c 281 | void 282 | f2(void) 283 | { 284 | // ... 285 | { 286 | // Interrupts critical section. 287 | xlenreg_t status = riscv_csr_write_ipriothinc(7); 288 | // ... 289 | riscv_csr_write_iprioth(status); 290 | } 291 | // ... 
292 | } 293 | ``` 294 | 295 | ### Performance issues 296 | 297 | The reason for preferring CSRs over memory-mapped registers is speed; accessing CSRs requires a single instruction, while memory accesses take two: 298 | 299 | ``` 300 | f(riscv_csr_read_mstatus(), SYSPERIPH->cmd); 301 | 302 | 20400238: 30002573 csrr a0,mstatus 303 | 304 | 2040023c: f00007b7 lui a5,0xf0000 305 | 20400240: 43cc lw a1,4(a5) 306 | 307 | 20400242: 307000ef jal ra,20400d48 308 | ``` 309 | 310 | For this reason, all registers needed in interrupt critical sections and context switches should be accessed with CSR instructions, while all other non-critical registers can be memory mapped. 311 | -------------------------------------------------------------------------------- /exceptions-and-interrupts.md: -------------------------------------------------------------------------------- 1 | # Exceptions and Interrupts 2 | 3 | Exceptions are unusual **conditions that occur at run time, associated with an 4 | instruction** in the current RISC-V hart. 5 | 6 | Interrupts are **events that occur asynchronously outside** any of the RISC-V harts. 7 | 8 | > Other architectures define interrupts as a specific type of exception. 9 | However, for the RISC-V microcontroller profile, exceptions are specific 10 | to the architecture, and common to all devices, while interrupts are 11 | mostly specific to an implementation (except a few system interrupts, 12 | also common to all devices). Thus it looks more natural to define 13 | two separate vector tables, one for exceptions, to be implemented 14 | in the architecture software package, and one for interrupts, to be 15 | implemented in the device software package. 16 | 17 | The mechanism to process exceptions and interrupts (vectored, nested, separate stack) 18 | is one of the main improvements in the RISC-V microcontroller profile over the 19 | privileged profile.
 20 | 21 | ## Exception and interrupt handlers in C/C++ 22 | 23 | The main feature is the ability to write the exception and interrupt handlers 24 | as plain C/C++ functions that do not need any compiler attributes or assembly 25 | code. 26 | 27 | For this to be possible, there are two requirements: 28 | 29 | - both the exception and the interrupt entry code must abide by the ABI requirements 30 | and save the same caller registers as a regular C/C++ call 31 | - a custom return address must be used, such that when the handler returns, 32 | the core will trigger the exception return mechanism, without the need of explicit 33 | assembly `mret` instructions. 34 | 35 | ## Exceptions 36 | 37 | Exceptions trigger a **synchronous transfer of control** to an exception handler 38 | within the current hart. 39 | 40 | Some exceptions cannot be disabled, and handlers to process them should always be installed. 41 | 42 | Some exceptions are **resumable**, i.e. execution can continue with the next 43 | instruction (for example the illegal instruction handler can implement a custom 44 | instruction and resume). 45 | 46 | The RISC-V privileged specs define the following exceptions, in decreasing priority order: 47 | 48 | * Instruction address misaligned 49 | * Instruction access fault 50 | * Illegal instruction 51 | * Breakpoint 52 | * Load address misaligned 53 | * Load access fault 54 | * Store/AMO address misaligned 55 | * Store/AMO access fault 56 | * Environment call from U-mode 57 | * Environment call from M-mode 58 | * Instruction page fault 59 | * Load page fault 60 | * Store/AMO page fault 61 | 62 | TODO: rework for microcontrollers; define which ones have configurable priorities. 63 | 64 | TODO: NMI? routed only to hart 0? 65 | 66 | ### Exceptions vector table 67 | 68 | The exceptions vector table is an array of addresses (xlen size elements) pointing to 69 | exception handlers (C/C++ functions).
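
To make the table layout concrete, here is a minimal sketch of a vector table as an array of function pointers, together with a software model of the lookup. The type name `exception_handler_t`, the two handlers, the entry order and the `dispatch_exception` helper are all hypothetical, chosen only for illustration; the profile has not yet fixed the cause numbering, and on real hardware the lookup is done by the hart, not by software.

```c
#include <stddef.h>

// Hypothetical handler type: a plain C function with no arguments.
typedef void (*exception_handler_t)(void);

// Counters used only to make the example observable.
static int misaligned_count;
static int illegal_count;

static void handle_address_misaligned(void) { misaligned_count++; }
static void handle_illegal_instruction(void) { illegal_count++; }

// The table itself: one xlen-sized pointer per exception cause.
// The order of the entries is an assumption made for this sketch.
static const exception_handler_t exception_vectors[] = {
  handle_address_misaligned,
  handle_illegal_instruction,
};

#define EXCEPTION_COUNT \
  (sizeof(exception_vectors) / sizeof(exception_vectors[0]))

// Software model of the lookup: index the table by cause and call
// the handler. Returns -1 if the cause has no table entry.
static int
dispatch_exception(size_t cause)
{
  if (cause >= EXCEPTION_COUNT) {
    return -1;
  }
  exception_vectors[cause]();
  return 0;
}
```

Because the handlers are ordinary functions, the same table can be exercised on a host machine in unit tests, without any target-specific attributes.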
70 | 71 | The address of the exceptions vector table is kept by each hart in (`hcb.excvta`); 72 | it is automatically initialised at startup with 73 | the address provided in the hart startup block and can be later written by software. 74 | 75 | ## Interrupts 76 | 77 | Interrupts are generally **triggered by peripherals** to notify the application of a 78 | given condition or event. 79 | 80 | Interrupts trigger the transfer of control to an interrupt handler associated with 81 | a hart. 82 | 83 | In the RISC-V microcontroller profile, a hart can have up to **1024** interrupts, 84 | including the system interrupts. 85 | 86 | > This limit was chosen arbitrarily and is considered quite high. 87 | 88 | ### Interrupt priorities 89 | 90 | Interrupts have **programmable priorities**, defined as small unsigned numbers. 91 | 92 | The **priority value 0** is reserved to mean 93 | _'never interrupt'_ or _'disabled'_, and interrupt priorities increase with 94 | the increasing integer value. 95 | 96 | Interrupts with the same priority are processed in the order of their index 97 | in the interrupt vector 98 | table, with a higher index meaning a higher priority. 99 | 100 | For multi-hart devices, the interrupt wiring to harts is implementation specific; 101 | each interrupt 102 | may be wired to one or several harts; it is the responsibility 103 | of each hart to enable the interrupts it desires to process. For redundant systems, 104 | it is also 105 | possible for multiple harts to process the same interrupt. 106 | 107 | ### Interrupt priority threshold 108 | 109 | Each hart has an associated priority threshold, held in a hart-specific register. 110 | 111 | Only interrupts that have a priority strictly greater than the threshold will 112 | cause an interrupt to be sent to the hart. 113 | 114 | ### Priority bits 115 | 116 | The actual number of bits used to store the interrupt priority is implementation 117 | specific, but must 118 | be at least 3 (i.e. 
at least 8 priority levels). 119 | 120 | > Extra care must be taken when moving code to implementations with fewer 121 | priority levels, since truncation could lead to priority inversions. 122 | For example, when moving a program from devices 123 | with 4-bit priorities to devices with 3-bit priorities, if the application 124 | uses priority 9 for IRQ0 and priority 3 125 | for IRQ1, IRQ0 is expected to have a higher 126 | priority. But if the MSB is removed, IRQ0 will have priority 1 and be 127 | lower than IRQ1. 128 | 129 | > It is 130 | recommended that software handling priorities know about the number of bits 131 | and use asserts to validate the priority values. 132 | 133 | > [PA]: the truncation of priority bits should be done at the 134 | least-significant end, to avoid this kind of 135 | priority inversion. [ilg] this translates into moving the bits to the other 136 | end of the word/register, and possibly requiring byte/half-word accesses 137 | to the NIC. 138 | 139 | ### Interrupt preemption and nesting 140 | 141 | If a hart is executing an interrupt handler and a higher priority interrupt 142 | occurs, the current interrupt handler is temporarily suspended and the higher 143 | priority interrupt handler is executed to completion, then the initial 144 | interrupt handler is resumed. 145 | 146 | Each new interrupt creates a new context on the main stack, and removes it 147 | when the handler returns. 148 | 149 | There is no limit for interrupt nesting, assuming the main stack is large enough. 150 | 151 | ### System interrupts 152 | 153 | System interrupts are generated by system peripherals, like `sysclock`, `rtclock`. 154 | 155 | TBD 156 | 157 | ### Interrupts vector table 158 | 159 | The interrupts table is an **array of pointers** to interrupt handlers, 160 | implemented as **C/C++ functions**. The number of interrupts per hart is 161 | implementation specific but cannot exceed 1024 elements.
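
Which entry of this table gets used at a given moment follows from the priority rules in the 'Interrupt priorities' and 'Interrupt priority threshold' sections above. The following is a minimal software model of that selection, assuming a hypothetical `irq_state_t` record per interrupt and a hypothetical `select_interrupt` helper; the actual arbitration is done by the interrupt controller hardware and may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-interrupt state, for illustration only.
typedef struct {
  uint8_t priority; // 0 means 'never interrupt' / 'disabled'
  bool pending;     // true if the interrupt request is asserted
} irq_state_t;

// Returns the table index of the interrupt to service next, or -1 if
// no pending interrupt has a priority strictly above the threshold.
// Among equal priorities, the higher index wins.
static int
select_interrupt(const irq_state_t *irqs, size_t count, uint8_t threshold)
{
  int best = -1;
  for (size_t i = 0; i < count; ++i) {
    // Priority 0 is never above any threshold, so it is always skipped.
    if (!irqs[i].pending || irqs[i].priority <= threshold) {
      continue;
    }
    // '>=' lets a later (higher) index win ties.
    if (best < 0 || irqs[i].priority >= irqs[best].priority) {
      best = (int)i;
    }
  }
  return best;
}
```

The returned index is the one that would be used to fetch the handler address from the interrupts table.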
162 | 163 | Each hart may have its own table, with handlers for the interrupts it can process. 164 | 165 | The address of the array must be programmatically written by each hart to 166 | its `hcb.intvta` register before enabling interrupts, usually during startup. 167 | 168 | The first 8 entries are reserved for system interrupts: 169 | 170 | * `context_switch` (must have the lowest priority) 171 | * `rtclock_cmp` 172 | * `sysclock_cmp` 173 | * ... 5 more, reserved 174 | 175 | ## Vector tables relocation 176 | 177 | The starting address used by a RISC-V microcontroller is usually 178 | either a flash memory or a ROM device, and the value cannot be changed at run-time. 179 | However, some applications, like bootloaders or applications running in RAM, 180 | start with the vector tables at one 181 | address and later transfer control to the application located at a different 182 | address. For such cases it is useful to be able to modify or define vector tables 183 | at run-time. In order to handle this, the RISC-V microcontrollers support a feature 184 | called Vector Table Relocation. 185 | 186 | For this, the `hcb.excvta` and `hcb.intvta` registers can be written at any time from 187 | code running in machine mode. 188 | 189 | ## Context stack 190 | 191 | When exceptions/interrupts are taken, they push a context on the current stack. 192 | The stack pointer must be xlen aligned. For RV32 harts with the D extension, 193 | an additional alignment to 8 may be required. 194 | 195 | If the `stackalign` bit in the `ctrl` CSR is set, the stack is always aligned 196 | at 8. Although this is implementation specific, it usually allows faster context 197 | switches. 198 | 199 | The RISC-V microcontroller profile uses a full-descending context stack, where: 200 | 201 | - When pushing context, the hardware decrements the stack pointer to the end of the 202 | new stack frame before it stores data onto the stack. 
203 | - When popping context, the hardware reads the data from the stack frame and then 204 | increments the stack pointer. 205 | 206 | The current stack pointer is either `spt` (when in application mode and the 207 | `ctrl.sptena` is set), 208 | or `spm` otherwise (when already in handler mode or `ctrl.sptena` is not set). 209 | 210 | In other words, regardless of how many nested interrupts occur, there is only one 211 | context pushed onto the thread stack, and all other nested contexts are pushed 212 | onto the main stack. Also 213 | all handlers use the main stack, and do not pollute the thread stack, which 214 | does not need to reserve space for the interrupt handlers. 215 | 216 | For the current RISC-V Linux ABI, the stack context is, from high to low 217 | addresses: 218 | 219 | - <-- original `sp` (`spt` or `spm`) 220 | - (optional padding) 221 | - `status` (CSR, the current mode when the exception/interrupt occurred) 222 | - `pc` (the next address to return from the exception/interrupt) 223 | - `x31/t6` 224 | - `x30/t5` 225 | - `x29/t4` 226 | - `x28/t3` 227 | - `x17/a7` 228 | - `x16/a6` 229 | - `x15/a5` 230 | - `x14/a4` 231 | - `x13/a3` 232 | - `x12/a2` 233 | - `x11/a1` 234 | - `x10/a0` 235 | - `x7/t2` 236 | - `x6/t1` 237 | - `x5/t0` 238 | - `x1/ra` <-- new `sp`, possibly align 8 239 | 240 | With the new RISC-V EABI proposal, this would be reduced to a more 241 | reasonable context stack: 242 | 243 | - <-- original `sp` (`spt` or `spm`) 244 | - (optional padding) 245 | - `status` (CSR, the current mode when the exception/interrupt occurred) 246 | - `pc` (the next address to return from the exception/interrupt) 247 | - `x15/a5` 248 | - `x14/a4` 249 | - `x13/a3` 250 | - `x12/a2` 251 | - `x11/a1` 252 | - `x10/a0` 253 | - `x1/ra` <-- new `sp`, possibly align 8 254 | 255 | With floating point support added, the context stack for the current RISC-V 256 | Linux ABI is quite large, which is another good reason why the RISC-V 257 | microcontroller profile should use
an optimised Embedded ABI. 258 | 259 | - <-- original `sp` (`spt` or `spm`) 260 | - (optional padding) 261 | - `fcsr` (\*) <- for double, it must be aligned to 8 262 | - `f31/ft11` (\*) 263 | - `f30/ft10` (\*) 264 | - `f29/ft9` (\*) 265 | - `f28/ft8` (\*) 266 | - `f17/fa7` (\*) 267 | - `f16/fa6` (\*) 268 | - `f15/fa5` (\*) 269 | - `f14/fa4` (\*) 270 | - `f13/fa3` (\*) 271 | - `f12/fa2` (\*) 272 | - `f11/fa1` (\*) 273 | - `f10/fa0` (\*) 274 | - `f7/ft7` (\*) 275 | - `f6/ft6` (\*) 276 | - `f5/ft5` (\*) 277 | - `f4/ft4` (\*) 278 | - `f3/ft3` (\*) 279 | - `f2/ft2` (\*) 280 | - `f1/ft1` (\*) 281 | - `f0/ft0` (\*) 282 | - `status` (CSR, the current mode when the exception/interrupt occurred) 283 | - `pc` (the next address to return from the exception/interrupt) 284 | - `x31/t6` 285 | - `x30/t5` 286 | - `x29/t4` 287 | - `x28/t3` 288 | - `x17/a7` 289 | - `x16/a6` 290 | - `x15/a5` 291 | - `x14/a4` 292 | - `x13/a3` 293 | - `x12/a2` 294 | - `x11/a1` 295 | - `x10/a0` 296 | - `x7/t2` 297 | - `x6/t1` 298 | - `x5/t0` 299 | - `x1/ra` <-- new `sp`, possibly align 8 300 | 301 | (\*) The floating point registers are not saved by devices that do not 302 | implement the 303 | F or D extensions and do not have the `ctrl.fpena` bit set. 304 | 305 | To reduce latency, in parallel with saving the registers, the address of the 306 | exception/interrupt handler is fetched from the vector table. 307 | 308 | After saving the context stack: 309 | 310 | - the `handler` bit in the `status` register is set, to mark the handler-mode 311 | - the `ra` register is loaded with a special HANDLER_RETURN pattern, 312 | defined below 313 | - the `pc` register is loaded with the handler address; this is equivalent 314 | with calling the handler. 315 | 316 | When the C/C++ function returns, the return code will load `pc` with the 317 | special HANDLER_RETURN value in `ra`. 
318 | This will trigger the exception return mechanism, which will pop the context 319 | from the stack and return from the interrupt/exception. 320 | 321 | TODO: define the detailed logic in pseudocode. 322 | 323 | ## The HANDLER_RETURN pattern 324 | 325 | The special HANDLER_RETURN pattern is an 'all-1' for the given xlen with 326 | some bits used to differentiate contexts. 327 | Since the RISC-V microcontroller profile reserves a slice at the very end 328 | of the memory space (0xF...), and this slice has the execute permissions 329 | removed, it does not create any confusion. 330 | 331 | This value is generated at exception entrance and is stored in the return 332 | address register (`ra`). 333 | 334 | The HANDLER_RETURN pattern bits: 335 | 336 | | Bits | Value | Description | 337 | |:-----|:------|-------------| 338 | | [0] | 1 | Reserved. | 339 | | [1] | 0 | Reserved. | 340 | | [2] | - 0: main stack
- 1: thread stack | Stack that holds the context to pop. | 341 | | [3] | - 0: short, without FP
- 1: long, with FP | Stack frame type. | 342 | | [4] | - 0: Linux
- 1: Embedded | ABI | 343 | | [(xlen-1):5] | 1 | Reserved. | 344 | 345 | > The ABI bit is used mainly for compatibility reasons, until the EABI 346 | will be finalised and implemented by the compiler. 347 | 348 | > The HANDLER_RETURN pattern does not include a bit defining the 349 | resulting 350 | application/handler mode, since it can be restored from the saved 351 | `status` register. Saving this register is necessary not only for the 352 | `handler` bit (which might have been added to HANDLER_RETURN), but for the 353 | `cause` field, which otherwise may be overridden by nested interrupts. 354 | 355 | > There is also a [proposal](https://github.com/emb-riscv/specs-markdown/issues/3) 356 | to use the lowest bits of the address and to slightly adjust JALR. 357 | 358 | ## The FP lazy stacking mechanism 359 | 360 | The large number of floating point registers take a long time to copy 361 | during context push/pop on the stack. 362 | 363 | One solution to optimize this is to save them only when needed, by using a 364 | lazy stacking mechanism. 365 | 366 | TODO: define the details. 367 | 368 | ## Tail chaining 369 | 370 | When an exception/interrupt takes place while already in handler mode, and the 371 | priority does not require pre-emption, the new exception/interrupt will enter the 372 | pending state. When the hart finishes executing the current handler, it can then 373 | proceed to process the pending exception/interrupt request. Instead of restoring 374 | the registers back from the stack (unstacking) and then pushing them on to the 375 | stack again (stacking), the hart skips the unstacking and stacking steps and 376 | enters the new handler of the pending exception/interrupt as soon as possible. 377 | 378 | TODO: define the details. 
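
Until the details are defined, the intended saving can be illustrated with a small software model that only counts context stacking and unstacking operations. Everything in it is an assumption made for the sketch: the `chain_stats_t` type, the `run_handlers` helper, and the simplification that the requests are serviced back to back with no preemption between them.

```c
#include <stdbool.h>

// Counters collected by the model; names are hypothetical.
typedef struct {
  unsigned pushes;   // context stacking operations
  unsigned pops;     // context unstacking operations
  unsigned handlers; // handlers executed
} chain_stats_t;

// Models `requests` handlers run back to back, none preempting another.
// With tail chaining, the pop/push pair between two consecutive
// handlers is skipped.
static chain_stats_t
run_handlers(unsigned requests, bool tail_chaining)
{
  chain_stats_t s = { 0, 0, 0 };
  if (requests == 0) {
    return s;
  }
  s.pushes++; // stack the context when entering the first handler
  for (unsigned i = 0; i < requests; ++i) {
    s.handlers++;
    bool more_pending = (i + 1 < requests);
    if (more_pending && !tail_chaining) {
      s.pops++;   // unstack the context ...
      s.pushes++; // ... only to stack it again immediately
    }
  }
  s.pops++; // unstack once when the last handler returns
  return s;
}
```

In this model, three chained requests cost one push and one pop with tail chaining, versus three of each without it.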
 379 | 380 | ## Usage 381 | 382 | ```c 383 | extern "C" { 384 | 385 | riscv_startup_block_t 386 | __attribute__((section(".startup_blocks"))) 387 | harts_startup_blocks[] = { 388 | { 389 | hart_startup, 390 | hart_stack_pointer, 391 | hart_global_pointer, 392 | hart_exception_handlers // <-- 393 | } 394 | }; 395 | 396 | // The exception vector table address is automatically set during startup. 397 | riscv_exception_handler_t 398 | hart_exception_handlers[] = { 399 | exception_handle_address_misaligned, 400 | exception_handle_address_fault, 401 | exception_handle_illegal_instruction, 402 | // ... 403 | }; 404 | 405 | // An example of an exception handler. Plain C function. May return. 406 | void 407 | exception_handle_address_misaligned() 408 | { 409 | // ... 410 | } 411 | 412 | // ... 413 | 414 | [[noreturn]] void 415 | hart_startup(void) 416 | { 417 | // ... 418 | // Set the interrupt vector table address. 419 | hcb.intvta = hart_interrupt_handlers; 420 | // ... 421 | } 422 | 423 | riscv_interrupt_handler_t 424 | hart_interrupt_handlers[] = { 425 | interrupt_handle_context_switch, 426 | interrupt_handle_rtclock_cmp, 427 | interrupt_handle_sysclock_cmp, 428 | // ... 429 | }; 430 | 431 | // ... 432 | 433 | // An example of an interrupt handler. Plain C function. 434 | void 435 | interrupt_handle_sysclock_cmp(void) 436 | { 437 | // ... 438 | 439 | // Simply returns without having to do anything special. 440 | } 441 | 442 | } // extern "C" 443 | ``` 444 | -------------------------------------------------------------------------------- /interrupts-use-cases.md: -------------------------------------------------------------------------------- 1 | # Appendix B: Interrupts use cases 2 | 3 | Regardless of how the interrupts are implemented, any architecture design should 4 | be checked for how well it accommodates the common use cases.
5 | 6 | ## Peripherals vs scheduler interrupts 7 | 8 | By design, old microcontroller architectures expected interrupts to be 9 | triggered only occasionally by peripherals. 10 | 11 | Since Cortex-M, interrupts were extensively enhanced with features to 12 | support the implementation of RTOSes, greatly simplifying context switches 13 | and preemption. 14 | 15 | ### Peripheral interrupts 16 | 17 | Although there were opinions that peripheral interrupts should be as simple 18 | as possible to be fully inlined, a well structured application may use 19 | drivers from a separate library/package, so the typical use case is to 20 | have the interrupt handlers in files specific to the application 21 | and call the driver interrupt service routine via a plain C/C++ call. 22 | 23 | The traditional approach is to have the interrupt handler annotated 24 | with the `interrupt` attribute, which generates a fully functional 25 | interrupt handler, including preserving registers and returning 26 | from interrupt. 
27 | 28 | ```c 29 | # include "driver-xyz.h" 30 | 31 | void __attribute__((interrupt)) 32 | interrupt_handle_xyz(void) 33 | { 34 | driver_xyz_interrupt_service_routine(); 35 | } 36 | ``` 37 | 38 | The problem with this approach is that on a RISC-V device, 39 | with the current POSIX ABI, the number of 40 | registers to be saved by the caller is large, 41 | and the generated 42 | code, with `-march=rv64gc -mabi=lp64d` looks like: 43 | 44 | ``` 45 | .option nopic 46 | .text 47 | .align 1 48 | .globl interrupt_handle_xyz 49 | .type interrupt_handle_xyz, @function 50 | 51 | interrupt_handle_xyz: 52 | addi sp,sp,-288 53 | 54 | sd ra,280(sp) 55 | sd t0,272(sp) 56 | sd t1,264(sp) 57 | sd t2,256(sp) 58 | sd a0,248(sp) 59 | sd a1,240(sp) 60 | sd a2,232(sp) 61 | sd a3,224(sp) 62 | sd a4,216(sp) 63 | sd a5,208(sp) 64 | sd a6,200(sp) 65 | sd a7,192(sp) 66 | sd t3,184(sp) 67 | sd t4,176(sp) 68 | sd t5,168(sp) 69 | sd t6,160(sp) 70 | fsd ft0,152(sp) 71 | fsd ft1,144(sp) 72 | fsd ft2,136(sp) 73 | fsd ft3,128(sp) 74 | fsd ft4,120(sp) 75 | fsd ft5,112(sp) 76 | fsd ft6,104(sp) 77 | fsd ft7,96(sp) 78 | fsd fa0,88(sp) 79 | fsd fa1,80(sp) 80 | fsd fa2,72(sp) 81 | fsd fa3,64(sp) 82 | fsd fa4,56(sp) 83 | fsd fa5,48(sp) 84 | fsd fa6,40(sp) 85 | fsd fa7,32(sp) 86 | fsd ft8,24(sp) 87 | fsd ft9,16(sp) 88 | fsd ft10,8(sp) 89 | fsd ft11,0(sp) 90 | 91 | call driver_xyz_interrupt_service_routine 92 | 93 | ld ra,280(sp) 94 | ld t0,272(sp) 95 | ld t1,264(sp) 96 | ld t2,256(sp) 97 | ld a0,248(sp) 98 | ld a1,240(sp) 99 | ld a2,232(sp) 100 | ld a3,224(sp) 101 | ld a4,216(sp) 102 | ld a5,208(sp) 103 | ld a6,200(sp) 104 | ld a7,192(sp) 105 | ld t3,184(sp) 106 | ld t4,176(sp) 107 | ld t5,168(sp) 108 | ld t6,160(sp) 109 | fld ft0,152(sp) 110 | fld ft1,144(sp) 111 | fld ft2,136(sp) 112 | fld ft3,128(sp) 113 | fld ft4,120(sp) 114 | fld ft5,112(sp) 115 | fld ft6,104(sp) 116 | fld ft7,96(sp) 117 | fld fa0,88(sp) 118 | fld fa1,80(sp) 119 | fld fa2,72(sp) 120 | fld fa3,64(sp) 121 | fld fa4,56(sp) 122 | fld 
fa5,48(sp) 123 | fld fa6,40(sp) 124 | fld fa7,32(sp) 125 | fld ft8,24(sp) 126 | fld ft9,16(sp) 127 | fld ft10,8(sp) 128 | fld ft11,0(sp) 129 | 130 | addi sp,sp,288 131 | mret 132 | 133 | .size interrupt_handle_xyz, .-interrupt_handle_xyz 134 | ``` 135 | 136 | Simpler devices, without hardware FP, have slightly shorter code, 137 | but still lots of registers (`-march=rv32i -mabi=ilp32`): 138 | 139 | ``` 140 | .option nopic 141 | .text 142 | .align 2 143 | .globl interrupt_handle_xyz 144 | .type interrupt_handle_xyz, @function 145 | 146 | interrupt_handle_xyz: 147 | addi sp,sp,-64 148 | 149 | sw ra,60(sp) 150 | sw t0,56(sp) 151 | sw t1,52(sp) 152 | sw t2,48(sp) 153 | sw a0,44(sp) 154 | sw a1,40(sp) 155 | sw a2,36(sp) 156 | sw a3,32(sp) 157 | sw a4,28(sp) 158 | sw a5,24(sp) 159 | sw a6,20(sp) 160 | sw a7,16(sp) 161 | sw t3,12(sp) 162 | sw t4,8(sp) 163 | sw t5,4(sp) 164 | sw t6,0(sp) 165 | 166 | call driver_xyz_interrupt_service_routine 167 | 168 | lw ra,60(sp) 169 | lw t0,56(sp) 170 | lw t1,52(sp) 171 | lw t2,48(sp) 172 | lw a0,44(sp) 173 | lw a1,40(sp) 174 | lw a2,36(sp) 175 | lw a3,32(sp) 176 | lw a4,28(sp) 177 | lw a5,24(sp) 178 | lw a6,20(sp) 179 | lw a7,16(sp) 180 | lw t3,12(sp) 181 | lw t4,8(sp) 182 | lw t5,4(sp) 183 | lw t6,0(sp) 184 | 185 | addi sp,sp,64 186 | mret 187 | 188 | .size interrupt_handle_xyz, .-interrupt_handle_xyz 189 | ``` 190 | 191 | On the other hand, modern designs use plain C functions as interrupt 192 | handlers, and in this case the generated code looks definitely 193 | better: 194 | 195 | ``` 196 | .option nopic 197 | .text 198 | .align 2 199 | .globl interrupt_handle_xyz 200 | .type interrupt_handle_xyz, @function 201 | 202 | interrupt_handle_xyz: 203 | tail driver_xyz_interrupt_service_routine 204 | 205 | .size interrupt_handle_xyz, .-interrupt_handle_xyz 206 | ``` 207 | 208 | For this style of handlers to work, it is still necessary to save/restore 209 | the ABI caller registers outside the handler; this can be done either in 210 | 
hardware, or, for cheap devices, in software. 211 | 212 | ### Context switches 213 | 214 | In a multi-threaded environment, a context switch is generally 215 | a sequence of operations performing the following steps: 216 | 217 | - interrupt the current thread 218 | - save the state of the current thread in the thread control block (TCB) 219 | - select the next thread to run 220 | - restore the state of the new thread from the selected TCB 221 | - resume execution in the context of the new thread 222 | 223 | #### Cooperative vs preemptive 224 | 225 | In a cooperative environment, threads deliberately pass control to other 226 | threads either by directly issuing a `yield()` call, or indirectly 227 | by calling a system function that internally yields. 228 | 229 | In a cooperative environment, user interrupt handlers are regular handlers: 230 | they interrupt the currently running code (thread or interrupt), 231 | perform some operations, and return in exactly the same context. 232 | 233 | Preemptive environments improve response time by extending some of 234 | the interrupt handlers with code that also performs context switches, 235 | such that the interrupt occurs in the context of one thread but 236 | returns in the context of another thread. 237 | 238 | Traditional interrupt handlers need to be changed from the simple 239 | implementation that calls the peripheral ISR: 240 | 241 | ```c 242 | void __attribute__((interrupt)) 243 | interrupt_handle_xyz(void) 244 | { 245 | driver_xyz_interrupt_service_routine(); 246 | } 247 | ``` 248 | 249 | ... to something like this: 250 | 251 | ```c 252 | static inline __attribute__((naked, always_inline)) 253 | stack_elem_t* 254 | save_context(void) 255 | { 256 | // Assembly code to push all registers onto the thread stack 257 | // ... 
258 | return sp; 259 | } 260 | 261 | static inline __attribute__((naked, always_inline)) 262 | void 263 | restore_context(stack_elem_t* sp) 264 | { 265 | // Assembly code to pop all registers from the thread stack 266 | // ... 267 | } 268 | 269 | 270 | void __attribute__((naked)) 271 | interrupt_handle_xyz(void) 272 | { 273 | stack_elem_t* sp = save_context(); // Push all registers onto the thread stack 274 | 275 | driver_xyz_interrupt_service_routine(); 276 | 277 | if (must_switch_context) 278 | { 279 | sp = scheduler_select_next_thread(sp); 280 | } 281 | restore_context(sp); // Pop all registers from the thread stack 282 | return_from_interrupt(); 283 | } 284 | ``` 285 | 286 | The complexity varies from RTOS to RTOS, and in real life it must also include 287 | some critical sections, but the general framework is highly similar to the 288 | above; it requires significant changes in the user code and it is not simple. 289 | 290 | #### Dedicated context switch interrupt 291 | 292 | In modern RTOS-friendly architectures, the context switch is delegated 293 | to a single dedicated interrupt, implemented in the system part, such 294 | that user interrupt handlers no longer need to worry about this and 295 | can be written directly in C/C++: 296 | 297 | ```c 298 | void 299 | interrupt_handle_xyz(void) 300 | { 301 | driver_xyz_interrupt_service_routine(); 302 | } 303 | ``` 304 | 305 | ... 
while the context switch is performed by an interrupt handler like: 306 | 307 | ```c 308 | void 309 | __attribute__((naked)) 310 | interrupt_handle_context_switch(void) 311 | { 312 | stack_elem_t* sp = save_context(); // Push all registers onto the thread stack 313 | 314 | sp = scheduler_select_next_thread(sp); 315 | 316 | restore_context(sp); // Pop all registers from the stack 317 | } 318 | ``` 319 | 320 | For this to work, the context switch interrupt must be guaranteed to have the 321 | lowest priority, such that it is executed after all other interrupts are 322 | completed and the hart/core is about to return to thread mode. 323 | 324 | #### Triggering a context switch 325 | 326 | With such a dedicated interrupt, triggering a context switch is as simple 327 | as pending a software interrupt: 328 | 329 | ```c 330 | hcb.interrupts[CONTEXT_SWITCH_INTERRUPT_NUMBER].status = INTERRUPTS_SET_PENDING; 331 | ``` 332 | 333 | Pending a context switch interrupt can be performed either in other interrupt 334 | handlers (and in this case the switch occurs after all interrupts are 335 | completed), or in thread mode, in the `yield()` function, and in this case 336 | the switch is performed as soon as interrupts are enabled. 337 | 338 | ## Use cases 339 | 340 | Once the mechanism to switch contexts via a dedicated interrupt is defined, 341 | it is easy to imagine that, with a preemptive scheduler, 342 | most peripheral interrupts can trigger context switches, 343 | so it becomes clear that both peripheral and context switch interrupts 344 | should be given equal attention in the design. 345 | 346 | The most general case is when a peripheral interrupt occurs and, while it is 347 | being processed, other interrupts with lower or equal priorities occur too, 348 | wait their turn, and are processed back-to-back; 349 | one of those interrupts requests a context switch, 350 | so the last interrupt in the chain is the context switch interrupt. 
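The general case can be sketched as a host-side simulation in plain C. The names `pending_mask`, `pend()`, `run_handler()` and `dispatch_pending()`, as well as the convention that a larger interrupt number means a lower priority, are assumptions made only for illustration; they are not part of the profile.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical numbering: bit 0 is the most urgent interrupt and the
// context switch interrupt uses the last bit, i.e. the lowest priority.
#define CONTEXT_SWITCH_IRQ 31u

static uint32_t pending_mask; // one bit per pending interrupt
static unsigned trace[32];    // order in which handlers were run
static unsigned trace_len;

static void pend(unsigned irq)
{
  assert(irq <= CONTEXT_SWITCH_IRQ);
  pending_mask |= (UINT32_C(1) << irq);
}

// Stand-in for the interrupt handlers; the handler of the (hypothetical)
// peripheral 5 wakes a thread, so it pends the context switch.
static void run_handler(unsigned irq)
{
  if (trace_len < 32u)
    trace[trace_len++] = irq;
  if (irq == 5u)
    pend(CONTEXT_SWITCH_IRQ);
}

// Run all pending handlers back-to-back, most urgent first.
static void dispatch_pending(void)
{
  while (pending_mask != 0u)
    {
      unsigned irq = 0u;
      while ((pending_mask & (UINT32_C(1) << irq)) == 0u)
        irq++;
      pending_mask &= ~(UINT32_C(1) << irq);
      run_handler(irq);
    }
}
```

Because the context switch interrupt has the lowest priority, it is always the last entry in `trace`, no matter which peripheral handler requested it.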
351 | 352 | From simple to complex, the use cases are: 353 | 354 | ### Single peripheral interrupt, no context switch 355 | 356 | This is the simplest case, when the driver processes the peripheral 357 | data, but does not need to inform the associated thread of the 358 | change, so it does not request a context switch, and after the 359 | interrupt completes, execution returns to the same thread. 360 | 361 | ### Single peripheral interrupt with context switch 362 | 363 | If the driver decides to inform the thread that new data is available, 364 | for example by raising a semaphore, or pushing data onto a queue, it 365 | must pend the context switch interrupt, which will be executed 366 | back-to-back with the peripheral interrupt. 367 | 368 | ### Multiple interrupts with context switch 369 | 370 | If, during the execution of the peripheral interrupt, other 371 | interrupts with lower or equal priority occur, they do not 372 | preempt the current interrupt, but are remembered and, when the current 373 | interrupt completes, are executed in sequence, back-to-back, 374 | including the context switch interrupt, if requested. 375 | 376 | ## Tail chaining 377 | 378 | Given the use cases presented, with virtually all 379 | peripheral interrupts requesting context switches, 380 | it follows that at least two 381 | back-to-back interrupts are highly likely. 382 | 383 | Old architectures that use interrupt handlers annotated 384 | with the `interrupt` attribute simply call the handlers 385 | in sequence, and each handler saves and restores all registers. 386 | 387 | For back-to-back interrupts, the registers restored by 388 | the first interrupt have exactly the same values as those 389 | saved by the second interrupt, so the long list of 390 | register operations is practically useless, but the 391 | compiler does not know this, so the code is not efficient. 
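A rough count makes the waste visible. The sketch below uses the figures from the `rv64gc` listing (16 general plus 20 FP caller registers per annotated handler) and counts only register loads and stores. The function names are illustrative, and the extra work of the context switch itself is deliberately left out.

```c
#include <assert.h>

enum { GENERAL_CALLER_REGS = 16, FP_CALLER_REGS = 20 };

// Annotated handlers: every handler in the chain saves and restores
// the full caller register set on its own.
static unsigned annotated_memory_ops(unsigned handlers)
{
  return handlers * 2u * (GENERAL_CALLER_REGS + FP_CALLER_REGS);
}

// Registers stacked once before the first handler and unstacked once
// after the last one; the cost does not depend on the chain length.
static unsigned chained_memory_ops(unsigned handlers)
{
  assert(handlers > 0u);
  return 2u * (GENERAL_CALLER_REGS + FP_CALLER_REGS);
}

// Loads and stores that only move unchanged values back and forth.
static unsigned redundant_memory_ops(unsigned handlers)
{
  return annotated_memory_ops(handlers) - chained_memory_ops(handlers);
}
```

With these assumptions, a chain of three handlers performs 216 register transfers when each handler is annotated, of which 144 are redundant.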
392 | 393 | For the current RISC-V POSIX ABI, the behaviour is: 394 | 395 | - process the top priority interrupt 396 | - enter annotated handler 397 | - **save 16 general registers and 20 FP registers** 398 | - call the C/C++ functions and return 399 | - **restore 16 general registers and 20 FP registers** 400 | - exit annotated handler 401 | - possibly process other interrupts with lower or similar 402 | priority that occur while in interrupt mode, each of them doing (**N times**) 403 | - enter annotated handler 404 | - **save 16 general registers and 20 FP registers** 405 | - call the C/C++ functions and return 406 | - **restore 16 general registers and 20 FP registers** 407 | - exit annotated handler 408 | - process the `context_switch` interrupt (lowest possible priority) 409 | - enter naked handler 410 | - **save 32 general registers and 32 FP registers** 411 | - save the SP in the current thread control block 412 | - select the next thread to run 413 | - load SP from the new thread control block 414 | - **restore 32 general registers and 32 FP registers** 415 | - exit naked handler 416 | - return from interrupt in the context of the new thread 417 | 418 | In modern designs, which use plain C interrupt handlers, 419 | the registers are saved before entering the first handler 420 | and restored after the first handler, 421 | so the behaviour is significantly more efficient: 422 | 423 | - reserve space for the FP registers, but do not save them 424 | - **save** the ABI caller registers 425 | - call the handler for top priority interrupt 426 | - possibly call other handlers for interrupts with lower or similar 427 | priorities, that occur while in interrupt mode 428 | - call the `context_switch` handler (lowest possible priority) 429 | - save the rest of the general registers (ABI callee) 430 | - save the SP in the current thread control block 431 | - select the next thread to run 432 | - load the SP from the new thread control block 433 | - restore the rest of 
the general registers (ABI callee) 434 | - return from the handler 435 | - **restore** the ABI caller registers 436 | - return from interrupt in the context of the new thread 437 | 438 | ## Lazy FP stacking 439 | 440 | For devices with hardware FP units, the large number of FP 441 | registers may severely impact the interrupt latency. 442 | 443 | Old architectures that use interrupt handlers annotated 444 | with the `interrupt` attribute must always save and 445 | restore all the FP registers, as seen in the example. 446 | 447 | In modern designs, which use plain C interrupt handlers 448 | and save the registers before entering the handlers, 449 | it is possible to use a more efficient mechanism, which 450 | only reserves the space on the thread stack, but does 451 | the actual save only when the first FP instruction is 452 | executed. 453 | 454 | Since most interrupt handlers do not use FP instructions, 455 | the saving/restoring of the FP registers is skipped 456 | entirely, and the interrupt latency is not affected. 457 | 458 | > More details on this mechanism to be added as a separate page. 459 | The design should also consider the ABI callee registers, 460 | handled during context switches. 461 | 462 | ## Conclusions 463 | 464 | When designing a new architecture, the focus should be 465 | to optimise the most common use case, which is a sequence 466 | of 2 or more back-to-back interrupts that in most cases 467 | ends with the context switch interrupt. 
468 | 469 | ### `interrupt` handlers are not efficient 470 | 471 | Although traditional interrupt handlers annotated 472 | with the `interrupt` attribute may seem a solution for 473 | fast interrupts, they are really fast only if everything 474 | is inlined and no other plain C function is called, otherwise 475 | the entire ABI caller registers must be saved and restored, 476 | including the FP registers, and it must be done repeatedly 477 | for each interrupt, the possibilities for tail chaining and 478 | lazy FP stacking not being realistic. 479 | 480 | ### Plain C functions are recommended 481 | 482 | Plain C interrupt handlers are much better suited for the 483 | common use cases and have the following benefits: 484 | 485 | - easier to use in user code 486 | - allow tail chaining without user intervention 487 | - allow lazy FP stacking without user intervention 488 | - save only the ABI caller registers 489 | - save the ABI callee registers only if context switches are triggered 490 | 491 | The preferred implementation is with hardware stacking/unstacking, 492 | but cheap devices can also choose to do the stacking/unstacking 493 | in software, together with vectoring, so from a user 494 | point of view they are similar, the interrupt handlers remain 495 | the same plain C functions. 496 | -------------------------------------------------------------------------------- /improvements-upon-privileged.md: -------------------------------------------------------------------------------- 1 | # Appendix A: Improvements upon RISC-V privileged 2 | 3 | ## Rationale 4 | 5 | As mentioned in RISC-V Volume I, v2.2, the _"RISC-V is a new instruction set architecture 6 | (ISA) that was originally designed to support computer architecture research and education. 7 | ... The RISC-V manual is structured in two volumes. This volume covers the user-level ISA 8 | design, including optional ISA extensions. 
The second volume provides the privileged 9 | architecture."_ 10 | 11 | The RISC-V Volume II, v1.10, mentions: _"... This document describes the RISC-V privileged 12 | architecture, which covers all aspects of RISC-V systems beyond the user-level ISA, 13 | including privileged instructions as well as additional functionality required for 14 | running operating systems and attaching external devices."_ 15 | 16 | This is great news for the GNU/Linux community and for academia, but attempts 17 | to identify in the RISC-V specs how the new design meets the requirements of bare-metal 18 | embedded devices were not very successful; browsing the two docs revealed only some 19 | references to Tensilica and ARC (probably not the most successful embedded architectures), 20 | and some incomplete specs for **the RV32E subset** (which halves the number of general 21 | registers, does not support hardware floating point and makes some counter instructions 22 | optional). 23 | 24 | According to the privileged specs in Volume II, **RISC-V embedded systems share the 25 | exact same definitions as systems running Unix-like operating systems, but they do 26 | not include the "S" (Supervisor) mode features**. 27 | 28 | This strategy does not work very well for real-time systems; for example, **in the 29 | RISC-V interrupt model**, without special measures, **interrupts remain disabled 30 | while executing interrupt handlers**. This may be acceptable for general purpose 31 | Linux kernels, but for hard real-time systems this is generally a no-go, since 32 | **interrupt latency** may end up well above tolerable limits. 33 | 34 | ### The dividing line 35 | 36 | Currently there is no clear understanding of where the dividing line between RISC-V 37 | general purpose and microcontroller devices should be. 
38 | 39 | One possible approach is to start by defining what microcontroller devices are not: 40 | they definitely are not expected to run multi-process applications on top of 41 | Unix-like operating systems. Although some 42 | projects try to challenge this, it is generally agreed that **Unix-like operating 43 | systems DO need virtual memory and supervisor modes** to properly run multi-process 44 | applications. 45 | 46 | After long considerations, the conclusion was that the common and logical dividing 47 | line between the RISC-V privileged profile and a RISC-V microcontroller 48 | profile is the capability to run a full-blown operating system, that uses virtual 49 | memory and supervisor modes (like Unix and derivatives); as such, **RISC-V 50 | microcontrollers are devices 51 | that do not implement a virtual memory system or supervisor modes** and are 52 | intended to run single-process multi-threaded applications only (and are not 53 | intended to run Unix-like systems). 54 | 55 | > [JB] Two more criteria may be used 56 | for dividing microcontrollers and application processors: pipeline 57 | complexity and memory latency. Microcontrollers use simpler, in-order 58 | pipelines and have memory subsystems that are tightly synchronized to 59 | the execution pipeline. ... Out-of-order and parallel 60 | execution are becoming common features in application processors, but 61 | are not used in microcontrollers, since the latter must have predictable 62 | execution timing. [ilg] The pipeline complexity and memory latency should 63 | be implementation specific. Microcontrollers intended for applications that 64 | need predictable execution timings may decide not to implement 65 | out-of-order and parallel execution, or allow to disable them at run time. 
66 | 67 | 68 | ## Improvements upon RISC-V privileged 69 | 70 | The main 'pain-point' with the current RISC-V privileged specs 71 | is the mechanism to handle interrupts, which is not suitable for real-time, 72 | low power, bare-metal embedded applications. 73 | 74 | The following issues were identified in the current RISC-V privileged specs when 75 | used for bare-metal applications: 76 | 77 | | RISC-V Privileged | RISC-V Microcontroller | 78 | |-------------------|------------------------| 79 | | Handlers run with interrupts disabled; low priority interrupts that take a long time to complete may delay high priority interrupts, affecting real-time capabilities. | The microcontroller profile allows nesting; high priority interrupts preempt low priority ones, being processed as fast as possible. | 80 | | There is only a single trap handler, serving all interrupts and exceptions (the so called _vectored_ mode is so complicated to use that it is not even worth mentioning). | The microcontroller profile has an advanced vectored mode; interrupts are dispatched to separate handlers, via a simple array of pointers, easy to define in C/C++. | 81 | | The interrupt code must be written in assembly, to perform the low level stacking/unstacking and return from exception; this code **is** complicated, a good example is the [Linux handler](https://github.com/torvalds/linux/blob/master/arch/riscv/kernel/entry.S), and the current Linux implementation does not even re-enable interrupts while in handler mode. | The microcontroller profile automatically performs the stacking/unstacking, allowing all application interrupt handlers to be written as C/C++ functions, with minimum latency. | 82 | | The current ISA Volume I manual defines a common POSIX ABI to be used by all devices, but this ABI requires the caller to save a lot of registers, making interrupt stacking/unstacking very expensive and increasing latency. 
| Better adapted to real-time, the microcontroller profile defines a lighter Embedded ABI, reducing latency. | 83 | | The privileged profile defines a few hundred CSRs, and encourages implementations to define even more custom CSRs; current debuggers do not have support for proprietary mechanisms like CSRs, and viewing/changing these registers requires unusual hacks. | The microcontroller profile uses a very limited set of CSRs and favours the use of memory mapped registers, which are very well supported by debuggers/IDEs, including via detailed peripheral register viewers. | 84 | | The current RISC-V ISA does not explicitly define a stack (it is only mentioned in the POSIX ABI), and there is no separate stack for interrupts; in a multi-threaded application, interrupts can occur on any thread stack, thus when provisioning for thread stacks, the additional memory requirements of all interrupts must be added to all thread stacks, wasting precious RAM. | The microcontroller profile not only defines the stack pointer register, but also adds a shadow thread stack pointer, separate from the main stack used by the interrupts, improving RTOS implementations and reducing thread stack requirements for RTOS multi-threaded applications. | 85 | | A common cause of crashes during embedded systems development is one of the threads running out of stack space; the specs do not provide a standard way of detecting stack overflows. | The microcontroller profile adds a stack limit register and stack overflows trigger exceptions. | 86 | | The system clock runs from the low frequency real-time clock, which has low resolution and, at common 32768 Hz frequencies, does not allow accurate 1000 Hz scheduler clocks. | The microcontroller profile defines a separate low-power real-time clock and a high accuracy system clock, improving both general clock resolution and scheduler clock accuracy. | 87 | | There is no explicit mechanism to trigger and implement context switches in a multi-threaded RTOS. 
| The microcontroller profile adds a dedicated interrupt, guaranteed to have the lowest priority, to be used for all context switches, relieving all other interrupt handlers from this duty. | 88 | || The microcontroller profile adds an architecture device reset mechanism. | 89 | || The microcontroller profile adds an architecture resumable NMI. | 90 | | The startup code also requires some assembly code, to set the stack pointer and the `gp` register. | The microcontroller profile adds a simplified device startup code, based on a table of standard C/C++ pointers, requiring no assembly code at all. | 91 | 92 | ## Criticism 93 | 94 | ### Fragmentation would break upward compatibility 95 | 96 | While discussing the opportunity for a new RISC-V embedded profile, the most common 97 | concern raised was that migrating an application written for a microcontroller to a 98 | larger application class core would be more difficult. 99 | 100 | Well, yes, in theory it might be possible to design a board in such a way as to allow 101 | swapping in a bigger core, and to design the application in such a way as to ignore the MMU 102 | and the supervisor mode and continue to use an RTOS; the application will probably run 103 | faster due to the improved pipelines and core clock rates, but there are several 104 | practical issues: 105 | 106 | - in industrial embedded applications the processor selection is not based on the 107 | architecture (which in the majority of cases is Cortex-M only), but on the available 108 | on-chip peripherals; it is very unlikely to find an application class core with the 109 | desired peripherals available on a microcontroller; 110 | - application class cores generally do not have internal flash/ram, requiring 111 | external chips; external memory chips require lots of address and data pins, which 112 | means large BGA chips, larger & more complex PCBs, and generally higher costs. 
113 | 114 | So this concern is not realistic, and not accepting a distinct microcontroller 115 | profile, optimised for real-time applications simply for maintaining compatibility 116 | with the privileged specs is not a beneficial approach. 117 | 118 | ### No need to, everything will run Linux in the future 119 | 120 | > "In the future everything will run Linux, so defining separate non-Linux profiles 121 | is a futile exercise." 122 | 123 | Yes. Sure. Eventually. No doubt about it. When waiting long enough many marvellous things can happen. 124 | 125 | However, for those who are not ready to wait for the kingdom come, having simpler 126 | devices for critical real-time applications is a requirement for today. 127 | 128 | ### Automatic stacking/unstacking is evil 129 | 130 | > "Automatic stacking/unstacking is fine for Cortex-M, but it is 131 | very objectionable for 132 | RISC-V. The difference is in ARM's MOVEM instruction. A Cortex-M 133 | already has the hardware to move multiple words to/from the stack 134 | because it has to implement the MOVEM instruction anyway. So doing 135 | this specially on trap entry/exit is a small addition to what is 136 | already required to execute the user instruction set. The story for 137 | RISC-V is different, as it has no MOVEM-like instruction. Therefore, 138 | having the hardware automatically push/pop a collection of registers on 139 | trap entry/exit is a larger addition for RISC-V than it was for ARM." 140 | 141 | Well, that's a point of view. However, it must be noted that even for 142 | the tiny Cortex-M0, so economical in terms of transistors, ARM decided 143 | to do automatic stacking/unstacking, so the added complexity might not 144 | be that high. 145 | 146 | Not to mention another detail: Cortex-M has 16 registers, and the 147 | EABI requires R0-R3, R12, R14, PC and xPSR 148 | to be stacked/unstacked automatically. 
On the other hand, 149 | the LDM/STM instructions, probably due to the tight encoding, 150 | are able to move only half of the registers 151 | (R0-R7), so the logic to do the stacking/unstacking is 152 | definitely more capable than required by the instruction set. 153 | Cortex-M does not have automatic stacking/unstacking by accident, 154 | simply because support for LDM/STM was present anyway; it is 155 | by design. As an exercise of reversed logic, 156 | it might also be argued that Cortex-M has the LDM/STM instructions 157 | because the logic for moving multiple words was already 158 | available from the automatic stacking/unstacking mechanism. 159 | 160 | Also, RISC-V having no MOVEM-like instructions may save a few 161 | transistors, but otherwise this is not exactly a feature; 162 | it simply makes saving contexts in multi-threaded environments more 163 | complicated and possibly less efficient. 164 | 165 | ### Automatic stacking/unstacking the interrupt context increases latency 166 | 167 | The current RISC-V ABI requires the caller to save the following registers: 168 | `ra`, `t0`, `t1`, `t2`, `a0`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `t3`, 169 | `t4`, `t5`, `t6`. This amounts to 16 registers. If floating point is used, 20 170 | more registers must be saved. 171 | 172 | The current RISC-V privileged specs do not define a hardware stack and do not 173 | require the core to save any registers 174 | on the stack, delegating this to the assembly trap handler. 175 | 176 | Some voices claim that this strategy allows the application to have a highly optimised 177 | assembly trap handler, and as such avoid pushing all registers. 
178 | 179 | Well, yes, in theory, for very simple (blinky) applications this might be so, but for 180 | real applications, regardless of how optimised the assembly trap handler is, at a certain 181 | point it'll need to call a C function (for example to access a system service, like 182 | posting to a semaphore), and at this point the entire ABI caller register set must be 183 | pushed onto the stack, and popped after the C function returns. 184 | 185 | The result of this strategy is that the assembly trap handler will initially save a 186 | small number of registers (those known to be used by the handler), then save the rest 187 | of the register set to prepare for the C call, so the full register set must be 188 | saved anyway. 189 | 190 | For the current ABI this still means 16 general registers (plus 20 FP registers when floating point is used), which is a lot. The real problem 191 | here is not the decision to save them automatically (which greatly simplifies the 192 | software), but the current ABI, which is designed for user mode Unix applications. 193 | 194 | The solution is a **separate Embedded ABI (EABI)**, optimised for embedded real-time 195 | applications, with a smaller caller register set. 196 | 197 | ### Automatic stacking/unstacking should be replaced by compiler attribute 198 | 199 | > "Better to have a “handler” function attribute that causes the compiler 200 | to save only and exactly the registers the function modifies. If a handler 201 | function calls a regular C function then it needs to save all the volatile 202 | registers first." 203 | 204 | Well, yes, as argued before, for very simple applications 205 | it is possible to imagine interrupt handlers incrementing a 206 | variable and 207 | returning, but this is rare; the vast majority of interrupt 208 | handlers call C/C++ functions to perform system services, like posting 209 | a semaphore, pushing to a queue, or any other synchronization mechanism. 
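A typical handler of this kind might look like the following sketch. The `semaphore_t` type, `semaphore_post()` and the `context_switch_requested` flag are hypothetical stand-ins for RTOS services, not part of any real API; the point is only that the handler makes an ordinary C call, so the whole ABI caller register set must be preserved around it.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical, minimal counting semaphore; a real RTOS would also
// protect the counters with a critical section.
typedef struct
{
  unsigned count;   // posts not yet consumed by a thread
  unsigned waiters; // threads blocked on the semaphore
} semaphore_t;

static bool context_switch_requested;
static semaphore_t rx_ready;

// Returns true when a waiting thread became ready to run.
static bool semaphore_post(semaphore_t* sem)
{
  assert(sem != 0);
  if (sem->waiters > 0u)
    {
      sem->waiters--;
      return true;
    }
  sem->count++;
  return false;
}

// Plain C handler: the call to semaphore_post() may clobber any of the
// ABI caller registers, so they must be saved before the handler runs.
void interrupt_handle_xyz(void)
{
  if (semaphore_post(&rx_ready))
    context_switch_requested = true;
}
```

A compiler attribute that saves "only the registers the function modifies" cannot help here, since the compiler generally cannot know what `semaphore_post()` touches when it lives in another translation unit.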
210 | 211 | So, by using a custom prolog/epilogue which may decide to first save a few 212 | registers and then save all those required 213 | by the ABI, it might end up with more registers that need to be saved, thus 214 | further worsening the latency. 215 | 216 | > "You consider it too much work to add `__attribute__ ((interrupt))` 217 | to appropriate C functions, as on non-Cortex M ARM32, ARM64, x86 etc?" 218 | 219 | No, adding an attribute is only a minor nuisance and a possible reason 220 | for incompatibilities between compilers. 221 | 222 | However, the question is interesting, because it reveals a common mistake, 223 | it expects microcontrollers to 224 | share the same behaviour with general purpose application devices. 225 | 226 | Unfortunately this is not the case, one major difference is that 227 | microcontrollers have lots of peripherals, each with one or more 228 | interrupts; thus, an embedded application, with or without an RTOS, 229 | is mainly interrupt driven. 230 | 231 | And with the advent of fast devices, like USB, or QSPI, possibly with 232 | DMAs, the number of interrupts may be quite high, and the total time 233 | spent in interrupt mode may be significant. 234 | 235 | Thus the need for a careful design to tackle efficiency issues. 236 | 237 | The very general case is with a sequence of interrupts of decreasing 238 | priorities, that most probably trigger a context switch, on a machine 239 | with hardware floating point. 
240 | 241 | With hardware stacking/unstacking and lazy FP, the expected behaviour 242 | when an interrupt with a priority higher than the threshold occurs, is: 243 | 244 | - reserve space for the FP registers, but do not save them 245 | - save the ABI caller registers 246 | - enter handler for top priority interrupt, which calls 247 | other C/C++ functions and finally returns 248 | - possibly enter other handlers for interrupts with lower or similar 249 | priorities, that occur while in interrupt mode 250 | - enter the `context_switch` handler (lowest possible priority) 251 | - save the rest of the general registers 252 | - save the SP in the current thread control block 253 | - select the top priority thread 254 | - load the SP from the new thread control block 255 | - restore the rest of the general registers 256 | - return from the handler 257 | - restore the ABI caller registers 258 | - return from interrupt in the context of the new thread 259 | 260 | If the interrupt routines do not use FP, the FP registers are not saved, 261 | having no impact on latency; they will be saved before the first 262 | FP instruction is executed. 
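The lazy FP behaviour can be modelled as a small state machine. This is a host-side sketch with hypothetical names (`fp_frame_t`, `on_interrupt_entry()`, `on_first_fp_instruction()`, `on_interrupt_return()`); on a real device the equivalent steps would be performed by hardware, or by a trap taken on the first FP instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { FP_CALLER_REGS = 20 };

typedef struct
{
  double slots[FP_CALLER_REGS]; // space reserved on the thread stack
  bool saved;                   // true once the slots hold live values
} fp_frame_t;

static double fp_regs[FP_CALLER_REGS]; // stand-in for the FP register file
static unsigned fp_store_count;        // number of FP stores performed

// On interrupt entry only the space is reserved; nothing is copied.
static void on_interrupt_entry(fp_frame_t* frame)
{
  assert(frame != 0);
  frame->saved = false;
}

// Called before the first FP instruction executed in interrupt mode.
static void on_first_fp_instruction(fp_frame_t* frame)
{
  if (!frame->saved)
    {
      memcpy(frame->slots, fp_regs, sizeof fp_regs);
      fp_store_count += FP_CALLER_REGS;
      frame->saved = true;
    }
}

// On return the FP registers are reloaded only if they were saved.
static void on_interrupt_return(fp_frame_t* frame)
{
  if (frame->saved)
    memcpy(fp_regs, frame->slots, sizeof fp_regs);
}
```

An interrupt chain that never touches the FP unit pays nothing beyond the stack space, while a chain that does use it pays for the save exactly once.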
263 | 264 | Without hardware stacking/unstacking, without lazy FP and relying only 265 | on the compiler to save/restore the registers, for the current RISC-V 266 | POSIX ABI, the behaviour is: 267 | 268 | - process the top priority interrupt 269 | - enter annotated handler 270 | - **save 16 general registers and 20 FP registers** 271 | - call the C/C++ functions and return 272 | - **restore 16 general registers and 20 FP registers** 273 | - exit annotated handler 274 | - possibly process other interrupts with lower or similar 275 | priority that occur while in interrupt mode, each of them doing (**N times**) 276 | - enter annotated handler 277 | - **save 16 general registers and 20 FP registers** 278 | - call the C/C++ functions and return 279 | - **restore 16 general registers and 20 FP registers** 280 | - exit annotated handler 281 | - process the context_switch interrupt (lowest possible priority) 282 | - enter naked handler 283 | - **save 32 general registers and 32 FP registers** 284 | - save the SP in the current thread control block 285 | - select the top priority thread 286 | - load SP from the new thread control block 287 | - **restore 32 general registers and 32 FP registers** 288 | - exit naked handler 289 | - return from interrupt in the context of the new thread 290 | 291 | As it can be seen, without special precautions, each interrupt 292 | must push/pop the ABI caller registers, even if interrupts 293 | are back to back and the popped registers are immediately 294 | pushed again, possibly several times in a row. 295 | 296 | It might be possible to somehow further optimise this mechanism, 297 | but I seriously doubt that it can be more efficient than the hardware 298 | stacking/unstacking, with tail chaining and lazy FP. 299 | 300 | Challenge: find a solution more efficient than the current proposal, 301 | in terms of time and/or ease of use. 
In practical terms, write a 302 | user interrupt handler that must be able to conditionally perform 303 | a context switch (sometimes it does, sometimes it does not, depending 304 | on the peripheral). 305 | 306 | ### Assembly interrupt handlers should be ok, they reside in the system part 307 | 308 | > "Why insist on having the interrupt handlers written in C, when they can be very well 309 | written in assembly, since they reside in the system part, written by the system 310 | programmers, not by the application programmer." 311 | 312 | Well, this might be the case for Linux, where the kernel and the modules are 313 | indeed written by system gurus, but in embedded bare-metal applications the 314 | interrupt handlers are very application specific and cannot be part of 315 | the RTOS or drivers/libraries, so it is the application programmer who must 316 | write them, not someone else, thus the need for the interrupt handlers to be 317 | as easy to write as possible, the best choice being to have them defined as plain 318 | C/C++ functions. 319 | 320 | > "Interrupt handlers do not need to be entirely in assembler, only 321 | the entry/exit millicode needs to be part of the system. That millicode 322 | *can* be written by system gurus, while the application ISRs, written by 323 | application programmers, are called via the millicode. Unless 324 | there is some faster memory access cycle that the hardware can use, 325 | automatic context save/restore (presumably in microcode) will be no 326 | faster than RISC-V millicode." 327 | 328 | Well, reversing the logic, there is no scenario in which the millicode 329 | will be faster than the microcode; even when there is no faster memory access 330 | for the microcode, the millicode will still have to make a call to the actual 331 | interrupt handler, so the total timing cannot be better. The main difference 332 | is the ease of use: the application programmer will no longer need any guru to 333 | write the millicode. 

### CSRs cannot be memory-mapped

Another almost 'religious' RISC-V issue is related to accessing the system registers.
Before a more elaborate explanation, those who claim this should remember that
the current privileged specs moved `mtime` and `mtimecmp` from CSRs to
memory-mapped registers, and the PLIC specs require all registers to be memory mapped.

Generally, with
the exception of a very limited set of special cases, industry standard
architectures map most of the system peripherals and registers to a memory area.
Instead, the
RISC-V ISA defines several special instructions that allow addressing 4096 per-hart
registers.

It is generally agreed that for application class devices, with complex out-of-order
pipelines, the current mechanism has several
advantages. Unfortunately, the RISC-V privileged specs abused this
mechanism, and now there are several hundred registers defined in this proprietary
space, some of them even read-only, and obviously creating no security threats (like
`mvendorid`, `marchid`, etc).

From the point of view of a microcontroller profile, this mechanism of accessing
the system registers has two main disadvantages:

- it requires assembly code to access each individual register
- it is not supported by current development tools (debuggers have no way of accessing
these registers, IDEs have no special views for them, etc).

Mapping system registers in the memory space is perfectly possible, and the RISC-V
privileged specs even mandate that some registers, like `mtime` and `mtimecmp`,
not to mention the PLIC, be memory mapped.
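
To make the first disadvantage listed above more concrete, here is a minimal sketch contrasting the two access styles. It is only an illustration: the `sysregs_t` layout, the `SYSREGS` pointer, the helper names and the initial values are hypothetical and not taken from any specification. On a real device `SYSREGS` would be a fixed address from the memory map; here it points to an ordinary variable, so that the memory-mapped part can also be compiled and run on a host.

```c
#include <stdint.h>

/* CSR style: the register number is encoded in the instruction itself,
 * so every register needs its own piece of inline assembly.
 * Compiled only when targeting RISC-V. */
#if defined(__riscv)
static inline unsigned long read_mvendorid(void) {
  unsigned long value;
  __asm__ volatile("csrr %0, mvendorid" : "=r"(value));
  return value;
}
#endif

/* Memory-mapped style: a plain C struct describes the registers.
 * Hypothetical layout, for illustration only. */
typedef struct {
  volatile uint32_t vendorid; /* identification, read-only in hardware */
  volatile uint32_t archid;   /* identification, read-only in hardware */
  volatile uint32_t ctrl;     /* some read/write control register */
} sysregs_t;

/* Host stand-in for the hardware block; the values are arbitrary.
 * On a device this would be something like ((sysregs_t *)0x...), with
 * the address taken from the memory map. */
static sysregs_t sysregs_storage = { 0x12345678u, 0x1u, 0x0u };
#define SYSREGS (&sysregs_storage)

/* Plain C accessors, no assembly involved. */
static inline uint32_t sysregs_get_vendorid(void) {
  return SYSREGS->vendorid;
}

static inline void sysregs_set_ctrl_bits(uint32_t mask) {
  SYSREGS->ctrl |= mask;
}
```

With the memory-mapped form, the registers are accessed like any other peripheral, which is also what allows debuggers and IDE peripheral views to display them without special support.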

However, from a technical point of view, for virtual memory systems, accessing system
registers from code running in user mode requires 'punching' some holes into the
virtual memory space to reach the special memory-mapped registers, which adds some
complexity, and may wreak havoc on the pipelines. But, since the specs require this
mechanism for `mtime` and `mtimecmp`,
it no longer matters if there are two or more such memory-mapped registers.

Fortunately, microcontrollers running without an MMU do not have this problem;
accessing memory-mapped registers is usual, and the cost of doing so is
perfectly acceptable.

In addition, in the microcontroller profile there are _no_ hardware security
boundaries, so the risk of attacks somehow exploiting the CSR-as-MMIO is a
non-issue.

### The hardware stack limit register is expensive

> "The stack limit register needs to be read and compared on every
store via the stack register so it should have dedicated read circuit
and comparator."

Yes, it is a small price to pay, but by far the most common cause of crashes
in a multi-threaded device is stack overflow, so detecting this condition
should be worth the extra price.

### Microcontrollers do not need privilege levels

> If microcontrollers do not run a kernel, why have privilege levels?

It is true that microcontrollers do not run a 'unix kernel' (they run a 'scheduler').
But for some security-conscious applications, microcontrollers can run the
application code in unprivileged mode and the scheduler/drivers in
privileged mode.

ARM Cortex-M devices can run code in unprivileged mode, and new
Cortex-M23/M33 devices even have a TrustZone security feature.
Also most of the Cortex-M devices have an MPU, which prevents unprivileged
code from accessing system memory/registers.

Using the unprivileged mode is not at all unusual;
[ARM CMSIS](http://www.keil.com/pack/doc/CMSIS/General/html/index.html), the industry
software standard for Cortex-M devices, includes a component called CMSIS RTOS, and
the reference implementation is
[Keil RTX](http://www.keil.com/pack/doc/CMSIS/RTOS/html/rtxImplementation.html),
which by default runs application code in unprivileged mode.

It is true that, despite all the ARM marketing, RTX is not the most successful RTOS,
but even FreeRTOS has a mode in which the MPU can be activated.

### C embedded system programmers vs C embedded application programmers

> C embedded systems programmers might be used to accessing peripheral via
registers, but C embedded application programmers are used to accessing peripherals
via system calls. C embedded system programmers are also used to writing ISRs in assembly.

The distinction between system and application programmers stands perfectly true for
Unix-like systems, where a small team of highly experienced system programmers write
the low level kernel code and the device drivers, allowing millions of application
programmers to access all required resources via system calls, without bothering
with details.

However, in the embedded world, this distinction is almost non-existent; C embedded
programmers are both application and system programmers. On one hand they need
full and unlimited control of the hardware, and on the other hand they would
like to have access from high level C/C++ code.

Having to use assembly code is definitely not a joy for modern embedded programmers,
especially since Cortex-M came to market in 2004 and made it possible to write interrupt
handlers directly in C/C++, without any assembly stubs, millicodes or compiler
attributes/pragmas.

### Comparisons with ARM are meaningless

> "Arguments like 'ARM does this' are very weak for a feature in RISC-V"

Well, the creators of the RISC-V instruction set probably have every reason to be
proud of their design, and many in academia may consider that application
class devices based on RISC-V may very well make similar ARM devices irrelevant,
but in the embedded space, ARM Cortex-M **is** the industry standard, and
disregarding it is not beneficial.

Except for the automotive industry, which is more conservative, where old proprietary
cores still have a significant market share (but are losing it), and some very
cost driven applications,
where 8-bit microcontrollers are considered still good enough, the majority of the
silicon vendors
now sell microcontrollers based on Cortex-M cores; the trend is clear,
it has been observed for more than 10 years, and the Cortex-M market share is
expected to continue to increase in the years to come.

ARM tried to sell licenses for microcontrollers even before the Cortex-M family
was created, but with very limited success. The devices were very similar to
their application cores, and used the same solutions, for example a single
interrupt handler, and lots of assembly code required to start and make use of
the core.

There may be multiple reasons why Cortex-M was so successful, but the main one
probably is the ease of use, and the C-friendliness, by design.

This lessened the need for a C system programmer to act as guru, and allowed
C application programmers to fully take control of their applications.

> "RISC-V microcontrollers should compare to PIC or AVR devices"

This was probably true 10-15 years ago, but today it is no longer the case.
Not only has the industry migrated to 32-bit cores, but the ecosystems around
Cortex-M and the ease of use made most of the other cores irrelevant.

There is also another fact to be noted: according to several studies, the
worldwide population of programmers is doubling every few years (let's
say N, less than 10). Assuming a constant share for the embedded programmers,
statistically half of them have less than N years of experience, and most
of these new (relatively inexperienced) programmers met Cortex-M as their
first architecture (which is already 14 years old!). They may have heard of
PIC and AVR, but never had to write assembly interrupt handlers, and
asking them to do so will obviously be seen as a major step backward.

### Microcontrollers should not be on networks

> "Generally, microcontrollers should probably not be on networks, except
possibly for multi-core versions that can handle real-time tasks on one
core and network latency on the other."

Yes, multi-hart devices would be excellent for hard real-time applications,
by allocating separate harts for each critical task,
but with nested, pre-emptive high priority interrupts, even a single hart device
can handle multiple tasks very well, and if the real-time tasks are driven by ISRs,
then the network stack can run at a lower priority.

### 64-bit microcontrollers will never be needed

> "Why 64-bit? Any system big enough to need more
than 32-bit addressing is probably already running an operating system
like Linux."

That's a good question. At the time when the 8051 was king, many questioned
why anyone would think of 16-bit microcontrollers. While some were
debating this, vendors gradually offered devices with 12 address
bits, then 16 bits, even 24 bits. Cortex-M came boldly and provided
32-bit registers and a large (32-bit) linear address space.

Although a 4 GiB memory space may be enough for most current devices,
it should be noted that 64-bit devices bring not only a wider memory
space, but also 64-bit registers, and native atomic 64-bit accesses.

Applications with lots of integer arithmetic may benefit from 64-bit
cores, and, indirectly, applications manipulating double-precision floating-point
numbers may also benefit.

Also applications with large and fast timers benefit from atomic 64-bit
accesses, which otherwise require a lot of juggling on a 32-bit platform
(see the recommended RISC-V mechanism to access the timer registers on
a 32-bit device).

## Proposed steps to change the current RISC-V specs

It is not realistic to expect a new set of RISC-V microcontroller specs to be
adopted overnight. However, given the expected ratification of the current specs by
the RISC-V Foundation, it is quite urgent to ensure that this process will not block
further developments in the embedded/microcontroller space.

This will probably require several steps, but the main ones are:

- acknowledge that microcontroller devices have different requirements
compared to systems running Unix-like operating systems
- acknowledge that the solutions provided by the current privileged mode
specs are not optimal for real-time, low power, bare-metal embedded applications
- acknowledge the need for changes in the current specs
- relax the requirements for the privileged specs
- create new specs for the microcontroller profile

### Acknowledge the need for the changes

Given the current structure of the RISC-V Foundation, with most of the efforts
focused on finalising the specifications required for running Unix-like
operating systems, acknowledging that the specifications for general
purpose devices do not work very well for real-time systems will be a challenge.

### De-entangle the privileged specs

The RISC-V Volume II, v1.10, mentions: _"... the entire privileged-level design
described in this document could be replaced with an entirely different
privileged-level design without changing the user-level ISA, and possibly without
even changing the ABI. In particular, this privileged specification was designed
to run existing popular operating systems, and so embodies the conventional
level-based protection model. Alternate privileged specifications could embody
other more flexible protection-domain models."_

So, at least in theory, it should be possible to extend the specs, but in
practice it is not clear how exactly this can be done. Ideally, **the Volume
I should not explicitly refer to Volume II**, or should refer to it as optional,
leaving room for a complementary specification for other classes of devices,
including microcontroller devices.

As an aside, the RISC-V ISA specs provide a very high degree of flexibility,
allowing for custom extensions for the instruction set, but they are still very
rigid by insisting that all these devices should be able to run Unix-like
operating systems.

### Move all mandatory CSRs to the privileged specs

Apart from relaxing the need for the privileged specs, the instruction set defined
by Volume I is generally acceptable for microcontroller devices.

The only notable exception is the `rdcycle`, `rdtime` and `rdinstret` instructions
in Chapter 2.8, which should be moved to Volume II.

Related to these instructions, the list of CSRs defined in Table 19.3 should be
shortened, by moving the `cycle`, `time` and `instret` to Volume II, allowing for
microcontrollers to define a more efficient set of mandatory registers.

### Remove the POSIX ABI from Volume I

Another important issue with the current specs is the mandatory use of the POSIX ABI,
which is too expensive for real-time devices.

The solution is to move it either to Volume II, or to a separate assembly
programmer's handbook, and allow a microcontroller profile to define
an EABI (Embedded ABI), as a lighter version of the POSIX ABI.

--------------------------------------------------------------------------------