├── .gitignore ├── 0000-template.md ├── LICENSE ├── README.md └── text ├── 0001-namespace-syscalls.md ├── 0002-event-overhaul.md ├── 0003-channels.md ├── 0004-ptrace.md ├── 0005-scheme-forward-fds.md ├── 0006-scheme-path.md ├── 0007-base-system-repo.md ├── 0008-userspace-signals.md └── 0009-namespace-scheme.md /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | -------------------------------------------------------------------------------- /0000-template.md: -------------------------------------------------------------------------------- 1 | - Feature Name: (fill me in with a unique ident, my_awesome_feature) 2 | - Start Date: (fill me in with today's date, YYYY-MM-DD) 3 | - RFC PR: (leave this empty) 4 | - Redox Issue: (leave this empty) 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | One paragraph explanation of the feature. 10 | 11 | # Motivation 12 | [motivation]: #motivation 13 | 14 | Why are we doing this? What use cases does it support? What is the expected outcome? 15 | 16 | # Detailed design 17 | [design]: #detailed-design 18 | 19 | This is the bulk of the RFC. Explain the design in enough detail for somebody familiar 20 | with the language to understand, and for somebody familiar with the compiler to implement. 21 | This should get into specifics and corner-cases, and include examples of how the feature is used. 22 | 23 | # Drawbacks 24 | [drawbacks]: #drawbacks 25 | 26 | Why should we *not* do this? 27 | 28 | # Alternatives 29 | [alternatives]: #alternatives 30 | 31 | What other designs have been considered? What is the impact of not doing this? 32 | 33 | # Unresolved questions 34 | [unresolved]: #unresolved-questions 35 | 36 | What parts of the design are still TBD? 
37 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Redox OS 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Redox RFCs 2 | [Redox RFCs]: #redox-rfcs 3 | 4 | Many changes, including bug fixes and documentation improvements can be implemented and reviewed via the normal GitLab merge request workflow. 5 | 6 | Some changes though are "substantial", and we ask that these be put through a bit of a design process and produce a consensus among the Redox community. 
7 | 8 | The "RFC" (request for comments) process is intended to provide a consistent and controlled path for new features to enter, so that all stakeholders can be confident about the direction the OS is evolving in. 9 | 10 | ## When you need to follow this process 11 | [When you need to follow this process]: #when-you-need-to-follow-this-process 12 | 13 | You need to follow this process if you intend to make "substantial" changes to 14 | Redox, Cargo, Crates.io, or the RFC process itself. What constitutes a 15 | "substantial" change is evolving based on community norms and varies depending 16 | on what part of the ecosystem you are proposing to change. 17 | 18 | Some changes do not require an RFC: 19 | 20 | - Rephrasing, reorganizing, refactoring, or otherwise "changing shape 21 | does not change meaning". 22 | - Additions that strictly improve objective, numerical quality 23 | criteria (warning removal, speedup, better software compatibility, more 24 | parallelism, trap more errors, etc.) 25 | invisible to users-of-redox. 26 | 27 | If you submit a merge request to implement a new feature without going 28 | through the RFC process, it may be closed with a polite request to 29 | submit an RFC first. 30 | 31 | ## Before creating an RFC 32 | [Before creating an RFC]: #before-creating-an-rfc 33 | 34 | A hastily-proposed RFC can hurt its chances of acceptance. Low quality 35 | proposals, proposals for previously-rejected features, or those that 36 | don't fit into the near-term roadmap, may be quickly rejected, which 37 | can be demotivating for the unprepared contributor. Laying some 38 | groundwork ahead of the RFC can make the process smoother. 39 | 40 | Although there is no single way to prepare for submitting an RFC, it 41 | is generally a good idea to pursue feedback from other project 42 | developers beforehand, to ascertain that the RFC may be desirable: 43 | having a consistent impact on the project requires concerted effort 44 | toward consensus-building. 
45 | 46 | As a rule of thumb, receiving encouraging feedback from long-standing project developers, is a good indication that the RFC is worth pursuing. 47 | 48 | ## What the process is 49 | [What the process is]: #what-the-process-is 50 | 51 | In short, to get a major feature added to Redox, one must first get the 52 | RFC merged into the RFC repo as a markdown file. At that point the RFC 53 | is 'active' and may be implemented with the goal of eventual inclusion 54 | into Redox. 55 | 56 | * Fork the RFC repo http://gitlab.redox-os.org/redox-os/rfcs 57 | * Copy `0000-template.md` to `text/0000-my-feature.md` (where 'my-feature' is 58 | descriptive. don't assign an RFC number yet). 59 | * Fill in the RFC. Put care into the details: RFCs that do not present 60 | convincing motivation, demonstrate understanding of the impact of the design, or 61 | are disingenuous about the drawbacks or alternatives tend to be poorly-received. 62 | * Submit a merge request. As a merge request the RFC will receive design feedback 63 | from the larger community, and the author should be prepared to revise it in 64 | response. 65 | * RFCs rarely go through this process unchanged, especially as alternatives and 66 | drawbacks are shown. You can make edits, big and small, to the RFC to 67 | clarify or change the design, but make changes as new commits to the PR, and 68 | leave a comment on the PR explaining your changes. Specifically, do not squash 69 | or rebase commits after they are visible on the PR. 70 | 82 | 83 | ## The RFC life-cycle 84 | [The RFC life-cycle]: #the-rfc-life-cycle 85 | 86 | Once an RFC becomes active then authors may implement it and submit 87 | the feature as a merge request to the Redox repo. Being 'active' is not 88 | a rubber stamp, and in particular still does not mean the feature will 89 | ultimately be merged; it does mean that in principle all the major 90 | stakeholders have agreed to the feature and are amenable to merging 91 | it. 
92 | 93 | Furthermore, the fact that a given RFC has been accepted and is 94 | 'active' implies nothing about what priority is assigned to its 95 | implementation, nor does it imply anything about whether a Redox 96 | developer has been assigned the task of implementing the feature. 97 | While it is not *necessary* that the author of the RFC also write the 98 | implementation, it is by far the most effective way to see an RFC 99 | through to completion: authors should not expect that other project 100 | developers will take on responsibility for implementing their accepted 101 | feature. 102 | 103 | Modifications to active RFC's can be done in follow-up PR's. We strive 104 | to write each RFC in a manner that it will reflect the final design of 105 | the feature; but the nature of the process means that we cannot expect 106 | every merged RFC to actually reflect what the end result will be at 107 | the time of the next major release. 108 | 109 | In general, once accepted, RFCs should not be substantially changed. Only very 110 | minor changes should be submitted as amendments. More substantial changes should 111 | be new RFCs, with a note added to the original RFC. 112 | 113 | ## Implementing an RFC 114 | [Implementing an RFC]: #implementing-an-rfc 115 | 116 | Some accepted RFC's represent vital features that need to be implemented right away. 117 | Other accepted RFC's can represent features that can wait until some arbitrary developer feels like doing the 118 | work. 119 | Every accepted RFC has an associated issue tracking its implementation in the Redox repository; thus that associated issue can be assigned a priority that the team uses for all issues in the Redox repository. 120 | 121 | The author of an RFC is not obligated to implement it. 122 | Of course, the RFC author (like any other developer) is welcome to post an implementation for review after the RFC has been accepted. 
123 | 124 | If you are interested in working on the implementation for an 'active' RFC, but cannot determine if someone else is already working on it, feel free to ask (e.g. by leaving a comment on the associated issue). 125 | 126 | 127 | ## RFC Postponement 128 | [RFC Postponement]: #rfc-postponement 129 | 130 | Some RFC merge requests are tagged with the 'postponed' label when they are closed (as part of the rejection process). 131 | An RFC closed with “postponed” is marked as such because we want neither to think about evaluating the proposal nor about implementing the described feature until some time in the future, and we believe that we can afford to wait until then to do so. 132 | Postponed PRs may be re-opened when the time is right. 133 | 134 | Usually an RFC merge request marked as “postponed” has already passed 135 | an informal first round of evaluation, namely the round of “do we 136 | think we would ever possibly consider making this change, as outlined 137 | in the RFC merge request, or some semi-obvious variation of it.” (When 138 | the answer to the latter question is “no”, then the appropriate 139 | response is to close the RFC, not postpone it.) 140 | 141 | 142 | ### Help this is all too informal! 143 | [Help this is all too informal!]: #help-this-is-all-too-informal 144 | 145 | The process is intended to be as lightweight as reasonable for the 146 | present circumstances. As usual, we are trying to let the process be 147 | driven by consensus and community norms, not impose more structure than 148 | necessary. 149 | 150 | #### This text 151 | 152 | This text is originally based on an older version of the README from https://github.com/rust-lang/rfcs . 
153 | -------------------------------------------------------------------------------- /text/0001-namespace-syscalls.md: -------------------------------------------------------------------------------- 1 | - Feature Name: namespace-syscalls 2 | - Start Date: 2016-11-23 3 | - RFC PR: https://github.com/redox-os/rfcs/pull/4 4 | - Redox Issue: N/A 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | A namespace is designed to implement the following with one abstraction: 10 | - `cap_enter`, by default 11 | - `chroot`, by allowing a filter on `file:` 12 | - [OS-level virtualization](https://en.wikipedia.org/wiki/Operating-system-level_virtualization) such as [FreeBSD-style Jails](https://en.wikipedia.org/wiki/FreeBSD_jail) or [Illumos-style Zones](https://en.wikipedia.org/wiki/Solaris_Containers), with more complex filtering of scheme access 13 | 14 | It achieves this with the addition of three syscalls: 15 | - `getns`, which gets the current namespace 16 | - `mkns`, which creates a new namespace 17 | - `setns`, which switches namespaces 18 | 19 | # Motivation 20 | [motivation]: #motivation 21 | 22 | Why are we doing this? What use cases does it support? What is the expected outcome? 
23 | 24 | # Detailed design 25 | [design]: #detailed-design 26 | 27 | ```rust 28 | // Get the current namespace 29 | let old_ns = getns(); 30 | // Create a new empty namespace 31 | let new_ns = mkns(&[]); 32 | // Switch to the new namespace 33 | // This is only possible because this process created new_ns 34 | setns(new_ns); 35 | 36 | // Create a child fork 37 | let child = clone(0); 38 | if child == 0 { 39 | // Execute a process in the new namespace 40 | // This will reset the original namespace, preventing setns(old_ns) 41 | exec("process-to-contain"); 42 | }else{ 43 | // Create a new `file:` in the new namespace 44 | let file_scheme = open(":file", O_CREAT | O_RDWR); 45 | 46 | // Switch back to the original namespace 47 | // This is only possible because this process was once inside of old_ns 48 | setns(old_ns); 49 | 50 | // For every file event in the new `file:` 51 | for event in file_scheme.events() { 52 | // Translate it if required and forward it to the original `file:` 53 | handle_event(event); 54 | } 55 | } 56 | ``` 57 | 58 | # Drawbacks 59 | [drawbacks]: #drawbacks 60 | 61 | Why should we *not* do this? 62 | 63 | - Potential rooting by placing a setuid program in a specially designed container 64 | 65 | # Alternatives 66 | [alternatives]: #alternatives 67 | 68 | What other designs have been considered? What is the impact of not doing this? 69 | 70 | # Unresolved questions 71 | [unresolved]: #unresolved-questions 72 | 73 | What parts of the design are still TBD? 
74 | 75 | - How to prevent rooting by placing a setuid program in a specially designed container 76 | -------------------------------------------------------------------------------- /text/0002-event-overhaul.md: -------------------------------------------------------------------------------- 1 | - Feature Name: event-overhaul 2 | - Start Date: 2018-05-19 3 | - RFC PR: https://github.com/redox-os/rfcs/pull/10 4 | - Redox Issue: https://github.com/redox-os/kernel/issues/89 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | Overhaul the kernel event system to support mio. 10 | 11 | # Motivation 12 | [motivation]: #motivation 13 | 14 | The current kernel event system has one event queue per context, which is not 15 | flexible enough to be used by mio. 16 | 17 | # Detailed design 18 | [design]: #detailed-design 19 | 20 | - Make the `fevent` system call return `ENOSYS`. The system call number will still 21 | be needed to support kernel <> scheme event communication. 22 | - Remove the old kernel event system. 23 | - Produce a new kernel event system which can support the example below. 24 | - Port existing event users to the new system. 25 | - Ensure that event generators will always trigger events when added to an 26 | event queue, and will be edge-triggered after that. 27 | - Rebuild all packages. 28 | - Produce a new major release of Redox OS. 29 | 30 | ```rust 31 | // This is a pseudo-Rust example 32 | 33 | // An event object, which can be converted to [u8] to be written to a file 34 | #[derive(Copy, Clone, Debug, Default, PartialEq)] 35 | #[repr(C)] 36 | pub struct Event { 37 | pub id: usize, 38 | pub flags: usize, 39 | pub data: usize 40 | } 41 | 42 | // An example file, a network interface 43 | let file = OpenOptions::new() 44 | .read(true) 45 | .write(true) 46 | .custom_flags(O_CLOEXEC | O_NONBLOCK) 47 | .open("network:").unwrap(); 48 | 49 | // Create a new event queue.
This is tracked by file id 50 | let mut event_queue = OpenOptions::new() 51 | .read(true) 52 | .write(true) 53 | .custom_flags(O_CLOEXEC) 54 | .open("event:").unwrap(); 55 | 56 | // Create a request for read events on the file, with a unique token 57 | let event_request = Event { 58 | id: file.as_raw_fd(), 59 | flags: EVENT_READ, 60 | data: 0x1234 61 | }; 62 | 63 | // Add the event request to this event queue 64 | event_queue.write(&event_request).unwrap(); 65 | 66 | loop { 67 | // Wait for the next event 68 | let mut event = Event::default(); 69 | let count = event_queue.read(&mut event).unwrap(); 70 | if count == mem::size_of::<Event>() { 71 | // The event should have the id set to the network file, the flags set 72 | // to EVENT_READ, and the data set the same as the request 73 | assert_eq!(event, event_request); 74 | } else { 75 | panic!("invalid size of event: {}", count); 76 | } 77 | } 78 | ``` 79 | 80 | # Drawbacks 81 | [drawbacks]: #drawbacks 82 | 83 | A number of programs will need to be updated to the new event system. The old 84 | system will by necessity need to be removed. 85 | 86 | # Alternatives 87 | [alternatives]: #alternatives 88 | 89 | The alternative is to attempt to implement different event handling in a 90 | userspace daemon. 91 | 92 | # Unresolved questions 93 | [unresolved]: #unresolved-questions 94 | 95 | What parts of the design are still TBD? 96 | -------------------------------------------------------------------------------- /text/0003-channels.md: -------------------------------------------------------------------------------- 1 | - Feature Name: channels 2 | - Start Date: 2017-12-29 3 | - RFC PR: (leave this empty) 4 | - Redox Issue: (leave this empty) 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | Design for a fast, bidirectional IPC mechanism in Redox. 10 | 11 | # Motivation 12 | [motivation]: #motivation 13 | 14 | An easy-to-use, performant mechanism for IPC would be extremely useful.
At the 15 | moment, the primary way of communicating between processes that are not 16 | parent-child is through creating schemes. 17 | 18 | # Detailed design 19 | [design]: #detailed-design 20 | 21 | The current design consists of a single scheme, `chan:`, that provides an 22 | interface for creating, interfacing with, and closing channels. Each channel 23 | has one server process and one or more client processes. 24 | 25 | In this documentation, `<name>` is used to represent a channel name. This can 26 | be anything, but must be known by all the participating processes. 27 | 28 | Usage: 29 | 1. The server process opens `chan:<name>` with the `O_CREAT` flag. 30 | 2. It listens for connections and duplicates the file descriptor with the path 31 | `"listen"` to accept (similarly to the way that TCP is implemented in Redox). 32 | 3. The connection can be written to by both the client and server. 33 | 34 | For an unnamed socket: 35 | 1. The process opens `chan:` with the `O_CREAT` flag to create a server. 36 | 2. It duplicates the file descriptor with the path `"connect"` to create a 37 | client and connect to it. 38 | 3. The server duplicates again, now with the path `"listen"`, to accept the 39 | newly created client. 40 | 4. The connection can be written to by both ends. 41 | 42 | # Drawbacks 43 | [drawbacks]: #drawbacks 44 | 45 | There aren't any real drawbacks to this. It's a solid, easily extensible IPC 46 | mechanism. 
47 | 48 | # Alternatives 49 | [alternatives]: #alternatives 50 | 51 | - Named Pipes 52 | - Actually implement UNIX Domain Sockets 53 | - Do nothing 54 | -------------------------------------------------------------------------------- /text/0004-ptrace.md: -------------------------------------------------------------------------------- 1 | - Feature Name: ptrace 2 | - Start Date: 2019-06-08 3 | - RFC PR: (leave this empty) 4 | - Redox Issue: (leave this empty) 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | A ptrace-like feature for tracing processes in Redox OS 10 | 11 | # Motivation 12 | [motivation]: #motivation 13 | 14 | Currently, we have no way for debuggers to work in Redox. We provide 15 | no interface for tracing a process' system calls or instructions, and 16 | no interface for managing another process' memory. 17 | 18 | A good first step for implementing `gdb` or a similar utility would be 19 | to implement a Linux `ptrace(...)` alternative for Redox. This should 20 | not only open up the possibility for debuggers, but also system-call 21 | translation processes like WINE, perhaps for Linux compatibility which 22 | would rid us of the problem of porting software. 23 | 24 | And even with *pure* `ptrace`, without any register or memory reading, 25 | one can still implement the immensely useful tool `strace`, which 26 | could serve as an alternative to recompiling the kernel with system 27 | call debugging turned on or off. This is probably what we should focus 28 | on getting to work initially, before getting to the good stuff. 29 | 30 | # Detailed design 31 | [design]: #detailed-design 32 | 33 | The Linux ptrace interface is sometimes considered a huge mistake due 34 | to its inconsistency and it being just one massive function. The Redox 35 | interface will have to keep that in mind, as well as remove any 36 | duplicate or otherwise redundant functions. 37 | 38 | All process-controlling functions are implemented as one kernel 39 | scheme, `proc:`.
Opening it up with the `pid` of a tracee and a path 40 | will perform a specific operation on the process. The benefit of using 41 | schemes as opposed to a Linux-style function is that we get the 42 | ability to disallow this feature using namespacing for free. The resulting 43 | file also allows multiplexing using `event:` and can therefore be used in a 44 | nonblocking fashion just like any other file descriptor. 45 | 46 | ## Process trace 47 | 48 | Opening `proc:<pid>/trace` will give you a file to which you can write 49 | proc-related functions, and closing the file descriptor will detach 50 | from it automatically. If any breakpoints are set when the file is 51 | closed, they are deleted and the process is resumed. Only *one* tracer 52 | can control a process, as I am too close-minded to come up with a 53 | design that would make sense for running multiple tracers on a single 54 | tracee. 55 | 56 | That said, if the tracer opens the file with the flag `O_EXCL`, the kernel will instead send 57 | `SIGKILL` to the tracee when the tracer closes its file. This is to 58 | prevent any ptrace-contained processes from breaking out. (`O_EXCL` 59 | can be thought of as meaning the tracer is the only one who controls 60 | the process, and the process can't live on its own.) 61 | 62 | Another flag used in `open` is `O_TRUNC`, which will stop the process 63 | *immediately*. This can be compared to using `PTRACE_ATTACH` on Linux 64 | as opposed to `PTRACE_SEIZE`. (`O_TRUNC` can be thought of as 65 | *truncating*/clearing the file's execution. It's a stretch, but I have 66 | no better idea.) 67 | 68 | The most important operation of `ptrace` is of course to set 69 | breakpoints! Redox tries to unify the Linux event system and the 70 | breakpoint system by making the input a set of bitflags with *one or more* 71 | breakpoints. It will return when the first breakpoint/event specified 72 | within the bitmask is reached, in which case it'll add that event for 73 | reading using the `read` system call (see events below).
If an event 74 | is already set, the `write` returns immediately. (A slight exception is 75 | `O_NONBLOCK`, see that below as well.) 76 | 77 | Each breakpoint is set and optionally awaited using the `write` system 78 | call. Each such call will also resume the tracee in case it's stopped 79 | after another breakpoint. So if you write a value with no stop bits 80 | set, the program will run to completion. The exception is manually 81 | specifying `PTRACE_FLAG_WAIT` (even in blocking mode, see below), 82 | which will - unless any new stop is set - only wait for an existing 83 | breakpoint to be reached. 84 | 85 | ### Breakpoints 86 | 87 | - If `PTRACE_STOP_PRE_SYSCALL` is set, the tracee will break on the 88 | next start of a syscall. This diverges from Linux's way of using 89 | `PTRACE_SYSCALL` for *both* pre- and post-syscall. However, it's for 90 | a good reason: signals can occur in the middle of a syscall, and 91 | unlike Linux, which just delays the signal, we should go the simplest 92 | route to minimize kernel code size and let the user choose the 93 | behavior they want rather than choose for them. 94 | - If `PTRACE_STOP_POST_SYSCALL` is set, the tracee will break at the 95 | end of a syscall, when the return value has just been set in the 96 | appropriate register. 97 | - If `PTRACE_STOP_SINGLESTEP` is set, the tracee is stopped after the 98 | execution of just one assembly instruction. If used together with 99 | any system-call trace, the system-call method will take precedence 100 | and allow you to fine-tune how that should work. (Not a special 101 | case, the syscall trace returns before the instruction returns and 102 | thus is what is used by the multiplexing trace call!) 103 | - If `PTRACE_STOP_SIGNAL` is set, the tracee is stopped before the next 104 | signal is handled. In the break event (see section on events below), 105 | the signal number is passed as the first parameter, and the pointer 106 | to the signal handler as the second parameter.
The pointer can help 107 | you detect whether a signal will be handled by kernel space or 108 | userspace, by checking for constants such as `SIG_DFL` and `SIG_IGN`. 109 | - If `PTRACE_STOP_BREAKPOINT` is set, the tracee is stopped when the 110 | Breakpoint Exception, interrupt number 3, is triggered. This is 111 | commonly caused by the `int3` instruction with opcode `0xCC` on 112 | x86_64. The default behavior for breakpoint exceptions is to exit 113 | the process with the `SIGTRAP` signal, but you unfortunately cannot 114 | catch this with `PTRACE_STOP_SIGNAL` due to the fact that signals 115 | are sent in a way that never goes through the signal 116 | handler. Instead of the microkernel working around this just to 117 | cause an ambiguous breakpoint event, the two causes of signals are 118 | separated. Like the signal breakpoint, the default behavior (to exit 119 | the process) can be ignored using `PTRACE_FLAG_IGNORE` (read about 120 | flags below). 121 | - If `PTRACE_STOP_EXIT` is set, the tracee is stopped before the 122 | process exits. Exits here come from all kinds of contexts, like 123 | *after* signals (so they will first raise `PTRACE_STOP_SIGNAL` if 124 | selected), *during* an exit syscall (will never reach 125 | `PTRACE_STOP_POST_SYSCALL`), as well as when caused by a hardware 126 | interrupt such as when an out-of-bounds read occurs. You cannot 127 | abort the exit and continue running the program, but you can inspect 128 | everything just like at any other breakpoint. 129 | 130 | ### Non-breakpoint events 131 | 132 | These events will not stop the tracee; it keeps running in the 133 | background until whatever breakpoint was set alongside the event is reached. 134 | 135 | - If `PTRACE_EVENT_CLONE` is set, the tracer will wake up when the 136 | tracee creates a new child process. An event will be delivered to the 137 | tracer with the child's PID as the first parameter.
The child process will 138 | be in a stopped state, but unless attached to with a separate 139 | tracer, it will be restarted upon the next ptrace invocation. 140 | 141 | ### Flags 142 | 143 | - If `PTRACE_FLAG_IGNORE` is set, the action in progress is 144 | aborted and returns early. If this is set immediately after a 145 | pre-syscall breakpoint, the system call is not executed; instead, 146 | by setting the registers, *the tracer* can handle the system 147 | call. This behavior is known as "sysemu" on Linux. This 148 | general-purpose flag also lets you ignore tracee signals (except for 149 | `SIGKILL` which is off-limits and will ignore your wishes). 150 | - If `PTRACE_FLAG_WAIT` is set, the `write` call will not return 151 | before the breakpoint is reached, but rather await that. This is the 152 | default behavior whenever `O_NONBLOCK` is not set, but this flag 153 | lets nonblocking tracers override that behavior. As explained 154 | briefly above, this flag will not restart a stopped tracee unless a 155 | new stop bit was set - which is behavior *not* replicated by default 156 | without `O_NONBLOCK`. 157 | 158 | --- 159 | 160 | Because `ptrace` does **not** rely on signals, when a process is 161 | ptrace-stopped (such as after attaching to the tracee with `O_TRUNC`, 162 | explained above) you can send `SIGCONT` without actually restarting 163 | the process. The process is restarted only using a ptrace operation or 164 | when the tracer file handle is closed. This signal is instead just 165 | scheduled to get handled whenever the tracee starts, which allows the 166 | tracee to raise `SIGSTOP` and let the tracer restart it only after 167 | a ptrace operation was completed. 168 | 169 | When the tracee exits (after any selected `PTRACE_STOP_EXIT` 170 | breakpoints are invoked), any blocking operation depending on it stops 171 | and instead returns `ESRCH`. It does not, however, reap the zombie 172 | process.
Therefore, if the tracee is your own child process you should 173 | invoke `waitpid` immediately after an `ESRCH` error, which will also 174 | allow you to obtain the exit status in the normal fashion, without 175 | putting a breakpoint specifically on exit. 176 | 177 | ### Events 178 | 179 | Events give the tracer information about breakpoints or actions the 180 | tracee has taken. There are two types of events: breakpoint events 181 | and non-breakpoint events. Only breakpoint events stop the tracee when 182 | reached; other events only wake up the tracer, while the tracee keeps 183 | going. The way you receive events is by `read`ing a `PtraceEvent` 184 | structure from the file. Reads are not blocking, and will return `0` 185 | when no event was able to be read. 186 | 187 | Events are read sequentially, i.e. they follow first-in-last-out. The 188 | standard behavior for handling non-breakpoint events is to read them 189 | all and then retry waiting for the breakpoint to be reached using 190 | `PTRACE_FLAG_WAIT`. Any unread events from the last operation will 191 | cause a new one to return immediately, in order to prevent a possible 192 | race condition where you think you've read all events but another one 193 | occurs right when you want to retry the wait for a breakpoint to be reached. 194 | 195 | The structure has a value `cause` specifying what bit caused the 196 | tracer to wake up, as well as a set of values like `a` (first 197 | parameter), `b` (second parameter), `c` (third parameter), etc. For 198 | example, if the input was `PTRACE_STOP_SIGNAL | PTRACE_EVENT_CLONE`, 199 | the bitmask may be either `PTRACE_STOP_SIGNAL` or `PTRACE_EVENT_CLONE` 200 | depending on which event was hit first. The `a` value of 201 | `PTRACE_STOP_SIGNAL` is the signal number which caused the breakpoint 202 | to be hit, while the `a` value of `PTRACE_EVENT_CLONE` is the PID of 203 | the tracee's new child process. 
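
As a rough illustration of how a tracer might interpret the `cause` and `a`/`b` fields described above, here is a minimal, self-contained sketch. The struct layout, the numeric flag values, the `PTRACE_EVENT_MASK` value, and the helpers `is_non_breakpoint` and `describe` are all assumptions made for this example; the RFC names the flags but does not fix their bit positions.

```rust
// Hypothetical bit assignments: the RFC names these flags but not their values.
const PTRACE_STOP_PRE_SYSCALL: u64 = 1 << 0;
const PTRACE_STOP_SIGNAL: u64 = 1 << 3;
const PTRACE_EVENT_CLONE: u64 = 1 << 8;
// Assumed mask covering every non-breakpoint event bit.
const PTRACE_EVENT_MASK: u64 = 0xFF00;

/// Simplified stand-in for the `PtraceEvent` structure read from the trace file.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct PtraceEvent {
    cause: u64,
    a: usize,
    b: usize,
}

/// True if the event is a non-breakpoint event, i.e. the tracee kept running.
fn is_non_breakpoint(event: &PtraceEvent) -> bool {
    event.cause & PTRACE_EVENT_MASK != 0
}

/// Turn an event into a human-readable line based on which bit caused it.
fn describe(event: &PtraceEvent) -> String {
    match event.cause {
        PTRACE_STOP_SIGNAL => format!(
            "stopped before signal {} (handler at {:#x})",
            event.a, event.b
        ),
        PTRACE_EVENT_CLONE => format!("tracee created child with PID {}", event.a),
        PTRACE_STOP_PRE_SYSCALL => "stopped before a system call".to_string(),
        other => format!("unhandled cause {:#x}", other),
    }
}

fn main() {
    // Two events as they might be read after writing
    // `PTRACE_STOP_SIGNAL | PTRACE_EVENT_CLONE` to the trace file.
    let events = [
        PtraceEvent { cause: PTRACE_EVENT_CLONE, a: 42, b: 0 },
        PtraceEvent { cause: PTRACE_STOP_SIGNAL, a: 2, b: 0x1000 },
    ];
    for event in &events {
        let kind = if is_non_breakpoint(event) { "event" } else { "breakpoint" };
        println!("{}: {}", kind, describe(event));
    }
}
```

Matching on `cause` works here because each event reports a single bit; a tracer that expects combined bits would test them with `&` instead.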
204 | 205 | ### Nonblocking mode 206 | 207 | In nonblocking mode, a ptrace call without the `PTRACE_FLAG_WAIT` bit 208 | set will return `1` immediately. Any breakpoint specified is set, and 209 | will, as usual, overwrite any existing breakpoints. Note that the 210 | file will send events to the `event:` scheme, meaning you can 211 | multiplex multiple tracers. 212 | 213 | `EVENT_READ` is triggered whenever the first event arrives. Since an 214 | event only gets pushed to the stack if it's within the specified write 215 | bitmask, all events in the stack are of interest and this notification 216 | means you should immediately read them all. 217 | 218 | `EVENT_WRITE` is reserved, for now. 219 | 220 | ## Modify registers 221 | 222 | Another important part of Linux `ptrace` is reading and writing 223 | registers. Opening the file `proc:<pid>/regs/int` will allow you to 224 | `read`/`write` a struct consisting of the integer values of all 225 | registers. The same goes for `proc:<pid>/regs/float`, but for all 226 | floating-point values. This is similar to Linux's `PTRACE_GETREGS`/`PTRACE_SETREGS`. 227 | 228 | ## Modify memory 229 | 230 | There are many different ways to read a process' memory in Linux, but 231 | Redox should put effort into unifying these functions. Namely, the 232 | `proc:` scheme is an excellent candidate for a unified 233 | memory-modifying system. Opening `proc:<pid>/mem` will allow you to 234 | seek/read/write around the memory of another process. This is a 235 | unification of the following calls in Linux: 236 | 237 | - `/proc/<pid>/mem` 238 | - `PTRACE_POKEDATA`/`PTRACE_PEEKDATA` 239 | - `process_vm_readv`/`process_vm_writev` 240 | 241 | ## Security 242 | 243 | By default, a process should only be allowed to control a process that is owned 244 | by the current user and of which it is an ancestor, 245 | direct or indirect. The main motivation for allowing indirect 246 | subprocesses is so one can trace threads of a direct subprocess. 
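
The default rule above (same owner, and the tracer is a direct or indirect ancestor) could be checked along these lines. This is only a sketch under stated assumptions: the `Process` record, the `may_trace` function, the `sample_table` helper, and the in-memory process table are hypothetical stand-ins for whatever bookkeeping the kernel actually keeps, and only the default rule is modelled.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal per-process record for this example.
struct Process {
    uid: u32,
    parent: Option<usize>, // PID of the parent, if any
}

/// Default rule: the tracee has the same owner as the tracer, and the tracer
/// is a direct or indirect ancestor of the tracee.
fn may_trace(table: &HashMap<usize, Process>, tracer: usize, tracee: usize) -> bool {
    let (t, target) = match (table.get(&tracer), table.get(&tracee)) {
        (Some(t), Some(target)) => (t, target),
        _ => return false, // unknown PID on either side
    };
    if t.uid != target.uid {
        return false;
    }
    // Walk up the tracee's parent chain looking for the tracer.
    let mut current = target.parent;
    while let Some(pid) = current {
        if pid == tracer {
            return true;
        }
        current = table.get(&pid).and_then(|p| p.parent);
    }
    false
}

/// Small process tree used for illustration: 1 -> 2 -> 3, and 1 -> 4 (other user).
fn sample_table() -> HashMap<usize, Process> {
    let mut table = HashMap::new();
    table.insert(1, Process { uid: 1000, parent: None });
    table.insert(2, Process { uid: 1000, parent: Some(1) });
    table.insert(3, Process { uid: 1000, parent: Some(2) });
    table.insert(4, Process { uid: 1001, parent: Some(1) });
    table
}

fn main() {
    let table = sample_table();
    println!("1 may trace 3: {}", may_trace(&table, 1, 3)); // indirect ancestor
    println!("3 may trace 1: {}", may_trace(&table, 3, 1)); // descendant, not ancestor
    println!("1 may trace 4: {}", may_trace(&table, 1, 4)); // different owner
}
```

Walking the parent chain is what makes the indirect case work: process 1 reaches process 3 through process 2.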
247 | 248 | This restriction is lifted for processes owned by `root`, which can 249 | trace any process. In the future, a capability-like system could be 250 | put in place to allow specific executables to trace any process owned 251 | by the current user without having root access. 252 | 253 | ## Example 254 | 255 | ```rust 256 | let pid = unsafe { syscall::clone()? }; 257 | if pid == 0 { 258 | // This is the child 259 | 260 | // Wait until parent is ready to trace 261 | syscall::kill(syscall::getpid()?, syscall::SIGSTOP)?; 262 | 263 | println!("Some prints here"); 264 | println!("Some other syscalls"); 265 | eprintln!("One can even print to STDERR"); 266 | // Do things here. `fexec` another process, maybe? 267 | } else { 268 | // This is the parent 269 | let mut status = 0; 270 | // Wait for the child-initiated SIGSTOP to complete. 271 | syscall::waitpid(pid, &mut status, WUNTRACED)?; 272 | 273 | // ptrace attach: Stop the process using internal ptrace mechanism 274 | // (not SIGSTOP!) 275 | let mut trace = OpenOptions::new() 276 | .read(true) 277 | .write(true) 278 | .truncate(true) 279 | .open(&format!("proc:{}/trace", pid))?; 280 | // obtain a read/write handle to the process registers 281 | let mut regs = OpenOptions::new().read(true).write(true).open(&format!("proc:{}/regs/int", pid))?; 282 | 283 | 284 | // It is safe to schedule a continuation of child process, because 285 | // it is still stopped by ptrace 286 | syscall::kill(pid, SIGCONT)?; 287 | 288 | trace.write(syscall::PTRACE_STOP_PRE_SYSCALL | syscall::PTRACE_EVENT_CLONE)?; 289 | // Mostly ignore event... usually you can get some interesting 290 | // data from it 291 | let mut event: PtraceEvent = PtraceEvent::default(); 292 | trace.read(&mut event)?; 293 | while event.cause & syscall::PTRACE_EVENT_MASK != 0 { 294 | // In reality, you'll actually want to handle this event, or 295 | // it makes no sense to listen for it at all. This is just an 296 | // example to show you how you can handle non-breakpoint 297 | // events though.
298 | trace.write(syscall::PTRACE_FLAG_WAIT)?; 299 | trace.read(&mut event)?; 300 | } 301 | // This assertion is safe because if the process exits, the write 302 | // call returns ESRCH 303 | assert_eq!(event.cause, syscall::PTRACE_STOP_PRE_SYSCALL); 304 | 305 | let mut registers = syscall::IntRegisters::default(); 306 | regs.read(&mut registers)?; 307 | 308 | println!("System call: {}", registers.orig_rax); 309 | println!("Replacing with exit!"); 310 | 311 | registers.orig_rax = syscall::SYS_EXIT; 312 | registers.rdi = 0; 313 | 314 | regs.write(&registers)?; 315 | 316 | // trace.write(syscall::PTRACE_STOP_POST_SYSCALL)?; // wait for the completion of the system call 317 | 318 | trace.write(syscall::PtraceFlags::empty())?; // don't set any stops, rather run the program to the end, which is like right now 319 | syscall::waitpid(pid, &mut status, 0)?; // reap zombie process 320 | 321 | // trace file dropped here: process tracing detached and process 322 | // implicitly resumed if it hadn't already been, y'know, killed 323 | } 324 | ``` 325 | 326 | # Drawbacks 327 | [drawbacks]: #drawbacks 328 | 329 | There is no drawback to introducing more features, except for the mere 330 | size of the microkernel being increased. This is worth it as `ptrace` 331 | will open up a lot of different possibilities, maybe even so that we 332 | can eventually run Linux programs on Redox. 333 | 334 | # Alternatives 335 | [alternatives]: #alternatives 336 | 337 | One could, of course, use the Linux way of doing things. I think 338 | schemes provide excellent interfaces with a clear sense of ownership 339 | ("This ptrace invocation operates on this tracee, not anything else"), 340 | and I'm especially convinced of using them when that means we can 341 | leverage the existing, excellent, namespacing support ([see 342 | example](https://gitlab.redox-os.org/redox-os/contain/blob/6a1c070381f2c8b56c688c8cca454a818ee72520/src/main.rs#L21-33)).
343 | 344 | An alternative to using a unified `proc:` scheme would be to, of 345 | course, split it up into one for ptrace + registers and one for 346 | writing memory. This was what the original RFC first suggested, before 347 | [@zen3ger](https://gitlab.redox-os.org/zen3ger) mentioned how Solaris 348 | implements a `ptrace(...)` function as a userspace library over their 349 | ProcFS. 350 | 351 | There are lots of possible alternatives, one of which was implemented 352 | and tried out. However, out of the ones I've considered, this one 353 | should be the most scalable over time. 354 | -------------------------------------------------------------------------------- /text/0005-scheme-forward-fds.md: -------------------------------------------------------------------------------- 1 | - Feature Name: scheme_forward_fds 2 | - Start Date: 2021-06-12 3 | - RFC PR: https://gitlab.redox-os.org/redox-os/rfcs/-/merge_requests/17 4 | - Redox Issue: (leave this empty) 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | This feature allows schemes to forward file descriptors local _to the process 10 | that is responsible for handling that scheme_. In other words, rather than 11 | giving the kernel a word-size integer representing a scheme-local file 12 | descriptor number, it can instead give the kernel a fully valid file descriptor 13 | in its _process file descriptor namespace_, that might have originated from a 14 | completely different scheme in the first place, thus _forwarding_ file 15 | descriptors. 16 | 17 | # Motivation 18 | [motivation]: #motivation 19 | 20 | Microkernels are intrinsically meant to be as minimal as possible, and Redox 21 | schemes serve as a useful IPC primitive. The ability to compose schemes, on the 22 | other hand, is limited by the latency of chaining scheme calls, if scheme 23 | A needs to communicate with scheme B, and in turn, scheme C, etc.
Sometimes, 24 | the schemes in such chains need to access the inner data, but in many cases, 25 | filtering e.g. directory entries and enforcing access checks is sufficient. 26 | 27 | A good example is the `irq:` scheme. IRQ handling must be _fast_, and as 28 | low-latency as possible. Ideally the only thing before the actual code should 29 | be to mask the interrupt and directly switch to the handler. However, the 30 | current IRQ scheme also handles IRQ allocation for drivers, which adds 31 | unnecessary kernel code, simply because adding a wrapper scheme would slow down 32 | IRQ handling. This would also apply for hypothetical I/O port and MSR 33 | scheme-based interfaces. 34 | 35 | Additionally, the `proc:` scheme can probably, at some point in the future, be 36 | divided into a userspace and kernel part, providing convenience APIs such as 37 | `proc:PID/memory` while forwarding performance-critical APIs such as ptrace 38 | pipes. 39 | 40 | With the feature described by this RFC, resource allocation can be moved to a 41 | userspace scheme, giving that wrapper scheme handler near-full access to 42 | the kernel scheme. Once opened by a client, it will have a file descriptor that 43 | directly points to the kernel IRQ scheme. 44 | 45 | The same logic holds for userspace. Disk partitions are currently managed 46 | by the disk drivers, mainly to maintain flexibility and performance, as a 47 | middleman scheme would add latency to every read or write call. And while it 48 | may be reasonable to add this particular duplicate functionality to multiple 49 | drivers, this feature would enable disk drivers to simply track allowed 50 | ranges in each file descriptor, and then have partition daemons provide 51 | forwarding schemes. 52 | 53 | Most importantly, __a chroot tool implemented using schemes and namespaces, 54 | would also become zero-cost for data, even if it may have to filter metadata 55 | access, i.e.
directory structures.__ Scheme file forwarding is not 56 | _sufficient_, as some other interfaces such as `openat` may be required first, 57 | but it is _necessary_ for allowing fast chroots. 58 | 59 | # Detailed design 60 | [design]: #detailed-design 61 | 62 | On Redox, each file description is associated with a scheme ID and a 63 | scheme-provided identifier, and is refcounted. Hence, schemes respond to 64 | `SYS_OPEN` and `SYS_DUP` calls by returning that identifier, causing the kernel 65 | to insert a new file descriptor, pointing to the newly-created file 66 | description. A new file descriptor is thus _created_ in the process, with the 67 | scheme being responsible for handling all subsequent system calls operating on 68 | that file descriptor. 69 | 70 | Alternatively, this new feature additionally allows schemes to _forward_ an 71 | existing file description, as opposed to creating a new file description. 72 | 73 | Schemes normally respond to kernel calls by reusing the same packet they received 74 | from the kernel earlier, keeping all fields (even though every field except 75 | `id` is ignored), and setting `a` to the return value of that scheme message. 76 | The only exception is when triggering events, where they instead set the `id` 77 | field to zero, and set `a` to `SYS_FEVENT`, `b` to the scheme-provided number, 78 | and `c` to the event flags. 79 | 80 | It would be possible to add a new syscall number exclusively used as a scheme 81 | response code, similar to `SYS_FEVENT` messages. However, since the response 82 | needs to be associated with `packet.id`, which as a `u64` is not guaranteed to 83 | fit within a `usize`, the current implementation introduces a new error code, 84 | `ESKMSG`.
Paired with `packet.b` = `SKMSG_FRETURNFD` and `packet.c` = 85 | scheme-owned fd, the kernel will move that file descriptor out of the scheme's 86 | file table, and then transparently store that file description into the 87 | caller's file table, regardless of whether the file descriptor was created or 88 | forwarded. 89 | 90 | # Drawbacks 91 | [drawbacks]: #drawbacks 92 | 93 | The trivial drawback is that it adds complexity to the kernel, although in this 94 | case it is relatively minor. The question is rather whether we actually do want 95 | the functionality of being able to forward file descriptors that previously 96 | originated from different schemes. 97 | 98 | # Alternatives 99 | [alternatives]: #alternatives 100 | 101 | The obvious alternative would be to simply allow schemes to communicate with 102 | the caller requesting a file descriptor from `SYS_OPEN` or `SYS_DUP`, via 103 | previously suggested "fd channels", such as `cable:` or `sendfd:`. On the other 104 | hand, this would require the client to implement different logic for this, thus 105 | becoming less flexible and most importantly no longer transparent. 106 | 107 | That said, scheme file forwarding is not incompatible with such file descriptor 108 | channels, and when sending or receiving large sets of file descriptors, scheme 109 | file forwarding may incur unnecessary performance overhead, in comparison. 110 | 111 | # Unresolved questions 112 | [unresolved]: #unresolved-questions 113 | 114 | The primary unresolved question, is what syscall interface schemes should use 115 | when forwarding. The alternative to the current implementation's "scheme-kernel 116 | messages" (`ESKMSG`, `SKMSG_*`, and the rest of the packet), would be to use an 117 | interface similar to `SYS_FEVENT`. 118 | 119 | A minor question is whether forwarded files should be moved or cloned from the 120 | scheme's file table. 
The current implementation moves them, but the scheme can 121 | dup the file descriptor to keep it, if moved by the kernel, or respectively, 122 | close it later, if cloned by the kernel. 123 | -------------------------------------------------------------------------------- /text/0006-scheme-path.md: -------------------------------------------------------------------------------- 1 | - Feature Name: scheme-path 2 | - Start Date: 2024-01-17 3 | - RFC PR: (leave this empty) 4 | - Redox Issue: (leave this empty) 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | As discussed with jeremy_soller, bjorn3, jcake, rw_van, 4lDO2, et al. 10 | 11 | To alleviate the difficulties created by having `scheme_name:` as the root of a scheme path, the proposed new format is `/scheme/scheme_name` where `scheme` is the literal word "scheme" and `scheme_name` is the name of the scheme. 12 | 13 | There will be a transition period where `scheme_name:` will continue to be accepted as a scheme path. 14 | 15 | # Motivation 16 | [motivation]: #motivation 17 | 18 | The current `scheme_name:` format creates multiple problems. 19 | 20 | - Linux programs cannot detect `scheme_name:` as an absolute path and instead treat it as a relative path. 21 | 22 | - Paths containing colon (`:`) are not properly parsed when they are part of e.g. `$PATH` or other colon-separated lists. 23 | 24 | - Rust's `std::path` library does not easily integrate new `Prefix` variants. Adding a `Scheme` variant to `Prefix` causes some existing Rust programs and libraries to not compile due to missing `match` arms, and may create other problems. An OS-aware implementation of `std::path` is under consideration by the Rust team but it is not imminent. 25 | 26 | # Detailed design 27 | [design]: #detailed-design 28 | 29 | ## Behavior 30 | 31 | 1. The new format for schemes will be `/scheme/scheme_name` where `scheme` is the literal string "scheme" and `scheme_name` is the name of the scheme (resource or service). 32 | 33 | 2. 
Slash (`/`) will be the only recognized path separator (after the transition is complete). Once the transition to the new format is entirely complete, `:` will be an allowed character in filenames. 34 | 35 | 3. The `file` scheme will now **always** be the default scheme. If a path does not start with `/scheme` but it does start with `/`, it will have `/scheme/file` prefixed during canonicalization. 36 | 37 | 4. Relative paths are now allowed to back up out of a scheme. Previously, `scheme_name:/..` would resolve to `scheme_name:`. Now, `/scheme/scheme_name/..` resolves to `/scheme` and `/scheme/scheme_name/../..` resolves to `/` which is equivalent to `/scheme/file`. `/..` resolves to `/` as on Unix. 38 | 39 | 5. Scheme providers currently receive paths for `open` requests with the scheme component stripped off by the kernel, with the path argument to the `open` call assumed as an absolute path within the receiving scheme. No change is required here. 40 | 41 | **NOTE:** The creation/mounting of new schemes is documented in a separate RFC. 42 | 43 | 6. To ease the transition to the new format, functionality that parses `scheme_name:` 44 | should be guarded with two feature flags, `scheme_fmt_warn` and `scheme_fmt_compat`. 45 | 46 | - If `scheme_fmt_warn` is enabled, an error message is printed to stderr or logged to the console when the `scheme_name:` format is used. 47 | 48 | - If `scheme_fmt_compat` is enabled, the `scheme_name:` format continues to work as previously. 49 | 50 | - Disabling `scheme_fmt_compat` will cause the `scheme_name:` format to be treated as a regular file name or as a path relative to the current working directory. 51 | 52 | - Initially, `scheme_fmt_compat` will be enabled by default and `scheme_fmt_warn` will be disabled. 53 | 54 | In the descriptions below, 55 | 56 | - items marked **DEPENDENCY** are required to be implemented before some other change can be completed. 
57 | 58 | - items marked **USER VISIBLE** are higher priority as the user will see filenames using the old format. 59 | 60 | ## Disk Partitions 61 | 62 | `RedoxFS` is the scheme provider for the `file:` scheme, which will be renamed `/scheme/file`. RedoxFS handles one disk partition per scheme. In a scenario where multiple disk partitions are to be mounted, 63 | 64 | 1. There will be a "root partition" managed by RedoxFS and named `/scheme/file`. Its content will normally be referred to starting from `/`, i.e. `/scheme/file/bin` will be referred to as `/bin`. 65 | 66 | 2. Additional partitions will be managed by other instances of RedoxFS, (or a single instance managing multiple schemes) with their own scheme names, e.g. a scheme name of "file.home" would appear as `/scheme/file.home` and identify the RedoxFS instance that manages the "home" partition. 67 | 68 | 3. There will be a symbolic link in the root partition that places `/scheme/file.home` at its "mount point". e.g. `/home` -> `/scheme/file.home`. This will allow a `user` folder on the `home` partition to have an apparent name of `/home/user` and a full name of `/scheme/file.home/user`. 69 | 70 | Further details on the topic of disk partitions, mount points, `realpath` and `fpath` are outside the scope of this RFC. 71 | 72 | ## Code Changes 73 | 74 | ### redox-path 75 | 76 | A new crate for handling of new and legacy paths is now available. `redox-path` includes `canonicalize_with_cwd`, which canonicalizes paths that use the new format. Legacy-format paths are not currently canonicalized as some schemes use paths that must be formatted very precisely. 77 | 78 | ### PATH variable 79 | 80 | In future, the PATH environment variable will be converted to colon-separated format. In the short term, it is recommended that the PATH be limited to `/usr/bin` and all commands be copied or linked in that directory. 
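
To illustrate why old-format scheme paths are awkward inside colon-separated lists such as `$PATH`, here is a minimal sketch. The `split_search_path` helper is hypothetical and simply splits on `:`, which is how colon-separated search paths are conventionally handled.

```rust
/// Hypothetical helper: split a colon-separated search path into its entries.
fn split_search_path(path_var: &str) -> Vec<&str> {
    path_var.split(':').collect()
}

/// Show how the same two directories split under the old and new formats.
fn compare_formats() -> (Vec<&'static str>, Vec<&'static str>) {
    // Old format: the scheme separator is indistinguishable from the list separator.
    let old = split_search_path("file:/bin:file:/usr/bin");
    // New format: only the list separator is a colon.
    let new = split_search_path("/scheme/file/bin:/usr/bin");
    (old, new)
}
```

With the old format the two intended entries come out as four fragments (`file`, `/bin`, `file`, `/usr/bin`), while the new format yields exactly `/scheme/file/bin` and `/usr/bin`.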
81 | 82 | ### Kernel Scheme Dispatch 83 | 84 | The kernel "Scheme Dispatch" functionality must be changed first, including compatibility feature guards. Once that is done, all other changes can be done as time permits. See also the RFC regarding the "Namespace" scheme. 85 | 86 | #### Current implementation 87 | 88 | [syscall::fs::open](https://gitlab.redox-os.org/redox-os/kernel/-/blob/master/src/syscall/fs.rs?ref_type=heads#L44) is responsible for dispatching `open` calls to the appropriate scheme. It assumes paths are already in canonicalized form. 89 | 90 | It currently obtains the scheme name by splitting the name at the first `:`. It dispatches to the appropriate scheme by looking up the scheme in the current namespace. `:scheme` is naturally parsed into an empty scheme name `""` with a path of `scheme`. This is interpreted as a request to the `RootScheme` with a path argument of `scheme`. 91 | 92 | This functionality will continue to be provided until we are ready to delete it, but with feature guards described below. 93 | 94 | The path format for dispatch to the `RootScheme` is discussed in another RFC. 95 | 96 | #### Changes required 97 | 98 | 1. `syscall::fs::open` will now also need to parse the new format by stripping the `/scheme/` literal and taking the string up to the next `/` as the scheme name. A path that does not begin with `/scheme/scheme_name` (or is not in the previous format, when `scheme_fmt_compat` is enabled) is considered an error. 99 | 100 | 2. The `scheme_fmt_warn` and `scheme_fmt_compat` feature guards should be implemented. Logging to the console will be done for old format scheme references. 101 | 102 | **DEPENDENCY** This work must be completed before any other conversion work can proceed. 103 | 104 | ### Rust's std::path 105 | 106 | Rust's `std::path` has some junk code in it due to *partial* rejection of past Redox pull requests. 
All the Redox-specific code should now be removed from `std::path` as Redox will now work with Linux-format paths. It is proposed that this work should be done once we are confident in the new `/scheme/scheme_name` format, but as soon as possible after that. 107 | 108 | - [here](https://github.com/rust-lang/rust/blob/master/library/std/src/path.rs#L303) 109 | 110 | - [here](https://github.com/rust-lang/rust/blob/master/library/std/src/path.rs#L2180) 111 | 112 | - [here](https://github.com/rust-lang/rust/blob/master/library/std/src/path.rs#L2673) 113 | 114 | - Possibly others 115 | 116 | ### Redox's std::path 117 | 118 | All Redox-specific code in Redox's fork of Rust's `std::path` should be removed. Similar to the above changes, plus removal of the `Scheme` variant of Prefix. 119 | 120 | ### Camino 121 | 122 | There is a [PR pending](https://github.com/camino-rs/camino/pull/88) for Camino, to adopt the `Scheme` Prefix variant. It should be closed. 123 | 124 | ### relibc Canonicalize 125 | 126 | #### Current functionality 127 | 128 | Currently, paths that use libc `open` are canonicalized to create a full path. The canonicalization uses the current working directory `CWD` as an additional information source during canonicalization. 129 | 130 | - If the path given to `open` starts with `scheme_name:`, it is taken as an absolute path and is used unchanged. 131 | 132 | - If the path starts with `/`, it is taken as absolute within the scheme of `CWD` (typically `file:`), and the scheme portion of `CWD` is prepended. 133 | 134 | - If the path does not start with `scheme_name:` or `/`, it is prefixed with `CWD`. 135 | 136 | #### Changes 137 | 138 | 1. The [canonicalize_with_cwd](https://gitlab.redox-os.org/redox-os/relibc/-/blob/master/src/platform/redox/path.rs?ref_type=heads#L16) function in `relibc` needs to be modified to accept `/scheme/scheme_name/path` as a new format, for specifying both the path and the CWD. 
This function will convert all paths to the new format prior to `syscall::open`. 139 | 140 | 2. This function will need to be changed to use `/scheme/file` as the scheme for an absolute path that does not contain a scheme. 141 | 142 | 3. The `scheme_fmt_warn` and `scheme_fmt_compat` feature guards should be implemented, with old format scheme references being warned on `stderr`. 143 | 144 | 4. An `assert` that the path must not follow the old format should be included when `scheme_fmt_compat` is disabled, at least for some initial period. This will trigger an abort, and when used with `RUST_BACKTRACE=full` can help determine the source of the problem. 145 | 146 | 5. After all old-format code has been removed, the `assert` should be removed, and paths starting with `name:` will be treated as allowed relative paths and handled normally. 147 | 148 | ### realpath 149 | 150 | `realpath` is a `libc` function that on Linux takes a file descriptor and returns an absolute path that can be used to open the same file. This path will have all symbolic links resolved. 151 | 152 | See [Unresolved questions](#unresolved-questions) regarding `realpath` issues. 153 | 154 | On Redox, `realpath` uses the scheme service `fpath` to determine the path. `fpath` is expected to return a full pathname including the scheme. Current `fpath` implementations return paths using `scheme_name:path` format. 155 | 156 | `realpath` should be modified as follows: 157 | 158 | 1. On return from `fpath`, `realpath` will strip `/scheme/file` from paths that contain it, so a path such as `/scheme/file/home` will be reported as `/home`. 159 | 160 | 2. The `scheme_fmt_warn` feature guard will enable `realpath` to check if the scheme format is `scheme_name:` and report to `stderr` if it is. 161 | 162 | 3. The `scheme_fmt_compat` feature guard will enable `realpath` to replace from `scheme_name:` with `/scheme/scheme_name/` (with appropriate bounds checking and overlapping copy). 
`file:` will be stripped from paths that contain it, replacing it with a leading `/` if needed. 163 | 164 | ### fpath implementations 165 | 166 | Several scheme providers implement `fpath`, where the scheme-relative path is calculated for a given file descriptor, and the scheme prefix is inserted by the scheme provider. 167 | 168 | 1. `RedoxFS` is the main provider of `fpath` and should be updated as soon as possible to return a path in the new format. 169 | 170 | 2. We will need to do a survey of schemes to determine which ones provide `fpath`. 171 | 172 | 3. We will need to work through a prioritized list of `fpath` implementations to update them. 173 | 174 | ### redox-scheme 175 | 176 | `redox-scheme` is the current best practice for creating user-space schemes. It should be updated to use the new scheme format. Scheme creation is discussed in the "Namespace Scheme" RFC. 177 | 178 | **DEPENDENCY** - Is `redox-scheme` ready for all drivers to be migrated? 179 | 180 | ### redox-event 181 | 182 | `redox-event` is the current best practice for using an event-based interface. 183 | 184 | 1. It should be updated to use the new scheme format. 185 | 186 | 2. A timer service should be added to `redox-event`, as many event subscribers use timers/timeouts directly, and in fact the use of timers is often the motivation for using the event scheme. 187 | 188 | 3. `epoll` requires the ability to gather all events that are available. This implies a need for a non-blocking check for events, e.g. `is_ready` or `maybe_next`. 189 | 190 | **DEPENDENCY** - A timer service should be implemented as soon as possible. Is `redox-event` ready for all (most) event users to be migrated? 191 | 192 | **DEPENDENCY** - A non-blocking check for events is needed by `epoll`. 193 | 194 | ### Scheme Clients 195 | 196 | The `event:`, `time:` and `file:` schemes are referenced in many places throughout the Redox code. They will all need to be modified. 197 | 1.
Where `event:` and `time:` are referenced, consider using the `redox-event` crate (depends on the [timer service](#redox-event) being added to `redox-event`). 198 | 199 | 2. Where `file:` is used, should we simply delete the scheme reference (since `/scheme/file` is now always the default), or should we convert it to `/scheme/file/`? 200 | 201 | ### epoll 202 | 203 | [epoll](https://gitlab.redox-os.org/redox-os/relibc/-/blob/master/src/platform/redox/epoll.rs?ref_type=heads#L56) in `relibc` uses the `event:` and `time:` schemes directly. It should be converted to use `redox-event` if possible. A non-blocking check for events may be required. If this is not feasible, `epoll` should be modified to use the new scheme format. 204 | 205 | ### libc: Scheme 206 | 207 | The `libc:` scheme is implemented in `relibc` as a way to resolve paths that depend on the process context, e.g. `/dev/tty`. `/dev/tty` is a symbolic link in the `file:` scheme that refers to `libc:tty`. The `libc:` scheme parses the full path used for `open` (`"libc:tty"` in this case). It needs to be updated to use the new format. The primary use of the `libc` scheme is via symbolic links in the [filesystem configs](#filesystem-configs). 208 | 209 | At first glance, it looks like the `libc:` scheme can be updated with a [single change](https://gitlab.redox-os.org/redox-os/relibc/-/blob/master/src/platform/redox/libcscheme.rs?ref_type=heads#L6) to a constant. However, further investigation should be done. 210 | 211 | **DEPENDENCY** This change should be done at the same time as updating the filesystem configs. Not many Redox applications use this functionality, so it is not urgent unless we are porting Linux TUI apps. 212 | 213 | ### RedoxFS 214 | 215 | 1. RedoxFS has its own [canonicalize](https://gitlab.redox-os.org/redox-os/redoxfs/-/blob/master/src/mount/redox/scheme.rs?ref_type=heads#L164) functionality that needs to be updated. 216 | 217 | 2. Old format symbolic links are supported.
Because old format paths are not canonicalized, RedoxFS needs to convert old format links to the new format when resolving the links. 218 | 219 | 3. [fpath](https://gitlab.redox-os.org/redox-os/redoxfs/-/blob/master/src/mount/redox/scheme.rs?ref_type=heads#L634) will need to be revised. 220 | 221 | ### Contain 222 | 223 | Contain makes extensive use of filename parsing, canonicalization and scheme references. The work to be done is not listed here as it is extensive. 224 | 225 | **DEPENDENCY** The `desktop-contain` filesystem config should be updated at the same time as the updates to Contain. 226 | 227 | ### Ion 228 | 229 | Ion has some code that strips or adds the `file:` prefix. 230 | 231 | **User Visible** Although `file:` is removed, other scheme-prefixed paths are displayed to the user. This should be minimized by updates to `realpath`. 232 | 233 | ### Bash, Dash, Nushell, etc. 234 | 235 | Shells that have Redox forks likely have `file:` prefix stripping to enable `glob` pattern expansion. This will need to be removed. 236 | 237 | ### Other libraries and crates 238 | 239 | Any library or crate that has a Redox fork likely has some scheme-related code. They will need to be examined. 240 | 241 | ### Filesystem Configs 242 | 243 | All filesystem configs (e.g. `desktop.toml`) need to be updated to the new format. 244 | 245 | 1. Symbolic links e.g. `/dev/tty` will need to be in the new format. 246 | 247 | 2. `desktop-contain.toml` will need to be revised to the new format. 248 | 249 | ## Documentation Changes 250 | 251 | ### Book 252 | 253 | All documentation about schemes will need to be updated. There are several pages that discuss schemes. 254 | 255 | ### README files 256 | 257 | A scan of the README files for each repo will need to be done, with updates as needed. 258 | 259 | # Drawbacks 260 | [drawbacks]: #drawbacks 261 | 262 | 1. Amount of Rework 263 | 2. Moving away from URI format 264 | 3. 
History of pushing for URI format to be accepted by the Rust community 265 | 266 | # Alternatives 267 | [alternatives]: #alternatives 268 | 269 | ## URI (Status Quo) 270 | 271 | The status quo of URI-style `scheme_name:` paths seems to have reached its limit. Although it is a recognized path format in POSIX, it is not used for filesystem references. In practice URIs are converted by services into simple Unix paths by web applications that process URIs. 272 | 273 | Our use of this format is now impeding the porting of Linux applications and FOSS Rust applications. 274 | 275 | ## Plan 9 Paths 276 | 277 | Plan 9 paths are similar to the proposed Redox scheme format, except that Plan 9 services are mostly present at the root of the filesystem, e.g. `/tcp`. Redox will have all services (schemes) logically under the `/scheme/` folder. This will help keep the filesystem root clean and allow for more *nix-like filenames. 278 | 279 | ## Other Path Formats 280 | 281 | Using some Windows-compatible path format could alleviate some of the problems of having scheme-based names. [UNC](https://en.wikipedia.org/wiki/Path_(computing)#Universal_Naming_Convention) and the DeviceNS formats are supported by Rust `std::path`. However, there doesn't seem to be a benefit over using `/scheme/`. 282 | 283 | POSIX allows that paths starting with `//` are "implementation defined" but other points in the POSIX specification state that a prefix of more than one slash is to be treated as a single slash. 284 | 285 | # Unresolved questions 286 | [unresolved]: #unresolved-questions 287 | 288 | 1. On Redox, the `fpath` scheme service is used as the mechanism to obtain a path for a file descriptor. However, it produces results that are not guaranteed to be correct. A future implementation of an `fpath`-like service will address the problems. Resolving this issue is outside the scope of this RFC. 
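
For reference, the `realpath` post-processing of `fpath` results described earlier (steps 1 and 3 of the `realpath` changes) could look roughly like the sketch below. The `realpath_display` name is hypothetical, and the sketch works on owned `String`s rather than the in-place buffer rewriting, with bounds checking and overlapping copy, that the C-level function would need.

```rust
/// Hypothetical helper: turn an `fpath` result into the path `realpath` reports.
fn realpath_display(fpath_result: &str) -> String {
    // Compat step: rewrite a legacy `scheme_name:path` result into
    // `/scheme/scheme_name/path`. Paths that already start with `/` are
    // assumed to be in the new format and are left alone.
    let new_format = if fpath_result.starts_with('/') {
        fpath_result.to_string()
    } else if let Some((scheme, rest)) = fpath_result.split_once(':') {
        format!("/scheme/{}/{}", scheme, rest.trim_start_matches('/'))
    } else {
        fpath_result.to_string()
    };

    // Strip the default `/scheme/file` prefix, but only when it is a whole
    // path component (so `/scheme/file.home` is left untouched).
    if let Some(rest) = new_format.strip_prefix("/scheme/file") {
        if rest.is_empty() {
            return "/".to_string();
        }
        if rest.starts_with('/') {
            return rest.to_string();
        }
    }
    new_format
}
```

Checking that the remainder is empty or starts with `/` is a deliberate choice in this sketch: it keeps scheme names that merely begin with `file`, such as the `file.home` example from the Disk Partitions section, from being mistaken for the default scheme.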
-------------------------------------------------------------------------------- /text/0007-base-system-repo.md: -------------------------------------------------------------------------------- 1 | - Feature Name: base-system-repo 2 | - Start Date: 2024-12-29 3 | - RFC PR: (leave this empty) 4 | - Redox Issue: (leave this empty) 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | Merge the repos forming the base system into a single repo. 10 | 11 | # Motivation 12 | [motivation]: #motivation 13 | 14 | While we will likely be able to stabilize the userspace ABI relatively soon through dynamic linking of relibc and libredox, the syscall interface as well as the interface of system services between each other and with relibc is likely to take much longer to stabilize if it ever gets stabilized. Many internal improvements currently require merge requests across multiple repos that need to be merged at the same time which makes such changes harder to do. For this reason some repos contain programs that don't actually quite fit in the repo but are only there because it makes changes easier. For example the driver repo contains inputd, fbbootlogd and fbcond, none of which are actually drivers, but all of them are somewhat coupled with the graphics drivers. And having crates like redox-scheme and redox-log be included through crates.io due to being in a separate repo makes changes to them harder too. Merging all programs in the base system into a single repo will make it easier to make atomic changes. It is also currently not safe for users to update individual packages that are part of the base system. 
15 | 16 | # Detailed design 17 | [design]: #detailed-design 18 | 19 | The following repos will be merged into a single "base" repo: 20 | 21 | * audiod 22 | * contain 23 | * drivers 24 | * event 25 | * init 26 | * initfs 27 | * ipcd 28 | * logd 29 | * netstack 30 | * ptyd 31 | * ramfs 32 | * randd 33 | * redox-log 34 | * redox-scheme 35 | * zerod 36 | 37 | The following repos will **not** be merged into the "base" repo: 38 | 39 | * binutils 40 | * bootloader (the bootloader interface rarely changes) 41 | * coreutils 42 | * dash 43 | * findutils 44 | * installer 45 | * ion 46 | * libredox (this is supposed to be a stable API in the future) 47 | * pkgar 48 | * pkgutils 49 | * redoxerd 50 | * uutils 51 | * pretty much everything outside of the core category in the cookbook 52 | 53 | For the actual merge, I propose making, for each repo, a commit which moves the entire content to a subdirectory, and then using git merge to combine all the repos into a new repo. This preserves the full history of all repos as well as git blame. Afterwards, all issues will have to be transferred to the new repo. 54 | 55 | After merging the repos, initially all recipes can be updated to use the `source.same_as` functionality that drivers-initfs already uses to have a single checkout for the base repo across all recipes and then build the respective subdirectory of the base repo. Once that is done, other MRs can start getting merged again as usual. 56 | 57 | At a later point we can start merging recipes for base system components together and adapt the build step as appropriate. This will also enable sharing compiled dependencies between components if we put them in the same cargo workspace. In the end we probably want to end up with either a single base package or a base package and a base-desktop package where the latter would contain the audio and graphics subsystem.
Or alternatively we could end up with a base-server and base-desktop package which contain all components that overlap between both configurations to ensure the base system is atomically updated. 58 | 59 | # Drawbacks 60 | [drawbacks]: #drawbacks 61 | 62 | It is a non-trivial amount of work. 63 | 64 | # Alternatives 65 | [alternatives]: #alternatives 66 | 67 | What other designs have been considered? What is the impact of not doing this? 68 | 69 | # Unresolved questions 70 | [unresolved]: #unresolved-questions 71 | 72 | Should the following repos be merged into the "base" repo? I think they should, but it might be less disruptive to keep them in separate repos at least for the time being. 73 | 74 | * bootstrap 75 | * escalated 76 | * kernel 77 | * relibc 78 | * redoxfs 79 | * syscall 80 | * orbital (but not the gui apps themselves) 81 | 82 | @4lDO2 prefers putting those in submodules instead for another repo. 83 | 84 | --- 85 | 86 | Should we split the base package and if so should we split it into base and base-desktop or base-server and base-desktop? 87 | -------------------------------------------------------------------------------- /text/0008-userspace-signals.md: -------------------------------------------------------------------------------- 1 | - Feature Name: userspace_signals 2 | - Start Date: 2024-02-16 3 | - RFC PR: https://gitlab.redox-os.org/redox-os/rfcs/-/merge_requests/19 4 | - Redox Issue: https://gitlab.redox-os.org/redox-os/kernel/-/issues/113 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | Most of Redox's POSIX signal handling implementation can be moved to userspace, in a way that, in particular, allows changing the signal mask without any syscalls. 10 | Sending signals would still be done by the kernel, with regular mutex synchronization, or later, a userspace process manager.
11 | 12 | # Motivation 13 | [motivation]: #motivation 14 | 15 | Signals are the userspace equivalent of hardware interrupts -- they interrupt what was previously running, use the same stack, and are mostly maskable. 16 | POSIX requires sigprocmask, which most POSIX systems implement in the kernel, using a sigprocmask syscall. 17 | 18 | However, as Redox is moving more and more of previous kernel functionality to redox-rt, it sometimes needs critical sections where signals must not be delivered. 19 | This is currently the case for _open(3)_, where it would be useful not to need to wrap each open call in two sigprocmask syscalls. 20 | The same issue will become more significant in the future, should redox-rt for example emulate POSIX file descriptors. 21 | 22 | Moving the signal implementation to userspace will eliminate the need for the `sigprocmask`, `sigaction`, and `sigreturn` syscalls. 23 | The redox-rt counterparts will also likely be faster, when no longer syscall-based. 24 | Although, there will still need to exist a `kill`/`sigqueue` syscall, or an equivalent IPC call to a process manager. 25 | 26 | # Detailed design 27 | [design]: #detailed-design 28 | 29 | ## Proc scheme API 30 | 31 | The kernel would remove `//sigstack` and change `//sighandler`, which would provide write-only access to the following struct: 32 | 33 | ```rust 34 | #[repr(C)] 35 | struct SigEntry { 36 | user_handler: usize, 37 | excp_handler: usize, 38 | thread_ctl_region: usize, 39 | proc_ctl_region: usize, 40 | } 41 | ``` 42 | 43 | The `user_handler` and `excp_handler` fields are function pointers to the signal trampoline and CPU exception handler, respectively. 44 | The `thread_ctl_region` and `proc_ctl_region` fields point to a control structure defined below, for thread vs process granularity. 45 | Setting `user_handler` to zero disables user-handled signals completely for the thread. 
46 | The `excp_handler` field is however optional, and by default, CPU exceptions will result in core dumps unless explicitly handled. 47 | The thread and process control regions are defined as follows: 48 | 49 | ```rust 50 | #[repr(C)] 51 | struct SigCtlRegion { 52 | // Consists of two words, one for standard and one for realtime signals. 53 | // The low 32 bits are the pending set, whereas the high bits are the allowset. 54 | ctl: [AtomicU64; 2], 55 | 56 | local_ctl: SigatomicU64, // accessed using Relaxed and compiler barriers 57 | 58 | old_ip: usize, // eip/rip/pc 59 | old_archdep_reg: usize, // eflags/rflags/x0 60 | } 61 | 62 | const LOCAL_CTL_INHIBIT_DELIVERY_BIT: u64 = 1; 63 | 64 | #[repr(C)] 65 | struct ProcCtlRegion { 66 | pending: AtomicU64, 67 | actions: [RawAction; 64], 68 | q: [RtSig; 32], 69 | qhead: AtomicU8, 70 | qtail: AtomicU8, 71 | } 72 | 73 | #[repr(C)] 74 | struct RawAction { 75 | first: AtomicU64, 76 | user_data: AtomicU64, 77 | } 78 | 79 | #[repr(C)] 80 | struct RtSig { 81 | signo: usize, // all bits except 6:0 are reserved 82 | sigval: usize, 83 | } 84 | ``` 85 | 86 | The signal control regions, for both process and thread, must each be contained within a single page, and must be 16-byte aligned. 87 | Userspace is encouraged to reuse the existing TCB page. 88 | The `ctl` field consists of two _signal groups_, namely the standard and realtime signals (starting at 33). 89 | Each such `AtomicU64` is divided into a lower _pending set_ and upper _allowset_ half. 90 | 91 | Since signal 0 does not exist, this bitset is zero-based. 92 | The `sigprocmask` and `pthread_sigmask` functions, as defined by POSIX, can only modify the current thread's mask. 93 | 94 | ## Kernel implementation of kill/sigqueue 95 | 96 | The kill and/or sigqueue syscalls will still use mutex-based synchronization with other kernel hardware threads.
97 | This implies the only lock-free synchronization is check-mask-then-deliver and unmask-then-check-pending, where userspace synchronizes with the kernel, possibly on another hardware thread. 98 | 99 | The kernel will set the corresponding pending bit, followed by reading the allowset and pending bits simultaneously, the logical AND of which is the set of deliverable signals. 100 | If nonempty, the kernel will unblock the thread and set an internal flag indicating that a signal is incoming. 101 | 102 | Sending signals to a process rather than a thread is done by first setting the bit in the process-wide pending set, followed by linearly searching the TCBs for a thread that has not blocked that signal. 103 | A subsequent no-op write to that thread's `ctl` with Release ordering should synchronize that earlier write to the process-wide set, when the trampoline later reads its thread-specific `ctl` (with Acquire ordering) followed by reading the process-wide mask, and deciding which signal to deliver. 104 | 105 | ### Delivery 106 | 107 | When the kernel delivers a signal, it unblocks the thread, potentially sending an IPI, and when the context is switched to, it has exclusive access to the saved registers. 108 | The instruction and stack pointers, as well as some miscellaneous registers, are saved (using nonatomic accesses) to the respective fields of the thread control region. 109 | It will also set the _inhibit flag_. 110 | 111 | The _inhibit flag_ allows temporarily preventing the kernel from jumping the userspace context to the signal trampoline, without affecting how threads are awoken etc. 112 | This flag allows async-signal-safe functions to easily and efficiently disable signals during short critical sections. 113 | 114 | #### Trampoline 115 | 116 | The `user_handler` field points to the _signal trampoline_. 117 | The kernel will save a few registers, some of which can be used as scratch registers.
118 | The trampoline will need to calculate the new stack pointer, taking into account the potential alternate signal stack (`sigaltstack`). 119 | On x86_64 this can be found [here](https://gitlab.redox-os.org/redox-os/relibc/-/blob/44f148ad6c214551bb0fddf0eecc9801558f25b9/redox-rt/src/arch/x86_64.rs#L174). 120 | 121 | ## Fork, exec, pthread_create 122 | 123 | POSIX requires fork to preserve all sigactions and the sigprocmask, which would likely be trivial considering the shared address space, as the proc scheme struct can simply be reapplied. 124 | 125 | Exec must preserve the procmask, and remember which signals were ignored, but otherwise reset all nonignored sigactions. 126 | This information is passed in AT_SIGPROCMASK_{LO,HI}/AT_SIGIGNMASK_{LO,HI} with both lo and hi variants on 32-bit platforms. 127 | 128 | `pthread_create` requires the pending set to start as empty, and the mask is inherited from the 'parent' thread. 129 | 130 | ## SIGCONT, SIGSTOP(/SIGTSTP,SIGTTIN,SIGTTOU), SIGKILL 131 | 132 | POSIX states that SIGKILL and SIGSTOP are not maskable, and cannot be handled in userspace or ignored. 133 | Luckily, this frees up 4 additional bits. 134 | The "SIGKILL masked", "SIGKILL pending", and "SIGSTOP masked" bits, instead indicate whether the action for SIGTSTP, SIGTTIN, and SIGTTOU, is equivalent to the hardcoded SIGSTOP action. 135 | If they are unset, they can be ignored or handled, and masked, like the other signals. 136 | 137 | The SIGCONT signal cannot be ignored either, in the sense that sending a SIGCONT will always continue the target process, but userspace can choose whether or not it will be caught. 138 | The pending and masked bits for SIGCONT will thus have the same behavior as regular signals, except `kill` will unconditionally transition the thread from _stopped_ to _blocked_ first. 139 | 140 | POSIX requires the generation of SIGCONT to discard all pending stop signals, and vice versa. 
141 | Since the `kill` implementation is mutex-synchronized, this should be relatively easy to synchronize. 142 | 143 | ## Realtime signals 144 | 145 | POSIX specifies that the conventional signals (typically 1-31) should be implemented as a set; that is, multiple undelivered signals of the same number, should be merged into one. 146 | Realtime signals must instead be implemented as a queue, allowing multiple independent signals of the same number. 147 | Realtime signals must also be able to provide a value, either an `int` or a pointer. 148 | 149 | This would be implemented using the `queue` field in the signal control region. 150 | The `qhead` and `qtail` atomic fields together divide the queue array into a consumer-owned and producer-owned half. 151 | The consumer half shall thus be read nonatomically by the thread, and the producer half written nonatomically by the kernel. 152 | The `qtail` field shall be written only by the producer (kernel), and the `qhead` field by the consumer (thread). 153 | 154 | Since the consumer half is exclusively owned by the thread, it can dynamically update the pending bits accordingly based on which nonmasked realtime signals are present in the queue. 155 | The kernel will set the pending bits as usual when sending realtime signals, which for synchronization reasons, must be done _after_ the actual entry is enqueued. 156 | 157 | This iteration will likely be very quick, so long as the number of possible signals does not significantly increase. 158 | Should this become a performance issue, the queue array may be converted to a structure-of-arrays, where the signal numbers can be packed, and counted quickly using SIMD. 159 | POSIX only requires the implementation to provide 8 realtime signals, and the threading implementation requires two additional signals (cancellation and timer). 
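
The queue discipline described in this section can be sketched as a single-producer/single-consumer ring buffer. This is a minimal illustration rather than the redox-rt or kernel implementation: the `RtQueue` type, the `enqueue`/`dequeue` method names, the use of free-running `u8` counters reduced modulo the queue length, and the Acquire/Release orderings are all assumptions made for the example (the RFC itself currently assumes SeqCst). Setting the pending bit after enqueueing is left out.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU8, Ordering};

/// Number of slots, matching `q: [RtSig; 32]` in the RFC.
const QUEUE_LEN: usize = 32;

#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct RtSig {
    pub signo: usize,
    pub sigval: usize,
}

/// Hypothetical stand-in for the `q`/`qhead`/`qtail` fields.
#[repr(C)]
pub struct RtQueue {
    q: UnsafeCell<[RtSig; QUEUE_LEN]>,
    qhead: AtomicU8, // written only by the consumer (thread)
    qtail: AtomicU8, // written only by the producer (kernel)
}

// Sound only if there is exactly one producer and one consumer.
unsafe impl Sync for RtQueue {}

impl RtQueue {
    pub fn new() -> Self {
        RtQueue {
            q: UnsafeCell::new([RtSig::default(); QUEUE_LEN]),
            qhead: AtomicU8::new(0),
            qtail: AtomicU8::new(0),
        }
    }

    /// Producer side: returns `false` when the queue is full.
    pub fn enqueue(&self, sig: RtSig) -> bool {
        let tail = self.qtail.load(Ordering::Relaxed);
        let head = self.qhead.load(Ordering::Acquire);
        // Counters are free-running; 256 is a multiple of 32, so u8
        // wraparound keeps the slot index consistent.
        if tail.wrapping_sub(head) as usize == QUEUE_LEN {
            return false;
        }
        let slots = self.q.get().cast::<RtSig>();
        // Nonatomic write into the producer-owned half.
        unsafe { slots.add(tail as usize % QUEUE_LEN).write(sig) };
        // Publish the entry only after it has been written.
        self.qtail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side: returns `None` when the queue is empty.
    pub fn dequeue(&self) -> Option<RtSig> {
        let head = self.qhead.load(Ordering::Relaxed);
        let tail = self.qtail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let slots = self.q.get().cast::<RtSig>();
        // Nonatomic read from the consumer-owned half.
        let sig = unsafe { slots.add(head as usize % QUEUE_LEN).read() };
        // Hand the slot back to the producer.
        self.qhead.store(head.wrapping_add(1), Ordering::Release);
        Some(sig)
    }
}
```

In this sketch the Release store to `qtail` plays the same role as the rule that the pending bit is set only after the entry is enqueued: the consumer never observes an index that covers a slot whose contents have not been written yet.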
160 | 161 | ## raise 162 | 163 | Raise will initially be implemented using a regular thread-specific kill syscall, but that should be possible to bypass. 164 | 165 | ## sigprocmask/pthread_sigmask 166 | 167 | Changing the signal mask (equivalently, the inverted allowset) is done fully in userspace. 168 | The inhibit bit is set, the allowsets for each group are atomically swapped while simultaneously reading the pending set, and if there are pending unblocked signals, then at least one will be delivered before the *mask function returns. 169 | 170 | Since the allowset is strictly writable only by the target thread, it can be modified without necessitating a CAS loop on x86, which supports XADD (atomic fetch_add). 171 | Specifically, swapping only the allowset is done using `word.fetch_add(new_allowset.wrapping_sub(old_allowset))`. 172 | 173 | ## sigaction 174 | 175 | Sigaction will be implemented entirely in redox-rt. 176 | With signals temporarily disabled in the `sigaction` function itself, it can use regular mutex-based synchronization, including synchronization between `sigaction` and the signal trampoline running on other threads. 177 | Setting the action to SIG_IGN will modify the ignmask and clear the allowset bit for the respective signal. 178 | Setting it to SIG_DFL will either modify the signal-is-stop bits for SIGTSTP/SIGTTIN/SIGTTOU, or set it to a builtin default handler. 179 | 180 | POSIX allows the sigaction to change between the generation and delivery of a signal, allowing sigaction to be weakly ordered and only synchronize against the signal trampoline (see [mem-orderings][mem-orderings]). 181 | 182 | ## sigwait, sigsuspend, etc 183 | 184 | POSIX does not appear to differentiate between accepting (i.e. `sigwait`ing) and delivering (i.e. trampoline runs). 185 | Thus, it should be valid to implement these internally with a few additional checks in the trampoline.
186 | That said, these functions can avoid the trampoline entirely, by using the inhibit bit and simply catching `EINTR`, which is what the current implementation does. 187 | 188 | # Drawbacks 189 | [drawbacks]: #drawbacks 190 | 191 | This obviously adds complexity, and partially blurs the line between userspace and kernel. 192 | The TCB will also significantly grow in size, some of which also needs to store pthread information. 193 | However, from a microkernel perspective, it would be useful to move as much of the POSIX logic as possible to userspace. 194 | It would also likely improve the performance of signals, even compared to existing monolithic kernels such as Linux. 195 | 196 | Since the sigaltstack logic is done in the trampolines, there's a large amount of assembly, but not significantly more than other libcs' signal trampolines on monolithic kernels. 197 | 198 | # Alternatives 199 | [alternatives]: #alternatives 200 | 201 | The kernel already has a basic signal implementation. 202 | It would be possible to extend this to include realtime signals, sigwait/sigsuspend, and implement all the sigaction flags. 203 | However, this will severely limit the ability for redox-rt to quickly protect critical sections in async-signal-safe functions. 204 | 205 | Alternatively, it would be possible to only implement the logic behind the _inhibit_ bit. 206 | That said, this breaks down if larger critical sections that may internally block are used. 207 | In those cases, sigprocmask would likely be used to disable all signals inside that section, which would suffer from the same base syscall latency (usually a few hundred cycles), and this needs to be called twice. 208 | 209 | # Unresolved questions 210 | [unresolved]: #unresolved-questions 211 | 212 | ## Who should send the signals? 213 | 214 | In this proposal, the kernel will be responsible for sending the signals, and userspace will merely control sigprocmasking, and raising some signals on its own.
215 | However, if performance is not sufficiently significant for this to stay in the kernel, it might make sense for _kill(3)_ and/or _sigqueue(3)_ to be implemented as a (fast synchronous) IPC call to the process manager. 216 | A process manager has been suggested as a way for the kernel to only abstract _contexts_, and let that manager define _threads_, _processes_, _sessions_, and _process groups_. 217 | 218 | ## `siginfo_t` 219 | 220 | POSIX, at least provided Redox will support XSI, requires extra information to be obtainable for each signal. 221 | This is highly likely possible to pass in an array per signal, using the pending bits as synchronization, but may require realtime signals to be queued internally in the kernel. 222 | 223 | ## Memory orderings 224 | 225 | [mem-orderings]: #mem-orderings 226 | 227 | Acquire+Release+Relaxed should be sufficient, but for now it is on a correctness basis assumed that the implementation uses SeqCst. 228 | 229 | # Acknowledgements 230 | 231 | This RFC has been developed as part of the _Unix-style Signals_ project. The project is funded through [NGI Zero Core](https://nlnet.nl/core), a fund established by [NLnet](https://nlnet.nl) with financial support from the European Commission's [Next Generation Internet](https://ngi.eu) program. Learn more at the [NLnet project page](https://nlnet.nl/project/RedoxOS-Signals). 232 | 233 | [NLnet foundation logo](https://nlnet.nl) 234 | [NGI Zero Logo](https://nlnet.nl/core) 235 | -------------------------------------------------------------------------------- /text/0009-namespace-scheme.md: -------------------------------------------------------------------------------- 1 | - Feature Name: namespace-scheme 2 | - Start Date: 2024-01-17 3 | - RFC PR: (leave this empty) 4 | - Redox Issue: (leave this empty) 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | As discussed between jeremy_soller and rw_van. 
10 | 11 | Adopting the new scheme naming format `/scheme/scheme_name` creates an ambiguity when mounting a scheme in the effective namespace. Mounting a scheme will change from `open(":scheme_name", O_CREAT)` to 12 | 13 | ``` 14 | open("/scheme/namespace/scheme_name", O_CREAT | O_EXCL) 15 | ``` 16 | 17 | (See [Unresolved questions](#unresolved-questions) for naming options and issues.) 18 | 19 | # Motivation 20 | [motivation]: #motivation 21 | 22 | Schemes exist within a namespace. Currently, a scheme is referred to as `scheme_name:`. The namespace manager is `RootScheme` in the kernel, and it is addressed using a scheme name of `:`, which is effectively an empty name `""` followed by a colon `:` separator. Creating/mounting a scheme is done using the format `:scheme_name`, i.e. an empty name followed by a separator and the scheme name as a path. This allows path parsing to naturally detect references to the root scheme, and to pass the scheme name to the root scheme for mounting. 23 | 24 | The change to a naming format of `/scheme/scheme_name` means that a request to the root scheme to mount `scheme_name` can no longer be parsed naturally. There is no separator that indicates an empty scheme name. 25 | 26 | # Detailed design 27 | [design]: #detailed-design 28 | 29 | ## Behavior 30 | 31 | 1. Currently, the effective namespace is referred to using `":"`, i.e. an empty name `""` followed by a colon separator `":"`. The proposed new name for the effective namespace is `/scheme/namespace`. 32 | 33 | 2. Currently, mounting scheme `scheme_name` is done by `open(":scheme_name", O_CREAT)`. The new open call will be 34 | 35 | ``` 36 | open("/scheme/namespace/scheme_name", O_CREAT | O_EXCL) 37 | ``` 38 | 39 | This will result in the new scheme being mounted as `/scheme/scheme_name`. 40 | 41 | 3. If the scheme has already been mounted and is healthy, a second create call will fail (`EEXIST`). 42 | 43 | 4.
`open("/scheme/namespace/scheme_name")` **without** `(O_CREAT | O_EXCL)` present will provide an fd that can be used to query or set **TBD** information about the namespace's view of the scheme. (See [Unresolved questions](#unresolved-questions).) 44 | 45 | ## Changes 46 | 47 | ### Kernel dispatch 48 | 49 | 1. [kernel::syscall::fs::open](https://gitlab.redox-os.org/redox-os/kernel/-/blob/master/src/syscall/fs.rs?ref_type=heads#L66) needs to translate the empty scheme name `""` to the new namespace scheme name `"namespace"`. 50 | 51 | 2. The function [scheme::SchemeList::new_ns](https://gitlab.redox-os.org/redox-os/kernel/-/blob/master/src/scheme/mod.rs?ref_type=heads#L169) inserts an empty string `""` into the list of schemes as a key, so that parsing a path with an empty scheme is naturally forwarded to the RootScheme. Changing this string to `"namespace"`, in combination with the changes to `open` above, will enable the new format. 52 | 53 | ### redox-scheme 54 | 55 | [redox_scheme::Socket::create_inner](https://gitlab.redox-os.org/redox-os/redox-scheme/-/blob/master/src/lib.rs?ref_type=heads#L129) should be updated as soon as possible to use the new format. 56 | 57 | ### Scheme providers 58 | 59 | `redox-scheme` is the best practice interface for schemes. Wherever possible, schemes should be updated to use redox-scheme rather than implementing the scheme protocol themselves. 60 | 61 | For those schemes that cannot use `redox-scheme`, the `open` call to create the scheme will need to be modified. 62 | 63 | ### Contain 64 | 65 | Contain should already be using redox-scheme and should therefore update automatically when redox-scheme is updated. However, this should be verified. 66 | 67 | # Drawbacks 68 | [drawbacks]: #drawbacks 69 | 70 | There is no compelling reason to not do this.
71 | 72 | # Alternatives 73 | [alternatives]: #alternatives 74 | 75 | ## Status Quo 76 | 77 | Due to the change in scheme naming described in the "Scheme Path" RFC, there is an ambiguity in referencing the root scheme. Does `/scheme/s1` mean "open the null path on scheme `s1`" or does it mean "open `s1` on the root scheme"? This is unnecessarily confusing and would require coding of `open` flags to resolve the ambiguity. 78 | 79 | ## Special Files and Mount Points 80 | 81 | Unix uses "special files", e.g. block-special and character-special files, to refer to physical devices and pseudo-devices. Special files are indicated by a filetype in their status flags, and the "major" and "minor" numbers are interpreted to determine the driver and specific resource (e.g. partition) for the special file. For filesystem providers, the special device can be "mounted" at a named point in the filesystem, e.g. the block-special device `/dev/nvme0n1p2` can be mounted at `/home`. This masks the file named `/home` and connects the filesystem to that point. 82 | 83 | Conceivably, Redox could use a status bit to indicate a path that represents a "service provider" (scheme) that could be mounted, and then the "mount" system call could make the scheme accessible at some location in the filesystem. There would be a (mostly) one-to-one correspondence between what special files exist, and what schemes are in the namespace. This ultimately is very similar to the proposed mapping. 84 | 85 | # Unresolved questions 86 | [unresolved]: #unresolved-questions 87 | 88 | 1. What should the name of the effective namespace scheme (RootScheme) be? 89 | 90 | - `/scheme/ns` 91 | - `/scheme/namespace` (see note below) 92 | - `/namespace` (treats namespace as a thing that is not a scheme) 93 | - `/scheme/ens` (`/scheme/rns` for "real namespace") 94 | 95 | The recommended name in this RFC, `/scheme/namespace`, will be the choice once this RFC is approved. 96 | 97 | 2.
We need a naming format for distinct namespaces, e.g. `/scheme/namespaces/n` for namespace `n`. It should be separated from the naming of the effective namespace. 98 | 99 | - Having names for distinct namespaces would allow us to have a capability-based security mechanism where schemes can be inserted or removed from namespaces. It would also enable user-managed namespaces and schemes. Details to be discussed in another RFC. 100 | 101 | - Using a name that is distinct from the normal namespace, e.g. `/scheme/namespaces` (plural), would allow us to have arbitrary names for new namespaces. If we choose `/scheme/namespace/scheme_name` for mounting a scheme, and `/scheme/namespace/n` to refer to namespace `n`, then we are forced to use numbers rather than names for namespaces, and we will have schemes under the namespace folder at two different levels, creating confusion. 102 | 103 | This is out of scope for this RFC. 104 | 105 | 3. We need better management of namespaces, but it should be addressed in a separate RFC. 106 | 107 | This is out of scope for this RFC. 108 | 109 | 4. Should we change from `open` to `mount` when creating a new scheme? It includes additional flags and options. 110 | 111 | This is out of scope for this RFC. 112 | 113 | 5. Assume that when a scheme is restarted, it will call `open("/scheme/namespace/scheme_name", O_CREAT | O_EXCL)`. If those flags are not provided, then the open is to query the namespace about the state of the scheme. What actions can be performed on the resulting fd, and what information is provided? 114 | 115 | This is out of scope for this RFC. 116 | --------------------------------------------------------------------------------