├── LICENSE ├── README.md ├── cosa ├── 20200804-cfg-spec.md └── 20200805-meta-merge.md ├── os ├── 20201103-full-disk-raid.md ├── 20211014-s390x-secure-execution.md ├── 20231024-s390x-zvm-secure-ipl.md └── coreos-layering.md └── template.md /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Enhancements Tracking for CoreOS-based Systems 2 | 3 | Inspired by [openshift/enhancements](https://github.com/openshift/enhancements/) 4 | and [rust-lang/rfcs](https://github.com/rust-lang/rfcs/) and others. 5 | 6 | To create a new enhancement, copy `template.md` to `directory/my-awesome-enhancement.md` 7 | where `directory` is the tool or area impacted by the proposal and `my-awesome-enhancement.md` 8 | is descriptive of the feature. 9 | 10 | This repo serves as a location to record design decisions that affect systems like 11 | Fedora CoreOS and RHEL CoreOS. This is also the location where discussion can 12 | happen on those design decisions.
13 | -------------------------------------------------------------------------------- /cosa/20200804-cfg-spec.md: -------------------------------------------------------------------------------- 1 | # CoreOS Assembler Config Specification 2 | - Subject ID: COSA.20200804 3 | - Supersedes: None 4 | - Conflicts: None 5 | 6 | ## Summary 7 | 8 | The RHCOS development pipelines use a "jobspec" to describe the pipeline runs. This enhancement proposes a YAML-based interface based loosely on the RHCOS jobspec. 9 | 10 | ## Goals 11 | 12 | 1. Provide a file-based interface into COSA 13 | 1. Prep for breaking up the Pipeline monoliths 14 | 1. Provide a migration path for RHCOS pipelines 15 | 1. Prepare for pipeline Nirvana (where *COS pipelines can live in peace and harmony) 16 | 17 | ## Why? 18 | 19 | The innovation of the RHCOS JobSpec was that it provided a clear separation of configuration from code; previously the two were conflated. Further, the jobspec allowed the same logic to be run with different inputs. 20 | 21 | To support this, a "COSAspec" is being introduced to provide a file-based CLI interface. 22 | 23 | ## YAML to EnvVars 24 | 25 | * For Bash and Pythonic CLIs, each argument will have an envVar added. 26 | * At entry into COSA, the optional COSAspec will be checked. 27 | * Values will be parsed and emitted as EnvVars by a common entry point. 28 | * The EnvVars will set defaults for the CLI commands. 29 | 30 | Note: The Mantle code is GoLang based. The use of envVars is supported by the Cobra CLI, but needs further research and consideration. The Mantle code is not considered in this design standard. 31 | 32 | ## Types 33 | 34 | * List values will be joined into comma-separated strings. 35 | * Dicts/Maps will be flattened with an `_` between levels. 36 | * Boolean values should be 0, 1, True, False, true, false. 37 | * Number types are strings and the consuming code is responsible for casting to the type.
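The flattening rules above can be sketched in Python (illustrative only; a `flatten` helper like this is an assumption, not existing COSA code):

```python
def flatten(node, prefix="COSA"):
    """Flatten a parsed COSAspec dict into envVar name/value pairs.

    Rules: dict levels join with "_", lists join into comma-separated
    strings, and every value is emitted as a string (consumers cast).
    """
    env = {}
    for key, value in node.items():
        name = f"{prefix}_{key}".upper()
        if isinstance(value, dict):
            env.update(flatten(value, prefix=name))
        elif isinstance(value, list):
            env[name] = ",".join(str(v) for v in value)
        elif value is None:
            env[name] = ""  # empty/null handling is left to each consumer
        else:
            env[name] = str(value)
    return env
```

Consumers would still cast the string values back to their native types, per the rule above.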
38 | 39 | Due to the lack of cohesion in the mix of Python, Bash and GoLang, each component will necessarily handle empty and null values. 40 | 41 | 42 | ## Example: 43 | 44 | For the `coreos-assembler build` command: 45 | ``` 46 | build: 47 | force: false 48 | force_nocache: false 49 | skip_prune: false 50 | parent: 51 | tag: 52 | ``` 53 | 54 | Would render the envVars: 55 | ``` 56 | COSA_BUILD_FORCE=false 57 | COSA_BUILD_FORCE_NOCACHE=false 58 | COSA_BUILD_SKIP_PRUNE=false 59 | ... 60 | ``` 61 | 62 | Then in `cmd-build` the defaults would be: 63 | ``` 64 | FORCE="${COSA_BUILD_FORCE:-}" 65 | FORCE_IMAGE="${COSA_BUILD_FORCE_IMAGE:-}" 66 | SKIP_PRUNE="${COSA_BUILD_SKIP_PRUNE:-0}" 67 | ... 68 | ``` 69 | 70 | ## List value example 71 | 72 | ``` 73 | extend: 74 | aws: 75 | regions: 76 | - us-east-1 77 | - us-west-1 78 | ``` 79 | Would render as `COSA_EXTEND_AWS_REGIONS="us-east-1,us-west-1"` 80 | 81 | ## Dict example 82 | 83 | ``` 84 | foo: 85 | bar: 86 | baz: 87 | 1: true 88 | 2: true 89 | ``` 90 | 91 | Would render as `COSA_FOO_BAR_BAZ_1=true` and `COSA_FOO_BAR_BAZ_2=true` 92 | 93 | ## PR Standard 94 | 95 | A thorough review of the COSA code base and a comparison between the FCOS and RHCOS pipelines show that each command will need to be handled separately. 96 | 97 | Therefore, for each command updated to implement this interface: 98 | - The prefix `COSA_` will be used for envVars (e.g. `COSA_EXTEND_AZURE`) 99 | - No calculated values. Commands may not parse the spec file itself. 100 | - One command per PR.
101 | 102 | 103 | Naming convention for commands (in order of precedence): 104 | - `cmd-buildextend-*`: `COSA_EXTEND_*` 105 | - `cmd-build`: `COSA_BUILD_*` 106 | - `cmd-.*`: `COSA_<CMD>_*` 107 | - `.*`: no interface 108 | -------------------------------------------------------------------------------- /cosa/20200805-meta-merge.md: -------------------------------------------------------------------------------- 1 | # Merging of build meta-data 2 | - Specification ID: cosa-20200805 3 | - Supersedes: None 4 | - Conflicts: None 5 | 6 | # Background 7 | 8 | One of the biggest blockers to performing parallel builds, distributed builds and/or re-starting failed builds is the inability to lock write access to `meta.json` in COSA. 9 | 10 | Under the following assumptions, COSA needs to support merging of `meta.json`: 11 | - Not all `cmd-buildextend-*` operations will happen on the same pod or Linux namespace 12 | - S3 is the authoritative store for builds. S3 is eventually consistent, meaning that the version read from S3 may be replaced after reading from S3. Implementing distributed locking on `meta.json` based on S3 is problematic. 13 | - For RHCOS we need the ability to do partial builds and then back-fill artifacts (e.g. only build qemu, but not cloud) and later build cloud. 14 | - Developers want to restart failed builds 15 | - Consumers of the artifacts from COSA want to reduce the time to build and test 16 | 17 | ## State of a build is a point-of-view statement 18 | 19 | To COSA a build is usually `ostree` + `qcow2`. The notion of a "failed" or "partial" build is a point of view. COSA does provision the build id, but COSA does not know what comprises the artifacts of the "build". Usually developers and consumers speak of a build as the base `ostree` and the _optional_ artifacts. 20 | 21 | In COSA a "build" only starts to exist when the `meta.json` is incepted after the `ostree` has been written.
The build directory may exist, but the `meta.json` that is produced immediately is valid for any consumer. In most cases a "failed" build means that an artifact in the `buildextend-*` commands has failed to publish (such as AWS), not that the artifact itself failed to build; the artifact was indeed built. 22 | 23 | Further, a "failed build" is also used to mean that some part of a pipeline failed to complete. COSA itself does not have any concept of dependencies, much less the idea that multiple COSA commands can be run. COSA only has implied dependencies that some artifacts require others. In other words, COSA doesn't know when the "build" is complete until after something else considers it complete. 24 | 25 | ## Multiple `meta.json` 26 | 27 | Under the axiom that the state of a build is usually determined from the point of view of a pipeline, and to support distributed and parallel builds, builds will be produced with multiple `meta.json` files that will be merged together after each artifact is built. 28 | 29 | ## `cosalib.meta` library 30 | 31 | `cmd-meta` currently uses the [`cosalib.meta`](https://github.com/coreos/coreos-assembler/blob/main/src/cosalib/meta.py) library for reading and writing `meta.json`. To support merging of `meta.json` files, the library will be updated to allow for merging in-memory updates with on-disk content. 32 | 33 | Only `meta.json` that is validated will be merged, under the following rules: 34 | - source and destination meta must match the buildid, name and ostree commit 35 | - on-disk `meta.json` takes precedence over the in-memory meta 36 | - all versions must be validated before and after merging, before committing to disk 37 | 38 | Prior to merging, `cosalib.meta` will lock the local file. 39 | 40 | ## Merge mode denoted in `meta.json` 41 | 42 | To support delayed merging of meta-data, `cmd-build` will set a key denoting the merge mode:
43 | 44 | ``` 45 | { 46 | "coreos-assembler.delayed-meta-merge": true 47 | } 48 | ``` 49 | 50 | `cmd-build` will gain a `--delay-meta-merge` flag to set the merge mode. The default is `false`. 51 | 52 | ## Delayed Merging of meta-data 53 | 54 | When `coreos-assembler.delayed-meta-merge` is set to `true`, `cmd-buildextend-*` commands will create a suffixed `meta.<suffix>.json` instead of writing directly to `meta.json`. A new command called `cmd-meta-merge` will be introduced that will read each `meta.<suffix>.json` and will ONLY combine the images and the cloud-build specific sections into `meta.json`. 55 | 56 | Orchestration layers will be responsible for shuffling artifacts around, e.g. `cmd-upload-build`. `cmd-upload-build --meta <path>` will only upload the artifacts described in that meta. 57 | 58 | All `meta.<suffix>.json` will be removed when their contents are merged into `meta.json`. If the artifacts are produced across compute domains, orchestration layers are responsible for calling `cmd-upload-build` to stage the artifacts. 59 | 60 | `cmd-meta-merge` will: 61 | - read in `meta.json` and validate it 62 | - iterate over `meta.<suffix>.json` in the path or in S3 63 | - validate both source and destination meta-data against the schema before and after merging 64 | - delete each `meta.<suffix>.json` after its contents are merged 65 | 66 | ## Merging logic 67 | 68 | By default, only keys not found in `meta.json` will be added from `meta.<suffix>.json`. To support rebuilds/restarts, a force option will exist in `cmd-meta-merge` with a default of `false`. 69 | 70 | Keys that will be merged: 71 | - fields from the images section 72 | - known cloud names such as AWS, Azure, GCP, etc. 73 | 74 | ## Default builds 75 | 76 | When `coreos-assembler.delayed-meta-merge` is false or absent, `cmd-buildextend-*` commands will produce a `meta.<suffix>.json` and immediately merge it into `meta.json` using `cmd-meta-merge`.
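The merging rules could look roughly like the following Python sketch (a hypothetical `merge_meta` helper, not the actual `cosalib.meta` code; the cloud key names and identity fields are illustrative):

```python
# Illustrative key sets; the real merge would use the schema's definitions.
MERGE_KEYS = ("images", "amis", "azure", "gcp")
IDENTITY_KEYS = ("buildid", "name", "ostree-commit")

def merge_meta(on_disk, fragment, force=False):
    """Merge a suffixed meta fragment into on-disk meta.json data.

    On-disk values take precedence unless force is set, and both
    sides must describe the same build.
    """
    for key in IDENTITY_KEYS:
        if on_disk.get(key) != fragment.get(key):
            raise ValueError(f"mismatched {key}")
    merged = dict(on_disk)
    for key in MERGE_KEYS:
        if key not in fragment:
            continue
        # Only add keys missing from meta.json, unless forced (rebuilds).
        if key not in merged or force:
            merged[key] = fragment[key]
    return merged
```

With `force` left off, re-running a merge is a no-op for existing keys, matching the on-disk-precedence rule above.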
To support Bash commands, `finalize-artifact` will use `cmd-meta-merge` to update `meta.json`. 77 | 78 | The immediate merging of metas requires: 79 | - time-synchronized clocks 80 | - shared disk for `builds/` 81 | 82 | ## Validation Requirement 83 | 84 | All merged `meta.json` must be validated against the schema. 85 | -------------------------------------------------------------------------------- /os/20201103-full-disk-raid.md: -------------------------------------------------------------------------------- 1 | # Full disk RAID support 2 | 3 | ## Theory of operation 4 | 5 | Users can write FCC sugar that enables redundant boot and/or an encrypted root volume. There are three cases: 6 | 7 | 1. The sugar only specifies encrypted root. FCCT desugars the section into a LUKS volume within the existing root partition, and a root filesystem layered on top. 8 | 2. The sugar only specifies redundant boot. FCCT desugars the section into an Ignition config which completely recreates the partition structure on each specified disk. The config creates partitions, RAID volumes, and filesystems for the root and boot partitions; partitions and independent filesystems for each ESP replica; and partitions for each BIOS-BOOT replica. 9 | 3. The sugar specifies both. FCCT desugars similarly to the second case, with the addition of a LUKS volume. 10 | 11 | If redundant boot is specified, Dracut glue detects the corresponding config stanzas, saves the corresponding on-disk data to RAM before the Ignition disks phase, and restores it after the disks phase completes. The glue also replicates the boot sector when replicating BIOS-BOOT. 12 | 13 | Each piece of the replication logic (root partition, boot partition, ESP, and BIOS-BOOT + MBR) functions independently. An advanced user can create nonstandard partition configurations (for example, RAID-1 boot with RAID-5 root) by manually configuring partitions in their FCC. 14 | 15 | Recovery from a disk failure is out of scope for this enhancement.
We will recommend that users recover from a hardware failure by reprovisioning the machine. 16 | 17 | ## FCC sugar 18 | 19 | ```yaml 20 | variant: fcos 21 | version: 1.3.0 22 | boot_device: 23 | # Allows choosing from multiple disk layout templates hardcoded in FCCT. 24 | # We might use this to support CPU architectures with different partition 25 | # requirements (e.g. ppc64le) or to change the disk layout in future OS 26 | # releases. 27 | # Optional, defaults to x86_64 28 | layout: x86_64 29 | # If specified, mirror every default partition 30 | mirror: 31 | devices: 32 | # Two or more devices are required 33 | - /dev/vda 34 | - /dev/vdb 35 | # If specified, encrypt the root partition 36 | luks: 37 | # The contents of the existing clevis struct, without the `custom` 38 | # field (which is too advanced for sugar). We don't need to support 39 | # static key files because they don't work for root. 40 | tang: 41 | - url: https://example.com/tang.tang.tang 42 | thumbprint: x 43 | tpm2: true 44 | threshold: 2 45 | ``` 46 | 47 | The `boot_device` struct is at the top level, rather than under `storage`, because CT sugar has historically been placed at the top level and because Go struct embedding semantics make it harder to put it under `storage`. 48 | 49 | The sugar does not support advanced features such as RAID spares, RAID levels other than 1, or filesystem/mount options. Users can manually specify individual partitions, RAIDs, and filesystems if desired. 
50 | 51 | ## Rendered Ignition config 52 | 53 | ### LUKS only 54 | 55 | ```json 56 | { 57 | "ignition": { 58 | "version": "3.2.0" 59 | }, 60 | "storage": { 61 | "filesystems": [ 62 | { 63 | "device": "/dev/disk/by-id/dm-name-root", 64 | "format": "xfs", 65 | "label": "root", 66 | "wipeFilesystem": true 67 | } 68 | ], 69 | "luks": [ 70 | { 71 | "clevis": { 72 | "tpm2": true 73 | }, 74 | "device": "/dev/disk/by-partlabel/root", 75 | "label": "luks-root", 76 | "name": "root", 77 | "wipeVolume": true 78 | } 79 | ] 80 | } 81 | } 82 | ``` 83 | 84 | ### RAID only 85 | 86 | ```json 87 | { 88 | "ignition": { 89 | "version": "3.2.0" 90 | }, 91 | "storage": { 92 | "disks": [ 93 | { 94 | "device": "/dev/vda", 95 | "partitions": [ 96 | { 97 | "label": "bios-1", 98 | "sizeMiB": 1, 99 | "typeGuid": "21686148-6449-6E6F-744E-656564454649" 100 | }, 101 | { 102 | "label": "esp-1", 103 | "sizeMiB": 127, 104 | "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B" 105 | }, 106 | { 107 | "label": "boot-1", 108 | "sizeMiB": 384 109 | }, 110 | { 111 | "label": "root-1" 112 | } 113 | ], 114 | "wipeTable": true 115 | }, 116 | { 117 | "device": "/dev/vdb", 118 | "partitions": [ 119 | { 120 | "label": "bios-2", 121 | "sizeMiB": 1, 122 | "typeGuid": "21686148-6449-6E6F-744E-656564454649" 123 | }, 124 | { 125 | "label": "esp-2", 126 | "sizeMiB": 127, 127 | "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B" 128 | }, 129 | { 130 | "label": "boot-2", 131 | "sizeMiB": 384 132 | }, 133 | { 134 | "label": "root-2" 135 | } 136 | ], 137 | "wipeTable": true 138 | } 139 | ], 140 | "filesystems": [ 141 | { 142 | "device": "/dev/disk/by-partlabel/esp-1", 143 | "format": "vfat", 144 | "label": "esp-1", 145 | "wipeFilesystem": true 146 | }, 147 | { 148 | "device": "/dev/disk/by-partlabel/esp-2", 149 | "format": "vfat", 150 | "label": "esp-2", 151 | "wipeFilesystem": true 152 | }, 153 | { 154 | "device": "/dev/md/md-boot", 155 | "format": "ext4", 156 | "label": "boot", 157 | "wipeFilesystem": true 158 | }, 159 | 
{ 160 | "device": "/dev/md/md-root", 161 | "format": "xfs", 162 | "label": "root", 163 | "wipeFilesystem": true 164 | } 165 | ], 166 | "raid": [ 167 | { 168 | "devices": [ 169 | "/dev/disk/by-partlabel/boot-1", 170 | "/dev/disk/by-partlabel/boot-2" 171 | ], 172 | "level": "raid1", 173 | "name": "md-boot", 174 | "options": [ 175 | "--metadata=1.0" 176 | ] 177 | }, 178 | { 179 | "devices": [ 180 | "/dev/disk/by-partlabel/root-1", 181 | "/dev/disk/by-partlabel/root-2" 182 | ], 183 | "level": "raid1", 184 | "name": "md-root" 185 | } 186 | ] 187 | } 188 | } 189 | ``` 190 | 191 | FCCT assigns a unique serial number to each disk and appends it to the label assigned to each disk partition. This ensures each partition can be referenced by a unique path in `/dev/disk/by-partlabel`. 192 | 193 | When desugaring, FCCT should be careful to allow explicit settings elsewhere in the config to override settings originating from sugar. This could be used e.g. to specify filesystem formatting options. 194 | 195 | FCCT also hardcodes the partition size for each partition, replicating the existing partition layout. If we change the default partition sizes in future, we can encode them in a new layout, and we can also change the default layout if we bump the FCC major version. 196 | 197 | The original boot disk is not treated specially; it's wiped and rewritten just like any other replica. 198 | 199 | Each of root, boot, ESP, and BIOS-BOOT is placed on a separate partition. MD-RAID volumes are partitionable, so in principle we could put root and boot on the same RAID volume. However, this is uncommon (neither Anaconda nor Ignition currently allow it) and it complicates GRUB access to the boot filesystem. 
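The per-disk desugaring described above, where FCCT suffixes each partition label with a per-disk serial number and hardcodes sizes from the layout, could be sketched as follows (illustrative Python; `desugar_disks` and the layout table are assumptions, not FCCT's actual implementation):

```python
# Hypothetical x86_64 layout table: (label prefix, size in MiB or None
# for "rest of disk", type GUID or None for the default).
LAYOUT_X86_64 = [
    ("bios", 1, "21686148-6449-6E6F-744E-656564454649"),
    ("esp", 127, "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"),
    ("boot", 384, None),
    ("root", None, None),
]

def desugar_disks(devices, layout=LAYOUT_X86_64):
    """Render one wiped disk stanza per mirror device, suffixing each
    partition label with the disk's serial number (1-based index)."""
    disks = []
    for serial, device in enumerate(devices, start=1):
        partitions = []
        for prefix, size_mib, type_guid in layout:
            part = {"label": f"{prefix}-{serial}"}
            if size_mib is not None:
                part["sizeMiB"] = size_mib
            if type_guid is not None:
                part["typeGuid"] = type_guid
            partitions.append(part)
        disks.append({"device": device, "partitions": partitions,
                      "wipeTable": True})
    return disks
```

Swapping in an alternate layout table is how the `layout` knob could cover architectures with different partition requirements.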
200 | 201 | ### LUKS and RAID 202 | 203 | ```json 204 | { 205 | "ignition": { 206 | "version": "3.2.0" 207 | }, 208 | "storage": { 209 | "disks": [ 210 | { 211 | "device": "/dev/vda", 212 | "partitions": [ 213 | { 214 | "label": "bios-1", 215 | "sizeMiB": 1, 216 | "typeGuid": "21686148-6449-6E6F-744E-656564454649" 217 | }, 218 | { 219 | "label": "esp-1", 220 | "sizeMiB": 127, 221 | "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B" 222 | }, 223 | { 224 | "label": "boot-1", 225 | "sizeMiB": 384 226 | }, 227 | { 228 | "label": "root-1" 229 | } 230 | ], 231 | "wipeTable": true 232 | }, 233 | { 234 | "device": "/dev/vdb", 235 | "partitions": [ 236 | { 237 | "label": "bios-2", 238 | "sizeMiB": 1, 239 | "typeGuid": "21686148-6449-6E6F-744E-656564454649" 240 | }, 241 | { 242 | "label": "esp-2", 243 | "sizeMiB": 127, 244 | "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B" 245 | }, 246 | { 247 | "label": "boot-2", 248 | "sizeMiB": 384 249 | }, 250 | { 251 | "label": "root-2" 252 | } 253 | ], 254 | "wipeTable": true 255 | } 256 | ], 257 | "filesystems": [ 258 | { 259 | "device": "/dev/disk/by-partlabel/esp-1", 260 | "format": "vfat", 261 | "label": "esp-1", 262 | "wipeFilesystem": true 263 | }, 264 | { 265 | "device": "/dev/disk/by-partlabel/esp-2", 266 | "format": "vfat", 267 | "label": "esp-2", 268 | "wipeFilesystem": true 269 | }, 270 | { 271 | "device": "/dev/md/md-boot", 272 | "format": "ext4", 273 | "label": "boot", 274 | "wipeFilesystem": true 275 | }, 276 | { 277 | "device": "/dev/disk/by-id/dm-name-root", 278 | "format": "xfs", 279 | "label": "root", 280 | "wipeFilesystem": true 281 | } 282 | ], 283 | "luks": [ 284 | { 285 | "clevis": { 286 | "tpm2": true 287 | }, 288 | "device": "/dev/md/md-root", 289 | "label": "luks-root", 290 | "name": "root", 291 | "wipeVolume": true 292 | } 293 | ], 294 | "raid": [ 295 | { 296 | "devices": [ 297 | "/dev/disk/by-partlabel/boot-1", 298 | "/dev/disk/by-partlabel/boot-2" 299 | ], 300 | "level": "raid1", 301 | "name": "md-boot", 
302 | "options": [ 303 | "--metadata=1.0" 304 | ] 305 | }, 306 | { 307 | "devices": [ 308 | "/dev/disk/by-partlabel/root-1", 309 | "/dev/disk/by-partlabel/root-2" 310 | ], 311 | "level": "raid1", 312 | "name": "md-root" 313 | } 314 | ] 315 | } 316 | } 317 | ``` 318 | 319 | ## Root filesystem 320 | 321 | The Dracut glue already saves and restores the root filesystem contents whenever the config specifies a filesystem with label `root` and `wipeFilesystem` set to `true`. 322 | 323 | ## Boot filesystem 324 | 325 | Similar to the root filesystem, the Dracut glue will save and restore the contents of `/boot` whenever the config specifies a filesystem labeled `boot` with `wipeFilesystem` set to `true`. 326 | 327 | GRUB includes a module for reading MD-RAID volumes, and we want to use it, since it will read from any available replica. However, the shipped boot sector is hardcoded to set `prefix` to the first boot disk, and we want to replicate the boot sector via a bit-for-bit copy (see below). For now, we'll create the RAID volume with metadata format 1.0, which puts the RAID superblock at the end of the partition, and allows GRUB to treat individual `/boot` replicas as independent filesystems. We'll modify the GRUB configs so UEFI boot always treats `/boot` as a RAID, and BIOS boot treats it as a RAID after the `normal` module and `grub.cfg` are loaded. That leaves a window during BIOS boots where boot will fail if the first enumerated device doesn't work. Once bootupd knows how to update BIOS GRUB, we can close the vulnerability window by having bootupd reinstall GRUB during the first boot (see [coreos/fedora-coreos-tracker#702](https://github.com/coreos/fedora-coreos-tracker/issues/702)). 328 | 329 | `/boot` replication prevents the use of the grubenv mechanism, as GRUB [disables grubenv support](https://www.gnu.org/software/grub/manual/grub/grub.html#Environment-block) when RAID is in use. 
330 | 331 | ## EFI System Partition 332 | 333 | The Dracut glue will save the contents of the ESP filesystem whenever the config specifies a partition with the ESP type GUID, and will restore those contents to _every_ partition with that type GUID. 334 | 335 | Thus the ESP won't be RAIDed; we'll create multiple independent filesystems with identical directory trees. Keeping each replica independent avoids RAID desynchronization if the firmware chooses to modify the ESP, and makes sense because CoreOS does not modify the ESP after installation. Since there will no longer be a single canonical ESP, we'll no longer automount `/boot/efi` in the running system (see [coreos/fedora-coreos-tracker#694](https://github.com/coreos/fedora-coreos-tracker/issues/694)). 336 | 337 | When updating the bootloader, bootupd will need to find each partition with an ESP type GUID and update it independently. This assumes that all ESPs on the system are controlled by CoreOS. 338 | 339 | Per the usual Ignition philosophy, the copy logic will fail if the source ESP is missing. If we decide to [drop the ESP in AWS images](https://github.com/coreos/fedora-coreos-config/pull/407) and want to support mirrored boot disks in AWS, we'll need to provide an alternative `layout` without an ESP. 340 | 341 | ## BIOS Boot partition 342 | 343 | The Dracut glue will save an image of the BIOS boot partition whenever the config specifies a partition with the BIOS boot type GUID, and will restore that image to _every_ such partition. In addition, the same logic will save the boot sector (the first 440 bytes of the disk) and restore it to every disk where we write a BIOS Boot partition. 344 | 345 | The BIOS Boot partition is not updated at runtime, so RAID is not necessary. When updating the bootloader, bootupd will need to find each BIOS Boot type GUID and update both the partition and the corresponding boot sector. This assumes that the system is not dual-booting and belongs entirely to CoreOS. 
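The boot sector save/restore described above can be sketched with plain `dd`. This is illustrative only: `disk1.img` and `disk2.img` stand in for the real member disks (e.g. `/dev/vda` and `/dev/vdb`), and the actual logic runs in the initrd:

```shell
# Create two stand-in "disks" (the real code operates on block devices).
dd if=/dev/zero of=disk1.img bs=1M count=1 status=none
dd if=/dev/zero of=disk2.img bs=1M count=1 status=none
printf 'pretend GRUB boot code' | dd of=disk1.img conv=notrunc status=none

# Save the first 440 bytes (the boot code area, excluding the partition table)...
dd if=disk1.img of=bootsector.img bs=440 count=1 status=none
# ...and restore them to every disk that received a BIOS Boot partition.
dd if=bootsector.img of=disk2.img conv=notrunc status=none

cmp -n 440 disk1.img disk2.img && echo "boot sectors match"
```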
346 | 347 | We're currently copying the BIOS Boot partition and boot sector bit-for-bit, rather than rerunning `grub-install` from the initrd or hand-modifying the boot sector. Since the boot sector hardcodes the offset of the BIOS Boot partition, the latter must be recreated at the same offset as the original partition. To avoid hardcoding an awkward constant in FCCT, we'll move the BIOS boot partition from its current offset of 512 MiB to the beginning of the disk (offset 1 MiB) in new images. The Dracut glue will detect relocation of the BIOS boot partition and fail the boot. 348 | 349 | There's no BIOS boot partition in 4Kn images. We could work around this, but to reduce the test matrix, we'll add an empty BIOS boot partition to the 4Kn image. 350 | 351 | ## PReP partition 352 | 353 | For ppc64le, we'll create independent replicas of the PReP partition, similar to the BIOS boot partition. 354 | 355 | ## Ignition changes 356 | 357 | The Dracut glue can detect the requisite Ignition config stanzas using `jq`, but that's brittle and we're already doing a lot of it in the initrd. Ideally Ignition would provide command-line arguments for executing relevant (hard-coded) queries against the cached Ignition config. For example: 358 | 359 | ```sh 360 | ignition -query wiped-filesystem=root 361 | ignition -query partition-type-guid=c12a7328-f81f-11d2-ba4b-00a0c93ec93b 362 | ``` 363 | 364 | We're currently planning to defer this change to future work. 365 | 366 | ## Architecture support 367 | 368 | ppc64le will need an alternate `layout` definition that includes the PReP partition. aarch64 will similarly need a `layout` without a BIOS Boot partition. 369 | 370 | For s390x, Ignition doesn't yet support DASD partitioning, but could work with FCP disks. In any event, we'd need provisions for rerunning zipl from the initramfs, if RAID support is even possible at all. 
Each DASD is connected via a particular bus identifier, which might need to be encoded into each respective copy of the bootloader. For now, this functionality is out of scope on s390x for 4.7. 371 | 372 | ## Configuring boot disk RAID in OCP 373 | 374 | For 4.7, we'll document a means for OCP users to use FCCT to render a RAID configuration for their Ignition pointer config. Better OCP UX for FCCs is left as future work. 375 | -------------------------------------------------------------------------------- /os/20211014-s390x-secure-execution.md: -------------------------------------------------------------------------------- 1 | # IBM Secure Execution for Linux 2 | --- 3 | 4 | # Overview 5 | 6 | [IBM Secure Execution for Linux®](https://www.ibm.com/docs/en/linux-on-systems?topic=virtualization-introducing-secure-execution-linux) is a hardware-based security technology that is 7 | built into the IBM z15™ and LinuxONE III generation systems. 8 | The goal of Secure Execution is to protect data of workloads that run in a KVM guest from being inspected or modified by the hypervisor. In particular, no hardware administrator, no KVM code, and no KVM administrator can access the data in a guest that was started as an IBM Secure Execution guest. 9 | 10 | IBM Secure Execution provides the following benefits: 11 | - Instead of relying on deterrence by using extensive audit tracks, IBM Secure Execution provides technology-enforced security rather than process or audit-based security. 12 | - It is possible to securely deploy workloads in the cloud. 13 | - As a cloud provider that uses IBM Secure Execution, you can attract sensitive workloads that, formerly, were restricted to the workload owner's system. 14 | - As a secure workload owner, you know that your workload is run in a secure manner, even outside your data center. 15 | - As a secure workload owner, you can choose where to run your workload, independently of the security level required. 
16 | 17 | --- 18 | 19 | # Theory of operation 20 | 21 | Secure Execution is designed to provide scalable isolation for individual workloads to help protect them from not 22 | only external attacks, but also insider threats. In traditional x86 ring architectures, 23 | the host can access the memory and data of guest applications freely, leading to the 24 | potential for malicious software to be proliferated throughout the entire system. 25 | Isolation between host and guest environments is necessary to help prevent system compromise. 26 | Users can enable the Secure Execution feature on an IBM zKVM host and its guest VMs to isolate them from each other. 27 | 28 | That means: 29 | 1. An encrypted Linux image called `sdboot` (kernel + initramfs + cmdline) is created using: 30 | - the universal host public key provided by IBM; this happens only during the CoreOS QEMU image build 31 | - the customer-specific host public key(s); this happens only at runtime 32 | 2. The kernel, initramfs, and cmdline cannot be modified without regenerating the `sdboot` image 33 | 3. The zKVM host cannot access the memory of a secured guest VM 34 | 4. Guest VMs cannot access the memory of other secured guest VMs 35 | 5. Any attempt to access the memory of a secured guest VM is blocked at the hardware level 36 | 37 | Since the encryption private keys are stored on the IBM Z hardware and firmware, the encrypted image can only be executed in a virtual machine on the host(s) it has been prepared for, and the image can’t be decrypted or tampered with outside of the designated host(s).
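For context, the `sdboot` image is produced with the `genprotimg` tool (see the Lifetime section). A sketch of the invocation, with hypothetical paths and key file name — check the `genprotimg` man page for the exact options:

```
genprotimg -i /path/to/vmlinuz -r /path/to/initramfs.img \
    -p /path/to/parmfile -k HKD-8561-0000001.crt -o sdboot
```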
38 | 39 | ## Specific attacks being addressed 40 | - protect guests against the system 41 | - protect guests against a Linux admin (on the host) 42 | - protect the data that services use within a secured guest from the host admin and/or other guests 43 | 44 | ## Specific attacks not being addressed 45 | - any attack over the Internet on service(s) running within the guest 46 | - use of a fake Ignition config for guest provisioning 47 | 48 | > **Note:** 49 | > The `hostkey` is generated by IBM and stored in hardware; the key is unique to each IBM Z mainframe. 50 | > An image prepared with a key (or set of keys) can be used only on a machine with the matching key. 51 | 52 | > **Note:** 53 | > Only production hostkey(s) can be verified (i.e., that a host key document is genuine) using the [check_hostkeydoc](https://raw.githubusercontent.com/ibm-s390-tools/s390-tools/master/genprotimg/samples/check_hostkeydoc) tool. In a later version of `genprotimg` this check will 54 | > be performed by the tooling itself. 55 | 56 | # High-level overview 57 | 58 | The goal of Secure Execution is to provide customers with the highest level of isolation and security for each OCP node. 59 | To achieve that, a new type of s390x artifact will be available to download and use - a Secure Execution-ready qcow2 image: 60 | - The image comes with `sdboot` generated using the universal key from IBM. The key itself is not public and is used by RH/IBM during the RHCOS build 61 | - The image cannot be modified by anyone except `ignition` during its lifetime 62 | - It is not possible to inspect or modify (using `guestfish` or other tools) the contents of the image's `/` and `/boot` filesystems 63 | - It is only possible to run this image on a zKVM host with Secure Execution support enabled 64 | - It is not possible to disable Secure Execution once it has been enabled 65 | 66 | To use Secure Execution, the customer must have hostkeys for the host machine(s) and specify them in the `ignition` config.
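The host key document verification mentioned in the note above can be sketched as follows. The file names are hypothetical, and the signing/CA certificate must be obtained from IBM:

```
./check_hostkeydoc HKD-8561-0000001.crt ibm-z-host-key-signing.crt
```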
67 | 68 | ## Prerequisites 69 | 70 | Secure Execution is supported only starting with z15 and only for zKVM guests. 71 | In order to use it, the zKVM host's kernel cmdline should have the `prot_virt=1 swiotlb=262144` options. 72 | 73 | KVM guests also require some additional options: 74 | - the disk's driver should have the `iommu=on` option: 75 | ```xml 76 | <disk type='file' device='disk'> 77 | <driver name='qemu' type='qcow2' iommu='on'/> 78 | <source file='/path/to/image.qcow2'/> 79 | <target dev='vda' bus='virtio'/> 80 | </disk> ``` 81 | - network interfaces should have the `iommu=on` option: 82 | ```xml 83 | <interface type='network'> 84 | <source network='default'/> 85 | <model type='virtio'/> 86 | <driver name='vhost' iommu='on'/> 87 | </interface> ``` 88 | - `memballoon` should be disabled: 89 | ```xml 90 | <memballoon model='none'/> 91 | ``` 92 | 93 | Or the same options for command-line execution: 94 | ``` 95 | qemu-system-s390x \ 96 | -machine s390-ccw-virtio,accel=kvm \ 97 | -m 4096 -smp 2 \ 98 | -nographic \ 99 | -drive if=none,id=hda,file=/path/to/image.qcow2,auto-read-only=off,cache=unsafe -device virtio-blk,iommu_platform=on,drive=hda \ 100 | -netdev user,id=eth,hostfwd=tcp::2222-:22 -device virtio-net-ccw,iommu_platform=on,netdev=eth 101 | ``` 102 | 103 | # Detailed overview 104 | 105 | To achieve the goals listed above, not only is a special qcow2 image generated, but some additional steps are also required during the system's lifetime. 106 | 107 | ## Build 108 | To build a Secure Execution-ready image, a new command is added to `coreos-assembler`: 109 | ``` 110 | cosa buildextend-secex --hostkey UNIVERSAL_KEY.crt 111 | ``` 112 | It is similar to `cosa buildextend-qemu` and also generates a qcow2 image. 113 | 114 | The qcow2 image comes with `sdboot` installed into its own partition, and a dm-verity setup for the boot and root filesystems, whose hashes are part of the kernel's cmdline.
115 | Here are the partitions of the Secure Execution qcow2 image provided by RH/IBM: 116 | 117 | ``` 118 | NAME PARTLABEL TYPE MOUNTPOINT 119 | vda disk / 120 | vdb disk 121 | |-vdb1 se part 122 | |-vdb3 boot part 123 | | `-boot crypt /tmp/rootfs/boot 124 | |-vdb4 root part 125 | | `-root crypt /tmp/rootfs 126 | |-vdb5 boothash part 127 | | `-boot crypt /tmp/rootfs/boot 128 | `-vdb6 roothash part 129 | `-root crypt /tmp/rootfs 130 | 131 | ``` 132 | 133 | - `se` - a new `ext4` partition where we store the encrypted `sdboot` image 134 | - `boot` - the boot partition, identical to the one on the regular qcow2 image 135 | - `root` - the rootfs partition, identical to the one on the regular qcow2 image 136 | - `boothash` - partition for setting up `dm-verity` on `boot` 137 | - `roothash` - partition for setting up `dm-verity` on `root` 138 | 139 | `dm-verity` is used to ensure that the pristine qcow2 image wasn't modified after build. 140 | 141 | > **Note:** 142 | > Further partitioning of the boot disk beyond these partitions is not supported, but the 143 | rootfs can be grown. That means that, when required, the user should enlarge the `secex-qemu.qcow2` image before `firstboot`: 144 | ``` 145 | qemu-img resize /path/to/img.qcow2 +50G 146 | ``` 147 | 148 | ## Firstboot 149 | 150 | During guest provisioning, new hostkey(s) are fetched and used to generate `sdboot`. The user should specify them within the Ignition config using the predefined name template `/etc/se-hostkeys/ibm-z-hostkey-N`, where `N` is a key number; several hostkeys can be used: 151 | 152 | ```yaml 153 | storage: 154 | files: 155 | - 156 | path: /etc/se-hostkeys/ibm-z-hostkey-1 157 | overwrite: true 158 | contents: 159 | source: https://url.com/key1 160 | ``` 161 | 162 | The system also sets new random LUKS keys for both the `boot` and `root` filesystems, so each node has its own LUKS keys.
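The per-node re-keying can be illustrated as follows. This is a sketch only: the real first-boot logic lives in the initramfs and feeds the key straight to `cryptsetup`, and the file name here is purely illustrative:

```shell
# Generate an independent random key for this node; a build-time key is
# never reused, so every node ends up with unique LUKS keys.
umask 077
head -c 64 /dev/urandom | base64 -w 0 > luks-root.key
# cryptsetup luksFormat / luksAddKey would then consume this key file
# (not run here, since it requires a real block device).
```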
163 | 164 | ## Lifetime 165 | 166 | Secure Execution is enabled by two tools from IBM's s390-tools package: 167 | - `genprotimg` - tool to generate `sdboot` 168 | - `zipl` - tool to install the new bootloader 169 | 170 | Although `zipl` is well known and already used by CoreOS, it cannot be executed the same way as before. When the system runs in Secure Execution mode, `ostree` and `rdcore` automatically detect that and perform all required steps. 171 | That means the user won't notice any difference: both mentioned components automatically regenerate the `sdboot` image and 172 | install it with `zipl` whenever required. 173 | 174 | # Implementation steps 175 | 176 | ## Step 1: CoreOS with existing patches to `libostree` can be configured to run in Secure Execution mode. 177 | 178 | The customer's public hostkey(s) should be placed into `/etc/se-hostkeys/` using the name template `ibm-z-hostkey-N`. Secure Execution is enabled manually, following https://www.ibm.com/docs/en/linux-on-systems?topic=linux-introduction. 179 | 180 | ## Step 2: New `qemu-secex.qcow2` build artifact for CoreOS with Secure Execution enabled. 181 | 182 | A `coreos-xyz-qemu-secex.qcow2` image with a `dm-verity` setup for `/boot` and the rootfs is available for download. This is done by using the new `buildextend-secex` `coreos-assembler` command, which: 183 | - adds a new `/se` partition for `sdboot` 184 | - adds `roothash` and `boothash` partitions for the [dm-verity](https://docs.kernel.org/admin-guide/device-mapper/verity.html) setup 185 | - calculates hashes for `/boot` and `/` and appends them to the kernel's cmdline 186 | - generates `/se/sdboot` encrypted with the universal public hostkey (valid for all z15 and newer machines) 187 | - runs `zipl` against `/se/sdboot`, so the `qemu-secex.qcow2` image can only be run with Secure Execution 188 | 189 | The customer should add their public hostkey(s) to the `ignition` config.
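The dm-verity part of `buildextend-secex` can be sketched with `veritysetup`. This is illustrative only; the partition names follow the layout shown earlier, and exactly how the hash is encoded on the cmdline is an assumption of this sketch:

```
# Write hash data for the root filesystem to its hash partition and
# capture the root hash that later lands on the kernel cmdline.
veritysetup format /dev/vdb4 /dev/vdb6 | awk '/Root hash/ {print $3}'
```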
190 | 191 | ## Step 3: During firstboot `/boot` and `rootfs` are encrypted with randomly generated LUKS keys; the `integrity` option is enabled 192 | 193 | During firstboot, `/boot` and `/` are LUKS-encrypted with randomly generated LUKS keys. This ensures that each CoreOS instance has its own LUKS keys, and also makes it impossible to read the contents of the qcow2 image just by inspecting it (e.g. with `guestfish`). Any modification of the qcow2 image will be detected at runtime by [dm-integrity](https://docs.kernel.org/admin-guide/device-mapper/dm-integrity.html). 194 | 195 | 196 | After provisioning, the new disk looks like this: 197 | ``` 198 | NAME PARTLABEL TYPE MOUNTPOINT 199 | vdb disk 200 | |-vdb1 se part 201 | |-vdb3 boot part 202 | | `-boot_dif crypt 203 | | `-boot crypt /boot 204 | `-vdb4 root part 205 | `-root_dif crypt 206 | `-root crypt /sysroot 207 | 208 | ``` 209 | 210 | - `se` - partition for the newly generated `sdboot` encrypted image 211 | - `boot` - LUKS-encrypted boot partition 212 | - `root` - LUKS-encrypted root partition 213 | - `boot_dif` - `dm-integrity` for `/boot` 214 | - `root_dif` - `dm-integrity` for the root filesystem 215 | 216 | ## Step 4: Ignition protection 217 | 218 | The goal is to prevent reading (to protect any secret stored in the Ignition config) and modification of the config. 219 | 220 | To achieve this, we: 221 | - Generate a pair of GPG keys at cosa compose time: 222 | - The private key is part of the `sdboot` image 223 | - The public key is part of the build metadata 224 | - Disable Ignition logging to the console (e.g.
serial and VGA) 225 | - Add a hook to the emergency shell to remove the private key and Ignition config before entering the shell 226 | - Disable local login 227 | 228 | The user has to fetch the public GPG key for the `secex-qemu.qcow2` image and encrypt their Ignition config with the key: 229 | ``` 230 | gpg --recipient-file /path/to/ignition.gpg.pub --yes --output /path/to/config.ign.gpg --armor --encrypt /path/to/config.ign 231 | ``` 232 | And then, instead of `serial=ignition`, the user uses `serial=ignition_crypted`: 233 | ``` 234 | -drive if=none,id=ignition,format=raw,file=/path/to/config.ign.gpg,readonly=on 235 | -device virtio-blk,serial=ignition_crypted,iommu_platform=on,drive=ignition 236 | ``` 237 | -------------------------------------------------------------------------------- /os/20231024-s390x-zvm-secure-ipl.md: -------------------------------------------------------------------------------- 1 | # z/VM Guest Secure IPL (Secure Boot-like protection) 2 | --- 3 | 4 | # [Overview](https://www.ibm.com/docs/en/zvm/7.3?topic=ipl-guest-secure) 5 | 6 | z/VM® supports guest secure IPL. Guest secure IPL supports the NIAP (National Information Assurance Partnership) operating system protection profile, which supports the Common Criteria certification. 7 | 8 | A z/VM user can request that the machine loader validate the signed IPL code by using the security keys that were previously loaded by the customer into the HMC certificate store. The validation ensures that the IPL code is intact, unaltered, and originates from a trusted build-time source. 9 | 10 | This support provides the ability for a Linux guest to exploit hardware to validate the code being booted, helping to ensure it is signed by the client or its supplier. 11 | 12 | Support is provided for the following device types: 13 | - SCSI devices. 14 | - ECKD devices.
15 | 16 | --- 17 | 18 | # Enabling Secure IPL at installation time (day-1) 19 | 20 | Assuming z/VM is ready for secure boot, we can set up LOADDEV at installation time. 21 | 22 | ## coreos-installer 23 | 24 | 1) Add a new `coreos.inst.secure_ipl` karg and a `--secure-ipl` option; `coreos-installer-generator` appends the switch when the karg is provided 25 | 2) During installation, we check for `--secure-ipl` and use the `vmcp` tool to set LOADDEV 26 | 3) Add a new systemd unit `coreos-installer-reboot-loaddev.service` to restart from LOADDEV (which immediately terminates the running CoreOS VM) 27 | 28 | # Enabling Secure IPL on an existing system (day-2) 29 | 30 | This is for informational purposes only. We expect users to only turn on Guest Secure IPL at installation time. 31 | 32 | 1) Log in to Red Hat CoreOS and verify the output: 33 | ``` 34 | $ cat /sys/firmware/ipl/has_secure 35 | 1 36 | ``` 37 | 2) Set the target disk as the z/VM LOADDEV 38 | - ECKD 39 | 40 | Assuming RHCOS is installed on DASD disk `0.0.5223`, from the z/VM terminal execute: 41 | ``` 42 | # cp set loaddev eckd dev 5223 secure 43 | ``` 44 | - SCSI 45 | 46 | Assuming RHCOS is installed on FCP disk `0.0.8007,0x500507630400d1e3,0x4001404c00000000`, from z/VM execute: 47 | ``` 48 | # cp set loaddev dev 8007 portname 50050763 0400d1e3 lun 4001404c 00000000 secure 49 | ``` 50 | 3) Update the bootloader: 51 | ``` 52 | $ sudo unshare --mount bash -c 'mount -o remount,rw /boot && zipl -V --secure 1' 53 | ``` 54 | 4) Power off RHCOS and start it from the z/VM terminal: 55 | ``` 56 | # cp ipl loaddev 57 | ``` 58 | 59 | # Ensure the system runs with Secure IPL 60 | 61 | The preferred way to check whether Secure IPL is in use: 62 | ``` 63 | $ cat /sys/firmware/ipl/secure 64 | 1 65 | ``` 66 | However, this functionality may currently be broken on some kernel versions.
67 | 68 | Another way to check is: 69 | ``` 70 | $ dmesg | grep "Secure-IPL enabled" 71 | [ 0.029829] setup: Linux is running with Secure-IPL enabled 72 | ``` 73 | -------------------------------------------------------------------------------- /os/coreos-layering.md: -------------------------------------------------------------------------------- 1 | # CoreOS Layering 2 | 3 | This enhancement proposes: 4 | 5 | - A fundamental new addition to ostree/rpm-ostree: support for directly pulling and updating the OS from container images (while keeping all existing functionality, per-node package layering, and our existing support for pulling via ostree on the wire). 6 | - Documentation for generating derived (layered) images from the pristine CoreOS base image. 7 | - Support for switching an FCOS system to use a custom image on first boot via Ignition. 8 | - Zincati will continue to perform upgrades by inspecting the upgrade graph from the base image. 9 | 10 | This builds on [Fedora Changes/OstreeNativeContainer](https://fedoraproject.org/wiki/Changes/OstreeNativeContainer). 11 | 12 | # Existing work 13 | 14 | Originally, https://github.com/coreos/fedora-coreos-tracker/issues/812 tracked native support for "encapsulating" ostree commits in containers. 15 | 16 | Then it was realized that, when shipping the OS as a container image, it feels natural to support users deriving from it. The bulk of this is really "ostree native container" integration glue, and is happening in https://github.com/ostreedev/ostree-rs-ext 17 | 18 | rpm-ostree vendors the ostree-rs-ext code and will also be extended to support the same interfaces as implemented by the "base" ostree-rs-ext code.
19 | 20 | Specifically as of today, this functionality is exposed in: 21 | 22 | - `rpm-ostree ex-container` 23 | - `rpm-ostree rebase --experimental $containerref` 24 | 25 | # Rationale 26 | 27 | Since the creation of Container Linux (CoreOS) as well as Atomic Host, and continuing into the Fedora/RHEL CoreOS days, we have faced a constant tension around what we ship in the host system. 28 | 29 | [This issue](https://github.com/coreos/fedora-coreos-tracker/issues/401) encapsulates much prior discussion. 30 | 31 | For users who are happy with Fedora CoreOS today, not much will change. 32 | 33 | For those who e.g. want to install custom agents or nontrivial amounts of code (such as kubelet), this "CoreOS layering" will be a powerful new mechanism to ship the code they need. 34 | 35 | # Example via Dockerfile 36 | 37 | One goal the current CoreOS team has in this is to still preserve the distinction we have between "base image" and user content. 38 | This argues for a declarative input that only allows controlled mutation. See the next section about this. 39 | 40 | However, `Dockerfile` is the lowest common denominator in the container ecosystem. 41 | To truly illustrate the goal of supporting arbitrary inputs, we must support it. 42 | 43 | [fcos-derivation-example](https://github.com/cgwalters/fcos-derivation-example) contains an example `Dockerfile` that builds a Go binary and injects it along with a corresponding systemd unit as a layer, building on top of Fedora CoreOS. 44 | 45 | For ease of reference, a copy of the above is inline here: 46 | 47 | ```dockerfile 48 | # Build a small Go program using a builder image 49 | FROM registry.access.redhat.com/ubi8/ubi:latest as builder 50 | WORKDIR /build 51 | COPY . . 52 | RUN yum -y install go-toolset 53 | RUN go build hello-world.go 54 | 55 | # In the future, this would be e.g. 
quay.io/coreos/fedora:stable 56 | FROM quay.io/cgwalters/fcos 57 | # Inject it into Fedora CoreOS 58 | COPY --from=builder /build/hello-world /usr/bin 59 | # And add our unit file 60 | ADD hello-world.service /etc/systemd/system/hello-world.service 61 | # Also add strace; we don't yet support `yum install` but we can 62 | # with some work in rpm-ostree! 63 | RUN rpm -Uvh https://kojipkgs.fedoraproject.org//packages/strace/5.14/1.fc34/x86_64/strace-5.14-1.fc34.x86_64.rpm 64 | ``` 65 | 66 | # Controlled mutation 67 | 68 | One of the primary advantages of the `Dockerfile` layering approach is that it allows direct filesystem modifications. However, we should distinguish between layering things (e.g. `/etc` files, or a third-party daemon) and modifying base content (e.g. a fast-track kernel hotfix), since the latter has a higher likelihood of invalidating our CI process. In a cluster context, for example, it's possible that a cluster admin may want to permit only certain modifications by users. 69 | 70 | It is expected that control mechanisms will be integrated, though it's still not clear how that will look. It may be inside rpm-ostree (e.g. requiring override switches when first rebasing to the pullspec, and/or requiring a specific label on the image), or as part of the image build process itself (e.g. as part of [finalization](https://github.com/ostreedev/ostree-rs-ext/issues/159)). Of course, higher-level interfaces may enforce even stricter guidelines or only accept easily verifiable configs such as Butane/Ignition (see Butane example below). 71 | 72 | Ideally, it shouldn't be difficult for an FCOS/RHCOS user to query the kinds of mutations inside a container image, and this could then be displayed in a succinct way as part of `rpm-ostree status` when rebased onto it. 73 | # Derivation versus Ignition/Butane 74 | 75 | This proposal does not replace Ignition. Ignition will still play at least two key roles: 76 | 77 | - Setting up partitions and storage, e.g.
LUKS is something configured via Ignition provided on boot. 78 | - Machine/node specific configuration, in particular bootstrap configuration: e.g. static IP addresses that are necessary to fetch container images at all. 79 | 80 | # Butane as a declarative input format for layering 81 | 82 | We must support `Dockerfile`, because it's the lowest common denominator for the container ecosystem, and is accepted as input for many tools. 83 | 84 | However, one does not have to use `Dockerfile` to make containers. Specifically, what would make a lot of sense for Fedora CoreOS is to focus 85 | on Butane as a standard declarative interface to this process. 86 | 87 | This could run as a "builder" container, something like this: 88 | 89 | ``` 90 | FROM quay.io/coreos/butane:release as builder 91 | COPY . /build 92 | RUN butane -o /build/ignition.json /build/config.bu 93 | 94 | FROM quay.io/fedora/coreos:stable 95 | COPY --from=builder /build/ignition.json /tmp/ 96 | RUN ignition --write-filesystem /tmp/ignition.json && rm -f /tmp/ignition.json 97 | ``` 98 | 99 | Another option is to support being run nested inside an existing container tool, similar to 100 | [kaniko](https://github.com/GoogleContainerTools/kaniko). Then no 101 | `Dockerfile` would be needed. 102 | 103 | [More information on nesting container builds](https://www.redhat.com/sysadmin/podman-inside-kubernetes). 104 | 105 | # Use of CoreOS disk/boot images 106 | 107 | More explicitly, it's expected that many if not most users would continue to use the official Fedora CoreOS "boot images" (e.g. ISO, AMI, qcow2, etc.). This proposal does *not* currently call for exposing a way for a user to create the boot image shell around their custom container, although that is an obvious potential next step. 108 | 109 | Hence, a user wanting to use a custom base image would provide machines with an Ignition config that performs e.g. `rpm-ostree rebase ostree-remote-image:quay.io/examplecorp/baseos:latest` as a systemd unit.
It is likely that we would provide this via [Butane](https://github.com/coreos/butane) as well; for example: 110 | 111 | ``` 112 | variant: fcos 113 | version: 1.5.0 114 | ostree_container: 115 | image: quay.io/mycorp/myfcos:stable 116 | reboot: true 117 | ``` 118 | -------------------------------------------------------------------------------- /template.md: -------------------------------------------------------------------------------- 1 | - Feature Name: (fill me in with a unique ident, `my_awesome_feature`) 2 | 3 | # Summary 4 | [summary]: #summary 5 | 6 | One paragraph explanation of the proposal. 7 | 8 | # Motivation 9 | [motivation]: #motivation 10 | 11 | Why are we doing this? What use cases does it support? What is the expected outcome? 12 | 13 | # Explanation and Examples 14 | [explanation-and-examples]: #explanation-and-examples 15 | 16 | Explain the proposal as if it was already accepted and you were teaching it to another person. That generally means: 17 | 18 | - Introducing new named concepts. 19 | - Explaining the feature largely in terms of examples. 20 | - If applicable, provide sample error messages, deprecation warnings, or migration guidance. 21 | - If applicable, explain how to use this enhancement. 22 | 23 | # Drawbacks 24 | [drawbacks]: #drawbacks 25 | 26 | Why should we *not* do this? 27 | 28 | # Rationale and alternatives 29 | [rationale-and-alternatives]: #rationale-and-alternatives 30 | 31 | - Why is this design the best in the space of possible designs? 32 | - What other designs have been considered and what is the rationale for not choosing them? 33 | - What is the impact of not doing this? 34 | 35 | --------------------------------------------------------------------------------