├── LICENSE
├── README.md
├── cosa
    ├── 20200804-cfg-spec.md
    └── 20200805-meta-merge.md
├── os
    ├── 20201103-full-disk-raid.md
    ├── 20211014-s390x-secure-execution.md
    ├── 20231024-s390x-zvm-secure-ipl.md
    └── coreos-layering.md
└── template.md


/LICENSE:
--------------------------------------------------------------------------------
  1 |                                  Apache License
  2 |                            Version 2.0, January 2004
  3 |                         http://www.apache.org/licenses/
  4 | 
  5 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 |    1. Definitions.
  8 | 
  9 |       "License" shall mean the terms and conditions for use, reproduction,
 10 |       and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |       "Licensor" shall mean the copyright owner or entity authorized by
 13 |       the copyright owner that is granting the License.
 14 | 
 15 |       "Legal Entity" shall mean the union of the acting entity and all
 16 |       other entities that control, are controlled by, or are under common
 17 |       control with that entity. For the purposes of this definition,
 18 |       "control" means (i) the power, direct or indirect, to cause the
 19 |       direction or management of such entity, whether by contract or
 20 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |       outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |       "You" (or "Your") shall mean an individual or Legal Entity
 24 |       exercising permissions granted by this License.
 25 | 
 26 |       "Source" form shall mean the preferred form for making modifications,
 27 |       including but not limited to software source code, documentation
 28 |       source, and configuration files.
 29 | 
 30 |       "Object" form shall mean any form resulting from mechanical
 31 |       transformation or translation of a Source form, including but
 32 |       not limited to compiled object code, generated documentation,
 33 |       and conversions to other media types.
 34 | 
 35 |       "Work" shall mean the work of authorship, whether in Source or
 36 |       Object form, made available under the License, as indicated by a
 37 |       copyright notice that is included in or attached to the work
 38 |       (an example is provided in the Appendix below).
 39 | 
 40 |       "Derivative Works" shall mean any work, whether in Source or Object
 41 |       form, that is based on (or derived from) the Work and for which the
 42 |       editorial revisions, annotations, elaborations, or other modifications
 43 |       represent, as a whole, an original work of authorship. For the purposes
 44 |       of this License, Derivative Works shall not include works that remain
 45 |       separable from, or merely link (or bind by name) to the interfaces of,
 46 |       the Work and Derivative Works thereof.
 47 | 
 48 |       "Contribution" shall mean any work of authorship, including
 49 |       the original version of the Work and any modifications or additions
 50 |       to that Work or Derivative Works thereof, that is intentionally
 51 |       submitted to Licensor for inclusion in the Work by the copyright owner
 52 |       or by an individual or Legal Entity authorized to submit on behalf of
 53 |       the copyright owner. For the purposes of this definition, "submitted"
 54 |       means any form of electronic, verbal, or written communication sent
 55 |       to the Licensor or its representatives, including but not limited to
 56 |       communication on electronic mailing lists, source code control systems,
 57 |       and issue tracking systems that are managed by, or on behalf of, the
 58 |       Licensor for the purpose of discussing and improving the Work, but
 59 |       excluding communication that is conspicuously marked or otherwise
 60 |       designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |       on behalf of whom a Contribution has been received by Licensor and
 64 |       subsequently incorporated within the Work.
 65 | 
 66 |    2. Grant of Copyright License. Subject to the terms and conditions of
 67 |       this License, each Contributor hereby grants to You a perpetual,
 68 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |       copyright license to reproduce, prepare Derivative Works of,
 70 |       publicly display, publicly perform, sublicense, and distribute the
 71 |       Work and such Derivative Works in Source or Object form.
 72 | 
 73 |    3. Grant of Patent License. Subject to the terms and conditions of
 74 |       this License, each Contributor hereby grants to You a perpetual,
 75 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |       (except as stated in this section) patent license to make, have made,
 77 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |       where such license applies only to those patent claims licensable
 79 |       by such Contributor that are necessarily infringed by their
 80 |       Contribution(s) alone or by combination of their Contribution(s)
 81 |       with the Work to which such Contribution(s) was submitted. If You
 82 |       institute patent litigation against any entity (including a
 83 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |       or a Contribution incorporated within the Work constitutes direct
 85 |       or contributory patent infringement, then any patent licenses
 86 |       granted to You under this License for that Work shall terminate
 87 |       as of the date such litigation is filed.
 88 | 
 89 |    4. Redistribution. You may reproduce and distribute copies of the
 90 |       Work or Derivative Works thereof in any medium, with or without
 91 |       modifications, and in Source or Object form, provided that You
 92 |       meet the following conditions:
 93 | 
 94 |       (a) You must give any other recipients of the Work or
 95 |           Derivative Works a copy of this License; and
 96 | 
 97 |       (b) You must cause any modified files to carry prominent notices
 98 |           stating that You changed the files; and
 99 | 
100 |       (c) You must retain, in the Source form of any Derivative Works
101 |           that You distribute, all copyright, patent, trademark, and
102 |           attribution notices from the Source form of the Work,
103 |           excluding those notices that do not pertain to any part of
104 |           the Derivative Works; and
105 | 
106 |       (d) If the Work includes a "NOTICE" text file as part of its
107 |           distribution, then any Derivative Works that You distribute must
108 |           include a readable copy of the attribution notices contained
109 |           within such NOTICE file, excluding those notices that do not
110 |           pertain to any part of the Derivative Works, in at least one
111 |           of the following places: within a NOTICE text file distributed
112 |           as part of the Derivative Works; within the Source form or
113 |           documentation, if provided along with the Derivative Works; or,
114 |           within a display generated by the Derivative Works, if and
115 |           wherever such third-party notices normally appear. The contents
116 |           of the NOTICE file are for informational purposes only and
117 |           do not modify the License. You may add Your own attribution
118 |           notices within Derivative Works that You distribute, alongside
119 |           or as an addendum to the NOTICE text from the Work, provided
120 |           that such additional attribution notices cannot be construed
121 |           as modifying the License.
122 | 
123 |       You may add Your own copyright statement to Your modifications and
124 |       may provide additional or different license terms and conditions
125 |       for use, reproduction, or distribution of Your modifications, or
126 |       for any such Derivative Works as a whole, provided Your use,
127 |       reproduction, and distribution of the Work otherwise complies with
128 |       the conditions stated in this License.
129 | 
130 |    5. Submission of Contributions. Unless You explicitly state otherwise,
131 |       any Contribution intentionally submitted for inclusion in the Work
132 |       by You to the Licensor shall be under the terms and conditions of
133 |       this License, without any additional terms or conditions.
134 |       Notwithstanding the above, nothing herein shall supersede or modify
135 |       the terms of any separate license agreement you may have executed
136 |       with Licensor regarding such Contributions.
137 | 
138 |    6. Trademarks. This License does not grant permission to use the trade
139 |       names, trademarks, service marks, or product names of the Licensor,
140 |       except as required for reasonable and customary use in describing the
141 |       origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 |    7. Disclaimer of Warranty. Unless required by applicable law or
144 |       agreed to in writing, Licensor provides the Work (and each
145 |       Contributor provides its Contributions) on an "AS IS" BASIS,
146 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |       implied, including, without limitation, any warranties or conditions
148 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |       PARTICULAR PURPOSE. You are solely responsible for determining the
150 |       appropriateness of using or redistributing the Work and assume any
151 |       risks associated with Your exercise of permissions under this License.
152 | 
153 |    8. Limitation of Liability. In no event and under no legal theory,
154 |       whether in tort (including negligence), contract, or otherwise,
155 |       unless required by applicable law (such as deliberate and grossly
156 |       negligent acts) or agreed to in writing, shall any Contributor be
157 |       liable to You for damages, including any direct, indirect, special,
158 |       incidental, or consequential damages of any character arising as a
159 |       result of this License or out of the use or inability to use the
160 |       Work (including but not limited to damages for loss of goodwill,
161 |       work stoppage, computer failure or malfunction, or any and all
162 |       other commercial damages or losses), even if such Contributor
163 |       has been advised of the possibility of such damages.
164 | 
165 |    9. Accepting Warranty or Additional Liability. While redistributing
166 |       the Work or Derivative Works thereof, You may choose to offer,
167 |       and charge a fee for, acceptance of support, warranty, indemnity,
168 |       or other liability obligations and/or rights consistent with this
169 |       License. However, in accepting such obligations, You may act only
170 |       on Your own behalf and on Your sole responsibility, not on behalf
171 |       of any other Contributor, and only if You agree to indemnify,
172 |       defend, and hold each Contributor harmless for any liability
173 |       incurred by, or claims asserted against, such Contributor by reason
174 |       of your accepting any such warranty or additional liability.
175 | 
176 |    END OF TERMS AND CONDITIONS
177 | 
178 |    APPENDIX: How to apply the Apache License to your work.
179 | 
180 |       To apply the Apache License to your work, attach the following
181 |       boilerplate notice, with the fields enclosed by brackets "[]"
182 |       replaced with your own identifying information. (Don't include
183 |       the brackets!)  The text should be enclosed in the appropriate
184 |       comment syntax for the file format. We also recommend that a
185 |       file or class name and description of purpose be included on the
186 |       same "printed page" as the copyright notice for easier
187 |       identification within third-party archives.
188 | 
189 |    Copyright [yyyy] [name of copyright owner]
190 | 
191 |    Licensed under the Apache License, Version 2.0 (the "License");
192 |    you may not use this file except in compliance with the License.
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.
202 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Enhancements Tracking for CoreOS-based Systems
 2 | 
 3 | Inspired by [openshift/enhancements](https://github.com/openshift/enhancements/)
 4 | and [rust-lang/rfcs](https://github.com/rust-lang/rfcs/) and others.
 5 | 
 6 | To create a new enhancement, copy `template.md` to `directory/my-awesome-enhancement.md`
 7 | where `directory` is the tool or area impacted by the proposal and `my-awesome-enhancement.md`
 8 | is descriptive of the feature.
 9 | 
10 | This repo serves a location to record design decisions that affect systems like
11 | Fedora CoreOS and RHEL CoreOS.  This is also the location where discussion can
12 | happen on those design decisions.
13 | 


--------------------------------------------------------------------------------
/cosa/20200804-cfg-spec.md:
--------------------------------------------------------------------------------
  1 | # CoreOS Assembler Config Specification
  2 | - Subject ID: COSA.20200804
  3 | - Superceeds: None
  4 | - Conflicts: None
  5 | 
  6 | ## Summary
  7 | 
  8 | The RHCOS development pipelines use a "jobspec" to describe the pipeline runs.  This enhancement proposes a YAML-based interface based loosely on the RHCOS jobspec.
  9 | 
 10 | ## Goals
 11 | 
 12 | 1. Provide a file-based interface into COSA
 13 | 1. Prep for breaking up the Pipeline monoliths
 14 | 1. Provide a migration path for RHCOS pipelines
 15 | 1. Prepare for pipeline Nirvana (where *COS pipelines can live in peace and harmony)
 16 | 
 17 | ## Why?
 18 | 
 19 | The innovation of the RHCOS JobSpec was it provided a clear seperation of configuration from the code; the logic was conflated. Further the jobspec allowed the logic to be run with different inputs.
 20 | 
 21 | To support this, a "COSAspec" is being introduced to provide a CLI file-interface.
 22 | 
 23 | ## YAML to EnvVars
 24 | 
 25 | * For Bash and Pythonic CLIs, each argument will have an envVar added.
 26 | * At entry into COSA, the optional COSAspec will be checked
 27 | * Values will be parsed and emit as EnvVars by a common entry point.
 28 | * The EnvVars will set defaults to the CLI commands
 29 | 
 30 | Note: The Mantle code is GoLang based. The use of envVars is supported by the Cobra CLI, but needs further research and consideration. The Mantle code is not considered in this design standard.
 31 | 
 32 | ## Types
 33 | 
 34 | * List values will be joined to comma seperated strings
 35 | * Dicts/Map will be flattened with an `_` between levels
 36 | * Boolean values should be 0, 1, True, False, true, false.
 37 | * Number types are strings and the consuming code is repsonsbile for casting to the type.
 38 | 
 39 | Due to the lack of cohesion, the mix of Python, Bash and GoLang, each component will necessarily handle empty and null values.
 40 | 
 41 | 
 42 | ## Example:
 43 | 
 44 | For the `coreos-assembler build` command:
 45 | ```
 46 | build:
 47 |   force: false
 48 |   force_nocache: false
 49 |   skip_prune: false
 50 |   parent: <string>
 51 |   tag: <string>
 52 | ```
 53 | 
 54 | Would render the envVars:
 55 | ```
 56 | COSA_BUILD_FORCE=false
 57 | COSA_BUILD_FORCE_NOCAHCE=false
 58 | COSA_BUILD_SKIP_PRUNE=false
 59 | ...
 60 | ```
 61 | 
 62 | Then in `cmd-build` the defaults would be:
 63 | ```
 64 | FORCE="${COSA_BUILD_FORCE:-}"
 65 | FORCE_IMAGE="${COSA_BUILD_FORCE_IMAGE:-}"
 66 | SKIP_PRUNE="${COSA_BUILD_SKIP_PRUNE:-0}"
 67 | ...
 68 | ```
 69 | 
 70 | ## List value example
 71 | 
 72 | ```
 73 | extend:
 74 |   aws:
 75 |     regions:
 76 |        - us-east-1
 77 |        - us-west-1
 78 | ```
 79 | Would render at `COSA_EXTEND_AWS_REGIONS="us-east-1,us-west-1"`
 80 | 
 81 | ## Dict example
 82 | 
 83 | ```
 84 | foo:
 85 |   bar:
 86 |     baz:
 87 |       1: true
 88 |       2: true
 89 | ```
 90 | 
 91 | Would render as `COSA_FOO_BAR_BAZ_1=true` and `COSA_FOO_BAR_BAZ_2=true`
 92 | 
 93 | ## PR Standard
 94 | 
 95 | A thorough review of the COSA code-base, a comparison between the FCOS and RHCOS pipelines shows that each command will need to be handled seperately.
 96 | 
 97 | Therefore, for each command updated to modify this interface:
 98 | - The prefix `COSA_<COSA COMMAND>` will be used for envVars (e.g. `COSA_EXTEND_AZURE`)
 99 | - No calculated values. Commands may not parse the spec file itself.
100 | - One command per PR.
101 | 
102 | 
103 | Naming convention for commands (in order of precedence):
104 | - `cmd-buildextend-*`: `COSA_EXTEND_*`
105 | - `cmd-build`: `COSA_BUILD_*`
106 | - `cmd-.*`: `COSA_<SUFFIX>_*`
107 | - `.*`: no interface
108 | 


--------------------------------------------------------------------------------
/cosa/20200805-meta-merge.md:
--------------------------------------------------------------------------------
 1 | # Merging of build meta-data
 2 | - Specification ID: cosa-20200805
 3 | - Supersedes: None
 4 | - Conflicts: None
 5 | 
 6 | # Background
 7 | 
 8 | One of the biggest blockers to performing parallel builds, distributed builds and/or re-starting failed builds is the ability to lock write access to `meta.json` in COSA.
 9 | 
10 | Under the following assumptions, COSA needs to support merging of `meta.json`:
11 | - Not all `cmd-buildextend-*` operations will happen on the same pod or Linux namespace
12 | - S3 is the authoritative store for builds. S3 is eventually consistent meaning that the version read from S3 may be replaced after reading from S3. Implementing distributed locking on `meta.json` based on S3 is problematic.
13 | - For RHCOS we need the ability to do partial builds and then back-fill artifacts (e.g. only build qemu, but not cloud) and later build cloud.
14 | - Developers want to restart failed builds
15 | - Consumers of the artifacts from COSA want to reduce the time to build and test
16 | 
17 | ## State of a build is a point-of-view statement
18 | 
19 | To COSA a build is usually `ostree` + `qcow2`. The notion of a "failed" or "partial" build is a point of view. COSA does provision the build id, but COSA does not know what comprised the artifacts of the "build". Usually developers and consumers speak of a build as the base `ostree` and the _optional_ artifacts.
20 | 
21 | In COSA a "build" only starts to exist when the `meta.json` is incepted after the `ostree` has been written. The build directory may exist, but the `meta.json` that is produced immediately is valid for any consumer. In most cases a "failed" build means that an artifact in the `buildextend-*` commands has failed to publish (such as AWS) not that the artifact itself failed to build; the artifact was indeed built.
22 | 
23 | Further, a "failed build" is also used to mean that some part of a pipeline failed to complete. COSA itself does not have any concept of dependencies, much less the idea that a multiple COSA commands can be run. COSA only has implied dependencies that some artifacts requires others. In other words, COSA doesn't know when the "build" is complete until after something else considers it's complete.
24 | 
25 | ## Multiple `meta.json`
26 | 
27 | Under the axiom that the state of a build is usually determined from the point-of-view of a pipeline and to support distributed and parallel builds, builds will be produced with multiple `meta.json` that will be merged together after the artifact is built.
28 | 
29 | ## `cosalib.meta` library
30 | 
31 | `cmd-meta` currently uses the [`cosalib.meta`](https://github.com/coreos/coreos-assembler/blob/main/src/cosalib/meta.py) library for reading and writing `meta.json`. To support merging of `meta.json` files, the library will be updated to allow for merging in-memory updates with on-disk content.
32 | 
33 | Only `meta.json` that is validated will be merged under the following rules:
34 | - source and destination meta must match the buildid, name and ostree commit
35 | - on-disk `meta.json` takes precedence over the in-memory meta
36 | - all versions must be validated before and after merging before commiting to disk.
37 | 
38 | Prior to merging `cosalib.meta` will lock the localfile.
39 | 
40 | ## merge mode denoted in `meta.json`
41 | 
42 | To support production of meta meta data, `cmd-build` will have a key to support merging meta-data in a delayed fashion.
43 | 
44 | ```
45 | {
46 |     	"coreos-assembler.delayed-meta-merge": true
47 | }
48 | ```
49 | 
50 | `cmd-build` will have the flag of `--delay-meta-merge` added to set the merge mode. The default is `false`.
51 | 
52 | ## Delayed Merging of meta-data
53 | 
54 | When `coreos-assembler.delayed-meta-merge` is set to `true`, `cmd-buildextend.<ARTIFACT>` commands will create suffixed `meta.<ARTIFACT>.json` instead of writing directly to `meta.json`. A new command called `cmd-meta-merge` will be introduced that will read `meta.<ARTIFACT>.json` and will ONLY combine the images and the cloud-build specific sections into `meta.json`.
55 | 
56 | Orchestration layers will be responsible for shuffling artifacts around, e.g. `cmd-upload-build`. `cmd-upload-build --meta <ARTIFACT>` will only upload the artifacts described in the meta.
57 | 
58 | All `meta.<ARTIFACT>.json` will be removed when their contents are merged into `meta.json`. If the artifacts are produced across compute domains, orchestration layers are responsible for calling `cmd-upload-build` to stage the artifacts.
59 | 
60 | `cmd-meta-merge` will be able to read `meta.<ARTIFACT>.json`. `cmd-meta-merge` will:
61 | - read in `meta.json` and validate it.
62 | - iterate over `meta.<ARTIFACT>.json` in the path or in S3.
63 | - Validation against the schema for both source and destination meta-data before and after merging.
64 | - the `meta.<ARTIFACT>.json` will be deleted
65 | 
66 | ## Merging logic
67 | 
68 | By default, only keys not found in `meta.json` will be added from `meta.<ARTIFACT>.json`. To support rebuilds/restart, a force option will exist in `cmd-meta-merge` with a default of `false`.
69 | 
70 | Keys that will be merged:
71 | - fields from the images section
72 | - known cloud names such as AWS, Azure, GPC, etc.
73 | 
74 | ## Default builds
75 | 
76 | When `coreos-assembler.delayed-meta-merge` is false or absent, `cmd-buildextend-*` commands will will produce a `meta.<ARTIFACT>.json` and immediately merge into `meta.json` using the `cmd-meta-merge`. To support bash commands `finalize-artifact` will use `cmd-meta-meta` for `meta.json`.
77 | 
78 | The immediate merging of metas requires:
79 | - time sync clocks
80 | - shared disk for `builds/`
81 | 
82 | ## Validation Requirement
83 | 
84 | All merged `meta.json` must be validated against the schema.
85 | 


--------------------------------------------------------------------------------
/os/20201103-full-disk-raid.md:
--------------------------------------------------------------------------------
  1 | # Full disk RAID support
  2 | 
  3 | ## Theory of operation
  4 | 
  5 | Users can write FCC sugar that enables redundant boot and/or an encrypted root volume.  There are three cases:
  6 | 
  7 | 1. The sugar only specifies encrypted root.  FCCT desugars the section into a LUKS volume within the existing root partition, and a root filesystem layered on top.
  8 | 2. The sugar only specifies redundant boot.  FCCT desugars the section into an Ignition config which completely recreates the partition structure on each specified disk.  The config creates partitions, RAID volumes, and filesystems for the root and boot partitions; partitions and independent filesystems for each ESP replica; and partitions for each BIOS-BOOT replica.
  9 | 3. The sugar specifies both.  FCCT desugars similarly to the second case, with the addition of a LUKS volume.
 10 | 
 11 | If redundant boot is specified, Dracut glue detects the corresponding config stanzas, saves the corresponding on-disk data to RAM before the Ignition disks phase, and restores it after disks phase completes.  The glue also replicates the boot sector when replicating BIOS-BOOT.
 12 | 
 13 | Each piece of the replication logic (root partition, boot partition, ESP, and BIOS-BOOT + MBR) functions independently.  An advanced user can create nonstandard partition configurations (for example, RAID-1 boot with RAID-5 root) by manually configuring partitions in their FCC.
 14 | 
 15 | Recovery from a disk failure is out of scope for this enhancement.  We will recommend that users recover from a hardware failure by reprovisioning the machine.
 16 | 
 17 | ## FCC sugar
 18 | 
 19 | ```yaml
 20 | variant: fcos
 21 | version: 1.3.0
 22 | boot_device:
 23 |   # Allows choosing from multiple disk layout templates hardcoded in FCCT.
 24 |   # We might use this to support CPU architectures with different partition
 25 |   # requirements (e.g. ppc64le) or to change the disk layout in future OS
 26 |   # releases.
 27 |   # Optional, defaults to x86_64
 28 |   layout: x86_64
 29 |   # If specified, mirror every default partition
 30 |   mirror:
 31 |     devices:
 32 |       # Two or more devices are required
 33 |       - /dev/vda
 34 |       - /dev/vdb
 35 |   # If specified, encrypt the root partition
 36 |   luks:
 37 |     # The contents of the existing clevis struct, without the `custom`
 38 |     # field (which is too advanced for sugar).  We don't need to support
 39 |     # static key files because they don't work for root.
 40 |     tang:
 41 |       - url: https://example.com/tang.tang.tang
 42 |         thumbprint: x
 43 |     tpm2: true
 44 |     threshold: 2
 45 | ```
 46 | 
 47 | The `boot_device` struct is at the top level, rather than under `storage`, because CT sugar has historically been placed at the top level and because Go struct embedding semantics make it harder to put it under `storage`.
 48 | 
 49 | The sugar does not support advanced features such as RAID spares, RAID levels other than 1, or filesystem/mount options.  Users can manually specify individual partitions, RAIDs, and filesystems if desired.
 50 | 
 51 | ## Rendered Ignition config
 52 | 
 53 | ### LUKS only
 54 | 
 55 | ```json
 56 | {
 57 |   "ignition": {
 58 |     "version": "3.2.0"
 59 |   },
 60 |   "storage": {
 61 |     "filesystems": [
 62 |       {
 63 |         "device": "/dev/disk/by-id/dm-name-root",
 64 |         "format": "xfs",
 65 |         "label": "root",
 66 |         "wipeFilesystem": true
 67 |       }
 68 |     ],
 69 |     "luks": [
 70 |       {
 71 |         "clevis": {
 72 |           "tpm2": true
 73 |         },
 74 |         "device": "/dev/disk/by-partlabel/root",
 75 |         "label": "luks-root",
 76 |         "name": "root",
 77 |         "wipeVolume": true
 78 |       }
 79 |     ]
 80 |   }
 81 | }
 82 | ```
 83 | 
 84 | ### RAID only
 85 | 
 86 | ```json
 87 | {
 88 |   "ignition": {
 89 |     "version": "3.2.0"
 90 |   },
 91 |   "storage": {
 92 |     "disks": [
 93 |       {
 94 |         "device": "/dev/vda",
 95 |         "partitions": [
 96 |           {
 97 |             "label": "bios-1",
 98 |             "sizeMiB": 1,
 99 |             "typeGuid": "21686148-6449-6E6F-744E-656564454649"
100 |           },
101 |           {
102 |             "label": "esp-1",
103 |             "sizeMiB": 127,
104 |             "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
105 |           },
106 |           {
107 |             "label": "boot-1",
108 |             "sizeMiB": 384
109 |           },
110 |           {
111 |             "label": "root-1"
112 |           }
113 |         ],
114 |         "wipeTable": true
115 |       },
116 |       {
117 |         "device": "/dev/vdb",
118 |         "partitions": [
119 |           {
120 |             "label": "bios-2",
121 |             "sizeMiB": 1,
122 |             "typeGuid": "21686148-6449-6E6F-744E-656564454649"
123 |           },
124 |           {
125 |             "label": "esp-2",
126 |             "sizeMiB": 127,
127 |             "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
128 |           },
129 |           {
130 |             "label": "boot-2",
131 |             "sizeMiB": 384
132 |           },
133 |           {
134 |             "label": "root-2"
135 |           }
136 |         ],
137 |         "wipeTable": true
138 |       }
139 |     ],
140 |     "filesystems": [
141 |       {
142 |         "device": "/dev/disk/by-partlabel/esp-1",
143 |         "format": "vfat",
144 |         "label": "esp-1",
145 |         "wipeFilesystem": true
146 |       },
147 |       {
148 |         "device": "/dev/disk/by-partlabel/esp-2",
149 |         "format": "vfat",
150 |         "label": "esp-2",
151 |         "wipeFilesystem": true
152 |       },
153 |       {
154 |         "device": "/dev/md/md-boot",
155 |         "format": "ext4",
156 |         "label": "boot",
157 |         "wipeFilesystem": true
158 |       },
159 |       {
160 |         "device": "/dev/md/md-root",
161 |         "format": "xfs",
162 |         "label": "root",
163 |         "wipeFilesystem": true
164 |       }
165 |     ],
166 |     "raid": [
167 |       {
168 |         "devices": [
169 |           "/dev/disk/by-partlabel/boot-1",
170 |           "/dev/disk/by-partlabel/boot-2"
171 |         ],
172 |         "level": "raid1",
173 |         "name": "md-boot",
174 |         "options": [
175 |           "--metadata=1.0"
176 |         ]
177 |       },
178 |       {
179 |         "devices": [
180 |           "/dev/disk/by-partlabel/root-1",
181 |           "/dev/disk/by-partlabel/root-2"
182 |         ],
183 |         "level": "raid1",
184 |         "name": "md-root"
185 |       }
186 |     ]
187 |   }
188 | }
189 | ```
190 | 
191 | FCCT assigns a unique serial number to each disk and appends it to the label assigned to each disk partition.  This ensures each partition can be referenced by a unique path in `/dev/disk/by-partlabel`.
192 | 
193 | When desugaring, FCCT should be careful to allow explicit settings elsewhere in the config to override settings originating from sugar.  This could be used e.g. to specify filesystem formatting options.
194 | 
195 | FCCT also hardcodes the partition size for each partition, replicating the existing partition layout.  If we change the default partition sizes in future, we can encode them in a new layout, and we can also change the default layout if we bump the FCC major version.
196 | 
197 | The original boot disk is not treated specially; it's wiped and rewritten just like any other replica.
198 | 
199 | Each of root, boot, ESP, and BIOS-BOOT is placed on a separate partition.  MD-RAID volumes are partitionable, so in principle we could put root and boot on the same RAID volume.  However, this is uncommon (neither Anaconda nor Ignition currently allow it) and it complicates GRUB access to the boot filesystem.
200 | 
201 | ### LUKS and RAID
202 | 
203 | ```json
204 | {
205 |   "ignition": {
206 |     "version": "3.2.0"
207 |   },
208 |   "storage": {
209 |     "disks": [
210 |       {
211 |         "device": "/dev/vda",
212 |         "partitions": [
213 |           {
214 |             "label": "bios-1",
215 |             "sizeMiB": 1,
216 |             "typeGuid": "21686148-6449-6E6F-744E-656564454649"
217 |           },
218 |           {
219 |             "label": "esp-1",
220 |             "sizeMiB": 127,
221 |             "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
222 |           },
223 |           {
224 |             "label": "boot-1",
225 |             "sizeMiB": 384
226 |           },
227 |           {
228 |             "label": "root-1"
229 |           }
230 |         ],
231 |         "wipeTable": true
232 |       },
233 |       {
234 |         "device": "/dev/vdb",
235 |         "partitions": [
236 |           {
237 |             "label": "bios-2",
238 |             "sizeMiB": 1,
239 |             "typeGuid": "21686148-6449-6E6F-744E-656564454649"
240 |           },
241 |           {
242 |             "label": "esp-2",
243 |             "sizeMiB": 127,
244 |             "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
245 |           },
246 |           {
247 |             "label": "boot-2",
248 |             "sizeMiB": 384
249 |           },
250 |           {
251 |             "label": "root-2"
252 |           }
253 |         ],
254 |         "wipeTable": true
255 |       }
256 |     ],
257 |     "filesystems": [
258 |       {
259 |         "device": "/dev/disk/by-partlabel/esp-1",
260 |         "format": "vfat",
261 |         "label": "esp-1",
262 |         "wipeFilesystem": true
263 |       },
264 |       {
265 |         "device": "/dev/disk/by-partlabel/esp-2",
266 |         "format": "vfat",
267 |         "label": "esp-2",
268 |         "wipeFilesystem": true
269 |       },
270 |       {
271 |         "device": "/dev/md/md-boot",
272 |         "format": "ext4",
273 |         "label": "boot",
274 |         "wipeFilesystem": true
275 |       },
276 |       {
277 |         "device": "/dev/disk/by-id/dm-name-root",
278 |         "format": "xfs",
279 |         "label": "root",
280 |         "wipeFilesystem": true
281 |       }
282 |     ],
283 |     "luks": [
284 |       {
285 |         "clevis": {
286 |           "tpm2": true
287 |         },
288 |         "device": "/dev/md/md-root",
289 |         "label": "luks-root",
290 |         "name": "root",
291 |         "wipeVolume": true
292 |       }
293 |     ],
294 |     "raid": [
295 |       {
296 |         "devices": [
297 |           "/dev/disk/by-partlabel/boot-1",
298 |           "/dev/disk/by-partlabel/boot-2"
299 |         ],
300 |         "level": "raid1",
301 |         "name": "md-boot",
302 |         "options": [
303 |           "--metadata=1.0"
304 |         ]
305 |       },
306 |       {
307 |         "devices": [
308 |           "/dev/disk/by-partlabel/root-1",
309 |           "/dev/disk/by-partlabel/root-2"
310 |         ],
311 |         "level": "raid1",
312 |         "name": "md-root"
313 |       }
314 |     ]
315 |   }
316 | }
317 | ```
318 | 
319 | ## Root filesystem
320 | 
321 | The Dracut glue already saves and restores the root filesystem contents whenever the config specifies a filesystem with label `root` and `wipeFilesystem` set to `true`.
322 | 
323 | ## Boot filesystem
324 | 
325 | Similar to the root filesystem, the Dracut glue will save and restore the contents of `/boot` whenever the config specifies a filesystem labeled `boot` with `wipeFilesystem` set to `true`.
326 | 
327 | GRUB includes a module for reading MD-RAID volumes, and we want to use it, since it will read from any available replica.  However, the shipped boot sector is hardcoded to set `prefix` to the first boot disk, and we want to replicate the boot sector via a bit-for-bit copy (see below).  For now, we'll create the RAID volume with metadata format 1.0, which puts the RAID superblock at the end of the partition, and allows GRUB to treat individual `/boot` replicas as independent filesystems.  We'll modify the GRUB configs so UEFI boot always treats `/boot` as a RAID, and BIOS boot treats it as a RAID after the `normal` module and `grub.cfg` are loaded.  That leaves a window during BIOS boots where boot will fail if the first enumerated device doesn't work.  Once bootupd knows how to update BIOS GRUB, we can close the vulnerability window by having bootupd reinstall GRUB during the first boot (see [coreos/fedora-coreos-tracker#702](https://github.com/coreos/fedora-coreos-tracker/issues/702)).
328 | 
329 | `/boot` replication prevents the use of the grubenv mechanism, as GRUB [disables grubenv support](https://www.gnu.org/software/grub/manual/grub/grub.html#Environment-block) when RAID is in use.
330 | 
331 | ## EFI System Partition
332 | 
333 | The Dracut glue will save the contents of the ESP filesystem whenever the config specifies a partition with the ESP type GUID, and will restore those contents to _every_ partition with that type GUID.
334 | 
335 | Thus the ESP won't be RAIDed; we'll create multiple independent filesystems with identical directory trees.  Keeping each replica independent avoids RAID desynchronization if the firmware chooses to modify the ESP, and makes sense because CoreOS does not modify the ESP after installation.  Since there will no longer be a single canonical ESP, we'll no longer automount `/boot/efi` in the running system (see [coreos/fedora-coreos-tracker#694](https://github.com/coreos/fedora-coreos-tracker/issues/694)).
336 | 
337 | When updating the bootloader, bootupd will need to find each partition with an ESP type GUID and update it independently.  This assumes that all ESPs on the system are controlled by CoreOS.
338 | 
339 | Per the usual Ignition philosophy, the copy logic will fail if the source ESP is missing.  If we decide to [drop the ESP in AWS images](https://github.com/coreos/fedora-coreos-config/pull/407) and want to support mirrored boot disks in AWS, we'll need to provide an alternative `layout` without an ESP.
340 | 
341 | ## BIOS Boot partition
342 | 
343 | The Dracut glue will save an image of the BIOS boot partition whenever the config specifies a partition with the BIOS boot type GUID, and will restore that image to _every_ such partition.  In addition, the same logic will save the boot sector (the first 440 bytes of the disk) and restore it to every disk where we write a BIOS Boot partition.
344 | 
345 | The BIOS Boot partition is not updated at runtime, so RAID is not necessary.  When updating the bootloader, bootupd will need to find each BIOS Boot type GUID and update both the partition and the corresponding boot sector.  This assumes that the system is not dual-booting and belongs entirely to CoreOS.
346 | 
347 | We're currently copying the BIOS Boot partition and boot sector bit-for-bit, rather than rerunning `grub-install` from the initrd or hand-modifying the boot sector.  Since the boot sector hardcodes the offset of the BIOS Boot partition, the latter must be recreated at the same offset as the original partition.  To avoid hardcoding an awkward constant in FCCT, we'll move the BIOS boot partition from its current offset of 512 MiB to the beginning of the disk (offset 1 MiB) in new images.  The Dracut glue will detect relocation of the BIOS boot partition and fail the boot.
348 | 
349 | There's no BIOS boot partition in 4Kn images.  We could work around this, but to reduce the test matrix, we'll add an empty BIOS boot partition to the 4Kn image.
350 | 
351 | ## PReP partition
352 | 
353 | For ppc64le, we'll create independent replicas of the PReP partition, similar to the BIOS boot partition.
354 | 
355 | ## Ignition changes
356 | 
357 | The Dracut glue can detect the requisite Ignition config stanzas using `jq`, but that's brittle and we're already doing a lot of it in the initrd.  Ideally Ignition would provide command-line arguments for executing relevant (hard-coded) queries against the cached Ignition config.  For example:
358 | 
359 | ```sh
360 | ignition -query wiped-filesystem=root
361 | ignition -query partition-type-guid=c12a7328-f81f-11d2-ba4b-00a0c93ec93b
362 | ```
363 | 
364 | We're currently planning to defer this change to future work.
365 | 
366 | ## Architecture support
367 | 
368 | ppc64le will need an alternate `layout` definition that includes the PReP partition.  aarch64 will similarly need a `layout` without a BIOS Boot partition.
369 | 
370 | For s390x, Ignition doesn't yet support DASD partitioning, but could work with FCP disks.  In any event, we'd need provisions for rerunning zipl from the initramfs, if RAID support is even possible at all.  Each DASD is connected via a particular bus identifier, which might need to be encoded into each respective copy of the bootloader.  For now, this functionality is out of scope on s390x for 4.7.
371 | 
372 | ## Configuring boot disk RAID in OCP
373 | 
374 | For 4.7, we'll document a means for OCP users to use FCCT to render a RAID configuration for their Ignition pointer config.  Better OCP UX for FCCs is left as future work.
375 | 


--------------------------------------------------------------------------------
/os/20211014-s390x-secure-execution.md:
--------------------------------------------------------------------------------
  1 | # IBM Secure Execution for Linux
  2 | ---
  3 | 
  4 | # Overview
  5 | 
  6 | [IBM Secure Execution for Linux®](https://www.ibm.com/docs/en/linux-on-systems?topic=virtualization-introducing-secure-execution-linux) is a hardware-based security technology that is
  7 | built into the IBM z15™ and LinuxONE III generation systems.
  8 | The goal of Secure Execution is to protect data of workloads that run in a KVM guest from being inspected or modified by the hypervisor. In particular, no hardware administrator, no KVM code, and no KVM administrator can access the data in a guest that was started as an IBM Secure Execution guest.
  9 | 
 10 | IBM Secure Execution provides the following benefits:
 11 |  - Instead of relying on deterrence by using extensive audit tracks, IBM Secure Execution provides technology-enforced security rather than process or audit-based security.
 12 |  - It is possible to securely deploy workloads in the cloud.
 13 |  - As a cloud provider that uses IBM Secure Execution, you can attract sensitive workloads that, formerly, were restricted to the workload owner's system.
 14 |  - As a secure workload owner, you know that your workload is run in a secure manner, even outside your data center.
 15 |  - As a secure workload owner, you can choose where to run your workload, independently of the security level required.
 16 | 
 17 | ---
 18 | 
 19 | # Theory of operation
 20 | 
 21 | Secure Execution is designed to provide scalable isolation for individual workloads to help protect them from not
 22 | only external attacks, but also insider threats. In traditional x86 ring architectures,
 23 | the host can access the memory and data of guest applications freely, leading to the
 24 | potential for malicious software to be proliferated throughout the entire system.
 25 | Isolation between host and guest environments is necessary to help prevent system compromise.
 26 | Users can enable Secure Execution feature on IBM zKVM Host and Guest VMs to isolate them from each other.
 27 | 
 28 | That means:
 29 | 1. An encrypted Linux image, called `sdboot`, (kernel + initramfs + cmdline) is created using:
 30 |  - universal host public key provided by IBM. This happens only during CoreOS Qemu image build.
 31 |  - customer-specific host's public key(s). This happens only in runtime
 32 | 2. kernel, initramfs and cmdline couldn't be modified without regeneration of `sdboot` image
 33 | 3. zKVM host cannot access memory of secured guest VM
 34 | 4. Guest VMs cannot access memory of other secured guest VMs
 35 | 5. If anyone tries to access memory of secured guest VM, the attempt will get blocked at the HW level
 36 | 
 37 | Since the encryption private keys are stored on the IBM Z hardware and firmware, the encrypted image can only be executed in a virtual machine on the host(s) it has been prepared for, and the image can’t be decrypted or tampered with outside of the designated host(s).
 38 | 
 39 | ## Specific attacks being addressed
 40 | - protect guests against the system
 41 | - protect guests against a Linux Admin (on the host)
 42 | - protect the data services use within a secured guests from host admin and/or other guests
 43 | 
 44 | ## Specific attacks not being addressed
 45 | - any attack over Internet on some service(s) running within Guest
 46 | - usage of the fake ignition config for guest provisioning
 47 | 
 48 | > **Note:**
 49 | > `Hostkey` is generated by IBM and is stored somewhere in HW, the key is unique for each IBM Z mainframe.
 50 | > Image signed with a key (or set of keys) could be used only on machine with the identical key.
 51 | 
 52 | > **Note:**
 53 | > Only production hostkey(s) could be verified (that a host key document is genuine) using [check_hostkeydoc](https://raw.githubusercontent.com/ibm-s390-tools/s390-tools/master/genprotimg/samples/check_hostkeydoc) tool. In a later version of `genprotimg` this function will
 54 | be performed by the tooling itself.
 55 | 
 56 | # High-level overview
 57 | 
 58 | The goal of Secure Execution is to provide customer the highest level of isolation and security for each OCP node.
 59 | To achive that new type of artifact for s390x would be available to download and use - Secure Execution Ready Qcow2 Image:
 60 |  - Image comes with `sdboot` generated using universal key from IBM. Key itself is not known to public and used by RH/IBM during build of RHCOS
 61 |  - Image couldn't be modified, by anyone except `ignition` during lifetime
 62 |  - It won't possible to inspect and modify (using `guestfish` or other tools) contents of image's `/` and `/boot` filesystems
 63 |  - It's only possible to run this image on zKVM host with enabled Secure Execution support
 64 |  - It's not possible to disable Secure Execution once it was enabled
 65 | 
 66 | To use Secure Execution customer must have hostkeys for host machine(s) and specify them in `ignition` config.
 67 | 
 68 | ## Prerequisites
 69 | 
 70 | Secure Execution is supported only starting with z15 and only for zKVM guests.
 71 | In order to use it zKVM host's kernel-cmdline should have `prot_virt=1 swiotlb=262144` options.
 72 | 
 73 | KVM guests also require some additional options:
 74 | - disk's driver should have `iommu=on` option:
 75 | ```xml
 76 | <disk type='file' device='disk'>
 77 |       <driver name='qemu' type='qcow2' iommu='on'/>
 78 |       <target dev='vda' bus='virtio'/>
 79 | </disk>
 80 | ```
 81 | - network interfaces should have `iommu=on` option:
 82 | ```xml
 83 |  <interface type='network'>
 84 |       <model type='virtio'/>
 85 |       <driver name='qemu' iommu='on'/>
 86 | </interface>
 87 | ```
 88 | - `memballoon` should be disabled:
 89 | ```xml
 90 | <memballoon model='none'/>
 91 | ```
 92 | 
 93 | Or same options for cmdline execution:
 94 | ```
 95 | qemu-system-s390x \
 96 |      -machine s390-ccw-virtio,accel=kvm \
 97 |      -m 4096 -smp 2 \
 98 |      -nographic \
 99 |      -drive if=none,id=hda,file=/path/to/image.qcow2,auto-read-only=off,cache=unsafe -device virtio-blk,iommu_platform=on,drive=hda \
100 |      -netdev user,id=eth,hostfwd=tcp::2222-:22 -device virtio-net-ccw,iommu_platform=on,netdev=eth
101 | ```
102 | 
103 | # Detailed overview
104 | 
105 | To achive goals listed above not only a special qcow2 image is generated, but also some additional steps during system's lifetime are required.
106 | 
107 | ##  Build
108 |  To build Secure Execution Ready image new command is added to `coreos-assembler`:
109 |  ```
110 |  cosa buildextend-secex --hostkey UNIVERSAL_KEY.crt
111 |  ```
112 |  Which is similar to `cosa buildextend-qemu` and also generates qcow2 image.
113 | 
114 | The qcow2 image comes with `sdboot` installed into its own partition, and dm-verity setup for boot and rootfs filesystems, where hashes are part of kernel's cmdline.
115 | Here are partitions of the Secure Execution qcow2 image provided by RH/IBM:
116 | 
117 | ```
118 | NAME     PARTLABEL TYPE  MOUNTPOINT
119 | vda                disk  /
120 | vdb                disk
121 | |-vdb1   se        part
122 | |-vdb3   boot      part
123 | | `-boot           crypt /tmp/rootfs/boot
124 | |-vdb4   root      part
125 | | `-root           crypt /tmp/rootfs
126 | |-vdb5   boothash  part
127 | | `-boot           crypt /tmp/rootfs/boot
128 | `-vdb6   roothash  part
129 |   `-root           crypt /tmp/rootfs
130 | 
131 | ```
132 | 
133 | - `se`   - This is a new partition `ext4`, where we store a `sdboot` encrypted image
134 | - `boot` - boot partition is the same to one on reqular qcow2 image
135 | - `root` - rootfs partition is the same to one on reqular qcow2 image
136 | - `boothash` - partition for settting up `dm-verity` on `boot`
137 | - `roothash` - partition for settting up `dm-verity` on `boot`
138 | 
139 | `dm-verity` is used to ensure, that pristine qcow image wasn't modified after build.
140 | 
141 | > **Note:**
142 | > Further partitioning of boot disk with those 3 partitons is not supported, but
143 | rootfs could be grown. That means, that when required user should enlarge `secex-qemu.qcow2` image before `firstboot`:
144 | ```
145 | qemu-img resize /path/to/img.qcow2 +50G
146 | ```
147 | 
148 | ## Firstboot
149 | 
150 | During guest provisioning new hostkey(s) is/are fetched and used for generation of `sdboot`. User should specify them within ignition config with pre-defined name template: `/etc/se-hostkeys/ibm-z-hostkey-N` where `N` is a key number, several hostkeys could be used:
151 | 
152 | ```yaml
153 | storage:
154 |   files:
155 |     -
156 |       path: /etc/se-hostkeys/ibm-z-hostkey-1
157 |       overwrite: true
158 |       contents:
159 |         source: https://url.com/key1
160 | ```
161 | 
162 | System would also set a new random LUKS keys for both `boot` and `root` filesystems, so each node has its own LUKS keys.
163 | 
164 | ## Lifetime
165 | 
166 | Secure Execution is enabled by 2 tools provided by IBM's s390-utils package:
167 | - `genprotimg` - tool to generate `sdboot`
168 | - `zipl` - tool to install new bootloader
169 | 
170 | Despite `zipl` is well known and used by CoreOS, it couldn't be executed same way as before. When system runs in Secure Execution, `ostree` and `rdcore` automatically detect that and perform all required steps.
171 | That means, that user won't notice any difference, both mentioned components would automatically regenerate `sdboot` image and
172 | install it with `zipl` whenever required.
173 | 
174 | # Implementation steps
175 | 
176 | ## Step 1: CoreOS with existing patches to `libostree` could be configured to perform in Secure Execution mode.
177 | 
178 | Customer's public hostkey(s) should be placed into `/etc/se-hostkeys/` using name templates `ibm-z-hostkey-N`. Secure Execution is enabled by hands following https://www.ibm.com/docs/en/linux-on-systems?topic=linux-introduction.
179 | 
180 | ## Step 2: New `qemu-secex.qcow2` build artifact for CoreOS with enabled Secure Execution.
181 | 
182 | `coreos-xyz-qemu-secex.qcow2` image with `dm-verity` setup for `/boot` and `rootfs` is available for download. This is done by using new `buildextend-secex` `coreos-assembler` command, which:
183 |  - adds new `/se` partition for `sdboot`
184 |  - adds `roothash` and `boothash` partitions for [dm-verity](https://docs.kernel.org/admin-guide/device-mapper/verity.html) setup
185 |  - calculates hashes for `/boot` and `/` and appends them to kernel's cmdline
186 |  - generates `/se/sdboot` encrypted by universal public hostkey (valid for all Z15 and newer machines)
187 |  - `zipl`es `/se/sdboot`, so `qemu-secex.qcow2` image could only be run with Secure Execution
188 | 
189 | Customer should add his public hostkeys(s) into `ignition` config.
190 | 
191 | ## Step 3: During firstboot `/boot` and `rootfs` are encrypted with randomly generated LUKS keys; `integrity` option is enabled
192 | 
193 | During firstboot `/boot` and `/` are luks encrypted with randomly generated luks keys. This ensures, that each CoreOS isntance has its own LUKS keys and also makes it not possible to read the contents of qcow2 image just by inspecting (`libguestfish`) it. Any modifications of qcow image will be detected in runtime by [dm-integrity](https://docs.kernel.org/admin-guide/device-mapper/dm-integrity.html).
194 | 
195 | 
196 | After provisioning, new disk is like:
197 | ```
198 | NAME         PARTLABEL TYPE  MOUNTPOINT
199 | vdb                    disk
200 | |-vdb1       se        part
201 | |-vdb3       boot      part
202 | | `-boot_dif           crypt
203 | |   `-boot             crypt /boot
204 | `-vdb4       root      part
205 |   `-root_dif           crypt
206 |     `-root             crypt /sysroot
207 | 
208 | ```
209 | 
210 | - `se`   - partition for newly generated`sdboot` encrypted image
211 | - `boot` - LUKS encrypted boot partition
212 | - `root` - LUKS encrypted root partition
213 | - `boot_dif` - `dm-integrity` for `/boot`
214 | - `root_dif` - `dm-integrity` for `/root`
215 | 
216 | ## Step 4: Ignition protection
217 | 
218 | The goal is to prevent reading (to protect any secret stored in the Ignition config) and modification of the config.
219 | 
220 | To achive this we do:
221 | - Generate at cosa compose time a pair of GPG keys:
222 |   - The private key is part of the `sdboot` image
223 |   - The public key is part of the build metadata
224 | - Disable Ignition logging to the console (e.g. serial and VGA)
225 | - Add a hook to emergency shell to remove private key and Ignition config before entering the shell
226 | - Disable local login
227 | 
228 | User has to fetch the public GPG key for the `secex-qemu.qcow2` image and encrypt his Ignition config with  the key:
229 |   ```
230 |   gpg --recipient-file /path/to/ignition.gpg.pub --yes --output /path/to/config.ign.gpg --armor --encrypt /path/to/config.ign
231 |   ```
232 | And than instead of `serial=ignition` user uses `serial=ignition_crypted`:
233 |   ```
234 |   -drive if=none,id=ignition,format=raw,file=/path/to/config.ign.gpg,readonly=on
235 |   -device virtio-blk,serial=ignition_crypted,iommu_platform=on,drive=ignition
236 |   ```
237 | 


--------------------------------------------------------------------------------
/os/20231024-s390x-zvm-secure-ipl.md:
--------------------------------------------------------------------------------
 1 | # z/VM Guest Secure IPL (Secure Boot-like protection)
 2 | ---
 3 | 
 4 | # [Overview](https://www.ibm.com/docs/en/zvm/7.3?topic=ipl-guest-secure)
 5 | 
 6 | z/VM® supports guest secure IPL. Guest secure IPL supports the NIAP (National Information Assurance Partnership) operating system protection profile, which supports the Common Criteria certification.
 7 | 
 8 | A z/VM user can request that the machine loader validate the signed IPL code by using the security keys that were previously loaded by the customer into the HMC certificate store. The validation ensures that the IPL code is intact, unaltered, and originates from a trusted build-time source.
 9 | 
10 | This support provides the ability for a Linux guest to exploit hardware to validate the code being booted, helping to ensure it is signed by the client or its supplier.
11 | 
12 | Support is provided for the following device types:
13 | - SCSI devices.
14 | - ECKD devices.
15 | 
16 | ---
17 | 
18 | # Enabling Secure IPL at installation time (day-1)
19 | 
20 | Assuming zVM is ready for secure boot, we can setup LOADDEV at installation time
21 | 
22 | ## coreos-installer
23 | 
24 | 1) Add new `coreos.inst.secure_ipl` karg and `--secure-ipl` option. `coreos-installer-generator` appends the switch when karg is provided
25 | 2) During installation we check for `--secure-ipl` and use `vmcp` tool to set LOADDEV
26 | 3) Add new systemd unit `coreos-installer-reboot-loaddev.service` to restart from LOADDEV (which immediately terminates running CoreOS VM)
27 | 
28 | # Enabling Secure IPL on existing system (day-2)
29 | 
30 | This is for informational purposes only. We expect users to only turn on Guest Secure IPL at installation time.
31 | 
32 | 1) Login into RedHat CoreOS and ensure the output:
33 |     ```
34 |     $ cat /sys/firmware/ipl/has_secure
35 |     1
36 |     ```
37 | 2) Set target disk as zVM LOADEVICE
38 | - ECKD
39 | 
40 |     Assuming RHCOS is installed on DASD disk `0.0.5223`, from zVM terminal execute:
41 |     ```
42 |     # cp set loaddev eckd dev 5223 secure
43 |     ```
44 | - SCSI
45 | 
46 |     Assuming RHCOS is installed on FCP disk `0.0.8007,0x500507630400d1e3,0x4001404c00000000`, from zVM execute:
47 |     ```
48 |     # cp set loaddev dev 8007 portname 50050763 0400d1e3 lun 4001404c 00000000 secure
49 |     ```
50 | 3) Update bootloader
51 |     ```
52 |     $ sudo unshare --mount bash -c 'mount -o remount,rw /boot && zipl -V --secure 1'
53 |     ```
54 | 4) Poweroff RHCOS and start it from zVM terminal:
55 |     ```
56 |     # cp ipl loaddev
57 |     ```
58 | 
59 | # Ensure system runs with Secure IPL
60 | 
61 | Preferred way to check whether Secure IPL is used:
62 | ```
63 | $ cat /sys/firmware/ipl/secure
64 | 1
65 | ```
66 | But currently this functionality could be broken on some kernel versions.
67 | 
68 | Another way to check is:
69 | ```
70 | $ dmesg | grep "Secure-IPL enabled"
71 | [    0.029829] setup: Linux is running with Secure-IPL enabled
72 | ```
73 | 


--------------------------------------------------------------------------------
/os/coreos-layering.md:
--------------------------------------------------------------------------------
  1 | # CoreOS Layering
  2 | 
  3 | This enhancement proposes:
  4 | 
  5 | - A fundamental new addition to ostree/rpm-ostree, which is support for directly pulling and updating the OS from container images (while keeping all existing functionality, per-node package layering and our existing support for pulling via ostree on the wire).
  6 | - Documentation for generating derived (layered) images from the pristine CoreOS base image.
  7 | - Support for switching a FCOS system to use a custom image on firstboot via Ignition
  8 | - zincati will continue to perform upgrades by inspecting the upgrade graph from the base image.
  9 | 
 10 | This builds on [Fedora Changes/OstreeNativeContainer](https://fedoraproject.org/wiki/Changes/OstreeNativeContainer).
 11 | 
 12 | # Existing work
 13 | 
 14 | Originally, https://github.com/coreos/fedora-coreos-tracker/issues/812 tracked native support for "encapsulating" ostree commits in containers.
 15 | 
 16 | Then, it was realized that when shipping the OS as a container image, it feels natural to support users deriving from it.  The bulk of this is really "ostree native container" integration glue, and is happening in https://github.com/ostreedev/ostree-rs-ext
 17 | 
 18 | rpm-ostree vendors the ostree-rs-ext code and also be extended to support the same interfaces as implemented by the "base" ostree-rs-ext code.
 19 | 
 20 | Specifically as of today, this functionality is exposed in:
 21 | 
 22 | - `rpm-ostree ex-container`
 23 | - `rpm-ostree rebase --experimental $containerref`
 24 | 
 25 | # Rationale
 26 | 
 27 | Since the creation of Container Linux (CoreOS) as well as Atomic Host, and continuing into the Fedora/RHEL CoreOS days, we have faced a constant tension around what we ship in the host system.
 28 | 
 29 | [This issue](https://github.com/coreos/fedora-coreos-tracker/issues/401) encapsulates much prior discussion.
 30 | 
 31 | For users who are happy with Fedora CoreOS today, not much will change.
 32 | 
 33 | For those who e.g. want to install custom agents or nontrivial amounts of code (such as kubelet), this "CoreOS layering" will be a powerful new mechanism to ship the code they need.
 34 | 
 35 | # Example via Dockerfile
 36 | 
 37 | One goal the current CoreOS team has in this is to still preserve the distinction we have between "base image" and user content.
 38 | This argues for a declarative input that only allows controlled mutation. See the next section about this.
 39 | 
 40 | However, `Dockerfile` is the lowest common denominator in the container ecosystem.
 41 | To truly illustrate the goal of supporting arbitrary inputs, we must support it.
 42 | 
 43 | [fcos-derivation-example](https://github.com/cgwalters/fcos-derivation-example) contains an example `Dockerfile` that builds a Go binary and injects it along with a corresponding systemd unit as a layer, building on top of Fedora CoreOS.
 44 | 
 45 | For ease of reference, a copy of the above is inline here:
 46 | 
 47 | ```dockerfile
 48 | # Build a small Go program using a builder image
 49 | FROM registry.access.redhat.com/ubi8/ubi:latest as builder
 50 | WORKDIR /build
 51 | COPY . .
 52 | RUN yum -y install go-toolset
 53 | RUN go build hello-world.go
 54 | 
 55 | # In the future, this would be e.g. quay.io/coreos/fedora:stable
 56 | FROM quay.io/cgwalters/fcos
 57 | # Inject it into Fedora CoreOS
 58 | COPY --from=builder /build/hello-world /usr/bin
 59 | # And add our unit file
 60 | ADD hello-world.service /etc/systemd/system/hello-world.service
 61 | # Also add strace; we don't yet support `yum install` but we can
 62 | # with some work in rpm-ostree!
 63 | RUN rpm -Uvh https://kojipkgs.fedoraproject.org//packages/strace/5.14/1.fc34/x86_64/strace-5.14-1.fc34.x86_64.rpm
 64 | ```
 65 | 
 66 | # Controlled mutation
 67 | 
 68 | One of the primary advantages of the `Dockerfile` layering approach is that it allows direct filesystem modifications. However, we should distinguish between layering things (e.g. `/etc` files, or third-party daemon) and modifying base content (e.g. fast-track kernel hotfix), which has a higher likelihood of invalidating our CI process. In a cluster context for example, it's possible that a cluster admin may want to permit users only certain modifications.
 69 | 
 70 | It is expected that control mechanisms will be integrated, though it's still not clear how that will look. It may be inside rpm-ostree (e.g. requiring override switches when first rebasing to the pullspec, and/or requiring a specific label on the image), or as part of the image build process itself (e.g. as part of [finalization](https://github.com/ostreedev/ostree-rs-ext/issues/159)). Of course, higher-level interfaces may enforce even stricter guidelines or only accept easily verifiable configs such as Butane/Ignition (see Butane example below).
 71 | 
 72 | Ideally, it shouldn't be difficult for an FCOS/RHCOS user to query the kinds of mutations inside a container image, and this could then be displayed in a succinct way as part of `rpm-ostree status` when rebased onto it.
 73 | # Derivation versus Ignition/Butane
 74 | 
 75 | This proposal does not replace Ignition.  Ignition will still play at least two key roles:
 76 | 
 77 | - Setting up partitions and storage, e.g. LUKS is something configured via Ignition provided on boot.
 78 | - Machine/node specific configuration, in particular bootstrap configuration: e.g. static IP addresses that are necessary to fetch container images at all.
 79 | 
 80 | # Butane as a declarative input format for layering
 81 | 
 82 | We must support `Dockerfile`, because it's the lowest common denominator for the container ecosystem, and is accepted as input for many tools.
 83 | 
 84 | However, one does not have to use `Dockerfile` to make containers.  Specifically, what would make a lot of sense for Fedora CoreOS is to focus
 85 | on Butane as a standard declarative interface to this process.
 86 | 
 87 | This could run as a "builder" container, something like this:
 88 | 
 89 | ```
 90 | FROM quay.io/coreos/butane:release
 91 | COPY . .
 92 | RUN butane -o /build/ignition.json
 93 | 
 94 | FROM quay.io/fedora/coreos:stable
 95 | COPY --from=builder /build/ignition.json /tmp/
 96 | RUN ignition --write-filesystem /tmp/ignition.json && rm -f /tmp/ignition.json
 97 | ```
 98 | 
 99 | Another option is to support being run nested inside an existing container tool, similar to
100 | [kaniko](https://github.com/GoogleContainerTools/kaniko).  Then no
101 | `Dockerfile` would be needed.
102 | 
103 | [More information on nesting container builds](https://www.redhat.com/sysadmin/podman-inside-kubernetes).
104 | 
105 | # Use of CoreOS disk/boot images
106 | 
107 | And more explicitly, it's expected that many if not most users would continue to use the official Fedora CoreOS "boot images" (e.g. ISO, AMI, qcow2, etc.).  This proposal does *not* currently call for exposing a way for a user to create the boot image shell around their custom container, although that is an obvious potential next step.
108 | 
109 | Hence, a user wanting to use a custom base image would provide machines with an Ignition config that performs e.g. `rpm-ostree rebase ostree-remote-image:quay.io/examplecorp/baseos:latest` as a systemd unit.  It is likely that we would provide this via [Butane](github.com/coreos/butane) as well; for example:
110 | 
111 | ```
112 | variant: fcos
113 | version: 1.5.0
114 | ostree_container:
115 |   image: quay.io/mycorp/myfcos:stable
116 |   reboot: true
117 | ```
118 | 


--------------------------------------------------------------------------------
/template.md:
--------------------------------------------------------------------------------
 1 | - Feature Name: (fill me in with a unique ident, `my_awesome_feature`)
 2 | 
 3 | # Summary
 4 | [summary]: #summary
 5 | 
 6 | One paragraph explanation of the proposal.
 7 | 
 8 | # Motivation
 9 | [motivation]: #motivation
10 | 
11 | Why are we doing this? What use cases does it support? What is the expected outcome?
12 | 
13 | # Explanation and Examples
14 | [explanation-and-examples]: #explanation-and-examples
15 | 
16 | Explain the proposal as if it was already accepted and you were teaching it to another person. That generally means:
17 | 
18 | - Introducing new named concepts.
19 | - Explaining the feature largely in terms of examples.
20 | - If applicable, provide sample error messages, deprecation warnings, or migration guidance.
21 | - If applicable, explain how to use this enhancement.
22 | 
23 | # Drawbacks
24 | [drawbacks]: #drawbacks
25 | 
26 | Why should we *not* do this?
27 | 
28 | # Rationale and alternatives
29 | [rationale-and-alternatives]: #rationale-and-alternatives
30 | 
31 | - Why is this design the best in the space of possible designs?
32 | - What other designs have been considered and what is the rationale for not choosing them?
33 | - What is the impact of not doing this?
34 | 
35 | 


--------------------------------------------------------------------------------