19 | {% endif %}
20 | {% endmacro %}
21 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # OmniBOR Website
2 |
3 | This is the repository for the official [OmniBOR website][omnibor_site]. The
4 | site is built using Zola, and is deployed to Netlify.
5 |
6 | ## Contributing
7 |
8 | We're happy to accept contributions to the site! All contributions are done
9 | under the terms of the Apache 2.0 license. See the [`LICENSE` file][license]
10 | for more information.
11 |
12 | We provide a [devcontainer][devcontainer] to make contributing easier. It
13 | comes with all dependencies required to build the site already installed.
14 | You can use the container locally or through [GitHub Codespaces][codespaces].
15 |
16 | ## License
17 |
18 | The website code is licensed under the Apache 2.0 license. The full contents
19 | can be found in the [`LICENSE` file][license].
20 |
21 | [omnibor_site]: https://omnibor.io/
22 | [license]: https://github.com/omnibor/site/blob/main/LICENSE
23 | [devcontainer]: https://containers.dev/
24 | [codespaces]: https://docs.github.com/en/codespaces/developing-in-a-codespace/developing-in-a-codespace
25 |
26 |
27 |
--------------------------------------------------------------------------------
/templates/blog.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 |
3 | {% block hero %}
4 | {% endblock %}
5 |
6 | {% block content %}
7 |
41 | {% endblock %}
42 |
43 | {% block sidebar %}
44 | {{ toc::toc(obj=page) }}
45 | {% endblock %}
46 |
--------------------------------------------------------------------------------
/content/docs/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Docs
3 | ---
4 |
5 | While the [OmniBOR Specification][spec] is the official reference information
6 | for what OmniBOR is, how its key concepts are defined, and how producers and
7 | users of OmniBOR data should work, it's not necessarily an approachable guide
8 | to understanding the motivation and purpose behind the design of OmniBOR.
9 |
10 | This documentation is intended to supplement the specification. If this
11 | material and the specification are ever in conflict, the specification
12 | supersedes this content.
13 |
14 | Currently, there are four pieces of documentation:
15 |
16 | * [__Artifact ID__][artifact_id]: An explanation of what Artifact IDs are, how they're defined,
17 | and how to derive them yourself.
18 | * [__Input Manifest__][input_manifest]: An explanation of what Input Manifests are, how they're
19 | structured, stored, and distributed.
20 | * [__Glossary__][glossary]: A glossary of important terms used throughout the OmniBOR
21 | project.
22 | * [__Resources__][resources]: Links to other helpful resources including written materials,
23 | frequently asked questions, and conference talks.
24 |
25 | [spec]: @/spec/_index.md
26 | [artifact_id]: @/docs/artifact-ids.md
27 | [input_manifest]: @/docs/input-manifests.md
28 | [glossary]: @/glossary/_index.md
29 | [resources]: @/resources/_index.md
30 |
--------------------------------------------------------------------------------
/content/resources/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Resources
3 | ---
4 |
5 | The following are collected resources related to OmniBOR.
6 |
7 | ## Frequently Asked Questions
8 |
9 | We maintain a list of answers to [frequently asked questions][faq] about
10 | OmniBOR.
11 |
12 | ## Whitepaper
13 |
14 | OmniBOR was originally envisioned in a whitepaper drafted by Aeva Black.
15 | [View the whitepaper here][whitepaper].
16 |
17 | ## Talks
18 |
19 | | Event | Video |
20 | |:----------------------------------|:------------------------------------------|
21 | | Cloud Native Security Con EU 2022 | [View][cloud_native_security_con_eu_2022] |
22 | | Supply Chain Security Con NA 2021 | [View][supply_chain_security_con_na_2021] |
23 |
24 | ## Slides
25 |
26 | | Title | Link | Notes |
27 | |:----------------|:--------------|:------------------------------------------------------------------|
28 | | Intro to GitBOM | [View][intro] | These slides predate the project renaming from GitBOM to OmniBOR. |
29 |
30 | [faq]: @/resources/faq.md
31 | [cloud_native_security_con_eu_2022]: https://www.youtube.com/watch?v=2SSkNLWL4UM
32 | [supply_chain_security_con_na_2021]: https://www.youtube.com/watch?v=GKyrsDOse6s&t=546s
33 | [intro]: https://docs.google.com/presentation/d/1fSyRyvYhRYQr-RGm5N1TFcSLQdNV7YYtZmaf2xVwjy4/edit?usp=sharing
34 | [whitepaper]: @/resources/whitepaper.md
35 |
--------------------------------------------------------------------------------
/templates/project.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 |
3 | {% block hero %}
4 | {% endblock %}
5 |
6 | {% block content %}
7 |
41 | {% endblock %}
42 |
43 | {% block sidebar %}
44 | {% endblock %}
45 |
46 | {% block body_scripts %}
47 |
60 | {% endblock %}
61 |
--------------------------------------------------------------------------------
/content/third-party.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Third-Party Integrations
3 | ---
4 |
5 | One of the goals of the OmniBOR project is to integrate the generation of
6 | [Input Manifests][input-manifests] into compilers, linkers, archivers,
7 | bundlers, and containerization tools people already use today. Ideally, we can
8 | achieve a future where Input Manifests are built automatically, and where
9 | anyone distributing software distributes the Input Manifests alongside it. This
10 | is how we can collectively achieve a future of universal transparency through
11 | Artifact Dependency Graphs.
12 |
13 | ## Build Tool Patches
14 |
15 | This goal involves working with a large number of open source projects and
16 | communities. In the spirit of open source, collaboration, and putting in the
17 | work, the OmniBOR Project maintains patches for some existing build tools.
18 | Long-term, our plan is to work with these projects to get OmniBOR generation
19 | integrated upstream.
20 |
21 | Today, we maintain patches for the following tools:
22 |
23 | | Name | Patch |
24 | |:----------|:-----------------|
25 | | GCC | [Link][gcc] |
26 | | LLVM | [Link][llvm] |
27 | | Binutils | [Link][binutils] |
28 | | GNU Patch | [Link][patch] |
29 |
30 | If you're interested in contributing to these patches, helping maintain them,
31 | or getting this work upstreamed into their respective projects, we'd love for
32 | you to [get involved][contribute]!
33 |
34 | [input-manifests]: @/docs/input-manifests.md
35 | [gcc]: https://github.com/omnibor/gcc-omnibor
36 | [llvm]: https://github.com/omnibor/llvm-omnibor
37 | [binutils]: https://github.com/omnibor/binutils-omnibor
38 | [patch]: https://github.com/omnibor/patch-omnibor
39 | [contribute]: @/contribute.md
40 |
--------------------------------------------------------------------------------
/netlify.toml:
--------------------------------------------------------------------------------
1 | #============================================================================
2 | # General Build Configuration
3 | #----------------------------------------------------------------------------
4 |
5 | [build]
6 | publish = "public"
7 |
8 | #============================================================================
9 | # Deployment Contexts
10 | #
11 | # Learn more: https://docs.netlify.com/configure-builds/file-based-configuration/#deploy-contexts
12 | #----------------------------------------------------------------------------
13 |
14 | # Deploys coming from `main`.
15 | [context.production]
16 | command = "zola build && tailwindcss -i styles/main.css -o public/main.css"
17 | environment = {ZOLA_VERSION = "0.13.0"}
18 |
19 | # Deploys coming from pull requests.
20 | [context.deploy-preview]
21 | command = "zola build --base-url $DEPLOY_PRIME_URL && tailwindcss -i styles/main.css -o public/main.css"
22 | environment = {ZOLA_VERSION = "0.13.0"}
23 |
24 | # Deploys coming from branches other than `main`.
25 | [context.branch-deploy]
26 | command = "zola build --base-url $DEPLOY_PRIME_URL && tailwindcss -i styles/main.css -o public/main.css"
27 | environment = {ZOLA_VERSION = "0.13.0"}
28 |
29 | #============================================================================
30 | # Redirects
31 | #----------------------------------------------------------------------------
32 |
33 | [[redirects]]
34 | from = "/community"
35 | to = "/contribute"
36 |
37 | [[redirects]]
38 | from = "/glossary/git/#git-ref"
39 | to = "/glossary/git/#git-object-id-gitoid"
40 |
45 | [[redirects]]
46 | from = "https://gitbom.dev/*"
47 | to = "https://omnibor.io/:splat"
48 | status = 301
49 | force = true
50 |
--------------------------------------------------------------------------------
/templates/includes/header.html:
--------------------------------------------------------------------------------
1 |
2 |
37 |
38 |
--------------------------------------------------------------------------------
/content/blog/2024-10-16-new-website.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: A New OmniBOR Website
3 | authors:
4 | - Andrew Lilley Brinker
5 | ---
6 |
7 | The OmniBOR Project has a new website, new documentation, and for the first
8 | time, a project blog!
9 |
10 |
11 |
12 | If you're reading this, then you can already see the new look of the OmniBOR
13 | website. This new website does include a substantial redesign, intended to work
14 | better on both desktop and mobile, but it also includes a substantial addition
15 | of new content!
16 |
17 | The project website now hosts the [OmniBOR specification][spec]. Our intent is
18 | to keep this updated with any new release of the specification. If you'd like
19 | to track the pre-release updates of new versions, we also link to the
20 | [`omnibor/spec`][spec_repo] repository.
21 |
22 | We've also added new [documentation][docs] intended to provide a more
23 | accessible explanation of [Artifact IDs][artifact-ids],
24 | [Input Manifests][input-manifests], and the other key concepts underlying what
25 | we're doing with OmniBOR. If you're brand new to OmniBOR, we recommend you
26 | start with this new documentation before diving into the more detailed
27 | specification.
28 |
29 | Finally, the new site has [a blog][blog]! This is the first post on that blog. In the
30 | future we'll be using this to provide updates on the project, to share
31 | information about work we're doing around the open source ecosystem, and more.
32 |
33 | As always, OmniBOR is a project that benefits enormously from the
34 | involvement of a broad community of people. If you're interested in solving
35 | the challenge of universal software identity, we'd love for you to [get
36 | involved][contribute]!
37 |
38 | [spec]: @/spec/_index.md
39 | [spec_repo]: https://github.com/omnibor/spec
40 | [docs]: @/docs/_index.md
41 | [artifact-ids]: @/docs/artifact-ids.md
42 | [input-manifests]: @/docs/input-manifests.md
43 | [blog]: @/blog/_index.md
44 | [contribute]: @/contribute.md
45 |
--------------------------------------------------------------------------------
/templates/blog_post.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 |
3 | {% block hero %}
4 | {% endblock %}
5 |
6 | {% block content %}
7 |
14 | {% if page.authors %}
15 |
16 | {% set num_authors = page.authors | length %}
17 | {% if num_authors != 0 %}
18 | Written by
19 | {% for author in page.authors -%}
20 | {# I WILL produce a proper Oxford comma #}
21 | {%- if not loop.first and not loop.last -%}, {%- elif loop.last and num_authors != 1 -%}{% if num_authors >= 3%},{% endif %} and {%- endif -%}{{ author }}
22 | {%- endfor %}
23 | {% endif %}
24 |
25 | {% endif %}
26 |
27 | Posted on {{ page.date | date(format="%B %-d, %Y") }}
28 |
29 |
30 |
31 |
56 | {{ page.content | safe }}
57 |
58 | {% endblock %}
59 |
60 | {% block sidebar %}
61 | {% endblock %}
62 |
--------------------------------------------------------------------------------
/content/contribute.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Contribute"
3 | ---
4 |
5 | The OmniBOR Project is happy to accept contributions from anyone! Part of our
6 | ethos is that __OmniBOR is for everyone__. Below you'll find some guidance for
7 | making contributions to different parts of the OmniBOR project!
8 |
9 | ## Contributing to the OmniBOR Specification
10 |
11 | The OmniBOR specification is the central location where we define Artifact IDs,
12 | Input Manifests, and how they should be constructed, stored, and distributed.
13 |
14 | Contributions to the specification can take many shapes. For small,
15 | non-semantic contributions, like rewriting prose for clarity or fixing typos
16 | and grammatical mistakes, we encourage you to just make a Pull Request directly
17 | to the specification repository on GitHub.
18 |
19 | For larger contributions, especially those which change the semantic meaning of
20 | the spec, we work by a consensus model which involves:
21 |
22 | 1. Proposing changes
23 | 2. Discussing those changes among the OmniBOR Working Group
24 | 3. Modifying proposals based on feedback
25 | 4. If consensus is reached, making edits to the specification
26 |
27 | The process of gathering feedback currently happens largely synchronously during
28 | the weekly OmniBOR Working Group calls, held on Zoom every Monday from 10am to
29 | 11am Pacific Time.
30 |
31 | ### Working Group Meetings
32 |
33 | [iCal-format calendar subscription](https://calendar.google.com/calendar/ical/rqmtkd0ucekn9obagmo9v4b6s4%40group.calendar.google.com/public/basic.ics)
34 |
35 | {{ calendar(id = "rqmtkd0ucekn9obagmo9v4b6s4") }}
36 |
37 | ## Contributing Code to the OmniBOR Project
38 |
39 | The OmniBOR Project maintains a number of existing software projects, including:
40 |
41 | - First-party implementations of the OmniBOR specification
42 | - Patches for third-party tools to support producing OmniBOR data
43 | - Additional tools to support interacting with OmniBOR data
44 |
45 | We are happy to accept contributions to any of these!
46 |
47 | Each project has its own license and may have specific unique contribution
48 | guidance, so you should review the policies of the specific repository before
49 | contributing. For first-party implementations, we default to permissive
50 | open source licenses like Apache 2.0 or MIT. For patches to third-party tools,
51 | we match the licensing of the upstream tool, as our goal is to eventually
52 | merge any patches we maintain back into upstream so others can make use of our
53 | changes.
54 |
--------------------------------------------------------------------------------
/templates/includes/footer.html:
--------------------------------------------------------------------------------
1 |
2 |
12 | OmniBOR defines two key concepts, Artifact IDs and
13 | Input Manifests, that enable anyone to
14 | independently produce the same identifier for any software artifact, and to detect any artifact built
15 | with vulnerable inputs.
16 |
17 |
18 | {# Artifact ID block #}
19 |
20 |
21 |
{{ icon::icon(name="tag", classes="w-5 h-5") }}
22 |
23 |
Artifact IDs
24 |
Reproducible identifiers based only on an artifact itself.
42 | {% endblock %}
43 |
44 | {% block content %}
45 | {% endblock %}
46 |
47 | {% block sidebar %}
48 | {% endblock %}
49 |
--------------------------------------------------------------------------------
/content/spec/v0.1/annex-b.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: OmniBOR Specification, Version 0.1
3 | extra:
4 | subtitle: Annex B — ELF Embedding
5 | ---
6 |
7 | Annex B contains a method of embedding Input Manifest Identifiers into ELF
8 | files.
9 |
10 | ## Input Manifest Identifiers
11 |
12 | Input Manifest Identifiers are Artifact Identifiers (Git Object Identifiers
13 | \[GitOIDs\]) for Artifact Input Manifests. They identify an Artifact Input
14 | Manifest and MAY be embedded into an artifact to relate the artifact to its
15 | Artifact Input Manifests.
16 |
17 | If an ELF artifact contains an embedded Input Manifest Identifier, then
18 | implementations MUST conform to the format specified in this document.
19 |
20 | Note that multiple Input Manifests MUST be produced for a single artifact,
21 | reflecting the use of different hash functions to produce the Artifact
22 | Identifiers.
23 |
24 | ## Input Manifest Identifier persistence in ELF Objects/Executables
25 |
26 | Input Manifest Identifiers MUST be persisted by build tools when they build
27 | an artifact and produce an Artifact Input Manifest for that artifact.
28 |
29 | When persisting Input Manifest Identifiers into an ELF object or an ELF
30 | executable, the build tool MUST create a [section][elf_section]
31 | `.note.omnibor` and place the Input Manifest Identifiers in the descriptor
32 | field of the note entry. This section MUST be of type `SHT_NOTE` and MUST have
33 | the attribute `SHF_ALLOC`. Multiple Note entries MUST be created, one for each
34 | Artifact Identifier type when multiple Artifact Identifier types are involved.
35 | Each note entry MUST contain the following fields in the same order as given
36 | below:
37 |
38 | 1. `namesz` (4 bytes): This field MUST be set to a value of `8`, the length of
39 | the 'owner' field `OMNIBOR\0` in bytes.
40 | 2. `descz` (4 bytes): This field MUST contain the length of the Input Manifest
41 | Identifier in bytes, including a byte for the null terminator.
42 | 3. `type` (4 bytes): This field MUST contain the value associated with one of
43 | the reserved Artifact Identifier types. The values for the reserved types
44 | are in the range of `0x00000000` to `0x7fffffff`. Permissible types with
45 | reserved values are:
46 |
47 | ```
48 | NT_GITOID_BLOB_SHA1 = 0x1,
49 | NT_GITOID_BLOB_SHA256 = 0x2,
50 | ```
51 |
52 | 4. `owner` (8 bytes): This field MUST contain the string `OMNIBOR\0`, padded to
53 | 8 bytes.
54 | 5. `descriptor`: This field MUST contain the Input Manifest Identifiers as raw
55 | bytes. The length of this field is the same as the value in the `descz` field.
56 |
57 | When recording multiple Input Manifest Identifiers in the note section:
58 |
59 | 1. There MUST be only one note entry for each Input Manifest Identifier type.
60 | 2. The note entries MUST be in ascending order of Input Manifest Identifier
61 | type.
62 |
63 | Conforming build tools MUST generate all Input Manifest Identifier types,
64 | currently SHA1 and SHA256 Artifact Identifiers.
65 |
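As an illustration only (not part of the normative text), the note entry layout described above can be sketched in Python. The sketch assumes little-endian byte order and that the caller passes the descriptor bytes exactly as they should appear in the note, so `descz` is simply their length. The function names `build_note_entry` and `build_note_section` are hypothetical, and section attributes such as `SHT_NOTE` and `SHF_ALLOC` are left to the tool that writes the ELF file.

```python
import struct
from typing import Mapping

# Reserved note types listed above.
NT_GITOID_BLOB_SHA1 = 0x1
NT_GITOID_BLOB_SHA256 = 0x2

# The 'owner' field: "OMNIBOR" plus a null terminator, 8 bytes in total.
OWNER = b"OMNIBOR\0"


def build_note_entry(note_type: int, descriptor: bytes) -> bytes:
    """Pack one note entry: namesz, descz, type, owner, descriptor.

    Assumption: fields are little-endian and `descriptor` is already in
    its final on-disk form.
    """
    header = struct.pack("<III", len(OWNER), len(descriptor), note_type)
    return header + OWNER + descriptor


def build_note_section(identifiers: Mapping[int, bytes]) -> bytes:
    """Concatenate one entry per identifier type, in ascending type order."""
    return b"".join(
        build_note_entry(note_type, identifiers[note_type])
        for note_type in sorted(identifiers)
    )
```

Keying the input by note type means there can only be one entry per type, and sorting the keys gives the ascending order required above.
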
66 | [elf_section]: https://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-PDA/LSB-PDA.junk/sections.html
67 |
--------------------------------------------------------------------------------
/content/spec/v0.1/annex-a.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: OmniBOR Specification, Version 0.1
3 | extra:
4 | subtitle: Annex A — File System Storage
5 | ---
6 |
7 | Annex A documents known methods of persisting OmniBOR Documents to various
8 | stores.
9 |
10 | ## Input Manifest persistence by a Build Tool to its local filesystem
11 |
12 | If a build tool persists an Input Manifest to its local filesystem, the build
13 | tool should write out the Input Manifest to
14 | `${OMNIBOR_DIR}/objects/${Artifact Identifier Type uri prefix with ':' replaced by '_'}/${Input Manifest Identifier:0:2}/${Input Manifest Identifier:2:}`
15 | where `${Input Manifest Identifier}` is the Input Manifest Identifier in lowercase
16 | hexadecimal with leading zeros NOT suppressed.
17 |
18 | Example:
19 |
20 | If `OMNIBOR_DIR=.omnibor` then the Input Manifest with `gitoid:blob:sha1` Input
21 | Manifest Identifier `0e8efd4cdf0d5bafcfcae658c2662a73b199b301` would be stored
22 | in:
23 |
24 | ```
25 | .omnibor/objects/gitoid_blob_sha1/0e/8efd4cdf0d5bafcfcae658c2662a73b199b301
26 | ```
27 |
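The path rule above can be sketched as a small Python helper. This is an illustration under stated assumptions rather than a reference implementation: the function name `manifest_path` is hypothetical, and the identifier is assumed to be passed as a hexadecimal string.

```python
from pathlib import PurePosixPath


def manifest_path(omnibor_dir: str, id_type_prefix: str, manifest_id: str) -> PurePosixPath:
    """Return where an Input Manifest would be written under `omnibor_dir`.

    `id_type_prefix` is the Artifact Identifier type URI prefix,
    for example "gitoid:blob:sha1".
    """
    manifest_id = manifest_id.lower()            # lowercase hexadecimal
    type_dir = id_type_prefix.replace(":", "_")  # ':' replaced by '_'
    # The first two hex characters name a directory; the rest name the file.
    return PurePosixPath(omnibor_dir, "objects", type_dir, manifest_id[:2], manifest_id[2:])


path = manifest_path(
    ".omnibor",
    "gitoid:blob:sha1",
    "0e8efd4cdf0d5bafcfcae658c2662a73b199b301",
)
```

Called with the values from the example, the helper yields the same path shown above.
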
28 | ## Build tool persistence of related metadata
29 |
30 | A build tool may persist additional metadata that makes reference to the
31 | Artifact Dependency Graph (ADG). It should persist such metadata to a
32 | subdirectory of the directory to which the output artifact is being
33 | written of the form: `${OMNIBOR_DIR}/metadata/${context}/`.
34 |
35 | For metadata specific to a particular build tool `${context}` should be a name
36 | uniquely associated with the build tool. For example:
37 |
38 | - `${OMNIBOR_DIR}/metadata/llvm`
39 | - `${OMNIBOR_DIR}/metadata/clang`
40 | - `${OMNIBOR_DIR}/metadata/go`
41 | - `${OMNIBOR_DIR}/metadata/rustc`
42 | - `${OMNIBOR_DIR}/metadata/gcc`
43 |
44 | Build tools should report their selection of `${context}` subdirectory name to
45 | the OmniBOR spec for inclusion in a list to preclude `${context}` collision.
46 |
47 | Metadata persisted by multiple build tools in the same way should be documented
48 | in a specification for that metadata. Such specs must include the `${context}`
49 | for that metadata. Such specs should be reported to the OmniBOR spec for
50 | inclusion in a list to preclude `${context}` collision. For example, if a
51 | group of build tools decide to store metadata about file locations in a common
52 | format, they might choose to define a `${context}` `filelocation` in which case
53 | the metadata would be stored in `${OMNIBOR_DIR}/metadata/filelocation`.
54 |
55 | Subdirectory structure, filenaming, and file schema below
56 | `${OMNIBOR_DIR}/metadata/${context}/` are at the discretion of the build tool
57 | for build tool specific metadata or the metadata spec for common metadata.
58 |
59 | ## Build tool selection of `OMNIBOR_DIR`
60 |
61 | `OMNIBOR_DIR` may be set by the following methods, listed in order of
62 | precedence:
63 |
64 | 1. A build tool specific flag
65 | 2. A non-empty env variable named `OMNIBOR_DIR`
66 |
67 | The absence of specification of a location to write OmniBOR data via either the
68 | build tool specific flag or `OMNIBOR_DIR` variable may be taken as a signal to
69 | skip OmniBOR generation.
70 |
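A minimal sketch of this precedence in Python follows. The function name `resolve_omnibor_dir` and the way the flag value reaches it are assumptions made for illustration; each build tool defines its own flag.

```python
import os
from typing import Mapping, Optional


def resolve_omnibor_dir(
    flag_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the OmniBOR directory, or return None to skip OmniBOR generation."""
    # 1. A build tool specific flag takes precedence.
    if flag_value:
        return flag_value
    # 2. Otherwise fall back to a non-empty OMNIBOR_DIR environment variable.
    env_value = env.get("OMNIBOR_DIR", "")
    if env_value:
        return env_value
    # Neither was given: the tool may treat this as a signal to skip generation.
    return None
```

Passing the environment as a parameter keeps the helper easy to exercise without changing the real process environment.
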
--------------------------------------------------------------------------------
/content/project.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Project"
3 | template: project.html
4 | ---
5 |
6 | The OmniBOR project consists of:
7 |
8 | - The [OmniBOR specification][spec]
9 | - A set of first-party OmniBOR implementations, including:
10 | - [`omnibor-rs`][rust]: A Rust implementation
11 | - [`omnibor-go`][go]: A Go implementation
12 | - [`omnibor-dotnet`][dotnet]: A .NET implementation
13 | - [`omnibor-py`][python]: A Python implementation
14 | - A set of patches for third-party software to add OmniBOR support, including:
15 | - [`patch-omnibor`][patch]: A patched version of GNU `patch`.
16 | - [`gcc-omnibor`][gcc]: A patched version of GCC.
17 | - [`binutils-omnibor`][binutils]: A patched version of `binutils`.
18 | - [`llvm-omnibor`][llvm]: A patched version of LLVM.
19 | - Miscellaneous other tools related to OmniBOR, including:
20 | - [`bomsh`][bomsh]: Shell scripts for interacting with OmniBOR data.
21 | - [`jbor`][jbor]: A Java agent to log OmniBOR Artifact IDs.
22 | - The [OmniBOR website][site]
23 | - OmniBOR project spaces, including:
24 | - GitHub Discussions on any OmniBOR repositories.
25 | - The weekly OmniBOR Working Group meetings.
26 | - Any other meetings or discussion spaces operated by the OmniBOR project.
27 |
28 | ## Code of Conduct
29 |
30 | All OmniBOR projects and spaces are covered by the [OmniBOR Project Code of
31 | Conduct][coc].
32 |
33 | ## Governance
34 |
35 | The OmniBOR Project is governed by consensus among active project
36 | participants. Generally, being an "active project participant" means
37 | participating in the weekly OmniBOR Working Group meetings, currently
38 | held over Zoom from 10am to 11am Pacific Time on Mondays.
39 |
40 | Proposals for improvements to the project are generally discussed
41 | during these meetings, and when consensus is reached on a design, a formal
42 | proposal is made to the relevant repository and the change is merged.
43 |
44 | The project does have a Core Team of long-term active participants. The
45 | OmniBOR Core Team currently consists of:
46 |
47 |
81 |
82 |
88 |
89 | {# Smooth scrolling on anchor click for the current page. #}
90 |
105 |
106 | {% block body_scripts %}{% endblock %}
107 |
108 |
109 |
--------------------------------------------------------------------------------
/content/resources/faq.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Frequently Asked Questions
3 | ---
4 |
5 | This FAQ is a good-faith attempt by the OmniBOR community to answer common
6 | OmniBOR questions. This document will evolve over time and as the community
7 | grows.
8 |
9 | To propose a new question, please open [an issue][site_issue].
10 |
11 | To propose a question-answer pair, please open [a pull request][site_pr]
12 | updating this file.
13 |
14 | ## GitOIDs are based on SHA-1. Isn't SHA-1 broken?
15 |
16 | Git itself still uses SHA-1, and we'd like to be compatible with Git and tools
17 | that already use GitOIDs. We plan to update to another hashing algorithm
18 | if/when Git updates.
19 |
20 | [shattered.io][shattered] has found an impressive attack on SHA-1 in PDF files.
21 | There are [misconceptions about what that means][shattered_misconceptions].
22 |
23 | GitHub has published some analysis of its [implications for git][shattered_git].
24 |
25 | ## Git has been considering moving to SHA-256 for some time. Why doesn't OmniBOR simply adopt that?
26 |
27 | Great question. We might — when Git does.
28 |
29 | ## Why isn't information about the compiler or linker included in the OmniBOR?
30 |
31 | Our view is that build environment information that does not affect the build
32 | output should not be represented in the OmniBOR. Doing so would invalidate the
33 | characteristics of **Uniqueness** and **Artifact Identity**.
34 |
35 | ## Why isn't metadata included in the OmniBOR?
36 |
37 | OmniBOR seeks to have the following characteristics:
38 |
39 | 1. **Artifact Equivalence**: Two artifacts are equivalent if and only if they
40 | are bit-for-bit identical.
41 | 2. **Artifact Identity**: Independent parties derive the same artifact identity
42 | when presented with equivalent artifacts.
43 | 3. **Immutability**: An identified artifact cannot be modified without changing
44 | its identity.
45 | 4. **Uniqueness**: An artifact can have precisely *one* artifact identity graph.
46 | All equivalent artifacts have the same graph.
47 |
48 | The uniqueness requirement is what drives the exclusion of metadata from
49 | OmniBOR.
50 |
51 | ## Will the generation of artifact dependency graphs slow down build processes? Will the graphs be very large?
52 |
53 | We don't think so and would be delighted to receive data from very large
54 | projects that would either challenge or validate this assumption.
55 |
56 | ## What about files with duplicate hashes?
57 |
58 | We don't think this will be a problem because OmniBOR does not include any
59 | metadata, such as provenance, timestamp, and license, which are the domain of SBOMs.
60 |
61 | While duplicate hashes of empty files and regularly copied files (such as
62 | LICENSE files) are guaranteed to occur, this does not affect the security
63 | properties of OmniBOR.
64 |
65 | ## How do [Software Heritage Foundation][swh] identifiers relate to OmniBOR Identifiers?
66 |
67 | [Software Heritage Foundation Identifiers][swhid] use
68 | [Git Object IDs][swhid_gitoid] as part of their
69 | [core identifiers][swhid_coreids]:
70 |
71 | > SWHIDs for contents, directories, revisions, and releases are, at present,
72 | > compatible with the Git way of computing identifiers for its objects. The
73 | > `<object_id>` part of a SWHID for a content object is the Git blob identifier
74 | > of any file with the same content; for a revision it is the Git commit
75 | > identifier for the same revision, etc.
76 |
77 | OmniBOR uses Git Object IDs as the entire Artifact ID.
78 |
79 | Whereas SWHIDs' core identifier includes additional metadata (see
80 | [SWHID Syntax][swhid_metadata]):
81 |
82 | ```
83 | <identifier_core> ::= "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
84 | ```
85 |
86 | … the Git Object ID is the object's identifier in an Input Manifest.
87 |
88 | ```
89 | <identifier_core> ::= <object_id>
90 | ```
91 |
92 | The scheme in which SWHIDs are used is also different from the scheme in which
93 | OmniBOR Artifact IDs are used in an Input Manifest.
94 |
95 | [site_issue]: https://github.com/omnibor/site/issues
96 | [site_pr]: https://github.com/omnibor/site/pulls
97 | [shattered]: https://shattered.io/
98 | [shattered_misconceptions]: https://manishearth.github.io/blog/2017/02/26/clarifying-misconceptions-about-shattered/
99 | [shattered_git]: https://github.blog/2017-03-20-sha-1-collision-detection-on-github-com/
100 | [swh]: https://www.softwareheritage.org/
101 | [swhid]: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers
102 | [swhid_gitoid]: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#git-compatibility
103 | [swhid_coreids]: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#core-identifiers
104 | [swhid_metadata]: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#syntax
105 |
--------------------------------------------------------------------------------
/content/docs/input-manifests.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Input Manifests
3 | template: doc.html
4 | ---
5 |
6 | Input Manifests, alongside [Artifact IDs][artifact_ids], are one half of the
7 | equation for OmniBOR. Input Manifests are how OmniBOR records the inputs used
8 | to build software artifacts, and form the basis for how OmniBOR can allow
9 | consumers of software to build fine-grained _Artifact Dependency Graphs_ (ADGs)
10 | that enable rapid discovery of vulnerable components and much more.
11 |
12 | ## What is an Input Manifest?
13 |
14 | An Input Manifest is a small text file which records information about
15 | the inputs used to build a software artifact. By "inputs" we mean anything
16 | provided to a build tool in order to produce the artifact. For example, when
17 | building a project written in the C programming language, the Input Manifest
18 | for a `.o` file (an object file) built from an associated `.c` file (a
19 | source file) would record the Artifact ID of the
20 | `.c` file.
21 |
22 | ## What do Input Manifests Look Like?
23 |
24 | Input Manifests look something like this:
25 |
26 | ```
27 | gitoid:blob:sha256\n
28 | 09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772\n
29 | 230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61\n
30 | 2f4a51b16b76bbc87c4c27af8ae062b1b50b280f1ab78e3eec155334588dc88e manifest 4f3a822f776412c049dda53c3277bf2225b51b805ce8a99222af23a7d9f55636\n
31 | c71d239df91726fc519c6eb72d318ec65820627232b2f796219e87dcf35d0ab4\n
32 | f47ffb3518f236eea6525fd29f057ddd5cda1bb803ccc662e6bc5925afd1e4af\n
33 | ```
34 |
35 | Every Input Manifest starts with a header that provides some information about
36 | the Artifact IDs used throughout the rest of the manifest. Every Artifact ID
37 | includes `blob` as its object type, and `sha256` as its hash type, and _all_
38 | Artifact IDs in a single Input Manifest must have the same hash type as all
39 | others. This is to ensure in the future, if Artifact IDs are ever updated to
40 | support more hash algorithms, that a single Input Manifest only uses one hash
41 | algorithm at a time.
42 |
43 | Then we have Artifact IDs for each input artifact used to build the "target
44 | artifact" being described. These Artifact IDs are listed in lexical order.
45 |
46 | Each line is separated by a single newline character (`\n`) regardless of
47 | the user's current platform. This is because we need these Input Manifests to
48 | always be bit-for-bit identical regardless of where they're derived.
49 |
50 | The one additional wrinkle is that if an input artifact itself has an Input
51 | Manifest, we can include the Artifact ID of the input artifact's Input Manifest
52 | as well.
53 |
54 | _This_ is a key part of the secret sauce of OmniBOR, which we'll explain in
55 | the next section.
56 |
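As a rough sketch of the layout described above, the following Python renders
the text of an Input Manifest from a list of inputs. The function name
`build_input_manifest` and the `(artifact_hex, manifest_hex)` pair shape are
assumptions made for illustration and are not part of any OmniBOR tooling; the
` manifest ` separator simply follows the example shown earlier on this page.
The specification remains the authority on the exact format.

```python
HEADER = "gitoid:blob:sha256"


def build_input_manifest(inputs):
    """Render manifest bytes from (artifact_hex, manifest_hex_or_None) pairs."""
    lines = [HEADER]
    # Entries are listed in lexical order of each input's Artifact ID.
    for artifact_hex, manifest_hex in sorted(inputs, key=lambda pair: pair[0]):
        if manifest_hex is None:
            lines.append(artifact_hex)
        else:
            # The input has its own Input Manifest, so record that ID too.
            lines.append(f"{artifact_hex} manifest {manifest_hex}")
    # Every line ends in "\n", never the platform's native line ending.
    return ("\n".join(lines) + "\n").encode("ascii")
```

Returning bytes rather than a string keeps the output identical across
platforms, since no newline translation can happen on the way to disk.
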
57 | ## From Input Manifests to Artifact Dependency Graphs
58 |
59 | There are two key ideas that we've not discussed yet which turn OmniBOR from
60 | a lightweight method for listing IDs of input files into a
61 | Merkle tree for a software artifact's complete dependency tree:
62 |
63 | - Input Manifests record Artifact IDs of their inputs and (if available) their
64 | inputs' own Input Manifests
65 | - The Artifact ID of an Input Manifest should be embedded in the artifact
66 | itself at build time.
67 |
68 | These two details, when implemented in tooling, mean that artifacts become
69 | cryptographically tied to a description of their own inputs, which can't be
70 | modified without detection. Because an Artifact ID is based on the contents of
71 | an artifact, if the artifact's contents include the Artifact ID of its own
72 | Input Manifest, any change to that manifest results in a change in the
73 | artifact's own Artifact ID.
74 |
75 | Thinking this through, this means a change in a dependency anywhere in the
76 | dependency graph results in changes of Artifact IDs for _anything_ derived from
77 | it, no matter how many steps removed it is!
78 |
79 | Some of you may already be thinking this sounds like a form of Merkle
80 | tree, and you're right! OmniBOR's Artifact IDs and Input Manifests come
81 | together to form a Merkle tree of all dependencies used to construct an
82 | artifact. If a software consumer receives all the Input Manifests for an
83 | artifact and its dependencies, they can not just detect changes in the
84 | Artifact ID of the artifact itself, but they can use the manifests to drill
85 | down to exactly what changed and where.
86 |
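To make the propagation concrete, here is a small, simplified sketch in Python.
Everything in it is hypothetical: the `build` function stands in for a real
build tool, the `omnibor:` marker is an invented way of embedding the manifest's
ID, and the manifests omit the optional reference to an input's own manifest.
The build output is deliberately held constant, so the only thing that varies is
the source file at the bottom of the chain.

```python
import hashlib


def gitoid_hex(data: bytes) -> str:
    # GitOID construction for a blob: hash "blob <length>\0" followed by the data.
    prefix = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha256(prefix + data).hexdigest()


def manifest_for(input_ids):
    # Simplified manifest: header plus sorted input IDs, "\n"-terminated.
    lines = ["gitoid:blob:sha256"] + sorted(input_ids)
    return ("\n".join(lines) + "\n").encode("ascii")


def build(body: bytes, input_ids) -> str:
    # Hypothetical build step: embed the manifest's ID in the output artifact.
    manifest_id = gitoid_hex(manifest_for(input_ids))
    artifact = body + b"\nomnibor:" + manifest_id.encode("ascii")
    return gitoid_hex(artifact)


def build_executable(source: bytes) -> str:
    # source file -> object file -> executable
    object_id = build(b"object code", [gitoid_hex(source)])
    return build(b"executable code", [object_id])


before = build_executable(b"int main(void) { return 0; }\n")
after = build_executable(b"int main(void) { return 1; }\n")
print(before != after)
```

Changing the source changes the object file's manifest, which changes the
object file's ID, which changes the executable's manifest and therefore the
executable's ID, even though the "compiled" bytes in this toy example never
change.
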
87 | ## How Should Input Manifests be Produced?
88 |
89 | Our dream as a project is to get changes upstreamed into popular software build
90 | tools like compilers, containerization tools, linkers, archivers, and more to
91 | be able to automatically produce Input Manifests and, whenever possible, to
92 | embed their Artifact IDs into the artifacts being constructed.
93 |
94 | We're also working on tooling to enable users of these tools to wrap them so
95 | they produce Artifact IDs today, though this is not yet ready.
96 |
97 | ## What's Next?
98 |
99 | We're still working to bring to fruition the implementations of OmniBOR that
100 | will enable others to start integrating it and producing Artifact IDs and
101 | Input Manifests. If that sounds interesting to you, [come help us][contribute]!
102 |
103 | [artifact_ids]: @/docs/artifact-ids.md
104 | [contribute]: @/contribute.md
105 |
--------------------------------------------------------------------------------
/content/docs/artifact-ids.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Artifact IDs
3 | template: doc.html
4 | ---
5 |
6 | NOTE: The following explanations are based on the latest version of the
7 | OmniBOR specification. If any conflicts arise between the explanations given
8 | here and the OmniBOR specification, the specification supersedes these
9 | explanations.
10 |
11 | ## What are Artifact IDs?
12 |
13 | Artifact IDs are how OmniBOR solves the challenge of reproducibly identifying
14 | software artifacts. What we mean by "reproducible" is that anyone with access
15 | to the artifact can produce an identifier for it, and that identifier will
16 | always be the same.
17 |
18 | ## How are Artifact IDs Defined?
19 |
20 | OmniBOR Artifact IDs achieve this by choosing Git Object Identifiers (GitOIDs)
21 | as their identification scheme. GitOIDs are how Git identifies objects it
22 | tracks, and they're formed using a choice of a hash function, and then producing
23 | hashes using what we'll call the "GitOID construction".
24 |
25 | ### Choice of Hash Function
26 |
27 | Today, Git supports three hash functions:
28 |
29 | - SHA-1
30 | - SHA-1CD
31 | - SHA-256
32 |
33 | SHA-1 is the hash function most people are familiar with in Git, and for a long
34 | time it was the _only_ hash function Git supported. However, since the
35 | discovery of the SHAttered attack against SHA-1, the Git project did two things:
36 |
37 | 1. Introduced SHA-1CD to mitigate the risk of collisions arising from attempts
38 | to exploit the SHAttered attack.
39 | 2. Began a process of transitioning to SHA-256 as the basis for all GitOIDs,
40 | which is still underway.
41 |
42 | SHA-1CD is _almost_ equivalent to SHA-1. In essence, SHA-1CD attempts to
43 | detect attempts to engineer collisions (hence the "CD," for "collision
44 | detection") and modifies the output of the hash in those cases to break the
45 | collision. For Git, this kind of modification is fine, and so Git in recent
46 | versions usually uses SHA-1CD under the hood by default, though Git
47 | documentation still just calls it SHA-1. In the context of a single Git
48 | repository, the distinction doesn't generally matter.
49 |
50 | However, for the purposes of achieving a universally reproducible identifier,
51 | we do need to care about the distinction between SHA-1 and SHA-1CD, which is
52 | why we list them separately here.
53 |
54 | The SHA-256 transition in Git has been moving slowly, with successive versions
55 | periodically adding more support to smooth the transition. Nonetheless, progress
56 | is generally recognized as slow.
57 |
58 | __OmniBOR Artifact IDs only support SHA-256 today.__
59 |
60 | This is important. While SHA-1 (really, SHA-1CD) is in wide use in existing Git
61 | data today, we expect that in the long run, it will be phased out. The Git
62 | project itself continues its slow movement along the SHA-256 transition, and
63 | (perhaps more importantly) we anticipate there will likely be government
64 | standards in the future which mandate a move away from SHA-1, in a similar
65 | fashion to prior widespread mandates to move away from MD5.
66 |
67 | While we could in theory support multiple identifiers at the same time, even
68 | just supporting two would double the complexity of OmniBOR operationally for
69 | producers and consumers. Worse, given the SHA-1 / SHA-1CD split, we'd likely
70 | need to support all three if we're going to support SHA-1.
71 |
72 | The specification does reserve the right to add support for alternative hashes
73 | in the future if, for example, SHA-256 is later found to be broken in a manner
74 | similar to how SHA-1 can be broken today.
75 |
76 | ### The "GitOID Construction"
77 |
78 | GitOIDs are constructed not just by hashing the data of the object itself.
79 | Instead, a small "prefix string" is hashed in first, with the following
80 | structure:
81 |
82 | ```
83 | <object type>⎵<length in bytes>\0
84 | ```
85 |
86 | Here `⎵` refers to the ASCII space character (`0x20`), replaced with a visual
87 | character for clarity.
88 |
89 | This prefix string has two purposes in Git. First, the object type (which can
90 | be `blob`, `tree`, `commit`, or `tag`) indicates the type of the data being
91 | stored based on Git's object model. This helps differentiate hashes for the
92 | different types of objects. Second, the length being hashed in helps provide
93 | additional protection against collisions. With this length included, an attacker
94 | trying to engineer a collision in Git's object storage would need to account
95 | for how the length of the colliding data impacts the hash as well. The
96 | SHAttered attack specifically relies on extensions of the original data in
97 | highly flexible formats like PDFs, which this is an effective protection
98 | against.
99 |
100 | For Artifact IDs, the object type is always `blob`, so for our purposes
101 | the prefix string is
102 |
103 | ```
104 | blob⎵<length in bytes>\0
105 | ```
106 |
107 | Thus, the "GitOID construction" for Artifact IDs is to:
108 |
109 | 1. Calculate the length of the object being identified, in bytes.
110 | 2. Provide this prefix string to the SHA-256 hasher.
111 | 3. Provide the bytes of the object to the hasher.
112 | 4. Generate the hash from the hasher.
113 |
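The four steps above can be sketched in a few lines of Python using the
standard library's `hashlib`. The function name `artifact_id_hex` is chosen here
for illustration. The prefix is the object type `blob`, one ASCII space, the
length written as a decimal ASCII number (which is how Git encodes it), and a
single NUL byte.

```python
import hashlib


def artifact_id_hex(data: bytes) -> str:
    # Step 1: calculate the length of the object, in bytes.
    length = len(data)
    # Step 2: provide the prefix string to the SHA-256 hasher.
    hasher = hashlib.sha256()
    hasher.update(b"blob " + str(length).encode("ascii") + b"\0")
    # Step 3: provide the bytes of the object to the hasher.
    hasher.update(data)
    # Step 4: generate the hash, here as lowercase hexadecimal.
    return hasher.hexdigest()
```

For a file on disk, the result should match what `git hash-object` reports
inside a repository initialized with `--object-format=sha256`, which is a
convenient way to cross-check an implementation.
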
114 | ## How are Artifact IDs represented?
115 |
116 | The textual representation of an Artifact ID looks like this:
117 |
118 | ```
119 | gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8
120 | ```
121 |
122 | This is a URI using the `gitoid` scheme, [registered with IANA][uri]. In this
123 | scheme, the representation includes four parts, each separated by a colon (`:`).
124 | The first is the string `gitoid`, indicating the URI scheme. The second is
125 | the object type, which for Artifact IDs is always `blob`. Then it's the hash
126 | algorithm, which for Artifact IDs is `sha256`. Finally, it's a lowercase
127 | hexadecimal representation of the SHA-256 hash of the object made with the
128 | GitOID construction.
129 |
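As an illustration of the four-part structure, a minimal parser might look like
the following. The name `parse_artifact_id` and the choice to raise
`ValueError` are assumptions for this sketch. It accepts only the
`gitoid:blob:sha256` form used by Artifact IDs and is not a general `gitoid`
URI parser.

```python
def parse_artifact_id(uri: str) -> str:
    """Return the hex digest of a textual Artifact ID, or raise ValueError."""
    parts = uri.split(":")
    if len(parts) != 4:
        raise ValueError("expected four colon-separated parts")
    scheme, object_type, hash_algorithm, digest = parts
    if (scheme, object_type, hash_algorithm) != ("gitoid", "blob", "sha256"):
        raise ValueError("not a gitoid:blob:sha256 Artifact ID")
    # A SHA-256 digest is 32 bytes, so 64 lowercase hexadecimal characters.
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("digest must be 64 lowercase hexadecimal characters")
    return digest
```
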
130 | ## Why are Artifact IDs Useful?
131 |
132 | Artifact IDs are used for uniquely and reproducibly identifying software
133 | artifacts. Because the construction of an Artifact ID relies only on the
134 | contents of an artifact itself, anyone who has access to the artifact can
135 | derive its Artifact ID, and the Artifact ID they derive will be exactly
136 | equal to one derived by anyone else with access to the same artifact.
137 |
138 | This means that, as an identifier scheme, Artifact IDs can scale without
139 | limits! Other identification systems, like
140 | [Common Platform Enumerations (CPE)][cpe] or [Package URLs (pURLs)][purl],
141 | rely on some form of centralization. CPEs are identifiers which rely on a
142 | centralized dictionary, maintained by the United States' National
143 | Institute of Standards and Technology (NIST). Package URLs rely on a central
144 | list of known package hosts. In either case, while these identifier schemes
145 | are _very_ useful (and we view OmniBOR's Artifact IDs as _complementary_ to
146 | these other identifier schemes), they lack the property of independent
147 | reproducibility that makes Artifact IDs so powerful!
148 |
149 | ## What's Next?
150 |
151 | Of course, Artifact IDs by themselves are only one part of the equation. To
152 | understand more, learn about [Input Manifests][input_manifests] next!
153 |
154 | [uri]: https://www.iana.org/assignments/uri-schemes/prov/gitoid
155 | [cpe]: https://nvd.nist.gov/products/cpe
156 | [purl]: https://github.com/package-url/purl-spec
157 | [input_manifests]: @/docs/input-manifests.md
158 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/content/glossary/git_blob.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
--------------------------------------------------------------------------------
/content/glossary/git_object.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
--------------------------------------------------------------------------------
/content/spec/v0.1/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "OmniBOR Specification, Version 0.1"
3 | template: spec.html
4 | ---
5 |
6 | ## Foreword
7 |
8 | This specification is subject to the Community Specification License 1.0,
9 | available at .
10 |
11 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
12 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are used as described in
13 | [RFC 2119][rfc_2119].
14 |
15 | Attention is drawn to the possibility that some of the elements of this
16 | document may be the subject of patent rights. No party shall be held
17 | responsible for identifying any or all such patent rights.
18 |
19 | Any trade name used in this document is information given for the convenience
20 | of users and does not constitute an endorsement.
21 |
22 | This document was prepared by the OmniBOR Community.
23 |
24 | Known patent licensing exclusions are available in the specification
25 | repository's `NOTICES.md` file.
26 |
27 | Any feedback or questions on this document should be directed to the
28 | specification's repository, located at .
29 |
30 | THESE MATERIALS ARE PROVIDED "AS IS." The Contributors and Licensees expressly
31 | disclaim any warranties (express, implied, or otherwise), including implied
32 | warranties of merchantability, non-infringement, fitness for a particular
33 | purpose, or title, related to the materials. The entire risk as to
34 | implementing or otherwise using the materials is assumed by the implementer and
35 | user. IN NO EVENT WILL THE CONTRIBUTORS OR LICENSEES BE LIABLE TO ANY OTHER
36 | PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL, INCIDENTAL, OR
37 | CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND
38 | WITH RESPECT TO THIS DELIVERABLE OR ITS GOVERNING AGREEMENT, WHETHER BASED ON
39 | BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR
40 | NOT THE OTHER MEMBER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
41 |
42 | ## Introduction
43 |
44 | Software supply chains face many challenges: security and compliance chief
45 | among them. Often, projects are hamstrung by the inability to easily and
46 | reliably capture a complete, concise, verifiable accounting of exactly
47 | __what__ inputs were built into software. Without this information,
48 | identifying vulnerable software to patch or replace is difficult. While
49 | Software Bills of Material (SBOMs) help identify third-party components,
50 | they do not go far enough to precisely identify the exact inputs necessary
51 | for vulnerability management.
52 |
53 | The OmniBOR standard defines three concepts, which together enable the
54 | consistent, reproducible, and embeddable encoding of the exact inputs used to
55 | build a software artifact: Artifact Identifiers, Input Manifests, and Artifact
56 | Dependency Graphs.
57 |
58 | An Artifact Identifier is a content-based identifier of a single input (for
59 | example, a single file) used to build a software artifact. Identifiers are
60 | reproducible, meaning two individuals will always derive the same identifier
61 | for the same input. With these identifiers, we can consistently and precisely
62 | identify any software artifact or its input, for use in forensics, accounting,
63 | and vulnerability management.
64 |
65 | Next, an Input Manifest lists the Artifact Identifier of every input used to
66 | produce an artifact. For example, if an executable is compiled by linking
67 | together a collection of object files, the Artifact Identifier of every object
68 | file would be listed in the Input Manifest for the executable. Input Manifests
69 | can be identified by treating them as artifacts and applying the same identifier
70 | heuristic to them as is applied to any other artifact. For purposes of discussion,
71 | these are typically called Input Manifest Identifiers or Input Manifest IDs
72 | or IMIDs. The Input Manifest Identifier can be embedded directly into
73 | executable files, or can be provided in a separate file alongside the artifact
74 | whose inputs they describe.
75 |
76 | Finally, a collection of Input Manifests can be combined to produce an Artifact
77 | Dependency Graph. The Artifact Dependency Graph is a complete description of
78 | all inputs, direct or transitive, used to produce a software artifact.
79 |
80 | Returning to the example of building an executable: the executable's Input
81 | Manifest would list the Artifact Identifier of every object file, and each
82 | object file would have its own Input Manifest listing the Artifact Identifier
83 | of each of their source files. This set of Input Manifests can then be
84 | resolved to produce an overall graph completely describing the inputs which
85 | produced the executable.
86 |
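As a non-normative illustration, resolving a set of Input Manifests into the
full set of direct and transitive inputs might be sketched as follows. The
`manifests` mapping, the function name `transitive_inputs`, and the short
labels such as `exe` and `obj-a` are placeholders invented for this example;
real entries would be Artifact Identifiers.

```python
def transitive_inputs(artifact_id, manifests):
    """Collect every direct or transitive input of an artifact.

    `manifests` maps an artifact's identifier to the identifiers listed
    in that artifact's Input Manifest.
    """
    seen = set()
    stack = [artifact_id]
    while stack:
        current = stack.pop()
        # Artifacts without an Input Manifest (e.g. source files) are leaves.
        for input_id in manifests.get(current, []):
            if input_id not in seen:
                seen.add(input_id)
                stack.append(input_id)
    return seen


manifests = {
    "exe": ["obj-a", "obj-b"],
    "obj-a": ["a.c", "common.h"],
    "obj-b": ["b.c", "common.h"],
}
print(sorted(transitive_inputs("exe", manifests)))
```

The `seen` set ensures an input shared by several artifacts, such as the header
in this example, is visited only once.
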
87 | With the Artifact Dependency Graph, consumers of this information could then
88 | exactly identify when two artifacts were produced with exactly identical
89 | inputs, and if inputs vary, could identify the exact inputs which vary and
90 | observe how that affects the entirety of the graph. When coupled with
91 | SBOM information about third-party dependencies, this can provide highly
92 | specific and accurate identification of supply chain differences and their
93 | causes.
94 |
95 | This Artifact Dependency Graph may also be used to supplement vulnerability
96 | information by precisely identifying affected files or resolving the impacts
97 | of changes to those files across all users of those projects. By leveraging
98 | transparent inclusion of Input Manifests into executable and other formats,
99 | users would also gain the benefits of high precision supply chain information
100 | without manually recording or updating those manifests as projects develop over
101 | time.
102 |
103 | ## Scope
104 |
105 | This specification defines procedures for constructing and conveying Input Manifests,
106 | Artifact Dependency Graphs (ADGs), and other related data structures
107 | for artifacts, including but not limited to:
108 |
109 | - formats for artifact identifiers
110 | - formats for specifying graph relationships between artifacts
111 | - manner of embedding identifiers for Input Manifests, ADGs, and other related
112 | data structures in artifacts of various types
113 | - guidance on metadata which references Input Manifests, ADGs, and other related data structures
114 | - guidance for build tools for:
115 |   - constructing Input Manifests, ADGs, and other related data structures
116 |   - conveying Input Manifests, ADGs, and other related data structures
117 |   - embedding identifiers for Input Manifests and other related data structures in artifacts
118 | - manners of conveyance of Input Manifests and other related data structures
119 | - descriptions of use cases for which Input Manifests, ADGs, and other related data structures may be used
120 |
121 | ## Normative References
122 |
123 | - [GitOID URI][gitoid_uri]
124 |
125 | ## Terms and Definitions
126 |
127 | For the purposes of this document, the following terms and definitions apply.
128 |
129 | ### Artifact
130 |
131 | An artifact is any object of interest that can be represented as arrays of
132 | bytes (`[]byte`).
133 |
134 | ### Artifact Equivalency
135 |
136 | Two artifacts are equivalent if and only if their binary representations are
137 | equal. This can be expressed in pseudocode with the following expression:
138 | `[]byte(artifact1) == []byte(artifact2)`
139 |
140 | ### Artifact Identifiers
141 |
142 | It should be possible to identify each artifact with an artifact identifier with
143 | the following characteristics:
144 |
145 | - __Reproducible__: Independent parties, presented with equivalent artifacts,
146 | derive the same artifact identity.
147 | - __Unique__: Non-equivalent artifacts have distinct identities.
148 | - __Immutable__: An identified artifact can not be modified without also
149 | changing its identity.
150 |
151 | ## Build Tools
152 |
153 | A build tool is something which reads one or more input artifacts and writes
154 | one or more output artifacts. Examples of build tools include:
155 |
156 | - compilers:
157 |   - llvm-clang
158 |   - gcc
159 |   - javac
160 |   - rustc
161 |   - go
162 | - linkers:
163 |   - llvm-lld
164 |   - binutils-ld
165 | - runtimes:
166 |   - Java JVM
167 |   - Node.js
168 |   - Python interpreter
169 | - code generators
170 |
171 | ## Specifications
172 |
173 | ### Artifact ID
174 |
175 | Because two artifacts are equivalent if and only if their binary
176 | representations are equal, a hash function may be applied to the binary
177 | representation of an artifact to yield an identifier which satisfies the
178 | reproducible, unique, and immutable requirements of artifact identifiers.
179 |
180 | ### Artifact Identifier Types
181 |
182 | The majority of source code artifacts are already stored in git and
183 | indexed by their git object identifiers ("gitoids") as git objects of type
184 | "blob".
185 |
186 | For this reason, OmniBOR has chosen to use the "gitoid" of an Artifact as
187 | its Artifact Identifier.
188 |
189 | Git currently supports two varieties of gitoids. One is based on SHA1 and is
190 | in common use. The other is based on SHA256 and has been very slow to garner
191 | adoption. The [gitoid URI spec][gitoid_uri] uses different prefixes,
192 | `gitoid:blob:sha1` or `gitoid:blob:sha256`, to distinguish which algorithm is
193 | being used for computing the gitoid of a blob. This document adopts the gitoid
194 | URI prefixes to distinguish Artifact Identifier Types. This approach is
195 | anticipated to extend gracefully as git adopts new hash types in the future.
196 |
197 | All subsequent references to mandatory identifier types in this document should
198 | be interpreted to mean the list:
199 |
200 | - `gitoid:blob:sha1`
201 | - `gitoid:blob:sha256`
202 |
203 | ### Artifact Input Manifest
204 |
205 | An Artifact Input Manifest for an Artifact enumerates the inputs to the
206 | build tool that produced the artifact.
207 |
208 | Hereafter in this specification, an Artifact Input Manifest is referred to simply as an Input Manifest.
209 |
210 | A given Input Manifest utilizes precisely one Artifact Identifier Type.
211 |
212 | #### Input Manifest Identifier
213 |
214 | An Input Manifest is identified by computing its identifier as an artifact
215 | with the Artifact Identifier Type used for identifiers within the Input Manifest
216 | itself.
217 |
218 | The Input Manifest Identifier for the Input Manifest of an artifact is sometimes
219 | referred to as the Input Manifest Identifier of the artifact.
220 |
221 | #### Input Manifest Header
222 |
223 | In order to distinguish the type of identifier used in the Input Manifest,
224 | it begins with a single newline terminated header line:
225 |
226 | ```
227 | ${Artifact Identifier Type uri prefix}\n
228 | ```
229 |
230 | For example:
231 |
232 | ```
233 | gitoid:blob:sha1\n
234 | ```
235 |
236 | or
237 |
238 | ```
239 | gitoid:blob:sha256\n
240 | ```
241 |
242 | All identifiers in an Input Manifest MUST be of the Artifact Identifier
243 | Type declared in the header.
244 |
245 | #### Input Manifest Records
246 |
247 | The Input Manifest after the header consists of a list of newline terminated
248 | input records.
249 |
250 | An input record for an artifact for which no Input Manifest Identifier is known is represented as:
251 |
252 | ```
253 | blob⎵${artifact identifier of the input artifact}\n
254 | ```
255 |
256 | An input record for an artifact for which an Input Manifest Identifier is known is represented as:
257 |
258 | ```
259 | blob⎵${artifact identifier of the input artifact}⎵bom⎵${input manifest identifier of the input artifact}\n
260 | ```
261 |
262 | `⎵` above refers to the ASCII space character (0x20).
263 |
264 | Artifact identifiers in input records should be represented as strings in lower case hexadecimal, for example
265 | `514516097a2f95c893f2a9685bcecfb85b7598e6`.
266 |
267 | The input artifact records must be written to the Input Manifest in lexical
268 | order.
269 |
270 | The Artifact Identifier and Input Manifest Identifier must both be of the Artifact Identifier
271 | Type declared in the Input Manifest header.
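
The header and record rules above can be checked mechanically. The following is a minimal Python sketch, not part of the specification: the function name `parse_input_manifest` and the `HEX_LENGTHS` table are hypothetical, and it assumes "lexical order" means ordinary string ordering of the record lines.

```python
import re

# Hex digest lengths for the two identifier types named in this document.
HEX_LENGTHS = {"gitoid:blob:sha1": 40, "gitoid:blob:sha256": 64}


def parse_input_manifest(data: bytes):
    """Return (identifier_type, records) for a well-formed Input Manifest.

    Each record is a tuple (artifact_id, manifest_id_or_None).
    """
    text = data.decode("ascii")  # raises UnicodeDecodeError on non-ASCII bytes
    if not text.endswith("\n"):
        raise ValueError("every line must be newline terminated")
    header, *lines = text[:-1].split("\n")
    if header not in HEX_LENGTHS:
        raise ValueError(f"unknown identifier type: {header!r}")
    # Both identifiers in a record must use the type declared in the header,
    # so both are matched against the same lower case hex pattern.
    hex_id = "[0-9a-f]{%d}" % HEX_LENGTHS[header]
    record_re = re.compile(rf"blob ({hex_id})(?: bom ({hex_id}))?")
    if lines != sorted(lines):
        raise ValueError("records are not in lexical order")
    records = []
    for line in lines:
        match = record_re.fullmatch(line)
        if match is None:
            raise ValueError(f"malformed record: {line!r}")
        records.append((match.group(1), match.group(2)))
    return header, records
```

A record without a `bom` field yields `None` as its second element, which matches the case where no Input Manifest Identifier is known for that input.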
272 |
273 | #### Input Manifest Character Encoding
274 |
275 | All characters in an Input Manifest are encoded in ASCII. Please note: every `\n`
276 | must be encoded as a line feed character (0x0A), _not_ the line delimiter of the platform.
277 |
278 | #### Input Manifest Identifier Embedding
279 |
280 | Each build tool should embed into the output artifact a deterministically
281 | ordered list of Input Manifest Identifiers for each mandatory Artifact
282 | Identifier Type in a manner:
283 |
284 | 1. Appropriate to the type of artifact
285 | 2. Generally agreed upon for that artifact
286 |
287 | #### Input Manifest Construction by a Build Tool
288 |
289 | A build tool creating an output artifact must compute an Input Manifest of
290 | each mandatory artifact identifier type.
291 |
292 | For each input artifact the build tool must:
293 |
294 | 1. Compute the artifact identifier of the input - `${artifact identifier}`
295 | 2. Examine the input for an embedded Input Manifest Identifier -
296 | `${input manifest identifier}`
297 |
298 | The build tool must persist an Input Manifest using the
299 | `${artifact identifier}` and `${input manifest identifier}` for each input.
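
As an illustration of these steps, here is a minimal Python sketch. It is an assumption-laden example rather than a reference implementation: the names `gitoid` and `build_input_manifest` are hypothetical, the embedded Input Manifest Identifier of each input is supplied by the caller (extracting it depends on the artifact format), and records are sorted as plain strings.

```python
import hashlib


def gitoid(data: bytes, algorithm: str) -> str:
    """Git blob identifier of `data`, as lower case hexadecimal."""
    digest = hashlib.new(algorithm)
    digest.update(b"blob %d\x00" % len(data))  # git object header
    digest.update(data)
    return digest.hexdigest()


def build_input_manifest(inputs, algorithm="sha1"):
    """Return (manifest_bytes, manifest_identifier).

    `inputs` is an iterable of (input_bytes, embedded_manifest_id_or_None).
    """
    records = []
    for content, manifest_id in inputs:
        record = "blob " + gitoid(content, algorithm)
        if manifest_id is not None:
            record += " bom " + manifest_id
        records.append(record)
    # Header line first, then the records in lexical order, each ending in "\n".
    lines = ["gitoid:blob:" + algorithm] + sorted(records)
    manifest = "".join(line + "\n" for line in lines).encode("ascii")
    # The manifest is itself identified as an artifact of the same type.
    return manifest, gitoid(manifest, algorithm)
```

A build tool would call this once per mandatory identifier type, for example with `algorithm="sha1"` and again with `algorithm="sha256"`.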
300 |
301 | #### Input Manifest Examples
302 |
303 | ```
304 | gitoid:blob:sha1
305 | blob 06a6891154fff74e1ddb6245f4a0467b09c617c5
306 | blob 06dd79bc831bb06a6267a36ad2d62beccd7900b2 bom a9a64def763517df596fbb4348a8561069b5dc4b
307 | blob 0bc39408c1e5feaadd6f0420d14324b477420b93
308 | blob 15acd4427ca14000111aad5071563bc7f2dc09f4
309 | blob 1be90e6fab4ab9b7dd3b27cea5bb1fe29acc0204
310 | blob 1d8a4e28d1b62a2bfeba837fe18422cd106e6ddf bom 5bda8237d1676df0a2d0b8682d40f99a27ef5b13
311 | blob 28488e0b05954ccf87c779f5f9258987e4d68ac5
312 | blob 2c0cde251f1a9f05563a5f7a7f32588f04aaa235
313 | ```
314 |
315 | ```
316 | gitoid:blob:sha256
317 | blob 09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772
318 | blob 230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61
319 | blob 2f4a51b16b76bbc87c4c27af8ae062b1b50b280f1ab78e3eec155334588dc88e bom 4f3a822f776412c049dda53c3277bf2225b51b805ce8a99222af23a7d9f55636
320 | blob c71d239df91726fc519c6eb72d318ec65820627232b2f796219e87dcf35d0ab4
321 | blob f47ffb3518f236eea6525fd29f057ddd5cda1bb803ccc662e6bc5925afd1e4af
322 | ```
323 |
324 | ### Artifact Dependency Graph (ADG)
325 |
326 | The Artifact Dependency Graph (ADG) of an artifact is the recursive DAG
327 | (Directed Acyclic Graph) of all the "input artifacts" that are transformed
328 | by a build tool into that artifact. It includes the direct input artifacts,
329 | and the recursive set of input artifacts to each input artifact, all the way
330 | down the graph.
331 |
332 | Concretely the Artifact Dependency Graph (ADG) of an Artifact is:
333 |
334 | - The set of Input Manifests defined by:
335 | - The Input Manifest of the Artifact
336 | - Any Input Manifest referenced in an Input Manifest in the set (i.e., the transitive closure of the Input Manifests)
337 | - The Input Manifest Identifier of the Artifact
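
The transitive closure described above could be gathered with a simple graph walk. The Python sketch below is illustrative only: `manifest_closure` and the `load_manifest` lookup are hypothetical names, and how manifests are stored and retrieved is outside what this section defines.

```python
def manifest_closure(root_manifest_id, load_manifest):
    """Collect every Input Manifest reachable from `root_manifest_id`.

    `load_manifest(manifest_id)` is a hypothetical lookup that returns the
    manifest text, or None when that manifest is not available.
    """
    found = {}
    visited = set()
    pending = [root_manifest_id]
    while pending:
        manifest_id = pending.pop()
        if manifest_id in visited:
            continue
        visited.add(manifest_id)
        text = load_manifest(manifest_id)
        if text is None:
            continue  # unavailable manifests are skipped, not treated as errors
        found[manifest_id] = text
        for line in text.splitlines()[1:]:  # skip the header line
            fields = line.split(" ")
            # "blob <artifact id> bom <manifest id>" names a further manifest.
            if len(fields) == 4 and fields[2] == "bom":
                pending.append(fields[3])
    return found
```

The returned mapping, together with the root Input Manifest Identifier, corresponds to the two parts of the ADG listed above.
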
338 | ## Annexes
339 |
340 | - [Annex A - File System Storage](@/spec/v0.1/annex-a.md)
341 | - [Annex B - ELF Embedding](@/spec/v0.1/annex-b.md)
342 | - [Annex C - Source Embedding](@/spec/v0.1/annex-c.md)
343 |
344 | ## Bibliography
345 |
346 | - [RFC 2119][rfc_2119]
347 | - [GitOID URI][gitoid_uri]
348 |
349 | [rfc_2119]: https://tools.ietf.org/html/rfc2119
350 | [gitoid_uri]: https://www.iana.org/assignments/uri-schemes/prov/gitoid
351 |
--------------------------------------------------------------------------------
/content/glossary/gitoid.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
147 |
--------------------------------------------------------------------------------
/content/glossary/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Glossary
3 | template: glossary.html
4 | ---
5 |
6 | The following is a glossary of terms defined by the OmniBOR project. For the
7 | current precise definitions, refer to the [specification].
8 |
9 | ## Artifact
10 |
11 | An artifact is any software object of interest.
12 |
13 | Examples:
14 |
15 | - source code file (of any language)
16 | - `.o` object file
17 | - `.so` shared object file
18 | - `.class` Java class file
19 | - `.jar` file
20 | - `.pyc` compiled Python file
21 | - executable file
22 | - container image
23 |
24 | What all artifacts have in common is that they are all arrays of bytes.
25 |
26 | ## Artifact Equivalency
27 |
28 | Two artifacts are equivalent if and only if their byte representations are
29 | exactly equal.
30 |
31 | ## Derived Artifacts
32 |
33 | Most artifacts are produced by a [build tool](#build-tool) consuming some set
34 | of input artifacts to produce an artifact as an output. Such artifacts are said
35 | to be 'derived artifacts'.
36 |
37 | ## Leaf Artifacts
38 |
39 | Artifacts which are not 'derived artifacts' are said to be 'leaf artifacts'.
40 | Leaf artifacts are usually source code files constructed by hand by humans.
41 |
42 | Examples of derived artifacts, by contrast:
43 |
44 | - "`foo.o` is derived from `foo.c` and `bar.h` using `gcc`"
45 | - "`fooexecutable` is derived from `foo.o` and `baz.o` using `ld`"
46 | - "`foo.class` is derived from `foo.java` using `javac`"
47 |
48 | ## Artifact ID
49 |
50 | It should be possible to identify each artifact with an Artifact ID.
51 |
52 | Artifact IDs should have the following characteristics:
53 |
54 | **Canonical**
55 | : Independent parties, presented with equivalent artifacts, derive the same
56 | Artifact ID.
57 |
58 | **Unique**
59 | : Non-equivalent artifacts have distinct Artifact IDs.
60 |
61 | **Immutable**
62 | : An artifact cannot be modified without also changing its Artifact ID.
63 |
64 | OmniBOR uses the [GitOID](#gitoid) of an artifact as its
65 | Artifact ID.
66 |
67 | Source code [leaf artifacts](#leaf-artifacts) are typically already being
68 | stored in [Git](#git) where they are identified via their [GitOID](#gitoid).
69 |
70 | ## Artifact Dependency Graph
71 |
72 | The Artifact Dependency Graph (ADG) of an [artifact](#artifact) is the
73 | DAG (Directed Acyclic Graph) of all the ['leaf artifacts'](#leaf-artifacts) that
74 | are transformed by a [build tool](#build-tool) into that artifact. This
75 | includes the direct input artifacts, and the transitive set of artifacts to
76 | each input artifact, all the way down to source code.
77 |
78 | ### Examples
79 |
80 | Simple C Executable
81 |
82 | {% mermaid() %}
83 | flowchart BT
84 | c1[.c] --> o1[.o]
85 | h1.1[.h] --> o1[.o]
86 | h1.2[.h] --> o1[.o]
87 | c2[.c] --> o2[.o]
88 | h2.1[.h] --> o2[.o]
89 | h2.2[.h] --> o2[.o]
90 | o1 --> executable
91 | o2 --> executable
92 | {% end %}
93 |
94 | Running C Executable with Shared Object
95 |
96 | {% mermaid() %}
97 | flowchart BT
98 | c1[.c] --> o1[.o]
99 | h1.1[.h] --> o1[.o]
100 | h1.2[.h] --> o1[.o]
101 | c2[.c] --> o2[.o]
102 | h2.1[.h] --> o2[.o]
103 | h2.2[.h] --> o2[.o]
104 | o1 --> executable
105 | o2 --> executable
106 | c3[.c] --> o3[.o]
107 | h3.1[.h] --> o3[.o]
108 | h3.2[.h] --> o3[.o]
109 | c4[.c] --> o4[.o]
110 | h4.1[.h] --> o4[.o]
111 | h4.2[.h] --> o4[.o]
112 | o3 --> .so
113 | o4 --> .so
114 | executable --> running[running executable]
115 | .so --> running[running executable]
116 | {% end %}
117 |
118 | Java Example
119 |
120 | {% mermaid() %}
121 | flowchart BT
122 | java1[.java] --> cls1[.class]
123 | java2[.java] --> cls2[.class]
124 | java3[.java] --> cls3[.class]
125 | java4[.java] --> cls4[.class]
126 | java5[.java] --> cls5[.class]
127 | cls1 --> running[running executable]
128 | cls2 --> running[running executable]
129 | cls3 --> running[running executable]
130 | cls4 --> running[running executable]
131 | cls5 --> running[running executable]
132 | {% end %}
133 |
134 | Go Example
135 |
136 | {% mermaid() %}
137 | flowchart BT
138 | go1[.go] --> o1[.o]
139 | go2[.go] --> o2[.o]
140 | go3[.go] --> o3[.o]
141 | go4[.go] --> o4[.o]
142 | go5[.go] --> o5[.o]
143 | o1 --> executable
144 | o2 --> executable
145 | o3 --> executable
146 | o4 --> executable
147 | o5 --> executable
148 | {% end %}
149 |
150 | Python Example
151 |
152 | {% mermaid() %}
153 | flowchart BT
154 | py1[.py] --> pyc1[.pyc]
155 | py2[.py] --> pyc2[.pyc]
156 | py3[.py] --> pyc3[.pyc]
157 | py4[.py] --> pyc4[.pyc]
158 | py5[.py] --> pyc5[.pyc]
159 | pyc1 --> running[running executable]
160 | pyc2 --> running[running executable]
161 | pyc3 --> running[running executable]
162 | pyc4 --> running[running executable]
163 | pyc5 --> running[running executable]
164 | {% end %}
165 |
166 | ## Build Tool
167 |
168 | A build tool is something which reads one or more input [artifacts](#artifact)
169 | and writes one or more output [artifacts](#artifact).
170 |
171 | {% mermaid() %}
172 | flowchart LR
173 | input1 --> buildtool[build tool] --> output
174 | input2 --> buildtool[build tool]
175 | input3 --> buildtool[build tool]
176 | {% end %}
177 |
178 | Examples:
179 |
180 | * C compiler consumes one `.c` file and zero or more `.h` files to produce a
181 | `.o` file
182 |
183 | {% mermaid() %}
184 | flowchart LR
185 | .c --> compiler[[compiler]]
186 | *.h --> compiler[[compiler]]
187 | compiler --> .o
188 | {% end %}
189 |
190 | * C linker consumes one or more `.o` files to produce an executable file
191 |
192 | {% mermaid() %}
193 | flowchart LR
194 | *.o --> linker[[linker]]
195 | linker --> executable
196 | {% end %}
197 |
198 | * C linker consumes one or more `.o` files to produce a shared object
199 |
200 | {% mermaid() %}
201 | flowchart LR
202 | *.o --> linker[[linker]]
203 | linker --> .so
204 | {% end %}
205 |
206 | * Dynamic linker consumes an executable file and zero or more shared objects to
207 | produce a running process
208 |
209 | {% mermaid() %}
210 | flowchart LR
211 | executable --> linker[[dynamic linker]]
212 | *.so --> linker[[dynamic linker]]
213 | linker --> running[running executable]
214 | {% end %}
215 |
216 | * Java compiler consumes a `.java` file to produce a `.class` file
217 |
218 | {% mermaid() %}
219 | flowchart LR
220 | .java --> compiler[[compiler]]
221 | compiler --> classfile[.class]
222 | {% end %}
223 |
224 | * Java runtime consumes one or more `.class` files to produce a running process
225 |
226 | {% mermaid() %}
227 | flowchart LR
228 | classfile[*.class] --> runtime[[runtime]]
229 | runtime --> running[running executable]
230 | {% end %}
231 |
232 | * Python bytecode compiler consumes a `.py` file to produce a `.pyc` file
233 |
234 | {% mermaid() %}
235 | flowchart LR
236 | .py --> compiler[[compiler]]
237 | compiler --> .pyc
238 | {% end %}
239 |
240 | The totality of ancestors for a given artifact may be represented as an
241 | [Artifact Dependency Graph (ADG)](#artifact-dependency-graph).
242 |
243 | ## Code Generators
244 |
245 | Typically, source code files are hand written by humans, and as such are
246 | [leaf artifacts](#leaf-artifacts) in the
247 | [Artifact Dependency Graph (ADG)](#artifact-dependency-graph).
248 |
249 | Source code files can also be **generated** from other inputs by a code
250 | generator.
251 |
252 | {% mermaid() %}
253 | flowchart LR
254 | input[input] --> codegenerator[[code generator]] --> generatedsrc[generated source code file]
255 | {% end %}
256 |
257 | In this scenario, the generated source code file is a
258 | [derived artifact](#derived-artifacts). This is because the
259 | [code generator](#code-generators) is a [build tool](#build-tool) and, by
260 | definition, the output from the [build tool](#build-tool) is a
261 | [derived artifact](#derived-artifacts).
262 |
263 | Code generation is very common in many languages.
264 | See [go generate](https://eli.thegreenplace.net/2021/a-comprehensive-guide-to-go-generate/),
265 | [Java Xtend](https://www.eclipse.org/xtend/), and
266 | [qtcpp](https://qface.readthedocs.io/en/latest/qtcpp.html) for examples.
267 |
268 | ## Git
269 |
270 | [Git](https://git-scm.com/) is an object store masquerading as a source
271 | code management system (SCM).
272 |
273 | Git's storage model stores source code and metadata using a Merkle tree.
274 |
275 | ## Git Objects
276 |
277 | Git Objects are represented as follows:
278 |
279 | {{ img(path = "/glossary/git_object.svg", alt = "Git Object") }}
280 |
281 | * `${type}`: Git Object Type as a string
282 |   - `blob`: any bytes
283 |   - `tree`: represents a filesystem tree
284 |   - `commit`: represents a Git commit
285 |   - `tag`: represents a Git tag
286 | * `${size}`: size in bytes of `${content}`, represented as a base-10 string
287 | * `${content}`: the byte content of the object
288 |
289 | ## Git Blob
290 |
291 | A Git [blob](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects)
292 | (binary large object) is the type used for file contents in git:
293 |
294 | {{ img(path = "/glossary/git_blob.svg", alt = "Git Blobs") }}
295 |
296 | - `${content}` - bytes of the file contents
297 | - Does not include filename or path
298 | - Does not include mode information
299 | - Does not include *any* metadata
300 | - Just the contents
301 | - **Any file anywhere with the same contents will have the same 'blob' object**
302 | - **Any file anywhere with the same contents will have the same GitOID**
303 |
304 | ## GitOID
305 |
306 | Git Blobs are identified by the SHA-1 hash of the blob object with the GitOID
307 | construction, which first hashes in a string containing the object type, an
308 | ASCII space character, the length of the content in number of bytes, and an
309 | ASCII null terminator character:
310 |
311 | {{ img(path = "/glossary/gitoid.svg", alt = "GitOIDs") }}
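
As a rough illustration of the construction described above, the SHA-1 GitOID of a blob can be computed in a few lines of Python. This is a sketch using only the standard library; the function name `gitoid_blob_sha1` is hypothetical.

```python
import hashlib


def gitoid_blob_sha1(content: bytes) -> str:
    """SHA-1 GitOID of `content` treated as a Git blob."""
    # Header: object type, ASCII space, content length in base 10, ASCII NUL.
    header = b"blob" + b" " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known identifier in Git.
print(gitoid_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because only the content is hashed, two files with the same bytes get the same GitOID regardless of name, path, or mode.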
312 |
313 | ## OmniBOR
314 |
315 | An [artifact dependency graph](#artifact-dependency-graph) can be represented as
316 | a graph with nodes identified by an [Artifact ID](#artifact-id). In the examples
317 | below, we only show tree structures for simplicity.
318 |
319 | {% mermaid() %}
320 | flowchart BT
321 | Artifact-2[Artifact-2 ID] --> Artifact-1[Artifact-1 ID]
322 | Artifact-3[Artifact-3 ID] --> Artifact-1[Artifact-1 ID]
323 | Artifact-4[Artifact-4 ID] --> Artifact-2[Artifact-2 ID]
324 | Artifact-5[Artifact-5 ID] --> Artifact-2[Artifact-2 ID]
325 | Artifact-6[Artifact-6 ID] --> Artifact-3[Artifact-3 ID]
326 | Artifact-7[Artifact-7 ID] --> Artifact-3[Artifact-3 ID]
327 | {% end %}
328 |
329 | OmniBOR uses the [GitOID](#gitoid) of an artifact as its
330 | [Artifact ID](#artifact-id).
331 |
332 | {% mermaid() %}
333 | flowchart BT
334 | Artifact-2[Artifact-2 gitoid] --> Artifact-1[Artifact-1 gitoid]
335 | Artifact-3[Artifact-3 gitoid] --> Artifact-1[Artifact-1 gitoid]
336 | Artifact-4[Artifact-4 gitoid] --> Artifact-2[Artifact-2 gitoid]
337 | Artifact-5[Artifact-5 gitoid] --> Artifact-2[Artifact-2 gitoid]
338 | Artifact-6[Artifact-6 gitoid] --> Artifact-3[Artifact-3 gitoid]
339 | Artifact-7[Artifact-7 gitoid] --> Artifact-3[Artifact-3 gitoid]
340 | {% end %}
341 |
342 | ## Input Manifest
343 |
344 | The parent-child relationship is captured by a set of Input Manifests.
345 |
346 | Each artifact has an Input Manifest that describes its immediate children,
347 | consisting of a set of newline-delimited records, one for each child, in
348 | lexical order.
349 |
350 | A child artifact which is itself a [leaf artifact](#leaf-artifacts) would be
351 | represented by:
352 |
353 | ```
354 | ${Artifact ID of child}\n
355 | ```
356 |
357 | A child artifact which is itself a [derived artifact](#derived-artifacts) would
358 | be represented by:
359 |
360 | ```
361 | ${Artifact ID of child}⎵manifest⎵${Artifact ID of child's Input Manifest}\n
362 | ```
363 |
364 | Example:
365 |
366 | {% mermaid() %}
367 | flowchart BT
368 | Artifact-2[Artifact-2 Artifact ID] --> Artifact-1[Artifact-1 Artifact ID]
369 | Artifact-3[Artifact-3 Artifact ID] --> Artifact-1[Artifact-1 Artifact ID]
370 | Artifact-4[Artifact-4 Artifact ID] --> Artifact-2[Artifact-2 Artifact ID]
371 | Artifact-5[Artifact-5 Artifact ID] --> Artifact-2[Artifact-2 Artifact ID]
372 | Artifact-6[Artifact-6 Artifact ID] --> Artifact-3[Artifact-3 Artifact ID]
373 | Artifact-7[Artifact-7 Artifact ID] --> Artifact-3[Artifact-3 Artifact ID]
374 | {% end %}
375 |
376 | Artifact-2's Input Manifest:
377 |
378 | ```
379 | gitoid:sha256\n
380 | ${Artifact ID of Artifact-4}\n
381 | ${Artifact ID of Artifact-5}\n
382 | ```
383 |
384 | Artifact-3's Input Manifest:
385 |
386 | ```
387 | gitoid:sha256\n
388 | ${Artifact ID of Artifact-6}\n
389 | ${Artifact ID of Artifact-7}\n
390 | ```
391 |
392 | Artifact-1's Input Manifest:
393 |
394 | ```
395 | gitoid:sha256\n
396 | ${Artifact ID of Artifact-2}⎵manifest⎵${Artifact ID of Artifact-2's Input Manifest}\n
397 | ${Artifact ID of Artifact-3}⎵manifest⎵${Artifact ID of Artifact-3's Input Manifest}\n
398 | ```
399 |
400 | ### Embedding of Artifact IDs for Input Manifests
401 |
402 | OmniBOR advocates for [build tools](#build-tool) to embed into each
403 | [derived artifact](#derived-artifacts) the Artifact ID of that derived
404 | artifact's Input Manifest.
405 |
406 | Examples:
407 |
408 | **ELF Files (Executables and `.so`, and `.o` files)**
409 | : Embed Input Manifest Artifact ID into an ELF section named `.omnibor`
410 |
411 | **ar Files (`.a` static libraries)**
412 | : Embed Input Manifest Artifact ID into an archive entry named `.omnibor`
413 |
414 | **General Archive files (`tar`, `gzip`, etc.)**
415 | : Embed Input Manifest Artifact ID into an archive entry named `.omnibor`
416 |
417 | **Java `.class` file**
418 | : Embed Input Manifest Artifact ID into an annotation named `@OMNIBOR` in the
419 | `.class` file.
420 |
421 | **Python `.pyc` files**
422 | : Embed Input Manifest Artifact ID into an `__omnibor__` in the `.pyc` file.
423 |
424 | **Container Images**
425 | : Embed Input Manifest Artifact ID into the image manifest as an annotation
426 | named `dot.omnibor`
427 |
428 | **Generated Source Code**
429 | : Embed Input Manifest Artifact ID for a generated source code file using a
430 | comment
431 |
432 | ## SBOM
433 |
434 | OmniBOR is not a Software Bill of Materials (SBOM). It is designed to
435 | complement SBOMs, such as [SPDX](https://spdx.dev/) or
436 | [CycloneDX](https://cyclonedx.org/).
437 |
438 | [OmniBOR](#omnibor) can help [SBOMs](#sbom) be more precise and reliable.
439 |
440 | Most [SBOMs](#sbom) allow for 'external identifiers' and can thus use
441 | [Artifact IDs](#artifact-id) to reference the artifacts in the OmniBOR
442 | [Artifact Dependency Graph (ADG)](#artifact-dependency-graph). This allows an
443 | [SBOM](#sbom) describing a specific component, e.g.
444 | `Component Name: Django` and `Component Version: 1.11.1`, to reference a list
445 | of applicable [Artifact IDs](#artifact-id).
446 |
447 | This is helpful because today two different tools might produce two different
448 | SBOMs for the same software [artifact](#artifact). This could occur if the SBOM
449 | generation tools use different sources to identify and describe the component.
450 | OmniBOR provides a precise software [Artifact ID](#artifact-id) which can be
451 | used in SBOMs in situations where naming schemes may be ambiguous.
452 |
453 | **Example 1**: If one SBOM generation tool uses [CPEs](https://nvd.nist.gov/products/cpe):
454 |
455 | ```
456 | cpe:2.3:a:djangoproject:django:1.11.1:*:*:*:*:*:*:*
457 | ```
458 |
459 | and the other uses [Package URLs (pURLs)](https://github.com/package-url/purl-spec):
460 |
461 | ```
462 | pkg:pypi/django@1.11.1
463 | ```
464 |
465 | … then these two SBOMs might diverge when they define the component
466 | supplier: it could be `Component Supplier: djangoproject` or
467 | `Component Supplier: pypi`.
468 |
469 | **Example 2:** In another instance a vendor might choose to use their
470 | product's current marketing name for the component name in their SBOM
471 | generation tools, whereas third-party SBOM generation tools might use the
472 | vendor's product name as listed in a [CPE](https://nvd.nist.gov/products/cpe)
473 | or [SWID tag](https://nvd.nist.gov/products/swid).
474 |
475 | By enabling both SBOM generation tools to list the OmniBOR Artifact ID(s)
476 | associated with the component, an SBOM consumer can quickly understand that
477 | both SBOMs do describe the same artifact, regardless of ambiguities in naming
478 | schemes.
479 |
480 | [specification]: @/spec/_index.md
481 |
--------------------------------------------------------------------------------
/content/resources/whitepaper.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Whitepaper
3 | ---
4 |
5 |
6 | OmniBOR: Enabling Universal Artifact Traceability In Software Supply Chains
7 | ===
8 |
9 | * Author: Aeva Black
10 | * Status: DRAFT
11 | * Last updated: 2022-01-25
12 |
13 | ## Summary
14 |
15 | OmniBOR is an application of the [git](https://en.wikipedia.org/wiki/Git) [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph), a widely used merkle tree with a flat-file storage format, to the challenge of creating build artifact dependency graphs in today's language-heterogeneous open source environments.
16 |
17 | By generating artifact dependency graphs at build time, embedding the hash of the graph in produced artifacts, and referencing that hash in each subsequent build step, OmniBOR will enable the creation of verifiable and complete artifact dependency graphs while requiring no effort from, or changes in, most open source projects. Furthermore, it will enable efficient correlation of vulnerability databases against a concise representation of the artifact dependency graph within run-time environments, if vulnerability databases can be correlated to source files or intermediary packages or libraries. These benefits would also accrue to closed-source projects that use the same build tools, and provide insights which span both open and closed source components in a consistent manner.
18 |
19 | ### Objective
20 |
21 | It is desirable to enable efficient launch-time comparison of the verifiable and complete build tree of any executable component [1] against a then-current list of source files [2] which are known to be undesirable, where such a build tree contains unique referents for all sources from which the given executable object was composed.
22 |
23 | [1]: binary, dynamically-linked library, container image, etc.
24 |
25 | [2]: because vulnerabilities may be discovered between the time an executable is created and the time when it is run, these processes must be decoupled
26 |
27 |
28 | ### Proposal
29 |
30 | In an ideal scenario, an open source consumer would have available to them a complete artifact dependency graph, tracing dependencies to their ultimate depth. Even if we do not achieve this ideal, we should seek a solution with the lowest cost of adoption so as to enable the greatest buy-in across all open source ecosystems and communities.
31 |
32 | For this reason we propose two areas of work:
33 | 1. enhancing artifact-generating tools (e.g., compilers, linkers, and container image generators) to also output metadata regarding their inputs and outputs
34 | 2. defining a storage format which represents the minimum information to describe the artifact dependency graph, and which is based on git's on-disk storage format
35 |
36 | Following from (1), this approach will require minimal to no effort on the part of open source project maintainers, thus significantly increasing its chances of widespread adoption as compared to any approach which requires maintainers to perform additional actions (e.g., implementing substantive changes in their CI/CD or package build pipeline to generate an SBOM).
37 |
38 | Following from (2), this on-disk format provides an efficient and already well-understood method for cross referencing artifacts and source files by a deterministically generated unique identifier (a SHA1 or SHA256 hash).
39 |
40 |
41 | ### ASCII-Art Flow Chart
42 | ```
43 | ┌─────────────────────────────-┐
44 | │ Build Time: Graph Generation │
45 | │ │
46 | │ ┌────────┐ ┌────────┐ │
47 | │ │ Src A │ │ Src B │ │
48 | │ └───┬────┘ └──┬─────┘ │
49 | │ │ │ │
50 | │ ▼ │ │
51 | │ ┌───────┐ │ │
52 | │ │ Obj A │ │ │
53 | │ └─────┬─┘ │ │
54 | │ │ │ │
55 | │ ▼ ▼ │
56 | │ ┌─────────────┐ │
57 | │ │ Compilation │ │
58 | │ │ & Signing │ │
59 | │ └─────┬───────┘ │
60 | │ │ │
61 | └──────────────┼──────────────-┘
62 | │
63 | ┌──▼─┐
64 | ▼ ▼
65 | ┌──────────┐ ┌──────--┐
66 | │ [header] ├──►│omnibor │
67 | │executable│ │ graph │
68 | └──────┬───┘ └┬─────--┘
69 | │ │
70 | ┌───────┼────────┼────────────────────────────┐
71 | │ │ │ Run Time: Comparison │
72 | │ │ │ │
73 | │ ▼ ▼ ┌────────────┐ │
74 | │ ┌───────────────┐ │ Public │ │
75 | │ │ Policy │◄─────►│ Vuln │ │
76 | │ │ Enforcement │ │ Database │ │
77 | │ └─┬─────────────┘ └────────────┘ │
78 | │ │ | │
79 | │ ▼ ▼ │
80 | │ ┌─────────────┐ ┌────────────┐ │
81 | │ │ Runtime | | Scanning | |
82 | | | Environment │◄─────►│ Tools | │
83 | │ └─────────────┘ └────────────┘ │
84 | │ │
85 | └─────────────────────────────────────────────┘
86 |
87 | ```
88 |
89 |
90 | ## OmniBOR
91 |
92 | OmniBOR is an approach which has the following properties:
93 | 1. re-uses a well understood paradigm for modelling artifact relationships efficiently in flat files on disk in a machine-readable format
94 | 2. provides an efficient approach for run-time comparison of any given binary object against a dataset of signatures of known-vulnerable inputs
95 | 3. does not require project maintainers to make any changes to their workflow in order to comply with the [Biden Executive Order](https://www.whitehouse.gov/briefing-room/presidential-actions/2021/05/12/executive-order-on-improving-the-nations-cybersecurity/)
96 | 4. has a bounded scope of work to achieve near-complete coverage of the F/OSS landscape
97 | 5. could be integrated with both free and commercial services
98 |
99 | ### Characteristics
100 |
101 | 1. **Artifact Equivalence**: Two artifacts are equivalent IFF `[]byte(artifact1) == []byte(artifact2)`.
102 | 2. **Artifact Identity**: Independent parties, presented with equivalent artifacts, derive the same artifact identity.
103 | 3. **Immutability**: An identified artifact cannot be modified without also changing its identity.
104 | 4. **Uniqueness**: An artifact can have precisely *one* artifact identity graph. All equivalent artifacts have the same graph.
105 | 5. **Transparently Opaque**: Artifacts and associated metadata may be obfuscated when sharing the artifact identity graph, while preserving other properties.
106 | 6. **Truncatability of Graph**: Artifact identity graphs may themselves be treated as artifacts, enabling truncation of a part of the graph and replacing the leading node with a signature of the sub-graph, thereby preserving all other properties with respect to the whole.
107 | 7. **Independent Metadata**: Artifacts may be associated, through their identity, to independently generated metadata stored outside of the artifact identity graph, such as an SBOM containing license and provenance metadata.
108 | 8. **Authoritative Reference**: By generating the artifacts in the authoring function, correctness of the generated artifact identity graph can have the minimum number of dependencies (N=1) and least error rate of all solutions which could generate similar graphs.
109 | 9. **Non-repudiability**:
110 | 10. **Embedded**: An artifact includes a unique identifier of the document containing the artifact identity graph used to generate that artifact.
111 |
112 |
113 | #### 1. Artifact Equivalence
114 |
115 | *Two artifacts are equivalent if `[]byte(artifact1) == []byte(artifact2)`.*
116 |
117 | Two artifacts are said to be equivalent if and only if they are byte-for-byte identical. This implies that OmniBOR is not concerned with questions of provenance, origination, licensure, or many other aspects which are encompassed by a software bill of materials, and which could differ between byte-equivalent artifacts.
118 |
119 | #### 2. Artifact Identity
120 |
121 | *Independent parties, presented with equivalent artifacts, derive the same artifact identity.*
122 |
123 | This implies that a deterministic hashing function may be used to derive artifact identity, such as SHA256.
124 |
125 | #### 3. Immutability
126 |
127 | *An identified artifact cannot be modified without also changing its identity. Non-equivalent artifacts have distinct identities.*
128 |
129 | "An identified artifact" means an artifact whose identity has been determined. "Cannot be modified without also changing its identity" means that the deterministic hashing function is collision-resistant, and therefore any change to the artifact results in a change to its identity. In this way, the relationship between artifact and identity is immutable.
130 |
131 | #### 4. Uniqueness
132 |
133 | *An artifact can have precisely one artifact identity graph. All equivalent artifacts have the same graph.*
134 |
135 | This implies that we must not include build tooling in the artifact dependency graph, as doing otherwise would violate the Uniqueness requirement. For example, two reproducible build systems which rely on different auxiliary libraries (e.g., zlib) and result in byte-equivalent outputs **must** yield identical OmniBORs.
136 |
137 | For further exploration of this topic, see Wheeler's work on reproducibility as a means to verify trustability: [Countering Trusting Trust through Diverse Double-Compiling](https://dwheeler.com/trusting-trust/).
138 |
139 | {{% notification type="info" %}}
140 | **Note the implication** that for any artifact, there can only be one artifact identity graph, but the reverse is not true. Each artifact identity graph may generate multiple artifacts (e.g., if different build parameters are used, or it is compiled on a different architecture, or different metadata, such as compile time, were embedded in the built artifact).
141 | {{% /notification %}}
142 |
143 | #### 5. Transparently Opaque
144 |
145 | *Artifacts and associated metadata may be obfuscated when sharing the artifact identity graph, while preserving other properties.*
146 |
147 | Metadata about artifacts and their associated artifact dependency graphs may have varying levels of sensitivity. OmniBOR allows the supplier to reveal as little or as much as they, in negotiation with their consumers, choose. The OmniBOR graph itself is just a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) of opaque hashes. This provides transparency about the artifact dependency graph and its structure, while allowing supplier-modulated levels of opaqueness about the metadata.
148 |
149 | #### 6. Truncatability of Graph
150 |
151 |
152 |
153 | #### 7. Independent Metadata
154 |
155 | *Artifacts may be associated, through their identity, to independently generated metadata stored outside of the artifact identity graph, such as an SBOM containing license and provenance metadata.*
156 |
157 | Many use cases could make use of OmniBORs. An incomplete list would include:
158 |
159 | * Detecting potential vulnerabilities in executables/containers
160 | * Identifying Open Source License obligations
161 | * Identifying commercial license obligations
162 | * More reliable attestation
163 | * Post-exploit forensics
164 |
165 | Undoubtedly, more will arise. Independence of metadata enables independent, permissionless innovation around each use case without the need for cross-domain coordination. This lowers the cost of innovation and thus allows more productive innovation in this space.
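
To illustrate the idea, here is a small sketch in which two unrelated metadata stores are joined purely through artifact identity. The store names, the `metadata_for` helper, and the placeholder values are all hypothetical; only the two identifiers are taken from the `hello.c` example later in this document.

```python
# Each store is maintained independently; the only shared key is the artifact identity.
# The values are placeholders, not real license or advisory data.
license_store = {
    "c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59": {"license": "EXAMPLE-LICENSE"},
}
advisory_store = {
    "c0f35b8ae567f5348df3711496fdc0ef6f634169": {"advisories": ["EXAMPLE-ADVISORY-1"]},
}


def metadata_for(artifact_id, *stores):
    """Merge whatever each independent store knows about one artifact identity."""
    merged = {}
    for store in stores:
        merged.update(store.get(artifact_id, {}))
    return merged


print(metadata_for("c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59",
                   license_store, advisory_store))
```

Neither store needs to know that the other exists, which is the sense in which each use case can evolve without cross-domain coordination.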
166 |
167 | #### 8. Authoritative Reference
168 |
169 | #### 9. Non-repudiability
170 |
171 |
172 | ### What OmniBOR is not
173 |
174 | 1. Not a system for build reproducibility, but it does provide information that is useful for that.
175 | 2. Not a version control system, though it is designed to co-exist with them.
176 | 3. Not an SBOM, though it is designed to complement them.
177 | 4. Not a globally unique software identifier (SWID).
178 | 5. Not reliant on any particular packaging or distribution mechanism, either for artifacts or for artifact identity graphs.
179 |
180 |
181 | ### Comparison to Software Bill Of Materials and our Objective
182 |
183 | {{% notification type="info" %}}
184 | OmniBOR is **not** an SBOM standard.
185 | {{% /notification%}}
186 |
187 | From the OmniBOR perspective, any SBOM document is a type of artifact which could be referenced in an artifact dependency graph.
188 |
189 | From an SBOM perspective, OmniBOR is a common, precise way to identify artifacts and their artifact dependency graphs, and nothing more. This makes OmniBOR incapable of fulfilling many of the objectives of SBOMs, such as recording provenance, origination, build environment information, licensure, and other qualities.
190 |
191 | {{% notification type="info" %}}
192 | Speaking strictly from an **SPDX 3.0-draft** perspective, OmniBOR is a lossy serialization format that only includes the minimum metadata field of "Identifier".
193 | {{% /notification %}}
194 |
195 | Current metadata formats, such as SPDX 2.x, as well as current systems to sign and transport metadata documents, do not *efficiently* support [our use case](#Objective) in the general case. They may well, however, support this use case in a specialized case, which we will discuss.
196 |
197 | An argument can be made that current metadata formats can enable run-time analysis of the complete artifact dependency graph. Achieving this would require (1) that generation of SBOM metadata be performed using compatible tooling by every project within the graph, (2) that the documents' distribution be consistent, and, crucially, (3) that a separate system exist to recursively fetch and parse metadata documents for all related projects and index them in a manner enabling efficient search.
198 |
199 | Let us look briefly at these three adoption requirements in more detail to understand the implications for volunteer-maintained open source projects, and at least one motivation for their hesitancy in uptake.
200 |
201 | 1. Current tooling to generate SBOM documents requires effort on the part of every OSS project maintainer to integrate with their build systems. While full SBOM generation *could* be integrated into compilers and linkers, as we propose for OmniBOR, many view the complexity as overly burdensome on small projects, [creating a source of friction](https://opensource.com/article/21/8/open-source-maintainers) that has hampered, and may continue to hamper, adoption. On the other hand, due to the pervasiveness of Git itself, we believe a minimalist approach that *already feels familiar* will be better received by this long tail of OSS projects.
202 |
203 | 2. One obstacle in the distribution and adoption of SBOMs has been competing standards (see the "Landscape" document for examples in addition to SPDX). By proposing to capture only the bare minimum metadata necessary to enable this scenario, we believe this proposal will avoid the ongoing debates about competing standards. *N.B.: Early socialization of this idea has received fairly wide support for the principle of a minimalist disk-based representation of the artifact dependency graph.*
204 |
205 | 3. Run-time comparison, as described in the Objective, must be within the capabilities of even small and independent consumers of open source. A proposal that requires large investments in infrastructure (e.g., that an operator maintain a database containing complete SBOM documents for the totality of open source) will not be seen as a reasonable requirement for smaller and independent organizations (even though it may make for a very compelling product offering, were someone to build and license it!).
206 |
207 | ### How will this intersect with reproducible build systems?
208 |
209 | ***TODO***
210 |
211 | ### Does this play well with In-Toto?
212 |
213 | ***TODO: Santiago***
214 |
215 | ### OmniBOR and SWID
216 |
217 | ### OmniBOR and pURL
218 |
219 |
220 | ## Examples
221 |
222 | ### Example: hello.c
223 |
224 | Imagine we have the following two files:
225 |
226 | `hello.c` has gitoid `c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59` and includes `stdio.h`
227 |
228 | `stdio.h` has gitoid `c0f35b8ae567f5348df3711496fdc0ef6f634169`
229 |
230 | From these two inputs, we compile `hello.o`. The resulting OmniBOR is a document (text file) containing the lexically ordered sequence of the gitoids of each input artifact related to this build step:
231 |
232 | ```
233 | blob⎵c0f35b8ae567f5348df3711496fdc0ef6f634169\n
234 | blob⎵c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59\n
235 | ```
236 |
237 | The gitoid of the resulting document is `85322091b1d50a23d1c2a0f5933788a2a958f2ad`, and this document is written out to disk in a directory in the build environment, e.g.:
238 |
239 | ```
240 | ./.bom/object/85/322091b1d50a23d1c2a0f5933788a2a958f2ad
241 | ```
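
The construction above can be sketched roughly as follows. The sketch assumes that the `⎵` in the listing stands for a single space, that each line ends in a newline, and that gitoids are SHA-1 Git blob IDs; the helper names are hypothetical. Whether the printed identifier matches the one quoted above depends on those byte-layout assumptions.

```python
import hashlib


def gitoid(content: bytes) -> str:
    """Git blob object ID: SHA-1 over 'blob <length>', a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def omnibor_document(input_gitoids):
    """One newline-terminated 'blob <gitoid>' line per input, in lexical order."""
    return "".join(f"blob {g}\n" for g in sorted(input_gitoids)).encode("ascii")


def object_path(doc_gitoid, root="./.bom/object"):
    """Split the identifier into a two-character directory and the remainder."""
    return f"{root}/{doc_gitoid[:2]}/{doc_gitoid[2:]}"


inputs = [
    "c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59",  # hello.c
    "c0f35b8ae567f5348df3711496fdc0ef6f634169",  # stdio.h
]
document = omnibor_document(inputs)
print(document.decode("ascii"), end="")
print(object_path(gitoid(document)))
```

Sorting the inputs before writing them makes the document, and therefore its gitoid, independent of the order in which the compiler happened to read its inputs.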
242 |
243 | The compiler would also embed this gitoid in a new ELF section of the resulting `hello.o` binary; this adds a total of 89 bytes when accounting for ELF section formatting.
244 |
245 | ### Example: OCI v2 / ORAS
246 |
247 | Imagine we have the following Dockerfile:
248 | ```docker
249 | FROM :
250 | RUN
251 | ```
252 |
253 | We calculate the hash of `:`, which is: `000TODO`.
254 |
255 | Things get a little trickier when we go to calculate the hash of the next layer.
256 |
257 | Also, we want to produce an artifact dependency graph that can reference the omnibor of any artifacts added to that layer, not merely a hash of the whole layer. We'll do that by ... *TODO* ...
258 |
259 | Combining these together, we produce the following OmniBOR document:
260 | ```
261 | blob_000TODO
262 | blob_000TODO
263 | ```
264 | ... and embed the gitoid of this omnibor in the image manifest's `annotations` field, like so:
265 |
266 | ```
267 | {
268 |   "schemaVersion": 2,
269 |   "config": {...},
270 |   "layers": [ {...}, {...} ],
271 |   "annotations": {
272 |     "omnibor": "sha256:abc123TODO"
273 |   }
274 | }
275 | ```
276 | {{% notification type="info" %}}
277 | **NOTE**: The annotation type 'omnibor' is not yet standardized or accepted to OCI. In the above snippet, 'omnibor' is merely an example.
278 | {{% /notification %}}
279 |
280 | ### Example: truncating a graph for non-public subgraphs
281 |
282 | ***TODO***
283 |
284 | ### Example: very large build systems (e.g., Linux)
285 |
286 | ***TODO***
287 |
288 | ## Proposed Implementation
289 |
290 | ### For Compiled Artifacts
291 |
292 | **TODO:** Replace / reformat examples as a specification
293 | - *Describe implementation for GCC*
294 | - *Describe implementation for LLVM*
295 | - *Address container image composition*
296 |
297 | ### For Non-compiled Artifacts
298 |
299 | **TODO**
300 | - *Address run-time compiled languages, such as python and java*
301 |
302 |
303 | ## Credits and Gratitudes
304 |
305 | I must thank Ed Warnicke, who pitched this idea to me one sunny summer afternoon in 2021 while I was stuck in Puget Sound traffic, and who graciously accommodated my awkward schedule throughout the rest of the year, most often while both of us were in a car.
306 |
307 | I must also thank everyone who provided input and feedback to my "Open Source Landscape" document in 2021, which I have since migrated to a [GitHub repo](https://github.com/AevaOnline/supply-chain-synthesis). The knowledge I gained through those discussions allowed me to identify a tool that was missing from my "supply chain backpack": the OmniBOR.
308 |
--------------------------------------------------------------------------------
/content/glossary/gitref.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
--------------------------------------------------------------------------------