├── Glossary
│   └── .gitkeep
├── Scientific articles
│   └── .gitkeep
├── White paper
│   ├── user_journey_images
│   │   ├── readme.md
│   │   ├── 1_Precondition.jpg
│   │   ├── 2_Members.jpg
│   │   ├── 3_Members_Policies.jpg
│   │   ├── 4_Policies_Agreed.jpg
│   │   ├── 5_Sovereignity_Loss.jpg
│   │   ├── 6_What_is_a_Data_Space.jpg
│   │   ├── 7_Data_Transfer.jpg
│   │   ├── 8_Source_Connectors_Connection.jpg
│   │   └── 9_Data_Space_all_members.jpg
│   └── A User Journey to Dataspaces.md
├── Dataspaces
│   ├── images
│   │   ├── Dataspaces_mental_model.png
│   │   ├── Figure_1_Essentials.png
│   │   ├── Figure_2_ParticipantAgents.png
│   │   ├── Figure_3_Identity_and_Trust.png
│   │   ├── Figure_4_FCN_FCC.png
│   │   ├── Figure_5_Connectors.png
│   │   ├── Figure_6_ContractNegotiation.png
│   │   └── Figure_7_DataTransferProcess.png
│   ├── Dataspace Context Model and Conceptual Architecture.md
│   ├── Sovereign enterprise data sharing based on dataspaces.md
│   └── Dataspaces Vocabulary and Operations.md
├── Identity Management
│   ├── images
│   │   ├── did-edc.png
│   │   └── parts-of-a-did-2.svg
│   └── DID_EDC.md
└── LICENSE
--------------------------------------------------------------------------------
/Dataspaces/Dataspace Context Model and Conceptual Architecture.md:
--------------------------------------------------------------------------------
## Overview

This document provides an overview of a _context model_ for a dataspace. A context model provides a conceptual backdrop for the architecture and implementation of a software system. Since a context model organizes and informs a codebase, the concepts it introduces may not translate directly into business terminology. For example, Kubernetes has the concepts of operators and controllers; those concepts may have no direct expression in the business terminology used to describe a service built on that infrastructure. However, it must be possible to map a business requirement to a context model. The context model described in this document is implemented by the EDC.

## Dataspaces in a Nutshell

All dataspaces can be expressed by the following context model:

![](./images/Figure_1_Essentials.png)

**Figure 1: The essentials of a dataspace**

Participant agents are software systems that perform a specific operation or role in a dataspace. The following illustrates the different types of participant agents that may exist in a dataspace:

![](./images/Figure_2_ParticipantAgents.png)

**Figure 2: Participant Agent types**

The participant agent types are:

- **Federated Catalog Node:** A system that publishes _assets_ provided by a participant in a dataspace. Publishing makes the assets available for discovery. A participant may choose to make an asset available to a subset of other participants using an _access policy_ and impose usage requirements with a _usage policy_ (see the policy sketch after this list).

- **Federated Catalog Crawler:** A system that discovers assets published by other participants in a dataspace. The result of a crawling operation is a collection of assets the crawling participant has access to. Access is determined by the provider participant and may include evaluation of access policy and usage policy against a set of verifiable credentials.

- **Connector:** A system that performs _contract negotiation_ and _asset sharing_ (data transfer or compute-to-data) on behalf of a participant.

- **Application:** A custom system that performs some role in the dataspace, for example, a supply-chain parts tracking service.
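To make the two policy types concrete, the following is a minimal, illustrative sketch of an access policy and a usage policy expressed in ODRL-style JSON (the policy language family the EDC draws on). The constraint names (`MembershipCredential`, `storageRegion`) are hypothetical placeholders, not a normative vocabulary:

```
{
  "accessPolicy": {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "permission": [{
      "action": "use",
      "constraint": [{
        "leftOperand": "MembershipCredential",
        "operator": "eq",
        "rightOperand": "active"
      }]
    }]
  },
  "usagePolicy": {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "permission": [{
      "action": "use",
      "constraint": [{
        "leftOperand": "storageRegion",
        "operator": "eq",
        "rightOperand": "EU"
      }]
    }]
  }
}
```

The access policy determines who may discover and negotiate for the asset at all; the usage policy travels with the resulting contract and constrains what the consumer may do with the data afterwards.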
### Identity and Trust in a Dataspace

In _**Figure 1**_ (above), the notion of a _Dataspace Authority_ was introduced. The Dataspace Authority is responsible for approving one or more identity providers that serve as trust anchors in a dataspace. The Dataspace Authority is an optional role; a dataspace may exist where there is no central authority, or it may be composed of autonomous actors with no centralized decision-making process.

There is, however, at least one _Identity Provider_ associated with a dataspace since all _participant agents_ are identifiable. This is an important distinction: while a participant has an identity, all participant agents also have a unique identity. Furthermore, the participant agent identity may be hierarchically related to the participant identity, thereby making it possible to establish trust chains. Consider the following scenario, which underscores why this distinction is important. Company A may have two participant agents located in different geographic regions. Based on their location, one participant agent may access geospatially restricted data the other agent cannot access. Access policy would be determined using verifiable credentials tied to the participant agent identity.

An identity provider may be centralized, distributed, or a combination of the two:

![](./images/Figure_3_Identity_and_Trust.png)

**Figure 3: Identity and trust in a dataspace**

In a dataspace with one centralized identity provider, both Participant A and Participant B would share the same provider.

### Catalog Participant Agents

There are two types of _Catalog Participant Agent_: the Federated Catalog Node (FCN) and the Federated Catalog Crawler (FCC). The FCN is used to publish assets to a dataspace. The details of publishing are described in the following section on contract negotiation. It is important to note that the EDC-based FCN is not an asset repository in the classic sense. Rather, it is an index of assets and pointers to content stored in diverse systems such as cloud object storage, databases, and other infrastructure. The role of the FCN is to make that index available for discovery by other participants.

The FCC is a participant agent that queries (or crawls) other FCNs in a dataspace. It may be required to present verifiable credentials used to determine which assets are visible to it. A naïve implementation of an FCC could perform real-time crawling in response to a query made by an end user. This would not scale for dataspaces of any significant size. The EDC FCC, in contrast, performs periodic crawling operations of other FCNs and updates a local, queryable cache. The following diagram illustrates the relationship between the FCC and FCN:

![](./images/Figure_4_FCN_FCC.png)

**Figure 4: The FCC and FCN in a dataspace**

### Connector Participant Agents

A Connector is a specialized participant agent that functions as the asset sharing infrastructure in a dataspace. Connectors may share diverse assets such as data streams, API access, big data, or compute-to-data services. They may support push data transfers, pull data transfers, event streaming, pub/sub notifications, or a variety of other transfer topologies. The following outlines the role of the connector in a dataspace:

![](./images/Figure_5_Connectors.png)

**Figure 5: Connectors in a dataspace**

Asset sharing is performed in two distinct steps: _contract negotiation_ and _data transfer._ In the EDC, both contract negotiation and data transfer are implemented as asynchronous state machines. Processes transition through a series of states that are understood by the client and provider connectors. Some dataspaces may optimize the contract negotiation step by transitioning it automatically when an asset is requested. Other dataspaces (or, more precisely, participants) may implement a contract negotiation process backed by automated or human workflow. The role of the connector is to manage these processes and provide an audit history of all operations.
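As a rough illustration of the state-machine approach, a negotiation record on either side might evolve as sketched below. The state names and fields are simplified, hypothetical placeholders, not the EDC's actual persistence format:

```
{
  "negotiationId": "negotiation-4711",
  "counterPartyAddress": "https://provider.example.com",
  "assetId": "asset-123",
  "state": "REQUESTED",
  "stateHistory": [
    { "state": "INITIAL", "timestamp": "2022-06-01T10:00:00Z" },
    { "state": "REQUESTING", "timestamp": "2022-06-01T10:00:01Z" },
    { "state": "REQUESTED", "timestamp": "2022-06-01T10:00:02Z" }
  ]
}
```

Because every transition is persisted, the audit history mentioned above falls out of the design: the recorded sequence of states is the audit trail.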
### Contract Negotiation

TBD

![](./images/Figure_6_ContractNegotiation.png)

**Figure 6: Contract negotiation**

### Data Transfer

TBD

![](./images/Figure_7_DataTransferProcess.png)

**Figure 7: Data transfer process**

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
--------------------------------------------------------------------------------
/Dataspaces/Sovereign enterprise data sharing based on dataspaces.md:
--------------------------------------------------------------------------------
# 1. Introduction: Dataspaces: A Foundation for Enterprise Digital Sovereignty

To realize a viable data economy, companies and organizations of all types and sizes around the world need full control over their data, combined with the ability to share that data in a controllable and compliant way. They need an enterprise-grade architecture and solution that puts them in control of their data while sharing it with other stakeholders such as customers, the supply chain, the public, and in certain scenarios even their competitors.

This architecture begins with the concept of _**dataspaces**_. A given dataspace consists of one or more datasets, large or small, static or streaming, relational or not, that are configured for sharing. The dataspace also defines the syntax and semantics of the dataset it represents, the associated policies for sharing that dataset, the type of identity management used (e.g., whether it is centralized or self-sovereign/decentralized), the security mechanism used to protect that dataset, and last but not least, the laws/regulations and implicit or explicit contracts and policies in effect for (sharing) that dataset.

A dataspace can be formed by any number of parties interested in sharing data in a controlled way. This can be partnerships of multiple companies, or public–private partnerships. Dataspaces can be accessible to a closed set of members or be open and accessible to the public.
Participants in a dataspace usually organize around a central purpose, such as a company and its partners and supply chain, a specific industry, a geographically limited use case, or simply a limited interest group, which then forms the basis of the rules and policies for the dataspace. Individual members can be participants in multiple dataspaces and thus expose their data and services in multiple ways, with rules and policies depending on the requirements of each individual dataspace.

Dataspaces are not bound to a specific technology or processing location. They allow for data exchange across multiple clouds, on-premises environments, and the edge. A dataspace is not limited to a single data protocol. Depending on the scenario and technology used, it can accommodate blob data, unstructured data, relational data, streaming data, and a wide array of protocols and storage technologies.

A possible way to look at dataspaces is as the control plane for data sharing: controlling the flow, filtering, and application of policy on data transfers and data operations that happen in the data plane, while not being prescriptive about the infrastructure and processing layer. Any storage technology or provider can be incorporated in a dataspace.

## 1.1 Enterprise Digital Sovereignty and its Elements

Any organization will want to exert control over its data and other digital resources, to the extent possible, given the applicable laws/regulations, its organizational governance model, and the associated risk/reward analysis. This desired control could be about protecting those digital assets as well as generating value from them (by sharing them with others, for example). This means that the organization needs to assess its risk of losing control of, or access to, its data through the actions of various jurisdictions or other stakeholders. Control over data also requires digital mechanisms for enforcing the organization's policies, so that its data is shared only with its explicit consent (e.g., as part of a contract) or based on proceedings under the applicable jurisdiction in which the organization operates.

A key difference between political digital sovereignty and enterprise digital sovereignty is that with the former the jurisdiction typically applies to a geo-political boundary (which could be a province/state, a nation, or a political union such as the EU). An enterprise, by contrast, can be subject to many jurisdictions based on where it operates and with whom it trades (for example, a multi-national company with manufacturing and sales operations across several continents), whereas a government is only subject to the laws and treaties it has agreed to. The ability to identify and comply with the applicable laws and regulations is particularly important when crossing geo-political boundaries, because this may impact jurisdictional obligations which may need to be transferred to organizations with which data is shared.

At a more technical level, enterprise digital sovereignty requires an organization to have control over its digital identities, control over which digital identities (internal or external to the organization) have access to data, and the ability to store and process data on the platforms or infrastructures of its choice. Enterprise digital sovereignty also requires the ability to move or copy data between different platforms or infrastructures. In other words, enterprise digital sovereignty requires **identity control** (centralized, or decentralized for maximum sovereignty), **access control**, and **usage control**.
One of the important elements of enterprise digital sovereignty is the avoidance of lock-in. Dataspaces can help address the _**lock-in**_ concerns of digital platform users in a couple of ways. When implemented with inherently portable, cloud-native technologies such as containers and the associated orchestration engines, they provide a technological foundation for data sharing that is inherently platform neutral, addressing _**platform lock-in**_ concerns. And because they allow data to be shared using connectors that communicate with each other while running on the desired, but possibly different, platforms on each end of the wire, they demonstrate _**interoperability**_. In addition, given that the data is exchanged across the wire, dataspaces need to address _**data portability**_ facets such as syntactic and semantic portability, as well as data policy portability, defined as "the ability to transfer data between source and destination while complying with the legal, organizational and policy frameworks, including applicable data regulations in areas such as security and privacy." ([ISO/IEC 19941](https://standards.iso.org/ittf/PubliclyAvailableStandards/c079573_ISO_IEC_19944-1_2020(E).zip))

# 2. Dataspace Conceptual Model

Dataspaces can be an instrument for achieving enterprise digital sovereignty and, if used by governments, for maintaining political sovereignty. Dataspaces can leverage centralized or decentralized identity to provide the desired level of control over identity and to build access policies around it. Usage control is achieved by implementing policies and attaching them inseparably to the data. In addition, dataspaces that are built on top of cloud-native infrastructure offer code portability. By nature, dataspaces provide interoperability of data as well as of the metadata describing data sources and their access policies. Optional semantic models can help build a joint understanding of the meaning of data within a dataspace.

A dataspace is a cornerstone of enterprise digital sovereignty. It consists of one or more datasets, large or small, static or streaming, relational or not, that are designated for sharing. The dataspace also defines the syntax and semantics of the dataset it represents, the associated policies for sharing that dataset, the identity management system used (e.g., whether it is centralized or sovereign/decentralized), the security mechanism used to protect that dataset, and last but not least, the laws/regulations and the implicit or explicit contract in effect for sharing that dataset.

A dataspace is both an agreement and a supporting technical infrastructure (hardware and software) that enables data sharing between two or more participants.

A dataspace is unique with respect to other data sharing arrangements, such as a B2B data exchange, because it adheres to the following principles:

1. Participants maintain control (agency) over their identities
1. Participants maintain control (agency) over which other participants they trust
1. Participants maintain control (agency) over their data
1. All data sharing transactions are observable and verifiable

The sharing of the data in a dataspace is accomplished by _**dataspace connectors**_.
Each dataspace is represented by one or more such connectors, which facilitate the actual sharing of the data at runtime while enforcing the policies and requirements put in place by the data controller in the dataspace. A connector consists of executable code and other configuration and metadata artifacts that can be run on any cloud infrastructure, on premises, or on an edge device.

There will be complementary documents describing how dataspaces and connectors work, and how various data controllers, data users, and other stakeholders can come together to share data in a secure and sovereign fashion that satisfies their requirements and addresses the concerns of all stakeholders.

# 3. Dataspace Value Generation: Decentralized Data Applications and Collective Participants

Dataspaces can take on different levels of maturity. A basic dataspace may exist to exchange data in a peer-to-peer fashion. In this architecture, value is derived from sharing data on a 1:1 basis in an automated way.

A dataspace can evolve to provide value beyond peer-to-peer data exchanges by building collective data services and applications. A collective data service aggregates data from multiple participants to create a new offering. Consider an industrial dataspace with multiple supply-chain networks. The dataspace may collectively introduce a parts tracing service that aggregates data from participants to provide new processing capabilities. Each participant shares data with the parts tracing service and attaches specific usage requirements. One requirement could be that participants higher up in the supply chain do not have access to detailed parts data deriving from lower levels in the chain.

In traditional data exchange systems, the "Parts Tracing" application would typically be built using a centralized database. A major drawback of this approach is that the central organization running the database would have access to all sensitive supply chain data. By contrast, a dataspace is fundamentally distributed. In a dataspace, the parts tracing application would be partitioned and hosted by participants on disparate computing infrastructure. The dataspace provides the connective fabric that ties the data partitions into a coherent whole.

Sometimes it will be necessary to expose the results of applying an algorithm or data application to data shared within the dataspace to the members of the dataspace, as a common data asset or service of the dataspace. In this case the dataspace needs one or more collective participants that represent data and services forming the collective value of the dataspace and not controlled by any individual member alone. Control of and rights to the collective results are usually governed by legal structures and policies put in place for the dataspace. Technically, those collective participants act like any other participant, following policies and exposing an identity.

--------------------------------------------------------------------------------
/Dataspaces/Dataspaces Vocabulary and Operations.md:
--------------------------------------------------------------------------------
# Dataspaces Vocabulary and Operations

## How to read this document

EDC-specific terms and concepts are marked with bold text.
The terms and concepts come from two separate but related sources:
- The business perspective of organizations that use dataspaces for B2B data sharing with their partners and customers
- The EDC's own architecture and implementation choices

## Dataspace Activities described in this document
- Establishing a Dataspace
- Joining a Dataspace
- Publishing Data
  - **Assets**, **Policies**, and **Contract Definitions**
- Discovering Data
  - Data Contracts
- Sharing Data
  - Establishing trust
  - Contract negotiation and data transfer

## Scenario Assumptions
- Multiple companies are members of a dataspace; they collaborate on sharing their data with each other. They are multi-nationals that need to protect their data across geopolitical boundaries.
- Need a way to advertise available data and usage requirements
- Need a way to enforce data usage requirements
- Need access control, since some data may be confidential to a subset of participants
- Each company must be in control of whom it trusts
- Need an interoperable way to transfer data across heterogeneous clouds
- Need a full audit history

# Establishing a dataspace

- Founding companies create a dataspace entity called a **dataspace authority**:
  - Establish rules of participation.
  - Optionally establish a set of third-party **verifier organizations**.
  - Define a common set of **policies** that must be supported by all members.
  - Define a **trust model**. A trust model may be centralized or decentralized (more on this later).
- The **dataspace authority** defines a registration process:
  - A web application where companies can apply for membership
  - Membership applications are certified by a verifier organization or an automated ruleset.
  - On verification, the new member company becomes a **participant**, and an entry is added to the **dataspace member registry**.
- The only software systems run by the dataspace authority are:
  - A custom registration web app
  - A **dataspace member registry** that contains information about the participants and serves as a participant directory (a sketch of a registry entry follows this list)
- **Decentralization** is the default operation mode of a dataspace; however, a centralized mode is possible, starting with a central Identity Service and potentially centralizing other services (Catalog, Observer, ...) as well.
- Companies adopt the EDC, a set of open-source software components.
- The EDC **dataspace member registry** is operated by the dataspace authority.
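As an illustration, an entry in the **dataspace member registry** might carry little more than the participant's identity and some administrative metadata; the field names below are hypothetical:

```
{
  "participantId": "did:web:company-a.com",
  "name": "Company A",
  "status": "ACTIVE",
  "joinedAt": "2022-04-01T00:00:00Z"
}
```

Everything else (connectivity information, service endpoints, the self-description) is resolved from the participant's DID rather than stored centrally, which keeps the registry minimal and the dataspace decentralized.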
# Joining a Dataspace

- The company chooses and deploys EDC components:
  - **Federated Catalog Service** – advertises data to other **participants** and discovers data made available by other **participants**.
  - **Connector** – orchestrates the sharing and transfer of data in a secure way with other **participants**.
- The components may be deployed in the cloud, on premises, or in a hybrid model.
  - The components can also be deployed across multiple clouds (for companies using more than one cloud provider).
- The member company applies to the **dataspace authority**.
  - The company provides its Decentralized Identifier (DID), which points to the participant's **self-description**, which includes connectivity information.
- On approval, the company is entered in the **dataspace member registry** and a Verifiable Credential (VC) is issued, proving membership in the dataspace.

# Roles within a dataspace participant

This list is a non-exhaustive example of the most common roles and might differ from your actual organizational setup.

- Data Officer
  - Responsible for data security and trust
  - Defines policies for access and sharing
  - Monitors and tracks adherence to policies
  - Wants to view data lineage:
    - Which data has been shared with other organizations
    - Which data has been consumed by their organization
- Data Owner
  - Acts as the steward of particular data sets
  - Registers one or multiple data sets as a **Data Asset** in the connector
- Data Consumer
  - Proves membership in the dataspace and adherence to policies
  - Requests and uses a **Data Contract**, which contains the **Usage Policies** and the **Data Asset**, from another participant
- System Integrator
  - A composite role, responsible for setting up and running EDC components

# Publishing Data Preliminaries

- Data is made available to other **Participants** as a **Data Contract**.
- A **Data Contract** always has an **Access Policy**, which defines which **Participants** have access to the contained **Data Asset**.
- A **Data Contract** always has a **Usage Policy**, which defines the duties, obligations, and rules that apply to the use of the contained **Data Asset**.

In order to access a **Data Contract**, a **Participant** must satisfy both the **Access Policy** and the **Usage Policy**.

# Publishing an Asset

- **Assets** are not published "raw"; they are wrapped in **Data Contracts**.
- **Data Contracts** are either **Contract Offers** (query & negotiation) or **Contract Agreements** (sharing).
- All **Data Contracts** have an **Access Policy** and a **Usage Policy**.
- If we had to define a separate **Access Policy** and **Usage Policy** for each **Data Asset**, that would be:
  - Tedious and error prone
  - A security risk
  - Difficult for the data officer to set corporate standards
  - Overly complex when defining different policies for the same asset (e.g., for different audiences)

The **Contract Definition** solves these issues.

## Contract Definition as a template for Data Contracts

A **Contract Definition** specifies:
- the **Access Policy** on a **Data Asset** – which **Participants** have access to it (e.g., "my partners", "any organization in Europe", "anyone", etc.)
- the **Usage Policy** – duties, rules, and obligations one must follow (e.g., "store data in Europe", "delete the data after 30 days", etc.)
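A minimal sketch of such a **Contract Definition**, loosely modeled on the shape of the EDC management API (property names vary across EDC versions, so treat every field here as illustrative):

```
{
  "id": "definition-partners-eu",
  "accessPolicyId": "policy-partners-only",
  "contractPolicyId": "policy-store-in-eu-delete-after-30-days",
  "assetsSelector": [{
    "operandLeft": "asset:prop:category",
    "operator": "=",
    "operandRight": "maintenance-data"
  }]
}
```

Here `accessPolicyId` references the **Access Policy**, `contractPolicyId` references the **Usage Policy**, and the **Asset Selector** picks every **Data Asset** whose category property matches. This is how one definition can cover many assets, as described next.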
A **Contract Definition** can be associated with many **Data Assets**:
- It follows a top-down model.
- A **Contract Definition** has an **Asset Selector** (this **Contract Definition** applies to these **Data Assets**).
- A **Data Asset** can be published under multiple **Contract Definitions** if it needs to be exposed with different **Policies**.

# Publishing Data in a Dataspace

## Step 1: Participant Setup
1. The **Participant** registers their DID with the **Dataspace Authority**, which issues a Verifiable Credential (VC) of membership to the **Participant**.
2. The **Participant** or their System Integrator/Operator installs the **EDC**, which can be run on premises or in the cloud, and then configures the deployment with the received VC and the pointer to the **Dataspace Participant Registry**.
3. The **Federated Catalog Node (FCN)** and the **Federated Catalog Crawler (FCC)** are configured (e.g., catalog storage, scheduling of the crawler).
4. The **FCN** publishes a list of **Data Contracts** that the **Participant** wants to make available to other **Participants** – see the next step for details.

## Step 2: Publishing Data
1. The data officer defines **Contract Definitions** in the EDC. A **Contract Definition** applies to all parts (individual data sources) of the **Data Asset** (**Asset Selector**). Examples:
   - "Can be accessed only by a given member company's partners" (**Access Policy**)
   - "Must be stored in Europe and used only for maintenance purposes" (**Usage Policy**)
2. The data owner creates a **Data Asset Entry** in the EDC.
   - The **Data Asset Entry** is not the actual data source; rather, it points to where the **Data Asset** is stored (e.g., object storage).
   - The EDC automatically associates the **Data Asset** with the **Contract Definition** in the system.
3. The **Data Contract** (**Data Asset** + **Contract Definition**) is now available to other **Participants** that satisfy the contained policies.

# Discovering Data: Participant Catalog Setup
1. Install and configure the **EDC, FCN, and FCC**:
   - The **EDC** receives the URL of the **Dataspace Participant Registry**.
   - The **FCN** is configured to store and publish catalog entries.
   - The **EDC** receives the membership VC to authenticate itself and its services as a valid member of the dataspace (e.g., when querying other **Participants'** catalogs).
   - The **FCC** is a scheduled task which runs regularly.
2. The **FCC** uses the **Dataspace Participant Registry** to find other **Participants**:
   - The **FCC** starts crawling the list of **Participants**.
   - For each **Participant**, the FCC reads the DID document of their DID to find the self-description (a service entry in the DID document), which provides all information necessary to access the services of other **Participants'** EDCs.
   - The **FCC** proves its dataspace membership and **Access Policy** attributes to other **Participants** through the issued VC and the self-description of the **Participant**.
3. The **FCC** crawls the **FCNs** of other **Participants** for available (published) **Data Contracts**:
   - If the credentials and self-description fulfill the **Access Policy** of a **Data Contract**, it becomes visible to the requesting **Participant** in their **FCN**.
4. The **FCC** caches the results of periodic crawling in the local **FCN** and provides means to query the local cache:
   - **Participants** query their local **FCN** for **Data Contracts** available in the dataspace.
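What such a query returns is, conceptually, a list of **Contract Offers**. A heavily simplified, illustrative sketch of one cached entry (the actual catalog format differs, and the field names are hypothetical):

```
{
  "offerId": "definition-partners-eu:asset-123",
  "assetId": "asset-123",
  "provider": "did:web:company-a.com",
  "usagePolicy": {
    "permission": [{
      "action": "use",
      "constraint": [{
        "leftOperand": "storageRegion",
        "operator": "eq",
        "rightOperand": "EU"
      }]
    }]
  }
}
```

Note that only offers whose **Access Policy** the crawler satisfied end up in the cache at all; the **Usage Policy** is carried inside the offer so that it can be negotiated and later enforced.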
# Sharing Data
1. A **Participant** queries their **FCN** cache for available **Data Contracts**.
2. The **Participant** selects which **Data Contract Offers** they would like to consume.
3. The providing and the consuming **Participant** negotiate the **Data Contract Agreement**:
   - The contract negotiation may be automatic or involve a manual workflow.
   - When the contract negotiation is completed, a **Contract Agreement** is created for the requested **Data Contract** and is preserved for future audit. The **Data Contract Agreement** contains the **Data Contract**, which contains the **Usage Policy**.
4. The data-consuming **Participant** initiates a data transfer request with the **Connector** of the providing **Participant**.
5. The **Connector** component orchestrates the data transfer using specific data transfer technologies, depending on the extensions in use and the underlying data storage and processing technologies:
   - The consuming and providing **Connectors** jointly orchestrate the transfer of data associated with the **Data Asset**.
   - Both **Connectors** record an audit history of the transaction.
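The transfer request in step 4 above is essentially a small message referencing the negotiated agreement and naming a destination. An illustrative sketch (hypothetical field names, not a normative EDC payload):

```
{
  "contractAgreementId": "agreement-789",
  "assetId": "asset-123",
  "connectorAddress": "https://provider.example.com",
  "dataDestination": {
    "type": "ObjectStorage",
    "bucket": "consumer-landing-zone",
    "region": "eu-west-1"
  }
}
```

The providing **Connector** validates the referenced agreement before moving any data, which is how the **Usage Policy** negotiated earlier stays attached to the transfer.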
![Mental Model of Data Transfer in a Dataspace](./images/Dataspaces_mental_model.png "Dataspaces Mental Model")

--------------------------------------------------------------------------------
/Identity Management/DID_EDC.md:
--------------------------------------------------------------------------------
# Decentralized Identifiers and the Eclipse Dataspace Connector

# Introduction

## Purpose and Target Audience

This document assumes a basic understanding of the concept of International Data Spaces and some knowledge of the Eclipse Dataspace Connector within that context. Basic acquaintance with the mechanisms of public key infrastructures (PKI) and the domain name system (DNS) is also required.

Its purpose is to provide an overview of what Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) are, why dataspaces need them, and how the Eclipse Dataspace Connector (EDC) leverages DIDs and VCs to implement decentralized identity management.

## The Problem of Identity

Dataspaces enable the discovery and sharing of data between participants. Any such transaction is predicated on trust on many levels: trust that the data is accurate, that the transmission is secure, that agreed usage policies will be adhered to, etc. But the first and most basic principle of trust must always be trust that the entity one is dealing with truly is who they claim to be. And trust needs continued verification – after all, today's trusted third party (which is only verified once) could become tomorrow's attack vector.

In a dataspace, one might want to discover and use data from (or share data with) entities one does not have a previous legal relationship with, and do so in an automated way. To enable this, there needs to be a system of identity verification – a system that ideally would come without the need to trust a central authority. After all, who verifies the identity of the verifier?

But identity may not be enough. A verified entity might want to make claims about itself, claims to certain accreditations by third-party authorities (e.g., business licenses, GDPR compliance, carbon neutrality certifications, etc.). Such claims also must be verifiable. And, again, ideally without the need to contact any third party, whose identity in turn would have to be verified (and whose continued existence would be paramount), and who could potentially act as a gatekeeper and exclude some entities from participating in the market (for whatever reason).

Decentralized Identifiers aim to solve these challenges.

# Decentralized Identifiers

Decentralized Identifiers (DIDs) are globally unique identifiers that can be created and fully controlled by the owner (subject) of the identifier itself without the need for external authorities. As such, they are self-sovereign identities. They give entities (individual people, corporations, or digital entities like an EDC) the ability to decouple sensitive information from the identifier itself, make it publicly discoverable (through various methods, see below), and enable verifiable credentials through the use of cryptographic proof systems (e.g., PKI).

A DID is a simple three-part string:

![DID three-part string](images/parts-of-a-did-2.svg)

(Source: https://www.w3.org/TR/did-core/)

The **scheme** simply identifies the string as a DID. The **DID method** specifies the mechanism used to create, resolve, update, and deactivate (CRUD) the DID itself as well as its associated DID document (see below). The **DID method-specific identifier** specifies where the DID document may be found.

There are numerous ways DIDs may be implemented, and there is a growing list of such implementations. These implementations are called methods. They each have their own specification and are usually associated with specific verifiable data registries (see Verifiable Credentials below). **DID resolvers** are used to handle each specific method implementation for locating the document associated with a DID. Later in this document, one such method, DID:Web, will be introduced in the context of the EDC.

Beyond the identifier string itself, the second integral part of a DID is its associated DID document. This is a simple JSON file that contains information associated with the DID, such as the public key of its owner (also known as the DID controller) and the location where further information (such as verifiable credentials) may be found (a service endpoint). A sample DID document looks like this:

```
{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/suites/ed25519-2020/v1"
  ],
  "id": "did:example:123456789abcdefghi",
  "authentication": [{
    // used to authenticate as did:...fghi
    "id": "did:example:123456789abcdefghi#keys-1",
    "type": "Ed25519VerificationKey2020",
    "controller": "did:example:123456789abcdefghi",
    "publicKeyMultibase": "zH3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
  }]
}
```
(Source: https://www.w3.org/TR/did-core/)

If another entity supplies a verifiable credential that was signed by the issuing authority with its private key, one can easily discover the public key and check whether the credential really was issued by that authority.

# Verifiable Credentials

Verifiable Credentials are a W3C standard. A verifiable credential in essence is a claim that an entity (the **holder** of the credential) makes about itself, which another entity (the **issuer** of the credential) bestowed upon that holder, and that a third entity wants to check (the **verifier**).

![verifiable credentials ecosystem](images/ecosystem-2.svg)

(Source: https://www.w3.org/TR/vc-data-model/)
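A verifiable credential is itself a small JSON(-LD) document. The following sketch loosely follows the W3C data model; the credential type and the membership claim are invented for illustration:

```
{
  "@context": "https://www.w3.org/2018/credentials/v1",
  "id": "https://dataspace.example.com/credentials/42",
  "type": ["VerifiableCredential", "DataspaceMembershipCredential"],
  "issuer": "did:web:dataspace-authority.example.com",
  "issuanceDate": "2022-06-01T00:00:00Z",
  "credentialSubject": {
    "id": "did:example:123456789abcdefghi",
    "membership": "Example Dataspace"
  },
  "proof": {
    "type": "Ed25519Signature2020",
    "created": "2022-06-01T00:00:00Z",
    "verificationMethod": "did:web:dataspace-authority.example.com#key-1",
    "proofPurpose": "assertionMethod",
    "proofValue": "z3FXQ..."
  }
}
```

The `proof` block is what the **verifier** checks: its `verificationMethod` points into the issuer's DID document, where the matching public key can be discovered as described in the previous section.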
As an example, consider the case of a driver's license. A public authority (e.g., a department of motor vehicles) issues a driver a license to operate a motor vehicle. In a routine check, a police officer may want to verify the authenticity of the license presented by the driver. They would do so by checking with the public authority through a data registry. Verifiable credentials as envisioned by the W3C transfer that process in its entirety to the digital world in a machine-readable way.

In the above diagram, the verifiable data registry can be implemented using a DID method.

# Identity Hubs

The holder of verifiable credentials as described above needs some way to store those credentials digitally and securely, and ideally in such a way as to allow programmatic access to them by verifiers. This is where identity hubs come into play.

Taking the driver's license example from above, an identity hub fulfills the role of the driver's wallet, where the license is kept (the concept of digital wallets is also widely known today in the cryptocurrency world for securely storing one's digital assets). But an identity hub provides more than just storage of verifiable credentials.

![identity hub](images/topology-2.svg)

(Source: https://identity.foundation/identity-hub/spec/)

As shown in the topology above, in addition to storing verifiable credentials, identity hubs (through standardized APIs) enable the exchange of messages between participants.

Identity hubs act as the service endpoints defined in DID documents:

```
{
  "id": "did:example:123",
  "service": [{
    "id": "#hub",
    "type": "IdentityHub",
    "serviceEndpoint": {
      "instances": ["https://hub.example.com", "https://example.org/hub"]
    }
  }]
}
```
(Source: https://identity.foundation/identity-hub/spec/)

# DID:Web

DID:Web is a specific DID method. It is based on the domain name system (DNS) and allows participants to bootstrap trust based on a domain's existing reputation. Essentially, by trusting the domain resolution, a DID document stored at a specific location on a web server is considered trustworthy because the entity in control of the domain also has control over the documents posted on the servers in its domain.

Taking an example DID of *did:web:example.com*, the associated DID document would look like this:

```
{
  "@context": "https://www.w3.org/ns/did/v1",
  "id": "did:web:example.com",
  "verificationMethod": [{
    "id": "did:web:example.com#owner",
    "type": "Secp256k1VerificationKey2018",
    "owner": "did:web:example.com",
    "ethereumAddress": "0xb9c5714089478a327f09197987f16f9e5d936e8a"
  }],
  "authentication": [
    "did:web:example.com#owner"
  ]
}
```
(Source: https://w3c-ccg.github.io/did-method-web/)

A DID:Web resolver would simply follow the standardized path of https://example.com/.well-known/did.json to retrieve this DID document. Alternatively, if a specific path were provided in the DID, as in *did:web:example.com:did-storage:did1*, the resolver would look for the DID document at https://example.com/did-storage/did1/did.json.

DID:Web is a very simple, but also practical, solution.
DID:Web is a very simple but practical solution. It avoids the complexity of blockchains, yet provides an easy path to later switching to a different DID method (which could also be a blockchain-based one). DID:Web uses an existing distributed trust network: DNS. Note that this does, of course, come with its own set of security questions (a hijacked local hosts file pointing to a fraudulent IP address for the domain comes to mind). It is, however, a solution that can be implemented relatively easily today and that relies only on well-known, mature, and readily available technology.

# The EDC Solution

The Eclipse Dataspace Connector implements decentralized identifiers and uses DID:Web as the default method. However, as the EDC is extensible with custom modules, additional DID methods can easily be added. It even supports replacing decentralized identity with centralized identity providers, should a dataspace desire to do so.

The identity verification process looks like this (both "Participant" and "Provider" are participants in a dataspace):

![EDC DID workflow](images/did-edc.png)

The two entities iterate through the following steps:
1. The participant EDC sends its DID to the provider.
2. The provider checks if the domain provided with the DID can be resolved and is trusted (e.g., not part of some blacklist).
3. See step 2.
4. The provider resolves the DID provided by the participant (did:web:foo.com:guid).
5. The provider requests the EDC DID document from the participant's web server.
6. The participant's web server provides the EDC DID document.
7. The provider resolves the participant's general company DID (did:web:foo.com).
8. The provider's DID:Web resolver forwards the EDC DID document to its own EDC.
9. The participant's web server provides the company DID document.
10. The provider's DID:Web resolver forwards the company DID document to its own EDC.

Optionally, corporate verifiable credentials may be exchanged and verified:

11. The provider requests the corporate authorizations from the participant.
12. The participant provides its credentials.
13. The provider verifies the corporate authorizations (note that this would involve another iteration of DID verification, but external to the dataspace).

And the original flow continues:

14. The provider verifies the participant's identity and claims.
15. The provider acknowledges the identification to the participant.

The EDC's modular architecture allows different DID resolvers to be added easily, making it effortless to switch between DID methods, to use several methods concurrently in a hybrid setup, or even to use a universal DID resolver to manage DID resolution externally. One could also (e.g., for purely intra-organizational purposes) use a DID resolver backed by a centralized identity solution.
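As a purely hypothetical illustration of that modularity (the EDC's actual extension points are Java interfaces and differ in detail), a resolver registry that dispatches on the DID method could look like this:

```
# A hypothetical sketch of pluggable DID resolution; it illustrates the
# dispatch-by-method idea, not the EDC's actual extension interfaces.
from typing import Callable, Dict

# A resolver maps a DID string to its parsed DID document.
Resolver = Callable[[str], dict]


class DidResolverRegistry:
    """Dispatches resolution to whichever resolver handles the DID method."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, Resolver] = {}

    def register(self, method: str, resolver: Resolver) -> None:
        self._resolvers[method] = resolver

    def resolve(self, did: str) -> dict:
        method = did.split(":")[1]  # "did:web:example.com" -> "web"
        if method not in self._resolvers:
            raise LookupError(f"no resolver registered for did:{method}")
        return self._resolvers[method](did)


# Hybrid setup: did:web is handled locally (resolve_did_web from the sketch
# above), while other methods could be registered alongside it.
registry = DidResolverRegistry()
registry.register("web", resolve_did_web)
```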
# References & Resources

EDC project: https://projects.eclipse.org/projects/technology.dsconnector

EDC GitHub repository: https://github.com/eclipse-dataspaceconnector/DataSpaceConnector

EDC YouTube channel: https://www.youtube.com/channel/UCYmjEHtMSzycheBB4AeITHg

Decentralized Identifiers: https://www.w3.org/TR/did-core/

DID:Web: https://w3c-ccg.github.io/did-method-web/

Verifiable Credentials: https://www.w3.org/TR/vc-data-model/

Identity Hub: https://identity.foundation/identity-hub/spec/

--------------------------------------------------------------------------------
/White paper/A User Journey to Dataspaces.md:
--------------------------------------------------------------------------------

## A User Journey to Data Spaces

### As of Q2/2022

Purpose of the document:

This document aims to demonstrate the necessity and functions of data spaces.

# Table of contents

_**Abstract**_

_**Introduction**_

_**User Story – Part I**_

_**User Story – Part II**_

Authors:

Natascha Totzler ([natascha.totzler@gmail.com](mailto:natascha.totzler@gmail.com)) and

Günther Tschabuschnig ([guenther.tschabuschnig@gmail.com](mailto:guenther.tschabuschnig@gmail.com))

The present document was published under an [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0.html) and follows the rules of the Eclipse Foundation.

# Abstract

Every day, humankind generates, measures, and manages billions of new data points: environmental data, health data, social media data, etc.

When it comes to data use and data sharing, however, corporate entities quickly reach their limits. The example presented here shows how organizations can actually turn these limits into new data-driven business cases.

Since our world is increasingly becoming data-driven and we make products out of data, it is essential to also provide a corresponding technological environment. The tools mentioned in this document, such as data spaces or connectors, are the building blocks that can be applied to create, use, and participate in a data ecosystem. In this process, particularly high value is placed on the sovereignty, trust, and interconnectedness of data.

In our example of building data spaces, we use a digital product passport (DPP) that illustrates the traceability of a supply chain, focusing on the subject of CO2. This example demonstrates how data-driven innovation and the building blocks of the data economy can promote product improvements or even new products.

"_Without data, you're just another person with an opinion._"

_- W. Edwards Deming_

# Introduction

A regional organic grocery store operator wants to raise consumer and supplier awareness of the carbon footprint of groceries. The goal is to encourage suppliers to produce and offer groceries that are more climate-friendly and sustainable in the long term, as research suggests that consumers are more likely to purchase an item if they are provided with information about its carbon footprint.

The product to be created is a data-based calculation of the CO2 emissions of every individual food item, in the form of a badge placed on the product label of all private-label products.
The organic grocery store operator has a wide range of food items. The product range mainly comprises basic food items and dairy products. As the intended project is very extensive, the company decides to run a proof of concept (POC) using the supply chain of a single food item.

The generally popular Gouda cheese from organic farming is chosen for the proof of concept.

Alex, the responsible project manager, identifies all stakeholders and determines what is needed to calculate the carbon footprint of this food item. Initial research delivers the following key parameters:

- CO2 emissions for the milk needed for a 350g package of Gouda cheese
- CO2 emissions of cheese production
- CO2 emissions for packaging
- CO2 emissions for warehousing and logistics to the grocery store

Since these parameters are not readily available figures but are based on various calculations, the project manager visualizes which data sets are needed and from whom:

![](./user_journey_images/1_Precondition.jpg)

It quickly becomes apparent that this calculation, even for a single food item, involves many different data sources and data suppliers. Even if Alex collected all the necessary data herself in a one-time effort, she would still need help from external parties to do the calculation.

So she rolls up her sleeves, contacts her suppliers, and gets everyone on board. Everybody is interested in principle in participating in the project and eager to see whether it is feasible at all, even if some of the partners are skeptical about whether the data will only be used as agreed.

The organic farm and the organic cheese producer approve of transparency and awareness-raising and, moreover, hope for a long-term increase in sales of their products. The data-savvy logistics partner is supportive of this goal and also sees the potential to optimize logistics and reduce CO2 emissions (and costs) based on the collected data. After all, the data sharing does not have to be a one-way street in the long run.

While the data suppliers are determining all the necessary figures, Alex is looking for an adequate data processor and data intermediary. (A data intermediary is an organization that processes data on behalf of another organization; it manages data in a specific way and ensures a certain degree of trust with regard to the use of the data.) What she needs is a service company that can prepare data from different sources for further processing, in terms of both content and quality, and also do the required calculations such as aggregation, anonymization, and analysis.

Since this is a proof of concept and no established data exchange contracts are available, an NDA is drafted and contractual agreements for sharing data in this project are arranged. Alex ensures that there is a digital storage location to save the data. Due to the uniqueness of the project, all stakeholders agree to save their data in the cloud storage run by the organic grocery store operator, which the data processing company can also access.

All contracts are signed, the data is stored in the cloud storage of the organic grocery store operator and subsequently processed, and voilà: the Gouda cheese gets its first carbon footprint figure.
To be able to use the successful proof of concept for management and marketing purposes, Alex turns to the graphics department, who develop the corresponding packaging design, i.e. the future product labels.

The presentation of the label design is well received, and Alex gets the order to develop the design of carbon footprint labels for all private-label products.

# User Story – Part I

Alex is now facing the challenge of getting all suppliers and supply-chain participants on board. However, the requirements are much higher this time than they were for the proof of concept, because the complexity of the project has increased enormously. This is due to the continuous data streams and a substantially greater number of participants in the future data-based product.

Alex identifies the following new requirements:

- First and foremost, the process must not require any manual steps, because that would not be feasible. Among other things, this means that the determined values must be sent automatically to the packaging producer so that the calculated carbon footprint can be printed on the product labels. Therefore, the data stream must flow in both directions: to and from the data processor. In the case of Gouda cheese, it would look like this:

![](./user_journey_images/2_Members.jpg)

- Besides that, the data suppliers require that they be able to define which data set may be shared, with whom (access policy), and under what conditions (usage policy). After all, they are dealing with confidential production data that can cause considerable damage in the wrong hands. In the case of Gouda cheese, usage policies and access policies need to be included in multiple places:

![](./user_journey_images/3_Members_Policies.jpg)

- In addition, the exchange of data has to be automatic and only possible if specific conditions are fulfilled, i.e. only if the data recipient has accepted the policies of the data provider.

![](./user_journey_images/4_Policies_Agreed.jpg)

- Unfortunately, in our case all of this cannot yet be fully automated, due to the way the data was collected in the first place, i.e. by uploading it to the cloud storage of the organic grocery store operator. At the time of delivering the data to that cloud storage, the respective data supplier had no way of controlling whether the data would be used only for the agreed purpose, or who could view or access the data in the cloud storage. However, every data supplier wants to retain control and make an explicit decision as to who may receive, view, or process their data (the data autonomy issue). For the data supplier to be able to do so, the conditions of data exchange must be clearly defined and negotiated before the data is sent, as well as continuously monitored and recorded during the exchange.

For the proof of concept, however, data was delivered directly to the cloud storage of the organic grocery store operator. This procedure was acceptable in this case: all parties had agreed to it, it was a small and clearly defined circle of participants, the one-time data transfer was not business-critical, and there was general mutual trust among the participating parties.
This resulted in the following picture:

![](./user_journey_images/5_Sovereignity_Loss.jpg)

Creating a data-based product such as a carbon footprint label for every packaged product, and producing such labels on a large scale for all private-label products of the organic grocery store with participants who do not know each other, requires a completely different type of architecture and data transfer: a digital room in which business partners can find each other, access data descriptions (= metadata), and agree on conditions for data exchange (= policies) before any actual data exchange takes place. This digital room is known as a data space:

![](./user_journey_images/6_What_is_a_Data_Space.jpg)

Participants in a data space can access all shared data descriptions (metadata), but not yet the contents of the data files! This is comparable to a book cover providing the title, the author, and a short description (= metadata), and a price tag stating the conditions for access to the contents (= usage and access policies). In our example, the metadata set could be titled "Milk production per cow per year", provided by "farmer XY", offered under the usage condition (usage policy) that "data may only be stored in the EU" and the access condition (access policy) "for members of the organic grocery stores club". If an interested party complies with these conditions, the technological basis is established to enable the data transfer (i.e., connectors exchange the data between the two partners).

![](./user_journey_images/7_Data_Transfer.jpg)

On a basic technological level, this means that the participants keep their respective data and data sources and only grant other participants access to the respective metadata and policies. Only once the policies have been complied with does the data transfer take place, by means of connectors. A connector is thus a technical building block that is linked to a data source on one side of the exchange and can connect with other connectors under the defined conditions on the other side. These connectors are not merely adapters; they also ensure that the policies are being complied with. For example, a connector can carry the information that the data source linked to it is located in the EU; otherwise, it is not possible to agree to the policies that govern the data transfer, and no data transfer can take place. (Data Exchange Service) A sketch of what such a policy could look like in machine-readable form follows the figure below.

![](./user_journey_images/8_Source_Connectors_Connection.jpg)
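As an illustration, the conditions from the Gouda example could be written down in a machine-readable form roughly like the sketch below. It is loosely modeled on the ODRL vocabulary that is commonly used to express such policies; all identifiers and values, in particular the "group" constraint, are illustrative assumptions rather than attributes of any real data space:

```
{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "@type": "Set",
  "uid": "urn:policy:milk-production-per-cow-per-year",
  "permission": [{
    "target": "urn:asset:milk-production-per-cow-per-year",
    "action": "use",
    "constraint": [{
      "leftOperand": "spatial",
      "operator": "eq",
      "rightOperand": "EU"
    }, {
      "leftOperand": "group",
      "operator": "isPartOf",
      "rightOperand": "organic-grocery-stores-club"
    }]
  }]
}
```

A connector would only carry out the transfer if the requesting side can demonstrably satisfy every constraint in such a policy.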
**Summary:**

The need for data spaces quickly becomes apparent when it comes to sharing data in a trusted environment with partners outside of one's own organization. Our increasingly data-driven world creates products from data; therefore, it is essential to also create a suitable technological environment. Neutral, decentralized digital spaces (data spaces) are not controlled by any one of the parties but instead provide a fair, cooperative playing field for all participants, enabling them to exchange metadata and agree on access and usage policies. From a technological point of view, connectors form the basis for such a data exchange.

# User Story – Part II

Alex managed to bring together all the suppliers of food items in the private-label product line. The resulting map is remarkable for this one use case alone! Numerous suppliers and logistics partners deliver data under their own terms and conditions, with the aim of calculating the carbon footprint of an item and providing it on the product packaging. Not only is data supplied from individual data sources; it is also processed by a specialized company for the purpose of data analysis and then fed back so that the resulting values can be printed on each product label. Due to the large number of participants, manual administration of the data transfers is no longer feasible. Even a simplified picture clearly shows how many connections need to be established under the defined conditions:

![](./user_journey_images/9_Data_Space_all_members.jpg)

Moreover, the project is not only successful within the organization. The implementation of such a data-based product, which increases transparency in terms of sustainability and involves so many different stakeholders, also attracts media attention. Various magazines, blogs, and e-journals refer to the use case, and the topic is increasingly gaining momentum. Within a very short time, this is reflected in the sales figures, as the eco-conscious consumers of the organic grocery store appreciate the transparency.

The increased media attention and the rising sales lead to Alex receiving another project assignment from her superior. She is asked to "... figure out what else can be done with the data. It all seems to work out quite well!"

During the proof of concept, one of the logistics partners had already hinted at an interest in data-based route optimization.

Alex, the logistics partner, and the data processing startup set out to formulate another use case, with the aim of not only measuring CO2 emissions but optimizing them. The three parties agree to analyze data for the purpose of route optimization during a contract period of one year: on the basis of the continuously collected data, route planning shall be optimized, synergies with other suppliers created, and progress measured. After the expiration of the contract period, the automated data transfers shall be discontinued, and the optimized carbon footprint shall be used as the reference for further measures.

Stopping all data flows at the end of the agreed period does not pose a problem, since the connectors ensure at the technological level that data may only be transferred under the defined conditions. If a contract expires or a partner does not comply with one or more conditions, no data transfer takes place. This means that every data supplier has full control over their data transfers at all times, which also highly impresses the Chief Data Officer and the legal department of the organic supermarket chain.

Connectors thus offer several advantages:

- Data transfers only take place when the conditions defined by the data supplier are met, and such conditions must be explicitly agreed to.
- These conditions are expressed through policies, which can be quite extensive, and they ensure at the technological level that a data consumer is in fact eligible to access the data. For example, already at this level it can be defined that only members of a certain group may access the data (by using verifiable credentials).
- At the level of connectors, it can be ensured that the target connector uses a storage location that meets the requirements with regard to both its geographical location and its available certificates. For example, if data may only be stored in the EU, this is a policy that can be enforced at the level of connectors.
- Data transfers can be limited in several ways, for example "one-time transfer only" or "continuous updates from the production company for the defined time period". No access to data is possible outside of the defined policies.

This technology and its use cases open up many new possibilities for the participants and draw further public attention.

The project started as a simple proof of concept, and it led to a data-based product that creates transparency (a carbon footprint label for the entire range of products) and to the optimization of the actual CO2 emissions of those same products. Alex has not only created real added value for her organization, its customers, and the environment, but has also attracted the interest of many other parties.

The focus of interest is now no longer the question of how individual logistics partners can optimize their routes and minimize CO2 emissions; the obvious next step is to map and optimize the entire supply chain, down to the fertilizer suppliers and their subcontractors. In addition, due to the requirements of supply chain laws (supply chain due diligence), it must also be ensured that all partners meet the defined requirements and ethical standards and show a sufficient level of integrity. For this reason, additional service providers want to join the data space.

Other parties have also heard about the data space and are now striving to establish a data space of their own, because counterfeit products constitute a serious problem and new data-driven technologies open up new possibilities for addressing this issue.

In summary, these are the challenges for which data spaces are best suited:

- Many different parties want to generate added value by using data, be it for data-based products, services, process optimization, or cost savings.
- These parties often have no trust relationship; they frequently do not know each other, or may even be competitors.
- Moreover, each of these parties must keep control over their own data flows at all times and know who can access their data and when.
- The parties also must know under which conditions their data can be accessed and used.

All of this can already be mapped and implemented with existing technologies, i.e. decentralized systems that are interconnected by means of connectors and cooperate in a data space to enable collaboration.

Data spaces are not limited to supply chains or to individual use cases in the area of sustainability. Data spaces can span whole domains such as mobility, health, energy, circular economy, or agriculture; no limits are set here. Data spaces can be driven by corporations, but also by research or policy, or even by a combination of all of the above.

It is all about sovereignty, autonomy, trust, and innovation in decentralized data ecosystems.

All of it, and much more, is already feasible with technology available today.

All we need to do is - do it!
--------------------------------------------------------------------------------
/Identity Management/images/parts-of-a-did-2.svg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eclipse-edc/Publications/HEAD/Identity Management/images/parts-of-a-did-2.svg
--------------------------------------------------------------------------------