├── .github
    ├── validate-examples
    └── workflows
    │   └── ci.yaml
├── .gitignore
├── CHANGELOG.md
├── CNAME
├── LICENSE
├── README.md
├── _config.yml
├── _layouts
    └── default.html
├── datacontract.init.yaml
├── datacontract.schema.json
├── definition.schema.json
├── diagrams
    ├── automation.drawio
    ├── datacontract.drawio
    └── favicon.drawio
├── examples
    ├── covid-cases
    │   ├── datacontract.html
    │   └── datacontract.yaml
    ├── datacontract.html
    ├── generate-catalog
    ├── index.html
    ├── muellimperium
    │   ├── data.csv
    │   ├── datacontract.html
    │   └── datacontract.yaml
    ├── orders-latest-nested
    │   ├── datacontract.html
    │   └── datacontract.yaml
    └── orders-latest
    │   ├── datacontract.html
    │   └── datacontract.yaml
├── gen-openapi-yaml
├── images
    ├── categories.png
    ├── datacontract-logo.png
    ├── datacontract-preview.png
    ├── datacontract.png
    ├── favicon.png
    └── supported-by-innoq--petrol-apricot.svg
├── versions
    ├── 0.9.0
    │   ├── README.md
    │   ├── datacontract.init.yaml
    │   └── datacontract.schema.json
    ├── 0.9.1
    │   ├── README.md
    │   ├── datacontract.init.yaml
    │   └── datacontract.schema.json
    ├── 0.9.2
    │   ├── README.md
    │   ├── datacontract.init.yaml
    │   └── datacontract.schema.json
    └── 0.9.3
    │   ├── README.md
    │   ├── datacontract.init.yaml
    │   ├── datacontract.schema.json
    │   └── definition.schema.json
└── workshop.md
/.github/validate-examples:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | set -ex
4 |
5 | #function datacontract() {
6 | # docker run --rm -v "${PWD}:/home/datacontract" --platform linux/amd64 datacontract/cli:latest "$@"
7 | #}
8 |
9 | datacontract --version
10 |
11 | SCHEMA=datacontract.schema.json
12 |
13 | awk '/^```yaml$/{flag=1; next} /^```$/{print ""; flag=0; exit} flag' README.md > datacontract-from-readme.yaml
14 | datacontract lint datacontract-from-readme.yaml --schema $SCHEMA
15 | datacontract test --examples datacontract-from-readme.yaml --schema $SCHEMA
16 | # Compare with example?
17 |
18 | datacontract lint examples/orders-latest/datacontract.yaml --schema $SCHEMA
19 | datacontract test --examples examples/orders-latest/datacontract.yaml --schema $SCHEMA
20 |
21 | datacontract lint examples/orders-latest-nested/datacontract.yaml --schema $SCHEMA
22 | datacontract test --examples examples/orders-latest-nested/datacontract.yaml --schema $SCHEMA || true # examples are not nested
23 |
24 | datacontract lint examples/covid-cases/datacontract.yaml --schema $SCHEMA
25 | datacontract test --examples examples/covid-cases/datacontract.yaml --schema $SCHEMA || true
26 |
27 |
--------------------------------------------------------------------------------
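The `awk` one-liner in `.github/validate-examples` copies the first fenced `yaml` block of `README.md` into `datacontract-from-readme.yaml`. The Python sketch below mirrors that logic for readers less familiar with `awk`. The function name `extract_first_yaml_block` is hypothetical and not part of the repository. Like the `awk` program, it stops at the first bare closing fence it meets, even one that closes an earlier non-YAML block.

```python
FENCE = "`" * 3  # a Markdown code fence, built here to keep this block easy to embed


def extract_first_yaml_block(markdown: str) -> str:
    """Return the body of the first fenced yaml block, mimicking the awk script."""
    collected = []
    inside = False
    for line in markdown.splitlines():
        if line == FENCE + "yaml":
            # Opening fence: start collecting from the next line.
            inside = True
            continue
        if line == FENCE:
            # Bare fence: the awk script prints an empty line and exits.
            collected.append("")
            break
        if inside:
            collected.append(line)
    # awk terminates every printed line with a newline.
    return "".join(line + "\n" for line in collected)


sample = "\n".join(["# Title", FENCE + "yaml", "id: demo", "info:", FENCE, "trailing text"])
extracted = extract_first_yaml_block(sample)
```

Here `extracted` holds the two YAML lines followed by one empty line, which matches what the script would write to `datacontract-from-readme.yaml` for the same input.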
/.github/workflows/ci.yaml:
--------------------------------------------------------------------------------
1 | on:
2 |   push:
3 |   pull_request:
4 |   workflow_call:
5 |
6 | name: CI
7 | jobs:
8 |   test:
9 |     if: false # skip as the example structure has changed with v1.1.0
10 |     runs-on: ubuntu-latest
11 |     steps:
12 |       - uses: actions/checkout@v4
13 |       - name: Set up Python
14 |         uses: actions/setup-python@v5
15 |         with:
16 |           python-version: 3.11
17 |       - name: Install dependencies
18 |         run: |
19 |           python -m pip install --upgrade pip
20 |           pip install datacontract-cli[all]
21 |           datacontract --version
22 |       - name: Validate examples
23 |         run: .github/validate-examples
24 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 | *.bkp
3 | datacontract.schema.openapi-format.*
4 | .soda/
5 | datacontract-from-readme.yaml
6 | .duckdb/
7 |
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # Changelog
2 |
3 | All notable changes to this project will be documented in this file.
4 |
5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7 |
8 | ## [Unreleased]
9 |
10 | ## [1.1.0] - 2024-10-30
11 |
12 | ### Added
13 | - Data quality on model and field level ([#55](https://github.com/datacontract/datacontract-specification/issues/55))
14 | - Lineage support ([#90](https://github.com/datacontract/datacontract-specification/issues/90))
15 | - Field and definition `examples` as array of any type, instead of `example` as a single value ([#29](https://github.com/datacontract/datacontract-specification/issues/29))
16 | - Support for server-specific data types as config map ([#63](https://github.com/datacontract/datacontract-specification/issues/63))
17 | - AWS Glue Catalog server support
18 | - sftp server support
19 | - info.status field
20 | - oracle server support
21 | - field.title attribute
22 | - model.title attribute
23 | - AWS Kinesis Data Streams server support
24 | - field.links attribute
25 | - Trino support
26 | - Field `type: map` support with properties `keys` and `values`
27 | - Definitions: `fields`, for type `object`, `record`, and `struct`
28 | - Field `field.primaryKey` (Replaces `field.primary`)
29 | - Field `model.primaryKey` to describe a composite primary key
30 | - Add Redshift server properties `clusterIdentifier`, `endpoint`, `host` and `port`.
31 |
32 | ### Removed
33 |
34 | - `definitions.domain` removed (use a hierarchical structure instead)
35 | - `definitions.name` removed (use a hierarchical structure instead)
36 | - `quality` on top-level removed
37 | - `examples` on top-level removed
38 | - `schema` removed in favor of encoding any physical schema configuration in the `model` using the `config` map at the field level and supporting import/export ([#21](https://github.com/datacontract/datacontract-specification/issues/21)).
39 |
40 | ### Deprecated
41 |
42 | - `field.primary` (use `field.primaryKey` instead)
43 |
44 |
45 | ## [0.9.3] - 2024-03-06
46 |
47 | ### Added
48 |
49 | - Service levels as a top level `servicelevels` element
50 | - pubsub server support
51 | - primary key and relationship support via `field.primary` and `field.references` attributes
52 | - databricks server support improved
53 |
54 | ## [0.9.2] - 2024-01-04
55 |
56 | ### Added
57 |
58 | - Format and validation attributes to fields in models and definitions
59 | - Postgres support
60 | - Databricks support
61 |
62 | ## [0.9.1] - 2023-11-19
63 |
64 | ### Added
65 |
66 | - A logical data model (#13), mainly to simplify editor support with a defined schema, easier to detect breaking changes, and better Databricks support.
67 | - Definitions (#14) for reusable semantic definitions within one data contract or across data contracts.
68 |
69 | ### Removed
70 |
71 | - Property `info.dataProduct` as data products should define which data contracts they implement.
72 | - Property `info.outputPort` as data products should define which data contracts they implement.
73 |
74 | These removals are not considered breaking changes, as these attributes are now treated as specification extensions.
75 |
76 | ## [0.9.0] - 2023-09-12
77 |
78 | First public release.
79 |
--------------------------------------------------------------------------------
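The 1.1.0 changelog entry deprecates `field.primary` in favor of `field.primaryKey`. As an illustration only, the sketch below shows how an already parsed contract (plain Python dictionaries) could be updated. The helper `migrate_primary_flags` is hypothetical and is not provided by the specification or the CLI, and it handles only top-level fields of each model, not nested `fields`.

```python
from copy import deepcopy


def migrate_primary_flags(contract: dict) -> dict:
    """Return a copy of the contract with field.primary renamed to field.primaryKey."""
    migrated = deepcopy(contract)  # leave the caller's data untouched
    for model in migrated.get("models", {}).values():
        for field in model.get("fields", {}).values():
            if "primary" in field:
                value = field.pop("primary")
                # Keep an existing primaryKey if both attributes are present.
                field.setdefault("primaryKey", value)
    return migrated


old_contract = {
    "dataContractSpecification": "0.9.3",
    "models": {
        "orders": {
            "fields": {
                "order_id": {"type": "text", "primary": True},
                "order_total": {"type": "long"},
            }
        }
    },
}
new_contract = migrate_primary_flags(old_contract)
```

After the call, `order_id` carries `primaryKey: True` in `new_contract`, while `old_contract` is unchanged.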
/CNAME:
--------------------------------------------------------------------------------
1 | datacontract.com
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Data Mesh Architecture
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | plugins:
2 | - jekyll-sitemap
3 | name: Data Contract Specification
4 | title: null
5 | description: Data contracts bring data providers and data consumers together.
6 |
--------------------------------------------------------------------------------
/_layouts/default.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 | {% seo %}
15 |
16 |
26 |
27 |
28 |
29 | {% if site.title and site.title != page.title %}
30 |
31 | {% endif %}
32 |
33 | {{ content }}
34 |
35 | {% if site.github.private != true and site.github.license %}
36 |
39 | {% endif %}
40 |
41 |
53 |
54 |
55 | {% if site.google_analytics %}
56 |
64 | {% endif %}
65 |
66 |
67 |
68 |
69 |
70 |
--------------------------------------------------------------------------------
/datacontract.init.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 1.1.0
2 | id: my-data-contract-id
3 | info:
4 |   title: My Data Contract
5 |   version: 0.0.1
6 | #  description:
7 | #  owner:
8 | #  contact:
9 | #    name:
10 | #    url:
11 | #    email:
12 |
13 |
14 | ### servers
15 |
16 | #servers:
17 | #  production:
18 | #    type: s3
19 | #    location: s3://
20 | #    format: parquet
21 | #    delimiter: new_line
22 |
23 | ### terms
24 |
25 | #terms:
26 | #  usage:
27 | #  limitations:
28 | #  billing:
29 | #  noticePeriod:
30 |
31 |
32 | ### models
33 |
34 | # models:
35 | #   my_model:
36 | #     description:
37 | #     type:
38 | #     fields:
39 | #       my_field:
40 | #         type:
41 | #         description:
42 |
43 |
44 | ### definitions
45 |
46 | # definitions:
47 | #   my_field:
48 | #     domain:
49 | #     name:
50 | #     title:
51 | #     type:
52 | #     description:
53 | #     example:
54 | #     pii:
55 | #     classification:
56 |
57 |
58 | ### servicelevels
59 |
60 | #servicelevels:
61 | #  availability:
62 | #    description: The server is available during support hours
63 | #    percentage: 99.9%
64 | #  retention:
65 | #    description: Data is retained for one year because!
66 | #    period: P1Y
67 | #    unlimited: false
68 | #  latency:
69 | #    description: Data is available within 25 hours after the order was placed
70 | #    threshold: 25h
71 | #    sourceTimestampField: orders.order_timestamp
72 | #    processedTimestampField: orders.processed_timestamp
73 | #  freshness:
74 | #    description: The age of the youngest row in a table.
75 | #    threshold: 25h
76 | #    timestampField: orders.order_timestamp
77 | #  frequency:
78 | #    description: Data is delivered once a day
79 | #    type: batch # or streaming
80 | #    interval: daily # for batch, either or cron
81 | #    cron: 0 0 * * * # for batch, either or interval
82 | #  support:
83 | #    description: The data is available during typical business hours at headquarters
84 | #    time: 9am to 5pm in EST on business days
85 | #    responseTime: 1h
86 | #  backup:
87 | #    description: Data is backed up once a week, every Sunday at 0:00 UTC.
88 | #    interval: weekly
89 | #    cron: 0 0 * * 0
90 | #    recoveryTime: 24 hours
91 | #    recoveryPoint: 1 week
92 |
--------------------------------------------------------------------------------
/definition.schema.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "http://json-schema.org/draft-07/schema#",
3 | "type": "object",
4 | "description": "Clear and concise explanations of syntax, semantic, and classification of business objects in a given domain.",
5 | "properties": {
6 | "id": {
7 | "type": "string",
8 | "description": "A unique identifier for this definition. Encode the domain into the ID, separated by slashes.",
9 | "examples": [
10 | "checkout/order_id"
11 | ]
12 | },
13 | "title": {
14 | "type": "string",
15 | "description": "The business name of this definition."
16 | },
17 | "description": {
18 | "type": "string",
19 | "description": "Clear and concise explanations related to the domain."
20 | },
21 | "type": {
22 | "type": "string",
23 | "description": "The logical data type."
24 | },
25 | "minLength": {
26 | "type": "integer",
27 | "description": "A value must be greater than or equal to this value. Applies only to string types."
28 | },
29 | "maxLength": {
30 | "type": "integer",
31 | "description": "A value must be less than or equal to this value. Applies only to string types."
32 | },
33 | "format": {
34 | "type": "string",
35 | "description": "Specific format requirements for the value (e.g., 'email', 'uri', 'uuid')."
36 | },
37 | "precision": {
38 | "type": "integer",
39 | "examples": [
40 | 38
41 | ],
42 | "description": "The maximum number of digits in a number. Only applies to numeric values. Defaults to 38."
43 | },
44 | "scale": {
45 | "type": "integer",
46 | "examples": [
47 | 0
48 | ],
49 | "description": "The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0."
50 | },
51 | "pattern": {
52 | "type": "string",
53 | "description": "A regular expression pattern the value must match. Applies only to string types."
54 | },
55 | "example": {
56 | "type": "string",
57 | "description": "An example value for this field.",
58 | "deprecationMessage": "Use the examples field instead."
59 | },
60 | "examples": {
61 | "type": "array",
62 | "description": "Example values for this field."
63 | },
64 | "pii": {
65 | "type": "boolean",
66 | "description": "Indicates if the field contains Personally Identifiable Information (PII)."
67 | },
68 | "classification": {
69 | "type": "string",
70 | "description": "The data class defining the sensitivity level for this field."
71 | },
72 | "tags": {
73 | "type": "array",
74 | "items": {
75 | "type": "string"
76 | },
77 | "description": "Custom metadata to provide additional context."
78 | },
79 | "links": {
80 | "type": "object",
81 | "description": "Links to external resources.",
82 | "minProperties": 1,
83 | "propertyNames": {
84 | "pattern": "^[a-zA-Z0-9_-]+$"
85 | },
86 | "additionalProperties": {
87 | "type": "string",
88 | "title": "Link",
89 | "description": "A URL to an external resource.",
90 | "format": "uri",
91 | "examples": [
92 | "https://example.com"
93 | ]
94 | }
95 | }
96 | },
97 | "required": [
98 | "type"
99 | ]
100 | }
101 |
--------------------------------------------------------------------------------
/diagrams/favicon.drawio:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/examples/covid-cases/datacontract.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.3
2 | id: covid_cases
3 | info:
4 |   title: COVID-19 cases
5 |   description: Johns Hopkins University Consolidated data on COVID-19 cases, sourced from Enigma
6 |   version: "0.0.1"
7 |   links:
8 |     blog: https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/
9 |     data-explorer: https://dj2taa9i652rf.cloudfront.net/
10 |     data: https://covid19-lake.s3.us-east-2.amazonaws.com/enigma-jhu/json/part-00000-adec1cd2-96df-4c6b-a5f2-780f092951ba-c000.json
11 | servers:
12 |   s3-json:
13 |     type: s3
14 |     location: s3://covid19-lake/enigma-jhu/json/*.json
15 |     format: json
16 |     delimiter: new_line
17 | models:
18 |   covid_cases:
19 |     description: the number of confirmed covid cases reported for a specified region, with location and county/province/country information.
20 |     fields:
21 |       fips:
22 |         type: string
23 |         description: state and county two digits code
24 |       admin2:
25 |         type: string
26 |         description: county name
27 |       province_state:
28 |         type: string
29 |         description: province name or state name
30 |       country_region:
31 |         type: string
32 |         description: country name or region name
33 |       last_update:
34 |         type: timestamp_ntz
35 |         description: last update timestamp
36 |       latitude:
37 |         type: double
38 |         description: location (latitude)
39 |       longitude:
40 |         type: double
41 |         description: location (longitude)
42 |       confirmed:
43 |         type: int
44 |         description: number of confirmed cases
45 |       combined_key:
46 |         type: string
47 |         description: county name+state name+country name
48 | quality:
49 |   type: SodaCL
50 |   specification:
51 |     checks for covid_cases:
52 |       - freshness(last_update::datetime) < 5000d # dataset is not updated anymore
53 |       - row_count > 1000
54 |
--------------------------------------------------------------------------------
/examples/generate-catalog:
--------------------------------------------------------------------------------
1 | datacontract catalog --files "**/*.yaml" --output "."
2 |
--------------------------------------------------------------------------------
/examples/muellimperium/data.csv:
--------------------------------------------------------------------------------
1 | Pluto,residual_waste,2021-01-09
2 | Pluto,bio_waste,2021-01-02
3 | Pluto,paper,2021-01-11
4 | Pluto,plastic,2021-01-12
5 | Pluto,bulky_waste,2021-02-04
6 | Earth,residual_waste,2021-01-14
7 | Earth,bio_waste,2021-01-08
8 | Earth,paper,2021-01-12
9 | Earth,plastic,2021-01-27
10 | Earth,bulky_waste,2021-02-03
11 |
--------------------------------------------------------------------------------
/examples/muellimperium/datacontract.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.3
2 | id: muellimperium-exchange-format
3 | info:
4 |   title: Muellimperium Exchange Format
5 |   version: 0.0.1
6 |   description: |
7 |     The Muellimperium Exchange Format is a data contract for exchanging data between the Muellimperium and its partners.
8 |   owner: Emperor of the Muellimperium
9 |   contact:
10 |     name: The Emperor
11 |     email: the-emperor@muellimperium.com
12 | servers:
13 |   exchange:
14 |     type: local
15 |     path: data.csv
16 |     format: csv
17 | models:
18 |   garbage_collection:
19 |     type: table
20 |     fields:
21 |       location:
22 |         type: text
23 |         required: true
24 |         description: The location where the garbage is collected.
25 |       garbage_type:
26 |         type: text
27 |         required: true
28 |         description: The type of garbage that is collected.
29 |         enum:
30 |           - paper
31 |           - plastic
32 |           - residual_waste
33 |           - bio_waste
34 |           - bulky_waste
35 |           - hazardous_waste
36 |       collection_date:
37 |         type: date
38 |         required: true
39 |         description: The date when the garbage is collected.
40 | examples:
41 |   - model: garbage_collection
42 |     type: json
43 |     data:
44 |       - location: "Musterstadt"
45 |         garbage_type: "paper"
46 |         collection_date: "2022-01-01"
47 |       - location: "Musterstadt"
48 |         garbage_type: "plastic"
49 |         collection_date: "2022-01-02"
50 |       - location: "Musterstadt"
51 |         garbage_type: "residual_waste"
52 |         collection_date: "2022-01-03"
53 |
--------------------------------------------------------------------------------
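The `garbage_collection` model constrains `garbage_type` to an `enum` and types `collection_date` as `date`. The sketch below checks rows shaped like `data.csv` against those two rules using only the standard library. It assumes the headerless columns appear in the order `location`, `garbage_type`, `collection_date`, which the source does not state explicitly. The function `find_violations` is a hypothetical helper, not the validation that the `datacontract` CLI performs.

```python
import csv
import io
from datetime import date

# Allowed values copied from the enum of the garbage_type field.
ALLOWED_GARBAGE_TYPES = {
    "paper", "plastic", "residual_waste",
    "bio_waste", "bulky_waste", "hazardous_waste",
}


def find_violations(csv_text: str) -> list:
    """Return (row_number, message) pairs for rows that break the model rules."""
    violations = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 3:
            violations.append((row_number, "expected 3 columns"))
            continue
        location, garbage_type, collection_date = row  # assumed column order
        if not location:
            violations.append((row_number, "location is required"))
        if garbage_type not in ALLOWED_GARBAGE_TYPES:
            violations.append((row_number, "garbage_type not in enum"))
        try:
            date.fromisoformat(collection_date)
        except ValueError:
            violations.append((row_number, "collection_date is not a valid date"))
    return violations


# First rows of examples/muellimperium/data.csv.
SAMPLE = "Pluto,residual_waste,2021-01-09\nPluto,bio_waste,2021-01-02\nEarth,paper,2021-01-12\n"
sample_violations = find_violations(SAMPLE)
```

For the sample rows, `sample_violations` is empty, since every `garbage_type` is in the enum and every date parses.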
/examples/orders-latest-nested/datacontract.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.3
2 | id: urn:datacontract:checkout:orders-latest-nested
3 | info:
4 |   title: Orders Latest (Nested)
5 |   version: 1.0.0
6 |   description: |
7 |     Successful customer orders in the webshop.
8 |     All orders since 2020-01-01.
9 |     Orders with their line items are in their current state (no history included).
10 |   owner: Checkout Team
11 |   contact:
12 |     name: John Doe (Data Product Owner)
13 |     url: https://teams.microsoft.com/l/channel/example/checkout
14 | terms:
15 |   usage: |
16 |     Data can be used for reports, analytics and machine learning use cases.
17 |     Order may be linked and joined by other tables
18 |   limitations: |
19 |     Not suitable for real-time use cases.
20 |     Data may not be used to identify individual customers.
21 |     Max data processing per day: 10 TiB
22 |   billing: 5000 USD per month
23 |   noticePeriod: P3M
24 | models:
25 |   orders:
26 |     description: One record per order. Includes cancelled and deleted orders.
27 |     type: table
28 |     fields:
29 |       order_id:
30 |         $ref: '#/definitions/order_id'
31 |         required: true
32 |         unique: true
33 |         primary: true
34 |       order_timestamp:
35 |         description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
36 |         type: timestamp
37 |         required: true
38 |       order_total:
39 |         description: Total amount in the smallest monetary unit (e.g., cents).
40 |         type: long
41 |         required: true
42 |       customer_id:
43 |         description: Unique identifier for the customer.
44 |         type: text
45 |         minLength: 10
46 |         maxLength: 20
47 |       customer_email_address:
48 |         description: The email address, as entered by the customer. The email address was not verified.
49 |         type: text
50 |         format: email
51 |         required: true
52 |       address:
53 |         type: object
54 |         description: The delivery address of the customer.
55 |         fields:
56 |           street:
57 |             description: The street name and house number.
58 |             type: text
59 |           city:
60 |             description: The city name.
61 |             type: text
62 |           additional_lines:
63 |             description: Additional address lines, such as floor, apartment, or company name.
64 |             type: array
65 |             items:
66 |               type: text
67 |               description: Additional line
68 |       processed_timestamp:
69 |         description: The timestamp when the record was processed by the data platform.
70 |         type: timestamp
71 |         required: true
72 |   line_items:
73 |     description: A single article that is part of an order.
74 |     type: table
75 |     fields:
76 |       lines_item_id:
77 |         type: text
78 |         description: Primary key of the line_items table
79 |         required: true
80 |         unique: true
81 |         primary: true
82 |       order_id:
83 |         $ref: '#/definitions/order_id'
84 |         references: orders.order_id
85 |       sku:
86 |         description: The purchased article number
87 |         $ref: '#/definitions/sku'
88 | definitions:
89 |   order_id:
90 |     domain: checkout
91 |     name: order_id
92 |     title: Order ID
93 |     type: text
94 |     format: uuid
95 |     description: An internal ID that identifies an order in the online shop.
96 |     example: 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
97 |     pii: true
98 |     classification: restricted
99 |   sku:
100 |     domain: inventory
101 |     name: sku
102 |     title: Stock Keeping Unit
103 |     type: text
104 |     pattern: ^[A-Za-z0-9]{8,14}$
105 |     example: "96385074"
106 |     description: |
107 |       A Stock Keeping Unit (SKU) is an internal unique identifier for an article.
108 |       It is typically associated with an article's barcode, such as the EAN/GTIN.
109 |
--------------------------------------------------------------------------------
/examples/orders-latest/datacontract.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 1.1.0
2 | id: urn:datacontract:checkout:orders-latest
3 | info:
4 |   title: Orders Latest
5 |   version: 2.0.0
6 |   description: |
7 |     Successful customer orders in the webshop.
8 |     All orders since 2020-01-01.
9 |     Orders with their line items are in their current state (no history included).
10 |   owner: Checkout Team
11 |   contact:
12 |     name: John Doe (Data Product Owner)
13 |     url: https://teams.microsoft.com/l/channel/example/checkout
14 | servers:
15 |   production:
16 |     type: s3
17 |     environment: prod
18 |     location: s3://datacontract-example-orders-latest/v2/{model}/*.json
19 |     format: json
20 |     delimiter: new_line
21 |     description: "One folder per model. One file per day."
22 |     roles:
23 |       - name: analyst_us
24 |         description: Access to the data for US region
25 |       - name: analyst_cn
26 |         description: Access to the data for China region
27 | terms:
28 |   usage: |
29 |     Data can be used for reports, analytics and machine learning use cases.
30 |     Order may be linked and joined by other tables
31 |   limitations: |
32 |     Not suitable for real-time use cases.
33 |     Data may not be used to identify individual customers.
34 |     Max data processing per day: 10 TiB
35 |   policies:
36 |     - name: privacy-policy
37 |       url: https://example.com/privacy-policy
38 |     - name: license
39 |       description: External data is licensed under agreement 1234.
40 |       url: https://example.com/license/1234
41 |   billing: 5000 USD per month
42 |   noticePeriod: P3M
43 | models:
44 |   orders:
45 |     description: One record per order. Includes cancelled and deleted orders.
46 |     type: table
47 |     fields:
48 |       order_id:
49 |         $ref: '#/definitions/order_id'
50 |         required: true
51 |         unique: true
52 |         primaryKey: true
53 |       order_timestamp:
54 |         description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
55 |         type: timestamp
56 |         required: true
57 |         examples:
58 |           - "2024-09-09T08:30:00Z"
59 |         tags: ["business-timestamp"]
60 |       order_total:
61 |         description: Total amount in the smallest monetary unit (e.g., cents).
62 |         type: long
63 |         required: true
64 |         examples:
65 |           - 9999
66 |         quality:
67 |           - type: sql
68 |             description: 95% of all order total values are expected to be between 10 and 499 EUR.
69 |             query: |
70 |               SELECT quantile_cont(order_total, 0.95) AS percentile_95
71 |               FROM orders
72 |             mustBeBetween: [1000, 49900]
73 |       customer_id:
74 |         description: Unique identifier for the customer.
75 |         type: text
76 |         minLength: 10
77 |         maxLength: 20
78 |       customer_email_address:
79 |         description: The email address, as entered by the customer.
80 |         type: text
81 |         format: email
82 |         required: true
83 |         pii: true
84 |         classification: sensitive
85 |         quality:
86 |           - type: text
87 |             description: The email address is not verified and may be invalid.
88 |         lineage:
89 |           inputFields:
90 |             - namespace: com.example.service.checkout
91 |               name: checkout_db.orders
92 |               field: email_address
93 |       processed_timestamp:
94 |         description: The timestamp when the record was processed by the data platform.
95 |         type: timestamp
96 |         required: true
97 |         config:
98 |           jsonType: string
99 |           jsonFormat: date-time
100 |     quality:
101 |       - type: sql
102 |         description: The maximum duration between two orders should be less than 3600 seconds
103 |         query: |
104 |           SELECT MAX(duration) AS max_duration FROM (SELECT EXTRACT(EPOCH FROM (order_timestamp - LAG(order_timestamp)
105 |           OVER (ORDER BY order_timestamp))) AS duration FROM orders)
106 |         mustBeLessThan: 3600
107 |       - type: sql
108 |         description: Row Count
109 |         query: |
110 |           SELECT count(*) as row_count
111 |           FROM orders
112 |         mustBeGreaterThan: 5
113 |     examples:
114 |       - |
115 |         order_id,order_timestamp,order_total,customer_id,customer_email_address,processed_timestamp
116 |         "1001","2030-09-09T08:30:00Z",2500,"1000000001","mary.taylor82@example.com","2030-09-09T08:31:00Z"
117 |         "1002","2030-09-08T15:45:00Z",1800,"1000000002","michael.miller83@example.com","2030-09-09T08:31:00Z"
118 |         "1003","2030-09-07T12:15:00Z",3200,"1000000003","michael.smith5@example.com","2030-09-09T08:31:00Z"
119 |         "1004","2030-09-06T19:20:00Z",1500,"1000000004","elizabeth.moore80@example.com","2030-09-09T08:31:00Z"
120 |         "1005","2030-09-05T10:10:00Z",4200,"1000000004","elizabeth.moore80@example.com","2030-09-09T08:31:00Z"
121 |         "1006","2030-09-04T14:55:00Z",2800,"1000000005","john.davis28@example.com","2030-09-09T08:31:00Z"
122 |         "1007","2030-09-03T21:05:00Z",1900,"1000000006","linda.brown67@example.com","2030-09-09T08:31:00Z"
123 |         "1008","2030-09-02T17:40:00Z",3600,"1000000007","patricia.smith40@example.com","2030-09-09T08:31:00Z"
124 |         "1009","2030-09-01T09:25:00Z",3100,"1000000008","linda.wilson43@example.com","2030-09-09T08:31:00Z"
125 |         "1010","2030-08-31T22:50:00Z",2700,"1000000009","mary.smith98@example.com","2030-09-09T08:31:00Z"
126 |   line_items:
127 |     description: A single article that is part of an order.
128 |     type: table
129 |     fields:
130 |       line_item_id:
131 |         type: text
132 |         description: Primary key of the line_items table
133 |         required: true
134 |       order_id:
135 |         $ref: '#/definitions/order_id'
136 |         references: orders.order_id
137 |       sku:
138 |         description: The purchased article number
139 |         $ref: '#/definitions/sku'
140 |     primaryKey: ["order_id", "line_item_id"]
141 |     examples:
142 |       - |
143 |         line_item_id,order_id,sku
144 |         "LI-1","1001","5901234123457"
145 |         "LI-2","1001","4001234567890"
146 |         "LI-3","1002","5901234123457"
147 |         "LI-4","1002","2001234567893"
148 |         "LI-5","1003","4001234567890"
149 |         "LI-6","1003","5001234567892"
150 |         "LI-7","1004","5901234123457"
151 |         "LI-8","1005","2001234567893"
152 |         "LI-9","1005","5001234567892"
153 |         "LI-10","1005","6001234567891"
154 | definitions:
155 |   order_id:
156 |     title: Order ID
157 |     type: text
158 |     format: uuid
159 |     description: An internal ID that identifies an order in the online shop.
160 |     examples:
161 |       - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
162 |     pii: true
163 |     classification: restricted
164 |     tags:
165 |       - orders
166 |   sku:
167 |     title: Stock Keeping Unit
168 |     type: text
169 |     pattern: ^[A-Za-z0-9]{8,14}$
170 |     examples:
171 |       - "96385074"
172 |     description: |
173 |       A Stock Keeping Unit (SKU) is an internal unique identifier for an article.
174 |       It is typically associated with an article's barcode, such as the EAN/GTIN.
175 |     links:
176 |       wikipedia: https://en.wikipedia.org/wiki/Stock_keeping_unit
177 |     tags:
178 |       - inventory
179 | servicelevels:
180 |   availability:
181 |     description: The server is available during support hours
182 |     percentage: 99.9%
183 |   retention:
184 |     description: Data is retained for one year
185 |     period: P1Y
186 |     unlimited: false
187 |   latency:
188 |     description: Data is available within 25 hours after the order was placed
189 |     threshold: 25h
190 |     sourceTimestampField: orders.order_timestamp
191 |     processedTimestampField: orders.processed_timestamp
192 |   freshness:
193 |     description: The age of the youngest row in a table.
194 |     threshold: 25h
195 |     timestampField: orders.order_timestamp
196 |   frequency:
197 |     description: Data is delivered once a day
198 |     type: batch # or streaming
199 |     interval: daily # for batch, either or cron
200 |     cron: 0 0 * * * # for batch, either or interval
201 |   support:
202 |     description: The data is available during typical business hours at headquarters
203 |     time: 9am to 5pm in EST on business days
204 |     responseTime: 1h
205 |   backup:
206 |     description: Data is backed up once a week, every Sunday at 0:00 UTC.
207 |     interval: weekly
208 |     cron: 0 0 * * 0
209 |     recoveryTime: 24 hours
210 |     recoveryPoint: 1 week
211 | tags:
212 |   - checkout
213 |   - orders
214 |   - s3
215 | links:
216 |   datacontractCli: https://cli.datacontract.com
--------------------------------------------------------------------------------
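The `order_total` quality check in `examples/orders-latest/datacontract.yaml` computes a 95th percentile with `quantile_cont` and expects it within `mustBeBetween: [1000, 49900]`. The sketch below works that check by hand on the ten `order_total` values from the `orders` example rows. It rests on two assumptions not stated in the source: that `quantile_cont` interpolates linearly between neighbouring ranks, and that the bounds of `mustBeBetween` are inclusive. The Python function is a stand-in for illustration, not the engine the CLI uses.

```python
def quantile_cont(values, fraction):
    """Continuous quantile using linear interpolation between neighbouring ranks (assumed)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # fractional index into the sorted list
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# order_total column of the ten example rows in the orders model.
ORDER_TOTALS = [2500, 1800, 3200, 1500, 4200, 2800, 1900, 3600, 3100, 2700]

percentile_95 = quantile_cont(ORDER_TOTALS, 0.95)

# mustBeBetween: [1000, 49900], treated here as inclusive bounds (assumption).
check_passes = 1000 <= percentile_95 <= 49900
```

With these rows the fractional index is 8.55, which falls between the sorted values 3600 and 4200, so the percentile is about 3930 (39.30 EUR in cents) and the check passes.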
/gen-openapi-yaml:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # INSTALL BEFORE
4 | # npm install -g @openapi-contrib/json-schema-to-openapi-schema
5 | # brew install yq
6 |
7 | json-schema-to-openapi-schema convert datacontract.schema.json > datacontract.schema.openapi-format.json
8 | yq --input-format=json --output-format=yaml --prettyPrint datacontract.schema.openapi-format.json > datacontract.schema.openapi-format.yaml
9 | echo "Compare 'datacontract.schema.openapi-format.yaml' with openapi.yaml of the Data Mesh Manager"
10 | echo "Prepend 'DataContract:\\n' and match the indentation correctly. Then, compare in IntelliJ"
11 |
--------------------------------------------------------------------------------
/images/categories.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-specification/e7a0259a002b5b82ba3bb8a02323cc47a20d374c/images/categories.png
--------------------------------------------------------------------------------
/images/datacontract-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-specification/e7a0259a002b5b82ba3bb8a02323cc47a20d374c/images/datacontract-logo.png
--------------------------------------------------------------------------------
/images/datacontract-preview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-specification/e7a0259a002b5b82ba3bb8a02323cc47a20d374c/images/datacontract-preview.png
--------------------------------------------------------------------------------
/images/datacontract.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-specification/e7a0259a002b5b82ba3bb8a02323cc47a20d374c/images/datacontract.png
--------------------------------------------------------------------------------
/images/favicon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-specification/e7a0259a002b5b82ba3bb8a02323cc47a20d374c/images/favicon.png
--------------------------------------------------------------------------------
/images/supported-by-innoq--petrol-apricot.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/versions/0.9.0/README.md:
--------------------------------------------------------------------------------
1 | # Data Contract Specification
2 |
3 | 
4 |
5 | Data contracts bring data providers and data consumers together.
6 |
7 | A _data contract_ is a document that defines the structure, format, semantics, quality, and terms of use for exchanging data between a data provider and their consumers. A data contract is implemented by a data product's output port or other data technologies. Data contracts can also be used for the input port to specify the expectations of data dependencies and verify given guarantees.
8 |
9 | The _data contract specification_ defines a YAML format to describe attributes of provided data sets. It is data platform neutral, yet supports well-known formats to express schemas (e.g., dbt models, JSON Schema, Protobuf, SQL DDL) and quality tests (e.g., SodaCL, SQL queries) to avoid unnecessary abstractions. The data contract specification is an open initiative to define a common data contract format. Think of an [OpenAPI specification](https://www.openapis.org/), but for data sets.
10 |
11 | Data contracts come into play when data is exchanged between different teams or organizational units, such as in a [data mesh architecture](https://www.datamesh-architecture.com/).
12 | First and foremost, data contracts are a communication tool to express a common understanding of how data should be structured and interpreted. They make semantic and quality expectations explicit. They are often created in [workshops](/workshop). Later in development and production, they also serve as the basis for code generation, testing, schema validations, quality checks, monitoring, access control, and computational governance policies.
13 |
14 | _Note: The term "data contract" refers to a specification that is usually owned by the data provider and thus does not align with a "contract" in a legal sense as a mutual agreement between two parties. The term "contract" may be somewhat misleading, but it is how it is used in practice. The mutual agreement between one data provider and one data consumer is the "data usage agreement" that refers to a data contract. Data usage agreements have a defined lifecycle, start/end date, and help the data provider to track who accesses their data and for which purposes._
15 |
16 | The specification is inspired by [AIDA User Group's Open Data Contract Standard](https://github.com/AIDAUserGroup/open-data-contract-standard) (formerly [PayPal's Data Contract Template](https://github.com/paypal/data-contract-template/blob/main/docs/README.md)) and Data Mesh Manager's [Data Contract API](https://www.datamesh-manager.com).
17 | It follows [OpenAPI](https://www.openapis.org/) and [AsyncAPI](https://www.asyncapi.com/) conventions.
18 |
19 | Version
20 | ---
21 |
22 | 0.9.0
23 |
24 | Example
25 | ---
26 |
27 | [](https://studio.datacontract.com/)
28 |
29 | ```yaml
30 | dataContractSpecification: 0.9.0
31 | id: urn:datacontract:checkout:orders-latest-npii
32 | info:
33 |   title: Orders Latest NPII
34 |   version: 1.0.0
35 |   description: Successful customer orders in the webshop. All orders since 2020-01-01. Orders with their line items are in their current state (no history included). PII data is removed.
36 |   owner: Checkout Team
37 |   contact:
38 |     name: John Doe (Data Product Owner)
39 |     email: john.doe@example.com
40 | servers:
41 |   production:
42 |     type: BigQuery
43 |     project: acme_orders_prod
44 |     dataset: bigquery_orders_latest_npii_v1
45 | terms:
46 |   usage: >
47 |     Data can be used for reports, analytics and machine learning use cases.
48 |     Order may be linked and joined by other tables
49 |   limitations: >
50 |     Not suitable for real-time use cases.
51 |     Data may not be used to identify individual customers.
52 |     Max data processing per day: 10 TiB
53 |   billing: 5000 USD per month
54 |   noticePeriod: P3M
55 | schema:
56 |   type: dbt # the specification format: dbt, bigquery, avro, protobuf, sql, json-schema, custom
57 |   specification: # expressed as string or inline yaml or via "$ref: model.yaml"
58 |     version: 2
59 |     description: The subset of the output port's data model that we agree to use
60 |     models:
61 |       - name: orders
62 |         description: >
63 |           One record per order. Includes cancelled and deleted orders.
64 |         columns:
65 |           - name: order_id
66 |             data_type: string
67 |             description: Primary key of the orders table
68 |           - name: order_timestamp
69 |             data_type: timestamptz
70 |             description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
71 |           - name: order_total
72 |             data_type: integer
73 |             description: "Total amount of the order in the smallest monetary unit (e.g., cents)."
74 |       - name: line_items
75 |         description: >
76 |           The items that are part of an order
77 |         columns:
78 |           - name: lines_item_id
79 |             data_type: string
80 |             description: Primary key of the line_items table
81 |           - name: order_id
82 |             data_type: string
83 |             description: Foreign key to the orders table
84 |           - name: sku
85 |             data_type: string
86 |             description: The purchased article number
87 | examples:
88 |   - type: csv # csv, json, yaml, custom
89 |     model: orders
90 |     data: |- # expressed as string or inline yaml or via "$ref: data.csv"
91 |       order_id,order_timestamp,order_total
92 |       "1001","2023-09-09T08:30:00Z",2500
93 |       "1002","2023-09-08T15:45:00Z",1800
94 |       "1003","2023-09-07T12:15:00Z",3200
95 |       "1004","2023-09-06T19:20:00Z",1500
96 |       "1005","2023-09-05T10:10:00Z",4200
97 |       "1006","2023-09-04T14:55:00Z",2800
98 |       "1007","2023-09-03T21:05:00Z",1900
99 |       "1008","2023-09-02T17:40:00Z",3600
100 |       "1009","2023-09-01T09:25:00Z",3100
101 |       "1010","2023-08-31T22:50:00Z",2700
102 |   - type: csv
103 |     model: line_items
104 |     data: |-
105 |       lines_item_id,order_id,sku
106 |       "1","1001","5901234123457"
107 |       "2","1001","4001234567890"
108 |       "3","1002","5901234123457"
109 |       "4","1002","2001234567893"
110 |       "5","1003","4001234567890"
111 |       "6","1003","5001234567892"
112 |       "7","1004","5901234123457"
113 |       "8","1005","2001234567893"
114 |       "9","1005","5001234567892"
115 |       "10","1005","6001234567891"
116 | quality:
117 |   type: SodaCL # data quality check format: SodaCL, montecarlo, custom
118 |   specification: # expressed as string or inline yaml or via "$ref: checks.yaml"
119 |     checks for orders:
120 |       - freshness(order_timestamp) < 24h
121 |       - row_count > 500000
122 |       - duplicate_count(order_id) = 0
123 |     checks for line_items:
124 |       - row_count > 500000
125 | ```
126 |
127 | Schema
128 | ---
129 |
130 | [JSON Schema](https://github.com/datacontract/datacontract-specification/blob/main/datacontract.schema.json) of the Data Contract Specification.
131 |
132 | ### Data Contract Object
133 |
134 | This is the root document.
135 |
136 | It is _RECOMMENDED_ that the root document be named: `datacontract.yaml`.
137 |
138 | | Field | Type | Description |
139 | |---------------------------|------------------------------------|-------------------------------------------------------------------------------------------------------|
140 | | dataContractSpecification | `string` | REQUIRED. Specifies the Data Contract Specification being used. |
141 | | id | `string` | REQUIRED. An organization-wide unique technical identifier, such as a UUID, URN, slug, string, or number |
142 | | info | [Info Object](#info-object) | REQUIRED. Specifies the metadata of the data contract. May be used by tooling. |
143 | | servers | [Servers Object](#servers-object) | Specifies the servers of the data contract. |
144 | | terms | [Terms Object](#terms-object) | Specifies the terms and conditions of the data contract. |
145 | | schema | [Schema Object](#schema-object) | Specifies the data contract schema. The specification supports different schemas. |
146 | | examples | [Examples Object](#examples-object) | Specifies example data sets for the schema. The specification supports different example types. |
147 | | quality | [Quality Object](#quality-object) | Specifies the quality attributes and checks. The specification supports different quality check DSLs. |
148 |
149 | This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
150 |
151 |
152 |
153 |
154 | ### Info Object
155 |
156 | Metadata and life cycle information about the data contract.
157 |
158 |
159 | | Field | Type | Description |
160 | |---------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
161 | | title | `string` | REQUIRED. The title of the data contract. |
162 | | version | `string` | REQUIRED. The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version). |
163 | | description | `string` | A description of the data contract. |
164 | | owner | `string` | The owner or team responsible for managing the data contract and providing the data. |
165 | | dataProduct | `string` | The identifier of the data product that contains the output port providing the data. |
166 | | outputPort | `string` | DEPRECATED. The identifier of the output port that implements the data contract. |
167 | | contact | [Contact Object](#contact-object) | Contact information for the data contract. |
168 |
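The REQUIRED markers in the Data Contract Object and Info Object tables can be checked mechanically. The following is a minimal sketch, assuming the contract has already been loaded into a Python dict (for example by a YAML parser); the function name `missing_required_fields` is hypothetical and not part of the specification or its tooling.

```python
# Fields marked REQUIRED in the Data Contract Object and Info Object tables.
REQUIRED_ROOT_FIELDS = ("dataContractSpecification", "id", "info")
REQUIRED_INFO_FIELDS = ("title", "version")


def missing_required_fields(contract: dict) -> list[str]:
    """Return dotted paths of REQUIRED fields that are absent or empty."""
    missing = [name for name in REQUIRED_ROOT_FIELDS if not contract.get(name)]
    info = contract.get("info")
    if isinstance(info, dict):
        missing += [f"info.{name}" for name in REQUIRED_INFO_FIELDS if not info.get(name)]
    return missing


contract = {
    "dataContractSpecification": "0.9.0",
    "id": "urn:datacontract:checkout:orders-latest-npii",
    "info": {"title": "Orders Latest NPII"},  # version deliberately left out
}
print(missing_required_fields(contract))  # ['info.version']
```

For full validation, the JSON Schema linked in the Schema section remains the authoritative source; this sketch only covers the REQUIRED markers of two tables.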
169 |
170 |
171 |
172 | ### Contact Object
173 |
174 | Contact information for the data contract.
175 |
176 | | Field | Type | Description |
177 | |-------|----------|-------------------------------------------------------------------------------------------------------|
178 | | name | `string` | The identifying name of the contact person/organization. |
179 | | url | `string` | The URL pointing to the contact information. This _MUST_ be in the form of a URL. |
180 | | email | `string` | The email address of the contact person/organization. This _MUST_ be in the form of an email address. |
181 |
182 | This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
183 |
184 | ### Servers Object
185 |
186 | Information about the servers.
187 |
188 | The Servers Object is a map of [Server Objects](#server-object).
189 |
190 | ### Server Object
191 |
192 | The fields are dependent on the defined type.
193 |
194 | | Field | Type | Description |
195 | |-------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
196 | | type | `string` | The type of the data product technology that implements the data contract. Well-known server types are: `bigquery`, `s3`, `redshift`, `snowflake`, `databricks`, `kafka` |
197 | | description | `string` | An optional string describing the server. |
198 |
199 |
200 | This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
201 |
202 | #### BigQuery Server Object
203 |
204 | | Field | Type | Description |
205 | |---------|----------|-------------|
206 | | type | `string` | `bigquery` |
207 | | project | `string` | |
208 | | dataset | `string` | |
209 |
210 | #### S3 Server Object
211 |
212 | | Field | Type | Description |
213 | |----------|----------|--------------------------------|
214 | | type | `string` | `s3` |
215 | | location | `string` | S3 URL, starting with `s3://` |
216 |
217 | Example:
218 |
219 | ```yaml
220 | servers:
221 |   production:
222 |     type: s3
223 |     location: s3://acme-orders-prod/orders/
224 | ```
225 |
226 |
227 | #### Redshift Server Object
228 |
229 | | Field | Type | Description |
230 | |----------|----------|-------------|
231 | | type | `string` | `redshift` |
232 | | account | `string` | |
233 | | database | `string` | |
234 | | schema | `string` | |
235 |
236 | #### Snowflake Server Object
237 |
238 | | Field | Type | Description |
239 | |----------|----------|-------------|
240 | | type | `string` | `snowflake` |
241 | | account | `string` | |
242 | | database | `string` | |
243 | | schema | `string` | |
244 |
245 | #### Databricks Server Object
246 |
247 | | Field | Type | Description |
248 | |----------|----------|--------------|
249 | | type | `string` | `databricks` |
250 | | share | `string` | |
251 |
252 | #### Kafka Server Object
253 |
254 | | Field | Type | Description |
255 | |-------|----------|-------------|
256 | | type | `string` | `kafka` |
257 | | host | `string` | |
258 | | topic | `string` | |
259 |
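The per-type server tables above can be turned into a simple lookup. This is a sketch under assumptions: it treats every field listed in a table as expected, which the tables themselves do not state, and the names `SERVER_FIELDS` and `check_server` are hypothetical.

```python
# Fields listed in the per-type Server Object tables (type itself excluded).
SERVER_FIELDS = {
    "bigquery": ("project", "dataset"),
    "s3": ("location",),
    "redshift": ("account", "database", "schema"),
    "snowflake": ("account", "database", "schema"),
    "databricks": ("share",),
    "kafka": ("host", "topic"),
}


def check_server(server: dict) -> list[str]:
    """Return a list of problems found in a single Server Object."""
    server_type = str(server.get("type", "")).lower()
    expected = SERVER_FIELDS.get(server_type)
    if expected is None:
        return []  # unknown or custom type: nothing to check in this sketch
    problems = [f"missing field: {name}" for name in expected if not server.get(name)]
    location = server.get("location")
    if server_type == "s3" and location and not location.startswith("s3://"):
        problems.append("location should start with s3://")
    return problems


print(check_server({"type": "s3", "location": "s3://acme-orders-prod/orders/"}))
print(check_server({"type": "BigQuery", "project": "acme_orders_prod"}))
```

The type is lowercased before the lookup because the example at the top of this document writes `BigQuery` while the table lists `bigquery`; whether tooling should be that lenient is a design choice, not something the specification prescribes.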
260 |
261 |
262 | ### Terms Object
263 |
264 | The terms and conditions of the data contract.
265 |
266 | | Field | Type | Description |
267 | |----------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
268 | | usage | `string` | The usage describes the way the data is expected to be used. Can contain business and technical information. |
269 | | limitations | `string` | The limitations describe the restrictions on how the data can be used, can be technical or restrictions on what the data may not be used for. |
270 | | billing | `string` | The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use. |
271 | | noticePeriod | `string` | The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., `P3M` for a period of three months. |
272 |
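Since `noticePeriod` uses the ISO-8601 period format, a consumer may want to parse it before displaying or comparing it. The sketch below handles only date-based periods (years, months, weeks, days) and ignores time components; the function name `parse_notice_period` is hypothetical.

```python
import re

# Matches date-based ISO-8601 periods such as P3M, P1Y6M, or P2W.
_PERIOD = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?$")


def parse_notice_period(value: str) -> dict[str, int]:
    """Split a date-based ISO-8601 period into its components."""
    match = _PERIOD.match(value)
    if match is None or value == "P":
        raise ValueError(f"not an ISO-8601 date period: {value!r}")
    years, months, weeks, days = (int(g) if g else 0 for g in match.groups())
    return {"years": years, "months": months, "weeks": weeks, "days": days}


print(parse_notice_period("P3M"))  # {'years': 0, 'months': 3, 'weeks': 0, 'days': 0}
```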
273 |
274 | ### Schema Object
275 |
276 | The schema of the data contract describes the syntax and semantics of provided data sets.
277 | As the type of the output port depends on the data platform, multiple schema specifications are supported.
278 |
279 | A schema may define a single table, a collection of tables as a dataset, a file structure, or any arbitrary structure.
280 |
281 | To avoid unnecessary abstractions, the data contract specification supports existing well-known formats. Some schema types, such as `dbt`, also support defining tests and additional metadata.
282 |
283 |
284 | | Field | Type | Description |
285 | | ----- |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
286 | | type | `string` | REQUIRED. The type of the schema. Typical values are: `dbt`, `bigquery`, `json-schema`, `sql-ddl`, `avro`, `protobuf`, `custom` |
287 | | specification | [dbt Schema Object](#dbt-schema-object) \| [BigQuery Schema Object](#bigquery-schema-object) \| [JSON Schema Schema Object](#json-schema-schema-object) \| [SQL DDL Schema Object](#sql-ddl-schema-object) \| `string` | REQUIRED. The specification of the schema. The schema specification can be encoded as a string or as inline YAML. |
288 |
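Because `specification` may arrive either as a string or as inline YAML, tooling has to accept both forms. The following sketch shows one way to normalize them for JSON-based schema types (such as `json-schema` or `bigquery` given as a JSON string). It assumes the contract was already parsed, so inline YAML appears as a dict; the helper name `load_json_specification` is hypothetical.

```python
import json


def load_json_specification(specification):
    """Return a JSON-based specification as a dict, whether inline or a string."""
    if isinstance(specification, dict):
        return specification  # inline YAML, already parsed by the YAML loader
    if isinstance(specification, str):
        return json.loads(specification)  # string form holding a JSON document
    raise TypeError(f"unsupported specification type: {type(specification).__name__}")


inline = {"orders": {"type": "object"}}
as_string = '{"orders": {"type": "object"}}'
print(load_json_specification(inline) == load_json_specification(as_string))
```

String specifications in other formats (dbt YAML, SQL DDL, Protobuf) would need their own parsers and are outside this sketch.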
289 |
290 | #### dbt Schema Object
291 |
292 | https://docs.getdbt.com/reference/model-properties
293 |
294 | Example (inline YAML):
295 |
296 | ```yaml
297 | schema:
298 |   type: dbt
299 |   specification:
300 |     version: 2
301 |     models:
302 |       - name: "My Table"
303 |         description: "My description"
304 |         columns:
305 |           - name: "My column"
306 |             data_type: text
307 |             description: "My description"
308 | ```
309 |
310 | Example (string):
311 |
312 | ```yaml
313 | schema:
314 |   type: dbt
315 |   specification: |-
316 |     version: 2
317 |     models:
318 |       - name: "My Table"
319 |         description: "My description"
320 |         columns:
321 |           - name: "My column"
322 |             data_type: text
323 |             description: "My description"
324 | ```
325 |
326 | #### BigQuery Schema Object
327 |
328 | The schema structure is defined by the [Google BigQuery Table](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource:-table) object. You can extract such a Table object via the [tables.get](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get) endpoint.
329 |
330 | Instead of providing a single Table object, you can also provide an array of such objects. Be aware that [tables.list](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list) only returns a subset of the full Table object. You need to call every Table object via [tables.get](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get) to get the full Table object, including the actual schema.
331 |
332 | Learn more: [Google BigQuery REST Reference v2](https://cloud.google.com/bigquery/docs/reference/rest)
333 |
334 |
335 |
336 | Example:
337 |
338 | ```yaml
339 | schema:
340 |   type: bigquery
341 |   specification: |-
342 |     {
343 |       "tableReference": {
344 |         "projectId": "my-project",
345 |         "datasetId": "my_dataset",
346 |         "tableId": "my_table"
347 |       },
348 |       "description": "This is a description",
349 |       "type": "TABLE",
350 |       "schema": {
351 |         "fields": [
352 |           {
353 |             "name": "name",
354 |             "type": "STRING",
355 |             "mode": "NULLABLE",
356 |             "description": "This is a description"
357 |           }
358 |         ]
359 |       }
360 |     }
361 | ```
362 |
363 | #### JSON Schema Schema Object
364 |
365 | JSON Schema can be defined as JSON or rendered as YAML, following the [OpenAPI Schema Object dialect](https://spec.openapis.org/oas/v3.1.0#properties)
366 |
367 | Example (inline YAML):
368 |
369 | ```yaml
370 | schema:
371 |   type: json-schema
372 |   specification:
373 |     orders:
374 |       description: One record per order. Includes cancelled and deleted orders.
375 |       type: object
376 |       properties:
377 |         order_id:
378 |           type: string
379 |           description: Primary key of the orders table
380 |         order_timestamp:
381 |           type: string
382 |           format: date-time
383 |           description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
384 |         order_total:
385 |           type: integer
386 |           description: Total amount of the order in the smallest monetary unit (e.g., cents).
387 |     line_items:
388 |       type: object
389 |       properties:
390 |         lines_item_id:
391 |           type: string
392 |           description: Primary key of the line_items table
393 |         order_id:
394 |           type: string
395 |           description: Foreign key to the orders table
396 |         sku:
397 |           type: string
398 |           description: The purchased article number
399 | ```
400 |
401 | Example (string):
402 |
403 | ```yaml
404 | schema:
405 |   type: json-schema
406 |   specification: |-
407 |     {
408 |       "$schema": "http://json-schema.org/draft-07/schema#",
409 |       "type": "object",
410 |       "properties": {
411 |         "orders": {
412 |           "type": "object",
413 |           "description": "One record per order. Includes cancelled and deleted orders.",
414 |           "properties": {
415 |             "order_id": {
416 |               "type": "string",
417 |               "description": "Primary key of the orders table"
418 |             },
419 |             "order_timestamp": {
420 |               "type": "string",
421 |               "format": "date-time",
422 |               "description": "The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful."
423 |             },
424 |             "order_total": {
425 |               "type": "integer",
426 |               "description": "Total amount of the order in the smallest monetary unit (e.g., cents)."
427 |             }
428 |           },
429 |           "required": ["order_id", "order_timestamp", "order_total"]
430 |         },
431 |         "line_items": {
432 |           "type": "object",
433 |           "properties": {
434 |             "lines_item_id": {
435 |               "type": "string",
436 |               "description": "Primary key of the line_items table"
437 |             },
438 |             "order_id": {
439 |               "type": "string",
440 |               "description": "Foreign key to the orders table"
441 |             },
442 |             "sku": {
443 |               "type": "string",
444 |               "description": "The purchased article number"
445 |             }
446 |           },
447 |           "required": ["lines_item_id", "order_id", "sku"]
448 |         }
449 |       },
450 |       "required": ["orders", "line_items"]
451 |     }
452 | ```
453 |
454 | #### SQL DDL Schema Object
455 |
456 | Classical SQL DDLs can be used to describe the structure.
457 |
458 |
459 | Example (string):
460 |
461 | ```yaml
462 | schema:
463 |   type: sql-ddl
464 |   specification: |-
465 |     -- One record per order. Includes cancelled and deleted orders.
466 |     CREATE TABLE orders (
467 |       order_id TEXT PRIMARY KEY, -- Primary key of the orders table
468 |       order_timestamp TIMESTAMPTZ NOT NULL, -- The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
469 |       order_total INTEGER NOT NULL -- Total amount of the order in the smallest monetary unit (e.g., cents)
470 |     );
471 |
472 |     -- The items that are part of an order
473 |     CREATE TABLE line_items (
474 |       lines_item_id TEXT PRIMARY KEY, -- Primary key of the line_items table
475 |       order_id TEXT REFERENCES orders(order_id), -- Foreign key to the orders table
476 |       sku TEXT NOT NULL -- The purchased article number
477 |     );
478 |
479 | ```
480 |
481 | ### Examples Object
482 |
483 | The Examples Object is an array of [Example Objects](#example-object).
484 |
485 | ### Example Object
486 |
487 | | Field | Type | Description |
488 | |-------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------|
489 | | type | `string` | The type of the example data. Well-known types are: `csv`, `json`, `yaml`, `custom` |
490 | | description | `string` | An optional string describing the example. |
491 | | model | `string` | The reference to the model in the schema, e.g. a table name. |
492 | | data | `string` | Example data for this model. |
493 |
494 | Example:
495 |
496 | ```yaml
497 | examples:
498 |   - type: csv
499 |     model: orders
500 |     data: |-
501 |       order_id,order_timestamp,order_total
502 |       "1001","2023-09-09T08:30:00Z",2500
503 |       "1002","2023-09-08T15:45:00Z",1800
504 |       "1003","2023-09-07T12:15:00Z",3200
505 |       "1004","2023-09-06T19:20:00Z",1500
506 |       "1005","2023-09-05T10:10:00Z",4200
507 |       "1006","2023-09-04T14:55:00Z",2800
508 |       "1007","2023-09-03T21:05:00Z",1900
509 |       "1008","2023-09-02T17:40:00Z",3600
510 |       "1009","2023-09-01T09:25:00Z",3100
511 |       "1010","2023-08-31T22:50:00Z",2700
512 | ```
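
An Example Object of type `csv` can be read with a standard CSV parser once the contract is loaded. The sketch below assumes the `data` field holds the CSV text directly (not a `$ref`) and uses a shortened copy of the example above; the helper name `example_rows` is hypothetical.

```python
import csv
import io

# A shortened copy of the Example Object above, as it would look after YAML parsing.
example = {
    "type": "csv",
    "model": "orders",
    "data": (
        "order_id,order_timestamp,order_total\n"
        '"1001","2023-09-09T08:30:00Z",2500\n'
        '"1002","2023-09-08T15:45:00Z",1800'
    ),
}


def example_rows(example: dict) -> list[dict[str, str]]:
    """Parse the inline CSV data of an Example Object into one dict per row."""
    if example.get("type") != "csv":
        raise ValueError("this sketch only handles csv examples")
    return list(csv.DictReader(io.StringIO(example["data"])))


rows = example_rows(example)
print(rows[0]["order_id"], rows[0]["order_total"])  # 1001 2500
```

All values come back as strings; converting `order_total` to an integer would require consulting the schema for the referenced model.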
513 |
514 | ### Quality Object
515 |
516 | The quality object contains quality attributes and checks.
517 |
518 | | Field | Type | Description |
519 | | ----- |-------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|
520 | | type | `string` | REQUIRED. The type of the quality specification. Typical values are: `SodaCL`, `montecarlo`, `custom` |
521 | | specification | [SodaCL Quality Object](#sodacl-quality-object) \| [Monte Carlo Quality Object](#monte-carlo-quality-object) \| `string` | REQUIRED. The specification of the quality attributes. The quality specification can be encoded as a string or as inline YAML. |
522 |
523 |
524 | #### SodaCL Quality Object
525 |
526 | Quality attributes in [Soda Checks Language](https://docs.soda.io/soda-cl/soda-cl-overview.html).
527 |
528 | The `specification` represents the content of a `checks.yml` file.
529 |
530 | Example (inline):
531 |
532 | ```yaml
533 | quality:
534 |   type: SodaCL # data quality check format: SodaCL, montecarlo, dbt-tests, custom
535 |   specification: # expressed as string or inline yaml or via "$ref: checks.yaml"
536 |     checks for orders:
537 |       - row_count > 0
538 |       - duplicate_count(order_id) = 0
539 |     checks for line_items:
540 |       - row_count > 0
541 | ```
542 |
543 | Example (string):
544 |
545 | ```yaml
546 | quality:
547 |   type: SodaCL
548 |   specification: |-
549 |     checks for search_queries:
550 |       - freshness(search_timestamp) < 1d
551 |       - row_count > 100000
552 |       - missing_count(search_query) = 0
553 | ```
554 |
555 | #### Monte Carlo Quality Object
556 |
557 | Quality attributes defined as Monte Carlo's [Monitors as Code](https://docs.getmontecarlo.com/docs/monitors-as-code).
558 |
559 | The `specification` represents the content of a `montecarlo.yml` file.
560 |
561 | Example (string):
562 |
563 | ```yaml
564 | quality:
565 |   type: montecarlo
566 |   specification: |-
567 |     montecarlo:
568 |       field_health:
569 |         - table: project:dataset.table_name
570 |           timestamp_field: created
571 |       dimension_tracking:
572 |         - table: project:dataset.table_name
573 |           timestamp_field: created
574 |           field: order_status
575 | ```
576 |
577 | ### Specification Extensions
578 |
579 | While the Data Contract Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points.
580 |
581 | Custom fields can be added with any name. The value can be null, a primitive, an array, or an object.
582 |
583 | ### Design Principles
584 |
585 | The Data Contract Specification follows these design principles:
586 |
587 | - Is an open standard and its serialization can be versioned in git
588 | - Follows OpenAPI and AsyncAPI conventions so that it feels immediately familiar
589 | - Supports tooling by being machine-readable
590 | - Supports existing well-known formats to avoid unnecessary abstractions
591 | - Supports contract-first approaches
592 | - Supports code-first approaches
593 |
594 | Tooling
595 | ---
596 | - [Data Contract Studio](https://studio.datacontract.com/) is a free web tool to develop and share data contracts.
597 | - [Data Contract CLI](https://github.com/datacontract/cli) is a free CLI tool to help you create, develop, and maintain your data contracts.
598 | - [Data Mesh Manager](https://www.datamesh-manager.com/) is a commercial tool to manage data products and data contracts. It supports the data contract specification and allows the user to import or export data contracts using this specification.
599 |
600 |
601 | Other Data Contract Specifications
602 | ---
603 | - [AIDA User Group's Open Data Contract Standard](https://github.com/AIDAUserGroup/open-data-contract-standard)
604 | - [PayPal's Data Contract Template](https://github.com/paypal/data-contract-template/blob/main/docs/README.md)
605 |
606 | Literature
607 | ---
608 | - [Driving Data Quality with Data Contracts](https://www.amazon.com/dp/B0C37FPH3D) by Andrew Jones
609 |
610 | Authors
611 | ---
612 | The Data Contract Specification was originally created by [Jochen Christ](https://www.linkedin.com/in/jochenchrist/) and [Dr. Simon Harrer](https://www.linkedin.com/in/simonharrer/), and is currently maintained by them.
613 |
614 |
615 | Contributing
616 | ---
617 | Contributions are welcome! Please open an issue or a pull request.
618 |
619 | License
620 | ---
621 | [MIT License](LICENSE)
622 |
623 |
624 |
625 |
--------------------------------------------------------------------------------
/versions/0.9.0/datacontract.init.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.0
2 | id: my-data-contract-id
3 | info:
4 | title: My Data Contract
5 | version: 0.0.1
6 | # description:
7 | # owner:
8 | # contact:
9 | # name:
10 | # url:
11 | # email:
12 |
13 |
14 | ### servers
15 |
16 | #servers:
17 | # my-stage:
18 | # type: bigquery
19 | # project:
20 | # dataset:
21 |
22 | #servers:
23 | # my-stage:
24 | # type: s3
25 | # location: s3://
26 |
27 | #servers:
28 | # my-stage:
29 | # type: redshift
30 | # account:
31 | # database:
32 | # schema:
33 |
34 | #servers:
35 | # my-stage:
36 | # type: snowflake
37 | # account:
38 | # database:
39 | # schema:
40 |
41 | #servers:
42 | # my-stage:
43 | # type: databricks
44 | # share:
45 |
46 | #servers:
47 | # my-stage:
48 | # type: kafka
49 | # host:
50 | # topic:
51 |
52 |
53 | ### terms
54 |
55 | #terms:
56 | # usage:
57 | # limitations:
58 | # billing:
59 | # noticePeriod:
60 |
61 |
62 | ### schema
63 |
64 | #schema:
65 | # type: dbt
66 | # specification:
67 | # version:
68 | # models:
69 | # - name:
70 | # description:
71 | # columns:
72 | # - name:
73 | # type:
74 | # description:
75 | # tests:
76 |
77 | #schema:
78 | # type: dbt
79 | # specification: |-
80 | # version:
81 | # models:
82 | # - name:
83 | # description:
84 | # columns:
85 | # - name:
86 | # type:
87 | # description:
88 | # tests:
89 |
90 | #schema:
91 | # type: dbt
92 | # specification: "$ref: model.yaml"
93 |
94 | #schema:
95 | # type: bigquery
96 | # specification: |-
97 | # {
98 | # "tableReference": {
99 | # "projectId": "my-project",
100 | # "datasetId": "my_dataset",
101 | # "tableId": "my_table"
102 | # },
103 | # "description": "This is a description",
104 | # "type": "TABLE",
105 | # "schema": {
106 | # "fields": [
107 | # {
108 | # "name": "name",
109 | # "type": "STRING",
110 | # "mode": "NULLABLE",
111 | # "description": "This is a description"
112 | # }
113 | # ]
114 | # }
115 | # }
116 |
117 | #schema:
118 | # type: json-schema
119 | # specification:
120 | # my-table:
121 | # description:
122 | # type: object
123 | # properties:
124 | # id:
125 | # type: string
126 | # description:
127 |
128 | #schema:
129 | #  type: json-schema
130 | #  specification: |-
131 | #    {
132 | #      "$schema": "http://json-schema.org/draft-07/schema#",
133 | #      "type": "object",
134 | #      "properties": {
135 | #        "my_table": {
136 | #          "type": "object",
137 | #          "description": "",
138 | #          "properties": {
139 | #            "id": {
140 | #              "type": "string",
141 | #              "description": ""
142 | #            }
143 | #          },
144 | #          "required": ["id"]
145 | #        }
146 | #      },
147 | #      "required": ["my_table"]
148 | #    }
149 | #schema:
150 | # type: sql-ddl
151 | # specification: |-
152 | # CREATE TABLE my_table (
153 | # id TEXT PRIMARY KEY
154 | # );
155 |
156 | #schema:
157 | # type: avro
158 | # specification:
159 | # User:
160 | # type: record
161 | # name: MyTable
162 | # fields:
163 | # - name: id
164 | # type: string
165 |
166 | #schema:
167 | # type: avro
168 | # specification: |-
169 | # {
170 | # "type": "record",
171 | # "name": "MyTable",
172 | # "fields": [
173 | # {
174 | # "name": "name",
175 | # "type": "string"
176 | # }
177 | # ]
178 | # }
179 |
180 | #schema:
181 | # type: protobuf
182 | # specification: |-
183 | # message MyTable {
184 | # string id = 1;
185 | # }
186 |
187 | #schema:
188 | # type: custom
189 | # specification:
190 |
191 |
192 | ### examples
193 |
194 | #examples:
195 | # - type: csv
196 | # model: my_table
197 | # data: |-
198 | # id,timestamp,amount
199 | # "1001","2023-09-09T08:30:00Z",2500
200 | # "1002","2023-09-08T15:45:00Z",1800
201 | #
202 | #examples:
203 | # - type: csv
204 | # model: my_table
205 | # data: "$ref: data.csv"
206 |
207 | #examples:
208 | # - type: json
209 | # model: my_table
210 | # data: |-
211 | # [
212 | # {
213 | # "id": "1001",
214 | # "timestamp": "2023-09-09T08:30:00Z",
215 | # "amount": 2500
216 | # },
217 | # {
218 | # "id": "1002",
219 | # "timestamp": "2023-09-08T15:45:00Z",
220 | # "amount": 1800
221 | # }
222 | # ]
223 |
224 | #examples:
225 | # - type: yaml
226 | # model: my_table
227 | # data:
228 | # - id: 1001
229 | # timestamp: 2023-09-09T08:30:00Z
230 | # amount: 2500
231 | # - id: 1002
232 | # timestamp: 2023-09-08T15:45:00Z
233 | # amount: 1800
234 |
235 | #examples:
236 | # - type: custom
237 | # model: my_table
238 | # data: |-
239 |
240 |
241 | ### quality
242 |
243 | #quality:
244 | # type: SodaCL
245 | # specification:
246 | # checks for my_table:
247 | # - duplicate_count(order_id) = 0
248 |
249 | #quality:
250 | # type: SodaCL
251 | # specification:
252 | # checks for my_table: |-
253 | # - duplicate_count(id) = 0
254 |
255 | #quality:
256 | # type: SodaCL
257 | # specification:
258 | # checks for my_table: "$ref: checks.yaml"
259 |
260 | #quality:
261 | # type: montecarlo
262 | # specification: |-
263 | # montecarlo:
264 | # field_health:
265 | # - table: my_project:my_dataset.my_table
266 | # fields:
267 | # - id
268 | # - timestamp
269 | # - amount
270 | # timestamp_field: timestamp
271 |
272 | #quality:
273 | # type: custom
274 | # specification: |-
275 |
--------------------------------------------------------------------------------
/versions/0.9.0/datacontract.schema.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "http://json-schema.org/draft-07/schema#",
3 | "type": "object",
4 | "properties": {
5 | "dataContractSpecification": {
6 | "type": "string",
7 | "enum": [
8 | "0.9.0"
9 | ],
10 | "description": "Specifies the Data Contract Specification being used."
11 | },
12 | "id": {
13 | "type": "string",
14 | "description": "Specifies the identifier of the data contract."
15 | },
16 | "info": {
17 | "type": "object",
18 | "properties": {
19 | "title": {
20 | "type": "string",
21 | "description": "The title of the data contract."
22 | },
23 | "version": {
24 | "type": "string",
25 | "description": "The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version)."
26 | },
27 | "description": {
28 | "type": "string",
29 | "description": "A description of the data contract."
30 | },
31 | "owner": {
32 | "type": "string",
33 | "description": "The owner or team responsible for managing the data contract and providing the data."
34 | },
35 | "dataProduct": {
36 | "type": "string",
37 | "description": "The data product that contains the output port providing the data."
38 | },
39 | "outputPort": {
40 | "type": "string",
41 | "description": "The output port that implements the data contract."
42 | },
43 | "contact": {
44 | "type": "object",
45 | "properties": {
46 | "name": {
47 | "type": "string",
48 | "description": "The identifying name of the contact person/organization."
49 | },
50 | "url": {
51 | "type": "string",
52 | "format": "uri",
53 | "description": "The URL pointing to the contact information. This MUST be in the form of a URL."
54 | },
55 | "email": {
56 | "type": "string",
57 | "format": "email",
58 | "description": "The email address of the contact person/organization. This MUST be in the form of an email address."
59 | }
60 | },
61 | "description": "Contact information for the data contract."
62 | }
63 | },
64 | "required": [
65 | "title",
66 | "version"
67 | ],
68 | "description": "Metadata and life cycle information about the data contract."
69 | },
70 | "servers": {
71 | "type": "object",
72 | "additionalProperties": {
73 | "anyOf": [
74 | {
75 | "type": "object",
76 | "properties": {
77 | "type": {
78 | "type": "string",
79 | "enum": [
80 | "bigquery",
81 | "BigQuery"
82 | ],
83 | "description": "The type of the data product technology that implements the data contract."
84 | },
85 | "project": {
86 | "type": "string",
87 | "description": "An optional string describing the server."
88 | },
89 | "dataset": {
90 | "type": "string",
91 | "description": "An optional string describing the server."
92 | }
93 | },
94 | "required": [
95 | "type",
96 | "project",
97 | "dataset"
98 | ]
99 | },
100 | {
101 | "type": "object",
102 | "properties": {
103 | "type": {
104 | "type": "string",
105 | "enum": [
106 | "s3"
107 | ],
108 | "description": "The type of the data product technology that implements the data contract."
109 | },
110 | "location": {
111 | "type": "string",
112 | "format": "uri",
113 | "description": "An optional string describing the server. Must be in the form of a URL."
114 | }
115 | },
116 | "required": [
117 | "type",
118 | "location"
119 | ]
120 | },
121 | {
122 | "type": "object",
123 | "properties": {
124 | "type": {
125 | "type": "string",
126 | "enum": [
127 | "redshift"
128 | ],
129 | "description": "The type of the data product technology that implements the data contract."
130 | },
131 | "account": {
132 | "type": "string",
133 | "description": "An optional string describing the server."
134 | },
135 | "database": {
136 | "type": "string",
137 | "description": "An optional string describing the server."
138 | },
139 | "schema": {
140 | "type": "string",
141 | "description": "An optional string describing the server."
142 | }
143 | },
144 | "required": [
145 | "type",
146 | "account",
147 | "database",
148 | "schema"
149 | ]
150 | },
151 | {
152 | "type": "object",
153 | "properties": {
154 | "type": {
155 | "type": "string",
156 | "enum": [
157 | "snowflake"
158 | ],
159 | "description": "The type of the data product technology that implements the data contract."
160 | },
161 | "account": {
162 | "type": "string",
163 | "description": "An optional string describing the server."
164 | },
165 | "database": {
166 | "type": "string",
167 | "description": "An optional string describing the server."
168 | },
169 | "schema": {
170 | "type": "string",
171 | "description": "An optional string describing the server."
172 | }
173 | },
174 | "required": [
175 | "type",
176 | "account",
177 | "database",
178 | "schema"
179 | ]
180 | },
181 | {
182 | "type": "object",
183 | "properties": {
184 | "type": {
185 | "type": "string",
186 | "enum": [
187 | "databricks"
188 | ],
189 | "description": "The type of the data product technology that implements the data contract."
190 | },
191 | "share": {
192 | "type": "string",
193 | "description": "An optional string describing the server."
194 | }
195 | },
196 | "required": [
197 | "type",
198 | "share"
199 | ]
200 | },
201 | {
202 | "type": "object",
203 | "properties": {
204 | "type": {
205 | "type": "string",
206 | "enum": [
207 | "kafka"
208 | ],
209 | "description": "The type of the data product technology that implements the data contract."
210 | },
211 | "host": {
212 | "type": "string",
213 | "description": "An optional string describing the server."
214 | },
215 | "topic": {
216 | "type": "string",
217 | "description": "An optional string describing the server."
218 | }
219 | },
220 | "required": [
221 | "type",
222 | "host",
223 | "topic"
224 | ]
225 | }
226 | ]
227 | },
228 | "description": "Information about the servers."
229 | },
230 | "terms": {
231 | "type": "object",
232 | "description": "The terms and conditions of the data contract.",
233 | "properties": {
234 | "usage": {
235 | "type": "string",
236 | "description": "The usage describes the way the data is expected to be used. Can contain business and technical information."
237 | },
238 | "limitations": {
239 | "type": "string",
240 | "description": "The limitations describe the restrictions on how the data can be used, can be technical or restrictions on what the data may not be used for."
241 | },
242 | "billing": {
243 | "type": "string",
244 | "description": "The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use."
245 | },
246 | "noticePeriod": {
247 | "type": "string",
248 | "description": "The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., 'P3M' for a period of three months."
249 | }
250 | }
251 | },
252 | "schema": {
253 | "type": "object",
254 | "properties": {
255 | "type": {
256 | "type": "string",
257 | "enum": [
258 | "dbt",
259 | "bigquery",
260 | "json-schema",
261 | "sql-ddl",
262 | "avro",
263 | "protobuf",
264 | "custom"
265 | ],
266 | "description": "The type of the schema. Typical values are: dbt, bigquery, json-schema, sql-ddl, avro, protobuf, custom."
267 | },
268 | "specification": {
269 | "anyOf": [
270 | {
271 | "type": "string",
272 | "description": "The specification of the schema as a string."
273 | },
274 | {
275 | "type": "object",
276 | "description": "The specification of the schema as an object."
277 | }
278 | ]
279 | }
280 | },
281 | "required": [
282 | "type",
283 | "specification"
284 | ],
285 | "description": "The schema of the data contract describes the syntax and semantics of provided data sets. It supports different schema types."
286 | },
287 | "examples": {
288 | "type": "array",
289 | "items": {
290 | "type": "object",
291 | "properties": {
292 | "type": {
293 | "type": "string",
294 | "enum": [
295 | "csv",
296 | "json",
297 | "yaml",
298 | "custom"
299 | ],
300 | "description": "The type of the example data. Well-known types are: csv, json, yaml, custom."
301 | },
302 | "description": {
303 | "type": "string",
304 | "description": "An optional string describing the example."
305 | },
306 | "model": {
307 | "type": "string",
308 | "description": "The reference to the model in the schema, e.g., a table name."
309 | },
310 | "data": {
311 | "type": "string",
312 | "description": "Example data for this model."
313 | }
314 | },
315 | "required": [
316 | "type",
317 | "model",
318 | "data"
319 | ]
320 | },
321 | "description": "The Examples Object is an array of Example Objects."
322 | },
323 | "quality": {
324 | "type": "object",
325 | "properties": {
326 | "type": {
327 | "type": "string",
328 | "enum": [
329 | "SodaCL",
330 | "montecarlo",
331 | "custom"
332 | ],
333 | "description": "The type of the quality check. Typical values are: SodaCL, montecarlo, custom."
334 | },
335 | "specification": {
336 | "anyOf": [
337 | {
338 | "type": "string",
339 | "description": "The specification of the quality attributes as a string."
340 | },
341 | {
342 | "type": "object",
343 | "description": "The specification of the quality attributes as an object."
344 | }
345 | ]
346 | }
347 | },
348 | "required": [
349 | "type",
350 | "specification"
351 | ],
352 | "description": "The quality object contains quality attributes and checks."
353 | }
354 | },
355 | "required": [
356 | "dataContractSpecification",
357 | "id",
358 | "info"
359 | ]
360 | }
361 |
--------------------------------------------------------------------------------
/versions/0.9.1/README.md:
--------------------------------------------------------------------------------
1 | # Data Contract Specification
2 |
3 | 
4 |
5 | Data contracts bring data providers and data consumers together.
6 |
7 | A _data contract_ is a document that defines the structure, format, semantics, quality, and terms of use for exchanging data between a data provider and their consumers.
8 | A data contract is implemented by a data product's output port or other data technologies.
9 | Data contracts can also be used for the input port to specify the expectations of data dependencies and verify given guarantees.
10 |
11 | The _data contract specification_ defines a YAML format to describe attributes of provided data sets.
12 | It is data platform neutral and can be used with any data platform, such as AWS S3, Google BigQuery, Microsoft Fabric, Databricks, and Snowflake.
13 | The data contract specification is an open initiative to define a common data contract format.
14 | It follows [OpenAPI](https://www.openapis.org/) and [AsyncAPI](https://www.asyncapi.com/) conventions.
15 |
16 | Data contracts come into play when data is exchanged between different teams or organizational units, such as in a [data mesh architecture](https://www.datamesh-architecture.com/).
17 | First and foremost, data contracts are a communication tool to express a common understanding of how data should be structured and interpreted.
18 | They make semantic and quality expectations explicit.
19 | They are often created collaboratively in [workshops](/workshop) together with data providers and data consumers.
20 | Later in development and production, they also serve as the basis for code generation, testing, schema validations, quality checks, monitoring, access control, and computational governance policies.
21 |
22 | The specification comes along with the [Data Contract CLI](https://github.com/datacontract/cli), an open-source tool to develop, validate, and enforce data contracts.
23 |
24 | _Note: The term "data contract" refers to a specification that is usually owned by the data provider and thus does not align with a "contract" in a legal sense as a mutual agreement between two parties.
25 | The term "contract" may be somewhat misleading, but it is how it is used in practice.
26 | The mutual agreement between one data provider and one data consumer is the "data usage agreement" that refers to a data contract.
27 | Data usage agreements have a defined lifecycle, start/end date, and help the data provider to track who accesses their data and for which purposes._
28 |
29 | Version
30 | ---
31 |
32 | 0.9.1 ([Changelog](CHANGELOG.md))
33 |
34 | Example
35 | ---
36 |
37 | [](https://studio.datacontract.com/)
38 |
39 | ```yaml
40 | dataContractSpecification: 0.9.1
41 | id: urn:datacontract:checkout:orders-latest-npii
42 | info:
43 |   title: Orders Latest NPII
44 |   version: 1.0.0
45 |   description: Successful customer orders in the webshop. All orders since 2020-01-01. Orders with their line items are in their current state (no history included). PII data is removed.
46 |   owner: Checkout Team
47 |   contact:
48 |     name: John Doe (Data Product Owner)
49 |     email: john.doe@example.com
50 | servers:
51 |   production:
52 |     type: BigQuery
53 |     project: acme_orders_prod
54 |     dataset: bigquery_orders_latest_npii_v1
55 | terms:
56 |   usage: >
57 |     Data can be used for reports, analytics and machine learning use cases.
58 |     Order may be linked and joined by other tables
59 |   limitations: >
60 |     Not suitable for real-time use cases.
61 |     Data may not be used to identify individual customers.
62 |     Max data processing per day: 10 TiB
63 |   billing: 5000 USD per month
64 |   noticePeriod: P3M
65 | models:
66 |   orders:
67 |     description: One record per order. Includes cancelled and deleted orders.
68 |     type: table
69 |     fields:
70 |       order_id:
71 |         $ref: '#/definitions/order_id'
72 |       order_timestamp:
73 |         type: timestamp
74 |         description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
75 |       order_total:
76 |         type: long
77 |         description: Total amount of the order in the smallest monetary unit (e.g., cents).
78 |   line_items:
79 |     description: A single article that is part of an order.
80 |     type: table
81 |     fields:
82 |       lines_item_id:
83 |         type: string
84 |         description: Primary key of the lines_item_id table
85 |       order_id:
86 |         $ref: '#/definitions/order_id'
87 |       sku:
88 |         description: The purchased article number
89 |         $ref: '#/definitions/sku'
90 | definitions:
91 |   order_id:
92 |     domain: checkout
93 |     name: order_id
94 |     title: Order ID
95 |     type: string
96 |     description: An internal ID that identifies an order in the online shop.
97 |     example: 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
98 |     pii: true
99 |     classification: restricted
100 |   sku:
101 |     domain: inventory
102 |     name: sku
103 |     title: Stock Keeping Unit
104 |     type: string
105 |     example: AC1212ME1
106 |     description: |
107 |       A Stock Keeping Unit (SKU) is an internal unique identifier for an article.
108 |       It is typically associated with an article's barcode, such as the EAN/GTIN.
109 | examples:
110 |   - type: csv # csv, json, yaml, custom
111 |     model: orders
112 |     data: |- # expressed as string or inline yaml or via "$ref: data.csv"
113 |       order_id,order_timestamp,order_total
114 |       "1001","2023-09-09T08:30:00Z",2500
115 |       "1002","2023-09-08T15:45:00Z",1800
116 |       "1003","2023-09-07T12:15:00Z",3200
117 |       "1004","2023-09-06T19:20:00Z",1500
118 |       "1005","2023-09-05T10:10:00Z",4200
119 |       "1006","2023-09-04T14:55:00Z",2800
120 |       "1007","2023-09-03T21:05:00Z",1900
121 |       "1008","2023-09-02T17:40:00Z",3600
122 |       "1009","2023-09-01T09:25:00Z",3100
123 |       "1010","2023-08-31T22:50:00Z",2700
124 |   - type: csv
125 |     model: line_items
126 |     data: |-
127 |       lines_item_id,order_id,sku
128 |       "1","1001","5901234123457"
129 |       "2","1001","4001234567890"
130 |       "3","1002","5901234123457"
131 |       "4","1002","2001234567893"
132 |       "5","1003","4001234567890"
133 |       "6","1003","5001234567892"
134 |       "7","1004","5901234123457"
135 |       "8","1005","2001234567893"
136 |       "9","1005","5001234567892"
137 |       "10","1005","6001234567891"
138 | quality:
139 |   type: SodaCL # data quality check format: SodaCL, montecarlo, custom
140 |   specification: # expressed as string or inline yaml or via "$ref: checks.yaml"
141 |     checks for orders:
142 |       - freshness(order_timestamp) < 24h
143 |       - row_count > 500000
144 |       - duplicate_count(order_id) = 0
145 |     checks for line_items:
146 |       - row_count > 500000
147 | ```
148 |
149 | Schema
150 | ---
151 |
152 | - [Data Contract Object](#data-contract-object)
153 | - [Info Object](#info-object)
154 | - [Contact Object](#contact-object)
155 | - [Server Object](#server-object)
156 | - [Terms Object](#terms-object)
157 | - [Model Object](#model-object)
158 | - [Field Object](#field-object)
159 | - [Definition Object](#definition-object)
160 | - [Schema Object](#schema-object)
161 | - [Example Object](#example-object)
162 | - [Quality Object](#quality-object)
163 | - [Data Types](#data-types)
164 | - [Specification Extensions](#specification-extensions)
165 |
166 |
167 | [JSON Schema](https://github.com/datacontract/datacontract-specification/blob/main/datacontract.schema.json) of the Data Contract Specification.
168 |
169 | ### Data Contract Object
170 |
171 | This is the root document.
172 |
173 | It is _RECOMMENDED_ that the root document be named: `datacontract.yaml`.
174 |
175 | | Field | Type | Description |
176 | |---------------------------|------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
177 | | dataContractSpecification | `string` | REQUIRED. Specifies the Data Contract Specification being used. |
178 | | id | `string` | REQUIRED. An organization-wide unique technical identifier, such as a UUID, URN, slug, string, or number |
179 | | info | [Info Object](#info-object) | REQUIRED. Specifies the metadata of the data contract. May be used by tooling. |
180 | | servers | Map[string, [Server Object](#server-object)] | Specifies the servers of the data contract. |
181 | | terms | [Terms Object](#terms-object) | Specifies the terms and conditions of the data contract. |
182 | | models | Map[string, [Model Object](#model-object)] | Specifies the logical data model. |
183 | | definitions | Map[string, [Definition Object](#definition-object)] | Specifies definitions. |
184 | | schema | [Schema Object](#schema-object) | Specifies the physical schema. The specification supports different schema formats. |
185 | | examples | Array of [Example Objects](#example-object) | Specifies example data sets for the data model. The specification supports different example types. |
186 | | quality | [Quality Object](#quality-object) | Specifies the quality attributes and checks. The specification supports different quality check DSLs. |
187 |
188 | This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
189 |
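Example (a minimal sketch that contains only the fields marked REQUIRED in the table above, plus the REQUIRED `title` and `version` of the Info Object; the values are reused from the example at the top of this document):

```yaml
# Smallest document that carries every REQUIRED field.
dataContractSpecification: 0.9.1
id: urn:datacontract:checkout:orders-latest-npii
info:
  title: Orders Latest NPII
  version: 1.0.0
```

All other sections, such as `servers`, `terms`, or `models`, are optional and can be added step by step.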
190 |
191 |
192 |
193 | ### Info Object
194 |
195 | Metadata and life cycle information about the data contract.
196 |
197 |
198 | | Field | Type | Description |
199 | |---------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
200 | | title | `string` | REQUIRED. The title of the data contract. |
201 | | version | `string` | REQUIRED. The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version). |
202 | | description | `string` | A description of the data contract. |
203 | | owner | `string` | The owner or team responsible for managing the data contract and providing the data. |
204 | | contact | [Contact Object](#contact-object) | Contact information for the data contract. |
205 |
206 | This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
207 |
208 |
209 | ### Contact Object
210 |
211 | Contact information for the data contract.
212 |
213 | | Field | Type | Description |
214 | |-------|----------|-------------------------------------------------------------------------------------------------------|
215 | | name | `string` | The identifying name of the contact person/organization. |
216 | | url | `string` | The URL pointing to the contact information. This _MUST_ be in the form of a URL. |
217 | | email | `string` | The email address of the contact person/organization. This _MUST_ be in the form of an email address. |
218 |
219 | This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
220 |
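Example (a sketch of an Info Object with a nested Contact Object; the values are reused from the example at the top of this document, except for `url`, which is a hypothetical placeholder):

```yaml
info:
  title: Orders Latest NPII
  version: 1.0.0
  owner: Checkout Team
  contact:
    name: John Doe (Data Product Owner)
    url: https://example.com/teams/checkout  # hypothetical URL, for illustration only
    email: john.doe@example.com
```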
221 | ### Server Object
222 |
223 | The fields are dependent on the defined type.
224 |
225 | | Field | Type | Description |
226 | |-------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
227 | | type | `string` | The type of the data product technology that implements the data contract. Well-known server types are: `bigquery`, `s3`, `redshift`, `snowflake`, `databricks`, `kafka` |
228 | | description | `string` | An optional string describing the server. |
229 |
230 | This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
231 |
232 | #### BigQuery Server Object
233 |
234 | | Field | Type | Description |
235 | |---------|----------|-------------|
236 | | type | `string` | `bigquery` |
237 | | project | `string` | |
238 | | dataset | `string` | |
239 |
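Example (a sketch using the project and dataset names from the example at the top of this document):

```yaml
servers:
  production:
    type: bigquery
    project: acme_orders_prod
    dataset: bigquery_orders_latest_npii_v1
```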
240 | #### S3 Server Object
241 |
242 | | Field | Type | Description |
243 | |----------|----------|--------------------------------|
244 | | type | `string` | `s3` |
245 | | location | `string` | S3 URL, starting with `s3://` |
246 |
247 | Example:
248 |
249 | ```yaml
250 | servers:
251 |   production:
252 |     type: s3
253 |     location: s3://acme-orders-prod/orders/
254 | ```
255 |
256 |
257 | #### Redshift Server Object
258 |
259 | | Field | Type | Description |
260 | |----------|----------|-------------|
261 | | type | `string` | `redshift` |
262 | | account | `string` | |
263 | | database | `string` | |
264 | | schema | `string` | |
265 |
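Example (a sketch only; the field names come from the table above, while all values are hypothetical placeholders):

```yaml
servers:
  production:
    type: redshift
    account: my-account    # hypothetical placeholder
    database: my_database  # hypothetical placeholder
    schema: my_schema      # hypothetical placeholder
```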
266 | #### Snowflake Server Object
267 |
268 | | Field | Type | Description |
269 | |----------|----------|-------------|
270 | | type | `string` | `snowflake` |
271 | | account | `string` | |
272 | | database | `string` | |
273 | | schema | `string` | |
274 |
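Example (a sketch only; the field names come from the table above, while all values are hypothetical placeholders):

```yaml
servers:
  production:
    type: snowflake
    account: my-account    # hypothetical placeholder
    database: MY_DATABASE  # hypothetical placeholder
    schema: MY_SCHEMA      # hypothetical placeholder
```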
275 | #### Databricks Server Object
276 |
277 | | Field | Type | Description |
278 | |----------|----------|--------------|
279 | | type | `string` | `databricks` |
280 | | share | `string` | |
281 |
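Example (a sketch only; `share` is the field from the table above, and its value is a hypothetical placeholder):

```yaml
servers:
  production:
    type: databricks
    share: my_share  # hypothetical placeholder
```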
282 | #### Kafka Server Object
283 |
284 | | Field | Type | Description |
285 | |-------|----------|-------------|
286 | | type | `string` | `kafka` |
287 | | host | `string` | |
288 | | topic | `string` | |
289 |
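Example (a sketch only; the field names come from the table above, while the host and topic values are hypothetical placeholders):

```yaml
servers:
  production:
    type: kafka
    host: kafka.example.com:9092  # hypothetical placeholder
    topic: orders                 # hypothetical placeholder
```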
290 |
291 |
292 | ### Terms Object
293 |
294 | The terms and conditions of the data contract.
295 |
296 | | Field | Type | Description |
297 | |----------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
298 | | usage | `string` | The usage describes the way the data is expected to be used. Can contain business and technical information. |
299 | | limitations | `string` | The limitations describe the restrictions on how the data can be used, can be technical or restrictions on what the data may not be used for. |
300 | | billing | `string` | The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use. |
301 | | noticePeriod | `string` | The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., `P3M` for a period of three months. |
302 |
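Example (a sketch with values shortened from the example at the top of this document):

```yaml
terms:
  usage: Data can be used for reports, analytics and machine learning use cases.
  limitations: Not suitable for real-time use cases.
  billing: 5000 USD per month
  noticePeriod: P3M  # ISO-8601 period: three months
```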
303 |
304 | ### Model Object
305 |
306 | The Model Object describes the structure and semantics of a data model, such as tables, views, or structured files.
307 |
308 | The name of the data model (table name) is defined by the key that refers to this Model Object.
309 |
310 | | Field | Type | Description |
311 | |-------------|----------------------------------------------|-----------------------------------------------------------------------|
312 | | type | `string` | The type of the model. Examples: `table`, `object`. Default: `table`. |
313 | | description | `string` | An optional string describing the data model. |
314 | | fields | Map[`string`, [Field Object](#field-object)] | The fields (e.g. columns) of the data model. |
315 |
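Example (a sketch taken from the `orders` model in the example at the top of this document, reduced to a single field):

```yaml
models:
  orders:  # the key is the name of the data model (table name)
    type: table
    description: One record per order. Includes cancelled and deleted orders.
    fields:
      order_total:
        type: long
        description: Total amount of the order in the smallest monetary unit (e.g., cents).
```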
316 |
317 |
318 | ### Field Object
319 |
320 | The Field Object describes one field (column, property, nested field) of a data model.
321 |
322 | | Field | Type | Description |
323 | |----------------|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
324 | | type | [Data Type](#data-types) | The logical data type of the field. |
325 | | description | `string` | An optional string describing the semantic of the data in this field. |
326 | | pii | `boolean` | Indicates whether this field contains Personally Identifiable Information (PII). |
327 | | classification | `string` | The data class defining the sensitivity level for this field, according to the organization's classification scheme. Examples may be: `sensitive`, `restricted`, `internal`, `public`. |
328 | | tags | Array of `string` | Custom metadata to provide additional context. |
329 | | $ref | `string` | A reference URI to a definition in the specification, internally or externally. Properties will be inherited from the definition. |
330 |
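Example (a sketch showing one field that inherits its properties via `$ref` and one field that is described inline; `customer_email` and the tag value are hypothetical and not part of the example at the top of this document):

```yaml
fields:
  order_id:
    $ref: '#/definitions/order_id'  # properties are inherited from the definition
  customer_email:                    # hypothetical field, for illustration only
    type: string
    description: The email address of the customer who placed the order.
    pii: true
    classification: sensitive
    tags:
      - checkout                     # hypothetical tag
```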
331 |
332 | ### Definition Object
333 |
334 | The Definition Object includes clear and concise explanations of the syntax, semantics, and classification of a business object in a given domain.
335 | It serves as a reference for a common understanding of terminology, ensures consistent usage, and helps to identify joinable fields.
336 | Model fields can refer to definitions using the `$ref` field to link to existing definitions and avoid duplicate documentation.
337 |
338 | | Field | Type | Description |
339 | |----------------|--------------------------|----------------------------------------------------------------------------------------------------------------------|
340 | | domain | `string` | The domain in which this definition is valid. Default: `global`. |
341 | | name | `string` | The technical name of this definition. |
342 | | title | `string` | The business name of this definition. |
343 | | type | [Data Type](#data-types) | The logical data type |
344 | | description | `string` | Clear and concise explanations related to the domain |
345 | | example | `string` | An example value. |
346 | | pii | `boolean` | Indicates whether this field contains Personally Identifiable Information (PII). |
347 | | classification | `string` | The data class defining the sensitivity level for this field, according to the organization's classification scheme. |
348 | | tags | Array of `string` | Custom metadata to provide additional context. |
349 |
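Example (a sketch based on the `sku` definition from the example at the top of this document, together with a model field that refers to it; the tag value is hypothetical):

```yaml
definitions:
  sku:
    domain: inventory
    name: sku
    title: Stock Keeping Unit
    type: string
    description: A Stock Keeping Unit (SKU) is an internal unique identifier for an article.
    example: AC1212ME1
    tags:
      - inventory  # hypothetical tag
models:
  line_items:
    fields:
      sku:
        $ref: '#/definitions/sku'  # links the field to the definition above
```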
350 |
351 | ### Schema Object
352 |
353 | The schema of the data contract describes the physical schema.
354 | The type of the schema depends on the data platform.
355 |
356 | | Field | Type | Description |
357 | | ----- |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
358 | | type | `string` | REQUIRED. The type of the schema. Typical values are: `dbt`, `bigquery`, `json-schema`, `sql-ddl`, `avro`, `protobuf`, `custom` |
359 | | specification | [dbt Schema Object](#dbt-schema-object) \| [BigQuery Schema Object](#bigquery-schema-object) \| [JSON Schema Schema Object](#json-schema-schema-object) \| [SQL DDL Schema Object](#sql-ddl-schema-object) \| `string` | REQUIRED. The specification of the schema. The schema specification can be encoded as a string or as inline YAML. |
360 |
361 |
362 | #### dbt Schema Object
363 |
364 | The schema structure is defined by the [dbt model properties](https://docs.getdbt.com/reference/model-properties).
365 |
366 | Example (inline YAML):
367 |
368 | ```yaml
369 | schema:
370 |   type: dbt
371 |   specification:
372 |     version: 2
373 |     models:
374 |       - name: "My Table"
375 |         description: "My description"
376 |         columns:
377 |           - name: "My column"
378 |             data_type: text
379 |             description: "My description"
380 | ```
381 |
382 | Example (string):
383 |
384 | ```yaml
385 | schema:
386 |   type: dbt
387 |   specification: |-
388 |     version: 2
389 |     models:
390 |       - name: "My Table"
391 |         description: "My description"
392 |         columns:
393 |           - name: "My column"
394 |             data_type: text
395 |             description: "My description"
396 | ```
397 |
398 | #### BigQuery Schema Object
399 |
400 | The schema structure is defined by the [Google BigQuery Table](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource:-table) object. You can extract such a Table object via the [tables.get](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get) endpoint.
401 |
402 | Instead of providing a single Table object, you can also provide an array of such objects. Be aware that [tables.list](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list) only returns a subset of the full Table object. To get the full Table object, including the actual schema, you need to call [tables.get](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get) for every table.
403 |
404 | Learn more: [Google BigQuery REST Reference v2](https://cloud.google.com/bigquery/docs/reference/rest)
405 |
406 |
407 |
408 | Example:
409 |
410 | ```yaml
411 | schema:
412 |   type: bigquery
413 |   specification: |-
414 |     {
415 |       "tableReference": {
416 |         "projectId": "my-project",
417 |         "datasetId": "my_dataset",
418 |         "tableId": "my_table"
419 |       },
420 |       "description": "This is a description",
421 |       "type": "TABLE",
422 |       "schema": {
423 |         "fields": [
424 |           {
425 |             "name": "name",
426 |             "type": "STRING",
427 |             "mode": "NULLABLE",
428 |             "description": "This is a description"
429 |           }
430 |         ]
431 |       }
432 |     }
433 | ```
434 |
435 | #### JSON Schema Schema Object
436 |
437 | JSON Schema can be defined as JSON or rendered as YAML, following the [OpenAPI Schema Object dialect](https://spec.openapis.org/oas/v3.1.0#properties).
438 |
439 | Example (inline YAML):
440 |
441 | ```yaml
442 | schema:
443 |   type: json-schema
444 |   specification:
445 |     orders:
446 |       description: One record per order. Includes cancelled and deleted orders.
447 |       type: object
448 |       properties:
449 |         order_id:
450 |           type: string
451 |           description: Primary key of the orders table
452 |         order_timestamp:
453 |           type: string
454 |           format: date-time
455 |           description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
456 |         order_total:
457 |           type: integer
458 |           description: Total amount of the order in the smallest monetary unit (e.g., cents).
459 |     line_items:
460 |       type: object
461 |       properties:
462 |         lines_item_id:
463 |           type: string
464 |           description: Primary key of the line_items table
465 |         order_id:
466 |           type: string
467 |           description: Foreign key to the orders table
468 |         sku:
469 |           type: string
470 |           description: The purchased article number
471 | ```
472 |
473 | Example (string):
474 |
475 | ```yaml
476 | schema:
477 |   type: json-schema
478 |   specification: |-
479 |     {
480 |       "$schema": "http://json-schema.org/draft-07/schema#",
481 |       "type": "object",
482 |       "properties": {
483 |         "orders": {
484 |           "type": "object",
485 |           "description": "One record per order. Includes cancelled and deleted orders.",
486 |           "properties": {
487 |             "order_id": {
488 |               "type": "string",
489 |               "description": "Primary key of the orders table"
490 |             },
491 |             "order_timestamp": {
492 |               "type": "string",
493 |               "format": "date-time",
494 |               "description": "The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful."
495 |             },
496 |             "order_total": {
497 |               "type": "integer",
498 |               "description": "Total amount of the order in the smallest monetary unit (e.g., cents)."
499 |             }
500 |           },
501 |           "required": ["order_id", "order_timestamp", "order_total"]
502 |         },
503 |         "line_items": {
504 |           "type": "object",
505 |           "properties": {
506 |             "lines_item_id": {
507 |               "type": "string",
508 |               "description": "Primary key of the line_items table"
509 |             },
510 |             "order_id": {
511 |               "type": "string",
512 |               "description": "Foreign key to the orders table"
513 |             },
514 |             "sku": {
515 |               "type": "string",
516 |               "description": "The purchased article number"
517 |             }
518 |           },
519 |           "required": ["lines_item_id", "order_id", "sku"]
520 |         }
521 |       },
522 |       "required": ["orders", "line_items"]
523 |     }
524 | ```
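
When the JSON Schema is given as a string, each top-level property stands for one model, and its `required` array lists the mandatory fields. A minimal Python sketch of reading that structure is shown below. The `required_by_model` helper is hypothetical and only illustrates the layout of the example above; it does not validate any data.

```python
import json

# Shortened copy of the string example above (descriptions omitted).
SPECIFICATION = """
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "orders": {
      "type": "object",
      "properties": {
        "order_id": {"type": "string"},
        "order_timestamp": {"type": "string", "format": "date-time"},
        "order_total": {"type": "integer"}
      },
      "required": ["order_id", "order_timestamp", "order_total"]
    },
    "line_items": {
      "type": "object",
      "properties": {
        "lines_item_id": {"type": "string"},
        "order_id": {"type": "string"},
        "sku": {"type": "string"}
      },
      "required": ["lines_item_id", "order_id", "sku"]
    }
  },
  "required": ["orders", "line_items"]
}
"""

# Hypothetical helper for illustration; not part of the specification.
def required_by_model(specification: str) -> dict[str, list[str]]:
    """Map each top-level property (model) to its list of required fields."""
    schema = json.loads(specification)
    return {
        name: model.get("required", [])
        for name, model in schema.get("properties", {}).items()
    }

print(required_by_model(SPECIFICATION))
```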
525 |
526 | #### SQL DDL Schema Object
527 |
528 | Classical SQL DDLs can be used to describe the structure.
529 |
530 |
531 | Example (string):
532 |
533 | ```yaml
534 | schema:
535 |   type: sql-ddl
536 |   specification: |-
537 |     -- One record per order. Includes cancelled and deleted orders.
538 |     CREATE TABLE orders (
539 |       order_id TEXT PRIMARY KEY, -- Primary key of the orders table
540 |       order_timestamp TIMESTAMPTZ NOT NULL, -- The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
541 |       order_total INTEGER NOT NULL -- Total amount of the order in the smallest monetary unit (e.g., cents)
542 |     );
543 |
544 |     -- The items that are part of an order
545 |     CREATE TABLE line_items (
546 |       lines_item_id TEXT PRIMARY KEY, -- Primary key of the line_items table
547 |       order_id TEXT REFERENCES orders(order_id), -- Foreign key to the orders table
548 |       sku TEXT NOT NULL -- The purchased article number
549 |     );
550 |
551 | ```
552 |
553 | ### Example Object
554 |
555 | | Field | Type | Description |
556 | |-------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------|
557 | | type | `string` | The type of the example data. Well-known types are: `csv`, `json`, `yaml`, `custom` |
558 | | description | `string` | An optional string describing the example. |
559 | | model | `string` | The reference to the model in the schema, e.g. a table name. |
560 | | data | `string` | Example data for this model. |
561 |
562 | Example:
563 |
564 | ```yaml
565 | examples:
566 | - type: csv
567 |   model: orders
568 |   data: |-
569 |     order_id,order_timestamp,order_total
570 |     "1001","2023-09-09T08:30:00Z",2500
571 |     "1002","2023-09-08T15:45:00Z",1800
572 |     "1003","2023-09-07T12:15:00Z",3200
573 |     "1004","2023-09-06T19:20:00Z",1500
574 |     "1005","2023-09-05T10:10:00Z",4200
575 |     "1006","2023-09-04T14:55:00Z",2800
576 |     "1007","2023-09-03T21:05:00Z",1900
577 |     "1008","2023-09-02T17:40:00Z",3600
578 |     "1009","2023-09-01T09:25:00Z",3100
579 |     "1010","2023-08-31T22:50:00Z",2700
580 | ```
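
The `data` field of a `csv` example holds ordinary CSV text with a header row, so it can be read with a standard CSV parser. The following is a minimal Python sketch using only the standard library, assuming the `data` string has already been taken from the contract. The `parse_csv_example` name is hypothetical.

```python
import csv
import io

# First rows of the example above, as they would appear in the `data` field.
DATA = '''order_id,order_timestamp,order_total
"1001","2023-09-09T08:30:00Z",2500
"1002","2023-09-08T15:45:00Z",1800
'''

# Hypothetical helper for illustration; not part of the specification.
def parse_csv_example(data: str) -> list[dict[str, str]]:
    """Parse CSV example data into one dict per row, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(data)))

rows = parse_csv_example(DATA)
for row in rows:
    # All values are strings; converting them to the model's field types is left to the caller.
    print(row["order_id"], row["order_timestamp"], row["order_total"])
```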
581 |
582 | ### Quality Object
583 |
584 | The quality object contains quality attributes and checks.
585 |
586 | | Field | Type | Description |
587 | | ----- |-------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|
588 | | type | `string` | REQUIRED. The type of the quality check. Typical values are: `SodaCL`, `montecarlo`, `custom` |
589 | | specification | [SodaCL Quality Object](#sodacl-quality-object) \| [Monte Carlo Quality Object](#monte-carlo-quality-object) \| `string` | REQUIRED. The specification of the quality attributes. The quality specification can be encoded as a string or as inline YAML. |
590 |
591 |
592 | #### SodaCL Quality Object
593 |
594 | Quality attributes in [Soda Checks Language](https://docs.soda.io/soda-cl/soda-cl-overview.html).
595 |
596 | The `specification` represents the content of a `checks.yml` file.
597 |
598 | Example (inline):
599 |
600 | ```yaml
601 | quality:
602 |   type: SodaCL   # data quality check format: SodaCL, montecarlo, custom
603 |   specification: # expressed as string or inline yaml or via "$ref: checks.yaml"
604 |     checks for orders:
605 |       - row_count > 0
606 |       - duplicate_count(order_id) = 0
607 |     checks for line_items:
608 |       - row_count > 0
609 | ```
610 |
611 | Example (string):
612 |
613 | ```yaml
614 | quality:
615 |   type: SodaCL
616 |   specification: |-
617 |     checks for search_queries:
618 |       - freshness(search_timestamp) < 1d
619 |       - row_count > 100000
620 |       - missing_count(search_query) = 0
621 | ```
622 |
623 | #### Monte Carlo Quality Object
624 |
625 | Quality attributes defined as Monte Carlo's [Monitors as Code](https://docs.getmontecarlo.com/docs/monitors-as-code).
626 |
627 | The `specification` represents the content of a `montecarlo.yml` file.
628 |
629 | Example (string):
630 |
631 | ```yaml
632 | quality:
633 |   type: montecarlo
634 |   specification: |-
635 |     montecarlo:
636 |       field_health:
637 |         - table: project:dataset.table_name
638 |           timestamp_field: created
639 |       dimension_tracking:
640 |         - table: project:dataset.table_name
641 |           timestamp_field: created
642 |           field: order_status
643 | ```
644 |
645 | ### Data Types
646 |
647 | The following data types are supported for model fields and definitions:
648 |
649 | - Unicode character sequence: `string`, `text`, `varchar`
650 | - Any numeric type, either integers or floating point numbers: `number`, `decimal`, `numeric`
651 | - 32-bit signed integer: `int`, `integer`
652 | - 64-bit signed integer: `long`, `bigint`
653 | - Single precision (32-bit) IEEE 754 floating-point number: `float`
654 | - Double precision (64-bit) IEEE 754 floating-point number: `double`
655 | - Boolean value (true or false): `boolean`
656 | - Timestamp with timezone: `timestamp`, `timestamp_tz`
657 | - Timestamp with no timezone: `timestamp_ntz`
658 | - Date with no time information: `date`
659 | - Array: `array`
660 | - Sequence of 8-bit unsigned bytes: `bytes`
661 | - Complex type: `object`, `record`, `struct`
662 | - No value: `null`
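
Several of the type names above are aliases for the same logical type. A tool that compares field types may therefore want to normalize them first. The following is a minimal Python sketch of such a lookup. Treating the first name in each group as the canonical one is an assumption made for illustration and is not defined by the specification; `canonical_type` is a hypothetical helper.

```python
# Alias groups taken from the list above.
# Assumption: the first name of each group is used as the canonical name.
TYPE_GROUPS = {
    "string": ("string", "text", "varchar"),
    "number": ("number", "decimal", "numeric"),
    "int": ("int", "integer"),
    "long": ("long", "bigint"),
    "float": ("float",),
    "double": ("double",),
    "boolean": ("boolean",),
    "timestamp": ("timestamp", "timestamp_tz"),
    "timestamp_ntz": ("timestamp_ntz",),
    "date": ("date",),
    "array": ("array",),
    "bytes": ("bytes",),
    "object": ("object", "record", "struct"),
    "null": ("null",),
}

# Reverse lookup: alias -> canonical name.
CANONICAL = {
    alias: canonical
    for canonical, aliases in TYPE_GROUPS.items()
    for alias in aliases
}

def canonical_type(name: str) -> str:
    """Return the canonical name for a supported type, or raise ValueError."""
    try:
        return CANONICAL[name]
    except KeyError:
        raise ValueError(f"unsupported data type: {name!r}") from None

print(canonical_type("varchar"))
print(canonical_type("bigint"))
```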
663 |
664 | ### Specification Extensions
665 |
666 | While the Data Contract Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points.
667 |
668 | Custom fields can be added with any name. The value can be null, a primitive, an array or an object.
669 |
670 | ### Design Principles
671 |
672 | The Data Contract Specification follows these design principles:
673 |
674 | - A free, open, and open-sourced standard
675 | - Follow OpenAPI and AsyncAPI conventions so that it feels immediately familiar
676 | - Support contract-first approaches
677 | - Support code-first approaches
678 | - Support tooling by being machine-readable
679 |
680 | Tooling
681 | ---
682 | - [Data Contract CLI](https://github.com/datacontract/cli) is a free CLI tool to help you create, develop, and maintain your data contracts.
683 | - [Data Contract Studio](https://studio.datacontract.com/) is a free web tool to develop and share data contracts.
684 | - [Data Mesh Manager](https://www.datamesh-manager.com/) is a commercial tool to manage data products and data contracts. It supports the data contract specification and allows the user to import or export data contracts using this specification.
685 |
686 |
687 | Other Data Contract Specifications
688 | ---
689 | - [AIDA User Group's Open Data Contract Standard](https://github.com/AIDAUserGroup/open-data-contract-standard)
690 | - [PayPal's Data Contract Template](https://github.com/paypal/data-contract-template/blob/main/docs/README.md)
691 |
692 | Literature
693 | ---
694 | - [Driving Data Quality with Data Contracts](https://www.amazon.com/dp/B0C37FPH3D) by Andrew Jones
695 |
696 | Authors
697 | ---
698 | The Data Contract Specification was originally created by [Jochen Christ](https://www.linkedin.com/in/jochenchrist/) and [Dr. Simon Harrer](https://www.linkedin.com/in/simonharrer/), and is currently maintained by them.
699 |
700 |
701 | Contributing
702 | ---
703 | Contributions are welcome! Please open an issue or a pull request.
704 |
705 | License
706 | ---
707 | [MIT License](LICENSE)
708 |
709 |
710 |
711 |
--------------------------------------------------------------------------------
/versions/0.9.1/datacontract.init.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.1
2 | id: my-data-contract-id
3 | info:
4 | title: My Data Contract
5 | version: 0.0.1
6 | # description:
7 | # owner:
8 | # contact:
9 | # name:
10 | # url:
11 | # email:
12 |
13 |
14 | ### servers
15 |
16 | #servers:
17 | # my-stage:
18 | # type: bigquery
19 | # project:
20 | # dataset:
21 |
22 | #servers:
23 | # my-stage:
24 | # type: s3
25 | # location: s3://
26 |
27 | #servers:
28 | # my-stage:
29 | # type: redshift
30 | # account:
31 | # database:
32 | # schema:
33 |
34 | #servers:
35 | # my-stage:
36 | # type: snowflake
37 | # account:
38 | # database:
39 | # schema:
40 |
41 | #servers:
42 | # my-stage:
43 | # type: databricks
44 | # share:
45 |
46 | #servers:
47 | # my-stage:
48 | # type: kafka
49 | # host:
50 | # topic:
51 |
52 |
53 | ### terms
54 |
55 | #terms:
56 | # usage:
57 | # limitations:
58 | # billing:
59 | # noticePeriod:
60 |
61 | ### models
62 | # models:
63 | # my_model:
64 | # description:
65 | # type:
66 | # fields:
67 | # my_field:
68 | # type:
69 | # description:
70 |
71 | ### definitions
72 | # definitions:
73 | # my_field:
74 | # domain:
75 | # name:
76 | # title:
77 | # type:
78 | # description:
79 | # example:
80 | # pii:
81 | # classification:
82 |
83 | ### schema
84 |
85 | #schema:
86 | # type: dbt
87 | # specification:
88 | # version:
89 | # models:
90 | # - name:
91 | # description:
92 | # columns:
93 | # - name:
94 | # type:
95 | # description:
96 | # tests:
97 |
98 | #schema:
99 | # type: dbt
100 | # specification: |-
101 | # version:
102 | # models:
103 | # - name:
104 | # description:
105 | # columns:
106 | # - name:
107 | # type:
108 | # description:
109 | # tests:
110 |
111 | #schema:
112 | # type: dbt
113 | # specification: "$ref: model.yaml"
114 |
115 | #schema:
116 | # type: bigquery
117 | # specification: |-
118 | # {
119 | # "tableReference": {
120 | # "projectId": "my-project",
121 | # "datasetId": "my_dataset",
122 | # "tableId": "my_table"
123 | # },
124 | # "description": "This is a description",
125 | # "type": "TABLE",
126 | # "schema": {
127 | # "fields": [
128 | # {
129 | # "name": "name",
130 | # "type": "STRING",
131 | # "mode": "NULLABLE",
132 | # "description": "This is a description"
133 | # }
134 | # ]
135 | # }
136 | # }
137 |
138 | #schema:
139 | # type: json-schema
140 | # specification:
141 | # my-table:
142 | # description:
143 | # type: object
144 | # properties:
145 | # id:
146 | # type: string
147 | # description:
148 |
149 | #schema:
150 | # type: json-schema
151 | # specification: |-
152 | # {
153 | # "$schema": "http://json-schema.org/draft-07/schema#",
154 | # "type": "object",
155 | # "properties": {
156 | # "my_table": {
157 | # "type": "object",
158 | # "description": "",
159 | # "properties": {
160 | # "id": {
161 | # "type": "string", "description": ""
162 | # }
163 | # },
164 | # "required": ["id"]
165 | # }
166 | # },
167 | # "required": ["my_table"]
168 | # }
169 |
170 | #schema:
171 | # type: sql-ddl
172 | # specification: |-
173 | # CREATE TABLE my_table (
174 | # id TEXT PRIMARY KEY
175 | # );
176 |
177 | #schema:
178 | # type: avro
179 | # specification:
180 | # User:
181 | # type: record
182 | # name: MyTable
183 | # fields:
184 | # - name: id
185 | # type: string
186 |
187 | #schema:
188 | # type: avro
189 | # specification: |-
190 | # {
191 | # "type": "record",
192 | # "name": "MyTable",
193 | # "fields": [
194 | # {
195 | # "name": "name",
196 | # "type": "string"
197 | # }
198 | # ]
199 | # }
200 |
201 | #schema:
202 | # type: protobuf
203 | # specification: |-
204 | # message MyTable {
205 | # string id = 1;
206 | # }
207 |
208 | #schema:
209 | # type: custom
210 | # specification:
211 |
212 |
213 | ### examples
214 |
215 | #examples:
216 | # - type: csv
217 | # model: my_table
218 | # data: |-
219 | # id,timestamp,amount
220 | # "1001","2023-09-09T08:30:00Z",2500
221 | # "1002","2023-09-08T15:45:00Z",1800
222 | #
223 | #examples:
224 | # - type: csv
225 | # model: my_table
226 | # data: "$ref: data.csv"
227 |
228 | #examples:
229 | # - type: json
230 | # model: my_table
231 | # data: |-
232 | # [
233 | # {
234 | # "id": "1001",
235 | # "timestamp": "2023-09-09T08:30:00Z",
236 | # "amount": 2500
237 | # },
238 | # {
239 | # "id": "1002",
240 | # "timestamp": "2023-09-08T15:45:00Z",
241 | # "amount": 1800
242 | # }
243 | # ]
244 |
245 | #examples:
246 | # - type: yaml
247 | # model: my_table
248 | # data:
249 | # - id: 1001
250 | # timestamp: 2023-09-09T08:30:00Z
251 | # amount: 2500
252 | # - id: 1002
253 | # timestamp: 2023-09-08T15:45:00Z
254 | # amount: 1800
255 |
256 | #examples:
257 | # - type: custom
258 | # model: my_table
259 | # data: |-
260 |
261 |
262 | ### quality
263 |
264 | #quality:
265 | # type: SodaCL
266 | # specification:
267 | # checks for my_table:
268 | # - duplicate_count(order_id) = 0
269 |
270 | #quality:
271 | # type: SodaCL
272 | # specification:
273 | # checks for my_table: |-
274 | # - duplicate_count(id) = 0
275 |
276 | #quality:
277 | # type: SodaCL
278 | # specification:
279 | # checks for my_table: "$ref: checks.yaml"
280 |
281 | #quality:
282 | # type: montecarlo
283 | # specification: |-
284 | # montecarlo:
285 | # field_health:
286 | # - table: my_project:my_dataset.my_table
287 | # fields:
288 | # - id
289 | # - timestamp
290 | # - amount
291 | # timestamp_field: timestamp
292 |
293 | #quality:
294 | # type: custom
295 | # specification: |-
296 |
--------------------------------------------------------------------------------
/versions/0.9.1/datacontract.schema.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "http://json-schema.org/draft-07/schema#",
3 | "type": "object",
4 | "properties": {
5 | "dataContractSpecification": {
6 | "type": "string",
7 | "enum": [
8 | "0.9.1",
9 | "0.9.0"
10 | ],
11 | "description": "Specifies the Data Contract Specification being used."
12 | },
13 | "id": {
14 | "type": "string",
15 | "description": "Specifies the identifier of the data contract."
16 | },
17 | "info": {
18 | "type": "object",
19 | "properties": {
20 | "title": {
21 | "type": "string",
22 | "description": "The title of the data contract."
23 | },
24 | "version": {
25 | "type": "string",
26 | "description": "The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version)."
27 | },
28 | "description": {
29 | "type": "string",
30 | "description": "A description of the data contract."
31 | },
32 | "owner": {
33 | "type": "string",
34 | "description": "The owner or team responsible for managing the data contract and providing the data."
35 | },
36 | "contact": {
37 | "type": "object",
38 | "properties": {
39 | "name": {
40 | "type": "string",
41 | "description": "The identifying name of the contact person/organization."
42 | },
43 | "url": {
44 | "type": "string",
45 | "format": "uri",
46 | "description": "The URL pointing to the contact information. This MUST be in the form of a URL."
47 | },
48 | "email": {
49 | "type": "string",
50 | "format": "email",
51 | "description": "The email address of the contact person/organization. This MUST be in the form of an email address."
52 | }
53 | },
54 | "description": "Contact information for the data contract."
55 | }
56 | },
57 | "required": [
58 | "title",
59 | "version"
60 | ],
61 | "description": "Metadata and life cycle information about the data contract."
62 | },
63 | "servers": {
64 | "type": "object",
65 | "additionalProperties": {
66 | "anyOf": [
67 | {
68 | "type": "object",
69 | "properties": {
70 | "type": {
71 | "type": "string",
72 | "enum": [
73 | "bigquery",
74 | "BigQuery"
75 | ],
76 | "description": "The type of the data product technology that implements the data contract."
77 | },
78 | "project": {
79 | "type": "string",
80 | "description": "An optional string describing the server."
81 | },
82 | "dataset": {
83 | "type": "string",
84 | "description": "An optional string describing the server."
85 | }
86 | },
87 | "required": [
88 | "type",
89 | "project",
90 | "dataset"
91 | ]
92 | },
93 | {
94 | "type": "object",
95 | "properties": {
96 | "type": {
97 | "type": "string",
98 | "enum": [
99 | "s3"
100 | ],
101 | "description": "The type of the data product technology that implements the data contract."
102 | },
103 | "location": {
104 | "type": "string",
105 | "format": "uri",
106 | "description": "An optional string describing the server. Must be in the form of a URL."
107 | }
108 | },
109 | "required": [
110 | "type",
111 | "location"
112 | ]
113 | },
114 | {
115 | "type": "object",
116 | "properties": {
117 | "type": {
118 | "type": "string",
119 | "enum": [
120 | "redshift"
121 | ],
122 | "description": "The type of the data product technology that implements the data contract."
123 | },
124 | "account": {
125 | "type": "string",
126 | "description": "An optional string describing the server."
127 | },
128 | "database": {
129 | "type": "string",
130 | "description": "An optional string describing the server."
131 | },
132 | "schema": {
133 | "type": "string",
134 | "description": "An optional string describing the server."
135 | }
136 | },
137 | "required": [
138 | "type",
139 | "account",
140 | "database",
141 | "schema"
142 | ]
143 | },
144 | {
145 | "type": "object",
146 | "properties": {
147 | "type": {
148 | "type": "string",
149 | "enum": [
150 | "snowflake"
151 | ],
152 | "description": "The type of the data product technology that implements the data contract."
153 | },
154 | "account": {
155 | "type": "string",
156 | "description": "An optional string describing the server."
157 | },
158 | "database": {
159 | "type": "string",
160 | "description": "An optional string describing the server."
161 | },
162 | "schema": {
163 | "type": "string",
164 | "description": "An optional string describing the server."
165 | }
166 | },
167 | "required": [
168 | "type",
169 | "account",
170 | "database",
171 | "schema"
172 | ]
173 | },
174 | {
175 | "type": "object",
176 | "properties": {
177 | "type": {
178 | "type": "string",
179 | "enum": [
180 | "databricks"
181 | ],
182 | "description": "The type of the data product technology that implements the data contract."
183 | },
184 | "share": {
185 | "type": "string",
186 | "description": "An optional string describing the server."
187 | }
188 | },
189 | "required": [
190 | "type",
191 | "share"
192 | ]
193 | },
194 | {
195 | "type": "object",
196 | "properties": {
197 | "type": {
198 | "type": "string",
199 | "enum": [
200 | "kafka"
201 | ],
202 | "description": "The type of the data product technology that implements the data contract."
203 | },
204 | "host": {
205 | "type": "string",
206 | "description": "An optional string describing the server."
207 | },
208 | "topic": {
209 | "type": "string",
210 | "description": "An optional string describing the server."
211 | }
212 | },
213 | "required": [
214 | "type",
215 | "host",
216 | "topic"
217 | ]
218 | }
219 | ]
220 | },
221 | "description": "Information about the servers."
222 | },
223 | "terms": {
224 | "type": "object",
225 | "description": "The terms and conditions of the data contract.",
226 | "properties": {
227 | "usage": {
228 | "type": "string",
229 | "description": "The usage describes the way the data is expected to be used. Can contain business and technical information."
230 | },
231 | "limitations": {
232 | "type": "string",
233 | "description": "The limitations describe the restrictions on how the data can be used, can be technical or restrictions on what the data may not be used for."
234 | },
235 | "billing": {
236 | "type": "string",
237 | "description": "The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use."
238 | },
239 | "noticePeriod": {
240 | "type": "string",
241 | "description": "The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., 'P3M' for a period of three months."
242 | }
243 | }
244 | },
245 | "models": {
246 | "description": "Specifies the logical data model. Use the model's name (e.g., the table name) as the key.",
247 | "type": "object",
248 | "minProperties": 1,
249 | "propertyNames": {
250 | "pattern": "^[a-zA-Z0-9_-]+$"
251 | },
252 | "additionalProperties": {
253 | "type": "object",
254 | "properties": {
255 | "description": {
256 | "type": "string"
257 | },
258 | "type": {
259 | "description": "The type of the model. Examples: table, object. Default: table.",
260 | "type": "string",
261 | "default": "table"
262 | },
263 | "fields": {
264 | "description": "Specifies a field in the data model. Use the field name (e.g., the column name) as the key.",
265 | "type": "object",
266 | "additionalProperties": {
267 | "type": "object",
268 | "properties": {
269 | "$ref": {
270 | "type": "string",
271 | "description": "A reference URI to a definition in the specification, internally or externally. Properties will be inherited from the definition."
272 | },
273 | "type": {
274 | "type": "string",
275 | "description": "The logical data type of the field.",
276 | "enum": [
277 | "number", "decimal", "numeric",
278 | "int", "integer",
279 | "long", "bigint",
280 | "float",
281 | "double",
282 | "string", "text", "varchar",
283 | "boolean",
284 | "timestamp", "timestamp_tz",
285 | "timestamp_ntz",
286 | "date",
287 | "array",
288 | "object", "record", "struct",
289 | "bytes",
290 | "null"
291 | ]
292 | },
293 | "description": {
294 | "type": "string",
295 | "description": "An optional string describing the semantic of the data in this field."
296 | },
297 | "pii": {
298 | "type": "boolean",
299 | "description": "Indicates whether this field contains Personally Identifiable Information (PII)."
300 | },
301 | "classification": {
302 | "type": "string",
303 | "description": "The data class defining the sensitivity level for this field, according to the organization's classification scheme.",
304 | "examples": ["sensitive", "restricted", "internal", "public"]
305 | },
306 | "tags": {
307 | "type": "array",
308 | "items": {
309 | "type": "string"
310 | },
311 | "description": "Custom metadata to provide additional context."
312 | }
313 | }
314 | }
315 | }
316 | }
317 | }
318 | },
319 | "schema": {
320 | "type": "object",
321 | "properties": {
322 | "type": {
323 | "type": "string",
324 | "enum": [
325 | "dbt",
326 | "bigquery",
327 | "json-schema",
328 | "sql-ddl",
329 | "avro",
330 | "protobuf",
331 | "custom"
332 | ],
333 | "description": "The type of the schema. Typical values are: dbt, bigquery, json-schema, sql-ddl, avro, protobuf, custom."
334 | },
335 | "specification": {
336 | "anyOf": [
337 | {
338 | "type": "string",
339 | "description": "The specification of the schema as a string."
340 | },
341 | {
342 | "type": "object",
343 | "description": "The specification of the schema as an object."
344 | }
345 | ]
346 | }
347 | },
348 | "required": [
349 | "type",
350 | "specification"
351 | ],
352 | "description": "The schema of the data contract describes the syntax and semantics of provided data sets. It supports different schema types."
353 | },
354 | "examples": {
355 | "type": "array",
356 | "items": {
357 | "type": "object",
358 | "properties": {
359 | "type": {
360 | "type": "string",
361 | "enum": [
362 | "csv",
363 | "json",
364 | "yaml",
365 | "custom"
366 | ],
367 | "description": "The type of the example data. Well-known types are: csv, json, yaml, custom."
368 | },
369 | "description": {
370 | "type": "string",
371 | "description": "An optional string describing the example."
372 | },
373 | "model": {
374 | "type": "string",
375 | "description": "The reference to the model in the schema, e.g., a table name."
376 | },
377 | "data": {
378 | "type": "string",
379 | "description": "Example data for this model."
380 | }
381 | },
382 | "required": [
383 | "type",
384 | "data"
385 | ]
386 | },
387 | "description": "The Examples Object is an array of Example Objects."
388 | },
389 | "quality": {
390 | "type": "object",
391 | "properties": {
392 | "type": {
393 | "type": "string",
394 | "enum": [
395 | "SodaCL",
396 | "montecarlo",
397 | "custom"
398 | ],
399 | "description": "The type of the quality check. Typical values are: SodaCL, montecarlo, custom."
400 | },
401 | "specification": {
402 | "anyOf": [
403 | {
404 | "type": "string",
405 | "description": "The specification of the quality attributes as a string."
406 | },
407 | {
408 | "type": "object",
409 | "description": "The specification of the quality attributes as an object."
410 | }
411 | ]
412 | }
413 | },
414 | "required": [
415 | "type",
416 | "specification"
417 | ],
418 | "description": "The quality object contains quality attributes and checks."
419 | }
420 | },
421 | "required": [
422 | "dataContractSpecification",
423 | "id",
424 | "info"
425 | ]
426 | }
427 |
--------------------------------------------------------------------------------
/versions/0.9.2/datacontract.init.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.2
2 | id: my-data-contract-id
3 | info:
4 | title: My Data Contract
5 | version: 0.0.1
6 | # description:
7 | # owner:
8 | # contact:
9 | # name:
10 | # url:
11 | # email:
12 |
13 |
14 | ### servers
15 |
16 | #servers:
17 | # production:
18 | # type: s3
19 | # location: s3://
20 | # format: parquet
21 | # delimiter: new_line
22 |
23 | ### terms
24 |
25 | #terms:
26 | # usage:
27 | # limitations:
28 | # billing:
29 | # noticePeriod:
30 |
31 |
32 | ### models
33 |
34 | # models:
35 | # my_model:
36 | # description:
37 | # type:
38 | # fields:
39 | # my_field:
40 | # type:
41 | # description:
42 |
43 |
44 | ### definitions
45 |
46 | # definitions:
47 | # my_field:
48 | # domain:
49 | # name:
50 | # title:
51 | # type:
52 | # description:
53 | # example:
54 | # pii:
55 | # classification:
56 |
57 |
58 | ### examples
59 |
60 | #examples:
61 | # - type: csv
62 | # model: my_model
63 | # data: |-
64 | # id,timestamp,amount
65 | # "1001","2023-09-09T08:30:00Z",2500
66 | # "1002","2023-09-08T15:45:00Z",1800
67 |
68 |
69 | ### quality
70 |
71 | #quality:
72 | # type: SodaCL
73 | # specification:
74 | # checks for my_model: |-
75 | # - duplicate_count(id) = 0
76 |
--------------------------------------------------------------------------------
/versions/0.9.2/datacontract.schema.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "http://json-schema.org/draft-07/schema#",
3 | "type": "object",
4 | "title": "DataContractSpecification",
5 | "properties": {
6 | "dataContractSpecification": {
7 | "type": "string",
8 | "title": "DataContractSpecificationVersion",
9 | "enum": [
10 | "0.9.2",
11 | "0.9.1",
12 | "0.9.0"
13 | ],
14 | "description": "Specifies the Data Contract Specification being used."
15 | },
16 | "id": {
17 | "type": "string",
18 | "description": "Specifies the identifier of the data contract."
19 | },
20 | "info": {
21 | "type": "object",
22 | "properties": {
23 | "title": {
24 | "type": "string",
25 | "description": "The title of the data contract."
26 | },
27 | "version": {
28 | "type": "string",
29 | "description": "The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version)."
30 | },
31 | "description": {
32 | "type": "string",
33 | "description": "A description of the data contract."
34 | },
35 | "owner": {
36 | "type": "string",
37 | "description": "The owner or team responsible for managing the data contract and providing the data."
38 | },
39 | "contact": {
40 | "type": "object",
41 | "properties": {
42 | "name": {
43 | "type": "string",
44 | "description": "The identifying name of the contact person/organization."
45 | },
46 | "url": {
47 | "type": "string",
48 | "format": "uri",
49 | "description": "The URL pointing to the contact information. This MUST be in the form of a URL."
50 | },
51 | "email": {
52 | "type": "string",
53 | "format": "email",
54 | "description": "The email address of the contact person/organization. This MUST be in the form of an email address."
55 | }
56 | },
57 | "description": "Contact information for the data contract."
58 | }
59 | },
60 | "required": [
61 | "title",
62 | "version"
63 | ],
64 | "description": "Metadata and life cycle information about the data contract."
65 | },
66 | "servers": {
67 | "type": "object",
68 | "additionalProperties": {
69 | "oneOf": [
70 | {
71 | "type": "object",
72 | "title": "BigQueryServer",
73 | "properties": {
74 | "type": {
75 | "type": "string",
76 | "enum": [
77 | "bigquery",
78 | "BigQuery"
79 | ],
80 | "description": "The type of the data product technology that implements the data contract."
81 | },
82 | "project": {
83 | "type": "string",
84 | "description": "An optional string describing the server."
85 | },
86 | "dataset": {
87 | "type": "string",
88 | "description": "An optional string describing the server."
89 | }
90 | },
91 | "additionalProperties": true,
92 | "required": [
93 | "type",
94 | "project",
95 | "dataset"
96 | ]
97 | },
98 | {
99 | "type": "object",
100 | "title": "S3Server",
101 | "properties": {
102 | "type": {
103 | "type": "string",
104 | "enum": [
105 | "s3"
106 | ],
107 | "description": "The type of the data product technology that implements the data contract."
108 | },
109 | "location": {
110 | "type": "string",
111 | "format": "uri",
112 | "description": "An optional string describing the server. Must be in the form of a URL."
113 | }
114 | },
115 | "additionalProperties": true,
116 | "required": [
117 | "type",
118 | "location"
119 | ]
120 | },
121 | {
122 | "type": "object",
123 | "title": "RedshiftServer",
124 | "properties": {
125 | "type": {
126 | "type": "string",
127 | "enum": [
128 | "redshift"
129 | ],
130 | "description": "The type of the data product technology that implements the data contract."
131 | },
132 | "account": {
133 | "type": "string",
134 | "description": "The account identifier."
135 | },
136 | "database": {
137 | "type": "string",
138 | "description": "The name of the database."
139 | },
140 | "schema": {
141 | "type": "string",
142 | "description": "The name of the schema in the database."
143 | }
144 | },
145 | "additionalProperties": true,
146 | "required": [
147 | "type",
148 | "account",
149 | "database",
150 | "schema"
151 | ]
152 | },
153 | {
154 | "type": "object",
155 | "title": "SnowflakeServer",
156 | "properties": {
157 | "type": {
158 | "type": "string",
159 | "enum": [
160 | "snowflake"
161 | ],
162 | "description": "The type of the data product technology that implements the data contract."
163 | },
164 | "account": {
165 | "type": "string",
166 | "description": "The account identifier."
167 | },
168 | "database": {
169 | "type": "string",
170 | "description": "The name of the database."
171 | },
172 | "schema": {
173 | "type": "string",
174 | "description": "The name of the schema in the database."
175 | }
176 | },
177 | "additionalProperties": true,
178 | "required": [
179 | "type",
180 | "account",
181 | "database",
182 | "schema"
183 | ]
184 | },
185 | {
186 | "type": "object",
187 | "title": "DatabricksServer",
188 | "properties": {
189 | "type": {
190 | "type": "string",
191 | "const": "databricks",
192 | "description": "The type of the data product technology that implements the data contract."
193 | },
194 | "host": {
195 | "type": "string",
196 | "description": "The Databricks host",
197 | "examples": ["dbc-abcdefgh-1234.cloud.databricks.com"]
198 | },
199 | "catalog": {
200 | "type": "string",
201 | "description": "The name of the Hive or Unity catalog"
202 | },
203 | "schema": {
204 | "type": "string",
205 | "description": "The schema name in the catalog"
206 | }
207 | },
208 | "additionalProperties": true,
209 | "required": [
210 | "type",
211 | "host",
212 | "catalog",
213 | "schema"
214 | ]
215 | },
216 | {
217 | "type": "object",
218 | "title": "PostgresServer",
219 | "properties": {
220 | "type": {
221 | "type": "string",
222 | "const": "postgres",
223 | "description": "The type of the data product technology that implements the data contract."
224 | },
225 | "host": {
226 | "type": "string",
227 | "description": "The host to the database server",
228 | "examples": ["localhost"]
229 | },
230 | "port": {
231 | "type": "integer",
232 | "description": "The port to the database server."
233 | },
234 | "database": {
235 | "type": "string",
236 | "description": "The name of the database.",
237 | "examples": ["postgres"]
238 | },
239 | "schema": {
240 | "type": "string",
241 | "description": "The name of the schema in the database.",
242 | "examples": ["public"]
243 | }
244 | },
245 | "additionalProperties": true,
246 | "required": [
247 | "type",
248 | "host",
249 | "port",
250 | "database",
251 | "schema"
252 | ]
253 | },
254 | {
255 | "type": "object",
256 | "title": "KafkaServer",
257 | "description": "Kafka Server",
258 | "properties": {
259 | "type": {
260 | "type": "string",
261 | "enum": [
262 | "kafka"
263 | ],
264 | "description": "The type of the data product technology that implements the data contract."
265 | },
266 | "host": {
267 | "type": "string",
268 | "description": "The bootstrap server of the kafka cluster."
269 | },
270 | "topic": {
271 | "type": "string",
272 | "description": "The topic name."
273 | },
274 | "format": {
275 | "type": "string",
276 | "description": "The format of the message. Examples: json, avro, protobuf. Default: json.",
277 | "default": "json"
278 | }
279 | },
280 | "additionalProperties": true,
281 | "required": [
282 | "type",
283 | "host",
284 | "topic"
285 | ]
286 | },
287 | {
288 | "type": "object",
289 | "title": "PubSubServer",
290 | "properties": {
291 | "type": {
292 | "type": "string",
293 | "enum": [
294 | "pubsub"
295 | ],
296 | "description": "The type of the data product technology that implements the data contract."
297 | },
298 | "project": {
299 | "type": "string",
300 | "description": "The GCP project name."
301 | },
302 | "topic": {
303 | "type": "string",
304 | "description": "The topic name."
305 | }
306 | },
307 | "additionalProperties": true,
308 | "required": [
309 | "type",
310 | "project",
311 | "topic"
312 | ]
313 | },
314 | {
315 | "type": "object",
316 | "title": "LocalServer",
317 | "properties": {
318 | "type": {
319 | "type": "string",
320 | "enum": [
321 | "local"
322 | ],
323 | "description": "The type of the data product technology that implements the data contract."
324 | },
325 | "path": {
326 | "type": "string",
327 | "description": "The relative or absolute path to the data file(s).",
328 | "examples": [
329 | "./folder/data.parquet",
330 | "./folder/*.parquet"
331 | ]
332 | },
333 | "format": {
334 | "type": "string",
335 | "description": "The format of the file(s)",
336 | "examples": ["json", "parquet", "csv"]
337 | }
338 | },
339 | "additionalProperties": true,
340 | "required": [
341 | "type",
342 | "path",
343 | "format"
344 | ]
345 | }
346 |
347 | ]
348 | },
349 | "description": "Information about the servers."
350 | },
351 | "terms": {
352 | "type": "object",
353 | "description": "The terms and conditions of the data contract.",
354 | "properties": {
355 | "usage": {
356 | "type": "string",
357 | "description": "The usage describes the way the data is expected to be used. Can contain business and technical information."
358 | },
359 | "limitations": {
360 | "type": "string",
361 | "description": "The limitations describe the restrictions on how the data can be used, can be technical or restrictions on what the data may not be used for."
362 | },
363 | "billing": {
364 | "type": "string",
365 | "description": "The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use."
366 | },
367 | "noticePeriod": {
368 | "type": "string",
369 | "description": "The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., 'P3M' for a period of three months."
370 | }
371 | }
372 | },
373 | "models": {
374 | "description": "Specifies the logical data model. Use the model's name (e.g., the table name) as the key.",
375 | "type": "object",
376 | "minProperties": 1,
377 | "propertyNames": {
378 | "pattern": "^[a-zA-Z0-9_-]+$"
379 | },
380 | "additionalProperties": {
381 | "type": "object",
382 | "title": "Model",
383 | "properties": {
384 | "description": {
385 | "type": "string"
386 | },
387 | "type": {
388 | "description": "The type of the model. Examples: table, view, object. Default: table.",
389 | "type": "string",
390 | "title": "ModelType",
391 | "default": "table",
392 | "enum": [
393 | "table",
394 | "view",
395 | "object"
396 | ]
397 | },
398 | "fields": {
399 | "description": "Specifies a field in the data model. Use the field name (e.g., the column name) as the key.",
400 | "type": "object",
401 | "additionalProperties": {
402 | "type": "object",
403 | "title": "Field",
404 | "properties": {
405 | "description": {
406 | "type": "string",
407 | "description": "An optional string describing the semantics of the data in this field."
408 | },
409 | "type": {
410 | "type": "string",
411 | "title": "FieldType",
412 | "description": "The logical data type of the field.",
413 | "enum": [
414 | "number",
415 | "decimal",
416 | "numeric",
417 | "int",
418 | "integer",
419 | "long",
420 | "bigint",
421 | "float",
422 | "double",
423 | "string",
424 | "text",
425 | "varchar",
426 | "boolean",
427 | "timestamp",
428 | "timestamp_tz",
429 | "timestamp_ntz",
430 | "date",
431 | "array",
432 | "object",
433 | "record",
434 | "struct",
435 | "bytes",
436 | "null"
437 | ]
438 | },
439 | "required": {
440 | "type": "boolean",
441 | "default": false,
442 | "description": "An indication, if this field must contain a value and may not be null."
443 | },
444 | "primary": {
445 | "type": "boolean",
446 | "default": false,
447 | "description": "If this field is a primary key."
448 | },
449 | "unique": {
450 | "type": "boolean",
451 | "default": false,
452 | "description": "An indication, if the value must be unique within the model."
453 | },
454 | "enum": {
455 | "type": "array",
456 | "items": {
457 | "type": "string"
458 | },
459 | "uniqueItems": true,
460 | "description": "A value must be equal to one of the elements in this array value. Only evaluated if the value is not null."
461 | },
462 | "minLength": {
463 | "type": "number",
464 | "description": "The length of a value must be greater than, or equal to, this number. Only applies to string types."
465 | },
466 | "maxLength": {
467 | "type": "number",
468 | "description": "The length of a value must be less than, or equal to, this number. Only applies to string types."
469 | },
470 | "format": {
471 | "type": "string",
472 | "description": "A specific format the value must comply with (e.g., 'email', 'uri', 'uuid')."
473 | },
474 | "pattern": {
475 | "type": "string",
476 | "description": "A regular expression the value must match. Only applies to string types."
477 | },
478 | "minimum": {
479 | "type": "number",
480 | "description": "A value of a number must be greater than, or equal to, the value of this. Only evaluated if the value is not null. Only applies to numeric values."
481 | },
482 | "exclusiveMinimum": {
483 | "type": "number",
484 | "description": "A value of a number must be greater than the value of this. Only evaluated if the value is not null. Only applies to numeric values."
485 | },
486 | "maximum": {
487 | "type": "number",
488 | "description": "A value of a number must be less than, or equal to, the value of this. Only evaluated if the value is not null. Only applies to numeric values."
489 | },
490 | "exclusiveMaximum": {
491 | "type": "number",
492 | "description": "A value of a number must be less than the value of this. Only evaluated if the value is not null. Only applies to numeric values."
493 | },
494 | "example": {
495 | "type": "string",
496 | "description": "An example value for this field."
497 | },
498 | "pii": {
499 | "type": "boolean",
500 | "description": "Indicates whether this field contains Personally Identifiable Information (PII)."
501 | },
502 | "classification": {
503 | "type": "string",
504 | "description": "The data class defining the sensitivity level for this field, according to the organization's classification scheme.",
505 | "examples": [
506 | "sensitive",
507 | "restricted",
508 | "internal",
509 | "public"
510 | ]
511 | },
512 | "tags": {
513 | "type": "array",
514 | "items": {
515 | "type": "string"
516 | },
517 | "description": "Custom metadata to provide additional context."
518 | },
519 | "$ref": {
520 | "type": "string",
521 | "description": "A reference URI to a definition in the specification, internally or externally. Properties will be inherited from the definition."
522 | }
523 | }
524 | }
525 | }
526 | }
527 | }
528 | },
529 | "definitions": {
530 | "description": "Clear and concise explanations of syntax, semantics, and classification of business objects in a given domain.",
531 | "type": "object",
532 | "propertyNames": {
533 | "pattern": "^[a-zA-Z0-9_-]+$"
534 | },
535 | "additionalProperties": {
536 | "type": "object",
537 | "title": "Definition",
538 | "properties": {
539 | "domain": {
540 | "type": "string",
541 | "description": "The domain in which this definition is valid.",
542 | "default": "global"
543 | },
544 | "name": {
545 | "type": "string",
546 | "description": "The technical name of this definition."
547 | },
548 | "title": {
549 | "type": "string",
550 | "description": "The business name of this definition."
551 | },
552 | "description": {
553 | "type": "string",
554 | "description": "Clear and concise explanations related to the domain."
555 | },
556 | "type": {
557 | "type": "string",
558 | "description": "The logical data type."
559 | },
560 | "minLength": {
561 | "type": "number",
562 | "description": "A value must be greater than or equal to this value. Applies only to string types."
563 | },
564 | "maxLength": {
565 | "type": "number",
566 | "description": "A value must be less than or equal to this value. Applies only to string types."
567 | },
568 | "format": {
569 | "type": "string",
570 | "description": "Specific format requirements for the value (e.g., 'email', 'uri', 'uuid')."
571 | },
572 | "pattern": {
573 | "type": "string",
574 | "description": "A regular expression pattern the value must match. Applies only to string types."
575 | },
576 | "example": {
577 | "type": "string",
578 | "description": "An example value."
579 | },
580 | "pii": {
581 | "type": "boolean",
582 | "description": "Indicates if the field contains Personally Identifiable Information (PII)."
583 | },
584 | "classification": {
585 | "type": "string",
586 | "description": "The data class defining the sensitivity level for this field."
587 | },
588 | "tags": {
589 | "type": "array",
590 | "items": {
591 | "type": "string"
592 | },
593 | "description": "Custom metadata to provide additional context."
594 | }
595 | },
596 | "required": [
597 | "name",
598 | "type"
599 | ]
600 | }
601 | },
602 | "schema": {
603 | "type": "object",
604 | "properties": {
605 | "type": {
606 | "type": "string",
607 | "title": "SchemaType",
608 | "enum": [
609 | "dbt",
610 | "bigquery",
611 | "json-schema",
612 | "sql-ddl",
613 | "avro",
614 | "protobuf",
615 | "custom"
616 | ],
617 | "description": "The type of the schema. Typical values are dbt, bigquery, json-schema, sql-ddl, avro, protobuf, custom."
618 | },
619 | "specification": {
620 | "oneOf": [
621 | {
622 | "type": "string",
623 | "description": "The specification of the schema as a string."
624 | },
625 | {
626 | "type": "object",
627 | "description": "The specification of the schema as an object."
628 | }
629 | ]
630 | }
631 | },
632 | "required": [
633 | "type",
634 | "specification"
635 | ],
636 | "description": "The schema of the data contract describes the syntax and semantics of provided data sets. It supports different schema types."
637 | },
638 | "examples": {
639 | "type": "array",
640 | "items": {
641 | "type": "object",
642 | "properties": {
643 | "type": {
644 | "type": "string",
645 | "title": "ExampleType",
646 | "enum": [
647 | "csv",
648 | "json",
649 | "yaml",
650 | "custom"
651 | ],
652 | "description": "The type of the example data. Well-known types are csv, json, yaml, custom."
653 | },
654 | "description": {
655 | "type": "string",
656 | "description": "An optional string describing the example."
657 | },
658 | "model": {
659 | "type": "string",
660 | "description": "The reference to the model in the schema, e.g., a table name."
661 | },
662 | "data": {
663 | "oneOf": [{
664 | "type": "string",
665 | "description": "Example data for this model."
666 | },{
667 | "type": "array",
668 | "description": "Example data for this model in a structured format. Use this for type json or yaml."
669 | }]
670 | }
671 | },
672 | "required": [
673 | "type",
674 | "data"
675 | ]
676 | },
677 | "description": "The Examples Object is an array of Example Objects."
678 | },
679 | "quality": {
680 | "type": "object",
681 | "properties": {
682 | "type": {
683 | "type": "string",
684 | "title": "QualityType",
685 | "enum": [
686 | "SodaCL",
687 | "montecarlo",
688 | "custom"
689 | ],
690 | "description": "The type of the quality check. Typical values are SodaCL, montecarlo, custom."
691 | },
692 | "specification": {
693 | "oneOf": [
694 | {
695 | "type": "string",
696 | "description": "The specification of the quality attributes as a string."
697 | },
698 | {
699 | "type": "object",
700 | "description": "The specification of the quality attributes as an object."
701 | }
702 | ]
703 | }
704 | },
705 | "required": [
706 | "type",
707 | "specification"
708 | ],
709 | "description": "The quality object contains quality attributes and checks."
710 | }
711 | },
712 | "required": [
713 | "dataContractSpecification",
714 | "id",
715 | "info"
716 | ]
717 | }
718 |
--------------------------------------------------------------------------------
/versions/0.9.3/datacontract.init.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.3
2 | id: my-data-contract-id
3 | info:
4 | title: My Data Contract
5 | version: 0.0.1
6 | # description:
7 | # owner:
8 | # contact:
9 | # name:
10 | # url:
11 | # email:
12 |
13 |
14 | ### servers
15 |
16 | #servers:
17 | # production:
18 | # type: s3
19 | # location: s3://
20 | # format: parquet
21 | # delimiter: new_line
22 |
23 | ### terms
24 |
25 | #terms:
26 | # usage:
27 | # limitations:
28 | # billing:
29 | # noticePeriod:
30 |
31 |
32 | ### models
33 |
34 | # models:
35 | # my_model:
36 | # description:
37 | # type:
38 | # fields:
39 | # my_field:
40 | # type:
41 | # description:
42 |
43 |
44 | ### definitions
45 |
46 | # definitions:
47 | # my_field:
48 | # domain:
49 | # name:
50 | # title:
51 | # type:
52 | # description:
53 | # example:
54 | # pii:
55 | # classification:
56 |
57 |
58 | ### examples
59 |
60 | #examples:
61 | # - type: csv
62 | # model: my_model
63 | # data: |-
64 | # id,timestamp,amount
65 | # "1001","2023-09-09T08:30:00Z",2500
66 | # "1002","2023-09-08T15:45:00Z",1800
67 |
68 | ### servicelevels
69 |
70 | #servicelevels:
71 | # availability:
72 | # description: The server is available during support hours
73 | # percentage: 99.9%
74 | # retention:
75 | # description: Data is retained for one year
76 | # period: P1Y
77 | # unlimited: false
78 | # latency:
79 | # description: Data is available within 25 hours after the order was placed
80 | # threshold: 25h
81 | # sourceTimestampField: orders.order_timestamp
82 | # processedTimestampField: orders.processed_timestamp
83 | # freshness:
84 | # description: The age of the youngest row in a table.
85 | # threshold: 25h
86 | # timestampField: orders.order_timestamp
87 | # frequency:
88 | # description: Data is delivered once a day
89 | # type: batch # or streaming
90 | # interval: daily # for batch: either interval or cron
91 | # cron: 0 0 * * * # for batch: either interval or cron
92 | # support:
93 | # description: The data is available during typical business hours at headquarters
94 | # time: 9am to 5pm in EST on business days
95 | # responseTime: 1h
96 | # backup:
97 | # description: Data is backed up once a week, every Sunday at 0:00 UTC.
98 | # interval: weekly
99 | # cron: 0 0 * * 0
100 | # recoveryTime: 24 hours
101 | # recoveryPoint: 1 week
102 |
103 | ### quality
104 |
105 | #quality:
106 | # type: SodaCL
107 | # specification:
108 | # checks for my_model: |-
109 | # - duplicate_count(id) = 0
--------------------------------------------------------------------------------
/versions/0.9.3/definition.schema.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "http://json-schema.org/draft-07/schema#",
3 | "type": "object",
4 | "description": "Clear and concise explanations of syntax, semantics, and classification of business objects in a given domain.",
5 | "properties": {
6 | "domain": {
7 | "type": "string",
8 | "description": "The domain in which this definition is valid.",
9 | "default": "global"
10 | },
11 | "name": {
12 | "type": "string",
13 | "description": "The technical name of this definition."
14 | },
15 | "title": {
16 | "type": "string",
17 | "description": "The business name of this definition."
18 | },
19 | "description": {
20 | "type": "string",
21 | "description": "Clear and concise explanations related to the domain."
22 | },
23 | "type": {
24 | "type": "string",
25 | "description": "The logical data type."
26 | },
27 | "minLength": {
28 | "type": "integer",
29 | "description": "A value must be greater than or equal to this value. Applies only to string types."
30 | },
31 | "maxLength": {
32 | "type": "integer",
33 | "description": "A value must be less than or equal to this value. Applies only to string types."
34 | },
35 | "format": {
36 | "type": "string",
37 | "description": "Specific format requirements for the value (e.g., 'email', 'uri', 'uuid')."
38 | },
39 | "precision": {
40 | "type": "integer",
41 | "examples": [
42 | 38
43 | ],
44 | "description": "The maximum number of digits in a number. Only applies to numeric values. Defaults to 38."
45 | },
46 | "scale": {
47 | "type": "integer",
48 | "examples": [
49 | 0
50 | ],
51 | "description": "The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0."
52 | },
53 | "pattern": {
54 | "type": "string",
55 | "description": "A regular expression pattern the value must match. Applies only to string types."
56 | },
57 | "example": {
58 | "type": "string",
59 | "description": "An example value."
60 | },
61 | "pii": {
62 | "type": "boolean",
63 | "description": "Indicates if the field contains Personally Identifiable Information (PII)."
64 | },
65 | "classification": {
66 | "type": "string",
67 | "description": "The data class defining the sensitivity level for this field."
68 | },
69 | "tags": {
70 | "type": "array",
71 | "items": {
72 | "type": "string"
73 | },
74 | "description": "Custom metadata to provide additional context."
75 | },
76 | "links": {
77 | "type": "object",
78 | "description": "Links to external resources.",
79 | "minProperties": 1,
80 | "propertyNames": {
81 | "pattern": "^[a-zA-Z0-9_-]+$"
82 | },
83 | "additionalProperties": {
84 | "type": "string",
85 | "title": "Link",
86 | "description": "A URL to an external resource.",
87 | "format": "uri",
88 | "examples": [
89 | "https://example.com"
90 | ]
91 | }
92 | }
93 | },
94 | "required": [
95 | "name",
96 | "type"
97 | ]
98 | }
--------------------------------------------------------------------------------
/workshop.md:
--------------------------------------------------------------------------------
1 | # Data Contract Workshop
2 |
3 | Bring data producers and consumers together to define data contracts in a facilitated workshop.
4 |
5 | ## Goal
6 |
7 | A defined and agreed upon data contract between data producers and consumers.
8 |
9 | ## Participants
10 |
11 | - Facilitator
12 | - Neutral moderator and typist
13 | - Should know the data contract format in use ([Data Contract Specification](https://datacontract.com) or [ODCS](https://bitol-io.github.io/open-data-contract-standard/latest/)) and its tools well
14 | - Get the [authors of the Data Contract Specification](https://datacontract.com/#authors) as facilitators for your workshop.
15 | - Data producer
16 | - Product Owner
17 | - Software Engineers
18 | - Data consumers
19 | - Product Owner
20 | - Data Engineers / Scientists / Analysts
21 |
22 | Recommendation: keep the group small (not more than 5 people)
23 |
24 | ## Settings
25 |
26 | - Show the data contract on the screen for the whole workshop (projector, screen share, ...)
27 | - Facilitator is the typist
28 | - Facilitator is moderator
29 | - Data Producer and Data Consumers discuss and give commands to the facilitator
30 |
31 | ## Guidelines for the Data Contract Specification
32 |
33 | ### Recommended Order of Completion (Data Contract Specification)
34 |
35 | 1. Info (get the context)
36 | 2. Examples (example-driven facilitation)
37 | 3. Model (you will spend most of your time here)
38 | - Use the [Data Contract CLI](https://cli.datacontract.com) to test the model against the previously created examples:\\
39 | `datacontract test --examples datacontract.yaml`
40 | 4. Quality
41 | 5. Terms
42 | 6. Servers (if already applicable)
43 | - Start with a "local" server with actual, real data you downloaded
44 | - Use the [Data Contract CLI](https://cli.datacontract.com) to test the model against the actual data on a specific server:\\
45 | `datacontract test datacontract.yaml`
46 | - Switch to the actual remote server, if applicable
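
The example-driven loop in steps 2 and 3 (write examples first, then shape the model until the examples fit) can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Data Contract CLI is implemented: `MODEL_FIELDS`, `EXAMPLE_CSV`, and `check_examples` are hypothetical names, the field types are assumed, and only required-ness and integer types are checked.

```python
import csv
import io

# Hypothetical model; the field names mirror the CSV header of the starter template.
MODEL_FIELDS = {
    "id": {"type": "string", "required": True},
    "timestamp": {"type": "timestamp", "required": True},
    "amount": {"type": "integer", "required": False},
}

# Example data as it would appear under `examples[].data` for type csv.
EXAMPLE_CSV = '''id,timestamp,amount
"1001","2023-09-09T08:30:00Z",2500
"1002","2023-09-08T15:45:00Z",1800
'''


def check_examples(fields, csv_text):
    """Return a list of human-readable problems; an empty list means the examples fit."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns that appear in the examples but not in the model usually mean the model is incomplete.
    for column in reader.fieldnames or []:
        if column not in fields:
            problems.append(f"column '{column}' is not defined in the model")
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for name, spec in fields.items():
            value = row.get(name)
            if value is None or value == "":
                if spec.get("required", False):
                    problems.append(f"line {line_no}: required field '{name}' is empty")
                continue
            if spec["type"] in ("int", "integer", "long", "bigint"):
                try:
                    int(value)
                except ValueError:
                    problems.append(f"line {line_no}: '{name}' is not an integer: {value!r}")
    return problems


print(check_examples(MODEL_FIELDS, EXAMPLE_CSV))  # prints [] for the data above
```

During the workshop, a non-empty result is a prompt for discussion: either the example is wrong or the model needs to change.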
47 |
48 | ### Tooling (Data Contract Specification)
49 |
50 | - Open the [starter template](https://datacontract.com/datacontract.init.yaml) in the [Data Contract Editor](https://editor.datacontract.com) and get going. If you lack an experienced facilitator, ignore any validation errors and warnings within the editor.
51 | - Use the [Data Contract Editor](https://editor.datacontract.com) to share the results of the workshop afterward with the participants and other stakeholders.
52 | - Use the [Data Contract CLI](https://cli.datacontract.com) to validate the data contract after the workshop.
53 | - Use the [Data Mesh Manager](https://www.datamesh-manager.com) to publish the data contract and have it in a central place
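
Before running the CLI, a quick completeness check can catch the most common omission after a workshop: missing required keys. The sketch below assumes the contract has already been parsed into a Python dict (for example by a YAML loader) and uses the `required` lists from `datacontract.schema.json` (top level: `dataContractSpecification`, `id`, `info`; inside `info`: `title`, `version`). The function name `missing_required` is hypothetical, and the check is no substitute for validating against the full schema.

```python
# Required keys as listed in the schema's `required` arrays (top level and `info`).
REQUIRED_TOP_LEVEL = ("dataContractSpecification", "id", "info")
REQUIRED_INFO = ("title", "version")


def missing_required(contract):
    """Return dotted paths of required keys that are absent from a parsed contract."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in contract]
    info = contract.get("info")
    if isinstance(info, dict):
        missing += [f"info.{key}" for key in REQUIRED_INFO if key not in info]
    return missing


# The starter template, already parsed into a dict.
starter = {
    "dataContractSpecification": "0.9.3",
    "id": "my-data-contract-id",
    "info": {"title": "My Data Contract", "version": "0.0.1"},
}

print(missing_required(starter))
print(missing_required({"id": "orders", "info": {"title": "Orders"}}))
```

The first call reports nothing missing; the second reports `dataContractSpecification` and `info.version`.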
54 |
55 | ## Guidelines for ODCS
56 |
57 | We recommend using the [Excel template](https://github.com/datacontract/open-data-contract-standard-excel-template) for workshops: it is easier to work with in such a setting because it comes with a nice visualization.
58 |
59 | ### Recommended Order of Completion (ODCS)
60 |
61 | 1. Fundamentals (get the context)
62 | - **[Fill in the fundamentals](https://bitol-io.github.io/open-data-contract-standard/latest/#fundamentals)** consisting of id, name, version, status, and description.
63 | 2. Schema (you will spend most of your time here)
64 | - **[Fill in the schemas](https://bitol-io.github.io/open-data-contract-standard/latest/#schema)** (tables) and their properties (columns), starting with just their name and logicalType.
65 | - After that, add information like `description`, `classification`, ...
66 | - Use tags or customProperties to add additional metadata where ODCS offers no direct support
67 | 3. Quality
68 | - **[Add quality checks](https://bitol-io.github.io/open-data-contract-standard/latest/#data-quality)** at the schema or the property level. Start with quality checks of type text first to capture the requirements.
69 | - OPTIONAL: Convert the text-based requirements into automated SQL-based quality checks
70 | 4. SLAs
71 | - **[Add SLAs](https://bitol-io.github.io/open-data-contract-standard/latest/#service-level-agreement-sla)** that the data provider guarantees towards all data consumers.
72 | 5. Team & Support
73 | - **[Add the team members](https://bitol-io.github.io/open-data-contract-standard/latest/#team)** so that the data consumer knows who is part of the team that owns the data protected by the data contracts.
74 | - **[Add a support channel](https://bitol-io.github.io/open-data-contract-standard/latest/#support-and-communication-channels)** so (potential) data consumers know how to get support and reach the data owners.
75 | 6. Servers (if already applicable)
76 | - **[Add the server information](https://bitol-io.github.io/open-data-contract-standard/latest/#infrastructure-and-servers)** on where the data is available
77 | - Use the [Data Contract CLI](https://cli.datacontract.com) to test the schema against the actual data on a specific server:\\
78 | `datacontract test datacontract.yaml`
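
To make the optional part of step 3 concrete, here is a sketch of turning one text requirement ("order ids are unique") into an automated SQL check. Everything in it is an assumption for illustration: the `orders` table, its columns, and the rows are made up, SQLite stands in for the real server, and the query is plain SQL rather than the exact syntax ODCS expects for a quality check.

```python
import sqlite3

# Text requirement captured in the workshop (hypothetical): "Order ids are unique."
DUPLICATE_IDS_SQL = """
SELECT COUNT(*) FROM (
    SELECT id FROM orders GROUP BY id HAVING COUNT(*) > 1
)
"""


def count_duplicate_ids(connection):
    """Run the check query and return the number of ids that occur more than once."""
    return connection.execute(DUPLICATE_IDS_SQL).fetchone()[0]


# In-memory stand-in for the real server; table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("1001", 2500), ("1002", 1800), ("1002", 1800)],
)

duplicates = count_duplicate_ids(conn)
print("check passed" if duplicates == 0 else f"check failed: {duplicates} duplicate id(s)")
```

Keeping the original text next to the query makes it easier for participants to confirm that the automated check still expresses what was agreed.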
79 |
80 | ### Tooling (ODCS)
81 |
82 | - Use the [Excel template](https://github.com/datacontract/open-data-contract-standard-excel-template) for the workshop
83 | - Use the [Data Contract CLI](https://cli.datacontract.com) to validate the data contract after the workshop.
84 | - Use the [Data Mesh Manager](https://www.datamesh-manager.com) to publish the data contract and have it in a central place
85 |
86 | ## Related
87 |
88 | - This data contract workshop could be a follow-up to a data product design workshop using the [Data Product Canvas](https://www.datamesh-architecture.com/data-product-canvas), making the offered contract at the output port of the designed data product more concrete.
89 |
--------------------------------------------------------------------------------