├── sources
│   ├── INSTRUCTIONS.md
│   ├── datacontract.com
│   │   ├── CHANGELOG.md
│   │   ├── workshop.md
│   │   ├── datacontract.init.yaml
│   │   ├── definition.schema.json
│   │   ├── datacontract.yaml
│   │   └── datacontract.schema.json
│   └── cli.datacontract.com
│       ├── CHANGELOG.md
│       └── README.md
├── CNAME
├── images
│   ├── favicon.png
│   ├── datacontract-gpt-browser.png
│   ├── datacontract-gpt-social-media.png
│   └── supported-by-innoq--petrol-apricot.svg
├── _config.yml
├── example_shipment.yaml
├── _layouts
│   └── default.html
├── example_datacontract.yaml
└── README.md
/sources/INSTRUCTIONS.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/CNAME:
--------------------------------------------------------------------------------
1 | gpt.datacontract.com
--------------------------------------------------------------------------------
/images/favicon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-gpt/main/images/favicon.png
--------------------------------------------------------------------------------
/images/datacontract-gpt-browser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-gpt/main/images/datacontract-gpt-browser.png
--------------------------------------------------------------------------------
/images/datacontract-gpt-social-media.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacontract/datacontract-gpt/main/images/datacontract-gpt-social-media.png
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | plugins:
2 | - jekyll-sitemap
3 | name: Data Contract GPT
4 | title: null
5 | description: Data Contract GPT for the Data Contract Specification to interactively create and modify data contracts.
6 |
--------------------------------------------------------------------------------
/example_shipment.yaml:
--------------------------------------------------------------------------------
1 | shipment_id: "123e4567-e89b-12d3-a456-426614174000"
2 | origin: "New York, NY"
3 | destination: "Los Angeles, CA"
4 | shipment_date: "2024-01-01T10:00:00Z"
5 | delivery_date: "2024-01-05T15:00:00Z"
6 | status: "delivered"
--------------------------------------------------------------------------------
/sources/datacontract.com/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # Changelog
2 |
3 | All notable changes to this project will be documented in this file.
4 |
5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7 |
8 | ## [Unreleased]
9 |
10 | Please note: while the major version is zero (0.y.z), anything MAY change at any time.
11 | The public API SHOULD NOT be considered stable.
12 |
13 | ### Added
14 | - Support for server-specific data types as config map ([#63](https://github.com/datacontract/datacontract-specification/issues/63))
15 | - AWS Glue Catalog server support
16 | - sftp server support
17 | - info.status field
18 | - oracle server support
19 | - field.title attribute
20 | - model.title attribute
21 | - AWS Kinesis Data Streams server support
22 |
23 | ## [0.9.3] - 2024-03-06
24 |
25 | ### Added
26 |
27 | - Service levels as a top level `servicelevels` element
28 | - pubsub server support
29 | - primary key and relationship support via `field.primary` and `field.references` attributes
30 | - databricks server support improved
31 |
32 | ## [0.9.2] - 2024-01-04
33 |
34 | ### Added
35 |
36 | - Format and validation attributes to fields in models and definitions
37 | - Postgres support
38 | - Databricks support
39 |
40 | ## [0.9.1] - 2023-11-19
41 |
42 | ### Added
43 |
44 | - A logical data model (#13), mainly to simplify editor support with a defined schema, make breaking changes easier to detect, and improve Databricks support.
45 | - Definitions (#14) for reusable semantic definitions within one data contract or across data contracts.
46 |
47 | ### Removed
48 |
49 | - Property `info.dataProduct` as data products should define which data contracts they implement.
50 | - Property `info.outputPort` as data products should define which data contracts they implement.
51 |
52 | These removals are not considered breaking changes, as these attributes are now treated as specification extensions.
53 |
54 | ## [0.9.0] - 2023-09-12
55 |
56 | First public release.
57 |
--------------------------------------------------------------------------------
/sources/datacontract.com/workshop.md:
--------------------------------------------------------------------------------
1 | # Data Contract Workshop
2 |
3 | Bring data producers and consumers together to define data contracts in a facilitated workshop.
4 |
5 | ## Goal
6 |
7 | A defined and agreed upon data contract between data producers and consumers.
8 |
9 | ## Participants
10 |
11 | - Facilitator
12 |   - Neutral moderator and typist
13 |   - Should know the [Data Contract Specification](https://datacontract.com) and its tools well
14 |   - Get the [authors of the Data Contract Specification](https://datacontract.com/#authors) as facilitators for your workshop.
15 | - Data producer
16 |   - Product Owner
17 |   - Software Engineers
18 | - Data consumers
19 |   - Product Owner
20 |   - Data Engineers / Scientists / Analysts
21 |
22 | Recommendation: keep the group small (no more than 5 people).
23 |
24 | ## Settings
25 |
26 | - Show the data contract on the screen for the whole workshop (projector, screen share, ...)
27 | - The facilitator is the typist
28 | - The facilitator is the moderator
29 | - The data producer and data consumers discuss and dictate changes to the facilitator
30 |
31 | ## Recommended Order of Completion
32 |
33 | 1. Info (get the context)
34 | 2. Examples (example-driven facilitation)
35 | 3. Model (you will spend most of your time here)
36 |    - Use the [Data Contract CLI](https://cli.datacontract.com) to test the model against the previously created examples:\
37 |      `datacontract test --examples datacontract.yaml`
38 | 4. Quality
39 | 5. Terms
40 | 6. Servers (if already applicable)
41 |    - Start with a "local" server with actual, real data you downloaded
42 |    - Use the [Data Contract CLI](https://cli.datacontract.com) to test the model against the actual data on a specific server:\
43 |      `datacontract test datacontract.yaml`
44 |    - Switch to the actual remote server, if applicable
45 |
46 | ## Tooling
47 |
48 | - Open the [starter template](https://datacontract.com/datacontract.init.yaml) in the [Data Contract Studio](https://studio.datacontract.com) and get going. If you lack an experienced facilitator, ignore any validation errors and warnings within the studio.
49 | - Use the [Data Contract Studio](https://studio.datacontract.com) to share the results of the workshop afterward with the participants and other stakeholders.
50 | - Use the [Data Contract CLI](https://cli.datacontract.com) to validate the data contract after the workshop.
51 |
52 | ## Related
53 |
54 | - This data contract workshop could be a follow-up to a data product design workshop using the [Data Product Canvas](https://www.datamesh-architecture.com/data-product-canvas), making the offered contract at the output port of the designed data product more concrete.
55 |
--------------------------------------------------------------------------------
/sources/datacontract.com/datacontract.init.yaml:
--------------------------------------------------------------------------------
1 | dataContractSpecification: 0.9.3
2 | id: my-data-contract-id
3 | info:
4 |   title: My Data Contract
5 |   version: 0.0.1
6 |   # description:
7 |   # owner:
8 |   # contact:
9 |   #   name:
10 |   #   url:
11 |   #   email:
12 | 
13 | 
14 | ### servers
15 | 
16 | #servers:
17 | #  production:
18 | #    type: s3
19 | #    location: s3://
20 | #    format: parquet
21 | #    delimiter: new_line
22 | 
23 | ### terms
24 | 
25 | #terms:
26 | #  usage:
27 | #  limitations:
28 | #  billing:
29 | #  noticePeriod:
30 | 
31 | 
32 | ### models
33 | 
34 | # models:
35 | #   my_model:
36 | #     description:
37 | #     type:
38 | #     fields:
39 | #       my_field:
40 | #         type:
41 | #         description:
42 | 
43 | 
44 | ### definitions
45 | 
46 | # definitions:
47 | #   my_field:
48 | #     domain:
49 | #     name:
50 | #     title:
51 | #     type:
52 | #     description:
53 | #     example:
54 | #     pii:
55 | #     classification:
56 | 
57 | 
58 | ### examples
59 | 
60 | #examples:
61 | #  - type: csv
62 | #    model: my_model
63 | #    data: |-
64 | #      id,timestamp,amount
65 | #      "1001","2023-09-09T08:30:00Z",2500
66 | #      "1002","2023-09-08T15:45:00Z",1800
67 | 
68 | ### servicelevels
69 | 
70 | #servicelevels:
71 | #  availability:
72 | #    description: The server is available during support hours
73 | #    percentage: 99.9%
74 | #  retention:
75 | #    description: Data is retained for one year
76 | #    period: P1Y
77 | #    unlimited: false
78 | #  latency:
79 | #    description: Data is available within 25 hours after the order was placed
80 | #    threshold: 25h
81 | #    sourceTimestampField: orders.order_timestamp
82 | #    processedTimestampField: orders.processed_timestamp
83 | #  freshness:
84 | #    description: The age of the youngest row in a table.
85 | #    threshold: 25h
86 | #    timestampField: orders.order_timestamp
87 | #  frequency:
88 | #    description: Data is delivered once a day
89 | #    type: batch # or streaming
90 | #    interval: daily # for batch, either interval or cron
91 | #    cron: 0 0 * * * # for batch, either interval or cron
92 | #  support:
93 | #    description: The data is available during typical business hours at headquarters
94 | #    time: 9am to 5pm in EST on business days
95 | #    responseTime: 1h
96 | #  backup:
97 | #    description: Data is backed up once a week, every Sunday at 0:00 UTC.
98 | #    interval: weekly
99 | #    cron: 0 0 * * 0
100 | #    recoveryTime: 24 hours
101 | #    recoveryPoint: 1 week
102 | 
103 | ### quality
104 | 
105 | #quality:
106 | #  type: SodaCL
107 | #  specification:
108 | #    checks for my_model: |-
109 | #      - duplicate_count(id) = 0
110 | 
--------------------------------------------------------------------------------
/sources/datacontract.com/definition.schema.json:
--------------------------------------------------------------------------------
1 | {
2 |   "$schema": "http://json-schema.org/draft-07/schema#",
3 |   "type": "object",
4 |   "description": "Clear and concise explanations of syntax, semantics, and classification of business objects in a given domain.",
5 |   "properties": {
6 |     "domain": {
7 |       "type": "string",
8 |       "description": "The domain in which this definition is valid.",
9 |       "default": "global"
10 |     },
11 |     "name": {
12 |       "type": "string",
13 |       "description": "The technical name of this definition."
14 |     },
15 |     "title": {
16 |       "type": "string",
17 |       "description": "The business name of this definition."
18 |     },
19 |     "description": {
20 |       "type": "string",
21 |       "description": "Clear and concise explanations related to the domain."
22 |     },
23 |     "type": {
24 |       "type": "string",
25 |       "description": "The logical data type."
26 |     },
27 |     "minLength": {
28 |       "type": "integer",
29 |       "description": "A value must be greater than or equal to this value. Applies only to string types."
30 |     },
31 |     "maxLength": {
32 |       "type": "integer",
33 |       "description": "A value must be less than or equal to this value. Applies only to string types."
34 |     },
35 |     "format": {
36 |       "type": "string",
37 |       "description": "Specific format requirements for the value (e.g., 'email', 'uri', 'uuid')."
38 |     },
39 |     "precision": {
40 |       "type": "integer",
41 |       "examples": [
42 |         38
43 |       ],
44 |       "description": "The maximum number of digits in a number. Only applies to numeric values. Defaults to 38."
45 |     },
46 |     "scale": {
47 |       "type": "integer",
48 |       "examples": [
49 |         0
50 |       ],
51 |       "description": "The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0."
52 |     },
53 |     "pattern": {
54 |       "type": "string",
55 |       "description": "A regular expression pattern the value must match. Applies only to string types."
56 |     },
57 |     "example": {
58 |       "type": "string",
59 |       "description": "An example value."
60 |     },
61 |     "pii": {
62 |       "type": "boolean",
63 |       "description": "Indicates if the field contains Personal Identifiable Information (PII)."
64 |     },
65 |     "classification": {
66 |       "type": "string",
67 |       "description": "The data class defining the sensitivity level for this field."
68 |     },
69 |     "tags": {
70 |       "type": "array",
71 |       "items": {
72 |         "type": "string"
73 |       },
74 |       "description": "Custom metadata to provide additional context."
75 |     }
76 |   },
77 |   "required": [
78 |     "name",
79 |     "type"
80 |   ]
81 | }
82 | 
--------------------------------------------------------------------------------
/_layouts/default.html:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
--------------------------------------------------------------------------------
/sources/cli.datacontract.com/README.md:
--------------------------------------------------------------------------------
10 | 
11 | The `datacontract` CLI is an open source command-line tool for working with [Data Contracts](https://datacontract.com/).
12 | It uses data contract YAML files to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
13 |
14 | 
15 |
16 |
17 | ## Getting started
18 |
19 | Let's look at this data contract:
20 | [https://datacontract.com/examples/orders-latest/datacontract.yaml](https://datacontract.com/examples/orders-latest/datacontract.yaml)
21 |
22 | We have a _servers_ section with endpoint details for the S3 bucket, _models_ describing the structure of the data, and _servicelevels_ and _quality_ attributes describing the expected freshness and number of rows.
23 |
24 | This data contract contains all the information needed to connect to S3 and check that the actual data meets the defined schema and quality requirements. We can use this information to test whether the actual data set in S3 is compliant with the data contract.
25 |
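In abbreviated form, the contract looks roughly like this (a trimmed sketch for illustration; see the link above for the full file):

```yaml
dataContractSpecification: 0.9.3
id: urn:datacontract:checkout:orders-latest
info:
  title: Orders Latest
  version: 1.0.0
servers:
  production:
    type: s3
    location: s3://datacontract-example-orders-latest/data/{model}/*.json # illustrative bucket location
    format: json
    delimiter: new_line
models:
  orders:
    type: table
    fields:
      order_id:
        type: text
        required: true
        unique: true
```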
26 | Let's use [pip](https://pip.pypa.io/en/stable/getting-started/) to install the CLI (or use the [Docker image](#docker), if you prefer).
27 | ```bash
28 | $ python3 -m pip install datacontract-cli
29 | ```
30 |
31 | We run the tests:
32 |
33 | ```bash
34 | $ datacontract test https://datacontract.com/examples/orders-latest/datacontract.yaml
35 |
36 | # returns:
37 | Testing https://datacontract.com/examples/orders-latest/datacontract.yaml
38 | ╭────────┬─────────────────────────────────────────────────────────────────────┬───────────────────────────────┬─────────╮
39 | │ Result │ Check │ Field │ Details │
40 | ├────────┼─────────────────────────────────────────────────────────────────────┼───────────────────────────────┼─────────┤
41 | │ passed │ Check that JSON has valid schema │ orders │ │
42 | │ passed │ Check that JSON has valid schema │ line_items │ │
43 | │ passed │ Check that field order_id is present │ orders │ │
44 | │ passed │ Check that field order_timestamp is present │ orders │ │
45 | │ passed │ Check that field order_total is present │ orders │ │
46 | │ passed │ Check that field customer_id is present │ orders │ │
47 | │ passed │ Check that field customer_email_address is present │ orders │ │
48 | │ passed │ row_count >= 5000 │ orders │ │
49 | │ passed │ Check that required field order_id has no null values │ orders.order_id │ │
50 | │ passed │ Check that unique field order_id has no duplicate values │ orders.order_id │ │
51 | │ passed │ duplicate_count(order_id) = 0 │ orders.order_id │ │
52 | │ passed │ Check that required field order_timestamp has no null values │ orders.order_timestamp │ │
53 | │ passed │ freshness(order_timestamp) < 24h │ orders.order_timestamp │ │
54 | │ passed │ Check that required field order_total has no null values │ orders.order_total │ │
55 | │ passed │ Check that required field customer_email_address has no null values │ orders.customer_email_address │ │
56 | │ passed │ Check that field lines_item_id is present │ line_items │ │
57 | │ passed │ Check that field order_id is present │ line_items │ │
58 | │ passed │ Check that field sku is present │ line_items │ │
59 | │ passed │ values in (order_id) must exist in orders (order_id) │ line_items.order_id │ │
60 | │ passed │ row_count >= 5000 │ line_items │ │
61 | │ passed │ Check that required field lines_item_id has no null values │ line_items.lines_item_id │ │
62 | │ passed │ Check that unique field lines_item_id has no duplicate values │ line_items.lines_item_id │ │
63 | ╰────────┴─────────────────────────────────────────────────────────────────────┴───────────────────────────────┴─────────╯
64 | 🟢 data contract is valid. Run 22 checks. Took 6.739514 seconds.
65 | ```
66 |
67 | Voilà, the CLI tested that the _datacontract.yaml_ itself is valid, all records comply with the schema, and all quality attributes are met.
68 |
69 | We can also use the datacontract.yaml to export to many [formats](#export), e.g., to SQL:
70 |
71 | ```bash
72 | $ datacontract export --format sql https://datacontract.com/examples/orders-latest/datacontract.yaml
73 |
74 | # returns:
75 | -- Data Contract: urn:datacontract:checkout:orders-latest
76 | -- SQL Dialect: snowflake
77 | CREATE TABLE orders (
78 |   order_id TEXT not null primary key,
79 |   order_timestamp TIMESTAMP_TZ not null,
80 |   order_total NUMBER not null,
81 |   customer_id TEXT,
82 |   customer_email_address TEXT not null,
83 |   processed_timestamp TIMESTAMP_TZ not null
84 | );
85 | CREATE TABLE line_items (
86 |   lines_item_id TEXT not null primary key,
87 |   order_id TEXT,
88 |   sku TEXT
89 | );
90 | ```
91 |
92 | Or generate an HTML export:
93 |
94 | ```bash
95 | $ datacontract export --format html https://datacontract.com/examples/orders-latest/datacontract.yaml > datacontract.html
96 | ```
97 |
98 | which will create this [HTML export](https://datacontract.com/examples/orders-latest/datacontract.html).
99 |
100 | ## Usage
101 |
102 | ```bash
103 | # create a new data contract from example and write it to datacontract.yaml
104 | $ datacontract init datacontract.yaml
105 |
106 | # lint the datacontract.yaml
107 | $ datacontract lint datacontract.yaml
108 |
109 | # execute schema and quality checks
110 | $ datacontract test datacontract.yaml
111 |
112 | # execute schema and quality checks on the examples within the contract
113 | $ datacontract test --examples datacontract.yaml
114 |
115 | # export data contract as html (other formats: avro, dbt, dbt-sources, dbt-staging-sql, jsonschema, odcs, rdf, sql, sodacl, terraform, ...)
116 | $ datacontract export --format html datacontract.yaml > datacontract.html
117 |
118 | # import avro (other formats: sql, glue, bigquery...)
119 | $ datacontract import --format avro --source avro_schema.avsc
120 |
121 | # find differences between two data contracts
122 | $ datacontract diff datacontract-v1.yaml datacontract-v2.yaml
123 | 
124 | # find differences between two data contracts, categorized into error, warning, and info
125 | $ datacontract changelog datacontract-v1.yaml datacontract-v2.yaml
126 | 
127 | # fail the pipeline on breaking changes (uses changelog internally, showing only errors and warnings)
128 | $ datacontract breaking datacontract-v1.yaml datacontract-v2.yaml
129 | ```
130 |
131 | ## Programmatic (Python)
132 | ```python
133 | from datacontract.data_contract import DataContract
134 |
135 | data_contract = DataContract(data_contract_file="datacontract.yaml")
136 | run = data_contract.test()
137 | if not run.has_passed():
138 |     print("Data quality validation failed.")
139 |     # Abort pipeline, alert, or take corrective actions...
140 | ```
141 |
142 |
143 | ## Installation
144 |
145 | Choose the most appropriate installation method for your needs:
146 |
147 | ### pip
148 | Python 3.11 recommended.
149 | Python 3.12 support is available as a pre-release (release candidate for 0.9.3).
150 |
151 | ```bash
152 | python3 -m pip install datacontract-cli
153 | ```
154 |
155 | ### pipx
156 | pipx installs into an isolated environment.
157 | ```bash
158 | pipx install datacontract-cli
159 | ```
160 |
161 | ### Docker
162 |
163 | ```bash
164 | docker pull datacontract/cli
165 | docker run --rm -v ${PWD}:/home/datacontract datacontract/cli
166 | ```
167 |
168 | Or via an alias that automatically uses the latest version:
169 |
170 | ```bash
171 | alias datacontract='docker run --rm -v "${PWD}:/home/datacontract" datacontract/cli:latest'
172 | ```
173 |
174 | ## Documentation
175 |
176 | Commands
177 |
178 | - [init](#init)
179 | - [lint](#lint)
180 | - [test](#test)
181 | - [export](#export)
182 | - [import](#import)
183 | - [breaking](#breaking)
184 | - [changelog](#changelog)
185 | - [diff](#diff)
186 | - [catalog](#catalog)
187 | - [publish](#publish)
188 |
189 | ### init
190 |
191 | ```
192 | Usage: datacontract init [OPTIONS] [LOCATION]
193 |
194 | Download a datacontract.yaml template and write it to file.
195 |
196 | ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────╮
197 | │ location [LOCATION] The location (url or path) of the data contract yaml to create. │
198 | │ [default: datacontract.yaml] │
199 | ╰──────────────────────────────────────────────────────────────────────────────────────────────╯
200 | ╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
201 | │ --template TEXT URL of a template or data contract │
202 | │ [default: │
203 | │ https://datacontract.com/datacontract.init.yaml] │
204 | │ --overwrite --no-overwrite Replace the existing datacontract.yaml │
205 | │ [default: no-overwrite] │
206 | │ --help Show this message and exit. │
207 | ╰──────────────────────────────────────────────────────────────────────────────────────────────╯
208 | ```
209 |
210 | ### lint
211 |
212 | ```
213 | Usage: datacontract lint [OPTIONS] [LOCATION]
214 |
215 | Validate that the datacontract.yaml is correctly formatted.
216 |
217 | ╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
218 | │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml] │
219 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
220 | ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
221 | │ --schema TEXT The location (url or path) of the Data Contract Specification JSON Schema │
222 | │ [default: https://datacontract.com/datacontract.schema.json] │
223 | │ --help Show this message and exit. │
224 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
225 | ```
226 |
227 | ### test
228 |
229 | ```
230 | Usage: datacontract test [OPTIONS] [LOCATION]
231 |
232 | Run schema and quality tests on configured servers.
233 |
234 | ╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
235 | │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml] │
236 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
237 | ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
238 | │ --schema TEXT The location (url or path) of the Data Contract │
239 | │ Specification JSON Schema │
240 | │ [default: │
241 | │ https://datacontract.com/datacontract.schema.json] │
242 | │ --server TEXT The server configuration to run the schema and quality │
243 | │ tests. Use the key of the server object in the data │
244 | │ contract yaml file to refer to a server, e.g., │
245 | │ `production`, or `all` for all servers (default). │
246 | │ [default: all] │
247 | │ --examples --no-examples Run the schema and quality tests on the example data │
248 | │ within the data contract. │
249 | │ [default: no-examples] │
250 | │ --publish TEXT The url to publish the results after the test │
251 | │ [default: None] │
252 | │ --publish-to-opentelemetry --no-publish-to-opentelemetry Publish the results to opentelemetry. Use environment │
253 | │ variables to configure the OTLP endpoint, headers, etc. │
254 | │ [default: no-publish-to-opentelemetry] │
255 | │ --logs --no-logs Print logs [default: no-logs] │
256 | │ --help Show this message and exit. │
257 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
258 | ```
259 |
260 | Data Contract CLI connects to a data source and runs schema and quality tests to verify that the data contract is valid.
261 |
262 | ```bash
263 | $ datacontract test --server production datacontract.yaml
264 | ```
265 |
266 | To connect to the databases, the `server` block in the datacontract.yaml is used to set up the connection.
267 | In addition, credentials, such as usernames and passwords, may be defined with environment variables.
268 | 
269 | The application uses different engines, based on the server `type`.
270 | Internally, it connects with DuckDB, Spark, or a native connection and executes most tests with _soda-core_ and _fastjsonschema_.
271 | 
272 | Credentials are provided with environment variables.
273 |
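For example, for a Postgres server (the variable names for each server type are listed in the sections below; the server key `postgres` is taken from the Postgres example):

```bash
# provide credentials via environment variables, then run the tests
export DATACONTRACT_POSTGRES_USERNAME=postgres
export DATACONTRACT_POSTGRES_PASSWORD=mysecretpassword
datacontract test --server postgres datacontract.yaml
```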
274 | Supported server types:
275 |
276 | - [s3](#S3)
277 | - [bigquery](#bigquery)
278 | - [azure](#azure)
279 | - [sqlserver](#sqlserver)
280 | - [databricks](#databricks)
281 | - [databricks (programmatic)](#databricks-programmatic)
282 | - [dataframe (programmatic)](#dataframe-programmatic)
283 | - [snowflake](#snowflake)
284 | - [kafka](#kafka)
285 | - [postgres](#postgres)
286 | - [local](#local)
287 |
288 | Supported formats:
289 |
290 | - parquet
291 | - json
292 | - csv
293 | - delta
294 | - iceberg (coming soon)
295 |
296 | Feel free to create an [issue](https://github.com/datacontract/datacontract-cli/issues) if you need support for additional types and formats.
297 |
298 | ### S3
299 |
300 | Data Contract CLI can test data that is stored in S3 buckets or any S3-compliant endpoints in various formats.
301 |
302 | #### Examples
303 |
304 | ##### JSON
305 |
306 | datacontract.yaml
307 | ```yaml
308 | servers:
309 |   production:
310 |     type: s3
311 |     endpointUrl: https://minio.example.com # not needed with AWS S3
312 |     location: s3://bucket-name/path/*/*.json
313 |     format: json
314 |     delimiter: new_line # new_line, array, or none
315 | ```
316 |
317 | ##### Delta Tables
318 |
319 | datacontract.yaml
320 | ```yaml
321 | servers:
322 |   production:
323 |     type: s3
324 |     endpointUrl: https://minio.example.com # not needed with AWS S3
325 |     location: s3://bucket-name/path/table.delta # path to the Delta table folder containing parquet data files and the _delta_log
326 |     format: delta
327 | ```
328 |
329 | #### Environment Variables
330 |
331 | | Environment Variable | Example | Description |
332 | |-----------------------------------|-------------------------------|-----------------------|
333 | | `DATACONTRACT_S3_REGION` | `eu-central-1` | Region of S3 bucket |
334 | | `DATACONTRACT_S3_ACCESS_KEY_ID` | `AKIAXV5Q5QABCDEFGH` | AWS Access Key ID |
335 | | `DATACONTRACT_S3_SECRET_ACCESS_KEY` | `93S7LRrJcqLaaaa/XXXXXXXXXXXXX` | AWS Secret Access Key |
336 |
337 |
338 |
339 | ### BigQuery
340 |
341 | We support authentication to BigQuery using a Service Account Key. The Service Account used should include the following roles:
342 | * BigQuery Job User
343 | * BigQuery Data Viewer
344 |
345 |
346 | #### Example
347 |
348 | datacontract.yaml
349 | ```yaml
350 | servers:
351 |   production:
352 |     type: bigquery
353 |     project: datameshexample-product
354 |     dataset: datacontract_cli_test_dataset
355 | models:
356 |   datacontract_cli_test_table: # corresponds to a BigQuery table
357 |     type: table
358 |     fields: ...
359 | ```
360 |
361 | #### Environment Variables
362 |
363 | | Environment Variable | Example | Description |
364 | |----------------------------------------------|---------------------------|---------------------------------------------------------|
365 | | `DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH` | `~/service-access-key.json` | Service Access key as saved on key creation by BigQuery |
366 |
367 |
368 |
369 | ### Azure
370 |
371 | Data Contract CLI can test data that is stored in Azure Blob Storage or Azure Data Lake Storage Gen2 (ADLS) in various formats.
372 |
373 | #### Example
374 |
375 | datacontract.yaml
376 | ```yaml
377 | servers:
378 |   production:
379 |     type: azure
380 |     location: abfss://datameshdatabricksdemo.dfs.core.windows.net/dataproducts/inventory_events/*.parquet
381 |     format: parquet
382 | ```
383 |
384 | #### Environment Variables
385 |
386 | Authentication works with an Azure Service Principal (SPN), a.k.a. App Registration, with a secret.
387 |
388 | | Environment Variable | Example | Description |
389 | |-----------------------------------|-------------------------------|------------------------------------------------------|
390 | | `DATACONTRACT_AZURE_TENANT_ID` | `79f5b80f-10ff-40b9-9d1f-774b42d605fc` | The Azure Tenant ID |
391 | | `DATACONTRACT_AZURE_CLIENT_ID` | `3cf7ce49-e2e9-4cbc-a922-4328d4a58622` | The ApplicationID / ClientID of the app registration |
392 | | `DATACONTRACT_AZURE_CLIENT_SECRET` | `yZK8Q~GWO1MMXXXXXXXXXXXXX` | The Client Secret value |
393 |
394 |
395 |
396 | ### Sqlserver
397 |
398 | Data Contract CLI can test data in MS SQL Server (including Azure SQL, Synapse Analytics SQL Pool).
399 |
400 | #### Example
401 |
402 | datacontract.yaml
403 | ```yaml
404 | servers:
405 |   production:
406 |     type: sqlserver
407 |     host: localhost
408 |     port: 1433
409 |     database: tempdb
410 |     schema: dbo
411 |     driver: ODBC Driver 18 for SQL Server
412 | models:
413 |   my_table_1: # corresponds to a table
414 |     type: table
415 |     fields:
416 |       my_column_1: # corresponds to a column
417 |         type: varchar
418 | ```
419 |
420 | #### Environment Variables
421 |
422 | | Environment Variable | Example | Description |
423 | |----------------------------------|--------------------|-------------|
424 | | `DATACONTRACT_SQLSERVER_USERNAME` | `root` | Username |
425 | | `DATACONTRACT_SQLSERVER_PASSWORD` | `toor` | Password |
426 | | `DATACONTRACT_SQLSERVER_TRUSTED_CONNECTION` | `True` | Use Windows authentication instead of username/password login |
427 | | `DATACONTRACT_SQLSERVER_TRUST_SERVER_CERTIFICATE` | `True` | Trust self-signed certificate |
428 | | `DATACONTRACT_SQLSERVER_ENCRYPTED_CONNECTION` | `True` | Use SSL |
429 |
430 |
431 |
432 |
433 | ### Databricks
434 |
435 | Works with Unity Catalog and Hive metastore.
436 |
437 | Needs a running SQL warehouse or compute cluster.
438 |
439 | #### Example
440 |
441 | datacontract.yaml
442 | ```yaml
443 | servers:
444 |   production:
445 |     type: databricks
446 |     host: dbc-abcdefgh-1234.cloud.databricks.com
447 |     catalog: acme_catalog_prod
448 |     schema: orders_latest
449 | models:
450 |   orders: # corresponds to a table
451 |     type: table
452 |     fields: ...
453 | ```
454 |
455 | #### Environment Variables
456 |
457 | | Environment Variable | Example | Description |
458 | |----------------------------------------------|--------------------------------------|-------------------------------------------------------|
459 | | `DATACONTRACT_DATABRICKS_TOKEN` | `dapia00000000000000000000000000000` | The personal access token to authenticate |
460 | | `DATACONTRACT_DATABRICKS_HTTP_PATH` | `/sql/1.0/warehouses/b053a3ffffffff` | The HTTP path to the SQL warehouse or compute cluster |
461 |
462 |
463 | ### Databricks (programmatic)
464 |
465 | Works with Unity Catalog and Hive metastore.
466 | When running in a notebook or pipeline, the provided `spark` session can be used.
467 | Additional authentication is not required.
468 |
469 | Requires a Databricks Runtime with Python >= 3.10.
470 |
471 | #### Example
472 |
473 | datacontract.yaml
474 | ```yaml
475 | servers:
476 |   production:
477 |     type: databricks
478 |     host: dbc-abcdefgh-1234.cloud.databricks.com # ignored, always use current host
479 |     catalog: acme_catalog_prod
480 |     schema: orders_latest
481 | models:
482 |   orders: # corresponds to a table
483 |     type: table
484 |     fields: ...
485 | ```
486 |
487 | Notebook
488 | ```python
489 | %pip install datacontract-cli
490 | dbutils.library.restartPython()
491 |
492 | from datacontract.data_contract import DataContract
493 |
494 | data_contract = DataContract(
495 |     data_contract_file="/Volumes/acme_catalog_prod/orders_latest/datacontract/datacontract.yaml",
496 |     spark=spark)
497 | run = data_contract.test()
498 | run.result
499 | ```
500 |
501 | ### Dataframe (programmatic)
502 |
503 | Works with Spark DataFrames.
504 | DataFrames need to be registered as named temporary views.
505 | Multiple temporary views are supported if your data contract contains multiple models.
506 | 
507 | Testing DataFrames is useful for validating your datasets in a pipeline before writing them to a data source.
508 |
509 | #### Example
510 |
511 | datacontract.yaml
512 | ```yaml
513 | servers:
514 |   production:
515 |     type: dataframe
516 | models:
517 |   my_table: # corresponds to a temporary view
518 |     type: table
519 |     fields: ...
520 | ```
521 |
522 | Example code
523 | ```python
524 | from datacontract.data_contract import DataContract
525 |
526 | df.createOrReplaceTempView("my_table")
527 |
528 | data_contract = DataContract(
529 |     data_contract_file="datacontract.yaml",
530 |     spark=spark,
531 | )
532 | run = data_contract.test()
533 | assert run.result == "passed"
534 | ```
535 |
536 |
537 | ### Snowflake
538 |
539 | Data Contract CLI can test data in Snowflake.
540 |
541 | #### Example
542 |
543 | datacontract.yaml
544 | ```yaml
545 |
546 | servers:
547 |   snowflake:
548 |     type: snowflake
549 |     account: abcdefg-xn12345
550 |     database: ORDER_DB
551 |     schema: ORDERS_PII_V2
552 | models:
553 |   my_table_1: # corresponds to a table
554 |     type: table
555 |     fields:
556 |       my_column_1: # corresponds to a column
557 |         type: varchar
558 | ```
559 |
560 | #### Environment Variables
561 |
562 | | Environment Variable | Example | Description |
563 | |------------------------------------|--------------------|-----------------------------------------------------|
564 | | `DATACONTRACT_SNOWFLAKE_USERNAME` | `datacontract` | Username |
565 | | `DATACONTRACT_SNOWFLAKE_PASSWORD` | `mysecretpassword` | Password |
566 | | `DATACONTRACT_SNOWFLAKE_ROLE` | `DATAVALIDATION` | The Snowflake role to use. |
567 | | `DATACONTRACT_SNOWFLAKE_WAREHOUSE` | `COMPUTE_WH` | The Snowflake warehouse to use when executing the tests. |
568 |
569 |
570 |
571 | ### Kafka
572 |
573 | Kafka support is currently considered experimental.
574 |
575 | #### Example
576 |
577 | datacontract.yaml
578 | ```yaml
579 | servers:
580 |   production:
581 |     type: kafka
582 |     host: abc-12345.eu-central-1.aws.confluent.cloud:9092
583 |     topic: my-topic-name
584 |     format: json
585 | ```
586 |
587 | #### Environment Variables
588 |
589 | | Environment Variable | Example | Description |
590 | |------------------------------------|---------|-----------------------------|
591 | | `DATACONTRACT_KAFKA_SASL_USERNAME` | `xxx` | The SASL username (key). |
592 | | `DATACONTRACT_KAFKA_SASL_PASSWORD` | `xxx` | The SASL password (secret). |
593 |
594 |
595 | ### Postgres
596 |
597 | Data Contract CLI can test data in Postgres or Postgres-compliant databases (e.g., RisingWave).
598 |
599 | #### Example
600 |
601 | datacontract.yaml
602 | ```yaml
603 | servers:
604 |   postgres:
605 |     type: postgres
606 |     host: localhost
607 |     port: 5432
608 |     database: postgres
609 |     schema: public
610 | models:
611 |   my_table_1: # corresponds to a table
612 |     type: table
613 |     fields:
614 |       my_column_1: # corresponds to a column
615 |         type: varchar
616 | ```
617 |
618 | #### Environment Variables
619 |
620 | | Environment Variable | Example | Description |
621 | |----------------------------------|--------------------|-------------|
622 | | `DATACONTRACT_POSTGRES_USERNAME` | `postgres` | Username |
623 | | `DATACONTRACT_POSTGRES_PASSWORD` | `mysecretpassword` | Password |
624 |
625 |
626 |
627 |
628 |
629 | ### export
630 |
631 | ```
632 |
633 | Usage: datacontract export [OPTIONS] [LOCATION]
634 |
635 | Convert data contract to a specific format. Prints to stdout or to the specified output file.
636 |
637 | ╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
638 | │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml] │
639 | ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
640 | ╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
641 | │ * --format [jsonschema|pydantic-model|sodacl|dbt|dbt-sources|db The export format. [default: None] [required] │
642 | │ t-staging-sql|odcs|rdf|avro|protobuf|great-expectati │
643 | │ ons|terraform|avro-idl|sql|sql-query|html|go|bigquer │
644 | │ y|dbml] │
645 | │ --output PATH Specify the file path where the exported data will be │
646 | │ saved. If no path is provided, the output will be │
647 | │ printed to stdout. │
648 | │ [default: None] │
649 | │ --server TEXT The server name to export. [default: None] │
650 | │ --model TEXT Use the key of the model in the data contract yaml │
651 | │ file to refer to a model, e.g., `orders`, or `all` │
652 | │ for all models (default). │
653 | │ [default: all] │
654 | │ --help Show this message and exit. │
655 | ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
656 | ╭─ RDF Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
657 | │ --rdf-base TEXT [rdf] The base URI used to generate the RDF graph. [default: None] │
658 | ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
659 | ╭─ SQL Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
660 | │ --sql-server-type TEXT [sql] The server type to determine the sql dialect. By default, it uses 'auto' to automatically │
661 | │ detect the sql dialect via the specified servers in the data contract. │
662 | │ [default: auto] │
663 | ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
664 |
665 | ```
666 |
667 | ```bash
668 | # Example export data contract as HTML
669 | datacontract export --format html > datacontract.html
670 | ```
671 |
672 | Available export options:
673 |
674 | | Type | Description | Status |
675 | |----------------------|---------------------------------------------------------|--------|
676 | | `html` | Export to HTML | ✅ |
677 | | `jsonschema` | Export to JSON Schema | ✅ |
678 | | `odcs` | Export to Open Data Contract Standard (ODCS) | ✅ |
679 | | `sodacl` | Export to SodaCL quality checks in YAML format | ✅ |
680 | | `dbt` | Export to dbt models in YAML format | ✅ |
681 | | `dbt-sources` | Export to dbt sources in YAML format | ✅ |
682 | | `dbt-staging-sql` | Export to dbt staging SQL models | ✅ |
683 | | `rdf` | Export data contract to RDF representation in N3 format | ✅ |
684 | | `avro` | Export to AVRO models | ✅ |
685 | | `protobuf` | Export to Protobuf | ✅ |
686 | | `terraform` | Export to terraform resources | ✅ |
687 | | `sql` | Export to SQL DDL | ✅ |
688 | | `sql-query` | Export to SQL Query | ✅ |
689 | | `great-expectations` | Export to Great Expectations Suites in JSON Format | ✅ |
690 | | `bigquery` | Export to BigQuery Schemas | ✅ |
691 | | `go` | Export to Go types | ✅ |
692 | | `pydantic-model` | Export to pydantic models | ✅ |
693 | | `dbml` | Export to a DBML diagram description | ✅ |
694 | | Missing something? | Please create an issue on GitHub | TBD |
695 |
696 | #### Great Expectations
697 |
698 | The export function transforms a specified data contract into a comprehensive Great Expectations JSON suite.
699 | If the contract includes multiple models, you need to specify the name of the model you wish to export.
700 |
701 | ```shell
702 | datacontract export datacontract.yaml --format great-expectations --model orders
703 | ```
704 |
705 | The export creates a list of expectations by utilizing:
706 |
707 | - The data from the Model definition with a fixed mapping
708 | - The expectations provided in the quality field for each model (see the [expectations gallery](https://greatexpectations.io/expectations/))
709 |
710 | #### RDF
711 |
712 | The export function converts a given data contract into an RDF representation. You have the option to
713 | add a base URL, which is used as the default prefix to resolve relative IRIs inside the document.
714 |
715 | ```shell
716 | datacontract export --format rdf --rdf-base https://www.example.com/ datacontract.yaml
717 | ```
718 |
719 | The data contract is mapped onto the following concepts of a yet-to-be-defined Data Contract
720 | Ontology named https://datacontract.com/DataContractSpecification/ :
721 | - DataContract
722 | - Server
723 | - Model
724 |
725 | Having the data contract inside an RDF graph gives us access to the following use cases:
726 | - Interoperability with other data contract specification formats
727 | - Storing data contracts inside a knowledge graph
728 | - Enhancing semantic search to find and retrieve data contracts
729 | - Linking model elements to already established ontologies and knowledge
730 | - Using the full power of OWL to reason about the graph structure of data contracts
731 | - Applying graph algorithms on multiple data contracts (find similar data contracts, find "gatekeeper"
732 |   data products, find the true domain owner of a field attribute)
733 |
734 | #### DBML
735 |
736 | The export function converts the logical data types of the data contract into the specific ones of a concrete database
737 | if a server is selected via the `--server` option (based on the `type` of that server). If no server is selected, the
738 | logical data types are exported.
739 |
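For example (assuming the data contract defines a server named `snowflake`):

```bash
# export a DBML description using the data types of the "snowflake" server
datacontract export --format dbml --server snowflake datacontract.yaml > datacontract.dbml
```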
740 |
741 | #### Avro
742 |
743 | The export function converts the data contract specification into an Avro schema. It supports specifying custom Avro properties for logical types and default values.
744 |
745 | ##### Custom Avro Properties
746 |
747 | We support a **config map on field level**. A config map may include any additional key-value pairs and supports multiple server type bindings.
748 |
749 | To specify custom Avro properties in your data contract, you can define them within the `config` section of your field definition. Below is an example of how to structure your YAML configuration to include custom Avro properties, such as `avroLogicalType` and `avroDefault`.
750 |
751 | >NOTE: At this moment, we just support [logicalType](https://avro.apache.org/docs/1.11.0/spec.html#Logical+Types) and [default](https://avro.apache.org/docs/1.11.0/spec.html)
752 |
753 | #### Example Configuration
754 |
755 | ```yaml
756 | models:
757 |   orders:
758 |     fields:
759 |       my_field_1:
760 |         description: Example for AVRO with Timestamp (microsecond precision) https://avro.apache.org/docs/current/spec.html#Local+timestamp+%28microsecond+precision%29
761 |         type: long
762 |         example: 1672534861000000 # Equivalent to 2023-01-01 01:01:01 in microseconds
763 |         config:
764 |           avroLogicalType: local-timestamp-micros
765 |           avroDefault: 1672534861000000
766 | ```
767 |
768 | #### Explanation
769 |
770 | - **models**: The top-level key that contains different models (tables or objects) in your data contract.
771 |   - **orders**: A specific model name. Replace this with the name of your model.
772 |     - **fields**: The fields within the model. Each field can have various properties defined.
773 |       - **my_field_1**: The name of a specific field. Replace this with your field name.
774 |         - **description**: A textual description of the field.
775 |         - **type**: The data type of the field. In this example, it is `long`.
776 |         - **example**: An example value for the field.
777 |         - **config**: Section to specify custom Avro properties.
778 |           - **avroLogicalType**: Specifies the logical type of the field in Avro. In this example, it is `local-timestamp-micros`.
779 |           - **avroDefault**: Specifies the default value for the field in Avro. In this example, it is `1672534861000000`, which corresponds to `2023-01-01 01:01:01 UTC`.
780 |
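Given this configuration, the exported Avro field would look roughly like this (a sketch; the exact output may vary by CLI version):

```json
{
  "name": "my_field_1",
  "type": {
    "type": "long",
    "logicalType": "local-timestamp-micros"
  },
  "default": 1672534861000000
}
```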
781 |
782 | ### import
783 |
784 | ```
785 | Usage: datacontract import [OPTIONS]
786 |
787 | Create a data contract from the given source location. Prints to stdout.
788 |
789 | ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
790 | │ * --format [sql|avro|glue|bigquery|jsonschema] The format of the source file. [default: None] [required] │
791 | │ --source TEXT The path to the file or Glue Database that should be imported. │
792 | │ [default: None] │
793 | │ --glue-table TEXT List of table ids to import from the Glue Database (repeat for │
794 | │ multiple table ids, leave empty for all tables in the dataset). │
795 | │ [default: None] │
796 | │ --bigquery-project TEXT The bigquery project id. [default: None] │
797 | │ --bigquery-dataset TEXT The bigquery dataset id. [default: None] │
798 | │ --bigquery-table TEXT List of table ids to import from the bigquery API (repeat for │
799 | │ multiple table ids, leave empty for all tables in the dataset). │
800 | │ [default: None] │
801 | │ --help Show this message and exit. │
802 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
803 | ```
804 |
805 | Example:
806 | ```bash
807 | # Example import from SQL DDL
808 | datacontract import --format sql --source my_ddl.sql
809 | ```
810 |
811 | Available import options:
812 |
813 | | Type | Description | Status |
814 | |--------------------|------------------------------------------------|---------|
815 | | `sql` | Import from SQL DDL | ✅ |
816 | | `avro` | Import from AVRO schemas | ✅ |
817 | | `glue` | Import from AWS Glue DataCatalog | ✅ |
818 | | `protobuf` | Import from Protobuf schemas | TBD |
819 | | `jsonschema` | Import from JSON Schemas | ✅ |
820 | | `bigquery` | Import from BigQuery Schemas | ✅ |
821 | | `dbt` | Import from dbt models | TBD |
822 | | `odcs` | Import from Open Data Contract Standard (ODCS) | TBD |
823 | | Missing something? | Please create an issue on GitHub | TBD |
824 |
825 |
826 | #### BigQuery
827 |
828 | BigQuery data can either be imported from JSON files generated from the table descriptions or directly from the BigQuery API. In case you want to use JSON files, specify the `source` parameter with a path to the JSON file.
829 | 
830 | To import from the BigQuery API, you have to _omit_ `source` and instead need to provide `bigquery-project` and `bigquery-dataset`. Additionally, you may specify `bigquery-table` to enumerate the tables that should be imported. If no tables are given, _all_ available tables of the dataset will be imported.
831 |
832 | For providing authentication to the client, please see [the Google documentation](https://cloud.google.com/docs/authentication/provide-credentials-adc#how-to) or the one [about authorizing client libraries](https://cloud.google.com/bigquery/docs/authentication#client-libs).
833 |
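For local development, Application Default Credentials can be set up, for example, with the gcloud CLI (one of several options described in the linked documentation):

```bash
gcloud auth application-default login
```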
834 | Examples:
835 |
836 | ```bash
837 | # Example import from Bigquery JSON
838 | datacontract import --format bigquery --source my_bigquery_table.json
839 | ```
840 |
841 | ```bash
842 | # Example import from Bigquery API with specifying the tables to import
843 | datacontract import --format bigquery --bigquery-project --bigquery-dataset --bigquery-table --bigquery-table --bigquery-table
844 | ```
845 |
846 | ```bash
847 | # Example import from Bigquery API importing all tables in the dataset
848 | datacontract import --format bigquery --bigquery-project --bigquery-dataset
849 | ```
850 |
851 | #### Glue
852 |
853 | Importing from Glue reads the necessary data directly from the AWS API.
854 | You may pass the `glue-table` parameter to enumerate the tables that should be imported. If no tables are given, _all_ available tables of the database will be imported.
855 |
856 | Examples:
857 |
858 | ```bash
859 | # Example import from AWS Glue with specifying the tables to import
860 | datacontract import --format glue --source --glue-table --glue-table --glue-table
861 | ```
862 |
863 | ```bash
864 | # Example import from AWS Glue importing all tables in the database
865 | datacontract import --format glue --source
866 | ```
867 |
868 |
869 | ### breaking
870 |
871 | ```
872 | Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
873 |
874 | Identifies breaking changes between data contracts. Prints to stdout.
875 |
876 | ╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
877 | │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required] │
878 | │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required] │
879 | ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
880 | ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
881 | │ --help Show this message and exit. │
882 | ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
883 | ```
884 |
885 | ### changelog
886 |
887 | ```
888 | Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
889 |
890 | Generate a changelog between data contracts. Prints to stdout.
891 |
892 | ╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
893 | │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required] │
894 | │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required] │
895 | ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
896 | ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
897 | │ --help Show this message and exit. │
898 | ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
899 | ```
900 |
901 | ### diff
902 |
903 | ```
904 | Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
905 |
906 | PLACEHOLDER. Currently works as 'changelog' does.
907 |
908 | ╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
909 | │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required] │
910 | │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required] │
911 | ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
912 | ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
913 | │ --help Show this message and exit. │
914 | ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
915 | ```
916 |
917 | ### catalog
918 |
919 | ```
920 |
921 | Usage: datacontract catalog [OPTIONS]
922 |
923 | Create an html catalog of data contracts.
924 |
925 | ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
926 | │ --files TEXT Glob pattern for the data contract files to include in the catalog. [default: *.yaml] │
927 | │ --output TEXT Output directory for the catalog html files. [default: catalog/] │
928 | │ --help Show this message and exit. │
929 | ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
930 | ```
931 |
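For example, to build a catalog from all data contracts in a folder (the paths are illustrative):

```bash
# build an HTML catalog from all data contracts in the contracts/ folder
datacontract catalog --files "contracts/*.yaml" --output catalog/
```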
932 | ### publish
933 |
934 | ```
935 |
936 | Usage: datacontract publish [OPTIONS] [LOCATION]
937 |
938 | Publish the data contract to the Data Mesh Manager.
939 |
940 | ╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
941 | │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml] │
942 | ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
943 | ╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
944 | │ --help Show this message and exit. │
945 | ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
946 | ```
947 |
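A minimal sketch, assuming the API key is provided via the `DATAMESH_MANAGER_API_KEY` environment variable (see the Data Mesh Manager integration below):

```bash
export DATAMESH_MANAGER_API_KEY=xxx
datacontract publish datacontract.yaml
```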
948 | ## Integrations
949 |
950 | | Integration | Option | Description |
951 | |-------------------|------------------------------|-------------------------------------------------------------------------------------------------------|
952 | | Data Mesh Manager | `--publish` | Push full results to the [Data Mesh Manager API](https://api.datamesh-manager.com/swagger/index.html) |
953 | | OpenTelemetry | `--publish-to-opentelemetry` | Push result as gauge metrics |
954 |
955 | ### Integration with Data Mesh Manager
956 |
957 | If you use [Data Mesh Manager](https://datamesh-manager.com/), you can use the data contract URL and append the `--publish` option to send and display the test results. Set an environment variable for your API key.
958 |
959 | ```bash
960 | # Fetch current data contract, execute tests on production, and publish result to data mesh manager
961 | $ export DATAMESH_MANAGER_API_KEY=xxx
962 | $ datacontract test https://demo.datamesh-manager.com/demo279750347121/datacontracts/4df9d6ee-e55d-4088-9598-b635b2fdcbbc/datacontract.yaml --server production --publish
963 | ```
964 |
965 | ### Integration with OpenTelemetry
966 |
967 | If you use OpenTelemetry, you can use the data contract URL and append the `--publish-to-opentelemetry` option to send the test results to your OTLP-compatible instance, e.g., Prometheus.
968 |
969 | The metric name is `datacontract.cli.test.result`, and it uses the following encoding for the result:
970 |
971 | | datacontract.cli.test.result | Description |
972 | |-------|---------------------------------------|
973 | | 0 | test run passed, no warnings |
974 | | 1 | test run has warnings |
975 | | 2 | test run failed |
976 | | 3 | test run not possible due to an error |
977 | | 4 | test status unknown |
978 |
979 |
980 | ```bash
981 | # Fetch current data contract, execute tests on production, and publish result to open telemetry
982 | $ export OTEL_SERVICE_NAME=datacontract-cli
983 | $ export OTEL_EXPORTER_OTLP_ENDPOINT=https://YOUR_ID.apm.westeurope.azure.elastic-cloud.com:443
984 | $ export OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer%20secret # Optional, when using SaaS products
985 | $ export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf # Optional, default is http/protobuf - use value grpc to use the gRPC protocol instead
986 | # Send to OpenTelemetry
987 | $ datacontract test https://demo.datamesh-manager.com/demo279750347121/datacontracts/4df9d6ee-e55d-4088-9598-b635b2fdcbbc/datacontract.yaml --server production --publish-to-opentelemetry
988 | ```
989 |
990 | Current limitations:
991 | - Currently, only the ConsoleExporter and OTLP exporter are supported
992 | - Metrics only, no logs yet (but loosely planned)
993 |
994 |
995 | ## Best Practices
996 |
997 | We share best practices in using the Data Contract CLI.
998 |
999 | ### Data-first Approach
1000 |
1001 | Create a data contract based on the actual data. This is the fastest way to get started and to get feedback from the data consumers.
1002 |
1003 | 1. Use an existing physical schema (e.g., SQL DDL) as a starting point to define your logical data model in the contract. Double-check right after the import whether the actual data meets the imported logical data model. Just to be sure.
1004 | ```bash
1005 | $ datacontract import --format sql ddl.sql
1006 | $ datacontract test
1007 | ```
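
The import fills the `models` section of the contract. As a rough sketch of what that might look like (the table and field names here are illustrative; the actual output depends on your DDL):

```yaml
# Sketch of a models section as an SQL import might produce it (illustrative names)
models:
  orders:
    type: table
    fields:
      order_id:
        type: varchar
        required: true
      order_total:
        type: bigint
```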
1008 |
1009 | 2. Add examples to the `datacontract.yaml`. If you can, use actual data and anonymize it. Make sure that the examples match the imported logical data model (see the sketch below).
1010 | ```bash
1011 | $ datacontract test --examples
1012 | ```
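
A minimal sketch of an inline example entry, with an illustrative model name and data:

```yaml
# Illustrative inline example for a model named orders
examples:
  - type: csv
    model: orders
    description: Two anonymized orders
    data: |-
      order_id,order_total
      "1001",2500
      "1002",1800
```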
1013 |
1014 |
1015 | 3. Add quality checks and additional type constraints one by one to the contract and make sure that the examples and the actual data still adhere to the contract. Check against the examples for a very fast feedback loop (see the sketch below).
1016 | ```bash
1017 | $ datacontract test --examples
1018 | $ datacontract test
1019 | ```
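
What such incremental tightening might look like, with an illustrative field constraint and a SodaCL quality check (names and thresholds are made up):

```yaml
# Illustrative type constraint and quality check for a model named orders
models:
  orders:
    type: table
    fields:
      order_id:
        type: varchar
        required: true
        pattern: ^[0-9]{4}$ # illustrative format constraint
quality:
  type: SodaCL
  specification:
    checks for orders:
      - row_count > 0
```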
1020 |
1021 | 4. Use the linter to make sure that all best practices for a `datacontract.yaml` are met. You probably still need to document some fields and add the terms and conditions.
1022 | ```bash
1023 | $ datacontract lint
1024 | ```
1025 |
1026 | 5. Set up a CI pipeline that executes the tests daily and reports the results to [Data Mesh Manager](https://datamesh-manager.com), or to any other OpenTelemetry-compatible system (a hypothetical workflow sketch follows below).
1027 | ```bash
1028 | $ datacontract test --publish https://api.datamesh-manager.com/api/runs
1029 | ```
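
A hypothetical GitHub Actions workflow for such a daily run (the workflow file name, schedule, and secret name are assumptions, not part of the CLI):

```yaml
# .github/workflows/datacontract-test.yml (hypothetical)
name: Test data contract
on:
  schedule:
    - cron: "0 6 * * *" # daily at 06:00 UTC
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install datacontract-cli
      # assumes the API key is configured as a repository secret
      - run: datacontract test --server production --publish https://api.datamesh-manager.com/api/runs
        env:
          DATAMESH_MANAGER_API_KEY: ${{ secrets.DATAMESH_MANAGER_API_KEY }}
```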
1030 |
1031 | ### Contract-first Approach
1032 |
1033 | Create a data contract based on the requirements from use cases.
1034 |
1035 | 1. Start with a `datacontract.yaml` template.
1036 | ```bash
1037 | $ datacontract init
1038 | ```
1039 |
1040 | 2. Add examples to the `datacontract.yaml`. Do not start with the data model, although you are probably tempted to do that. Examples are the fastest way to get feedback from everybody and not lose anyone in the discussion.
1041 |
1042 | 3. Create the model based on the examples, and test the model against the examples to double-check that they match.
1043 | ```bash
1044 | $ datacontract test --examples
1045 | ```
1046 |
1047 | 4. Add quality checks and additional type constraints one by one to the contract and make sure that the examples and the actual data still adhere to the contract. Check against the examples for a very fast feedback loop.
1048 | ```bash
1049 | $ datacontract test --examples
1050 | ```
1051 |
1052 | 5. Fill in the terms, descriptions, etc. Make sure you follow all best practices for a `datacontract.yaml` using the linter.
1053 | ```bash
1054 | $ datacontract lint
1055 | ```
1056 |
1057 | 6. Set up a CI pipeline that lints the contract and tests the examples, so that later changes do not decrease the quality of the contract.
1058 | ```bash
1059 | $ datacontract lint
1060 | $ datacontract test --examples
1061 | ```
1062 |
1063 | 7. Use the export function to start building the data product that provides the data, as well as the integration into the consuming data products.
1064 | ```bash
1065 | # data provider
1066 | $ datacontract export --format dbt
1067 | # data consumer
1068 | $ datacontract export --format dbt-sources
1069 | $ datacontract export --format dbt-staging-sql
1070 | ```
1071 |
1072 | ### Schema Evolution
1073 |
1074 | #### Non-breaking Changes
1075 | Examples: Adding models or fields.
1076 |
1077 | - Add the models or fields in the `datacontract.yaml`.
1078 | - Increment the minor version of the `datacontract.yaml` on any change; simply edit the file.
1079 | - You need a policy that defines these changes as non-breaking. This implies that consumers must not use the star expression (`SELECT *`) in SQL to query a table under contract. Make these consequences known.
1080 | - Fail the build in the pull request if a `datacontract.yaml` accidentally introduces a breaking change despite only a minor version bump (see the CI sketch after this list).
1081 | ```bash
1082 | $ datacontract breaking datacontract-from-pr.yaml datacontract-from-main.yaml
1083 | ```
1084 | - Create a changelog of this minor change.
1085 | ```bash
1086 | $ datacontract changelog datacontract-from-pr.yaml datacontract-from-main.yaml
1087 | ```
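
A hypothetical GitHub Actions job that guards pull requests against accidental breaking changes (the file name and steps are assumptions):

```yaml
# .github/workflows/datacontract-breaking.yml (hypothetical)
name: Check for breaking changes
on: [pull_request]
jobs:
  breaking:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # fetch the contract as it looks on main for comparison
      - run: git fetch origin main && git show origin/main:datacontract.yaml > datacontract-from-main.yaml
      - run: pip install datacontract-cli
      # assumes the command exits non-zero when it detects breaking changes
      - run: datacontract breaking datacontract.yaml datacontract-from-main.yaml
```
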
1088 | #### Breaking Changes
1089 | Examples: Removing or renaming models and fields.
1090 |
1091 | - Remove or rename models and fields in the `datacontract.yaml`, and apply any other change that is part of this new major version of the data contract.
1092 | - Increment the major version of the `datacontract.yaml` and create a new file for the new major version, as you need to offer data consumers an upgrade path from the old to the new major version.
1093 | - As data consumers need to migrate, try to reduce the frequency of major versions by bundling multiple breaking changes together, if possible.
1094 | - Be aware of the notice period in the data contract, as this is the minimum amount of time you must offer both the old and the new version as a migration path.
1095 | - Do not fear making breaking changes with data contracts. It's okay to do them in this controlled way. Really!
1096 | - Create a changelog of this major change.
1097 | ```bash
1098 | $ datacontract changelog datacontract-from-pr.yaml datacontract-from-main.yaml
1099 | ```
1100 |
1101 | ## Development Setup
1102 |
1103 | The Python base interpreter should be 3.11.x (unless you are working on the 3.12 release candidate).
1104 |
1105 | ```bash
1106 | # create venv
1107 | python3 -m venv venv
1108 | source venv/bin/activate
1109 |
1110 | # Install Requirements
1111 | pip install --upgrade pip setuptools wheel
1112 | pip install -e '.[dev]'
1113 | ruff check --fix
1114 | ruff format
1115 | pytest
1116 | ```
1117 |
1118 |
1119 | ### Docker Build
1120 |
1121 | ```bash
1122 | docker build -t datacontract/cli .
1123 | docker run --rm -v ${PWD}:/home/datacontract datacontract/cli
1124 | ```
1125 |
1126 | #### Docker Compose Integration
1127 |
1128 | We've included a [docker-compose.yml](./docker-compose.yml) configuration to simplify the build, test, and deployment of the image.
1129 |
1130 | ##### Building the Image with Docker Compose
1131 |
1132 | To build the Docker image using Docker Compose, run the following command:
1133 |
1134 | ```bash
1135 | docker compose build
1136 | ```
1137 |
1138 | This command uses the `docker-compose.yml` to build the image with predefined settings such as the build context and Dockerfile location, so you do not have to specify them manually each time.
1139 |
1140 | ##### Testing the Image
1141 |
1142 | After building the image, you can test it directly with Docker Compose:
1143 |
1144 | ```bash
1145 | docker compose run --rm datacontract --version
1146 | ```
1147 |
1148 | This command runs the container momentarily to check the version of the `datacontract` CLI. The `--rm` flag ensures that the container is automatically removed after the command executes, keeping your environment clean.
1149 |
1150 |
1151 |
1152 | ## Release Steps
1153 |
1154 | 1. Update the version in `pyproject.toml`
1155 | 2. Have a look at the `CHANGELOG.md`
1156 | 3. Create release commit manually
1157 | 4. Execute `./release`
1158 | 5. Wait until GitHub Release is created
1159 | 6. Add the release notes to the GitHub Release
1160 |
1161 | ## Contribution
1162 |
1163 | We are happy to receive your contributions. Propose your change in an issue or directly create a pull request with your improvements.
1164 |
1165 | ## Companies using this tool
1166 |
1167 | - [INNOQ](https://innoq.com)
1168 | - And many more. To add your company, please create a pull request.
1169 |
1170 | ## License
1171 |
1172 | [MIT License](LICENSE)
1173 |
1174 | ## Credits
1175 |
1176 | Created by [Stefan Negele](https://www.linkedin.com/in/stefan-negele-573153112/) and [Jochen Christ](https://www.linkedin.com/in/jochenchrist/).
1177 |
1178 |
1179 |
1180 |
1181 |
--------------------------------------------------------------------------------