├── .gitignore ├── CONTRIBUTING.md ├── README.md ├── versions ├── v0.2.0 │ ├── CHANGELOG.md │ └── transformation-specification.md ├── v0.2.1 │ └── CHANGELOG.md └── v0.1.0 │ └── transformation-specification.md ├── CODE_OF_CONDUCT.md └── LICENSE /.gitignore: -------------------------------------------------------------------------------- 1 | # Draft content - not included in the specification 2 | draft/ 3 | 4 | # Common files 5 | *.pyc 6 | *.pyo 7 | *.pyd 8 | __pycache__/ 9 | .pytest_cache/ 10 | *.egg-info/ 11 | dist/ 12 | build/ 13 | 14 | # IDE 15 | .vscode/ 16 | .idea/ 17 | *.swp 18 | *.swo 19 | *~ 20 | 21 | # OS files 22 | .DS_Store 23 | Thumbs.db 24 | 25 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to the Open Transformation Specification 2 | 3 | We welcome contributions to the Open Transformation Specification! This document provides guidelines for contributing to the project. 4 | 5 | ## How to Contribute 6 | 7 | We welcome new contributors to the project whether you have changes to suggest, problems to report, or some feedback for us. Please jump to the most relevant section from the list below: 8 | 9 | - Ask a question or offer feedback: use a discussion 10 | - Suggest a change or report a problem: open an issue 11 | - Contribute a change to the repository: open a pull request 12 | - Or just get in touch 13 | 14 | ## Development Process 15 | 16 | The Open Transformation Specification is currently in early development (v0.1.0). The development process is evolving as we establish the foundation for the specification. 17 | 18 | ### Current Focus Areas 19 | 20 | - **Core Specification**: Defining the fundamental structure and concepts 21 | - **Examples**: Creating clear examples of transformation specifications 22 | - **Documentation**: Building comprehensive documentation 23 | - **Community**: Establishing governance and contribution processes 24 | 25 | ## Specification Guidelines 26 | 27 | When contributing to the specification: 28 | 29 | - Use clear, unambiguous language 30 | - Provide examples for complex concepts 31 | - Consider backward compatibility 32 | - Follow the established structure in `versions/v0.1.0/` 33 | 34 | ## Code Standards 35 | 36 | - Use clear, descriptive commit messages 37 | - Follow existing formatting conventions 38 | - Include documentation for new features 39 | - Test any code changes thoroughly 40 | 41 | ## Getting Help 42 | 43 | - Check existing [issues](https://github.com/francescomucio/open-transformation-specification/issues) 44 | - Start a [discussion](https://github.com/francescomucio/open-transformation-specification/discussions) 45 | - Review the [Code of Conduct](CODE_OF_CONDUCT.md) 46 | 47 | ## License 48 | 49 | By contributing to the Open Transformation Specification, you agree that your contributions will be licensed under the Apache 2.0 License. 
50 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # The Open Transformation Specification 2 | 3 | [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) 4 | 5 | The Open Transformation Specification (OTS) is a community-driven open specification that defines a standard, programming language-agnostic interface description for data transformation pipelines and workflows, including transformations, data quality tests, and user-defined functions (UDFs). 6 | 7 | This specification enables **interoperability** between tools and platforms, allowing data transformations to be shared, understood, and executed across different systems without vendor lock-in. By providing a common standard, OTS shifts the data transformation ecosystem from isolated, proprietary tools to an **open core** where tools can seamlessly work together around a shared specification. 8 | 9 | This specification allows both humans and computers to discover and understand the capabilities of data transformations without requiring access to source code, additional documentation, or inspection of execution logs. When properly defined via OTS, a consumer can understand and interact with data transformation pipelines with a minimal amount of implementation logic. 10 | 11 | ## Versions 12 | 13 | This repository contains the Markdown sources for all published Open Transformation Specification versions. For release notes and release candidate versions, refer to the [releases page](https://github.com/francescomucio/open-transformation-specification/releases). 14 | 15 | - **Current Version**: [v0.2.1](versions/v0.2.1/transformation-specification.md) 16 | - **Previous Versions**: 17 | - [v0.2.0](versions/v0.2.0/transformation-specification.md) 18 | - [v0.1.0](versions/v0.1.0/transformation-specification.md) 19 | 20 | ## Interoperability and Ecosystem 21 | 22 | The Open Transformation Specification enables true **interoperability** in the data transformation space. By defining a common standard, OTS allows: 23 | 24 | - **Cross-tool compatibility**: Transformations defined in one tool can be consumed and executed by any OTS-compliant tool 25 | - **Vendor independence**: Avoid lock-in to proprietary formats and tools 26 | - **Ecosystem growth**: An open core standard enables a thriving ecosystem of compatible tools, libraries, and services 27 | - **Seamless integration**: Tools can work together, sharing transformations, tests, and functions without conversion or manual intervention 28 | 29 | ## Tools and Libraries 30 | 31 | The OTS ecosystem is growing with tools and libraries that implement the specification. These tools demonstrate the **interoperability** benefits of the standard: 32 | 33 | - **[Tee for Transform](https://github.com/francescomucio/tee-for-transform)** - A Python framework for managing SQL data transformations with support for multiple database backends. Fully OTS-compliant with import/export capabilities. 34 | 35 | *More tools and libraries will be documented as they become available. If you're building an OTS-compliant tool, we'd love to feature it here!* 36 | 37 | ## Participation 38 | 39 | The current process for developing the Open Transformation Specification is described in the [Contributing Guidelines](CONTRIBUTING.md). 
40 | 41 | ## Community 42 | 43 | Join our community on Discord to discuss the specification, ask questions, share ideas, and connect with other contributors: [Discord Community](https://discord.gg/gKm6KgACY7) 44 | 45 | ## Licensing 46 | 47 | See: [License](LICENSE) (Apache-2.0) 48 | -------------------------------------------------------------------------------- /versions/v0.2.0/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # OTS v0.2.0 Changelog 2 | 3 | ## New Features 4 | 5 | ### User-Defined Functions (UDFs) Support 6 | 7 | **Added `source_functions` field to SQL transformation code structure** 8 | 9 | - **Field**: `source_functions` (optional array of strings) 10 | - **Purpose**: Track user-defined functions (UDFs) called in SQL transformations 11 | - **Location**: `code.sql.source_functions` 12 | - **Format**: Array of function names (preferably fully qualified: `schema.function_name`) 13 | 14 | ### Changes to SQL Transformation Structure 15 | 16 | The `code.sql` object now includes: 17 | 18 | ```yaml 19 | code: 20 | sql: 21 | original_sql: string 22 | resolved_sql: string 23 | source_tables: [string] # Existing field 24 | source_functions: [string] # NEW in v0.2.0 25 | ``` 26 | 27 | ### Benefits 28 | 29 | 1. **Dependency Analysis**: Enables accurate dependency graph building including function dependencies 30 | 2. **Execution Order**: Ensures functions are created before transformations that use them 31 | 3. **Validation**: Allows tools to verify all required functions exist before execution 32 | 4. **Function Chains**: Supports function-to-function dependencies 33 | 34 | ### Backward Compatibility 35 | 36 | - `source_functions` is **optional** - existing v0.1.0 modules remain valid 37 | - If omitted, tools should assume an empty array `[]` 38 | - All examples in v0.2.0 include `source_functions: []` for transformations without function dependencies 39 | 40 | ### Functions Array in OTS Modules 41 | 42 | **Added `functions` array to OTS Module structure** 43 | 44 | - **Field**: `functions` (optional array of function definitions) 45 | - **Purpose**: Define user-defined functions (UDFs) that can be used in transformations 46 | - **Location**: Top-level in OTS Module (same level as `transformations`) 47 | - **Structure**: Array of function definitions with `function_id`, `function_type`, `language`, `code`, `parameters`, `return_type`, `deterministic`, `dependencies`, and `metadata` 48 | 49 | **Function Execution Order:** 50 | - Functions are created before transformations in dependency order 51 | - Function-to-function dependencies are resolved automatically 52 | - Execution order: Seeds → Functions → Transformations 53 | 54 | **Function Overloading:** 55 | - OTS 0.2.0 supports function overloading (multiple functions with same name, different signatures) 56 | - Functions are identified by fully qualified name and parameter signature 57 | - Each overloaded function is tracked separately in the `functions` array 58 | 59 | ### Test Library Structure Updates 60 | 61 | **Added `ots_version` field to Test Library structure** 62 | 63 | - **Field**: `ots_version` (required string) 64 | - **Purpose**: Indicates which version of the OTS standard the test library follows 65 | - **Location**: Top-level in Test Library file 66 | - **Format**: OTS version string (e.g., "0.2.0") 67 | 68 | ### Documentation Updates 69 | 70 | - Added new section: "User-Defined Functions (UDFs)" with examples 71 | - Updated all code examples to include 
`source_functions` field 72 | - Added example transformation using UDFs 73 | - Added `functions` array definition to OTS Module structure 74 | - Updated test library examples to include `ots_version` field 75 | 76 | ## Migration from v0.1.0 77 | 78 | To migrate an OTS Module from v0.1.0 to v0.2.0: 79 | 80 | 1. Update `ots_version` from `"0.1.0"` to `"0.2.0"` 81 | 2. Add `source_functions: []` to all `code.sql` objects (if no functions are used) 82 | 3. If transformations use UDFs, populate `source_functions` with function names 83 | 84 | Example migration: 85 | 86 | ```yaml 87 | # v0.1.0 88 | code: 89 | sql: 90 | original_sql: "..." 91 | resolved_sql: "..." 92 | source_tables: ["table1"] 93 | 94 | # v0.2.0 95 | code: 96 | sql: 97 | original_sql: "..." 98 | resolved_sql: "..." 99 | source_tables: ["table1"] 100 | source_functions: [] # Add this line 101 | ``` 102 | 103 | -------------------------------------------------------------------------------- /versions/v0.2.1/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # OTS v0.2.1 Changelog 2 | 3 | ## New Features 4 | 5 | ### Schema Change Handling for Incremental Materialization 6 | 7 | **Added `on_schema_change` field to incremental materialization configuration** 8 | 9 | - **Field**: `on_schema_change` (optional string) 10 | - **Purpose**: Control how schema differences between transformation output and existing target table are handled 11 | - **Location**: `materialization.incremental_details.on_schema_change` 12 | - **Default**: `"fail"` (fail transformation if schema changes detected) 13 | - **Options**: 14 | - `"fail"`: Fail the transformation if any schema differences are detected (default) 15 | - `"ignore"`: Ignore schema differences and proceed (may cause errors if columns don't match) 16 | - `"append_new_columns"`: Automatically add new columns, keep existing columns 17 | - `"sync_all_columns"`: Add new columns and remove missing columns (may cause data loss) 18 | - `"full_refresh"`: Drop and recreate table with full transformation output 19 | - `"full_incremental_refresh"`: Drop, recreate, then run incremental strategy in chunks 20 | - `"recreate_empty"`: Drop and recreate as empty table (for external backfilling) 21 | 22 | ### Full Incremental Refresh Configuration 23 | 24 | **Added `full_incremental_refresh` field for chunked incremental execution** 25 | 26 | - **Field**: `full_incremental_refresh` (optional object) 27 | - **Purpose**: Configure parameter-based incremental chunking after table recreation 28 | - **Location**: Top-level in transformation definition (same level as `materialization`) 29 | - **Required when**: `on_schema_change` is set to `"full_incremental_refresh"` 30 | 31 | **Structure:** 32 | ```yaml 33 | full_incremental_refresh: 34 | parameters: 35 | - name: string # Parameter name (matches placeholder, e.g., "@start_date") 36 | start_value: string # Initial value for the parameter 37 | end_value: string # End condition: hardcoded value or expression evaluated against source table (e.g., "max(event_date)" from source.events) 38 | step: string # Increment step (SQL interval or numeric value) 39 | ``` 40 | 41 | **Use Cases:** 42 | - Single parameter: One parameter in the array (e.g., `@start_date`) 43 | - Multiple parameters: Multiple parameters for boundary-based queries (e.g., `@start_date` and `@end_date`) 44 | - Both parameters use the same `step` value (ideally), but can differ if needed 45 | - Parameter names must match placeholders in the query (e.g., 
`"@start_date"` matches `'@start_date'` in SQL) 46 | 47 | ### Schema Change Behavior Details 48 | 49 | **Type Mismatches:** 50 | - Columns with same name but different data type are treated as schema changes 51 | - With `on_schema_change="fail"`, type mismatches cause immediate failure 52 | - Different data type = different column (fail immediately) 53 | 54 | **Column Order:** 55 | - Column order differences are detected and logged as warnings 56 | - Tools should rely on explicit column lists in INSERT/MERGE statements 57 | - No automatic reordering is performed 58 | 59 | **Schema Comparison:** 60 | - Schema comparison happens after time-based filtering but before execution 61 | - Uses database-specific `DESCRIBE` or equivalent methods for precision 62 | - If table doesn't exist, schema comparison is skipped and table is created normally 63 | 64 | ## Changes to Incremental Materialization Structure 65 | 66 | The `incremental_details` object now includes: 67 | 68 | ```yaml 69 | incremental_details: 70 | strategy: string 71 | delete_condition: string 72 | filter_condition: string 73 | merge_key: [string] 74 | update_columns: [string] 75 | on_schema_change: string # NEW in v0.2.1 76 | ``` 77 | 78 | The transformation definition now supports: 79 | 80 | ```yaml 81 | materialization: 82 | type: "incremental" 83 | incremental_details: {...} 84 | 85 | full_incremental_refresh: # NEW in v0.2.1 (optional) 86 | parameters: [...] 87 | ``` 88 | 89 | ## Backward Compatibility 90 | 91 | - `on_schema_change` is **optional** - existing v0.2.0 modules remain valid 92 | - If omitted, default behavior is `"fail"` (matches previous implicit behavior) 93 | - `full_incremental_refresh` is **optional** - only required when `on_schema_change="full_incremental_refresh"` 94 | - All existing v0.2.0 examples remain valid in v0.2.1 95 | 96 | ## Migration from v0.2.0 97 | 98 | To migrate an OTS Module from v0.2.0 to v0.2.1: 99 | 100 | 1. Update `ots_version` from `"0.2.0"` to `"0.2.1"` 101 | 2. Optionally add `on_schema_change` to `incremental_details` if you want explicit schema change handling 102 | 3. 
If using `full_incremental_refresh`, add the `full_incremental_refresh` configuration 103 | 104 | Example migration: 105 | 106 | ```yaml 107 | # v0.2.0 108 | materialization: 109 | type: "incremental" 110 | incremental_details: 111 | strategy: "append" 112 | filter_condition: "created_at >= '@start_date'" 113 | 114 | # v0.2.1 (explicit default) 115 | materialization: 116 | type: "incremental" 117 | incremental_details: 118 | strategy: "append" 119 | filter_condition: "created_at >= '@start_date'" 120 | on_schema_change: "fail" # Optional: explicit default 121 | 122 | # v0.2.1 (with schema change handling) 123 | materialization: 124 | type: "incremental" 125 | incremental_details: 126 | strategy: "append" 127 | filter_condition: "created_at >= '@start_date'" 128 | on_schema_change: "append_new_columns" # Auto-add new columns 129 | 130 | # v0.2.1 (with full incremental refresh) 131 | materialization: 132 | type: "incremental" 133 | incremental_details: 134 | strategy: "append" 135 | filter_condition: "event_date >= '@start_date'" 136 | on_schema_change: "full_incremental_refresh" 137 | 138 | full_incremental_refresh: 139 | parameters: 140 | - name: "@start_date" 141 | start_value: "2024-01-01" 142 | end_value: "max(event_date)" # Evaluated against source table 143 | step: "INTERVAL 1 DAY" 144 | ``` 145 | 146 | ## Documentation Updates 147 | 148 | - Added "Schema Change Handling" section to Incremental Materialization documentation 149 | - Added examples for all `on_schema_change` options 150 | - Added `full_incremental_refresh` configuration examples 151 | - Updated append strategy example to show `on_schema_change` usage 152 | - Added documentation for type mismatch and column order handling 153 | 154 | 155 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant 3.0 Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We pledge to make our community welcoming, safe, and equitable for all. 6 | 7 | We are committed to fostering an environment that respects and promotes the dignity, rights, and contributions of all individuals, regardless of characteristics including race, ethnicity, caste, color, age, physical characteristics, neurodiversity, disability, sex or gender, gender identity or expression, sexual orientation, language, philosophy or religion, national or social origin, socio-economic position, level of education, or other status. The same privileges of participation are extended to everyone who participates in good faith and in accordance with this Covenant. 8 | 9 | ## Encouraged Behaviors 10 | 11 | While acknowledging differences in social norms, we all strive to meet our community's expectations for positive behavior. We also understand that our words and actions may be interpreted differently than we intend based on culture, background, or native language. 12 | 13 | With these considerations in mind, we agree to behave mindfully toward each other and act in ways that center our shared values, including: 14 | 15 | 1. Respecting the **purpose of our community**, our activities, and our ways of gathering. 16 | 2. Engaging **kindly and honestly** with others. 17 | 3. Respecting **different viewpoints** and experiences. 18 | 4. **Taking responsibility** for our actions and contributions. 19 | 5. Gracefully giving and accepting **constructive feedback**. 20 | 6. Committing to **repairing harm** when it occurs. 21 | 7. 
Behaving in other ways that promote and sustain the **well-being of our community**. 22 | 23 | ## Restricted Behaviors 24 | 25 | We agree to restrict the following behaviors in our community. Instances, threats, and promotion of these behaviors are violations of this Code of Conduct. 26 | 27 | 1. **Harassment.** Violating explicitly expressed boundaries or engaging in unnecessary personal attention after any clear request to stop. 28 | 2. **Character attacks.** Making insulting, demeaning, or pejorative comments directed at a community member or group of people. 29 | 3. **Stereotyping or discrimination.** Characterizing anyone's personality or behavior on the basis of immutable identities or traits. 30 | 4. **Sexualization.** Behaving in a way that would generally be considered inappropriately intimate in the context or purpose of the community. 31 | 5. **Violating confidentiality**. Sharing or acting on someone's personal or private information without their permission. 32 | 6. **Endangerment.** Causing, encouraging, or threatening violence or other harm toward any person or group. 33 | 7. Behaving in other ways that **threaten the well-being** of our community. 34 | 35 | ### Other Restrictions 36 | 37 | 1. **Misleading identity.** Impersonating someone else for any reason, or pretending to be someone else to evade enforcement actions. 38 | 2. **Failing to credit sources.** Not properly crediting the sources of content you contribute. 39 | 3. **Promotional materials**. Sharing marketing or other commercial content in a way that is outside the norms of the community. 40 | 4. **Irresponsible communication.** Failing to responsibly present content which includes, links or describes any other restricted behaviors. 41 | 42 | ## Reporting an Issue 43 | 44 | Tensions can occur between community members even when they are trying their best to collaborate. Not every conflict represents a code of conduct violation, and this Code of Conduct reinforces encouraged behaviors and norms that can help avoid conflicts and minimize harm. 45 | 46 | When an incident does occur, it is important to report it promptly. To report a possible violation, please contact the project maintainers at [contact email]. 47 | 48 | Community Moderators take reports of violations seriously and will make every effort to respond in a timely manner. They will investigate all reports of code of conduct violations, reviewing messages, logs, and recordings, or interviewing witnesses and other participants. Community Moderators will keep investigation and enforcement actions as transparent as possible while prioritizing safety and confidentiality. In order to honor these values, enforcement actions are carried out in private with the involved parties, but communicating to the whole community may be part of a mutually agreed upon resolution. 49 | 50 | ## Addressing and Repairing Harm 51 | 52 | If an investigation by the Community Moderators finds that this Code of Conduct has been violated, the following enforcement ladder may be used to determine how best to repair harm, based on the incident's impact on the individuals involved and the community as a whole. Depending on the severity of a violation, lower rungs on the ladder may be skipped. 53 | 54 | 1. **Warning** 55 | 1. Event: A violation involving a single incident or series of incidents. 56 | 2. Consequence: A private, written warning from the Community Moderators. 57 | 3. 
Repair: Examples of repair include a private written apology, acknowledgement of responsibility, and seeking clarification on expectations. 58 | 59 | 2. **Temporarily Limited Activities** 60 | 1. Event: A repeated incidence of a violation that previously resulted in a warning, or the first incidence of a more serious violation. 61 | 2. Consequence: A private, written warning with a time-limited cooldown period designed to underscore the seriousness of the situation and give the community members involved time to process the incident. The cooldown period may be limited to particular communication channels or interactions with particular community members. 62 | 3. Repair: Examples of repair may include making an apology, using the cooldown period to reflect on actions and impact, and being thoughtful about re-entering community spaces after the period is over. 63 | 64 | 3. **Temporary Suspension** 65 | 1. Event: A pattern of repeated violation which the Community Moderators have tried to address with warnings, or a single serious violation. 66 | 2. Consequence: A private written warning with conditions for return from suspension. In general, temporary suspensions give the person being suspended time to reflect upon their behavior and possible corrective actions. 67 | 3. Repair: Examples of repair include respecting the spirit of the suspension, meeting the specified conditions for return, and being thoughtful about how to reintegrate with the community when the suspension is lifted. 68 | 69 | 4. **Permanent Ban** 70 | 1. Event: A pattern of repeated code of conduct violations that other steps on the ladder have failed to resolve, or a violation so serious that the Community Moderators determine there is no way to keep the community safe with this person as a member. 71 | 2. Consequence: Access to all community spaces, tools, and communication channels is removed. In general, permanent bans should be rarely used, should have strong reasoning behind them, and should only be resorted to if working through other remedies has failed to change the behavior. 72 | 3. Repair: There is no possible repair in cases of this severity. 73 | 74 | This enforcement ladder is intended as a guideline. It does not limit the ability of Community Managers to use their discretion and judgment, in keeping with the best interests of our community. 75 | 76 | ## Scope 77 | 78 | This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public or other spaces. Examples of representing our community include using an official email address, posting via an official social media account, or acting as an appointed representative at an online or offline event. 79 | 80 | ## Attribution 81 | 82 | This Code of Conduct is adapted from the [Contributor Covenant, version 3.0](https://www.contributor-covenant.org/version/3/0/code_of_conduct/), permanently available at https://www.contributor-covenant.org/version/3/0/. 83 | 84 | Contributor Covenant is stewarded by the Organization for Ethical Source and licensed under CC BY-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/ 85 | 86 | For answers to common questions about Contributor Covenant, see the FAQ at https://www.contributor-covenant.org/faq. Translations are provided at https://www.contributor-covenant.org/translations. Additional enforcement and community guideline resources can be found at https://www.contributor-covenant.org/resources. 
The enforcement ladder was inspired by the work of Mozilla's code of conduct team. 87 | 88 | 89 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity granting the License. 13 | 14 | "Legal Entity" shall mean the union of the acting entity and all 15 | other entities that control, are controlled by, or are under common 16 | control with that entity. For the purposes of this definition, 17 | "control" means (i) the power, direct or indirect, to cause the 18 | direction or management of such entity, whether by contract or 19 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 20 | outstanding shares, or (iii) beneficial ownership of such entity. 21 | 22 | "You" (or "Your") shall mean an individual or Legal Entity 23 | exercising permissions granted by this License. 24 | 25 | "Source" shall mean the preferred form for making modifications, 26 | including but not limited to software source code, documentation 27 | source, and configuration files. 28 | 29 | "Object" shall mean any form resulting from mechanical 30 | transformation or translation of a Source form, including but 31 | not limited to compiled object code, generated documentation, 32 | and conversions to other media types. 33 | 34 | "Work" shall mean the work of authorship, whether in Source or 35 | Object form, made available under the License, as indicated by a 36 | copyright notice that is included in or attached to the work 37 | (which shall not include communications that are clearly marked or 38 | otherwise designated in writing by the copyright owner as "Not a Work"). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based upon (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and derivative works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control 57 | systems, and issue tracking systems that are managed by, or on behalf 58 | of, the Licensor for the purpose of discussing and improving the Work, 59 | but excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Work". 61 | 62 | 2. 
Grant of Copyright License. Subject to the terms and conditions of 63 | this License, each Contributor hereby grants to You a perpetual, 64 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 65 | copyright license to use, reproduce, modify, distribute, and prepare 66 | Derivative Works of, and to display, perform, and distribute the 67 | Work and such Derivative Works in Source or Object form. 68 | 69 | 3. Grant of Patent License. Subject to the terms and conditions of 70 | this License, each Contributor hereby grants to You a perpetual, 71 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 72 | (except as stated in this section) patent license to make, have made, 73 | use, offer to sell, sell, import, and otherwise transfer the Work, 74 | where such license applies only to those patent claims licensable 75 | by such Contributor that are necessarily infringed by their 76 | Contribution(s) alone or by combination of their Contribution(s) 77 | with the Work to which such Contribution(s) was submitted. If You 78 | institute patent litigation against any entity (including a 79 | cross-claim or counterclaim in a lawsuit) alleging that the Work 80 | or a Contribution incorporated within the Work constitutes direct 81 | or contributory patent infringement, then any patent licenses 82 | granted to You under this License for that Work shall terminate 83 | as of the date such litigation is filed. 84 | 85 | 4. Redistribution. You may reproduce and distribute copies of the 86 | Work or Derivative Works thereof in any medium, with or without 87 | modifications, and in Source or Object form, provided that You 88 | meet the following conditions: 89 | 90 | (a) You must give any other recipients of the Work or 91 | Derivative Works a copy of this License; and 92 | 93 | (b) You must cause any modified files to carry prominent notices 94 | stating that You changed the files; and 95 | 96 | (c) You must retain, in the Source form of any Derivative Works 97 | that You distribute, all copyright, patent, trademark, and 98 | attribution notices from the Source form of the Work, 99 | excluding those notices that do not pertain to any part of 100 | the Derivative Works; and 101 | 102 | (d) If the Work includes a "NOTICE" text file as part of its 103 | distribution, then any Derivative Works that You distribute must 104 | include a readable copy of the attribution notices contained 105 | within such NOTICE file, excluding those notices that do not 106 | pertain to any part of the Derivative Works, in at least one 107 | of the following places: within a NOTICE text file distributed 108 | as part of the Derivative Works; within the Source form or 109 | documentation, if provided along with the Derivative Works; or, 110 | within a display generated by the Derivative Works, if and 111 | wherever such third-party notices normally appear. The contents 112 | of the NOTICE file are for informational purposes only and 113 | do not modify the License. You may add Your own attribution 114 | notices within Derivative Works that You distribute, alongside 115 | or as an addendum to the NOTICE text from the Work, provided 116 | that such additional attribution notices cannot be construed 117 | as modifying the License. 
118 | 119 | You may add Your own copyright notice to Your modifications and 120 | may provide additional or different license terms and conditions 121 | for use, reproduction, or distribution of Your modifications, or 122 | for any such Derivative Works as a whole, provided Your use, 123 | reproduction, and distribution of the Work otherwise complies with 124 | the conditions stated in this License. 125 | 126 | 5. Submission of Contributions. Unless You explicitly state otherwise, 127 | any Contribution intentionally submitted for inclusion in the Work 128 | by You to the Licensor shall be under the terms and conditions of 129 | this License, without any additional terms or conditions. 130 | Notwithstanding the above, nothing herein shall supersede or modify 131 | the terms of any separate license agreement you may have executed 132 | with Licensor regarding such Contributions. 133 | 134 | 6. Trademarks. This License does not grant permission to use the trade 135 | names, trademarks, service marks, or product names of the Licensor, 136 | except as required for reasonable and customary use in describing the 137 | origin of the Work and reproducing the content of the NOTICE file. 138 | 139 | 7. Disclaimer of Warranty. Unless required by applicable law or 140 | agreed to in writing, Licensor provides the Work (and each 141 | Contributor provides its Contributions) on an "AS IS" BASIS, 142 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 143 | implied, including, without limitation, any warranties or conditions 144 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 145 | PARTICULAR PURPOSE. You are solely responsible for determining the 146 | appropriateness of using or redistributing the Work and assume any 147 | risks associated with Your exercise of permissions under this License. 148 | 149 | 8. Limitation of Liability. In no event and under no legal theory, 150 | whether in tort (including negligence), contract, or otherwise, 151 | unless required by applicable law (such as deliberate and grossly 152 | negligent acts) or agreed to in writing, shall any Contributor be 153 | liable to You for damages, including any direct, indirect, special, 154 | incidental, or consequential damages of any character arising as a 155 | result of this License or out of the use or inability to use the 156 | Work (including but not limited to damages for loss of goodwill, 157 | work stoppage, computer failure or malfunction, or any and all 158 | other commercial damages or losses), even if such Contributor 159 | has been advised of the possibility of such damages. 160 | 161 | 9. Accepting Warranty or Support. You may choose to offer, and to 162 | charge a fee for, warranty, support, indemnity or other liability 163 | obligations and/or rights consistent with this License. However, in 164 | accepting such obligations, You may act only on Your own behalf and 165 | on Your sole responsibility, not on behalf of any other Contributor, 166 | and only if You agree to indemnify, defend, and hold each Contributor 167 | harmless for any liability incurred by, or claims asserted against, 168 | such Contributor by reason of your accepting any such warranty or support. 169 | 170 | END OF TERMS AND CONDITIONS 171 | 172 | APPENDIX: How to apply the Apache License to your work. 173 | 174 | To apply the Apache License to your work, attach the following 175 | boilerplate notice, with the fields enclosed by brackets "[]" 176 | replaced with your own identifying information. 
(Don't include 177 | the brackets!) The text should be enclosed in the appropriate 178 | comment syntax for the file format. We also recommend that a 179 | file or class name and description of purpose be included on the 180 | same "printed page" as the copyright notice for easier 181 | identification within third-party archives. 182 | 183 | Copyright [yyyy] [name of copyright owner] 184 | 185 | Licensed under the Apache License, Version 2.0 (the "License"); 186 | you may not use this file except in compliance with the License. 187 | You may obtain a copy of the License at 188 | 189 | http://www.apache.org/licenses/LICENSE-2.0 190 | 191 | Unless required by applicable law or agreed to in writing, software 192 | distributed under the License is distributed on an "AS IS" BASIS, 193 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 194 | See the License for the specific language governing permissions and 195 | limitations under the License. 196 | 197 | 198 | -------------------------------------------------------------------------------- /versions/v0.1.0/transformation-specification.md: -------------------------------------------------------------------------------- 1 | # Open Transformation Specification v0.1.0 2 | 3 | ## Table of Contents 4 | 5 | 1. [Introduction](#introduction) 6 | 2. [Core Concepts](#core-concepts) 7 | 3. [Open Transformation Definition (OTD) Structure](#open-transformation-definition-otd-structure) 8 | 4. [Materialization Types](#materialization-types) 9 | 5. [Data Quality Tests](#data-quality-tests) 10 | 6. [Examples](#examples) 11 | 12 | ## Introduction 13 | 14 | The Open Transformation Specification (OTS) defines a standard, programming language-agnostic interface description for data transformations. This specification allows both humans and computers to discover and understand how transformations behave, what outputs they produce, and how those outputs are materialized (as tables, views, incremental updates, SCD2, etc.) without requiring additional documentation or configuration. 15 | 16 | An OTS-based transformation must include both the code that transforms the data and metadata about the transformation. A tool implementing OTS should be able to execute an OTS transformation with no additional code or information beyond what's specified in the OTS document. 17 | 18 | ## Core Concepts 19 | 20 | ### Open Transformation Definition (OTD) 21 | 22 | An **Open Transformation Definition (OTD)** is a concrete instance of the Open Transformation Specification - a single file or document that describes a specific data transformation using the OTS format. 23 | 24 | A transformation is a unit of data processing that takes one or more data sources as input and produces one data output. Right now, transformations are SQL queries, but we plan to add support for other programming languages in the future. 25 | 26 | ### Open Transformation Specification Module 27 | 28 | An **Open Transformation Specification Module (OTS Module)** is a collection of related transformations that target the same database and schema. An OTS Module can contain one or more transformations, much like how an OpenAPI specification can contain multiple endpoints. 
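For orientation, here is a minimal, hypothetical sketch of a module that bundles two transformations sharing one target (field-by-field details are defined in the OTS Module Structure section below; the transformation bodies are elided and the names are illustrative):

```yaml
# Hypothetical minimal OTS Module sketch - names are illustrative
ots_version: "0.1.0"
module_name: "ecommerce_analytics"
module_description: "Order and customer analytics for the warehouse"
target:
  database: "warehouse"
  schema: "analytics"
transformations:
  - transformation_id: "analytics.orders"
    transformation_type: "sql"
    # code, schema, materialization, tests, metadata ...
  - transformation_id: "analytics.customers"
    transformation_type: "sql"
    # code, schema, materialization, tests, metadata ...
```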
29 | 30 | Key characteristics of an OTS Module: 31 | - **Single target**: All transformations in a module target the same database and schema 32 | - **Logical grouping**: Related transformations are organized together 33 | - **Deployment unit**: The entire module can be deployed as a single unit 34 | 35 | #### OTS vs OTD vs OTS Module 36 | 37 | - **Open Transformation Specification (OTS)**: The standard that defines the structure and rules 38 | - **Open Transformation Definition (OTD)**: A specific transformation within a module 39 | - **Open Transformation Specification Module (OTS Module)**: A collection of related transformations targeting the same database and schema 40 | 41 | ### Test Library 42 | 43 | A **Test Library** is a project-level collection of reusable test definitions (generic and singular SQL tests) that can be shared across multiple OTS modules. Test libraries are defined separately from transformation modules and are referenced by modules that need to use them. 44 | 45 | Key characteristics of a Test Library: 46 | - **Project-level scope**: Test libraries are defined at the project/workspace level, separate from OTS modules 47 | - **Reusability**: Tests defined in a library can be referenced by any OTS module in the project 48 | - **Test types**: Contains both generic SQL tests (with placeholders) and singular SQL tests (table-specific) 49 | - **Optional**: Modules can define tests inline or reference a test library, or both 50 | 51 | #### OTS vs OTD vs OTS Module vs Test Library 52 | 53 | - **Open Transformation Specification (OTS)**: The standard that defines the structure and rules 54 | - **Open Transformation Definition (OTD)**: A specific transformation within a module 55 | - **Open Transformation Specification Module (OTS Module)**: A collection of related transformations targeting the same database and schema 56 | - **Test Library**: A project-level collection of reusable test definitions that can be shared across modules 57 | 58 | Think of it this way: OTS is like the blueprint, an OTS Module is the house (a complete set of transformations), each OTD is a room within that house (an individual transformation), and a Test Library is like a shared toolbox of quality checks that can be used across multiple houses. 59 | 60 | ## Components of an OTD 61 | 62 | An Open Transformation Definition consists of several key components that work together to define an executable transformation: 63 | 64 | 1. **Transformation Code**: The transformation logic (SQL, Python, PySpark, etc.) stored in a type-based structure 65 | 2. **Schema Definition**: The structure of the output data including column definitions, types, and validation rules 66 | 3. **Materialization Strategy**: How the output is stored and updated (table, view, incremental, SCD2) 67 | 4. **Tests**: Validation rules that ensure data quality at table level 68 | 5. **Metadata**: Additional information about the transformation (owner, tags, creation date, etc.) 69 | 70 | ### Transformation Code 71 | 72 | Transformations can be written in different languages (SQL, Python, PySpark, etc.). The transformation code is stored in a type-based structure that supports multiple transformation types while maintaining a consistent interface. 73 | 74 | #### SQL Transformations 75 | 76 | For SQL transformations, the code is stored with the following structure: 77 | - `original_sql`: The original SQL query as written (typically a SELECT statement). This preserves the original transformation code as authored. 
78 | - `resolved_sql`: SQL with fully qualified table names (schema.table format). This is the preferred version for execution as it eliminates ambiguity in table references. Tools should use `resolved_sql` when executing transformations. 79 | - `source_tables`: List of input tables referenced in the query (required for dependency analysis) 80 | 81 | **When to use each:** 82 | - Use `original_sql` for: displaying the original code to users, version control, understanding the transformation logic 83 | - Use `resolved_sql` for: actual execution, dependency resolution, cross-database compatibility 84 | 85 | #### Non-SQL Transformations 86 | 87 | Support for non-SQL transformation types (Python, PySpark, R, etc.) is planned for future versions of the specification. The current v0.1.0 specification focuses on SQL transformations. 88 | 89 | ### Schema Definition 90 | 91 | Schema defines the structure of the output data, including column names, data types, descriptions, partitioning, indexes, and other properties of the physical table. The schema is essential for understanding what the transformation produces without executing it. For example, it enables generating DDL statements for creating the output table. 92 | 93 | ### Materialization Strategy 94 | Materialization defines how the transformation output is stored and updated. Common types include: 95 | - **table**: Full table replacement on each run 96 | - **view**: Virtual table that queries underlying data 97 | - **incremental**: Partial updates using strategies like delete+insert or merge 98 | - **scd2**: Slowly Changing Dimension type 2 for tracking historical changes 99 | 100 | ### Tests 101 | Tests are validation rules that ensure data quality. They can be defined at two levels: 102 | - **Column-level tests**: Applied to individual columns (e.g., `not_null`, `unique`) 103 | - **Table-level tests**: Applied to the entire output (e.g., `row_count_gt_0`, `unique`) 104 | 105 | Tests enable automated data quality validation without manual inspection. OTS supports three types of tests: 106 | 1. **Standard Tests**: Built-in tests defined in the OTS specification (e.g., `not_null`, `unique`, `row_count_gt_0`) 107 | 2. **Generic SQL Tests**: Reusable SQL tests with placeholders that can be applied to multiple transformations 108 | 3. **Singular SQL Tests**: Table-specific SQL tests with hardcoded table references 109 | 110 | Generic SQL Tests and Singular SQL Tests can be defined in a project Test Library (see [Test Library](#test-library)) or inline within the current OTS Module. 111 | 112 | For detailed information about test types, definitions, and usage, see the [Data Quality Tests](#data-quality-tests) section. 113 | 114 | ### Metadata 115 | Metadata provides additional information about the transformation including: 116 | - **file_path**: Location of the source transformation file 117 | - **owner**: Person or team responsible for the transformation 118 | - **tags**: List of string tags for categorization and discovery (e.g., ["analytics", "fct", "production"]) 119 | - **object_tags**: Dictionary of key-value pairs for database object tagging (e.g., {"sensitivity_tag": "pii", "classification": "public"}) 120 | 121 | **Tag Types:** 122 | - **tags** (dbt-style): Simple string tags used for filtering, categorization, and discovery. These are typically used for model selection and organization. 123 | - **Module-level tags**: Tags defined at the module level apply to all transformations in the module. 
They can be inherited by transformations or merged with transformation-specific tags. 124 | - **Transformation-level tags**: Tags defined at the transformation level are specific to that transformation. They can be merged with module-level tags. 125 | - **object_tags** (database-style): Key-value pairs that are attached directly to database objects (tables, views) in databases that support object tagging (e.g., Snowflake). These are used for data governance, compliance, and metadata management. Unlike `tags`, `object_tags` are always transformation-specific and are not inherited from module level. 126 | 127 | ## OTS Module Structure 128 | 129 | An OTS Module is a YAML or JSON document that can contain one or more transformations. Below is the complete structure: 130 | 131 | ### Complete OTS Module Structure 132 | 133 | ```yaml 134 | # OTS version 135 | ots_version: string # OTS specification version (e.g., "0.1.0") - indicates which version of the OTS standard this module follows 136 | 137 | # Module metadata 138 | module_name: string # Module name (e.g., "ecommerce_analytics") 139 | module_description: string # Description of the module (optional) 140 | version: string # Optional: Module/package version (e.g., "1.0.0") - version of this specific module, independent of OTS version 141 | tags: [string] # Optional: Module-level tags for categorization (e.g., ["analytics", "fct"]). These can be inherited or merged with transformation-level tags. 142 | test_library_path: string # Optional: Path to test library file (relative to module file or absolute path) 143 | 144 | # Optional: Inline test definitions (same structure as test library) 145 | generic_tests: # Optional: Module-specific generic SQL tests 146 | test_name: 147 | type: "sql" 148 | level: "table" | "column" 149 | description: string 150 | sql: string 151 | parameters: {} 152 | singular_tests: # Optional: Module-specific singular SQL tests 153 | test_name: 154 | type: "sql" 155 | level: "table" | "column" 156 | description: string 157 | sql: string 158 | target_transformation: string 159 | 160 | target: 161 | database: string # Target database name 162 | schema: string # Target schema name 163 | sql_dialect: string # Optional: SQL dialect (e.g., "postgres", "bigquery", "snowflake", "spark", etc.) 164 | connection_profile: string # Optional: connection profile reference 165 | 166 | # Transformations 167 | transformations: # Array of transformation definitions 168 | - transformation_id: string # Fully qualified identifier (e.g., "analytics.my_first_table") 169 | description: string # Optional: Description of what the transformation does (optional) 170 | transformation_type: string # Type of transformation: "sql" (default: "sql"). Non-SQL types (python, pyspark, r) are planned for future versions. 
171 | sql_dialect: string # Optional: SQL dialect of the transformation code (for translation to target dialect) 172 | 173 | # Transformation code (type-based structure) 174 | code: 175 | # For SQL transformations (transformation_type: "sql") 176 | sql: 177 | original_sql: string # The original SQL query as written (typically a SELECT statement) 178 | resolved_sql: string # SQL with fully qualified table names (schema.table) - preferred for execution 179 | source_tables: [string] # List of input tables referenced (required for dependency analysis) 180 | 181 | # Note: Non-SQL transformation types (python, pyspark, r) are planned for future versions 182 | 183 | # Schema definition 184 | schema: 185 | columns: # Array of column definitions 186 | - name: string # Column name 187 | datatype: string # Data type ("number", "string", "date", etc.) 188 | description: string # Column description 189 | partitioning: [string] # Optional: Partition keys 190 | indexes: # Optional: Array of index definitions 191 | - name: string # Index name (optional, auto-generated if not provided) 192 | columns: [string] # Columns to index 193 | 194 | # Materialization strategy 195 | materialization: 196 | type: string # "table", "view", "incremental", "scd2" 197 | incremental_details: # Required if type is "incremental" 198 | strategy: string # "delete_insert", "append", "merge" 199 | delete_condition: string # SQL condition for delete (delete_insert only) 200 | filter_condition: string # SQL condition for filtering data 201 | merge_key: [string] # Primary key columns for matching records (merge only) 202 | update_columns: [string] # (Optional) List of columns to be updated in merge strategy 203 | scd2_details: # Optional if type is "scd2" 204 | start_column: string # Name of the start column (default: "valid_from") 205 | end_column: string # Name of the end column (default: "valid_to") 206 | unique_key: [string] # Array of columns that uniquely identify a record in SCD2 modeling (optional) 207 | 208 | # Tests: both column-level and table-level 209 | tests: 210 | columns: # Optional: Column-level tests 211 | column_name: # Tests for specific columns 212 | - string # Simple test name (e.g., "not_null", "unique") 213 | - object # Test with parameters: {name: string, params?: object, severity?: "error"|"warning"} 214 | table: # Optional: Table-level tests 215 | - string # Simple test name (e.g., "row_count_gt_0") 216 | - object # Test with parameters: {name: string, params?: object, severity?: "error"|"warning"} 217 | 218 | # Metadata 219 | metadata: 220 | file_path: string # Path to the source transformation file 221 | owner: string # Optional: Person or team responsible (optional) 222 | tags: [string] # Optional: List of string tags for categorization and discovery (e.g., ["analytics", "fct"]) 223 | object_tags: dict # Optional: Dictionary of key-value pairs for database object tagging (e.g., {"sensitivity_tag": "pii", "classification": "public"}) 224 | ``` 225 | 226 | ## Simple Table Transformation 227 | 228 |
229 | JSON Format 230 | 231 | ```json 232 | { 233 | "ots_version": "0.1.0", 234 | "module_name": "analytics_customers", 235 | "module_description": "Customer analytics transformations", 236 | "tags": ["analytics", "production"], 237 | "test_library_path": "../tests/test_library.yaml", 238 | "target": { 239 | "database": "warehouse", 240 | "schema": "analytics", 241 | "sql_dialect": "postgres" 242 | }, 243 | "transformations": [ 244 | { 245 | "transformation_id": "analytics.customers", 246 | "description": "Customer data table", 247 | "transformation_type": "sql", 248 | "code": { 249 | "sql": { 250 | "original_sql": "SELECT id, name, email, created_at FROM source.customers WHERE active = true", 251 | "resolved_sql": "SELECT id, name, email, created_at FROM warehouse.source.customers WHERE active = true", 252 | "source_tables": ["source.customers"] 253 | } 254 | }, 255 | "schema": { 256 | "columns": [ 257 | { 258 | "name": "id", 259 | "datatype": "number", 260 | "description": "Unique customer identifier" 261 | }, 262 | { 263 | "name": "name", 264 | "datatype": "string", 265 | "description": "Customer name" 266 | }, 267 | { 268 | "name": "email", 269 | "datatype": "string", 270 | "description": "Customer email address" 271 | }, 272 | { 273 | "name": "created_at", 274 | "datatype": "date", 275 | "description": "Customer creation date" 276 | } 277 | ], 278 | "partitioning": [], 279 | "indexes": [ 280 | { 281 | "name": "idx_customers_id", 282 | "columns": ["id"] 283 | }, 284 | { 285 | "name": "idx_customers_email", 286 | "columns": ["email"] 287 | } 288 | ] 289 | }, 290 | "materialization": { 291 | "type": "table" 292 | }, 293 | "tests": { 294 | "columns": { 295 | "id": ["not_null", "unique"], 296 | "email": ["not_null", "unique"], 297 | "created_at": ["not_null"] 298 | }, 299 | "table": ["row_count_gt_0"] 300 | }, 301 | "metadata": { 302 | "file_path": "/models/analytics/customers.sql", 303 | "owner": "data-team", 304 | "tags": ["customer", "core"], 305 | "object_tags": { 306 | "sensitivity_tag": "pii", 307 | "classification": "internal" 308 | } 309 | } 310 | } 311 | ] 312 | } 313 | ``` 314 | 315 |
316 | 317 |
318 | YAML Format 319 | 320 | ```yaml 321 | ots_version: "0.1.0" 322 | module_name: "analytics_customers" 323 | module_description: "Customer analytics transformations" 324 | tags: ["analytics", "production"] 325 | 326 | target: 327 | database: "warehouse" 328 | schema: "analytics" 329 | sql_dialect: "postgres" 330 | 331 | transformations: 332 | - transformation_id: "analytics.customers" 333 | description: "Customer data table" 334 | transformation_type: "sql" 335 | 336 | code: 337 | sql: 338 | original_sql: "SELECT id, name, email, created_at FROM source.customers WHERE active = true" 339 | resolved_sql: "SELECT id, name, email, created_at FROM warehouse.source.customers WHERE active = true" 340 | source_tables: ["source.customers"] 341 | 342 | schema: 343 | columns: 344 | - name: "id" 345 | datatype: "number" 346 | description: "Unique customer identifier" 347 | - name: "name" 348 | datatype: "string" 349 | description: "Customer name" 350 | - name: "email" 351 | datatype: "string" 352 | description: "Customer email address" 353 | - name: "created_at" 354 | datatype: "date" 355 | description: "Customer creation date" 356 | partitioning: [] 357 | indexes: 358 | - name: "idx_customers_id" 359 | columns: ["id"] 360 | - name: "idx_customers_email" 361 | columns: ["email"] 362 | 363 | materialization: 364 | type: "table" 365 | 366 | tests: 367 | columns: 368 | id: ["not_null", "unique"] 369 | email: ["not_null", "unique"] 370 | created_at: ["not_null"] 371 | table: ["row_count_gt_0"] 372 | 373 | metadata: 374 | file_path: "/models/analytics/customers.sql" 375 | owner: "data-team" 376 | tags: ["customer", "core"] 377 | object_tags: 378 | sensitivity_tag: "pii" 379 | classification: "internal" 380 | ``` 381 | 382 |
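As an illustration of how a tool might consume this module, the `target` and `schema` sections above are enough to derive DDL for the output table without running the transformation (as noted in the Schema Definition section). The sketch below assumes a PostgreSQL-style mapping of the generic datatypes (`number` → `NUMERIC`, `string` → `TEXT`, `date` → `DATE`); the actual type mapping and qualification rules are tool- and dialect-specific:

```sql
-- Hypothetical DDL an OTS-compliant tool might generate for analytics.customers
-- (target database "warehouse", materialization type "table")
CREATE TABLE analytics.customers (
    id         NUMERIC,  -- Unique customer identifier
    name       TEXT,     -- Customer name
    email      TEXT,     -- Customer email address
    created_at DATE      -- Customer creation date
);

CREATE INDEX idx_customers_id    ON analytics.customers (id);
CREATE INDEX idx_customers_email ON analytics.customers (email);
```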
383 | 384 | ## Materialization Types 385 | 386 | ### Incremental Materialization 387 | 388 | Incremental materialization updates only changed data using one of three strategies: 389 | - **delete_insert**: Deletes rows matching a condition and inserts new data 390 | - **append**: Simply appends new data without removing existing rows 391 | - **merge**: Performs an upsert operation using a merge statement 392 | 393 | #### Delete-Insert Strategy 394 | 395 | ```yaml 396 | materialization: 397 | type: "incremental" 398 | incremental_details: 399 | strategy: "delete_insert" 400 | delete_condition: "to_date(updated_ts) = '@start_date'" 401 | filter_condition: "to_date(updated_ts) = '@start_date'" 402 | ``` 403 | 404 | #### Append Strategy 405 | 406 | ```yaml 407 | materialization: 408 | type: "incremental" 409 | incremental_details: 410 | strategy: "append" 411 | filter_condition: "created_at >= '@start_date'" 412 | ``` 413 | 414 | #### Merge Strategy 415 | 416 | ```yaml 417 | materialization: 418 | type: "incremental" 419 | incremental_details: 420 | strategy: "merge" 421 | merge_key: ["customer_id"] # Primary key columns for matching records 422 | filter_condition: "updated_at >= '@start_date'" 423 | update_columns: ["name", "email"] # Optional: specific columns to update 424 | ``` 425 | 426 | ### SCD2 Materialization 427 | 428 | SCD2 (Slowly Changing Dimension Type 2) materialization tracks historical changes with valid date ranges. It requires a unique key to identify records. 429 | 430 | ```yaml 431 | materialization: 432 | type: "scd2" 433 | scd2_details: 434 | unique_key: ["product_id"] # Primary key or unique identifier 435 | start_column: "valid_from" # Optional, defaults to "valid_from" 436 | end_column: "valid_to" # Optional, defaults to "valid_to" 437 | ``` 438 | 439 | ### Schema Column Definition 440 | 441 | A schema column in an OTD defines the structure and properties of a single column in the output table: 442 | 443 | ```yaml 444 | columns: 445 | - name: string # Column name 446 | datatype: string # Data type ("number", "string", "date", etc.) 447 | description: string # Column description 448 | ``` 449 | 450 | **Common Data Types:** 451 | - `number`: Numeric values 452 | - `string`: Text values 453 | - `date`: Date and timestamp values 454 | - `boolean`: True/false values 455 | - `array`: Array of values 456 | - `object`: Complex nested objects 457 | 458 | ## Data Quality Tests 459 | 460 | Data quality tests are validation rules that ensure the correctness and quality of transformation outputs. Tests can be defined at two levels: 461 | - **Column-level tests**: Applied to individual columns (e.g., `not_null`, `unique`) 462 | - **Table-level tests**: Applied to the entire output (e.g., `row_count_gt_0`, `unique`) 463 | 464 | Tests enable automated data quality validation without manual inspection. OTS supports three types of tests: 465 | 466 | 1. **Standard Tests**: Built-in tests defined in the OTS specification (e.g., `not_null`, `unique`, `row_count_gt_0`) 467 | 2. **Generic SQL Tests**: Reusable SQL tests with placeholders that can be applied to multiple transformations 468 | 3. **Singular SQL Tests**: Table-specific SQL tests with hardcoded table references 469 | 470 | ### Standard Tests 471 | 472 | Standard tests are built into the OTS specification and must be implemented by all OTS-compliant tools. These tests provide common data quality checks that are widely applicable across different transformations. 
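As a rough illustration of the execution convention shared by the tests listed below (a test query returns rows only for violations, so zero rows means the test passes), an OTS-compliant tool might compile a `not_null` test on `analytics.customers.id` into a query along these lines; the exact SQL generated is tool-specific:

```sql
-- Hypothetical check generated for the standard test not_null on analytics.customers.id;
-- any returned row is a violation, so the test fails if the result set is non-empty.
SELECT id
FROM analytics.customers
WHERE id IS NULL;
```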
473 | 474 | #### Column-Level Standard Tests 475 | 476 | **`not_null`** 477 | - **Description**: Ensures a column contains no NULL values 478 | - **Level**: Column 479 | - **Parameters**: None 480 | - **Implementation**: Returns rows where the column is NULL (test fails if any rows returned) 481 | - **Example**: 482 | ```yaml 483 | tests: 484 | columns: 485 | id: ["not_null"] 486 | ``` 487 | 488 | **`unique`** 489 | - **Description**: Ensures column values are unique across all rows 490 | - **Level**: Column or Table 491 | - **Parameters**: 492 | - `columns` (array, optional): For table-level tests, specifies which columns to check for uniqueness. If omitted at table level, checks all columns (entire row uniqueness) 493 | - **Implementation**: Returns duplicate values (test fails if any duplicates found) 494 | - **Examples**: 495 | ```yaml 496 | tests: 497 | columns: 498 | # Column-level: single column uniqueness 499 | id: ["not_null", "unique"] 500 | 501 | table: 502 | # Table-level: composite uniqueness on specific columns 503 | - name: "unique" 504 | params: 505 | columns: ["customer_id", "order_date"] 506 | 507 | # Table-level: entire row uniqueness (all columns) 508 | - "unique" 509 | ``` 510 | 511 | **`accepted_values`** 512 | - **Description**: Ensures column values are within a specified list of acceptable values 513 | - **Level**: Column 514 | - **Parameters**: 515 | - `values` (array, required): List of acceptable values 516 | - **Implementation**: Returns rows where column value is not in the accepted list 517 | - **Example**: 518 | ```yaml 519 | tests: 520 | columns: 521 | status: 522 | - name: "accepted_values" 523 | params: 524 | values: ["active", "inactive", "pending"] 525 | ``` 526 | 527 | **`relationships`** 528 | - **Description**: Ensures referential integrity between tables (foreign key validation) 529 | - **Level**: Column 530 | - **Parameters**: 531 | - `to` (string, required): Target transformation ID (e.g., "analytics.customers") 532 | - `field` (string, required): Column name in the target transformation 533 | - **Implementation**: Returns rows where the column value doesn't exist in the target table's specified field 534 | - **Example**: 535 | ```yaml 536 | tests: 537 | columns: 538 | customer_id: 539 | - name: "relationships" 540 | params: 541 | to: "analytics.customers" 542 | field: "id" 543 | ``` 544 | 545 | #### Table-Level Standard Tests 546 | 547 | **`row_count_gt_0`** 548 | - **Description**: Ensures the table has at least one row 549 | - **Level**: Table 550 | - **Parameters**: None 551 | - **Implementation**: Returns a count result (test fails if count = 0) 552 | - **Example**: 553 | ```yaml 554 | tests: 555 | table: 556 | - "row_count_gt_0" 557 | ``` 558 | 559 | ### Test Libraries 560 | 561 | Test libraries are project-level collections of custom test definitions (generic and singular SQL tests) that can be shared across multiple OTS modules. For a detailed introduction to Test Libraries, see the [Test Library](#test-library) section in Core Concepts. 562 | 563 | #### Test Library Structure 564 | 565 | A test library is a YAML or JSON file that defines reusable test definitions. The file can be named anything (e.g., `test_library.yaml`, `tests.yaml`, `data_quality_tests.json`), but must follow the structure below. 
566 | 567 | **Test Library File Structure:** 568 | ```yaml 569 | # test_library.yaml 570 | test_library_version: string # Optional: Version identifier for the test library (e.g., "1.0", "2.1") 571 | description: string # Optional: Human-readable description of the test library 572 | 573 | generic_tests: 574 | check_minimum_rows: 575 | type: "sql" 576 | level: "table" 577 | description: "Ensures table has minimum number of rows" 578 | sql: | 579 | SELECT 1 as violation 580 | FROM @table_name 581 | GROUP BY 1 582 | HAVING COUNT(*) < @min_rows:10 583 | parameters: 584 | min_rows: 585 | type: "number" 586 | default: 10 587 | description: "Minimum number of rows required" 588 | 589 | column_not_negative: 590 | type: "sql" 591 | level: "column" 592 | description: "Ensures numeric column has no negative values" 593 | sql: | 594 | SELECT @column_name 595 | FROM @table_name 596 | WHERE @column_name < 0 597 | parameters: [] 598 | 599 | singular_tests: 600 | test_customers_email_format: 601 | type: "sql" 602 | level: "table" 603 | description: "Validates email format for customers table" 604 | sql: | 605 | SELECT id, email 606 | FROM analytics.customers 607 | WHERE email NOT LIKE '%@%.%' 608 | target_transformation: "analytics.customers" 609 | ``` 610 | 611 | #### Generic SQL Tests 612 | 613 | Generic SQL tests are reusable tests that use placeholders (variables) to make them applicable to multiple transformations. They follow the dbt pattern where: 614 | - The query returns rows when the test fails 615 | - 0 rows returned = test passes 616 | - 1+ rows returned = test fails 617 | 618 | **Placeholders:** 619 | - `@table_name` or `{{ table_name }}`: Replaced with the fully qualified transformation ID. The `@` syntax is recommended for cleaner SQL. 620 | - `@column_name` or `{{ column_name }}`: Replaced with the column name (for column-level tests). The `@` syntax is recommended. 621 | - Custom parameters: Available as `@parameter_name` or `{{ parameter_name }}` with optional defaults using `@param:default` syntax (e.g., `@min_rows:10`) 622 | 623 | **Structure:** 624 | ```yaml 625 | generic_tests: 626 | test_name: # Required: Unique test name (used for referencing) 627 | type: "sql" # Required: Always "sql" for SQL tests 628 | level: "table" | "column" # Required: Test level 629 | description: string # Optional: Human-readable description 630 | sql: string # Required: SQL query (returns rows on failure) 631 | parameters: # Optional: Parameter definitions 632 | param_name: 633 | type: "number" | "string" | "boolean" | "array" # Required: Parameter type 634 | default: value # Optional: Default value 635 | description: string # Optional: Parameter description 636 | ``` 637 | 638 | **Example Generic Test:** 639 | ```yaml 640 | check_minimum_rows: 641 | type: "sql" 642 | level: "table" 643 | description: "Ensures table has minimum number of rows" 644 | sql: | 645 | SELECT 1 as violation 646 | FROM @table_name 647 | GROUP BY 1 648 | HAVING COUNT(*) < @min_rows:10 649 | parameters: 650 | min_rows: 651 | type: "number" 652 | default: 10 653 | description: "Minimum number of rows required" 654 | ``` 655 | 656 | #### Singular SQL Tests 657 | 658 | Singular SQL tests are table-specific tests with hardcoded table references. 
They are useful for: 659 | - Complex business logic specific to one transformation 660 | - Tests that reference multiple tables 661 | - Table-specific validation rules 662 | 663 | **Structure:** 664 | ```yaml 665 | singular_tests: 666 | test_name: # Required: Unique test name (used for referencing) 667 | type: "sql" # Required: Always "sql" for SQL tests 668 | level: "table" | "column" # Required: Test level 669 | description: string # Optional: Human-readable description 670 | sql: string # Required: SQL query with hardcoded table names 671 | target_transformation: string # Required: Transformation ID this test applies to (used for validation and discovery) 672 | ``` 673 | 674 | **Example Singular Test:** 675 | ```yaml 676 | test_customers_email_format: 677 | type: "sql" 678 | level: "table" 679 | description: "Validates email format for customers table" 680 | sql: | 681 | SELECT id, email 682 | FROM analytics.customers 683 | WHERE email NOT LIKE '%@%.%' 684 | target_transformation: "analytics.customers" 685 | ``` 686 | 687 | ### Referencing Tests in Transformations 688 | 689 | Transformations reference tests from: 690 | 1. **Standard tests**: Referenced by name (e.g., `"not_null"`, `"unique"`) 691 | 2. **Test library tests**: Referenced by name from the test library (e.g., `"check_minimum_rows"`) 692 | 693 | **Module Structure with Test Library Reference:** 694 | 695 | ```yaml 696 | ots_version: "0.1.0" 697 | module_name: "analytics_customers" 698 | test_library_path: "../tests/test_library.yaml" # Optional: Path to test library 699 | 700 | target: 701 | database: "warehouse" 702 | schema: "analytics" 703 | 704 | transformations: 705 | - transformation_id: "analytics.customers" 706 | tests: 707 | columns: 708 | id: 709 | - "not_null" # Standard test 710 | - "unique" # Standard test (column-level) 711 | email: 712 | - "not_null" 713 | - name: "accepted_values" # Standard test with params 714 | params: 715 | values: ["gmail.com", "yahoo.com"] 716 | amount: 717 | - name: "column_not_negative" # Generic test from library 718 | table: 719 | - "row_count_gt_0" # Standard test 720 | - "unique" # Standard test (table-level, checks all columns) 721 | - name: "unique" # Standard test (table-level, composite on specific columns) 722 | params: 723 | columns: ["customer_id", "order_date"] 724 | - name: "check_minimum_rows" # Generic test with params 725 | params: 726 | min_rows: 100 727 | - "test_customers_email_format" # Singular test from library 728 | ``` 729 | 730 | **Test Reference Formats:** 731 | 732 | 1. **Simple string** (standard test, no parameters): 733 | ```yaml 734 | tests: 735 | columns: 736 | id: ["not_null", "unique"] 737 | table: 738 | - "row_count_gt_0" 739 | ``` 740 | 741 | 2. **Object with name** (standard test with parameters): 742 | ```yaml 743 | tests: 744 | columns: 745 | status: 746 | - name: "accepted_values" 747 | params: 748 | values: ["active", "inactive"] 749 | ``` 750 | 751 | 3. **Object with name** (generic/singular test from library): 752 | ```yaml 753 | tests: 754 | table: 755 | - name: "check_minimum_rows" 756 | params: 757 | min_rows: 100 758 | ``` 759 | 760 | ### Test Execution Model 761 | 762 | Tests follow the dbt execution model: 763 | - **0 rows returned** = test passes 764 | - **1+ rows returned** = test fails 765 | 766 | For standard tests, tools generate SQL queries that return violating rows. For SQL tests (generic and singular), the SQL query itself returns rows when violations are found. 
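The placeholder conventions described above (`@table_name`, `@column_name`, `@param:default`) are simple enough to resolve with string substitution. The snippet below is a hedged sketch of how a tool might expand a generic test and evaluate the result; the substitution order and the regular expression are illustrative choices, not requirements of the specification.

```python
import re

def expand_generic_test(sql: str, table: str, column: str | None = None,
                        params: dict | None = None) -> str:
    """Expand @-style placeholders in a generic SQL test (illustrative)."""
    params = dict(params or {})
    # Resolve the built-in placeholders first so they are not mistaken for parameters.
    sql = sql.replace("@table_name", table)
    if column is not None:
        sql = sql.replace("@column_name", column)

    # @param or @param:default -> supplied value, else the inline default.
    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = params.get(name, default)
        if value is None:
            raise ValueError(f"Missing value for parameter '{name}'")
        return str(value)

    return re.sub(r"@(\w+)(?::(\w+))?", substitute, sql)

def test_passes(rows_returned: int) -> bool:
    # dbt-style execution model: 0 rows = pass, 1+ rows = fail.
    return rows_returned == 0

expanded = expand_generic_test(
    "SELECT 1 AS violation FROM @table_name GROUP BY 1 HAVING COUNT(*) < @min_rows:10",
    table="warehouse.analytics.customers",
    params={"min_rows": 100},
)
```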
767 | 768 | **Test Severity:** 769 | - Tests can have a `severity` level: `"error"` (default) or `"warning"` 770 | - `error`: Test failure stops execution and fails the build 771 | - `warning`: Test failure is logged but doesn't stop execution 772 | 773 | **Severity in Test References:** 774 | ```yaml 775 | tests: 776 | columns: 777 | id: 778 | - name: "not_null" 779 | severity: "error" # Default, can be omitted 780 | - name: "unique" 781 | severity: "warning" # Non-blocking 782 | table: 783 | - name: "row_count_gt_0" 784 | severity: "error" # Default, can be omitted 785 | ``` 786 | 787 | ### Inline Test Definitions in OTS Modules 788 | 789 | Generic and singular SQL tests can also be defined directly within an OTS Module, using the same structure as test libraries. This is useful for module-specific tests that don't need to be shared across modules. 790 | 791 | **Module Structure with Inline Tests:** 792 | ```yaml 793 | ots_version: "0.1.0" 794 | module_name: "analytics_customers" 795 | 796 | # Optional: Inline test definitions (same structure as test library) 797 | generic_tests: 798 | check_recent_data: 799 | type: "sql" 800 | level: "table" 801 | description: "Ensures table has recent data" 802 | sql: | 803 | SELECT 1 as violation 804 | FROM @table_name 805 | WHERE updated_at < CURRENT_DATE - INTERVAL '@days:7' DAY 806 | parameters: 807 | days: 808 | type: "number" 809 | default: 7 810 | 811 | singular_tests: 812 | test_customers_specific: 813 | type: "sql" 814 | level: "table" 815 | description: "Module-specific test" 816 | sql: | 817 | SELECT id FROM analytics.customers WHERE status = 'invalid' 818 | target_transformation: "analytics.customers" 819 | 820 | target: 821 | database: "warehouse" 822 | schema: "analytics" 823 | 824 | transformations: 825 | - transformation_id: "analytics.customers" 826 | tests: 827 | table: 828 | - name: "check_recent_data" # References inline generic test 829 | params: 830 | days: 3 831 | - "test_customers_specific" # References inline singular test 832 | ``` 833 | 834 | **Test Resolution Priority:** 835 | When resolving test names, tools should check in the following order: 836 | 1. **Standard tests** (built into OTS specification) 837 | 2. **Inline tests** (defined in the current OTS Module) 838 | 3. **Test library tests** (from referenced test library) 839 | 840 | If a test name exists in multiple locations, the first match takes precedence. This allows modules to override test library tests with module-specific implementations. 841 | 842 | ### Test Library Resolution 843 | 844 | When a transformation module references a test library: 845 | 1. The tool resolves the `test_library_path` (relative to the module file or absolute path) 846 | 2. Loads the test library file (YAML or JSON format) 847 | 3. Validates test definitions 848 | 4. Makes tests available for reference in transformations (after inline tests) 849 | 850 | **Test Discovery:** 851 | - **Standard tests**: Always available, no discovery needed 852 | - **Generic tests**: Discovered from test library or inline module definitions 853 | - **Singular tests**: Discovered from test library or inline module definitions. The `target_transformation` field helps tools validate that the test is applied to the correct transformation. 854 | 855 | If a test is referenced but not found among the Standard tests, inline tests, or Test library, it must result in an error. 856 | 857 | ## Complete Examples: Incremental Strategies 858 | 859 | ### Delete-Insert Example 860 | 861 |
862 | YAML Format 863 | 864 | ```yaml 865 | ots_version: "0.1.0" 866 | transformation_id: "analytics.recent_orders" 867 | description: "Orders updated in the last 7 days" 868 | 869 | transformation_type: "sql" 870 | code: 871 | sql: 872 | original_sql: "SELECT order_id, customer_id, order_date, amount, status FROM source.orders WHERE updated_at >= '@start_date'" 873 | resolved_sql: "SELECT order_id, customer_id, order_date, amount, status FROM warehouse.source.orders WHERE updated_at >= '@start_date'" 874 | source_tables: ["source.orders"] 875 | 876 | schema: 877 | columns: 878 | - name: "order_id" 879 | datatype: "number" 880 | description: "Unique order identifier" 881 | - name: "customer_id" 882 | datatype: "number" 883 | description: "Customer ID" 884 | - name: "order_date" 885 | datatype: "date" 886 | description: "Order date" 887 | - name: "amount" 888 | datatype: "number" 889 | description: "Order amount" 890 | - name: "status" 891 | datatype: "string" 892 | description: "Order status" 893 | partitioning: ["order_date"] 894 | indexes: 895 | - name: "idx_order_id" 896 | columns: ["order_id"] 897 | 898 | materialization: 899 | type: "incremental" 900 | incremental_details: 901 | strategy: "delete_insert" 902 | delete_condition: "to_date(updated_at) = '@start_date'" 903 | filter_condition: "to_date(updated_at) = '@start_date'" 904 | 905 | tests: 906 | columns: 907 | order_id: ["not_null", "unique"] 908 | order_date: ["not_null"] 909 | table: ["row_count_gt_0"] 910 | 911 | metadata: 912 | file_path: "/models/analytics/recent_orders.sql" 913 | owner: "analytics-team" 914 | tags: ["orders", "incremental"] 915 | ``` 916 | 917 |
918 | 919 |
920 | JSON Format 921 | 922 | ```json 923 | { 924 | "ots_version": "0.1.0", 925 | "transformation_id": "analytics.recent_orders", 926 | "description": "Orders updated in the last 7 days", 927 | 928 | "transformation_type": "sql", 929 | "code": { 930 | "sql": { 931 | "original_sql": "SELECT order_id, customer_id, order_date, amount, status FROM source.orders WHERE updated_at >= '@start_date'", 932 | "resolved_sql": "SELECT order_id, customer_id, order_date, amount, status FROM warehouse.source.orders WHERE updated_at >= '@start_date'", 933 | "source_tables": ["source.orders"] 934 | } 935 | }, 936 | 937 | "schema": { 938 | "columns": [ 939 | { 940 | "name": "order_id", 941 | "datatype": "number", 942 | "description": "Unique order identifier" 943 | }, 944 | { 945 | "name": "customer_id", 946 | "datatype": "number", 947 | "description": "Customer ID" 948 | }, 949 | { 950 | "name": "order_date", 951 | "datatype": "date", 952 | "description": "Order date" 953 | }, 954 | { 955 | "name": "amount", 956 | "datatype": "number", 957 | "description": "Order amount" 958 | }, 959 | { 960 | "name": "status", 961 | "datatype": "string", 962 | "description": "Order status" 963 | } 964 | ], 965 | "partitioning": ["order_date"], 966 | "indexes": [ 967 | { 968 | "name": "idx_order_id", 969 | "columns": ["order_id"] 970 | } 971 | ] 972 | }, 973 | 974 | "materialization": { 975 | "type": "incremental", 976 | "incremental_details": { 977 | "strategy": "delete_insert", 978 | "delete_condition": "to_date(updated_at) = '@start_date'", 979 | "filter_condition": "to_date(updated_at) = '@start_date'" 980 | } 981 | }, 982 | 983 | "tests": { 984 | "columns": { 985 | "order_id": ["not_null", "unique"], 986 | "order_date": ["not_null"] 987 | }, 988 | "table": ["row_count_gt_0"] 989 | }, 990 | 991 | "metadata": { 992 | "file_path": "/models/analytics/recent_orders.sql", 993 | "owner": "analytics-team", 994 | "tags": ["orders", "incremental"] 995 | } 996 | } 997 | ``` 998 | 999 |
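For orientation, a delete_insert definition like the one above typically translates into two statements run together: a DELETE driven by `delete_condition`, followed by an INSERT of the transformation query. The sketch below shows one plausible rendering; the fully qualified target table name, the transaction handling, and the binding of `@start_date` are assumptions that vary by tool and SQL dialect.

```python
def render_delete_insert(target_table: str, resolved_sql: str,
                         delete_condition: str, start_date: str) -> list[str]:
    """Illustrative delete+insert rendering for the definition above."""
    def bind(sql: str) -> str:
        return sql.replace("@start_date", start_date)

    # Many tools also push `filter_condition` into the source query; in this example
    # the resolved SQL already filters on '@start_date', so it is used as-is.
    return [
        "BEGIN",
        f"DELETE FROM {target_table} WHERE {bind(delete_condition)}",
        f"INSERT INTO {target_table} {bind(resolved_sql)}",
        "COMMIT",
    ]


statements = render_delete_insert(
    target_table="warehouse.analytics.recent_orders",
    resolved_sql=(
        "SELECT order_id, customer_id, order_date, amount, status "
        "FROM warehouse.source.orders WHERE updated_at >= '@start_date'"
    ),
    delete_condition="to_date(updated_at) = '@start_date'",
    start_date="2024-01-01",
)
```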
1000 | 1001 | ### Append Example 1002 | 1003 |
1004 | YAML Format 1005 | 1006 | ```yaml 1007 | ots_version: "0.1.0" 1008 | transformation_id: "logs.event_stream" 1009 | description: "Append-only event log" 1010 | 1011 | transformation_type: "sql" 1012 | code: 1013 | sql: 1014 | original_sql: "SELECT event_id, timestamp, user_id, event_type, payload FROM source.events WHERE timestamp >= '@start_date'" 1015 | resolved_sql: "SELECT event_id, timestamp, user_id, event_type, payload FROM warehouse.source.events WHERE timestamp >= '@start_date'" 1016 | source_tables: ["source.events"] 1017 | 1018 | schema: 1019 | columns: 1020 | - name: "event_id" 1021 | datatype: "string" 1022 | description: "Unique event identifier" 1023 | - name: "timestamp" 1024 | datatype: "date" 1025 | description: "Event timestamp" 1026 | - name: "user_id" 1027 | datatype: "string" 1028 | description: "User who triggered the event" 1029 | - name: "event_type" 1030 | datatype: "string" 1031 | description: "Type of event" 1032 | - name: "payload" 1033 | datatype: "object" 1034 | description: "Event payload data" 1035 | partitioning: ["timestamp"] 1036 | indexes: 1037 | - name: "idx_timestamp" 1038 | columns: ["timestamp"] 1039 | - name: "idx_user_id" 1040 | columns: ["user_id"] 1041 | 1042 | materialization: 1043 | type: "incremental" 1044 | incremental_details: 1045 | strategy: "append" 1046 | filter_condition: "timestamp >= '@start_date'" 1047 | 1048 | tests: 1049 | columns: 1050 | event_id: ["not_null", "unique"] 1051 | timestamp: ["not_null"] 1052 | table: ["row_count_gt_0"] 1053 | 1054 | metadata: 1055 | file_path: "/models/logs/event_stream.sql" 1056 | owner: "data-engineering" 1057 | tags: ["events", "append-only"] 1058 | ``` 1059 | 1060 |
1061 | 1062 |
1063 | JSON Format 1064 | 1065 | ```json 1066 | { 1067 | "ots_version": "0.1.0", 1068 | "transformation_id": "logs.event_stream", 1069 | "description": "Append-only event log", 1070 | 1071 | "transformation_type": "sql", 1072 | "code": { 1073 | "sql": { 1074 | "original_sql": "SELECT event_id, timestamp, user_id, event_type, payload FROM source.events WHERE timestamp >= '@start_date'", 1075 | "resolved_sql": "SELECT event_id, timestamp, user_id, event_type, payload FROM warehouse.source.events WHERE timestamp >= '@start_date'", 1076 | "source_tables": ["source.events"] 1077 | } 1078 | }, 1079 | 1080 | "schema": { 1081 | "columns": [ 1082 | { 1083 | "name": "event_id", 1084 | "datatype": "string", 1085 | "description": "Unique event identifier" 1086 | }, 1087 | { 1088 | "name": "timestamp", 1089 | "datatype": "date", 1090 | "description": "Event timestamp" 1091 | }, 1092 | { 1093 | "name": "user_id", 1094 | "datatype": "string", 1095 | "description": "User who triggered the event" 1096 | }, 1097 | { 1098 | "name": "event_type", 1099 | "datatype": "string", 1100 | "description": "Type of event" 1101 | }, 1102 | { 1103 | "name": "payload", 1104 | "datatype": "object", 1105 | "description": "Event payload data" 1106 | } 1107 | ], 1108 | "partitioning": ["timestamp"], 1109 | "indexes": [ 1110 | { 1111 | "name": "idx_timestamp", 1112 | "columns": ["timestamp"] 1113 | }, 1114 | { 1115 | "name": "idx_user_id", 1116 | "columns": ["user_id"] 1117 | } 1118 | ] 1119 | }, 1120 | 1121 | "materialization": { 1122 | "type": "incremental", 1123 | "incremental_details": { 1124 | "strategy": "append", 1125 | "filter_condition": "timestamp >= '@start_date'" 1126 | } 1127 | }, 1128 | 1129 | "tests": { 1130 | "columns": { 1131 | "event_id": ["not_null", "unique"], 1132 | "timestamp": ["not_null"] 1133 | }, 1134 | "table": ["row_count_gt_0"] 1135 | }, 1136 | 1137 | "metadata": { 1138 | "file_path": "/models/logs/event_stream.sql", 1139 | "owner": "data-engineering", 1140 | "tags": ["events", "append-only"] 1141 | } 1142 | } 1143 | ``` 1144 | 1145 |
1146 | 1147 | ### Merge Example 1148 | 1149 |
1150 | YAML Format 1151 | 1152 | ```yaml 1153 | ots_version: "0.1.0" 1154 | transformation_id: "product.master_data" 1155 | description: "Customer master data with upsert logic" 1156 | 1157 | transformation_type: "sql" 1158 | code: 1159 | sql: 1160 | original_sql: "SELECT customer_id, name, email, phone, updated_at FROM source.customers WHERE updated_at >= '@start_date'" 1161 | resolved_sql: "SELECT customer_id, name, email, phone, updated_at FROM warehouse.source.customers WHERE updated_at >= '@start_date'" 1162 | source_tables: ["source.customers"] 1163 | 1164 | schema: 1165 | columns: 1166 | - name: "customer_id" 1167 | datatype: "number" 1168 | description: "Unique customer identifier" 1169 | - name: "name" 1170 | datatype: "string" 1171 | description: "Customer name" 1172 | - name: "email" 1173 | datatype: "string" 1174 | description: "Customer email" 1175 | - name: "phone" 1176 | datatype: "string" 1177 | description: "Customer phone number" 1178 | - name: "updated_at" 1179 | datatype: "date" 1180 | description: "Last update timestamp" 1181 | partitioning: [] 1182 | indexes: 1183 | - name: "idx_customer_id" 1184 | columns: ["customer_id"] 1185 | - name: "idx_email" 1186 | columns: ["email"] 1187 | 1188 | materialization: 1189 | type: "incremental" 1190 | incremental_details: 1191 | strategy: "merge" 1192 | filter_condition: "updated_at >= '@start_date'" 1193 | merge_key: ["customer_id"] 1194 | update_columns: ["name", "email", "phone", "updated_at"] 1195 | 1196 | tests: 1197 | columns: 1198 | customer_id: ["not_null", "unique"] 1199 | email: ["not_null"] 1200 | table: ["row_count_gt_0", "unique"] # unique at table level checks all columns for row uniqueness 1201 | 1202 | metadata: 1203 | file_path: "/models/product/master_data.sql" 1204 | owner: "product-team" 1205 | tags: ["customers", "master-data"] 1206 | ``` 1207 | 1208 |
1209 | 1210 |
1211 | JSON Format 1212 | 1213 | ```json 1214 | { 1215 | "ots_version": "0.1.0", 1216 | "transformation_id": "product.master_data", 1217 | "description": "Customer master data with upsert logic", 1218 | 1219 | "transformation_type": "sql", 1220 | "code": { 1221 | "sql": { 1222 | "original_sql": "SELECT customer_id, name, email, phone, updated_at FROM source.customers WHERE updated_at >= '@start_date'", 1223 | "resolved_sql": "SELECT customer_id, name, email, phone, updated_at FROM warehouse.source.customers WHERE updated_at >= '@start_date'", 1224 | "source_tables": ["source.customers"] 1225 | } 1226 | }, 1227 | 1228 | "schema": { 1229 | "columns": [ 1230 | { 1231 | "name": "customer_id", 1232 | "datatype": "number", 1233 | "description": "Unique customer identifier" 1234 | }, 1235 | { 1236 | "name": "name", 1237 | "datatype": "string", 1238 | "description": "Customer name" 1239 | }, 1240 | { 1241 | "name": "email", 1242 | "datatype": "string", 1243 | "description": "Customer email" 1244 | }, 1245 | { 1246 | "name": "phone", 1247 | "datatype": "string", 1248 | "description": "Customer phone number" 1249 | }, 1250 | { 1251 | "name": "updated_at", 1252 | "datatype": "date", 1253 | "description": "Last update timestamp" 1254 | } 1255 | ], 1256 | "partitioning": [], 1257 | "indexes": [ 1258 | { 1259 | "name": "idx_customer_id", 1260 | "columns": ["customer_id"] 1261 | }, 1262 | { 1263 | "name": "idx_email", 1264 | "columns": ["email"] 1265 | } 1266 | ] 1267 | }, 1268 | 1269 | "materialization": { 1270 | "type": "incremental", 1271 | "incremental_details": { 1272 | "strategy": "merge", 1273 | "filter_condition": "updated_at >= '@start_date'", 1274 | "merge_key": ["customer_id"], 1275 | "update_columns": ["name", "email", "phone", "updated_at"] 1276 | } 1277 | }, 1278 | 1279 | "tests": { 1280 | "columns": { 1281 | "customer_id": ["not_null", "unique"], 1282 | "email": ["not_null"] 1283 | }, 1284 | "table": ["row_count_gt_0", "unique"] 1285 | }, 1286 | 1287 | "metadata": { 1288 | "file_path": "/models/product/master_data.sql", 1289 | "owner": "product-team", 1290 | "tags": ["customers", "master-data"] 1291 | } 1292 | } 1293 | ``` 1294 | 1295 |
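On engines that support a MERGE statement, the definition above has a fairly direct translation. The following sketch is illustrative only: using `merge_key` for the ON clause and `update_columns` for the UPDATE SET list is a reasonable reading of the fields, but the exact SQL is dialect-specific and not prescribed by OTS.

```python
def render_merge(target_table: str, resolved_sql: str, merge_key: list[str],
                 update_columns: list[str], insert_columns: list[str],
                 start_date: str) -> str:
    """Illustrative MERGE rendering for the incremental 'merge' strategy above."""
    source_sql = resolved_sql.replace("@start_date", start_date)
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in merge_key)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_columns)
    cols = ", ".join(insert_columns)
    vals = ", ".join(f"s.{c}" for c in insert_columns)
    return (
        f"MERGE INTO {target_table} AS t USING ({source_sql}) AS s ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )


merge_sql = render_merge(
    target_table="warehouse.product.master_data",
    resolved_sql=(
        "SELECT customer_id, name, email, phone, updated_at "
        "FROM warehouse.source.customers WHERE updated_at >= '@start_date'"
    ),
    merge_key=["customer_id"],
    update_columns=["name", "email", "phone", "updated_at"],
    insert_columns=["customer_id", "name", "email", "phone", "updated_at"],
    start_date="2024-01-01",
)
```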
1296 | 1297 | ### SCD2 Example 1298 | 1299 |
1300 | YAML Format 1301 | 1302 | ```yaml 1303 | ots_version: "0.1.0" 1304 | transformation_id: "dim.products_scd2" 1305 | description: "Product dimension with full history tracking" 1306 | 1307 | transformation_type: "sql" 1308 | code: 1309 | sql: 1310 | original_sql: "SELECT product_id, product_name, price, category, updated_at FROM source.products WHERE updated_at >= '@start_date'" 1311 | resolved_sql: "SELECT product_id, product_name, price, category, updated_at FROM warehouse.source.products WHERE updated_at >= '@start_date'" 1312 | source_tables: ["source.products"] 1313 | 1314 | schema: 1315 | columns: 1316 | - name: "product_id" 1317 | datatype: "number" 1318 | description: "Unique product identifier" 1319 | - name: "product_name" 1320 | datatype: "string" 1321 | description: "Product name" 1322 | - name: "price" 1323 | datatype: "number" 1324 | description: "Product price" 1325 | - name: "category" 1326 | datatype: "string" 1327 | description: "Product category" 1328 | - name: "updated_at" 1329 | datatype: "date" 1330 | description: "Last update timestamp" 1331 | - name: "valid_from" 1332 | datatype: "date" 1333 | description: "Record validity start date" 1334 | - name: "valid_to" 1335 | datatype: "date" 1336 | description: "Record validity end date" 1337 | partitioning: [] 1338 | indexes: 1339 | - name: "idx_product_id" 1340 | columns: ["product_id"] 1341 | - name: "idx_valid_from" 1342 | columns: ["valid_from"] 1343 | 1344 | materialization: 1345 | type: "scd2" 1346 | scd2_details: 1347 | unique_key: ["product_id"] 1348 | start_column: "valid_from" 1349 | end_column: "valid_to" 1350 | 1351 | tests: 1352 | columns: 1353 | product_id: ["not_null", "unique"] 1354 | valid_from: ["not_null"] 1355 | table: ["row_count_gt_0"] 1356 | 1357 | metadata: 1358 | file_path: "/models/dim/products_scd2.sql" 1359 | owner: "data-engineering" 1360 | tags: ["products", "scd2", "dimension"] 1361 | ``` 1362 | 1363 |
1364 | 1365 |
1366 | JSON Format 1367 | 1368 | ```json 1369 | { 1370 | "ots_version": "0.1.0", 1371 | "transformation_id": "dim.products_scd2", 1372 | "description": "Product dimension with full history tracking", 1373 | 1374 | "transformation_type": "sql", 1375 | "code": { 1376 | "sql": { 1377 | "original_sql": "SELECT product_id, product_name, price, category, updated_at FROM source.products WHERE updated_at >= '@start_date'", 1378 | "resolved_sql": "SELECT product_id, product_name, price, category, updated_at FROM warehouse.source.products WHERE updated_at >= '@start_date'", 1379 | "source_tables": ["source.products"] 1380 | } 1381 | }, 1382 | 1383 | "schema": { 1384 | "columns": [ 1385 | { 1386 | "name": "product_id", 1387 | "datatype": "number", 1388 | "description": "Unique product identifier" 1389 | }, 1390 | { 1391 | "name": "product_name", 1392 | "datatype": "string", 1393 | "description": "Product name" 1394 | }, 1395 | { 1396 | "name": "price", 1397 | "datatype": "number", 1398 | "description": "Product price" 1399 | }, 1400 | { 1401 | "name": "category", 1402 | "datatype": "string", 1403 | "description": "Product category" 1404 | }, 1405 | { 1406 | "name": "updated_at", 1407 | "datatype": "date", 1408 | "description": "Last update timestamp" 1409 | }, 1410 | { 1411 | "name": "valid_from", 1412 | "datatype": "date", 1413 | "description": "Record validity start date" 1414 | }, 1415 | { 1416 | "name": "valid_to", 1417 | "datatype": "date", 1418 | "description": "Record validity end date" 1419 | } 1420 | ], 1421 | "partitioning": [], 1422 | "indexes": [ 1423 | { 1424 | "name": "idx_product_id", 1425 | "columns": ["product_id"] 1426 | }, 1427 | { 1428 | "name": "idx_valid_from", 1429 | "columns": ["valid_from"] 1430 | } 1431 | ] 1432 | }, 1433 | 1434 | "materialization": { 1435 | "type": "scd2", 1436 | "scd2_details": { 1437 | "unique_key": ["product_id"], 1438 | "start_column": "valid_from", 1439 | "end_column": "valid_to" 1440 | } 1441 | }, 1442 | 1443 | "tests": { 1444 | "columns": { 1445 | "product_id": ["not_null", "unique"], 1446 | "valid_from": ["not_null"] 1447 | }, 1448 | "table": ["row_count_gt_0"] 1449 | }, 1450 | 1451 | "metadata": { 1452 | "file_path": "/models/dim/products_scd2.sql", 1453 | "owner": "data-engineering", 1454 | "tags": ["products", "scd2", "dimension"] 1455 | } 1456 | } 1457 | ``` 1458 | 1459 |
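For SCD2, the specification only declares the unique key and the validity columns; how rows are versioned is left to the tool. As a rough, hedged sketch of the usual pattern (close the current version of a changed record, then insert the new version), a tool might emit something along these lines. The change-detection logic, the PostgreSQL-flavoured `UPDATE ... FROM` form, and the use of NULL for the open-ended `valid_to` value are illustrative assumptions.

```python
def render_scd2(target_table: str, source_sql: str, unique_key: list[str],
                tracked_columns: list[str], start_column: str = "valid_from",
                end_column: str = "valid_to") -> list[str]:
    """Illustrative two-step SCD2 load for the definition above."""
    key_join = " AND ".join(f"t.{k} = s.{k}" for k in unique_key)
    changed = " OR ".join(f"t.{c} IS DISTINCT FROM s.{c}" for c in tracked_columns)
    select_cols = ", ".join(unique_key + tracked_columns)
    return [
        # Step 1: close the currently valid row for records whose tracked columns changed.
        f"UPDATE {target_table} t SET {end_column} = CURRENT_DATE "
        f"FROM ({source_sql}) s WHERE {key_join} AND t.{end_column} IS NULL AND ({changed})",
        # Step 2: insert a new open-ended version for new or changed records.
        f"INSERT INTO {target_table} ({select_cols}, {start_column}, {end_column}) "
        f"SELECT {select_cols}, CURRENT_DATE, NULL FROM ({source_sql}) s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {target_table} t "
        f"WHERE {key_join} AND t.{end_column} IS NULL)",
    ]
```

Some implementations use a far-future date instead of NULL for open-ended rows, or add a current-row flag; the specification text above does not prescribe either choice.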
1460 | 1461 | ## Complete Example: Test Library and Module 1462 | 1463 | This example demonstrates a complete setup with a test library and a transformation module that uses both standard and custom tests. 1464 | 1465 | ### Test Library Example 1466 | 1467 |
1468 | YAML Format 1469 | 1470 | ```yaml 1471 | # tests/test_library.yaml 1472 | test_library_version: "1.0" 1473 | description: "Shared data quality tests for analytics project" 1474 | 1475 | generic_tests: 1476 | check_minimum_rows: 1477 | type: "sql" 1478 | level: "table" 1479 | description: "Ensures table has minimum number of rows" 1480 | sql: | 1481 | SELECT 1 as violation 1482 | FROM @table_name 1483 | GROUP BY 1 1484 | HAVING COUNT(*) < @min_rows:10 1485 | parameters: 1486 | min_rows: 1487 | type: "number" 1488 | default: 10 1489 | description: "Minimum number of rows required" 1490 | 1491 | column_not_negative: 1492 | type: "sql" 1493 | level: "column" 1494 | description: "Ensures numeric column has no negative values" 1495 | sql: | 1496 | SELECT @column_name 1497 | FROM @table_name 1498 | WHERE @column_name < 0 1499 | parameters: [] 1500 | 1501 | singular_tests: 1502 | test_customers_email_format: 1503 | type: "sql" 1504 | level: "table" 1505 | description: "Validates email format for customers table" 1506 | sql: | 1507 | SELECT id, email 1508 | FROM analytics.customers 1509 | WHERE email NOT LIKE '%@%.%' 1510 | target_transformation: "analytics.customers" 1511 | ``` 1512 | 1513 |
1514 | 1515 |
1516 | JSON Format 1517 | 1518 | ```json 1519 | { 1520 | "test_library_version": "1.0", 1521 | "description": "Shared data quality tests for analytics project", 1522 | "generic_tests": { 1523 | "check_minimum_rows": { 1524 | "type": "sql", 1525 | "level": "table", 1526 | "description": "Ensures table has minimum number of rows", 1527 | "sql": "SELECT 1 as violation\nFROM @table_name\nGROUP BY 1\nHAVING COUNT(*) < @min_rows:10", 1528 | "parameters": { 1529 | "min_rows": { 1530 | "type": "number", 1531 | "default": 10, 1532 | "description": "Minimum number of rows required" 1533 | } 1534 | } 1535 | }, 1536 | "column_not_negative": { 1537 | "type": "sql", 1538 | "level": "column", 1539 | "description": "Ensures numeric column has no negative values", 1540 | "sql": "SELECT @column_name\nFROM @table_name\nWHERE @column_name < 0", 1541 | "parameters": [] 1542 | } 1543 | }, 1544 | "singular_tests": { 1545 | "test_customers_email_format": { 1546 | "type": "sql", 1547 | "level": "table", 1548 | "description": "Validates email format for customers table", 1549 | "sql": "SELECT id, email\nFROM analytics.customers\nWHERE email NOT LIKE '%@%.%'", 1550 | "target_transformation": "analytics.customers" 1551 | } 1552 | } 1553 | } 1554 | ``` 1555 | 1556 |
1557 | 1558 | ### Module Using Test Library 1559 | 1560 |
1561 | YAML Format 1562 | 1563 | ```yaml 1564 | ots_version: "0.1.0" 1565 | module_name: "analytics_customers" 1566 | module_description: "Customer analytics transformations" 1567 | test_library_path: "../tests/test_library.yaml" 1568 | tags: ["analytics", "production"] 1569 | 1570 | target: 1571 | database: "warehouse" 1572 | schema: "analytics" 1573 | sql_dialect: "postgres" 1574 | 1575 | transformations: 1576 | - transformation_id: "analytics.customers" 1577 | description: "Customer data table" 1578 | transformation_type: "sql" 1579 | 1580 | code: 1581 | sql: 1582 | original_sql: "SELECT id, name, email, created_at, amount FROM source.customers WHERE active = true" 1583 | resolved_sql: "SELECT id, name, email, created_at, amount FROM warehouse.source.customers WHERE active = true" 1584 | source_tables: ["source.customers"] 1585 | 1586 | schema: 1587 | columns: 1588 | - name: "id" 1589 | datatype: "number" 1590 | description: "Unique customer identifier" 1591 | - name: "name" 1592 | datatype: "string" 1593 | description: "Customer name" 1594 | - name: "email" 1595 | datatype: "string" 1596 | description: "Customer email address" 1597 | - name: "created_at" 1598 | datatype: "date" 1599 | description: "Customer creation date" 1600 | - name: "amount" 1601 | datatype: "number" 1602 | description: "Customer account balance" 1603 | partitioning: [] 1604 | indexes: 1605 | - name: "idx_customers_id" 1606 | columns: ["id"] 1607 | - name: "idx_customers_email" 1608 | columns: ["email"] 1609 | 1610 | materialization: 1611 | type: "table" 1612 | 1613 | tests: 1614 | columns: 1615 | id: 1616 | - "not_null" # Standard test 1617 | - "unique" # Standard test (column-level) 1618 | email: 1619 | - "not_null" 1620 | - name: "accepted_values" # Standard test with params 1621 | params: 1622 | values: ["gmail.com", "yahoo.com", "company.com"] 1623 | amount: 1624 | - name: "column_not_negative" # Generic test from library 1625 | table: 1626 | - "row_count_gt_0" # Standard test 1627 | - "unique" # Standard test (table-level, checks all columns for row uniqueness) 1628 | - name: "check_minimum_rows" # Generic test with params 1629 | params: 1630 | min_rows: 100 1631 | - "test_customers_email_format" # Singular test from library 1632 | 1633 | metadata: 1634 | file_path: "/models/analytics/customers.sql" 1635 | owner: "data-team" 1636 | tags: ["customer", "core"] 1637 | object_tags: 1638 | sensitivity_tag: "pii" 1639 | classification: "internal" 1640 | ``` 1641 | 1642 |
1643 | 1644 |
1645 | JSON Format 1646 | 1647 | ```json 1648 | { 1649 | "ots_version": "0.1.0", 1650 | "module_name": "analytics_customers", 1651 | "module_description": "Customer analytics transformations", 1652 | "test_library_path": "../tests/test_library.yaml", 1653 | "tags": ["analytics", "production"], 1654 | "target": { 1655 | "database": "warehouse", 1656 | "schema": "analytics", 1657 | "sql_dialect": "postgres" 1658 | }, 1659 | "transformations": [ 1660 | { 1661 | "transformation_id": "analytics.customers", 1662 | "description": "Customer data table", 1663 | "transformation_type": "sql", 1664 | "code": { 1665 | "sql": { 1666 | "original_sql": "SELECT id, name, email, created_at, amount FROM source.customers WHERE active = true", 1667 | "resolved_sql": "SELECT id, name, email, created_at, amount FROM warehouse.source.customers WHERE active = true", 1668 | "source_tables": ["source.customers"] 1669 | } 1670 | }, 1671 | "schema": { 1672 | "columns": [ 1673 | { 1674 | "name": "id", 1675 | "datatype": "number", 1676 | "description": "Unique customer identifier" 1677 | }, 1678 | { 1679 | "name": "name", 1680 | "datatype": "string", 1681 | "description": "Customer name" 1682 | }, 1683 | { 1684 | "name": "email", 1685 | "datatype": "string", 1686 | "description": "Customer email address" 1687 | }, 1688 | { 1689 | "name": "created_at", 1690 | "datatype": "date", 1691 | "description": "Customer creation date" 1692 | }, 1693 | { 1694 | "name": "amount", 1695 | "datatype": "number", 1696 | "description": "Customer account balance" 1697 | } 1698 | ], 1699 | "partitioning": [], 1700 | "indexes": [ 1701 | { 1702 | "name": "idx_customers_id", 1703 | "columns": ["id"] 1704 | }, 1705 | { 1706 | "name": "idx_customers_email", 1707 | "columns": ["email"] 1708 | } 1709 | ] 1710 | }, 1711 | "materialization": { 1712 | "type": "table" 1713 | }, 1714 | "tests": { 1715 | "columns": { 1716 | "id": ["not_null", "unique"], 1717 | "email": [ 1718 | "not_null", 1719 | { 1720 | "name": "accepted_values", 1721 | "params": { 1722 | "values": ["gmail.com", "yahoo.com", "company.com"] 1723 | } 1724 | } 1725 | ], 1726 | "amount": [ 1727 | { 1728 | "name": "column_not_negative" 1729 | } 1730 | ] 1731 | }, 1732 | "table": [ 1733 | "row_count_gt_0", 1734 | "unique", 1735 | { 1736 | "name": "check_minimum_rows", 1737 | "params": { 1738 | "min_rows": 100 1739 | } 1740 | }, 1741 | "test_customers_email_format" 1742 | ] 1743 | }, 1744 | "metadata": { 1745 | "file_path": "/models/analytics/customers.sql", 1746 | "owner": "data-team", 1747 | "tags": ["customer", "core"], 1748 | "object_tags": { 1749 | "sensitivity_tag": "pii", 1750 | "classification": "internal" 1751 | } 1752 | } 1753 | } 1754 | ] 1755 | } 1756 | ``` 1757 | 1758 |
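Tying the example together, the test references in this module resolve against three sources in the priority order described earlier: standard tests first, then inline tests, then the referenced test library. A small, hedged sketch of that lookup is shown below; the data structures are illustrative, not part of the specification.

```python
STANDARD_TESTS = {"not_null", "unique", "accepted_values", "relationships", "row_count_gt_0"}

def resolve_test(name: str, module: dict, test_library: dict) -> tuple[str, dict | None]:
    """Return (source, definition) for a referenced test name (illustrative)."""
    if name in STANDARD_TESTS:
        return "standard", None
    for scope, source in ((module, "inline"), (test_library, "library")):
        for section in ("generic_tests", "singular_tests"):
            definition = (scope or {}).get(section, {}).get(name)
            if definition is not None:
                return source, definition
    raise LookupError(f"Test '{name}' not found in standard, inline, or library tests")

# resolve_test("check_minimum_rows", module, test_library) -> ("library", {...})
# resolve_test("row_count_gt_0", module, test_library)     -> ("standard", None)
```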
1759 | -------------------------------------------------------------------------------- /versions/v0.2.0/transformation-specification.md: -------------------------------------------------------------------------------- 1 | # Open Transformation Specification v0.2.0 2 | 3 | ## Table of Contents 4 | 5 | 1. [Introduction](#introduction) 6 | 2. [Core Concepts](#core-concepts) 7 | 3. [Open Transformation Definition (OTD) Structure](#open-transformation-definition-otd-structure) 8 | 4. [Materialization Types](#materialization-types) 9 | 5. [Data Quality Tests](#data-quality-tests) 10 | 6. [User-Defined Functions (UDFs)](#user-defined-functions-udfs) 11 | 7. [Examples](#examples) 12 | 13 | ## Introduction 14 | 15 | The Open Transformation Specification (OTS) defines a standard, programming language-agnostic interface description for data transformations, data quality tests, and user-defined functions (UDFs). This specification enables **interoperability** between tools and platforms, shifting the data transformation ecosystem from isolated, proprietary tools to an **open core** where tools can seamlessly work together around a shared specification. 16 | 17 | This specification allows both humans and computers to discover and understand how transformations behave, what outputs they produce, and how those outputs are materialized (as tables, views, incremental updates, SCD2, etc.) without requiring additional documentation or configuration. By providing a common standard, OTS ensures that transformations defined in one tool can be consumed, understood, and executed by any OTS-compliant tool. 18 | 19 | The OTS standard encompasses three types of artifacts: **Open Transformation Definitions (OTDs)** for transformations, **UDF Definitions** for user-defined functions, and **Test Definitions** for data quality tests. Together, these form the complete set of **OTS Artifacts** that can be defined and managed within an OTS Module. 20 | 21 | An OTS-based transformation must include both the code that transforms the data and metadata about the transformation. A tool implementing OTS should be able to execute an OTS transformation with no additional code or information beyond what's specified in the OTS document. This **interoperability** ensures that the transformation ecosystem can grow organically, with tools building on each other's capabilities rather than creating isolated silos. 22 | 23 | ## Core Concepts 24 | 25 | ### OTS Artifacts 26 | 27 | **OTS Artifacts** is the umbrella term for all concrete instances of the Open Transformation Specification. The OTS standard defines three types of artifacts: 28 | 29 | 1. **Open Transformation Definition (OTD)**: A structured definition that describes a specific data transformation 30 | 2. **UDF Definition**: A structured definition that describes a user-defined function 31 | 3. **Test Definition**: A structured definition that describes a data quality test 32 | 33 | All OTS Artifacts follow the OTS format and can be defined within an OTS Module. Together, they form a complete data transformation pipeline with reusable functions and quality validation. 34 | 35 | ### Open Transformation Definition (OTD) 36 | 37 | An **Open Transformation Definition (OTD)** is a concrete instance of the Open Transformation Specification that describes a specific data transformation using the OTS format. An OTD exists as a structured definition within an OTS Module, which is the file or document that contains one or more transformation definitions. 
38 | 39 | A transformation is a unit of data processing that takes one or more data sources as input and produces one data output. Right now, transformations are SQL queries, but we plan to add support for other programming languages in the future. 40 | 41 | ### Open Transformation Specification Module 42 | 43 | An **Open Transformation Specification Module (OTS Module)** is a collection of related OTS Artifacts (transformations, UDFs, and tests) that target the same database and schema. An OTS Module can contain one or more transformations, UDF definitions, and test definitions, much like how an OpenAPI specification can contain multiple endpoints. 44 | 45 | Key characteristics of an OTS Module: 46 | - **Single target**: All transformations in a module target the same database and schema 47 | - **Logical grouping**: Related transformations are organized together 48 | - **Deployment unit**: The entire module can be deployed as a single unit 49 | 50 | ### Test Library 51 | 52 | A **Test Library** is a project-level collection of reusable Test Definitions (generic and singular SQL tests) that can be shared across multiple OTS modules. Test libraries are defined separately from transformation modules and are referenced by modules that need to use them. 53 | 54 | Key characteristics of a Test Library: 55 | - **Project-level scope**: Test libraries are defined at the project/workspace level, separate from OTS modules 56 | - **Reusability**: Test Definitions in a library can be referenced by any OTS module in the project 57 | - **Test types**: Contains both generic SQL tests (with placeholders) and singular SQL tests (table-specific) 58 | - **Optional**: Modules can define tests inline or reference a test library, or both 59 | 60 | ### UDF Definition 61 | 62 | A **UDF Definition** is a concrete instance of the Open Transformation Specification that describes a user-defined function using the OTS format. A UDF Definition exists as a structured definition within an OTS Module, defining a custom function that can be called within SQL transformations. 63 | 64 | UDF Definitions include the function's signature (parameters and return type), implementation code, dependencies, and metadata. They enable reusable business logic and calculations that can be shared across multiple transformations. 65 | 66 | ### Test Definition 67 | 68 | A **Test Definition** is a concrete instance of the Open Transformation Specification that describes a data quality test using the OTS format. Test Definitions can exist in two contexts: 69 | 70 | 1. **Within a Test Library**: Reusable test definitions (generic and singular SQL tests) that can be shared across multiple OTS modules 71 | 2. **Inline within an OTS Module**: Module-specific test definitions that are defined directly in the transformation module 72 | 73 | Test Definitions include the test logic (SQL queries for generic/singular tests, or test type for standard tests), parameters, target scope (table or column level), and metadata. They enable automated data quality validation without manual inspection. 
#### OTS vs OTD vs OTS Module vs Test Library vs OTS Artifacts

- **Open Transformation Specification (OTS)**: The standard that defines the structure and rules
- **OTS Artifacts**: The umbrella term for all concrete instances of OTS (OTDs, UDF Definitions, and Test Definitions)
- **Open Transformation Definition (OTD)**: A specific transformation within a module
- **UDF Definition**: A specific user-defined function within a module
- **Test Definition**: A specific data quality test (in a Test Library or inline in a module)
- **Open Transformation Specification Module (OTS Module)**: A collection of related OTS Artifacts targeting the same database and schema
- **Test Library**: A project-level collection of reusable Test Definitions that can be shared across modules

## Components of an OTD

An Open Transformation Definition consists of several key components that work together to define an executable transformation:

1. **Transformation Code**: The transformation logic (SQL, Python, PySpark, etc.) stored in a type-based structure
2. **Schema Definition**: The structure of the output data including column definitions, types, and validation rules
3. **Materialization Strategy**: How the output is stored and updated (table, view, incremental, SCD2)
4. **Tests**: Validation rules that ensure data quality at column and table level
5. **Metadata**: Additional information about the transformation (owner, tags, creation date, etc.)

### Transformation Code

Transformations can be written in different languages (SQL, Python, PySpark, etc.). The transformation code is stored in a type-based structure that supports multiple transformation types while maintaining a consistent interface.

#### SQL Transformations

For SQL transformations, the code is stored with the following structure:
- `original_sql`: The original SQL query as written (typically a SELECT statement). This preserves the original transformation code as authored.
- `resolved_sql`: SQL with fully qualified table names (schema.table format). This is the preferred version for execution as it eliminates ambiguity in table references. Tools should use `resolved_sql` when executing transformations.
- `source_tables`: List of input tables referenced in the query (required for dependency analysis)
- `source_functions`: List of user-defined functions (UDFs) called in the query, used for dependency analysis. This field is optional and may be empty if no user-defined functions are used. Function names should be fully qualified (schema.function_name) when possible, or unqualified if the function is resolved by the database.

**When to use each:**
- Use `original_sql` for: displaying the original code to users, version control, understanding the transformation logic
- Use `resolved_sql` for: actual execution, dependency resolution, cross-database compatibility
- Use `source_tables` and `source_functions` for: dependency graph building, execution order determination, and understanding transformation dependencies (a sketch follows at the end of this section)

#### Non-SQL Transformations

Support for non-SQL transformation types (Python, PySpark, R, etc.) is planned for future versions of the specification. The current v0.2.0 specification focuses on SQL transformations and adds support for user-defined functions (UDFs).
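As a concrete illustration of the dependency-analysis use case, the sketch below derives an execution order for a module's transformations from their `source_tables` (and, where present, `source_functions`). It is a minimal example under the assumption that `transformation_id` values and source table names share the same `schema.table` naming; real tools typically add cycle detection, cross-module resolution, and UDF ordering.

```python
from graphlib import TopologicalSorter

def execution_order(module: dict) -> list[str]:
    """Order transformations so that upstream OTDs run before their consumers (sketch)."""
    ids = {t["transformation_id"] for t in module["transformations"]}
    graph = {}
    for t in module["transformations"]:
        sql = t["code"]["sql"]
        # Only dependencies that are themselves defined in this module are ordered here;
        # everything else (external sources, database objects) is treated as already present.
        graph[t["transformation_id"]] = {
            dep for dep in sql.get("source_tables", []) + sql.get("source_functions", [])
            if dep in ids
        }
    return list(TopologicalSorter(graph).static_order())
```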
115 | 116 | ### Schema Definition 117 | 118 | Schema defines the structure of the output data, including column names, data types, descriptions, partitioning, indexes, and other properties of the physical table. The schema is essential for understanding what the transformation produces without executing it. For example, it enables generating DDL statements for creating the output table. 119 | 120 | ### Materialization Strategy 121 | Materialization defines how the transformation output is stored and updated. Common types include: 122 | - **table**: Full table replacement on each run 123 | - **view**: Virtual table that queries underlying data 124 | - **incremental**: Partial updates using strategies like delete+insert or merge 125 | - **scd2**: Slowly Changing Dimension type 2 for tracking historical changes 126 | 127 | ### Tests 128 | Tests are validation rules that ensure data quality. They can be defined at two levels: 129 | - **Column-level tests**: Applied to individual columns (e.g., `not_null`, `unique`) 130 | - **Table-level tests**: Applied to the entire output (e.g., `row_count_gt_0`, `unique`) 131 | 132 | Tests enable automated data quality validation without manual inspection. OTS supports three types of tests: 133 | 1. **Standard Tests**: Built-in tests defined in the OTS specification (e.g., `not_null`, `unique`, `row_count_gt_0`) 134 | 2. **Generic SQL Tests**: Reusable SQL tests with placeholders that can be applied to multiple transformations 135 | 3. **Singular SQL Tests**: Table-specific SQL tests with hardcoded table references 136 | 137 | Generic SQL Tests and Singular SQL Tests can be defined in a project Test Library (see [Test Library](#test-library)) or inline within the current OTS Module. 138 | 139 | For detailed information about test types, definitions, and usage, see the [Data Quality Tests](#data-quality-tests) section. 140 | 141 | ### Metadata 142 | Metadata provides additional information about the transformation including: 143 | - **file_path**: Location of the source transformation file 144 | - **owner**: Person or team responsible for the transformation 145 | - **tags**: List of string tags for categorization and discovery (e.g., ["analytics", "fct", "production"]) 146 | - **object_tags**: Dictionary of key-value pairs for database object tagging (e.g., {"sensitivity_tag": "pii", "classification": "public"}) 147 | 148 | **Tag Types:** 149 | - **tags** (dbt-style): Simple string tags used for filtering, categorization, and discovery. These are typically used for model selection and organization. 150 | - **Module-level tags**: Tags defined at the module level apply to all transformations in the module. They can be inherited by transformations or merged with transformation-specific tags. 151 | - **Transformation-level tags**: Tags defined at the transformation level are specific to that transformation. They can be merged with module-level tags. 152 | - **object_tags** (database-style): Key-value pairs that are attached directly to database objects (tables, views) in databases that support object tagging (e.g., Snowflake). These are used for data governance, compliance, and metadata management. Unlike `tags`, `object_tags` are always transformation-specific and are not inherited from module level. 153 | 154 | ## OTS Module Structure 155 | 156 | An OTS Module is a YAML or JSON document that can contain one or more OTS Artifacts (transformations, UDF Definitions, and Test Definitions). 
Below is the complete structure: 157 | 158 | ### Complete OTS Module Structure 159 | 160 | ```yaml 161 | # OTS version 162 | ots_version: string # OTS specification version (e.g., "0.1.0") - indicates which version of the OTS standard this module follows 163 | 164 | # Module metadata 165 | module_name: string # Module name (e.g., "ecommerce_analytics") 166 | module_description: string # Description of the module (optional) 167 | version: string # Optional: Module/package version (e.g., "1.0.0") - version of this specific module, independent of OTS version 168 | tags: [string] # Optional: Module-level tags for categorization (e.g., ["analytics", "fct"]). These can be inherited or merged with transformation-level tags. 169 | test_library_path: string # Optional: Path to test library file (relative to module file or absolute path) 170 | 171 | # Optional: Inline test definitions (same structure as test library) 172 | generic_tests: # Optional: Module-specific generic SQL tests 173 | test_name: 174 | type: "sql" 175 | level: "table" | "column" 176 | description: string 177 | sql: string 178 | parameters: {} 179 | singular_tests: # Optional: Module-specific singular SQL tests 180 | test_name: 181 | type: "sql" 182 | level: "table" | "column" 183 | description: string 184 | sql: string 185 | target_transformation: string 186 | 187 | target: 188 | database: string # Target database name 189 | schema: string # Target schema name 190 | sql_dialect: string # Optional: SQL dialect (e.g., "postgres", "bigquery", "snowflake", "spark", etc.) 191 | connection_profile: string # Optional: connection profile reference 192 | 193 | # Transformations 194 | transformations: # Array of transformation definitions 195 | - transformation_id: string # Fully qualified identifier (e.g., "analytics.my_first_table") 196 | description: string # Optional: Description of what the transformation does (optional) 197 | transformation_type: string # Type of transformation: "sql" (default: "sql"). Non-SQL types (python, pyspark, r) are planned for future versions. 198 | sql_dialect: string # Optional: SQL dialect of the transformation code (for translation to target dialect) 199 | 200 | # Transformation code (type-based structure) 201 | code: 202 | # For SQL transformations (transformation_type: "sql") 203 | sql: 204 | original_sql: string # The original SQL query as written (typically a SELECT statement) 205 | resolved_sql: string # SQL with fully qualified table names (schema.table) - preferred for execution 206 | source_tables: [string] # List of input tables referenced (required for dependency analysis) 207 | source_functions: [string] # Optional: List of user-defined functions (UDFs) called in the query (required for dependency analysis) 208 | 209 | # Note: Non-SQL transformation types (python, pyspark, r) are planned for future versions 210 | 211 | # Schema definition 212 | schema: 213 | columns: # Array of column definitions 214 | - name: string # Column name 215 | datatype: string # Data type ("number", "string", "date", etc.) 
216 | description: string # Column description 217 | partitioning: [string] # Optional: Partition keys 218 | indexes: # Optional: Array of index definitions 219 | - name: string # Index name (optional, auto-generated if not provided) 220 | columns: [string] # Columns to index 221 | 222 | # Materialization strategy 223 | materialization: 224 | type: string # "table", "view", "incremental", "scd2" 225 | incremental_details: # Required if type is "incremental" 226 | strategy: string # "delete_insert", "append", "merge" 227 | delete_condition: string # SQL condition for delete (delete_insert only) 228 | filter_condition: string # SQL condition for filtering data 229 | merge_key: [string] # Primary key columns for matching records (merge only) 230 | update_columns: [string] # (Optional) List of columns to be updated in merge strategy 231 | scd2_details: # Optional if type is "scd2" 232 | start_column: string # Name of the start column (default: "valid_from") 233 | end_column: string # Name of the end column (default: "valid_to") 234 | unique_key: [string] # Array of columns that uniquely identify a record in SCD2 modeling (optional) 235 | 236 | # Tests: both column-level and table-level 237 | tests: 238 | columns: # Optional: Column-level tests 239 | column_name: # Tests for specific columns 240 | - string # Simple test name (e.g., "not_null", "unique") 241 | - object # Test with parameters: {name: string, params?: object, severity?: "error"|"warning"} 242 | table: # Optional: Table-level tests 243 | - string # Simple test name (e.g., "row_count_gt_0") 244 | - object # Test with parameters: {name: string, params?: object, severity?: "error"|"warning"} 245 | 246 | # Metadata 247 | metadata: 248 | file_path: string # Path to the source transformation file 249 | owner: string # Optional: Person or team responsible (optional) 250 | tags: [string] # Optional: List of string tags for categorization and discovery (e.g., ["analytics", "fct"]) 251 | object_tags: dict # Optional: Dictionary of key-value pairs for database object tagging (e.g., {"sensitivity_tag": "pii", "classification": "public"}) 252 | 253 | # Functions (OTS 0.2.0+) 254 | functions: # Optional: Array of user-defined function definitions (OTS 0.2.0+) 255 | - function_id: string # Fully qualified function name (e.g., "schema.function_name") 256 | description: string # Optional: Description of what the function does 257 | function_type: string # Function type: "scalar", "aggregate", or "table" 258 | language: string # Programming language: "sql", "python", "javascript", etc. 
259 | parameters: # Optional: Array of function parameters 260 | - name: string # Parameter name 261 | type: string # Parameter data type (e.g., "DOUBLE", "VARCHAR") 262 | description: string # Optional: Parameter description 263 | return_type: string # Optional: Return type for scalar/aggregate functions (e.g., "DOUBLE", "VARCHAR") 264 | return_table_schema: # Optional: Schema definition for table functions (same structure as transformation schema) 265 | columns: [ColumnDefinition] 266 | deterministic: bool # Optional: Whether the function is deterministic (same inputs always produce same outputs) 267 | code: # Function code (type-based structure) 268 | generic_sql: string # Generic SQL code that works across databases (for SQL functions) 269 | database_specific: dict # Database-specific implementations (keyed by database name) 270 | dependencies: # Optional: Function dependencies 271 | tables: [string] # List of tables the function depends on 272 | functions: [string] # List of other functions this function depends on 273 | metadata: # Function metadata 274 | file_path: string # Path to the source function file 275 | tags: [string] # Optional: List of string tags for categorization 276 | object_tags: dict # Optional: Dictionary of key-value pairs for database object tagging 277 | ``` 278 | 279 | ## Simple Table Transformation 280 | 281 |
282 | JSON Format 283 | 284 | ```json 285 | { 286 | "ots_version": "0.2.0", 287 | "module_name": "analytics_customers", 288 | "module_description": "Customer analytics transformations", 289 | "tags": ["analytics", "production"], 290 | "test_library_path": "../tests/test_library.yaml", 291 | "target": { 292 | "database": "warehouse", 293 | "schema": "analytics", 294 | "sql_dialect": "postgres" 295 | }, 296 | "transformations": [ 297 | { 298 | "transformation_id": "analytics.customers", 299 | "description": "Customer data table", 300 | "transformation_type": "sql", 301 | "code": { 302 | "sql": { 303 | "original_sql": "SELECT id, name, email, created_at FROM source.customers WHERE active = true", 304 | "resolved_sql": "SELECT id, name, email, created_at FROM warehouse.source.customers WHERE active = true", 305 | "source_tables": ["source.customers"], 306 | "source_functions": [] 307 | } 308 | }, 309 | "schema": { 310 | "columns": [ 311 | { 312 | "name": "id", 313 | "datatype": "number", 314 | "description": "Unique customer identifier" 315 | }, 316 | { 317 | "name": "name", 318 | "datatype": "string", 319 | "description": "Customer name" 320 | }, 321 | { 322 | "name": "email", 323 | "datatype": "string", 324 | "description": "Customer email address" 325 | }, 326 | { 327 | "name": "created_at", 328 | "datatype": "date", 329 | "description": "Customer creation date" 330 | } 331 | ], 332 | "partitioning": [], 333 | "indexes": [ 334 | { 335 | "name": "idx_customers_id", 336 | "columns": ["id"] 337 | }, 338 | { 339 | "name": "idx_customers_email", 340 | "columns": ["email"] 341 | } 342 | ] 343 | }, 344 | "materialization": { 345 | "type": "table" 346 | }, 347 | "tests": { 348 | "columns": { 349 | "id": ["not_null", "unique"], 350 | "email": ["not_null", "unique"], 351 | "created_at": ["not_null"] 352 | }, 353 | "table": ["row_count_gt_0"] 354 | }, 355 | "metadata": { 356 | "file_path": "/models/analytics/customers.sql", 357 | "owner": "data-team", 358 | "tags": ["customer", "core"], 359 | "object_tags": { 360 | "sensitivity_tag": "pii", 361 | "classification": "internal" 362 | } 363 | } 364 | } 365 | ] 366 | } 367 | ``` 368 | 369 |
370 | 371 |
372 | YAML Format 373 | 374 | ```yaml 375 | ots_version: "0.2.0" 376 | module_name: "analytics_customers" 377 | module_description: "Customer analytics transformations" 378 | tags: ["analytics", "production"] 379 | 380 | target: 381 | database: "warehouse" 382 | schema: "analytics" 383 | sql_dialect: "postgres" 384 | 385 | transformations: 386 | - transformation_id: "analytics.customers" 387 | description: "Customer data table" 388 | transformation_type: "sql" 389 | 390 | code: 391 | sql: 392 | original_sql: "SELECT id, name, email, created_at FROM source.customers WHERE active = true" 393 | resolved_sql: "SELECT id, name, email, created_at FROM warehouse.source.customers WHERE active = true" 394 | source_tables: ["source.customers"] 395 | source_functions: [] 396 | 397 | schema: 398 | columns: 399 | - name: "id" 400 | datatype: "number" 401 | description: "Unique customer identifier" 402 | - name: "name" 403 | datatype: "string" 404 | description: "Customer name" 405 | - name: "email" 406 | datatype: "string" 407 | description: "Customer email address" 408 | - name: "created_at" 409 | datatype: "date" 410 | description: "Customer creation date" 411 | partitioning: [] 412 | indexes: 413 | - name: "idx_customers_id" 414 | columns: ["id"] 415 | - name: "idx_customers_email" 416 | columns: ["email"] 417 | 418 | materialization: 419 | type: "table" 420 | 421 | tests: 422 | columns: 423 | id: ["not_null", "unique"] 424 | email: ["not_null", "unique"] 425 | created_at: ["not_null"] 426 | table: ["row_count_gt_0"] 427 | 428 | metadata: 429 | file_path: "/models/analytics/customers.sql" 430 | owner: "data-team" 431 | tags: ["customer", "core"] 432 | object_tags: 433 | sensitivity_tag: "pii" 434 | classification: "internal" 435 | ``` 436 | 437 |
438 | 
439 | ## Materialization Types
440 | 
441 | ### Incremental Materialization
442 | 
443 | Incremental materialization updates only changed data using one of three strategies:
444 | - **delete_insert**: Deletes rows matching a condition and inserts new data
445 | - **append**: Appends new data without removing existing rows
446 | - **merge**: Performs an upsert operation using a merge statement
447 | 
448 | #### Delete-Insert Strategy
449 | 
450 | ```yaml
451 | materialization:
452 |   type: "incremental"
453 |   incremental_details:
454 |     strategy: "delete_insert"
455 |     delete_condition: "to_date(updated_ts) = '@start_date'"
456 |     filter_condition: "to_date(updated_ts) = '@start_date'"
457 | ```
458 | 
459 | #### Append Strategy
460 | 
461 | ```yaml
462 | materialization:
463 |   type: "incremental"
464 |   incremental_details:
465 |     strategy: "append"
466 |     filter_condition: "created_at >= '@start_date'"
467 | ```
468 | 
469 | #### Merge Strategy
470 | 
471 | ```yaml
472 | materialization:
473 |   type: "incremental"
474 |   incremental_details:
475 |     strategy: "merge"
476 |     merge_key: ["customer_id"] # Primary key columns for matching records
477 |     filter_condition: "updated_at >= '@start_date'"
478 |     update_columns: ["name", "email"] # Optional: specific columns to update
479 | ```
480 | 
481 | ### SCD2 Materialization
482 | 
483 | SCD2 (Slowly Changing Dimension Type 2) materialization tracks historical changes with valid date ranges. It requires a unique key to identify records.
484 | 
485 | ```yaml
486 | materialization:
487 |   type: "scd2"
488 |   scd2_details:
489 |     unique_key: ["product_id"] # Primary key or unique identifier
490 |     start_column: "valid_from" # Optional, defaults to "valid_from"
491 |     end_column: "valid_to" # Optional, defaults to "valid_to"
492 | ```
493 | 
494 | ### Schema Column Definition
495 | 
496 | A schema column in an OTS transformation defines the structure and properties of a single column in the output table:
497 | 
498 | ```yaml
499 | columns:
500 |   - name: string # Column name
501 |     datatype: string # Data type ("number", "string", "date", etc.)
502 |     description: string # Column description
503 | ```
504 | 
505 | **Common Data Types:**
506 | - `number`: Numeric values
507 | - `string`: Text values
508 | - `date`: Date and timestamp values
509 | - `boolean`: True/false values
510 | - `array`: Array of values
511 | - `object`: Complex nested objects
512 | 
513 | ## Data Quality Tests
514 | 
515 | Data quality tests are validation rules that ensure the correctness and quality of transformation outputs. Tests can be defined at two levels:
516 | - **Column-level tests**: Applied to individual columns (e.g., `not_null`, `unique`)
517 | - **Table-level tests**: Applied to the entire output (e.g., `row_count_gt_0`, `unique`)
518 | 
519 | Tests enable automated data quality validation without manual inspection. OTS supports three types of tests:
520 | 
521 | 1. **Standard Tests**: Built-in tests defined in the OTS specification (e.g., `not_null`, `unique`, `row_count_gt_0`)
522 | 2. **Generic SQL Tests**: Reusable SQL tests with placeholders that can be applied to multiple transformations
523 | 3. **Singular SQL Tests**: Table-specific SQL tests with hardcoded table references
524 | 
525 | ### Standard Tests
526 | 
527 | Standard tests are built into the OTS specification and must be implemented by all OTS-compliant tools. These tests provide common data quality checks that are widely applicable across different transformations. 
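
As an illustration of what "implemented" means in practice, the sketch below shows the kind of violation query a tool might generate for the `not_null` test. It is illustrative only: the wrapping `generated_test` structure and the exact SQL are tool-internal and not part of the OTS schema; only the semantics (returned rows are violations, zero rows means the test passes, per the execution model described later in this section) come from the specification.

```yaml
# Illustration only: a violation query a tool might generate for the
# standard test "not_null" on column "id" of "analytics.customers".
# The "generated_test" wrapper is hypothetical and not part of the OTS schema.
generated_test:
  test: "not_null"
  transformation_id: "analytics.customers"
  column: "id"
  sql: |
    SELECT *
    FROM analytics.customers
    WHERE id IS NULL
```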
528 | 529 | #### Column-Level Standard Tests 530 | 531 | **`not_null`** 532 | - **Description**: Ensures a column contains no NULL values 533 | - **Level**: Column 534 | - **Parameters**: None 535 | - **Implementation**: Returns rows where the column is NULL (test fails if any rows returned) 536 | - **Example**: 537 | ```yaml 538 | tests: 539 | columns: 540 | id: ["not_null"] 541 | ``` 542 | 543 | **`unique`** 544 | - **Description**: Ensures column values are unique across all rows 545 | - **Level**: Column or Table 546 | - **Parameters**: 547 | - `columns` (array, optional): For table-level tests, specifies which columns to check for uniqueness. If omitted at table level, checks all columns (entire row uniqueness) 548 | - **Implementation**: Returns duplicate values (test fails if any duplicates found) 549 | - **Examples**: 550 | ```yaml 551 | tests: 552 | columns: 553 | # Column-level: single column uniqueness 554 | id: ["not_null", "unique"] 555 | 556 | table: 557 | # Table-level: composite uniqueness on specific columns 558 | - name: "unique" 559 | params: 560 | columns: ["customer_id", "order_date"] 561 | 562 | # Table-level: entire row uniqueness (all columns) 563 | - "unique" 564 | ``` 565 | 566 | **`accepted_values`** 567 | - **Description**: Ensures column values are within a specified list of acceptable values 568 | - **Level**: Column 569 | - **Parameters**: 570 | - `values` (array, required): List of acceptable values 571 | - **Implementation**: Returns rows where column value is not in the accepted list 572 | - **Example**: 573 | ```yaml 574 | tests: 575 | columns: 576 | status: 577 | - name: "accepted_values" 578 | params: 579 | values: ["active", "inactive", "pending"] 580 | ``` 581 | 582 | **`relationships`** 583 | - **Description**: Ensures referential integrity between tables (foreign key validation) 584 | - **Level**: Column 585 | - **Parameters**: 586 | - `to` (string, required): Target transformation ID (e.g., "analytics.customers") 587 | - `field` (string, required): Column name in the target transformation 588 | - **Implementation**: Returns rows where the column value doesn't exist in the target table's specified field 589 | - **Example**: 590 | ```yaml 591 | tests: 592 | columns: 593 | customer_id: 594 | - name: "relationships" 595 | params: 596 | to: "analytics.customers" 597 | field: "id" 598 | ``` 599 | 600 | #### Table-Level Standard Tests 601 | 602 | **`row_count_gt_0`** 603 | - **Description**: Ensures the table has at least one row 604 | - **Level**: Table 605 | - **Parameters**: None 606 | - **Implementation**: Returns a count result (test fails if count = 0) 607 | - **Example**: 608 | ```yaml 609 | tests: 610 | table: 611 | - "row_count_gt_0" 612 | ``` 613 | 614 | ### Test Libraries 615 | 616 | Test libraries are project-level collections of reusable Test Definitions (generic and singular SQL tests) that can be shared across multiple OTS modules. For a detailed introduction to Test Libraries, see the [Test Library](#test-library) section in Core Concepts. 617 | 618 | #### Test Library Structure 619 | 620 | A test library is a YAML or JSON file that contains reusable Test Definitions. The file can be named anything (e.g., `test_library.yaml`, `tests.yaml`, `data_quality_tests.json`), but must follow the structure below. 
621 | 622 | **Test Library File Structure:** 623 | ```yaml 624 | # test_library.yaml 625 | ots_version: string # OTS specification version (e.g., "0.2.0") - indicates which version of the OTS standard this test library follows 626 | test_library_version: string # Optional: Version identifier for the test library (e.g., "1.0", "2.1") 627 | description: string # Optional: Human-readable description of the test library 628 | 629 | generic_tests: 630 | check_minimum_rows: 631 | type: "sql" 632 | level: "table" 633 | description: "Ensures table has minimum number of rows" 634 | sql: | 635 | SELECT 1 as violation 636 | FROM @table_name 637 | GROUP BY 1 638 | HAVING COUNT(*) < @min_rows:10 639 | parameters: 640 | min_rows: 641 | type: "number" 642 | default: 10 643 | description: "Minimum number of rows required" 644 | 645 | column_not_negative: 646 | type: "sql" 647 | level: "column" 648 | description: "Ensures numeric column has no negative values" 649 | sql: | 650 | SELECT @column_name 651 | FROM @table_name 652 | WHERE @column_name < 0 653 | parameters: [] 654 | 655 | singular_tests: 656 | test_customers_email_format: 657 | type: "sql" 658 | level: "table" 659 | description: "Validates email format for customers table" 660 | sql: | 661 | SELECT id, email 662 | FROM analytics.customers 663 | WHERE email NOT LIKE '%@%.%' 664 | target_transformation: "analytics.customers" 665 | ``` 666 | 667 | #### Generic SQL Tests 668 | 669 | Generic SQL tests are reusable tests that use placeholders (variables) to make them applicable to multiple transformations. They follow the dbt pattern where: 670 | - The query returns rows when the test fails 671 | - 0 rows returned = test passes 672 | - 1+ rows returned = test fails 673 | 674 | **Placeholders:** 675 | - `@table_name` or `{{ table_name }}`: Replaced with the fully qualified transformation ID. The `@` syntax is recommended for cleaner SQL. 676 | - `@column_name` or `{{ column_name }}`: Replaced with the column name (for column-level tests). The `@` syntax is recommended. 677 | - Custom parameters: Available as `@parameter_name` or `{{ parameter_name }}` with optional defaults using `@param:default` syntax (e.g., `@min_rows:10`) 678 | 679 | **Structure:** 680 | ```yaml 681 | generic_tests: 682 | test_name: # Required: Unique test name (used for referencing) 683 | type: "sql" # Required: Always "sql" for SQL tests 684 | level: "table" | "column" # Required: Test level 685 | description: string # Optional: Human-readable description 686 | sql: string # Required: SQL query (returns rows on failure) 687 | parameters: # Optional: Parameter definitions 688 | param_name: 689 | type: "number" | "string" | "boolean" | "array" # Required: Parameter type 690 | default: value # Optional: Default value 691 | description: string # Optional: Parameter description 692 | ``` 693 | 694 | **Example Generic Test:** 695 | ```yaml 696 | check_minimum_rows: 697 | type: "sql" 698 | level: "table" 699 | description: "Ensures table has minimum number of rows" 700 | sql: | 701 | SELECT 1 as violation 702 | FROM @table_name 703 | GROUP BY 1 704 | HAVING COUNT(*) < @min_rows:10 705 | parameters: 706 | min_rows: 707 | type: "number" 708 | default: 10 709 | description: "Minimum number of rows required" 710 | ``` 711 | 712 | #### Singular SQL Tests 713 | 714 | Singular SQL tests are table-specific tests with hardcoded table references. 
They are useful for: 715 | - Complex business logic specific to one transformation 716 | - Tests that reference multiple tables 717 | - Table-specific validation rules 718 | 719 | **Structure:** 720 | ```yaml 721 | singular_tests: 722 | test_name: # Required: Unique test name (used for referencing) 723 | type: "sql" # Required: Always "sql" for SQL tests 724 | level: "table" | "column" # Required: Test level 725 | description: string # Optional: Human-readable description 726 | sql: string # Required: SQL query with hardcoded table names 727 | target_transformation: string # Required: Transformation ID this test applies to (used for validation and discovery) 728 | ``` 729 | 730 | **Example Singular Test:** 731 | ```yaml 732 | test_customers_email_format: 733 | type: "sql" 734 | level: "table" 735 | description: "Validates email format for customers table" 736 | sql: | 737 | SELECT id, email 738 | FROM analytics.customers 739 | WHERE email NOT LIKE '%@%.%' 740 | target_transformation: "analytics.customers" 741 | ``` 742 | 743 | ### Referencing Tests in Transformations 744 | 745 | Transformations reference tests from: 746 | 1. **Standard tests**: Referenced by name (e.g., `"not_null"`, `"unique"`) 747 | 2. **Test library tests**: Referenced by name from the test library (e.g., `"check_minimum_rows"`) 748 | 749 | **Module Structure with Test Library Reference:** 750 | 751 | ```yaml 752 | ots_version: "0.2.0" 753 | module_name: "analytics_customers" 754 | test_library_path: "../tests/test_library.yaml" # Optional: Path to test library 755 | 756 | target: 757 | database: "warehouse" 758 | schema: "analytics" 759 | 760 | transformations: 761 | - transformation_id: "analytics.customers" 762 | tests: 763 | columns: 764 | id: 765 | - "not_null" # Standard test 766 | - "unique" # Standard test (column-level) 767 | email: 768 | - "not_null" 769 | - name: "accepted_values" # Standard test with params 770 | params: 771 | values: ["gmail.com", "yahoo.com"] 772 | amount: 773 | - name: "column_not_negative" # Generic test from library 774 | table: 775 | - "row_count_gt_0" # Standard test 776 | - "unique" # Standard test (table-level, checks all columns) 777 | - name: "unique" # Standard test (table-level, composite on specific columns) 778 | params: 779 | columns: ["customer_id", "order_date"] 780 | - name: "check_minimum_rows" # Generic test with params 781 | params: 782 | min_rows: 100 783 | - "test_customers_email_format" # Singular test from library 784 | ``` 785 | 786 | **Test Reference Formats:** 787 | 788 | 1. **Simple string** (standard test, no parameters): 789 | ```yaml 790 | tests: 791 | columns: 792 | id: ["not_null", "unique"] 793 | table: 794 | - "row_count_gt_0" 795 | ``` 796 | 797 | 2. **Object with name** (standard test with parameters): 798 | ```yaml 799 | tests: 800 | columns: 801 | status: 802 | - name: "accepted_values" 803 | params: 804 | values: ["active", "inactive"] 805 | ``` 806 | 807 | 3. **Object with name** (generic/singular test from library): 808 | ```yaml 809 | tests: 810 | table: 811 | - name: "check_minimum_rows" 812 | params: 813 | min_rows: 100 814 | ``` 815 | 816 | ### Test Execution Model 817 | 818 | Tests follow the dbt execution model: 819 | - **0 rows returned** = test passes 820 | - **1+ rows returned** = test fails 821 | 822 | For standard tests, tools generate SQL queries that return violating rows. For SQL tests (generic and singular), the SQL query itself returns rows when violations are found. 
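
For generic SQL tests, execution also involves placeholder substitution. The sketch below is illustrative only (the `resolved_generic_test` structure shown is tool-internal, not part of the OTS schema): it shows how `check_minimum_rows` from the test library, referenced with `min_rows: 100` against `analytics.customers`, might look after `@table_name` is replaced with the fully qualified transformation ID and the `@min_rows:10` default is overridden by the supplied parameter.

```yaml
# Illustration only: possible resolved form of the generic test
# "check_minimum_rows" applied to "analytics.customers" with min_rows: 100.
# The "resolved_generic_test" wrapper is hypothetical, not part of the OTS schema.
resolved_generic_test:
  test: "check_minimum_rows"
  transformation_id: "analytics.customers"
  params:
    min_rows: 100
  sql: |
    SELECT 1 as violation
    FROM analytics.customers
    GROUP BY 1
    HAVING COUNT(*) < 100
```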
823 | 824 | **Test Severity:** 825 | - Tests can have a `severity` level: `"error"` (default) or `"warning"` 826 | - `error`: Test failure stops execution and fails the build 827 | - `warning`: Test failure is logged but doesn't stop execution 828 | 829 | **Severity in Test References:** 830 | ```yaml 831 | tests: 832 | columns: 833 | id: 834 | - name: "not_null" 835 | severity: "error" # Default, can be omitted 836 | - name: "unique" 837 | severity: "warning" # Non-blocking 838 | table: 839 | - name: "row_count_gt_0" 840 | severity: "error" # Default, can be omitted 841 | ``` 842 | 843 | ### Inline Test Definitions in OTS Modules 844 | 845 | Test Definitions (generic and singular SQL tests) can also be defined directly within an OTS Module, using the same structure as test libraries. This is useful for module-specific Test Definitions that don't need to be shared across modules. 846 | 847 | **Module Structure with Inline Tests:** 848 | ```yaml 849 | ots_version: "0.2.0" 850 | module_name: "analytics_customers" 851 | 852 | # Optional: Inline test definitions (same structure as test library) 853 | generic_tests: 854 | check_recent_data: 855 | type: "sql" 856 | level: "table" 857 | description: "Ensures table has recent data" 858 | sql: | 859 | SELECT 1 as violation 860 | FROM @table_name 861 | WHERE updated_at < CURRENT_DATE - INTERVAL '@days:7' DAY 862 | parameters: 863 | days: 864 | type: "number" 865 | default: 7 866 | 867 | singular_tests: 868 | test_customers_specific: 869 | type: "sql" 870 | level: "table" 871 | description: "Module-specific test" 872 | sql: | 873 | SELECT id FROM analytics.customers WHERE status = 'invalid' 874 | target_transformation: "analytics.customers" 875 | 876 | target: 877 | database: "warehouse" 878 | schema: "analytics" 879 | 880 | transformations: 881 | - transformation_id: "analytics.customers" 882 | tests: 883 | table: 884 | - name: "check_recent_data" # References inline generic test 885 | params: 886 | days: 3 887 | - "test_customers_specific" # References inline singular test 888 | ``` 889 | 890 | **Test Resolution Priority:** 891 | When resolving test names, tools should check in the following order: 892 | 1. **Standard tests** (built into OTS specification) 893 | 2. **Inline tests** (defined in the current OTS Module) 894 | 3. **Test library tests** (from referenced test library) 895 | 896 | If a test name exists in multiple locations, the first match takes precedence. This allows modules to override test library tests with module-specific implementations. 897 | 898 | ### Test Library Resolution 899 | 900 | When a transformation module references a test library: 901 | 1. The tool resolves the `test_library_path` (relative to the module file or absolute path) 902 | 2. Loads the test library file (YAML or JSON format) 903 | 3. Validates test definitions 904 | 4. Makes tests available for reference in transformations (after inline tests) 905 | 906 | **Test Discovery:** 907 | - **Standard tests**: Always available, no discovery needed 908 | - **Generic tests**: Discovered from test library or inline module definitions 909 | - **Singular tests**: Discovered from test library or inline module definitions. The `target_transformation` field helps tools validate that the test is applied to the correct transformation. 910 | 911 | If a test is referenced but not found among the Standard tests, inline tests, or Test library, it must result in an error. 
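
To make the resolution order concrete, the sketch below shows a module that defines an inline generic test with the same name as a test in its referenced library. Because inline tests are resolved before the test library, the inline definition takes precedence for this module; the test body and default values here are illustrative.

```yaml
# Sketch: an inline generic test shadowing a test library definition.
# Per the resolution order (standard -> inline -> library), this inline
# "check_minimum_rows" is used instead of the library test of the same name.
ots_version: "0.2.0"
module_name: "analytics_customers"
test_library_path: "../tests/test_library.yaml"

generic_tests:
  check_minimum_rows: # shadows the definition in the test library
    type: "sql"
    level: "table"
    description: "Stricter minimum-row check for this module"
    sql: |
      SELECT 1 as violation
      FROM @table_name
      GROUP BY 1
      HAVING COUNT(*) < @min_rows:50
    parameters:
      min_rows:
        type: "number"
        default: 50
```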
912 | 913 | ## User-Defined Functions (UDFs) 914 | 915 | ### Overview 916 | 917 | User-Defined Functions (UDFs) are custom functions that can be called within SQL transformations. OTS v0.2.0 adds support for defining UDFs as **UDF Definitions** within OTS Modules and tracking UDF dependencies in transformations, enabling proper dependency graph building and execution order determination. 918 | 919 | ### Function Dependencies 920 | 921 | When a transformation calls a user-defined function, the function name should be listed in the `source_functions` array. This allows tools to: 922 | 923 | - Build accurate dependency graphs that include function dependencies 924 | - Determine correct execution order (functions must be created before transformations that use them) 925 | - Validate that all required functions exist before executing transformations 926 | - Support function-to-function dependencies (functions calling other functions) 927 | 928 | ### Function Naming 929 | 930 | Function names in `source_functions` should follow these conventions: 931 | - **Fully qualified names** (preferred): `schema.function_name` (e.g., `analytics.calculate_percentage`) 932 | - **Unqualified names**: `function_name` (when the function is resolved by the database's search path) 933 | 934 | Tools should resolve unqualified function names to fully qualified names when building dependency graphs. 935 | 936 | ### Example: Transformation Using a Function 937 | 938 | ```yaml 939 | ots_version: "0.2.0" 940 | transformation_id: "analytics.order_summary" 941 | description: "Order summary with calculated metrics" 942 | 943 | transformation_type: "sql" 944 | code: 945 | sql: 946 | original_sql: | 947 | SELECT 948 | order_id, 949 | customer_id, 950 | analytics.calculate_percentage(discount_amount, total_amount) as discount_pct, 951 | analytics.format_currency(total_amount) as formatted_total 952 | FROM source.orders 953 | resolved_sql: | 954 | SELECT 955 | order_id, 956 | customer_id, 957 | analytics.calculate_percentage(discount_amount, total_amount) as discount_pct, 958 | analytics.format_currency(total_amount) as formatted_total 959 | FROM warehouse.source.orders 960 | source_tables: ["source.orders"] 961 | source_functions: ["analytics.calculate_percentage", "analytics.format_currency"] 962 | 963 | schema: 964 | columns: 965 | - name: "order_id" 966 | datatype: "number" 967 | - name: "customer_id" 968 | datatype: "number" 969 | - name: "discount_pct" 970 | datatype: "number" 971 | description: "Discount percentage calculated using UDF" 972 | - name: "formatted_total" 973 | datatype: "string" 974 | description: "Formatted currency using UDF" 975 | 976 | materialization: 977 | type: "table" 978 | ``` 979 | 980 | In this example, the transformation depends on two user-defined functions: 981 | - `analytics.calculate_percentage`: Calculates percentage values 982 | - `analytics.format_currency`: Formats numeric values as currency strings 983 | 984 | These dependencies are tracked in `source_functions`, allowing the dependency graph to ensure these functions are created before the transformation executes. 985 | 986 | ### Function Execution Order 987 | 988 | Functions are executed in dependency order, before transformations: 989 | 1. **Seeds** are loaded first (if any) 990 | 2. **Functions** are created in dependency order (functions that depend on other functions are created after their dependencies) 991 | 3. 
**Transformations** are executed (can use functions created in step 2) 992 | 993 | Function-to-function dependencies are resolved automatically based on the `dependencies.functions` array in each function definition. 994 | 995 | ### Function Overloading 996 | 997 | Some databases (e.g., Snowflake, DuckDB, PostgreSQL) support function overloading - multiple functions with the same name but different parameter signatures. OTS 0.2.0 supports this by: 998 | 999 | - **Function identification**: Functions are identified by their fully qualified name (`schema.function_name`) and parameter signature 1000 | - **Signature matching**: When a function is called, the database matches the call to the appropriate overload based on parameter types 1001 | - **Dependency tracking**: Each overloaded function is tracked separately in the `functions` array with its unique signature 1002 | 1003 | Tools implementing OTS should handle function overloading according to the target database's capabilities and requirements. 1004 | 1005 | ### Example: OTS Module with Functions 1006 | 1007 | The following example shows a complete OTS Module that includes both transformations and UDF Definitions: 1008 | 1009 | ```yaml 1010 | ots_version: "0.2.0" 1011 | module_name: "analytics_calculations" 1012 | module_description: "Analytics module with custom calculation functions" 1013 | 1014 | target: 1015 | database: "warehouse" 1016 | schema: "analytics" 1017 | sql_dialect: "postgres" 1018 | 1019 | transformations: 1020 | - transformation_id: "analytics.order_summary" 1021 | description: "Order summary with calculated metrics" 1022 | transformation_type: "sql" 1023 | code: 1024 | sql: 1025 | original_sql: | 1026 | SELECT 1027 | order_id, 1028 | customer_id, 1029 | analytics.calculate_percentage(discount_amount, total_amount) as discount_pct, 1030 | analytics.format_currency(total_amount) as formatted_total 1031 | FROM source.orders 1032 | resolved_sql: | 1033 | SELECT 1034 | order_id, 1035 | customer_id, 1036 | analytics.calculate_percentage(discount_amount, total_amount) as discount_pct, 1037 | analytics.format_currency(total_amount) as formatted_total 1038 | FROM warehouse.source.orders 1039 | source_tables: ["source.orders"] 1040 | source_functions: ["analytics.calculate_percentage", "analytics.format_currency"] 1041 | materialization: 1042 | type: "table" 1043 | 1044 | functions: 1045 | - function_id: "analytics.calculate_percentage" 1046 | description: "Calculates the percentage of a numerator over a denominator" 1047 | function_type: "scalar" 1048 | language: "sql" 1049 | parameters: 1050 | - name: "numerator" 1051 | type: "DOUBLE" 1052 | description: "The numerator value" 1053 | - name: "denominator" 1054 | type: "DOUBLE" 1055 | description: "The denominator value" 1056 | return_type: "DOUBLE" 1057 | deterministic: true 1058 | code: 1059 | generic_sql: | 1060 | CREATE OR REPLACE FUNCTION calculate_percentage( 1061 | numerator DOUBLE, 1062 | denominator DOUBLE 1063 | ) RETURNS DOUBLE AS $$ 1064 | SELECT 1065 | CASE 1066 | WHEN denominator = 0 OR denominator IS NULL THEN NULL 1067 | ELSE (numerator / denominator) * 100.0 1068 | END 1069 | $$; 1070 | database_specific: {} 1071 | dependencies: 1072 | tables: [] 1073 | functions: [] 1074 | metadata: 1075 | file_path: "/functions/analytics/calculate_percentage.sql" 1076 | tags: ["math", "utility"] 1077 | object_tags: 1078 | category: "calculation" 1079 | complexity: "simple" 1080 | 1081 | - function_id: "analytics.format_currency" 1082 | description: "Formats a numeric value as 
currency string" 1083 | function_type: "scalar" 1084 | language: "sql" 1085 | parameters: 1086 | - name: "amount" 1087 | type: "DOUBLE" 1088 | description: "The amount to format" 1089 | return_type: "VARCHAR" 1090 | code: 1091 | generic_sql: | 1092 | CREATE OR REPLACE FUNCTION format_currency(amount DOUBLE) 1093 | RETURNS VARCHAR AS $$ 1094 | SELECT '$' || TO_CHAR(amount, 'FM999,999,999.00') 1095 | $$; 1096 | database_specific: {} 1097 | dependencies: 1098 | tables: [] 1099 | functions: [] 1100 | metadata: 1101 | file_path: "/functions/analytics/format_currency.sql" 1102 | tags: ["formatting", "utility"] 1103 | ``` 1104 | 1105 | In this example: 1106 | - The module defines two functions: `analytics.calculate_percentage` and `analytics.format_currency` 1107 | - The transformation `analytics.order_summary` uses both functions and lists them in `source_functions` 1108 | - Functions are defined with their complete structure including parameters, return types, code, and metadata 1109 | - The `functions` array is at the same level as `transformations`, maintaining consistency in the module structure 1110 | 1111 | ## Complete Examples: Incremental Strategies 1112 | 1113 | ### Delete-Insert Example 1114 | 1115 |
1116 | YAML Format 1117 | 1118 | ```yaml 1119 | ots_version: "0.2.0" 1120 | transformation_id: "analytics.recent_orders" 1121 | description: "Orders updated in the last 7 days" 1122 | 1123 | transformation_type: "sql" 1124 | code: 1125 | sql: 1126 | original_sql: "SELECT order_id, customer_id, order_date, amount, status FROM source.orders WHERE updated_at >= '@start_date'" 1127 | resolved_sql: "SELECT order_id, customer_id, order_date, amount, status FROM warehouse.source.orders WHERE updated_at >= '@start_date'" 1128 | source_tables: ["source.orders"] 1129 | source_functions: [] 1130 | 1131 | schema: 1132 | columns: 1133 | - name: "order_id" 1134 | datatype: "number" 1135 | description: "Unique order identifier" 1136 | - name: "customer_id" 1137 | datatype: "number" 1138 | description: "Customer ID" 1139 | - name: "order_date" 1140 | datatype: "date" 1141 | description: "Order date" 1142 | - name: "amount" 1143 | datatype: "number" 1144 | description: "Order amount" 1145 | - name: "status" 1146 | datatype: "string" 1147 | description: "Order status" 1148 | partitioning: ["order_date"] 1149 | indexes: 1150 | - name: "idx_order_id" 1151 | columns: ["order_id"] 1152 | 1153 | materialization: 1154 | type: "incremental" 1155 | incremental_details: 1156 | strategy: "delete_insert" 1157 | delete_condition: "to_date(updated_at) = '@start_date'" 1158 | filter_condition: "to_date(updated_at) = '@start_date'" 1159 | 1160 | tests: 1161 | columns: 1162 | order_id: ["not_null", "unique"] 1163 | order_date: ["not_null"] 1164 | table: ["row_count_gt_0"] 1165 | 1166 | metadata: 1167 | file_path: "/models/analytics/recent_orders.sql" 1168 | owner: "analytics-team" 1169 | tags: ["orders", "incremental"] 1170 | ``` 1171 | 1172 |
1173 | 1174 |
1175 | JSON Format 1176 | 1177 | ```json 1178 | { 1179 | "ots_version": "0.2.0", 1180 | "transformation_id": "analytics.recent_orders", 1181 | "description": "Orders updated in the last 7 days", 1182 | 1183 | "transformation_type": "sql", 1184 | "code": { 1185 | "sql": { 1186 | "original_sql": "SELECT order_id, customer_id, order_date, amount, status FROM source.orders WHERE updated_at >= '@start_date'", 1187 | "resolved_sql": "SELECT order_id, customer_id, order_date, amount, status FROM warehouse.source.orders WHERE updated_at >= '@start_date'", 1188 | "source_tables": ["source.orders"], 1189 | "source_functions": [] 1190 | } 1191 | }, 1192 | 1193 | "schema": { 1194 | "columns": [ 1195 | { 1196 | "name": "order_id", 1197 | "datatype": "number", 1198 | "description": "Unique order identifier" 1199 | }, 1200 | { 1201 | "name": "customer_id", 1202 | "datatype": "number", 1203 | "description": "Customer ID" 1204 | }, 1205 | { 1206 | "name": "order_date", 1207 | "datatype": "date", 1208 | "description": "Order date" 1209 | }, 1210 | { 1211 | "name": "amount", 1212 | "datatype": "number", 1213 | "description": "Order amount" 1214 | }, 1215 | { 1216 | "name": "status", 1217 | "datatype": "string", 1218 | "description": "Order status" 1219 | } 1220 | ], 1221 | "partitioning": ["order_date"], 1222 | "indexes": [ 1223 | { 1224 | "name": "idx_order_id", 1225 | "columns": ["order_id"] 1226 | } 1227 | ] 1228 | }, 1229 | 1230 | "materialization": { 1231 | "type": "incremental", 1232 | "incremental_details": { 1233 | "strategy": "delete_insert", 1234 | "delete_condition": "to_date(updated_at) = '@start_date'", 1235 | "filter_condition": "to_date(updated_at) = '@start_date'" 1236 | } 1237 | }, 1238 | 1239 | "tests": { 1240 | "columns": { 1241 | "order_id": ["not_null", "unique"], 1242 | "order_date": ["not_null"] 1243 | }, 1244 | "table": ["row_count_gt_0"] 1245 | }, 1246 | 1247 | "metadata": { 1248 | "file_path": "/models/analytics/recent_orders.sql", 1249 | "owner": "analytics-team", 1250 | "tags": ["orders", "incremental"] 1251 | } 1252 | } 1253 | ``` 1254 | 1255 |
1256 | 1257 | ### Append Example 1258 | 1259 |
1260 | YAML Format 1261 | 1262 | ```yaml 1263 | ots_version: "0.2.0" 1264 | transformation_id: "logs.event_stream" 1265 | description: "Append-only event log" 1266 | 1267 | transformation_type: "sql" 1268 | code: 1269 | sql: 1270 | original_sql: "SELECT event_id, timestamp, user_id, event_type, payload FROM source.events WHERE timestamp >= '@start_date'" 1271 | resolved_sql: "SELECT event_id, timestamp, user_id, event_type, payload FROM warehouse.source.events WHERE timestamp >= '@start_date'" 1272 | source_tables: ["source.events"] 1273 | source_functions: [] 1274 | 1275 | schema: 1276 | columns: 1277 | - name: "event_id" 1278 | datatype: "string" 1279 | description: "Unique event identifier" 1280 | - name: "timestamp" 1281 | datatype: "date" 1282 | description: "Event timestamp" 1283 | - name: "user_id" 1284 | datatype: "string" 1285 | description: "User who triggered the event" 1286 | - name: "event_type" 1287 | datatype: "string" 1288 | description: "Type of event" 1289 | - name: "payload" 1290 | datatype: "object" 1291 | description: "Event payload data" 1292 | partitioning: ["timestamp"] 1293 | indexes: 1294 | - name: "idx_timestamp" 1295 | columns: ["timestamp"] 1296 | - name: "idx_user_id" 1297 | columns: ["user_id"] 1298 | 1299 | materialization: 1300 | type: "incremental" 1301 | incremental_details: 1302 | strategy: "append" 1303 | filter_condition: "timestamp >= '@start_date'" 1304 | 1305 | tests: 1306 | columns: 1307 | event_id: ["not_null", "unique"] 1308 | timestamp: ["not_null"] 1309 | table: ["row_count_gt_0"] 1310 | 1311 | metadata: 1312 | file_path: "/models/logs/event_stream.sql" 1313 | owner: "data-engineering" 1314 | tags: ["events", "append-only"] 1315 | ``` 1316 | 1317 |
1318 | 1319 |
1320 | JSON Format 1321 | 1322 | ```json 1323 | { 1324 | "ots_version": "0.2.0", 1325 | "transformation_id": "logs.event_stream", 1326 | "description": "Append-only event log", 1327 | 1328 | "transformation_type": "sql", 1329 | "code": { 1330 | "sql": { 1331 | "original_sql": "SELECT event_id, timestamp, user_id, event_type, payload FROM source.events WHERE timestamp >= '@start_date'", 1332 | "resolved_sql": "SELECT event_id, timestamp, user_id, event_type, payload FROM warehouse.source.events WHERE timestamp >= '@start_date'", 1333 | "source_tables": ["source.events"], 1334 | "source_functions": [] 1335 | } 1336 | }, 1337 | 1338 | "schema": { 1339 | "columns": [ 1340 | { 1341 | "name": "event_id", 1342 | "datatype": "string", 1343 | "description": "Unique event identifier" 1344 | }, 1345 | { 1346 | "name": "timestamp", 1347 | "datatype": "date", 1348 | "description": "Event timestamp" 1349 | }, 1350 | { 1351 | "name": "user_id", 1352 | "datatype": "string", 1353 | "description": "User who triggered the event" 1354 | }, 1355 | { 1356 | "name": "event_type", 1357 | "datatype": "string", 1358 | "description": "Type of event" 1359 | }, 1360 | { 1361 | "name": "payload", 1362 | "datatype": "object", 1363 | "description": "Event payload data" 1364 | } 1365 | ], 1366 | "partitioning": ["timestamp"], 1367 | "indexes": [ 1368 | { 1369 | "name": "idx_timestamp", 1370 | "columns": ["timestamp"] 1371 | }, 1372 | { 1373 | "name": "idx_user_id", 1374 | "columns": ["user_id"] 1375 | } 1376 | ] 1377 | }, 1378 | 1379 | "materialization": { 1380 | "type": "incremental", 1381 | "incremental_details": { 1382 | "strategy": "append", 1383 | "filter_condition": "timestamp >= '@start_date'" 1384 | } 1385 | }, 1386 | 1387 | "tests": { 1388 | "columns": { 1389 | "event_id": ["not_null", "unique"], 1390 | "timestamp": ["not_null"] 1391 | }, 1392 | "table": ["row_count_gt_0"] 1393 | }, 1394 | 1395 | "metadata": { 1396 | "file_path": "/models/logs/event_stream.sql", 1397 | "owner": "data-engineering", 1398 | "tags": ["events", "append-only"] 1399 | } 1400 | } 1401 | ``` 1402 | 1403 |
1404 | 1405 | ### Merge Example 1406 | 1407 |
1408 | YAML Format 1409 | 1410 | ```yaml 1411 | ots_version: "0.2.0" 1412 | transformation_id: "product.master_data" 1413 | description: "Customer master data with upsert logic" 1414 | 1415 | transformation_type: "sql" 1416 | code: 1417 | sql: 1418 | original_sql: "SELECT customer_id, name, email, phone, updated_at FROM source.customers WHERE updated_at >= '@start_date'" 1419 | resolved_sql: "SELECT customer_id, name, email, phone, updated_at FROM warehouse.source.customers WHERE updated_at >= '@start_date'" 1420 | source_tables: ["source.customers"] 1421 | source_functions: [] 1422 | 1423 | schema: 1424 | columns: 1425 | - name: "customer_id" 1426 | datatype: "number" 1427 | description: "Unique customer identifier" 1428 | - name: "name" 1429 | datatype: "string" 1430 | description: "Customer name" 1431 | - name: "email" 1432 | datatype: "string" 1433 | description: "Customer email" 1434 | - name: "phone" 1435 | datatype: "string" 1436 | description: "Customer phone number" 1437 | - name: "updated_at" 1438 | datatype: "date" 1439 | description: "Last update timestamp" 1440 | partitioning: [] 1441 | indexes: 1442 | - name: "idx_customer_id" 1443 | columns: ["customer_id"] 1444 | - name: "idx_email" 1445 | columns: ["email"] 1446 | 1447 | materialization: 1448 | type: "incremental" 1449 | incremental_details: 1450 | strategy: "merge" 1451 | filter_condition: "updated_at >= '@start_date'" 1452 | merge_key: ["customer_id"] 1453 | update_columns: ["name", "email", "phone", "updated_at"] 1454 | 1455 | tests: 1456 | columns: 1457 | customer_id: ["not_null", "unique"] 1458 | email: ["not_null"] 1459 | table: ["row_count_gt_0", "unique"] # unique at table level checks all columns for row uniqueness 1460 | 1461 | metadata: 1462 | file_path: "/models/product/master_data.sql" 1463 | owner: "product-team" 1464 | tags: ["customers", "master-data"] 1465 | ``` 1466 | 1467 |
1468 | 1469 |
1470 | JSON Format 1471 | 1472 | ```json 1473 | { 1474 | "ots_version": "0.2.0", 1475 | "transformation_id": "product.master_data", 1476 | "description": "Customer master data with upsert logic", 1477 | 1478 | "transformation_type": "sql", 1479 | "code": { 1480 | "sql": { 1481 | "original_sql": "SELECT customer_id, name, email, phone, updated_at FROM source.customers WHERE updated_at >= '@start_date'", 1482 | "resolved_sql": "SELECT customer_id, name, email, phone, updated_at FROM warehouse.source.customers WHERE updated_at >= '@start_date'", 1483 | "source_tables": ["source.customers"], 1484 | "source_functions": [] 1485 | } 1486 | }, 1487 | 1488 | "schema": { 1489 | "columns": [ 1490 | { 1491 | "name": "customer_id", 1492 | "datatype": "number", 1493 | "description": "Unique customer identifier" 1494 | }, 1495 | { 1496 | "name": "name", 1497 | "datatype": "string", 1498 | "description": "Customer name" 1499 | }, 1500 | { 1501 | "name": "email", 1502 | "datatype": "string", 1503 | "description": "Customer email" 1504 | }, 1505 | { 1506 | "name": "phone", 1507 | "datatype": "string", 1508 | "description": "Customer phone number" 1509 | }, 1510 | { 1511 | "name": "updated_at", 1512 | "datatype": "date", 1513 | "description": "Last update timestamp" 1514 | } 1515 | ], 1516 | "partitioning": [], 1517 | "indexes": [ 1518 | { 1519 | "name": "idx_customer_id", 1520 | "columns": ["customer_id"] 1521 | }, 1522 | { 1523 | "name": "idx_email", 1524 | "columns": ["email"] 1525 | } 1526 | ] 1527 | }, 1528 | 1529 | "materialization": { 1530 | "type": "incremental", 1531 | "incremental_details": { 1532 | "strategy": "merge", 1533 | "filter_condition": "updated_at >= '@start_date'", 1534 | "merge_key": ["customer_id"], 1535 | "update_columns": ["name", "email", "phone", "updated_at"] 1536 | } 1537 | }, 1538 | 1539 | "tests": { 1540 | "columns": { 1541 | "customer_id": ["not_null", "unique"], 1542 | "email": ["not_null"] 1543 | }, 1544 | "table": ["row_count_gt_0", "unique"] 1545 | }, 1546 | 1547 | "metadata": { 1548 | "file_path": "/models/product/master_data.sql", 1549 | "owner": "product-team", 1550 | "tags": ["customers", "master-data"] 1551 | } 1552 | } 1553 | ``` 1554 | 1555 |
1556 | 1557 | ### SCD2 Example 1558 | 1559 |
1560 | YAML Format 1561 | 1562 | ```yaml 1563 | ots_version: "0.2.0" 1564 | transformation_id: "dim.products_scd2" 1565 | description: "Product dimension with full history tracking" 1566 | 1567 | transformation_type: "sql" 1568 | code: 1569 | sql: 1570 | original_sql: "SELECT product_id, product_name, price, category, updated_at FROM source.products WHERE updated_at >= '@start_date'" 1571 | resolved_sql: "SELECT product_id, product_name, price, category, updated_at FROM warehouse.source.products WHERE updated_at >= '@start_date'" 1572 | source_tables: ["source.products"] 1573 | source_functions: [] 1574 | 1575 | schema: 1576 | columns: 1577 | - name: "product_id" 1578 | datatype: "number" 1579 | description: "Unique product identifier" 1580 | - name: "product_name" 1581 | datatype: "string" 1582 | description: "Product name" 1583 | - name: "price" 1584 | datatype: "number" 1585 | description: "Product price" 1586 | - name: "category" 1587 | datatype: "string" 1588 | description: "Product category" 1589 | - name: "updated_at" 1590 | datatype: "date" 1591 | description: "Last update timestamp" 1592 | - name: "valid_from" 1593 | datatype: "date" 1594 | description: "Record validity start date" 1595 | - name: "valid_to" 1596 | datatype: "date" 1597 | description: "Record validity end date" 1598 | partitioning: [] 1599 | indexes: 1600 | - name: "idx_product_id" 1601 | columns: ["product_id"] 1602 | - name: "idx_valid_from" 1603 | columns: ["valid_from"] 1604 | 1605 | materialization: 1606 | type: "scd2" 1607 | scd2_details: 1608 | unique_key: ["product_id"] 1609 | start_column: "valid_from" 1610 | end_column: "valid_to" 1611 | 1612 | tests: 1613 | columns: 1614 | product_id: ["not_null", "unique"] 1615 | valid_from: ["not_null"] 1616 | table: ["row_count_gt_0"] 1617 | 1618 | metadata: 1619 | file_path: "/models/dim/products_scd2.sql" 1620 | owner: "data-engineering" 1621 | tags: ["products", "scd2", "dimension"] 1622 | ``` 1623 | 1624 |
1625 | 1626 |
1627 | JSON Format 1628 | 1629 | ```json 1630 | { 1631 | "ots_version": "0.2.0", 1632 | "transformation_id": "dim.products_scd2", 1633 | "description": "Product dimension with full history tracking", 1634 | 1635 | "transformation_type": "sql", 1636 | "code": { 1637 | "sql": { 1638 | "original_sql": "SELECT product_id, product_name, price, category, updated_at FROM source.products WHERE updated_at >= '@start_date'", 1639 | "resolved_sql": "SELECT product_id, product_name, price, category, updated_at FROM warehouse.source.products WHERE updated_at >= '@start_date'", 1640 | "source_tables": ["source.products"], 1641 | "source_functions": [] 1642 | } 1643 | }, 1644 | 1645 | "schema": { 1646 | "columns": [ 1647 | { 1648 | "name": "product_id", 1649 | "datatype": "number", 1650 | "description": "Unique product identifier" 1651 | }, 1652 | { 1653 | "name": "product_name", 1654 | "datatype": "string", 1655 | "description": "Product name" 1656 | }, 1657 | { 1658 | "name": "price", 1659 | "datatype": "number", 1660 | "description": "Product price" 1661 | }, 1662 | { 1663 | "name": "category", 1664 | "datatype": "string", 1665 | "description": "Product category" 1666 | }, 1667 | { 1668 | "name": "updated_at", 1669 | "datatype": "date", 1670 | "description": "Last update timestamp" 1671 | }, 1672 | { 1673 | "name": "valid_from", 1674 | "datatype": "date", 1675 | "description": "Record validity start date" 1676 | }, 1677 | { 1678 | "name": "valid_to", 1679 | "datatype": "date", 1680 | "description": "Record validity end date" 1681 | } 1682 | ], 1683 | "partitioning": [], 1684 | "indexes": [ 1685 | { 1686 | "name": "idx_product_id", 1687 | "columns": ["product_id"] 1688 | }, 1689 | { 1690 | "name": "idx_valid_from", 1691 | "columns": ["valid_from"] 1692 | } 1693 | ] 1694 | }, 1695 | 1696 | "materialization": { 1697 | "type": "scd2", 1698 | "scd2_details": { 1699 | "unique_key": ["product_id"], 1700 | "start_column": "valid_from", 1701 | "end_column": "valid_to" 1702 | } 1703 | }, 1704 | 1705 | "tests": { 1706 | "columns": { 1707 | "product_id": ["not_null", "unique"], 1708 | "valid_from": ["not_null"] 1709 | }, 1710 | "table": ["row_count_gt_0"] 1711 | }, 1712 | 1713 | "metadata": { 1714 | "file_path": "/models/dim/products_scd2.sql", 1715 | "owner": "data-engineering", 1716 | "tags": ["products", "scd2", "dimension"] 1717 | } 1718 | } 1719 | ``` 1720 | 1721 |
1722 | 1723 | ## Complete Example: Test Library and Module 1724 | 1725 | This example demonstrates a complete setup with a test library and a transformation module that uses both standard and custom tests. 1726 | 1727 | ### Test Library Example 1728 | 1729 |
1730 | YAML Format 1731 | 1732 | ```yaml 1733 | # tests/test_library.yaml 1734 | ots_version: "0.2.0" 1735 | test_library_version: "1.0" 1736 | description: "Shared data quality tests for analytics project" 1737 | 1738 | generic_tests: 1739 | check_minimum_rows: 1740 | type: "sql" 1741 | level: "table" 1742 | description: "Ensures table has minimum number of rows" 1743 | sql: | 1744 | SELECT 1 as violation 1745 | FROM @table_name 1746 | GROUP BY 1 1747 | HAVING COUNT(*) < @min_rows:10 1748 | parameters: 1749 | min_rows: 1750 | type: "number" 1751 | default: 10 1752 | description: "Minimum number of rows required" 1753 | 1754 | column_not_negative: 1755 | type: "sql" 1756 | level: "column" 1757 | description: "Ensures numeric column has no negative values" 1758 | sql: | 1759 | SELECT @column_name 1760 | FROM @table_name 1761 | WHERE @column_name < 0 1762 | parameters: [] 1763 | 1764 | singular_tests: 1765 | test_customers_email_format: 1766 | type: "sql" 1767 | level: "table" 1768 | description: "Validates email format for customers table" 1769 | sql: | 1770 | SELECT id, email 1771 | FROM analytics.customers 1772 | WHERE email NOT LIKE '%@%.%' 1773 | target_transformation: "analytics.customers" 1774 | ``` 1775 | 1776 |
1777 | 1778 |
1779 | JSON Format 1780 | 1781 | ```json 1782 | { 1783 | "ots_version": "0.2.0", 1784 | "test_library_version": "1.0", 1785 | "description": "Shared data quality tests for analytics project", 1786 | "generic_tests": { 1787 | "check_minimum_rows": { 1788 | "type": "sql", 1789 | "level": "table", 1790 | "description": "Ensures table has minimum number of rows", 1791 | "sql": "SELECT 1 as violation\nFROM @table_name\nGROUP BY 1\nHAVING COUNT(*) < @min_rows:10", 1792 | "parameters": { 1793 | "min_rows": { 1794 | "type": "number", 1795 | "default": 10, 1796 | "description": "Minimum number of rows required" 1797 | } 1798 | } 1799 | }, 1800 | "column_not_negative": { 1801 | "type": "sql", 1802 | "level": "column", 1803 | "description": "Ensures numeric column has no negative values", 1804 | "sql": "SELECT @column_name\nFROM @table_name\nWHERE @column_name < 0", 1805 | "parameters": [] 1806 | } 1807 | }, 1808 | "singular_tests": { 1809 | "test_customers_email_format": { 1810 | "type": "sql", 1811 | "level": "table", 1812 | "description": "Validates email format for customers table", 1813 | "sql": "SELECT id, email\nFROM analytics.customers\nWHERE email NOT LIKE '%@%.%'", 1814 | "target_transformation": "analytics.customers" 1815 | } 1816 | } 1817 | } 1818 | ``` 1819 | 1820 |
1821 | 1822 | ### Module Using Test Library 1823 | 1824 |
1825 | YAML Format 1826 | 1827 | ```yaml 1828 | ots_version: "0.2.0" 1829 | module_name: "analytics_customers" 1830 | module_description: "Customer analytics transformations" 1831 | test_library_path: "../tests/test_library.yaml" 1832 | tags: ["analytics", "production"] 1833 | 1834 | target: 1835 | database: "warehouse" 1836 | schema: "analytics" 1837 | sql_dialect: "postgres" 1838 | 1839 | transformations: 1840 | - transformation_id: "analytics.customers" 1841 | description: "Customer data table" 1842 | transformation_type: "sql" 1843 | 1844 | code: 1845 | sql: 1846 | original_sql: "SELECT id, name, email, created_at, amount FROM source.customers WHERE active = true" 1847 | resolved_sql: "SELECT id, name, email, created_at, amount FROM warehouse.source.customers WHERE active = true" 1848 | source_tables: ["source.customers"] 1849 | source_functions: [] 1850 | 1851 | schema: 1852 | columns: 1853 | - name: "id" 1854 | datatype: "number" 1855 | description: "Unique customer identifier" 1856 | - name: "name" 1857 | datatype: "string" 1858 | description: "Customer name" 1859 | - name: "email" 1860 | datatype: "string" 1861 | description: "Customer email address" 1862 | - name: "created_at" 1863 | datatype: "date" 1864 | description: "Customer creation date" 1865 | - name: "amount" 1866 | datatype: "number" 1867 | description: "Customer account balance" 1868 | partitioning: [] 1869 | indexes: 1870 | - name: "idx_customers_id" 1871 | columns: ["id"] 1872 | - name: "idx_customers_email" 1873 | columns: ["email"] 1874 | 1875 | materialization: 1876 | type: "table" 1877 | 1878 | tests: 1879 | columns: 1880 | id: 1881 | - "not_null" # Standard test 1882 | - "unique" # Standard test (column-level) 1883 | email: 1884 | - "not_null" 1885 | - name: "accepted_values" # Standard test with params 1886 | params: 1887 | values: ["gmail.com", "yahoo.com", "company.com"] 1888 | amount: 1889 | - name: "column_not_negative" # Generic test from library 1890 | table: 1891 | - "row_count_gt_0" # Standard test 1892 | - "unique" # Standard test (table-level, checks all columns for row uniqueness) 1893 | - name: "check_minimum_rows" # Generic test with params 1894 | params: 1895 | min_rows: 100 1896 | - "test_customers_email_format" # Singular test from library 1897 | 1898 | metadata: 1899 | file_path: "/models/analytics/customers.sql" 1900 | owner: "data-team" 1901 | tags: ["customer", "core"] 1902 | object_tags: 1903 | sensitivity_tag: "pii" 1904 | classification: "internal" 1905 | ``` 1906 | 1907 |
1908 | 1909 |
1910 | JSON Format 1911 | 1912 | ```json 1913 | { 1914 | "ots_version": "0.2.0", 1915 | "module_name": "analytics_customers", 1916 | "module_description": "Customer analytics transformations", 1917 | "test_library_path": "../tests/test_library.yaml", 1918 | "tags": ["analytics", "production"], 1919 | "target": { 1920 | "database": "warehouse", 1921 | "schema": "analytics", 1922 | "sql_dialect": "postgres" 1923 | }, 1924 | "transformations": [ 1925 | { 1926 | "transformation_id": "analytics.customers", 1927 | "description": "Customer data table", 1928 | "transformation_type": "sql", 1929 | "code": { 1930 | "sql": { 1931 | "original_sql": "SELECT id, name, email, created_at, amount FROM source.customers WHERE active = true", 1932 | "resolved_sql": "SELECT id, name, email, created_at, amount FROM warehouse.source.customers WHERE active = true", 1933 | "source_tables": ["source.customers"], 1934 | "source_functions": [] 1935 | } 1936 | }, 1937 | "schema": { 1938 | "columns": [ 1939 | { 1940 | "name": "id", 1941 | "datatype": "number", 1942 | "description": "Unique customer identifier" 1943 | }, 1944 | { 1945 | "name": "name", 1946 | "datatype": "string", 1947 | "description": "Customer name" 1948 | }, 1949 | { 1950 | "name": "email", 1951 | "datatype": "string", 1952 | "description": "Customer email address" 1953 | }, 1954 | { 1955 | "name": "created_at", 1956 | "datatype": "date", 1957 | "description": "Customer creation date" 1958 | }, 1959 | { 1960 | "name": "amount", 1961 | "datatype": "number", 1962 | "description": "Customer account balance" 1963 | } 1964 | ], 1965 | "partitioning": [], 1966 | "indexes": [ 1967 | { 1968 | "name": "idx_customers_id", 1969 | "columns": ["id"] 1970 | }, 1971 | { 1972 | "name": "idx_customers_email", 1973 | "columns": ["email"] 1974 | } 1975 | ] 1976 | }, 1977 | "materialization": { 1978 | "type": "table" 1979 | }, 1980 | "tests": { 1981 | "columns": { 1982 | "id": ["not_null", "unique"], 1983 | "email": [ 1984 | "not_null", 1985 | { 1986 | "name": "accepted_values", 1987 | "params": { 1988 | "values": ["gmail.com", "yahoo.com", "company.com"] 1989 | } 1990 | } 1991 | ], 1992 | "amount": [ 1993 | { 1994 | "name": "column_not_negative" 1995 | } 1996 | ] 1997 | }, 1998 | "table": [ 1999 | "row_count_gt_0", 2000 | "unique", 2001 | { 2002 | "name": "check_minimum_rows", 2003 | "params": { 2004 | "min_rows": 100 2005 | } 2006 | }, 2007 | "test_customers_email_format" 2008 | ] 2009 | }, 2010 | "metadata": { 2011 | "file_path": "/models/analytics/customers.sql", 2012 | "owner": "data-team", 2013 | "tags": ["customer", "core"], 2014 | "object_tags": { 2015 | "sensitivity_tag": "pii", 2016 | "classification": "internal" 2017 | } 2018 | } 2019 | } 2020 | ] 2021 | } 2022 | ``` 2023 | 2024 |
2025 | --------------------------------------------------------------------------------