├── data_quality.dbc
├── README.md
└── data_quality
    └── index.html

/data_quality.dbc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/richchad/data_quality_databricks/HEAD/data_quality.dbc

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Examples of data quality processes implemented in Databricks

This repository contains a collection of Databricks notebooks that demonstrate configurable data quality processes implemented in Databricks using Python and SQL.

The processes cover data quality and data product management. They include methods for automating the maintenance of a data dictionary, refining a data model (comments and column positions), executing data quality tests, blocking bad-quality data, and mapping values.

The repository contains an HTML version of each notebook that can be viewed in a browser, and a DBC archive that can be imported into a Databricks workspace. Execute Run All on the notebooks in their numbered order to reproduce the demo in your own workspace.

### Notebooks
1. Create sample data using Databricks data sets.
2. Create data dictionary tables.
3. Update data dictionaries using metastore data.
4. Refine data model: comment and reorder columns.
5. Configure data quality tests.
6. Execute data quality tests.
7. Block bad-quality data.
8. Map local values to global ones.
9. Clean up (drop all tables created during the demo).

--------------------------------------------------------------------------------
/data_quality/index.html:
--------------------------------------------------------------------------------