├── LICENSE
├── README.md
├── data_dictionary_scripts.md
├── data_validation_scripts.md
├── greenplum
│   ├── data_dictionary
│   │   └── data_dict_dump.sql
│   └── schemadiff
│       └── simple_schema_dump.sql
├── img
│   ├── 01_schemadiff_side_by_side.png
│   ├── 02_data_dictionary_in_xl.png
│   ├── 03_data_dictionary_edit_and_run_script.png
│   ├── 04_data_validation_scripts.png
│   └── db_icons
│       ├── gp_icon.png
│       ├── mssql_icon.png
│       ├── mysql_icon.png
│       ├── oracle_icon.png
│       └── pgsql_icon1.png
├── mssql
│   ├── data_dictionary
│   │   └── data_dict_dump.sql
│   ├── data_validation
│   │   ├── best_practices.md
│   │   ├── demo_data
│   │   │   ├── demo_hr_01_create_database.sql
│   │   │   ├── demo_hr_02_create_tables.sql
│   │   │   └── demo_hr_03_populate_tables.sql
│   │   ├── how_to_use_advanced_script.md
│   │   ├── how_to_use_basic_script.md
│   │   ├── img
│   │   │   ├── 01_data_val_mssql_example_test_case_sql_code.png
│   │   │   ├── 02_data_val_mssql_run_results1.png
│   │   │   ├── 03_data_val_mssql_adv_test_case_ex.png
│   │   │   └── 04_data_val_mssql_run_results_adv.png
│   │   ├── rule_set_01__row_counts.md
│   │   ├── rule_set_02__keys.md
│   │   ├── rule_set_03__heuristic_thresholds.md
│   │   ├── rule_set_04__numeric_values.md
│   │   ├── rule_set_05__date_values.md
│   │   ├── rule_set_06__text_values.md
│   │   ├── rule_set_07__regular_expressions.md
│   │   ├── rule_set_08__diff_checks.md
│   │   ├── rule_set_09__defect_regression.md
│   │   └── sql_scripts
│   │       ├── dv_advanced_test_cases.sql
│   │       ├── dv_basic_test_cases.sql
│   │       └── dv_test_case_snippets.sql
│   └── schemadiff
│       └── simple_schema_dump.sql
├── mysql
│   ├── data_dictionary
│   │   └── data_dict_dump.sql
│   ├── data_validation
│   │   ├── best_practices.md
│   │   ├── demo_data
│   │   │   ├── demo_hr_01_create_database.sql
│   │   │   ├── demo_hr_02_create_tables.sql
│   │   │   └── demo_hr_03_populate_tables.sql
│   │   ├── how_to_use_advanced_script.md
│   │   ├── how_to_use_basic_script.md
│   │   ├── img
│   │   │   ├── 01_data_val_mysql_example_test_case_sql_code.png
│   │   │   ├── 02_data_val_mysql_run_results.png
│   │   │   ├── 03_data_val_mysql_adv_test_case.png
│   │   │   └── 04_data_val_mysql_run_results_adv.png
│   │   ├── rule_set_01__row_counts.md
│   │   ├── rule_set_02__keys.md
│   │   ├── rule_set_03__heuristic_thresholds.md
│   │   ├── rule_set_04__numeric_values.md
│   │   ├── rule_set_05__date_values.md
│   │   ├── rule_set_06__text_values.md
│   │   ├── rule_set_07__regular_expressions.md
│   │   ├── rule_set_08__diff_checks.md
│   │   ├── rule_set_09__defect_regression.md
│   │   └── sql_scripts
│   │       ├── dv_advanced_test_cases.sql
│   │       ├── dv_basic_test_cases.sql
│   │       └── dv_test_case_snippets.sql
│   └── schemadiff
│       └── simple_schema_dump.sql
├── oracle
│   ├── data_dictionary
│   │   └── data_dict_dump.sql
│   ├── data_validation
│   │   ├── best_practices.md
│   │   ├── demo_data
│   │   │   ├── demo_hr_01_create_tables.sql
│   │   │   └── demo_hr_02_populate_tables.sql
│   │   ├── how_to_use_advanced_script.md
│   │   ├── how_to_use_basic_script.md
│   │   ├── img
│   │   │   ├── 01_data_val_oracle_example_test_case_sql_code.png
│   │   │   ├── 02_data_val_oracle_run_results1.png
│   │   │   ├── 03_data_val_oracle_adv_test_case_ex.png
│   │   │   └── 04_data_val_oracle_run_results_adv.png
│   │   ├── rule_set_01__row_counts.md
│   │   ├── rule_set_02__keys.md
│   │   ├── rule_set_03__heuristic_thresholds.md
│   │   ├── rule_set_04__numeric_values.md
│   │   ├── rule_set_05__date_values.md
│   │   ├── rule_set_06__text_values.md
│   │   ├── rule_set_07__regular_expressions.md
│   │   ├── rule_set_08__diff_checks.md
│   │   ├── rule_set_09__defect_regression.md
│   │   └── sql_scripts
│   │       ├── dv_advanced_01_setup.sql
│   │       ├── dv_advanced_02_test_cases.sql
│   │       ├── dv_basic_test_cases.sql
│   │       └── dv_test_case_snippets.sql
│   └── schemadiff
│       └── simple_schema_dump.sql
├── postgresql
│   ├── data_dictionary
│   │   └── data_dict_dump.sql
│   ├── data_validation
│   │   ├── best_practices.md
│   │   ├── demo_data
│   │   │   ├── demo_hr_01_create_database.sql
│   │   │   ├── demo_hr_02_create_tables.sql
│   │   │   └── demo_hr_03_populate_tables.sql
│   │   ├── how_to_use_advanced_script.md
│   │   ├── how_to_use_basic_script.md
│   │   ├── img
│   │   │   ├── 01_data_val_postgresql_example_test_case_sql_code.png
│   │   │   ├── 03_data_val_postgresql_adv_test_case_ex2.png
│   │   │   └── 04_data_val_postgresql_run_results_adv.png
│   │   ├── rule_set_01__row_counts.md
│   │   ├── rule_set_02__keys.md
│   │   ├── rule_set_03__heuristic_thresholds.md
│   │   ├── rule_set_04__numeric_values.md
│   │   ├── rule_set_05__date_values.md
│   │   ├── rule_set_06__text_values.md
│   │   ├── rule_set_07__regular_expressions.md
│   │   ├── rule_set_08__diff_checks.md
│   │   ├── rule_set_09__defect_regression.md
│   │   └── sql_scripts
│   │       ├── dv_advanced_test_cases.sql
│   │       ├── dv_basic_test_cases.sql
│   │       └── dv_test_case_snippets.sql
│   └── schemadiff
│       └── simple_schema_dump.sql
└── schemadiff_scripts.md
/LICENSE:
--------------------------------------------------------------------------------
1 | Creative Commons Legal Code
2 |
3 | CC0 1.0 Universal
4 |
5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
6 | LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
7 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
8 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
9 | REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
10 | PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
11 | THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
12 | HEREUNDER.
13 |
14 | Statement of Purpose
15 |
16 | The laws of most jurisdictions throughout the world automatically confer
17 | exclusive Copyright and Related Rights (defined below) upon the creator
18 | and subsequent owner(s) (each and all, an "owner") of an original work of
19 | authorship and/or a database (each, a "Work").
20 |
21 | Certain owners wish to permanently relinquish those rights to a Work for
22 | the purpose of contributing to a commons of creative, cultural and
23 | scientific works ("Commons") that the public can reliably and without fear
24 | of later claims of infringement build upon, modify, incorporate in other
25 | works, reuse and redistribute as freely as possible in any form whatsoever
26 | and for any purposes, including without limitation commercial purposes.
27 | These owners may contribute to the Commons to promote the ideal of a free
28 | culture and the further production of creative, cultural and scientific
29 | works, or to gain reputation or greater distribution for their Work in
30 | part through the use and efforts of others.
31 |
32 | For these and/or other purposes and motivations, and without any
33 | expectation of additional consideration or compensation, the person
34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she
35 | is an owner of Copyright and Related Rights in the Work, voluntarily
36 | elects to apply CC0 to the Work and publicly distribute the Work under its
37 | terms, with knowledge of his or her Copyright and Related Rights in the
38 | Work and the meaning and intended legal effect of CC0 on those rights.
39 |
40 | 1. Copyright and Related Rights. A Work made available under CC0 may be
41 | protected by copyright and related or neighboring rights ("Copyright and
42 | Related Rights"). Copyright and Related Rights include, but are not
43 | limited to, the following:
44 |
45 | i. the right to reproduce, adapt, distribute, perform, display,
46 | communicate, and translate a Work;
47 | ii. moral rights retained by the original author(s) and/or performer(s);
48 | iii. publicity and privacy rights pertaining to a person's image or
49 | likeness depicted in a Work;
50 | iv. rights protecting against unfair competition in regards to a Work,
51 | subject to the limitations in paragraph 4(a), below;
52 | v. rights protecting the extraction, dissemination, use and reuse of data
53 | in a Work;
54 | vi. database rights (such as those arising under Directive 96/9/EC of the
55 | European Parliament and of the Council of 11 March 1996 on the legal
56 | protection of databases, and under any national implementation
57 | thereof, including any amended or successor version of such
58 | directive); and
59 | vii. other similar, equivalent or corresponding rights throughout the
60 | world based on applicable law or treaty, and any national
61 | implementations thereof.
62 |
63 | 2. Waiver. To the greatest extent permitted by, but not in contravention
64 | of, applicable law, Affirmer hereby overtly, fully, permanently,
65 | irrevocably and unconditionally waives, abandons, and surrenders all of
66 | Affirmer's Copyright and Related Rights and associated claims and causes
67 | of action, whether now known or unknown (including existing as well as
68 | future claims and causes of action), in the Work (i) in all territories
69 | worldwide, (ii) for the maximum duration provided by applicable law or
70 | treaty (including future time extensions), (iii) in any current or future
71 | medium and for any number of copies, and (iv) for any purpose whatsoever,
72 | including without limitation commercial, advertising or promotional
73 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
74 | member of the public at large and to the detriment of Affirmer's heirs and
75 | successors, fully intending that such Waiver shall not be subject to
76 | revocation, rescission, cancellation, termination, or any other legal or
77 | equitable action to disrupt the quiet enjoyment of the Work by the public
78 | as contemplated by Affirmer's express Statement of Purpose.
79 |
80 | 3. Public License Fallback. Should any part of the Waiver for any reason
81 | be judged legally invalid or ineffective under applicable law, then the
82 | Waiver shall be preserved to the maximum extent permitted taking into
83 | account Affirmer's express Statement of Purpose. In addition, to the
84 | extent the Waiver is so judged Affirmer hereby grants to each affected
85 | person a royalty-free, non transferable, non sublicensable, non exclusive,
86 | irrevocable and unconditional license to exercise Affirmer's Copyright and
87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the
88 | maximum duration provided by applicable law or treaty (including future
89 | time extensions), (iii) in any current or future medium and for any number
90 | of copies, and (iv) for any purpose whatsoever, including without
91 | limitation commercial, advertising or promotional purposes (the
92 | "License"). The License shall be deemed effective as of the date CC0 was
93 | applied by Affirmer to the Work. Should any part of the License for any
94 | reason be judged legally invalid or ineffective under applicable law, such
95 | partial invalidity or ineffectiveness shall not invalidate the remainder
96 | of the License, and in such case Affirmer hereby affirms that he or she
97 | will not (i) exercise any of his or her remaining Copyright and Related
98 | Rights in the Work or (ii) assert any associated claims and causes of
99 | action with respect to the Work, in either case contrary to Affirmer's
100 | express Statement of Purpose.
101 |
102 | 4. Limitations and Disclaimers.
103 |
104 | a. No trademark or patent rights held by Affirmer are waived, abandoned,
105 | surrendered, licensed or otherwise affected by this document.
106 | b. Affirmer offers the Work as-is and makes no representations or
107 | warranties of any kind concerning the Work, express, implied,
108 | statutory or otherwise, including without limitation warranties of
109 | title, merchantability, fitness for a particular purpose, non
110 | infringement, or the absence of latent or other defects, accuracy, or
111 | the presence or absence of errors, whether or not discoverable, all to
112 | the greatest extent permissible under applicable law.
113 | c. Affirmer disclaims responsibility for clearing rights of other persons
114 | that may apply to the Work or any use thereof, including without
115 | limitation any person's Copyright and Related Rights in the Work.
116 | Further, Affirmer disclaims responsibility for obtaining any necessary
117 | consents, permissions or other rights required for any use of the
118 | Work.
119 | d. Affirmer understands and acknowledges that Creative Commons is not a
120 | party to this document and has no duty or obligation with respect to
121 | this CC0 or use of the Work.
122 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SQL Scripts
2 | Useful SQL scripts for MSSQL, MySQL, Oracle, PostgreSQL, and Greenplum, refined over years of use. They are common scripts used by data analysts, software testers, and other database professionals.
3 |
4 |
5 |
6 |
7 | ## [Data Dictionary](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_dictionary_scripts.md)
8 |
9 |
10 |
11 | The Data Dictionary script and tutorials enable you to easily document an existing database schema. You can dump the tables, views, column descriptions, data types/lengths/sizes/precision, key constraints, and other information. Export to Excel for pretty output and simple filtering, searching, and sharing.
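The workflow the Data Dictionary scripts automate can be sketched in a few lines. This is a minimal illustration in Python using `sqlite3` as a stand-in database (the repo's actual scripts target the platforms above via their metadata views); the table and file names are hypothetical.

```python
# Sketch of the data-dictionary workflow: query column metadata from the
# database, then write it to a CSV file that Excel can open for
# filtering, searching, and sharing.  sqlite3 stands in for a real server.
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY, name TEXT NOT NULL, hire_date TEXT)"
)

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
rows = []
for cid, name, col_type, notnull, default, pk in conn.execute(
        "PRAGMA table_info(employees)"):
    rows.append({
        "table_nm": "employees",
        "ord_pos": cid + 1,
        "column_nm": name,
        "data_typ": col_type,
        "nullable": "NOT NULL" if notnull else "NULL",
        "is_key": "PK" if pk else "",
    })

# Dump to CSV for a quick "export to Excel" step.
with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

The real scripts produce the same shape of output (one row per column, with ordinal position, data type, nullability, and key flags) directly from each platform's catalog views.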
12 |
13 |
14 |
15 |
16 |
17 |
18 | ## [Data Validation Scripts](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
19 |
20 |
21 |
22 | The Data Validation Framework (DVF) scripts and tutorials enable you to easily stand up a battery of automated data validation tests in your environment. Use the DVF for production checkouts, test and stage regression suites, dev unit tests, or automated data validation after each data load. Nine rule sets with 66 sample test cases demonstrate how to test row counts, keys, heuristic thresholds, numeric/date/text values, regular expressions, and data or schema diffs. The basic data validation script executes all 66 sample tests, emitting one line of text per test with the test id, status, and test description -- a nice, simple way to organize your tests. The advanced data validation scripts execute the same 66 sample tests, but push output to a table and add the execution time in seconds, plus supporting detail rows on failures with the rejection code and reason, the expected value, the actual value, and the SQL to look up the rejected row so you can copy, paste, execute, and troubleshoot.
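The "one line per test" pattern the basic script uses can be sketched as follows. This is an illustrative Python/`sqlite3` stand-in, not the repo's actual script; the table, test id, and expected count are hypothetical.

```python
# Sketch of the basic validation pattern: each test case is a SQL check
# that returns 'P' (pass) or 'FAIL', and the harness emits one line per
# test with the test id, status, and description.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "Accounting"), (2, "Research"), (3, "Sales")])

def run_test(test_id, description, sql, conn):
    """Run one validation query; the query itself decides pass/fail."""
    status = conn.execute(sql).fetchone()[0]
    return f"{test_id} | {status} | {description}"

# Rule Set 1 style check (row counts): verify the expected number of rows.
line = run_test(
    "T001", "Verify row count of departments table",
    "SELECT CASE WHEN COUNT(*) = 3 THEN 'P' ELSE 'FAIL' END FROM departments",
    conn)
print(line)  # e.g. T001 | P | Verify row count of departments table
```

The advanced scripts follow the same idea but insert each result into a results table, with timing and per-failure detail rows instead of a single text line.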
23 |
24 |
25 |
26 | ## [SchemaDiff](https://github.com/DataResearchLabs/sql_scripts/blob/main/schemadiff_scripts.md)
27 |
28 |
29 |
30 | The SchemaDiff script and tutorials enable you to track changes to your schema over time or between environments. You'll know exactly what last night's deployment changed versus the prior night's stable baseline. You'll catch changes to your development or test environment the next morning rather than 15 days later, avoiding troubleshooting red herrings.
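The workflow above boils down to: dump the schema to plain text, snapshot it periodically, and diff the snapshots. A minimal sketch using Python's `difflib` (the snapshot lines below are hypothetical; in practice they come from running `simple_schema_dump.sql` and saving the output):

```python
# Sketch of the schemadiff workflow: compare two plain-text schema
# snapshots and isolate what changed between them.
import difflib

baseline = [
    "employees | Column | emp_id | 001 | INTEGER | NOT NULL",
    "employees | Column | emp_name | 002 | TEXT | NOT NULL",
]
# Overnight, a deployment added a column; the new snapshot picks it up.
current = baseline + [
    "employees | Column | hire_date | 003 | TEXT | NULL",
]

diff = list(difflib.unified_diff(baseline, current,
                                 "baseline.txt", "current.txt",
                                 lineterm=""))
print("\n".join(diff))
```

Any free text-diff tool (DiffMerge, ExamDiff, etc.) over the two snapshot files accomplishes the same thing interactively.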
31 |
32 |
33 |
34 |
35 |
36 |
37 | ***If you like these scripts, please be sure to click the "Star" button above in GitHub.***
38 |
39 | ***Also, be sure to visit or subscribe to our YouTube channel*** www.DataResearchLabs.com!
40 |
41 |
42 |
--------------------------------------------------------------------------------
/greenplum/data_dictionary/data_dict_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Data Dictionary Dump:
3 | -- This SQL script will dump table, column, key, and description design related
4 | -- metadata so that you can copy-paste or export to Excel as a Data Dictionary.
5 | ------------------------------------------------------------------------------------
6 | -- Platform: Greenplum
7 | -- Author: DataResearchLabs
8 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
9 | -- YouTube Tutorials: https://www.youtube.com/channel/UCQciXv3xaBykeUFc04GxSXA
10 | ----------------------------------------------------------------------------------
11 | WITH vars
12 | AS (
13 | SELECT
14 | 'acq_cms' :: CHARACTER VARYING(50) AS v_SchemaName -- <<<<<<<<<<<<<<<< Set to the schema you wish to document
15 | , 'NO' :: CHARACTER VARYING(10) AS v_TablesOnly -- YES=Limit To Tables only; NO=Include views too
16 | )
17 |
18 | , baseTbl
19 | AS (
20 | SELECT table_schema AS SchemaName
21 | , table_catalog
22 | , table_type, table_name, table_schema
23 | FROM INFORMATION_SCHEMA.TABLES
24 | WHERE TABLE_SCHEMA = (SELECT v_SchemaName FROM vars)
25 | AND ( (TABLE_TYPE = 'BASE TABLE')
26 | OR ((SELECT v_TablesOnly FROM vars) = 'NO')
27 | )
28 | )
29 |
30 | , metadata
31 | AS (
32 | SELECT
33 | bt.SchemaName AS schema_nm
34 | , bt.table_name AS table_nm
35 | , CASE WHEN bt.TABLE_TYPE = 'BASE TABLE' THEN 'TBL'
36 | WHEN bt.TABLE_TYPE = 'VIEW' THEN 'VW'
37 | ELSE 'UK'
38 | END AS obj_typ
39 | , tut.ordinal_position AS ord_pos
40 | , tut.column_name AS column_nm
41 | , tut.data_type ||
42 | CASE WHEN tut.data_type IN('varchar','char') THEN '(' || tut.CHARACTER_MAXIMUM_LENGTH || ')'
43 | WHEN tut.data_type IN('date','time') THEN '(3)'
44 | WHEN tut.data_type = 'datetime' THEN '(8)'
45 | WHEN tut.data_type = 'timestamp' THEN '(4)'
46 | WHEN tut.data_type in('bigint','int8','integer','int','int4','smallint','int2') THEN '(' || tut.NUMERIC_PRECISION || ')'
47 | WHEN tut.data_type = 'decimal' THEN '(' || tut.NUMERIC_PRECISION || ',' || tut.NUMERIC_SCALE || ')'
48 | WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN '(' || tut.CHARACTER_MAXIMUM_LENGTH || ')'
49 | WHEN tut.DATETIME_PRECISION IS NOT NULL THEN '(' || tut.DATETIME_PRECISION || ')'
50 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
51 | AND tut.NUMERIC_SCALE IS NULL THEN '(' || tut.NUMERIC_PRECISION || ')'
52 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
53 | AND tut.NUMERIC_SCALE IS NOT NULL THEN '(' || tut.NUMERIC_PRECISION || ',' || tut.NUMERIC_SCALE || ')'
54 | ELSE ''
55 | END AS data_typ
56 | , CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END AS nullable
57 | FROM INFORMATION_SCHEMA.COLUMNS tut
58 | FROM INFORMATION_SCHEMA.COLUMNS tut
59 | INNER JOIN baseTbl bt ON bt.table_catalog = tut.TABLE_CATALOG AND bt.table_schema = tut.TABLE_SCHEMA AND bt.table_name = tut.table_name
59 | )
60 |
61 |
62 | , meta_for_keys
63 | AS (
64 | SELECT schema_nm, table_nm
65 | , COALESCE(distribution_keys,'DISTRIBUTED RANDOMLY') :: CHARACTER VARYING(255) AS column_nm
66 | , 'DK' :: CHARACTER VARYING(10) AS is_key
67 | FROM
68 | ( SELECT pgn.nspname AS schema_nm,
69 | pgc.relname AS table_nm,
70 | pga.attname AS distribution_keys
71 | FROM
72 | ( SELECT gdp.localoid
73 | , CASE WHEN ( Array_upper(gdp.attrnums, 1) > 0 ) THEN Unnest(gdp.attrnums)
74 | ELSE NULL
75 | END As attnum
76 | FROM gp_distribution_policy gdp
77 | ORDER BY gdp.localoid
78 | ) AS distrokey
79 | INNER JOIN pg_class pgc ON distrokey.localoid = pgc.oid
80 | INNER JOIN pg_namespace pgn ON pgc.relnamespace = pgn.oid
81 | LEFT OUTER JOIN pg_attribute pga ON distrokey.attnum = pga.attnum AND distrokey.localoid = pga.attrelid
82 | WHERE pgn.nspname = (SELECT v_SchemaName FROM vars)
83 | AND pgc.relname IN(SELECT DISTINCT table_name FROM baseTbl)
84 | ) AS a
85 | )
86 |
87 | , col_descr
88 | AS (
89 | SELECT schema_nm
90 | , REPLACE(table_nm, schema_nm || '.', '') AS table_nm
91 | , column_nm, column_descr
92 | FROM (
93 | SELECT
94 | ns.nspname AS schema_nm
95 | , (a.attrelid :: regclass) :: VARCHAR(150) AS table_nm
96 | , a.attname AS column_nm
97 | , d.description AS column_descr
98 | FROM pg_catalog.pg_attribute a
99 | LEFT JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
100 | LEFT JOIN pg_catalog.pg_namespace ns ON c.relnamespace = ns.oid
101 | LEFT JOIN pg_catalog.pg_description d ON d.objoid = a.attrelid AND d.objsubid = a.attnum
102 | WHERE a.attnum > 0
103 | AND NOT a.attisdropped
104 | ) t
105 | WHERE schema_nm = (SELECT v_SchemaName FROM vars)
106 | AND REPLACE(table_nm, schema_nm || '.', '') IN(SELECT DISTINCT table_name FROM baseTbl)
107 | )
108 |
109 | , combined_data
110 | AS (
111 | SELECT md.schema_nm
112 | , md.table_nm
113 | , md.obj_typ
114 | , md.ord_pos AS ord
115 | , COALESCE(dk.is_key, ' ') AS is_key
116 | , md.column_nm
117 | , md.data_typ
118 | , md.nullable
119 | , cd.column_descr
120 | FROM metadata md
121 | LEFT JOIN col_descr cd ON cd.schema_nm = md.schema_nm AND cd.table_nm = md.table_nm AND cd.column_nm = md.column_nm
122 | LEFT JOIN meta_for_keys dk ON dk.schema_nm = md.schema_nm AND dk.table_nm = md.table_nm AND dk.column_nm = md.column_nm
123 | )
124 |
125 | SELECT *
126 | FROM combined_data
127 | ORDER BY schema_nm, table_nm, ord
--------------------------------------------------------------------------------
/greenplum/schemadiff/simple_schema_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Simple Schema Dump:
3 | -- This SQL script will dump table, column, key, and index design related metadata
4 | -- so that you can copy-paste or export to a text file.
5 | -- Even better, you can make other snapshots over time (same database schema at earlier
6 | -- points in time), OR in different environments (DEV, PPMO, STAGE, PROD). Then,
7 | -- using your favorite free Text File Diff Tool (DiffMerge, ExamDiff, etc.) you
8 | -- can compare snapshots to quickly isolate and identify what changed over time
9 | -- or is different between environments.
10 | ------------------------------------------------------------------------------------
11 | -- Platform: Greenplum Server
12 | -- Author: DataResearchLabs
13 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
14 | -- YouTube Tutorials: https://www.youtube.com/playlist?list=PLVHoUDdbskUSWcPnDztPhXOnQT1R-rg0t
15 | ----------------------------------------------------------------------------------
16 | WITH vars
17 | AS (
18 | SELECT 'acq_cms' :: CHARACTER VARYING(50) AS v_SchemaName
19 | )
20 |
21 |
22 | , baseTbl
23 | AS (
24 | SELECT table_schema AS SchemaName, table_type, table_name
25 | FROM INFORMATION_SCHEMA.TABLES
26 | WHERE table_schema = (SELECT v_SchemaName FROM vars)
27 | )
28 |
29 |
30 | , metaForTbl
31 | AS (
32 | SELECT SchemaName
33 | , table_name AS TableName
34 | , '(' || CASE WHEN table_type = 'BASE TABLE' THEN 'Table'
35 | WHEN table_type = 'VIEW' THEN 'View'
36 | ELSE 'UK'
37 | END
38 | || ')' :: CHARACTER VARYING(20) AS ObjectType
39 | , table_name AS ObjectName
40 | , '(Exists)' :: CHARACTER VARYING(20) AS PropertyName
41 | , '' :: CHARACTER VARYING(50) AS PropertyValue
42 | FROM baseTbl
43 | )
44 |
45 | , metaForCol_dataType
46 | AS (
47 | SELECT ft.SchemaName, ft.table_name AS TableName
48 | , 'Column' :: CHARACTER VARYING(15) AS ObjectType
49 | , tut.column_name AS ObjectName
50 | , '2' :: CHARACTER VARYING(10) AS PropertyName
51 | , COALESCE(tut.data_type, 'unknown')
52 | || CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL
53 | OR tut.NUMERIC_PRECISION IS NOT NULL
54 | OR tut.NUMERIC_SCALE IS NOT NULL THEN '('
55 | ELSE ''
56 | END
57 | || CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10))
58 | ELSE ''
59 | END
60 | || CASE WHEN tut.DATA_TYPE IN('date','datetime','timestamp') THEN ''
61 | WHEN tut.NUMERIC_PRECISION IS NULL THEN ''
62 | ELSE CAST(tut.NUMERIC_PRECISION AS VARCHAR(10))
63 | END
64 | || CASE WHEN tut.NUMERIC_SCALE IS NULL THEN ''
65 | WHEN tut.NUMERIC_SCALE >0 THEN ',' || CAST(tut.NUMERIC_SCALE AS VARCHAR(10))
66 | ELSE ''
67 | END
68 | || CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL
69 | OR tut.NUMERIC_PRECISION IS NOT NULL
70 | OR tut.NUMERIC_SCALE IS NOT NULL THEN ')'
71 | ELSE ''
72 | END
73 | :: CHARACTER VARYING(255) AS PropertyValue
74 | FROM INFORMATION_SCHEMA.COLUMNS tut
75 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.TABLE_NAME = tut.TABLE_NAME
76 | )
77 |
78 |
79 | , metaForCol_nullable
80 | AS (
81 | SELECT ft.SchemaName, ft.table_name AS TableName
82 | , 'Column' :: CHARACTER VARYING(20) AS ObjectType
83 | , tut.column_name AS ObjectName
84 | , '3' :: CHARACTER VARYING(20) AS PropertyName
85 | , CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END :: CHARACTER VARYING(20) AS PropertyValue
86 | FROM INFORMATION_SCHEMA.COLUMNS tut
87 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.table_name = tut.table_name
88 | )
89 |
90 |
91 | , metaForCol_ordpos
92 | AS (
93 | SELECT ft.SchemaName, ft.table_name AS TableName
94 | , 'Column' :: CHARACTER VARYING(20) AS ObjectType
95 | , tut.column_name AS ObjectName
96 | , '1' :: CHARACTER VARYING(20) AS PropertyName
97 | , CASE WHEN tut.ORDINAL_POSITION IS NULL THEN ''
98 | ELSE LPAD( CAST(tut.ORDINAL_POSITION AS VARCHAR(3)), 3, '0')
99 | END AS PropertyValue
100 | FROM INFORMATION_SCHEMA.COLUMNS tut
101 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.table_name = tut.table_name
102 | )
103 |
104 |
105 | , metaAllCols
106 | AS (
107 | SELECT schemaname, tablename, objecttype, objectname
108 | , 'Properties' :: CHARACTER VARYING(20) AS propertyname
109 | , STRING_AGG(propertyvalue, ' | ' ORDER BY propertyname, propertyvalue) :: CHARACTER VARYING(255) AS propertyvalue
110 | FROM (
111 | SELECT * FROM metaForCol_dataType
112 | UNION SELECT * FROM metaForCol_nullable
113 | UNION SELECT * FROM metaForCol_ordpos
114 | ) t
115 | GROUP BY schemaname, tablename, objecttype, objectname
116 | )
117 |
118 |
119 | , metaForKeys
120 | AS (
121 | SELECT SchemaName, TableName
122 | , 'DistribKey' :: CHARACTER VARYING(20) AS ObjectType
123 | , 'n/a' :: CHARACTER VARYING(10) AS ObjectName
124 | , 'FieldList' :: CHARACTER VARYING(10) AS PropertyName
125 | , COALESCE(distribution_keys,'DISTRIBUTED RANDOMLY') :: CHARACTER VARYING(255) AS PropertyValue
126 | FROM
127 | ( SELECT pgn.nspname AS SchemaName,
128 | pgc.relname AS TableName,
129 | pga.attname AS distribution_keys
130 | FROM
131 | ( SELECT gdp.localoid
132 | , CASE WHEN ( Array_upper(gdp.attrnums, 1) > 0 ) THEN Unnest(gdp.attrnums)
133 | ELSE NULL
134 | END As attnum
135 | FROM gp_distribution_policy gdp
136 | ORDER BY gdp.localoid
137 | ) AS distrokey
138 | INNER JOIN pg_class pgc ON distrokey.localoid = pgc.oid
139 | INNER JOIN pg_namespace pgn ON pgc.relnamespace = pgn.oid
140 | LEFT OUTER JOIN pg_attribute pga ON distrokey.attnum = pga.attnum AND distrokey.localoid = pga.attrelid
141 | WHERE pgn.nspname = (SELECT v_SchemaName FROM vars)
142 | AND pgc.relname IN(SELECT DISTINCT table_name FROM baseTbl)
143 | ) AS a
144 | )
145 |
146 |
147 | , metaForIdxs
148 | AS (
149 | SELECT (SELECT v_SchemaName FROM vars) AS SchemaName
150 | , tablename AS TableName
151 | , 'Index' :: CHARACTER VARYING(20) AS ObjectType
152 | , indexname AS ObjectName
153 | , 'FieldList' :: CHARACTER VARYING(20) AS PropertyName
154 | , REPLACE(SUBSTR(indexdef, POSITION('(' IN indexdef)+ 1), ')', '') :: CHARACTER VARYING(255) AS PropertyValue
155 | FROM pg_catalog.pg_indexes
156 | WHERE schemaname = (SELECT v_SchemaName FROM vars)
157 | AND tablename IN(SELECT DISTINCT table_name FROM baseTbl)
158 | )
159 |
160 |
161 | , allMetadata
162 | AS (
163 | SELECT * FROM metaAllCols
164 | UNION SELECT * FROM metaForTbl /* not first, b/c propertyvalue column all nulls = no data type default = error */
165 | UNION SELECT * FROM metaForKeys
166 | UNION SELECT * FROM metaForIdxs
167 | )
168 |
169 |
170 | SELECT CASE WHEN objecttype IN('(Table)','(View)') THEN schemaname ELSE '' END AS schema_nm
171 | , CASE WHEN objecttype IN('(Table)','(View)') THEN tablename ELSE '' END AS tbl_nm
172 | , objecttype AS obj_typ, objectname AS obj_nm, /*propertyname,*/ propertyvalue AS properties
173 | FROM allMetadata
174 | ORDER BY schemaname, tablename, objecttype
175 | , CASE WHEN objecttype='Column' THEN propertyvalue ELSE '' END
176 | , objectname, propertyname
177 |
--------------------------------------------------------------------------------
/img/01_schemadiff_side_by_side.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/01_schemadiff_side_by_side.png
--------------------------------------------------------------------------------
/img/02_data_dictionary_in_xl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/02_data_dictionary_in_xl.png
--------------------------------------------------------------------------------
/img/03_data_dictionary_edit_and_run_script.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/03_data_dictionary_edit_and_run_script.png
--------------------------------------------------------------------------------
/img/04_data_validation_scripts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/04_data_validation_scripts.png
--------------------------------------------------------------------------------
/img/db_icons/gp_icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/db_icons/gp_icon.png
--------------------------------------------------------------------------------
/img/db_icons/mssql_icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/db_icons/mssql_icon.png
--------------------------------------------------------------------------------
/img/db_icons/mysql_icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/db_icons/mysql_icon.png
--------------------------------------------------------------------------------
/img/db_icons/oracle_icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/db_icons/oracle_icon.png
--------------------------------------------------------------------------------
/img/db_icons/pgsql_icon1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/img/db_icons/pgsql_icon1.png
--------------------------------------------------------------------------------
/mssql/data_dictionary/data_dict_dump.sql:
--------------------------------------------------------------------------------
1 |
2 | ------------------------------------------------------------------------------------
3 | -- Data Dictionary Dump:
4 | -- This SQL script will dump table, column, key, and description design related
5 | -- metadata so that you can copy-paste or export to Excel as a Data Dictionary.
6 | ------------------------------------------------------------------------------------
7 | -- Platform: SQL Server
8 | -- Author: DataResearchLabs
9 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
10 | -- GitHub Tool: https://github.com/DataResearchLabs/data_analysts_toolbox/blob/main/data_dictionary_generator/readme.md
11 | -- YouTube Tutorials: https://www.youtube.com/channel/UCQciXv3xaBykeUFc04GxSXA
12 | ----------------------------------------------------------------------------------
13 |
14 | -- IMPORTANT
15 | USE AdventureWorksLT2019;  -- <<<<<<<<<<<< CHANGE THIS VALUE to the database you want to dump
16 | ;
17 |
18 | -- All variables are consolidated here in the first CTE (Common Table Expression)
19 | -- Each given row is a variable, with the value you change preceding the "AS" command
20 | WITH vars
21 | AS (
22 | SELECT
23 | DB_NAME() AS v_SchemaName -- (Do not change this value, it is picked up from changes above)
24 | , 'NO' AS v_TablesOnly -- Change this setting: YES=Limit To Tables only; NO=Include views too
25 | )
26 |
27 |
28 | , baseTbl
29 | AS (
30 | SELECT TABLE_CATALOG AS SchemaName, table_type, table_name, table_schema
31 | FROM INFORMATION_SCHEMA.TABLES
32 | WHERE TABLE_CATALOG = (SELECT v_SchemaName FROM vars)
33 | AND ( (TABLE_TYPE = 'BASE TABLE')
34 | OR ((SELECT v_TablesOnly FROM vars) = 'NO')
35 | )
36 | )
37 |
38 |
39 | , metadata
40 | AS (
41 | SELECT
42 | bt.SchemaName AS schema_nm
43 | , bt.table_name AS table_nm
44 | , CASE WHEN bt.table_type = 'BASE TABLE' THEN 'TBL'
45 | WHEN bt.table_type = 'VIEW' THEN 'VW'
46 | ELSE 'UK'
47 | END AS obj_typ
48 | , RIGHT('000' + CAST(tut.ORDINAL_POSITION AS VARCHAR(3)), 3) AS ord_pos
49 | , tut.column_name AS column_nm
50 | , COALESCE(tut.data_type, 'unknown') +
51 | CASE WHEN tut.data_type IN('varchar','nvarchar') THEN '(' + CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)) + ')'
52 | WHEN tut.data_type IN('char','nchar') THEN '(' + CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)) + ')'
53 | WHEN tut.data_type ='date' THEN '(' + CAST(tut.DATETIME_PRECISION AS VARCHAR(10)) + ')'
54 | WHEN tut.data_type ='datetime' THEN '(' + CAST(tut.DATETIME_PRECISION AS VARCHAR(10)) + ')'
55 | WHEN tut.data_type in('bigint','int','smallint', 'tinyint') THEN '(' + CAST(tut.NUMERIC_PRECISION AS VARCHAR(10)) + ')'
56 | WHEN tut.data_type = 'uniqueidentifier' THEN '(16)'
57 | WHEN tut.data_type = 'money' THEN '(' + CAST(tut.NUMERIC_PRECISION AS VARCHAR(10)) + ')'
58 | WHEN tut.data_type = 'decimal' THEN '(' + CAST(tut.NUMERIC_PRECISION AS VARCHAR(10)) + ',' + CAST(tut.NUMERIC_SCALE AS VARCHAR(10)) + ')'
59 | WHEN tut.data_type = 'numeric' THEN '(' + CAST(tut.NUMERIC_PRECISION AS VARCHAR(10)) + ',' + CAST(tut.NUMERIC_SCALE AS VARCHAR(10)) + ')'
60 | WHEN tut.data_type = 'varbinary' THEN '(' + CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)) + ')'
61 | WHEN tut.data_type = 'xml' THEN '(' + CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)) + ')'
63 | WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN '(' + CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)) + ')'
64 | WHEN tut.DATETIME_PRECISION IS NOT NULL THEN '(' + CAST(tut.DATETIME_PRECISION AS VARCHAR(10)) + ')'
65 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
66 | AND tut.NUMERIC_SCALE IS NULL THEN '(' + CAST(tut.NUMERIC_PRECISION AS VARCHAR(10)) + ')'
67 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
68 | AND tut.NUMERIC_SCALE IS NOT NULL THEN '(' + CAST(tut.NUMERIC_PRECISION AS VARCHAR(10)) + ',' + CAST(tut.NUMERIC_SCALE AS VARCHAR(10)) + ')'
69 | ELSE ''
70 | END AS data_typ
71 | , CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END AS nullable
72 | FROM INFORMATION_SCHEMA.COLUMNS tut
73 | INNER JOIN baseTbl bt ON bt.SchemaName = tut.TABLE_CATALOG AND bt.table_name = tut.table_name
74 | )
75 |
76 | , descr
77 | AS (
78 | SELECT
79 | bt.SchemaName AS schema_nm
80 | , bt.table_name AS table_nm
81 | , tut.column_name AS column_nm
82 | , STRING_AGG(CAST(de.value AS VARCHAR(1024)), '. ') WITHIN GROUP (ORDER BY de.value) AS description
83 | FROM INFORMATION_SCHEMA.COLUMNS tut
84 | INNER JOIN baseTbl bt ON bt.SchemaName = tut.TABLE_CATALOG AND bt.table_name = tut.table_name
85 | LEFT JOIN sys.extended_properties de ON de.major_id = OBJECT_ID(bt.table_schema + '.' + bt.table_name)
86 | AND de.minor_id = tut.ORDINAL_POSITION
87 | AND de.name = 'MS_Description'
88 | GROUP BY bt.SchemaName, bt.table_name, tut.column_name
89 | )
90 |
91 |
92 | , metadata_keys
93 | AS (
94 | SELECT schema_nm, table_nm, column_nm
95 | , STRING_AGG(key_typ, ',') WITHIN GROUP (ORDER BY key_typ) AS is_key
96 | FROM (
97 | SELECT
98 | cons.TABLE_CATALOG AS schema_nm
99 | , cons.TABLE_NAME AS table_nm
100 | , kcu.COLUMN_NAME AS column_nm
101 | , CASE WHEN cons.CONSTRAINT_TYPE = 'PRIMARY KEY' THEN 'PK'
102 | WHEN cons.CONSTRAINT_TYPE = 'UNIQUE' THEN 'UK'
103 | WHEN cons.CONSTRAINT_TYPE = 'FOREIGN KEY' THEN 'FK'
104 | ELSE 'X'
105 | END AS key_typ
106 | FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS cons
107 | INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
108 | ON cons.TABLE_CATALOG = kcu.TABLE_CATALOG
109 | AND cons.TABLE_NAME = kcu.TABLE_NAME
110 | AND cons.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
111 | WHERE cons.TABLE_CATALOG = (SELECT v_SchemaName FROM vars)
112 | AND cons.table_name IN(SELECT DISTINCT table_name FROM baseTbl)
113 | AND cons.constraint_type IN('PRIMARY KEY','FOREIGN KEY','UNIQUE')
114 | GROUP BY cons.TABLE_CATALOG, cons.TABLE_NAME, kcu.COLUMN_NAME, cons.CONSTRAINT_TYPE
115 | ) AS t
116 | GROUP BY schema_nm, table_nm, column_nm
117 | )
118 |
119 |
120 | SELECT md.schema_nm, md.table_nm, md.obj_typ, md.ord_pos
121 | , COALESCE(mk.is_key, ' ') AS keys
122 | , md.column_nm, md.data_typ, md.nullable
123 | , de.[description]
124 | FROM metadata md
125 | LEFT JOIN descr de ON de.schema_nm = md.schema_nm AND de.table_nm = md.table_nm AND de.column_nm = md.column_nm
126 | LEFT JOIN metadata_keys mk ON mk.schema_nm = md.schema_nm AND mk.table_nm = md.table_nm AND mk.column_nm = md.column_nm
127 | ORDER BY schema_nm, table_nm, ord_pos
128 |
129 |
--------------------------------------------------------------------------------
/mssql/data_validation/demo_data/demo_hr_01_create_database.sql:
--------------------------------------------------------------------------------
1 | USE [master]
2 | GO
3 |
4 | CREATE DATABASE [Demo_HR]
5 | CONTAINMENT = NONE
6 | ON PRIMARY
7 | ( NAME = N'Demo_HR', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.SQLEXPRESS\MSSQL\DATA\Demo_HR.mdf' , SIZE = 8192KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
8 | LOG ON
9 | ( NAME = N'Demo_HR_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.SQLEXPRESS\MSSQL\DATA\Demo_HR_log.ldf' , SIZE = 8192KB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
10 | WITH CATALOG_COLLATION = DATABASE_DEFAULT
11 | GO
12 |
13 | /*
14 | ALTER DATABASE [Demo_HR] SET ANSI_NULL_DEFAULT OFF
15 | GO
16 | ALTER DATABASE [Demo_HR] SET ANSI_NULLS OFF
17 | GO
18 | ALTER DATABASE [Demo_HR] SET ANSI_PADDING OFF
19 | GO
20 | ALTER DATABASE [Demo_HR] SET ANSI_WARNINGS OFF
21 | GO
22 | ALTER DATABASE [Demo_HR] SET ARITHABORT OFF
23 | GO
24 | ALTER DATABASE [Demo_HR] SET AUTO_CLOSE OFF
25 | GO
26 | ALTER DATABASE [Demo_HR] SET AUTO_SHRINK OFF
27 | GO
28 | ALTER DATABASE [Demo_HR] SET AUTO_UPDATE_STATISTICS ON
29 | GO
30 | ALTER DATABASE [Demo_HR] SET CURSOR_CLOSE_ON_COMMIT OFF
31 | GO
32 | ALTER DATABASE [Demo_HR] SET CURSOR_DEFAULT GLOBAL
33 | GO
34 | ALTER DATABASE [Demo_HR] SET CONCAT_NULL_YIELDS_NULL OFF
35 | GO
36 | ALTER DATABASE [Demo_HR] SET NUMERIC_ROUNDABORT OFF
37 | GO
38 | ALTER DATABASE [Demo_HR] SET QUOTED_IDENTIFIER OFF
39 | GO
40 | ALTER DATABASE [Demo_HR] SET RECURSIVE_TRIGGERS OFF
41 | GO
42 | ALTER DATABASE [Demo_HR] SET DISABLE_BROKER
43 | GO
44 | ALTER DATABASE [Demo_HR] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
45 | GO
46 | ALTER DATABASE [Demo_HR] SET DATE_CORRELATION_OPTIMIZATION OFF
47 | GO
48 | ALTER DATABASE [Demo_HR] SET TRUSTWORTHY OFF
49 | GO
50 | ALTER DATABASE [Demo_HR] SET ALLOW_SNAPSHOT_ISOLATION OFF
51 | GO
52 | ALTER DATABASE [Demo_HR] SET PARAMETERIZATION SIMPLE
53 | GO
54 | ALTER DATABASE [Demo_HR] SET READ_COMMITTED_SNAPSHOT OFF
55 | GO
56 | ALTER DATABASE [Demo_HR] SET HONOR_BROKER_PRIORITY OFF
57 | GO
58 | ALTER DATABASE [Demo_HR] SET RECOVERY SIMPLE
59 | GO
60 | ALTER DATABASE [Demo_HR] SET MULTI_USER
61 | GO
62 | ALTER DATABASE [Demo_HR] SET PAGE_VERIFY CHECKSUM
63 | GO
64 | ALTER DATABASE [Demo_HR] SET DB_CHAINING OFF
65 | GO
66 | ALTER DATABASE [Demo_HR] SET FILESTREAM( NON_TRANSACTED_ACCESS = OFF )
67 | GO
68 | ALTER DATABASE [Demo_HR] SET TARGET_RECOVERY_TIME = 60 SECONDS
69 | GO
70 | ALTER DATABASE [Demo_HR] SET DELAYED_DURABILITY = DISABLED
71 | GO
72 | ALTER DATABASE [Demo_HR] SET ACCELERATED_DATABASE_RECOVERY = OFF
73 | GO
74 | ALTER DATABASE [Demo_HR] SET QUERY_STORE = OFF
75 | GO
76 | ALTER DATABASE [Demo_HR] SET READ_WRITE
77 | GO
78 | */
79 |
--------------------------------------------------------------------------------
/mssql/data_validation/demo_data/demo_hr_02_create_tables.sql:
--------------------------------------------------------------------------------
1 |
2 | -- Demo_HR..JOBS definition
3 | CREATE TABLE Demo_HR..JOBS
4 | (JOB_ID VARCHAR(10),
5 | JOB_TITLE VARCHAR(35) NOT NULL,
6 | MIN_SALARY NUMERIC(6,0),
7 | MAX_SALARY NUMERIC(6,0),
8 | CONSTRAINT JOB_ID_PK PRIMARY KEY (JOB_ID)
9 | );
10 |
11 |
12 |
13 | -- Demo_HR..JOBS_SNAPSHOT definition
14 | CREATE TABLE Demo_HR..JOBS_SNAPSHOT
15 | (JOB_ID VARCHAR(10),
16 | JOB_TITLE VARCHAR(35) NOT NULL,
17 | MIN_SALARY NUMERIC(6,0),
18 | MAX_SALARY NUMERIC(6,0)
19 | );
20 |
21 |
22 |
23 | -- Demo_HR..REGIONS definition
24 | CREATE TABLE Demo_HR..REGIONS
25 | (REGION_ID NUMERIC CONSTRAINT REGION_ID_NN NOT NULL,
26 | REGION_NAME VARCHAR(25),
27 | CONSTRAINT REG_ID_PK PRIMARY KEY (REGION_ID)
28 | );
29 |
30 |
31 |
32 | -- Demo_HR..COUNTRIES definition
33 | CREATE TABLE Demo_HR..COUNTRIES
34 | (COUNTRY_ID CHAR(2) CONSTRAINT COUNTRY_ID_NN NOT NULL,
35 | COUNTRY_NAME VARCHAR(40),
36 | REGION_ID NUMERIC,
37 | DATE_LAST_UPDATED DATETIME DEFAULT GETDATE(),
38 | CONSTRAINT COUNTRY_C_ID_PK PRIMARY KEY (COUNTRY_ID),
39 | CONSTRAINT COUNTR_REG_FK FOREIGN KEY (REGION_ID) REFERENCES Demo_HR..REGIONS (REGION_ID)
40 | );
41 |
42 |
43 |
44 |
45 | -- Demo_HR..LOCATIONS definition
46 | CREATE TABLE Demo_HR..LOCATIONS
47 | (LOCATION_ID NUMERIC(4,0),
48 | STREET_ADDRESS VARCHAR(40),
49 | POSTAL_CODE VARCHAR(12),
50 | CITY VARCHAR(30) CONSTRAINT LOC_CITY_NN NOT NULL,
51 | STATE_PROVINCE VARCHAR(25),
52 | COUNTRY_ID CHAR(2),
53 | CONSTRAINT LOC_ID_PK PRIMARY KEY (LOCATION_ID),
54 | CONSTRAINT LOC_C_ID_FK FOREIGN KEY (COUNTRY_ID) REFERENCES Demo_HR..COUNTRIES (COUNTRY_ID)
55 | );
56 |
57 | CREATE INDEX LOC_CITY_IX ON Demo_HR..LOCATIONS (CITY);
58 | CREATE INDEX LOC_STATE_PROVINCE_IX ON Demo_HR..LOCATIONS (STATE_PROVINCE);
59 | CREATE INDEX LOC_COUNTRY_IX ON Demo_HR..LOCATIONS (COUNTRY_ID);
60 |
61 |
62 |
63 | -- Demo_HR..DEPARTMENTS definition
64 | CREATE TABLE Demo_HR..DEPARTMENTS
65 | (DEPARTMENT_ID NUMERIC(4,0),
66 | DEPARTMENT_NAME VARCHAR(30) CONSTRAINT DEPT_NAME_NN NOT NULL,
67 | MANAGER_ID NUMERIC(6,0),
68 | LOCATION_ID NUMERIC(4,0),
69 | URL VARCHAR(255),
70 | CONSTRAINT DEPT_ID_PK PRIMARY KEY (DEPARTMENT_ID),
71 | CONSTRAINT DEPT_LOC_FK FOREIGN KEY (LOCATION_ID) REFERENCES Demo_HR..LOCATIONS (LOCATION_ID)
72 | ) ;
73 |
74 | CREATE INDEX DEPT_LOCATION_IX ON Demo_HR..DEPARTMENTS (LOCATION_ID);
75 |
76 |
77 |
78 | -- Demo_HR..JOB_HISTORY definition
79 | CREATE TABLE Demo_HR..JOB_HISTORY
80 | (EMPLOYEE_ID NUMERIC(6,0) CONSTRAINT JHIST_EMPLOYEE_NN NOT NULL,
81 | START_DATE DATETIME CONSTRAINT JHIST_START_DATE_NN NOT NULL,
82 | END_DATE DATETIME CONSTRAINT JHIST_END_DATE_NN NOT NULL,
83 | JOB_ID VARCHAR(10) CONSTRAINT JHIST_JOB_NN NOT NULL,
84 | DEPARTMENT_ID NUMERIC(4,0),
85 | CONSTRAINT JHIST_DATE_INTERVAL CHECK (end_date > start_date),
86 | CONSTRAINT JHIST_EMP_ID_ST_DATE_PK PRIMARY KEY (EMPLOYEE_ID, START_DATE),
87 | CONSTRAINT JHIST_JOB_FK FOREIGN KEY (JOB_ID) REFERENCES Demo_HR..JOBS (JOB_ID),
88 | CONSTRAINT JHIST_DEPT_FK FOREIGN KEY (DEPARTMENT_ID) REFERENCES Demo_HR..DEPARTMENTS (DEPARTMENT_ID)
89 | ) ;
90 |
91 | CREATE INDEX JHIST_JOB_IX ON Demo_HR..JOB_HISTORY (JOB_ID);
92 | CREATE INDEX JHIST_EMPLOYEE_IX ON Demo_HR..JOB_HISTORY (EMPLOYEE_ID);
93 | CREATE INDEX JHIST_DEPARTMENT_IX ON Demo_HR..JOB_HISTORY (DEPARTMENT_ID);
94 |
95 |
96 |
97 |
98 | -- Demo_HR..EMPLOYEES definition
99 | CREATE TABLE Demo_HR..EMPLOYEES
100 | (EMPLOYEE_ID NUMERIC(6,0),
101 | FIRST_NAME VARCHAR(20),
102 | LAST_NAME VARCHAR(25) CONSTRAINT EMP_LAST_NAME_NN NOT NULL,
103 | EMAIL VARCHAR(25) CONSTRAINT EMP_EMAIL_NN NOT NULL,
104 | PHONE_NUMBER VARCHAR(20),
105 | HIRE_DATE DATETIME CONSTRAINT EMP_HIRE_DATE_NN NOT NULL,
106 | JOB_ID VARCHAR(10) CONSTRAINT EMP_JOB_NN NOT NULL,
107 | SALARY NUMERIC(8,2),
108 | COMMISSION_PCT NUMERIC(2,2),
109 | MANAGER_ID NUMERIC(6,0),
110 | DEPARTMENT_ID NUMERIC(4,0),
111 | SOME_DATE_FMT1 VARCHAR(50),
112 | SOME_DATE_FMT2 VARCHAR(50),
113 | SOME_DATE_FMT3 VARCHAR(50),
114 | SOME_DATE_FMT4 VARCHAR(50),
115 | FAKE_SSN VARCHAR(11),
116 | ZIP5 VARCHAR(5),
117 | ZIP5OR9 VARCHAR(10),
118 | ZIP9 VARCHAR(10),
119 | EMAIL_ADDRESS VARCHAR(100),
120 | CONSTRAINT EMP_SALARY_MIN CHECK (salary > 0),
121 | CONSTRAINT EMP_EMAIL_UK UNIQUE (EMAIL),
122 | CONSTRAINT EMP_EMP_ID_PK PRIMARY KEY (EMPLOYEE_ID),
123 | CONSTRAINT EMP_DEPT_FK FOREIGN KEY (DEPARTMENT_ID) REFERENCES Demo_HR..DEPARTMENTS (DEPARTMENT_ID),
124 | CONSTRAINT EMP_JOB_FK FOREIGN KEY (JOB_ID) REFERENCES Demo_HR..JOBS (JOB_ID),
125 | CONSTRAINT EMP_MANAGER_FK FOREIGN KEY (MANAGER_ID) REFERENCES Demo_HR..EMPLOYEES (EMPLOYEE_ID)
126 | ) ;
127 |
128 | CREATE INDEX EMP_DEPARTMENT_IX ON Demo_HR..EMPLOYEES (DEPARTMENT_ID);
129 | CREATE INDEX EMP_JOB_IX ON Demo_HR..EMPLOYEES (JOB_ID);
130 | CREATE INDEX EMP_MANAGER_IX ON Demo_HR..EMPLOYEES (MANAGER_ID);
131 | CREATE INDEX EMP_NAME_IX ON Demo_HR..EMPLOYEES (LAST_NAME, FIRST_NAME);
132 |
133 |
134 |
135 |
136 |
137 |
138 |
--------------------------------------------------------------------------------
/mssql/data_validation/img/01_data_val_mssql_example_test_case_sql_code.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mssql/data_validation/img/01_data_val_mssql_example_test_case_sql_code.png
--------------------------------------------------------------------------------
/mssql/data_validation/img/02_data_val_mssql_run_results1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mssql/data_validation/img/02_data_val_mssql_run_results1.png
--------------------------------------------------------------------------------
/mssql/data_validation/img/03_data_val_mssql_adv_test_case_ex.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mssql/data_validation/img/03_data_val_mssql_adv_test_case_ex.png
--------------------------------------------------------------------------------
/mssql/data_validation/img/04_data_val_mssql_run_results_adv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mssql/data_validation/img/04_data_val_mssql_run_results_adv.png
--------------------------------------------------------------------------------
/mssql/data_validation/rule_set_01__row_counts.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MS SQL Server
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #1 - Row Counts
8 |
9 | ## Table of Contents
10 | - T001 - "Full" Row Count
11 | - T002 - "Partial" Row Count
12 | - T003 - "Relative" Row Count
13 | - T004 - "Recent" Row Count
14 |
15 |
16 |
17 |
18 | ### T001 - "Full" Row Count
19 | Verify full row count for a table or view. For example, to verify that table countries has exactly 25 rows:
20 | ```sql
21 | SELECT CASE WHEN COUNT(*) <> 25 THEN 'FAIL' ELSE 'P' END AS status
22 | FROM demo_hr..countries;
23 | ```
24 |
25 |
26 |
27 |
28 | ### T002 - "Partial" Row Count
29 | Verify partial row count for a subset of a table or view. For example, to verify that table countries has exactly 8 rows where region_id = 1 (Europe):
30 | ```sql
31 | SELECT CASE WHEN COUNT(*) <> 8 THEN 'FAIL' ELSE 'P' END AS status
32 | FROM demo_hr..countries
33 | WHERE region_id = 1;
34 | ```
35 |
36 |
37 |
38 |
39 | ### T003 - "Relative" Row Count
40 | Verify relative row counts between tables or views. The example below verifies that table countries has at least 5 times the number of rows as table regions. The inner query returns one row with two columns: (a) the count of countries, and (b) the count of regions. The outer query then applies the business logic of comparing the counts and returning a Pass or Fail.
41 | ```sql
42 | SELECT CASE WHEN countries_count < 5 * regions_count THEN 'FAIL' ELSE 'P' END AS status
43 | FROM (
44 | SELECT (SELECT COUNT(*) AS row_count FROM demo_hr..countries) AS countries_count
45 | , (SELECT COUNT(*) AS row_count FROM demo_hr..regions) AS regions_count
46 | ) t;
47 | ```
48 |
49 |
50 |
51 |
52 | ### T004 - "Recent" Row Count
53 | Verify recent row counts. The example below verifies that table countries has had at least 5 rows updated in the past 10 days, based on the date stamp in field date_last_updated. The inner query uses GETDATE() (the SQL Server current date-time stamp) minus 10 days to count how many rows in table countries have a date_last_updated within the past 10 days. The outer query then confirms that at least 5 rows are recent for a Pass; if not, it returns a Fail.
54 | ```sql
55 | SELECT CASE WHEN row_count < 5 THEN 'FAIL' ELSE 'P' END AS status
56 | FROM (
57 | SELECT COUNT(*) AS row_count
58 | FROM demo_hr..countries
59 | WHERE date_last_updated >= GETDATE() - 10
60 | ) t;
61 | ```
62 |
63 |
64 |
65 |
--------------------------------------------------------------------------------
/mssql/data_validation/rule_set_02__keys.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MS SQL Server
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #2 - Keys (Foreign & Unique)
8 |
9 | ## Table of Contents
10 | - T005 - Unique (Native) Key Has No Duplicates
11 | - T006 - Foreign Key Children Have Orphans
12 | - T007 - Foreign Key Parent is Childless
13 |
14 |
15 |
16 |
17 | ### T005 - Unique Key Has No Duplicates
18 | Sure, good database design implies that unique keys be enforced by a constraint so that you do not need to test for them. However, there are times when a decision is made to **not** add a constraint to enforce the unique key (e.g., the table is replicated from a source that already has the constraint, so it is skipped for performance). This does happen! At work just last week, two of my unique key regression tests started failing -- without these checks in place as a wide net, the downstream defects would have been around a lot longer.
19 | In the example below, the inner query does a GROUP BY on the unique key fields, then uses a HAVING clause to filter down to those key values with a count of more than 1 -- the dups. The outer query returns a fail if any rows come back with dups (match_count >= 2), or a pass if no dups are found.
20 | ```sql
21 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
22 | FROM (
23 | SELECT country_name -- UKey fields separated by comma
24 | , COUNT(*) AS match_count
25 | FROM demo_hr..countries -- UKey fields separated by comma
26 | GROUP BY country_name
27 | HAVING COUNT(*) > 1
28 | ) t;
29 | ```
30 |
31 |
32 |
33 |
34 | ### T006 - Foreign Key Children Have Orphans
35 | Sure, as with the T005 UKeys above, good database design implies that foreign keys be enforced by a constraint so that you do not need to test for them. However, there are times when, for whatever reason, the constraints do not exist. In those instances, you will want to periodically run a data validation test to ensure that this core assumption is not being violated (of course adding a foreign key constraint would be best, but if that is not an option then periodically check).
36 | In the example below, the inner query pulls from the child table countries as the anchor, then left joins out to the parent table regions on the key field region_id. If region_id does not exist in the parent table (p.region_id IS NULL), then the child region_id is an orphan. The outer query checks the count() of orphaned child rows: if it is >= 1 then the test fails, but if the count() = 0 then it passes.
37 | ```sql
38 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
39 | FROM (
40 | SELECT DISTINCT c.region_id AS child_id, p.region_id AS parent_id
41 | FROM demo_hr..countries c
42 | LEFT JOIN demo_hr..regions p ON p.region_id = c.region_id
43 | WHERE p.region_id IS NULL
44 | ) t;
45 | ```
46 |
47 |
48 |
49 |
50 | ### T007 - Foreign Key Parent Is Childless
51 | I'm not sure this particular test is all that useful because often it is okay for the parent-side of a foreign key relationship to not have children. But, if for some reason you need to be sure there is data present on both sides (parent **and** child), then this test is for you. You will notice in the example below that the query is very similar to T006 above, but the parent and child tables have switched positions in the FROM and LEFT JOIN lines. This is because we want to first pull all parent rows, then left join to find missing (FKey field IS NULL) child rows.
52 | ```sql
53 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
54 | FROM (
55 | SELECT DISTINCT c.country_id AS child_id, p.country_id AS parent_id
56 | FROM demo_hr..countries p
57 | LEFT JOIN demo_hr..locations c ON p.country_id = c.country_id
58 | WHERE c.country_id IS NULL
59 | ) t;
60 | ```
61 |
--------------------------------------------------------------------------------
/mssql/data_validation/rule_set_03__heuristic_thresholds.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MS SQL Server
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 | # Rule Set #3 - Heuristic Thresholds (Column Null Rates & Value Rates)
7 |
8 |
9 | ## Table of Contents
10 | - T008 - Null Rate Thresholds
11 | - T009 - Value Frequency Thresholds
12 |
13 | Note: Although the terms "rate" and "frequency" are mostly synonymous, nulls are usually referred to as "rates" and values as "frequencies". I don't know why, but I am following the convention I've heard over and over.
14 |
15 |
16 |
17 |
18 | ### T008 - Null Rate Thresholds
19 | There is a lot going on in this "Null Rate Threshold" data validation query below. The basic goal is to validate that a given table's columns do not have too many (or too few) NULLs.
20 | In the example below...
22 | ...we are checking columns department_name, manager_id, and url in the table departments. We want to ensure that column department_name has no NULLs, and that column manager_id is NULL less than 65% of the time and column url is NULL less than 80% of the time.
23 |
24 | This check is like the proverbial miner's canary in that it is a trip wire triggered when something goes awry in a data feed. I've used this test scenario to great effect when coupled with a create-date or last-updated-date to monitor the past week's data loads for any unexpected upticks in null rates.
25 |
26 | There is a downside to this test scenario too, however: it can fire false alarms, and you may find yourself tinkering with the threshold values (the 0.0000, 0.6500, and 0.8000 cutoffs below), raising and lowering them over and over. If this happens, chances are the test failures are neither actionable nor important, and you should not waste your time applying this test scenario to that table and field. Be careful to only pick fields that truly matter.
27 |
28 | Below, there is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any null rate rejections were found, fail the case). Inside the dtls CTE, an inner query at the bottom (at the FROM clause) does a single table scan to calculate a null rate per column by counting nulls in each column and dividing by the total table row count. The SELECT CASE logic at the top applies the business logic, comparing the actual calculated null rates (nr_dept_nm, nr_mgr_id, and nr_url) against the expected threshold rates (hard-coded as 0.0000, 0.6500, and 0.8000). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the null rate check, what the actual null rate was, and what the expected null rate threshold was. If no rejections are triggered, then status returns a "P" for pass.
29 |
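The full query is not shown in this excerpt, so here is a minimal sketch of the pattern described above (assuming table demo_hr..departments; the derived-column names and rejection message wording are illustrative reconstructions, not the original code):

```sql
WITH dtls
AS (
  SELECT CASE WHEN nr_dept_nm > 0.0000 THEN 'REJ-01: Null rate too high at department_name|exp=0.0000|act=' + CAST(nr_dept_nm AS VARCHAR(8))
              WHEN nr_mgr_id  > 0.6500 THEN 'REJ-02: Null rate too high at manager_id|exp<=0.6500|act=' + CAST(nr_mgr_id AS VARCHAR(8))
              WHEN nr_url     > 0.8000 THEN 'REJ-03: Null rate too high at url|exp<=0.8000|act=' + CAST(nr_url AS VARCHAR(8))
              ELSE 'P'
         END AS status
  FROM (
    -- one table scan: null count per column divided by total table row count
    SELECT CAST(SUM(CASE WHEN department_name IS NULL THEN 1 ELSE 0 END) AS FLOAT) / CAST(COUNT(*) AS FLOAT) AS nr_dept_nm
         , CAST(SUM(CASE WHEN manager_id      IS NULL THEN 1 ELSE 0 END) AS FLOAT) / CAST(COUNT(*) AS FLOAT) AS nr_mgr_id
         , CAST(SUM(CASE WHEN url             IS NULL THEN 1 ELSE 0 END) AS FLOAT) / CAST(COUNT(*) AS FLOAT) AS nr_url
    FROM demo_hr..departments
  ) t
)
SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM dtls
WHERE status <> 'P';
```

Run the dtls CTE body by itself to see the rejection codes with expected and actual null rates for troubleshooting.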
51 |
52 |
53 |
54 | ### T009 - Value Frequency Thresholds
55 | "Value Frequency Threshold" tests are fairly similar to null rates above (T008). The difference is that we are checking the frequency (or rate) at which a column's values occur.
56 | In the example below...
58 | ...we are checking the frequencies with which the values 1, 2, 3, and 4 occur in field region_id of table countries. There is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any value frequency rejections were found, fail the case). Inside the dtls CTE, an inner query at the bottom (at the FROM clause) does a single table scan to calculate a frequency for each value in the GROUP BY for the column. The GROUP BY value count (field "freq") is divided by the total table row count (field "den") to calculate field "freq_rt". The SELECT CASE logic at the top applies the business logic, comparing the actual value frequencies (freq_rt when region_id = 1, or = 2, etc.) against the expected threshold frequencies (hard-coded as 0.28 to 0.36, 0.16 to 0.24, and so on). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the value frequency check, what the actual value frequency was, and what the expected value frequency threshold ranges were. If no rejections are triggered, then status returns a "P" for pass.
59 |
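The full query is not shown in this excerpt, so here is a minimal sketch of the pattern described above (assuming table demo_hr..countries; the rejection message wording and the branches beyond region_id 2 are illustrative reconstructions, not the original code):

```sql
WITH dtls
AS (
  SELECT region_id, freq_rt
       , CASE WHEN region_id = 1 AND freq_rt NOT BETWEEN 0.28 AND 0.36 THEN 'REJ-01: Frequency of region_id=1 is outside threshold|exp=0.28 to 0.36|act=' + CAST(freq_rt AS VARCHAR(8))
              WHEN region_id = 2 AND freq_rt NOT BETWEEN 0.16 AND 0.24 THEN 'REJ-02: Frequency of region_id=2 is outside threshold|exp=0.16 to 0.24|act=' + CAST(freq_rt AS VARCHAR(8))
              -- ...additional WHEN branches for region_id 3 and 4 follow the same pattern
              ELSE 'P'
         END AS status
  FROM (
    SELECT region_id
         , CAST(freq AS FLOAT) / CAST(den AS FLOAT) AS freq_rt
    FROM (
      -- one table scan: value count per GROUP BY bucket ("freq") plus total row count ("den")
      SELECT region_id
           , COUNT(*) AS freq
           , (SELECT COUNT(*) FROM demo_hr..countries) AS den
      FROM demo_hr..countries
      GROUP BY region_id
    ) t
  ) t2
)
SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM dtls
WHERE status <> 'P';
```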
--------------------------------------------------------------------------------
/mssql/data_validation/rule_set_04__numeric_values.md:
--------------------------------------------------------------------------------
16 | 
17 |
18 |
19 | ### T010 - Not Null
20 | Verify numeric field is not null. For example, to verify that table countries has no NULLs in field region_id:
21 | ```sql
22 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
23 | FROM demo_hr..countries
24 | WHERE region_id IS NULL;
25 | ```
26 |
27 |
28 |
29 |
30 | ### T011 - Not Negative
31 | Verify numeric field is not negative. For example, to verify that table countries has no negative values in field region_id:
32 | ```sql
33 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
34 | FROM demo_hr..countries
35 | WHERE region_id < 0;
36 | ```
37 |
38 |
39 |
40 |
41 | ### T012 - Numeric Range
42 | Verify numeric field value is within a range. In the example below, we verify that field employee_id is between 100 and 999 in table employees. Note that you can run the inner query yourself to return the actual rejection code (is too low or too high) along with the actual value and the expected value...all nicely packaged for easy troubleshooting.
43 | ```sql
44 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
45 | FROM (
46 | SELECT employee_id
47 | , CASE WHEN employee_id < 100 THEN 'REJ-01: Verify employee_id > 99|exp>99|act=' + CAST(employee_id AS VARCHAR(10))
48 | WHEN employee_id > 999 THEN 'REJ-02: Verify employee_id < 1000|exp<1000|act=' + CAST(employee_id AS VARCHAR(10))
49 | ELSE 'P'
50 | END AS status
51 | FROM demo_hr..employees
52 | ) t
53 | WHERE status <> 'P';
54 | ```
55 |
56 |
57 |
58 |
59 | ### T013 - In Value List
60 | Verify numeric field is **in** the list of values. For example, to verify that table countries field region_id is always one of the values 1, 2, 3, or 4, we use the IN() clause as shown below:
61 | ```sql
62 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
63 | FROM (
64 | SELECT region_id
65 | , CASE WHEN region_id NOT IN(1,2,3,4) THEN 'FAIL' ELSE 'P' END AS status
66 | FROM demo_hr..countries
67 | ) t
68 | WHERE status <> 'P';
69 | ```
70 |
71 |
72 |
73 |
74 | ### T014 - Not In Value List
75 | Verify numeric field is **not** in the list of values. For example, to verify that table countries field region_id is never one of the values 97, 98, or 99, we use the NOT IN() clause as shown below:
76 | ```sql
77 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
78 | FROM (
79 | SELECT region_id
80 | , CASE WHEN region_id IN(97,98,99) THEN 'FAIL' ELSE 'P' END AS status
81 | FROM demo_hr..countries
82 | ) t
83 | WHERE status <> 'P';
84 | ```
85 |
86 |
87 |
88 |
89 | ### T015 - Multi Field Compare
90 | Verify numeric field values in relation to one another. For example, to verify that salary times commission_pct is always less than $10,000 in table employees:
91 | ```sql
92 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
93 | FROM (
94 | SELECT salary, commission_pct
95 | , CASE WHEN salary * commission_pct > 10000 THEN 'FAIL' ELSE 'P' END AS status
96 | FROM demo_hr..employees
97 | ) t
98 | WHERE status <> 'P';
99 | ```
100 |
101 |
--------------------------------------------------------------------------------
/mssql/data_validation/rule_set_05__date_values.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MS SQL Server
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #5 - Date Values
8 |
9 | ## Table of Contents
10 | - T016 - Not Null
11 | - T017 - Date Range
12 | - T018 - No Time Part
13 | - T019 - Has Time Part
14 | - T020 - Multi Field Compare
15 | - Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
16 |
17 |
18 |
19 |
20 | ### T016 - Not Null
21 | Verify date field is not null. For example, to verify that table countries has no NULLs in field date_last_updated:
22 | ```sql
23 | SELECT date_last_updated
24 | , CASE WHEN date_last_updated IS NULL THEN 'FAIL' ELSE 'P' END AS status
25 | FROM demo_hr..countries;
26 | ```
27 |
28 |
29 |
30 |
31 | ### T017 - Date Range
32 | Verify date field is within a specified range. For example, you can run the sql below to verify that table countries field date_last_updated is between 1/1/2021 and today. Note the use of GETDATE() to represent today's date dynamically in SQL Server. Notice the inner query uses a CASE...WHEN...ELSE structure to identify two rejection codes: (1) date is too high, and (2) date is too low. Expected and actual values are displayed in the output if you run the inner query only. The outer query is a wrapper to determine whether the test passed or failed.
33 | ```sql
34 | SELECT date_last_updated
35 | , CASE WHEN date_last_updated > GETDATE() THEN 'REJ-01: Field date_last_updated cannot be in the future|exp<=' + CAST(GETDATE() AS VARCHAR(20)) + '|act=' + CAST(date_last_updated AS VARCHAR(20))
36 | WHEN date_last_updated < '01/01/2021' THEN 'REJ-02: Field date_last_updated cannot be too old|exp>=1/1/2021|act=' + CAST(date_last_updated AS VARCHAR(20))
37 | ELSE 'P'
38 | END AS status
39 | FROM demo_hr..countries;
40 | ```
41 |
42 |
43 |
44 |
45 | ### T018 - No Time Part
46 | Verify date field is a date only, with no time part present. For example, to verify that table employees has no time part in field hire_date (time part must be "00:00:00"):
47 | ```sql
48 | SELECT hire_date
49 | , CASE WHEN CONVERT(VARCHAR(8), hire_date, 108) <> '00:00:00' THEN 'FAIL' ELSE 'P' END AS status
50 | FROM demo_hr..employees;
51 | ```
52 |
53 |
54 |
55 |
56 | ### T019 - Has Time Part
57 | Verify date field is a date **and** time. For example, to verify that table countries has a time part in field date_last_updated (time part cannot be "00:00:00"):
58 | ```sql
59 | SELECT date_last_updated
60 | , CASE WHEN CONVERT(VARCHAR(8), date_last_updated, 108) = '00:00:00' THEN 'FAIL' ELSE 'P' END AS status
61 | FROM demo_hr..countries;
62 | ```
63 |
64 |
65 |
66 |
67 | ### T020 - Multi Field Compare
68 | Verify multiple date fields relative to each other. For example, to verify that field start_date must be < field end_date in table job_history (thus if start_date is >= end_date the test case fails):
69 | ```sql
70 | SELECT start_date, end_date
71 | , CASE WHEN start_date >= end_date THEN 'FAIL' ELSE 'P' END AS status
72 | FROM demo_hr..job_history;
73 | ```
74 |
75 |
76 |
77 |
78 | ### Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
79 | So this is a fantastic tip I learned from a co-worker in healthcare back in 2011 (was it Jennifer C.? or Matt G.? or Jonathon P.? I can't remember).
80 |
81 |
82 | The **problem** is that you are trying to join two tables where table1's start_dt/end_dt range overlaps table2's start_dt/end_dt range.
83 |
84 |
85 | The **solution** is to **join on `table1.start_dt <= table2.end_dt AND table1.end_dt >= table2.start_dt`**.
86 |
87 | Here is why:
88 |
89 | ```
90 | Scenario #1 = "Discard - No Overlap" Table #1 date range is completely before Table #2's (Tbl#1.End < Tbl#2.Start)
91 | Table#1: |Start ----------- End|
92 | Table#2: |Start ----------- End|
93 |
94 |
95 | Scenario #2 = "Include - Table #1 End = Table#2 Start" Table #1 date range ends exactly where Table #2's starts (Tbl#1.End = Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
96 | Table#1: |Start ----------- End|
97 | Table#2: |Start ----------- End|
98 |
99 |
100 | Scenario #3 = "Include - Table #1 Ends Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
101 | Table#1: |Start ----------- End|
102 | Table#2: |Start ----------- End|
103 |
104 |
105 | Scenario #4 = "Include - Table #1 Starts Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
106 | Table#1: |Start ----------- End|
107 | Table#2: |Start ----------- End|
108 |
109 |
110 | Scenario #5 = "Include - Table #1 Starts at Table #2 End" Table #1 date range starts exactly at Table #2's End (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start = Tbl#2.End)
111 | Table#1: |Start ----------- End|
112 | Table#2: |Start ----------- End|
113 |
114 |
115 | Scenario #6 = "Discard - No Overlap" Table #1 date range is entirely after Table #2's (Tbl#1.Start > Tbl#2.End)
116 | Table#1: |Start ----------- End|
117 | Table#2: |Start ----------- End|
118 |
119 | ```
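To see the overlap predicate in action, here is a minimal, self-contained sketch using Python's sqlite3 as a stand-in database (table names, ids, and dates are invented for the demo; ISO-8601 date strings compare correctly as plain text in SQLite):

```python
import sqlite3

# Demo of the overlap-join predicate:
#   t1.start_dt <= t2.end_dt AND t1.end_dt >= t2.start_dt
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INTEGER, start_dt TEXT, end_dt TEXT);
    CREATE TABLE t2 (id INTEGER, start_dt TEXT, end_dt TEXT);
    INSERT INTO t1 VALUES (1, '2021-01-01', '2021-01-10');
    INSERT INTO t2 VALUES (10, '2021-01-10', '2021-01-20');  -- Scenario #2: starts where t1 ends (include)
    INSERT INTO t2 VALUES (20, '2021-02-01', '2021-02-05');  -- Scenario #1: entirely after t1 (discard)
""")
rows = con.execute("""
    SELECT t1.id, t2.id
    FROM t1
    JOIN t2 ON t1.start_dt <= t2.end_dt
           AND t1.end_dt   >= t2.start_dt
""").fetchall()
print(rows)  # [(1, 10)] -- only the touching range joins
```

Only the range that touches t1's end date survives the join; the range that starts after t1 ends is discarded, exactly as the scenario diagrams predict.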
120 |
121 |
122 |
123 |
--------------------------------------------------------------------------------
/mssql/data_validation/rule_set_09__defect_regression.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MS SQL Server
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #9 - Defect Regression
8 |
9 | ## Summary
10 | Where possible, it is a good idea to cover known defects with a test case. This enables automated regression testing for free. You can simply run this script to quickly re-check whether a defect has been re-introduced.
11 |
12 | There are a lot of caveats that will block you from covering every defect:
13 | * If anything but the simplest of setup and tear down data is required
14 | * If processes or jobs or applications or ETL must be triggered
15 | * If there are interactions with other databases or files not on the current server
16 |
17 | That said, you can still frequently set up static trip wires to monitor the data for recurrence of old bugs.
18 |
19 | Some real-world examples from my past include:
20 | * A defect where email addresses stored in the database had TAB characters. I set up a simple CHAR(9) check that ran daily and cross-referenced the defect number in the SQL return results.
21 | * A defect where customer-specific HTML snippets for email reports had bad tags manually entered by company analysts working with the client. Obviously, fixing the app to prevent those data entries would be best, but that belonged to a different department that was going to get to it later. So, I set up a daily alert to check for those HTML tags and some other similar ones that might be accidentally introduced.
22 | * A defect where internal data entry folks on rare occasions had to key in customer names. The names were supposed to be all upper case for other reasons in the system, but the analysts sometimes entered lower case. Developers were not going to get around to fixing it for a couple of months (other priorities; this was a very low frequency issue). So I set up a daily check to trip an alarm and notify the appropriate person via email to correct their data.
23 | * Basically, you can take any of the test case examples in Rule Sets #1 thru #8 and apply them as regression tests for defects when appropriate.
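As one concrete illustration of the first bullet (a TAB-character trip wire that cross-references the defect number), here is a sketch using Python's sqlite3 so it runs standalone. The table, data, and defect number DFCT-1234 are all invented for the demo; in SQL Server or MySQL the same idea is just a SELECT using CHAR(9):

```python
import sqlite3

# Trip-wire regression check: flag any email_address containing a TAB (CHAR(9)),
# tagging the row with a (hypothetical) defect number so the alert is traceable.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (employee_id INTEGER, email_address TEXT);
    INSERT INTO employees VALUES (100, 'jdoe@example.com');
    INSERT INTO employees VALUES (101, 'asmith@\texample.com');
""")
bad = con.execute("""
    SELECT employee_id, 'FAIL: DFCT-1234 TAB found in email_address' AS status
    FROM employees
    WHERE email_address LIKE '%' || CHAR(9) || '%'
""").fetchall()
print(bad)  # only the row with the embedded TAB is flagged
```

Scheduled daily, a query like this surfaces the defect's signature the moment it recurs, with the defect number right in the output.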
24 |
--------------------------------------------------------------------------------
/mssql/schemadiff/simple_schema_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Simple Schema Dump:
3 | -- This SQL script will dump table, column, key, and index design related metadata
4 | -- so that you can copy-paste or export to a text file.
5 | -- Even better, you can take other snapshots over time (the same database schema at
6 | -- earlier points in time), OR in different environments (DEV, PPMO, STAGE, PROD). Then,
7 | -- using your favorite free text file diff tool (DiffMerge, ExamDiff, etc.) you
8 | -- can compare snapshots to quickly isolate and identify what changed over time
9 | -- or is different between environments.
10 | ------------------------------------------------------------------------------------
11 | -- Platform: Microsoft SQL Server
12 | -- Author: DataResearchLabs
13 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
14 | -- YouTube Tutorials: https://www.youtube.com/playlist?list=PLVHoUDdbskUSlZTVZmllGTdSuvyI4LBiE
15 | ----------------------------------------------------------------------------------
16 |
17 | -- IMPORTANT
18 | USE AdventureWorksLT2019; -- <<<<<<<<<<<< CHANGE THIS VALUE to Schema you want to dump
19 | ;
20 |
21 |
22 | WITH vars
23 | AS (
24 | SELECT DB_NAME() AS v_SchemaName
25 | )
26 |
27 | , baseTbl
28 | AS (
29 | SELECT TABLE_CATALOG AS SchemaName, table_type, table_name
30 | FROM INFORMATION_SCHEMA.TABLES
31 | WHERE TABLE_CATALOG = (SELECT v_SchemaName FROM vars)
32 | )
33 |
34 | , metaForTbl
35 | AS (
36 | SELECT t.SchemaName
37 | , t.table_name AS TableName
38 | , '(' + CASE WHEN t.table_type = 'BASE TABLE' THEN 'Table' WHEN t.table_type = 'VIEW' THEN 'View' ELSE 'UK' END + ')' AS ObjectType
39 | , t.table_name AS ObjectName
40 | , '(Exists)' AS PropertyName
41 | , ' ' AS PropertyValue
42 | FROM baseTbl t
43 | )
44 |
45 | , metaForCol_dataType
46 | AS (
47 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
48 | , '2' AS PropertyName
49 | , COALESCE(tut.data_type, 'unknown') + '('
50 | + CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)) ELSE '' END
51 | + CASE WHEN tut.DATA_TYPE IN('date','datetime') THEN ',' + CAST(tut.DATETIME_PRECISION AS VARCHAR(10))
52 | WHEN tut.NUMERIC_PRECISION IS NULL THEN ''
53 | ELSE ',' + CAST(tut.NUMERIC_PRECISION AS VARCHAR(10))
54 | END
55 | + CASE WHEN tut.NUMERIC_SCALE IS NOT NULL THEN ',' + CAST(tut.NUMERIC_SCALE AS VARCHAR(10)) ELSE '' END
56 | + ')' AS PropertyValue
57 | FROM INFORMATION_SCHEMA.COLUMNS tut
58 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_CATALOG AND ft.table_name = tut.table_name
59 | )
60 |
61 | , metaForCol_nullable
62 | AS (
63 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
64 | , '3' AS PropertyName, CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END AS PropertyValue
65 | FROM INFORMATION_SCHEMA.COLUMNS tut
66 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_CATALOG AND ft.table_name = tut.table_name
67 | )
68 |
69 | , metaForCol_ordpos
70 | AS (
71 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
72 | , '1' AS PropertyName, RIGHT('000' + CAST(tut.ORDINAL_POSITION AS VARCHAR(3)), 3) AS PropertyValue
73 | FROM INFORMATION_SCHEMA.COLUMNS tut
74 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_CATALOG AND ft.table_name = tut.table_name
75 | )
76 |
77 | , metaAllCols
78 | AS (
79 | SELECT schemaname, tablename, objecttype, objectname, 'Properties' AS propertyname
80 | /* NOTE: STRING_AGG was added in SQL Server 2017 and forward.
81 | If you have an older version, use the FOR XML PATH approach here: https://stackoverflow.com/questions/15477743/listagg-in-sqlserver
82 | */
83 | , STRING_AGG(propertyvalue, ' | ')
84 | WITHIN GROUP (ORDER BY propertyname, propertyvalue) AS propertyvalue
85 | FROM (
86 | SELECT * FROM metaForCol_dataType
87 | UNION SELECT * FROM metaForCol_nullable
88 | UNION SELECT * FROM metaForCol_ordpos
89 | ) t
90 | GROUP BY schemaname, tablename, objecttype, objectname
91 | )
92 |
93 | , metaForKeys
94 | AS (
95 | SELECT cons.TABLE_CATALOG AS SchemaName, cons.TABLE_NAME AS TableName
96 | , CASE WHEN cons.constraint_type = 'PRIMARY KEY' THEN 'PKey'
97 | WHEN cons.constraint_type = 'UNIQUE' THEN 'UKey'
98 | WHEN cons.constraint_type = 'FOREIGN KEY' THEN 'FKey'
99 | ELSE 'X'
100 | END AS ObjectType
101 | , cons.constraint_name AS ObjectName
102 | , 'FieldList' AS PropertyName
103 | /* NOTE: STRING_AGG was added in SQL Server 2017 and forward.
104 | If you have an older version, use the FOR XML PATH approach here: https://stackoverflow.com/questions/15477743/listagg-in-sqlserver
105 | */
106 | , STRING_AGG(kcu.COLUMN_NAME, ',') WITHIN GROUP (ORDER BY kcu.ORDINAL_POSITION) AS PropertyValue
107 | FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS cons
108 | INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
109 | ON cons.TABLE_CATALOG = kcu.TABLE_CATALOG
110 | AND cons.TABLE_NAME = kcu.TABLE_NAME
111 | AND cons.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
112 | WHERE cons.TABLE_CATALOG = (SELECT v_SchemaName FROM vars)
113 | AND cons.table_name IN(SELECT DISTINCT table_name FROM baseTbl)
114 | AND cons.constraint_type IN('PRIMARY KEY','FOREIGN KEY','UNIQUE')
115 | GROUP BY cons.TABLE_CATALOG, cons.TABLE_NAME, cons.CONSTRAINT_TYPE, cons.CONSTRAINT_NAME
116 | )
117 |
118 | , metaForIdxs
119 | AS (
120 | SELECT (SELECT v_SchemaName FROM vars) AS SchemaName, o.name AS TableName
121 | , 'Index' AS ObjectType, i.name AS ObjectName
122 | , 'FieldList' AS PropertyName
123 | /* NOTE: STRING_AGG was added in SQL Server 2017 and forward.
124 | If you have an older version, use the FOR XML PATH approach here: https://stackoverflow.com/questions/15477743/listagg-in-sqlserver
125 | */
126 | , STRING_AGG(c.name, ',') WITHIN GROUP (ORDER BY ic.key_ordinal /*column_store_order_ordinal*/) AS PropertyValue
127 | FROM sys.indexes i
128 | INNER JOIN sys.objects o ON i.object_id = o.object_id
129 | INNER JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
130 | INNER JOIN sys.columns c ON c.object_id = i.object_id AND c.column_id = ic.column_id
131 | WHERE i.[Type] = 2
132 | AND i.is_unique = 0
133 | AND i.is_primary_key = 0
134 | AND o.[type] = 'U'
135 | GROUP BY o.name, i.name
136 | )
137 |
138 | , allMetadata
139 | AS (
140 | SELECT * FROM metaForTbl
141 | UNION SELECT * FROM metaAllCols
142 | UNION SELECT * FROM metaForKeys
143 | UNION SELECT * FROM metaForIdxs
144 | )
145 |
146 | SELECT CASE WHEN objecttype IN('(Table)','(View)') THEN schemaname ELSE ' ' END AS schema_nm
147 | , CASE WHEN objecttype IN('(Table)','(View)') THEN tablename ELSE ' ' END AS tbl_nm
148 | , objecttype AS obj_typ, objectname AS obj_nm, /*propertyname,*/ propertyvalue AS properties
149 | FROM allMetadata
150 | ORDER BY schemaname, tablename, objecttype
151 | , CASE WHEN objecttype='Column' THEN propertyvalue ELSE ' ' END
152 | , objectname, propertyname
153 |
154 |
155 |
--------------------------------------------------------------------------------
/mysql/data_dictionary/data_dict_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Data Dictionary Dump:
3 | -- This SQL script will dump table, column, key, and description design related
4 | -- metadata so that you can copy-paste or export to Excel as a Data Dictionary.
5 | ------------------------------------------------------------------------------------
6 | -- Platform: MySQL
7 | -- Author: DataResearchLabs
8 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
9 | -- YouTube Tutorials: https://www.youtube.com/channel/UCQciXv3xaBykeUFc04GxSXA
10 | ----------------------------------------------------------------------------------
11 | USE sakila -- <<<<<<<<<<<<< Change schema here
12 | ;
13 |
14 | WITH vars
15 | AS (
16 | SELECT
17 | DATABASE() AS v_SchemaName -- Do not change this value...it is changed up above on line 11 and passed thru to here
18 | , 'NO' AS v_TablesOnly -- YES=Limit To Tables only; NO=Include views too
19 | )
20 |
21 | , baseTbl
22 | AS (
23 | SELECT table_schema AS SchemaName
24 | , table_catalog
25 | , table_type, table_name, table_schema
26 | FROM INFORMATION_SCHEMA.TABLES
27 | WHERE TABLE_SCHEMA = (SELECT v_SchemaName FROM vars)
28 | AND ( (TABLE_TYPE = 'BASE TABLE')
29 | OR ((SELECT v_TablesOnly FROM vars) = 'NO')
30 | )
31 | )
32 |
33 | , metadata
34 | AS (
35 | SELECT
36 | bt.SchemaName AS schema_nm
37 | , bt.table_name AS table_nm
38 | , CASE WHEN bt.TABLE_TYPE = 'BASE TABLE' THEN 'TBL'
39 | WHEN bt.TABLE_TYPE = 'VIEW' THEN 'VW'
40 | ELSE 'UK'
41 | END AS obj_typ
42 | , tut.ordinal_position AS ord_pos
43 | , tut.column_name AS column_nm
44 | , CONCAT(COALESCE(tut.data_type, 'unknown'),
45 | CASE WHEN tut.data_type IN('varchar','char') THEN CONCAT('(', tut.CHARACTER_MAXIMUM_LENGTH, ')')
46 | WHEN tut.data_type IN('date','time') THEN CONCAT('(3)')
47 | WHEN tut.data_type = 'datetime' THEN CONCAT('(8)')
48 | WHEN tut.data_type = 'timestamp' THEN CONCAT('(4)')
49 | WHEN tut.data_type in('tinyint','smallint','mediumint','int','bigint') THEN CONCAT('(', tut.NUMERIC_PRECISION, ')')
50 | WHEN tut.data_type = 'decimal' THEN CONCAT('(', tut.NUMERIC_PRECISION, ',', tut.NUMERIC_SCALE, ')')
51 | WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN CONCAT('(', tut.CHARACTER_MAXIMUM_LENGTH, ')')
52 | WHEN tut.DATETIME_PRECISION IS NOT NULL THEN CONCAT('(', tut.DATETIME_PRECISION, ')')
53 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
54 | AND tut.NUMERIC_SCALE IS NULL THEN CONCAT('(', tut.NUMERIC_PRECISION, ')')
55 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
56 | AND tut.NUMERIC_SCALE IS NOT NULL THEN CONCAT('(', tut.NUMERIC_PRECISION, ',', tut.NUMERIC_SCALE, ')')
57 | ELSE ''
58 | END ) AS data_typ
59 | , CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END AS nullable
60 | FROM INFORMATION_SCHEMA.COLUMNS tut
61 | INNER JOIN baseTbl bt ON bt.table_catalog = tut.TABLE_CATALOG AND bt.table_name = tut.table_name
62 | )
63 |
64 | , meta_for_keys
65 | AS (
66 | SELECT schema_nm, table_nm, column_nm
67 | , GROUP_CONCAT(is_key ORDER BY is_key SEPARATOR ',') AS is_key
68 | FROM (
69 | SELECT cons.TABLE_SCHEMA AS schema_nm
70 | , cons.TABLE_NAME AS table_nm
71 | , kcu.COLUMN_NAME AS column_nm
72 | , CASE WHEN cons.constraint_type = 'PRIMARY KEY' THEN 'PK'
73 | WHEN cons.constraint_type = 'UNIQUE' THEN 'UK'
74 | WHEN cons.constraint_type = 'FOREIGN KEY' THEN 'FK'
75 | ELSE 'X'
76 | END AS is_key
77 | FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS cons
78 | INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
79 | ON cons.TABLE_SCHEMA = kcu.TABLE_SCHEMA
80 | AND cons.TABLE_NAME = kcu.TABLE_NAME
81 | AND cons.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
82 | WHERE cons.table_schema = (SELECT v_SchemaName FROM vars)
83 | AND cons.table_name IN(SELECT DISTINCT table_name FROM baseTbl)
84 | AND cons.constraint_type IN('PRIMARY KEY','FOREIGN KEY','UNIQUE')
85 | GROUP BY cons.TABLE_SCHEMA, cons.TABLE_NAME, kcu.COLUMN_NAME, cons.constraint_type
86 | ) t
87 | GROUP BY schema_nm, table_nm, column_nm
88 | )
89 |
90 | , col_comm
91 | AS (
92 | SELECT TABLE_SCHEMA AS SCHEMA_NM
93 | , TABLE_NAME AS TABLE_NM
94 | , COLUMN_NAME AS COLUMN_NM
95 | , COLUMN_COMMENT AS column_descr
96 | FROM INFORMATION_SCHEMA.COLUMNS
97 | WHERE table_schema = (SELECT v_SchemaName FROM vars)
98 | AND table_name IN(SELECT DISTINCT table_name FROM baseTbl)
99 | )
100 |
101 | SELECT md.SCHEMA_NM, md.TABLE_NM, md.OBJ_TYP
102 | , md.ORD_POS AS ord
103 | , COALESCE(pk.is_key, ' ') AS is_key
104 | , md.COLUMN_NM, md.DATA_TYP, md.NULLABLE, c.column_descr
105 | FROM metadata md
106 | LEFT JOIN meta_for_keys pk ON pk.SCHEMA_NM = md.SCHEMA_NM AND pk.TABLE_NM = md.TABLE_NM AND pk.COLUMN_NM = md.COLUMN_NM
107 | LEFT JOIN col_comm c ON c.SCHEMA_NM = md.SCHEMA_NM AND c.TABLE_NM = md.TABLE_NM AND c.COLUMN_NM = md.COLUMN_NM
108 | ORDER BY md.SCHEMA_NM, md.TABLE_NM, md.ORD_POS
109 |
--------------------------------------------------------------------------------
/mysql/data_validation/demo_data/demo_hr_01_create_database.sql:
--------------------------------------------------------------------------------
1 | CREATE DATABASE Demo_HR;
2 |
--------------------------------------------------------------------------------
/mysql/data_validation/demo_data/demo_hr_02_create_tables.sql:
--------------------------------------------------------------------------------
1 | USE Demo_HR;
2 |
3 | CREATE TABLE Demo_HR.JOBS
4 | (JOB_ID VARCHAR(10),
5 | JOB_TITLE VARCHAR(35) NOT NULL,
6 | MIN_SALARY NUMERIC(6,0),
7 | MAX_SALARY NUMERIC(6,0),
8 | CONSTRAINT JOB_ID_PK PRIMARY KEY (JOB_ID)
9 | );
10 |
11 |
12 |
13 | -- Demo_HR..JOBS_SNAPSHOT definition
14 | CREATE TABLE Demo_HR.JOBS_SNAPSHOT
15 | (JOB_ID VARCHAR(10),
16 | JOB_TITLE VARCHAR(35) NOT NULL,
17 | MIN_SALARY NUMERIC(6,0),
18 | MAX_SALARY NUMERIC(6,0)
19 | );
20 |
21 |
22 |
23 | -- Demo_HR..REGIONS definition
24 | CREATE TABLE Demo_HR.REGIONS
25 | (REGION_ID INTEGER NOT NULL PRIMARY KEY,
26 | REGION_NAME VARCHAR(25)
27 | );
28 |
29 |
30 |
31 | -- Demo_HR..COUNTRIES definition
32 | CREATE TABLE Demo_HR.COUNTRIES
33 | (COUNTRY_ID CHAR(2) NOT NULL PRIMARY KEY,
34 | COUNTRY_NAME VARCHAR(40),
35 | REGION_ID NUMERIC REFERENCES Demo_HR.REGIONS (REGION_ID),
36 | DATE_LAST_UPDATED DATETIME DEFAULT CURRENT_TIMESTAMP
37 | );
38 |
39 |
40 |
41 |
42 | -- Demo_HR..LOCATIONS definition
43 | CREATE TABLE Demo_HR.LOCATIONS
44 | (LOCATION_ID NUMERIC(4,0) PRIMARY KEY,
45 | STREET_ADDRESS VARCHAR(40),
46 | POSTAL_CODE VARCHAR(12),
47 | CITY VARCHAR(30) NOT NULL,
48 | STATE_PROVINCE VARCHAR(25),
49 | COUNTRY_ID CHAR(2) REFERENCES Demo_HR.COUNTRIES (COUNTRY_ID)
50 | );
51 |
52 | CREATE INDEX LOC_CITY_IX ON Demo_HR.LOCATIONS (CITY);
53 | CREATE INDEX LOC_STATE_PROVINCE_IX ON Demo_HR.LOCATIONS (STATE_PROVINCE);
54 | CREATE INDEX LOC_COUNTRY_IX ON Demo_HR.LOCATIONS (COUNTRY_ID);
55 |
56 |
57 |
58 | -- Demo_HR..DEPARTMENTS definition
59 | CREATE TABLE Demo_HR.DEPARTMENTS
60 | (DEPARTMENT_ID NUMERIC(4,0) PRIMARY KEY,
61 | DEPARTMENT_NAME VARCHAR(30) NOT NULL,
62 | MANAGER_ID NUMERIC(6,0),
63 | LOCATION_ID NUMERIC(4,0) REFERENCES Demo_HR.LOCATIONS (LOCATION_ID) ,
64 | URL VARCHAR(255)
65 | ) ;
66 |
67 | CREATE INDEX DEPT_LOCATION_IX ON Demo_HR.DEPARTMENTS (LOCATION_ID);
68 |
69 |
70 |
71 | -- Demo_HR..JOB_HISTORY definition
72 | CREATE TABLE Demo_HR.JOB_HISTORY
73 | (EMPLOYEE_ID NUMERIC(6,0) NOT NULL,
74 | START_DATE DATETIME NOT NULL,
75 | END_DATE DATETIME NOT NULL,
76 | JOB_ID VARCHAR(10) NOT NULL,
77 | DEPARTMENT_ID NUMERIC(4,0),
78 | CONSTRAINT JHIST_DATE_INTERVAL CHECK (end_date > start_date),
79 | CONSTRAINT JHIST_EMP_ID_ST_DATE_PK PRIMARY KEY (EMPLOYEE_ID, START_DATE),
80 | CONSTRAINT JHIST_JOB_FK FOREIGN KEY (JOB_ID) REFERENCES Demo_HR.JOBS (JOB_ID),
81 | CONSTRAINT JHIST_DEPT_FK FOREIGN KEY (DEPARTMENT_ID) REFERENCES Demo_HR.DEPARTMENTS (DEPARTMENT_ID)
82 | ) ;
83 |
84 | CREATE INDEX JHIST_JOB_IX ON Demo_HR.JOB_HISTORY (JOB_ID);
85 | CREATE INDEX JHIST_EMPLOYEE_IX ON Demo_HR.JOB_HISTORY (EMPLOYEE_ID);
86 | CREATE INDEX JHIST_DEPARTMENT_IX ON Demo_HR.JOB_HISTORY (DEPARTMENT_ID);
87 |
88 |
89 |
90 |
91 | -- Demo_HR..EMPLOYEES definition
92 | CREATE TABLE Demo_HR.EMPLOYEES
93 | (EMPLOYEE_ID NUMERIC(6,0),
94 | FIRST_NAME VARCHAR(20),
95 | LAST_NAME VARCHAR(25) NOT NULL,
96 | EMAIL VARCHAR(25) NOT NULL,
97 | PHONE_NUMBER VARCHAR(20),
98 | HIRE_DATE DATETIME NOT NULL,
99 | JOB_ID VARCHAR(10) NOT NULL,
100 | SALARY NUMERIC(8,2),
101 | COMMISSION_PCT NUMERIC(2,2),
102 | MANAGER_ID NUMERIC(6,0),
103 | DEPARTMENT_ID NUMERIC(4,0),
104 | SOME_DATE_FMT1 VARCHAR(50),
105 | SOME_DATE_FMT2 VARCHAR(50),
106 | SOME_DATE_FMT3 VARCHAR(50),
107 | SOME_DATE_FMT4 VARCHAR(50),
108 | FAKE_SSN VARCHAR(11),
109 | ZIP5 VARCHAR(5),
110 | ZIP5OR9 VARCHAR(10),
111 | ZIP9 VARCHAR(10),
112 | EMAIL_ADDRESS VARCHAR(100),
113 | CONSTRAINT EMP_SALARY_MIN CHECK (salary > 0),
114 | CONSTRAINT EMP_EMAIL_UK UNIQUE (EMAIL),
115 | CONSTRAINT EMP_EMP_ID_PK PRIMARY KEY (EMPLOYEE_ID),
116 | CONSTRAINT EMP_DEPT_FK FOREIGN KEY (DEPARTMENT_ID) REFERENCES Demo_HR.DEPARTMENTS (DEPARTMENT_ID),
117 | CONSTRAINT EMP_JOB_FK FOREIGN KEY (JOB_ID) REFERENCES Demo_HR.JOBS (JOB_ID),
118 | CONSTRAINT EMP_MANAGER_FK FOREIGN KEY (MANAGER_ID) REFERENCES Demo_HR.EMPLOYEES (EMPLOYEE_ID)
119 | ) ;
120 |
121 | CREATE INDEX EMP_DEPARTMENT_IX ON Demo_HR.EMPLOYEES (DEPARTMENT_ID);
122 | CREATE INDEX EMP_JOB_IX ON Demo_HR.EMPLOYEES (JOB_ID);
123 | CREATE INDEX EMP_MANAGER_IX ON Demo_HR.EMPLOYEES (MANAGER_ID);
124 | CREATE INDEX EMP_NAME_IX ON Demo_HR.EMPLOYEES (LAST_NAME, FIRST_NAME);
125 |
126 |
127 |
128 |
129 |
130 |
131 |
--------------------------------------------------------------------------------
/mysql/data_validation/img/01_data_val_mysql_example_test_case_sql_code.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mysql/data_validation/img/01_data_val_mysql_example_test_case_sql_code.png
--------------------------------------------------------------------------------
/mysql/data_validation/img/02_data_val_mysql_run_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mysql/data_validation/img/02_data_val_mysql_run_results.png
--------------------------------------------------------------------------------
/mysql/data_validation/img/03_data_val_mysql_adv_test_case.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mysql/data_validation/img/03_data_val_mysql_adv_test_case.png
--------------------------------------------------------------------------------
/mysql/data_validation/img/04_data_val_mysql_run_results_adv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/mysql/data_validation/img/04_data_val_mysql_run_results_adv.png
--------------------------------------------------------------------------------
/mysql/data_validation/rule_set_01__row_counts.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MySQL
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #1 - Row Counts
8 |
9 | ## Table of Contents
10 | - T001 - "Full" Row Count
11 | - T002 - "Partial" Row Count
12 | - T003 - "Relative" Row Count
13 | - T004 - "Recent" Row Count
14 |
15 |
16 |
17 |
18 | ### T001 - "Full" Row Count
19 | Verify full row count for a table or view. For example, to verify that table countries has exactly 25 rows:
20 | ```sql
21 | SELECT CASE WHEN COUNT(*) <> 25 THEN 'FAIL' ELSE 'P' END AS status
22 | FROM demo_hr.countries;
23 | ```
24 |
25 |
26 |
27 |
28 | ### T002 - "Partial" Row Count
29 | Verify partial row count for a subset of a table or view. For example, to verify that table countries has exactly 8 rows where region_id = 1 (Europe):
30 | ```sql
31 | SELECT CASE WHEN COUNT(*) <> 8 THEN 'FAIL' ELSE 'P' END AS status
32 | FROM demo_hr.countries
33 | WHERE region_id = 1;
34 | ```
35 |
36 |
37 |
38 |
39 | ### T003 - "Relative" Row Count
40 | Verify relative row counts between tables or views. The example below verifies that table countries has at least 5 times the number of rows as table regions. The inner query returns one row with two columns: (a) the count of countries, and (b) the count of regions. The outer query can then apply the business logic of comparing the counts and returning a Pass or Fail.
41 | ```sql
42 | SELECT CASE WHEN countries_count < 5 * regions_count THEN 'FAIL' ELSE 'P' END AS status
43 | FROM (
44 | SELECT (SELECT COUNT(*) AS row_count FROM demo_hr.countries) AS countries_count
45 | , (SELECT COUNT(*) AS row_count FROM demo_hr.regions) AS regions_count
46 | ) t;
47 | ```
48 |
49 |
50 |
51 |
52 | ### T004 - "Recent" Row Count
53 | Verify recent row counts. The example below verifies that table countries has had at least 5 rows updated in the past 150 days, based on the date stamp in field date_last_updated. The inner query uses DATE_SUB() with NOW() (MySQL's current date-time stamp) to subtract 150 days, and counts how many rows in table countries have a date_last_updated within the past 150 days. The outer query then confirms that at least 5 rows are recent for a Pass; if not, it returns a Fail.
54 | ```sql
55 | SELECT CASE WHEN row_count < 5 THEN 'FAIL' ELSE 'P' END AS status
56 | FROM (
57 | SELECT COUNT(*) AS row_count
58 | FROM demo_hr.countries
59 | WHERE date_last_updated >= DATE_SUB(NOW(), INTERVAL 150 DAY)
60 | ) t;
61 | ```
62 |
63 |
64 |
--------------------------------------------------------------------------------
/mysql/data_validation/rule_set_02__keys.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MySQL
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #2 - Keys (Foreign & Unique)
8 |
9 | ## Table of Contents
10 | - T005 - Unique (Native) Key Has No Duplicates
11 | - T006 - Foreign Key Children Have Orphans
12 | - T007 - Foreign Key Parent is Childless
13 |
14 |
15 |
16 |
17 | ### T005 - Unique Key Has No Duplicates
18 | Sure, good database design implies that unique keys be enforced by a constraint so that you do not need to test for them. However, there are times when a decision is made to **not** add a constraint to enforce the unique key (e.g., the table is replicated from a source that has the constraint, so it was skipped for performance). This does happen! At work just last week, two of my unique key regression tests started failing -- without these checks in place as a wide net, the downstream defects would have been around a lot longer.
19 | In the example below, the query does a GROUP BY on the unique key fields, then uses a HAVING clause to filter down to those key values with a count of more than 1 -- the dups. Any row returned is a duplicate and thus a failure; if no rows come back, the test passes. Wrap it in an outer query if you want an explicit pass/fail status.
20 | ```sql
21 | SELECT country_name -- UKey fields
22 | , COUNT(*) AS match_count
23 | FROM demo_hr.countries
24 | GROUP BY country_name -- UKey fields
25 | HAVING COUNT(*) > 1;
26 | ```
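The snippet above returns the duplicate rows themselves. If you prefer a single pass/fail status like the other test cases, you can wrap it in an outer query. Here is a self-contained sketch of that wrapper using Python's sqlite3 (the demo table and rows are invented; the SQL itself is portable to MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE countries (country_id TEXT, country_name TEXT);
    INSERT INTO countries VALUES ('US', 'United States');
    INSERT INTO countries VALUES ('CA', 'Canada');
    INSERT INTO countries VALUES ('U2', 'United States');  -- duplicate country_name
""")
# Outer query rolls the duplicate rows up into one pass/fail status
status = con.execute("""
    SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
    FROM (
        SELECT country_name, COUNT(*) AS match_count
        FROM countries
        GROUP BY country_name
        HAVING COUNT(*) > 1
    ) dups
""").fetchone()[0]
print(status)  # 'FAIL' because 'United States' appears twice
```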
27 |
28 |
29 |
30 |
31 | ### T006 - Foreign Key Children Have Orphans
32 | Sure, as with T005 UKeys above, good database design implies that foreign keys be enforced by a constraint so that you do not need to test for it. However, there are times where for whatever reason the constraints do not exist. In those instances, you will want to periodically run a data validation test to ensure that this core assumption is not being violated (of course adding a foreign key constraint would be best, but if that is not an option then periodically check).
33 | In the example below, the inner query pulls from the child table countries as the anchor, then left joins out to the parent table regions on the key field region_id. If region_id does not exist in the parent table (p.region_id IS NULL), then the child region_id is an orphan. The outer query checks the count() of orphaned child rows: if it is >= 1 then the test fails, but if the count() = 0 then it passes.
34 | ```sql
35 | SELECT DISTINCT c.region_id AS child_id, p.region_id AS parent_id
36 | FROM demo_hr.countries c
37 | LEFT JOIN demo_hr.regions p ON p.region_id = c.region_id
38 | WHERE p.region_id IS NULL;
39 | ```
40 |
41 |
42 |
43 |
44 | ### T007 - Foreign Key Parent Is Childless
45 | I'm not sure this particular test is all that useful because often it is okay for the parent side of a foreign key relationship to not have children. But, if for some reason you need to be sure there is data present on both sides (parent **and** child), then this test is for you. You will notice in the example below that the query is very similar to T006 above, but the parent and child tables have switched positions in the FROM and LEFT JOIN lines. This is because we want to first pull all parent rows, then left join to find missing (FKey field IS NULL) child rows.
46 | ```sql
47 | SELECT DISTINCT c.country_id AS child_id, p.country_id AS parent_id
48 | FROM demo_hr.countries p
49 | LEFT JOIN demo_hr.locations c ON p.country_id = c.country_id
50 | WHERE c.country_id IS NULL
51 | AND p.country_id IN('IT','JP','US','CA','CN','IN','AU','SG','UK','DE','CH','NL','MX');
52 | ```
53 |
--------------------------------------------------------------------------------
/mysql/data_validation/rule_set_03__heuristic_thresholds.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MySQL
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 | # Rule Set #3 - Heuristic Thresholds (Column Null Rates & Value Rates)
7 |
8 |
9 | ## Table of Contents
10 | - T008 - Null Rate Thresholds
11 | - T009 - Value Frequency Thresholds
12 |
13 | Note: Although the terms "rate" and "frequency" are mostly synonymous, nulls are usually referred to as "rates" and values as "frequencies". I don't know why, but I'm following the convention I've heard over and over.
14 |
15 |
16 |
17 |
18 | ### T008 - Null Rate Thresholds
19 | There is a lot going on in the "Null Rate Threshold" data validation query below. The basic goal is to validate that a given table's columns do not have too many, or too few, NULLs.
20 | In the example below...
22 | ...we are checking columns department_name, manager_id, and url in the table departments. We want to ensure that department_name has no NULLs, that manager_id is NULL less than 65% of the time, and that url is NULL less than 80% of the time.
23 |
24 | This check is like the proverbial miner's canary in that it is a trip wire triggered when something goes awry in a data feed. I've used this test scenario to great effect when coupled with a create-date or last-updated-date to monitor the past week's data loads for any unexpected upticks in null rates.
25 |
26 | There is a downside to this test scenario too, however: when it fires false alarms, you find yourself tinkering with the threshold values (the 0.0000, 0.6500, and 0.8000 cutoffs below), raising and lowering them over and over. If this happens, chances are the test failures are neither actionable nor important, and you should not waste your time applying this test scenario to that table and field. Be careful to only pick fields that truly matter.
27 |
28 | Below, there is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any null rate rejections were found, fail the case). Inside the dtls CTE, there is an inner query at the bottom (at the FROM clause) doing a single table scan to calculate a null rate per column, counting the nulls in each column and dividing by the total table row count. The SELECT CASE logic at the top applies the business logic, comparing the actual calculated null rates (nr_dept_nm, nr_mgr_id, and nr_url) against the expected threshold rates (hard-coded as 0.0000, 0.6500, and 0.8000). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the null rate check, what the actual null rate was, and what the expected threshold was. If no rejections are triggered, then status returns a "P" for pass.
29 |
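30 | The full query is not reproduced in this excerpt, so here is a minimal sketch of the structure described above. It assumes the demo_hr.departments table from the demo data; the exact rejection messages and CAST precision are illustrative, not the original script's:
31 | ```sql
32 | WITH dtls
33 | AS (
34 |   SELECT CASE WHEN nr_dept_nm > 0.0000 THEN CONCAT('REJ-01: Null rate too high at department_name|exp=0.0000|act=', nr_dept_nm)
35 |               WHEN nr_mgr_id  > 0.6500 THEN CONCAT('REJ-02: Null rate too high at manager_id|exp<=0.6500|act=', nr_mgr_id)
36 |               WHEN nr_url     > 0.8000 THEN CONCAT('REJ-03: Null rate too high at url|exp<=0.8000|act=', nr_url)
37 |               ELSE 'P'
38 |          END AS status
39 |   FROM (
40 |     -- One table scan: count nulls per column, divide by total row count
41 |     SELECT CAST(SUM(CASE WHEN department_name IS NULL THEN 1 ELSE 0 END) AS DECIMAL(10,5)) / COUNT(*) AS nr_dept_nm
42 |          , CAST(SUM(CASE WHEN manager_id IS NULL THEN 1 ELSE 0 END) AS DECIMAL(10,5)) / COUNT(*) AS nr_mgr_id
43 |          , CAST(SUM(CASE WHEN url IS NULL THEN 1 ELSE 0 END) AS DECIMAL(10,5)) / COUNT(*) AS nr_url
44 |     FROM demo_hr.departments
45 |   ) t
46 | )
47 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
48 | FROM dtls
49 | WHERE status <> 'P';
50 | ```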
51 |
52 |
53 |
54 | ### T009 - Value Frequency Thresholds
55 | "Value Frequency Threshold" tests are fairly similar to null rates above (T008). The difference is that we are checking the frequency (or rate) at which a column's values occur.
56 | In the example below...
58 | ...we are checking the frequencies with which the values 1, 2, 3, and 4 occur in field region_id of table countries. There is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any value frequency rejections were found, fail the case). Inside the dtls CTE, there is an inner query at the bottom (at the FROM clause) doing a single table scan to calculate frequencies for each value in the GROUP BY for the column. The GROUP BY value count (field "freq") is divided by the total table row count (field "den") to calculate field "freq_rt". The SELECT CASE logic at the top applies the business logic, comparing the actual value frequencies (freq_rt when region_id = 1, or = 2, etc.) against the expected threshold frequencies (hard-coded as 0.28 to 0.36, 0.16 to 0.24, and so on). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the value frequency check, what the actual value frequency was, and what the expected value frequency threshold ranges were. If no rejections are triggered, then status returns a "P" for pass.
59 |
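60 | The full query is not reproduced in this excerpt, so here is a minimal sketch of the structure described above. Only the two threshold ranges stated in the text (0.28-0.36 for region_id = 1, 0.16-0.24 for region_id = 2) are shown; extend the CASE with your own ranges for the remaining values:
61 | ```sql
62 | WITH dtls
63 | AS (
64 |   SELECT region_id, freq_rt
65 |        , CASE WHEN region_id = 1 AND freq_rt NOT BETWEEN 0.28 AND 0.36 THEN CONCAT('REJ-01: region_id=1|exp=0.28 thru 0.36|act=', freq_rt)
66 |               WHEN region_id = 2 AND freq_rt NOT BETWEEN 0.16 AND 0.24 THEN CONCAT('REJ-02: region_id=2|exp=0.16 thru 0.24|act=', freq_rt)
67 |               ELSE 'P'
68 |          END AS status
69 |   FROM (
70 |     -- One table scan: per-value count (freq) divided by total row count (den)
71 |     SELECT region_id
72 |          , COUNT(*) AS freq
73 |          , (SELECT COUNT(*) FROM demo_hr.countries) AS den
74 |          , CAST(COUNT(*) AS DECIMAL(10,5)) / (SELECT COUNT(*) FROM demo_hr.countries) AS freq_rt
75 |     FROM demo_hr.countries
76 |     GROUP BY region_id
77 |   ) t
78 | )
79 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
80 | FROM dtls
81 | WHERE status <> 'P';
82 | ```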
16 |
17 |
18 |
19 | ### T010 - Not Null
20 | Verify numeric field is not null. For example, to verify that table countries has no NULLs in field region_id:
21 | ```sql
22 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
23 | FROM demo_hr.countries
24 | WHERE region_id IS NULL;
25 | ```
26 |
27 |
28 |
29 |
30 | ### T011 - Not Negative
31 | Verify numeric field is not negative. For example, to verify that table countries has no negative values in field region_id:
32 | ```sql
33 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
34 | FROM demo_hr.countries
35 | WHERE region_id < 0;
36 | ```
37 |
38 |
39 |
40 |
41 | ### T012 - Numeric Range
42 | Verify numeric field value is within a range. In the example below, we verify that field employee_id is between 100 and 999 in table employees. Note that you can run the inner query yourself to return the actual rejection code (is too low or too high) along with the actual value and the expected value...all nicely packaged for easy troubleshooting.
43 | ```sql
44 | SELECT employee_id
45 | , CASE WHEN employee_id < 100 THEN CONCAT('REJ-01: Verify employee_id > 99|exp>99|act=', CAST(employee_id AS CHAR(10)) )
46 | WHEN employee_id > 999 THEN CONCAT('REJ-02: Verify employee_id < 1000|exp<1000|act=', CAST(employee_id AS CHAR(10)) )
47 | ELSE 'P'
48 | END AS status
49 | FROM demo_hr.employees;
50 | ```
51 |
52 |
53 |
54 |
55 | ### T013 - In Value List
56 | Verify numeric field is **in** a list of values. For example, to verify that table countries field region_id is always one of the values 1, 2, 3, or 4, we use the IN() clause as shown below:
57 | ```sql
58 | SELECT region_id
59 | , CASE WHEN region_id NOT IN(1,2,3,4) THEN 'FAIL' ELSE 'P' END AS status
60 | FROM demo_hr.countries;
61 | ```
62 |
63 |
64 |
65 |
66 | ### T014 - Not In Value List
67 | Verify numeric field is **not** in a list of values. For example, to verify that table countries field region_id is never 97, 98, or 99, we use the NOT IN() clause as shown below:
68 | ```sql
69 | SELECT region_id
70 | , CASE WHEN region_id IN(97,98,99) THEN 'FAIL' ELSE 'P' END AS status
71 | FROM demo_hr.countries;
72 | ```
73 |
74 |
75 |
76 |
77 | ### T015 - Multi Field Compare
78 | Verify numeric field values in relation to one another. For example, to verify that salary times commission_pct is always less than $10,000 in table employees:
79 | ```sql
80 | SELECT salary, commission_pct
81 | , CASE WHEN salary * commission_pct > 10000 THEN 'FAIL' ELSE 'P' END AS status
82 | FROM demo_hr.employees;
83 | ```
84 |
85 |
--------------------------------------------------------------------------------
/mysql/data_validation/rule_set_05__date_values.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MySQL
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #5 - Date Values
8 |
9 | ## Table of Contents
10 | - T016 - Not Null
11 | - T017 - Date Range
12 | - T018 - No Time Part
13 | - T019 - Has Time Part
14 | - T020 - Multi Field Compare
15 | - Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
16 |
17 |
18 |
19 |
20 | ### T016 - Not Null
21 | Verify date field is not null. For example, to verify that table countries has no NULLs in field date_last_updated:
22 | ```sql
23 | SELECT date_last_updated
24 | , CASE WHEN date_last_updated IS NULL THEN 'FAIL' ELSE 'P' END AS status
25 | FROM demo_hr.countries;
26 | ```
27 |
28 |
29 |
30 |
31 | ### T017 - Date Range
32 | Verify date field is within a specified range. For example, you can run the SQL below to verify that table countries field date_last_updated is between 1/1/2021 and today. Note the use of NOW() to represent today's date dynamically in MySQL. Notice the inner query uses a CASE...WHEN...ELSE structure to identify two rejection codes: (1) date is too high, and (2) date is too low. Expected and actual values are displayed in the output if you run the inner query only. The outer query is a wrapper to determine whether the test passed or failed.
33 | ```sql
34 | SELECT date_last_updated
35 | , CASE WHEN date_last_updated > NOW() THEN CONCAT('REJ-01: Field date_last_updated cannot be in the future|exp<=', CAST(NOW() AS CHAR(20)), '|act=', CAST(date_last_updated AS CHAR(20)) )
36 | WHEN date_last_updated < '2021-01-01' THEN CONCAT('REJ-02: Field date_last_updated cannot be too old|exp>=1/1/2021|act=', CAST(date_last_updated AS CHAR(20)) )
37 | ELSE 'P'
38 | END AS status
39 | FROM demo_hr.countries;
40 | ```
41 |
42 |
43 |
44 |
45 | ### T018 - No Time Part
46 | Verify date field is a date only, no time part present. For example, to verify that table employees has no time part in field hire_date (time part must be "00:00:00"):
47 | ```sql
48 | SELECT hire_date
49 | , CASE WHEN DATE_FORMAT(hire_date, '%H:%i:%s') <> '00:00:00' THEN 'FAIL' ELSE 'P' END AS status
50 | FROM demo_hr.employees;
51 | ```
52 |
53 |
54 |
55 |
56 | ### T019 - Has Time Part
57 | Verify date field is a date **and** time. For example, to verify that table countries has a time part in field date_last_updated (time part cannot be "00:00:00"):
58 | ```sql
59 | SELECT date_last_updated
60 | , CASE WHEN DATE_FORMAT(date_last_updated, '%H:%i:%s') = '00:00:00' THEN 'FAIL' ELSE 'P' END AS status
61 | FROM demo_hr.countries;
62 | ```
63 |
64 |
65 |
66 |
67 | ### T020 - Multi Field Compare
68 | Verify multiple date fields relative to each other. For example, to verify that field start_date must be < field end_date in table job_history (thus if start_date is >= end_date the test case fails):
69 | ```sql
70 | SELECT start_date, end_date
71 | , CASE WHEN start_date >= end_date THEN 'FAIL' ELSE 'P' END AS status
72 | FROM demo_hr.job_history;
73 | ```
74 |
75 |
76 |
77 |
78 | ### Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
79 | So this is a fantastic tip I learned from a co-worker in healthcare back in 2011 (was it Jennifer C.? or Matt G.? or Jonathon P.? I can't remember).
80 |
81 |
82 | The **problem** is that you are trying to join two tables where table1's start_dt/end_dt range overlaps table2's start_dt/end_dt range.
83 |
84 |
85 | The **solution** is to **join on `table1.start_dt <= table2.end_dt AND table1.end_dt >= table2.start_dt`**.
86 |
87 | Here is why:
88 |
89 | ```
90 | Scenario #1 = "Discard - No Overlap" Table #1 date range is completely before Table #2's (Tbl#1.End < Tbl#2.Start)
91 | Table#1: |Start ----------- End|
92 | Table#2: |Start ----------- End|
93 |
94 |
95 | Scenario #2 = "Include - Table #1 End = Table#2 Start" Table #1 date range ends exactly where Table #2's starts (Tbl#1.End = Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
96 | Table#1: |Start ----------- End|
97 | Table#2: |Start ----------- End|
98 |
99 |
100 | Scenario #3 = "Include - Table #1 Ends Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
101 | Table#1: |Start ----------- End|
102 | Table#2: |Start ----------- End|
103 |
104 |
105 | Scenario #4 = "Include - Table #1 Starts Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
106 | Table#1: |Start ----------- End|
107 | Table#2: |Start ----------- End|
108 |
109 |
110 | Scenario #5 = "Include - Table #1 Starts at Table #2 End" Table #1 date range starts exactly at Table #2's End (Tbl#1.End > Tbl#2.St AND Tbl#1.St = Tbl#2.End)
111 | Table#1: |Start ----------- End|
112 | Table#2: |Start ----------- End|
113 |
114 |
115 | Scenario #6 = "Discard - No Overlap" Table #1 date range is entirely after Table #2's (Tbl#1.Start > Tbl#2.End)
116 | Table#1: |Start ----------- End|
117 | Table#2: |Start ----------- End|
118 |
119 | ```
120 |
121 |
122 |
123 |
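124 | The overlap join above can be sketched against the demo_hr schema as follows. This is an illustrative example only (the self-join on job_history to find overlapping stints per employee is not part of the original test suite):
125 | ```sql
126 | -- List each pair of job_history rows for the same employee whose date ranges overlap
127 | SELECT a.employee_id
128 |      , a.start_date AS start1, a.end_date AS end1
129 |      , b.start_date AS start2, b.end_date AS end2
130 | FROM demo_hr.job_history a
131 | INNER JOIN demo_hr.job_history b
132 |    ON a.employee_id = b.employee_id
133 |   AND a.start_date <= b.end_date    -- Table #1 starts on or before Table #2's end
134 |   AND a.end_date >= b.start_date    -- Table #1 ends on or after Table #2's start
135 | WHERE a.start_date < b.start_date;  -- list each overlapping pair once, skip self-matches
136 | ```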
--------------------------------------------------------------------------------
/mysql/data_validation/rule_set_09__defect_regression.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - MySQL
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #9 - Defect Regression
8 |
9 | ## Summary
10 | Where possible, it is a good idea to cover known defects with a test case. This enables automated regression testing for free. You can simply run this script to quickly re-check whether a defect has been re-introduced.
11 |
12 | There are a lot of caveats that will block you from covering every defect:
13 | * If anything but the simplest of setup and tear down data is required
14 | * If processes or jobs or applications or ETL must be triggered
15 | * If there are interactions with other databases or files not on the current server
16 |
17 | That said, you can still frequently set up static tripwires to monitor the data for recurrence of old bugs.
18 |
19 | Some real-world examples from my past include:
20 | * A defect where email addresses stored in the database had TAB characters. I set up a simple CHAR(9) check that ran daily and cross-referenced the defect number in the SQL return results.
21 | * A defect where customer-specific HTML snippets for email reports had bad tags manually entered by company analysts working with the client. Obviously, fixing the app to avoid those data entries would be best, but that was a different department that was going to get to it later. So, I set up a daily alert to check for those HTML tags and some other similar ones that might be accidentally introduced.
22 | * A defect where internal data entry folks on rare occasions had to key in customer names. They were supposed to be all upper case for other reasons in the system, but the analysts sometimes entered lower case. Developers were not going to get around to fixing it for a couple of months (other priorities; this was a very low frequency issue). So I set up a daily check to trip an alarm and notify the appropriate person via email to correct their data.
23 | * Basically, you can take any of the test case examples in Rule Sets #1 through #8 and apply them as regression tests for defects when appropriate.
24 |
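25 | For example, the TAB-character defect above could be wired up as a daily check along the lines of the sketch below (the defect number and the use of demo_hr.employees.email_address are illustrative, not from an actual tracker):
26 | ```sql
27 | -- Regression tripwire: fail if any TAB (CHAR(9)) reappears in email addresses
28 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL: DFCT-00123 regressed - TAB found in email_address'
29 |             ELSE 'P'
30 |        END AS status
31 | FROM demo_hr.employees
32 | WHERE email_address LIKE CONCAT('%', CHAR(9), '%');
33 | ```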
--------------------------------------------------------------------------------
/mysql/schemadiff/simple_schema_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Simple Schema Dump:
3 | -- This SQL script will dump table, column, key, and index design related metadata
4 | -- so that you can copy-paste or export to a text file.
5 | -- Even better, you can take snapshots over time (the same database schema at earlier
6 | -- points in time), OR in different environments (DEV, PPMO, STAGE, PROD). Then,
7 | -- using your favorite free text-file diff tool (DiffMerge, ExamDiff, etc.) you
8 | -- can compare snapshots to quickly isolate and identify what changed over time
9 | -- or is different between environments.
10 | ------------------------------------------------------------------------------------
11 | -- Platform: MySQL Server
12 | -- Author: DataResearchLabs
13 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
14 | -- YouTube Tutorials: https://www.youtube.com/playlist?list=PLVHoUDdbskUT0kz-4aB68EsollbmelcjT
15 | ----------------------------------------------------------------------------------
16 | USE sakila -- <<<<<<<<<<<<< Change schema here
17 | ;
18 |
19 | WITH vars
20 | AS (
21 | SELECT DATABASE() AS v_SchemaName
22 | )
23 |
24 | , baseTbl
25 | AS (
26 | SELECT table_schema AS SchemaName, table_type, table_name
27 | FROM INFORMATION_SCHEMA.TABLES
28 | WHERE table_schema = (SELECT v_SchemaName FROM vars)
29 | )
30 |
31 | , metaForTbl
32 | AS (
33 | SELECT t.SchemaName
34 | , t.table_name AS TableName
35 | , CONCAT( '('
36 | , CASE WHEN t.table_type <=> 'BASE TABLE' THEN 'Table'
37 | WHEN t.table_type <=> 'VIEW' THEN 'View'
38 | ELSE 'UK'
39 | END
40 | , ')'
41 | ) AS ObjectType
42 | , t.table_name AS ObjectName
43 | , '(Exists)' AS PropertyName
44 | , ' ' AS PropertyValue
45 | FROM baseTbl t
46 | )
47 |
48 | , metaForCol_dataType
49 | AS (
50 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
51 | , '2' AS PropertyName
52 | , CONCAT(COALESCE(tut.data_type, 'unknown')
53 | , '('
54 | , CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN CAST(tut.CHARACTER_MAXIMUM_LENGTH AS CHAR(10)) ELSE '' END
55 | , CASE WHEN tut.DATA_TYPE IN('date','datetime','timestamp') THEN CAST(tut.DATETIME_PRECISION AS CHAR(10))
56 | WHEN tut.NUMERIC_PRECISION IS NULL THEN ''
57 | ELSE CONCAT(CAST(tut.NUMERIC_PRECISION AS CHAR(10)))
58 | END
59 | , CASE WHEN tut.NUMERIC_SCALE IS NOT NULL THEN CONCAT(',', CAST(tut.NUMERIC_SCALE AS CHAR(10))) ELSE '' END
60 | , ')'
61 | ) AS PropertyValue
62 | FROM INFORMATION_SCHEMA.COLUMNS tut
63 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.TABLE_NAME = tut.TABLE_NAME
64 | )
65 |
66 | , metaForCol_nullable
67 | AS (
68 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
69 | , '3' AS PropertyName, CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END AS PropertyValue
70 | FROM INFORMATION_SCHEMA.COLUMNS tut
71 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.table_name = tut.table_name
72 | )
73 |
74 | , metaForCol_ordpos
75 | AS (
76 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
77 | , '1' AS PropertyName
78 | , LPAD(CAST(tut.ORDINAL_POSITION AS CHAR(3)), 3, 0) AS PropertyValue
79 | FROM INFORMATION_SCHEMA.COLUMNS tut
80 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.table_name = tut.table_name
81 | )
82 |
83 | , metaAllCols
84 | AS (
85 | SELECT schemaname, tablename, objecttype, objectname, 'Properties' AS propertyname
86 | , GROUP_CONCAT(propertyvalue ORDER BY propertyname, propertyvalue SEPARATOR ' | ') AS propertyvalue
87 | FROM (
88 | SELECT * FROM metaForCol_dataType
89 | UNION SELECT * FROM metaForCol_nullable
90 | UNION SELECT * FROM metaForCol_ordpos
91 | ) t
92 | GROUP BY schemaname, tablename, objecttype, objectname
93 | )
94 |
95 | , metaForKeys
96 | AS (
97 | SELECT cons.TABLE_SCHEMA AS SchemaName, cons.TABLE_NAME AS TableName
98 | , CASE WHEN cons.constraint_type = 'PRIMARY KEY' THEN 'PKey'
99 | WHEN cons.constraint_type = 'UNIQUE' THEN 'UKey'
100 | WHEN cons.constraint_type = 'FOREIGN KEY' THEN 'FKey'
101 | ELSE 'X'
102 | END AS ObjectType
103 | , cons.constraint_name AS ObjectName
104 | , 'FieldList' AS PropertyName
105 | , GROUP_CONCAT(kcu.COLUMN_NAME ORDER BY kcu.ORDINAL_POSITION SEPARATOR ',') AS PropertyValue
106 | FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS cons
107 | INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
108 | ON cons.TABLE_SCHEMA = kcu.TABLE_SCHEMA
109 | AND cons.TABLE_NAME = kcu.TABLE_NAME
110 | AND cons.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
111 | WHERE cons.table_schema = (SELECT v_SchemaName FROM vars)
112 | AND cons.table_name IN(SELECT DISTINCT table_name FROM baseTbl)
113 | AND cons.constraint_type IN('PRIMARY KEY','FOREIGN KEY','UNIQUE')
114 | GROUP BY cons.TABLE_SCHEMA, cons.TABLE_NAME, cons.CONSTRAINT_TYPE, cons.CONSTRAINT_NAME
115 | )
116 |
117 | , metaForIdxs
118 | AS (
119 | SELECT (SELECT v_SchemaName FROM vars) AS SchemaName
120 | , table_name AS TableName
121 | , 'Index' AS ObjectType
122 | , index_name AS ObjectName
123 | , 'FieldList' AS PropertyName
124 | , GROUP_CONCAT(column_name ORDER BY seq_in_index SEPARATOR ',') AS PropertyValue
125 | FROM information_schema.statistics
126 | WHERE table_schema = (SELECT v_SchemaName FROM vars)
127 | GROUP BY table_name, index_name
128 | )
129 |
130 | , allMetadata
131 | AS (
132 | SELECT * FROM metaForTbl
133 | UNION SELECT * FROM metaAllCols
134 | UNION SELECT * FROM metaForKeys
135 | UNION SELECT * FROM metaForIdxs
136 | )
137 |
138 | SELECT CASE WHEN objecttype IN('(Table)','(View)') THEN schemaname ELSE ' ' END AS schema_nm
139 | , CASE WHEN objecttype IN('(Table)','(View)') THEN tablename ELSE ' ' END AS tbl_nm
140 | , objecttype AS obj_typ, objectname AS obj_nm, /*propertyname,*/ propertyvalue AS properties
141 | FROM allMetadata
142 | ORDER BY schemaname, tablename, objecttype
143 | , CASE WHEN objecttype='Column' THEN propertyvalue ELSE ' ' END
144 | , objectname, propertyname;
145 |
146 |
147 |
--------------------------------------------------------------------------------
/oracle/data_dictionary/data_dict_dump.sql:
--------------------------------------------------------------------------------
1 |
2 | ------------------------------------------------------------------------------------
3 | -- Data Dictionary Dump:
4 | -- This SQL script will dump table, column, key, and description design related
5 | -- metadata so that you can copy-paste or export to Excel as a Data Dictionary.
6 | ------------------------------------------------------------------------------------
7 | -- Platform: Oracle
8 | -- Author: DataResearchLabs
9 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
10 | -- YouTube Tutorials: https://www.youtube.com/channel/UCQciXv3xaBykeUFc04GxSXA
11 | ----------------------------------------------------------------------------------
12 |
13 | -- All variables are consolidated here in the first CTE (Common Table Expression)
14 | -- Each given row is a variable, with the value you change preceding the "AS" command
15 | WITH vars
16 | AS (
17 | SELECT
18 | 'HR' AS v_SchemaName -- <<<<<<<<<<<< CHANGE THIS VALUE to Schema you want to dump
19 | , 'NO' AS v_TablesOnly -- YES=Limit To Tables only; NO=Include views too
20 | FROM dual
21 | )
22 |
23 | , meta_data
24 | AS (
25 | SELECT
26 | atc.owner AS SCHEMA_NM
27 | , atc.table_name AS TABLE_NM
28 | , CASE WHEN t.TABLE_TYPE = 'TABLE' THEN 'TBL'
29 | WHEN t.TABLE_TYPE = 'VIEW' THEN 'VW'
30 | ELSE 'UK'
31 | END AS OBJ_TYP
32 | , atc.column_id AS ORD_POS
33 | , atc.column_name AS COLUMN_NM
34 | , (COALESCE(atc.data_type, 'UNKNOWN') ||
35 | decode(atc.data_type,
36 | 'NUMBER',
37 | decode(atc.data_precision, null, '',
38 | '(' || to_char(atc.data_precision) || decode(atc.data_scale,null,'',0,'',',' || to_char(atc.data_scale) )
39 | || ')' ),
40 | 'FLOAT', '(' || to_char(atc.data_precision) || ')',
41 | 'VARCHAR2', '(' || to_char(atc.data_length) || ')',
42 | 'NVARCHAR2', '(' || to_char(atc.data_length) || ')',
43 | 'VARCHAR', '(' || to_char(atc.data_length) || ')',
44 | 'CHAR', '(' || to_char(atc.data_length) || ')',
45 | 'RAW', '(' || to_char(atc.data_length) || ')',
46 | 'MLSLABEL',decode(atc.data_length,null,'',0,'','(' || to_char(atc.data_length) || ')'),
47 | '')
48 | ) AS DATA_TYP
49 | , CASE WHEN atc.nullable = 'Y' THEN 'NULL' ELSE 'NOT NULL' END AS NULLABLE
50 | , dcc.comments AS DESCRIPTION
51 | FROM all_tab_columns atc
52 | INNER JOIN all_col_comments dcc ON atc.owner = dcc.owner AND atc.table_name = dcc.table_name AND atc.column_name = dcc.column_name
53 | INNER JOIN all_tab_comments t ON t.OWNER = atc.owner AND t.TABLE_NAME = atc.table_name
54 | WHERE atc.owner = (SELECT vars.v_SchemaName FROM vars)
55 | AND ( (t.TABLE_TYPE = 'TABLE')
56 | OR ((SELECT v_TablesOnly FROM vars) = 'NO')
57 | )
58 | )
59 |
60 |
61 | , meta_for_keys
62 | AS (
63 | SELECT SCHEMA_NM, TABLE_NM, COLUMN_NM
64 | , LISTAGG(IS_KEY, ', ')
65 | WITHIN GROUP(ORDER BY IS_KEY DESC) AS IS_KEY
66 | FROM (
67 | SELECT cons.owner AS SCHEMA_NM
68 | , cols.table_name AS TABLE_NM
69 | , cols.column_name AS COLUMN_NM
70 | , CASE WHEN cons.constraint_type = 'P' THEN 'PK'
71 | WHEN cons.constraint_type = 'U' THEN 'UK'
72 | WHEN cons.constraint_type = 'R' THEN 'FK'
73 | ELSE 'X'
74 | END AS IS_KEY
75 | FROM all_constraints cons
76 | INNER JOIN all_cons_columns cols ON cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner
77 | WHERE cons.owner = (SELECT vars.v_SchemaName FROM vars)
78 | AND cons.table_name IN(SELECT DISTINCT TABLE_NM FROM meta_data)
79 | AND cons.constraint_type IN('P','R','U')
80 | GROUP BY cons.owner, cols.table_name, cols.column_name, cons.constraint_type
81 | ) t
82 | GROUP BY SCHEMA_NM, TABLE_NM, COLUMN_NM
83 | )
84 |
85 |
86 | SELECT md.SCHEMA_NM, md.TABLE_NM, md.OBJ_TYP
87 | , md.ORD_POS AS ORD
88 | , COALESCE(pk.IS_KEY, ' ') AS KEYS
89 | , md.COLUMN_NM, md.DATA_TYP, md.NULLABLE, md.DESCRIPTION
90 | FROM meta_data md
91 | LEFT JOIN meta_for_keys pk ON pk.SCHEMA_NM = md.SCHEMA_NM AND pk.TABLE_NM = md.TABLE_NM AND pk.COLUMN_NM = md.COLUMN_NM
92 | ORDER BY md.SCHEMA_NM, md.TABLE_NM, md.ORD_POS
93 |
94 |
--------------------------------------------------------------------------------
/oracle/data_validation/demo_data/demo_hr_01_create_tables.sql:
--------------------------------------------------------------------------------
1 | CREATE USER DEMO_HR IDENTIFIED BY DEMO_HR;
2 |
3 | GRANT UNLIMITED TABLESPACE TO DEMO_HR;
4 |
5 |
6 |
7 | -- DEMO_HR.JOBS definition
8 | CREATE TABLE "DEMO_HR"."JOBS"
9 | ("JOB_ID" VARCHAR2(10),
10 | "JOB_TITLE" VARCHAR2(35) CONSTRAINT "JOB_TITLE_NN" NOT NULL ENABLE,
11 | "MIN_SALARY" NUMBER(6,0),
12 | "MAX_SALARY" NUMBER(6,0),
13 | CONSTRAINT "JOB_ID_PK" PRIMARY KEY ("JOB_ID") ENABLE
14 | );
15 |
16 |
17 |
18 | -- DEMO_HR.JOBS_SNAPSHOT definition
19 | CREATE TABLE "DEMO_HR"."JOBS_SNAPSHOT"
20 | ("JOB_ID" VARCHAR2(10),
21 | "JOB_TITLE" VARCHAR2(35) NOT NULL ENABLE,
22 | "MIN_SALARY" NUMBER(6,0),
23 | "MAX_SALARY" NUMBER(6,0)
24 | );
25 |
26 |
27 |
28 | -- DEMO_HR.REGIONS definition
29 | CREATE TABLE "DEMO_HR"."REGIONS"
30 | ("REGION_ID" NUMBER CONSTRAINT "REGION_ID_NN" NOT NULL ENABLE,
31 | "REGION_NAME" VARCHAR2(25),
32 | CONSTRAINT "REG_ID_PK" PRIMARY KEY ("REGION_ID") ENABLE
33 | );
34 |
35 |
36 |
37 | -- DEMO_HR.COUNTRIES definition
38 | CREATE TABLE "DEMO_HR"."COUNTRIES"
39 | ("COUNTRY_ID" CHAR(2) CONSTRAINT "COUNTRY_ID_NN" NOT NULL ENABLE,
40 | "COUNTRY_NAME" VARCHAR2(40),
41 | "REGION_ID" NUMBER,
42 | "DATE_LAST_UPDATED" DATE DEFAULT SYSDATE,
43 | CONSTRAINT "COUNTRY_C_ID_PK" PRIMARY KEY ("COUNTRY_ID") ENABLE,
44 | CONSTRAINT "COUNTR_REG_FK" FOREIGN KEY ("REGION_ID") REFERENCES "DEMO_HR"."REGIONS" ("REGION_ID") ENABLE
45 | );
46 |
47 |
48 |
49 |
50 | -- DEMO_HR.LOCATIONS definition
51 | CREATE TABLE "DEMO_HR"."LOCATIONS"
52 | ("LOCATION_ID" NUMBER(4,0),
53 | "STREET_ADDRESS" VARCHAR2(40),
54 | "POSTAL_CODE" VARCHAR2(12),
55 | "CITY" VARCHAR2(30) CONSTRAINT "LOC_CITY_NN" NOT NULL ENABLE,
56 | "STATE_PROVINCE" VARCHAR2(25),
57 | "COUNTRY_ID" CHAR(2),
58 | CONSTRAINT "LOC_ID_PK" PRIMARY KEY ("LOCATION_ID") ENABLE,
59 | CONSTRAINT "LOC_C_ID_FK" FOREIGN KEY ("COUNTRY_ID") REFERENCES "DEMO_HR"."COUNTRIES" ("COUNTRY_ID") ENABLE
60 | );
61 |
62 | CREATE INDEX "DEMO_HR"."LOC_CITY_IX" ON "DEMO_HR"."LOCATIONS" ("CITY");
63 | CREATE INDEX "DEMO_HR"."LOC_STATE_PROVINCE_IX" ON "DEMO_HR"."LOCATIONS" ("STATE_PROVINCE");
64 | CREATE INDEX "DEMO_HR"."LOC_COUNTRY_IX" ON "DEMO_HR"."LOCATIONS" ("COUNTRY_ID");
65 |
66 |
67 |
68 | -- DEMO_HR.DEPARTMENTS definition
69 | CREATE TABLE "DEMO_HR"."DEPARTMENTS"
70 | ("DEPARTMENT_ID" NUMBER(4,0),
71 | "DEPARTMENT_NAME" VARCHAR2(30) CONSTRAINT "DEPT_NAME_NN" NOT NULL ENABLE,
72 | "MANAGER_ID" NUMBER(6,0),
73 | "LOCATION_ID" NUMBER(4,0),
74 | "URL" VARCHAR2(255),
75 | CONSTRAINT "DEPT_ID_PK" PRIMARY KEY ("DEPARTMENT_ID") ENABLE,
76 | CONSTRAINT "DEPT_LOC_FK" FOREIGN KEY ("LOCATION_ID") REFERENCES "DEMO_HR"."LOCATIONS" ("LOCATION_ID") ENABLE
77 | ) ;
78 |
79 | CREATE INDEX "DEMO_HR"."DEPT_LOCATION_IX" ON "DEMO_HR"."DEPARTMENTS" ("LOCATION_ID");
80 |
81 |
82 |
83 | -- DEMO_HR.JOB_HISTORY definition
84 | CREATE TABLE "DEMO_HR"."JOB_HISTORY"
85 | ("EMPLOYEE_ID" NUMBER(6,0) CONSTRAINT "JHIST_EMPLOYEE_NN" NOT NULL ENABLE,
86 | "START_DATE" DATE CONSTRAINT "JHIST_START_DATE_NN" NOT NULL ENABLE,
87 | "END_DATE" DATE CONSTRAINT "JHIST_END_DATE_NN" NOT NULL ENABLE,
88 | "JOB_ID" VARCHAR2(10) CONSTRAINT "JHIST_JOB_NN" NOT NULL ENABLE,
89 | "DEPARTMENT_ID" NUMBER(4,0),
90 | CONSTRAINT "JHIST_DATE_INTERVAL" CHECK (end_date > start_date) ENABLE,
91 | CONSTRAINT "JHIST_EMP_ID_ST_DATE_PK" PRIMARY KEY ("EMPLOYEE_ID", "START_DATE") ENABLE,
92 | CONSTRAINT "JHIST_JOB_FK" FOREIGN KEY ("JOB_ID") REFERENCES "DEMO_HR"."JOBS" ("JOB_ID") ENABLE,
93 | CONSTRAINT "JHIST_DEPT_FK" FOREIGN KEY ("DEPARTMENT_ID") REFERENCES "DEMO_HR"."DEPARTMENTS" ("DEPARTMENT_ID") ENABLE
94 | ) ;
95 |
96 | CREATE INDEX "DEMO_HR"."JHIST_JOB_IX" ON "DEMO_HR"."JOB_HISTORY" ("JOB_ID");
97 | CREATE INDEX "DEMO_HR"."JHIST_EMPLOYEE_IX" ON "DEMO_HR"."JOB_HISTORY" ("EMPLOYEE_ID");
98 | CREATE INDEX "DEMO_HR"."JHIST_DEPARTMENT_IX" ON "DEMO_HR"."JOB_HISTORY" ("DEPARTMENT_ID");
99 |
100 |
101 |
102 |
103 | -- DEMO_HR.EMPLOYEES definition
104 | CREATE TABLE "DEMO_HR"."EMPLOYEES"
105 | ("EMPLOYEE_ID" NUMBER(6,0),
106 | "FIRST_NAME" VARCHAR2(20),
107 | "LAST_NAME" VARCHAR2(25) CONSTRAINT "EMP_LAST_NAME_NN" NOT NULL ENABLE,
108 | "EMAIL" VARCHAR2(25) CONSTRAINT "EMP_EMAIL_NN" NOT NULL ENABLE,
109 | "PHONE_NUMBER" VARCHAR2(20),
110 | "HIRE_DATE" DATE CONSTRAINT "EMP_HIRE_DATE_NN" NOT NULL ENABLE,
111 | "JOB_ID" VARCHAR2(10) CONSTRAINT "EMP_JOB_NN" NOT NULL ENABLE,
112 | "SALARY" NUMBER(8,2),
113 | "COMMISSION_PCT" NUMBER(2,2),
114 | "MANAGER_ID" NUMBER(6,0),
115 | "DEPARTMENT_ID" NUMBER(4,0),
116 | "SOME_DATE_FMT1" VARCHAR2(50),
117 | "SOME_DATE_FMT2" VARCHAR2(50),
118 | "SOME_DATE_FMT3" VARCHAR2(50),
119 | "SOME_DATE_FMT4" VARCHAR2(50),
120 | "FAKE_SSN" VARCHAR2(11),
121 | "ZIP5" VARCHAR2(5),
122 | "ZIP5OR9" VARCHAR2(10),
123 | "ZIP9" VARCHAR2(10),
124 | "EMAIL_ADDRESS" VARCHAR2(100),
125 | CONSTRAINT "EMP_SALARY_MIN" CHECK (salary > 0) ENABLE,
126 | CONSTRAINT "EMP_EMAIL_UK" UNIQUE ("EMAIL") ENABLE,
127 | CONSTRAINT "EMP_EMP_ID_PK" PRIMARY KEY ("EMPLOYEE_ID") ENABLE,
128 | CONSTRAINT "EMP_DEPT_FK" FOREIGN KEY ("DEPARTMENT_ID") REFERENCES "DEMO_HR"."DEPARTMENTS" ("DEPARTMENT_ID") ENABLE,
129 | CONSTRAINT "EMP_JOB_FK" FOREIGN KEY ("JOB_ID") REFERENCES "DEMO_HR"."JOBS" ("JOB_ID") ENABLE,
130 | CONSTRAINT "EMP_MANAGER_FK" FOREIGN KEY ("MANAGER_ID") REFERENCES "DEMO_HR"."EMPLOYEES" ("EMPLOYEE_ID") ENABLE
131 | ) ;
132 |
133 | CREATE INDEX "DEMO_HR"."EMP_DEPARTMENT_IX" ON "DEMO_HR"."EMPLOYEES" ("DEPARTMENT_ID");
134 | CREATE INDEX "DEMO_HR"."EMP_JOB_IX" ON "DEMO_HR"."EMPLOYEES" ("JOB_ID");
135 | CREATE INDEX "DEMO_HR"."EMP_MANAGER_IX" ON "DEMO_HR"."EMPLOYEES" ("MANAGER_ID");
136 | CREATE INDEX "DEMO_HR"."EMP_NAME_IX" ON "DEMO_HR"."EMPLOYEES" ("LAST_NAME", "FIRST_NAME");
137 |
138 |
139 |
140 |
141 |
142 |
143 |
--------------------------------------------------------------------------------
/oracle/data_validation/img/01_data_val_oracle_example_test_case_sql_code.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/oracle/data_validation/img/01_data_val_oracle_example_test_case_sql_code.png
--------------------------------------------------------------------------------
/oracle/data_validation/img/02_data_val_oracle_run_results1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/oracle/data_validation/img/02_data_val_oracle_run_results1.png
--------------------------------------------------------------------------------
/oracle/data_validation/img/03_data_val_oracle_adv_test_case_ex.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/oracle/data_validation/img/03_data_val_oracle_adv_test_case_ex.png
--------------------------------------------------------------------------------
/oracle/data_validation/img/04_data_val_oracle_run_results_adv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/oracle/data_validation/img/04_data_val_oracle_run_results_adv.png
--------------------------------------------------------------------------------
/oracle/data_validation/rule_set_01__row_counts.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Oracle
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #1 - Row Counts
8 |
9 | ## Table of Contents
10 | - T001 - "Full" Row Count
11 | - T002 - "Partial" Row Count
12 | - T003 - "Relative" Row Count
13 | - T004 - "Recent" Row Count
14 |
15 |
16 |
17 |
18 | ### T001 - "Full" Row Count
19 | Verify full row count for a table or view. For example, to verify that table countries has exactly 25 rows:
20 | ```sql
21 | SELECT CASE WHEN COUNT(*) <> 25 THEN 'FAIL' ELSE 'P' END AS status
22 | FROM demo_hr.countries;
23 | ```
24 |
25 |
26 |
27 |
28 | ### T002 - "Partial" Row Count
29 | Verify partial row count for a subset of a table or view. For example, to verify that table countries has exactly 8 rows where region_id = 1 (Europe):
30 | ```sql
31 | SELECT CASE WHEN COUNT(*) <> 8 THEN 'FAIL' ELSE 'P' END AS status
32 | FROM demo_hr.countries
33 | WHERE region_id = 1;
34 | ```
35 |
36 |
37 |
38 |
39 | ### T003 - "Relative" Row Count
40 | Verify relative row counts between tables or views. The example below verifies that table countries has at least 5 times the number of rows as table regions. The inner query returns one row with two columns: (a) count of countries, and (b) count of regions. The outer query can then apply the business logic of comparing the counts and returning a Pass or Fail.
41 | ```sql
42 | SELECT CASE WHEN countries_count < 5 * regions_count THEN 'FAIL' ELSE 'P' END AS status
43 | FROM (
44 | SELECT (SELECT COUNT(*) AS row_count FROM demo_hr.countries) AS countries_count
45 | , (SELECT COUNT(*) AS row_count FROM demo_hr.regions) AS regions_count
46 | FROM dual
47 | );
48 | ```
49 |
50 |
51 |
52 |
53 | ### T004 - "Recent" Row Count
54 | Verify recent row counts. The example below verifies that table countries has had at least 5 rows updated in the past 10 days, based on the date stamp in field date_last_updated. The inner query uses SYSDATE (Oracle's current date-time) and subtracts 10 days, counting how many rows in table countries have a date_last_updated within the past 10 days. The outer query then confirms that at least 5 rows are recent for a Pass...if not, it returns a Fail.
55 | ```sql
56 | SELECT CASE WHEN row_count < 5 THEN 'FAIL' ELSE 'P' END AS status
57 | FROM (
58 | SELECT COUNT(*) AS row_count
59 | FROM demo_hr.countries
60 | WHERE date_last_updated >= SYSDATE - 10
61 | );
62 | ```
63 |
64 |
65 |
66 |
--------------------------------------------------------------------------------
/oracle/data_validation/rule_set_02__keys.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Oracle
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #2 - Keys (Foreign & Unique)
8 |
9 | ## Table of Contents
10 | - T005 - Unique (Native) Key Has No Duplicates
11 | - T006 - Foreign Key Children Have Orphans
12 | - T007 - Foreign Key Parent is Childless
13 |
14 |
15 |
16 |
17 | ### T005 - Unique Key Has No Duplicates
18 | Sure, good database design implies that unique keys be enforced by a constraint so that you do not need to test for it. However, there are times when a decision is made to **not** add a constraint to enforce the unique key (e.g.: table is replicated from a source having the constraint, so it was skipped for performance). This does happen! At work just last week, two of my unique key regression tests started failing -- without these checks in place as a wide net, the downstream defects would have been around a lot longer.
19 | In the example below, the inner query does a group by on the unique key fields, then using a HAVING clause filters down to those key-values with a count of more than 1 -- the dups. The outer query returns a fail if any rows come back with dups (match_count >= 2), or a pass if no dups found.
20 | ```sql
21 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
22 | FROM (
23 | SELECT country_name -- UKey fields separated by comma
24 | , COUNT(*) AS match_count
25 | FROM demo_hr.countries -- UKey fields separated by comma
26 | GROUP BY country_name
27 | HAVING COUNT(*) > 1
28 | );
29 | ```
30 |
31 |
32 |
33 |
34 | ### T006 - Foreign Key Children Have Orphans
35 | Sure, as with T005 UKeys above, good database design implies that foreign keys be enforced by a constraint so that you do not need to test for it. However, there are times when, for whatever reason, the constraints do not exist. In those instances, you will want to periodically run a data validation test to ensure that this core assumption is not being violated (of course adding a foreign key constraint would be best, but if that is not an option, then periodically check).
36 | In the example below, the inner query pulls from the child table countries as the anchor, then left joins out to the parent table regions on the key field region_id. If region_id does not exist in the parent table (p.region_id IS NULL), then the child region_id is an orphan. The outer query checks the count() of orphaned child rows: if it is >= 1 then the test fails, but if the count() = 0 then it passes.
37 | ```sql
38 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
39 | FROM (
40 | SELECT DISTINCT c.region_id AS child_id, p.region_id AS parent_id
41 | FROM demo_hr.countries c
42 | LEFT JOIN demo_hr.regions p ON p.region_id = c.region_id
43 | WHERE p.region_id IS NULL
44 | );
45 | ```
46 |
47 |
48 |
49 |
50 | ### T007 - Foreign Key Parent Is Childless
51 | I'm not sure this particular test is all that useful, because often it is okay for the parent-side of a foreign key relationship to not have children. But, if for some reason you need to be sure there is data present on both sides (parent **and** child), then this test is for you. You will notice in the example below that the query is very similar to T006 above, but the parent and child tables have switched positions in the FROM and LEFT JOIN lines. This is because we want to first pull all parent rows, then left join to find missing (FKey field IS NULL) child rows.
52 | ```sql
53 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
54 | FROM (
55 | SELECT DISTINCT c.country_id AS child_id, p.country_id AS parent_id
56 | FROM demo_hr.countries p
57 | LEFT JOIN demo_hr.locations c ON p.country_id = c.country_id
58 | WHERE c.country_id IS NULL
59 | );
60 | ```
61 |
62 |
--------------------------------------------------------------------------------
/oracle/data_validation/rule_set_03__heuristic_thresholds.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Oracle
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #3 - Heuristic Thresholds (Column Null Rates & Value Rates)
8 |
9 | ## Table of Contents
10 | - T008 - Null Rate Thresholds
11 | - T009 - Value Frequency Thresholds
12 |
13 |
14 |
15 | Note: Although the terms "rate" and "frequency" are mostly synonymous, nulls are usually referred to as "rates" and values as "frequencies". I don't know why, but I'm following the convention I've heard over and over.
16 |
17 |
18 |
19 |
20 | ### T008 - Null Rate Thresholds
21 | There is a lot going on in this "Null Rate Threshold" data validation query below. The basic goal is to validate that a given table's columns do not have too many or too few NULLs.
22 | In the example below...
24 | ...we are checking columns department_name, manager_id, and url in the table departments. We want to ensure that column department_name has no NULLs, and that column manager_id is NULL less than 65% of the time and column url is NULL less than 80% of the time.
25 |
26 | This check is like the proverbial miner's canary in that it is a trip wire triggered when something goes awry in a data feed. I've used this test scenario to great effect when coupled with a create-date or last-updated-date to monitor the past week's data loads for any unexpected upticks in null rates.
27 |
28 | There is a downside to this test scenario too, however; and that is when it fires false alarms and you find yourself tinkering with the threshold values (the 0.0000, 0.6500, and 0.8000 cutoffs below), raising and lowering them over and over. If this happens, chances are the test failures are neither actionable nor important, and you should not waste your time applying this test scenario to that given table and field. Be careful to only pick fields that truly matter.
29 |
30 | Below, there is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any null rate rejections were found, fail the case). Inside the dtls CTE, there is an inner query at the bottom (at the FROM clause) doing a single table scan to calculate a null rate per column by counting nulls in each column and dividing by the total table row count. The SELECT CASE logic at the top applies the business logic, comparing the actual calculated null rates (nr_dept_nm, nr_mgr_id, and nr_url) against the expected threshold rates (hard-coded as 0.0000, 0.6500, and 0.8000). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the null rate check, what the actual null rate was, and what the expected null rate threshold was. If no rejections are triggered, then status returns a "P" for pass.
31 |
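The full SQL for this test case is truncated in this copy. A minimal sketch of the approach described above, reconstructed from the description (column aliases, thresholds, and rejection codes are taken from the text; this is not the original script), would look something like:

```sql
-- Sketch of a Null Rate Threshold check on demo_hr.departments (T008 approach)
WITH dtls AS (
  SELECT CASE WHEN nr_dept_nm > 0.0000 THEN 'REJ-01: department_name null rate too high|exp=0.0000|act=' || CAST(nr_dept_nm AS VARCHAR2(8))
              WHEN nr_mgr_id  > 0.6500 THEN 'REJ-02: manager_id null rate too high|exp<=0.6500|act='     || CAST(nr_mgr_id AS VARCHAR2(8))
              WHEN nr_url     > 0.8000 THEN 'REJ-03: url null rate too high|exp<=0.8000|act='            || CAST(nr_url AS VARCHAR2(8))
              ELSE 'P'
         END AS status
  FROM (
    -- single table scan: nulls per column / total row count = null rate per column
    SELECT CAST(SUM(CASE WHEN department_name IS NULL THEN 1 ELSE 0 END) AS FLOAT) / CAST(COUNT(*) AS FLOAT) AS nr_dept_nm
         , CAST(SUM(CASE WHEN manager_id      IS NULL THEN 1 ELSE 0 END) AS FLOAT) / CAST(COUNT(*) AS FLOAT) AS nr_mgr_id
         , CAST(SUM(CASE WHEN url             IS NULL THEN 1 ELSE 0 END) AS FLOAT) / CAST(COUNT(*) AS FLOAT) AS nr_url
    FROM demo_hr.departments
  )
)
SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM dtls
WHERE status <> 'P';
```

Run the dtls CTE's SELECT on its own to see which rejection code (if any) fired, along with the expected and actual null rates.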
53 |
54 |
55 |
56 | ### T009 - Value Frequency Thresholds
57 | "Value Frequency Threshold" tests are fairly similar to null rates above (T008). The difference is that we are checking the frequency (or rate) at which a column's values occur.
58 | In the example below...
60 | ...we are checking the frequencies with which the values 1, 2, 3, and 4 occur in field region_id of table countries. There is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any value frequency rejections were found, fail the case). Inside the dtls CTE, there is an inner query at the bottom (at the FROM clause) doing a single table scan to calculate the frequency of each value of the column in the GROUP BY. The GROUP BY value count (field "freq") is divided by the total table row count (field "den") to calculate field "freq_rt". The SELECT CASE logic at the top applies the business logic, comparing the actual value frequencies (freq_rt when region_id = 1, or = 2, etc.) against the expected threshold frequencies (hard-coded as 0.28 to 0.36, 0.16 to 0.24, and so on). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the value frequency check, what the actual value frequency was, and what the expected value frequency threshold ranges were. If no rejections are triggered, then status returns a "P" for pass.
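The full SQL for this test case is also truncated in this copy. A minimal sketch of the approach just described, reconstructed from the description (only the first two threshold ranges from the text are shown; the rest follow the same pattern):

```sql
-- Sketch of a Value Frequency Threshold check on demo_hr.countries.region_id (T009 approach)
WITH dtls AS (
  SELECT region_id, freq_rt
       , CASE WHEN region_id = 1 AND freq_rt NOT BETWEEN 0.28 AND 0.36 THEN 'REJ-01: region_id=1 freq outside range|exp=0.28 thru 0.36|act=' || CAST(freq_rt AS VARCHAR2(8))
              WHEN region_id = 2 AND freq_rt NOT BETWEEN 0.16 AND 0.24 THEN 'REJ-02: region_id=2 freq outside range|exp=0.16 thru 0.24|act=' || CAST(freq_rt AS VARCHAR2(8))
              ELSE 'P'
         END AS status
  FROM (
    -- single table scan: per-value count ("freq") / total row count ("den") = "freq_rt"
    SELECT region_id, freq, den
         , CAST(freq AS FLOAT) / CAST(den AS FLOAT) AS freq_rt
    FROM (
      SELECT region_id, COUNT(*) AS freq
           , (SELECT COUNT(*) FROM demo_hr.countries) AS den
      FROM demo_hr.countries
      GROUP BY region_id
    )
  )
)
SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM dtls
WHERE status <> 'P';
```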
61 |
--------------------------------------------------------------------------------
/oracle/data_validation/rule_set_04__numeric_values.md:
--------------------------------------------------------------------------------
6 |
7 | # Rule Set #4 - Numeric Values
8 |
9 | ## Table of Contents
10 | - T010 - Not Null
11 | - T011 - Not Negative
12 | - T012 - Numeric Range
13 | - T013 - In Value List
14 | - T014 - Not In Value List
15 | - T015 - Multi Field Compare
16 |
17 |
18 |
19 |
20 | ### T010 - Not Null
21 | Verify numeric field is not null. For example, to verify that table countries has no NULLs in field region_id:
22 | ```sql
23 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
24 | FROM demo_hr.countries
25 | WHERE region_id IS NULL;
26 | ```
27 |
28 |
29 |
30 |
31 | ### T011 - Not Negative
32 | Verify numeric field is not negative. For example, to verify that table countries has no negative values in field region_id:
33 | ```sql
34 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
35 | FROM demo_hr.countries
36 | WHERE region_id < 0;
37 | ```
38 |
39 |
40 |
41 |
42 | ### T012 - Numeric Range
43 | Verify numeric field value is within a range. In the example below, we verify that field employee_id is between 100 and 999 in table employees. Note that you can run the inner query yourself to return the actual rejection code (is too low or too high) along with the actual value and the expected value...all nicely packaged for easy troubleshooting.
44 | ```sql
45 | SELECT employee_id
46 | , CASE WHEN employee_id < 100 THEN 'REJ-01: Verify employee_id > 99|exp>99|act=' || CAST(employee_id AS VARCHAR2(10))
47 | WHEN employee_id > 999 THEN 'REJ-02: Verify employee_id < 1000|exp<1000|act=' || CAST(employee_id AS VARCHAR2(10))
48 | ELSE 'P'
49 | END AS status
50 | FROM demo_hr.employees;
51 | ```
52 |
53 |
54 |
55 |
56 | ### T013 - In Value List
57 | Verify numeric field is **in** the list of values. For example, to verify that table countries field region_id is always values 1, 2, 3, or 4 we use the IN() clause as shown below:
58 | ```sql
59 | SELECT region_id
60 | , CASE WHEN region_id NOT IN(1,2,3,4) THEN 'FAIL' ELSE 'P' END AS status
61 | FROM demo_hr.countries;
62 | ```
63 |
64 |
65 |
66 |
67 | ### T014 - Not In Value List
68 | Verify numeric field is **not** in the list of values. For example, to verify that table countries field region_id is never in values 97, 98, or 99 we use the NOT IN() clauses as shown below:
69 | ```sql
70 | SELECT region_id
71 | , CASE WHEN region_id IN(97,98,99) THEN 'FAIL' ELSE 'P' END AS status
72 | FROM demo_hr.countries;
73 | ```
74 |
75 |
76 |
77 |
78 | ### T015 - Multi Field Compare
79 | Verify numeric field values in relation to one another. For example, to verify that salary times commission_pct is always less than $10,000 in table employees:
80 | ```sql
81 | SELECT salary, commission_pct
82 | , CASE WHEN salary * commission_pct > 10000 THEN 'FAIL' ELSE 'P' END AS status
83 | FROM demo_hr.employees;
84 | ```
85 |
86 |
--------------------------------------------------------------------------------
/oracle/data_validation/rule_set_05__date_values.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Oracle
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #5 - Date Values
8 |
9 | ## Table of Contents
10 | - T016 - Not Null
11 | - T017 - Date Range
12 | - T018 - No Time Part
13 | - T019 - Has Time Part
14 | - T020 - Multi Field Compare
15 | - Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
16 |
17 |
18 |
19 |
20 | ### T016 - Not Null
21 | Verify date field is not null. For example, to verify that table countries has no NULLs in field date_last_updated:
22 | ```sql
23 | SELECT date_last_updated
24 | , CASE WHEN date_last_updated IS NULL THEN 'FAIL' ELSE 'P' END AS status
25 | FROM demo_hr.countries;
26 | ```
27 |
28 |
29 |
30 |
31 | ### T017 - Date Range
32 | Verify date field is within specified range. For example, you can run the sql below to verify that table countries field date_last_updated is between 1/1/2021 and today. Note the use of SYSDATE to represent today's date dynamically in Oracle. Notice the inner query uses a CASE...WHEN...ELSE structure to identify two rejections codes: (1) date is too high, and (2) date is too low. Expected and actual values are displayed in the output if you run the inner query only. The outer query is a wrapper to determine whether the test passed or failed.
33 | ```sql
34 | SELECT date_last_updated
35 | , CASE WHEN date_last_updated > SYSDATE THEN 'REJ-01: Field date_last_updated cannot be in the future|exp<=' || CAST(SYSDATE AS VARCHAR2(20)) || '|act=' || CAST(date_last_updated AS VARCHAR2(20))
36 | WHEN date_last_updated < TO_DATE('01/01/2021', 'mm/dd/yyyy') THEN 'REJ-02: Field date_last_updated cannot be too old|exp>=1/1/2021|act=' || CAST(date_last_updated AS VARCHAR2(20))
37 | ELSE 'P'
38 | END AS status
39 | FROM demo_hr.countries;
40 | ```
41 |
42 |
43 |
44 |
45 | ### T018 - No Time Part
46 | Verify date field is a date only, no time part present. For example, to verify that table employees has no time part in field hire_date (time part must be "12:00:00"):
47 | ```sql
48 | SELECT hire_date
49 | , CASE WHEN TO_CHAR(hire_date, 'hh:mi:ss') <> '12:00:00' THEN 'FAIL' ELSE 'P' END AS status
50 | FROM demo_hr.employees;
51 | ```
52 |
53 |
54 |
55 |
56 | ### T019 - Has Time Part
57 | Verify date field is a date **and** time. For example, to verify that table test_case_results has a time part in field start_tm (time part cannot be "12:00:00"):
58 | ```sql
59 | SELECT start_tm
60 | , CASE WHEN TO_CHAR(start_tm, 'hh:mi:ss') = '12:00:00' THEN 'FAIL' ELSE 'P' END AS status
61 | FROM demo_hr.test_case_results;
62 | ```
63 |
64 |
65 |
66 |
67 | ### T020 - Multi Field Compare
68 | Verify multiple date fields relative to each other. For example, to verify that field start_date must be < field end_date in table job_history (thus if start_date is >= end_date the test case fails):
69 | ```sql
70 | SELECT start_date, end_date
71 | , CASE WHEN start_date >= end_date THEN 'FAIL' ELSE 'P' END AS status
72 | FROM demo_hr.job_history;
73 | ```
74 |
75 |
76 |
77 |
78 | ### Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
79 | So this is a fantastic tip I learned from a co-worker in healthcare back in 2011 (was it Jennifer C.? or Matt G.? or Jonathon P.? I can't remember).
80 |
81 |
82 | The **problem** is that you are trying to join two tables where table1's start_dt/end_dt range overlaps table2's start_dt/end_dt range.
83 |
84 |
85 | The **solution** is to **join on `table1.start_dt <= table2.end_dt AND table1.end_dt >= table2.start_dt`**.
86 |
87 | Here is why:
88 |
89 | ```
90 | Scenario #1 = "Discard - No Overlap" Table #1 date range is completely before Table #2's (Tbl#1.End is NOT > Tbl#2.Start)
91 | Table#1: |Start ----------- End|
92 | Table#2: |Start ----------- End|
93 |
94 |
95 | Scenario #2 = "Include - Table #1 End = Table#2 Start" Table #1 date range ends exactly where Table #2's starts (Tbl#1.End = Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
96 | Table#1: |Start ----------- End|
97 | Table#2: |Start ----------- End|
98 |
99 |
100 | Scenario #3 = "Include - Table #1 Ends Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
101 | Table#1: |Start ----------- End|
102 | Table#2: |Start ----------- End|
103 |
104 |
105 | Scenario #4 = "Include - Table #1 Starts Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
106 | Table#1: |Start ----------- End|
107 | Table#2: |Start ----------- End|
108 |
109 |
110 | Scenario #5 = "Include - Table #1 Starts at Table #2 End" Table #1 date range starts exactly at Table #2's End (Tbl#1.End > Tbl#2.St AND Tbl#1.St = Tbl#2.End)
111 | Table#1: |Start ----------- End|
112 | Table#2: |Start ----------- End|
113 |
114 |
115 | Scenario #6 = "Discard - No Overlap" Table #1 date range is entirely after Table #2's (Tbl#1.Start is > Tbl#2.End)
116 | Table#1: |Start ----------- End|
117 | Table#2: |Start ----------- End|
118 |
119 | ```
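Putting the join condition into practice, a minimal sketch might look like the following (table and column names are hypothetical):

```sql
-- Return every pairing where the two date ranges overlap,
-- including exact boundary touches (scenarios #2 thru #5 above)
SELECT t1.id AS t1_id, t1.start_dt AS t1_start, t1.end_dt AS t1_end
     , t2.id AS t2_id, t2.start_dt AS t2_start, t2.end_dt AS t2_end
FROM table1 t1
INNER JOIN table2 t2
   ON t1.start_dt <= t2.end_dt
  AND t1.end_dt   >= t2.start_dt;
```

If you need to *exclude* the exact boundary touches (scenarios #2 and #5), tighten the comparisons to strict < and >.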
120 |
121 |
122 |
123 |
--------------------------------------------------------------------------------
/oracle/data_validation/rule_set_09__defect_regression.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Oracle
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #9 - Defect Regression
8 |
9 | ## Summary
10 | Where possible, it is a good idea to cover known defects with a test case. This enables automated regression testing for free. You can simply run this script to quickly re-check whether a defect has been re-introduced.
11 |
12 | There are a lot of caveats that will block you from covering every defect:
13 | * If anything but the simplest of setup and tear down data is required
14 | * If processes or jobs or applications or ETL must be triggered
15 | * If there are interactions with other databases or files not on the current server
16 |
17 | That said, you can still frequently set up static trip wires to monitor the data for recurrence of old bugs.
18 |
19 | Some real-world examples from my past include:
20 | * A defect where email addresses stored in the database had TAB characters. I set up a simple CHR(9) check that ran daily and cross-referenced the defect number in the SQL return results.
21 | * A defect where customer-specific HTML snippets for email reports had bad tags manually entered by company analysts working with the client. Obviously fixing the app to avoid those data entries would be best, but that was a different department that was going to get to it later. So, I set up a daily alert to check for those HTML tags and some other similar ones that might be accidentally introduced.
22 | * A defect where internal data entry folks on rare occasions had to key in customer names. They were supposed to be all upper case for other reasons in the system, but the analysts sometimes entered lower case. Developers were not going to get around to fixing it for a couple of months (other priorities; this was a very low frequency issue). So I set up a daily check to trip an alarm and notify the appropriate person via email to correct their data.
23 | * Basically, you can take any of the test case examples in Rule Sets #1 thru #8 and apply them as regression tests for defects when appropriate.
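For instance, the TAB-character defect above could be covered with a trip wire along these lines (the table, field, and defect number here are hypothetical placeholders):

```sql
-- Regression check: defect "DFCT-1234" (TAB characters in stored email addresses)
SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL: DFCT-1234 regressed' ELSE 'P' END AS status
FROM demo_hr.employees
WHERE INSTR(email, CHR(9)) > 0;
```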
24 |
--------------------------------------------------------------------------------
/oracle/data_validation/sql_scripts/dv_advanced_01_setup.sql:
--------------------------------------------------------------------------------
1 |
2 | -- ===============================================================================================
3 | -- Filename: dv_advanced_01_setup.sql
4 | -- Description: Data Validation Scripts - Setup Two "Temp" Tables
5 | -- Platform: Oracle
6 | -- Author: DataResearchLabs
7 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
8 | -- YouTube Tutorials: https://www.youtube.com/channel/UCQciXv3xaBykeUFc04GxSXA
9 | --------------------------------------------------------------------------------------------------
10 | -- This SQL script is run one time prior to the ongoing regular running of the
11 | -- "dv_advanced_02_script.sql".
12 | -- Run this SQL *one time* to add the "TEST_CASE_RESULTS" and "TEST_CASE_CONFIG" tables to the
13 | -- database in which you are running tests. These tables contain the test results and any
14 | -- configuration values your test cases might need.
15 | -- Unfortunately Oracle differs from all the other platforms in that prior to version 19, there
16 | -- were no private/local/on-the-fly temporary tables that can just be easily spun up within a
17 | -- script and then discarded. Thus, we will just add two permanent tables one time and grant
18 | -- the user running the script read and write permissions to that table.
19 | -- ===============================================================================================
20 |
21 |
22 |
23 | -- 1: Drop Tables if Exist
24 | --------------------------
25 | /* -- Highlight and manually run the line below **if** it already exists.
26 | DROP TABLE demo_hr.test_case_results;
27 | DROP TABLE demo_hr.test_case_config;
28 | */
29 |
30 |
31 | -- 2: Add Table Test_Case_Results
32 | ---------------------------------
33 | CREATE TABLE demo_hr.test_case_results ( /* <<<<<<<<<<<<<<<<< Change Schema and Table name here */
34 | tst_id VARCHAR2(5)
35 | , tst_descr VARCHAR2(255)
36 | , start_tm TIMESTAMP DEFAULT SYSTIMESTAMP
37 | , exec_tm VARCHAR2(15)
38 | , status VARCHAR2(5)
39 | , rej_dtls VARCHAR2(1024)
40 | , lookup_sql VARCHAR2(1024)
41 | );
42 | COMMIT;
43 |
44 |
45 |
46 | -- 3. Add Table Test_Case_Config
47 | CREATE TABLE demo_hr.test_case_config ( /* <<<<<<<<<<<<<<<<< change schema and table name here */
48 | prop_nm VARCHAR2(99)
49 | , prop_val VARCHAR2(255)
50 | );
51 | COMMIT;
52 |
53 |
54 |
55 |
--------------------------------------------------------------------------------
/oracle/schemadiff/simple_schema_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Simple Schema Dump:
3 | -- This SQL script will dump table, column, key, and index design related metadata
4 | -- so that you can copy-paste or export to a text file.
5 | -- Even better, you can make other snapshots over time (same database schema at earlier
6 | -- points in time), OR in different environments (DEV, PPMO, STAGE, PROD). Then,
7 | -- using your favorite free Text File Diff Tool (DiffMerge, ExamDiff, etc.) you
8 | -- can compare snapshots to quickly isolate and identify what changed over time
9 | -- or is different between environments.
10 | ------------------------------------------------------------------------------------
11 | -- Platform: Oracle 11g or later
12 | -- Author: DataResearchLabs
13 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
14 | -- YouTube Tutorials: https://www.youtube.com/channel/UCQciXv3xaBykeUFc04GxSXA
15 | ----------------------------------------------------------------------------------
16 |
17 | -- All variables are consolidated here in the first CTE (Common Table Expression)
18 | -- Each given row is a variable, with the value you change preceding the "AS" keyword
19 | WITH vars
20 | AS (
21 | SELECT 'HR' AS v_SchemaName -- <<<<<<<<<<<< CHANGE THIS VALUE to Schema you want to dump
22 | FROM dual
23 | )
24 |
25 |
26 | , baseTbl
27 | AS (
28 | SELECT owner, table_type, table_name /*, comments */
29 | FROM SYS.ALL_TAB_COMMENTS
30 | WHERE table_name NOT LIKE 'BIN%' -- Leave this as is to ignore the Oracle 10g and forward Recycle Bin tables
31 | AND owner = (SELECT v_SchemaName FROM vars)
32 | )
33 |
34 | , metaForTbl
35 | AS (
36 | SELECT t.owner AS SchemaName
37 | , t.table_name AS TableName
38 | , '(' || CASE WHEN t.table_type = 'TABLE' THEN 'Table'
39 | WHEN t.table_type = 'VIEW' THEN 'View'
40 | ELSE 'UK'
41 | END || ')' AS ObjectType
42 | , t.table_name AS ObjectName
43 | , '(Exists)' AS PropertyName
44 | , ' ' AS PropertyValue
45 | FROM baseTbl t
46 | )
47 |
48 | , metaForCol_dataType
49 | AS (
50 | SELECT tut.owner AS SchemaName, tut.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
51 | , '2' AS PropertyName
52 | , COALESCE(tut.data_type, 'UNKNOWN') || '('
53 | || CASE WHEN tut.data_length IS NOT NULL THEN CAST(tut.data_length AS VARCHAR2(10)) ELSE '' END
54 | || CASE WHEN tut.data_precision IS NOT NULL THEN ',' || CAST(tut.data_precision AS VARCHAR2(10)) ELSE '' END
55 | || CASE WHEN tut.data_scale IS NOT NULL THEN ',' || CAST(tut.data_scale AS VARCHAR2(10)) ELSE '' END
56 | || ')' AS PropertyValue
57 | FROM SYS.ALL_TAB_COLUMNS tut
58 | INNER JOIN baseTbl ft ON ft.owner = tut.owner AND ft.table_name = tut.table_name
59 | )
60 |
61 | , metaForCol_nullable
62 | AS (
63 | SELECT tut.owner AS SchemaName, tut.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
64 | , '3' AS PropertyName, CASE WHEN tut.nullable = 'Y' THEN 'NULL' ELSE 'NOT NULL' END AS PropertyValue
65 | FROM SYS.ALL_TAB_COLUMNS tut
66 | INNER JOIN baseTbl ft ON ft.owner = tut.owner AND ft.table_name = tut.table_name
67 | )
68 |
69 | , metaForCol_ordpos
70 | AS (
71 | SELECT tut.owner AS SchemaName, tut.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
72 | , '1' AS PropertyName, LPAD(CAST(tut.column_id AS VARCHAR2(3)), 3, '0') AS PropertyValue
73 | FROM SYS.ALL_TAB_COLUMNS tut
74 | INNER JOIN baseTbl ft ON ft.owner = tut.owner AND ft.table_name = tut.table_name
75 | )
76 |
77 | , metaAllCols
78 | AS (
79 | SELECT schemaname, tablename, objecttype, objectname, 'Properties' AS propertyname
80 | , LISTAGG(propertyvalue, ' | ')
81 | WITHIN GROUP (ORDER BY propertyname, propertyvalue) AS propertyvalue
82 | FROM (
83 | SELECT * FROM metaForCol_dataType
84 | UNION SELECT * FROM metaForCol_nullable
85 | UNION SELECT * FROM metaForCol_ordpos
86 | ) t
87 | GROUP BY schemaname, tablename, objecttype, objectname
88 | )
89 |
90 | , metaForKeys
91 | AS (
92 | SELECT cons.owner AS SchemaName, cols.table_name AS TableName
93 | , CASE WHEN cons.constraint_type = 'P' THEN 'PKey'
94 | WHEN cons.constraint_type = 'U' THEN 'UKey'
95 | WHEN cons.constraint_type = 'R' THEN 'FKey'
96 | END AS ObjectType
97 | , cons.constraint_name AS ObjectName
98 | , 'FieldList' AS PropertyName
99 | , LISTAGG(cols.column_name, ',') WITHIN GROUP (ORDER BY cols.position) AS PropertyValue
100 | FROM all_constraints cons
101 | INNER JOIN all_cons_columns cols ON cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner
102 | WHERE cons.owner = (SELECT v_SchemaName FROM vars)
103 | AND cons.table_name IN(SELECT DISTINCT table_name FROM baseTbl)
104 | AND cons.constraint_type IN('P','R','U')
105 | GROUP BY cons.owner, cols.table_name, cons.constraint_type, cons.constraint_name
106 | )
107 |
108 | , metaForIdxs
109 | AS (
110 | SELECT tut.table_owner AS SchemaName, tut.table_name AS TableName
111 | , 'Index' AS ObjectType, tut.index_name AS ObjectName
112 | , 'FieldList' AS PropertyName
113 | , LISTAGG(tut.column_name, ',') WITHIN GROUP (ORDER BY tut.column_position) AS PropertyValue
114 | FROM ALL_IND_COLUMNS tut
115 | INNER JOIN baseTbl ft ON ft.owner = tut.index_owner AND ft.table_name = tut.table_name
116 | GROUP BY tut.table_owner, tut.table_name, tut.index_name
117 | )
118 |
119 | , allMetadata
120 | AS (
121 | SELECT * FROM metaForTbl
122 | UNION SELECT * FROM metaAllCols
123 | UNION SELECT * FROM metaForKeys
124 | UNION SELECT * FROM metaForIdxs
125 | )
126 |
127 | SELECT CASE WHEN objecttype IN('(Table)','(View)') THEN schemaname ELSE ' ' END AS schema_nm
128 | , CASE WHEN objecttype IN('(Table)','(View)') THEN tablename ELSE ' ' END AS tbl_nm
129 | , objecttype AS obj_typ, objectname AS obj_nm, /*propertyname,*/ propertyvalue AS properties
130 | FROM allMetadata
131 | ORDER BY schemaname, tablename, objecttype
132 | , CASE WHEN objecttype='Column' THEN propertyvalue ELSE ' ' END
133 | , objectname, propertyname
134 |
135 |
136 |
--------------------------------------------------------------------------------
/postgresql/data_dictionary/data_dict_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Data Dictionary Dump:
3 | -- This SQL script will dump table, column, key, and description design related
4 | -- metadata so that you can copy-paste or export to Excel as a Data Dictionary.
5 | ------------------------------------------------------------------------------------
6 | -- Platform: PostgreSQL
7 | -- Author: DataResearchLabs
8 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
9 | -- YouTube Tutorials: https://www.youtube.com/channel/UCQciXv3xaBykeUFc04GxSXA
10 | ----------------------------------------------------------------------------------
11 | WITH vars
12 | AS (
13 | SELECT
14 | 'public' AS v_SchemaName -- Set to the schema whose tables you want in the Data Dictionary
15 | , 'NO' AS v_TablesOnly -- YES=Limit To Tables only; NO=Include views too
16 | )
17 |
18 | , baseTbl
19 | AS (
20 | SELECT table_schema AS SchemaName
21 | , table_catalog
22 | , table_type, table_name, table_schema
23 | FROM INFORMATION_SCHEMA.TABLES
24 | WHERE TABLE_SCHEMA = (SELECT v_SchemaName FROM vars)
25 | AND ( (TABLE_TYPE = 'BASE TABLE')
26 | OR ((SELECT v_TablesOnly FROM vars) = 'NO')
27 | )
28 | )
29 |
30 | , metadata
31 | AS (
32 | SELECT
33 | bt.SchemaName AS schema_nm
34 | , bt.table_name AS table_nm
35 | , CASE WHEN bt.TABLE_TYPE = 'BASE TABLE' THEN 'TBL'
36 | WHEN bt.TABLE_TYPE = 'VIEW' THEN 'VW'
37 | ELSE 'UK'
38 | END AS obj_typ
39 | , tut.ordinal_position AS ord_pos
40 | , tut.column_name AS column_nm
41 | , CONCAT(COALESCE(tut.data_type, 'unknown'),
42 | CASE WHEN tut.data_type IN('varchar','char') THEN CONCAT('(', CAST(tut.CHARACTER_MAXIMUM_LENGTH AS varchar(10)), ')')
43 | WHEN tut.data_type IN('date','time') THEN CONCAT('(3)')
44 | WHEN tut.data_type = 'datetime' THEN CONCAT('(8)')
45 | WHEN tut.data_type = 'timestamp' THEN CONCAT('(4)')
46 | WHEN tut.data_type in('bigint','integer','smallint') THEN CONCAT('(', CAST(tut.NUMERIC_PRECISION AS varchar(10)), ')')
47 | WHEN tut.data_type = 'decimal' THEN CONCAT('(', CAST(tut.NUMERIC_PRECISION AS varchar(10)), ',', CAST(tut.NUMERIC_SCALE AS varchar(10)), ')')
48 | WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN CONCAT('(', CAST(tut.CHARACTER_MAXIMUM_LENGTH AS varchar(10)), ')')
49 | WHEN tut.DATETIME_PRECISION IS NOT NULL THEN CONCAT('(', CAST(tut.DATETIME_PRECISION AS varchar(10)), ')')
50 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
51 | AND tut.NUMERIC_SCALE IS NULL THEN CONCAT('(', CAST(tut.NUMERIC_PRECISION AS varchar(10)), ')')
52 | WHEN tut.NUMERIC_PRECISION IS NOT NULL
53 | AND tut.NUMERIC_SCALE IS NOT NULL THEN CONCAT('(', CAST(tut.NUMERIC_PRECISION AS varchar(10)), ',', CAST(tut.NUMERIC_SCALE AS varchar(10)), ')')
54 | ELSE ''
55 | END ) AS data_typ
56 | , CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END AS nullable
57 | FROM INFORMATION_SCHEMA.COLUMNS tut
58 | INNER JOIN baseTbl bt ON bt.table_catalog = tut.TABLE_CATALOG AND bt.table_name = tut.table_name
59 | )
60 |
61 | , meta_for_keys
62 | AS (
63 | SELECT schema_nm, table_nm, column_nm
64 | , STRING_AGG(is_key, ',' ORDER BY is_key) AS is_key
65 | FROM (
66 | SELECT cons.TABLE_SCHEMA AS schema_nm
67 | , cons.TABLE_NAME AS table_nm
68 | , kcu.COLUMN_NAME AS column_nm
69 | , CASE WHEN cons.constraint_type = 'PRIMARY KEY' THEN 'PK'
70 | WHEN cons.constraint_type = 'UNIQUE' THEN 'UK'
71 | WHEN cons.constraint_type = 'FOREIGN KEY' THEN 'FK'
72 | ELSE 'X'
73 | END AS is_key
74 | FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS cons
75 | INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
76 | ON cons.TABLE_SCHEMA = kcu.TABLE_SCHEMA
77 | AND cons.TABLE_NAME = kcu.TABLE_NAME
78 | AND cons.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
79 | WHERE cons.table_schema = (SELECT v_SchemaName FROM vars)
80 | AND cons.table_name IN(SELECT DISTINCT table_name FROM baseTbl)
81 | AND cons.constraint_type IN('PRIMARY KEY','FOREIGN KEY','UNIQUE')
82 | GROUP BY cons.TABLE_SCHEMA, cons.TABLE_NAME, kcu.COLUMN_NAME, cons.constraint_type
83 | ) t
84 | GROUP BY schema_nm, table_nm, column_nm
85 | )
86 |
87 | , col_comm
88 | AS (
89 | SELECT c.TABLE_SCHEMA AS schema_nm
90 | , c.TABLE_NAME AS table_nm
91 | , c.COLUMN_NAME AS column_nm
92 | , pgd.DESCRIPTION AS column_descr
93 | FROM pg_catalog.pg_statio_all_tables AS st
94 | INNER JOIN pg_catalog.PG_DESCRIPTION AS pgd ON pgd.objoid = st.relid
95 | INNER JOIN INFORMATION_SCHEMA.COLUMNS AS c ON pgd.objsubid = c.ordinal_position
96 | AND c.table_schema = st.schemaname
97 | AND c.table_name = st.relname
98 | WHERE c.table_schema = (SELECT v_SchemaName FROM vars)
99 | AND c.table_name IN(SELECT DISTINCT table_name FROM baseTbl)
100 | )
101 |
102 | SELECT md.SCHEMA_NM, md.TABLE_NM, md.OBJ_TYP
103 | , md.ORD_POS AS ord
104 | , COALESCE(pk.is_key, ' ') AS is_key
105 | , md.COLUMN_NM, md.DATA_TYP, md.NULLABLE, c.column_descr
106 | FROM metadata md
107 | LEFT JOIN meta_for_keys pk ON pk.SCHEMA_NM = md.SCHEMA_NM AND pk.TABLE_NM = md.TABLE_NM AND pk.COLUMN_NM = md.COLUMN_NM
108 | LEFT JOIN col_comm c ON c.SCHEMA_NM = md.SCHEMA_NM AND c.TABLE_NM = md.TABLE_NM AND c.COLUMN_NM = md.COLUMN_NM
109 | ORDER BY md.SCHEMA_NM, md.TABLE_NM, md.ORD_POS
110 |
--------------------------------------------------------------------------------
/postgresql/data_validation/demo_data/demo_hr_01_create_database.sql:
--------------------------------------------------------------------------------
1 | -- SCHEMA: demo_hr
2 |
3 | -- DROP SCHEMA demo_hr;
4 |
5 | CREATE SCHEMA demo_hr
6 | AUTHORIZATION postgres;
7 |
8 | COMMENT ON SCHEMA demo_hr
9 | IS 'demo_hr schema for testing out data validation scripts';
10 |
11 | GRANT ALL ON SCHEMA demo_hr TO PUBLIC;
12 |
13 | GRANT ALL ON SCHEMA demo_hr TO postgres;
14 |
--------------------------------------------------------------------------------
/postgresql/data_validation/demo_data/demo_hr_02_create_tables.sql:
--------------------------------------------------------------------------------
1 |
2 | -- demo_hr.jobs definition
3 | CREATE TABLE demo_hr.jobs
4 | (job_id VARCHAR(10),
5 | job_title VARCHAR(35) CONSTRAINT job_title_nn NOT NULL,
6 | min_salary INTEGER,
7 | max_salary INTEGER,
8 | CONSTRAINT job_id_pk PRIMARY KEY (job_id)
9 | );
10 |
11 |
12 |
13 | -- demo_hr.jobs_snapshot definition
14 | CREATE TABLE demo_hr.jobs_snapshot
15 | (job_id VARCHAR(10),
16 | job_title VARCHAR(35) NOT NULL,
17 | min_salary INTEGER,
18 | max_salary INTEGER
19 | );
20 |
21 |
22 |
23 | -- demo_hr.regions definition
24 | CREATE TABLE demo_hr.regions
25 | (region_id INTEGER CONSTRAINT region_id_nn NOT NULL,
26 | region_name VARCHAR(25),
27 | CONSTRAINT reg_id_pk PRIMARY KEY (region_id)
28 | );
29 |
30 |
31 |
32 | -- demo_hr.countries definition
33 | CREATE TABLE demo_hr.countries
34 | (country_id CHAR(2) CONSTRAINT country_id_nn NOT NULL,
35 | country_name VARCHAR(40),
36 | region_id INTEGER,
37 | date_last_updated DATE DEFAULT CURRENT_DATE,
38 | CONSTRAINT country_c_id_pk PRIMARY KEY (country_id),
39 | CONSTRAINT countr_reg_fk FOREIGN KEY (region_id) REFERENCES demo_hr.regions (region_id)
40 | );
41 |
42 |
43 |
44 |
45 | -- demo_hr.locations definition
46 | CREATE TABLE demo_hr.locations
47 | (location_id INTEGER,
48 | street_address VARCHAR(40),
49 | postal_code VARCHAR(12),
50 | city VARCHAR(30) CONSTRAINT loc_city_nn NOT NULL,
51 | state_province VARCHAR(25),
52 | country_id CHAR(2),
53 | CONSTRAINT loc_id_pk PRIMARY KEY (location_id),
54 | CONSTRAINT loc_c_id_fk FOREIGN KEY (country_id) REFERENCES demo_hr.countries (country_id)
55 | );
56 |
57 | CREATE INDEX loc_city_ix ON demo_hr.locations (city);
58 | CREATE INDEX loc_state_province_ix ON demo_hr.locations (state_province);
59 | CREATE INDEX loc_country_ix ON demo_hr.locations (country_id);
60 |
61 |
62 |
63 | -- demo_hr.departments definition
64 | CREATE TABLE demo_hr.departments
65 | (department_id INTEGER,
66 | department_name VARCHAR(30) CONSTRAINT dept_name_nn NOT NULL,
67 | manager_id INTEGER,
68 | location_id INTEGER,
69 | url VARCHAR(255),
70 | CONSTRAINT dept_id_pk PRIMARY KEY (department_id),
71 | CONSTRAINT dept_loc_fk FOREIGN KEY (location_id) REFERENCES demo_hr.locations (location_id)
72 | ) ;
73 |
74 | CREATE INDEX dept_location_ix ON demo_hr.departments (location_id);
75 |
76 |
77 |
78 | -- demo_hr.job_history definition
79 | CREATE TABLE demo_hr.job_history
80 | (employee_id INTEGER CONSTRAINT jhist_employee_nn NOT NULL,
81 | start_date DATE CONSTRAINT jhist_start_date_nn NOT NULL,
82 | end_date DATE CONSTRAINT jhist_end_date_nn NOT NULL,
83 | job_id VARCHAR(10) CONSTRAINT jhist_job_nn NOT NULL,
84 | department_id INTEGER,
85 | CONSTRAINT jhist_date_interval CHECK (end_date > start_date),
86 | CONSTRAINT jhist_emp_id_st_date_pk PRIMARY KEY (employee_id, start_date),
87 | CONSTRAINT jhist_job_fk FOREIGN KEY (job_id) REFERENCES demo_hr.jobs (job_id),
88 | CONSTRAINT jhist_dept_fk FOREIGN KEY (department_id) REFERENCES demo_hr.departments (department_id)
89 | ) ;
90 |
91 | CREATE INDEX jhist_job_ix ON demo_hr.job_history (job_id);
92 | CREATE INDEX jhist_employee_ix ON demo_hr.job_history (employee_id);
93 | CREATE INDEX jhist_department_ix ON demo_hr.job_history (department_id);
94 |
95 |
96 |
97 |
98 | -- demo_hr.employees definition
99 | CREATE TABLE demo_hr.employees
100 | (employee_id INTEGER,
101 | first_name VARCHAR(20),
102 | last_name VARCHAR(25) CONSTRAINT emp_last_name_nn NOT NULL,
103 | email VARCHAR(25) CONSTRAINT emp_email_nn NOT NULL,
104 | phone_number VARCHAR(20),
105 | hire_date DATE CONSTRAINT emp_hire_date_nn NOT NULL,
106 | job_id VARCHAR(10) CONSTRAINT emp_job_nn NOT NULL,
107 | salary INTEGER,
108 | commission_pct NUMERIC(2,2),
109 | manager_id INTEGER,
110 | department_id INTEGER,
111 | some_date_fmt1 VARCHAR(50),
112 | some_date_fmt2 VARCHAR(50),
113 | some_date_fmt3 VARCHAR(50),
114 | some_date_fmt4 VARCHAR(50),
115 | fake_ssn VARCHAR(11),
116 | zip5 VARCHAR(5),
117 | zip5or9 VARCHAR(10),
118 | zip9 VARCHAR(10),
119 | email_address VARCHAR(100),
120 | CONSTRAINT emp_salary_min CHECK (salary > 0),
121 | CONSTRAINT emp_email_uk UNIQUE (email),
122 | CONSTRAINT emp_emp_id_pk PRIMARY KEY (employee_id),
123 | CONSTRAINT emp_dept_fk FOREIGN KEY (department_id) REFERENCES demo_hr.departments (department_id),
124 | CONSTRAINT emp_job_fk FOREIGN KEY (job_id) REFERENCES demo_hr.jobs (job_id),
125 | CONSTRAINT emp_manager_fk FOREIGN KEY (manager_id) REFERENCES demo_hr.employees (employee_id)
126 | ) ;
127 |
128 | CREATE INDEX emp_department_ix ON demo_hr.employees (department_id);
129 | CREATE INDEX emp_job_ix ON demo_hr.employees (job_id);
130 | CREATE INDEX emp_manager_ix ON demo_hr.employees (manager_id);
131 | CREATE INDEX emp_name_ix ON demo_hr.employees (last_name, first_name);
132 |
133 |
134 |
135 |
136 |
137 |
138 |
--------------------------------------------------------------------------------
/postgresql/data_validation/img/01_data_val_postgresql_example_test_case_sql_code.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/postgresql/data_validation/img/01_data_val_postgresql_example_test_case_sql_code.png
--------------------------------------------------------------------------------
/postgresql/data_validation/img/03_data_val_postgresql_adv_test_case_ex2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/postgresql/data_validation/img/03_data_val_postgresql_adv_test_case_ex2.png
--------------------------------------------------------------------------------
/postgresql/data_validation/img/04_data_val_postgresql_run_results_adv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/3eaa17caf248d3b58763534d62ef4d9ac5729a46/postgresql/data_validation/img/04_data_val_postgresql_run_results_adv.png
--------------------------------------------------------------------------------
/postgresql/data_validation/rule_set_01__row_counts.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Postgresql
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #1 - Row Counts
8 |
9 | ## Table of Contents
10 | - T001 - "Full" Row Count
11 | - T002 - "Partial" Row Count
12 | - T003 - "Relative" Row Count
13 | - T004 - "Recent" Row Count
14 |
15 |
16 |
17 |
18 | ### T001 - "Full" Row Count
19 | Verify full row count for a table or view. For example, to verify that table countries has exactly 25 rows:
20 | ```sql
21 | SELECT CASE WHEN COUNT(*) <> 25 THEN 'FAIL' ELSE 'P' END AS status
22 | FROM demo_hr.countries;
23 | ```
24 |
25 |
26 |
27 |
28 | ### T002 - "Partial" Row Count
29 | Verify partial row count for a subset of a table or view. For example, to verify that table countries has exactly 8 rows where region_id = 1 (Europe):
30 | ```sql
31 | SELECT CASE WHEN COUNT(*) <> 8 THEN 'FAIL' ELSE 'P' END AS status
32 | FROM demo_hr.countries
33 | WHERE region_id = 1;
34 | ```
35 |
36 |
37 |
38 |
39 | ### T003 - "Relative" Row Count
40 | Verify relative row counts between tables or views. The example below verifies that table countries has at least 5 times the number of rows as table regions. The inner query returns one row with two columns: (a) the count of countries, and (b) the count of regions. The outer query then applies the business logic of comparing the counts and returning a Pass or Fail.
41 | ```sql
42 | SELECT CASE WHEN countries_count < 5 * regions_count THEN 'FAIL' ELSE 'P' END AS status
43 | FROM (
44 | SELECT (SELECT COUNT(*) AS row_count FROM demo_hr.countries) AS countries_count
45 | , (SELECT COUNT(*) AS row_count FROM demo_hr.regions) AS regions_count
46 | ) t;
47 | ```
48 |
49 |
50 |
51 |
52 | ### T004 - "Recent" Row Count
53 | Verify recent row counts. The example below verifies that table countries has had at least 5 rows updated in the past 10 days, based on the date stamp in field date_last_updated. The inner query takes the current date (NOW() cast to DATE in PostgreSQL) and subtracts 10 days, then counts how many rows in table countries have a date_last_updated within that window. The outer query confirms that at least 5 rows are recent for a Pass; if not, it returns a Fail.
54 | ```sql
55 | SELECT CASE WHEN row_count < 5 THEN 'FAIL' ELSE 'P' END AS status
56 | FROM (
57 | SELECT COUNT(*) AS row_count
58 | FROM demo_hr.countries
59 | WHERE date_last_updated >= NOW()::DATE - 10
60 | ) t;
61 | ```
62 |
63 |
64 |
--------------------------------------------------------------------------------
/postgresql/data_validation/rule_set_02__keys.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Postgresql
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #2 - Keys (Foreign & Unique)
8 |
9 | ## Table of Contents
10 | - T005 - Unique (Native) Key Has No Duplicates
11 | - T006 - Foreign Key Children Have Orphans
12 | - T007 - Foreign Key Parent is Childless
13 |
14 |
15 |
16 |
17 | ### T005 - Unique Key Has No Duplicates
18 | Sure, good database design implies that unique keys be enforced by a constraint so that you do not need to test for them. However, there are times when a decision is made to **not** add a constraint to enforce the unique key (e.g., the table is replicated from a source that already has the constraint, so it is skipped for performance). This does happen! At work just last week, two of my unique key regression tests started failing -- without these checks in place as a wide net, the downstream defects would have been around a lot longer.
19 | In the example below, the inner query does a group by on the unique key fields, then using a HAVING clause filters down to those key-values with a count of more than 1 -- the dups. The outer query returns a fail if any rows come back with dups (match_count >= 2), or a pass if no dups found.
20 | ```sql
21 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
22 | FROM (
23 | SELECT country_name -- UKey fields separated by comma
24 | , COUNT(*) AS match_count
25 | FROM demo_hr.countries -- UKey fields separated by comma
26 | GROUP BY country_name
27 | HAVING COUNT(*) > 1
28 | ) t;
29 | ```
30 |
31 |
32 |
33 |
34 | ### T006 - Foreign Key Children Have Orphans
35 | Sure, as with T005 UKeys above, good database design implies that foreign keys be enforced by a constraint so that you do not need to test for it. However, there are times where for whatever reason the constraints do not exist. In those instances, you will want to periodically run a data validation test to ensure that this core assumption is not being violated (of course adding a foreign key constraint would be best, but if that is not an option then periodically check).
36 | In the example below, the inner query pulls from the child table countries as the anchor, then left joins out to the parent table regions on the key field region_id. If region_id does not exist in the parent table (p.region_id IS NULL), then the child region_id is an orphan. The outer query checks the count() of orphaned child rows: if it is >= 1 then the test fails, but if the count() = 0 then it passes.
37 | ```sql
38 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
39 | FROM (
40 | SELECT DISTINCT c.region_id AS child_id, p.region_id AS parent_id
41 | FROM demo_hr.countries c
42 | LEFT JOIN demo_hr.regions p ON p.region_id = c.region_id
43 | WHERE p.region_id IS NULL
44 | ) t;
45 | ```
46 |
47 |
48 |
49 |
50 | ### T007 - Foreign Key Parent Is Childless
51 | I'm not sure this particular test is all that useful because often it is okay for the parent-side of a foreign key relationship to not have children. But, if for some reason you need to be sure there is data present on both sides (parent **and** child), then this test is for you. You will notice in the example below that the query is very similar to T006 above, but the parent and child tables have switched positions in the FROM and LEFT JOIN lines. This is because we want to first pull all parent rows, then left join to find missing (FKey field IS NULL) child rows.
52 | ```sql
53 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
54 | FROM (
55 | SELECT DISTINCT c.country_id AS child_id, p.country_id AS parent_id
56 | FROM demo_hr.countries p
57 | LEFT JOIN demo_hr.locations c ON p.country_id = c.country_id
58 | WHERE c.country_id IS NULL
59 | AND p.country_id IN('IT','JP','US','CA','CN','IN','AU','SG','UK','DE','CH','NL','MX')
60 | ) t;
61 | ```
62 |
--------------------------------------------------------------------------------
/postgresql/data_validation/rule_set_03__heuristic_thresholds.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Postgresql
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 | # Rule Set #3 - Heuristic Thresholds (Column Null Rates & Value Rates)
7 |
8 |
9 | ## Table of Contents
10 | - T008 - Null Rate Thresholds
11 | - T009 - Value Frequency Thresholds
12 |
13 | Note: Although the terms "rate" and "frequency" are mostly synonymous, nulls are usually described with "rates" and values with "frequencies". I don't know why, but I am following the convention I've heard over and over.
14 |
15 |
16 |
17 |
18 | ### T008 - Null Rate Thresholds
19 | There is a lot going on in this "Null Rate Threshold" data validation query below. The basic goal is to validate that a given table's columns do not have too many or too few NULLs.
20 | In the example below...
22 | ...we are checking columns department_name, manager_id, and url in the table departments. We want to ensure that column department_name has no NULLs, and that column manager_id is NULL less than 65% of the time and column url is NULL less than 80% of the time.
23 |
24 | This check is like the proverbial miner's canary in that it is a trip wire triggered when something goes awry in a data feed. I've used this test scenario to great effect when coupled with a create-date or last-updated-date to monitor the past week's data loads for any unexpected upticks in null rates.
25 |
26 | There is a downside to this test scenario too however; and that is when it fires false alarms and you find yourself tinkering with the thresholds values (0.0000, 0.65000, and 0.80000 cutoffs below), raising and lowering them over and over. If this happens, chances are test fails are not actionable nor important and you should not waste your time applying this test scenario to that given table and field. Be careful to only pick fields that truly matter.
27 |
28 | Below, there is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any null rate rejections were found, fail the case). Inside the dtls CTE, an inner query at the bottom (at the FROM clause) does a single table scan to calculate a null rate per column, counting the nulls in each column and dividing by the total table row count. The SELECT CASE logic at the top applies the business logic, comparing the actual calculated null rates (nr_dept_nm, nr_mgr_id, and nr_url) against the expected threshold rates (hard-coded as 0.0000, 0.6500, and 0.8000). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the null rate check, what the actual null rate was, and what the expected null rate threshold was. If no rejections are triggered, then status returns a "P" for pass.
29 |
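A minimal sketch of the query shape described above, assuming the column names (department_name, manager_id, url) and thresholds given; the exact rejection message text is illustrative, not the repo's verbatim script:

```sql
WITH dtls
AS (
  SELECT CASE WHEN nr_dept_nm > 0.0000 THEN 'REJ-01: Null rate too high at department_name|exp=0.0000|act=' || CAST(nr_dept_nm AS VARCHAR(8))
              WHEN nr_mgr_id  > 0.6500 THEN 'REJ-02: Null rate too high at manager_id|exp<=0.6500|act='     || CAST(nr_mgr_id AS VARCHAR(8))
              WHEN nr_url     > 0.8000 THEN 'REJ-03: Null rate too high at url|exp<=0.8000|act='            || CAST(nr_url AS VARCHAR(8))
              ELSE 'P'
         END AS status
  FROM (
    -- one table scan: null count per column / total row count = null rate
    SELECT SUM(CASE WHEN department_name IS NULL THEN 1 ELSE 0 END)::NUMERIC(10,5) / COUNT(*) AS nr_dept_nm
         , SUM(CASE WHEN manager_id      IS NULL THEN 1 ELSE 0 END)::NUMERIC(10,5) / COUNT(*) AS nr_mgr_id
         , SUM(CASE WHEN url             IS NULL THEN 1 ELSE 0 END)::NUMERIC(10,5) / COUNT(*) AS nr_url
    FROM demo_hr.departments
  ) t
)
SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM dtls
WHERE status <> 'P';
```

Run just the inner query by itself to see the actual null rates while troubleshooting.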
51 |
52 |
53 |
54 | ### T009 - Value Frequency Thresholds
55 | "Value Frequency Threshold" tests are fairly similar to null rates above (T008). The difference is that we are checking the frequency (or rate) at which a column's values occur.
56 | In the example below...
58 | ...we are checking the frequencies with which the values 1, 2, 3, and 4 occur in field region_id of table countries. There is an upper CTE (common table expression) named "dtls" at the WITH clause, and a lower wrapper that applies the business logic (if any value frequency rejections were found, fail the case). Inside the dtls CTE, an inner query at the bottom (at the FROM clause) does a single table scan to calculate the frequency of each value in the GROUP BY for the column. The GROUP BY value count (field "freq") is divided by the total table row count (field "den") to calculate field "freq_rt". The SELECT CASE logic at the top applies the business logic, comparing the actual value frequencies (freq_rt when region_id = 1, or = 2, etc.) against the expected threshold frequencies (hard-coded as 0.28 to 0.36, 0.16 to 0.24, and so on). The returned value is a rejection code (REJ-01, REJ-02, etc.) clearly indicating which field failed the value frequency check, what the actual value frequency was, and what the expected value frequency threshold ranges were. If no rejections are triggered, then status returns a "P" for pass.
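A minimal sketch of that query shape, using the freq/den/freq_rt field names described above; only the region_id = 1 and = 2 thresholds come from the text, so treat the others (and the rejection message wording) as illustrative placeholders:

```sql
WITH dtls
AS (
  SELECT region_id, freq_rt
       , CASE WHEN region_id = 1 AND freq_rt NOT BETWEEN 0.28 AND 0.36 THEN 'REJ-01: region_id=1 frequency outside threshold|exp=0.28 to 0.36|act=' || CAST(freq_rt AS VARCHAR(8))
              WHEN region_id = 2 AND freq_rt NOT BETWEEN 0.16 AND 0.24 THEN 'REJ-02: region_id=2 frequency outside threshold|exp=0.16 to 0.24|act=' || CAST(freq_rt AS VARCHAR(8))
              ELSE 'P'
         END AS status
  FROM (
    -- one table scan: per-value count ("freq") / total row count ("den")
    SELECT region_id
         , COUNT(*)                                         AS freq
         , SUM(COUNT(*)) OVER ()                            AS den
         , COUNT(*)::NUMERIC(10,5) / SUM(COUNT(*)) OVER ()  AS freq_rt
    FROM demo_hr.countries
    GROUP BY region_id
  ) t
)
SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM dtls
WHERE status <> 'P';
```

The window function SUM(COUNT(*)) OVER () yields the total row count without a second scan of the table.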
59 |
--------------------------------------------------------------------------------
/postgresql/data_validation/rule_set_04__numeric_values.md:
--------------------------------------------------------------------------------
16 |
17 |
18 |
19 | ### T010 - Not Null
20 | Verify numeric field is not null. For example, to verify that table countries has no NULLs in field region_id:
21 | ```sql
22 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
23 | FROM demo_hr.countries
24 | WHERE region_id IS NULL;
25 | ```
26 |
27 |
28 |
29 |
30 | ### T011 - Not Negative
31 | Verify numeric field is not negative. For example, to verify that table countries has no field region_id negative values:
32 | ```sql
33 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
34 | FROM demo_hr.countries
35 | WHERE region_id < 0;
36 | ```
37 |
38 |
39 |
40 |
41 | ### T012 - Numeric Range
42 | Verify numeric field value is within a range. In the example below, we verify that field employee_id is between 100 and 999 in table employees. Note that you can run the inner query yourself to return the actual rejection code (is too low or too high) along with the actual value and the expected value...all nicely packaged for easy troubleshooting.
43 | ```sql
44 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
45 | FROM (
46 | SELECT employee_id
47 | , CASE WHEN employee_id < 100 THEN 'REJ-01: Verify employee_id > 99|exp>99|act=' || CAST(employee_id AS VARCHAR(10))
48 | WHEN employee_id > 999 THEN 'REJ-02: Verify employee_id < 1000|exp<1000|act=' || CAST(employee_id AS VARCHAR(10))
49 | ELSE 'P'
50 | END AS status
51 | FROM demo_hr.employees
52 | ) t
53 | WHERE status <> 'P';
54 | ```
55 |
56 |
57 |
58 |
59 | ### T013 - In Value List
60 | Verify numeric field is **in** the list of values. For example, to verify that table countries field region_id is always values 1, 2, 3, or 4 we use the IN() clause as shown below:
61 | ```sql
62 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
63 | FROM (
64 | SELECT region_id
65 | , CASE WHEN region_id NOT IN(1,2,3,4) THEN 'FAIL' ELSE 'P' END AS status
66 | FROM demo_hr.countries
67 | ) t
68 | WHERE status <> 'P';
69 | ```
70 |
71 |
72 |
73 |
74 | ### T014 - Not In Value List
75 | Verify numeric field is **not** in the list of values. For example, to verify that table countries field region_id is never in values 97, 98, or 99 we use the NOT IN() clauses as shown below:
76 | ```sql
77 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
78 | FROM (
79 | SELECT region_id
80 | , CASE WHEN region_id IN(97,98,99) THEN 'FAIL' ELSE 'P' END AS status
81 | FROM demo_hr.countries
82 | ) t
83 | WHERE status <> 'P';
84 | ```
85 |
86 |
87 |
88 |
89 | ### T015 - Multi Field Compare
90 | Verify numeric field values in relation to one another. For example, to verify that salary times commission_pct is always less than $10,000 in table employees:
91 | ```sql
92 | SELECT CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'P' END AS status
93 | FROM (
94 | SELECT salary, commission_pct
95 | , CASE WHEN salary * commission_pct > 10000 THEN 'FAIL' ELSE 'P' END AS status
96 | FROM demo_hr.employees
97 | ) t
98 | WHERE status <> 'P';
99 | ```
100 |
101 |
--------------------------------------------------------------------------------
/postgresql/data_validation/rule_set_05__date_values.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Postgresql
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #5 - Date Values
8 |
9 | ## Table of Contents
10 | - T016 - Not Null
11 | - T017 - Date Range
12 | - T018 - No Time Part
13 | - T019 - Has Time Part
14 | - T020 - Multi Field Compare
15 | - Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
16 |
17 |
18 |
19 |
20 | ### T016 - Not Null
21 | Verify date field is not null. For example, to verify that table countries has no NULLs in field date_last_updated:
22 | ```sql
23 | SELECT date_last_updated
24 | , CASE WHEN date_last_updated IS NULL THEN 'FAIL' ELSE 'P' END AS status
25 | FROM demo_hr.countries;
26 | ```
27 |
28 |
29 |
30 |
31 | ### T017 - Date Range
32 | Verify date field is within specified range. For example, you can run the sql below to verify that table countries field date_last_updated is between 1/1/2021 and today. Note the use of CLOCK_TIMESTAMP() to represent the current date and time dynamically in PostgreSQL. Notice the inner query uses a CASE...WHEN...ELSE structure to identify two rejection codes: (1) date is too high, and (2) date is too low. Expected and actual values are displayed in the output if you run the inner query only. The outer query is a wrapper to determine whether the test passed or failed.
33 | ```sql
34 | SELECT date_last_updated
35 | , CASE WHEN date_last_updated > CLOCK_TIMESTAMP() THEN 'REJ-01: Field date_last_updated cannot be in the future|exp<=' || CAST(CLOCK_TIMESTAMP() AS VARCHAR(20)) || '|act=' || CAST(date_last_updated AS VARCHAR(20))
36 | WHEN date_last_updated < TO_DATE('01/01/2021', 'mm/dd/yyyy') THEN 'REJ-02: Field date_last_updated cannot be too old|exp>=1/1/2021|act=' || CAST(date_last_updated AS VARCHAR(20))
37 | ELSE 'P'
38 | END AS status
39 | FROM demo_hr.countries;
40 | ```
41 |
42 |
43 |
44 |
45 | ### T018 - No Time Part
46 | Verify date field is a date only, no time part present. For example, to verify that table employees has no time part in field hire_date (time part must be "12:00:00"):
47 | ```sql
48 | SELECT hire_date
49 | , CASE WHEN TO_CHAR(hire_date, 'hh:mi:ss') <> '12:00:00' THEN 'FAIL' ELSE 'P' END AS status
50 | FROM demo_hr.employees;
51 | ```
52 |
53 |
54 |
55 |
56 | ### T019 - Has Time Part
57 | Verify date field is a date **and** time. For example, to verify that table countries has a time part in field date_last_updated (time part cannot be "12:00:00"):
58 | ```sql
59 | SELECT date_last_updated
60 | , CASE WHEN TO_CHAR(date_last_updated, 'hh:mi:ss') = '12:00:00' THEN 'FAIL' ELSE 'P' END AS status
61 | FROM demo_hr.countries;
62 | ```
63 |
64 |
65 |
66 |
67 | ### T020 - Multi Field Compare
68 | Verify multiple date fields relative to each other. For example, to verify that field start_date must be < field end_date in table job_history (thus if start_date is >= end_date the test case fails):
69 | ```sql
70 | SELECT start_date, end_date
71 | , CASE WHEN start_date >= end_date THEN 'FAIL' ELSE 'P' END AS status
72 | FROM demo_hr.job_history;
73 | ```
74 |
75 |
76 |
77 |
78 | ### Bonus Tip - Joining Tables with 2 Pairs of Start-End Date Overlaps
79 | So this is a fantastic tip I learned from a co-worker in healthcare back in 2011 (was it Jennifer C.? or Matt G.? or Jonathon P.? I can't remember).
80 |
81 |
82 | The **problem** is that you are trying to join two tables with logic where the table1.start_dt/end_dt's overlap with the table2.start_dt/end_dt.
83 |
84 |
85 | The **solution** is to **join on `table1.start_dt <= table2.end_dt AND table1.end_dt >= table2.start_dt`**.
86 |
87 | Here is why:
88 |
89 | ```
90 | Scenario #1 = "Discard - No Overlap" Table #1 date range is completely before Table #2's (Dt#1.End is NOT > Dt#2.Start)
91 | Table#1: |Start ----------- End|
92 | Table#2: |Start ----------- End|
93 |
94 |
95 | Scenario #2 = "Include - Table #1 End = Table#2 Start" Table #1 date range ends exactly where Table #2's starts (Tbl#1.End = Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
96 | Table#1: |Start ----------- End|
97 | Table#2: |Start ----------- End|
98 |
99 |
100 | Scenario #3 = "Include - Table #1 Ends Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
101 | Table#1: |Start ----------- End|
102 | Table#2: |Start ----------- End|
103 |
104 |
105 | Scenario #4 = "Include - Table #1 Starts Midway Table #2 Span" Table #1 date range nicely overlaps Table #2's (Tbl#1.End > Tbl#2.Start AND Tbl#1.Start < Tbl#2.End)
106 | Table#1: |Start ----------- End|
107 | Table#2: |Start ----------- End|
108 |
109 |
110 | Scenario #5 = "Include - Table #1 Starts at Table #2 End" Table #1 date range starts exactly at Table #2's End (Tbl#1.End > Tbl#2.St AND Tbl#1.St = Tbl#2.End)
111 | Table#1: |Start ----------- End|
112 | Table#2: |Start ----------- End|
113 |
114 |
115 | Scenario #6 = "Discard - No Overlap" Date #1 is entirely after Date #2 (Tbl#1.Start is > Tbl#2.End)
116 | Table#1: |Start ----------- End|
117 | Table#2: |Start ----------- End|
118 |
119 | ```
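As a concrete sketch of the join condition above, here is a self-join on demo_hr.job_history (a hypothetical use of the demo table; any two tables carrying start/end dates work the same way) that returns pairs of job stints whose date ranges overlap:

```sql
SELECT t1.employee_id  AS emp1
     , t2.employee_id  AS emp2
     , t1.start_date   AS start1, t1.end_date AS end1
     , t2.start_date   AS start2, t2.end_date AS end2
FROM demo_hr.job_history t1
INNER JOIN demo_hr.job_history t2
        ON t1.start_date <= t2.end_date    -- Table #1 starts before (or at) Table #2's end
       AND t1.end_date   >= t2.start_date  -- Table #1 ends after (or at) Table #2's start
WHERE t1.employee_id < t2.employee_id;     -- skip self-matches and mirrored duplicate pairs
```

The two join predicates together discard exactly Scenarios #1 and #6 (no overlap) and keep Scenarios #2 through #5.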
120 |
121 |
122 |
123 |
--------------------------------------------------------------------------------
/postgresql/data_validation/rule_set_07__regular_expressions.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Postgresql
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 | # Rule Set #7 - Regular Expressions
7 |
8 | ## Table of Contents
9 | - T045 - RegExp("IsPhoneNumber")
10 | - T046 - RegExp("IsSSN")
11 | - T047 - RegExp("IsZip5")
12 | - T048 - RegExp("IsZip5or9")
13 | - T049 - RegExp("IsZip9")
14 | - T050 - RegExp("OnlyText")
15 | - T051 - RegExp("OnlyNumeric")
16 | - T052 - RegExp("NoLeadTrailSpaces")
17 | - T053 - RegExp("NoWhitespaces")
18 | - T054 - RegExp("OnlyLowerCase")
19 | - T055 - RegExp("OnlyUpperCase")
20 | - T056 - RegExp("TitleCase")
21 | - T057 - RegExp("EmailAddress")
22 | - T058 - RegExp("IsUrl")
23 |
24 |
25 |
26 |
27 | ### T045 - RegExp("IsPhoneNumber")
28 | Verify text field is in a phone number format. For example, to verify that the field phone_number of table employees matches a simple ###-###-#### pattern (with dash, dot, or space separators):
29 | ```sql
30 | SELECT phone_number
31 | , CASE WHEN NOT phone_number ~ '[0-9]{3}[-. ][0-9]{3}[-. ][0-9]{4}' THEN 'FAIL' ELSE 'P' END AS status
32 | FROM demo_hr.employees;
33 | ```
34 |
35 |
36 |
37 |
38 | ### T046 - RegExp("IsSSN")
39 | Verify text field is a valid social security number (SSN) format. For example, to verify that field fake_ssn of table employees is a valid SSN format:
40 | ```sql
41 | SELECT fake_ssn
42 | , CASE WHEN NOT fake_ssn ~ '^[0-9]{3}-[0-9]{2}-[0-9]{4}$' THEN 'FAIL' ELSE 'P' END AS status
43 | FROM demo_hr.employees;
44 | ```
45 |
46 |
47 |
48 |
49 | ### T047 - RegExp("IsZip5")
50 | Verify text field is a valid zipcode 5-digit format. For example, to verify that field zip5 of table employees is a valid format:
51 | ```sql
52 | SELECT zip5
53 | , CASE WHEN NOT zip5 ~ '^[0-9]{5}$' THEN 'FAIL' ELSE 'P' END AS status
54 | FROM demo_hr.employees;
55 | ```
56 |
57 |
58 |
59 |
60 | ### T048 - RegExp("IsZip5or9")
61 | Verify text field is a valid zipcode 5- or 9-digit format. For example, to verify that field zip5or9 of table employees is a valid format:
62 | ```sql
63 | SELECT zip5or9
64 | , CASE WHEN NOT zip5or9 ~ '^[[:digit:]]{5}(-[[:digit:]]{4})?$' THEN 'FAIL' ELSE 'P' END AS status
65 | FROM demo_hr.employees;
66 | ```
67 |
68 |
69 |
70 |
71 | ### T049 - RegExp("IsZip9")
72 | Verify text field is a valid zipcode 9-digit format. For example, to verify that field zip9 of table employees is a valid format:
73 | ```sql
74 | SELECT zip9
75 | , CASE WHEN NOT zip9 ~ '^[[:digit:]]{5}[-/.][[:digit:]]{4}$' THEN 'FAIL' ELSE 'P' END AS status
76 | FROM demo_hr.employees;
77 | ```
78 |
79 |
80 |
81 |
82 | ### T050 - RegExp("OnlyText")
83 | Verify text field contains only alphabetic characters (and spaces). For example, to verify that field last_name of table employees is text only:
84 | ```sql
85 | SELECT last_name
86 | , CASE WHEN NOT last_name ~ '^[a-zA-Z ]+$' THEN 'FAIL' ELSE 'P' END AS status
87 | FROM demo_hr.employees;
88 | ```
89 |
90 |
91 |
92 |
93 | ### T051 - RegExp("OnlyNumeric")
94 | Verify text field contains numeric characters only. For example, to verify that field zip5 of table employees is numeric digits only:
95 | ```sql
96 | SELECT zip5
97 | , CASE WHEN NOT zip5 ~ '^[0-9]+$' THEN 'FAIL' ELSE 'P' END AS status
98 | FROM demo_hr.employees;
99 | ```
100 |
101 |
102 |
103 |
104 | ### T052 - RegExp("NoLeadTrailSpaces")
105 | Verify text field has no leading or trailing spaces. For example, to verify that field last_name of table employees is fully trimmed:
106 | ```sql
107 | SELECT last_name
108 | , CASE WHEN last_name ~ '(^\s)|(\s$)' THEN 'FAIL' ELSE 'P' END AS status
109 | FROM demo_hr.employees;
110 | ```
111 |
112 |
113 |
114 |
115 | ### T053 - RegExp("NoWhitespaces")
116 | Verify text field has no whitespace (spaces, non-breaking spaces, carriage returns, line feeds, etc.). For example, to verify that field job_id of table employees has no whitespace:
117 | ```sql
118 | SELECT job_id
119 | , CASE WHEN job_id ~ '(\s)+' THEN 'FAIL' ELSE 'P' END AS status
120 | FROM demo_hr.employees;
121 | ```
122 |
123 |
124 |
125 |
126 | ### T054 - RegExp("OnlyLowerCase")
127 | Verify text field has only lower case characters. For example, (not really practical, but as a demo) to verify that the 3rd and 4th characters in the field first_name of table employees are lower case:
128 | ```sql
129 | SELECT first_name
130 | , CASE WHEN NOT SUBSTRING(first_name,3,2) ~ '^[a-z]+$' THEN 'FAIL' ELSE 'P' END AS status
131 | FROM demo_hr.employees;
132 | ```
133 |
134 |
135 |
136 |
137 | ### T055 - RegExp("OnlyUpperCase")
138 | Verify text field has only upper case characters. For example, (not really practical, but as a demo) to verify that the 3rd and 4th characters in the field email of table employees are upper case:
139 | ```sql
140 | SELECT email
141 | , CASE WHEN NOT SUBSTRING(email,3,2) ~ '^[A-Z]+$' THEN 'FAIL' ELSE 'P' END AS status
142 | FROM demo_hr.employees;
143 | ```
144 |
145 |
146 |
147 |
148 | ### T056 - RegExp("TitleCase")
149 | Verify text field is title case format (where the first letter of every word is upper case, and the rest are lower case). For example, to verify that the field first_name of table employees has proper casing:
150 | ```sql
151 | SELECT first_name, SUBSTRING(first_name,1,1) AS first_letter
152 | , CASE WHEN NOT SUBSTRING(first_name,1,1) ~ '([A-Z])' THEN 'REJ-01: Field first_name first character not upper case|exp=Like"[A-Z]"|act=' || first_name
153 | WHEN first_name NOT LIKE '% %' THEN 'P' -- Only one word, so no space + first character to check for uppercase
154 | WHEN NOT first_name ~ '(\s[A-Z]){1}' THEN 'REJ-02: Field first_name failed RegExpression check|exp=Like"(\s[A-Z]){1}"|act=' || first_name
155 | ELSE 'P'
156 | END AS status
157 | FROM demo_hr.employees;
158 | ```
159 |
160 |
161 |
162 |
163 | ### T057 - RegExp("EmailAddress")
164 | Verify text field is a properly formatted email address. For example, to verify that the field email_address of table employees is properly formatted:
165 | ```sql
166 | SELECT email_address
167 | , CASE WHEN NOT email_address ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$' THEN 'FAIL' ELSE 'P' END AS status
168 | FROM demo_hr.employees;
169 | ```
170 |
171 |
172 |
173 |
174 | ### T058 - RegExp("IsUrl")
175 | Verify text field is a properly formatted URL. For example, to verify that the field url of table departments is properly formatted with "http://" or "https://":
176 | ```sql
177 | SELECT url
178 | , CASE WHEN NOT url ~ '(http)(s)?(:\/\/)' THEN 'FAIL' ELSE 'P' END AS status
179 | FROM demo_hr.departments;
180 | ```
181 |
182 |
--------------------------------------------------------------------------------
/postgresql/data_validation/rule_set_09__defect_regression.md:
--------------------------------------------------------------------------------
1 | ### Data Validation Examples - Postgresql
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 | #### [Return to Data Validation Home Page](https://github.com/DataResearchLabs/sql_scripts/blob/main/data_validation_scripts.md)
5 |
6 |
7 | # Rule Set #9 - Defect Regression
8 |
9 | ## Summary
10 | Where possible, it is a good idea to cover known defects with a test case. This gives you automated regression testing essentially for free: you can simply re-run this script to quickly check whether a defect has been re-introduced.
11 | 
12 | Several caveats will block you from covering every defect:
13 | * If anything but the simplest of setup and tear down data is required
14 | * If processes or jobs or applications or ETL must be triggered
15 | * If there are interactions with other databases or files not on the current server
16 |
17 | That said, you can still frequently set up static trip wires to monitor the data for recurrence of old bugs.
18 |
19 | Some real-world examples from my past include:
20 | * A defect where email addresses stored in the database had TAB characters. I set up a simple CHAR(9) check that ran daily and cross-referenced the defect number in the SQL return results.
21 | * A defect where customer-specific HTML snippets for email reports had bad tags manually entered by company analysts working with the client. Obviously, fixing the app to block those entries would be best, but that belonged to a different department that was going to get to it later. So I set up a daily alert to check for those HTML tags and other similar ones that might be accidentally introduced.
22 | * A defect where internal data-entry folks on rare occasions had to key in customer names. The names were supposed to be all upper case for other reasons in the system, but the analysts sometimes entered lower case. Developers were not going to get around to fixing it for a couple of months (other priorities; this was a very low-frequency issue). So I set up a daily check to trip an alarm and notify the appropriate person via email to correct their data.
23 | * Basically, you can take any of the test case examples in Rule Sets #1 through #8 and apply them as regression tests for defects when appropriate.
24 |
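As a sketch, the first example above (TAB characters in email addresses) could be covered with a trip-wire query like the following. The schema, table, and column names are illustrative placeholders, not a prescribed implementation:

```sql
-- Hypothetical regression check: fail if any email contains a TAB (CHR(9))
SELECT email
     , CASE WHEN POSITION(CHR(9) IN email) > 0
            THEN 'FAIL: defect "TAB in email" has regressed'
            ELSE 'P'
       END AS status
FROM demo_hr.employees;
```

In practice you would also embed the defect-tracker number in the FAIL message so anyone triaging the results can jump straight to the original ticket.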
--------------------------------------------------------------------------------
/postgresql/schemadiff/simple_schema_dump.sql:
--------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | -- Simple Schema Dump:
3 | -- This SQL script will dump table, column, key, and index design related metadata
4 | -- so that you can copy-paste or export to a text file.
5 | -- Even better, you can take additional snapshots over time (the same database schema at
6 | -- earlier points in time), OR in different environments (DEV, PPMO, STAGE, PROD). Then,
7 | -- using your favorite free text-file diff tool (DiffMerge, ExamDiff, etc.) you
8 | -- can compare snapshots to quickly isolate and identify what changed over time
9 | -- or is different between environments.
10 | ------------------------------------------------------------------------------------
11 | -- Platform: PostgreSQL
12 | -- Author: DataResearchLabs
13 | -- GitHub: https://github.com/DataResearchLabs/sql_scripts
14 | -- YouTube Tutorials: https://www.youtube.com/playlist?list=PLVHoUDdbskURPrZpH0Zkzfa1OQjNfj2Gg
15 | ----------------------------------------------------------------------------------
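-- Example snapshot-and-diff workflow (illustrative only; the hosts, database names,
-- and psql flags below are assumptions to adapt to your environment):
--   psql -h dev_host  -d my_db -A -t -f simple_schema_dump.sql -o schema_dev.txt
--   psql -h prod_host -d my_db -A -t -f simple_schema_dump.sql -o schema_prod.txt
--   diff schema_dev.txt schema_prod.txt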
16 | WITH vars
17 | AS (
18 | SELECT 'pg_catalog' AS v_SchemaName  -- <<< change this to the schema you want to dump
19 | )
20 |
21 | , baseTbl
22 | AS (
23 | SELECT table_schema AS SchemaName, table_type, table_name
24 | FROM INFORMATION_SCHEMA.TABLES
25 | WHERE table_schema = (SELECT v_SchemaName FROM vars)
26 | )
27 |
28 | , metaForTbl
29 | AS (
30 | SELECT t.SchemaName
31 | , t.table_name AS TableName
32 | , CONCAT( '('
33 | , CASE WHEN t.table_type = 'BASE TABLE' THEN 'Table'
34 | WHEN t.table_type = 'VIEW' THEN 'View'
35 | ELSE 'UK'
36 | END
37 | , ')'
38 | ) AS ObjectType
39 | , t.table_name AS ObjectName
40 | , '(Exists)' AS PropertyName
41 | , ' ' AS PropertyValue
42 | FROM baseTbl t
43 | )
44 |
45 | , metaForCol_dataType
46 | AS (
47 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
48 | , '2' AS PropertyName
49 | , CONCAT(COALESCE(tut.data_type, 'unknown')
50 | , CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL
51 | OR tut.NUMERIC_PRECISION IS NOT NULL
52 | OR tut.NUMERIC_SCALE IS NOT NULL THEN '('
53 | ELSE ''
54 | END
55 | , CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN CAST(tut.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)) ELSE '' END
56 | , CASE WHEN tut.DATA_TYPE IN('date','datetime','timestamp') THEN CAST(tut.DATETIME_PRECISION AS VARCHAR(10))
57 | WHEN tut.NUMERIC_PRECISION IS NULL THEN ''
58 | ELSE CONCAT(CAST(tut.NUMERIC_PRECISION AS VARCHAR(10)))
59 | END
60 | , CASE WHEN tut.NUMERIC_SCALE IS NOT NULL
61 | AND tut.NUMERIC_SCALE >0 THEN CONCAT(',', CAST(tut.NUMERIC_SCALE AS VARCHAR(10))) ELSE ''
62 | END
63 | , CASE WHEN tut.CHARACTER_MAXIMUM_LENGTH IS NOT NULL
64 | OR tut.NUMERIC_PRECISION IS NOT NULL
65 | OR tut.NUMERIC_SCALE IS NOT NULL THEN ')'
66 | END
67 | ) AS PropertyValue
68 | FROM INFORMATION_SCHEMA.COLUMNS tut
69 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.TABLE_NAME = tut.TABLE_NAME
70 | )
71 |
72 | , metaForCol_nullable
73 | AS (
74 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
75 | , '3' AS PropertyName, CASE WHEN tut.IS_NULLABLE = 'YES' THEN 'NULL' ELSE 'NOT NULL' END AS PropertyValue
76 | FROM INFORMATION_SCHEMA.COLUMNS tut
77 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.table_name = tut.table_name
78 | )
79 |
80 | , metaForCol_ordpos
81 | AS (
82 | SELECT ft.SchemaName, ft.table_name AS TableName, 'Column' AS ObjectType, tut.column_name AS ObjectName
83 | , '1' AS PropertyName
84 | , CASE WHEN tut.ORDINAL_POSITION IS NULL THEN ''
85 | ELSE LPAD( CAST(tut.ORDINAL_POSITION AS VARCHAR(3)), 3, '0')
86 | END AS PropertyValue
87 | FROM INFORMATION_SCHEMA.COLUMNS tut
88 | INNER JOIN baseTbl ft ON ft.SchemaName = tut.TABLE_SCHEMA AND ft.table_name = tut.table_name
89 | )
90 |
91 | , metaAllCols
92 | AS (
93 | SELECT schemaname, tablename, objecttype, objectname, 'Properties' AS propertyname
94 | , STRING_AGG(propertyvalue, ' | ' ORDER BY propertyname, propertyvalue) AS propertyvalue
95 | FROM (
96 | SELECT * FROM metaForCol_dataType
97 | UNION SELECT * FROM metaForCol_nullable
98 | UNION SELECT * FROM metaForCol_ordpos
99 | ) t
100 | GROUP BY schemaname, tablename, objecttype, objectname
101 | )
102 |
103 | , metaForKeys
104 | AS (
105 | SELECT cons.TABLE_SCHEMA AS SchemaName, cons.TABLE_NAME AS TableName
106 | , CASE WHEN cons.constraint_type = 'PRIMARY KEY' THEN 'PKey'
107 | WHEN cons.constraint_type = 'UNIQUE' THEN 'UKey'
108 | WHEN cons.constraint_type = 'FOREIGN KEY' THEN 'FKey'
109 | ELSE 'X'
110 | END AS ObjectType
111 | , cons.constraint_name AS ObjectName
112 | , 'FieldList' AS PropertyName
113 | , STRING_AGG(kcu.COLUMN_NAME, ',' ORDER BY kcu.ORDINAL_POSITION) AS PropertyValue
114 | FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS cons
115 | INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
116 | ON cons.TABLE_SCHEMA = kcu.TABLE_SCHEMA
117 |          AND cons.TABLE_NAME = kcu.TABLE_NAME AND cons.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME  -- match on constraint name too, so each key lists only its own columns
118 | WHERE cons.TABLE_NAME IN(SELECT DISTINCT table_name FROM baseTbl)
119 | AND cons.TABLE_SCHEMA = (SELECT v_SchemaName FROM vars)
120 | AND cons.constraint_type IN('PRIMARY KEY','FOREIGN KEY','UNIQUE')
121 | GROUP BY cons.TABLE_SCHEMA, cons.TABLE_NAME, cons.CONSTRAINT_TYPE, cons.CONSTRAINT_NAME
122 | )
123 |
124 | , metaForIdxs
125 | AS (
126 | SELECT (SELECT v_SchemaName FROM vars) AS SchemaName
127 | , tablename AS TableName
128 | , 'Index' AS ObjectType
129 | , indexname AS ObjectName
130 | , 'FieldList' AS PropertyName
131 | , REPLACE(SUBSTR(indexdef, POSITION('(' IN indexdef)+ 1), ')', '') AS PropertyValue
132 | FROM pg_catalog.pg_indexes
133 | WHERE schemaname = (SELECT v_SchemaName FROM vars)
134 | )
135 |
136 | , allMetadata
137 | AS (
138 | SELECT * FROM metaForTbl
139 | UNION SELECT * FROM metaAllCols
140 | UNION SELECT * FROM metaForKeys
141 | UNION SELECT * FROM metaForIdxs
142 | )
143 |
144 | SELECT CASE WHEN objecttype IN('(Table)','(View)') THEN schemaname ELSE ' ' END AS schema_nm
145 | , CASE WHEN objecttype IN('(Table)','(View)') THEN tablename ELSE ' ' END AS tbl_nm
146 | , objecttype AS obj_typ, objectname AS obj_nm, /*propertyname,*/ propertyvalue AS properties
147 | FROM allMetadata
148 | ORDER BY schemaname, tablename, objecttype
149 | , CASE WHEN objecttype='Column' THEN propertyvalue ELSE ' ' END
150 | , objectname, propertyname
151 |
152 |
153 |
--------------------------------------------------------------------------------
/schemadiff_scripts.md:
--------------------------------------------------------------------------------
1 | # Multiple Platform SchemaDiff Scripts
2 | [](LICENSE "Creative Commons Zero License by DataResearchLabs (effectively = Public Domain)")
3 | [](http://www.DataResearchLabs.com)
4 |
5 | ## Overview
6 | There are many reasons you may need a SQL script that captures table, view, column, and key properties.
7 | It all boils down to comparing different states to quickly identify what changed, or what is different.
8 |
9 |
10 | * Maybe you need to quickly isolate what got missed in the migrations from DEV to TEST environments.
11 | * Maybe you need to quickly identify what changed between this release (AFTER) and the prior release (BEFORE).
12 | * Maybe you want to run the script daily and output state snapshots to text files so that in the event of an emergency you can quickly identify what changed between given dates.
13 |
14 | Using your favorite text diff-ing tool, here is what two sample schemas (PPMO-OldSchema vs. DEV-NewSchema) might look like side-by-side (red markup lines added):
15 | 
16 | ![Example schema diff side-by-side](img/01_schemadiff_side_by_side.png)
17 | 
18 | ***If you like these scripts, be sure to click the "Star" button above in GitHub.***
19 |
20 | ***Also, be sure to visit or subscribe to our YouTube channel*** www.DataResearchLabs.com!
21 |
22 |
23 | Links to the script source code as well as video tutorials are listed below, by platform (MSSQL, Oracle, MySQL, etc.):
24 |
25 |
26 |
27 |
## Greenplum
* [Greenplum "Schema Dump" Script](https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/main/greenplum/schemadiff/simple_schema_dump.sql)
* [Greenplum "Schema Dump" Tutorial](http://www.youtube.com/watch?feature=player_embedded&v=U8p6YJZtPM8)


## MS SQL Server
* [MSSQL "Schema Dump" Script](https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/main/mssql/schemadiff/simple_schema_dump.sql)
* [MSSQL "Schema Dump" Tutorial](http://www.youtube.com/watch?feature=player_embedded&v=Vxf3SM38Ar0)
* *Note: MSSQL 2017 forward only (STRING_AGG fx)*


## MySQL
* [MySQL "Schema Dump" Script](https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/main/mysql/schemadiff/simple_schema_dump.sql)
* [MySQL "Schema Dump" Tutorial](http://www.youtube.com/watch?feature=player_embedded&v=LwfagEfficU)


## Oracle
* [Oracle "Schema Dump" Script](https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/main/oracle/schemadiff/simple_schema_dump.sql)
* [Oracle "Schema Dump" Tutorial](http://www.youtube.com/watch?feature=player_embedded&v=kzemPW1156s)


## PostgreSQL
* [PostgreSQL "Schema Dump" Script](https://raw.githubusercontent.com/DataResearchLabs/sql_scripts/main/postgresql/schemadiff/simple_schema_dump.sql)
* [PostgreSQL "Schema Dump" Tutorial](http://www.youtube.com/watch?feature=player_embedded&v=I3Gr2d7tob0)