├── .github
    └── workflows
    │   └── check.yml
├── .gitignore
├── Collection context clinical code.png
├── Gemfile
├── Gemfile.lock
├── LICENSE
├── OI CDM Hierarchical - Table Relationships.png
├── README.md
├── check_readme.sh
├── convert_to_schema.rb
├── converter.rb
├── dropbox-deployment.yml
├── generate_wide_tables.sql
├── style.rb
└── wide.md


/.github/workflows/check.yml:
--------------------------------------------------------------------------------
 1 | name: GDM Check and Conversion
 2 | on: [push, pull_request]
 3 | jobs:
 4 |   check_and_convert:
 5 |     runs-on: ubuntu-latest
 6 |     environment: GDM Secrets
 7 |     steps:
 8 |     - uses: actions/checkout@v3
 9 |     - uses: ruby/setup-ruby@v1
10 |       with:
11 |         ruby-version: '3.3'
12 |     - 
13 |       name: Install Gems
14 |       run: gem install dropbox-deployment mdl
15 |     - 
16 |       name: Check Syntax
17 |       run: sh check_readme.sh
18 |     - 
19 |       name: Convert to CSV/YAML
20 |       run: ruby converter.rb
21 |     - 
22 |       name: Convert to Sequel Schema
23 |       run: ruby convert_to_schema.rb
24 |     - 
25 |       name: Prepare schemas archive
26 |       run: (cd artifacts && tar jcvf gdm_schemas.tbz schemas)
27 |     - 
28 |       name: Upload to Dropbox
29 |       run: dropbox-deployment
30 |       env:
31 |         DROPBOX_OAUTH_BEARER: ${{ secrets.DROPBOX_OAUTH_BEARER }}


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | /artifacts
2 | 


--------------------------------------------------------------------------------
/Collection context clinical code.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/outcomesinsights/generalized_data_model/8ea52e16ca62641ae93fc8e0c9834accc8aa712f/Collection context clinical code.png


--------------------------------------------------------------------------------
/Gemfile:
--------------------------------------------------------------------------------
1 | # frozen_string_literal: true
2 | 
3 | source "https://rubygems.org"
4 | 
5 | # gem "rails"
6 | 
7 | gem "mdl", "~> 0.12.0"
8 | 


--------------------------------------------------------------------------------
/Gemfile.lock:
--------------------------------------------------------------------------------
 1 | GEM
 2 |   remote: https://rubygems.org/
 3 |   specs:
 4 |     chef-utils (18.2.7)
 5 |       concurrent-ruby
 6 |     concurrent-ruby (1.2.2)
 7 |     kramdown (2.4.0)
 8 |       rexml
 9 |     kramdown-parser-gfm (1.1.0)
10 |       kramdown (~> 2.0)
11 |     mdl (0.12.0)
12 |       kramdown (~> 2.3)
13 |       kramdown-parser-gfm (~> 1.1)
14 |       mixlib-cli (~> 2.1, >= 2.1.1)
15 |       mixlib-config (>= 2.2.1, < 4)
16 |       mixlib-shellout
17 |     mixlib-cli (2.1.8)
18 |     mixlib-config (3.0.27)
19 |       tomlrb
20 |     mixlib-shellout (3.2.7)
21 |       chef-utils
22 |     rexml (3.2.6)
23 |     tomlrb (2.0.3)
24 | 
25 | PLATFORMS
26 |   x86_64-linux
27 | 
28 | DEPENDENCIES
29 |   mdl (~> 0.12.0)
30 | 
31 | BUNDLED WITH
32 |    2.4.7
33 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) 2015 Outcomes Insights, Inc.
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
23 | 


--------------------------------------------------------------------------------
/OI CDM Hierarchical - Table Relationships.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/outcomesinsights/generalized_data_model/8ea52e16ca62641ae93fc8e0c9834accc8aa712f/OI CDM Hierarchical - Table Relationships.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Generalized Data Model (GDM)
  2 | 
  3 | We have a [manuscript available](https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-019-0837-5) that describes the design of the Generalized Data Model (GDM).
  4 | 
  5 | Below is the current version of the schema for the Generalized Data Model. We gratefully acknowledge the influence of the OHDSI community and the open-source OMOP common data model [specifications](http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm) on our thinking. In addition, we acknowledge the influence of both Sentinel and i2b2 on our approach, although most of our data model was designed prior to fully reviewing other data models. At the moment, many references to the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table refer to the OMOP version 5 vocabulary [table](http://www.ohdsi.org/web/athena/) maintained by OHDSI.  However, any internally consistent set of vocabularies with unique concept ids would be sufficient (e.g., the [National Library of Medicine Metathesaurus](https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/)).
  6 | 
  7 | Note that in April 2023, we removed the patient_details table from the data model.
  8 | 
  9 | ## GDM Tables
 10 | 
 11 | ### [patients](#patients)
 12 | 
 13 | - Demographic information about the [patients](#patients) in the data
 14 | - The column for _practitioner_id_ is intended for situations where there is a defined primary care practitioner (e.g., HMO or CPRD data)
 15 | 
 16 | column               | type   | description                                                                                                                          | foreign key (FK)                                                      | required
 17 | ---------------------|--------|--------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
 18 | id                   | serial | Surrogate key for record                                                                                                             |                                                                       | x
 19 | gender_concept_id    | bigint | FK reference to the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the unique gender of the patient | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 20 | birth_date           | date   | Date of birth (yyyy-mm-dd)                                                                                                           |                                                                       |
 21 | race_concept_id      | bigint | FK reference to the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the unique race of the patient   | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 22 | ethnicity_concept_id | bigint | FK reference to the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the ethnicity of the patient     | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 23 | address_id           | bigint | FK reference to the place of residency for the patient in the addresses table, where the detailed address information is stored      | [addresses](#addresses)                                               |
 24 | practitioner_id      | bigint | FK reference to the primary care practitioner the patient is seeing in the [practitioners](#practitioners) table                     | [practitioners](#practitioners)                                       |
 25 | patient_id_source_value | text | Original patient identifier defined in the source data                                                                          |                                 | x
 26 | 
 27 | ### [practitioners](#practitioners)
 28 | 
 29 | - All non-facility [practitioners](#practitioners) (i.e., physicians, etc.) are listed
 30 | 
 31 | column                    | type   | description                                                                                                                                                | foreign key (FK)                                                      | required
 32 | --------------------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
 33 | id                        | serial | Surrogate key for record                                                                                                                                   |                                                                       | x
 34 | practitioner_name         | text   | Practitioner's name, if available                                                                                                                          |                                                                       |
 35 | primary_identifier        | text   | Primary practitioner identifier                                                                                                                            |                                                                       | x
 36 | primary_identifier_type   | text   | Type of identifier specified in primary identifier field (UPIN, NPI, etc)                                                                                  |                                                                       | x
 37 | secondary_identifier      | text   | Secondary practitioner identifier (Optional)                                                                                                               |                                                                       |
 38 | secondary_identifier_type | text   | Type of identifier specified in secondary identifier field (UPIN, NPI, etc)                                                                                |                                                                       |
 39 | specialty_concept_id      | bigint | FK reference to an identifier in the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for specialty                             | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 40 | address_id                | bigint | FK reference to the address of the location where the practitioner is practicing                                                                           | [addresses](#addresses)                                               |
 41 | birth_date                | date   | Date of birth (yyyy-mm-dd)                                                                                                                                 |                                                                       |
 42 | gender_concept_id         | bigint | FK reference to an identifier in the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the unique gender of the practitioner | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 43 | 
 44 | ### [facilities](#facilities)
 45 | 
 46 | - Unique records for all the [facilities](#facilities) in the data
 47 | - facility_type_concept_id should be used to describe the whole facility (e.g., Academic Medical Center or Community Medical Center). Specific departments in the facility should be entered in the [contexts](#contexts) table using the care_site_type_concept_id field.
 48 | 
 49 | column                    | type   | description                                                                                                                     | foreign key (FK)                                                      | required
 50 | --------------------------|--------|---------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
 51 | id                        | serial | Surrogate key for record                                                                                                        |                                                                       | x
 52 | facility_name             | text   | Facility name, if available                                                                                                     |                                                                       |
 53 | primary_identifier        | text   | Primary facility identifier                                                                                                     |                                                                       | x
 54 | primary_identifier_type   | text   | Type of identifier specified in primary identifier field (UPIN, NPI, etc)                                                       |                                                                       | x
 55 | secondary_identifier      | text   | Secondary facility identifier (Optional)                                                                                        |                                                                       |
 56 | secondary_identifier_type | text   | Type of identifier specified in secondary identifier field (UPIN, NPI, etc)                                                     |                                                                       |
 57 | facility_type_concept_id  | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the facility type      | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 58 | specialty_concept_id      | bigint | A foreign key to an identifier in the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for specialty | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 59 | address_id                | bigint | A foreign key to the address of the location of the facility                                                                    | [addresses](#addresses)                                               |
 60 | 
 61 | ### [collections](#collections)
 62 | 
 63 | - Used to group [contexts](#contexts) records
 64 | - For claims, records the claim level information (also referred to as "headers" in some databases)
 65 |    - Use claim from and thru date for start and end date, if available
 66 |    - Admit and discharge dates should go in the [admission_details](#admission_details) table, unless those are the only dates for the records, in which case they should be entered into both the [collections](#collections) and [admission_details](#admission_details) tables
 67 | - For EHR, records the visit level information
 68 | 
 69 | column                     | type   | description                                                                                                                                             | foreign key (FK)                                                      | required
 70 | ---------------------------|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
 71 | id                         | serial | Surrogate key for record                                                                                                                                |                                                                       | x
 72 | patient_id                 | bigint | FK reference to [patients](#patients) table                                                                                                             | [patients](#patients)                                                 | x
 73 | start_date                 | date   | Start date of record (yyyy-mm-dd)                                                                                                                       |                                                                       | x
 74 | end_date                   | date   | End date of record (yyyy-mm-dd)                                                                                                                         |                                                                       | x
 75 | duration                   | float  | Duration of collection (e.g., hospitalization length of stay)                                                                                           |                                                                       |
 76 | duration_unit_concept_id   | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the unit of duration (hours, days, weeks etc.) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 77 | facility_id                | bigint | FK reference to [facilities](#facilities) table                                                                                                         | [facilities](#facilities)                                             |
 78 | admission_detail_id        | bigint | FK reference to [admission_details](#admission_details) table                                                                                           | [admission_details](#admission_details)                               |
 79 | collection_type_concept_id | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the type of collection this record represents  | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 80 | 
 81 | ### [contexts_practitioners](#contexts_practitioners)
 82 | 
 83 | - Links one or more [practitioners](#practitioners) with a [contexts](#contexts) record
 84 | - Each record represents an encounter between a patient and a practitioner on a specific context
 85 | - Captures the role, if any, the practitioner played on the context (e.g., attending physician)
 86 | 
 87 | column                    | type   | description                                                                                                                                                                                       | foreign key (FK)                                                      | required
 88 | --------------------------|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
 89 | context_id                | bigint | FK reference to [contexts](#contexts) table                                                                                                                                                       | [contexts](#contexts)                                                 | x
 90 | practitioner_id           | bigint | FK reference to [practitioners](#practitioners) table                                                                                                                                             | [practitioners](#practitioners)                                       | x
 91 | patient_id                | bigint | FK reference to [patients](#patients) table                                                                                                                                                       | [patients](#patients)                                                 | x
 92 | role_type_concept_id      | bigint | FK reference to the [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing roles [practitioners](#practitioners) can play in an encounter                       | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 93 | specialty_type_concept_id | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the practitioner's specialty type for the services/diagnoses associated with this record | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
 94 | 
 95 | ### [contexts](#contexts)
 96 | 
 97 | - Stores information about the context of the [clinical_codes](#clinical_codes) and [payer_reimbursements](#payer_reimbursements)
 98 | - Used to group [clinical_codes](#clinical_codes) typically occurring on the same day or at the same time (e.g., a diagnosis and a procedure, or a systolic and diastolic blood pressure)
 99 | - [contexts](#contexts) records are always linked to a collection record
100 | - care_site_type_concept_id is used to describe the department in which the service was performed
101 | 
102 | column                            | type   | description                                                                                                                                                                                                                                    | foreign key (FK)                                                      | required
103 | ----------------------------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
104 | id                                | serial | Surrogate key for record                                                                                                                                                                                                                       |                                                                       | x
105 | collection_id                     | bigint | FK reference to [collections](#collections) table                                                                                                                                                                                              | [collections](#collections)                                           | x
106 | patient_id                        | bigint | FK to reference to [patients](#patients) table                                                                                                                                                                                                 | [patients](#patients)                                                 | x
107 | start_date                        | date   | Start date of record (yyyy-mm-dd)                                                                                                                                                                                                              |                                                                       | x
108 | end_date                          | date   | End date of record (yyyy-mm-dd)                                                                                                                                                                                                                |                                                                       |
109 | facility_id                       | bigint | FK reference to [facilities](#facilities) table                                                                                                                                                                                                | [facilities](#facilities)                                             |
110 | care_site_type_concept_id         | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the care site type within the facility                                                                                                | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
111 | pos_concept_id                    | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the place of service associated with this record                                                                                      | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
112 | source_type_concept_id            | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the file name (e.g., MEDPAR). If data represents a subset of a file, concatenate the name of the file used and subset (e.g., MEDPAR_SNF) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) | x
113 | service_specialty_type_concept_id | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the specialty type for the services/diagnoses associated with this record                                                             | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
114 | record_type_concept_id            | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the type of [contexts](#contexts) the record represents (line, claim, etc.)                                                           | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) | x
115 | 
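The rule that [contexts](#contexts) records are always linked to a collection record can be checked during loading. The following is only a minimal sketch: the `Collection` and `Context` structs and the `orphaned_contexts` helper are hypothetical in-memory stand-ins that carry a handful of the columns above, not part of this repository.

```ruby
# Hypothetical stand-ins for the collections and contexts tables.
# Only a few of the documented columns are included.
Collection = Struct.new(:id, :patient_id, :start_date, :end_date, keyword_init: true)
Context    = Struct.new(:id, :collection_id, :patient_id, :start_date, keyword_init: true)

# Returns every context whose collection_id does not match a known collection.
def orphaned_contexts(collections, contexts)
  known_ids = collections.map(&:id)
  contexts.reject { |ctx| known_ids.include?(ctx.collection_id) }
end

collections = [
  Collection.new(id: 1, patient_id: 10, start_date: "2020-01-01", end_date: "2020-01-03")
]
contexts = [
  Context.new(id: 100, collection_id: 1, patient_id: 10, start_date: "2020-01-01"),
  Context.new(id: 101, collection_id: 2, patient_id: 10, start_date: "2020-01-02")
]

orphans = orphaned_contexts(collections, contexts)
puts orphans.map(&:id).inspect
```

In a real database the same check would more likely be a foreign key constraint on contexts.collection_id; the sketch only illustrates the relationship.
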
116 | ### [clinical_codes](#clinical_codes)
117 | 
118 | - Stores clinical codes from all types of records including procedures, diagnoses, drugs, laboratory records and other sources. Some common vocabularies include ICD-9, ICD-10, SNOMED, Read, HCPCS, CPT, NDC, and LOINC
119 | - Ignores semantic distinctions about the type of information represented within a vocabulary because most vocabularies contain information from more than one domain
120 | - One record generated for each individual code in the raw data
121 | - Extra detail can be found about a code in the [measurement_details](#measurement_details) and [drug_exposure_details](#drug_exposure_details) tables if that information exists
122 | 
123 | column                      | type   | description                                                                                                                     | foreign key (FK)                                                            | required
124 | ----------------------------|--------|---------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|---------
125 | id                          | serial | Surrogate key for record                                                                                                        |                                                                             | x
126 | collection_id               | bigint | FK reference to [collections](#collections) table                                                                               | [collections](#collections)                                                 | x
127 | context_id                  | bigint | FK reference to [contexts](#contexts) table                                                                                     | [contexts](#contexts)                                                       | x
128 | patient_id                  | bigint | FK reference to [patients](#patients) table                                                                                     | [patients](#patients)                                                       | x
129 | start_date                  | date   | Start date of record (yyyy-mm-dd)                                                                                               |                                                                             | x
130 | end_date                    | date   | End date of record (yyyy-mm-dd)                                                                                                 |                                                                             | x
131 | clinical_code_concept_id    | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the code assigned to the record | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT)       | x
132 | quantity                    | bigint | Quantity, if available (e.g., procedures)                                                                                       |                                                                             |
133 | seq_num                     | int    | The sequence number for the variable assigned (e.g. dx3 gets sequence number 3)                                                 |                                                                             |
134 | provenance_concept_id       | bigint | Additional type information (ex: primary, admitting, problem list, etc)                                                         | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT)       |
135 | clinical_code_source_value  | text   | Source code from raw data                                                                                                       |                                                                             | x
136 | clinical_code_vocabulary_id | text   | FK reference to the vocabulary the clinical code comes from                                                                     | [vocabulary](https://ohdsi.github.io/CommonDataModel/cdm54.html#VOCABULARY) | x
137 | measurement_detail_id       | bigint | FK reference to [measurement_details](#measurement_details) table                                                               | [measurement_details](#measurement_details)                                 |
138 | drug_exposure_detail_id     | bigint | FK reference to [drug_exposure_details](#drug_exposure_details) table                                                           | [drug_exposure_details](#drug_exposure_details)                             |
139 | 
140 | ### [measurement_details](#measurement_details)
141 | 
142 | - Stores additional information related to measurements, observations, status, and specifications
143 | - Text-based vocabularies are sufficient, but could also be mapped to LOINC and stored in the [mappings](#mappings) table (e.g., laboratory data indexed by text names for the lab results)
144 | - Other vocabularies should be included in their original system (e.g., oncology may be comprised of separate vocabularies for location, histology, grade, behavior, etc.)
145 |    - This could be implemented by making variable names a vocabulary in themselves, depending on the use case
146 | 
147 | column                                | type   | description                                                                                                                                                                                                | foreign key (FK)                                                      | required
148 | --------------------------------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
149 | id                                    | serial | Surrogate key for record                                                                                                                                                                                   |                                                                       | x
150 | patient_id                            | bigint | FK reference to [patients](#patients) table                                                                                                                                                                | [patients](#patients)                                                 | x
151 | result_as_number                      | float  | The observation result stored as a number, applicable to observations where the result is expressed as a numeric value                                                                                     |                                                                       |
152 | result_as_string                      | text   | The observation result stored as a string, applicable to observations where the result is expressed as verbatim text                                                                                       |                                                                       |
153 | result_as_concept_id                  | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the result associated with the detail_concept_id (e.g., positive/negative, present/absent, low/high, etc.) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
154 | result_modifier_concept_id            | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for result modifier (=, <, >, etc.)                                                                            | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
155 | unit_concept_id                       | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the measurement units (e.g., mmol/L, mg/dL, etc.)                                                          | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
156 | normal_range_low                      | float  | Lower bound of the normal reference range assigned by the laboratory                                                                                                                                       |                                                                       |
157 | normal_range_high                     | float  | Upper bound of the normal reference range assigned by the laboratory                                                                                                                                       |                                                                       |
158 | normal_range_low_modifier_concept_id  | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for result modifier (=, <, >, etc.)                                                                            | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
159 | normal_range_high_modifier_concept_id | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for result modifier (=, <, >, etc.)                                                                            | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
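
As an illustration of how the numeric result and reference-range columns relate, the sketch below flags a result against `normal_range_low` and `normal_range_high`. The `MeasurementDetail` class and the `range_flag` helper are hypothetical names introduced for this example and are not part of the model. The sketch assumes inclusive bounds and ignores the modifier concept columns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MeasurementDetail:
    """Subset of measurement_details columns used in this sketch."""
    id: int
    patient_id: int
    result_as_number: Optional[float] = None
    result_as_string: Optional[str] = None
    normal_range_low: Optional[float] = None
    normal_range_high: Optional[float] = None


def range_flag(detail: MeasurementDetail) -> Optional[str]:
    """Hypothetical helper: return 'low', 'high', or 'normal'.

    Returns None when there is no numeric result or no reference range.
    Assumption: bounds are inclusive; modifier concepts are not consulted.
    """
    value = detail.result_as_number
    low, high = detail.normal_range_low, detail.normal_range_high
    if value is None or (low is None and high is None):
        return None
    if low is not None and value < low:
        return "low"
    if high is not None and value > high:
        return "high"
    return "normal"


# Illustrative record: a numeric lab result with a reference range.
example = MeasurementDetail(
    id=1, patient_id=10, result_as_number=7.2,
    normal_range_low=3.9, normal_range_high=5.6,
)
print(range_flag(example))
```

Results recorded only as text (`result_as_string`) or as a concept (`result_as_concept_id`) are left unflagged by this helper.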
160 | 
161 | ### [drug_exposure_details](#drug_exposure_details)
162 | 
163 | - Designed to capture extra details about drug-specific [clinical_codes](#clinical_codes)
164 | - The quantity of a drug is stored in the [clinical_codes](#clinical_codes) quantity field
165 | 
166 | column                  | type   | description                                                                                                                                     | foreign key (FK)                                                      | required
167 | ------------------------|--------|-------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
168 | id                      | serial | Surrogate key for record                                                                                                                        |                                                                       | x
169 | patient_id              | bigint | FK reference to [patients](#patients) table                                                                                                     | [patients](#patients)                                                 | x
170 | refills                 | int    | The number of refills after the initial prescription; the initial prescription is not counted (i.e., values start with 0)                       |                                                                       |
171 | days_supply             | int    | The number of days of supply as recorded in the original prescription or dispensing record                                                      |                                                                       |
172 | number_per_day          | float  | The number of pills taken per day                                                                                                               |                                                                       |
173 | dose_form_concept_id    | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the form of the drug (capsule, injection, etc.) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
174 | dose_unit_concept_id    | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the units in which the dose_value is expressed  | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
175 | route_concept_id        | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for route in which drug is given                    | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
176 | dose_value              | float  | Numeric value for the dose of the drug                                                                                                          |                                                                       |
177 | strength_source_value   | text   | Drug strength as reported in the raw data. This can include both dose value and units                                                           |                                                                       |
178 | ingredient_source_value | text   | Ingredient/Generic name of drug as reported in the raw data                                                                                     |                                                                       |
179 | drug_name_source_value  | text   | Product/Brand name of drug as reported in the raw data                                                                                          |                                                                       |
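
The link between a drug code and its details can be sketched with an in-memory SQLite database. This is a minimal sketch, not the reference schema: only a subset of columns is created, SQLite types stand in for the types listed above, and the all-zero NDC value is a placeholder rather than a real code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug_exposure_details (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL,
    refills INTEGER,
    days_supply INTEGER,
    strength_source_value TEXT
);
CREATE TABLE clinical_codes (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL,
    clinical_code_source_value TEXT NOT NULL,
    clinical_code_vocabulary_id TEXT NOT NULL,
    quantity INTEGER,
    drug_exposure_detail_id INTEGER REFERENCES drug_exposure_details(id)
);
-- Illustrative rows only; the NDC value is a placeholder.
INSERT INTO drug_exposure_details
    (id, patient_id, refills, days_supply, strength_source_value)
VALUES (1, 10, 2, 30, '500 mg');
INSERT INTO clinical_codes
    (id, patient_id, clinical_code_source_value,
     clinical_code_vocabulary_id, quantity, drug_exposure_detail_id)
VALUES (100, 10, '00000000000', 'NDC', 60, 1);
""")

# The quantity comes from clinical_codes; the remaining drug-specific
# fields come from drug_exposure_details via drug_exposure_detail_id.
rows = conn.execute(
    """
    SELECT cc.clinical_code_source_value, cc.quantity, d.days_supply, d.refills
    FROM clinical_codes AS cc
    JOIN drug_exposure_details AS d ON d.id = cc.drug_exposure_detail_id
    WHERE cc.patient_id = ?
    """,
    (10,),
).fetchall()
print(rows)
```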
180 | 
181 | ### [payer_reimbursements](#payer_reimbursements)
182 | 
183 | - The purpose of this table is to capture all costs reported in the course of paying for services. It is designed from a US administrative claims data perspective.
184 | - All payer reimbursement records are linked to a record in the [contexts](#contexts) table which identifies the type of reimbursement (generally a line-level or claim-level cost)
185 | - Note that a claim-level reimbursement does not always equal the sum of its line-level reimbursements, so use caution when querying records
186 | 
187 | column                   | type   | description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | foreign key (FK)                                                      | required
188 | -------------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
189 | id                       | serial | Surrogate key for record                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                                                       |
190 | context_id               | bigint | FK reference to context table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | [contexts](#contexts)                                                 | x
191 | patient_id               | bigint | FK to reference to [patients](#patients) table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | [patients](#patients)                                                 | x
192 | clinical_code_id         | bigint | FK reference to [clinical_codes](#clinical_codes) table to be used if a specific code is the direct cause for the reimbursement                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | [clinical_codes](#clinical_codes)                                     |
193 | currency_concept_id      | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the 3-letter code used to delineate international currencies (e.g., USD = US Dollar)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) | x
194 | total_charged            | float  | The total amount charged by the provider of the good/service (e.g. hospital, physician pharmacy, dme provider) billed to a payer. This information is usually provided in claims data.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                       |
195 | total_paid               | float  | The total amount paid from all payers for the expenses of the service/device/drug. This field is calculated using the following formula: paid_by_payer + paid_by_patient + paid_by_primary. In claims data, this field is considered the calculated field the payer expects the provider to get reimbursed for the service/device/drug from the payer and from the patient, based on the payer's contractual obligations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                       |
196 | paid_by_payer            | float  | The amount paid by the Payer for the service/device/drug. In claims data, generally there is one field representing the total payment from the payer for the service/device/drug. However, this field could be a calculated field if the source data provides separate payment information for the ingredient cost and the dispensing fee. If the paid_ingredient_cost or paid_dispensing_fee fields are populated with nonzero values, the paid_by_payer field is calculated using the following formula: paid_ingredient_cost + paid_dispensing_fee. If there is more than one Payer in the source data, several cost records indicate that fact. The Payer reporting this reimbursement should be indicated under the payer_plan_id field.                                                                                                                                                 |                                                                       |
197 | paid_by_patient          | float  | The total amount paid by the patient as a share of the expenses. This field is most often used in claims data to report the contracted amount the patient is responsible for reimbursing the provider for said service/device/drug. This is a calculated field using the following formula: paid_patient_copay + paid_patient_coinsurance + paid_patient_deductible. If the source data has actual patient payments (e.g. the patient payment is not a derivative of the payer claim and there is verification the patient paid an amount to the provider), then the patient payment should have its own cost record with a payer_plan_id set to 0 to indicate the payer is actually the patient, and the actual patient payment should be noted under the total_paid field. The paid_by_patient field is only used for reporting a patient's responsibility reported on an insurance claim.  |                                                                       |
198 | paid_patient_copay       | float  | The amount paid by the patient as a fixed contribution to the expenses. paid_patient_copay does contribute to the paid_by_patient variable. The paid_patient_copay field is only used for reporting a patient's copay amount reported on an insurance claim.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                       |
199 | paid_patient_coinsurance | float  | The amount paid by the patient as a joint assumption of risk. Typically, this is a percentage of the expenses defined by the Payer Plan after the patient's deductible is exceeded. paid_patient_coinsurance does contribute to the paid_by_patient variable. The paid_patient_coinsurance field is only used for reporting a patient's coinsurance amount reported on an insurance claim.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                       |
200 | paid_patient_deductible  | float  | The amount paid by the patient that is counted toward the deductible defined by the Payer Plan. paid_patient_deductible does contribute to the paid_by_patient variable. The paid_patient_deductible field is only used for reporting a patient's deductible amount reported on an insurance claim.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                                                                       |
201 | paid_by_primary          | float  | The amount paid by a primary Payer through the coordination of benefits. paid_by_primary does contribute to the total_paid variable. The paid_by_primary field is only used for reporting a patient's primary insurance payment amount reported on the secondary payer insurance claim. If the source data has actual primary insurance payments (e.g. the primary insurance payment is not a derivative of the payer claim and there is verification another insurance company paid an amount to the provider), then the primary insurance payment should have its own cost record with a payer_plan_id set to the applicable payer, and the actual primary insurance payment should be noted under the paid_by_payer field.                                                                                                                                                                 |                                                                       |
202 | paid_ingredient_cost     | float  | The amount paid by the Payer to a pharmacy for the drug, excluding the amount paid for dispensing the drug. paid_ingredient_cost contributes to the paid_by_payer field if this field is populated with a nonzero value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                                                       |
203 | paid_dispensing_fee      | float  | The amount paid by the Payer to a pharmacy for dispensing a drug, excluding the amount paid for the drug ingredient. paid_dispensing_fee contributes to the paid_by_payer field if this field is populated with a nonzero value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                                                       |
204 | information_period_id    | bigint | FK reference to the [information_periods](#information_periods) table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                       |
205 | amount_allowed           | float  | The contracted amount agreed between the payer and provider. This information is generally available in claims data. This is similar to the total_paid amount in that it shows what the payer expects the provider to be reimbursed after the payer and patient pay. This differs from the total_paid amount in that it is not a calculated field, but a field available directly in claims data. Use case: This will capture non-covered services. Non-covered services are indicated when the amount allowed and patient responsibility variables (copay, coinsurance, deductible) equal $0 in the source data. This means the patient is responsible for the total_charged value. The amount_allowed field is payer specific and the payer should be indicated by the payer_plan_id field.                                                                                                 |                                                                       |
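
The formulas embedded in the column descriptions above can be sketched as small functions. The function names are hypothetical, and treating a missing (`None`) component as zero within a sum is an assumption of this sketch, not something the table specifies.

```python
from typing import Optional


def _sum_present(*amounts: Optional[float]) -> Optional[float]:
    """Sum the amounts that are present; None if every amount is missing."""
    present = [a for a in amounts if a is not None]
    return sum(present) if present else None


def derive_paid_by_patient(copay, coinsurance, deductible):
    # paid_by_patient = paid_patient_copay + paid_patient_coinsurance
    #                   + paid_patient_deductible
    return _sum_present(copay, coinsurance, deductible)


def derive_paid_by_payer(reported, ingredient_cost, dispensing_fee):
    # When either pharmacy component is populated with a nonzero value,
    # paid_by_payer = paid_ingredient_cost + paid_dispensing_fee.
    # Otherwise the single payment reported in the source data is kept.
    if (ingredient_cost or 0) != 0 or (dispensing_fee or 0) != 0:
        return (ingredient_cost or 0) + (dispensing_fee or 0)
    return reported


def derive_total_paid(paid_by_payer, paid_by_patient, paid_by_primary):
    # total_paid = paid_by_payer + paid_by_patient + paid_by_primary
    return _sum_present(paid_by_payer, paid_by_patient, paid_by_primary)


# Illustrative pharmacy claim line with made-up amounts.
patient = derive_paid_by_patient(10.0, 5.0, 25.0)
payer = derive_paid_by_payer(None, 70.0, 2.5)
print(derive_total_paid(payer, patient, None))
```

The model stores these amounts as `float`; an ETL that needs exact currency arithmetic may prefer a decimal type before loading.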
206 | 
207 | ### [costs](#costs)
208 | 
209 | - Used to capture all non-reimbursement costs
210 | - Examples include cost-to-charge ratio, calculated cost (for situations where the ETL process calculates a cost based on the available data), reported cost (where the ETL process imputes a cost from another source), and other cost types that may become apparent with more use cases.
211 | 
212 | column                | type   | description                                                                                                                                                                                                                          | foreign key (FK)                                                      | required
213 | ----------------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
214 | id                    | serial | Surrogate key for record                                                                                                                                                                                                             |                                                                       | x
215 | context_id            | bigint | FK reference to context table                                                                                                                                                                                                        | [contexts](#contexts)                                                 | x
216 | patient_id            | bigint | FK reference to [patients](#patients) table                                                                                                                                                                                          | [patients](#patients)                                                 | x
217 | clinical_code_id      | bigint | FK reference to [clinical_codes](#clinical_codes) table to be used if a specific code is the direct cause for the cost                                                                                                               | [clinical_codes](#clinical_codes)                                     |
218 | currency_concept_id   | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the 3-letter code used to delineate international currencies (e.g., USD = US Dollar)                                                 | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) | x
219 | cost_base             | text   | Defines the basis for the cost in the table (e.g., 2013 for a specific cost-to-charge ratio, or a specific cost from an external cost                                                                                                |                                                                       | x
220 | value                 | float  | Cost value                                                                                                                                                                                                                           |                                                                       | x
221 | value_type_concept_id | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table to concept that defines the type of economic information in the value field (e.g., cost-to-charge ratio, calculated cost, reported cost) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) | x
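
One way an ETL process might produce a calculated cost record is to multiply a charge by a cost-to-charge ratio, which is the usual way such a ratio is applied. The sketch below is only an illustration: `build_calculated_cost` is a hypothetical helper, and the concept IDs and `cost_base` label passed in the usage lines are placeholders, not real vocabulary entries.

```python
def build_calculated_cost(context_id, patient_id, total_charged,
                          cost_to_charge_ratio, cost_base,
                          currency_concept_id, value_type_concept_id):
    """Hypothetical helper: build a costs row from a charge and a ratio."""
    return {
        "context_id": context_id,
        "patient_id": patient_id,
        "currency_concept_id": currency_concept_id,
        "cost_base": cost_base,
        # Assumption: estimated cost = charge * cost-to-charge ratio.
        "value": total_charged * cost_to_charge_ratio,
        "value_type_concept_id": value_type_concept_id,
    }


# Placeholder concept IDs (0) and a made-up cost_base label.
record = build_calculated_cost(
    context_id=1, patient_id=10, total_charged=1000.0,
    cost_to_charge_ratio=0.5, cost_base="2013",
    currency_concept_id=0, value_type_concept_id=0,
)
print(record["value"])
```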
222 | 
223 | ### [addresses](#addresses)
224 | 
225 | - Used to store location information for [patients](#patients), [practitioners](#practitioners), and [facilities](#facilities)
226 | - One record for each geographic location in the data
227 | 
228 | column       | type   | description                                                                                             | foreign key (FK) | required
229 | -------------|--------|---------------------------------------------------------------------------------------------------------|------------------|---------
230 | id           | serial | Surrogate key for record                                                                                |                  | x
231 | address_1    | text   | Typically used for street address                                                                       |                  |
232 | address_2    | text   | Typically used for additional detail such as building, suite, floor, etc.                               |                  |
233 | city         | text   | The city field as it appears in the source data                                                         |                  |
234 | state        | text   | The state field as it appears in the source data                                                        |                  |
235 | zip          | text   | The zip or postal code                                                                                  |                  |
236 | county       | text   | The county, if available                                                                                |                  |
237 | census_tract | text   | The census tract if available                                                                           |                  |
238 | hsa          | text   | The Health Service Area, if available (originally defined by the National Center for Health Statistics) |                  |
239 | country      | text   | The country if necessary                                                                                |                  |
240 | 
241 | ### [deaths](#deaths)
242 | 
243 | - Stores mortality information including date of death and cause(s) of death
244 | - Commonly populated from beneficiary or similar administrative data associated with the medical record
245 | - Deaths identified from diagnosis codes or discharge status need not be stored here, since such records are in the [clinical_codes](#clinical_codes) and [admission_details](#admission_details) tables and can be queried separately
246 | 
247 | column                | type   | description                                                                                                                                                 | foreign key (FK)                                                      | required
248 | ----------------------|--------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
249 | id                    | serial | Surrogate key for record                                                                                                                                    |                                                                       | x
250 | patient_id            | bigint | FK reference to [patients](#patients) table                                                                                                                 | [patients](#patients)                                                 | x
251 | date                  | date   | Date of death (yyyy-mm-dd)                                                                                                                                  |                                                                       | x
252 | cause_concept_id      | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for cause of death (typically ICD-9 or ICD-10 code)             | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
253 | cause_type_concept_id | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table for the type of cause of death (e.g., primary, secondary, etc.) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
254 | practitioner_id       | bigint | FK reference to [practitioners](#practitioners) table                                                                                                       | [practitioners](#practitioners)                                       |
255 | 
256 | ### [information_periods](#information_periods)
257 | 
258 | - Captures periods for which information in each table is relevant for each person
259 | - Could include enrollment types (e.g., Part A, Part B, HMO) or just "observable" (as with up-to-standard data in CPRD)
260 | - One row per patient per non-overlapping enrollment/information period type
261 | 
262 | column                      | type   | description                                                                                                                                                                                  | foreign key (FK)                                                      | required
263 | ----------------------------|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
264 | id                          | serial | Surrogate key for record                                                                                                                                                                     |                                                                       | x
265 | patient_id                  | bigint | FK reference to [patients](#patients) table                                                                                                                                                  | [patients](#patients)                                                 | x
266 | start_date                  | date   | Start date of record (yyyy-mm-dd)                                                                                                                                                            |                                                                       | x
267 | end_date                    | date   | End date of record (yyyy-mm-dd)                                                                                                                                                              |                                                                       | x
268 | information_type_concept_id | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the information type (e.g., insurance coverage, hospital data, up-to-standard date) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) | x
269 | 
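The "one row per patient per non-overlapping period" rule above implies that overlapping source periods of the same type are collapsed before loading. The following Ruby sketch shows one way that could be done on in-memory rows. The helper name `merge_information_periods`, the hash layout, and the concept IDs `100` and `200` are illustrative assumptions, not part of GDM or its ETL tooling.

```ruby
require "date"

# Hypothetical helper: collapses overlapping periods that share a patient_id
# and information_type_concept_id into single non-overlapping periods.
def merge_information_periods(periods)
  periods
    .group_by { |p| [p[:patient_id], p[:information_type_concept_id]] }
    .flat_map do |_key, group|
      group.sort_by { |p| p[:start_date] }.each_with_object([]) do |period, merged|
        last = merged.last
        if last && period[:start_date] <= last[:end_date]
          # Overlaps the previous period: extend that period's end date.
          last[:end_date] = [last[:end_date], period[:end_date]].max
        else
          # No overlap: start a new period (dup so the input is not mutated).
          merged << period.dup
        end
      end
    end
end

# Made-up rows; concept IDs 100 and 200 are placeholders.
raw_periods = [
  { patient_id: 1, information_type_concept_id: 100,
    start_date: Date.new(2020, 1, 1), end_date: Date.new(2020, 6, 30) },
  { patient_id: 1, information_type_concept_id: 100,
    start_date: Date.new(2020, 6, 1), end_date: Date.new(2020, 12, 31) },
  { patient_id: 1, information_type_concept_id: 200,
    start_date: Date.new(2020, 3, 1), end_date: Date.new(2020, 3, 31) },
]

merged_periods = merge_information_periods(raw_periods)
merged_periods.each do |p|
  puts "#{p[:patient_id]} #{p[:information_type_concept_id]} #{p[:start_date]}..#{p[:end_date]}"
end
```

Periods of different types for the same patient are kept separate, so the two type-`100` rows collapse into one while the type-`200` row is untouched. Whether adjacent (abutting) periods should also be merged is left open here.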
270 | ### [admission_details](#admission_details)
271 | 
272 | - Captures details about admissions and emergency department encounters that cannot be stored in the [clinical_codes](#clinical_codes), [contexts](#contexts), or [collections](#collections) tables
273 | - One row per admission
274 | - Each admission record in the [collections](#collections) table will link to this table
275 | 
276 | column                        | type   | description                                                                                                                                                              | foreign key (FK)                                                      | required
277 | ------------------------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------
278 | id                            | serial | Surrogate key for record                                                                                                                                                 |                                                                       | x
279 | patient_id                    | bigint | FK reference to [patients](#patients) table                                                                                                                              | [patients](#patients)                                                 | x
280 | admission_date                | date   | Date of admission (yyyy-mm-dd)                                                                                                                                           |                                                                       | x
281 | discharge_date                | date   | Date of discharge (yyyy-mm-dd)                                                                                                                                           |                                                                       | x
282 | admit_source_concept_id       | bigint | Database-specific code indicating source of admission (e.g., ER visit, transfer, etc.)                                                                                   | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
283 | discharge_location_concept_id | bigint | Database-specific code indicating discharge location (e.g., death, home, transfer, long-term care, etc.)                                                                 | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
284 | admission_type_concept_id     | bigint | FK reference to [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) table representing the type of admission the record is (Emergency, Elective, etc.) | [concept](https://ohdsi.github.io/CommonDataModel/cdm54.html#CONCEPT) |
285 | 
286 | ### [etl_info](#etl_info)
287 | 
288 | - Basic attribute value table for storing information about the ETL
289 | - Currently used to store all arguments used during a setlr run
290 | 
291 | column | type | description        | foreign key (FK) | required
292 | -------|------|--------------------|------------------|---------
293 | key    | text | Name for the value |                  | x
294 | value  | text | Value              |                  |
295 | 
296 | ### [etl_information_types](#etl_information_types)
297 | 
298 | - Lists all the information types present in [`information_periods`](#information_periods)`.information_type_concept_id`
299 | - Gives a sense of what information types are present in the dataset and how common they are
300 | 
301 | column           | type   | description                                                          | foreign key (FK) | required
302 | -----------------|--------|----------------------------------------------------------------------|------------------|---------
303 | information_type | text   | Information type                                                     |                  | x
304 | n_rows           | bigint | Number of occurrences in [information_periods](#information_periods) |                  | x
305 | 
306 | ### [etl_tables](#etl_tables)
307 | 
308 | - Lists all tables ETL'd for the dataset along with row counts, patient counts, and min/max dates
309 | - Gives a sense of what tables are present in the dataset and how large each is
310 | 
311 | column        | type   | description                        | foreign key (FK) | required
312 | --------------|--------|------------------------------------|------------------|---------
313 | table_name    | text   | Name of table                      |                  | x
314 | n_rows        | bigint | Number of rows in table            |                  | x
315 | n_patients    | bigint | Number of unique patients in table |                  |
316 | earliest_date | date   | Earliest date found in table       |                  |
317 | latest_date   | date   | Latest date found in table         |                  |
318 | 
319 | ### [etl_vocabulary_ids](#etl_vocabulary_ids)
320 | 
321 | - Lists all the vocabularies present in [`clinical_codes`](#clinical_codes)`.clinical_code_vocabulary_id`
322 | - Gives a sense of what vocabularies are used in the dataset and how common they are
323 | 
324 | column        | type   | description                                                | foreign key (FK)                                                            | required
325 | --------------|--------|------------------------------------------------------------|-----------------------------------------------------------------------------|---------
326 | vocabulary_id | text   | Vocabulary ID present                                      | [vocabulary](https://ohdsi.github.io/CommonDataModel/cdm54.html#VOCABULARY) | x
327 | n_rows        | bigint | Number of occurrences in [clinical_codes](#clinical_codes) |                                                                             | x
328 | 
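As a rough illustration of what `etl_vocabulary_ids` holds, the Ruby sketch below tallies vocabulary IDs over a few made-up `clinical_codes` rows. The rows and the variable names are assumptions for illustration only; the actual ETL may well compute these counts in SQL instead.

```ruby
# Hypothetical in-memory rows standing in for the clinical_codes table.
clinical_codes = [
  { id: 1, clinical_code_vocabulary_id: "ICD10CM" },
  { id: 2, clinical_code_vocabulary_id: "ICD10CM" },
  { id: 3, clinical_code_vocabulary_id: "CPT4" },
]

# One output row per vocabulary, with the number of clinical_codes rows using it.
etl_vocabulary_ids = clinical_codes
  .map { |row| row[:clinical_code_vocabulary_id] }
  .tally
  .map { |vocabulary_id, n_rows| { vocabulary_id: vocabulary_id, n_rows: n_rows } }

etl_vocabulary_ids.each { |row| puts "#{row[:vocabulary_id]}: #{row[:n_rows]}" }
```

The same pattern, applied to `information_periods.information_type_concept_id`, would give the shape of `etl_information_types`.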


--------------------------------------------------------------------------------
/check_readme.sh:
--------------------------------------------------------------------------------
1 | bundle && bundle exec mdl --style style.rb README.md && echo "Looks good!"
2 | 


--------------------------------------------------------------------------------
/convert_to_schema.rb:
--------------------------------------------------------------------------------
  1 | require "pathname"
  2 | require "stringio"
  3 | 
  4 | table = nil
  5 | collect = false
  6 | schema = {}
  7 | 
  8 | artifacts_dir = Pathname.new("artifacts") + "schemas" + "gdm"
  9 | artifacts_dir.mkpath
 10 | 
 11 | class SequelMigrationIO
 12 |   attr_reader :indent, :io
 13 | 
 14 |   def initialize
 15 |     @io = StringIO.new
 16 |     @indent = 0
 17 |     open_block "Sequel.migration do"
 18 |     open_block "change do"
 19 |   end
 20 | 
 21 |   def puts(*args)
 22 |     io.print(" " * indent)
 23 |     io.puts(*args)
 24 |   end
 25 | 
 26 |   def close_block(opts = {})
 27 |     decrease_indent
 28 |     puts("end")
 29 |     unless opts[:no_blank]
 30 |       io.puts
 31 |     end
 32 |   end
 33 | 
 34 |   def open_block(*args)
 35 |     puts(*args)
 36 |     increase_indent
 37 |   end
 38 | 
 39 |   def increase_indent
 40 |     @indent += 2
 41 |   end
 42 | 
 43 |   def decrease_indent
 44 |     @indent -= 2
 45 |   end
 46 | 
 47 |   def finish
 48 |     while(@indent > 0)
 49 |       close_block(no_blank: true)
 50 |     end
 51 |   end
 52 | 
 53 |   def string
 54 |     io.string
 55 |   end
 56 | end
 57 | 
 58 | def extract(link)
 59 |   return nil if link.nil? || link.empty?
 60 |   return "contexts_practitioners" if link =~ /contexts/ && link =~ /practitioners/
 61 |   md = /\[(.+)\]/.match(link)
 62 |   md.to_a[1] if md
 63 | end
 64 | 
 65 | def is_primary?(column, type)
 66 |   type.to_sym == :serial || column.to_sym == :id
 67 | end
 68 | 
 69 | def convert(name, type)
 70 |   opts = {}
 71 | 
 72 |   db_type = case type.to_sym
 73 |   when :int
 74 |     "Integer"
 75 |   when :text
 76 |     opts[:text] = true
 77 |     "String"
 78 |   when :bigint, :serial
 79 |     opts[:type] = :Bigint
 80 |     "Integer"
 81 |   when :float
 82 |     "Float"
 83 |   when :date
 84 |     "Date"
 85 |   when :boolean
 86 |     "TrueClass"
 87 |   else
 88 |     raise "Unknown type #{type}"
 89 |   end
 90 | 
 91 |   if is_primary?(name, type)
 92 |     if opts[:text]
 93 |       opts[:primary_key] = true
 94 |     else
 95 |       opts[:type] = :Bigint
 96 |       db_type = "primary_key"
 97 |     end
 98 |   end
 99 | 
100 |   [db_type, opts]
101 | end
102 | 
103 | schema_io = SequelMigrationIO.new
104 | indexes = {
105 |   patients: [
106 |     :gender_concept_id,
107 |     :race_concept_id,
108 |     :ethnicity_concept_id,
109 |   ],
110 |   patient_details: [
111 |     :patient_id
112 |   ],
113 |   practitioners: [
114 |     :specialty_concept_id
115 |   ],
116 |   collections: [
117 |     :patient_id
118 |   ],
119 |   contexts_practitioners: [
120 |     :specialty_concept_id
121 |   ],
122 |   contexts: [
123 |     :collection_id,
124 |     :source_type_concept_id
125 |   ],
126 |   clinical_codes: [
127 |     :patient_id,
128 |     :clinical_code_concept_id
129 |   ],
130 |   costs: [
131 |     :patient_id
132 |   ],
133 |   deaths: [
134 |     :patient_id
135 |   ],
136 |   information_periods: [
137 |     :patient_id,
138 |     :information_type_concept_id
139 |   ],
140 | }
141 | 
142 | table = nil
143 | 
144 | def apply_indexes(io, table, indexes)
145 |   return # Early return: index generation is currently disabled, so the code below never runs
146 |   (indexes || []).each do |index|
147 |     columns_str = "[ " + Array(index).map do |col|
148 |       case col
149 |       when Symbol
150 |         col.inspect
151 |       else
152 |         col
153 |       end
154 |     end.join(", ") + " ]"
155 |     io.puts "add_index #{table.inspect}, #{columns_str}"
156 |   end
157 | end
158 | 
159 | File.foreach('README.md') do |line|
160 |   line.chomp!
161 |   case line
162 |   when /^\###\s*(.+)/
163 |     table = extract(Regexp.last_match.to_a.last).to_sym
164 |     schema_io.open_block "create_table(#{table.inspect}) do"
165 |     next
166 |   when /-{4,}/
167 |     collect = true
168 |     next
169 |   when ''
170 |     if collect
171 |       apply_indexes(schema_io, table, indexes[table])
172 |       schema_io.close_block
173 |     end
174 |     collect = false
175 |   end
176 | 
177 |   if collect
178 |     line.gsub!(/(^\||\|$)/, '')
179 |     name, type, comment, foreign_key, required = line.split("|").map(&:strip)
180 |     #p [name, type, comment, foreign_key, required]
181 |     name = name.to_sym
182 |     type = type.to_sym
183 |     foreign_key = extract(foreign_key)
184 |     type, column_opts = *convert(name, type)
185 |     column_opts.merge!(comment: comment)
186 |     column_opts.merge!(null: false) if required && !required.strip.empty?
187 |     if foreign_key
188 |       fk_col = :id
189 |       fk_type = "Bigint".to_sym
190 |       if %w(concept vocabulary).include?(foreign_key)
191 |         fk_col = "#{foreign_key}_id".to_sym
192 |         fk_type = "String" if foreign_key == "vocabulary"
193 |       end
194 |       schema_io.puts "foreign_key #{name.inspect}, #{foreign_key.to_sym.inspect}, #{column_opts.merge(type: fk_type, key: fk_col).inspect}"
195 |     else
196 |       schema_io.puts "#{type} #{name.inspect}#{column_opts.empty? ? '' : ", #{column_opts.inspect}"}"
197 |     end
198 |   end
199 | end
200 | 
201 | apply_indexes(schema_io, table, indexes[table])
202 | schema_io.finish
203 | 
204 | File.write(artifacts_dir + "schema.rb", schema_io.string)
205 | 


--------------------------------------------------------------------------------
/converter.rb:
--------------------------------------------------------------------------------
 1 | require 'psych'
 2 | require 'pathname'
 3 | require 'csv'
 4 | 
 5 | table = nil
 6 | collect = false
 7 | schema = {}
 8 | 
 9 | artifacts_dir = Pathname.new("artifacts") + "schemas" + "gdm"
10 | artifacts_dir.mkpath
11 | 
12 | def extract(link)
13 |   return nil if link.nil? || link.empty?
14 |   return "contexts_practitioners" if link =~ /contexts/ && link =~ /practitioners/
15 |   md = /\[(.+)\]/.match(link)
16 |   md.to_a[1] if md
17 | end
18 | 
19 | def is_primary?(column, type)
20 |   type.to_sym == :serial || column.to_sym == :id
21 | end
22 | 
23 | def convert(name, type)
24 |   db_type = case type.to_sym
25 |   when :int
26 |     "Integer"
27 |   when :text
28 |     "String"
29 |   when :bigint, :serial
30 |     :Bigint
31 |   when :float
32 |     "Float"
33 |   when :date
34 |     "Date"
35 |   when :boolean
36 |     "TrueClass"
37 |   else
38 |     raise "Unknown type #{type}"
39 |   end
40 | 
41 |   result = { type: db_type }
42 |   result.merge!(primary_key: true) if is_primary?(name, type)
43 |   result
44 | end
45 | 
46 | CSV.open(artifacts_dir + "schema.csv", "w") do |csv|
47 |   csv << %w(table column type comment foreign_key required)
48 |   File.foreach('README.md') do |line|
49 |     line.chomp!
50 |     case line
51 |     when /^\###\s*(.+)/
52 |       table = extract(Regexp.last_match.to_a.last).to_sym
53 |       next
54 |     when /-{4,}/
55 |       collect = true
56 |       next
57 |     when ''
58 |       collect = false
59 |     end
60 |     if collect
61 |       line.gsub!(/(^\||\|$)/, '')
62 |       name, type, comment, foreign_key, required = line.split("|").map(&:strip)
63 |       #p [name, type, comment, foreign_key, required]
64 |       name = name.to_sym
65 |       type = type.to_sym
66 |       foreign_key = extract(foreign_key)
67 |       csv << [table, name, type, comment, foreign_key, required]
68 |       schema[table] ||= { columns: {} }
69 |       schema[table][:columns][name] = convert(name, type).merge(comment: comment)
70 |       schema[table][:columns][name].merge!(foreign_key: foreign_key) unless foreign_key.nil? || foreign_key.empty?
71 |       schema[table][:columns][name].merge!(null: false) if required && !required.strip.empty?
72 |     end
73 |   end
74 | end
75 | 
76 | arrayed_schema = schema.map do |table_name, table|
77 |   columns = table[:columns].map do |column_name, column|
78 |     { name: column_name }.merge(column)
79 |   end
80 |   { name: table_name, columns: columns}
81 | end
82 | File.write(artifacts_dir + "schema.yml", schema.to_yaml)
83 | File.write(artifacts_dir + "schema_arrayed.yml", arrayed_schema.to_yaml)
84 | 


--------------------------------------------------------------------------------
/dropbox-deployment.yml:
--------------------------------------------------------------------------------
1 | deploy:
2 |   dropbox_path: /zOI (GDM) Generalized Data Model # The path to the folder on Dropbox where the files will go
3 |   artifacts_path: artifacts/gdm_schemas.tbz # can be a single file, or a path
4 |   debug: true # if you want to see more logs
5 | 


--------------------------------------------------------------------------------
/generate_wide_tables.sql:
--------------------------------------------------------------------------------
 1 | DROP TABLE IF EXISTS observations;
 2 | CREATE TABLE observations AS
 3 | 	SELECT
 4 | 		cc.id,
 5 | 		cc.patient_id,
 6 | 		cc.start_date,
 7 | 		cc.end_date,
 8 | 		cc.clinical_code_concept_id,
 9 | 		cc.quantity,
10 | 		cc.seq_num,
11 | 		cc.provenance_concept_id,
12 | 		cc.clinical_code_source_value,
13 | 		cc.clinical_code_vocabulary_id,
14 | 		ctx.id AS context_id,
15 | 		ctx.start_date AS context_start_date,
16 | 		ctx.end_date AS context_end_date,
17 | 		ctx.facility_id AS context_facility_id,
18 | 		ctx.care_site_type_concept_id,
19 | 		ctx.pos_concept_id,
20 | 		ctx.source_type_concept_id,
21 | 		ctx.service_specialty_type_concept_id,
22 | 		ctx.record_type_concept_id,
23 | 		col.id AS collection_id,
24 | 		col.start_date AS collection_start_date,
25 | 		col.end_date AS collection_end_date,
26 | 		col.duration,
27 | 		col.duration_unit_concept_id,
28 | 		col.facility_id AS collection_facility_id,
29 | 		col.collection_type_concept_id,
30 | 		ded.id AS drug_exposure_detail_id,
31 | 		ded.refills,
32 | 		ded.days_supply,
33 | 		ded.number_per_day,
34 | 		ded.dose_form_concept_id,
35 | 		ded.dose_unit_concept_id,
36 | 		ded.route_concept_id,
37 | 		ded.dose_value,
38 | 		ded.strength_source_value,
39 | 		ded.ingredient_source_value,
40 | 		ded.drug_name_source_value,
41 | 		ad.id AS admission_detail_id,
42 | 		ad.admission_date AS admit_admission_date,
43 | 		ad.discharge_date AS admit_discharge_date,
44 | 		ad.admit_source_concept_id,
45 | 		ad.discharge_location_concept_id,
46 | 		ad.admission_type_concept_id,
47 | 		md.id AS measurement_detail_id,
48 | 		md.result_as_number,
49 | 		md.result_as_string,
50 | 		md.result_as_concept_id,
51 | 		md.result_modifier_concept_id,
52 | 		md.unit_concept_id,
53 | 		md.normal_range_low,
54 | 		md.normal_range_high,
55 | 		md.normal_range_low_modifier_concept_id,
56 | 		md.normal_range_high_modifier_concept_id
57 | 	FROM clinical_codes AS cc
58 | 	LEFT JOIN contexts AS ctx ON (cc.context_id = ctx.id)
59 | 	LEFT JOIN collections AS col ON (cc.collection_id = col.id)
60 | 	LEFT JOIN drug_exposure_details AS ded ON (cc.drug_exposure_detail_id = ded.id)
61 | 	LEFT JOIN admission_details AS ad ON (col.admission_detail_id = ad.id)
62 | 	LEFT JOIN measurement_details AS md ON (cc.measurement_detail_id = md.id)
63 | ;
64 | CREATE INDEX ON observations (clinical_code_vocabulary_id, clinical_code_concept_id, patient_id);
65 | CLUSTER observations USING observations_clinical_code_vocabulary_id_clinical_code_conc_idx;
66 | CREATE INDEX ON observations (patient_id, clinical_code_concept_id, start_date);
67 | CREATE INDEX ON observations (patient_id, start_date, clinical_code_concept_id);
68 | CREATE INDEX ON observations (clinical_code_concept_id, patient_id, start_date);
69 | CREATE INDEX ON observations (clinical_code_concept_id, context_id);
70 | CREATE INDEX ON observations (provenance_concept_id, source_type_concept_id);
71 | CREATE INDEX ON observations (context_id);
72 | VACUUM ANALYZE observations;
73 | 
74 | DROP TABLE IF EXISTS supplemented_payer_reimbursements;
75 | CREATE TABLE supplemented_payer_reimbursements AS
76 | 	SELECT
77 | 		pr.*,
78 | 		cc.collection_id AS collection_id,
79 | 		cc.clinical_code_concept_id,
80 | 		cc.clinical_code_source_value,
81 | 		cc.clinical_code_vocabulary_id,
82 | 		ctx.start_date,
83 | 		ctx.end_date,
84 | 		ctx.source_type_concept_id,
85 | 		ctx.record_type_concept_id
86 | 	FROM payer_reimbursements AS pr
87 | 	LEFT JOIN clinical_codes AS cc ON (cc.id = pr.clinical_code_id)
88 | 	LEFT JOIN contexts AS ctx ON (ctx.id = pr.context_id)
89 | ;
90 | CREATE INDEX ON supplemented_payer_reimbursements (patient_id, start_date);
91 | CLUSTER supplemented_payer_reimbursements USING supplemented_payer_reimbursements_patient_id_start_date_idx;


--------------------------------------------------------------------------------
/style.rb:
--------------------------------------------------------------------------------
1 | all
2 | exclude_rule 'MD013'
3 | 


--------------------------------------------------------------------------------
/wide.md:
--------------------------------------------------------------------------------
  1 | # Generalized Data Model Wide Format (GDM Wide)
  2 | 
  3 | There are situations where denormalized tables perform better than normalized tables.
  4 | 
  5 | We are investigating possible approaches to denormalizing GDM, referring to it as GDM Wide.
  6 | 
  7 | ## Denormalizing `clinical_codes`
  8 | 
  9 | GDM lends itself nicely to denormalization.  Because each `clinical_codes` record references at most one row in each related table (one-to-one or many-to-one relationships), we can quickly generate a denormalized table that includes all the details about a `clinical_codes` record without producing duplicate rows.
 10 | 
 11 | To avoid naming conflicts, we call this denormalized table `observations`.  We can generate `observations` using the following PostgreSQL query:
 12 | 
 13 | ```sql
 14 | DROP TABLE IF EXISTS observations;
 15 | CREATE TABLE observations AS
 16 | 	SELECT
 17 | 		cc.id,
 18 | 		cc.patient_id,
 19 | 		cc.start_date,
 20 | 		cc.end_date,
 21 | 		cc.clinical_code_concept_id,
 22 | 		cc.quantity,
 23 | 		cc.seq_num,
 24 | 		cc.provenance_concept_id,
 25 | 		cc.clinical_code_source_value,
 26 | 		cc.clinical_code_vocabulary_id,
 27 | 		ctx.id AS context_id,
 28 | 		ctx.start_date AS context_start_date,
 29 | 		ctx.end_date AS context_end_date,
 30 | 		ctx.facility_id AS context_facility_id,
 31 | 		ctx.care_site_type_concept_id,
 32 | 		ctx.pos_concept_id,
 33 | 		ctx.source_type_concept_id,
 34 | 		ctx.service_specialty_type_concept_id,
 35 | 		ctx.record_type_concept_id,
 36 | 		col.id AS collection_id,
 37 | 		col.start_date AS collection_start_date,
 38 | 		col.end_date AS collection_end_date,
 39 | 		col.duration,
 40 | 		col.duration_unit_concept_id,
 41 | 		col.facility_id AS collection_facility_id,
 42 | 		col.collection_type_concept_id,
 43 | 		ded.id AS drug_exposure_detail_id,
 44 | 		ded.refills,
 45 | 		ded.days_supply,
 46 | 		ded.number_per_day,
 47 | 		ded.dose_form_concept_id,
 48 | 		ded.dose_unit_concept_id,
 49 | 		ded.route_concept_id,
 50 | 		ded.dose_value,
 51 | 		ded.strength_source_value,
 52 | 		ded.ingredient_source_value,
 53 | 		ded.drug_name_source_value,
 54 | 		ad.id AS admission_detail_id,
 55 | 		ad.admission_date AS admit_admission_date,
 56 | 		ad.discharge_date AS admit_discharge_date,
 57 | 		ad.admit_source_concept_id,
 58 | 		ad.discharge_location_concept_id,
 59 | 		ad.admission_type_concept_id,
 60 | 		md.id AS measurement_detail_id,
 61 | 		md.result_as_number,
 62 | 		md.result_as_string,
 63 | 		md.result_as_concept_id,
 64 | 		md.result_modifier_concept_id,
 65 | 		md.unit_concept_id,
 66 | 		md.normal_range_low,
 67 | 		md.normal_range_high,
 68 | 		md.normal_range_low_modifier_concept_id,
 69 | 		md.normal_range_high_modifier_concept_id
 70 | 	FROM clinical_codes AS cc
 71 | 	LEFT JOIN contexts AS ctx ON (cc.context_id = ctx.id)
 72 | 	LEFT JOIN collections AS col ON (cc.collection_id = col.id)
 73 | 	LEFT JOIN drug_exposure_details AS ded ON (cc.drug_exposure_detail_id = ded.id)
 74 | 	LEFT JOIN admission_details AS ad ON (col.admission_detail_id = ad.id)
 75 | 	LEFT JOIN measurement_details AS md ON (cc.measurement_detail_id = md.id)
 76 | ;
 77 | CREATE INDEX ON observations (clinical_code_vocabulary_id, clinical_code_concept_id, patient_id);
 78 | CLUSTER observations USING observations_clinical_code_vocabulary_id_clinical_code_conc_idx;
 79 | CREATE INDEX ON observations (patient_id, clinical_code_concept_id, start_date);
 80 | CREATE INDEX ON observations (patient_id, start_date, clinical_code_concept_id);
 81 | CREATE INDEX ON observations (clinical_code_concept_id, patient_id, start_date);
 82 | CREATE INDEX ON observations (clinical_code_concept_id, context_id);
 83 | CREATE INDEX ON observations (provenance_concept_id, source_type_concept_id);
 84 | CREATE INDEX ON observations (context_id);
 85 | VACUUM ANALYZE observations;
 86 | ```
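The reason the joins above add columns but never rows can be checked on toy data. This Ruby sketch mimics one `LEFT JOIN` from `clinical_codes` to `contexts` using in-memory hashes. The rows and the concept ID `500` are made-up assumptions, and only one joined column is carried over to keep the example short.

```ruby
# Made-up rows: two clinical codes share a context, one has no context at all.
clinical_codes = [
  { id: 1, context_id: 10 },
  { id: 2, context_id: 10 },
  { id: 3, context_id: nil },
]
contexts = [{ id: 10, source_type_concept_id: 500 }]

# contexts.id is unique, so each clinical code matches at most one context.
contexts_by_id = contexts.to_h { |ctx| [ctx[:id], ctx] }

# Equivalent of LEFT JOIN contexts AS ctx ON (cc.context_id = ctx.id):
# unmatched rows are kept and their context columns are nil.
observations = clinical_codes.map do |cc|
  ctx = contexts_by_id[cc[:context_id]] || {}
  cc.merge(source_type_concept_id: ctx[:source_type_concept_id])
end

puts "clinical_codes: #{clinical_codes.size}, observations: #{observations.size}"
```

Because every join key on the right-hand side is a primary key, the row count of `observations` always equals the row count of `clinical_codes`.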
 87 | 
 88 | The `observations` table eliminates the need for the following tables:
 89 | 
 90 | - `clinical_codes`
 91 | - `contexts`
 92 | - `collections`
 93 | - `drug_exposure_details`
 94 | - `admission_details`
 95 | - `measurement_details`
 96 | 
 97 | We could further denormalize the table by appending all columns from `patients`, but that seemed excessive because we rarely need demographic information when querying `clinical_codes`-related information.  There are still other tables that we could join, but again, we rarely query or otherwise use these tables in our analyses.
 98 | 
 99 | ## Supplementing `payer_reimbursements`
100 | 
101 | Another table we frequently join on is `payer_reimbursements`.  We took the opportunity to supplement the `payer_reimbursements` table with the fields we query on most often, creating the `supplemented_payer_reimbursements` table.  We can generate it using the following PostgreSQL query:
102 | 
103 | ```sql
104 | DROP TABLE IF EXISTS supplemented_payer_reimbursements;
105 | CREATE TABLE supplemented_payer_reimbursements AS
106 | 	SELECT
107 | 		pr.*,
108 | 		cc.collection_id AS collection_id,
109 | 		cc.clinical_code_concept_id,
110 | 		cc.clinical_code_source_value,
111 | 		cc.clinical_code_vocabulary_id,
112 | 		ctx.start_date,
113 | 		ctx.end_date,
114 | 		ctx.source_type_concept_id,
115 | 		ctx.record_type_concept_id
116 | 	FROM payer_reimbursements AS pr
117 | 	LEFT JOIN clinical_codes AS cc ON (cc.id = pr.clinical_code_id)
118 | 	LEFT JOIN contexts AS ctx ON (ctx.id = pr.context_id)
119 | ;
120 | CREATE INDEX ON supplemented_payer_reimbursements (patient_id, start_date);
121 | CLUSTER supplemented_payer_reimbursements USING supplemented_payer_reimbursements_patient_id_start_date_idx;
122 | ```
123 | 
124 | ## Versions
125 | 
126 | ### 1.0.0 - 2024-02-01
127 | 
128 | - Original implementation
129 | 
130 | ### 1.0.1 - 2024-02-17
131 | 
132 | - Fix bug where collections were incorrectly joined via cc.context_id instead of cc.collection_id
133 | 
134 | ### 1.1.0 - 2025-04-24
135 | 
136 | - Include foreign keys in `observations`
137 | 	- Provides helpful information for some [ConceptQL](https://github.com/outcomesinsights/conceptql) operators


--------------------------------------------------------------------------------