├── .github ├── ISSUE_TEMPLATE │ ├── issue_template.md │ └── proposal_template.md └── pull_request_template.md ├── .gitignore ├── LICENSE.md ├── README.md ├── docs ├── examples │ ├── README.md │ ├── maintenance_inhaler │ │ └── settings.yaml │ └── rescue_inhaler │ │ └── settings.yaml └── validation │ └── MDT Validation Notebook.ipynb ├── setup.py └── src └── mdt ├── __init__.py ├── cli.py ├── database.py ├── fda ├── __init__.py └── utils.py ├── meps ├── __init__.py ├── columns.py ├── sql │ ├── __init__.py │ └── meps_reference.sql └── utils.py ├── run_mdt.py ├── rxnorm ├── __init__.py ├── rxclass.py ├── sql │ ├── __init__.py │ ├── dfg_df.sql │ └── rxcui_ndc.sql └── utils.py ├── sql ├── __init__.py └── meps_rx_qty_ds.sql ├── utils.py └── yamlmanager.py /.github/ISSUE_TEMPLATE/issue_template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Issue 3 | about: Create a new issue 4 | --- 5 | # Problem Statement 6 | [What needs to be done and why] 7 | 8 | # Criteria for Success 9 | [Measurable outcome if possible] 10 | 11 | # Additional Information 12 | [ways one might accomplish this task, links, documentation, alternatives, etc.] 13 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/proposal_template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Proposal 3 | about: Propose a new feature or another change not related to a specific issue 4 | --- 5 | 6 | # Proposal 7 | [What is the idea] 8 | 9 | # Rationale 10 | [Why should this be implemented] 11 | -------------------------------------------------------------------------------- /.github/pull_request_template.md: -------------------------------------------------------------------------------- 1 | Fixes coderxio/medication-diversification#ISSUE NUMBER 2 | 3 | ## Explanation 4 | [What did you change?]
5 | 6 | ## Rationale 7 | [Why did you make the changes mentioned above? What alternatives did you consider?] 8 | 9 | ## Tests 10 | 1. What testing did you do? 11 | 1. Attach testing logs inside a summary block: 12 | 13 | <details>
14 | <summary>testing logs</summary> 15 | 16 | ``` 17 | 18 | ``` 19 | </details>
20 | 21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | data 2 | output 3 | __pycache__ 4 | .vscode 5 | .vim 6 | .DS_Store 7 | venv 8 | .venv 9 | 10 | *.egg-info 11 | 12 | .ipynb_checkpoints 13 | */.ipynb_checkpoints/* 14 | 15 | *.ipynb 16 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Copyright 2021 CodeRx, LLC 2 | 3 | Licensed under the Apache License, Version 2.0 (the "License"); 4 | you may not use this file except in compliance with the License. 5 | You may obtain a copy of the License at 6 | 7 | http://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Medication Diversification Tool 2 | 3 | The Medication Diversification Tool (MDT) leverages publicly-available, government-maintained datasets to enhance [Synthea’s Synthetic Patient Generator](https://github.com/synthetichealth/synthea). The synthetic health data generated by Synthea can be used by researchers, software developers, policymakers, and clinicians to develop healthcare solutions. In its current state, the process for generating medications in Synthea is manual and limited to a small selection of medications in individual modules. 
The goal for the MDT is to create more diverse synthetic patient medication orders that accurately reflect the heterogeneity of medications being prescribed in the US population. 4 | 5 | The MDT automates the process for finding relevant medication codes and calculating a distribution of medications, using medication classification dictionaries from RxClass and population-level prescription data from the Medical Expenditure Panel Survey (MEPS). The medication distributions can be tailored to specific patient demographics (e.g., age, gender, state of residence) and combined with Synthea data to generate medication records for a sample patient population. 6 | 7 | 8 | ## Developer quickstart 9 | 1. Clone the repo. 10 | ``` 11 | git clone https://github.com/coderxio/medication-diversification.git 12 | cd medication-diversification 13 | ``` 14 | 2. Create and activate a venv. 15 | ``` 16 | python -m venv venv 17 | source venv/bin/activate 18 | ``` 19 | Or on Windows (using Git Bash): 20 | ``` 21 | py -m venv venv 22 | source venv/Scripts/activate 23 | ``` 24 | > If you are using [VSCode](https://code.visualstudio.com/docs/python/python-tutorial#_install-and-use-packages) on Windows and get the error "Activate.ps1 is not digitally signed. You cannot run this script on the current system.", you may need to temporarily change the PowerShell execution policy to allow scripts to run. If this is the case, try `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process` and then repeat step 2. 25 | 3. Install MDT as an editable package (note the `.` after `-e`). 26 | ``` 27 | pip install -e . 28 | ``` 29 | 4. Change to a new directory outside of the `medication-diversification/` project folder to test out MDT. 30 | ``` 31 | cd .. 32 | mkdir mdt-test 33 | cd mdt-test 34 | ``` 35 | 5. Initialize MDT. This only needs to be done once. This will create a `data/` directory and load the `MDT.db` database. 36 | ``` 37 | mdt init 38 | ``` 39 | 6. Create a new module.
This will create a `<>/` directory that is empty except for an initial `settings.yaml` file. 40 | ``` 41 | mdt module -n <> create 42 | ``` 43 | 7. Edit the `settings.yaml` file in the newly created `<>/` directory, following the directions in this README. 44 | 8. Build the module. 45 | ``` 46 | mdt module -n <> build 47 | ``` 48 | This will create: 49 | - A `<>.json` file, which is the Synthea module itself 50 | - A `lookup_tables/` directory with all transition table CSVs 51 | - A `log/` directory with helpful output logs and debugging CSVs 52 | > Repeat steps 7 and 8 until MDT produces medications that align with what you would expect. Use the `log <>.txt` files in the `log/` directory as a quick and easy way to validate the output of the module with a clinical subject matter expert. 53 | 54 | > To create a new module, start at step 6. 55 | 56 | ## User-defined settings 57 | 58 | Pre-built module settings file examples are available in the [docs/examples](https://github.com/coderxio/medication-diversification/tree/main/docs/examples) folder. 59 | 60 | ### Module settings 61 | | Setting | Type | Description | 62 | | ------- | ---- | ----------- | 63 | | `name` | `string` | **(optional)** The name of your module. Defaults to the `camel_case` name of the module folder. Also used as `assign_to_attribute` property by default. | 64 | | `assign_to_attribute` | `string` | **(optional)** The name of the `"attribute"` to assign this state to. Defaults to `<>`. | 65 | | `reason` | `string` | **(optional)** Either an `"attribute"` or a `"State_Name"` referencing a *previous* `ConditionOnset` state. | 66 | | `chronic` | `boolean` | **(optional)** If `true`, a medication is considered a chronic medication for a chronic condition. This will cause Synthea to reissue the same medication as a new prescription AND discontinue the old prescription at each wellness encounter. Defaults to `false`.
| 67 | `as_needed` | `boolean` | **(optional)** If `true`, the medication may be taken as needed instead of on a specific schedule. Defaults to `false`. | 68 | `refills` | `integer` | **(optional)** The number of refills to allow. Defaults to `0`. | 69 | 70 | ### RxClass settings 71 | 72 | **NOTE:** At least one RxClass `include` or RXCUI `include` is required to run MDT. 73 | 74 | | Setting | Type | Description | 75 | | ------- | ---- | ----------- | 76 | | `include` | `list of objects` | `class_id` / `relationship` pairs of RxClass classes to include. See [RxClass](https://mor.nlm.nih.gov/RxClass/) for valid options. | 77 | | `exclude` | `list of objects` | `class_id` / `relationship` pairs of RxClass classes to exclude. See [RxClass](https://mor.nlm.nih.gov/RxClass/) for valid options. | 78 | 79 | **Examples:** 80 | 81 | **NOTE:** All YAML keys in the default generated settings file must be present, even if the key value is empty. This will be adjusted in a future version of MDT to set appropriate default values if a key is omitted.
82 | ``` 83 | rxclass: 84 | include: 85 | - class_id: R01AD 86 | relationship: ATC 87 | exclude: # Required key, read as an empty array 88 | # - 89 | ``` 90 | 91 | *Corticosteroid medications* 92 | ``` 93 | rxclass: 94 | include: 95 | - class_id: R01AD 96 | relationship: ATC 97 | exclude: 98 | ``` 99 | 100 | *Medications that may treat hypothyroidism* 101 | ``` 102 | rxclass: 103 | include: 104 | - class_id: D007037 105 | relationship: may_treat 106 | exclude: 107 | ``` 108 | 109 | *HMG CoA reductase inhibitor medications AND medications that may prevent stroke* 110 | ``` 111 | rxclass: 112 | include: 113 | - class_id: C10AA 114 | relationship: ATC 115 | - class_id: D020521 116 | relationship: may_prevent 117 | exclude: 118 | ``` 119 | 120 | *Medications that may prevent stroke EXCLUDING P2Y12 platelet inhibitors* 121 | ``` 122 | rxclass: 123 | include: 124 | - class_id: D020521 125 | relationship: may_prevent 126 | exclude: 127 | - class_id: N0000182142 128 | relationship: has_EPC 129 | ``` 130 | 131 | ### RXCUI settings 132 | 133 | **NOTE:** At least one RxClass `include` or RXCUI `include` is required to run MDT. RXCUIs in the `include` and `exclude` sections must be surrounded by single quotation marks. 134 | 135 | | Setting | Type | Description | 136 | | ------- | ---- | ----------- | 137 | | `include` | `list of strings` | RXCUIs to include. See the Ingredients section of [RxNav](https://mor.nlm.nih.gov/RxNav/) for valid options. | 138 | | `exclude` | `list of strings` | RXCUIs to exclude. See the Ingredients section of [RxNav](https://mor.nlm.nih.gov/RxNav/) for valid options. | 139 | | `ingredient_tty_filter` | `string` | **(optional)** `IN` to only return single ingredient products or `MIN` to only return multiple ingredient products. | 140 | | `dose_form_filter` | `list of strings` | **(optional)** A list of dose forms or dose form group names to filter products by.
See this [RxNorm dose form reference](https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix3.html) for valid options. | 141 | 142 | **Examples:** 143 | 144 | *Prednisone medications* 145 | ``` 146 | rxcui: 147 | include: 148 | - '8640' 149 | exclude: 150 | ``` 151 | 152 | *Albuterol AND levalbuterol medications* 153 | ``` 154 | rxcui: 155 | include: 156 | - '435' 157 | - '237159' 158 | exclude: 159 | ``` 160 | 161 | *Fluticasone / salmeterol (TTY = MIN, multiple ingredient) medications* 162 | ``` 163 | rxcui: 164 | include: 165 | - '284635' 166 | exclude: 167 | ``` 168 | 169 | *Single ingredient inhalant product fluticasone medications only* 170 | ``` 171 | rxcui: 172 | include: 173 | - '41126' 174 | exclude: 175 | ingredient_tty_filter: IN 176 | dose_form_filter: 177 | - Inhalant Product 178 | ``` 179 | 180 | ### MEPS settings 181 | | Setting | Type | Description | 182 | | ------- | ---- | ----------- | 183 | | `age_ranges` | `list of strings` | Age ranges to break up distributions by. Defaults to MDT system defaults. | 184 | | `demographic_distribution_flags` | `object` | Whether to break up distributions by `age`, `gender`, and `state`. All three default to `true`. | 185 | 186 | **Examples:** 187 | 188 | *Custom age ranges for pediatric patients only* 189 | ``` 190 | meps: 191 | age_ranges: 192 | - 0-5 193 | - 6-12 194 | - 13-17 195 | ``` 196 | 197 | *Split population into patients under 65 and patients 65 or older* 198 | ``` 199 | meps: 200 | age_ranges: 201 | - 0-64 202 | - 65-103 203 | ``` 204 | 205 | ## How to replace a MedicationOrder with an MDT submodule 206 | To replace a MedicationOrder with one of our MDT submodules, replace the [MedicationOrder state](https://github.com/synthetichealth/synthea/wiki/Generic-Module-Framework:-States#medicationorder) with a [CallSubmodule state](https://github.com/synthetichealth/synthea/wiki/Generic-Module-Framework%3A-States#callsubmodule).
207 | 208 | ``` 209 | "Medication_Submodule": { 210 | "type": "CallSubmodule", 211 | "submodule": "medications/<>" 212 | } 213 | ``` 214 | 215 | Put the submodule JSON file in the [`synthea/src/main/resources/modules/medications`](https://github.com/synthetichealth/synthea/tree/master/src/main/resources/modules/medications) folder. 216 | 217 | Put your transition table CSV files in the [`synthea/src/main/resources/modules/lookup_tables`](https://github.com/synthetichealth/synthea/tree/master/src/main/resources/modules/lookup_tables) folder. 218 | 219 | **Example for the asthma module:** 220 | 221 | Using the existing [asthma module](https://github.com/synthetichealth/synthea/blob/master/src/main/resources/modules/asthma.json) as an example... 222 | 223 | Change this... 224 | 225 | ``` 226 | ... 227 | "Prescribe_Maintenance_Inhaler": { 228 | "type": "MedicationOrder", 229 | "reason": "asthma_condition", 230 | "codes": [ 231 | { 232 | "system": "RxNorm", 233 | "code": "895994", 234 | "display": "120 ACTUAT Fluticasone propionate 0.044 MG/ACTUAT Metered Dose Inhaler" 235 | } 236 | ], 237 | "prescription": { 238 | "as_needed": true 239 | }, 240 | "direct_transition": "Prescribe_Emergency_Inhaler", 241 | "chronic": true 242 | }, 243 | ... 244 | ``` 245 | 246 | To this... 247 | 248 | ``` 249 | ... 250 | "Prescribe_Maintenance_Inhaler": { 251 | "type": "CallSubmodule", 252 | "submodule": "medications/maintenance_inhaler", 253 | "direct_transition": "Prescribe_Emergency_Inhaler" 254 | }, 255 | ... 256 | ``` 257 | 258 | Then make sure your submodule JSON and transition table CSVs are in the folder locations specified above. 259 | - Put a `maintenance_inhaler.json` file in the `synthea/src/main/resources/modules/medications` folder. 260 | - Put all the transition table CSV files in the `synthea/src/main/resources/modules/lookup_tables` folder.
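The two placement steps above can be scripted when a module is rebuilt often. The sketch below is a hypothetical helper (`install_mdt_module` is not part of MDT or Synthea). It assumes the build layout described in the quickstart, with a `<name>.json` file and a `lookup_tables/` folder inside the module directory, where `<name>` defaults to the module folder name, and it assumes a standard Synthea checkout.

```python
import shutil
from pathlib import Path


def install_mdt_module(module_dir, synthea_dir, json_name=None):
    """Copy an MDT build into a Synthea checkout (hypothetical helper, not part of MDT).

    Assumes <module_dir>/<name>.json and <module_dir>/lookup_tables/*.csv exist,
    where <name> defaults to the module folder name. Returns the written paths.
    """
    module_dir = Path(module_dir).resolve()
    modules_root = Path(synthea_dir) / "src" / "main" / "resources" / "modules"
    medications_dir = modules_root / "medications"
    lookup_dir = modules_root / "lookup_tables"
    medications_dir.mkdir(parents=True, exist_ok=True)
    lookup_dir.mkdir(parents=True, exist_ok=True)

    # The calling module references the submodule as "medications/<name>".
    module_json = module_dir / f"{json_name or module_dir.name}.json"
    if not module_json.is_file():
        raise FileNotFoundError(f"no built module found at {module_json}")
    written = [Path(shutil.copy2(module_json, medications_dir))]

    # All transition tables share Synthea's single lookup_tables folder.
    for csv_path in sorted((module_dir / "lookup_tables").glob("*.csv")):
        written.append(Path(shutil.copy2(csv_path, lookup_dir)))
    return written
```

For example, `install_mdt_module("maintenance_inhaler", "../synthea")` would copy `maintenance_inhaler.json` into `modules/medications/` and every transition table CSV into `modules/lookup_tables/`. Copying (rather than moving) leaves the MDT build folder intact, so steps 7 and 8 of the quickstart can be repeated.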
261 | 262 | See below for an example file structure: 263 | 264 | ``` 265 | synthea/ 266 | ├─ src/ 267 | │ ├─ main/ 268 | | │ ├─ resources/ 269 | | │ │ ├─ modules/ 270 | | │ │ │ ├─ medications/ 271 | | │ │ │ │ ├─ maintenance_inhaler.json 272 | | │ │ │ │ ├─ ... 273 | | │ │ │ ├─ lookup_tables/ 274 | | │ │ │ │ ├─ maintenance_inhaler_ingredient_distribution.csv 275 | | │ │ │ │ ├─ maintenance_inhaler_fluticasone_product_distribution.csv 276 | | │ │ │ │ ├─ maintenance_inhaler_budesonide_product_distribution.csv 277 | | │ │ │ │ ├─ maintenance_inhaler_beclomethasone_product_distribution.csv 278 | | │ │ │ │ ├─ maintenance_inhaler_mometasone_product_distribution.csv 279 | | │ │ │ │ ├─ ... 280 | | │ │ │ ├─ asthma.json 281 | | │ │ │ ├─ ... 282 | ``` 283 | 284 | Lastly, if the calling module (in this case, `asthma.json`) ends medications by the `State_Name` of a previous `MedicationOrder` state, you will need to change that `MedicationEnd` state to end the medication by `attribute` instead. This is because the MDT-generated module creates a different `MedicationOrder` state name for each potential prescribed product, but all of those states share the same `attribute`. 285 | 286 | Change this... 287 | 288 | ``` 289 | ... 290 | "Maintenance_Medication_End": { 291 | "type": "MedicationEnd", 292 | "medication_order": "Prescribe_Maintenance_Inhaler", 293 | "direct_transition": "Emergency_Medication_End" 294 | }, 295 | ... 296 | ``` 297 | 298 | To this... 299 | 300 | ``` 301 | ... 302 | "Maintenance_Medication_End": { 303 | "type": "MedicationEnd", 304 | "referenced_by_attribute": "maintenance_inhaler", 305 | "direct_transition": "Emergency_Medication_End" 306 | }, 307 | ... 308 | ``` 309 | 310 | ## Tips on testing MDT with Synthea 311 | 312 | - In Synthea, change these settings in `synthea/src/main/resources/synthea.properties` to disable FHIR exporting and enable CSV exporting. 313 | ``` 314 | ... 315 | exporter.fhir.export = false 316 | ... 317 | exporter.csv.export = true 318 | ...
319 | ``` 320 | - Each time you run Synthea, make sure all Synthea CSV output files are closed; otherwise Synthea will fail with a non-specific error message. 321 | - Run Synthea with a large enough sample size (at least 1000 patients) to see a noticeable impact from MDT. 322 | - Check the `medications.csv` output file for medications produced by your MDT-generated module. 323 | 324 | ## Validation 325 | 326 | Please see [docs/validation](https://github.com/coderxio/medication-diversification/tree/main/docs/validation) for a Python notebook that can be used to validate Synthea + MDT patient populations against MEPS patient populations. 327 | -------------------------------------------------------------------------------- /docs/examples/README.md: -------------------------------------------------------------------------------- 1 | # Example Usage 2 | 3 | This directory contains example module directories that can be used with MDT to build Synthea modules. Each directory contains a single `settings.yaml` file, which MDT reads to build the appropriate Synthea module and related files. After installing MDT with pip and building the database with `mdt init`, run `mdt module -n rescue_inhaler build`, for example, to build the rescue_inhaler submodule. 4 | 5 | End users can either copy an example module directory to the location where MDT was initialized, or copy the settings file into a custom directory and then follow the same steps in the repo README to build the module.
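A missing section or an unquoted RXCUI in a copied `settings.yaml` is easy to overlook. The function below is a hypothetical pre-flight check, not part of MDT. It works on the dictionary obtained by loading the YAML and covers only a subset of the rules in the main README: the four top-level sections are present, at least one RxClass or RXCUI `include` is non-empty, and RXCUIs are quoted strings.

```python
REQUIRED_SECTIONS = ("module", "rxclass", "rxcui", "meps")


def check_settings(settings):
    """Return a list of problems found in a parsed settings.yaml dictionary.

    Hypothetical check, not part of MDT; it covers only a subset of the README
    rules (top-level sections, at least one include, quoted RXCUIs).
    """
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in settings:
            problems.append(f"missing top-level key: {section}")

    # A key followed only by comments (e.g. `include:`) loads as None, so treat None as empty.
    rxclass_include = (settings.get("rxclass") or {}).get("include") or []
    rxcui_include = (settings.get("rxcui") or {}).get("include") or []
    if not rxclass_include and not rxcui_include:
        problems.append("need at least one rxclass include or rxcui include")

    # Unquoted RXCUIs such as `- 435` load as integers rather than strings.
    for rxcui in rxcui_include:
        if not isinstance(rxcui, str):
            problems.append(f"RXCUI {rxcui!r} should be quoted, e.g. '{rxcui}'")
    return problems
```

Assuming PyYAML is installed, one way to run it on a copied example is `check_settings(yaml.safe_load(open("rescue_inhaler/settings.yaml")))`; an empty list only means none of these particular problems were found.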
6 | -------------------------------------------------------------------------------- /docs/examples/maintenance_inhaler/settings.yaml: -------------------------------------------------------------------------------- 1 | # Settings for the Synthea module 2 | module: 3 | name: # (optional) string, defaults to the camelcase name of the module folder 4 | assign_to_attribute: # (optional) string, defaults to the lowercase name of the module folder 5 | reason: asthma_condition # (optional) string, references a previous ConditionOnset state 6 | as_needed: false # boolean, whether the prescription is as needed 7 | chronic: true # boolean, whether the prescription is chronic 8 | refills: 0 # integer, number of refills 9 | 10 | # Settings for the RxClass search to include/exclude 11 | # *** At least one RxClass include or RXCUI include is required *** 12 | # NOTE: you can include/exclude multiple class_id/relationship pairs 13 | # RxClass options - see https://mor.nlm.nih.gov/RxClass/ 14 | rxclass: 15 | include: 16 | # R01AD is the ATC RxClass for corticosteroids 17 | - class_id: R01AD 18 | relationship: ATC 19 | exclude: 20 | # - class_id: 21 | # relationship: 22 | 23 | # Settings for individual RXCUIs to include/exclude 24 | # *** At least one RxClass include or RXCUI include is required *** 25 | # NOTE: you can include/exclude multiple RXCUIs 26 | # You must enclose RXCUIs in quotes - example: '435' 27 | # RXCUI options - see the Ingredient section in https://mor.nlm.nih.gov/RxNav/ 28 | # Dose form options - see https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix3.html 29 | rxcui: 30 | include: 31 | # - 32 | exclude: 33 | # - 34 | ingredient_tty_filter: IN # (optional) string, options are IN or MIN 35 | dose_form_filter: # (optional) list, see dose form options above 36 | - Metered Dose Inhaler 37 | - Dry Powder Inhaler 38 | - Inhalation Suspension 39 | 40 | # Settings for the MEPS population 41 | meps: 42 | age_ranges: # (optional) list, defaults to 
mdt-settings.yaml default age ranges 43 | - 0-5 44 | #- 6-103 45 | demographic_distribution_flags: 46 | age: true # boolean, whether to break up distributions by age ranges 47 | gender: false # boolean, whether to break up distributions by gender 48 | state: false # boolean, whether to break up distributions by state of residence 49 | -------------------------------------------------------------------------------- /docs/examples/rescue_inhaler/settings.yaml: -------------------------------------------------------------------------------- 1 | # Settings for the Synthea module 2 | module: 3 | name: # (optional) string, defaults to the camelcase name of the module folder 4 | assign_to_attribute: # (optional) string, defaults to the lowercase name of the module folder 5 | reason: asthma_condition # (optional) string, references a previous ConditionOnset state 6 | as_needed: true # boolean, whether the prescription is as needed 7 | chronic: true # boolean, whether the prescription is chronic 8 | refills: 0 # integer, number of refills 9 | 10 | # Settings for the RxClass search to include/exclude 11 | # *** At least one RxClass include or RXCUI include is required *** 12 | # NOTE: you can include/exclude multiple class_id/relationship pairs 13 | # RxClass options - see https://mor.nlm.nih.gov/RxClass/ 14 | rxclass: 15 | include: 16 | # - class_id: 17 | # relationship: 18 | exclude: 19 | # - class_id: 20 | # relationship: 21 | 22 | # Settings for individual RXCUIs to include/exclude 23 | # *** At least one RxClass include or RXCUI include is required *** 24 | # NOTE: you can include/exclude multiple RXCUIs 25 | # You must enclose RXCUIs in quotes - example: '435' 26 | # RXCUI options - see the Ingredient section in https://mor.nlm.nih.gov/RxNav/ 27 | # Dose form options - see https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix3.html 28 | rxcui: 29 | include: 30 | # '435' = albuterol, '237159' = levalbuterol 31 | - '435' 32 | - '237159' 33 | exclude: 34 | # -
ingredient_tty_filter: IN # (optional) string, options are IN or MIN 36 | dose_form_filter: # (optional) list, see dose form options above 37 | - Metered Dose Inhaler 38 | - Inhalation Solution 39 | 40 | # Settings for the MEPS population 41 | meps: 42 | age_ranges: # (optional) defaults to MDT defaults 43 | - 0-5 44 | #- 6-103 45 | demographic_distribution_flags: 46 | age: true # boolean, whether to break up distributions by age ranges 47 | gender: false # boolean, whether to break up distributions by gender 48 | state: false # boolean, whether to break up distributions by state of residence 49 | -------------------------------------------------------------------------------- /docs/validation/MDT Validation Notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "kernelspec": { 4 | "name": "python3", 5 | "display_name": "Python 3", 6 | "language": "python" 7 | }, 8 | "language_info": { 9 | "name": "python", 10 | "version": "3.8.5", 11 | "mimetype": "text/x-python", 12 | "codemirror_mode": { 13 | "name": "ipython", 14 | "version": 3 15 | }, 16 | "pygments_lexer": "ipython3", 17 | "nbconvert_exporter": "python", 18 | "file_extension": ".py" 19 | } 20 | }, 21 | "nbformat_minor": 2, 22 | "nbformat": 4, 23 | "cells": [ 24 | { 25 | "cell_type": "markdown", 26 | "source": [ 27 | "# MDT Validation Notebook\r\n", 28 | "\r\n", 29 | "Validated using a Synthea + MDT population compared against MEPS for pediatric asthma" 30 | ], 31 | "metadata": {} 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 29, 36 | "source": [ 37 | "import pandas as pd\r\n", 38 | "import datetime as dt\r\n", 39 | "import numpy as np\r\n", 40 | "from scipy.stats import chi2_contingency" 41 | ], 42 | "outputs": [], 43 | "metadata": { 44 | "azdata_cell_guid": "6f5d30fc-eda6-4936-aa53-270406aff005" 45 | } 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "source": [ 50 | "# Grab medication RXCUIs of interest\r\n", 51 | "\r\n", 52 | "Grabs the
MEPS product RXCUI lists for filtering of Synthea to medications of interest. \r\n", 53 | "Path to this will be MDT module - log - rxcui_ndc_df_output.csv" 54 | ], 55 | "metadata": {} 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 113, 60 | "source": [ 61 | "rxcui_df = pd.read_csv(r\"\") # MDT produced medication list\r\n", 62 | "rxcui_df = rxcui_df[['medication_product_name','medication_product_rxcui']].drop_duplicates()\r\n", 63 | "rxcui_df['medication_product_rxcui'] = rxcui_df['medication_product_rxcui'].astype(int)" 64 | ], 65 | "outputs": [], 66 | "metadata": { 67 | "azdata_cell_guid": "055a6ba7-8ac6-45d7-abd9-928e4631e876" 68 | } 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "source": [ 73 | "# Read Synthea Population\r\n", 74 | "Reads Synthea Medication file and filters on medications of interest\r\n", 75 | "\r\n", 76 | "The path for this will be synthea -> output -> csv -> medications.csv " 77 | ], 78 | "metadata": {} 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 115, 83 | "source": [ 84 | "col_list = ['START','PATIENT','CODE']\r\n", 85 | "\r\n", 86 | "syn_med_df = pd.DataFrame(columns = ['START','PATIENT','CODE','medication_product_rxcui','medication_product_name'])\r\n", 87 | "\r\n", 88 | "for x in pd.read_csv(r\"\", usecols=col_list, chunksize=100000):\r\n", 89 | " x['CODE'] = x['CODE'].astype(int)\r\n", 90 | " temp_df = x.merge(rxcui_df, how=\"inner\", left_on='CODE', right_on='medication_product_rxcui')\r\n", 91 | " syn_med_df = pd.concat([syn_med_df, temp_df])" 92 | ], 93 | "outputs": [], 94 | "metadata": { 95 | "azdata_cell_guid": "689fcd9a-79be-41df-8c5b-213689388e2c", 96 | "tags": [] 97 | } 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "source": [ 102 | "# Synthea Patient Population Filtering\r\n", 103 | "\r\n", 104 | "Reads and merges Synthea patient data to allow for patient management.\r\n", 105 | "The path for this will be synthea -> output -> csv -> patients.csv\r\n", 106 | "\r\n", 107 | "This step can
be skipped if not filtering by patient. For the pediatric use case, we limited the analysis to patients who received medications when they were < 6 years of age" 108 | ], 109 | "metadata": {} 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 76, 114 | "source": [ 115 | "syn_pat_df = pd.read_csv(r\"\")\r\n", 116 | "syn_pat_df = syn_pat_df.merge(syn_med_df, how='inner', left_on='Id', right_on='PATIENT')\r\n", 117 | "\r\n", 118 | "syn_pat_df['START'] = pd.to_datetime(syn_pat_df['START']).dt.date\r\n", 119 | "syn_pat_df['BIRTHDATE'] = pd.to_datetime(syn_pat_df['BIRTHDATE']).dt.date\r\n", 120 | "syn_pat_df['age_in_days'] = (syn_pat_df['START'] - syn_pat_df['BIRTHDATE']).dt.days\r\n", 121 | "\r\n", 122 | "syn_med_df = syn_pat_df[syn_pat_df['age_in_days'] < 2191]" 123 | ], 124 | "outputs": [], 125 | "metadata": { 126 | "azdata_cell_guid": "6b4f7a19-1e25-4eff-a5a2-9dbc7053cfaf" 127 | } 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "source": [ 132 | "# Synthea distributions\r\n", 133 | "Gets total patient counts and medication distributions from the Synthea population" 134 | ], 135 | "metadata": {} 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 116, 140 | "source": [ 141 | "syn_med_df = syn_med_df.groupby(['medication_product_name']).agg(patient_count=('CODE','count')).reset_index()\r\n", 142 | "total_patients = syn_med_df['patient_count'].sum()\r\n", 143 | "syn_med_df['percent'] = syn_med_df['patient_count']/total_patients\r\n", 144 | "syn_med_df" 145 | ], 146 | "outputs": [ 147 | { 148 | "output_type": "execute_result", 149 | "data": { 150 | "text/plain": [ 151 | " medication_product_name patient_count percent\n", 152 | "0 120 ACTUAT fluticasone propionate 0.044 MG/ACT... 2378 0.341618\n", 153 | "1 120 ACTUAT fluticasone propionate 0.11 MG/ACTU... 1070 0.153714\n", 154 | "2 Breath-Actuated 120 ACTUAT beclomethasone dipr...
203 0.029162\n", 155 | "3 budesonide 0.125 MG/ML Inhalation Suspension 977 0.140353\n", 156 | "4 budesonide 0.125 MG/ML Inhalation Suspension [... 513 0.073696\n", 157 | "5 budesonide 0.25 MG/ML Inhalation Suspension 1819 0.261313\n", 158 | "6 budesonide 0.5 MG/ML Inhalation Suspension 1 0.000144" 159 | ], 160 | "text/html": [ 161 | "
" 230 | ] 231 | }, 232 | "metadata": {}, 233 | "execution_count": 116 234 | } 235 | ], 236 | "metadata": {} 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "source": [ 241 | "# MEPS Expected\r\n", 242 | "\r\n", 243 | "Generates the expected MEPS patient counts for the chi-squared comparison\r\n", 244 | "\r\n", 245 | "Path to file will be in your MDT module - log - validation_df.csv" 246 | ], 247 | "metadata": {} 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 108, 252 | "source": [ 253 | "meps_df = pd.read_csv(r\"\")\r\n", 254 | "meps_df = meps_df[meps_df['age'] == '0-5'][['medication_product_name','validation_percent_product_patients']]\r\n", 255 | "meps_df['patient_count'] = meps_df['validation_percent_product_patients'] * total_patients\r\n", 256 | "meps_df['patient_count'] = meps_df['patient_count'].round(0)\r\n", 257 | "meps_df" 258 | ], 259 | "outputs": [ 260 | { 261 | "output_type": "execute_result", 262 | "data": { 263 | "text/plain": [ 264 | " medication_product_name \\\n", 265 | "0 120_Actuat_Fluticasone_Propionate_0_044_Mg_Act... \n", 266 | "1 120_Actuat_Fluticasone_Propionate_0_11_Mg_Actu... \n", 267 | "16 Budesonide_0_125_Mg_Ml_Inhalation_Suspension \n", 268 | "17 Budesonide_0_125_Mg_Ml_Inhalation_Suspension_P... \n", 269 | "18 Budesonide_0_25_Mg_Ml_Inhalation_Suspension \n", 270 | "19 Breath_Actuated_120_Actuat_Beclomethasone_Dipr... \n", 271 | "\n", 272 | " validation_percent_product_patients patient_count \n", 273 | "0 0.335052 2332.0 \n", 274 | "1 0.156948 1093.0 \n", 275 | "16 0.140715 980.0 \n", 276 | "17 0.072027 501.0 \n", 277 | "18 0.263781 1836.0 \n", 278 | "19 0.031000 216.0 " 279 | ], 280 | "text/html": [ 281 | "
\n", 343 | "
" 344 | ] 345 | }, 346 | "metadata": {}, 347 | "execution_count": 108 348 | } 349 | ], 350 | "metadata": {} 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "source": [ 355 | "# Run Chi Squared\r\n", 356 | "\r\n", 357 | "Runs a chi-squared test of homogeneity comparing the two populations.\r\n", 358 | "Take the patient-count values from syn_med_df and meps_df.\r\n", 359 | "\r\n", 360 | "The numbers used are for the pediatric asthma use case (Synthea + MDT vs. MEPS)." 361 | ], 362 | "metadata": {} 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 117, 367 | "source": [ 368 | "obs = np.array([[203, 216],\r\n", 369 | " [977, 979],\r\n", 370 | " [513, 489],\r\n", 371 | " [1819, 1836],\r\n", 372 | " [1, 0],\r\n", 373 | " [2378, 2332],\r\n", 374 | " [1070, 1093]])\r\n", 375 | "\r\n", 376 | "\r\n", 377 | "chi2, p, dof, expected = chi2_contingency(obs)\r\n", 378 | "print(f\"\"\"X2 = {chi2}\r\n", 379 | "p-value = {p}\r\n", 380 | "degrees of freedom = {dof}\r\n", 381 | "expected frequencies = {expected}\"\"\")" 382 | ], 383 | "outputs": [ 384 | { 385 | "output_type": "stream", 386 | "name": "stdout", 387 | "text": [ 388 | "X2 = 2.7347252762386036\n", 389 | "p-value = 0.8413287112519282\n", 390 | "degrees of freedom = 6\n", 391 | "expected frequencies = [[2.09741047e+02 2.09258953e+02]\n", 392 | " [9.79125270e+02 9.76874730e+02]\n", 393 | " [5.01576442e+02 5.00423558e+02]\n", 394 | " [1.82960269e+03 1.82539731e+03]\n", 395 | " [5.00575291e-01 4.99424709e-01]\n", 396 | " [2.35770962e+03 2.35229038e+03]\n", 397 | " [1.08274435e+03 1.08025565e+03]]\n" 398 | ] 399 | } 400 | ], 401 | "metadata": {} 402 | } 403 | ] 404 | } -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | import pathlib 3 | 4 | here = pathlib.Path(__file__).parent.resolve() 5 | 6 | # Get the long description from the README file 7 | long_description = (here
/ 'README.md').read_text(encoding='utf-8') 8 | 9 | setup( 10 | name='mdt', 11 | version='1.0.0', 12 | # description='A sample Python project', # Optional 13 | # long_description=long_description, # Optional 14 | # long_description_content_type='text/markdown', # Optional 15 | # url='https://github.com/pypa/sampleproject', # Optional 16 | # author='A. Random Developer', # Optional 17 | # author_email='author@example.com', # Optional 18 | # keywords='sample, setuptools, development', # Optional 19 | package_dir={'': 'src'}, 20 | packages=find_packages(where='src'), 21 | python_requires='>=3.7, <4', 22 | install_requires=[ 23 | 'requests', 24 | 'pandas', 25 | 'ruamel.yaml' 26 | ], # Optional 27 | 28 | # If there are data files included in your packages that need to be 29 | # installed, specify them here. 30 | package_data={ 31 | "": ['*.sql'] 32 | }, 33 | # Although 'package_data' is the preferred approach, in some cases you may 34 | # need to place data files outside of your packages. See: 35 | # http://docs.python.org/distutils/setupscript.html#installing-additional-files 36 | # 37 | # In this case, 'data_file' will be installed into '<sys.prefix>/my_data' 38 | # data_files=[('my_data', ['data/data_file'])], # Optional 39 | 40 | entry_points={ # Optional 41 | 'console_scripts': [ 42 | 'mdt=mdt.cli:main', 43 | ], 44 | }, 45 | ) 46 | -------------------------------------------------------------------------------- /src/mdt/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coderxio/medication-diversification/8d43a8e1c2c38826aa79a2717e969fea58f81065/src/mdt/__init__.py -------------------------------------------------------------------------------- /src/mdt/cli.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import argparse 3 | from mdt.database import ( 4 | load_rxnorm, 5 | load_meps, 6 | load_fda, 7 | check_table, 8 | ) 9 | from mdt.yamlmanager
import ( 10 | create_mdt_settings, 11 | create_module_settings, 12 | get_settings, 13 | ) 14 | from mdt.utils import ( 15 | get_rxcui_ingredient_df, 16 | get_rxcui_product_df, 17 | get_rxcui_ndc_df, 18 | get_meps_rxcui_ndc_df, 19 | generate_module_csv, 20 | generate_module_json, 21 | ) 22 | 23 | 24 | def init_db(args): 25 | if check_table('rxcui_ndc') is False: 26 | load_rxnorm() 27 | 28 | if check_table('meps_demographics') is False: 29 | load_meps() 30 | 31 | if check_table('package') is False: 32 | load_fda() 33 | 34 | print('All tables are loaded') 35 | 36 | create_mdt_settings() 37 | 38 | 39 | def module_create(args): 40 | arguments = vars(args) 41 | create_module_settings(arguments['module_name']) 42 | 43 | 44 | def module_build(args): 45 | arguments = vars(args) 46 | module_name = arguments['module_name'] 47 | settings = get_settings(module_name) 48 | 49 | # First, get all medications that contain one of the ingredient RXCUIs 50 | # This will result in duplicate NDCs and potentially no MINs 51 | rxcui_ingredient_df = get_rxcui_ingredient_df(settings) 52 | 53 | # Second, get all of the medications that contain one of the product RXCUIs in the df above 54 | # This may return both INs and MINs, but NDCs can still be duplicated 55 | rxcui_product_df = get_rxcui_product_df(rxcui_ingredient_df, settings) 56 | 57 | # Third, query the rxcui_product_df with a window function to group by NDC and prefer MIN over IN 58 | # This will result in only distinct NDCs that map to either an MIN (preferred) or an IN 59 | # https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#top-n-rows-per-group 60 | # Also, filter by dose form and ingredient term type (if applicable) 61 | rxcui_ndc_df = get_rxcui_ndc_df(rxcui_product_df, module_name, settings) 62 | 63 | # Join MEPS data with rxcui_ndc_df 64 | meps_rxcui_ndc_df = get_meps_rxcui_ndc_df(rxcui_ndc_df, module_name, settings) 65 | 66 | # Generate distribution CSVs 67 |
dcp_demographictotal_ingred_df, dcp_demographictotal_prod_df = generate_module_csv(meps_rxcui_ndc_df, module_name, settings) 68 | 69 | # Generate JSON 70 | generate_module_json(meps_rxcui_ndc_df, dcp_demographictotal_ingred_df, dcp_demographictotal_prod_df, module_name, settings) 71 | 72 | 73 | def main(): 74 | # Main command and child command setup 75 | parser = argparse.ArgumentParser( 76 | description='Medication Diversification Tool for Synthea' 77 | ) 78 | 79 | subparsers = parser.add_subparsers( 80 | title='Commands', 81 | metavar='', 82 | ) 83 | 84 | # Init command parsers 85 | 86 | init_parser = subparsers.add_parser( 87 | 'init', 88 | description='Download MEPS, RxNorm, and FDA NDC data and set up the database', 89 | help='Initialize MDT DB' 90 | ) 91 | init_parser.set_defaults(func=init_db) 92 | 93 | # Module command parsers 94 | 95 | module_parser = subparsers.add_parser( 96 | 'module', 97 | description='Module-specific commands', 98 | help='Module-specific commands' 99 | ) 100 | module_parser.add_argument( 101 | '--module-name', 102 | '-n', 103 | help='Name of the module', 104 | ) 105 | 106 | module_subparser = module_parser.add_subparsers( 107 | title='Commands', 108 | metavar='', 109 | dest='module_commands' 110 | ) 111 | 112 | create_parser = module_subparser.add_parser( 113 | 'create', 114 | description='Create template module directory', 115 | help='Create template module directory' 116 | ) 117 | create_parser.set_defaults(func=module_create) 118 | 119 | build_parser = module_subparser.add_parser( 120 | 'build', 121 | description='Build Synthea module', 122 | help='Build Synthea module' 123 | ) 124 | 125 | build_parser.set_defaults(func=module_build) 126 | 127 | if len(sys.argv) < 2: 128 | parser.print_help() 129 | sys.exit(0) 130 | 131 | args = parser.parse_args() 132 | 133 | if hasattr(args, 'func'): 134 | args.func(args) 135 | elif 'module_commands' in vars(args): 136 | # `mdt module` was called without a subcommand, so no func was set 137 | module_parser.print_help() 138 |
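The third step in `module_build` prefers an MIN over an IN for each NDC, using the "top-N rows per group" pattern from the linked pandas SQL-comparison page. The sketch below shows that pattern under stated assumptions: the column names `ndc`, `ingredient_tty`, and `ingredient_rxcui` and the function `prefer_min_per_ndc` are hypothetical, and the real `get_rxcui_ndc_df` in `mdt.utils`, which is not shown here, may work differently.

```python
import pandas as pd


def prefer_min_per_ndc(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per NDC, preferring an MIN mapping over an IN mapping.

    Hypothetical sketch: assumes columns 'ndc' and 'ingredient_tty'
    ('MIN' or 'IN'); the real mdt.utils.get_rxcui_ndc_df may differ.
    """
    # Rank MIN ahead of IN; any other term type sorts last.
    priority = df['ingredient_tty'].map({'MIN': 0, 'IN': 1}).fillna(2)
    ranked = df.assign(_priority=priority)
    # Same idea as ROW_NUMBER() OVER (PARTITION BY ndc ORDER BY priority).
    # A stable sort keeps the original row order among equal priorities.
    ordered = ranked.sort_values('_priority', kind='mergesort')
    ranked['_rn'] = ordered.groupby('ndc').cumcount() + 1
    best = ranked[ranked['_rn'] == 1]
    return (best.drop(columns=['_priority', '_rn'])
                .sort_values('ndc')
                .reset_index(drop=True))


# Hypothetical sample: NDCs 0001 and 0003 map to both an IN and an MIN.
rxcui_product_df = pd.DataFrame({
    'ndc': ['0001', '0001', '0002', '0003', '0003'],
    'ingredient_tty': ['IN', 'MIN', 'IN', 'IN', 'MIN'],
    'ingredient_rxcui': ['111', '999', '222', '333', '888'],
})

print(prefer_min_per_ndc(rxcui_product_df))
```

Using a numeric priority map instead of sorting the term-type strings directly keeps the preference order explicit and makes it straightforward to rank additional term types. The dose-form and term-type filtering mentioned in the same comment would be a separate step.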
-------------------------------------------------------------------------------- /src/mdt/database.py: -------------------------------------------------------------------------------- 1 | import importlib.resources as pkg_resources 2 | from . import rxnorm, meps, fda 3 | from pathlib import Path 4 | import zipfile 5 | import io 6 | import sqlite3 7 | import pandas as pd 8 | 9 | 10 | def path_manager(*args): 11 | """Create the folder path under the current working directory if it does not exist, and return it.""" 12 | p = Path.cwd().joinpath(*args) 13 | if not p.exists(): 14 | p.mkdir(parents=True, exist_ok=True) 15 | return p 16 | 17 | 18 | def delete_csv_files(path=None): 19 | files = (path or Path.cwd()).glob('*.csv') # resolve the default at call time, not import time 20 | for file in files: 21 | file.unlink() 22 | 23 | 24 | def create_mdt_con(): 25 | """Create the default connection to MDT.db in the data folder.""" 26 | conn = sqlite3.connect(path_manager('data') / 'MDT.db') 27 | return conn 28 | 29 | 30 | def sql_create_table(table_name, df, conn=None): 31 | """Creates a table in the connected database when passed a pandas dataframe.
32 | An existing table with the same name is replaced.""" 33 | 34 | if conn is None: 35 | conn = create_mdt_con() 36 | 37 | try: 38 | df.to_sql(table_name, conn, if_exists='replace', index=False) 39 | print('{} table created in DB'.format(table_name)) 40 | except Exception as e: 41 | print('Could not create table {0} in DB: {1}'.format(table_name, e)) 42 | 43 | 44 | def db_query(query_str, conn=None): 45 | """Send a query to the DB and return the results as a dataframe.""" 46 | if conn is None: 47 | conn = create_mdt_con() 48 | return pd.read_sql(query_str, conn) 49 | 50 | 51 | def check_table(tablename, conn=None): 52 | """Check whether a table exists in the database.""" 53 | if conn is None: 54 | conn = create_mdt_con() 55 | c = conn.cursor() 56 | c.execute("SELECT count(name) FROM sqlite_master WHERE type='table' AND name=?", (tablename,)) 57 | if c.fetchone()[0] == 1: 58 | return True 59 | else: 60 | return False 61 | 62 | 63 | def read_sql_string(file_name): 64 | """Read the contents of a SQL script into a string for use in a query.""" 65 | with open(file_name, 'r') as fd: 66 | query_str = fd.read() 67 | 68 | print('Read {0} file as string'.format(file_name)) 69 | return query_str 70 | 71 | 72 | def load_rxnorm(): 73 | """Download the RxNorm dataset and load it into the database.""" 74 | z = zipfile.ZipFile(rxnorm.utils.get_dataset(handler=io.BytesIO)) 75 | col_names = ['RXCUI', 'LAT', 'TS', 'LUI', 'STT', 'SUI', 'ISPREF', 'RXAUI', 'SAUI', 'SCUI', 'SDUI', 'SAB', 'TTY', 'CODE', 'STR', 'SRL', 'SUPPRESS', 'CVF', 'test'] 76 | rxnconso = pd.read_csv( 77 | z.open('rrf/RXNCONSO.RRF'), 78 | sep='|', 79 | header=None, 80 | dtype=object, 81 | names=col_names 82 | ) 83 | sql_create_table('rxnconso', rxnconso) 84 | del rxnconso 85 | 86 | col_names = ['RXCUI1', 'RXAUI1', 'STYPE1', 'REL', 'RXCUI2', 'RXAUI2', 'STYPE2', 'RELA', 'RUI', 'SRUI', 'SAB', 'SL', 'DIR', 'RG', 'SUPPRESS', 'CVF', 'test'] 87 | rxnrel = pd.read_csv( 88 |
z.open('rrf/RXNREL.RRF'), 89 | sep='|', 90 | dtype=object, 91 | header=None, 92 | names=col_names 93 | ) 94 | sql_create_table('rxnrel', rxnrel) 95 | del rxnrel 96 | 97 | col_names = ['RXCUI', 'LUI', 'SUI', 'RXAUI', 'STYPE', 'CODE', 'ATUI', 'SATUI', 'ATN', 'SAB', 'ATV', 'SUPPRESS', 'CVF', 'test'] 98 | rxnsat = pd.read_csv( 99 | z.open('rrf/RXNSAT.RRF'), 100 | sep='|', 101 | dtype=object, 102 | header=None, 103 | names=col_names 104 | ) 105 | sql_create_table('rxnsat', rxnsat) 106 | del rxnsat 107 | 108 | del z 109 | 110 | rxcui_ndc = db_query(rxnorm.utils.get_sql('rxcui_ndc.sql')) 111 | sql_create_table('rxcui_ndc', rxcui_ndc) 112 | del rxcui_ndc 113 | 114 | dfg_df = db_query(rxnorm.utils.get_sql('dfg_df.sql')) 115 | sql_create_table('dfg_df', dfg_df) 116 | del dfg_df 117 | 118 | 119 | def load_meps(): 120 | '''Load MEPS data into the DB''' 121 | z = zipfile.ZipFile( 122 | meps.utils.get_dataset('h206adat.zip', handler=io.BytesIO) 123 | ) 124 | 125 | meps_prescription = pd.read_fwf( 126 | z.open('H206A.dat'), 127 | header=None, 128 | names=meps.columns.p_col_names, 129 | converters={col: str for col in meps.columns.p_col_names}, 130 | colspecs=meps.columns.p_col_spaces, 131 | ) 132 | 133 | sql_create_table('meps_prescription', meps_prescription) 134 | del meps_prescription 135 | del z 136 | 137 | z = zipfile.ZipFile( 138 | meps.utils.get_dataset('h209dat.zip', handler=io.BytesIO) 139 | ) 140 | 141 | meps_demographics = pd.read_fwf( 142 | z.open('h209.dat'), 143 | header=None, 144 | names=meps.columns.d_col_names, 145 | converters={col: str for col in meps.columns.d_col_names}, 146 | colspecs=meps.columns.d_col_spaces, 147 | usecols=['DUPERSID', 'PERWT18F', 'REGION18', 'SEX', 'AGELAST'] 148 | ) 149 | 150 | # Remove digits from the meps_demographics column names, since the '18' in REGION18 and PERWT18F is year-specific 151 | meps_demographics.columns = meps_demographics.columns.str.replace(r'\d+', '', regex=True) 152 | sql_create_table('meps_demographics',
meps_demographics) 153 | del meps_demographics 154 | del z 155 | 156 | sql_create_table('meps_region_states', meps.columns.meps_region_states) 157 | 158 | meps_reference_str = meps.utils.get_sql('meps_reference.sql') 159 | meps_reference = db_query(meps_reference_str) 160 | sql_create_table('meps_reference', meps_reference) 161 | del meps_reference 162 | 163 | meps_rx_qty_ds = db_query( 164 | pkg_resources.read_text('mdt.sql', 'meps_rx_qty_ds.sql') 165 | ) 166 | sql_create_table('meps_rx_qty_ds', meps_rx_qty_ds) 167 | del meps_rx_qty_ds 168 | 169 | # Sanity check: read the record count of each newly created table 170 | meps_prescription = db_query("Select count(*) AS records from meps_prescription") 171 | print('DB table meps_prescription has {0} records'.format(meps_prescription['records'].iloc[0])) 172 | 173 | meps_demographics = db_query("Select count(*) AS records from meps_demographics") 174 | print('DB table meps_demographics has {0} records'.format(meps_demographics['records'].iloc[0])) 175 | 176 | meps_reference = db_query("Select count(*) AS records from meps_reference") 177 | print('DB table meps_reference has {0} records'.format(meps_reference['records'].iloc[0])) 178 | 179 | meps_region_states = db_query("Select count(*) AS records from meps_region_states") 180 | print('DB table meps_region_states has {0} records'.format(meps_region_states['records'].iloc[0])) 181 | 182 | meps_rx_qty_ds = db_query("Select count(*) AS records from meps_rx_qty_ds") 183 | print('DB table meps_rx_qty_ds has {0} records'.format(meps_rx_qty_ds['records'].iloc[0])) 184 | 185 | 186 | def load_fda(): 187 | '''Load FDA tables into the DB''' 188 | 189 | z = zipfile.ZipFile( 190 | fda.utils.get_dataset(handler=io.BytesIO) 191 | ) 192 | product = pd.read_csv( 193 | z.open('product.txt'), 194 | sep='\t', 195 | dtype=object, header=0, encoding='cp1252' 196 | ) 197 | package = pd.read_csv( 198 | z.open('package.txt'), 199 | sep='\t', 200 | dtype=object, 201 | header=0, 202 |
encoding='cp1252' 203 | ) 204 | sql_create_table('product', product) 205 | sql_create_table('package', package) 206 | del product 207 | del package 208 | 209 | # Release the in-memory FDA ZIP 210 | del z 211 | 212 | # NOTE: Rob's Python code to join one of these tables with the rxcui_ndc table goes here 213 | """ 214 | rxcui_ndc_string = read_sql_string('rxcui_ndc.sql') 215 | rxcui_ndc = db_query(rxcui_ndc_string) 216 | sql_create_table('rxcui_ndc', rxcui_ndc) 217 | del rxcui_ndc 218 | """ 219 | 220 | # Sanity check: read the record count of each newly created table 221 | product = db_query("Select count(*) AS records from product limit 1") 222 | print('DB table product has {0} records'.format(product['records'].iloc[0])) 223 | 224 | package = db_query("Select count(*) AS records from package limit 1") 225 | print('DB table package has {0} records'.format(package['records'].iloc[0])) -------------------------------------------------------------------------------- /src/mdt/fda/__init__.py: -------------------------------------------------------------------------------- 1 | from . import utils 2 | -------------------------------------------------------------------------------- /src/mdt/fda/utils.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from pathlib import Path 3 | 4 | 5 | def get_dataset( 6 | dest=None, 7 | handler=None 8 | ): 9 | url = 'https://www.accessdata.fda.gov/cder/ndctext.zip' 10 | response = requests.get(url) 11 | response.raise_for_status() 12 | if handler: 13 | return handler(response.content) 14 | 15 | ((dest or Path.cwd()) / url.split('/')[-1]).write_bytes(response.content) 16 | 17 | return response 18 | -------------------------------------------------------------------------------- /src/mdt/meps/__init__.py: -------------------------------------------------------------------------------- 1 | from . import utils 2 | from .
import columns 3 | 4 | __all__ = ['utils', 'columns'] 5 | -------------------------------------------------------------------------------- /src/mdt/meps/columns.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | 3 | #Source: https://www.meps.ahrq.gov/survey_comp/hc_technical_notes.shtml 4 | meps_region_states = pd.DataFrame({'region_value': [1, 2, 3, 4], 5 | 'region_label': ['Northeast', 'Midwest', 'South', 'West'], 6 | 'state': [['Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'New Jersey', 7 | 'New York', 'Pennsylvania', 'Rhode Island', 'Vermont'], 8 | ['Indiana', 'Illinois', 'Iowa', 'Kansas', 'Michigan', 'Minnesota', 'Missouri', 9 | 'Nebraska', 'North Dakota', 'Ohio', 'South Dakota', 'Wisconsin'], 10 | ['Alabama', 'Arkansas', 'Delaware', 'District of Columbia', 'Florida', 11 | 'Georgia', 'Kentucky', 'Louisiana', 'Maryland', 'Mississippi', 'North Carolina', 'Oklahoma', 'South Carolina', 'Tennessee', 'Texas', 'Virginia', 12 | 'West Virginia'], 13 | ['Alaska', 'Arizona', 'California', 'Colorado', 'Hawaii', 'Idaho', 'Montana', 14 | 'Nevada', 'New Mexico', 'Oregon', 'Utah', 'Washington', 'Wyoming']] 15 | }).set_index(['region_value'])['state'].apply(pd.Series).stack().reset_index(level=1, drop=True).reset_index().rename(columns={0:'state'}).astype(str) 16 | 17 | 18 | 19 | d_col_names=["DUID", "PID", "DUPERSID", "PANEL", "FAMID31", "FAMID42", 20 | "FAMID53", "FAMID18", "FAMIDYR", "CPSFAMID", "FCSZ1231", 21 | "FCRP1231", "RULETR31", "RULETR42", "RULETR53", "RULETR18", 22 | "RUSIZE31", "RUSIZE42", "RUSIZE53", "RUSIZE18", "RUCLAS31", 23 | "RUCLAS42", "RUCLAS53", "RUCLAS18", "FAMSZE31", "FAMSZE42", 24 | "FAMSZE53", "FAMSZE18", "FMRS1231", "FAMS1231", "FAMSZEYR", 25 | "FAMRFPYR", "REGION31", "REGION42", "REGION53", "REGION18", 26 | "REFPRS31", "REFPRS42", "REFPRS53", "REFPRS18", "RESP31", 27 | "RESP42", "RESP53", "RESP18", "PROXY31", "PROXY42", 28 | "PROXY53", "PROXY18", "INTVLANG", "BEGRFM31", 
"BEGRFY31", 29 | "ENDRFM31", "ENDRFY31", "BEGRFM42", "BEGRFY42", "ENDRFM42", 30 | "ENDRFY42", "BEGRFM53", "BEGRFY53", "ENDRFM53", "ENDRFY53", 31 | "ENDRFM18", "ENDRFY18", "KEYNESS", "INSCOP31", "INSCOP42", 32 | "INSCOP53", "INSCOP18", "INSC1231", "INSCOPE", "ELGRND31", 33 | "ELGRND42", "ELGRND53", "ELGRND18", "PSTATS31", "PSTATS42", 34 | "PSTATS53", "RURSLT31", "RURSLT42", "RURSLT53", "AGE31X", 35 | "AGE42X", "AGE53X", "AGE18X", "AGELAST", "DOBMM", "DOBYY", 36 | "SEX", "RACEV1X", "RACEV2X", "RACEAX", "RACEBX", "RACEWX", 37 | "RACETHX", "HISPANX", "HISPNCAT", "MARRY31X", "MARRY42X", 38 | "MARRY53X", "MARRY18X", "SPOUID31", "SPOUID42", "SPOUID53", 39 | "SPOUID18", "SPOUIN31", "SPOUIN42", "SPOUIN53", "SPOUIN18", 40 | "EDUCYR", "HIDEG", "FTSTU31X", "FTSTU42X", "FTSTU53X", 41 | "FTSTU18X", "ACTDTY31", "ACTDTY42", "ACTDTY53", "HONRDC31", 42 | "HONRDC42", "REFRL31X", "REFRL42X", "REFRL53X", "REFRL18X", 43 | "OTHLANG", "LANGSPK", "HWELLSPE", "OTHLGSPK", "WHTLGSPK", 44 | "HWELLSPK", "BORNUSA", "YRSINUS", "MOPID31X", "MOPID42X", 45 | "MOPID53X", "DAPID31X", "DAPID42X", "DAPID53X", "RTHLTH31", 46 | "RTHLTH42", "RTHLTH53", "MNHLTH31", "MNHLTH42", "MNHLTH53", 47 | "HIBPDX", "HIBPAGED", "BPMLDX", "CHDDX", "CHDAGED", 48 | "ANGIDX", "ANGIAGED", "MIDX", "MIAGED", "OHRTDX", 49 | "OHRTAGED", "OHRTTYPE", "STRKDX", "STRKAGED", "EMPHDX", 50 | "EMPHAGED", "CHBRON31", "CHOLDX", "CHOLAGED", "CANCERDX", 51 | "CABLADDR", "CABREAST", "CACERVIX", "CACOLON", "CALUNG", 52 | "CALYMPH", "CAMELANO", "CAOTHER", "CAPROSTA", "CASKINNM", 53 | "CASKINDK", "CAUTERUS", "DIABDX_M18", "DIABAGED", 54 | "JTPAIN31_M18", "ARTHDX", "ARTHTYPE", "ARTHAGED", "ASTHDX", 55 | "ASTHAGED", "ASSTIL31", "ASATAK31", "ASTHEP31", "ASACUT31", 56 | "ASMRCN31", "ASPREV31", "ASDALY31", "ASPKFL31", "ASEVFL31", 57 | "ASWNFL31", "ADHDADDX", "ADHDAGED", "IADLHP31", "ADLHLP31", 58 | "AIDHLP31", "WLKLIM31", "LFTDIF31", "STPDIF31", "WLKDIF31", 59 | "MILDIF31", "STNDIF31", "BENDIF31", "RCHDIF31", "FNGRDF31", 60 | "ACTLIM31", "WRKLIM31", 
"HSELIM31", "SCHLIM31", "UNABLE31", 61 | "SOCLIM31", "COGLIM31", "DFHEAR42", "DFSEE42", "DFCOG42", 62 | "DFWLKC42", "DFDRSB42", "DFERND42", "ANYLMI18", "CHPMED42", 63 | "CHPMHB42", "CHPMCN42", "CHSERV42", "CHSRHB42", "CHSRCN42", 64 | "CHLIMI42", "CHLIHB42", "CHLICO42", "CHTHER42", "CHTHHB42", 65 | "CHTHCO42", "CHCOUN42", "CHEMPB42", "CSHCN42", "MESHGT42", 66 | "WHNHGT42", "MESWGT42", "WHNWGT42", "CHBMIX42", "MESVIS42", 67 | "EATHLT42", "WHNEAT42", "PHYSCL42", "WHNPHY42", "SAFEST42", 68 | "WHNSAF42", "BOOST42", "WHNBST42", "LAPBLT42", "WHNLAP42", 69 | "HELMET42", "WHNHEL42", "NOSMOK42", "WHNSMK42", "TIMALN42", 70 | "LSTETH53", "PHYEXE53", "OFTSMK53", "SAQELIG", "ADSEX42", 71 | "ADAGE42", "ADPROX42", "ADGENH42", "ADDAYA42", "ADCLIM42", 72 | "ADACLS42", "ADWKLM42", "ADEMLS42", "ADMWCF42", "ADPAIN42", 73 | "ADPCFL42", "ADENGY42", "ADPRST42", "ADSOCA42", "VPCS42", 74 | "VMCS42", "VRFLAG42", "ADNERV42", "ADHOPE42", "ADREST42", 75 | "ADSAD42", "ADEFRT42", "ADWRTH42", "K6SUM42", "ADINTR42", 76 | "ADDPRS42", "PHQ242", "ADBRTC42", "ADMDVT42", "ADFLST42", 77 | "ADWGHD42", "ADBMI42", "ADWTAD42", "ADKALC42", "ADRNK542", 78 | "ADRNK442", "ADSTAL42", "ADTBAC42", "ADOFTB42", "ADQTTB42", 79 | "ADQTMD42", "ADQTHP42", "ADMOOD42", "ADBPCK42", "ADCHLC42", 80 | "ADPNEU42", "ADSHNG42", "ADNOAP42", "ADDSCU42", "ADCOLN42", 81 | "ADCLNS42", "ADSGMD42", "ADBLDS42", "ADPROS42", "ADPSAG42", 82 | "ADUTRM42", "ADPAP42", "ADPAPG42", "ADOSTP42", "ADBNDN42", 83 | "ADBRST42", "ADMMGR42", "ADCMPM42", "ADCMPY42", "ADLANG42", 84 | "VSAQELIG", "VACTDY53", "VAPRHT53", "VACOPD53", "VADERM53", 85 | "VAGERD53", "VAHRLS53", "VABACK53", "VAJTPN53", "VARTHR53", 86 | "VAGOUT53", "VANECK53", "VATMD53", "VAPTSD53", "VALCOH53", 87 | "VABIPL53", "VADEPR53", "VAMOOD53", "VAPROS53", "VARHAB53", 88 | "VAMNHC53", "VAGCNS53", "VARXMD53", "VACRGV53", "VAMOBL53", 89 | "VACOST53", "VARECM53", "VAREP53", "VAWAIT53", "VALOCT53", 90 | "VANTWK53", "VANEED53", "VAOUT53", "VAPAST53", "VACOMP53", 91 | "VAMREC53", "VAGTRC53", 
"VACARC53", "VAPROB53", "VACARE53", 92 | "VAPACT53", "VAPCPR53", "VAPROV53", "VAPCOT53", "VAPCCO53", 93 | "VAPCRC53", "VAPCSN53", "VAPCRF53", "VAPCSO53", "VAPCOU53", 94 | "VAPCUN53", "VASPCL53", "VASPMH53", "VASPOU53", "VASPUN53", 95 | "VACMPM53", "VACMPY53", "VAPROX53", "DCSELIG", "DSDIA53", 96 | "DSA1C53", "DSFT1953", "DSFT1853", "DSFT1753", "DSFB1753", 97 | "DSFTNV53", "DSEY1953", "DSEY1853", "DSEY1753", "DSEB1753", 98 | "DSEYNV53", "DSCH1953", "DSCH1853", "DSCH1753", "DSCB1753", 99 | "DSCHNV53", "DSFL1953", "DSFL1853", "DSFL1753", "DSVB1753", 100 | "DSFLNV53", "DSKIDN53", "DSEYPR53", "DSDIET53", "DSMED53", 101 | "DSINSU53", "DSCPCP53", "DSCNPC53", "DSCPHN53", "DSCINT53", 102 | "DSCGRP53", "DSCONF53", "DSPRX53", "DDNWRK18", "OTHDYS18", 103 | "OTHNDD18", "ACCELI42", "HAVEUS42", "PRACTP42", 104 | "YNOUSC42_M18", "PROVTY42_M18", "PLCTYP42", "TMTKUS42", 105 | "TYPEPE42", "LOCATN42", "HSPLAP42", "WHITPR42", "BLCKPR42", 106 | "ASIANP42", "NATAMP42", "PACISP42", "OTHRCP42", "GENDRP42", 107 | "PHNREG42", "OFFHOU42", "AFTHOU42", "TREATM42", "DECIDE42", 108 | "EXPLOP42", "PRVSPK42", "DLAYCA42", "AFRDCA42", "DLAYDN42", 109 | "AFRDDN42", "DLAYPM42", "AFRDPM42", "EMPST31", "EMPST42", 110 | "EMPST53", "RNDFLG31", "MORJOB31", "MORJOB42", "MORJOB53", 111 | "EVRWRK", "HRWG31X", "HRWG42X", "HRWG53X", "HRWGIM31", 112 | "HRWGIM42", "HRWGIM53", "HRHOW31", "HRHOW42", "HRHOW53", 113 | "DIFFWG31", "DIFFWG42", "DIFFWG53", "NHRWG31", "NHRWG42", 114 | "NHRWG53", "HOUR31", "HOUR42", "HOUR53", "TEMPJB31", 115 | "TEMPJB42", "TEMPJB53", "SSNLJB31", "SSNLJB42", "SSNLJB53", 116 | "SELFCM31", "SELFCM42", "SELFCM53", "DISVW31X", "DISVW42X", 117 | "DISVW53X", "CHOIC31", "CHOIC42", "CHOIC53", "INDCAT31", 118 | "INDCAT42", "INDCAT53", "NUMEMP31", "NUMEMP42", "NUMEMP53", 119 | "MORE31", "MORE42", "MORE53", "UNION31", "UNION42", 120 | "UNION53", "NWK31", "NWK42", "NWK53", "CHGJ3142", 121 | "CHGJ4253", "YCHJ3142", "YCHJ4253", "STJBMM31", "STJBYY31", 122 | "STJBMM42", "STJBYY42", "STJBMM53", "STJBYY53", 
"EVRETIRE", 123 | "OCCCAT31", "OCCCAT42", "OCCCAT53", "PAYVAC31", "PAYVAC42", 124 | "PAYVAC53", "SICPAY31", "SICPAY42", "SICPAY53", "PAYDR31", 125 | "PAYDR42", "PAYDR53", "RETPLN31", "RETPLN42", "RETPLN53", 126 | "BSNTY31", "BSNTY42", "BSNTY53", "JOBORG31", "JOBORG42", 127 | "JOBORG53", "HELD31X", "HELD42X", "HELD53X", "OFFER31X", 128 | "OFFER42X", "OFFER53X", "OFREMP31", "OFREMP42", "OFREMP53", 129 | "EMPST31H", "EMPST42H", "EMPST53H", "SLFCM31H", "SLFCM42H", 130 | "SLFCM53H", "NMEMP31H", "NMEMP42H", "NMEMP53H", "MORE31H", 131 | "MORE42H", "MORE53H", "INDCT31H", "INDCT42H", "INDCT53H", 132 | "OCCCT31H", "OCCCT42H", "OCCCT53H", "HOUR31H", "HOUR42H", 133 | "HOUR53H", "JBORG31H", "JBORG42H", "JBORG53H", "UNION31H", 134 | "UNION42H", "UNION53H", "BSNTY31H", "BSNTY42H", "BSNTY53H", 135 | "HRWG31H", "HRWG42H", "HRWG53H", "CMJHLD31", "CMJHLD42", 136 | "CMJHLD53", "OFFER31H", "OFFER42H", "OFFER53H", "OFEMP31H", 137 | "OFEMP42H", "OFEMP53H", "PYVAC31H", "PYVAC42H", "PYVAC53H", 138 | "SCPAY31H", "SCPAY42H", "SCPAY53H", "PAYDR31H", "PAYDR42H", 139 | "PAYDR53H", "RTPLN31H", "RTPLN42H", "RTPLN53H", "FILEDR18", 140 | "WILFIL18", "FLSTAT18", "FILER18", "JTINRU18", "JNTPID18", 141 | "TAXFRM18", "FOODST18", "FOODMN18", "FOODVL18", "TTLP18X", 142 | "FAMINC18", "POVCAT18", "POVLEV18", "WAGEP18X", "WAGIMP18", 143 | "BUSNP18X", "BUSIMP18", "UNEMP18X", "UNEIMP18", "WCMPP18X", 144 | "WCPIMP18", "INTRP18X", "INTIMP18", "DIVDP18X", "DIVIMP18", 145 | "SALEP18X", "SALIMP18", "PENSP18X", "PENIMP18", "SSECP18X", 146 | "SSCIMP18", "TRSTP18X", "TRTIMP18", "VETSP18X", "VETIMP18", 147 | "IRASP18X", "IRAIMP18", "ALIMP18X", "ALIIMP18", "CHLDP18X", 148 | "CHLIMP18", "CASHP18X", "CSHIMP18", "SSIP18X", "SSIIMP18", 149 | "PUBP18X", "PUBIMP18", "OTHRP18X", "OTHIMP18", "HIEUIDX", 150 | "TRIJA18X", "TRIFE18X", "TRIMA18X", "TRIAP18X", "TRIMY18X", 151 | "TRIJU18X", "TRIJL18X", "TRIAU18X", "TRISE18X", "TRIOC18X", 152 | "TRINO18X", "TRIDE18X", "MCRJA18", "MCRFE18", "MCRMA18", 153 | "MCRAP18", "MCRMY18", 
"MCRJU18", "MCRJL18", "MCRAU18", 154 | "MCRSE18", "MCROC18", "MCRNO18", "MCRDE18", "MCRJA18X", 155 | "MCRFE18X", "MCRMA18X", "MCRAP18X", "MCRMY18X", "MCRJU18X", 156 | "MCRJL18X", "MCRAU18X", "MCRSE18X", "MCROC18X", "MCRNO18X", 157 | "MCRDE18X", "MCDJA18", "MCDFE18", "MCDMA18", "MCDAP18", 158 | "MCDMY18", "MCDJU18", "MCDJL18", "MCDAU18", "MCDSE18", 159 | "MCDOC18", "MCDNO18", "MCDDE18", "MCDJA18X", "MCDFE18X", 160 | "MCDMA18X", "MCDAP18X", "MCDMY18X", "MCDJU18X", "MCDJL18X", 161 | "MCDAU18X", "MCDSE18X", "MCDOC18X", "MCDNO18X", "MCDDE18X", 162 | "GVAJA18", "GVAFE18", "GVAMA18", "GVAAP18", "GVAMY18", 163 | "GVAJU18", "GVAJL18", "GVAAU18", "GVASE18", "GVAOC18", 164 | "GVANO18", "GVADE18", "GVBJA18", "GVBFE18", "GVBMA18", 165 | "GVBAP18", "GVBMY18", "GVBJU18", "GVBJL18", "GVBAU18", 166 | "GVBSE18", "GVBOC18", "GVBNO18", "GVBDE18", "GVCJA18", 167 | "GVCFE18", "GVCMA18", "GVCAP18", "GVCMY18", "GVCJU18", 168 | "GVCJL18", "GVCAU18", "GVCSE18", "GVCOC18", "GVCNO18", 169 | "GVCDE18", "VAPJA18", "VAPFE18", "VAPMA18", "VAPAP18", 170 | "VAPMY18", "VAPJU18", "VAPJL18", "VAPAU18", "VAPSE18", 171 | "VAPOC18", "VAPNO18", "VAPDE18", "IHSJA18", "IHSFE18", 172 | "IHSMA18", "IHSAP18", "IHSMY18", "IHSJU18", "IHSJL18", 173 | "IHSAU18", "IHSSE18", "IHSOC18", "IHSNO18", "IHSDE18", 174 | "PUBJA18X", "PUBFE18X", "PUBMA18X", "PUBAP18X", "PUBMY18X", 175 | "PUBJU18X", "PUBJL18X", "PUBAU18X", "PUBSE18X", "PUBOC18X", 176 | "PUBNO18X", "PUBDE18X", "PEGJA18", "PEGFE18", "PEGMA18", 177 | "PEGAP18", "PEGMY18", "PEGJU18", "PEGJL18", "PEGAU18", 178 | "PEGSE18", "PEGOC18", "PEGNO18", "PEGDE18", "PDKJA18", 179 | "PDKFE18", "PDKMA18", "PDKAP18", "PDKMY18", "PDKJU18", 180 | "PDKJL18", "PDKAU18", "PDKSE18", "PDKOC18", "PDKNO18", 181 | "PDKDE18", "PNGJA18", "PNGFE18", "PNGMA18", "PNGAP18", 182 | "PNGMY18", "PNGJU18", "PNGJL18", "PNGAU18", "PNGSE18", 183 | "PNGOC18", "PNGNO18", "PNGDE18", "POGJA18", "POGFE18", 184 | "POGMA18", "POGAP18", "POGMY18", "POGJU18", "POGJL18", 185 | "POGAU18", "POGSE18", "POGOC18", 
"POGNO18", "POGDE18", 186 | "POEJA18", "POEFE18", "POEMA18", "POEAP18", "POEMY18", 187 | "POEJU18", "POEJL18", "POEAU18", "POESE18", "POEOC18", 188 | "POENO18", "POEDE18", "PNEJA18", "PNEFE18", "PNEMA18", 189 | "PNEAP18", "PNEMY18", "PNEJU18", "PNEJL18", "PNEAU18", 190 | "PNESE18", "PNEOC18", "PNENO18", "PNEDE18", "PRXJA18", 191 | "PRXFE18", "PRXMA18", "PRXAP18", "PRXMY18", "PRXJU18", 192 | "PRXJL18", "PRXAU18", "PRXSE18", "PRXOC18", "PRXNO18", 193 | "PRXDE18", "PRIJA18", "PRIFE18", "PRIMA18", "PRIAP18", 194 | "PRIMY18", "PRIJU18", "PRIJL18", "PRIAU18", "PRISE18", 195 | "PRIOC18", "PRINO18", "PRIDE18", "HPEJA18", "HPEFE18", 196 | "HPEMA18", "HPEAP18", "HPEMY18", "HPEJU18", "HPEJL18", 197 | "HPEAU18", "HPESE18", "HPEOC18", "HPENO18", "HPEDE18", 198 | "HPDJA18", "HPDFE18", "HPDMA18", "HPDAP18", "HPDMY18", 199 | "HPDJU18", "HPDJL18", "HPDAU18", "HPDSE18", "HPDOC18", 200 | "HPDNO18", "HPDDE18", "HPNJA18", "HPNFE18", "HPNMA18", 201 | "HPNAP18", "HPNMY18", "HPNJU18", "HPNJL18", "HPNAU18", 202 | "HPNSE18", "HPNOC18", "HPNNO18", "HPNDE18", "HPOJA18", 203 | "HPOFE18", "HPOMA18", "HPOAP18", "HPOMY18", "HPOJU18", 204 | "HPOJL18", "HPOAU18", "HPOSE18", "HPOOC18", "HPONO18", 205 | "HPODE18", "HPXJA18", "HPXFE18", "HPXMA18", "HPXAP18", 206 | "HPXMY18", "HPXJU18", "HPXJL18", "HPXAU18", "HPXSE18", 207 | "HPXOC18", "HPXNO18", "HPXDE18", "HPRJA18", "HPRFE18", 208 | "HPRMA18", "HPRAP18", "HPRMY18", "HPRJU18", "HPRJL18", 209 | "HPRAU18", "HPRSE18", "HPROC18", "HPRNO18", "HPRDE18", 210 | "INSJA18X", "INSFE18X", "INSMA18X", "INSAP18X", "INSMY18X", 211 | "INSJU18X", "INSJL18X", "INSAU18X", "INSSE18X", "INSOC18X", 212 | "INSNO18X", "INSDE18X", "PRVEV18", "TRIEV18", "MCREV18", 213 | "MCDEV18", "VAEV18", "GVAEV18", "GVBEV18", "GVCEV18", 214 | "UNINS18", "INSCOV18", "INSURC18", "TRIST31X", "TRIST42X", 215 | "TRIST18X", "TRIPR31X", "TRIPR42X", "TRIPR18X", "TRIEX31X", 216 | "TRIEX42X", "TRIEX18X", "TRILI31X", "TRILI42X", "TRILI18X", 217 | "TRICH31X", "TRICH42X", "TRICH18X", "MCRPD31", 
"MCRPD42", 218 | "MCRPD18", "MCRPD31X", "MCRPD42X", "MCRPD18X", "MCRPB31", 219 | "MCRPB42", "MCRPB18", "MCRPHO31", "MCRPHO42", "MCRPHO18", 220 | "MCDHMO31", "MCDHMO42", "MCDHMO18", "MCDMC31", "MCDMC42", 221 | "MCDMC18", "PRVHMO31", "PRVHMO42", "PRVHMO18", "FSAGT31", 222 | "HASFSA31", "PFSAMT31", "PREVCOVR", "MORECOVR", "TRICR31X", 223 | "TRICR42X", "TRICR53X", "TRICR18X", "TRIAT31X", "TRIAT42X", 224 | "TRIAT53X", "TRIAT18X", "MCAID31", "MCAID42", "MCAID53", 225 | "MCAID18", "MCAID31X", "MCAID42X", "MCAID53X", "MCAID18X", 226 | "MCARE31", "MCARE42", "MCARE53", "MCARE18", "MCARE31X", 227 | "MCARE42X", "MCARE53X", "MCARE18X", "MCDAT31X", "MCDAT42X", 228 | "MCDAT53X", "MCDAT18X", "GOVTA31", "GOVTA42", "GOVTA53", 229 | "GOVTA18", "GOVAAT31", "GOVAAT42", "GOVAAT53", "GOVAAT18", 230 | "GOVTB31", "GOVTB42", "GOVTB53", "GOVTB18", "GOVBAT31", 231 | "GOVBAT42", "GOVBAT53", "GOVBAT18", "GOVTC31", "GOVTC42", 232 | "GOVTC53", "GOVTC18", "GOVCAT31", "GOVCAT42", "GOVCAT53", 233 | "GOVCAT18", "VAPROG31", "VAPROG42", "VAPROG53", "VAPROG18", 234 | "VAPRAT31", "VAPRAT42", "VAPRAT53", "VAPRAT18", "IHS31", 235 | "IHS42", "IHS53", "IHS18", "IHSAT31", "IHSAT42", "IHSAT53", 236 | "IHSAT18", "PRIDK31", "PRIDK42", "PRIDK53", "PRIDK18", 237 | "PRIEU31", "PRIEU42", "PRIEU53", "PRIEU18", "PRING31", 238 | "PRING42", "PRING53", "PRING18", "PRIOG31", "PRIOG42", 239 | "PRIOG53", "PRIOG18", "PRINEO31", "PRINEO42", "PRINEO53", 240 | "PRINEO18", "PRIEUO31", "PRIEUO42", "PRIEUO53", "PRIEUO18", 241 | "PRSTX31", "PRSTX42", "PRSTX53", "PRSTX18", "PRIV31", 242 | "PRIV42", "PRIV53", "PRIV18", "PRIVAT31", "PRIVAT42", 243 | "PRIVAT53", "PRIVAT18", "PUB31X", "PUB42X", "PUB53X", 244 | "PUB18X", "PUBAT31X", "PUBAT42X", "PUBAT53X", "PUBAT18X", 245 | "VERFLG31", "VERFLG42", "VERFLG18", "INS31X", "INS42X", 246 | "INS53X", "INS18X", "INSAT31X", "INSAT42X", "INSAT53X", 247 | "INSAT18X", "DENTIN31", "DENTIN42", "DENTIN53", "DNTINS31", 248 | "DNTINS18", "PMEDIN31", "PMEDIN42", "PMEDIN53", "PMDINS31", 249 | "PMDINS18", 
"PROBPY42", "CRFMPY42", "PYUNBL42", "PMEDUP31", 250 | "PMEDUP42", "PMEDUP53", "PMEDPY31", "PMEDPY42", "PMEDPY53", 251 | "TOTTCH18", "TOTEXP18", "TOTSLF18", "TOTMCR18", "TOTMCD18", 252 | "TOTPRV18", "TOTVA18", "TOTTRI18", "TOTOFD18", "TOTSTL18", 253 | "TOTWCP18", "TOTOPR18", "TOTOPU18", "TOTOSR18", "TOTPTR18", 254 | "TOTOTH18", "OBTOTV18", "OBVTCH18", "OBVEXP18", "OBVSLF18", 255 | "OBVMCR18", "OBVMCD18", "OBVPRV18", "OBVVA18", "OBVTRI18", 256 | "OBVOFD18", "OBVSTL18", "OBVWCP18", "OBVOPR18", "OBVOPU18", 257 | "OBVOSR18", "OBVPTR18", "OBVOTH18", "OBDRV18", "OBDTCH18", 258 | "OBDEXP18", "OBDSLF18", "OBDMCR18", "OBDMCD18", "OBDPRV18", 259 | "OBDVA18", "OBDTRI18", "OBDOFD18", "OBDSTL18", "OBDWCP18", 260 | "OBDOPR18", "OBDOPU18", "OBDOSR18", "OBDPTR18", "OBDOTH18", 261 | "OPTOTV18", "OPTTCH18", "OPTEXP18", "OPTSLF18", "OPTMCR18", 262 | "OPTMCD18", "OPTPRV18", "OPTVA18", "OPTTRI18", "OPTOFD18", 263 | "OPTSTL18", "OPTWCP18", "OPTOPR18", "OPTOPU18", "OPTOSR18", 264 | "OPTPTR18", "OPTOTH18", "OPFTCH18", "OPFEXP18", "OPFSLF18", 265 | "OPFMCR18", "OPFMCD18", "OPFPRV18", "OPFVA18", "OPFTRI18", 266 | "OPFOFD18", "OPFSTL18", "OPFWCP18", "OPFOPR18", "OPFOPU18", 267 | "OPFOSR18", "OPFPTR18", "OPFOTH18", "OPDEXP18", "OPDTCH18", 268 | "OPDSLF18", "OPDMCR18", "OPDMCD18", "OPDPRV18", "OPDVA18", 269 | "OPDTRI18", "OPDOFD18", "OPDSTL18", "OPDWCP18", "OPDOPR18", 270 | "OPDOPU18", "OPDOSR18", "OPDPTR18", "OPDOTH18", "OPDRV18", 271 | "OPVTCH18", "OPVEXP18", "OPVSLF18", "OPVMCR18", "OPVMCD18", 272 | "OPVPRV18", "OPVVA18", "OPVTRI18", "OPVOFD18", "OPVSTL18", 273 | "OPVWCP18", "OPVOPR18", "OPVOPU18", "OPVOSR18", "OPVPTR18", 274 | "OPVOTH18", "OPSEXP18", "OPSTCH18", "OPSSLF18", "OPSMCR18", 275 | "OPSMCD18", "OPSPRV18", "OPSVA18", "OPSTRI18", "OPSOFD18", 276 | "OPSSTL18", "OPSWCP18", "OPSOPR18", "OPSOPU18", "OPSOSR18", 277 | "OPSPTR18", "OPSOTH18", "ERTOT18", "ERTTCH18", "ERTEXP18", 278 | "ERTSLF18", "ERTMCR18", "ERTMCD18", "ERTPRV18", "ERTVA18", 279 | "ERTTRI18", "ERTOFD18", "ERTSTL18", 
"ERTWCP18", "ERTOPR18", 280 | "ERTOPU18", "ERTOSR18", "ERTPTR18", "ERTOTH18", "ERFTCH18", 281 | "ERFEXP18", "ERFSLF18", "ERFMCR18", "ERFMCD18", "ERFPRV18", 282 | "ERFVA18", "ERFTRI18", "ERFOFD18", "ERFSTL18", "ERFWCP18", 283 | "ERFOPR18", "ERFOPU18", "ERFOSR18", "ERFPTR18", "ERFOTH18", 284 | "ERDEXP18", "ERDTCH18", "ERDSLF18", "ERDMCR18", "ERDMCD18", 285 | "ERDPRV18", "ERDVA18", "ERDTRI18", "ERDOFD18", "ERDSTL18", 286 | "ERDWCP18", "ERDOPR18", "ERDOPU18", "ERDOSR18", "ERDPTR18", 287 | "ERDOTH18", "IPDIS18", "IPTEXP18", "IPTTCH18", "IPTSLF18", 288 | "IPTMCR18", "IPTMCD18", "IPTPRV18", "IPTVA18", "IPTTRI18", 289 | "IPTOFD18", "IPTSTL18", "IPTWCP18", "IPTOPR18", "IPTOPU18", 290 | "IPTOSR18", "IPTPTR18", "IPTOTH18", "IPFEXP18", "IPFTCH18", 291 | "IPFSLF18", "IPFMCR18", "IPFMCD18", "IPFPRV18", "IPFVA18", 292 | "IPFTRI18", "IPFOFD18", "IPFSTL18", "IPFWCP18", "IPFOPR18", 293 | "IPFOPU18", "IPFOSR18", "IPFPTR18", "IPFOTH18", "IPDEXP18", 294 | "IPDTCH18", "IPDSLF18", "IPDMCR18", "IPDMCD18", "IPDPRV18", 295 | "IPDVA18", "IPDTRI18", "IPDOFD18", "IPDSTL18", "IPDWCP18", 296 | "IPDOPR18", "IPDOPU18", "IPDOSR18", "IPDPTR18", "IPDOTH18", 297 | "IPNGTD18", "DVTOT18", "DVTTCH18", "DVTEXP18", "DVTSLF18", 298 | "DVTMCR18", "DVTMCD18", "DVTPRV18", "DVTVA18", "DVTTRI18", 299 | "DVTOFD18", "DVTSTL18", "DVTWCP18", "DVTOPR18", "DVTOPU18", 300 | "DVTOSR18", "DVTPTR18", "DVTOTH18", "HHTOTD18", "HHAGD18", 301 | "HHATCH18", "HHAEXP18", "HHASLF18", "HHAMCR18", "HHAMCD18", 302 | "HHAPRV18", "HHAVA18", "HHATRI18", "HHAOFD18", "HHASTL18", 303 | "HHAWCP18", "HHAOPR18", "HHAOPU18", "HHAOSR18", "HHAPTR18", 304 | "HHAOTH18", "HHINDD18", "HHNTCH18", "HHNEXP18", "HHNSLF18", 305 | "HHNMCD18", "HHNMCR18", "HHNPRV18", "HHNVA18", "HHNTRI18", 306 | "HHNOFD18", "HHNSTL18", "HHNWCP18", "HHNOPR18", "HHNOPU18", 307 | "HHNOSR18", "HHNPTR18", "HHNOTH18", "HHINFD18", "VISEXP18", 308 | "VISTCH18", "VISSLF18", "VISMCR18", "VISMCD18", "VISPRV18", 309 | "VISVA18", "VISTRI18", "VISOFD18", "VISSTL18", "VISWCP18", 310 | 
"VISOPR18", "VISOPU18", "VISOSR18", "VISPTR18", "VISOTH18", 311 | "OTHTCH18", "OTHEXP18", "OTHSLF18", "OTHMCR18", "OTHMCD18", 312 | "OTHPRV18", "OTHVA18", "OTHTRI18", "OTHOFD18", "OTHSTL18", 313 | "OTHWCP18", "OTHOPR18", "OTHOPU18", "OTHOSR18", "OTHPTR18", 314 | "OTHOTH18", "RXTOT18", "RXEXP18", "RXSLF18", "RXMCR18", 315 | "RXMCD18", "RXPRV18", "RXVA18", "RXTRI18", "RXOFD18", 316 | "RXSTL18", "RXWCP18", "RXOPR18", "RXOPU18", "RXOSR18", 317 | "RXPTR18", "RXOTH18", "PERWT18F", "FAMWT18F", "FAMWT18C", 318 | "SAQWT18F", "DIABW18F", "VSAQW18F", "VARSTR", "VARPSU"] 319 | 320 | d_col_spaces = [(0,7), 321 | (7,10), 322 | (10,20), 323 | (20,22), 324 | (22,24), 325 | (24,26), 326 | (26,28), 327 | (28,30), 328 | (30,32), 329 | (32,34), 330 | (34,36), 331 | (36,38), 332 | (38,40), 333 | (40,42), 334 | (42,44), 335 | (44,47), 336 | (47,49), 337 | (49,51), 338 | (51,53), 339 | (53,55), 340 | (55,57), 341 | (57,59), 342 | (59,61), 343 | (61,62), 344 | (62,64), 345 | (64,66), 346 | (66,68), 347 | (68,70), 348 | (70,72), 349 | (72,74), 350 | (74,76), 351 | (76,77), 352 | (77,79), 353 | (79,81), 354 | (81,83), 355 | (83,85), 356 | (85,88), 357 | (88,91), 358 | (91,94), 359 | (94,97), 360 | (97,98), 361 | (98,99), 362 | (99,100), 363 | (100,101), 364 | 
(101,103),(103,105),(105,107),(107,108),(108,110),(110,112),(112,116),(116,118),(118,122),(122,124),(124,128),(128,130),(130,134),(134,136),(136,140),(140,142),(142,146),(146,148),(148,152),(152,153),(153,154),(154,155),(155,156),(156,157),(157,158),(158,159),(159,160),(160,161),(161,162),(162,163),(163,165),(165,167),(167,169),(169,171),(171,173),(173,175),(175,177),(177,179),(179,181),(181,183),(183,185),(185,187),(187,191),(191,192),(192,193),(193,195),(195,196),(196,197),(197,198),(198,199),(199,200),(200,201),(201,203),(203,205),(205,207),(207,209),(209,212),(212,215),(215,218),(218,221),(221,224),(224,227),(227,230),(230,233),(233,236),(236,239),(239,241),(241,243),(243,245),(245,247),(247,249),(249,251),(251,253),(253,256),(256,259),(259,261),(261,263),(263,265),(265,267),(267,270),(270,272),(272,275),(275,278),(278,280),(280,282),(282,285),(285,287),(287,290),(290,293),(293,296),(296,299),(299,302),(302,305),(305,307),(307,309),(309,311),(311,313),(313,315),(315,317),(317,320),(320,322),(322,324),(324,327),(327,329),(329,332),(332,334),(334,337),(337,339),(339,342),(342,344),(344,346),(346,349),(349,351),(351,354),(354,356),(356,358),(358,361),(361,363),(363,366),(366,368),(368,370),(370,372),(372,374),(374,376),(376,378),(378,380),(380,382),(382,384),(384,386),(386,388),(388,390),(390,393),(393,395),(395,397),(397,400),(400,402),(402,405),(405,407),(407,409),(409,412),(412,415),(415,417),(417,420),(420,422),(422,425),(425,427),(427,430),(430,432),(432,434),(434,437),(437,439),(439,441),(441,443),(443,445),(445,447),(447,449),(449,451),(451,453),(453,455),(455,457),(457,459),(459,461),(461,463),(463,465),(465,467),(467,469),(469,471),(471,473),(473,475),(475,477),(477,479),(479,481),(481,483),(483,485),(485,487),(487,489),(489,492),(492,495),(495,497),(497,499),(499,502),(502,504),(504,506),(506,509),(509,511),(511,513),(513,516),(516,518),(518,520),(520,523),(523,525),(525,527),(527,530),(530,532),(532,535),(535,537),(537,542),(542,545),(545,548),(548,550),
(550,553),(553,555),(555,557),(557,559),(559,562),(562,564),(564,567),(567,569),(569,572),(572,574),(574,577),(577,579),(579,582),(582,584),(584,586),(586,588),(588,589),(589,592),(592,595),(595,598),(598,601),(601,604),(604,607),(607,610),(610,613),(613,616),(616,619),(619,622),(622,625),(625,628),(628,631),(631,634),(634,640),(640,646),(646,648),(648,651),(651,654),(654,657),(657,660),(660,663),(663,666),(666,669),(669,672),(672,675),(675,678),(678,681),(681,684),(684,687),(687,690),(690,695),(695,698),(698,701),(701,704),(704,707),(707,710),(710,713),(713,716),(716,719),(719,722),(722,725),(725,728),(728,731),(731,734),(734,737),(737,740),(740,743),(743,746),(746,749),(749,752),(752,755),(755,758),(758,761),(761,764),(764,767),(767,770),(770,773),(773,776),(776,779),(779,782),(782,785),(785,788),(788,792),(792,794),(794,795),(795,797),(797,800),(800,803),(803,806),(806,809),(809,812),(812,815),(815,818),(818,821),(821,824),(824,827),(827,830),(830,833),(833,836),(836,839),(839,842),(842,845),(845,848),(848,851),(851,854),(854,857),(857,860),(860,863),(863,866),(866,869),(869,872),(872,875),(875,878),(878,881),(881,884),(884,887),(887,890),(890,893),(893,896),(896,899),(899,902),(902,905),(905,908),(908,911),(911,914),(914,917),(917,920),(920,923),(923,926),(926,929),(929,931),(931,934),(934,937),(937,940),(940,942),(942,945),(945,948),(948,951),(951,954),(954,957),(957,961),(961,964),(964,965),(965,967),(967,970),(970,973),(973,976),(976,979),(979,982),(982,985),(985,988),(988,991),(991,994),(994,997),(997,1000), 365 | 
(1000,1003),(1003,1006),(1006,1009),(1009,1012),(1012,1015),(1015,1018),(1018,1021),(1021,1024),(1024,1027),(1027,1030),(1030,1033),(1033,1036),(1036,1039),(1039,1042),(1042,1045),(1045,1047),(1047,1049),(1049,1051),(1051,1053),(1053,1055),(1055,1058),(1058,1061),(1061,1064),(1064,1067),(1067,1070),(1070,1072),(1072,1074),(1074,1076),(1076,1078),(1078,1080),(1080,1082),(1082,1084),(1084,1086),(1086,1088),(1088,1090),(1090,1092),(1092,1094),(1094,1096),(1096,1098),(1098,1100),(1100,1102),(1102,1104),(1104,1106),(1106,1108),(1108,1110),(1110,1112),(1112,1114),(1114,1116),(1116,1118),(1118,1120),(1120,1122),(1122,1124),(1124,1126),(1126,1128),(1128,1130),(1130,1133),(1133,1136),(1136,1139),(1139,1141),(1141,1144),(1144,1147),(1147,1150),(1150,1153),(1153,1159),(1159,1165),(1165,1171),(1171,1172),(1172,1173),(1173,1174),(1174,1177),(1177,1180),(1180,1183),(1183,1186),(1186,1189),(1189,1192),(1192,1198),(1198,1204),(1204,1210),(1210,1213),(1213,1216),(1216,1219),(1219,1222),(1222,1225),(1225,1228),(1228,1231),(1231,1234),(1234,1237),(1237,1240),(1240,1243),(1243,1246),(1246,1249),(1249,1252),(1252,1255),(1255,1258),(1258,1261),(1261,1264),(1264,1267),(1267,1270),(1270,1273),(1273,1276),(1276,1279),(1279,1282),(1282,1285),(1285,1288),(1288,1291),(1291,1294),(1294,1297),(1297,1300),(1300,1303),(1303,1305),(1305,1307),(1307,1310),(1310,1313),(1313,1316),(1316,1319),(1319,1322),(1322,1326),(1326,1329),(1329,1333),(1333,1336),(1336,1340),(1340,1343),(1343,1346),(1346,1349),(1349,1352),(1352,1355),(1355,1358),(1358,1361),(1361,1364),(1364,1367),(1367,1370),(1370,1373),(1373,1376),(1376,1379),(1379,1382),(1382,1385),(1385,1388),(1388,1391),(1391,1394),(1394,1397),(1397,1400),(1400,1403),(1403,1406),(1406,1409),(1409,1412),(1412,1415),(1415,1418),(1418,1421),(1421,1424),(1424,1427),(1427,1430),(1430,1433),(1433,1435),(1435,1437),(1437,1439),(1439,1441),(1441,1443),(1443,1445),(1445,1448),(1448,1451),(1451,1454),(1454,1456),(1456,1458),(1458,1460),(1460,1462),(1462,1464),(1464,14
66),(1466,1468),(1468,1470),(1470,1472),(1472,1475),(1475,1478),(1478,1481),(1481,1483),(1483,1485),(1485,1487),(1487,1489),(1489,1491),(1491,1493),(1493,1495),(1495,1497),(1497,1499),(1499,1505),(1505,1511),(1511,1517),(1517,1519),(1519,1521),(1521,1523),(1523,1525),(1525,1527),(1527,1529),(1529,1531),(1531,1533),(1533,1535),(1535,1537),(1537,1539),(1539,1541),(1541,1543),(1543,1545),(1545,1547),(1547,1549),(1549,1551),(1551,1553),(1553,1555),(1555,1557),(1557,1559),(1559,1561),(1561,1563),(1563,1565),(1565,1567),(1567,1569),(1569,1572),(1572,1574),(1574,1576),(1576,1578),(1578,1582),(1582,1589),(1589,1596),(1596,1597),(1597,1609),(1609,1615),(1615,1616),(1616,1622),(1622,1623),(1623,1628),(1628,1629),(1629,1634),(1634,1635),(1635,1640),(1640,1641),(1641,1646),(1646,1647),(1647,1654),(1654,1655),(1655,1660),(1660,1661),(1661,1666),(1666,1667),(1667,1674),(1674,1675),(1675,1680),(1680,1681),(1681,1686),(1686,1687),(1687,1692),(1692,1693),(1693,1698),(1698,1699),(1699,1704),(1704,1705),(1705,1710),(1710,1711),(1711,1716),(1716,1717),(1717,1723),(1723,1724),(1724,1733),(1733,1735),(1735,1737),(1737,1739),(1739,1741),(1741,1743),(1743,1745),(1745,1747),(1747,1749),(1749,1751),(1751,1753),(1753,1755),(1755,1757),(1757,1759),(1759,1761),(1761,1763),(1763,1765),(1765,1767),(1767,1769),(1769,1771),(1771,1773),(1773,1775),(1775,1777),(1777,1779),(1779,1781),(1781,1783),(1783,1785),(1785,1787),(1787,1789),(1789,1791),(1791,1793),(1793,1795),(1795,1797),(1797,1799),(1799,1801),(1801,1803),(1803,1805),(1805,1807),(1807,1809),(1809,1811),(1811,1813),(1813,1815),(1815,1817),(1817,1819),(1819,1821),(1821,1823),(1823,1825),(1825,1827),(1827,1829),(1829,1831),(1831,1833),(1833,1835),(1835,1837),(1837,1839),(1839,1841),(1841,1843),(1843,1845),(1845,1847),(1847,1849),(1849,1851),(1851,1853),(1853,1855),(1855,1857),(1857,1859),(1859,1861),(1861,1863),(1863,1865),(1865,1867),(1867,1869),(1869,1871),(1871,1873),(1873,1875),(1875,1877),(1877,1879),(1879,1881),(1881,1883),(1883,1885),(188
5,1887),(1887,1889),(1889,1891),(1891,1893),(1893,1895),(1895,1897),(1897,1899),(1899,1901),(1901,1903),(1903,1905),(1905,1907),(1907,1909),(1909,1911),(1911,1913),(1913,1915),(1915,1917),(1917,1919),(1919,1921),(1921,1923),(1923,1925),(1925,1927),(1927,1929),(1929,1931),(1931,1933),(1933,1935),(1935,1937),(1937,1939),(1939,1941),(1941,1943),(1943,1945),(1945,1947),(1947,1949),(1949,1951),(1951,1953),(1953,1955),(1955,1957),(1957,1959),(1959,1961),(1961,1963),(1963,1965),(1965,1967),(1967,1969),(1969,1971),(1971,1973),(1973,1975),(1975,1977),(1977,1979),(1979,1981),(1981,1983),(1983,1985),(1985,1987),(1987,1989),(1989,1991),(1991,1993),(1993,1995),(1995,1997),(1997,1999),(1999,2001),(2001,2003),(2003,2005),(2005,2007),(2007,2009),(2009,2011),(2011,2013),(2013,2015),(2015,2017),(2017,2019),(2019,2021),(2021,2023),(2023,2025),(2025,2027),(2027,2029),(2029,2031),(2031,2033),(2033,2035),(2035,2037),(2037,2039),(2039,2041),(2041,2043),(2043,2045),(2045,2047),(2047,2049),(2049,2051),(2051,2053),(2053,2055),(2055,2057),(2057,2059),(2059,2061),(2061,2063),(2063,2065),(2065,2067),(2067,2069),(2069,2071),(2071,2073),(2073,2075),(2075,2077),(2077,2079),(2079,2081),(2081,2083),(2083,2085),(2085,2087),(2087,2089),(2089,2091),(2091,2093),(2093,2095),(2095,2097),(2097,2099),(2099,2101),(2101,2103),(2103,2105),(2105,2107),(2107,2109),(2109,2111),(2111,2113),(2113,2115),(2115,2117),(2117,2119),(2119,2121),(2121,2123),(2123,2125),(2125,2127),(2127,2129),(2129,2131),(2131,2133),(2133,2135),(2135,2137),(2137,2139),(2139,2141),(2141,2143),(2143,2145),(2145,2147),(2147,2149),(2149,2151),(2151,2153),(2153,2155),(2155,2157),(2157,2159),(2159,2161),(2161,2163),(2163,2165),(2165,2167),(2167,2169),(2169,2171),(2171,2173),(2173,2175),(2175,2177),(2177,2179),(2179,2181),(2181,2183),(2183,2185),(2185,2187),(2187,2189),(2189,2191),(2191,2193),(2193,2195),(2195,2197),(2197,2199),(2199,2201),(2201,2203),(2203,2205),(2205,2207),(2207,2209),(2209,2211),(2211,2213),(2213,2215),(2215,2217),(2217,2219),
(2219,2221),(2221,2223),(2223,2225),(2225,2227),(2227,2229),(2229,2231),(2231,2233),(2233,2235),(2235,2237),(2237,2239),(2239,2241),(2241,2243),(2243,2245),(2245,2247),(2247,2249),(2249,2251),(2251,2253),(2253,2255),(2255,2257),(2257,2259),(2259,2261),(2261,2263),(2263,2265),(2265,2267),(2267,2269),(2269,2271),(2271,2273),(2273,2275),(2275,2277),(2277,2279),(2279,2281),(2281,2283),(2283,2285),(2285,2287),(2287,2289),(2289,2291),(2291,2293),(2293,2295),(2295,2297),(2297,2299),(2299,2301),(2301,2303),(2303,2305),(2305,2307),(2307,2309),(2309,2311),(2311,2313),(2313,2315),(2315,2317),(2317,2319),(2319,2321),(2321,2323),(2323,2325),(2325,2327),(2327,2329),(2329,2331),(2331,2333),(2333,2335),(2335,2337),(2337,2339),(2339,2341),(2341,2343),(2343,2345),(2345,2347),(2347,2349),(2349,2351),(2351,2353),(2353,2355),(2355,2357),(2357,2358),(2358,2359),(2359,2360),(2360,2361),(2361,2362),(2362,2363),(2363,2364),(2364,2365),(2365,2366),(2366,2367),(2367,2368),(2368,2370),(2370,2372),(2372,2374),(2374,2376),(2376,2378),(2378,2380),(2380,2382),(2382,2384),(2384,2386),(2386,2388),(2388,2390),(2390,2392),(2392,2394),(2394,2396),(2396,2398),(2398,2401),(2401,2404),(2404,2407),(2407,2410),(2410,2413),(2413,2416),(2416,2419),(2419,2422),(2422,2425),(2425,2428),(2428,2431),(2431,2434),(2434,2437),(2437,2440),(2440,2443),(2443,2446),(2446,2449),(2449,2452),(2452,2455),(2455,2458),(2458,2461),(2461,2463),(2463,2466),(2466,2470),(2470,2472),(2472,2474),(2474,2476),(2476,2478),(2478,2480),(2480,2482),(2482,2484),(2484,2486),(2486,2488),(2488,2490),(2490,2492),(2492,2494),(2494,2496),(2496,2498),(2498,2500),(2500,2502),(2502,2504),(2504,2506),(2506,2508),(2508,2510),(2510,2512),(2512,2514),(2514,2516),(2516,2518),(2518,2520),(2520,2522),(2522,2524),(2524,2526),(2526,2528),(2528,2530),(2530,2532),(2532,2534),(2534,2536),(2536,2538),(2538,2540),(2540,2542),(2542,2544),(2544,2546),(2546,2548),(2548,2550),(2550,2552),(2552,2554),(2554,2556),(2556,2558),(2558,2560),(2560,2562),(2562,2564),(2564,25
66),(2566,2568),(2568,2570),(2570,2572),(2572,2574),(2574,2576),(2576,2578),(2578,2580),(2580,2582),(2582,2584),(2584,2586),(2586,2588),(2588,2590),(2590,2592),(2592,2594),(2594,2596),(2596,2598),(2598,2600),(2600,2602),(2602,2604),(2604,2606),(2606,2608),(2608,2610),(2610,2612),(2612,2614),(2614,2616),(2616,2618),(2618,2620),(2620,2622),(2622,2624),(2624,2626),(2626,2628),(2628,2630),(2630,2632),(2632,2634),(2634,2636),(2636,2638),(2638,2640),(2640,2642),(2642,2644),(2644,2646),(2646,2648),(2648,2650),(2650,2652),(2652,2654),(2654,2656),(2656,2658),(2658,2660),(2660,2662),(2662,2664),(2664,2666),(2666,2668),(2668,2670),(2670,2672),(2672,2674),(2674,2676),(2676,2678),(2678,2680),(2680,2682),(2682,2684),(2684,2686),(2686,2688),(2688,2690),(2690,2692),(2692,2694),(2694,2696),(2696,2698),(2698,2700),(2700,2702),(2702,2704),(2704,2706),(2706,2708),(2708,2710),(2710,2712),(2712,2714),(2714,2716),(2716,2718),(2718,2720),(2720,2722),(2722,2724),(2724,2726),(2726,2728),(2728,2730),(2730,2732),(2732,2734),(2734,2736),(2736,2738),(2738,2740),(2740,2742),(2742,2744),(2744,2746),(2746,2748),(2748,2750),(2750,2752),(2752,2755),(2755,2758),(2758,2761),(2761,2768),(2768,2774),(2774,2780),(2780,2786),(2786,2792),(2792,2798),(2798,2804),(2804,2810),(2810,2815),(2815,2820),(2820,2825),(2825,2831),(2831,2836),(2836,2842),(2842,2848),(2848,2854),(2854,2857),(2857,2864),(2864,2870),(2870,2876),(2876,2882),(2882,2887),(2887,2893),(2893,2898),(2898,2903),(2903,2908),(2908,2912),(2912,2917),(2917,2922),(2922,2926),(2926,2931),(2931,2937),(2937,2942),(2942,2945),(2945,2952),(2952,2958),(2958,2964),(2964,2970),(2970,2975),(2975,2981),(2981,2986),(2986,2991),(2991,2996),(2996,3000),(3000,3005),(3005,3010),(3010,3014),(3014,3019),(3019,3025),(3025,3030),(3030,3033),(3033,3040),(3040,3046),(3046,3051),(3051,3056),(3056,3062),(3062,3068),(3068,3073),(3073,3078),(3078,3082),(3082,3087),(3087,3092),(3092,3096),(3096,3100),(3100,3105),(3105,3111),(3111,3116),(3116,3123),(3123,3129),(3129,3134),(313
4,3139),(3139,3145),(3145,3151),(3151,3156),(3156,3161),(3161,3165),(3165,3170),(3170,3175),(3175,3179),(3179,3183),(3183,3188),(3188,3194),(3194,3199),(3199,3204),(3204,3209),(3209,3213),(3213,3218),(3218,3223),(3223,3228),(3228,3233),(3233,3237),(3237,3238),(3238,3241),(3241,3245),(3245,3249),(3249,3253),(3253,3258),(3258,3263),(3263,3268),(3268,3270),(3270,3277),(3277,3283),(3283,3288),(3288,3293),(3293,3299),(3299,3304),(3304,3309),(3309,3313),(3313,3317),(3317,3321),(3321,3326),(3326,3330),(3330,3334),(3334,3339),(3339,3344),(3344,3349),(3349,3354),(3354,3359),(3359,3363),(3363,3368),(3368,3372),(3372,3377),(3377,3381),(3381,3385),(3385,3386),(3386,3389),(3389,3393),(3393,3397),(3397,3401),(3401,3406),(3406,3411),(3411,3416),(3416,3418),(3418,3424),(3424,3429),(3429,3433),(3433,3438),(3438,3443),(3443,3448),(3448,3452),(3452,3456),(3456,3460),(3460,3465),(3465,3470),(3470,3474),(3474,3478),(3478,3482),(3482,3487),(3487,3492),(3492,3498),(3498,3503),(3503,3507),(3507,3512),(3512,3517),(3517,3522),(3522,3526),(3526,3530),(3530,3534),(3534,3539),(3539,3544),(3544,3548),(3548,3552),(3552,3556),(3556,3561),(3561,3566),(3566,3571),(3571,3576),(3576,3580),(3580,3584),(3584,3588),(3588,3592),(3592,3596),(3596,3600),(3600,3601),(3601,3604),(3604,3608),(3608,3612),(3612,3615),(3615,3619),(3619,3623),(3623,3627),(3627,3629),(3629,3635),(3635,3642),(3642,3647),(3647,3653),(3653,3659),(3659,3665),(3665,3671),(3671,3676),(3676,3681),(3681,3686),(3686,3691),(3691,3696),(3696,3701),(3701,3707),(3707,3713),(3713,3719),(3719,3725),(3725,3732),(3732,3737),(3737,3743),(3743,3749),(3749,3755),(3755,3761),(3761,3766),(3766,3771),(3771,3776),(3776,3781),(3781,3786),(3786,3791),(3791,3797),(3797,3803),(3803,3809),(3809,3814),(3814,3820),(3820,3824),(3824,3829),(3829,3834),(3834,3839),(3839,3844),(3844,3848),(3848,3849),(3849,3853),(3853,3857),(3857,3861),(3861,3865),(3865,3870),(3870,3875),(3875,3880),(3880,3883),(3883,3885),(3885,3890),(3890,3895),(3895,3900),(3900,3904),(3904,3909),
(3909,3914),(3914,3918),(3918,3922),(3922,3926),(3926,3930),(3930,3933),(3933,3937),(3937,3941),(3941,3945),(3945,3950),(3950,3954),(3954,3957),(3957,3960),(3960,3966),(3966,3972),(3972,3977),(3977,3983),(3983,3989),(3989,3994),(3994,3999),(3999,4000),(4000,4004),(4004,4009),(4009,4010),(4010,4014),(4014,4018),(4018,4022),(4022,4027),(4027,4032),(4032,4035),(4035,4041),(4041,4047),(4047,4053),(4053,4058),(4058,4062),(4062,4066),(4066,4071),(4071,4073),(4073,4074),(4074,4077),(4077,4078),(4078,4081),(4081,4082),(4082,4087),(4087,4091),(4091,4096),(4096,4099),(4099,4104),(4104,4109),(4109,4113),(4113,4117),(4117,4122),(4122,4126),(4126,4129),(4129,4132),(4132,4135),(4135,4138),(4138,4141),(4141,4145),(4145,4148),(4148,4152),(4152,4156),(4156,4160),(4160,4165),(4165,4170),(4170,4175),(4175,4180),(4180,4185),(4185,4190),(4190,4194),(4194,4199),(4199,4204),(4204,4208),(4208,4213),(4213,4217),(4217,4221),(4221,4225),(4225,4230),(4230,4235),(4235,4238),(4238,4244),(4244,4249),(4249,4255),(4255,4261),(4261,4267),(4267,4272),(4272,4277),(4277,4282),(4282,4287),(4287,4291),(4291,4297),(4297,4302),(4302,4308),(4308,4314),(4314,4320),(4320,4332),(4332,4344),(4344,4356),(4356,4369),(4369,4381),(4381,4393),(4393,4397),(4397,None)] 366 | 367 | 368 | 369 | 370 | p_col_names = ['DUID', 'PID', 'DUPERSID', 'DRUGIDX', 'RXRECIDX', 'LINKIDX','PANEL', 'PURCHRD', 'RXBEGMM', 'RXBEGYRX', 'RXNAME', 371 | 'RXDRGNAM', 'RXNDC', 'RXQUANTY', 'RXFORM', 'RXFRMUNT','RXSTRENG', 'RXSTRUNT', 'RXDAYSUP', 'PHARTP1', 'PHARTP2', 372 | 'PHARTP3', 'PHARTP4', 'PHARTP5', 'PHARTP6', 'PHARTP7','PHARTP8', 'PHARTP9', 'RXFLG', 'IMPFLAG', 'PCIMPFLG', 373 | 'DIABEQUIP', 'INPCFLG', 'PREGCAT', 'TC1', 'TC1S1','TC1S1_1', 'TC1S1_2', 'TC1S2', 'TC1S2_1', 'TC1S3', 374 | 'TC1S3_1', 'TC2', 'TC2S1', 'TC2S1_1', 'TC2S1_2', 'TC2S2','TC3', 'TC3S1', 'TC3S1_1', 'RXSF18X', 'RXMR18X', 'RXMD18X', 375 | 'RXPV18X', 'RXVA18X', 'RXTR18X', 'RXOF18X', 'RXSL18X','RXWC18X', 'RXOT18X', 'RXOR18X', 'RXOU18X', 'RXXP18X', 376 | 'PERWT18F', 'VARSTR', 
'VARPSU'] 377 | 378 | p_col_spaces = [(0,7),(7,10),(10,20),(20,33),(33,52),(52,68),(68,70),(70,71),(71,74),(74,78),(78,128),(128,188),(188,199), 379 | (199,206),(206,256),(256,306),(306,356),(356,406),(406,409),(409,412),(412,414),(414,416),(416,418),(418,420),(420,422), 380 | (422,424),(424,426),(426,428),(428,429),(429,430),(430,431),(431,432),(432,433),(433,436),(436,439),(439,442),(442,445), 381 | (445,448),(448,451),(451,454),(454,456),(456,458),(458,461),(461,464),(464,467),(467,470),(470,473),(473,476),(476,479), 382 | (479,482),(482,490),(490,498),(498,506),(506,514),(514,522),(522,529),(529,536),(536,543),(543,550),(550,558),(558,566), 383 | (566,573),(573,581),(581,593),(593,597),(597,None)] 384 | -------------------------------------------------------------------------------- /src/mdt/meps/sql/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coderxio/medication-diversification/8d43a8e1c2c38826aa79a2717e969fea58f81065/src/mdt/meps/sql/__init__.py -------------------------------------------------------------------------------- /src/mdt/meps/sql/meps_reference.sql: -------------------------------------------------------------------------------- 1 | --"Sex" assignments are from MEPS, source: https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_codebook.jsp?PUFId=PROJYR15&varName=SEX 2 | 3 | SELECT DISTINCT 4 | t1.dupersid, 5 | t2.perwtf AS person_weight, 6 | t1.rxndc, 7 | CASE WHEN t2.sex = 1 THEN 'M' 8 | WHEN t2.sex = 2 THEN 'F' 9 | END AS gender, 10 | t2.agelast, --patient's last known age; advantage of using this col over other age cols is every patient has age (no NULLs) 11 | t2.region AS region_num 12 | FROM meps_prescription AS t1 13 | INNER JOIN meps_demographics AS t2 14 | ON t1.dupersid = t2.dupersid 15 | -------------------------------------------------------------------------------- /src/mdt/meps/utils.py: 
-------------------------------------------------------------------------------- 1 | import os 2 | import importlib.resources as pkg_resources 3 | from pathlib import Path 4 | from typing import Any, Callable, Optional 5 | import requests 6 | 7 | from . import sql 8 | 9 | 10 | def get_dataset( 11 | dat_name: str, 12 | dest: os.PathLike = Path.cwd(), 13 | handler: Optional[Callable[[bytes], Any]] = None 14 | ): 15 | """Download a MEPS dataset given its file name, including the extension. 16 | 17 | Args: 18 | dat_name (str): MEPS dat file name, e.g. h206adat.zip 19 | dest (os.PathLike): Destination directory for the downloaded file, defaults to the CWD 20 | handler (callable, optional): Function that receives the downloaded bytes instead of saving them to dest 21 | """ 22 | url = f'https://www.meps.ahrq.gov/mepsweb/data_files/pufs/{dat_name}' 23 | response = requests.get(url) 24 | response.raise_for_status() 25 | if handler: 26 | return handler(response.content) 27 | 28 | (Path(dest) / url.split('/')[-1]).write_bytes(response.content) 29 | 30 | return response 31 | 32 | 33 | def get_sql(file_name): 34 | meps_sql = pkg_resources.read_text(sql, file_name) 35 | return meps_sql 36 | -------------------------------------------------------------------------------- /src/mdt/run_mdt.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | from mdt.database import load_rxnorm, load_meps, load_fda, check_table 4 | from mdt import rxnorm 5 | from mdt.utils import ( 6 | rxcui_ndc_matcher, 7 | filter_by_df, 8 | filter_by_ingredient_tty, 9 | output_df, 10 | get_meps_rxcui_ndc_df, 11 | generate_module_csv, 12 | generate_module_json 13 | ) 14 | from mdt.config import MEPS_CONFIG 15 | 16 | 17 | def main(): 18 | 19 | if not check_table('rxcui_ndc'): 20 | load_rxnorm() 21 | if not check_table('meps_demographics'): 22 | load_meps() 23 | if not check_table('package'): 24 | load_fda() 25 | 26 | # TODO: replace this with the actual user settings file 27 | settings = MEPS_CONFIG 28 | 29 | # Call RxClass API to get all distinct members from multiple class
ID / relationship pairs 30 | # Do this for the include classes, then add the individual RXCUIs to include 31 | # Do this for the exclude classes, then add the individual RXCUIs to exclude 32 | # Remove the excluded RXCUIs from the included RXCUI list 33 | rxcui_include_list = rxnorm.rxclass.rxclass_get_rxcuis(settings['rxclass_include']) 34 | rxcui_include_list += settings['rxcui_include'] 35 | 36 | rxcui_exclude_list = rxnorm.rxclass.rxclass_get_rxcuis(settings['rxclass_exclude']) 37 | rxcui_exclude_list += settings['rxcui_exclude'] 38 | 39 | rxcui_ingredient_list = [i for i in rxcui_include_list if i not in rxcui_exclude_list] 40 | 41 | # First, get all medications that contain one of the ingredient RXCUIs 42 | # This returns duplicate NDCs and may not include any MINs 43 | rxcui_ingredient_df = rxcui_ndc_matcher(rxcui_ingredient_list) 44 | 45 | # Second, get all of the medications that contain one of the product RXCUIs in the df above 46 | # This may return both INs and MINs, but NDCs are still duplicated 47 | rxcui_product_list = ( 48 | rxcui_ingredient_df["medication_product_rxcui"].drop_duplicates().tolist() 49 | ) 50 | rxcui_product_df = rxcui_ndc_matcher(rxcui_product_list) 51 | 52 | # Third, emulate a SQL window function in pandas: number the rows within each NDC, preferring MIN over IN 53 | # Sorting TTY in descending order puts 'MIN' before 'IN', so keeping rn == 1 leaves one row per NDC, mapped to a MIN (preferred) or an IN 54 | # https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#top-n-rows-per-group 55 | rxcui_ndc_df = ( 56 | rxcui_product_df.assign( 57 | rn=rxcui_product_df.sort_values( 58 | ["medication_ingredient_tty"], ascending=False 59 | ) 60 | .groupby(["medication_ndc"]) 61 | .cumcount() 62 | + 1 63 | ) 64 | .query("rn < 2") 65 | .drop(columns=["rn"]) 66 | ) 67 | 68 | # Filter by dose form group (DFG) or dose form (DF) 69 | # Function expects the rxcui_ndc_df, a list of DFG or DF names, and a flag for whether to include (default) or exclude 70 | # If list of DFGs or DFs is empty, then nothing is filtered
out 71 | # https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix3.html 72 | 73 | # Filter by dose form (DF) or dose form group (DFG) 74 | rxcui_ndc_df = filter_by_df(rxcui_ndc_df, settings['dfg_df_filter']) 75 | 76 | # Filter by ingredient term type (TTY = 'IN' or 'MIN') 77 | rxcui_ndc_df = filter_by_ingredient_tty(rxcui_ndc_df, settings['ingredient_tty_filter']) 78 | 79 | # Save df to CSV 80 | output_df(rxcui_ndc_df, filename='rxcui_ndc_df_output') 81 | 82 | # Join MEPS data with rxcui_ndc_df 83 | meps_rxcui_ndc_df = get_meps_rxcui_ndc_df(rxcui_ndc_df) 84 | 85 | # Generate distribution CSVs 86 | generate_module_csv(meps_rxcui_ndc_df) 87 | 88 | # Generate JSON 89 | generate_module_json(meps_rxcui_ndc_df) 90 | 91 | if __name__ == "__main__": 92 | main() 93 | -------------------------------------------------------------------------------- /src/mdt/rxnorm/__init__.py: -------------------------------------------------------------------------------- 1 | from . import rxclass 2 | from . import utils 3 | 4 | __all__ = ['rxclass', 'utils'] 5 | -------------------------------------------------------------------------------- /src/mdt/rxnorm/rxclass.py: -------------------------------------------------------------------------------- 1 | from .utils import rxapi_get_requestor, json_extract, payload_constructor 2 | 3 | 4 | def rxclass_getclassmember_payload(class_id, relation, ttys=['IN', 'MIN']): 5 | """Build the URL string for a call to the RxClass API function getClassMembers.""" 6 | 7 | relation_dict = { 8 | 'ATC': "ATC", 9 | 'has_EPC': "DailyMed", 10 | 'has_Chemical_Structure': "DailyMed", 11 | 'has_MoA': "DailyMed", 12 | 'has_PE': "DailyMed", 13 | # 'has_EPC': "FDASPL", # this key is repeated 14 | # 'has_Chemical_Structure': "FDASPL", # this key is repeated 15 | # 'has_MoA':"FDASPL", 16 | # 'has_PE':"FDASPL", 17 | 'has_TC': "FMTSME", 18 | 'CI_with': "MEDRT", 19 | 'induces': "MEDRT", 20 | 'may_diagnose': "MEDRT", 21 | 'may_prevent': "MEDRT", 22 |
'may_treat': "MEDRT", 23 | 'CI_ChemClass': "MEDRT", 24 | 'has_active_metabolites': "MEDRT", 25 | 'has_Ingredient': "MEDRT", 26 | 'CI_MoA': "MEDRT", 27 | # 'has_MoA': "MEDRT", 28 | 'has_PK': "MEDRT", 29 | 'site_of_metabolism': "MEDRT", 30 | 'CI_PE': "MEDRT", 31 | # 'has_PE': "MEDRT", 32 | 'has_schedule': 'RXNORM', 33 | 'MESH': "MESH", 34 | 'isa_disposition': "SNOMEDCT", 35 | 'isa_structure': "SNOMEDCT", 36 | 'has_VAClass': "VA", 37 | 'has_VAClass_extended': "VA", 38 | } 39 | 40 | if relation not in relation_dict: 41 | raise ValueError("relation must be one of %r." % list(relation_dict.keys())) 42 | 43 | # For VA or RXNORM sources, ttys must include one or more of SCD, SBD, GPCK, BPCK; the default TTYs (IN, MIN) do not intersect VA or RXNORM classes. 44 | if relation_dict.get(relation) in ['VA', 'RXNORM']: 45 | ttys = list(ttys) + ['SCD', 'SBD', 'GPCK', 'BPCK'] # build a new list: list.extend() returns None and would also mutate the default argument 46 | 47 | param_dict = { 48 | 'classId': class_id, 49 | 'relaSource': relation_dict.get(relation), 50 | 'ttys': '+'.join(ttys) 51 | } 52 | 53 | # Omit the rela parameter for sources with a single relationship (MESH, ATC); see the RxClass API documentation 54 | if relation not in ['MESH', 'ATC']: 55 | param_dict['rela'] = relation 56 | 57 | payload = payload_constructor( 58 | 'https://rxnav.nlm.nih.gov/REST/rxclass/classMembers.json?', 59 | param_dict 60 | ) 61 | 62 | return payload 63 | 64 | 65 | def rxclass_get_rxcuis(rxclass_query_list): 66 | """Return a distinct list of RXCUIs from multiple RxClass queries.""" 67 | print(rxclass_query_list) 68 | rxcui_list = [] 69 | for rxclass_query in rxclass_query_list: 70 | class_id = rxclass_query['class_id'] 71 | relationship = rxclass_query['relationship'] 72 | 73 | rxclass_response = rxapi_get_requestor( 74 | rxclass_getclassmember_payload(class_id, relationship) 75 | ) 76 | rxclass_member_list = json_extract(rxclass_response, "rxcui") 77 | rxcui_list += rxclass_member_list 78 | 79 | # Remove duplicate RXCUIs 80 | rxcui_list = list(set(rxcui_list)) 81 | 82 | return
rxcui_list 83 | -------------------------------------------------------------------------------- /src/mdt/rxnorm/sql/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coderxio/medication-diversification/8d43a8e1c2c38826aa79a2717e969fea58f81065/src/mdt/rxnorm/sql/__init__.py -------------------------------------------------------------------------------- /src/mdt/rxnorm/sql/dfg_df.sql: -------------------------------------------------------------------------------- 1 | select distinct df_rxnconso.str as df, dfg_rxnconso.str as dfg 2 | 3 | -- dose form 4 | from rxnconso df_rxnconso 5 | 6 | -- dose form group 7 | left join rxnrel dfg_rxnrel on dfg_rxnrel.rxcui2 = df_rxnconso.rxcui and dfg_rxnrel.rela = 'isa' 8 | left join rxnconso dfg_rxnconso on dfg_rxnconso.rxcui = dfg_rxnrel.rxcui1 and dfg_rxnconso.sab = 'RXNORM' and dfg_rxnconso.tty = 'DFG' 9 | 10 | where df_rxnconso.sab = 'RXNORM' and df_rxnconso.tty = 'DF' 11 | -------------------------------------------------------------------------------- /src/mdt/rxnorm/sql/rxcui_ndc.sql: -------------------------------------------------------------------------------- 1 | select distinct 2 | sq.medication_ingredient_rxcui 3 | , sq.medication_ingredient_name 4 | , sq.medication_ingredient_tty 5 | , sq.medication_product_rxcui 6 | , sq.medication_product_name 7 | , sq.medication_product_tty 8 | 9 | , df_rxnconso.rxcui as dose_form_rxcui 10 | , df_rxnconso.str as dose_form_name 11 | , df_rxnconso.tty as dose_form_tty 12 | 13 | --, dfg_rxnconso.rxcui as dose_form_group_rxcui 14 | --, dfg_rxnconso.str as dose_form_group_name 15 | --, dfg_rxnconso.tty as dose_form_group_tty 16 | 17 | , ndc_rxnsat.atv as medication_ndc 18 | 19 | from ( 20 | 21 | select in_rxnconso.rxcui as medication_ingredient_rxcui 22 | , in_rxnconso.str as medication_ingredient_name 23 | , in_rxnconso.tty as medication_ingredient_tty 24 | , scd_rxnconso.rxcui as medication_product_rxcui 
25 | , scd_rxnconso.str as medication_product_name 26 | , scd_rxnconso.tty as medication_product_tty 27 | 28 | -- medication ingredient (IN) 29 | from rxnconso in_rxnconso 30 | 31 | -- medication product (SCDC -> SCD) 32 | left join rxnrel scdc_rxnrel on scdc_rxnrel.rxcui2 = in_rxnconso.rxcui and scdc_rxnrel.rela = 'ingredient_of' 33 | left join rxnconso scdc_rxnconso on scdc_rxnconso.rxcui = scdc_rxnrel.rxcui1 and scdc_rxnconso.sab = 'RXNORM' and scdc_rxnconso.tty = 'SCDC' 34 | left join rxnrel scd_rxnrel on scd_rxnrel.rxcui2 = scdc_rxnrel.rxcui1 and scd_rxnrel.rela = 'constitutes' 35 | left join rxnconso scd_rxnconso on scd_rxnconso.rxcui = scd_rxnrel.rxcui1 and scd_rxnconso.sab = 'RXNORM' and scd_rxnconso.tty = 'SCD' 36 | 37 | where in_rxnconso.tty = 'IN' 38 | and in_rxnconso.sab = 'RXNORM' 39 | 40 | union all 41 | 42 | select in_rxnconso.rxcui as medication_ingredient_rxcui 43 | , in_rxnconso.str as medication_ingredient_name 44 | , in_rxnconso.tty as medication_ingredient_tty 45 | , sbd_rxnconso.rxcui as medication_product_rxcui 46 | , sbd_rxnconso.str as medication_product_name 47 | , sbd_rxnconso.tty as medication_product_tty 48 | 49 | -- medication ingredient (IN) 50 | from rxnconso in_rxnconso 51 | 52 | -- medication product (BN -> SBD) 53 | left join rxnrel bn_rxnrel on bn_rxnrel.rxcui2 = in_rxnconso.rxcui and bn_rxnrel.rela = 'has_tradename' 54 | left join rxnconso bn_rxnconso on bn_rxnconso.rxcui = bn_rxnrel.rxcui1 and bn_rxnconso.sab = 'RXNORM' and bn_rxnconso.tty = 'BN' 55 | left join rxnrel sbd_rxnrel on sbd_rxnrel.rxcui2 = bn_rxnrel.rxcui1 and sbd_rxnrel.rela = 'ingredient_of' 56 | left join rxnconso sbd_rxnconso on sbd_rxnconso.rxcui = sbd_rxnrel.rxcui1 and sbd_rxnconso.sab = 'RXNORM' and sbd_rxnconso.tty = 'SBD' 57 | 58 | where in_rxnconso.tty = 'IN' 59 | and in_rxnconso.sab = 'RXNORM' 60 | 61 | union all 62 | 63 | select in_rxnconso.rxcui as medication_ingredient_rxcui 64 | , in_rxnconso.str as medication_ingredient_name 65 | , in_rxnconso.tty as 
medication_ingredient_tty 66 | , gpck_rxnconso.rxcui as medication_product_rxcui 67 | , gpck_rxnconso.str as medication_product_name 68 | , gpck_rxnconso.tty as medication_product_tty 69 | 70 | -- medication ingredient (IN) 71 | from rxnconso in_rxnconso 72 | 73 | -- medication product (SCDC -> SCD -> GPCK) 74 | left join rxnrel scdc_rxnrel on scdc_rxnrel.rxcui2 = in_rxnconso.rxcui and scdc_rxnrel.rela = 'ingredient_of' 75 | left join rxnconso scdc_rxnconso on scdc_rxnconso.rxcui = scdc_rxnrel.rxcui1 and scdc_rxnconso.sab = 'RXNORM' and scdc_rxnconso.tty = 'SCDC' 76 | left join rxnrel scd_rxnrel on scd_rxnrel.rxcui2 = scdc_rxnrel.rxcui1 and scd_rxnrel.rela = 'constitutes' 77 | left join rxnconso scd_rxnconso on scd_rxnconso.rxcui = scd_rxnrel.rxcui1 and scd_rxnconso.sab = 'RXNORM' and scd_rxnconso.tty = 'SCD' 78 | left join rxnrel gpck_rxnrel on gpck_rxnrel.rxcui2 = scd_rxnrel.rxcui1 and gpck_rxnrel.rela = 'contained_in' 79 | left join rxnconso gpck_rxnconso on gpck_rxnconso.rxcui = gpck_rxnrel.rxcui1 and gpck_rxnconso.sab = 'RXNORM' and gpck_rxnconso.tty = 'GPCK' 80 | 81 | where in_rxnconso.tty = 'IN' 82 | and in_rxnconso.sab = 'RXNORM' 83 | 84 | union all 85 | 86 | select in_rxnconso.rxcui as medication_ingredient_rxcui 87 | , in_rxnconso.str as medication_ingredient_name 88 | , in_rxnconso.tty as medication_ingredient_tty 89 | , bpck_rxnconso.rxcui as medication_product_rxcui 90 | , bpck_rxnconso.str as medication_product_name 91 | , bpck_rxnconso.tty as medication_product_tty 92 | 93 | -- medication ingredient (IN) 94 | from rxnconso in_rxnconso 95 | 96 | -- medication product (SCDC -> SCD -> GPCK -> BPCK) 97 | left join rxnrel scdc_rxnrel on scdc_rxnrel.rxcui2 = in_rxnconso.rxcui and scdc_rxnrel.rela = 'ingredient_of' 98 | left join rxnconso scdc_rxnconso on scdc_rxnconso.rxcui = scdc_rxnrel.rxcui1 and scdc_rxnconso.sab = 'RXNORM' and scdc_rxnconso.tty = 'SCDC' 99 | left join rxnrel scd_rxnrel on scd_rxnrel.rxcui2 = scdc_rxnrel.rxcui1 and scd_rxnrel.rela = 
'constitutes' 100 | left join rxnconso scd_rxnconso on scd_rxnconso.rxcui = scd_rxnrel.rxcui1 and scd_rxnconso.sab = 'RXNORM' and scd_rxnconso.tty = 'SCD' 101 | left join rxnrel gpck_rxnrel on gpck_rxnrel.rxcui2 = scd_rxnrel.rxcui1 and gpck_rxnrel.rela = 'contained_in' 102 | left join rxnconso gpck_rxnconso on gpck_rxnconso.rxcui = gpck_rxnrel.rxcui1 and gpck_rxnconso.sab = 'RXNORM' and gpck_rxnconso.tty = 'GPCK' 103 | left join rxnrel bpck_rxnrel on bpck_rxnrel.rxcui2 = gpck_rxnrel.rxcui1 and bpck_rxnrel.rela = 'has_tradename' 104 | left join rxnconso bpck_rxnconso on bpck_rxnconso.rxcui = bpck_rxnrel.rxcui1 and bpck_rxnconso.sab = 'RXNORM' and bpck_rxnconso.tty = 'BPCK' 105 | 106 | where in_rxnconso.tty = 'IN' 107 | and in_rxnconso.sab = 'RXNORM' 108 | 109 | union all 110 | 111 | select min_rxnconso.rxcui as medication_ingredient_rxcui 112 | , min_rxnconso.str as medication_ingredient_name 113 | , min_rxnconso.tty as medication_ingredient_tty 114 | , scd_rxnconso.rxcui as medication_product_rxcui 115 | , scd_rxnconso.str as medication_product_name 116 | , scd_rxnconso.tty as medication_product_tty 117 | 118 | -- medication ingredient (MIN) 119 | from rxnconso min_rxnconso 120 | 121 | -- medication product (SCD) 122 | left join rxnrel scd_rxnrel on scd_rxnrel.rxcui2 = min_rxnconso.rxcui and scd_rxnrel.rela = 'ingredients_of' 123 | left join rxnconso scd_rxnconso on scd_rxnconso.rxcui = scd_rxnrel.rxcui1 and scd_rxnconso.sab = 'RXNORM' and scd_rxnconso.tty = 'SCD' 124 | 125 | where min_rxnconso.tty = 'MIN' 126 | and min_rxnconso.sab = 'RXNORM' 127 | 128 | union all 129 | 130 | select min_rxnconso.rxcui as medication_ingredient_rxcui 131 | , min_rxnconso.str as medication_ingredient_name 132 | , min_rxnconso.tty as medication_ingredient_tty 133 | , sbd_rxnconso.rxcui as medication_product_rxcui 134 | , sbd_rxnconso.str as medication_product_name 135 | , sbd_rxnconso.tty as medication_product_tty 136 | 137 | -- medication ingredient (MIN) 138 | from rxnconso 
min_rxnconso 139 | 140 | -- medication product (SCD -> SBD) 141 | left join rxnrel scd_rxnrel on scd_rxnrel.rxcui2 = min_rxnconso.rxcui and scd_rxnrel.rela = 'ingredients_of' 142 | left join rxnconso scd_rxnconso on scd_rxnconso.rxcui = scd_rxnrel.rxcui1 and scd_rxnconso.sab = 'RXNORM' and scd_rxnconso.tty = 'SCD' 143 | left join rxnrel sbd_rxnrel on sbd_rxnrel.rxcui2 = scd_rxnrel.rxcui1 and sbd_rxnrel.rela = 'has_tradename' 144 | left join rxnconso sbd_rxnconso on sbd_rxnconso.rxcui = sbd_rxnrel.rxcui1 and sbd_rxnconso.sab = 'RXNORM' and sbd_rxnconso.tty = 'SBD' 145 | 146 | where min_rxnconso.tty = 'MIN' 147 | and min_rxnconso.sab = 'RXNORM' 148 | 149 | union all 150 | 151 | select min_rxnconso.rxcui as medication_ingredient_rxcui 152 | , min_rxnconso.str as medication_ingredient_name 153 | , min_rxnconso.tty as medication_ingredient_tty 154 | , gpck_rxnconso.rxcui as medication_product_rxcui 155 | , gpck_rxnconso.str as medication_product_name 156 | , gpck_rxnconso.tty as medication_product_tty 157 | 158 | -- medication ingredient (MIN) 159 | from rxnconso min_rxnconso 160 | 161 | -- medication product (SCD -> GPCK) 162 | left join rxnrel scd_rxnrel on scd_rxnrel.rxcui2 = min_rxnconso.rxcui and scd_rxnrel.rela = 'ingredients_of' 163 | left join rxnconso scd_rxnconso on scd_rxnconso.rxcui = scd_rxnrel.rxcui1 and scd_rxnconso.sab = 'RXNORM' and scd_rxnconso.tty = 'SCD' 164 | left join rxnrel gpck_rxnrel on gpck_rxnrel.rxcui2 = scd_rxnrel.rxcui1 and gpck_rxnrel.rela = 'contained_in' 165 | left join rxnconso gpck_rxnconso on gpck_rxnconso.rxcui = gpck_rxnrel.rxcui1 and gpck_rxnconso.sab = 'RXNORM' and gpck_rxnconso.tty = 'GPCK' 166 | 167 | where min_rxnconso.tty = 'MIN' 168 | and min_rxnconso.sab = 'RXNORM' 169 | 170 | union all 171 | 172 | select min_rxnconso.rxcui as medication_ingredient_rxcui 173 | , min_rxnconso.str as medication_ingredient_name 174 | , min_rxnconso.tty as medication_ingredient_tty 175 | , bpck_rxnconso.rxcui as medication_product_rxcui 176 | , 
bpck_rxnconso.str as medication_product_name 177 | , bpck_rxnconso.tty as medication_product_tty 178 | 179 | -- medication ingredient (MIN) 180 | from rxnconso min_rxnconso 181 | 182 | -- medication product (SCD -> SBD -> BPCK) 183 | left join rxnrel scd_rxnrel on scd_rxnrel.rxcui2 = min_rxnconso.rxcui and scd_rxnrel.rela = 'ingredients_of' 184 | left join rxnconso scd_rxnconso on scd_rxnconso.rxcui = scd_rxnrel.rxcui1 and scd_rxnconso.sab = 'RXNORM' and scd_rxnconso.tty = 'SCD' 185 | left join rxnrel sbd_rxnrel on sbd_rxnrel.rxcui2 = scd_rxnrel.rxcui1 and sbd_rxnrel.rela = 'has_tradename' 186 | left join rxnconso sbd_rxnconso on sbd_rxnconso.rxcui = sbd_rxnrel.rxcui1 and sbd_rxnconso.sab = 'RXNORM' and sbd_rxnconso.tty = 'SBD' 187 | left join rxnrel bpck_rxnrel on bpck_rxnrel.rxcui2 = sbd_rxnrel.rxcui1 and bpck_rxnrel.rela = 'contained_in' 188 | left join rxnconso bpck_rxnconso on bpck_rxnconso.rxcui = bpck_rxnrel.rxcui1 and bpck_rxnconso.sab = 'RXNORM' and bpck_rxnconso.tty = 'BPCK' 189 | 190 | where min_rxnconso.tty = 'MIN' 191 | and min_rxnconso.sab = 'RXNORM' 192 | ) as sq 193 | 194 | -- dose form 195 | left join rxnrel df_rxnrel on df_rxnrel.rxcui2 = sq.medication_product_rxcui and df_rxnrel.rela = 'has_dose_form' 196 | left join rxnconso df_rxnconso on df_rxnconso.rxcui = df_rxnrel.rxcui1 and df_rxnconso.sab = 'RXNORM' and df_rxnconso.tty = 'DF' 197 | 198 | -- dose form group 199 | --left join rxnrel dfg_rxnrel on dfg_rxnrel.rxcui2 = df_rxnrel.rxcui1 and dfg_rxnrel.rela = 'isa' 200 | --left join rxnconso dfg_rxnconso on dfg_rxnconso.rxcui = dfg_rxnrel.rxcui1 and dfg_rxnconso.sab = 'RXNORM' and dfg_rxnconso.tty = 'DFG' 201 | 202 | -- ndc 203 | left join rxnsat ndc_rxnsat on ndc_rxnsat.rxcui = sq.medication_product_rxcui and ndc_rxnsat.sab = 'RXNORM' and ndc_rxnsat.atn = 'NDC' 204 | 205 | where ndc_rxnsat.atv is not null 206 | -- and sq.medication_ingredient_rxcui in ('285155','10582','10814','10565','325521','10572') 207 | 
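A minimal, self-contained sketch of how one branch of `rxcui_ndc.sql` (IN → SCDC → SCD, then the dose form and NDC joins) resolves, using Python's built-in `sqlite3` and a toy in-memory database. The tables keep only the columns this query touches, and every RXCUI, name, and NDC in it is hypothetical; the sketch is only meant to make the `rxcui2` → `rxcui1` join direction concrete, not to reproduce the full RxNorm schema or the project's own loader.

```python
import sqlite3

# Hypothetical toy rows: every RXCUI, name and NDC below is made up for illustration.
TOY_CONSO = [  # (rxcui, str, sab, tty)
    ("1000", "exampleol", "RXNORM", "IN"),
    ("1001", "orphanol", "RXNORM", "IN"),  # ingredient with no products
    ("2000", "exampleol 10 MG", "RXNORM", "SCDC"),
    ("3000", "exampleol 10 MG Oral Tablet", "RXNORM", "SCD"),
    ("4000", "Oral Tablet", "RXNORM", "DF"),
]
# Each (rxcui1, rxcui2, rela) row is read as "rxcui2 --rela--> rxcui1",
# the same direction the joins in rxcui_ndc.sql walk.
TOY_REL = [
    ("2000", "1000", "ingredient_of"),  # IN   -> SCDC
    ("3000", "2000", "constitutes"),    # SCDC -> SCD
    ("4000", "3000", "has_dose_form"),  # SCD  -> DF
]
TOY_SAT = [("3000", "RXNORM", "NDC", "00000000001")]  # (rxcui, sab, atn, atv)

# The IN -> SCDC -> SCD branch of rxcui_ndc.sql with the dose form and NDC joins inlined.
QUERY = """
select in_c.rxcui, in_c.str, in_c.tty,
       scd_c.rxcui, scd_c.str, scd_c.tty,
       df_c.str, ndc.atv
from rxnconso in_c
left join rxnrel scdc_r on scdc_r.rxcui2 = in_c.rxcui and scdc_r.rela = 'ingredient_of'
left join rxnconso scdc_c on scdc_c.rxcui = scdc_r.rxcui1 and scdc_c.sab = 'RXNORM' and scdc_c.tty = 'SCDC'
left join rxnrel scd_r on scd_r.rxcui2 = scdc_r.rxcui1 and scd_r.rela = 'constitutes'
left join rxnconso scd_c on scd_c.rxcui = scd_r.rxcui1 and scd_c.sab = 'RXNORM' and scd_c.tty = 'SCD'
left join rxnrel df_r on df_r.rxcui2 = scd_c.rxcui and df_r.rela = 'has_dose_form'
left join rxnconso df_c on df_c.rxcui = df_r.rxcui1 and df_c.sab = 'RXNORM' and df_c.tty = 'DF'
left join rxnsat ndc on ndc.rxcui = scd_c.rxcui and ndc.sab = 'RXNORM' and ndc.atn = 'NDC'
where in_c.tty = 'IN' and in_c.sab = 'RXNORM' and ndc.atv is not null
"""


def build_toy_db() -> sqlite3.Connection:
    """Creates an in-memory database holding only the columns the query uses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table rxnconso (rxcui text, str text, sab text, tty text)")
    conn.execute("create table rxnrel (rxcui1 text, rxcui2 text, rela text)")
    conn.execute("create table rxnsat (rxcui text, sab text, atn text, atv text)")
    conn.executemany("insert into rxnconso values (?, ?, ?, ?)", TOY_CONSO)
    conn.executemany("insert into rxnrel values (?, ?, ?)", TOY_REL)
    conn.executemany("insert into rxnsat values (?, ?, ?, ?)", TOY_SAT)
    return conn


def ingredient_to_ndc(conn: sqlite3.Connection) -> list:
    """Returns (ingredient, product, dose form, NDC) rows for every IN that reaches an NDC."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in ingredient_to_ndc(build_toy_db()):
        print(row)
```

The second toy ingredient, `orphanol`, has no product relationships, so every left join returns NULL for it and the final `ndc.atv is not null` filter drops it, the same way the full query drops ingredients that never reach an RxNorm NDC attribute.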
-------------------------------------------------------------------------------- /src/mdt/rxnorm/utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import urllib.parse 3 | from pathlib import Path 4 | import importlib.resources as pkg_resources 5 | import requests 6 | from typing import Callable, Any 7 | 8 | from . import sql 9 | 10 | 11 | def json_extract(obj, key): 12 | """Recursively fetch values from nested JSON.""" 13 | arr = [] 14 | 15 | def extract(obj, arr, key): 16 | """Recursively search for values of key in JSON tree.""" 17 | if isinstance(obj, dict): 18 | for k, v in obj.items(): 19 | if isinstance(v, (dict, list)): 20 | extract(v, arr, key) 21 | elif k == key: 22 | arr.append(v) 23 | elif isinstance(obj, list): 24 | for item in obj: 25 | extract(item, arr, key) 26 | return arr 27 | 28 | values = extract(obj, arr, key) 29 | print(values) 30 | return values 31 | 32 | 33 | def payload_constructor(base_url, params): 34 | # TODO: exception handling for params as dict 35 | 36 | params_str = urllib.parse.urlencode(params, safe=':+') 37 | payload = { 38 | 'base_url': base_url, 39 | 'params': params_str 40 | } 41 | 42 | # debug print out 43 | print("""Payload built with base URL: {0} and parameters: {1}""".format(base_url, params_str)) 44 | 45 | return payload 46 | 47 | 48 | def rxapi_get_requestor(request_dict): 49 | """Sends a GET request to either RxNorm or RxClass and returns the parsed JSON (None for non-200 responses)""" 50 | response = requests.get( 51 | request_dict['base_url'], 52 | params=request_dict['params'] 53 | ) 54 | 55 | # debug print out 56 | print("GET Request sent to URL: {0}".format(response.url)) 57 | print("Response HTTP Code: {0}".format(response.status_code)) 58 | 59 | # TODO: Add exception handling that can manage 200 responses with no JSON 60 | if response.status_code == 200: 61 | return response.json() 62 | 63 | 64 | def get_dataset( 65 | dest: os.PathLike = Path.cwd(), 66 | handler: Callable[[Any], None] = None 67 | ): 68 | url =
'https://download.nlm.nih.gov/rxnorm/RxNorm_full_prescribe_current.zip' 69 | response = requests.get(url) 70 | if handler: 71 | return handler(response.content) 72 | (dest / url.split('/')[-1]).write_bytes(response.content) 73 | return response 74 | 75 | 76 | def get_sql(file_name): 77 | rxnorm_sql = pkg_resources.read_text(sql, file_name) 78 | return rxnorm_sql 79 | -------------------------------------------------------------------------------- /src/mdt/sql/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coderxio/medication-diversification/8d43a8e1c2c38826aa79a2717e969fea58f81065/src/mdt/sql/__init__.py -------------------------------------------------------------------------------- /src/mdt/sql/meps_rx_qty_ds.sql: -------------------------------------------------------------------------------- 1 | select cast(RXQUANTY as INTEGER) [RXQUANTY] 2 | , cast(RXDAYSUP as INTEGER) [RXDAYSUP] 3 | , rxnorm.medication_product_rxcui 4 | , rxnorm.medication_product_name 5 | , count(*) [COUNT] 6 | from meps_prescription mp 7 | inner join rxcui_ndc rxnorm on mp.RXNDC = rxnorm.medication_ndc 8 | where RXDAYSUP > 0 9 | group by RXQUANTY, RXDAYSUP, medication_product_name 10 | order by count(*) desc 11 | -------------------------------------------------------------------------------- /src/mdt/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import re 3 | import pandas as pd 4 | import time 5 | from pathlib import Path 6 | from mdt.database import db_query, path_manager, delete_csv_files 7 | from mdt import meps 8 | from mdt import rxnorm 9 | 10 | 11 | def read_json(file_name): 12 | # Open the JSON file; the with block closes it after loading 13 | with open(file_name) as f: 14 | # Parse the JSON object into a dictionary 15 | data = json.load(f) 16 | 17 | return data 18 | 19 | 20 | # Monkey patched this function to get run_mdt working by removing the filename arg and importing from config 21 | def
age_values(age_ranges): 22 | """Creates a dataframe that expands each age range (e.g. '0-17') into its individual integer ages. Input is a list of age ranges (need at least 2), taken from age_ranges in the settings.yaml file if populated, otherwise from default_age_ranges in mdt-settings.yaml.""" 23 | 24 | data = {} 25 | data['age'] = age_ranges 26 | data['age_values'] = [list(range(int(age.split('-')[0]), int(age.split('-')[1])+1)) for age in data['age']] 27 | df = pd.DataFrame(data) 28 | df = df.explode('age_values') 29 | return df 30 | 31 | 32 | # TODO: Add option to string search dosage form 33 | def rxcui_ndc_matcher(rxcui_list): 34 | """Matches a list of RXCUIs against the combined RxNorm rxcui_ndc table and returns the rows whose ingredient or product RXCUI is in the list. 35 | The result is always returned as a dataframe; use output_df() to write it to CSV or the clipboard""" 36 | 37 | df = db_query('SELECT * FROM rxcui_ndc') 38 | filtered_df = df[df['medication_ingredient_rxcui'].isin(rxcui_list) | df['medication_product_rxcui'].isin(rxcui_list)] 39 | 40 | print("RXCUI list matched on {0} NDCs".format(filtered_df['medication_ndc'].count())) 41 | 42 | return filtered_df 43 | 44 | 45 | def get_prescription_details(rxcui): 46 | """Matches a medication product RXCUI against MEPS prescription details + RxNorm to get common prescription details.
47 | Returns either False or a prescription object 48 | https://github.com/synthetichealth/synthea/wiki/Generic-Module-Framework%3A-States#medicationorder""" 49 | 50 | df = db_query('SELECT * FROM meps_rx_qty_ds') 51 | filtered_df = df[df['medication_product_rxcui'] == rxcui] 52 | 53 | # If the medication product does not have any reliable prescription details, don't generate prescription details 54 | # NOTE: not sure if 'return False' is the best way to do this - open to alternatives 55 | if len(filtered_df.index) == 0: 56 | return False 57 | 58 | # Currently, this just picks the most common prescription details at the medication product level 59 | # TODO: if there is more than one common set of prescription details, randomly pick one - favoring the more common ones 60 | selected_rx_details = filtered_df.iloc[0].to_dict() 61 | 62 | # NOTE: Synthea currently doesn't appear to have a field to capture quantity prescribed as part of the MedicationOrder 63 | rx_qty = int(selected_rx_details['RXQUANTY']) 64 | rx_ds = int(selected_rx_details['RXDAYSUP']) 65 | 66 | # TODO: maybe do this in the filtered_df step above?
67 | if rx_qty == 0 or rx_ds == 0: 68 | return False 69 | 70 | # See FHIR Timing reference for how these variables are calculated 71 | # http://hl7.org/fhir/DSTU2/datatypes.html#Timing 72 | frequency = int(rx_qty / rx_ds) if rx_qty >= rx_ds else 1 73 | period = int(rx_ds / rx_qty) if rx_ds > rx_qty else 1 74 | 75 | dosage = { 76 | 'amount': 1, 77 | 'frequency': frequency, 78 | 'period': period, 79 | 'unit': 'days' 80 | } 81 | 82 | duration = { 83 | 'quantity': rx_ds, 84 | 'unit': 'days' 85 | } 86 | 87 | prescription = { 88 | 'dosage': dosage, 89 | 'duration': duration 90 | } 91 | 92 | return prescription 93 | 94 | 95 | def filter_by_dose_form(rxcui_ndc_df, settings, method='include'): 96 | """Filters rxcui_ndc_df by the DF or DFG names in settings['dose_form_filter'], using the dfg_df table to expand each DFG into its member DFs 97 | If dose_form_filter is not a list or is an empty list, returns rxcui_ndc_df without filtering 98 | method is 'include' (default) to keep matching dose forms or 'exclude' to drop them; any other value returns rxcui_ndc_df unchanged""" 99 | dose_form_filter_list = settings['dose_form_filter'] 100 | if not isinstance(dose_form_filter_list, list) or not dose_form_filter_list: 101 | return rxcui_ndc_df 102 | 103 | dfg_df_df = db_query('SELECT * FROM dfg_df') 104 | filtered_dfg_df_df = dfg_df_df[dfg_df_df['dfg'].isin(dose_form_filter_list) | dfg_df_df['df'].isin(dose_form_filter_list)] 105 | df_list = filtered_dfg_df_df['df'].tolist() 106 | 107 | if method == 'include': 108 | filtered_rxcui_ndc_df = rxcui_ndc_df[rxcui_ndc_df['dose_form_name'].isin(df_list)] 109 | elif method == 'exclude': 110 | filtered_rxcui_ndc_df = rxcui_ndc_df[~rxcui_ndc_df['dose_form_name'].isin(df_list)] 111 | else: 112 | filtered_rxcui_ndc_df = rxcui_ndc_df 113 | 114 | print("RXCUI list filtered on DF matched on {0} NDCs".format(filtered_rxcui_ndc_df['medication_ndc'].count())) 115 | 116 | return filtered_rxcui_ndc_df 117 | 118 | def filter_by_ingredient_tty(rxcui_ndc_df, settings): 119 | """Filters rxcui_ndc_df by ingredient TTY ('IN' or 'MIN'); any other setting returns it unfiltered""" 120 | ingredient_tty_filter = settings['ingredient_tty_filter'] 121 | 122 | if
ingredient_tty_filter not in ('IN', 'MIN'): 123 | return rxcui_ndc_df 124 | 125 | filtered_rxcui_ndc_df = rxcui_ndc_df[rxcui_ndc_df['medication_ingredient_tty'] == ingredient_tty_filter] 126 | 127 | return filtered_rxcui_ndc_df 128 | 129 | def output_df(df, output='csv', path=Path.cwd(), filename='df_output'): 130 | """Outputs a dataframe to a CSV file, or to the clipboard if you pass the output='clipboard' argument""" 131 | filename = filename + '.' + output 132 | if output == 'clipboard': 133 | df.to_clipboard(index=False, excel=True) 134 | elif output == 'csv': 135 | df.to_csv(path / filename, index=False) 136 | 137 | 138 | def output_json(data, path=Path.cwd(), filename='json_output'): 139 | filename = filename + '.json' 140 | with open(path / filename, 'w', encoding='utf-8') as f: 141 | json.dump(data, f, ensure_ascii=False, indent=4) 142 | 143 | 144 | def output_list(data, path=Path.cwd(), filename='log'): 145 | timestamp = time.strftime('%Y%m%d-%H%M%S') 146 | filename = f'{filename} {timestamp}' 147 | filename = f'{filename}.txt' 148 | with open(path / filename, 'w', encoding = 'utf-8') as f: 149 | for list_item in data: 150 | f.write('%s\n' % list_item) 151 | 152 | 153 | def normalize_name(name, case='camel', spaces=False): 154 | """ Case is optional and choices are lower, upper, and camel (which applies str.title(), e.g. 'Fluticasone_Propionate') """ 155 | 156 | # Replace all non-alphanumeric characters with an underscore 157 | name = re.sub(r"[^a-zA-Z0-9]", "_", name) 158 | # Then, replace all duplicate underscores with just one underscore 159 | name = re.sub(r"_{2,}", "_", name) 160 | # If there's an underscore at the end of the word, remove it 161 | name = re.sub(r"_$", "", name) 162 | 163 | if case == 'lower': 164 | name = name.lower() 165 | elif case == 'upper': 166 | name = name.upper() 167 | elif case == 'camel': 168 | name = name.title() 169 | 170 | if spaces: 171 | name = re.sub(r"_", " ", name) 172 | 173 | return name 174 | 175 | 176 | def get_rxcui_ingredient_df(settings): 177 | # Call RxClass API to get all distinct
members from multiple class ID / relationship pairs 178 | # Do this for include + add individual RXCUIs to include 179 | # Do this for exclude + add individual RXCUIs to exclude 180 | # Remove exclude RXCUIs from include RXCUI list 181 | rxcui_include_list = [] 182 | rxcui_exclude_list = [] 183 | 184 | if isinstance(settings['rxclass']['include'], list): 185 | rxcui_include_list = rxnorm.rxclass.rxclass_get_rxcuis(settings['rxclass']['include']) 186 | 187 | if isinstance(settings['rxcui']['include'], list): 188 | rxcui_include_list += settings['rxcui']['include'] 189 | 190 | if isinstance(settings['rxclass']['exclude'], list): 191 | rxcui_exclude_list = rxnorm.rxclass.rxclass_get_rxcuis(settings['rxclass']['exclude']) 192 | 193 | if isinstance(settings['rxcui']['exclude'], list): 194 | rxcui_exclude_list += settings['rxcui']['exclude'] 195 | 196 | rxcui_ingredient_list = [i for i in rxcui_include_list if i not in rxcui_exclude_list] 197 | 198 | rxcui_ingredient_df = rxcui_ndc_matcher(rxcui_ingredient_list) 199 | 200 | return rxcui_ingredient_df 201 | 202 | 203 | def get_rxcui_product_df(rxcui_ingredient_df, settings): 204 | rxcui_product_list = ( 205 | rxcui_ingredient_df["medication_product_rxcui"].drop_duplicates().tolist() 206 | ) 207 | rxcui_product_df = rxcui_ndc_matcher(rxcui_product_list) 208 | 209 | return rxcui_product_df 210 | 211 | 212 | def get_rxcui_ndc_df(rxcui_product_df, module_name, settings): 213 | rxcui_ndc_df = ( # Keep one row per NDC, preferring MIN over IN ingredients ('MIN' sorts after 'IN', so descending order puts it first) 214 | rxcui_product_df.assign( 215 | rn=rxcui_product_df.sort_values( 216 | ["medication_ingredient_tty"], ascending=False 217 | ) 218 | .groupby(["medication_ndc"]) 219 | .cumcount() 220 | + 1 221 | ) 222 | .query("rn < 2") 223 | .drop(columns=["rn"]) 224 | ) 225 | 226 | # Filter by dose form group (DFG) or dose form (DF) 227 | # Function expects the rxcui_ndc_df and the settings dict (whose dose_form_filter lists DFG or DF names), plus an optional method of include (default) or exclude 228 | # If list of DFGs or DFs is empty, then nothing is filtered out 229 | #
https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix3.html 230 | 231 | # Filter by dose form (DF) or dose form group (DFG) 232 | rxcui_ndc_df = filter_by_dose_form(rxcui_ndc_df, settings) 233 | 234 | # Filter by ingredient term type (TTY = 'IN' or 'MIN') 235 | rxcui_ndc_df = filter_by_ingredient_tty(rxcui_ndc_df, settings) 236 | 237 | #Saves df to csv 238 | output_df(rxcui_ndc_df, path = path_manager(Path.cwd() / module_name / 'log'), filename='rxcui_ndc_df_output') 239 | 240 | return rxcui_ndc_df 241 | 242 | def get_meps_rxcui_ndc_df(rxcui_ndc_df, module_name, settings): 243 | #Read in MEPS Reference table 244 | meps_reference = db_query(meps.utils.get_sql('meps_reference.sql')) 245 | 246 | #Join MEPS to filtered rxcui_ndc dataframe (rxcui_list) 247 | meps_rxcui_ndc_df = meps_reference.astype(str).merge(rxcui_ndc_df.astype(str)[['medication_ingredient_name', 'medication_ingredient_rxcui','medication_product_name', 'medication_product_rxcui', 'medication_ndc']], how = 'inner', left_on = 'RXNDC', right_on = 'medication_ndc') 248 | 249 | output_df(meps_rxcui_ndc_df, path = path_manager(Path.cwd() / module_name / 'log'), filename = 'meps_rxcui_ndc_df_output') 250 | 251 | return meps_rxcui_ndc_df 252 | 253 | def generate_module_csv(meps_rxcui_ndc_df, module_name, settings, path=Path.cwd()): 254 | module = path / module_name 255 | lookup_tables = path_manager(module / 'lookup_tables') 256 | delete_csv_files(lookup_tables) 257 | 258 | meps_rxcui = meps_rxcui_ndc_df 259 | # Optional: Age range join - can be customized in the settings.yaml file 260 | # groupby_demographic_variable: must be either an empty list [] or list of patient demographics (e.g., age, gender, state) - based on user inputs in the settings.yaml file 261 | 262 | config = settings 263 | demographic_distribution_flags = config['meps']['demographic_distribution_flags'] 264 | state_prefix = config['state_prefix'] 265 | ingredient_distribution_suffix = config['ingredient_distribution_suffix'] 266 | 
product_distribution_suffix = config['product_distribution_suffix'] 267 | age_ranges = config['meps']['age_ranges'] 268 | default_age_ranges = config['default_age_ranges'] 269 | 270 | groupby_demographic_variables = [] 271 | for k, v in demographic_distribution_flags.items(): 272 | if v != False: 273 | groupby_demographic_variables.append(k) 274 | 275 | # Optional: age range from MEPS 276 | if demographic_distribution_flags['age'] != False: 277 | if not isinstance(age_ranges, list): 278 | age_ranges = default_age_ranges 279 | age_ranges_df = age_values(age_ranges) 280 | meps_rxcui_ndc_df = meps_rxcui_ndc_df.merge(age_ranges_df.astype(str), how='inner', left_on='AGELAST', right_on='age_values') 281 | 282 | # Optional: state-region mapping from MEPS 283 | if demographic_distribution_flags['state'] != False: 284 | meps_rxcui_ndc_df = meps_rxcui_ndc_df.merge(meps.columns.meps_region_states.astype(str), how='inner', left_on='region_num', right_on='region_value') 285 | 286 | # Clean text to JSON/SQL-friendly format 287 | for col in meps_rxcui_ndc_df[['medication_ingredient_name', 'medication_product_name']]: 288 | meps_rxcui_ndc_df[col] = meps_rxcui_ndc_df[col].apply(lambda x: normalize_name(x)) 289 | 290 | # dcp = 'demographic count percent' 291 | dcp_dict = {} 292 | medication_ingredient_list = meps_rxcui_ndc_df['medication_ingredient_name'].unique().tolist() 293 | 294 | # Ingredient Name Distribution (Transition 1) 295 | """Numerator = ingredient_name 296 | Denominator = total population [filtered by rxclass_name upstream between rxcui_ndc & rxclass] 297 | 1. Find distinct count of patients (DUPERSID) = patient_count 298 | 2. Multiply count of patients * personweight = weighted_patient_count 299 | 3. Add the weighted_patient_counts, segmented by ingredient_name + selected patient demographics = patients_by_demographics (Numerator) 300 | 4. 
Add the patients_by_demographics from Step 3 = weighted_patient_count_total (Denominator) -- Taking SUM of SUMs to make the Denominator = 100%
301 |     5. Calculate percentage (Output from Step 3/Output from Step 4) -- format as 0.0-1.0 per Synthea requirements.
302 |     6. Add the 'prescribe_' prefix to the medication_ingredient_name (e.g., 'prescribe_fluticasone')
303 |     7. Pivot the dataframe to transpose medication_ingredient_names from rows to columns """
304 |
305 |     filename = normalize_name(module_name + ingredient_distribution_suffix, 'lower')
306 |     # 1
307 |     dcp_dict['patient_count_ingredient'] = meps_rxcui_ndc_df[['medication_ingredient_name', 'medication_ingredient_rxcui', 'person_weight', 'DUPERSID']+groupby_demographic_variables].groupby(['medication_ingredient_name', 'medication_ingredient_rxcui', 'person_weight']+groupby_demographic_variables)['DUPERSID'].nunique()
308 |     dcp_df = pd.DataFrame(dcp_dict['patient_count_ingredient']).reset_index()
309 |     # 2
310 |     dcp_df['weighted_patient_count_ingredient'] = dcp_df['person_weight'].astype(float)*dcp_df['DUPERSID']
311 |     # 3
312 |     dcp_dict['patients_by_demographics_ingredient'] = dcp_df.groupby(['medication_ingredient_name']+groupby_demographic_variables)['weighted_patient_count_ingredient'].sum()
313 |     dcp_demographic_df = pd.DataFrame(dcp_dict['patients_by_demographics_ingredient']).reset_index()
314 |     # 4
315 |     if len(groupby_demographic_variables) > 0:
316 |         dcp_demographictotal_ingred_df = pd.merge(dcp_demographic_df, dcp_demographic_df.groupby(groupby_demographic_variables)['weighted_patient_count_ingredient'].sum(), how = 'inner', left_on = groupby_demographic_variables, right_index=True, suffixes = ('_demographic', '_total'))
317 |     else:
318 |         dcp_demographictotal_ingred_df = dcp_demographic_df
319 |         dcp_demographictotal_ingred_df['weighted_patient_count_ingredient_demographic'] = dcp_demographic_df['weighted_patient_count_ingredient']
320 |         dcp_demographictotal_ingred_df['weighted_patient_count_ingredient_total'] = dcp_demographic_df['weighted_patient_count_ingredient'].sum()
321 |     # 5
322 |     dcp_demographictotal_ingred_df['percent_ingredient_patients'] = round(dcp_demographictotal_ingred_df['weighted_patient_count_ingredient_demographic']/dcp_demographictotal_ingred_df['weighted_patient_count_ingredient_total'], 3)
323 |
324 |     dcp_demographictotal_ingred_remarks_dict = {}
325 |     if len(groupby_demographic_variables) > 0:
326 |         dcp_demographictotal_ingred_remarks_df = dcp_demographictotal_ingred_df[['medication_ingredient_name', 'weighted_patient_count_ingredient_demographic']].fillna(0)
327 |         dcp_demographictotal_ingred_remarks_df.drop_duplicates(inplace=True)
328 |         dcp_demographictotal_ingred_remarks_dict = dcp_demographictotal_ingred_remarks_df.groupby('medication_ingredient_name')['weighted_patient_count_ingredient_demographic'].sum()
329 |         dcp_demographictotal_ingred_remarks_df = pd.DataFrame(dcp_demographictotal_ingred_remarks_dict).reset_index().rename(columns={'weighted_patient_count_ingredient_demographic':'agg_weighted_patient_count_ingredient_demographic'})
330 |         dcp_demographictotal_ingred_remarks_df['agg_weighted_patient_count_ingredient_total'] = dcp_demographictotal_ingred_remarks_df['agg_weighted_patient_count_ingredient_demographic'].sum()
331 |
332 |         dcp_demographictotal_ingred_remarks_df['agg_percent_ingredient_patients'] = round(dcp_demographictotal_ingred_remarks_df['agg_weighted_patient_count_ingredient_demographic']/dcp_demographictotal_ingred_remarks_df['agg_weighted_patient_count_ingredient_total'], 3)
333 |     else:
334 |         dcp_demographictotal_ingred_remarks_df = dcp_demographictotal_ingred_df[['medication_ingredient_name', 'weighted_patient_count_ingredient_demographic', 'percent_ingredient_patients']].fillna(0)
335 |         dcp_demographictotal_ingred_remarks_df.drop_duplicates(inplace=True)
336 |         dcp_demographictotal_ingred_remarks_df['agg_percent_ingredient_patients'] = dcp_demographictotal_ingred_remarks_df['percent_ingredient_patients']
337 |
338 |     # 6
339 |     dcp_dict['percent_ingredient_patients'] = dcp_demographictotal_ingred_df
340 |     dcp_dict['percent_ingredient_patients']['medication_ingredient_transition_name'] = dcp_dict['percent_ingredient_patients']['medication_ingredient_name'].apply(lambda x: normalize_name(state_prefix + x))
341 |     # 7
342 |     if len(groupby_demographic_variables) > 0:
343 |         dcp_dict['percent_ingredient_patients'] = dcp_dict['percent_ingredient_patients'].reset_index().pivot(index=groupby_demographic_variables, columns='medication_ingredient_transition_name', values='percent_ingredient_patients').reset_index()
344 |     else:
345 |         dcp_dict['percent_ingredient_patients'] = dcp_dict['percent_ingredient_patients'][['medication_ingredient_transition_name', 'percent_ingredient_patients']].set_index('medication_ingredient_transition_name').T
346 |
347 |     # Fill NULLs and save as CSV
348 |     dcp_dict['percent_ingredient_patients'].fillna(0, inplace=True)
349 |     ingredient_distribution_df = dcp_dict['percent_ingredient_patients']
350 |     output_df(ingredient_distribution_df, output = 'csv', path = path_manager(module / 'lookup_tables'), filename = filename)
351 |
352 |     # Product Name Distribution (Transition 2)
353 |     """Numerator = product_name
354 |     Denominator = ingredient_name
355 |     Loop through all the ingredient_names to create product distributions by ingredient name
356 |     Same steps as above for Ingredient Name Distribution (1-7), but first filter medication_product_names for only those that have the same medication_ingredient_name (Step 0) """
357 |
358 |     # Dictionary for storing remarks percentages
359 |     dcp_demographictotal_prod_remarks_dict = {}
360 |     validation_df = pd.DataFrame({})
361 |
362 |     for ingredient_name in medication_ingredient_list:
363 |         filename = normalize_name(module_name + '_' + ingredient_name + product_distribution_suffix, 'lower')
364 |         # 0
365 |         meps_rxcui_ingred = meps_rxcui_ndc_df[meps_rxcui_ndc_df['medication_ingredient_name']==ingredient_name][['medication_product_name', 'medication_product_rxcui', 'medication_ingredient_name', 'medication_ingredient_rxcui', 'person_weight', 'DUPERSID']+groupby_demographic_variables]
366 |         # 1
367 |         dcp_dict['patient_count_product'] = meps_rxcui_ingred.groupby(['medication_product_name', 'medication_product_rxcui', 'medication_ingredient_name', 'medication_ingredient_rxcui', 'person_weight']+groupby_demographic_variables)['DUPERSID'].nunique()
368 |         dcp_df = pd.DataFrame(dcp_dict['patient_count_product']).reset_index()
369 |         # 2
370 |         dcp_df['weighted_patient_count_product'] = dcp_df['person_weight'].astype(float)*dcp_df['DUPERSID']
371 |         # 3
372 |         dcp_dict['patients_by_demographics_product'] = dcp_df.groupby(['medication_product_name', 'medication_ingredient_name']+groupby_demographic_variables)['weighted_patient_count_product'].sum()
373 |         dcp_demographic_df = pd.DataFrame(dcp_dict['patients_by_demographics_product']).reset_index()
374 |         # 4
375 |         dcp_demographictotal_prod_df = pd.merge(dcp_demographic_df, dcp_demographic_df.groupby(['medication_ingredient_name']+groupby_demographic_variables)['weighted_patient_count_product'].sum(), how = 'inner', left_on = ['medication_ingredient_name']+groupby_demographic_variables, right_index=True, suffixes = ('_demographic', '_total'))
376 |         # 5
377 |         dcp_demographictotal_prod_df[ingredient_name+'_percent_product_patients'] = round(dcp_demographictotal_prod_df['weighted_patient_count_product_demographic']/dcp_demographictotal_prod_df['weighted_patient_count_product_total'], 3)
378 |
379 |         if len(groupby_demographic_variables) > 0:
380 |             dcp_demographictotal_prod_remarks_dict[ingredient_name] = dcp_demographictotal_prod_df[['medication_product_name', 'weighted_patient_count_product_demographic']].fillna(0)
381 |             dcp_demographictotal_prod_remarks_dict[ingredient_name].drop_duplicates(inplace=True)
382 |             dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df'] = dcp_demographictotal_prod_remarks_dict[ingredient_name].groupby('medication_product_name')['weighted_patient_count_product_demographic'].sum()
383 |             dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks'] = pd.DataFrame(dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df']).reset_index().rename(columns={'weighted_patient_count_product_demographic':'agg_weighted_patient_count_product_demographic'})
384 |             dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']['agg_weighted_patient_count_product_total'] = dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']['agg_weighted_patient_count_product_demographic'].sum()
385 |
386 |             dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']['agg_percent_product_patients'] = round(dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']['agg_weighted_patient_count_product_demographic']/dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']['agg_weighted_patient_count_product_total'], 3)
387 |         else:
388 |             dcp_demographictotal_prod_remarks_dict[ingredient_name] = dcp_demographictotal_prod_df[['medication_product_name', 'weighted_patient_count_product_demographic', ingredient_name+'_percent_product_patients']].fillna(0)
389 |             dcp_demographictotal_prod_remarks_dict[ingredient_name].drop_duplicates(inplace=True)
390 |             dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks'] = dcp_demographictotal_prod_remarks_dict[ingredient_name]
391 |             dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']['agg_percent_product_patients'] = dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks'][ingredient_name+'_percent_product_patients']
392 |
393 |         # 6
394 |         dcp_dict['percent_product_patients'] = dcp_demographictotal_prod_df
395 |         dcp_dict['percent_product_patients']['medication_product_transition_name'] = dcp_dict['percent_product_patients']['medication_product_name'].apply(lambda x: normalize_name(state_prefix + x))
396 |         # 7
397 |         if len(groupby_demographic_variables) > 0:
398 |             dcp_dict['percent_product_patients'] = dcp_dict['percent_product_patients'].reset_index().pivot(index= groupby_demographic_variables, columns = 'medication_product_transition_name', values=ingredient_name+'_percent_product_patients').reset_index()
399 |         else:
400 |             dcp_dict['percent_product_patients'] = dcp_dict['percent_product_patients'][['medication_product_transition_name', ingredient_name+'_percent_product_patients']].set_index('medication_product_transition_name').T
401 |
402 |         # Fill NULLs and save as CSV
403 |         dcp_dict['percent_product_patients'].fillna(0, inplace=True)
404 |         product_distribution_df = dcp_dict['percent_product_patients']
405 |         output_df(product_distribution_df, output = 'csv', path = path_manager(module / 'lookup_tables'), filename = filename)
406 |
407 |         # Generate the validation df output CSV file (% distributions at the product level)
408 |         dcp_demographictotal_prod_df.rename(columns= {ingredient_name+'_percent_product_patients': 'percent_product_patients'}, inplace=True)
409 |         if len(groupby_demographic_variables) > 0:
410 |             validation_df_ingred = dcp_demographictotal_prod_df.merge(dcp_demographictotal_ingred_df, how='inner',on=['medication_ingredient_name']+groupby_demographic_variables)
411 |         else:
412 |             validation_df_ingred = dcp_demographictotal_prod_df.merge(dcp_demographictotal_ingred_df, how='inner',on='medication_ingredient_name')
413 |         validation_df_ingred['validation_percent_product_patients'] = validation_df_ingred['percent_ingredient_patients']*validation_df_ingred['percent_product_patients']
414 |         validation_df = pd.concat([validation_df, validation_df_ingred])
415 |
416 |     output_df(validation_df, path = path_manager(module / 'log'), filename = 'validation_df_output')
417 |
418 |     return dcp_demographictotal_ingred_remarks_df, dcp_demographictotal_prod_remarks_dict
419 |     # return dcp_dict
420 |
421 |
422 | def generate_module_json(meps_rxcui_ndc_df, dcp_demographictotal_ingred_remarks_df, dcp_demographictotal_prod_remarks_dict, module_name, settings, path=Path.cwd()):
423 |     module = path / module_name
424 |
425 |     config = settings
426 |     demographic_distribution_flags = config['meps']['demographic_distribution_flags']
427 |     state_prefix = config['state_prefix']
428 |     ingredient_distribution_suffix = config['ingredient_distribution_suffix']
429 |     product_distribution_suffix = config['product_distribution_suffix']
430 |     as_needed = config['module']['as_needed']
431 |     chronic = config['module']['chronic']
432 |     refills = config['module']['refills']
433 |
434 |     assign_to_attribute = normalize_name(module_name, case = 'lower') if config['module']['assign_to_attribute'] is None else normalize_name(config['module']['assign_to_attribute'], 'lower')
435 |     reason = config['module']['reason']
436 |
437 |     module_dict = {}
438 |     all_remarks = []
439 |     sep = '\n'
440 |     module_display_name = config['module']['name'] if config['module']['name'] is not None else normalize_name(module_name, spaces = True)
441 |     camelcase_module_name = normalize_name(module_display_name, spaces = True)
442 |     uppercase_module_name = normalize_name(module_display_name, case = 'upper', spaces = True)
443 |
444 |     module_dict['name'] = camelcase_module_name
445 |     module_dict['remarks'] = [
446 |         '======================================================================',
447 |         f' SUBMODULE {uppercase_module_name}',
448 |         '======================================================================',
449 |         '',
450 |         'This submodule prescribes a medication based on population data.',
451 |         '',
452 |         'IT IS UP TO THE CALLING MODULE TO END THIS MEDICATION BY ATTRIBUTE.',
453 |         'All medications prescribed in this module are assigned to the attribute',
454 |         f'\'{assign_to_attribute}\'.',
455 |         '',
456 |         'Reference links:',
457 |         ' RxClass: https://mor.nlm.nih.gov/RxClass/',
458 |         ' RxNorm: https://www.nlm.nih.gov/research/umls/rxnorm/index.html',
459 |         ' RxNav: https://mor.nlm.nih.gov/RxNav/',
460 |         ' MEPS: https://meps.ahrq.gov/mepsweb/data_stats/MEPS_topics.jsp?topicid=46Z-1',
461 |         ' FDA: https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory',
462 |         '',
463 |         'Made with () by the CodeRx Medication Diversification Tool (MDT)'
464 |     ]
465 |
466 |     settings_remarks = [
467 |         '',
468 |         'MDT settings for this submodule:',
469 |     ]
470 |     settings_text = json.dumps(settings, indent = 4)
471 |     settings_text_list = settings_text.split('\n')
472 |     settings_remarks += settings_text_list
473 |     module_dict['remarks'] += settings_remarks
474 |     all_remarks += module_dict['remarks']
475 |
476 |     # NOTE: not sure of the difference between versions 1 and 2; 2 appears to be the most recent version(?)
477 |     module_dict['gmf_version'] = 2
478 |
479 |     states_dict = {}
480 |
481 |     # Initial state (required)
482 |     # NOTE: if we change to conditional to check for existence of medication, change direct_transition to transition
483 |     states_dict['Initial'] = {
484 |         'type': 'Initial',
485 |         'conditional_transition': [
486 |             {
487 |                 'condition': {
488 |                     'condition_type': 'Attribute',
489 |                     'attribute': assign_to_attribute,
490 |                     'operator': 'is nil'
491 |                 },
492 |                 'transition': normalize_name(state_prefix + 'Ingredient')
493 |             },
494 |             {
495 |                 'transition': 'Terminal'
496 |             }
497 |         ]
498 |     }
499 |
500 |     # Terminal state (required)
501 |     states_dict['Terminal'] = {
502 |         'type': 'Terminal'
503 |     }
504 |
505 |     # Generate ingredient table transition
506 |     ingredient_transition_state_remarks = [
507 |         '======================================================================',
508 |         ' MEDICATION INGREDIENT TABLE TRANSITION ',
509 |         '======================================================================',
510 |         'Ingredients in lookup table:',
511 |         '# [ % pop ] Name',
512 |         '-- --------- ----',
513 |     ]
514 |
515 |     for idx, (_, row) in enumerate(dcp_demographictotal_ingred_remarks_df[['medication_ingredient_name', 'agg_percent_ingredient_patients']].iterrows()):
516 |         ingredient_detail = ''+str(idx+1)+'. [ '+str(round(row['agg_percent_ingredient_patients']*100,2))+'% ] '+row['medication_ingredient_name']
517 |         ingredient_transition_state_remarks.append(ingredient_detail)
518 |
519 |     medication_ingredient_transition_name_list = dcp_demographictotal_ingred_remarks_df['medication_ingredient_name'].apply(lambda x: normalize_name(state_prefix + x)).unique().tolist()
520 |     filename = normalize_name(module_name + ingredient_distribution_suffix, 'lower')
521 |     lookup_table_name = filename + '.csv'
522 |     lookup_table_transition = []
523 |     for idx, transition in enumerate(medication_ingredient_transition_name_list):
524 |         lookup_table_transition.append({
525 |             'transition': transition,
526 |             'default_probability': '1' if idx == 0 else '0',
527 |             'lookup_table_name': lookup_table_name
528 |         })
529 |     state_name = normalize_name(state_prefix + 'Ingredient')
530 |     states_dict[state_name] = {
531 |         'name': state_name,
532 |         'remarks': ingredient_transition_state_remarks,
533 |         'type': 'Simple',
534 |         'lookup_table_transition': lookup_table_transition
535 |     }
536 |     all_remarks += ingredient_transition_state_remarks
537 |
538 |     # Generate product table transition
539 |     medication_ingredient_name_list = dcp_demographictotal_ingred_remarks_df['medication_ingredient_name'].unique().tolist()
540 |     for ingredient_name in medication_ingredient_name_list:
541 |         product_transition_state_remarks = [
542 |             '======================================================================',
543 |             ' ' + ingredient_name.upper() + ' MEDICATION PRODUCT TABLE TRANSITION ',
544 |             '======================================================================',
545 |             'Products in lookup table:',
546 |             '# [ % pop ] Name',
547 |             '-- --------- ----',
548 |         ]
549 |         filename = normalize_name(module_name + '_' + ingredient_name + product_distribution_suffix, 'lower')
550 |         lookup_table_name = filename + '.csv'
551 |         lookup_table_transition = []
552 |
553 |         medication_product_name_list = dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']['medication_product_name'].unique().tolist()
554 |         medication_product_name_remarks_df = dcp_demographictotal_prod_remarks_dict[ingredient_name+'_df_remarks']
555 |         for idx, (_, row) in enumerate(medication_product_name_remarks_df.iterrows()):
556 |             product_detail = ''+str(idx+1)+'. [ '+str(round(row['agg_percent_product_patients']*100,2))+'% ] '+row['medication_product_name']
557 |             product_transition_state_remarks.append(product_detail)
558 |
559 |         medication_product_transition_name_list = dcp_demographictotal_prod_remarks_dict[ingredient_name]['medication_product_name'].apply(lambda x: normalize_name(state_prefix + x)).unique().tolist()
560 |         for idx, transition in enumerate(medication_product_transition_name_list):
561 |             lookup_table_transition.append({
562 |                 'transition': transition,
563 |                 'default_probability': '1' if idx == 0 else '0',
564 |                 'lookup_table_name': lookup_table_name
565 |             })
566 |         state_name = normalize_name(state_prefix + ingredient_name)
567 |         states_dict[state_name] = {
568 |             'name': state_name,
569 |             'remarks': product_transition_state_remarks,
570 |             'type': 'Simple',
571 |             'lookup_table_transition': lookup_table_transition
572 |         }
573 |         all_remarks += product_transition_state_remarks
574 |
575 |     # Generate MedicationOrder states
576 |     # medication_products = list(dcp_demographictotal_df[['medication_product_name', 'medication_product_rxcui']].to_records(index=False))
577 |     medication_products_df = meps_rxcui_ndc_df.groupby(['medication_product_name', 'medication_product_rxcui']).size().reset_index(name='count')
578 |     medication_products_list = medication_products_df[['medication_product_name', 'medication_product_rxcui']].values.tolist()
579 |     #medication_products = list(medication_products_df[['medication_product_name', 'medication_product_rxcui']].to_records(index=False))
580 |
581 |     medication_order_state_remarks = [
582 |         '======================================================================',
583 |         ' BEGIN MEDICATION ORDER STATES ',
584 |         '======================================================================',
585 |     ]
586 |     for idx, (medication_product_name, medication_product_rxcui) in enumerate(medication_products_list):
587 |         state_name = normalize_name(state_prefix + medication_product_name)
588 |         refills = refills if isinstance(refills, int) else 0
589 |         codes = {
590 |             'system': 'RxNorm',
591 |             'code': medication_product_rxcui,
592 |             'display': medication_product_name
593 |         }
594 |         prescription = {
595 |             'refills': refills
596 |         }
597 |         if as_needed in (True, False):
598 |             prescription['as_needed'] = as_needed
599 |         states_dict[state_name] = {
600 |             'name': state_name,
601 |             'type': 'MedicationOrder',
602 |             'assign_to_attribute': assign_to_attribute,
603 |             'codes': [ codes ],
604 |             'prescription': prescription,
605 |             'direct_transition': 'Terminal'
606 |         }
607 |         if chronic in (True, False):
608 |             states_dict[state_name]['chronic'] = chronic
609 |
610 |         if reason is not None:
611 |             states_dict[state_name]['reason'] = reason
612 |
613 |         if idx == 0:
614 |             medication_order_state_remarks_dict = {'remarks': medication_order_state_remarks}
615 |             states_dict[state_name] = {**medication_order_state_remarks_dict, **states_dict[state_name]}
616 |
617 |         # NOTE: commenting this out for final submission as it is still in testing
618 |         '''
619 |         prescription_details = get_prescription_details(medication_product_rxcui)
620 |         if prescription_details:
621 |             states_dict[state_name]['prescription'] = {**states_dict[state_name]['prescription'], **prescription_details}
622 |         '''
623 |
624 |     module_dict['states'] = states_dict
625 |
626 |     filename = normalize_name(module_name, 'lower')
627 |     output_list(all_remarks, path = path_manager(module / 'log'))
628 |     output_json(module_dict, path = module, filename = filename)
--------------------------------------------------------------------------------
/src/mdt/yamlmanager.py:
--------------------------------------------------------------------------------
  1 | from pathlib import Path
  2 | from ruamel.yaml import YAML
  3 |
  4 | yaml = YAML()
  5 |
  6 |
  7 | MDT_SETTINGS = '''\
  8 | # Base Application settings for module generation
  9 | state_prefix: Prescribe_
 10 | ingredient_distribution_suffix: _ingredient_distribution
 11 | product_distribution_suffix: _product_distribution
 12 | default_age_ranges:
 13 |   - 0-3
 14 |   - 4-7
 15 |   - 8-11
 16 |   - 12-17
 17 |   - 18-25
 18 |   - 26-35
 19 |   - 36-45
 20 |   - 46-65
 21 |   - 65-103
 22 | '''
 23 |
 24 | MODULE_SETTINGS = '''\
 25 | # Settings for the Synthea module
 26 | module:
 27 |   name: # (optional) string, defaults to the camelcase name of the module folder
 28 |   assign_to_attribute: # (optional) string, defaults to the lowercase name of the module folder
 29 |   reason: # (optional) string, references a previous ConditionOnset state
 30 |   as_needed: false # boolean, whether the prescription is as needed
 31 |   chronic: false # boolean, whether the prescription is chronic
 32 |   refills: 0 # integer, number of refills
 33 |
 34 | # Settings for the RxClass search to include/exclude
 35 | # *** At least one RxClass include or RXCUI include is required ***
 36 | # NOTE: you can include/exclude multiple class_id/relationship pairs
 37 | # RxClass options - see https://mor.nlm.nih.gov/RxClass/
 38 | rxclass:
 39 |   include:
 40 |     # - class_id:
 41 |     #   relationship:
 42 |   exclude:
 43 |     # - class_id:
 44 |     #   relationship:
 45 |
 46 | # Settings for individual RXCUIs to include/exclude
 47 | # *** At least one RxClass include or RXCUI include is required ***
 48 | # NOTE: you can include/exclude multiple RXCUIs
 49 | # You must enclose RXCUIs in quotes - example: '435'
 50 | # RXCUI options - see the Ingredient section in https://mor.nlm.nih.gov/RxNav/
 51 | # Dose form options - see https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix3.html
 52 | rxcui:
 53 |   include:
 54 |     # -
 55 |   exclude:
 56 |     # -
 57 | ingredient_tty_filter: # (optional) string, options are IN or MIN
 58 | dose_form_filter: # (optional) list, see dose form options above
 59 |   # -
 60 |
 61 | # Settings for the MEPS population
 62 | meps:
 63 |   age_ranges: # (optional) list, defaults to mdt-settings.yaml default age ranges
 64 |     # -
 65 |   demographic_distribution_flags:
 66 |     age: true # boolean, whether to break up distributions by age ranges
 67 |     gender: true # boolean, whether to break up distributions by gender
 68 |     state: true # boolean, whether to break up distributions by state of residence
 69 | '''
 70 |
 71 | config_schema = {
 72 |     'state_prefix': ((str,), ''),
 73 |     'ingredient_distribution_suffix': ((str,), ''),
 74 |     'product_distribution_suffix': ((str,), ''),
 75 |     'default_age_ranges': ((list,), ''),
 76 |     'module': {
 77 |         'name': ((str, type(None)), ''),
 78 |         'assign_to_attribute': ((str, type(None)), ''),
 79 |         'reason': ((str, type(None)), ''),
 80 |         'as_needed': ((bool,), ''),
 81 |         'chronic': ((bool,), ''),
 82 |         'refills': ((int,), ''),
 83 |     },
 84 |     'rxclass': {
 85 |         'include': ((list, type(None)), ''),
 86 |         'exclude': ((list, type(None)), ''),
 87 |     },
 88 |     'rxcui': {
 89 |         'include': ((list, type(None)), ''),
 90 |         'exclude': ((list, type(None)), ''),
 91 |     },
 92 |     'ingredient_tty_filter': ((str, type(None)), 'and must be either IN or MIN'),
 93 |     'dose_form_filter': ((list, type(None)), ''),
 94 |     'meps': {
 95 |         'age_ranges': ((list, type(None)), ''),
 96 |         'demographic_distribution_flags': ((object,), ''),
 97 |     }
 98 | }
 99 |
100 |
101 | def validate_config(config, schema=config_schema):
102 |     err = []
103 |
104 |     for setting, attributes in schema.items():
105 |
106 |         if isinstance(attributes, tuple):
107 |             value_type, err_message = attributes
108 |             if not isinstance(config[setting], value_type):
109 |                 err.append(
110 |                     f'{setting} must be of type {value_type} {err_message}'
111 |                 )
112 |
113 |         if isinstance(attributes, dict):
114 |             for attribute, value in attributes.items():
115 |                 value_type, err_message = value
116 |                 if not isinstance(config[setting][attribute], value_type):
117 |                     err.append(
118 |                         f'{attribute} must be of type {value_type} {err_message}'
119 |                     )
120 |
121 |     sep = '\n'
122 |
123 |     if err:
124 |         raise ValueError(f'Config file validation error\n{sep.join(err)}')
125 |
126 |
127 | def validate_minimum_settings(config):
128 |     err = []
129 |
130 |     if (config['rxclass']['include'] is None
131 |             and config['rxcui']['include'] is None):
132 |         err.append('Must have at least one RxClass include or RXCUI include.')
133 |
134 |     sep = '\n'
135 |
136 |     if err:
137 |         raise ValueError(f'Minimum settings validation error\n{sep.join(err)}')
138 |
139 |
140 | def create_mdt_settings(path=Path.cwd()):
141 |     settings = path / 'mdt-settings.yaml'
142 |
143 |     if not settings.exists():
144 |         data = yaml.load(MDT_SETTINGS)
145 |         yaml.dump(data, settings)
146 |
147 |
148 | def create_module_settings(module_name, path=Path.cwd()):
149 |     module = path / module_name
150 |
151 |     if not module.exists():
152 |         module.mkdir(parents=True)
153 |         data = yaml.load(MODULE_SETTINGS)
154 |         yaml.dump(data, (module / 'settings.yaml'))
155 |
156 | def get_settings(module_name, path=Path.cwd()):
157 |     module_settings = path / module_name / 'settings.yaml'
158 |     mdt_settings = path / 'mdt-settings.yaml'
159 |
160 |     if not module_settings.exists():
161 |         raise FileNotFoundError(f'Settings file does not exist in the {module_name} module.')
162 |     elif not mdt_settings.exists():
163 |         raise FileNotFoundError('MDT settings file does not exist.')
164 |
165 |     module_data = yaml.load(module_settings)
166 |     mdt_data = yaml.load(mdt_settings)
167 |     settings = {**module_data, **mdt_data}
168 |
169 |     validate_config(settings)
170 |     validate_minimum_settings(settings)
171 |
172 |     return settings
173 |
--------------------------------------------------------------------------------
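The ingredient and product distributions built in the first file above follow a two-stage scheme: weighted patient shares per ingredient within each demographic group (Transition 1), then weighted shares per product within each ingredient (Transition 2), with `validation_percent_product_patients` as the product of the two shares. The following is a minimal plain-Python sketch of that arithmetic (no pandas), under stated assumptions, and not the MDT implementation itself: the rows, the `gender` column, the product names, and the helpers `weighted_distribution` and `pivot` are all hypothetical, and a bare `"Prescribe_"` prefix stands in for `normalize_name(state_prefix + x)`.

```python
from collections import defaultdict

# Hypothetical, made-up rows shaped like the joined MEPS/RxNorm data (not real MEPS values).
# "gender" stands in for whichever demographic columns are enabled in settings.yaml.
ROWS = [
    {"DUPERSID": "p1", "person_weight": 1000.0, "gender": "F", "ingredient": "fluticasone", "product": "product_a"},
    {"DUPERSID": "p1", "person_weight": 1000.0, "gender": "F", "ingredient": "fluticasone", "product": "product_a"},  # refill, same patient
    {"DUPERSID": "p2", "person_weight": 3000.0, "gender": "F", "ingredient": "budesonide", "product": "product_c"},
    {"DUPERSID": "p3", "person_weight": 2000.0, "gender": "M", "ingredient": "fluticasone", "product": "product_b"},
    {"DUPERSID": "p4", "person_weight": 2000.0, "gender": "M", "ingredient": "fluticasone", "product": "product_a"},
]


def weighted_distribution(rows, key, group):
    """Hypothetical helper: weighted share of patients per key(row) within each group(row).

    Mirrors steps 1-5: count each patient once, weight by person_weight,
    divide by the group total (the "SUM of SUMs"), round to 3 decimals.
    """
    seen = set()
    weighted = defaultdict(float)  # (group, key) -> weighted patient count
    for r in rows:
        g, k = group(r), key(r)
        if (r["DUPERSID"], g, k) in seen:  # same patient, same cell: count once
            continue
        seen.add((r["DUPERSID"], g, k))
        weighted[(g, k)] += r["person_weight"]
    totals = defaultdict(float)  # group -> denominator
    for (g, _), w in weighted.items():
        totals[g] += w
    return {(g, k): round(w / totals[g], 3) for (g, k), w in weighted.items()}


def pivot(dist, prefix="Prescribe_"):
    """Hypothetical helper for step 7: one row per group, one column per state; gaps filled with 0."""
    columns = sorted({prefix + k for _, k in dist})
    table = {g: {c: 0.0 for c in columns} for g, _ in dist}
    for (g, k), p in dist.items():
        table[g][prefix + k] = p
    return columns, table


# Transition 1: ingredient shares within each gender.
ingredient_dist = weighted_distribution(ROWS, key=lambda r: r["ingredient"], group=lambda r: r["gender"])
# Transition 2: product shares within each (ingredient, gender) cell.
product_dist = weighted_distribution(
    ROWS, key=lambda r: r["product"], group=lambda r: (r["ingredient"], r["gender"])
)

# Validation: P(product | gender) = P(ingredient | gender) * P(product | ingredient, gender)
validation = {
    (gender, product): ingredient_dist[(gender, ingredient)] * p
    for ((ingredient, gender), product), p in product_dist.items()
}

columns, table = pivot(ingredient_dist)
print(columns)  # ['Prescribe_budesonide', 'Prescribe_fluticasone']
print(table["F"])  # {'Prescribe_budesonide': 0.75, 'Prescribe_fluticasone': 0.25}
print(validation[("M", "product_b")])  # 0.5
```

Counting each `(DUPERSID, group, key)` once and adding its `person_weight` gives the same total as grouping by `person_weight` and multiplying it by the `nunique()` count of `DUPERSID`, provided each person carries a single weight. Because every share is rounded to three decimals (as in step 5), a row of a lookup table can sum to slightly more or less than 1.0.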