├── .gitignore ├── sample.out ├── columndefs ├── sh-users-csvcolumns.txt ├── sh-conditions-csvcolumns.txt ├── sh-medications-csvcolumns.txt ├── sh-observations-csvcolumns.txt ├── sh-appointments-csvcolumns.txt └── sh-patients-csvcolumns.txt ├── doc └── source │ └── images │ └── architecture.png ├── ACKNOWLEDGEMENTS.md ├── CONTRIBUTING.md ├── sqlite ├── createUsers.sql ├── transformConditions.sql ├── createAppointments.sql ├── transformMedications.sql ├── transformPatients.sql └── transformObservations.sql ├── pom.xml ├── MAINTAINERS.md ├── schemas.sql ├── src └── main │ └── java │ ├── GetDBData.java │ └── ZLoadFile.java ├── run.bat ├── run.sh ├── README.md └── LICENSE /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | target/** -------------------------------------------------------------------------------- /sample.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/example-health-synthea/HEAD/sample.out -------------------------------------------------------------------------------- /columndefs/sh-users-csvcolumns.txt: -------------------------------------------------------------------------------- 1 | (PATIENTID INTEGER, 2 | USERNAME CHAR, 3 | USERPASSWORD CHAR 4 | ) -------------------------------------------------------------------------------- /doc/source/images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/example-health-synthea/HEAD/doc/source/images/architecture.png -------------------------------------------------------------------------------- /columndefs/sh-conditions-csvcolumns.txt: -------------------------------------------------------------------------------- 1 | (PATIENTID INTEGER, 2 | START DATE EXTERNAL(DATE_C), 3 | STOP DATE EXTERNAL(DATE_C) NULLIF(STOP=''), 4 | CODE CHAR, 5 | DESCRIPTION CHAR 6 | ) 
-------------------------------------------------------------------------------- /columndefs/sh-medications-csvcolumns.txt: -------------------------------------------------------------------------------- 1 | (PATIENTID INTEGER, 2 | DRUGNAME CHAR, 3 | STRENGTH CHAR, 4 | AMOUNT SMALLINT, 5 | ROUTE CHAR, 6 | FREQUENCY CHAR, 7 | IDENTIFIER CHAR, 8 | TYPE CHAR 9 | ) -------------------------------------------------------------------------------- /columndefs/sh-observations-csvcolumns.txt: -------------------------------------------------------------------------------- 1 | (PATIENTID INTEGER, 2 | DATEOFOBSERVATION DATE EXTERNAL(DATE_C), 3 | CODE CHAR, 4 | DESCRIPTION CHAR, 5 | NUMERICVALUE FLOAT, 6 | CHARACTERVALUE CHAR, 7 | UNITS CHAR 8 | ) -------------------------------------------------------------------------------- /ACKNOWLEDGEMENTS.md: -------------------------------------------------------------------------------- 1 | ## Acknowledgements 2 | 3 | * Credit goes to [James Gill](http://db2geek.triton.co.uk/author/james-gill/) for his invaluable [blog](http://db2geek.triton.co.uk/zload-mid-range-data-load-db2-zos/) about zload. 
4 | -------------------------------------------------------------------------------- /columndefs/sh-appointments-csvcolumns.txt: -------------------------------------------------------------------------------- 1 | (PATIENTID INTEGER, 2 | FIRSTNAME CHAR, 3 | LASTNAME CHAR, 4 | APPT_DATE DATE EXTERNAL(DATE_C), 5 | APPT_TIME CHAR, 6 | MED_FIELD CHAR, 7 | OFF_NAME CHAR, 8 | OFF_ADDR CHAR, 9 | OFF_CITY CHAR, 10 | OFF_ZIP CHAR 11 | ) -------------------------------------------------------------------------------- /columndefs/sh-patients-csvcolumns.txt: -------------------------------------------------------------------------------- 1 | (PATIENTID INTEGER, 2 | ID CHAR, 3 | USERNAME CHAR, 4 | DATEOFBIRTH DATE EXTERNAL, 5 | INSCARDNUMBER CHAR, 6 | FIRSTNAME CHAR, 7 | LASTNAME CHAR, 8 | ADDRESS CHAR, 9 | CITY CHAR, 10 | POSTCODE CHAR, 11 | PHONEMOBILE CHAR, 12 | EMAILADDRESS CHAR 13 | ) -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | This is an open source project, and we appreciate your help! 4 | 5 | We use the GitHub issue tracker to discuss new features and non-trivial bugs. 6 | 7 | In addition to the issue tracker, [#journeys on 8 | Slack](https://dwopen.slack.com) is the best way to get into contact with the 9 | project's maintainers. 10 | 11 | To contribute code, documentation, or tests, please submit a pull request to 12 | the GitHub repository. Generally, we expect two maintainers to review your pull 13 | request before it is approved for merging. For more details, see the 14 | [MAINTAINERS](MAINTAINERS.md) page. 15 | 16 | Contributions are subject to the [Developer Certificate of Origin, Version 1.1](https://developercertificate.org/) and the [Apache License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt). 
17 | -------------------------------------------------------------------------------- /sqlite/createUsers.sql: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------ 2 | -- Copyright 2019 IBM Corp. All Rights Reserved. 3 | -- 4 | -- Licensed under the Apache License, Version 2.0 (the "License"); 5 | -- you may not use this file except in compliance with the License. 6 | -- You may obtain a copy of the License at 7 | -- 8 | -- http://www.apache.org/licenses/LICENSE-2.0 9 | -- 10 | -- Unless required by applicable law or agreed to in writing, software 11 | -- distributed under the License is distributed on an "AS IS" BASIS, 12 | -- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | -- See the License for the specific language governing permissions and 14 | -- limitations under the License. 15 | ------------------------------------------------------------------------------ 16 | 17 | -- 18 | -- Create a users csv file for synthea patients. 19 | -- 20 | -- Usage: 21 | -- 1. set working directory to Synthea project 22 | -- 2. run "sqlite3 < createUsers.sql" 23 | -- 24 | -- Input: sh_patients.csv 25 | -- Output: sh_users.csv 26 | -- 27 | -- Dependencies: 28 | -- 1. Transform the patients.csv file first to get integer patient ids assigned. 
29 | -- 30 | 31 | -- Read input file 32 | 33 | .mode csv 34 | .import output/csv/sh_patients.csv sh_patients 35 | 36 | -- Create users 37 | 38 | CREATE TABLE USERS AS 39 | SELECT PATIENTID, 40 | USERNAME, 41 | 'pass' AS USERPASSWORD 42 | FROM SH_PATIENTS; 43 | 44 | -- Open output file 45 | 46 | .headers on 47 | .output output/csv/sh_users.csv 48 | 49 | -- Output table 50 | 51 | SELECT * FROM USERS; -------------------------------------------------------------------------------- /sqlite/transformConditions.sql: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------ 2 | -- Copyright 2019 IBM Corp. All Rights Reserved. 3 | -- 4 | -- Licensed under the Apache License, Version 2.0 (the "License"); 5 | -- you may not use this file except in compliance with the License. 6 | -- You may obtain a copy of the License at 7 | -- 8 | -- http://www.apache.org/licenses/LICENSE-2.0 9 | -- 10 | -- Unless required by applicable law or agreed to in writing, software 11 | -- distributed under the License is distributed on an "AS IS" BASIS, 12 | -- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | -- See the License for the specific language governing permissions and 14 | -- limitations under the License. 15 | ------------------------------------------------------------------------------ 16 | 17 | -- 18 | -- Transform synthea conditions.csv file into format compatible with Example Health. 19 | -- 20 | -- Usage: 21 | -- 1. set working directory to Synthea project 22 | -- 2. run "sqlite3 < transformConditions.sql" 23 | -- 24 | -- Input: conditions.csv, sh_patients.csv 25 | -- Output: sh_conditions.csv 26 | -- 27 | -- Dependencies: 28 | -- 1. Transform the patients.csv file first to get integer patient ids assigned. 
29 | -- 30 | 31 | -- Read input files 32 | 33 | .mode csv 34 | .import output/csv/conditions.csv conditions 35 | .import output/csv/sh_patients.csv sh_patients 36 | 37 | -- Open output file 38 | 39 | .headers on 40 | .output output/csv/sh_conditions.csv 41 | 42 | -- Transform CSV file. 43 | -- * Join with the transformed patients CSV file to get the integer patient ids. 44 | 45 | SELECT PATIENTID, 46 | START, 47 | STOP, 48 | CODE, 49 | SUBSTR(DESCRIPTION,1,75) AS DESCRIPTION 50 | FROM CONDITIONS 51 | INNER JOIN SH_PATIENTS ON CONDITIONS.PATIENT = SH_PATIENTS.ID; 52 | 53 | .exit -------------------------------------------------------------------------------- /sqlite/createAppointments.sql: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------ 2 | -- Copyright 2019 IBM Corp. All Rights Reserved. 3 | -- 4 | -- Licensed under the Apache License, Version 2.0 (the "License"); 5 | -- you may not use this file except in compliance with the License. 6 | -- You may obtain a copy of the License at 7 | -- 8 | -- http://www.apache.org/licenses/LICENSE-2.0 9 | -- 10 | -- Unless required by applicable law or agreed to in writing, software 11 | -- distributed under the License is distributed on an "AS IS" BASIS, 12 | -- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | -- See the License for the specific language governing permissions and 14 | -- limitations under the License. 15 | ------------------------------------------------------------------------------ 16 | 17 | -- 18 | -- Create an appointments csv file for synthea patients. 19 | -- 20 | -- Usage: 21 | -- 1. set working directory to Synthea project 22 | -- 2. run "sqlite3 < createAppointments.sql" 23 | -- 24 | -- Input: sh_patients.csv 25 | -- Output: sh_appointments.csv 26 | -- 27 | -- Dependencies: 28 | -- 1. Transform the patients.csv file first to get integer patient ids assigned. 
29 | -- 30 | 31 | -- Read input file 32 | 33 | .mode csv 34 | .import output/csv/sh_patients.csv sh_patients 35 | 36 | -- Create dummy appointments 37 | 38 | CREATE TABLE APPOINTMENTS AS 39 | SELECT PATIENTID, 40 | FIRSTNAME, 41 | LASTNAME, 42 | DATE('now','+3 months') AS APPT_DATE, 43 | '08:00' AS APPT_TIME, 44 | 'Primary Care Physician' AS MED_FIELD, 45 | 'Example Health' AS OFF_NAME, 46 | '1 Main St' AS OFF_ADDR, 47 | CITY AS OFF_CITY, 48 | POSTCODE AS OFF_ZIP 49 | FROM SH_PATIENTS; 50 | 51 | -- Open output file 52 | 53 | .headers on 54 | .output output/csv/sh_appointments.csv 55 | 56 | -- Output table 57 | 58 | SELECT * FROM APPOINTMENTS; -------------------------------------------------------------------------------- /sqlite/transformMedications.sql: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------ 2 | -- Copyright 2019 IBM Corp. All Rights Reserved. 3 | -- 4 | -- Licensed under the Apache License, Version 2.0 (the "License"); 5 | -- you may not use this file except in compliance with the License. 6 | -- You may obtain a copy of the License at 7 | -- 8 | -- http://www.apache.org/licenses/LICENSE-2.0 9 | -- 10 | -- Unless required by applicable law or agreed to in writing, software 11 | -- distributed under the License is distributed on an "AS IS" BASIS, 12 | -- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | -- See the License for the specific language governing permissions and 14 | -- limitations under the License. 15 | ------------------------------------------------------------------------------ 16 | 17 | -- 18 | -- Transform synthea medications.csv file into format compatible with Example Health. 19 | -- 20 | -- Usage: 21 | -- 1. set working directory to Synthea project 22 | -- 2. 
run "sqlite3 < transformMedications.sql" 23 | -- 24 | -- Input: medications.csv, sh_patients.csv 25 | -- Output: sh_medications.csv 26 | -- 27 | -- Dependencies: 28 | -- 1. Transform the patients.csv file first to get integer patient ids assigned. 29 | -- 30 | 31 | -- Read input files 32 | 33 | .mode csv 34 | .import output/csv/medications.csv medications 35 | .import output/csv/sh_patients.csv sh_patients 36 | 37 | -- Open output file 38 | 39 | .headers on 40 | .output output/csv/sh_medications.csv 41 | 42 | -- Transform CSV file. 43 | -- * Join with the transformed patients CSV file to get the integer patient ids. 44 | -- * Truncate columns. 45 | -- * Set columns not produced by Synthea to blank. 46 | 47 | SELECT PATIENTID, 48 | SUBSTR(DESCRIPTION,1,50) AS DRUGNAME, 49 | " " AS STRENGTH, 50 | 0 AS AMOUNT, 51 | " " AS ROUTE, 52 | " " AS FREQUENCY, 53 | " " AS IDENTIFIER, 54 | " " AS TYPE 55 | FROM MEDICATIONS 56 | INNER JOIN SH_PATIENTS ON MEDICATIONS.PATIENT = SH_PATIENTS.ID 57 | WHERE STOP = ''; 58 | 59 | .exit -------------------------------------------------------------------------------- /pom.xml: -------------------------------------------------------------------------------- 1 | 15 | 16 | 17 | 4.0.0 18 | sample.ibm.exampleh 19 | loadutils 20 | jar 21 | database load utilities 22 | 1.0 23 | 24 | 25 | UTF-8 26 | 1.8 27 | 1.8 28 | 29 | 30 | 31 | 32 | com.ibm.db2.jcc 33 | db2jcc4 34 | 4.22.29 35 | 36 | 37 | 38 | 39 | 40 | 41 | org.apache.maven.plugins 42 | maven-assembly-plugin 43 | 2.4.1 44 | 45 | 46 | jar-with-dependencies 47 | 48 | false 49 | 50 | 51 | 52 | make-assembly 53 | package 54 | 55 | single 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | -------------------------------------------------------------------------------- /sqlite/transformPatients.sql: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------ 2 | -- Copyright 2019 IBM Corp. 
All Rights Reserved. 3 | -- 4 | -- Licensed under the Apache License, Version 2.0 (the "License"); 5 | -- you may not use this file except in compliance with the License. 6 | -- You may obtain a copy of the License at 7 | -- 8 | -- http://www.apache.org/licenses/LICENSE-2.0 9 | -- 10 | -- Unless required by applicable law or agreed to in writing, software 11 | -- distributed under the License is distributed on an "AS IS" BASIS, 12 | -- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | -- See the License for the specific language governing permissions and 14 | -- limitations under the License. 15 | ------------------------------------------------------------------------------ 16 | 17 | -- 18 | -- Transform synthea patients.csv file into format compatible with Example Health. 19 | -- 20 | -- Usage: 21 | -- 1. set working directory to Synthea project 22 | -- 2. run "sqlite3 < transformPatients.sql" 23 | -- 24 | -- Input: patients.csv, sh_variables.csv 25 | -- Output: sh_patients.csv 26 | -- 27 | -- Dependencies: 28 | -- 1. Run GetDBData first to create sh_variables.csv, which holds the last patient id already in the database. 29 | -- 30 | 31 | -- Read input files 32 | 33 | .mode csv 34 | .import output/csv/patients.csv patients 35 | .import output/csv/sh_variables.csv sh_variables 36 | 37 | -- Transform CSV file. 38 | -- * Assign integer patient ids. Keep patient UUID for use in joins done by the other transformations. 39 | -- * Truncate columns. 40 | -- * Set columns not produced by Synthea to blank. 
41 | 42 | CREATE TABLE SH_PATIENTS AS 43 | SELECT ROW_NUMBER() OVER(ORDER BY Id) AS PATIENTID, 44 | ID, 45 | "" AS USERNAME, 46 | BIRTHDATE AS DATEOFBIRTH, 47 | REPLACE(SSN,'-','') AS INSCARDNUMBER, 48 | SUBSTR(FIRST,1,20) AS FIRSTNAME, 49 | SUBSTR(LAST,1,20) AS LASTNAME, 50 | SUBSTR(ADDRESS,1,20) AS ADDRESS, 51 | SUBSTR(CITY,1,20) AS CITY, 52 | ZIP AS POSTCODE, 53 | " " AS PHONEMOBILE, 54 | " " AS EMAILADDRESS 55 | FROM PATIENTS WHERE DEATHDATE = ''; 56 | 57 | -- Update patient id above last id 58 | 59 | UPDATE SH_PATIENTS 60 | SET PATIENTID = PATIENTID + (SELECT LASTPATIENTID FROM SH_VARIABLES); 61 | 62 | -- Assign userids 63 | 64 | UPDATE SH_PATIENTS 65 | SET USERNAME = "USER" || PATIENTID; 66 | 67 | -- Open output file 68 | 69 | .headers on 70 | .output output/csv/sh_patients.csv 71 | 72 | -- Output table 73 | 74 | SELECT * FROM SH_PATIENTS; 75 | 76 | .exit -------------------------------------------------------------------------------- /sqlite/transformObservations.sql: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------ 2 | -- Copyright 2019 IBM Corp. All Rights Reserved. 3 | -- 4 | -- Licensed under the Apache License, Version 2.0 (the "License"); 5 | -- you may not use this file except in compliance with the License. 6 | -- You may obtain a copy of the License at 7 | -- 8 | -- http://www.apache.org/licenses/LICENSE-2.0 9 | -- 10 | -- Unless required by applicable law or agreed to in writing, software 11 | -- distributed under the License is distributed on an "AS IS" BASIS, 12 | -- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | -- See the License for the specific language governing permissions and 14 | -- limitations under the License. 
15 | ------------------------------------------------------------------------------ 16 | 17 | -- 18 | -- Transform synthea observations.csv file into format compatible with Example Health. 19 | -- 20 | -- Usage: 21 | -- 1. set working directory to Synthea project 22 | -- 2. run "sqlite3 < transformObservations.sql" 23 | -- 24 | -- Input: observations.csv, sh_patients.csv 25 | -- Output: sh_observations.csv 26 | -- 27 | -- Dependencies: 28 | -- 1. Transform the patients.csv file first to get integer patient ids assigned. 29 | -- 30 | 31 | -- Read input files 32 | 33 | .mode csv 34 | .import output/csv/observations.csv observations 35 | .import output/csv/sh_patients.csv sh_patients 36 | 37 | -- Open output file 38 | 39 | .headers on 40 | .output output/csv/sh_observations.csv 41 | 42 | -- Transform CSV file. 43 | -- * Join with the transformed patients CSV file to get the integer patient ids. 44 | -- * Truncate columns. 45 | -- * Remove zero-width space characters (x'E2808B') from description as they consume bytes from 75 byte limit. 
46 | 47 | SELECT PATIENTID, 48 | DATE(DATE) AS DATEOFOBSERVATION, 49 | CODE, 50 | SUBSTR(REPLACE(DESCRIPTION, x'E2808B', ''),1,75) AS DESCRIPTION, 51 | VALUE AS NUMERICVALUE, 52 | "" AS CHARACTERVALUE, 53 | SUBSTR(UNITS,1,22) AS UNITS 54 | FROM OBSERVATIONS 55 | INNER JOIN SH_PATIENTS ON OBSERVATIONS.PATIENT = SH_PATIENTS.ID 56 | WHERE TYPE = 'numeric' 57 | 58 | UNION ALL 59 | 60 | SELECT PATIENTID, 61 | DATE(DATE) AS DATEOFOBSERVATION, 62 | CODE, 63 | SUBSTR(DESCRIPTION,1,75) AS DESCRIPTION, 64 | NULL AS NUMERICVALUE, 65 | SUBSTR(VALUE,1,30) AS CHARACTERVALUE, 66 | SUBSTR(UNITS,1,22) AS UNITS 67 | FROM OBSERVATIONS 68 | INNER JOIN SH_PATIENTS ON OBSERVATIONS.PATIENT = SH_PATIENTS.ID 69 | WHERE TYPE = 'text'; 70 | 71 | .exit -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | # Maintainers Guide 2 | 3 | This guide is intended for maintainers - anybody with commit access to one or 4 | more Code Pattern repositories. 5 | 6 | ## Methodology 7 | 8 | This repository does not have a traditional release management cycle, but 9 | should instead be maintained as a useful, working, and polished reference at 10 | all times. While all work can therefore be focused on the master branch, the 11 | quality of this branch should never be compromised. 12 | 13 | The remainder of this document details how to merge pull requests to the 14 | repositories. 15 | 16 | ## Merge approval 17 | 18 | The project maintainers use LGTM (Looks Good To Me) in comments on the pull 19 | request to indicate acceptance prior to merging. A change requires LGTMs from 20 | two project maintainers. If the code is written by a maintainer, the change 21 | only requires one additional LGTM. 22 | 23 | ## Reviewing Pull Requests 24 | 25 | We recommend reviewing pull requests directly within GitHub. This allows a 26 | public commentary on changes, providing transparency for all users. 
When 27 | providing feedback, be civil, courteous, and kind. Disagreement is fine, so long 28 | as the discourse is carried out politely. If we see a record of uncivil or 29 | abusive comments, we will revoke your commit privileges and invite you to leave 30 | the project. 31 | 32 | During your review, consider the following points: 33 | 34 | ### Does the change have positive impact? 35 | 36 | Some proposed changes may not represent a positive impact to the project. Ask 37 | whether or not the change will make understanding the code easier, or if it 38 | could simply be a personal preference on the part of the author (see 39 | [bikeshedding](https://en.wiktionary.org/wiki/bikeshedding)). 40 | 41 | Pull requests that do not have a clear positive impact should be closed without 42 | merging. 43 | 44 | ### Do the changes make sense? 45 | 46 | If you do not understand what the changes are or what they accomplish, ask the 47 | author for clarification. Ask the author to add comments and/or clarify test 48 | case names to make the intentions clear. 49 | 50 | At times, such clarification will reveal that the author may not be using the 51 | code correctly, or is unaware of features that accommodate their needs. If you 52 | feel this is the case, work up a code sample that would address the pull 53 | request for them, and feel free to close the pull request once they confirm. 54 | 55 | ### Does the change introduce a new feature? 56 | 57 | For any given pull request, ask yourself "is this a new feature?" If so, does 58 | the pull request (or associated issue) contain narrative indicating the need 59 | for the feature? If not, ask them to provide that information. 60 | 61 | Are new unit tests in place that test all new behaviors introduced? If not, do 62 | not merge the feature until they are! Is documentation in place for the new 63 | feature? (See the documentation guidelines). If not, do not merge the feature 64 | until it is! 
Is the feature necessary for general use cases? Try and keep the 65 | scope of any given component narrow. If a proposed feature does not fit that 66 | scope, recommend to the user that they maintain the feature on their own, and 67 | close the request. You may also recommend that they see if the feature gains 68 | traction among other users, and suggest they re-submit when they can show such 69 | support. 70 | -------------------------------------------------------------------------------- /schemas.sql: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------ 2 | -- Copyright 2019 IBM Corp. All Rights Reserved. 3 | -- 4 | -- Licensed under the Apache License, Version 2.0 (the "License"); 5 | -- you may not use this file except in compliance with the License. 6 | -- You may obtain a copy of the License at 7 | -- 8 | -- http://www.apache.org/licenses/LICENSE-2.0 9 | -- 10 | -- Unless required by applicable law or agreed to in writing, software 11 | -- distributed under the License is distributed on an "AS IS" BASIS, 12 | -- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | -- See the License for the specific language governing permissions and 14 | -- limitations under the License. 
15 | ------------------------------------------------------------------------------ 16 | 17 | SET CURRENT SCHEMA='SMHEALTH'; 18 | CREATE DATABASE SMHEALTH CCSID EBCDIC; 19 | 20 | CREATE TABLE PATIENT ( 21 | PATIENTID INTEGER NOT NULL GENERATED BY DEFAULT AS IDENTITY, 22 | USERNAME CHAR(10) NOT NULL, 23 | FIRSTNAME CHAR(20), 24 | LASTNAME CHAR(20), 25 | DATEOFBIRTH DATE, 26 | INSCARDNUMBER CHAR(10), 27 | ADDRESS CHAR(20), 28 | CITY CHAR(20), 29 | POSTCODE CHAR(10), 30 | PHONEMOBILE Char(20), 31 | EMAILADDRESS Char(50), 32 | PRIMARY KEY(patientId)) 33 | CCSID EBCDIC 34 | IN DATABASE SMHEALTH; 35 | 36 | CREATE UNIQUE INDEX IPATIENT ON PATIENT (PATIENTID) CLUSTER COPY YES; 37 | 38 | CREATE TABLE USER ( 39 | PATIENTID INTEGER NOT NULL, 40 | USERNAME CHAR(10) NOT NULL, 41 | USERPASSWORD CHAR(32), 42 | PRIMARY KEY(USERNAME), 43 | FOREIGN KEY(PATIENTID) 44 | REFERENCES PATIENT (PATIENTID) ON DELETE CASCADE) 45 | CCSID EBCDIC 46 | IN DATABASE SMHEALTH; 47 | 48 | CREATE UNIQUE INDEX IUSER ON USER (USERNAME) CLUSTER COPY YES; 49 | 50 | CREATE TABLE MEDICATION ( 51 | MEDICATIONID INTEGER NOT NULL GENERATED BY DEFAULT AS IDENTITY, 52 | PATIENTID INTEGER NOT NULL, 53 | DRUGNAME CHAR(50), 54 | STRENGTH CHAR(20), 55 | AMOUNT SMALLINT, 56 | ROUTE CHAR(20), 57 | FREQUENCY CHAR(20), 58 | IDENTIFIER CHAR(20), 59 | TYPE CHAR(2), 60 | PRIMARY KEY(MEDICATIONID), 61 | FOREIGN KEY(PATIENTID) 62 | REFERENCES PATIENT (PATIENTID) ON DELETE CASCADE) 63 | CCSID EBCDIC 64 | IN DATABASE SMHEALTH; 65 | 66 | CREATE UNIQUE INDEX IMEDICATION ON MEDICATION (MEDICATIONID) 67 | CLUSTER COPY YES; 68 | CREATE INDEX MEDICATION2 ON MEDICATION (PATIENTID) COPY YES; 69 | 70 | CREATE TABLE APPOINTMENTS ( 71 | PATIENTID INTEGER NOT NULL, 72 | FIRSTNAME CHAR(20) NOT NULL, 73 | LASTNAME CHAR(20) NOT NULL, 74 | APPT_DATE DATE NOT NULL, 75 | APPT_TIME CHAR(5) NOT NULL, 76 | DR_NAME CHAR(30), 77 | MED_FIELD CHAR(30), 78 | OFF_NAME CHAR(50), 79 | OFF_ADDR CHAR(40), 80 | OFF_CITY CHAR(20), 81 | OFF_STATE CHAR(2), 82 | 
OFF_ZIP CHAR(10), 83 | NOTES CHAR(40), 84 | FOLLOWUP CHAR(10)) 85 | CCSID EBCDIC 86 | IN DATABASE SMHEALTH; 87 | 88 | CREATE TABLE OBSERVATIONS ( 89 | PATIENTID INTEGER NOT NULL, 90 | DATEOFOBSERVATION DATE, 91 | CODE CHAR(8), 92 | DESCRIPTION VARCHAR(75), 93 | NUMERICVALUE DECIMAL(10, 2), 94 | CHARACTERVALUE VARCHAR(30), 95 | UNITS VARCHAR(22)) 96 | CCSID EBCDIC 97 | IN DATABASE SMHEALTH; 98 | 99 | CREATE TABLE CONDITIONS ( 100 | PATIENTID INTEGER NOT NULL, 101 | START DATE, 102 | STOP DATE, 103 | CODE CHAR(15), 104 | DESCRIPTION VARCHAR(75)) 105 | CCSID EBCDIC 106 | IN DATABASE SMHEALTH; -------------------------------------------------------------------------------- /src/main/java/GetDBData.java: -------------------------------------------------------------------------------- 1 | /*############################################################################## 2 | # Copyright 2019 IBM Corp. All Rights Reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | ##############################################################################*/ 16 | 17 | import com.ibm.db2.jcc.*; 18 | import java.io.*; 19 | import java.sql.*; 20 | import java.time.*; 21 | 22 | public class GetDBData { 23 | 24 | static final String urlPrefix = "jdbc:db2:"; 25 | 26 | String outputFileName; 27 | String url; 28 | String user; 29 | String password; 30 | String schema; 31 | 32 | public static void main(String[] args) { 33 | if (args.length != 5) { 34 | System.err.println("This program requires these arguments:"); 35 | System.err.println(" outputFileName database-url user password schema"); 36 | System.err.println(""); 37 | System.err.println("outputFileName: output csv file"); 38 | System.err.println("database-url: database URL in the form //host:port/location"); 39 | System.err.println("user: userid used to connect to database"); 40 | System.err.println("password: password used to connect to database"); 41 | System.err.println("schema: schema which contains patient table"); 42 | System.exit(1); 43 | } 44 | String _outputFileName = args[0]; 45 | String _url = urlPrefix + args[1]; 46 | String _user = args[2]; 47 | String _password = args[3]; 48 | String _schema = args[4]; 49 | new GetDBData(_outputFileName, _url, _user, _password, _schema).run(); 50 | } 51 | 52 | public GetDBData(String outputFileName, String url, String user, String password, String schema) { 53 | this.outputFileName = outputFileName; 54 | this.url = url; 55 | this.user = user; 56 | this.password = password; 57 | this.schema = schema; 58 | } 59 | 60 | public void run() { 61 | Connection con; 62 | try { 63 | // Load the driver 64 | log("Loading the JDBC driver"); 65 | Class.forName("com.ibm.db2.jcc.DB2Driver"); 66 | 67 | // Create the connection using the IBM Data Server Driver for JDBC and SQLJ 68 | log("Creating a JDBC connection to " + url + " with user " + user); 69 | con = DriverManager.getConnection (url, user, password); 70 | 71 | // Create the Statement 72 | 
Statement stmt = con.createStatement(); 73 | 74 | // Execute a query and generate a ResultSet instance 75 | log("Querying database"); 76 | ResultSet rs = stmt.executeQuery("SELECT MAX(PATIENTID) FROM " + schema + ".PATIENT"); 77 | 78 | if (!rs.next()) { 79 | throw new RuntimeException("Empty result set"); 80 | } 81 | 82 | Integer lastpatientid = rs.getInt(1); 83 | log("Last patientid is " + lastpatientid); 84 | 85 | // Close the ResultSet 86 | rs.close(); 87 | 88 | // Close the Statement 89 | stmt.close(); 90 | 91 | // Close the connection 92 | con.close(); 93 | 94 | // Write to output file 95 | DataOutputStream dos = new DataOutputStream(new FileOutputStream(outputFileName)); 96 | dos.writeBytes("LASTPATIENTID\n"); 97 | dos.writeBytes(lastpatientid.toString()); 98 | dos.close(); 99 | 100 | } 101 | 102 | catch (ClassNotFoundException e) { 103 | System.err.println("Could not load JDBC driver"); 104 | System.out.println("Exception: " + e); 105 | e.printStackTrace(); 106 | } 107 | 108 | catch(SQLException sqlex) { 109 | System.err.println("SQLException information"); 110 | System.err.println ("Error msg: " + sqlex.getMessage()); 111 | System.err.println ("SQLSTATE: " + sqlex.getSQLState()); 112 | System.err.println ("Error code: " + sqlex.getErrorCode()); 113 | sqlex.printStackTrace(); 114 | } 115 | 116 | catch(Exception ex) { 117 | ex.printStackTrace(); 118 | } 119 | 120 | } 121 | 122 | private void log(String msg) { 123 | System.out.println(LocalDateTime.now().toString() + ": " + msg); 124 | } 125 | } -------------------------------------------------------------------------------- /run.bat: -------------------------------------------------------------------------------- 1 | @REM Copyright 2019 IBM Corp. All Rights Reserved. 2 | @REM 3 | @REM Licensed under the Apache License, Version 2.0 (the "License"); 4 | @REM you may not use this file except in compliance with the License. 
5 | @REM You may obtain a copy of the License at 6 | @REM 7 | @REM http://www.apache.org/licenses/LICENSE-2.0 8 | @REM 9 | @REM Unless required by applicable law or agreed to in writing, software 10 | @REM distributed under the License is distributed on an "AS IS" BASIS, 11 | @REM WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | @REM See the License for the specific language governing permissions and 13 | @REM limitations under the License. 14 | 15 | 16 | @REM ----------------------------------------------------------------- 17 | @REM Generate patient data using Synthea and load into z/OS databases. 18 | @REM Current directory must be where Synthea is installed. 19 | @REM ----------------------------------------------------------------- 20 | 21 | @if exist output\csv goto existingOutputERROR 22 | 23 | @SETLOCAL 24 | 25 | @SET NUMPATIENTS=%1 26 | @SET STATE=%~2 27 | 28 | @SET scriptdir=%~dp0 29 | @SET transforms=%scriptdir%sqlite 30 | @SET columndefs=%scriptdir%columndefs 31 | @SET jarfile=%scriptdir%target\loadutils-1.0.jar 32 | 33 | @if "%DATABASE_URL%"=="" @set /p DATABASE_URL="Enter database URL: " 34 | @if "%DATABASE_USER%"=="" @set /p DATABASE_USER="Enter database userid: " 35 | @if "%DATABASE_PASSWORD%"=="" @set /p DATABASE_PASSWORD="Enter database password: " 36 | @if "%DATABASE_SCHEMA%"=="" @set /p DATABASE_SCHEMA="Enter database schema name: " 37 | 38 | @echo. && @echo %TIME%: Generating data using Synthea && @echo. 39 | 40 | call ./gradlew.bat run -Params="[ '-p','%NUMPATIENTS%', '%STATE%' ]" 41 | if not exist output\csv\patients.csv goto syntheaERROR 42 | if not exist output\csv\medications.csv goto syntheaERROR 43 | if not exist output\csv\observations.csv goto syntheaERROR 44 | 45 | @echo. && @echo %TIME%: Getting information from z/OS tables && @echo. 
46 | 47 | java -cp %jarfile% GetDBData output/csv/sh_variables.csv %DATABASE_URL% %DATABASE_USER% %DATABASE_PASSWORD% %DATABASE_SCHEMA% 48 | if not exist output\csv\sh_variables.csv goto getDBDataERROR 49 | 50 | @echo. && @echo %TIME%: Transforming csv files && @echo. 51 | 52 | sqlite3 < %transforms%\transformPatients.sql 53 | if not exist output\csv\sh_patients.csv goto sqliteERROR 54 | 55 | sqlite3 < %transforms%\transformMedications.sql 56 | if not exist output\csv\sh_medications.csv goto sqliteERROR 57 | 58 | sqlite3 < %transforms%\transformObservations.sql 59 | if not exist output\csv\sh_observations.csv goto sqliteERROR 60 | 61 | sqlite3 < %transforms%\transformConditions.sql 62 | if not exist output\csv\sh_conditions.csv goto sqliteERROR 63 | 64 | sqlite3 < %transforms%\createAppointments.sql 65 | if not exist output\csv\sh_appointments.csv goto sqliteERROR 66 | 67 | sqlite3 < %transforms%\createUsers.sql 68 | if not exist output\csv\sh_users.csv goto sqliteERROR 69 | 70 | @echo. && @echo %TIME%: Loading z/OS tables && @echo. 
71 | 72 | java -cp %jarfile% ZLoadFile output/csv/sh_patients.csv %columndefs%/sh-patients-csvcolumns.txt %DATABASE_URL% %DATABASE_USER% %DATABASE_PASSWORD% %DATABASE_SCHEMA%.PATIENT 73 | if errorlevel 8 goto zloadERROR 74 | 75 | java -cp %jarfile% ZLoadFile output/csv/sh_medications.csv %columndefs%/sh-medications-csvcolumns.txt %DATABASE_URL% %DATABASE_USER% %DATABASE_PASSWORD% %DATABASE_SCHEMA%.MEDICATION 76 | if errorlevel 8 goto zloadERROR 77 | 78 | java -cp %jarfile% ZLoadFile output/csv/sh_observations.csv %columndefs%/sh-observations-csvcolumns.txt %DATABASE_URL% %DATABASE_USER% %DATABASE_PASSWORD% %DATABASE_SCHEMA%.OBSERVATIONS 79 | if errorlevel 8 goto zloadERROR 80 | 81 | java -cp %jarfile% ZLoadFile output/csv/sh_conditions.csv %columndefs%/sh-conditions-csvcolumns.txt %DATABASE_URL% %DATABASE_USER% %DATABASE_PASSWORD% %DATABASE_SCHEMA%.CONDITIONS 82 | if errorlevel 8 goto zloadERROR 83 | 84 | java -cp %jarfile% ZLoadFile output/csv/sh_appointments.csv %columndefs%/sh-appointments-csvcolumns.txt %DATABASE_URL% %DATABASE_USER% %DATABASE_PASSWORD% %DATABASE_SCHEMA%.APPOINTMENTS 85 | if errorlevel 8 goto zloadERROR 86 | 87 | java -cp %jarfile% ZLoadFile output/csv/sh_users.csv %columndefs%/sh-users-csvcolumns.txt %DATABASE_URL% %DATABASE_USER% %DATABASE_PASSWORD% %DATABASE_SCHEMA%.USER 88 | if errorlevel 8 goto zloadERROR 89 | 90 | @echo. && @echo %TIME%: Finished && @echo. 91 | goto end 92 | 93 | :existingOutputERROR 94 | @echo. && echo ERROR: The output/csv folder exists from a previous execution. Please delete or rename it first. 95 | goto end 96 | 97 | :syntheaERROR 98 | @echo. && echo ERROR: Synthea run did not create the expected csv files. Check preceding messages. 99 | goto end 100 | 101 | :getDBDataERROR 102 | @echo. && echo ERROR: Problem obtaining data from database. Check preceding messages. 103 | goto end 104 | 105 | :sqliteERROR 106 | @echo. && echo ERROR: Problem transforming CSV files. Check preceding messages. 
107 | goto end 108 | 109 | :zloadERROR 110 | @echo. && echo ERROR: Problem loading data to z/OS database. Check preceding messages. 111 | goto end 112 | 113 | :end -------------------------------------------------------------------------------- /run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ############################################################################## 3 | # Copyright 2019 IBM Corp. All Rights Reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | ############################################################################## 17 | 18 | # ----------------------------------------------------------------- 19 | # Generate patient data using Synthea and load into z/OS databases. 20 | # Current directory must be where Synthea is installed. 21 | # ----------------------------------------------------------------- 22 | 23 | if [ -d output/csv ]; then 24 | printf "\nERROR: The output/csv folder exists from a previous execution. 
Please delete or rename it first.\n" 25 | exit 1 26 | fi 27 | 28 | NUMPATIENTS=$1 29 | STATE=$2 30 | 31 | scriptdir=`dirname ${BASH_SOURCE[0]}` 32 | transforms=$scriptdir/sqlite 33 | columndefs=$scriptdir/columndefs 34 | jarfile=$scriptdir/target/loadutils-1.0.jar 35 | 36 | if [ "$DATABASE_URL" = "" ]; then 37 | read -p "Enter database URL: " DATABASE_URL 38 | fi 39 | 40 | if [ "$DATABASE_USER" = "" ]; then 41 | read -p "Enter database userid: " DATABASE_USER 42 | fi 43 | 44 | if [ "$DATABASE_PASSWORD" = "" ]; then 45 | read -s -p "Enter database password: " DATABASE_PASSWORD 46 | printf "\n" 47 | fi 48 | 49 | if [ "$DATABASE_SCHEMA" = "" ]; then 50 | read -p "Enter database schema name: " DATABASE_SCHEMA 51 | fi 52 | 53 | now=$(date +"%T") 54 | printf "\n$now: Generating data using Synthea\n" 55 | 56 | ./gradlew run -Params="[ '-p','$NUMPATIENTS', '$STATE' ]" 57 | 58 | if [ ! -f output/csv/patients.csv ] || [ ! -f output/csv/medications.csv ] || [ ! -f output/csv/observations.csv ]; then 59 | printf "\nERROR: Synthea run did not create the expected csv files. Check preceding messages.\n" 60 | exit 1 61 | fi 62 | 63 | now=$(date +"%T") 64 | printf "\n$now: Getting information from z/OS tables\n" 65 | 66 | java -cp $jarfile GetDBData output/csv/sh_variables.csv $DATABASE_URL $DATABASE_USER $DATABASE_PASSWORD $DATABASE_SCHEMA 67 | 68 | if [ ! -f output/csv/sh_variables.csv ]; then 69 | printf "\nERROR: Problem obtaining data from database. Check preceding messages.\n" 70 | exit 1 71 | fi 72 | 73 | now=$(date +"%T") 74 | printf "\n$now: Transforming csv files\n" 75 | 76 | sqlite3 < $transforms/transformPatients.sql 77 | if [ ! -f output/csv/sh_patients.csv ] || [ ! -s output/csv/sh_patients.csv ]; then 78 | printf "\nERROR: Problem transforming patients CSV file. Check preceding messages.\n" 79 | exit 1 80 | fi 81 | 82 | sqlite3 < $transforms/transformMedications.sql 83 | if [ ! -f output/csv/sh_medications.csv ] || [ ! 
-s output/csv/sh_medications.csv ]; then 84 | printf "\nERROR: Problem transforming medications CSV file. Check preceding messages.\n" 85 | exit 1 86 | fi 87 | 88 | sqlite3 < $transforms/transformObservations.sql 89 | if [ ! -f output/csv/sh_observations.csv ] || [ ! -s output/csv/sh_observations.csv ]; then 90 | printf "\nERROR: Problem transforming observations CSV file. Check preceding messages.\n" 91 | exit 1 92 | fi 93 | 94 | sqlite3 < $transforms/transformConditions.sql 95 | if [ ! -f output/csv/sh_conditions.csv ] || [ ! -s output/csv/sh_conditions.csv ]; then 96 | printf "\nERROR: Problem transforming conditions CSV file. Check preceding messages.\n" 97 | exit 1 98 | fi 99 | 100 | sqlite3 < $transforms/createAppointments.sql 101 | if [ ! -f output/csv/sh_appointments.csv ] || [ ! -s output/csv/sh_appointments.csv ]; then 102 | printf "\nERROR: Problem transforming appointments CSV file. Check preceding messages.\n" 103 | exit 1 104 | fi 105 | 106 | sqlite3 < $transforms/createUsers.sql 107 | if [ ! -f output/csv/sh_users.csv ] || [ ! -s output/csv/sh_users.csv ]; then 108 | printf "\nERROR: Problem transforming users CSV file. Check preceding messages.\n" 109 | exit 1 110 | fi 111 | 112 | now=$(date +"%T") 113 | printf "\n$now: Loading z/OS tables\n" 114 | 115 | java -cp $jarfile ZLoadFile output/csv/sh_patients.csv $columndefs/sh-patients-csvcolumns.txt $DATABASE_URL $DATABASE_USER $DATABASE_PASSWORD $DATABASE_SCHEMA.PATIENT 116 | if [ $? -ge 8 ]; then 117 | printf "\nERROR: Problem loading patient data to z/OS database. Check preceding messages.\n" 118 | exit 1 119 | fi 120 | 121 | java -cp $jarfile ZLoadFile output/csv/sh_medications.csv $columndefs/sh-medications-csvcolumns.txt $DATABASE_URL $DATABASE_USER $DATABASE_PASSWORD $DATABASE_SCHEMA.MEDICATION 122 | if [ $? -ge 8 ]; then 123 | printf "\nERROR: Problem loading medications data to z/OS database. 
Check preceding messages.\n" 124 | exit 1 125 | fi 126 | 127 | java -cp $jarfile ZLoadFile output/csv/sh_observations.csv $columndefs/sh-observations-csvcolumns.txt $DATABASE_URL $DATABASE_USER $DATABASE_PASSWORD $DATABASE_SCHEMA.OBSERVATIONS 128 | if [ $? -ge 8 ]; then 129 | printf "\nERROR: Problem loading observations data to z/OS database. Check preceding messages.\n" 130 | exit 1 131 | fi 132 | 133 | java -cp $jarfile ZLoadFile output/csv/sh_conditions.csv $columndefs/sh-conditions-csvcolumns.txt $DATABASE_URL $DATABASE_USER $DATABASE_PASSWORD $DATABASE_SCHEMA.CONDITIONS 134 | if [ $? -ge 8 ]; then 135 | printf "\nERROR: Problem loading conditions data to z/OS database. Check preceding messages.\n" 136 | exit 1 137 | fi 138 | 139 | java -cp $jarfile ZLoadFile output/csv/sh_appointments.csv $columndefs/sh-appointments-csvcolumns.txt $DATABASE_URL $DATABASE_USER $DATABASE_PASSWORD $DATABASE_SCHEMA.APPOINTMENTS 140 | if [ $? -ge 8 ]; then 141 | printf "\nERROR: Problem loading appointments data to z/OS database. Check preceding messages.\n" 142 | exit 1 143 | fi 144 | 145 | java -cp $jarfile ZLoadFile output/csv/sh_users.csv $columndefs/sh-users-csvcolumns.txt $DATABASE_URL $DATABASE_USER $DATABASE_PASSWORD $DATABASE_SCHEMA.USER 146 | if [ $? -ge 8 ]; then 147 | printf "\nERROR: Problem loading users data to z/OS database. Check preceding messages.\n" 148 | exit 1 149 | fi 150 | 151 | now=$(date +"%T") 152 | printf "\n$now: Finished\n" 153 | -------------------------------------------------------------------------------- /src/main/java/ZLoadFile.java: -------------------------------------------------------------------------------- 1 | /*############################################################################## 2 | # Copyright 2019 IBM Corp. All Rights Reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 
6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | ##############################################################################*/ 16 | 17 | import com.ibm.db2.jcc.*; 18 | import java.io.*; 19 | import java.sql.*; 20 | import java.time.*; 21 | 22 | public class ZLoadFile { 23 | 24 | static final String urlPrefix = "jdbc:db2:"; 25 | 26 | String inputFileName; 27 | String blockModeFileName; 28 | String columnDefFileName; 29 | String url; 30 | String user; 31 | String password; 32 | String table; 33 | 34 | public static void main(String[] args) { 35 | if (args.length != 6) { 36 | System.err.println("This program requires these arguments:"); 37 | System.err.println(" inputFileName columnDefFileName database-url user password tablename"); 38 | System.err.println(""); 39 | System.err.println("inputFileName: csv file to load into z/OS database"); 40 | System.err.println("columnDefFileName: csv file column definitions in format expected by z/OS load utility"); 41 | System.err.println("database-url: database URL in the form //host:port/location"); 42 | System.err.println("user: userid used to connect to database"); 43 | System.err.println("password: password used to connect to database"); 44 | System.err.println("tablename: table name to load, in schema.table format"); 45 | System.exit(1); 46 | } 47 | String _inputFileName = args[0]; 48 | String _columnDefFileName = args[1]; 49 | String _url = urlPrefix + args[2]; 50 | String _user = args[3]; 51 | String _password = args[4]; 52 | String _table = args[5]; 53 | int rc = new ZLoadFile(_inputFileName, _columnDefFileName, _url, 
_user, _password, _table).run(); 54 | System.exit(rc); 55 | } 56 | 57 | public ZLoadFile(String inputFileName, String columnDefFileName, String url, String user, String password, String table) { 58 | this.inputFileName = inputFileName; 59 | this.blockModeFileName = inputFileName + ".del"; 60 | this.columnDefFileName = columnDefFileName; 61 | this.url = url; 62 | this.user = user; 63 | this.password = password; 64 | this.table = table; 65 | } 66 | 67 | public int run() { 68 | int returnCode; 69 | Connection con; 70 | try { 71 | String columnDefs = getColumnDefinitions(); 72 | 73 | convertFileToBlockMode(); 74 | 75 | // Load the driver 76 | log("Loading the JDBC driver"); 77 | Class.forName("com.ibm.db2.jcc.DB2Driver"); 78 | 79 | // Create the connection using the IBM Data Server Driver for JDBC and SQLJ 80 | log("Creating a JDBC connection to " + url + " with user " + user); 81 | con = DriverManager.getConnection (url, user, password); 82 | 83 | DB2Connection db2conn = (DB2Connection)con; 84 | String loadstmt = "TEMPLATE SORTIN DSN " + user + ".SORTIN.T&TIME. " + 85 | "UNIT SYSDA SPACE(10,10) CYL DISP(NEW,DELETE,DELETE) " + 86 | "TEMPLATE SORTOUT DSN " + user + ".SORTOUT.T&TIME. UNIT SYSDA " + 87 | "SPACE(10,10) CYL DISP(NEW,DELETE,DELETE) " + 88 | "TEMPLATE MAP DSN " + user + ".SYSMAP UNIT SYSDA " + 89 | "SPACE(10,10) CYL DISP(NEW,DELETE,CATLG) " + 90 | "LOAD DATA INDDN SYSCLIEN WORKDDN(SORTIN,SORTOUT) RESUME YES " + 91 | "FORMAT DELIMITED ASCII CCSID(1252) MAPDDN MAP " + 92 | "INTO TABLE " + table + " IGNOREFIELDS YES " + columnDefs; 93 | 94 | log("Uploading data"); 95 | 96 | LoadResult lr = db2conn.zLoad(loadstmt, blockModeFileName, null); 97 | returnCode = lr.getReturnCode(); 98 | String loadMessage = lr.getMessage(); 99 | 100 | // Close the connection 101 | con.close(); 102 | 103 | if (returnCode >= 8) { 104 | log("Upload of " + inputFileName + " FAILED. Return code " + returnCode + ". 
" + loadMessage); 105 | } else { 106 | log("Upload of " + inputFileName + " complete. Return code " + returnCode + ". " + loadMessage); 107 | } 108 | 109 | } 110 | 111 | catch (ClassNotFoundException e) { 112 | System.err.println("Could not load JDBC driver"); 113 | System.out.println("Exception: " + e); 114 | e.printStackTrace(); 115 | returnCode = 8; 116 | } 117 | 118 | catch(SQLException sqlex) { 119 | System.err.println("SQLException information"); 120 | System.err.println ("Error msg: " + sqlex.getMessage()); 121 | System.err.println ("SQLSTATE: " + sqlex.getSQLState()); 122 | System.err.println ("Error code: " + sqlex.getErrorCode()); 123 | sqlex.printStackTrace(); 124 | returnCode = 8; 125 | } 126 | 127 | catch(Exception ex) { 128 | ex.printStackTrace(); 129 | returnCode = 8; 130 | } 131 | 132 | return returnCode; 133 | } 134 | 135 | private String getColumnDefinitions() throws IOException { 136 | log("Reading column definitions file " + columnDefFileName); 137 | StringBuffer sb = new StringBuffer(); 138 | BufferedReader in = new BufferedReader(new FileReader(columnDefFileName)); 139 | String line = in.readLine(); 140 | while (line != null) { 141 | sb.append(line.trim() + " "); 142 | line = in.readLine(); 143 | } 144 | return sb.toString(); 145 | } 146 | 147 | private void convertFileToBlockMode() throws IOException { 148 | log("Converting input file " + inputFileName + " to block mode"); 149 | BufferedReader in = new BufferedReader(new FileReader(inputFileName)); 150 | DataOutputStream dos = new DataOutputStream(new FileOutputStream(blockModeFileName)); 151 | String line = in.readLine(); 152 | line = in.readLine(); // Skip over header line to first line of data 153 | while (line != null) { 154 | String curLine = line; 155 | line = in.readLine(); 156 | byte descriptor = (line != null ? 
(byte)128 : (byte)64); 157 | dos.writeByte(descriptor); 158 | dos.writeShort(curLine.length()); 159 | dos.writeBytes(curLine); 160 | } 161 | dos.close(); 162 | in.close(); 163 | } 164 | 165 | private void log(String msg) { 166 | System.out.println(LocalDateTime.now().toString() + ": " + msg); 167 | } 168 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Transforming and loading big data CSV files into a DB2 for z/OS database 2 | 3 | This code pattern offers a tried and tested approach for transforming a large set of varying CSV schemas into a subset of SQL schemas using an open-source tool called SQLite 4 | and loading them to a DB2 for z/OS database using a JDBC function called zload. 5 | 6 | This work was done as part of the Example Health set of code patterns, which demonstrate how cloud technology can access data stored on z/OS systems. 7 | We needed a way to generate a large amount of patient health care data to populate the DB2 for z/OS database. 8 | We found an open source tool called [Synthea](https://github.com/synthetichealth/synthea/) which generates the kind of synthetic data we wanted. 9 | 10 | The Synthea CSV files needed to be transformed to match the table schemas used in the Example Health application. 11 | We found a public domain tool called [SQLite](https://www.SQLite.org/index.html) which made these transformations easy. 12 | 13 | Finally, the transformed CSV files needed to be loaded from a distributed workstation into the DB2 for z/OS database. 14 | We used a JDBC function called [zload](https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_tjv00027.html) to accomplish this. 15 | zload requires DB2 for z/OS version 12.
16 | 17 | When the reader has completed this Code Pattern, they will understand how to: 18 | 19 | * use SQLite to transform a CSV file to match the schema of a DB2 table 20 | * use JDBC to load a CSV file from a distributed workstation into a DB2 for z/OS database. 21 | 22 | ## Flow 23 | 24 | A shell script (`run.sh` or `run.bat`) drives the processing. There are four main steps as shown below. 25 | 26 | ![](doc/source/images/architecture.png) 27 | 28 | 1. The Synthea tool is called to generate a set of CSV files containing synthesized patient health care data. 29 | 2. A JDBC program is called to determine the current maximum patient number in the DB2 for z/OS database. 30 | 3. The SQLite program is called to transform the CSV files produced by Synthea to match the schema of the DB2 for z/OS database. 31 | 4. A JDBC program is called to load the transformed CSV files into the DB2 for z/OS database tables. 32 | 33 | The Synthea tool produces a variety of data. We were interested in the following subset: 34 | * patients 35 | * observations 36 | * medications 37 | * conditions 38 | 39 | We needed to do some transformations from the [Synthea schemas](https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary) to our [DB2 schemas](schemas.sql). 40 | 1. Patients are identified by UUIDs in Synthea files and by integers in our database. 41 | 2. Our database has a subset of the columns produced by Synthea and some of our columns have shorter length restrictions. 42 | 43 | The SQLite tool makes it easy to do these transformations on CSV files. 44 | To transform the patients file, we used the ROW_NUMBER() function to assign integer patient ids. 45 | In order to avoid conflicts with existing patient numbers, before using SQLite we call a JDBC program to get the highest patient number in use. 46 | That number is added to all the rows. 47 | (It is assumed no other patients are being added to the database at the same time.) 
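
The id assignment described above can be sketched in a few lines of Java. This is only an illustration of the arithmetic (row number plus the highest patient id already in use), not the SQL that the project runs through SQLite. The class name `PatientIdAssigner`, the method `assignIds`, and the sample UUID strings are hypothetical and do not appear in this repository.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only: mimics ROW_NUMBER() + last patient id.
public class PatientIdAssigner {

    // Maps each patient UUID, in input order, to lastPatientId + its 1-based row number.
    static Map<String, Integer> assignIds(List<String> uuids, int lastPatientId) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int rowNumber = 0;
        for (String uuid : uuids) {
            if (!ids.containsKey(uuid)) {      // each patient gets exactly one id
                rowNumber++;
                ids.put(uuid, lastPatientId + rowNumber);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Assume the database query reported 5000 as the highest patient id in use.
        Map<String, Integer> ids =
            assignIds(Arrays.asList("uuid-a", "uuid-b", "uuid-c"), 5000);
        ids.forEach((uuid, id) -> System.out.println(uuid + " -> " + id));
        // prints:
        // uuid-a -> 5001
        // uuid-b -> 5002
        // uuid-c -> 5003
    }
}
```

Because the new ids start just above the highest existing id, they cannot collide with rows already in the table, provided nothing else inserts patients while the script runs.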
48 | 49 | To transform the observations, medications and conditions files, we use a JOIN with the transformed patients file to substitute the 50 | patient UUIDs with the integer ids. 51 | 52 | We also used SQLite to create data for other tables (users, appointments) that don't exist in the Synthea data. 53 | This data could have been generated on z/OS instead but it was easier to reuse the framework for transforming files 54 | to generate new ones as well. 55 | 56 | Finally to get the data into the DB2 for z/OS database we made use of the zload JDBC function. 57 | The zload function transfers the CSV file to z/OS and invokes the LOAD utility to load the file into a table. 58 | The JDBC program uses parameters to know which database to connect to, the credentials needed for the connection, 59 | the table to load, and the column format of the CSV file. 60 | 61 | zload requires DB2 for z/OS version 12. 62 | It also requires the IBM Data Server Driver for JDBC version 4.22.29 or later. 63 | 64 | # Steps 65 | 66 | 1. Install the following prerequisite tools. 67 | * A Java 8 (or higher) JDK such as [OpenJDK](https://openjdk.java.net/install/index.html) 68 | * [maven](https://maven.apache.org/download.cgi) 69 | * [gradle](https://gradle.org/install/) 70 | * [SQLite](https://SQLite.org/download.html) version 3.26.0 71 | * DB2 JDBC driver version 4.22.29 or later. This needs to be installed in your local Maven artifact repository. 72 | * If you already have the DB2 JDBC driver version 4.22.29 or later in your internal Maven artifact repository, you can use it. 73 | You might need to change the `groupId`, `artifactId`, and `version` for the dependency in this project's `pom.xml` file to match the values used in your repository. 74 | * Otherwise download the [DB2 JDBC Driver](http://www-01.ibm.com/support/docview.wss?uid=swg21363866) version 4.22.29. 
Extract the `db2jcc4.jar` file from the downloaded archive file 75 | and run the following command to install it to your local Maven repository: 76 | ``` 77 | mvn install:install-file -Dfile=db2jcc4.jar -Dversion=4.22.29 -DgroupId=com.ibm.db2.jcc -DartifactId=db2jcc4 -Dpackaging=jar 78 | ``` 79 | 80 | 2. Clone and build this project. 81 | ``` 82 | git clone https://github.com/IBM/example-health-synthea.git 83 | cd example-health-synthea 84 | mvn package 85 | ``` 86 | 87 | 3. Clone and build the [Synthea project](https://github.com/synthetichealth/synthea/) 88 | ``` 89 | git clone https://github.com/synthetichealth/synthea.git 90 | cd synthea 91 | ./gradlew build check test 92 | ``` 93 | 94 | If you encounter any OutOfMemoryError exceptions you may need to update the `build.gradle` file to increase the size of the Java heap. 95 | 96 | ``` 97 | test { 98 | maxHeapSize = "8192m" 99 | } 100 | run { 101 | maxHeapSize = "8192m" 102 | } 103 | ``` 104 | 105 | 4. Change the following properties in synthea/src/main/resources/synthea.properties: 106 | * Set exporter.csv.export to true 107 | * Set generate.append_numbers_to_person_names = false (optional) 108 | 109 | 5. Create the DB2 for z/OS database. The [schemas.sql](schemas.sql) file contains SQL for creating the database. You can use SPUFI 110 | or the DSNTEP2 or DSNTIAD sample programs to process the SQL. You can change the database name or schema name if desired. 111 | 112 | 6. Set up environment variables that the script needs to connect to your DB2 for z/OS database. 113 | This includes the database URL, a userid and password to authenticate to DB2, and the database schema name where you defined the tables. 114 | If you do not set these variables, the script prompts you for them. (Beware that the Windows 115 | script does not mask the database password when you type it so it is recommended to use the 116 | variable instead.) 
117 | 118 | ``` 119 | export DATABASE_URL=//host:port/location 120 | export DATABASE_USER=userid 121 | export DATABASE_PASSWORD=password 122 | export DATABASE_SCHEMA=schemaname 123 | ``` 124 | 125 | 7. Run the script from this project with the current directory set to your synthea project. 126 | (The syntax below assumes you cloned this project and the synthea project to sibling folders. On Windows, use `run.bat` instead of `run.sh`.) 127 | 128 | ``` 129 | cd synthea 130 | ../example-health-synthea/run.sh 10 "New York" 131 | ``` 132 | 133 | The first argument tells Synthea how many patients to generate. 134 | The second argument tells Synthea which U.S. state the patients live in. 135 | 136 | # Sample output 137 | 138 | Sample output from the script is in the [sample.out](sample.out) file. 139 | 140 | Elapsed time (in minutes and seconds) to generate, transform and load 5000 patients: 141 | * Synthea tool: 5:50 142 | * SQLite transformations: 0:14 143 | * zload calls: 2:19 144 | 145 | ## License 146 | 147 | This code pattern is licensed under the Apache License, Version 2. 148 | Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. 149 | Contributions are subject to the [Developer Certificate of Origin, Version 1.1](https://developercertificate.org/) and the [Apache License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt). 150 | 151 | [Apache License FAQ](https://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN) -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions.
8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. 
For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability.
In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format.
We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2019 IBM Corp.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------