├── .gitignore ├── Gemfile ├── README.md ├── _config.yml ├── db ├── patients.schema └── patients_mysql_dump.db ├── paraphraseBench.bib ├── test ├── evaluate.py ├── lexical_source.txt ├── missing_source.txt ├── mixed_source.txt ├── morphological_source.txt ├── naive_source.txt ├── patients_test.sql ├── semantic_source.txt └── syntactic_source.txt └── usage.md /.gitignore: -------------------------------------------------------------------------------- 1 | _site/ 2 | 3 | Gemfile.lock 4 | serve.sh 5 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source 'https://rubygems.org' 2 | gem 'github-pages', group: :jekyll_plugins 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | *Introducing ParaphraseBench -- a benchmark to evaluate the robustness of NLIDBs.* 2 | 3 | Current benchmarks like the GeoQuery benchmark do not explicitly test different linguistic variations, which is important for understanding the robustness of an NLIDB. To test different linguistic variants in a principled manner, we therefore curated a new benchmark as part of our paper on DBPal that covers different linguistic variations of the user's NL input and maps each of them to an expected SQL output. 4 | 5 | The schema of our new benchmark models a medical database that contains a single table holding hospital patients' attributes such as name, age, and disease. In total, the benchmark consists of 290 pairs of NL-SQL queries. Each query belongs to one of the following categories, depending on the linguistic variation used in the NL query: naive, syntactic paraphrases, morphological paraphrases, lexical paraphrases, and semantic paraphrases, as well as a set of queries with missing information. 
6 | 7 | While the NL queries in the naive category represent a direct translation of their SQL counterpart, the other categories are more challenging: syntactic paraphrases emphasize structural variations, lexical paraphrases pose challenges such as alternative phrases, semantic paraphrases use semantic similarities such as synonyms, morphological paraphrases add affixes, apply stemming, etc., and the NL queries with missing information stress implicit and incomplete NL queries. 8 | 9 | In the following, we show an example query for each of these categories in our benchmark: 10 | 11 | * Naive: "What is the average length of stay of patients where age is 80?" 12 | * Syntactic: "Where age is 80, what is the average length of stay of patients?" 13 | * Morphological: "What is the averaged length of stay of patients where age equaled 80?" 14 | * Lexical: "What is the mean length of stay of patients where age is 80 years?" 15 | * Semantic: "What is the average length of stay of patients older than 80?" 16 | * Missing Information: "What is the average stay of patients who are 80?" 17 | 18 | Please cite the following paper when using this benchmark ([download bib file](paraphraseBench.bib)): 19 | 20 | > Title: An End-to-end Neural Natural Language Interface for Databases 21 | > Authors: Utama, Prasetya; Weir, Nathaniel; Basik, Fuat; Binnig, Carsten; Cetintemel, Ugur; Hättasch, Benjamin; Ilkhechi, Amir; Ramaswamy, Shekar; Usta, Arif 22 | > Publication: eprint arXiv:1804.00401 23 | > Publication Date: 04/2018 24 | > Origin: ARXIV 25 | > Keywords: Computer Science - Databases, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction 26 | > Bibliographic Code: 2018arXiv180400401U 27 | > 28 | 29 | 30 | Read about [how to use the benchmark](usage.md). 
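To make the idea of an NL-SQL pair concrete, here is a minimal sketch of how the naive and syntactic example queries above could both map to one expected SQL statement. Several things in it are assumptions and not part of the benchmark files shown here: the SQL string is a plausible translation written for illustration (the gold queries live in `test/patients_test.sql`), SQLite stands in for MySQL so that the snippet is self-contained, and the table is reduced to the two columns the query touches, filled with a few rows copied from `db/patients_mysql_dump.db`.

```python
import sqlite3

# Assumption: an in-memory SQLite database replaces the MySQL instance the
# benchmark targets, and only the columns needed for this query are created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "id INTEGER PRIMARY KEY, length_of_stay INTEGER, age INTEGER)"
)
# (length_of_stay, age) values copied from a few rows of the MySQL dump.
conn.executemany(
    "INSERT INTO patients (length_of_stay, age) VALUES (?, ?)",
    [(19, 80), (8, 80), (16, 80), (2, 90)],
)

# Assumption: an illustrative translation, not the benchmark's gold query.
EXPECTED_SQL = "SELECT AVG(length_of_stay) FROM patients WHERE age = 80"

# Two linguistic variants of the same question share one expected SQL output.
pairs = [
    ("What is the average length of stay of patients where age is 80?", EXPECTED_SQL),
    ("Where age is 80, what is the average length of stay of patients?", EXPECTED_SQL),
]

results = []
for nl, sql in pairs:
    (value,) = conn.execute(sql).fetchone()
    results.append(value)
    print(nl, "->", sql, "->", value)

conn.close()
```

An NLIDB under test would be given only the NL side of each pair; its generated SQL would then be compared with the expected SQL, for example by comparing the result sets that both return.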
-------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-minimal 2 | show_downloads: true 3 | -------------------------------------------------------------------------------- /db/patients.schema: -------------------------------------------------------------------------------- 1 | { 2 | "types": { 3 | "FIRSTNAME": 1, 4 | "LASTNAME": 2, 5 | "DIAGNOSIS": 3, 6 | "GENDER": 4, 7 | "INTEGER": 5 8 | }, 9 | "defaults" : { 10 | "patients" : { "col": "id", "utt" : "patient"} 11 | }, 12 | "ents": { 13 | "patients": { 14 | "id" : {"index": true, "type": "INTEGER", "utt": "id"}, 15 | "first_name": {"index": true, "type": "FIRSTNAME", "utt": "first name"}, 16 | "last_name": {"index": true, "type": "LASTNAME", "utt": "last name"}, 17 | "diagnosis": {"index": true, "type": "DIAGNOSIS", "utt": "diagnosis"}, 18 | "gender": {"index": true, "type": "GENDER", "utt": "gender"}, 19 | "length_of_stay": {"index": true, "type": "INTEGER", "utt": "length of stay"}, 20 | "age": {"index": true, "type": "INTEGER", "utt": "age"} 21 | } 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /db/patients_mysql_dump.db: -------------------------------------------------------------------------------- 1 | CREATE TABLE `patients` ( 2 | `id` mediumint(8) unsigned NOT NULL auto_increment, 3 | `first_name` varchar(255) default NULL, 4 | `last_name` varchar(255) default NULL, 5 | `diagnosis` varchar(255) default NULL, 6 | `length_of_stay` mediumint default NULL, 7 | `age` mediumint default NULL, 8 | `gender` varchar(255) default NULL, 9 | PRIMARY KEY (`id`) 10 | ) AUTO_INCREMENT=1; 11 | 12 | INSERT INTO `patients` (`first_name`,`last_name`,`diagnosis`,`length_of_stay`,`age`,`gender`) VALUES ("Baker","Harrington","heart disease",8,50,"female"),("Florence","Patterson","tuberculosis",8,94,"male"),("Sasha","Hoffman","liver 
disease",8,4,"other"),("Maya","Woods","liver disease",2,41,"male"),("Baker","Morris","tuberculosis",7,76,"other"),("Florence","Morris","stroke",2,53,"female"),("Bruce","Blake","stroke",13,45,"male"),("Tate","Patterson","tuberculosis",20,57,"other"),("Tara","Ford","allergies",16,52,"female"),("Maya","Patterson","flu",14,58,"female"),("Tate","Gibson","cancer",9,83,"female"),("Tara","Ford","asthma",2,23,"other"),("Bruce","Silva","heart disease",20,62,"other"),("Baker","Gibson","allergies",1,57,"male"),("Maya","Silva","hiv",15,9,"male"),("Florence","Ford","diabetes",7,75,"male"),("Bruce","Guerrero","tuberculosis",19,28,"male"),("Ian","Morris","flu",12,50,"other"),("Bruce","Hoffman","stroke",8,2,"female"),("Maya","Harrington","cancer",11,31,"female"),("Bruce","Gibson","allergies",1,87,"other"),("Tate","Guerrero","diarrhea",15,17,"other"),("Maya","Woods","stroke",3,98,"male"),("Tara","Patterson","allergies",9,79,"male"),("August","Hoffman","heart disease",18,59,"male"),("Baker","Morris","heart disease",3,48,"other"),("Ian","Ford","tuberculosis",20,57,"other"),("Bruce","Gibson","stroke",20,32,"other"),("Tate","Gibson","hiv",18,97,"male"),("Baker","Hoffman","heart disease",18,26,"other"),("Florence","Silva","hiv",16,6,"other"),("Baker","Harrington","asthma",17,18,"female"),("August","Patterson","stroke",10,94,"female"),("August","Silva","diabetes",2,90,"male"),("Bruce","Ford","cancer",16,97,"female"),("Baker","Gibson","stroke",3,32,"other"),("Sasha","Ford","diabetes",19,80,"male"),("August","Silva","allergies",2,57,"male"),("Sasha","Gibson","flu",8,19,"other"),("Tate","Morris","diabetes",13,82,"female"),("Mary","Morris","cancer",14,91,"other"),("Sasha","Silva","asthma",2,42,"female"),("Baker","Guerrero","flu",11,2,"male"),("Mary","Patterson","hiv",12,84,"male"),("Tate","Patterson","heart 
disease",4,37,"female"),("Tara","Patterson","cancer",15,57,"male"),("Florence","Patterson","cancer",18,83,"other"),("Sasha","Morris","stroke",15,11,"female"),("Tara","Woods","diarrhea",2,73,"female"),("Florence","Blake","cancer",13,30,"other"),("Sasha","Hoffman","asthma",2,67,"female"),("Sasha","Harrington","liver disease",19,95,"female"),("Tate","Silva","hiv",15,19,"other"),("Florence","Guerrero","asthma",8,16,"female"),("Florence","Silva","stroke",8,80,"male"),("Tate","Harrington","flu",2,29,"other"),("Baker","Hoffman","tuberculosis",9,69,"other"),("Mary","Guerrero","liver disease",18,69,"other"),("Mary","Harrington","diabetes",20,19,"male"),("Tate","Guerrero","hiv",11,89,"male"),("Maya","Hoffman","flu",16,28,"female"),("Sasha","Blake","hiv",20,49,"male"),("Maya","Patterson","cancer",7,41,"other"),("Tate","Blake","allergies",2,25,"male"),("Tara","Silva","stroke",3,89,"female"),("Bruce","Morris","diabetes",14,56,"other"),("Maya","Ford","tuberculosis",3,91,"female"),("Baker","Hoffman","diabetes",9,86,"other"),("Ian","Morris","heart disease",18,57,"other"),("Maya","Patterson","diarrhea",2,28,"other"),("August","Blake","diarrhea",20,16,"other"),("Sasha","Hoffman","diarrhea",6,57,"male"),("Tara","Harrington","stroke",9,92,"male"),("Mary","Morris","flu",12,41,"other"),("Sasha","Ford","liver disease",11,56,"male"),("Ian","Blake","heart disease",10,18,"male"),("Mary","Guerrero","cancer",10,74,"male"),("Florence","Morris","tuberculosis",7,12,"male"),("Florence","Guerrero","stroke",3,56,"other"),("August","Silva","heart disease",11,32,"other"),("Tara","Hoffman","heart disease",15,33,"other"),("Bruce","Hoffman","liver 
disease",16,80,"other"),("Bruce","Woods","cancer",2,14,"male"),("Florence","Ford","allergies",11,38,"other"),("Bruce","Gibson","cancer",8,98,"female"),("August","Blake","stroke",20,38,"female"),("Ian","Harrington","diabetes",2,98,"male"),("Mary","Blake","stroke",10,78,"other"),("Ian","Guerrero","flu",7,18,"female"),("Baker","Harrington","flu",5,95,"other"),("Baker","Ford","stroke",13,1,"male"),("Tate","Hoffman","liver disease",6,40,"female"),("Maya","Hoffman","hiv",1,18,"female"),("Bruce","Patterson","diabetes",20,5,"male"),("Florence","Blake","liver disease",14,72,"male"),("August","Guerrero","diarrhea",2,13,"male"),("Tate","Morris","tuberculosis",16,90,"male"),("Tara","Hoffman","allergies",2,77,"other"),("Tara","Harrington","tuberculosis",12,27,"female"),("Florence","Woods","heart disease",18,73,"other"); 13 | -------------------------------------------------------------------------------- /paraphraseBench.bib: -------------------------------------------------------------------------------- 1 | @ARTICLE{2018arXiv180400401U, 2 | author = {{Utama}, P. and {Weir}, N. and {Basik}, F. and {Binnig}, C. and 3 | {Cetintemel}, U. and {H{\"a}ttasch}, B. and {Ilkhechi}, A. and 4 | {Ramaswamy}, S. 
and {Usta}, A.}, 5 | title = "{An End-to-end Neural Natural Language Interface for Databases}", 6 | journal = {ArXiv e-prints}, 7 | archivePrefix = "arXiv", 8 | eprint = {1804.00401}, 9 | primaryClass = "cs.DB", 10 | keywords = {Computer Science - Databases, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction}, 11 | year = 2018, 12 | month = apr, 13 | adsurl = {http://adsabs.harvard.edu/abs/2018arXiv180400401U}, 14 | adsnote = {Provided by the SAO/NASA Astrophysics Data System} 15 | } 16 | -------------------------------------------------------------------------------- /test/evaluate.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import MySQLdb 3 | import warnings 4 | import re 5 | import _mysql_exceptions 6 | 7 | DEBUG = False 8 | 9 | 10 | def prepare_query(q, timeout): 11 | """ 12 | Prepare query by reformatting and adding a time constraint 13 | 14 | :param q: query to prepare 15 | :param timeout: timeout value in milliseconds (passed to MAX_EXECUTION_TIME) 16 | :return: prepared query 17 | """ 18 | q = q.replace(' (', '(').replace('< =', '<=').replace('> =', '>=').replace('< >', '<>').replace('! =', '!=') 19 | q = re.sub('SELECT ', 'SELECT /*+ MAX_EXECUTION_TIME(' + str(timeout) + ') */ ', q, 1) 20 | q = re.sub('select ', 'SELECT /*+ MAX_EXECUTION_TIME(' + str(timeout) + ') */ ', q, 1) 21 | return q 22 | 23 | 24 | def run_query(q, timeout, dbconn, selected_fields=None): 25 | """ 26 | Run a given query and additionally return result as list of dicts. 
27 | If selected_fields is not None, only the columns with the given name will be returned 28 | 29 | :param q: query 30 | :param timeout: timeout for the query 31 | :param dbconn: connection to database 32 | :param selected_fields: list of fields to return (if None, all fields will be returned) 33 | :return: answer object {row_count, tuples, cols, fields, rows, status} 34 | """ 35 | q = prepare_query(q, timeout) 36 | cu = dbconn.cursor() 37 | cu.execute(q) 38 | result = {'row_count': cu.rowcount, 'tuples': cu.fetchall()} 39 | 40 | if cu.description: 41 | result['cols'] = [col[0] for col in cu.description] 42 | result['fields'] = set(result['cols']) 43 | result['rows'] = [ 44 | {k: v for (k, v) in zip(result['cols'], row) if selected_fields is None or k in selected_fields} for row in 45 | result['tuples']] 46 | result['status'] = True 47 | else: 48 | result['status'] = False 49 | 50 | cu.close() 51 | return result 52 | 53 | 54 | def compare(query, gold, nl, dbconn): 55 | """ 56 | Compute the accuracy of a given query 57 | 58 | :param gold: gold standard sql query to compare query to 59 | :param query: sql query to test 60 | :param nl: natural language query the sql query represents 61 | :param dbconn: database connection 62 | :return: evaluation result tuple (correct, gold, query) 63 | """ 64 | 65 | # Output buffer 66 | sb = query.replace(' (', '(').replace('< =', '<=').replace('> =', '>=').replace('< >', '<>').replace('! =', '!=') 67 | 68 | # Run gold query 69 | try: 70 | res_gold = run_query(gold, 10000, dbconn) 71 | except (_mysql_exceptions.ProgrammingError, _mysql_exceptions.InterfaceError, _mysql_exceptions.OperationalError, 72 | _mysql_exceptions.NotSupportedError) as e: 73 | if DEBUG: 74 | print(type(e)) 75 | sb = '(INCORRECT) [Gold Query failed] ' + sb 76 | print(sb) 77 | return False, gold, query 78 | 79 | # Ran gold query successfully? 
80 | if res_gold['status']: 81 | # Run predicted query (query to test) 82 | try: 83 | res_query = run_query(query, 10000, dbconn) # , selected_fields=res_gold["fields"]) 84 | except ( 85 | _mysql_exceptions.ProgrammingError, _mysql_exceptions.InterfaceError, _mysql_exceptions.OperationalError, 86 | _mysql_exceptions.NotSupportedError) as e: 87 | if DEBUG: 88 | print(type(e)) 89 | sb = '(INCORRECT) [Query failed] ' + sb 90 | print(sb) 91 | return False, gold, query 92 | 93 | # Print some debug information 94 | if DEBUG: 95 | print("Results query: ", res_query['rows'], res_query['tuples']) 96 | print("Results gold: ", res_gold['rows'], res_gold['tuples']) 97 | 98 | # Generate sets of resulting lines (string-represented, order/attributes not changed) 99 | # of gold and query for comparison 100 | res_query_tuple_set = set(str(t) for t in res_query['tuples']) 101 | res_gold_tuple_set = set(str(t) for t in res_gold['tuples']) 102 | 103 | # Are the lines returned entirely correct? 104 | if len(res_query_tuple_set) == len(res_gold_tuple_set) and res_query_tuple_set == res_gold_tuple_set: 105 | sb = '(CORRECT) [EXACT LINES] ' + sb 106 | print(sb) 107 | return True, gold, query 108 | # If not, do the attributes returned cover all requested attributes 109 | # and does the result contain the expected entries? 
110 | else: 111 | res_query_dict_set = set(str(d) for d in res_query['rows']) 112 | res_gold_tuple_set = set(str(d) for d in res_gold['rows']) 113 | 114 | # Check amount of distinct rows returned 115 | if len(res_query_dict_set) != len(res_gold_tuple_set): 116 | sb = '(INCORRECT) [AMOUNT RETURNED] ' + sb 117 | print(sb) 118 | return False, gold, query 119 | elif (len(res_query_dict_set) == len(res_gold_tuple_set) and len(res_gold_tuple_set) == 0): 120 | sb = '(INCORRECT) [BOTH ZERO RETURNED] ' + sb 121 | print(sb) 122 | return False, gold, query 123 | else: 124 | goldcolnames = res_gold['rows'][0].keys() 125 | goldcols = [] 126 | for goldcolname in goldcolnames: 127 | goldcols.append(set(map(lambda x: x[goldcolname], res_gold['rows']))) 128 | 129 | querycolnames = res_query['rows'][0].keys() 130 | querycols = [] 131 | for querycolname in querycolnames: 132 | querycols.append(set(map(lambda x: x[querycolname], res_query['rows']))) 133 | 134 | alwaystrue = False 135 | for goldcol in goldcols: 136 | for querycol in querycols: 137 | if goldcol == querycol: 138 | alwaystrue = True 139 | 140 | if alwaystrue: 141 | sb = '(CORRECT) [ATTRIBUTE SUBSET] ' + sb 142 | print(sb) 143 | return True, gold, query 144 | 145 | # Compare returned entries 146 | if res_query_dict_set == res_gold_tuple_set: 147 | sb = '(CORRECT) [ATTRIBUTE SUBSET] ' + sb 148 | print(sb) 149 | return True, gold, query 150 | 151 | # Something went wrong 152 | sb = '(INCORRECT) ' + sb 153 | print(sb) 154 | return False, gold, query 155 | 156 | 157 | def get_connection(host, user, passwd, db_name): 158 | """ 159 | Get a database connection with the given parameters 160 | 161 | :param host: hostname of the database 162 | :param user: username for database connection 163 | :param passwd: password for database connection 164 | :param db_name: database name to connect to 165 | :return: db connection 166 | """ 167 | # Database connection 168 | dbconn = MySQLdb.connect( 169 | host=host, 170 | user=user, 171 | 
passwd=passwd, 172 | db=db_name 173 | ) 174 | warnings.filterwarnings('ignore', category=MySQLdb.Warning) 175 | return dbconn 176 | 177 | 178 | def compare_queries(queries, gold_queries, nl_representations, dbconn): 179 | """ 180 | Compare the given lists of queries 181 | 182 | :param queries: queries to compare 183 | :param gold_queries: gold standard queries 184 | :param nl_representations: nl sentences the queries represent 185 | :param dbconn: database connection 186 | :return: list of result-triples (correctness, gold, query) 187 | """ 188 | return [compare(query, gold, nl, dbconn) for query, gold, nl in zip(queries, gold_queries, nl_representations)] 189 | 190 | 191 | if __name__ == '__main__': 192 | dbconn = get_connection( 193 | host="localhost", 194 | user="root", 195 | passwd="root123", 196 | db_name="patients" 197 | ) 198 | 199 | queries = ["SELECT * FROM patients WHERE age=50;"] 200 | gold_queries = ["SELECT length_of_stay FROM patients WHERE age=50;"] 201 | nl_representations = ["Give me the lengths of stays for 50 year old patients"] 202 | 203 | results = compare_queries(queries, gold_queries, nl_representations, dbconn) 204 | count_correct = sum(c for c, _, _ in results) 205 | print("Correct: ", count_correct, "/", len(results)) 206 | 207 | dbconn.close() 208 | -------------------------------------------------------------------------------- /test/lexical_source.txt: -------------------------------------------------------------------------------- 1 | what are the surnames of all patients ? 2 | what are the first name and last names of patients where sex is male and age is not less than 18 3 | what is being the mean age of all patients 4 | what is the total sum of patients where diagnosis is being influenza ? 5 | for each diagnosis group , what is the oldest age of patients ? 6 | for each diagnosis , what is the highest age for patients whose length of stay is not less than 3 ? 7 | what are the ages of all hospital patients ? 
8 | what are given names and ages of patients whose gender is male or age is at least 18 ? 9 | what is the least high age of all patients ? 10 | what is the minimum length of stay of patients where their gender equals male 11 | for each gender , what is the shortest length of stay of patients ? 12 | for each gender , get the shortest length of stay of patients where diagnosis is equal influenza 13 | show all of the last names and all of the diagnoses of all of the patients 14 | what are the surnames , ages , and conditions of patients ? 15 | what are the first names , last names , and ages of all the hospital patients ? 16 | find the surname and age of every patient 17 | what are the family names and diagnoses of patients ? 18 | what are the family names and sexes of patients ? 19 | what are the first names and last names of patients where age is no less than 20 and no more than 30 ? 20 | what are the ages of patients where 3 is less than or equal to length of stay ? 21 | what are the ages of patients where length of stay is not greater than 3 ? 22 | display the ages of hospital patients where duration of stay is shorter than 3 23 | what are the age of patients where length of stay must be more than 3 ? 24 | what are the last names and the diagnoses of hospital patients where either the patient's gender is male or the patient's age is greater than 18 ? 25 | what is the aggregation of age of all patients ? 26 | what is the longest length of stay of patients ? 27 | what is the total number of patients ? 28 | what is the shortest length of stay out of all hospital inpatients ? 29 | what is the total sum of length of hospital stay of inpatients in the database ? 30 | what is the count of patients where gender is male and age is not less than 18 ? 31 | find the minimum length of stay of patients where age is neither more nor less than 18 32 | what is the sum of age of all patients whose diagnosis is not influenza ? 
33 | what is the cumulation of durations of stay of inpatients where diagnosis is influenza ? 34 | what is the sum of ages of patients where age is no less than 20 and age is no more than 30 35 | what is the longest length of stay of patients where age is younger than 25 ? 36 | for each gender , what is the total sum of ages of hospital patients ? 37 | sorted by each gender , what is the total count of hospital inpatients ? 38 | for each diagnosis , what is the least long length of stay of patients ? 39 | for each sex , what is being the aggregate of age of patients whose age is greater than or equal to 18 ? 40 | for each diagnosis type , what equals the sum of length of hospital stay of hospital patients ? 41 | for each diagnosis , what is being the sum of ages of patients ? 42 | for each gender , what is the summation of age of patients where length of stay does not equal more than 5 ? 43 | for each diagnosis , what is the sum of length of stay of patients where age exceeds or equals 20 and is below or equal to 30 ? 44 | for each diagnosis , what is the sum of patient age from patients where length of hospital stay is strictly less than 6 ? 45 | sorted into gender , what is the summation of length of lifetimes of inpatients where diagnosis is influenza ? 46 | for each gender , what is the number of patients where age is equal to or exceeding 10 and is not exceeding 25 47 | for each diagnosis , what is the minimum length of stay of patients where first name is not anything except John 48 | what are the first and last names of all hospital patients ? 49 | what are the possible diagnoses of patients ? 50 | what are the first names of patients and also the last names of patients where the diagnosis is flu ? 51 | what are the first names and last names of patients where diagnosis is anything but flu ? 52 | what is the least young age of patients ? 53 | what is the mean of length of stay of patients ? 
54 | what is the least low age of patients where gender is female and age is at least 18 ? 55 | what is the mean of length of hospitalization of patients in the dataset where patient age equals to 15 years ? 56 | for each diagnosis category , what is the mean of length of hospital stay of inpatients ? 57 | for each diagnosis , what is the average length of hospitalization of inpatients where sex is male ? 58 | -------------------------------------------------------------------------------- /test/missing_source.txt: -------------------------------------------------------------------------------- 1 | what are all last names ? 2 | what are the full names of all patients who are older than 18 and male ? 3 | how old is the typical patient ? 4 | how many patients with flu are there ? 5 | return the oldest age for each diagnosis 6 | for each diagnosis , return the oldest patient age who stays longer or equal to 3 7 | how old are the patients ? 8 | show the age and first name of all patients who are either male or older than 18 9 | how old is the youngest patient ? 10 | find the minimum length of stay of male patients 11 | find the minimum lengths of stay for male and for female patients 12 | what are the shortest stay durations of flu patients for each gender ? 13 | get all patients' last names and what they are in for 14 | show me the list of surname along with the age , and diagnosis 15 | for every patient , display full name and age 16 | find last name and age 17 | show me the list of surname and diagnosis 18 | show the last name of every patient and whether they are male or female 19 | show the full names for patients who is between 20 to 30 years old 20 | how old are the patients who stayed at least 3 days ? 21 | how old are the patients who stayed shorter than or exactly 3 days ? 22 | how old are the patients who stayed for shorter than 3 days ? 23 | how old are the patients who stayed longer than 3 days ? 
24 | show surnames and diagnosis of patients either male or older than 18 years old 25 | aggregate the years old of everyone in the database 26 | display the longest hospitalization period 27 | how many patients are there ? 28 | how long did the patient with the shortest length of stay stay ? 29 | how many hours total were spent hospitalized by patients in the database ? 30 | how many male patients older than 18 are there ? 31 | find the minimum length of stay of all patients who are as old as 18 32 | sum the ages of patients who do not have the flu 33 | get the sum of age of patients with flu 34 | get the aggregated hospitalization time for all patients between 20 and 30 years 35 | what is the longest length of stay by all patients younger than 25 ? 36 | sum the ages of all male patients and all female patients and all other patients 37 | count the number of patients who identify as male , female , or other 38 | how long were the patients in each diagnosis group who were in for the shortest times in for ? 39 | for each gender , what is the sum of age of patients 18 or older ? 40 | add up for how long patients in each diagnosis category were at the hospital 41 | aggregate the years old of everyone in the database sorted into diagnosis 42 | what is the sum of ages for each gender of patients hospitalized for 5 or less ? 43 | sum up for how long patients from 20 to 30 years old in each diagnosis category were at the hospital 44 | find the sums of the numbers of years all patients whose length of stay is less than 6 have lived from each diagnosis 45 | what is the sum of ages for each gender of patients with flu ? 46 | what is the sum of patients between 10 and 25 for each gender ? 47 | for each diagnosis , what is the minimum length of stay of patients who are named John :6 48 | what are all the full names ? 
49 | list flu and all the alternatives to flu listed in the database 50 | show the full names of all patients who are diagnosed with flu 51 | what are the full names of patients not with flu ? 52 | how long has the longest lived patient lived ? 53 | average how long patients were hospitalized for 54 | get the maximum age of female patients older than 18 55 | what is the length of stay of the typical patient who is 15 ? 56 | get the average time patients for each diagnosis were in for 57 | compute the mean of male patients' length of stay per diagnosis 58 | -------------------------------------------------------------------------------- /test/mixed_source.txt: -------------------------------------------------------------------------------- 1 | what are all patients' surnames ? 2 | what are the first name and last name of all patients where age is no less than 18 and sex is male ? 3 | what is the patients' mean age ? 4 | what is the count of patients diagnosed with flu ? 5 | what is the highest patient age for every diagnosis ? 6 | what is the maximum patient age where length of stay was greater than or equal to 3 for each diagnosis ? 7 | of all hospital patients , what were the ages ? 8 | what are the first names and ages of patients where gender equaled male or age was 18 at the minimum 9 | the least high age of all patients was what ? 10 | of patients where male is the gender , their minimum length of stay equaled what ? 11 | find the length of stay minimized for each gender 12 | for each gender , get the shortest length of stay of patients diagnosed with flu 13 | show all of the last names and what all the patients were diagnosed with 14 | for all patients , what were the last name , age , and diagnoses ? 15 | what are all hospital patients' first names , last names , and ages ? 16 | find all patients' surnames and ages 17 | the last names and diagnoses of patients will be what ? 18 | the last names and sexes of patients will be what ? 
19 | from patients where age is at least 20 and at most 30 , the first names and last names are what ? 20 | what are patients aged where 3 is less than or equal to the length of stay ? 21 | from patients where length of stay is not greater than 3 , the patients are aged what ? 22 | from patients where length of stay is strictly below 3 , what are the ages ? 23 | from patients who stayed a length greater than 3 , what are the ages ? 24 | from patients where gender is male or age is greater than to 18 , last names are what and diagnosis are what ? 25 | what is the aggregated age of all patients ? 26 | of all patients , the maximum length of stay was what ? 27 | the total number of patients is what ? 28 | what was the shortest length stayed by all hospital inpatients ? 29 | what is the duration of stay summed from all hospital inpatients ? 30 | what is the number of patients where male was the gender and 18 or greater was the age ? 31 | find from patients where age equals 18 the minimum length stayed 32 | of all patients not diagnosed with flu , what is the sum of age ? 33 | the sum of lengths stayed by patients diagnosed with flu is what ? 34 | for patients where age is no less than 20 and age is no more than 30 , what is the sum of ages ? 35 | of patients aged less than 25 , what is the longest duration of stay ? 36 | for each gender , what is the total summed age of hospital patients ? 37 | what is the total count of hospital inpatients sorted by each gender ? 38 | the least long length of stay for patients is what for each diagnosis ? 39 | from patients where age is greater than or equaling 18 , what is the summed ages sorted by gender ? 40 | for each diagnosis , the sum of staying length of patients was what ? 41 | what will the sum of ages of patients for each diagnosis be ? 42 | for each gender , what is the summation of age of patients where length of stay did not equal more than 5 ? 
43 | for each diagnosis , what is the result of summing the length of stay of patients where age exceeds or equals 20 and is below or equal to 30 ? 44 | the sum of patient age from patients where length of stay was less than 6 is what for each diagnosis ? 45 | for each gender , summate the length of lifetimes of patients diagnosed with influenza 46 | where age is equal to or exceeding 10 and is not exceeding 25 , enumerate the patients for each gender 47 | for each diagnosis , what is the minimum length of stay of patients where first name is not anything except John 48 | All hospital patients are first and last named what ? 49 | what were the possible diagnoses of patients ? 50 | where diagnosis is flu , what are patients' first names and also patients' last names ? 51 | what are the first and last names of patients not diagnosed with flu ? 52 | the least youngest age of patients is what ? 53 | the averaged length of stay of patients was what ? 54 | the least lowest patient age is what where female is gender and age is at least 18 ? 55 | where age equaled 15 , what is the averaged length of stay of patients ? 56 | what is the averaged length stayed by patients for each diagnosis ? 57 | for each diagnosis , what is the averaging of the length of hospitalization of inpatients where sex is male ? 58 | -------------------------------------------------------------------------------- /test/morphological_source.txt: -------------------------------------------------------------------------------- 1 | what were the last names of all the patients ? 2 | what were the first name and last name from patients where gender is male and age is greater than or equal to 18 3 | what is the averaged age of all patients ? 4 | what is the count of patients diagnosed with flu ? 5 | for each diagnosis , what was the maximum age of patients ? 6 | for each diagnosis , what is the maximum age of patients where length of stay was greater than or equal to 3 ? 
7 | what were the ages of all hospital patients ? 8 | what are the first names and ages of patients where gender equaled male or age was greater than or equal to 18 9 | what was the minimum age of patients ? 10 | what was the minimum length of stay of patients where gender equaled male ? 11 | for each gender , find the minimized length of stay 12 | for each gender , get the minimum length of stay of patients diagnosed with flu 13 | show the last names and what patients were diagnosed with 14 | what were the last name , age , and diagnosis of patients ? 15 | what were the first names , last names , and ages of patients ? 16 | find what patients are last named and patients' ages 17 | what will be the family names and diagnoses of patients ? 18 | what will be the family names and genders of patients ? 19 | what were the first names and last names of patients where age was greater than or equal to 20 and age was less than or equal to 30 20 | what are the patients aged where length of stay is greater than or equal to 3 ? 21 | what are the patients aged where length of stay is not greater than 3 ? 22 | display the age of patients where length of stay has been less than 3 23 | what are the ages of patients who stayed a length greater than 3 ? 24 | what were the last names and diagnosis of patients where gender is male or age is greater than 18 ? 25 | what is the summed age of all patients ? 26 | what was the maximum length of stay of patients ? 27 | what was the number of patients ? 28 | what was the minimum length stayed by all patients ? 29 | what is the length of stay summed from all patients ? 30 | what is the number of patients where gender was male and age was greater than or equal to 18 ? 31 | find the minimum length stayed by patients where age equals 18 32 | what is the sum of age of all patients not diagnosed with flu ? 33 | what is the sum of lengths stayed by patients diagnosed with flu ? 
34 | what was the sum of age of patients where age was greater than or equal to 20 and age was less than or equal to 30 ? 35 | what is the maximum length of stay of patients aged younger than 25 36 | for each gender , what is the summed age of patients ? 37 | for each gender , what was the number of patients ? 38 | for each diagnosis , what is the minimum length stayed by patients ? 39 | for each gender , what is the summed age of patients where age is greater than or equaling 18 ? 40 | for each diagnosis , what was the sum of staying length of patients ? 41 | for each diagnosis , what was the sum of age of patients ? 42 | for each gender , what was the sum of age of patients where length of stay was less or equal to 5 ? 43 | for each diagnosis , what is the result of summing length of stay of patients where age is greater than or equal to 20 and age is less than or equal to 30 ? 44 | for each diagnosis , what is the sum of age of patients where length of stay was less than 6 ? 45 | for each gender , summate the ages of patients diagnosed with flu 46 | for each gender , enumerate the patients where age is greater than or equal to 10 and age is less than or equal to 25 47 | for each diagnosis , what is the minimum length of stay of patients who are first named John 48 | All patients are first named and last named what ? 49 | what were the distinct diagnoses of patients ? 50 | what were the first names and last names of patients where diagnosis was flu ? 51 | what are the first names and last names of patients not diagnosed with flu ? 52 | maximize the age from all patients 53 | what was the averaged length of stay of patients ? 54 | what is the maximum age of patients aged greater than or equal to 18 where gender is female ? 55 | what is the averaged length of stay of patients where age equaled 15 ? 56 | for each diagnosis , what is the averaged length stayed by patients ? 
57 | for each diagnosis , what is the averaging of the length of staying of patients where gender is male ? 58 | -------------------------------------------------------------------------------- /test/naive_source.txt: -------------------------------------------------------------------------------- 1 | what are the last names of all the patients ? 2 | what are the first name and last names from patients where gender is male and age is greater than or equal to 18 ? 3 | what is the average age of all patients ? 4 | what is the count of patients where diagnosis is flu ? 5 | for each diagnosis , what is the maximum age of patients ? 6 | for each diagnosis , what is the maximum age of patients where length of stay is greater than or equal to 3 ? 7 | what are the ages of all patients ? 8 | what are the first names and ages of patients where gender equals male or age is greater than or equal to 18 ? 9 | what is the minimum age of patients ? 10 | what is the minimum length of stay of patients where gender equals male ? 11 | for each gender , what is the minimum lengths of stay of patients ? 12 | for each gender , get the minimum length of stay of patients where diagnosis is equal to flu 13 | show the last names and diagnosis of patients 14 | what are the last names , age , and diagnosis of patients ? 15 | what are the first names , last names , and ages of patients ? 16 | find the last name and age of patients ? 17 | what are the last names and diagnoses of patients ? 18 | what are the last names and gender of patients ? 19 | what are the first names and last names of patients where age is greater than or equal to 20 and age is less than or equal to 30 20 | what are the ages of patients where length of stay is greater than or equal to 3 ? 21 | what are the ages of patients where length of stay is less than or equal to 3 ? 22 | display the age of patients where length of stay is less than 3 23 | what are the ages of patients where length of stay is greater than 3 ? 
24 | what are the last names and diagnosis of patients where gender is male or age is greater than 18 ? 25 | what is the sum of age of all patients ? 26 | what is the maximum length of stay of patients ? 27 | what is the number of patients ? 28 | what is the minimum length of stay of patients ? 29 | what is the sum of length of stay of patients ? 30 | what is the number of patients where gender is male and age is greater than or equal to 18 ? 31 | find the minimum length of stay of patients where age equals 18 32 | what is the sum of age of all patients where diagnosis is not flu ? 33 | what is the sum of lengths of stay of patients where diagnosis is flu ? 34 | what is the sum of age of patients where age is greater than or equal to 20 and age is less than or equal to 30 ? 35 | what is the maximum length of stay of patients where age is less than 25 ? 36 | for each gender , what is the sum of age of patients ? 37 | for each gender , what is the number of patients ? 38 | for each diagnosis , what is the minimum length of stay of patients ? 39 | for each gender , what is the sum of age of patients where age is greater than or equal to 18 ? 40 | for each diagnosis , what is the sum of length of stay of patients ? 41 | for each diagnosis , what is the sum of age of patients ? 42 | for each gender , what is the sum of age of patients where length of stay is less or equal to 5 ? 43 | for each diagnosis , what is the sum of length of stay of patients where age is greater than or equal to 20 and age is less than or equal to 30 ? 44 | for each diagnosis , what is the sum of age of patients where length of stay is less than 6 ? 45 | for each gender , what is the sum of age of patients where diagnosis is flu ? 
46 | for each gender , what is the number of patients where age is greater than or equal to 10 and age is less than or equal to 25 47 | for each diagnosis , what is the minimum length of stay of patients where first name is John 48 | what are the first names and last names of patients ? 49 | what are the distinct diagnoses of patients ? 50 | what are the first names and last names of patients where diagnosis is flu ? 51 | what are the first names and last names of patients where diagnosis is not flu ? 52 | what is the maximum age of patients 53 | what is the average length of stay of patients 54 | what is the maximum age of patients where gender is female and age is greater than or equal to 18 ? 55 | what is the average length of stay of patients where age equals to 15 ? 56 | for each diagnosis , what is the average length of stay of patients ? 57 | for each diagnosis , what is the average length of stay of patients where gender is male ? 58 | -------------------------------------------------------------------------------- /test/patients_test.sql: -------------------------------------------------------------------------------- 1 | SELECT patients.last_name FROM patients; 2 | SELECT patients.first_name,patients.last_name FROM patients WHERE patients.gender='male' AND patients.age>=18; 3 | SELECT avg(patients.age) FROM patients; 4 | SELECT count(1) FROM patients WHERE patients.diagnosis='flu'; 5 | SELECT patients.diagnosis,max(patients.age) FROM patients GROUP BY patients.diagnosis; 6 | SELECT patients.diagnosis,max(patients.age) FROM patients WHERE patients.length_of_stay>=3 GROUP BY patients.diagnosis; 7 | SELECT patients.age FROM patients; 8 | SELECT patients.first_name,patients.age FROM patients WHERE patients.gender='male' OR patients.age>=18; 9 | SELECT min(patients.age) FROM patients; 10 | SELECT min(patients.length_of_stay) FROM patients WHERE patients.gender='male'; 11 | SELECT patients.gender,min(patients.length_of_stay) FROM patients GROUP BY 
patients.gender; 12 | SELECT patients.gender,min(patients.length_of_stay) FROM patients WHERE patients.diagnosis='flu' GROUP BY patients.gender; 13 | SELECT patients.last_name,patients.diagnosis FROM patients; 14 | SELECT patients.last_name,patients.age,patients.diagnosis FROM patients; 15 | SELECT patients.first_name,patients.last_name,patients.age FROM patients; 16 | SELECT patients.last_name,patients.age FROM patients; 17 | SELECT patients.last_name,patients.diagnosis FROM patients; 18 | SELECT patients.last_name,patients.gender FROM patients; 19 | SELECT patients.first_name,patients.last_name FROM patients WHERE patients.age>=20 AND patients.age<=30; 20 | SELECT patients.age FROM patients WHERE patients.length_of_stay>=3; 21 | SELECT patients.age FROM patients WHERE patients.length_of_stay<=3; 22 | SELECT patients.age FROM patients WHERE patients.length_of_stay<3; 23 | SELECT patients.age FROM patients WHERE patients.length_of_stay>3; 24 | SELECT patients.last_name,patients.diagnosis FROM patients WHERE patients.gender='male' OR patients.age>18; 25 | SELECT sum(patients.age) FROM patients; 26 | SELECT max(patients.length_of_stay) FROM patients; 27 | SELECT count(1) FROM patients; 28 | SELECT min(patients.length_of_stay) FROM patients; 29 | SELECT sum(patients.length_of_stay) FROM patients; 30 | SELECT count(1) FROM patients WHERE patients.gender='male' AND patients.age>=18; 31 | SELECT min(patients.length_of_stay) FROM patients WHERE patients.age=18; 32 | SELECT sum(patients.age) FROM patients WHERE patients.diagnosis<>'flu'; 33 | SELECT sum(patients.length_of_stay) FROM patients WHERE patients.diagnosis='flu'; 34 | SELECT sum(patients.age) FROM patients WHERE patients.age>=20 AND patients.age<=30; 35 | SELECT max(patients.length_of_stay) FROM patients WHERE patients.age<25; 36 | SELECT patients.gender,sum(patients.age) FROM patients GROUP BY patients.gender; 37 | SELECT patients.gender,count(1) FROM patients GROUP BY patients.gender; 38 | SELECT 
patients.diagnosis,min(patients.length_of_stay) FROM patients GROUP BY patients.diagnosis; 39 | SELECT patients.gender,sum(patients.age) FROM patients WHERE patients.age>=18 GROUP BY patients.gender; 40 | SELECT patients.diagnosis,sum(patients.length_of_stay) FROM patients GROUP BY patients.diagnosis; 41 | SELECT patients.diagnosis,sum(patients.age) FROM patients GROUP BY patients.diagnosis; 42 | SELECT patients.gender,sum(patients.age) FROM patients WHERE patients.length_of_stay<=5 GROUP BY patients.gender; 43 | SELECT patients.diagnosis,sum(patients.length_of_stay) FROM patients WHERE patients.age>=20 AND patients.age<=30 GROUP BY patients.diagnosis; 44 | SELECT patients.diagnosis,sum(patients.age) FROM patients WHERE patients.length_of_stay<6 GROUP BY patients.diagnosis; 45 | SELECT patients.gender,sum(patients.age) FROM patients WHERE patients.diagnosis<>'flu' GROUP BY patients.gender; 46 | SELECT patients.gender,count(1) FROM patients WHERE patients.age>=10 AND patients.age<=25 GROUP BY patients.gender; 47 | SELECT patients.diagnosis,min(patients.length_of_stay) FROM patients WHERE patients.first_name='John' GROUP BY patients.diagnosis; 48 | SELECT patients.first_name,patients.last_name FROM patients; 49 | SELECT distinct(patients.diagnosis) FROM patients; 50 | SELECT patients.first_name,patients.last_name FROM patients WHERE patients.diagnosis='flu'; 51 | SELECT patients.first_name,patients.last_name FROM patients WHERE patients.diagnosis<>'flu'; 52 | SELECT max(patients.age) FROM patients; 53 | SELECT avg(patients.length_of_stay) FROM patients; 54 | SELECT max(patients.age) FROM patients WHERE patients.gender='female' AND patients.age>=18; 55 | SELECT avg(patients.length_of_stay) FROM patients WHERE patients.age=15; 56 | SELECT patients.diagnosis,avg(patients.length_of_stay) FROM patients GROUP BY patients.diagnosis; 57 | SELECT patients.diagnosis,avg(patients.length_of_stay) FROM patients WHERE patients.gender='male' GROUP BY patients.diagnosis; 58 | 
-------------------------------------------------------------------------------- /test/semantic_source.txt: -------------------------------------------------------------------------------- 1 | return a table of last names of all patients 2 | return a table of patients who are 18 or older and male 3 | what is the mean patient age ? 4 | count the flu-diagnosed patients 5 | find the ages of the eldest patient of each diagnosis 6 | for each diagnosis , display the diagnosis and the age of the patient with the highest age who stayed for 3 or more days 7 | how much have all patients aged ? 8 | display the first names and ages of either male gendered or at least 18 year old patients 9 | what is the age of the least aged patient ? 10 | what is the shortest length of stay among patients whose sex is male ? 11 | get the smallest hospitalization period from each gender category 12 | what are the shortest stay durations of patients with flu diagnosis for each gender category ? 13 | get all patients' last names and diseases 14 | display the surname , age , and illness of every patient 15 | show a complete list of patients' first names , last names and ages 16 | for every patient , show their surname and years lived 17 | show the surname and illness for every patient 18 | show a table exhibiting the surname and the gender for every patient in the database 19 | show the first names and last names for patients in the age range from 20 to 30 20 | show the ages of all patients whose length of stay is longer than or equal to 3 21 | display the list of ages of all patients whose duration of stay is shorter than or equal in length to 3 22 | compile the ages of all patients whose duration of stay is shorter than 3 23 | find all patients who stayed for more than 3 and display their ages 24 | compile patients who are either male gendered or aged greater than 18 and display their last names and diagnosis 25 | aggregate the ages of all patients in the database 26 | display the longest 
length of stay out of the patients table 27 | show the patient total 28 | what was the length of stay of the patient with the shortest length of stay ? 29 | what is the total combined length of stay for all patients ? 30 | what is the size of the group of patient who are both of male gender and at least 18 years of age ? 31 | find all patients whose age is 18 and find the one with the shortest length of stay 32 | cumulate the ages of all not flu-diagnosed patients 33 | show the sum of length of stay of patients who were diagnosed with flu 34 | what is the sum of length of stay for patients in the age range from 20 to 30 ? 35 | find the length of stay of the patient aged less than 25 with the longest length of stay 36 | sum the ages of all patients sorted by gender 37 | count the number of patients by what gender they identify as 38 | identify the shortest lengths of stay for patients in each diagnosis category 39 | what is the cumulative sum of ages of all patients aged 18 and over in each gender ? 40 | add up all the lengths of stay for patients in each diagnosis category 41 | aggregate the ages of patients for each diagnosis 42 | what is the aggregate of ages for each gender of patients whose length of stay is 5 or shorter ? 43 | for each diagnosis , from patients in the age range from 20 to 30 , what is the sum of length of stay ? 44 | for patients who stayed for less than 6 sorted by diagnosis , what are the aggregates of ages ? 45 | find all patients diagnosed with flu , sort them into groups by gender and summate their ages 46 | what is the total sum of patients in the age range from 10 to 25 for each gender ? 47 | What was the shortest length of stay a patient named John spent for each diagnosis ? 
48 | display a list of patient first and last names 49 | list the distinct values of diagnosis 50 | display a list first and last names of of flu diagnosed patients 51 | display a list of patient first and last names for patients not diagnosed with the flu 52 | how old is the oldest aged patient ? 53 | average the periods of stay of all patients 54 | find the oldest aged patient of female gender 55 | get the mean of patients' length of stay where they are exactly as old as 15 56 | for each different diagnosis , calculate the mean stay legnth 57 | compute the mean of male gender patients' length of stay per diagnosis 58 | -------------------------------------------------------------------------------- /test/syntactic_source.txt: -------------------------------------------------------------------------------- 1 | what are all patients' last names ? 2 | what are the first name and last name of all patients where age is greater than or equal to 18 and gender is male ? 3 | what is the patients' average age ? 4 | of patients where diagnosis is flu , what is the count ? 5 | what is the maximum age of patients for each diagnosis ? 6 | what is the maximum patient age where length of stay is greater than or equal to 3 for each diagnosis ? 7 | Of all patients , what are the ages ? 8 | for patients where gender equals male or age is greater than or equal to 18 , show first name and age 9 | of all patients , the minimum age is what ? 10 | of patients where male is the gender , minimum length of stay is what ? 11 | for each gender , the minimum length of stay of patients is what ? 12 | for each gender , get the minimum length of stay of patients where flu is equal to diagnosis 13 | for all patients , show last name and diagnosis 14 | for all patients , show last name , age , and diagnosis 15 | what are patients' first names , last names , and ages ? 16 | find patients' last names and ages 17 | the last names and diagnoses of patients are what ? 
18 | the last names and genders of patients are what ? 19 | from patients where age is greater than or equal to 20 and age is less than or equal to 30 , the first names and last names are what ? 20 | from patients where length of stay is greater than or equal to 3 , what are the ages ? 21 | from patients where length of stay is less than or equal to 3 , what are the ages ? 22 | from patients where length of stay is less than 3 , what are the ages ? 23 | from patients where length of stay is greater than 3 , what are the ages ? 24 | from patients where gender is male or age is greater than or equal to 18 , last names and diagnosis are what ? 25 | from all patients show their sum of ages ? 26 | of all patients , the maximum length of stay is what ? 27 | the number of patients is what ? 28 | of all patients , what is the minimum length of stay ? 29 | of all patients , what is the sum of length of stay ? 30 | what is the number of patients where male is the gender and 18 or greater is the age ? 31 | find from patients where age equals 18 the minimum length of stay 32 | of all patients where diagnosis is not flu , what is the sum of age ? 33 | the sum of lengths of stay of patients where diagnosis is flu is what ? 34 | for patients where age is greater than or equal to 20 and age is less than or equal to 30 , what is the sum of ages ? 35 | from patients where age is less than 25 , what is the maximum length of stay ? 36 | what is the sum of patients' ages for each gender ? 37 | what is the number of patients for each gender ? 38 | the minimum length of stay for patients is what for each diagnosis ? 39 | from patients where age is greater than or equal to 18 , what is the sum of ages sorted by gender ? 40 | for each diagnosis , the sum of stay length of patients is what ? 41 | what is the sum of ages of patients for each diagnosis ? 42 | the sum of ages of patients where length of stay is equal to 5 or less is what for each gender ? 
43 | what is the length of stay sum for each diagnosis of patients where age is greater than or equal to 20 and age is equal to 30 or less ? 44 | the sum of age of patients where length of stay is less than 6 is what for each diagnosis ? 45 | what is the sum of ages for each gender of patients where diagnosis is not flu ? 46 | where age is equal to 25 or less and age is equal to 10 or greater , what is the number of patients for each gender ? 47 | for each diagnosis , from all patients where John is the first name , what is the minimum length of stay ? 48 | Of all patients the first names and last names are what ? 49 | the distinct diagnoses of patients are what ? 50 | where diagnosis is flu , what are patients' first names and last names ? 51 | of patients where diagnosis is not flu , the first names and last names are what ? 52 | the patients' maximum age is what ? 53 | the mean stay length is what ? 54 | the maximimum patient age is what where female is gender and age is greater than or equal to 18 ? 55 | where age is equal to 15 , what is the average length of stay of patients ? 56 | what is the patients' average length of stay for each diagnosis ? 57 | what is the average length of stay of patients where gender is male for each diagnosis ? 58 | -------------------------------------------------------------------------------- /usage.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Usage 3 | --- 4 | 5 | ## Usage 6 | 7 | [Back to overview](. "Back to overview") 8 | 9 | ### The database -- schema and data 10 | 11 | We provide an SQL file containing the database schema and a dump of entries, as well as a machine-readable version of the schema together with some meta information in JSON format. Both can be found in the `db/` folder. 12 | 13 | The database contains only one relation, named *patients*, with the following structure:
14 | 15 | CREATE TABLE `patients` ( 16 | `id` mediumint(8) unsigned NOT NULL auto_increment, 17 | `first_name` varchar(255) default NULL, 18 | `last_name` varchar(255) default NULL, 19 | `diagnosis` varchar(255) default NULL, 20 | `length_of_stay` mediumint default NULL, 21 | `age` mediumint default NULL, 22 | `gender` varchar(255) default NULL, 23 | PRIMARY KEY (`id`) 24 | ) AUTO_INCREMENT=1; 25 | 26 | To use the benchmark, simply import the dump into a relational database. 27 | 28 | 29 | ### Test 30 | 31 | The actual test consists of a set of natural language queries with associated SQL queries. They can be used to test the coverage of your NLIDB and its robustness against variations in natural language. All tests can be found in the `test/` folder. 32 | 33 | The test cases consist of 57 different queries in 6 variations each and the associated SQL queries. The variations of the naive query are: *Syntactic, Morphological, Lexical, Semantic, Missing Information*. Each variant can be found in its own file, named after the variant and ending in `_source.txt` (for example, `naive_source.txt`); the corresponding SQL queries are located on the corresponding lines of the file `patients_test.sql`. 34 | 35 | To test your NLIDB, run the SQL queries produced by your system against the database provided above and compare the results to those of the gold standard. We propose the following metric: a query is counted as correctly translated when it produces the exact result (attributes and rows returned), or at least the expected rows and a superset of the expected attributes/columns. This metric is already implemented in `evaluate.py`. 36 | 37 | [Back to overview](. "Back to overview") --------------------------------------------------------------------------------
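The metric proposed in `usage.md` can be illustrated with a minimal sketch. This is not the code from `evaluate.py`; the function name `is_correct`, its parameters, and the sample rows are hypothetical and only show one possible reading of the rule. The sketch assumes that column names are directly comparable between the gold and the predicted result, that column names are unique, and that row order does not matter.

```python
from collections import Counter


def is_correct(expected_cols, expected_rows, actual_cols, actual_rows):
    """Hypothetical helper: True if the actual result equals the expected
    result, or contains the expected rows plus additional columns."""
    # Every expected column must appear in the actual result.
    if not set(expected_cols) <= set(actual_cols):
        return False
    # Positions of the expected columns inside each actual row.
    idx = [actual_cols.index(col) for col in expected_cols]
    # Drop the extra columns so both results have the same shape.
    projected = [tuple(row[i] for i in idx) for row in actual_rows]
    # Compare as multisets: order is ignored (an assumption), duplicates count.
    return Counter(projected) == Counter(tuple(r) for r in expected_rows)


# Made-up sample data: the prediction returns one extra column.
gold_cols = ["last_name"]
gold_rows = [("Smith",), ("Jones",)]
pred_cols = ["first_name", "last_name"]
pred_rows = [("Ann", "Jones"), ("Bob", "Smith")]
print(is_correct(gold_cols, gold_rows, pred_cols, pred_rows))
```

Comparing multisets rather than sets means that a prediction which loses or adds duplicate rows is rejected, which seems closest to "the expected rows"; a stricter or looser comparison may be what the actual implementation uses.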