├── .gitignore ├── CODEOWNERS ├── CODE_OF_CONDUCT.md ├── LICENSE.txt ├── MixQG ├── README.md ├── configs │ └── ds_config_zero2.json ├── data │ ├── merge_datasets.py │ └── preprocess_datasets.py ├── eval.sh ├── requirements.txt ├── run_qg.py └── train.sh ├── Quiz_Design ├── README.md ├── model_hf_generator.py ├── qd_content.json ├── quiz_design_data.jsonl ├── quiz_design_groups.jsonl ├── requirements.txt ├── run_flask_server.py ├── static │ ├── Quiz_Design_Tutorial.mp4 │ ├── live.js │ ├── main.css │ └── slideshow.css ├── templates │ └── main_page.html └── utils_qd_data.py ├── README.md └── SECURITY.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | 131 | -------------------------------------------------------------------------------- /CODEOWNERS: -------------------------------------------------------------------------------- 1 | # Comment line immediately above ownership line is reserved for related gus information. Please be careful while editing. 
2 | #ECCN:Open Source 3 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Salesforce Open Source Community Code of Conduct 2 | 3 | ## About the Code of Conduct 4 | 5 | Equality is a core value at Salesforce. We believe a diverse and inclusive 6 | community fosters innovation and creativity, and are committed to building a 7 | culture where everyone feels included. 8 | 9 | Salesforce open-source projects are committed to providing a friendly, safe, and 10 | welcoming environment for all, regardless of gender identity and expression, 11 | sexual orientation, disability, physical appearance, body size, ethnicity, nationality, 12 | race, age, religion, level of experience, education, socioeconomic status, or 13 | other similar personal characteristics. 14 | 15 | The goal of this code of conduct is to specify a baseline standard of behavior so 16 | that people with different social values and communication styles can work 17 | together effectively, productively, and respectfully in our open source community. 18 | It also establishes a mechanism for reporting issues and resolving conflicts. 19 | 20 | All questions and reports of abusive, harassing, or otherwise unacceptable behavior 21 | in a Salesforce open-source project may be reported by contacting the Salesforce 22 | Open Source Conduct Committee at ossconduct@salesforce.com. 23 | 24 | ## Our Pledge 25 | 26 | In the interest of fostering an open and welcoming environment, we as 27 | contributors and maintainers pledge to making participation in our project and 28 | our community a harassment-free experience for everyone, regardless of gender 29 | identity and expression, sexual orientation, disability, physical appearance, 30 | body size, ethnicity, nationality, race, age, religion, level of experience, education, 31 | socioeconomic status, or other similar personal characteristics. 32 | 33 | ## Our Standards 34 | 35 | Examples of behavior that contributes to creating a positive environment 36 | include: 37 | 38 | * Using welcoming and inclusive language 39 | * Being respectful of differing viewpoints and experiences 40 | * Gracefully accepting constructive criticism 41 | * Focusing on what is best for the community 42 | * Showing empathy toward other community members 43 | 44 | Examples of unacceptable behavior by participants include: 45 | 46 | * The use of sexualized language or imagery and unwelcome sexual attention or 47 | advances 48 | * Personal attacks, insulting/derogatory comments, or trolling 49 | * Public or private harassment 50 | * Publishing, or threatening to publish, others' private information—such as 51 | a physical or electronic address—without explicit permission 52 | * Other conduct which could reasonably be considered inappropriate in a 53 | professional setting 54 | * Advocating for or encouraging any of the above behaviors 55 | 56 | ## Our Responsibilities 57 | 58 | Project maintainers are responsible for clarifying the standards of acceptable 59 | behavior and are expected to take appropriate and fair corrective action in 60 | response to any instances of unacceptable behavior. 
61 | 62 | Project maintainers have the right and responsibility to remove, edit, or 63 | reject comments, commits, code, wiki edits, issues, and other contributions 64 | that are not aligned with this Code of Conduct, or to ban temporarily or 65 | permanently any contributor for other behaviors that they deem inappropriate, 66 | threatening, offensive, or harmful. 67 | 68 | ## Scope 69 | 70 | This Code of Conduct applies both within project spaces and in public spaces 71 | when an individual is representing the project or its community. Examples of 72 | representing a project or community include using an official project email 73 | address, posting via an official social media account, or acting as an appointed 74 | representative at an online or offline event. Representation of a project may be 75 | further defined and clarified by project maintainers. 76 | 77 | ## Enforcement 78 | 79 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 80 | reported by contacting the Salesforce Open Source Conduct Committee 81 | at ossconduct@salesforce.com. All complaints will be reviewed and investigated 82 | and will result in a response that is deemed necessary and appropriate to the 83 | circumstances. The committee is obligated to maintain confidentiality with 84 | regard to the reporter of an incident. Further details of specific enforcement 85 | policies may be posted separately. 86 | 87 | Project maintainers who do not follow or enforce the Code of Conduct in good 88 | faith may face temporary or permanent repercussions as determined by other 89 | members of the project's leadership and the Salesforce Open Source Conduct 90 | Committee. 91 | 92 | ## Attribution 93 | 94 | This Code of Conduct is adapted from the [Contributor Covenant][contributor-covenant-home], 95 | version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html. 96 | It includes adaptions and additions from [Go Community Code of Conduct][golang-coc], 97 | [CNCF Code of Conduct][cncf-coc], and [Microsoft Open Source Code of Conduct][microsoft-coc]. 98 | 99 | This Code of Conduct is licensed under the [Creative Commons Attribution 3.0 License][cc-by-3-us]. 100 | 101 | [contributor-covenant-home]: https://www.contributor-covenant.org (https://www.contributor-covenant.org/) 102 | [golang-coc]: https://golang.org/conduct 103 | [cncf-coc]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md 104 | [microsoft-coc]: https://opensource.microsoft.com/codeofconduct/ 105 | [cc-by-3-us]: https://creativecommons.org/licenses/by/3.0/us/ -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Salesforce.com, Inc. 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 7 | 8 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 9 | 10 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 11 | 12 | 3. 
Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 15 | -------------------------------------------------------------------------------- /MixQG/README.md: -------------------------------------------------------------------------------- 1 | # MixQG: Neural Question Generation with Mixed Answer Types 2 | 3 | This is the official code base for the following paper from Salesforce Research: 4 | 5 | **Title**: [MixQG: Neural Question Generation with Mixed Answer Types](https://arxiv.org/abs/2110.08175) 6 | 7 | **Authors**: Lidiya Murakhovs'ka, Chien-Sheng Wu, Tong Niu, Wenhao Liu, Caiming Xiong 8 | 9 | ## Abstract 10 | 11 | Asking good questions is an essential ability for both human and machine intelligence. However, existing neural question generation approaches mainly focus on the short factoid type of answers. In this paper, we propose a neural question generator, MixQG, to bridge this gap. We combine 9 question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstractive answers, to train a single generative model. We show with empirical results that our model outperforms existing work in both seen and unseen domains and can generate questions with different cognitive levels when conditioned on different answer types. Our code is released and well-integrated with the Huggingface library to facilitate various downstream applications. 
12 | 13 | ## Usage 14 | 15 | MixQG pre-trained models are available through the Huggingface library: 16 | 17 | ``` 18 | from transformers import AutoTokenizer, AutoModelForSeq2SeqLM 19 | 20 | model_name = "Salesforce/mixqg-base" 21 | tokenizer = AutoTokenizer.from_pretrained(model_name) 22 | model = AutoModelForSeq2SeqLM.from_pretrained(model_name) 23 | 24 | def run_qg(input_text, **generator_args): 25 | input_ids = tokenizer.encode(input_text, return_tensors="pt") 26 | generated_ids = model.generate(input_ids, **generator_args) 27 | return tokenizer.batch_decode(generated_ids, skip_special_tokens=True) 28 | ``` 29 | 30 | Input text should be formatted as follows: `f"{answer} \\n {context}"` 31 | 32 | For example, 33 | ``` 34 | run_qg("Robert Boyle \\n In the late 17th century, Robert Boyle proved that air is necessary for combustion.") 35 | # should output ['Who proved that air is necessary for combustion?'] 36 | ``` 37 | 38 | ## Released Model Checkpoints 39 | 40 | We have released the following checkpoints for pre-trained models described in our paper: 41 | - MixQG-base (220M parameters): [link](https://huggingface.co/Salesforce/mixqg-base) 42 | - MixQG-large (770M parameters): [link](https://huggingface.co/Salesforce/mixqg-large) 43 | - MixQG-3B (3B parameters): [link](https://huggingface.co/Salesforce/mixqg-3b) 44 | 45 | ## Set up 46 | `pip install -r requirements.txt` 47 | 48 | ## Preprocessing 49 | Preprocess the required datasets and merge them into one in the `DIR` folder. 50 | ``` 51 | DIR=/PATH/TO/DATASET/FOLDER 52 | python data/preprocess_datasets.py --dir $DIR 53 | python data/merge_datasets.py --dir $DIR 54 | ``` 55 | The `DIR` folder will contain each of the preprocessed in-domain and out-of-domain datasets as well as the final `mixqg` dataset. 
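As a quick sanity check, the merged dataset can be reloaded with `datasets.load_from_disk`. This is a minimal sketch (the `DIR` placeholder stands for the same folder passed to the two scripts above); the `context`, `question`, and `answer` columns are the ones produced by `data/preprocess_datasets.py` and consumed by `run_qg.py`:
```
from datasets import load_from_disk

DIR = "/PATH/TO/DATASET/FOLDER"  # same folder used for preprocessing

# merge_datasets.py saves the combined corpus with save_to_disk under {DIR}/mixqg
mixqg = load_from_disk(f"{DIR}/mixqg")
print(mixqg)  # DatasetDict with train/validation/test splits

# each example carries the three columns used by run_qg.py
sample = mixqg["train"][0]
print(sample["answer"], sample["context"], sample["question"], sep="\n")
```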
56 | 57 | ## Training 58 | ``` 59 | num_gpus=4 60 | model_name=t5-base 61 | dataset=${DIR}/mixqg 62 | output_dir=mixqg-base 63 | lr=3e-5 64 | bs=32 65 | 66 | ./train.sh $num_gpus $model_name $dataset $output_dir $lr $bs 67 | ``` 68 | ## Fine-tuning 69 | ``` 70 | num_gpus=4 71 | model_name=Salesforce/mixqg-base 72 | dataset=${DIR}/squad 73 | output_dir=mixqg-base-squad 74 | lr=3e-6 75 | bs=32 76 | 77 | ./train.sh $num_gpus $model_name $dataset $output_dir $lr $bs 78 | ``` 79 | 80 | ## Evaluation 81 | ``` 82 | gpu=0 83 | model=Salesforce/mixqg-base 84 | dataset=${DIR}/squad 85 | output_dir=mixqg-base-squad-eval 86 | bs=32 87 | 88 | ./eval.sh $gpu $model $dataset $output_dir $bs 89 | ``` 90 | 91 | ## Citation 92 | 93 | ``` 94 | @misc{murakhovska2021mixqg, 95 | title={MixQG: Neural Question Generation with Mixed Answer Types}, 96 | author={Lidiya Murakhovs'ka and Chien-Sheng Wu and Tong Niu and Wenhao Liu and Caiming Xiong}, 97 | year={2021}, 98 | eprint={2110.08175}, 99 | archivePrefix={arXiv}, 100 | primaryClass={cs.CL} 101 | } 102 | ``` -------------------------------------------------------------------------------- /MixQG/configs/ds_config_zero2.json: -------------------------------------------------------------------------------- 1 | { 2 | "fp16": { 3 | "enabled": "auto", 4 | "loss_scale": 0, 5 | "loss_scale_window": 1000, 6 | "initial_scale_power": 16, 7 | "hysteresis": 2, 8 | "min_loss_scale": 1 9 | }, 10 | 11 | "optimizer": { 12 | "type": "AdamW", 13 | "params": { 14 | "lr": "auto", 15 | "betas": "auto", 16 | "eps": "auto", 17 | "weight_decay": "auto" 18 | } 19 | }, 20 | 21 | "scheduler": { 22 | "type": "WarmupLR", 23 | "params": { 24 | "warmup_min_lr": "auto", 25 | "warmup_max_lr": "auto", 26 | "warmup_num_steps": "auto" 27 | } 28 | }, 29 | 30 | "zero_optimization": { 31 | "stage": 2, 32 | "offload_optimizer": { 33 | "device": "cpu", 34 | "pin_memory": true 35 | }, 36 | "allgather_partitions": true, 37 | "allgather_bucket_size": 2e8, 38 | "overlap_comm": true, 39 | "reduce_scatter": true, 40 | "reduce_bucket_size": 2e8, 41 | "contiguous_gradients": true 42 | }, 43 | 44 | "gradient_accumulation_steps": "auto", 45 | "gradient_clipping": "auto", 46 | "steps_per_print": 2000, 47 | "train_batch_size": "auto", 48 | "train_micro_batch_size_per_gpu": "auto", 49 | "wall_clock_breakdown": false 50 | } -------------------------------------------------------------------------------- /MixQG/data/merge_datasets.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Copyright (c) 2021, salesforce.com, inc. 3 | All rights reserved. 
4 | SPDX-License-Identifier: BSD-3-Clause 5 | For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause 6 | ''' 7 | 8 | import argparse 9 | import os 10 | 11 | from datasets import DatasetDict, load_from_disk, concatenate_datasets 12 | 13 | 14 | def main(args): 15 | DIR = args.dir 16 | 17 | # Load datasets 18 | mrqa = load_from_disk(f"{DIR}/mrqa") 19 | narrativeqa = load_from_disk(f"{DIR}/narrativeqa") 20 | mctest = load_from_disk(f"{DIR}/mctest") 21 | boolq = load_from_disk(f"{DIR}/boolq") 22 | 23 | loaded_datasets = [mrqa, narrativeqa, mctest, boolq] 24 | 25 | # Shuffle 26 | train_datasets = [d["train"].shuffle() 27 | for d in loaded_datasets if "train" in d.keys()] 28 | eval_datasets = [d["validation"] 29 | for d in loaded_datasets if "validation" in d.keys()] 30 | test_datasets = [d["test"] 31 | for d in loaded_datasets if "test" in d.keys()] 32 | 33 | # Merge & Save 34 | train_dataset = concatenate_datasets(train_datasets) 35 | eval_dataset = concatenate_datasets(eval_datasets) 36 | test_dataset = concatenate_datasets(test_datasets) 37 | 38 | combined = DatasetDict({ 39 | "train": train_dataset.shuffle(), 40 | "validation": eval_dataset, 41 | "test": test_dataset 42 | }) 43 | 44 | if not os.path.isdir(f"{DIR}/mixqg"): 45 | combined.save_to_disk(f"{DIR}/mixqg") 46 | 47 | 48 | if __name__ == "__main__": 49 | parser = argparse.ArgumentParser() 50 | parser.add_argument("--dir", type=str, default="", 51 | help="Path to the datasets directory.") 52 | args = parser.parse_args() 53 | main(args) 54 | -------------------------------------------------------------------------------- /MixQG/data/preprocess_datasets.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Copyright (c) 2021, salesforce.com, inc. 3 | All rights reserved. 
4 | SPDX-License-Identifier: BSD-3-Clause 5 | For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause 6 | ''' 7 | 8 | import argparse 9 | import os 10 | import spacy 11 | 12 | from datasets import load_dataset 13 | 14 | 15 | nlp = spacy.load("en_core_web_sm") 16 | MC_map = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5, 'G': 6} 17 | 18 | 19 | def preprocess_squad(examples): 20 | context = [] 21 | question = [] 22 | answer = [] 23 | for i in range(len(examples["answers"])): 24 | if len(examples["answers"][i]["text"]) > 0: 25 | answer.append(examples["answers"][i]["text"][0]) 26 | context.append(examples["context"][i]) 27 | question.append(examples["question"][i]) 28 | 29 | return { 30 | "context": context, 31 | "question": question, 32 | "answer": answer 33 | } 34 | 35 | 36 | def preprocess_narrative_qa(examples): 37 | context = [] 38 | question = [] 39 | answer = [] 40 | for i in range(len(examples['answers'])): 41 | context.append(examples['document'][i]['summary']['text']) 42 | question.append(examples['question'][i]['text']) 43 | answer.append(examples['answers'][i][0]['text']) 44 | 45 | return { 46 | "context": context, 47 | "question": question, 48 | "answer": answer 49 | } 50 | 51 | 52 | def preprocess_mrqa(examples): 53 | question = [] 54 | answer = [] 55 | context = [] 56 | for i in range(len(examples["answers"])): 57 | if len(examples["answers"][i]) > 0: 58 | answer.append(examples["answers"][i][0]) 59 | context.append(examples["context"][i]) 60 | question.append(examples["question"][i]) 61 | 62 | return { 63 | "context": context, 64 | "question": question, 65 | "answer": answer 66 | } 67 | 68 | 69 | def preprocess_mctest(examples): 70 | context = examples['story'] 71 | question = examples['question'] 72 | answer = [] 73 | for i in range(len(examples['question'])): 74 | answer_letter = examples['answer'][i] 75 | options = examples['answer_options'][i] 76 | correct_answer = options[answer_letter] 77 | answer.append(correct_answer) 78 | 79 | return { 80 | "context": context, 81 | "question": question, 82 | "answer": answer 83 | } 84 | 85 | 86 | def preprocess_drop(examples): 87 | question = examples["question"] 88 | context = examples["passage"] 89 | answer = [] 90 | for i in range(len(examples["answers_spans"])): 91 | answer.append(examples["answers_spans"][i]["spans"][0]) 92 | 93 | return { 94 | "context": context, 95 | "question": question, 96 | "answer": answer 97 | } 98 | 99 | 100 | def preprocess_boolq(examples): 101 | context = examples['passage'] 102 | question = examples['question'] 103 | answer = [] 104 | for i in range(len(examples['question'])): 105 | ans = 'yes' if examples['answer'][i] else 'no' 106 | doc = nlp(examples['question'][i]) 107 | entities = " ".join([ent.text for ent in doc.ents]) 108 | if len(entities) > 0: 109 | answer.append(f"{ans} {entities}") 110 | else: 111 | answer.append(ans) 112 | return { 113 | "context": context, 114 | "question": question, 115 | "answer": answer 116 | } 117 | 118 | 119 | def process_dataset(DIR, dataset_name, process_func): 120 | if os.path.isdir(f"{DIR}/{dataset_name}"): 121 | return 122 | dataset = load_dataset(dataset_name) 123 | column_names = dataset["train"].column_names 124 | processed = dataset.map( 125 | process_func, 126 | batched=True, 127 | num_proc=8, 128 | remove_columns=column_names, 129 | load_from_cache_file=True, 130 | desc=f"Running preprocessing on {dataset_name} dataset", 131 | ) 132 | print(f"Saving to disk at {DIR}/{dataset_name}") 133 | 
processed.save_to_disk(f"{DIR}/{dataset_name}") 134 | del processed 135 | 136 | 137 | def mctest(DIR, dataset_name="mctest"): 138 | if os.path.isdir(f"{DIR}/{dataset_name}"): 139 | return 140 | dataset = load_dataset("sagnikrayc/mctest") 141 | column_names = dataset["train"].column_names 142 | processed = dataset.map( 143 | preprocess_mctest, 144 | batched=True, 145 | num_proc=8, 146 | remove_columns=column_names, 147 | load_from_cache_file=True, 148 | desc=f"Running preprocessing on {dataset_name} dataset", 149 | ) 150 | print(f"Saving to disk at {DIR}/{dataset_name}") 151 | processed.save_to_disk(f"{DIR}/{dataset_name}") 152 | del processed 153 | 154 | 155 | def natural_questions(DIR, dataset_name="natural_questions"): 156 | if os.path.isdir(f"{DIR}/{dataset_name}"): 157 | return 158 | dataset = load_dataset("mrqa") 159 | dataset = dataset.filter(lambda ex: ex["subset"] == "NaturalQuestionsShort") 160 | column_names = dataset["train"].column_names 161 | processed = dataset.map( 162 | preprocess_mrqa, 163 | batched=True, 164 | num_proc=8, 165 | remove_columns=column_names, 166 | load_from_cache_file=True, 167 | desc=f"Running preprocessing on {dataset_name} dataset", 168 | ) 169 | print(f"Saving to disk at {DIR}/{dataset_name}") 170 | processed.save_to_disk(f"{DIR}/{dataset_name}") 171 | del processed 172 | 173 | 174 | def main(args): 175 | DIR = args.dir 176 | 177 | process_dataset(DIR, "mrqa", preprocess_mrqa) 178 | process_dataset(DIR, "narrativeqa", preprocess_narrative_qa) 179 | mctest(DIR) 180 | process_dataset(DIR, "boolq", preprocess_boolq) 181 | 182 | process_dataset(DIR, "squad", preprocess_squad) 183 | process_dataset(DIR, "quoref", preprocess_squad) 184 | process_dataset(DIR, "drop", preprocess_drop) 185 | natural_questions(DIR) 186 | 187 | 188 | if __name__ == "__main__": 189 | parser = argparse.ArgumentParser() 190 | parser.add_argument("--dir", type=str, default="", 191 | help="Path to the datasets directory.") 192 | args = parser.parse_args() 193 | main(args) 194 | -------------------------------------------------------------------------------- /MixQG/eval.sh: -------------------------------------------------------------------------------- 1 | gpu=$1 2 | model=$2 3 | dataset=$3 4 | output_dir=$4 5 | bs=$5 6 | 7 | CUDA_VISIBLE_DEVICES=${gpu} python run_qg.py \ 8 | --model_name_or_path ${model} \ 9 | --dataset_dir ${dataset} \ 10 | --output_dir ${output_dir} \ 11 | --do_eval \ 12 | --predict_with_generate True \ 13 | --per_device_eval_batch_size=${bs} \ 14 | --run_name ${output_dir} \ 15 | --report_to none \ 16 | --max_target_length 32 \ 17 | --val_max_target_length 32 \ 18 | --metric_for_best_model eval_rougeLsum \ 19 | -------------------------------------------------------------------------------- /MixQG/requirements.txt: -------------------------------------------------------------------------------- 1 | transformers==4.9.1 2 | nltk>=3.6.4 3 | numpy>=1.19.1 4 | datasets==1.13.3 5 | rouge_score==0.0.4 6 | en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz 7 | sacrebleu==1.4.12 8 | spacy==2.3.1 9 | bert-score==0.3.10 10 | deepspeed==0.5.4 11 | wandb==0.11.0 -------------------------------------------------------------------------------- /MixQG/run_qg.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # Copyright 2021 The HuggingFace Team. All rights reserved. 
4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | """ 17 | Fine-tuning script adapted for question generation. 18 | """ 19 | 20 | import logging 21 | import os 22 | import sys 23 | from dataclasses import dataclass, field 24 | from typing import Optional 25 | 26 | import nltk # Here to have a nice missing dependency error message early on 27 | import numpy as np 28 | import transformers 29 | from bert_score import BERTScorer 30 | from datasets import load_dataset, load_from_disk, load_metric 31 | from filelock import FileLock 32 | from transformers import (AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer, 33 | DataCollatorForSeq2Seq, EarlyStoppingCallback, 34 | HfArgumentParser, Seq2SeqTrainer, 35 | Seq2SeqTrainingArguments, set_seed) 36 | from transformers.file_utils import is_offline_mode 37 | from transformers.trainer_utils import get_last_checkpoint, is_main_process 38 | from transformers.utils import check_min_version 39 | 40 | # Will error if the minimal version of Transformers is not installed. Remove at your own risks. 41 | check_min_version("4.6.0.dev0") 42 | 43 | logger = logging.getLogger(__name__) 44 | 45 | try: 46 | nltk.data.find("tokenizers/punkt") 47 | except (LookupError, OSError): 48 | if is_offline_mode(): 49 | raise LookupError( 50 | "Offline mode: run this script without TRANSFORMERS_OFFLINE first to download nltk data files" 51 | ) 52 | with FileLock(".lock") as lock: 53 | nltk.download("punkt", quiet=True) 54 | 55 | 56 | @dataclass 57 | class ModelArguments: 58 | """ 59 | Arguments pertaining to which model/config/tokenizer we are going to fine-tune from. 60 | """ 61 | 62 | model_name_or_path: str = field( 63 | metadata={ 64 | "help": "Path to pretrained model or model identifier from huggingface.co/models"} 65 | ) 66 | config_name: Optional[str] = field( 67 | default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"} 68 | ) 69 | tokenizer_name: Optional[str] = field( 70 | default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"} 71 | ) 72 | cache_dir: Optional[str] = field( 73 | default=None, 74 | metadata={ 75 | "help": "Where to store the pretrained models downloaded from huggingface.co"}, 76 | ) 77 | use_fast_tokenizer: bool = field( 78 | default=True, 79 | metadata={ 80 | "help": "Whether to use one of the fast tokenizer (backed by the tokenizers library) or not."}, 81 | ) 82 | model_revision: str = field( 83 | default="main", 84 | metadata={ 85 | "help": "The specific model version to use (can be a branch name, tag name or commit id)."}, 86 | ) 87 | use_auth_token: bool = field( 88 | default=False, 89 | metadata={ 90 | "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " 91 | "with private models)." 
92 | }, 93 | ) 94 | dropout_rate: float = field( 95 | default=0.1, 96 | metadata={"help": "Dropout rate."} 97 | ) 98 | 99 | 100 | @dataclass 101 | class DataTrainingArguments: 102 | """ 103 | Arguments pertaining to what data we are going to input our model for training and eval. 104 | """ 105 | 106 | dataset_name: Optional[str] = field( 107 | default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."} 108 | ) 109 | dataset_config_name: Optional[str] = field( 110 | default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."} 111 | ) 112 | dataset_dir: Optional[str] = field( 113 | default=None, metadata={"help": "The input data directory (saved via save_to_disk)."} 114 | ) 115 | train_file: Optional[str] = field( 116 | default=None, metadata={"help": "The input training data file (a jsonlines or csv file)."} 117 | ) 118 | validation_file: Optional[str] = field( 119 | default=None, 120 | metadata={ 121 | "help": "An optional input evaluation data file to evaluate the metrics (rouge) on " 122 | "(a jsonlines or csv file)." 123 | }, 124 | ) 125 | test_file: Optional[str] = field( 126 | default=None, 127 | metadata={ 128 | "help": "An optional input test data file to evaluate the metrics (rouge) on " "(a jsonlines or csv file)." 129 | }, 130 | ) 131 | overwrite_cache: bool = field( 132 | default=False, metadata={"help": "Overwrite the cached training and evaluation sets"} 133 | ) 134 | preprocessing_num_workers: Optional[int] = field( 135 | default=None, 136 | metadata={"help": "The number of processes to use for the preprocessing."}, 137 | ) 138 | max_source_length: Optional[int] = field( 139 | default=512, 140 | metadata={ 141 | "help": "The maximum total input sequence length after tokenization. Sequences longer " 142 | "than this will be truncated, sequences shorter will be padded." 143 | }, 144 | ) 145 | max_target_length: Optional[int] = field( 146 | default=100, 147 | metadata={ 148 | "help": "The maximum total sequence length for target text after tokenization. Sequences longer " 149 | "than this will be truncated, sequences shorter will be padded." 150 | }, 151 | ) 152 | val_max_target_length: Optional[int] = field( 153 | default=100, 154 | metadata={ 155 | "help": "The maximum total sequence length for validation target text after tokenization. Sequences longer " 156 | "than this will be truncated, sequences shorter will be padded. Will default to `max_target_length`." 157 | "This argument is also used to override the ``max_length`` param of ``model.generate``, which is used " 158 | "during ``evaluate`` and ``predict``." 159 | }, 160 | ) 161 | pad_to_max_length: bool = field( 162 | default=True, 163 | metadata={ 164 | "help": "Whether to pad all samples to model maximum sentence length. " 165 | "If False, will pad the samples dynamically when batching to the maximum length in the batch. More " 166 | "efficient on GPU but very bad for TPU." 167 | }, 168 | ) 169 | max_train_samples: Optional[int] = field( 170 | default=None, 171 | metadata={ 172 | "help": "For debugging purposes or quicker training, truncate the number of training examples to this " 173 | "value if set." 174 | }, 175 | ) 176 | max_val_samples: Optional[int] = field( 177 | default=None, 178 | metadata={ 179 | "help": "For debugging purposes or quicker training, truncate the number of validation examples to this " 180 | "value if set." 
181 | }, 182 | ) 183 | max_test_samples: Optional[int] = field( 184 | default=None, 185 | metadata={ 186 | "help": "For debugging purposes or quicker training, truncate the number of test examples to this " 187 | "value if set." 188 | }, 189 | ) 190 | num_beams: Optional[int] = field( 191 | default=4, 192 | metadata={ 193 | "help": "Number of beams to use for evaluation. This argument will be passed to ``model.generate``, " 194 | "which is used during ``evaluate`` and ``predict``." 195 | }, 196 | ) 197 | ignore_pad_token_for_loss: bool = field( 198 | default=True, 199 | metadata={ 200 | "help": "Whether to ignore the tokens corresponding to padded labels in the loss computation or not." 201 | }, 202 | ) 203 | question_column: Optional[str] = field( 204 | default="question", 205 | metadata={ 206 | "help": "The name of the column in the datasets containing the question."}, 207 | ) 208 | answer_column: Optional[str] = field( 209 | default="answer", 210 | metadata={ 211 | "help": "The name of the column in the datasets containing the answer."}, 212 | ) 213 | context_column: Optional[str] = field( 214 | default="context", 215 | metadata={ 216 | "help": "The name of the column in the datasets containing the context."}, 217 | ) 218 | early_stopping_patience: Optional[int] = field( 219 | default=15, 220 | metadata={ 221 | "help": "Early stopping patience. This argument will be passed to ``EarlyStoppingCallback``." 222 | }, 223 | ) 224 | wandb_run_id: Optional[str] = field( 225 | default=None, 226 | metadata={"help": "Wandb run id to resume training."}, 227 | ) 228 | 229 | def __post_init__(self): 230 | if self.dataset_name is None and self.train_file is None and self.validation_file is None and self.dataset_dir is None: 231 | raise ValueError( 232 | "Need either a dataset name, dataset directory or a training/validation file.") 233 | else: 234 | if self.train_file is not None: 235 | extension = self.train_file.split(".")[-1] 236 | assert extension in [ 237 | "csv", "json"], "`train_file` should be a csv or a json file." 238 | if self.validation_file is not None: 239 | extension = self.validation_file.split(".")[-1] 240 | assert extension in [ 241 | "csv", "json"], "`validation_file` should be a csv or a json file." 242 | if self.val_max_target_length is None: 243 | self.val_max_target_length = self.max_target_length 244 | 245 | 246 | def main(): 247 | # See all possible arguments in src/transformers/training_args.py 248 | # or by passing the --help flag to this script. 249 | # We now keep distinct sets of args, for a cleaner separation of concerns. 250 | 251 | parser = HfArgumentParser( 252 | (ModelArguments, DataTrainingArguments, Seq2SeqTrainingArguments)) 253 | if len(sys.argv) == 2 and sys.argv[1].endswith(".json"): 254 | # If we pass only one argument to the script and it's the path to a json file, 255 | # let's parse it to get our arguments. 256 | model_args, data_args, training_args = parser.parse_json_file( 257 | json_file=os.path.abspath(sys.argv[1])) 258 | else: 259 | model_args, data_args, training_args = parser.parse_args_into_dataclasses() 260 | 261 | # Detecting last checkpoint. 
262 | last_checkpoint = None 263 | if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir: 264 | last_checkpoint = get_last_checkpoint(training_args.output_dir) 265 | if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0: 266 | raise ValueError( 267 | f"Output directory ({training_args.output_dir}) already exists and is not empty. " 268 | "Use --overwrite_output_dir to overcome." 269 | ) 270 | elif last_checkpoint is not None and training_args.resume_from_checkpoint is None: 271 | logger.info( 272 | f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change " 273 | "the `--output_dir` or add `--overwrite_output_dir` to train from scratch." 274 | ) 275 | 276 | # Setup logging 277 | logging.basicConfig( 278 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", 279 | datefmt="%m/%d/%Y %H:%M:%S", 280 | handlers=[logging.StreamHandler(sys.stdout)], 281 | ) 282 | logger.setLevel(logging.INFO if is_main_process( 283 | training_args.local_rank) else logging.WARN) 284 | 285 | # Log on each process the small summary: 286 | logger.warning( 287 | f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu} " 288 | + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}" 289 | ) 290 | # Set the verbosity to info of the Transformers logger (on main process only): 291 | if is_main_process(training_args.local_rank): 292 | transformers.utils.logging.set_verbosity_info() 293 | logger.info(f"Training/evaluation parameters {training_args}") 294 | 295 | # Set seed before initializing model. 296 | set_seed(training_args.seed) 297 | 298 | # Set project name 299 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 300 | os.environ["WANDB_PROJECT"] = "question_generation" 301 | if data_args.wandb_run_id: 302 | os.environ["WANDB_RESUME"] = "allow" 303 | os.environ["WANDB_RUN_ID"] = data_args.wandb_run_id 304 | 305 | # Get the datasets: you can either provide your own CSV/JSON training and evaluation files (see below) 306 | # or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/ 307 | # (the dataset will be downloaded automatically from the datasets Hub). 308 | # 309 | # In distributed training, the load_dataset function guarantee that only one local process can concurrently 310 | # download the dataset. 311 | if data_args.dataset_name is not None: 312 | # Downloading and loading a dataset from the hub. 313 | datasets = load_dataset(data_args.dataset_name, 314 | data_args.dataset_config_name) 315 | elif data_args.dataset_dir is not None: 316 | datasets = load_from_disk(data_args.dataset_dir) 317 | else: 318 | data_files = {} 319 | if data_args.train_file is not None: 320 | data_files["train"] = data_args.train_file 321 | extension = data_args.train_file.split(".")[-1] 322 | if data_args.validation_file is not None: 323 | data_files["validation"] = data_args.validation_file 324 | extension = data_args.validation_file.split(".")[-1] 325 | if data_args.test_file is not None: 326 | data_files["test"] = data_args.test_file 327 | extension = data_args.test_file.split(".")[-1] 328 | datasets = load_dataset(extension, data_files=data_files) 329 | # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at 330 | # https://huggingface.co/docs/datasets/loading_datasets.html. 
331 | 332 | # Load pretrained model and tokenizer 333 | # 334 | # Distributed training: 335 | # The .from_pretrained methods guarantee that only one local process can concurrently 336 | # download model & vocab. 337 | config = AutoConfig.from_pretrained( 338 | model_args.config_name if model_args.config_name else model_args.model_name_or_path, 339 | cache_dir=model_args.cache_dir, 340 | revision=model_args.model_revision, 341 | use_auth_token=True if model_args.use_auth_token else None, 342 | dropout_rate=model_args.dropout_rate, 343 | ) 344 | tokenizer = AutoTokenizer.from_pretrained( 345 | model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path, 346 | cache_dir=model_args.cache_dir, 347 | use_fast=model_args.use_fast_tokenizer, 348 | revision=model_args.model_revision, 349 | use_auth_token=True if model_args.use_auth_token else None, 350 | ) 351 | model = AutoModelForSeq2SeqLM.from_pretrained( 352 | model_args.model_name_or_path, 353 | from_tf=bool(".ckpt" in model_args.model_name_or_path), 354 | config=config, 355 | cache_dir=model_args.cache_dir, 356 | revision=model_args.model_revision, 357 | use_auth_token=True if model_args.use_auth_token else None, 358 | ) 359 | 360 | if model.config.decoder_start_token_id is None: 361 | raise ValueError( 362 | "Make sure that `config.decoder_start_token_id` is correctly defined") 363 | 364 | # Preprocessing the datasets. 365 | # We need to tokenize inputs and targets. 366 | if training_args.do_train: 367 | column_names = datasets["train"].column_names 368 | elif training_args.do_eval: 369 | column_names = datasets["validation"].column_names 370 | elif training_args.do_predict: 371 | column_names = datasets["test"].column_names 372 | else: 373 | logger.info( 374 | "There is nothing to do. Please pass `do_train`, `do_eval` and/or `do_predict`.") 375 | return 376 | 377 | # Temporarily set max_target_length for training. 378 | max_source_length = data_args.max_source_length 379 | max_target_length = data_args.max_target_length 380 | padding = "max_length" if data_args.pad_to_max_length else False 381 | 382 | if training_args.label_smoothing_factor > 0 and not hasattr(model, "prepare_decoder_input_ids_from_labels"): 383 | logger.warning( 384 | "label_smoothing is enabled but the `prepare_decoder_input_ids_from_labels` method is not defined for" 385 | f"`{model.__class__.__name__}`. This will lead to loss being calculated twice and will take up more memory" 386 | ) 387 | 388 | # Preprocessing the datasets. 389 | question_column_name = data_args.question_column 390 | context_column_name = data_args.context_column 391 | answer_column_name = data_args.answer_column 392 | 393 | def format_inputs(context: str, answer: str): 394 | return f"{answer} \\n {context}" 395 | 396 | def preprocess_function(examples): 397 | context = examples[context_column_name] 398 | answer = examples[answer_column_name] 399 | question = examples[question_column_name] 400 | 401 | inputs = [format_inputs(ctx, ans) for ctx, ans in zip(context, answer)] 402 | 403 | model_inputs = tokenizer(inputs, max_length=max_source_length, 404 | padding=padding, truncation=True) 405 | 406 | # Setup the tokenizer for targets 407 | with tokenizer.as_target_tokenizer(): 408 | labels = tokenizer(question, max_length=max_target_length, 409 | padding=padding, truncation=True) 410 | 411 | # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore 412 | # padding in the loss. 
413 | if padding == "max_length" and data_args.ignore_pad_token_for_loss: 414 | labels["input_ids"] = [ 415 | [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"] 416 | ] 417 | 418 | model_inputs["labels"] = labels["input_ids"] 419 | return model_inputs 420 | 421 | if training_args.do_train: 422 | train_dataset = datasets["train"] 423 | if "train" not in datasets: 424 | raise ValueError("--do_train requires a train dataset") 425 | if data_args.max_train_samples is not None: 426 | train_dataset = train_dataset.select( 427 | range(data_args.max_train_samples)) 428 | train_dataset = train_dataset.map( 429 | preprocess_function, 430 | batched=True, 431 | num_proc=data_args.preprocessing_num_workers, 432 | remove_columns=column_names, 433 | load_from_cache_file=not data_args.overwrite_cache, 434 | ) 435 | 436 | if training_args.do_eval: 437 | max_target_length = data_args.val_max_target_length 438 | if "validation" not in datasets: 439 | raise ValueError("--do_eval requires a validation dataset") 440 | eval_dataset = datasets["validation"] 441 | if data_args.max_val_samples is not None: 442 | eval_dataset = eval_dataset.select( 443 | range(data_args.max_val_samples)) 444 | eval_dataset = eval_dataset.map( 445 | preprocess_function, 446 | batched=True, 447 | num_proc=data_args.preprocessing_num_workers, 448 | remove_columns=column_names, 449 | load_from_cache_file=not data_args.overwrite_cache, 450 | ) 451 | 452 | if training_args.do_predict: 453 | max_target_length = data_args.val_max_target_length 454 | if "test" not in datasets: 455 | raise ValueError("--do_predict requires a test dataset") 456 | test_dataset = datasets["test"] 457 | if data_args.max_test_samples is not None: 458 | test_dataset = test_dataset.select( 459 | range(data_args.max_test_samples)) 460 | test_dataset = test_dataset.map( 461 | preprocess_function, 462 | batched=True, 463 | num_proc=data_args.preprocessing_num_workers, 464 | remove_columns=column_names, 465 | load_from_cache_file=not data_args.overwrite_cache, 466 | ) 467 | 468 | # Data collator 469 | label_pad_token_id = - \ 470 | 100 if data_args.ignore_pad_token_for_loss else tokenizer.pad_token_id 471 | data_collator = DataCollatorForSeq2Seq( 472 | tokenizer, 473 | model=model, 474 | label_pad_token_id=label_pad_token_id, 475 | pad_to_multiple_of=8 if training_args.fp16 else None, 476 | ) 477 | 478 | # Metric 479 | rouge = load_metric("rouge") 480 | bleu = load_metric("sacrebleu") 481 | meteor = load_metric("meteor") 482 | scorer = BERTScorer(lang="en", rescale_with_baseline=True) 483 | 484 | def postprocess_text(preds, labels): 485 | preds = [pred.strip() for pred in preds] 486 | labels = [label.strip() for label in labels] 487 | 488 | bleu_labels = [[label] for label in labels] 489 | 490 | # rougeLSum expects newline after each sentence 491 | rouge_preds = ["\n".join(nltk.sent_tokenize(pred)) for pred in preds] 492 | rouge_labels = ["\n".join(nltk.sent_tokenize(label)) 493 | for label in labels] 494 | 495 | return { 496 | "bleu": [preds, bleu_labels], 497 | "meteor": [preds, labels], 498 | "rouge": [rouge_preds, rouge_labels] 499 | } 500 | 501 | def compute_metrics(eval_preds): 502 | preds, labels = eval_preds 503 | if isinstance(preds, tuple): 504 | preds = preds[0] 505 | decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True) 506 | if data_args.ignore_pad_token_for_loss: 507 | # Replace -100 in the labels as we can't decode them. 
508 | labels = np.where(labels != -100, labels, tokenizer.pad_token_id) 509 | decoded_labels = tokenizer.batch_decode( 510 | labels, skip_special_tokens=True) 511 | 512 | # Some simple post-processing 513 | decoded = postprocess_text(decoded_preds, decoded_labels) 514 | 515 | result = rouge.compute( 516 | predictions=decoded["rouge"][0], references=decoded["rouge"][1], use_stemmer=True) 517 | # Extract a few results from ROUGE 518 | result = {key: value.mid.fmeasure * 519 | 100 for key, value in result.items()} 520 | 521 | prediction_lens = [np.count_nonzero( 522 | pred != tokenizer.pad_token_id) for pred in preds] 523 | result["gen_len"] = np.mean(prediction_lens) 524 | result = {k: round(v, 4) for k, v in result.items()} 525 | 526 | bleu_result = bleu.compute( 527 | predictions=decoded["bleu"][0], references=decoded["bleu"][1]) 528 | result["bleu"] = bleu_result["score"] 529 | 530 | meteor_result = meteor.compute( 531 | predictions=decoded["meteor"][0], references=decoded["meteor"][1]) 532 | result["meteor"] = meteor_result["meteor"] 533 | 534 | P, R, F1 = scorer.score(decoded["meteor"][0], decoded["meteor"][1]) 535 | result["bertscore"] = np.mean(F1.tolist()) 536 | 537 | return result 538 | 539 | # Initialize our Trainer 540 | trainer = Seq2SeqTrainer( 541 | model=model, 542 | args=training_args, 543 | train_dataset=train_dataset if training_args.do_train else None, 544 | eval_dataset=eval_dataset if training_args.do_eval else None, 545 | tokenizer=tokenizer, 546 | data_collator=data_collator, 547 | compute_metrics=compute_metrics if training_args.predict_with_generate else None, 548 | callbacks=[EarlyStoppingCallback( 549 | early_stopping_patience=data_args.early_stopping_patience)], 550 | ) 551 | 552 | # Training 553 | if training_args.do_train: 554 | checkpoint = None 555 | if training_args.resume_from_checkpoint is not None: 556 | checkpoint = training_args.resume_from_checkpoint 557 | elif last_checkpoint is not None: 558 | checkpoint = last_checkpoint 559 | train_result = trainer.train(resume_from_checkpoint=checkpoint) 560 | trainer.save_model() # Saves the tokenizer too for easy upload 561 | 562 | metrics = train_result.metrics 563 | max_train_samples = ( 564 | data_args.max_train_samples if data_args.max_train_samples is not None else len( 565 | train_dataset) 566 | ) 567 | metrics["train_samples"] = min(max_train_samples, len(train_dataset)) 568 | 569 | trainer.log_metrics("train", metrics) 570 | trainer.save_metrics("train", metrics) 571 | trainer.save_state() 572 | 573 | # Evaluation 574 | results = {} 575 | if training_args.do_eval: 576 | logger.info("*** Evaluate ***") 577 | 578 | metrics = trainer.evaluate( 579 | max_length=data_args.val_max_target_length, num_beams=data_args.num_beams, metric_key_prefix="eval" 580 | ) 581 | max_val_samples = data_args.max_val_samples if data_args.max_val_samples is not None else len( 582 | eval_dataset) 583 | metrics["eval_samples"] = min(max_val_samples, len(eval_dataset)) 584 | 585 | trainer.log_metrics("eval", metrics) 586 | trainer.save_metrics("eval", metrics) 587 | 588 | if training_args.do_predict: 589 | logger.info("*** Test ***") 590 | 591 | test_results = trainer.predict( 592 | test_dataset, 593 | metric_key_prefix="test", 594 | max_length=data_args.val_max_target_length, 595 | num_beams=data_args.num_beams, 596 | ) 597 | metrics = test_results.metrics 598 | max_test_samples = data_args.max_test_samples if data_args.max_test_samples is not None else len( 599 | test_dataset) 600 | metrics["test_samples"] = min(max_test_samples, 
len(test_dataset)) 601 | 602 | trainer.log_metrics("test", metrics) 603 | trainer.save_metrics("test", metrics) 604 | 605 | if trainer.is_world_process_zero(): 606 | if training_args.predict_with_generate: 607 | test_preds = tokenizer.batch_decode( 608 | test_results.predictions, skip_special_tokens=True, clean_up_tokenization_spaces=True 609 | ) 610 | test_preds = [pred.strip() for pred in test_preds] 611 | output_test_preds_file = os.path.join( 612 | training_args.output_dir, "test_generations.txt") 613 | with open(output_test_preds_file, "w") as writer: 614 | writer.write("\n".join(test_preds)) 615 | 616 | return results 617 | 618 | 619 | def _mp_fn(index): 620 | # For xla_spawn (TPUs) 621 | main() 622 | 623 | 624 | if __name__ == "__main__": 625 | main() 626 | -------------------------------------------------------------------------------- /MixQG/train.sh: -------------------------------------------------------------------------------- 1 | num_gpus=$1 2 | model_name=$2 3 | dataset=$3 4 | output_dir=$4 5 | lr=$5 6 | bs=$6 7 | 8 | deepspeed --num_gpus=${num_gpus} run_qg.py \ 9 | --model_name_or_path ${model_name} \ 10 | --dataset_dir ${dataset} \ 11 | --output_dir ${output_dir} \ 12 | --do_train \ 13 | --do_eval \ 14 | --evaluation_strategy steps \ 15 | --eval_steps 2000 \ 16 | --save_steps 2000 \ 17 | --load_best_model_at_end True \ 18 | --metric_for_best_model eval_rougeLsum \ 19 | --greater_is_better True \ 20 | --predict_with_generate True \ 21 | --per_device_eval_batch_size=${bs} \ 22 | --per_device_train_batch_size=${bs} \ 23 | --gradient_accumulation_steps=1 \ 24 | --max_steps 100000 \ 25 | --logging_steps 100 \ 26 | --save_total_limit 4 \ 27 | --deepspeed configs/ds_config_zero2.json \ 28 | --adam_eps 1e-06 \ 29 | --label_smoothing 0.1 \ 30 | --learning_rate ${lr} \ 31 | --logging_first_step \ 32 | --warmup_steps 500 \ 33 | --max_target_length 32 \ 34 | --val_max_target_length 32 \ 35 | --fp16 36 | -------------------------------------------------------------------------------- /Quiz_Design/README.md: -------------------------------------------------------------------------------- 1 | # Quiz Design: Helping Teachers Create Quizzes with Automated Question Generation 2 | 3 | This is the official code base for the following paper from Salesforce Research: 4 | 5 | **Title**: Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation 6 | 7 | **Authors**: Philippe Laban, Chien-Sheng Wu, Lidiya Murakhovs'ka, Wenhao Liu, Caiming Xiong 8 | 9 | ## Dataset Release: 10 | 11 | We release the dataset we collected during the study: `quiz_design_data.jsonl`. It can be opened in Python with the following: 12 | 13 | ```python 14 | import utils_qd_data 15 | annotations = utils_qd_data.load_qd_annotations() 16 | qd_dataset = utils_qd_data.build_qd_groups(annotations) 17 | ``` 18 | 19 | Each line is a JSON object. The first entry looks like: 20 | ```json 21 | {"doc_id": 0, 22 | "answer_span": "meets the needs of the present without compromising the ability of future generations to meet their own needs", 23 | "context": "Energy is sustainable if it 'meets the needs of the present without compromising the ability of future generations to meet their own needs'. 
Most definitions of sustainable energy [...]", 24 | "questions": [{"question": "What does energy mean if it is sustainable?", 25 | "label": 0, 26 | "reason": "disfluent", 27 | "model_name": "dgpt2_sup"}, 28 | {"question": "What does energy sustainability mean?", 29 | "label": 1, 30 | "reason": "No error", 31 | "model_name": "gpt2b_sup"}, 32 | {"question": "How is energy sustainable?", 33 | "label": 0, 34 | "reason": "wrong_context", 35 | "model_name": "gpt2m_sup"}, 36 | {"question": "What is sustainable energy?", 37 | "label": 0, 38 | "reason": "wrong_context", 39 | "model_name": "bartb_sup|prophetnet"}, 40 | {"question": "What does it mean if energy is sustainable?", 41 | "label": 1, 42 | "reason": "No error", 43 | "model_name": "mixqg"}, 44 | {"question": "What is the definition of sustainable energy?", 45 | "label": 1, 46 | "reason": "No error", 47 | "model_name": "bartl_sup"}]} 48 | ``` 49 | 50 | ## Annotation Interface 51 | 52 | We release the annotation interface used during the collection of the Quiz Design study. 53 | The interface can be instantiated with the following command: 54 | ``` 55 | FLASK_APP=run_flask_server flask run 56 | ``` 57 | 58 | The list of Question Generation models used to generate candidate questions can be modified in the first lines of `run_flask_server.py`. 59 | 60 | ## Cite the work 61 | 62 | If you use the data or annotation interface, please cite the work: 63 | ``` 64 | @inproceedings{laban2022quiz, 65 | title={Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation}, 66 | author={Laban, Philippe and Wu, Chien-Sheng and Murakhovs'ka, Lidiya and Liu, Wenhao and Xiong, Caiming}, 67 | booktitle={Findings of the North American Chapter of the Association for Computational Linguistics: NAACL 2022}, 68 | year={2022} 69 | } 70 | ``` 71 | -------------------------------------------------------------------------------- /Quiz_Design/model_hf_generator.py: -------------------------------------------------------------------------------- 1 | from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM 2 | import torch, os, tqdm 3 | 4 | def select_logprobs(logits, decoded_tokens, eos_id): 5 | logprobs = torch.nn.functional.log_softmax(logits, dim=2) 6 | 7 | selected_logprobs = [] 8 | for i, generated_tokenized in enumerate(decoded_tokens): 9 | if eos_id in generated_tokenized: 10 | generated_tokenized = generated_tokenized[:generated_tokenized.index(eos_id)] 11 | selected_logprob = logprobs[i, torch.arange(len(generated_tokenized)), generated_tokenized] 12 | summed_logprob = torch.sum(selected_logprob) 13 | selected_logprobs.append(summed_logprob) 14 | selected_logprobs = torch.stack(selected_logprobs, dim=0) 15 | return selected_logprobs 16 | 17 | models_folder = os.environ["MODELS_FOLDER"] 18 | 19 | class GeneratorHF: 20 | def __init__(self, model_card="gpt2-medium", device="cuda", starter_file=None, gradient_checkpointing=False, max_enc_length=None, max_dec_length=None, force_dec_prepend=None): 21 | self.model_card = model_card 22 | 23 | self.is_gpt2 = "gpt2" in self.model_card or "summary_loop" in self.model_card or "keep_it_simple" in self.model_card 24 | if self.is_gpt2: 25 | self.model = AutoModelForCausalLM.from_pretrained(self.model_card) 26 | else: 27 | self.model = AutoModelForSeq2SeqLM.from_pretrained(self.model_card) 28 | self.model.to(device) 29 | 30 | self.tokenizer = AutoTokenizer.from_pretrained(self.model_card) 31 | self.gradient_checkpointing = gradient_checkpointing 32 | self.max_enc_length =
max_enc_length 33 | self.max_dec_length = max_dec_length 34 | self.force_dec_prepend = force_dec_prepend 35 | 36 | if self.gradient_checkpointing: 37 | self.model.gradient_checkpointing_enable() 38 | 39 | self.model.eval() 40 | 41 | if "facebook/wmt19" in self.model_card: 42 | self.tokenizer.pad_token = "" 43 | self.tokenizer.eos_token = "" 44 | 45 | self.start_id = self.tokenizer.bos_token_id 46 | self.end_id = self.tokenizer.eos_token_id 47 | 48 | if "prophetnet" in self.model_card: 49 | # bos_token_id=102, eos_token_id=102 50 | self.start_id = 102 51 | self.end_id = 102 52 | 53 | if self.start_id is None and self.end_id is not None: 54 | # For MixQG 55 | self.start_id = 0 56 | 57 | self.device = device 58 | if self.is_gpt2: 59 | self.tokenizer.pad_token = self.tokenizer.eos_token 60 | 61 | self.model.config.pad_token_id = self.tokenizer.pad_token_id 62 | if starter_file is not None: 63 | self.reload(starter_file, strict=False) 64 | 65 | def reload(self, from_file, strict=True): 66 | if not os.path.isfile(from_file): 67 | # Try to look at the models folder for the file 68 | from_file = os.path.join(models_folder, from_file) 69 | assert os.path.isfile(from_file), "Starter file not found, in absolute or in models folder" 70 | 71 | loaded_dict = torch.load(from_file) 72 | print(self.model.load_state_dict(loaded_dict, strict=strict)) 73 | 74 | def save(self, to_file): 75 | torch.save(self.model.state_dict(), to_file) 76 | 77 | def preprocess(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None): 78 | 79 | assert len(encoded_texts) == len(decoded_texts), "Mismatch in input/output sizes" 80 | 81 | # encoder_tokenized = [torch.LongTensor(self.tokenizer.encode(text=text)) for text in encoded_texts] 82 | # encoder_ids = torch.nn.utils.rnn.pad_sequence(encoder_tokenized, batch_first=True, padding_value=0, truncation=True).to(self.device) 83 | 84 | encoder_ids = self.tokenizer.batch_encode_plus(encoded_texts, add_special_tokens=True, return_tensors="pt", padding=True, truncation=True).input_ids.to(self.device) 85 | 86 | if self.force_dec_prepend is not None: 87 | decoded_texts = [self.force_dec_prepend + text for text in decoded_texts] 88 | decoder_tokenized = [self.tokenizer.encode(text=text, add_special_tokens=False) for text in decoded_texts] 89 | 90 | decoder_ids_input = torch.nn.utils.rnn.pad_sequence([torch.LongTensor([self.start_id] + dec) for dec in decoder_tokenized], batch_first=True, padding_value=self.end_id).to(self.device) 91 | decoder_ids_output = torch.nn.utils.rnn.pad_sequence([torch.LongTensor(dec + [self.end_id]) for dec in decoder_tokenized], batch_first=True, padding_value=-1).to(self.device) 92 | 93 | if self.max_enc_length is not None and max_enc_length is None: 94 | max_enc_length = self.max_enc_length 95 | if self.max_dec_length is not None and max_dec_length is None: 96 | max_dec_length = self.max_dec_length 97 | 98 | if max_enc_length is not None: 99 | encoder_ids = encoder_ids[:, :max_enc_length] 100 | 101 | if max_dec_length is not None: 102 | decoder_ids_input = decoder_ids_input[:, :max_dec_length] 103 | decoder_ids_output = decoder_ids_output[:, :max_dec_length] 104 | 105 | return encoder_ids, decoder_ids_input, decoder_ids_output 106 | 107 | def train_batch(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None, no_preinput=False): 108 | self.model.train() 109 | N = len(encoded_texts) 110 | 111 | encoder_ids, decoder_ids_input, decoder_ids_output = self.preprocess(encoded_texts, decoded_texts, max_enc_length, 
max_dec_length) 112 | 113 | crit = torch.nn.CrossEntropyLoss(ignore_index=-1) 114 | if self.is_gpt2: 115 | past = None 116 | if not no_preinput: 117 | encoder_output = self.model(input_ids=encoder_ids, past_key_values=None, return_dict=True, use_cache=True) 118 | past = encoder_output["past_key_values"] 119 | decoder_output = self.model(input_ids=decoder_ids_input, past_key_values=past, return_dict=True, use_cache=not self.gradient_checkpointing) 120 | logits = decoder_output["logits"] 121 | else: 122 | if no_preinput: 123 | encoder_ids = torch.LongTensor([[self.start_id]]).repeat(N, 1).to(self.device) 124 | model_output = self.model(input_ids=encoder_ids, decoder_input_ids=decoder_ids_input, return_dict=True, use_cache=not self.gradient_checkpointing) 125 | logits = model_output["logits"] 126 | 127 | N_unwrap = decoder_ids_output.shape[0] * decoder_ids_output.shape[1] 128 | loss = crit(logits.view(N_unwrap, -1), decoder_ids_output.contiguous().view(-1)) # self.tokenizer.vocab_size 129 | return loss 130 | 131 | def score_batch(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None): 132 | encoder_ids, decoder_ids_input, decoder_ids_output = self.preprocess(encoded_texts, decoded_texts, max_enc_length, max_dec_length) 133 | 134 | with torch.no_grad(): 135 | 136 | crit = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction="none") 137 | if self.is_gpt2: 138 | encoder_output = self.model(input_ids=encoder_ids, past_key_values=None, return_dict=True) 139 | past = encoder_output["past_key_values"] 140 | decoder_output = self.model(input_ids=decoder_ids_input, past_key_values=past, return_dict=True) 141 | logits = decoder_output["logits"] 142 | else: 143 | model_output = self.model(input_ids=encoder_ids, decoder_input_ids=decoder_ids_input, return_dict=True) 144 | logits = model_output["logits"] 145 | 146 | N, seqlength, vocab_size = logits.shape 147 | 148 | loss_components = crit(logits.view(N*seqlength, vocab_size), decoder_ids_output.contiguous().view(-1)).reshape(N, seqlength) 149 | num_words = torch.sum(decoder_ids_output != -1, dim=1) 150 | score_per_item = (- torch.sum(loss_components, dim=1) / num_words).tolist() 151 | return {"scores": score_per_item} 152 | 153 | def score(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None, batch_size=32, progress=False): 154 | N = len(encoded_texts) 155 | iterator = range(0, N, batch_size) 156 | if progress and len(iterator) > 1: 157 | iterator = tqdm.tqdm(iterator) 158 | scores = [] 159 | for i in iterator: 160 | batch_encoded_texts = encoded_texts[i:i+batch_size] 161 | batch_decoded_texts = decoded_texts[i:i+batch_size] 162 | batch_scores = self.score_batch(batch_encoded_texts, batch_decoded_texts, max_enc_length, max_dec_length)["scores"] 163 | scores += batch_scores 164 | return {"scores": scores} 165 | 166 | def generate(self, texts, max_enc_length=None, max_gen_length=None, num_runs=1, compute_logprobs=False, force_start=None, **gen_params): 167 | assert type(texts) == list, "The generate function takes as input a list of `str`" 168 | if len(texts) == 0: 169 | return [] 170 | 171 | tokenized_paragraphs = [torch.LongTensor(self.tokenizer.encode(text=text)) for text in texts] 172 | tokenized_paragraphs = [tok_text for tok_text in tokenized_paragraphs for _ in range(num_runs)] 173 | 174 | decoder_input_ids = None 175 | if force_start is not None: 176 | decoder_input_ids = self.tokenizer.encode(force_start, return_tensors="pt", add_special_tokens=False) 177 | 178 | # Generate without leaving gradients 
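        # The block below pads the tokenized batch, truncates it to max_enc_length-1 so
        # the start token can be appended, and handles an optional `force_start` prefix:
        # for decoder-only GPT-2-style models the prefix is concatenated onto the prompt,
        # while for seq2seq models it is passed to HuggingFace generate() through
        # `decoder_input_ids`. Note that `sequences_scores` are only returned when beam
        # search is used, and log-probabilities require a second forward pass (see the
        # `compute_logprobs` branch further down).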
179 | with torch.no_grad(): 180 | encoder_ids = torch.nn.utils.rnn.pad_sequence(tokenized_paragraphs, batch_first=True, padding_value=0).to(self.device) 181 | if max_enc_length is not None: 182 | encoder_ids = encoder_ids[:, :(max_enc_length-1)] 183 | N = encoder_ids.shape[0] 184 | start_column = torch.LongTensor([[self.start_id]] * N).to(self.device) 185 | encoder_ids = torch.cat((encoder_ids, start_column), dim=1) 186 | 187 | if decoder_input_ids is not None: 188 | decoder_input_ids = decoder_input_ids.repeat(N, 1).to(self.device) 189 | if self.is_gpt2: 190 | encoder_ids = torch.cat((encoder_ids, decoder_input_ids), dim=1) 191 | else: 192 | decoder_input_ids = torch.cat((start_column, decoder_input_ids), dim=1) 193 | gen_params["decoder_input_ids"] = decoder_input_ids 194 | 195 | _, input_seq_length = encoder_ids.shape 196 | if max_gen_length is not None: 197 | if self.is_gpt2: 198 | gen_params["max_length"] = input_seq_length + max_gen_length 199 | else: 200 | gen_params["max_length"] = max_gen_length 201 | 202 | if "num_beams" in gen_params: # Propagate param 203 | gen_params["num_return_sequences"] = gen_params["num_beams"] 204 | 205 | output_generate = self.model.generate(encoder_ids, return_dict_in_generate=True, output_scores=True, **gen_params) 206 | 207 | generated_ids = output_generate.sequences 208 | if self.is_gpt2 and decoder_input_ids is not None: 209 | generated_ids = torch.cat((decoder_input_ids, generated_ids), dim=1) 210 | if self.is_gpt2: 211 | generated_ids = generated_ids[:, input_seq_length:] 212 | 213 | N, gen_length = generated_ids.shape 214 | batch_size = len(texts) 215 | num_beams = N // (batch_size * num_runs) 216 | if num_beams > 1: 217 | # For some reason, they do not return a score if it is not beam-search... 218 | sequences_scores = output_generate.sequences_scores 219 | else: 220 | sequences_scores = torch.zeros(N).to(self.device) 221 | 222 | # The next block is to obtain logprobs... unfortunately have to run the model again, as there's no good book-keeping for HF beam-search 223 | selected_logprobs = torch.zeros(N).to(self.device) 224 | if compute_logprobs: 225 | # Don't run this unless we really need these (for RL training) 226 | expanded_encoder_ids = torch.repeat_interleave(encoder_ids, repeats=num_beams, dim=0) 227 | 228 | if self.is_gpt2: 229 | generated_input = torch.cat((torch.LongTensor([[self.start_id]] * N).to(self.device), generated_ids), dim=1) 230 | generated_output = torch.cat((generated_ids, torch.LongTensor([[self.end_id]] * N).to(self.device)), dim=1) # There is an error here, the end_id could be AFTER padding... 
need to fix 231 | 232 | expanded_encoder_ids = expanded_encoder_ids[:, :-1] 233 | 234 | encoder_output = self.model(input_ids=expanded_encoder_ids[:, :-1], past_key_values=None, return_dict=True) 235 | decoder_output = self.model(input_ids=generated_input, past_key_values=encoder_output.past_key_values, return_dict=True) 236 | 237 | selected_logprobs = utils_rl.select_logprobs(decoder_output.logits, generated_output.tolist(), self.end_id) 238 | else: 239 | expanded_encoder_ids = torch.repeat_interleave(encoder_ids, repeats=num_beams, dim=0) 240 | 241 | generated_input = generated_ids[:, :-1] 242 | generated_output = generated_ids[:, 1:] 243 | 244 | model_output = self.model(input_ids=expanded_encoder_ids, decoder_input_ids=generated_input, return_dict=True) 245 | selected_logprobs = utils_rl.select_logprobs(model_output.logits, generated_output.tolist(), eos_id=self.end_id) 246 | # print("Selected logprobs:", selected_logprobs.tolist()) 247 | 248 | # Un-tokenize 249 | generated_texts = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True) 250 | 251 | # Time to un-flatten 252 | num_candidates = num_runs * num_beams 253 | 254 | generated_texts = [generated_texts[i:(i+num_candidates)] for i in range(0, N, num_candidates)] 255 | selected_logprobs = [selected_logprobs[i:(i+num_candidates)] for i in range(0, N, num_candidates)] 256 | sequences_scores = [sequences_scores[i:(i+num_candidates)] for i in range(0, N, num_candidates)] 257 | 258 | outputs = [] 259 | sort_by_key = "logprob" if compute_logprobs else "score" 260 | 261 | for gen_texts, scores, logprobs in zip(generated_texts, sequences_scores, selected_logprobs): 262 | output = [{"output_text": gen_text, "logprob": logprob, "score": score} for gen_text, score, logprob in zip(gen_texts, scores, logprobs)] 263 | output = sorted(output, key=lambda x: x[sort_by_key], reverse=True) 264 | outputs.append(output) 265 | 266 | return outputs 267 | 268 | 269 | if __name__ == "__main__": 270 | # qgen = GeneratorHF(model_card="gpt2-medium", starter_file="/export/home/models/qgen/gpt2_med_newsqa_only_logprob_2.059.bin") 271 | # qgen = GeneratorHF(model_card="Salesforce/mixqg-large", starter_file="mixqgl_clean_qg_L_1.457.bin") 272 | qgen = GeneratorHF(model_card="facebook/bart-large", starter_file="/export/home/models/bartl_clean_qg_L_1.917.bin") 273 | paragraph = "Liu Qiangdong, also known as Richard Liu, CEO of JD.com, raises his arms to celebrate the IPO for his company at the Nasdaq MarketSite, New York, May 22, 2014." 274 | 275 | for start in ["Why", "How", "What"]: 276 | print(qgen.generate([paragraph], force_start=start, max_gen_length=20)[0][0]["output_text"]) 277 | exit() 278 | 279 | # gpt2zs = GeneratorHF(model_card="gpt2-large") 280 | # document = "US President Joe Biden spoke at a news conference Thursday at the NATO headquarters in Brussels, Belgium, after meeting with other world leaders of NATO, the European Council and the G7. The key global figures are seeking to align their responses to Russia's invasion of Ukraine. The President touched upon the unity of NATO, the prospect of Russian President Vladimir Putin using chemical weapons, and the possible role of China in the conflict. Biden took questions from reporters and spoke for roughly 30 minutes. 
TL;DR:" 281 | # print(gpt2zs.generate([document], num_runs=1, max_gen_length=100)) 282 | 283 | # exit() 284 | paragraphs = ["On Tuesday, the Joint Committee on Administrative Rules (JCAR) voted against extending the Illinois Department of Public Health (IDPH) emergency rule on school mask mandates."] 285 | gen2 = GeneratorHF(model_card="gpt2-medium", starter_file="qgen/gpt2_med_newsqab_sched_logprob_1.793.bin") 286 | # gen2.eval() 287 | 288 | batch_outs2 = gen2.generate(paragraphs, max_gen_length=20, do_sample=True, num_runs=3) 289 | for outs2 in batch_outs2: 290 | print("=========") 291 | for out2 in outs2: 292 | print("[%.3f] %s" % (out2["logprob"], out2["output_text"])) 293 | print("--------") 294 | 295 | 296 | exit() 297 | gen = GeneratorHF(model_card="philippelaban/keep_it_simple") 298 | # paragraph = """A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success.""" 299 | paragraph = """Earth travels a tremendous distance in its orbit around the sun, at a speed of around 30km/s or over 108000km per hour.""" 300 | outs = gen.generate([paragraph], max_length=150, num_beams=4, do_sample=True, num_return_sequences=4)[0] 301 | for out in outs: 302 | print("[%.3f] %s" % (out["score"], out["output_text"])) 303 | print() 304 | # gens = [out["output_text"] for out in outs] 305 | # inps = [paragraph] * len(gens) 306 | 307 | inps = ["Earth travels a tremendous distance in its orbit around the sun, at a speed of around 30km/s or over 108000km per hour."] * 2 308 | gens = ["Earth travels a tremendous size in its orbit around the sun, at a speed of around 30 km/s or over 108000km.", "The experiment The Earth travels very quickly -LRB- 100,000 km per hour -RRB- around the Sun ."] 309 | 310 | print(gen.score(inps, gens)) 311 | 312 | # from model_generator import Generator 313 | import utils_misc, utils_squad 314 | utils_misc.select_freer_gpu() 315 | 316 | paragraph = "The Palazzo Pitti (Italian pronunciation: [paˈlattso ˈpitti]), in English sometimes called the Pitti Palace, is a vast, mainly Renaissance, palace in Florence, Italy. It is situated on the south side of the River Arno, a short distance from the Ponte Vecchio. The core of the present palazzo dates from 1458 and was originally the town residence of Luca Pitti an ambitious Florentine banker." 317 | 318 | answer = "Luca Pitti" 319 | 320 | marked_paragraph = utils_squad.mark_paragraph_answer(paragraph, answer, model_card="Salesforce/mixqg-large") 321 | print(">>>", marked_paragraph) 322 | 323 | gen = GeneratorHF(model_card="Salesforce/mixqg-large") 324 | 325 | gen_out = gen.generate([marked_paragraph], do_sample=False, num_beams=4) 326 | 327 | for d in gen_out[0]: 328 | print("---") 329 | print(d["output_text"]) 330 | exit() 331 | 332 | # gen = GeneratorHF(model_card="facebook/bart-base", starter_file="qgen/bartb_squad_aaware_logprob_1.531.bin") 333 | # paragraph = "asteroid soil samples \n A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 
334 | # questions = ["What was contained in the capsule that was dropped from 136,700 miles in space?"] 335 | 336 | # gen = GeneratorHF(model_card="Salesforce/mixqg-large") 337 | # paragraph = "asteroid soil samples \n A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 338 | # questions = ["What was dropped from space by Japan's Hayabusa2 spacecraft?"] 339 | 340 | # gen = GeneratorHF(model_card="microsoft/prophetnet-large-uncased-squad-qg") 341 | # paragraph = "asteroid soil samples [SEP] A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 342 | # questions = ["what was in the capsule that landed in australia?"] 343 | 344 | # gen = GeneratorHF(model_card="gpt2-medium", starter_file="qgen/gpt2m_nf_squad_aaware_1.423.bin") 345 | # paragraph = "asteroid soil samples \n A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 346 | # questions = ["What was contained in the capsule that was dropped from 136,700 miles in space?"] 347 | 348 | # paragraphs = [paragraph] * len(questions) 349 | 350 | # gen.model.eval() 351 | # print(gen.score(paragraphs, questions)) 352 | 353 | # for d in gen.generate([paragraph], num_beams=1, max_gen_length=40, compute_logprobs=True): 354 | # print(d[0]["output_text"]) 355 | 356 | # print("---------") 357 | # tokenized_paragraphs = [torch.LongTensor(gen.tokenizer.encode(text=p)) for p in paragraphs] 358 | # encoder_ids = torch.nn.utils.rnn.pad_sequence(tokenized_paragraphs, batch_first=True, padding_value=0).to(gen.device) 359 | 360 | # tokenized_questions = [gen.tokenizer.encode(text=q, add_special_tokens=False) for q in questions] 361 | # decoder_input_ids = torch.nn.utils.rnn.pad_sequence([torch.LongTensor([gen.start_id] + q) for q in tokenized_questions], batch_first=True, padding_value=gen.end_id).to(gen.device) 362 | # decoder_output_ids = torch.nn.utils.rnn.pad_sequence([torch.LongTensor(q + [gen.end_id]) for q in tokenized_questions], batch_first=True, padding_value=-1).to(gen.device) 363 | 364 | # print("=============") 365 | # print("Likelihood function") 366 | # print(decoder_input_ids.tolist()) 367 | # print(decoder_output_ids.tolist()) 368 | 369 | # print("============") 370 | 371 | # model_output = gen.model(input_ids=encoder_ids, decoder_input_ids=decoder_input_ids, return_dict=True) 372 | # selected_logprobs = utils_rl.select_logprobs(model_output.logits, decoder_output_ids.tolist(), eos_id=gen.end_id) 373 | # print("Manual selected logprobs", selected_logprobs) 374 | 375 | # crit = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction="none") 376 | # N, seqlength, vocab_size = model_output.logits.shape 377 | # loss_components = crit(model_output.logits.view(N*seqlength, vocab_size), decoder_output_ids.contiguous().view(-1)).reshape(N, seqlength) 378 | 379 | # num_words = 
torch.sum(decoder_output_ids != -1, dim=1) 380 | # score_per_item = (- torch.sum(loss_components, dim=1) / num_words).tolist() 381 | 382 | # print("Manual score per item:", score_per_item) 383 | 384 | gen = GeneratorHF(model_card="distilgpt2", starter_file="qgen/dgpt2_squad_aaware_1.794.bin") 385 | 386 | answers = ["A small capsule", "asteroid soil samples", "136,700 miles", "Australian Outback"] 387 | original = "A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 388 | paragraphs = ["%s \n %s" % (answer, original) for answer in answers] 389 | 390 | gen_params = [{"num_beams": 3, "num_runs": 1}, {"num_beams": 1, "num_runs": 3, "do_sample": True}] 391 | for gen_param in gen_params: 392 | print("===============") 393 | print(gen_param) 394 | batch_outs1 = gen.generate(paragraphs, max_length=100, compute_logprobs=True, **gen_param) 395 | for ans, outs1 in zip(answers, batch_outs1): 396 | print("=========") 397 | print("Target answer:", ans) 398 | for out1 in outs1: 399 | print("[%.3f] %s" % (out1["logprob"], out1["output_text"])) 400 | print("--------") 401 | 402 | print("========================") 403 | print("========================") 404 | print("========================") 405 | -------------------------------------------------------------------------------- /Quiz_Design/qd_content.json: -------------------------------------------------------------------------------- 1 | [{"doc_id": 0, "title": "Sustainable_Energy", "content": "Energy is sustainable if it \"meets the needs of the present without compromising the ability of future generations to meet their own needs\". Most definitions of sustainable energy include considerations of environmental aspects such as greenhouse gas emissions and social and economic aspects such as energy poverty. Renewable energy sources such as wind, hydroelectric power, solar, and geothermal energy are generally far more sustainable than fossil fuel sources. However, some renewable energy projects, such as the clearing of forests to produce biofuels, can cause severe environmental damage. The role of non-renewable energy sources in sustainable energy has been controversial. Nuclear power is a low-carbon source whose historic mortality rates are comparable to wind and solar, but its sustainability has been debated because of concerns about radioactive waste, nuclear proliferation, and accidents. Switching from coal to natural gas has environmental benefits, including a lower climate impact, but may lead to a delay in switching to more sustainable options. Carbon capture and storage can be built into power plants to remove their carbon dioxide (CO2) emissions, but is expensive and has seldom been implemented.
Fossil fuels provide 85% of the world's energy consumption and the energy system is responsible for 76% of global greenhouse gas emissions. Around 790 million people in developing countries lack access to electricity and 2.6 billion rely on polluting fuels such as wood or charcoal to cook. Reducing greenhouse gas emissions to levels consistent with the 2015 Paris Agreement will require a system-wide transformation of the way energy is produced, distributed, stored, and consumed. The burning of fossil fuels and biomass is a major contributor to air pollution, which causes an estimated 7 million deaths each year. Therefore, the transition to a low-carbon energy system would have strong co-benefits for human health. Pathways exist to provide universal access to electricity and clean cooking in ways that are compatible with climate goals, while bringing major health and economic benefits to developing countries."}, {"doc_id": 1, "title": "Californium", "content": "Californium is a radioactive chemical element with the symbol Cf and atomic number 98. The element was first synthesized in 1950 at the Lawrence Berkeley National Laboratory (then the University of California Radiation Laboratory), by bombarding curium with alpha particles (helium-4 ions). It is an actinide element, the sixth transuranium element to be synthesized, and has the second-highest atomic mass of all the elements that have been produced in amounts large enough to see with the unaided eye (after einsteinium). The element was named after the university and the U.S. state of California.
Two crystalline forms exist for californium under normal pressure: one above and one below 900 \u00b0C (1,650 \u00b0F). A third form exists at high pressure. Californium slowly tarnishes in air at room temperature. Compounds of californium are dominated by the +3 oxidation state. The most stable of californium's twenty known isotopes is californium-251, which has a half-life of 898 years. This short half-life means the element is not found in significant quantities in the Earth's crust. Californium-252, with a half-life of about 2.645 years, is the most common isotope used and is produced at the Oak Ridge National Laboratory in the United States and the Research Institute of Atomic Reactors in Russia.
Californium is one of the few transuranium elements that have practical applications. Most of these applications exploit the property of certain isotopes of californium to emit neutrons. For example, californium can be used to help start up nuclear reactors, and it is employed as a source of neutrons when studying materials using neutron diffraction and neutron spectroscopy. Californium can also be used in nuclear synthesis of higher mass elements; oganesson (element 118) was synthesized by bombarding californium-249 atoms with calcium-48 ions. Users of californium must take into account radiological concerns and the element's ability to disrupt the formation of red blood cells by bioaccumulating in skeletal tissue."}, {"doc_id": 2, "title": "Statue_of_Liberty", "content": "The Statue of Liberty (Liberty Enlightening the World; French: La Libert\u00e9 \u00e9clairant le monde) is a colossal neoclassical sculpture on Liberty Island in New York Harbor in New York City, in the United States. The copper statue, a gift from the people of France to the people of the United States, was designed by French sculptor Fr\u00e9d\u00e9ric Auguste Bartholdi and its metal framework was built by Gustave Eiffel. The statue was dedicated on October 28, 1886.
The statue is a figure of Libertas, a robed Roman liberty goddess. She holds a torch above her head with her right hand, and in her left hand carries a tabula ansata inscribed JULY IV MDCCLXXVI (July 4, 1776 in Roman numerals), the date of the U.S. Declaration of Independence. A broken shackle and chain lie at her feet as she walks forward, commemorating the recent national abolition of slavery. After its dedication, the statue became an icon of freedom and of the United States, seen as a symbol of welcome to immigrants arriving by sea.
Bartholdi was inspired by a French law professor and politician, \u00c9douard Ren\u00e9 de Laboulaye, who is said to have commented in 1865 that any monument raised to U.S. independence would properly be a joint project of the French and U.S. peoples. The Franco-Prussian War delayed progress until 1875, when Laboulaye proposed that the French finance the statue and the U.S. provide the site and build the pedestal. Bartholdi completed the head and the torch-bearing arm before the statue was fully designed, and these pieces were exhibited for publicity at international expositions.
The torch-bearing arm was displayed at the Centennial Exposition in Philadelphia in 1876, and in Madison Square Park in Manhattan from 1876 to 1882. Fundraising proved difficult, especially for the Americans, and by 1885 work on the pedestal was threatened by lack of funds. Publisher Joseph Pulitzer, of the New York World, started a drive for donations to finish the project and attracted more than 120,000 contributors, most of whom gave less than a dollar (equivalent to $29 in 2020). The statue was built in France, shipped overseas in crates, and assembled on the completed pedestal on what was then called Bedloe's Island. The statue's completion was marked by New York's first ticker-tape parade and a dedication ceremony presided over by President Grover Cleveland."}, {"doc_id": 3, "title": "DNA", "content": "Deoxyribonucleic acid ( (listen); DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life.
The two DNA strands are known as polynucleotides as they are composed of simpler monomeric units called nucleotides. Each nucleotide is composed of one of four nitrogen-containing nucleobases (cytosine [C], guanine [G], adenine [A] or thymine [T]), a sugar called deoxyribose, and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds (known as the phospho-diester linkage) between the sugar of one nucleotide and the phosphate of the next, resulting in an alternating sugar-phosphate backbone. The nitrogenous bases of the two separate polynucleotide strands are bound together, according to base pairing rules (A with T and C with G), with hydrogen bonds to make double-stranded DNA. The complementary nitrogenous bases are divided into two groups, pyrimidines and purines. In DNA, the pyrimidines are thymine and cytosine; the purines are adenine and guanine.
Both strands of double-stranded DNA store the same biological information. This information is replicated when the two strands separate. A large part of DNA (more than 98% for humans) is non-coding, meaning that these sections do not serve as patterns for protein sequences. The two strands of DNA run in opposite directions to each other and are thus antiparallel. Attached to each sugar is one of four types of nucleobases (or bases). It is the sequence of these four nucleobases along the backbone that encodes genetic information. RNA strands are created using DNA strands as a template in a process called transcription, where DNA bases are exchanged for their corresponding bases except in the case of thymine (T), for which RNA substitutes uracil (U). Under the genetic code, these RNA strands specify the sequence of amino acids within proteins in a process called translation."}, {"doc_id": 4, "title": "Palazzo_Pitti", "content": "The Palazzo Pitti (Italian pronunciation: [pa\u02c8lattso \u02c8pitti]), in English sometimes called the Pitti Palace, is a vast, mainly Renaissance, palace in Florence, Italy. It is situated on the south side of the River Arno, a short distance from the Ponte Vecchio. The core of the present palazzo dates from 1458 and was originally the town residence of Luca Pitti, an ambitious Florentine banker.
The palace was bought by the Medici family in 1549 and became the chief residence of the ruling families of the Grand Duchy of Tuscany. It grew as a great treasure house as later generations amassed paintings, plates, jewelry and luxurious possessions.
In the late 18th century, the palazzo was used as a power base by Napoleon and later served for a brief period as the principal royal palace of the newly united Italy. The palace and its contents were donated to the Italian people by King Victor Emmanuel III in 1919.
The palazzo is now the largest museum complex in Florence. The principal palazzo block, often in a building of this design known as the corps de logis, is 32,000 square metres. It is divided into several principal galleries or museums detailed below.


== History ==


=== Early history ===

The construction of this severe and forbidding building was commissioned in 1458 by the Florentine banker Luca Pitti (1398\u20131472), a principal supporter and friend of Cosimo de' Medici. The early history of the Palazzo Pitti is a mixture of fact and myth. Pitti is alleged to have instructed that the windows be larger than the entrance of the Palazzo Medici. The 16th-century art historian Giorgio Vasari proposed that Brunelleschi was the palazzo's architect, and that his pupil Luca Fancelli was merely his assistant in the task, but today it is Fancelli who is generally credited. Besides obvious differences from the elder architect's style, Brunelleschi died 12 years before construction of the palazzo began. The design and fenestration suggest that the unknown architect was more experienced in utilitarian domestic architecture than in the humanist rules defined by Alberti in his book De Re Aedificatoria.Though impressive, the original palazzo would have been no rival to the Florentine Medici residences in terms of either size or content. Whoever the architect of the Palazzo Pitti was, he was moving against the contemporary flow of fashion. The rusticated stonework gives the palazzo a severe and powerful atmosphere, reinforced by the three-times-repeated series of seven arch-headed apertures, reminiscent of a Roman aqueduct. The Roman-style architecture appealed to the Florentine love of the new style all'antica. This original design has withstood the test of time: the repetitive formula of the fa\u00e7ade was continued during the subsequent additions to the palazzo, and its influence can be seen in numerous 16th-century imitations and 19th-century revivals. Work stopped after Pitti suffered financial losses following the death of Cosimo de' Medici in 1464. Luca Pitti died in 1472 with the building unfinished."}, {"doc_id": 5, "title": "Enzyme", "content": "Enzymes () are proteins that act as biological catalysts (biocatalysts). Catalysts accelerate chemical reactions. The molecules upon which enzymes may act are called substrates, and the enzyme converts the substrates into different molecules known as products. Almost all metabolic processes in the cell need enzyme catalysis in order to occur at rates fast enough to sustain life.:\u200a8.1\u200a Metabolic pathways depend upon enzymes to catalyze individual steps. The study of enzymes is called enzymology and the field of pseudoenzyme analysis recognizes that during evolution, some enzymes have lost the ability to carry out biological catalysis, which is often reflected in their amino acid sequences and unusual 'pseudocatalytic' properties.Enzymes are known to catalyze more than 5,000 biochemical reaction types. Other biocatalysts are catalytic RNA molecules, called ribozymes. Enzymes' specificity comes from their unique three-dimensional structures.
Like all catalysts, enzymes increase the reaction rate by lowering its activation energy. Some enzymes can make their conversion of substrate to product occur many millions of times faster. An extreme example is orotidine 5'-phosphate decarboxylase, which allows a reaction that would otherwise take millions of years to occur in milliseconds. Chemically, enzymes are like any catalyst and are not consumed in chemical reactions, nor do they alter the equilibrium of a reaction. Enzymes differ from most other catalysts by being much more specific. Enzyme activity can be affected by other molecules: inhibitors are molecules that decrease enzyme activity, and activators are molecules that increase activity. Many therapeutic drugs and poisons are enzyme inhibitors. An enzyme's activity decreases markedly outside its optimal temperature and pH, and many enzymes are (permanently) denatured when exposed to excessive heat, losing their structure and catalytic properties.
Some enzymes are used commercially, for example, in the synthesis of antibiotics. Some household products use enzymes to speed up chemical reactions: enzymes in biological washing powders break down protein, starch or fat stains on clothes, and enzymes in meat tenderizer break down proteins into smaller molecules, making the meat easier to chew."}, {"doc_id": 6, "title": "Cretaceous\u2013Paleogene_extinction_event", "content": "The Cretaceous\u2013Paleogene (K\u2013Pg) extinction event (also known as the Cretaceous\u2013Tertiary (K\u2013T) extinction) was a sudden mass extinction of three-quarters of the plant and animal species on Earth, approximately 66 million years ago. With the exception of some ectothermic species such as sea turtles and crocodilians, no tetrapods weighing more than 25 kilograms (55 pounds) survived. It marked the end of the Cretaceous period, and with it the Mesozoic Era, while heralding the beginning of the Cenozoic Era, which continues to this day.
In the geologic record, the K\u2013Pg event is marked by a thin layer of sediment called the K\u2013Pg boundary, which can be found throughout the world in marine and terrestrial rocks. The boundary clay shows unusually high levels of the metal iridium, which is more common in asteroids than in the Earth's crust.As originally proposed in 1980 by a team of scientists led by Luis Alvarez and his son Walter, it is now generally thought that the K\u2013Pg extinction was caused by the impact of a massive comet or asteroid 10 to 15 km (6 to 9 mi) wide, 66 million years ago, which devastated the global environment, mainly through a lingering impact winter which halted photosynthesis in plants and plankton. The impact hypothesis, also known as the Alvarez hypothesis, was bolstered by the discovery of the 180 km (112 mi) Chicxulub crater in the Gulf of Mexico's Yucat\u00e1n Peninsula in the early 1990s, which provided conclusive evidence that the K\u2013Pg boundary clay represented debris from an asteroid impact. The fact that the extinctions occurred simultaneously provides strong evidence that they were caused by the asteroid. A 2016 drilling project into the Chicxulub peak ring confirmed that the peak ring comprised granite ejected within minutes from deep in the earth, but contained hardly any gypsum, the usual sulfate-containing sea floor rock in the region: the gypsum would have vaporized and dispersed as an aerosol into the atmosphere, causing longer-term effects on the climate and food chain. In October 2019, researchers reported that the event rapidly acidified the oceans, producing ecological collapse and, in this way as well, produced long-lasting effects on the climate, and accordingly was a key reason for the mass extinction at the end of the Cretaceous. 
In January 2020, scientists reported that climate-modeling of the extinction event favors the asteroid impact and not volcanism.Other causal or contributing factors to the extinction may have been the Deccan Traps and other volcanic eruptions, climate change, and sea level change."}] -------------------------------------------------------------------------------- /Quiz_Design/requirements.txt: -------------------------------------------------------------------------------- 1 | torch 2 | transformers 3 | flask -------------------------------------------------------------------------------- /Quiz_Design/run_flask_server.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, request, render_template, send_from_directory 2 | from model_hf_generator import GeneratorHF 3 | from datetime import datetime, timedelta 4 | import os, random, json, flask 5 | 6 | CACHE_FILE = "qd_cache.json" 7 | ANNOT_FILE = "qd_annotations_running.jsonl" 8 | CONTENT_FILE = "qd_content.json" 9 | 10 | def load_question_cache(): 11 | if os.path.exists(CACHE_FILE): 12 | with open(CACHE_FILE, "r") as f: 13 | return json.load(f) 14 | else: 15 | return {} 16 | 17 | def mark_paragraph_answer(paragraph, answer, model_card=""): 18 | if "prophetnet" in model_card: 19 | return "%s [SEP] %s" % (answer, paragraph) 20 | elif "mixqg" in model_card: 21 | return f"{answer} \\n {paragraph}" 22 | else: 23 | return "%s \n %s" % (answer, paragraph) # The default, used for our trained models 24 | 25 | def save_question_cache(): 26 | with open(CACHE_FILE, "w") as f: 27 | json.dump(cached_questions, f) 28 | 29 | def deduplicate_questions(questions): 30 | M = {} 31 | for q in questions: 32 | if q["question"] not in M: 33 | M[q["question"]] = [] 34 | M[q["question"]].append(q["model_name"]) 35 | return [{"model_name": "|".join(v), "question": k} for k, v in M.items()] 36 | 37 | def load_qgen_models(): 38 | global QGEN_MODELS, scorer 39 | QGEN_MODELS = [ 40 | # {"model_name": "dgpt2_sup", "model": GeneratorHF("distilgpt2", starter_file="qgen/dgpt2_squad_aaware_1.794.bin")}, 41 | # {"model_name": "gpt2b_sup", "model": GeneratorHF("gpt2", starter_file="qgen/gpt2b_squad_aaware_1.575.bin")}, 42 | # {"model_name": "bartb_sup", "model": GeneratorHF("facebook/bart-base", starter_file="qgen/bartb_nf_squad_aaware_1.492.bin")}, 43 | # {"model_name": "bartl_sup", "model": GeneratorHF("facebook/bart-large", starter_file="qgen/bartL_nf_squad_aaware_1.290.bin")}, 44 | # {"model_name": "gpt2m_sup", "model": GeneratorHF("gpt2-medium", starter_file="qgen/gpt2m_nf_squad_aaware_1.423.bin")}, 45 | {"model_name": "mixqg-base", "model": GeneratorHF(model_card='Salesforce/mixqg-base')}, 46 | {"model_name": "mixqg-large", "model": GeneratorHF(model_card='Salesforce/mixqg-large')}, 47 | {"model_name": "prophetnet", "model": GeneratorHF(model_card='microsoft/prophetnet-large-uncased-squad-qg')} 48 | ] 49 | 50 | print("Qgen models loaded") 51 | 52 | app = Flask(__name__) 53 | app.config["TEMPLATES_AUTO_RELOAD"] = True 54 | 55 | QGEN_MODELS = [] 56 | scorer = None 57 | 58 | load_qgen_models() 59 | cached_questions = load_question_cache() 60 | 61 | @app.before_request 62 | def before_request(): 63 | user_id = -1 64 | if "user_id" in flask.request.cookies: 65 | try: 66 | user_id = int(flask.request.cookies["user_id"]) 67 | except: 68 | pass 69 | 70 | if user_id < 0: 71 | max_user_id = 0 72 | if os.path.exists(ANNOT_FILE): 73 | with open(ANNOT_FILE, "r") as f: 74 | for line in f: 75 | obj = json.loads(line) 76 | 
max_user_id = max(max_user_id, obj.get("user_id", -1)) 77 | user_id = max_user_id + 1 78 | flask.request.user_id = user_id 79 | 80 | @app.after_request 81 | def after_request(response): 82 | response.headers.add('Access-Control-Allow-Origin', '*') 83 | response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization') 84 | response.headers.add('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS') 85 | response.set_cookie("user_id", value=str(flask.request.user_id), expires=datetime.now() + timedelta(days=365)) 86 | return response 87 | 88 | @app.route("/") 89 | def api_home_page(): 90 | return render_template("main_page.html") 91 | 92 | @app.route('/static/') 93 | def send_static(path): 94 | return send_from_directory('static', path) 95 | 96 | @app.route("/api/load_documents") 97 | def api_load_document(): 98 | with open(CONTENT_FILE, "r") as f: 99 | data = json.load(f) 100 | return {"documents": data} 101 | 102 | @app.route("/api/gen_questions", methods=["POST"]) 103 | def api_gen_questions(): 104 | request_data = dict(request.form) 105 | 106 | doc_id = int(request_data["doc_id"]) 107 | context = request_data["context"] 108 | answer_span = request_data["selection"] 109 | 110 | paragraphs = context.split("
") 111 | relevant_paragraphs = [p for p in paragraphs if answer_span in p] 112 | if len(relevant_paragraphs) == 0: 113 | return [] 114 | else: 115 | question_key = "%d||%s" % (doc_id, answer_span) 116 | if question_key not in cached_questions: 117 | relevant_paragraph = relevant_paragraphs[0] 118 | response = [] 119 | for model in QGEN_MODELS: 120 | marked_paragraph = mark_paragraph_answer(relevant_paragraph, answer_span, model_card=model["model"].model_card) 121 | 122 | questions = model["model"].generate([marked_paragraph], max_gen_length=30, num_beams=2)[0] 123 | question = questions[0]["output_text"] 124 | question = question[0].upper() + question[1:] 125 | response.append({"model_name": model["model_name"], "question": question}) 126 | 127 | response = deduplicate_questions(response) 128 | cached_questions[question_key] = response 129 | save_question_cache() 130 | else: 131 | print("Reloaded from the cache") 132 | 133 | response = cached_questions[question_key] 134 | 135 | random.shuffle(response) 136 | return {"response": response} 137 | 138 | @app.route("/api/annotate_questions", methods=["POST"]) 139 | def api_annotate_questions(): 140 | request_data = dict(request.form) 141 | request_data["questions"] = json.loads(request_data["questions"].strip()) 142 | 143 | ip_addr = request.remote_addr 144 | saved_object = {"timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"), "ip_addr": ip_addr} 145 | saved_object["user_id"] = request.user_id 146 | saved_object["doc_id"] = request_data["doc_id"] 147 | saved_object["answer_span"] = request_data["answer_span"] 148 | saved_object["answer_span_idx"] = request_data["answer_span_idx"] 149 | saved_object["questions"] = request_data["questions"] 150 | saved_object["annotator_name"] = request_data["annotator_name"] 151 | 152 | with open(ANNOT_FILE, "a") as f: 153 | f.write(json.dumps(saved_object) + "\n") 154 | return {"response": 1} 155 | 156 | @app.route("/api/cancel_selection", methods=["POST"]) 157 | def api_cancel_selection(): 158 | request_data = dict(request.form) 159 | 160 | print("Delete request: ", request_data["doc_id"], request_data["answer_span"], request_data["annotator_name"]) 161 | if os.path.exists(ANNOT_FILE): 162 | final_annotations = [] 163 | num_deleted = 0 164 | with open(ANNOT_FILE, "r") as f: 165 | for line in f: 166 | obj = json.loads(line) 167 | if obj["doc_id"] == request_data["doc_id"] and obj["answer_span"] == request_data["answer_span"] and obj["annotator_name"] == request_data["annotator_name"]: 168 | num_deleted += 1 169 | else: 170 | final_annotations.append(obj) 171 | 172 | print("Num rows deleted: %d" % (num_deleted)) 173 | with open(ANNOT_FILE, "w") as f: 174 | for obj in final_annotations: 175 | f.write(json.dumps(obj) + "\n") 176 | 177 | return {"response": 1} 178 | -------------------------------------------------------------------------------- /Quiz_Design/static/Quiz_Design_Tutorial.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/salesforce/QGen/e0beb712cfb82316b04f12a01549d7544b37ddd0/Quiz_Design/static/Quiz_Design_Tutorial.mp4 -------------------------------------------------------------------------------- /Quiz_Design/static/live.js: -------------------------------------------------------------------------------- 1 | /* 2 | Live.js - One script closer to Designing in the Browser 3 | Written for Handcraft.com by Martin Kool (@mrtnkl). 4 | 5 | Version 4. 6 | Recent change: Made stylesheet and mimetype checks case insensitive. 
7 | 8 | http://livejs.com 9 | http://livejs.com/license (MIT) 10 | @livejs 11 | 12 | Include live.js#css to monitor css changes only. 13 | Include live.js#js to monitor js changes only. 14 | Include live.js#html to monitor html changes only. 15 | Mix and match to monitor a preferred combination such as live.js#html,css 16 | 17 | By default, just include live.js to monitor all css, js and html changes. 18 | 19 | Live.js can also be loaded as a bookmarklet. It is best to only use it for CSS then, 20 | as a page reload due to a change in html or css would not re-include the bookmarklet. 21 | To monitor CSS and be notified that it has loaded, include it as: live.js#css,notify 22 | */ 23 | (function () { 24 | 25 | var headers = { "Etag": 1, "Last-Modified": 1, "Content-Length": 1, "Content-Type": 1 }, 26 | resources = {}, 27 | pendingRequests = {}, 28 | currentLinkElements = {}, 29 | oldLinkElements = {}, 30 | interval = 1000, 31 | loaded = false, 32 | active = { "html": 1, "css": 1, "js": 1 }; 33 | 34 | var Live = { 35 | 36 | // performs a cycle per interval 37 | heartbeat: function () { 38 | if (document.body) { 39 | // make sure all resources are loaded on first activation 40 | if (!loaded) Live.loadresources(); 41 | Live.checkForChanges(); 42 | } 43 | setTimeout(Live.heartbeat, interval); 44 | }, 45 | 46 | // loads all local css and js resources upon first activation 47 | loadresources: function () { 48 | 49 | // helper method to assert if a given url is local 50 | function isLocal(url) { 51 | var loc = document.location, 52 | reg = new RegExp("^\\.|^\/(?!\/)|^[\\w]((?!://).)*$|" + loc.protocol + "//" + loc.host); 53 | return url.match(reg); 54 | } 55 | 56 | // gather all resources 57 | var scripts = document.getElementsByTagName("script"), 58 | links = document.getElementsByTagName("link"), 59 | uris = []; 60 | 61 | // track local js urls 62 | for (var i = 0; i < scripts.length; i++) { 63 | var script = scripts[i], src = script.getAttribute("src"); 64 | if (src && isLocal(src)) 65 | uris.push(src); 66 | if (src && src.match(/\blive.js#/)) { 67 | for (var type in active) 68 | active[type] = src.match("[#,|]" + type) != null 69 | if (src.match("notify")) 70 | alert("Live.js is loaded."); 71 | } 72 | } 73 | if (!active.js) uris = []; 74 | if (active.html) uris.push(document.location.href); 75 | 76 | // track local css urls 77 | for (var i = 0; i < links.length && active.css; i++) { 78 | var link = links[i], rel = link.getAttribute("rel"), href = link.getAttribute("href", 2); 79 | if (href && rel && rel.match(new RegExp("stylesheet", "i")) && isLocal(href)) { 80 | uris.push(href); 81 | currentLinkElements[href] = link; 82 | } 83 | } 84 | 85 | // initialize the resources info 86 | for (var i = 0; i < uris.length; i++) { 87 | var url = uris[i]; 88 | Live.getHead(url, function (url, info) { 89 | resources[url] = info; 90 | }); 91 | } 92 | 93 | // add rule for morphing between old and new css files 94 | var head = document.getElementsByTagName("head")[0], 95 | style = document.createElement("style"), 96 | rule = "transition: all .3s ease-out;" 97 | css = [".livejs-loading * { ", rule, " -webkit-", rule, "-moz-", rule, "-o-", rule, "}"].join(''); 98 | style.setAttribute("type", "text/css"); 99 | head.appendChild(style); 100 | style.styleSheet ? 
style.styleSheet.cssText = css : style.appendChild(document.createTextNode(css)); 101 | 102 | // yep 103 | loaded = true; 104 | }, 105 | 106 | // check all tracking resources for changes 107 | checkForChanges: function () { 108 | for (var url in resources) { 109 | if (pendingRequests[url]) 110 | continue; 111 | 112 | Live.getHead(url, function (url, newInfo) { 113 | var oldInfo = resources[url], 114 | hasChanged = false; 115 | resources[url] = newInfo; 116 | for (var header in oldInfo) { 117 | // do verification based on the header type 118 | var oldValue = oldInfo[header], 119 | newValue = newInfo[header], 120 | contentType = newInfo["Content-Type"]; 121 | switch (header.toLowerCase()) { 122 | case "etag": 123 | if (!newValue) break; 124 | // fall through to default 125 | default: 126 | hasChanged = oldValue != newValue; 127 | break; 128 | } 129 | // if changed, act 130 | if (hasChanged) { 131 | Live.refreshResource(url, contentType); 132 | break; 133 | } 134 | } 135 | }); 136 | } 137 | }, 138 | 139 | // act upon a changed url of certain content type 140 | refreshResource: function (url, type) { 141 | switch (type.toLowerCase()) { 142 | // css files can be reloaded dynamically by replacing the link element 143 | case "text/css": 144 | var link = currentLinkElements[url], 145 | html = document.body.parentNode, 146 | head = link.parentNode, 147 | next = link.nextSibling, 148 | newLink = document.createElement("link"); 149 | 150 | html.className = html.className.replace(/\s*livejs\-loading/gi, '') + ' livejs-loading'; 151 | newLink.setAttribute("type", "text/css"); 152 | newLink.setAttribute("rel", "stylesheet"); 153 | newLink.setAttribute("href", url + "?now=" + new Date() * 1); 154 | next ? head.insertBefore(newLink, next) : head.appendChild(newLink); 155 | currentLinkElements[url] = newLink; 156 | oldLinkElements[url] = link; 157 | 158 | // schedule removal of the old link 159 | Live.removeoldLinkElements(); 160 | break; 161 | 162 | // check if an html resource is our current url, then reload 163 | case "text/html": 164 | if (url != document.location.href) 165 | return; 166 | 167 | // local javascript changes cause a reload as well 168 | case "text/javascript": 169 | case "application/javascript": 170 | case "application/x-javascript": 171 | document.location.reload(); 172 | } 173 | }, 174 | 175 | // removes the old stylesheet rules only once the new one has finished loading 176 | removeoldLinkElements: function () { 177 | var pending = 0; 178 | for (var url in oldLinkElements) { 179 | // if this sheet has any cssRules, delete the old link 180 | try { 181 | var link = currentLinkElements[url], 182 | oldLink = oldLinkElements[url], 183 | html = document.body.parentNode, 184 | sheet = link.sheet || link.styleSheet, 185 | rules = sheet.rules || sheet.cssRules; 186 | if (rules.length >= 0) { 187 | oldLink.parentNode.removeChild(oldLink); 188 | delete oldLinkElements[url]; 189 | setTimeout(function () { 190 | html.className = html.className.replace(/\s*livejs\-loading/gi, ''); 191 | }, 100); 192 | } 193 | } catch (e) { 194 | pending++; 195 | } 196 | if (pending) setTimeout(Live.removeoldLinkElements, 50); 197 | } 198 | }, 199 | 200 | // performs a HEAD request and passes the header info to the given callback 201 | getHead: function (url, callback) { 202 | pendingRequests[url] = true; 203 | var xhr = window.XMLHttpRequest ? 
new XMLHttpRequest() : new ActiveXObject("Microsoft.XmlHttp"); 204 | xhr.open("HEAD", url, true); 205 | xhr.onreadystatechange = function () { 206 | delete pendingRequests[url]; 207 | if (xhr.readyState == 4 && xhr.status != 304) { 208 | xhr.getAllResponseHeaders(); 209 | var info = {}; 210 | for (var h in headers) { 211 | var value = xhr.getResponseHeader(h); 212 | // adjust the simple Etag variant to match on its significant part 213 | if (h.toLowerCase() == "etag" && value) value = value.replace(/^W\//, ''); 214 | if (h.toLowerCase() == "content-type" && value) value = value.replace(/^(.*?);.*?$/i, "$1"); 215 | info[h] = value; 216 | } 217 | callback(url, info); 218 | } 219 | } 220 | xhr.send(); 221 | } 222 | }; 223 | 224 | // start listening 225 | if (document.location.protocol != "file:") { 226 | if (!window.liveJsLoaded) 227 | Live.heartbeat(); 228 | 229 | window.liveJsLoaded = true; 230 | } 231 | else if (window.console) 232 | console.log("Live.js doesn't support the file protocol. It needs http."); 233 | })(); -------------------------------------------------------------------------------- /Quiz_Design/static/main.css: -------------------------------------------------------------------------------- 1 | @import url('https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100;0,300;0,400;0,500;0,700;0,900;1,400&display=swap'); 2 | body { 3 | margin: 0px; 4 | padding: 0px; 5 | font-family: "Roboto"; 6 | } 7 | #header { 8 | font-size: 40px; 9 | line-height: 80px; 10 | padding-left: 100px; 11 | background: #f0f0f0; 12 | margin-bottom: 40px; 13 | } 14 | #content { 15 | margin: auto; 16 | width: 850px; 17 | } 18 | #column1 { 19 | float: left; 20 | line-height: 1.8; 21 | width: 500px; 22 | margin-bottom: 50px; 23 | } 24 | #column2 { 25 | display: none; 26 | position: fixed; 27 | top: 160px; 28 | right: 30px; 29 | width: 300px; 30 | box-sizing: border-box; 31 | padding-left: 30px; 32 | background: white; 33 | } 34 | .unconfirmed_span { 35 | background: rgba(0, 0, 0, 0.07); 36 | padding: 5px 10px; 37 | } 38 | .unconfirmed_span button { 39 | background: white; 40 | cursor: pointer; 41 | } 42 | .confirmed_span { 43 | background: rgba(0, 120, 255, 0.2); 44 | padding: 5px 10px; 45 | cursor: pointer; 46 | border-bottom: 2px solid transparent; 47 | } 48 | .confirmed_span:hover { 49 | background: rgba(0, 120, 255, 0.25); 50 | } 51 | .active_span { 52 | border-bottom: 2px solid rgba(0, 120, 255, 0.75); 53 | } 54 | .column_title { 55 | font-size: 30px; 56 | } 57 | #no_questions { 58 | font-style: italic; 59 | margin-top: 30px; 60 | } 61 | .question_removed { 62 | opacity: 0.5; 63 | } 64 | .question_removed .question { 65 | text-decoration: line-through; 66 | } 67 | #loading { 68 | display: none; 69 | } 70 | #loading img { 71 | width: 80px; 72 | margin-left: 75px; 73 | } 74 | #documents_row { 75 | font-size: 25px; 76 | height: 40px; 77 | background: #e0e0e0; 78 | margin-left: -100px; 79 | padding-left: 100px; 80 | line-height: 40px; 81 | } 82 | #documents_list { 83 | display: inline-block; 84 | cursor: pointer; 85 | vertical-align: top; 86 | } 87 | .document_title { 88 | display: inline-block; 89 | font-size: 16px; 90 | line-height: 16px; 91 | padding: 5px 10px; 92 | margin: 7px 10px; 93 | background: rgba(0, 0, 0, 0.07); 94 | border-radius: 8px; 95 | } 96 | .document_title:hover { 97 | text-decoration: underline; 98 | } 99 | .active_document_title { 100 | background: rgba(0, 0, 0, 0.14); 101 | } 102 | #labeler_name, .document_select, #open_tutorial_btn { 103 | font-size: 18px; 104 | margin-left: 
10px; 105 | margin-top: 8px; 106 | padding-left: 7px; 107 | background: rgba(0, 0, 0, 0.07); 108 | border: 0px; 109 | width: 200px; 110 | } 111 | #open_tutorial_btn { 112 | cursor: pointer; 113 | } 114 | .q_option { 115 | margin: 10px 0px; 116 | padding: 5px; 117 | line-height: 1.0; 118 | position: relative; 119 | overflow: hidden; 120 | box-sizing: border-box; 121 | min-height: 64px; 122 | background: #f5f5f5; 123 | border-radius: 10px; 124 | } 125 | .remove_q { 126 | color: #aa0000; 127 | cursor: pointer; 128 | font-size: 20px; 129 | vertical-align: middle; 130 | position: absolute; 131 | width: 20px; 132 | height: 20px; 133 | transform: translateY(-50%); 134 | top: 50%; 135 | text-align: right; 136 | } 137 | .question { 138 | margin-left: 35px; 139 | } 140 | .q_reasons { 141 | text-align: center; 142 | line-height: 2.0; 143 | float: right; 144 | width: calc( 100% - 20px); 145 | display: none; 146 | } 147 | .q_reasons span { 148 | padding: 0px 10px; 149 | cursor: pointer; 150 | white-space: nowrap; 151 | color: #990000; 152 | } 153 | .q_reasons span:hover { 154 | text-decoration: underline; 155 | } 156 | .cancel_btn { 157 | padding-right: 7px; 158 | font-weight: bold; 159 | font-size: 14px; 160 | } -------------------------------------------------------------------------------- /Quiz_Design/static/slideshow.css: -------------------------------------------------------------------------------- 1 | #tutorial_container { 2 | position: absolute; 3 | top: 0px; left: 0px; 4 | width: 100%; height: 100%; 5 | background: rgba(0, 0, 0, 0.6); 6 | z-index: 100; 7 | } 8 | #tutorial { 9 | position: absolute; 10 | width: 750px; 11 | height: 550px; 12 | transform: translate(-50%, -50%); 13 | top: 50%; left: 50%; 14 | background: #e0e0e0; 15 | box-sizing: border-box; 16 | padding: 40px; 17 | } 18 | #tuto_close_button { 19 | position: absolute; 20 | top: 10px; right: 10px; 21 | cursor: pointer; 22 | z-index: 101; 23 | } 24 | 25 | #tutorial .title { 26 | font-size: 20px; 27 | margin-bottom: 20px; 28 | } 29 | #tutorial .content { 30 | font-size: 14px; 31 | line-height: 1.5; 32 | } 33 | 34 | #tuto_video { 35 | width: 700px; 36 | height: 400px; 37 | margin-left: -40px; 38 | margin-right: -40px; 39 | } -------------------------------------------------------------------------------- /Quiz_Design/templates/main_page.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Quiz Design 4 | 5 | 6 | 7 | 8 |
9 |
10 | 11 |
12 |
13 |
Introduction Video
14 |
15 | 18 |
19 |
20 |
21 |
1. Introduction to the Quiz Design Helper
22 |
23 | Your objective is to design a quiz about a particular topic for a class of students. The procedure is the following:

24 | 25 | 1) Select a quiz topic from the list (for example "Sustainable Energy")

26 | 2) The system will load a text about the topic. 27 |
28 |
29 |
30 |
31 |
2. Quiz Concepts and Questions
32 | 3) Select a concept that you want to quiz your students on (for example a phrase, a figure, or a keyword) and confirm your selection.

33 | 34 | 4) Important: It is recommended to select shorter concepts, not full sentences, to obtain more precise questions. Selecting concepts of up to about 8 words is ideal.

35 | 36 | 5) The system will load a list of questions that attempt to quiz students about the selected concept.

37 | 38 | 6) Go over each question, and remove ones you would not include in your quiz. We will next go over types of questions that should be removed.

39 | 40 | 7) Important: you can keep one, several, or none of the questions (if none of them are satisfactory). For each question you remove, you must choose the reason the question is unsatisfactory (more on this later).
41 |
42 |
43 |
44 |
3. Number of Questions in Quiz
45 |
46 | 8) Once you've finalized the questions for a concept, select another concept and repeat the question selection process. Try to select 8-12 concepts per topic so that the quizzes are long enough.

47 | 48 | 9) Once you've finished a full quiz set, you can move on to another quiz topic. We have found that in one hour, you should be able to complete the quizzes for 5 topics. 49 |
50 |
51 |
52 |
4. Reasons to remove questions
53 |
54 | There are three main reasons to remove a question: (I) Disfluent - the question is not fluent, (II) Wrong Answer - the question is not about the target concept, (III) Inadequate - the question phrasing makes it inadequate for use in a quiz.

55 | We next go over examples of each question error type. 56 |
57 |
58 |
59 |
5. Error Type I: Question is not Fluent
60 |
61 | The question can be disfluent for several reasons, including: (1) not being phrased as a question, (2) excessive repetition, (3) awkward phrasing, or (4) using the wrong verb tense. 62 | Given the following paragraph:
63 | The mammoth was identified as an extinct species of elephant by Georges Cuvier in 1796. 64 | And targeting the concept: "in 1796". Here are some disfluent questions: 65 |
    66 |
  • The mammoth was identified as an extinct species that year. (Not phrased as a question)
  • 67 |
  • When did the mammoth the mammoth go extinct? (excessive repetition)
  • 68 |
  • When did the mammoth die? (awkward phrasing)
  • 69 |
  • When will the mammoth go extinct? (Wrong verb tense)
  • 70 |
71 |
72 |
73 |
74 |
6. Error Type II: Question's Answer is not the Concept
75 |
76 | Even though a question is fluent, it might not be about the selected target concept. In our previous example with the following context:
77 | The mammoth was identified as an extinct species of elephant by Georges Cuvier in 1796. 78 | And targeting the concept: "in 1796". Here are some questions that do not target the concept: 79 |
    80 |
  • Who identified the mammoth as an extinct species? (Wrong answer: the answer is "Georges Cuvier", not "in 1796")
  • 81 |
  • When did the white rhinoceros go extinct? (Unanswerable: the question cannot be answered from the given context)
  • 82 |
83 |
84 |
85 |
86 |
7. Error Type III: Question Phrasing is Inadequate
87 |
88 | A question might technically be fluent and answered by the target concept, but its phrasing can still feel wrong. In our previous example with the following context:
89 | The mammoth was identified as an extinct species of elephant by Georges Cuvier in 1796. 90 | And targeting the concept: "in 1796". Here are some inadequate questions: 91 |
    92 |
  • In what year did Georges Cuvier identify the mammoth as an extinct species of elephant? (Too specific: on a quiz, we would not reveal this much information in the question)
  • 93 |
  • When did they go extinct? (Not specific enough: the question is vague)
  • 94 |
  • In 1796, when were mammoths identified as extinct? (Reveals the answer to the question)
  • 95 |
  • When did mammoths go extinct for a second time? (Inconsistent: claims information that is not present in the context)
  • 96 |
97 |
98 |
99 |
100 |
8. High-Level Summary
101 |
102 | If a question feels incorrect and you would not include it in a quiz for your students, remove it and select the closest reason for the removal: Disfluent, Wrong Answer, or Inadequate.

103 | You should try to build quizzes with 8-12 concepts per document. 104 | 105 | If you have any questions, please contact Philippe Laban on Slack!

106 |
107 | 108 |
109 |
110 |
111 |
112 |
113 |
114 | 115 | 116 | 130 |
131 |
132 |
133 |
134 | 135 |
136 |
137 |
138 |
139 | Quiz Questions 140 |
141 |
142 | Select a text span to see question options. 143 |
144 |
145 | Loading... 146 |
147 |
148 | 149 |
150 |
151 | 152 | 153 | 154 | 155 | 358 | 359 | -------------------------------------------------------------------------------- /Quiz_Design/utils_qd_data.py: -------------------------------------------------------------------------------- 1 | from datetime import datetime 2 | import json, os 3 | 4 | def load_qd_annotations(): 5 | annotations = [] 6 | with open("quiz_design_data.jsonl", "r") as f: 7 | for line in f: 8 | annotations.append(json.loads(line)) 9 | 10 | for d in annotations: 11 | d["timestamp"] = datetime.strptime(d["timestamp"], "%Y-%m-%d %H:%M:%S") 12 | d["doc_id"] = int(d["doc_id"]) 13 | 14 | # Only keep the last annotation (as we store each step purposefully for timing) 15 | annotations = sorted(annotations, key=lambda a: a["timestamp"]) 16 | M = {} 17 | for d in annotations: 18 | k = "%d||%s||%s" % (d["user_id"], d["doc_id"], d["answer_span"]) 19 | M[k] = d 20 | 21 | unique_annotations = sorted(M.values(), key=lambda a: a["timestamp"]) 22 | return unique_annotations 23 | 24 | def build_qd_groups(annotations): 25 | with open("qd_content.json", "r") as f: 26 | evaluation_texts = json.load(f) 27 | 28 | groups = [] 29 | for annot in annotations: 30 | answer_span = annot["answer_span"] 31 | document = evaluation_texts[annot["doc_id"]]["content"] 32 | paragraphs = document.split("
") 33 | relevant_paragraphs = [p for p in paragraphs if answer_span in p] 34 | relevant_paragraph = relevant_paragraphs[0] 35 | 36 | questions = [] 37 | for q in annot["questions"]: 38 | label = 1 if "removed" not in q or q["removed"] is False else 0 39 | reason = q.get("reason", "No error") 40 | questions.append({"question": q["question"], "label": label, "reason": reason, "answer_span": answer_span, "model_name": q["model_name"]}) 41 | 42 | d = {"doc_id": annot["doc_id"], "answer_span": answer_span, "context": relevant_paragraph, "questions": questions} 43 | groups.append(d) 44 | return groups -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Question Generation 2 | 3 | * [Quiz Design: Helping Teachers Create Quizzes with Automated Question Generation](./Quiz_Design) 4 | * [MixQG: Neural Question Generation with Mixed Answer Types](./MixQG) 5 | -------------------------------------------------------------------------------- /SECURITY.md: -------------------------------------------------------------------------------- 1 | ## Security 2 | 3 | Please report any security issue to [security@salesforce.com](mailto:security@salesforce.com) 4 | as soon as it is discovered. This library limits its runtime dependencies in 5 | order to reduce the total cost of ownership as much as can be, but all consumers 6 | should remain vigilant and have their security stakeholders review all third-party 7 | products (3PP) like this one and their dependencies. --------------------------------------------------------------------------------