├── .gitignore ├── CODEOWNERS ├── CODE_OF_CONDUCT.md ├── LICENSE.txt ├── MixQG ├── README.md ├── configs │ └── ds_config_zero2.json ├── data │ ├── merge_datasets.py │ └── preprocess_datasets.py ├── eval.sh ├── requirements.txt ├── run_qg.py └── train.sh ├── Quiz_Design ├── README.md ├── model_hf_generator.py ├── qd_content.json ├── quiz_design_data.jsonl ├── quiz_design_groups.jsonl ├── requirements.txt ├── run_flask_server.py ├── static │ ├── Quiz_Design_Tutorial.mp4 │ ├── live.js │ ├── main.css │ └── slideshow.css ├── templates │ └── main_page.html └── utils_qd_data.py ├── README.md └── SECURITY.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | 131 | -------------------------------------------------------------------------------- /CODEOWNERS: -------------------------------------------------------------------------------- 1 | # Comment line immediately above ownership line is reserved for related gus information. Please be careful while editing. 
2 | #ECCN:Open Source 3 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Salesforce Open Source Community Code of Conduct 2 | 3 | ## About the Code of Conduct 4 | 5 | Equality is a core value at Salesforce. We believe a diverse and inclusive 6 | community fosters innovation and creativity, and are committed to building a 7 | culture where everyone feels included. 8 | 9 | Salesforce open-source projects are committed to providing a friendly, safe, and 10 | welcoming environment for all, regardless of gender identity and expression, 11 | sexual orientation, disability, physical appearance, body size, ethnicity, nationality, 12 | race, age, religion, level of experience, education, socioeconomic status, or 13 | other similar personal characteristics. 14 | 15 | The goal of this code of conduct is to specify a baseline standard of behavior so 16 | that people with different social values and communication styles can work 17 | together effectively, productively, and respectfully in our open source community. 18 | It also establishes a mechanism for reporting issues and resolving conflicts. 19 | 20 | All questions and reports of abusive, harassing, or otherwise unacceptable behavior 21 | in a Salesforce open-source project may be reported by contacting the Salesforce 22 | Open Source Conduct Committee at ossconduct@salesforce.com. 23 | 24 | ## Our Pledge 25 | 26 | In the interest of fostering an open and welcoming environment, we as 27 | contributors and maintainers pledge to making participation in our project and 28 | our community a harassment-free experience for everyone, regardless of gender 29 | identity and expression, sexual orientation, disability, physical appearance, 30 | body size, ethnicity, nationality, race, age, religion, level of experience, education, 31 | socioeconomic status, or other similar personal characteristics. 32 | 33 | ## Our Standards 34 | 35 | Examples of behavior that contributes to creating a positive environment 36 | include: 37 | 38 | * Using welcoming and inclusive language 39 | * Being respectful of differing viewpoints and experiences 40 | * Gracefully accepting constructive criticism 41 | * Focusing on what is best for the community 42 | * Showing empathy toward other community members 43 | 44 | Examples of unacceptable behavior by participants include: 45 | 46 | * The use of sexualized language or imagery and unwelcome sexual attention or 47 | advances 48 | * Personal attacks, insulting/derogatory comments, or trolling 49 | * Public or private harassment 50 | * Publishing, or threatening to publish, others' private information—such as 51 | a physical or electronic address—without explicit permission 52 | * Other conduct which could reasonably be considered inappropriate in a 53 | professional setting 54 | * Advocating for or encouraging any of the above behaviors 55 | 56 | ## Our Responsibilities 57 | 58 | Project maintainers are responsible for clarifying the standards of acceptable 59 | behavior and are expected to take appropriate and fair corrective action in 60 | response to any instances of unacceptable behavior. 
61 | 62 | Project maintainers have the right and responsibility to remove, edit, or 63 | reject comments, commits, code, wiki edits, issues, and other contributions 64 | that are not aligned with this Code of Conduct, or to ban temporarily or 65 | permanently any contributor for other behaviors that they deem inappropriate, 66 | threatening, offensive, or harmful. 67 | 68 | ## Scope 69 | 70 | This Code of Conduct applies both within project spaces and in public spaces 71 | when an individual is representing the project or its community. Examples of 72 | representing a project or community include using an official project email 73 | address, posting via an official social media account, or acting as an appointed 74 | representative at an online or offline event. Representation of a project may be 75 | further defined and clarified by project maintainers. 76 | 77 | ## Enforcement 78 | 79 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 80 | reported by contacting the Salesforce Open Source Conduct Committee 81 | at ossconduct@salesforce.com. All complaints will be reviewed and investigated 82 | and will result in a response that is deemed necessary and appropriate to the 83 | circumstances. The committee is obligated to maintain confidentiality with 84 | regard to the reporter of an incident. Further details of specific enforcement 85 | policies may be posted separately. 86 | 87 | Project maintainers who do not follow or enforce the Code of Conduct in good 88 | faith may face temporary or permanent repercussions as determined by other 89 | members of the project's leadership and the Salesforce Open Source Conduct 90 | Committee. 91 | 92 | ## Attribution 93 | 94 | This Code of Conduct is adapted from the [Contributor Covenant][contributor-covenant-home], 95 | version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html. 96 | It includes adaptions and additions from [Go Community Code of Conduct][golang-coc], 97 | [CNCF Code of Conduct][cncf-coc], and [Microsoft Open Source Code of Conduct][microsoft-coc]. 98 | 99 | This Code of Conduct is licensed under the [Creative Commons Attribution 3.0 License][cc-by-3-us]. 100 | 101 | [contributor-covenant-home]: https://www.contributor-covenant.org (https://www.contributor-covenant.org/) 102 | [golang-coc]: https://golang.org/conduct 103 | [cncf-coc]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md 104 | [microsoft-coc]: https://opensource.microsoft.com/codeofconduct/ 105 | [cc-by-3-us]: https://creativecommons.org/licenses/by/3.0/us/ -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Salesforce.com, Inc. 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 7 | 8 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 9 | 10 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 11 | 12 | 3. 
Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 15 | -------------------------------------------------------------------------------- /MixQG/README.md: -------------------------------------------------------------------------------- 1 | # MixQG: Neural Question Generation with Mixed Answer Types 2 | 3 | This is the official code base for the following paper from Salesforce Research: 4 | 5 | **Title**: [MixQG: Neural Question Generation with Mixed Answer Types](https://arxiv.org/abs/2110.08175) 6 | 7 | **Authors**: Lidiya Murakhovs'ka, Chien-Sheng Wu, Tong Niu, Wenhao Liu, Caiming Xiong 8 | 9 | ## Abstract 10 | 11 | Asking good questions is an essential ability for both human and machine intelligence. However, existing neural question generation approaches mainly focus on the short factoid type of answers. In this paper, we propose a neural question generator, MixQG, to bridge this gap. We combine 9 question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstractive answers, to train a single generative model. We show with empirical results that our model outperforms existing work in both seen and unseen domains and can generate questions with different cognitive levels when conditioned on different answer types. Our code is released and well-integrated with the Huggingface library to facilitate various downstream applications. 
12 | 13 | ## Usage 14 | 15 | MixQG pre-trained models are available through the Huggingface library: 16 | 17 | ``` 18 | from transformers import AutoTokenizer, AutoModelForSeq2SeqLM 19 | 20 | model_name = "Salesforce/mixqg-base" 21 | tokenizer = AutoTokenizer.from_pretrained(model_name) 22 | model = AutoModelForSeq2SeqLM.from_pretrained(model_name) 23 | 24 | def run_qg(input_text, **generator_args): 25 | input_ids = tokenizer.encode(input_text, return_tensors="pt") 26 | generated_ids = model.generate(input_ids, **generator_args) 27 | return tokenizer.batch_decode(generated_ids, skip_special_tokens=True) 28 | ``` 29 | 30 | Input text should be formatted as follows: `f"{answer} \\n {context}"` 31 | 32 | For example, 33 | ``` 34 | run_qg("Robert Boyle \\n In the late 17th century, Robert Boyle proved that air is necessary for combustion.") 35 | # should output ['Who proved that air is necessary for combustion?'] 36 | ``` 37 | 38 | ## Released Model Checkpoints 39 | 40 | We have released the following checkpoints for pre-trained models described in our paper: 41 | - MixQG-base (220M parameters): [link](https://huggingface.co/Salesforce/mixqg-base) 42 | - MixQG-large (770M parameters): [link](https://huggingface.co/Salesforce/mixqg-large) 43 | - MixQG-3B (3B parameters): [link](https://huggingface.co/Salesforce/mixqg-3b) 44 | 45 | ## Set up 46 | `pip install -r requirements.txt` 47 | 48 | ## Preprocessing 49 | Preprocess the required datasets and merge them into one in the `DIR` folder. 50 | ``` 51 | DIR=/PATH/TO/DATASET/FOLDER 52 | python data/preprocess_datasets.py --dir $DIR 53 | python data/merge_datasets.py --dir $DIR 54 | ``` 55 | The `DIR` folder will contain each of the preprocessed in-domain and out-of-domain datasets as well as the final `mixqg` dataset. 
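As a quick sanity check, the merged dataset can be reloaded with `datasets.load_from_disk`. This is a minimal sketch (the `DIR` placeholder stands for the same folder passed to the two scripts above); the `context`, `question`, and `answer` columns are the ones produced by `data/preprocess_datasets.py` and consumed by `run_qg.py`:
```
from datasets import load_from_disk

DIR = "/PATH/TO/DATASET/FOLDER"  # same folder used for preprocessing

# merge_datasets.py saves the combined corpus with save_to_disk under {DIR}/mixqg
mixqg = load_from_disk(f"{DIR}/mixqg")
print(mixqg)  # DatasetDict with train/validation/test splits

# each example carries the three columns used by run_qg.py
sample = mixqg["train"][0]
print(sample["answer"], sample["context"], sample["question"], sep="\n")
```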
56 | 57 | ## Training 58 | ``` 59 | num_gpus=4 60 | model_name=t5-base 61 | dataset=${DIR}/mixqg 62 | output_dir=mixqg-base 63 | lr=3e-5 64 | bs=32 65 | 66 | ./train.sh $num_gpus $model_name $dataset $output_dir $lr $bs 67 | ``` 68 | ## Fine-tuning 69 | ``` 70 | num_gpus=4 71 | model_name=Salesforce/mixqg-base 72 | dataset=${DIR}/squad 73 | output_dir=mixqg-base-squad 74 | lr=3e-6 75 | bs=32 76 | 77 | ./train.sh $num_gpus $model_name $dataset $output_dir $lr $bs 78 | ``` 79 | 80 | ## Evaluation 81 | ``` 82 | gpu=0 83 | model=Salesforce/mixqg-base 84 | dataset=${DIR}/squad 85 | output_dir=mixqg-base-squad-eval 86 | bs=32 87 | 88 | ./eval.sh $gpu $model $dataset $output_dir $bs 89 | ``` 90 | 91 | ## Citation 92 | 93 | ``` 94 | @misc{murakhovska2021mixqg, 95 | title={MixQG: Neural Question Generation with Mixed Answer Types}, 96 | author={Lidiya Murakhovs'ka and Chien-Sheng Wu and Tong Niu and Wenhao Liu and Caiming Xiong}, 97 | year={2021}, 98 | eprint={2110.08175}, 99 | archivePrefix={arXiv}, 100 | primaryClass={cs.CL} 101 | } 102 | ``` -------------------------------------------------------------------------------- /MixQG/configs/ds_config_zero2.json: -------------------------------------------------------------------------------- 1 | { 2 | "fp16": { 3 | "enabled": "auto", 4 | "loss_scale": 0, 5 | "loss_scale_window": 1000, 6 | "initial_scale_power": 16, 7 | "hysteresis": 2, 8 | "min_loss_scale": 1 9 | }, 10 | 11 | "optimizer": { 12 | "type": "AdamW", 13 | "params": { 14 | "lr": "auto", 15 | "betas": "auto", 16 | "eps": "auto", 17 | "weight_decay": "auto" 18 | } 19 | }, 20 | 21 | "scheduler": { 22 | "type": "WarmupLR", 23 | "params": { 24 | "warmup_min_lr": "auto", 25 | "warmup_max_lr": "auto", 26 | "warmup_num_steps": "auto" 27 | } 28 | }, 29 | 30 | "zero_optimization": { 31 | "stage": 2, 32 | "offload_optimizer": { 33 | "device": "cpu", 34 | "pin_memory": true 35 | }, 36 | "allgather_partitions": true, 37 | "allgather_bucket_size": 2e8, 38 | "overlap_comm": true, 39 | "reduce_scatter": true, 40 | "reduce_bucket_size": 2e8, 41 | "contiguous_gradients": true 42 | }, 43 | 44 | "gradient_accumulation_steps": "auto", 45 | "gradient_clipping": "auto", 46 | "steps_per_print": 2000, 47 | "train_batch_size": "auto", 48 | "train_micro_batch_size_per_gpu": "auto", 49 | "wall_clock_breakdown": false 50 | } -------------------------------------------------------------------------------- /MixQG/data/merge_datasets.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Copyright (c) 2021, salesforce.com, inc. 3 | All rights reserved. 
4 | SPDX-License-Identifier: BSD-3-Clause 5 | For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause 6 | ''' 7 | 8 | import argparse 9 | import os 10 | 11 | from datasets import DatasetDict, load_from_disk, concatenate_datasets 12 | 13 | 14 | def main(args): 15 | DIR = args.dir 16 | 17 | # Load datasets 18 | mrqa = load_from_disk(f"{DIR}/mrqa") 19 | narrativeqa = load_from_disk(f"{DIR}/narrativeqa") 20 | mctest = load_from_disk(f"{DIR}/mctest") 21 | boolq = load_from_disk(f"{DIR}/boolq") 22 | 23 | loaded_datasets = [mrqa, narrativeqa, mctest, boolq] 24 | 25 | # Shuffle 26 | train_datasets = [d["train"].shuffle() 27 | for d in loaded_datasets if "train" in d.keys()] 28 | eval_datasets = [d["validation"] 29 | for d in loaded_datasets if "validation" in d.keys()] 30 | test_datasets = [d["test"] 31 | for d in loaded_datasets if "test" in d.keys()] 32 | 33 | # Merge & Save 34 | train_dataset = concatenate_datasets(train_datasets) 35 | eval_dataset = concatenate_datasets(eval_datasets) 36 | test_dataset = concatenate_datasets(test_datasets) 37 | 38 | combined = DatasetDict({ 39 | "train": train_dataset.shuffle(), 40 | "validation": eval_dataset, 41 | "test": test_dataset 42 | }) 43 | 44 | if not os.path.isdir(f"{DIR}/mixqg"): 45 | combined.save_to_disk(f"{DIR}/mixqg") 46 | 47 | 48 | if __name__ == "__main__": 49 | parser = argparse.ArgumentParser() 50 | parser.add_argument("--dir", type=str, default="", 51 | help="Path to the datasets directory.") 52 | args = parser.parse_args() 53 | main(args) 54 | -------------------------------------------------------------------------------- /MixQG/data/preprocess_datasets.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Copyright (c) 2021, salesforce.com, inc. 3 | All rights reserved. 
4 | SPDX-License-Identifier: BSD-3-Clause 5 | For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause 6 | ''' 7 | 8 | import argparse 9 | import os 10 | import spacy 11 | 12 | from datasets import load_dataset 13 | 14 | 15 | nlp = spacy.load("en_core_web_sm") 16 | MC_map = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5, 'G': 6} 17 | 18 | 19 | def preprocess_squad(examples): 20 | context = [] 21 | question = [] 22 | answer = [] 23 | for i in range(len(examples["answers"])): 24 | if len(examples["answers"][i]["text"]) > 0: 25 | answer.append(examples["answers"][i]["text"][0]) 26 | context.append(examples["context"][i]) 27 | question.append(examples["question"][i]) 28 | 29 | return { 30 | "context": context, 31 | "question": question, 32 | "answer": answer 33 | } 34 | 35 | 36 | def preprocess_narrative_qa(examples): 37 | context = [] 38 | question = [] 39 | answer = [] 40 | for i in range(len(examples['answers'])): 41 | context.append(examples['document'][i]['summary']['text']) 42 | question.append(examples['question'][i]['text']) 43 | answer.append(examples['answers'][i][0]['text']) 44 | 45 | return { 46 | "context": context, 47 | "question": question, 48 | "answer": answer 49 | } 50 | 51 | 52 | def preprocess_mrqa(examples): 53 | question = [] 54 | answer = [] 55 | context = [] 56 | for i in range(len(examples["answers"])): 57 | if len(examples["answers"][i]) > 0: 58 | answer.append(examples["answers"][i][0]) 59 | context.append(examples["context"][i]) 60 | question.append(examples["question"][i]) 61 | 62 | return { 63 | "context": context, 64 | "question": question, 65 | "answer": answer 66 | } 67 | 68 | 69 | def preprocess_mctest(examples): 70 | context = examples['story'] 71 | question = examples['question'] 72 | answer = [] 73 | for i in range(len(examples['question'])): 74 | answer_letter = examples['answer'][i] 75 | options = examples['answer_options'][i] 76 | correct_answer = options[answer_letter] 77 | answer.append(correct_answer) 78 | 79 | return { 80 | "context": context, 81 | "question": question, 82 | "answer": answer 83 | } 84 | 85 | 86 | def preprocess_drop(examples): 87 | question = examples["question"] 88 | context = examples["passage"] 89 | answer = [] 90 | for i in range(len(examples["answers_spans"])): 91 | answer.append(examples["answers_spans"][i]["spans"][0]) 92 | 93 | return { 94 | "context": context, 95 | "question": question, 96 | "answer": answer 97 | } 98 | 99 | 100 | def preprocess_boolq(examples): 101 | context = examples['passage'] 102 | question = examples['question'] 103 | answer = [] 104 | for i in range(len(examples['question'])): 105 | ans = 'yes' if examples['answer'][i] else 'no' 106 | doc = nlp(examples['question'][i]) 107 | entities = " ".join([ent.text for ent in doc.ents]) 108 | if len(entities) > 0: 109 | answer.append(f"{ans} {entities}") 110 | else: 111 | answer.append(ans) 112 | return { 113 | "context": context, 114 | "question": question, 115 | "answer": answer 116 | } 117 | 118 | 119 | def process_dataset(DIR, dataset_name, process_func): 120 | if os.path.isdir(f"{DIR}/{dataset_name}"): 121 | return 122 | dataset = load_dataset(dataset_name) 123 | column_names = dataset["train"].column_names 124 | processed = dataset.map( 125 | process_func, 126 | batched=True, 127 | num_proc=8, 128 | remove_columns=column_names, 129 | load_from_cache_file=True, 130 | desc=f"Running preprocessing on {dataset_name} dataset", 131 | ) 132 | print(f"Saving to disk at {DIR}/{dataset_name}") 133 | 
processed.save_to_disk(f"{DIR}/{dataset_name}") 134 | del processed 135 | 136 | 137 | def mctest(DIR, dataset_name="mctest"): 138 | if os.path.isdir(f"{DIR}/{dataset_name}"): 139 | return 140 | dataset = load_dataset("sagnikrayc/mctest") 141 | column_names = dataset["train"].column_names 142 | processed = dataset.map( 143 | preprocess_mctest, 144 | batched=True, 145 | num_proc=8, 146 | remove_columns=column_names, 147 | load_from_cache_file=True, 148 | desc=f"Running preprocessing on {dataset_name} dataset", 149 | ) 150 | print(f"Saving to disk at {DIR}/{dataset_name}") 151 | processed.save_to_disk(f"{DIR}/{dataset_name}") 152 | del processed 153 | 154 | 155 | def natural_questions(DIR, dataset_name="natural_questions"): 156 | if os.path.isdir(f"{DIR}/{dataset_name}"): 157 | return 158 | dataset = load_dataset("mrqa") 159 | dataset = dataset.filter(lambda ex: ex["subset"] == "NaturalQuestionsShort") 160 | column_names = dataset["train"].column_names 161 | processed = dataset.map( 162 | preprocess_mrqa, 163 | batched=True, 164 | num_proc=8, 165 | remove_columns=column_names, 166 | load_from_cache_file=True, 167 | desc=f"Running preprocessing on {dataset_name} dataset", 168 | ) 169 | print(f"Saving to disk at {DIR}/{dataset_name}") 170 | processed.save_to_disk(f"{DIR}/{dataset_name}") 171 | del processed 172 | 173 | 174 | def main(args): 175 | DIR = args.dir 176 | 177 | process_dataset(DIR, "mrqa", preprocess_mrqa) 178 | process_dataset(DIR, "narrativeqa", preprocess_narrative_qa) 179 | mctest(DIR) 180 | process_dataset(DIR, "boolq", preprocess_boolq) 181 | 182 | process_dataset(DIR, "squad", preprocess_squad) 183 | process_dataset(DIR, "quoref", preprocess_squad) 184 | process_dataset(DIR, "drop", preprocess_drop) 185 | natural_questions(DIR) 186 | 187 | 188 | if __name__ == "__main__": 189 | parser = argparse.ArgumentParser() 190 | parser.add_argument("--dir", type=str, default="", 191 | help="Path to the datasets directory.") 192 | args = parser.parse_args() 193 | main(args) 194 | -------------------------------------------------------------------------------- /MixQG/eval.sh: -------------------------------------------------------------------------------- 1 | gpu=$1 2 | model=$2 3 | dataset=$3 4 | output_dir=$4 5 | bs=$5 6 | 7 | CUDA_VISIBLE_DEVICES=${gpu} python run_qg.py \ 8 | --model_name_or_path ${model} \ 9 | --dataset_dir ${dataset} \ 10 | --output_dir ${output_dir} \ 11 | --do_eval \ 12 | --predict_with_generate True \ 13 | --per_device_eval_batch_size=${bs} \ 14 | --run_name ${output_dir} \ 15 | --report_to none \ 16 | --max_target_length 32 \ 17 | --val_max_target_length 32 \ 18 | --metric_for_best_model eval_rougeLsum \ 19 | -------------------------------------------------------------------------------- /MixQG/requirements.txt: -------------------------------------------------------------------------------- 1 | transformers==4.9.1 2 | nltk>=3.6.4 3 | numpy>=1.19.1 4 | datasets==1.13.3 5 | rouge_score==0.0.4 6 | en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz 7 | sacrebleu==1.4.12 8 | spacy==2.3.1 9 | bert-score==0.3.10 10 | deepspeed==0.5.4 11 | wandb==0.11.0 -------------------------------------------------------------------------------- /MixQG/run_qg.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # Copyright 2021 The HuggingFace Team. All rights reserved. 
4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | """ 17 | Fine-tuning script adapted for question generation. 18 | """ 19 | 20 | import logging 21 | import os 22 | import sys 23 | from dataclasses import dataclass, field 24 | from typing import Optional 25 | 26 | import nltk # Here to have a nice missing dependency error message early on 27 | import numpy as np 28 | import transformers 29 | from bert_score import BERTScorer 30 | from datasets import load_dataset, load_from_disk, load_metric 31 | from filelock import FileLock 32 | from transformers import (AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer, 33 | DataCollatorForSeq2Seq, EarlyStoppingCallback, 34 | HfArgumentParser, Seq2SeqTrainer, 35 | Seq2SeqTrainingArguments, set_seed) 36 | from transformers.file_utils import is_offline_mode 37 | from transformers.trainer_utils import get_last_checkpoint, is_main_process 38 | from transformers.utils import check_min_version 39 | 40 | # Will error if the minimal version of Transformers is not installed. Remove at your own risks. 41 | check_min_version("4.6.0.dev0") 42 | 43 | logger = logging.getLogger(__name__) 44 | 45 | try: 46 | nltk.data.find("tokenizers/punkt") 47 | except (LookupError, OSError): 48 | if is_offline_mode(): 49 | raise LookupError( 50 | "Offline mode: run this script without TRANSFORMERS_OFFLINE first to download nltk data files" 51 | ) 52 | with FileLock(".lock") as lock: 53 | nltk.download("punkt", quiet=True) 54 | 55 | 56 | @dataclass 57 | class ModelArguments: 58 | """ 59 | Arguments pertaining to which model/config/tokenizer we are going to fine-tune from. 60 | """ 61 | 62 | model_name_or_path: str = field( 63 | metadata={ 64 | "help": "Path to pretrained model or model identifier from huggingface.co/models"} 65 | ) 66 | config_name: Optional[str] = field( 67 | default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"} 68 | ) 69 | tokenizer_name: Optional[str] = field( 70 | default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"} 71 | ) 72 | cache_dir: Optional[str] = field( 73 | default=None, 74 | metadata={ 75 | "help": "Where to store the pretrained models downloaded from huggingface.co"}, 76 | ) 77 | use_fast_tokenizer: bool = field( 78 | default=True, 79 | metadata={ 80 | "help": "Whether to use one of the fast tokenizer (backed by the tokenizers library) or not."}, 81 | ) 82 | model_revision: str = field( 83 | default="main", 84 | metadata={ 85 | "help": "The specific model version to use (can be a branch name, tag name or commit id)."}, 86 | ) 87 | use_auth_token: bool = field( 88 | default=False, 89 | metadata={ 90 | "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " 91 | "with private models)." 
92 | }, 93 | ) 94 | dropout_rate: float = field( 95 | default=0.1, 96 | metadata={"help": "Dropout rate."} 97 | ) 98 | 99 | 100 | @dataclass 101 | class DataTrainingArguments: 102 | """ 103 | Arguments pertaining to what data we are going to input our model for training and eval. 104 | """ 105 | 106 | dataset_name: Optional[str] = field( 107 | default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."} 108 | ) 109 | dataset_config_name: Optional[str] = field( 110 | default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."} 111 | ) 112 | dataset_dir: Optional[str] = field( 113 | default=None, metadata={"help": "The input data directory (saved via save_to_disk)."} 114 | ) 115 | train_file: Optional[str] = field( 116 | default=None, metadata={"help": "The input training data file (a jsonlines or csv file)."} 117 | ) 118 | validation_file: Optional[str] = field( 119 | default=None, 120 | metadata={ 121 | "help": "An optional input evaluation data file to evaluate the metrics (rouge) on " 122 | "(a jsonlines or csv file)." 123 | }, 124 | ) 125 | test_file: Optional[str] = field( 126 | default=None, 127 | metadata={ 128 | "help": "An optional input test data file to evaluate the metrics (rouge) on " "(a jsonlines or csv file)." 129 | }, 130 | ) 131 | overwrite_cache: bool = field( 132 | default=False, metadata={"help": "Overwrite the cached training and evaluation sets"} 133 | ) 134 | preprocessing_num_workers: Optional[int] = field( 135 | default=None, 136 | metadata={"help": "The number of processes to use for the preprocessing."}, 137 | ) 138 | max_source_length: Optional[int] = field( 139 | default=512, 140 | metadata={ 141 | "help": "The maximum total input sequence length after tokenization. Sequences longer " 142 | "than this will be truncated, sequences shorter will be padded." 143 | }, 144 | ) 145 | max_target_length: Optional[int] = field( 146 | default=100, 147 | metadata={ 148 | "help": "The maximum total sequence length for target text after tokenization. Sequences longer " 149 | "than this will be truncated, sequences shorter will be padded." 150 | }, 151 | ) 152 | val_max_target_length: Optional[int] = field( 153 | default=100, 154 | metadata={ 155 | "help": "The maximum total sequence length for validation target text after tokenization. Sequences longer " 156 | "than this will be truncated, sequences shorter will be padded. Will default to `max_target_length`." 157 | "This argument is also used to override the ``max_length`` param of ``model.generate``, which is used " 158 | "during ``evaluate`` and ``predict``." 159 | }, 160 | ) 161 | pad_to_max_length: bool = field( 162 | default=True, 163 | metadata={ 164 | "help": "Whether to pad all samples to model maximum sentence length. " 165 | "If False, will pad the samples dynamically when batching to the maximum length in the batch. More " 166 | "efficient on GPU but very bad for TPU." 167 | }, 168 | ) 169 | max_train_samples: Optional[int] = field( 170 | default=None, 171 | metadata={ 172 | "help": "For debugging purposes or quicker training, truncate the number of training examples to this " 173 | "value if set." 174 | }, 175 | ) 176 | max_val_samples: Optional[int] = field( 177 | default=None, 178 | metadata={ 179 | "help": "For debugging purposes or quicker training, truncate the number of validation examples to this " 180 | "value if set." 
181 | }, 182 | ) 183 | max_test_samples: Optional[int] = field( 184 | default=None, 185 | metadata={ 186 | "help": "For debugging purposes or quicker training, truncate the number of test examples to this " 187 | "value if set." 188 | }, 189 | ) 190 | num_beams: Optional[int] = field( 191 | default=4, 192 | metadata={ 193 | "help": "Number of beams to use for evaluation. This argument will be passed to ``model.generate``, " 194 | "which is used during ``evaluate`` and ``predict``." 195 | }, 196 | ) 197 | ignore_pad_token_for_loss: bool = field( 198 | default=True, 199 | metadata={ 200 | "help": "Whether to ignore the tokens corresponding to padded labels in the loss computation or not." 201 | }, 202 | ) 203 | question_column: Optional[str] = field( 204 | default="question", 205 | metadata={ 206 | "help": "The name of the column in the datasets containing the question."}, 207 | ) 208 | answer_column: Optional[str] = field( 209 | default="answer", 210 | metadata={ 211 | "help": "The name of the column in the datasets containing the answer."}, 212 | ) 213 | context_column: Optional[str] = field( 214 | default="context", 215 | metadata={ 216 | "help": "The name of the column in the datasets containing the context."}, 217 | ) 218 | early_stopping_patience: Optional[int] = field( 219 | default=15, 220 | metadata={ 221 | "help": "Early stopping patience. This argument will be passed to ``EarlyStoppingCallback``." 222 | }, 223 | ) 224 | wandb_run_id: Optional[str] = field( 225 | default=None, 226 | metadata={"help": "Wandb run id to resume training."}, 227 | ) 228 | 229 | def __post_init__(self): 230 | if self.dataset_name is None and self.train_file is None and self.validation_file is None and self.dataset_dir is None: 231 | raise ValueError( 232 | "Need either a dataset name, dataset directory or a training/validation file.") 233 | else: 234 | if self.train_file is not None: 235 | extension = self.train_file.split(".")[-1] 236 | assert extension in [ 237 | "csv", "json"], "`train_file` should be a csv or a json file." 238 | if self.validation_file is not None: 239 | extension = self.validation_file.split(".")[-1] 240 | assert extension in [ 241 | "csv", "json"], "`validation_file` should be a csv or a json file." 242 | if self.val_max_target_length is None: 243 | self.val_max_target_length = self.max_target_length 244 | 245 | 246 | def main(): 247 | # See all possible arguments in src/transformers/training_args.py 248 | # or by passing the --help flag to this script. 249 | # We now keep distinct sets of args, for a cleaner separation of concerns. 250 | 251 | parser = HfArgumentParser( 252 | (ModelArguments, DataTrainingArguments, Seq2SeqTrainingArguments)) 253 | if len(sys.argv) == 2 and sys.argv[1].endswith(".json"): 254 | # If we pass only one argument to the script and it's the path to a json file, 255 | # let's parse it to get our arguments. 256 | model_args, data_args, training_args = parser.parse_json_file( 257 | json_file=os.path.abspath(sys.argv[1])) 258 | else: 259 | model_args, data_args, training_args = parser.parse_args_into_dataclasses() 260 | 261 | # Detecting last checkpoint. 
262 | last_checkpoint = None 263 | if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir: 264 | last_checkpoint = get_last_checkpoint(training_args.output_dir) 265 | if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0: 266 | raise ValueError( 267 | f"Output directory ({training_args.output_dir}) already exists and is not empty. " 268 | "Use --overwrite_output_dir to overcome." 269 | ) 270 | elif last_checkpoint is not None and training_args.resume_from_checkpoint is None: 271 | logger.info( 272 | f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change " 273 | "the `--output_dir` or add `--overwrite_output_dir` to train from scratch." 274 | ) 275 | 276 | # Setup logging 277 | logging.basicConfig( 278 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", 279 | datefmt="%m/%d/%Y %H:%M:%S", 280 | handlers=[logging.StreamHandler(sys.stdout)], 281 | ) 282 | logger.setLevel(logging.INFO if is_main_process( 283 | training_args.local_rank) else logging.WARN) 284 | 285 | # Log on each process the small summary: 286 | logger.warning( 287 | f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu} " 288 | + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}" 289 | ) 290 | # Set the verbosity to info of the Transformers logger (on main process only): 291 | if is_main_process(training_args.local_rank): 292 | transformers.utils.logging.set_verbosity_info() 293 | logger.info(f"Training/evaluation parameters {training_args}") 294 | 295 | # Set seed before initializing model. 296 | set_seed(training_args.seed) 297 | 298 | # Set project name 299 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 300 | os.environ["WANDB_PROJECT"] = "question_generation" 301 | if data_args.wandb_run_id: 302 | os.environ["WANDB_RESUME"] = "allow" 303 | os.environ["WANDB_RUN_ID"] = data_args.wandb_run_id 304 | 305 | # Get the datasets: you can either provide your own CSV/JSON training and evaluation files (see below) 306 | # or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/ 307 | # (the dataset will be downloaded automatically from the datasets Hub). 308 | # 309 | # In distributed training, the load_dataset function guarantee that only one local process can concurrently 310 | # download the dataset. 311 | if data_args.dataset_name is not None: 312 | # Downloading and loading a dataset from the hub. 313 | datasets = load_dataset(data_args.dataset_name, 314 | data_args.dataset_config_name) 315 | elif data_args.dataset_dir is not None: 316 | datasets = load_from_disk(data_args.dataset_dir) 317 | else: 318 | data_files = {} 319 | if data_args.train_file is not None: 320 | data_files["train"] = data_args.train_file 321 | extension = data_args.train_file.split(".")[-1] 322 | if data_args.validation_file is not None: 323 | data_files["validation"] = data_args.validation_file 324 | extension = data_args.validation_file.split(".")[-1] 325 | if data_args.test_file is not None: 326 | data_files["test"] = data_args.test_file 327 | extension = data_args.test_file.split(".")[-1] 328 | datasets = load_dataset(extension, data_files=data_files) 329 | # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at 330 | # https://huggingface.co/docs/datasets/loading_datasets.html. 
331 | 332 | # Load pretrained model and tokenizer 333 | # 334 | # Distributed training: 335 | # The .from_pretrained methods guarantee that only one local process can concurrently 336 | # download model & vocab. 337 | config = AutoConfig.from_pretrained( 338 | model_args.config_name if model_args.config_name else model_args.model_name_or_path, 339 | cache_dir=model_args.cache_dir, 340 | revision=model_args.model_revision, 341 | use_auth_token=True if model_args.use_auth_token else None, 342 | dropout_rate=model_args.dropout_rate, 343 | ) 344 | tokenizer = AutoTokenizer.from_pretrained( 345 | model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path, 346 | cache_dir=model_args.cache_dir, 347 | use_fast=model_args.use_fast_tokenizer, 348 | revision=model_args.model_revision, 349 | use_auth_token=True if model_args.use_auth_token else None, 350 | ) 351 | model = AutoModelForSeq2SeqLM.from_pretrained( 352 | model_args.model_name_or_path, 353 | from_tf=bool(".ckpt" in model_args.model_name_or_path), 354 | config=config, 355 | cache_dir=model_args.cache_dir, 356 | revision=model_args.model_revision, 357 | use_auth_token=True if model_args.use_auth_token else None, 358 | ) 359 | 360 | if model.config.decoder_start_token_id is None: 361 | raise ValueError( 362 | "Make sure that `config.decoder_start_token_id` is correctly defined") 363 | 364 | # Preprocessing the datasets. 365 | # We need to tokenize inputs and targets. 366 | if training_args.do_train: 367 | column_names = datasets["train"].column_names 368 | elif training_args.do_eval: 369 | column_names = datasets["validation"].column_names 370 | elif training_args.do_predict: 371 | column_names = datasets["test"].column_names 372 | else: 373 | logger.info( 374 | "There is nothing to do. Please pass `do_train`, `do_eval` and/or `do_predict`.") 375 | return 376 | 377 | # Temporarily set max_target_length for training. 378 | max_source_length = data_args.max_source_length 379 | max_target_length = data_args.max_target_length 380 | padding = "max_length" if data_args.pad_to_max_length else False 381 | 382 | if training_args.label_smoothing_factor > 0 and not hasattr(model, "prepare_decoder_input_ids_from_labels"): 383 | logger.warning( 384 | "label_smoothing is enabled but the `prepare_decoder_input_ids_from_labels` method is not defined for" 385 | f"`{model.__class__.__name__}`. This will lead to loss being calculated twice and will take up more memory" 386 | ) 387 | 388 | # Preprocessing the datasets. 389 | question_column_name = data_args.question_column 390 | context_column_name = data_args.context_column 391 | answer_column_name = data_args.answer_column 392 | 393 | def format_inputs(context: str, answer: str): 394 | return f"{answer} \\n {context}" 395 | 396 | def preprocess_function(examples): 397 | context = examples[context_column_name] 398 | answer = examples[answer_column_name] 399 | question = examples[question_column_name] 400 | 401 | inputs = [format_inputs(ctx, ans) for ctx, ans in zip(context, answer)] 402 | 403 | model_inputs = tokenizer(inputs, max_length=max_source_length, 404 | padding=padding, truncation=True) 405 | 406 | # Setup the tokenizer for targets 407 | with tokenizer.as_target_tokenizer(): 408 | labels = tokenizer(question, max_length=max_target_length, 409 | padding=padding, truncation=True) 410 | 411 | # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore 412 | # padding in the loss. 
413 | if padding == "max_length" and data_args.ignore_pad_token_for_loss: 414 | labels["input_ids"] = [ 415 | [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"] 416 | ] 417 | 418 | model_inputs["labels"] = labels["input_ids"] 419 | return model_inputs 420 | 421 | if training_args.do_train: 422 | train_dataset = datasets["train"] 423 | if "train" not in datasets: 424 | raise ValueError("--do_train requires a train dataset") 425 | if data_args.max_train_samples is not None: 426 | train_dataset = train_dataset.select( 427 | range(data_args.max_train_samples)) 428 | train_dataset = train_dataset.map( 429 | preprocess_function, 430 | batched=True, 431 | num_proc=data_args.preprocessing_num_workers, 432 | remove_columns=column_names, 433 | load_from_cache_file=not data_args.overwrite_cache, 434 | ) 435 | 436 | if training_args.do_eval: 437 | max_target_length = data_args.val_max_target_length 438 | if "validation" not in datasets: 439 | raise ValueError("--do_eval requires a validation dataset") 440 | eval_dataset = datasets["validation"] 441 | if data_args.max_val_samples is not None: 442 | eval_dataset = eval_dataset.select( 443 | range(data_args.max_val_samples)) 444 | eval_dataset = eval_dataset.map( 445 | preprocess_function, 446 | batched=True, 447 | num_proc=data_args.preprocessing_num_workers, 448 | remove_columns=column_names, 449 | load_from_cache_file=not data_args.overwrite_cache, 450 | ) 451 | 452 | if training_args.do_predict: 453 | max_target_length = data_args.val_max_target_length 454 | if "test" not in datasets: 455 | raise ValueError("--do_predict requires a test dataset") 456 | test_dataset = datasets["test"] 457 | if data_args.max_test_samples is not None: 458 | test_dataset = test_dataset.select( 459 | range(data_args.max_test_samples)) 460 | test_dataset = test_dataset.map( 461 | preprocess_function, 462 | batched=True, 463 | num_proc=data_args.preprocessing_num_workers, 464 | remove_columns=column_names, 465 | load_from_cache_file=not data_args.overwrite_cache, 466 | ) 467 | 468 | # Data collator 469 | label_pad_token_id = - \ 470 | 100 if data_args.ignore_pad_token_for_loss else tokenizer.pad_token_id 471 | data_collator = DataCollatorForSeq2Seq( 472 | tokenizer, 473 | model=model, 474 | label_pad_token_id=label_pad_token_id, 475 | pad_to_multiple_of=8 if training_args.fp16 else None, 476 | ) 477 | 478 | # Metric 479 | rouge = load_metric("rouge") 480 | bleu = load_metric("sacrebleu") 481 | meteor = load_metric("meteor") 482 | scorer = BERTScorer(lang="en", rescale_with_baseline=True) 483 | 484 | def postprocess_text(preds, labels): 485 | preds = [pred.strip() for pred in preds] 486 | labels = [label.strip() for label in labels] 487 | 488 | bleu_labels = [[label] for label in labels] 489 | 490 | # rougeLSum expects newline after each sentence 491 | rouge_preds = ["\n".join(nltk.sent_tokenize(pred)) for pred in preds] 492 | rouge_labels = ["\n".join(nltk.sent_tokenize(label)) 493 | for label in labels] 494 | 495 | return { 496 | "bleu": [preds, bleu_labels], 497 | "meteor": [preds, labels], 498 | "rouge": [rouge_preds, rouge_labels] 499 | } 500 | 501 | def compute_metrics(eval_preds): 502 | preds, labels = eval_preds 503 | if isinstance(preds, tuple): 504 | preds = preds[0] 505 | decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True) 506 | if data_args.ignore_pad_token_for_loss: 507 | # Replace -100 in the labels as we can't decode them. 
508 | labels = np.where(labels != -100, labels, tokenizer.pad_token_id) 509 | decoded_labels = tokenizer.batch_decode( 510 | labels, skip_special_tokens=True) 511 | 512 | # Some simple post-processing 513 | decoded = postprocess_text(decoded_preds, decoded_labels) 514 | 515 | result = rouge.compute( 516 | predictions=decoded["rouge"][0], references=decoded["rouge"][1], use_stemmer=True) 517 | # Extract a few results from ROUGE 518 | result = {key: value.mid.fmeasure * 519 | 100 for key, value in result.items()} 520 | 521 | prediction_lens = [np.count_nonzero( 522 | pred != tokenizer.pad_token_id) for pred in preds] 523 | result["gen_len"] = np.mean(prediction_lens) 524 | result = {k: round(v, 4) for k, v in result.items()} 525 | 526 | bleu_result = bleu.compute( 527 | predictions=decoded["bleu"][0], references=decoded["bleu"][1]) 528 | result["bleu"] = bleu_result["score"] 529 | 530 | meteor_result = meteor.compute( 531 | predictions=decoded["meteor"][0], references=decoded["meteor"][1]) 532 | result["meteor"] = meteor_result["meteor"] 533 | 534 | P, R, F1 = scorer.score(decoded["meteor"][0], decoded["meteor"][1]) 535 | result["bertscore"] = np.mean(F1.tolist()) 536 | 537 | return result 538 | 539 | # Initialize our Trainer 540 | trainer = Seq2SeqTrainer( 541 | model=model, 542 | args=training_args, 543 | train_dataset=train_dataset if training_args.do_train else None, 544 | eval_dataset=eval_dataset if training_args.do_eval else None, 545 | tokenizer=tokenizer, 546 | data_collator=data_collator, 547 | compute_metrics=compute_metrics if training_args.predict_with_generate else None, 548 | callbacks=[EarlyStoppingCallback( 549 | early_stopping_patience=data_args.early_stopping_patience)], 550 | ) 551 | 552 | # Training 553 | if training_args.do_train: 554 | checkpoint = None 555 | if training_args.resume_from_checkpoint is not None: 556 | checkpoint = training_args.resume_from_checkpoint 557 | elif last_checkpoint is not None: 558 | checkpoint = last_checkpoint 559 | train_result = trainer.train(resume_from_checkpoint=checkpoint) 560 | trainer.save_model() # Saves the tokenizer too for easy upload 561 | 562 | metrics = train_result.metrics 563 | max_train_samples = ( 564 | data_args.max_train_samples if data_args.max_train_samples is not None else len( 565 | train_dataset) 566 | ) 567 | metrics["train_samples"] = min(max_train_samples, len(train_dataset)) 568 | 569 | trainer.log_metrics("train", metrics) 570 | trainer.save_metrics("train", metrics) 571 | trainer.save_state() 572 | 573 | # Evaluation 574 | results = {} 575 | if training_args.do_eval: 576 | logger.info("*** Evaluate ***") 577 | 578 | metrics = trainer.evaluate( 579 | max_length=data_args.val_max_target_length, num_beams=data_args.num_beams, metric_key_prefix="eval" 580 | ) 581 | max_val_samples = data_args.max_val_samples if data_args.max_val_samples is not None else len( 582 | eval_dataset) 583 | metrics["eval_samples"] = min(max_val_samples, len(eval_dataset)) 584 | 585 | trainer.log_metrics("eval", metrics) 586 | trainer.save_metrics("eval", metrics) 587 | 588 | if training_args.do_predict: 589 | logger.info("*** Test ***") 590 | 591 | test_results = trainer.predict( 592 | test_dataset, 593 | metric_key_prefix="test", 594 | max_length=data_args.val_max_target_length, 595 | num_beams=data_args.num_beams, 596 | ) 597 | metrics = test_results.metrics 598 | max_test_samples = data_args.max_test_samples if data_args.max_test_samples is not None else len( 599 | test_dataset) 600 | metrics["test_samples"] = min(max_test_samples, 
len(test_dataset)) 601 | 602 | trainer.log_metrics("test", metrics) 603 | trainer.save_metrics("test", metrics) 604 | 605 | if trainer.is_world_process_zero(): 606 | if training_args.predict_with_generate: 607 | test_preds = tokenizer.batch_decode( 608 | test_results.predictions, skip_special_tokens=True, clean_up_tokenization_spaces=True 609 | ) 610 | test_preds = [pred.strip() for pred in test_preds] 611 | output_test_preds_file = os.path.join( 612 | training_args.output_dir, "test_generations.txt") 613 | with open(output_test_preds_file, "w") as writer: 614 | writer.write("\n".join(test_preds)) 615 | 616 | return results 617 | 618 | 619 | def _mp_fn(index): 620 | # For xla_spawn (TPUs) 621 | main() 622 | 623 | 624 | if __name__ == "__main__": 625 | main() 626 | -------------------------------------------------------------------------------- /MixQG/train.sh: -------------------------------------------------------------------------------- 1 | num_gpus=$1 2 | model_name=$2 3 | dataset=$3 4 | output_dir=$4 5 | lr=$5 6 | bs=$6 7 | 8 | deepspeed --num_gpus=${num_gpus} run_qg.py \ 9 | --model_name_or_path ${model_name} \ 10 | --dataset_dir ${dataset} \ 11 | --output_dir ${output_dir} \ 12 | --do_train \ 13 | --do_eval \ 14 | --evaluation_strategy steps \ 15 | --eval_steps 2000 \ 16 | --save_steps 2000 \ 17 | --load_best_model_at_end True \ 18 | --metric_for_best_model eval_rougeLsum \ 19 | --greater_is_better True \ 20 | --predict_with_generate True \ 21 | --per_device_eval_batch_size=${bs} \ 22 | --per_device_train_batch_size=${bs} \ 23 | --gradient_accumulation_steps=1 \ 24 | --max_steps 100000 \ 25 | --logging_steps 100 \ 26 | --save_total_limit 4 \ 27 | --deepspeed configs/ds_config_zero2.json \ 28 | --adam_eps 1e-06 \ 29 | --label_smoothing 0.1 \ 30 | --learning_rate ${lr} \ 31 | --logging_first_step \ 32 | --warmup_steps 500 \ 33 | --max_target_length 32 \ 34 | --val_max_target_length 32 \ 35 | --fp16 36 | -------------------------------------------------------------------------------- /Quiz_Design/README.md: -------------------------------------------------------------------------------- 1 | # Quiz Design: Helping Teachers Create Quizzes with Automated Question Generation 2 | 3 | This is the official code base for the following paper from Salesforce Research: 4 | 5 | **Title**: Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation 6 | 7 | **Authors**: Philippe Laban, Chien-Sheng Wu, Lidiya Murakhovs'ka, Wenhao Liu, Caiming Xiong 8 | 9 | ## Dataset Release: 10 | 11 | We release the dataset we collected during the study: `quiz_design_data.jsonl`. It can be opened in Python with the following: 12 | 13 | ```python 14 | import utils_qd_data 15 | annotations = utils_qd_data.load_qd_annotations() 16 | qd_dataset = utils_qd_data.build_qd_groups(annotations) 17 | ``` 18 | 19 | Each line is a JSON object. The first entry looks like: 20 | ```json 21 | {"doc_id": 0, 22 | "answer_span": "meets the needs of the present without compromising the ability of future generations to meet their own needs", 23 | "context": "Energy is sustainable if it 'meets the needs of the present without compromising the ability of future generations to meet their own needs'. 
Most definitions of sustainable energy [...]", 24 | "questions": [{"question": "What does energy mean if it is sustainable?", 25 | "label": 0, 26 | "reason": "disfluent", 27 | "model_name": "dgpt2_sup"}, 28 | {"question": "What does energy sustainability mean?", 29 | "label": 1, 30 | "reason": "No error", 31 | "model_name": "gpt2b_sup"}, 32 | {"question": "How is energy sustainable?", 33 | "label": 0, 34 | "reason": "wrong_context", 35 | "model_name": "gpt2m_sup"}, 36 | {"question": "What is sustainable energy?", 37 | "label": 0, 38 | "reason": "wrong_context", 39 | "model_name": "bartb_sup|prophetnet"}, 40 | {"question": "What does it mean if energy is sustainable?", 41 | "label": 1, 42 | "reason": "No error", 43 | "model_name": "mixqg"}, 44 | {"question": "What is the definition of sustainable energy?", 45 | "label": 1, 46 | "reason": "No error", 47 | "model_name": "bartl_sup"}]} 48 | ``` 49 | 50 | ## Annotation Interface 51 | 52 | We release the annotation interface used during the collection of the Quiz Design study. 53 | The interface can be instantiated with the following command: 54 | ``` 55 | FLASK_APP=run_flask_server flask run 56 | ``` 57 | 58 | The list of Question Generation models used to generate candidate questions can be modified in the first lines of `run_flask_server.py`. 59 | 60 | ## Cite the work 61 | 62 | If you use the data or annotation interface, please cite the work: 63 | ``` 64 | @inproceedings{laban2022quiz, 65 | title={Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation}, 66 | author={Laban, Philippe and Wu, Chien-Sheng and Murakhovs'ka, Lidiya and Liu, Wenhao and Xiong, Caiming}, 67 | booktitle={Findings of the North American Chapter of the Association for Computational Linguistics: NAACL 2022}, 68 | year={2022} 69 | } 70 | ``` 71 | -------------------------------------------------------------------------------- /Quiz_Design/model_hf_generator.py: -------------------------------------------------------------------------------- 1 | from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM 2 | import torch, os, tqdm 3 | 4 | def select_logprobs(logits, decoded_tokens, eos_id): 5 | logprobs = torch.nn.functional.log_softmax(logits, dim=2) 6 | 7 | selected_logprobs = [] 8 | for i, generated_tokenized in enumerate(decoded_tokens): 9 | if eos_id in generated_tokenized: 10 | generated_tokenized = generated_tokenized[:generated_tokenized.index(eos_id)] 11 | selected_logprob = logprobs[i, torch.arange(len(generated_tokenized)), generated_tokenized] 12 | summed_logprob = torch.sum(selected_logprob) 13 | selected_logprobs.append(summed_logprob) 14 | selected_logprobs = torch.stack(selected_logprobs, dim=0) 15 | return selected_logprobs 16 | 17 | models_folder = os.environ["MODELS_FOLDER"] 18 | 19 | class GeneratorHF: 20 | def __init__(self, model_card="gpt2-medium", device="cuda", starter_file=None, gradient_checkpointing=False, max_enc_length=None, max_dec_length=None, force_dec_prepend=None): 21 | self.model_card = model_card 22 | 23 | self.is_gpt2 = "gpt2" in self.model_card or "summary_loop" in self.model_card or "keep_it_simple" in self.model_card 24 | if self.is_gpt2: 25 | self.model = AutoModelForCausalLM.from_pretrained(self.model_card) 26 | else: 27 | self.model = AutoModelForSeq2SeqLM.from_pretrained(self.model_card) 28 | self.model.to(device) 29 | 30 | self.tokenizer = AutoTokenizer.from_pretrained(self.model_card) 31 | self.gradient_checkpointing = gradient_checkpointing 32 | self.max_enc_length =
max_enc_length 33 | self.max_dec_length = max_dec_length 34 | self.force_dec_prepend = force_dec_prepend 35 | 36 | if self.gradient_checkpointing: 37 | self.model.gradient_checkpointing_enable() 38 | 39 | self.model.eval() 40 | 41 | if "facebook/wmt19" in self.model_card: 42 | self.tokenizer.pad_token = "" 43 | self.tokenizer.eos_token = "" 44 | 45 | self.start_id = self.tokenizer.bos_token_id 46 | self.end_id = self.tokenizer.eos_token_id 47 | 48 | if "prophetnet" in self.model_card: 49 | # bos_token_id=102, eos_token_id=102 50 | self.start_id = 102 51 | self.end_id = 102 52 | 53 | if self.start_id is None and self.end_id is not None: 54 | # For MixQG 55 | self.start_id = 0 56 | 57 | self.device = device 58 | if self.is_gpt2: 59 | self.tokenizer.pad_token = self.tokenizer.eos_token 60 | 61 | self.model.config.pad_token_id = self.tokenizer.pad_token_id 62 | if starter_file is not None: 63 | self.reload(starter_file, strict=False) 64 | 65 | def reload(self, from_file, strict=True): 66 | if not os.path.isfile(from_file): 67 | # Try to look at the models folder for the file 68 | from_file = os.path.join(models_folder, from_file) 69 | assert os.path.isfile(from_file), "Starter file not found, in absolute or in models folder" 70 | 71 | loaded_dict = torch.load(from_file) 72 | print(self.model.load_state_dict(loaded_dict, strict=strict)) 73 | 74 | def save(self, to_file): 75 | torch.save(self.model.state_dict(), to_file) 76 | 77 | def preprocess(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None): 78 | 79 | assert len(encoded_texts) == len(decoded_texts), "Mismatch in input/output sizes" 80 | 81 | # encoder_tokenized = [torch.LongTensor(self.tokenizer.encode(text=text)) for text in encoded_texts] 82 | # encoder_ids = torch.nn.utils.rnn.pad_sequence(encoder_tokenized, batch_first=True, padding_value=0, truncation=True).to(self.device) 83 | 84 | encoder_ids = self.tokenizer.batch_encode_plus(encoded_texts, add_special_tokens=True, return_tensors="pt", padding=True, truncation=True).input_ids.to(self.device) 85 | 86 | if self.force_dec_prepend is not None: 87 | decoded_texts = [self.force_dec_prepend + text for text in decoded_texts] 88 | decoder_tokenized = [self.tokenizer.encode(text=text, add_special_tokens=False) for text in decoded_texts] 89 | 90 | decoder_ids_input = torch.nn.utils.rnn.pad_sequence([torch.LongTensor([self.start_id] + dec) for dec in decoder_tokenized], batch_first=True, padding_value=self.end_id).to(self.device) 91 | decoder_ids_output = torch.nn.utils.rnn.pad_sequence([torch.LongTensor(dec + [self.end_id]) for dec in decoder_tokenized], batch_first=True, padding_value=-1).to(self.device) 92 | 93 | if self.max_enc_length is not None and max_enc_length is None: 94 | max_enc_length = self.max_enc_length 95 | if self.max_dec_length is not None and max_dec_length is None: 96 | max_dec_length = self.max_dec_length 97 | 98 | if max_enc_length is not None: 99 | encoder_ids = encoder_ids[:, :max_enc_length] 100 | 101 | if max_dec_length is not None: 102 | decoder_ids_input = decoder_ids_input[:, :max_dec_length] 103 | decoder_ids_output = decoder_ids_output[:, :max_dec_length] 104 | 105 | return encoder_ids, decoder_ids_input, decoder_ids_output 106 | 107 | def train_batch(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None, no_preinput=False): 108 | self.model.train() 109 | N = len(encoded_texts) 110 | 111 | encoder_ids, decoder_ids_input, decoder_ids_output = self.preprocess(encoded_texts, decoded_texts, max_enc_length, 
max_dec_length) 112 | 113 | crit = torch.nn.CrossEntropyLoss(ignore_index=-1) 114 | if self.is_gpt2: 115 | past = None 116 | if not no_preinput: 117 | encoder_output = self.model(input_ids=encoder_ids, past_key_values=None, return_dict=True, use_cache=True) 118 | past = encoder_output["past_key_values"] 119 | decoder_output = self.model(input_ids=decoder_ids_input, past_key_values=past, return_dict=True, use_cache=not self.gradient_checkpointing) 120 | logits = decoder_output["logits"] 121 | else: 122 | if no_preinput: 123 | encoder_ids = torch.LongTensor([[self.start_id]]).repeat(N, 1).to(self.device) 124 | model_output = self.model(input_ids=encoder_ids, decoder_input_ids=decoder_ids_input, return_dict=True, use_cache=not self.gradient_checkpointing) 125 | logits = model_output["logits"] 126 | 127 | N_unwrap = decoder_ids_output.shape[0] * decoder_ids_output.shape[1] 128 | loss = crit(logits.view(N_unwrap, -1), decoder_ids_output.contiguous().view(-1)) # self.tokenizer.vocab_size 129 | return loss 130 | 131 | def score_batch(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None): 132 | encoder_ids, decoder_ids_input, decoder_ids_output = self.preprocess(encoded_texts, decoded_texts, max_enc_length, max_dec_length) 133 | 134 | with torch.no_grad(): 135 | 136 | crit = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction="none") 137 | if self.is_gpt2: 138 | encoder_output = self.model(input_ids=encoder_ids, past_key_values=None, return_dict=True) 139 | past = encoder_output["past_key_values"] 140 | decoder_output = self.model(input_ids=decoder_ids_input, past_key_values=past, return_dict=True) 141 | logits = decoder_output["logits"] 142 | else: 143 | model_output = self.model(input_ids=encoder_ids, decoder_input_ids=decoder_ids_input, return_dict=True) 144 | logits = model_output["logits"] 145 | 146 | N, seqlength, vocab_size = logits.shape 147 | 148 | loss_components = crit(logits.view(N*seqlength, vocab_size), decoder_ids_output.contiguous().view(-1)).reshape(N, seqlength) 149 | num_words = torch.sum(decoder_ids_output != -1, dim=1) 150 | score_per_item = (- torch.sum(loss_components, dim=1) / num_words).tolist() 151 | return {"scores": score_per_item} 152 | 153 | def score(self, encoded_texts, decoded_texts, max_enc_length=None, max_dec_length=None, batch_size=32, progress=False): 154 | N = len(encoded_texts) 155 | iterator = range(0, N, batch_size) 156 | if progress and len(iterator) > 1: 157 | iterator = tqdm.tqdm(iterator) 158 | scores = [] 159 | for i in iterator: 160 | batch_encoded_texts = encoded_texts[i:i+batch_size] 161 | batch_decoded_texts = decoded_texts[i:i+batch_size] 162 | batch_scores = self.score_batch(batch_encoded_texts, batch_decoded_texts, max_enc_length, max_dec_length)["scores"] 163 | scores += batch_scores 164 | return {"scores": scores} 165 | 166 | def generate(self, texts, max_enc_length=None, max_gen_length=None, num_runs=1, compute_logprobs=False, force_start=None, **gen_params): 167 | assert type(texts) == list, "The generate function takes as input a list of `str`" 168 | if len(texts) == 0: 169 | return [] 170 | 171 | tokenized_paragraphs = [torch.LongTensor(self.tokenizer.encode(text=text)) for text in texts] 172 | tokenized_paragraphs = [tok_text for tok_text in tokenized_paragraphs for _ in range(num_runs)] 173 | 174 | decoder_input_ids = None 175 | if force_start is not None: 176 | decoder_input_ids = self.tokenizer.encode(force_start, return_tensors="pt", add_special_tokens=False) 177 | 178 | # Generate without leaving gradients 
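        # The block below pads the tokenized batch, truncates it to max_enc_length-1 so
        # the start token can be appended, and handles an optional `force_start` prefix:
        # for decoder-only GPT-2-style models the prefix is concatenated onto the prompt,
        # while for seq2seq models it is passed to HuggingFace generate() through
        # `decoder_input_ids`. Note that `sequences_scores` are only returned when beam
        # search is used, and log-probabilities require a second forward pass (see the
        # `compute_logprobs` branch further down).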
179 | with torch.no_grad(): 180 | encoder_ids = torch.nn.utils.rnn.pad_sequence(tokenized_paragraphs, batch_first=True, padding_value=0).to(self.device) 181 | if max_enc_length is not None: 182 | encoder_ids = encoder_ids[:, :(max_enc_length-1)] 183 | N = encoder_ids.shape[0] 184 | start_column = torch.LongTensor([[self.start_id]] * N).to(self.device) 185 | encoder_ids = torch.cat((encoder_ids, start_column), dim=1) 186 | 187 | if decoder_input_ids is not None: 188 | decoder_input_ids = decoder_input_ids.repeat(N, 1).to(self.device) 189 | if self.is_gpt2: 190 | encoder_ids = torch.cat((encoder_ids, decoder_input_ids), dim=1) 191 | else: 192 | decoder_input_ids = torch.cat((start_column, decoder_input_ids), dim=1) 193 | gen_params["decoder_input_ids"] = decoder_input_ids 194 | 195 | _, input_seq_length = encoder_ids.shape 196 | if max_gen_length is not None: 197 | if self.is_gpt2: 198 | gen_params["max_length"] = input_seq_length + max_gen_length 199 | else: 200 | gen_params["max_length"] = max_gen_length 201 | 202 | if "num_beams" in gen_params: # Propagate param 203 | gen_params["num_return_sequences"] = gen_params["num_beams"] 204 | 205 | output_generate = self.model.generate(encoder_ids, return_dict_in_generate=True, output_scores=True, **gen_params) 206 | 207 | generated_ids = output_generate.sequences 208 | if self.is_gpt2 and decoder_input_ids is not None: 209 | generated_ids = torch.cat((decoder_input_ids, generated_ids), dim=1) 210 | if self.is_gpt2: 211 | generated_ids = generated_ids[:, input_seq_length:] 212 | 213 | N, gen_length = generated_ids.shape 214 | batch_size = len(texts) 215 | num_beams = N // (batch_size * num_runs) 216 | if num_beams > 1: 217 | # For some reason, they do not return a score if it is not beam-search... 218 | sequences_scores = output_generate.sequences_scores 219 | else: 220 | sequences_scores = torch.zeros(N).to(self.device) 221 | 222 | # The next block is to obtain logprobs... unfortunately have to run the model again, as there's no good book-keeping for HF beam-search 223 | selected_logprobs = torch.zeros(N).to(self.device) 224 | if compute_logprobs: 225 | # Don't run this unless we really need these (for RL training) 226 | expanded_encoder_ids = torch.repeat_interleave(encoder_ids, repeats=num_beams, dim=0) 227 | 228 | if self.is_gpt2: 229 | generated_input = torch.cat((torch.LongTensor([[self.start_id]] * N).to(self.device), generated_ids), dim=1) 230 | generated_output = torch.cat((generated_ids, torch.LongTensor([[self.end_id]] * N).to(self.device)), dim=1) # There is an error here, the end_id could be AFTER padding... 
need to fix 231 | 232 | expanded_encoder_ids = expanded_encoder_ids[:, :-1] 233 | 234 | encoder_output = self.model(input_ids=expanded_encoder_ids[:, :-1], past_key_values=None, return_dict=True) 235 | decoder_output = self.model(input_ids=generated_input, past_key_values=encoder_output.past_key_values, return_dict=True) 236 | 237 | selected_logprobs = utils_rl.select_logprobs(decoder_output.logits, generated_output.tolist(), self.end_id) 238 | else: 239 | expanded_encoder_ids = torch.repeat_interleave(encoder_ids, repeats=num_beams, dim=0) 240 | 241 | generated_input = generated_ids[:, :-1] 242 | generated_output = generated_ids[:, 1:] 243 | 244 | model_output = self.model(input_ids=expanded_encoder_ids, decoder_input_ids=generated_input, return_dict=True) 245 | selected_logprobs = utils_rl.select_logprobs(model_output.logits, generated_output.tolist(), eos_id=self.end_id) 246 | # print("Selected logprobs:", selected_logprobs.tolist()) 247 | 248 | # Un-tokenize 249 | generated_texts = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True) 250 | 251 | # Time to un-flatten 252 | num_candidates = num_runs * num_beams 253 | 254 | generated_texts = [generated_texts[i:(i+num_candidates)] for i in range(0, N, num_candidates)] 255 | selected_logprobs = [selected_logprobs[i:(i+num_candidates)] for i in range(0, N, num_candidates)] 256 | sequences_scores = [sequences_scores[i:(i+num_candidates)] for i in range(0, N, num_candidates)] 257 | 258 | outputs = [] 259 | sort_by_key = "logprob" if compute_logprobs else "score" 260 | 261 | for gen_texts, scores, logprobs in zip(generated_texts, sequences_scores, selected_logprobs): 262 | output = [{"output_text": gen_text, "logprob": logprob, "score": score} for gen_text, score, logprob in zip(gen_texts, scores, logprobs)] 263 | output = sorted(output, key=lambda x: x[sort_by_key], reverse=True) 264 | outputs.append(output) 265 | 266 | return outputs 267 | 268 | 269 | if __name__ == "__main__": 270 | # qgen = GeneratorHF(model_card="gpt2-medium", starter_file="/export/home/models/qgen/gpt2_med_newsqa_only_logprob_2.059.bin") 271 | # qgen = GeneratorHF(model_card="Salesforce/mixqg-large", starter_file="mixqgl_clean_qg_L_1.457.bin") 272 | qgen = GeneratorHF(model_card="facebook/bart-large", starter_file="/export/home/models/bartl_clean_qg_L_1.917.bin") 273 | paragraph = "Liu Qiangdong, also known as Richard Liu, CEO of JD.com, raises his arms to celebrate the IPO for his company at the Nasdaq MarketSite, New York, May 22, 2014." 274 | 275 | for start in ["Why", "How", "What"]: 276 | print(qgen.generate([paragraph], force_start=start, max_gen_length=20)[0][0]["output_text"]) 277 | exit() 278 | 279 | # gpt2zs = GeneratorHF(model_card="gpt2-large") 280 | # document = "US President Joe Biden spoke at a news conference Thursday at the NATO headquarters in Brussels, Belgium, after meeting with other world leaders of NATO, the European Council and the G7. The key global figures are seeking to align their responses to Russia's invasion of Ukraine. The President touched upon the unity of NATO, the prospect of Russian President Vladimir Putin using chemical weapons, and the possible role of China in the conflict. Biden took questions from reporters and spoke for roughly 30 minutes. 
TL;DR:" 281 | # print(gpt2zs.generate([document], num_runs=1, max_gen_length=100)) 282 | 283 | # exit() 284 | paragraphs = ["On Tuesday, the Joint Committee on Administrative Rules (JCAR) voted against extending the Illinois Department of Public Health (IDPH) emergency rule on school mask mandates."] 285 | gen2 = GeneratorHF(model_card="gpt2-medium", starter_file="qgen/gpt2_med_newsqab_sched_logprob_1.793.bin") 286 | # gen2.eval() 287 | 288 | batch_outs2 = gen2.generate(paragraphs, max_gen_length=20, do_sample=True, num_runs=3) 289 | for outs2 in batch_outs2: 290 | print("=========") 291 | for out2 in outs2: 292 | print("[%.3f] %s" % (out2["logprob"], out2["output_text"])) 293 | print("--------") 294 | 295 | 296 | exit() 297 | gen = GeneratorHF(model_card="philippelaban/keep_it_simple") 298 | # paragraph = """A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success.""" 299 | paragraph = """Earth travels a tremendous distance in its orbit around the sun, at a speed of around 30km/s or over 108000km per hour.""" 300 | outs = gen.generate([paragraph], max_length=150, num_beams=4, do_sample=True, num_return_sequences=4)[0] 301 | for out in outs: 302 | print("[%.3f] %s" % (out["score"], out["output_text"])) 303 | print() 304 | # gens = [out["output_text"] for out in outs] 305 | # inps = [paragraph] * len(gens) 306 | 307 | inps = ["Earth travels a tremendous distance in its orbit around the sun, at a speed of around 30km/s or over 108000km per hour."] * 2 308 | gens = ["Earth travels a tremendous size in its orbit around the sun, at a speed of around 30 km/s or over 108000km.", "The experiment The Earth travels very quickly -LRB- 100,000 km per hour -RRB- around the Sun ."] 309 | 310 | print(gen.score(inps, gens)) 311 | 312 | # from model_generator import Generator 313 | import utils_misc, utils_squad 314 | utils_misc.select_freer_gpu() 315 | 316 | paragraph = "The Palazzo Pitti (Italian pronunciation: [paˈlattso ˈpitti]), in English sometimes called the Pitti Palace, is a vast, mainly Renaissance, palace in Florence, Italy. It is situated on the south side of the River Arno, a short distance from the Ponte Vecchio. The core of the present palazzo dates from 1458 and was originally the town residence of Luca Pitti an ambitious Florentine banker." 317 | 318 | answer = "Luca Pitti" 319 | 320 | marked_paragraph = utils_squad.mark_paragraph_answer(paragraph, answer, model_card="Salesforce/mixqg-large") 321 | print(">>>", marked_paragraph) 322 | 323 | gen = GeneratorHF(model_card="Salesforce/mixqg-large") 324 | 325 | gen_out = gen.generate([marked_paragraph], do_sample=False, num_beams=4) 326 | 327 | for d in gen_out[0]: 328 | print("---") 329 | print(d["output_text"]) 330 | exit() 331 | 332 | # gen = GeneratorHF(model_card="facebook/bart-base", starter_file="qgen/bartb_squad_aaware_logprob_1.531.bin") 333 | # paragraph = "asteroid soil samples \n A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 
334 | # questions = ["What was contained in the capsule that was dropped from 136,700 miles in space?"] 335 | 336 | # gen = GeneratorHF(model_card="Salesforce/mixqg-large") 337 | # paragraph = "asteroid soil samples \n A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 338 | # questions = ["What was dropped from space by Japan's Hayabusa2 spacecraft?"] 339 | 340 | # gen = GeneratorHF(model_card="microsoft/prophetnet-large-uncased-squad-qg") 341 | # paragraph = "asteroid soil samples [SEP] A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 342 | # questions = ["what was in the capsule that landed in australia?"] 343 | 344 | # gen = GeneratorHF(model_card="gpt2-medium", starter_file="qgen/gpt2m_nf_squad_aaware_1.423.bin") 345 | # paragraph = "asteroid soil samples \n A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 346 | # questions = ["What was contained in the capsule that was dropped from 136,700 miles in space?"] 347 | 348 | # paragraphs = [paragraph] * len(questions) 349 | 350 | # gen.model.eval() 351 | # print(gen.score(paragraphs, questions)) 352 | 353 | # for d in gen.generate([paragraph], num_beams=1, max_gen_length=40, compute_logprobs=True): 354 | # print(d[0]["output_text"]) 355 | 356 | # print("---------") 357 | # tokenized_paragraphs = [torch.LongTensor(gen.tokenizer.encode(text=p)) for p in paragraphs] 358 | # encoder_ids = torch.nn.utils.rnn.pad_sequence(tokenized_paragraphs, batch_first=True, padding_value=0).to(gen.device) 359 | 360 | # tokenized_questions = [gen.tokenizer.encode(text=q, add_special_tokens=False) for q in questions] 361 | # decoder_input_ids = torch.nn.utils.rnn.pad_sequence([torch.LongTensor([gen.start_id] + q) for q in tokenized_questions], batch_first=True, padding_value=gen.end_id).to(gen.device) 362 | # decoder_output_ids = torch.nn.utils.rnn.pad_sequence([torch.LongTensor(q + [gen.end_id]) for q in tokenized_questions], batch_first=True, padding_value=-1).to(gen.device) 363 | 364 | # print("=============") 365 | # print("Likelihood function") 366 | # print(decoder_input_ids.tolist()) 367 | # print(decoder_output_ids.tolist()) 368 | 369 | # print("============") 370 | 371 | # model_output = gen.model(input_ids=encoder_ids, decoder_input_ids=decoder_input_ids, return_dict=True) 372 | # selected_logprobs = utils_rl.select_logprobs(model_output.logits, decoder_output_ids.tolist(), eos_id=gen.end_id) 373 | # print("Manual selected logprobs", selected_logprobs) 374 | 375 | # crit = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction="none") 376 | # N, seqlength, vocab_size = model_output.logits.shape 377 | # loss_components = crit(model_output.logits.view(N*seqlength, vocab_size), decoder_output_ids.contiguous().view(-1)).reshape(N, seqlength) 378 | 379 | # num_words = 
torch.sum(decoder_output_ids != -1, dim=1) 380 | # score_per_item = (- torch.sum(loss_components, dim=1) / num_words).tolist() 381 | 382 | # print("Manual score per item:", score_per_item) 383 | 384 | gen = GeneratorHF(model_card="distilgpt2", starter_file="qgen/dgpt2_squad_aaware_1.794.bin") 385 | 386 | answers = ["A small capsule", "asteroid soil samples", "136,700 miles", "Australian Outback"] 387 | original = "A small capsule containing asteroid soil samples that was dropped from 136,700 miles in space by Japan's Hayabusa2 spacecraft landed as planned in the Australian Outback on December 6. The extremely high precision required to carry out the mission thrilled many in Japan, who said they took pride in its success." 388 | paragraphs = ["%s \n %s" % (answer, original) for answer in answers] 389 | 390 | gen_params = [{"num_beams": 3, "num_runs": 1}, {"num_beams": 1, "num_runs": 3, "do_sample": True}] 391 | for gen_param in gen_params: 392 | print("===============") 393 | print(gen_param) 394 | batch_outs1 = gen.generate(paragraphs, max_length=100, compute_logprobs=True, **gen_param) 395 | for ans, outs1 in zip(answers, batch_outs1): 396 | print("=========") 397 | print("Target answer:", ans) 398 | for out1 in outs1: 399 | print("[%.3f] %s" % (out1["logprob"], out1["output_text"])) 400 | print("--------") 401 | 402 | print("========================") 403 | print("========================") 404 | print("========================") 405 | -------------------------------------------------------------------------------- /Quiz_Design/qd_content.json: -------------------------------------------------------------------------------- 1 | [{"doc_id": 0, "title": "Sustainable_Energy", "content": "Energy is sustainable if it \"meets the needs of the present without compromising the ability of future generations to meet their own needs\". Most definitions of sustainable energy include considerations of environmental aspects such as greenhouse gas emissions and social and economic aspects such as energy poverty. Renewable energy sources such as wind, hydroelectric power, solar, and geothermal energy are generally far more sustainable than fossil fuel sources. However, some renewable energy projects, such as the clearing of forests to produce biofuels, can cause severe environmental damage. The role of non-renewable energy sources in sustainable energy has been controversial. Nuclear power is a low-carbon source whose historic mortality rates are comparable to wind and solar, but its sustainability has been debated because of concerns about radioactive waste, nuclear proliferation, and accidents. Switching from coal to natural gas has environmental benefits, including a lower climate impact, but may lead to a delay in switching to more sustainable options. Carbon capture and storage can be built into power plants to remove their carbon dioxide (CO2) emissions, but is expensive and has seldom been implemented.
Fossil fuels provide 85% of the world's energy consumption and the energy system is responsible for 76% of global greenhouse gas emissions. Around 790 million people in developing countries lack access to electricity and 2.6 billion rely on polluting fuels such as wood or charcoal to cook. Reducing greenhouse gas emissions to levels consistent with the 2015 Paris Agreement will require a system-wide transformation of the way energy is produced, distributed, stored, and consumed. The burning of fossil fuels and biomass is a major contributor to air pollution, which causes an estimated 7 million deaths each year. Therefore, the transition to a low-carbon energy system would have strong co-benefits for human health. Pathways exist to provide universal access to electricity and clean cooking in ways that are compatible with climate goals, while bringing major health and economic benefits to developing countries."}, {"doc_id": 1, "title": "Californium", "content": "Californium is a radioactive chemical element with the symbol Cf and atomic number 98. The element was first synthesized in 1950 at the Lawrence Berkeley National Laboratory (then the University of California Radiation Laboratory), by bombarding curium with alpha particles (helium-4 ions). It is an actinide element, the sixth transuranium element to be synthesized, and has the second-highest atomic mass of all the elements that have been produced in amounts large enough to see with the unaided eye (after einsteinium). The element was named after the university and the U.S. state of California.
Two crystalline forms exist for californium under normal pressure: one above and one below 900 \u00b0C (1,650 \u00b0F). A third form exists at high pressure. Californium slowly tarnishes in air at room temperature. Compounds of californium are dominated by the +3 oxidation state. The most stable of californium's twenty known isotopes is californium-251, which has a half-life of 898 years. This short half-life means the element is not found in significant quantities in the Earth's crust. Californium-252, with a half-life of about 2.645 years, is the most common isotope used and is produced at the Oak Ridge National Laboratory in the United States and the Research Institute of Atomic Reactors in Russia.
Californium is one of the few transuranium elements that have practical applications. Most of these applications exploit the property of certain isotopes of californium to emit neutrons. For example, californium can be used to help start up nuclear reactors, and it is employed as a source of neutrons when studying materials using neutron diffraction and neutron spectroscopy. Californium can also be used in nuclear synthesis of higher mass elements; oganesson (element 118) was synthesized by bombarding californium-249 atoms with calcium-48 ions. Users of californium must take into account radiological concerns and the element's ability to disrupt the formation of red blood cells by bioaccumulating in skeletal tissue."}, {"doc_id": 2, "title": "Statue_of_Liberty", "content": "The Statue of Liberty (Liberty Enlightening the World; French: La Libert\u00e9 \u00e9clairant le monde) is a colossal neoclassical sculpture on Liberty Island in New York Harbor in New York City, in the United States. The copper statue, a gift from the people of France to the people of the United States, was designed by French sculptor Fr\u00e9d\u00e9ric Auguste Bartholdi and its metal framework was built by Gustave Eiffel. The statue was dedicated on October 28, 1886.
The statue is a figure of Libertas, a robed Roman liberty goddess. She holds a torch above her head with her right hand, and in her left hand carries a tabula ansata inscribed JULY IV MDCCLXXVI (July 4, 1776 in Roman numerals), the date of the U.S. Declaration of Independence. A broken shackle and chain lie at her feet as she walks forward, commemorating the recent national abolition of slavery. After its dedication, the statue became an icon of freedom and of the United States, seen as a symbol of welcome to immigrants arriving by sea.
Bartholdi was inspired by a French law professor and politician, \u00c9douard Ren\u00e9 de Laboulaye, who is said to have commented in 1865 that any monument raised to U.S. independence would properly be a joint project of the French and U.S. peoples. The Franco-Prussian War delayed progress until 1875, when Laboulaye proposed that the French finance the statue and the U.S. provide the site and build the pedestal. Bartholdi completed the head and the torch-bearing arm before the statue was fully designed, and these pieces were exhibited for publicity at international expositions.
The torch-bearing arm was displayed at the Centennial Exposition in Philadelphia in 1876, and in Madison Square Park in Manhattan from 1876 to 1882. Fundraising proved difficult, especially for the Americans, and by 1885 work on the pedestal was threatened by lack of funds. Publisher Joseph Pulitzer, of the New York World, started a drive for donations to finish the project and attracted more than 120,000 contributors, most of whom gave less than a dollar (equivalent to $29 in 2020). The statue was built in France, shipped overseas in crates, and assembled on the completed pedestal on what was then called Bedloe's Island. The statue's completion was marked by New York's first ticker-tape parade and a dedication ceremony presided over by President Grover Cleveland."}, {"doc_id": 3, "title": "DNA", "content": "Deoxyribonucleic acid ( (listen); DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life.
The two DNA strands are known as polynucleotides as they are composed of simpler monomeric units called nucleotides. Each nucleotide is composed of one of four nitrogen-containing nucleobases (cytosine [C], guanine [G], adenine [A] or thymine [T]), a sugar called deoxyribose, and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds (known as the phospho-diester linkage) between the sugar of one nucleotide and the phosphate of the next, resulting in an alternating sugar-phosphate backbone. The nitrogenous bases of the two separate polynucleotide strands are bound together, according to base pairing rules (A with T and C with G), with hydrogen bonds to make double-stranded DNA. The complementary nitrogenous bases are divided into two groups, pyrimidines and purines. In DNA, the pyrimidines are thymine and cytosine; the purines are adenine and guanine.
Both strands of double-stranded DNA store the same biological information. This information is replicated when the two strands separate. A large part of DNA (more than 98% for humans) is non-coding, meaning that these sections do not serve as patterns for protein sequences. The two strands of DNA run in opposite directions to each other and are thus antiparallel. Attached to each sugar is one of four types of nucleobases (or bases). It is the sequence of these four nucleobases along the backbone that encodes genetic information. RNA strands are created using DNA strands as a template in a process called transcription, where DNA bases are exchanged for their corresponding bases except in the case of thymine (T), for which RNA substitutes uracil (U). Under the genetic code, these RNA strands specify the sequence of amino acids within proteins in a process called translation."}, {"doc_id": 4, "title": "Palazzo_Pitti", "content": "The Palazzo Pitti (Italian pronunciation: [pa\u02c8lattso \u02c8pitti]), in English sometimes called the Pitti Palace, is a vast, mainly Renaissance, palace in Florence, Italy. It is situated on the south side of the River Arno, a short distance from the Ponte Vecchio. The core of the present palazzo dates from 1458 and was originally the town residence of Luca Pitti, an ambitious Florentine banker.
The palace was bought by the Medici family in 1549 and became the chief residence of the ruling families of the Grand Duchy of Tuscany. It grew as a great treasure house as later generations amassed paintings, plates, jewelry and luxurious possessions.
In the late 18th century, the palazzo was used as a power base by Napoleon and later served for a brief period as the principal royal palace of the newly united Italy. The palace and its contents were donated to the Italian people by King Victor Emmanuel III in 1919.
The palazzo is now the largest museum complex in Florence. The principal palazzo block, often in a building of this design known as the corps de logis, is 32,000 square metres. It is divided into several principal galleries or museums detailed below.


== History ==


=== Early history ===

The construction of this severe and forbidding building was commissioned in 1458 by the Florentine banker Luca Pitti (1398\u20131472), a principal supporter and friend of Cosimo de' Medici. The early history of the Palazzo Pitti is a mixture of fact and myth. Pitti is alleged to have instructed that the windows be larger than the entrance of the Palazzo Medici. The 16th-century art historian Giorgio Vasari proposed that Brunelleschi was the palazzo's architect, and that his pupil Luca Fancelli was merely his assistant in the task, but today it is Fancelli who is generally credited. Besides obvious differences from the elder architect's style, Brunelleschi died 12 years before construction of the palazzo began. The design and fenestration suggest that the unknown architect was more experienced in utilitarian domestic architecture than in the humanist rules defined by Alberti in his book De Re Aedificatoria.Though impressive, the original palazzo would have been no rival to the Florentine Medici residences in terms of either size or content. Whoever the architect of the Palazzo Pitti was, he was moving against the contemporary flow of fashion. The rusticated stonework gives the palazzo a severe and powerful atmosphere, reinforced by the three-times-repeated series of seven arch-headed apertures, reminiscent of a Roman aqueduct. The Roman-style architecture appealed to the Florentine love of the new style all'antica. This original design has withstood the test of time: the repetitive formula of the fa\u00e7ade was continued during the subsequent additions to the palazzo, and its influence can be seen in numerous 16th-century imitations and 19th-century revivals. Work stopped after Pitti suffered financial losses following the death of Cosimo de' Medici in 1464. Luca Pitti died in 1472 with the building unfinished."}, {"doc_id": 5, "title": "Enzyme", "content": "Enzymes () are proteins that act as biological catalysts (biocatalysts). Catalysts accelerate chemical reactions. The molecules upon which enzymes may act are called substrates, and the enzyme converts the substrates into different molecules known as products. Almost all metabolic processes in the cell need enzyme catalysis in order to occur at rates fast enough to sustain life.:\u200a8.1\u200a Metabolic pathways depend upon enzymes to catalyze individual steps. The study of enzymes is called enzymology and the field of pseudoenzyme analysis recognizes that during evolution, some enzymes have lost the ability to carry out biological catalysis, which is often reflected in their amino acid sequences and unusual 'pseudocatalytic' properties.Enzymes are known to catalyze more than 5,000 biochemical reaction types. Other biocatalysts are catalytic RNA molecules, called ribozymes. Enzymes' specificity comes from their unique three-dimensional structures.
Like all catalysts, enzymes increase the reaction rate by lowering its activation energy. Some enzymes can make their conversion of substrate to product occur many millions of times faster. An extreme example is orotidine 5'-phosphate decarboxylase, which allows a reaction that would otherwise take millions of years to occur in milliseconds. Chemically, enzymes are like any catalyst and are not consumed in chemical reactions, nor do they alter the equilibrium of a reaction. Enzymes differ from most other catalysts by being much more specific. Enzyme activity can be affected by other molecules: inhibitors are molecules that decrease enzyme activity, and activators are molecules that increase activity. Many therapeutic drugs and poisons are enzyme inhibitors. An enzyme's activity decreases markedly outside its optimal temperature and pH, and many enzymes are (permanently) denatured when exposed to excessive heat, losing their structure and catalytic properties.
Some enzymes are used commercially, for example, in the synthesis of antibiotics. Some household products use enzymes to speed up chemical reactions: enzymes in biological washing powders break down protein, starch or fat stains on clothes, and enzymes in meat tenderizer break down proteins into smaller molecules, making the meat easier to chew."}, {"doc_id": 6, "title": "Cretaceous\u2013Paleogene_extinction_event", "content": "The Cretaceous\u2013Paleogene (K\u2013Pg) extinction event (also known as the Cretaceous\u2013Tertiary (K\u2013T) extinction) was a sudden mass extinction of three-quarters of the plant and animal species on Earth, approximately 66 million years ago. With the exception of some ectothermic species such as sea turtles and crocodilians, no tetrapods weighing more than 25 kilograms (55 pounds) survived. It marked the end of the Cretaceous period, and with it the Mesozoic Era, while heralding the beginning of the Cenozoic Era, which continues to this day.
In the geologic record, the K\u2013Pg event is marked by a thin layer of sediment called the K\u2013Pg boundary, which can be found throughout the world in marine and terrestrial rocks. The boundary clay shows unusually high levels of the metal iridium, which is more common in asteroids than in the Earth's crust.As originally proposed in 1980 by a team of scientists led by Luis Alvarez and his son Walter, it is now generally thought that the K\u2013Pg extinction was caused by the impact of a massive comet or asteroid 10 to 15 km (6 to 9 mi) wide, 66 million years ago, which devastated the global environment, mainly through a lingering impact winter which halted photosynthesis in plants and plankton. The impact hypothesis, also known as the Alvarez hypothesis, was bolstered by the discovery of the 180 km (112 mi) Chicxulub crater in the Gulf of Mexico's Yucat\u00e1n Peninsula in the early 1990s, which provided conclusive evidence that the K\u2013Pg boundary clay represented debris from an asteroid impact. The fact that the extinctions occurred simultaneously provides strong evidence that they were caused by the asteroid. A 2016 drilling project into the Chicxulub peak ring confirmed that the peak ring comprised granite ejected within minutes from deep in the earth, but contained hardly any gypsum, the usual sulfate-containing sea floor rock in the region: the gypsum would have vaporized and dispersed as an aerosol into the atmosphere, causing longer-term effects on the climate and food chain. In October 2019, researchers reported that the event rapidly acidified the oceans, producing ecological collapse and, in this way as well, produced long-lasting effects on the climate, and accordingly was a key reason for the mass extinction at the end of the Cretaceous. 
In January 2020, scientists reported that climate-modeling of the extinction event favors the asteroid impact and not volcanism.Other causal or contributing factors to the extinction may have been the Deccan Traps and other volcanic eruptions, climate change, and sea level change."}] -------------------------------------------------------------------------------- /Quiz_Design/requirements.txt: -------------------------------------------------------------------------------- 1 | torch 2 | transformers 3 | flask -------------------------------------------------------------------------------- /Quiz_Design/run_flask_server.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, request, render_template, send_from_directory 2 | from model_hf_generator import GeneratorHF 3 | from datetime import datetime, timedelta 4 | import os, random, json, flask 5 | 6 | CACHE_FILE = "qd_cache.json" 7 | ANNOT_FILE = "qd_annotations_running.jsonl" 8 | CONTENT_FILE = "qd_content.json" 9 | 10 | def load_question_cache(): 11 | if os.path.exists(CACHE_FILE): 12 | with open(CACHE_FILE, "r") as f: 13 | return json.load(f) 14 | else: 15 | return {} 16 | 17 | def mark_paragraph_answer(paragraph, answer, model_card=""): 18 | if "prophetnet" in model_card: 19 | return "%s [SEP] %s" % (answer, paragraph) 20 | elif "mixqg" in model_card: 21 | return f"{answer} \\n {paragraph}" 22 | else: 23 | return "%s \n %s" % (answer, paragraph) # The default, used for our trained models 24 | 25 | def save_question_cache(): 26 | with open(CACHE_FILE, "w") as f: 27 | json.dump(cached_questions, f) 28 | 29 | def deduplicate_questions(questions): 30 | M = {} 31 | for q in questions: 32 | if q["question"] not in M: 33 | M[q["question"]] = [] 34 | M[q["question"]].append(q["model_name"]) 35 | return [{"model_name": "|".join(v), "question": k} for k, v in M.items()] 36 | 37 | def load_qgen_models(): 38 | global QGEN_MODELS, scorer 39 | QGEN_MODELS = [ 40 | # {"model_name": "dgpt2_sup", "model": GeneratorHF("distilgpt2", starter_file="qgen/dgpt2_squad_aaware_1.794.bin")}, 41 | # {"model_name": "gpt2b_sup", "model": GeneratorHF("gpt2", starter_file="qgen/gpt2b_squad_aaware_1.575.bin")}, 42 | # {"model_name": "bartb_sup", "model": GeneratorHF("facebook/bart-base", starter_file="qgen/bartb_nf_squad_aaware_1.492.bin")}, 43 | # {"model_name": "bartl_sup", "model": GeneratorHF("facebook/bart-large", starter_file="qgen/bartL_nf_squad_aaware_1.290.bin")}, 44 | # {"model_name": "gpt2m_sup", "model": GeneratorHF("gpt2-medium", starter_file="qgen/gpt2m_nf_squad_aaware_1.423.bin")}, 45 | {"model_name": "mixqg-base", "model": GeneratorHF(model_card='Salesforce/mixqg-base')}, 46 | {"model_name": "mixqg-large", "model": GeneratorHF(model_card='Salesforce/mixqg-large')}, 47 | {"model_name": "prophetnet", "model": GeneratorHF(model_card='microsoft/prophetnet-large-uncased-squad-qg')} 48 | ] 49 | 50 | print("Qgen models loaded") 51 | 52 | app = Flask(__name__) 53 | app.config["TEMPLATES_AUTO_RELOAD"] = True 54 | 55 | QGEN_MODELS = [] 56 | scorer = None 57 | 58 | load_qgen_models() 59 | cached_questions = load_question_cache() 60 | 61 | @app.before_request 62 | def before_request(): 63 | user_id = -1 64 | if "user_id" in flask.request.cookies: 65 | try: 66 | user_id = int(flask.request.cookies["user_id"]) 67 | except: 68 | pass 69 | 70 | if user_id < 0: 71 | max_user_id = 0 72 | if os.path.exists(ANNOT_FILE): 73 | with open(ANNOT_FILE, "r") as f: 74 | for line in f: 75 | obj = json.loads(line) 76 | 
max_user_id = max(max_user_id, obj.get("user_id", -1)) 77 | user_id = max_user_id + 1 78 | flask.request.user_id = user_id 79 | 80 | @app.after_request 81 | def after_request(response): 82 | response.headers.add('Access-Control-Allow-Origin', '*') 83 | response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization') 84 | response.headers.add('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS') 85 | response.set_cookie("user_id", value=str(flask.request.user_id), expires=datetime.now() + timedelta(days=365)) 86 | return response 87 | 88 | @app.route("/") 89 | def api_home_page(): 90 | return render_template("main_page.html") 91 | 92 | @app.route('/static/') 93 | def send_static(path): 94 | return send_from_directory('static', path) 95 | 96 | @app.route("/api/load_documents") 97 | def api_load_document(): 98 | with open(CONTENT_FILE, "r") as f: 99 | data = json.load(f) 100 | return {"documents": data} 101 | 102 | @app.route("/api/gen_questions", methods=["POST"]) 103 | def api_gen_questions(): 104 | request_data = dict(request.form) 105 | 106 | doc_id = int(request_data["doc_id"]) 107 | context = request_data["context"] 108 | answer_span = request_data["selection"] 109 | 110 | paragraphs = context.split("
") 111 | relevant_paragraphs = [p for p in paragraphs if answer_span in p] 112 | if len(relevant_paragraphs) == 0: 113 | return [] 114 | else: 115 | question_key = "%d||%s" % (doc_id, answer_span) 116 | if question_key not in cached_questions: 117 | relevant_paragraph = relevant_paragraphs[0] 118 | response = [] 119 | for model in QGEN_MODELS: 120 | marked_paragraph = mark_paragraph_answer(relevant_paragraph, answer_span, model_card=model["model"].model_card) 121 | 122 | questions = model["model"].generate([marked_paragraph], max_gen_length=30, num_beams=2)[0] 123 | question = questions[0]["output_text"] 124 | question = question[0].upper() + question[1:] 125 | response.append({"model_name": model["model_name"], "question": question}) 126 | 127 | response = deduplicate_questions(response) 128 | cached_questions[question_key] = response 129 | save_question_cache() 130 | else: 131 | print("Reloaded from the cache") 132 | 133 | response = cached_questions[question_key] 134 | 135 | random.shuffle(response) 136 | return {"response": response} 137 | 138 | @app.route("/api/annotate_questions", methods=["POST"]) 139 | def api_annotate_questions(): 140 | request_data = dict(request.form) 141 | request_data["questions"] = json.loads(request_data["questions"].strip()) 142 | 143 | ip_addr = request.remote_addr 144 | saved_object = {"timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"), "ip_addr": ip_addr} 145 | saved_object["user_id"] = request.user_id 146 | saved_object["doc_id"] = request_data["doc_id"] 147 | saved_object["answer_span"] = request_data["answer_span"] 148 | saved_object["answer_span_idx"] = request_data["answer_span_idx"] 149 | saved_object["questions"] = request_data["questions"] 150 | saved_object["annotator_name"] = request_data["annotator_name"] 151 | 152 | with open(ANNOT_FILE, "a") as f: 153 | f.write(json.dumps(saved_object) + "\n") 154 | return {"response": 1} 155 | 156 | @app.route("/api/cancel_selection", methods=["POST"]) 157 | def api_cancel_selection(): 158 | request_data = dict(request.form) 159 | 160 | print("Delete request: ", request_data["doc_id"], request_data["answer_span"], request_data["annotator_name"]) 161 | if os.path.exists(ANNOT_FILE): 162 | final_annotations = [] 163 | num_deleted = 0 164 | with open(ANNOT_FILE, "r") as f: 165 | for line in f: 166 | obj = json.loads(line) 167 | if obj["doc_id"] == request_data["doc_id"] and obj["answer_span"] == request_data["answer_span"] and obj["annotator_name"] == request_data["annotator_name"]: 168 | num_deleted += 1 169 | else: 170 | final_annotations.append(obj) 171 | 172 | print("Num rows deleted: %d" % (num_deleted)) 173 | with open(ANNOT_FILE, "w") as f: 174 | for obj in final_annotations: 175 | f.write(json.dumps(obj) + "\n") 176 | 177 | return {"response": 1} 178 | -------------------------------------------------------------------------------- /Quiz_Design/static/Quiz_Design_Tutorial.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/salesforce/QGen/e0beb712cfb82316b04f12a01549d7544b37ddd0/Quiz_Design/static/Quiz_Design_Tutorial.mp4 -------------------------------------------------------------------------------- /Quiz_Design/static/live.js: -------------------------------------------------------------------------------- 1 | /* 2 | Live.js - One script closer to Designing in the Browser 3 | Written for Handcraft.com by Martin Kool (@mrtnkl). 4 | 5 | Version 4. 6 | Recent change: Made stylesheet and mimetype checks case insensitive. 
7 | 8 | http://livejs.com 9 | http://livejs.com/license (MIT) 10 | @livejs 11 | 12 | Include live.js#css to monitor css changes only. 13 | Include live.js#js to monitor js changes only. 14 | Include live.js#html to monitor html changes only. 15 | Mix and match to monitor a preferred combination such as live.js#html,css 16 | 17 | By default, just include live.js to monitor all css, js and html changes. 18 | 19 | Live.js can also be loaded as a bookmarklet. It is best to only use it for CSS then, 20 | as a page reload due to a change in html or css would not re-include the bookmarklet. 21 | To monitor CSS and be notified that it has loaded, include it as: live.js#css,notify 22 | */ 23 | (function () { 24 | 25 | var headers = { "Etag": 1, "Last-Modified": 1, "Content-Length": 1, "Content-Type": 1 }, 26 | resources = {}, 27 | pendingRequests = {}, 28 | currentLinkElements = {}, 29 | oldLinkElements = {}, 30 | interval = 1000, 31 | loaded = false, 32 | active = { "html": 1, "css": 1, "js": 1 }; 33 | 34 | var Live = { 35 | 36 | // performs a cycle per interval 37 | heartbeat: function () { 38 | if (document.body) { 39 | // make sure all resources are loaded on first activation 40 | if (!loaded) Live.loadresources(); 41 | Live.checkForChanges(); 42 | } 43 | setTimeout(Live.heartbeat, interval); 44 | }, 45 | 46 | // loads all local css and js resources upon first activation 47 | loadresources: function () { 48 | 49 | // helper method to assert if a given url is local 50 | function isLocal(url) { 51 | var loc = document.location, 52 | reg = new RegExp("^\\.|^\/(?!\/)|^[\\w]((?!://).)*$|" + loc.protocol + "//" + loc.host); 53 | return url.match(reg); 54 | } 55 | 56 | // gather all resources 57 | var scripts = document.getElementsByTagName("script"), 58 | links = document.getElementsByTagName("link"), 59 | uris = []; 60 | 61 | // track local js urls 62 | for (var i = 0; i < scripts.length; i++) { 63 | var script = scripts[i], src = script.getAttribute("src"); 64 | if (src && isLocal(src)) 65 | uris.push(src); 66 | if (src && src.match(/\blive.js#/)) { 67 | for (var type in active) 68 | active[type] = src.match("[#,|]" + type) != null 69 | if (src.match("notify")) 70 | alert("Live.js is loaded."); 71 | } 72 | } 73 | if (!active.js) uris = []; 74 | if (active.html) uris.push(document.location.href); 75 | 76 | // track local css urls 77 | for (var i = 0; i < links.length && active.css; i++) { 78 | var link = links[i], rel = link.getAttribute("rel"), href = link.getAttribute("href", 2); 79 | if (href && rel && rel.match(new RegExp("stylesheet", "i")) && isLocal(href)) { 80 | uris.push(href); 81 | currentLinkElements[href] = link; 82 | } 83 | } 84 | 85 | // initialize the resources info 86 | for (var i = 0; i < uris.length; i++) { 87 | var url = uris[i]; 88 | Live.getHead(url, function (url, info) { 89 | resources[url] = info; 90 | }); 91 | } 92 | 93 | // add rule for morphing between old and new css files 94 | var head = document.getElementsByTagName("head")[0], 95 | style = document.createElement("style"), 96 | rule = "transition: all .3s ease-out;" 97 | css = [".livejs-loading * { ", rule, " -webkit-", rule, "-moz-", rule, "-o-", rule, "}"].join(''); 98 | style.setAttribute("type", "text/css"); 99 | head.appendChild(style); 100 | style.styleSheet ? 
style.styleSheet.cssText = css : style.appendChild(document.createTextNode(css)); 101 | 102 | // yep 103 | loaded = true; 104 | }, 105 | 106 | // check all tracking resources for changes 107 | checkForChanges: function () { 108 | for (var url in resources) { 109 | if (pendingRequests[url]) 110 | continue; 111 | 112 | Live.getHead(url, function (url, newInfo) { 113 | var oldInfo = resources[url], 114 | hasChanged = false; 115 | resources[url] = newInfo; 116 | for (var header in oldInfo) { 117 | // do verification based on the header type 118 | var oldValue = oldInfo[header], 119 | newValue = newInfo[header], 120 | contentType = newInfo["Content-Type"]; 121 | switch (header.toLowerCase()) { 122 | case "etag": 123 | if (!newValue) break; 124 | // fall through to default 125 | default: 126 | hasChanged = oldValue != newValue; 127 | break; 128 | } 129 | // if changed, act 130 | if (hasChanged) { 131 | Live.refreshResource(url, contentType); 132 | break; 133 | } 134 | } 135 | }); 136 | } 137 | }, 138 | 139 | // act upon a changed url of certain content type 140 | refreshResource: function (url, type) { 141 | switch (type.toLowerCase()) { 142 | // css files can be reloaded dynamically by replacing the link element 143 | case "text/css": 144 | var link = currentLinkElements[url], 145 | html = document.body.parentNode, 146 | head = link.parentNode, 147 | next = link.nextSibling, 148 | newLink = document.createElement("link"); 149 | 150 | html.className = html.className.replace(/\s*livejs\-loading/gi, '') + ' livejs-loading'; 151 | newLink.setAttribute("type", "text/css"); 152 | newLink.setAttribute("rel", "stylesheet"); 153 | newLink.setAttribute("href", url + "?now=" + new Date() * 1); 154 | next ? head.insertBefore(newLink, next) : head.appendChild(newLink); 155 | currentLinkElements[url] = newLink; 156 | oldLinkElements[url] = link; 157 | 158 | // schedule removal of the old link 159 | Live.removeoldLinkElements(); 160 | break; 161 | 162 | // check if an html resource is our current url, then reload 163 | case "text/html": 164 | if (url != document.location.href) 165 | return; 166 | 167 | // local javascript changes cause a reload as well 168 | case "text/javascript": 169 | case "application/javascript": 170 | case "application/x-javascript": 171 | document.location.reload(); 172 | } 173 | }, 174 | 175 | // removes the old stylesheet rules only once the new one has finished loading 176 | removeoldLinkElements: function () { 177 | var pending = 0; 178 | for (var url in oldLinkElements) { 179 | // if this sheet has any cssRules, delete the old link 180 | try { 181 | var link = currentLinkElements[url], 182 | oldLink = oldLinkElements[url], 183 | html = document.body.parentNode, 184 | sheet = link.sheet || link.styleSheet, 185 | rules = sheet.rules || sheet.cssRules; 186 | if (rules.length >= 0) { 187 | oldLink.parentNode.removeChild(oldLink); 188 | delete oldLinkElements[url]; 189 | setTimeout(function () { 190 | html.className = html.className.replace(/\s*livejs\-loading/gi, ''); 191 | }, 100); 192 | } 193 | } catch (e) { 194 | pending++; 195 | } 196 | if (pending) setTimeout(Live.removeoldLinkElements, 50); 197 | } 198 | }, 199 | 200 | // performs a HEAD request and passes the header info to the given callback 201 | getHead: function (url, callback) { 202 | pendingRequests[url] = true; 203 | var xhr = window.XMLHttpRequest ? 
new XMLHttpRequest() : new ActiveXObject("Microsoft.XmlHttp"); 204 | xhr.open("HEAD", url, true); 205 | xhr.onreadystatechange = function () { 206 | delete pendingRequests[url]; 207 | if (xhr.readyState == 4 && xhr.status != 304) { 208 | xhr.getAllResponseHeaders(); 209 | var info = {}; 210 | for (var h in headers) { 211 | var value = xhr.getResponseHeader(h); 212 | // adjust the simple Etag variant to match on its significant part 213 | if (h.toLowerCase() == "etag" && value) value = value.replace(/^W\//, ''); 214 | if (h.toLowerCase() == "content-type" && value) value = value.replace(/^(.*?);.*?$/i, "$1"); 215 | info[h] = value; 216 | } 217 | callback(url, info); 218 | } 219 | } 220 | xhr.send(); 221 | } 222 | }; 223 | 224 | // start listening 225 | if (document.location.protocol != "file:") { 226 | if (!window.liveJsLoaded) 227 | Live.heartbeat(); 228 | 229 | window.liveJsLoaded = true; 230 | } 231 | else if (window.console) 232 | console.log("Live.js doesn't support the file protocol. It needs http."); 233 | })(); -------------------------------------------------------------------------------- /Quiz_Design/static/main.css: -------------------------------------------------------------------------------- 1 | @import url('https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100;0,300;0,400;0,500;0,700;0,900;1,400&display=swap'); 2 | body { 3 | margin: 0px; 4 | padding: 0px; 5 | font-family: "Roboto"; 6 | } 7 | #header { 8 | font-size: 40px; 9 | line-height: 80px; 10 | padding-left: 100px; 11 | background: #f0f0f0; 12 | margin-bottom: 40px; 13 | } 14 | #content { 15 | margin: auto; 16 | width: 850px; 17 | } 18 | #column1 { 19 | float: left; 20 | line-height: 1.8; 21 | width: 500px; 22 | margin-bottom: 50px; 23 | } 24 | #column2 { 25 | display: none; 26 | position: fixed; 27 | top: 160px; 28 | right: 30px; 29 | width: 300px; 30 | box-sizing: border-box; 31 | padding-left: 30px; 32 | background: white; 33 | } 34 | .unconfirmed_span { 35 | background: rgba(0, 0, 0, 0.07); 36 | padding: 5px 10px; 37 | } 38 | .unconfirmed_span button { 39 | background: white; 40 | cursor: pointer; 41 | } 42 | .confirmed_span { 43 | background: rgba(0, 120, 255, 0.2); 44 | padding: 5px 10px; 45 | cursor: pointer; 46 | border-bottom: 2px solid transparent; 47 | } 48 | .confirmed_span:hover { 49 | background: rgba(0, 120, 255, 0.25); 50 | } 51 | .active_span { 52 | border-bottom: 2px solid rgba(0, 120, 255, 0.75); 53 | } 54 | .column_title { 55 | font-size: 30px; 56 | } 57 | #no_questions { 58 | font-style: italic; 59 | margin-top: 30px; 60 | } 61 | .question_removed { 62 | opacity: 0.5; 63 | } 64 | .question_removed .question { 65 | text-decoration: line-through; 66 | } 67 | #loading { 68 | display: none; 69 | } 70 | #loading img { 71 | width: 80px; 72 | margin-left: 75px; 73 | } 74 | #documents_row { 75 | font-size: 25px; 76 | height: 40px; 77 | background: #e0e0e0; 78 | margin-left: -100px; 79 | padding-left: 100px; 80 | line-height: 40px; 81 | } 82 | #documents_list { 83 | display: inline-block; 84 | cursor: pointer; 85 | vertical-align: top; 86 | } 87 | .document_title { 88 | display: inline-block; 89 | font-size: 16px; 90 | line-height: 16px; 91 | padding: 5px 10px; 92 | margin: 7px 10px; 93 | background: rgba(0, 0, 0, 0.07); 94 | border-radius: 8px; 95 | } 96 | .document_title:hover { 97 | text-decoration: underline; 98 | } 99 | .active_document_title { 100 | background: rgba(0, 0, 0, 0.14); 101 | } 102 | #labeler_name, .document_select, #open_tutorial_btn { 103 | font-size: 18px; 104 | margin-left: 
10px; 105 | margin-top: 8px; 106 | padding-left: 7px; 107 | background: rgba(0, 0, 0, 0.07); 108 | border: 0px; 109 | width: 200px; 110 | } 111 | #open_tutorial_btn { 112 | cursor: pointer; 113 | } 114 | .q_option { 115 | margin: 10px 0px; 116 | padding: 5px; 117 | line-height: 1.0; 118 | position: relative; 119 | overflow: hidden; 120 | box-sizing: border-box; 121 | min-height: 64px; 122 | background: #f5f5f5; 123 | border-radius: 10px; 124 | } 125 | .remove_q { 126 | color: #aa0000; 127 | cursor: pointer; 128 | font-size: 20px; 129 | vertical-align: middle; 130 | position: absolute; 131 | width: 20px; 132 | height: 20px; 133 | transform: translateY(-50%); 134 | top: 50%; 135 | text-align: right; 136 | } 137 | .question { 138 | margin-left: 35px; 139 | } 140 | .q_reasons { 141 | text-align: center; 142 | line-height: 2.0; 143 | float: right; 144 | width: calc( 100% - 20px); 145 | display: none; 146 | } 147 | .q_reasons span { 148 | padding: 0px 10px; 149 | cursor: pointer; 150 | white-space: nowrap; 151 | color: #990000; 152 | } 153 | .q_reasons span:hover { 154 | text-decoration: underline; 155 | } 156 | .cancel_btn { 157 | padding-right: 7px; 158 | font-weight: bold; 159 | font-size: 14px; 160 | } -------------------------------------------------------------------------------- /Quiz_Design/static/slideshow.css: -------------------------------------------------------------------------------- 1 | #tutorial_container { 2 | position: absolute; 3 | top: 0px; left: 0px; 4 | width: 100%; height: 100%; 5 | background: rgba(0, 0, 0, 0.6); 6 | z-index: 100; 7 | } 8 | #tutorial { 9 | position: absolute; 10 | width: 750px; 11 | height: 550px; 12 | transform: translate(-50%, -50%); 13 | top: 50%; left: 50%; 14 | background: #e0e0e0; 15 | box-sizing: border-box; 16 | padding: 40px; 17 | } 18 | #tuto_close_button { 19 | position: absolute; 20 | top: 10px; right: 10px; 21 | cursor: pointer; 22 | z-index: 101; 23 | } 24 | 25 | #tutorial .title { 26 | font-size: 20px; 27 | margin-bottom: 20px; 28 | } 29 | #tutorial .content { 30 | font-size: 14px; 31 | line-height: 1.5; 32 | } 33 | 34 | #tuto_video { 35 | width: 700px; 36 | height: 400px; 37 | margin-left: -40px; 38 | margin-right: -40px; 39 | } -------------------------------------------------------------------------------- /Quiz_Design/templates/main_page.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Quiz Design 4 | 5 | 6 | 7 | 8 |
9 |
10 | 11 |
12 |
13 |
Introduction Video
14 |
15 | 18 |
19 |
20 |
21 |
1. Introduction to the Quiz Design Helper
22 |
23 | Your objective is to design a quiz about a particular topic for a class of students. The procedure is the following:

24 | 25 | 1) Select a quiz topic from the list (for example "Sustainable Energy")

26 | 2) The system will load a text about the topic. 27 |
28 |
29 |
30 |
31 |
2. Quiz Concepts and Questions
32 | 3) Select a concept that you want to quiz your students on (for example a phrase, a figure, or a keyword) and confirm your selection.

33 | 34 | 4) Important: It is recommended to select shorter concepts, not full sentences, to obtain more precise questions. Selecting concepts of up to about 8 words is ideal.

35 | 36 | 5) The system will load a list of questions that attempt to quiz students about the selected concept.

37 | 38 | 6) Go over each question, and remove ones you would not include in your quiz. We will next go over types of questions that should be removed.

39 | 40 | 7) Important: you can keep one, several, or none of the questions (if none of them are satisfactory). For each question you remove, you must choose the reason the question is unsatisfactory (more on this later).
41 |
42 |
43 |
44 |
3. Number of Questions in Quiz
45 |
46 | 8) Once you've finalized the questions for a concept, select another concept and repeat the question selection process. Try to select 8-12 concepts per topic so that the quizzes are long enough.

47 | 48 | 9) Once you've finished a full quiz set, you can move on to another quiz topic. We have found that in one hour, you should be able to complete the quizzes for 5 topics. 49 |
50 |
51 |
52 |
4. Reasons to remove questions
53 |
54 | There are three main reasons to remove a question: (I) Disfluent - the question is not fluent, (II) Wrong Answer - the question is not about the target concept, (III) Inadequate - the question phrasing makes it inadequate for use in a quiz.

55 | We next go over examples of each question error type. 56 |
57 |
58 |
59 |
5. Error Type I: Question is not Fluent
60 |
61 | The question can be disfluent for several reasons, including: (1) not being phrased as a question, (2) excessive repetition, (3) awkward phrasing, or (4) using the wrong verb tense. 62 | Given the following paragraph:
63 | The mammoth was identified as an extinct species of elephant by Georges Cuvier in 1796. 64 | And targeting the concept: "in 1796". Here are some disfluent questions: 65 |
    66 |
  • The mammoth was identified as an extinct species that year. (Not phrased as a question)
  • 67 |
  • When did the mammoth the mammoth go extinct? (excessive repetition)
  • 68 |
  • When did the mammoth die? (awkward phrasing)
  • 69 |
  • When will the mammoth go extinct? (Wrong verb tense)
  • 70 |
71 |
72 |
73 |
74 |
6. Error Type II: Question's Answer is not the Concept
75 |
76 | Even though a question is fluent, it might not be about the selected target concept. In our previous example with the following context:
77 | The mammoth was identified as an extinct species of elephant by Georges Cuvier in 1796. 78 | And targeting the concept: "in 1796". Here are some questions that do not target the concept: 79 |
    80 |
  • Who identified the mammoth as an extinct species? (Wrong answer: the answer is "Georges Cuvier", not "in 1796")
  • 81 |
  • When did the white rhinoceros go extinct? (Unanswerable: the question cannot be answered from the given context)
  • 82 |
83 |
84 |
85 |
86 |
7. Error Type III: Question Phrasing is Inadequate
87 |
88 | A question might technically be fluent and answered by the target concept, but its phrasing can still feel wrong. In our previous example with the following context:
89 | The mammoth was identified as an extinct species of elephant by Georges Cuvier in 1796. 90 | And targeting the concept: "in 1796". Here are some inadequate questions: 91 |
    92 |
  • In what year did Georges Cuvier identify the mammoth as an extinct species of elephant? (Too specific: on a quiz, we would not reveal this much information in the question)
  • 93 |
  • When did they go extinct? (Not specific enough: the question is vague)
  • 94 |
  • In 1796, when were mammoths identified as extinct? (Reveals the answer to the question)
  • 95 |
  • When did mammoths go extinct for a second time? (Inconsistent: claims information that is not present in the context)
  • 96 |
97 |
98 |
99 |
100 |
8. High-Level Summary
101 |
102 | If a question feels incorrect and you would not include it in a quiz for your students, remove it and select the closest reason for the removal: Disfluent, Wrong Answer, or Inadequate.

103 | You should try to build quizzes with 8-12 concepts per document. 104 | 105 | If you have any questions, please contact Philippe Laban on Slack!

106 |
107 | 108 |
109 |
110 |
111 |
112 |
113 |
114 | 115 | 116 | 130 |
131 |
132 |
133 |
134 | 135 |
136 |
137 |
138 |
139 | Quiz Questions 140 |
141 |
142 | Select a text span to see question options. 143 |
144 |
145 | Loading... 146 |
147 |
148 | 149 |
150 |
151 | 152 | 153 | 154 | 155 | 358 | 359 | -------------------------------------------------------------------------------- /Quiz_Design/utils_qd_data.py: -------------------------------------------------------------------------------- 1 | from datetime import datetime 2 | import json, os 3 | 4 | def load_qd_annotations(): 5 | annotations = [] 6 | with open("quiz_design_data.jsonl", "r") as f: 7 | for line in f: 8 | annotations.append(json.loads(line)) 9 | 10 | for d in annotations: 11 | d["timestamp"] = datetime.strptime(d["timestamp"], "%Y-%m-%d %H:%M:%S") 12 | d["doc_id"] = int(d["doc_id"]) 13 | 14 | # Only keep the last annotation (as we store each step purposefully for timing) 15 | annotations = sorted(annotations, key=lambda a: a["timestamp"]) 16 | M = {} 17 | for d in annotations: 18 | k = "%d||%s||%s" % (d["user_id"], d["doc_id"], d["answer_span"]) 19 | M[k] = d 20 | 21 | unique_annotations = sorted(M.values(), key=lambda a: a["timestamp"]) 22 | return unique_annotations 23 | 24 | def build_qd_groups(annotations): 25 | with open("qd_content.json", "r") as f: 26 | evaluation_texts = json.load(f) 27 | 28 | groups = [] 29 | for annot in annotations: 30 | answer_span = annot["answer_span"] 31 | document = evaluation_texts[annot["doc_id"]]["content"] 32 | paragraphs = document.split("
") 33 | relevant_paragraphs = [p for p in paragraphs if answer_span in p] 34 | relevant_paragraph = relevant_paragraphs[0] 35 | 36 | questions = [] 37 | for q in annot["questions"]: 38 | label = 1 if "removed" not in q or q["removed"] is False else 0 39 | reason = q.get("reason", "No error") 40 | questions.append({"question": q["question"], "label": label, "reason": reason, "answer_span": answer_span, "model_name": q["model_name"]}) 41 | 42 | d = {"doc_id": annot["doc_id"], "answer_span": answer_span, "context": relevant_paragraph, "questions": questions} 43 | groups.append(d) 44 | return groups -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Question Generation 2 | 3 | * [Quiz Design: Helping Teachers Create Quizzes with Automated Question Generation](./Quiz_Design) 4 | * [MixQG: Neural Question Generation with Mixed Answer Types](./MixQG) 5 | -------------------------------------------------------------------------------- /SECURITY.md: -------------------------------------------------------------------------------- 1 | ## Security 2 | 3 | Please report any security issue to [security@salesforce.com](mailto:security@salesforce.com) 4 | as soon as it is discovered. This library limits its runtime dependencies in 5 | order to reduce the total cost of ownership as much as can be, but all consumers 6 | should remain vigilant and have their security stakeholders review all third-party 7 | products (3PP) like this one and their dependencies. --------------------------------------------------------------------------------