├── openai_account_files
    ├── used.txt
    └── accounts.txt
├── data
    └── download_data_and_put_them_here.txt
├── requirements.txt
├── download_data.py
├── llm_retrieval_prompt_drafts
    ├── update-using-missing-info-for-new-question.md
    ├── update-using-missing-info-for-new-passage.md
    ├── select_no_up_to.md
    └── filter_question_with_demo.md
├── searcher.py
├── README.md
├── multi_thread_openai_api_call.py
├── multi_process
    └── bm25_multi_process.py
├── llm.py
├── commands
    ├── eli5_iterative_retrieval.sh
    ├── qampari_iterative_retrieval.sh
    └── asqa_iterative_retrieval.sh
├── utils.py
├── prompts
    ├── qampari_default.json
    ├── asqa_demo.json
    ├── asqa_default.json
    └── eli5_default.json
├── openai_account_manager.py
├── run.py
├── eval.py
└── llm_retrieval_related
    └── iterative_select_supporting_documents.py


/openai_account_files/used.txt:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/data/download_data_and_put_them_here.txt:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/openai_account_files/accounts.txt:
--------------------------------------------------------------------------------
1 | EMAIL----PASSWORD----OPENAI_KEY


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | accelerate==0.22.0
 2 | faiss-cpu==1.7.4
 3 | FlagEmbedding==1.1.0
 4 | openai==0.27.4
 5 | sentence-transformers==2.2.2
 6 | sentencepiece==0.1.99
 7 | tiktoken==0.5.1
 8 | tokenizers==0.13.3
 9 | torch==2.0.1
10 | torchvision==0.15.2
11 | tqdm==4.45.0
12 | transformers==4.33.2
13 | pyserini==0.22.0


--------------------------------------------------------------------------------
/download_data.py:
--------------------------------------------------------------------------------
 1 | from datasets import load_dataset
 2 | import json
 3 | 
 4 | names = ['asqa_questions', 'qampari_questions', 'eli5_questions']
 5 | for name in names:
 6 |     ds = load_dataset("BeastyZ/Llatrieval", name)
 7 |     data = []
 8 |     for d in ds['train']:
 9 |         data.append(dict(d))
10 |     if name == 'asqa_questions':
11 |         save_path = './data/asqa_gtr_top100.json'
12 |     elif name == 'qampari_questions':
13 |         save_path = './data/qampari_gtr_top100.json'
14 |     else:
15 |         save_path = './data/eli5_bm25_top100.json'
16 |     with open(save_path, 'w') as f:
17 |         json.dump(data, f, indent=4, ensure_ascii=False)
18 | 


--------------------------------------------------------------------------------
/llm_retrieval_prompt_drafts/update-using-missing-info-for-new-question.md:
--------------------------------------------------------------------------------
 1 | You are a helpful assistant as introduced below.
 2 | 
 3 | ## Profile
 4 | - Language: English
 5 | - Description: You are a helpful assistant, capable of identifying missing content that answers the given question(s) but does not exist in the given possible answering passages and then using your own knowledge to genereate a new question base on the missing content you identify.
 6 | 
 7 | ### Input
 8 | - Question: The specific question(s).
 9 | - Answering Passages: Possible answering passages.
10 | 
11 | ### Output
12 | - A new question generated using missing content you identify based on your own knowledge.
13 | 
14 | ## Rules
15 | 1. Anyway, you have to use your own knowledge to generate a new question using missing content you identify.
16 | 2. Only generate the required new question. Do not output anything else.
17 | 3. Do not output the given question(s) and possible answering passages.
18 | 4. Do not output your analysis statement.
19 | 
20 | ## Workflow
21 | 1. Read and understand the question(s) and possible answering passages posed by the user.
22 | 2. Identify missing content that answers the given question(s) but does not exist in the given possible answering passages.
23 | 3. Use your own knowledge to generate a new question using missing content you identify.
24 | 
25 | ## Reminder
26 | You will always remind yourself of the role settings.


--------------------------------------------------------------------------------
/llm_retrieval_prompt_drafts/update-using-missing-info-for-new-passage.md:
--------------------------------------------------------------------------------
 1 | You are a helpful assistant as introduced below.
 2 | 
 3 | ## Profile
 4 | - Language: English
 5 | - Description: You are a helpful assistant, capable of identifying missing content that answers the given question(s) but does not exist in the given possible answering passages and then using your own knowledge to genereate correct answering passages using missing content you identify.
 6 | 
 7 | ### Input
 8 | - Question: The specific question(s).
 9 | - Answering Passages: Possible answering passages.
10 | 
11 | ### Output
12 | - Correct answering passages generated using missing content you identify based on your own knowledge.
13 | 
14 | ## Rules
15 | 1. Anyway, you have to use your own knowledge to generate correct answering passages using missing content you identify.
16 | 2. Only generate the required correct answering passages. Do not output anything else.
17 | 3. Directly use your own knowledge to generate correct answering passages if you think the given possible answering passages do not answer to the given question(s). 
18 | 4. Do not output the given question(s) and possible answering passages.
19 | 5. Do not output your analysis statement.
20 | 
21 | ## Workflow
22 | 1. Read and understand the question(s) and possible answering passages posed by the user.
23 | 2. identify missing content that answers the given question(s) but does not exist in the given possible answering passages.
24 | 3. Directly use your own knowledge to generate correct answering passages if you think the given possible answering passages do not answer to the given question(s). Otherwise use your own knowledge to generate correct answering passages using missing content you identify.
25 | 
26 | ## Reminder
27 | You will always remind yourself of the role settings.


--------------------------------------------------------------------------------
/searcher.py:
--------------------------------------------------------------------------------
 1 | from sklearn.feature_extraction.text import TfidfVectorizer
 2 | from sklearn.metrics.pairwise import cosine_similarity
 3 | import numpy as np
 4 | import torch
 5 | 
 6 | def doc_to_text_tfidf(doc):
 7 |     return doc['title'] + ' ' + doc['text']
 8 | 
 9 | def doc_to_text_dense(doc):
10 |     return doc['title'] + '. ' + doc['text']
11 | 
12 | 
13 | class SearcherWithinDocs:
14 | 
15 |     def __init__(self, docs, retriever, model=None, device="cuda"):
16 |         self.retriever = retriever
17 |         self.docs = docs
18 |         self.device = device
19 |         if retriever == "tfidf":
20 |             self.tfidf = TfidfVectorizer()
21 |             self.tfidf_docs = self.tfidf.fit_transform([doc_to_text_tfidf(doc) for doc in docs])
22 |         elif "gtr" in retriever: 
23 |             self.model = model
24 |             self.embeddings = self.model.encode([doc_to_text_dense(doc) for doc in docs], device=self.device, convert_to_numpy=False, convert_to_tensor=True, normalize_embeddings=True)
25 |         else:
26 |             raise NotImplementedError
27 | 
28 |     def search(self, query):
29 |         # Return the top-1 result doc id
30 | 
31 |         if self.retriever == "tfidf":
32 |             tfidf_query = self.tfidf.transform([query])[0]
33 |             similarities = [cosine_similarity(tfidf_doc, tfidf_query) for tfidf_doc in self.tfidf_docs]
34 |             best_doc_id = np.argmax(similarities)
35 |             return best_doc_id
36 |         elif "gtr" in self.retriever:
37 |             q_embed = self.model.encode([query], device=self.device, convert_to_numpy=False, convert_to_tensor=True, normalize_embeddings=True)
38 |             score = torch.matmul(self.embeddings, q_embed.t()).squeeze(1).detach().cpu().numpy()
39 |             best_doc_id = np.argmax(score)
40 |             return best_doc_id
41 |         else:
42 |             raise NotImplementedError
43 | 


--------------------------------------------------------------------------------
/llm_retrieval_prompt_drafts/select_no_up_to.md:
--------------------------------------------------------------------------------
 1 | You are DocSelectorGPT as introduced below.
 2 | 
 3 | # Role: DocSelectorGPT
 4 | 
 5 | ## Profile
 6 | - Language: English
 7 | - Description: You are DocSelectorGPT, capable of selecting a specified number (k) of documents for answering the user's specific question(s). k is a value specified by the user.
 8 | 
 9 | ### Input
10 | - Question: The specific question(s)
11 | - Candidate Documents: Documents contain supporting documents which can support answering the given questions. Candidate documents will have their own identifiers for FactRetrieverGPT to cite.
12 | 
13 | ### Skill
14 | 1. Analyzing the given question(s) and understanding the required information.
15 | 2. Searching through candidate documents to select k supporting documents whose combination can maximally support giving a direct, accurate, clear and engaging answer to the question and make the answer and is closely related to the core of the question.
16 | 
17 | ### Output
18 | - Selected Documents: The identifiers of selected supporting documents whose combination can maximally support giving an accurate and engaging answer to the question and make the answer and is closely related to the core of the question.
19 | 
20 | ### Output Format
21 | 
22 | Selected Documents: [document identifiers]
23 | 
24 | ### Output Example
25 | If the selected documents are 2, 6 and 8, the output should be as follows:
26 | 
27 | Selected Documents: 2 6 8
28 | 
29 | ## Rules
30 | 1. Don't break character.
31 | 2. When outputting the selected documents, only providing their own identifiers.
32 | 3. Strictly follow the specified output format. Do not answer the given question. Just conduct the specified retrieval task.
33 | 
34 | ## selection Criteria (Very Important)
35 | 1. The order and identifier of documents are not related to their priority.
36 | 2. Since your goal is to select a combination of supporting documents which can maximally support giving a direct, accurate, clear and engaging answer, you need to avoid redundant selection of documents containing the same or similar relevant content.
37 | 
38 | ## Workflow
39 | 1. Read and understand the questions posed by the user.
40 | 2. Browse through candidate documents to select k documents whose combination can maximally support giving a direct, accurate, clear and engaging answer to the question(s) and make the answer and is closely related to the core of the question(s).
41 | 3. List all selected documents.
42 | 
43 | ## Reminder
44 | You will always remind yourself of the role settings.


--------------------------------------------------------------------------------
/llm_retrieval_prompt_drafts/filter_question_with_demo.md:
--------------------------------------------------------------------------------
 1 | You are JudgeGPT as introduced below.
 2 | 
 3 | # Role: JudgeGPT
 4 | 
 5 | ## Profile
 6 | - Language: English
 7 | - Description: You are JudgeGPT, capable of judging whether a specified number (k) of documents can maximally support giving a direct, accurate, clear and engaging answer, similar to the answer of the demonstration, closely related to the core of the user's specific question(s).
 8 | 
 9 | ### Demonstration
10 | {Demo}
11 | 
12 | ### Input
13 | - Question: The specific question(s).
14 | - Candidate Documents: Documents whose combination may maximally support giving a direct, accurate, clear and engaging answer, similar to the answer of the demonstration, closely related to the core of the corresponding question(s).
15 | 
16 | ### Skill
17 | 1. Analyzing the given question(s) and understanding the required information.
18 | 2. Searching through documents to judge whether they can maximally support giving a direct, accurate, clear and engaging answer, similar to the answer of the demonstration, closely related to the core of the corresponding question(s).
19 | 
20 | ### Output
21 | - Judgment: "[YES]" if provided documents can maximally support giving a direct, accurate, clear, and engaging answer, similar to the answer of the demonstration, closely related to the core of the corresponding question(s), otherwise "[NO]".
22 | 
23 | ### Output Format
24 | Judgment: [YES] or [NO]
25 | 
26 | ### Output Example
27 | If provided documents can maximally support giving a direct, accurate, clear, and engaging answer, similar to the answer of the demonstration, closely related to the core of the corresponding question(s), the output should be as follows:
28 | [YES]
29 | 
30 | ## Rules
31 | 1. Don't break character.
32 | 2. When outputting final verdict, only providing "[YES]" or "[NO]".
33 | 3. Only output final verdict for the given question(s) and documents, do not evaluate the demonstration.
34 | 4. Strictly follow the specified output format. Do not answer the given question. Just conduct the specified judgment task.
35 | 
36 | ## Judgment Criteria (Very Important)
37 | 1. Do not allow the length of the documents to influence your evaluation.
38 | 2. Be as objective as possible.
39 | 3. Output "[YES]" if provided documents can maximally support giving a direct, accurate, clear, and engaging answer, similar to the answer of the demonstration, closely related to the core of the corresponding question(s), otherwise "[NO]".
40 | 
41 | ## Workflow
42 | 1. Read and understand the questions posed by the user.
43 | 2. Browse through documents to judge whether they can support giving a direct, accurate, clear, and engaging answer, similar to the answer of the demonstration, closely related to the core of the corresponding question(s).
44 | 3. Output your final verdict.
45 | 
46 | ## Reminder
47 | You will always remind yourself of the role settings.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # LLatrieval: LLM-Verified Retrieval for Verifiable Generation
  2 | This repository contains the code and data for paper [LLatrieval: LLM-Verified Retrieval for Verifiable Generation](https://arxiv.org/abs/2311.07838). This repository also includes code to reproduce the method we propose in our paper.
  3 | 
  4 | ## :new:News
  5 | - **[2024/03/13]** Our submission to NAACL 2024, [LLatrieval: LLM-Verified Retrieval for Verifiable Generation](https://aclanthology.org/2024.naacl-long.305/), has been accepted to the main conference.
  6 | - **[2023/11/14]** We have published the preprint version of the paper on [arXiv](https://arxiv.org/abs/2311.07838).
  7 | - **[2023/11/09]** We have released the code for reproducing our method.
  8 | 
  9 | 
 10 | ## Quick Links
 11 | - [Requirements](#requirements)
 12 | - [Data](#data)
 13 | - [Code Structure](#code-structure)
 14 | - [Reproduce Our Method](#reproduce-our-method)
 15 | - [Citation](#citation)
 16 | 
 17 | 
 18 | ## Requirements
 19 | 1. We recommend that you use the python virtual environment and then install the dependencies.
 20 |     ```
 21 |     conda create -n lvr python=3.9.7
 22 |     ```
 23 | 2. Next, activate the python virtual environment you just created.
 24 |     ```
 25 |     conda activate lvr
 26 |     ```
 27 | 3. Finally, before running the code, make sure you have set up the environment and installed the required packages.
 28 |     ```
 29 |     pip install -r requirements.txt
 30 |     ```
 31 | 
 32 | ## Data
 33 | We uploaded the data to [Hugging Face](https://huggingface.co/datasets/BeastyZ/Llatrieval)🤗. 
 34 | 
 35 | **Start by installing 🤗 Datasets:**
 36 | ```bash
 37 | pip install datasets
 38 | ```
 39 | 
 40 | **Load a dataset**
 41 | 
 42 | This command will download the raw data to the `data/` folder.
 43 | ```bash
 44 | python download_data.py
 45 | ```
 46 | 
 47 | **Download corpus**
 48 | 
 49 | Use the following command to download the `BM25_SPHERE_CORPUS`.
 50 | ```bash
 51 | wget -P faiss_index https://dl.fbaipublicfiles.com/sphere/sphere_sparse_index.tar.gz
 52 | tar -xzvf faiss_index/sphere_sparse_index.tar.gz -C faiss_index
 53 | ```
 54 | 
 55 | Use the following command to download the `WIKI_TSV_CORPUS`.
 56 | ```bash
 57 | wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
 58 | gzip -xzvf psgs_w100.tsv.gz
 59 | ```
 60 | 
 61 | For more info about the Sphere and Wikipedia snapshot corpora, please refer to [ALCE](https://github.com/princeton-nlp/ALCE).
 62 | 
 63 | 
 64 | ## Code Structure
 65 | * `commands/`: folder that contains all shell files.
 66 | * `data/`: folder that contains all datasets.
 67 | * `llm_retrieval_prompt_drafts/`: folder that contains all prompt files.
 68 | * `llm_retrieval_related/`: folder that contains code for iteratively selecting supporting documents.
 69 | * `multi_process/`: folder that contains code for BM25 retrieval with multi-process support.
 70 | * `openai_account_files/`: folder that contains all OpenAI account files.
 71 | * `prompts/`: folder that contains all instruction and demonstration files.
 72 | * `eval.py`: eval file to evaluate generations.
 73 | * `Iterative_retrieval.py`: code for reproduce our method.
 74 | * `llm.py`: code for using LLM.
 75 | * `multi_thread_openai_api_call.py`: code for using gpt-3.5-turbo with multi-thread.
 76 | * `searcher.py`: code for retrieval using TfidfVectorizer.
 77 | * `run.py`: run file to generate citations.
 78 | * `utils.py`: file that contains auxiliary function.
 79 | 
 80 | 
 81 | ## Reproduce Our Method
 82 | **NOTE:** There must be raw data and a corpus for retrieval before running the following commands. Once you have them, you also need to modify the parameters of the corresponding files in the `commands` directory.
 83 | 
 84 | For ASQA, use the following command
 85 | ```bash
 86 | bash commands/asqa_iterative_retrieval.sh
 87 | ```
 88 | 
 89 | For QAMPARI, use the following command
 90 | ```bash
 91 | bash commands/qampari_iterative_retrieval.sh
 92 | ```
 93 | 
 94 | For ELI5, use the following command
 95 | ```bash
 96 | bash commands/eli5_iterative_retrieval.sh
 97 | ```
 98 | 
 99 | The result will be saved in `iter_retrieval_50/`.
100 | 
101 | 
102 | ## Citation
103 | ```
104 | @inproceedings{li-etal-2024-llatrieval,
105 |     title = "{LL}atrieval: {LLM}-Verified Retrieval for Verifiable Generation",
106 |     author = "Li, Xiaonan  and
107 |       Zhu, Changtai  and
108 |       Li, Linyang  and
109 |       Yin, Zhangyue  and
110 |       Sun, Tianxiang  and
111 |       Qiu, Xipeng",
112 |     editor = "Duh, Kevin  and
113 |       Gomez, Helena  and
114 |       Bethard, Steven",
115 |     booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
116 |     month = jun,
117 |     year = "2024",
118 |     address = "Mexico City, Mexico",
119 |     publisher = "Association for Computational Linguistics",
120 |     url = "https://aclanthology.org/2024.naacl-long.305",
121 |     pages = "5453--5471",
122 | }
123 | ```
124 | 


--------------------------------------------------------------------------------
/multi_thread_openai_api_call.py:
--------------------------------------------------------------------------------
 1 | import threading
 2 | import openai
 3 | import logging
 4 | 
 5 | logger = logging.getLogger(__name__)
 6 | import time
 7 | 
 8 | 
 9 | class MyThread(threading.Thread):
10 |     def __init__(self, thread_id, llm, account_manager, inp_manager, print_error, pbar, turbo_system_message,
11 |                  print_finish=True):
12 |         threading.Thread.__init__(self)
13 |         self.thread_id = thread_id
14 |         self.openai_account_manager_multi_thread = account_manager
15 |         self.openai_inp_manager = inp_manager
16 |         self.account = self.openai_account_manager_multi_thread.get_next_account(self.thread_id)
17 |         self.print_error = print_error
18 |         self.pbar = pbar
19 |         self.print_finish = print_finish
20 |         self.turbo_system_message = turbo_system_message
21 |         self.llm = llm
22 | 
23 |     def run(self):
24 | 
25 |         def repeat_until_success_call_openai_api(func):
26 |             def wrapper(*args, **kw):
27 |                 while 1:
28 |                     result = None
29 |                     try:
30 |                         result = func(*args, **kw)
31 |                     except openai.error.APIConnectionError as e:
32 |                         if self.print_error:
33 |                             logger.info('openai connection error, so retry after sleep 5 seconds')
34 |                             logger.info(e)
35 |                         time.sleep(5)
36 |                     except openai.error.RateLimitError as e:
37 |                         logger.info(type(e))
38 |                         if 'quota' in e._message:
39 |                             if self.print_error:
40 |                                 logger.info('now openai account {} runs out. so use next.'.format(self.account[-1]))
41 |                                 logger.info(type(e))
42 |                                 logger.info(e)
43 |                             self.account = self.openai_account_manager_multi_thread.get_next_account(self.thread_id,
44 |                                                                                                      self.account)
45 |                         else:
46 |                             logger.info("Meeting RateLimitError, sleep for 45 seconds.")
47 |                             time.sleep(45)
48 |                     except openai.error.AuthenticationError as e:
49 |                         if 'This key is associated with a deactivated account' in e._message:
50 |                             logger.info('the account {} is deactivated. so use next'.format(self.account[-1]))
51 |                             if self.print_error:
52 |                                 logger.info(e)
53 |                             self.account = self.openai_account_manager_multi_thread.get_next_account(self.thread_id,
54 |                                                                                                      self.account)
55 |                         else:
56 |                             logger.info('meet unexpected AuthenticationError, so retry after sleep 5 seconds')
57 |                             if self.print_error:
58 |                                 logger.info(e)
59 |                             self.account = self.openai_account_manager_multi_thread.get_next_account(self.thread_id,
60 |                                                                                                      self.account)
61 |                     except Exception as e:
62 |                         logger.info('meet unexpected error, so retry after sleep 5 seconds')
63 |                         logger.info(e)
64 |                         logger.info(type(e))
65 |                         time.sleep(5)
66 | 
67 |                     if result != None:
68 |                         return result
69 |                     else:
70 |                         pass
71 | 
72 |             return wrapper
73 | 
74 |         # pbar = tqdm.tqdm(total=len(self.idx_x_list_to_decode))
75 |         responses_with_idx = []
76 |         self.responses_with_idx = responses_with_idx
77 |         while True:
78 |             tmp = self.openai_inp_manager.get_next_gpt_idx_inp()
79 |             if tmp == None:
80 |                 if self.print_finish:
81 |                     logger.info('thread {} finish'.format(self.thread_id))
82 |                 return
83 |             else:
84 |                 idx_inp = tmp['inp']
85 |                 idx, inp = idx_inp
86 |                 hyper_parameter = tmp['hyper_parameter']
87 | 
88 |                 @repeat_until_success_call_openai_api
89 |                 def tmp_api_call():
90 | 
91 |                     result = self.llm.generate(inp, hyper_parameter['max_tokens'], api_key=self.account[-1],
92 |                                                turbo_system_message=self.turbo_system_message)
93 |                     return result
94 | 
95 |                 response = tmp_api_call()
96 |                 if self.pbar is not None:
97 |                     self.pbar.update(1)
98 |                 responses_with_idx.append([idx, response])
99 | 


--------------------------------------------------------------------------------
/multi_process/bm25_multi_process.py:
--------------------------------------------------------------------------------
  1 | from typing import List, Dict
  2 | import torch.multiprocessing as mp
  3 | import queue
  4 | import math
  5 | import json
  6 | import time
  7 | import logging
  8 | from pyserini.search import LuceneSearcher
  9 | 
 10 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
 11 | logger = logging.getLogger(__name__)
 12 | logger.setLevel(logging.INFO)
 13 | 
 14 | class BM25MultiProcess():
 15 |     def __init__(self, corpus_path: str=None, top_k: int=100):
 16 |         """
 17 |         Init class.
 18 |         """
 19 |         self.top_k = top_k
 20 |         self.corpus_path = corpus_path
 21 | 
 22 | 
 23 |     def start_multi_process_pool(self, process_num: int=8) -> Dict:
 24 |         """
 25 |         :param process_num: Number of process to use.
 26 |         :return: Returns a dict with the target processes, an input queue and and output queue.
 27 |         """
 28 |         target_devices = ['cpu'] * process_num
 29 |         logger.info("Start multi-process pool on devices: {}".format(', '.join(map(str, target_devices))))
 30 | 
 31 |         ctx = mp.get_context('spawn')
 32 |         input_queue = ctx.Queue()
 33 |         output_queue = ctx.Queue()
 34 |         processes = []
 35 | 
 36 |         for _ in target_devices:
 37 |             p = ctx.Process(target=BM25MultiProcess._multi_process_worker, args=(self, input_queue, output_queue), daemon=True)
 38 |             p.start()
 39 |             processes.append(p)
 40 | 
 41 |         return {'input': input_queue, 'output': output_queue, 'processes': processes}
 42 |     
 43 | 
 44 |     @staticmethod
 45 |     def _multi_process_worker(model, input_queue, results_queue) -> None:
 46 |         """
 47 |         Internal working process to retrieve documnents in multi-process setup.
 48 |         """
 49 |         searcher = LuceneSearcher(model.corpus_path)
 50 |         while True:
 51 |             try:
 52 |                 id, queries = input_queue.get()
 53 |                 docs_list = model.retrieve(queries, searcher)
 54 |                 results_queue.put([id, docs_list])
 55 |             except queue.Empty:
 56 |                 break
 57 | 
 58 | 
 59 |     def retrieve(self, queries: List[str], searcher: LuceneSearcher) -> List[List]:
 60 |         """
 61 |         Do retrieval using bm25.
 62 |         """
 63 |         docs_list = []
 64 |         for query in queries:
 65 |             start_time = time.time()
 66 |             try:
 67 |                 hits = searcher.search(query, self.top_k)
 68 |             except Exception as e:
 69 |                 if "maxClauseCount" in str(e):
 70 |                     query = " ".join(query.split())[:950]
 71 |                     hits = searcher.search(query, self.top_k)
 72 |                 else:
 73 |                     raise e
 74 |                 
 75 |             # For bm25 sphere
 76 |             docs = []
 77 |             for hit in hits:
 78 |                 h = json.loads(str(hit.docid).strip())
 79 |                 docs.append({
 80 |                     "title": h["title"],
 81 |                     "text": hit.raw,
 82 |                     "url": h["url"],
 83 |                     'score': hit.score,
 84 |                     'id':hit.docid
 85 |                 })
 86 | 
 87 |             docs_list.append(docs)
 88 |             end_time = time.time()
 89 |             logger.warning(f"It took {end_time - start_time} seconds.")
 90 |         return docs_list
 91 | 
 92 | 
 93 |     def retrieve_multi_process(self, queries: List[str], pool: Dict[str, object], chunk_size: int=None) -> List[List]:
 94 |         """
 95 |         :param queries: List of queries
 96 |         :param pool: A pool of workers started with BM25MultiProcess.start_multi_process_pool
 97 |         :param chunk_size: Queries are chunked and sent to the individual processes. If none, it determine a sensible size.
 98 |         :return: Retrieved documents.
 99 |         """
100 |         if chunk_size is None:
101 |             chunk_size = min(math.ceil(len(queries) / len(pool["processes"]) / 10), 5000)
102 | 
103 |         logger.info(f"Chunk queries into {math.ceil(len(queries) / chunk_size)} packages of size {chunk_size}")
104 | 
105 |         input_queue = pool['input']
106 |         last_chunk_id = 0
107 |         chunk = []
108 | 
109 |         for query in queries:
110 |             chunk.append(query)
111 |             if len(chunk) >= chunk_size:
112 |                 input_queue.put([last_chunk_id, chunk])
113 |                 last_chunk_id += 1
114 |                 chunk = []
115 | 
116 |         if len(chunk) > 0:
117 |             input_queue.put([last_chunk_id, chunk])
118 |             last_chunk_id += 1
119 | 
120 |         output_queue = pool['output']
121 |         results_list = sorted([output_queue.get() for _ in range(last_chunk_id)], key=lambda x: x[0])
122 |         docs_list = []
123 |         for result in results_list:
124 |             docs_list += result[1]
125 |         return docs_list
126 |     
127 | 
128 |     @staticmethod
129 |     def stop_multi_process_pool(pool: Dict):
130 |         """
131 |         Stops all processes started with start_multi_process_pool
132 |         """
133 |         for p in pool['processes']:
134 |             p.terminate()
135 | 
136 |         for p in pool['processes']:
137 |             p.join()
138 |             p.close()
139 | 
140 |         pool['input'].close()
141 |         pool['output'].close()
142 | 


--------------------------------------------------------------------------------
/llm.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | from transformers import AutoTokenizer
  3 | from utils import *
  4 | import openai
  5 | class LLM:
  6 | 
  7 |     def __init__(self, args):
  8 |         self.args = args
  9 | 
 10 |         if args.openai_api:
 11 |             # logger.info('into if args.openai_api:')
 12 |             import openai
 13 |             OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
 14 |             OPENAI_ORG_ID = os.environ.get("OPENAI_ORG_ID")
 15 |             OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE")
 16 |             assert OPENAI_API_KEY == None, "api_key={}".format(OPENAI_API_KEY)
 17 | 
 18 |             if args.azure:
 19 | 
 20 |                 openai.api_key = OPENAI_API_KEY
 21 |                 openai.api_base = OPENAI_API_BASE
 22 |                 openai.api_type = 'azure'
 23 |                 openai.api_version = '2022-12-01'
 24 |             else:
 25 |                 # logger.info('into not args.azure')
 26 |                 # logger.info('OPENAI_API_KEY:{}'.format(OPENAI_API_KEY))
 27 |                 openai.api_key = OPENAI_API_KEY
 28 |                 openai.organization = OPENAI_ORG_ID
 29 | 
 30 |             self.tokenizer = AutoTokenizer.from_pretrained("gpt2",
 31 |                                                            fast_tokenizer=False)  # TODO: For ChatGPT we should use a different one
 32 |             self.total_tokens = 0  # To keep track of how much the API costs
 33 |         else:
 34 |             self.model, self.tokenizer = load_model(args.model)
 35 | 
 36 |         self.prompt_exceed_max_length = 0
 37 |         self.fewer_than_50 = 0
 38 |         self.azure_filter_fail = 0
 39 | 
 40 |     def generate(self, prompt, max_tokens, api_key, stop=None, turbo_system_message=None):
 41 |         args = self.args
 42 |         if max_tokens == 0:
 43 |             self.prompt_exceed_max_length += 1
 44 |             logger.warning(
 45 |                 "Prompt exceeds max length and return an empty string as answer. If this happens too many times, it is suggested to make the prompt shorter")
 46 |             return ""
 47 |         if max_tokens < 50:
 48 |             self.fewer_than_50 += 1
 49 |             logger.warning(
 50 |                 "The model can at most generate < 50 tokens. If this happens too many times, it is suggested to make the prompt shorter")
 51 | 
 52 |         if args.openai_api:
 53 |             if "turbo" in args.model and not args.azure:
 54 |                 assert turbo_system_message != None
 55 |                 # For OpenAI's ChatGPT API, we need to convert text prompt to chat prompt
 56 |                 prompt = [
 57 |                     # {'role': 'system', 'content': "You are a helpful assistant that answers the following questions with proper citations."},
 58 |                     {'role': 'system', 'content': turbo_system_message},
 59 |                     {'role': 'user', 'content': prompt}
 60 |                 ]
 61 |             else:
 62 |                 if "turbo" in args.model:
 63 |                     deploy_name = "gpt-35-turbo-0301"
 64 |                 else:
 65 |                     deploy_name = args.model
 66 | 
 67 |             def repeat_until_success_call_openai_api_only_for_retry(func):
 68 |                 def wrapper(*args, **kw):
 69 |                     while 1:
 70 |                         result = None
 71 |                         try:
 72 |                             result = func(*args, **kw)
 73 |                         except openai.error.APIConnectionError as e:
 74 |                             logger.warning('openai connection error, so retry after sleep 1 seconds')
 75 |                             logger.warning(e)
 76 |                             time.sleep(1)
 77 |                         except openai.error.RateLimitError as e:
 78 |                             logger.warning(type(e))
 79 |                             if 'quota' in e._message:
 80 |                                 raise e
 81 |                             else:
 82 |                                 time.sleep(60)
 83 |                         except openai.error.AuthenticationError as e:
 84 |                             raise e
 85 |                         except Exception as e:
 86 |                             logger.warning('meet unexpected error, so retry after sleep 3 seconds')
 87 |                             logger.warning(e)
 88 |                             logger.warning(type(e))
 89 |                             time.sleep(3)
 90 | 
 91 |                         if result != None:
 92 |                             return result
 93 |                         else:
 94 |                             pass
 95 |                 return wrapper
 96 | 
 97 |             if "turbo" in args.model and not args.azure:
 98 |                 @repeat_until_success_call_openai_api_only_for_retry
 99 |                 def tmp_openai_call_func():
100 |                     response = openai.ChatCompletion.create(
101 |                         model=args.model,
102 |                         messages=prompt,
103 |                         temperature=args.temperature,
104 |                         max_tokens=max_tokens,
105 |                         stop=stop,
106 |                         top_p=args.top_p,
107 |                         api_key=api_key,
108 |                         n=self.args.num_samples,
109 |                     )
110 |                     return response
111 |                 response = tmp_openai_call_func()
112 |                 self.total_tokens += response['usage']['total_tokens']
113 |                 result = list(map(lambda x:x['message']['content'],response['choices']))
114 |                 return result
115 |             else:
116 | 
117 |                 @repeat_until_success_call_openai_api_only_for_retry
118 |                 def tmp_openai_call_func():
119 |                     response = openai.ChatCompletion.create(
120 |                         model=args.model,
121 |                         messages=prompt,
122 |                         temperature=args.temperature,
123 |                         max_tokens=max_tokens,
124 |                         stop=stop,
125 |                         top_p=args.top_p,
126 |                         api_key=api_key,
127 |                         n=self.args.num_samples
128 |                     )
129 |                     return response
130 |                 response = tmp_openai_call_func()
131 |                 self.total_tokens += response['usage']['total_tokens']
132 |                 result = list(map(lambda x:x['text'],response['choices']))
133 |                 return result
134 |         else:
135 | 
136 |             inputs = self.tokenizer([prompt], return_tensors="pt").to(self.model.device)
137 |             stop = [] if stop is None else stop
138 |             stop = list(set(stop + ["\n", "Ċ", "ĊĊ", "<0x0A>"]))  # In Llama \n is <0x0A>; In OPT \n is Ċ
139 |             stop_token_ids = list(set([self.tokenizer._convert_token_to_id(stop_token) for stop_token in stop] + [
140 |                 self.model.config.eos_token_id]))
141 |             if "llama" in args.model:
142 |                 stop_token_ids.remove(self.tokenizer.unk_token_id)
143 |             outputs = self.model.generate(
144 |                 **inputs,
145 |                 do_sample=True, temperature=args.temperature, top_p=args.top_p,
146 |                 max_new_tokens=max_tokens,
147 |                 num_return_sequences=1,
148 |                 eos_token_id=stop_token_ids
149 |             )
150 |             generation = self.tokenizer.decode(outputs[0][inputs['input_ids'].size(1):], skip_special_tokens=True)
151 |             return generation


--------------------------------------------------------------------------------
/commands/eli5_iterative_retrieval.sh:
--------------------------------------------------------------------------------
  1 | ##################### eli5 Iterative_retrieval; prompt12 #####################
  2 | export CUDA_VISIBLE_DEVICES=0
  3 | max_iteration=4
  4 | dataset_name=eli5
  5 | use_sub_questions=0
  6 | use_title=0
  7 | used_doc_field=answer
  8 | openai_model_name=gpt-3.5-turbo-0301
  9 | # Args for retrieval
 10 | input_file=data/eli5_bm25_top100.json
 11 | retriever=bm25
 12 | update_prompt_file=update-using-missing-info-for-new-question
 13 | update_query_using_missing_info_from_question_and_psgs=1
 14 | corpus_path=ATH_TO_YOUR_OWN_SPHERE_CORPUS
 15 | # Args for generating used field.
 16 | prompt_style=answer
 17 | target_used_field=answer
 18 | max_tokens=150
 19 | # Args for reranker
 20 | position=head
 21 | reranking_prompt_file=select_no_up_to
 22 | doc_num=50
 23 | # Args for filtration
 24 | demo_file=prompts/${dataset_name}_default.json
 25 | filtration_prompt_file=filter_question_with_demo
 26 | filtration_method=judgment
 27 | 
 28 | python Iterative_retrieval.py \
 29 |     --max_iteration $max_iteration \
 30 |     --dataset_name $dataset_name \
 31 |     --use_sub_questions $use_sub_questions \
 32 |     --use_title $use_title \
 33 |     --used_doc_field $used_doc_field \
 34 |     --openai_model_name $openai_model_name \
 35 |     --input_file $input_file \
 36 |     --retriever $retriever \
 37 |     --update_prompt_file $update_prompt_file \
 38 |     --update_query_using_missing_info_from_question_and_psgs $update_query_using_missing_info_from_question_and_psgs \
 39 |     --corpus_path $corpus_path \
 40 |     --prompt_style $prompt_style \
 41 |     --target_used_field $target_used_field \
 42 |     --max_tokens $max_tokens \
 43 |     --position $position \
 44 |     --reranking_prompt_file $reranking_prompt_file \
 45 |     --doc_num $doc_num \
 46 |     --demo_file $demo_file \
 47 |     --filtration_prompt_file $filtration_prompt_file \
 48 |     --filtration_method $filtration_method
 49 | 
 50 | # run_eval
 51 | export CUDA_VISIBLE_DEVICES=0
 52 | shot=1 
 53 | openai_api=1 
 54 | num_samples=1 
 55 | data_file=iter_retrieval_50/eli5_final_data/final_data_bm25_max_iteration-4_update-using-missing-info-for-new-question_head.json
 56 | ndoc=5
 57 | openai_multi_thread=6
 58 | model=gpt-3.5-turbo-0301
 59 | quick_test=0 
 60 | seed=42 
 61 | temperature=0 
 62 | eval_metric=default 
 63 | # Other args
 64 | dataset_name=eli5 
 65 | 
 66 | prompt_file=prompts/${dataset_name}_default.json
 67 | output_dir=iter_retrieval_50/${dataset_name}_max-4_bm25-new-question_llm-select-head_run_eval
 68 | mkdir $output_dir -p
 69 | output_file=${output_dir}/run_output.json
 70 | 
 71 | if [ ! -f "${output_file}" ]; then
 72 |   echo "*****************************"
 73 |   echo "start run.py"
 74 |   echo "*****************************"
 75 | 
 76 |   python run.py \
 77 |     --shot $shot \
 78 |     --openai_api $openai_api \
 79 |     --prompt_file $prompt_file \
 80 |     --output_fp $output_file \
 81 |     --dataset_name $dataset_name \
 82 |     --num_samples $num_samples \
 83 |     --data_file $data_file \
 84 |     --ndoc $ndoc \
 85 |     --openai_multi_thread $openai_multi_thread \
 86 |     --model $model \
 87 |     --quick_test $quick_test \
 88 |     --seed $seed \
 89 |     --temperature $temperature \
 90 |     --turbo_system_message "You are a helpful assistant that answers the following questions with proper citations."
 91 | 
 92 |   echo "*****************************"
 93 |   echo "finish run.py"
 94 |   echo "*****************************"
 95 | fi
 96 | 
 97 | eval_f=${output_file%.json}
 98 | eval_result_fp=${eval_f}.score
 99 | if [ ! -f $eval_result_fp ]; then
100 |     echo "*****************************"
101 |     echo "start eval.py"
102 |     echo "*****************************"
103 |     
104 |     python eval.py \
105 |       --f $output_file \
106 |       --eval_metric $eval_metric
107 | 
108 |     echo "*****************************"
109 |     echo "finish eval.py"
110 |     echo "*****************************"
111 | fi
112 | 
113 | 
114 | ##################### eli5 Iterative_retrieval; prompt13 #####################
115 | export CUDA_VISIBLE_DEVICES=0
116 | max_iteration=4
117 | dataset_name=eli5
118 | use_sub_questions=0
119 | use_title=0
120 | used_doc_field=answer
121 | openai_model_name=gpt-3.5-turbo-0301
122 | # Args for retrieval
123 | input_file=data/eli5_bm25_top100.json
124 | retriever=bm25
125 | update_prompt_file=update-using-missing-info-for-new-passage
126 | update_query_using_missing_info_from_question_and_psgs=1
127 | corpus_path=PATH_TO_YOUR_OWN_SPHERE_CORPUS
128 | # Args for generating used field.
129 | prompt_style=answer
130 | target_used_field=answer
131 | max_tokens=150
132 | # Args for reranker
133 | position=head
134 | reranking_prompt_file=select_no_up_to
135 | doc_num=50
136 | # Args for filtration
137 | demo_file=prompts/${dataset_name}_default.json
138 | filtration_prompt_file=filter_question_with_demo
139 | filtration_method=judgment
140 | 
141 | python Iterative_retrieval.py \
142 |     --max_iteration $max_iteration \
143 |     --dataset_name $dataset_name \
144 |     --use_sub_questions $use_sub_questions \
145 |     --use_title $use_title \
146 |     --used_doc_field $used_doc_field \
147 |     --openai_model_name $openai_model_name \
148 |     --input_file $input_file \
149 |     --retriever $retriever \
150 |     --update_prompt_file $update_prompt_file \
151 |     --update_query_using_missing_info_from_question_and_psgs $update_query_using_missing_info_from_question_and_psgs \
152 |     --corpus_path $corpus_path \
153 |     --prompt_style $prompt_style \
154 |     --target_used_field $target_used_field \
155 |     --max_tokens $max_tokens \
156 |     --position $position \
157 |     --reranking_prompt_file $reranking_prompt_file \
158 |     --doc_num $doc_num \
159 |     --demo_file $demo_file \
160 |     --filtration_prompt_file $filtration_prompt_file \
161 |     --filtration_method $filtration_method
162 | 
163 | # run_eval
164 | export CUDA_VISIBLE_DEVICES=0
165 | shot=1 
166 | openai_api=1 
167 | num_samples=1 
168 | data_file=iter_retrieval_50/eli5_final_data/final_data_bm25_max_iteration-4_update-using-missing-info-for-new-passage_head.json
169 | ndoc=5
170 | openai_multi_thread=6
171 | model=gpt-3.5-turbo-0301
172 | quick_test=0 
173 | seed=42 
174 | temperature=0 
175 | eval_metric=default 
176 | # Other args
177 | dataset_name=eli5 
178 | 
179 | prompt_file=prompts/${dataset_name}_default.json
180 | output_dir=iter_retrieval_50/${dataset_name}_max-4_bm25-new-paasage_llm-select-head_run_eval
181 | mkdir $output_dir -p
182 | output_file=${output_dir}/run_output.json
183 | 
184 | if [ ! -f "${output_file}" ]; then
185 |   echo "*****************************"
186 |   echo "start run.py"
187 |   echo "*****************************"
188 | 
189 |   python run.py \
190 |     --shot $shot \
191 |     --openai_api $openai_api \
192 |     --prompt_file $prompt_file \
193 |     --output_fp $output_file \
194 |     --dataset_name $dataset_name \
195 |     --num_samples $num_samples \
196 |     --data_file $data_file \
197 |     --ndoc $ndoc \
198 |     --openai_multi_thread $openai_multi_thread \
199 |     --model $model \
200 |     --quick_test $quick_test \
201 |     --seed $seed \
202 |     --temperature $temperature \
203 |     --turbo_system_message "You are a helpful assistant that answers the following questions with proper citations."
204 | 
205 |   echo "*****************************"
206 |   echo "finish run.py"
207 |   echo "*****************************"
208 | fi
209 | 
210 | eval_f=${output_file%.json}
211 | eval_result_fp=${eval_f}.score
212 | if [ ! -f $eval_result_fp ]; then
213 |     echo "*****************************"
214 |     echo "start eval.py"
215 |     echo "*****************************"
216 |     
217 |     python eval.py \
218 |       --f $output_file \
219 |       --eval_metric $eval_metric
220 | 
221 |     echo "*****************************"
222 |     echo "finish eval.py"
223 |     echo "*****************************"
224 | fi
225 | 


--------------------------------------------------------------------------------
/commands/qampari_iterative_retrieval.sh:
--------------------------------------------------------------------------------
  1 | ##################### qampari Iterative_retrieval; prompt12 #####################
  2 | export CUDA_VISIBLE_DEVICES=0
  3 | max_iteration=4
  4 | dataset_name=qampari
  5 | use_sub_questions=0
  6 | use_title=0
  7 | used_doc_field=summary
  8 | openai_model_name=gpt-3.5-turbo-0301
  9 | # Args for retrieval
 10 | input_file=data/qampari_gtr_top100.json
 11 | retriever=bge-large-en-v1.5
 12 | update_prompt_file=update-using-missing-info-for-new-question
 13 | update_query_using_missing_info_from_question_and_psgs=1
 14 | corpus_path=PATH_TO_YOUR_OWN_WIKI_CORPUS
 15 | # Args for generating used field.
 16 | prompt_style=summary
 17 | target_used_field=summary
 18 | max_tokens=150
 19 | # Args for reranker
 20 | position=head
 21 | reranking_prompt_file=select_no_up_to
 22 | doc_num=50
 23 | window_size=20
 24 | # Args for filtration
 25 | demo_file=prompts/${dataset_name}_default.json
 26 | filtration_prompt_file=filter_question_with_demo
 27 | filtration_method=judgment
 28 | 
 29 | python Iterative_retrieval.py \
 30 |     --max_iteration $max_iteration \
 31 |     --dataset_name $dataset_name \
 32 |     --use_sub_questions $use_sub_questions \
 33 |     --use_title $use_title \
 34 |     --used_doc_field $used_doc_field \
 35 |     --openai_model_name $openai_model_name \
 36 |     --input_file $input_file \
 37 |     --retriever $retriever \
 38 |     --update_prompt_file $update_prompt_file \
 39 |     --update_query_using_missing_info_from_question_and_psgs $update_query_using_missing_info_from_question_and_psgs \
 40 |     --corpus_path $corpus_path \
 41 |     --prompt_style $prompt_style \
 42 |     --target_used_field $target_used_field \
 43 |     --max_tokens $max_tokens \
 44 |     --position $position \
 45 |     --reranking_prompt_file $reranking_prompt_file \
 46 |     --doc_num $doc_num \
 47 |     --window_size $window_size \
 48 |     --filtration_prompt_file $filtration_prompt_file \
 49 |     --demo_file $demo_file \
 50 |     --filtration_method $filtration_method
 51 | 
 52 | # run_eval
 53 | export CUDA_VISIBLE_DEVICES=0
 54 | shot=1 
 55 | openai_api=1 
 56 | num_samples=1 
 57 | data_file=iter_retrieval_50/qampari_final_data/final_data_bge-large-en-v1.5_max_iteration-4_update-using-missing-info-for-new-question_head.json
 58 | ndoc=5
 59 | openai_multi_thread=10
 60 | model=gpt-3.5-turbo-0301
 61 | quick_test=0 
 62 | seed=42 
 63 | temperature=0 
 64 | eval_metric=default 
 65 | # Other args
 66 | dataset_name=qampari 
 67 | 
 68 | prompt_file=prompts/${dataset_name}_default.json
 69 | output_dir=iter_retrieval_50/${dataset_name}_max-4_bge-large-en-v1.5-new-question_llm-select-head_run_eval
 70 | mkdir $output_dir -p
 71 | output_file=${output_dir}/run_output.json
 72 | 
 73 | if [ ! -f "${output_file}" ]; then
 74 |   echo "*****************************"
 75 |   echo "start run.py"
 76 |   echo "*****************************"
 77 | 
 78 |   python run.py \
 79 |     --shot $shot \
 80 |     --openai_api $openai_api \
 81 |     --prompt_file $prompt_file \
 82 |     --output_fp $output_file \
 83 |     --dataset_name $dataset_name \
 84 |     --num_samples $num_samples \
 85 |     --data_file $data_file \
 86 |     --ndoc $ndoc \
 87 |     --openai_multi_thread $openai_multi_thread \
 88 |     --model $model \
 89 |     --quick_test $quick_test \
 90 |     --seed $seed \
 91 |     --temperature $temperature \
 92 |     --turbo_system_message "You are a helpful assistant that answers the following questions with proper citations."
 93 | 
 94 |   echo "*****************************"
 95 |   echo "finish run.py"
 96 |   echo "*****************************"
 97 | fi
 98 | 
 99 | eval_f=${output_file%.json}
100 | eval_result_fp=${eval_f}.score
101 | if [ ! -f $eval_result_fp ]; then
102 |     echo "*****************************"
103 |     echo "start eval.py"
104 |     echo "*****************************"
105 |     
106 |     python eval.py \
107 |       --f $output_file \
108 |       --eval_metric $eval_metric
109 | 
110 |     echo "*****************************"
111 |     echo "finish eval.py"
112 |     echo "*****************************"
113 | fi
114 | 
115 | 
116 | ##################### qampari Iterative_retrieval; prompt13 #####################
117 | export CUDA_VISIBLE_DEVICES=0
118 | max_iteration=4
119 | dataset_name=qampari
120 | use_sub_questions=0
121 | use_title=0
122 | used_doc_field=summary
123 | openai_model_name=gpt-3.5-turbo-0301
124 | # Args for retrieval
125 | input_file=data/qampari_gtr_top100.json
126 | retriever=bge-large-en-v1.5
127 | update_prompt_file=update-using-missing-info-for-new-passage
128 | update_query_using_missing_info_from_question_and_psgs=1
129 | corpus_path=PATH_TO_YOUR_OWN_WIKI_CORPUS
130 | # Args for generating used field.
131 | prompt_style=summary
132 | target_used_field=summary
133 | max_tokens=150
134 | # Args for reranker
135 | position=head
136 | reranking_prompt_file=select_no_up_to
137 | doc_num=50
138 | window_size=20
139 | # Args for filtration
140 | demo_file=prompts/${dataset_name}_default.json
141 | filtration_prompt_file=filter_question_with_demo
142 | filtration_method=judgment
143 | 
144 | python Iterative_retrieval.py \
145 |     --max_iteration $max_iteration \
146 |     --dataset_name $dataset_name \
147 |     --use_sub_questions $use_sub_questions \
148 |     --use_title $use_title \
149 |     --used_doc_field $used_doc_field \
150 |     --openai_model_name $openai_model_name \
151 |     --input_file $input_file \
152 |     --retriever $retriever \
153 |     --update_prompt_file $update_prompt_file \
154 |     --update_query_using_missing_info_from_question_and_psgs $update_query_using_missing_info_from_question_and_psgs \
155 |     --corpus_path $corpus_path \
156 |     --prompt_style $prompt_style \
157 |     --target_used_field $target_used_field \
158 |     --max_tokens $max_tokens \
159 |     --position $position \
160 |     --reranking_prompt_file $reranking_prompt_file \
161 |     --doc_num $doc_num \
162 |     --window_size $window_size \
163 |     --filtration_prompt_file $filtration_prompt_file \
164 |     --demo_file $demo_file \
165 |     --filtration_method $filtration_method
166 | 
167 | # run_eval
168 | export CUDA_VISIBLE_DEVICES=0
169 | shot=1 
170 | openai_api=1 
171 | num_samples=1 
172 | data_file=iter_retrieval_50/qampari_final_data/final_data_bge-large-en-v1.5_max_iteration-4_update-using-missing-info-for-new-passage_head.json
173 | ndoc=5
174 | openai_multi_thread=10
175 | model=gpt-3.5-turbo-0301
176 | quick_test=0 
177 | seed=42 
178 | temperature=0 
179 | eval_metric=default 
180 | # Other args
181 | dataset_name=qampari 
182 | 
183 | prompt_file=prompts/${dataset_name}_default.json
184 | output_dir=iter_retrieval_50/${dataset_name}_max-4_bge-large-en-v1.5-new-passage_llm-select-head_run_eval
185 | mkdir $output_dir -p
186 | output_file=${output_dir}/run_output.json
187 | 
188 | if [ ! -f "${output_file}" ]; then
189 |   echo "*****************************"
190 |   echo "start run.py"
191 |   echo "*****************************"
192 | 
193 |   python run.py \
194 |     --shot $shot \
195 |     --openai_api $openai_api \
196 |     --prompt_file $prompt_file \
197 |     --output_fp $output_file \
198 |     --dataset_name $dataset_name \
199 |     --num_samples $num_samples \
200 |     --data_file $data_file \
201 |     --ndoc $ndoc \
202 |     --openai_multi_thread $openai_multi_thread \
203 |     --model $model \
204 |     --quick_test $quick_test \
205 |     --seed $seed \
206 |     --temperature $temperature \
207 |     --turbo_system_message "You are a helpful assistant that answers the following questions with proper citations."
208 | 
209 |   echo "*****************************"
210 |   echo "finish run.py"
211 |   echo "*****************************"
212 | fi
213 | 
214 | eval_f=${output_file%.json}
215 | eval_result_fp=${eval_f}.score
216 | if [ ! -f $eval_result_fp ]; then
217 |     echo "*****************************"
218 |     echo "start eval.py"
219 |     echo "*****************************"
220 |     
221 |     python eval.py \
222 |       --f $output_file \
223 |       --eval_metric $eval_metric
224 | 
225 |     echo "*****************************"
226 |     echo "finish eval.py"
227 |     echo "*****************************"
228 | fi
229 | 


--------------------------------------------------------------------------------
/commands/asqa_iterative_retrieval.sh:
--------------------------------------------------------------------------------
  1 | ##################### asqa Iterative_retrieval; new-question #####################
  2 | export CUDA_VISIBLE_DEVICES=0
  3 | max_iteration=4
  4 | dataset_name=asqa
  5 | use_sub_questions=1
  6 | use_title=1
  7 | used_doc_field=summary_use_sub
  8 | openai_model_name=gpt-3.5-turbo-0301
  9 | # Args for retrieval
 10 | input_file=data/asqa_gtr_top100.json
 11 | retriever=bge-large-en-v1.5
 12 | update_prompt_file=update-using-missing-info-for-new-question
 13 | update_query_using_missing_info_from_question_and_psgs=1
 14 | corpus_path=PATH_TO_YOUR_OWN_WIKI_CORPUS
 15 | # Args for generating used field.
 16 | prompt_style=summary
 17 | target_used_field=summary_use_sub
 18 | max_tokens=150
 19 | # Args for reranker
 20 | position=head
 21 | reranking_prompt_file=select_no_up_to
 22 | doc_num=50
 23 | window_size=20
 24 | # Args for filtration
 25 | demo_file=prompts/${dataset_name}_demo.json
 26 | filtration_prompt_file=filter_question_with_demo
 27 | filtration_method=judgment
 28 | 
 29 | python Iterative_retrieval.py \
 30 |     --max_iteration $max_iteration \
 31 |     --dataset_name $dataset_name \
 32 |     --use_sub_questions $use_sub_questions \
 33 |     --use_title $use_title \
 34 |     --used_doc_field $used_doc_field \
 35 |     --openai_model_name $openai_model_name \
 36 |     --input_file $input_file \
 37 |     --retriever $retriever \
 38 |     --update_prompt_file $update_prompt_file \
 39 |     --update_query_using_missing_info_from_question_and_psgs $update_query_using_missing_info_from_question_and_psgs \
 40 |     --corpus_path $corpus_path \
 41 |     --prompt_style $prompt_style \
 42 |     --target_used_field $target_used_field \
 43 |     --max_tokens $max_tokens \
 44 |     --position $position \
 45 |     --reranking_prompt_file $reranking_prompt_file \
 46 |     --doc_num $doc_num \
 47 |     --window_size $window_size \
 48 |     --demo_file $demo_file \
 49 |     --filtration_prompt_file $filtration_prompt_file \
 50 |     --filtration_method $filtration_method
 51 | 
 52 | # run_eval
 53 | export CUDA_VISIBLE_DEVICES=0
 54 | shot=1 
 55 | openai_api=1 
 56 | num_samples=1 
 57 | data_file=iter_retrieval_50/asqa_final_data/final_data_bge-large-en-v1.5_max_iteration-4_update-using-missing-info-for-new-question_head.json
 58 | ndoc=5
 59 | openai_multi_thread=10
 60 | model=gpt-3.5-turbo-0301
 61 | quick_test=0 
 62 | seed=42 
 63 | temperature=0 
 64 | eval_metric=default 
 65 | use_sub_questions=1
 66 | # Other args
 67 | dataset_name=asqa 
 68 | 
 69 | prompt_file=prompts/${dataset_name}_default.json
 70 | output_dir=iter_retrieval_50/${dataset_name}_max-4_bge-large-en-v1.5-new-question_llm-select-head_run_eval
 71 | mkdir $output_dir -p
 72 | output_file=${output_dir}/run_output.json
 73 | 
 74 | if [ ! -f "${output_file}" ]; then
 75 |   echo "*****************************"
 76 |   echo "start run.py"
 77 |   echo "*****************************"
 78 | 
 79 |   python run.py \
 80 |     --shot $shot \
 81 |     --openai_api $openai_api \
 82 |     --prompt_file $prompt_file \
 83 |     --output_fp $output_file \
 84 |     --dataset_name $dataset_name \
 85 |     --num_samples $num_samples \
 86 |     --data_file $data_file \
 87 |     --ndoc $ndoc \
 88 |     --openai_multi_thread $openai_multi_thread \
 89 |     --model $model \
 90 |     --quick_test $quick_test \
 91 |     --seed $seed \
 92 |     --temperature $temperature \
 93 |     --use_sub_questions $use_sub_questions \
 94 |     --turbo_system_message "You are a helpful assistant that answers the following questions with proper citations."
 95 | 
 96 |   echo "*****************************"
 97 |   echo "finish run.py"
 98 |   echo "*****************************"
 99 | fi
100 | 
101 | eval_f=${output_file%.json}
102 | eval_result_fp=${eval_f}.score
103 | if [ ! -f $eval_result_fp ]; then
104 |     echo "*****************************"
105 |     echo "start eval.py"
106 |     echo "*****************************"
107 |     
108 |     python eval.py \
109 |       --f $output_file \
110 |       --eval_metric $eval_metric
111 | 
112 |     echo "*****************************"
113 |     echo "finish eval.py"
114 |     echo "*****************************"
115 | fi
116 | 
117 | 
118 | ##################### asqa Iterative_retrieval; new-passage #####################
119 | export CUDA_VISIBLE_DEVICES=0
120 | max_iteration=4
121 | dataset_name=asqa
122 | use_sub_questions=1
123 | use_title=1
124 | used_doc_field=summary_use_sub
125 | openai_model_name=gpt-3.5-turbo-0301
126 | # Args for retrieval
127 | input_file=data/asqa_gtr_top100.json
128 | retriever=bge-large-en-v1.5
129 | update_prompt_file=update-using-missing-info-for-new-passage
130 | update_query_using_missing_info_from_question_and_psgs=1
131 | corpus_path=PATH_TO_YOUR_OWN_WIKI_CORPUS
132 | # Args for generating used field.
133 | prompt_style=summary
134 | target_used_field=summary_use_sub
135 | max_tokens=150
136 | # Args for reranker
137 | position=head
138 | reranking_prompt_file=select_no_up_to
139 | doc_num=50
140 | window_size=20
141 | # Args for filtration
142 | demo_file=prompts/${dataset_name}_demo.json
143 | filtration_prompt_file=filter_question_with_demo
144 | filtration_method=judgment
145 | 
146 | python Iterative_retrieval.py \
147 |     --max_iteration $max_iteration \
148 |     --dataset_name $dataset_name \
149 |     --use_sub_questions $use_sub_questions \
150 |     --use_title $use_title \
151 |     --used_doc_field $used_doc_field \
152 |     --openai_model_name $openai_model_name \
153 |     --input_file $input_file \
154 |     --retriever $retriever \
155 |     --update_prompt_file $update_prompt_file \
156 |     --update_query_using_missing_info_from_question_and_psgs $update_query_using_missing_info_from_question_and_psgs \
157 |     --corpus_path $corpus_path \
158 |     --prompt_style $prompt_style \
159 |     --target_used_field $target_used_field \
160 |     --max_tokens $max_tokens \
161 |     --position $position \
162 |     --reranking_prompt_file $reranking_prompt_file \
163 |     --doc_num $doc_num \
164 |     --window_size $window_size \
165 |     --demo_file $demo_file \
166 |     --filtration_prompt_file $filtration_prompt_file \
167 |     --filtration_method $filtration_method
168 | 
169 | # run_eval
170 | export CUDA_VISIBLE_DEVICES=0
171 | shot=1 
172 | openai_api=1 
173 | num_samples=1 
174 | data_file=iter_retrieval_50/asqa_final_data/final_data_bge-large-en-v1.5_max_iteration-4_update-using-missing-info-for-new-passage_head.json
175 | ndoc=5
176 | openai_multi_thread=10
177 | model=gpt-3.5-turbo-0301
178 | quick_test=0 
179 | seed=42 
180 | temperature=0 
181 | eval_metric=default 
182 | use_sub_questions=1
183 | # Other args
184 | dataset_name=asqa 
185 | 
186 | prompt_file=prompts/${dataset_name}_default.json
187 | output_dir=iter_retrieval_50/${dataset_name}_max-4_bge-large-en-v1.5-new-passage_llm-select-head_run_eval
188 | mkdir $output_dir -p
189 | output_file=${output_dir}/run_output.json
190 | 
191 | if [ ! -f "${output_file}" ]; then
192 |   echo "*****************************"
193 |   echo "start run.py"
194 |   echo "*****************************"
195 | 
196 |   python run.py \
197 |     --shot $shot \
198 |     --openai_api $openai_api \
199 |     --prompt_file $prompt_file \
200 |     --output_fp $output_file \
201 |     --dataset_name $dataset_name \
202 |     --num_samples $num_samples \
203 |     --data_file $data_file \
204 |     --ndoc $ndoc \
205 |     --openai_multi_thread $openai_multi_thread \
206 |     --model $model \
207 |     --quick_test $quick_test \
208 |     --seed $seed \
209 |     --temperature $temperature \
210 |     --use_sub_questions $use_sub_questions \
211 |     --turbo_system_message "You are a helpful assistant that answers the following questions with proper citations."
212 | 
213 |   echo "*****************************"
214 |   echo "finish run.py"
215 |   echo "*****************************"
216 | fi
217 | 
218 | eval_f=${output_file%.json}
219 | eval_result_fp=${eval_f}.score
220 | if [ ! -f $eval_result_fp ]; then
221 |     echo "*****************************"
222 |     echo "start eval.py"
223 |     echo "*****************************"
224 |     
225 |     python eval.py \
226 |       --f $output_file \
227 |       --eval_metric $eval_metric
228 | 
229 |     echo "*****************************"
230 |     echo "finish eval.py"
231 |     echo "*****************************"
232 | fi
233 | 


--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
  1 | import logging
  2 | logger = logging.getLogger(__name__)
  3 | logger.setLevel(logging.INFO)
  4 | 
  5 | import torch
  6 | import re
  7 | import os
  8 | import string
  9 | import time
 10 | import pickle
 11 | 
 12 | 
 13 | def normalize_answer(s):
 14 |     def remove_articles(text):
 15 |         return re.sub(r"\b(a|an|the)\b", " ", text)
 16 | 
 17 |     def white_space_fix(text):
 18 |         return " ".join(text.split())
 19 | 
 20 |     def remove_punc(text):
 21 |         exclude = set(string.punctuation)
 22 |         return "".join(ch for ch in text if ch not in exclude)
 23 | 
 24 |     def lower(text):
 25 |         return text.lower()
 26 | 
 27 |     return white_space_fix(remove_articles(remove_punc(lower(s))))
 28 | 
 29 | 
 30 | def remove_citations(sent):
 31 |     return re.sub(r"\[\d+", "", re.sub(r" \[\d+", "", sent)).replace(" |", "").replace("]", "")
 32 | 
 33 | 
 34 | def get_max_memory():
 35 |     """Get the maximum memory available for the current GPU for loading models."""
 36 |     free_in_GB = int(torch.cuda.mem_get_info()[0]/1024**3)
 37 |     max_memory = f'{free_in_GB-6}GB'
 38 |     n_gpus = torch.cuda.device_count()
 39 |     max_memory = {i: max_memory for i in range(n_gpus)}
 40 |     return max_memory
 41 | 
 42 | 
 43 | def make_doc_prompt(doc, doc_id, doc_prompt, use_shorter=None):
 44 |     # For doc prompt:
 45 |     # - {ID}: doc id (starting from 1)
 46 |     # - {T}: title
 47 |     # - {P}: text
 48 |     # use_shorter: None, "summary", or "extraction"
 49 | 
 50 |     text = doc['text']
 51 |     if use_shorter is not None:
 52 |         text = doc[use_shorter]
 53 |     return doc_prompt.replace("{T}", doc["title"]).replace("{P}", text).replace("{ID}", str(doc_id+1))
 54 | 
 55 | 
 56 | def get_shorter_text(item, docs, ndoc, key):
 57 |     doc_list = []
 58 |     for item_id, item in enumerate(docs):
 59 |         if key not in item:
 60 |             if len(doc_list) == 0:
 61 |                 # If there aren't any document, at least provide one (using full text)
 62 |                 item[key] = item['text']
 63 |                 doc_list.append(item)
 64 |             logger.warn(f"No {key} found in document. It could be this data do not contain {key} or previous documents are not relevant. This is document {item_id}. This question will only have {len(doc_list)} documents.")
 65 |             break
 66 |         if "irrelevant" in item[key] or "Irrelevant" in item[key]:
 67 |             continue
 68 |         doc_list.append(item)
 69 |         if len(doc_list) >= ndoc:
 70 |             break
 71 |     return doc_list
 72 | 
 73 | 
 74 | def make_demo(item, prompt, ndoc=None, doc_prompt=None, instruction=None, use_shorter=None, test=False, use_sub_questions: int=0):
 75 |     # For demo prompt
 76 |     # - {INST}: the instruction
 77 |     # - {D}: the documents
 78 |     # - {Q}: the question
 79 |     # - {A}: the answers
 80 |     # ndoc: number of documents to put in context
 81 |     # use_shorter: None, "summary", or "extraction"
 82 | 
 83 |     # Use sub questions for asqa.
 84 |     if use_sub_questions and 'qa_pairs' in item:
 85 |         questions = list(map(lambda x: x['question'], list(item['qa_pairs'])))
 86 |     else:
 87 |         questions = [item['question']]
 88 |     prompt = prompt.replace("{INST}", instruction).replace("{Q}", '\n'.join(questions))
 89 |     if "{D}" in prompt:
 90 |         if ndoc == 0:
 91 |             prompt = prompt.replace("{D}\n", "") # if there is no doc we also delete the empty line
 92 |         else:
 93 |             doc_list = get_shorter_text(item, item["docs"], ndoc, use_shorter) if use_shorter is not None else item["docs"][:ndoc]
 94 |             text = "".join([make_doc_prompt(doc, doc_id, doc_prompt, use_shorter=use_shorter) for doc_id, doc in enumerate(doc_list)])
 95 |             prompt = prompt.replace("{D}", text)
 96 | 
 97 |     if not test:
 98 |         answer = "\n" + "\n".join(item["answer"]) if isinstance(item["answer"], list) else item["answer"]
 99 |         prompt = prompt.replace("{A}", "").rstrip() + answer
100 |     else:
101 |         prompt = prompt.replace("{A}", "").rstrip() # remove any space or \n
102 | 
103 |     return prompt
104 | 
105 | 
106 | def load_model(model_name_or_path, dtype=torch.float16, int8=False, reserve_memory=10):
107 |     # Load a huggingface model and tokenizer
108 |     # dtype: torch.float16 or torch.bfloat16
109 |     # int8: whether to use int8 quantization
110 |     # reserve_memory: how much memory to reserve for the model on each gpu (in GB)
111 | 
112 |     # Llama: set up the root dir
113 |     open_source_models = ["llama", "alpaca", "vicuna", "oasst"]
114 |     if any([m in model_name_or_path for m in open_source_models]):
115 |         model_name_or_path = os.path.join(os.environ["LLAMA_ROOT"], model_name_or_path)
116 | 
117 |     # Load the FP16 model
118 |     from transformers import AutoModelForCausalLM, AutoTokenizer
119 |     logger.info(f"Loading {model_name_or_path} in {dtype}...")
120 |     if int8:
121 |         logger.warn("Use LLM.int8")
122 |     start_time = time.time()
123 |     model = AutoModelForCausalLM.from_pretrained(
124 |         model_name_or_path,
125 |         device_map='auto',
126 |         torch_dtype=dtype,
127 |         max_memory=get_max_memory(),
128 |         load_in_8bit=int8,
129 |     )
130 |     logger.info("Finish loading in %.2f sec." % (time.time() - start_time))
131 | 
132 |     # Load the tokenizer
133 |     tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=False)
134 | 
135 |     # Fix OPT bos token problem in HF
136 |     if "opt" in model_name_or_path:
137 |         tokenizer.bos_token = "<s>"
138 |     tokenizer.padding_side = "left"
139 | 
140 |     return model, tokenizer
141 | 
142 | 
143 | # Save p_embeddings to local position.
144 | def save_embeddings(embeddings, file_path):
145 |     with open(file_path, 'wb') as file:
146 |         pickle.dump(embeddings, file, protocol=4)
147 | 
148 | 
149 | # Load local p_embeddings
150 | def load_embeddings(file_path):
151 |     with open(file_path, 'rb') as file:
152 |         embeddings = pickle.load(file)
153 |     return embeddings
154 | 
155 | 
156 | def get_demonstration(demo_data) -> str:
157 |     """
158 |     Args
159 |     ----
160 |     demo_data: Dict
161 |         Data for creating demonstration prompt.
162 |     
163 |     Returns
164 |     -------
165 |     demos: str
166 |         Demonstration prompt.
167 |     """
168 |     if 'qa_pairs' in demo_data:
169 |         logger.warning("Load sub questions when getting demostration.")
170 |         questions = list(map(lambda x: x['question'], list(demo_data['qa_pairs'])))
171 |         question = "\n".join(questions)
172 |         answer = demo_data["answer"]
173 |     else:
174 |         doc = demo_data["demos"][0]
175 |         question = doc['question']
176 |         answer = doc["answer"].replace('[1]', '').replace('[2]', '').replace('[3]', '').replace('[4]', '').replace('[5]', '')
177 |     
178 |     demos = f"Question: {question}\nAnswer: {answer}"
179 |     return demos
180 | 
181 | 
182 | def get_messages(questions: str, prompt_style: str, doc):
183 |     """
184 |     Args
185 |     ----
186 |     questions: str
187 |         Given questions.
188 |     prompt_style: str
189 |         Set the type of user's content.
190 |     doc: List[Dict]
191 |         One of documents that belong to a problem.
192 |     """
193 |     if prompt_style == 'summary':
194 |         messages = [
195 |             {'role': 'system', 'content': "You are a helpful assistant."},
196 |             {'role': 'user', 'content': f"Summarize the following document within 50 words for the given question(s). Return \"irrelevant\" if the document is irrelevant to the question(s). Try to keep all the important dates, numbers, and names.\nQuestion(s):\n{questions}\n\nDocument:\nTitle: {doc['title']}\nText: {doc['text']}\n\nSummary:"}
197 |         ]
198 |     elif prompt_style == 'answer':
199 |         messages = [
200 |             {'role': 'system', 'content': "You are a helpful assistant."},
201 |             {'role': 'user', 'content': f"Answer the given question(s) using the following document. Return \"irrelevant\" if the document is irrelevant to the question(s). Try to keep all the important dates, numbers, and names.\n\nQuestion(s):\n{questions}\n\nDocument:\nTitle: {doc['title']}\nText: {doc['text']}\n\nAnswer:"}
202 |         ]
203 |     else:
204 |         raise NotImplementedError
205 |     
206 |     return messages


--------------------------------------------------------------------------------
/prompts/qampari_default.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "instruction": "Instruction: Provide a list of accurate answers for the given question using only the provided search results (some of which might be irrelevant) and cite them properly. Always cite one and only one document for each answer. Separate answers by commas. For questions that have more than 5 answers, write at least 5 answers.",
  3 |     "demo_sep": "\n\n\n",
  4 |     "demo_prompt": "{INST}\n\nQuestion: {Q}\n\n{D}\nAnswer: {A}",
  5 |     "doc_prompt": "Document [{ID}](Title: {T}): {P}\n",
  6 |     "demos": [
  7 |         {
  8 |             "question": "Which books were written by Nevil Shute?",
  9 |             "answer": "Marazan [1], Stephen Morris [1], Beyond the Black Stump [2], Lonely Road [2], The Chequer Board [2], In the Wet [2], Trustee from the Toolroom [2], Round the Bend [2], No Highway [3], Ruined City [3], On the Beach [3].",
 10 |             "docs": [
 11 |                 {
 12 |                     "title": "Nevil Shute",
 13 |                     "text": "early stages. My congratulations.\" His celebrity as a writer caused the Ministry of Information to send him to the Normandy Landings on 6 June 1944 and later to Burma as a correspondent. He finished the war with the rank of lieutenant commander in the Royal Navy Volunteer Reserves (RNVR). Shute's first novel, \"Stephen Morris\", was written in 1923, but not published until 1961. His first published novel was \"Marazan\", which came out in 1926. After that he averaged one novel every two years through the 1950s, with the exception of a six-year hiatus while he was establishing his own aircraft"
 14 |                 },
 15 |                 {
 16 |                     "title": "Nevil Shute",
 17 |                     "text": "theme is the bridging of social barriers such as class (\"Lonely Road\" and \"Landfall\"), race (\"The Chequer Board\"), or religion (\"Round the Bend\"). The Australian novels are individual hymns to that country, with subtle disparagement of the mores of the United States (\"Beyond the Black Stump\") and overt antipathy towards the post-World War II socialist government of Shute's native Britain (\"The Far Country\" and \"In the Wet\"). Shute's heroes tended to be like himself: middle class solicitors, doctors, accountants, bank managers, engineers, generally university graduates. However (as in \"Trustee from the Toolroom\"), Shute valued the honest artisans and their social"
 18 |                 },
 19 |                 {
 20 |                     "title": "Nevil Shute",
 21 |                     "text": "construction company, Airspeed Ltd. His popularity grew slowly with each novel, but he became much more famous after the publication of \"On the Beach\" in 1957. Shute's novels are written in a simple, highly readable style, with clearly delineated plot lines. Where there is a romantic element, sex is referred to only obliquely. Many of the stories are introduced by a narrator who is not a character in the story. The most common theme in Shute's novels is the dignity of work, spanning all classes, whether an Eastern European bar \"hostess\" (\"Ruined City\") or brilliant boffin (\"No Highway\"). Another recurrent"
 22 |                 },
 23 |                 {
 24 |                     "title": "The Chequer Board",
 25 |                     "text": "the Burmese people\", both of which are central to the book's story. Shute was concerned that sales of the book in the United States would be negatively impacted by the book's open-minded handling of racial issues; as it turned out, sales soared. Shute and his wife traveled the U.S. on Greyhound buses to \"\"get in touch with the man on the street,\"\" finding the experience refreshing. Afterwards he wrote \"\"Sincerity is the first attribute for making money in the business of writing novels.\"\" The Chequer Board The Chequer Board is a novel by Nevil Shute, first published in the United"
 26 |                 },
 27 |                 {
 28 |                     "title": "In the Wet",
 29 |                     "text": "had used the idea of multiple votes for merit in his short story \"The Curious Republic of Gondour\". In the Wet In The Wet is a novel by Nevil Shute that was first published in the United Kingdom in 1953. It contains many of the typical elements of a hearty and adventurous Shute yarn such as flying, the future, mystic states, and ordinary people doing extraordinary things. The story is opened by its initial narrator \u2013 an Anglican priest in the Bush Brotherhood named Roger Hargreaves \u2013 who describes his ordinary circumstances in a large parish of the Australian outback"
 30 |                 }
 31 |             ]
 32 |         },
 33 |         {
 34 |             "question": "Which film has Gong Li as a member of its cast?",
 35 |             "answer": "The Story of Qiu Ju [1], Farewell My Concubine [2], Flirting Scholar [2], The Monkey King 2 [3], Mulan [3], Saturday Fiction [3], Coming Home [3].",
 36 |             "docs": [
 37 |                 {
 38 |                     "title": "Gong Li",
 39 |                     "text": "Gong Li Gong Li (born 31 December 1965) is a Chinese-born Singaporean film actress. She achieved international prominence through her close collaborations with Chinese director Zhang Yimou and won the Volpi Cup for Best Actress at Venice for her performance in his 1992 film \"The Story of Qiu Ju\". She has been credited with helping to bring Chinese cinema to prominence in Europe and the United States. In 2006, she was voted the most beautiful woman in China. Gong has won numerous accolades for her work as an actress; she won the New York Film Critics Circle Award for Best"
 40 |                 },
 41 |                 {
 42 |                     "title": "Gong Li",
 43 |                     "text": "making her realize that she has assisted the dark cynical system. In 1993, she received a New York Film Critics Circle award for her role in \"Farewell My Concubine\" (1993). Directed by Chen Kaige, the film was her first major role with a director other than Zhang Yimou. In the same year, she was awarded with the Berlinale Camera at the 43rd Berlin International Film Festival. \"Premiere\" magazine ranked her performance in \"Farewell My Concubine\" as the 89th greatest performance of all time. She also worked with renowned director Stephen Chow in comedy films \"\" (1991) and \"Flirting Scholar\" (1993)."
 44 |                 },
 45 |                 {
 46 |                     "title": "Gong Li",
 47 |                     "text": "International Film Festival. Later that same year, she reunited with Zhang Yimou for the film \"Coming Home\", which is set during the throes of the Cultural Revolution; this film was their first collaboration since 2006. In 2016, Gong took on her first action role in \"The Monkey King 2\", playing the White Bone Demon. In 2018, Gong was cast in Lou Ye's period drama \"Saturday Fiction\", where she plays an actress who is working undercover gathering intelligence for the Allies. That year, she was also cast in the live-action adaptation of the 1998 Disney animated film \"Mulan\", as an unspecified"
 48 |                 },
 49 |                 {
 50 |                     "title": "Zhang Yimou",
 51 |                     "text": "in Zhang's earlier films. \"Raise the Red Lantern\" was nominated in the Best Foreign Language Film category at the 1992 Academy Awards, becoming the second Chinese film to earn this distinction (after Zhang's \"Ju Dou\"). It eventually lost out to Gabriele Salvatores's \"Mediterraneo\". Zhang's next directorial work, \"The Story of Qiu Ju\", in 1992, once again starring Gong Li in the lead role. The film, which tells the tale of a peasant woman seeking justice for her husband after he was beaten by a village official, was a hit at film festivals and won the Golden Lion award at the"
 52 |                 },
 53 |                 {
 54 |                     "title": "Gong Li",
 55 |                     "text": "Gong Li Gong Li (born 31 December 1965) is a Chinese-born Singaporean film actress. She achieved international prominence through her close collaborations with Chinese director Zhang Yimou and won the Volpi Cup for Best Actress at Venice for her performance in his 1992 film \"The Story of Qiu Ju\". She has been credited with helping to bring Chinese cinema to prominence in Europe and the United States. In 2006, she was voted the most beautiful woman in China. Gong has won numerous accolades for her work as an actress; she won the New York Film Critics Circle Award for Best"
 56 |                 }
 57 |             ]
 58 |         },
 59 |         {
 60 |             "question": "In which years did Patti LaBelle publish music?",
 61 |             "answer": "2006 [1], 1977 [2], 2004 [3], 2005 [3], 2000 [3], 2006 [3].",
 62 |             "docs": [
 63 |                 {
 64 |                     "title": "The Gospel According to Patti LaBelle",
 65 |                     "text": "The Gospel According to Patti LaBelle The Gospel According to Patti LaBelle is the first gospel album released by singer Patti LaBelle, released in November 2006. This project began three years ago when Patti's late musical director and close friend Budd Ellison told a skeptical LaBelle that \"it's now or never, Patti.\" The album is dedicated to his memory as he succumbed to prostate cancer before the album saw a release. The album was released on November 21, 2006 through indie label Umbrella/Bungalow Records, also home to Carl Thomas, Rodney Jerkins, Dean \"DC\" Charles, and other artists. \"The Gospel According"
 66 |                 },
 67 |                 {
 68 |                     "title": "Patti LaBelle (album)",
 69 |                     "text": "scaled the high sixties on the \"Billboard\" R&B chart, it soon became one of her famous show-stoppers while performing the song. LaBelle performed the song at her first solo concert in London, getting a standing ovation, which helped to give LaBelle motivation to continue her career. The album, when released, performed successfully, reaching number 62 on the \"Billboard\" 200 and number 31 on the R&B albums chart, while critics hailed the album. Patti LaBelle (album) Patti LaBelle is the debut solo album by singer Patti LaBelle, released in 1977. The first album LaBelle recorded after sixteen years fronting the band"
 70 |                 },
 71 |                 {
 72 |                     "title": "Patti LaBelle",
 73 |                     "text": "win. In 2000, LaBelle released her final MCA album, \"When a Woman Loves\", before signing with Def Soul Classics to release the 2004 album, \"Timeless Journey\". Following the release of her 2005 covers album, \"Classic Moments\", LaBelle engaged in a rivalry with Antonio \"L.A.\" Reid over the direction of her career, leading to her leaving the label.In the same year, the World Music Awards recognized her years in the music business by awarding her the Legend Award. In 2006, she released her first gospel album, \"The Gospel According to Patti LaBelle\" on the Bungalo label, the album later peaking at"
 74 |                 },
 75 |                 {
 76 |                     "title": "Patti LaBelle",
 77 |                     "text": "Patti LaBelle Patti LaBelle (born Patricia Louise Holt; May 24, 1944) is an American singer, actress, and entrepreneur. LaBelle began her career in the early 1960s as lead singer and front woman of the vocal group, Patti LaBelle and the Bluebelles. Following the group's name change to Labelle in the early 1970s, they released the iconic disco song \"Lady Marmalade\" and the group later became the first African-American vocal group to land the cover of \"Rolling Stone\" magazine. After the group split in 1976, LaBelle began a successful solo career, starting with her critically acclaimed debut album, which included the"
 78 |                 },
 79 |                 {
 80 |                     "title": "The Gospel According to Patti LaBelle",
 81 |                     "text": "Billboard's Top Gospel Albums chart for 17 weeks. \"Where Love Begins,\" a duet with Yolanda Adams was played frequently on R&B and gospel radio stations and debuted at #68 on Billboard's Hot R&B/Hip-Hop tracks. The second single \"Anything\" featuring Kanye West, Mary Mary and Consequence hit #64 on Billboards Hot R&B/Hip-Hop tracks. In 2008, the album was nominated for a Dove Award for Contemporary Gospel Album of the Year at the 39th GMA Dove Awards. The Gospel According to Patti LaBelle The Gospel According to Patti LaBelle is the first gospel album released by singer Patti LaBelle, released in November"
 82 |                 }
 83 |             ]
 84 |         },
 85 |         {
 86 |             "question": "Glenn Ford was a member of cast in which film?",
 87 |             "answer": "So Ends Our Night [1], Heaven with a Barbed Wire Fence [1], Happy Birthday to Me [2], The Greatest Gift [2], The Gift [2], The Brotherhood of the Bell [3].",
 88 |             "docs": [
 89 |                 {
 90 |                     "title": "Glenn Ford",
 91 |                     "text": "name came from his father's hometown of Glenford, Alberta. His first major movie part was in the 1939 film, \"Heaven with a Barbed Wire Fence\". Top Hollywood director John Cromwell was impressed enough with his work to borrow him from Columbia for the independently produced drama, \"So Ends Our Night\" (1941), where Ford delivered a poignant portrayal of a 19-year-old German exile on the run in Nazi-occupied Europe. Working with Academy Award-winning Fredric March and wooing (onscreen) 30-year-old Margaret Sullavan, recently nominated for an Oscar, Ford's shy, ardent young refugee riveted attention even in such stellar company. \"Glenn Ford, a"
 92 |                 },
 93 |                 {
 94 |                     "title": "Glenn Ford",
 95 |                     "text": "were Westerns. He suggested doing a Western series, instead, which resulted in the \"modern-day Western\" series, \"Cade's County\". Ford played southwestern Sheriff Cade for one season (1971\u20131972) in a mix of police mystery and western drama. In \"The Family Holvak\" (1975\u20131976), Ford portrayed a Depression-era preacher in a family drama, reprising the same character he had played in the TV film, \"The Greatest Gift\". In 1978 Ford was host, presenter and narrator of the disaster documentary series 'When Havoc Struck'. In 1981, Ford co-starred with Melissa Sue Anderson in the slasher film \"Happy Birthday to Me\". In 1991, Ford agreed"
 96 |                 },
 97 |                 {
 98 |                     "title": "CBS Thursday Night Movie",
 99 |                     "text": "Night Movie\" opened its fall schedule with the premiere of a low-budget, made-for-TV movie, rather than a proven Hollywood blockbuster guaranteed to lure mass viewership, it became CBS's way of declaring its commitment to product that, although cheaply manufactured, was nevertheless new and topical. In this case, the movie was \"The Brotherhood of the Bell\", and the film's star was Glenn Ford, a movie actor who had never appeared in a television-film. In fact, before shooting on the project even began, Ford had been warned by friends in the industry that he would hate the experience. Instead, the actor reported"
100 |                 },
101 |                 {
102 |                     "title": "The Trouble with Girls (film) ",
103 |                     "text": "with Charlene, but when she refuses to give in, he deceives her and uses the local police force to be sure that she must leave on the train with the rest of the troupe. Cast notes In June 1959 it was announced that Don Mankiewicz would write a screenplay of an unpublished story by Mauri Grashin, Day Keene, and Dwight Babcock. By December 1960, with the project titled \"Chautauqua\", MGM was ready to make the film with Glenn Ford. Rumours circulating in Hollywood at the time stated that Presley would co-star with Ford, Hope Lange, and Arthur O'Connell, but nothing"
104 |                 },
105 |                 {
106 |                     "title": "Trouble in the Glen",
107 |                     "text": "Mel Ferrer. It was Orson Welles' fifth British movie in six months. Filming started 15 December 1953. The film received very poor reviews. Trouble in the Glen Trouble in the Glen is a 1954 British comedy film directed by Herbert Wilcox and starring Margaret Lockwood, Orson Welles, Forrest Tucker and Victor McLaglen. It is loosely based on Maurice Walsh's 1950 novel of the same name. It was filmed in Trucolor for Republic Pictures. After moving from South America to the Scottish Highlands, millionaire Sanin Cejador y Mengues (Welles) reassumes the title of laird of Glen Easan, which he inherited from"
108 |                 }
109 |             ]
110 |         }
111 |     ]
112 | }
113 | 


--------------------------------------------------------------------------------
/prompts/asqa_demo.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "qa_pairs": [
  3 |         {
  4 |             "context": "Sounds of Silence is the second studio album by Simon & Garfunkel, released on January 17, 1966. The album's title is a slight modification of the title of the duo's first major hit, \"The Sound of Silence\", which originally was released as \"The Sounds of Silence\". The song had earlier been released in an acoustic version on the album \"Wednesday Morning, 3 A.M.\", and later on the soundtrack to the movie \"The Graduate\". Without the knowledge of Paul Simon or Art Garfunkel, electric guitars, bass and drums were overdubbed by Columbia Records staff producer Tom Wilson on June 15, 1965. This new version was released as a single in September 1965, and opens the album.",
  5 |             "question": "Who is the original artist of sound of silence, the song, released in 1964?",
  6 |             "short_answers": [
  7 |                 "Simon & Garfunkel",
  8 |                 "Paul Simon and Art Garfunkel",
  9 |                 "Art Garfunkel",
 10 |                 "Paul Simon"
 11 |             ],
 12 |             "wikipage": "Sounds of Silence"
 13 |         },
 14 |         {
 15 |             "context": "Sounds of Silence is the second studio album by Simon & Garfunkel, released on January 17, 1966. The album's title is a slight modification of the title of the duo's first major hit, \"The Sound of Silence\", which originally was released as \"The Sounds of Silence\". The song had earlier been released in an acoustic version on the album \"Wednesday Morning, 3 A.M.\", and later on the soundtrack to the movie \"The Graduate\". Without the knowledge of Paul Simon or Art Garfunkel, electric guitars, bass and drums were overdubbed by Columbia Records staff producer Tom Wilson on June 15, 1965. This new version was released as a single in September 1965, and opens the album.",
 16 |             "question": "Who is the original artist of sound of silence, the album?",
 17 |             "short_answers": [
 18 |                 "Simon & Garfunkel",
 19 |                 "Paul Simon and Art Garfunkel",
 20 |                 "Art Garfunkel",
 21 |                 "Paul Simon"
 22 |             ],
 23 |             "wikipage": "Sounds of Silence"
 24 |         },
 25 |         {
 26 |             "context": "\"Sound of Silence\" is a song performed by Australian recording artist Dami Im. Written by Anthony Egizii and David Musumeci of DNA Songs, it is best known as Australia's entry at the Eurovision Song Contest 2016 which was held in Stockholm, Sweden, where it finished 2nd, receiving a total of 511 points. The song also won the Marcel Bezen\u00e7on Award in the composer category. The song was leaked on 10 March 2016, one day before its initial release date. It is Dami Im's fourth Australian top 20 hit and worldwide, it reached the top 40 in more than six countries after the Eurovision Song Contest 2016 Final.",
 27 |             "question": "Who is the original artist of sound of silence, the song, released in 2016?",
 28 |             "short_answers": [
 29 |                 "Dami Im"
 30 |             ],
 31 |             "wikipage": "Sound of Silence (Dami Im song)"
 32 |         }
 33 |     ],
 34 |     "wikipages": [
 35 |         {
 36 |             "title": "The Sound of Silence",
 37 |             "url": "https://en.wikipedia.org/wiki/The%20Sound%20of%20Silence"
 38 |         },
 39 |         {
 40 |             "title": "Sounds of Silence",
 41 |             "url": "https://en.wikipedia.org/wiki/Sounds%20of%20Silence"
 42 |         },
 43 |         {
 44 |             "title": "Sound of Silence (Dami Im song)",
 45 |             "url": "https://en.wikipedia.org/wiki/Sound%20of%20Silence%20%28Dami%20Im%20song%29"
 46 |         }
 47 |     ],
 48 |     "annotations": [
 49 |         {
 50 |             "knowledge": [
 51 |                 {
 52 |                     "content": "Wednesday Morning, 3 A.M. was re-released in January 1966 (to capitalize on their newly found radio success because of the overdubbing of the song \"The Sound of Silence\" in June 1965, adding electric guitars, bass guitar and a drum kit), and reached No. 30 on the Billboard 200...The album was produced by Tom Wilson and engineered by Roy Halee between March 10\u201331, 1964.",
 53 |                     "wikipage": "Wednesday Morning, 3 A.M."
 54 |                 }
 55 |             ],
 56 |             "long_answer": " The original artist of the song sound of silence released in 1966 is Paul Simon and Art Garfunkel. The song had earlier been released in an acoustic version on the album \"Wednesday Morning, 3 A.M.\" which had been produced in 1964. In 2016, Australian recording artist Dami Im recorded a different song by the same name."
 57 |         },
 58 |         {
 59 |             "knowledge": [
 60 |                 {
 61 |                     "content": "A studio audition led to the duo signing a record deal with Columbia Records, and the original acoustic version of the song was recorded in March 1964 at Columbia Studios in New York City and included on their debut album, Wednesday Morning, 3 A.M.. Released on October 19, 1964,[2] the album was a commercial failure and led to the duo disbanding; Simon returned to England, and Art Garfunkel to his studies at Columbia University.",
 62 |                     "wikipage": "The Sound of Silence"
 63 |                 }
 64 |             ],
 65 |             "long_answer": "There are several songs with the title \"Sound of Silence\". Sounds of Silence is the second studio album by Simon & Garfunkel, released on January 17, 1966. The album's title is a slight modification of the title of the duo's first major hit, \"The Sound of Silence\", which was recorded in March 1964 and originally was released as \"The Sounds of Silence\". Another \"Sound of Silence\" is a song performed by Australian recording artist Dami Im, and is best known as Australia's entry at the Eurovision Song Contest 2016."
 66 |         }
 67 |     ],
 68 |     "sample_id": "7089015503030534342",
 69 |     "question": "Who is the original artist of sound of silence?",
 70 |     "docs": [
 71 |         {
 72 |             "id": "2627084",
 73 |             "title": "The Sound of Silence",
 74 |             "text": "The Sound of Silence \"The Sound of Silence\", originally \"The Sounds of Silence\", is a song by the American music duo Simon & Garfunkel. The song was written by Paul Simon over a period of several months in 1963 and 1964. A studio audition led to the duo signing a record deal with Columbia Records, and the song was recorded in March 1964 at Columbia Studios in New York City for inclusion on their debut album, \"Wednesday Morning, 3 A.M.\". Released in October 1964, the album was a commercial failure and led to the duo breaking apart, with Paul Simon",
 75 |             "score": 0.80078125,
 76 |             "summary": "The original artist of \"The Sound of Silence\" is Simon & Garfunkel.",
 77 |             "extraction": "\"The Sound of Silence\" is a song by the American music duo Simon & Garfunkel. The song was written by Paul Simon.",
 78 |             "summary_no_sub": "Who is the original artist of sound of silence? The original artist of \"The Sound of Silence\" is Simon & Garfunkel.",
 79 |             "summary_use_sub": "\"The Sound of Silence\" is a song by Simon & Garfunkel, written by Paul Simon in 1963-64. It was recorded in March 1964 for their debut album \"Wednesday Morning, 3 A.M.\" which was released in October 1964.",
 80 |             "answer_no_sub": "The original artist of \"The Sound of Silence\" is Simon & Garfunkel.",
 81 |             "answer_use_sub": "Original artist of sound of silence, the song, released in 1964: Simon & Garfunkel Original artist of sound of silence, the album: Simon & Garfunkel's debut album \"Wednesday Morning, 3 A.M.\" Original artist of sound of silence, the song, released in 2016: irrelevant"
 82 |         },
 83 |         {
 84 |             "id": "10996254",
 85 |             "title": "Sounds of Silence",
 86 |             "text": "Sounds of Silence Sounds of Silence is the second studio album by Simon & Garfunkel, released on January 17, 1966. The album's title is a slight modification of the title of the duo's first major hit, \"The Sound of Silence\", which originally was released as \"The Sounds of Silence\". The song had earlier been released in an acoustic version on the album \"Wednesday Morning, 3 A.M.\", and later on the soundtrack to the movie \"The Graduate\". Without the knowledge of Paul Simon or Art Garfunkel, electric guitars, bass and drums were overdubbed by Columbia Records staff producer Tom Wilson on",
 87 |             "score": 0.78955078125,
 88 |             "summary": "The document mentions that \"The Sound of Silence\" is a song by Simon & Garfunkel and was released in an acoustic version on the album \"Wednesday Morning, 3 A.M.\" before being overdubbed with electric guitars, bass, and drums. However, it does not state the original artist of the song.",
 89 |             "extraction": "\"The original artist of Sound of Silence is Simon & Garfunkel.\"",
 90 |             "summary_no_sub": "Who is the original artist of sound of silence? The document mentions that \"The Sound of Silence\" was a major hit by Simon & Garfunkel, but does not mention the original artist. Therefore, the answer is \"irrelevant\".",
 91 |             "summary_use_sub": "Who is the original artist of sound of silence, the song, released in 1964? - The original artist of \"The Sound of Silence\" song released in 1964 is Simon & Garfunkel. Who is the original artist of sound of silence, the album? - The original artist of \"Sounds of Silence\" album is Simon & Garfunkel. Who is the original artist of sound of silence, the song, released in 2016? - irrelevant.",
 92 |             "answer_no_sub": "The original artist of \"The Sound of Silence\" is Simon & Garfunkel.",
 93 |             "answer_use_sub": "Who is the original artist of sound of silence, the song, released in 1964? - The original artist of \"The Sound of Silence\" song released in 1964 is Simon & Garfunkel. Who is the original artist of sound of silence, the album? - The album \"Sounds of Silence\" is also by Simon & Garfunkel. Who is the original artist of sound of silence, the song, released in 2016? - Irrelevant."
 94 |         },
 95 |         {
 96 |             "id": "10996258",
 97 |             "title": "Sounds of Silence",
 98 |             "text": "Angeles, California. \"The Sound of Silence\" (electric overdubs) personnel \"The Sound of Silence\" overdubs were recorded at Columbia's \"Studio A\" at 799 Seventh Avenue near 52nd Street by Columbia Records staff producer Tom Wilson on June 15, 1965. Sounds of Silence Sounds of Silence is the second studio album by Simon & Garfunkel, released on January 17, 1966. The album's title is a slight modification of the title of the duo's first major hit, \"The Sound of Silence\", which originally was released as \"The Sounds of Silence\". The song had earlier been released in an acoustic version on the album",
 99 |             "score": 0.7705078125,
100 |             "summary": "\"The Sound of Silence\" was recorded by Simon & Garfunkel and produced by Tom Wilson on June 15, 1965. The song was originally released as \"The Sounds of Silence\" and later appeared on the album \"Sounds of Silence\" in January 1966.",
101 |             "extraction": "\"The original artist of Sound of Silence is Simon & Garfunkel.\"",
102 |             "summary_no_sub": "The document provides information about the recording of \"The Sound of Silence\" in 1965 and the release of the album \"Sounds of Silence\" by Simon & Garfunkel in 1966. However, it does not mention the original artist of \"The Sound of Silence.\"",
103 |             "summary_use_sub": "The document provides information about the recording of \"The Sound of Silence\" song in 1965 and the release of the \"Sounds of Silence\" album in 1966 by Simon & Garfunkel. It does not mention any release of the song in 2016. The original artist of the song and album is Simon & Garfunkel.",
104 |             "answer_no_sub": "The original artist of \"The Sound of Silence\" is Simon & Garfunkel.",
105 |             "answer_use_sub": "Original artist of sound of silence, the song, released in 1964 is irrelevant as the document only mentions the recording of the electric overdubs of the song in 1965. The original artist of sound of silence, the album, is Simon & Garfunkel. The original artist of sound of silence, the song, released in 2016 is irrelevant as the document only mentions the original release of the song in an acoustic version on an album."
106 |         },
107 |         {
108 |             "id": "364634",
109 |             "title": "Simon & Garfunkel",
110 |             "text": "Simon & Garfunkel Simon & Garfunkel were an American folk rock duo consisting of singer-songwriter Paul Simon and singer Art Garfunkel. They were one of the bestselling music groups of the 1960s and became counterculture icons of the decade's social revolution, alongside artists such as the Beatles, the Beach Boys, and Bob Dylan. Their biggest hits\u2014including \"The Sound of Silence\" (1964), \"Mrs. Robinson\" (1968), \"The Boxer\" (1969), and \"Bridge over Troubled Water\" (1970)\u2014reached number one on singles charts worldwide. The duo met in elementary school in Queens, New York, in 1953, where they learned to harmonize together and began writing",
111 |             "score": 0.7490234375,
112 |             "summary": "The document mentions Simon & Garfunkel as an American folk rock duo consisting of Paul Simon and Art Garfunkel. They were famous for their hits, including \"The Sound of Silence\" (1964), which reached number one on singles charts worldwide.",
113 |             "extraction": "\"The original artist of Sound of Silence is Simon & Garfunkel, an American folk rock duo consisting of singer-songwriter Paul Simon and singer Art Garfunkel.\"",
114 |             "summary_no_sub": "The document is irrelevant to the question.",
115 |             "summary_use_sub": "Simon & Garfunkel were an American folk rock duo consisting of Paul Simon and Art Garfunkel. They released the song \"The Sound of Silence\" in 1964, which became one of their biggest hits. The document does not mention the album or the artist who released the song in 2016.",
116 |             "answer_no_sub": "The original artist of \"Sound of Silence\" is Simon & Garfunkel.",
117 |             "answer_use_sub": "Who is the original artist of sound of silence, the song, released in 1964? - Simon & Garfunkel Who is the original artist of sound of silence, the album? - irrelevant Who is the original artist of sound of silence, the song, released in 2016? - irrelevant"
118 |         },
119 |         {
120 |             "id": "8156383",
121 |             "title": "Simon & Garfunkel discography",
122 |             "text": "Simon & Garfunkel discography Simon & Garfunkel, an American singer-songwriter duo, has released five studio albums, fifteen compilation albums, four live albums, one extended play, 26 singles, one soundtrack, and four box sets since 1964. Paul Simon and Art Garfunkel first formed a duo in 1957 as Tom & Jerry, before separating and later reforming as Simon & Garfunkel. Simon & Garfunkel's debut album, \"Wednesday Morning, 3 A.M.\", was released on October 19, 1964. Initially a flop, it was re-released two years later with the new version of the single \"The Sound of Silence\", which was overdubbed with electric instruments",
123 |             "score": 0.72314453125,
124 |             "summary_no_sub": "Who is the original artist of sound of silence? Simon & Garfunkel, an American singer-songwriter duo, released \"The Sound of Silence\" in 1964.",
125 |             "summary_use_sub": "Simon & Garfunkel, a singer-songwriter duo, released \"The Sound of Silence\" as a single in 1964. The album it was featured on was \"Wednesday Morning, 3 A.M.\" There is no mention of a release of the song in 2016.",
126 |             "answer_no_sub": "The original artist of \"The Sound of Silence\" is Simon & Garfunkel.",
127 |             "answer_use_sub": "Original artist of sound of silence, the song, released in 1964: Simon & Garfunkel Original artist of sound of silence, the album: Simon & Garfunkel (the album is not mentioned in the document, but the song was included in their debut album \"Wednesday Morning, 3 A.M.\") Original artist of sound of silence, the song, released in 2016: irrelevant"
128 |         }
129 |     ],
130 |     "answer": "There are several songs with the title \"Sound of Silence\". Sounds of Silence is the second studio album by Simon & Garfunkel, released on January 17, 1966. The album's title is a slight modification of the title of the duo's first major hit, \"The Sound of Silence\", which was recorded in March 1964 and originally was released as \"The Sounds of Silence\". Another \"Sound of Silence\" is a song performed by Australian recording artist Dami Im, and is best known as Australia's entry at the Eurovision Song Contest 2016.",
131 |     "judgment": "[YES]"
132 | }


--------------------------------------------------------------------------------
/openai_account_manager.py:
--------------------------------------------------------------------------------
  1 | import logging
  2 | from typing import Union
  3 | import openai
  4 | import fcntl
  5 | import threading
  6 | import tqdm
  7 | from multi_thread_openai_api_call import MyThread
  8 | 
  9 | logger = logging.getLogger()
 10 | 
 11 | 
 12 | class OpenAI_Account_Manager:
 13 |     _instance = None
 14 | 
 15 |     def __new__(cls, *args, **kwargs):
 16 |         if cls._instance is None:
 17 |             cls._instance = object.__new__(cls)
 18 | 
 19 |         return cls._instance
 20 | 
 21 |     def __init__(self, used_account_fp, all_account_fp):
 22 |         self.used_account_fp = used_account_fp
 23 |         self.all_account_fp = all_account_fp
 24 | 
 25 |         used_account_f = open(used_account_fp, 'r')
 26 |         used_account = list(map(lambda x: x.strip().split('----'), used_account_f.readlines()))
 27 |         used_account_f.close()
 28 | 
 29 |         all_account_f = open(all_account_fp, 'r')
 30 |         all_account = list(map(lambda x: x.strip().split('----'), all_account_f.readlines()))
 31 |         all_account_f.close()
 32 | 
 33 |         used_account_key = list(map(lambda x: x[-1], used_account))
 34 | 
 35 |         all_account = list(filter(lambda x: x[-1] not in used_account_key, all_account))
 36 | 
 37 |         self.used_account = used_account
 38 |         self.all_account = all_account
 39 | 
 40 |         openai.api_key = self.all_account[0][-1]
 41 |         logger.info(
 42 |             'successfully build OpenAI_Account_Manager, now the number of available accounts is {} and now api_key is {}'.format(
 43 |                 len(self.all_account), self.all_account[0][-1]))
 44 | 
 45 |     def use_next_account(self):
 46 |         self.used_account.append(self.all_account[0])
 47 |         del self.all_account[0]
 48 |         with open(self.used_account_fp, 'a') as tmp_used_account_f:
 49 |             fcntl.fcntl(tmp_used_account_f.fileno(), fcntl.LOCK_EX)
 50 |             print('----'.join(self.used_account[-1]), file=tmp_used_account_f)
 51 |             logger.info(
 52 |                 'account:[{}, {}, {}] runs out. so use next.'.format(self.used_account[-1][0], self.used_account[-1][1],
 53 |                                                                      self.used_account[-1][2]))
 54 |         openai.api_key = self.all_account[0][-1]
 55 | 
 56 | 
 57 | class OpenAI_Account_Manager_MultiThread:
 58 |     _instance = None
 59 | 
 60 |     def __new__(cls, *args, **kwargs):
 61 |         if cls._instance is None:
 62 |             cls._instance = object.__new__(cls)
 63 | 
 64 |         return cls._instance
 65 | 
 66 |     def __init__(self, used_account_fp, all_account_fp):
 67 |         self.now_account_idx = 0
 68 | 
 69 | 
 70 |         self.used_account_fp = used_account_fp
 71 |         self.all_account_fp = all_account_fp
 72 | 
 73 |         used_account_f = open(used_account_fp, 'r')
 74 |         used_account = list(map(lambda x: x.strip().split('----'), used_account_f.readlines()))
 75 |         used_account_f.close()
 76 | 
 77 |         all_account_f = open(all_account_fp, 'r')
 78 |         all_account = list(map(lambda x: x.strip().split('----'), all_account_f.readlines()))
 79 |         all_account_f.close()
 80 | 
 81 |         used_account_key = list(map(lambda x: x[-1], used_account))
 82 | 
 83 |         all_account = list(filter(lambda x: x[-1] not in used_account_key, all_account))
 84 | 
 85 |         self.used_account = used_account
 86 |         self.all_account = all_account
 87 |         self.using_account = []
 88 | 
 89 |         # openai.api_key = self.all_account[0][-1]
 90 |         logger.info(
 91 |             'successfully build OpenAI_Account_Manager, now the number of available accounts is {} and now api_key is {}'.format(
 92 |                 len(self.all_account), self.all_account[0][-1]))
 93 |         self.next_account_lock = threading.Lock()
 94 |         self.empty_account_lock = threading.Lock()
 95 | 
 96 |     def get_next_account(self, thread_id, last_empty_account=None):
 97 |         with self.next_account_lock:
 98 |             result = self.all_account[0]
 99 |             self.using_account.append(self.all_account[0])
100 |             del self.all_account[0]
101 |             if last_empty_account != None:
102 |                 self.record_empty_account(last_empty_account)
103 |                 logger.info('Thread {} account: [{}, {}, {}] '
104 |                             'runs out'.format(thread_id,
105 |                                               self.used_account[-1][0],
106 |                                               self.used_account[-1][1],
107 |                                               self.used_account[-1][2]))
108 |                 logger.info('Thread {} use next account: [{}, {}, {}] '
109 |                             .format(thread_id, result[0],
110 |                                     result[1],
111 |                                     result[2]))
112 |             else:
113 |                 logger.info('Thread {} first account: [{}, {}, {}] '
114 |                             .format(thread_id, result[0],
115 |                                     result[1],
116 |                                     result[2]))
117 |             # openai.api_key = self.all_account[0][-1]
118 |             return result
119 | 
120 |     def record_empty_account(self, empty_account):
121 |         with self.empty_account_lock:
122 |             self.used_account.append(empty_account)
123 |             with open(self.used_account_fp, 'a') as tmp_used_account_f:
124 |                 fcntl.fcntl(tmp_used_account_f.fileno(), fcntl.LOCK_EX)
125 |                 print('----'.join(self.used_account[-1]), file=tmp_used_account_f)
126 | 
127 | 
128 | class OpenAI_Account_Manager_MultiThread_One_Acount_Many_Used:
129 |     '''
130 |     OpenAI_Account_Manager_MultiThread_One_Acount_Many_Used: when OpenAI_Account_Manager_MultiThread uses one account for one thread,
131 |     so the number of threads is limited by the number of accounts.
132 |     OpenAI_Account_Manager_MultiThread_One_Acount_Many_Used support multiple threads using one account.
133 |     '''
134 |     _instance = None
135 | 
136 |     def __new__(cls, *args, **kwargs):
137 |         if cls._instance is None:
138 |             cls._instance = object.__new__(cls)
139 | 
140 |         return cls._instance
141 | 
142 | 
143 |     def __init__(self, used_account_fp: str, all_account_fp: str, limit_account_num: int=-1) -> None:
144 |         """Class init
145 |         Args
146 |         ----
147 |         used_account_fp: str
148 |             Path to file containing used OpenAI accounts.
149 |         all_account_fp: str
150 |             Path to file containing all OpenAI accounts.
151 |         limit_account_num: int=-1
152 |             Number of available accounts.
153 |         """
154 |         if hasattr(self, 'inited'):
155 |             return
156 |         self.inited = 1
157 |         self.now_account_idx = 0
158 | 
159 |         self.used_account_fp = used_account_fp
160 |         self.all_account_fp = all_account_fp
161 | 
162 |         used_account_f = open(used_account_fp, 'r')
163 |         used_account = list(map(lambda x: x.strip().split('----'), used_account_f.readlines()))
164 |         used_account_f.close()
165 | 
166 |         all_account_f = open(all_account_fp, 'r')
167 |         all_account = list(map(lambda x: x.strip().split('----'), all_account_f.readlines()))
168 |         all_account_f.close()
169 | 
170 |         used_account_key = []
171 |         for account in used_account:
172 |             if len(account) == 4:
173 |                 used_account_key.append(account[-2])
174 |             else:
175 |                 used_account_key.append(account[-1])
176 | 
177 |         # Keep only usable account.
178 | 
179 |         all_account = list(filter(lambda x: x[-1] not in used_account_key, all_account))
180 |         temp_all_account = []
181 |         for account in all_account:
182 |             if len(account) == 4 and account[-2] not in used_account_key:
183 |                 temp_all_account.append(account)
184 |             elif len(account) == 3 and account[-1] not in used_account_key:
185 |                 temp_all_account.append(account)
186 |             else:
187 |                 raise Exception
188 |         all_account = temp_all_account
189 | 
190 |         if limit_account_num > 0:
191 |             all_account = all_account[:limit_account_num]
192 | 
193 |         self.used_account = used_account
194 |         self.used_account_key = set(used_account_key)
195 |         self.all_account = all_account
196 | 
197 |         self.using_account = []
198 |         self.thread_to_account = {}
199 |         logger.info('successfully build OpenAI_Account_Manager, now the number of available accounts is {}'.format(len(self.all_account)))
200 | 
201 |         self.next_account_lock = threading.Lock()
202 |         self.empty_account_lock = threading.Lock()
203 | 
204 | 
205 |     def get_next_account(self, thread_id, last_empty_account=None):
206 |         with self.next_account_lock:
207 |             available_num = self.check_available_account_num()
208 |             if available_num == 0:
209 |                 logger.info('all accounts used, so..')
210 |                 logger.info('all accounts used, so..')
211 |                 logger.info('all accounts used, so..')
212 |                 logger.info('all accounts used, so..')
213 |                 logger.info('all accounts used, so..')
214 |             else:
215 |                 logger.info('now available accounts : {}'.format(available_num))
216 | 
217 |             while True:
218 |                 result = self.all_account[self.now_account_idx]
219 |                 if result[-1] in self.used_account_key or result[-2] in self.used_account_key:
220 |                     self.now_account_idx += 1
221 |                     self.now_account_idx = self.now_account_idx % len(self.all_account)
222 |                 else:
223 |                     break
224 | 
225 |             result = self.all_account[self.now_account_idx]
226 |             self.now_account_idx += 1
227 |             self.now_account_idx = self.now_account_idx % len(self.all_account)
228 | 
229 |             if last_empty_account != None:
230 |                 self.record_empty_account(last_empty_account)
231 |                 logger.info('Thread {} account: [{}, {}, {}] '
232 |                             'runs out'.format(thread_id,
233 |                                               self.used_account[-1][0],
234 |                                               self.used_account[-1][1],
235 |                                               self.used_account[-1][2]))
236 |                 logger.info('Thread {} use next account: [{}, {}, {}] '
237 |                             .format(thread_id, result[0],
238 |                                     result[1],
239 |                                     result[2]))
240 |             else:
241 |                 logger.info('Thread {} first account: [{}, {}, {}] '
242 |                             .format(thread_id, result[0],
243 |                                     result[1],
244 |                                     result[2]))
245 |             return result
246 | 
247 | 
248 |     def record_empty_account(self, empty_account):
249 |         with self.empty_account_lock:
250 |             self.used_account.append(empty_account)
251 |             if len(empty_account) == 4:
252 |                 self.used_account_key.add(empty_account[-2])
253 |             else:
254 |                 self.used_account_key.add(empty_account[-1])
255 |             with open(self.used_account_fp, 'a') as tmp_used_account_f:
256 |                 fcntl.fcntl(tmp_used_account_f.fileno(), fcntl.LOCK_EX)
257 |                 print('----'.join(self.used_account[-1]), file=tmp_used_account_f)
258 | 
259 | 
260 |     def check_available_account_num(self):
261 |         available_num = 0
262 |         for account in self.all_account:
263 |             if len(account) == 4 and account[-2] not in self.used_account_key:
264 |                 available_num += 1
265 |             elif len(account) == 3 and account[-1] not in self.used_account_key:
266 |                 available_num += 1
267 |             else:
268 |                 raise Exception
269 |         return available_num
270 | 
271 | 
272 | def get_account_manager(
273 |     account_file: str, 
274 |     used_file: str, 
275 |     multi_thread: bool=False, 
276 |     limit_account_num: int=-1
277 | ) -> Union[OpenAI_Account_Manager_MultiThread_One_Acount_Many_Used, OpenAI_Account_Manager]:
278 |     """Get an instance of managing openai accounts.
279 |     Args
280 |     ----
281 |     account_file: str
282 |         The file containing available username, password and key of OpenAI API account.
283 |     used_file: str
284 |         The file containing unavailable username, password and key of OpenAI API account.
285 |     multi_thread: bool=False
286 |         Whether to use multi-thread or not.
287 |     limit_account_num: int=-1
288 |         Number of available accounts.
289 | 
290 |     Returns
291 |     -------
292 |     result: Union[OpenAI_Account_Manager_MultiThread_One_Acount_Many_Used, OpenAI_Account_Manager]
293 |         An instance of class OpenAI_Account_Manager_MultiThread_One_Acount_Many_Used or OpenAI_Account_Manager
294 |     """
295 |     if multi_thread:
296 |         result = OpenAI_Account_Manager_MultiThread_One_Acount_Many_Used(account_file, used_file, limit_account_num=limit_account_num)
297 |     else:
298 |         result = OpenAI_Account_Manager(account_file, used_file)
299 |     return result
300 | 
301 | 
302 | class OpenAI_API_inp_Manager_MultiThread:
303 |     def __init__(self, idx_x_list_to_decode, inference_hyper_parameter):
304 | 
305 |         self.idx_x_list_to_decode = idx_x_list_to_decode
306 | 
307 |         self.inp_lock = threading.Lock()
308 |         self.progress_index = 0
309 | 
310 |         assert type(inference_hyper_parameter) == type([])
311 |         assert type(inference_hyper_parameter[0]) == type({})
312 | 
313 |         if len(inference_hyper_parameter) == 1:
314 |             inference_hyper_parameter = inference_hyper_parameter * len(self.idx_x_list_to_decode)
315 | 
316 |         assert len(self.idx_x_list_to_decode) == len(inference_hyper_parameter), \
317 |             'idx_x_list_to_decode:{}, inference_hyper_parameter:{}' \
318 |             .format(len(idx_x_list_to_decode), len(inference_hyper_parameter))
319 | 
320 |         self.inference_hyper_parameter = inference_hyper_parameter
321 | 
322 |         for i in range(len(inference_hyper_parameter)):
323 |             assert 'max_tokens' in inference_hyper_parameter[i], "{} th inference_hyper_parameter has no max_length"
324 | 
325 | 
326 |     def get_next_gpt_idx_inp(self):
327 |         with self.inp_lock:
328 |             if self.progress_index < len(self.idx_x_list_to_decode):
329 |                 tmp_inp = self.idx_x_list_to_decode[self.progress_index]
330 |                 tmp_hyper_parameter = self.inference_hyper_parameter[self.progress_index]
331 |                 self.progress_index += 1
332 |                 return {'inp': tmp_inp, 'hyper_parameter': tmp_hyper_parameter}
333 |             else:
334 |                 return None
335 | 
336 | 
337 | def openai_llm_generate_multi_thread(eval_data_openai_queries, llm, num_threads, use_tqdm,turbo_system_message=None):
338 |     # hyper_parameter = None
339 |     x_list_to_decode = list(map(lambda x:x['input'],eval_data_openai_queries))
340 |     max_tokens = list(map(lambda x:x['max_tokens'],eval_data_openai_queries))
341 |     idx_x_list_to_decode = list(enumerate(x_list_to_decode))
342 |     # eval_data_openai_queries = list(enumerate(eval_data_openai_queries))
343 |     hyper_parameter = list(map(lambda x:{'max_tokens':x},max_tokens))
344 | 
345 |     inp_manager = OpenAI_API_inp_Manager_MultiThread(idx_x_list_to_decode, hyper_parameter)
346 |     thread_list = []
347 |     account_manager = get_account_manager('openai_account_files/used.txt', 'openai_account_files/accounts.txt', multi_thread=True)
348 |     if use_tqdm:
349 |         pbar = tqdm.tqdm(total=len(idx_x_list_to_decode))
350 |     else:
351 |         pbar = None
352 |     for i in range(num_threads):
353 |         thread_list.append(MyThread(i, llm, account_manager, inp_manager, 1, pbar, turbo_system_message))
354 | 
355 |     for t in thread_list:
356 |         t.start()
357 |     for i, t in enumerate(thread_list):
358 |         t.join()
359 | 
360 |     responses_with_idx = []
361 | 
362 |     for t in thread_list:
363 |         responses_with_idx.extend(t.responses_with_idx)
364 | 
365 |     responses_with_idx.sort(key=lambda x: x[0])
366 | 
367 |     responses = list(map(lambda x: x[1], responses_with_idx))
368 |     return responses
369 | 


--------------------------------------------------------------------------------
/prompts/asqa_default.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "instruction": "Instruction: Write an accurate, engaging, and concise answer for the given question using only the provided search results (some of which might be irrelevant) and cite them properly. Use an unbiased and journalistic tone. Always cite for any factual claim. When citing several search results, use [1][2][3]. Cite at least one document and at most three documents in each sentence. If multiple documents support the sentence, only cite a minimum sufficient subset of the documents.",
  3 |     "demo_sep": "\n\n\n",
  4 |     "demo_prompt": "{INST}\n\nQuestion: {Q}\n\n{D}\nAnswer: {A}",
  5 |     "doc_prompt": "Document [{ID}](Title: {T}): {P}\n",
  6 |     "demos": [
  7 |         {
  8 |             "question": "Which is the most rainy place on earth?",
  9 |             "answer": "Several places on Earth claim to be the most rainy, such as Lloró, Colombia, which reported an average annual rainfall of 12,717 mm between 1952 and 1989, and López de Micay, Colombia, which reported an annual 12,892 mm between 1960 and 2012 [3]. However, the official record is held by Mawsynram, India with an average annual rainfall of 11,872 mm [3], although nearby town Sohra, India, also known as Cherrapunji, holds the record for most rain in a calendar month for July 1861 and most rain in a year from August 1860 to July 1861 [1].",
 10 |             "docs": [
 11 |                 {
 12 |                     "title": "Cherrapunji",
 13 |                     "text": "Cherrapunji Cherrapunji (; with the native name Sohra being more commonly used, and can also be spelled Cherrapunjee or Cherrapunji) is a subdivisional town in the East Khasi Hills district in the Indian state of Meghalaya. It is the traditional capital of aNongkhlaw \"hima\" (Khasi tribal chieftainship constituting a petty state), both known as Sohra or Churra. Cherrapunji has often been credited as being the wettest place on Earth, but for now nearby Mawsynram currently holds that distinction. Cherrapunji still holds the all-time record for the most rainfall in a calendar month for July 1861 and most rain in a year from August 1860 to July 1861, however: it received in"
 14 |                 },
 15 |                 {
 16 |                     "title": "Cherrapunji",
 17 |                     "text": "Radio relay station known as Akashvani Cherrapunji. It broadcasts on FM frequencies. Cherrapunji Cherrapunji (; with the native name Sohra being more commonly used, and can also be spelled Cherrapunjee or Cherrapunji) is a subdivisional town in the East Khasi Hills district in the Indian state of Meghalaya. It is the traditional capital of aNongkhlaw \"hima\" (Khasi tribal chieftainship constituting a petty state), both known as Sohra or Churra. Cherrapunji has often been credited as being the wettest place on Earth, but for now nearby Mawsynram currently holds that distinction. Cherrapunji still holds the all-time record for the most rainfall"
 18 |                 },
 19 |                 {
 20 |                     "title": "Mawsynram",
 21 |                     "text": "Mawsynram Mawsynram () is a village in the East Khasi Hills district of Meghalaya state in north-eastern India, 65 kilometres from Shillong. Mawsynram receives one of the highest rainfalls in India. It is reportedly the wettest place on Earth, with an average annual rainfall of 11,872 mm, but that claim is disputed by Lloró, Colombia, which reported an average yearly rainfall of 12,717 mm between 1952 and 1989 and López de Micay, also in Colombia, which reported an annual 12,892 mm per year between 1960 and 2012. According to the \"Guinness Book of World Records\", Mawsynram received of rainfall in 1985. Mawsynram is located at 25° 18′"
 22 |                 },
 23 |                 {
 24 |                     "title": "Earth rainfall climatology",
 25 |                     "text": "Pacific Northwest, and the Sierra Nevada range are the wetter portions of the nation, with average rainfall exceeding per year. The drier areas are the Desert Southwest, Great Basin, valleys of northeast Arizona, eastern Utah, central Wyoming, eastern Oregon and Washington and the northeast of the Olympic Peninsula. The Big Bog on the island of Maui receives, on average, every year, making it the wettest location in the US, and all of Oceania. The annual average rainfall maxima across the continent lie across the northwest from northwest Brazil into northern Peru, Colombia, and Ecuador, then along the Atlantic coast of"
 26 |                 },
 27 |                 {
 28 |                     "title": "Going to Extremes",
 29 |                     "text": "in the world. Oymyakon in Siberia, where the average winter temperature is −47 °F (− 44 °C). Arica in Chile, where there had been fourteen consecutive years without rain. Fog is the only local source of water. Mawsynram in India, where average annual rainfall is 14 meters, falling within a four-month period in the monsoon season. The rainfall is approximately equal to that of its neighbor Cherrapunji. Dallol in Ethiopia, known as the 'Hell-hole of creation' where the temperature averages 94 °F (34 °C) over the year. In his second series, Middleton visited places without permanent towns, locations where \"survival\""
 30 |                 }
 31 |             ]
 32 |         },
 33 |         {
 34 |             "question": "When did the us break away from england?",
 35 |             "answer": "The United States took the first step towards gaining independence from Great Britain when it declared independence from Great Britain on July 2, 1776 (although the event is now commemorated on July 4, 1776, the date when the Declaration of Independence was officially adopted by Congress) [2]. The Treaty of Paris was later signed on September 3, 1783, formally separating the United States from the British Empire [3].",
 36 |             "docs": [
 37 |                 {
 38 |                     "title": "United States withdrawal from Saudi Arabia",
 39 |                     "text": "United States withdrawal from Saudi Arabia Beginning during Operation Desert Shield in August 1990, while preparing for the Gulf War, the United States sent a large troop contingent to Saudi Arabia. After the war, remnant troops, primarily U.S. Air Force personnel, augmented by a smaller number of coordinating and training personnel from the U.S. Navy, U.S. Army and U.S. Marine Corps remained in Saudi Arabia under the aegis of Joint Task Force Southwest Asia (JTF-SWA), as part of Operation Southern Watch (OSW). The United Kingdom and France also maintained a small contingent of Royal Air Force and French Air Force"
 40 |                 },
 41 |                 {
 42 |                     "title": "Decolonization of the Americas",
 43 |                     "text": "and France has fully \"integrated\" most of its former colonies as fully constituent \"departments\" of France. The United States of America declared independence from Great Britain on July 2, 1776 (although the event is now commemorated on July 4, the date when the Declaration of Independence was officially adopted by Congress), in so doing becoming the first independent, foreign-recognized nation in the Americas and the first European colonial entity to break from its mother country. Britain formally acknowledged American independence in 1783 after its defeat in the American Revolutionary War. Although initially occupying only the land east of the Mississippi"
 44 |                 },
 45 |                 {
 46 |                     "title": "American Revolution",
 47 |                     "text": "second British army at Yorktown in the fall of 1781, effectively ending the war. The Treaty of Paris was signed September 3, 1783, formally ending the conflict and confirming the new nation's complete separation from the British Empire. The United States took possession of nearly all the territory east of the Mississippi River and south of the Great Lakes, with the British retaining control of Canada and Spain taking Florida. Among the significant results of the revolution was the creation of the United States Constitution, establishing a relatively strong federal national government that included an executive, a national judiciary, and"
 48 |                 },
 49 |                 {
 50 |                     "title": "Decolonization",
 51 |                     "text": "accelerate decolonialization and bring an end to the colonial empires of its Western allies, most importantly during the 1956 Suez Crisis, but American military bases were established around the world and direct and indirect interventions continued in Korea, Indochina, Latin America (\"inter alia\", the 1965 occupation of the Dominican Republic), Africa, and the Middle East to oppose Communist invasions and insurgencies. Since the dissolution of the Soviet Union, the United States has been far less active in the Americas, but invaded Afghanistan and Iraq following the September 11 attacks in 2001, establishing army and air bases in Central Asia. Before"
 52 |                 },
 53 |                 {
 54 |                     "title": "Decolonization",
 55 |                     "text": "the responsibility of the United Kingdom (with a copy of the new constitution annexed), and finally, if approved, issuance of an Order of Council fixing the exact date of independence. After World War I, several former German and Ottoman territories in the Middle East, Africa, and the Pacific were governed by the UK as League of Nations mandates. Some were administered directly by the UK, and others by British dominions – Nauru and the Territory of New Guinea by Australia, South West Africa by the Union of South Africa, and Western Samoa by New Zealand. Egypt became independent in 1922,"
 56 |                 }
 57 |             ]
 58 |         },
 59 |         {
 60 |             "question": "Who set the record for longest field goal?",
 61 |             "answer": "The record for the longest field goal in an NFL game was set by Matt Prater at 64 yards [1], but the record for the longest field goal at any level was 69 yards, kicked by collegiate kicker Ove Johansson in a 1976 Abilene Christian University football game against East Texas State University [2].",
 62 |             "docs": [
 63 |                 {
 64 |                     "title": "Field goal",
 65 |                     "text": "toward its own end. The longest field goal kick in NFL history is 64 yards, a record set by Matt Prater on December 8, 2013. The previous record was 63, originally set by Tom Dempsey (1970) and then matched by Jason Elam (1998), Sebastian Janikowski (2011), David Akers (2012), and Graham Gano (2018). High school, college and most professional football leagues offer only a three-point field goal; however, some professional leagues have encouraged more rare kicks through \"four-point field goals\". NFL Europe encouraged long field goals of 50 yards or more by making those worth four points instead of three"
 66 |                 },
 67 |                 {
 68 |                     "title": "Field goal range",
 69 |                     "text": "35 and 40 yard lines (closer in a crosswind) often will go for the more risky fourth down conversion rather than risk either the touchback or the missed field goal. The longest field goal in recorded football history was 69 yards, set by collegiate kicker Ove Johansson, who was born in Sweden, in a 1976 Abilene Christian University football game against East Texas State University (now Texas A&M Commerce) at Shotwell Stadium in Abilene. The longest successful field goal in the NFL was 64 yards and was completed by Matt Prater in 2013. The NCAA record is 67 yards held"
 70 |                 },
 71 |                 {
 72 |                     "title": "Field goal",
 73 |                     "text": "both end zones) is only 66 yards. Scaccia, while playing indoor football, attempted a 64-yard kick that was inches short of success, hitting the crossbar. Longer field goals have been attempted at times; the longest attempt in the NFL, which was well short and was kicked into the wind, was 76 yards, attempted by Sebastian Janikowski of the Oakland Raiders, in a September 28, 2008 game against the San Diego Chargers. NFL Europe rewarded kickers that successfully kicked a field goal of longer than 50 yards with a bonus point, making such field goals worth 4 points instead of 3;"
 74 |                 },
 75 |                 {
 76 |                     "title": "Field goal",
 77 |                     "text": "this accomplishment is not the official record. All of the above kicks were successful with the use of a kicking tee, which was banned by the NCAA after the 1988 season. The longest known drop-kicked field goal in college football was a 62-yard kick from Pat O'Dea, an Australian kicker who played on the Wisconsin Badgers football team. O'Dea's kick took place in a blizzard against Northwestern on November 15, 1898. The longest field goal in U Sports football history is 59 yards, by Niko Difonte of Calgary Dinos, playing against the UBC Thunderbirds on November 11, 2017. The field"
 78 |                 },
 79 |                 {
 80 |                     "title": "Field goal range",
 81 |                     "text": "NFL and have been banned from NCAA since 1989) is 68 yards held by Fabrizio Scaccia, and the high school record 68 yards held by Dirk Borgognone; high school has wider goal posts and treats a field goal attempt that lands short in the field of play the same as a punt, making longer attempts much less risky. The indoor football record, with narrower and higher goal posts, is 63 yards (set by Aaron Mills), which is practically as long of a field goal as is possible in that variant of the sport, since the field in indoor football (including"
 82 |                 }
 83 |             ]
 84 |         },
 85 |         {
 86 |             "question": "Who played galen in planet of the apes?",
 87 |             "answer": "In the 1968 film Planet of the Apes, Galen was played by Wright King [2]. And in the tv series Planet of the Apes, Galen was played by Roddy McDowall [1].",
 88 |             "docs": [
 89 |                 {
 90 |                     "title": "Planet of the Apes",
 91 |                     "text": "installment. Jacobs died on June 27, 1973, bringing an end to the APJAC Productions era of the \"Planet of the Apes\" franchise. Former Fox executive Stan Hough took over as producer for the television project, titled \"Planet of the Apes\". CBS picked up the series for its 1974 autumn lineup. Ron Harper and James Naughton played Alan Virdon and Peter Burke, two 20th-century American astronauts who pass through a time warp to a future where apes subjugate humans (unlike the original film, the humans can speak). Roddy McDowall returned to the franchise as Galen, a chimpanzee who joins the astronauts."
 92 |                 },
 93 |                 {
 94 |                     "title": "Planet of the Apes (1968 film)",
 95 |                     "text": "chimpanzees: animal psychologist Zira (Kim Hunter) and surgeon Galen (Wright King). While unable to speak as his throat wound is healing, called \"Bright Eyes\" by Zira and placed with one of the captive primitive humans he later names \"Nova\", Taylor observes the enhanced society of talking apes and in a strict caste system: the gorillas being the military police, hunters and workers; the orangutans overseeing the affairs of government, science, and religion; and intellectual chimpanzees being mostly scientists. While their society is a theocracy similar to the beginnings of the human Industrial Era, the apes consider the primitive humans as"
 96 |                 },
 97 |                 {
 98 |                     "title": "Planet of the Apes (1968 film)",
 99 |                     "text": "Planet of the Apes (1968 film) Planet of the Apes is a 1968 American science fiction film directed by Franklin J. Schaffner. It stars Charlton Heston, Roddy McDowall, Kim Hunter, Maurice Evans, James Whitmore, James Daly and Linda Harrison. The screenplay by Michael Wilson and Rod Serling was loosely based on the 1963 French novel \"La Plan\u00e8te des Singes\" by Pierre Boulle. Jerry Goldsmith composed the groundbreaking avant-garde score. It was the first in a series of five films made between 1968 and 1973, all produced by Arthur P. Jacobs and released by 20th Century Fox. The film tells the"
100 |                 },
101 |                 {
102 |                     "title": "Planet of the Apes",
103 |                     "text": "Rupert Wyatt. To portray ape characters realistically, the production avoided practical effects in favor of performance capture acting, partnering with New Zealand visual effects company Weta Digital. Wyatt cast James Franco as Will Rodman, while veteran performance capture actor Andy Serkis signed on to star as Caesar. \"Rise\" debuted on August 5, 2011. Critics reviewed it positively, especially praising the visual effects and Serkis's performance. It was a major box office hit, taking in $482 million globally, more than five times its $93 million budget. Weta's special effects earned the film two Visual Effects Society Awards and an Oscar nomination"
104 |                 },
105 |                 {
106 |                     "title": "Planet of the Apes",
107 |                     "text": "film stars Mark Wahlberg as astronaut Leo Davidson, who accidentally travels through a wormhole to a distant planet where talking apes enslave humans. He leads a human revolt and upends ape civilization by discovering that the apes evolved from the normal earth primates who had accompanied his mission, and arrived years before. Helena Bonham Carter played chimpanzee Ari, while Tim Roth played the human-hating chimpanzee General Thade. The film received mixed reviews; most critics believed it failed to compare to the original. Much of the negative commentary focused on the confusing plot and twist ending, though many reviewers praised the"
108 |                 }
109 |             ]
110 |         }
111 |     ]
112 | }


--------------------------------------------------------------------------------
/prompts/eli5_default.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "instruction": "Instruction: Write an accurate, engaging, and concise answer for the given question using only the provided search results (some of which might be irrelevant) and cite them properly. Use an unbiased and journalistic tone. Always cite for any factual claim. When citing several search results, use [1][2][3]. Cite at least one document and at most three documents in each sentence. If multiple documents support the sentence, only cite a minimum sufficient subset of the documents.",
  3 |     "demo_sep": "\n\n\n",
  4 |     "demo_prompt": "{INST}\n\nQuestion: {Q}\n\n{D}\nAnswer: {A}",
  5 |     "doc_prompt": "Document [{ID}](Title: {T}): {P}\n",
  6 |     "demos": [
  7 |         {
  8 |             "question": "Why did New York City try to ban food donations to the poor?",
  9 |             "answer": "New York City, under Mayor Michael Bloomberg's administration, banned citizens from donating food directly to homeless shelters because the city could not assess the salt, fat, and fiber content [1][2][3]. Bloomberg's administration was heavily criticized for losing their common sense by becoming too focused on what people eat [2].",
 10 |             "docs": [
 11 |                 {
 12 |                     "title": "The Future Of America", 
 13 |                     "text": "believe that they are \u201chelping\u201d the homeless by passing such laws. In New York City, Mayor Bloomberg has banned citizens from donating food directly to homeless shelters and he is actually convinced that it was the right thing to do for the homeless\u2026 Mayor Michael Bloomberg\u2019s food police have struck again! Outlawed are food donations to homeless shelters because the city can\u2019t assess their salt, fat and fiber content, reports CBS 2\u2019s Marcia Kramer. Glenn Richter arrived at a West Side synagogue on Monday to collect surplus bagels \u2014 fresh nutritious bagels \u2014 to donate to the poor."
 14 |                 },
 15 |                 {
 16 |                     "title": "mayor bloomberg", 
 17 |                     "text": "Amuck: Bloomberg Bans Food Donations in New York City Food Might Be Salty or Too High in Calories, City Explains Washington, D.C. \u2013 New York Mayor Michael Bloomberg\u2019s administration is now banning all food being offered to the city\u2019s homeless shelters. New York City\u2019s bureaucrats have become so singularly focused on what people eat, says the National Center for Public Policy Research, that they\u2019ve lost their common sense. \u201cSo much for serving the homeless: The Bloomberg administration is now taking the term \u2018food police\u2019 to new depths, blocking food donations to all government-run facilities that serve the"
 18 |                 },
 19 |                 {
 20 |                     "title": "New York City bans food donations - WND", 
 21 |                     "text": "New York City bans food donations - WND Front Page Health U.S. New York City bans food donations Inability to control 'nutritional content' cited as reason New York City homeless shelters have Mayor Michael Bloomberg to thank for a halt in food donations, for which hungry families are waiting, according to one public policy advocate. \"The Bloomberg administration is now taking the term 'food police' to new depths, blocking food donations to all government-run facilities that serve the city's homeless,\" says Jeff Stier, a National Center for Public Policy Research senior fellow. Currently, no food can be given to government-run, New York City facilities, despite hungry crowds perfectly"
 22 |                 },
 23 |                 {
 24 |                     "title": "New York City bans food donations - WND", 
 25 |                     "text": "New York City bans food donations - WND Services didn't return WND calls. Stier told WND that he specifically was told by Diamond that the policy was tied to the nutritional guidelines set by the mayor. \"They can say that this ban on donations is a long-standing policy, but they can\u2019t document it,\" Stier told WND. \"I've also been told that there are numerous food shelves that have been accepting food donations, not just one.\" Stier is a member of a New York Synagogue that has donated food for over a decade. He is outraged that the DHS' response to his demand to know why the practice can"
 26 |                 },
 27 |                 {
 28 |                     "title": "New York City bans food donations - WND", 
 29 |                     "text": "New York City bans food donations - WND ban on donated food. In fact, it thrives because of food donations. New York City Rescue Mission has been providing food, clothing, shelter and spiritual hope for needy New Yorkers since 1872. \"We feed over 500 people a day, all through donations,\" said James Varnhagen, NYCRM director. \"Boxed food, canned food, prepared food, we take any food,\" he told WND. \"We couldn't survive without donations,\" he said."
 30 |                 }
 31 |             ]
 32 |         },
 33 |         {
 34 |             "question": "What's the difference between Shia vs. Sunni Islam?",
 35 |             "answer": "The main difference between Shia and Sunni Muslim is related to ideological heritage and issues of leadership [1]. This difference is first formed after the death of the Prophet Muhammad in 632 A.D. [1][2]. The ideological practice of the Sunni branch strictly follows Prophet Muhammad and his teachings, while the Shia branch follows Prophet Muhammad's son-in-law Ali [2]. Nowadays, Sunni and Shia are the major branches of Islam [3].",
 36 |             "docs": [
 37 |                 {
 38 |                     "title": "The Sunni vs Shia Divide - Explained - Globaloi", 
 39 |                     "text": "centuries-long strained relationship between Sunnis and Shias. As a scholar of Islam and a public educator, I often field questions about Sunnis, Shias and the sects of Islam. What exactly is the Shia-Sunni divide? And what is its history? History of divide Both Sunnis and Shias \u2013 drawing their faith and practice from the Qur\u2019an and the life of the Prophet Muhammad \u2013 agree on most of the fundamentals of Islam. The differences are related more to historical events, ideological heritage and issues of leadership. The first and central difference emerged after the death of Prophet Muhammad in A.D. 632."
 40 |                 },
 41 |                 {
 42 |                     "title": "What\u2019s the difference between Sunni and Shia Islam? \u2013 Macrosnaps", 
 43 |                     "text": "What\u2019s the difference between Sunni and Shia Islam? Sunni and Shia identities (the 2 main branches of Islam) first formed around a dispute over leadership succession after the death of the Prophet Muhammad in 632 A.D. Sunni is the larger branch (estimated 85-90% of total world Muslim population) and it's adherents are referred to as \"people of the tradition of Muhammad\", while Shia are \"followers\" of Muhammad's son-in-law and cousin Ali. Sunnis rely heavily on the practice of the Prophet Muhammad and his teachings, the Shia view their ayatollahs as reflections of God on earth. What challenges does the anti-IS"
 44 |                 },
 45 |                 {
 46 |                     "title": "Difference between Sunni and Shia Muslims | Sunni vs Shia Muslims", 
 47 |                     "text": "of Muhammad, the last prophet of God. A follower of Islam is known as a Muslim. Many Muslims believe that their sole purpose is to worship and serve God, for which they have established five pillars of Islam that guides a Muslim on almost every aspect of life and society. Due to differences, Muslims have been divided into two primary sects: The Sunnis and the Shias. These two sects have many similarities and both consider themselves are Muslims, following the will of God. However, they are also different from each other in certain aspects. Both the Sunnis and the Shias,"
 48 |                 },
 49 |                 {
 50 |                     "title": "What is the difference between Shia and Sunni Islam? - Islam Stack Exchange", 
 51 |                     "text": "What is the difference between Shia and Sunni Islam? - Islam Stack Exchange between Mutah marriage and Misyar marriage? What theological and historical factors distinguish Ibadi Islam from either Shia or Sunni schools? What are the principle/fundamental differences between Sunni and Shia? Nikah between a Sunni girl and Shia boy What is the difference between \u201cMubtalat-of-Wudu\u201d of Shia and Sunni? How can the Hadith be reliable when Sunnis and Shia follow different points of reference? Rejection of Mutawatir Hadith in Sunni Islam and Shia Islam"
 52 |                 },
 53 |                 {
 54 |                     "title": "What is the difference between Sunni and Shia Islam? | Patrick Syder Travel", 
 55 |                     "text": "What is the difference between Sunni and Shia Islam? | Patrick Syder Travel What is the difference between Sunni and Shia Islam? This Channel 4 link answers some of the key questions about the difference between Sunni and Shia Islam and alarmingly, the politics on what is happening and why, in Syria\u2026\u2026. http://www.channel4.com/news/sunni-shia-islam-muslim-syria-middle-east-key-questions \u2190 Ethiopia Appeal \u2013 Help sponsor a nurse to train and to help others G\u00f6bekli Tepe, Turkey: a new wonder of the ancient world by Jeremy Seal (Telegraph Travel Section 23/04/2013) \u2192"
 56 |                 }
 57 |             ]
 58 |         },
 59 |         {
 60 |             "question": "What causes Bi-polar disorder?",
 61 |             "answer": "Bipolar disorder is an emotional disorder that causes extreme mood swings between excitement and depression [1][3]. The spectrum of mood swing may span from days to months [1][2]. We are still not certain of the exact factors that cause such disorder, but genetics is considered a major factor [2][3].",
 62 |             "docs": [
 63 |                 {
 64 |                     "title": "Bi-polar disorder | definition of Bi-polar disorder by Medical dictionary", 
 65 |                     "text": "bi-polar disorder | definition of bi-polar disorder by medical dictionary https://medical-dictionary.thefreedictionary.com/bi-polar+disorder (redirected from bi-polar disorder) related to bi-polar disorder: depression bipolar disorder, formerly known as manic depression, is a mood disorder that causes radical emotional changes and mood swings, from manic, restless highs to depressive, listless lows. most bipolar individuals experience alternating episodes of mania and depression. bipolar disorder is characterized by alternating manic episodes in which the individual feels abnormally euphoric, optimistic, and energetic and depressive periods in which the individual feels sad, hopeless, guilty, and sometimes suicidal. manic or depressive periods may last for days, weeks, or months"
 66 |                 },
 67 |                 {
 68 |                     "title": "Mania and Bi-Polar", 
 69 |                     "text": "can go from depressed to \u201csuper happy\u201d all in one day, or even in a few days, does not have a bi-polar disorder Bi-polar looks different depending on the severity of the symptoms. Most bi-polar diagnoses that are made are for bi-polar 2, with bi-polar 1 being much more rare. Bi-polar 1 is so severe that the individual will have periods of such agitation, or such reckless and seemingly foolish behavior that they put themselves or those around them in danger. It is not completely clear what causes bi-polar, but genetics seem to have a large role. The biggest factor"
 70 |                 },
 71 |                 {
 72 |                     "title": "Bi-Polar disorder", 
 73 |                     "text": "Bi-Polar disorder Bi-polar is generally a cyclic disease where individuals display depressive and elevated episodes at regular intervals. It is a disorder resulting from the imbalance of the chemicals in the brain that causes a lot of fluctuations of mood. It is a fact that we all experience happy and sad moods, but people with bi-polar disorder experience the changes in mood at an increased level. The cause of this disorder is not known completely. However, it is estimated that there are different factors responsible for it. It is often connected to a genetic component. People suffering from the Bi-polar disorder are"
 74 |                 },
 75 |                 {
 76 |                     "title": "For Individuals \u2014 Adam Schwartz", 
 77 |                     "text": "For Individuals \u2014 Adam Schwartz The information is extensive and covers a huge range of topics. Some of the topics include the different types of bi-polar, what it feels like, signs and symptoms, treatments and more. Black Dog Institute bi-polar causes resource specifically covers the variety of areas that could potentially be a cause of bi-polar disorder. Including genetics, environmental factors, pregnancy, and more. Black Dog Institute bi-polar treatments resource specifically covers multiple potential treatments options for bi-polar. Including management, types of psychological treatment, lifestyle changes, and more. Black Dog Institute bi-polar self-test resource is a short self-test for people who may be concerned if"
 78 |                 },
 79 |                 {
 80 |                     "title": "Depression Bi-polar Disorder Symptoms 2019 | Win Over Depression", 
 81 |                     "text": "Depression Bi-polar Disorder Symptoms 2019 | Win Over Depression signs and symptoms of bipolar disorder. Learn more about the common symptoms of bipolar depression that some patients may experience. Home \u00bb Trending Health News \u00bb 10 Warning Signs of Bipolar Disorder: Depression. One of the most serious symptoms of bipolar disorder is. Bi Polar Depression. SEVERE SWINGS What is bipolar disorder, is it the same as manic depression, what are the symptoms and is there a cure? Bipolar disorder, or manic depression, causes symptoms of mania and depression. Read about bipolar disorder treatment, medications, and causes of this. Learn more about the different types of bipolar disorder. Find out"
 82 |                 }
 83 |             ]
 84 |         },
 85 |         {
 86 |             "question": "How do student loans affect getting a mortgage?",
 87 |             "answer": "When applying for a mortgage, student loans can affect the debt to income ratio, which is a key factor in determining the amount that an individual can afford to pay for the mortgage [1]. While student loan repayments do not appear in an individual's credit history and do not affect credit scores, lenders do consider the amount of an individual's student loan repayments when assessing their mortgage application [1][2][3]. Some 83% of non-homeowners say student loan debt is preventing them from buying a home, according to the National Association of Realtors [2]. It is important to note that student loans do not prevent an individual from getting a mortgage [1].",
 88 |             "docs": [
 89 |                 {
 90 |                     "title": "Student Loans \u2013 How do they work? | The Financial Review", 
 91 |                     "text": "typical debt. Student loan repayments do not appear in an individual\u2019s credit history, therefore there are no implications whatsoever. This also extends to applications for credit cards \u2013 student \u2018loans\u2019 are not acknowledged. One noteworthy aspect that is affected by student loans however, is mortgage applications. Nevertheless, it does not prevent an individual from getting a mortgage. For example, lenders will consider the amount of an individual\u2019s student loan repayments in order to assess the debt to income ratio and therefore establish the amount that the individual can afford to pay for the mortgage. Just as they do with other"
 92 |                 },
 93 |                 {
 94 |                     "title": "How Does Student Loan Debt Affect Buying a Home? | Experian", 
 95 |                     "text": "Rates & Affordability How Student Loans Affect Getting a Mortgage Student Loan Impact on Credit Scores Other Factors for Getting Approved for a Mortgage If you're a recent college grad and hope to become a homeowner in the near future, you should know that student loan debt could affect buying a home by making it more difficult to get a mortgage. Some 83% of non-homeowners say student loan debt is preventing them from buying a home, according to the National Association of Realtors (NAR). But while student loan payments can make it harder to save for a down payment on"
 96 |                 },
 97 |                 {
 98 |                     "title": "Studentloanify - How your student loans affect your home mortgage prospects", 
 99 |                     "text": "Though it may not seem fair, your student loan situation impacts your home mortgage outlook. Many people carry student loan debt, but it\u2019s the amount of the loan and how you handle your student loan repayment plan that will influence your ability to get a home mortgage as well as what your interest rate will be. Here are some specific factors about your student loan that will affect your home mortgage prospects. On your mortgage loan application, you will have to report how much your monthly student loan payment is. This amount will be deducted from your monthly gross income"
100 |                 },
101 |                 {
102 |                     "title": "How do student loans affect your credit score? | Student Loan Planner", 
103 |                     "text": "How do student loans affect your credit score? | Student Loan Planner Your credit score is the three-digit number that dictates a lot in your adult life. Whether you\u2019re applying for a mortgage or looking to get an auto loan, this seemingly arbitrary number determines whether you get approved for a loan and also affects your interest rate. If you\u2019re a student loan borrower you may wonder, \u201cDo student loans affect credit score?\u201d You might be especially curious if you\u2019re in the process of applying for a mortgage. Here\u2019s how student loans affect your credit score and what to know for big life events, like getting a mortgage. Do student loans affect"
104 |                 },
105 |                 {
106 |                     "title": "Does Student Loan Debt Affect Getting A Mortgage?", 
107 |                     "text": "Does Student Loan Debt Affect Getting A Mortgage? Home \u00bb Does Student Loan Debt Affect Getting A Mortgage? Last year, I helped answer a reader\u2019s question about applying for a mortgage while on Income Based Repayment. However, over the last several months, I\u2019ve been getting bombarded with questions about how student loan debt impacts your ability to get a mortgage. Maybe it\u2019s because the housing market is improving, or maybe it\u2019s because people are finally taking their student loan debt seriously. Anyway, I wanted to share a few reader questions and then look at whether student loan debt affects getting a mortgage. Here are the reader questions I\u2019ve"
108 |                 }
109 |             ]
110 |         }
111 |     ]
112 | }


--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------
  1 | import logging
  2 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
  3 | logger = logging.getLogger(__name__)
  4 | logger.setLevel(logging.INFO)
  5 | 
  6 | import argparse
  7 | import os
  8 | import json
  9 | from tqdm import tqdm
 10 | import numpy as np
 11 | import re
 12 | import yaml
 13 | from utils import *
 14 | from nltk import sent_tokenize
 15 | from openai_account_manager import openai_llm_generate_multi_thread
 16 | from llm import LLM
 17 | 
 18 | def remove_citations(sent):
 19 |     return re.sub(r"\[\d+", "", re.sub(r" \[\d+", "", sent)).replace(" |", "").replace("]", "")
 20 | 
 21 | 
 22 | def main():
 23 |     parser = argparse.ArgumentParser()
 24 |     parser.add_argument('--ndoc_top_bottom', type=int, default=0)
 25 |     parser.add_argument('--ndoc_top_neighbor', type=int, default=0)
 26 |     parser.add_argument('--output_fp', default=None, type=str, required=True)
 27 |     parser.add_argument("--config", type=str, default=None, help="Path to the config file")
 28 |     parser.add_argument('--openai_multi_thread',type=int,required=True)
 29 |     parser.add_argument('--turbo_system_message', required=True)
 30 |     parser.add_argument('--use_sub_questions', type=int, default=0)
 31 |     # Prompt file is a json file that contains the following fields:
 32 |     # - instruction: the instruction, which will appear at the beginning of each demo and the test example
 33 |     # - demo_sep: the separator between each demo, for example, "\n\n\n"
 34 |     # - demo_prompt: the prompt for the demo, for example, "Instruction: {INST}\n\nQuestion: {Q}\n\n{D}\nAnswer: {A}"
 35 |     #     - {INST}: the instruction
 36 |     #     - {D}: the documents
 37 |     #     - {Q}: the question
 38 |     #     - {A}: the answers
 39 |     # - doc_prompt, the prompt for each document, for example, "Document [{ID}](Title: {T}): {P}", where
 40 |     #     - {ID}: the document id, staring from 1
 41 |     #     - {T}: the document title
 42 |     #     - {P}: the document text
 43 |     # - demos: a list of demo examples, each of which should have
 44 |     #     - question: the question
 45 |     #     - docs: the documents ("title" and "text")
 46 |     #     - answer: the answer to show in the demo. If it is a list, they will be concatenated by "\n". This is useful when the answer includes interactive components.
 47 |     # Note that this python file will sample `--shot` demos from the prompt file given the random seed `--seed`
 48 |     parser.add_argument("--prompt_file", type=str, help="Path to the prompt file")
 49 | 
 50 |     # Evaluation file is a json file that contains a list of item, each of which contains
 51 |     # - question: the question
 52 |     # - answer: the answer
 53 |     # - docs: the documents, each of which contains "title", "text"
 54 |     parser.add_argument("--data_file", type=str, help="Path to the eval file")
 55 |     parser.add_argument("--quick_test", type=int, default=0, help="Quickly test a few examples")
 56 | 
 57 |     # ICL setting
 58 |     parser.add_argument("--ndoc", type=int, help="Number of documents")
 59 |     parser.add_argument("--shot", type=int, help="Number of ICL demonstrations")
 60 |     parser.add_argument("--seed", type=int, default=42, help="Seed for the random number generator")
 61 |     parser.add_argument("--no_doc_in_demo", type=bool, default=False, help="Whether to remove the documents in the demos")
 62 |     parser.add_argument("--fewer_doc_in_demo", type=bool, default=False, help="Whether to use fewer documents in the demos")
 63 |     parser.add_argument("--ndoc_in_demo", type=int, default=None, help="When using --fewer_doc_in_demo, use this to designate how many docs in demo")
 64 | 
 65 |     # Model and name
 66 |     parser.add_argument("--dataset_name", type=str, help="Name of the dataset (for saving)")
 67 |     parser.add_argument("--tag", type=str, help="Tag of run (for saving)")
 68 |     parser.add_argument("--model", type=str, help="Model to use")
 69 |     parser.add_argument("--openai_api", type=bool, default=False, help="Whether to use OpenAI API")
 70 |     parser.add_argument("--azure", action="store_true", default=False, help="Azure openai API")
 71 | 
 72 |     # Decoding
 73 |     parser.add_argument("--temperature", type=float, default=0.5, help="Temperature for decoding")
 74 |     parser.add_argument("--top_p", type=float, default=1.0, help="Nucleus sampling top-p")
 75 |     parser.add_argument("--max_new_tokens", type=int, default=300, help="Max number of new tokens to generate in one step")
 76 |     parser.add_argument("--max_length", type=int, default=2048, help="Max length the model can take. Should set properly wrt the model to avoid position overflow.")
 77 |     parser.add_argument("--num_samples", type=int, required=True, help="Sample multiple answers.")
 78 | 
 79 |     # Use summarization/extraction of the documents
 80 |     parser.add_argument("--use_shorter", type=str, default=None, help="Whether to use summary data or extraction data for documents. Option: None, `summary`, `extraction`")
 81 | 
 82 |     # Interactive
 83 |     parser.add_argument("--interactive", type=bool, default=False, help="Whether to run in interactive mode")
 84 |     parser.add_argument("--interactive_query", type=str, default=None, help="The query to use in interactive mode, either `doc_id` (corresponding to interact in paper) or `search` (corresponding to inlinesearch in paper).")
 85 |     parser.add_argument("--retriever", type=str, default=None, help="When using interactive search mode, which retriever to use. Options: `tfidf`, `gtr-t5-large`")
 86 |     parser.add_argument("--retriever_device", type=str, default="cuda", help="Where to put the dense retriever if using. Options: `cuda`, `cpu`")
 87 |     parser.add_argument("--retrieve_in_all_docs", type=bool, default=False, help="Retrieve in all documents instead of just top ndoc")
 88 |     parser.add_argument("--max_turn", type=int, default=10, help="Max number of all actions")
 89 |     parser.add_argument("--max_doc_show", type=int, default=3, help="Max number of documents to show at one time.")
 90 |     parser.add_argument("--force_cite_show", type=bool, default=False, help="Force citing the documents that are shown to the model")
 91 | 
 92 |     # Load config
 93 |     args = parser.parse_args()
 94 | 
 95 |     config = yaml.safe_load(open(args.config)) if args.config is not None else {}
 96 |     parser.set_defaults(**config)
 97 |     args = parser.parse_args()
 98 | 
 99 |     assert args.openai_api, "only support openai_api now"
100 |     assert not args.azure, "not support azure"
101 |     assert not args.interactive, "not support interactive"
102 |     assert not args.no_doc_in_demo
103 |     assert not args.fewer_doc_in_demo
104 | 
105 |     # Save args
106 |     args_dict = vars(args)
107 |     directory = os.path.dirname(args.output_fp)
108 |     with open(f'{directory}/args.json', 'a') as f:
109 |         json.dump(args_dict, f, indent=4)
110 | 
111 |     if args.num_samples > 1:
112 |         assert args.temperature > 0, "when multiple sampling, do not use temperature=0, i.e., greedy decoding"
113 |     # assert args.num_samples == 1, "not support num_samples>1"
114 | 
115 |     for k in args.__dict__:
116 |         print(f"{k}: {args.__dict__[k]}")
117 | 
118 |     if "turbo" in args.model:
119 |         # ChatGPT has a longer max length
120 |         logger.info("Change the max length to 4096 for ChatGPT.")
121 |         args.max_length = 4096
122 | 
123 |     # Load the model or setup the API
124 |     llm = LLM(args)
125 | 
126 |     # Generate prompts
127 |     np.random.seed(args.seed)
128 | 
129 |     # Load data
130 |     prompt_data = json.load(open(args.prompt_file))
131 |     eval_data = json.load(open(args.data_file))
132 | 
133 |     logger.info("Generate the demonstration part")
134 |     head_prompt = ""
135 |     train_ids = np.random.choice(len(prompt_data["demos"]), args.shot, replace=False)
136 |     for train_id in train_ids:
137 |         train_item = prompt_data["demos"][train_id]
138 |         ndoc = args.ndoc
139 |         if args.no_doc_in_demo:
140 |             ndoc = 0
141 |         elif args.fewer_doc_in_demo:
142 |             assert args.ndoc_in_demo is not None
143 |             ndoc = args.ndoc_in_demo
144 |         # Run here
145 |         head_prompt += make_demo(
146 |             train_item, prompt=prompt_data["demo_prompt"], ndoc=ndoc, doc_prompt=prompt_data["doc_prompt"], 
147 |             instruction=prompt_data["instruction"], use_shorter=args.use_shorter, test=False, use_sub_questions=args.use_sub_questions
148 |         )
149 |         head_prompt += prompt_data["demo_sep"]
150 | 
151 |     # Sample quick test
152 |     if args.quick_test > 0:  # Don't run
153 |         eval_ids = np.random.choice(len(eval_data), args.quick_test, replace=False)
154 |         eval_data = [eval_data[int(idx)] for idx in eval_ids]
155 | 
156 |     logger.info("Generating prompts...")
157 |     incomplete_doc_list = 0 # For some questions there might be less than ndoc documents
158 |     for idx, eval_item in enumerate(tqdm(eval_data)):
159 |         eval_data[idx]['prompt'] = head_prompt + make_demo(
160 |             eval_item, prompt=prompt_data["demo_prompt"], ndoc=args.ndoc, doc_prompt=prompt_data["doc_prompt"],
161 |             instruction=prompt_data["instruction"], use_shorter=args.use_shorter, test=True, use_sub_questions=args.use_sub_questions
162 |         )
163 |         if args.use_shorter is not None:
164 |             doc_list = get_shorter_text(eval_item, eval_item["docs"], args.ndoc, args.use_shorter)
165 |         else:
166 |             doc_list = eval_item["docs"][:args.ndoc]
167 | 
168 |             if args.ndoc_top_bottom > 0:
169 |                 doc_list += eval_item["docs"][-args.ndoc_top_bottom:]
170 |             if args.ndoc_top_neighbor > 0:
171 |                 doc_list += eval_item['docs'][30:30+args.ndoc_top_neighbor]
172 | 
173 |             assert not (args.ndoc_top_bottom > 0 and args.ndoc_top_neighbor > 0), 'not support args.ndoc_top_neighbor and args.ndoc_top_bottom both > 0'
174 | 
175 |         if not args.retrieve_in_all_docs:
176 |             # If --retrieve_in_all_docs, we keep the original docs and do not trim them by ndoc
177 |             # Otherwise, take the new docs (truncated by ndoc and filtered if using summary/extraction)
178 |             eval_data[idx]['docs'] = doc_list
179 |         if len(doc_list) < args.ndoc:
180 |             incomplete_doc_list += 1
181 |     logger.info("Done.")
182 |     if incomplete_doc_list > 0:
183 |         logger.warning(f"There are {incomplete_doc_list} questions that have incomplete document list (may due to a lot of them are filtered out by summary/extraction).")
184 | 
185 |     # Load retriever for interactive search 
186 |     if args.interactive and args.interactive_query == "search" and "gtr" in args.retriever:  # Don't run
187 |         from sentence_transformers import SentenceTransformer
188 |         gtr_model = SentenceTransformer(f'sentence-transformers/{args.retriever}', device=args.retriever_device)
189 |         from searcher import SearcherWithinDocs
190 | 
191 |     eval_data_openai_queries = []
192 | 
193 |     for idx, item in enumerate(tqdm(eval_data)):
194 |         prompt = item['prompt']
195 |         prompt_len = len(llm.tokenizer.tokenize(prompt))
196 |         if idx == 0:
197 |             print(prompt)
198 |         eval_data_openai_queries.append({'input': prompt, 'max_tokens': min(args.max_new_tokens, args.max_length - prompt_len)})
199 | 
200 |         if "turbo" in args.model and not args.azure:  # Run
201 |             assert args.turbo_system_message != None
202 |             # For OpenAI's ChatGPT API, we need to convert text prompt to chat prompt
203 |             item['prompt'] = [
204 |                 {'role': 'system', 'content': args.turbo_system_message},
205 |                 {'role': 'user', 'content': prompt}
206 |             ]
207 | 
208 |     if args.openai_multi_thread > 1:
209 |         eval_data_openai_responses = openai_llm_generate_multi_thread(eval_data_openai_queries,
210 |                                                                     llm,
211 |                                                                     args.openai_multi_thread,
212 |                                                                     1,
213 |                                                                     args.turbo_system_message)
214 |     else:
215 |         raise NotImplementedError
216 | 
217 |     for idx, item in enumerate(tqdm(eval_data)):
218 |         eval_data_openai_response = eval_data_openai_responses[idx]
219 |         for j, decoded_output in enumerate(eval_data_openai_response):
220 |             decoded_output = decoded_output.replace("<|im_end|>", "").rstrip()
221 |             if decoded_output.endswith("End."):
222 |                 decoded_output = decoded_output[:-len("End.")]
223 |             eval_data_openai_response[j] = decoded_output
224 | 
225 |         logger.info(f"Question: {item['question']}")
226 |         logger.info(f"Gold answer: {item['answer']}")
227 |         logger.info(f"Final model output:")
228 |         for j, decoded_output in enumerate(eval_data_openai_response):
229 |             print('{}: {}'.format(j,decoded_output))
230 |         item['output'] = eval_data_openai_response if len(eval_data_openai_response) > 1 else eval_data_openai_response[0]
231 | 
232 | 
233 |     # for idx, item in enumerate(tqdm(eval_data)):
234 |     for idx, item in enumerate([]):
235 | 
236 |         prompt = item['prompt']
237 |         prompt_len = len(llm.tokenizer.tokenize(prompt))
238 | 
239 |         if idx == 0:
240 |             print(prompt)
241 | 
242 |         output_array = []
243 |         for _ in range(args.num_samples):
244 |             if args.interactive:
245 |                 print("============ Interactive =============")
246 |                 output_answer = ""
247 |                 doc_list = item['docs']
248 | 
249 |                 interactive_prompt = prompt.rstrip() + "\n" # Start a new line
250 |                 inline_doc = ""
251 |                 num_turn = 0
252 |                 
253 |                 doc_history = []
254 |                 while True:
255 |                     # For each action, it should end at the new line
256 |                     # Three possible actions
257 |                     # - Check: Document [1][2][3] / search query
258 |                     # - Output: output 
259 |                     # - End
260 |                     num_turn += 1
261 |                     new_prompt = interactive_prompt + inline_doc
262 |                     new_prompt_len = len(llm.tokenizer.tokenize(new_prompt))
263 | 
264 |                     if idx == 0:
265 |                         print(f"-------------- Step {num_turn} prompt --------------")
266 |                         print(new_prompt)
267 |                         print("-----------------------------")
268 | 
269 |                     output = llm.generate(new_prompt, min(args.max_new_tokens, args.max_length-new_prompt_len), stop=["\n", "\n\n"])
270 | 
271 |                     if len(inline_doc) > 0:
272 |                         output = "Output: " + output # "Output: " was included in inline_doc
273 |                     inline_doc = "" # Delete inline_doc after use
274 |                     interactive_prompt += output + "\n"
275 |                     logger.info(f"Model output: \"{output}\"")
276 | 
277 |                     if output.strip().lower()[:3] == "end":
278 |                         # Model decides to end the generation
279 |                         break
280 |                     elif "sorry" in output.lower() and ("relevant document" in output.lower() or "relevant information" in output.lower()) or "none of the documents" in output.lower():
281 |                         # Instruction-tuned model may abstain from answer the question
282 |                         break
283 |                     elif output.strip().lower()[:5] == "check" or output.strip().lower()[:6] == "search":
284 |                         # Checkout or search documents
285 |                         if args.interactive_query == "search":
286 |                             query = output.replace("Search:", "").replace("search:", "").strip()
287 |                             if len(doc_list) == 0:
288 |                                 show_doc_ids = []
289 |                             else:
290 |                                 searcher = SearcherWithinDocs(doc_list, args.retriever, model=gtr_model, device=args.retriever_device)
291 |                                 show_doc_ids = [int(searcher.search(query))]
292 |                         elif args.interactive_query == "doc_id":
293 |                             show_doc_ids = [int(r[1:])-1 for r in re.findall(r"\[\d+", output)] # In text citation id starts from 1
294 |                             show_doc_ids = [doc_id for doc_id in show_doc_ids if doc_id < len(doc_list) and doc_id >= 0]
295 |                             show_doc_ids = show_doc_ids[:args.max_doc_show] # Avoiding showing too many documents
296 |                         else:
297 |                             raise NotImplementedError
298 | 
299 |                         inline_doc = "".join([make_doc_prompt(doc_list[doc_id], doc_id, prompt_data["doc_prompt"]) for doc_id in show_doc_ids])
300 |                         inline_doc += "Output:" # Force the model to generate output in the next step
301 |                         doc_history.append(show_doc_ids)
302 |                     elif output.strip().lower()[:6] == "output":
303 |                         output = output.strip().replace("Output:", "").strip()
304 |                         if args.force_cite_show:
305 |                             output = remove_citations(output)
306 |                             if len(doc_history) == 0:
307 |                                 logger.warn("No doc history??")
308 |                             else:
309 |                                 # Just cite whatever documents the model has seen in the last step
310 |                                 if "qampari" in args.data_file:
311 |                                     output = ", ".join(["".join([f"[{doc+1}]" for doc in doc_history[-1]]) + " " + entity.strip() for entity in output.rstrip().rstrip(",").split(",")]) + ", "
312 |                                 else:
313 |                                     output = " ".join(["".join([f"[{doc+1}]" for doc in doc_history[-1]]) + " " + o for o in sent_tokenize(output)]) + "."
314 |                         output_answer += " " + output 
315 |                     else:
316 |                         # Sometimes model starts to output random things.
317 |                         break
318 |                     
319 |                     if num_turn >= args.max_turn:
320 |                         logger.warning("Reach maximum number of turns. Terminate now.")
321 |                         break
322 |                 
323 |                 if "qampari" in args.data_file:
324 |                     output_answer = output_answer.rstrip().rstrip(",")
325 |                 output_array.append(output_answer)
326 |                 item['prompt'] = interactive_prompt
327 |                 item['doc_history'] = doc_history
328 |             else: 
329 |                 output_array.append(llm.generate(prompt, min(args.max_new_tokens, args.max_length-prompt_len)))
330 |                 item['prompt'] = prompt
331 |             
332 |             output_array[-1] = output_array[-1].replace("<|im_end|>", "").rstrip()
333 |             if output_array[-1].endswith("End."):
334 |                 output_array[-1] = output_array[-1][:-len("End.")]
335 | 
336 |             logger.info(f"Prompt length={prompt_len}")
337 |             logger.info(f"Question: {item['question']}")
338 |             logger.info(f"Gold answer: {item['answer']}")
339 |             logger.info(f"Final model output: {output_array[-1]}")
340 |         
341 |         item['output'] = output_array if len(output_array) > 1 else output_array[0]
342 |         
343 |     # Calculate the price for OpenAI API
344 |     if args.openai_api:
345 |         logger.info(f"Total token used: {llm.total_tokens}")
346 |         if "turbo" in args.model:
347 |             unit_price = 0.002
348 |         else:
349 |             unit_price = 0.02
350 |         logger.info(f"Unit price: {unit_price}")
351 |         logger.info(f"Total cost: %.1f" % (llm.total_tokens / 1000 * unit_price))
352 |     
353 |     logger.info(f"#Cases when prompts exceed max length: {llm.prompt_exceed_max_length}")
354 |     logger.info(f"#Cases when max new tokens < 50: {llm.fewer_than_50}")
355 | 
356 |     # Save the result
357 |     model_name = args.model
358 |     # if "/" in model_name:
359 |     #     model_name = model_name.split("/")[-1]
360 |     os.makedirs('exps',exist_ok=True)
361 |     # name = f"exps/{args.tag}-{args.dataset_name}-{model_name.replace('/','_').replace('-','_')}-shot{args.shot}-ndoc{args.ndoc}-{args.seed}"
362 |     name = f"{args.dataset_name}-{model_name}-{args.tag}-shot{args.shot}-ndoc{args.ndoc}-{args.seed}"
363 |     if args.azure:
364 |         name += "-azure"
365 |     if args.quick_test > 0:
366 |         name += f"-quick_test{args.quick_test}"
367 |     if args.no_doc_in_demo:
368 |         name += "-no_doc_in_demo"
369 |     if args.fewer_doc_in_demo:
370 |         name += f"-{args.ndoc_in_demo}_doc_in_demo"
371 |     if args.num_samples > 1:
372 |         name += f"-sample{args.num_samples}"
373 |     if args.force_cite_show:
374 |         name += f"-forceciteshow"
375 | 
376 |     eval_data = {
377 |         "args": args.__dict__,
378 |         "data": eval_data,
379 |     }
380 |     if args.openai_api:
381 |         eval_data["total_cost"] = llm.total_tokens / 1000 * unit_price
382 |         if args.azure:
383 |             eval_data["azure_filter_fail"] = llm.azure_filter_fail 
384 | 
385 |     if args.output_fp != None:
386 |         name = args.output_fp
387 |     else:
388 |         if not os.path.exists("result"):
389 |             os.makedirs("result")
390 |             name = "result/" + name + ".json"   
391 | 
392 |     logger.info('output_fp:{}'.format(name))
393 |     json.dump(eval_data, open(name, "w"), indent=4)
394 | 
395 | if __name__ == "__main__":
396 |     main()
397 | 


--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import collections
  3 | import json
  4 | import re
  5 | import string
  6 | import torch
  7 | import copy
  8 | 
  9 | from nltk import sent_tokenize
 10 | import numpy as np
 11 | from rouge_score import rouge_scorer, scoring
 12 | from tqdm import tqdm
 13 | import logging
 14 | logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
 15 |                     datefmt='%m/%d/%Y %H:%M:%S')
 16 | logger = logging.getLogger(__name__)
 17 | logger.setLevel(logging.INFO)
 18 | 
 19 | from transformers import (
 20 |     AutoModelForSeq2SeqLM,
 21 |     AutoTokenizer,
 22 |     pipeline
 23 | )
 24 | 
 25 | from utils import normalize_answer, get_max_memory, remove_citations
 26 | 
 27 | QA_MODEL = "gaotianyu1350/roberta-large-squad"
 28 | AUTOAIS_MODEL = "google/t5_xxl_true_nli_mixture"
 29 | 
 30 | global autoais_model, autoais_tokenizer
 31 | autoais_model, autoais_tokenizer = None, None
 32 | 
 33 | 
 34 | def compute_f1(a_gold, a_pred):
 35 |     """Compute F1 score between two strings."""
 36 | 
 37 |     def _get_tokens(s):
 38 |         if not s:
 39 |             return []
 40 |         return normalize_answer(s).split()
 41 | 
 42 |     gold_toks = _get_tokens(a_gold)
 43 |     pred_toks = _get_tokens(a_pred)
 44 | 
 45 |     common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
 46 |     num_same = sum(common.values())
 47 | 
 48 |     if len(gold_toks) == 0 or len(pred_toks) == 0:
 49 |         # If either is no-answer, then F1 is 1 if they agree, 0 otherwise
 50 |         return int(gold_toks == pred_toks)
 51 | 
 52 |     if num_same == 0:
 53 |         return 0
 54 | 
 55 |     precision = 1.0 * num_same / len(pred_toks)
 56 |     recall = 1.0 * num_same / len(gold_toks)
 57 |     f1 = (2 * precision * recall) / (precision + recall)
 58 | 
 59 |     return f1
 60 | 
 61 | 
 62 | def compute_exact(a_gold, a_pred):
 63 |     """Check whether two strings are equal up to normalization."""
 64 | 
 65 |     return int(normalize_answer(a_gold) == normalize_answer(a_pred))
 66 | 
 67 | 
 68 | def exact_presence(short_answers, context):
 69 |     """Verify if any of the answers is present in the given context.
 70 |     Args:
 71 |         short_answers: list of short answers to look for in the context
 72 |         context: a paragraph to search for short answers
 73 |     Returns:
 74 |         true if any of the short answers is present in the context
 75 |     """
 76 | 
 77 |     n_short_answers = [normalize_answer(sa) for sa in short_answers]
 78 |     n_context = normalize_answer(context)
 79 | 
 80 |     for ans in n_short_answers:
 81 |         if ans in n_context:
 82 |             return True
 83 | 
 84 |     return False
 85 | 
 86 | 
 87 | def compute_rouge(data):
 88 |     """Main function for rouge scoring.
 89 |     If two references are provided,
 90 |     the best score is chosen for each instance.
 91 |     Args:
 92 |         data: requires field `output` and `answer` (or `annotations` for ASQA)
 93 |         metrics: list of evaluation metrics
 94 |     Returns:
 95 |         dictionary representation of rouge scores
 96 |     """
 97 |     def _rouge_calculation(hypotheses,
 98 |                         references1,
 99 |                         references2=[],
100 |                         metrics=['rougeLsum']):
101 | 
102 |         if references2 == []:
103 |             references2 = references1
104 | 
105 |         scorer = rouge_scorer.RougeScorer(metrics, use_stemmer=True)
106 |         aggregator = scoring.BootstrapAggregator()
107 | 
108 |         for i in range(len(hypotheses)):
109 |             scores1 = scorer.score(references1[i], hypotheses[i])
110 |             scores2 = scorer.score(references2[i], hypotheses[i])
111 |             if scores1['rougeLsum'].fmeasure > scores2['rougeLsum'].fmeasure:
112 |                 aggregator.add_scores(scores1)
113 |             else:
114 |                 aggregator.add_scores(scores2)
115 | 
116 |         scores = {m: [] for m in metrics}
117 | 
118 |         for m in metrics:
119 |             fmeasure = aggregator.aggregate()[m].mid.fmeasure
120 |             scores[m].append(fmeasure)
121 | 
122 |         for m in scores:
123 |             scores[m] = 100 * sum(scores[m]) / len(scores[m])
124 | 
125 |         return scores
126 | 
127 |     hypotheses = {}
128 |     references1 = {}
129 |     references2 = {}
130 | 
131 |     for idx, item in enumerate(data):
132 |         hypotheses[idx] = item["output"]
133 |         if "annotations" in item and item['annotations'] is not None: # For ASQA
134 |             references1[idx] = item["annotations"][0]["long_answer"]
135 |             references2[idx] = item["annotations"][1]["long_answer"]
136 |         else:
137 |             references1[idx] = item["answer"]
138 |             references2[idx] = item["answer"]
139 | 
140 |     h, r1, r2 = [], [], []
141 | 
142 |     for key in references1:
143 |         h.append(hypotheses[key])
144 |         r1.append(references1[key])
145 | 
146 |         if references2 is not None:
147 |             r2.append(references2[key])
148 | 
149 |     h = ['\n'.join(sent_tokenize(text.lower())) for text in h]
150 |     r1 = ['\n'.join(sent_tokenize(text.lower())) for text in r1]
151 |     r2 = ['\n'.join(sent_tokenize(text.lower())) for text in r2]
152 |     scores = _rouge_calculation(h, r1, r2)
153 | 
154 |     return scores['rougeLsum']
155 | 
156 | 
157 | def compute_str_em(data):
158 |     """Compute STR-EM metric (only for ASQA)
159 |     Args:
160 |         data: requires field `qa_pairs/short_answers` and `output`
161 |     Returns:
162 |         STR-EM and STR-EM-HIT ()
163 |     """
164 | 
165 |     if 'qa_pairs' not in data[0] or data[0]['qa_pairs'] is None:
166 |         return 0, 0
167 | 
168 |     acc = []
169 |     hit = []
170 | 
171 |     for item in data:
172 |         loc_acc = []
173 |         for qa_pair in item['qa_pairs']:
174 |             loc_acc.append(exact_presence(qa_pair['short_answers'], item["output"]))
175 |         acc.append(np.mean(loc_acc))
176 |         hit.append( int(np.mean(loc_acc) == 1) )
177 | 
178 |     return 100 * np.mean(acc), 100 * np.mean(hit)
179 | 
180 | 
181 | def compute_len(data):
182 |     """Compute average length of predictions."""
183 | 
184 |     res, cntr = 0, 0
185 |     for item in data:
186 |         res += len(item["output"].split())
187 |         cntr += 1
188 |     return res / cntr
189 | 
190 | 
191 | def compute_qa(data):
192 |     """Compute QA-based accuracy.
193 |     Args:
194 |         data: requires filed `qa_pairs/short_answers` and `output`
195 |     Returns:
196 |         QA metrics (QA-EM, QA-F1, QA-Hit)
197 |     """
198 | 
199 |     if 'qa_pairs' not in data[0] or data[0]['qa_pairs'] is None:
200 |         logger.warn("Warning: no QA pairs found in data")
201 |         return {
202 |             'QA-EM': 0,
203 |             'QA-F1': 0,
204 |             'QA-Hit': 0,
205 |         }
206 | 
207 |     # Load model
208 |     logger.info("Loading the RoBERTa-large SQuAD model for QA-based accuracy...")
209 |     qa_pipeline = pipeline("question-answering", model=QA_MODEL, device=0)
210 |     logger.info("Done")
211 | 
212 |     # Get prediction
213 |     logger.info("Computing the QA-based accuracy...")
214 |     em, f1, bins = [], [], []
215 |     for item in tqdm(data):
216 |         question = [qa_pair['question'] for qa_pair in item['qa_pairs']]
217 |         context = item['output'] if len(item['output']) > 0 else " "
218 |         results = qa_pipeline(question=question, context=context, handle_impossible_answer=True)
219 |         loc_counter, loc_em, loc_f1 = 0, 0, 0
220 | 
221 |         for idx, res in enumerate(results):
222 |             answers = item["qa_pairs"][idx]["short_answers"]
223 |             prediction = res["answer"]
224 | 
225 |             loc_em += max([compute_exact(a, prediction) for a in answers])
226 |             loc_f1 += max([compute_f1(a, prediction) for a in answers])
227 |             loc_counter += 1
228 | 
229 |         em.append(loc_em / loc_counter)
230 |         f1.append(loc_f1 / loc_counter)
231 |         bins.append(loc_em == loc_counter)
232 | 
233 |     return {
234 |         'QA-EM': 100 * np.mean(em),
235 |         'QA-F1': 100 * np.mean(f1),
236 |         'QA-Hit': 100 * np.mean(bins)
237 |     }
238 | 
239 | 
240 | def compute_mauve(data):
241 |     """Compute Mauve score."""
242 | 
243 |     logger.info("Computing MAUVE...")
244 |     human_data = []
245 |     model_data = []
246 |     for item in data:
247 |         # Remove ending punctuations
248 |         # Remove any new lines
249 |         # Truncate by 100 words
250 |         human_data.append(' '.join((item['question'] + " " + item['answer'].strip()).split()[:100]).rstrip(string.punctuation))
251 |         model_data.append(' '.join((item['question'] + " " + item['output'].strip()).split()[:100]).rstrip(string.punctuation))
252 | 
253 |     import mauve
254 |     out = mauve.compute_mauve(
255 |         p_text=human_data,
256 |         q_text=model_data,
257 |         device_id=0,
258 |         max_text_length=512,
259 |         verbose=True,
260 |         batch_size=8,
261 |         featurize_model_name="gpt2-large"
262 |     )
263 |     return out.mauve * 100
264 | 
265 | 
266 | def _run_nli_autoais(passage, claim):
267 |     """
268 |     Run inference for assessing AIS between a premise and hypothesis.
269 |     Adapted from https://github.com/google-research-datasets/Attributed-QA/blob/main/evaluation.py
270 |     """
271 |     global autoais_model, autoais_tokenizer
272 |     input_text = "premise: {} hypothesis: {}".format(passage, claim)
273 |     inputs = autoais_tokenizer(input_text, return_tensors="pt").to('cuda')
274 | 
275 |     with torch.inference_mode():
276 |         outputs = autoais_model.generate(inputs['input_ids'], output_scores=True,max_new_tokens=10)
277 | 
278 |     result = autoais_tokenizer.decode(outputs[0], skip_special_tokens=True)
279 |     inference = 1 if result == "1" else 0
280 |     return inference
281 | 
282 | 
283 | def compute_claims(data):
284 |     global autoais_model, autoais_tokenizer
285 |     if autoais_model is None:
286 |         logger.info("Loading AutoAIS model...")
287 |         autoais_model = AutoModelForSeq2SeqLM.from_pretrained(AUTOAIS_MODEL, torch_dtype=torch.bfloat16, max_memory=get_max_memory(), device_map="auto")
288 |         autoais_tokenizer = AutoTokenizer.from_pretrained(AUTOAIS_MODEL, use_fast=False)
289 | 
290 |     logger.info("Computing claims...")
291 |     scores = []
292 |     for item in tqdm(data):
293 |         normalized_output = remove_citations(item['output'])
294 |         entail = 0
295 |         claims = item["claims"]
296 |         for claim in claims:
297 |             entail += _run_nli_autoais(normalized_output, claim)
298 |         scores.append(entail / len(claims))
299 |     return 100 * np.mean(scores)
300 | 
301 | 
302 | def compute_autoais(data,
303 |                     decontext=False,
304 |                     concat=False,
305 |                     qampari=False,
306 |                     at_most_citations=None,):
307 |     """
308 |     Compute AutoAIS score.
309 | 
310 |     Args:
311 |         data: requires field `output` and `docs`
312 |               - docs should be a list of items with fields `title` and `text` (or `phrase` and `sent` for QA-extracted docs)
313 |         citation: check citations and use the corresponding references.
314 |         decontext: decontextualize the output
315 |     """
316 | 
317 |     global autoais_model, autoais_tokenizer
318 |     if autoais_model is None:
319 |         logger.info("Loading AutoAIS model...")
320 |         autoais_model = AutoModelForSeq2SeqLM.from_pretrained(AUTOAIS_MODEL, torch_dtype=torch.bfloat16, max_memory=get_max_memory(), device_map="auto")
321 |         autoais_tokenizer = AutoTokenizer.from_pretrained(AUTOAIS_MODEL, use_fast=False)
322 | 
323 |     logger.info(f"Running AutoAIS...")
324 | 
325 |     def _format_document(doc):
326 |         """Format document for AutoAIS."""
327 | 
328 |         if "sent" in doc:
329 |             # QA-extracted docs
330 |             return "Title: %s\n%s" % (doc['title'], doc['sent'])
331 |         else:
332 |             return "Title: %s\n%s" % (doc['title'], doc['text'])
333 | 
334 |     ais_scores = []
335 |     ais_scores_prec = []
336 | 
337 |     sent_total = 0
338 |     sent_mcite = 0
339 |     sent_mcite_support = 0
340 |     sent_mcite_overcite = 0
341 |     autoais_log = []
342 |     for item in tqdm(data):
343 |         # Get sentences by using NLTK
344 |         if qampari:
345 |             sents = [item['question'] + " " + x.strip() for x in item['output'].rstrip().rstrip(".").rstrip(",").split(",")]
346 |         else:
347 |             sents = sent_tokenize(item['output'])
348 |         if len(sents) == 0:
349 |             continue
350 | 
351 |         target_sents = [remove_citations(sent).strip() for sent in sents]
352 | 
353 |         entail = 0
354 |         entail_prec = 0
355 |         total_citations = 0
356 |         print('len(sents):{}'.format(len(sents)))
357 |         for sent_id, sent in enumerate(sents):
358 |             target_sent = target_sents[sent_id] # Citation removed and (if opted for) decontextualized
359 |             joint_entail = -1 # Undecided
360 | 
361 |             # Find references
362 |             ref = [int(r[1:])-1 for r in re.findall(r"\[\d+", sent)] # In text citation id starts from 1
363 |             logger.info(f"For `{sent}`, find citations {ref}")
364 |             if len(ref) == 0:
365 |                 # No citations
366 |                 joint_entail = 0
367 |             elif any([ref_id >= len(item['docs']) for ref_id in ref]):
368 |                 # Citations out of range
369 |                 joint_entail = 0
370 |             else:
371 |                 if at_most_citations is not None:
372 |                     ref = ref[:at_most_citations]
373 |                 total_citations += len(ref)
374 |                 joint_passage = '\n'.join([_format_document(item['docs'][psgs_id]) for psgs_id in ref])
375 | 
376 |             # If not directly rejected by citation format error, calculate the recall score
377 |             if joint_entail == -1:
378 |                 print('joint_passage:\n{}'.format(joint_passage))
379 |                 print('target_sent:\n{}'.format(target_sent))
380 |                 print('*'*20)
381 |                 print()
382 |                 joint_entail = _run_nli_autoais(joint_passage, target_sent)
383 |                 autoais_log.append({
384 |                     "question": item['question'],
385 |                     "output": item['output'],
386 |                     "claim": sent,
387 |                     "passage": [joint_passage],
388 |                     "model_type": "NLI",
389 |                     "model_output": joint_entail,
390 |                 })
391 | 
392 |             entail += joint_entail
393 |             if len(ref) > 1:
394 |                 sent_mcite += 1
395 | 
396 |             # calculate the precision score if applicable
397 |             if joint_entail and len(ref) > 1:
398 |                 sent_mcite_support += 1
399 |                 # Precision check: did the model cite any unnecessary documents?
400 |                 for psgs_id in ref:
401 |                     # condition A
402 |                     passage = _format_document(item['docs'][psgs_id]) 
403 |                     nli_result = _run_nli_autoais(passage, target_sent)
404 | 
405 |                     # condition B
406 |                     if not nli_result:
407 |                         subset_exclude = copy.deepcopy(ref)
408 |                         subset_exclude.remove(psgs_id)
409 |                         passage = '\n'.join([_format_document(item['docs'][pid]) for pid in subset_exclude])
410 |                         nli_result = _run_nli_autoais(passage, target_sent)
411 |                         if nli_result: # psgs_id is not necessary
412 |                             flag = 0
413 |                             sent_mcite_overcite += 1 
414 |                         else:
415 |                             entail_prec += 1
416 |                     else:
417 |                         entail_prec += 1
418 |             else:
419 |                 entail_prec += joint_entail 
420 | 
421 |         sent_total += len(sents)
422 |         ais_scores.append(entail / len(sents))
423 |         ais_scores_prec.append(entail_prec / total_citations if total_citations > 0 else 0) # len(sents))
424 | 
425 |     if sent_mcite > 0 and sent_mcite_support > 0:
426 |         print("Among all sentences, %.2f%% have multiple citations, among which %.2f%% are supported by the joint set, among which %.2f%% overcite." % (
427 |             100 * sent_mcite / sent_total, 
428 |             100 * sent_mcite_support / sent_mcite, 
429 |             100 * sent_mcite_overcite / sent_mcite_support
430 |         ))
431 | 
432 |     def calculate_f1(precision, recall):
433 |         if precision + recall == 0:
434 |             return 0
435 |         return 2 * (precision * recall) / (precision + recall)
436 | 
437 |     citation_recall = np.mean(ais_scores)
438 |     citation_precision = np.mean(ais_scores_prec)
439 |     return {
440 |         "citation_rec": 100 * citation_recall,
441 |         "citation_prec": 100 * citation_precision,
442 |         "citation_f1": 100 * calculate_f1(citation_precision, citation_recall)
443 |     }
444 | 
445 | 
446 | def compute_qampari_f1(data, cot=False):
447 |     prec = []
448 |     rec = []
449 |     rec_top5 = []
450 |     f1 = []
451 |     f1_top5 = []
452 | 
453 |     num_preds = []
454 |     for item in data:
455 |         if cot:
456 |             if ":" in item['output']:
457 |                 o = ':'.join(item['output'].split(":")[1:]) # try to separate the COT part and the answer list part.
458 |             else:
459 |                 o = ""
460 |         else:
461 |             o = item['output']
462 |         preds = [normalize_answer(x.strip()) for x in o.rstrip().rstrip(".").rstrip(",").split(",")]
463 |         preds = [p for p in preds if len(p) > 0] # delete empty answers
464 |         num_preds.append(len(preds))
465 |         answers = [[normalize_answer(x) for x in ans] for ans in item['answers']]
466 |         flat_answers = [item for sublist in answers for item in sublist]
467 |         
468 |         prec.append(sum([p in flat_answers for p in preds]) / len(preds) if len(preds) > 0 else 0)
469 |         rec.append(sum([any([x in preds for x in a]) for a in answers]) / len(answers))
470 |         rec_top5.append(min(5, sum([any([x in preds for x in a]) for a in answers])) / min(5, len(answers)))
471 |         if (prec[-1] + rec[-1]) == 0:
472 |             f1.append(0)
473 |         else:
474 |             f1.append(2 * prec[-1] * rec[-1] / (prec[-1] + rec[-1]))
475 |         if (prec[-1] + rec_top5[-1]) == 0:
476 |             f1_top5.append(0) 
477 |         else:
478 |             f1_top5.append(2 * prec[-1] * rec_top5[-1] / (prec[-1] + rec_top5[-1]))
479 | 
480 |     return {
481 |         "num_preds": np.mean(num_preds),
482 |         "qampari_prec": 100 * np.mean(prec),
483 |         "qampari_rec": 100 * np.mean(rec),
484 |         "qampari_rec_top5": 100 * np.mean(rec_top5),
485 |         "qampari_f1": 100 * np.mean(f1),
486 |         "qampari_f1_top5": 100 * np.mean(f1_top5),
487 |     }
488 | 
489 | def main():
490 |     parser = argparse.ArgumentParser()
491 |     parser.add_argument("--f", type=str, required=True, help="Output file. Should have field `question`, `output`, (ROUGE) `answer`, \
492 |                         (accuracy) `qa_pairs`, (AIS) `docs`")
493 |     parser.add_argument('--eval_metric',required=True,choices=['default','correctness','custom'])
494 |     parser.add_argument("--no_rouge", action="store_true", help="Do not evaluate ROUGE score")
495 |     parser.add_argument("--qa", action="store_true", help="Use the QA model")
496 |     parser.add_argument("--mauve", action="store_true", help="Use the mauve score model")
497 |     parser.add_argument("--citations", action="store_true", help="Evaluation with citation")
498 |     parser.add_argument("--at_most_citations", type=int, default=3, help="At most take this many documents (mostly for precision)")
499 |     parser.add_argument("--claims_nli", action="store_true", help="Use claims for ELI5")
500 |     parser.add_argument('--full_fitlog_hyper',default=0)
501 | 
502 |     # QAMPARI
503 |     parser.add_argument("--cot", action="store_true", help="For QAMPARI, try to find colon and separate the COT and answer listing")
504 | 
505 |     args = parser.parse_args()
506 | 
507 |     if args.eval_metric == 'default':
508 |         if 'asqa' in args.f:
509 |             args.qa = 1
510 |             args.mauve = 1
511 |             args.citations = 1
512 |             args.claims_nli = 0
513 |         elif 'qampari' in args.f:
514 |             args.citations = 1
515 |         elif 'eli5' in args.f:
516 |             args.citations = 1
517 |             args.claims_nli = 1
518 |             args.mauve = 1
519 |     elif args.eval_metric == 'custom':
520 |         pass
521 |     elif args.eval_metric == 'correctness':
522 |         if 'asqa' in args.f:
523 |             args.qa = 0
524 |             args.mauve = 0
525 |             args.citations = 0
526 |             args.claims_nli = 0
527 |         elif 'qampari' in args.f:
528 |             args.citations = 0
529 |         elif 'eli5' in args.f:
530 |             args.citations = 0
531 |             args.claims_nli = 1
532 |             args.mauve = 0
533 | 
534 |     with open(args.f) as f:
535 |         data_with_config = json.load(f)
536 |     data = data_with_config['data'] 
537 | 
538 |     if "qampari" in args.f:
539 |         args.no_rouge = True
540 |         args.qa = False
541 |         args.mauve = False
542 |         args.decontext = False
543 |         qampari = True
544 |     else:
545 |         qampari = False
546 | 
547 |     # Truncate by newline and remove on the fly search result
548 |     logger.warning("We remove all the pre/appended space/newlines and we truncate the answer by the first newline.")
549 |     logger.warning("We replace any on the fly search result to standard bracket citation format.")
550 |     for i in range(len(data)):
551 |         data[i]['output'] = data[i]['output'].strip().split("\n")[0]
552 |         data[i]['output'] = data[i]['output'].replace("<|im_end|>", "")
553 | 
554 |     # Remove all citations for all non-AutoAIS evaluation
555 |     normalized_data = copy.deepcopy(data)
556 |     for i in range(len(normalized_data)):
557 |         normalized_data[i]['output'] = remove_citations(normalized_data[i]['output'])
558 | 
559 |     result = {}
560 |     result['length'] = compute_len(normalized_data)
561 |     result['str_em'], result['str_hit'] = compute_str_em(normalized_data)
562 |     logger.info('eval_result:{}'.format(result))
563 |     if qampari:
564 |         result.update(compute_qampari_f1(normalized_data, cot=args.cot))
565 |     logger.info('eval_result:{}'.format(result))
566 |     if not args.no_rouge:
567 |         result['rougeLsum'] = compute_rouge(normalized_data)
568 |     logger.info('eval_result:{}'.format(result))
569 |     if args.qa:
570 |         result.update(compute_qa(normalized_data))
571 |     logger.info('eval_result:{}'.format(result))
572 |     if args.mauve:
573 |         result['mauve'] = compute_mauve(normalized_data)
574 |     logger.info('eval_result:{}'.format(result))
575 |     if args.citations: 
576 |         result.update(compute_autoais(data, qampari=qampari, at_most_citations=args.at_most_citations))
577 |     logger.info('eval_result:{}'.format(result))
578 |     if args.claims_nli:
579 |         result["claims_nli"] = compute_claims(normalized_data)
580 |     logger.info('eval_result:{}'.format(result))
581 | 
582 |     json.dump(result, open(args.f.replace("json", "score"), "w"), indent=4)
583 | 
584 | 
585 | if __name__ == "__main__":
586 |     main()
587 | 


--------------------------------------------------------------------------------
/llm_retrieval_related/iterative_select_supporting_documents.py:
--------------------------------------------------------------------------------
  1 | import copy
  2 | import threading
  3 | import tqdm
  4 | import openai
  5 | from transformers import AutoTokenizer
  6 | import time
  7 | from typing import List, Dict, Tuple, Union
  8 | import logging
  9 | 
 10 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
 11 | logger = logging.getLogger(__name__)
 12 | logger.setLevel(logging.INFO)
 13 | 
 14 | gpt2_tokenizer = AutoTokenizer.from_pretrained('gpt2')
 15 | 
 16 | 
 17 | def truncate_doc_in_user_prompt(user_prompt):
 18 |     last_index = user_prompt.rfind('Content:\n')
 19 |     if last_index != -1:
 20 |         user_prompt = user_prompt[:last_index]
 21 | 
 22 |     user_prompt = '\n'.join(user_prompt.split('\n')[:-2])
 23 |     user_prompt = user_prompt.strip()
 24 |     return user_prompt
 25 | 
 26 | 
 27 | def letter_to_int(letter):
 28 |     if 'a' <= letter <= 'z':
 29 |         return ord(letter) - ord('a')
 30 |     elif 'A' <= letter <= 'Z':
 31 |         return ord(letter) - ord('A')
 32 |     else:
 33 |         print('letter:{}'.format(letter))
 34 |         raise NotImplementedError
 35 | 
 36 | 
 37 | def letter_to_int_upper(letter):
 38 |     if 'A' <= letter <= 'Z':
 39 |         return ord(letter) - ord('A')
 40 |     else:
 41 |         print('letter:{}'.format(letter))
 42 |         raise NotImplementedError
 43 | 
 44 | 
 45 | def letter_to_int_lower(letter):
 46 |     if 'a' <= letter <= 'z':
 47 |         return ord(letter) - ord('a')
 48 |     else:
 49 |         print('letter:{}'.format(letter))
 50 |         raise NotImplementedError
 51 | 
 52 | 
 53 | def int_to_letter_lower(n):
 54 |     if 0 <= n <= 25:
 55 |         return chr(n + ord('a'))
 56 |     else:
 57 |         raise ValueError('The entered integer must be between 0 and 25')
 58 | 
 59 | 
 60 | def int_to_letter_upper(n):
 61 |     if 0 <= n <= 25:
 62 |         return chr(n + ord('A'))
 63 |     else:
 64 |         raise ValueError('The entered integer must be between 0 and 25')
 65 | 
 66 | 
 67 | def create_stage2_select_prompt(questions: List[str], 
 68 |                                 docs: List, 
 69 |                                 k: int, 
 70 |                                 idf_use_letter: str, 
 71 |                                 use_title: int, 
 72 |                                 stage2_select_system_prompt: str, 
 73 |                                 used_doc_field: str, 
 74 |                                 reverse_doc_order: bool=False) -> List[Dict]:
 75 |     """Create the prompt for selection in the 2nd stage.
 76 |     Args
 77 |     ----
 78 |     questions: List[str]
 79 |         The question.
 80 |     docs: List
 81 |         The documents relevant to the question
 82 |     k: int
 83 |         A specified number of documents for answering the user's specific question(s).
 84 |     idf_use_letter: str
 85 |         Use uppercase letters, lowercase letters, or integers to mark the documents.
 86 |     use_title: int
 87 |         Whether to use title or not.
 88 |     stage2_select_system_prompt: str
 89 |         System prompt for instruction.
 90 |     used_doc_field_in_retrieval: str
 91 |         Which filed of document to use in retrieval.
 92 |     reverse_doc_order: bool=False
 93 |         Whether to reverse the order of document or not.
 94 | 
 95 |     Returns
 96 |     -------
 97 |     prompt: List[Dict]
 98 |         Prompt for selection.
 99 |     """
100 |     user_prompt = 'Question:\n{}\n\nk: {}\n\n'.format('\n'.join(questions), k)
101 |     user_prompt += 'Candidate Documents:\n\n'
102 | 
103 |     prompt_doc_str_list = []
104 | 
105 |     for i, doc in enumerate(docs):
106 |         if idf_use_letter == 'lower':
107 |             idf = int_to_letter_lower(i)
108 |         elif idf_use_letter == 'upper':
109 |             idf = int_to_letter_upper(i)
110 |         else:
111 |             idf = i + 1
112 |         if use_title:
113 |             prompt_doc_str_list.append('{}\nTitle:\n{}\nContent:\n{}\n\n'.format(idf, doc['title'], doc[used_doc_field]))
114 |         else:
115 |             prompt_doc_str_list.append('{}\nContent:\n{}\n\n'.format(idf, doc[used_doc_field]))
116 | 
117 |     if reverse_doc_order:
118 |         user_prompt += ''.join(list(reversed(prompt_doc_str_list)))
119 |     else:
120 |         user_prompt += ''.join(prompt_doc_str_list)
121 | 
122 |     prompt = [
123 |         {'role': 'system', 'content': stage2_select_system_prompt},
124 |         {'role': 'user', 'content': user_prompt.strip()}
125 |     ]
126 | 
127 |     return prompt
128 | 
129 | 
130 | def select_k_supporting_documents(questions: List[str], 
131 |                                   tmp_selected_docs: List, 
132 |                                   extra_docs_to_browse: List[Dict], 
133 |                                   k: int, 
134 |                                   selected_doc_first: int,
135 |                                   idf_use_letter: str, 
136 |                                   use_title: int, 
137 |                                   model_name: str, 
138 |                                   stage2_select_system_prompt: str, 
139 |                                   used_doc_field_in_retrieval: str, 
140 |                                   thread: "instance") -> Dict:
141 |     """Select k supporting documents.
142 |     Args
143 |     ----
144 |     questions: List[str]
145 |         The question.
146 |     tmp_selected_docs: List
147 |         pass
148 |     extra_docs_to_browse: List[Dict]
149 |         pass
150 |     k: int
151 |         A specified number of documents for answering the user's specific question(s).
152 |     selected_doc_first: int
153 | 
154 |     idf_use_letter: str
155 |         Use uppercase letters, lowercase letters, or integers to mark the documents.
156 |     use_title: int
157 |         Whether to use title or not.
158 |     model_name: str
159 |         OpenAI model name.
160 |     stage2_select_system_prompt: str
161 |         System prompt for instruction.
162 |     used_doc_field_in_retrieval: str
163 |         Which filed of document to use.
164 |     thread: "instance"
165 |         pass
166 |     """
167 |     unbrowsed_docs = []
168 | 
169 |     assert idf_use_letter in ['upper', 'lower', 'int']
170 | 
171 |     while 1:
172 |         if selected_doc_first:
173 |             docs_concat = tmp_selected_docs + extra_docs_to_browse
174 |         else:
175 |             docs_concat = extra_docs_to_browse + tmp_selected_docs
176 |         messages = create_stage2_select_prompt(questions, docs_concat, k, idf_use_letter, use_title, stage2_select_system_prompt, used_doc_field_in_retrieval)
177 |         prompt_token_num = len(gpt2_tokenizer.tokenize(messages[0]['content'] + messages[1]['content']))
178 |         if prompt_token_num > 3900:
179 |             unbrowsed_docs.insert(0, extra_docs_to_browse[-1])
180 |             extra_docs_to_browse.pop()
181 |         else:
182 |             break
183 |         if len(extra_docs_to_browse) == 0:
184 |             break
185 | 
186 |     final_docs_in_query = [docs_concat]
187 | 
188 |     if len(unbrowsed_docs) > 0:
189 |         logger.info('before openai query, unbrowsed_docs > 0 : {}'.format(len(unbrowsed_docs)))
190 | 
191 |     def repeat_until_success_call_openai_api(func):
192 |         def wrapper(*args, **kw):
193 |             while True:
194 |                 result = None
195 |                 try:
196 |                     result = func(*args, **kw)
197 |                 except openai.error.APIConnectionError as e:
198 |                     if thread.print_error:
199 |                         logger.info('openai connection error, so retry after sleeping 5 seconds')
200 |                         logger.info(e)
201 |                     time.sleep(5)
202 |                 except openai.error.RateLimitError as e:
203 |                     logger.info(type(e))
204 |                     logger.info(e)
205 |                     logger.info('e._message:{}'.format(e._message))
206 |                     if 'quota' in e._message:
207 |                         if thread.print_error:
208 |                             logger.info('now openai account {} runs out. so use next.'.format(thread.account[-1]))
209 |                             logger.info(type(e))
210 |                             logger.info(e)
211 |                         thread.account = thread.openai_account_manager_multi_thread.get_next_account(thread.thread_id,
212 |                                                                                                      thread.account)
213 |                     elif "maximum context length is" in e._message:
214 |                         unbrowsed_docs.insert(0, extra_docs_to_browse[-1])
215 |                         extra_docs_to_browse.pop()
216 | 
217 |                         if selected_doc_first:
218 |                             docs_concat = tmp_selected_docs + extra_docs_to_browse
219 |                         else:
220 |                             docs_concat = extra_docs_to_browse + tmp_selected_docs
221 |                         final_docs_in_query[0] = docs_concat
222 |                         messages = create_stage2_select_prompt(questions, docs_concat, k, idf_use_letter, use_title, stage2_select_system_prompt, used_doc_field_in_retrieval)
223 |                         print('in repeat_until_success_call_openai_api, docs < 20 : {}'.format(
224 |                             len(docs_concat)))
225 |                         kw['messages'] = messages
226 |                     else:
227 |                         if True:
228 |                             logger.info('openai rate limit error, so retry after sleeping 45 seconds')
229 |                         time.sleep(45)
230 |                 except openai.error.AuthenticationError as e:
231 |                     if 'This key is associated with a deactivated account' in e._message:
232 |                         logger.info('the account {} is deactivated. so use next'.format(thread.account[-1]))
233 |                         if thread.print_error:
234 |                             logger.info(e)
235 |                         thread.account = thread.openai_account_manager_multi_thread.get_next_account(thread.thread_id,
236 |                                                                                                      thread.account)
237 |                     else:
238 |                         logger.info('meet unexpected AuthenticationError, so retry after sleeping 5 seconds')
239 |                         if thread.print_error:
240 |                             logger.info(e)
241 |                         thread.account = thread.openai_account_manager_multi_thread.get_next_account(thread.thread_id,
242 |                                                                                                      thread.account)
243 |                 except openai.error.InvalidRequestError as e:
244 |                     if "maximum context length is" in e._message:
245 |                         unbrowsed_docs.insert(0, extra_docs_to_browse[-1])
246 |                         extra_docs_to_browse.pop()
247 | 
248 |                         if selected_doc_first:
249 |                             docs_concat = tmp_selected_docs + extra_docs_to_browse
250 |                         else:
251 |                             docs_concat = extra_docs_to_browse + tmp_selected_docs
252 |                         final_docs_in_query[0] = docs_concat
253 |                         messages = create_stage2_select_prompt(questions, docs_concat, k, idf_use_letter, use_title, stage2_select_system_prompt, used_doc_field_in_retrieval)
254 |                         print('in repeat_until_success_call_openai_api, docs < 20 : {}'.format(len(docs_concat)))
255 |                         kw['messages'] = messages
256 | 
257 |                 except openai.error.OpenAIError as e:
258 |                     logger.info('meet unexpected openai error, so retry after sleeping 5 seconds')
259 |                     logger.info(e)
260 |                     logger.info(type(e))
261 |                     time.sleep(3)
262 | 
263 |                 except Exception as e:
264 |                     raise e
265 | 
266 |                 if result != None:
267 |                     return result
268 |                 else:
269 |                     pass
270 | 
271 |         return wrapper
272 | 
273 |     @repeat_until_success_call_openai_api
274 |     def tmp_func(messages):
275 |         return openai.ChatCompletion.create(model=model_name, messages=messages, temperature=0, max_tokens=64, api_key=thread.account[-1])
276 | 
277 |     if "gpt-3.5-turbo" in model_name:
278 |         response = tmp_func(messages=messages)
279 |         response = response['choices'][0]['message']['content']
280 |     else:
281 |         raise NotImplementedError
282 |     response = response.split('\n')
283 |     if len(response) > 1:
284 |         logger.info('response has > 1 lines, so just use its first line which has the selected documents')
285 |         logger.warning(f"response: \n{response}")
286 |     response = response[0]
287 | 
288 |     if len(unbrowsed_docs) > 0:
289 |         logger.info('after openai query, unbrowsed_docs > 0 : {}'.format(len(unbrowsed_docs)))
290 | 
291 |     response_document_identifiers = response.replace(',', ' ').replace('[', ' ').replace(']', ' ').strip().split()
292 |     selected_doc_idfs = []
293 | 
294 |     docs_concat_in_openai_query = final_docs_in_query[0]
295 |     for idf in response_document_identifiers:
296 |         try:
297 |             if idf_use_letter == 'upper':
298 |                 idf = letter_to_int_upper(idf)
299 |             elif idf_use_letter == 'lower':
300 |                 idf = letter_to_int_lower(idf)
301 |             else:
302 |                 idf = int(idf) - 1
303 | 
304 |             if idf >= len(docs_concat_in_openai_query):
305 |                 print('idf={}, response={}'.format(idf, response))
306 |             else:
307 |                 selected_doc_idfs.append(idf)
308 |         except:
309 |             pass
310 | 
311 |     if len(selected_doc_idfs) != k:
312 |         print('len(retrieved_doc_idfs) != k, k:{}, len:{},\nresponse:\n{}response_document_identifiers:\n{}'.format(k,
313 |                                                                                                                     len(selected_doc_idfs),
314 |                                                                                                                     response,
315 |                                                                                                                     response_document_identifiers))
316 | 
317 |     selected_doc_idfs = selected_doc_idfs[:k]
318 | 
319 |     docs_concat_in_openai_query = final_docs_in_query[0]
320 | 
321 |     result_dict = {}
322 | 
323 |     selected_docs = []
324 |     for idf in selected_doc_idfs:
325 |         selected_docs.append(docs_concat_in_openai_query[idf])
326 |     result_dict['selected_docs'] = selected_docs
327 | 
328 |     original_openai_response = response
329 |     result_dict['original_openai_response'] = original_openai_response
330 | 
331 |     parsed_doc_idfs = selected_doc_idfs
332 |     result_dict['parsed_doc_idfs'] = parsed_doc_idfs
333 | 
334 |     result_dict['unbrowsed_docs'] = unbrowsed_docs
335 | 
336 |     return result_dict
337 | 
338 | 
339 | def iterative_select_supporting_documents_single(alce_item: Dict, 
340 |                                                  k: int, 
341 |                                                  window_size: int, 
342 |                                                  reversed_browse_order: int, 
343 |                                                  selected_doc_first: int,
344 |                                                  idf_use_letter: str, 
345 |                                                  use_title: int, 
346 |                                                  model_name: str, 
347 |                                                  stage2_select_system_prompt: str, 
348 |                                                  used_doc_field_in_retrieval: str, 
349 |                                                  thread: "instance", 
350 |                                                  use_sub_questions: int=0, 
351 |                                                  old_selected_docs: List[Dict]=None,
352 |                                                  position: str=None,
353 |                                                  doc_num: int=100) -> Dict:
354 |     """Iteratively select supporting documents.
355 |     Args
356 |     ----
357 |     alce_item: Dict
358 |         Single data.
359 |     k: int
360 |         A specified number of documents for answering the user's specific question(s).
361 |     window_size: int
362 |         Context length.
363 |     reversed_browse_order: int
364 |         Whether to reverse the document order or not.
365 |     selected_doc_first: int
366 |         Whether to use the selected documents first or not.
367 |     idf_use_letter: str
368 |         Use uppercase letters, lowercase letters, or integers to mark the documents.
369 |     use_title: int
370 |         Whether to use title or not.
371 |     model_name: str
372 |         Which model of OpenAI to use.
373 |     stage2_select_system_prompt: str
374 |         System prompt for instruction.
375 |     used_doc_field_in_retrieval: str
376 |         Which filed of document to use in retrieval.
377 |     thread: "instance"
378 |         Instance of thread.
379 |     use_sub_questions: int=0
380 |         Whether to use sub questions for asqa.
381 |     old_selected_docs: List[Dict]=None
382 |         Old selected docs. May be less than 5.
383 |     position: str=None
384 |         Put the top-5 docs from old selected docs into the head or tail.
385 |     doc_num: int=100
386 |         Use top-k docs for reranking.
387 | 
388 |     Returns
389 |     -------
390 |     output_alce_item: Dict
391 |         Selected docs.
392 |     """
393 |     output_alce_item = copy.deepcopy(alce_item)
394 |     question = alce_item['question']
395 |     asqa_questions = None
396 |     if use_sub_questions and 'qa_pairs' in alce_item:
397 |         logger.warning("Use sub questions for asqa.")
398 |         asqa_questions = list(map(lambda x: x['question'], list(alce_item['qa_pairs'])))
399 | 
400 |     if asqa_questions != None:
401 |         questions = asqa_questions
402 |     else:
403 |         questions = [question]
404 | 
405 |     docs_to_browse = copy.deepcopy(alce_item['docs'][:doc_num])
406 |     logger.warning(f"The number of documents used for reranking is {len(docs_to_browse)}.")
407 | 
408 |     if old_selected_docs is not None and position == "head":
409 |         logger.info("Add old selected docs into head.")
410 |         old_selected_docs_copy = copy.deepcopy(old_selected_docs)
411 |         docs_to_browse = old_selected_docs_copy + docs_to_browse
412 |     elif old_selected_docs is not None and position == "tail":
413 |         logger.info("Add old selected docs into tail.")
414 |         old_selected_docs_copy = copy.deepcopy(old_selected_docs)
415 |         docs_to_browse = docs_to_browse + old_selected_docs_copy
416 | 
417 |     if reversed_browse_order:
418 |         docs_to_browse = list(reversed(docs_to_browse))
419 | 
420 |     tmp_selected_docs = []
421 | 
422 |     while len(docs_to_browse) > 0:
423 |         # iteratively update tmp_selected_docs
424 |         tmp_extra_docs_to_browse = docs_to_browse[:window_size - len(tmp_selected_docs)]
425 |         docs_to_browse = docs_to_browse[window_size - len(tmp_selected_docs):]
426 |         select_result_dict = select_k_supporting_documents(questions, tmp_selected_docs, tmp_extra_docs_to_browse, k,
427 |                                                            selected_doc_first, idf_use_letter, use_title,
428 |                                                            model_name, stage2_select_system_prompt, used_doc_field_in_retrieval, thread)
429 | 
430 |         tmp_selected_docs = select_result_dict['selected_docs']
431 |         original_openai_response = select_result_dict['original_openai_response']
432 |         parsed_doc_idfs = select_result_dict['parsed_doc_idfs']
433 |         unbrowsed_docs = select_result_dict['unbrowsed_docs']
434 | 
435 |         docs_to_browse = unbrowsed_docs + docs_to_browse
436 | 
437 |     output_alce_item['docs'] = tmp_selected_docs
438 | 
439 |     return output_alce_item
440 | 
441 | 
442 | class OpenAI_API_inp_Manager_MultiThread_Generalized:
443 |     def __init__(self, idx_non_general_inp: List[Tuple], general_inp: Dict) -> None:
444 |         """Class init
445 |         Args
446 |         ----
447 |         idx_non_general_inp: List[Tuple]
448 |             Data with index.
449 |         general_inp: Dict
450 |             Hyperparameter.
451 |         """
452 |         self.idx_non_general_inp = idx_non_general_inp
453 |         assert idx_non_general_inp[0][0] == 0, 'the 1st idx_non_general_inp"s idx is not 0, maybe something error'
454 |         self.general_inp = general_inp
455 |         self.inp_lock = threading.Lock()
456 |         self.progress_index = 0
457 | 
458 |         assert type(general_inp) == type({})
459 | 
460 | 
461 |     def get_next_idx_inp(self) -> Union[List, None]:
462 |         """
463 |         Get next new data.
464 |         """
465 |         with self.inp_lock:
466 |             if self.progress_index < len(self.idx_non_general_inp):
467 |                 tmp_idx = self.idx_non_general_inp[self.progress_index][0]
468 |                 tmp_non_general_inp = self.idx_non_general_inp[self.progress_index][1]
469 |                 tmp_general_inp = self.general_inp
470 |                 assert len(set(tmp_general_inp.keys()) & set(tmp_non_general_inp)) == 0, 'tmp_non_general_inp and tmp_general_inp has key overlap, must have problem'
471 |                 self.progress_index += 1
472 |                 return [tmp_idx, {**tmp_non_general_inp, **tmp_general_inp}]
473 |             else:
474 |                 return None
475 | 
476 | 
477 | class MyThread(threading.Thread):
478 |     # todo: Adjust MyThread from calling_sliding_window to two_stage_retrieve
479 |     def __init__(self, thread_id: int, account_manager: "instance", inp_manager: "instance", print_error: bool, pbar: tqdm.tqdm, print_finish: bool=True) -> None:
480 |         """Class init.
481 |         Args
482 |         ----
483 |         thread_id: int
484 |             Thread id.
485 |         account_manager: "instance"
486 |             A manager for accounts of OpenAI.
487 |         inp_manager: "instance"
488 |             A manager for data.
489 |         print_error: bool
490 |             Whether to output error info or not.
491 |         pbar: tqdm.tqdm
492 |             Object of tqdm.
493 |         print_finish: bool=True
494 |             Whether to output ending info or not.
495 |         """
496 |         threading.Thread.__init__(self)
497 |         self.thread_id = thread_id
498 |         self.openai_account_manager_multi_thread = account_manager
499 |         self.openai_inp_manager = inp_manager
500 |         self.account = self.openai_account_manager_multi_thread.get_next_account(self.thread_id)
501 |         self.print_error = print_error
502 |         self.pbar = pbar
503 |         self.print_finish = print_finish
504 | 
505 | 
506 |     def run(self):
507 |         self.results_with_idx = []
508 |         while True:
509 |             tmp = self.openai_inp_manager.get_next_idx_inp()
510 |             if tmp == None:
511 |                 if self.print_finish:
512 |                     logger.info('thread {} finish'.format(self.thread_id))
513 |                 return
514 |             else:
515 |                 tmp_idx = tmp[0]
516 |                 select_doc_input = tmp[1]
517 |                 result = iterative_select_supporting_documents_single(**select_doc_input, thread=self)
518 |                 if self.pbar is not None:
519 |                     self.pbar.update(1)
520 |                 self.results_with_idx.append([tmp_idx, result])
521 | 
522 | 
523 | from openai_account_manager import get_account_manager
524 | 
525 | def iterative_select_supporting_documents_multi_thread(items_to_select: List[Dict], 
526 |                                                        general_input: Dict, 
527 |                                                        num_threads: int, 
528 |                                                        use_tqdm: bool=True, 
529 |                                                        old_data: List[Dict]=None) -> List:
530 |     """Iteratively select supporting documents in a multi-threaded manner.
531 |     Args
532 |     ----
533 |     items_to_select: List[Dict]
534 |         Candidate documents for selection.
535 |     general_input: Dict
536 |         Hyperparameter.
537 |     num_threads: int
538 |         Number of Thread.
539 |     use_tqdm: bool
540 |         Whether to use tqdm or not.
541 |     old_data: List[Dict]=None
542 |         Old data before updating query.
543 | 
544 |     Returns
545 |     -------
546 |     results: List
547 |         Selected supporting documents.
548 |     """
549 |     new_items_to_select = []
550 |     if old_data is None:
551 |         logger.info("Old data is None...")
552 |         for item in items_to_select:
553 |             new_items_to_select.append({'alce_item': item})
554 |     else:
555 |         logger.info("Use old data...")
556 |         question_to_docs = {item["question"]: item["docs"] for item in old_data}
557 |         for item in items_to_select:
558 |             new_items_to_select.append({'alce_item': item, "old_selected_docs": question_to_docs[item["question"]]})
559 |     idx_items_to_select = list(enumerate(new_items_to_select))  # List[Tuple(index, item)]
560 |     account_manager = get_account_manager('openai_account_files/used.txt', 'openai_account_files/accounts.txt', multi_thread=True)
561 |     inp_manager = OpenAI_API_inp_Manager_MultiThread_Generalized(idx_items_to_select, general_input)
562 | 
563 |     if use_tqdm:
564 |         pbar = tqdm.tqdm(total=len(idx_items_to_select))
565 |     else:
566 |         pbar = None
567 | 
568 |     thread_list = []
569 |     for i in range(num_threads):
570 |         thread_list.append(MyThread(i, account_manager, inp_manager, True, pbar))
571 | 
572 |     for t in thread_list:
573 |         t.start()
574 |     for i, t in enumerate(thread_list):
575 |         t.join()
576 | 
577 |     results_with_idx = []
578 | 
579 |     for t in thread_list:
580 |         results_with_idx.extend(t.results_with_idx)
581 | 
582 |     results_with_idx.sort(key=lambda x: x[0])
583 |     results = list(map(lambda x: x[1], results_with_idx))
584 |     return results
585 | 


--------------------------------------------------------------------------------