├── .DS_Store
├── results
    ├── .DS_Store
    └── cast21_human_evaluation.part.json
├── scripts
    ├── .DS_Store
    ├── run_chat_prompt_rewrite.sh
    ├── run_chat_prompt_cot_rewrite.sh
    ├── run_chat_prompt_rewrite_and_response.sh
    ├── run_chat_prompt_cot_rewrite_and_response.sh
    └── run_chat_prompt_rewrite_then_response.sh
├── evaluation
    ├── run_eval_dense_retrieval.sh
    ├── trec_eval.py
    ├── dense_retrievers.py
    └── eval_dense_retrieval.py
├── utils.py
├── generator.py
├── chat_prompt_rewrite.py
├── chat_prompt_rewrite_and_response.py
├── chat_prompt_cot_rewrite_and_response.py
├── chat_prompt_cot_rewrite.py
├── chat_prompt_rewrite_then_response.py
├── requirements.txt
├── README.md
├── chat_promptor.py
└── demonstrations.json


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyriemao/LLM4CS/HEAD/.DS_Store

--------------------------------------------------------------------------------
/results/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyriemao/LLM4CS/HEAD/results/.DS_Store

--------------------------------------------------------------------------------
/scripts/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyriemao/LLM4CS/HEAD/scripts/.DS_Store

--------------------------------------------------------------------------------
/scripts/run_chat_prompt_rewrite.sh:
--------------------------------------------------------------------------------
python chat_prompt_rewrite.py \
    --open_ai_key_id=0 \
    --test_file_path="./datasets/cast20_test.json" \
    --demo_file_path="./demonstrations.json" \
    --qrel_file_path="./datasets/cast20_qrel.tsv" \
    --work_dir="./results/new/cast20/REW" \
    --n_generation=5 \

--------------------------------------------------------------------------------
/scripts/run_chat_prompt_cot_rewrite.sh:
--------------------------------------------------------------------------------
python chat_prompt_cot_rewrite.py \
    --open_ai_key_id=0 \
    --test_file_path="./datasets/cast21_test.json" \
    --demo_file_path="./demonstrations.json" \
    --qrel_file_path="./datasets/cast21_qrel.tsv" \
    --work_dir="./results/new/cast21/COT_REW" \
    --n_generation=5 \

--------------------------------------------------------------------------------
/scripts/run_chat_prompt_rewrite_and_response.sh:
--------------------------------------------------------------------------------
python chat_prompt_rewrite_and_response.py \
    --open_ai_key_id=0 \
    --qrel_file_path="./datasets/cast20_qrel.tsv" \
    --test_file_path="./datasets/cast20_test.json" \
    --demo_file_path="./demonstrations.json" \
    --work_dir="./results/new/cast20/RAR" \
    --n_generation=5 \

--------------------------------------------------------------------------------
/scripts/run_chat_prompt_cot_rewrite_and_response.sh:
--------------------------------------------------------------------------------
python chat_prompt_cot_rewrite_and_response.py \
    --open_ai_key_id=1 \
    --qrel_file_path="./datasets/cast21_qrel.tsv" \
    --test_file_path="./datasets/cast21_test.json" \
    --demo_file_path="./demonstrations.json" \
    --work_dir="./results/new/cast21/COT_RAR" \
    --n_generation=5 \

--------------------------------------------------------------------------------
/scripts/run_chat_prompt_rewrite_then_response.sh:
--------------------------------------------------------------------------------
python chat_prompt_rewrite_then_response.py \
    --open_ai_key_id=0 \
    --test_file_path="./datasets/cast19_test.json" \
    --qrel_file_path="./datasets/cast19_qrel.tsv" \
    --work_dir="./results/new/cast19/RTR" \
    --demo_file_path="./demonstrations.json" \
    --rewrite_file_path="./results/new/cast19/REW/rewrites.jsonl" \
    --n_generation=5 \
--------------------------------------------------------------------------------
/evaluation/run_eval_dense_retrieval.sh:
--------------------------------------------------------------------------------
# run_eval_dense_retrieval.sh
# An example of evaluating RTR on cast20.
eval_field_name="predicted_rewrite"
work_dir="../results/cast20/RTR"  # set the folder containing your `rewrites.jsonl` file

eval_file_path="$work_dir/rewrites.jsonl"
index_path=""  # set the pre-built index which contains all candidate passage embeddings.
qrel_file_path="../datasets/cast20_qrel.tsv"  # set the qrel file path
retrieval_output_path="$work_dir/ance/+q+r+sc"  # set your expected output folder path

export CUDA_VISIBLE_DEVICES=0
# Option notes (kept above the command, because a comment after a trailing `\` breaks the line continuation):
#   --retriever_path: set the pre-trained ANCE model path.
#   --rel_threshold: set 1 for cast19 and 2 for cast20.
#   --include_response: enable if you test RTR or RAR prompting.
python eval_dense_retrieval.py \
    --eval_file_path=$eval_file_path \
    --eval_field_name=$eval_field_name \
    --qrel_file_path=$qrel_file_path \
    --index_path=$index_path \
    --retriever_path="" \
    --use_gpu_in_faiss \
    --n_gpu_for_faiss=1 \
    --top_n=1000 \
    --rel_threshold=2 \
    --retrieval_output_path=$retrieval_output_path \
    --include_query \
    --include_response \
    --aggregation_method="sc"  # one of: maxprob, mean, sc

--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
from IPython import embed

import os
import json
import torch
import random
import numpy as np


def set_seed(args):
    # Seed every random number generator used in the project for reproducibility.
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(args.seed)
        torch.cuda.manual_seed_all(args.seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(args.seed)


def get_has_qrel_label_sample_ids(qrel_file):
    # Collect the query ids (first column) that have at least one qrel line.
    with open(qrel_file, 'r') as f:
        qrel_data = f.readlines()
    qids = set()
    for line in qrel_data:
        line = line.strip().split("\t")
        if len(line) == 1:
            # not tab-separated: fall back to space-separated columns
            line = line[0].strip().split(' ')
        qid = line[0]
        qids.add(qid)
    return qids


def get_finished_sample_ids(output_file_path):
    # Map each sample_id already written to the output file to its saved fields.
    finished_samples = {}
    if os.path.exists(output_file_path):
        with open(output_file_path, "r") as f:
            data = f.readlines()
        for line in data:
            line = json.loads(line)
            finished_samples[line['sample_id']] = {}
            if "predicted_rewrite" in line:
                finished_samples[line['sample_id']]["predicted_rewrite"] = line['predicted_rewrite']
            if "predicted_response" in line:
                finished_samples[line['sample_id']]["predicted_response"] = line['predicted_response']
            if "cot" in line:
                finished_samples[line['sample_id']]["cot"] = line['cot']
            if "rewrite_part_text" in line:
                finished_samples[line['sample_id']]["rewrite_part_text"] = line['rewrite_part_text']

    return finished_samples

--------------------------------------------------------------------------------
/generator.py:
--------------------------------------------------------------------------------
import time
import openai
from IPython import embed

# TODO: Write your OpenAI API keys here.
OPENAI_KEYS = [
    'xxx',
]



# from https://github.com/texttron/hyde/blob/main/src/hyde/generator.py
class ChatGenerator:
    def __init__(self,
                 api_key,
                 n_generation,
                 **kwargs):
        self.model_name = 'gpt-3.5-turbo-16k'
        self.api_key = api_key
        self.n_generation = n_generation
        self.kwargs = kwargs

    def parse_result(self, result, parse_fn):
        # Parse every returned choice and count the ones parse_fn rejects.
        choices = result['choices']
        n_fail = 0
        res = []

        for i in range(len(choices)):
            output = choices[i]['message']['content']
            output = parse_fn(output)

            if not output:
                n_fail += 1
            else:
                res.append(output)

        return n_fail, res


    def generate(self, prompt, parse_fn):
        n_generation = self.n_generation
        output = []
        n_try = 0
        while True:
            if n_try == 5:
                if len(output) == 0:
                    raise ValueError("Have tried 5 times but still only got 0 successful outputs")
                # Pad with repeats of the successful outputs until
                # n_generation results are available, then return them.
                n_success = len(output)
                while len(output) < self.n_generation:
                    output.append(output[len(output) % n_success])
                return output

            # Retry the same request until it is no longer rate-limited.
            while True:
                try:
                    result = openai.ChatCompletion.create(
                        model=self.model_name,
                        messages=[
                            {"role": "system", "content": "You are a helpful assistant."},
                            {"role": "user", "content": "{}".format(prompt)},
                        ],
                        api_key=self.api_key,
                        n=n_generation,
                        **self.kwargs
                    )
                    break
                except openai.error.RateLimitError:
                    print("Trigger RateLimitError, wait 20s...")
                    time.sleep(20)

            n_fail, res = self.parse_result(result, parse_fn)
            output += res

            if n_fail == 0:
                return output
            else:
                # only re-request the generations that failed to parse
                n_generation = n_fail

            n_try += 1

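The retry loop in `generate` can be illustrated in isolation. The following is only a sketch under stated assumptions, not the repository's code: `generate_with_topup`, `fake_call`, and `parse_rewrite` are hypothetical names, the model call is replaced by a local stub so the example runs offline, and the "Rewrite: " prefix is an assumed output format. It shows the idea of re-requesting only the generations that failed to parse and padding by cycling through the successful ones.

```python
from typing import Callable, List, Optional


def generate_with_topup(
    call_model: Callable[[int], List[str]],
    parse_fn: Callable[[str], Optional[str]],
    n_generation: int,
    max_rounds: int = 5,
) -> List[str]:
    """Collect n_generation parsed outputs, re-requesting only the failed ones."""
    outputs: List[str] = []
    n_request = n_generation
    for _ in range(max_rounds):
        raw = call_model(n_request)  # hypothetical stand-in for the chat API call
        good = [p for p in (parse_fn(text) for text in raw) if p]
        outputs += good
        n_request = len(raw) - len(good)  # ask again only for the failures
        if n_request == 0:
            return outputs
    if not outputs:
        raise ValueError("no successful outputs after {} rounds".format(max_rounds))
    # Pad by cycling through the successful outputs (a choice made for this sketch).
    n_success = len(outputs)
    while len(outputs) < n_generation:
        outputs.append(outputs[len(outputs) % n_success])
    return outputs


def fake_call(n: int) -> List[str]:
    # Hypothetical model: every second completion lacks the assumed "Rewrite: " prefix.
    return ["Rewrite: q{}".format(i) if i % 2 == 0 else "garbage" for i in range(n)]


def parse_rewrite(text: str) -> Optional[str]:
    # Return the text after the assumed prefix, or None when the format is wrong.
    prefix = "Rewrite: "
    return text[len(prefix):] if text.startswith(prefix) else None


results = generate_with_topup(fake_call, parse_rewrite, n_generation=5)
print(results)
```

Shrinking the request size to the number of failures keeps the total number of returned generations at `n_generation` while avoiding paying again for outputs that already parsed.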
--------------------------------------------------------------------------------
/evaluation/trec_eval.py:
--------------------------------------------------------------------------------
import os
import json
import pytrec_eval
import numpy as np
from pprint import pprint
from IPython import embed

def trec_eval(run_trec_file, qrel_trec_file, retrieval_output_path, rel_threshold):
    # process run trec file
    with open(run_trec_file, 'r') as f:
        run_data = f.readlines()
    runs = {}
    for line in run_data:
        line = line.split(" ")
        sample_id = line[0]
        doc_id = line[2]
        score = float(line[4])
        if sample_id not in runs:
            runs[sample_id] = {}
        runs[sample_id][doc_id] = score

    # process qrel trec file
    with open(qrel_trec_file, 'r') as f:
        qrel_data = f.readlines()
    qrels = {}
    qrels_ndcg = {}
    for line in qrel_data:
        # accept tab- or space-separated qrel lines
        line = line.strip().split()
        query = line[0]
        doc_id = line[2]
        rel = int(line[3])
        if query not in qrels:
            qrels[query] = {}
        if query not in qrels_ndcg:
            qrels_ndcg[query] = {}

        # for NDCG: keep the graded relevance label
        qrels_ndcg[query][doc_id] = rel
        # for MAP, MRR, Recall: binarize with rel_threshold
        if rel >= rel_threshold:
            rel = 1
        else:
            rel = 0
        qrels[query][doc_id] = rel


    # pytrec_eval eval
    evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "recip_rank", "recall.5", "recall.10", "recall.20", "recall.100"})
    res = evaluator.evaluate(runs)
    map_list = [v['map'] for v in res.values()]
    mrr_list = [v['recip_rank'] for v in res.values()]
    recall_5_list = [v['recall_5'] for v in res.values()]
    recall_10_list = [v['recall_10'] for v in res.values()]
    recall_20_list = [v['recall_20'] for v in res.values()]
    recall_100_list = [v['recall_100'] for v in res.values()]

    evaluator = pytrec_eval.RelevanceEvaluator(qrels_ndcg, {"ndcg_cut.3"})
    res = evaluator.evaluate(runs)
    ndcg_3_list = [v['ndcg_cut_3'] for v in res.values()]

    res = {
            "MAP": np.average(map_list),
            "MRR": np.average(mrr_list),
            "Recall@5": np.average(recall_5_list),
            "Recall@10": np.average(recall_10_list),
            "Recall@20": np.average(recall_20_list),
            "Recall@100": np.average(recall_100_list),
            "NDCG@3": np.average(ndcg_3_list),
        }


    print("---------------------Evaluation results:---------------------")
    pprint(res)
    if retrieval_output_path:
        with open(os.path.join(retrieval_output_path, "metric.res"), "w") as f:
            f.write(json.dumps(res, indent=4))

    return res



if __name__ == "__main__":
    pass

--------------------------------------------------------------------------------
/chat_prompt_rewrite.py:
--------------------------------------------------------------------------------
from IPython import embed
import os
import json
import time
import argparse
from tqdm import tqdm, trange
from chat_promptor import RewritePromptor
from generator import ChatGenerator, OPENAI_KEYS
from utils import set_seed, get_finished_sample_ids, get_has_qrel_label_sample_ids


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_file_path", type=str, required=True)
    parser.add_argument("--demo_file_path", type=str, required=True)
    parser.add_argument("--qrel_file_path", type=str, required=True)
    parser.add_argument("--work_dir", type=str, required=True, help='output rewrite path.')
    parser.add_argument("--n_generation", type=int, required=True, help='the number for generation')
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--open_ai_key_id", type=int, choices=[0,1,2,3,4,5], required=True)


    args = parser.parse_args()
    os.makedirs(args.work_dir, exist_ok=True)
    with open(os.path.join(args.work_dir, "parameters.txt"), "w") as f:
        params = vars(args)
        f.write(json.dumps(params, indent=4))

    return args


def main():
    args = get_args()
    set_seed(args)

    # model and promptor setting
    promptor = RewritePromptor(args.demo_file_path)
    model_kwargs = {"temperature": 0.7, "max_tokens": 64, "stop": promptor.stop_tokens}
    api_key = OPENAI_KEYS[args.open_ai_key_id]
    generator = ChatGenerator(api_key, args.n_generation, **model_kwargs)


    # test_dataset
    output_file_path = os.path.join(args.work_dir, "rewrites.jsonl")
    finished_samples = get_finished_sample_ids(output_file_path)
    has_qrel_labels_samples = get_has_qrel_label_sample_ids(args.qrel_file_path)

    with open(args.test_file_path, "r") as f:
        test_dialogs = json.load(f)
    begin_time = time.time()

    # predict
    with open(output_file_path, "a+") as fw:
        for i in trange(len(test_dialogs)):
            dialog = test_dialogs[i]
            conv_id = dialog['conv_id']
            turns = dialog['turns']

            for i in trange(len(turns)):
                turn_id = turns[i]['turn_id']
                sample_id = "{}_{}".format(conv_id, turn_id)

                if sample_id in finished_samples or sample_id not in has_qrel_labels_samples:
                    continue

                if i == 0:
                    context = None
                else:
                    context = turns[:i]
                current_turn = turns[i]

                prompt = promptor.build_turn_prompt(context, current_turn)
                # print(prompt)
                # print(len(prompt.split(" ")))

                rewrite_list = generator.generate(prompt, promptor.parse_returned_text)
                # embed()
                # input()
                record = {}
                record['sample_id'] = sample_id
                record['predicted_rewrite'] = rewrite_list

                fw.write(json.dumps(record))
                fw.write('\n')
                fw.flush()


    print("{} Generation ok!, time cost {}".format(args.work_dir, time.time() - begin_time))


if __name__ == '__main__':
    main()

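The prompting scripts open `rewrites.jsonl` in append mode and skip every `sample_id` that is already in the file, so an interrupted run can be restarted without repeating finished API calls. A minimal, self-contained sketch of that resume pattern follows. It is an illustration only: `load_finished_ids`, `run`, the temporary directory, and the example ids are hypothetical, and a lambda stands in for the generator.

```python
import json
import os
import tempfile


def load_finished_ids(path):
    """Return the set of sample_ids already present in a JSONL output file."""
    if not os.path.exists(path):
        return set()
    with open(path, "r") as f:
        return {json.loads(line)["sample_id"] for line in f if line.strip()}


def run(path, sample_ids, predict):
    """Append one record per unfinished sample; return the ids processed in this call."""
    finished = load_finished_ids(path)
    processed = []
    with open(path, "a") as fw:
        for sample_id in sample_ids:
            if sample_id in finished:
                continue
            record = {"sample_id": sample_id, "predicted_rewrite": predict(sample_id)}
            fw.write(json.dumps(record) + "\n")
            fw.flush()  # keep finished work on disk if the run is interrupted
            processed.append(sample_id)
    return processed


work_dir = tempfile.mkdtemp()
out_path = os.path.join(work_dir, "rewrites.jsonl")
ids = ["81_1", "81_2", "82_1"]  # hypothetical "<conv_id>_<turn_id>" sample ids

# The first call simulates a run that stopped after two samples;
# the second call is the restart over the full list.
first = run(out_path, ids[:2], lambda s: ["rewrite for " + s])
second = run(out_path, ids, lambda s: ["rewrite for " + s])
print(first, second)
```

Flushing after every record is what makes the restart cheap: at most the sample in flight when the process stops has to be regenerated.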
--------------------------------------------------------------------------------
/chat_prompt_rewrite_and_response.py:
--------------------------------------------------------------------------------
from IPython import embed
import os
import json
import time
import argparse
from tqdm import tqdm, trange
from chat_promptor import RewriteAndResponsePromptor
from generator import ChatGenerator, OPENAI_KEYS
from utils import set_seed, get_finished_sample_ids, get_has_qrel_label_sample_ids


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_file_path", type=str, required=True)
    parser.add_argument("--qrel_file_path", type=str, required=True)
    parser.add_argument("--demo_file_path", type=str, required=True)
    parser.add_argument("--work_dir", type=str, required=True, help='output rewrite path.')
    parser.add_argument("--n_generation", type=int, required=True, help='the number for generation')
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--open_ai_key_id", type=int, choices=[0,1,2,3,4,5], required=True)


    args = parser.parse_args()
    os.makedirs(args.work_dir, exist_ok=True)
    with open(os.path.join(args.work_dir, "parameters.txt"), "w") as f:
        params = vars(args)
        f.write(json.dumps(params, indent=4))

    return args


def main():
    args = get_args()
    set_seed(args)

    # model and promptor setting
    promptor = RewriteAndResponsePromptor(args.demo_file_path)
    model_kwargs = {"temperature": 0.7, "max_tokens": 256, "stop": promptor.stop_tokens}
    api_key = OPENAI_KEYS[args.open_ai_key_id]
    generator = ChatGenerator(api_key, args.n_generation, **model_kwargs)


    # test_dataset
    output_file_path = os.path.join(args.work_dir, "rewrites.jsonl")
    finished_samples = get_finished_sample_ids(output_file_path)
    has_qrel_labels_samples = \
        get_has_qrel_label_sample_ids(args.qrel_file_path)
    with open(args.test_file_path, "r") as f:
        test_dialogs = json.load(f)
    begin_time = time.time()

    # predict
    with open(output_file_path, "a+") as fw:
        for i in trange(len(test_dialogs)):
            dialog = test_dialogs[i]
            conv_id = dialog['conv_id']
            turns = dialog['turns']

            for i in trange(len(turns)):
                turn_id = turns[i]['turn_id']
                sample_id = "{}_{}".format(conv_id, turn_id)

                if sample_id in finished_samples or sample_id not in has_qrel_labels_samples:
                    continue

                if i == 0:
                    context = None
                else:
                    context = turns[:i]
                current_turn = turns[i]

                prompt = promptor.build_turn_prompt(context, current_turn)
                n_outputs = generator.generate(prompt, promptor.parse_returned_text)
                rewrite_list, response_list = list(zip(*n_outputs))
                # embed()
                # input()
                record = {}
                record['sample_id'] = sample_id
                record['predicted_rewrite'] = rewrite_list
                record['predicted_response'] = response_list

                fw.write(json.dumps(record))
                fw.write('\n')
                fw.flush()


    print("{} Generation ok!, time cost {}".format(args.work_dir, time.time() - begin_time))


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/chat_prompt_cot_rewrite_and_response.py:
--------------------------------------------------------------------------------
from IPython import embed
import os
import json
import time
import argparse
from tqdm import tqdm, trange
from chat_promptor import RewriteAndResponsePromptor
from generator import ChatGenerator, OPENAI_KEYS
from utils import set_seed, get_finished_sample_ids, get_has_qrel_label_sample_ids


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_file_path", type=str, required=True)
    parser.add_argument("--qrel_file_path", type=str, required=True)
    parser.add_argument("--demo_file_path", type=str, required=True)
    parser.add_argument("--work_dir", type=str, required=True, help='output rewrite path.')
    parser.add_argument("--n_generation", type=int, required=True, help='the number for generation')
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--open_ai_key_id", type=int, choices=[0,1,2,3,4,5], required=True)


    args = parser.parse_args()
    os.makedirs(args.work_dir, exist_ok=True)
    with open(os.path.join(args.work_dir, "parameters.txt"), "w") as f:
        params = vars(args)
        f.write(json.dumps(params, indent=4))

    return args


def main():
    args = get_args()
    set_seed(args)

    # model and promptor setting
    promptor = RewriteAndResponsePromptor(args.demo_file_path, enable_cot=True)
    model_kwargs = {"temperature": 0.7, "max_tokens": 256, "stop": promptor.stop_tokens}
    api_key = OPENAI_KEYS[args.open_ai_key_id]
    generator = ChatGenerator(api_key, args.n_generation, **model_kwargs)


    # test_dataset
    output_file_path = os.path.join(args.work_dir, "rewrites.jsonl")
    finished_samples = get_finished_sample_ids(output_file_path)
    has_qrel_labels_samples = get_has_qrel_label_sample_ids(args.qrel_file_path)
    with open(args.test_file_path, "r") as f:
        test_dialogs = json.load(f)
    begin_time = time.time()

    # predict
    with open(output_file_path, "a+") as fw:
        for i in trange(len(test_dialogs)):
            dialog = test_dialogs[i]
            conv_id = dialog['conv_id']
            turns = dialog['turns']

            for i in trange(len(turns)):
                turn_id = turns[i]['turn_id']
                sample_id = "{}_{}".format(conv_id, turn_id)

                if sample_id in finished_samples or sample_id not in has_qrel_labels_samples:
                    continue

                if i == 0:
                    context = None
                else:
                    context = turns[:i]
                current_turn = turns[i]

                prompt = promptor.build_turn_prompt(context, current_turn)
                n_outputs = generator.generate(prompt, promptor.parse_returned_text)
                cot_list, rewrite_list, response_list = list(zip(*n_outputs))

                record = {}
                record['sample_id'] = sample_id
                record['predicted_rewrite'] = rewrite_list
                record['predicted_response'] = response_list
                record['predicted_cot'] = cot_list

                fw.write(json.dumps(record))
                fw.write('\n')
                fw.flush()


    print("{} Generation ok!, time cost {}".format(args.work_dir, time.time() - begin_time))


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/chat_prompt_cot_rewrite.py:
--------------------------------------------------------------------------------
from IPython import embed
import os
import json
import time
import argparse
from tqdm import tqdm, trange
from chat_promptor import RewritePromptor
from generator import ChatGenerator, OPENAI_KEYS
from utils import set_seed, get_finished_sample_ids, get_has_qrel_label_sample_ids


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_file_path", type=str, required=True)
    parser.add_argument("--demo_file_path", type=str, required=True)
    parser.add_argument("--qrel_file_path", type=str, required=True)
    parser.add_argument("--work_dir", type=str, required=True, help='output rewrite path.')
    parser.add_argument("--n_generation", type=int, required=True, help='the number for generation')
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--open_ai_key_id", type=int, choices=[0,1,2,3,4,5], required=True)


    args = parser.parse_args()
    os.makedirs(args.work_dir, exist_ok=True)
    with open(os.path.join(args.work_dir, "parameters.txt"), "w") as f:
        params = vars(args)
        f.write(json.dumps(params, indent=4))

    return args


def main():
    args = get_args()
    set_seed(args)

    # model and promptor setting
    promptor = RewritePromptor(args.demo_file_path, enable_cot=True)
    model_kwargs = {"temperature": 0.7, "max_tokens": 64, "stop": promptor.stop_tokens}
    api_key = OPENAI_KEYS[args.open_ai_key_id]
    generator = ChatGenerator(api_key, args.n_generation, **model_kwargs)


    # test_dataset
    output_file_path = os.path.join(args.work_dir, "rewrites.jsonl")
    finished_samples = get_finished_sample_ids(output_file_path)
    has_qrel_labels_samples = get_has_qrel_label_sample_ids(args.qrel_file_path)

    with open(args.test_file_path, "r") as f:
        test_dialogs = json.load(f)
    begin_time = time.time()

    # predict
    with open(output_file_path, "a+") as fw:
        for i in trange(len(test_dialogs)):
            dialog = test_dialogs[i]
            conv_id = dialog['conv_id']
            turns = dialog['turns']

            for i in trange(len(turns)):
                turn_id = turns[i]['turn_id']
                sample_id = "{}_{}".format(conv_id, turn_id)

                if sample_id in finished_samples or sample_id not in has_qrel_labels_samples:
                    continue

                if i == 0:
                    context = None
                else:
                    context = turns[:i]
                current_turn = turns[i]

                prompt = promptor.build_turn_prompt(context, current_turn)
                # print(prompt)
                # print(len(prompt.split(" ")))

                n_outputs = generator.generate(prompt, promptor.parse_returned_text)
                # embed()
                # input()
                cot_list, rewrite_list = list(zip(*n_outputs))

                record = {}
                record['sample_id'] = sample_id
                record['predicted_rewrite'] = rewrite_list
                record['predicted_cot'] = cot_list

                fw.write(json.dumps(record))
                fw.write('\n')
                fw.flush()


    print("{} Generation ok!, time cost {}".format(args.work_dir, time.time() - begin_time))


if __name__ == \
        '__main__':
    main()

--------------------------------------------------------------------------------
/chat_prompt_rewrite_then_response.py:
--------------------------------------------------------------------------------
from IPython import embed
import os
import json
import time
import argparse
from tqdm import tqdm, trange
from chat_promptor import OneShotRewriteThenResponsePrompter
from generator import ChatGenerator, OPENAI_KEYS
from utils import set_seed, get_finished_sample_ids, get_has_qrel_label_sample_ids


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_file_path", type=str, required=True)
    parser.add_argument("--rewrite_file_path", type=str, required=True)
    parser.add_argument("--demo_file_path", type=str, required=True)
    parser.add_argument("--qrel_file_path", type=str, required=True)
    parser.add_argument("--work_dir", type=str, required=True, help='output rewrite path.')
    parser.add_argument("--n_generation", type=int, required=True, help='the number for generation')
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--open_ai_key_id", type=int, choices=[0,1,2,3,4,5], required=True)



    args = parser.parse_args()
    os.makedirs(args.work_dir, exist_ok=True)
    with open(os.path.join(args.work_dir, "parameters.txt"), "w") as f:
        params = vars(args)
        f.write(json.dumps(params, indent=4))

    return args


def main():
    args = get_args()
    set_seed(args)

    # model and promptor setting
    promptor = OneShotRewriteThenResponsePrompter()
    model_kwargs = {"temperature": 0.7, "max_tokens": 256, "stop": promptor.stop_tokens}
    api_key = OPENAI_KEYS[args.open_ai_key_id]
    generator = ChatGenerator(api_key, args.n_generation, **model_kwargs)


    # test_dataset
    output_file_path = os.path.join(args.work_dir, "rewrites.jsonl")
    finished_samples = get_finished_sample_ids(output_file_path)
    has_qrel_labels_samples = get_has_qrel_label_sample_ids(args.qrel_file_path)
    with open(args.test_file_path, "r") as f:
        test_dialogs = json.load(f)
    begin_time = time.time()

    # load rewrites
    d_rewrite = {}
    with open(args.rewrite_file_path, "r") as f:
        for line in f:
            line = json.loads(line)
            sample_id = line['sample_id']
            rewrite = line['predicted_rewrite'][0]
            d_rewrite[sample_id] = rewrite


    # predict
    with open(output_file_path, "a+") as fw:
        for i in trange(len(test_dialogs)):
            dialog = test_dialogs[i]
            conv_id = dialog['conv_id']
            turns = dialog['turns']

            for i in trange(len(turns)):
                turn_id = turns[i]['turn_id']
                sample_id = "{}_{}".format(conv_id, turn_id)

                if sample_id in finished_samples or sample_id not in has_qrel_labels_samples:
                    continue

                if i == 0:
                    context = None
                else:
                    context = turns[:i]
                current_turn = turns[i]
                current_turn['question'] = d_rewrite[sample_id] # modify the current question to rewrite

                prompt = promptor.build_turn_prompt(context, current_turn)
                response_list = generator.generate(prompt, promptor.parse_returned_text)

                record = {}
                record['sample_id'] = sample_id
                record['predicted_rewrite'] = [current_turn['question']]
                record['predicted_response'] = response_list

                fw.write(json.dumps(record))
                fw.write('\n')
                fw.flush()


    print("{} Generation ok!, time cost {}".format(args.work_dir, time.time() - begin_time))


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
aiohttp==3.8.3
aiosignal==1.3.1
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-generator==1.10
async-timeout==4.0.2
asyncio==3.4.3
attrs==22.2.0
backcall==0.2.0
backoff==2.2.1
beautifulsoup4==4.11.2
bleach==6.0.0
blis==0.7.9
blobfile==2.0.1
catalogue==2.0.8
cbor==1.0.0
certifi
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
coloredlogs==15.0.1
comm==0.1.2
confection==0.0.4
cymem==2.0.7
Cython==0.29.33
datasets==2.9.0
debugpy==1.6.6
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
exceptiongroup==1.1.0
executing==1.2.0
faiss-gpu
fastjsonschema==2.16.2
filelock==3.8.2
flatbuffers==23.1.21
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.1.0
h11==0.14.0
h5py==3.8.0
httpcore==0.16.3
httpx==0.23.3
huggingface-hub==0.11.1
humanfriendly==10.0
idna==3.4
importlib-metadata==6.0.0
importlib-resources==5.10.2
ipykernel==6.20.2
ipython==8.8.0
ipython-genutils==0.2.0
ipywidgets==8.0.4
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
joblib==1.2.0
jsonpointer==2.3
jsonschema==4.17.3
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-events==0.6.3
jupyter_client==8.0.2
jupyter_core==5.2.0
jupyter_server==2.2.0
jupyter_server_terminals==0.4.4
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.5
langcodes==3.3.0
lightgbm==3.3.5
llvmlite==0.39.1
lxml==4.9.2
MarkupSafe==2.1.2
matplotlib-inline==0.1.6
mistune==2.0.4
mkl-fft==1.3.1
mkl-random
mkl-service==2.4.0
mpmath==1.2.1
multidict==6.0.4
multiprocess==0.70.14
murmurhash==1.0.9
nbclassic==0.5.1
nbclient==0.7.2
nbconvert==7.2.9
nbformat==5.7.3
nest-asyncio==1.5.6
nmslib==2.1.1
notebook==6.5.2
notebook_shim==0.2.2
numba==0.56.4
numpy==1.23.5
onnxruntime==1.13.1
openai==0.26.4
OpenAIAuth==0.2.0
outcome==1.2.0
packaging==22.0
pandas==1.5.3
pandocfilters==1.5.0
parso==0.8.3
pathy==0.10.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.3.0
pkgutil_resolve_name==1.3.10
platformdirs==2.6.2
preshed==3.0.8
prometheus-client==0.16.0
prompt-toolkit==3.0.36
protobuf==4.21.12
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==11.0.0
pybind11==2.6.1
pycparser==2.21
pycryptodomex==3.17
pydantic==1.10.4
Pygments==2.14.0
pyjnius==1.4.2
pyrsistent==0.19.3
pyserini==0.17.0
PySocks==1.7.1
python-dateutil==2.8.2
python-json-logger==2.0.4
pytrec-eval==0.5
pytz==2022.7.1
PyYAML==6.0
pyzmq==25.0.0
qtconsole==5.4.0
QtPy==2.3.0
regex==2022.10.31
requests==2.28.1
responses==0.18.0
revChatGPT==2.2.7
rfc3339-validator==0.1.4
rfc3986==1.5.0
rfc3986-validator==0.1.1
scikit-learn==1.2.1
scipy==1.10.0
selenium==4.8.0
Send2Trash==1.8.0
sentencepiece==0.1.97
six
smart-open==6.3.0
sniffio==1.3.0
socksio==1.0.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
spacy==3.5.0
spacy-legacy==3.0.12
spacy-loggers==1.0.4
srsly==2.4.5
stack-data==0.6.2
sympy==1.11.1
terminado==0.17.1
thinc==8.1.7
threadpoolctl==3.1.0
tiktoken==0.2.0
tinycss2==1.2.1
tls-client==0.1.8
tokenizers==0.13.2
torch==1.12.0+cu113
torchaudio==0.12.0+cu113
torchvision==0.13.0+cu113
tornado==6.2
tqdm==4.64.1
traitlets==5.8.0
transformers==4.25.1
trec-car-tools==2.6
trio==0.22.0
trio-websocket==0.9.2
typer==0.7.0
typing_extensions==4.4.0
ujson==5.7.0
undetected-chromedriver==3.4.6
uri-template==1.2.0 180 | urllib3==1.26.13 181 | wasabi==1.1.1 182 | wcwidth==0.2.5 183 | webcolors==1.12 184 | webencodings==0.5.1 185 | websocket-client==1.5.0 186 | websockets==10.4 187 | widgetsnbextension==4.0.5 188 | wsproto==1.2.0 189 | xxhash==3.2.0 190 | yarl==1.8.2 191 | zipp==3.12.0 192 | -------------------------------------------------------------------------------- /evaluation/dense_retrievers.py: -------------------------------------------------------------------------------- 1 | from IPython import embed 2 | import sys 3 | sys.path += ['../'] 4 | import numpy as np 5 | 6 | import torch 7 | from torch import nn 8 | from transformers import (RobertaConfig, RobertaForSequenceClassification, RobertaTokenizer, 9 | AutoTokenizer, BertModel, 10 | DPRQuestionEncoderTokenizer, DPRContextEncoderTokenizer, 11 | DPRContextEncoder, DPRQuestionEncoder) 12 | 13 | 14 | 15 | # ANCE model 16 | class ANCE(RobertaForSequenceClassification): 17 | # class Pooler: # adapt to DPR 18 | # def __init__(self, pooler_output): 19 | # self.pooler_output = pooler_output 20 | 21 | def __init__(self, config): 22 | RobertaForSequenceClassification.__init__(self, config) 23 | self.embeddingHead = nn.Linear(config.hidden_size, 768) # ANCE has 24 | self.norm = nn.LayerNorm(768) 25 | self.apply(self._init_weights) 26 | self.use_mean = False 27 | 28 | def _init_weights(self, module): 29 | """ Initialize the weights """ 30 | if isinstance(module, (nn.Linear, nn.Embedding, nn.Conv1d)): 31 | # Slightly different from the TF version which uses truncated_normal for initialization 32 | # cf https://github.com/pytorch/pytorch/pull/5617 33 | module.weight.data.normal_(mean=0.0, std=0.02) 34 | 35 | def query_emb(self, input_ids, attention_mask): 36 | outputs1 = self.roberta(input_ids=input_ids, 37 | attention_mask=attention_mask) 38 | outputs1 = outputs1.last_hidden_state 39 | full_emb = self.masked_mean_or_first(outputs1, attention_mask) 40 | query1 = self.norm(self.embeddingHead(full_emb)) 41 | 
        return query1
42 | 
43 | 
44 |     def doc_emb(self, input_ids, attention_mask):
45 |         return self.query_emb(input_ids, attention_mask)
46 | 
47 | 
48 |     def masked_mean_or_first(self, emb_all, mask):
49 |         if self.use_mean:
50 |             return self.masked_mean(emb_all, mask)
51 |         else:
52 |             return emb_all[:, 0]
53 | 
54 |     def masked_mean(self, t, mask):
55 |         s = torch.sum(t * mask.unsqueeze(-1).float(), axis=1)
56 |         d = mask.sum(axis=1, keepdim=True).float()
57 |         return s / d
58 | 
59 |     def forward(self, input_ids, attention_mask, wrap_pooler=False):
60 |         return self.query_emb(input_ids, attention_mask)
61 | 
62 | 
63 | 
64 | # TCTColBERT model
65 | class TCTColBERT(nn.Module):
66 |     def __init__(self, model_path) -> None:
67 |         super(TCTColBERT, self).__init__()
68 |         self.model = BertModel.from_pretrained(model_path)
69 | 
70 |     def forward(self, input_ids, attention_mask, **kwargs):
71 |         outputs = self.model(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
72 | 
73 |         if "cur_utt_end_positions" in kwargs:  # the key checked here must match the key read below
74 |             device = outputs.device
75 |             cur_utt_end_positions = kwargs["cur_utt_end_positions"]
76 |             output_mask = torch.zeros(attention_mask.size()).to(device)
77 |             mask_row = []
78 |             mask_col = []
79 |             for i in range(len(cur_utt_end_positions)):
80 |                 mask_row += [i] * (cur_utt_end_positions[i] - 3)
81 |                 mask_col += list(range(4, cur_utt_end_positions[i] + 1))
82 | 
83 |             mask_index = (
84 |                 torch.tensor(mask_row).long().to(device),
85 |                 torch.tensor(mask_col).long().to(device)
86 |             )
87 |             values = torch.ones(len(mask_row)).to(device)
88 |             output_mask = output_mask.index_put(mask_index, values)
89 |         else:
90 |             output_mask = attention_mask
91 |             output_mask[:, :4] = 0 # filter the first 4 tokens: [CLS] "[" "Q/D" "]"
92 | 
93 |         # sum / length
94 |         sum_outputs = torch.sum(outputs * output_mask.unsqueeze(-1), dim = -2)
95 |         real_seq_length = torch.sum(output_mask, dim = 1).view(-1, 1)
96 | 
97 |         return sum_outputs / real_seq_length
98 | 
99 | 
100 | 
101 | 
102 | '''
103 | Model-related 
functions
104 | '''
105 | 
106 | def load_dense_retriever(model_type, query_or_doc, model_path):
107 |     assert query_or_doc in ("query", "doc")
108 |     if model_type.lower() == "ance":
109 |         config = RobertaConfig.from_pretrained(
110 |             model_path,
111 |             finetuning_task="MSMarco",
112 |         )
113 |         tokenizer = RobertaTokenizer.from_pretrained(
114 |             model_path,
115 |             do_lower_case=True
116 |         )
117 |         model = ANCE.from_pretrained(model_path, config=config)
118 |     elif model_type.lower() == "dpr-nq":
119 |         if query_or_doc == "query":
120 |             tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(model_path)
121 |             model = DPRQuestionEncoder.from_pretrained(model_path)
122 |         else:
123 |             tokenizer = DPRContextEncoderTokenizer.from_pretrained(model_path)
124 |             model = DPRContextEncoder.from_pretrained(model_path)
125 |     elif model_type.lower() == "tctcolbert":
126 |         tokenizer = AutoTokenizer.from_pretrained(model_path)
127 |         model = TCTColBERT(model_path)
128 |     else:
129 |         raise ValueError("Unsupported model_type: {}".format(model_type))
130 | 
131 |     # tokenizer.add_tokens(["", "", "", ""])
132 |     # model.resize_token_embeddings(len(tokenizer))
133 | 
134 |     return tokenizer, model
135 | 
136 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LLM4CS
2 | 
3 | This is the repository for our EMNLP 2023 Findings paper:
4 | 
5 | **Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search**
6 | 
7 | 
8 | ## 🌏 Environment
9 | We conducted our experiments with Python 3.8.13.
10 | 
11 | Main packages:
12 | - Generating rewrites:
13 |   - openai==0.26.4
14 |   - numpy==1.23.5
15 | 
16 | - For evaluation:
17 |   - torch==1.12.0+cu113
18 |   - trec-car-tools==2.6
19 |   - faiss-gpu
20 |   - pyserini==0.17.0
21 | 
22 | 
23 | ```shell
24 | conda create -n llmcs python=3.8
25 | source activate llmcs
26 | pip install -r requirements.txt
27 | ```
28 | 
29 | 
30 | 
31 | ## 📑 Data
32 | 
33 | 1. 
We provide the preprocessed CAsT-19, CAsT-20, and CAsT-21 datasets in the `datasets` folder.
34 | 
35 | 2. `demonstrations.json` contains four exemplars randomly sampled from the CAsT-22 dataset. We manually wrote the CoT for all of their turns.
36 | 
37 | 
38 | ## 🏃 Running
39 | LLM4CS contains three prompting methods: *Rewriting Prompt (REW)*, *Rewriting-Then-Response Prompt (RTR)*, and *Rewriting-And-Response Prompt (RAR)*. We also design a chain-of-thought (CoT) tailored to conversational search intent understanding that can be incorporated into these prompting methods.
40 | 
41 | To get started, first set your OpenAI API key in `generator.py`:
42 | ```python
43 | # TODO: Write your OpenAI API key here.
44 | OPENAI_KEYS = [
45 |     'Your key',
46 | ]
47 | ```
48 | 
49 | ### REW Prompting
50 | To perform REW prompting, run:
51 | ```shell
52 | bash scripts/run_chat_prompt_rewrite.sh
53 | ```
54 | You can also enable CoT by running:
55 | ```shell
56 | bash scripts/run_chat_prompt_cot_rewrite.sh
57 | ```
58 | 
59 | ### RTR Prompting
60 | Similarly, to perform RTR prompting, run:
61 | ```shell
62 | bash scripts/run_chat_prompt_rewrite_then_response.sh
63 | ```
64 | Note that you need to pass a pre-generated rewrite file through the `rewrite_file_path` argument when running `chat_prompt_rewrite_then_response.py`. To enable CoT for RTR, set `rewrite_file_path` to the rewrite file generated using CoT.
65 | ```sh
66 | --rewrite_file_path="./results/cast20/REW/rewrites.jsonl" \
67 | # --rewrite_file_path="./results/cast20/COT-REW/rewrites.jsonl" \ +COT
68 | ```
69 | 
70 | 
71 | ### RAR Prompting
72 | Similarly, to perform RAR prompting, run:
73 | ```shell
74 | bash scripts/run_chat_prompt_rewrite_and_response.sh
75 | ```
76 | You can also enable CoT by running:
77 | ```shell
78 | bash scripts/run_chat_prompt_cot_rewrite_and_response.sh
79 | ```
80 | 
81 | 
82 | ## 🥚 Results
83 | A `rewrites.jsonl` file, which contains the rewrites and/or hypothetical responses, will be generated in the `work_dir` that you set in the running script.
84 | 
85 | 
86 | We have provided our generated `rewrites.jsonl` files in the `results` folder.
87 | 
88 | The keys of `rewrites.jsonl`:
89 | - `predicted_rewrite`: rewrite (list)
90 | - `predicted_response`: hypothetical response (list)
91 | - other auxiliary information
92 | 
93 | 
94 | **Note:**
95 | 
96 | We provide some results of REW on CAsT-19 in `results/cast19/REW` and `results/cast19/COT+REW` for the case study mentioned in Appendix B.1.
97 | We also provide part of the human evaluation results on CAsT-21 in `results/cast21_human_evaluation.part.json` for the case study mentioned in Appendix B.2.
98 | 
99 | All results will be provided upon acceptance.
100 | 
101 | 
102 | ## ⚖️ Evaluation
103 | We design three aggregation methods, *MaxProb*, *Mean*, and *SC*, to obtain the final search intent vector. We then perform dense retrieval with [ANCE (click to download)](https://webdatamltrainingdiag842.blob.core.windows.net/semistructstore/OpenSource/Passage_ANCE_FirstP_Checkpoint.zip) for evaluation.
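To make the three aggregation methods concrete, here is a minimal sketch for a single turn. It mirrors what `evaluation/eval_dense_retrieval.py` does with NumPy arrays, but uses plain Python lists so that it runs without extra dependencies. The names `dot` and `aggregate` are illustrative and do not exist in the repository. As in the evaluation script, the first candidate is assumed to be the one used by MaxProb, and SC picks the candidate with the largest summed dot product against all candidates.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate(candidates: List[Vector], method: str = "sc") -> Vector:
    """Collapse the candidate embeddings of one turn into a single vector."""
    if not candidates:
        raise ValueError("need at least one candidate embedding")
    if method == "maxprob":
        # Keep only the first candidate (assumed to be the top generation).
        return list(candidates[0])
    if method == "mean":
        # Average the candidates dimension by dimension.
        n = len(candidates)
        return [sum(dim) / n for dim in zip(*candidates)]
    if method == "sc":
        # Self-consistency: pick the candidate whose summed dot product
        # with all candidates (itself included) is the largest.
        totals = [sum(dot(c, other) for other in candidates) for c in candidates]
        return list(candidates[totals.index(max(totals))])
    raise ValueError("unknown aggregation method: {}".format(method))


# Three toy 2-d candidate embeddings for one turn.
candidates = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
print(aggregate(candidates, "maxprob"))  # [0.0, 1.0]
print(aggregate(candidates, "mean"))
print(aggregate(candidates, "sc"))       # [1.0, 0.0]
```

In this toy case MaxProb keeps the outlier that happens to come first, while SC selects the candidate that agrees most with the others.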
104 | 
105 | 
106 | **To perform evaluation, you should first build the dense index that contains all candidate passage embeddings.** Several public repositories explain how to build this index, such as [AutoRewriter](https://github.com/thunlp/ConversationQueryRewriter), [ConvDR](https://github.com/thunlp/ConvDR), and [ConvTrans](https://github.com/kyriemao/ConvTrans); you can refer to them to build the index. The passage embeddings generated by ANCE take up around 103GB in total.
107 | 
108 | 
109 | Then, run the following script for evaluation:
110 | ```sh
111 | cd evaluation
112 | bash run_eval_dense_retrieval.sh
113 | ```
114 | The important arguments of `run_eval_dense_retrieval.sh` are annotated below:
115 | ```sh
116 | # run_eval_dense_retrieval.sh
117 | # An example of evaluating RTR on cast20.
118 | eval_field_name="predicted_rewrite"
119 | work_dir="../results/cast20/RTR" # set the folder containing your `rewrites.jsonl` file
120 | 
121 | eval_file_path="$work_dir/rewrites.jsonl"
122 | index_path="" # set the pre-built index which contains all candidate passage embeddings.
123 | qrel_file_path="../datasets/cast20_qrel.tsv" # set the qrel file path
124 | retrieval_output_path="$work_dir/ance/+q+r+sc" # set your expected output folder path
125 | 
126 | export CUDA_VISIBLE_DEVICES=0
127 | python eval_dense_retrieval.py \
128 | --eval_file_path=$eval_file_path \
129 | --eval_field_name=$eval_field_name \
130 | --qrel_file_path=$qrel_file_path \
131 | --index_path=$index_path \
132 | --retriever_path="" \ # set the pre-trained ANCE model path.
133 | --use_gpu_in_faiss \
134 | --n_gpu_for_faiss=1 \
135 | --top_n=1000 \
136 | --rel_threshold=2 \ # set 1 for cast19 and 2 for cast20.
137 | --retrieval_output_path=$retrieval_output_path \
138 | --include_query \
139 | --include_response \ # enable `include_response` if you test RTR or RAR prompting.
140 | --aggregation_method="sc" \ # one of `maxprob`, `mean`, `sc`
141 | ```
142 | 
143 | The evaluation results, consisting of the following three files, will be written to the `retrieval_output_path` that you set.
144 | - `metrics.res`: all evaluation metrics.
145 | - `parameters.txt`: a record of the evaluation parameters.
146 | - `res.trec`: detailed TREC-style search results.
147 | 
148 | 
149 | 
150 | 
151 | 
--------------------------------------------------------------------------------
/chat_promptor.py:
--------------------------------------------------------------------------------
1 | from IPython import embed
2 | import re
3 | import json
4 | 
5 | 
6 | def check_length(prompt, max_length):
7 |     n = len(prompt.split(' '))
8 |     if n >= max_length:
9 |         return False
10 |     return True
11 | 
12 | 
13 | class RewritePromptor:
14 |     def __init__(self, demo_file=None, enable_cot=False) -> None:
15 | 
16 |         self.instruction = "For an information-seeking dialog, please help reformulate the question into rewrite that can fully express the user's information needs without the need of context."
17 |         self.enable_cot = enable_cot
18 |         self.demo = self.get_demo(demo_file)
19 |         if self.demo != "":
20 |             self.instruction += " I will give you several example multi-turn dialogs, where each turn contains a question, a response, and a rewrite that you need to generate."
21 |             if enable_cot:
22 |                 self.instruction += " The rewrite part begins with a sentence explaining the reason for the generated rewrite."
23 |         if enable_cot:
24 |             self.tail_instruction = "Now, you should give me the rewrite of the **Current Question** under the **Context**. The output format should always be: \"Rewrite: $Reason. So the question should be rewritten as: $Rewrite.\" Note that you should always try to rewrite it. Never ask for clarification or say you don't understand it in the generated rewrite. Go ahead!"
25 | else: 26 | self.tail_instruction = "Now, you should give me the rewrite of the **Current Question** under the **Context**. The output format should always be: Rewrite: $Rewrite. Note that you should always try to rewrite it. Never ask for clarification or say you don't understand it in the generated rewrite. Go ahead!" 27 | self.stop_tokens = ['\n'] 28 | 29 | def get_demo(self, demo_file): 30 | try: 31 | with open(demo_file, "r") as f: 32 | demos = json.load(f) 33 | except: 34 | print("warning: No demonstration file.") 35 | return "" 36 | 37 | examples = [] 38 | for demo in demos: 39 | turns = demo['turns'] 40 | 41 | dialog = [] 42 | for turn in turns: 43 | if self.enable_cot: 44 | rewrite = turn['cot'] + " So the question should be rewritten as: {}".format(turn['manual_rewrite']) 45 | else: 46 | rewrite = turn['manual_rewrite'] 47 | turn_text = "Question: {}\nRewrite: {}\nResponse: {}".format(turn['question'], rewrite, turn['response']) 48 | dialog.append(turn_text) 49 | dialog = "\n\n".join(dialog) 50 | 51 | examples.append(dialog) 52 | 53 | for i in range(len(examples)): 54 | examples[i] = "Example #{}:\n".format(i+1) + examples[i] 55 | 56 | return "\n\n".join(examples) 57 | 58 | def build_turn_prompt(self, context, current_turn): 59 | # context 60 | this_dialog = [] 61 | if not context: 62 | this_dialog.append("N/A") 63 | else: 64 | for turn in context: 65 | this_dialog.append("Question: {}\nResponse: {}".format(turn['question'], turn['response'])) 66 | 67 | this_dialog[0] = "Context:\n" + this_dialog[0] 68 | 69 | # current turn 70 | this_dialog.append("Current Question: " + current_turn['question']) 71 | this_dialog = "\n\n".join(this_dialog) 72 | this_dialog = "YOUR TASK (only questions and responses may be given):\n" + this_dialog 73 | this_prompt = [self.instruction, self.demo, this_dialog, self.tail_instruction] 74 | this_prompt = "\n\n".join(this_prompt) 75 | 76 | return this_prompt 77 | 78 | def parse_returned_text(self, text): 79 | text = 
text.strip() 80 | if text[:9] != "Rewrite: ": 81 | return None 82 | if not self.enable_cot: 83 | return text[9:] 84 | else: 85 | fixed_sentence = "So the question should be rewritten as: " 86 | index = text.find(fixed_sentence) 87 | if index != -1: 88 | cot = text[:index] 89 | rewrite = text[index + len(fixed_sentence):] 90 | return [cot.strip(), rewrite.strip()] 91 | else: 92 | return None 93 | 94 | class RewriteAndResponsePromptor: 95 | def __init__(self, demo_file, enable_cot=False) -> None: 96 | 97 | self.instruction = "For an information-seeking dialog, please help reformulate the question into rewrite that can fully express the user's information needs without the need of context, but also generate an informative response to answer the question." 98 | self.enable_cot = enable_cot 99 | self.demo = self.get_demo(demo_file) 100 | if self.demo != "": 101 | self.instruction += " I will give you several example multi-turn dialogs, where each turn contains a question as well as a rewrite and a response that you need to generate." 102 | if enable_cot: 103 | self.instruction += " The rewrite part begins with a sentence explaining the reason for the generated rewrite." 104 | if enable_cot: 105 | self.tail_instruction = "Now, you should give me the rewrite and response of the **Current Question** under the **Context**. The output format should always be: \"Rewrite: $Reason. So the question should be rewritten as: $Rewrite\nResponse: $Response.\" Note that you should always try to rewrite it and generate an informative response. Never ask for clarification or say you don't understand it in the generated rewrite and response. Go ahead!" 106 | else: 107 | self.tail_instruction = "Now, you should give me the rewrite and response of the **Current Question** under the **Context**. The output format should always be:\nRewrite: $Rewrite\nResponse: $Response.\nNote that you should always try to rewrite it and generate an informative response. 
Never ask for clarification or say you don't understand it in the generated rewrite and response. Go ahead!" 108 | self.stop_tokens = None 109 | 110 | def get_demo(self, demo_file): 111 | try: 112 | with open(demo_file, "r") as f: 113 | demos = json.load(f) 114 | except: 115 | print("warning: No demonstration file.") 116 | return "" 117 | 118 | examples = [] 119 | for demo in demos: 120 | turns = demo['turns'] 121 | 122 | dialog = [] 123 | for turn in turns: 124 | if self.enable_cot: 125 | rewrite = turn['cot'] + " So the question should be rewritten as: {}".format(turn['manual_rewrite']) 126 | else: 127 | rewrite = turn['manual_rewrite'] 128 | turn_text = "Question: {}\nRewrite: {}\nResponse: {}".format(turn['question'], rewrite, turn['response']) 129 | dialog.append(turn_text) 130 | dialog = "\n\n".join(dialog) 131 | 132 | examples.append(dialog) 133 | 134 | for i in range(len(examples)): 135 | examples[i] = "Example #{}:\n".format(i+1) + examples[i] 136 | 137 | return "\n\n".join(examples) 138 | 139 | def build_turn_prompt(self, context, current_turn): 140 | # context 141 | this_dialog = [] 142 | if not context: 143 | this_dialog.append("N/A") 144 | else: 145 | for turn in context: 146 | this_dialog.append("Question: {}\nResponse: {}".format(turn['question'], turn['response'])) 147 | 148 | this_dialog[0] = "Context:\n" + this_dialog[0] 149 | 150 | # current turn 151 | this_dialog.append("Current Question: " + current_turn['question']) 152 | this_dialog = "\n\n".join(this_dialog) 153 | this_dialog = "YOUR TASK (only questions and responses may be given):\n" + this_dialog 154 | this_prompt = [self.instruction, self.demo, this_dialog, self.tail_instruction] 155 | this_prompt = "\n\n".join(this_prompt) 156 | 157 | return this_prompt 158 | 159 | def parse_returned_text(self, text): 160 | text = text.strip() 161 | try: 162 | splits = text.split('\n') 163 | if splits[0][:9] != "Rewrite: " or splits[1][:10] != "Response: ": 164 | return None 165 | if self.enable_cot: 
166 | rewrite_text = splits[0][9:] 167 | fixed_sentence = "So the question should be rewritten as: " 168 | index = rewrite_text.find(fixed_sentence) 169 | if index != -1: 170 | cot = rewrite_text[:index] 171 | rewrite = rewrite_text[index + len(fixed_sentence):] 172 | else: 173 | return None 174 | 175 | response = "\n".join(splits[1:])[10:] 176 | return [cot, rewrite, response] 177 | else: 178 | rewrite = splits[0][9:] 179 | response = "\n".join(splits[1:])[10:] 180 | return [rewrite, response] 181 | except: 182 | return None 183 | 184 | 185 | 186 | class RewriteThenResponsePrompter: 187 | def __init__(self, demo_file, enable_cot=False) -> None: 188 | 189 | self.instruction = "For an information-seeking dialog, please help generate an informative response to answer the question." 190 | self.enable_cot = enable_cot 191 | self.demo = self.get_demo(demo_file) 192 | if self.demo != "": 193 | self.instruction += " I will give you several example multi-turn dialogs, where each turn contains a question as well as a rewrite and a response that you need to generate." 194 | if enable_cot: 195 | self.instruction += " The rewrite part begins with a sentence explaining the reason for the generated rewrite." 196 | 197 | self.tail_instruction = "Now, you should give me the response of the **Current Question** under the **Context**. The output format should always be: \"Response: $Response.\" Note that you should always try to generate an informative response. Never ask for clarification or say you don't understand it in the response. Go ahead!" 
198 |         self.stop_tokens = None
199 | 
200 |     def build_turn_prompt(self, context, current_turn):
201 |         # context
202 |         all_prompts = [self.instruction]
203 |         if not context:
204 |             all_prompts.append("N/A")
205 |         else:
206 |             for turn in context:
207 |                 all_prompts.append("Q: {}\nA: {}".format(turn['question'], turn['response']))
208 | 
209 |         all_prompts[1] = "Context:\n" + all_prompts[1]
210 | 
211 |         # current turn
212 |         all_prompts.append("Current Question: " + current_turn['question'])
213 | 
214 |         this_prompt = "\n\n".join(all_prompts)
215 | 
216 |         t = 1  # index of the context turn whose response is dropped next; must be set once, outside the loop
217 |         while not check_length(this_prompt, 3700) and t <= len(context):
218 |             all_prompts[t] = "Q: {}\nA: {}".format(context[t-1]['question'], "N/A")
219 |             this_prompt = "\n\n".join(all_prompts)
220 |             t += 1
221 | 
222 |         this_prompt += "\n\n**Note that you should always try to generate a plausible answer. Never ask for clarification or say you don't understand it in the generated response.** Now, go ahead!"
223 | 
224 |         return this_prompt
225 | 
226 |     def parse_returned_text(self, text):
227 |         text = text.strip()
228 |         if text[:10] != "Response: ":
229 |             return None
230 |         return text[10:]
231 | 
232 | 
233 | 
234 | 
235 | class OneShotRewriteThenResponsePrompter:
236 |     def __init__(self) -> None:
237 | 
238 |         self.instruction = "For an information-seeking dialog. Please help generate an appropriate response to answer the question.\n\nI will give you an example multi-turn dialog, where each turn contains a question and the corresponding response.\n\n# An Example Multi-turn Dialog:\n\nQuestion: What should I consider when buying a phone?\nResponse: The design of the phone and the overall look and feel of the phone are very important. You should be comfortable with the way the phone looks and feels when you hold it in your hand. In addition, don't be afraid to get a phone from a different manufacturer than you're used to. Consider an older, used, or refurbished phone to save money. 
Sometimes a year-old or even a two-year-old device might give you everything you need.\n\nQuestion: Cool. Which one would you recommend?\nResponse: If you want something more affordable, cheaper, but has the latest in Android and that comes with 5000mAh of battery, then the Motorola Moto G7 Power is an excellent phone to consider. If you want an Android smartphone that's cheaper, more affordable, and you still find the Moto G7 Power a bit off your budget, then I highly recommend the Samsung Galaxy A10e.\n\nQuestion: Tell me more about the first one.\nResponse: It sports an 8-megapixel camera on the front for selfies with an f/2.2 aperture and a pixel size of 1.12-micron. Motorola Moto G7 Power is based on Android 9.0 and packs 32GB of inbuilt storage that can be expanded via a microSD card (up to 512GB). The Motorola Moto G7 Power is a dual-SIM (GSM and GSM) smartphone that accepts Nano-SIM and Nano-SIM cards. The Motorola Moto G7 Power measures 160.83 x 76.00 x 9.40mm (height x width x thickness) and weighs 198.00 grams.\n\nQuestion: How much cheaper is the A10e?\nResponse: The current lowest price found for Samsung Galaxy A10e is 7,499 and for Motorola Moto G7 is 13990. The details of both of these products were last updated on Jun 02, 2021.\n\nQuestion: Wow, that's almost half the cost! Can you compare them?\nResponse: For the past few years, Motorola's G-series has consistently ranked as a favourite budget phone among users and the G7 continues this legacy as an exceptional phone for its price. All in all, people still love the Moto G7. It's a fantastic, affordable phone, and it won awards for its inexpensive price, good camera and reliable performance. But the Galaxy A50 is better, even if it's a bit more expensive. \n\n# Now, you should give me the response of the **Current Question** for the following dialog. The output format should be: Response: $Response." 
239 |         self.stop_tokens = None
240 | 
241 |     def build_turn_prompt(self, context, current_turn):
242 |         # context
243 |         all_prompts = [self.instruction]
244 |         if not context:
245 |             all_prompts.append("N/A")
246 |         else:
247 |             for turn in context:
248 |                 all_prompts.append("Q: {}\nA: {}".format(turn['question'], turn['response']))
249 | 
250 |         all_prompts[1] = "Context:\n" + all_prompts[1]
251 | 
252 |         # current turn
253 |         all_prompts.append("Current Question: " + current_turn['question'])
254 | 
255 |         this_prompt = "\n\n".join(all_prompts)
256 | 
257 |         t = 1  # index of the context turn whose response is dropped next; must be set once, outside the loop
258 |         while not check_length(this_prompt, 3700) and t <= len(context):
259 |             all_prompts[t] = "Q: {}\nA: {}".format(context[t-1]['question'], "N/A")
260 |             this_prompt = "\n\n".join(all_prompts)
261 |             t += 1
262 | 
263 |         this_prompt += "\n\n**Note that you should always try to generate a plausible answer. Never ask for clarification or say you don't understand it in the generated response.** Now, go ahead!"
264 | 
265 |         return this_prompt
266 | 
267 |     def parse_returned_text(self, text):
268 |         text = text.strip()
269 |         if text[:10] != "Response: ":
270 |             return None
271 |         return text[10:]
272 | 
273 | 
--------------------------------------------------------------------------------
/evaluation/eval_dense_retrieval.py:
--------------------------------------------------------------------------------
1 | # from IPython import embed
2 | from IPython import embed
3 | import logging
4 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
5 | logger = logging.getLogger(__name__)
6 | 
7 | import os
8 | import pickle
9 | import sys
10 | sys.path.append('..')
11 | sys.path.append('.')
12 | 
13 | import json
14 | import time
15 | import copy
16 | import faiss
17 | import numpy as np
18 | import torch
19 | from torch.utils.data import DataLoader
20 | import argparse
21 | from tqdm import tqdm
22 | from pprint import pprint
23 | 
24 | from trec_eval import trec_eval
25 | from dense_retrievers import 
load_dense_retriever 26 | from utils import set_seed, get_has_qrel_label_sample_ids 27 | 28 | 29 | def build_faiss_index(args): 30 | logger.info("Building Faiss Index...") 31 | # ngpu = faiss.get_num_gpus() 32 | ngpu = args.n_gpu_for_faiss 33 | gpu_resources = [] 34 | tempmem = -1 35 | 36 | for i in range(ngpu): 37 | res = faiss.StandardGpuResources() 38 | if tempmem >= 0: 39 | res.setTempMemory(tempmem) 40 | gpu_resources.append(res) 41 | 42 | cpu_index = faiss.IndexFlatIP(768) 43 | index = None 44 | if args.use_gpu_in_faiss: 45 | co = faiss.GpuMultipleClonerOptions() 46 | co.shard = True 47 | co.usePrecomputed = False 48 | # gpu_vector_resources, gpu_devices_vector 49 | vres = faiss.GpuResourcesVector() 50 | vdev = faiss.Int32Vector() 51 | for i in range(0, ngpu): 52 | vdev.push_back(i) 53 | vres.push_back(gpu_resources[i]) 54 | gpu_index = faiss.index_cpu_to_gpu_multiple(vres, 55 | vdev, 56 | cpu_index, co) 57 | index = gpu_index 58 | else: 59 | index = cpu_index 60 | 61 | return index 62 | 63 | def get_embeddings(args): 64 | def query_encoding_collate_fn(batch): 65 | bt_sample_ids, bt_src_query = list(zip(*batch)) # unzip 66 | bt_src_query_encoding = tokenizer(bt_src_query, 67 | padding="longest", 68 | max_length=args.max_query_length, 69 | truncation=True, 70 | return_tensors="pt") 71 | 72 | bt_q_input_ids, bt_q_attention_mask = bt_src_query_encoding.input_ids, bt_src_query_encoding.attention_mask 73 | 74 | return {"bt_sample_ids": bt_sample_ids, 75 | "bt_input_ids":bt_q_input_ids, 76 | "bt_attention_mask":bt_q_attention_mask} 77 | 78 | def response_encoding_collate_fn(batch): 79 | bt_sample_ids, bt_src_doc = list(zip(*batch)) # unzip 80 | bt_src_doc_encoding = tokenizer(bt_src_doc, 81 | padding="longest", 82 | max_length=512, 83 | truncation=True, 84 | return_tensors="pt") 85 | bt_d_input_ids, bt_d_attention_mask = bt_src_doc_encoding.input_ids, bt_src_doc_encoding.attention_mask 86 | return {"bt_sample_ids": bt_sample_ids, 87 | "bt_input_ids":bt_d_input_ids, 
88 | "bt_attention_mask":bt_d_attention_mask} 89 | 90 | def forward_pass(test_loader, encoder, has_qrel_label_sample_ids): 91 | embeddings = [] 92 | eid2sid = [] # embedding idx to sample id 93 | encoder.zero_grad() 94 | with torch.no_grad(): 95 | for batch in tqdm(test_loader): 96 | encoder.eval() 97 | bt_sample_ids = batch["bt_sample_ids"] 98 | bt_input_ids = batch['bt_input_ids'].to(args.device) 99 | bt_attention_mask = batch['bt_attention_mask'].to(args.device) 100 | embs = encoder(bt_input_ids, bt_attention_mask) 101 | embs = embs.detach().cpu().numpy() 102 | 103 | 104 | sifted_sample_ids = [] 105 | sifted_embs = [] 106 | for i in range(len(bt_sample_ids)): 107 | if bt_sample_ids[i] not in has_qrel_label_sample_ids: 108 | continue 109 | sifted_sample_ids.append(bt_sample_ids[i]) 110 | sifted_embs.append(embs[i].reshape(1, -1)) 111 | 112 | if len(sifted_embs) > 0: 113 | sifted_embs = np.concatenate(sifted_embs) 114 | embeddings.append(sifted_embs) 115 | eid2sid.extend(sifted_sample_ids) 116 | else: 117 | continue 118 | 119 | embeddings = np.concatenate(embeddings, axis = 0) 120 | 121 | torch.cuda.empty_cache() 122 | return embeddings, eid2sid 123 | 124 | # for ANCE, query and doc encoders are shared. 
125 | tokenizer, encoder = load_dense_retriever("ANCE", "query", args.retriever_path) 126 | encoder = encoder.to(args.device) 127 | 128 | with open(args.eval_file_path, "r") as f: 129 | data = f.readlines() 130 | 131 | query_encoding_dataset, response_encoding_dataset = [], [] 132 | n_query_candidate, n_response_candidate = 0, 0 133 | for line in data: 134 | record = json.loads(line) 135 | sample_id = record['sample_id'] 136 | if args.include_query: 137 | query_list = record[args.eval_field_name] 138 | if isinstance(query_list, str): 139 | query_list = [query_list] 140 | n_query_candidate = len(query_list) # all line's query_list has the same length 141 | for query in query_list: 142 | query_encoding_dataset.append([sample_id, query]) 143 | 144 | if args.include_response: 145 | response_list = record['predicted_response'] 146 | if isinstance(response_list, str): 147 | response_list = [response_list] 148 | n_response_candidate = len(response_list) 149 | for response in response_list: 150 | response_encoding_dataset.append([sample_id, response]) 151 | 152 | has_qrel_label_sample_ids = get_has_qrel_label_sample_ids(args.qrel_file_path) 153 | if args.include_query: 154 | query_test_loader = DataLoader(query_encoding_dataset, batch_size = 32, shuffle=False, collate_fn=query_encoding_collate_fn) 155 | query_embeddings, query_eid2sid = forward_pass(query_test_loader, encoder, has_qrel_label_sample_ids) 156 | 157 | if args.include_response: 158 | response_test_loader = DataLoader(response_encoding_dataset, batch_size = 32, shuffle=False, collate_fn=response_encoding_collate_fn) 159 | response_embeddings, response_eid2sid = forward_pass(response_test_loader, encoder, has_qrel_label_sample_ids) 160 | 161 | # filter out duplicate sample_ids 162 | eid2sid = query_eid2sid if query_eid2sid else response_eid2sid 163 | new_eid2sid = [] 164 | eid2sid_set = set() 165 | for x in eid2sid: 166 | if x not in eid2sid_set: 167 | new_eid2sid.append(x) 168 | eid2sid_set.add(x) 169 | eid2sid 
= new_eid2sid
170 | 
171 |     torch.cuda.empty_cache()
172 | 
173 |     # Different cases. We finally return one embedding for each sample_id.
174 |     if n_query_candidate == 1 and n_response_candidate == 1:
175 |         return (query_embeddings + response_embeddings) / 2, eid2sid
176 |     elif n_query_candidate >= 1 and n_response_candidate > 1:
177 |         query_embeddings = query_embeddings.reshape(query_embeddings.shape[0] // n_query_candidate, n_query_candidate, query_embeddings.shape[1])
178 |         response_embeddings = response_embeddings.reshape(response_embeddings.shape[0] // n_response_candidate, n_response_candidate, response_embeddings.shape[1])
179 |         if args.aggregation_method == "maxprob":
180 |             embeddings = (query_embeddings[:, 0, :] + response_embeddings[:, 0, :]) / 2
181 |             return embeddings, eid2sid
182 |         elif args.aggregation_method == "mean":
183 |             embeddings = np.concatenate([query_embeddings, response_embeddings], axis = 1).mean(axis=1)
184 |             return embeddings, eid2sid
185 |         elif args.aggregation_method == "sc":
186 |             if n_query_candidate == 1:
187 |                 query_embeddings = query_embeddings[:, 0, :]
188 |                 response_embeddings, _ = batch_closest_candidate(response_embeddings)
189 |             else:
190 |                 query_embeddings, response_embeddings = batch_closest_candidate(query_embeddings, response_embeddings)
191 |             return (query_embeddings + response_embeddings) / 2, eid2sid
192 |         else:
193 |             raise NotImplementedError
194 |     elif n_response_candidate == 0: # only query (rewrite)
195 |         query_embeddings = query_embeddings.reshape(query_embeddings.shape[0] // n_query_candidate, n_query_candidate, query_embeddings.shape[1])
196 |         if args.aggregation_method == "maxprob":
197 |             query_embeddings = query_embeddings[:, 0, :]
198 |         elif args.aggregation_method == "mean":
199 |             query_embeddings = np.mean(query_embeddings, axis=1)
200 |         elif args.aggregation_method == "sc":
201 |             query_embeddings, _ = batch_closest_candidate(query_embeddings)
202 |         else:
203 |             raise NotImplementedError
204 |         return 
query_embeddings, eid2sid 205 | else: # only response 206 | response_embeddings = response_embeddings.reshape(response_embeddings.shape[0] // n_response_candidate, n_response_candidate, response_embeddings.shape[1]) 207 | if args.aggregation_method == "maxprob": 208 | response_embeddings = response_embeddings[:, 0, :] 209 | elif args.aggregation_method == "mean": 210 | response_embeddings = np.mean(response_embeddings, axis=1) 211 | elif args.aggregation_method == "sc": 212 | response_embeddings, _ = batch_closest_candidate(response_embeddings) 213 | return response_embeddings, eid2sid 214 | 215 | 216 | 217 | def batch_closest_candidate(embeddings, affiliated_embeddings=None): 218 | has_aff = False 219 | if affiliated_embeddings is not None: 220 | has_aff = True 221 | 222 | res = [] 223 | res_aff = [] # corresponding affiliated_embeddings of embeddings. 224 | for i in range(embeddings.shape[0]): 225 | # Calculate the dot product of all pairs of candidate embeddings for sample i 226 | dot_products = np.dot(embeddings[i], embeddings[i].T) 227 | 228 | # Calculate the sum of each row to get the total dot product for each candidate 229 | candidate_dots = np.sum(dot_products, axis=1) 230 | 231 | # Find the index of the candidate with the highest total dot product 232 | closest_idx = np.argmax(candidate_dots) 233 | 234 | # Keep the embedding of the closest candidate 235 | res.append(embeddings[i][closest_idx].reshape(1, -1)) 236 | 237 | if has_aff: 238 | res_aff.append(affiliated_embeddings[i][closest_idx].reshape(1, -1)) 239 | 240 | return np.concatenate(res, axis=0), np.concatenate(res_aff, axis=0) if has_aff else None 241 | 242 | def faiss_flat_retrieval_one_by_one_and_finally_merge(args, query_embs): 243 | index = build_faiss_index(args) 244 | merged_candidate_matrix = None 245 | 246 | # Automatically get the number of doc blocks (start from 0 so a single block 0 is counted once) 247 | args.num_doc_block = 0 248 | for filename in os.listdir(args.index_path): 249 | try: 250 | args.num_doc_block = max(args.num_doc_block, 
int(filename.split(".")[1])) 251 | except (IndexError, ValueError): 252 | continue 253 | args.num_doc_block += 1 254 | print("Automatically detected that the number of doc blocks is: {}".format(args.num_doc_block)) 255 | 256 | for block_id in range(args.num_doc_block): 257 | logger.info("Loading doc block " + str(block_id)) 258 | 259 | # load doc embeddings 260 | with open(os.path.join(args.index_path, "doc_emb_block.{}.pb".format(block_id)), 'rb') as handle: 261 | cur_doc_embs = pickle.load(handle) 262 | with open(os.path.join(args.index_path, "doc_embid_block.{}.pb".format(block_id)), 'rb') as handle: 263 | cur_eid2did = pickle.load(handle) 264 | if isinstance(cur_eid2did, list): 265 | cur_eid2did = np.array(cur_eid2did) 266 | 267 | # Split the block so that the doc embeddings added to the index at once are not too large 268 | num_total_doc_per_block = len(cur_doc_embs) 269 | num_doc_per_split = 500000 # please set it according to your GPU size. 7M docs need ~28GB 270 | num_split_block = max(1, num_total_doc_per_block // num_doc_per_split) 271 | logger.info("num_total_doc: {}".format(num_total_doc_per_block)) 272 | logger.info("num_doc_per_split: {}".format(num_doc_per_split)) 273 | logger.info("num_split_block: {}".format(num_split_block)) 274 | cur_doc_embs_list = np.array_split(cur_doc_embs, num_split_block) 275 | cur_eid2did_list = np.array_split(cur_eid2did, num_split_block) 276 | for split_idx in range(len(cur_doc_embs_list)): 277 | cur_doc_embs = cur_doc_embs_list[split_idx] 278 | cur_eid2did = cur_eid2did_list[split_idx] 279 | logger.info("Adding block {} split {} into index...".format(block_id, split_idx)) 280 | index.add(cur_doc_embs) 281 | 282 | # ann search 283 | tb = time.time() 284 | 285 | D, I = index.search(query_embs, args.top_n) 286 | elapse = time.time() - tb 287 | logger.info({ 288 | 'time cost': elapse, 289 | 'query num': query_embs.shape[0], 290 | 'time cost per query': elapse / query_embs.shape[0] 291 | }) 292 | 293 | candidate_did_matrix = cur_eid2did[I] # doc embedding_idx -> real doc id 294 | D = 
D.tolist() 295 | candidate_did_matrix = candidate_did_matrix.tolist() 296 | candidate_matrix = [] 297 | 298 | for score_list, doc_list in zip(D, candidate_did_matrix): 299 | candidate_matrix.append([]) 300 | for score, doc in zip(score_list, doc_list): 301 | candidate_matrix[-1].append((score, doc)) 302 | assert len(candidate_matrix[-1]) == len(doc_list) 303 | assert len(candidate_matrix) == I.shape[0] 304 | 305 | index.reset() 306 | del cur_doc_embs 307 | del cur_eid2did 308 | 309 | if merged_candidate_matrix is None: 310 | merged_candidate_matrix = candidate_matrix 311 | continue 312 | 313 | # Merge the two score-sorted candidate lists (highest score first) 314 | merged_candidate_matrix_tmp = copy.deepcopy(merged_candidate_matrix) 315 | merged_candidate_matrix = [] 316 | for merged_list, cur_list in zip(merged_candidate_matrix_tmp, 317 | candidate_matrix): 318 | p1, p2 = 0, 0 319 | merged_candidate_matrix.append([]) 320 | while p1 < args.top_n and p2 < args.top_n: 321 | if merged_list[p1][0] >= cur_list[p2][0]: 322 | merged_candidate_matrix[-1].append(merged_list[p1]) 323 | p1 += 1 324 | else: 325 | merged_candidate_matrix[-1].append(cur_list[p2]) 326 | p2 += 1 327 | while p1 < args.top_n: 328 | merged_candidate_matrix[-1].append(merged_list[p1]) 329 | p1 += 1 330 | while p2 < args.top_n: 331 | merged_candidate_matrix[-1].append(cur_list[p2]) 332 | p2 += 1 333 | 334 | merged_D, merged_I = [], [] 335 | 336 | for merged_list in merged_candidate_matrix: 337 | merged_D.append([]) 338 | merged_I.append([]) 339 | for candidate in merged_list: 340 | merged_D[-1].append(candidate[0]) 341 | merged_I[-1].append(candidate[1]) 342 | merged_D, merged_I = np.array(merged_D), np.array(merged_I) 343 | 344 | logger.info(merged_D.shape) 345 | logger.info(merged_I.shape) 346 | 347 | return merged_D, merged_I 348 | 349 | def dense_retrieval(args): 350 | query_embs, eid2sid = get_embeddings(args) 351 | score_mat, did_mat = faiss_flat_retrieval_one_by_one_and_finally_merge(args, query_embs) 352 | 353 | # write to file 354 | run_trec_file = 
os.path.join(args.retrieval_output_path, "res.trec") 355 | with open(run_trec_file, "w") as f: 356 | for eid in range(len(did_mat)): 357 | sample_id = eid2sid[eid] 358 | retrieval_scores = score_mat[eid] 359 | retrieved_dids = did_mat[eid] 360 | for i in range(len(retrieval_scores)): 361 | rank = i + 1 362 | doc_id = retrieved_dids[i] 363 | rank_score = args.top_n - i # use the rank score for pytrec 364 | real_score = retrieval_scores[i] 365 | f.write("{} {} {} {} {} {} {}".format(sample_id, "Q0", doc_id, rank, rank_score, real_score, "ance")) 366 | f.write('\n') 367 | 368 | # evaluation 369 | trec_eval(run_trec_file, args.qrel_file_path, args.retrieval_output_path, args.rel_threshold) 370 | 371 | 372 | 373 | def get_args(): 374 | parser = argparse.ArgumentParser() 375 | 376 | parser.add_argument("--eval_file_path", type=str, required=True) 377 | parser.add_argument("--eval_field_name", type=str, required=True, help="Field name of the rewrite in the eval_file. E.g., t5_rewrite") 378 | parser.add_argument("--index_path", type=str, required=True) 379 | parser.add_argument("--qrel_file_path", type=str, required=True) 380 | parser.add_argument("--retriever_path", type=str, required=True) 381 | parser.add_argument("--max_query_length", type=int, default=64, help="Max single query length") 382 | parser.add_argument("--aggregation_method", type=str, default="maxprob", choices=["sc", "mean", "maxprob"]) 383 | 384 | parser.add_argument("--include_query", action="store_true") 385 | parser.add_argument("--include_response", action="store_true") 386 | 387 | parser.add_argument("--use_gpu_in_faiss", action="store_true", help="whether to use gpu in faiss or not.") 388 | parser.add_argument("--n_gpu_for_faiss", type=int, default=1, help="should be set if use_gpu_in_faiss") 389 | 390 | 391 | parser.add_argument("--top_n", type=int, default=1000) 392 | parser.add_argument("--rel_threshold", type=int, required=True, help="CAsT-20: 2, Others: 1") 393 | 394 | 
parser.add_argument("--retrieval_output_path", type=str, required=True) 395 | parser.add_argument("--seed", type=int, default=42) 396 | 397 | # main 398 | args = parser.parse_args() 399 | assert args.include_query or args.include_response 400 | os.makedirs(args.retrieval_output_path, exist_ok=True) 401 | with open(os.path.join(args.retrieval_output_path, "parameters.txt"), "w") as f: 402 | params = vars(args) 403 | f.write(json.dumps(params, indent=4)) 404 | 405 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 406 | args.device = device 407 | 408 | logger.info("---------------------The arguments are:---------------------") 409 | pprint(args) 410 | 411 | return args 412 | 413 | 414 | 415 | if __name__ == '__main__': 416 | args = get_args() 417 | set_seed(args) 418 | dense_retrieval(args) 419 | logger.info("Dense retrieval and evaluation finish!") -------------------------------------------------------------------------------- /demonstrations.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "conv_id": 134, 4 | "turns": [ 5 | { 6 | "turn_id": 1, 7 | "question": "What should I consider when buying a phone?", 8 | "response": "The design of the phone and the overall look and feel of the phone are very important. You should be comfortable with the way the phone looks and feels when you hold it in your hand. In addition, don't be afraid to get a phone from a different manufacturer than you're used to. Consider an older, used, or refurbished phone to save money. Sometimes a year-old or even a two-year-old device might give you everything you need.", 9 | "manual_rewrite": "What should I consider when buying a phone?", 10 | "cot": "This is the first turn." 11 | }, 12 | { 13 | "turn_id": 2, 14 | "question": "Cool. 
Which one would you recommend?", 15 | "response": "Just because a phone has everything and the kitchen sink doesn't mean you have to buy it \u2014 it's just an option for those who might have a larger budget. The same logic applies to pretty much anything you can buy. If you want something more affordable, cheaper, but has the latest in Android and that comes with 5000mAh of battery, then the Motorola Moto G7 Power is an excellent phone to consider. If you want an Android smartphone that's cheaper, more affordable, and you still find the Moto G7 Power a bit off your budget, then I highly recommend the Samsung Galaxy A10e.", 16 | "manual_rewrite": "Cool. Which smartphone would you recommend for me?", 17 | "cot": "Based on Turn 1, you are inquiring about what should be considered when buying a phone." 18 | 19 | }, 20 | { 21 | "turn_id": 3, 22 | "question": "Tell me more about the first one.", 23 | "response": "It sports an 8-megapixel camera on the front for selfies with an f/2.2 aperture and a pixel size of 1.12-micron. Motorola Moto G7 Power is based on Android 9.0 and packs 32GB of inbuilt storage that can be expanded via a microSD card (up to 512GB). The Motorola Moto G7 Power is a dual-SIM (GSM and GSM) smartphone that accepts Nano-SIM and Nano-SIM cards. The Motorola Moto G7 Power measures 160.83 x 76.00 x 9.40mm (height x width x thickness) and weighs 198.00 grams. It was launched in Marine Blue color. Connectivity options on the Motorola Moto G7 Power include Wi-Fi 802.11 a/b/g/n, GPS, Bluetooth v4.20, USB Type-C, FM radio, 3G, and 4G. Sensors on the phone include an accelerometer, ambient light sensor, compass/ magnetometer, gyroscope, proximity sensor, and fingerprint sensor. The Motorola Moto G7 Power supports face unlock. As of 1st June 2021, Motorola Moto G7 Power price in India starts at Rs. 14,990.", 24 | "manual_rewrite": "Tell me more about the Motorola Moto G7 Power.", 25 | "cot": "Based on Turn 2, Two phones were recommended. 
The first is Motorola Moto G7 Power and the second is Samsung Galaxy A10e." 26 | }, 27 | { 28 | "turn_id": 4, 29 | "question": "How much cheaper is the A10e?", 30 | "response": "The current lowest price found for Samsung Galaxy A10e is 7,499 and for Motorola Moto G7 is 13990. The details of both of these products were last updated on Jun 02, 2021.", 31 | "manual_rewrite": "How much cheaper is the Samsung Galaxy A10e than the Motorola Moto G7 Power?", 32 | "cot": "Based on Turn 2 and Turn 3, Motorola Moto G7 Power and Galaxy A10e were recommended and A10e is cheaper and more affordable." 33 | }, 34 | { 35 | "turn_id": 5, 36 | "question": "Wow, that's almost half the cost! Can you compare them?", 37 | "response": "For the past few years, Motorola's G-series has consistently ranked as a favourite budget phone among users and the G7 continues this legacy as an exceptional phone for its price. All in all, people still love the Moto G7. It's a fantastic, affordable phone, and it won awards for its inexpensive price, good camera and reliable performance. But the Galaxy A50 is better, even if it's a bit more expensive. For that, it too earned a few awards. If you have an extra $70 to spare, go for the Galaxy A50 over the Moto G7.", 38 | "manual_rewrite": "Wow, that's almost half the cost of a Samsung Galaxy A10e. Can you compare the Samsung Galaxy A10e with the Motorola Moto G7 Power?", 39 | "cot": "Based on Turn 4, the cost of Samsung Galaxy A10e is almost half of Motorola Moto G7 Power (7,499 vs 13,990)." 40 | }, 41 | { 42 | "turn_id": 6, 43 | "question": "Interesting. What makes the G7 so good?", 44 | "response": "Moto G7 and Moto G7 Power deliver a no-frills experience with few compromises. You're getting a great design, modern features, and clean and fast software. For the price, Motorola elegantly balances performance and features without making too many compromises. 
The biggest selling point of the Moto G 7 Power is the battery capacity and battery life.", 45 | "manual_rewrite": "Interesting. What makes the Motorola G7 such a good award-winning phone?", 46 | "cot": "Based on Turn 5, Moto G7 is a fantastic phone and won many awards." 47 | }, 48 | { 49 | "turn_id": 7, 50 | "question": "Interesting. Why is the A50 better?", 51 | "response": "The Galaxy A50's camera is way more flushed, it looks flatter and sleeker. The on-screen camera notch for the front-facing camera is also smaller. The A50 has an AMOLED display compared to the Moto G7's LCD screen. Though it doesn't shoot 4K video and Samsung's One UI is not everyone's favorite.", 52 | "manual_rewrite": "Why is the Samsung Galaxy A50 better than the Moto G7?", 53 | "cot": "Based on Turn 5, Galaxy A50 is better than Moto G7 and more expensive." 54 | } 55 | ] 56 | }, 57 | { 58 | "conv_id": 135, 59 | "turns": [ 60 | { 61 | "turn_id": 1, 62 | "question": "I took a year-long break from running due to work commitments. Tell me, how do I get back into it?", 63 | "response": "Don't rush getting back into running and you'll drastically avoid injury. Better to start with a few months of short, easy runs rather than rushing straight in with long and hard runs. Re-asses yourself every month and then increase mileage slowly if you are feeling strong.", 64 | "manual_rewrite": "How do I get back into running after a year-long break due to work commitments?", 65 | "cot": "This is the first turn." 66 | }, 67 | { 68 | "turn_id": 2, 69 | "question": "Cool. What exercises can I mix with it?", 70 | "response": "If you want to lift heavier weights in the gym, that can be done twice per week. Strength workouts should complement your running, not detract from it. If you're so sore from strength exercises that your long run or faster workout is compromised, tone down the intensity. You can lift less weight or reduce the number of reps or sets that you're completing. 
But no matter what, remember that strength exercises should make your running better.", 71 | "manual_rewrite": "Cool. What exercises can I mix with running?", 72 | "cot": "Based on Turn 1, the user asked about how to get back into running?" 73 | }, 74 | { 75 | "turn_id": 3, 76 | "question": "Which ones do you recommend?", 77 | "response": "Strength Training increases your running speed Weight lifting helps you build strength or improve your muscle strength, which helps you run faster. Your legs get more robust with these exercises. You can cover a longer distance with each stride if you have stronger legs. Sufficient muscle strength will act as a foundation for power training, which depends on this strength and converts it into speed. Specialized weight training improves your running performance. Half Marathon Training with Weight Lifting You need at least two weight training sessions per week. Here is a list of great strength training exercises for runners. Compound workouts such as Trap Bar and Squats and single-leg workouts such as lunges are great ways to work out multiple muscle groups at the same time. The lunge matrix is a great strength training exercise that can be done with or without weights. To start, your body's weight is sufficient enough as a great starting point for the lunge matrix. Take complete rest for at least 45 seconds up to 3 minutes between sets. Strength training needs a longer rest period because it is more exhausting and demanding on your overall body.", 78 | "manual_rewrite": "What strength exercises do you recommend to mix with running?", 79 | "cot": "Based on Turn 2, Strength workouts are recommended to complement the running." 80 | }, 81 | { 82 | "turn_id": 4, 83 | "question": "Awesome. Can I do these and run on the same day?", 84 | "response": "You can combine your strength training and easy run or long run sessions on the same day. Give yourself a 6-hour break between the exercises if possible. 
Remember that running is a priority and when push comes to shove you need to select runner-specific training (i.e. running) over strength training and cross-training. However, I would make it mandatory that you strength train at least once per week.", 85 | "manual_rewrite": "Awesome. Can I do these strength routines and run on the same day?", 86 | "cot": "Based on Turn 2 and Turn 3, we are talking about mixing strength training with running." 87 | } 88 | ] 89 | }, 90 | { 91 | "conv_id": 136, 92 | "turns": [ 93 | { 94 | "turn_id": 1, 95 | "question": "What was the basis of the Watergate scandal?", 96 | "response": "The Watergate Scandal erupted due to the Nixon administration's attempts to cover up its involvement in the break-in at the Democratic National Committee (DNC) on June 17, 1972, in the headquarters of the party's office at the Watergate complex in Washington, D.C. Details of the Watergate Scandal were exposed in investigations prompted by the media, especially by two reporters from the Washington Post, Bob Woodward and Carl Bernstein, along with their mysterious informant, \"Deep Throat\". The scandal surrounding the Watergate break-in and the subsequent cover-up by Nixon and his aides culminated in the president's resignation on August 9, 1974.", 97 | "manual_rewrite": "What was the basis of the Watergate scandal?", 98 | "cot": "This is the first turn." 99 | }, 100 | { 101 | "turn_id": 2, 102 | "question": "Oh wow. Why did they break in?", 103 | "response": "In June 1972, Nixon authorized members of his secret unit, as well as CREEP, to pay a team of burglars to infiltrate the Democratic Party's headquarters at the Watergate complex. This was before the election, and the burglars were meant to acquire information on his opponents. These undercover individuals broke into several Democratic offices within the Watergate and installed wire taps and recording equipment. Unfortunately for Nixon, the equipment was installed haphazardly and had to be readjusted. 
As the intruders attempted to return to Watergate and resolve the problem, they were arrested. The White House immediately attempted to downplay the break-in, but behind the scenes, Nixon and his secret team were quickly using funds to pay for the silence and cooperation of the intruders. Nixon also made sure that the FBI remained out of the incident by forcing the Central Intelligence Agency (CIA) to persuade the intelligence community that the break-in was classified as a matter of national security. The Nation Reacts While Nixon thought he was able to successfully cover up the break-in at the Watergate complex, members of the press, Congress and legal experts began questioning the event. By January 1973, the Watergate intruders stood trial for their participation in the break-in.", 104 | "manual_rewrite": "Oh wow. Why did the Nixon administration break in at Democratic National Committee?", 105 | "cot": "Based on Turn 1, nixon administration's attempts to cover up its involvement in the break-in at the Democratic National Committee (DNC)." 106 | }, 107 | { 108 | "turn_id": 3, 109 | "question": "What happened next?", 110 | "response": "The presiding judge, John Sirica, was unconvinced by the intruders' testimony that they acted solely in their own interest and that there was no connection to Nixon. Meanwhile, Congress intervened when the Senate organized the Watergate Committee to investigate the event. Needless to say, the walls were closing in around Nixon, and it became much worse. John Mitchell, the head of CREEP, was convicted of engaging in illegal activities by federal prosecutors. When that information became public, Judge Sirica increased the pressure on the Watergate intruders. In March, Sirica finally achieved his desired outcome when one of the intruders released information linking CREEP to the break-in. The dominoes were now falling one by one. 
In May, another convicted intruder testified before the Watergate Committee and revealed additional information linking not only CREEP, but the Nixon Administration to the events at Watergate. Eventually, the Watergate Committee called on John Dean, who was Nixon's lead legal counsel, who testified to Nixon's involvement in the break-in and cover-up. Simultaneously, the one piece of evidence that Congress, as well as the federal prosecution, needed was Nixon's taped conversations in the Oval Office.", 111 | "manual_rewrite": "What happened next after the Watergate intruders stood trial for their participation in the break-in?", 112 | "cot": "Based on Turn 2, the last thing we talk about is the Watergate intruders stood trial for their participation in the break-in." 113 | }, 114 | { 115 | "turn_id": 4, 116 | "question": "Did they get it?", 117 | "response": "Nixon initially refused to turn over the tapes claiming executive privilege and instead campaigned for the Stennis Compromise, which called for Senator John Stennis, who was essentially deaf, to translate the tapes for Congress! Judge Sirica and Congress both refused the compromise and instead sent a special prosecutor for the case, Archibald Cox, to present Nixon with a subpoena for the tapes. In another outlandish move, Nixon used his executive powers to terminate Cox as a prosecutor in what became known as the ' Saturday Night Massacre .' Nixon Surrenders Nixon's intransigence and outright defiance failed to pay off. While Nixon attempted to release sanitized versions of his tapes, many had had enough. On July 24, 1974, the United States Supreme Court ruled in the case of U.S. v. Nixon that Nixon's use of executive privilege was not applicable in a criminal investigation. Within weeks of the decision, the House of Representatives approved the impeachment of Nixon on the grounds of obstruction of justice, violation of the constitution and the unlawful withholding of evidence. 
To make matters worse, a tape was released, known as the 'smoking gun,' which unveiled Nixon agreeing to a CIA cover-up of the Watergate break-in during a phone conversation in 1972.", 118 | "manual_rewrite": "Did Congress and federal prosecution get Nixon's taped conversations in the Oval Office?", 119 | "cot": "Based on Turn 3, Congress and the federal prosecution want to get the Nixon's taped conversations in the Oval Office." 120 | }, 121 | { 122 | "turn_id": 5, 123 | "question": "So what happened to Nixon?", 124 | "response": "With the mounting evidence and loss of both political and public support, Nixon resigned as president on August 9, 1974. Notably, his successor, President Gerald Ford, pardoned Nixon for his involvement in Watergate soon after his resignation.", 125 | "manual_rewrite": "So what happened to Nixon after the events of the Watergate scandal?", 126 | "cot": "Based on all previous turns, Nixon was badly involved in the Watergate scandal." 127 | }, 128 | { 129 | "turn_id": 6, 130 | "question": "What effects did this all have on the country's politics?", 131 | "response": "The Watergate scandal not only had a profound impact on America but also on the world. Due to this scandal, the republican party suffered and moved to the right on the political scale. Congress tried to limit presidential power in an effort to ensure a similar scandal never occurs again. Further, the American people also grew weary of big government. Further, as a result of Watergate, the presidential office was weakened, allowing the Soviets to expand. In addition, Watergate also led to the North Vietnamese regaining territory because the president was too busy with Watergate. Not only did this scandal affect the world, but it also hit the homeland hard. The Republican party was severely damaged; the American people lost all trust in Republicans. The Democrats gained 49 seats in the House and five in the Senate. 
The scandal helped wipe out a generation of Republican politicians. Leading to a democratic majority, which advocated less government, lower taxes, and more libertarian views. Another element that came out of Watergate was Congress's attempt to limit the campaign funding of presidents. Congress attempted to make the government more transparent with new laws and amending the Freedom of Information Act. Additionally, they passed the War Powers Act in 1973 to limit the president's ability to wage war.", 132 | "manual_rewrite": "What effects did the Watergate scandal have on US politics?", 133 | "cot": "Based on all previous turns, we are talking about the event of Watergate scandal." 134 | }, 135 | { 136 | "turn_id": 7, 137 | "question": "What about Nixon's legacy?", 138 | "response": "Nixon signed into law Title IX, a section of the Education Amendment that says, \"No person in the United States shall based on sex, be excluded from participation in, be denied the benefits of, or be subjected to discrimination under any education program or activity receiving Federal financial assistance.\" The 40th anniversary of the law was celebrated this year. ABC's Serena Marshall reported on that anniversary: \"Since its passage, girls' participation in sports has gone from 1 in 27 to 2 in 5 at the high school level.\" Bostock said this and Nixon's other positive accomplishments did enough to outweigh the failures of his administration. \"I think if you look back -- and what better time to look back on a man's life than on the 100th anniversary of his birth? -- at the totality of his record and the influence he had on the country and the world, it was really extraordinary and on balance for the good,\" he said. Fulsom disagreed, saying Nixon's greatest legacy was one of \"distrust in the president and the government.\" \"I think he left definitely a negative impression on all those who lived during the time, and it continues to this day,\" Fulsom said. 
\"Just about every book that's ever been written about him paints a rather dim portrait as a leader.\" Love him or hate him, there's no denying, President Richard Nixon changed the course of American history.", 139 | "manual_rewrite": "What effects did the Watergate scandal have on President Nixon's legacy?", 140 | "cot": "Based on Turn 6, we are talking about the effects of the Watergate scandal." 141 | } 142 | ] 143 | } 144 | ] -------------------------------------------------------------------------------- /results/cast21_human_evaluation.part.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "conv_id": 106, 4 | "turns": [ 5 | { 6 | "turn_id": 1, 7 | "question": "I just had a breast biopsy for cancer. What are the most common types?", 8 | "response": "More research is needed. Types Breast cancer can be: Ductal carcinoma: This begins in the milk duct and is the most common type. Lobular carcinoma: This starts in the lobules. Invasive breast cancer is when the cancer cells break out from inside the lobules or ducts and invade nearby tissue, increasing the chance of spreading to other parts of the body. Non-invasive breast cancer is when the cancer is still inside its place of origin and has not broken out.", 9 | "manual_rewrite": "I just had a breast biopsy for cancer. What are the most common types of breast cancer?", 10 | "REW_rewrite": [ 11 | "What are the most common types of breast cancer that can be detected through a biopsy?", 12 | "What are the most common types of breast cancer biopsies?", 13 | "What are the most common types of breast cancer that can be detected through a biopsy?", 14 | "What are the most common types of breast biopsy for cancer?", 15 | "What are the most common types of breast cancer that can be detected through a biopsy?" 
16 | ], 17 | "manual_ndcg3": 0.0, 18 | "REW_ndcg3": 0.0, 19 | "ndcg3: manual_is_better": false, 20 | "ndcg3: REW_is_better": false, 21 | "ndcg3: tie": true, 22 | "human_evaluation": "good-h" 23 | }, 24 | { 25 | "turn_id": 2, 26 | "question": "Once it breaks out, how likely is it to spread?", 27 | "response": "Even though this condition doesn\u2019t spread, it\u2019s important to keep an eye on it. Between 20% to 40% of women with this condition will develop a separate invasive breast cancer -- one that will grow outside its original location -- within the next 15 years. Most of the time, these later cancers begin in the milk ducts, rather than the lobules. How is lobular carcinoma in situ diagnosed? You often won\u2019t have any symptoms with LCIS.", 28 | "manual_rewrite": "Once it breaks out, how likely is lobular carcinoma breast cancer to spread?", 29 | "REW_rewrite": [ 30 | "What is the likelihood of breast cancer spreading once it becomes invasive?", 31 | "What is the likelihood of breast cancer spreading once it becomes invasive?", 32 | "What is the likelihood of breast cancer spreading after it becomes invasive?", 33 | "What is the likelihood of breast cancer spreading after it becomes invasive?", 34 | "What is the likelihood of breast cancer spreading once it becomes invasive?" 35 | ], 36 | "manual_ndcg3": 0.0, 37 | "REW_ndcg3": 0.0, 38 | "ndcg3: manual_is_better": false, 39 | "ndcg3: REW_is_better": false, 40 | "ndcg3: tie": true, 41 | "human_evaluation": "bad-o" 42 | }, 43 | { 44 | "turn_id": 3, 45 | "question": "How deadly is it?", 46 | "response": "In 1999, a student opened fire at W. R. Myers, killing one student and seriously wounding another. In 2000, LCI was locked down after two youths were arrested and two firearms were seized. 
Section::::Campus.", 47 | "manual_rewrite": "How deadly is lobular carcinoma in situ?", 48 | "REW_rewrite": [ 49 | "What is the prognosis for invasive breast cancer that has spread?", 50 | "What is the mortality rate of invasive breast cancer that has spread from its original location?", 51 | "What is the mortality rate for breast cancer once it has spread from its original location?", 52 | "What is the mortality rate of breast cancer after it spreads outside its original location?", 53 | "What is the mortality rate of breast cancer that has spread outside its original location?" 54 | ], 55 | "manual_ndcg3": 0.0, 56 | "REW_ndcg3": 0.0, 57 | "ndcg3: manual_is_better": false, 58 | "ndcg3: REW_is_better": false, 59 | "ndcg3: tie": true, 60 | "human_evaluation": "bad-o" 61 | }, 62 | { 63 | "turn_id": 4, 64 | "question": "What? No, I want to know about the deadliness of lobular carcinoma in situ.", 65 | "response": "It\u2019s sometimes difficult to separate the two conditions and in this case it will be described as lobular neoplasia. Future breast cancer risk The vast majority of women diagnosed with ALH or LCIS will never get breast cancer. However, people diagnosed with either condition have a slightly higher risk than the general population of developing breast cancer. PLCIS behaves differently to ALH and LCIS, and may be more likely to develop into cancer in the future. Your individual risk depends on several factors, which your specialist can talk to you about.", 66 | "manual_rewrite": "What? 
No, I want to know about the deadliness of lobular carcinoma in situ.", 67 | "REW_rewrite": [ 68 | "Could you provide information on how deadly lobular carcinoma in situ can be?", 69 | "Can you provide information on the mortality rate of lobular carcinoma in situ?", 70 | "Can you tell me about the mortality rate of lobular carcinoma in situ?", 71 | "Can you provide information on how deadly lobular carcinoma in situ can be?", 72 | "Can you provide information on the mortality rate or how dangerous lobular carcinoma in situ (LCIS) is?" 73 | ], 74 | "manual_ndcg3": 0.0, 75 | "REW_ndcg3": 0.0, 76 | "ndcg3: manual_is_better": false, 77 | "ndcg3: REW_is_better": false, 78 | "ndcg3: tie": true, 79 | "human_evaluation": "good-h" 80 | }, 81 | { 82 | "turn_id": 5, 83 | "question": "Wow, that's better than I thought. What are common treatments?", 84 | "response": "Treatment and follow-up There is no standard recommended treatment or follow-up for lobular neoplasia. Your specialist will discuss treatment options with you based on your particular situation. If your lobular neoplasia is diagnosed by a core biopsy, your doctor may recommend removing more tissue from the area where the lobular neoplasia was found. This is to find out if there are any cancer cells in this area. This may be done using a procedure called a vacuum assisted biopsy, or a small operation known as an excision biopsy.", 85 | "manual_rewrite": "Wow, that's better than I thought. What are common treatments for lobular carcinoma in situ?", 86 | "REW_rewrite": [ 87 | "What are the common treatments for breast cancer?", 88 | "What are the common treatments for breast cancer?", 89 | "What are the common treatments for breast cancer?", 90 | "What are the common treatments available for breast cancer?", 91 | "Can you provide information on common treatments for breast cancer?" 
92 | ], 93 | "manual_ndcg3": 0.0, 94 | "REW_ndcg3": 0.0, 95 | "ndcg3: manual_is_better": false, 96 | "ndcg3: REW_is_better": false, 97 | "ndcg3: tie": true, 98 | "human_evaluation": "bad-o" 99 | }, 100 | { 101 | "turn_id": 6, 102 | "question": "How does it behave differently from PLCIS?", 103 | "response": "Wilson said she has \u201cvigilantly monitored\u201d her health with mammograms and MRIs after knowing she had an \u201cunderlying condition of LCIS, (lobular carcinoma in situ).\u201d When a recent breast biopsy revealed PLCIS (pleomorphic carcinoma in situ), initial tests showed that it was not cancerous. But Wilson had second thoughts. \u201cA friend who had had breast cancer suggested I get a second opinion on my pathology and my gut told me that was the thing to do,\u201d she wrote. When she went to two other pathologists, both confirmed a cancer diagnosis. \u201cI share this to educate others that a second opinion is critical to your health,\u201d she said. \u201cYou have nothing to lose if both opinions match up for the good, and everything to gain if something that was missed is found, which does happen. Early diagnosis is key.\u201d Wilson is the second celebrity in a week to do a public service announcement for early cancer detection. Last week, Taylor Swift revealed that her mother had recently been diagnosed with cancer, and urged her fans to remind their parents about getting screened.", 104 | "manual_rewrite": "How does LCIS behave differently from PLCIS?", 105 | "REW_rewrite": [ 106 | "How does lobular neoplasia behave differently from PLCIS?", 107 | "How does lobular neoplasia behave differently from PLCIS?", 108 | "In what ways does lobular carcinoma in situ (LCIS) behave differently from pleomorphic lobular carcinoma in situ (PLCIS)?", 109 | "How does lobular carcinoma in situ (LCIS) behave differently from pleomorphic lobular carcinoma in situ (PLCIS)?", 110 | "How does lobular neoplasia behave differently from PLCIS?" 
111 | ], 112 | "manual_ndcg3": 0.0, 113 | "REW_ndcg3": 0.0, 114 | "ndcg3: manual_is_better": false, 115 | "ndcg3: REW_is_better": false, 116 | "ndcg3: tie": true, 117 | "human_evaluation": "good-h" 118 | }, 119 | { 120 | "turn_id": 7, 121 | "question": "What makes lobular cancer distinct?", 122 | "response": "Invasive lobuluar cancer is a less common type of breast cancer than invasive ductal cancer. This cancer accounts for about 10% of all invasive breast cancer cases. Prognosis for infiltrating and invasive lobular breast carcinomas will naturally be influenced by tumor size, grade, stage and hormone receptor status.. However, lobular breast cancers, when positive for estrogen and progesterone receptors, tend to respond very well to hormone therapy. The overall breast cancer survival rates for infiltrating lobular carcinoma, when matched by stage, are a little higher than for ductal carcinoma for the first 5 years. Survival rates range from about 77% to 93%, but on average the 5 year survival rate was estimated at about 90%. How old are these figures, Doc? The general range of survival 77% to 93% is over 10 years old so expect higher now. The average survival rate of 90% was as of 2010, so still quite old. Infiltrating/Invasive Ductal Carcinoma (IDC)Infiltrating ductal carcinoma of the generic type (NOS) is by far the most common breast cancer type, and represents about 78% of all cases.", 123 | "manual_rewrite": "What makes lobular cancer distinct?", 124 | "REW_rewrite": [ 125 | "What is the difference between invasive and non-invasive lobular carcinoma and how is lobular neoplasia different from PLCIS?", 126 | "What makes lobular carcinoma different from other types of breast cancer?", 127 | "What makes lobular carcinoma distinct from other types of breast cancer?", 128 | "What makes lobular carcinoma distinct from other types of breast cancer?", 129 | "What makes lobular carcinoma in situ (LCIS) distinct from other breast cancers?" 
130 | ], 131 | "manual_ndcg3": 0.0, 132 | "REW_ndcg3": 0.0, 133 | "ndcg3: manual_is_better": false, 134 | "ndcg3: REW_is_better": false, 135 | "ndcg3: tie": true, 136 | "human_evaluation": "good-h" 137 | }, 138 | { 139 | "turn_id": 8, 140 | "question": "For the first stage, what are the alternatives to surgery?", 141 | "response": "Regardless of the histological subtype, the prognosis of IDC depends also on tumor size, presence of cancer in the lymph nodes, histological grade, presence of cancer in small vessels (vascular invasion), expression of hormone receptors and of oncogenes like HER2/neu. These parameters can be entered into models that provide a statistical probability of systemic spread. The probability of systemic spread is a key factor in determining whether radiation and chemotherapy are worthwhile. The individual parameters are important also because they can predict how well a cancer will respond to specific chemotherapy agents. Overall, the 5-year survival rate of invasive ductal carcinoma was approximately 85% in 2003. Section::::Treatment. Treatment of invasive carcinoma of no special type (NST) depends on the size of the mass (size of the", 142 | "manual_rewrite": "For the first stage, what are the alternatives to surgery for invasive lobular cancer?", 143 | "REW_rewrite": [ 144 | "What are the alternative treatments to surgery for lobular neoplasia?", 145 | "What are the alternatives to surgery for treating lobular neoplasia?", 146 | "What are the alternatives to surgery for treating lobular neoplasia?", 147 | "What are the alternatives to surgery for treating lobular neoplasia?", 148 | "What are the alternatives to surgery for treating lobular neoplasia?" 
149 | ], 150 | "manual_ndcg3": 0.0, 151 | "REW_ndcg3": 0.0, 152 | "ndcg3: manual_is_better": false, 153 | "ndcg3: REW_is_better": false, 154 | "ndcg3: tie": true, 155 | "human_evaluation": "bad-o" 156 | }, 157 | { 158 | "turn_id": 10, 159 | "question": "Does freezing work?", 160 | "response": "Cryoablation is another local treatment for single, small metastatic breast cancer tumors, especially in women who aren't good surgery candidates. Guided by imaging (ultrasound, CT scan, or MRI), cryoablation inserts a special freezing probe (a type of catheter) through the skin and to the tumor to be treated. Once the tip of the probe is in the right spot inside the tumor core, pressurized argon gas is injected through the probe into the tumor, freezing and destroying the cancerous tissue. Cryoablation techniques may vary from one cancer treatment facility to another. Cryoablation is more common in cases where the breast cancer is being well controlled by systemic treatments.", 161 | "manual_rewrite": "Does freezing tumors work as an alternative to surgery for stage 1 invasive lobular cancer?", 162 | "REW_rewrite": [ 163 | "Can freezing be used as a treatment for lobular carcinoma in situ or invasive lobular cancer?", 164 | "Can freezing be used as a treatment for lobular carcinoma?", 165 | "Can freezing be used as a treatment for lobular carcinoma?", 166 | "Can freezing be used as a treatment for lobular carcinoma?", 167 | "Can freezing be used as a treatment for lobular carcinoma in situ?" 
168 | ], 169 | "manual_ndcg3": 0.0, 170 | "REW_ndcg3": 0.0, 171 | "ndcg3: manual_is_better": false, 172 | "ndcg3: REW_is_better": false, 173 | "ndcg3: tie": true, 174 | "human_evaluation": "bad-o" 175 | } 176 | ] 177 | }, 178 | { 179 | "conv_id": 107, 180 | "turns": [ 181 | { 182 | "turn_id": 1, 183 | "question": "How do I build a cheap driveway?", 184 | "response": "Types of Driveways \u2013 Gravel, asphalt, brick & concrete driveway Gravel Driveways \u2013 These driveways are usually created by dunking a load of gravel and leveling it. Gravel driveways are cheap, can last can last of a very long time and require little maintenance. The surface should also be regraded from time to time. The only disadvantage is that homeowners need to remove weeds and clean small dust, leaves when they appear. This type of driveway is not suitable for children because they can choke or dig into your gravel and the gravel is too sharp for their feet. Asphalt Driveways \u2013 This type of driveway looks a lot like a tarred road and is constructed by tarring or cementing a gravel sheet.", 185 | "manual_rewrite": "How do I build a cheap driveway?", 186 | "REW_rewrite": [ 187 | "What are some ways to construct an inexpensive driveway?", 188 | "What are some cost-effective ways to build a driveway?", 189 | "What are some ways to build a driveway on a budget?", 190 | "What are some cost-effective ways to build a driveway?", 191 | "What are some affordable ways to construct a driveway?" 192 | ], 193 | "manual_ndcg3": 0.0, 194 | "REW_ndcg3": 0.0, 195 | "ndcg3: manual_is_better": false, 196 | "ndcg3: REW_is_better": false, 197 | "ndcg3: tie": true, 198 | "human_evaluation": "good-h" 199 | }, 200 | { 201 | "turn_id": 2, 202 | "question": "Which is cheaper: concrete or asphalt?", 203 | "response": "\"Pros and Cons: Asphalt vs. Concrete Driveway Choosing the driveway material that\u2019s best for you often depends on your climate and budget. 
When the time comes to install a new driveway, the biggest decision is whether to use concrete or asphalt. Sure, there are other driveway material choices \u2014 gravel at the low end and brick pavers at the high end \u2014 but concrete and asphalt are the most common, and each comes with advantages and disadvantages. Concrete lasts longer, but asphalt driveways are less expensive to install. Asphalt needs more maintenance, but is generally easier to repair. In winter, concrete driveways can suffer damage if you use the wrong de-icing product.", 204 | "manual_rewrite": "Which is cheaper way to build a driveway: concrete or asphalt?", 205 | "REW_rewrite": [ 206 | "What are the cost differences between concrete and asphalt driveways?", 207 | "Can you compare the cost of building a concrete driveway versus an asphalt driveway?", 208 | "Can you compare the cost of building a concrete driveway versus an asphalt driveway?", 209 | "What are the price differences between concrete and asphalt driveways?", 210 | "Can you compare the cost of building a concrete driveway versus an asphalt driveway?" 211 | ], 212 | "manual_ndcg3": 0.0, 213 | "REW_ndcg3": 0.0, 214 | "ndcg3: manual_is_better": false, 215 | "ndcg3: REW_is_better": false, 216 | "ndcg3: tie": true, 217 | "human_evaluation": "good-a" 218 | }, 219 | { 220 | "turn_id": 3, 221 | "question": "Really? What type of product?", 222 | "response": "What Rock Salt is Safe for Concrete? Although we all know that snow and ice can create dangerous conditions throughout our properties, we also don\u2019t want to create even bigger issues by using the wrong ice melting product. In recent years, a growing number of home and business owners have become increasingly concerned about the effects of rock salt on concrete. But what problems can rock salt actually present to concrete, and what are some best practices for preventing ice while protecting your pavement? 
This post will answer these and other questions so that you can make rock salt safe for concrete within your property lines.", 223 | "manual_rewrite": "What type of de-icing product can damage concrete driveways?", 224 | "REW_rewrite": [ 225 | "Can you provide more information on which de-icing products can cause damage to concrete driveways during winter?", 226 | "Can you provide more information on the de-icing product that can damage a concrete driveway?", 227 | "Can you provide more information about the de-icing product that can damage concrete driveways during winter?", 228 | "Can you provide more information on the de-icing product that can damage concrete driveways in winter?", 229 | "Can you provide more information on the de-icing product that can damage concrete driveways?" 230 | ], 231 | "manual_ndcg3": 0.0, 232 | "REW_ndcg3": 0.0, 233 | "ndcg3: manual_is_better": false, 234 | "ndcg3: REW_is_better": false, 235 | "ndcg3: tie": true, 236 | "human_evaluation": "good-h" 237 | }, 238 | { 239 | "turn_id": 4, 240 | "question": "Who knew? Which is more environmentally friendly?", 241 | "response": "Rock salt is very inexpensive but also a corrosive and concentrated substance, which can cause problems for your plants, animals, and waterways. The same chemical magic that turns ice into water creates a very salty brine that can make household pets sick, kill lawns and eats away at concrete, brick, and stone. Salt is also an irritant, which means pets with pads on the bottom of their feet may get superficial burns. The ice-melt industry is well aware that people want green. Which means eco-friendly and pet friendly claims abound. Read the ingredients. 
If the bulk of the product is magnesium or calcium chloride, it is a safer and less corrosive salt than the more common sodium chloride or rock salt.", 242 | "manual_rewrite": "Which type of driveway is more environmentally friendly?", 243 | "REW_rewrite": [ 244 | "Which driveway material is more environmentally friendly, concrete or asphalt?", 245 | "Which type of driveway, concrete or asphalt, is more environmentally friendly?", 246 | "Which driveway material is more environmentally friendly, asphalt or concrete?", 247 | "Which type of driveway, concrete or asphalt, is more environmentally friendly?", 248 | "Which driveway material, concrete or asphalt, is more environmentally friendly?" 249 | ], 250 | "manual_ndcg3": 0.0, 251 | "REW_ndcg3": 0.0, 252 | "ndcg3: manual_is_better": false, 253 | "ndcg3: REW_is_better": false, 254 | "ndcg3: tie": true, 255 | "human_evaluation": "good-a" 256 | }, 257 | { 258 | "turn_id": 5, 259 | "question": "No. Which type of driveway is better for the environment?", 260 | "response": "These types of surfaces canalso be better for the environment and do not necessarily cost more or require a lot of maintenance. There are three main types of solution to creating a permeable driveway: Using gravel or a mainly green, vegetated area. Directing water from an impermeable surface to a border rain garden or soakaway. Using permeable block paving, porous asphalt or concrete. The most appropriate construction will depend on factors such as the space available, slopes, the type of soil and whether the existing garden gets waterlogged. You can also combine the methods in a single garden to get an interesting appearance.", 261 | "manual_rewrite": "No. 
Which type of driveway is better for the environment?", 262 | "REW_rewrite": [ 263 | "Which type of driveway is more environmentally friendly, asphalt or concrete?", 264 | "Which type of driveway is more environmentally friendly, asphalt or concrete?", 265 | "Which type of driveway is more environmentally friendly: concrete or asphalt?", 266 | "Which type of driveway is more environmentally friendly, asphalt or concrete?", 267 | "Which type of driveway is more environmentally friendly, asphalt or concrete?" 268 | ], 269 | "manual_ndcg3": 0.0, 270 | "REW_ndcg3": 0.0, 271 | "ndcg3: manual_is_better": false, 272 | "ndcg3: REW_is_better": false, 273 | "ndcg3: tie": true, 274 | "human_evaluation": "good-a" 275 | }, 276 | { 277 | "turn_id": 6, 278 | "question": "And most low-maintenance?", 279 | "response": "Cost Chart: Paver Driveway \u2013 #1 \u2013 Most Expensive Concrete Driveway \u2013 #2Asphalt Driveway \u2013 #3Gravel Driveway \u2013 #4 \u2013 Least Expensive A concrete driveway is the second most expensive driveway when comparing it to the other hard surface driveways. Ongoing Maintenance Chart: Gravel Driveway \u2013 #1 \u2013 Most Ongoing Maintenance Asphalt Driveway \u2013 #2Concrete Driveway \u2013 #3Paver Driveway \u2013 #4 \u2013 Least Ongoing Maintenance A concrete driveway requires the second least amount of ongoing maintenance, compared to the other hard surface driveways. Life Expectancy Of A Concrete Driveway A properly installed concrete driveway with regularly scheduled maintenance can last up to 20 or even 30 years! 
That variation in age can be determined by the stability of the sub-grade, the quality of the installation, the type and frequency of maintenance, the drainage plan and the climate of the region you\u2019re located in.", 280 | "manual_rewrite": "Which type of driveway is the most low-maintenance?", 281 | "REW_rewrite": [ 282 | "Which type of driveway requires the least maintenance?", 283 | "Which type of driveway requires the least maintenance?", 284 | "Which type of driveway requires the least amount of maintenance?", 285 | "Which type of driveway requires the least maintenance?", 286 | "Which type of driveway requires the least amount of maintenance?" 287 | ], 288 | "manual_ndcg3": 0.0, 289 | "REW_ndcg3": 0.0, 290 | "ndcg3: manual_is_better": false, 291 | "ndcg3: REW_is_better": false, 292 | "ndcg3: tie": true, 293 | "human_evaluation": "good-h" 294 | }, 295 | { 296 | "turn_id": 7, 297 | "question": "Really? What about asphalt?", 298 | "response": "Asphalt Driveways Asphalt is one of the most commonly chosen materials for paving driveways and parking lots. During installation, the area will need to be excavated and graded to be paved and allow the base to settle before applying the hot mix asphalt and compacting it with a heavy duty roller. After it has fully curated is when the driveway would be ready for use. Advantages of Asphalt Driveways Asphalt driveways offer a number of advantages including: Longevity \u2013 When an asphalt driveway is professionally installed and properly maintained it can last for up to 30 years before needing to be replaced.", 299 | "manual_rewrite": "Really? 
How long does an asphalt driveway last?", 300 | "REW_rewrite": [ 301 | "How does the maintenance of an asphalt driveway compare to that of a concrete driveway in terms of low-maintenance?", 302 | "Can you tell me about the maintenance requirements of an asphalt driveway compared to a concrete driveway?", 303 | "Can you tell me which type of driveway requires the least ongoing maintenance?", 304 | "How does the maintenance of an asphalt driveway compare to that of a concrete driveway?", 305 | "Can you tell me about the maintenance required for an asphalt driveway?" 306 | ], 307 | "manual_ndcg3": 0.0, 308 | "REW_ndcg3": 0.0, 309 | "ndcg3: manual_is_better": false, 310 | "ndcg3: REW_is_better": false, 311 | "ndcg3: tie": true, 312 | "human_evaluation": "good-h" 313 | }, 314 | { 315 | "turn_id": 8, 316 | "question": "Is sealing worth it?", 317 | "response": "Home > Home Improvement > Landscaping > Cost to Seal a Driveway How Much Does Sealing an Asphalt Driveway Cost? Asphalt Sealing Prices Whether you want to keep your recently paved asphalt driveway looking brand new or extend the life of an older driveway, sealcoating is a worthwhile investment. Sealant forms a layer of protection on top of the driveway that reduces the deleterious effects of the elements and chemicals. In addition to preserving the driveway, a coat of sealant improves a home's curb appeal and value. Driveway Sealing Considerations The cosmetic and performance benefits of driveway sealing make this job practically a no-brainer.", 318 | "manual_rewrite": "Is sealing an asphalt driveway worth it?", 319 | "REW_rewrite": [ 320 | "What are the advantages of sealing an asphalt driveway?", 321 | "Should I consider sealing my driveway?", 322 | "Should I consider sealing my driveway, and is it worth it?", 323 | "Should I consider sealing my driveway?", 324 | "What are the benefits of sealing a driveway made of asphalt or concrete?" 
325 | ], 326 | "manual_ndcg3": 0.0, 327 | "REW_ndcg3": 0.0, 328 | "ndcg3: manual_is_better": false, 329 | "ndcg3: REW_is_better": false, 330 | "ndcg3: tie": true, 331 | "human_evaluation": "good-a" 332 | } 333 | ] 334 | }, 335 | { 336 | "conv_id": 108, 337 | "turns": [ 338 | { 339 | "turn_id": 1, 340 | "question": "How can fires help an ecosystem?", 341 | "response": "Many ecosystems, particularly prairie, savanna, chaparral and coniferous forests, have evolved with fire as an essential contributor to habitat vitality and renewal. [1] Many plant species in fire-affected environments require fire to germinate, establish, or to reproduce. Wildfire suppression not only eliminates these species, but also the animals that depend upon them. [2]Campaigns in the United States have historically molded public opinion to believe that wildfires are always harmful to nature. This view is based on the outdated belief that ecosystems progress toward an equilibrium and that any disturbance, such as fire, disrupts the harmony of nature. More recent ecological research has shown, however, that fire is an integral component in the function and biodiversity of many natural habitats, and that the organisms within these communities have adapted to withstand, and even to exploit, natural wildfire.", 342 | "manual_rewrite": "How can fires help an ecosystem?", 343 | "REW_rewrite": [ 344 | "What are the benefits of fires for an ecosystem?", 345 | "What are the benefits of fires to an ecosystem?", 346 | "How do fires benefit an ecosystem?", 347 | "What are the benefits of fires in an ecosystem?", 348 | "What are the benefits of fires for an ecosystem?" 
349 | ], 350 | "manual_ndcg3": 0.0, 351 | "REW_ndcg3": 0.0, 352 | "ndcg3: manual_is_better": false, 353 | "ndcg3: REW_is_better": false, 354 | "ndcg3: tie": true, 355 | "human_evaluation": "good-h" 356 | }, 357 | { 358 | "turn_id": 2, 359 | "question": "Give me some examples.", 360 | "response": "Not only will the number of thriving annual plants be higher after a fire, but the number of species will also be significantly higher. This is due not only to the fact that there are more resources available but also because several plant species in chaparral biomes are so adapted to the cycle of periodic fires that their seeds are only able to sprout after a fire. Photo of a chaparral ecosystem the winter after a fire For example, the seeds of a native chaparral wildflower, called whispering bells, require smoke exposure to germinate. These annual plant species are sometimes called fire-followers because they are only seen in the first couple of years following a fire when they are a part of the dominant vegetation in a chaparral ecosystem.", 361 | "manual_rewrite": "Give me some examples of organisms that have adapted to exploit natural wildfire.", 362 | "REW_rewrite": [ 363 | "Can you provide some examples of ecosystems that require fire as an essential contributor to habitat vitality and renewal?", 364 | "Can you provide some specific examples of ecosystems where fire is an essential contributor to habitat vitality and renewal?", 365 | "Can you provide me with some examples of ecosystems that rely on fire as an essential contributor to habitat renewal?", 366 | "Can you provide some examples of ecosystems that require fire for habitat renewal and vitality?", 367 | "Can you provide some specific examples of ecosystems that require fire for vitality and renewal?" 
368 | ], 369 | "manual_ndcg3": 0.0, 370 | "REW_ndcg3": 0.0, 371 | "ndcg3: manual_is_better": false, 372 | "ndcg3: REW_is_better": false, 373 | "ndcg3: tie": true, 374 | "human_evaluation": "good-a" 375 | }, 376 | { 377 | "turn_id": 3, 378 | "question": "Cool name! What are other fire-followers?", 379 | "response": "The longleaf pine ecosystem is known as a fire climax community. The organization group Longleaf Alliance defines a fire climax community as \"a plant and animal community that is limited by and adapted to an early successional stage by frequent fire disturbances\". The role of fire in the longleaf pine ecosystem was and still is today a key component in regeneration and shaping of the ecosystem. In pre-European establishment years, Native Americans set fires quite regularly. By doing so, they benefited their needs of establishing hunting grounds, homesteads, and safe havens. While achieving their goals, they were enriching the longleaf pine ecosystem by shaping the dynamics of the system.", 380 | "manual_rewrite": "Cool name! What are other organisms besides whispering bells that have adapted to exploit natural wildfires?", 381 | "REW_rewrite": [ 382 | "Can you give me more examples of plant species that are considered fire-followers?", 383 | "Can you give me some examples of other plant species that require fire to germinate or establish, besides whispering bells?", 384 | "Can you tell me about other plant species that require fire to sprout and establish in chaparral biomes aside from whispering bells?", 385 | "Can you give me more examples of plant species that require fire to germinate or establish, aside from whispering bells?", 386 | "Can you give me more examples of plant species that are fire-followers?" 
387 | ], 388 | "manual_ndcg3": 0.0, 389 | "REW_ndcg3": 0.0, 390 | "ndcg3: manual_is_better": false, 391 | "ndcg3: REW_is_better": false, 392 | "ndcg3: tie": true, 393 | "human_evaluation": "good-h" 394 | }, 395 | { 396 | "turn_id": 4, 397 | "question": "Let\u2019s talk about other environmental influences besides fire. More broadly, what are the effects of agriculture?", 398 | "response": "That agriculture has a huge effect on our ecosystems: pesticides and other toxic chemicals used in food production can wash downstream, killing many types of animals, polluting our waterways and even making humans ill. Fertilizers can wash downstream, causing \"\"blooms\"\" or large populations of algae and bacteria, which can harm fish populations. In addition, Green Facts states that we are depleting fish populations in the oceans and seas, causing us to fish further offshore and deeper underwater as more and more fish disappear. Water Throughout the world, 2.6 billion people lack proper sanitation, which means that drinking water around the world is contaminated with waste and disease.", 399 | "manual_rewrite": "Let\u2019s talk about other environmental influences besides fire. How does agriculture affect an ecosystem?", 400 | "REW_rewrite": [ 401 | "Can you tell me about the effects of agriculture on the environment?", 402 | "Can you tell me about the impact of agriculture on the environment?", 403 | "Can you tell me about the effects of agriculture on the environment?", 404 | "Can you tell me about the effects of agriculture on the environment?", 405 | "Can you tell me about the effects of agriculture on the environment?" 
406 | ], 407 | "manual_ndcg3": 0.0, 408 | "REW_ndcg3": 0.0, 409 | "ndcg3: manual_is_better": false, 410 | "ndcg3: REW_is_better": false, 411 | "ndcg3: tie": true, 412 | "human_evaluation": "good-a" 413 | }, 414 | { 415 | "turn_id": 5, 416 | "question": "Can it cause desertification?", 417 | "response": "Most plants, unless specially adapted to desert conditions, can not thrive in dry, arid conditions. Farming is major cause of desertification - soils need time to rest between crops but pressures to provide food and an income often lead to soil and land being over used and becoming tired. Soil also requires fertilisation to remain fertile, preferably through natural or organic processes. Where farming is carried out in a way that is unsympathetic to the land in places where climatic conditions mean the soil is already under stress, this can eventually result in desertification. Read More: How Big a Problem is Desertification?", 418 | "manual_rewrite": "Can agriculture cause desertification?", 419 | "REW_rewrite": [ 420 | "What are the effects of agriculture on the environment, and can it lead to desertification?", 421 | "What are the effects of agriculture on ecosystems, and can it contribute to desertification?", 422 | "What are the effects of agriculture on our ecosystems, and can it lead to desertification?", 423 | "Can agriculture cause desertification?", 424 | "What are the effects of agriculture on ecosystems? Can agriculture cause desertification?" 425 | ], 426 | "manual_ndcg3": 0.0, 427 | "REW_ndcg3": 0.0, 428 | "ndcg3: manual_is_better": false, 429 | "ndcg3: REW_is_better": false, 430 | "ndcg3: tie": true, 431 | "human_evaluation": "good-h" 432 | }, 433 | { 434 | "turn_id": 6, 435 | "question": "What does it mean to be tired?", 436 | "response": "Section::::Regenerative agriculture. Agriculture is seen to be one of the main players in the depletion of soil richness in human history. 
Certain agricultural practices can deplete soil of carbon, such as monoculture, failing to rotate crops sufficiently, and intensive livestock grazing. Soil that is low in carbon will not support plant life and is susceptible to desertification. Without plants, soil cannot hold water sufficiently, and will become dry and brittle over time. Agroecology is an overarching category of approaches to creating a more sustainable agricultural system and increase the health of soil. It includes \"permaculture\", originally being a combination of the words \"permanent\" and \"agriculture\", which is a systems thinking approach that can increase the carbon content of soil by using natural patterns in already existing ecosystems to promote a higher efficiency of food production, a lower need for human involvement, and a healthier ecosystem.", 437 | "manual_rewrite": "What does it mean for soil to be tired?", 438 | "REW_rewrite": [ 439 | "Can farming lead to soil exhaustion?", 440 | "Can you explain what is meant by soil becoming \"tired\" in farming?", 441 | "Can overuse of land and soil cause fatigue in the soil?", 442 | "Can farming cause tired soil and what are the consequences?", 443 | "Can you explain what is meant by soil becoming \"tired\" in agricultural practices?" 444 | ], 445 | "manual_ndcg3": 0.0, 446 | "REW_ndcg3": 0.0, 447 | "ndcg3: manual_is_better": false, 448 | "ndcg3: REW_is_better": false, 449 | "ndcg3: tie": true, 450 | "human_evaluation": "good-h" 451 | }, 452 | { 453 | "turn_id": 7, 454 | "question": "That sounds like organic farming. How are they related?", 455 | "response": "According to Permaculture Planet.com:\u201cPermaculture is an interdisciplinary earth science which encompasses many kinds of appropriate technologies and sustainable design methodologies, such as; renewable energy, water conservation, organic food production, ecological building techniques, micro-economics and more. 
In a world of rapid ecological change & seemingly insurmountable social & environmental problems, Permaculture offers practical solutions using design principles drawn from the observation of natural systems. From individual homes to entire bio-regions, Permaculture has the potential to repair damaged landscapes, build capacity in communities and replace apathy with self empowerment.\u201dPermaculture, when practiced, is a design philosophy and set of tools and principles that encompass the inter-related fields of organic gardening and agroforestry, architecture, ecology, community design, renewable energy, and systems theory.", 456 | "manual_rewrite": "Permaculture sounds like organic farming, how are they related?", 457 | "REW_rewrite": [ 458 | "How is agroecology related to organic farming?", 459 | "How is agroecology related to organic farming?", 460 | "How is agroecology related to organic farming?", 461 | "How is agroecology related to organic farming?", 462 | "How is agroecology related to organic farming?" 463 | ], 464 | "manual_ndcg3": 0.0, 465 | "REW_ndcg3": 0.0, 466 | "ndcg3: manual_is_better": false, 467 | "ndcg3: REW_is_better": false, 468 | "ndcg3: tie": true, 469 | "human_evaluation": "bad-c" 470 | }, 471 | { 472 | "turn_id": 8, 473 | "question": "Does organic farming reduce global warming, and if so, how?", 474 | "response": "Organic farmers are also careful about how they store and compost animal wastes, in order to safeguard air and water systems. They tailor their crops according to climate and soil, to optimise the natural growth cycle. Organic farmers also promote biodiversity, by growing a variety of crops, rather than one single crop. Through all of these methods, organic farmers prevent soil erosion, conserve energy, and help protect local wildlife, stream banks and watersheds. 
They help protect the global environment, as well: organic farming can greatly reduce the amount of greenhouse gases, such as carbon dioxide, that contribute to global warming. Click here to read a global report on The Role of Organic Agriculture in Mitigating Climate Change.", 475 | "manual_rewrite": "Does organic farming reduce global warming, and if so, how?", 476 | "REW_rewrite": [ 477 | "How does organic farming reduce global warming?", 478 | "How does organic farming contribute to reducing global warming?", 479 | "How does organic farming affect global warming?", 480 | "How does organic farming contribute to reducing global warming?", 481 | "How does organic farming help to reduce global warming?" 482 | ], 483 | "manual_ndcg3": 0.0, 484 | "REW_ndcg3": 0.0, 485 | "ndcg3: manual_is_better": false, 486 | "ndcg3: REW_is_better": false, 487 | "ndcg3: tie": true, 488 | "human_evaluation": "good-a" 489 | } 490 | ] 491 | }, 492 | { 493 | "conv_id": 109, 494 | "turns": [] 495 | }, 496 | { 497 | "conv_id": 110, 498 | "turns": [ 499 | { 500 | "turn_id": 1, 501 | "question": "What are the benefits of cow milk?", 502 | "response": "Cow milk seems to be everywhere and is often taken for granted, but it has many important health benefits, including its ability to aid in weight loss, build strong bones and teeth, boost the immune system, reduce fat, protect the heart, prevent diabetes, eliminate inflammation, and help stimulate growth. Cow Milk No matter where you are on the planet, you\u2019ve almost certainly come across cow milk. In most parts of the world, it is the primary [ 1] animal milk consumed by humans, although goat, sheep, and even camel milk are popular based on the availability of animals. Cow milk (derived from bovines) is intended for sucking by young cows, just as human infants are often nursed with human breast milk. However, the nutritional value and availability of cow\u2019s milk have made it one of the most in-demand liquids on the planet. 
More than 6 billion people on the planet drink milk on a semi-regular basis, which means that these animals are an incredibly important part of our global nutritional needs.", 503 | "manual_rewrite": "What are the benefits of cow milk?", 504 | "REW_rewrite": [ 505 | "What are the advantages of consuming cow's milk?", 506 | "What are the advantages of consuming cow milk?", 507 | "What are the advantages of consuming cow milk?", 508 | "What are some advantages of consuming cow milk?", 509 | "What are the advantages of consuming cow milk?" 510 | ], 511 | "manual_ndcg3": 0.0, 512 | "REW_ndcg3": 0.0, 513 | "ndcg3: manual_is_better": false, 514 | "ndcg3: REW_is_better": false, 515 | "ndcg3: tie": true, 516 | "human_evaluation": "good-h" 517 | }, 518 | { 519 | "turn_id": 2, 520 | "question": "What about an alternative. I\u2019ve heard it's also made from almonds?", 521 | "response": "Commercially available almond milk is often enriched with vitamins A and D, as well as calcium, to make it seem more like regular milk. (Almonds provide some calcium, but not as much as cow\u2019s milk.) Almond milk is low in calories compared to other milk substitutes and contains some heart-healthy monounsaturated fat. Be sure to check the labels of almond milk before you buy, because some brands contain preservatives, and other additives. Note that you can make almond milk at home \u2013 an easy process, requiring only organic raw almonds, water and a little sugar or other sweetener. You can find many recipes online. You might also try oat, hemp or rice milk. Oat milk provides more fiber than other milk substitutes and was shown in a Swedish study to help lower LDL (\u201cbad\u201d) cholesterol and total cholesterol. Hemp milk, made by pulverizing the seeds, blending them with water and straining out the solid residue, provides both omega-6 and omega-3 essential fatty acids in a healthy three-to-one ratio. Utterance type: Question", 522 | "manual_rewrite": "What about an alternative. 
How is milk made from almonds?", 523 | "REW_rewrite": [ 524 | "What are the benefits of almond milk compared to cow milk?", 525 | "Can you tell me about almond milk as an alternative to cow milk?", 526 | "Can you tell me about almond milk as an alternative to cow milk?", 527 | "Can you tell me about almond milk as an alternative to cow milk?", 528 | "Can you tell me about almond milk as an alternative to cow milk?" 529 | ], 530 | "manual_ndcg3": 0.0, 531 | "REW_ndcg3": 0.0, 532 | "ndcg3: manual_is_better": false, 533 | "ndcg3: REW_is_better": false, 534 | "ndcg3: tie": true, 535 | "human_evaluation": "good-a" 536 | }, 537 | { 538 | "turn_id": 3, 539 | "question": "Which is healthiest?", 540 | "response": "Why Whole Milk is the Healthiest Choicetweet email 227By: Experience Life August 31, 2009About Experience Life Follow Experience Life at @ Experience Life I drink whole milk and eat full-fat yogurt, cream cheese, and sour cream. Sure, full-fat dairy products taste better than the skim/fat-free versions, but I don\u2019t eat them for the taste. I eat full-fat dairy because it\u2019s better for my health and my weight. Yep, you heard me right: I eat dairy products with all the fat god gave \u2018em, and I do it because it\u2019s good for me. Here\u2019s why:1. Our bodies cannot digest the protein or absorb the calcium from milk without the fat. Love This? Never Miss Another Story.2. Vitamins A and D are also fat-soluble. So you can\u2019t absorb them from milk when all the fat has been skimmed off. 
(This makes fortified skim milk the biggest sham of all \u2014 you can pump fat-free milk full of a year\u2019s supply of vitamins A and D, but the body can\u2019t access them).3.", 541 | "manual_rewrite": "Which of the milk options (almond, oat, hemp or rice milk) is the healthiest?", 542 | "REW_rewrite": [ 543 | "Can you compare the health benefits of cow milk, almond milk, oat milk, hemp milk, and rice milk?", 544 | "Can you compare the health benefits of cow milk, almond milk, oat milk, hemp milk, and rice milk?", 545 | "Can you compare the health benefits of cow milk, almond milk, oat milk, hemp milk, and rice milk?", 546 | "Which milk substitute is the healthiest?", 547 | "Can you compare the health benefits of cow milk, almond milk, oat milk, hemp milk, and rice milk?" 548 | ], 549 | "manual_ndcg3": 0.0, 550 | "REW_ndcg3": 0.0, 551 | "ndcg3: manual_is_better": false, 552 | "ndcg3: REW_is_better": false, 553 | "ndcg3: tie": true, 554 | "human_evaluation": "good-a" 555 | }, 556 | { 557 | "turn_id": 4, 558 | "question": "No, the vegan ones. Which of those alternatives is healthiest?", 559 | "response": "There are a ton of nondairy milks to choose from, but which one is the healthiest?\u00bbRELATED: Milk in the fridge? Someone may owe you money To find out, researchers from Mc Gill University recently conducted an experiment, published in the Journal of Food Science and Technology, to determine the nutritional value of plant-based milk beverages. To do so, they compared the health benefits of the four most popular alternative milks, including soy, almond, coconut and rice. They examined the number of calories and amount of protein, fat and carbohydrates found in each. They also looked into the vitamins and minerals the substitutes contained. While they said cow\u2019s milk still has the most nutritional value, with 8 grams of protein, 9 grams of fat and 11.5 grams of carbs, soy is the most comparable for overall balance. 
It\u2019s \u201ca clear winner,\u201d they wrote in a statement. It has more protein than all the other milk options analyzed, with 7 to 12 grams and 95 calories a glass.", 560 | "manual_rewrite": "No, the vegan ones. Which of the vegan alternative milk options is healthiest?", 561 | "REW_rewrite": [ 562 | "Which vegan milk alternative is the healthiest option?", 563 | "Which of the vegan milk alternatives is considered the healthiest?", 564 | "Which of the vegan milk alternatives (such as almond, oat, or hemp milk) is considered the healthiest?", 565 | "Which vegan milk alternative is considered the healthiest option?", 566 | "Which vegan milk alternative is the healthiest?" 567 | ], 568 | "manual_ndcg3": 0.0, 569 | "REW_ndcg3": 0.0, 570 | "ndcg3: manual_is_better": false, 571 | "ndcg3: REW_is_better": false, 572 | "ndcg3: tie": true, 573 | "human_evaluation": "good-h" 574 | }, 575 | { 576 | "turn_id": 5, 577 | "question": "Can I make it at home?", 578 | "response": "Soy milk maker A soy milk maker is a small kitchen appliance which automatically cooks soy milk, a non-dairy beverage made from soy beans. Soy milk makers work similarly to a combination between a home blender and an automatic coffee maker. Some soy milk makers can also be programmed to make almond milk, rice milk, and other vegetable-based steeped beverages. Homemade soy milk can be made to the drinkers' tastes and nutritional requirements, providing added value. Soy pulp, a by-product of soy milk preparation, can be used as an ingredient in many recipes and food products. Ordinary methods for making soy milk at home are often very labor-intensive (requiring beans to be soaked, ground in a blender, strained, and then cooked). Soy milk machines perform many of these steps automatically, greatly simplifying home-based soy milk production. Section::::Standard operation. 
Before use, dried beans are rinsed with water to remove particulate debris, soaked for 6\u201310 hours to moisten and soften the dried beans, and then rinsed again before use.", 579 | "manual_rewrite": "Can I make soy milk at home?", 580 | "REW_rewrite": [ 581 | "Is it possible to make almond milk at home?", 582 | "Is it possible to make almond milk at home?", 583 | "Is it possible to make almond milk at home?", 584 | "What is the process for making almond milk at home?", 585 | "Is making almond milk at home an easy process?" 586 | ], 587 | "manual_ndcg3": 0.0, 588 | "REW_ndcg3": 0.0, 589 | "ndcg3: manual_is_better": false, 590 | "ndcg3: REW_is_better": false, 591 | "ndcg3: tie": true, 592 | "human_evaluation": "bad-c" 593 | }, 594 | { 595 | "turn_id": 6, 596 | "question": "Oat seems popular now. Why?", 597 | "response": "Oat milk Oat milk is a type of plant milk derived from whole oat (\"Avena spp.\") grains by soaking the plant material to extract its nutrients. Oat milk naturally has a creamy texture and a characteristically oatmeal-like flavor, though it is sold commercially in various flavor-varieties such as sweetened, unsweetened, vanilla, and chocolate. Unlike other plant milks, whose origins date as early as the 13th century, oat milk is a modern creation, developed by the Swedish scientist Rickard Oste in the early 1990s. Oats contain high amounts of functional protein, dietary fiber (\u03b2-glucan), and unsaturated fatty acids, which make oat milk a significant source of nutrients, though uncertainty surrounds its practical use as a dairy milk substitute. Regardless, oat milk is often consumed to replace dairy milk in vegan diets, or in the cases of medical conditions where dairy is incompatible, such as lactose intolerance or a cow's milk allergy (CMA). Section::::History. Section::::History.:Creation. 
Soy milk predates all other alternative milks, including oat milk, both as a cultural and commercial product.", 598 | "manual_rewrite": "Oat seems popular now. Why?", 599 | "REW_rewrite": [ 600 | "What are the reasons for the popularity of oat milk now?", 601 | "Why is oat milk popular?", 602 | "Why is oat milk popular now?", 603 | "Why is oat milk popular now?", 604 | "Why is oat milk so popular?" 605 | ], 606 | "manual_ndcg3": 0.0, 607 | "REW_ndcg3": 0.0, 608 | "ndcg3: manual_is_better": false, 609 | "ndcg3: REW_is_better": false, 610 | "ndcg3: tie": true, 611 | "human_evaluation": "good-h" 612 | }, 613 | { 614 | "turn_id": 7, 615 | "question": "I\u2019m going to bake some cookies. Which one would be the best to use?", 616 | "response": "One cup of soy milk provides 4 grams of unsaturated fat, 1 gram of fiber, and 7 grams of protein. Soymilk has 75% more antioxidants than cow\u2019s milk. It is hearty, rich and creamy. Soy milk is good for drinking straight from the glass, poured over cereal or in coffee. The taste is nutty and slightly sweet. When it comes to cooking, soy milk is one of the best milks to choose. It can be used in place of cow\u2019s milk in any recipe. It is stable at high temperatures which makes it a good choice for savory dishes and sauces. In baking, soy milk is also one of the best choices because of its high protein content. According to veganbaking.net curdling milk with an acid such as vinegar creates complex flavors and thickens the mixture. This in turn \u201cincreases leavening power and enhances the crumb quality of cakes and muffins.\u201d Since curdling is dependent on protein content, soy milk does the best job. That means that soy milk is also the best choice when making vegan buttermilk for Vegan Buttermilk Biscuits or for dredging to make breaded dishes like \u201cChicken-Fried\u201d Tofu Steaks.", 617 | "manual_rewrite": "I\u2019m going to bake some cookies. 
Which vegan milk is best for baking cookies?", 618 | "REW_rewrite": [ 619 | "What would be the best plant-based milk alternative to use for baking cookies?", 620 | "Which vegan milk alternative would be the best to use for baking cookies?", 621 | "Which alternative milk would be best to use for baking cookies?", 622 | "Which non-dairy milk alternative would be best to use for baking cookies?", 623 | "Which vegan milk alternative would be best for baking cookies?" 624 | ], 625 | "manual_ndcg3": 0.0, 626 | "REW_ndcg3": 0.0, 627 | "ndcg3: manual_is_better": false, 628 | "ndcg3: REW_is_better": false, 629 | "ndcg3: tie": true, 630 | "human_evaluation": "good-a" 631 | }, 632 | { 633 | "turn_id": 8, 634 | "question": "What would be second best?", 635 | "response": "The next time I'm in a pinch and soy or hemp milk can't be found I'll turn to almond milk as a third alternative and if I can't find this I'll go on the lookout for coconut milk, followed by rice milk. I'm aware that over use of soy in food products may be a health concern to many readers as of this writing due to some health studies that have been published. In vegan baking we're using so little of it that if you're concerned about it you really shouldn't be eating baked items in the first place because the sugar and glycemic index issues are a much more serious health issue. Remember that most things in moderation are ok. If you're still concerned or allergic to soy then it's great that we now have hemp milk as a soy-free alternative. 
Its wonderful to finally be in an age where we have so many alternatives to dairy milk to choose from.", 636 | "manual_rewrite": "Which vegan milk besides soy is best for baking?", 637 | "REW_rewrite": [ 638 | "Which plant-based milk would be the second-best option for baking cookies?", 639 | "Which alternative milk would be the second best option for baking cookies?", 640 | "Which non-dairy milk would be the second-best option for baking cookies?", 641 | "Which plant-based milk would be the second best to use for baking after soy milk?", 642 | "Which of the vegan milk alternatives would be the second best option for baking cookies?" 643 | ], 644 | "manual_ndcg3": 0.0, 645 | "REW_ndcg3": 0.0, 646 | "ndcg3: manual_is_better": false, 647 | "ndcg3: REW_is_better": false, 648 | "ndcg3: tie": true, 649 | "human_evaluation": "good-a" 650 | }, 651 | { 652 | "turn_id": 9, 653 | "question": "What is the best in my morning latte?", 654 | "response": "Hemp milk Hemp milk, or hemp seed milk, is a plant milk made from hemp seeds that are soaked and ground in water. The result resembles milk in color, texture, and flavor. Hemp is conducive to being organically grown and labeled. Plain hemp milk may be additionally sweetened or flavored. Section::::Production. Production of hemp milk requires hemp seeds, water, and a blender. Many recipes call for ground vanilla or vanilla extract to add flavor, and a type of sweetener. Once all the ingredients are blended together, some people pour the hemp milk through a cheesecloth and strainer to get a smoother and more refined milky texture, but this process is optional. Section::::Use in coffee. Compared to soy milk, in coffee culture, hemp milk is said to produce better latte art and to have a texture more like cow's milk. Section::::Nutrition. In a serving, hemp milk provides 46 calories from 3 g of carbohydrates, 3 g of fat and 2 g of protein. 
Hemp milk contains no micronutrients in significant amounts.", 655 | "manual_rewrite": "What vegan milk is best in my morning latte?", 656 | "REW_rewrite": [ 657 | "Which non-dairy milk is the best to use in a morning latte?", 658 | "Which non-dairy milk is the best option to use in my morning latte?", 659 | "Which plant milk is the best to use in a morning latte?", 660 | "Which alternative milk is best for my morning latte?", 661 | "Which non-dairy milk is the best to use in a morning latte?" 662 | ], 663 | "manual_ndcg3": 0.0, 664 | "REW_ndcg3": 0.0, 665 | "ndcg3: manual_is_better": false, 666 | "ndcg3: REW_is_better": false, 667 | "ndcg3: tie": true, 668 | "human_evaluation": "bad-o" 669 | } 670 | ] 671 | }, 672 | { 673 | "conv_id": 111, 674 | "turns": [ 675 | { 676 | "turn_id": 1, 677 | "question": "I saw a Tesla Roadster today which made me wonder, what is the world\u2019s fastest car?", 678 | "response": "Click here to visit our frequently asked questions about HTML5 video.0:00 / 2:112. Hennessey Venom GT (270.49mph)The Hennessey Venom GT is first in our list, with a verified speed of 270.49mph. The cars comes with a 1,244-hp 7.0-liter twin-turbocharged V8 engine giving it enough power to go at such high speeds. However, when it comes to Guinness World records Venom GT is still not considered as the fastest car in the world. The Guinness World record for the fastest car is held by the Bugatti Veyron Super Sport (second car in our list) at 269.86mph. The Venom GT reached the speed of 270.49mph at the Kennedy Space Center in a one-way run (NASA didn\u2019t allow running in the opposite direction). 
To break the Guinness World Record, it is compulsory to run in both directions to compensate for the wind conditions, so the Veyron Super Sport kept its title of fastest car in the world.", 679 | "manual_rewrite": "I saw a Tesla Roadster today which made me wonder, what is the world\u2019s fastest car?", 680 | "REW_rewrite": [ 681 | "What is the fastest car in the world?", 682 | "What is the fastest car in the world?", 683 | "What is currently considered the world's fastest car?", 684 | "What is the fastest car in the world?", 685 | "What is the fastest car in the world?" 686 | ], 687 | "manual_ndcg3": 0.0, 688 | "REW_ndcg3": 0.0, 689 | "ndcg3: manual_is_better": false, 690 | "ndcg3: REW_is_better": false, 691 | "ndcg3: tie": true, 692 | "human_evaluation": "good-h" 693 | }, 694 | { 695 | "turn_id": 2, 696 | "question": "What is the first based on?", 697 | "response": "Section::::Specifications.:Chassis. The Venom GT utilizes a heavily modified Lotus Elise/Exige chassis. The manufacturer states that the modified chassis uses components from the Lotus Exige, including the roof, doors, side glass, windscreen, cockpit, floorpan, HVAC system, wiper and head lamps, though the manufacturer is not associated with Lotus Cars. For road use, the car is registered as a Lotus Exige (modified) and is not a series production car. The Venom GT has a curb weight of aided by carbon fiber bodywork and carbon fiber wheels. The brakes have Brembo 6-piston calipers at the front and 4-piston calipers at the rear. The rotors are carbon ceramic units provided by Surface Transforms. Section::::Specifications.:Drivetrain. The Venom GT is powered by a twin-turbocharged GM LSX V8 engine. The LSX architecture incorporates specific design features such as reinforced internal components and additional head bolts with aluminum heads including twin Precision dual ball bearing turbochargers. 
The engine has a power output of at 6,600 rpm and of torque at 4,400 rpm.", 698 | "manual_rewrite": "What is Hennessey Venom GT based on?", 699 | "REW_rewrite": [ 700 | "Based on what criteria is the Hennessey Venom GT considered the world's fastest car?", 701 | "Based on what criteria is the Hennessey Venom GT considered the world's fastest car?", 702 | "What is the basis for the Hennessey Venom GT being considered the world's fastest car?", 703 | "Based on verified speed, what is the fastest car in the world?", 704 | "Based on what criteria is the Hennessey Venom GT considered the world's fastest car?" 705 | ], 706 | "manual_ndcg3": 0.0, 707 | "REW_ndcg3": 0.0, 708 | "ndcg3: manual_is_better": false, 709 | "ndcg3: REW_is_better": false, 710 | "ndcg3: tie": true, 711 | "human_evaluation": "good-a" 712 | }, 713 | { 714 | "turn_id": 3, 715 | "question": "How fast can the unmodified base car go?", 716 | "response": "This was a great way to validate the technical excellence of our car which includes high-speed stability with an open roof.\u201dThe Venom GT is powered by a twin-turbo 7.0L V8 engine producing 1,451 bhp @ 7,200 rpm and 1,287 lb-ft of torque @ 4,200 rpm and puts its power down to the rear Michelin Pilot Super Sport tires via a 6-speed manual gearbox by Ricardo. The Venom GT is capable of 0-60 mph in less than 2.4 seconds and has been tested from 0-200 mph in less than 13 seconds. Hennessey plans to offer a limited run of 3 \u201cWorld Record Edition\u201d Venom GT Spyder\u2019s to commemorate the record run and as a tribute to the company\u2019s 25 th Anniversary. The cost is $1.3 million each plus tax.\u201cWe are grateful to have the help of so many friends, family, employees, customers and partners over the past 25 years. 
We look forward to the next chapter of the Venom with the F5 as we keep making fast cars faster for the next 25 years\u201d, said John Hennessey.###", 717 | "manual_rewrite": "How fast can the unmodified base Lotus Elise/Exige go?", 718 | "REW_rewrite": [ 719 | "What is the top speed of the unmodified base model of the car used for the Hennessey Venom GT?", 720 | "What is the top speed of the unmodified base car that the Hennessey Venom GT is based on?", 721 | "What is the top speed of the unmodified base car used for the Hennessey Venom GT?", 722 | "What is the top speed of the unmodified base car used for the Hennessey Venom GT?", 723 | "What is the top speed of the unmodified base car?" 724 | ], 725 | "manual_ndcg3": 0.0, 726 | "REW_ndcg3": 0.0, 727 | "ndcg3: manual_is_better": false, 728 | "ndcg3: REW_is_better": false, 729 | "ndcg3: tie": true, 730 | "human_evaluation": "bad-o" 731 | }, 732 | { 733 | "turn_id": 4, 734 | "question": "No, I meant the Lotus.", 735 | "response": "Power to weight ratio: 307 hp/tonne (311 PS/1000kg). Lotus Exige Sport 380 (2016\u2013Present) [ edit]Lotus Exige Sport 380The Lotus Exige Sport 380 is a track focused and more powerful version of the Lotus Exige lineup. It was unveiled on 23 November 2016. Lotus' CEO, Jean-Marc Gales describes it as, \"\"The Exige Sport 380 is so good, that it is no longer the best in class, it\u2019s now in a class of its own\"\", and it fulfills this statement by taking on some of the powerful and expensive super cars both on the track and the streets. The 3.5-litre, super-charged V6 engine is now uprated and produces 375 hp (380 PS; 280 k W) and 410 N\u22c5m (302 lb\u22c5ft) of torque with a 6500 rpm red line achieved by revised supercharger and ECU. 
It can sprint from 0 to 60 mph (97 km/h) in 3.5 seconds and has a top speed of 178 mph (286 km/h).", 736 | "manual_rewrite": "How fast can the Lotus Elise/Exige go?", 737 | "REW_rewrite": [ 738 | "Can you provide information on the top speed of an unmodified Lotus Elise/Exige?", 739 | "Can you tell me how fast the unmodified Lotus that the Venom GT is based on can go?", 740 | "Can you provide information on the unmodified base car of the Lotus?", 741 | "Could you tell me how fast the unmodified base Lotus car can go?", 742 | "Can you provide information about the unmodified base car for the Lotus?" 743 | ], 744 | "manual_ndcg3": 0.0, 745 | "REW_ndcg3": 0.0, 746 | "ndcg3: manual_is_better": false, 747 | "ndcg3: REW_is_better": false, 748 | "ndcg3: tie": true, 749 | "human_evaluation": "good-h" 750 | }, 751 | { 752 | "turn_id": 5, 753 | "question": "Can any go over 300 mph?", 754 | "response": "Instead, it is a head to head race between two cars on a short 402 m (\u00bc mile), straight track. There are five professional classes of dragsters. They are top fuel dragster, top fuel funny car, pro modified, pro stock, and pro stock bike. Of these, top fuel and funny cars are the most popular. Top fuel dragsters are the fastest class of dragsters, capable of reaching speeds of up to 150 m/s. Funny cars are basically top fuel dragsters with a body over the chassis. As a result, funny cars are usually slower than top fuelers. Top fuel dragsters frequently reach speeds of 134 m/s (300 mph) during their runs. The current speed record for any dragster is 150.27 m/s (336.15 mph) by a top fuel dragster on a run by Tony Schumacher on May 5, 2005. Class E. T. Speed Date Driver/Car Location TF 4.437 10/01/05 Tony Shumacher - Long Grove, Il/'04 R. E. D. 
Hadman Joliet, Il TF 336.15 05/25/05 Tony Shumacher - Long Grove, Il/'05 Hadman R.", 755 | "manual_rewrite": "Can any super car go over 300 mph?", 756 | "REW_rewrite": [ 757 | "Can you tell me which cars can go over 300 mph?", 758 | "Are there any cars that can exceed 300 mph?", 759 | "Are there any cars that can exceed 300 mph?", 760 | "Are there any cars that can go over 300 mph?", 761 | "Which cars have a top speed of over 300 mph?" 762 | ], 763 | "manual_ndcg3": 0.0, 764 | "REW_ndcg3": 0.0, 765 | "ndcg3: manual_is_better": false, 766 | "ndcg3: REW_is_better": false, 767 | "ndcg3: tie": true, 768 | "human_evaluation": "bad-o" 769 | }, 770 | { 771 | "turn_id": 6, 772 | "question": "What is the origin of funny cars? Why do they call them that?", 773 | "response": "Hot Rod News Comparing Vintage and Modern Funny Cars Written by Brandan Gillogly on February 13, 2016Elana Scherr - writer; Wes Allison - photographer;View All 78 Photos The first Funny Cars showed up in the mid 1960s, how have they changed in the last half century? Race cars develop in fits and spurts. Only rarely can an innovation be traced back to one specific person, although often it\u2019s whoever first won with a new idea that gets credit for it. The modern Funny Car, that nitro-burning, carbon fiber shell set a-trembling over a front engine tube chassis has its roots in the A/FX classes running in 1964 and 1965. Jack Chrisman\u2019s nitro-burning Comet in 1964 showed the crowd-pleasing abilities of a Top Fuel dragster in a dealer showroom suit. The name is rumored to have come from the altered wheelbase bodies that followed in 1965. They looked \u201cfunny.\u201d The first flip-top car bodies over dragster chassis showed up in 1966, and that seems to be where the NHRA is marking its 50th anniversary date for the birth of the Funny Car.", 774 | "manual_rewrite": "What is the origin of funny cars? 
Why do they call them funny cars?", 775 | "REW_rewrite": [ 776 | "Can you tell me the origin of funny cars and why they are called that?", 777 | "What is the origin of funny cars and how did they get their name?", 778 | "What is the history of funny cars and why are they called funny cars?", 779 | "What is the origin of funny cars and why are they called that?", 780 | "Can you tell me about the origin of funny cars and how they got their name?" 781 | ], 782 | "manual_ndcg3": 0.0, 783 | "REW_ndcg3": 0.0, 784 | "ndcg3: manual_is_better": false, 785 | "ndcg3: REW_is_better": false, 786 | "ndcg3: tie": true, 787 | "human_evaluation": "good-a" 788 | }, 789 | { 790 | "turn_id": 7, 791 | "question": "How fast did it go?", 792 | "response": "The upgraded system also improved the maximum torque from and improves the Roadster's quarter mile times. Section::::Specifications.:Transmission.:Gear selector. In the interior the gear selector is similar to a push-button automatic with buttons labeled P, R, N and D while some earlier models have a gear lever similar to that in cars with manual transmission. Section::::Specifications.:Performance. The Roadster's acceleration time is 3.9 seconds for the Standard model and 3.7 seconds for the 2010 V2.5 Sport, which \"Motor Trend\" confirmed in the first independent, instrumented testing the Sport. The magazine also recorded a time of 12.6 seconds at . Tesla said the top speed is electronically limited to . Tesla claims it has a weight of , a and a . Tesla began delivering the higher performance version of the Roadster in July 2009. The Roadster Sport has adjustable dampers and a new hand-wound motor, capable of in 3.7 seconds. 
Scotty Pollacheck, a high-performance driver for Killacycle, drove a 2010 Tesla Roadster Sport at the Wayland Invitational Drag Race in Portland, Oregon, in July 2009.", 793 | "manual_rewrite": "How fast did the first funny car go?", 794 | "REW_rewrite": [ 795 | "What was the top speed of the Hennessey Venom GT?", 796 | "What was the speed of the Hennessey Venom GT?", 797 | "What was the top speed of the Hennessey Venom GT?", 798 | "What was the top speed of the Hennessey Venom GT?", 799 | "What was the speed achieved by the Hennessey Venom GT?" 800 | ], 801 | "manual_ndcg3": 0.0, 802 | "REW_ndcg3": 0.0, 803 | "ndcg3: manual_is_better": false, 804 | "ndcg3: REW_is_better": false, 805 | "ndcg3: tie": true, 806 | "human_evaluation": "bad-c" 807 | }, 808 | { 809 | "turn_id": 8, 810 | "question": "No, I meant the funny car. But, that\u2019s interesting. I heard it has a ludicrous mode. How does its quickness compare to the Venom?", 811 | "response": "\u201cOver fifty years ago the American-built P51 Mustang fighter aircraft was flown into history powered by a British Rolls-Royce Merlin engine. Today the Hennessey Venom GT promises to set a new standard of power to weight ratio in the rarified air of today\u2019s supercar market.\u201dWith such a light-weight body and astronomical horsepower, the Hennessey Venom GT can sprint from 0 \u2013 60 in just 2.2 seconds with a top speed of 262 mph on a six-speed manual transmission. Compared to the Bugatti Veyron, which has an acceleration speed from 0 \u2013 60 mph in 2.6 seconds with a top speed of 253 mph, the Hennessey Venom GT may be the \u201cVeyron-killer\u201d. To stop this venomous beast, Brembo brakes with 6-piston calipers are added in front and 4-piston calipers in the rear to give enough clamping power on the massive 15-inch carbon ceramic rotors. Hennessey plans on building 10 units of the Venom GT yearly. 
Four Hennessey Venom GT\u2019s have already been sold at about $600,000 each in this year.", 812 | "manual_rewrite": "No, I meant the funny car. But, that\u2019s interesting. I heard it has ludicrous mode. How does the Tesla Roadster Sport\u2019s quickness compare to the Hennessey Venom GT?", 813 | "REW_rewrite": [ 814 | "No, I meant the funny car. Can you tell me how fast a funny car can go and how it compares to the Hennessey Venom? Also, I heard the Roadster has a ludicrous mode. Can you tell me more about that and how it affects the car's quickness?", 815 | "No, I meant the funny car. How does the acceleration of a funny car compare to the Hennessey Venom GT? Also, I heard the Tesla Roadster has a ludicrous mode. Can you tell me more about it and how it compares to the Venom?", 816 | "No, I meant the funny car. Can you tell me how its quickness compares to the Hennessey Venom GT? Also, I heard the Tesla Roadster has a ludicrous mode. Can you tell me more about it?", 817 | "No, I was asking about the funny car and also heard that it has a ludicrous mode. How does the acceleration of the funny car compare to the Hennessey Venom GT?", 818 | "No, I meant the funny car. Can you tell me how fast the funny car can go and how it compares to the Venom in terms of quickness? Also, I heard the Roadster has a ludicrous mode. Can you tell me more about it?" 819 | ], 820 | "manual_ndcg3": 0.0, 821 | "REW_ndcg3": 0.0, 822 | "ndcg3: manual_is_better": false, 823 | "ndcg3: REW_is_better": false, 824 | "ndcg3: tie": true, 825 | "human_evaluation": "good-a" 826 | }, 827 | { 828 | "turn_id": 9, 829 | "question": "It sounds like the Tesla is very competitive. What makes its engine unique compared with gas ones?", 830 | "response": "\"Tesla Roadster Motor and Other Features Prev NEXTThe Roadster motor. Image \u00a9 2006 Tesla Motors, Inc. All rights reserved. The Roadster has just three gears. Image \u00a9 2006 Tesla Motors, Inc. All rights reserved. The Roaster cockpit. 
Image \u00a9 2006 Tesla Motors, Inc. All rights reserved. The heart of the Tesla Roadster is its 3-phase, 4-pole electric induction motor, which weighs just 70 pounds. Tesla's claims and independent testing show that the Roadster can reach 60 mph in about four seconds and its speed will top out somewhere around 130 mph. But the Roadster's performance isn't just about speed and acceleration. The unique properties of an electric motor give it a huge advantage over a combustion engine in terms of torque, force that tends to rotate or turn things, and power band, the range of operating speeds under which the engine operates efficiently. The Roadster can generate large amounts of torque even at very low RPM, and the motor can always turn out major horsepower. It can reach over 13,000 RPM, something very few large combustion engines can do.\u201d", 831 | "manual_rewrite": "It sounds like the Tesla is very competitive. What makes its electric engine unique compared with gas ones?", 832 | "REW_rewrite": [ 833 | "What makes the Tesla's engine unique compared to traditional gas engines?", 834 | "What makes the Tesla Roadster's engine unique compared to gas engines?", 835 | "What is unique about the Tesla Roadster's engine compared to gas engines?", 836 | "Can you tell me more about the engine of the Tesla Roadster and how it compares to traditional gasoline engines?", 837 | "What makes the engine of Tesla Roadster unique compared to gas engines?" 838 | ], 839 | "manual_ndcg3": 0.0, 840 | "REW_ndcg3": 0.0, 841 | "ndcg3: manual_is_better": false, 842 | "ndcg3: REW_is_better": false, 843 | "ndcg3: tie": true, 844 | "human_evaluation": "good-a" 845 | } 846 | ] 847 | } 848 | ] --------------------------------------------------------------------------------