## Code for Retrieval Augmentation for Commonsense Reasoning

### Introduction to RACo

- This repository contains the official resources for our **EMNLP 2022** paper **"Retrieval Augmentation for Commonsense Reasoning: A Unified Approach"** [\[arXiv\]](https://arxiv.org/abs/2210.12887).

### Step 0: Download the Commonsense Corpus

- Corpus (20M): Google Drive [\[link\]](https://drive.google.com/drive/folders/1oj2POBBy8kyBFNU5nHb05wu2DlcOfGnV?usp=share_link)

- Code: Official DPR code [\[link\]](https://github.com/facebookresearch/DPR)

  - First, run `python merge-corpus.py` to construct the corpus.
  - Then modify the [retrieval corpus path](https://github.com/facebookresearch/DPR/blob/a31212dc0a54dfa85d8bfa01e1669f149ac832b7/conf/ctx_sources/default_sources.yaml#L5) in the DPR code above.

### Step 1 Training: Commonsense Retriever

- Training data: Google Drive [\[link\]](https://drive.google.com/drive/folders/1abY1yMj9ygF7Plb52sDEBsKqn4GlwlBx?usp=share_link)

- Code: Official DPR code, same as above.

  - Modify the [training data path](https://github.com/facebookresearch/DPR/blob/a31212dc0a54dfa85d8bfa01e1669f149ac832b7/conf/datasets/encoder_train_default.yaml#L45) as follows:

```
raco_train:
  _target_: dpr.data.biencoder_data.JsonQADataset
  file: {your folder path}/train.json

raco_dev:
  _target_: dpr.data.biencoder_data.JsonQADataset
  file: {your folder path}/dev.json
```

### Step 1 Inference: Retrieve Documents

- Inference data: Google Drive [\[link\]](https://drive.google.com/drive/folders/1VMpi4hl1VYuaBPhC3gB4PDTVlXdl-6Sn?usp=share_link)

- Code: Official DPR code, same as above (example DPR commands are sketched in the appendix at the end of this README).

  - Modify the [inference data path](https://github.com/facebookresearch/DPR/blob/a31212dc0a54dfa85d8bfa01e1669f149ac832b7/conf/datasets/retriever_default.yaml#L35) as follows:

```
{dataset}_train:
  _target_: dpr.data.retriever_data.CsvQASrc
  file: {your folder path}/{dataset}/train.tsv

{dataset}_dev:
  _target_: dpr.data.retriever_data.CsvQASrc
  file: {your folder path}/{dataset}/dev.tsv

{dataset}_test:
  _target_: dpr.data.retriever_data.CsvQASrc
  file: {your folder path}/{dataset}/test.tsv
```

### Step 2 Training and Inference: Commonsense Reader

- Training data: the retrieval results obtained from the previous step.

- Code: Official FiD code [\[link\]](https://github.com/facebookresearch/FiD) (example commands are also sketched in the appendix below).

### Step 2: FiD Outputs Evaluation

- Accuracy is computed as exact match in the FiD code.

- BLEU and ROUGE are computed with the evaluation scripts from the [CommonGen GitHub repo](https://github.com/INK-USC/CommonGen).

- Commonly seen issues when installing the evaluation library are collected here: [\[link\]](https://docs.google.com/document/d/1BzOxaTx_ekV7UD07IALvhedn0T9uXNpT/edit?usp=sharing&ouid=111155601619122094904&rtpof=true&sd=true)

## Citation

```
@inproceedings{yu2022retrieval,
  title={Retrieval Augmentation for Commonsense Reasoning: A Unified Approach},
  author={Yu, Wenhao and Zhu, Chenguang and Zhang, Zhihan and Wang, Shuohang and Zhang, Zhuosheng and Fang, Yuwei and Jiang, Meng},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022}
}
```

Please kindly cite our paper if you find the paper and the code helpful.
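## Appendix: Example Commands

The steps above only list the config changes, so the sketch below shows, for reference, the corresponding commands as documented in the official DPR README. Everything in curly braces is a placeholder to fill in; the dataset names (`raco_train`, `raco_dev`, `{dataset}_test`, `dpr_wiki`) assume the config entries registered in Steps 0 and 1, so adjust them to your setup.

```
# Step 1 (training): train the bi-encoder retriever on the RACo training data.
# `train=biencoder_local` is one of DPR's stock training configs.
python train_dense_encoder.py \
    train_datasets=[raco_train] \
    dev_datasets=[raco_dev] \
    train=biencoder_local \
    output_dir={your checkpoint folder}

# Step 1 (inference, part 1): encode the commonsense corpus with the trained
# context encoder; `dpr_wiki` is the ctx source whose file path was redirected
# to the merged corpus in Step 0.
python generate_dense_embeddings.py \
    model_file={path to the best biencoder checkpoint} \
    ctx_src=dpr_wiki \
    shard_id=0 num_shards=1 \
    out_file={embedding file prefix}

# Step 1 (inference, part 2): retrieve the top passages for each question.
# Note: `ctx_datatsets` is spelled this way in the DPR codebase.
python dense_retriever.py \
    model_file={path to the best biencoder checkpoint} \
    qa_dataset={dataset}_test \
    ctx_datatsets=[dpr_wiki] \
    encoded_ctx_files=["{glob of the generated embedding files}"] \
    out_file={retrieval output path}
```

For Step 2, the retrieval output first needs to be converted into FiD's input format (a JSON list of examples, each with `question`, `answers`, and a `ctxs` list of `{"title", "text"}` passages); the reader can then be trained and evaluated as in the FiD README. The experiment names `raco_reader` and `raco_eval` below are illustrative:

```
# Step 2 (training): train the FiD reader on the retrieval-augmented data.
python train_reader.py \
    --train_data {your folder path}/train.json \
    --eval_data {your folder path}/dev.json \
    --model_size base \
    --per_gpu_batch_size 1 \
    --n_context {number of retrieved passages fed to the reader} \
    --name raco_reader \
    --checkpoint_dir checkpoint

# Step 2 (inference): generate answers with the trained reader; FiD saves the
# best checkpoint under {checkpoint_dir}/{name}/.
python test_reader.py \
    --model_path {path to the saved best_dev checkpoint} \
    --eval_data {your folder path}/test.json \
    --per_gpu_batch_size 1 \
    --n_context {number of retrieved passages fed to the reader} \
    --name raco_eval \
    --checkpoint_dir checkpoint
```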