## Code for Retrieval Augmentation for Commonsense Reasoning

### Introduction to RACo

- This repository contains the official resources for our **EMNLP 2022** paper **"Retrieval Augmentation for Commonsense Reasoning: A Unified Approach"** [\[arXiv\]](https://arxiv.org/abs/2210.12887).

### Step 0: Download the Commonsense Corpus

- Corpus (20M): Google Drive [\[link\]](https://drive.google.com/drive/folders/1oj2POBBy8kyBFNU5nHb05wu2DlcOfGnV?usp=share_link)

- Code: Official DPR code [\[link\]](https://github.com/facebookresearch/DPR)

  - First, run `python merge-corpus.py` to construct the corpus.
  - Then modify the [retrieval corpus path](https://github.com/facebookresearch/DPR/blob/a31212dc0a54dfa85d8bfa01e1669f149ac832b7/conf/ctx_sources/default_sources.yaml#L5) in the DPR code above.

### Step 1 Training: Commonsense Retriever

- Training data: Google Drive [\[link\]](https://drive.google.com/drive/folders/1abY1yMj9ygF7Plb52sDEBsKqn4GlwlBx?usp=share_link)

- Code: Official DPR code, same as above.

  - Modify the [training data path](https://github.com/facebookresearch/DPR/blob/a31212dc0a54dfa85d8bfa01e1669f149ac832b7/conf/datasets/encoder_train_default.yaml#L45) as follows:

```
raco_train:
  _target_: dpr.data.biencoder_data.JsonQADataset
  file: {your folder path}/train.json

raco_dev:
  _target_: dpr.data.biencoder_data.JsonQADataset
  file: {your folder path}/dev.json
```

### Step 1 Inference: Retrieve Documents

- Inference data: Google Drive [\[link\]](https://drive.google.com/drive/folders/1VMpi4hl1VYuaBPhC3gB4PDTVlXdl-6Sn?usp=share_link)

- Code: Official DPR code, same as above (example DPR commands are sketched in the appendix at the end of this README).

  - Modify the [inference data path](https://github.com/facebookresearch/DPR/blob/a31212dc0a54dfa85d8bfa01e1669f149ac832b7/conf/datasets/retriever_default.yaml#L35) as follows:

```
{dataset}_train:
  _target_: dpr.data.retriever_data.CsvQASrc
  file: {your folder path}/{dataset}/train.tsv

{dataset}_dev:
  _target_: dpr.data.retriever_data.CsvQASrc
  file: {your folder path}/{dataset}/dev.tsv

{dataset}_test:
  _target_: dpr.data.retriever_data.CsvQASrc
  file: {your folder path}/{dataset}/test.tsv
```

### Step 2 Training and Inference: Commonsense Reader

- Training data: the retrieval results obtained from the previous step.

- Code: Official FiD code [\[link\]](https://github.com/facebookresearch/FiD) (example commands are also sketched in the appendix below).

### Step 2: FiD Outputs Evaluation

- Accuracy is computed as exact match in the FiD code.

- BLEU and ROUGE are computed with the evaluation scripts from the [CommonGen GitHub repo](https://github.com/INK-USC/CommonGen).

- Commonly seen issues when installing the evaluation library are collected here: [\[link\]](https://docs.google.com/document/d/1BzOxaTx_ekV7UD07IALvhedn0T9uXNpT/edit?usp=sharing&ouid=111155601619122094904&rtpof=true&sd=true)

## Citation

```
@inproceedings{yu2022retrieval,
  title={Retrieval Augmentation for Commonsense Reasoning: A Unified Approach},
  author={Yu, Wenhao and Zhu, Chenguang and Zhang, Zhihan and Wang, Shuohang and Zhang, Zhuosheng and Fang, Yuwei and Jiang, Meng},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022}
}
```

Please kindly cite our paper if you find the paper and the code helpful.
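## Appendix: Example Commands

The steps above only list the config changes, so the sketch below shows, for reference, the corresponding commands as documented in the official DPR README. Everything in curly braces is a placeholder to fill in; the dataset names (`raco_train`, `raco_dev`, `{dataset}_test`, `dpr_wiki`) assume the config entries registered in Steps 0 and 1, so adjust them to your setup.

```
# Step 1 (training): train the bi-encoder retriever on the RACo training data.
# `train=biencoder_local` is one of DPR's stock training configs.
python train_dense_encoder.py \
    train_datasets=[raco_train] \
    dev_datasets=[raco_dev] \
    train=biencoder_local \
    output_dir={your checkpoint folder}

# Step 1 (inference, part 1): encode the commonsense corpus with the trained
# context encoder; `dpr_wiki` is the ctx source whose file path was redirected
# to the merged corpus in Step 0.
python generate_dense_embeddings.py \
    model_file={path to the best biencoder checkpoint} \
    ctx_src=dpr_wiki \
    shard_id=0 num_shards=1 \
    out_file={embedding file prefix}

# Step 1 (inference, part 2): retrieve the top passages for each question.
# Note: `ctx_datatsets` is spelled this way in the DPR codebase.
python dense_retriever.py \
    model_file={path to the best biencoder checkpoint} \
    qa_dataset={dataset}_test \
    ctx_datatsets=[dpr_wiki] \
    encoded_ctx_files=["{glob of the generated embedding files}"] \
    out_file={retrieval output path}
```

For Step 2, the retrieval output first needs to be converted into FiD's input format (a JSON list of examples, each with `question`, `answers`, and a `ctxs` list of `{"title", "text"}` passages); the reader can then be trained and evaluated as in the FiD README. The experiment names `raco_reader` and `raco_eval` below are illustrative:

```
# Step 2 (training): train the FiD reader on the retrieval-augmented data.
python train_reader.py \
    --train_data {your folder path}/train.json \
    --eval_data {your folder path}/dev.json \
    --model_size base \
    --per_gpu_batch_size 1 \
    --n_context {number of retrieved passages fed to the reader} \
    --name raco_reader \
    --checkpoint_dir checkpoint

# Step 2 (inference): generate answers with the trained reader; FiD saves the
# best checkpoint under {checkpoint_dir}/{name}/.
python test_reader.py \
    --model_path {path to the saved best_dev checkpoint} \
    --eval_data {your folder path}/test.json \
    --per_gpu_batch_size 1 \
    --n_context {number of retrieved passages fed to the reader} \
    --name raco_eval \
    --checkpoint_dir checkpoint
```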