├── PARADE_test.txt
├── PARADE_train.txt
├── PARADE_validation.txt
└── README.md


/README.md:
--------------------------------------------------------------------------------
 1 | # PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge
 2 | Code and dataset for the EMNLP 2020 paper "PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge".
 3 | 
 4 | Paper link: https://arxiv.org/pdf/2010.03725.pdf
 5 | 
 6 | Author homepage: http://people.tamu.edu/~yunhe/
 7 | 
 8 | In addition to the binary-label version discussed in the paper, we also release more fine-grained labels here:
 9 | 
10 | 3: 3 expert annotators labeled this pair as a paraphrase
11 | 
12 | 2: 2 expert annotators labeled this pair as a paraphrase
13 | 
14 | 1: 1 expert annotator labeled this pair as a paraphrase
15 | 
16 | 0: 0 expert annotators labeled this pair as a paraphrase
17 | 
18 | We also evaluated ALBERT-xxlarge on this four-class version of the dataset and obtained an accuracy of 0.512, so this version is more challenging than the binary-label version discussed in the paper.
19 | 
20 | 
21 | 
22 | 
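The fine-grained labels above count how many of the three expert annotators marked a pair as a paraphrase. A minimal sketch of how such labels could be collapsed back to two classes is shown below. Note the assumptions: the majority-vote threshold of 2, the helper names `to_binary` and `label_distribution`, and the example label list are all hypothetical and are not taken from the paper or from the released files, whose exact format is not described here.

```python
from collections import Counter

# Assumption: a pair counts as a paraphrase when at least 2 of the 3
# annotators said so (majority vote). This README does not state the
# rule actually used to derive the binary labels.
MAJORITY_THRESHOLD = 2


def to_binary(fine_label: int, threshold: int = MAJORITY_THRESHOLD) -> int:
    """Collapse a 0-3 annotator-count label into a 0/1 label."""
    if fine_label not in (0, 1, 2, 3):
        raise ValueError(f"expected a label in 0..3, got {fine_label!r}")
    return 1 if fine_label >= threshold else 0


def label_distribution(labels):
    """Count how often each fine-grained label occurs."""
    return Counter(labels)


# Made-up labels for illustration only; not read from the dataset files.
example_labels = [3, 0, 2, 1, 3, 0]
binary_labels = [to_binary(label) for label in example_labels]
distribution = label_distribution(example_labels)

print(binary_labels)
print(dict(distribution))
```

Passing a different `threshold` (for example 3, to require unanimous agreement) gives a stricter binary labeling from the same fine-grained labels.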


--------------------------------------------------------------------------------