├── PARADE_test.txt
├── PARADE_train.txt
├── PARADE_validation.txt
└── README.md

/README.md:
--------------------------------------------------------------------------------
# PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge

Code and dataset for the EMNLP 2020 paper "PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge".

Paper link: https://arxiv.org/pdf/2010.03725.pdf

Author homepage: http://people.tamu.edu/~yunhe/

In addition to the binary-label version discussed in the paper, we also release more fine-grained labels here:

3: 3 expert annotators label this pair as a paraphrase

2: 2 expert annotators label this pair as a paraphrase

1: 1 expert annotator labels this pair as a paraphrase

0: 0 expert annotators label this pair as a paraphrase

We evaluated ALBERT-xxlarge on this four-class version of the dataset and obtained an accuracy of 0.512. This version is more challenging than the binary-label version discussed in the paper.
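
The fine-grained labels count how many expert annotators marked a pair as a paraphrase, so they can be collapsed into binary labels and scored with plain accuracy. The snippet below is a minimal sketch of that idea, not the authors' evaluation code. The helper names `to_binary` and `accuracy` are hypothetical, the default `threshold=2` (a majority of three annotators) is an assumption that this README does not confirm, and the gold and predicted labels are made-up toy values rather than rows from the PARADE files.

```python
from typing import Sequence


def to_binary(label: int, threshold: int = 2) -> int:
    """Collapse a fine-grained label (0-3) into a binary paraphrase label.

    ASSUMPTION: threshold=2 (majority of annotators) is illustrative only;
    the README does not state how the binary labels were derived.
    """
    if label not in (0, 1, 2, 3):
        raise ValueError(f"expected a label in 0..3, got {label!r}")
    return int(label >= threshold)


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy labels, invented for illustration (not taken from the dataset).
gold = [3, 0, 2, 1]
pred = [3, 1, 2, 0]

# Four-class accuracy: two of the four predictions match exactly.
print(accuracy(gold, pred))  # 0.5

# Binary accuracy under the assumed threshold: every prediction falls
# on the same side of the threshold as its gold label.
gold_bin = [to_binary(g) for g in gold]
pred_bin = [to_binary(p) for p in pred]
print(accuracy(gold_bin, pred_bin))  # 1.0
```

In this toy case the same predictions score 0.5 on four classes but 1.0 after binarization, because a near miss such as predicting 1 for a gold 0 counts as an error only in the four-class setting.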