23 | {% for el in result.code %}
24 | {{ el.name }}
25 | {% endfor %}
26 |
27 |
28 | {% endfor %}
29 |
30 |
31 |
--------------------------------------------------------------------------------
/artworks/ready.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/artworks/plan.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/artworks/progress.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/artworks/not-in-plan.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Neural Text Similarity Community
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 |
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 |
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 |
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 |
50 | # Translations
51 | *.mo
52 | *.pot
53 |
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 |
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 |
63 | # Scrapy stuff:
64 | .scrapy
65 |
66 | # Sphinx documentation
67 | docs/_build/
68 |
69 | # PyBuilder
70 | target/
71 |
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 |
75 | # pyenv
76 | .python-version
77 |
78 | # celery beat schedule file
79 | celerybeat-schedule
80 |
81 | # SageMath parsed files
82 | *.sage.py
83 |
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 |
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 |
97 | # Rope project settings
98 | .ropeproject
99 |
100 | # mkdocs documentation
101 | /site
102 |
103 | # mypy
104 | .mypy_cache/
105 |
106 | .DS_Store
107 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome Neural Models for Semantic Match
2 |
8 | A collection of papers maintained by the MatchZoo team.
9 |
10 | Check out our open-source toolkit [MatchZoo](https://github.com/NTMC-Community/MatchZoo) for more information!
11 |
12 |
13 |
14 | Text matching is a core component in many natural language processing tasks, where many tasks can be viewed as matching between two input texts.
15 |
16 |
17 |
18 | match(s, t) = g(f(ψ(s), φ(t)))
19 |
20 | where **s** and **t** are the source and target text inputs, respectively, **ψ** and **φ** are the representation functions for **s** and **t**, **f** is the interaction function, and **g** is the aggregation function. A more detailed explanation of this formulation can be found in [A Deep Look into Neural Ranking Models for Information Retrieval](https://arxiv.org/abs/1903.06902). The representative matching tasks are as follows (a minimal code sketch of the decomposition follows the table):
21 |
22 | | **Tasks** | **Source Text** | **Target Text** |
23 | | :------------------------------------------------------------------------------------------: | :-------------: | :----------------------: |
24 | | [Ad-hoc Information Retrieval](Ad-hoc-Information-Retrieval/Ad-hoc-Information-Retrieval.md) | query | document (title/content) |
25 | | [Community Question Answering](Community-Question-Answering/Community-Question-Answering.md) | question | question/answer |
26 | | [Paraphrase Identification](Paraphrase-Identification/Paraphrase-Identification.md) | string1 | string2 |
27 | | [Natural Language Inference](Natural-Language-Inference/Natural-Language-Inference.md) | premise | hypothesis |
28 | | [Response Retrieval](Response-Retrieval/Response-Retrieval.md) | context/utterances | response |
29 | | [Long Form Question Answering](LFQA/LFQA.md) | question+document | answer |
30 |
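To make the formulation concrete, here is a minimal sketch of the `match(s, t) = g(f(ψ(s), φ(t)))` decomposition in plain Python with NumPy. The random embedding table, cosine interaction, and max-then-mean aggregation are illustrative stand-ins chosen for brevity, not any particular model from this collection:

```python
import numpy as np

# Toy instantiation of match(s, t) = g(f(psi(s), phi(t))).
rng = np.random.default_rng(0)
embeddings = {}  # token -> random vector, grown on demand

def embed(token, dim=16):
    if token not in embeddings:
        embeddings[token] = rng.normal(size=dim)
    return embeddings[token]

def psi(text):  # representation function: one vector per token
    return np.stack([embed(tok) for tok in text.lower().split()])

phi = psi       # the target side reuses the same representation here

def f(S, T):    # interaction function: all-pairs cosine similarity
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    return S @ T.T

def g(M):       # aggregation function: best match per source token, averaged
    return float(M.max(axis=1).mean())

score = g(f(psi("how old is the earth"), phi("the age of the earth")))
print(f"match score: {score:.3f}")
```
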
31 | ### Healthcheck
32 |
33 | ```bash
34 | pip3 install -r requirements.txt
35 | python3 healthcheck.py
36 | ```
37 |
--------------------------------------------------------------------------------
/artworks/awesome.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/Paraphrase-Identification/Paraphrase-Identification.md:
--------------------------------------------------------------------------------
1 | # Paraphrase Identification
2 |
3 | ---
4 |
5 | **Paraphrase Identification** is the task of determining whether two sentences have the same meaning, a problem considered a touchstone of natural language understanding.
6 |
7 | Take an instance from the MRPC dataset as an example; the following is a pair of sentences with the same meaning:
8 |
9 | sentence1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
10 |
11 | sentence2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
12 |
13 |
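As a toy illustration of the task's input/output contract, the sketch below scores the MRPC pair above with a simple token-overlap heuristic; the Jaccard feature and threshold are illustrative stand-ins for a trained neural matcher, not a method from the tables below:

```python
# Paraphrase identification as a binary decision over a sentence pair.
# The Jaccard-overlap feature and the 0.5 threshold are illustrative only.

def tokens(sentence: str) -> set:
    """Lowercase, strip punctuation, and return the set of tokens."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in sentence)
    return set(cleaned.lower().split())

def is_paraphrase(s1: str, s2: str, threshold: float = 0.5) -> bool:
    a, b = tokens(s1), tokens(s2)
    jaccard = len(a & b) / len(a | b)
    return jaccard >= threshold

s1 = 'Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.'
s2 = 'Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.'
print(is_paraphrase(s1, s2))  # True: the pair shares most of its tokens
```
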
14 | Some benchmark datasets are listed below; a short example of loading one of them with MatchZoo-py follows the list.
15 |
16 | ## Classic Datasets
17 |
18 |
67 |
68 | - [**TRECQA**](https://trec.nist.gov/data/qa.html) was created by [Wang et al.](https://www.aclweb.org/anthology/D07-1003) from TREC QA track 8-13 data, with candidate answers automatically selected from each question’s document pool using a combination of overlapping non-stopword counts and pattern matching. This dataset is one of the most widely used benchmarks for answer sentence selection.
69 |
70 | - [**WikiQA**](https://www.microsoft.com/en-us/download/details.aspx?id=52419) is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering by Microsoft Research.
71 |
72 | - [**InsuranceQA**](https://github.com/shuzi/insuranceQA) is a non-factoid QA dataset from the insurance domain. Questions may have multiple correct answers, and the questions are normally much shorter than the answers: the average lengths of questions and answers are 7 and 95 tokens, respectively. For each question in the development and test sets, there is a set of 500 candidate answers.
73 |
74 | - [**FiQA**](https://sites.google.com/view/fiqa) is a non-factoid QA dataset from the financial domain, released for the WWW 2018 challenges. The dataset was built by crawling StackExchange, Reddit, and StockTwits; part of the questions are opinionated, targeting mined opinions and their respective entities, aspects, sentiment polarity, and opinion holders.
75 |
76 | - [**Yahoo! Answers**](https://webscope.sandbox.yahoo.com) is a website where people post questions and answers, all of which are public to any web user willing to browse or download them. The collected data is the Yahoo! Answers corpus as of 10/25/2007. It is a benchmark dataset for community-based question answering, and its answers are relatively longer than those in TrecQA and WikiQA.
77 |
78 | - [**SemEval-2015 Task 3**](http://alt.qcri.org/semeval2015/task3/) consists of two sub-tasks. In Subtask A, given a question (short title + extended description) and several community answers, classify each answer as definitely relevant (good), potentially useful (potential), or bad/irrelevant (bad, dialogue, non-English, other). In Subtask B, given a YES/NO question (short title + extended description) and a list of community answers, decide whether the global answer to the question should be yes, no, or unsure.
79 |
80 | - [**SemEval-2016 Task 3**](http://alt.qcri.org/semeval2016/task3/) consists of two sub-tasks, namely _Question-Comment Similarity_ and _Question-Question Similarity_. In the _Question-Comment Similarity_ task, given a question from a question-comment thread, rank the comments according to their relevance with respect to the question. In the _Question-Question Similarity_ task, given a new question, rerank all similar questions retrieved by a search engine.
81 |
82 | - [**SemEval-2017 Task 3**](http://alt.qcri.org/semeval2017/task3/) contains two sub-tasks, namely _Question Similarity_ and _Relevance Classification_. Given a new question and a set of related questions from the collection, the _Question Similarity_ task is to rank the similar questions according to their similarity to the original question. The _Relevance Classification_ task is to rank the answer posts in a question-answer thread according to their relevance with respect to the question.
83 |
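Several of these datasets ship with MatchZoo-py. As one example, the sketch below loads WikiQA; it assumes matchzoo-py is installed and that its bundled `wiki_qa` loader keeps the `stage`/`task` signature shown here:

```python
import matchzoo as mz

# Download (on first use) and load the WikiQA splits as MatchZoo DataPacks.
train_pack = mz.datasets.wiki_qa.load_data(stage='train', task='ranking')
dev_pack = mz.datasets.wiki_qa.load_data(stage='dev', task='ranking')
test_pack = mz.datasets.wiki_qa.load_data(stage='test', task='ranking')

# Each DataPack holds question-answer pairs with binary relevance labels.
print(train_pack.frame().head())
```
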
84 | ## Performance
85 |
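The tables in this section report mean average precision (MAP) and mean reciprocal rank (MRR). For reference, here is a minimal implementation of both metrics over per-query binary relevance labels in ranked order (the two ranked queries at the bottom are hypothetical):

```python
def average_precision(labels):
    """AP for one query; `labels` are binary relevances in ranked order."""
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def reciprocal_rank(labels):
    """RR for one query: inverse rank of the first relevant candidate."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1 / rank
    return 0.0

# Two hypothetical queries, candidates listed in model-ranked order.
ranked = [[1, 0, 1, 0], [0, 1, 0, 0]]
print("MAP:", sum(average_precision(q) for q in ranked) / len(ranked))
print("MRR:", sum(reciprocal_rank(q) for q in ranked) / len(ranked))
```
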
86 | ### TREC QA (Raw Version)
87 |
88 | | Model | Code | MAP | MRR | Paper |
89 | | :---------------------------------- | :----------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------- |
90 | | Punyakanok (2004) | — | 0.419 | 0.494 | [Mapping dependencies trees: An application to question answering, ISAIM 2004](http://cogcomp.cs.illinois.edu/papers/PunyakanokRoYi04a.pdf) |
91 | | Cui (2005) | — | 0.427 | 0.526 | [Question Answering Passage Retrieval Using Dependency Relations, SIGIR 2005](https://www.comp.nus.edu.sg/~kanmy/papers/f66-cui.pdf) |
92 | | Wang (2007) | — | 0.603 | 0.685 | [What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, EMNLP 2007](http://www.aclweb.org/anthology/D/D07/D07-1003.pdf) |
93 | | H&S (2010) | — | 0.609 | 0.692 | [Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions, NAACL 2010](http://www.aclweb.org/anthology/N10-1145) |
94 | | W&M (2010) | — | 0.595 | 0.695 | [Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering, COLING 2010](http://aclweb.org/anthology//C/C10/C10-1131.pdf) |
95 | | Yao (2013) | — | 0.631 | 0.748 | [Answer Extraction as Sequence Tagging with Tree Edit Distance, NAACL 2013](http://www.aclweb.org/anthology/N13-1106.pdf) |
96 | | S&M (2013) | — | 0.678 | 0.736 | [Automatic Feature Engineering for Answer Selection and Extraction, EMNLP 2013](http://www.aclweb.org/anthology/D13-1044.pdf) |
97 | | Backward (Shnarch et al., 2013) | — | 0.686 | 0.754 | [Probabilistic Models for Lexical Inference, Ph.D. thesis 2013](http://u.cs.biu.ac.il/~nlp/wp-content/uploads/eyal-thesis-library-ready.pdf) |
98 | | LCLR (Yih et al., 2013) | — | 0.709 | 0.770 | [Question Answering Using Enhanced Lexical Semantic Models, ACL 2013](http://research.microsoft.com/pubs/192357/QA-SentSel-Updated-PostACL.pdf) |
99 | | bigram+count (Yu et al., 2014) | — | 0.711 | 0.785 | [Deep Learning for Answer Sentence Selection, NIPS 2014](http://arxiv.org/pdf/1412.1632v1.pdf) |
100 | | BLSTM (W&N et al., 2015) | — | 0.713 | 0.791 | [A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering, ACL 2015](http://www.aclweb.org/anthology/P15-2116) |
101 | | Architecture-II (Feng et al., 2015) | — | 0.711 | 0.800 | [Applying deep learning to answer selection: A study and an open task, ASRU 2015](http://arxiv.org/abs/1508.01585) |
102 | | PairCNN (Severyn et al., 2015) | [](https://github.com/zhangzibin/PairCNN-Ranking) | 0.746 | 0.808 | [Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks, SIGIR 2015](http://disi.unitn.eu/moschitti/since2013/2015_SIGIR_Severyn_LearningRankShort.pdf) |
103 | | aNMM (Yang et al., 2016) | [](https://github.com/yangliuy/aNMM-CIKM16) [](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/anmm.py) | 0.750 | 0.811 | [aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model, CIKM 2016](http://maroo.cs.umass.edu/pub/web/getpdf.php?id=1240) |
104 | | HDLA (Tay et al., 2017) | [](https://github.com/vanzytay/YahooQA_Splits) | 0.750 | 0.815 | [Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture, SIGIR 2017](https://arxiv.org/abs/1707.06372) |
105 | | PWIM (He et al., 2016) | [](https://github.com/castorini/VDPWI-NN-Torch) | 0.758 | 0.822 | [Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement, NAACL 2016](https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf) |
106 | | MP-CNN (He et al., 2015) | [](https://github.com/castorini/MP-CNN-Torch) | 0.762 | 0.830 | [Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015](http://aclweb.org/anthology/D/D15/D15-1181.pdf) |
107 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.770 | 0.825 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
108 | | MP-CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.780 | 0.834 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
109 | | HCAN (Rao et al., 2019) | — | 0.774 | 0.843 | [Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling, EMNLP 2019](https://jinfengr.github.io/publications/Rao_etal_EMNLP2019.pdf) |
110 | | MP-CNN (Tayyar et al., 2018) | — | 0.836 | 0.863 | [Integrating Question Classification and Deep Learning for improved Answer Selection, COLING 2018](https://aclanthology.coli.uni-saarland.de/papers/C18-1278/c18-1278) |
111 | | Pre-Attention (Kamath et al., 2019) | — | 0.852 | 0.891 | [Predicting and Integrating Expected Answer Types into a Simple Recurrent Neural Network Model for Answer Sentence Selection, CICLING 2019](https://hal.archives-ouvertes.fr/hal-02104488/) |
112 | | CETE (Laskar et al., 2020) | — | **0.950** | **0.980** | [Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.676.pdf) |
113 |
114 | ### TREC QA (Clean Version)
115 |
116 | | Model | Code | MAP | MRR | Paper |
117 | | :------------------------------- | :----------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------- |
118 | | W&I (2015) | — | 0.746 | 0.820 | [FAQ-based Question Answering via Word Alignment, arXiv 2015](http://arxiv.org/abs/1507.02628) |
119 | | LSTM (Tan et al., 2015) | [](https://github.com/Alan-Lee123/answer-selection) | 0.728 | 0.832 | [LSTM-Based Deep Learning Models for Nonfactoid Answer Selection, arXiv 2015](http://arxiv.org/abs/1511.04108) |
120 | | AP-CNN (dos Santos et al. 2016) | — | 0.753 | 0.851 | [Attentive Pooling Networks, arXiv 2016](http://arxiv.org/abs/1602.03609) |
121 | | L.D.C Model (Wang et al., 2016) | — | 0.771 | 0.845 | [Sentence Similarity Learning by Lexical Decomposition and Composition, COLING 2016](http://arxiv.org/pdf/1602.07019v1.pdf) |
122 | | MP-CNN (He et al., 2015) | [](https://github.com/castorini/MP-CNN-Torch) | 0.777 | 0.836 | [Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015](http://aclweb.org/anthology/D/D15/D15-1181.pdf) |
123 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.784 | 0.865 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
124 | | MP-CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.801 | 0.877 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
125 | | BiMPM (Wang et al., 2017) | [](https://github.com/zhiguowang/BiMPM) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/bimpm.py) | 0.802 | 0.875 | [Bilateral Multi-Perspective Matching for Natural Language Sentences, arXiv 2017](https://arxiv.org/pdf/1702.03814.pdf) |
126 | | CA (Bian et al., 2017) | [](https://github.com/wjbianjason/Dynamic-Clip-Attention) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/dynamic_clip.py) | 0.821 | 0.899 | [A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017](https://dl.acm.org/citation.cfm?id=3133089&CFID=791659397&CFTOKEN=43388059) |
127 | | IWAN (Shen et al., 2017) | — | 0.822 | 0.889 | [Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017](https://aclanthology.info/pdf/D/D17/D17-1122.pdf) |
128 | | sCARNN (Tran et al., 2018) | — | 0.829 | 0.875 | [The Context-dependent Additive Recurrent Neural Net, NAACL 2018](http://www.aclweb.org/anthology/N18-1115) |
129 | | MCAN (Tay et al., 2018) | — | 0.838 | 0.904 | [Multi-Cast Attention Networks, KDD 2018](https://arxiv.org/abs/1806.00778) |
130 | | MP-CNN (Tayyar et al., 2018) | — | 0.865 | 0.904 | [Integrating Question Classification and Deep Learning for improved Answer Selection, COLING 2018](https://aclanthology.coli.uni-saarland.de/papers/C18-1278/c18-1278) |
131 | | CA + LM + LC (Yoon et al., 2019) | — | 0.868 | 0.928 | [A Compare-Aggregate Model with Latent Clustering for Answer Selection, CIKM 2019](https://arxiv.org/abs/1905.12897) |
132 | | GSAMN (Lai et al., 2019) | [](https://github.com/laituan245/StackExchangeQA) | 0.914 | 0.957 | [A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019](https://arxiv.org/pdf/1909.09696.pdf) |
133 | | TANDA (Garg et al., 2019) | [](https://github.com/alexa/wqa_tanda) | **0.943** | 0.974 | [TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020](https://arxiv.org/abs/1911.04118) |
134 | | CETE (Laskar et al., 2020) | — | 0.936 | **0.978** | [Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.676.pdf) |
135 |
136 | ### WikiQA
137 |
138 | | Model | Code | MAP | MRR | Paper |
139 | | ---------------------------------------- | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | --------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
140 | | ABCNN (Yin et al., 2016) | [](https://github.com/galsang/ABCNN) | 0.6921 | 0.7108 | [ABCNN: Attention-based convolutional neural network for modeling sentence pairs, TACL 2016](https://doi.org/10.1162/tacl_a_00097) |
141 | | Multi-Perspective CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.701 | 0.718 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
142 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.705 | 0.720 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
143 | | KVMN (Miller et al., 2016) | [](https://github.com/siyuanzhao/key-value-memory-networks) | 0.7069 | 0.7265 | [Key-Value Memory Networks for Directly Reading Documents, EMNLP 2016](https://doi.org/10.18653/v1/D16-1147) |
144 | | BiMPM (Wang et al., 2017) | [](https://github.com/zhiguowang/BiMPM) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/bimpm.py) | 0.718 | 0.731 | [Bilateral Multi-Perspective Matching for Natural Language Sentences, IJCAI 2017](https://arxiv.org/pdf/1702.03814.pdf) |
145 | | IWAN (Shen et al., 2017) | — | 0.733 | 0.750 | [Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017](https://aclanthology.info/pdf/D/D17/D17-1122.pdf) |
146 | | CA (Wang and Jiang, 2017) | [](https://github.com/pcgreat/SeqMatchSeq) | 0.7433 | 0.7545 | [A Compare-Aggregate Model for Matching Text Sequences, ICLR 2017](https://arxiv.org/abs/1611.01747) |
147 | | HCRN (Tay et al., 2018c) | [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/hcrn.py) | 0.7430 | 0.7560 | [Hermitian co-attention networks for text matching in asymmetrical domains, IJCAI 2018](https://www.ijcai.org/proceedings/2018/615) |
148 | | Compare-Aggregate (Bian et al., 2017) | [](https://github.com/wjbianjason/Dynamic-Clip-Attention) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/dynamic_clip.py) | 0.748 | 0.758 | [A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017](https://dl.acm.org/citation.cfm?id=3133089&CFID=791659397&CFTOKEN=43388059) |
149 | | RE2 (Yang et al., 2019) | [](https://github.com/alibaba-edu/simple-effective-text-matching) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/re2.py) | 0.7452 | 0.7618 | [Simple and Effective Text Matching with Richer Alignment Features, ACL 2019](https://www.aclweb.org/anthology/P19-1465.pdf) |
150 | | GSAMN (Lai et al., 2019) | [](https://github.com/laituan245/StackExchangeQA) | 0.857 | 0.872 | [A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019](https://arxiv.org/pdf/1909.09696.pdf) |
151 | | TANDA (Garg et al., 2019) | [](https://github.com/alexa/wqa_tanda) | **0.920** | **0.933** | [TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020](https://arxiv.org/abs/1911.04118) |
152 |
--------------------------------------------------------------------------------