67 |
68 | - [**TRECQA**](https://trec.nist.gov/data/qa.html) was created by [Wang et al.](https://www.aclweb.org/anthology/D07-1003) from TREC QA track 8-13 data, with candidate answers automatically selected from each question’s document pool using a combination of overlapping non-stopword counts and pattern matching. This dataset is one of the most widely used benchmarks for answer sentence selection.
69 |
70 | - [**WikiQA**](https://www.microsoft.com/en-us/download/details.aspx?id=52419) is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering by Microsoft Research.
71 |
72 | - [**InsuranceQA**](https://github.com/shuzi/insuranceQA) is a non-factoid QA dataset from the insurance domain. Questions may have multiple correct answers, and the questions are normally much shorter than the answers: the average lengths of questions and answers are 7 and 95 tokens, respectively. For each question in the development and test sets, there is a set of 500 candidate answers.
73 |
74 | - [**FiQA**](https://sites.google.com/view/fiqa) is a non-factoid QA dataset from the financial domain, released for the WWW 2018 challenges. The dataset was built by crawling StackExchange, Reddit, and StockTwits; some of the questions are opinionated, targeting mined opinions and their respective entities, aspects, sentiment polarity, and opinion holders.
75 |
76 | - [**Yahoo! Answers**](https://webscope.sandbox.yahoo.com) is a website where people post questions and answers, all of which are public to any web user willing to browse or download them. The dataset is the Yahoo! Answers corpus as of 10/25/2007 and is a benchmark for community-based question answering. Its answers are relatively longer than those in TrecQA and WikiQA.
77 |
78 | - [**SemEval-2015 Task 3**](http://alt.qcri.org/semeval2015/task3/) consists of two sub-tasks. In Subtask A, given a question (short title + extended description) and several community answers, classify each answer as definitely relevant (good), potentially useful (potential), or bad/irrelevant (bad, dialogue, non-English, other). In Subtask B, given a YES/NO question (short title + extended description) and a list of community answers, decide whether the global answer to the question should be yes, no, or unsure.
79 |
80 | - [**SemEval-2016 Task 3**](http://alt.qcri.org/semeval2016/task3/) consists of two sub-tasks, namely _Question-Comment Similarity_ and _Question-Question Similarity_. In the _Question-Comment Similarity_ task, given a question from a question-comment thread, rank the comments according to their relevance to the question. In the _Question-Question Similarity_ task, given a new question, re-rank all similar questions retrieved by a search engine.
81 |
82 | - [**SemEval-2017 Task 3**](http://alt.qcri.org/semeval2017/task3/) contains two sub-tasks, namely _Question Similarity_ and _Relevance Classification_. Given a new question and a set of related questions from the collection, the _Question Similarity_ task is to rank the related questions according to their similarity to the original question, while the _Relevance Classification_ task is to rank the answer posts in a question-answer thread according to their relevance to the question.
83 |
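TRECQA's candidate pre-selection by overlapping non-stopword counts can be sketched as follows. This is a simplified illustration, not the exact pipeline of Wang et al.; the tokenizer and stopword list here are assumptions.

```python
# Rank candidate sentences from a question's document pool by the number of
# overlapping non-stopwords. Stopword list and whitespace tokenization are
# simplified stand-ins for the original pipeline.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "was",
             "what", "who", "when", "where", "how"}

def content_words(text: str) -> set:
    """Lowercased tokens with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def overlap_score(question: str, candidate: str) -> int:
    """Count of non-stopwords shared between question and candidate."""
    return len(content_words(question) & content_words(candidate))

def rank_candidates(question: str, pool: list) -> list:
    """Sort the document pool by descending overlap with the question."""
    return sorted(pool, key=lambda s: overlap_score(question, s), reverse=True)
```

In the actual dataset construction this heuristic is combined with answer-pattern matching; the overlap count alone is only the first filter.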
84 | ## Performance
85 |
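The tables below report Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR), computed over the per-question ranked candidate lists. A minimal sketch of both metrics:

```python
def average_precision(labels):
    """labels: 0/1 relevance of candidates in ranked order for one question."""
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(labels):
    """1/rank of the first relevant candidate, 0 if none is relevant."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def map_mrr(ranked_lists):
    """Average both metrics over all questions."""
    n = len(ranked_lists)
    return (sum(average_precision(l) for l in ranked_lists) / n,
            sum(reciprocal_rank(l) for l in ranked_lists) / n)
```

For example, two questions with ranked relevance lists `[1, 0, 1, 0]` and `[0, 1]` yield MAP = 2/3 and MRR = 0.75.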
86 | ### TREC QA (Raw Version)
87 |
88 | | Model | Code | MAP | MRR | Paper |
89 | | :---------------------------------- | :----------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------- |
90 | | Punyakanok (2004) | — | 0.419 | 0.494 | [Mapping dependencies trees: An application to question answering, ISAIM 2004](http://cogcomp.cs.illinois.edu/papers/PunyakanokRoYi04a.pdf) |
91 | | Cui (2005) | — | 0.427 | 0.526 | [Question Answering Passage Retrieval Using Dependency Relations, SIGIR 2005](https://www.comp.nus.edu.sg/~kanmy/papers/f66-cui.pdf) |
92 | | Wang (2007) | — | 0.603 | 0.685 | [What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, EMNLP 2007](http://www.aclweb.org/anthology/D/D07/D07-1003.pdf) |
93 | | H&S (2010) | — | 0.609 | 0.692 | [Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions, NAACL 2010](http://www.aclweb.org/anthology/N10-1145) |
94 | | W&M (2010)                          |                              —                               |   0.595   |   0.695   | [Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering, COLING 2010](http://aclweb.org/anthology//C/C10/C10-1131.pdf) |
95 | | Yao (2013) | — | 0.631 | 0.748 | [Answer Extraction as Sequence Tagging with Tree Edit Distance, NAACL 2013](http://www.aclweb.org/anthology/N13-1106.pdf) |
96 | | S&M (2013) | — | 0.678 | 0.736 | [Automatic Feature Engineering for Answer Selection and Extraction, EMNLP 2013](http://www.aclweb.org/anthology/D13-1044.pdf) |
97 | | Backward (Shnarch et al., 2013) | — | 0.686 | 0.754 | [Probabilistic Models for Lexical Inference, Ph.D. thesis 2013](http://u.cs.biu.ac.il/~nlp/wp-content/uploads/eyal-thesis-library-ready.pdf) |
98 | | LCLR (Yih et al., 2013) | — | 0.709 | 0.770 | [Question Answering Using Enhanced Lexical Semantic Models, ACL 2013](http://research.microsoft.com/pubs/192357/QA-SentSel-Updated-PostACL.pdf) |
99 | | bigram+count (Yu et al., 2014) | — | 0.711 | 0.785 | [Deep Learning for Answer Sentence Selection, NIPS 2014](http://arxiv.org/pdf/1412.1632v1.pdf) |
100 | | BLSTM (W&N et al., 2015) | — | 0.713 | 0.791 | [A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering, ACL 2015](http://www.aclweb.org/anthology/P15-2116) |
101 | | Architecture-II (Feng et al., 2015) | — | 0.711 | 0.800 | [Applying deep learning to answer selection: A study and an open task, ASRU 2015](http://arxiv.org/abs/1508.01585) |
102 | | PairCNN (Severyn et al., 2015) | [](https://github.com/zhangzibin/PairCNN-Ranking) | 0.746 | 0.808 | [Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks, SIGIR 2015](http://disi.unitn.eu/moschitti/since2013/2015_SIGIR_Severyn_LearningRankShort.pdf) |
103 | | aNMM (Yang et al., 2016) | [](https://github.com/yangliuy/aNMM-CIKM16) [](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/anmm.py) | 0.750 | 0.811 | [aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model, CIKM 2016](http://maroo.cs.umass.edu/pub/web/getpdf.php?id=1240) |
104 | | HDLA (Tay et al., 2017) | [](https://github.com/vanzytay/YahooQA_Splits) | 0.750 | 0.815 | [Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture, SIGIR 2017](https://arxiv.org/abs/1707.06372) |
105 | | PWIM (He et al., 2016)              |       [](https://github.com/castorini/VDPWI-NN-Torch)       |   0.758   |   0.822   | [Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement, NAACL 2016](https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf) |
106 | | MP-CNN (He et al., 2015)            |        [](https://github.com/castorini/MP-CNN-Torch)        |   0.762   |   0.830   | [Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015](http://aclweb.org/anthology/D/D15/D15-1181.pdf) |
107 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.770 | 0.825 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
108 | | MP-CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.780 | 0.834 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
109 | | HCAN (Rao et al., 2019) | — | 0.774 | 0.843 | [Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling, EMNLP 2019](https://jinfengr.github.io/publications/Rao_etal_EMNLP2019.pdf) |
110 | | MP-CNN (Tayyar et al., 2018) | — | 0.836 | 0.863 | [Integrating Question Classification and Deep Learning for improved Answer Selection, COLING 2018](https://aclanthology.coli.uni-saarland.de/papers/C18-1278/c18-1278) |
111 | | Pre-Attention (Kamath et al., 2019) | — | 0.852 | 0.891 | [Predicting and Integrating Expected Answer Types into a Simple Recurrent Neural Network Model for Answer Sentence Selection, CICLING 2019](https://hal.archives-ouvertes.fr/hal-02104488/) |
112 | | CETE (Laskar et al., 2020)          |                              —                               | **0.950** | **0.980** | [Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.676.pdf) |
113 |
114 | ### TREC QA (Clean Version)
115 |
116 | | Model | Code | MAP | MRR | Paper |
117 | | :------------------------------- | :----------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------- |
118 | | W&I (2015) | — | 0.746 | 0.820 | [FAQ-based Question Answering via Word Alignment, arXiv 2015](http://arxiv.org/abs/1507.02628) |
119 | | LSTM (Tan et al., 2015) | [](https://github.com/Alan-Lee123/answer-selection) | 0.728 | 0.832 | [LSTM-Based Deep Learning Models for Nonfactoid Answer Selection, arXiv 2015](http://arxiv.org/abs/1511.04108) |
120 | | AP-CNN (dos Santos et al., 2016) |                              —                               |   0.753   |   0.851   | [Attentive Pooling Networks, arXiv 2016](http://arxiv.org/abs/1602.03609) |
121 | | L.D.C Model (Wang et al., 2016) | — | 0.771 | 0.845 | [Sentence Similarity Learning by Lexical Decomposition and Composition, COLING 2016](http://arxiv.org/pdf/1602.07019v1.pdf) |
122 | | MP-CNN (He et al., 2015)         |        [](https://github.com/castorini/MP-CNN-Torch)        |   0.777   |   0.836   | [Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015](http://aclweb.org/anthology/D/D15/D15-1181.pdf) |
123 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.784 | 0.865 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
124 | | MP-CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.801 | 0.877 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
125 | | BiMPM (Wang et al., 2017)        | [](https://github.com/zhiguowang/BiMPM) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/bimpm.py) |   0.802   |   0.875   | [Bilateral Multi-Perspective Matching for Natural Language Sentences, IJCAI 2017](https://arxiv.org/pdf/1702.03814.pdf) |
126 | | CA (Bian et al., 2017) | [](https://github.com/wjbianjason/Dynamic-Clip-Attention) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/dynamic_clip.py) | 0.821 | 0.899 | [A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017](https://dl.acm.org/citation.cfm?id=3133089&CFID=791659397&CFTOKEN=43388059) |
127 | | IWAN (Shen et al., 2017) | — | 0.822 | 0.889 | [Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017](https://aclanthology.info/pdf/D/D17/D17-1122.pdf) |
128 | | sCARNN (Tran et al., 2018) | — | 0.829 | 0.875 | [The Context-dependent Additive Recurrent Neural Net, NAACL 2018](http://www.aclweb.org/anthology/N18-1115) |
129 | | MCAN (Tay et al., 2018) | — | 0.838 | 0.904 | [Multi-Cast Attention Networks, KDD 2018](https://arxiv.org/abs/1806.00778) |
130 | | MP-CNN (Tayyar et al., 2018) | — | 0.865 | 0.904 | [Integrating Question Classification and Deep Learning for improved Answer Selection, COLING 2018](https://aclanthology.coli.uni-saarland.de/papers/C18-1278/c18-1278) |
131 | | CA + LM + LC (Yoon et al., 2019) | — | 0.868 | 0.928 | [A Compare-Aggregate Model with Latent Clustering for Answer Selection, CIKM 2019](https://arxiv.org/abs/1905.12897) |
132 | | GSAMN (Lai et al., 2019) | [](https://github.com/laituan245/StackExchangeQA) | 0.914 | 0.957 | [A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019](https://arxiv.org/pdf/1909.09696.pdf) |
133 | | TANDA (Garg et al., 2019) | [](https://github.com/alexa/wqa_tanda) | **0.943** | 0.974 | [TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020](https://arxiv.org/abs/1911.04118) |
134 | | CETE (Laskar et al., 2020) | — | 0.936 | **0.978** | [Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.676.pdf) |
135 |
136 | ### WikiQA
137 |
138 | | Model | Code | MAP | MRR | Paper |
139 | | ---------------------------------------- | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | --------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
140 | | ABCNN (Yin et al., 2016)                 |                                                                                     [](https://github.com/galsang/ABCNN)                                                                                      | 0.6921    | 0.7108    | [ABCNN: Attention-based convolutional neural network for modeling sentence pairs, TACL 2016](https://doi.org/10.1162/tacl_a_00097) |
141 | | Multi-Perspective CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.701 | 0.718 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
142 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.705 | 0.720 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
143 | | KVMN (Miller et al., 2016)               |                                                                              [](https://github.com/siyuanzhao/key-value-memory-networks)                                                                               | 0.7069    | 0.7265    | [Key-Value Memory Networks for Directly Reading Documents, EMNLP 2016](https://doi.org/10.18653/v1/D16-1147) |
144 | | BiMPM (Wang et al., 2017) | [](https://github.com/zhiguowang/BiMPM) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/bimpm.py) | 0.718 | 0.731 | [Bilateral Multi-Perspective Matching for Natural Language Sentences, IJCAI 2017](https://arxiv.org/pdf/1702.03814.pdf) |
145 | | IWAN (Shen et al., 2017) | — | 0.733 | 0.750 | [Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017](https://aclanthology.info/pdf/D/D17/D17-1122.pdf) |
146 | | CA (Wang and Jiang, 2017) | [](https://github.com/pcgreat/SeqMatchSeq) | 0.7433 | 0.7545 | [A Compare-Aggregate Model for Matching Text Sequences, ICLR 2017](https://arxiv.org/abs/1611.01747) |
147 | | HCRN (Tay et al., 2018c) | [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/hcrn.py) | 0.7430 | 0.7560 | [Hermitian co-attention networks for text matching in asymmetrical domains, IJCAI 2018](https://www.ijcai.org/proceedings/2018/615) |
148 | | Compare-Aggregate (Bian et al., 2017) | [](https://github.com/wjbianjason/Dynamic-Clip-Attention) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/dynamic_clip.py) | 0.748 | 0.758 | [A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017](https://dl.acm.org/citation.cfm?id=3133089&CFID=791659397&CFTOKEN=43388059) |
149 | | RE2 (Yang et al., 2019) | [](https://github.com/alibaba-edu/simple-effective-text-matching) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/re2.py) | 0.7452 | 0.7618 | [Simple and Effective Text Matching with Richer Alignment Features, ACL 2019](https://www.aclweb.org/anthology/P19-1465.pdf) |
150 | | GSAMN (Lai et al., 2019) | [](https://github.com/laituan245/StackExchangeQA) | 0.857 | 0.872 | [A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019](https://arxiv.org/pdf/1909.09696.pdf) |
151 | | TANDA (Garg et al., 2019) | [](https://github.com/alexa/wqa_tanda) | **0.920** | **0.933** | [TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020](https://arxiv.org/abs/1911.04118) |
152 |
--------------------------------------------------------------------------------
/Gemfile:
--------------------------------------------------------------------------------
1 | source 'https://rubygems.org'
2 | gem 'github-pages', group: :jekyll_plugins
--------------------------------------------------------------------------------
/LFQA/LFQA.md:
--------------------------------------------------------------------------------
1 | # Community Question Answering
2 |
3 | **Long-Form Question Answering** is the task of automatically searching for relevant answers among the many responses provided for a given question (Answer Selection), and of searching for relevant questions in order to reuse their existing answers (Question Retrieval).
4 |
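As an illustration of the Answer Selection setting, here is a minimal bag-of-words cosine baseline. The helper names are hypothetical, and real systems use the neural matching models listed in the performance tables; this sketch only makes the task concrete.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words term counts over lowercased whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_answer(question: str, candidates: list) -> str:
    """Return the candidate response most similar to the question."""
    q = bow(question)
    return max(candidates, key=lambda c: cosine(q, bow(c)))
```

Question Retrieval works the same way with candidate questions in place of candidate answers; the models below replace the cosine score with learned matching functions.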
5 | ## Classic Datasets
6 |
7 |
67 |
68 | - [**TRECQA**](https://trec.nist.gov/data/qa.html) was created by [Wang et al.](https://www.aclweb.org/anthology/D07-1003) from TREC QA track 8-13 data, with candidate answers automatically selected from each question’s document pool using a combination of overlapping non-stopword counts and pattern matching. This dataset is one of the most widely used benchmarks for answer sentence selection.
69 |
70 | - [**WikiQA**](https://www.microsoft.com/en-us/download/details.aspx?id=52419) is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering by Microsoft Research.
71 |
72 | - [**InsuranceQA**](https://github.com/shuzi/insuranceQA) is a non-factoid QA dataset from the insurance domain. Questions may have multiple correct answers, and the questions are normally much shorter than the answers: the average lengths of questions and answers are 7 and 95 tokens, respectively. For each question in the development and test sets, there is a set of 500 candidate answers.
73 |
74 | - [**FiQA**](https://sites.google.com/view/fiqa) is a non-factoid QA dataset from the financial domain, released for the WWW 2018 challenges. The dataset was built by crawling StackExchange, Reddit, and StockTwits; some of the questions are opinionated, targeting mined opinions and their respective entities, aspects, sentiment polarity, and opinion holders.
75 |
76 | - [**Yahoo! Answers**](https://webscope.sandbox.yahoo.com) is a website where people post questions and answers, all of which are public to any web user willing to browse or download them. The dataset is the Yahoo! Answers corpus as of 10/25/2007 and is a benchmark for community-based question answering. Its answers are relatively longer than those in TrecQA and WikiQA.
77 |
78 | - [**SemEval-2015 Task 3**](http://alt.qcri.org/semeval2015/task3/) consists of two sub-tasks. In Subtask A, given a question (short title + extended description) and several community answers, classify each answer as definitely relevant (good), potentially useful (potential), or bad/irrelevant (bad, dialogue, non-English, other). In Subtask B, given a YES/NO question (short title + extended description) and a list of community answers, decide whether the global answer to the question should be yes, no, or unsure.
79 |
80 | - [**SemEval-2016 Task 3**](http://alt.qcri.org/semeval2016/task3/) consists of two sub-tasks, namely _Question-Comment Similarity_ and _Question-Question Similarity_. In the _Question-Comment Similarity_ task, given a question from a question-comment thread, rank the comments according to their relevance to the question. In the _Question-Question Similarity_ task, given a new question, re-rank all similar questions retrieved by a search engine.
81 |
82 | - [**SemEval-2017 Task 3**](http://alt.qcri.org/semeval2017/task3/) contains two sub-tasks, namely _Question Similarity_ and _Relevance Classification_. Given a new question and a set of related questions from the collection, the _Question Similarity_ task is to rank the related questions according to their similarity to the original question, while the _Relevance Classification_ task is to rank the answer posts in a question-answer thread according to their relevance to the question.
83 |
84 | ## Performance
85 |
86 | ### TREC QA (Raw Version)
87 |
88 | | Model | Code | MAP | MRR | Paper |
89 | | :---------------------------------- | :----------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------- |
90 | | Punyakanok (2004) | — | 0.419 | 0.494 | [Mapping dependencies trees: An application to question answering, ISAIM 2004](http://cogcomp.cs.illinois.edu/papers/PunyakanokRoYi04a.pdf) |
91 | | Cui (2005) | — | 0.427 | 0.526 | [Question Answering Passage Retrieval Using Dependency Relations, SIGIR 2005](https://www.comp.nus.edu.sg/~kanmy/papers/f66-cui.pdf) |
92 | | Wang (2007) | — | 0.603 | 0.685 | [What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, EMNLP 2007](http://www.aclweb.org/anthology/D/D07/D07-1003.pdf) |
93 | | H&S (2010) | — | 0.609 | 0.692 | [Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions, NAACL 2010](http://www.aclweb.org/anthology/N10-1145) |
94 | | W&M (2010)                          |                              —                               |   0.595   |   0.695   | [Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering, COLING 2010](http://aclweb.org/anthology//C/C10/C10-1131.pdf) |
95 | | Yao (2013) | — | 0.631 | 0.748 | [Answer Extraction as Sequence Tagging with Tree Edit Distance, NAACL 2013](http://www.aclweb.org/anthology/N13-1106.pdf) |
96 | | S&M (2013) | — | 0.678 | 0.736 | [Automatic Feature Engineering for Answer Selection and Extraction, EMNLP 2013](http://www.aclweb.org/anthology/D13-1044.pdf) |
97 | | Backward (Shnarch et al., 2013) | — | 0.686 | 0.754 | [Probabilistic Models for Lexical Inference, Ph.D. thesis 2013](http://u.cs.biu.ac.il/~nlp/wp-content/uploads/eyal-thesis-library-ready.pdf) |
98 | | LCLR (Yih et al., 2013) | — | 0.709 | 0.770 | [Question Answering Using Enhanced Lexical Semantic Models, ACL 2013](http://research.microsoft.com/pubs/192357/QA-SentSel-Updated-PostACL.pdf) |
99 | | bigram+count (Yu et al., 2014) | — | 0.711 | 0.785 | [Deep Learning for Answer Sentence Selection, NIPS 2014](http://arxiv.org/pdf/1412.1632v1.pdf) |
100 | | BLSTM (W&N et al., 2015) | — | 0.713 | 0.791 | [A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering, ACL 2015](http://www.aclweb.org/anthology/P15-2116) |
101 | | Architecture-II (Feng et al., 2015) | — | 0.711 | 0.800 | [Applying deep learning to answer selection: A study and an open task, ASRU 2015](http://arxiv.org/abs/1508.01585) |
102 | | PairCNN (Severyn et al., 2015) | [](https://github.com/zhangzibin/PairCNN-Ranking) | 0.746 | 0.808 | [Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks, SIGIR 2015](http://disi.unitn.eu/moschitti/since2013/2015_SIGIR_Severyn_LearningRankShort.pdf) |
103 | | aNMM (Yang et al., 2016) | [](https://github.com/yangliuy/aNMM-CIKM16) [](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/anmm.py) | 0.750 | 0.811 | [aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model, CIKM 2016](http://maroo.cs.umass.edu/pub/web/getpdf.php?id=1240) |
104 | | HDLA (Tay et al., 2017) | [](https://github.com/vanzytay/YahooQA_Splits) | 0.750 | 0.815 | [Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture, SIGIR 2017](https://arxiv.org/abs/1707.06372) |
105 | | PWIM (He et al., 2016)              |       [](https://github.com/castorini/VDPWI-NN-Torch)       |   0.758   |   0.822   | [Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement, NAACL 2016](https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf) |
106 | | MP-CNN (He et al., 2015)            |        [](https://github.com/castorini/MP-CNN-Torch)        |   0.762   |   0.830   | [Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015](http://aclweb.org/anthology/D/D15/D15-1181.pdf) |
107 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.770 | 0.825 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
108 | | MP-CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.780 | 0.834 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
109 | | HCAN (Rao et al., 2019) | — | 0.774 | 0.843 | [Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling, EMNLP 2019](https://jinfengr.github.io/publications/Rao_etal_EMNLP2019.pdf) |
110 | | MP-CNN (Tayyar et al., 2018) | — | 0.836 | 0.863 | [Integrating Question Classification and Deep Learning for improved Answer Selection, COLING 2018](https://aclanthology.coli.uni-saarland.de/papers/C18-1278/c18-1278) |
111 | | Pre-Attention (Kamath et al., 2019) | — | 0.852 | 0.891 | [Predicting and Integrating Expected Answer Types into a Simple Recurrent Neural Network Model for Answer Sentence Selection, CICLING 2019](https://hal.archives-ouvertes.fr/hal-02104488/) |
112 | | CETE (Laskar et al., 2020)          |                              —                               | **0.950** | **0.980** | [Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.676.pdf) |
113 |
114 | ### TREC QA (Clean Version)
115 |
116 | | Model | Code | MAP | MRR | Paper |
117 | | :------------------------------- | :----------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------- |
118 | | W&I (2015) | — | 0.746 | 0.820 | [FAQ-based Question Answering via Word Alignment, arXiv 2015](http://arxiv.org/abs/1507.02628) |
119 | | LSTM (Tan et al., 2015) | [](https://github.com/Alan-Lee123/answer-selection) | 0.728 | 0.832 | [LSTM-Based Deep Learning Models for Nonfactoid Answer Selection, arXiv 2015](http://arxiv.org/abs/1511.04108) |
120 | | AP-CNN (dos Santos et al., 2016) |                              —                               |   0.753   |   0.851   | [Attentive Pooling Networks, arXiv 2016](http://arxiv.org/abs/1602.03609) |
121 | | L.D.C Model (Wang et al., 2016) | — | 0.771 | 0.845 | [Sentence Similarity Learning by Lexical Decomposition and Composition, COLING 2016](http://arxiv.org/pdf/1602.07019v1.pdf) |
122 | | MP-CNN (He et al., 2015)         |        [](https://github.com/castorini/MP-CNN-Torch)        |   0.777   |   0.836   | [Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015](http://aclweb.org/anthology/D/D15/D15-1181.pdf) |
123 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.784 | 0.865 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
124 | | MP-CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.801 | 0.877 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
125 | | BiMPM (Wang et al., 2017)        | [](https://github.com/zhiguowang/BiMPM) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/bimpm.py) |   0.802   |   0.875   | [Bilateral Multi-Perspective Matching for Natural Language Sentences, IJCAI 2017](https://arxiv.org/pdf/1702.03814.pdf) |
126 | | CA (Bian et al., 2017) | [](https://github.com/wjbianjason/Dynamic-Clip-Attention) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/dynamic_clip.py) | 0.821 | 0.899 | [A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017](https://dl.acm.org/citation.cfm?id=3133089&CFID=791659397&CFTOKEN=43388059) |
127 | | IWAN (Shen et al., 2017) | — | 0.822 | 0.889 | [Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017](https://aclanthology.info/pdf/D/D17/D17-1122.pdf) |
128 | | sCARNN (Tran et al., 2018) | — | 0.829 | 0.875 | [The Context-dependent Additive Recurrent Neural Net, NAACL 2018](http://www.aclweb.org/anthology/N18-1115) |
129 | | MCAN (Tay et al., 2018) | — | 0.838 | 0.904 | [Multi-Cast Attention Networks, KDD 2018](https://arxiv.org/abs/1806.00778) |
130 | | MP-CNN (Tayyar et al., 2018) | — | 0.865 | 0.904 | [Integrating Question Classification and Deep Learning for improved Answer Selection, COLING 2018](https://aclanthology.coli.uni-saarland.de/papers/C18-1278/c18-1278) |
131 | | CA + LM + LC (Yoon et al., 2019) | — | 0.868 | 0.928 | [A Compare-Aggregate Model with Latent Clustering for Answer Selection, CIKM 2019](https://arxiv.org/abs/1905.12897) |
132 | | GSAMN (Lai et al., 2019) | [](https://github.com/laituan245/StackExchangeQA) | 0.914 | 0.957 | [A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019](https://arxiv.org/pdf/1909.09696.pdf) |
133 | | TANDA (Garg et al., 2019) | [](https://github.com/alexa/wqa_tanda) | **0.943** | 0.974 | [TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020](https://arxiv.org/abs/1911.04118) |
134 | | CETE (Laskar et al., 2020) | — | 0.936 | **0.978** | [Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.676.pdf) |
135 |
136 | ### WikiQA
137 |
138 | | Model | Code | MAP | MRR | Paper |
139 | | ---------------------------------------- | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | --------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
140 | | ABCNN (Yin et al., 2016)                 |                                                                                     [](https://github.com/galsang/ABCNN)                                                                                      | 0.6921    | 0.7108    | [ABCNN: Attention-based convolutional neural network for modeling sentence pairs, TACL 2016](https://doi.org/10.1162/tacl_a_00097) |
141 | | Multi-Perspective CNN (Rao et al., 2016) | [](https://github.com/castorini/NCE-CNN-Torch) | 0.701 | 0.718 | [Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016](https://dl.acm.org/authorize.cfm?key=N27026) |
142 | | HyperQA (Tay et al., 2017) | [](https://github.com/vanzytay/WSDM2018_HyperQA) | 0.705 | 0.720 | [Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018](https://arxiv.org/pdf/1707.07847) |
143 | | KVMN (Miller et al., 2016)               |                                                                              [](https://github.com/siyuanzhao/key-value-memory-networks)                                                                               | 0.7069    | 0.7265    | [Key-Value Memory Networks for Directly Reading Documents, EMNLP 2016](https://doi.org/10.18653/v1/D16-1147) |
144 | | BiMPM (Wang et al., 2017) | [](https://github.com/zhiguowang/BiMPM) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/bimpm.py) | 0.718 | 0.731 | [Bilateral Multi-Perspective Matching for Natural Language Sentences, IJCAI 2017](https://arxiv.org/pdf/1702.03814.pdf) |
145 | | IWAN (Shen et al., 2017) | — | 0.733 | 0.750 | [Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017](https://aclanthology.info/pdf/D/D17/D17-1122.pdf) |
146 | | CA (Wang and Jiang, 2017) | [](https://github.com/pcgreat/SeqMatchSeq) | 0.7433 | 0.7545 | [A Compare-Aggregate Model for Matching Text Sequences, ICLR 2017](https://arxiv.org/abs/1611.01747) |
147 | | HCRN (Tay et al., 2018c) | [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/hcrn.py) | 0.7430 | 0.7560 | [Hermitian co-attention networks for text matching in asymmetrical domains, IJCAI 2018](https://www.ijcai.org/proceedings/2018/615) |
148 | | Compare-Aggregate (Bian et al., 2017) | [](https://github.com/wjbianjason/Dynamic-Clip-Attention) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/dynamic_clip.py) | 0.748 | 0.758 | [A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017](https://dl.acm.org/citation.cfm?id=3133089&CFID=791659397&CFTOKEN=43388059) |
149 | | RE2 (Yang et al., 2019) | [](https://github.com/alibaba-edu/simple-effective-text-matching) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/re2.py) | 0.7452 | 0.7618 | [Simple and Effective Text Matching with Richer Alignment Features, ACL 2019](https://www.aclweb.org/anthology/P19-1465.pdf) |
150 | | GSAMN (Lai et al., 2019) | [](https://github.com/laituan245/StackExchangeQA) | 0.857 | 0.872 | [A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019](https://arxiv.org/pdf/1909.09696.pdf) |
151 | | TANDA (Garg et al., 2019) | [](https://github.com/alexa/wqa_tanda) | **0.920** | **0.933** | [TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020](https://arxiv.org/abs/1911.04118) |
152 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Neural Text Similarity Community
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Natural-Language-Inference/Natural-Language-Inference.md:
--------------------------------------------------------------------------------
1 | # Natural Language Inference
2 |
3 | **Natural Language Inference** is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
4 |
5 | ## Classic Datasets
6 |
7 |
29 |
30 | - [**SNLI**](https://arxiv.org/abs/1508.05326) is short for Stanford Natural Language Inference, a corpus of 570k human-annotated sentence pairs. The premise data is drawn from the captions of the Flickr30k corpus, and the hypothesis data is manually composed.
31 | - [**MultiNLI**](https://arxiv.org/abs/1704.05426) is short for Multi-Genre NLI, which has 433k sentence pairs; its collection process and task details are modeled closely on SNLI. The premise data is collected from a maximally broad range of genres of American English, such as non-fiction genres (SLATE, OUP, GOVERNMENT, VERBATIM, TRAVEL), spoken genres (TELEPHONE, FACE-TO-FACE), less formal written genres (FICTION, LETTERS), and a specialized one for 9/11.
32 | - [**SciTail**](http://ai2-website.s3.amazonaws.com/publications/scitail-aaai-2018_cameraready.pdf) is an entailment dataset consisting of 27k sentence pairs. In contrast to SNLI and MultiNLI, it was not crowd-sourced but created from sentences that already exist "in the wild". Hypotheses were created from science questions and the corresponding answer candidates, while relevant web sentences from a large corpus were used as premises.
33 |
34 | ## Performance
35 |
36 | ### SNLI
37 |
38 | | Model | Code| Accuracy | Paper |
39 | | ------------------------------------------------------------ | :-----------------------------------: | ------------------------------------------------------------ | ------------------------------------------------------------ |
40 | | Match-LSTM (Wang et al. ,2016) | [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/matchlstm.py) | 86.1 | [Learning Natural Language Inference with LSTM](https://www.aclweb.org/anthology/N16-1170.pdf) |
41 | | Decomposable (Parikh et al., 2016) |—|86.3/86.8(Intra-sentence attention) | [A Decomposable Attention Model for Natural Language Inference](https://arxiv.org/pdf/1606.01933.pdf) |
42 | | BiMPM (Wang et al., 2017) | [](https://zhiguowang.github.io/) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/bimpm.py)| 86.9 | [Bilateral Multi-Perspective Matching for Natural Language Sentences](https://arxiv.org/pdf/1702.03814.pdf) |
43 | | Shortcut-Stacked BiLSTM (Nie et al., 2017) | [](https://github.com/easonnie/multiNLI_encoder) | 86.1 | [Shortcut-Stacked Sentence Encoders for Multi-Domain Inference](https://arxiv.org/pdf/1708.02312.pdf) |
44 | | ESIM (Chen et al., 2017) | [](https://github.com/lukecq1231/nli) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/esim.py) |88.0/88.6(Tree-LSTM) | [Enhanced LSTM for Natural Language Inference](https://arxiv.org/pdf/1609.06038.pdf) |
45 | | DIIN (Gong et al., 2018) | [](https://github.com/YichenGong/Densely-Interactive-Inference-Network) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/diin.py) | 88.0 | [Natural Language Inference over Interaction Space](https://arxiv.org/pdf/1709.04348.pdf) |
46 | | SAN (Liu et al., 2018) |—| 88.7 | [Stochastic Answer Networks for Natural Language Inference](https://arxiv.org/pdf/1804.07888.pdf) |
47 | | AF-DMN (Duan et al., 2018) |—| 88.6 | [Attention-Fused Deep Matching Network for Natural Language Inference](https://www.ijcai.org/Proceedings/2018/0561.pdf) |
48 | | MwAN (Tan et al., 2018) |—| 88.3 | [Multiway Attention Networks for Modeling Sentence Pairs](https://www.ijcai.org/Proceedings/2018/0613.pdf) |
49 | | HBMP (Talman et al., 2018) | [](https://github.com/Helsinki-NLP/HBMP) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/hbmp.py) | 86.6 | [Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture](https://arxiv.org/pdf/1808.08762v1.pdf) |
50 | | CAFE (Tay et al., 2018) |—| 88.5 | [Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference](https://arxiv.org/pdf/1801.00102v2.pdf) |
51 | | DSA (Yoon et al., 2018) |—| 86.8 | [Dynamic Self-Attention: Computing Attention over Words Dynamically for Sentence Embedding](https://arxiv.org/pdf/1808.07383.pdf) |
52 | | Enhancing Sentence Embedding with Generalized Pooling (Chen et al., 2018) | [](https://github.com/lukecq1231/generalized-pooling) | 86.6 | [Enhancing Sentence Embedding with Generalized Pooling](https://arxiv.org/pdf/1806.09828.pdf?source=post_page---------------------------) |
53 | | ReSAN (Shen et al., 2018) | [](https://github.com/taoshen58/DiSAN/tree/master/ReSAN) | 86.3 | [Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling](https://arxiv.org/pdf/1801.10296.pdf) |
54 | | DMAN (Pan et al., 2018) |—| 88.8 | [Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference](https://www.aclweb.org/anthology/P18-1091.pdf) |
55 | | DRCN (Kim et al., 2018) |—| 90.1 | [Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information](https://www.aaai.org/ojs/index.php/AAAI/article/download/4627/4505) |
56 | | RE2 (Yang et al., 2019) | [](https://github.com/hitvoice/RE2) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/re2.py) | 88.9 | [Simple and Effective Text Matching with Richer Alignment Features](https://arxiv.org/pdf/1908.00300.pdf) |
57 | | MT-DNN (Liu et al., 2019) | [](https://github.com/namisan/mt-dnn) | **91.1(base)/91.6(large)** | [Multi-Task Deep Neural Networks for Natural Language Understanding](https://arxiv.org/pdf/1901.11504.pdf) |
58 |
59 | ### MNLI
60 |
61 | | Model |Code| Matched Accuracy | Mismatched Accuracy | Paper |
62 | | ------------------------------------------------------------ | :-----------------------: | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
63 | | ESIM (Chen et al., 2017) | [](https://github.com/lukecq1231/nli) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/esim.py) | 76.8 | 75.8 | [Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference](https://arxiv.org/pdf/1708.01353.pdf) |
64 | | Shortcut-Stacked BiLSTM (Nie et al., 2017) | [](https://github.com/easonnie/multiNLI_encoder) | 74.6 | 73.6 | [Shortcut-Stacked Sentence Encoders for Multi-Domain Inference](https://arxiv.org/pdf/1708.02312.pdf) |
65 | | HBMP (Talman et al., 2018) | [](https://github.com/Helsinki-NLP/HBMP) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/hbmp.py) | 73.7 | 73.0 | [Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture](https://arxiv.org/pdf/1808.08762v1.pdf) |
66 | | Generalized Pooling (Chen et al., 2018) | [](https://github.com/lukecq1231/generalized-pooling) | 73.8 | 74.0 | [Enhancing Sentence Embedding with Generalized Pooling](https://arxiv.org/pdf/1806.09828.pdf?source=post_page---------------------------) |
67 | | AF-DMN (Duan et al., 2018) |—| 76.9 | 76.3 | [Attention-Fused Deep Matching Network for Natural Language Inference](https://www.ijcai.org/Proceedings/2018/0561.pdf) |
68 | | DIIN (Gong et al., 2018) | [](https://github.com/YichenGong/Densely-Interactive-Inference-Network) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/diin.py) | 78.8 | 77.8 | [Natural Language Inference over Interaction Space](https://github.com/YichenGong/Densely-Interactive-Inference-Network) |
69 | | SAN (Liu et al., 2018) |—| **79.3** | **78.7** | [Stochastic Answer Networks for Natural Language Inference](https://arxiv.org/pdf/1804.07888.pdf) |
70 | | MwAN (Tan et al., 2018) |—| 78.5 | 77.7 | [Multiway Attention Networks for Modeling Sentence Pairs](https://www.ijcai.org/Proceedings/2018/0613.pdf) |
71 | | CAFE (Tay et al., 2018) |—| 78.7 | 77.9 | [Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference](https://arxiv.org/pdf/1801.00102v2.pdf) |
72 | | DRCN (Kim et al., 2018) |—| 79.1 | 78.4 | [Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information](https://www.aaai.org/ojs/index.php/AAAI/article/download/4627/4505) |
73 | | DMAN (Pan et al., 2018) |—| 78.9 | 78.2 | [Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference](https://www.aclweb.org/anthology/P18-1091.pdf) |
74 |
75 |
76 | ### SciTail
77 |
78 | | Model | Code | Accuracy | Paper |
79 | | -------------------------- | :----------------------: | ------------------------------------------------------------ | ------------------------------------------------------------ |
80 | | SAN (Liu et al., 2018) |—| 88.4 | [Stochastic Answer Networks for Natural Language Inference](https://arxiv.org/pdf/1804.07888.pdf) |
81 | | HCRN (Tay et al., 2018) |—| 80.0 | [Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains](https://www.ijcai.org/Proceedings/2018/0615.pdf) |
82 | | HBMP (Talman et al., 2018) | [](https://github.com/Helsinki-NLP/HBMP) [](https://github.com/NTMC-Community/MatchZoo-py/blob/master/matchzoo/models/hbmp.py) | 86.0 | [Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture](https://arxiv.org/pdf/1808.08762v1.pdf) |
83 | | CAFE (Tay et al., 2018) |—| 83.3 | [Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference](https://arxiv.org/pdf/1801.00102v2.pdf) |
84 | | RE2 (Yang et al., 2019) | [](https://github.com/hitvoice/RE2) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/re2.py) | 86.0 | [Simple and Effective Text Matching with Richer Alignment Features](https://arxiv.org/pdf/1908.00300.pdf) |
85 | | MT-DNN (Liu et al., 2019) | [](https://github.com/namisan/mt-dnn) | **94.1(base)/95.0(large)** | [Multi-Task Deep Neural Networks for Natural Language Understanding](https://arxiv.org/pdf/1901.11504.pdf) |
86 |
--------------------------------------------------------------------------------
/Paraphrase-Identification/Paraphrase-Identification.md:
--------------------------------------------------------------------------------
1 | # Paraphrase Identification
2 |
3 | ---
4 |
5 | **Paraphrase Identification** is the task of determining whether two sentences have the same meaning, a problem considered a touchstone of natural language understanding.
6 |
7 | Take an instance from the MRPC dataset as an example; the following is a pair of sentences with the same meaning:
8 |
9 | sentence1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
10 |
11 | sentence2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
12 |
13 |
14 | Some benchmark datasets are listed below.
15 |
16 | ## Classic Datasets
17 |
18 |
48 |
49 | - [**MRPC**](https://www.microsoft.com/en-us/download/details.aspx?id=52398&from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2F607d14d9-20cd-47e3-85bc-a2f65cd28042%2Fdefault.aspx) is short for Microsoft Research Paraphrase Corpus. It contains 5,800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship.
50 | - [**SentEval**](https://arxiv.org/pdf/1803.05449.pdf) encompasses semantic relatedness datasets including SICK and the STS Benchmark. The SICK dataset includes two subtasks, SICK-R and SICK-E. For STS and SICK-R, the goal is to predict relatedness scores between a pair of sentences; SICK-E uses the same sentence pairs as SICK-R but is treated as a three-class classification problem (classes are 'entailment', 'contradiction', and 'neutral').
53 |
54 | - [**Quora Question Pairs**](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) is a task released by Quora which aims to identify duplicate questions. It consists of over 400,000 pairs of questions from Quora, and each question pair is annotated with a binary value indicating whether the two questions are paraphrases of each other.
55 |
56 | A list of neural matching models for paraphrase identification is given below.
57 |
58 | ## Performance
59 |
60 | ### MRPC
61 |
62 | | Model | Code | accuracy | f1 | Paper|
63 | | :-----:| :----: | :----: |:----: |:----: |
64 | | XLNet-Large (ensemble) (Yang et al., 2019) | [](https://github.com/zihangdai/xlnet/) | 93.0 | 90.7 | [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/pdf/1906.08237.pdf) |
65 | | MT-DNN-ensemble (Liu et al., 2019) | [](https://github.com/namisan/mt-dnn/) |92.7 | 90.3 | [Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding](https://arxiv.org/pdf/1904.09482.pdf) |
66 | | Snorkel MeTaL(ensemble) (Ratner et al., 2018) | [](https://github.com/HazyResearch/metal) | 91.5 | 88.5 | [Training Complex Models with Multi-Task Weak Supervision](https://arxiv.org/pdf/1810.02840.pdf) |
67 | | GenSen (Subramanian et al., 2018) | [](https://github.com/Maluuba/gensen) | 78.6 | 84.4 | [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](https://arxiv.org/abs/1804.00079) |
68 | | InferSent (Conneau et al., 2017) |[](https://github.com/facebookresearch/InferSent) | 76.2 | 83.1 | [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364) |
69 | | TF-KLD (Ji and Eisenstein, 2013) | — | 80.4 | 85.9 | [Discriminative Improvements to Distributional Sentence Similarity](http://www.aclweb.org/anthology/D/D13/D13-1090.pdf) |
70 | | SpanBert (Joshi et al., 2019) | [](https://github.com/facebookresearch/SpanBERT) | 90.9 | 87.9 | [SpanBERT: Improving Pre-training by Representing and Predicting Spans](https://arxiv.org/pdf/1907.10529.pdf) |
71 | | MT-DNN (Liu et al., 2019) | [](https://github.com/namisan/mt-dnn) | 91.1 | 88.2 | [Multi-Task Deep Neural Networks for Natural Language Understanding](https://arxiv.org/pdf/1901.11504.pdf) |
72 | | AugDeepParaphrase (Agarwal et al., 2017) | — | 77.7 | 84.5 | [A Deep Network Model for Paraphrase Detection in Short Text Messages](https://arxiv.org/pdf/1712.02820.pdf) |
73 | | ERNIE (Zhang et al. 2019) | []( https://github.com/thunlp/ERNIE) | 88.2 | — | [ERNIE: Enhanced Language Representation with Informative Entities](https://arxiv.org/pdf/1905.07129.pdf) |
74 | | This work (Lan and Xu, 2018) | — | 84.0 | — | [Character-based Neural Networks for Sentence Pair Modeling](https://www.aclweb.org/anthology/N18-2025.pdf) |
75 | | ABCNN (Yin et al.2018) | [](https://github.com/yinwenpeng/Answer_Selection) | 78.9 | 84.8 | [ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs](https://arxiv.org/pdf/1512.05193.pdf) |
76 | | Attentive Tree-LSTMs (Zhou et al.2016) | [](https://github.com/yoosan/sentpair) | 75.8 |83.7 | [Modelling Sentence Pairs with Tree-structured Attentive Encoder](https://arxiv.org/pdf/1610.02806.pdf) |
77 | | Bi-CNN-MI (Yin and Schutze, 2015) | — | 78.1 | 84.4 | [Convolutional Neural Network for Paraphrase Identification](https://www.aclweb.org/anthology/N15-1091.pdf) |
78 |
79 |
80 | ### SentEval
81 | The evaluation metric for STS and SICK-R is Pearson correlation.
82 |
83 | The evaluation metric for SICK-E is classification accuracy.
84 |
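Pearson correlation can be computed directly; the snippet below is a minimal pure-Python sketch (function and variable names are illustrative; `scipy.stats.pearsonr` gives the same result in practice):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between predicted and gold relatedness scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# perfectly correlated predictions give r = 1.0
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```
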
85 | | Model | Code | SICK-R | SICK-E | STS | Paper|
86 | | :-----:| :----: | :----: |:----: |:----: |:----: |
87 | | XLNet-Large (ensemble) (Yang et al., 2019) | [](https://github.com/zihangdai/xlnet/) | — | — | 91.6 | [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/pdf/1906.08237.pdf) |
88 | | MT-DNN-ensemble (Liu et al., 2019) | [](https://github.com/namisan/mt-dnn/) | — | — | 91.1 | [Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding](https://arxiv.org/pdf/1904.09482.pdf) |
89 | | Snorkel MeTaL(ensemble) (Ratner et al., 2018) | [](https://github.com/HazyResearch/metal) | — | — | 90.1 | [Training Complex Models with Multi-Task Weak Supervision](https://arxiv.org/pdf/1810.02840.pdf) |
90 | | GenSen (Subramanian et al., 2018) | [](https://github.com/Maluuba/gensen) | 88.8 | 87.8 | 78.9 | [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](https://arxiv.org/abs/1804.00079) |
91 | | InferSent (Conneau et al., 2017) |[](https://github.com/facebookresearch/InferSent) | 88.4 | 86.3 | 75.8 | [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364) |
92 | | SpanBert (Joshi et al., 2019) | [](https://github.com/facebookresearch/SpanBERT) | — | — | 89.9 | [SpanBERT: Improving Pre-training by Representing and Predicting Spans](https://arxiv.org/pdf/1907.10529.pdf) |
93 | | MT-DNN (Liu et al., 2019) | [](https://github.com/namisan/mt-dnn) | — | — | 89.5 | [Multi-Task Deep Neural Networks for Natural Language Understanding](https://arxiv.org/pdf/1901.11504.pdf) |
94 | | ERNIE (Zhang et al. 2019) | []( https://github.com/thunlp/ERNIE) | — | — | 83.2 | [ERNIE: Enhanced Language Representation with Informative Entities](https://arxiv.org/pdf/1905.07129.pdf) |
95 | | PWIM (He and Lin, 2016) |[](https://github.com/lukecq1231/nli) | — | — | 76.7 | [Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement](https://www.aclweb.org/anthology/N16-1108.pdf)|
96 |
97 |
98 | ### Quora Question Pair
99 |
100 | | Model | Code | F1 | Accuracy | Paper|
101 | | :-----:| :----: | :----: |:----: |:----: |
102 | | XLNet-Large (ensemble) (Yang et al., 2019) | [](https://github.com/zihangdai/xlnet/) | 74.2 | 90.3 | [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/pdf/1906.08237.pdf) |
103 | | MT-DNN-ensemble (Liu et al., 2019) | [](https://github.com/namisan/mt-dnn/) | 73.7 | 89.9 | [Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding](https://arxiv.org/pdf/1904.09482.pdf) |
104 | |Snorkel MeTaL(ensemble) (Ratner et al., 2018) | [](https://github.com/HazyResearch/metal) | 73.1 | 89.9 | [Training Complex Models with Multi-Task Weak Supervision](https://arxiv.org/pdf/1810.02840.pdf) |
105 | |MwAN (Tan et al., 2018) | — | — | 89.12| [Multiway Attention Networks for Modeling Sentence Pairs](https://www.ijcai.org/proceedings/2018/0613.pdf) |
106 | | DIIN (Gong et al., 2018) | [](https://github.com/YichenGong/Densely-Interactive-Inference-Network) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/diin.py) | — | 89.06 | [Natural Language Inference over Interaction Space](https://arxiv.org/pdf/1709.04348.pdf) |
107 | | pt-DecAtt (Char) (Tomar et al., 2017) | — | — | 88.40 | [Neural Paraphrase Identification of Questions with Noisy Pretraining](https://arxiv.org/abs/1704.04565) |
108 | | BiMPM (Wang et al., 2017) | [](https://github.com/zhiguowang/BiMPM) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/bimpm.py) | — | 88.17 | [Bilateral Multi-Perspective Matching for Natural Language Sentences](https://arxiv.org/abs/1702.03814) |
109 | | GenSen (Subramanian et al., 2018) | [](https://github.com/Maluuba/gensen) | — | 87.01 | [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](https://arxiv.org/abs/1804.00079) |
110 | | This work (Wang et al.2016) | — | 78.4 | 84.7 | [Sentence Similarity Learning by Lexical Decomposition and Composition](https://www.aclweb.org/anthology/C16-1127/) |
111 | | RE2 (Yang et al., 2019) | [](https://github.com/hitvoice/RE2) [](https://github.com/NTMC-Community/MatchZoo-py/blob/dev/matchzoo/models/re2.py) | — | 89.2 | [Simple and Effective Text Matching with Richer Alignment Features](https://www.aclweb.org/anthology/P19-1465.pdf) |
112 | | MSEM (Wang et al.2016) | — | — | 88.86 | [Multi-task Sentence Encoding Model for Semantic Retrieval in Question Answering Systems](https://arxiv.org/ftp/arxiv/papers/1911/1911.07405.pdf) |
113 | | Bi-CAS-LSTM (Choi et al.2019) | — | — | 88.6 | [Cell-aware Stacked LSTMs for Modeling Sentences](https://arxiv.org/pdf/1809.02279.pdf)|
114 | |DecAtt (Parikh et al., 2016)| — | — | 86.5 | [A Decomposable Attention Model for Natural Language Inference](https://arxiv.org/pdf/1606.01933.pdf)|
115 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Awesome Neural Models for Semantic Match
3 |
4 | A collection of papers maintained by the MatchZoo team.
5 |
6 | Check out our open-source toolkit MatchZoo for more information!
7 |
14 | Text matching is a core component in many natural language processing tasks, where many tasks can be viewed as matching between two input texts.
15 |
16 |
17 |
18 | match(s, t) = g(f(psi(s), phi(t)))
19 |
20 | Where **s** and **t** are the source and target text inputs, respectively; **psi** and **phi** are the representation functions for **s** and **t**, respectively; **f** is the interaction function; and **g** is the aggregation function. A more detailed explanation of this formula can be found in [A Deep Look into Neural Ranking Models for Information Retrieval](https://arxiv.org/abs/1903.06902). The representative matching tasks are as follows:
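
The formula above can be sketched in code; this is a toy illustration where the representation, interaction, and aggregation functions are hypothetical bag-of-words stand-ins for what real neural models learn:

```python
from collections import Counter

def psi(s):
    """Representation function for the source text (toy bag-of-words)."""
    return Counter(s.lower().split())

phi = psi  # the target text shares the same toy representation here

def f(rep_s, rep_t):
    """Interaction function: per-term overlap between the two representations."""
    return [min(rep_s[w], rep_t[w]) for w in rep_s.keys() | rep_t.keys()]

def g(interactions):
    """Aggregation function: collapse the interactions into one match score."""
    return sum(interactions)

def match(s, t):
    return g(f(psi(s), phi(t)))

score = match("how to install ubuntu", "steps to install ubuntu on a laptop")
```

Neural models replace each of these pieces with learned components (e.g., encoders for psi/phi, attention for f, pooling or an MLP for g).
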
21 |
22 | | **Tasks** | **Source Text** | **Target Text** |
23 | | :------------------------------------------------------------------------------------------: | :-------------: | :----------------------: |
24 | | [Ad-hoc Information Retrieval](Ad-hoc-Information-Retrieval/Ad-hoc-Information-Retrieval.md) | query | document (title/content) |
25 | | [Community Question Answering](Community-Question-Answering/Community-Question-Answering.md) | question | question/answer |
26 | | [Paraphrase Identification](Paraphrase-Identification/Paraphrase-Identification.md) | string1 | string2 |
27 | | [Natural Language Inference](Natural-Language-Inference/Natural-Language-Inference.md) | premise | hypothesis |
28 | | [Response Retrieval](Response-Retrieval/Response-Retrieval.md) | context/utterances | response |
29 | | [Long Form Question Answering](LFQA/LFQA.md) | question+document | answer |
30 |
31 | ### Healthcheck
32 |
33 | ```shell
34 | pip3 install -r requirements.txt
35 | python3 healthcheck.py
36 | ```
37 |
--------------------------------------------------------------------------------
/Response-Retrieval/Response-Retrieval.md:
--------------------------------------------------------------------------------
1 | ## Response retrieval
2 |
3 | **Response retrieval**/selection aims to rank/select a proper response from a dialog repository.
4 | Automatic conversation (AC) aims to create an automatic human-computer dialog process for the purposes of question answering, task completion, and social chat (i.e., chit-chat). In general, AC can be formulated either as an IR problem that ranks/selects a proper response from a dialog repository, or as a generation problem that generates an appropriate response with respect to the input utterance. Here, we refer to response retrieval as the IR-based way of doing AC.
5 | Example:
6 | 
7 |
8 | ## Classic Datasets
9 |
10 | |Dataset |Partition | #Context Response pair | #Candidate per Context | Positive:Negative |Avg #turns per context|
11 | |---- | ---- | ---- |---- | ---- | ---- |
12 | |UDC| train/validation/test| 1M/500k/500k| 2/10/10| 1:1/1:9/1:9| 10.13/10.11/10.11|
13 | |[**Douban**](https://github.com/MarkWuNLP/MultiTurnResponseSelection)| train/validation/test| 1M/50k/10k |2/2/10|1:1/1:1/1.18:8.82|6.69/6.75/6.45|
14 | |[**MSDialog**](https://ciir.cs.umass.edu/downloads/msdialog/)| train/validation/test| 173k/37k/35k | 10/10/10|1:9/1:9/1:9|5.0/4.9/4.4|
15 | |[**EDC**](https://github.com/cooelf/DeepUtteranceAggregation)| train/validation/test| 1M/10k/10k| 2/2/10| 1:1/1:1/1:9| 5.51/5.48/5.64|
16 | |Persona-Chat| train/validation/test| 8939/1000/968 | 20/20/20 | 1:19/1:19/1:19 | 7.35/7.80/7.76 |
17 | |CMUDoG| train/validation/test| 2881/196/537 | 20/20/20 | 1:19/1:19/1:19 | 12.55/12.37/12.36 |
18 |
19 | - **Ubuntu Dialog Corpus (UDC)** contains multi-turn dialogues collected from chat logs of the Ubuntu Forum. The dataset consists of 1 million context-response pairs for training, 0.5 million for validation, and 0.5 million for testing. Positive responses are true responses from humans, and negative ones are randomly sampled. The ratio of positive to negative examples is 1:1 in training and 1:9 in validation and testing.
20 | - [**Douban Conversation Corpus**](https://github.com/MarkWuNLP/MultiTurnResponseSelection) is an open domain dataset constructed from Douban group (a popular social networking service in China). The data set consists of 1 million context-response pairs for training, 50k pairs for validation, and 10k pairs for testing, corresponding to 2, 2, and 10 response candidates per context respectively. Response candidates on the test set, retrieved from Sina Weibo (the largest microblogging service in China), are labeled by human judges.
21 | - [**MSDialog**](https://ciir.cs.umass.edu/downloads/msdialog/) is a labeled dialog dataset of question answering (QA) interactions between information seekers and answer providers from an online forum on Microsoft products (Microsoft Community). The dataset contains more than 2,000 multi-turn information-seeking conversations with 10,000 utterances that are annotated with user intent on the utterance level.
22 | - [**E-commerce Dialogue Corpus**](https://github.com/cooelf/DeepUtteranceAggregation) contains over 5 types of conversations (e.g., commodity consultation, logistics express, recommendation, negotiation, and chitchat) based on over 20 commodities. The ratio of positive to negative examples is 1:1 in training and validation, and 1:9 in testing.
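
The candidate sets in these datasets can be sketched as pairing each context's true response with randomly sampled negatives (a hypothetical helper for illustration; the 1:9 ratio matches the UDC/EDC test splits described above):

```python
import random

def build_candidates(true_response, response_pool, n_neg=9, seed=0):
    """Pair one true response with n_neg randomly sampled negative responses."""
    rng = random.Random(seed)
    negatives = rng.sample([r for r in response_pool if r != true_response], n_neg)
    candidates = [(true_response, 1)] + [(r, 0) for r in negatives]
    rng.shuffle(candidates)  # labels: 1 = positive, 0 = negative
    return candidates

pool = [f"response {i}" for i in range(50)]
candidates = build_candidates("response 0", pool, n_neg=9)
```
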
24 |
25 | $R_n@k$: recall at position $k$ in $n$ candidates.
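
Given a model's scores for one context's $n$ candidates, $R_n@k$ can be computed as follows (a minimal sketch; names are illustrative):

```python
def recall_at_k(scored, k):
    """scored: list of (model_score, is_positive) for one context's candidates."""
    ranked = sorted(scored, key=lambda x: x[0], reverse=True)
    positives = sum(1 for _, pos in scored if pos)
    hits = sum(1 for _, pos in ranked[:k] if pos)
    return hits / positives

# one positive among 10 candidates, ranked second by the model:
# R_10@1 = 0.0, R_10@2 = 1.0
scored = [(0.9, False), (0.8, True)] + [(0.05 * i, False) for i in range(8)]
```

Reported $R_n@k$ values average this quantity over all contexts in the test set.
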
26 |
27 | ## Performance
28 |
29 | ### Ubuntu Corpus
30 |
31 | | Model | Code|MAP|$R_2@1$|$R_{10}@1$|$R_{10}@2$|$R_{10}@5$|Paper| type |
32 | | ---- | ---- | ---- | --- | --- | --- | --- | ---- | ---- |
33 | | Multi-View (Zhou et al. 2016)| N/A | — | 0.908 | 0.662 | 0.801 | 0.951 | [Multi-view Response Selection for Human-Computer Conversation, ACL 2016](https://www.aclweb.org/anthology/D16-1036.pdf) | multi-turn |
34 | | DL2R (Yan, Song and Wu 2016)| N/A | — | 0.899 | 0.626 | 0.783 | 0.944|[Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System, SIGIR 2016](http://www.ruiyan.me/pubs/SIGIR2016.pdf) | multi-turn|
35 | | SMN (Wu et al. 2017) | [](https://github.com/MarkWuNLP/MultiTurnResponseSelection) | 0.7327 | 0.927 | 0.726 |0.847| 0.962 |[Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots, ACL 2017](https://www.aclweb.org/anthology/P17-1046.pdf)| multi-turn|
36 | |DAM(Zhou et al. 2018) | [](https://github.com/baidu/Dialogue/tree/master/DAM) | — | 0.938 | 0.767 | 0.874 | 0.969 |[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network, ACL 2018](https://www.aclweb.org/anthology/P18-1103.pdf) | multi-turn|
37 | |DUA (Zhang et al. 2018)|[](https://github.com/cooelf/DeepUtteranceAggregation)| — | — | 0.752 | 0.868 | 0.962 |[Modeling Multi-turn Conversation with Deep Utterance Aggregation, arXiv 2018](https://arxiv.org/pdf/1806.09102.pdf)|multi-turn|
38 | | DMN (Yang et al. 2018)| [](https://github.com/yangliuy/NeuralResponseRanking) | 0.7719 | — | — | — | — |[Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems, arXiv 2018](https://arxiv.org/pdf/1805.00188.pdf) |multi-turn |
39 | |U2U-IMN(Gu et al. 2019 a)|[](https://github.com/JasonForJoy/U2U-IMN) | **0.866** | 0.945 | 0.790 | 0.886 | 0.973 |[Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019](https://arxiv.org/pdf/1911.06940v1.pdf)|multi-turn|
40 | |TripleNet(Ma et al. 2019)|[](https://github.com/wtma/TripleNet)|— | 0.943 | 0.79 | 0.885 | 0.97 |[TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots, arXiv 2019](https://arxiv.org/pdf/1909.10666v2.pdf)|multi-turn|
41 | |IMN(Gu et al. 2019 b)|[](https://github.com/JasonForJoy/IMN)| — | 0.946 | 0.794 | 0.889 | 0.974|[Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019](https://arxiv.org/pdf/1901.01824v2.pdf)|multi-turn|
42 | |IOI-local(Tao et al. 2019)|[](https://github.com/chongyangtao/IOI)| — | 0.947 | 0.796 | 0.894 | 0.974 | [One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues, ACL 2019](https://www.aclweb.org/anthology/P19-1001.pdf) |multi-turn|
43 | |MSN(Yuan et al. 2019)|[](https://github.com/chunyuanY/Dialogue)|— |— | 0.8 | 0.899 | 0.978 | [Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots, ACL 2019](https://www.aclweb.org/anthology/D19-1011.pdf) |multi-turn|
44 | |SA-BERT (Gu et al. 2020)|[](https://github.com/JasonForJoy/SA-BERT)| — | **0.965** | **0.855** | **0.928** | **0.983** |[Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2020](https://arxiv.org/pdf/2004.03588v1.pdf)|multi-turn|
45 | |RoBERTaBASE-SS-DA (Lu et al. 2020)|[](https://github.com/CSLujunyu/Improving-Contextual-Language-Modelsfor-Response-Retrieval-in-Multi-Turn-Conversation) | - |0.955 |0.826 |0.909 |0.978 | [Improving Contextual Language Models for Response Retrieval in Multi-Turn Conversation, SIGIR 2020](https://dl.acm.org/doi/pdf/10.1145/3397271.3401255) | multi-turn|
46 | |SMN + ECMo (Tao et al. 2020)| N/A | - |0.934 |0.756 |0.867 |0.966 |[Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection, SIGIR 2020](https://dl.acm.org/doi/pdf/10.1145/3397271.3401290) | multi-turn|
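The $R_n@k$ columns in these tables all share one reading: the model scores $n$ candidate responses for a context, and the metric counts a hit when the ground-truth response lands in the top $k$. A minimal sketch of that computation (the function name and the toy scores are illustrative, not taken from any of the papers above):

```python
def recall_at_k(scores, label_index, k):
    """R_n@k for one test instance: 1.0 if the ground-truth candidate
    (at `label_index`) is among the k highest-scoring of the n candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if label_index in ranked[:k] else 0.0

# n = 10 candidates, ground truth at index 0 with the highest score,
# so it counts toward R_10@1 (and hence R_10@2 and R_10@5 as well).
scores = [0.91, 0.10, 0.35, 0.20, 0.05, 0.44, 0.12, 0.08, 0.30, 0.27]
print(recall_at_k(scores, 0, 1))  # 1.0
```

The reported figure is the mean of this 0/1 outcome over all test instances.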
### Douban Conversation Corpus

| Model | Code | MAP | MRR | P@1 | $R_{10}@1$ | $R_{10}@2$ | $R_{10}@5$ | Paper | type |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Multi-View (Zhou et al. 2016) | N/A | 0.505 | 0.543 | 0.342 | 0.202 | 0.350 | 0.729 | [Multi-view Response Selection for Human-Computer Conversation, EMNLP 2016](https://www.aclweb.org/anthology/D16-1036.pdf) | multi-turn |
| DL2R (Yan, Song and Wu 2016) | N/A | 0.488 | 0.527 | 0.330 | 0.193 | 0.342 | 0.705 | [Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System, SIGIR 2016](http://www.ruiyan.me/pubs/SIGIR2016.pdf) | multi-turn |
| SMN (Wu et al. 2017) | [](https://github.com/MarkWuNLP/MultiTurnResponseSelection) | 0.529 | 0.572 | 0.397 | 0.236 | 0.396 | 0.734 | [Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots, ACL 2017](https://www.aclweb.org/anthology/P17-1046.pdf) | multi-turn |
| DAM (Zhou et al. 2018) | [](https://github.com/baidu/Dialogue/tree/master/DAM) | 0.550 | 0.601 | 0.427 | 0.254 | 0.410 | 0.757 | [Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network, ACL 2018](https://www.aclweb.org/anthology/P18-1103.pdf) | multi-turn |
| DUA (Zhang et al. 2018) | [](https://github.com/cooelf/DeepUtteranceAggregation) | 0.551 | 0.599 | 0.421 | 0.243 | 0.421 | 0.780 | [Modeling Multi-turn Conversation with Deep Utterance Aggregation, arXiv 2018](https://arxiv.org/pdf/1806.09102.pdf) | multi-turn |
| U2U-IMN (Gu et al. 2019a) | [](https://github.com/JasonForJoy/U2U-IMN) | 0.564 | 0.611 | 0.429 | 0.259 | 0.430 | 0.791 | [Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019](https://arxiv.org/pdf/1911.06940v1.pdf) | multi-turn |
| TripleNet (Ma et al. 2019) | [](https://github.com/wtma/TripleNet) | 0.564 | 0.618 | 0.447 | 0.268 | 0.426 | 0.778 | [TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots, arXiv 2019](https://arxiv.org/pdf/1909.10666v2.pdf) | multi-turn |
| IMN (Gu et al. 2019b) | [](https://github.com/JasonForJoy/IMN) | 0.570 | 0.615 | 0.433 | 0.262 | 0.452 | 0.789 | [Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019](https://arxiv.org/pdf/1901.01824v2.pdf) | multi-turn |
| IOI-local (Tao et al. 2019) | [](https://github.com/chongyangtao/IOI) | 0.573 | 0.621 | 0.444 | 0.269 | 0.451 | 0.786 | [One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues, ACL 2019](https://www.aclweb.org/anthology/P19-1001.pdf) | multi-turn |
| MSN (Yuan et al. 2019) | [](https://github.com/chunyuanY/Dialogue) | 0.587 | 0.632 | 0.470 | 0.295 | 0.452 | 0.788 | [Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots, EMNLP-IJCNLP 2019](https://www.aclweb.org/anthology/D19-1011.pdf) | multi-turn |
| SA-BERT (Gu et al. 2020) | [](https://github.com/JasonForJoy/SA-BERT) | **0.619** | **0.659** | **0.496** | **0.313** | **0.481** | **0.847** | [Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2020](https://arxiv.org/pdf/2004.03588v1.pdf) | multi-turn |
| RoBERTaBASE-SS-DA (Lu et al. 2020) | [](https://github.com/CSLujunyu/Improving-Contextual-Language-Modelsfor-Response-Retrieval-in-Multi-Turn-Conversation) | 0.602 | 0.646 | 0.460 | 0.280 | 0.495 | 0.847 | [Improving Contextual Language Models for Response Retrieval in Multi-Turn Conversation, SIGIR 2020](https://dl.acm.org/doi/pdf/10.1145/3397271.3401255) | multi-turn |
| SMN + ECMo (Tao et al. 2020) | N/A | 0.549 | 0.593 | 0.409 | 0.247 | 0.416 | 0.774 | [Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection, SIGIR 2020](https://dl.acm.org/doi/pdf/10.1145/3397271.3401290) | multi-turn |
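Unlike Ubuntu, a Douban context can have several correct responses, which is why MAP and MRR are reported alongside $R_{10}@k$. A small sketch of both measures, assuming candidates have already been sorted by model score (the toy relevance lists are illustrative, not data from any paper above):

```python
def average_precision(relevance):
    """AP for one query: `relevance` holds 0/1 labels of the candidates
    in ranked order (highest-scoring candidate first)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def reciprocal_rank(relevance):
    """RR for one query: 1 / rank of the first relevant candidate."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# Two toy queries; the reported MAP and MRR are means over all queries.
ranked = [[1, 0, 1, 0], [0, 1, 0, 0]]
MAP = sum(average_precision(r) for r in ranked) / len(ranked)
MRR = sum(reciprocal_rank(r) for r in ranked) / len(ranked)
# For these two rankings: MAP = 2/3, MRR = 0.75.
```

P@1 is simply the first element of each relevance list averaged over queries, which is why it tracks $R_{10}@1$ so closely on Douban.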
### MSDialog

| Model | Code | MAP | Recall@1 | Recall@2 | Recall@5 | Paper | type |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| DMN (Yang et al. 2018) | [](https://github.com/yangliuy/NeuralResponseRanking) | 0.6792 | 0.5021 | 0.7122 | 0.9356 | [Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems, arXiv 2018](https://arxiv.org/pdf/1805.00188.pdf) | multi-turn |
### E-commerce Corpus

| Model | Code | MAP | $R_{10}@1$ | $R_{10}@2$ | $R_{10}@5$ | Paper | type |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Multi-View (Zhou et al. 2016) | N/A | — | 0.421 | 0.601 | 0.861 | [Multi-view Response Selection for Human-Computer Conversation, EMNLP 2016](https://www.aclweb.org/anthology/D16-1036.pdf) | multi-turn |
| DL2R (Yan, Song and Wu 2016) | N/A | — | 0.399 | 0.571 | 0.842 | [Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System, SIGIR 2016](http://www.ruiyan.me/pubs/SIGIR2016.pdf) | multi-turn |
| SMN (Wu et al. 2017) | [](https://github.com/MarkWuNLP/MultiTurnResponseSelection) | — | 0.453 | 0.654 | 0.886 | [Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots, ACL 2017](https://www.aclweb.org/anthology/P17-1046.pdf) | multi-turn |
| DAM (Zhou et al. 2018) | [](https://github.com/baidu/Dialogue/tree/master/DAM) | — | 0.526 | 0.727 | 0.933 | [Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network, ACL 2018](https://www.aclweb.org/anthology/P18-1103.pdf) | multi-turn |
| DUA (Zhang et al. 2018) | [](https://github.com/cooelf/DeepUtteranceAggregation) | — | 0.501 | 0.700 | 0.921 | [Modeling Multi-turn Conversation with Deep Utterance Aggregation, arXiv 2018](https://arxiv.org/pdf/1806.09102.pdf) | multi-turn |
| U2U-IMN (Gu et al. 2019a) | [](https://github.com/JasonForJoy/U2U-IMN) | **0.759** | 0.616 | 0.806 | 0.966 | [Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019](https://arxiv.org/pdf/1911.06940v1.pdf) | multi-turn |
| IMN (Gu et al. 2019b) | [](https://github.com/JasonForJoy/IMN) | — | 0.621 | 0.797 | 0.964 | [Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019](https://arxiv.org/pdf/1901.01824v2.pdf) | multi-turn |
| IOI-local (Tao et al. 2019) | [](https://github.com/chongyangtao/IOI) | — | 0.563 | 0.768 | 0.950 | [One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues, ACL 2019](https://www.aclweb.org/anthology/P19-1001.pdf) | multi-turn |
| MSN (Yuan et al. 2019) | [](https://github.com/chunyuanY/Dialogue) | — | 0.606 | 0.770 | 0.937 | [Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots, EMNLP-IJCNLP 2019](https://www.aclweb.org/anthology/D19-1011.pdf) | multi-turn |
| SA-BERT (Gu et al. 2020) | [](https://github.com/JasonForJoy/SA-BERT) | — | 0.704 | 0.879 | **0.985** | [Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2020](https://arxiv.org/pdf/2004.03588v1.pdf) | multi-turn |
| RoBERTaBASE-SS-DA (Lu et al. 2020) | [](https://github.com/CSLujunyu/Improving-Contextual-Language-Modelsfor-Response-Retrieval-in-Multi-Turn-Conversation) | — | **0.800** | **0.910** | 0.972 | [Improving Contextual Language Models for Response Retrieval in Multi-Turn Conversation, SIGIR 2020](https://dl.acm.org/doi/pdf/10.1145/3397271.3401255) | multi-turn |
### Persona-Chat dataset

Original Persona

| Model | Code | $R_{20}@1$ | $R_{20}@2$ | $R_{20}@5$ | Paper | type |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| RSM-DCK (Hua et al. 2020) | N/A | 0.7965 | 0.9021 | 0.9747 | [Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems, CIKM 2020](https://dl.acm.org/doi/pdf/10.1145/3340531.3411967) | multi-turn |

Revised Persona

| Model | Code | $R_{20}@1$ | $R_{20}@2$ | $R_{20}@5$ | Paper | type |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| RSM-DCK (Hua et al. 2020) | N/A | 0.7185 | 0.8494 | 0.9550 | [Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems, CIKM 2020](https://dl.acm.org/doi/pdf/10.1145/3340531.3411967) | multi-turn |

### CMUDoG dataset

| Model | Code | $R_{20}@1$ | $R_{20}@2$ | $R_{20}@5$ | Paper | type |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| RSM-DCK (Hua et al. 2020) | N/A | 0.7925 | 0.8884 | 0.9666 | [Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems, CIKM 2020](https://dl.acm.org/doi/pdf/10.1145/3340531.3411967) | multi-turn |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-slate
--------------------------------------------------------------------------------
/_includes/chart.html:
--------------------------------------------------------------------------------
1 |
20 |
21 |
22 | {% for result in include.results %}
23 | {% assign score = result[include.score] %}
24 |