├── .github
│   └── ISSUE_TEMPLATE
│       ├── bug_report.md
│       └── feature_request.md
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── contributing.md
├── images
│   └── README
│       ├── img-001.png
│       └── img-002.png
└── mds
    ├── named-entity-recognition.md
    ├── rdf.md
    ├── reference-resolution.md
    └── relation-extraction.md
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Go to '...'
16 | 2. Click on '....'
17 | 3. Scroll down to '....'
18 | 4. See error
19 |
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 |
23 | **Screenshots**
24 | If applicable, add screenshots to help explain your problem.
25 |
26 | **Desktop (please complete the following information):**
27 | - OS: [e.g. iOS]
28 | - Browser [e.g. chrome, safari]
29 | - Version [e.g. 22]
30 |
31 | **Smartphone (please complete the following information):**
32 | - Device: [e.g. iPhone6]
33 | - OS: [e.g. iOS8.1]
34 | - Browser [e.g. stock browser, safari]
35 | - Version [e.g. 22]
36 |
37 | **Additional context**
38 | Add any other context about the problem here.
39 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 |
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 |
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 |
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
6 |
7 | ## Our Standards
8 |
9 | Examples of behavior that contributes to creating a positive environment include:
10 |
11 | * Using welcoming and inclusive language
12 | * Being respectful of differing viewpoints and experiences
13 | * Gracefully accepting constructive criticism
14 | * Focusing on what is best for the community
15 | * Showing empathy towards other community members
16 |
17 | Examples of unacceptable behavior by participants include:
18 |
19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances
20 | * Trolling, insulting/derogatory comments, and personal or political attacks
21 | * Public or private harassment
22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission
23 | * Other conduct which could reasonably be considered inappropriate in a professional setting
24 |
25 | ## Our Responsibilities
26 |
27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
28 |
29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
30 |
31 | ## Scope
32 |
33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
34 |
35 | ## Enforcement
36 |
37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at seriousran@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
38 |
39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
40 |
41 | ## Attribution
42 |
43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
44 |
45 | [homepage]: http://contributor-covenant.org
46 | [version]: http://contributor-covenant.org/version/1/4/
47 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | CC0 1.0 Universal
2 |
3 | Statement of Purpose
4 |
5 | The laws of most jurisdictions throughout the world automatically confer
6 | exclusive Copyright and Related Rights (defined below) upon the creator and
7 | subsequent owner(s) (each and all, an "owner") of an original work of
8 | authorship and/or a database (each, a "Work").
9 |
10 | Certain owners wish to permanently relinquish those rights to a Work for the
11 | purpose of contributing to a commons of creative, cultural and scientific
12 | works ("Commons") that the public can reliably and without fear of later
13 | claims of infringement build upon, modify, incorporate in other works, reuse
14 | and redistribute as freely as possible in any form whatsoever and for any
15 | purposes, including without limitation commercial purposes. These owners may
16 | contribute to the Commons to promote the ideal of a free culture and the
17 | further production of creative, cultural and scientific works, or to gain
18 | reputation or greater distribution for their Work in part through the use and
19 | efforts of others.
20 |
21 | For these and/or other purposes and motivations, and without any expectation
22 | of additional consideration or compensation, the person associating CC0 with a
23 | Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
24 | and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
25 | and publicly distribute the Work under its terms, with knowledge of his or her
26 | Copyright and Related Rights in the Work and the meaning and intended legal
27 | effect of CC0 on those rights.
28 |
29 | 1. Copyright and Related Rights. A Work made available under CC0 may be
30 | protected by copyright and related or neighboring rights ("Copyright and
31 | Related Rights"). Copyright and Related Rights include, but are not limited
32 | to, the following:
33 |
34 | i. the right to reproduce, adapt, distribute, perform, display, communicate,
35 | and translate a Work;
36 |
37 | ii. moral rights retained by the original author(s) and/or performer(s);
38 |
39 | iii. publicity and privacy rights pertaining to a person's image or likeness
40 | depicted in a Work;
41 |
42 | iv. rights protecting against unfair competition in regards to a Work,
43 | subject to the limitations in paragraph 4(a), below;
44 |
45 | v. rights protecting the extraction, dissemination, use and reuse of data in
46 | a Work;
47 |
48 | vi. database rights (such as those arising under Directive 96/9/EC of the
49 | European Parliament and of the Council of 11 March 1996 on the legal
50 | protection of databases, and under any national implementation thereof,
51 | including any amended or successor version of such directive); and
52 |
53 | vii. other similar, equivalent or corresponding rights throughout the world
54 | based on applicable law or treaty, and any national implementations thereof.
55 |
56 | 2. Waiver. To the greatest extent permitted by, but not in contravention of,
57 | applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
58 | unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
59 | and Related Rights and associated claims and causes of action, whether now
60 | known or unknown (including existing as well as future claims and causes of
61 | action), in the Work (i) in all territories worldwide, (ii) for the maximum
62 | duration provided by applicable law or treaty (including future time
63 | extensions), (iii) in any current or future medium and for any number of
64 | copies, and (iv) for any purpose whatsoever, including without limitation
65 | commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
66 | the Waiver for the benefit of each member of the public at large and to the
67 | detriment of Affirmer's heirs and successors, fully intending that such Waiver
68 | shall not be subject to revocation, rescission, cancellation, termination, or
69 | any other legal or equitable action to disrupt the quiet enjoyment of the Work
70 | by the public as contemplated by Affirmer's express Statement of Purpose.
71 |
72 | 3. Public License Fallback. Should any part of the Waiver for any reason be
73 | judged legally invalid or ineffective under applicable law, then the Waiver
74 | shall be preserved to the maximum extent permitted taking into account
75 | Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
76 | is so judged Affirmer hereby grants to each affected person a royalty-free,
77 | non transferable, non sublicensable, non exclusive, irrevocable and
78 | unconditional license to exercise Affirmer's Copyright and Related Rights in
79 | the Work (i) in all territories worldwide, (ii) for the maximum duration
80 | provided by applicable law or treaty (including future time extensions), (iii)
81 | in any current or future medium and for any number of copies, and (iv) for any
82 | purpose whatsoever, including without limitation commercial, advertising or
83 | promotional purposes (the "License"). The License shall be deemed effective as
84 | of the date CC0 was applied by Affirmer to the Work. Should any part of the
85 | License for any reason be judged legally invalid or ineffective under
86 | applicable law, such partial invalidity or ineffectiveness shall not
87 | invalidate the remainder of the License, and in such case Affirmer hereby
88 | affirms that he or she will not (i) exercise any of his or her remaining
89 | Copyright and Related Rights in the Work or (ii) assert any associated claims
90 | and causes of action with respect to the Work, in either case contrary to
91 | Affirmer's express Statement of Purpose.
92 |
93 | 4. Limitations and Disclaimers.
94 |
95 | a. No trademark or patent rights held by Affirmer are waived, abandoned,
96 | surrendered, licensed or otherwise affected by this document.
97 |
98 | b. Affirmer offers the Work as-is and makes no representations or warranties
99 | of any kind concerning the Work, express, implied, statutory or otherwise,
100 | including without limitation warranties of title, merchantability, fitness
101 | for a particular purpose, non infringement, or the absence of latent or
102 | other defects, accuracy, or the present or absence of errors, whether or not
103 | discoverable, all to the greatest extent permissible under applicable law.
104 |
105 | c. Affirmer disclaims responsibility for clearing rights of other persons
106 | that may apply to the Work or any use thereof, including without limitation
107 | any person's Copyright and Related Rights in the Work. Further, Affirmer
108 | disclaims responsibility for obtaining any necessary consents, permissions
109 | or other rights required for any use of the Work.
110 |
111 | d. Affirmer understands and acknowledges that Creative Commons is not a
112 | party to this document and has no duty or obligation with respect to this
113 | CC0 or use of the Work.
114 |
115 | For more information, please see
116 |
117 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome Question Answering [](https://github.com/sindresorhus/awesome)
2 |
3 | _A curated list of resources on __[Question Answering (QA)](https://en.wikipedia.org/wiki/Question_answering)__, a computer science discipline within the fields of information retrieval and natural language processing (NLP), increasingly driven by machine learning and deep learning_
4 |
5 | _정보 검색 및 자연 언어 처리 분야의 질의응답에 관한 큐레이션 - 머신러닝과 딥러닝 단계까지_
6 | _问答系统主题的精选列表,是信息检索和自然语言处理领域的计算机科学学科 - 使用机器学习和深度学习_
7 |
8 | ## Contents
9 |
10 |
11 |
12 |
13 | - [Recent Trends](#recent-trends)
14 | - [About QA](#about-qa)
15 | - [Events](#events)
16 | - [Systems](#systems)
17 | - [Competitions in QA](#competitions-in-qa)
18 | - [Publications](#publications)
19 | - [Codes](#codes)
20 | - [Lectures](#lectures)
21 | - [Slides](#slides)
22 | - [Dataset Collections](#dataset-collections)
23 | - [Datasets](#datasets)
24 | - [Books](#books)
25 | - [Links](#links)
26 |
27 |
28 |
29 | ## Recent Trends
30 | ### Recent QA Models
31 | - DilBert: Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering (2020)
32 | - paper: https://arxiv.org/pdf/2010.08422.pdf
33 | - github: https://github.com/wissam-sib/dilbert
34 | - UnifiedQA: Crossing Format Boundaries With a Single QA System (2020)
35 | - Demo: https://unifiedqa.apps.allenai.org/
36 | - ProQA: Resource-efficient method for pretraining a dense corpus index for open-domain QA and IR (2020)
37 | - paper: https://arxiv.org/pdf/2005.00038.pdf
38 | - github: https://github.com/xwhan/ProQA
39 | - TYDI QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages (2020)
40 | - paper: https://arxiv.org/ftp/arxiv/papers/2003/2003.05002.pdf
41 | - Retrospective Reader for Machine Reading Comprehension
42 | - paper: https://arxiv.org/pdf/2001.09694v2.pdf
43 | - TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection (AAAI 2020)
44 | - paper: https://arxiv.org/pdf/1911.04118.pdf
45 | ### Recent Language Models
46 | - [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB), Kevin Clark, et al., ICLR, 2020.
47 | - [TinyBERT: Distilling BERT for Natural Language Understanding](https://openreview.net/pdf?id=rJx0Q6EFPB), Xiaoqi Jiao, et al., ICLR, 2020.
48 | - [MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers](https://arxiv.org/abs/2002.10957), Wenhui Wang, et al., arXiv, 2020.
49 | - [T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683), Colin Raffel, et al., arXiv preprint, 2019.
50 | - [ERNIE: Enhanced Language Representation with Informative Entities](https://arxiv.org/abs/1905.07129), Zhengyan Zhang, et al., ACL, 2019.
51 | - [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237), Zhilin Yang, et al., arXiv preprint, 2019.
52 | - [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), Zhenzhong Lan, et al., arXiv preprint, 2019.
53 | - [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692), Yinhan Liu, et al., arXiv preprint, 2019.
54 | - [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/pdf/1910.01108.pdf), Victor Sanh, et al., arXiv, 2019.
55 | - [SpanBERT: Improving Pre-training by Representing and Predicting Spans](https://arxiv.org/pdf/1907.10529v3.pdf), Mandar Joshi, et al., TACL, 2019.
56 | - [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805), Jacob Devlin, et al., NAACL 2019, 2018.
57 | ### AAAI 2020
58 | - [TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection](https://arxiv.org/pdf/1911.04118.pdf), Siddhant Garg, et al., AAAI 2020, Nov 2019.
59 | ### ACL 2019
60 | - [Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering](https://www.aclweb.org/anthology/W19-5039), Asma Ben Abacha, et al., ACL-W 2019, Aug 2019.
62 | - [Towards Scalable and Reliable Capsule Networks for Challenging NLP Applications](https://arxiv.org/pdf/1906.02829v1.pdf), Wei Zhao, et al., ACL 2019, Jun 2019.
63 | - [Cognitive Graph for Multi-Hop Reading Comprehension at Scale](https://arxiv.org/pdf/1905.05460v2.pdf), Ming Ding, et al., ACL 2019, Jun 2019.
64 | - [Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index](https://arxiv.org/abs/1906.05807), Minjoon Seo, et al., ACL 2019, Jun 2019.
65 | - [Unsupervised Question Answering by Cloze Translation](https://arxiv.org/abs/1906.04980), Patrick Lewis, et al., ACL 2019, Jun 2019.
66 | - [SemEval-2019 Task 10: Math Question Answering](https://www.aclweb.org/anthology/S19-2153), Mark Hopkins, et al., ACL-W 2019, Jun 2019.
67 | - [Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader](https://arxiv.org/abs/1905.07098), Wenhan Xiong, et al., ACL 2019, May 2019.
68 | - [Matching Article Pairs with Graphical Decomposition and Convolutions](https://arxiv.org/pdf/1802.07459v2.pdf), Bang Liu, et al., ACL 2019, May 2019.
69 | - [Episodic Memory Reader: Learning what to Remember for Question Answering from Streaming Data](https://arxiv.org/abs/1903.06164), Moonsu Han, et al., ACL 2019, Mar 2019.
70 | - [Natural Questions: a Benchmark for Question Answering Research](https://ai.google/research/pubs/pub47761), Tom Kwiatkowski, et al., TACL 2019, Jan 2019.
71 | - [Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension](https://arxiv.org/abs/1811.00232), Daesik Kim, et al., ACL 2019, Nov 2018.
72 | ### EMNLP-IJCNLP 2019
73 | - [Language Models as Knowledge Bases?](https://arxiv.org/pdf/1909.01066v2.pdf), Fabio Petroni, et al., EMNLP-IJCNLP 2019, Sep 2019.
74 | - [LXMERT: Learning Cross-Modality Encoder Representations from Transformers](https://arxiv.org/pdf/1908.07490v3.pdf), Hao Tan, et al., EMNLP-IJCNLP 2019, Dec 2019.
75 | - [Answering Complex Open-domain Questions Through Iterative Query Generation](https://arxiv.org/pdf/1910.07000v1.pdf), Peng Qi, et al., EMNLP-IJCNLP 2019, Oct 2019.
76 | - [KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning](https://arxiv.org/pdf/1909.02151v1.pdf), Bill Yuchen Lin, et al., EMNLP-IJCNLP 2019, Sep 2019.
77 | - [Mixture Content Selection for Diverse Sequence Generation](https://arxiv.org/pdf/1909.01953v1.pdf), Jaemin Cho, et al., EMNLP-IJCNLP 2019, Sep 2019.
78 | - [A Discrete Hard EM Approach for Weakly Supervised Question Answering](https://arxiv.org/pdf/1909.04849v1.pdf), Sewon Min, et al., EMNLP-IJCNLP, 2019, Sep 2019.
79 | ### Arxiv
80 | - [Investigating the Successes and Failures of BERT for Passage Re-Ranking](https://arxiv.org/abs/1905.01758), Harshith Padigela, et al., arXiv preprint, May 2019.
81 | - [BERT with History Answer Embedding for Conversational Question Answering](https://arxiv.org/abs/1905.05412), Chen Qu, et al., arXiv preprint, May 2019.
82 | - [Understanding the Behaviors of BERT in Ranking](https://arxiv.org/abs/1904.07531), Yifan Qiao, et al., arXiv preprint, Apr 2019.
83 | - [BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis](https://arxiv.org/abs/1904.02232), Hu Xu, et al., arXiv preprint, Apr 2019.
84 | - [End-to-End Open-Domain Question Answering with BERTserini](https://arxiv.org/abs/1902.01718), Wei Yang, et al., arXiv preprint, Feb 2019.
85 | - [A BERT Baseline for the Natural Questions](https://arxiv.org/abs/1901.08634), Chris Alberti, et al., arXiv preprint, Jan 2019.
86 | - [Passage Re-ranking with BERT](https://arxiv.org/abs/1901.04085), Rodrigo Nogueira, et al., arXiv preprint, Jan 2019.
87 | - [SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering](https://arxiv.org/abs/1812.03593), Chenguang Zhu, et al., arXiv, Dec 2018.
88 | ### Dataset
89 | - [ELI5: Long Form Question Answering](https://arxiv.org/abs/1907.09190), Angela Fan, et al., ACL 2019, Jul 2019.
90 | - [CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense](https://www.aclweb.org/anthology/W19-2008.pdf), Michael Chen, et al., RepEval 2019, Jun 2019.
92 |
93 | ## About QA
94 | ### Types of QA
95 | - Single-turn QA: answers each question independently, without conversational context
96 | - Conversational QA: uses previous conversation turns as context
97 | #### Subtypes of QA
98 | - Knowledge-based QA
99 | - Table/List-based QA
100 | - Text-based QA
101 | - Community-based QA
102 | - Visual QA
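The single-turn vs. conversational distinction above mostly changes what the reader model is given as input: a conversational system also feeds the previous turns to the model. A minimal sketch (the `[SEP]` separator string is an assumption for illustration; real systems use their tokenizer's own convention):

```python
def build_input(question, history=None, sep=" [SEP] "):
    """Concatenate prior turns with the current question (conversational QA).

    With no history this degenerates to the single-turn case.
    """
    turns = list(history or []) + [question]
    return sep.join(turns)

single_turn = build_input("Where was he born?")
conversational = build_input("Where was he born?",
                             history=["Who wrote Hamlet?", "William Shakespeare"])
print(single_turn)       # Where was he born?
print(conversational)    # Who wrote Hamlet? [SEP] William Shakespeare [SEP] Where was he born?
```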
103 |
104 | ### Analysis and Parsing for Pre-processing in QA systems
105 | Language Analysis
106 | 1. [Morphological analysis](https://www.cs.bham.ac.uk/~pjh/sem1a5/pt2/pt2_intro_morphology.html)
107 | 2. [Named Entity Recognition(NER)](mds/named-entity-recognition.md)
108 | 3. Homonyms / Polysemy Analysis
109 | 4. Syntactic Parsing (Dependency Parsing)
110 | 5. Semantic Recognition
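As a toy illustration of the first two steps above (naive suffix stripping and gazetteer-based NER), assuming nothing about any particular NLP library; real systems would use spaCy, Stanza, or similar, and the entity list here is hypothetical:

```python
GAZETTEER = {"ibm": "ORG", "watson": "PRODUCT", "london": "LOC"}  # toy entity list

def morph_analyze(token: str) -> str:
    """Very naive morphological analysis: strip common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def tag_entities(tokens):
    """Label tokens found in the gazetteer; 'O' means outside any entity."""
    return [(t, GAZETTEER.get(t.lower(), "O")) for t in tokens]

tokens = "IBM launched Watson".split()
print([morph_analyze(t) for t in tokens])  # ['IBM', 'launch', 'Watson']
print(tag_entities(tokens))  # [('IBM', 'ORG'), ('launched', 'O'), ('Watson', 'PRODUCT')]
```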
111 |
112 | ### Most QA systems have roughly 3 parts
113 | 1. Fact extraction
114 | 1. Entity Extraction
115 | 1. [Named-Entity Recognition(NER)](mds/named-entity-recognition.md)
116 | 2. [Relation Extraction](mds/relation-extraction.md)
117 | 2. Understanding the question
118 | 3. Generating an answer
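A deliberately tiny sketch of these three stages, with hypothetical facts and question patterns (real systems replace each stage with learned models):

```python
import re

# 1. Fact extraction: triples that NER + relation extraction would produce.
FACTS = [("Watson", "developed_by", "IBM"),
         ("Siri", "developed_by", "Apple")]

def understand(question: str):
    """2. Question understanding: map a surface pattern to (relation, subject)."""
    m = re.match(r"Who developed (\w+)\?", question)
    if m:
        return ("developed_by", m.group(1))
    return None

def answer(question: str) -> str:
    """3. Answer generation: look up the matching fact and verbalize it."""
    parsed = understand(question)
    if parsed:
        rel, subj = parsed
        for s, r, o in FACTS:
            if s == subj and r == rel:
                return f"{subj} was developed by {o}."
    return "I don't know."

print(answer("Who developed Watson?"))  # Watson was developed by IBM.
```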
119 |
120 | ## Events
121 | - Wolfram Alpha launched its answer engine in 2009.
122 | - IBM Watson system defeated top *[Jeopardy!](https://www.jeopardy.com)* champions in 2011.
123 | - Apple's Siri integrated Wolfram Alpha's answer engine in 2011.
124 | - Google embraced QA by launching its Knowledge Graph, leveraging the Freebase knowledge base, in 2012.
125 | - Amazon Echo | Alexa (2015), Google Home | Google Assistant (2016), INVOKE | MS Cortana (2017), HomePod (2017)
126 |
127 | ## Systems
128 | - [IBM Watson](https://www.ibm.com/watson/) - Has state-of-the-art performance.
129 | - [Facebook DrQA](https://research.fb.com/downloads/drqa/) - Applied to the SQuAD1.0 dataset. SQuAD2.0 has since been released, but DrQA has not yet been evaluated on it.
130 | - [MIT Media Lab's knowledge graph, ConceptNet](http://conceptnet.io/) - A freely available semantic network, designed to help computers understand the meanings of the words that people use.
131 |
132 | ## Competitions in QA
133 |
134 | | | Dataset | Language | Organizer | Since | Top Rank | Model | Status | Over Human Performance |
135 | |---|------------------|---------------|---------------------|-------|-------------------------|-------------------------|--------|------------------------|
136 | | 0 | [Story Cloze Test](http://cs.rochester.edu/~nasrinm/files/Papers/lsdsem17-shared-task.pdf) | English | Univ. of Rochester | 2016 | msap | Logistic regression | Closed | x |
137 | | 1 | MS MARCO | English | Microsoft | 2016 | YUANFUDAO research NLP | MARS | Closed | o |
138 | | 2 | MS MARCO V2 | English | Microsoft | 2018 | NTT Media Intelli. Lab. | Masque Q&A Style | Opened | x |
139 | | 3 | [SQuAD](https://arxiv.org/abs/1606.05250) | English | Univ. of Stanford | 2018 | XLNet Team | XLNet (single model) | Closed | o |
140 | | 4 | [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) | English | Univ. of Stanford | 2018 | PINGAN Omni-Sinitic | ALBERT + DAAF + Verifier (ensemble) | Opened | o |
141 | | 5 | [TriviaQA](http://nlp.cs.washington.edu/triviaqa/) | English | Univ. of Washington | 2017 | Ming Yan | - | Closed | - |
142 | | 6 | [decaNLP](https://decanlp.com/) | English | Salesforce Research | 2018 | Salesforce Research | MQAN | Closed | x |
143 | | 7 | [DuReader Ver1.](https://ai.baidu.com/broad/introduction) | Chinese | Baidu | 2015 | Tryer | T-Reader (single) | Closed | x |
144 | | 8 | [DuReader Ver2.](https://ai.baidu.com/broad/introduction) | Chinese | Baidu | 2017 | renaissance | AliReader | Opened | - |
145 | | 9 | [KorQuAD](https://korquad.github.io/KorQuad%201.0/) | Korean | LG CNS AI Research | 2018 | Clova AI LaRva Team | LaRva-Kor-Large+ + CLaF (single) | Closed | o |
146 | | 10 | [KorQuAD 2.0](https://korquad.github.io/) | Korean | LG CNS AI Research | 2019 | Kangwon National University | KNU-baseline(single model) | Opened | x |
147 | | 11 | [CoQA](https://stanfordnlp.github.io/coqa/) | English | Univ. of Stanford | 2018 | Zhuiyi Technology | RoBERTa + AT + KD (ensemble) | Opened | o |
148 |
149 | ## Publications
150 | - Papers
151 | - ["Learning to Skim Text"](https://arxiv.org/pdf/1704.06877.pdf), Adams Wei Yu, Hongrae Lee, Quoc V. Le, 2017.
152 |   : Learns to read only the relevant parts of a text.
153 | - ["Deep Joint Entity Disambiguation with Local Neural Attention"](https://arxiv.org/pdf/1704.04920.pdf), Octavian-Eugen Ganea and Thomas Hofmann, 2017.
154 | - ["BI-DIRECTIONAL ATTENTION FLOW FOR MACHINE COMPREHENSION"](https://arxiv.org/pdf/1611.01603.pdf), Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hananneh Hajishirzi, ICLR, 2017.
155 | - ["Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks"](http://nlp.cs.berkeley.edu/pubs/FrancisLandau-Durrett-Klein_2016_EntityConvnets_paper.pdf), Matthew Francis-Landau, Greg Durrett and Dan Klein, NAACL-HLT 2016.
156 | - https://GitHub.com/matthewfl/nlp-entity-convnet
157 | - ["Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions"](https://ieeexplore.ieee.org/document/6823700/), Wei Shen, Jianyong Wang, Jiawei Han, IEEE Transactions on Knowledge and Data Engineering(TKDE), 2014.
158 | - ["Introduction to “This is Watson"](https://ieeexplore.ieee.org/document/6177724/), IBM Journal of Research and Development, D. A. Ferrucci, 2012.
159 | - ["A survey on question answering technology from an information retrieval perspective"](https://www.sciencedirect.com/science/article/pii/S0020025511003860), Information Sciences, 2011.
160 | - ["Question Answering in Restricted Domains: An Overview"](https://www.mitpressjournals.org/doi/abs/10.1162/coli.2007.33.1.41), Diego Mollá and José Luis Vicedo, Computational Linguistics, 2007.
161 | - ["Natural language question answering: the view from here"](), L Hirschman, R Gaizauskas, Natural Language Engineering, 2001.
162 | - Entity Disambiguation / Entity Linking
163 |
164 | ## Codes
165 | - [BiDAF](https://github.com/allenai/bi-att-flow) - Bi-Directional Attention Flow (BIDAF) network is a multi-stage hierarchical process that represents the context at different levels of granularity and uses bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
166 | - Official; Tensorflow v1.2
167 | - [Paper](https://arxiv.org/pdf/1611.01603.pdf)
168 | - [QANet](https://github.com/NLPLearn/QANet) - A Q&A architecture that does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.
169 | - Google; Unofficial; Tensorflow v1.5
170 | - [Paper](#qanet)
171 | - [R-Net](https://github.com/HKUST-KnowComp/R-Net) - An end-to-end neural network model for reading-comprehension-style question answering, which aims to answer questions from a given passage.
172 | - MS; Unofficially by HKUST; Tensorflow v1.5
173 | - [Paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)
174 | - [R-Net-in-Keras](https://github.com/YerevaNN/R-NET-in-Keras) - R-NET re-implementation in Keras.
175 | - MS; Unofficial; Keras v2.0.6
176 | - [Paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)
177 | - [DrQA](https://github.com/hitvoice/DrQA) - DrQA is a system for reading comprehension applied to open-domain question answering.
178 | - Facebook; Official; Pytorch v0.4
179 | - [Paper](#drqa)
180 | - [BERT](https://github.com/google-research/bert) - A new language representation model which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
181 | - Google; Official implementation; Tensorflow v1.11.0
182 | - [Paper](https://arxiv.org/abs/1810.04805)
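Several of the models above (BiDAF, R-Net) center on attention between the question and the passage. A minimal pure-Python sketch of context-to-query attention with toy vectors and dot-product scores (real models learn a richer similarity function; all numbers here are made up):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def context_to_query_attention(context, query):
    """For each context vector, mix query vectors by attention weight."""
    out = []
    for c in context:
        # Score each query position against this context position.
        scores = [sum(ci * qi for ci, qi in zip(c, q)) for q in query]
        weights = softmax(scores)
        # Weighted sum of query vectors: a query-aware summary of this position.
        mixed = [sum(w * q[d] for w, q in zip(weights, query))
                 for d in range(len(query[0]))]
        out.append(mixed)
    return out

context = [[1.0, 0.0], [0.0, 1.0]]   # 2 context positions, dim 2
query = [[1.0, 0.0], [0.0, 1.0]]     # 2 query positions, dim 2
attended = context_to_query_attention(context, query)
print(attended)  # each context position attends most to its matching query vector
```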
183 |
184 | ## Lectures
185 | - [Question Answering - Natural Language Processing](https://youtu.be/Kzi6tE4JaGo) - By Dragomir Radev, Ph.D. | University of Michigan | 2016.
186 |
187 | ## Slides
188 | - [Question Answering with Knowledge Bases, Web and Beyond](https://github.com/scottyih/Slides/blob/master/QA%20Tutorial.pdf) - By Scott Wen-tau Yih & Hao Ma | Microsoft Research | 2016.
189 | - [Question Answering](https://hpi.de/fileadmin/user_upload/fachgebiete/plattner/teaching/NaturalLanguageProcessing/NLP2017/NLP8_QuestionAnswering.pdf) - By Dr. Mariana Neves | Hasso Plattner Institut | 2017.
190 |
191 | ## Dataset Collections
192 | - [NLIWOD's Question answering datasets](https://github.com/dice-group/NLIWOD/tree/master/qa.datasets)
193 | - [karthikncode's Datasets for Natural Language Processing](https://github.com/karthikncode/nlp-datasets)
194 |
195 | ## Datasets
196 | - [AI2 Science Questions v2.1(2017)](http://data.allenai.org/ai2-science-questions/)
197 |   - It consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram element.
198 | - Paper: http://ai2-website.s3.amazonaws.com/publications/AI2ReasoningChallenge2018.pdf
199 | - [Children's Book Test](https://uclmr.github.io/ai4exams/data.html)
200 |   - It is part of Facebook AI Research's bAbI project, which is organized toward the goal of automatic text understanding and reasoning. The CBT is designed to measure directly how well language models can exploit wider linguistic context.
201 | - [CODAH Dataset](https://github.com/Websail-NU/CODAH)
202 | - [DeepMind Q&A Dataset; CNN/Daily Mail](https://github.com/deepmind/rc-data)
203 |   - Hermann et al. (2015) created two datasets from news articles for Q&A research. Each dataset contains many documents (90k and 197k respectively), and each document is accompanied by about 4 questions on average. Each question is a sentence with one missing word/phrase which can be found in the accompanying document/context.
204 | - Paper: https://arxiv.org/abs/1506.03340
205 | - [ELI5](https://github.com/facebookresearch/ELI5)
206 | - Paper: https://arxiv.org/abs/1907.09190
207 | - [GraphQuestions](https://github.com/ysu1989/GraphQuestions)
208 | - On generating Characteristic-rich Question sets for QA evaluation.
209 | - [LC-QuAD](http://sda.cs.uni-bonn.de/projects/qa-dataset/)
210 |   - It is a gold-standard KBQA (Question Answering over Knowledge Base) dataset containing 5,000 questions and their corresponding SPARQL queries. LC-QuAD uses DBpedia v04.16 as the target KB.
211 | - [MS MARCO](http://www.msmarco.org/dataset.aspx)
212 |   - A large-scale dataset for real-world question answering, built from anonymized Bing search queries.
213 | - Paper: https://arxiv.org/abs/1611.09268
214 | - [MultiRC](https://cogcomp.org/multirc/)
215 | - A dataset of short paragraphs and multi-sentence questions
216 | - Paper: http://cogcomp.org/page/publication_view/833
217 | - [NarrativeQA](https://github.com/deepmind/narrativeqa)
218 |   - It includes a list of documents with Wikipedia summaries, links to full stories, and questions and answers.
219 | - Paper: https://arxiv.org/pdf/1712.07040v1.pdf
220 | - [NewsQA](https://github.com/Maluuba/newsqa)
221 | - A machine comprehension dataset
222 | - Paper: https://arxiv.org/pdf/1611.09830.pdf
223 | - [Question-Answer Dataset by CMU](http://www.cs.cmu.edu/~ark/QA-data/)
224 | - This is a corpus of Wikipedia articles, manually-generated factoid questions from them, and manually-generated answers to these questions, for use in academic research. These data were collected by Noah Smith, Michael Heilman, Rebecca Hwa, Shay Cohen, Kevin Gimpel, and many students at Carnegie Mellon University and the University of Pittsburgh between 2008 and 2010.
225 | - [SQuAD1.0](https://rajpurkar.github.io/SQuAD-explorer/)
226 | - Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
227 | - Paper: https://arxiv.org/abs/1606.05250
228 | - [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/)
229 | - SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
230 | - Paper: https://arxiv.org/abs/1806.03822
231 | - [Story cloze test](http://cs.rochester.edu/nlp/rocstories/)
232 | - 'Story Cloze Test' is a new commonsense reasoning framework for evaluating story understanding, story generation, and script learning. This test requires a system to choose the correct ending to a four-sentence story.
233 | - Paper: https://arxiv.org/abs/1604.01696
234 | - [TriviaQA](http://nlp.cs.washington.edu/triviaqa/)
235 | - TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions.
236 | - Paper: https://arxiv.org/abs/1705.03551
237 | - [WikiQA](https://www.microsoft.com/en-us/download/details.aspx?id=52419&from=https%3A%2F%2Fresearch.microsoft.com%2Fen-US%2Fdownloads%2F4495da01-db8c-4041-a7f6-7984a4f6a905%2Fdefault.aspx)
238 | - A publicly available set of question and sentence pairs for open-domain question answering.
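
SQuAD and the other SQuAD-style span-based datasets above ship as JSON with character-offset answers. As a rough sketch of walking that layout (the miniature record below is invented for illustration, not taken from the real files, which would be loaded with `json.load`):

```python
# A hypothetical miniature of the SQuAD v1.1 JSON layout: the files follow
# data -> paragraphs -> qas -> answers, where answer_start is a character
# offset into the paragraph's context string.
squad = {
    "data": [{
        "title": "Example_Article",
        "paragraphs": [{
            "context": "SQuAD was released by Stanford in 2016.",
            "qas": [{
                "id": "q1",
                "question": "Who released SQuAD?",
                "answers": [{"text": "Stanford", "answer_start": 22}],
            }],
        }],
    }],
}

pairs = []
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            for answer in qa["answers"]:
                # Recover the answer span directly from the context offsets.
                start = answer["answer_start"]
                span = context[start:start + len(answer["text"])]
                pairs.append((qa["question"], span))

print(pairs)  # [('Who released SQuAD?', 'Stanford')]
```

SQuAD2.0 adds an `is_impossible` flag and `plausible_answers` per question, but the nesting is the same.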
239 |
240 | ### Publications from IBM Watson's DeepQA research team (last 5 years)
241 | - 2015
242 | - "Automated Problem List Generation from Electronic Medical Records in IBM Watson", Murthy Devarakonda, Ching-Huei Tsou, IAAI, 2015.
243 | - "Decision Making in IBM Watson Question Answering", J. William Murdock, Ontology summit, 2015.
244 | - ["Unsupervised Entity-Relation Analysis in IBM Watson"](http://www.cogsys.org/papers/ACS2015/article12.pdf), Aditya Kalyanpur, J William Murdock, ACS, 2015.
245 | - "Commonsense Reasoning: An Event Calculus Based Approach", E T Mueller, Morgan Kaufmann/Elsevier, 2015.
246 | - 2014
247 | - "Problem-oriented patient record summary: An early report on a Watson application", M. Devarakonda, Dongyang Zhang, Ching-Huei Tsou, M. Bornea, Healthcom, 2014.
248 | - ["WatsonPaths: Scenario-based Question Answering and Inference over Unstructured Information"](http://domino.watson.ibm.com/library/Cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/088f74984a07645485257d5f006ace96!OpenDocument&Highlight=0,RC25489), Adam Lally, Sugato Bachi, Michael A. Barborak, David W. Buchanan, Jennifer Chu-Carroll, David A. Ferrucci*, Michael R. Glass, Aditya Kalyanpur, Erik T. Mueller, J. William Murdock, Siddharth Patwardhan, John M. Prager, Christopher A. Welty, IBM Research Report RC25489, 2014.
249 | - ["Medical Relation Extraction with Manifold Models"](http://acl2014.org/acl2014/P14-1/pdf/P14-1078.pdf), Chang Wang and James Fan, ACL, 2014.
250 |
251 | ### Publications from Microsoft Research (last 5 years)
252 | - 2018
253 | - "Characterizing and Supporting Question Answering in Human-to-Human Communication", Xiao Yang, Ahmed Hassan Awadallah, Madian Khabsa, Wei Wang, Miaosen Wang, ACM SIGIR, 2018.
254 | - ["FigureQA: An Annotated Figure Dataset for Visual Reasoning"](https://arxiv.org/abs/1710.07300), Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio, ICLR, 2018
255 | - 2017
256 | - "Multi-level Attention Networks for Visual Question Answering", Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui, CVPR, 2017.
257 | - "A Joint Model for Question Answering and Question Generation", Tong Wang, Xingdi (Eric) Yuan, Adam Trischler, ICML, 2017.
258 | - "Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension", David Golub, Po-Sen Huang, Xiaodong He, Li Deng, EMNLP, 2017.
259 |   - "Question-Answering with Grammatically-Interpretable Representations", Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng.
260 | - "Search-based Neural Structured Learning for Sequential Question Answering", Mohit Iyyer, Wen-tau Yih, Ming-Wei Chang, ACL, 2017.
261 | - 2016
262 | - ["Stacked Attention Networks for Image Question Answering"](https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Yang_Stacked_Attention_Networks_CVPR_2016_paper.html), Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola, CVPR, 2016.
263 | - ["Question Answering with Knowledge Base, Web and Beyond"](https://www.microsoft.com/en-us/research/publication/question-answering-with-knowledge-base-web-and-beyond/), Yih, Scott Wen-tau and Ma, Hao, ACM SIGIR, 2016.
264 | - ["NewsQA: A Machine Comprehension Dataset"](https://arxiv.org/abs/1611.09830), Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman, RepL4NLP, 2016.
265 | - ["Table Cell Search for Question Answering"](https://dl.acm.org/citation.cfm?id=2883080), Sun, Huan and Ma, Hao and He, Xiaodong and Yih, Wen-tau and Su, Yu and Yan, Xifeng, WWW, 2016.
266 | - 2015
267 | - ["WIKIQA: A Challenge Dataset for Open-Domain Question Answering"](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/YangYihMeek_EMNLP-15_WikiQA.pdf), Yi Yang, Wen-tau Yih, and Christopher Meek, EMNLP, 2015.
268 | - ["Web-based Question Answering: Revisiting AskMSR"](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/AskMSRPlusTR_082815.pdf), Chen-Tse Tsai, Wen-tau Yih, and Christopher J.C. Burges, MSR-TR, 2015.
269 | - ["Open Domain Question Answering via Semantic Enrichment"](https://dl.acm.org/citation.cfm?id=2741651), Huan Sun, Hao Ma, Wen-tau Yih, Chen-Tse Tsai, Jingjing Liu, and Ming-Wei Chang, WWW, 2015.
270 | - 2014
271 | - ["An Overview of Microsoft Deep QA System on Stanford WebQuestions Benchmark"](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/Microsoft20Deep20QA.pdf), Zhenghao Wang, Shengquan Yan, Huaming Wang, and Xuedong Huang, MSR-TR, 2014.
272 |   - "Semantic Parsing for Single-Relation Question Answering", Wen-tau Yih, Xiaodong He, Christopher Meek, ACL, 2014.
273 |
274 | ### Publications from Google AI (last 5 years)
275 | - 2018
276 | - Google QA
277 | - ["QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension"](https://openreview.net/pdf?id=B14TlG-RW), Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le, ICLR, 2018.
278 | - ["Ask the Right Questions: Active Question Reformulation with Reinforcement Learning"](https://openreview.net/pdf?id=S1CChZ-CZ), Christian Buck and Jannis Bulian and Massimiliano Ciaramita and Wojciech Paweł Gajewski and Andrea Gesmundo and Neil Houlsby and Wei Wang, ICLR, 2018.
279 | - ["Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors"](https://arxiv.org/pdf/1612.04342.pdf), Radu Soricut, Nan Ding, 2018.
280 | - Sentence representation
281 | - ["An efficient framework for learning sentence representations"](https://arxiv.org/pdf/1803.02893.pdf), Lajanugen Logeswaran, Honglak Lee, ICLR, 2018.
282 | - ["Did the model understand the question?"](https://arxiv.org/pdf/1805.05492.pdf), Pramod K. Mudrakarta and Ankur Taly and Mukund Sundararajan and Kedar Dhamdhere, ACL, 2018.
283 | - 2017
284 | - ["Analyzing Language Learned by an Active Question Answering Agent"](https://arxiv.org/pdf/1801.07537.pdf), Christian Buck and Jannis Bulian and Massimiliano Ciaramita and Wojciech Gajewski and Andrea Gesmundo and Neil Houlsby and Wei Wang, NIPS, 2017.
285 | - ["Learning Recurrent Span Representations for Extractive Question Answering"](https://arxiv.org/pdf/1611.01436.pdf), Kenton Lee and Shimi Salant and Tom Kwiatkowski and Ankur Parikh and Dipanjan Das and Jonathan Berant, ICLR, 2017.
286 | - Identify the same question
287 | - ["Neural Paraphrase Identification of Questions with Noisy Pretraining"](https://arxiv.org/pdf/1704.04565.pdf), Gaurav Singh Tomar and Thyago Duque and Oscar Täckström and Jakob Uszkoreit and Dipanjan Das, SCLeM, 2017.
288 | - 2014
289 | - "Great Question! Question Quality in Community Q&A", Sujith Ravi and Bo Pang and Vibhor Rastogi and Ravi Kumar, ICWSM, 2014.
290 |
291 | ### Publications from Facebook AI Research (last 5 years)
292 | - 2018
293 | - [Embodied Question Answering](https://research.fb.com/publications/embodied-question-answering/), Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, and Dhruv Batra, CVPR, 2018
294 | - [Do explanations make VQA models more predictable to a human?](https://research.fb.com/publications/do-explanations-make-vqa-models-more-predictable-to-a-human/), Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, and Devi Parikh, EMNLP, 2018
295 | - [Neural Compositional Denotational Semantics for Question Answering](https://research.fb.com/publications/neural-compositional-denotational-semantics-for-question-answering/), Nitish Gupta, Mike Lewis, EMNLP, 2018
296 | - 2017
297 | - DrQA
298 | - [Reading Wikipedia to Answer Open-Domain Questions](https://cs.stanford.edu/people/danqi/papers/acl2017.pdf), Danqi Chen, Adam Fisch, Jason Weston & Antoine Bordes, ACL, 2017.
299 |
300 | ## Books
301 | - Natural Language Question Answering System - Boris Galitsky (2003)
302 | - New Directions in Question Answering - Mark T. Maybury (2004)
303 | - Part 3. 5. Question Answering in The Oxford Handbook of Computational Linguistics - Sanda Harabagiu and Dan Moldovan (2005)
304 | - Chap.28 Question Answering in Speech and Language Processing - Daniel Jurafsky & James H. Martin (2017)
305 |
306 | ## Links
307 | - [Building a Question-Answering System from Scratch— Part 1](https://towardsdatascience.com/building-a-question-answering-system-part-1-9388aadff507)
308 | - [Question Answering with TensorFlow by Steven Hewitt, O'Reilly, 2017](https://www.oreilly.com/ideas/question-answering-with-tensorflow)
309 | - [Why question answering is hard](http://nicklothian.com/blog/2014/09/25/why-question-answering-is-hard/)
310 |
311 |
312 | ## Contributing
313 |
314 | Contributions welcome! Read the [contribution guidelines](contributing.md) first.
315 |
316 | ## License
317 | [](https://creativecommons.org/share-your-work/public-domain/cc0/)
318 |
319 | To the extent possible under law, [seriousmac](https://github.com/seriousmac) (the maintainer) has waived all copyright and related or neighboring rights to this work.
320 |
--------------------------------------------------------------------------------
/contributing.md:
--------------------------------------------------------------------------------
1 | # Contribution Guidelines
2 |
3 | Please ensure your pull request adheres to the following guidelines:
4 |
5 | - Search previous suggestions before making a new one, as yours may be a duplicate.
6 | - Suggested entries should refer to tested and documented code.
7 | - Projects should be relevant to the topic and (preferably) under active development.
8 | - Make an individual pull request for each suggestion.
9 | - New categories, or improvements to the existing categorization are welcome.
10 | - Keep descriptions short and simple, but descriptive.
11 | - End all descriptions with a full stop/period.
12 | - Check your spelling and grammar.
13 | - Make sure your text editor is set to remove trailing whitespace.
14 |
15 | Markdown format to use for new entries (see existing examples):
16 |
17 | - If there is only a repository, use the following format: `[NAME](REPO-LINK) - DESCRIPTION.`
18 | - If there is a website and a repository, use the following format: `[NAME](WEBSITE-LINK) | [GITHUB-ICON](REPO-LINK) - DESCRIPTION.`
19 |
20 | Thank you for your suggestions!
21 |
22 | ## Adding entries to the list
23 |
24 | If you have something awesome to contribute to this list, this is how you do it. (note: instructions copied from [the-root-of-all-awesome](https://github.com/sindresorhus/awesome) lists :smile:)
25 |
26 | You'll need a [GitHub account](https://github.com/join)!
27 |
28 | 1. Click on the `readme.md` file: 
29 | 2. Now click on the edit icon. 
30 | 3. You can start editing the text of the file in the in-browser editor. Make sure you follow guidelines above. You can use [GitHub Flavored Markdown](https://help.github.com/articles/github-flavored-markdown/). 
31 | 4. Say why you're proposing the changes, and then click on "Propose file change". 
32 | 5. Submit the [pull request](https://help.github.com/articles/using-pull-requests/)!
33 |
34 | ## Updating your Pull Request
35 |
36 | Sometimes, a maintainer of an awesome list will ask you to edit your Pull Request before it is included. This is normally due to spelling errors or because your PR didn't match the awesome-* list guidelines.
37 |
38 | [Here](https://github.com/RichardLitt/knowledge/blob/master/github/amending-a-commit-guide.md) is a write up on how to change a Pull Request, and the different ways you can do that.
39 |
--------------------------------------------------------------------------------
/images/README:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/images/img-001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seriousran/awesome-qa/b5d2f5f3030c35b772e2c0064cf896377b913724/images/img-001.png
--------------------------------------------------------------------------------
/images/img-002.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seriousran/awesome-qa/b5d2f5f3030c35b772e2c0064cf896377b913724/images/img-002.png
--------------------------------------------------------------------------------
/mds/named-entity-recognition.md:
--------------------------------------------------------------------------------
1 | # Named-entity recognition
2 |
3 | Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
4 |
5 | Most research on NER systems has been structured as taking an unannotated block of text, such as this one:
6 |
7 | ```Jim bought 300 shares of Acme Corp. in 2006.```
8 |
9 | And producing an annotated block of text that highlights the names of entities:
10 |
11 | ```[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.```
12 |
13 | In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified.
14 |
15 | State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.[1][2]
16 |
17 | (Wikipedia; https://en.wikipedia.org/wiki/Named-entity_recognition)
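
The annotation scheme above can be illustrated with a toy rule-based tagger. This is a deliberately naive sketch whose patterns are hand-written for this one example sentence; real NER systems learn statistical or neural models instead.

```python
import re

def toy_ner(text):
    """Tag a sentence with the bracket notation shown above.

    Each (pattern, label) pair below is invented for this single example;
    it is an illustration of the output format, not a real NER system.
    """
    patterns = [
        (r"\b[A-Z][a-z]+ Corp\.", "Organization"),  # crude company-name pattern
        (r"\b(?:19|20)\d{2}\b", "Time"),            # four-digit years
        (r"\bJim\b", "Person"),                     # tiny gazetteer of one name
    ]
    for pattern, label in patterns:
        # Wrap every match as [mention]Label, leaving the rest untouched.
        text = re.sub(pattern, lambda m: f"[{m.group(0)}]{label}", text)
    return text

print(toy_ner("Jim bought 300 shares of Acme Corp. in 2006."))
```

Running it on the Wikipedia example reproduces the annotated block of text shown above.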
18 |
--------------------------------------------------------------------------------
/mds/rdf.md:
--------------------------------------------------------------------------------
1 | # Resource Description Framework (RDF)
2 |
3 |
4 |
5 | **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
6 |
7 | - [What is RDF](#what-is-rdf)
8 | - [Events](#events)
9 | - [Timeline of Semantic Web Languages](#timeline-of-semantic-web-languages)
10 | - [Graph Model](#graph-model)
11 | - [Graph Expression in Language](#graph-expression-in-language)
12 | - [RDF dataset](#rdf-dataset)
13 | - [Technical Specifications introduced by RDF PRIMER](#technical-specifications-introduced-by-rdf-primer)
14 |
15 |
16 |
17 | ## What is RDF
18 | : A framework for representing information (especially metadata about web resources) on the Web
19 | - Resource
20 | - Description
21 | - Framework
22 | : Makes Linked Data from the web machine-processable, readable, and understandable, treating the web as a database
23 | : Has a [Graph Model](#graph-model)
24 | - RDF Schema
25 |   - RDF Schema gives additional information (domain, range) about a property.
26 |   - Describes the type (or class) of a resource, e.g. Book, Person, Publisher
27 | - RDF Syntax
28 |
29 | ### Events
30 | - Mid 1990s: MCF (Meta Content Framework) - Ramanathan V. Guha & Tim Bray
31 | (Apple)
32 | - 1997: MCF/XML - Ramanathan V. Guha & Tim Bray
33 | - 1998: RDF - W3C
34 |
35 | ### Timeline of Semantic Web Languages
36 | - 1970s: Ontology (information science)
37 | - 1996 : XML (W3C WD)
38 | - 1997 : RDF (W3C WD)
39 | - 1998 : RDF Schema (W3C WD)
40 | - 1999 : DAML
41 | - 1999 : OIL (Europe IST Project)
42 | - 2000 : DAML+OIL
43 | - 2002 : OWL (W3C WD)
44 | - 2004 : SPARQL (WD)
45 |
46 | ### Graph Model
47 | - Subject --- Predicate ---> Object
48 | - Resource --- Property, Relation ---> Resource, Literal
49 | - URL or Blank Node --- URL ---> URL or Literal
50 | - ex) http://dbpedia.org/resource/Billie_Jean has a singer whose value is Michael Jackson
51 |   - Subject: http://dbpedia.org/resource/Billie_Jean (URI)
52 |   - Predicate: http://www.example.com/terms/singer (URI)
53 |   - Object: Michael_Jackson (Literal)
54 |   - +Predicate: http://www.example.com/terms/released (URI)
55 |   - +Object: 1983-01-02 (Literal)
56 | - ex) extension
57 |   - Subject: http://dbpedia.org/resource/Billie_Jean (URI)
58 |   - Predicate: http://www.example.com/terms/singer (URI)
59 |   - Object: http://dbpedia.org/resource/Michael_Jackson (URI)
60 |     - Predicate: http://www.example.com/terms/name (URI)
61 |     - Object: Michael_Jackson (Literal)
62 |     - Predicate: http://www.example.com/terms/age (URI)
63 |     - Object: 44 (Literal)
64 |   - Predicate: http://www.example.com/terms/released (URI)
65 |   - Object: 1983-01-02 (Literal)
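
As a minimal sketch of the graph model above, the Billie Jean statements can be held as plain (subject, predicate, object) tuples in Python. A real application would use an RDF library such as rdflib; tuples are enough to illustrate the triple structure.

```python
# Namespace prefixes, as in the example URIs above.
EX = "http://www.example.com/terms/"
DBR = "http://dbpedia.org/resource/"

# Each statement is one (subject, predicate, object) triple.
triples = [
    (DBR + "Billie_Jean", EX + "singer", DBR + "Michael_Jackson"),  # URI object
    (DBR + "Billie_Jean", EX + "released", "1983-01-02"),           # literal object
    (DBR + "Michael_Jackson", EX + "name", "Michael_Jackson"),      # literal object
    (DBR + "Michael_Jackson", EX + "age", "44"),                    # literal object
]

# A toy query: every (predicate, object) pair whose subject is Billie_Jean.
about_billie_jean = [(p, o) for s, p, o in triples if s == DBR + "Billie_Jean"]
print(about_billie_jean)
```

Because objects can themselves be subjects of further triples (Michael_Jackson here), the collection forms a graph rather than a flat table.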
66 |
67 | ### Graph Expression in Language
68 | - Turtle: a text-based format; easy to write and easy to read
69 | - JSON-LD: a JSON-based format for Linked Data
70 | - RDF/XML: an XML-based format; hard to read and write
71 |
72 | ### RDF dataset
73 | - Wikidata: free, from wikimedia
74 | - DBpedia: extracted from Wikipedia infoboxes
75 | - WordNet
76 | - Europeana
77 | - VIAF
78 | - datahub.io: a list of usable Linked Datasets
79 |
80 | ### Technical Specifications introduced by RDF PRIMER
81 | - JSON-LD - Manu Sporny, Gregg Kellogg, Markus Lanthaler, Editors.
82 | - JSON-LD 1.0 16 January 2014. W3C Recommendation.
83 | - URL : http://www.w3.org/TR/json-ld/
84 | - LINKED-DATA - Tim Berners-Lee. Linked Data . Personal View, imperfect but published.
85 | - URL : http://www.w3.org/DesignIssues/LinkedData.html
86 | - N-QUADS - Gavin Carothers.
87 | - RDF 1.1 N-Quads. W3C Recommendation 25 February 2014.
88 | - URL : http://www.w3.org/TR/2014/REC-n-quads-20140225/.
89 | - The latest Edition is available at http://www.w3.org/TR/n-quads/
90 | - N-TRIPLES - Gavin Carothers, Andy Seabourne.
91 | - RDF 1.1 N-Triples . W3C Recommendation 25 February 2014.
92 | - URL : http://www.w3.org/TR/2014/REC-n-triples-20140225/ .
93 | - The latest edition is available at http://www.w3.org/TR/n-triples/
94 | - OWL2-OVERVIEW - W3C OWL Working Group.
95 | - OWL 2 Web Ontology Language Document Overview (Second Edition) 11 December 2012. W3C Recommendation.
96 | - URL : http://www.w3.org/TR/owl2-overview/
97 | - RDF-PRIMER - Frank Manola; Eric Miller. RDF Primer 10 February 2004. W3C Recommendation.
98 | - URL : http://www.w3.org/TR/rdf-primer/
99 | - RDF-SYNTAX-GRAMMAR - Fabien Gandon; Guus Schreiber.
100 | - RDF 1.1 XML Syntax 9 January 2014. W3C Proposed Edited Recommendation.
101 | - URL : http://www.w3.org/TR/rdf-syntax-grammar/
102 | - RDF11-CONCEPTS - Richard Cyganiak, David Wood, Markus Lanthaler.
103 | - RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation 25 February 2014.
104 |   - URL : http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
105 |   - The latest edition is available at http://www.w3.org/TR/rdf11-concepts/
106 | - RDF11-DATASETS - Antoine Zimmermann.
107 | - RDF 1.1 : On Semantics of RDF Datasets . W3C Working Group Note 25 February 2014.
108 |   - The latest version is available at http://www.w3.org/TR/rdf11-datasets/
109 | - RDF11-MT - Patrick J. Hayes, Peter F. Patel-Schneider.
110 | - RDF 1.1 Semantics. W3C Recommendation 25 February 2014.
111 | - URL : http://www.w3.org/TR/2014/REC-rdf11-mt-20140225/ .
112 | - The latest Edition is available at http://www.w3.org/TR/rdf11-mt/
113 | - RDF11-NEW - David Wood.
114 |   - What's New in RDF 1.1. W3C Working Group Note 25 February 2014.
115 | - The latest version is available at http://www.w3.org/TR/rdf11-new/ .
116 | - RDF11-SCHEMA - Dan Brickley, RV Guha. RDF Schema 1.1 .
117 | - W3C Recommendation 25 February 2014.
118 | - URL : http://www.w3.org/TR/2014/REC-rdf-schema-20140225/ .
119 |   - The latest published version is available at http://www.w3.org/TR/rdf-schema/
120 | - RDF11-XML - Fabien Gandon, Guus Schreiber.
121 | - RDF 1.1 XML Syntax . W3C Recommendation 25 February 2014.
122 | - URL : http://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/ .
123 | - The latest published version is available at http://www.w3.org/TR/rdf-syntax-grammar/ .
124 | - RDFA-LITE - Manu Sporny.
125 | - RDFa Lite 1.1 7 June 2012. W3C Recommendation.
126 | - URL : http://www.w3.org/TR/rdfa-lite/
127 | - RDFA-PRIME - Ivan Herman; Ben Adida; Manu Sporny; Mark Birbeck.
128 |   - RDFa 1.1 Primer - Second Edition 22 August 2013. W3C Note.
129 |   - URL : http://www.w3.org/TR/rdfa-primer/
130 | - RFC3987 - M. Durst; M. Suignard. Internationalized Resource Identifiers (IRIs).
131 | - January 2005. RFC.
132 | - URL : http://www.ietf.org/rfc/rfc3987.txt
133 | - SPARQL11-ENTAILMENT - Birte Glimm; Chimezie Ogbuji.
134 | - SPARQL 1.1 Entailment Regimes 21 March 2013. W3C Recommendation.
135 | - URL : http://www.w3.org/TR/sparql11-entailment/
136 | - SPARQL11-OVERVIEW - The W3C SPARQL Working Group.
137 | - SPARQL 1.1 Overview 21 March 2013. W3C Recommendation.
138 | - URL : http://www.w3.org/TR/sparql11-overview/
139 | - SPARQL11-UPDATE - Paul Gearon; Alexandre Passant; Axel Polleres.
140 | - SPARQL 1.1 Update 21 March 2013. W3C Recommendation.
141 |   - URL : http://www.w3.org/TR/sparql11-update/
142 | - TRIG - Gavin Carothers, Andy Seaborne.
143 | - W3C Recommendation 25 February 2014.
144 |   - URL : http://www.w3.org/TR/2014/REC-trig-20140225/
145 |   - The latest edition is available at http://www.w3.org/TR/trig/
146 | - TURTLE - Eric Prud'hommeaux, Gavin Carothers.
147 | - RDF 1.1 Turtle : Terse RDF Triple Language.
148 | - W3C Recommendation 25 February 2014.
149 |   - URL : http://www.w3.org/TR/2014/REC-turtle-20140225/
150 | - The latest edition is available at http://www.w3.org/TR/turtle/
151 |
--------------------------------------------------------------------------------
/mds/reference-resolution.md:
--------------------------------------------------------------------------------
1 | # Reference Resolution
2 |
3 |
4 |
5 | **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
6 |
7 | - [EMNLP](#emnlp)
8 | - [CoNLL](#conll)
9 | - [Papers](#papers)
10 | - [Semantic Role Labeling](#semantic-role-labeling)
11 | - [Keyword Extraction](#keyword-extraction)
12 | - [Entity Linking](#entity-linking)
13 |
14 |
15 |
16 | ### [EMNLP](http://emnlp2018.org/)
17 | - Conference on Empirical Methods in Natural Language Processing
18 | - CoNLL is one of the conferences co-located with EMNLP
19 |
20 | ### [CoNLL](http://www.conll.org/)
21 | - CoNLL is a top-tier conference, organized yearly by SIGNLL (ACL's Special Interest Group on Natural Language Learning).
22 | - CoNLL runs shared tasks every year
23 | - In 2018
24 | - [Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll18/)
25 | - [CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection](https://sigmorphon.github.io/sharedtasks/2018/)
26 |
27 | ### Papers
28 | - [Korean Pronoun Coreference Resolution using Pointer Networks](http://kiise.or.kr/e_journal/2017/5/JOK/pdf/07.pdf), 박천음, 이창기
29 | - Korean Coreference Resolution using Random Forests based on Sieve Features, 정석원, 최맹식, 김학수
30 | - Coreference Resolution using Pointer Networks based on Bi-directional Multiple Timescale GRU, 박천음, 이창기
31 | - [Apparatus and Method for Restoring Omitted Sentence Constituents (patent)](https://patents.google.com/patent/KR100641053B1/ko), 이창기, 임수종, 장명길
32 |
33 | - [Modeling Trolling in Social Media Conversations](https://arxiv.org/pdf/1612.05310.pdf)
34 | - [Supervised Noun Phrase Coreference Research: The First Fifteen Years](http://delivery.acm.org/10.1145/1860000/1858823/p1396-ng.pdf?ip=221.148.36.67&id=1858823&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1536889438_33302213c46475f9809424b36737cb15), Vincent Ng
35 | - [Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules](https://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00152)
36 | - Supervised Models for Coreference Resolution, Altaf Rahman and Vincent Ng
37 | - [Learning Entity Types from Query Logs via Graph-Based Modeling](http://www.luojie.me/publications/files/zhang_cikm15_lp.pdf), Jingyuan Zhang, Luo Jie, Altaf Rahman, Sihong Xie†, Yi Chang, and Philip S. Yu
38 | - [BLANC: Implementing the Rand index for coreference evaluation](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.300.9229&rep=rep1&type=pdf), M. Recasens and E. Hovy
39 | - [Improving Machine Learning Approaches to Coreference Resolution](http://delivery.acm.org/10.1145/1080000/1073102/p104-ng.pdf?ip=221.148.36.67&id=1073102&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1536890936_3e69413d210852920058a15769c798d6), Vincent Ng and Claire Cardie
40 | - [End-to-end Neural Coreference Resolution](https://www.semanticscholar.org/paper/End-to-end-Neural-Coreference-Resolution-Lee-He/8ae1af4a424f5e464d46903bc3d18fe1cf1434ff) - Kenton Lee, Luheng He, Mike Lewis, Luke S. Zettlemoyer, EMNLP, 2017
41 |
42 | ### Semantic Role Labeling
43 | - Korean Semantic Role Labeling using Morpho-semantic Information, 박태호, 차정원
44 | - Korean Semantic Role Labeling using Character-based LSTM-CRF, 박광현, 나승훈
45 | - Korean Semantic Role Labeling using a Highway BiLSTM-CRFs Model, 배장성, 이창기, 김현기
46 |
47 | ### Keyword Extraction
48 | - A Study on Keyword Extraction from a Single Document using Term Clustering, 한승희
49 | - Automatic Keyword Extraction for Korean Documents by Applying Improved TextRank to Word Co-occurrence Information, 송광호, 민지홍
50 | - [Automatic Keyphrase Extraction: A Survey of the State of the Art](http://www.aclweb.org/anthology/P14-1119), Kazi Saidul Hasan and Vincent Ng
51 |
52 | ### Entity Linking
53 | - Entity linking is the task of linking entity mentions in text to their corresponding entities in a knowledge base.
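
A first approximation of entity linking can be sketched as alias-dictionary lookup. The mini knowledge base below is hypothetical; production systems additionally generate candidates, rank them, and disambiguate using context.

```python
# Toy alias table mapping lowercase surface forms to KB entity URIs.
# These entries are invented for illustration only.
KB_ALIASES = {
    "michael jackson": "http://dbpedia.org/resource/Michael_Jackson",
    "mj": "http://dbpedia.org/resource/Michael_Jackson",
    "billie jean": "http://dbpedia.org/resource/Billie_Jean",
}

def link_mentions(mentions):
    """Map each surface mention to a KB entity URI, or None if unknown."""
    return {m: KB_ALIASES.get(m.lower()) for m in mentions}

print(link_mentions(["Michael Jackson", "MJ", "Thriller"]))
```

Note that "MJ" and "Michael Jackson" resolve to the same entity, which is the point of linking: many surface forms, one KB node.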
54 |
--------------------------------------------------------------------------------
/mds/relation-extraction.md:
--------------------------------------------------------------------------------
1 | # Relation Extraction
2 |
3 |
4 |
5 | **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
6 |
7 | - [Relation Extraction](#relation-extraction)
8 | - [Contents](#contents)
9 | - [Publications](#publications)
10 |
11 |
12 |
13 | ## Contents
14 | - [A Survey of Deep Learning Methods for Relation Extraction](#pb1), 2017 (Citations: 6)
15 | - [Joint extraction of entities and relations using reinforcement learning and deep learning](#pb2), 2017 (Citations: 2)
16 | - [On the recursive neural networks for relation extraction and entity recognition](#pb3), 2013 (Citations: 6)
17 | - [Relation extraction and scoring in DeepQA](#pb4), 2012 (Citations: 58)
18 | - [Automatic knowledge extraction from documents](#pb5), 2012 (Citations: 123)
19 | - [PRISMATIC: inducing knowledge from a large scale lexicalized relation resource](#pb6), 2010 (Citations: 35)
20 | - [A review of relation extraction](#pb7), 2007 (Citations: 118)
21 | - [REXTOR: A System for Generating Relations from Natural Language](#pb8), 2000 (Citations: 83)
22 |
23 | ## Publications
24 | - #### ["A Survey of Deep Learning Methods for Relation Extraction"](https://arxiv.org/pdf/1705.03645.pdf), Kumar, Shantanu, arXiv preprint, 2017. (Citations: 6)
25 |
26 | - #### ["Joint extraction of entities and relations using reinforcement learning and deep learning"](https://pdfs.semanticscholar.org/e0e5/9f42cfda8d34310adaa69f708db07c99b06f.pdf), Feng, Y., Zhang, H., Hao, W., & Chen, G, Computational intelligence and neuroscience, 2017. (Citations: 2)
27 |
28 | - #### ["On the recursive neural networks for relation extraction and entity recognition"](https://arxiv.org/pdf/1705.03645.pdf), Khashabi, Daniel, Computational intelligence and neuroscience, 2013. (Citations: 6)
29 |
30 | - #### ["Relation extraction and scoring in DeepQA"](http://brenocon.com/watson_special_issue/09%20relation%20extraction%20and%20scoring.pdf), C. Wang, A. Kalyanpur, J. Fan, B. K. Boguraev, and D. C. Gondek, IBM Journal, 2012. (Citations: 58)
31 | - Rule-based relation extraction
32 |   - Statistical approaches for relation extraction and passage scoring
33 |
34 | - #### ["Automatic knowledge extraction from documents"](http://brenocon.com/watson_special_issue/05%20automatic%20knowledge%20extration.pdf), J. Fan, A. Kalyanpur, D. C. Gondek, and D. A. Ferrucci, IBM Journal, 2012. (Citations: 123)
35 | - PRISMATIC is used as a knowledge resource for QA in Watson
36 | - PRISMATIC is built using a suite of natural-language processing (NLP) tools that include a dependency parser, a rule-based NER, and a co-reference resolution component. The PRISMATIC creation process consists of three phases.
37 |     1. Corpus processing: documents are annotated by a suite of components that perform dependency parsing, co-reference resolution, NER, and relation detection.
38 |     2. Frame extraction: frames are extracted on the basis of the dependency parses and associated annotations. This phase implements the first stage of the authors' approach.
39 |     3. Frame projection: frame projections of interest (e.g., S-V-O projections) are identified over all frames, and frequency information for each projection is tabulated. This phase produces the aggregate statistics from the extracted frames used to infer additional semantics.
40 |
41 | - #### ["PRISMATIC: inducing knowledge from a large scale lexicalized relation resource"](https://dl.acm.org/citation.cfm?id=1866790), J Fan, D Ferrucci, D Gondek, A Kalyanpur, NAACL HLT, 2010. (Citations: 35)
42 |   - This paper presents PRISMATIC, a large-scale lexicalized relation resource that is automatically created over 30 GB of text.
43 | - The authors' focus has been on building the infrastructure and gathering the data.
44 |
45 | 
46 |
47 | - #### ["A review of relation extraction"](https://www.cs.cmu.edu/~nbach/papers/A-survey-on-Relation-Extraction.pdf), Nguyen Bach and Sameer Badaskar, Literature review for Language and Statistics II, 2007. (Citations: 118)
48 |
49 | - #### ["REXTOR: A System for Generating Relations from Natural Language"](http://www.anthology.aclweb.org/W/W00/W00-1107.pdf), Boris Katz and Jimmy Lin, ACL, 2000. (Citations: 83)
50 | - The application of natural language processing (NLP) techniques to information retrieval promises to generate representational structures that better capture the semantic content of documents.
In particular, syntactic analysis can highlight the relationships between various terms and phrases in a sentence, allowing queries to be answered with higher precision than traditional IR systems.
However, a syntactically-informed representational structure faces the problem of linguistic variations, the phenomenon in which similar semantic content may be expressed in different surface forms.
53 | - REXTOR (Relations EXtracTOR) provides two separate grammars: one for extracting arbitrary entities from documents, and the other for building relations from the extracted items.
54 |
55 | 
56 |
57 | - Relation and extraction rules
58 | ```
Extraction Rules:
60 | NounGroup := (PRPZ|DT)? {JJX*} {(NNPX|NNX|NNPS|NNS)+};
61 | PrepositionalPhrase := IN {NounGroup} ;
62 | ComplexNounGroup := {NounGroup} {PrepositionalPhrase};
63 | Relation Rules:
64 | NounGroup :=> <{0} 'describes' [1]>;
65 | ComplexNounGroup :=>
66 | <[0],NounGroup[1]
67 | ;related-to;
68 | [1],PrepositionalPhrase[0], NounGroup[1]>;
69 | ```
70 | PRPZ: part-of-speech tag for possessive pronouns
71 | DT: part-of-speech tag for determiners
72 | JJX: part-of-speech tag for adjectives
73 | JJR: part-of-speech tag for comparative adjectives
74 | JJS: part-of-speech tag for superlative adjectives
75 | NNX: part-of-speech tag for singular or mass nouns
76 | NNS: part-of-speech tag for plural nouns
77 | NNPX: part-of-speech tag for singular proper nouns
78 | NNPS: part-of-speech tag for plural proper nouns
79 | IN: part-of-speech tag for prepositions
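As a rough sketch of how a NounGroup rule like the one above can be applied to POS-tagged text (the tag-to-character encoding and the helper function below are this sketch's own devices, not part of REXTOR):

```python
import re

# Map each POS tag used by the rule to a single character, so the
# NounGroup pattern can be run as an ordinary regular expression.
TAG_CHAR = {"PRPZ": "P", "DT": "D", "JJX": "J",
            "NNPX": "N", "NNX": "n", "NNPS": "S", "NNS": "s"}

# NounGroup := (PRPZ|DT)? {JJX*} {(NNPX|NNX|NNPS|NNS)+};
NOUN_GROUP = re.compile(r"[PD]?J*[NnSs]+")

def extract_noun_groups(tagged):
    """Return the token spans whose tag sequence matches the rule."""
    encoded = "".join(TAG_CHAR.get(tag, ".") for _, tag in tagged)
    # "." never matches the rule, so unknown tags act as boundaries.
    return [[tagged[i][0] for i in range(m.start(), m.end())]
            for m in NOUN_GROUP.finditer(encoded)]

sentence = [("the", "DT"), ("big", "JJX"), ("dog", "NNX"),
            ("chased", "VBD"), ("a", "DT"), ("cat", "NNX")]
print(extract_noun_groups(sentence))  # [['the', 'big', 'dog'], ['a', 'cat']]
```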
80 |
--------------------------------------------------------------------------------