├── .gitignore
├── LICENSE
├── README.md
├── aggregate_tags.py
├── annotations
│   ├── D18-1062.txt
│   ├── D18-1220.txt
│   ├── D18-1276.txt
│   ├── D18-1332.txt
│   ├── D18-1494.txt
│   ├── D19-1350.txt
│   ├── D19-1555.txt
│   ├── D19-1597.txt
│   ├── N18-1009.txt
│   ├── N18-1045.txt
│   ├── N18-1158.txt
│   ├── N18-1176.txt
│   ├── N18-2031.txt
│   ├── N18-2075.txt
│   ├── N18-2097.txt
│   ├── N19-1015.txt
│   ├── N19-1071.txt
│   ├── N19-1154.txt
│   ├── N19-1157.txt
│   ├── N19-1185.txt
│   ├── N19-1329.txt
│   ├── N19-2009.txt
│   ├── P18-1089.txt
│   ├── P18-1145.txt
│   ├── P18-1192.txt
│   ├── P18-2040.txt
│   ├── P19-1009.txt
│   ├── P19-1085.txt
│   ├── P19-1113.txt
│   ├── P19-1178.txt
│   ├── P19-1201.txt
│   ├── P19-1263.txt
│   ├── P19-1286.txt
│   ├── P19-1326.txt
│   ├── P19-1336.txt
│   ├── P19-1447.txt
│   ├── P19-1511.txt
│   ├── P19-1540.txt
│   ├── P19-2032.txt
│   └── P19-2038.txt
├── concepts.md
├── draw_bar.py
├── fig
│   ├── annotations.png
│   └── auto.png
├── get_paper.py
├── requirements.txt
├── rule_classifier.py
└── template.cpt
/.gitignore:
--------------------------------------------------------------------------------
1 | acl-anthology
2 | __pycache__
3 | .idea
4 | papers/
5 | auto/
6 | auto.tsv
7 | annotations.tsv
8 | *.swp
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2020 Graham Neubig
2 |
3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
4 |
5 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
6 |
7 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
8 |
9 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
10 |
11 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Concepts in Neural Networks for NLP
2 | by [Graham Neubig](http://phontron.com), [Pengfei Liu](http://pfliu.com), and other contributors
3 |
4 | This is a repository that attempts to empirically take stock of the **most important concepts necessary to
5 | understand cutting-edge research in neural network models for NLP**. You can look at the two figures below, generated
6 | automatically and through manual annotation, to see which of these topics are most common in current NLP research
7 | (based on recent papers in the [ACL anthology](http://aclanthology.info)). See the [list of concepts](concepts.md)
8 | for a more complete description of what each of the tags below means, and see the directions below the figures if you want to contribute
9 | to the project, or to generate the graphs yourself.
10 |
11 | ![Manually Annotated Concepts in Neural Nets for NLP](fig/annotations.png)
12 |
13 | ![Automatically Annotated Concepts in Neural Nets for NLP](fig/auto.png)
14 |
15 | ## Contributing
16 |
17 | There are several ways to contribute:
18 | * **Perform annotation:** We could use more manual annotations of the concepts covered by papers, so please take a look
19 | below if you're interested in helping out in this regard.
20 | * **Improve the code:** Check the "issues" section of the GitHub repo; there are still several things that could be done
21 | to improve the functionality of the code.
22 |
23 | ## Setup
24 |
25 | If you want to run the code here, first get the requirements:
26 |
27 |     pip install -r requirements.txt
28 |
29 | Also, install `poppler` through your favorite package manager.
30 |
31 | Then download the ACL Anthology github repository:
32 |
33 |     git clone https://github.com/acl-org/acl-anthology.git
34 |
35 | ## How to Perform Annotation
36 |
37 | 1. Read `concepts.md` to learn more about the concepts that are annotated here.
38 | 2. Run `get_paper.py` (directions below) to get a paper to annotate with the concepts contained therein.
39 | 3. When the paper is downloaded, a text file corresponding to the paper ID will be written out to `auto/ID.txt`. This
40 | will include some comments with the paper title, PDF location, etc. In addition, it will have some
41 | automatically provided concept tags that were estimated based on the article text.
42 | 4. Manually confirm that each automatically annotated concept is correct. If it is, delete the corresponding comment
43 | line starting with "# CHECK:". If a concept is not actually covered by the paper, delete the concept tag itself.
44 | 5. Add any concepts that were not caught by the automatic process.
45 | 6. Once you are done annotating concepts, move `auto/ID.txt` to `annotations/ID.txt`. You can then send a pull request
46 | to the repository to contribute back.
47 |
48 | **Examples Running `get_paper.py`**
49 |
50 | Get random papers to annotate:
51 | * 1: `python get_paper.py --years 18-19 --confs "P,N,D" --n_sample 2 --template template.cpt --feature fulltext`
52 |
53 | Get a specific paper to annotate:
54 | * 2: `python get_paper.py --paper_id P19-1032 --template template.cpt --feature fulltext`
55 |
56 | where:
57 | * `paper_id`: usually takes the form P|N|D-1234. Once `paper_id` is specified, `years`, `confs`, and `n_sample` are not required.
58 | * `confs`: a comma-separated list of conference abbreviations from which papers can be selected (P,N,D)
59 | * `n_sample`: the number of papers to sample if `paper_id` is not specified
60 | * `template`: the concept template file (e.g. template.cpt)
61 | * `feature`: which part of the paper is used for classification (e.g. fulltext or title)
62 |
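For reference, the `auto/ID.txt` file produced in step 3 looks roughly like the following (a hypothetical example; the actual paper ID, tags, and justifications depend on the paper):

    # Title: Some Example Paper Title
    # Online location: https://www.aclweb.org/anthology/X00-0000.pdf
    # CHECK: confidence=0.9, justification=Matched regex Adam
    optim-adam
    # CHECK: confidence=0.9, justification=Matched regex attention
    arch-att

If both tags are indeed covered by the paper, you would delete the two "# CHECK:" lines, add any concepts the automatic process missed, and move the file to `annotations/`.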
63 | ## Generating Aggregate Statistics/Charts
64 |
65 | To generate bar charts like the ones above, use `aggregate_tags.py` to calculate aggregate statistics, then
66 | `draw_bar.py` to generate the bar chart: `aggregate_tags.py` writes a two-column TSV mapping each tag to its count,
67 | and `draw_bar.py` renders that TSV as a bar chart.
68 |
69 | If you want to do this for the hand-annotated concepts included in this repository, you can run the following commands.
70 |
71 |     python aggregate_tags.py annotations --concepts concepts.md > annotations.tsv
72 |     python draw_bar.py --tsv annotations.tsv --fig annotations.png
73 |
74 | If you want to do this for automatically-generated tags from some portion of the ACL anthology, you can run the
75 | following commands, which will download the papers from all the specified conferences (e.g. ACL, NAACL, and EMNLP from
76 | 2018 and 2019), tag them automatically, and then aggregate and plot the results.
77 |
78 |     python get_paper.py --years 18-19 --confs "P,N,D" --n_sample all --template template.cpt --feature fulltext
79 |     python aggregate_tags.py auto --concepts concepts.md > auto.tsv
80 |     python draw_bar.py --tsv auto.tsv --fig auto.png
81 |
82 | ## Acknowledgements
83 |
84 | Thanks to Yoav Goldberg and the [TAs for CS11-747](http://phontron.com/class/nn4nlp2020/description.html) who gave feedback on the categorization of concepts. Also, thanks, of course, to everyone who has contributed or will contribute.
--------------------------------------------------------------------------------
/aggregate_tags.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | import re
4 | from collections import defaultdict
5 |
6 | def get_all_files(f):
7 |     if os.path.isfile(f):
8 |         return [f]
9 |     elif os.path.isdir(f):
10 |         ret = []
11 |         for x in os.listdir(f):
12 |             ret += get_all_files(os.path.join(f,x))
13 |         return ret
14 |     else:
15 |         raise ValueError(f'Could not find file {f}')
16 |
17 | def get_tags(f, legal_tags = None):
18 |     with open(f, 'r') as fin:
19 |         for line in fin:
20 |             line = line.strip()
21 |             if line and not line.startswith("#"):  # skip blank lines and comments
22 |                 if legal_tags and line not in legal_tags:
23 |                     raise ValueError(f'Illegal tag found in file {f}, {line}')
24 |                 yield line
25 |
26 | def parse_concepts(f):
27 |     concepts = {'not-neural': 1}
28 |     concept_re = re.compile(r': \[?`([a-z0-9-]+)`')
29 |     with open(f, 'r') as fin:
30 |         for line in fin:
31 |             m = re.search(concept_re, line)
32 |             if m:
33 |                 val = m.group(1)
34 |                 concepts[val] = 1
35 |     return concepts
36 |
37 | if __name__ == "__main__":
38 |
39 |     parser = argparse.ArgumentParser(description="Aggregate tags from several files or directories of files into a"
40 |                                                  " tag/count file for use in draw_bar.py.")
41 |
42 |     parser.add_argument("files", type=str, nargs='+',
43 |                         help="The files or directories over which you'd like to aggregate")
44 |     parser.add_argument("--concepts", type=str, default=None,
45 |                         help="The concepts.md file, which is parsed to find a list of legal tags")
46 |
47 |     args = parser.parse_args()
48 |
49 |     fs = []
50 |     for f in args.files:
51 |         fs += get_all_files(f)
52 |
53 |     legal_concepts = parse_concepts(args.concepts) if args.concepts else None
54 |
55 |     all_tags = defaultdict(lambda: 0)
56 |     for f in fs:
57 |         for tag in get_tags(f, legal_tags=legal_concepts):
58 |             all_tags[tag] += 1
59 |
60 |     for k, v in sorted(list(all_tags.items()), key=lambda x: -x[1]):
61 |         print(f'{k}\t{v}')
62 |
--------------------------------------------------------------------------------
/annotations/D18-1062.txt:
--------------------------------------------------------------------------------
1 | # Title: Unsupervised Bilingual Lexicon Induction via Latent Variable Models
2 | # Online location: https://www.aclweb.org/anthology/D18-1062.pdf
3 | train-mll
4 | adv-train
5 | latent-vae
6 | task-lexicon
--------------------------------------------------------------------------------
/annotations/D18-1220.txt:
-------------------------------------------------------------------------------- 1 | # Title: A Knowledge Hunting Framework for Common Sense Reasoning 2 | # Online location: https://www.aclweb.org/anthology/D18-1220.pdf 3 | not-neural 4 | -------------------------------------------------------------------------------- /annotations/D18-1276.txt: -------------------------------------------------------------------------------- 1 | # Title: Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit 2 | # Online location: https://www.aclweb.org/anthology/D18-1276.pdf 3 | init-glorot 4 | arch-lstm 5 | arch-att 6 | arch-subword 7 | arch-gnn 8 | arch-energy 9 | search-beam 10 | struct-crf 11 | task-seqlab 12 | task-seq2seq 13 | loss-margin 14 | -------------------------------------------------------------------------------- /annotations/D18-1332.txt: -------------------------------------------------------------------------------- 1 | # Title: Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation 2 | # Online location: https://www.aclweb.org/anthology/D18-1332.pdf 3 | train-parallel 4 | optim-adam 5 | norm-gradient 6 | arch-gru 7 | task-seq2seq 8 | -------------------------------------------------------------------------------- /annotations/D18-1494.txt: -------------------------------------------------------------------------------- 1 | # Title: Siamese Network-Based Supervised Topic Modeling 2 | # Online location: https://www.aclweb.org/anthology/D18-1494.pdf 3 | reg-stopping 4 | arch-subword 5 | pre-word2vec 6 | latent-topic 7 | task-textclass 8 | -------------------------------------------------------------------------------- /annotations/D19-1350.txt: -------------------------------------------------------------------------------- 1 | # Title: Neural Topic Model with Reinforcement Learning 2 | # Online location: https://www.aclweb.org/anthology/D19-1350.pdf 3 | latent-vae 4 | nondif-reinforce 5 | optim-adam 6 | -------------------------------------------------------------------------------- /annotations/D19-1555.txt: -------------------------------------------------------------------------------- 1 | # Title: Leveraging Structural and Semantic Correspondence for Attribute-Oriented Aspect Sentiment Discovery 2 | # Online location: https://www.aclweb.org/anthology/D19-1555.pdf 3 | # CHECK: confidence=0.9, justification=Matched regex attention 4 | arch-att 5 | pre-word2vec 6 | pre-use 7 | -------------------------------------------------------------------------------- /annotations/D19-1597.txt: -------------------------------------------------------------------------------- 1 | # Title: GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level 2 | # Online location: https://www.aclweb.org/anthology/D19-1597.pdf 3 | not-neural 4 | -------------------------------------------------------------------------------- /annotations/N18-1009.txt: -------------------------------------------------------------------------------- 1 | # Title: Please Clap: Modeling Applause in Campaign Speeches 2 | # Online location: https://www.aclweb.org/anthology/N18-1009.pdf 3 | optim-adam 4 | reg-dropout 5 | arch-lstm 6 | arch-cnn 7 | task-textclass 8 | pre-skipthought 9 | -------------------------------------------------------------------------------- /annotations/N18-1045.txt: -------------------------------------------------------------------------------- 1 | # Title: Distributional Inclusion 
Vector Embedding for Unsupervised Hypernymy Detection 2 | # Online location: https://www.aclweb.org/anthology/N18-1045.pdf 3 | pre-word2vec 4 | optim-adam 5 | optim-projection 6 | -------------------------------------------------------------------------------- /annotations/N18-1158.txt: -------------------------------------------------------------------------------- 1 | # Title: Ranking Sentences for Extractive Summarization with Reinforcement Learning 2 | # Online location: https://www.aclweb.org/anthology/N18-1158.pdf 3 | optim-adam 4 | pool-max 5 | arch-lstm 6 | arch-cnn 7 | nondif-reinforce 8 | task-extractive 9 | task-seq2seq 10 | pre-word2vec 11 | -------------------------------------------------------------------------------- /annotations/N18-1176.txt: -------------------------------------------------------------------------------- 1 | # Title: Linguistic Cues to Deception and Perceived Deception in Interview Dialogues 2 | # Online location: https://www.aclweb.org/anthology/N18-1176.pdf 3 | not-neural 4 | -------------------------------------------------------------------------------- /annotations/N18-2031.txt: -------------------------------------------------------------------------------- 1 | # Title: Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings 2 | # Online location: https://www.aclweb.org/anthology/N18-2031.pdf 3 | pool-mean 4 | pre-glove 5 | pre-word2vec 6 | -------------------------------------------------------------------------------- /annotations/N18-2075.txt: -------------------------------------------------------------------------------- 1 | # Title: Text Segmentation as a Supervised Learning Task 2 | # Online location: https://www.aclweb.org/anthology/N18-2075.pdf 3 | pool-max 4 | arch-bilstm 5 | pre-word2vec 6 | -------------------------------------------------------------------------------- /annotations/N18-2097.txt: -------------------------------------------------------------------------------- 1 | # Title: A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents 2 | # Online location: https://www.aclweb.org/anthology/N18-2097.pdf 3 | arch-bilstm 4 | arch-att 5 | arch-copy 6 | arch-coverage 7 | search-beam 8 | optim-adagrad 9 | task-seq2seq 10 | -------------------------------------------------------------------------------- /annotations/N19-1015.txt: -------------------------------------------------------------------------------- 1 | # Title: Topic-Guided Variational Auto-Encoder for Text Generation 2 | # Online location: https://www.aclweb.org/anthology/N19-1015.pdf 3 | reg-dropout 4 | arch-gru 5 | arch-att 6 | latent-vae 7 | latent-topic 8 | task-lm 9 | task-seq2seq 10 | -------------------------------------------------------------------------------- /annotations/N19-1071.txt: -------------------------------------------------------------------------------- 1 | # Title: SEQˆ3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression 2 | # Online location: https://www.aclweb.org/anthology/N19-1071.pdf 3 | optim-adam 4 | task-lm 5 | reg-dropout 6 | reg-worddropout 7 | norm-layer 8 | arch-birnn 9 | arch-lstm 10 | arch-att 11 | pre-glove 12 | nondif-reinforce 13 | adv-gan 14 | latent-vae 15 | task-seq2seq 16 | -------------------------------------------------------------------------------- /annotations/N19-1154.txt: -------------------------------------------------------------------------------- 1 | # Title: One Size Does Not Fit All: 
Comparing NMT Representations of Different Granularities 2 | # Online location: https://www.aclweb.org/anthology/N19-1154.pdf 3 | optim-sgd 4 | arch-lstm 5 | arch-att 6 | arch-subword 7 | task-seqlab 8 | task-seq2seq 9 | -------------------------------------------------------------------------------- /annotations/N19-1157.txt: -------------------------------------------------------------------------------- 1 | # Title: Quantifying the morphosyntactic content of Brown Clusters 2 | # Online location: https://www.aclweb.org/anthology/N19-1157.pdf 3 | not-neural 4 | -------------------------------------------------------------------------------- /annotations/N19-1185.txt: -------------------------------------------------------------------------------- 1 | # Title: Tweet Stance Detection Using an Attention based Neural Ensemble Model 2 | # Online location: https://www.aclweb.org/anthology/N19-1185.pdf 3 | optim-adam 4 | arch-bilstm 5 | reg-norm 6 | pool-max 7 | arch-cnn 8 | arch-att 9 | task-textclass 10 | -------------------------------------------------------------------------------- /annotations/N19-1329.txt: -------------------------------------------------------------------------------- 1 | # Title: Understanding Learning Dynamics Of Language Models with SVCCA 2 | # Online location: https://www.aclweb.org/anthology/N19-1329.pdf 3 | pre-elmo 4 | arch-lstm 5 | loss-cca 6 | loss-svd 7 | task-seqlab 8 | task-lm 9 | -------------------------------------------------------------------------------- /annotations/N19-2009.txt: -------------------------------------------------------------------------------- 1 | # Title: Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce 2 | # Online location: https://www.aclweb.org/anthology/N19-2009.pdf 3 | arch-cnn 4 | arch-lstm 5 | arch-att 6 | activ-tanh 7 | adv-gan 8 | task-condlm 9 | nondif-reinforce 10 | -------------------------------------------------------------------------------- /annotations/P18-1089.txt: -------------------------------------------------------------------------------- 1 | # Title: Identifying Transferable Information Across Domains for Cross-domain Sentiment Classification 2 | # Online location: https://www.aclweb.org/anthology/P18-1089.pdf 3 | train-transfer 4 | comb-ensemble 5 | pre-word2vec 6 | task-textclass 7 | -------------------------------------------------------------------------------- /annotations/P18-1145.txt: -------------------------------------------------------------------------------- 1 | # Title: Nugget Proposal Networks for Chinese Event Detection 2 | # Online location: https://www.aclweb.org/anthology/P18-1145.pdf 3 | arch-bilstm 4 | arch-cnn 5 | arch-gating 6 | task-seqlab 7 | activ-tanh 8 | -------------------------------------------------------------------------------- /annotations/P18-1192.txt: -------------------------------------------------------------------------------- 1 | # Title: Syntax for Semantic Role Labeling, To Be, Or Not To Be 2 | # Online location: https://www.aclweb.org/anthology/P18-1192.pdf 3 | optim-adam 4 | reg-worddropout 5 | arch-bilstm 6 | arch-cnn 7 | arch-gating 8 | pre-glove 9 | pre-word2vec 10 | task-seqlab 11 | task-relation 12 | -------------------------------------------------------------------------------- /annotations/P18-2040.txt: -------------------------------------------------------------------------------- 1 | # Title: Improving Topic Quality by Promoting Named Entities in Topic Modeling 2 | # Online location: 
https://www.aclweb.org/anthology/P18-2040.pdf 3 | not-neural 4 | -------------------------------------------------------------------------------- /annotations/P19-1009.txt: -------------------------------------------------------------------------------- 1 | # Title: AMR Parsing as Sequence-to-Graph Transduction 2 | # Online location: https://www.aclweb.org/anthology/P19-1009.pdf 3 | optim-adam 4 | reg-dropout 5 | reg-stopping 6 | arch-bilinear 7 | arch-cnn 8 | task-graph 9 | norm-gradient 10 | arch-lstm 11 | pool-mean 12 | pool-max 13 | arch-att 14 | arch-copy 15 | search-greedy 16 | search-beam 17 | pre-glove 18 | pre-bert 19 | -------------------------------------------------------------------------------- /annotations/P19-1085.txt: -------------------------------------------------------------------------------- 1 | # Title: GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification 2 | # Online location: https://www.aclweb.org/anthology/P19-1085.pdf 3 | optim-adam 4 | reg-patience 5 | arch-att 6 | arch-transformer 7 | arch-gnn 8 | activ-relu 9 | pool-max 10 | pool-mean 11 | pre-bert 12 | task-textpair 13 | # CHECK: confidence=0.9, justification=Matched regex language modeling|language model 14 | task-lm 15 | -------------------------------------------------------------------------------- /annotations/P19-1113.txt: -------------------------------------------------------------------------------- 1 | # Title: Rumor Detection by Exploiting User Credibility Information, Attention and Multi-task Learning 2 | # Online location: https://www.aclweb.org/anthology/P19-1113.pdf 3 | reg-dropout 4 | optim-adadelta 5 | arch-lstm 6 | arch-gru 7 | train-mtl 8 | arch-att 9 | task-textclass 10 | pre-word2vec 11 | -------------------------------------------------------------------------------- /annotations/P19-1178.txt: -------------------------------------------------------------------------------- 1 | # Title: Self-Supervised Neural Machine Translation 2 | # Online location: https://www.aclweb.org/anthology/P19-1178.pdf 3 | optim-noam 4 | reg-dropout 5 | reg-labelsmooth 6 | arch-lstm 7 | arch-transformer 8 | loss-margin 9 | task-lm 10 | task-seq2seq 11 | train-transfer 12 | -------------------------------------------------------------------------------- /annotations/P19-1201.txt: -------------------------------------------------------------------------------- 1 | # Title: Jointly Learning Semantic Parser and Natural Language Generator via Dual Information Maximization 2 | # Online location: https://www.aclweb.org/anthology/P19-1201.pdf 3 | optim-adam 4 | reg-patience 5 | train-mtl 6 | arch-bilstm 7 | arch-att 8 | arch-copy 9 | search-beam 10 | nondif-reinforce 11 | task-seq2seq 12 | task-tree 13 | -------------------------------------------------------------------------------- /annotations/P19-1263.txt: -------------------------------------------------------------------------------- 1 | # Title: Exploiting Explicit Paths for Multi-hop Reading Comprehension 2 | # Online location: https://www.aclweb.org/anthology/P19-1263.pdf 3 | optim-adam 4 | norm-gradient 5 | reg-dropout 6 | arch-bilstm 7 | arch-gru 8 | arch-att 9 | search-beam 10 | pre-elmo 11 | pre-glove 12 | -------------------------------------------------------------------------------- /annotations/P19-1286.txt: -------------------------------------------------------------------------------- 1 | # Title: Domain Adaptation of Neural Machine Translation by Lexicon Induction 2 | # Online location: 
https://www.aclweb.org/anthology/P19-1286.pdf
3 | loss-svd
4 | arch-lstm
5 | arch-transformer
6 | adv-feat
7 | train-augment
8 | train-transfer
9 | optim-adam
10 | search-beam
11 | task-seq2seq
12 | task-lexicon
--------------------------------------------------------------------------------
/annotations/P19-1326.txt:
--------------------------------------------------------------------------------
1 | # Title: Embedding Imputation with Grounded Language Information
2 | # Online location: https://www.aclweb.org/anthology/P19-1326.pdf
3 | arch-gcnn
4 | activ-relu
5 | pre-glove
--------------------------------------------------------------------------------
/annotations/P19-1336.txt:
--------------------------------------------------------------------------------
1 | # Title: Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition
2 | # Online location: https://www.aclweb.org/anthology/P19-1336.pdf
3 | arch-transformer
4 | arch-att
5 | arch-bilstm
6 | adv-feat
7 | adv-train
8 | struct-crf
--------------------------------------------------------------------------------
/annotations/P19-1447.txt:
--------------------------------------------------------------------------------
1 | # Title: Reranking for Neural Semantic Parsing
2 | # Online location: https://www.aclweb.org/anthology/P19-1447.pdf
3 | arch-att
4 | arch-copy
5 | arch-bilstm
6 | task-tree
7 | task-textpair
8 | search-beam
9 | nondif-minrisk
--------------------------------------------------------------------------------
/annotations/P19-1511.txt:
--------------------------------------------------------------------------------
1 | # Title: Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks
2 | # Online location: https://www.aclweb.org/anthology/P19-1511.pdf
3 | arch-bilstm
4 | arch-cnn
5 | pre-glove
6 | optim-adadelta
7 | struct-crf
8 | task-seqlab
--------------------------------------------------------------------------------
/annotations/P19-1540.txt:
--------------------------------------------------------------------------------
1 | # Title: Ordinal and Attribute Aware Response Generation in a Multimodal Dialogue System
2 | # Online location: https://www.aclweb.org/anthology/P19-1540.pdf
3 | arch-bilinear
4 | reg-dropout
5 | init-glorot
6 | reg-labelsmooth
7 | norm-gradient
8 | arch-gru
9 | arch-selfatt
10 | search-beam
11 | task-condlm
--------------------------------------------------------------------------------
/annotations/P19-2032.txt:
--------------------------------------------------------------------------------
1 | # Title: Automatic Generation of Personalized Comment Based on User Profile
2 | # Online location: https://www.aclweb.org/anthology/P19-2032.pdf
3 | arch-bilstm
4 | optim-sgd
5 | arch-att
6 | arch-memo
7 | arch-gating
8 | search-beam
9 | task-seq2seq
--------------------------------------------------------------------------------
/annotations/P19-2038.txt:
--------------------------------------------------------------------------------
1 | # Title: ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic
2 | # Online location: https://www.aclweb.org/anthology/P19-2038.pdf
3 | optim-adam
4 | reg-dropout
5 | reg-norm
6 | arch-lstm
7 | arch-bilstm
8 | arch-gru
9 | arch-bigru
10 | arch-selfatt
11 | arch-gnn
12 | arch-cnn
13 | pre-word2vec
14 | task-textclass
--------------------------------------------------------------------------------
/concepts.md:
--------------------------------------------------------------------------------
1 | # Concept Hierarchy in Neural Networks for NLP
2 |
3 | Below is a list of important concepts in neural networks for NLP. In the `annotations/` directory in this repository,
4 | we have examples of papers annotated with these concepts that you can peruse.
5 |
6 | **Annotation Criteria**: For a particular paper, a concept should be annotated if it is important to understand the
7 | proposed method. It should also be annotated if it's important to understand the evaluation. For example, if a
8 | proposed self-attention model is compared to a baseline that uses an LSTM, and the difference between these two
9 | methods is important to understanding the experimental results, then the LSTM concept should also be annotated. Concepts
10 | do not need to be annotated if they are simply mentioned in passing, or in the related work section.
11 |
12 | **Implication**: Some tags are listed with "`XXX` (implies `YYY`)", which means you need to understand concept `YYY`
13 | in order to understand concept `XXX`. If `XXX` is annotated in a paper, you do not need to also annotate `YYY`.
14 |
15 | **Non-neural Papers**: This conceptual hierarchy is for tagging papers that are about neural network models for NLP.
16 | If a paper is not fundamentally about some application of neural networks to NLP, it should be tagged with `not-neural`,
17 | and no other tags need to be applied.
18 |
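For example, a paper whose proposed method is a BiLSTM-CRF sequence labeler might be annotated as follows (a hypothetical annotation file; since `arch-bilstm` implies `arch-birnn` and `arch-lstm`, the implied tags are omitted):

    # Title: A Hypothetical Sequence Labeling Paper
    # Online location: https://www.aclweb.org/anthology/X00-0000.pdf
    arch-bilstm
    struct-crf
    task-seqlab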
19 | ## Optimization/Learning
20 |
21 | ### Optimizers and Optimization Techniques
22 |
23 | * Mini-batch SGD: [`optim-sgd`](http://pfliu.com/pl-nlp2019/bs/optim-sgd.html)
24 | * Adam: [`optim-adam`](http://pfliu.com/pl-nlp2019/bs/optim-adam.html) (implies `optim-sgd`)
25 | * Adagrad: [`optim-adagrad`](http://pfliu.com/pl-nlp2019/bs/optim-adagrad.html) (implies `optim-sgd`)
26 | * Adadelta: [`optim-adadelta`](http://pfliu.com/pl-nlp2019/bs/optim-adadelta.html) (implies `optim-sgd`)
27 | * Adam with Specialized Transformer Learning Rate ("Noam" Schedule): [`optim-noam`](http://pfliu.com/pl-nlp2019/bs/optim-noam.html) (implies `optim-adam`)
28 | * SGD with Momentum: [`optim-momentum`](http://pfliu.com/pl-nlp2019/bs/optim-momentum.html) (implies `optim-sgd`)
29 | * AMSGrad: [`optim-amsgrad`](http://pfliu.com/pl-nlp2019/bs/optim-amsgrad.html) (implies `optim-sgd`)
30 | * Projection / Projected Gradient Descent: [`optim-projection`](http://pfliu.com/pl-nlp2019/bs/optim-projection.html) (implies `optim-sgd`)
31 |
32 | ### Initialization
33 |
34 | * Glorot/Xavier Initialization: [`init-glorot`](http://pfliu.com/pl-nlp2019/bs/init-glorot.html)
35 | * He Initialization: [`init-he`](http://pfliu.com/pl-nlp2019/bs/init-he.html)
36 |
37 | ### Regularization
38 |
39 | * Dropout: [`reg-dropout`](http://pfliu.com/pl-nlp2019/bs/reg-dropout.html)
40 | * Word Dropout: [`reg-worddropout`](http://pfliu.com/pl-nlp2019/bs/reg-worddropout.html) (implies `reg-dropout`)
41 | * Norm (L1/L2) Regularization: [`reg-norm`](http://pfliu.com/pl-nlp2019/bs/reg-norm.html)
42 | * Early Stopping: [`reg-stopping`](http://pfliu.com/pl-nlp2019/bs/reg-stopping.html)
43 | * Patience: [`reg-patience`](http://pfliu.com/pl-nlp2019/bs/reg-patience.html) (implies `reg-stopping`)
44 | * Weight Decay: [`reg-decay`](http://pfliu.com/pl-nlp2019/bs/reg-decay.html)
45 | * Label Smoothing: [`reg-labelsmooth`](http://pfliu.com/pl-nlp2019/bs/reg-labelsmooth.html)
46 |
47 | ### Normalization
48 |
49 | * Layer Normalization: [`norm-layer`](http://pfliu.com/pl-nlp2019/bs/norm-layer.html)
50 | * Batch Normalization: [`norm-batch`](http://pfliu.com/pl-nlp2019/bs/norm-batch.html)
51 | * Gradient Clipping: [`norm-gradient`](http://pfliu.com/pl-nlp2019/bs/norm-gradient.html)
52 |
53 | ### Loss Functions (other than cross-entropy)
54 |
55 | * Canonical Correlation Analysis (CCA): [`loss-cca`](http://pfliu.com/pl-nlp2019/bs/loss-cca.html)
56 | * Singular Value Decomposition (SVD): [`loss-svd`](http://pfliu.com/pl-nlp2019/bs/loss-svd.html)
57 | * Margin-based Loss Functions: [`loss-margin`](http://pfliu.com/pl-nlp2019/bs/loss-margin.html)
58 | * Contrastive Loss: [`loss-cons`](http://pfliu.com/pl-nlp2019/bs/loss-cons.html)
59 | * Noise Contrastive Estimation (NCE): [`loss-nce`](http://pfliu.com/pl-nlp2019/bs/loss-nce.html) (implies `loss-cons`)
60 | * Triplet Loss: [`loss-triplet`](http://pfliu.com/pl-nlp2019/bs/loss-triplet.html) (implies `loss-cons`)
61 |
62 | ### Training Paradigms
63 |
64 | * Multi-task Learning (MTL): [`train-mtl`](http://pfliu.com/pl-nlp2019/bs/train-mtl.html)
65 | * Multi-lingual Learning (MLL): [`train-mll`](http://pfliu.com/pl-nlp2019/bs/train-mll.html) (implies `train-mtl`)
66 | * Transfer Learning: [`train-transfer`](http://pfliu.com/pl-nlp2019/bs/train-transfer.html)
67 | * Active Learning: [`train-active`](http://pfliu.com/pl-nlp2019/bs/train-active.html)
68 | * Data Augmentation: [`train-augment`](http://pfliu.com/pl-nlp2019/bs/train-augment.html)
69 | * Curriculum Learning: [`train-curriculum`](http://pfliu.com/pl-nlp2019/bs/train-curriculum.html)
70 | * Parallel Training: [`train-parallel`](http://pfliu.com/pl-nlp2019/bs/train-parallel.html)
71 |
72 | ## Sequence Modeling Architectures
73 |
74 | ### Activation Functions
75 |
76 | * Hyperbolic Tangent (tanh): [`activ-tanh`](http://pfliu.com/pl-nlp2019/bs/activ-tanh.html)
77 | * Rectified Linear Units (ReLU): [`activ-relu`](http://pfliu.com/pl-nlp2019/bs/activ-relu.html)
78 |
79 | ### Pooling Operations
80 |
81 | * Max Pooling: [`pool-max`](http://pfliu.com/pl-nlp2019/bs/pool-max.html)
82 | * Mean Pooling: [`pool-mean`](http://pfliu.com/pl-nlp2019/bs/pool-mean.html)
83 | * k-Max Pooling: [`pool-kmax`](http://pfliu.com/pl-nlp2019/bs/pool-kmax.html)
84 |
85 | ### Recurrent Architectures
86 |
87 | * Recurrent Neural Network (RNN): [`arch-rnn`](http://pfliu.com/pl-nlp2019/bs/arch-rnn.html)
88 | * Bi-directional Recurrent Neural Network (Bi-RNN): [`arch-birnn`](http://pfliu.com/pl-nlp2019/bs/arch-birnn.html) (implies `arch-rnn`)
89 | * Long Short-term Memory (LSTM): [`arch-lstm`](http://pfliu.com/pl-nlp2019/bs/arch-lstm.html) (implies `arch-rnn`)
90 | * Bi-directional Long Short-term Memory (BiLSTM): [`arch-bilstm`](http://pfliu.com/pl-nlp2019/bs/arch-bilstm.html) (implies `arch-birnn`, `arch-lstm`)
91 | * Gated Recurrent Units (GRU): [`arch-gru`](http://pfliu.com/pl-nlp2019/bs/arch-gru.html) (implies `arch-rnn`)
92 | * Bi-directional Gated Recurrent Units (BiGRU): [`arch-bigru`](http://pfliu.com/pl-nlp2019/bs/arch-bigru.html) (implies `arch-birnn`, `arch-gru`)
93 |
94 | ### Other Sequential/Structured Architectures
95 |
96 | * Bag-of-words, Bag-of-embeddings, Continuous Bag-of-words (BOW): `arch-bow`
97 | * Convolutional Neural Networks (CNN): [`arch-cnn`](http://pfliu.com/pl-nlp2019/bs/arch-cnn.html)
98 | * Attention: [`arch-att`](http://pfliu.com/pl-nlp2019/bs/arch-att.html)
99 | * Self Attention: [`arch-selfatt`](http://pfliu.com/pl-nlp2019/bs/arch-selfatt.html) (implies `arch-att`)
100 | * Recursive Neural Network (RecNN): [`arch-recnn`](http://pfliu.com/pl-nlp2019/bs/arch-recnn.html)
101 | * Tree-structured Long Short-term Memory (TreeLSTM): [`arch-treelstm`](http://pfliu.com/pl-nlp2019/bs/arch-treelstm.html) (implies `arch-recnn`)
102 | * Graph Neural Network (GNN): [`arch-gnn`](http://pfliu.com/pl-nlp2019/bs/arch-gnn.html)
103 | * Graph Convolutional Neural Network (GCNN): [`arch-gcnn`](http://pfliu.com/pl-nlp2019/bs/arch-gcnn.html) (implies `arch-gnn`)
104 |
105 | ### Architectural Techniques
106 |
107 | * Residual Connections (ResNet): [`arch-residual`](http://pfliu.com/pl-nlp2019/bs/arch-residual.html)
108 | * Gating Connections, Highway Connections: [`arch-gating`](http://pfliu.com/pl-nlp2019/bs/arch-gating.html)
109 | * Memory: [`arch-memo`](http://pfliu.com/pl-nlp2019/bs/arch-memo.html)
110 | * Copy Mechanism: [`arch-copy`](http://pfliu.com/pl-nlp2019/bs/arch-copy.html)
111 | * Bilinear, Biaffine Models: [`arch-bilinear`](http://pfliu.com/pl-nlp2019/bs/arch-bilinear.html)
112 | * Coverage Vectors/Penalties: [`arch-coverage`](http://pfliu.com/pl-nlp2019/bs/arch-coverage.html)
113 | * Subword Units: [`arch-subword`](http://pfliu.com/pl-nlp2019/bs/arch-subword.html)
114 | * Energy-based, Globally-normalized Models: [`arch-energy`](http://pfliu.com/pl-nlp2019/bs/arch-energy.html)
115 |
116 | ### Standard Composite Architectures
117 |
118 | * Transformer: [`arch-transformer`](http://pfliu.com/pl-nlp2019/bs/arch-transformer.html) (implies `arch-selfatt`, `arch-residual`, `norm-layer`, `optim-noam`)
119 |
120 |
121 | ## Model Combination
122 |
123 | * Ensembling: [`comb-ensemble`](http://pfliu.com/pl-nlp2019/bs/comb-ensemble.html)
124 |
125 | ## Search Algorithms
126 |
127 | * Greedy Search: [`search-greedy`](http://pfliu.com/pl-nlp2019/bs/search-greedy.html)
128 | * Beam Search: [`search-beam`](http://pfliu.com/pl-nlp2019/bs/search-beam.html)
129 | * A* Search: [`search-astar`](http://pfliu.com/pl-nlp2019/bs/search-astar.html)
130 | * Viterbi Algorithm: [`search-viterbi`](http://pfliu.com/pl-nlp2019/bs/search-viterbi.html)
131 | * Ancestral Sampling: [`search-sampling`](http://pfliu.com/pl-nlp2019/bs/search-sampling.html)
132 | * Gumbel Max: [`search-gumbel`](http://pfliu.com/pl-nlp2019/bs/search-gumbel.html) (implies `search-sampling`)
133 |
134 | ## Prediction Tasks
135 |
136 | * Text Classification (text -> label): [`task-textclass`](http://pfliu.com/pl-nlp2019/bs/task-textclass.html)
137 | * Text Pair Classification (two texts -> label): [`task-textpair`](http://pfliu.com/pl-nlp2019/bs/task-textpair.html)
138 | * Sequence Labeling (text -> one label per token): [`task-seqlab`](http://pfliu.com/pl-nlp2019/bs/task-seqlab.html)
139 | * Extractive Summarization (text -> subset of text): [`task-extractive`](http://pfliu.com/pl-nlp2019/bs/task-extractive.html) (implies `task-seqlab`)
140 | * Span Labeling (text -> labels on spans): [`task-spanlab`](http://pfliu.com/pl-nlp2019/bs/task-spanlab.html)
141 | * Language Modeling (predict probability of text): [`task-lm`](http://pfliu.com/pl-nlp2019/bs/task-lm.html)
142 | * Conditioned Language Modeling (some input -> text): [`task-condlm`](http://pfliu.com/pl-nlp2019/bs/task-condlm.html) (implies `task-lm`)
143 | * Sequence-to-sequence Tasks (text -> text, including MT): [`task-seq2seq`](http://pfliu.com/pl-nlp2019/bs/task-seq2seq.html) (implies `task-condlm`)
144 | * Cloze-style Prediction, Masked Language Modeling (right and left context -> word): [`task-cloze`](http://pfliu.com/pl-nlp2019/bs/task-cloze.html)
145 | * Context Prediction (as in word2vec) (word -> right and left context): [`task-context`](http://pfliu.com/pl-nlp2019/bs/task-context.html)
146 | * Relation Prediction (text -> graph of relations between words, including dependency parsing): [`task-relation`](http://pfliu.com/pl-nlp2019/bs/task-relation.html)
147 | * Tree Prediction (text -> tree, including syntactic and some semantic parsing): [`task-tree`](http://pfliu.com/pl-nlp2019/bs/task-tree.html)
148 | * Graph Prediction (text -> graph whose nodes are not necessarily words, e.g. AMR parsing): [`task-graph`](http://pfliu.com/pl-nlp2019/bs/task-graph.html)
149 | * Lexicon Induction/Embedding Alignment (text/embeddings -> bi- or multi-lingual lexicon): [`task-lexicon`](http://pfliu.com/pl-nlp2019/bs/task-lexicon.html)
150 | * Word Alignment (parallel text -> alignment between words): [`task-alignment`](http://pfliu.com/pl-nlp2019/bs/task-alignment.html)
151 |
152 | ## Composite Pre-trained Embedding Techniques
153 |
154 | * word2vec: [`pre-word2vec`](http://pfliu.com/pl-nlp2019/bs/pre-word2vec.html) (implies `arch-bow`, `task-cloze`, `task-context`)
155 | * fasttext: [`pre-fasttext`](http://pfliu.com/pl-nlp2019/bs/pre-fasttext.html) (implies `arch-bow`, `arch-subword`, `task-cloze`, `task-context`)
156 | * GloVe: [`pre-glove`](http://pfliu.com/pl-nlp2019/bs/pre-glove.html)
157 | * Paragraph Vector (ParaVec): [`pre-paravec`](http://pfliu.com/pl-nlp2019/bs/pre-paravec.html)
158 | * Skip-thought: [`pre-skipthought`](http://pfliu.com/pl-nlp2019/bs/pre-skipthought.html) (implies `arch-lstm`, `task-seq2seq`)
159 | * ELMo: [`pre-elmo`](http://pfliu.com/pl-nlp2019/bs/pre-elmo.html) (implies `arch-bilstm`, `task-lm`)
160 | * BERT: [`pre-bert`](http://pfliu.com/pl-nlp2019/bs/pre-bert.html) (implies `arch-transformer`, `task-cloze`, `task-textpair`)
161 | * Universal Sentence Encoder (USE): [`pre-use`](http://pfliu.com/pl-nlp2019/bs/pre-use.html) (implies `arch-transformer`, `task-seq2seq`)
162 |
163 | ## Structured Models/Algorithms
164 |
165 | * Hidden Markov Models (HMM): [`struct-hmm`](http://pfliu.com/pl-nlp2019/bs/struct-hmm.html)
166 | * Conditional Random Fields (CRF): [`struct-crf`](http://pfliu.com/pl-nlp2019/bs/struct-crf.html)
167 | * Context-free Grammar (CFG): [`struct-cfg`](http://pfliu.com/pl-nlp2019/bs/struct-cfg.html)
168 | * Combinatory Categorial Grammar (CCG): [`struct-ccg`](http://pfliu.com/pl-nlp2019/bs/struct-ccg.html)
169 |
170 | ## Relaxation/Training Methods for Non-differentiable Functions
171 |
172 | * Complete Enumeration: [`nondif-enum`](http://pfliu.com/pl-nlp2019/bs/nondif-enum.html)
173 | * Straight-through Estimator: [`nondif-straightthrough`](http://pfliu.com/pl-nlp2019/bs/nondif-straightthrough.html)
174 | * Gumbel Softmax: [`nondif-gumbelsoftmax`](http://pfliu.com/pl-nlp2019/bs/nondif-gumbelsoftmax.html)
175 | * Minimum Risk Training: [`nondif-minrisk`](http://pfliu.com/pl-nlp2019/bs/nondif-minrisk.html)
176 | * REINFORCE: [`nondif-reinforce`](http://pfliu.com/pl-nlp2019/bs/nondif-reinforce.html)
177 |
178 | ## Adversarial Methods
179 |
180 | * Generative Adversarial Networks (GAN): [`adv-gan`](http://pfliu.com/pl-nlp2019/bs/adv-gan.html)
181 | * Adversarial Feature Learning: [`adv-feat`](http://pfliu.com/pl-nlp2019/bs/adv-feat.html)
182 | * Adversarial Examples: [`adv-examp`](http://pfliu.com/pl-nlp2019/bs/adv-examp.html)
183 | * Adversarial Training: [`adv-train`](http://pfliu.com/pl-nlp2019/bs/adv-train.html) (implies `adv-examp`)
184 |
185 | ## Latent Variable Models
186 |
187 | * Variational Auto-encoder (VAE): [`latent-vae`](http://pfliu.com/pl-nlp2019/bs/latent-vae.html)
188 | * Topic Model: [`latent-topic`](http://pfliu.com/pl-nlp2019/bs/latent-topic.html)
189 |
190 | ## Meta Learning
191 |
192 | * Meta-learning Initialization: [`meta-init`](http://pfliu.com/pl-nlp2019/bs/meta-init.html)
193 | * Meta-learning Optimizers: [`meta-optim`](http://pfliu.com/pl-nlp2019/bs/meta-optim.html)
194 | * Meta-learning Loss functions: [`meta-loss`](http://pfliu.com/pl-nlp2019/bs/meta-loss.html)
195 | * Neural Architecture Search: [`meta-arch`](http://pfliu.com/pl-nlp2019/bs/meta-arch.html)
--------------------------------------------------------------------------------
/draw_bar.py:
--------------------------------------------------------------------------------
1 | # import libraries
2 | import matplotlib
3 | matplotlib.use('Agg')
4 | import pandas as pd
5 | import matplotlib.pyplot as plt
6 | import argparse
7 | from collections import defaultdict
8 | #%matplotlib inline
9 |
10 | # set font
11 | plt.rcParams['font.family'] = 'sans-serif'
12 | plt.rcParams['font.sans-serif'] = 'Helvetica'
13 |
14 | # set the style of the axes and the text color
15 | plt.rcParams['axes.edgecolor']='#333F4B'
16 | plt.rcParams['axes.linewidth']=0.8
17 | plt.rcParams['xtick.color']='#333F4B'
18 | plt.rcParams['ytick.color']='#333F4B'
19 | plt.rcParams['text.color']='#333F4B'
20 |
21 |
22 |
23 |
24 | parser = argparse.ArgumentParser(description='Draw Bar')
25 | parser.add_argument('--tsv', default='input.tsv', help='input file separated by \'\\t\' ')
26 | parser.add_argument('--fig', default='out.png', help='the output figure')
27 | parser.add_argument('--title', default='Concept Count in All Papers', help='the title of the graph')
28 | parser.add_argument('--colored_concepts', default=None, nargs='+',
29 |                     help='An interleaved list of filenames containing concept tags (e.g. first.txt red second.txt purple)')
30 |
31 | args = parser.parse_args()
32 |
33 | concept_colors = defaultdict(lambda: '#007ACC')
34 | if args.colored_concepts:
35 |     for i in range(0, len(args.colored_concepts), 2):
36 |         print(f"opening {args.colored_concepts[i]} as {args.colored_concepts[i+1]}")
37 |         with open(args.colored_concepts[i], 'r') as f:
38 |             for line in f:
39 |                 line = line.strip()
40 |                 concept_colors[line] = args.colored_concepts[i+1]
41 |                 print(f'concept_colors[{line}] = {args.colored_concepts[i+1]}')
42 |
43 |
44 | tsv_file = args.tsv
45 | fig_file = args.fig
46 |
47 | fin = open(tsv_file,"r")
48 | cpt_list = []
49 | val_list = []
50 | for line in fin:
51 |     line = line.strip()
52 |     cpt, val = line.split("\t")
53 |     val_list.append(int(val))
54 |     cpt_list.append(cpt)
55 | fin.close()
56 |
57 | percentages = pd.Series(val_list,
58 |                         index=cpt_list)
59 |
60 | df = pd.DataFrame({'percentage' : percentages})
61 | df = df.sort_values(by='percentage')
62 |
63 | color_list = [concept_colors[x] for x in df.index]
64 |
65 | # we first need a numeric placeholder for the y axis
66 | my_range=list(range(1,len(df.index)+1))
67 |
68 | fig, ax = plt.subplots(figsize=(10,25))
69 |
70 | # create lines and dots for each bar
71 | plt.hlines(y=my_range, xmin=0, xmax=df['percentage'], colors=color_list, alpha=0.5, linewidth=5)
72 | # plt.plot(df['percentage'], my_range, "o", markersize=5, colors=color_list, alpha=0.6)
73 |
74 | # set labels
75 | ax.set_xlabel(args.title, fontsize=15, fontweight='black', color = '#333F4B')
76 | ax.xaxis.set_label_position('top')
77 | ax.xaxis.tick_top()
78 | #ax.set_ylabel('')
79 |
80 | # set axis
81 | ax.tick_params(axis='both', which='major', labelsize=12)
82 | plt.yticks(my_range, df.index)
83 |
84 | # add a horizontal label for the y axis
85 | #fig.text(-0.23, 0.86, 'Concept Coverage (Fulltext)', fontsize=15, fontweight='black', color = '#333F4B')
86 |
87 | # change the style of the axis spines
88 | ax.spines['bottom'].set_color('none')
89 | ax.spines['right'].set_color('none')
90 | ax.spines['left'].set_smart_bounds(True)
91 | ax.spines['top'].set_smart_bounds(True)
92 |
93 | '''
94 | # set the spines position
95 | ax.spines['bottom'].set_position(('axes', -0.04))
96 | ax.spines['left'].set_position(('axes', 0.015))
97 | '''
98 | plt.savefig(fig_file, dpi=300, bbox_inches='tight')
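# A hypothetical usage sketch (these file names are made up): color the tags
# listed in new_tags.txt red and leave all remaining tags the default '#007ACC':
#
#   python draw_bar.py --tsv annotations.tsv --fig annotations.png \
#       --colored_concepts new_tags.txt red
#
# Each file passed to --colored_concepts lists one concept tag per line, and is
# paired with the color name that follows it on the command line.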
--------------------------------------------------------------------------------
/fig/annotations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neulab/nn4nlp-concepts/064d45d94f15c499165e7fed086eb2c89ac12ef8/fig/annotations.png
--------------------------------------------------------------------------------
/fig/auto.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neulab/nn4nlp-concepts/064d45d94f15c499165e7fed086eb2c89ac12ef8/fig/auto.png
--------------------------------------------------------------------------------
/get_paper.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import itertools
4 | import os
5 | import sys
6 | import rule_classifier as paper_classifier
7 | import urllib.request
8 | import bs4 as bs
9 | import time
10 |
11 |
12 |
13 |
14 | def label_paper(paper_id=None, paper_meta=None, cased_regexes=None, feature=None):
15 |     """Label one paper
16 |
17 |     :param paper_id: The paper ID
18 |     :param paper_meta: The paper's metadata, as parsed from the ACL Anthology XML
19 |     :param cased_regexes: A list of (regex, tag, confidence) tuples used to predict concept tags
20 |     :param feature: Which part of the content is used to label papers, i.e. "title" or "fulltext"
21 |     :return: Nothing.
22 |     """
23 |     if not os.path.isfile(f'papers/{paper_id}.pdf'):
24 |         os.makedirs(f'papers/', exist_ok=True)
25 |         try:
26 |             urllib.request.urlretrieve(f'https://www.aclweb.org/anthology/{paper_id}.pdf', f'papers/{paper_id}.pdf')
27 |             # time.sleep(2) # optionally, wait a little while for the download to finish before converting
28 |             os.system(f'pdftotext papers/{paper_id}.pdf papers/{paper_id}.txt')
29 |         except Exception:
30 |             print(f'WARNING: Error while downloading/processing https://www.aclweb.org/anthology/{paper_id}.pdf')
31 |             return
32 |
33 |     with open(f'papers/{paper_id}.txt', 'r') as f:
34 |         paper_text = '\n'.join(f.readlines())
35 |     paper_title = ''.join(paper_meta.title.findAll(text=True))
36 |
37 |     is_cased = 1  # whether regex matching is case-sensitive
38 |     if feature == "title":
39 |         feature = paper_title
40 |         is_cased = 0
41 |     elif feature == "fulltext":
42 |         feature = paper_text
43 |         is_cased = 1
44 |
45 |     predicted_tags = paper_classifier.classify(feature, cased_regexes, is_cased)
46 |     print(f'Title: {paper_title}\n'
47 |           f'Local location: papers/{paper_id}.pdf\n'
48 |           f'Online location: https://www.aclweb.org/anthology/{paper_id}.pdf\n'
49 |           f'Text file location: auto/{paper_id}.txt')
50 |     for i, tag in enumerate(predicted_tags):
51 |         print(f'Tag {i}: {tag}')
52 |     print("------------------------------------------------\n")
53 |
54 |     os.makedirs(f'auto/', exist_ok=True)
55 |     with open(f'auto/{paper_id}.txt', 'w') as fout:
56 |         print(f'# Title: {paper_title}\n# Online location: https://www.aclweb.org/anthology/{paper_id}.pdf', file=fout)
57 |         for tag, conf, just in predicted_tags:
58 |             print(f'# CHECK: confidence={conf}, justification={just}\n{tag}', file=fout)
59 |
60 |
61 |
62 |
63 | if __name__ == "__main__":
64 |
65 |     parser = argparse.ArgumentParser(description="Get a paper to try to read and annotate")
66 |
67 |     parser.add_argument("--paper_id", type=str, default=None,
68 |                         help="The paper ID to get, if you want to specify a single one (e.g. P84-1031)")
69 |     parser.add_argument("--years", type=str, default="19",
70 |                         help="If a paper ID is not specified, a year (e.g. 19) or range of years (e.g. 99-02) from which"+
71 |                              " to select a random paper.")
72 |     parser.add_argument("--confs", type=str, default="P,N,D",
73 |                         help="A comma-separated list of conference abbreviations from which papers can be selected")
74 |     parser.add_argument("--volumes", type=str, default="1,2",
75 |                         help="A comma-separated list of volumes to include (default is long and short research papers)."+
76 |                              " 'all' for no filtering.")
77 |     parser.add_argument("--n_sample", type=str, default="1",
78 |                         help="the number of sampled papers if paper_id is not specified (e.g. 1)."
79 |                              " Write 'all' to select all papers from those years/conferences/volumes.")
80 |
81 |     parser.add_argument("--template", type=str, default="template.cpt",
82 |                         help="The concept template file (e.g. template.cpt)")
83 |
84 |     parser.add_argument("--feature", type=str, default="fulltext",
85 |                         help="Which part of the paper is used to classify (e.g. fulltext or title)")
86 |
87 |     args = parser.parse_args()
88 |
89 |     # init variables
90 |     feature = args.feature
91 |     paper_id = args.paper_id
92 |     template = args.template
93 |     n_sample = args.n_sample
94 |     volumes = args.volumes.split(',')
95 |     paper_map = {}
96 |
97 |     # load the concept template
98 |     cased_regexes = paper_classifier.genConceptReg(file_concept=template, format_col=3)
99 |
100 |     # if paper_id has not been specified
101 |     if paper_id is None:
102 |         years = args.years.split('-')
103 |         confs = args.confs.split(',')
104 |         if len(years) == 2:
105 |             years = list(range(int(years[0]), int(years[1])+1))
106 |         else:
107 |             assert len(years) == 1, f"invalid format of years, {args.years}"
108 |         for pref, year in itertools.product(confs, years):
109 |             year = int(year)
110 |             pref = pref.upper()
111 |             with open(f'acl-anthology/data/xml/{pref}{year:02d}.xml', 'r') as f:
112 |                 soup = bs.BeautifulSoup(f, 'xml')
113 |             for vol in soup.collection.find_all('volume'):
114 |                 if vol.attrs['id'] in volumes:
115 |                     for pap in vol.find_all('paper'):
116 |                         if pap.url:
117 |                             paper_map[pap.url.contents[0]] = pap
118 |
119 |         paper_keys = list(paper_map.keys())
120 |         if n_sample == 'all':
121 |             for paper_id in paper_keys:
122 |                 paper_meta = paper_map[paper_id]
123 |                 label_paper(paper_id, paper_meta, cased_regexes, feature)
124 |         else:
125 |             for _ in range(int(n_sample)):
126 |                 randid = random.choice(paper_keys)
127 |                 if not os.path.isfile(f'annotations/{randid}.txt') and not os.path.isfile(f'auto/{randid}.txt'):
128 |                     paper_id = randid
129 |                     paper_meta = paper_map[paper_id]
130 |                     #print(paper_meta)
131 |                     label_paper(paper_id, paper_meta, cased_regexes, feature)
132 |                 else:
133 |                     print(f'Warning: {randid} has been labeled!')
134 |
135 |     # if paper_id is specified
136 |     else:
137 |         prefix = paper_id.split("-")[0]
138 |         with open(f'acl-anthology/data/xml/{prefix}.xml', 'r') as f:
139 |             soup = bs.BeautifulSoup(f, 'xml')
140 |         for vol in soup.collection.find_all('volume'):
141 |             if vol.attrs['id'] in volumes:
142 |                 for pap in vol.find_all('paper'):
143 |                     if pap.url and pap.url.contents[0] == paper_id:
144 |                         paper_map[pap.url.contents[0]] = pap
145 |                         #print(paper_map[pap.url.contents[0]])
146 |         if not os.path.isfile(f'annotations/{paper_id}.txt') and not os.path.isfile(f'auto/{paper_id}.txt'):
147 |             label_paper(paper_id, paper_map[paper_id], cased_regexes, feature)
148 |             sys.exit(0)
149 |         else:
150 |             print(f'Warning: {paper_id} has been labeled!')
151 |
152 |     if len(paper_map) == 0:
153 |         print(f'Warning: {paper_id} cannot be found!')
154 |         sys.exit(1)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4
2 | matplotlib
3 | pandas
--------------------------------------------------------------------------------
/rule_classifier.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import re
4 |
5 | # generate regular expressions from a concept template
6 | # the format of the template is:
7 | # concept \t father_concept \t keywords
8 | def genConceptReg(file_concept="test.cpt", format_col=3):
9 |     if not os.path.exists(file_concept):
10 |         print("cannot find concept template")
11 |         sys.exit(1)
12 |
13 |     cased_regexes = []
14 |     fin = open(file_concept,"r")
15 |     for line in fin:
16 |         line = line.rstrip("\n")
17 |         if len(line.split("\t")) != format_col or line[0] == "#":
18 |             continue
19 |         # note: the father-concept column (info_list[1]) is not used here
20 |         info_list = line.split("\t")
21 |         cased_regexes.append((info_list[2].rstrip("\r"), info_list[0], 0.9))
22 |     fin.close()
23 |     return cased_regexes
24 |
25 |
26 | def classify(paper_text=None, cased_regexes=None, flag_cased=1, threshold=0.5):
27 |     # note: threshold is currently unused; every match is reported with its fixed confidence
28 |     ret = []
29 |     if paper_text is not None:
30 |         for reg, tag, certainty in cased_regexes:
31 |             if flag_cased == 1:
32 |                 m = re.search(reg, paper_text)
33 |             else:
34 |                 m = re.search(reg, paper_text, re.IGNORECASE)
35 |
36 |             if m:
37 |                 ret.append((tag, certainty, 'Matched regex {}'.format(str(reg))))
38 |     return ret
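# A minimal usage sketch of the two functions above (a hypothetical snippet;
# the 0.9 is the fixed confidence that genConceptReg attaches to every rule):
#
#   regexes = genConceptReg("template.cpt")
#   tags = classify("We train a BiLSTM with the Adam optimizer.", regexes)
#   # -> includes ('optim-adam', 0.9, 'Matched regex Adam') and
#   #    ('arch-bilstm', 0.9, ...); note that the patterns match substrings,
#   #    so 'LSTM' inside 'BiLSTM' also fires the arch-lstm rule.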
--------------------------------------------------------------------------------
/template.cpt:
--------------------------------------------------------------------------------
1 | #Concept Concept-fa Keyword
2 | # ---------- Optimizers ----------
3 | optim-sgd null SGD|gradient descent
4 | optim-adam optim-sgd Adam
5 | optim-adagrad optim-sgd Adagrad
6 | optim-adadelta optim-sgd Adadelta
7 | optim-noam optim-adam specialized Transformer learning rate
8 | optim-momentum optim-sgd SGD with Momentum
9 | optim-amsgrad optim-sgd AMSGrad
10 | optim-projection optim-sgd projection|projected gradient descent
11 | # ---------- Initialization ----------
12 | init-glorot null Xavier|Glorot
13 | init-he null He initialization
14 | # ---------- Regularization ----------
15 | reg-dropout null Dropout|dropout
16 | reg-worddropout null word dropout
17 | reg-stopping null early stopping
18 | reg-patience null patience
19 | reg-norm null Norm (L1/L2) Regularization|L2 regularization
20 | reg-decay null weight decay
21 | reg-labelsmooth null label smooth
22 | # ---------- Normalization ----------
23 | norm-layer null Layer Normalization|layer normalization
24 | norm-batch null Batch Normalization|batch normalization
25 | norm-gradient null gradient clipping|gradient normalization|clipnorm
26 | # ---------- Training Paradigms: Multi-task/Multi-lingual/Transfer ----------
27 | train-mtl null multi-task learning
28 | train-mll train-mtl cross-lingual|multi-lingual|cross language
29 | train-transfer null transfer learning|domain adaptation
30 | train-active null active learning
31 | train-augment null data augmentation|Data Augmentation
32 | train-curriculum null data curriculum
33 | train-parallel null parallelism
34 | # ---------- Activation Functions ----------
35 | activ-tanh null Hyperbolic Tangent|hyperbolic tangent
36 | activ-relu null Rectified Linear Units|rectified linear
37 | # ---------- Pooling Operations ----------
38 | pool-max null Max Pooling|max-pooling|max pooling
39 | pool-mean null Mean Pooling|mean pooling|Average Pooling|average pooling
40 | pool-kmax null k-Max Pooling|k-max pooling
41 | # ---------- Recurrent Architectures ----------
42 | arch-rnn null Recurrent Neural Network|RNN|recurrent neural networks
43 | arch-birnn arch-rnn Bi-directional Recurrent Neural Network|Bi-RNN|BiRNN
44 | arch-lstm arch-rnn Long Short-term Memory|LSTMs|LSTM
45 | arch-bilstm arch-rnn Bi-directional Long Short-term Memory|BiLSTM|BiLSTMs|Bi-LSTM|BLSTM
46 | arch-gru arch-rnn Gated Recurrent Units|GRU|GRUs
47 | arch-bigru arch-rnn Bi-directional GRU|Bi-GRU|BiGRU
48 | # ---------- Other Sequential Architectures ----------
49 | arch-bow null bag-of-words|bag-of-embeddings|deep averaging network
50 | arch-cnn null Convolutional Neural Networks|CNNs|convolutional neural network
51 | arch-att null attention
52 | arch-selfatt arch-att Self Attention|self attention|self-attention
53 | arch-recnn null Recursive Neural Network
54 | arch-treelstm null Tree-LSTM|TreeLSTM|Tree-structured Long Short-term Memory
55 | arch-gnn null Graph Neural Network|GNN
56 | arch-gcnn null Graph Convolutional Neural Network|GCNN
57 | # ---------- Architectural Techniques ----------
58 | arch-residual null residual connections
59 | arch-gating null gating connections|Highway
60 | arch-memo null memory network|external memory
61 | arch-copy null copy mechanism|copying mechanism
62 | arch-bilinear null bilinear|bi-linear|biaffine|bi-affine
63 | arch-coverage null coverage
64 | arch-subword null subword|BPE|sentencepiece
65 | arch-energy null energy-based|globally normalized|global normalization
66 | # ---------- Standard Composite Architectures ----------
67 | arch-transformer arch-selfatt Transformer
68 | # ---------- Model Combination ----------
69 | comb-ensemble null ensemble|ensembling
70 | # ---------- Search Algorithms ----------
71 | search-greedy null Greedy Search|greedy search
72 | search-beam null Beam Search|beam search
73 | search-astar null A\* Search
74 | search-viterbi null Viterbi Algorithm|Viterbi|viterbi
75 | search-sampling null Ancestral Sampling|ancestral sampling
76 | search-gumbel search-sampling Gumbel Max|gumbel max
77 | # ---------- Pre-trained Embedding Techniques ----------
78 | pre-word2vec null word2vec|Word2vec
79 | pre-fasttext null fasttext|FastText|fastText
80 | pre-glove null glove|GloVe
81 | pre-paravec null paragraph vector|ParaVector
82 | pre-skipthought task-seq2seq Skip-thought|skip-thought|skipthought
83 | pre-elmo task-lm ELMo
84 | pre-bert arch-transformer BERT
85 | pre-use null Universal Sentence Encoder|universal sentence encoder
86 | # ---------- Structured Models/Algorithms ----------
87 | struct-hmm null Hidden Markov Models|hidden markov
88 | struct-crf null Conditional Random Fields|conditional random fields|CRF
89 | struct-cfg null Context-free Grammar|context-free grammar
90 | struct-ccg null Combinatory Categorial Grammar|combinatory categorial grammar
91 | # ---------- Relaxation/Training Methods for Non-differentiable Functions ----------
92 | nondif-enum null Complete Enumeration|complete enumeration
93 | nondif-straightthrough null Straight-through Estimator|straight-through estimator
94 | nondif-gumbelsoftmax null Gumbel Softmax|gumbel softmax
95 | nondif-minrisk null Minimum Risk Training|minimum risk
96 | nondif-reinforce null REINFORCE
97 | # ---------- Adversarial Methods ----------
98 | adv-gan null Generative Adversarial Networks|GAN|generative adversarial
99 | adv-feat null Adversarial Feature Learning|adversarial feature
100 | adv-examp null Adversarial Examples|adversarial examples
101 | adv-train adv-examp Adversarial Training|adversarial training
102 | # ---------- Latent Variable Models ----------
103 | latent-vae null Variational Auto-encoder|variational auto-encoder|latent variable
104 | latent-topic null topic model
105 | # ---------- Loss Functions ----------
106 | loss-cca null Canonical Correlation Analysis|canonical correlation analysis
107 | loss-svd null Singular Value Decomposition|SVD|singular value decomposition
108 | loss-margin null Margin-based Loss Functions|margin-based|ranking-based loss
109 | loss-cons null Contrastive Loss
110 | loss-nce loss-cons Noise Contrastive Estimation|NCE
111 | loss-triplet loss-cons Triplet loss|triplet loss
112 | # ---------- Prediction Tasks ----------
113 | task-textclass null Text Classification|text classification
114 | task-textpair null natural language inference|semantic matching|question answering matching
115 | task-seqlab
null named entity recognition|Part-of-Speech|word segmentation|text chunking 116 | task-extractive task-seqlab extractive summarization 117 | task-spanlab null span labeling|machine reading comprehension|SQuAD 118 | task-lm null language model 119 | task-condlm null image caption 120 | task-seq2seq null machine translat|abstractive summarization 121 | task-cloze null cloze-style prediction|masked language model|text cloze 122 | task-context null context prediction 123 | task-relation null dependency pars 124 | task-tree null syntactic pars|semantic pars 125 | task-graph null AMR|UDD 126 | task-lexicon null lexicon induction|bi-lingual embedding|embedding alignment|MUSE 127 | task-alignment null word alignment|GIZA 128 | # ---------- Meta Learning ---------- 129 | meta-init null MAML 130 | meta-optim null meta optimizer|meta learner 131 | meta-loss null meta learning loss 132 | meta-arch null architecture search 133 | --------------------------------------------------------------------------------