├── .gitattributes ├── README.md ├── AASC.md └── section_classify ├── section_classify.pl └── section-title-all.sup /.gitattributes: -------------------------------------------------------------------------------- 1 | ACL_2018_v2.tar.gz filter=lfs diff=lfs merge=lfs -text 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## AASC: ACL Anthology Sentence Corpus 2 | 3 | AASC is a corpus of natural language text extracted from scientific papers. 4 | It contains 2,339,195 sentences drawn from PDF-format papers in the ACL Anthology [[1]](http://aclanthology.info/), a comprehensive repository of scientific papers on computational linguistics and natural language processing. 5 | 6 | For PDF document analysis, we use PDFNLT 1.0 [[2]](https://github.com/KMCS-NII/PDFNLT-1.0), a PDF paper analysis tool trained specifically for the ACL Anthology. After excluding papers with non-standard structures (e.g., no _abstract_ or no _references_), the remaining 13,923 papers were further processed by (1) sentence splitting and (2) section type labeling. 7 | 8 | The `ACL_2018_v2.tar.gz` file contains the extracted natural language sentences for each `<paper-id>`, where `<paper-id>` is the unique identifier of the paper in the ACL Anthology. The corresponding PDF version can be found by appending the identifier to the URL: 9 | [http://aclweb.org/anthology/](http://aclweb.org/anthology/). 10 | 11 | Each sentence file is named `<paper-id>.ss`; each of its lines contains the tab-separated values of one sentence: 12 | 13 | |Column|Example (A00-1001.ss)| 14 | |:-----------|:-----------| 15 | | Sentence ID | `s-1-1-0-0` | 16 | | Section type | `abstract` | 17 | | Sentence text | `The paper describes a natural language based expert system route advisor for the public bus transport in Trondheim, Norway.` | 18 | 19 | A simple dictionary-based classifier was used for the section type labeling. 
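As a quick illustration of the `.ss` layout described above, the following is a minimal Python sketch (not part of the release; the helper name `read_ss` is ours) that reads such a file into triples:

```python
# Minimal sketch (not part of the AASC release): read a tab-separated
# ``.ss`` file into (sentence_id, section_type, text) triples, following
# the three-column layout described above. ``read_ss`` is our own name.
import io


def read_ss(fileobj):
    """Yield one (sentence_id, section_type, text) triple per line."""
    for line in fileobj:
        # Split into at most three fields so any stray tab stays in the text.
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3:  # skip blank or malformed lines defensively
            yield tuple(parts)


# Demo with the A00-1001.ss example row from the table above.
sample = ("s-1-1-0-0\tabstract\tThe paper describes a natural language based "
          "expert system route advisor for the public bus transport in "
          "Trondheim, Norway.\n")
for sid, sectype, text in read_ss(io.StringIO(sample)):
    print(sid, sectype, text)
```
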
20 | 21 | For details, see also our [Overview of AASC](https://kmcs.nii.ac.jp/resource/AASC/AASC.html). 22 | 23 | --- 24 | * Following the copyright policy of the original [ACL Anthology](https://www.aclweb.org/anthology/), AASC materials are licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/) Unported License. 25 | * This work was supported by the National Institute of Informatics and JST CREST Grant JPMJCR1513. 26 | -------------------------------------------------------------------------------- /AASC.md: -------------------------------------------------------------------------------- 1 | # Construction of a New ACL Anthology Corpus for Deeper Analysis of Scientific Papers 2 | 3 | AASC (ACL Anthology Sentence Corpus) is a new ACL Anthology corpus for deeper analysis of scientific papers. The corpus is constructed using [PDFNLT 1.0](https://github.com/KMCS-NII/PDFNLT-1.0), an ongoing project to develop a PDF analysis tool suitable for NLP. The corpus has two features distinguishing it from existing scientific paper corpora. First, the output is a collection of natural language sentences rather than text directly extracted from PDF files. Second, non-textual elements in a sentence, such as citation marks and inline math formulae, are replaced with uniquely identifiable tokens. 4 | 5 | ## Pipeline of the Corpus Construction 6 | 7 | #### Layout analysis 8 | 9 | Scientific papers formatted in PDF are converted into XHTML files. First, figures and tables are extracted with an open-source software tool, [pdffigures](http://pdffigures.allenai.org) [1]. Next, using pdftotext, which is included in [Poppler](https://poppler.freedesktop.org), words and their physical coordinates on a page are extracted. In order to simultaneously extract font properties, a patch is applied to Poppler. Then, words are grouped into text lines on the basis of their vertical and horizontal alignment. 
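The word-to-line grouping step can be sketched roughly as follows. This is an illustrative toy version, not PDFNLT code: word boxes are merged into a line when their vertical extents overlap sufficiently, then ordered left to right. The box format `(x0, y0, x1, y1)` and the overlap threshold are our assumptions for the demo.

```python
# Illustrative sketch (not PDFNLT code): group word boxes into text lines by
# vertical alignment, then order each line's words left to right, roughly as
# in the layout-analysis step described above. Boxes are (x0, y0, x1, y1) in
# page coordinates; min_overlap is an assumed threshold for the demo.
def group_into_lines(words, min_overlap=0.5):
    """words: list of (text, (x0, y0, x1, y1)); returns list of line strings."""
    lines = []  # each entry: [y_top, y_bottom, [(x0, text), ...]]
    for text, (x0, y0, x1, y1) in sorted(words, key=lambda w: w[1][1]):
        for line in lines:
            # Vertical overlap relative to the shorter of the two heights.
            overlap = min(line[1], y1) - max(line[0], y0)
            if overlap > min_overlap * min(line[1] - line[0], y1 - y0):
                line[0], line[1] = min(line[0], y0), max(line[1], y1)
                line[2].append((x0, text))
                break
        else:  # no vertically aligned line found: start a new one
            lines.append([y0, y1, [(x0, text)]])
    # Within each line, sort words by their horizontal position.
    return [" ".join(t for _, t in sorted(ws)) for _, _, ws in lines]


words = [("papers", (60, 100, 95, 112)), ("Scientific", (10, 101, 55, 113)),
         ("PDF", (10, 120, 30, 132))]
print(group_into_lines(words))  # → ['Scientific papers', 'PDF']
```
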
10 | 11 | #### Logical structure analysis 12 | The output of the layout analysis is represented as text lines carrying information about (1) word attributes, such as word notation, coordinates in the paper, typeface, and font size, and (2) geometric attributes of lines, such as position on the page, width, gap from the previous line, indentation, and hanging. Using these as features, the most appropriate label for each text line is selected with a Conditional Random Field (CRF) [2] classifier. The label set covers the major components of scientific papers, such as _abstract_, _section label_, _math formula_, and _reference_. Adjacent text lines with the same label are then merged into a single block. 13 | 14 | Each block is categorized as one of three element types: _header-element_, _floating-element_, and _text-element_. _Header-elements_ include the abstract header, section titles, and the reference header. _Floating-elements_ include footnotes, figures, tables, and captions. _Text-elements_ are the contents of sections. For _text-elements_, blocks separated by page or column breaks are reconnected. In our current implementation, the CRF is trained on 147 papers from the ACL Anthology. 15 | 16 | After serializing _body-text_ blocks in reading order, the logical structure of the sections in the document is determined on the basis of the section labels. Finally, an initial XHTML file is generated with both layout and logical structure tags. By specifying options, non-textual blocks such as tables, figures, and independent-line mathematical formulae can also be output as images. 17 | 18 | #### Sentence extraction 19 | 20 | For texts in _body-text_ blocks, dehyphenation and sentence splitting are applied. Each extracted sentence is given a unique ID number. In addition, non-textual elements in each sentence are replaced with uniquely identifiable tokens so that they do not degrade subsequent syntactic parsing. 
These non-textual elements include independent and inline mathematical expressions, as well as citation marks that refer to tables, figures, and bibliographies. For math formulae and bibliographic items, mapping tables between the non-textual elements' IDs and their original strings are stored in a separate file. Finally, all sentence IDs, non-textual elements' IDs, and citation IDs are embedded into the output XHTML files. 21 | 22 | #### Section type classifier 23 | 24 | We categorize the first-level sections of each paper into one of eight classes: _abstract_, _introduction_, _background_, _method_, _result_, _discussion_, _method-general_, and _others_. For this categorization, we adopted a simple keyword-based method built on a manually constructed dictionary. In addition, using our preliminary corpus dated January 2018, we manually checked every title that appeared at least ten times in order to enumerate exceptional cases, such as "document summarization" naming a research topic in natural language processing rather than the _conclusion_ of a paper. Among the remaining section titles that matched no keyword in the dictionary, we observed that most referred to specific method or resource names; we grouped these into a single category named _method-general_. _Background_ includes related work. _Others_ covers minor cases such as acknowledgments or references recognized as sections. 25 | 26 | ### AASC Corpus Statistics 27 | 28 | #### Statistics as of September 2018 29 | 30 | We crawled 44,481 PDF files from the ACL Anthology and first converted all of them into XHTML format using our analysis tool. In total, 1,174,419 math formulae, 250,082 tables and figures, and 768,290 references were extracted. We then selected papers that had both _abstract_ and _references_ blocks. After this filtering, we obtained 38,864 papers with a total of 6,144,852 sentences (158.1 sentences per paper on average). 
The AASC corpus includes 2,339,195 sentences from the 13,923 papers published at ACL events. 31 | 32 | #### Dataset quality: comparison with existing PDF analysis tools 33 | 34 | Many PDF analysis systems dedicated to scientific papers have been developed, since PDF document analysis is almost indispensable for any research that handles scientific papers. These existing systems process documents using multiple pipelined CRFs (or Support Vector Machines) corresponding to different block types such as references, names, or affiliations, trained on larger sets of annotated documents. 35 | 36 | Since we used a single CRF trained on only a limited amount of data, the objective of the comparison is to check whether our simplified model still achieves a quality level comparable to that of other existing tools. Note that despite the risk of some performance degradation, this simplicity makes in-house customization for Japanese papers easy for us. 37 | 38 | We evaluated our model's performance on 30 manually annotated papers randomly selected from in-domain and out-of-domain paper collections. The out-of-domain papers were taken from material science journals in diverse formats. As a baseline method, we selected GROBID [3], 39 | one of the state-of-the-art PDF analysis tools, and 40 | used online conversion tools publicly available on the Web. Once the outputs from each PDF analysis system were obtained, we first determined the position of each text line in the manually annotated reference file. Since non-negligible differences exist among the output formats (e.g., in how citation marks and inline math formulae are processed), we used dynamic programming-based matching. 41 | 42 | In our evaluation, we used (1) section titles and (2) sequential numbers of text lines in the reference files. Table 1 shows the recognition errors for first-level sections. 
Table 2 shows the errors in text line extraction, where the main reason for the difference between _Missed lines (all)_ and _Missed lines (text-elements)_ is the treatment of abstracts. 43 | 44 | Based on this comparison, we confirm that, in terms of section structure and reading-order determination, the quality of the extracted text is comparable to that of state-of-the-art systems such as GROBID. 45 | 46 |
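The alignment-based scoring behind the tables can be sketched as follows. This is a hypothetical illustration, not the paper's evaluation code: we stand in for the dynamic programming-based matching with the Python standard library's `difflib.SequenceMatcher` (a Ratcliff/Obershelp matcher), align extracted lines to reference lines, and report the fraction of reference lines with no counterpart, in the spirit of the _Missed lines_ columns below.

```python
# Hypothetical sketch of the evaluation idea: align extracted text lines to
# the manually annotated reference lines and report the fraction of reference
# lines with no counterpart ("missed lines"). difflib's Ratcliff/Obershelp
# matcher stands in for the paper's dynamic programming-based matching, and
# exact line equality stands in for its more tolerant comparison.
import difflib


def missed_line_rate(reference, extracted):
    """Fraction of reference lines absent from the extracted output."""
    sm = difflib.SequenceMatcher(a=reference, b=extracted, autojunk=False)
    matched = sum(size for _, _, size in sm.get_matching_blocks())
    return 1.0 - matched / len(reference)


ref = ["Abstract", "We present a parser.", "1 Introduction", "Parsing is hard."]
out = ["We present a parser.", "1 Introduction", "Parsing is hard."]
print(missed_line_rate(ref, out))  # → 0.25 (the abstract line was missed)
```
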
47 | Table 1: Section-level performance comparison. 48 |
49 | 50 | |System|Total number of errors|Falsely recognized sections|Missed sections|Total number of sections| 51 | |:-----------|:-----------|:-----------|:-----------|:-----------| 52 | |pdfanalyzer (acl) |37|10|27|212| 53 | |grobid (acl) |46|25|21|212| 54 | |pdfanalyzer (material) |67|28|39|149| 55 | |grobid (material) |30|13|17|149| 56 |
57 | 58 | Table 2: Text-level performance comparison. 
59 | 60 | 61 | |System|Missed lines (all)|Missed lines (_text-elements_)|Incorrect line order|Extra text (in chars)| 62 | |:-----------|:-----------|:-----------|:-----------|:-----------| 63 | |pdfanalyzer (acl) |0.0045|0.0033|0.0003|0.0366| 64 | |grobid (acl) |0.0229|0.0148|0.0005|0.1127| 65 | |pdfanalyzer (material) |0.0938|0.0849|0.0016|0.0906| 66 | |grobid (material) |0.0480|0.0332|0.0012|0.0544| 67 | 68 | ### References 69 | 70 | 1. Clark, C.A., Divvala, S.K.: Looking beyond text: Extracting figures, tables and 71 | captions from computer science papers. In: Proceedings of the AAAI Workshop: 72 | Scholarly Big Data: AI Perspectives, Challenges, and Ideas (2015) 73 | 2. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: 74 | Probabilistic models for segmenting and labeling sequence data. In: 75 | Proceedings of the Eighteenth International Conference on Machine Learning 76 | (ICML). pp. 282--289 (2001) 77 | 3. Lopez, P.: GROBID: Combining automatic bibliographic data recognition and term 78 | extraction for scholarship publications. In: Agosti, M., Borbinha, J., 79 | Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds.) Research and Advanced 80 | Technology for Digital Libraries. pp. 473--474. 
Springer Berlin Heidelberg, 81 | Berlin, Heidelberg (2009) 82 | -------------------------------------------------------------------------------- /section_classify/section_classify.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | #---------------------------------------------------------# 4 | use strict; 5 | use warnings; 6 | 7 | my $sectionfile_exception = "section-title-all.sup"; 8 | my %secdic; 9 | 10 | sub sec_type($); 11 | sub is_abstract($); 12 | sub is_intro($); 13 | sub is_background($); 14 | sub is_method($); 15 | sub is_result($); 16 | sub is_discussion($); 17 | sub is_reference($); 18 | sub is_acknowledge($); 19 | sub is_appendix($); 20 | sub read_sectitle_dic_sup(); 21 | sub read_sectitle_dic(); 22 | 23 | #------------------------------------------------------------------------# 24 | sub 25 | sec_type($) 26 | { 27 | my $titlestr = shift(@_); 28 | 29 | if (!%secdic) 30 | { 31 | print STDERR "Read exceptional rules...\n"; 32 | read_sectitle_dic_sup(); 33 | } 34 | 35 | $titlestr =~ s/^(\d+)\s*//; 36 | 37 | my $flag = "NOT_DEFINED"; 38 | if (defined($secdic{$titlestr})) 39 | { 40 | # print "$freq\t$titlestr\t$secdic{$titlestr}\n"; 41 | $flag = $secdic{$titlestr}; 42 | } 43 | elsif (is_abstract($titlestr)) 44 | { 45 | # print "$freq\t$titlestr\tabstract\n"; 46 | $flag = "abstract"; 47 | } 48 | elsif (is_intro($titlestr)) 49 | { 50 | # print "$freq\t$titlestr\tintroduction\n"; 51 | $flag = "introduction"; 52 | } 53 | elsif (is_appendix($titlestr)) 54 | { 55 | # print "$freq\t$titlestr\tappendix\n"; 56 | $flag = "appendix"; 57 | } 58 | elsif (is_acknowledge($titlestr)) 59 | { 60 | # print "$freq\t$titlestr\tacknowledge\n"; 61 | # $flag = "acknowledgement"; 62 | $flag = "appendix"; 63 | } 64 | elsif (is_discussion($titlestr)) 65 | { 66 | # print "$freq\t$titlestr\tdiscussion\n"; 67 | $flag = "discussion"; 68 | } 69 | elsif (is_background($titlestr)) 70 | { 71 | # print 
"$freq\t$titlestr\tbackground\n"; 72 | $flag = "background"; 73 | } 74 | elsif (is_result($titlestr)) 75 | { 76 | # print "$freq\t$titlestr\tresult\n"; 77 | $flag = "result"; 78 | } 79 | elsif (is_method($titlestr)) 80 | { 81 | # print "$freq\t$titlestr\tmethod\n"; 82 | $flag = "method"; 83 | } 84 | else 85 | { 86 | $flag = "method-other"; 87 | } 88 | # print "$titlestr\t$flag\n"; 89 | 90 | $flag; 91 | } 92 | 93 | #------------------------------------------------------------------------# 94 | sub 95 | is_method($) 96 | { 97 | my $sentstr = shift(@_); 98 | 99 | my $flag = 0; 100 | 101 | if (($sentstr =~ m/^method/i) && ($sentstr !~ m/^method\w* and/i)) { $flag = 1; } 102 | elsif ($sentstr =~ m/^approach\b/i) { $flag = 1; } 103 | elsif ($sentstr =~ m/proposed/i) { $flag = 1; } 104 | elsif ($sentstr =~ m/^our/i) { $flag = 1; } 105 | elsif ($sentstr =~ m/the approach/i) { $flag = 1; } 106 | elsif ($sentstr =~ m/the framework/i) { $flag = 1; } 107 | elsif ($sentstr =~ m/formulation/i) { $flag = 1; } 108 | elsif ($sentstr =~ m/framework/i) { $flag = 1; } 109 | elsif ($sentstr =~ m/^architecture/i) { $flag = 1; } 110 | elsif ($sentstr =~ m/^system/i) { $flag = 1; } 111 | elsif ($sentstr =~ m/definition/i) { $flag = 1; } 112 | elsif ($sentstr =~ m/^overview/i) { $flag = 1; } 113 | elsif ($sentstr =~ m/algorithm/i) { $flag = 1; } 114 | elsif ($sentstr =~ m/approach/i) { $flag = 1; } 115 | elsif ($sentstr =~ m/model/i) { $flag = 1; } 116 | elsif ($sentstr =~ m/method/i) { $flag = 1; } 117 | elsif ($sentstr =~ m/corpus/i) { $flag = 1; } 118 | elsif ($sentstr =~ m/corpora/i) { $flag = 1; } 119 | elsif ($sentstr =~ m/data/i) { $flag = 1; } 120 | elsif ($sentstr =~ m/resource/i) { $flag = 1; } 121 | elsif ($sentstr =~ m/preprocessing/i) { $flag = 1; } 122 | elsif ($sentstr =~ m/feature/i) { $flag = 1; } 123 | 124 | $flag; 125 | } 126 | 127 | #------------------------------------------------------------------------# 128 | sub 129 | is_result($) 130 | { 131 | my $sentstr = 
shift(@_); 132 | 133 | my $flag = 0; 134 | 135 | if ($sentstr =~ m/experiment/i) { $flag = 1; } 136 | elsif ($sentstr =~ m/evaluation/i) { $flag = 1; } # need to check 137 | elsif ($sentstr =~ m/^error analysis/i) { $flag = 1; } 138 | elsif ($sentstr =~ m/^analysis/i) { $flag = 1; } 139 | elsif ($sentstr =~ m/result/i) { $flag = 1; } 140 | elsif ($sentstr =~ m/implementation/i) { $flag = 1; } 141 | elsif ($sentstr =~ m/preparation/i) { $flag = 1; } 142 | elsif ($sentstr =~ m/performance/i) { $flag = 1; } 143 | elsif (($sentstr =~ m/demo\b/i) || ($sentstr =~ m/demos\b/i) || 144 | ($sentstr =~ m/demonstrat/i)) { $flag = 1; } 145 | elsif ($sentstr =~ m/baseline/i) { $flag = 1; } 146 | elsif (($sentstr =~ m/setup/i) 147 | || ($sentstr =~ m/set up/i) ) { $flag = 1; } 148 | elsif ($sentstr =~ m/example/i) { $flag = 1; } 149 | elsif ($sentstr =~ m/illustrat/i) { $flag = 1; } 150 | elsif ($sentstr =~ m/case study/i) { $flag = 1; } 151 | elsif ($sentstr =~ m/study/i) { $flag = 1; } 152 | elsif ($sentstr =~ m/studies/i) { $flag = 1; } 153 | elsif ($sentstr =~ m/validation/i) { $flag = 1; } 154 | elsif ($sentstr =~ m/observation/i) { $flag = 1; } 155 | elsif ($sentstr =~ m/evaluat/i) { $flag = 1; } 156 | elsif ($sentstr =~ m/statistics/i) { $flag = 1; } 157 | elsif (($sentstr =~ m/comparison/i) 158 | || ($sentstr =~ m/qualitative/) 159 | || ($sentstr =~ m/quantitative/) 160 | ) { $flag = 1; } 161 | 162 | $flag; 163 | } 164 | 165 | #------------------------------------------------------------------------# 166 | sub 167 | is_intro($) 168 | { 169 | my $sentstr = shift(@_); 170 | 171 | my $flag = 0; 172 | 173 | if ($sentstr =~ m/^introduction to/i) { $flag = 0; } 174 | elsif ($sentstr =~ m/^introduction of/i) { $flag = 0; } 175 | elsif ($sentstr =~ m/^introduction\s*:/i) { $flag = 0; } 176 | elsif ($sentstr =~ m/^introduction\b/i) { $flag = 1; } 177 | elsif ($sentstr =~ m/introduction/i) { $flag = 0; } 178 | elsif ($sentstr =~ m/^preliminaries$/i) { $flag = 1; } 179 | 180 | 
$flag; 181 | } 182 | 183 | #------------------------------------------------------------------------# 184 | sub 185 | is_background($) 186 | { 187 | my $sentstr = shift(@_); 188 | 189 | my $flag = 0; 190 | 191 | if ($sentstr =~ m/^motivation/i) { $flag = 1; } 192 | elsif ($sentstr =~ m/^related/i) { $flag = 1; } 193 | elsif ($sentstr =~ m/^background/i) { $flag = 1; } 194 | elsif ($sentstr =~ m/^previous/i) { $flag = 1; } 195 | elsif ($sentstr =~ m/^prior/i) { $flag = 1; } 196 | elsif ($sentstr =~ m/^literature/i) { $flag = 1; } 197 | elsif ($sentstr =~ m/^state of the art$/i) { $flag = 1; } 198 | elsif ($sentstr =~ m/^existing/i) { $flag = 1; } 199 | elsif (($sentstr =~ m/related work/i) 200 | || ($sentstr =~ m/previous work/i) 201 | || ($sentstr =~ m/relevant work/i) 202 | || ($sentstr =~ m/prior work/i) 203 | || ($sentstr =~ m/earlier work/i) 204 | ) { $flag = 1; } 205 | elsif (($sentstr =~ m/review of/i) 206 | || ($sentstr =~ m/review:/i)) { $flag = 1; } 207 | elsif ($sentstr =~ m/problem/i) { $flag = 1; } 208 | elsif ($sentstr =~ m/challenge/i) { $flag = 1; } 209 | elsif ($sentstr =~ m/motivating/i) { $flag = 1; } 210 | elsif ($sentstr =~ m/background/i) { $flag = 1; } 211 | elsif ($sentstr =~ m/goal/i) { $flag = 1; } 212 | elsif ($sentstr =~ m/survey/i) { $flag = 1; } 213 | 214 | $flag; 215 | } 216 | 217 | #------------------------------------------------------------------------# 218 | sub 219 | is_abstract($) 220 | { 221 | my $sentstr = shift(@_); 222 | 223 | my $flag = 0; 224 | 225 | if ($sentstr =~ m/^a\s*bstract[\.\:\s\*]*$/i) { $flag = 1; } 226 | 227 | $flag; 228 | } 229 | 230 | #------------------------------------------------------------------------# 231 | sub 232 | is_reference($) 233 | { 234 | my $sentstr = shift(@_); 235 | 236 | my $flag = 0; 237 | 238 | ### treat this as exceptional 239 | if ($sentstr =~ m/^references\b/i) { $flag = 1; } 240 | 241 | $flag; 242 | } 243 | 244 | #------------------------------------------------------------------------# 
245 | sub 246 | is_appendix($) 247 | { 248 | my $sentstr = shift(@_); 249 | 250 | my $flag = 0; 251 | 252 | # if ($sentstr =~ m/^acknowledg/i) { $flag = 1; } 253 | if ($sentstr =~ m/appendix/i) { $flag = 1; } 254 | 255 | $flag; 256 | } 257 | 258 | #------------------------------------------------------------------------# 259 | sub 260 | is_acknowledge($) 261 | { 262 | my $sentstr = shift(@_); 263 | 264 | my $flag = 0; 265 | 266 | if ($sentstr =~ m/^acknowledg/i) { $flag = 1; } 267 | 268 | $flag; 269 | } 270 | 271 | #------------------------------------------------------------------------# 272 | sub 273 | is_discussion($) 274 | { 275 | my $sentstr = shift(@_); 276 | 277 | my $flag = 0; 278 | 279 | if ($sentstr =~ m/^conclusion/i) { $flag = 1; } 280 | elsif ($sentstr =~ m/^discussion/i) { $flag = 1; } 281 | elsif ($sentstr =~ m/discussion/i) { $flag = 1; } 282 | elsif ($sentstr =~ m/concluding/i) { $flag = 1; } 283 | elsif ($sentstr =~ m/^summary/i) { $flag = 1; } 284 | elsif ($sentstr =~ m/^perspective/i) { $flag = 1; } 285 | elsif ($sentstr =~ m/^outlook$/i) { $flag = 1; } 286 | elsif ($sentstr =~ m/^contributions/i) { $flag = 1; } 287 | elsif ($sentstr =~ m/future/i) { $flag = 1; } 288 | elsif ($sentstr =~ m/remaining/i) { $flag = 1; } 289 | elsif ($sentstr =~ m/limitation/i) { $flag = 1; } 290 | elsif ($sentstr =~ m/application/i) { $flag = 1; } 291 | elsif ($sentstr =~ m/use case/i) { $flag = 1; } 292 | elsif ($sentstr =~ m/findings/i) { $flag = 1; } 293 | elsif ($sentstr =~ m/implication/i) { $flag = 1; } 294 | elsif ($sentstr =~ m/lessons learned/i) { $flag = 1; } 295 | elsif ($sentstr =~ m/impact on/i) { $flag = 1; } 296 | elsif ($sentstr =~ m/impacts on/i) { $flag = 1; } 297 | # 298 | # elsif ($sentstr =~ m/conclusion/i) { $flag = 1; } 299 | # elsif ($sentstr =~ m/perspective/i) { $flag = 1; } 300 | # elsif ($sentstr =~ m/remark/i) { $flag = 1; } 301 | # elsif ($sentstr =~ m/summary/i) { $flag = 1; } 302 | # elsif ($sentstr =~ m/further/i) { $flag = 1; } 303 
| 304 | $flag; 305 | } 306 | 307 | #------------------------------------------------------------------------# 308 | sub 309 | read_sectitle_dic_sup() 310 | { 311 | my $infile = "DATA/Section/section-title-all.sup"; 312 | 313 | open(IN, "<$infile") || die "can't open $infile\n"; 314 | print STDERR "Read from $infile.\n"; 315 | while (my $read = <IN>) 316 | { 317 | my @elist = split(/\t|\n/, $read); 318 | my $freq = shift(@elist); 319 | my $titlestr = shift(@elist); 320 | my @seclabel = @elist; 321 | $secdic{$titlestr} = join("\t", @seclabel); 322 | } 323 | close(IN); 324 | 325 | if(0) 326 | { 327 | my $infile = "DATA/Section/section-title-method.sup"; 328 | 329 | open(IN, "<$infile") || die "can't open $infile\n"; 330 | while (my $read = <IN>) 331 | { 332 | my @elist = split(/\t|\n/, $read); 333 | my $freq = shift(@elist); 334 | my $titlestr = shift(@elist); 335 | my @seclabel = @elist; 336 | $secdic{$titlestr} = join("\t", @seclabel); 337 | } 338 | close(IN); 339 | } 340 | } 341 | 342 | #------------------------------------------------------------------------# 343 | 1; 344 | -------------------------------------------------------------------------------- /section_classify/section-title-all.sup: -------------------------------------------------------------------------------- 1 | 1 Introduction to the issue introduction 2 | 1 Introduction to the problem introduction 3 | 2 Introduction: Motivation and Goals introduction 4 | 1 Introduction : the problem introduction 5 | 1 Introduction: An Overview introduction 6 | 2 Background and Introduction introduction 7 | 2 Motivation and Introduction introduction 8 | 2   Introduction introduction 9 | 1 Background and introduction introduction 10 | 1 Introductions introduction 11 | 1 New Introduction introduction 12 | 1 Research Problem Introduction introduction 13 | 104 References references 14 | 2 REFERENCES references 15 | 1 References: references 16 | 1 Conclusion & Experiments discussion result 17 | 22 Discussion and Related Work 
discussion background 18 | 14 Discussion and Error Analysis discussion result 19 | 12 Discussion and related work discussion background 20 | 10 Discussion of Results result 21 | 4 Discussion and error analysis discussion result 22 | 2 Discussion and Related works discussion background 23 | 1 Discussion & Error Analysis discussion result 24 | 1 Discussion and A Few Post-Hoc Analyses discussion result 25 | 1 Discussion and Additional Tests discussion result 26 | 1 Discussion and Analysis discussion result 27 | 1 Discussion and comparison with related work discussion background 28 | 1 Discussion and Comparisons discussion background 29 | 1 Discussion and current work discussion background 30 | 1 Discussion and data analysis discussion result 31 | 1 Discussion and Data Analysis discussion result 32 | 1 Discussion and Further Analysis discussion result 33 | 1 Discussion and Further Development discussion result 34 | 1 Discussion and Further Experiments discussion result 35 | 1 Discussion and Related Approaches discussion background 36 | 1 Discussion and Related Works discussion background 37 | 1 Discussion Concerning the Difficulties in Chinese Deep Parsing background 38 | 1 Discussion of Approaches background 39 | 1 Discussion of Experimental Results result 40 | 1 Discussion of experiments and results result 41 | 1 Discussion of Experiments and Results result 42 | 1 Discussion of Feature Extraction result 43 | 1 Discussion of Mapping Principles result 44 | 1 Discussion of Related Work discussion background 45 | 1 Discussion of results discussion result 46 | 1 Discussion of Segmentation Metrics result 47 | 1 Discussion of the architecture result 48 | 1 Discussion of the Bakeoff result 49 | 1 Discussion of the full system result 50 | 1 Discussion of the Model result 51 | 1 Discussion of the Results result 52 | 1 Discussion of the Results and Related Research result background 53 | 1 Discussion of the Upper-Bound Performance result 54 | 1 Discussion of Translation Results 
result 55 | 1 Discussion on biased data result 56 | 1 Discussion on Results result 57 | 1 Discussion on Transliteration result 58 | 1 Discussion  Eusing the predictive model to aid parsing result 59 | 1 Analysis and Concluding Remarks discussion result 60 | 3 Summary Generation method 61 | 2 Summary Evaluation result 62 | 1 Summary Comparison in ParaEval method 63 | 1 Summary Content Representation method 64 | 1 Summary Content Units method 65 | 1 Summary evaluation method 66 | 1 Summary Evaluation Techniques background 67 | 1 Summary evaluation via question answering method 68 | 1 Summary Extraction method 69 | 1 Summary Generation Process method 70 | 1 Summary Grammaticality method 71 | 1 Summary of behavioral data method 72 | 1 Summary of Completed Language Packs result 73 | 1 Summary of Data Set and Prior Results background 74 | 1 Summary of Existing Techniques background 75 | 1 Summary of Experimental Results result 76 | 1 Summary of Lattice Learning Algorithm method 77 | 1 Summary of our results on RTE 3 result 78 | 1 Summary of Predictors method 79 | 1 Summary of Progress to Date background 80 | 1 Summary of Results result 81 | 1 Summary of the algorithm method 82 | 1 Summary of the Approach method 83 | 1 Summary of the experiment result 84 | 1 Summary of the OntoLearn system method 85 | 1 Summary of the Pipeline method 86 | 1 Summary of the proposed system method 87 | 1 Summary of Topics background 88 | 1 Summary of Translation Process method 89 | 1 Summary of Unique Features method 90 | 1 Summary of User Functionality method 91 | 1 Summary Writing Task method 92 | 1 Perspectives and semantic similarity discussion 93 | 1 Perspectives on Concept Maps discussion 94 | 1 Perspectives on TAG discussion 95 | 1 Perspectives on Translation discussion 96 | 1 Perspectives: Thematic Adaptation discussion 97 | 25 Final Remarks discussion 98 | 20 Further Work discussion 99 | 13 Results and Conclusions discussion 100 | 11 Final remarks discussion 101 | 8 Related Work and 
Conclusions discussion background 102 | 8 Remarks discussion 103 | 8 Results and Conclusion discussion result 104 | 7 Further work discussion 105 | 6 Analysis and Conclusions discussion result 106 | 6 Results and conclusions discussion result 107 | 4 Evaluation and Conclusions discussion result 108 | 4 Further Analysis result 109 | 4 Related Work and Conclusion discussion background 110 | 4 Results and conclusion discussion result 111 | 3 Directions for further research discussion 112 | 3 Evaluations and Conclusion discussion result 113 | 3 Further Research discussion 114 | 3 Outlook and conclusion discussion 115 | 2 Automatic Summary Evaluation result 116 | 2 Closing remarks discussion 117 | 2 Closing Remarks discussion 118 | 2 Data Summary result 119 | 2 Evaluation and conclusion discussion result 120 | 2 "Evaluation, Results and Conclusion" discussion result 121 | 2 Experiment summary result 122 | 2 Final remarks and future work discussion 123 | 2 Further Experiments result 124 | 2 Further Work and Conclusion discussion 125 | 2 Implementation and Performance Remarks result 126 | 2 Open Issues and Conclusion discussion 127 | 2 Remarks and Conclusions discussion 128 | 2 Results & Conclusions discussion 129 | 1 "[Francis and Ku~era 82] Francis,Conclusions" discussion 130 | 1 ~ Conclusion discussion 131 | 1 A brief conclusion discussion 132 | 1 A Factored Hierarchical Model of Topic and Perspective method 133 | 1 A Historical Perspective discussion 134 | 1 A roadmap for further research discussion 135 | 1 A unified perspective on GRE discussion 136 | 1 Active Learning for Tagging Reference Summary and Summarization method 137 | 1 Alt-i: a historical perspective background 138 | 1 An ACG perspective method 139 | 1 Analysis & Conclusion discussion result 140 | 1 Analysis of the result and conclusion discussion result 141 | 1 Applications and Conclusions discussion 142 | 1 Applications and perspectives discussion 143 | 1 Assessment and Conclusion discussion 144 | 1 
Automatic Evaluation of Summary result
145 | 1 Availability of PDEP Data and Potential for Further Enhancements discussion
146 | 1 Background and Perspectives discussion background
147 | 1 Benchmarking and Conclusion discussion result
148 | 1 Building the Final Summary discussion method
149 | 1 CO4CLUSIO4 A4D PERSPECTIVES discussion
150 | 1 Cohcluding Remarks discussion
151 | 1 Commentary; conclusion discussion
152 | 1 Comparative Summary Generation ρEg method
153 | 1 Comparison and Conclusion discussion result
154 | 1 Conchtding Remarks discussion
155 | 1 CONCLUI)ING REMARKS discussion
156 | 1 "Conclusion 514,830" discussion
157 | 1 "Conclusion, perspectives" discussion
158 | 1 "Conclusions, caveats, and future work" discussion
159 | 1 "Conclusions, further directions" discussion
160 | 1 "Conclusions, Further Directions" discussion
161 | 1 "Conclusions, Limitations and Future Plan" discussion
162 | 1 "Conclusions, related & future work" discussion
163 | 1 "Conclusions, Summary and Future Work" discussion
164 | 1 Conclusive remarks discussion
165 | 1 Conclusive Remarks discussion
166 | 1 Conforming with conclusions of prior surveys result
167 | 1 Consolidation of perspective discussion
168 | 1 Corpus overview (our perspective on LCR) result discussion
169 | 1 Current Implementation and Further Possibilities discussion
170 | 1 Current practice in summary evaluation background
171 | 1 Current Work and Conclusions discussion
172 | 1 Dataset Annotation Perspective based on Listeners and Readers discussion
173 | 1 Defining a Summary for News Articles method
174 | 1 Demonstration and Conclusion discussion
175 | 1 Demonstration Summary discussion
176 | 1 Dialogue-oriented Review Summary Generation method
177 | 1 DICOMER: Evaluating Summary Readability method
178 | 1 Directions for Further Research discussion
179 | 1 Document features as potential summary content method
180 | 1 Employing Centering Theory from Semantic Perspective method
181 | 1 Error Analysis and Further Enhancements discussion result
182 | 1 Error Analysis and Further Improvements discussion result
183 | 1 Evaluation and Conclusion discussion result
184 | 1 Evaluation and Summary discussion result
185 | 1 "Evaluation, experiments and further development" discussion result
186 | 1 Examples and Remarks Related to Formal Language Theory result
187 | 1 Experiment and Conclusions discussion result
188 | 1 Experimental Requirements and Further Work discussion
189 | 1 Experimental results and Conclusion discussion result
190 | 1 Experiments and conclusion discussion result
191 | 1 Experiments and Conclusions discussion result
192 | 1 Facilitating Perspective Change method
193 | 1 Final Comments and Conclusion discussion
194 | 1 Final Remarks and Conclusion discussion
195 | 1 Final Remarks and Future work discussion
196 | 1 Final results and conclusions discussion
197 | 1 Findings & Conclusions discussion
198 | 1   Conclusion discussion
199 | 1   Conclusions discussion
200 | 1 From Neutral to One-Sided Perspective method
201 | 1 Further analysis on the oracle behaviour result
202 | 1 Further Anaphora Resolution Results result
203 | 1 Further Challenges and Directions discussion
204 | 1 Further comparison of selected wordnets result
205 | 1 Further considerations and limits discussion
206 | 1 Further critique discussion
207 | 1 Further design considerations method
208 | 1 Further details of the corpora method
209 | 1 Further development method
210 | 1 Further Development of the CAOS Prototype method
211 | 1 Further Development on Metaphorical Affect Detection method
212 | 1 Further developments method
213 | 1 Further Developments method
214 | 1 Further experiments and analysis result
215 | 1 Further Exploiting of Global Evidences method
216 | 1 Further Extensions: Generalizing to other word types via tagset mapping method
217 | 1 Further Extensions: Reducing False Positives method
218 | 1 Further Improvements method
219 | 1 Further Issue: Dealing with the Lack of Data discussion
220 | 1 Further Issues discussion
221 | 1 Further Issues of Manual Evaluation discussion
222 | 1 Further motivation for the analysis result
223 | 1 Further Observations result
224 | 1 Further perspectives discussion
225 | 1 Further Related Work background
226 | 1 Further research discussion
227 | 1 FURTHER RESEARCH ISSUES discussion
228 | 1 Further steps and conclusions discussion
229 | 1 Further study discussion
230 | 1 Further testing result
231 | 1 Further Use Cases discussion
232 | 1 Further Ways to Improve: Mistakes of 3CosAdd and LRCos result
233 | 1 Further Work and Conclusions discussion
234 | 1 GATE from a Teaching Perspective result
235 | 1 General Conclusion discussion
236 | 1 General conclusions discussion
237 | 1 General remarks discussion
238 | 1 Generating a Single Sentence Summary method
239 | 1 Generating an Abstractive Summary of a Multimodal Document method
240 | 1 Generating summary from nested tree method
241 | 1 gre from the Perspective of Problem Solving method
242 | 1 Human summary analysis result
243 | 1 Impact and Conclusions discussion
244 | 1 Implications and Further Work discussion
245 | 1 Implications for further research. discussion
246 | 1 Initial Feedback and Conclusion discussion
247 | 1 Lexicological Perspective on Syntax method
248 | 1 "Limitations, Conclusion and Future Work" discussion
249 | 1 Linking Givenness and the distributional semantic perspective method
250 | 1 Mediatory Summary discussion
251 | 1 Method for Producing Table Style Summary method
252 | 1 Morphology and Further examples method
253 | 1 Motivation and Long-term Perspective background
254 | 1 MultiLingual Summary Evaluation result
255 | 1 Multi-Perspective Evaluations result
256 | 1 New Perspectives to Event Extraction discussion
257 | 1 Opinion Summary Formats method
258 | 1 Oracle Summary Extraction as an method
259 | 1 Overall conclusion discussion
260 | 1 Overview of System From User Perspective method
261 | 1 Persian from a Morphological Perspective method
262 | 1 "Perspectives, Challenges, Milestones" discussion
263 | 1 Preliminary Conclusions discussion
264 | 1 Reader Perspective method
265 | 1 Reader/Writer Perspective method
266 | 1 Related results and conclusion discussion result
267 | 1 Related work and conclusions discussion background
268 | 1 Related Work and Remarks background
269 | 1 Remarks and Future Work discussion
270 | 1 Remarks on ACGs method
271 | 1 Remarks on parsing and learning method
272 | 1 Remarks on Strategy method
273 | 1 Result and Conclusion discussion result
274 | 1 Results and Further Work discussion result
275 | 1 Results and perspectives discussion result
276 | 1 "Results, Conclusions and Future Plans" discussion result
277 | 1 Review Summary Generation method
278 | 1 Simulating Gesture Use: The Generation Perspective method
279 | 1 Some Conclusions discussion
280 | 1 Some Further Applications discussion
281 | 1 Specificity and summary quality result
282 | 1 Speculations and Conclusions discussion
283 | 1 Statistical Modeling of Perspectives method
284 | 1 Status and Conclusions discussion
285 | 1 Summarization Outputting a Summary Containing Multiple Words result
286 | 1 "Summary, Conclusions & Future Work" discussion
287 | 1 System Summary method
288 | 1 Task Conclusions result
289 | 1 TESLA-S: Evaluating Summary Content result
290 | 1 The ACL RD-TEC: Further Annotation Layers for ACL ARC method
291 | 1 The Computational Perspective method
292 | 1 The conclusions and future work discussion
293 | 1 The Human Perspective method
294 | 1 The Oracle or Average Jo Summary method
295 | 1 Thesis Summary and Contributions discussion
296 | 1 Ultra-concise Summary Generation method
297 | 1 User Perspective method
298 | 1 WORLDTREK EDITION perspectives discussion
299 | 1 Writer Perspective discussion
300 | 1 Motivation and Ablative Analyses background method
301 | 1 Motivation and Aimed Application background
302 | 1 Motivation and Algorithmic Overview background method
303 | 1 Motivation and background to study background
304 | 1 Motivation and corpus analysis background method
305 | 1 Motivation and explored methods background method
306 | 1 Motivation and initial experiments background result
307 | 1 Motivation for using discourse relations method
308 | 1 Motivation for using ρEratios method
309 | 1 Motivation Measure method
310 | 1 Motivation of enhancements method
311 | 1 Motivation through Feedback method
312 | 1 Motivational Experiments result
313 | 1 Motivational Interviewing method
314 | 1 Motivational Interviewing Dataset method
315 | 4 Related and Future Work background discussion
316 | 1 Related & Future Work background discussion
317 | 1 Related Annotation Efforts background
318 | 1 Related Annotation Schemes background
319 | 1 Related Approaches background
320 | 1 Related Approximation Methods background
321 | 1 Related Architectures and Grammars background
322 | 1 Related computational and linguistic formalisms and psycholinguistic findings background
323 | 1 Related Computational Work background
324 | 1 Related Controlled Natural Languages background
325 | 1 Related corpora and databases background background
326 | 1 Related datasets background
327 | 1 Related Learner Corpora background
328 | 1 Related metadata resources background
329 | 1 Related Research and Future Work background discussion
330 | 1 Related Results result
331 | 1 Related source work 6 of background
332 | 1 Related Word method
333 | 1 Related Work and Data Basis background
334 | 1 Related Work and Dataset background
335 | 1 Related Work and Experimental Framework background result
336 | 1 Related work and Future directions background discussion
337 | 1 Related Work and Future Directions background discussion
338 | 1 Related Work and Our Ideas background method
339 | 1 Related Work and Problem Definition background method
340 | 1 Relatedness method
341 | 1 Relatedness curves for acquiring paraphrases result
342 | 1 Relatedness to Query result
343 | 1 Relatedness-based Query Expansion (RQE) method
344 | 1 Background & Our Models background method
345 | 1 Background and Approach background method
346 | 1 Background and Methods background method
347 | 1 Background and Model background method
348 | 72 Overview background
349 | 3 Overview and Related Work background
350 | 2 Overview of Experiments result
351 | 2 Overview of the Problem background
352 | 1 Overview and Motivation background
353 | 1 Overview: The AuCoPro Project background
354 | 1 Overview: The Gloss Extraction Task background
355 | 1 Overview of the corpus method
356 | 1 Overview of the CRAFT Concept Annotation Guidelines method
357 | 1 Overview of the CRAFT Corpus method
358 | 1 Overview of Related Work background
359 | 1 Overview of Related Works background
360 | 1 Overview of Document Reordering in Chinese IR background
361 | 1 Overview of Existing Editing Methods background
362 | 1 Overview of Prior Approaches background
363 | 1 Overview of the PDTB background
364 | 1 Overview of the Penn Discourse Treebank background
365 | 1 Overview of the Text-Generation System background
366 | 1 Overview of the visualization tool background
367 | 1 Overview of Translation Model background
368 | 1 Overview of Verification of NLG Systems background
369 | 1 Overview of a Chinese Word Ordering Detection and Correction System background
370 | 1 Overview of Linguistic Typology background
371 | 1 Overview of SMT background
372 | 1 Overview of POMDP background
373 | 1 Overview of HMM background
374 | 1 Previous Data result
375 | 1 Previous Evaluation Measures result
376 | 1 Previous experiments result
377 | 1 Previous MIR Assumptions method
378 | 1 Previous models method
379 | 1 Previous propositions method
380 | 1 Previous Treebanks method
381 | 1 Previous Use of Gazetteers method
382 | 1 Previous techniques method
383 | 1 Previously Existing Data background
384 | 1 Prior Polarities Formulae method
385 | 1 Prior polarity scoring method
386 | 1 Prior-enriched semantic networks method
387 | 1 Prioritizing analyses method
388 | 1 Prior-Polarity Subjectivity Lexicon method
389 | 1 Literature beyond Project Gutenberg method
390 | 2 Experimental Approach method
391 | 2 Methods and experiments result method
392 | 2 Information Retrieval Evaluation by User Experiment background
393 | 2 Methodology and Experiment result method
394 | 2 Proposed Method and Experiments result method
395 | 1 Approach and Experimental Settings result method
396 | 1 Approach and Experimental Setup result method
397 | 1 Approach and Experiments result method
398 | 1 Architecture and experiments result method
399 | 1 Experimental approach method
400 | 1 Our Approach and Experiments result method
401 | 2 Addressing Evaluation Metrics method
402 | 2 Coreference Evaluation Metrics method
403 | 2 Current Evaluation Schemes background
404 | 2 Epistemic evaluation markers method
405 | 2 Epistemic Evaluation Taxonomy method
406 | 2 Evaluation Technique result
407 | 2 LFG f-structure in MT evaluation method
408 | 1 A cohesiveness evaluation model method
409 | 1 A Method for Automatic Evaluation of Sentence Summarization method
410 | 1 A formal account of ranking methods and their evaluation method
411 | 2 Utilizing Nuggets in Evaluations method
412 | 1 A framework for focused evaluation method
413 | 1 A New Distortion Evaluation Metric method
414 | 1 A New Evaluation Framework method
415 | 1 A New Evaluation Framework : Image Tagging as Lexical Substitution method
416 | 1 A New Proposal for Edit-Based Text Segmentation Evaluation method
417 | 1 A Problem of the Type of Discourse Referents concering Dialogue Evaluation method
418 | 1 A Unified Framework for Automatic Evaluation method
419 | 1 Accurate and Conclusive Metric Evaluations method
420 | 1 Adaptable evaluation system method
421 | 1 ADIOS : a psycholinguistic evaluation method
422 | 1 AILE: Automatic Evaluation Metric Independent of Sentence method
423 | 1 Algorithms and Evaluation result method
424 | 1 Alignment and evaluation of bilingual wordnets result method
425 | 1 Alternatives to Correlation-based Meta-evaluation method
426 | 1 An Evaluation Plane for NLP method
427 | 1 An Unsupervised Meta-evaluation Method method
428 | 1 Application to Manual Text Simplification Evaluation discussion
429 | 1 Application: Voter Evaluations of an Ideal Candidate discussion
430 | 1 Applications of the Evaluation Framework discussion
431 | 1 Approach and Evaluation Methodology result method
432 | 1 Applying V&V to NLP  EIs it Evaluation? background
433 | 1 Applying Multi-Reference Evaluation for ASR method
434 | 1 Balanced dataset for evaluation of Japanese lexical simplification method
435 | 1 "Brexit Prediction, Analysis and Evaluation" result method
436 | 1 CD ER : A New Evaluation Measure method
437 | 1 Cognitively Driven Evaluation Measures method
438 | 1 Corpora for Design and Evaluation method
439 | 1 Method and Evaluation result method
440 | 2 Analysis algorithm method
441 | 2 Analysis of Noun Senses method
442 | 2 Analysis of Scientific Literature method
443 | 6 Method and Results result method
444 | 4 Methods and Results result method
445 | 3 Methods and results result method
446 | 3 Models and Results result method
447 | 2 Methodology & Results result method
448 | 2 Methodology and Results result method
449 | 2 Preliminary Results and Future Development result discussion
450 | 1 Abstractive Summarization Method and Results method
451 | 1 "Data, Methodology, and Results" result method
452 | 1 Extractive Summarization Methods and Results method
453 | 1 Method and Result result method
454 | 1 Methodologies used and Results Obtained result method
455 | 1 Methodology and results result method
456 | 1 Approach & Related Work method background
457 | 1 Evaluating Definition Questions result
458 | 1 Evaluating our new definition result
459 | 1 Predicting Future Operations method
460 | 1 Questions for the Future method
461 | 4 Current Work method
462 | 3 Completed Work method
463 | 2 Arabic Morphology in Recent Work background
464 | 2 Current work method
465 | 2 Ongoing Work method
466 | 2 Past Work background
467 | 1 A Brief Review of Work To Date background
468 | 1 A BRIEF REVIEW background
469 | 5 Linguistic Motivation background
470 | 3 Linguistic motivation background
471 | 2 Goal and Motivation background
472 | 1 Aims and Motivation background
473 | 1 Basic Motivation: Co-occurence graphs background
474 | 1 Empirical Motivation background
475 | 1 Historical Motivation background
476 | 1 Lexicographical motivation background
477 | 1 Linguistic motivations background
478 | 1 Objectives and Motivation background
479 | 1 Psycholinguistic Motivation background
480 | 1 Task Description and Motivation background
481 | 1 Theoretical Motivations background
482 | 1 TSD Motivation background
483 | 17 Credits acknowledgement
484 | 17 Qualitative Analysis result
485 | 11 Empirical Analysis result
486 | 8 Analyses result
487 | 5 Quantitative Analysis result
488 | 4 Complexity analysis result
489 | 4 The Analysis result
490 | 3 Manual Analysis result
491 | 3 Statistical Analysis result
492 | 2 Annotation analysis result
493 | 2 Annotation Analysis result
494 | 2 Comparative analysis result
495 | 2 Comparative analysis of BCs and TMs result
496 | 2 Complexity Analysis result
497 | 2 Differential Language Analysis result
498 | 2 Exploratory Analysis result
499 | 2 Extending the Analysis result
500 | 2 General Error Analysis result
501 | 2 Detailed Analysis result
502 | 2 Manual Error Analysis result
503 | 1 Empirical analysis result
504 | 1 Empirical Analysis of Lin98 and Vector Quality Measure result
505 | 1 Empirical calibration analysis result
506 | 1 Evaluating the accuracy of analysis result
507 | 1 Combination Analysis result
508 | 1 Automatic Error Analysis result
509 | 1 A Systematic Error Analysis result
510 | 1 Adapting error analysis to Chinese result
511 | 1 Analysing parser errors result
512 | 1 Anaphora resolution error analysis: background background
513 | 1 Comparative error analysis result
514 | 1 Comparative Quantitative Analysis result
515 | 1 Competing analyses result
516 | 1 Component Analysis result
517 | 1 Componential analysis result
518 | 1 Computational Analysis result
519 | 1 Coreference Error Analysis result
520 | 1 Coreference Subtask Analysis result
521 | 1 Correlation Analysis result
522 | 1 Cost Analysis result
523 | 1 Cost/Benefit Analysis result
524 | 1 Coverage Analysis result
525 | 1 Coverage of Morphological Analysis for Arabic result
526 | 1 Cross-parser analysis result
527 | 1 Disagreement Analysis result
528 | 1 Detailed analysis and system combination result
529 | 1 Discriminative LMs for Error Analysis result
530 | 1 Event-based Word Error Analysis result
531 | 1 Fine-grained Word Order Error Analysis result
532 | 1 High Level View of Error Analysis result
533 | 1 Human Analysis of Translation Errors in Crowdsourcing Translation result
534 | 1 Human Correlation Analysis result
535 | 1 Humor-Prosody Analysis result
536 | 1 Hybrid Translation Analysis result
537 | 1 Hyperparameter Analysis result
538 | 1 ILP-based Analysis result
539 | 1 Impact: effect analyses and user experience studies result
540 | 1 Implementing the Analysis of Predicative Verb Forms result
541 | 1 Non-official error analysis result
542 | 1 Observation analysis - addressee detection in meetings result
543 | 1 Observations and Analysis result
544 | 1 TempEval Error Analysis result
545 | 1 The process of annotation and error analysis result
546 | 1 Time complexity analysis result
547 | 1 Token-level Analysis result
548 | 1 Translation Analysis result
549 | 1 Translation consistency analysis result
550 | 1 Translation Error Analysis result
551 | 1 Tree-based analyses result
552 | 1 Towards An Automatic Error Analysis discussion
553 | 2 Complexity Analysis result
554 | 2 Detailed Analysis result
555 | 2 Exploratory Analysis result
556 | 2 Preliminary Analysis result
557 | 2 Statistical analyses result
558 | 2 Tagging confusion analysis result
559 | 1 A Comparative Analysis result
560 | 1 Analysing language changes with SPC result
561 | 1 Analysing News Story Structure result
562 | 1 Analysing n-gram frequencies result
563 | 1 Analysing temporal relation sets result
564 | 1 Analysing the facets of the calque effect result
565 | 1 Analysing the Foreebank result
566 | 1 Annotation Agreement analysis result
567 | 1 AESOP: Automatic Affect State Analysis result
568 | 1 Aggregate Analysis result
569 | 1 Ambiguation Analysis result
570 | 1 An Analysis Based on Developmental Sequences result
571 | 1 An Analysis of Ineffective Computer- Human Interaction result
572 | 1 An Analysis of Inter-Annotator Agreement in a Hierarchical Annota- result
573 | 1 An Analysis of the Metrics result
574 | 1 Basic Language Analyses result
575 | 1 Basic Strategy for Predicate-Argument Structure Analysis and Zero-Anaphora Resolution result
576 | 1 Behavior Analyses result
577 | 1 Behavioral Analysis result
578 | 1 Benchmark for German Sentiment Analysis result
579 | 1 Preliminary Analysis about Linefeed Points result
580 | 1 Preliminary analysis and distinctions: DUC 2001 result
581 | 1 Preliminary qualitative analysis: a tex­ tual sample result
582 | 1 Qualitative analysis result
583 | 1 Qualitative analysis and translation assessments result
584 | 1 Qualitative Analysis of Lexicons result
585 | 1 Qualitative Analysis of TimeBank result
586 | 1 Qualitatively Content Analysis result
587 | 1 Quantative Annotation Analysis result
588 | 1 Quantitative analysis result
589 | 1 Quantitative Analysis of TimeBank result
590 | 2 Clustering with PoS and Background Knowledge method
591 | 2 Syntactic and lexical semantic background method
592 | 2 Terminology and Background method
593 | 1 Analytical background method
594 | 1 Classifying Background Information method
595 | 1 Defining the Background Knowledge method
596 | 1 Dictation background method
597 | 1 Duluth38 Background method
598 | 1 Encoding background knowledge into document classification method
599 | 1 Predicting difficult words given reader’s background method
600 | 1 Some background method
601 | 1 Student background method
602 | 5 Objective background
603 | 5 Objectives background
604 | 1 Aims and Objectives background
605 | 1 Objective and research questions background
606 | 1 Objectives and User Requirements background
607 | 1 Research Objective background
608 | 1 Research objectives background
609 | 2 Aims background
610 | 1 Aims and scope background
611 | 1 Aims and Situation background
612 | 1 Aims of Monroe Project background
613 | 1 Aims of the investigation background
614 | 1 Being Sensitive to the User’s Goals method
615 | 1 Complex User Goals method
616 | 1 Course goals method
617 | 1 Educational Goals of the Exercise method
618 | 1 Functional Goals for JANUS method
619 | 1 Goals and Hypotheses method
620 | 1 Goals and Scope of the Paper method
621 | 1 Goals and System Architecture method
622 | 1 Need for communicative goals method
623 | 1 Patterns in communicative goals method
624 | 1 Sets of User Goals method
625 | 6 Implications discussion
626 | 6 Inter-annotator Agreement result
627 | 6 Issues discussion
628 | 5 Benchmarking result
629 | 5 Improvements result
630 | 5 Lessons Learned discussion
631 | 4 Accuracy result
632 | 4 Simulations result
633 | 4 Statistics result
634 | 4 User Simulations result
635 | 3 Agreement result
636 | 3 Impacts on Reference Resolution result
637 | 3 Relevant literature background
638 | 3 Research Issues discussion
639 | 3 Scalability result
640 | 3 State-of-the-art background
641 | 3 Survey background
642 | 3 Usability discussion
643 | 3 Wiktionary background
644 | 3 WordNet background
645 | 1 BabelDomains: Statistics and Release background
646 | 1 Basic Statistics of the BCCWJ-DepPara background
647 | 1 Computing the Statistics-based Semantic Compatibility method
648 | 1 Descriptive statistics for the lexical markers used in feedback method
649 | 1 Engineering statistics applied to NLP background
650 | 1 Kappa Statistics for Individual and Common user information method
651 | 1 Lexicon statistics method
652 | 1 Pruning via Usage Statistics method
653 | 1 Statistics and Technical Details method
654 | 1 Statistics does not Refute UG method
655 | 1 Statistics Needs UG method
656 | 1 Statistics of Grammatical Templates method
657 | 1 Statistics of the Treebank method
658 | 1 The need to go beyond statistics method
659 | 1 Treebank statistics method
660 | 9 Inter-Annotator Agreement result
661 | 2 Annotator Agreement result
662 | 2 Inter-annotator agreement result
663 | 2 Inter-annotator agreement loss result
664 | 2 Inter-annotator agreements result
665 | 1 Analyzing Inter-annotator Agreement result
666 | 1 Annotator Accuracy and Bias result
667 | 1 Annotator agreement result
668 | 1 Annotator Reliability result
669 | 1 Impact of Number of Annotators result
670 | 1 Inter Annotator Agreement result
671 | 1 Inter-Annotator Agreement (IAA) result
672 | 1 Inter-annotator agreement across domains result
673 | 1 Inter-Annotator Agreement and Parts of Speech result
674 | 1 Inter-Annotator Agreement Tests result
675 | 1 Intra-Chunk Dependency Annotator result
676 | 1 Intra-Chunk Dependency Annotator result
677 | --------------------------------------------------------------------------------