├── PKUSUMSUM.jar
├── README.md
├── README_V1.2.docx
├── code
│   ├── ClusterCMRW.java
│   ├── Coverage.java
│   ├── ILP.java
│   ├── Lead.java
│   ├── LexPageRank.java
│   ├── MEAD.java
│   ├── Main.java
│   ├── ManifoldRank.java
│   ├── Run.java
│   ├── Stemmer.java
│   ├── Submodular.java
│   ├── TextRank.java
│   ├── Tokenizer.java
│   ├── doc.java
│   └── stopword_Eng
└── lib
    ├── ansj_seg-5.0.2-all-in-one.jar
    ├── lpsolve55j.jar
    ├── slf4j-nop-1.7.21.jar
    ├── stanford-ner.jar
    └── stopword_Eng

/PKUSUMSUM.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/PKUSUMSUM.jar
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# README

## Introduction
PKUSUMSUM (PKU’s SUMmary of SUMmarization methods) is an integrated toolkit for automatic document summarization. It supports single-document, multi-document and topic-focused multi-document summarization, and implements a variety of summarization methods.

Users can easily use the toolkit to produce summaries for documents or document sets, and can implement their own summarization methods on top of the platform.

Main features of PKUSUMSUM:
* It integrates a variety of stable summarization methods with good performance.
* It supports three typical summarization tasks: single-document, multi-document and topic-focused multi-document summarization.
* It supports Western languages (e.g., English) and Chinese.
* It integrates an English tokenizer and stemmer, and a Chinese word segmentation tool.
* Being written in Java, it runs on different operating systems, such as Windows, Linux and macOS.
* It is open source and modular, so users can conveniently add new methods and modules to the toolkit.
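
The English stemmer listed above is `code/Stemmer.java`, a Java implementation of Porter's stemming algorithm. Most of its suffix rules only fire when the remaining stem has a large enough *measure* m, the number of vowel-consonant pairs in the form [C](VC)^m[V] (see the comment on `m()` in that file). As an illustration only, the sketch below re-implements just that measure for a whole lower-case word; the class `PorterMeasure` and its methods are hypothetical and not part of PKUSUMSUM.

```java
/**
 * Hypothetical helper (not part of PKUSUMSUM): computes Porter's measure m
 * of a lower-case word, using the same consonant test as Stemmer.cons().
 */
public final class PorterMeasure {

    /** True if w.charAt(i) is a consonant. 'y' is a consonant at position 0
     *  or right after a vowel, and a vowel right after a consonant. */
    static boolean cons(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !cons(w, i - 1);
            default:
                return true;
        }
    }

    /** m in [C](VC)^m[V]: count the places where a vowel is followed by a consonant. */
    static int measure(String w) {
        int m = 0;
        for (int i = 1; i < w.length(); i++) {
            if (!cons(w, i - 1) && cons(w, i)) {
                m++;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        String[] words = {"tree", "trouble", "oats", "ivy", "troubles", "private"};
        for (String w : words) {
            System.out.println(w + " -> m = " + measure(w));
        }
    }
}
```

`Stemmer.m()` computes the same count, but only over `b[0..j]`, the stem left after `ends()` has matched a candidate suffix. For the sample words, `main` reports m = 0 for "tree", 1 for "trouble", "oats" and "ivy", and 2 for "troubles" and "private".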

The PKUSUMSUM package includes the JAR file (PKUSUMSUM.jar), the source code in “/code” and the referenced libraries in “/lib”.

The following table shows which summarization methods are available for each summarization task:

| Method | Single-document summarization | Multi-document summarization | Topic-based multi-document summarization |
|:-----------:|:--------:|:--------:|:-------:|
| Coverage | - | Yes | Yes |
| Lead | Yes | Yes | Yes |
| Centroid [1] | Yes | Yes | Yes |
| TextRank [2] | Yes | Yes | - |
| LexPageRank [3] | Yes | Yes | - |
| ILP [4] | Yes | Yes | - |
| Submodular1 [5] | Yes | Yes | - |
| Submodular2 [6] | Yes | Yes | - |
| ClusterCMRW [7] | - | Yes | - |
| ManifoldRank [8] | - | - | Yes |

## Notice
* We use **lp_solve for Java** to solve the ILP model. If you choose the ILP method, configure lp_solve first:
    * Copy the lp_solve dynamic libraries from the archives lp_solve_5.5_dev.(zip or tar.gz) and lp_solve_5.5_exe.(zip or tar.gz) to a standard library directory on the target platform. On Windows, this is typically \WINDOWS or \WINDOWS\SYSTEM32; on Linux, it is typically /usr/local/lib.
    * Unzip the Java wrapper distribution file to a new directory. On Windows, copy the wrapper stub library **lpsolve55j.dll** to the directory that already contains **lpsolve55.dll**. On Linux, copy the wrapper stub library **liblpsolve55j.so** to the directory that already contains **liblpsolve55.so**, then run **ldconfig** to add the library to the shared library cache.
    * More details are available at http://lpsolve.sourceforge.net/5.5/.
* JRE 1.8 or later is required.
* The input documents must be encoded in UTF-8.
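
Before choosing the ILP method, it can help to confirm that the JVM can actually load the lp_solve Java wrapper. The sketch below is a hypothetical diagnostic, not part of PKUSUMSUM: it uses only standard `System` methods to print the platform-specific file name the JVM looks for, the directories it searches, and whether loading succeeds. It assumes the wrapper's base library name is `lpsolve55j`, matching the **lpsolve55j.dll** / **liblpsolve55j.so** file names in the notice above.

```java
import java.io.File;

/** Hypothetical diagnostic (not part of PKUSUMSUM): can the JVM load a native library? */
public final class NativeLibCheck {

    /** Tries System.loadLibrary(baseName) and reports failure instead of throwing. */
    static boolean isLoadable(String baseName) {
        try {
            System.loadLibrary(baseName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed base name of the lp_solve Java wrapper; pass another name as args[0] to override.
        String base = args.length > 0 ? args[0] : "lpsolve55j";
        // File name searched for on this OS, e.g. liblpsolve55j.so on Linux or lpsolve55j.dll on Windows.
        System.out.println("Looking for: " + System.mapLibraryName(base));
        // Directories that System.loadLibrary searches.
        for (String dir : System.getProperty("java.library.path").split(File.pathSeparator)) {
            System.out.println("  in: " + dir);
        }
        System.out.println(isLoadable(base) ? "Loaded OK" : "Not loadable");
    }
}
```

If loading fails, check that the directory holding the wrapper appears in the printed `java.library.path`; the standard JVM option `-Djava.library.path=...` can be used to add it.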

## Usage
Open a terminal in the PKUSUMSUM directory and type:

> java -jar PKUSUMSUM.jar

### Parameters
Several parameters must be set when using the toolkit. Parameters in square brackets ("[]") are optional and have default values.

| Required parameter | Description |
|:---------------------|:--------|
| -T | Specifies the task: 1 – single-document summarization; 2 – multi-document summarization; 3 – topic-based multi-document summarization. |
| -topic | Specifies the path of the topic file (only for the topic-based multi-document summarization task). |
| -input | Specifies the path of the input document or document set. **For single-document summarization**, it is the path of the input document to be summarized, including the file name. **For multi-document or topic-based multi-document summarization**, it is the directory containing the input documents to be summarized. |
| -output | Specifies the path of the output file that will contain the final summary. |
| -L | Specifies the language of the input document(s): 1 – Chinese, 2 – English, 3 – other Western languages. |
| -n | Specifies the expected number of words in the final summary. |
| -m | Specifies the summarization method. **For single-document summarization**: 1 – Lead, 2 – Centroid, 3 – ILP, 4 – LexPageRank, 5 – TextRank, 6 – Submodular. **For multi-document summarization**: 0 – Coverage, 1 – Lead, 2 – Centroid, 3 – ILP, 4 – LexPageRank, 5 – TextRank, 6 – Submodular, 7 – ClusterCMRW. **For topic-based multi-document summarization**: 0 – Coverage, 1 – Lead, 2 – Centroid, 8 – ManifoldRank. |
| -stop | Specifies whether to remove stop words. To remove them, provide a stop word list and give the path of the stop word file. An English stop word list is provided in “/lib/stopword_Eng”; enter “y” to use it. If you don’t need to remove stop words, enter “n”. |

| Optional parameter | Description |
|:---------------------|:--------|
| [-s] | Specifies whether to stem words (English only): 1 – stem, 2 – no stem. The default value is 1. |
| [-R] | Specifies the **redundancy removal method** used when selecting summary sentences. The ILP and Submodular methods don’t need extra redundancy removal. The default value is 3 for ManifoldRank and 1 for the other methods that need redundancy removal. **1 – MMR-based method**; **2 – Threshold-based method**: if the maximum similarity between an unselected sentence and the already selected sentences exceeds a predefined threshold, the unselected sentence is removed; **3 – Penalty imposing method**: after a summary sentence is selected, the score of each unselected sentence is reduced by the product of a predefined penalty ratio and its similarity to the selected summary sentence. |
| [-p] | Internal parameter of the redundancy removal methods; the default value is 0.7. **For the MMR and penalty imposing methods**, it is the penalty ratio. **For the threshold-based method**, it is the threshold value. |
| [-beta] | Scaling factor of sentence length used when selecting sentences; its range is [0, 1]. In several summarization methods, long sentences tend to get higher scores than short ones. Given the length limit of the summary, this factor normalizes the score of each sentence by its length. See README_V1.2.docx for details. The default value is 0.1. |
| **For LexPageRank** | |
| [-link] | Similarity threshold for linking two sentences: if the similarity of two sentences exceeds the threshold, an edge is added between them. Its range is [0, 1] and the default value is 0.1. |
| **For ClusterCMRW** | |
| [-Alpha] | Ratio controlling the expected number of clusters in the document set. Its range is [0, 1] and the default value is 0.1. |
| [-Lamda] | Combination weight controlling the relative contributions of the source cluster and the destination cluster. Its range is [0, 1] and the default value is 0.8. |
| **For Submodular** | |
| [-sub] | Type of submodular method; the default value is 2. 1 – the method of Li et al. (2012) [5]; 2 – a modified version of the method of Lin and Bilmes (2010) [6]. |
| [-A] | Threshold coefficient. Its range is [0, 1] and the default value is 0.5. |
| [-lam] | Trade-off coefficient. Its range is [0, 1]; the default value is 0.15 for multi-document summarization and 0.5 for single-document summarization. |

### Example

> java -jar PKUSUMSUM.jar -T 1 -input ./article.txt -output ./summary.txt -L 1 -n 100 -m 2 -stop n

## License
PKUSUMSUM is released under the GNU GPL license.

## Contact us
If you have any questions or suggestions about using PKUSUMSUM, feel free to contact us.
Contact person: Jianmin Zhang
Contact email: zhangjianmin2015@pku.edu.cn

## References
[1]. Radev, Dragomir R., Hongyan Jing, Małgorzata Styś, and Daniel Tam. 2004. Centroid-based summarization of multiple documents. Information Processing & Management, 40(6): 919–938.
[2]. Mihalcea, Rada, and Paul Tarau. 2004. TextRank: Bringing order into texts. Association for Computational Linguistics.
[3]. Erkan, Günes, and Dragomir R. Radev. 2004. LexPageRank: Prestige in multi-document text summarization. EMNLP, Vol. 4.
[4]. Gillick, Dan, and Benoit Favre. 2009. A scalable global model for summarization. Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, Association for Computational Linguistics.
[5]. Li, Jingxuan, Lei Li, and Tao Li. 2012. Multi-document summarization via submodularity. Applied Intelligence, 37(3): 420–430.
[6]. Lin, Hui, and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics.
[7]. Wan, Xiaojun, and Jianwu Yang. 2008. Multi-document summarization using cluster-based link analysis. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM.
[8]. Wan, Xiaojun, Jianwu Yang, and Jianguo Xiao. 2007. Manifold-ranking based topic-focused multi-document summarization. IJCAI, Vol. 7.


--------------------------------------------------------------------------------
/README_V1.2.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/README_V1.2.docx
--------------------------------------------------------------------------------
/code/ClusterCMRW.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/ClusterCMRW.java
--------------------------------------------------------------------------------
/code/Coverage.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Coverage.java
--------------------------------------------------------------------------------
/code/ILP.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/ILP.java
--------------------------------------------------------------------------------
/code/Lead.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Lead.java
--------------------------------------------------------------------------------
/code/LexPageRank.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/LexPageRank.java
--------------------------------------------------------------------------------
/code/MEAD.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/MEAD.java
--------------------------------------------------------------------------------
/code/Main.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Main.java
--------------------------------------------------------------------------------
/code/ManifoldRank.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/ManifoldRank.java
--------------------------------------------------------------------------------
/code/Run.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Run.java
--------------------------------------------------------------------------------
/code/Stemmer.java:
--------------------------------------------------------------------------------
package code;

public class Stemmer {
    private char[] b;
    private int i,     /* offset into b */
                i_end, /* offset to end of stemmed word */
                j, k;
    private static final int INC = 50;
    /* unit of size whereby b is increased */

    public Stemmer() {
        b = new char[INC];
        i = 0;
        i_end = 0;
    }

    /**
     * Add a character to the word being stemmed. When you are finished
     * adding characters, you can call stem(void) to stem the word.
     */

    public void add(char ch) {
        if (i == b.length) {
            char[] new_b = new char[i + INC];
            for (int c = 0; c < i; c++) new_b[c] = b[c];
            b = new_b;
        }
        b[i++] = ch;
    }


    /** Adds wLen characters to the word being stemmed contained in a portion
     * of a char[] array. This is like repeated calls of add(char ch), but
     * faster.
     */

    public void add(char[] w, int wLen) {
        if (i + wLen >= b.length) {
            char[] new_b = new char[i + wLen + INC];
            for (int c = 0; c < i; c++) new_b[c] = b[c];
            b = new_b;
        }
        for (int c = 0; c < wLen; c++) b[i++] = w[c];
    }

    /**
     * After a word has been stemmed, it can be retrieved by toString(),
     * or a reference to the internal buffer can be retrieved by getResultBuffer
     * and getResultLength (which is generally more efficient.)
     */
    public String toString() { return new String(b, 0, i_end); }

    /**
     * Returns the length of the word resulting from the stemming process.
     */
    public int getResultLength() { return i_end; }

    /**
     * Returns a reference to a character buffer containing the results of
     * the stemming process. You also need to consult getResultLength()
     * to determine the length of the result.
     */
    public char[] getResultBuffer() { return b; }

    /* cons(i) is true <=> b[i] is a consonant. */

    private final boolean cons(int i) {
        switch (b[i]) {
            case 'a': case 'e': case 'i': case 'o': case 'u': return false;
            case 'y': return (i == 0) ? true : !cons(i - 1);
            default: return true;
        }
    }

    /* m() measures the number of consonant sequences between 0 and j. if c is
       a consonant sequence and v a vowel sequence, and <..> indicates arbitrary
       presence,

          <c><v>       gives 0
          <c>vc<v>     gives 1
          <c>vcvc<v>   gives 2
          <c>vcvcvc<v> gives 3
          ....
    */

    private final int m() {
        int n = 0;
        int i = 0;
        while (true) {
            if (i > j) return n;
            if (!cons(i)) break;
            i++;
        }
        i++;
        while (true) {
            while (true) {
                if (i > j) return n;
                if (cons(i)) break;
                i++;
            }
            i++;
            n++;
            while (true) {
                if (i > j) return n;
                if (!cons(i)) break;
                i++;
            }
            i++;
        }
    }

    /* vowelinstem() is true <=> 0,...j contains a vowel */

    private final boolean vowelinstem() {
        int i;
        for (i = 0; i <= j; i++) if (!cons(i)) return true;
        return false;
    }

    /* doublec(j) is true <=> j,(j-1) contain a double consonant. */

    private final boolean doublec(int j) {
        if (j < 1) return false;
        if (b[j] != b[j - 1]) return false;
        return cons(j);
    }

    /* cvc(i) is true <=> i-2,i-1,i has the form consonant - vowel - consonant
       and also if the second c is not w,x or y. this is used when trying to
       restore an e at the end of a short word. e.g.

          cav(e), lov(e), hop(e), crim(e), but
          snow, box, tray.

    */

    private final boolean cvc(int i) {
        if (i < 2 || !cons(i) || cons(i - 1) || !cons(i - 2)) return false;
        {
            int ch = b[i];
            if (ch == 'w' || ch == 'x' || ch == 'y') return false;
        }
        return true;
    }

    private final boolean ends(String s) {
        int l = s.length();
        int o = k - l + 1;
        if (o < 0) return false;
        for (int i = 0; i < l; i++) if (b[o + i] != s.charAt(i)) return false;
        j = k - l;
        return true;
    }

    /* setto(s) sets (j+1),...k to the characters in the string s, readjusting
       k. */

    private final void setto(String s) {
        int l = s.length();
        int o = j + 1;
        for (int i = 0; i < l; i++) b[o + i] = s.charAt(i);
        k = j + l;
    }

    /* r(s) is used further down. */

    private final void r(String s) { if (m() > 0) setto(s); }

    /* step1() gets rid of plurals and -ed or -ing. e.g.

          caresses  ->  caress
          ponies    ->  poni
          ties      ->  ti
          caress    ->  caress
          cats      ->  cat

          feed      ->  feed
          agreed    ->  agree
          disabled  ->  disable

          matting   ->  mat
          mating    ->  mate
          meeting   ->  meet
          milling   ->  mill
          messing   ->  mess

          meetings  ->  meet

    */

    private final void step1() {
        if (b[k] == 's') {
            if (ends("sses")) k -= 2;
            else if (ends("ies")) setto("i");
            else if (b[k - 1] != 's') k--;
        }
        if (ends("eed")) {
            if (m() > 0) k--;
        } else if ((ends("ed") || ends("ing")) && vowelinstem()) {
            k = j;
            if (ends("at")) setto("ate");
            else if (ends("bl")) setto("ble");
            else if (ends("iz")) setto("ize");
            else if (doublec(k)) {
                k--;
                {
                    int ch = b[k];
                    if (ch == 'l' || ch == 's' || ch == 'z') k++;
                }
            } else if (m() == 1 && cvc(k)) setto("e");
        }
    }

    /* step2() turns terminal y to i when there is another vowel in the stem.
    */

    private final void step2() { if (ends("y") && vowelinstem()) b[k] = 'i'; }

    /* step3() maps double suffixes to single ones. so -ization ( = -ize plus
       -ation) maps to -ize etc. note that the string before the suffix must give
       m() > 0. */

    private final void step3() {
        if (k == 0) return; /* For Bug 1 */
        switch (b[k - 1]) {
            case 'a': if (ends("ational")) { r("ate"); break; }
                      if (ends("tional")) { r("tion"); break; }
                      break;
            case 'c': if (ends("enci")) { r("ence"); break; }
                      if (ends("anci")) { r("ance"); break; }
                      break;
            case 'e': if (ends("izer")) { r("ize"); break; }
                      break;
            case 'l': if (ends("bli")) { r("ble"); break; }
                      if (ends("alli")) { r("al"); break; }
                      if (ends("entli")) { r("ent"); break; }
                      if (ends("eli")) { r("e"); break; }
                      if (ends("ousli")) { r("ous"); break; }
                      break;
            case 'o': if (ends("ization")) { r("ize"); break; }
                      if (ends("ation")) { r("ate"); break; }
                      if (ends("ator")) { r("ate"); break; }
                      break;
            case 's': if (ends("alism")) { r("al"); break; }
                      if (ends("iveness")) { r("ive"); break; }
                      if (ends("fulness")) { r("ful"); break; }
                      if (ends("ousness")) { r("ous"); break; }
                      break;
            case 't': if (ends("aliti")) { r("al"); break; }
                      if (ends("iviti")) { r("ive"); break; }
                      if (ends("biliti")) { r("ble"); break; }
                      break;
            case 'g': if (ends("logi")) { r("log"); break; }
        }
    }

    /* step4() deals with -ic-, -full, -ness etc. similar strategy to step3.
    */

    private final void step4() {
        switch (b[k]) {
            case 'e': if (ends("icate")) { r("ic"); break; }
                      if (ends("ative")) { r(""); break; }
                      if (ends("alize")) { r("al"); break; }
                      break;
            case 'i': if (ends("iciti")) { r("ic"); break; }
                      break;
            case 'l': if (ends("ical")) { r("ic"); break; }
                      if (ends("ful")) { r(""); break; }
                      break;
            case 's': if (ends("ness")) { r(""); break; }
                      break;
        }
    }

    /* step5() takes off -ant, -ence etc., in context <c>vcvc<v>. */

    private final void step5() {
        if (k == 0) return; /* for Bug 1 */
        switch (b[k - 1]) {
            case 'a': if (ends("al")) break; return;
            case 'c': if (ends("ance")) break;
                      if (ends("ence")) break; return;
            case 'e': if (ends("er")) break; return;
            case 'i': if (ends("ic")) break; return;
            case 'l': if (ends("able")) break;
                      if (ends("ible")) break; return;
            case 'n': if (ends("ant")) break;
                      if (ends("ement")) break;
                      if (ends("ment")) break;
                      /* element etc. not stripped before the m */
                      if (ends("ent")) break; return;
            case 'o': if (ends("ion") && j >= 0 && (b[j] == 's' || b[j] == 't')) break;
                      /* j >= 0 fixes Bug 2 */
                      if (ends("ou")) break; return;
                      /* takes care of -ous */
            case 's': if (ends("ism")) break; return;
            case 't': if (ends("ate")) break;
                      if (ends("iti")) break; return;
            case 'u': if (ends("ous")) break; return;
            case 'v': if (ends("ive")) break; return;
            case 'z': if (ends("ize")) break; return;
            default: return;
        }
        if (m() > 1) k = j;
    }

    /* step6() removes a final -e if m() > 1.
    */

    private final void step6() {
        j = k;
        if (b[k] == 'e') {
            int a = m();
            if (a > 1 || a == 1 && !cvc(k - 1)) k--;
        }
        if (b[k] == 'l' && doublec(k) && m() > 1) k--;
    }

    /** Stem the word placed into the Stemmer buffer through calls to add().
     * The method returns nothing; retrieve the result with
     * getResultLength()/getResultBuffer() or toString().
     */
    public void stem() {
        k = i - 1;
        if (k > 1) { step1(); step2(); step3(); step4(); step5(); step6(); }
        i_end = k + 1;
        i = 0;
    }
}

--------------------------------------------------------------------------------
/code/Tokenizer.java:
--------------------------------------------------------------------------------
package code;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.ansj.domain.Result;
import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;

public class Tokenizer {
    public ArrayList<String> passage = new ArrayList<String>();
    public ArrayList<Integer> senLen = new ArrayList<Integer>();
    public ArrayList<String> sentence = new ArrayList<String>();
    public ArrayList<ArrayList<String>> word = new ArrayList<ArrayList<String>>();
    public ArrayList<ArrayList<String>> stemmerWord = new ArrayList<ArrayList<String>>();
    HashMap<String, Integer> stopword = new HashMap<String, Integer>();

    public void readStopwords(String stopwordPath) throws IOException {
        File tmpfile = new File(stopwordPath);
        if (!tmpfile.exists()) {
            System.out.println("stopwords file does not exist!");
            System.exit(0);
        }
        FileReader inFReader = new FileReader(stopwordPath);
        BufferedReader inBReader = new BufferedReader(inFReader);
        String tmpWord;
        int i = 0;
        while ((tmpWord = inBReader.readLine()) != null) {
            i++;
            stopword.put(tmpWord, i);
        }
        inBReader.close();
    }

    public void readStopwordsEng() throws IOException {
        // Load the English stop word list bundled on the classpath as "stopword_Eng".
        InputStream stop = Tokenizer.class.getClassLoader().getResourceAsStream("stopword_Eng");
        BufferedReader inBReader = new BufferedReader(new InputStreamReader(stop));
        String tmpWord;
        int i = 0;
        while ((tmpWord = inBReader.readLine()) != null) {
            i++;
            stopword.put(tmpWord, i);
        }
        inBReader.close();
    }

    // True if the token starts with an ASCII letter.
    public boolean ifWordsEng(String tmpWord) {
        if (tmpWord.charAt(0) >= 'A' && tmpWord.charAt(0) <= 'Z') return true;
        if (tmpWord.charAt(0) >= 'a' && tmpWord.charAt(0) <= 'z') return true;
        return false;
    }

    public boolean ifStopwords(String tmpWord) {
        if (stopword.get(tmpWord.toLowerCase()) != null) return true;
        return false;
    }

    // Stem every word of every sentence with the Porter stemmer (Stemmer.java).
    public void stemmerWord() {
        int numOfWord = word.size();
        for (int i = 0; i < numOfWord; ++i) {
            ArrayList<String> stemmerW = new ArrayList<String>();
            for (int j = 0; j < word.get(i).size(); ++j) {
                Stemmer stemmer = new Stemmer();
                int letterNumOfWord = word.get(i).get(j).length();
                for (int k = 0; k < letterNumOfWord; ++k) {
                    stemmer.add(word.get(i).get(j).charAt(k));
                }
                stemmer.stem();
                String tmpW = stemmer.toString();
                stemmerW.add(tmpW);
            }
            stemmerWord.add(stemmerW);
        }
    }

    public ArrayList<String> tokenizeEng(String inFile, String stopwordPath) throws IOException {
        // Decode the input as UTF-8, as the README requires, rather than with the platform default charset.
        PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(
                new InputStreamReader(new FileInputStream(inFile), "UTF-8"),
                new CoreLabelTokenFactory(), "");
        int len = 0;
        int wlen = 0;
        if (stopwordPath.equals("y"))
            readStopwordsEng();
        else if (!stopwordPath.equals("n"))
            readStopwords(stopwordPath);
        String token, tmpSen;
        tmpSen = new String();
        boolean ifend = false;
        while (ptbt.hasNext()) {
            CoreLabel label = ptbt.next();
            token = label.toString();

            if (!ifend) {
                if (token.equals(".") || token.equals("?") || token.equals("!")) {
                    ifend = true;
                }
                // remove some invalid symbols
                if (token.equals("-LRB-") || token.equals("-RRB-") || token.equals("-LCB-") || token.equals("-RCB-") || token.equals("\""))
                    continue;
                if (token.equals("'") || token.equals("`") || token.equals("''") || token.equals("``") || token.equals("_") || token.equals("--") || token.equals("-")) {
                    continue;
                }
                if (token.equals("'s") || token.equals(".") || token.equals("?") || token.equals("!") || token.equals(",") || token.equals("'re") || token.equals("'ve"))
                    tmpSen += token;
                else
                    tmpSen += " " + token;

                if (ifWordsEng(token))
                    wlen++;
                if (!token.equals("'s"))
                    len++;
                if (ifWordsEng(token) && !ifStopwords(token))
                    sentence.add(token.toLowerCase());
            } else {
                if (token.equals("'") || token.equals("`") || token.equals("''") || token.equals("``") || token.equals(" ")) {
                    continue;
                }
                if (token.equals(".")) {
                    tmpSen += token;
                    len++;
                } else {
                    // Keep a sentence only if it has more than one token and at least
                    // half of its tokens start with an ASCII letter.
                    if (len > 1 && wlen * 2 >= len) {
                        passage.add(tmpSen);
                        senLen.add(len);
                        word.add(sentence);
                    }
                    ifend = false;
                    tmpSen = token;
                    sentence = new ArrayList<String>();
                    wlen = 0;
                    if (ifWordsEng(token))
                        wlen++;
                    len = 1;
                    if (ifWordsEng(token) && !ifStopwords(token))
                        sentence.add(token.toLowerCase());
                }
            }
        }
        if (ifend && len > 1 && wlen * 2 >= len) {
            passage.add(tmpSen);
            word.add(sentence);
            senLen.add(len);
        }
        stemmerWord();
        return passage;
    }

    public ArrayList<String> tokenizeChn(String inFile, String stopwordPath) throws IOException {
        StringBuffer buffer = new StringBuffer();
        String line;
        if (!stopwordPath.equals("n") && !stopwordPath.equals("y"))
            readStopwords(stopwordPath);
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inFile), "utf-8"));
        line = reader.readLine();
        while (line != null) {
            buffer.append(line);
            buffer.append("\n");
            line = reader.readLine();
        }
        reader.close();
        // Split the text into sentences ending with 。, ? or !
        Pattern pattern = Pattern.compile(".*?[。?!]");
        Matcher matcher = pattern.matcher(buffer);
        // Matches a CJK unified ideograph (U+4E00 to U+9FA5).
        Pattern p2 = Pattern.compile("[\u4e00-\u9fa5]");
        while (matcher.find()) {
            String sen = matcher.group();
            passage.add(sen);
            senLen.add(sen.length());
            Result parse = ToAnalysis.parse(sen);
            ArrayList<String> tmpsen = new ArrayList<>();
            for (Term x : parse) {
                // Keep only segmented terms that contain a Chinese character and are not stop words.
                Matcher m2 = p2.matcher(x.getName());
                if (m2.find()) {
                    if (!ifStopwords(x.getName()))
                        tmpsen.add(x.getName());
                }
            }
            word.add(tmpsen);
        }
        stemmerWord();
        return passage;
    }

}

--------------------------------------------------------------------------------
/code/doc.java:
--------------------------------------------------------------------------------
package code;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// some basic information about doc
public class Doc {
    public ArrayList<ArrayList<String>> sen = new ArrayList<ArrayList<String>>(); // the sentences after tokenization
    public ArrayList<ArrayList<String>> stemmerSen = new ArrayList<ArrayList<String>>();
    public int[] lRange; // index of the first sentence of the i-th document
    public int[] rRange; // index one past the last sentence of the i-th document
    public ArrayList<String> originalSen = new ArrayList<String>(); // the original sentences
    public ArrayList<Integer> senLen = new ArrayList<>(); // the length of each original sentence
    public ArrayList wordLen = new ArrayList<>(); // the length of the vector
    public ArrayList> sVector = new ArrayList>();
    public ArrayList> sTf = new ArrayList<>(); // the tf-vector of the sentence; sVector stores the index
    public ArrayList dTf; // the tf-vector of the document; dVector stores the index
    public TreeSet dVector;
    public int totalLen; // total number of sentences read so far (including the topic, if any)
    public int fnum, snum = 0, wnum; // fnum: number of documents; wnum: number of words; snum: number of sentences
    public int[] tf; // tf of words
    public int[] df; // df of words
    public double[] idf; // idf of words
    public double[][] sim, normalSim;
    public int maxlen; // the maximum length of the summary
    // public String outfile;
    ArrayList summaryId = new ArrayList<>(); // indices of the selected sentences
    HashMap dic = new HashMap(); // maps words to numbers
    HashMap dd = new HashMap<>();


    public void readTopic(String Topicfile, String language, String stopwordPath) throws IOException {
        Tokenizer mytoken = new Tokenizer();
        ArrayList<String> tmp = new ArrayList<>();
        if (language.equals("1")) // 1 represents Chinese
            tmp = mytoken.tokenizeChn(Topicfile, stopwordPath);
        else if (language.equals("2")) // 2 represents English
            tmp = mytoken.tokenizeEng(Topicfile, stopwordPath);
        else if (language.equals("3")) // 3 represents other Western languages
            tmp = mytoken.tokenizeEng(Topicfile, stopwordPath);

        int len = tmp.size();
        String topic = "";
        ArrayList<String> topicWord = new ArrayList<String>();
        ArrayList<String> stemmerTopicWord = new ArrayList<String>();
        int length = 0;
        for (int i = 0; i < len; ++i) {
            topic = topic + tmp.get(i) + " ";
            for (int j = 0; j < mytoken.word.get(i).size(); ++j) {
                topicWord.add(mytoken.word.get(i).get(j));
                stemmerTopicWord.add(mytoken.stemmerWord.get(i).get(j));
            }
            length += mytoken.senLen.get(i);
        }

        sen.add(topicWord);
        stemmerSen.add(stemmerTopicWord);
        senLen.add(length);
        originalSen.add(topic);
        snum++;
    }

    // read the documents
    public void readfile(String[] rfiles, String filepath, String language, String stopwordPath) throws IOException {
        int i = 0;
        lRange = new int[rfiles.length];
        rRange = new int[rfiles.length];
        fnum = 0;
        totalLen = snum;
        for (String infile : rfiles) {
            if (infile.equals(".DS_Store")) {
                System.out.println("Skipping!!");
                continue;
            }
            fnum++;
            String path;
            if (!filepath.equals(" ")) {
                path = filepath + System.getProperty("file.separator") + infile;
            } else {
                path = infile;
            }

            Tokenizer mytoken = new Tokenizer();
            ArrayList<String> tmp = new ArrayList<>();
            if (language.equals("1")) // 1 represents Chinese
                tmp = mytoken.tokenizeChn(path, stopwordPath);
            else if (language.equals("2")) // 2 represents English
                tmp = mytoken.tokenizeEng(path, stopwordPath);
            else if (language.equals("3")) // 3 represents other Western languages
                tmp = mytoken.tokenizeEng(path, stopwordPath);
            int len = tmp.size();

            lRange[i] = totalLen;
            totalLen += len;
            rRange[i] = totalLen;
            i++;
            sen.addAll(mytoken.word);
            stemmerSen.addAll(mytoken.stemmerWord);
            senLen.addAll(mytoken.senLen);
            originalSen.addAll(tmp);
        }
        snum = originalSen.size();
    }


    // op: 1 represents tf-isf; 2 and 3 represent tf-idf
    // stemOrNot: 1 represents no stemming; 2 represents stemming
    // calculate the
tf-idf of the words 117 | void calcTfidf(int op, int stemOrNot) { 118 | int i = 0,wlen = 0; 119 | wnum = 0; 120 | dic = new HashMap(); 121 | dTf = new ArrayList<>(); 122 | dVector = new TreeSet<>(); 123 | int[] allTf = new int [100000]; 124 | Arrays.fill(allTf,0); 125 | wordLen = new ArrayList<>(); 126 | int dnum = 0; 127 | tf = new int[100000]; 128 | df = new int[100000]; 129 | boolean[] occur = new boolean[100000]; 130 | 131 | ArrayList> calTfIdfVec = new ArrayList>(); 132 | if(stemOrNot == 1) { 133 | calTfIdfVec = sen; 134 | }else { 135 | calTfIdfVec = stemmerSen; 136 | } 137 | 138 | for (ArrayList tmpSen : calTfIdfVec) { 139 | wlen=0; 140 | TreeSet tmpSet = new TreeSet(); 141 | Arrays.fill(tf,0); 142 | if (op == 2 || op == 3) { 143 | if (i == rRange[dnum]){ 144 | dnum++; 145 | Arrays.fill(occur,false); 146 | } 147 | }else 148 | Arrays.fill(occur,false); 149 | for (String tmpWord : tmpSen) { 150 | wlen++; 151 | if (dic.get(tmpWord) != null) { 152 | int k = dic.get(tmpWord); 153 | tmpSet.add(k); 154 | tf[k]++; 155 | allTf[k]++; 156 | if (!occur[k]) { 157 | occur[k] = true; 158 | df[k]++; 159 | } 160 | 161 | } else { 162 | dic.put(tmpWord, wnum); 163 | dd.put(wnum,tmpWord); 164 | tf[wnum]++; 165 | allTf[wnum]++; 166 | df[wnum]++; 167 | tmpSet.add(wnum); 168 | occur[wnum] = true; 169 | wnum++; 170 | } 171 | } 172 | wordLen.add(wlen); 173 | ArrayList tmpTf=new ArrayList<>(); 174 | for (int j:tmpSet) 175 | { 176 | tmpTf.add(tf[j]); 177 | } 178 | sTf.add(tmpTf); 179 | sVector.add(tmpSet); 180 | 181 | 182 | i++; 183 | } 184 | idf = new double[wnum]; 185 | if (op == 2 || op == 3){ 186 | for (i=0;i a1 , ArrayList a2, int lenA , TreeSet b1, ArrayListb2, int lenB) 207 | { 208 | int x1 = 0,x2 = 0; 209 | double l1 = 0,l2 = 0; 210 | int idA = 0, idB = 0; 211 | double cos = 0; 212 | TreeSet a = new TreeSet<>(); 213 | TreeSet b = new TreeSet<>(); 214 | a.addAll(a1); 215 | b.addAll(b1); 216 | while (a.size() > 0 && b.size() > 0) 217 | { 218 | 219 | x1 = a.first(); 220 | x2 = 
b.first(); 221 | if ( x1 == x2 ) 222 | { 223 | l1 += Math.pow((double)a2.get(idA)/(double)lenA*idf[x1],2); 224 | l2 += Math.pow((double)b2.get(idB)/(double)lenB*idf[x2],2); 225 | cos += Math.pow(idf[x1],2)*(double)a2.get(idA)/(double)lenA*(double)b2.get(idB)/(double)lenB; 226 | a.pollFirst(); 227 | idA++; 228 | b.pollFirst(); 229 | idB++; 230 | }else 231 | if ( x1 < x2 ) 232 | { 233 | l1 += Math.pow((double)a2.get(idA)/(double)lenA*idf[x1],2); 234 | a.pollFirst(); 235 | idA++; 236 | }else 237 | if ( x1 > x2) 238 | { 239 | l2 += Math.pow((double)b2.get(idB)/(double)lenB*idf[x2],2); 240 | b.pollFirst(); 241 | idB++; 242 | } 243 | } 244 | while (a.size() > 0) 245 | { 246 | x1 = a.first(); 247 | l1 += Math.pow((double)a2.get(idA)/(double)lenA*idf[x1],2); 248 | a.pollFirst(); 249 | idA++; 250 | } 251 | while (b.size() > 0) 252 | { 253 | x2 = b.first(); 254 | l2 += Math.pow((double)b2.get(idB)/(double)lenB*idf[x2],2); 255 | b.pollFirst(); 256 | idB++; 257 | } 258 | 259 | if (l1==0 || l2==0) return 0; 260 | return cos/Math.pow(l1*l2,0.5) ; 261 | } 262 | 263 | //calculate the similarity of two sentence 264 | void calcSim() 265 | { 266 | sim = new double[snum][snum]; 267 | normalSim = new double[snum][snum]; 268 | for (int i = 0 ; i < snum; i++){ 269 | double sumISim = 0.0; 270 | for (int j = 0; j < snum; j++) 271 | { 272 | if (i == j) { 273 | sim[i][j] = 1; 274 | } 275 | else if (i > j) { 276 | sim[i][j] = sim[j][i]; 277 | 278 | } 279 | else{ 280 | sim[i][j] = calcCos(sVector.get(i), sTf.get(i), wordLen.get(i), sVector.get(j), sTf.get(j), wordLen.get(j)); 281 | } 282 | sumISim += sim[i][j]; 283 | } 284 | for(int j = 0; j < snum; ++j) { 285 | if(sumISim != 0.0) { 286 | normalSim[i][j] = sim[i][j] / sumISim; 287 | } 288 | else 289 | normalSim[i][j] = 0.0; 290 | } 291 | } 292 | } 293 | 294 | //using MMR to remove redundancy 295 | ArrayList pickSentenceMMR(double[] score, double para, double beta) 296 | { 297 | summaryId =new ArrayList<>(); 298 | int len = 0; 299 | if (para < 
0) para = 0.7; 300 | boolean[] chosen = new boolean[snum]; 301 | for (int i=0;imaxscore && !chosen[i] && len+ senLen.get(i)=5) 317 | { 318 | 319 | maxscore = tmpscore/Math.pow(senLen.get(i),beta); 320 | pick = i; 321 | 322 | } 323 | } 324 | if (pick==-1) 325 | break; 326 | chosen[pick]=true; 327 | len += senLen.get(pick); 328 | summaryId.add(pick); 329 | if (len>=maxlen-20) 330 | break; 331 | } 332 | return summaryId; 333 | } 334 | 335 | //using threshold to remove redundancy 336 | ArrayList pickSentenceThreshold(double[] score, double threshold, double beta){ 337 | summaryId = new ArrayList<>(); 338 | int len = 0; 339 | boolean[] chosen = new boolean[snum]; 340 | for (int i = 0; i < snum; i++) 341 | chosen[i] = false; 342 | while(len < maxlen) 343 | { 344 | double maxscore = 0; 345 | int pick = -1; 346 | for (int i = 0; i < snum; i++) 347 | { 348 | double tmpscore = score[i]; 349 | for (int j : summaryId) 350 | if (sim[i][j] > threshold) 351 | tmpscore = 0; 352 | 353 | if (tmpscore/Math.pow(senLen.get(i),beta)>maxscore && !chosen[i] && len+ senLen.get(i)=5) 354 | { 355 | 356 | maxscore = tmpscore/Math.pow(senLen.get(i),beta); 357 | pick = i; 358 | 359 | } 360 | } 361 | if (pick==-1) 362 | break; 363 | chosen[pick]=true; 364 | len += senLen.get(pick); 365 | summaryId.add(pick); 366 | if (len>=maxlen-20) 367 | break; 368 | } 369 | return summaryId; 370 | } 371 | 372 | //using punishment to remove redundancy 373 | ArrayList pickSentenceSumpun(double[] score, double para){ 374 | summaryId = new ArrayList<>(); 375 | Map m = new TreeMap(); 376 | int contentNum = 0; 377 | int Numm = 0; 378 | double maxSenScore = -10000.0; 379 | boolean[] yes = new boolean[snum]; 380 | for(int i = 0; i < snum; i++){ 381 | yes[i] = false; 382 | } 383 | while(contentNum <= maxlen){ 384 | 385 | for(int i = 1; i < snum; i++){ 386 | if(yes[i] == false && score[i] > maxSenScore){ 387 | maxSenScore = score[i]; 388 | Numm = i; 389 | } 390 | } 391 | 392 | m.put(Numm, maxSenScore); 393 | 
maxSenScore = -10000.0; 394 | contentNum += senLen.get(Numm); 395 | yes[Numm] = true; 396 | 397 | for(int i = 1;i < snum; i++){ 398 | if(yes[i] == false){ 399 | score[i] = score[i] - para * normalSim[i][Numm] * score[Numm]; 400 | } 401 | } 402 | } 403 | for (Integer key : m.keySet()) { 404 | summaryId.add(key); 405 | } 406 | 407 | return summaryId; 408 | 409 | } 410 | 411 | } -------------------------------------------------------------------------------- /code/stopword_Eng: -------------------------------------------------------------------------------- 1 | able 2 | about 3 | above 4 | according 5 | accordingly 6 | across 7 | actually 8 | after 9 | afterwards 10 | again 11 | against 12 | ain't 13 | all 14 | allow 15 | allows 16 | almost 17 | alone 18 | along 19 | already 20 | also 21 | although 22 | always 23 | am 24 | among 25 | amongst 26 | an 27 | and 28 | another 29 | any 30 | anybody 31 | anyhow 32 | anyone 33 | anything 34 | anyway 35 | anyways 36 | anywhere 37 | apart 38 | appear 39 | appreciate 40 | appropriate 41 | are 42 | aren't 43 | around 44 | as 45 | a's 46 | aside 47 | ask 48 | asking 49 | associated 50 | at 51 | available 52 | away 53 | awfully 54 | be 55 | became 56 | because 57 | become 58 | becomes 59 | becoming 60 | been 61 | before 62 | beforehand 63 | behind 64 | being 65 | believe 66 | below 67 | beside 68 | besides 69 | best 70 | better 71 | between 72 | beyond 73 | both 74 | brief 75 | but 76 | by 77 | came 78 | can 79 | cannot 80 | cant 81 | can't 82 | cause 83 | causes 84 | certain 85 | certainly 86 | changes 87 | clearly 88 | c'mon 89 | co 90 | com 91 | come 92 | comes 93 | concerning 94 | consequently 95 | consider 96 | considering 97 | contain 98 | containing 99 | contains 100 | corresponding 101 | could 102 | couldn't 103 | course 104 | c's 105 | currently 106 | definitely 107 | described 108 | despite 109 | did 110 | didn't 111 | different 112 | do 113 | does 114 | doesn't 115 | doing 116 | done 117 | don't 118 | down 119 | 
downwards 120 | during 121 | each 122 | edu 123 | eg 124 | eight 125 | either 126 | else 127 | elsewhere 128 | enough 129 | entirely 130 | especially 131 | et 132 | etc 133 | even 134 | ever 135 | every 136 | everybody 137 | everyone 138 | everything 139 | everywhere 140 | ex 141 | exactly 142 | example 143 | except 144 | far 145 | few 146 | fifth 147 | first 148 | five 149 | followed 150 | following 151 | follows 152 | for 153 | former 154 | formerly 155 | forth 156 | four 157 | from 158 | further 159 | furthermore 160 | get 161 | gets 162 | getting 163 | given 164 | gives 165 | go 166 | goes 167 | going 168 | gone 169 | got 170 | gotten 171 | greetings 172 | had 173 | hadn't 174 | happens 175 | hardly 176 | has 177 | hasn't 178 | have 179 | haven't 180 | having 181 | he 182 | hello 183 | help 184 | hence 185 | her 186 | here 187 | hereafter 188 | hereby 189 | herein 190 | here's 191 | hereupon 192 | hers 193 | herself 194 | he's 195 | hi 196 | him 197 | himself 198 | his 199 | hither 200 | hopefully 201 | how 202 | howbeit 203 | however 204 | i'd 205 | ie 206 | if 207 | ignored 208 | i'll 209 | i'm 210 | immediate 211 | in 212 | inasmuch 213 | inc 214 | indeed 215 | indicate 216 | indicated 217 | indicates 218 | inner 219 | insofar 220 | instead 221 | into 222 | inward 223 | is 224 | isn't 225 | it 226 | it'd 227 | it'll 228 | its 229 | it's 230 | itself 231 | i've 232 | just 233 | keep 234 | keeps 235 | kept 236 | know 237 | known 238 | knows 239 | last 240 | lately 241 | later 242 | latter 243 | latterly 244 | least 245 | less 246 | lest 247 | let 248 | let's 249 | like 250 | liked 251 | likely 252 | little 253 | look 254 | looking 255 | looks 256 | ltd 257 | mainly 258 | many 259 | may 260 | maybe 261 | me 262 | mean 263 | meanwhile 264 | merely 265 | might 266 | more 267 | moreover 268 | most 269 | mostly 270 | much 271 | must 272 | my 273 | myself 274 | name 275 | namely 276 | nd 277 | near 278 | nearly 279 | necessary 280 | need 281 | needs 282 | neither 
283 | never 284 | nevertheless 285 | new 286 | next 287 | nine 288 | no 289 | nobody 290 | non 291 | none 292 | noone 293 | nor 294 | normally 295 | not 296 | nothing 297 | novel 298 | now 299 | nowhere 300 | obviously 301 | of 302 | off 303 | often 304 | oh 305 | ok 306 | okay 307 | old 308 | on 309 | once 310 | one 311 | ones 312 | only 313 | onto 314 | or 315 | other 316 | others 317 | otherwise 318 | ought 319 | our 320 | ours 321 | ourselves 322 | out 323 | outside 324 | over 325 | overall 326 | own 327 | particular 328 | particularly 329 | per 330 | perhaps 331 | placed 332 | please 333 | plus 334 | possible 335 | presumably 336 | probably 337 | provides 338 | que 339 | quite 340 | qv 341 | rather 342 | rd 343 | re 344 | really 345 | reasonably 346 | regarding 347 | regardless 348 | regards 349 | relatively 350 | respectively 351 | right 352 | said 353 | same 354 | saw 355 | say 356 | saying 357 | says 358 | second 359 | secondly 360 | see 361 | seeing 362 | seem 363 | seemed 364 | seeming 365 | seems 366 | seen 367 | self 368 | selves 369 | sensible 370 | sent 371 | serious 372 | seriously 373 | seven 374 | several 375 | shall 376 | she 377 | should 378 | shouldn't 379 | since 380 | six 381 | so 382 | some 383 | somebody 384 | somehow 385 | someone 386 | something 387 | sometime 388 | sometimes 389 | somewhat 390 | somewhere 391 | soon 392 | sorry 393 | specified 394 | specify 395 | specifying 396 | still 397 | sub 398 | such 399 | sup 400 | sure 401 | take 402 | taken 403 | tell 404 | tends 405 | th 406 | than 407 | thank 408 | thanks 409 | thanx 410 | that 411 | thats 412 | that's 413 | the 414 | their 415 | theirs 416 | them 417 | themselves 418 | then 419 | thence 420 | there 421 | thereafter 422 | thereby 423 | therefore 424 | therein 425 | theres 426 | there's 427 | thereupon 428 | these 429 | they 430 | they'd 431 | they'll 432 | they're 433 | they've 434 | think 435 | third 436 | this 437 | thorough 438 | thoroughly 439 | those 440 | though 441 | 
three 442 | through 443 | throughout 444 | thru 445 | thus 446 | to 447 | together 448 | too 449 | took 450 | toward 451 | towards 452 | tried 453 | tries 454 | truly 455 | try 456 | trying 457 | t's 458 | twice 459 | two 460 | un 461 | under 462 | unfortunately 463 | unless 464 | unlikely 465 | until 466 | unto 467 | up 468 | upon 469 | us 470 | use 471 | used 472 | useful 473 | uses 474 | using 475 | usually 476 | value 477 | various 478 | very 479 | via 480 | viz 481 | vs 482 | want 483 | wants 484 | was 485 | wasn't 486 | way 487 | we 488 | we'd 489 | welcome 490 | well 491 | we'll 492 | went 493 | were 494 | we're 495 | weren't 496 | we've 497 | what 498 | whatever 499 | what's 500 | when 501 | whence 502 | whenever 503 | where 504 | whereafter 505 | whereas 506 | whereby 507 | wherein 508 | where's 509 | whereupon 510 | wherever 511 | whether 512 | which 513 | while 514 | whither 515 | who 516 | whoever 517 | whole 518 | whom 519 | who's 520 | whose 521 | why 522 | will 523 | willing 524 | wish 525 | with 526 | within 527 | without 528 | wonder 529 | won't 530 | would 531 | wouldn't 532 | yes 533 | yet 534 | you 535 | you'd 536 | you'll 537 | your 538 | you're 539 | yours 540 | yourself 541 | yourselves 542 | you've 543 | zero 544 | zt 545 | ZT 546 | zz 547 | ZZ 548 | a 549 | able 550 | about 551 | above 552 | abst 553 | accordance 554 | according 555 | accordingly 556 | across 557 | act 558 | actually 559 | added 560 | adj 561 | adopted 562 | affected 563 | affecting 564 | affects 565 | after 566 | afterwards 567 | again 568 | against 569 | ah 570 | ain't 571 | all 572 | allow 573 | allows 574 | almost 575 | alone 576 | along 577 | already 578 | also 579 | although 580 | always 581 | am 582 | among 583 | amongst 584 | an 585 | and 586 | announce 587 | another 588 | any 589 | anybody 590 | anyhow 591 | anymore 592 | anyone 593 | anything 594 | anyway 595 | anyways 596 | anywhere 597 | apart 598 | apparently 599 | appear 600 | appreciate 601 | appropriate 602 
| approximately 603 | are 604 | area 605 | areas 606 | aren 607 | arent 608 | aren't 609 | arise 610 | around 611 | as 612 | a's 613 | aside 614 | ask 615 | asked 616 | asking 617 | asks 618 | associated 619 | at 620 | auth 621 | available 622 | away 623 | awfully 624 | b 625 | back 626 | backed 627 | backing 628 | backs 629 | be 630 | became 631 | because 632 | become 633 | becomes 634 | becoming 635 | been 636 | before 637 | beforehand 638 | began 639 | begin 640 | beginning 641 | beginnings 642 | begins 643 | behind 644 | being 645 | beings 646 | believe 647 | below 648 | beside 649 | besides 650 | best 651 | better 652 | between 653 | beyond 654 | big 655 | biol 656 | both 657 | brief 658 | briefly 659 | but 660 | by 661 | c 662 | ca 663 | came 664 | can 665 | cannot 666 | cant 667 | can't 668 | case 669 | cases 670 | cause 671 | causes 672 | certain 673 | certainly 674 | changes 675 | clear 676 | clearly 677 | c'mon 678 | co 679 | com 680 | come 681 | comes 682 | concerning 683 | consequently 684 | consider 685 | considering 686 | contain 687 | containing 688 | contains 689 | corresponding 690 | could 691 | couldnt 692 | couldn't 693 | course 694 | c's 695 | currently 696 | d 697 | 'd 698 | date 699 | definitely 700 | describe 701 | described 702 | despite 703 | did 704 | didn't 705 | differ 706 | different 707 | differently 708 | discuss 709 | do 710 | does 711 | doesn't 712 | doing 713 | done 714 | don't 715 | down 716 | downed 717 | downing 718 | downs 719 | downwards 720 | due 721 | during 722 | e 723 | each 724 | early 725 | ed 726 | edu 727 | effect 728 | eg 729 | eight 730 | eighty 731 | either 732 | else 733 | elsewhere 734 | end 735 | ended 736 | ending 737 | ends 738 | enough 739 | entirely 740 | especially 741 | et 742 | et-al 743 | etc 744 | even 745 | evenly 746 | ever 747 | every 748 | everybody 749 | everyone 750 | everything 751 | everywhere 752 | ex 753 | exactly 754 | example 755 | except 756 | f 757 | face 758 | faces 759 | fact 760 | facts 
761 | far 762 | felt 763 | few 764 | ff 765 | fifth 766 | find 767 | finds 768 | first 769 | five 770 | fix 771 | followed 772 | following 773 | follows 774 | for 775 | former 776 | formerly 777 | forth 778 | found 779 | four 780 | from 781 | full 782 | fully 783 | further 784 | furthered 785 | furthering 786 | furthermore 787 | furthers 788 | g 789 | gave 790 | general 791 | generally 792 | get 793 | gets 794 | getting 795 | give 796 | given 797 | gives 798 | giving 799 | go 800 | goes 801 | going 802 | gone 803 | good 804 | goods 805 | got 806 | gotten 807 | great 808 | greater 809 | greatest 810 | greetings 811 | group 812 | grouped 813 | grouping 814 | groups 815 | h 816 | had 817 | hadn't 818 | happens 819 | hardly 820 | has 821 | hasn't 822 | have 823 | haven't 824 | having 825 | he 826 | hed 827 | hello 828 | help 829 | hence 830 | her 831 | here 832 | hereafter 833 | hereby 834 | herein 835 | heres 836 | here's 837 | hereupon 838 | hers 839 | herself 840 | hes 841 | he's 842 | hi 843 | hid 844 | high 845 | higher 846 | highest 847 | him 848 | himself 849 | his 850 | hither 851 | home 852 | hopefully 853 | how 854 | howbeit 855 | however 856 | hundred 857 | i 858 | id 859 | i'd 860 | ie 861 | if 862 | ignored 863 | i'll 864 | im 865 | i'm 866 | immediate 867 | immediately 868 | importance 869 | important 870 | in 871 | inasmuch 872 | inc 873 | include 874 | indeed 875 | index 876 | indicate 877 | indicated 878 | indicates 879 | information 880 | inner 881 | insofar 882 | instead 883 | interest 884 | interested 885 | interesting 886 | interests 887 | into 888 | invention 889 | inward 890 | is 891 | isn't 892 | it 893 | itd 894 | it'd 895 | it'll 896 | its 897 | it's 898 | itself 899 | i've 900 | j 901 | just 902 | k 903 | keep 904 | keeps 905 | kept 906 | keys 907 | kg 908 | kind 909 | km 910 | knew 911 | know 912 | known 913 | knows 914 | l 915 | large 916 | largely 917 | last 918 | lately 919 | later 920 | latest 921 | latter 922 | latterly 923 | least 924 
| less 925 | lest 926 | let 927 | lets 928 | let's 929 | like 930 | liked 931 | likely 932 | line 933 | little 934 | 'll 935 | long 936 | longer 937 | longest 938 | look 939 | looking 940 | looks 941 | ltd 942 | m 943 | 'm 944 | made 945 | mainly 946 | make 947 | makes 948 | making 949 | man 950 | many 951 | may 952 | maybe 953 | me 954 | mean 955 | means 956 | meantime 957 | meanwhile 958 | member 959 | members 960 | men 961 | merely 962 | mg 963 | might 964 | million 965 | miss 966 | ml 967 | more 968 | moreover 969 | most 970 | mostly 971 | mr 972 | mrs 973 | much 974 | mug 975 | must 976 | my 977 | myself 978 | n 979 | na 980 | name 981 | namely 982 | nay 983 | nd 984 | near 985 | nearly 986 | necessarily 987 | necessary 988 | need 989 | needed 990 | needing 991 | needs 992 | neither 993 | never 994 | nevertheless 995 | new 996 | newer 997 | newest 998 | next 999 | nine 1000 | ninety 1001 | no 1002 | nobody 1003 | non 1004 | none 1005 | nonetheless 1006 | noone 1007 | nor 1008 | normally 1009 | nos 1010 | not 1011 | noted 1012 | nothing 1013 | novel 1014 | now 1015 | nowhere 1016 | n't 1017 | number 1018 | numbers 1019 | o 1020 | obtain 1021 | obtained 1022 | obviously 1023 | of 1024 | off 1025 | often 1026 | oh 1027 | ok 1028 | okay 1029 | old 1030 | older 1031 | oldest 1032 | omitted 1033 | on 1034 | once 1035 | one 1036 | ones 1037 | only 1038 | onto 1039 | open 1040 | opened 1041 | opening 1042 | opens 1043 | or 1044 | ord 1045 | order 1046 | ordered 1047 | ordering 1048 | orders 1049 | other 1050 | others 1051 | otherwise 1052 | ought 1053 | our 1054 | ours 1055 | ourselves 1056 | out 1057 | outside 1058 | over 1059 | overall 1060 | owing 1061 | own 1062 | p 1063 | page 1064 | pages 1065 | part 1066 | parted 1067 | particular 1068 | particularly 1069 | parting 1070 | parts 1071 | past 1072 | per 1073 | perhaps 1074 | place 1075 | placed 1076 | places 1077 | please 1078 | plus 1079 | point 1080 | pointed 1081 | pointing 1082 | points 1083 | poorly 1084 | 
possible 1085 | possibly 1086 | potentially 1087 | pp 1088 | predominantly 1089 | present 1090 | presented 1091 | presenting 1092 | presents 1093 | presumably 1094 | previously 1095 | primarily 1096 | probably 1097 | problem 1098 | problems 1099 | promptly 1100 | proud 1101 | provides 1102 | put 1103 | puts 1104 | q 1105 | que 1106 | quickly 1107 | quite 1108 | qv 1109 | r 1110 | ran 1111 | rather 1112 | rd 1113 | re 1114 | 're 1115 | readily 1116 | really 1117 | reasonably 1118 | recent 1119 | recently 1120 | ref 1121 | refs 1122 | regarding 1123 | regardless 1124 | regards 1125 | related 1126 | relatively 1127 | research 1128 | respectively 1129 | resulted 1130 | resulting 1131 | results 1132 | right 1133 | room 1134 | rooms 1135 | run 1136 | s 1137 | 's 1138 | said 1139 | same 1140 | saw 1141 | say 1142 | saying 1143 | says 1144 | sec 1145 | second 1146 | secondly 1147 | seconds 1148 | section 1149 | see 1150 | seeing 1151 | seem 1152 | seemed 1153 | seeming 1154 | seems 1155 | seen 1156 | sees 1157 | self 1158 | selves 1159 | sensible 1160 | sent 1161 | serious 1162 | seriously 1163 | seven 1164 | several 1165 | shall 1166 | she 1167 | shed 1168 | she'll 1169 | shes 1170 | should 1171 | shouldn't 1172 | show 1173 | showed 1174 | showing 1175 | shown 1176 | showns 1177 | shows 1178 | side 1179 | sides 1180 | significant 1181 | significantly 1182 | similar 1183 | similarly 1184 | since 1185 | six 1186 | slightly 1187 | small 1188 | smaller 1189 | smallest 1190 | so 1191 | some 1192 | somebody 1193 | somehow 1194 | someone 1195 | somethan 1196 | something 1197 | sometime 1198 | sometimes 1199 | somewhat 1200 | somewhere 1201 | soon 1202 | sorry 1203 | specifically 1204 | specified 1205 | specify 1206 | specifying 1207 | state 1208 | states 1209 | still 1210 | stop 1211 | strongly 1212 | sub 1213 | substantially 1214 | successfully 1215 | such 1216 | sufficiently 1217 | suggest 1218 | sup 1219 | sure 1220 | t 1221 | 't 1222 | take 1223 | taken 1224 | taking 1225 | 
tell 1226 | tends 1227 | th 1228 | than 1229 | thank 1230 | thanks 1231 | thanx 1232 | that 1233 | that'll 1234 | thats 1235 | that's 1236 | that've 1237 | the 1238 | their 1239 | theirs 1240 | them 1241 | themselves 1242 | then 1243 | thence 1244 | there 1245 | thereafter 1246 | thereby 1247 | thered 1248 | therefore 1249 | therein 1250 | there'll 1251 | thereof 1252 | therere 1253 | theres 1254 | there's 1255 | thereto 1256 | thereupon 1257 | there've 1258 | these 1259 | they 1260 | theyd 1261 | they'd 1262 | they'll 1263 | theyre 1264 | they're 1265 | they've 1266 | thing 1267 | things 1268 | think 1269 | thinks 1270 | third 1271 | this 1272 | thorough 1273 | thoroughly 1274 | those 1275 | thou 1276 | though 1277 | thoughh 1278 | thought 1279 | thoughts 1280 | thousand 1281 | three 1282 | throug 1283 | through 1284 | throughout 1285 | thru 1286 | thus 1287 | til 1288 | tip 1289 | to 1290 | today 1291 | together 1292 | too 1293 | took 1294 | toward 1295 | towards 1296 | tried 1297 | tries 1298 | truly 1299 | try 1300 | trying 1301 | ts 1302 | t's 1303 | turn 1304 | turned 1305 | turning 1306 | turns 1307 | twice 1308 | two 1309 | u 1310 | un 1311 | under 1312 | unfortunately 1313 | unless 1314 | unlike 1315 | unlikely 1316 | until 1317 | unto 1318 | up 1319 | upon 1320 | ups 1321 | us 1322 | use 1323 | used 1324 | useful 1325 | usefully 1326 | usefulness 1327 | uses 1328 | using 1329 | usually 1330 | uucp 1331 | v 1332 | value 1333 | various 1334 | 've 1335 | very 1336 | via 1337 | viz 1338 | vol 1339 | vols 1340 | vs 1341 | w 1342 | want 1343 | wanted 1344 | wanting 1345 | wants 1346 | was 1347 | wasn't 1348 | way 1349 | ways 1350 | we 1351 | wed 1352 | we'd 1353 | welcome 1354 | well 1355 | we'll 1356 | wells 1357 | went 1358 | were 1359 | we're 1360 | weren't 1361 | we've 1362 | what 1363 | whatever 1364 | what'll 1365 | whats 1366 | what's 1367 | when 1368 | whence 1369 | whenever 1370 | where 1371 | whereafter 1372 | whereas 1373 | whereby 1374 | wherein 
1375 | wheres 1376 | where's 1377 | whereupon 1378 | wherever 1379 | whether 1380 | which 1381 | while 1382 | whim 1383 | whither 1384 | who 1385 | whod 1386 | whoever 1387 | whole 1388 | who'll 1389 | whom 1390 | whomever 1391 | whos 1392 | who's 1393 | whose 1394 | why 1395 | widely 1396 | will 1397 | willing 1398 | wish 1399 | with 1400 | within 1401 | without 1402 | wonder 1403 | won't 1404 | words 1405 | work 1406 | worked 1407 | working 1408 | works 1409 | world 1410 | would 1411 | wouldn't 1412 | www 1413 | x 1414 | y 1415 | year 1416 | years 1417 | yes 1418 | yet 1419 | you 1420 | youd 1421 | you'd 1422 | you'll 1423 | young 1424 | younger 1425 | youngest 1426 | your 1427 | youre 1428 | you're 1429 | yours 1430 | yourself 1431 | yourselves 1432 | you've 1433 | z 1434 | zero 1435 | -------------------------------------------------------------------------------- /lib/ansj_seg-5.0.2-all-in-one.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/ansj_seg-5.0.2-all-in-one.jar -------------------------------------------------------------------------------- /lib/lpsolve55j.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/lpsolve55j.jar -------------------------------------------------------------------------------- /lib/slf4j-nop-1.7.21.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/slf4j-nop-1.7.21.jar -------------------------------------------------------------------------------- /lib/stanford-ner.jar: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/stanford-ner.jar -------------------------------------------------------------------------------- /lib/stopword_Eng: -------------------------------------------------------------------------------- 1 | able 2 | about 3 | above 4 | according 5 | accordingly 6 | across 7 | actually 8 | after 9 | afterwards 10 | again 11 | against 12 | ain't 13 | all 14 | allow 15 | allows 16 | almost 17 | alone 18 | along 19 | already 20 | also 21 | although 22 | always 23 | am 24 | among 25 | amongst 26 | an 27 | and 28 | another 29 | any 30 | anybody 31 | anyhow 32 | anyone 33 | anything 34 | anyway 35 | anyways 36 | anywhere 37 | apart 38 | appear 39 | appreciate 40 | appropriate 41 | are 42 | aren't 43 | around 44 | as 45 | a's 46 | aside 47 | ask 48 | asking 49 | associated 50 | at 51 | available 52 | away 53 | awfully 54 | be 55 | became 56 | because 57 | become 58 | becomes 59 | becoming 60 | been 61 | before 62 | beforehand 63 | behind 64 | being 65 | believe 66 | below 67 | beside 68 | besides 69 | best 70 | better 71 | between 72 | beyond 73 | both 74 | brief 75 | but 76 | by 77 | came 78 | can 79 | cannot 80 | cant 81 | can't 82 | cause 83 | causes 84 | certain 85 | certainly 86 | changes 87 | clearly 88 | c'mon 89 | co 90 | com 91 | come 92 | comes 93 | concerning 94 | consequently 95 | consider 96 | considering 97 | contain 98 | containing 99 | contains 100 | corresponding 101 | could 102 | couldn't 103 | course 104 | c's 105 | currently 106 | definitely 107 | described 108 | despite 109 | did 110 | didn't 111 | different 112 | do 113 | does 114 | doesn't 115 | doing 116 | done 117 | don't 118 | down 119 | downwards 120 | during 121 | each 122 | edu 123 | eg 124 | eight 125 | either 126 | else 127 | elsewhere 128 | enough 129 | entirely 130 | especially 131 | et 132 | etc 133 | even 134 | ever 135 | every 136 | everybody 137 | everyone 138 | everything 139 | everywhere 140 | 
ex 141 | exactly 142 | example 143 | except 144 | far 145 | few 146 | fifth 147 | first 148 | five 149 | followed 150 | following 151 | follows 152 | for 153 | former 154 | formerly 155 | forth 156 | four 157 | from 158 | further 159 | furthermore 160 | get 161 | gets 162 | getting 163 | given 164 | gives 165 | go 166 | goes 167 | going 168 | gone 169 | got 170 | gotten 171 | greetings 172 | had 173 | hadn't 174 | happens 175 | hardly 176 | has 177 | hasn't 178 | have 179 | haven't 180 | having 181 | he 182 | hello 183 | help 184 | hence 185 | her 186 | here 187 | hereafter 188 | hereby 189 | herein 190 | here's 191 | hereupon 192 | hers 193 | herself 194 | he's 195 | hi 196 | him 197 | himself 198 | his 199 | hither 200 | hopefully 201 | how 202 | howbeit 203 | however 204 | i'd 205 | ie 206 | if 207 | ignored 208 | i'll 209 | i'm 210 | immediate 211 | in 212 | inasmuch 213 | inc 214 | indeed 215 | indicate 216 | indicated 217 | indicates 218 | inner 219 | insofar 220 | instead 221 | into 222 | inward 223 | is 224 | isn't 225 | it 226 | it'd 227 | it'll 228 | its 229 | it's 230 | itself 231 | i've 232 | just 233 | keep 234 | keeps 235 | kept 236 | know 237 | known 238 | knows 239 | last 240 | lately 241 | later 242 | latter 243 | latterly 244 | least 245 | less 246 | lest 247 | let 248 | let's 249 | like 250 | liked 251 | likely 252 | little 253 | look 254 | looking 255 | looks 256 | ltd 257 | mainly 258 | many 259 | may 260 | maybe 261 | me 262 | mean 263 | meanwhile 264 | merely 265 | might 266 | more 267 | moreover 268 | most 269 | mostly 270 | much 271 | must 272 | my 273 | myself 274 | name 275 | namely 276 | nd 277 | near 278 | nearly 279 | necessary 280 | need 281 | needs 282 | neither 283 | never 284 | nevertheless 285 | new 286 | next 287 | nine 288 | no 289 | nobody 290 | non 291 | none 292 | noone 293 | nor 294 | normally 295 | not 296 | nothing 297 | novel 298 | now 299 | nowhere 300 | obviously 301 | of 302 | off 303 | often 304 | oh 305 | ok 306 | 
okay 307 | old 308 | on 309 | once 310 | one 311 | ones 312 | only 313 | onto 314 | or 315 | other 316 | others 317 | otherwise 318 | ought 319 | our 320 | ours 321 | ourselves 322 | out 323 | outside 324 | over 325 | overall 326 | own 327 | particular 328 | particularly 329 | per 330 | perhaps 331 | placed 332 | please 333 | plus 334 | possible 335 | presumably 336 | probably 337 | provides 338 | que 339 | quite 340 | qv 341 | rather 342 | rd 343 | re 344 | really 345 | reasonably 346 | regarding 347 | regardless 348 | regards 349 | relatively 350 | respectively 351 | right 352 | said 353 | same 354 | saw 355 | say 356 | saying 357 | says 358 | second 359 | secondly 360 | see 361 | seeing 362 | seem 363 | seemed 364 | seeming 365 | seems 366 | seen 367 | self 368 | selves 369 | sensible 370 | sent 371 | serious 372 | seriously 373 | seven 374 | several 375 | shall 376 | she 377 | should 378 | shouldn't 379 | since 380 | six 381 | so 382 | some 383 | somebody 384 | somehow 385 | someone 386 | something 387 | sometime 388 | sometimes 389 | somewhat 390 | somewhere 391 | soon 392 | sorry 393 | specified 394 | specify 395 | specifying 396 | still 397 | sub 398 | such 399 | sup 400 | sure 401 | take 402 | taken 403 | tell 404 | tends 405 | th 406 | than 407 | thank 408 | thanks 409 | thanx 410 | that 411 | thats 412 | that's 413 | the 414 | their 415 | theirs 416 | them 417 | themselves 418 | then 419 | thence 420 | there 421 | thereafter 422 | thereby 423 | therefore 424 | therein 425 | theres 426 | there's 427 | thereupon 428 | these 429 | they 430 | they'd 431 | they'll 432 | they're 433 | they've 434 | think 435 | third 436 | this 437 | thorough 438 | thoroughly 439 | those 440 | though 441 | three 442 | through 443 | throughout 444 | thru 445 | thus 446 | to 447 | together 448 | too 449 | took 450 | toward 451 | towards 452 | tried 453 | tries 454 | truly 455 | try 456 | trying 457 | t's 458 | twice 459 | two 460 | un 461 | under 462 | unfortunately 463 | unless 
464 | unlikely 465 | until 466 | unto 467 | up 468 | upon 469 | us 470 | use 471 | used 472 | useful 473 | uses 474 | using 475 | usually 476 | value 477 | various 478 | very 479 | via 480 | viz 481 | vs 482 | want 483 | wants 484 | was 485 | wasn't 486 | way 487 | we 488 | we'd 489 | welcome 490 | well 491 | we'll 492 | went 493 | were 494 | we're 495 | weren't 496 | we've 497 | what 498 | whatever 499 | what's 500 | when 501 | whence 502 | whenever 503 | where 504 | whereafter 505 | whereas 506 | whereby 507 | wherein 508 | where's 509 | whereupon 510 | wherever 511 | whether 512 | which 513 | while 514 | whither 515 | who 516 | whoever 517 | whole 518 | whom 519 | who's 520 | whose 521 | why 522 | will 523 | willing 524 | wish 525 | with 526 | within 527 | without 528 | wonder 529 | won't 530 | would 531 | wouldn't 532 | yes 533 | yet 534 | you 535 | you'd 536 | you'll 537 | your 538 | you're 539 | yours 540 | yourself 541 | yourselves 542 | you've 543 | zero 544 | zt 545 | ZT 546 | zz 547 | ZZ 548 | a 549 | able 550 | about 551 | above 552 | abst 553 | accordance 554 | according 555 | accordingly 556 | across 557 | act 558 | actually 559 | added 560 | adj 561 | adopted 562 | affected 563 | affecting 564 | affects 565 | after 566 | afterwards 567 | again 568 | against 569 | ah 570 | ain't 571 | all 572 | allow 573 | allows 574 | almost 575 | alone 576 | along 577 | already 578 | also 579 | although 580 | always 581 | am 582 | among 583 | amongst 584 | an 585 | and 586 | announce 587 | another 588 | any 589 | anybody 590 | anyhow 591 | anymore 592 | anyone 593 | anything 594 | anyway 595 | anyways 596 | anywhere 597 | apart 598 | apparently 599 | appear 600 | appreciate 601 | appropriate 602 | approximately 603 | are 604 | area 605 | areas 606 | aren 607 | arent 608 | aren't 609 | arise 610 | around 611 | as 612 | a's 613 | aside 614 | ask 615 | asked 616 | asking 617 | asks 618 | associated 619 | at 620 | auth 621 | available 622 | away 623 | awfully 624 | b 625 
| back 626 | backed 627 | backing 628 | backs 629 | be 630 | became 631 | because 632 | become 633 | becomes 634 | becoming 635 | been 636 | before 637 | beforehand 638 | began 639 | begin 640 | beginning 641 | beginnings 642 | begins 643 | behind 644 | being 645 | beings 646 | believe 647 | below 648 | beside 649 | besides 650 | best 651 | better 652 | between 653 | beyond 654 | big 655 | biol 656 | both 657 | brief 658 | briefly 659 | but 660 | by 661 | c 662 | ca 663 | came 664 | can 665 | cannot 666 | cant 667 | can't 668 | case 669 | cases 670 | cause 671 | causes 672 | certain 673 | certainly 674 | changes 675 | clear 676 | clearly 677 | c'mon 678 | co 679 | com 680 | come 681 | comes 682 | concerning 683 | consequently 684 | consider 685 | considering 686 | contain 687 | containing 688 | contains 689 | corresponding 690 | could 691 | couldnt 692 | couldn't 693 | course 694 | c's 695 | currently 696 | d 697 | 'd 698 | date 699 | definitely 700 | describe 701 | described 702 | despite 703 | did 704 | didn't 705 | differ 706 | different 707 | differently 708 | discuss 709 | do 710 | does 711 | doesn't 712 | doing 713 | done 714 | don't 715 | down 716 | downed 717 | downing 718 | downs 719 | downwards 720 | due 721 | during 722 | e 723 | each 724 | early 725 | ed 726 | edu 727 | effect 728 | eg 729 | eight 730 | eighty 731 | either 732 | else 733 | elsewhere 734 | end 735 | ended 736 | ending 737 | ends 738 | enough 739 | entirely 740 | especially 741 | et 742 | et-al 743 | etc 744 | even 745 | evenly 746 | ever 747 | every 748 | everybody 749 | everyone 750 | everything 751 | everywhere 752 | ex 753 | exactly 754 | example 755 | except 756 | f 757 | face 758 | faces 759 | fact 760 | facts 761 | far 762 | felt 763 | few 764 | ff 765 | fifth 766 | find 767 | finds 768 | first 769 | five 770 | fix 771 | followed 772 | following 773 | follows 774 | for 775 | former 776 | formerly 777 | forth 778 | found 779 | four 780 | from 781 | full 782 | fully 783 | further 784 
| furthered 785 | furthering 786 | furthermore 787 | furthers 788 | g 789 | gave 790 | general 791 | generally 792 | get 793 | gets 794 | getting 795 | give 796 | given 797 | gives 798 | giving 799 | go 800 | goes 801 | going 802 | gone 803 | good 804 | goods 805 | got 806 | gotten 807 | great 808 | greater 809 | greatest 810 | greetings 811 | group 812 | grouped 813 | grouping 814 | groups 815 | h 816 | had 817 | hadn't 818 | happens 819 | hardly 820 | has 821 | hasn't 822 | have 823 | haven't 824 | having 825 | he 826 | hed 827 | hello 828 | help 829 | hence 830 | her 831 | here 832 | hereafter 833 | hereby 834 | herein 835 | heres 836 | here's 837 | hereupon 838 | hers 839 | herself 840 | hes 841 | he's 842 | hi 843 | hid 844 | high 845 | higher 846 | highest 847 | him 848 | himself 849 | his 850 | hither 851 | home 852 | hopefully 853 | how 854 | howbeit 855 | however 856 | hundred 857 | i 858 | id 859 | i'd 860 | ie 861 | if 862 | ignored 863 | i'll 864 | im 865 | i'm 866 | immediate 867 | immediately 868 | importance 869 | important 870 | in 871 | inasmuch 872 | inc 873 | include 874 | indeed 875 | index 876 | indicate 877 | indicated 878 | indicates 879 | information 880 | inner 881 | insofar 882 | instead 883 | interest 884 | interested 885 | interesting 886 | interests 887 | into 888 | invention 889 | inward 890 | is 891 | isn't 892 | it 893 | itd 894 | it'd 895 | it'll 896 | its 897 | it's 898 | itself 899 | i've 900 | j 901 | just 902 | k 903 | keep 904 | keeps 905 | kept 906 | keys 907 | kg 908 | kind 909 | km 910 | knew 911 | know 912 | known 913 | knows 914 | l 915 | large 916 | largely 917 | last 918 | lately 919 | later 920 | latest 921 | latter 922 | latterly 923 | least 924 | less 925 | lest 926 | let 927 | lets 928 | let's 929 | like 930 | liked 931 | likely 932 | line 933 | little 934 | 'll 935 | long 936 | longer 937 | longest 938 | look 939 | looking 940 | looks 941 | ltd 942 | m 943 | 'm 944 | made 945 | mainly 946 | make 947 | makes 948 | 
making 949 | man 950 | many 951 | may 952 | maybe 953 | me 954 | mean 955 | means 956 | meantime 957 | meanwhile 958 | member 959 | members 960 | men 961 | merely 962 | mg 963 | might 964 | million 965 | miss 966 | ml 967 | more 968 | moreover 969 | most 970 | mostly 971 | mr 972 | mrs 973 | much 974 | mug 975 | must 976 | my 977 | myself 978 | n 979 | na 980 | name 981 | namely 982 | nay 983 | nd 984 | near 985 | nearly 986 | necessarily 987 | necessary 988 | need 989 | needed 990 | needing 991 | needs 992 | neither 993 | never 994 | nevertheless 995 | new 996 | newer 997 | newest 998 | next 999 | nine 1000 | ninety 1001 | no 1002 | nobody 1003 | non 1004 | none 1005 | nonetheless 1006 | noone 1007 | nor 1008 | normally 1009 | nos 1010 | not 1011 | noted 1012 | nothing 1013 | novel 1014 | now 1015 | nowhere 1016 | n't 1017 | number 1018 | numbers 1019 | o 1020 | obtain 1021 | obtained 1022 | obviously 1023 | of 1024 | off 1025 | often 1026 | oh 1027 | ok 1028 | okay 1029 | old 1030 | older 1031 | oldest 1032 | omitted 1033 | on 1034 | once 1035 | one 1036 | ones 1037 | only 1038 | onto 1039 | open 1040 | opened 1041 | opening 1042 | opens 1043 | or 1044 | ord 1045 | order 1046 | ordered 1047 | ordering 1048 | orders 1049 | other 1050 | others 1051 | otherwise 1052 | ought 1053 | our 1054 | ours 1055 | ourselves 1056 | out 1057 | outside 1058 | over 1059 | overall 1060 | owing 1061 | own 1062 | p 1063 | page 1064 | pages 1065 | part 1066 | parted 1067 | particular 1068 | particularly 1069 | parting 1070 | parts 1071 | past 1072 | per 1073 | perhaps 1074 | place 1075 | placed 1076 | places 1077 | please 1078 | plus 1079 | point 1080 | pointed 1081 | pointing 1082 | points 1083 | poorly 1084 | possible 1085 | possibly 1086 | potentially 1087 | pp 1088 | predominantly 1089 | present 1090 | presented 1091 | presenting 1092 | presents 1093 | presumably 1094 | previously 1095 | primarily 1096 | probably 1097 | problem 1098 | problems 1099 | promptly 1100 | proud 1101 | 
provides 1102 | put 1103 | puts 1104 | q 1105 | que 1106 | quickly 1107 | quite 1108 | qv 1109 | r 1110 | ran 1111 | rather 1112 | rd 1113 | re 1114 | 're 1115 | readily 1116 | really 1117 | reasonably 1118 | recent 1119 | recently 1120 | ref 1121 | refs 1122 | regarding 1123 | regardless 1124 | regards 1125 | related 1126 | relatively 1127 | research 1128 | respectively 1129 | resulted 1130 | resulting 1131 | results 1132 | right 1133 | room 1134 | rooms 1135 | run 1136 | s 1137 | 's 1138 | said 1139 | same 1140 | saw 1141 | say 1142 | saying 1143 | says 1144 | sec 1145 | second 1146 | secondly 1147 | seconds 1148 | section 1149 | see 1150 | seeing 1151 | seem 1152 | seemed 1153 | seeming 1154 | seems 1155 | seen 1156 | sees 1157 | self 1158 | selves 1159 | sensible 1160 | sent 1161 | serious 1162 | seriously 1163 | seven 1164 | several 1165 | shall 1166 | she 1167 | shed 1168 | she'll 1169 | shes 1170 | should 1171 | shouldn't 1172 | show 1173 | showed 1174 | showing 1175 | shown 1176 | showns 1177 | shows 1178 | side 1179 | sides 1180 | significant 1181 | significantly 1182 | similar 1183 | similarly 1184 | since 1185 | six 1186 | slightly 1187 | small 1188 | smaller 1189 | smallest 1190 | so 1191 | some 1192 | somebody 1193 | somehow 1194 | someone 1195 | somethan 1196 | something 1197 | sometime 1198 | sometimes 1199 | somewhat 1200 | somewhere 1201 | soon 1202 | sorry 1203 | specifically 1204 | specified 1205 | specify 1206 | specifying 1207 | state 1208 | states 1209 | still 1210 | stop 1211 | strongly 1212 | sub 1213 | substantially 1214 | successfully 1215 | such 1216 | sufficiently 1217 | suggest 1218 | sup 1219 | sure 1220 | t 1221 | 't 1222 | take 1223 | taken 1224 | taking 1225 | tell 1226 | tends 1227 | th 1228 | than 1229 | thank 1230 | thanks 1231 | thanx 1232 | that 1233 | that'll 1234 | thats 1235 | that's 1236 | that've 1237 | the 1238 | their 1239 | theirs 1240 | them 1241 | themselves 1242 | then 1243 | thence 1244 | there 1245 | thereafter 
1246 | thereby 1247 | thered 1248 | therefore 1249 | therein 1250 | there'll 1251 | thereof 1252 | therere 1253 | theres 1254 | there's 1255 | thereto 1256 | thereupon 1257 | there've 1258 | these 1259 | they 1260 | theyd 1261 | they'd 1262 | they'll 1263 | theyre 1264 | they're 1265 | they've 1266 | thing 1267 | things 1268 | think 1269 | thinks 1270 | third 1271 | this 1272 | thorough 1273 | thoroughly 1274 | those 1275 | thou 1276 | though 1277 | thoughh 1278 | thought 1279 | thoughts 1280 | thousand 1281 | three 1282 | throug 1283 | through 1284 | throughout 1285 | thru 1286 | thus 1287 | til 1288 | tip 1289 | to 1290 | today 1291 | together 1292 | too 1293 | took 1294 | toward 1295 | towards 1296 | tried 1297 | tries 1298 | truly 1299 | try 1300 | trying 1301 | ts 1302 | t's 1303 | turn 1304 | turned 1305 | turning 1306 | turns 1307 | twice 1308 | two 1309 | u 1310 | un 1311 | under 1312 | unfortunately 1313 | unless 1314 | unlike 1315 | unlikely 1316 | until 1317 | unto 1318 | up 1319 | upon 1320 | ups 1321 | us 1322 | use 1323 | used 1324 | useful 1325 | usefully 1326 | usefulness 1327 | uses 1328 | using 1329 | usually 1330 | uucp 1331 | v 1332 | value 1333 | various 1334 | 've 1335 | very 1336 | via 1337 | viz 1338 | vol 1339 | vols 1340 | vs 1341 | w 1342 | want 1343 | wanted 1344 | wanting 1345 | wants 1346 | was 1347 | wasn't 1348 | way 1349 | ways 1350 | we 1351 | wed 1352 | we'd 1353 | welcome 1354 | well 1355 | we'll 1356 | wells 1357 | went 1358 | were 1359 | we're 1360 | weren't 1361 | we've 1362 | what 1363 | whatever 1364 | what'll 1365 | whats 1366 | what's 1367 | when 1368 | whence 1369 | whenever 1370 | where 1371 | whereafter 1372 | whereas 1373 | whereby 1374 | wherein 1375 | wheres 1376 | where's 1377 | whereupon 1378 | wherever 1379 | whether 1380 | which 1381 | while 1382 | whim 1383 | whither 1384 | who 1385 | whod 1386 | whoever 1387 | whole 1388 | who'll 1389 | whom 1390 | whomever 1391 | whos 1392 | who's 1393 | whose 1394 | why 1395 
| widely 1396 | will 1397 | willing 1398 | wish 1399 | with 1400 | within 1401 | without 1402 | wonder 1403 | won't 1404 | words 1405 | work 1406 | worked 1407 | working 1408 | works 1409 | world 1410 | would 1411 | wouldn't 1412 | www 1413 | x 1414 | y 1415 | year 1416 | years 1417 | yes 1418 | yet 1419 | you 1420 | youd 1421 | you'd 1422 | you'll 1423 | young 1424 | younger 1425 | youngest 1426 | your 1427 | youre 1428 | you're 1429 | yours 1430 | yourself 1431 | yourselves 1432 | you've 1433 | z 1434 | zero 1435 | --------------------------------------------------------------------------------