├── PKUSUMSUM.jar
├── README.md
├── README_V1.2.docx
├── code
    ├── ClusterCMRW.java
    ├── Coverage.java
    ├── ILP.java
    ├── Lead.java
    ├── LexPageRank.java
    ├── MEAD.java
    ├── Main.java
    ├── ManifoldRank.java
    ├── Run.java
    ├── Stemmer.java
    ├── Submodular.java
    ├── TextRank.java
    ├── Tokenizer.java
    ├── doc.java
    └── stopword_Eng
└── lib
    ├── ansj_seg-5.0.2-all-in-one.jar
    ├── lpsolve55j.jar
    ├── slf4j-nop-1.7.21.jar
    ├── stanford-ner.jar
    └── stopword_Eng


/PKUSUMSUM.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/PKUSUMSUM.jar


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# README

## Introduction
PKUSUMSUM (PKU’s SUMmary of SUMmarization methods) is an integrated toolkit for automatic document summarization. It supports single-document, multi-document and topic-focused multi-document summarization, and it implements a variety of summarization methods.

Users can easily use the toolkit to produce summaries for documents or document sets, and implement their own summarization methods based on the platform.

Main features of PKUSUMSUM include:

* It integrates a variety of stable summarization methods that perform well.
* It supports three typical summarization tasks: single-document, multi-document and topic-focused multi-document summarization.
* It supports Western languages (e.g. English) and Chinese.
* It integrates an English tokenizer, a stemmer and Chinese word segmentation tools.
* It is written in Java, so it can be deployed on different operating systems such as Windows, Linux and macOS.
* It is open source and modular, so users can conveniently add new methods and modules to the toolkit.

The PKUSUMSUM package includes the JAR file, the source code in “/code” and the referenced libraries in “/lib”.

The following table shows which summarization methods are available for which summarization tasks:

| Method | Single-document summarization | Multi-document summarization | Topic-based multi-document summarization |
|:-----------:|:--------:|:--------:|:-------:|
| Coverage | - | Yes | Yes |
| Lead | Yes | Yes | Yes |
| Centroid [1] | Yes | Yes | Yes |
| TextRank [2] | Yes | Yes | - |
| LexPageRank [3] | Yes | Yes | - |
| ILP [4] | Yes | Yes | - |
| Submodular1 [5] | Yes | Yes | - |
| Submodular2 [6] | Yes | Yes | - |
| ClusterCMRW [7] | - | Yes | - |
| ManifoldRank [8] | - | - | Yes |

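The table can also be read as a lookup from task to allowed methods. The sketch below is not part of the toolkit: the class `MethodSupport` and its method `isSupported` are hypothetical names, and the numeric ids are the `-T` and `-m` values listed under Usage.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of PKUSUMSUM: checks a (-T, -m) pair
// against the method/task table in this README.
class MethodSupport {
    // Task ids as used by the -T parameter.
    static final int SINGLE = 1, MULTI = 2, TOPIC = 3;

    // Method ids (-m) accepted for each task, copied from the README.
    private static final Map<Integer, Set<Integer>> SUPPORTED = new HashMap<>();
    static {
        SUPPORTED.put(SINGLE, new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6)));
        SUPPORTED.put(MULTI, new HashSet<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7)));
        SUPPORTED.put(TOPIC, new HashSet<>(Arrays.asList(0, 1, 2, 8)));
    }

    // Returns true if the method id is listed for the given task id.
    static boolean isSupported(int task, int method) {
        Set<Integer> methods = SUPPORTED.get(task);
        return methods != null && methods.contains(method);
    }

    public static void main(String[] args) {
        System.out.println(isSupported(SINGLE, 0)); // Coverage, single-document: false
        System.out.println(isSupported(MULTI, 7));  // ClusterCMRW, multi-document: true
        System.out.println(isSupported(TOPIC, 8));  // ManifoldRank, topic-based: true
    }
}
```

A check like this could run before starting a long job, so that an unsupported combination is reported immediately.
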
## Notice
* We use **lp_solve for Java** to solve the ILP model. If you choose the ILP method, please configure lp_solve first:
    * Copy the lp_solve dynamic libraries from the archives lp_solve_5.5_dev.(zip or tar.gz) and lp_solve_5.5_exe.(zip or tar.gz) to a standard library directory on the target platform. On Windows, the typical directory is \WINDOWS or \WINDOWS\SYSTEM32. On Linux, the typical directory is /usr/local/lib.
    * Unzip the Java wrapper distribution file to a new directory. On Windows, copy the wrapper stub library **lpsolve55j.dll** to the directory that already contains **lpsolve55.dll**. On Linux, copy the wrapper stub library **liblpsolve55j.so** to the directory that already contains **liblpsolve55.so**, then run **ldconfig** to include the library in the shared library cache.
    * More details are available on the lp_solve website (http://lpsolve.sourceforge.net/5.5/).
* JRE 1.8 or later is required.
* The input documents must be encoded in UTF-8.

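To check the lp_solve setup before running the ILP method, one option is to try loading the wrapper stub library from Java. This is a minimal sketch, not part of the toolkit: the class `LpSolveCheck` and the method `canLoad` are hypothetical names, and the library name `lpsolve55j` is taken from the file names in the notice above.

```java
// Hypothetical diagnostic, not part of PKUSUMSUM.
class LpSolveCheck {
    // Tries to load a native library by name; returns false instead of throwing.
    static boolean canLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            System.err.println("Cannot load " + libraryName + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // "lpsolve55j" resolves to lpsolve55j.dll on Windows and liblpsolve55j.so on Linux.
        System.out.println("lpsolve55j loadable: " + canLoad("lpsolve55j"));
        // The JVM searches these directories for native libraries.
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}
```

If the check fails although the files were copied, the directory may be missing from `java.library.path`; it can be passed explicitly with the JVM option `-Djava.library.path=<directory>`.
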
## Usage
Open a terminal in the PKUSUMSUM directory and type:

> `java -jar PKUSUMSUM.jar <parameters>`

### Parameters:
Several parameters must be set when using the toolkit. Parameters in "[]" are optional and have default values.

| Required parameters | Description |
|:---------------------|:--------|
| -T | Specify which task to run.<br>1: single-document summarization;<br>2: multi-document summarization;<br>3: topic-based multi-document summarization. |
| -topic | Specify the path of the topic file. Only used for the topic-based multi-document summarization task. |
| -input | Specify the path of the input document or document set.<br>**For single-document summarization**, it is the path of the input document (including the file name) to be summarized.<br>**For multi-document summarization or topic-based multi-document summarization**, it is the directory containing the input documents to be summarized. |
| -output | Specify the path of the output file that will contain the final summary. |
| -L | Specify the language of the input document(s): 1 - Chinese, 2 - English, 3 - other Western languages. |
| -n | Specify the expected number of words in the final summary. |
| -m | Specify which method to use.<br>**For single-document summarization**: 1 - Lead, 2 - Centroid, 3 - ILP, 4 - LexPageRank, 5 - TextRank, 6 - Submodular;<br>**For multi-document summarization**: 0 - Coverage, 1 - Lead, 2 - Centroid, 3 - ILP, 4 - LexPageRank, 5 - TextRank, 6 - Submodular, 7 - ClusterCMRW;<br>**For topic-based multi-document summarization**: 0 - Coverage, 1 - Lead, 2 - Centroid, 8 - ManifoldRank. |
| -stop | Specify whether to remove stop words.<br>To remove stop words, give the path of a stop word file. An English stop word list is provided in “/lib/stopword_Eng”; enter “y” to use it.<br>To keep stop words, enter “n”. |

| Optional parameters | Description |
|:---------------------|:--------|
| [-s] | Specify whether to stem words (English only): 1 - stem, 2 - no stem. The default value is 1. |
| [-R] | Specify which **redundancy removal method** is used for summary sentence selection. The ILP and Submodular methods need no extra redundancy removal. The default value is 3 for ManifoldRank and 1 for the other methods that need redundancy removal.<br>**1 - MMR-based method**;<br>**2 - Threshold-based method**: an unselected sentence is removed if its maximum similarity to the already selected sentences is larger than a predefined threshold;<br>**3 - Penalty imposing method**: after a summary sentence is selected, the score of each unselected sentence is penalized by subtracting the product of a predefined penalty ratio and the similarity between the unselected sentence and the summary sentence. |
| [-p] | The internal parameter of the redundancy removal methods. The default value is 0.7.<br>**For the MMR and penalty imposing methods**, it specifies the penalty ratio.<br>**For the threshold-based method**, it specifies the threshold value. |
| [-beta] | A scaling factor for sentence length that is applied when sentences are chosen; its range is [0, 1]. In several summarization methods, long sentences tend to get higher scores than short ones. Because the summary length is limited, this factor is used to normalize the score of each sentence. See README_V1.2.docx for details. The default value is 0.1. |
| **For LexPageRank** | |
| [-link] | The similarity threshold for linking two sentences. If the similarity of two sentences is larger than the threshold, an edge is added between them. Its range is [0, 1] and the default value is 0.1. |
| **For ClusterCMRW** | |
| [-Alpha] | The ratio that controls the expected number of clusters in the document set. Its range is [0, 1] and the default value is 0.1. |
| [-Lamda] | The combination weight that controls the relative contributions of the source cluster and the destination cluster. Its range is [0, 1] and the default value is 0.8. |
| **For Submodular** | |
| [-sub] | The type of submodular method; the default value is 2.<br>1 - the method from Li's paper (Li et al., 2012);<br>2 - a modified version of the method from Lin's paper (Lin and Bilmes, 2010). |
| [-A] | The threshold coefficient. Its range is [0, 1] and the default value is 0.5. |
| [-lam] | The trade-off coefficient. Its range is [0, 1]; the default value is 0.15 for multi-document summarization and 0.5 for single-document summarization. |

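The penalty imposing method (`-R 3`) described in the table above can be sketched as a greedy loop. This is an illustration under stated assumptions, not the toolkit's own code: the class `PenaltySelection`, the method `select` and all parameter names are hypothetical, sentence scores and pairwise similarities are assumed to be given, and a sentence that does not fit into the remaining word budget is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of penalty imposing redundancy removal.
class PenaltySelection {
    /**
     * scores[i]  initial score of sentence i
     * sim[i][j]  similarity between sentences i and j
     * lengths[i] number of words in sentence i
     * maxWords   summary length budget (compare -n)
     * penalty    penalty ratio (compare -p)
     * Returns the indices of the selected sentences in selection order.
     */
    static List<Integer> select(double[] scores, double[][] sim, int[] lengths,
                                int maxWords, double penalty) {
        double[] s = scores.clone();          // working copy, penalized in place
        boolean[] used = new boolean[s.length];
        List<Integer> summary = new ArrayList<>();
        int words = 0;
        while (true) {
            // Pick the best-scoring unused sentence that still fits the budget.
            int best = -1;
            for (int i = 0; i < s.length; i++) {
                if (!used[i] && words + lengths[i] <= maxWords
                        && (best < 0 || s[i] > s[best])) {
                    best = i;
                }
            }
            if (best < 0) break;              // nothing fits any more
            used[best] = true;
            summary.add(best);
            words += lengths[best];
            // Penalize every remaining sentence by penalty * similarity to the pick.
            for (int i = 0; i < s.length; i++) {
                if (!used[i]) s[i] -= penalty * sim[i][best];
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.85, 0.5};
        double[][] sim = {{1.0, 0.9, 0.1}, {0.9, 1.0, 0.1}, {0.1, 0.1, 1.0}};
        int[] lengths = {10, 10, 10};
        // Sentence 1 is nearly a duplicate of sentence 0, so sentence 2 is preferred.
        System.out.println(select(scores, sim, lengths, 20, 0.7)); // [0, 2]
    }
}
```

With a penalty ratio of 0 the same input yields sentences 0 and 1, which shows that the ratio alone decides how strongly near-duplicates are pushed down.
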
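### Example:
> `java -jar PKUSUMSUM.jar -T 1 -input ./article.txt -output ./summary.txt -L 1 -n 100 -m 2 -stop n`

This command summarizes the single Chinese document ./article.txt with the Centroid method, without stop word removal, and writes a summary of about 100 words to ./summary.txt.
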
## License
PKUSUMSUM is released under the GNU GPL license.

## Contact us
You are welcome to contact us with any questions or suggestions about PKUSUMSUM.

* Contact person: Jianmin Zhang
* Contact email: zhangjianmin2015@pku.edu.cn

## Reference
* [1] Radev, Dragomir R., Hongyan Jing, Małgorzata Styś, and Daniel Tam. 2004. Centroid-based summarization of multiple documents. Information Processing & Management, 40(6), 919-938.
* [2] Mihalcea, Rada, and Paul Tarau. 2004. TextRank: Bringing order into texts. Association for Computational Linguistics.
* [3] Erkan, Günes, and Dragomir R. Radev. 2004. LexPageRank: Prestige in Multi-Document Text Summarization. EMNLP. Vol. 4.
* [4] Gillick, Dan, and Benoit Favre. 2009. A scalable global model for summarization. Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, Association for Computational Linguistics.
* [5] Li, Jingxuan, Lei Li, and Tao Li. 2012. Multi-document summarization via submodularity. Applied Intelligence, 37(3), 420-430.
* [6] Lin, Hui, and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics.
* [7] Wan, Xiaojun, and Jianwu Yang. 2008. Multi-document summarization using cluster-based link analysis. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, ACM.
* [8] Wan, Xiaojun, Jianwu Yang, and Jianguo Xiao. 2007. Manifold-Ranking Based Topic-Focused Multi-Document Summarization. IJCAI. Vol. 7.



--------------------------------------------------------------------------------
/README_V1.2.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/README_V1.2.docx


--------------------------------------------------------------------------------
/code/ClusterCMRW.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/ClusterCMRW.java


--------------------------------------------------------------------------------
/code/Coverage.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Coverage.java


--------------------------------------------------------------------------------
/code/ILP.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/ILP.java


--------------------------------------------------------------------------------
/code/Lead.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Lead.java


--------------------------------------------------------------------------------
/code/LexPageRank.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/LexPageRank.java


--------------------------------------------------------------------------------
/code/MEAD.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/MEAD.java


--------------------------------------------------------------------------------
/code/Main.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Main.java


--------------------------------------------------------------------------------
/code/ManifoldRank.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/ManifoldRank.java


--------------------------------------------------------------------------------
/code/Run.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Run.java


--------------------------------------------------------------------------------
/code/Stemmer.java:
--------------------------------------------------------------------------------
package code;

/**
 * Porter stemmer for English. Add the characters of a lowercase word with
 * add(), call stem(), then read the result with toString().
 */
public class Stemmer {
   private char[] b;
   private int i,     /* offset into b */
               i_end, /* offset to end of stemmed word */
               j, k;
   private static final int INC = 50;
                     /* unit of size whereby b is increased */
   public Stemmer()
   {  b = new char[INC];
      i = 0;
      i_end = 0;
   }

   /**
    * Add a character to the word being stemmed.  When you are finished
    * adding characters, you can call stem() to stem the word.
    */
   public void add(char ch)
   {  if (i == b.length)
      {  char[] new_b = new char[i+INC];
         System.arraycopy(b, 0, new_b, 0, i);
         b = new_b;
      }
      b[i++] = ch;
   }

   /**
    * Adds wLen characters to the word being stemmed contained in a portion
    * of a char[] array. This is like repeated calls of add(char ch), but
    * faster.
    */
   public void add(char[] w, int wLen)
   {  if (i+wLen >= b.length)
      {  char[] new_b = new char[i+wLen+INC];
         System.arraycopy(b, 0, new_b, 0, i);
         b = new_b;
      }
      for (int c = 0; c < wLen; c++) b[i++] = w[c];
   }

   /**
    * After a word has been stemmed, it can be retrieved by toString(),
    * or a reference to the internal buffer can be retrieved by getResultBuffer
    * and getResultLength (which is generally more efficient.)
    */
   @Override
   public String toString() { return new String(b,0,i_end); }

   /**
    * Returns the length of the word resulting from the stemming process.
    */
   public int getResultLength() { return i_end; }

   /**
    * Returns a reference to a character buffer containing the results of
    * the stemming process.  You also need to consult getResultLength()
    * to determine the length of the result.
    */
   public char[] getResultBuffer() { return b; }

   /* cons(i) is true <=> b[i] is a consonant. 'y' counts as a consonant
      at the start of the word or after a vowel. */

   private final boolean cons(int i)
   {  switch (b[i])
      {  case 'a': case 'e': case 'i': case 'o': case 'u': return false;
         case 'y': return (i==0) ? true : !cons(i-1);
         default: return true;
      }
   }

   /* m() measures the number of consonant sequences between 0 and j. if c is
      a consonant sequence and v a vowel sequence, and <..> indicates arbitrary
      presence,

         <c><v>       gives 0
         <c>vc<v>     gives 1
         <c>vcvc<v>   gives 2
         <c>vcvcvc<v> gives 3
         ....
   */

   private final int m()
   {  int n = 0;
      int i = 0;
      /* skip the optional leading consonant sequence */
      while(true)
      {  if (i > j) return n;
         if (! cons(i)) break;
         i++;
      }
      i++;
      while(true)
      {  /* skip a vowel sequence */
         while(true)
         {  if (i > j) return n;
            if (cons(i)) break;
            i++;
         }
         i++;
         n++;
         /* skip the following consonant sequence */
         while(true)
         {  if (i > j) return n;
            if (! cons(i)) break;
            i++;
         }
         i++;
      }
   }

   /* vowelinstem() is true <=> 0,...j contains a vowel */

   private final boolean vowelinstem()
   {  int i;
      for (i = 0; i <= j; i++) if (! cons(i)) return true;
      return false;
   }

   /* doublec(j) is true <=> j,(j-1) contain a double consonant. */

   private final boolean doublec(int j)
   {  if (j < 1) return false;
      if (b[j] != b[j-1]) return false;
      return cons(j);
   }

   /* cvc(i) is true <=> i-2,i-1,i has the form consonant - vowel - consonant
      and also if the second c is not w,x or y. this is used when trying to
      restore an e at the end of a short word. e.g.

         cav(e), lov(e), hop(e), crim(e), but
         snow, box, tray.

   */

   private final boolean cvc(int i)
   {  if (i < 2 || !cons(i) || cons(i-1) || !cons(i-2)) return false;
      {  int ch = b[i];
         if (ch == 'w' || ch == 'x' || ch == 'y') return false;
      }
      return true;
   }

   /* ends(s) is true <=> 0,...k ends with the string s. on success, j is set
      to the offset of the last character before the suffix. */

   private final boolean ends(String s)
   {  int l = s.length();
      int o = k-l+1;
      if (o < 0) return false;
      for (int i = 0; i < l; i++) if (b[o+i] != s.charAt(i)) return false;
      j = k-l;
      return true;
   }

   /* setto(s) sets (j+1),...k to the characters in the string s, readjusting
      k. */

   private final void setto(String s)
   {  int l = s.length();
      int o = j+1;
      for (int i = 0; i < l; i++) b[o+i] = s.charAt(i);
      k = j+l;
   }

   /* r(s) replaces the matched suffix with s if the stem has m() > 0.
      it is used further down. */

   private final void r(String s) { if (m() > 0) setto(s); }

   /* step1() gets rid of plurals and -ed or -ing. e.g.

          caresses  ->  caress
          ponies    ->  poni
          ties      ->  ti
          caress    ->  caress
          cats      ->  cat

          feed      ->  feed
          agreed    ->  agree
          disabled  ->  disable

          matting   ->  mat
          mating    ->  mate
          meeting   ->  meet
          milling   ->  mill
          messing   ->  mess

          meetings  ->  meet

   */

   private final void step1()
   {  if (b[k] == 's')
      {  if (ends("sses")) k -= 2; else
         if (ends("ies")) setto("i"); else
         if (b[k-1] != 's') k--;
      }
      if (ends("eed")) { if (m() > 0) k--; } else
      if ((ends("ed") || ends("ing")) && vowelinstem())
      {  k = j;
         if (ends("at")) setto("ate"); else
         if (ends("bl")) setto("ble"); else
         if (ends("iz")) setto("ize"); else
         if (doublec(k))
         {  k--;
            {  int ch = b[k];
               if (ch == 'l' || ch == 's' || ch == 'z') k++;
            }
         }
         else if (m() == 1 && cvc(k)) setto("e");
      }
   }

   /* step2() turns terminal y to i when there is another vowel in the stem. */

   private final void step2() { if (ends("y") && vowelinstem()) b[k] = 'i'; }

   /* step3() maps double suffixes to single ones. so -ization ( = -ize plus
      -ation) maps to -ize etc. note that the string before the suffix must give
      m() > 0. */

   private final void step3()
   {  if (k == 0) return; /* For Bug 1 */
      switch (b[k-1])
      {
          case 'a': if (ends("ational")) { r("ate"); break; }
                    if (ends("tional")) { r("tion"); break; }
                    break;
          case 'c': if (ends("enci")) { r("ence"); break; }
                    if (ends("anci")) { r("ance"); break; }
                    break;
          case 'e': if (ends("izer")) { r("ize"); break; }
                    break;
          case 'l': if (ends("bli")) { r("ble"); break; }
                    if (ends("alli")) { r("al"); break; }
                    if (ends("entli")) { r("ent"); break; }
                    if (ends("eli")) { r("e"); break; }
                    if (ends("ousli")) { r("ous"); break; }
                    break;
          case 'o': if (ends("ization")) { r("ize"); break; }
                    if (ends("ation")) { r("ate"); break; }
                    if (ends("ator")) { r("ate"); break; }
                    break;
          case 's': if (ends("alism")) { r("al"); break; }
                    if (ends("iveness")) { r("ive"); break; }
                    if (ends("fulness")) { r("ful"); break; }
                    if (ends("ousness")) { r("ous"); break; }
                    break;
          case 't': if (ends("aliti")) { r("al"); break; }
                    if (ends("iviti")) { r("ive"); break; }
                    if (ends("biliti")) { r("ble"); break; }
                    break;
          case 'g': if (ends("logi")) { r("log"); break; }
                    break;
      }
   }

   /* step4() deals with -ic-, -ful, -ness etc. similar strategy to step3. */

   private final void step4()
   {  switch (b[k])
      {
          case 'e': if (ends("icate")) { r("ic"); break; }
                    if (ends("ative")) { r(""); break; }
                    if (ends("alize")) { r("al"); break; }
                    break;
          case 'i': if (ends("iciti")) { r("ic"); break; }
                    break;
          case 'l': if (ends("ical")) { r("ic"); break; }
                    if (ends("ful")) { r(""); break; }
                    break;
          case 's': if (ends("ness")) { r(""); break; }
                    break;
      }
   }

   /* step5() takes off -ant, -ence etc., in context <c>vcvc<v>. */

   private final void step5()
   {   if (k == 0) return; /* for Bug 1 */
       switch (b[k-1])
       {  case 'a': if (ends("al")) break; return;
          case 'c': if (ends("ance")) break;
                    if (ends("ence")) break; return;
          case 'e': if (ends("er")) break; return;
          case 'i': if (ends("ic")) break; return;
          case 'l': if (ends("able")) break;
                    if (ends("ible")) break; return;
          case 'n': if (ends("ant")) break;
                    if (ends("ement")) break;
                    if (ends("ment")) break;
                    /* element etc. not stripped before the m */
                    if (ends("ent")) break; return;
          case 'o': if (ends("ion") && j >= 0 && (b[j] == 's' || b[j] == 't')) break;
                                    /* j >= 0 fixes Bug 2 */
                    if (ends("ou")) break; return;
                    /* takes care of -ous */
          case 's': if (ends("ism")) break; return;
          case 't': if (ends("ate")) break;
                    if (ends("iti")) break; return;
          case 'u': if (ends("ous")) break; return;
          case 'v': if (ends("ive")) break; return;
          case 'z': if (ends("ize")) break; return;
          default: return;
       }
       if (m() > 1) k = j;
   }

   /* step6() removes a final -e if m() > 1 (or if m() == 1 and the stem does
      not end in cvc), and changes -ll to -l if m() > 1. */

   private final void step6()
   {  j = k;
      if (b[k] == 'e')
      {  int a = m();
         if (a > 1 || a == 1 && !cvc(k-1)) k--;
      }
      if (b[k] == 'l' && doublec(k) && m() > 1) k--;
   }

   /**
    * Stem the word placed into the Stemmer buffer through calls to add().
    * Words of one or two characters are left unchanged. The method returns
    * nothing; retrieve the result with getResultLength()/getResultBuffer()
    * or toString().
    */
   public void stem()
   {  k = i - 1;
      if (k > 1) { step1(); step2(); step3(); step4(); step5(); step6(); }
      i_end = k+1; i = 0;
   }
}
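The measure computed by `m()` in Stemmer.java drives most of the suffix rules, so a standalone illustration may help. The sketch below is not part of the toolkit: `PorterMeasure`, `isConsonant` and `measure` are hypothetical names, and the stem is passed as a lowercase `String` instead of being read from the internal buffer `b[0..j]`.

```java
// Hypothetical standalone illustration of the measure used by Stemmer.m().
class PorterMeasure {
    // True if w[i] is a consonant in Porter's sense: not a, e, i, o, u;
    // 'y' is a consonant only at position 0 or directly after a vowel.
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u': return false;
            case 'y': return i == 0 || !isConsonant(w, i - 1);
            default: return true;
        }
    }

    // Counts the vowel-sequence/consonant-sequence pairs in w, i.e. the
    // number of positions where a consonant directly follows a vowel.
    static int measure(String w) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean cons = isConsonant(w, i);
            if (cons && prevVowel) m++;
            prevVowel = !cons;
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(measure("tree"));     // 0
        System.out.println(measure("trouble"));  // 1
        System.out.println(measure("troubles")); // 2
    }
}
```

Rules such as `r(s)` in Stemmer.java only fire when the stem in front of the suffix has a measure above a given bound, which keeps very short stems from being truncated.
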


--------------------------------------------------------------------------------
/code/Submodular.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/Submodular.java


--------------------------------------------------------------------------------
/code/TextRank.java:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/code/TextRank.java


--------------------------------------------------------------------------------
/code/Tokenizer.java:
--------------------------------------------------------------------------------
package code;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.ansj.domain.Result;
import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;

public class Tokenizer {
    // Text of each kept sentence
    public ArrayList<String> passage=new ArrayList<String>();
    // Token count of each kept sentence
    public ArrayList<Integer> senLen=new ArrayList<Integer>();
    // Words of the sentence that is currently being built
    public ArrayList<String> sentence=new ArrayList<String>();
    // Words of each kept sentence
    public ArrayList<ArrayList<String>> word=new ArrayList<ArrayList<String>>();
    // Stemmed words of each kept sentence, filled by stemmerWord()
    public ArrayList<ArrayList<String>> stemmerWord=new ArrayList<ArrayList<String>>();
    // Stop word -> line number in the stop word list
    HashMap<String,Integer> stopword=new HashMap<String,Integer>();

    /** Loads a stop word list from a file, one word per line. */
    public void readStopwords(String stopwordPath) throws IOException
    {
        File tmpfile = new File(stopwordPath);
        if (!tmpfile.exists()) {
            System.out.println("stopwords file does not exist!");
            // exit with a non-zero status so that callers can detect the failure
            System.exit(1);
        }
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader inBReader = new BufferedReader(new FileReader(stopwordPath))) {
            String tmpWord;
            int i = 0;
            while ((tmpWord = inBReader.readLine()) != null) {
                i++;
                stopword.put(tmpWord, i);
            }
        }
    }

    /** Loads the bundled English stop word list "stopword_Eng" from the classpath. */
    public void readStopwordsEng() throws IOException
    {
        InputStream stop = Tokenizer.class.getClassLoader().getResourceAsStream("stopword_Eng");
        if (stop == null) {
            throw new FileNotFoundException("stopword_Eng not found on the classpath");
        }
        try (BufferedReader inBReader = new BufferedReader(new InputStreamReader(stop))) {
            String tmpWord;
            int i = 0;
            while ((tmpWord = inBReader.readLine()) != null) {
                i++;
                stopword.put(tmpWord, i);
            }
        }
    }

    /** Returns true if the token starts with an ASCII letter. */
    public boolean ifWordsEng(String tmpWord)
    {
        if (tmpWord.isEmpty()) return false;
        char c = tmpWord.charAt(0);
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** Returns true if the lowercased token is in the stop word list. */
    public boolean ifStopwords(String tmpWord)
    {
        return stopword.containsKey(tmpWord.toLowerCase());
    }

    /** Stems every word in `word` and appends the results to `stemmerWord`. */
    public void stemmerWord() {
        for (ArrayList<String> sentenceWords : word) {
            ArrayList<String> stemmerW = new ArrayList<String>();
            for (String w : sentenceWords) {
                Stemmer stemmer = new Stemmer();
                stemmer.add(w.toCharArray(), w.length());
                stemmer.stem();
                stemmerW.add(stemmer.toString());
            }
            stemmerWord.add(stemmerW);
        }
    }

    /**
     * Tokenizes an English document and splits it into sentences.
     * stopwordPath is "y" (bundled English list), "n" (no stop word removal)
     * or the path of a stop word file.
     */
    public ArrayList<String> tokenizeEng(String inFile, String stopwordPath) throws IOException
    {
        // Read the input as UTF-8, the encoding the README requires
        PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(
                new InputStreamReader(new FileInputStream(inFile), "UTF-8"),
                new CoreLabelTokenFactory(), "");
        int len=0;   // tokens in the current sentence ("'s" is not counted)
        int wlen=0;  // tokens in the current sentence that start with a letter
        if (stopwordPath.equals("y"))
            readStopwordsEng();
        else if (!stopwordPath.equals("n"))
            readStopwords(stopwordPath);
        String token,tmpSen;
        tmpSen="";
        boolean ifend=false; // true once sentence-final punctuation has been seen
        while (ptbt.hasNext())
        {
            CoreLabel label = ptbt.next();
            token=label.toString();

            if (!ifend)
            {
                if (token.equals(".") || token.equals("?") || token.equals("!"))
                {
                    ifend=true;
                }
                // remove some invalid symbols
                if (token.equals("-LRB-") || token.equals("-RRB-") || token.equals("-LCB-") || token.equals("-RCB-") || token.equals("\""))
                    continue;
                if (token.equals("'") || token.equals("`") || token.equals("''") || token.equals("``") || token.equals("_") || token.equals("--") || token.equals("-")) {
                    continue;
                }
                // punctuation and clitics attach to the previous token without a space
                if (token.equals("'s") || token.equals(".") || token.equals("?") || token.equals("!") || token.equals(",") || token.equals("'re") || token.equals("'ve"))
                    tmpSen+=token;
                else
                    tmpSen+=" "+token;

                if (ifWordsEng(token))
                    wlen++;
                if (!token.equals("'s"))
                    len++;
                if (ifWordsEng(token) && !ifStopwords(token))
                    sentence.add(token.toLowerCase());
            }
            else
            {
                if (token.equals("'") || token.equals("`") || token.equals("''") || token.equals("``") || token.equals(" ")) {
                    continue;
                }
                if (token.equals("."))
                {
                    // further periods (e.g. "...") still belong to the current sentence
                    tmpSen+=token;
                    len++;
                }
                else
                {
                    // keep the finished sentence only if it has more than one token
                    // and at least half of its tokens are words
                    if (len>1 && wlen*2>=len) {
                        passage.add(tmpSen);
                        senLen.add(len);
                        word.add(sentence);
                    }
                    // the current token starts the next sentence
                    ifend=false;
                    tmpSen=token;
                    sentence=new ArrayList<String>();
                    wlen=0;
                    if (ifWordsEng(token))
                        wlen++;
                    len=1;
                    if (ifWordsEng(token) && !ifStopwords(token))
                        sentence.add(token.toLowerCase());
                }
            }
        }
        // flush the last sentence if it was properly terminated
        if (ifend && len>1 && wlen*2>=len)
        {
            passage.add(tmpSen);
            word.add(sentence);
            senLen.add(len);
        }
        stemmerWord();
        return passage;
    }

169 |     public ArrayList<String> tokenizeChn(String inFile, String stopwordPath) throws IOException
170 |     {
171 |         StringBuffer buffer=new StringBuffer();
172 |         String line; 
173 | 		if (!stopwordPath.equals("n") && !stopwordPath.equals("y"))
174 | 			readStopwords(stopwordPath);
175 |         BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inFile), "utf-8"));
176 |         line = reader.readLine(); 
177 |         while (line != null) {
178 |             buffer.append(line);
179 |             buffer.append("\n");
180 |             line = reader.readLine();
181 |         }
182 |         reader.close();
183 |         Pattern pattern = Pattern.compile(".*?[。？！]");
184 |         Matcher matcher = pattern.matcher(buffer);
185 |         Pattern p2=Pattern.compile("[\u4e00-\u9fa5]");
186 |         while (matcher.find()) {
187 |             String sen=matcher.group();
188 |             passage.add(sen);
189 |             senLen.add(sen.length());
190 |             Result parse = ToAnalysis.parse(sen);
191 |             ArrayList<String> tmpsen=new ArrayList<>();
192 |             for (Term x:parse){
193 |                 Matcher m2=p2.matcher(x.getName());
194 |                 if (m2.find()) {
195 |                     if (!ifStopwords(x.getName()))
196 |                         tmpsen.add(x.getName());
197 |                 }
198 |             }
199 |             word.add(tmpsen);
200 |         }
201 |         stemmerWord();
202 |         return passage;
203 |     }
204 | 
205 | }
206 | 
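
The Chinese path in `tokenizeChn` above finds sentences with the lazy pattern `.*?[。？！]`. A minimal standalone sketch of just that step follows. The class and method names are hypothetical, and the three terminators are written as Unicode escapes so the sketch stays ASCII-only. It also makes two side effects of the pattern visible: text after the last terminator is dropped, and because `.` does not match a line break by default, a match cannot span lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the toolkit: isolates the regex step of tokenizeChn.
class ChineseSentenceSplitSketch {
    // Lazy match up to the first full stop (U+3002), question mark (U+FF1F)
    // or exclamation mark (U+FF01), the same character class as in tokenizeChn.
    private static final Pattern SENTENCE = Pattern.compile(".*?[\u3002\uFF1F\uFF01]");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher matcher = SENTENCE.matcher(text);
        while (matcher.find()) {
            // Each match ends with its terminator; unterminated trailing text never matches.
            sentences.add(matcher.group());
        }
        return sentences;
    }
}
```

Calling `ChineseSentenceSplitSketch.split(text)` on the contents of a file returns the terminated sentences in order. If sentences that wrap across lines should be kept whole, one option is to compile the pattern with `Pattern.DOTALL`, which changes the behaviour relative to the code above.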


--------------------------------------------------------------------------------
/code/doc.java:
--------------------------------------------------------------------------------
  1 | package code;
  2 | 
  3 | import java.io.IOException;
  4 | import java.util.ArrayList;
  5 | import java.util.Arrays;
  6 | import java.util.HashMap;
  7 | import java.util.Map;
  8 | import java.util.TreeMap;
  9 | import java.util.TreeSet;
 10 | 
 11 | //some basic information about doc
 12 | public class Doc {
 13 |   public ArrayList<ArrayList<String>> sen = new ArrayList<ArrayList<String>>();// the words of each sentence after tokenization
 14 |   public ArrayList<ArrayList<String>> stemmerSen = new ArrayList<ArrayList<String>>();
 15 |   public int[] lRange;// index of the first sentence of the i-th document
 16 |   public int[] rRange;// index one past the last sentence of the i-th document
 17 |   public ArrayList<String> originalSen = new ArrayList<String>();//the original sentence
 18 |   public ArrayList<Integer> senLen = new ArrayList<>();//the length of original sentence
 19 |   public ArrayList<Integer> wordLen = new ArrayList<>();// the length of the vector
 20 |   public ArrayList<TreeSet<Integer>> sVector = new ArrayList<TreeSet<Integer>>();
 21 |   public ArrayList<ArrayList<Integer>> sTf = new ArrayList<>();//the tf-vector of the sentence; sVector stores index
 22 |   public ArrayList<Integer> dTf;//the tf-vector of the document; dVector stores index
 23 |   public TreeSet<Integer> dVector;
 24 |   public int totalLen;// running total of sentences read so far
 25 |   public int fnum, snum = 0, wnum;// fnum: number of documents; wnum: number of distinct words; snum: number of sentences
 26 |   public int[] tf;//tf of words
 27 |   public int[] df;//df of words
 28 |   public double[] idf;//idf of words
 29 |   public double[][] sim, normalSim;
 30 |   public int maxlen;//the maxlen of the summary
 31 |   // public String outfile;
 32 |   ArrayList<Integer> summaryId = new ArrayList<>();//index of the sentence picked
 33 |   HashMap<String, Integer> dic = new HashMap<String, Integer>();//map words into number
 34 |   HashMap<Integer, String> dd= new HashMap<>();
 35 | 
 36 | 
 37 |   public void readTopic(String Topicfile, String language, String stopwordPath) throws IOException {
 38 |       Tokenizer mytoken = new Tokenizer();
 39 |       ArrayList<String> tmp = new ArrayList<>();
 40 |       if (language.equals("1"))// 1 represents Chinese
 41 |           tmp = mytoken.tokenizeChn(Topicfile, stopwordPath);
 42 |       else if (language.equals("2"))// 2 represents English
 43 |           tmp = mytoken.tokenizeEng(Topicfile, stopwordPath);
 44 |       else if (language.equals("3"))// 3 represents other languages, tokenized like English
 45 |           tmp = mytoken.tokenizeEng(Topicfile, stopwordPath);
 46 | 
 47 |       int len = tmp.size();
 48 |       String topic = "";
 49 |       ArrayList<String> topicWord = new ArrayList<String>();
 50 |       ArrayList<String> stemmerTopicWord = new ArrayList<String>();
 51 |       int length = 0;
 52 |       for(int i = 0; i < len; ++i) {
 53 |       	topic = topic + tmp.get(i) + " ";
 54 |       	for(int j = 0; j < mytoken.word.get(i).size(); ++j) {
 55 |       		topicWord.add(mytoken.word.get(i).get(j));
 56 |       		stemmerTopicWord.add(mytoken.stemmerWord.get(i).get(j));
 57 |       	}
 58 |       	length += mytoken.senLen.get(i);
 59 |       }
 60 |       
 61 |       sen.add(topicWord);
 62 |       stemmerSen.add(stemmerTopicWord);
 63 |       senLen.add(length);
 64 |       originalSen.add(topic);
 65 |       snum++;
 66 |   }
 67 |   
 68 |   //read file from the documents
 69 |   public void readfile(String[] rfiles,String filepath,String language, String stopwordPath) throws IOException {
 70 |       int i = 0;
 71 |       lRange = new int[rfiles.length];
 72 |       rRange = new int[rfiles.length];
 73 |       fnum = 0;
 74 |       totalLen = snum;
 75 |       for (String infile : rfiles) {
 76 |           if (infile.equals(".DS_Store")) {
 77 |               System.out.println("Skipping!!");
 78 |               continue;
 79 |           }
 80 |     	  fnum++;
 81 |           String path;
 82 |           if (!filepath.equals(" ")) {
 83 |               path = filepath + System.getProperty("file.separator") + infile;
 84 |           }
 85 |           else{
 86 |               path = infile;
 87 |           }
 88 | 
 89 |           Tokenizer mytoken = new Tokenizer();
 90 |           ArrayList<String> tmp = new ArrayList<>();
 91 |           if (language.equals("1"))// 1 represents Chinese
 92 |               tmp = mytoken.tokenizeChn(path,stopwordPath);
 93 |           else
 94 |           if (language.equals("2"))// 2 represents English
 95 |               tmp = mytoken.tokenizeEng(path, stopwordPath);
 96 |           else
 97 |           if (language.equals("3"))// 3 represents other languages, tokenized like English
 98 |               tmp = mytoken.tokenizeEng(path, stopwordPath);
 99 |           int len = tmp.size();
100 |           
101 |           lRange[i] = totalLen;
102 |           totalLen += len;
103 |           rRange[i] = totalLen;
104 |           i++;
105 |           sen.addAll(mytoken.word);
106 |           stemmerSen.addAll(mytoken.stemmerWord);
107 |           senLen.addAll(mytoken.senLen);
108 |           originalSen.addAll(tmp);
109 |       }
110 |       snum = originalSen.size();
111 |   }
112 |   
113 | 
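114 |   // op: 1 uses tf-isf (inverse sentence frequency); 2 and 3 use tf-idf (inverse document frequency)
115 |   // stemOrNot: 1 uses the unstemmed words; any other value (e.g. 2) uses the stemmed words
116 |   // calculate the term frequencies and the idf (or isf) weights of the words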
117 |   void calcTfidf(int op, int stemOrNot) {
118 |       int i = 0,wlen = 0;
119 |       wnum = 0;
120 |       dic = new HashMap<String, Integer>();
121 |       dTf = new ArrayList<>();
122 |       dVector = new TreeSet<>();
123 |       int[] allTf = new int [100000];
124 |       Arrays.fill(allTf,0);
125 |       wordLen = new ArrayList<>();
126 |       int dnum = 0;
127 |       tf = new int[100000];
128 |       df = new int[100000];
129 |       boolean[] occur = new boolean[100000];
130 |       
131 |       ArrayList<ArrayList<String>> calTfIdfVec = new ArrayList<ArrayList<String>>();
132 |       if(stemOrNot == 1) {
133 |       	calTfIdfVec = sen;
134 |       }else {
135 |       	calTfIdfVec = stemmerSen;
136 |       }
137 |       
138 |       for (ArrayList<String> tmpSen : calTfIdfVec) {
139 |           wlen=0;
140 |           TreeSet<Integer> tmpSet = new TreeSet<Integer>();
141 |           Arrays.fill(tf,0);
142 |           if (op == 2 || op == 3) {
143 |               if (i == rRange[dnum]){
144 |                   dnum++;
145 |                   Arrays.fill(occur,false);
146 |               }
147 |           }else
148 |               Arrays.fill(occur,false);
149 |           for (String tmpWord : tmpSen) {
150 |               wlen++;
151 |               if (dic.get(tmpWord) != null) {
152 |                   int k = dic.get(tmpWord);
153 |                   tmpSet.add(k);
154 |                   tf[k]++;
155 |                   allTf[k]++;
156 |                   if (!occur[k]) {
157 |                       occur[k] = true;
158 |                       df[k]++;
159 |                   }
160 | 
161 |               } else {
162 |                   dic.put(tmpWord, wnum);
163 |                   dd.put(wnum,tmpWord);
164 |                   tf[wnum]++;
165 |                   allTf[wnum]++;
166 |                   df[wnum]++;
167 |                   tmpSet.add(wnum);
168 |                   occur[wnum] = true;
169 |                   wnum++;
170 |               }
171 |           }
172 |           wordLen.add(wlen);
173 |           ArrayList<Integer> tmpTf=new ArrayList<>();
174 |           for (int j:tmpSet)
175 |           {
176 |               tmpTf.add(tf[j]);
177 |           }
178 |           sTf.add(tmpTf);
179 |           sVector.add(tmpSet);
180 | 
181 | 
182 |           i++;
183 |       }
184 |       idf = new double[wnum];
185 |       if (op == 2 || op == 3){
186 |           for (i=0;i<wnum;i++)
187 |           {
188 |               idf[i]= Math.log((double)(1+fnum)/df[i]);
189 |           }
190 |       }else
191 |       {
192 |           for (i=0;i<wnum;i++)
193 |           {
194 |               idf[i]= Math.log((double)(1+snum)/df[i]);
195 |           }
196 |       }
197 |       for (i=0;i<wnum;i++){
198 |           if (allTf[i]!=0)
199 |           {
200 |               dVector.add(i);
201 |               dTf.add(allTf[i]);
202 |           }
203 |       }
204 |   }
205 |   // calculate the cosine similarity of two tf-idf weighted sentence vectors
206 |   double calcCos(TreeSet<Integer> a1 , ArrayList<Integer> a2, int lenA , TreeSet<Integer> b1, ArrayList<Integer>b2, int lenB)
207 |   {
208 |       int x1 = 0,x2 = 0;
209 |       double l1 = 0,l2 = 0;
210 |       int idA = 0, idB = 0;
211 |       double cos = 0;
212 |       TreeSet<Integer> a = new TreeSet<>();
213 |       TreeSet<Integer> b = new TreeSet<>();
214 |       a.addAll(a1);
215 |       b.addAll(b1);
216 |       while (a.size() > 0 && b.size() > 0)
217 |       {
218 | 
219 |           x1 = a.first();
220 |           x2 = b.first();
221 |           if ( x1 == x2 )
222 |           {
223 |               l1 += Math.pow((double)a2.get(idA)/(double)lenA*idf[x1],2);
224 |               l2 += Math.pow((double)b2.get(idB)/(double)lenB*idf[x2],2);
225 |               cos += Math.pow(idf[x1],2)*(double)a2.get(idA)/(double)lenA*(double)b2.get(idB)/(double)lenB;
226 |               a.pollFirst();
227 |               idA++;
228 |               b.pollFirst();
229 |               idB++;
230 |           }else
231 |           if ( x1 < x2 )
232 |           {
233 |               l1 += Math.pow((double)a2.get(idA)/(double)lenA*idf[x1],2);
234 |               a.pollFirst();
235 |               idA++;
236 |           }else
237 |           if ( x1 > x2)
238 |           {
239 |               l2 += Math.pow((double)b2.get(idB)/(double)lenB*idf[x2],2);
240 |               b.pollFirst();
241 |               idB++;
242 |           }
243 |       }
244 |       while (a.size() > 0)
245 |       {
246 |           x1 = a.first();
247 |           l1 += Math.pow((double)a2.get(idA)/(double)lenA*idf[x1],2);
248 |           a.pollFirst();
249 |           idA++;
250 |       }
251 |       while (b.size() > 0)
252 |       {
253 |           x2 = b.first();
254 |           l2 += Math.pow((double)b2.get(idB)/(double)lenB*idf[x2],2);
255 |           b.pollFirst();
256 |           idB++;
257 |       }
258 | 
259 |       if (l1==0 || l2==0) return 0;
260 |       return cos/Math.pow(l1*l2,0.5) ;
261 |   }
262 | 
263 |   // calculate the pairwise sentence similarity matrix (sim) and its row-normalized version (normalSim)
264 |   void calcSim()
265 |   {
266 |       sim = new double[snum][snum];
267 |       normalSim = new double[snum][snum];
268 |       for (int i = 0 ; i < snum; i++){
269 |       	double sumISim = 0.0;
270 |       	for (int j = 0; j < snum; j++)
271 |       	{
272 |       		if (i == j) {
273 |       			sim[i][j] = 1;
274 |       		}
275 |       		else if (i > j) {
276 |       			sim[i][j] = sim[j][i];
277 | 
278 |       		}
279 |       		else{
280 |       			sim[i][j] = calcCos(sVector.get(i), sTf.get(i), wordLen.get(i), sVector.get(j), sTf.get(j), wordLen.get(j));
281 |       		}
282 |       		sumISim += sim[i][j];
283 |       	}
284 |       	for(int j = 0; j < snum; ++j) {
285 |       		if(sumISim != 0.0) {
286 |       			normalSim[i][j] = sim[i][j] / sumISim;
287 |       		}
288 |       		else 
289 |       			normalSim[i][j] = 0.0;
290 |       	}
291 |       }
292 |   }
293 | 
294 |   //using MMR to remove redundancy
295 |   ArrayList<Integer> pickSentenceMMR(double[] score, double para, double beta)
296 |   {
297 |       summaryId =new ArrayList<>();
298 |       int len = 0;
299 |       if (para < 0) para = 0.7;
300 |       boolean[] chosen = new boolean[snum];
301 |       for (int i=0;i<snum;i++)
302 |           chosen[i]=false;
303 |       while ( len < maxlen)
304 |       {
305 |           double maxscore = 0;
306 |           int pick = -1;
307 |           for (int i=0;i<snum;i++)
308 |           {
309 |               double tmpscore = score[i];
310 | 
311 | 
312 |               for (int j : summaryId)
313 |                   if (score[i] - sim[i][j] * score[j] *para< tmpscore)
314 |                       tmpscore =  score[i] - sim[i][j] * score[j] *para;
315 | 
316 |               if (tmpscore/Math.pow(senLen.get(i),beta)>maxscore && !chosen[i] && len+ senLen.get(i)<maxlen && senLen.get(i)>=5)
317 |               {
318 | 
319 |                   maxscore = tmpscore/Math.pow(senLen.get(i),beta);
320 |                   pick = i;
321 | 
322 |               }
323 |           }
324 |           if (pick==-1)
325 |               break;
326 |           chosen[pick]=true;
327 |           len += senLen.get(pick);
328 |           summaryId.add(pick);
329 |           if (len>=maxlen-20)
330 |               break;
331 |       }
332 |       return summaryId;
333 |   }
334 | 
335 |   //using threshold to remove redundancy
336 |   ArrayList<Integer> pickSentenceThreshold(double[] score, double threshold, double beta){
337 |       summaryId = new ArrayList<>();
338 |       int len = 0;
339 |       boolean[] chosen = new boolean[snum];
340 |       for (int i = 0; i < snum; i++)
341 |           chosen[i] = false;
342 |       while(len < maxlen)
343 |       {
344 |           double maxscore = 0;
345 |           int pick = -1;
346 |           for (int i = 0; i < snum; i++)
347 |           {
348 |               double tmpscore = score[i];
349 |               for (int j : summaryId)
350 |                   if (sim[i][j] > threshold)
351 |                       tmpscore = 0;
352 | 
353 |               if (tmpscore/Math.pow(senLen.get(i),beta)>maxscore && !chosen[i] && len+ senLen.get(i)<maxlen && senLen.get(i)>=5)
354 |               {
355 | 
356 |                   maxscore = tmpscore/Math.pow(senLen.get(i),beta);
357 |                   pick = i;
358 | 
359 |               }
360 |           }
361 |           if (pick==-1)
362 |               break;
363 |           chosen[pick]=true;
364 |           len += senLen.get(pick);
365 |           summaryId.add(pick);
366 |           if (len>=maxlen-20)
367 |               break;
368 |       }
369 |       return summaryId;
370 |   }
371 | 
372 |   // use a score penalty to remove redundancy
373 |   ArrayList<Integer> pickSentenceSumpun(double[] score, double para){
374 |   	summaryId = new ArrayList<>();
375 |   	Map<Integer,Double> m = new TreeMap<Integer,Double>();
376 | 		int contentNum = 0;
377 | 		int Numm = 0;
378 | 		double maxSenScore = -10000.0;
379 | 		boolean[] yes = new boolean[snum];
380 | 		for(int i = 0; i < snum; i++){
381 | 			yes[i] = false;
382 | 		}
383 | 		while(contentNum <= maxlen){
384 | 			
385 | 			for(int i = 1; i < snum; i++){
386 | 				if(yes[i] == false && score[i] > maxSenScore){
387 | 					maxSenScore = score[i];
388 | 					Numm = i;
389 | 				}
390 | 			}
391 | 			
392 | 			m.put(Numm, maxSenScore);
393 | 			maxSenScore = -10000.0;
394 | 			contentNum += senLen.get(Numm);
395 | 			yes[Numm] = true;
396 | 			
397 | 			for(int i = 1;i < snum; i++){
398 | 				if(yes[i] == false){
399 | 					score[i] = score[i] - para * normalSim[i][Numm] * score[Numm];
400 | 				}
401 | 			}
402 | 		} 
403 | 		for (Integer key : m.keySet()) { 
404 | 			summaryId.add(key);
405 | 		}  
406 | 		
407 |   	return summaryId;
408 |   	
409 |   }
410 |   
411 | }
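
`pickSentenceMMR` above picks sentences greedily, lowering a candidate's score by its similarity to sentences already in the summary. The following is a simplified sketch of that idea, not the toolkit's exact routine. The class name and the array-based parameters are hypothetical. It leaves out the `beta` length normalization, the five-token minimum and the early stop near `maxlen`, and unlike the original it lets a summary fill the length budget exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone class, not part of the toolkit.
class MmrSelectionSketch {
    // score[i]  : salience of sentence i
    // sim[i][j] : similarity between sentences i and j
    // senLen[i] : length of sentence i, in the same unit as maxLen
    // para      : strength of the redundancy penalty
    static List<Integer> pick(double[] score, double[][] sim, int[] senLen,
                              int maxLen, double para) {
        List<Integer> picked = new ArrayList<>();
        boolean[] chosen = new boolean[score.length];
        int len = 0;
        while (len < maxLen) {
            double best = 0;
            int pick = -1;
            for (int i = 0; i < score.length; i++) {
                // Skip sentences already used or too long for the remaining budget.
                if (chosen[i] || len + senLen[i] > maxLen) continue;
                // Keep the lowest penalized score over all sentences picked so far.
                double adjusted = score[i];
                for (int j : picked) {
                    adjusted = Math.min(adjusted, score[i] - para * sim[i][j] * score[j]);
                }
                if (adjusted > best) {
                    best = adjusted;
                    pick = i;
                }
            }
            if (pick == -1) break; // nothing left with a positive adjusted score
            chosen[pick] = true;
            len += senLen[pick];
            picked.add(pick);
        }
        return picked;
    }
}
```

With scores `{1.0, 0.9, 0.5}`, where the first two sentences are near duplicates (similarity 0.9) and the third is unrelated to both (similarity 0.1), a budget of two sentences and `para = 0.7`, the sketch picks the first sentence and then the third: the penalty lowers the second sentence's score to 0.27, below the third's 0.43.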


--------------------------------------------------------------------------------
/code/stopword_Eng:
--------------------------------------------------------------------------------
   1 | able
   2 | about
   3 | above
   4 | according
   5 | accordingly
   6 | across
   7 | actually
   8 | after
   9 | afterwards
  10 | again
  11 | against
  12 | ain't
  13 | all
  14 | allow
  15 | allows
  16 | almost
  17 | alone
  18 | along
  19 | already
  20 | also
  21 | although
  22 | always
  23 | am
  24 | among
  25 | amongst
  26 | an
  27 | and
  28 | another
  29 | any
  30 | anybody
  31 | anyhow
  32 | anyone
  33 | anything
  34 | anyway
  35 | anyways
  36 | anywhere
  37 | apart
  38 | appear
  39 | appreciate
  40 | appropriate
  41 | are
  42 | aren't
  43 | around
  44 | as
  45 | a's
  46 | aside
  47 | ask
  48 | asking
  49 | associated
  50 | at
  51 | available
  52 | away
  53 | awfully
  54 | be
  55 | became
  56 | because
  57 | become
  58 | becomes
  59 | becoming
  60 | been
  61 | before
  62 | beforehand
  63 | behind
  64 | being
  65 | believe
  66 | below
  67 | beside
  68 | besides
  69 | best
  70 | better
  71 | between
  72 | beyond
  73 | both
  74 | brief
  75 | but
  76 | by
  77 | came
  78 | can
  79 | cannot
  80 | cant
  81 | can't
  82 | cause
  83 | causes
  84 | certain
  85 | certainly
  86 | changes
  87 | clearly
  88 | c'mon
  89 | co
  90 | com
  91 | come
  92 | comes
  93 | concerning
  94 | consequently
  95 | consider
  96 | considering
  97 | contain
  98 | containing
  99 | contains
 100 | corresponding
 101 | could
 102 | couldn't
 103 | course
 104 | c's
 105 | currently
 106 | definitely
 107 | described
 108 | despite
 109 | did
 110 | didn't
 111 | different
 112 | do
 113 | does
 114 | doesn't
 115 | doing
 116 | done
 117 | don't
 118 | down
 119 | downwards
 120 | during
 121 | each
 122 | edu
 123 | eg
 124 | eight
 125 | either
 126 | else
 127 | elsewhere
 128 | enough
 129 | entirely
 130 | especially
 131 | et
 132 | etc
 133 | even
 134 | ever
 135 | every
 136 | everybody
 137 | everyone
 138 | everything
 139 | everywhere
 140 | ex
 141 | exactly
 142 | example
 143 | except
 144 | far
 145 | few
 146 | fifth
 147 | first
 148 | five
 149 | followed
 150 | following
 151 | follows
 152 | for
 153 | former
 154 | formerly
 155 | forth
 156 | four
 157 | from
 158 | further
 159 | furthermore
 160 | get
 161 | gets
 162 | getting
 163 | given
 164 | gives
 165 | go
 166 | goes
 167 | going
 168 | gone
 169 | got
 170 | gotten
 171 | greetings
 172 | had
 173 | hadn't
 174 | happens
 175 | hardly
 176 | has
 177 | hasn't
 178 | have
 179 | haven't
 180 | having
 181 | he
 182 | hello
 183 | help
 184 | hence
 185 | her
 186 | here
 187 | hereafter
 188 | hereby
 189 | herein
 190 | here's
 191 | hereupon
 192 | hers
 193 | herself
 194 | he's
 195 | hi
 196 | him
 197 | himself
 198 | his
 199 | hither
 200 | hopefully
 201 | how
 202 | howbeit
 203 | however
 204 | i'd
 205 | ie
 206 | if
 207 | ignored
 208 | i'll
 209 | i'm
 210 | immediate
 211 | in
 212 | inasmuch
 213 | inc
 214 | indeed
 215 | indicate
 216 | indicated
 217 | indicates
 218 | inner
 219 | insofar
 220 | instead
 221 | into
 222 | inward
 223 | is
 224 | isn't
 225 | it
 226 | it'd
 227 | it'll
 228 | its
 229 | it's
 230 | itself
 231 | i've
 232 | just
 233 | keep
 234 | keeps
 235 | kept
 236 | know
 237 | known
 238 | knows
 239 | last
 240 | lately
 241 | later
 242 | latter
 243 | latterly
 244 | least
 245 | less
 246 | lest
 247 | let
 248 | let's
 249 | like
 250 | liked
 251 | likely
 252 | little
 253 | look
 254 | looking
 255 | looks
 256 | ltd
 257 | mainly
 258 | many
 259 | may
 260 | maybe
 261 | me
 262 | mean
 263 | meanwhile
 264 | merely
 265 | might
 266 | more
 267 | moreover
 268 | most
 269 | mostly
 270 | much
 271 | must
 272 | my
 273 | myself
 274 | name
 275 | namely
 276 | nd
 277 | near
 278 | nearly
 279 | necessary
 280 | need
 281 | needs
 282 | neither
 283 | never
 284 | nevertheless
 285 | new
 286 | next
 287 | nine
 288 | no
 289 | nobody
 290 | non
 291 | none
 292 | noone
 293 | nor
 294 | normally
 295 | not
 296 | nothing
 297 | novel
 298 | now
 299 | nowhere
 300 | obviously
 301 | of
 302 | off
 303 | often
 304 | oh
 305 | ok
 306 | okay
 307 | old
 308 | on
 309 | once
 310 | one
 311 | ones
 312 | only
 313 | onto
 314 | or
 315 | other
 316 | others
 317 | otherwise
 318 | ought
 319 | our
 320 | ours
 321 | ourselves
 322 | out
 323 | outside
 324 | over
 325 | overall
 326 | own
 327 | particular
 328 | particularly
 329 | per
 330 | perhaps
 331 | placed
 332 | please
 333 | plus
 334 | possible
 335 | presumably
 336 | probably
 337 | provides
 338 | que
 339 | quite
 340 | qv
 341 | rather
 342 | rd
 343 | re
 344 | really
 345 | reasonably
 346 | regarding
 347 | regardless
 348 | regards
 349 | relatively
 350 | respectively
 351 | right
 352 | said
 353 | same
 354 | saw
 355 | say
 356 | saying
 357 | says
 358 | second
 359 | secondly
 360 | see
 361 | seeing
 362 | seem
 363 | seemed
 364 | seeming
 365 | seems
 366 | seen
 367 | self
 368 | selves
 369 | sensible
 370 | sent
 371 | serious
 372 | seriously
 373 | seven
 374 | several
 375 | shall
 376 | she
 377 | should
 378 | shouldn't
 379 | since
 380 | six
 381 | so
 382 | some
 383 | somebody
 384 | somehow
 385 | someone
 386 | something
 387 | sometime
 388 | sometimes
 389 | somewhat
 390 | somewhere
 391 | soon
 392 | sorry
 393 | specified
 394 | specify
 395 | specifying
 396 | still
 397 | sub
 398 | such
 399 | sup
 400 | sure
 401 | take
 402 | taken
 403 | tell
 404 | tends
 405 | th
 406 | than
 407 | thank
 408 | thanks
 409 | thanx
 410 | that
 411 | thats
 412 | that's
 413 | the
 414 | their
 415 | theirs
 416 | them
 417 | themselves
 418 | then
 419 | thence
 420 | there
 421 | thereafter
 422 | thereby
 423 | therefore
 424 | therein
 425 | theres
 426 | there's
 427 | thereupon
 428 | these
 429 | they
 430 | they'd
 431 | they'll
 432 | they're
 433 | they've
 434 | think
 435 | third
 436 | this
 437 | thorough
 438 | thoroughly
 439 | those
 440 | though
 441 | three
 442 | through
 443 | throughout
 444 | thru
 445 | thus
 446 | to
 447 | together
 448 | too
 449 | took
 450 | toward
 451 | towards
 452 | tried
 453 | tries
 454 | truly
 455 | try
 456 | trying
 457 | t's
 458 | twice
 459 | two
 460 | un
 461 | under
 462 | unfortunately
 463 | unless
 464 | unlikely
 465 | until
 466 | unto
 467 | up
 468 | upon
 469 | us
 470 | use
 471 | used
 472 | useful
 473 | uses
 474 | using
 475 | usually
 476 | value
 477 | various
 478 | very
 479 | via
 480 | viz
 481 | vs
 482 | want
 483 | wants
 484 | was
 485 | wasn't
 486 | way
 487 | we
 488 | we'd
 489 | welcome
 490 | well
 491 | we'll
 492 | went
 493 | were
 494 | we're
 495 | weren't
 496 | we've
 497 | what
 498 | whatever
 499 | what's
 500 | when
 501 | whence
 502 | whenever
 503 | where
 504 | whereafter
 505 | whereas
 506 | whereby
 507 | wherein
 508 | where's
 509 | whereupon
 510 | wherever
 511 | whether
 512 | which
 513 | while
 514 | whither
 515 | who
 516 | whoever
 517 | whole
 518 | whom
 519 | who's
 520 | whose
 521 | why
 522 | will
 523 | willing
 524 | wish
 525 | with
 526 | within
 527 | without
 528 | wonder
 529 | won't
 530 | would
 531 | wouldn't
 532 | yes
 533 | yet
 534 | you
 535 | you'd
 536 | you'll
 537 | your
 538 | you're
 539 | yours
 540 | yourself
 541 | yourselves
 542 | you've
 543 | zero
 544 | zt
 545 | ZT
 546 | zz
 547 | ZZ
 548 | a
 549 | able
 550 | about
 551 | above
 552 | abst
 553 | accordance
 554 | according
 555 | accordingly
 556 | across
 557 | act
 558 | actually
 559 | added
 560 | adj
 561 | adopted
 562 | affected
 563 | affecting
 564 | affects
 565 | after
 566 | afterwards
 567 | again
 568 | against
 569 | ah
 570 | ain't
 571 | all
 572 | allow
 573 | allows
 574 | almost
 575 | alone
 576 | along
 577 | already
 578 | also
 579 | although
 580 | always
 581 | am
 582 | among
 583 | amongst
 584 | an
 585 | and
 586 | announce
 587 | another
 588 | any
 589 | anybody
 590 | anyhow
 591 | anymore
 592 | anyone
 593 | anything
 594 | anyway
 595 | anyways
 596 | anywhere
 597 | apart
 598 | apparently
 599 | appear
 600 | appreciate
 601 | appropriate
 602 | approximately
 603 | are
 604 | area
 605 | areas
 606 | aren
 607 | arent
 608 | aren't
 609 | arise
 610 | around
 611 | as
 612 | a's
 613 | aside
 614 | ask
 615 | asked
 616 | asking
 617 | asks
 618 | associated
 619 | at
 620 | auth
 621 | available
 622 | away
 623 | awfully
 624 | b
 625 | back
 626 | backed
 627 | backing
 628 | backs
 629 | be
 630 | became
 631 | because
 632 | become
 633 | becomes
 634 | becoming
 635 | been
 636 | before
 637 | beforehand
 638 | began
 639 | begin
 640 | beginning
 641 | beginnings
 642 | begins
 643 | behind
 644 | being
 645 | beings
 646 | believe
 647 | below
 648 | beside
 649 | besides
 650 | best
 651 | better
 652 | between
 653 | beyond
 654 | big
 655 | biol
 656 | both
 657 | brief
 658 | briefly
 659 | but
 660 | by
 661 | c
 662 | ca
 663 | came
 664 | can
 665 | cannot
 666 | cant
 667 | can't
 668 | case
 669 | cases
 670 | cause
 671 | causes
 672 | certain
 673 | certainly
 674 | changes
 675 | clear
 676 | clearly
 677 | c'mon
 678 | co
 679 | com
 680 | come
 681 | comes
 682 | concerning
 683 | consequently
 684 | consider
 685 | considering
 686 | contain
 687 | containing
 688 | contains
 689 | corresponding
 690 | could
 691 | couldnt
 692 | couldn't
 693 | course
 694 | c's
 695 | currently
 696 | d
 697 | 'd
 698 | date
 699 | definitely
 700 | describe
 701 | described
 702 | despite
 703 | did
 704 | didn't
 705 | differ
 706 | different
 707 | differently
 708 | discuss
 709 | do
 710 | does
 711 | doesn't
 712 | doing
 713 | done
 714 | don't
 715 | down
 716 | downed
 717 | downing
 718 | downs
 719 | downwards
 720 | due
 721 | during
 722 | e
 723 | each
 724 | early
 725 | ed
 726 | edu
 727 | effect
 728 | eg
 729 | eight
 730 | eighty
 731 | either
 732 | else
 733 | elsewhere
 734 | end
 735 | ended
 736 | ending
 737 | ends
 738 | enough
 739 | entirely
 740 | especially
 741 | et
 742 | et-al
 743 | etc
 744 | even
 745 | evenly
 746 | ever
 747 | every
 748 | everybody
 749 | everyone
 750 | everything
 751 | everywhere
 752 | ex
 753 | exactly
 754 | example
 755 | except
 756 | f
 757 | face
 758 | faces
 759 | fact
 760 | facts
 761 | far
 762 | felt
 763 | few
 764 | ff
 765 | fifth
 766 | find
 767 | finds
 768 | first
 769 | five
 770 | fix
 771 | followed
 772 | following
 773 | follows
 774 | for
 775 | former
 776 | formerly
 777 | forth
 778 | found
 779 | four
 780 | from
 781 | full
 782 | fully
 783 | further
 784 | furthered
 785 | furthering
 786 | furthermore
 787 | furthers
 788 | g
 789 | gave
 790 | general
 791 | generally
 792 | get
 793 | gets
 794 | getting
 795 | give
 796 | given
 797 | gives
 798 | giving
 799 | go
 800 | goes
 801 | going
 802 | gone
 803 | good
 804 | goods
 805 | got
 806 | gotten
 807 | great
 808 | greater
 809 | greatest
 810 | greetings
 811 | group
 812 | grouped
 813 | grouping
 814 | groups
 815 | h
 816 | had
 817 | hadn't
 818 | happens
 819 | hardly
 820 | has
 821 | hasn't
 822 | have
 823 | haven't
 824 | having
 825 | he
 826 | hed
 827 | hello
 828 | help
 829 | hence
 830 | her
 831 | here
 832 | hereafter
 833 | hereby
 834 | herein
 835 | heres
 836 | here's
 837 | hereupon
 838 | hers
 839 | herself
 840 | hes
 841 | he's
 842 | hi
 843 | hid
 844 | high
 845 | higher
 846 | highest
 847 | him
 848 | himself
 849 | his
 850 | hither
 851 | home
 852 | hopefully
 853 | how
 854 | howbeit
 855 | however
 856 | hundred
 857 | i
 858 | id
 859 | i'd
 860 | ie
 861 | if
 862 | ignored
 863 | i'll
 864 | im
 865 | i'm
 866 | immediate
 867 | immediately
 868 | importance
 869 | important
 870 | in
 871 | inasmuch
 872 | inc
 873 | include
 874 | indeed
 875 | index
 876 | indicate
 877 | indicated
 878 | indicates
 879 | information
 880 | inner
 881 | insofar
 882 | instead
 883 | interest
 884 | interested
 885 | interesting
 886 | interests
 887 | into
 888 | invention
 889 | inward
 890 | is
 891 | isn't
 892 | it
 893 | itd
 894 | it'd
 895 | it'll
 896 | its
 897 | it's
 898 | itself
 899 | i've
 900 | j
 901 | just
 902 | k
 903 | keep
 904 | keeps
 905 | kept
 906 | keys
 907 | kg
 908 | kind
 909 | km
 910 | knew
 911 | know
 912 | known
 913 | knows
 914 | l
 915 | large
 916 | largely
 917 | last
 918 | lately
 919 | later
 920 | latest
 921 | latter
 922 | latterly
 923 | least
 924 | less
 925 | lest
 926 | let
 927 | lets
 928 | let's
 929 | like
 930 | liked
 931 | likely
 932 | line
 933 | little
 934 | 'll
 935 | long
 936 | longer
 937 | longest
 938 | look
 939 | looking
 940 | looks
 941 | ltd
 942 | m
 943 | 'm
 944 | made
 945 | mainly
 946 | make
 947 | makes
 948 | making
 949 | man
 950 | many
 951 | may
 952 | maybe
 953 | me
 954 | mean
 955 | means
 956 | meantime
 957 | meanwhile
 958 | member
 959 | members
 960 | men
 961 | merely
 962 | mg
 963 | might
 964 | million
 965 | miss
 966 | ml
 967 | more
 968 | moreover
 969 | most
 970 | mostly
 971 | mr
 972 | mrs
 973 | much
 974 | mug
 975 | must
 976 | my
 977 | myself
 978 | n
 979 | na
 980 | name
 981 | namely
 982 | nay
 983 | nd
 984 | near
 985 | nearly
 986 | necessarily
 987 | necessary
 988 | need
 989 | needed
 990 | needing
 991 | needs
 992 | neither
 993 | never
 994 | nevertheless
 995 | new
 996 | newer
 997 | newest
 998 | next
 999 | nine
1000 | ninety
1001 | no
1002 | nobody
1003 | non
1004 | none
1005 | nonetheless
1006 | noone
1007 | nor
1008 | normally
1009 | nos
1010 | not
1011 | noted
1012 | nothing
1013 | novel
1014 | now
1015 | nowhere
1016 | n't
1017 | number
1018 | numbers
1019 | o
1020 | obtain
1021 | obtained
1022 | obviously
1023 | of
1024 | off
1025 | often
1026 | oh
1027 | ok
1028 | okay
1029 | old
1030 | older
1031 | oldest
1032 | omitted
1033 | on
1034 | once
1035 | one
1036 | ones
1037 | only
1038 | onto
1039 | open
1040 | opened
1041 | opening
1042 | opens
1043 | or
1044 | ord
1045 | order
1046 | ordered
1047 | ordering
1048 | orders
1049 | other
1050 | others
1051 | otherwise
1052 | ought
1053 | our
1054 | ours
1055 | ourselves
1056 | out
1057 | outside
1058 | over
1059 | overall
1060 | owing
1061 | own
1062 | p
1063 | page
1064 | pages
1065 | part
1066 | parted
1067 | particular
1068 | particularly
1069 | parting
1070 | parts
1071 | past
1072 | per
1073 | perhaps
1074 | place
1075 | placed
1076 | places
1077 | please
1078 | plus
1079 | point
1080 | pointed
1081 | pointing
1082 | points
1083 | poorly
1084 | possible
1085 | possibly
1086 | potentially
1087 | pp
1088 | predominantly
1089 | present
1090 | presented
1091 | presenting
1092 | presents
1093 | presumably
1094 | previously
1095 | primarily
1096 | probably
1097 | problem
1098 | problems
1099 | promptly
1100 | proud
1101 | provides
1102 | put
1103 | puts
1104 | q
1105 | que
1106 | quickly
1107 | quite
1108 | qv
1109 | r
1110 | ran
1111 | rather
1112 | rd
1113 | re
1114 | 're
1115 | readily
1116 | really
1117 | reasonably
1118 | recent
1119 | recently
1120 | ref
1121 | refs
1122 | regarding
1123 | regardless
1124 | regards
1125 | related
1126 | relatively
1127 | research
1128 | respectively
1129 | resulted
1130 | resulting
1131 | results
1132 | right
1133 | room
1134 | rooms
1135 | run
1136 | s
1137 | 's
1138 | said
1139 | same
1140 | saw
1141 | say
1142 | saying
1143 | says
1144 | sec
1145 | second
1146 | secondly
1147 | seconds
1148 | section
1149 | see
1150 | seeing
1151 | seem
1152 | seemed
1153 | seeming
1154 | seems
1155 | seen
1156 | sees
1157 | self
1158 | selves
1159 | sensible
1160 | sent
1161 | serious
1162 | seriously
1163 | seven
1164 | several
1165 | shall
1166 | she
1167 | shed
1168 | she'll
1169 | shes
1170 | should
1171 | shouldn't
1172 | show
1173 | showed
1174 | showing
1175 | shown
1176 | showns
1177 | shows
1178 | side
1179 | sides
1180 | significant
1181 | significantly
1182 | similar
1183 | similarly
1184 | since
1185 | six
1186 | slightly
1187 | small
1188 | smaller
1189 | smallest
1190 | so
1191 | some
1192 | somebody
1193 | somehow
1194 | someone
1195 | somethan
1196 | something
1197 | sometime
1198 | sometimes
1199 | somewhat
1200 | somewhere
1201 | soon
1202 | sorry
1203 | specifically
1204 | specified
1205 | specify
1206 | specifying
1207 | state
1208 | states
1209 | still
1210 | stop
1211 | strongly
1212 | sub
1213 | substantially
1214 | successfully
1215 | such
1216 | sufficiently
1217 | suggest
1218 | sup
1219 | sure
1220 | t
1221 | 't
1222 | take
1223 | taken
1224 | taking
1225 | tell
1226 | tends
1227 | th
1228 | than
1229 | thank
1230 | thanks
1231 | thanx
1232 | that
1233 | that'll
1234 | thats
1235 | that's
1236 | that've
1237 | the
1238 | their
1239 | theirs
1240 | them
1241 | themselves
1242 | then
1243 | thence
1244 | there
1245 | thereafter
1246 | thereby
1247 | thered
1248 | therefore
1249 | therein
1250 | there'll
1251 | thereof
1252 | therere
1253 | theres
1254 | there's
1255 | thereto
1256 | thereupon
1257 | there've
1258 | these
1259 | they
1260 | theyd
1261 | they'd
1262 | they'll
1263 | theyre
1264 | they're
1265 | they've
1266 | thing
1267 | things
1268 | think
1269 | thinks
1270 | third
1271 | this
1272 | thorough
1273 | thoroughly
1274 | those
1275 | thou
1276 | though
1277 | thoughh
1278 | thought
1279 | thoughts
1280 | thousand
1281 | three
1282 | throug
1283 | through
1284 | throughout
1285 | thru
1286 | thus
1287 | til
1288 | tip
1289 | to
1290 | today
1291 | together
1292 | too
1293 | took
1294 | toward
1295 | towards
1296 | tried
1297 | tries
1298 | truly
1299 | try
1300 | trying
1301 | ts
1302 | t's
1303 | turn
1304 | turned
1305 | turning
1306 | turns
1307 | twice
1308 | two
1309 | u
1310 | un
1311 | under
1312 | unfortunately
1313 | unless
1314 | unlike
1315 | unlikely
1316 | until
1317 | unto
1318 | up
1319 | upon
1320 | ups
1321 | us
1322 | use
1323 | used
1324 | useful
1325 | usefully
1326 | usefulness
1327 | uses
1328 | using
1329 | usually
1330 | uucp
1331 | v
1332 | value
1333 | various
1334 | 've
1335 | very
1336 | via
1337 | viz
1338 | vol
1339 | vols
1340 | vs
1341 | w
1342 | want
1343 | wanted
1344 | wanting
1345 | wants
1346 | was
1347 | wasn't
1348 | way
1349 | ways
1350 | we
1351 | wed
1352 | we'd
1353 | welcome
1354 | well
1355 | we'll
1356 | wells
1357 | went
1358 | were
1359 | we're
1360 | weren't
1361 | we've
1362 | what
1363 | whatever
1364 | what'll
1365 | whats
1366 | what's
1367 | when
1368 | whence
1369 | whenever
1370 | where
1371 | whereafter
1372 | whereas
1373 | whereby
1374 | wherein
1375 | wheres
1376 | where's
1377 | whereupon
1378 | wherever
1379 | whether
1380 | which
1381 | while
1382 | whim
1383 | whither
1384 | who
1385 | whod
1386 | whoever
1387 | whole
1388 | who'll
1389 | whom
1390 | whomever
1391 | whos
1392 | who's
1393 | whose
1394 | why
1395 | widely
1396 | will
1397 | willing
1398 | wish
1399 | with
1400 | within
1401 | without
1402 | wonder
1403 | won't
1404 | words
1405 | work
1406 | worked
1407 | working
1408 | works
1409 | world
1410 | would
1411 | wouldn't
1412 | www
1413 | x
1414 | y
1415 | year
1416 | years
1417 | yes
1418 | yet
1419 | you
1420 | youd
1421 | you'd
1422 | you'll
1423 | young
1424 | younger
1425 | youngest
1426 | your
1427 | youre
1428 | you're
1429 | yours
1430 | yourself
1431 | yourselves
1432 | you've
1433 | z
1434 | zero
1435 | 


--------------------------------------------------------------------------------
/lib/ansj_seg-5.0.2-all-in-one.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/ansj_seg-5.0.2-all-in-one.jar


--------------------------------------------------------------------------------
/lib/lpsolve55j.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/lpsolve55j.jar


--------------------------------------------------------------------------------
/lib/slf4j-nop-1.7.21.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/slf4j-nop-1.7.21.jar


--------------------------------------------------------------------------------
/lib/stanford-ner.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKULCWM/PKUSUMSUM/721bbee1b1e112dce97c2fa3193fb61488a624bb/lib/stanford-ner.jar


--------------------------------------------------------------------------------
/lib/stopword_Eng:
--------------------------------------------------------------------------------
   1 | able
   2 | about
   3 | above
   4 | according
   5 | accordingly
   6 | across
   7 | actually
   8 | after
   9 | afterwards
  10 | again
  11 | against
  12 | ain't
  13 | all
  14 | allow
  15 | allows
  16 | almost
  17 | alone
  18 | along
  19 | already
  20 | also
  21 | although
  22 | always
  23 | am
  24 | among
  25 | amongst
  26 | an
  27 | and
  28 | another
  29 | any
  30 | anybody
  31 | anyhow
  32 | anyone
  33 | anything
  34 | anyway
  35 | anyways
  36 | anywhere
  37 | apart
  38 | appear
  39 | appreciate
  40 | appropriate
  41 | are
  42 | aren't
  43 | around
  44 | as
  45 | a's
  46 | aside
  47 | ask
  48 | asking
  49 | associated
  50 | at
  51 | available
  52 | away
  53 | awfully
  54 | be
  55 | became
  56 | because
  57 | become
  58 | becomes
  59 | becoming
  60 | been
  61 | before
  62 | beforehand
  63 | behind
  64 | being
  65 | believe
  66 | below
  67 | beside
  68 | besides
  69 | best
  70 | better
  71 | between
  72 | beyond
  73 | both
  74 | brief
  75 | but
  76 | by
  77 | came
  78 | can
  79 | cannot
  80 | cant
  81 | can't
  82 | cause
  83 | causes
  84 | certain
  85 | certainly
  86 | changes
  87 | clearly
  88 | c'mon
  89 | co
  90 | com
  91 | come
  92 | comes
  93 | concerning
  94 | consequently
  95 | consider
  96 | considering
  97 | contain
  98 | containing
  99 | contains
 100 | corresponding
 101 | could
 102 | couldn't
 103 | course
 104 | c's
 105 | currently
 106 | definitely
 107 | described
 108 | despite
 109 | did
 110 | didn't
 111 | different
 112 | do
 113 | does
 114 | doesn't
 115 | doing
 116 | done
 117 | don't
 118 | down
 119 | downwards
 120 | during
 121 | each
 122 | edu
 123 | eg
 124 | eight
 125 | either
 126 | else
 127 | elsewhere
 128 | enough
 129 | entirely
 130 | especially
 131 | et
 132 | etc
 133 | even
 134 | ever
 135 | every
 136 | everybody
 137 | everyone
 138 | everything
 139 | everywhere
 140 | ex
 141 | exactly
 142 | example
 143 | except
 144 | far
 145 | few
 146 | fifth
 147 | first
 148 | five
 149 | followed
 150 | following
 151 | follows
 152 | for
 153 | former
 154 | formerly
 155 | forth
 156 | four
 157 | from
 158 | further
 159 | furthermore
 160 | get
 161 | gets
 162 | getting
 163 | given
 164 | gives
 165 | go
 166 | goes
 167 | going
 168 | gone
 169 | got
 170 | gotten
 171 | greetings
 172 | had
 173 | hadn't
 174 | happens
 175 | hardly
 176 | has
 177 | hasn't
 178 | have
 179 | haven't
 180 | having
 181 | he
 182 | hello
 183 | help
 184 | hence
 185 | her
 186 | here
 187 | hereafter
 188 | hereby
 189 | herein
 190 | here's
 191 | hereupon
 192 | hers
 193 | herself
 194 | he's
 195 | hi
 196 | him
 197 | himself
 198 | his
 199 | hither
 200 | hopefully
 201 | how
 202 | howbeit
 203 | however
 204 | i'd
 205 | ie
 206 | if
 207 | ignored
 208 | i'll
 209 | i'm
 210 | immediate
 211 | in
 212 | inasmuch
 213 | inc
 214 | indeed
 215 | indicate
 216 | indicated
 217 | indicates
 218 | inner
 219 | insofar
 220 | instead
 221 | into
 222 | inward
 223 | is
 224 | isn't
 225 | it
 226 | it'd
 227 | it'll
 228 | its
 229 | it's
 230 | itself
 231 | i've
 232 | just
 233 | keep
 234 | keeps
 235 | kept
 236 | know
 237 | known
 238 | knows
 239 | last
 240 | lately
 241 | later
 242 | latter
 243 | latterly
 244 | least
 245 | less
 246 | lest
 247 | let
 248 | let's
 249 | like
 250 | liked
 251 | likely
 252 | little
 253 | look
 254 | looking
 255 | looks
 256 | ltd
 257 | mainly
 258 | many
 259 | may
 260 | maybe
 261 | me
 262 | mean
 263 | meanwhile
 264 | merely
 265 | might
 266 | more
 267 | moreover
 268 | most
 269 | mostly
 270 | much
 271 | must
 272 | my
 273 | myself
 274 | name
 275 | namely
 276 | nd
 277 | near
 278 | nearly
 279 | necessary
 280 | need
 281 | needs
 282 | neither
 283 | never
 284 | nevertheless
 285 | new
 286 | next
 287 | nine
 288 | no
 289 | nobody
 290 | non
 291 | none
 292 | noone
 293 | nor
 294 | normally
 295 | not
 296 | nothing
 297 | novel
 298 | now
 299 | nowhere
 300 | obviously
 301 | of
 302 | off
 303 | often
 304 | oh
 305 | ok
 306 | okay
 307 | old
 308 | on
 309 | once
 310 | one
 311 | ones
 312 | only
 313 | onto
 314 | or
 315 | other
 316 | others
 317 | otherwise
 318 | ought
 319 | our
 320 | ours
 321 | ourselves
 322 | out
 323 | outside
 324 | over
 325 | overall
 326 | own
 327 | particular
 328 | particularly
 329 | per
 330 | perhaps
 331 | placed
 332 | please
 333 | plus
 334 | possible
 335 | presumably
 336 | probably
 337 | provides
 338 | que
 339 | quite
 340 | qv
 341 | rather
 342 | rd
 343 | re
 344 | really
 345 | reasonably
 346 | regarding
 347 | regardless
 348 | regards
 349 | relatively
 350 | respectively
 351 | right
 352 | said
 353 | same
 354 | saw
 355 | say
 356 | saying
 357 | says
 358 | second
 359 | secondly
 360 | see
 361 | seeing
 362 | seem
 363 | seemed
 364 | seeming
 365 | seems
 366 | seen
 367 | self
 368 | selves
 369 | sensible
 370 | sent
 371 | serious
 372 | seriously
 373 | seven
 374 | several
 375 | shall
 376 | she
 377 | should
 378 | shouldn't
 379 | since
 380 | six
 381 | so
 382 | some
 383 | somebody
 384 | somehow
 385 | someone
 386 | something
 387 | sometime
 388 | sometimes
 389 | somewhat
 390 | somewhere
 391 | soon
 392 | sorry
 393 | specified
 394 | specify
 395 | specifying
 396 | still
 397 | sub
 398 | such
 399 | sup
 400 | sure
 401 | take
 402 | taken
 403 | tell
 404 | tends
 405 | th
 406 | than
 407 | thank
 408 | thanks
 409 | thanx
 410 | that
 411 | thats
 412 | that's
 413 | the
 414 | their
 415 | theirs
 416 | them
 417 | themselves
 418 | then
 419 | thence
 420 | there
 421 | thereafter
 422 | thereby
 423 | therefore
 424 | therein
 425 | theres
 426 | there's
 427 | thereupon
 428 | these
 429 | they
 430 | they'd
 431 | they'll
 432 | they're
 433 | they've
 434 | think
 435 | third
 436 | this
 437 | thorough
 438 | thoroughly
 439 | those
 440 | though
 441 | three
 442 | through
 443 | throughout
 444 | thru
 445 | thus
 446 | to
 447 | together
 448 | too
 449 | took
 450 | toward
 451 | towards
 452 | tried
 453 | tries
 454 | truly
 455 | try
 456 | trying
 457 | t's
 458 | twice
 459 | two
 460 | un
 461 | under
 462 | unfortunately
 463 | unless
 464 | unlikely
 465 | until
 466 | unto
 467 | up
 468 | upon
 469 | us
 470 | use
 471 | used
 472 | useful
 473 | uses
 474 | using
 475 | usually
 476 | value
 477 | various
 478 | very
 479 | via
 480 | viz
 481 | vs
 482 | want
 483 | wants
 484 | was
 485 | wasn't
 486 | way
 487 | we
 488 | we'd
 489 | welcome
 490 | well
 491 | we'll
 492 | went
 493 | were
 494 | we're
 495 | weren't
 496 | we've
 497 | what
 498 | whatever
 499 | what's
 500 | when
 501 | whence
 502 | whenever
 503 | where
 504 | whereafter
 505 | whereas
 506 | whereby
 507 | wherein
 508 | where's
 509 | whereupon
 510 | wherever
 511 | whether
 512 | which
 513 | while
 514 | whither
 515 | who
 516 | whoever
 517 | whole
 518 | whom
 519 | who's
 520 | whose
 521 | why
 522 | will
 523 | willing
 524 | wish
 525 | with
 526 | within
 527 | without
 528 | wonder
 529 | won't
 530 | would
 531 | wouldn't
 532 | yes
 533 | yet
 534 | you
 535 | you'd
 536 | you'll
 537 | your
 538 | you're
 539 | yours
 540 | yourself
 541 | yourselves
 542 | you've
 543 | zero
 544 | zt
 545 | ZT
 546 | zz
 547 | ZZ
 548 | a
 549 | able
 550 | about
 551 | above
 552 | abst
 553 | accordance
 554 | according
 555 | accordingly
 556 | across
 557 | act
 558 | actually
 559 | added
 560 | adj
 561 | adopted
 562 | affected
 563 | affecting
 564 | affects
 565 | after
 566 | afterwards
 567 | again
 568 | against
 569 | ah
 570 | ain't
 571 | all
 572 | allow
 573 | allows
 574 | almost
 575 | alone
 576 | along
 577 | already
 578 | also
 579 | although
 580 | always
 581 | am
 582 | among
 583 | amongst
 584 | an
 585 | and
 586 | announce
 587 | another
 588 | any
 589 | anybody
 590 | anyhow
 591 | anymore
 592 | anyone
 593 | anything
 594 | anyway
 595 | anyways
 596 | anywhere
 597 | apart
 598 | apparently
 599 | appear
 600 | appreciate
 601 | appropriate
 602 | approximately
 603 | are
 604 | area
 605 | areas
 606 | aren
 607 | arent
 608 | aren't
 609 | arise
 610 | around
 611 | as
 612 | a's
 613 | aside
 614 | ask
 615 | asked
 616 | asking
 617 | asks
 618 | associated
 619 | at
 620 | auth
 621 | available
 622 | away
 623 | awfully
 624 | b
 625 | back
 626 | backed
 627 | backing
 628 | backs
 629 | be
 630 | became
 631 | because
 632 | become
 633 | becomes
 634 | becoming
 635 | been
 636 | before
 637 | beforehand
 638 | began
 639 | begin
 640 | beginning
 641 | beginnings
 642 | begins
 643 | behind
 644 | being
 645 | beings
 646 | believe
 647 | below
 648 | beside
 649 | besides
 650 | best
 651 | better
 652 | between
 653 | beyond
 654 | big
 655 | biol
 656 | both
 657 | brief
 658 | briefly
 659 | but
 660 | by
 661 | c
 662 | ca
 663 | came
 664 | can
 665 | cannot
 666 | cant
 667 | can't
 668 | case
 669 | cases
 670 | cause
 671 | causes
 672 | certain
 673 | certainly
 674 | changes
 675 | clear
 676 | clearly
 677 | c'mon
 678 | co
 679 | com
 680 | come
 681 | comes
 682 | concerning
 683 | consequently
 684 | consider
 685 | considering
 686 | contain
 687 | containing
 688 | contains
 689 | corresponding
 690 | could
 691 | couldnt
 692 | couldn't
 693 | course
 694 | c's
 695 | currently
 696 | d
 697 | 'd
 698 | date
 699 | definitely
 700 | describe
 701 | described
 702 | despite
 703 | did
 704 | didn't
 705 | differ
 706 | different
 707 | differently
 708 | discuss
 709 | do
 710 | does
 711 | doesn't
 712 | doing
 713 | done
 714 | don't
 715 | down
 716 | downed
 717 | downing
 718 | downs
 719 | downwards
 720 | due
 721 | during
 722 | e
 723 | each
 724 | early
 725 | ed
 726 | edu
 727 | effect
 728 | eg
 729 | eight
 730 | eighty
 731 | either
 732 | else
 733 | elsewhere
 734 | end
 735 | ended
 736 | ending
 737 | ends
 738 | enough
 739 | entirely
 740 | especially
 741 | et
 742 | et-al
 743 | etc
 744 | even
 745 | evenly
 746 | ever
 747 | every
 748 | everybody
 749 | everyone
 750 | everything
 751 | everywhere
 752 | ex
 753 | exactly
 754 | example
 755 | except
 756 | f
 757 | face
 758 | faces
 759 | fact
 760 | facts
 761 | far
 762 | felt
 763 | few
 764 | ff
 765 | fifth
 766 | find
 767 | finds
 768 | first
 769 | five
 770 | fix
 771 | followed
 772 | following
 773 | follows
 774 | for
 775 | former
 776 | formerly
 777 | forth
 778 | found
 779 | four
 780 | from
 781 | full
 782 | fully
 783 | further
 784 | furthered
 785 | furthering
 786 | furthermore
 787 | furthers
 788 | g
 789 | gave
 790 | general
 791 | generally
 792 | get
 793 | gets
 794 | getting
 795 | give
 796 | given
 797 | gives
 798 | giving
 799 | go
 800 | goes
 801 | going
 802 | gone
 803 | good
 804 | goods
 805 | got
 806 | gotten
 807 | great
 808 | greater
 809 | greatest
 810 | greetings
 811 | group
 812 | grouped
 813 | grouping
 814 | groups
 815 | h
 816 | had
 817 | hadn't
 818 | happens
 819 | hardly
 820 | has
 821 | hasn't
 822 | have
 823 | haven't
 824 | having
 825 | he
 826 | hed
 827 | hello
 828 | help
 829 | hence
 830 | her
 831 | here
 832 | hereafter
 833 | hereby
 834 | herein
 835 | heres
 836 | here's
 837 | hereupon
 838 | hers
 839 | herself
 840 | hes
 841 | he's
 842 | hi
 843 | hid
 844 | high
 845 | higher
 846 | highest
 847 | him
 848 | himself
 849 | his
 850 | hither
 851 | home
 852 | hopefully
 853 | how
 854 | howbeit
 855 | however
 856 | hundred
 857 | i
 858 | id
 859 | i'd
 860 | ie
 861 | if
 862 | ignored
 863 | i'll
 864 | im
 865 | i'm
 866 | immediate
 867 | immediately
 868 | importance
 869 | important
 870 | in
 871 | inasmuch
 872 | inc
 873 | include
 874 | indeed
 875 | index
 876 | indicate
 877 | indicated
 878 | indicates
 879 | information
 880 | inner
 881 | insofar
 882 | instead
 883 | interest
 884 | interested
 885 | interesting
 886 | interests
 887 | into
 888 | invention
 889 | inward
 890 | is
 891 | isn't
 892 | it
 893 | itd
 894 | it'd
 895 | it'll
 896 | its
 897 | it's
 898 | itself
 899 | i've
 900 | j
 901 | just
 902 | k
 903 | keep
 904 | keeps
 905 | kept
 906 | keys
 907 | kg
 908 | kind
 909 | km
 910 | knew
 911 | know
 912 | known
 913 | knows
 914 | l
 915 | large
 916 | largely
 917 | last
 918 | lately
 919 | later
 920 | latest
 921 | latter
 922 | latterly
 923 | least
 924 | less
 925 | lest
 926 | let
 927 | lets
 928 | let's
 929 | like
 930 | liked
 931 | likely
 932 | line
 933 | little
 934 | 'll
 935 | long
 936 | longer
 937 | longest
 938 | look
 939 | looking
 940 | looks
 941 | ltd
 942 | m
 943 | 'm
 944 | made
 945 | mainly
 946 | make
 947 | makes
 948 | making
 949 | man
 950 | many
 951 | may
 952 | maybe
 953 | me
 954 | mean
 955 | means
 956 | meantime
 957 | meanwhile
 958 | member
 959 | members
 960 | men
 961 | merely
 962 | mg
 963 | might
 964 | million
 965 | miss
 966 | ml
 967 | more
 968 | moreover
 969 | most
 970 | mostly
 971 | mr
 972 | mrs
 973 | much
 974 | mug
 975 | must
 976 | my
 977 | myself
 978 | n
 979 | na
 980 | name
 981 | namely
 982 | nay
 983 | nd
 984 | near
 985 | nearly
 986 | necessarily
 987 | necessary
 988 | need
 989 | needed
 990 | needing
 991 | needs
 992 | neither
 993 | never
 994 | nevertheless
 995 | new
 996 | newer
 997 | newest
 998 | next
 999 | nine
1000 | ninety
1001 | no
1002 | nobody
1003 | non
1004 | none
1005 | nonetheless
1006 | noone
1007 | nor
1008 | normally
1009 | nos
1010 | not
1011 | noted
1012 | nothing
1013 | novel
1014 | now
1015 | nowhere
1016 | n't
1017 | number
1018 | numbers
1019 | o
1020 | obtain
1021 | obtained
1022 | obviously
1023 | of
1024 | off
1025 | often
1026 | oh
1027 | ok
1028 | okay
1029 | old
1030 | older
1031 | oldest
1032 | omitted
1033 | on
1034 | once
1035 | one
1036 | ones
1037 | only
1038 | onto
1039 | open
1040 | opened
1041 | opening
1042 | opens
1043 | or
1044 | ord
1045 | order
1046 | ordered
1047 | ordering
1048 | orders
1049 | other
1050 | others
1051 | otherwise
1052 | ought
1053 | our
1054 | ours
1055 | ourselves
1056 | out
1057 | outside
1058 | over
1059 | overall
1060 | owing
1061 | own
1062 | p
1063 | page
1064 | pages
1065 | part
1066 | parted
1067 | particular
1068 | particularly
1069 | parting
1070 | parts
1071 | past
1072 | per
1073 | perhaps
1074 | place
1075 | placed
1076 | places
1077 | please
1078 | plus
1079 | point
1080 | pointed
1081 | pointing
1082 | points
1083 | poorly
1084 | possible
1085 | possibly
1086 | potentially
1087 | pp
1088 | predominantly
1089 | present
1090 | presented
1091 | presenting
1092 | presents
1093 | presumably
1094 | previously
1095 | primarily
1096 | probably
1097 | problem
1098 | problems
1099 | promptly
1100 | proud
1101 | provides
1102 | put
1103 | puts
1104 | q
1105 | que
1106 | quickly
1107 | quite
1108 | qv
1109 | r
1110 | ran
1111 | rather
1112 | rd
1113 | re
1114 | 're
1115 | readily
1116 | really
1117 | reasonably
1118 | recent
1119 | recently
1120 | ref
1121 | refs
1122 | regarding
1123 | regardless
1124 | regards
1125 | related
1126 | relatively
1127 | research
1128 | respectively
1129 | resulted
1130 | resulting
1131 | results
1132 | right
1133 | room
1134 | rooms
1135 | run
1136 | s
1137 | 's
1138 | said
1139 | same
1140 | saw
1141 | say
1142 | saying
1143 | says
1144 | sec
1145 | second
1146 | secondly
1147 | seconds
1148 | section
1149 | see
1150 | seeing
1151 | seem
1152 | seemed
1153 | seeming
1154 | seems
1155 | seen
1156 | sees
1157 | self
1158 | selves
1159 | sensible
1160 | sent
1161 | serious
1162 | seriously
1163 | seven
1164 | several
1165 | shall
1166 | she
1167 | shed
1168 | she'll
1169 | shes
1170 | should
1171 | shouldn't
1172 | show
1173 | showed
1174 | showing
1175 | shown
1176 | showns
1177 | shows
1178 | side
1179 | sides
1180 | significant
1181 | significantly
1182 | similar
1183 | similarly
1184 | since
1185 | six
1186 | slightly
1187 | small
1188 | smaller
1189 | smallest
1190 | so
1191 | some
1192 | somebody
1193 | somehow
1194 | someone
1195 | somethan
1196 | something
1197 | sometime
1198 | sometimes
1199 | somewhat
1200 | somewhere
1201 | soon
1202 | sorry
1203 | specifically
1204 | specified
1205 | specify
1206 | specifying
1207 | state
1208 | states
1209 | still
1210 | stop
1211 | strongly
1212 | sub
1213 | substantially
1214 | successfully
1215 | such
1216 | sufficiently
1217 | suggest
1218 | sup
1219 | sure
1220 | t
1221 | 't
1222 | take
1223 | taken
1224 | taking
1225 | tell
1226 | tends
1227 | th
1228 | than
1229 | thank
1230 | thanks
1231 | thanx
1232 | that
1233 | that'll
1234 | thats
1235 | that's
1236 | that've
1237 | the
1238 | their
1239 | theirs
1240 | them
1241 | themselves
1242 | then
1243 | thence
1244 | there
1245 | thereafter
1246 | thereby
1247 | thered
1248 | therefore
1249 | therein
1250 | there'll
1251 | thereof
1252 | therere
1253 | theres
1254 | there's
1255 | thereto
1256 | thereupon
1257 | there've
1258 | these
1259 | they
1260 | theyd
1261 | they'd
1262 | they'll
1263 | theyre
1264 | they're
1265 | they've
1266 | thing
1267 | things
1268 | think
1269 | thinks
1270 | third
1271 | this
1272 | thorough
1273 | thoroughly
1274 | those
1275 | thou
1276 | though
1277 | thoughh
1278 | thought
1279 | thoughts
1280 | thousand
1281 | three
1282 | throug
1283 | through
1284 | throughout
1285 | thru
1286 | thus
1287 | til
1288 | tip
1289 | to
1290 | today
1291 | together
1292 | too
1293 | took
1294 | toward
1295 | towards
1296 | tried
1297 | tries
1298 | truly
1299 | try
1300 | trying
1301 | ts
1302 | t's
1303 | turn
1304 | turned
1305 | turning
1306 | turns
1307 | twice
1308 | two
1309 | u
1310 | un
1311 | under
1312 | unfortunately
1313 | unless
1314 | unlike
1315 | unlikely
1316 | until
1317 | unto
1318 | up
1319 | upon
1320 | ups
1321 | us
1322 | use
1323 | used
1324 | useful
1325 | usefully
1326 | usefulness
1327 | uses
1328 | using
1329 | usually
1330 | uucp
1331 | v
1332 | value
1333 | various
1334 | 've
1335 | very
1336 | via
1337 | viz
1338 | vol
1339 | vols
1340 | vs
1341 | w
1342 | want
1343 | wanted
1344 | wanting
1345 | wants
1346 | was
1347 | wasn't
1348 | way
1349 | ways
1350 | we
1351 | wed
1352 | we'd
1353 | welcome
1354 | well
1355 | we'll
1356 | wells
1357 | went
1358 | were
1359 | we're
1360 | weren't
1361 | we've
1362 | what
1363 | whatever
1364 | what'll
1365 | whats
1366 | what's
1367 | when
1368 | whence
1369 | whenever
1370 | where
1371 | whereafter
1372 | whereas
1373 | whereby
1374 | wherein
1375 | wheres
1376 | where's
1377 | whereupon
1378 | wherever
1379 | whether
1380 | which
1381 | while
1382 | whim
1383 | whither
1384 | who
1385 | whod
1386 | whoever
1387 | whole
1388 | who'll
1389 | whom
1390 | whomever
1391 | whos
1392 | who's
1393 | whose
1394 | why
1395 | widely
1396 | will
1397 | willing
1398 | wish
1399 | with
1400 | within
1401 | without
1402 | wonder
1403 | won't
1404 | words
1405 | work
1406 | worked
1407 | working
1408 | works
1409 | world
1410 | would
1411 | wouldn't
1412 | www
1413 | x
1414 | y
1415 | year
1416 | years
1417 | yes
1418 | yet
1419 | you
1420 | youd
1421 | you'd
1422 | you'll
1423 | young
1424 | younger
1425 | youngest
1426 | your
1427 | youre
1428 | you're
1429 | yours
1430 | yourself
1431 | yourselves
1432 | you've
1433 | z
1434 | zero
1435 | 


--------------------------------------------------------------------------------