├── .DS_Store
├── bin
    ├── .DS_Store
    ├── launcher.sh
    ├── make_vectors.py
    ├── w2v-compute-accuracy
    ├── w2v-distance
    ├── w2v-word-analogy
    ├── word2nvec
    └── word2phrase
├── readme.md
├── topwords2
├── vectors
    ├── .DS_Store
    ├── analysis.py
    ├── analysis.pyc
    ├── filt.sh
    ├── filtVectors.txt.gz
    ├── filterVocab.py
    ├── fullVocab.txt
    ├── launcher.py
    ├── sampleVectors.txt
    ├── sampleVectors.txt.gz
    ├── top_words
    │   ├── .DS_Store
    │   └── words280
    └── words
├── word2nvec-c
    ├── .DS_Store
    ├── compute-accuracy.c
    ├── distance.c
    ├── makefile
    ├── word-analogy.c
    ├── word2nvec.c
    └── word2phrase.c
└── words

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/.DS_Store
--------------------------------------------------------------------------------
/bin/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/bin/.DS_Store
--------------------------------------------------------------------------------
/bin/launcher.sh:
--------------------------------------------------------------------------------
1 | #lhy
2 | #2015.4
3 |
4 | ./word2vec -train ../../nnse/text8 -output text8.txt -size 400 -threads 24 -binary 0 -cbow 0 -negative 1 -iter 15
5 |
--------------------------------------------------------------------------------
/bin/make_vectors.py:
--------------------------------------------------------------------------------
1 | #lhy
2 | #2015.5
3 |
4 | import cPickle
5 | import word2vec
6 |
7 | model = word2vec.load("text8.txt")
8 | words = list(model.vocab)
9 | vectors = {}
10 | for i in range(1,len(words)):
11 |     vectors[words[i]] = list(model[words[i]])
12 | print "Vectors ok"
13 |
14 | cPickle.dump(vectors,open("../vectors/vectors",'wb'))
15 | print "Data ok"
16 |
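Note on make_vectors.py: the script targets Python 2 and the third-party `word2vec` package. The same conversion can be approximated in Python 3 by parsing the text-format output directly. This is a minimal sketch, assuming the vectors were written with `-binary 0` (a header line with vocabulary size and dimension, then one word followed by its values per line). `load_text_vectors` is a hypothetical helper that is not part of the repository, and the sample values are made up.

```python
import io
import pickle


def load_text_vectors(handle):
    """Parse text-format vectors: a 'count dim' header, then 'word v1 ... vN' rows."""
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for text8.txt (illustrative values only).
sample = io.StringIO("2 3\nking 0.1 0.0 0.5\nqueen 0.2 0.3 0.0\n")
vectors = load_text_vectors(sample)

# make_vectors.py writes the pickle to ../vectors/vectors; here it stays in memory.
payload = pickle.dumps(vectors)
```

Unlike the loop in make_vectors.py, which starts at index 1 and therefore omits the first vocabulary entry, this sketch keeps every row it can parse.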
--------------------------------------------------------------------------------
/bin/w2v-compute-accuracy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/bin/w2v-compute-accuracy
--------------------------------------------------------------------------------
/bin/w2v-distance:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/bin/w2v-distance
--------------------------------------------------------------------------------
/bin/w2v-word-analogy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/bin/w2v-word-analogy
--------------------------------------------------------------------------------
/bin/word2nvec:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/bin/word2nvec
--------------------------------------------------------------------------------
/bin/word2phrase:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/bin/word2phrase
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
1 | #Interpretable Word Embeddings
2 |
3 | ##Overview
4 | This tool provides a method for learning interpretable word embeddings, based on the OIWE-IPG model from our paper, Online Learning of Interpretable Word Embeddings.
5 |
6 | ##Usage
7 |
8 | ####In Directory word2nvec-c/:
9 | Run "make" to compile the source files in this directory. The OIWE model is implemented in word2nvec.c
10 |
11 | ####In Directory bin/:
12 | The command that runs the OIWE model is in launcher.sh, and its configuration can be modified there. Use "sh launcher.sh" to run the model.
13 |
14 | format:
15 | ./word2vec -train -output -size -threads -binary -cbow 0
16 |
17 | ##Experiments
18 |
19 | ####Data
20 | * text8 corpus from word2vec website: https://code.google.com/p/word2vec/
21 | * Test datasets on http://wordvectors.org
22 |
23 |
24 | ####Word Similarity Task
25 |
26 | | Task name | Word pairs | Pairs found | OIWE-IPG | OIWE-NPG | Skip-gram | RNN |
27 | | ------------- |:-------------:| -----:| ------:| ----------:| ----------:| ----------:|
28 | | WS-353 | 353 | 351 | **0.6415** | 0.5833 | 0.6380 | 0.3675 |
29 | | WS-353-SIM | 203 | 202 | **0.7174** | 0.6371 | 0.6735 | 0.4928 |
30 | | WS-353-REL | 252 | 251 | 0.6032 | 0.5788 | **0.6039** | 0.2925 |
31 | | MC-30 | 30 | 30 | **0.6245** | 0.5245 | 0.5041 | 0.5826 |
32 | | RG-65 | 65 | 65 | **0.5716** | 0.5685 | 0.5049 | 0.5019 |
33 | | Rare-Word | 2034 | 951 | 0.3379 | 0.2544 | 0.3694 | **0.3904** |
34 | | MEN | 3000 | 2987 | **0.5760** | 0.5668 | 0.5256 | 0.4344 |
35 | | MTurk-287 | 287 | 284 | **0.6054** | 0.5350 | 0.5848 | 0.5093 |
36 | | MTurk-771 | 771 | 769 | **0.4981** | 0.4147 | 0.4965 | 0.3990 |
37 | | YP-130 | 139 | 118 | 0.3066 | 0.1808 | 0.2493 | **0.4021** |
38 |
39 |
40 | ####Spearman Coefficient - dimension
41 | | Dimension Number | OIWE | Skip-gram |
42 | | ------------- |:-------------:| -----:|
43 | | 100 | 62.851 | 64.724 |
44 | | 200 | 66.45 | 64.387 |
45 | | 300 | 71.74 | 67.35 |
46 | | 400 | 66.74 | 64.82 |
47 | | 500 | 64.118 | 66.046 |
48 |
49 | ####Word Intrusion
50 | | Model | Precision(%) |
51 | | ------------- |:-------------:|
52 | | Skip-gram | 32.62 |
53 | | NNSE | 92.00 |
54 | | OIWE-NPG | 61.40 |
55 | | OIWE-IPG | 94.80 |
56 |
57 |
--------------------------------------------------------------------------------
/topwords2:
-------------------------------------------------------------------------------- 1 | electrical electric court energy heat electricity 2 2 | among amongst things method classes families 3 3 | robert officially other essay administrative charles 2 4 | blood heart skin body versions brain 4 5 | travel cities sites heritage instrument historic 4 6 | swedish great khan deal alexander napoleon 0 7 | church much bishops churches catholic protestant 1 8 | france belgium minister intel prime becomes 3 9 | used dating earliest history periods origins 0 10 | example canada error problem solution correct 1 11 | online characteristic pair becomes constant similarly 0 12 | does either possibly otherwise household else 0 13 | mean alpha beta here hitler delta 4 14 | drug drugs mathbf alcohol agents substances 2 15 | holiday density celebrated sunday christmas festival 1 16 | month month year calendar gregorian week 1 17 | characterized followed supported convention dominated represented 3 18 | list products lists topics partial linked 1 19 | emperor holy succession kong bishop imperial 3 20 | ring dual plane australia maps inner 3 21 | start esperanto vision outer albert master 4 22 | british citizen colonies subjects going territories 4 23 | clock becomes usual marx derived selected 3 24 | were slaves forced interested family camps 4 25 | replaced supposed originally newly subsequently introduced 1 26 | together relationship count close contact relations 2 27 | males simon females thirty male years 1 28 | force legislative military corps guard task 1 29 | electronics mobile would computers technologies manufacturers 2 30 | also uses meanings section alternative details 4 31 | products gold sugar materials axis mining 4 32 | worked briefly studied whom local attended 4 33 | attempt time same spent extra period 0 34 | equal value york measures minimum length 2 35 | route through sciences paris officially translated 1 36 | ethnic death stalin moment sentence screen 0 37 | calvin 
radio broadcast stations channels channel 0 38 | called york sometimes formerly phenomenon simply 1 39 | would expected likely zero lose otherwise 3 40 | college could enough doing unless anything 0 41 | slightly larger smaller until ones older 3 42 | dates leap gregorian date facts calendar 4 43 | both sides houses system parents hands 3 44 | germany french russian revolution architect physicist 0 45 | since existed changed polish ever then 3 46 | these included received tests typically goals 2 47 | nine governments eight five seven four 1 48 | recent hundreds several past number decades 4 49 | pronounced literally smith district appears county 2 50 | francisco california angeles santa http berkeley 4 51 | online gibraltar newspaper news magazine daily 1 52 | versions version linux license ohio microsoft 4 53 | though intel even occasionally seemed rarely 1 54 | smith thomas dates william joseph howard 2 55 | multiple techniques wide controversy variety types 3 56 | peter paul supporting gospel acts thomas 2 57 | town college illinois boston chicago kansas 0 58 | density miles feet approximately around square 4 59 | learning ring flat moore circumstances ending 1 60 | grand import ireland dublin knights scotland 1 61 | school colleges mean schools student students 2 62 | painter births composer soldier poet architect 1 63 | distinct needed gain required allow sufficient 0 64 | english soldier dating oxford poet philosopher 2 65 | controversy good debate unknown remains exists 1 66 | count duke austria together napoleon sweden 3 67 | never much fully recognized officially confirmed 1 68 | used describe code refer exclusively frequently 2 69 | feet depth resources moses superior highest 3 70 | real attached numbers complex presence absolute 1 71 | press release operation three conference freedom 3 72 | after japanese immediately initial leaving named 1 73 | holiday least meeting request moment look 0 74 | person choose either otherwise beer refer 4 75 | general 
relativity lesser purpose house secretary 4 76 | superior santa around chile maria cuba 2 77 | down bear forth setting break laid 1 78 | counter disease syndrome patients cancer patient 0 79 | henry radio lord edward arthur luther 1 80 | does various including variety kinds numerous 0 81 | swedish mathematician term painter physicist scientist 2 82 | danish lead occur carry reach dangerous 0 83 | between distinction local difference distinguish conflict 2 84 | before louis jean pierre paul painter 0 85 | have seems originated traditionally around claimed 4 86 | ottoman local papers gate spring says 0 87 | kong spelling hong china chinese korea 1 88 | almost roman virtually nearly completely identical 1 89 | well attached hard mouth onto inside 0 90 | zone territorial none claims holidays ports 4 91 | lead bible book commentary biblical canon 0 92 | will continue remain there shall survive 3 93 | point points turning starting replaced view 4 94 | theorem ring integer roll steps board 1 95 | first completed look probably opened established 2 96 | classification characteristics species classified acids features 4 97 | number governments atomic increasing numbers limited 1 98 | states united kingdom though nations citizen 3 99 | austrian vienna friedrich isbn karl carl 3 100 | listing controversy virtually regardless matters nearly 1 101 | they because settled unless received remained 4 102 | paris dates pierre france jean officially 1 103 | some bowl scholars historians critics extent 1 104 | canada canadian ontario would columbia royal 3 105 | online troops soldiers army armies infantry 0 106 | james point moore bond douglas anthony 1 107 | problems effects serious region difficulty severe 3 108 | frac equal three mathbf alpha section 1 109 | bible century centuries twentieth half fifth 0 110 | intel inch without architecture processor core 2 111 | caesar rome calculus romans emperors alexandria 2 112 | moses germany berlin austria nazi hitler 0 113 | christmas 
campus earl soldier duke politician 1 114 | ethnic statement false choice logical statements 0 115 | mother friend husband father recent parents 4 116 | previously already begun spelling earlier been 3 117 | found discovered reported identified human preserved 4 118 | restricted only local normally limited available 2 119 | atari game games indo video animation 3 120 | function formula life operator variable define 2 121 | reading guide reference detailed republic references 4 122 | right left hitler wing behind hand 2 123 | gibraltar mathbf frac begin matrix vector 0 124 | translation latin main comes translated meaning 2 125 | civil places american native revolution revolutionary 1 126 | state heads nation background herself owned 4 127 | weeks minutes spelling hundred months days 2 128 | known formerly better land oldest earliest 3 129 | controversy when referring finally asked occurs 0 130 | rather prize than greater better usual 1 131 | congress office bureau department mean investigation 4 132 | animals fish quality birds cold forest 2 133 | bank beer drink wine meat milk 0 134 | more detail accurate recently bank races 4 135 | lady function mary queen love gave 1 136 | there usually females force currently signs 3 137 | politician poet after writer painter samuel 2 138 | self experience house physical mind consciousness 2 139 | king kings luther governments sons becomes 3 140 | muhammad islamic islam route arabic muslim 3 141 | david douglas author intel alfred smith 3 142 | should must impossible cannot replaced unless 4 143 | single list variety result creating brief 1 144 | perfect verb noun aspect hitler ending 4 145 | made contributions down custom changes progress 2 146 | four five cultural births commodore pages 2 147 | german danish world swiss finnish physicist 2 148 | center charles francis robert darwin alfred 0 149 | hitler tape nazi berlin karl stalin 1 150 | over course sovereignty poems victory concern 3 151 | bell born attached alexander 
biography walter 2 152 | chief poems collected published poem collection 0 153 | making makes make easier canada difficult 4 154 | between center park chicago garden stadium 0 155 | much reaching faster spent bell concerned 4 156 | will spanish brazil mexican chile portuguese 0 157 | carried original away turns broke laid 1 158 | below door table above listed here 1 159 | shall your heard king love want 3 160 | alternative active athens essential important ideal 2 161 | cell level temperatures high temperature peak 0 162 | train railway bridge eric rail road 3 163 | comics fictional monster comic known disney 4 164 | indo languages linguistic saint dialects germanic 3 165 | prime minister leader prize president cabinet 3 166 | rather project entry page site directory 0 167 | going therefore think constantine york teacher 4 168 | well doctor captain james scientist named 0 169 | many film fans amongst indeed consider 1 170 | seem also have offer suggested appear 2 171 | question whether republic explain answer questions 2 172 | with stephen exception associated combined credited 1 173 | missile fighter flight saint aircraft boat 3 174 | mail messages trying message internet send 2 175 | frank richard scott canada russell wilson 3 176 | instrument strings instruments isbn bass tone 3 177 | needed good dream dead beautiful evil 0 178 | which type represents unique actress symbol 4 179 | albert carl friedrich einstein painter karl 4 180 | polish czech poland feet danish hungarian 3 181 | chief senior staff officer going commander 4 182 | seven airlines volume list chapter revolutionary 3 183 | franklin danish benjamin historian richard roger 1 184 | liquid rocks iron kong oxygen carbon 3 185 | look your carried alone looking just 2 186 | through level passing pass moves journey 1 187 | late what know think tell happened 0 188 | until certainly formula receive survive confused 2 189 | single acids bonds acid reactions atom 0 190 | tower wall room mount player building 4 
191 | three child woman children young baby 0 192 | structures athens closely linked structure complex 1 193 | personality driver hold australian musician football 2 194 | worked opera italian piano italy works 0 195 | returned player afterwards died soon moved 1 196 | this rate phenomenon leads situation approach 1 197 | official with website statistics site tourism 1 198 | holidays flat pattern executed child formed 4 199 | such existed mathbf acts artists items 2 200 | meant felt stated believed superior knew 4 201 | being despite communism latter full driven 2 202 | part respectively course consists kind rest 1 203 | genetic gene organisms attached virus evolutionary 3 204 | switzerland elsewhere netherlands function denmark norway 3 205 | question european europe nations continent union 0 206 | human rights watch beings muhammad behavior 4 207 | places countries among areas developing other 2 208 | ethnic indigenous dating tribes peoples minority 2 209 | less mail expensive efficient effective likely 1 210 | company engines ford corporation acquired owner 1 211 | method methods danish testing statistical experimental 2 212 | national unity historic guard paris liberation 4 213 | place polish takes take took taking 1 214 | zero classical approximately estimated meters roughly 1 215 | house lords houses palace cultural castle 4 216 | formula becomes characteristic transfer band conversion 4 217 | theory hypothesis used relativity model mechanics 2 218 | communism free anti radical movements movement 1 219 | center user server interface client users 0 220 | free open church foundation organization source 2 221 | html feet http campbell nasa typically 1 222 | most perhaps user notably highly influential 2 223 | rules require conditions involve ottoman requirements 4 224 | until town prison november march september 1 225 | divine well educated understood trained established 0 226 | herself husband married peter marriage twice 3 227 | commonly widely versions 
generally considered sometimes 2 228 | described viewed regarded danish referred interpreted 3 229 | herself easily obtained understood cannot viewed 0 230 | life intelligent trying lives spent ordinary 2 231 | infinite quality cost performance strength power 0 232 | term refers image phrase referring terms 2 233 | three dimensional least pages volume chapter 2 234 | main demographics politics possible article communications 3 235 | very extremely relatively comics fairly quite 3 236 | simon carl walter suggests joint karl 3 237 | major minor poems important factor role 2 238 | themselves identify tend dating consider accept 3 239 | isbn roosevelt franklin washington graphics coup 0 240 | sexual women gender partner brother relationships 4 241 | danish writer edition norwegian author swedish 2 242 | votes candidate vote translation candidates voting 3 243 | family families household friends rate allies 4 244 | copies worldwide both sold sales selling 2 245 | calvin john century adams davis kennedy 2 246 | their poems lives lose retained heads 1 247 | someone anyone vowel those himself person 2 248 | attempt effort incident intel event interview 3 249 | roosevelt campus university manchester museum cambridge 0 250 | system systems operating convention distributed existing 3 251 | intel processor commodore land architecture bits 3 252 | career followers swedish legacy personal work 2 253 | language caesar dialect esperanto grammar speak 1 254 | fiction irish arts martial science fine 1 255 | king late twentieth ages early beginning 0 256 | management educational technical athens planning research 3 257 | class white upper flag battles arms 4 258 | from apart cultural derived benefit obtained 2 259 | trying japanese actress russian singer finnish 0 260 | became becoming become increasingly learning apparent 4 261 | moses jesus prophet band adam angels 3 262 | ohio mississippi texas fiction fort michigan 3 263 | vowel letter career script vowels alphabet 2 264 | card 
engines cards score players chess 1 265 | distinct separate doctor societies categories divisions 2 266 | ball islands sword horse shot opponent 1 267 | example america north carolina south korea 0 268 | cell management cells proteins protein phase 1 269 | order orders person knight succession traditional 2 270 | tape audio atari compression storage disk 2 271 | painter saint cathedral buried anthony louis 0 272 | infinite sets finite land empty node 3 273 | links external over debt showing website 2 274 | eric dylan singer almost lead musician 3 275 | convention resolution commission german treaty issued 3 276 | http html blood index search internet 2 277 | into entered incorporated turning superior turned 4 278 | moses land area boundaries water total 0 279 | meant axis hence true gives alpha 0 280 | gibraltar cape colony dutch image coast 4 281 | isbn imports system laureate sons abstract 2 282 | hits airports current stars proper nearby 2 283 | loss reduction causing pressure products stress 4 284 | colour color blue orange supposed yellow 4 285 | later renamed eventually almost dropped incorporated 3 286 | other hand each unlike links uses 4 287 | image images pictures translation gallery drawing 3 288 | people living vowel persons killed thousands 2 289 | spelling usage throne pronunciation standard speaking 2 290 | name names changed hence replaced given 4 291 | calculus equations geometry algorithm popularity arithmetic 4 292 | economic economy policy known crisis financial 3 293 | lincoln peace washington freedom name franklin 4 294 | programming lisp painter scheme oriented functional 2 295 | republic independence czech press macedonia democratic 3 296 | ruth mark down jack clark bill 2 297 | star episode show batman cell knight 4 298 | door opposite edge dollar forward straight 3 299 | complete cited ring given complexity grow 2 300 | poor spanish safety health concerns relief 1 301 | marx philosophical though philosophy ethical ethics 2 302 | current 
frequency instruction environment kept content 2 303 | before prior shortly turning york immediately 4 304 | someone offered calling told gave calls 0 305 | varies vary image differ different distinct 2 306 | respectively commodore washington disease java lewis 3 307 | their player hockey basketball football figure 0 308 | cold post declared effort manner vietnam 4 309 | islands edition third volume dictionary cambridge 0 310 | cultural social political albert moral importance 3 311 | trying tried instruction managed unable attempts 2 312 | bear frac black bears genus giant 1 313 | supposed according attributed closer ottoman tend 4 314 | album records albums label medicine recording 4 315 | finalist quarter have douglas semi roger 2 316 | particle particles interaction convention mass motion 3 317 | stephen herbert zero donald edited bibliography 2 318 | below actress musician singer actor composer 0 319 | albert instruction instructions register input memory 0 320 | without dance avoid having changing apparently 1 321 | census ranked according hold index survey 3 322 | scale large management small amounts size 2 323 | throne vowel monarch kings scotland reign 1 324 | family planets planet solar earth observations 0 325 | confused proved might electronics must considered 3 326 | global climate natural isbn environmental change 3 327 | roman troops encyclopedia ancient catholic holidays 1 328 | popularity gained success multiple rapidly reputation 3 329 | didn until exist vision really anything 1 330 | bowl championship league season poor teams 4 331 | pope constantine didn philip civilization becomes 2 332 | long civil short range longest standing 1 333 | weight maximum empty round image inch 4 334 | hall fame complete baseball coach defensive 2 335 | around clock japanese turn locations across 2 336 | most import longer firm break chance 0 337 | dance both jazz music musicians folk 1 338 | zero town city metropolitan downtown village 0 339 | classical tradition 
prize traditions painting artistic 2 340 | hold cannot find feel attempt reason 4 341 | offers james respectively hungarian circle represents 1 342 | supporting golden best offered picture cast 3 343 | lunar apollo part moon summer winter 2 344 | york jersey testament import wave zealand 3 345 | family athens mythology greeks egyptian greek 0 346 | group translation groups task category functional 1 347 | doctor manner essentially thing another similar 0 348 | space spaces compact dimensional australia connected 4 349 | film films movie documentary german hollywood 4 350 | calculus publishing isbn publications books press 0 351 | possible interesting worth such clear note 3 352 | world fifth fourth saint largest third 3 353 | management governments democracy regime labor government 0 354 | perfect kennedy background assassination field extension 0 355 | irish scottish anglo deaths colour welsh 4 356 | region baptism judaism christians prayer christianity 0 357 | code codes domain text they document 4 358 | robert bank international african trade airport 0 359 | medicine writers scientists nobel electronics recipient 4 360 | islands island pacific characterized permanent ocean 3 361 | translation york mexico jersey city opened 0 362 | agave leaves tree plant school trees 4 363 | dollar easily currency billion percentage million 1 364 | michael will alan moore creator producer 1 365 | bush hitler clinton carter presidency roosevelt 1 366 | based focused kennedy emphasis upon depending 2 367 | court below judge trial supreme appeal 1 368 | association sport region federation olympic sports 2 369 | learning band boys song lyrics tour 0 370 | does pronounced necessarily neither seem merely 1 371 | instance purposes responsible replacement clock reasons 4 372 | prize winner nobel winning until award 4 373 | based ottoman rulers dynasty conquered empire 0 374 | four east west africa asia southeast 0 375 | engines guns cars from engine weapon 3 376 | ohio hindu buddhism 
indian korean cuisine 0 377 | original founding distinctive roots http peak 4 378 | language insurance stock funds goods credit 0 379 | zone soviet nato moscow stalin union 0 380 | recently suggested criticized historically louis been 4 381 | suggests argue claim indicates instruction evidence 4 382 | legislative assembly parliament elected door judicial 4 383 | sciences chemistry study canada mathematics psychology 3 384 | highway example newton nation legacy female 1 385 | battles camp kennedy camps wars battle 2 386 | rivers river lake lakes characterized waters 4 387 | received awarded which medal awards award 2 388 | counter around suicide murder against violent 1 389 | affairs ministry controversy secret service public 2 390 | region spanish province provinces regions districts 1 391 | announced charles listing july june february 1 392 | facts about soviet regarding information questions 2 393 | rate population point migration birth statistics 2 394 | laws york copyright criminal legal contract 1 395 | israeli lebanon palestinian iraq france jordan 4 396 | births founded press deaths laureate eight 2 397 | divine spirit heaven hell louis ultimate 4 398 | brother sister younger sons michael wife 4 399 | australia zealand wales australian class england 4 400 | under control could leadership circumstances placed 2 401 |
--------------------------------------------------------------------------------
/vectors/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/vectors/.DS_Store
--------------------------------------------------------------------------------
/vectors/analysis.py:
--------------------------------------------------------------------------------
1 | #lhy
2 | #2015.4
3 |
4 | import cPickle
5 | import random
6 |
7 | class Analysis():
8 |     def __init__(self):
9 |         self.vectors = cPickle.load(open("vectors",'rb'))
10 |         self.words = self.vectors.keys()
11 |         self.scale = len(self.vectors[self.words[0]])  # embedding dimension
12 |         self.dimensions = {}
13 |
14 |     def dimension_analysis(self):
15 |         self.topWords = {}
16 |         for i in range(self.scale):
17 |             self.dimensions[i] = {}
18 |         for i in range(len(self.words)):
19 |             vector = list(self.vectors[self.words[i]])
20 |             for j in range(len(vector)):
21 |                 if vector[j] > 0:  # keep only positive weights
22 |                     self.dimensions[j][self.words[i]] = vector[j]
23 |         for i in range(self.scale):
24 |             self.topWords[i] = []
25 |             dictionary = {}  # weight -> words sharing that weight
26 |             for (key,value) in self.dimensions[i].items():
27 |                 if value not in dictionary:
28 |                     dictionary[value] = [key]
29 |                 else:
30 |                     dictionary[value].append(key)
31 |             index = dictionary.keys()
32 |             index.sort()  # ascending, so the highest-weighted words come last
33 |             for j in range(len(index)):
34 |                 self.topWords[i].extend(dictionary[index[j]])
35 |         print "Top Words ok"
36 |         outHandle = open("words",'w')
37 |         for i in range(len(self.topWords)):
38 |             d = len(self.topWords[i])
39 |             words = self.topWords[i][d - 5:d]  # five highest-weighted words
40 |             location = int(5 * random.random())
41 |             words.insert(location,self.topWords[i][0])  # lowest-weighted word is the intruder
42 |             words.append(str(location))
43 |             string = ' '.join(words)
44 |             outHandle.write(string + '\n')
45 |         outHandle.close()
46 |         print "Data Ready"
47 |
--------------------------------------------------------------------------------
/vectors/analysis.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/vectors/analysis.pyc
--------------------------------------------------------------------------------
/vectors/filt.sh:
--------------------------------------------------------------------------------
1 | #lhy
2 | #2015.5
3 |
4 | python filterVocab.py fullVocab.txt < text8.txt > filtVectors.txt
5 | gzip filtVectors.txt  # writes filtVectors.txt.gz; any extra argument is treated as another input file
6 | cp filtVectors.txt.gz ../../eval-vectors/
7 | cd ../../eval-vectors/
8 | python wordsim.py filtVectors.txt.gz
9 |
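Note on analysis.py: each row it writes (and each row of topwords2) holds the five highest-weighted words of one dimension, one intruder taken from the bottom of that dimension's positive ranking, and the intruder's index as the last field. The construction can be sketched in Python 3 as follows. `intrusion_row` and the toy vectors are hypothetical and not part of the repository. Ties are broken alphabetically here rather than by dictionary order, and the length guard is an addition that the original does not have.

```python
import random


def intrusion_row(vectors, dim, rng, top_n=5):
    """Build one word-intrusion row for dimension `dim`.

    Ranks the words with a positive value on `dim`, takes the `top_n`
    highest, inserts the lowest-ranked word as the intruder at a random
    position, and appends that position as a string.
    """
    positive = [(vec[dim], word) for word, vec in vectors.items() if vec[dim] > 0]
    if len(positive) <= top_n:
        return None  # too few words to form a row
    ranked = [word for _, word in sorted(positive)]  # ascending by value
    row = ranked[-top_n:]
    location = rng.randrange(top_n)  # 0 .. top_n - 1, like int(5 * random.random())
    row.insert(location, ranked[0])
    return row + [str(location)]


# Toy one-dimensional vectors (illustrative values only).
toy = {
    "alpha": [0.9], "beta": [0.8], "gamma": [0.7], "delta": [0.6],
    "epsilon": [0.5], "zeta": [0.1], "eta": [-0.3],
}
row = intrusion_row(toy, 0, random.Random(0))
```

In this toy case `zeta` has the smallest positive weight, so it is the intruder, and `eta` never appears because its weight is negative.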
--------------------------------------------------------------------------------
/vectors/filtVectors.txt.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/vectors/filtVectors.txt.gz
--------------------------------------------------------------------------------
/vectors/filterVocab.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | d = {}
4 | for line in open(sys.argv[1], 'r'):
5 |     d[line.strip()] = 0
6 |
7 | for line in sys.stdin:
8 |     if line.strip().split()[0] in d: print line.strip()
--------------------------------------------------------------------------------
/vectors/fullVocab.txt:
--------------------------------------------------------------------------------
1 | abandon 2 | abandonment 3 | abashed 4 | abbreviate 5 | abdomen 6 | abductor 7 | ability 8 | ablaze 9 | abnormal 10 | abnormality 11 | abortion 12 | about 13 | abraham 14 | absconding 15 | absence 16 | absolute 17 | absorb 18 | absorbance 19 | absorbing 20 | abstract 21 | abstractionist 22 | abundance 23 | abuse 24 | academicism 25 | accelerate 26 | accentuate 27 | accept 28 | acceptable 29 | acceptance 30 | access 31 | accessible 32 | accident 33 | acclaim 34 | accommodation 35 | accommodative 36 | accomplish 37 | accomplished 38 | accomplishments 39 | accordance 40 | account 41 | ache 42 | achieve 43 | acknowledge 44 | acknowledgement 45 | acoustic 46 | acoustical 47 | acoustics 48 | acquire 49 | acquiring 50 | acquisition 51 | acquisitive 52 | acre 53 | acrobat 54 | acronymic 55 | acrylic 56 | act 57 | acting 58 | action 59 | active 60 | activity 61 | actor 62 | actress 63 | actuator 64 | ad 65 | adapt 66 | adaptive 67 | add 68 | addiction 69 | addition 70 | address 71 | adhesion 72 | adhesive 73 | adjournment 74 | adjustment 75 | adjustor 76 | administration 77 | admiralty 78 | admission 79 | admit 80 | 
admittance 81 | admitting 82 | adult 83 | adulteration 84 | advancement 85 | adventism 86 | adventure 87 | adversary 88 | adverse 89 | adversely 90 | advertisement 91 | advertiser 92 | advise 93 | advised 94 | advisory 95 | advocate 96 | aerial 97 | aerialist 98 | aeronautical 99 | aeronautics 100 | affect 101 | affectional 102 | affiliation 103 | affirm 104 | affordable 105 | afghani 106 | afghanistan 107 | afraid 108 | africa 109 | african 110 | afternoon 111 | age 112 | agency 113 | agent 114 | aggression 115 | agitation 116 | agony 117 | agree 118 | agreement 119 | aid 120 | aim 121 | air 122 | aircraft 123 | airplane 124 | airport 125 | airship 126 | aisle 127 | alarm 128 | alarmism 129 | albert 130 | alcohol 131 | algebra 132 | algebraic 133 | algebraist 134 | algebras 135 | algorithm 136 | alien 137 | allergic 138 | alleviated 139 | alley 140 | allow 141 | alloy 142 | allurement 143 | alphabet 144 | alt 145 | alter 146 | alternative 147 | aluminum 148 | amateur 149 | amateurish 150 | amazings 151 | ambitious 152 | american 153 | americanize 154 | amethysts 155 | amorphous 156 | amount 157 | amounted 158 | amphibian 159 | amusement 160 | amusements 161 | anaesthetics 162 | analogize 163 | analogous 164 | analyze 165 | analyzed 166 | anamorphosis 167 | anarchist 168 | anarchy 169 | anatomy 170 | anchor 171 | ancient 172 | angel 173 | anger 174 | angle 175 | angola 176 | angrier 177 | angry 178 | angular 179 | animal 180 | animalism 181 | animality 182 | animalize 183 | ankle 184 | annihilator 185 | anniversary 186 | announce 187 | announced 188 | announcement 189 | annoy 190 | answer 191 | ant 192 | antagonist 193 | antagonize 194 | antecedent 195 | antechamber 196 | antedating 197 | anterooms 198 | anticancer 199 | anticyclones 200 | antifeminism 201 | antifeminist 202 | antipsychotic 203 | antipsychotics 204 | antisubmarine 205 | antitoxic 206 | antitumor 207 | antonymous 208 | anxiety 209 | anxious 210 | apartment 211 | apocalyptical 212 | apolitical 213 | 
apollo 214 | apologize 215 | apparel 216 | apparent 217 | apparition 218 | appear 219 | appearance 220 | appearances 221 | append 222 | applaud 223 | apple 224 | appliance 225 | apply 226 | appoint 227 | appointment 228 | appraisal 229 | apprenticeship 230 | approach 231 | approachable 232 | approval 233 | approve 234 | approved 235 | approving 236 | aquarium 237 | aqueous 238 | arab 239 | arafat 240 | arbitrary 241 | arc 242 | arch 243 | archbishop 244 | archery 245 | architecture 246 | archive 247 | area 248 | argentina 249 | argue 250 | argument 251 | arithmetic 252 | arm 253 | armor 254 | army 255 | aroma 256 | arousal 257 | arouse 258 | arrange 259 | arrangement 260 | arrival 261 | arrive 262 | arrow 263 | art 264 | article 265 | artifact 266 | artillery 267 | artist 268 | artlessness 269 | ascendence 270 | asexual 271 | ashamed 272 | ashes 273 | asia 274 | asian 275 | ask 276 | aspect 277 | aspen 278 | asphalt 279 | asphaltic 280 | aspirate 281 | ass 282 | assassinated 283 | assassination 284 | assaulted 285 | assay 286 | assembly 287 | assessment 288 | assessments 289 | assets 290 | assign 291 | assigned 292 | assignment 293 | assimilate 294 | assist 295 | assistance 296 | assistances 297 | associate 298 | association 299 | associational 300 | associations 301 | assume 302 | asteroidal 303 | astonish 304 | astringe 305 | astronautical 306 | astronomer 307 | astronomical 308 | asylum 309 | athens 310 | athlete 311 | athletics 312 | atmosphere 313 | atom 314 | atomic 315 | attach 316 | attachment 317 | attack 318 | attacker 319 | attackers 320 | attainment 321 | attempt 322 | attend 323 | attendance 324 | attendances 325 | attention 326 | attested 327 | attitude 328 | attorney 329 | attraction 330 | attractor 331 | attributable 332 | attribute 333 | attributions 334 | auditive 335 | august 336 | aunt 337 | author 338 | authority 339 | authorize 340 | authorship 341 | auto 342 | autobiographer 343 | autobiographies 344 | autobuses 345 | autocracy 346 | 
autoerotic 347 | autofocus 348 | autograft 349 | autografts 350 | autograph 351 | autographic 352 | autoimmune 353 | autoimmunity 354 | autoloading 355 | automate 356 | automates 357 | automatic 358 | automobile 359 | autopilot 360 | autopilots 361 | autoregulation 362 | autosuggestion 363 | autumn 364 | available 365 | avenue 366 | average 367 | aviation 368 | avoid 369 | awards 370 | awareness 371 | awful 372 | awkward 373 | baby 374 | bachelors 375 | back 376 | backwardness 377 | bacon 378 | bad 379 | badge 380 | badness 381 | bag 382 | baggers 383 | bail 384 | bait 385 | bakery 386 | balance 387 | balanced 388 | ball 389 | ballad 390 | balloon 391 | ballots 392 | balminess 393 | banana 394 | band 395 | bangkok 396 | banished 397 | bank 398 | banker 399 | bankruptcy 400 | banquet 401 | baptistic 402 | bar 403 | barn 404 | barrel 405 | base 406 | baseball 407 | baseness 408 | basic 409 | basin 410 | basket 411 | basketball 412 | bastardize 413 | baste 414 | bath 415 | bathroom 416 | battened 417 | battered 418 | battle 419 | battleship 420 | battleships 421 | bay 422 | beach 423 | bead 424 | beam 425 | bean 426 | bear 427 | beard 428 | beast 429 | beastly 430 | beat 431 | beautiful 432 | beauty 433 | become 434 | bed 435 | bedroom 436 | bee 437 | beef 438 | beer 439 | beethoven 440 | beetle 441 | beg 442 | begin 443 | beginner 444 | beginning 445 | behave 446 | behavior 447 | behaviorist 448 | behavioural 449 | beijing 450 | being 451 | belief 452 | believe 453 | believing 454 | bell 455 | belligerence 456 | bellowing 457 | belly 458 | belt 459 | bench 460 | benchmark 461 | bend 462 | bendability 463 | benefactor 464 | benefited 465 | bengali 466 | berlin 467 | berry 468 | besieging 469 | bestowal 470 | bestowals 471 | beverage 472 | bewitchment 473 | bias 474 | bible 475 | bibliographies 476 | bicycle 477 | big 478 | bike 479 | bikini 480 | bill 481 | billboard 482 | billion 483 | bin 484 | binary 485 | binging 486 | biographer 487 | biography 488 | biology 489 
| bird 490 | birds 491 | birth 492 | birthday 493 | bishop 494 | bishops 495 | bit 496 | bite 497 | bitter 498 | bizarre 499 | black 500 | blacken 501 | blackmailed 502 | blade 503 | blanket 504 | blaze 505 | bleach 506 | blend 507 | blessing 508 | blithering 509 | blitzed 510 | blizzard 511 | block 512 | blonde 513 | blood 514 | bloom 515 | blossom 516 | blow 517 | blue 518 | bluejacket 519 | blues 520 | blunders 521 | blur 522 | blurting 523 | board 524 | boardwalk 525 | boast 526 | boastful 527 | boat 528 | bobbers 529 | bodily 530 | body 531 | bohemia 532 | boil 533 | boisterously 534 | bold 535 | boldness 536 | bond 537 | bone 538 | book 539 | booklet 540 | boot 541 | booth 542 | bootless 543 | border 544 | born 545 | borrow 546 | boston 547 | bottle 548 | bottom 549 | bounce 550 | bounced 551 | boundary 552 | bowl 553 | box 554 | boxer 555 | boxing 556 | boy 557 | brace 558 | bracelet 559 | brag 560 | braid 561 | brain 562 | brained 563 | brainless 564 | brake 565 | branch 566 | brand 567 | brand-newness 568 | brandish 569 | brandy 570 | brass 571 | brave 572 | brazil 573 | bread 574 | break 575 | breakfast 576 | breathe 577 | breather 578 | breed 579 | brick 580 | bride 581 | bridge 582 | bridges 583 | brigadier 584 | bright 585 | brightly 586 | brightness 587 | bring 588 | brisker 589 | britain 590 | brittany 591 | broad 592 | broadcast 593 | broadcasters 594 | broadcasting 595 | brochure 596 | brokerage 597 | bronchus 598 | bronze 599 | brood 600 | brother 601 | brotherhood 602 | brow 603 | brown 604 | bruise 605 | brush 606 | brussels 607 | bubble 608 | buck 609 | bucket 610 | bud 611 | buddy 612 | budget 613 | buffer 614 | bug 615 | buggered 616 | build 617 | builders 618 | building 619 | bulb 620 | bulgarian 621 | bull 622 | bulletin 623 | bun 624 | bunch 625 | bunking 626 | bunny 627 | bureaucrat 628 | burger 629 | burial 630 | burn 631 | burned 632 | burning 633 | burst 634 | burying 635 | bus 636 | bush 637 | business 638 | businessperson 639 | 
butter 640 | butterfly 641 | button 642 | buy 643 | cab 644 | cabbage 645 | cabin 646 | cache 647 | cactus 648 | caesarism 649 | cafe 650 | cage 651 | cake 652 | calcify 653 | calculate 654 | calculation 655 | calendar 656 | calf 657 | caliper 658 | call 659 | calmness 660 | camel 661 | camera 662 | camp 663 | campaign 664 | campaigning 665 | campfires 666 | can 667 | canada 668 | canal 669 | cancer 670 | candidate 671 | candle 672 | candy 673 | canine 674 | canker 675 | cannon 676 | canonical 677 | canonize 678 | canvass 679 | canyon 680 | cap 681 | capability 682 | capital 683 | capitalised 684 | capitalism 685 | capitation 686 | captain 687 | capture 688 | car 689 | caramelize 690 | carbon 691 | carbonate 692 | carbonic 693 | carburettors 694 | card 695 | cardboard 696 | cardinality 697 | cardinals 698 | care 699 | career 700 | careerism 701 | carefreeness 702 | careful 703 | cargo 704 | carnival 705 | carnivore 706 | carpet 707 | carriage 708 | carrier 709 | carrot 710 | carry 711 | cart 712 | cartoon 713 | case 714 | casey 715 | cash 716 | cast 717 | casteless 718 | castle 719 | castled 720 | cat 721 | catalogued 722 | catastrophe 723 | catch 724 | categorization 725 | category 726 | caterpillar 727 | cathedral 728 | catholics 729 | cattle 730 | causative 731 | causing 732 | cautious 733 | cave 734 | cd 735 | ceaseless 736 | ceiling 737 | celebration 738 | cell 739 | cellar 740 | cement 741 | cemetery 742 | censoring 743 | censorship 744 | censorships 745 | cent 746 | center 747 | century 748 | ceramic 749 | ceramicist 750 | cereal 751 | ceremonious 752 | ceremony 753 | certain 754 | certificate 755 | cession 756 | chain 757 | chair 758 | chairmanship 759 | challenge 760 | chamber 761 | champagne 762 | champion 763 | championship 764 | chance 765 | chandler 766 | change 767 | changeableness 768 | channel 769 | chaos 770 | chapel 771 | chapter 772 | character 773 | characteristic 774 | characterless 775 | characters 776 | charcoal 777 | charge 778 | charged 779 
| charm 780 | charming 781 | chatter 782 | chauvinist 783 | cheap 784 | cheapen 785 | cheat 786 | check 787 | cheek 788 | cheerful 789 | cheerleader 790 | cheers 791 | cheese 792 | cheetah 793 | chef 794 | chemical 795 | chemistry 796 | cherish 797 | cherry 798 | chess 799 | chest 800 | chick 801 | chicken 802 | chief 803 | child 804 | childhood 805 | childish 806 | children 807 | chile 808 | chin 809 | china 810 | chip 811 | chipmunk 812 | chloride 813 | chocolate 814 | choice 815 | choir 816 | choke 817 | choose 818 | chooses 819 | chopper 820 | chord 821 | christianise 822 | christianity 823 | christmas 824 | chromatic 825 | chronicle 826 | chronologize 827 | chuck 828 | church 829 | churchs 830 | cigar 831 | cigarette 832 | circle 833 | circularize 834 | circulate 835 | circulation 836 | circumcising 837 | circumcision 838 | circumcisions 839 | circumference 840 | circumferential 841 | circumnavigate 842 | circumnavigations 843 | circumpolar 844 | circumscribed 845 | circumscribes 846 | circumspect 847 | circumstance 848 | circumstances 849 | circumvent 850 | circumvented 851 | circumventing 852 | circumvents 853 | circumvolution 854 | citizen 855 | citizenship 856 | citrus 857 | city 858 | civilise 859 | clairvoyant 860 | clamor 861 | clamorous 862 | clarify 863 | clash 864 | class 865 | classicist 866 | classics 867 | classification 868 | classify 869 | classroom 870 | clean 871 | clergyman 872 | clericalism 873 | clever 874 | click 875 | clients 876 | cliff 877 | cliffhanger 878 | climb 879 | clinic 880 | clip 881 | clock 882 | close 883 | closet 884 | cloth 885 | clothes 886 | cloud 887 | clown 888 | clownish 889 | clozapine 890 | club 891 | clubhouse 892 | clue 893 | coach 894 | coal 895 | coast 896 | coat 897 | cock 898 | cockerel 899 | cocktail 900 | cocoon 901 | code 902 | codefendants 903 | coeducation 904 | coefficient 905 | coerce 906 | cofactor 907 | cofactors 908 | coffee 909 | cofounder 910 | cofounders 911 | cognition 912 | cognizance 913 | coil 
914 | coiled 915 | coin 916 | coinsurance 917 | cold 918 | collage 919 | collapse 920 | collar 921 | collect 922 | collected 923 | collection 924 | college 925 | collision 926 | colonel 927 | colonies 928 | colonise 929 | color 930 | color-blind 931 | colored 932 | colorful 933 | coloring 934 | colour 935 | colt 936 | column 937 | combatted 938 | combination 939 | combust 940 | combusted 941 | combusting 942 | combustion 943 | combusts 944 | come 945 | comfort 946 | comfortable 947 | comfortless 948 | comma 949 | command 950 | commander 951 | commandership 952 | comment 953 | commenting 954 | commerce 955 | commercialize 956 | commingle 957 | commingled 958 | commission 959 | commissions 960 | commit 961 | commitment 962 | commode 963 | commodes 964 | common 965 | communicate 966 | communication 967 | communicativeness 968 | communicator 969 | communistic 970 | community 971 | commutation 972 | commuters 973 | companion 974 | companionships 975 | company 976 | compare 977 | comparing 978 | comparison 979 | compartmentalization 980 | compatibility 981 | compel 982 | competence 983 | competes 984 | competition 985 | complain 986 | complimentary 987 | comport 988 | comportment 989 | compose 990 | composed 991 | composer 992 | composers 993 | composure 994 | compound 995 | comprehend 996 | comprehensible 997 | comprehensive 998 | computation 999 | computer 1000 | con 1001 | concavity 1002 | concentration 1003 | concept 1004 | concerning 1005 | concert 1006 | concerti 1007 | concerto 1008 | concerts 1009 | conclude 1010 | conclusion 1011 | conclusive 1012 | concoct 1013 | concordance 1014 | concrete 1015 | concreteness 1016 | concurrence 1017 | concurrencies 1018 | concurrency 1019 | condemned 1020 | condensing 1021 | condescend 1022 | condescended 1023 | condition 1024 | conditional 1025 | conditions 1026 | conductance 1027 | conductive 1028 | cone 1029 | confess 1030 | confession 1031 | confide 1032 | confidence 1033 | confident 1034 | confine 1035 | confinement 1036 
| confinements 1037 | confirmable 1038 | conflagration 1039 | conflict 1040 | confluent 1041 | conformations 1042 | conformism 1043 | conformity 1044 | confusion 1045 | congeniality 1046 | congress 1047 | congruity 1048 | conjecture 1049 | conjoins 1050 | conjurors 1051 | connect 1052 | connectedness 1053 | connection 1054 | connoting 1055 | conquest 1056 | conscientious 1057 | conscientiousness 1058 | consciousness 1059 | conscripting 1060 | consequences 1061 | conservation 1062 | considerable 1063 | consign 1064 | consigning 1065 | consonant 1066 | conspicuousness 1067 | conspiracy 1068 | constancy 1069 | constant 1070 | constellation 1071 | constitute 1072 | constitution 1073 | constitutive 1074 | constrict 1075 | construct 1076 | construction 1077 | consubstantial 1078 | consultive 1079 | consume 1080 | consumer 1081 | consumptive 1082 | contact 1083 | contagious 1084 | container 1085 | containerful 1086 | containers 1087 | containership 1088 | contemplate 1089 | contend 1090 | content 1091 | contest 1092 | continence 1093 | continent 1094 | continuance 1095 | continue 1096 | continuous 1097 | continuously 1098 | contortionists 1099 | contrabands 1100 | contraception 1101 | contract 1102 | contraries 1103 | contrarily 1104 | contrastive 1105 | contravene 1106 | contravened 1107 | contrive 1108 | control 1109 | controversy 1110 | convector 1111 | convene 1112 | convergent 1113 | conversation 1114 | converse 1115 | conversely 1116 | convert 1117 | convertible 1118 | convict 1119 | convocation 1120 | cook 1121 | cookie 1122 | cooking 1123 | cooperate 1124 | cooperation 1125 | cooperator 1126 | cooperators 1127 | coordinator 1128 | cop 1129 | copartnership 1130 | copilot 1131 | copilots 1132 | copper 1133 | copulate 1134 | copy 1135 | copying 1136 | cord 1137 | corespondent 1138 | corner 1139 | corporation 1140 | corpulence 1141 | corral 1142 | correlate 1143 | correspondence 1144 | corridor 1145 | corrode 1146 | corroding 1147 | corrupt 1148 | corruption 1149 | 
corruptive 1150 | cosigned 1151 | cosigns 1152 | cosponsoring 1153 | cosponsors 1154 | cost 1155 | costume 1156 | costumes 1157 | cottage 1158 | cotton 1159 | couch 1160 | council 1161 | count 1162 | counter 1163 | counterfeit 1164 | country 1165 | couple 1166 | coupling 1167 | course 1168 | court 1169 | courteous 1170 | cousin 1171 | covariant 1172 | cover 1173 | covered 1174 | covering 1175 | covert 1176 | cow 1177 | cowboys 1178 | cowered 1179 | cozy 1180 | crab 1181 | crack 1182 | cradle 1183 | craft 1184 | craftsman 1185 | crane 1186 | crash 1187 | crazy 1188 | create 1189 | creation 1190 | creative 1191 | creativity 1192 | creator 1193 | creature 1194 | credentials 1195 | credibility 1196 | credit 1197 | creek 1198 | crew 1199 | crib 1200 | crier 1201 | crime 1202 | crisis 1203 | crispness 1204 | criterion 1205 | critic 1206 | critical 1207 | criticality 1208 | criticism 1209 | crochet 1210 | crop 1211 | cross 1212 | cross-index 1213 | cross-link 1214 | crosswise 1215 | crouch 1216 | crow 1217 | crowd 1218 | crowded 1219 | crown 1220 | crucial 1221 | crude 1222 | crudeness 1223 | cruel 1224 | crusaders 1225 | crush 1226 | cry 1227 | crystal 1228 | crystalline 1229 | cube 1230 | cucumber 1231 | cuddle 1232 | cuisine 1233 | cultist 1234 | culture 1235 | cup 1236 | curator 1237 | curious 1238 | curl 1239 | currency 1240 | current 1241 | curtain 1242 | curvature 1243 | curve 1244 | curved 1245 | cushion 1246 | customers 1247 | customise 1248 | cut 1249 | cute 1250 | cuteness 1251 | cutter 1252 | cylinder 1253 | cylindric 1254 | cylindrical 1255 | cynical 1256 | cynically 1257 | dad 1258 | daffodil 1259 | daisy 1260 | dam 1261 | damage 1262 | damages 1263 | dance 1264 | dancer 1265 | dandelion 1266 | danger 1267 | dangerous 1268 | dark 1269 | dash 1270 | dashboard 1271 | database 1272 | date 1273 | daughter 1274 | dawn 1275 | day 1276 | dazzle 1277 | dead 1278 | deadness 1279 | deal 1280 | death 1281 | debarred 1282 | debt 1283 | decade 1284 | decapitated 1285 | 
decay 1286 | deceitful 1287 | deceive 1288 | deceiver 1289 | decelerate 1290 | decide 1291 | deciphering 1292 | decision 1293 | deck 1294 | declare 1295 | decomposition 1296 | decompositions 1297 | deconstruct 1298 | decorate 1299 | decoration 1300 | decrease 1301 | deduce 1302 | deep 1303 | deer 1304 | deface 1305 | defame 1306 | defeat 1307 | defeating 1308 | defeatist 1309 | defecation 1310 | defend 1311 | defensive 1312 | defiant 1313 | deficit 1314 | defiles 1315 | definition 1316 | deflate 1317 | deflowering 1318 | deforming 1319 | deformity 1320 | defrauding 1321 | defrayed 1322 | degree 1323 | deity 1324 | delay 1325 | deletion 1326 | delight 1327 | delightful 1328 | delimitations 1329 | delimited 1330 | deliver 1331 | delivery 1332 | demand 1333 | demanded 1334 | dematerialised 1335 | demerit 1336 | democratize 1337 | demolition 1338 | demon 1339 | demonstrate 1340 | demoralise 1341 | demureness 1342 | denial 1343 | denominate 1344 | dense 1345 | density 1346 | dentist 1347 | deny 1348 | department 1349 | departure 1350 | dependence 1351 | dependent 1352 | depict 1353 | depictive 1354 | deployment 1355 | depopulate 1356 | deposes 1357 | deposit 1358 | depreciate 1359 | depression 1360 | depressor 1361 | deprive 1362 | depth 1363 | deputy 1364 | deregulating 1365 | descend 1366 | descent 1367 | describe 1368 | desert 1369 | deserters 1370 | desertion 1371 | deserve 1372 | desiccating 1373 | design 1374 | designed 1375 | designs 1376 | desirable 1377 | desire 1378 | desk 1379 | despair 1380 | despoil 1381 | dessert 1382 | destabilization 1383 | destroy 1384 | destroyers 1385 | destruction 1386 | detail 1387 | detectable 1388 | determination 1389 | determine 1390 | detestable 1391 | devalue 1392 | develop 1393 | development 1394 | developments 1395 | deviationism 1396 | device 1397 | devil 1398 | devilish 1399 | devilishly 1400 | deviously 1401 | devise 1402 | dew 1403 | diagonal 1404 | diagonals 1405 | dialogue 1406 | diamond 1407 | diarrhea 1408 | dice 1409 
| dictatorship 1410 | dictionary 1411 | diet 1412 | differ 1413 | difference 1414 | differences 1415 | different 1416 | differentia 1417 | difficult 1418 | difficulty 1419 | diffidence 1420 | dig 1421 | digit 1422 | digitise 1423 | dignity 1424 | dilute 1425 | dimensional 1426 | diner 1427 | dinner 1428 | direct 1429 | direction 1430 | directional 1431 | directionless 1432 | dirt 1433 | dirty 1434 | disability 1435 | disabused 1436 | disadvantaged 1437 | disagree 1438 | disappear 1439 | disapproving 1440 | disarranged 1441 | disassembled 1442 | disassociates 1443 | disaster 1444 | disavowed 1445 | disbelieving 1446 | disc 1447 | discard 1448 | discernment 1449 | discharge 1450 | discharged 1451 | discipleship 1452 | discipline 1453 | disclosure 1454 | discolor 1455 | discontinuance 1456 | discontinuous 1457 | discordance 1458 | discountenance 1459 | discounters 1460 | discourage 1461 | discourteous 1462 | discover 1463 | discovery 1464 | discoverys 1465 | discrete 1466 | discriminate 1467 | discriminating 1468 | discrimination 1469 | discriminatory 1470 | discuss 1471 | discussion 1472 | disease 1473 | disembodied 1474 | disengages 1475 | disestablishing 1476 | disfavor 1477 | disfavoring 1478 | disfigure 1479 | disgorge 1480 | disgruntle 1481 | disguise 1482 | disgust 1483 | disgusting 1484 | dish 1485 | dishonest 1486 | disinflation 1487 | disinheritance 1488 | disinvestment 1489 | disjoined 1490 | disjunct 1491 | disk 1492 | disloyal 1493 | disorderly 1494 | disorganize 1495 | disperse 1496 | dispersive 1497 | display 1498 | displeased 1499 | displeases 1500 | disposition 1501 | dispossess 1502 | disprove 1503 | disquieting 1504 | disrespectful 1505 | dissatisfying 1506 | dissenter 1507 | dissenters 1508 | dissimulate 1509 | dissipate 1510 | dissociable 1511 | dissociations 1512 | dissolved 1513 | dissonance 1514 | distance 1515 | distillate 1516 | distinction 1517 | distinguish 1518 | distinguishing 1519 | distress 1520 | distressful 1521 | distribute 1522 | 
distribution 1523 | distributive 1524 | distributor 1525 | district 1526 | distrust 1527 | distrustful 1528 | disturbance 1529 | disturbances 1530 | disturbing 1531 | diver 1532 | diversion 1533 | divide 1534 | divided 1535 | dividend 1536 | diving 1537 | divisible 1538 | division 1539 | do 1540 | docile 1541 | dock 1542 | doctor 1543 | doctrine 1544 | document 1545 | dog 1546 | doll 1547 | dollar 1548 | domain 1549 | dome 1550 | dominance 1551 | dominate 1552 | donate 1553 | donkey 1554 | donut 1555 | doodle 1556 | dooming 1557 | door 1558 | doorway 1559 | doubt 1560 | down 1561 | downtown 1562 | draft 1563 | drafting 1564 | dragon 1565 | dragonfly 1566 | drain 1567 | drama 1568 | draw 1569 | drawer 1570 | drawers 1571 | drawing 1572 | dreadnought 1573 | dreary 1574 | dress 1575 | dressing 1576 | drill 1577 | drink 1578 | drip 1579 | drive 1580 | driver 1581 | driving 1582 | drizzle 1583 | drop 1584 | droplet 1585 | drought 1586 | drownings 1587 | drug 1588 | drum 1589 | dry 1590 | dryer 1591 | duck 1592 | dude 1593 | duel 1594 | dull 1595 | dumb 1596 | duplicable 1597 | dusk 1598 | dutch 1599 | duty 1600 | dwarfish 1601 | dye 1602 | dynastic 1603 | dysentery 1604 | eagle 1605 | ear 1606 | early 1607 | earmuffs 1608 | earn 1609 | earning 1610 | earth 1611 | earthquake 1612 | ease 1613 | easter 1614 | easy 1615 | eat 1616 | eavesdropper 1617 | ebb 1618 | ecclesiastic 1619 | ecology 1620 | economic 1621 | economist 1622 | ecuador 1623 | edgeless 1624 | edible 1625 | editing 1626 | editor 1627 | educate 1628 | education 1629 | effected 1630 | effectiveness 1631 | effort 1632 | egg 1633 | ego 1634 | eight 1635 | einstein 1636 | ejector 1637 | elaborate 1638 | elastic 1639 | elated 1640 | elbow 1641 | eldership 1642 | elect 1643 | election 1644 | electioneering 1645 | elector 1646 | electrical 1647 | electronics 1648 | elegance 1649 | elegant 1650 | element 1651 | elephant 1652 | elevator 1653 | embellishment 1654 | embezzle 1655 | embody 1656 | embracement 1657 | 
embroiderer 1658 | embroiderers 1659 | embroideress 1660 | embroideries 1661 | emerald 1662 | emergency 1663 | emigration 1664 | emission 1665 | emotion 1666 | emotional 1667 | emotionalism 1668 | empirical 1669 | employable 1670 | employed 1671 | employee 1672 | employer 1673 | employments 1674 | empower 1675 | empty 1676 | emulsify 1677 | emulsifying 1678 | encamping 1679 | encapsulate 1680 | enchantress 1681 | encoded 1682 | encourage 1683 | encouragement 1684 | encouraging 1685 | encroachments 1686 | encrust 1687 | encrusted 1688 | encyclopaedic 1689 | encyclopedia 1690 | end 1691 | endangerment 1692 | endorse 1693 | endorsement 1694 | endurable 1695 | endurance 1696 | energetic 1697 | energized 1698 | energy 1699 | enfolded 1700 | enfolding 1701 | enforcements 1702 | enforcing 1703 | engage 1704 | engagement 1705 | engine 1706 | engineering 1707 | england 1708 | english 1709 | engorge 1710 | engrave 1711 | enhancement 1712 | enjoining 1713 | enjoins 1714 | enjoy 1715 | enlarge 1716 | enlarger 1717 | enlist 1718 | enliven 1719 | ennobled 1720 | enroll 1721 | enrollment 1722 | enshrouded 1723 | enslaves 1724 | enter 1725 | enterprise 1726 | entertain 1727 | entertainer 1728 | enthuse 1729 | enthusiast 1730 | enthusiastic 1731 | entity 1732 | entombment 1733 | entrance 1734 | entrapped 1735 | entrapping 1736 | entraps 1737 | entreaty 1738 | entrench 1739 | entrust 1740 | entwined 1741 | enunciated 1742 | enunciates 1743 | enunciating 1744 | envelop 1745 | environment 1746 | epicure 1747 | episcopal 1748 | episode 1749 | equality 1750 | equation 1751 | equatorial 1752 | equip 1753 | equipment 1754 | equivalence 1755 | era 1756 | eroticism 1757 | error 1758 | eruptive 1759 | escalator 1760 | espionage 1761 | essay 1762 | essential 1763 | established 1764 | establishment 1765 | estrogenic 1766 | ethnic 1767 | evacuated 1768 | evaluate 1769 | evangelicalism 1770 | evangelistic 1771 | evangelize 1772 | evaporate 1773 | even 1774 | evening 1775 | event 1776 | eventful 
1777 | evidence 1778 | evidently 1779 | evil 1780 | evolution 1781 | exacerbated 1782 | exacted 1783 | exaction 1784 | examination 1785 | examine 1786 | examiner 1787 | example 1788 | excavations 1789 | exceedance 1790 | excellent 1791 | exceptionally 1792 | excessive 1793 | exchange 1794 | exchangeable 1795 | excitation 1796 | excitations 1797 | excitedly 1798 | excitements 1799 | exciting 1800 | exclaiming 1801 | exclamation 1802 | exclusive 1803 | excommunicate 1804 | excommunicated 1805 | excrete 1806 | excretion 1807 | execute 1808 | executive 1809 | exemplify 1810 | exempt 1811 | exhibit 1812 | exhibited 1813 | exhibition 1814 | exile 1815 | exist 1816 | exit 1817 | exotic 1818 | expand 1819 | expansion 1820 | expect 1821 | expedition 1822 | expel 1823 | expensiveness 1824 | experience 1825 | experimenter 1826 | expert 1827 | explain 1828 | explanation 1829 | explicit 1830 | exploitation 1831 | exploitive 1832 | explore 1833 | explorer 1834 | explorers 1835 | explosion 1836 | explosive 1837 | exporters 1838 | expose 1839 | expound 1840 | expounded 1841 | expounding 1842 | express 1843 | expressible 1844 | expressionless 1845 | extended 1846 | extendible 1847 | extending 1848 | extension 1849 | exterminate 1850 | exterminated 1851 | exterminator 1852 | extinguish 1853 | extort 1854 | extract 1855 | extractor 1856 | extracts 1857 | extrajudicial 1858 | extraordinary 1859 | extrapolations 1860 | extrasensory 1861 | extraterrestrial 1862 | extraterrestrials 1863 | extraterritorial 1864 | extraversion 1865 | extravert 1866 | extroversive 1867 | eye 1868 | fabric 1869 | fabricate 1870 | face 1871 | facilitation 1872 | fact 1873 | factory 1874 | fail 1875 | fair 1876 | faith 1877 | fall 1878 | falsifier 1879 | fame 1880 | family 1881 | fancy 1882 | fantasist 1883 | fantasy 1884 | farley 1885 | farm 1886 | farmer 1887 | fascinate 1888 | fashion 1889 | fashionable 1890 | fast 1891 | fasten 1892 | fastness 1893 | father 1894 | fault 1895 | fauna 1896 | favourable 1897 
| faze 1898 | fbi 1899 | fear 1900 | feather 1901 | feature 1902 | federalize 1903 | fee 1904 | feedback 1905 | feel 1906 | feeling 1907 | feet 1908 | feline 1909 | female 1910 | feminised 1911 | fence 1912 | ferry 1913 | fertility 1914 | festival 1915 | fetishism 1916 | feudalism 1917 | fever 1918 | fiction 1919 | fictitiously 1920 | fiddled 1921 | field 1922 | fieldworker 1923 | fight 1924 | fighter 1925 | fighting 1926 | figurative 1927 | figure 1928 | filing 1929 | fill 1930 | film 1931 | finality 1932 | finance 1933 | find 1934 | finger 1935 | fingerprint 1936 | finish 1937 | fire 1938 | fireproof 1939 | firework 1940 | fish 1941 | fishing 1942 | five 1943 | fixture 1944 | flag 1945 | flame 1946 | flamingo 1947 | flash 1948 | flat 1949 | flattery 1950 | flatulence 1951 | flaunt 1952 | flavor 1953 | flavourful 1954 | flee 1955 | fleeing 1956 | flesh 1957 | fleshiness 1958 | flexible 1959 | flicker 1960 | flight 1961 | flighted 1962 | flightless 1963 | flim-flam 1964 | float 1965 | flood 1966 | floor 1967 | flora 1968 | florescence 1969 | flour 1970 | flow 1971 | flower 1972 | flu 1973 | fluidity 1974 | flush 1975 | flute 1976 | fly 1977 | flyer 1978 | focus 1979 | fog 1980 | foliage 1981 | follow 1982 | followed 1983 | follower 1984 | food 1985 | foolish 1986 | foot 1987 | football 1988 | footballers 1989 | footprint 1990 | forbes 1991 | forbid 1992 | force 1993 | forceps 1994 | ford 1995 | fording 1996 | forecast 1997 | foreclosed 1998 | foreign 1999 | foreigner 2000 | foreigners 2001 | forest 2002 | foresters 2003 | forged 2004 | forget 2005 | forgive 2006 | form 2007 | formal 2008 | formalisms 2009 | format 2010 | formation 2011 | formations 2012 | formula 2013 | forswearing 2014 | forthcoming 2015 | fortitude 2016 | fossil 2017 | foul 2018 | foundation 2019 | founder 2020 | fountain 2021 | fox 2022 | fractionate 2023 | fractures 2024 | fragile 2025 | fragmentation 2026 | fragrance 2027 | frame 2028 | framework 2029 | france 2030 | fraternity 2031 | fraud 
2032 | freakishly 2033 | freeze 2034 | freighter 2035 | french 2036 | frequency 2037 | fresh 2038 | freshen 2039 | freshness 2040 | fret 2041 | freud 2042 | friedman 2043 | friend 2044 | friendliness 2045 | friendly 2046 | friendship 2047 | friendships 2048 | frighten 2049 | frigid 2050 | fringes 2051 | frivolously 2052 | frog 2053 | front 2054 | frost 2055 | frowning 2056 | frozen 2057 | fruit 2058 | fruiterer 2059 | fruitful 2060 | frustration 2061 | fuck 2062 | fuel 2063 | fulfillments 2064 | fun 2065 | functionality 2066 | fund 2067 | fundamentalism 2068 | funds 2069 | funeral 2070 | fungus 2071 | funny 2072 | fur 2073 | furnace 2074 | furnish 2075 | furniture 2076 | fury 2077 | future 2078 | gaiety 2079 | gain 2080 | galaxy 2081 | gallon 2082 | galvanic 2083 | galvanize 2084 | gamble 2085 | gambling 2086 | game 2087 | ganging 2088 | garage 2089 | garbage 2090 | garden 2091 | gardens 2092 | garfield 2093 | garlic 2094 | garment 2095 | gas 2096 | gasoline 2097 | gate 2098 | gateway 2099 | gather 2100 | gathered 2101 | gathering 2102 | gauge 2103 | gear 2104 | gelatinous 2105 | gem 2106 | gender 2107 | gene 2108 | general 2109 | generalized 2110 | generation 2111 | generous 2112 | genius 2113 | genre 2114 | gentleman 2115 | genuinely 2116 | germanic 2117 | germany 2118 | germinate 2119 | get 2120 | giant 2121 | gibberish 2122 | gift 2123 | gin 2124 | giraffe 2125 | girl 2126 | give 2127 | giving 2128 | glad 2129 | gladness 2130 | glass 2131 | glistens 2132 | glitter 2133 | glittery 2134 | globalise 2135 | globe 2136 | gloom 2137 | glove 2138 | glue 2139 | gluttonous 2140 | go 2141 | goal 2142 | goat 2143 | god 2144 | gold 2145 | goldplated 2146 | golf 2147 | good 2148 | goose 2149 | gorgeous 2150 | gospel 2151 | gossip 2152 | goverment 2153 | governance 2154 | government 2155 | governor 2156 | gracefulness 2157 | grade 2158 | graffito 2159 | graft 2160 | granddaughter 2161 | grandfather 2162 | grandmother 2163 | grandson 2164 | grape 2165 | graphic 2166 | grass 
2167 | grassland 2168 | grassroots 2169 | grate 2170 | grave 2171 | gravestone 2172 | graveyard 2173 | gravitated 2174 | gravity 2175 | gray 2176 | great 2177 | green 2178 | greengrocery 2179 | greenly 2180 | greenness 2181 | greet 2182 | greeting 2183 | grey 2184 | grief 2185 | grievous 2186 | grill 2187 | grin 2188 | grinder 2189 | gringo 2190 | grip 2191 | griping 2192 | grocery 2193 | groom 2194 | grotesque 2195 | ground 2196 | group 2197 | grow 2198 | growth 2199 | guarantee 2200 | guard 2201 | guardedly 2202 | guardian 2203 | guess 2204 | guest 2205 | guidance 2206 | guild 2207 | guillotine 2208 | guilt 2209 | guilty 2210 | guitar 2211 | gulf 2212 | gull 2213 | gum 2214 | gumption 2215 | gun 2216 | guru 2217 | gut 2218 | guy 2219 | gymnastics 2220 | habitable 2221 | hack 2222 | hail 2223 | hair 2224 | haircut 2225 | halfhearted 2226 | hall 2227 | halloween 2228 | hallucinating 2229 | hallway 2230 | hamburger 2231 | hamlet 2232 | hamster 2233 | hancock 2234 | hand 2235 | handbag 2236 | handbook 2237 | handle 2238 | handsome 2239 | handwriting 2240 | hang 2241 | hanging 2242 | hankering 2243 | happen 2244 | happening 2245 | happiness 2246 | happy 2247 | harbor 2248 | harbour 2249 | hard 2250 | hard-and-fast 2251 | harden 2252 | hardware 2253 | harm 2254 | harmful 2255 | harmony 2256 | harpsichord 2257 | harsh 2258 | harvard 2259 | hash 2260 | hasten 2261 | hat 2262 | hateful 2263 | hawk 2264 | hazard 2265 | haze 2266 | he 2267 | head 2268 | heading 2269 | headless 2270 | headship 2271 | health 2272 | healthful 2273 | hear 2274 | hearing 2275 | heart 2276 | heartlessly 2277 | heartlessness 2278 | heat 2279 | heater 2280 | heaven 2281 | heavenly 2282 | heavy 2283 | heedless 2284 | height 2285 | heiress 2286 | helical 2287 | helium 2288 | hell 2289 | helm 2290 | helmet 2291 | help 2292 | helper 2293 | helplessness 2294 | hen 2295 | her 2296 | heraldist 2297 | herb 2298 | heritage 2299 | hero 2300 | heroin 2301 | heroine 2302 | heterosexism 2303 | heterosexual 2304 
| hideous 2305 | high 2306 | highjacking 2307 | highlanders 2308 | highlight 2309 | highway 2310 | hike 2311 | hilariously 2312 | hilarity 2313 | hill 2314 | hinder 2315 | hinduism 2316 | hip 2317 | hire 2318 | his 2319 | historically 2320 | history 2321 | hit 2322 | hive 2323 | hockey 2324 | hold 2325 | holder 2326 | hole 2327 | holiday 2328 | holy 2329 | home 2330 | homeless 2331 | homer 2332 | homoerotic 2333 | homogeneous 2334 | homogenized 2335 | homophobia 2336 | homophony 2337 | homosexual 2338 | honest 2339 | honey 2340 | honolulu 2341 | honor 2342 | hood 2343 | hop 2344 | hope 2345 | horace 2346 | horizon 2347 | hormone 2348 | horn 2349 | horrible 2350 | horrid 2351 | horse 2352 | horsemanship 2353 | hose 2354 | hospital 2355 | hospitalize 2356 | hostility 2357 | hot 2358 | hotel 2359 | hotness 2360 | hound 2361 | house 2362 | houseful 2363 | householders 2364 | housing 2365 | houston 2366 | hover 2367 | huffy 2368 | huge 2369 | humanness 2370 | hummingbird 2371 | humorous 2372 | hundred 2373 | hungry 2374 | hunt 2375 | hurricane 2376 | hurt 2377 | husband 2378 | hush 2379 | husky 2380 | hut 2381 | hybridise 2382 | hydrochloride 2383 | hydrogen 2384 | hydrolysed 2385 | hymn 2386 | hypercoaster 2387 | hyperextension 2388 | hyperlink 2389 | hyperlinks 2390 | hypermarket 2391 | hypermarkets 2392 | hypersensitive 2393 | hypersensitivity 2394 | hypertension 2395 | hypertext 2396 | hypertexts 2397 | hypervelocity 2398 | hypocrisy 2399 | hypothesis 2400 | hypothetical 2401 | hysteria 2402 | ice 2403 | icelandic 2404 | icon 2405 | idea 2406 | ideality 2407 | idiocy 2408 | idle 2409 | ignorance 2410 | ignore 2411 | ill 2412 | illegal 2413 | illiberal 2414 | illimitable 2415 | illiterate 2416 | illness 2417 | illusion 2418 | illustration 2419 | image 2420 | imagine 2421 | imbedding 2422 | imitate 2423 | imitation 2424 | immeasurable 2425 | immensely 2426 | immigrate 2427 | immigrating 2428 | immobile 2429 | immobilization 2430 | immobilizing 2431 | immoderate 2432 | 
immoral 2433 | immortalize 2434 | immoveable 2435 | immunity 2436 | impartial 2437 | impartiality 2438 | impassively 2439 | impatient 2440 | impeded 2441 | imperceptible 2442 | imperfection 2443 | imperils 2444 | impermanent 2445 | impermissible 2446 | implantations 2447 | implausible 2448 | implement 2449 | implementation 2450 | implication 2451 | implicational 2452 | imply 2453 | impoliteness 2454 | impolitic 2455 | importance 2456 | importances 2457 | important 2458 | impose 2459 | imposition 2460 | impossibilities 2461 | impossible 2462 | impotently 2463 | imprecise 2464 | impregnate 2465 | impress 2466 | impression 2467 | impressionable 2468 | imprisoned 2469 | improvement 2470 | improver 2471 | improving 2472 | improvise 2473 | improvised 2474 | impulse 2475 | impulsion 2476 | impurity 2477 | inabilities 2478 | inaccessible 2479 | inaccurate 2480 | inadvertence 2481 | inanimate 2482 | inapplicability 2483 | inarticulate 2484 | inbreeding 2485 | incalculable 2486 | incensing 2487 | inch 2488 | incised 2489 | inclosure 2490 | incombustible 2491 | income 2492 | incommensurable 2493 | incommensurate 2494 | incommutable 2495 | incomprehensible 2496 | incomprehension 2497 | incongruous 2498 | inconsiderate 2499 | incontestable 2500 | incontrovertible 2501 | inconvertible 2502 | incoordination 2503 | incorrupt 2504 | incorruptible 2505 | increase 2506 | increasing 2507 | incredulous 2508 | incubate 2509 | incurved 2510 | indecent 2511 | indelicate 2512 | independence 2513 | independences 2514 | independent 2515 | independently 2516 | index 2517 | indexical 2518 | indian 2519 | indication 2520 | indicted 2521 | indifferently 2522 | indirect 2523 | indirectness 2524 | indiscriminate 2525 | indispensable 2526 | individual 2527 | individualist 2528 | individualize 2529 | indoctrinate 2530 | inducement 2531 | inducted 2532 | industrial 2533 | industrialise 2534 | industry 2535 | ineffective 2536 | inelasticity 2537 | inelegance 2538 | inessential 2539 | inexpedient 2540 
| inexpensive 2541 | inexpert 2542 | inexplicable 2543 | infeasible 2544 | infection 2545 | infectious 2546 | infectiously 2547 | inference 2548 | infinite 2549 | inflame 2550 | inflammation 2551 | inflection 2552 | inflicted 2553 | influence 2554 | infolding 2555 | inform 2556 | informal 2557 | information 2558 | informative 2559 | infrastructure 2560 | ingroup 2561 | inharmonious 2562 | inheritable 2563 | inheritance 2564 | inheritances 2565 | inheritor 2566 | initialise 2567 | initiate 2568 | initiation 2569 | injure 2570 | ink 2571 | inmate 2572 | inn 2573 | innocuous 2574 | innovativeness 2575 | inoffensive 2576 | inorganic 2577 | inquire 2578 | inquirer 2579 | inquiring 2580 | inquisitive 2581 | inquisitiveness 2582 | inquisitor 2583 | inroad 2584 | insane 2585 | insanity 2586 | insatiate 2587 | inscribe 2588 | insect 2589 | insecureness 2590 | insecurities 2591 | insensitive 2592 | insert 2593 | insertion 2594 | insidiously 2595 | insight 2596 | inspect 2597 | install 2598 | installation 2599 | institution 2600 | institutionalize 2601 | instruct 2602 | instruction 2603 | instructor 2604 | instructorship 2605 | instrument 2606 | instrumentality 2607 | instrumentation 2608 | insubordinate 2609 | insufficiency 2610 | insurance 2611 | insure 2612 | insured 2613 | insurgent 2614 | insurrectional 2615 | insurrectionist 2616 | integration 2617 | integrity 2618 | intellect 2619 | intelligence 2620 | intelligences 2621 | intelligent 2622 | intend 2623 | intended 2624 | intending 2625 | intense 2626 | intensions 2627 | intensity 2628 | intentionality 2629 | interact 2630 | interaction 2631 | intercede 2632 | interceptor 2633 | intercession 2634 | interchanging 2635 | intercommunicate 2636 | interconnect 2637 | interconnectedness 2638 | interdisciplinary 2639 | interest 2640 | interesting 2641 | interim 2642 | interior 2643 | interjection 2644 | interlace 2645 | interlaces 2646 | interlayers 2647 | interlingua 2648 | interlink 2649 | interlinking 2650 | interlinks 2651 
| intermarry 2652 | intermingles 2653 | international 2654 | internationaler 2655 | internationalisms 2656 | internationality 2657 | internationalize 2658 | interned 2659 | internee 2660 | internet 2661 | internships 2662 | interpenetrate 2663 | interplanetary 2664 | interpreted 2665 | interpreter 2666 | interrelate 2667 | interrelated 2668 | interrelationship 2669 | intersected 2670 | interspecies 2671 | interstellar 2672 | intertwine 2673 | intertwining 2674 | intervention 2675 | interview 2676 | interviewing 2677 | interweaved 2678 | interwove 2679 | intoxication 2680 | intracerebral 2681 | intragroup 2682 | intramolecular 2683 | intramural 2684 | intramuscular 2685 | intraspecific 2686 | intrude 2687 | intuition 2688 | invariable 2689 | inventively 2690 | inventory 2691 | inversions 2692 | investigate 2693 | investigation 2694 | investigator 2695 | investment 2696 | investor 2697 | invigorating 2698 | invisible 2699 | invitation 2700 | invite 2701 | invoice 2702 | involve 2703 | involvement 2704 | ipod 2705 | iran 2706 | iranian 2707 | iris 2708 | iron 2709 | irrationality 2710 | irredeemable 2711 | irregardless 2712 | irrelevance 2713 | irrelevant 2714 | irreligious 2715 | irremovable 2716 | irreproducible 2717 | irresolution 2718 | irreverence 2719 | irrigate 2720 | irritatingly 2721 | islamabad 2722 | island 2723 | isolate 2724 | isolation 2725 | isosceles 2726 | israel 2727 | issue 2728 | italian 2729 | italy 2730 | itch 2731 | ivy 2732 | jacket 2733 | jackson 2734 | jaguar 2735 | jail 2736 | jamaica 2737 | japan 2738 | japanese 2739 | jar 2740 | jarringly 2741 | jaw 2742 | jay 2743 | jazz 2744 | jean 2745 | jellyfish 2746 | jerusalem 2747 | jet 2748 | jewel 2749 | jewelry 2750 | job 2751 | join 2752 | joint 2753 | joke 2754 | journal 2755 | journey 2756 | joy 2757 | judge 2758 | judging 2759 | judgment 2760 | juice 2761 | jump 2762 | jumper 2763 | juncture 2764 | jurisdiction 2765 | jury 2766 | justice 2767 | justify 2768 | juvenile 2769 | kansas 2770 | 
kazakhstani 2771 | keep 2772 | kennedy 2773 | key 2774 | keyboard 2775 | kick 2776 | kid 2777 | kidnapped 2778 | kidney 2779 | kill 2780 | killed 2781 | killer 2782 | kilometer 2783 | kind 2784 | kindergarteners 2785 | king 2786 | kingship 2787 | kiss 2788 | kitchen 2789 | kitten 2790 | kitty 2791 | klan 2792 | knee 2793 | knife 2794 | knifing 2795 | knight 2796 | knightly 2797 | knit 2798 | knock 2799 | know 2800 | know-how 2801 | knowing 2802 | knowledge 2803 | kremlin 2804 | lab 2805 | label 2806 | laboratory 2807 | labourer 2808 | lace 2809 | lad 2810 | laden 2811 | lady 2812 | lake 2813 | lamb 2814 | lamp 2815 | land 2816 | landlord 2817 | landscape 2818 | language 2819 | languishing 2820 | lantern 2821 | large 2822 | largeness 2823 | lastingly 2824 | latex 2825 | latin 2826 | latinist 2827 | laud 2828 | laugh 2829 | laundering 2830 | laureate 2831 | lavishness 2832 | law 2833 | lawn 2834 | lawyer 2835 | lay 2836 | layer 2837 | leader 2838 | leadership 2839 | leading 2840 | leaf 2841 | leagued 2842 | lean 2843 | learn 2844 | lease 2845 | leather 2846 | leave 2847 | lebanon 2848 | lectureship 2849 | ledge 2850 | leg 2851 | legal 2852 | legalism 2853 | legion 2854 | lego 2855 | leisured 2856 | lemon 2857 | lend 2858 | lengthy 2859 | lenience 2860 | lens 2861 | lesson 2862 | letter 2863 | letters 2864 | level 2865 | leverage 2866 | levy 2867 | liability 2868 | libelous 2869 | liberation 2870 | librarianship 2871 | library 2872 | libya 2873 | license 2874 | lick 2875 | lie 2876 | lien 2877 | lieutenant 2878 | life 2879 | lift 2880 | light 2881 | lighthouse 2882 | lighting 2883 | lightning 2884 | lightship 2885 | lilt 2886 | lily 2887 | limb 2888 | limit 2889 | lincoln 2890 | line 2891 | linen 2892 | lineup 2893 | lingerie 2894 | link 2895 | lion 2896 | lip 2897 | liquid 2898 | liquor 2899 | list 2900 | listen 2901 | listeners 2902 | listing 2903 | literalness 2904 | literature 2905 | lithium 2906 | live 2907 | liveable 2908 | lively 2909 | liver 2910 | liverpools 
2911 | lizard 2912 | load 2913 | loan 2914 | lobster 2915 | local 2916 | localise 2917 | locality 2918 | locate 2919 | location 2920 | lock 2921 | locomotive 2922 | log 2923 | logic 2924 | logical 2925 | london 2926 | londoners 2927 | long 2928 | longing 2929 | look 2930 | loop 2931 | loose 2932 | lordship 2933 | lose 2934 | loss 2935 | losses 2936 | louisiana 2937 | lounge 2938 | love 2939 | loveable 2940 | loveless 2941 | lovely 2942 | lover 2943 | low 2944 | lower 2945 | lubricate 2946 | luck 2947 | luggage 2948 | lunch 2949 | lung 2950 | lushness 2951 | lusterware 2952 | lustrate 2953 | luxuriance 2954 | luxury 2955 | lyric 2956 | mac 2957 | machine 2958 | macrocosmic 2959 | macroeconomist 2960 | macroeconomists 2961 | macroevolution 2962 | mad 2963 | madhouse 2964 | madrid 2965 | magazine 2966 | magic 2967 | magically 2968 | magician 2969 | magnetic 2970 | magnetize 2971 | magnificent 2972 | magnitude 2973 | magnolia 2974 | maid 2975 | mail 2976 | maildrop 2977 | major 2978 | make 2979 | maker 2980 | makeup 2981 | male 2982 | maleness 2983 | malevolence 2984 | malfeasance 2985 | malicious 2986 | mallard 2987 | mammal 2988 | man 2989 | manacles 2990 | management 2991 | manager 2992 | managership 2993 | manchester 2994 | manifestation 2995 | mannequin 2996 | manner 2997 | manslaughter 2998 | manufacturer 2999 | map 3000 | maple 3001 | mar 3002 | maradona 3003 | marathon 3004 | marble 3005 | marching 3006 | mare 3007 | marginality 3008 | marginalize 3009 | marijuana 3010 | marinate 3011 | mark 3012 | marker 3013 | market 3014 | marketers 3015 | marriage 3016 | marrow 3017 | marry 3018 | mars 3019 | marvelous 3020 | masculinity 3021 | mask 3022 | match 3023 | mate 3024 | material 3025 | mathematical 3026 | mathematician 3027 | matter 3028 | maxwell 3029 | mayor 3030 | mayoralty 3031 | mccarthyism 3032 | meadows 3033 | meal 3034 | meaning 3035 | meaningless 3036 | measure 3037 | measurements 3038 | meat 3039 | mechanic 3040 | mechanical 3041 | mechanism 3042 | 
medal 3043 | media 3044 | medicate 3045 | medicine 3046 | medium 3047 | meet 3048 | meeting 3049 | melody 3050 | member 3051 | membership 3052 | memoir 3053 | memorabilia 3054 | memorial 3055 | memorialize 3056 | memory 3057 | men 3058 | mental 3059 | menu 3060 | mercantile 3061 | merchandise 3062 | merchantable 3063 | mercifulness 3064 | mere 3065 | merit 3066 | message 3067 | metabolism 3068 | metal 3069 | meter 3070 | methodically 3071 | metro 3072 | mexico 3073 | microbalance 3074 | microbiologist 3075 | microcircuit 3076 | microcircuits 3077 | microcomputers 3078 | microfiche 3079 | microfilm 3080 | microflora 3081 | microfossils 3082 | micrometer 3083 | microorganism 3084 | microphallus 3085 | microseconds 3086 | microvolts 3087 | microwave 3088 | microwaving 3089 | midday 3090 | middle 3091 | might 3092 | migrate 3093 | migrational 3094 | mildness 3095 | mile 3096 | militarize 3097 | military 3098 | militia 3099 | milk 3100 | mill 3101 | mimicked 3102 | mind 3103 | mingles 3104 | miniature 3105 | minister 3106 | ministry 3107 | mink 3108 | minority 3109 | minute 3110 | mirror 3111 | misbehave 3112 | misery 3113 | misleading 3114 | missile 3115 | missing 3116 | mission 3117 | mist 3118 | mistake 3119 | mistrustful 3120 | misty 3121 | mitigated 3122 | mixture 3123 | mob 3124 | mode 3125 | model 3126 | moderate 3127 | moderatorship 3128 | modern 3129 | modest 3130 | modesty 3131 | modification 3132 | mogul 3133 | moisten 3134 | molar 3135 | molecule 3136 | mom 3137 | moment 3138 | momentousness 3139 | momentum 3140 | monarchic 3141 | monarchical 3142 | monarchist 3143 | money 3144 | monk 3145 | monkey 3146 | monoatomic 3147 | monoclinic 3148 | monoculture 3149 | monocultures 3150 | monogenesis 3151 | monogram 3152 | monograms 3153 | monoplanes 3154 | monopolist 3155 | monotony 3156 | monotype 3157 | monsignori 3158 | monster 3159 | month 3160 | monument 3161 | mood 3162 | moon 3163 | moralist 3164 | morality 3165 | morning 3166 | morph 3167 | mortal 3168 | 
mortality 3169 | moscow 3170 | moss 3171 | motel 3172 | mother 3173 | motherless 3174 | motion 3175 | motivation 3176 | motive 3177 | motor 3178 | motorcycle 3179 | motto 3180 | mound 3181 | mount 3182 | mountain 3183 | mouse 3184 | mouth 3185 | move 3186 | movement 3187 | movie 3188 | mozart 3189 | mud 3190 | mug 3191 | multidimensional 3192 | multiply 3193 | munich 3194 | mural 3195 | murder 3196 | murdered 3197 | murderer 3198 | murphy 3199 | muscle 3200 | muscularity 3201 | museum 3202 | mushroom 3203 | mushroomed 3204 | music 3205 | musical 3206 | musician 3207 | mussolini 3208 | mustard 3209 | muster 3210 | mutinied 3211 | mystery 3212 | myth 3213 | nail 3214 | naivety 3215 | nakedness 3216 | nanometer 3217 | nanosecond 3218 | narrow 3219 | narrow-minded 3220 | narrow-mindedness 3221 | nation 3222 | native 3223 | naturalise 3224 | nature 3225 | naughtiness 3226 | navy 3227 | nazi 3228 | necessary 3229 | necessitate 3230 | neck 3231 | necklace 3232 | need 3233 | needle 3234 | needlepoint 3235 | needleworker 3236 | negligence 3237 | negociate 3238 | negotiable 3239 | negroes 3240 | neon 3241 | nephew 3242 | nerve 3243 | nerveless 3244 | nervous 3245 | nest 3246 | net 3247 | network 3248 | neutral 3249 | new 3250 | newness 3251 | news 3252 | newspaper 3253 | nice 3254 | nick 3255 | nickel 3256 | niece 3257 | night 3258 | nightly 3259 | nobelist 3260 | noble 3261 | noise 3262 | noisy 3263 | nomad 3264 | nominate 3265 | nominated 3266 | noncitizens 3267 | noncivilized 3268 | nonconformist 3269 | nonconscious 3270 | nondescripts 3271 | nonfunctional 3272 | nonindulgent 3273 | nonnative 3274 | nonobservant 3275 | nonpartisan 3276 | nonperformance 3277 | nonpolitical 3278 | nonprofessional 3279 | nonpublic 3280 | nonrepresentational 3281 | nonstandard 3282 | nontoxic 3283 | nonverbally 3284 | nonviable 3285 | noodle 3286 | noon 3287 | normal 3288 | normalise 3289 | normalize 3290 | north 3291 | nose 3292 | nosiness 3293 | note 3294 | notebook 3295 | notice 3296 | 
noticeable 3297 | notify 3298 | novel 3299 | novice 3300 | nuclear 3301 | nude 3302 | nuisance 3303 | number 3304 | numerical 3305 | nurse 3306 | nurses 3307 | nurturance 3308 | nut 3309 | nutrition 3310 | oak 3311 | obey 3312 | object 3313 | objectify 3314 | objectifying 3315 | objective 3316 | objector 3317 | obligation 3318 | observation 3319 | observatory 3320 | observe 3321 | observed 3322 | obstruct 3323 | obstructive 3324 | obtain 3325 | obtainment 3326 | obvious 3327 | obviousness 3328 | occasion 3329 | occlusion 3330 | occupation 3331 | occur 3332 | occurrence 3333 | ocean 3334 | odd 3335 | odorize 3336 | odyssey 3337 | offensive 3338 | office 3339 | officer 3340 | official 3341 | oil 3342 | old 3343 | oldest 3344 | omission 3345 | omnipotence 3346 | omnipotent 3347 | omniscience 3348 | onion 3349 | opalescence 3350 | opec 3351 | open 3352 | opening 3353 | opera 3354 | operation 3355 | operative 3356 | operator 3357 | opinion 3358 | opponent 3359 | opportune 3360 | opportunity 3361 | oppose 3362 | opposition 3363 | oppress 3364 | optical 3365 | option 3366 | oracle 3367 | orange 3368 | orchestra 3369 | orchestrations 3370 | orchid 3371 | ordain 3372 | order 3373 | ordinary 3374 | organ 3375 | organic 3376 | organism 3377 | organismal 3378 | organization 3379 | organize 3380 | orientate 3381 | origami 3382 | origin 3383 | originality 3384 | originate 3385 | orthodontist 3386 | ostentatious 3387 | ottawa 3388 | oust 3389 | out 3390 | outdoor 3391 | outfit 3392 | outfoxed 3393 | outlawed 3394 | outlet 3395 | outperforming 3396 | outshout 3397 | oval 3398 | over 3399 | overcome 3400 | overhead 3401 | overlying 3402 | overshoe 3403 | owe 3404 | owl 3405 | ox 3406 | oxford 3407 | oxide 3408 | oxygen 3409 | pacific 3410 | package 3411 | packaging 3412 | packet 3413 | pact 3414 | padding 3415 | page 3416 | pain 3417 | painkillers 3418 | paint 3419 | painter 3420 | painting 3421 | pair 3422 | pakistan 3423 | palace 3424 | palestinian 3425 | palestinians 3426 | palm 
3427 | pan 3428 | panda 3429 | panic 3430 | panorama 3431 | paparazzo 3432 | paper 3433 | papered 3434 | papers 3435 | parade 3436 | paradoxical 3437 | paragraph 3438 | parallelism 3439 | parallelize 3440 | parameter 3441 | parasitical 3442 | parcel 3443 | parent 3444 | paris 3445 | parish 3446 | park 3447 | parking 3448 | parrot 3449 | part 3450 | partiality 3451 | partible 3452 | participant 3453 | participate 3454 | partner 3455 | partnership 3456 | partnerships 3457 | party 3458 | passable 3459 | passage 3460 | passion 3461 | passport 3462 | past 3463 | pastorship 3464 | patch 3465 | patent 3466 | path 3467 | pathfinder 3468 | pathfinders 3469 | pathless 3470 | patients 3471 | patio 3472 | patrol 3473 | pattern 3474 | patterns 3475 | pave 3476 | paving 3477 | paw 3478 | pay 3479 | payment 3480 | peace 3481 | peaceful 3482 | peacock 3483 | pearl 3484 | pebble 3485 | pedaler 3486 | pedicab 3487 | peel 3488 | pelican 3489 | pen 3490 | pencil 3491 | penetrate 3492 | penis 3493 | penitent 3494 | pennsylvania 3495 | people 3496 | pepper 3497 | perceive 3498 | percent 3499 | perceptible 3500 | perception 3501 | perfect 3502 | perfectible 3503 | perfective 3504 | perform 3505 | performance 3506 | performances 3507 | performer 3508 | performing 3509 | perfume 3510 | period 3511 | periodical 3512 | peripheral 3513 | perished 3514 | perjury 3515 | permission 3516 | permit 3517 | person 3518 | personify 3519 | personifying 3520 | personnel 3521 | perspectives 3522 | persuade 3523 | persuasions 3524 | pervert 3525 | pessimist 3526 | pestilence 3527 | pet 3528 | petal 3529 | phantom 3530 | phenomenon 3531 | philanthropy 3532 | philip 3533 | philosophic 3534 | phone 3535 | phosphate 3536 | photo 3537 | photocopy 3538 | photographer 3539 | photography 3540 | phrase 3541 | physical 3542 | physically 3543 | physician 3544 | physics 3545 | piano 3546 | piazza 3547 | pick 3548 | picket 3549 | picture 3550 | pie 3551 | piece 3552 | pier 3553 | piety 3554 | pig 3555 | pigeon 3556 | 
pillow 3557 | pilot 3558 | pin 3559 | pink 3560 | pinnacle 3561 | pinpointed 3562 | pious 3563 | pipe 3564 | piquancy 3565 | pitch 3566 | pittance 3567 | pizza 3568 | place 3569 | placement 3570 | placidity 3571 | plague 3572 | plan 3573 | plane 3574 | planet 3575 | planners 3576 | planning 3577 | plant 3578 | plantation 3579 | plastic 3580 | plate 3581 | plates 3582 | play 3583 | player 3584 | playful 3585 | playground 3586 | plays 3587 | plea 3588 | plead 3589 | pleading 3590 | please 3591 | pleasing 3592 | pledged 3593 | pledges 3594 | plenty 3595 | plot 3596 | plucked 3597 | plundered 3598 | pod 3599 | poem 3600 | poet 3601 | poetry 3602 | point 3603 | poison 3604 | poisoning 3605 | poker 3606 | poland 3607 | polar 3608 | pole 3609 | poles 3610 | police 3611 | policy 3612 | polite 3613 | political 3614 | politician 3615 | politics 3616 | poll 3617 | pollution 3618 | polo 3619 | polyester 3620 | pond 3621 | poodle 3622 | pool 3623 | poor 3624 | pop 3625 | popcorn 3626 | poppy 3627 | populace 3628 | populate 3629 | population 3630 | porch 3631 | pork 3632 | port 3633 | portioned 3634 | portrait 3635 | portray 3636 | portrayer 3637 | position 3638 | positioners 3639 | possess 3640 | possession 3641 | possessor 3642 | possibility 3643 | possible 3644 | post 3645 | postage 3646 | postboxes 3647 | postcard 3648 | postcode 3649 | postcodes 3650 | postdated 3651 | postdates 3652 | poster 3653 | postglacial 3654 | posthole 3655 | postholes 3656 | postmark 3657 | postmarks 3658 | postmodernism 3659 | postmodernist 3660 | postpone 3661 | postponements 3662 | postposition 3663 | pot 3664 | potato 3665 | potent 3666 | potential 3667 | pottery 3668 | poultry 3669 | pour 3670 | poverty 3671 | power 3672 | powerful 3673 | practicable 3674 | practical 3675 | practicality 3676 | practice 3677 | pray 3678 | prayer 3679 | prayerful 3680 | preaching 3681 | preadolescent 3682 | preassembled 3683 | precedent 3684 | precociously 3685 | preconception 3686 | preconceptions 3687 | 
predators 3688 | predetermination 3689 | predetermine 3690 | predict 3691 | predictive 3692 | predominance 3693 | preempt 3694 | preference 3695 | pregnancy 3696 | pregnant 3697 | preheated 3698 | preheating 3699 | prehistorical 3700 | prejudge 3701 | prejudging 3702 | prejudice 3703 | preliterate 3704 | premeditation 3705 | premise 3706 | premisses 3707 | preordained 3708 | preparation 3709 | preposed 3710 | preschooler 3711 | preschoolers 3712 | prescriptions 3713 | presence 3714 | presenters 3715 | presenting 3716 | preservation 3717 | preserve 3718 | preservers 3719 | president 3720 | press 3721 | pressure 3722 | pressurise 3723 | prestige 3724 | presuppose 3725 | preteens 3726 | pretend 3727 | pretenders 3728 | pretense 3729 | pretty 3730 | prevent 3731 | preventive 3732 | prey 3733 | price 3734 | pride 3735 | prideful 3736 | priest 3737 | primacy 3738 | primates 3739 | primer 3740 | prince 3741 | princedom 3742 | princedoms 3743 | princess 3744 | principality 3745 | print 3746 | printer 3747 | prison 3748 | prisoner 3749 | prisoners 3750 | private 3751 | privileges 3752 | prize 3753 | probability 3754 | problem 3755 | procedure 3756 | proceeding 3757 | proceedings 3758 | process 3759 | processing 3760 | proclaim 3761 | procreated 3762 | procreating 3763 | procreation 3764 | procurator 3765 | procurators 3766 | produce 3767 | producing 3768 | product 3769 | production 3770 | profanity 3771 | profession 3772 | professionals 3773 | professor 3774 | profit 3775 | profitless 3776 | profusion 3777 | program 3778 | project 3779 | projector 3780 | prolapse 3781 | prolong 3782 | prominence 3783 | promiscuous 3784 | promised 3785 | promoter 3786 | promotive 3787 | prompt 3788 | pronounce 3789 | pronunciation 3790 | proof 3791 | propaganda 3792 | propagate 3793 | proper 3794 | property 3795 | prophetic 3796 | prophetical 3797 | propose 3798 | proposition 3799 | propriety 3800 | prose 3801 | prospector 3802 | prostatic 3803 | protect 3804 | protection 3805 | protective 
3806 | protectiveness 3807 | protectorship 3808 | protest 3809 | protestant 3810 | protestantism 3811 | protesters 3812 | protocol 3813 | proton 3814 | protraction 3815 | protractors 3816 | protrusion 3817 | proud 3818 | prove 3819 | providence 3820 | province 3821 | provincialism 3822 | proving 3823 | provisionally 3824 | provisionary 3825 | proximity 3826 | prudent 3827 | prudery 3828 | psychiatry 3829 | psychic 3830 | psychodynamics 3831 | psychologist 3832 | psychology 3833 | pub 3834 | publication 3835 | publicise 3836 | publicize 3837 | publish 3838 | puddle 3839 | puffery 3840 | pug 3841 | pump 3842 | pumpkin 3843 | punch 3844 | punctuate 3845 | punishment 3846 | punjabi 3847 | punk 3848 | pupil 3849 | puppy 3850 | purchasable 3851 | purgatory 3852 | puritanism 3853 | purple 3854 | purpose 3855 | purposeless 3856 | purse 3857 | pursue 3858 | purveying 3859 | push 3860 | put 3861 | pyramid 3862 | quadratics 3863 | quality 3864 | quantity 3865 | quarantine 3866 | quarrel 3867 | quarter 3868 | quarterback 3869 | quartz 3870 | queen 3871 | queens 3872 | query 3873 | quest 3874 | question 3875 | queue 3876 | quick 3877 | quicken 3878 | quiet 3879 | quieten 3880 | quit 3881 | quitter 3882 | quiz 3883 | quota 3884 | quotation 3885 | quote 3886 | rabbi 3887 | rabbit 3888 | race 3889 | racer 3890 | racing 3891 | racism 3892 | rack 3893 | racket 3894 | radar 3895 | radiance 3896 | radiation 3897 | radiators 3898 | radical 3899 | radio 3900 | rail 3901 | railroad 3902 | rails 3903 | railway 3904 | rain 3905 | rainbow 3906 | rainless 3907 | raise 3908 | rally 3909 | rampant 3910 | ranch 3911 | randomize 3912 | range 3913 | rank 3914 | rap 3915 | rapid 3916 | rare 3917 | rareness 3918 | rarity 3919 | rascality 3920 | rash 3921 | raspberry 3922 | rat 3923 | rate 3924 | rational 3925 | rationalize 3926 | rattle 3927 | rattlesnake 3928 | ravenous 3929 | raw 3930 | ray 3931 | re-argue 3932 | re-create 3933 | reaction 3934 | read 3935 | reader 3936 | reading 3937 | real 3938 
| reality 3939 | realize 3940 | rearrangements 3941 | reason 3942 | reasonable 3943 | reasoning 3944 | reassess 3945 | reassessments 3946 | reassuringly 3947 | rebel 3948 | rebellious 3949 | reburial 3950 | recast 3951 | receive 3952 | receiver 3953 | receiverships 3954 | receiving 3955 | recent 3956 | receptions 3957 | recess 3958 | reciprocal 3959 | recitalist 3960 | reckoner 3961 | reclaim 3962 | reclassifications 3963 | recognition 3964 | recognize 3965 | recommend 3966 | recommendation 3967 | reconstructs 3968 | record 3969 | recorder 3970 | recorders 3971 | recourse 3972 | recovery 3973 | recreation 3974 | rectorate 3975 | recycle 3976 | recycling 3977 | red 3978 | rededicated 3979 | redhead 3980 | rediscovery 3981 | reduce 3982 | reduced 3983 | reef 3984 | reelections 3985 | reenact 3986 | reenactor 3987 | refer 3988 | refered 3989 | reference 3990 | refine 3991 | refinery 3992 | reflection 3993 | reform 3994 | reformations 3995 | reformism 3996 | refresher 3997 | refrigerator 3998 | refuels 3999 | refugees 4000 | refurbishment 4001 | refurbishments 4002 | refuted 4003 | regained 4004 | regardless 4005 | regenerate 4006 | regiment 4007 | regimental 4008 | region 4009 | regionalisms 4010 | register 4011 | registration 4012 | registry 4013 | regretful 4014 | regularise 4015 | regularize 4016 | regulate 4017 | rehashing 4018 | reichstag 4019 | reinsured 4020 | reinterpret 4021 | reject 4022 | rejoinders 4023 | relates 4024 | relation 4025 | relationship 4026 | relative 4027 | relax 4028 | relaxation 4029 | relay 4030 | reliable 4031 | relief 4032 | relieve 4033 | religion 4034 | religionist 4035 | religious 4036 | religiousness 4037 | relinquishment 4038 | relocation 4039 | remain 4040 | remainder 4041 | remakes 4042 | remarkable 4043 | remarriage 4044 | rematches 4045 | remedy 4046 | remember 4047 | remind 4048 | remitting 4049 | remounted 4050 | removal 4051 | remove 4052 | removes 4053 | rental 4054 | reordering 4055 | reorientate 4056 | repair 4057 | repeal 
4058 | repeat 4059 | repeating 4060 | repel 4061 | replace 4062 | replacements 4063 | replaces 4064 | replicate 4065 | replication 4066 | replications 4067 | reply 4068 | report 4069 | reporter 4070 | reporters 4071 | repositioned 4072 | repositions 4073 | reprehensible 4074 | representable 4075 | representation 4076 | representational 4077 | representative 4078 | repress 4079 | repressing 4080 | reprints 4081 | reproachful 4082 | reprocessing 4083 | reproduce 4084 | reproducible 4085 | reproduction 4086 | reproductive 4087 | reproves 4088 | reptile 4089 | republic 4090 | republicans 4091 | repulses 4092 | repulsive 4093 | repurchases 4094 | reputable 4095 | request 4096 | requests 4097 | require 4098 | requirement 4099 | rescued 4100 | research 4101 | researchers 4102 | reservation 4103 | reserve 4104 | resides 4105 | residing 4106 | resigning 4107 | resistive 4108 | resistor 4109 | resolve 4110 | resource 4111 | respectable 4112 | respectively 4113 | respite 4114 | responsible 4115 | rest 4116 | restaurant 4117 | restless 4118 | restore 4119 | restrainer 4120 | restraint 4121 | restrict 4122 | result 4123 | resurfacing 4124 | retailer 4125 | retain 4126 | retarding 4127 | retiring 4128 | retrace 4129 | retraced 4130 | retraction 4131 | retrials 4132 | retrying 4133 | return 4134 | returning 4135 | reviewers 4136 | reviles 4137 | revivalism 4138 | revolution 4139 | revolutionise 4140 | reward 4141 | rhodes 4142 | rhymers 4143 | rhythm 4144 | rhythmicity 4145 | rice 4146 | rich 4147 | rid 4148 | ridge 4149 | right 4150 | ring 4151 | ringer 4152 | ripeness 4153 | ripple 4154 | rise 4155 | riskless 4156 | rite 4157 | river 4158 | riverside 4159 | road 4160 | roadless 4161 | roam 4162 | robbery 4163 | rock 4164 | rocker 4165 | rocket 4166 | rod 4167 | rodent 4168 | roles 4169 | roll 4170 | roller 4171 | romance 4172 | romanic 4173 | rome 4174 | rompers 4175 | roof 4176 | roofers 4177 | rook 4178 | room 4179 | roosted 4180 | rooster 4181 | roosters 4182 | root 4183 | 
rooters 4184 | rope 4185 | ropewalker 4186 | rose 4187 | rotate 4188 | rotational 4189 | rough 4190 | round 4191 | rounded 4192 | rover 4193 | row 4194 | rowing 4195 | royalist 4196 | rub 4197 | rubber 4198 | rubberstamp 4199 | rubbish 4200 | rudderless 4201 | rugby 4202 | ruin 4203 | rule 4204 | ruler 4205 | rumbled 4206 | run 4207 | run-down 4208 | ruralist 4209 | russia 4210 | russian 4211 | rust 4212 | rustic 4213 | rusty 4214 | sabbath 4215 | sad 4216 | safe 4217 | safety 4218 | sage 4219 | sail 4220 | sailing 4221 | sailings 4222 | sailor 4223 | saint 4224 | saints 4225 | sake 4226 | salable 4227 | salad 4228 | salary 4229 | sale 4230 | sales 4231 | salt 4232 | salute 4233 | same 4234 | sanctify 4235 | sanctifying 4236 | sanctioned 4237 | sanctions 4238 | sand 4239 | sandbag 4240 | sandwich 4241 | sanskrit 4242 | santa 4243 | satan 4244 | satellite 4245 | satin 4246 | satisfaction 4247 | satisfactory 4248 | satisfy 4249 | sauce 4250 | sausage 4251 | save 4252 | say 4253 | scale 4254 | scampering 4255 | scandal 4256 | scandalize 4257 | scandinavian 4258 | scar 4259 | scarce 4260 | scarceness 4261 | scarcity 4262 | scattered 4263 | scene 4264 | scenery 4265 | scheduled 4266 | scheme 4267 | schemer 4268 | schnauzer 4269 | scholarship 4270 | scholarships 4271 | school 4272 | science 4273 | scientist 4274 | scooter 4275 | scope 4276 | score 4277 | scorn 4278 | scornful 4279 | scot 4280 | scotch 4281 | scottish 4282 | scouting 4283 | scowl 4284 | scrape 4285 | scratch 4286 | script 4287 | scrutiny 4288 | sculpture 4289 | sea 4290 | seafaring 4291 | seafood 4292 | seagull 4293 | seal 4294 | search 4295 | seashore 4296 | season 4297 | seasonable 4298 | seat 4299 | secluding 4300 | second 4301 | secondary 4302 | secret 4303 | secretary 4304 | secularist 4305 | security 4306 | seductive 4307 | seed 4308 | seeders 4309 | seem 4310 | seepage 4311 | segment 4312 | seize 4313 | seizure 4314 | selection 4315 | selectively 4316 | self 4317 | self-discipline 4318 | 
self-discovery 4319 | self-fulfillment 4320 | self-improvement 4321 | sell 4322 | seller 4323 | selling 4324 | semiconducting 4325 | seminar 4326 | senate 4327 | send 4328 | sense 4329 | sensitivity 4330 | sensualist 4331 | sentence 4332 | sentenced 4333 | sentiment 4334 | sentry 4335 | separate 4336 | separation 4337 | separationist 4338 | separatist 4339 | septic 4340 | serenaded 4341 | serf 4342 | serial 4343 | series 4344 | seriousness 4345 | sermonize 4346 | serve 4347 | server 4348 | serving 4349 | session 4350 | sessions 4351 | settle 4352 | seven 4353 | severer 4354 | sewing 4355 | sex 4356 | sexism 4357 | sexless 4358 | sexual 4359 | sexually 4360 | sexy 4361 | shackle 4362 | shade 4363 | shadow 4364 | shake 4365 | shame 4366 | shanghai 4367 | shanked 4368 | shape 4369 | share 4370 | shariff 4371 | shark 4372 | sharp 4373 | she 4374 | shed 4375 | sheep 4376 | sheepish 4377 | sheet 4378 | sheikhdoms 4379 | shell 4380 | shelter 4381 | shepherd 4382 | shepherded 4383 | sheriff 4384 | shift 4385 | ship 4386 | shirt 4387 | shock 4388 | shoe 4389 | shoes 4390 | shoot 4391 | shooting 4392 | shop 4393 | shopping 4394 | shore 4395 | short 4396 | short-change 4397 | shortage 4398 | shortish 4399 | shoulder 4400 | shoulders 4401 | shout 4402 | shouter 4403 | show 4404 | shower 4405 | shrewdness 4406 | shrieks 4407 | shrink 4408 | sick 4409 | sickness 4410 | side 4411 | sidewalk 4412 | sidewinder 4413 | sight 4414 | sightedness 4415 | sign 4416 | signal 4417 | signature 4418 | significances 4419 | significant 4420 | silence 4421 | silhouette 4422 | silver 4423 | similar 4424 | similarity 4425 | simple 4426 | simulation 4427 | sincere 4428 | sing 4429 | singapore 4430 | singe 4431 | singer 4432 | singing 4433 | sink 4434 | sinner 4435 | sister 4436 | sit 4437 | site 4438 | sitting 4439 | situate 4440 | situation 4441 | sixteen 4442 | size 4443 | skate 4444 | skateboard 4445 | skateboarders 4446 | skater 4447 | skating 4448 | sketch 4449 | ski 4450 | skid 4451 | 
skidding 4452 | skiing 4453 | skill 4454 | skilled 4455 | skillfulness 4456 | skin 4457 | skip 4458 | skirt 4459 | skull 4460 | sky 4461 | skyline 4462 | skyscraper 4463 | slack 4464 | slacken 4465 | slanderous 4466 | slash 4467 | slaughterers 4468 | slave 4469 | slavery 4470 | slaves 4471 | slavic 4472 | slaying 4473 | sleep 4474 | sleeve 4475 | slice 4476 | slope 4477 | slurred 4478 | sly 4479 | small 4480 | smallish 4481 | smart 4482 | smash 4483 | smell 4484 | smile 4485 | smith 4486 | smoke 4487 | smoking 4488 | smooth 4489 | smoothen 4490 | snake 4491 | snap 4492 | snickering 4493 | sniffers 4494 | snookered 4495 | snooper 4496 | snow 4497 | snowboarding 4498 | snowman 4499 | soap 4500 | soccer 4501 | sociability 4502 | sociable 4503 | socialise 4504 | socialism 4505 | socialist 4506 | socialites 4507 | society 4508 | sock 4509 | sodium 4510 | sofa 4511 | softball 4512 | softness 4513 | software 4514 | soil 4515 | solar 4516 | soldier 4517 | sole 4518 | solemnity 4519 | solid 4520 | soloist 4521 | solve 4522 | son 4523 | song 4524 | sophisticate 4525 | sorcery 4526 | sorrow 4527 | sorrowful 4528 | soul 4529 | soulfully 4530 | soulless 4531 | sound 4532 | soup 4533 | source 4534 | sourdough 4535 | south 4536 | southern 4537 | soviets 4538 | soybean 4539 | space 4540 | spacewalker 4541 | spaciousness 4542 | spaghetti 4543 | spain 4544 | spangle 4545 | spark 4546 | spatiality 4547 | speak 4548 | specialism 4549 | species 4550 | speculate 4551 | speculation 4552 | speculativeness 4553 | speech 4554 | speed 4555 | spellers 4556 | spend 4557 | spending 4558 | sphere 4559 | spherical 4560 | spiciness 4561 | spider 4562 | spill 4563 | spin 4564 | spine 4565 | spiral 4566 | spirit 4567 | spirited 4568 | spiritize 4569 | spiritless 4570 | spiritualist 4571 | spiritualize 4572 | splash 4573 | splashy 4574 | splendid 4575 | splice 4576 | split 4577 | splitter 4578 | sponge 4579 | sponsor 4580 | sponsorship 4581 | spoon 4582 | spoonful 4583 | spoonfuls 4584 | sport 4585 | 
sportive 4586 | sports 4587 | spot 4588 | spouse 4589 | spread 4590 | spring 4591 | springfield 4592 | sprinkle 4593 | sprint 4594 | sprouting 4595 | spy 4596 | squad 4597 | square 4598 | squash 4599 | squirrel 4600 | stability 4601 | stadium 4602 | stage 4603 | stair 4604 | staircase 4605 | stake 4606 | stalin 4607 | stamp 4608 | stance 4609 | stand 4610 | stand-in 4611 | standard 4612 | standardise 4613 | standardize 4614 | standing 4615 | standoffish 4616 | star 4617 | starkness 4618 | start 4619 | starter 4620 | state 4621 | statement 4622 | station 4623 | statistician 4624 | statue 4625 | status 4626 | stay 4627 | steak 4628 | steal 4629 | steel 4630 | steep 4631 | steepen 4632 | steeple 4633 | stem 4634 | stencil 4635 | step 4636 | sterile 4637 | sternness 4638 | stevenson 4639 | stewardship 4640 | stick 4641 | sticker 4642 | stickler 4643 | stimulates 4644 | stimulation 4645 | stimuli 4646 | stitch 4647 | stock 4648 | stockers 4649 | stocking 4650 | stoical 4651 | stomach 4652 | stone 4653 | stop 4654 | store 4655 | stork 4656 | storm 4657 | stormy 4658 | story 4659 | stove 4660 | straight 4661 | strains 4662 | strange 4663 | stranger 4664 | strangers 4665 | strategy 4666 | straw 4667 | strawberry 4668 | stream 4669 | street 4670 | strength 4671 | strengthen 4672 | strengthened 4673 | stressor 4674 | stretch 4675 | strife 4676 | string 4677 | strip 4678 | stripe 4679 | stroke 4680 | stroked 4681 | strong 4682 | structure 4683 | stud 4684 | student 4685 | study 4686 | stuff 4687 | stump 4688 | stupid 4689 | style 4690 | subarctic 4691 | subcommittee 4692 | subdivide 4693 | subdivided 4694 | subdividing 4695 | subeditor 4696 | subfamily 4697 | subgroup 4698 | subhead 4699 | subjoined 4700 | subjugate 4701 | subletting 4702 | sublieutenant 4703 | submarine 4704 | submariners 4705 | submerging 4706 | submit 4707 | subordination 4708 | subroutines 4709 | subsequences 4710 | subserve 4711 | subserving 4712 | subspaces 4713 | subspecies 4714 | substance 4715 | 
subsurface 4716 | subtend 4717 | subtropical 4718 | subway 4719 | succeed 4720 | success 4721 | successful 4722 | suds 4723 | suffer 4724 | sufferance 4725 | sufficed 4726 | sugar 4727 | suggestible 4728 | sulfide 4729 | sulfuric 4730 | sum 4731 | summer 4732 | summonings 4733 | sun 4734 | sunflower 4735 | sunglasses 4736 | sunlight 4737 | sunny 4738 | sunrise 4739 | sunset 4740 | sunshine 4741 | superficial 4742 | supermarket 4743 | supernatural 4744 | supervise 4745 | supper 4746 | supplanting 4747 | supplement 4748 | suppleness 4749 | supply 4750 | support 4751 | supporter 4752 | supporters 4753 | supposed 4754 | suppress 4755 | suppressor 4756 | sure 4757 | surf 4758 | surface 4759 | surfer 4760 | surgery 4761 | surname 4762 | surpass 4763 | surprise 4764 | surprises 4765 | surround 4766 | surroundings 4767 | survey 4768 | survivalist 4769 | susceptible 4770 | sushi 4771 | suspect 4772 | suspenseful 4773 | suspicion 4774 | suspiciousness 4775 | sustain 4776 | sustainable 4777 | swamp 4778 | swan 4779 | sway 4780 | swear 4781 | sweat 4782 | sweater 4783 | sweden 4784 | sweet 4785 | sweeten 4786 | sweetish 4787 | swell 4788 | swim 4789 | swimming 4790 | swimsuit 4791 | swing 4792 | swooshing 4793 | sydney 4794 | syllable 4795 | symbol 4796 | symbolist 4797 | symmetrical 4798 | sympathized 4799 | synchronic 4800 | synchronized 4801 | synoptic 4802 | syntactic 4803 | syntaxes 4804 | synthesize 4805 | synthetical 4806 | syphons 4807 | system 4808 | table 4809 | tableware 4810 | tag 4811 | tail 4812 | tailgate 4813 | take 4814 | talk 4815 | talkativeness 4816 | tall 4817 | tank 4818 | tap 4819 | target 4820 | task 4821 | taste 4822 | tasteful 4823 | tasteless 4824 | tasty 4825 | tattoo 4826 | tax 4827 | taxation 4828 | taxi 4829 | taxpayer 4830 | tea 4831 | teacher 4832 | teaching 4833 | team 4834 | tear 4835 | teasingly 4836 | teaspoonful 4837 | technician 4838 | technology 4839 | teeth 4840 | tehran 4841 | telephone 4842 | television 4843 | tell 4844 | temper 4845 
| temperature 4846 | temple 4847 | temptation 4848 | tend 4849 | tenderize 4850 | tennis 4851 | tenor 4852 | tense 4853 | tent 4854 | tenured 4855 | term 4856 | terminate 4857 | terms 4858 | terrible 4859 | terrier 4860 | terrific 4861 | territorial 4862 | territorials 4863 | territory 4864 | terror 4865 | terrorize 4866 | tested 4867 | texas 4868 | text 4869 | textbook 4870 | textile 4871 | texture 4872 | thailand 4873 | thatcher 4874 | theater 4875 | theatre 4876 | theft 4877 | theme 4878 | theory 4879 | therapeutical 4880 | thermonuclear 4881 | thick 4882 | thief 4883 | thing 4884 | think 4885 | thinness 4886 | thoughtless 4887 | thousand 4888 | threat 4889 | threats 4890 | throat 4891 | thrombosis 4892 | throne 4893 | thrust 4894 | thumb 4895 | thunderstorm 4896 | ticker 4897 | ticket 4898 | tidings 4899 | tie 4900 | tiger 4901 | tile 4902 | timber 4903 | time 4904 | timer 4905 | tin 4906 | tiny 4907 | titillated 4908 | title 4909 | toast 4910 | tobacco 4911 | today 4912 | toe 4913 | together 4914 | toilet 4915 | tokyo 4916 | tolerable 4917 | tolerance 4918 | tomato 4919 | tone 4920 | tongue 4921 | tool 4922 | tooth 4923 | top 4924 | topic 4925 | topically 4926 | toppled 4927 | torment 4928 | toss 4929 | totalism 4930 | touch 4931 | tough 4932 | tourist 4933 | tournament 4934 | tower 4935 | town 4936 | toxic 4937 | toy 4938 | trace 4939 | track 4940 | tract 4941 | trade 4942 | trader 4943 | trading 4944 | traditionalism 4945 | traffic 4946 | tragedy 4947 | trail 4948 | train 4949 | trainer 4950 | traitorous 4951 | tram 4952 | transact 4953 | transalpine 4954 | transducers 4955 | transfer 4956 | transfigure 4957 | transformation 4958 | transformer 4959 | transforming 4960 | transfuse 4961 | transfused 4962 | transfusing 4963 | transgress 4964 | translocate 4965 | translocating 4966 | translocation 4967 | translunar 4968 | transmigrated 4969 | transmigrating 4970 | transmissible 4971 | transmission 4972 | transmitter 4973 | transmuted 4974 | transmutes 4975 | 
transponder 4976 | transport 4977 | transportation 4978 | transposable 4979 | transsexual 4980 | transshipped 4981 | transubstantiate 4982 | transverse 4983 | transvestite 4984 | transvestitism 4985 | trap 4986 | travel 4987 | traveler 4988 | travelers 4989 | traveling 4990 | traversals 4991 | traverse 4992 | treat 4993 | treatment 4994 | treaty 4995 | tree 4996 | trespass 4997 | trial 4998 | tribunal 4999 | trichloride 5000 | trick 5001 | triclinic 5002 | tricolor 5003 | tricolour 5004 | tricolours 5005 | tricycle 5006 | trilateral 5007 | trilogies 5008 | trinity 5009 | trio 5010 | trioxide 5011 | trip 5012 | tripod 5013 | tripods 5014 | triumph 5015 | troops 5016 | tropical 5017 | trouble 5018 | troubles 5019 | truck 5020 | trunk 5021 | trust 5022 | trusteeship 5023 | try 5024 | tsunami 5025 | tub 5026 | tube 5027 | tulip 5028 | tumble 5029 | tumbler 5030 | tune 5031 | tunnel 5032 | tunneled 5033 | turkey 5034 | turks 5035 | turn 5036 | tv 5037 | twig 5038 | twirl 5039 | twist 5040 | type 5041 | typify 5042 | ugly 5043 | ukrainians 5044 | ulcerate 5045 | ultimate 5046 | umbrella 5047 | unacceptable 5048 | unaccessible 5049 | unaffected 5050 | unambiguous 5051 | unannounced 5052 | unapproachable 5053 | unarguable 5054 | unassertiveness 5055 | unattainableness 5056 | unattractive 5057 | unbiased 5058 | unblock 5059 | unbroken 5060 | unceremonious 5061 | uncertainty 5062 | unchaste 5063 | uncle 5064 | unclog 5065 | uncomfortable 5066 | uncommunicative 5067 | uncomprehending 5068 | unconcern 5069 | unconcerned 5070 | unconscious 5071 | unconsciousness 5072 | unconsolidated 5073 | uncontrolled 5074 | uncontroversial 5075 | unconvincing 5076 | uncreative 5077 | undatable 5078 | undated 5079 | undefeated 5080 | undefended 5081 | undefinable 5082 | undefined 5083 | undemocratic 5084 | undeniable 5085 | underground 5086 | underprivileged 5087 | understand 5088 | underwater 5089 | undesirable 5090 | undetectable 5091 | undeviating 5092 | undiscerning 5093 | undisclosed 
5094 | undisputable 5095 | undissolved 5096 | undress 5097 | unemotional 5098 | unenthusiastic 5099 | unequivocal 5100 | unexpected 5101 | unfasten 5102 | unfavorable 5103 | unfavourable 5104 | unfeathered 5105 | unfit 5106 | unflagging 5107 | unfledged 5108 | unforeseen 5109 | unforgiving 5110 | unformed 5111 | unfortunate 5112 | ungraceful 5113 | unhappy 5114 | unheralded 5115 | unicycle 5116 | unicycles 5117 | unicycling 5118 | unicyclist 5119 | uniform 5120 | unilateralist 5121 | unilluminated 5122 | unimpressed 5123 | uninformed 5124 | uninhibited 5125 | unintelligent 5126 | unintelligible 5127 | uninterested 5128 | union 5129 | unionise 5130 | unisexual 5131 | unisons 5132 | unit 5133 | unite 5134 | universe 5135 | univocal 5136 | unknowing 5137 | unloved 5138 | unloving 5139 | unmanned 5140 | unmarketable 5141 | unmarried 5142 | unmelted 5143 | unmentionable 5144 | unmentionables 5145 | unmolested 5146 | unnecessary 5147 | unobjectionable 5148 | unostentatious 5149 | unparented 5150 | unpersuasive 5151 | unploughed 5152 | unprecedented 5153 | unproductive 5154 | unprofessional 5155 | unquenchable 5156 | unquestioned 5157 | unreal 5158 | unrealizable 5159 | unrepeatable 5160 | unreserved 5161 | unrewarding 5162 | unsalable 5163 | unsatisfactory 5164 | unsettled 5165 | unsexy 5166 | unshaped 5167 | unsighted 5168 | unskillfulness 5169 | unspecialised 5170 | unstuff 5171 | unsubdivided 5172 | unsuitable 5173 | untilled 5174 | untracked 5175 | untroubled 5176 | untrustworthy 5177 | untruth 5178 | unvariedness 5179 | unwanted 5180 | unwaveringly 5181 | unwed 5182 | unwelcome 5183 | unworthiness 5184 | unwrap 5185 | unzipping 5186 | up 5187 | up-to-date 5188 | uproarious 5189 | uproariously 5190 | upset 5191 | urban 5192 | urbanize 5193 | urge 5194 | urgency 5195 | usher 5196 | utilitarianism 5197 | utility 5198 | utterance 5199 | vacation 5200 | vacations 5201 | valentine 5202 | validate 5203 | valley 5204 | valor 5205 | valorous 5206 | value 5207 | van 5208 | 
vanish 5209 | vaporise 5210 | variable 5211 | variation 5212 | variety 5213 | vault 5214 | vector 5215 | vegetational 5216 | vehicle 5217 | vein 5218 | venders 5219 | venomous 5220 | verbalize 5221 | verdict 5222 | verify 5223 | verse 5224 | vessel 5225 | vicarious 5226 | victim 5227 | victor 5228 | victorious 5229 | victory 5230 | video 5231 | vietnamese 5232 | view 5233 | viewer 5234 | villa 5235 | village 5236 | villages 5237 | villainous 5238 | vindictively 5239 | vindictiveness 5240 | vine 5241 | vinegar 5242 | vintage 5243 | vinyl 5244 | violating 5245 | violation 5246 | violent 5247 | violet 5248 | violin 5249 | virginals 5250 | virility 5251 | virologist 5252 | virtuoso 5253 | viscometry 5254 | vision 5255 | visitor 5256 | vitamin 5257 | vocal 5258 | vocalism 5259 | vodka 5260 | voice 5261 | volatility 5262 | volcano 5263 | volunteer 5264 | voraciously 5265 | voter 5266 | vow 5267 | voyage 5268 | vulgarism 5269 | vulnerable 5270 | wagner 5271 | wagon 5272 | waist 5273 | walk 5274 | wall 5275 | walloper 5276 | wallpapered 5277 | wander 5278 | wanderers 5279 | want 5280 | war 5281 | warmness 5282 | warning 5283 | warranty 5284 | warrior 5285 | warship 5286 | warships 5287 | wash 5288 | washer 5289 | washers 5290 | wasp 5291 | waste 5292 | watch 5293 | water 5294 | waterfall 5295 | wave 5296 | way 5297 | weaken 5298 | wealth 5299 | wealthy 5300 | weapon 5301 | weaponize 5302 | wear 5303 | weather 5304 | weave 5305 | web 5306 | wedding 5307 | wednesday 5308 | weed 5309 | week 5310 | weekend 5311 | weight 5312 | weird 5313 | weirdly 5314 | welcome 5315 | welfare 5316 | west 5317 | wet 5318 | whale 5319 | wheat 5320 | wheaten 5321 | wheel 5322 | whimsically 5323 | whisker 5324 | whiskey 5325 | white 5326 | whizzed 5327 | wholeheartedness 5328 | wicked 5329 | wide 5330 | widen 5331 | wife 5332 | wig 5333 | wild 5334 | wilderness 5335 | willingness 5336 | win 5337 | wind 5338 | windmill 5339 | window 5340 | wine 5341 | wing 5342 | wingless 5343 | wings 5344 | 
winking 5345 | winner 5346 | winners 5347 | winter 5348 | wipe 5349 | wire 5350 | wisdom 5351 | wise 5352 | wit 5353 | witch-hunt 5354 | withdraw 5355 | withdrawal 5356 | withdrawn 5357 | withhold 5358 | wizard 5359 | wolf 5360 | woman 5361 | wonderful 5362 | wood 5363 | woodland 5364 | woods 5365 | wool 5366 | word 5367 | wordless 5368 | work 5369 | worker 5370 | workman 5371 | workplace 5372 | workshop 5373 | world 5374 | worm 5375 | worsen 5376 | worsens 5377 | worst 5378 | worthless 5379 | worthy 5380 | wrathful 5381 | wreathe 5382 | wrestle 5383 | wrist 5384 | write 5385 | writer 5386 | writing 5387 | written 5388 | wrong 5389 | wrongdoer 5390 | wrongdoing 5391 | yacht 5392 | yale 5393 | yard 5394 | yarn 5395 | year 5396 | yell 5397 | yellow 5398 | yellowish 5399 | yen 5400 | yield 5401 | yodeling 5402 | young 5403 | zebra 5404 | zinc 5405 | zombie 5406 | zone 5407 | zoo 5408 | makes 5409 | calling 5410 | developed 5411 | organising 5412 | causes 5413 | requiring 5414 | made 5415 | gave 5416 | seemed 5417 | refuses 5418 | produced 5419 | gives 5420 | sets 5421 | organising 5422 | works 5423 | using 5424 | causes 5425 | affecting 5426 | cause 5427 | strike 5428 | considering 5429 | refuses 5430 | affected 5431 | starts 5432 | gives 5433 | starts 5434 | affected 5435 | produces 5436 | happens 5437 | using 5438 | establishing 5439 | seemed 5440 | said 5441 | led 5442 | calling 5443 | found 5444 | says 5445 | happens 5446 | reducing 5447 | shows 5448 | considers 5449 | created 5450 | affecting 5451 | providing 5452 | says 5453 | given 5454 | requires 5455 | affecting 5456 | making 5457 | applies 5458 | found 5459 | organising 5460 | providing 5461 | calls 5462 | refusing 5463 | setting 5464 | organise 5465 | considers 5466 | allowed 5467 | led 5468 | helped 5469 | recognise 5470 | makes 5471 | given 5472 | creating 5473 | happened 5474 | starts 5475 | establish 5476 | employ 5477 | working 5478 | required 5479 | affecting 5480 | recognise 5481 | produced 5482 | 
showed 5483 | setting 5484 | given 5485 | said 5486 | employ 5487 | works 5488 | showing 5489 | employ 5490 | refused 5491 | affected 5492 | concerned 5493 | included 5494 | protected 5495 | dismiss 5496 | found 5497 | reducing 5498 | including 5499 | strike 5500 | included 5501 | continued 5502 | establishing 5503 | provide 5504 | said 5505 | refusing 5506 | paid 5507 | establishing 5508 | happens 5509 | use 5510 | working 5511 | refusing 5512 | establish 5513 | allows 5514 | increased 5515 | seemed 5516 | seems 5517 | lead 5518 | made 5519 | employ 5520 | produced 5521 | considers 5522 | developed 5523 | allows 5524 | organising 5525 | set 5526 | shows 5527 | produces 5528 | made 5529 | employ 5530 | reports 5531 | starting 5532 | shown 5533 | causes 5534 | calls 5535 | used 5536 | works 5537 | starts 5538 | refused 5539 | reported 5540 | use 5541 | requires 5542 | produced 5543 | produced 5544 | set 5545 | helping 5546 | created 5547 | starts 5548 | seems 5549 | considering 5550 | reported 5551 | taken 5552 | set 5553 | works 5554 | produced 5555 | creating 5556 | says 5557 | showing 5558 | affected 5559 | shows 5560 | working 5561 | exists 5562 | helping 5563 | establishing 5564 | cause 5565 | produced 5566 | reported 5567 | finds 5568 | used 5569 | allows 5570 | sets 5571 | organised 5572 | starts 5573 | considered 5574 | increases 5575 | showing 5576 | showed 5577 | refused 5578 | creating 5579 | developing 5580 | providing 5581 | considers 5582 | use 5583 | provide 5584 | provides 5585 | showing 5586 | shown 5587 | use 5588 | applied 5589 | dismiss 5590 | showed 5591 | makes 5592 | includes 5593 | provide 5594 | provides 5595 | allows 5596 | lead 5597 | dismiss 5598 | given 5599 | finding 5600 | working 5601 | increased 5602 | considered 5603 | applied 5604 | includes 5605 | refuses 5606 | says 5607 | starting 5608 | increased 5609 | says 5610 | organising 5611 | gives 5612 | including 5613 | included 5614 | makes 5615 | allows 5616 | seems 5617 | including 
5618 | said 5619 | called 5620 | working 5621 | worked 5622 | requires 5623 | employ 5624 | reported 5625 | protects 5626 | allowed 5627 | showed 5628 | led 5629 | using -------------------------------------------------------------------------------- /vectors/launcher.py: -------------------------------------------------------------------------------- 1 | #lhy 2 | #2015.4 3 | 4 | import analysis 5 | 6 | myAnalysis = analysis.Analysis() 7 | myAnalysis.dimension_analysis() 8 | -------------------------------------------------------------------------------- /vectors/sampleVectors.txt.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/vectors/sampleVectors.txt.gz -------------------------------------------------------------------------------- /vectors/top_words/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/vectors/top_words/.DS_Store -------------------------------------------------------------------------------- /vectors/top_words/words280: -------------------------------------------------------------------------------- 1 | europeans americas elephas africa continent sub africans african asian glam saharan 9 2 | att leavis ltskog starfighter circ currently hrerbunker rster rhinemaidens kirchenmusik ring 5 3 | camouflage regiment parachute corps naval battalions bases infantry battalion ivo units 9 4 | alkene russian nikolaevich boris grigori sergeyevich pavel sergei bulgakov yakovlev vladimir 0 5 | cambodia tenga machines thailand korea singapore malaysia lanka taiwan sri china 2 6 | jacobus herder theologian philosopher hierarchs swiss swedish andersen nland dutch danish 4 7 | ugosz te sparil existing charg danann ivoire affaires psy d tuatha 3 8 | zaire serbs chlorinated croats kosova macedonia bosnia fyrom 
republic herzegovina congo 2 9 | encryption huffman lzw stroessner calculators methods algorithms techniques ciphers cipher polyalphabetic 3 10 | andalusia utrecht francoist normandie netherlands mezquita specify gibraltar ryswick france spain 6 11 | subsistence fej mainstay mainstays collectivized privatizing restructuring modernization sectors privatization sector 1 12 | sociologist actuated tumbler drivetrain traction sheave spark casings reactor battery candu 0 13 | waver wedged cookbook antagonism continentists interrelations hymen artfully bridged mismatch between 2 14 | isaacs desktop xp xenix workbench prodos symbian nextstep gui vms qnx 0 15 | cet mondial ajahn podium nanak rgya gts mtsho gtb illmatic lithograph 3 16 | cesarani journalist bova hume founder adret petroleum artful ben author david 6 17 | watergate president scandal presidential reagan hillary assuring gore lewinsky eisenhower clinton 6 18 | coherency erechtheus memory rfid mpls packets buffer interrupt warehousing cache packet 1 19 | renaissance eminent empt classicism empted eminence extras pottery raphaelite medieval pre 6 20 | topographic eas tunisia coordinates geography geographic map location asia macromolecules gulf 9 21 | stammering columbians cinematographers linnean british mourn entomological notices apothecaries theosophical society 5 22 | shiloah qarase jospin bonding konoye ministership olmert karami fogh prime minister 3 23 | japhetic mythology scythian aryans greeks celts germanic thracians celtic cookbook artificer 9 24 | mr crane larch pakistan selina alan donald seuss strangelove shiragami dr 3 25 | accusations threats kidnappings apion beatings incited protesting plotting pitted chronicon incite 9 26 | pamphylia thrace peninsula balkans mesopotamia kashmir levant formalism philistia syria frontier 7 27 | industrialized sanchez chiroptera places chronological contrast developing carnivals in addition elsewhere 1 28 | distinct wildly differ romain radically specificities 
varying subtly differing vary different 3 29 | setter dru scotch lowlanders manx gaelic wolfhound extras welsh scottish irish 7 30 | eumc eu bloc belarus kalmar republics soviet enlargement macromolecules joining union 8 31 | harris zinc silicate molybdenum ore chromium antimony pentyl magnesite manganese nickel 0 32 | aug majin vegeta apr mjw mjt buu bookkeeping gem mjs pngimage 7 33 | consistently completely theorised albeniz disappeared exclusively virtually inception materialized continuously almost 3 34 | voisin dramatist politician statesman podium illustrator painter allori writer soldier doorman 4 35 | blacks peasants siegel serfs foreigners prostitutes zoramites uneducated disabled unemployed teenagers 2 36 | clk mercedes slr buren riebeeck benz boo partitioning sagan carl van 7 37 | wesley reissue metamagical tempus reichenberger existing isbns bloomsbury addison paperback isbn 5 38 | nastro danann armes cannibals tippit escrime tuatha affaires charg psy d 3 39 | dominions botc calcutta convicts mohandas overseas colonies gyi raj mahatma gandhi 7 40 | harris comparatively unimportant oddly resilient fragile fairly extremely very strangely relatively 0 41 | refus ne coppersmith facundo asterix des panza rosa don toynbee quixote 9 42 | dufay necker quarter horrid veterinarian kouchner bertillon goscinny french finalist asterix 3 43 | vlaamse inexpensively reminders restated imprints wildfires eegs counteracted advantageously connote can 0 44 | cabernet ubiquity shorn narrowness cosh deflating userbase secures softness entirety its 4 45 | ethnic bsl spoken cornish volgaic severed speakers uralic finno ovimbundu ugric 5 46 | stauber lydgate galt smeaton napier pakistan philoponus galliano tenniel zerzan john 5 47 | esteemed hugely influential popular publicized respected highly chronicon vitally publicised immensely 7 48 | phoney aif bowles veterans reunions chaco averting mortem horrors phony war 2 49 | dandelin wikisaurus lifo rav entactogen apso csd yird 
see comrie disambiguation 9 50 | abolitionists numismatic armigerous zoetrope philanthropists quintessentially dietetic anion everlast exceptionalism american 7 51 | coleman chaplin indiana robinson buses perky davis young gary smith jones 4 52 | discernable kakinomoto nakatomi subscribing import ewa blaster recital insubstantial stil no 5 53 | bypassed typecast swamped nothingness reversibility unpatriotic wagered chit teleported converg being 9 54 | multiracial suicides autodidacts teetotalers births actors halogen portlanders liverpudlians daylights living 6 55 | cratylus rhymer jefferson randolph nayland petroleum szasz smith buckland william thomas 5 56 | negl hazards speculators ivo dike water permanent unarable pastures land arable 3 57 | coiner babe capers outperforming manager mike tba parcells backs coordinator coach 3 58 | nya seller nebula oscar retro fairest filmfare pagal hugo newcomer best 0 59 | narratology filkers mingle tropes fanzines countercultural teleporter fandom science alexandros fiction 9 60 | nascar racecar undertaker remember actress wrestler racer swimsuit car driver race 3 61 | mcgraw lutterworth ed chlorinated macmillan greenery routledge harper publishing publishers penguin 3 62 | klansmen aediles indignant amulets mimsy ashdod sharpshooters borogoves were converg mome 9 63 | wenham krag rgen petroleum prochnow oppenheimer rgensen rvinen selye dis j 3 64 | wielki uaa trixie monohull tolkien krew krakowie buckminster chojn fuller w 3 65 | nonphysical esa baikonur horrendous tether bulkheads smollett esro cosmodrome kablooie space 6 66 | rkiye bumper unwound inhale stickers givens lfabl goodies harass don t 7 67 | kmaq perigee velika sq mph apogee calorimetry miles makeup mi km 6 68 | arapaho ertra hino eisteddfod campesinos discriminating szolnok enquirer underling anthem national 6 69 | aquarist roost fej astrodome frontierland sanitarium underdogs stadium club girt home 2 70 | honouring walk famer finlandia faneuil roll inductees 
inducted niqqud hall fame 8 71 | expressed table integrals comrie gives geometric mean definition formula integral above 3 72 | fibonacci factorial fac aldrin log matrix coefficient sum integer negative positive 3 73 | alberich werewolf heroquest sandokan cdots shaping roles pivotal runequest crucial role 4 74 | infirm lurk indefensible steelmaking eubanks hanshu counterfactual unce remaking infamy history 0 75 | barrels snooker yngwie peek hireling schalk macmaster pvp npcs sneaky player 0 76 | lookouts brewers pathophysiology gentlemen mandalay franchise azl delian prizren division league 2 77 | laberge freeserve ecotage sel slush inorganic gaap subs clubbing us uk 5 78 | edgware lop viscount churchill earl aberdeen bentinck carrington romain lieutenant lord 8 79 | plows imitators many observers instances aficionados analysts hubert catchphrases commentators respects 7 80 | prelog podium nevanlinna khorana carlsson economist recipient laureate prize nobel physiology 1 81 | disloyal thackeray shockley lyndon digges sydenham yeats thomas johnson william samuel 0 82 | irv assembly runoff seats plurality stv voting deputies ballot salinas vote 9 83 | mattie rutherford sedov agee boxer clogging manush bunk lyndon goode johnson 5 84 | scams including goblins romain various obstructions numerous plethora sorts besides other 3 85 | tundra iphigeneia upland volcanic subtropical terrain climate climatic arid glaciers temperate 1 86 | mozambique guyana calorimetry liberia ghana fiji communications relations miscellaneous demographics politics 2 87 | podium ll sdr therus gissurarson kkinen mongi hoos oll muet h 0 88 | military service staff chlorinated khad manpower uniformed defence police mandatory defense 3 89 | syndicalist communism conservatism capitalism anarcha individualist libertarianism anarchist anarchism phaethon anarcho 9 90 | eagles ark nd playoffs raiders divisional outperforming paradise nfc afc lost 6 91 | frauds copyright infringement provisions tort 
affreightment chronicon statutory statutes barratry guaranteeing 6 92 | scarcity prices maximizing marginal kazimierz reductions costs demand overruns price spiralling 4 93 | murci census acetate rublei zero bde ifconfig lx fsb fd gamini 2 94 | divides bijective det iteratively homomorphism quenched glam proceeds subgroup ekron then 6 95 | mates rink ebbets catchers joule offensive goaltender field coors baseball team 4 96 | subsided expanded resurgence undeserved lately recently undergone elicited businesswoman vagrant has 8 97 | entrez mindat ignited index wikipedia page imdb google search reading entry 2 98 | pheasants fortunella genus cats butterflies chondrichthyes scribe osteichthyes bees cartilaginous caracal 6 99 | been gotten beforehand sawn greville experimentalists begun duped already hitherto previously 4 100 | idle burrito sketch hammett miast hetfield elvis circus big monty python 4 101 | bloom asha played vocalist vocals musician evenly goy guitarist singer lead 6 102 | zarah transitivity xor fb gcd converg nf eb lcm flats cmt 5 103 | genitive epiglottal siegel alveolar voiceless diacritic alphabet labiodental nominative approximant fricative 2 104 | phelan coolidge taggart apocryphon maynard randi sncc boswell renin calvin james 8 105 | stumbles glossed meaninglessness spilled salinas cantus aggregating precedence discant arching over 4 106 | finnish mafia bulgarian sculptors muscovites slow russian hungarian romanian polish italian 5 107 | mexican pinchcliffe chile brazilian peru argentina prix brazil portuguese machines spanish 9 108 | nimitz arleigh flew bathyscaphe pilot mission tugboat crew bg boat apollo 8 109 | alfred jagdish neurobiologist converg literature recipient ig khorana winners nobel prize 3 110 | friedman milton drake russell dejacque joseph randomness abraham mill newton isaac 6 111 | taras foretell seven nikolai boylston botanist geologist painter pepys samuel poet 1 112 | necessities musicals comedy cinema filmmaking stage 
cinematography improv hollywood talkies picture 0 113 | stabilisation deflecting glam thrust qel fab majeure brute centrifugal fictitious force 2 114 | targets weapon attack assault suppressive melee missile missiles chronicon weapons fire 8 115 | romania kroll skeletal ukraine russia finland switzerland italy poland germany hanimex 2 116 | nehemiah ezekiel midrash esdras tanakh ezra jeremiah zephaniah esther dp nevi 9 117 | jaya timorese aldwych monarchianism bluie indies berliners imposter gaza west east 7 118 | wheatstone arthur wallace william malmesbury albert charles lyell dug pike henry 8 119 | sverre iv posthumus machines zog valdemar godfred frederick haakon king groan 3 120 | grasping handed arm face wrist chronicon leg forearm fingers fretting hand 5 121 | deeley lee voice diarmuid hackman fpu supporting brando actor producer director 5 122 | curiously longbowman captions elt candlemakers melle apace adaptability virile english amuro 5 123 | complimented unexposed emb topsy films phaethon idiots hoodlum movie celluloid film 5 124 | podiums chronicon burlesques acupuncturists conspicuous most glaring examples notably salient notable 1 125 | ltr chronicon hander mads crescens foramen untouched cornea parenthesis left right 1 126 | mcfarlane karad perfect omg noor dreamt nathuram lcar koh nkjv am 0 127 | toaster ligand colecovision intellivision console fps nintendo arcade gaming playstation consoles 1 128 | hodiernal present businesswoman event interglacial past neogene edo future extinction ediacaran 2 129 | rorty rosalind harry harrison benjamin delano aretha herbert barrack franklin richard 8 130 | gotcha homma glycosylation conflagration ludd belgrano quartermaster gotovina turgidson adjutant general 2 131 | gulch congresswoman treasuries thant apos unaffiliated hooligan bancorp llah maw u 5 132 | psychopathology stratification eisner ethical empowerment sustainability ecological creativity deprecating societal ramifications 2 133 | deinococcus converg 
aryl gartner spearhead warez proteobacteria thermus tits amphibole group 1 134 | cornucopia obscurity forays disrepute degenerates morphed albeniz subdivide headlong delving into 6 135 | nanook lega riposte addu marquesan north dmz uist degli deadwood south 8 136 | lodge awarded grand master bath jedi buses knights merit masonic knight 6 137 | concentration disarmament pow mauthausen belsen camps iraq prisoners dachau cookbook camp 9 138 | databank sawai towing pseudoarchaeology sanchez doubts articles glean breatharianism dichotomous article 4 139 | tressure scots danelaw gleneagles heartlands deira queen wales machines scotland england 8 140 | pinchcliffe auto grands honour prix grand photosphere theft aptly honor named 6 141 | wilhelm bismarck himmler chlorinated saxony apollonian austria speer hitler eichmann adolf 3 142 | aia valech committee sasson intelligence department cosh nsa report bureau agency 6 143 | percent biathlete lamed cug lrh doctorow angerthas affaire schodt sauvage l 0 144 | countable varnothing cantor closure ordinal sets irrationals finite uncountable davies mandelbrot 9 145 | aristotelian philosophy evolutionary linguistics biology phenomenology phaethon metaphilosophy aesthetics epistemology metaphysics 6 146 | schildt ruga techtv gaffe blacki nubus tterd reticulatus mmerung londo g 5 147 | association scuba sporting judo exhibitions gymkhana inorganic competition diving competitions olympic 6 148 | nations riget caer baronets united conurbations talkie isambard sanc baronetage kingdom 6 149 | texas austin chlorinated illinois cleveland arbor detroit atlanta houston chicago kansas 2 150 | noor frantic hotmail koh vuk dreamt montaner igo zand karad i 2 151 | infinitum proctor feminam date propter hoc listing days dates events remaining 1 152 | masturbation elba sadomasochism heterosexual dimorphism homosexual intercourse sex anal penetrative sexual 1 153 | ask msa snag ayawaska salutes sorcerers glorifies sociologically sweethearts 
rediscovering america 0 154 | ukiyo coghill brugge mailed wile galil danjou enjambment stroessner mails e 8 155 | paradigms procedural lisp programming grammar transformational superficially grammars romain syntax imperative 8 156 | sects halakha vipassana rites quanta cherem synagogues judaism hasidic sect practising 4 157 | opteron ql hc ia podium sx microprocessor amd motorola intel bit 4 158 | awarded broadsword rumford halogen won coveted award gold medals received medal 3 159 | bong fermanagh nova nunavut manitoba district uncountably panyu alberta quebec county 6 160 | signers punters whack tornados subcategory belligerent nickel trucial belligerency aspirant states 6 161 | pendragon philistine aurelianus fairhair lf hamlet macbeth sclc freedos gerar king 8 162 | kindia urban districts faranah provincias prefecture albeniz vilayet gemeinden sanjaks rajonas 6 163 | shammai infinitum sulba mago epistulae tempers cynic parenthesis atticus ad bc 7 164 | lds churches steeples petrochemical latter adventist schwenkfelder subgenius adventists church saints 3 165 | oakland clippers anaheim denver cycle atlanta freeway columbus los angeles boston 4 166 | luck yeah worrying thrill chronicon you damn remember okay yourself everybody 4 167 | ago weigh tya vague centering revolve ppm revolved mya revolves around 3 168 | fortitude compliment khafre sewashi milenko haste eisner auk calamity unwashed great 6 169 | aphrodite murbella harmonia remarriage leto barley freyja odrade taraza gesserit bene 5 170 | dalet versed beatniks functioned stocked refered behaved solitudes as well inasmuch 0 171 | archipelago ellice ogasawara bonin yap syros paros pribilof island degli islands 9 172 | incarnate radha wicked manifesting baal spirit copper eternal kroni vishnu mainyu 6 173 | macromolecules alexandria diocese archbishops archbishop rome suffragan papal abbots bishops bishop 0 174 | ithaca campus koestler junior dartmouth elizabethtown seniors universities madhyamika colleges 
school 2 175 | vegas claus moved barbara clara halogen san taiko santa berkeley washington 5 176 | adviser promoted embroiled disillusioned enamoured enamored organizer chief engrossed eaux became 9 177 | tj witwatersrand platonists uni stwertka stylebook companion lutterworth oxford cambridge press 0 178 | saguinus yellow narcotics reddish jacket stallion tamarin callithrix brindle collar aquamarine 2 179 | billions costing machir billion fistful half docs tens millions million dollars 6 180 | low lon artistry fidelity priority levels temperature altitudes high bidder level 1 181 | quickness megabits generation trimeter sixth akhana oort fifth fourth third second 6 182 | ardor gambino compositae marigold indridae voor orchidaceae amaryllidaceae sportive asphodel family 5 183 | isma qaeda cin tipper ma sharm extras capone hazmi gore al 6 184 | paleogeography thunderstorm convalescence changeover recessions copulation stopover nya interrogations sojourns during 7 185 | lcao bose bohr sepulveda quantum brownian electrodynamics electromagnetism relativistic newtonian mechanics 3 186 | calendarists domini etonians ibogaine julian senile sarum gutnish calendar fashioned old 3 187 | transport apollodorus epitome vi xxiv mk schedule ii iv lxiii iii 0 188 | tales autobiographical tristram novellas poe barrels shandy poem prose fables poems 5 189 | relies occasion rely frowned chronicon verge embark concentrating depended relying relied 4 190 | runescape anesthetics dramatize amniocentesis bahamut clonazepam midazolam polski diazepam joys such 7 191 | landowning continued chronicon bourgeoisie rushen flourish class protuberance cathari middle ages 2 192 | pfp beverly glendale mount park burbank disneyland resort near california killing 0 193 | january august december ds october july april march june september november 3 194 | contradicts thorn falsity premise fallacious argument epimenides fallacies antecedent assertion fallacy 1 195 | heidegger tarsus sartre luther le 
aston livestock les cezanne paul martin 6 196 | kings pretenders nickel ruled succession bentheim prince princes dukes counts throne 2 197 | carnation zapata disastrous emiliano revolutions ensuing glorious wars napoleonic molokai revolution 9 198 | strawberry consists redesigning sturmabteilung taggers misdirection grout gradations nefesh vast part 0 199 | volusia lusp linge kingssonar rtsil jc expanded grokster gug hamdi v 6 200 | tunnels murrumbidgee canal rivers konstantinos canals prut dam trails dams locks 4 201 | hallam tilburg oita calorimetry anhui witwatersrand norio cranfield wollongong troms university 3 202 | carnival litha parti yule nights mabon winter summer solstice spring autumn 2 203 | caroli disrupt torr tequila gentry noli picta var americana aurea agave 1 204 | jimmu emperors valerius sushun caesar haile caligula suzaku claudius nickel emperor 9 205 | ebne sidi ar ghul bin et mining al lech wa sa 6 206 | templar muscovy nickel xenia ellington louis grand nukem knights kahanamoku duke 2 207 | comrie incorrectly misnomer abbreviated synonymously interchangeably euphemistic pejoratively pejorative coining derogatory 0 208 | percentile nickel rd nd ringen geirr anniversary panchen lama dalai th 1 209 | ketterle fermilab visiting sleng tuol electrostatics lab hinting attendees pranks exhibited 5 210 | hubbard cug nnrot configurations thien bootlegged veranda strauss khalsa schodt l 3 211 | bombards entailed thirteenth nineteenth eighteenth fifteenth seventeenth sixteenth twentieth lupi century 9 212 | lva ua konstantinos nitin ovale overconfident vivax zollner burkitt orridge p 2 213 | ascent memoirists foremost undoubted ekaggata inuk arche sighting glimpse cataract first 1 214 | menaced cajuns gaule dunkerque financi tournois vigen fauves tagline lumi french 8 215 | marpol merchant ships pontos schoolteacher beagle erythraean dumping ozone sea ship 4 216 | tungurahua hurdles boney ljubljana rix naghten benelli hlhausen triangulum veszpr m 3 217 | 
agreement stor mercader uqbar lav latm latns laoco ananta meth n 0 218 | few forty years fourteen thirty fifteen cdots decades weeks eleven eighteen 6 219 | gate towers facade erected anechoic building mansions wall tower converg rooms 9 220 | bootleg idempotent lps elektra album mosh thmix shady albums remix lp 1 221 | isca nub musei sopot ricochet retronym jayapura pore fatigued vaticani city 7 222 | stages adopters riser seventies sixties mid bowles eighties late nineties early 6 223 | london paddington croydon walford islington street uncountably potters finsbury park square 6 224 | cordell processional imager kachin glam nearness corporative wildflower rakhine arakan state 4 225 | calcination sociopath demonization known pap mekugi seconda satirically businesswoman hanukiah called 8 226 | funds cdots evasion fisc filed special opinion lefkow federal cci injunctive 1 227 | sunspot grate metonic fortieth increment scaleminor nubus olds snowbound itch year 6 228 | underflow dismissing kindling multipoint rallying isoelectric deemphasized microseconds points chronicon point 9 229 | overturning overturned fisc jester appeals appeal judge trial supreme cdots court 9 230 | hoboken ndor giuliani katrina evacuees brooklyn laguardia connecticut orleans york jersey 1 231 | idealized parity eyesore affidavit oxymoron irascible undisclosed allusion assortment an afterthought 1 232 | recognize possess decide chronicon observe jeet perceive wish do deem kune 3 233 | press chicago lapd alamos dc berkeley los angeles post uncertainty california 9 234 | repeating rhombic handedly romain dodecahedron rectified nucleotides snub stranded strand helix 3 235 | intrigued brash characterised overshadowed maharajah supplemented passers nickel by immobility unaffected 7 236 | hyborian varma apprentices haircuts fej marriageable aquilegia columbine dawning gestational age 4 237 | suggested reputed seem wy have companionship giftedness elly purported suggest madai 3 238 | mining ar sa wal 
kebir sallah hitpa sharm phan meknes el 0 239 | theatre moved fine arbor cincinnati lincoln chihuly halogen art joslyn museum 7 240 | glycosylation bottom fossiliferous signaled hammersmith cursory odds ends signalled indentation least 0 241 | xs malcom yz y crystallography inq loge jis x keri ray 9 242 | khomeini caliph bint macromolecules imams fatwa brotherhood islamic jihad mecca muhammad 3 243 | albeniz hub restaurants pubs hubs cities centres transit attractions tourist shopping 0 244 | bin rr omar fronti gan eurocentrism sadd disiyyah lemmink khayy inen 5 245 | html suture vivum sephardim edu htm omne org www http com 3 246 | gdps ds alleviation gini per abject percentage household income capita poverty 1 247 | pretending reordered criminally inspected courteous disposed unsightly be avoided gyi impartially 9 248 | comrie provably interesting questionable necessary conceivable caveat happens occlusion is unclear 0 249 | kammerer muad reubens dib ehrlich onesimus reserves sartre tarsus cezanne paul 6 250 | liars exaggerations invariably limitless ko grouped doppelbocks interdependent mutually interchangeable are 4 251 | inch lyr three bde minutes versus hours kb seconds clogging mic 9 252 | quad forall bar y qquad frac q cdots cdot loess int 9 253 | wonders heavyweight fina wreckin wca ywca soulless haba world dalet antediluvian 9 254 | cosines returns identity curried printf functions variables bessel lon continuous function 8 255 | army badonicus halfa decisively rearguard nonprofit grunwald armies patay cavalry battle 5 256 | chronicon lanky cm wide gauge centimetres height depth mm length metre 0 257 | breaks shutting fleshed singled away forth chronicon pulled down out shut 6 258 | petroc qualification columb anchorite alphege sarov saint mary magdalene ouen st 1 259 | jackson alabama columbus albeniz biloxi superior natchez township mississippi michigan lake 3 260 | della buonarroti pirandello skeletal sergio fran da luigi giovanni vinci leonardo 3 261 
| malley misogyny sapo sensei orientis brillig hoods abominations jell ino o 1 262 | triomphe lucia nevis sula suu eustatius this aung kyi collegio roque 6 263 | mysterium irreversible rebus usque eius artis primum autem sive haec et 1 264 | broadening evocation riel mazarin xiv revocation napoleon bonaparte louis pasteur france 0 265 | nsson podium mackie glennon helfen krag rgensen pfaff appl narayan dallwitz 1 266 | smokes vengeful revels cultivates wftu obsessively salter changeling man shaves kennewick 6 267 | infant runways cfa cfaf fertility paved total vicuna male disloyal female 9 268 | adiation ringway agreement sebastos zekne kea sobib shkod autokrat shindo r 2 269 | greville plasma endoplasmic mannose cytoskeleton cyclase activation vesicle membrane membranes pathway 0 270 | artnet asap ctis lankhmar grime partnership deka tonalsoft aeclanum external links 5 271 | forum news mcdonalds fan atta relevancy sites official unofficial website site 4 272 | hooters sedov lancer b rstner boying superfortress predicting tisha avodah metzitzah 7 273 | scoring skiego karpov scored shutout koopa wins streak kart scoreless innings 1 274 | consultancies deque albeniz spacewalks parlors initialisms alphabetically topics circularly lists list 2 275 | sanchez monetary ecb bankers coins bank exchange euro currency banknotes trade 0 276 | ha laden thw hammon lech bin mining ghul sa wa al 6 277 | neuropsychology hrung jtc freiheit nibelungen taz eines der iso berlin die 0 278 | rogues senators skeletal parochial drumbeat spotswood burgesses governor governors mirth house 2 279 | uncommon not endorse reexports hesitate unheard necessarily did anymore bangle does 3 280 | marco roque del juan aung calorimetry suu kyi santa sula collegio 5 281 | garb oppresses chronicon scantily incites sari sabine girlie subjection men women 2 282 | taking potentiation furrows finisher overlong takes awaited place chronicon short long 8 283 | converg envisioned subsequently excelsus 
discontinued debater shelved retroactively initially originally was 0 284 | dilate exo amberley percent bhaktivedanta hts ugc sar c sarea cfc 3 285 | nuanced relaxed temperamental prosaic realistically duckworth pliable lighthearted extroverted muck more 5 286 | opting jl vojt krak orgy operative chojn mp w ch co 4 287 | classroom midwifery examinations schooling vocational baccalaureate remedial courses secondary glam education 9 288 | sandow drang legende etzel calorimetry vaterland zeppelins reichs neubrandenburg osten german 4 289 | flattering nearer firmer rather eighths tougher monohulls bowles weaker fairer than 7 290 | gradation klebsiella sarde bioprospecting e bb gri mfg tylenol lineal coghill 9 291 | chile francisco costa specify puerto juan vel castro santiago salvador mexican 3 292 | johann liszt handel bwv pineapples telemann mozart brahms haydn bach beethoven 4 293 | lillard mills bening tori boxer come westerberg mcveigh songwriter mauricio musician 5 294 | ensue startle terminate abate inevitably chronicon incur suffice would uis will 5 295 | ds pontiffs augustana dartmouth vassar brasenose bowdoin collegium bard goldsmiths college 0 296 | borrows departs depart profited refraining from refrain stemmed ds refrained benefitting 8 297 | hone kiyoshi hefner urfa ryazan spousal ichikawa clairvoyant greve prematurely born 5 298 | horizons totowa upstate echota ds entrants zealander zealanders heavies yorkers new 4 299 | exports dhy fy na twh imports furtherance electricity kwh gwh est 6 300 | contrasted coincided familiarity dealing preoccupation chronicon associating fascination imbued conflicted with 5 301 | gabby jeanette playwright valerie booke bubblebath actress comedian musician podium actor 9 302 | irreversible estaing porfirio psy charg affaires danann giscard tuatha tat roosevelt 0 303 | blueprint penchant chronicon responsible reserved instance fondness for preparing sake intents 2 304 | esp re janeiro igreja jornal lin ambique sprague paulo 
products pr 9 305 | proportion amount amounts scale rossby chronicon tetranacci sheer mulliken shub number 5 306 | intact persist still lingers ridicule hotly unresolved phaethon debated debate controversy 7 307 | tempting xaa cringely malcom substitutable yx yz asymptote nears ys x 0 308 | chronicon moving axis tilt ecliptic perpendicular concave downward spiral parabolic rotation 0 309 | mistake clearer difficult make pinpoint headway harder chronicon easier makes making 7 310 | linking railway straits corinth massawa ports strait psl atlantic lines coast 7 311 | francisco skeletal cellspacing bullseye stata overlook rowspan bay colspan align center 1 312 | chamois jpg modis ibex mench tpvgames pitfall caption gif png image 4 313 | jacksonian parties optimizing centrist bharatiya herut demokratika uup sdp party democrats 2 314 | authoritarian seized autocratic tyrannical regime deng siegel regimes control xiaoping glasnost 6 315 | longed pore befriended tenderly recalled polygraph baffled admires excelled mortification journeyed 1 316 | buteo image ptarmigan pointed fra weevil ibex chamois angelico pix jpg 3 317 | kung piano bodhidharma cuisine taoists bushido muay daimyo thai japanese chinese 1 318 | padding fiav gallery motto colspan thorn svg hoisting background align flag 5 319 | chulainn ugc hts exo amberley prabhupada sarea dilate visby bhaktivedanta c 8 320 | beno doesn rkiye bumper inhale goodies harass stickers unwound lfabl t 5 321 | londo amounted seid umschlungen prot te escher tterd reticulatus elbl mmerung 1 322 | mv beta wealth kappa mbox phi ai alpha maersk tet chi 2 323 | mettrie cleyre cul passeig manseriche vigilant bona la flinging civili de 8 324 | aloft lapse elapsed gmt reoriented allotted sanchez devourer dst time chronos 6 325 | frankfort osage psl clair eunice avenue baton watling river memphis street 2 326 | parakeet tennessee asheville ds charleston dakota greensboro georgia virginia florida carolina 3 327 | manta var se primavera 
halogen americana aurea agave masse en la 4 328 | scalemajor samanid hoysala sultans despotate klux ghorids gokturk ghaznavid vijayanagara empire 0 329 | judicial ceremonial glam judiciary authority legislative branch beavis davidians butt executive 2 330 | osce kyoto interpol nsg imo pca ilo nam unctad glam opcw 9 331 | farming renger zyklon lancer avodah hooters sedov superfortress boying metzitzah b 0 332 | psa trains train concorde airliners airlines nazwy passengers airline freight passenger 6 333 | andr marie malraux ois bergson fran monet pierre innumerable jacques jean 8 334 | hammadi letterpress encyclopedias typeset newline copy rosetta glam print text printed 7 335 | aur convenes ho aq ng gi sh ta ve li na 1 336 | nigsplatz nigsberg nstlerroman benhavn imports lsch satomi dewdney volap z k 4 337 | decapitate refusing hoping rectify appease attempting revive salter locate convince persuade 7 338 | cello vie cornet flute flutes harp bassoon trumpets clarinet harmonica instrument 1 339 | maritimes quebec aboriginal ds petro australia nova bilingualism scotia canadian canada 3 340 | elbl g tterd reticulatus mmerung midrash londo incompleteness kurt escher del 5 341 | jupiter mercurial meteorite pluto planet mars defeating sun earth moon andmoreagain 6 342 | chancellorship perfectionism posterity his happiest habilitation contemporaries electrostatics erudition priscillian mandali 7 343 | tubas swordsmen descant impressionism starfighter ltskog ecky foonly rster hrerbunker f 3 344 | farriers mobil brand candler companies subsidiary exxon corporation worshipful converg company 9 345 | leclerc gounod babbage schumer grandison borda de marignac glycosylation charles gaulle 8 346 | vuoksi parallel runs tagus reaching padma sides far beyond iphigeneia side 9 347 | m arpa darpa competency foresight pew management cryonics resource dianetic research 0 348 | frightening nonexistent obscene intentionally unintentionally knowingly starck trifling either abbates thereof 
6 349 | unikom rules boxing wrestling bowlers players cricket bowling amateur football golf 0 350 | domain glam registries gpl copyleft gratis floss qpl license licenses fsf 1 351 | polyphonic dances punk jazz hip chronicon tap music dance lindy hop 5 352 | mathrm phi nabla beta cdot delta omega frac alpha uup mathbf 9 353 | mop conjured racked sault trumped racking fucked summed speeded propped up 3 354 | cbs uhf cdots ntl itv repeater foxtel broadcast radio broadcasts ctv 2 355 | dreamlike unsentimental logbook shrill credentials maan ruthlessness thefts respective majesties their 5 356 | groundhog afternoon bonfire monday pfp saturday friday evening morning sunday day 4 357 | fej canceled death announcement patterned after thereafter moonset attaining hesitation shortly 0 358 | vectorborne disease hemorrhagic headaches headache asthma stanfield congestive diabetes nausea chronic 6 359 | monster villains comics robot supervillain cubic squarepants fantastic robo marvel ness 5 360 | stating hoped concluded realised extras meant realized noticed remarked claiming stated 4 361 | -------------------------------------------------------------------------------- /vectors/words: -------------------------------------------------------------------------------- 1 | mazda audi am jeep bmw miata 2 2 | tortured shut burned newton inmates killing 3 3 | drexel bordered stata curators tuol sleng 1 4 | calls mandrakes peredeo juggalos saxony maggie 4 5 | louisiana an ominous impression idealized warning 0 6 | acct workprint lumi version machina released 0 7 | anglicised louisiana archdruid meaning anagram name 1 8 | standing stone maze concrete agriculture tower 4 9 | academy aspartame bru irn dmt bittering 0 10 | atalh inen opcw rzeczypospolitej polskiej pohjola 2 11 | rang frac nabla opcw vec mathbf 3 12 | songwriter hhs under hass korzybski general 0 13 | rousseau bergson clinical cousteau necker french 2 14 | cross vortigern bastarnae imperium ceawlin attila 0 15 | 
speleological cem lankhmar vma celestial christadelphian 4 16 | kermit mickey bagpuss jabberwock islam nessie 4 17 | esperanto klingon interlingua manufactured iala ido 3 18 | dtd nameserver dns ipv de cccc 4 19 | capoeira bodhidharma aikido households ryu karate 3 20 | dancers style hop sqrt lindy goth 3 21 | bridgwater factors kontor ansbach hansa douglass 1 22 | mechs dubh abbates births either or 3 23 | hitpa belo riso gregorian el campeador 3 24 | buildings murals street biblical frauenkirche gotham 3 25 | iic jpg apple amiga kazaa hypercard 1 26 | isbn pandas dachshunds alpacas alpaca basenjis 0 27 | spelling rather und superficially slightly somewhat 2 28 | theorem leibniz knapsack wild minimax calculus 3 29 | km miles depth replied meters mile 3 30 | brazil chaser stories calvino niggle hexer 0 31 | tterd bread mmerung g reticulatus londo 1 32 | stitch black hungary tans red kilt 2 33 | clarinets bassoon honorary harp harmonica harmonicas 2 34 | gurmukhi louisiana yeho theophoric dativus jamo 1 35 | utc quijote pizan la de malinche 0 36 | suited und well far how too 1 37 | aloe maguey amaranthus please chaparral agave 3 38 | ibid wiecino baggini earl pernoud devries 3 39 | disorders nf lln bgcolor cra potencies 0 40 | sch warbirds features sorts items these 0 41 | still although f though present hodiernal 2 42 | cool ediacaran championships spring summer winter 2 43 | alys joan count al borda riel 3 44 | here arcology oed subspecies section interwiki 3 45 | sped knitting up championships made fucked 3 46 | entered transformed splits deer into coherent 3 47 | burnings lipsius chapter less volume book 3 48 | objectivist existentialism engines esotericism deleuze atheism 2 49 | bde lafcadio scotton humid cambridge press 3 50 | kong allele hackman virus duesberg gene 0 51 | instructions backspace instruction unido heap node 3 52 | piqad behistun text feet genevan jaffebros 3 53 | ifc frisii gildas theon haykal daoxuan 0 54 | green mi abalone doppelbock kimono 
haliotis 1 55 | sedov b erubin boying disorders metzitzah 4 56 | namri representationalism fran so teleoperation malintzin 2 57 | koestler vonnegut thynne unix chaucer marlowe 3 58 | afrikaners starred americans uneducated liberians americo 1 59 | amok abbreviated confusingly erich toprock kludge 3 60 | exclusively utc entirely wetlands but instead 1 61 | yo just am championships mandrakes slithy 3 62 | tommy is presently currently known now 0 63 | teimanim while whilst chancellor golliwogs others 3 64 | sherpa leap around calendar clinical feminam 4 65 | goldoni utc mjs mjt pngimage bwv 1 66 | wff louisiana hausdorff groupoid preregular functors 1 67 | haley hillary mathbb lewinsky rock oddie 2 68 | louisiana cleartype gif dddddd image rasterization 0 69 | yokoi icao hentai mahjong mancala fps 1 70 | not jpg did bangle does flamel 1 71 | cm famously notably bilateria amerikkka most 0 72 | spotters may euphemisms borzois visitors also 4 73 | hochschule curule minted expunged aediles were 0 74 | iran ectocervix below maginot line poverty 0 75 | hanazono ndnis erubin japan coal shirakawa 4 76 | ellipsis charterers subproblems fic mammals mousetrap 4 77 | camelcase use leetspeak listerine championships homeopaths 4 78 | news newspaper lung efnet daily co 2 79 | fpu mips jpg sns sse amd 2 80 | re danzon tapu enrollees msting trollhunters 0 81 | kto nahi louisiana finegold raeben i 2 82 | columbus halas cases bidwill houston chicago 2 83 | idempotents ve hom lcm sigmoid gcd 1 84 | san eskimos kazakh bamar ainu inuit 0 85 | conditions rare isotopes climate jpg ipcc 4 86 | associated concerned dealing let with cope 3 87 | reconstruct give gives cm gave giving 3 88 | eubulides jpg gregers good epimenides hypnotist 1 89 | biafra shostakovich frac difranco devo gershwin 2 90 | ronald dentine columns blocks superclusters entablature 0 91 | bene housing leto moneo gesserit faramir 1 92 | rna isbn utr mrna sanjaks vilayet 1 93 | freyr yahweh balderus lfheim humid bragi 4 94 | 
horatii borgia foods anjiro colonna eszterh 2 95 | caprino miyazaki amitabh mbox lugosi flockhart 3 96 | alpha bonds beta mm alkanes helix 3 97 | impressionists falstaff leni shakespearean exp hitchcock 4 98 | kg lb cma pre kgf epigr 3 99 | imports tribonacci numbers fibonacci tetranacci mersenne 0 100 | icrm functioned inasmuch well beatniks as 0 101 | manifesto anarchist river cnd marx intellectualism 2 102 | perks firsts agoraphobic other coronary include 4 103 | stripper dolphin melcher zahara named she 1 104 | protostomes chromatids televisions ants deuterostomes dimboa 2 105 | sequencing design testing process frederick cmm 4 106 | page jul jeho unassigned vah hyi 0 107 | kkinen mongi h am o therus 3 108 | kong rgya grrm lama geirr dalai 0 109 | gregorian lineker hall falwell gygax kerouac 0 110 | re fossil bergson ponty mattia merleau 1 111 | songwriter letterboxer hundred bakehouse gaeltacht letterboxes 0 112 | seems inhomogeneities arcologies redirects seem might 3 113 | gib chronic bytes inch ten kgf 1 114 | opportunity brute indirect thereby subgroup uke 4 115 | crucible rdx airlines steels cerium rearden 2 116 | pre starfighter hrerbunker rth brauerei f 0 117 | wrote drive inductor steinmetz dbv capacitor 0 118 | nobita giacometti migrant rehab leaving antinous 2 119 | honorary mbox henslow kroto wiesel nansen 1 120 | pinscher gregorian swedish danish aasen german 1 121 | joseph willey bosman charles coastline bfi 4 122 | typical forms itosu kata circa naihanchi 4 123 | inertial frames festival measurement datum hyperfocal 2 124 | hoshino enron company joual democratic labatt 4 125 | inquisitions edo middle denominations ages dicing 3 126 | wikipedia breatharianism wikis nupedia gregorian writeups 4 127 | superfluid antihydrogen electrons animal dipoles magnetization 3 128 | cretians invariably nearly ge virtually almost 3 129 | ltcm cpes jpg decedent etf arbitrage 2 130 | dominated impeded kazuki formulas unaffected by 3 131 | prev nudge newnode left 
cm right 4 132 | realized danneskjold claude whenever taels when 2 133 | totleben mark cleese steinbeck military galt 4 134 | butt horse foal france beavis sqrat 3 135 | dollar steinsaltz chosenness lcms demonolatry kaddish 0 136 | castro contras cuban ltte elevation cira 4 137 | reason factor iran fallacious motivating another 2 138 | karmapa grandfather wikisaurus manitobans list see 1 139 | convention muslims maronites joden jews dhimmi 0 140 | harrison davis adams specialized booth fukuyama 3 141 | kihei aarseth enzyme behe antinori gettier 2 142 | hypnotizable challenging mbox hugely highly immensely 2 143 | male female hovd people explanation uranhay 4 144 | formless qoheleth kosmos empress godlike spatiality 3 145 | hypomania bipolar bird autistic flunitrazepam hypnosis 2 146 | noise exposure low san due poor 3 147 | muck feminist spicier than more temperamental 1 148 | deer hibbing christmas happiest life denisovich 0 149 | tttt maersk rshavn insurance seaborg ak 3 150 | tenn televisions caligula kammu elagabalus gallienus 1 151 | rediscovered originally exports conceived flintlock was 2 152 | louisiana taken grodd taking took take 0 153 | technological fermentation anchoring primary food irradiation 0 154 | airlines schliemann st hildegard van buren 0 155 | vagrant maccabees undergone ricaurte subverted has 1 156 | antediluvian parts around coulomb raporto world 3 157 | sravaka psi buddha meher dharma baba 1 158 | funimation keillor hbo mbox eastenders letterbox 3 159 | circumcised temperature masculism monogamy circumcision masturbate 1 160 | pope pressure mussolini italy mansfeld celestines 1 161 | asean der caricom community ecsc european 1 162 | decimal eichmann selina guildenstern hannibal princip 0 163 | households mediation widgery convention arbitration unfccc 0 164 | wangenheim hamsun oswald page basov psychotherapist 3 165 | families mothers duryodhana letterboxers louis deadheads 4 166 | russia algorithms tsars philby efron soviet 1 167 | 
executive answerability mbox davidians commons lords 2 168 | respectively anglosphere jutes sides professional divide 4 169 | contraception gwh unbundling shipowner creditor relator 1 170 | already previously irrigated who telo those 2 171 | boromir correspondence feud twh friendship relationship 3 172 | carnivora lays frowned drew laid put 0 173 | covert pachomius giza athens greece druidry 0 174 | will failure heart help est without 4 175 | can be doublespeak noted multilateral should 4 176 | selection arrangement sort repetition und this 4 177 | mandate cameroons botc johore phi gibraltarians 4 178 | nazarbayev ould humid mahuad nguema diefenbaker 2 179 | speer kontor krupp abeken protein asner 4 180 | securityfocus norrath disestablished ruler april june 3 181 | algorithms hyperinflation decline rapid underwent prosperity 0 182 | james chojn maccreigh cabell kinds taggart 4 183 | ed kombinate dimasi business entrepreneurship topicslist 0 184 | states kingdom united rightarrow trucial metricated 3 185 | outrage controversies louis debate aroused controversy 2 186 | schools tennis chiropractic ivies unschooling midwifery 1 187 | ainsworth asbury zwingli declaration anselm magdalene 3 188 | wartburg camp peterloo against holyrood rooming 3 189 | responsible injunctive jpg humanitarian relief msf 2 190 | jersey conflict new yankees orleans frazee 1 191 | plurality election irv vote thousands fptp 4 192 | kick kickoff slightly bodyline hurst scrimmage 2 193 | doraemon series endgame increase dredd mecha 3 194 | eram import tus ctus factum l 1 195 | lenin zinoviev yoga tse mao trotsky 2 196 | familia marvelman fanzines herg miracleman shrugged 0 197 | nakatomi kashima kakinomoto insubstantial iran no 4 198 | jpg anh ribbentrop schliemann kampf mein 0 199 | actor jpg actress lillard doohan orton 1 200 | heritable metaclasses amoeboid interrelationships namibia chimpanzees 4 201 | interpretations worlds frequentist interpretation heat mwi 4 202 | tamarins cebidae 
alliaceae homininae chi marmosets 4 203 | libel sedition murder billion against whitacre 3 204 | betting sideboard riichi card am catan 4 205 | colfax millsaps salford eindhoven text calgary 4 206 | nehru japhetic voc pakistan al india 4 207 | loyally kimsey storin isbn minister dienststelle 3 208 | interactions arizona emporia abet kachin state 0 209 | spill exports million inches billion gdp 3 210 | attempted kwh helped eventually led return 1 211 | kwanzas dinara caritas topological istat est 3 212 | participant active hilary affidavit archdruid an 2 213 | isbn machetes generally are doppelbocks suppers 0 214 | valech education comment dimpac based dcsd 1 215 | amazons bc pre san lusitani cimbri 3 216 | louisiana mekugi ball pawl cue caddy 0 217 | agis championships crimson ithilien letsie moshoeshoe 1 218 | campaign jacobite battles televisions bloody wars 3 219 | lla billboard chart voting limp bizkit 3 220 | d nitz company salinger psychotherapist sparil 2 221 | abiogenesis counterfactual kwh laurales evolution polyhedra 2 222 | anteac lucid internalism al truth our 3 223 | upper unido buttstock tubing daisho bead 1 224 | non v submissive resident autistics karabiners 1 225 | nineteenth carnation late percentile rabbis mid 4 226 | alli newspaperman bogarde british papal columbians 4 227 | contributions finest legislation seminal work masterpiece 2 228 | little niggers colombia done had marotta 2 229 | breast next seconds minutes hours weeks 0 230 | refer refers louisiana to according prior 2 231 | totschlag nineteen till deer until age 3 232 | third fourth solutions jhana ekaggata klan 2 233 | preserve namibia integrity overstepping respective own 1 234 | raam ircs for example cool instance 4 235 | tamarin kornbluth ori m boney minimi 0 236 | naval piercer dieting piercing piercers myositis 0 237 | gdynia lumpur tape kuala prabang karachi 2 238 | thermodynamic passenger klm gatwick eurostar heathrow 0 239 | tar vermiculite coal lauter dancer mash 4 240 | 
slits escalators saddle ifc crevasses geysers 3 241 | variety bolivia wallaby pox handful a 1 242 | ability able senators find choose letterboxers 2 243 | fenian kurukshetra parva adar euler bhishma 4 244 | sent leading out came celtic back 4 245 | countercult louis eplf eritrean movement bonewits 1 246 | ethernet memory eftpos data universe storage 4 247 | food fulham afl nfl afc nfc 0 248 | percentage income des gini homelessness iq 2 249 | utub universe volta lake rajonas ladoga 1 250 | fifa bodybuilding area floorball biathlon croquet 2 251 | import mies der germany mabuse berlin 0 252 | israeli r golan krav dflp lebanese 1 253 | png pretending tends analytically disassembled offeror 0 254 | sepulchre saints bishops chalcedon louisiana filioque 4 255 | sixth seventh major televisions subdominant minor 3 256 | density smallest televisions largest its principal 2 257 | alderney gibraltar bailiwick island si guernsey 4 258 | annihilate each meld dynasty tiles melds 3 259 | archers auras v ltte migs katana 2 260 | pushed towards southward geschichte westward through 3 261 | weapon turbofans birds ammo warhead nitrox 2 262 | remonstrants earl relativity einstein bose milgram 1 263 | interested manner deer favor addition in 2 264 | merger band brand staleys mm enco 4 265 | juice clinton u roosevelt quayle ickes 0 266 | maasai hall chad kerala africa kazakstan 1 267 | gpp deaths hgp fidonet project gsm 1 268 | least chronos sunrise chronic phaestus time 3 269 | voris apollo laika flight alcohol jsf 4 270 | kosi laxness explain haznawi al capone 2 271 | destroyers camouflage battleship kamikaze louisiana yamamoto 4 272 | euro ecu banknotes families currency ecb 3 273 | mortem monarchic brink handed war post 3 274 | uis ll naval tread would will 2 275 | albania estonia kazakhstan moldova unix latvia 4 276 | cumberland cornwall albury node bouldering fermanagh 3 277 | carolla louisiana embarking focussing occasion on 1 278 | white cybernetics dembski artificial geodesy 
dianetic 0 279 | plebiscite signing constitution toshiki juice treaty 4 280 | signals transmitter multipath earl photophone beacons 3 281 | ayrton federer mbox wimbledon finalist agassi 2 282 | berkley msrp crossroad delicatessen beings anthropophagy 4 283 | murrumbidgee sierra jayapura diyala freeways area 1 284 | unalaska coloane krakatoa thomas kong macau 3 285 | greenhouse brightest sequestration energy deforestation ecosystem 1 286 | temperament isbn circle coriolis estimator ellipse 1 287 | avm venosus isbn brain insulin blood 2 288 | suras alcohol scyphozoa hydrozoa jedi order 1 289 | laws phylum finagle law similars fives 1 290 | scrat aikman cultural player baseball gehrig 2 291 | bissette dr billion straczynski j hoover 2 292 | extra everest sea weald level point 0 293 | sim apart coevorden refrained tinymud from 0 294 | force thou military gunto jdf golovachev 1 295 | rshavn ttt dnia senator eduke z 3 296 | confederate mexican american celestial america exceptionalism 3 297 | iuds paperback unspecific seizure mental nubcake 1 298 | dylan binomial bowie clapton mustaine letterman 1 299 | moons tombaugh jupiter drug kbos plutinos 3 300 | moravia dukes migrant duchy livonia bentheim 2 301 | -------------------------------------------------------------------------------- /word2nvec-c/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luohongyin/OIWE/ae6300754801cc1b2bc9195bf305670e78774a72/word2nvec-c/.DS_Store -------------------------------------------------------------------------------- /word2nvec-c/compute-accuracy.c: -------------------------------------------------------------------------------- 1 | // Copyright 2013 Google Inc. All Rights Reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 
5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 14 | 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include // #include 20 | #include 21 | 22 | const long long max_size = 2000; // max length of strings 23 | const long long N = 1; // number of closest words 24 | const long long max_w = 50; // max length of vocabulary entries 25 | 26 | int main(int argc, char **argv) 27 | { 28 | FILE *f; 29 | char st1[max_size], st2[max_size], st3[max_size], st4[max_size], bestw[N][max_size], file_name[max_size], ch; 30 | float dist, len, bestd[N], vec[max_size]; 31 | long long words, size, a, b, c, d, b1, b2, b3, threshold = 0; 32 | float *M; 33 | char *vocab; 34 | int TCN, CCN = 0, TACN = 0, CACN = 0, SECN = 0, SYCN = 0, SEAC = 0, SYAC = 0, QID = 0, TQ = 0, TQS = 0; 35 | if (argc < 2) { 36 | printf("Usage: ./compute-accuracy \nwhere FILE contains word projections, and threshold is used to reduce vocabulary of the model for fast approximate evaluation (0 = off, otherwise typical value is 30000)\n"); 37 | return 0; 38 | } 39 | strcpy(file_name, argv[1]); 40 | if (argc > 2) threshold = atoi(argv[2]); 41 | f = fopen(file_name, "rb"); 42 | if (f == NULL) { 43 | printf("Input file not found\n"); 44 | return -1; 45 | } 46 | fscanf(f, "%lld", &words); 47 | if (threshold) if (words > threshold) words = threshold; 48 | fscanf(f, "%lld", &size); 49 | vocab = (char *)malloc(words * max_w * sizeof(char)); 50 | M = (float *)malloc(words * size * sizeof(float)); 51 | if (M == NULL) { 52 | printf("Cannot allocate memory: %lld MB\n", words * size * sizeof(float) / 1048576); 53 | 
return -1; 54 | } 55 | for (b = 0; b < words; b++) { 56 | a = 0; 57 | while (1) { 58 | vocab[b * max_w + a] = fgetc(f); 59 | if (feof(f) || (vocab[b * max_w + a] == ' ')) break; 60 | if ((a < max_w) && (vocab[b * max_w + a] != '\n')) a++; 61 | } 62 | vocab[b * max_w + a] = 0; 63 | for (a = 0; a < max_w; a++) vocab[b * max_w + a] = toupper(vocab[b * max_w + a]); 64 | for (a = 0; a < size; a++) fread(&M[a + b * size], sizeof(float), 1, f); 65 | len = 0; 66 | for (a = 0; a < size; a++) len += M[a + b * size] * M[a + b * size]; 67 | len = sqrt(len); 68 | for (a = 0; a < size; a++) M[a + b * size] /= len; 69 | } 70 | fclose(f); 71 | TCN = 0; 72 | while (1) { 73 | for (a = 0; a < N; a++) bestd[a] = 0; 74 | for (a = 0; a < N; a++) bestw[a][0] = 0; 75 | scanf("%s", st1); 76 | for (a = 0; a < strlen(st1); a++) st1[a] = toupper(st1[a]); 77 | if ((!strcmp(st1, ":")) || (!strcmp(st1, "EXIT")) || feof(stdin)) { 78 | if (TCN == 0) TCN = 1; 79 | if (QID != 0) { 80 | printf("ACCURACY TOP1: %.2f %% (%d / %d)\n", CCN / (float)TCN * 100, CCN, TCN); 81 | printf("Total accuracy: %.2f %% Semantic accuracy: %.2f %% Syntactic accuracy: %.2f %% \n", CACN / (float)TACN * 100, SEAC / (float)SECN * 100, SYAC / (float)SYCN * 100); 82 | } 83 | QID++; 84 | scanf("%s", st1); 85 | if (feof(stdin)) break; 86 | printf("%s:\n", st1); 87 | TCN = 0; 88 | CCN = 0; 89 | continue; 90 | } 91 | if (!strcmp(st1, "EXIT")) break; 92 | scanf("%s", st2); 93 | for (a = 0; a < strlen(st2); a++) st2[a] = toupper(st2[a]); 94 | scanf("%s", st3); 95 | for (a = 0; a bestd[a]) { 122 | for (d = N - 1; d > a; d--) { 123 | bestd[d] = bestd[d - 1]; 124 | strcpy(bestw[d], bestw[d - 1]); 125 | } 126 | bestd[a] = dist; 127 | strcpy(bestw[a], &vocab[c * max_w]); 128 | break; 129 | } 130 | } 131 | } 132 | if (!strcmp(st4, bestw[0])) { 133 | CCN++; 134 | CACN++; 135 | if (QID <= 5) SEAC++; else SYAC++; 136 | } 137 | if (QID <= 5) SECN++; else SYCN++; 138 | TCN++; 139 | TACN++; 140 | } 141 | printf("Questions seen / total: %d %d 
%.2f %% \n", TQS, TQ, TQS/(float)TQ*100); 142 | return 0; 143 | } 144 | -------------------------------------------------------------------------------- /word2nvec-c/distance.c: -------------------------------------------------------------------------------- 1 | // Copyright 2013 Google Inc. All Rights Reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 14 | 15 | #include 16 | #include 17 | #include 18 | #include // #include 19 | 20 | const long long max_size = 2000; // max length of strings 21 | const long long N = 40; // number of closest words that will be shown 22 | const long long max_w = 50; // max length of vocabulary entries 23 | 24 | int main(int argc, char **argv) { 25 | FILE *f; 26 | char st1[max_size]; 27 | char *bestw[N]; 28 | char file_name[max_size], st[100][max_size]; 29 | float dist, len, bestd[N], vec[max_size]; 30 | long long words, size, a, b, c, d, cn, bi[100]; 31 | char ch; 32 | float *M; 33 | char *vocab; 34 | if (argc < 2) { 35 | printf("Usage: ./distance \nwhere FILE contains word projections in the BINARY FORMAT\n"); 36 | return 0; 37 | } 38 | strcpy(file_name, argv[1]); 39 | f = fopen(file_name, "rb"); 40 | if (f == NULL) { 41 | printf("Input file not found\n"); 42 | return -1; 43 | } 44 | fscanf(f, "%lld", &words); 45 | fscanf(f, "%lld", &size); 46 | vocab = (char *)malloc((long long)words * max_w * sizeof(char)); 47 | for (a = 0; a < N; a++) bestw[a] = (char *)malloc(max_size * 
sizeof(char)); 48 | M = (float *)malloc((long long)words * (long long)size * sizeof(float)); 49 | if (M == NULL) { 50 | printf("Cannot allocate memory: %lld MB %lld %lld\n", (long long)words * size * sizeof(float) / 1048576, words, size); 51 | return -1; 52 | } 53 | for (b = 0; b < words; b++) { 54 | a = 0; 55 | while (1) { 56 | vocab[b * max_w + a] = fgetc(f); 57 | if (feof(f) || (vocab[b * max_w + a] == ' ')) break; 58 | if ((a < max_w) && (vocab[b * max_w + a] != '\n')) a++; 59 | } 60 | vocab[b * max_w + a] = 0; 61 | for (a = 0; a < size; a++) fread(&M[a + b * size], sizeof(float), 1, f); 62 | len = 0; 63 | for (a = 0; a < size; a++) len += M[a + b * size] * M[a + b * size]; 64 | len = sqrt(len); 65 | for (a = 0; a < size; a++) M[a + b * size] /= len; 66 | } 67 | fclose(f); 68 | while (1) { 69 | for (a = 0; a < N; a++) bestd[a] = 0; 70 | for (a = 0; a < N; a++) bestw[a][0] = 0; 71 | printf("Enter word or sentence (EXIT to break): "); 72 | a = 0; 73 | while (1) { 74 | st1[a] = fgetc(stdin); 75 | if ((st1[a] == '\n') || (a >= max_size - 1)) { 76 | st1[a] = 0; 77 | break; 78 | } 79 | a++; 80 | } 81 | if (!strcmp(st1, "EXIT")) break; 82 | cn = 0; 83 | b = 0; 84 | c = 0; 85 | while (1) { 86 | st[cn][b] = st1[c]; 87 | b++; 88 | c++; 89 | st[cn][b] = 0; 90 | if (st1[c] == 0) break; 91 | if (st1[c] == ' ') { 92 | cn++; 93 | b = 0; 94 | c++; 95 | } 96 | } 97 | cn++; 98 | for (a = 0; a < cn; a++) { 99 | for (b = 0; b < words; b++) if (!strcmp(&vocab[b * max_w], st[a])) break; 100 | if (b == words) b = -1; 101 | bi[a] = b; 102 | printf("\nWord: %s Position in vocabulary: %lld\n", st[a], bi[a]); 103 | if (b == -1) { 104 | printf("Out of dictionary word!\n"); 105 | break; 106 | } 107 | } 108 | if (b == -1) continue; 109 | printf("\n Word Cosine distance\n------------------------------------------------------------------------\n"); 110 | for (a = 0; a < size; a++) vec[a] = 0; 111 | for (b = 0; b < cn; b++) { 112 | if (bi[b] == -1) continue; 113 | for (a = 0; a < size; a++) 
vec[a] += M[a + bi[b] * size]; 114 | } 115 | len = 0; 116 | for (a = 0; a < size; a++) len += vec[a] * vec[a]; 117 | len = sqrt(len); 118 | for (a = 0; a < size; a++) vec[a] /= len; 119 | for (a = 0; a < N; a++) bestd[a] = -1; 120 | for (a = 0; a < N; a++) bestw[a][0] = 0; 121 | for (c = 0; c < words; c++) { 122 | a = 0; 123 | for (b = 0; b < cn; b++) if (bi[b] == c) a = 1; 124 | if (a == 1) continue; 125 | dist = 0; 126 | for (a = 0; a < size; a++) dist += vec[a] * M[a + c * size]; 127 | for (a = 0; a < N; a++) { 128 | if (dist > bestd[a]) { 129 | for (d = N - 1; d > a; d--) { 130 | bestd[d] = bestd[d - 1]; 131 | strcpy(bestw[d], bestw[d - 1]); 132 | } 133 | bestd[a] = dist; 134 | strcpy(bestw[a], &vocab[c * max_w]); 135 | break; 136 | } 137 | } 138 | } 139 | for (a = 0; a < N; a++) printf("%50s\t\t%f\n", bestw[a], bestd[a]); 140 | } 141 | return 0; 142 | } 143 | -------------------------------------------------------------------------------- /word2nvec-c/makefile: -------------------------------------------------------------------------------- 1 | SCRIPTS_DIR=../scripts 2 | BIN_DIR=../bin 3 | 4 | CC = gcc 5 | #Using -Ofast instead of -O3 might result in faster code, but is supported only by newer GCC versions 6 | CFLAGS = -g -lm -pthread -O3 -Wall -march=native -funroll-loops -Wno-unused-result 7 | 8 | all: word2vec word2phrase distance word-analogy compute-accuracy 9 | 10 | word2vec : word2nvec.c 11 | $(CC) word2nvec.c -o ${BIN_DIR}/word2nvec $(CFLAGS) 12 | word2phrase : word2phrase.c 13 | $(CC) word2phrase.c -o ${BIN_DIR}/word2phrase $(CFLAGS) 14 | distance : distance.c 15 | $(CC) distance.c -o ${BIN_DIR}/w2v-distance $(CFLAGS) 16 | word-analogy : word-analogy.c 17 | $(CC) word-analogy.c -o ${BIN_DIR}/w2v-word-analogy $(CFLAGS) 18 | compute-accuracy : compute-accuracy.c 19 | $(CC) compute-accuracy.c -o ${BIN_DIR}/w2v-compute-accuracy $(CFLAGS) 20 | 21 | clean: 22 | cd ${BIN_DIR} && rm -rf word2nvec word2phrase w2v-distance w2v-word-analogy w2v-compute-accuracy 23 | 
-------------------------------------------------------------------------------- /word2nvec-c/word-analogy.c: -------------------------------------------------------------------------------- 1 | // Copyright 2013 Google Inc. All Rights Reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 14 | 15 | #include 16 | #include 17 | #include 18 | #include // #include 19 | 20 | const long long max_size = 2000; // max length of strings 21 | const long long N = 40; // number of closest words that will be shown 22 | const long long max_w = 50; // max length of vocabulary entries 23 | 24 | int main(int argc, char **argv) { 25 | FILE *f; 26 | char st1[max_size]; 27 | char bestw[N][max_size]; 28 | char file_name[max_size], st[100][max_size]; 29 | float dist, len, bestd[N], vec[max_size]; 30 | long long words, size, a, b, c, d, cn, bi[100]; 31 | char ch; 32 | float *M; 33 | char *vocab; 34 | if (argc < 2) { 35 | printf("Usage: ./word-analogy \nwhere FILE contains word projections in the BINARY FORMAT\n"); 36 | return 0; 37 | } 38 | strcpy(file_name, argv[1]); 39 | f = fopen(file_name, "rb"); 40 | if (f == NULL) { 41 | printf("Input file not found\n"); 42 | return -1; 43 | } 44 | fscanf(f, "%lld", &words); 45 | fscanf(f, "%lld", &size); 46 | vocab = (char *)malloc((long long)words * max_w * sizeof(char)); 47 | M = (float *)malloc((long long)words * (long long)size * sizeof(float)); 48 | if (M == NULL) { 49 | printf("Cannot 
allocate memory: %lld MB %lld %lld\n", (long long)words * size * sizeof(float) / 1048576, words, size); 50 | return -1; 51 | } 52 | for (b = 0; b < words; b++) { 53 | a = 0; 54 | while (1) { 55 | vocab[b * max_w + a] = fgetc(f); 56 | if (feof(f) || (vocab[b * max_w + a] == ' ')) break; 57 | if ((a < max_w) && (vocab[b * max_w + a] != '\n')) a++; 58 | } 59 | vocab[b * max_w + a] = 0; 60 | for (a = 0; a < size; a++) fread(&M[a + b * size], sizeof(float), 1, f); 61 | len = 0; 62 | for (a = 0; a < size; a++) len += M[a + b * size] * M[a + b * size]; 63 | len = sqrt(len); 64 | for (a = 0; a < size; a++) M[a + b * size] /= len; 65 | } 66 | fclose(f); 67 | while (1) { 68 | for (a = 0; a < N; a++) bestd[a] = 0; 69 | for (a = 0; a < N; a++) bestw[a][0] = 0; 70 | printf("Enter three words (EXIT to break): "); 71 | a = 0; 72 | while (1) { 73 | st1[a] = fgetc(stdin); 74 | if ((st1[a] == '\n') || (a >= max_size - 1)) { 75 | st1[a] = 0; 76 | break; 77 | } 78 | a++; 79 | } 80 | if (!strcmp(st1, "EXIT")) break; 81 | cn = 0; 82 | b = 0; 83 | c = 0; 84 | while (1) { 85 | st[cn][b] = st1[c]; 86 | b++; 87 | c++; 88 | st[cn][b] = 0; 89 | if (st1[c] == 0) break; 90 | if (st1[c] == ' ') { 91 | cn++; 92 | b = 0; 93 | c++; 94 | } 95 | } 96 | cn++; 97 | if (cn < 3) { 98 | printf("Only %lld words were entered.. 
three words are needed at the input to perform the calculation\n", cn); 99 | continue; 100 | } 101 | for (a = 0; a < cn; a++) { 102 | for (b = 0; b < words; b++) if (!strcmp(&vocab[b * max_w], st[a])) break; 103 | if (b == words) b = 0; 104 | bi[a] = b; 105 | printf("\nWord: %s Position in vocabulary: %lld\n", st[a], bi[a]); 106 | if (b == 0) { 107 | printf("Out of dictionary word!\n"); 108 | break; 109 | } 110 | } 111 | if (b == 0) continue; 112 | printf("\n Word Distance\n------------------------------------------------------------------------\n"); 113 | for (a = 0; a < size; a++) vec[a] = M[a + bi[1] * size] - M[a + bi[0] * size] + M[a + bi[2] * size]; 114 | len = 0; 115 | for (a = 0; a < size; a++) len += vec[a] * vec[a]; 116 | len = sqrt(len); 117 | for (a = 0; a < size; a++) vec[a] /= len; 118 | for (a = 0; a < N; a++) bestd[a] = 0; 119 | for (a = 0; a < N; a++) bestw[a][0] = 0; 120 | for (c = 0; c < words; c++) { 121 | if (c == bi[0]) continue; 122 | if (c == bi[1]) continue; 123 | if (c == bi[2]) continue; 124 | a = 0; 125 | for (b = 0; b < cn; b++) if (bi[b] == c) a = 1; 126 | if (a == 1) continue; 127 | dist = 0; 128 | for (a = 0; a < size; a++) dist += vec[a] * M[a + c * size]; 129 | for (a = 0; a < N; a++) { 130 | if (dist > bestd[a]) { 131 | for (d = N - 1; d > a; d--) { 132 | bestd[d] = bestd[d - 1]; 133 | strcpy(bestw[d], bestw[d - 1]); 134 | } 135 | bestd[a] = dist; 136 | strcpy(bestw[a], &vocab[c * max_w]); 137 | break; 138 | } 139 | } 140 | } 141 | for (a = 0; a < N; a++) printf("%50s\t\t%f\n", bestw[a], bestd[a]); 142 | } 143 | return 0; 144 | } 145 | -------------------------------------------------------------------------------- /word2nvec-c/word2nvec.c: -------------------------------------------------------------------------------- 1 | // Copyright 2013 Google Inc. All Rights Reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 
5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 14 | 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | 21 | #define MAX_STRING 100 22 | #define EXP_TABLE_SIZE 1000 23 | #define MAX_EXP 6 24 | #define MAX_SENTENCE_LENGTH 1000 25 | #define MAX_CODE_LENGTH 40 26 | 27 | const int vocab_hash_size = 30000000; // Maximum 30 * 0.7 = 21M words in the vocabulary 28 | 29 | typedef float real; // Precision of float numbers 30 | 31 | struct vocab_word { 32 | long long cn; 33 | int *point; 34 | char *word, *code, codelen; 35 | }; 36 | 37 | char train_file[MAX_STRING], output_file[MAX_STRING],output_file2[MAX_STRING]; 38 | char save_vocab_file[MAX_STRING], read_vocab_file[MAX_STRING]; 39 | struct vocab_word *vocab; 40 | int binary = 0, cbow = 1, debug_mode = 2, window = 5, min_count = 5, num_threads = 12, min_reduce = 1; 41 | int *vocab_hash; 42 | long long vocab_max_size = 1000, vocab_size = 0, layer1_size = 100; 43 | long long train_words = 0, word_count_actual = 0, iter = 5, file_size = 0, classes = 0; 44 | real alpha = 0.025, starting_alpha, sample = 1e-3; 45 | real *syn0, *syn1, *syn1neg, *expTable, *rateTable; 46 | clock_t start; 47 | 48 | int hs = 0, negative = 5; 49 | const int table_size = 1e8; 50 | int *table; 51 | 52 | void InitUnigramTable() { 53 | int a, i; 54 | double train_words_pow = 0; 55 | double d1, power = 0.75; 56 | table = (int *)malloc(table_size * sizeof(int)); 57 | for (a = 0; a < vocab_size; a++) train_words_pow += pow(vocab[a].cn, power); 58 | i = 0; 59 | d1 = pow(vocab[i].cn, power) / train_words_pow; 60 | for (a = 0; a < 
table_size; a++) { 61 | table[a] = i; 62 | if (a / (double)table_size > d1) { 63 | i++; 64 | d1 += pow(vocab[i].cn, power) / train_words_pow; 65 | } 66 | if (i >= vocab_size) i = vocab_size - 1; 67 | } 68 | } 69 | 70 | // Reads a single word from a file, assuming space + tab + EOL to be word boundaries 71 | void ReadWord(char *word, FILE *fin) { 72 | int a = 0, ch; 73 | while (!feof(fin)) { 74 | ch = fgetc(fin); 75 | if (ch == 13) continue; 76 | if ((ch == ' ') || (ch == '\t') || (ch == '\n')) { 77 | if (a > 0) { 78 | if (ch == '\n') ungetc(ch, fin); 79 | break; 80 | } 81 | if (ch == '\n') { 82 | strcpy(word, (char *)""); 83 | return; 84 | } else continue; 85 | } 86 | word[a] = ch; 87 | a++; 88 | if (a >= MAX_STRING - 1) a--; // Truncate too long words 89 | } 90 | word[a] = 0; 91 | } 92 | 93 | // Returns hash value of a word 94 | int GetWordHash(char *word) { 95 | unsigned long long a, hash = 0; 96 | for (a = 0; a < strlen(word); a++) hash = hash * 257 + word[a]; 97 | hash = hash % vocab_hash_size; 98 | return hash; 99 | } 100 | 101 | // Returns position of a word in the vocabulary; if the word is not found, returns -1 102 | int SearchVocab(char *word) { 103 | unsigned int hash = GetWordHash(word); 104 | while (1) { 105 | if (vocab_hash[hash] == -1) return -1; 106 | if (!strcmp(word, vocab[vocab_hash[hash]].word)) return vocab_hash[hash]; 107 | hash = (hash + 1) % vocab_hash_size; 108 | } 109 | return -1; 110 | } 111 | 112 | // Reads a word and returns its index in the vocabulary 113 | int ReadWordIndex(FILE *fin) { 114 | char word[MAX_STRING]; 115 | ReadWord(word, fin); 116 | if (feof(fin)) return -1; 117 | return SearchVocab(word); 118 | } 119 | 120 | // Adds a word to the vocabulary 121 | int AddWordToVocab(char *word) { 122 | unsigned int hash, length = strlen(word) + 1; 123 | if (length > MAX_STRING) length = MAX_STRING; 124 | vocab[vocab_size].word = (char *)calloc(length, sizeof(char)); 125 | strcpy(vocab[vocab_size].word, word); 126 | vocab[vocab_size].cn = 
0; 127 | vocab_size++; 128 | // Reallocate memory if needed 129 | if (vocab_size + 2 >= vocab_max_size) { 130 | vocab_max_size += 1000; 131 | vocab = (struct vocab_word *)realloc(vocab, vocab_max_size * sizeof(struct vocab_word)); 132 | } 133 | hash = GetWordHash(word); 134 | while (vocab_hash[hash] != -1) hash = (hash + 1) % vocab_hash_size; 135 | vocab_hash[hash] = vocab_size - 1; 136 | return vocab_size - 1; 137 | } 138 | 139 | // Used later for sorting by word counts 140 | int VocabCompare(const void *a, const void *b) { 141 | return ((struct vocab_word *)b)->cn - ((struct vocab_word *)a)->cn; 142 | } 143 | 144 | // Sorts the vocabulary by frequency using word counts 145 | void SortVocab() { 146 | int a, size; 147 | unsigned int hash; 148 | // Sort the vocabulary and keep at the first position 149 | qsort(&vocab[1], vocab_size - 1, sizeof(struct vocab_word), VocabCompare); 150 | for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1; 151 | size = vocab_size; 152 | train_words = 0; 153 | for (a = 0; a < size; a++) { 154 | // Words occuring less than min_count times will be discarded from the vocab 155 | if ((vocab[a].cn < min_count) && (a != 0)) { 156 | vocab_size--; 157 | free(vocab[a].word); 158 | } else { 159 | // Hash will be re-computed, as after the sorting it is not actual 160 | hash=GetWordHash(vocab[a].word); 161 | while (vocab_hash[hash] != -1) hash = (hash + 1) % vocab_hash_size; 162 | vocab_hash[hash] = a; 163 | train_words += vocab[a].cn; 164 | } 165 | } 166 | vocab = (struct vocab_word *)realloc(vocab, (vocab_size + 1) * sizeof(struct vocab_word)); 167 | // Allocate memory for the binary tree construction 168 | for (a = 0; a < vocab_size; a++) { 169 | vocab[a].code = (char *)calloc(MAX_CODE_LENGTH, sizeof(char)); 170 | vocab[a].point = (int *)calloc(MAX_CODE_LENGTH, sizeof(int)); 171 | } 172 | } 173 | 174 | // Reduces the vocabulary by removing infrequent tokens 175 | void ReduceVocab() { 176 | int a, b = 0; 177 | unsigned int hash; 178 | for (a 
= 0; a < vocab_size; a++) if (vocab[a].cn > min_reduce) { 179 | vocab[b].cn = vocab[a].cn; 180 | vocab[b].word = vocab[a].word; 181 | b++; 182 | } else free(vocab[a].word); 183 | vocab_size = b; 184 | for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1; 185 | for (a = 0; a < vocab_size; a++) { 186 | // Hash will be re-computed, as it is not actual 187 | hash = GetWordHash(vocab[a].word); 188 | while (vocab_hash[hash] != -1) hash = (hash + 1) % vocab_hash_size; 189 | vocab_hash[hash] = a; 190 | } 191 | fflush(stdout); 192 | min_reduce++; 193 | } 194 | 195 | // Create binary Huffman tree using the word counts 196 | // Frequent words will have short uniqe binary codes 197 | void CreateBinaryTree() { 198 | long long a, b, i, min1i, min2i, pos1, pos2, point[MAX_CODE_LENGTH]; 199 | char code[MAX_CODE_LENGTH]; 200 | long long *count = (long long *)calloc(vocab_size * 2 + 1, sizeof(long long)); 201 | long long *binary = (long long *)calloc(vocab_size * 2 + 1, sizeof(long long)); 202 | long long *parent_node = (long long *)calloc(vocab_size * 2 + 1, sizeof(long long)); 203 | for (a = 0; a < vocab_size; a++) count[a] = vocab[a].cn; 204 | for (a = vocab_size; a < vocab_size * 2; a++) count[a] = 1e15; 205 | pos1 = vocab_size - 1; 206 | pos2 = vocab_size; 207 | // Following algorithm constructs the Huffman tree by adding one node at a time 208 | for (a = 0; a < vocab_size - 1; a++) { 209 | // First, find two smallest nodes 'min1, min2' 210 | if (pos1 >= 0) { 211 | if (count[pos1] < count[pos2]) { 212 | min1i = pos1; 213 | pos1--; 214 | } else { 215 | min1i = pos2; 216 | pos2++; 217 | } 218 | } else { 219 | min1i = pos2; 220 | pos2++; 221 | } 222 | if (pos1 >= 0) { 223 | if (count[pos1] < count[pos2]) { 224 | min2i = pos1; 225 | pos1--; 226 | } else { 227 | min2i = pos2; 228 | pos2++; 229 | } 230 | } else { 231 | min2i = pos2; 232 | pos2++; 233 | } 234 | count[vocab_size + a] = count[min1i] + count[min2i]; 235 | parent_node[min1i] = vocab_size + a; 236 | parent_node[min2i] 
= vocab_size + a; 237 | binary[min2i] = 1; 238 | } 239 | // Now assign binary code to each vocabulary word 240 | for (a = 0; a < vocab_size; a++) { 241 | b = a; 242 | i = 0; 243 | while (1) { 244 | code[i] = binary[b]; 245 | point[i] = b; 246 | i++; 247 | b = parent_node[b]; 248 | if (b == vocab_size * 2 - 2) break; 249 | } 250 | vocab[a].codelen = i; 251 | vocab[a].point[0] = vocab_size - 2; 252 | for (b = 0; b < i; b++) { 253 | vocab[a].code[i - b - 1] = code[b]; 254 | vocab[a].point[i - b] = point[b] - vocab_size; 255 | } 256 | } 257 | free(count); 258 | free(binary); 259 | free(parent_node); 260 | } 261 | 262 | void LearnVocabFromTrainFile() { 263 | char word[MAX_STRING]; 264 | FILE *fin; 265 | long long a, i; 266 | for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1; 267 | fin = fopen(train_file, "rb"); 268 | if (fin == NULL) { 269 | printf("ERROR: training data file not found!\n"); 270 | exit(1); 271 | } 272 | vocab_size = 0; 273 | AddWordToVocab((char *)""); 274 | while (1) { 275 | ReadWord(word, fin); 276 | if (feof(fin)) break; 277 | train_words++; 278 | if ((debug_mode > 1) && (train_words % 100000 == 0)) { 279 | printf("%lldK%c", train_words / 1000, 13); 280 | fflush(stdout); 281 | } 282 | i = SearchVocab(word); 283 | if (i == -1) { 284 | a = AddWordToVocab(word); 285 | vocab[a].cn = 1; 286 | } else vocab[i].cn++; 287 | if (vocab_size > vocab_hash_size * 0.7) ReduceVocab(); 288 | } 289 | SortVocab(); 290 | if (debug_mode > 0) { 291 | printf("Vocab size: %lld\n", vocab_size); 292 | printf("Words in train file: %lld\n", train_words); 293 | } 294 | file_size = ftell(fin); 295 | fclose(fin); 296 | } 297 | 298 | void SaveVocab() { 299 | long long i; 300 | FILE *fo = fopen(save_vocab_file, "wb"); 301 | for (i = 0; i < vocab_size; i++) fprintf(fo, "%s %lld\n", vocab[i].word, vocab[i].cn); 302 | fclose(fo); 303 | } 304 | 305 | void ReadVocab() { 306 | long long a, i = 0; 307 | char c; 308 | char word[MAX_STRING]; 309 | FILE *fin = fopen(read_vocab_file, 
"rb"); 310 | if (fin == NULL) { 311 | printf("Vocabulary file not found\n"); 312 | exit(1); 313 | } 314 | for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1; 315 | vocab_size = 0; 316 | while (1) { 317 | ReadWord(word, fin); 318 | if (feof(fin)) break; 319 | a = AddWordToVocab(word); 320 | fscanf(fin, "%lld%c", &vocab[a].cn, &c); 321 | i++; 322 | } 323 | SortVocab(); 324 | if (debug_mode > 0) { 325 | printf("Vocab size: %lld\n", vocab_size); 326 | printf("Words in train file: %lld\n", train_words); 327 | } 328 | fin = fopen(train_file, "rb"); 329 | if (fin == NULL) { 330 | printf("ERROR: training data file not found!\n"); 331 | exit(1); 332 | } 333 | fseek(fin, 0, SEEK_END); 334 | file_size = ftell(fin); 335 | fclose(fin); 336 | } 337 | 338 | void InitNet() { 339 | long long a, b; 340 | unsigned long long next_random = 1; 341 | a = posix_memalign((void **)&syn0, 128, (long long)vocab_size * layer1_size * sizeof(real)); 342 | if (syn0 == NULL) {printf("Memory allocation failed\n"); exit(1);} 343 | if (hs) { 344 | a = posix_memalign((void **)&syn1, 128, (long long)vocab_size * layer1_size * sizeof(real)); 345 | if (syn1 == NULL) {printf("Memory allocation failed\n"); exit(1);} 346 | for (a = 0; a < vocab_size; a++) for (b = 0; b < layer1_size; b++) 347 | syn1[a * layer1_size + b] = 0; 348 | } 349 | if (negative>0) { 350 | a = posix_memalign((void **)&syn1neg, 128, (long long)vocab_size * layer1_size * sizeof(real)); 351 | if (syn1neg == NULL) {printf("Memory allocation failed\n"); exit(1);} 352 | for (a = 0; a < vocab_size; a++) for (b = 0; b < layer1_size; b++){ 353 | //next_random = next_random * (unsigned long long)25214903917 + 11; 354 | //syn1neg[a * layer1_size + b] = fabs((((next_random & 0xFFFF) / (real)65536) - 0.5) / layer1_size); 355 | //if (syn1neg[a * layer1_size + b] < 1e-6) syn1neg[a * layer1_size + b] = 0; 356 | syn1neg[a * layer1_size + b] = 0; 357 | } 358 | } 359 | for (a = 0; a < vocab_size; a++) for (b = 0; b < layer1_size; b++) { 360 | 
next_random = next_random * (unsigned long long)25214903917 + 11; 361 | syn0[a * layer1_size + b] = fabs((((next_random & 0xFFFF) / (real)65536) - 0.5) / layer1_size); 362 | } 363 | a = posix_memalign((void **)&rateTable, 128, (long long)vocab_size * sizeof(real)); 364 | for (a = 0; a < vocab_size; a++) rateTable[a] = alpha; 365 | CreateBinaryTree(); 366 | } 367 | 368 | void *TrainModelThread(void *id) { 369 | long long a, b, d, cw, word, last_word, sentence_length = 0, sentence_position = 0; 370 | long long word_count = 0, last_word_count = 0, sen[MAX_SENTENCE_LENGTH + 1]; 371 | long long l1, l2, c, target, label, local_iter = iter; 372 | unsigned long long next_random = (long long)id; 373 | real f, g,_f,condition,_alpha,_beta,_sigma,_g; 374 | clock_t now; 375 | real *neu1 = (real *)calloc(layer1_size, sizeof(real)); 376 | real *neu1e = (real *)calloc(layer1_size, sizeof(real)); 377 | real *_syn1neg = (real *)calloc(layer1_size, sizeof(real)); 378 | real *_syn0 = (real *)calloc(layer1_size, sizeof(real)); 379 | FILE *fi = fopen(train_file, "rb"); 380 | fseek(fi, file_size / (long long)num_threads * (long long)id, SEEK_SET); 381 | while (1) { 382 | if (word_count - last_word_count > 10000) { 383 | word_count_actual += word_count - last_word_count; 384 | last_word_count = word_count; 385 | if ((debug_mode > 1)) { 386 | now=clock(); 387 | printf("%cAlpha: %f Progress: %.2f%% Words/thread/sec: %.2fk ", 13, alpha, 388 | word_count_actual / (real)(iter * train_words + 1) * 100, 389 | word_count_actual / ((real)(now - start + 1) / (real)CLOCKS_PER_SEC * 1000)); 390 | fflush(stdout); 391 | } 392 | alpha = starting_alpha * (1 - word_count_actual / (real)(iter * train_words + 1)); 393 | if (alpha < starting_alpha * 0.0001) alpha = starting_alpha * 0.0001; 394 | } 395 | if (sentence_length == 0) { 396 | while (1) { 397 | word = ReadWordIndex(fi); 398 | if (feof(fi)) break; 399 | if (word == -1) continue; 400 | word_count++; 401 | if (word == 0) break; 402 | // The 
subsampling randomly discards frequent words while keeping the ranking same 403 | if (sample > 0) { 404 | real ran = (sqrt(vocab[word].cn / (sample * train_words)) + 1) * (sample * train_words) / vocab[word].cn; 405 | next_random = next_random * (unsigned long long)25214903917 + 11; 406 | if (ran < (next_random & 0xFFFF) / (real)65536) continue; 407 | } 408 | sen[sentence_length] = word; 409 | sentence_length++; 410 | if (sentence_length >= MAX_SENTENCE_LENGTH) break; 411 | } 412 | sentence_position = 0; 413 | } 414 | if (feof(fi) || (word_count > train_words / num_threads)) { 415 | word_count_actual += word_count - last_word_count; 416 | local_iter--; 417 | if (local_iter == 0) break; 418 | word_count = 0; 419 | last_word_count = 0; 420 | sentence_length = 0; 421 | fseek(fi, file_size / (long long)num_threads * (long long)id, SEEK_SET); 422 | continue; 423 | } 424 | word = sen[sentence_position]; 425 | if (word == -1) continue; 426 | for (c = 0; c < layer1_size; c++) neu1[c] = 0; 427 | for (c = 0; c < layer1_size; c++) neu1e[c] = 0; 428 | next_random = next_random * (unsigned long long)25214903917 + 11; 429 | b = next_random % window; 430 | if (cbow) { //train the cbow architecture 431 | // in -> hidden 432 | cw = 0; 433 | for (a = b; a < window * 2 + 1 - b; a++) if (a != window) { 434 | c = sentence_position - window + a; 435 | if (c < 0) continue; 436 | if (c >= sentence_length) continue; 437 | last_word = sen[c]; 438 | if (last_word == -1) continue; 439 | for (c = 0; c < layer1_size; c++) neu1[c] += syn0[c + last_word * layer1_size]; 440 | cw++; 441 | } 442 | if (cw) { 443 | for (c = 0; c < layer1_size; c++) neu1[c] /= cw; 444 | if (hs) for (d = 0; d < vocab[word].codelen; d++) { 445 | f = 0; 446 | l2 = vocab[word].point[d] * layer1_size; 447 | // Propagate hidden -> output 448 | for (c = 0; c < layer1_size; c++) f += neu1[c] * syn1[c + l2]; 449 | if (f <= -MAX_EXP) continue; 450 | else if (f >= MAX_EXP) continue; 451 | else f = expTable[(int)((f + MAX_EXP) * 
(EXP_TABLE_SIZE / MAX_EXP / 2))]; 452 | // 'g' is the gradient multiplied by the learning rate 453 | g = (1 - vocab[word].code[d] - f) * alpha; 454 | // Propagate errors output -> hidden 455 | for (c = 0; c < layer1_size; c++) neu1e[c] += g * syn1[c + l2]; 456 | // Learn weights hidden -> output 457 | for (c = 0; c < layer1_size; c++) syn1[c + l2] += g * neu1[c]; 458 | } 459 | // NEGATIVE SAMPLING 460 | if (negative > 0) for (d = 0; d < negative + 1; d++) { 461 | if (d == 0) { 462 | target = word; 463 | label = 1; 464 | } else { 465 | next_random = next_random * (unsigned long long)25214903917 + 11; 466 | target = table[(next_random >> 16) % table_size]; 467 | if (target == 0) target = next_random % (vocab_size - 1) + 1; 468 | if (target == word) continue; 469 | label = 0; 470 | } 471 | l2 = target * layer1_size; 472 | f = 0; 473 | for (c = 0; c < layer1_size; c++) f += neu1[c] * syn1neg[c + l2]; 474 | if (f > MAX_EXP) g = (label - 1) * alpha; 475 | else if (f < -MAX_EXP) g = (label - 0) * alpha; 476 | else g = (label - expTable[(int)((f + MAX_EXP) * (EXP_TABLE_SIZE / MAX_EXP / 2))]) * alpha; 477 | for (c = 0; c < layer1_size; c++) neu1e[c] += g * syn1neg[c + l2]; 478 | for (c = 0; c < layer1_size; c++) syn1neg[c + l2] += g * neu1[c]; 479 | } 480 | // hidden -> in 481 | for (a = b; a < window * 2 + 1 - b; a++) if (a != window) { 482 | c = sentence_position - window + a; 483 | if (c < 0) continue; 484 | if (c >= sentence_length) continue; 485 | last_word = sen[c]; 486 | if (last_word == -1) continue; 487 | for (c = 0; c < layer1_size; c++) syn0[c + last_word * layer1_size] += neu1e[c]; 488 | } 489 | } 490 | } else { //train skip-gram 491 | for (a = b; a < window * 2 + 1 - b; a++) if (a != window) { 492 | int w_size = window * 2 + 1 - b; 493 | c = sentence_position - window + a; 494 | if (c < 0) continue; 495 | if (c >= sentence_length) continue; 496 | last_word = sen[c]; 497 | if (last_word == -1) continue; 498 | l1 = last_word * layer1_size; 499 | for (c = 0; c < 
layer1_size; c++) neu1e[c] = 0; 500 | // HIERARCHICAL SOFTMAX 501 | if (hs) for (d = 0; d < vocab[word].codelen; d++) { 502 | f = 0; 503 | l2 = vocab[word].point[d] * layer1_size; 504 | // Propagate hidden -> output 505 | for (c = 0; c < layer1_size; c++) f += syn0[c + l1] * syn1[c + l2]; 506 | if (f <= -MAX_EXP) continue; 507 | else if (f >= MAX_EXP) continue; 508 | else f = expTable[(int)((f + MAX_EXP) * (EXP_TABLE_SIZE / MAX_EXP / 2))]; 509 | // 'g' is the gradient multiplied by the learning rate 510 | g = (1 - vocab[word].code[d] - f) * alpha; 511 | // Propagate errors output -> hidden 512 | for (c = 0; c < layer1_size; c++) neu1e[c] += g * syn1[c + l2]; 513 | // Learn weights hidden -> output 514 | for (c = 0; c < layer1_size; c++) syn1[c + l2] += g * syn0[c + l1]; 515 | } 516 | // NEGATIVE SAMPLING 517 | if (negative > 0) for (d = 0; d < negative + 1; d++) { 518 | if (d == 0) { 519 | target = word; 520 | label = 1; 521 | } else { 522 | next_random = next_random * (unsigned long long)25214903917 + 11; 523 | target = table[(next_random >> 16) % table_size]; 524 | if (target == 0) target = next_random % (vocab_size - 1) + 1; 525 | if (target == word) continue; 526 | label = 0; 527 | } 528 | l2 = target * layer1_size; 529 | //rateTable[target] *= (1 - 100 / (real)(iter * train_words + 1)); 530 | //if (alpha < starting_alpha * 0.0001) alpha = starting_alpha * 0.0001; 531 | f = 0; 532 | for (c = 0; c < layer1_size; c++) f += syn0[c + l1] * syn0[c + l2]; 533 | //if (f < 0 && label == 0) continue; 534 | if (f > MAX_EXP) g = (label - 1); 535 | else if (f < -MAX_EXP) g = (label - 0); 536 | else g = (label - expTable[(int)((f + MAX_EXP) * (EXP_TABLE_SIZE / MAX_EXP / 2))]); 537 | for (c = 0; c < layer1_size; c++) neu1e[c] += g * syn0[c + l2]; 538 | for (c = 0; c < layer1_size; c++) _syn0[c] = syn0[c + l2]; 539 | int sum = 0; 540 | //real norm = 0; 541 | //for (c = 0; c < layer1_size; c++) sum += syn1neg[c + l2] * syn1neg[c + l2]; 542 | //norm = sqrt(sum); 543 | int 
satisfied = 0; 544 | real rate = 1; 545 | while(satisfied < 10){ 546 | //real sum = 0; 547 | //for (c = 0; c < layer1_size; c++) sum += fabs(syn0[c + l2]); 548 | for (c = 0; c < layer1_size; c++){ 549 | //real l_norm = 0; 550 | //if (norm == 0) l_norm = 0; 551 | //else l_norm = 0.04 * syn1neg[c + 12] / norm; 552 | real s = 0; 553 | if (syn0[c + l2] == 0) s = 1; 554 | syn0[c + l2] += g * rate * rateTable[target] * syn0[c + l1] - 1e-8; 555 | if (syn0[c + l2] < 0){ 556 | syn0[c + l2] = 0; 557 | if (s == 0) sum += 1; 558 | } 559 | } 560 | if (sum < 5 || satisfied == 10){ 561 | //printf("1 %d %d\n",sum,satisfied); 562 | break; 563 | } 564 | else{ 565 | sum = 0; 566 | for (c = 0; c < layer1_size; c++) syn0[c + l2] = _syn0[c]; 567 | } 568 | satisfied += 1; 569 | rate *= 0.6; 570 | } 571 | rateTable[target] *= (1 - 10 / (real)(iter * train_words + 1)); 572 | if (rateTable[target] < starting_alpha * 0.0001) rateTable[target] = starting_alpha * 0.0001; 573 | } 574 | int satisfied = 0; 575 | int sum = 0; 576 | real rate = 1; 577 | for (c = 0; c < layer1_size; c++) _syn0[c] = syn0[c + l1]; 578 | while(satisfied != 10){ 579 | //for (c = 0; c < layer1_size; c++) sum += fabs(syn0[c + l1]); 580 | for (c = 0; c < layer1_size; c++){ 581 | //real l_norm1 = 0; 582 | //if (norm1 == 0) l_norm1 = 0; 583 | //else l_norm1 = 0.04 * syn0[c + l1] / norm1; 584 | real s = 0; 585 | if (syn0[c + l1] == 0) s = 1; 586 | syn0[c + l1] += rate * neu1e[c] * rateTable[last_word] - 1e-8; 587 | if (syn0[c + l1] < 0){ 588 | syn0[c + l1] = 0; 589 | if (s == 0) sum += 1; 590 | } 591 | } 592 | if (sum < 5 || satisfied == 10){ 593 | //printf("2 %d %d\n",sum,satisfied); 594 | break; 595 | } 596 | else{ 597 | sum = 0; 598 | for (c = 0; c < layer1_size; c++) syn0[c + l1] = _syn0[c]; 599 | } 600 | satisfied += 1; 601 | rate *= 0.6; 602 | } 603 | rateTable[last_word] *= (1 - 100 / (real)(iter * train_words + 1)); 604 | if (rateTable[last_word] < starting_alpha * 0.0001) rateTable[last_word] = starting_alpha * 
0.0001; 605 | } 606 | 607 | } 608 | sentence_position++; 609 | if (sentence_position >= sentence_length) { 610 | sentence_length = 0; 611 | continue; 612 | } 613 | } 614 | fclose(fi); 615 | free(neu1); 616 | free(neu1e); 617 | pthread_exit(NULL); 618 | } 619 | 620 | void TrainModel() { 621 | long a, b, c, d; 622 | FILE *fo, *fo2; 623 | pthread_t *pt = (pthread_t *)malloc(num_threads * sizeof(pthread_t)); 624 | printf("Starting training using file %s\n", train_file); 625 | starting_alpha = alpha; 626 | if (read_vocab_file[0] != 0) ReadVocab(); else LearnVocabFromTrainFile(); 627 | if (save_vocab_file[0] != 0) SaveVocab(); 628 | if (output_file[0] == 0) return; 629 | InitNet(); 630 | if (negative > 0) InitUnigramTable(); 631 | start = clock(); 632 | for (a = 0; a < num_threads; a++) pthread_create(&pt[a], NULL, TrainModelThread, (void *)a); 633 | for (a = 0; a < num_threads; a++) pthread_join(pt[a], NULL); 634 | fo = fopen(output_file, "wb"); 635 | //fo2 = fopen(output_file2, "wb"); 636 | if (classes == 0) { 637 | // Save the word vectors 638 | fprintf(fo, "%lld %lld\n", vocab_size, layer1_size); 639 | //fprintf(fo2, "%lld %lld\n", vocab_size, layer1_size); 640 | for (a = 0; a < vocab_size; a++) { 641 | fprintf(fo, "%s ", vocab[a].word); 642 | if (binary) for (b = 0; b < layer1_size; b++) fwrite(&syn0[a * layer1_size + b], sizeof(real), 1, fo); 643 | else for (b = 0; b < layer1_size; b++) fprintf(fo, "%lf ", syn0[a * layer1_size + b]); 644 | fprintf(fo, "\n"); 645 | } 646 | /* 647 | for (a = 0; a < vocab_size; a++){ 648 | fprintf(fo2, "%s ", vocab[a].word); 649 | if (binary) for (b = 0; b < layer1_size; b++) fwrite(&syn1neg[a * layer1_size + b], sizeof(real), 1, fo2); 650 | else for (b = 0; b < layer1_size; b++) fprintf(fo2, "%lf ", syn1neg[a * layer1_size + b]); 651 | fprintf(fo2, "\n"); 652 | } 653 | */ 654 | } else { 655 | // Run K-means on the word vectors 656 | int clcn = classes, iter = 10, closeid; 657 | int *centcn = (int *)malloc(classes * sizeof(int)); 658 
| int *cl = (int *)calloc(vocab_size, sizeof(int)); 659 | real closev, x; 660 | real *cent = (real *)calloc(classes * layer1_size, sizeof(real)); 661 | for (a = 0; a < vocab_size; a++) cl[a] = a % clcn; 662 | for (a = 0; a < iter; a++) { 663 | for (b = 0; b < clcn * layer1_size; b++) cent[b] = 0; 664 | for (b = 0; b < clcn; b++) centcn[b] = 1; 665 | for (c = 0; c < vocab_size; c++) { 666 | for (d = 0; d < layer1_size; d++) cent[layer1_size * cl[c] + d] += syn0[c * layer1_size + d]; 667 | centcn[cl[c]]++; 668 | } 669 | for (b = 0; b < clcn; b++) { 670 | closev = 0; 671 | for (c = 0; c < layer1_size; c++) { 672 | cent[layer1_size * b + c] /= centcn[b]; 673 | closev += cent[layer1_size * b + c] * cent[layer1_size * b + c]; 674 | } 675 | closev = sqrt(closev); 676 | for (c = 0; c < layer1_size; c++) cent[layer1_size * b + c] /= closev; 677 | } 678 | for (c = 0; c < vocab_size; c++) { 679 | closev = -10; 680 | closeid = 0; 681 | for (d = 0; d < clcn; d++) { 682 | x = 0; 683 | for (b = 0; b < layer1_size; b++) x += cent[layer1_size * d + b] * syn0[c * layer1_size + b]; 684 | if (x > closev) { 685 | closev = x; 686 | closeid = d; 687 | } 688 | } 689 | cl[c] = closeid; 690 | } 691 | } 692 | // Save the K-means classes 693 | for (a = 0; a < vocab_size; a++) fprintf(fo, "%s %d\n", vocab[a].word, cl[a]); 694 | free(centcn); 695 | free(cent); 696 | free(cl); 697 | } 698 | fclose(fo); 699 | } 700 | 701 | int ArgPos(char *str, int argc, char **argv) { 702 | int a; 703 | for (a = 1; a < argc; a++) if (!strcmp(str, argv[a])) { 704 | if (a == argc - 1) { 705 | printf("Argument missing for %s\n", str); 706 | exit(1); 707 | } 708 | return a; 709 | } 710 | return -1; 711 | } 712 | 713 | int main(int argc, char **argv) { 714 | int i; 715 | if (argc == 1) { 716 | printf("WORD VECTOR estimation toolkit v 0.1c\n\n"); 717 | printf("Options:\n"); 718 | printf("Parameters for training:\n"); 719 | printf("\t-train <file>\n"); 720 | printf("\t\tUse text data from <file> to train the model\n"); 721 | 
printf("\t-output <file>\n"); 722 | printf("\t\tUse <file> to save the resulting word vectors / word clusters\n"); 723 | printf("\t-size <int>\n"); 724 | printf("\t\tSet size of word vectors; default is 100\n"); 725 | printf("\t-window <int>\n"); 726 | printf("\t\tSet max skip length between words; default is 5\n"); 727 | printf("\t-sample <float>\n"); 728 | printf("\t\tSet threshold for occurrence of words. Those that appear with higher frequency in the training data\n"); 729 | printf("\t\twill be randomly down-sampled; default is 1e-3, useful range is (0, 1e-5)\n"); 730 | printf("\t-hs <int>\n"); 731 | printf("\t\tUse Hierarchical Softmax; default is 0 (not used)\n"); 732 | printf("\t-negative <int>\n"); 733 | printf("\t\tNumber of negative examples; default is 5, common values are 3 - 10 (0 = not used)\n"); 734 | printf("\t-threads <int>\n"); 735 | printf("\t\tUse <int> threads (default 12)\n"); 736 | printf("\t-iter <int>\n"); 737 | printf("\t\tRun more training iterations (default 5)\n"); 738 | printf("\t-min-count <int>\n"); 739 | printf("\t\tThis will discard words that appear fewer than <int> times; default is 5\n"); 740 | printf("\t-alpha <float>\n"); 741 | printf("\t\tSet the starting learning rate; default is 0.025 for skip-gram and 0.05 for CBOW\n"); 742 | printf("\t-classes <int>\n"); 743 | printf("\t\tOutput word classes rather than word vectors; default number of classes is 0 (vectors are written)\n"); 744 | printf("\t-debug <int>\n"); 745 | printf("\t\tSet the debug mode (default = 2 = more info during training)\n"); 746 | printf("\t-binary <int>\n"); 747 | printf("\t\tSave the resulting vectors in binary mode; default is 0 (off)\n"); 748 | printf("\t-save-vocab <file>\n"); 749 | printf("\t\tThe vocabulary will be saved to <file>\n"); 750 | printf("\t-read-vocab <file>\n"); 751 | printf("\t\tThe vocabulary will be read from <file>, not constructed from the training data\n"); 752 | printf("\t-cbow <int>\n"); 753 | printf("\t\tUse the continuous bag of words model; default is 1 (use 0 for skip-gram model)\n"); 754 | printf("\nExamples:\n"); 755 | printf("./word2vec 
-train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 0 -cbow 1 -iter 3\n\n"); 756 | return 0; 757 | } 758 | output_file[0] = 0; 759 | save_vocab_file[0] = 0; 760 | read_vocab_file[0] = 0; 761 | if ((i = ArgPos((char *)"-size", argc, argv)) > 0) layer1_size = atoi(argv[i + 1]); 762 | if ((i = ArgPos((char *)"-train", argc, argv)) > 0) strcpy(train_file, argv[i + 1]); 763 | if ((i = ArgPos((char *)"-save-vocab", argc, argv)) > 0) strcpy(save_vocab_file, argv[i + 1]); 764 | if ((i = ArgPos((char *)"-read-vocab", argc, argv)) > 0) strcpy(read_vocab_file, argv[i + 1]); 765 | if ((i = ArgPos((char *)"-debug", argc, argv)) > 0) debug_mode = atoi(argv[i + 1]); 766 | if ((i = ArgPos((char *)"-binary", argc, argv)) > 0) binary = atoi(argv[i + 1]); 767 | if ((i = ArgPos((char *)"-cbow", argc, argv)) > 0) cbow = atoi(argv[i + 1]); 768 | if (cbow) alpha = 0.05; 769 | if ((i = ArgPos((char *)"-alpha", argc, argv)) > 0) alpha = atof(argv[i + 1]); 770 | if ((i = ArgPos((char *)"-output", argc, argv)) > 0) strcpy(output_file, argv[i + 1]); 771 | if ((i = ArgPos((char *)"-output2", argc, argv)) > 0) strcpy(output_file2, argv[i + 1]); 772 | if ((i = ArgPos((char *)"-window", argc, argv)) > 0) window = atoi(argv[i + 1]); 773 | if ((i = ArgPos((char *)"-sample", argc, argv)) > 0) sample = atof(argv[i + 1]); 774 | if ((i = ArgPos((char *)"-hs", argc, argv)) > 0) hs = atoi(argv[i + 1]); 775 | if ((i = ArgPos((char *)"-negative", argc, argv)) > 0) negative = atoi(argv[i + 1]); 776 | if ((i = ArgPos((char *)"-threads", argc, argv)) > 0) num_threads = atoi(argv[i + 1]); 777 | if ((i = ArgPos((char *)"-iter", argc, argv)) > 0) iter = atoi(argv[i + 1]); 778 | if ((i = ArgPos((char *)"-min-count", argc, argv)) > 0) min_count = atoi(argv[i + 1]); 779 | if ((i = ArgPos((char *)"-classes", argc, argv)) > 0) classes = atoi(argv[i + 1]); 780 | vocab = (struct vocab_word *)calloc(vocab_max_size, sizeof(struct vocab_word)); 781 | vocab_hash = (int 
*)calloc(vocab_hash_size, sizeof(int)); 782 | expTable = (real *)malloc((EXP_TABLE_SIZE + 1) * sizeof(real)); 783 | for (i = 0; i < EXP_TABLE_SIZE; i++) { 784 | expTable[i] = exp((i / (real)EXP_TABLE_SIZE * 2 - 1) * MAX_EXP); // Precompute the exp() table 785 | expTable[i] = expTable[i] / (expTable[i] + 1); // Precompute f(x) = x / (x + 1) 786 | } 787 | TrainModel(); 788 | return 0; 789 | } 790 | -------------------------------------------------------------------------------- /word2nvec-c/word2phrase.c: -------------------------------------------------------------------------------- 1 | // Copyright 2013 Google Inc. All Rights Reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 
14 | 15 | #include <stdio.h> 16 | #include <stdlib.h> 17 | #include <string.h> 18 | #include <math.h> 19 | #include <pthread.h> 20 | 21 | #define MAX_STRING 60 22 | 23 | const int vocab_hash_size = 500000000; // Maximum 500M entries in the vocabulary 24 | 25 | typedef float real; // Precision of float numbers 26 | 27 | struct vocab_word { 28 | long long cn; 29 | char *word; 30 | }; 31 | 32 | char train_file[MAX_STRING], output_file[MAX_STRING]; 33 | struct vocab_word *vocab; 34 | int debug_mode = 2, min_count = 5, *vocab_hash, min_reduce = 1; 35 | long long vocab_max_size = 10000, vocab_size = 0; 36 | long long train_words = 0; 37 | real threshold = 100; 38 | 39 | unsigned long long next_random = 1; 40 | 41 | // Reads a single word from a file, assuming space + tab + EOL to be word boundaries 42 | void ReadWord(char *word, FILE *fin) { 43 | int a = 0, ch; 44 | while (!feof(fin)) { 45 | ch = fgetc(fin); 46 | if (ch == 13) continue; 47 | if ((ch == ' ') || (ch == '\t') || (ch == '\n')) { 48 | if (a > 0) { 49 | if (ch == '\n') ungetc(ch, fin); 50 | break; 51 | } 52 | if (ch == '\n') { 53 | strcpy(word, (char *)""); 54 | return; 55 | } else continue; 56 | } 57 | word[a] = ch; 58 | a++; 59 | if (a >= MAX_STRING - 1) a--; // Truncate too long words 60 | } 61 | word[a] = 0; 62 | } 63 | 64 | // Returns hash value of a word 65 | int GetWordHash(char *word) { 66 | unsigned long long a, hash = 1; 67 | for (a = 0; a < strlen(word); a++) hash = hash * 257 + word[a]; 68 | hash = hash % vocab_hash_size; 69 | return hash; 70 | } 71 | 72 | // Returns position of a word in the vocabulary; if the word is not found, returns -1 73 | int SearchVocab(char *word) { 74 | unsigned int hash = GetWordHash(word); 75 | while (1) { 76 | if (vocab_hash[hash] == -1) return -1; 77 | if (!strcmp(word, vocab[vocab_hash[hash]].word)) return vocab_hash[hash]; 78 | hash = (hash + 1) % vocab_hash_size; 79 | } 80 | return -1; 81 | } 82 | 83 | // Reads a word and returns its index in the vocabulary 84 | int ReadWordIndex(FILE *fin) { 85 | char 
word[MAX_STRING]; 86 | ReadWord(word, fin); 87 | if (feof(fin)) return -1; 88 | return SearchVocab(word); 89 | } 90 | 91 | // Adds a word to the vocabulary 92 | int AddWordToVocab(char *word) { 93 | unsigned int hash, length = strlen(word) + 1; 94 | if (length > MAX_STRING) length = MAX_STRING; 95 | vocab[vocab_size].word = (char *)calloc(length, sizeof(char)); 96 | strcpy(vocab[vocab_size].word, word); 97 | vocab[vocab_size].cn = 0; 98 | vocab_size++; 99 | // Reallocate memory if needed 100 | if (vocab_size + 2 >= vocab_max_size) { 101 | vocab_max_size += 10000; 102 | vocab=(struct vocab_word *)realloc(vocab, vocab_max_size * sizeof(struct vocab_word)); 103 | } 104 | hash = GetWordHash(word); 105 | while (vocab_hash[hash] != -1) hash = (hash + 1) % vocab_hash_size; 106 | vocab_hash[hash]=vocab_size - 1; 107 | return vocab_size - 1; 108 | } 109 | 110 | // Used later for sorting by word counts (descending); compares rather than subtracts so large long long counts cannot overflow the int result 111 | int VocabCompare(const void *a, const void *b) { 112 | return (((struct vocab_word *)b)->cn > ((struct vocab_word *)a)->cn) - (((struct vocab_word *)b)->cn < ((struct vocab_word *)a)->cn); 113 | } 114 | 115 | // Sorts the vocabulary by frequency using word counts 116 | void SortVocab() { 117 | int a; 118 | unsigned int hash; 119 | // Sort the vocabulary, keeping the end-of-sentence token (vocab[0]) at the first position 120 | qsort(&vocab[1], vocab_size - 1, sizeof(struct vocab_word), VocabCompare); 121 | for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1; 122 | for (a = 0; a < vocab_size; a++) { 123 | // Words occurring fewer than min_count times will be discarded from the vocab 124 | if (vocab[a].cn < min_count) { 125 | vocab_size--; 126 | free(vocab[vocab_size].word); 127 | } else { 128 | // The hash is re-computed because sorting has invalidated the old positions 129 | hash = GetWordHash(vocab[a].word); 130 | while (vocab_hash[hash] != -1) hash = (hash + 1) % vocab_hash_size; 131 | vocab_hash[hash] = a; 132 | } 133 | } 134 | vocab = (struct vocab_word *)realloc(vocab, vocab_size * sizeof(struct vocab_word)); 135 | } 136 | 137 | // Reduces the vocabulary by removing 
infrequent tokens 138 | void ReduceVocab() { 139 | int a, b = 0; 140 | unsigned int hash; 141 | for (a = 0; a < vocab_size; a++) if (vocab[a].cn > min_reduce) { 142 | vocab[b].cn = vocab[a].cn; 143 | vocab[b].word = vocab[a].word; 144 | b++; 145 | } else free(vocab[a].word); 146 | vocab_size = b; 147 | for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1; 148 | for (a = 0; a < vocab_size; a++) { 149 | // Hash will be re-computed, as it is not actual 150 | hash = GetWordHash(vocab[a].word); 151 | while (vocab_hash[hash] != -1) hash = (hash + 1) % vocab_hash_size; 152 | vocab_hash[hash] = a; 153 | } 154 | fflush(stdout); 155 | min_reduce++; 156 | } 157 | 158 | void LearnVocabFromTrainFile() { 159 | char word[MAX_STRING], last_word[MAX_STRING], bigram_word[MAX_STRING * 2]; 160 | FILE *fin; 161 | long long a, i, start = 1; 162 | for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1; 163 | fin = fopen(train_file, "rb"); 164 | if (fin == NULL) { 165 | printf("ERROR: training data file not found!\n"); 166 | exit(1); 167 | } 168 | vocab_size = 0; 169 | AddWordToVocab((char *)""); 170 | while (1) { 171 | ReadWord(word, fin); 172 | if (feof(fin)) break; 173 | if (!strcmp(word, "")) { 174 | start = 1; 175 | continue; 176 | } else start = 0; 177 | train_words++; 178 | if ((debug_mode > 1) && (train_words % 100000 == 0)) { 179 | printf("Words processed: %lldK Vocab size: %lldK %c", train_words / 1000, vocab_size / 1000, 13); 180 | fflush(stdout); 181 | } 182 | i = SearchVocab(word); 183 | if (i == -1) { 184 | a = AddWordToVocab(word); 185 | vocab[a].cn = 1; 186 | } else vocab[i].cn++; 187 | if (start) continue; 188 | sprintf(bigram_word, "%s_%s", last_word, word); 189 | bigram_word[MAX_STRING - 1] = 0; 190 | strcpy(last_word, word); 191 | i = SearchVocab(bigram_word); 192 | if (i == -1) { 193 | a = AddWordToVocab(bigram_word); 194 | vocab[a].cn = 1; 195 | } else vocab[i].cn++; 196 | if (vocab_size > vocab_hash_size * 0.7) ReduceVocab(); 197 | } 198 | SortVocab(); 199 | 
if (debug_mode > 0) { 200 | printf("\nVocab size (unigrams + bigrams): %lld\n", vocab_size); 201 | printf("Words in train file: %lld\n", train_words); 202 | } 203 | fclose(fin); 204 | } 205 | 206 | void TrainModel() { 207 | long long pa = 0, pb = 0, pab = 0, oov, i, li = -1, cn = 0; 208 | char word[MAX_STRING], last_word[MAX_STRING], bigram_word[MAX_STRING * 2]; 209 | real score; 210 | FILE *fo, *fin; 211 | printf("Starting training using file %s\n", train_file); 212 | LearnVocabFromTrainFile(); 213 | fin = fopen(train_file, "rb"); 214 | fo = fopen(output_file, "wb"); 215 | word[0] = 0; 216 | while (1) { 217 | strcpy(last_word, word); 218 | ReadWord(word, fin); 219 | if (feof(fin)) break; 220 | if (!strcmp(word, "")) { 221 | fprintf(fo, "\n"); 222 | continue; 223 | } 224 | cn++; 225 | if ((debug_mode > 1) && (cn % 100000 == 0)) { 226 | printf("Words written: %lldK%c", cn / 1000, 13); 227 | fflush(stdout); 228 | } 229 | oov = 0; 230 | i = SearchVocab(word); 231 | if (i == -1) oov = 1; else pb = vocab[i].cn; 232 | if (li == -1) oov = 1; 233 | li = i; 234 | sprintf(bigram_word, "%s_%s", last_word, word); 235 | bigram_word[MAX_STRING - 1] = 0; 236 | i = SearchVocab(bigram_word); 237 | if (i == -1) oov = 1; else pab = vocab[i].cn; 238 | if (pa < min_count) oov = 1; 239 | if (pb < min_count) oov = 1; 240 | if (oov) score = 0; else score = (pab - min_count) / (real)pa / (real)pb * (real)train_words; 241 | if (score > threshold) { 242 | fprintf(fo, "_%s", word); 243 | pb = 0; 244 | } else fprintf(fo, " %s", word); 245 | pa = pb; 246 | } 247 | fclose(fo); 248 | fclose(fin); 249 | } 250 | 251 | int ArgPos(char *str, int argc, char **argv) { 252 | int a; 253 | for (a = 1; a < argc; a++) if (!strcmp(str, argv[a])) { 254 | if (a == argc - 1) { 255 | printf("Argument missing for %s\n", str); 256 | exit(1); 257 | } 258 | return a; 259 | } 260 | return -1; 261 | } 262 | 263 | int main(int argc, char **argv) { 264 | int i; 265 | if (argc == 1) { 266 | printf("WORD2PHRASE tool 
v0.1a\n\n"); 267 | printf("Options:\n"); 268 | printf("Parameters for training:\n"); 269 | printf("\t-train <file>\n"); 270 | printf("\t\tUse text data from <file> to train the model\n"); 271 | printf("\t-output <file>\n"); 272 | printf("\t\tUse <file> to save the resulting word vectors / word clusters / phrases\n"); 273 | printf("\t-min-count <int>\n"); 274 | printf("\t\tThis will discard words that appear fewer than <int> times; default is 5\n"); 275 | printf("\t-threshold <float>\n"); 276 | printf("\t\t The <float> value represents the threshold for forming phrases (higher means fewer phrases); default 100\n"); 277 | printf("\t-debug <int>\n"); 278 | printf("\t\tSet the debug mode (default = 2 = more info during training)\n"); 279 | printf("\nExamples:\n"); 280 | printf("./word2phrase -train text.txt -output phrases.txt -threshold 100 -debug 2\n\n"); 281 | return 0; 282 | } 283 | if ((i = ArgPos((char *)"-train", argc, argv)) > 0) strcpy(train_file, argv[i + 1]); 284 | if ((i = ArgPos((char *)"-debug", argc, argv)) > 0) debug_mode = atoi(argv[i + 1]); 285 | if ((i = ArgPos((char *)"-output", argc, argv)) > 0) strcpy(output_file, argv[i + 1]); 286 | if ((i = ArgPos((char *)"-min-count", argc, argv)) > 0) min_count = atoi(argv[i + 1]); 287 | if ((i = ArgPos((char *)"-threshold", argc, argv)) > 0) threshold = atof(argv[i + 1]); 288 | vocab = (struct vocab_word *)calloc(vocab_max_size, sizeof(struct vocab_word)); 289 | vocab_hash = (int *)calloc(vocab_hash_size, sizeof(int)); 290 | TrainModel(); 291 | return 0; 292 | } 293 | -------------------------------------------------------------------------------- /words: -------------------------------------------------------------------------------- 1 | type form fbi way kind manner 2 2 | translates describes combines jan included includes 3 3 | gospel baptism indian jesus faith judaism 2 4 | Franz Johann Wilhelm Friedrich literally von 4 5 | tower florence eiffel society rome cathedral 3 6 | poles slaves earth camps indians displaced 2 7 | level meters bipolar gauge feet depth 2 8 
| went dominate eventually return mathbf back 4 9 | into artificial transformed entered split divided 1 10 | introduced incorporated newly man subsequently added 3 11 | academy bipolar received prize awarded award 1 12 | landing group crew flying fire fighter 1 13 | lead cause whale potentially can affect 2 14 | nearly almost virtually practically leto all 4 15 | version del spanish jos luis costa 0 16 | charles viii louis henry xiv vii 2 17 | athens caesar babylon kings final alexander 4 18 | grand route montreal san mark francisco 4 19 | billion big bad looks luck boys 0 20 | vowel esperanto image consonant vowels consonants 2 21 | ipv dns irc ethernet originally packet 4 22 | while galaxies whilst behind others driving 1 23 | their respective counterparts slow homes retain 3 24 | count macbeth iii see alfonso vii 3 25 | prominent famous important influential whale popular 4 26 | differently worn boxes media collectively bigger 3 27 | chinese korean han ray japanese dog 3 28 | briefly with thirteen fourteen sixteen eleven 1 29 | image nineteenth jpg node png gif 1 30 | evidence reports claims found lovecraft dinosaurs 4 31 | more less slower stronger band simpler 4 32 | hash algorithm using its tables operations 3 33 | criticism higher rates average cent lower 0 34 | again too very quite relatively extremely 0 35 | easter literally simply either commonly contain 0 36 | cell cycle cells combustion their membrane 4 37 | bowling cricket grand ace scoring gehrig 2 38 | list lists shows references listed linked 2 39 | school secondary long schools primary learning 2 40 | our align humanity beings reality mankind 1 41 | does whale seal island voyage sea 0 42 | county data store cache storage address 0 43 | align right testament left center chi 2 44 | these participants psi all assigned remaining 2 45 | data long short duration hours longest 0 46 | attack battle enemy manifold hannibal surprise 3 47 | yellow billion www brazil http brazilian 0 48 | set setting team 
follow sets item 2 49 | might beer would seems appears assumed 1 50 | daily press sunday with release magazine 3 51 | with friendship exception dealing sometimes associated 4 52 | article higher articles page summary detailed 1 53 | little any longer doubt for hardly 4 54 | flip gif png chilean block jpg 3 55 | site money sites website fan heritage 1 56 | year years planning next days ten 2 57 | floppy disks drives apple mark disk 4 58 | est expenditures median manpower probability unpaved 4 59 | retrieved leto her duncan herself mary 0 60 | again arrested reportedly once above finally 4 61 | gay three women lesbian female male 1 62 | having coercion successfully cola been coca 1 63 | could must enter carolina continue forever 3 64 | see also uses disambiguation technically terminology 4 65 | will would should already future otherwise 3 66 | inuit cypriot basque crisis albanian bulgarian 3 67 | party urban remote areas populated rural 0 68 | technically psi phi rangle rays ray 0 69 | public private campus downtown dream minneapolis 4 70 | gene index brain genes chromosome genome 1 71 | agave money tree cats elephants leaf 1 72 | does did not meet bah necessarily 4 73 | dialect cornish celtic compared cuisine dialects 3 74 | include may involve extent natural seem 4 75 | shortly agave after abandoned leaving soon 1 76 | large flows number huge growing significant 1 77 | fugue italian bwv concerto human bach 4 78 | sometimes commonly often referred image occasionally 4 79 | engine bmw engines athens rifles rifle 3 80 | for instance reserved searching cell purposes 4 81 | lisp pascal processor human amd instruction 3 82 | lord song dead lady wood our 4 83 | already heard note learned had done 2 84 | hotel exhibition bipolar hospital residence dartmouth 2 85 | above chord time bar table add 2 86 | organization monetary fund participation will international 4 87 | how about index questions why thinking 2 88 | market lovecraft minor trade export economy 1 89 | order 
attempts martial replace restore trying 2 90 | nothing neither nor doubt mid matter 4 91 | atari nintendo cards order doom mario 3 92 | acids amino acid bond abu bonds 4 93 | man young hash doctor previously older 2 94 | beer drink alcoholic chiang taste wine 3 95 | for alternative excellent qur extraordinary enormous 0 96 | family inuit uncle maria inherited eleanor 1 97 | dracula batman comics chiang superman fantastic 3 98 | martial style dance painting get art 4 99 | lovecraft bah eds other itu ron 3 100 | remaining jury bit six female iso 1 101 | known well public documented popularly collectively 2 102 | atheism season buddha meditation buddhism buddhist 1 103 | other including media like unrelated resemble 2 104 | hash named called honor renamed honour 0 105 | based depending rely data loosely upon 3 106 | because hamas partly choice somewhat false 1 107 | zero ago bce show around acres 3 108 | little cable stations broadcasting abc radio 0 109 | louis season half saint xavier latter 1 110 | galaxies from withdrew onwards freed ranging 0 111 | pierre ritual jacques roosevelt michel andr 1 112 | together literally friends worked fellow working 1 113 | call helped remaining asked answer sought 2 114 | pounds band cubic tons inch yards 1 115 | making made fugue makes make giving 2 116 | bipolar syndrome disorder beer treatment fever 3 117 | these mathbf acceleration eta mbox mass 0 118 | which includes featured also psi saw 4 119 | forward ball pointing behind family line 4 120 | ski rand russell peirce carnegie wittgenstein 0 121 | est age organization expenditures manpower rate 2 122 | zero news weekly channel poll broadcast 0 123 | programming visual hop functional learning techniques 2 124 | manifold hilbert dimension euclidean winter spaces 4 125 | team league public football teams baseball 2 126 | crisis post famine hubbard subsequent disaster 3 127 | wikipedia laws gutenberg free wiki kde 1 128 | out nearly carried laid broke turned 1 129 | personal 
agent calgary free marketing providing 2
130 | purchased pen wholly rand locally manufactured 3
131 | dracula probability real equal value sum 0
132 | note suggests lennon indicates fact interesting 2
133 | group age gregorian beginning calendar starting 0
134 | public face down broken back stand 0
135 | personal bah faith movement doctrine creed 0
136 | israel yahweh covenant ark israel judah 4
137 | ray signal only process synthesis circuits 2
138 | align etc magic ascii slang characters 0
139 | scientists index fans commentators historians economists 1
140 | index simon http money publishing wide 3
141 | stated showed claiming instead concluded argues 3
142 | etc became becomes become becoming increasingly 0
143 | lisp instead didn necessarily really doesn 0
144 | gods myth party goddess mythology homer 2
145 | hypnosis aids mathbf clinical hiv health 2
146 | instead fbi conspiracy crime murder against 0
147 | april dream dreams strange jokes darkness 0
148 | election gandhi lifetime legacy predecessor funeral 0
149 | hubbard die des martin seen dianetics 4
150 | accompanied rand preceded followed assisted surrounded 1
151 | media iec targeted anti repeated widespread 1
152 | get off commons you managed stop 2
153 | blood tissue milk liver our cow 4
154 | van paul haydn church beethoven carl 3
155 | final crisis cup championship playoffs match 1
156 | forward racial living race people kills 0
157 | highly hemingway poems drawings poetry novels 0
158 | fugue monty animation pictures studio python 0
159 | ritual stated worship sects judaism believers 1
160 | gandhi church apostles saints episcopal baptist 0
161 | mining oil food petroleum with agriculture 4
162 | dates accounts highly documents manuscripts records 2
163 | has had cell yet always been 2
164 | intel iso carl johann canal iec 4
165 | those pollution who individuals opposed experts 1
166 | hamas lovecraft jihad palestinian semitism lebanese 1
167 | literally planning progress development projects technological 0
168 | middle society community shows classes elite 3
169 | criticism between controversial this critique arose 1
170 | classical intel secular protestant christian philosophers 1
171 | monty between sides gap difference relationship 0
172 | easter day celebration christmas nothing ceremony 4
173 | constantinople byzantine patriarch fugue alexandria antioch 3
174 | lennon jury court jurisdiction courts defendant 0
175 | coercion abortion intercourse collection addiction abuse 3
176 | operating linux kernel order bsd windows 3
177 | notable examples notably common voltage most 4
178 | chiang macau eritrea kosovo originally cambodia 4
179 | von carl austrian soviets hans friedrich 3
180 | august listing june ford days march 3
181 | comes meaning derived word artificial removing 4
182 | british irish see sir english scottish 2
183 | zero money cash tax price credit 0
184 | corps battalion forward navy naval regiment 2
185 | took taking racing over control take 2
186 | world group class special type hierarchy 0
187 | finland estonia euro engine belarus denmark 3
188 | committee board promoted house assembly staff 2
189 | calculus turing differential mathematical long equations 4
190 | lennon booth mccarthy fugue david macdonald 3
191 | gandhi boeing esa shuttle launch airline 0
192 | version invention first release get publication 4
193 | godzilla landing daleks blackadder dune manga 1
194 | french money jean pierre jacques belgian 1
195 | highly gods popularity extremely particularly despite 1
196 | arthur lewis stephen clarke differently gibson 4
197 | logic mathematics french university cambridge princeton 2
198 | german germany berlin hitler arthur reich 4
199 | call athlete american actor footballer actress 0
200 | types besides variations bowling divisions ways 3
201 | collection show consists consisting sort thousands 1
202 | show talk host television instead personality 4
203 | upon saul german allah gift gave 2
204 | racing horse cross riding ritual driver 4
205 | party bnp player labour democrats alliance 2
206 | centre cities because adelaide centres city 2
207 | promoted chief secretary constantinople master served 3
208 | chilean heads federal set multi head 3
209 | recent dracula past since decades generations 1
210 | originally was designed poorly dates conceived 4
211 | three blade dimensional ieee iso dec 1
212 | magnetic electrons pollution particles atomic charge 2
213 | natural sub winter selection classification species 2
214 | april october von january february september 2
215 | aramaic text bipolar translation alphabet script 2
216 | blade hollow shaped bullet racing knife 4
217 | world fugue iii war vol schedule 1
218 | museum historic gay monument memorial gallery 2
219 | mid late highway israel census migration 3
220 | ski only danish hans herman norwegian 1
221 | its name principal shape slow whose 4
222 | shows when necessary happens where clear 0
223 | carbon site ammonia dioxide hydrogen helium 1
224 | performance shows better quality strength attention 1
225 | addition manner because interested contrast favor 2
226 | century recent nfc mathematicians ranked baron 1
227 | charts chart ski biggest selling singles 2
228 | technically nowadays how currently considered widely 2
229 | clarinet note instrument harmonica tuning flute 1
230 | cocaine fuel use ethanol which alcohol 4
231 | grail grand mormon calvin gospel isaiah 1
232 | nineteenth medieval hill twentieth millennium civilization 2
233 | from own isolation prosperity wealth its 0
234 | crucial person role factor technically member 4
235 | isbn publishers wright probability publishing penguin 3
236 | variant document remaining modified creates combination 2
237 | calgary athens airport alberta airlines montreal 1
238 | newly westminster mandate company martial royal 4
239 | retrieved yellow white blue red green 0
240 | april winter storm ages period weather 0
241 | tolkien jargon introduced dick symphony shi 2
242 | how human beings behaviour animal enjoy 0
243 | originally time same moment odds conclusion 0
244 | another self problem evolutionary way essentially 3
245 | ireland pollution soil forest earthquakes environment 0
246 | wood beads paper leather side ink 4
247 | hop hip rap genre took scene 4
248 | get rather than greater larger times 0
249 | commons peers lords atheism hereditary cabinet 3
250 | mark charlie hall jack centre jason 4
251 | player boxer more singer footballer actress 2
252 | anarcho klan anarchism hubbard libertarian capitalism 3
253 | little such labeled viewed result consequence 0
254 | ireland dublin tower scotland duchy lords 2
255 | shows fugue episodes series plays appeared 1
256 | while abu ibn qaeda muhammad bin 0
257 | became indian islands archipelago india indonesian 0
258 | trotsky lenin ira gorbachev accompanied stalin 4
259 | iec iso inducted seven nineteenth shah 4
260 | flows stream flow through their river 4
261 | under ban regulations schedule introduced patents 4
262 | evolutionary theories cosmology evolutionary hypothesis einstein 3
263 | third fourth second sixth their fifth 4
264 | seen jersey zealand york new orleans 0
265 | ford kennedy oswald jury washington fitzgerald 3
266 | system systems nervous telephone cell microwave 4
267 | society jersey history study medicine chemistry 1
268 | earth view party circle crater projection 2
269 | bush gore george hoover level wilson 4
270 | bwv originally jan bin dec der 1
271 | elsewhere hill mount end gate crossing 0
272 | large county aberdeen township indiana iowa 0
273 | jedi imperial family knights emperor prince 2
274 | seen can more traced cannot found 2
275 | iranian iran bengal align timor pakistan 3
276 | elsewhere australia seen usa japan europe 2
277 | museum side occasion behalf placed emphasis 0
278 | describe refers term ford refer referring 3
279 | election lisp presidential vote clinton candidate 1
280 | voltage frequency promoted waves intensity vacuum 2
281 | finalist laws wimbledon semi nfc division 1
282 | retrieved september seen accessed mid wesley 2
283 | canal dam bridges lakes ipv roads 4
284 | gaza jordan accompanied region strip zone 2
285 | body lennon rest vast nature distribution 1
286 | identity continuous subset remaining integers inverse 3
287 | slow change pace purchased response due 3
288 | compared gay according due supposed prior 1
289 | hill artificial intelligence general entropy relativity 0
290 | season bowl super nineteenth album summer 3
291 | jan wesley body dec vol paperback 2
292 | testament known ithaca anchor old penguin 1
293 | gibraltar personal australian gandhi jamaica minister 1
294 | lincoln state lincoln delaware alabama arkansas 2
295 | carolina north america texas atari gary 4
296 | united nations states founding flip kingdom 4
297 | commons laws law passed legislation constitution 0
298 | band metallica hendrix dylan remaining elvis 4
299 | galaxies galaxy comet agave galileo orbit 3
300 | soviets newly soviet warsaw pact operation 1
301 |
--------------------------------------------------------------------------------