├── README.md ├── scripts ├── batch_run_lstm.sh ├── batch_run_multitask.sh ├── run_drug_var.sh ├── run_lstm.sh └── run_triple_multitask.sh └── theano_src ├── data_process.py ├── edmonds_mst.py ├── lstm_RE.py ├── neural_architectures.py ├── neural_lib.py └── train_util.py /README.md: -------------------------------------------------------------------------------- 1 | # Cross-Sentence N-ary Relation Extraction with Graph LSTMs 2 | 3 | This is the [data](https://drive.google.com/drive/folders/1Jgw6A08nh-4umCV7tfqQ6HFg7mtDwo67?usp=sharing) and source code of the paper: 4 | 5 | **Cross-Sentence N-ary Relation Extraction with Graph LSTMs** 6 | Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova and Wen-tau Yih 7 | *Transactions of the Association for Computational Linguistics*, Vol 5, 2017 8 | 9 | If you use the code, please cite the following BibTeX entry: 10 | 11 | @article{peng2017cross, 12 | title={Cross-Sentence N-ary Relation Extraction with Graph LSTMs}, 13 | author={Peng, Nanyun and Poon, Hoifung and Quirk, Chris and Toutanova, Kristina and Yih, Wen-tau}, 14 | journal={Transactions of the Association for Computational Linguistics}, 15 | volume={5}, 16 | pages={101--115}, 17 | year={2017} 18 | } 19 | 20 | ## Data 21 | File system hierarchy: 22 | 23 | - data/ 24 | - drug_gene_var/ 25 | - 0/ 26 | - data_graph 27 | - sentences_2nd 28 | - graph_arcs 29 | - 1/ 30 | - 2/ 31 | - 3/ 32 | - 4/ 33 | - drug_var/ 34 | - the same structure as in drug_gene_var 35 | - drug_gene/ 36 | - the same structure as in drug_gene_var 37 | 38 | 39 | ### Source attribution: 40 | The full information for each instance is contained in the file "data_graph", a JSON-format file with information such as the PubMed article ID, paragraph number, sentence number, and token-level annotations (part-of-speech tags, dependencies, etc.) produced by the Stanford CoreNLP tool. 41 | 42 | ### Preprocessing 43 | We processed the source data into a format that is easier for our code to consume, producing two files: "sentences_2nd" and "graph_arcs". The "sentences_2nd" file contains the raw input; each line is tab-separated with the format: 44 | 45 | the-original-sentences <tab> indices-to-the-first-entity(drug) <tab> indices-to-the-second-entity(gene/variant) <tab> [indices-to-the-third-entity(variant)] <tab> relation-label 46 | 47 | The "graph_arcs" file contains the dependencies between the words, including word-adjacency (time-sequence) links, syntactic dependencies, and discourse dependencies. Each line lists the dependencies for all nodes of an instance, separated by spaces: 48 | 49 | dependencies-for-node-0 dependencies-for-node-1 ... 50 | dependencies-for-node-n = dependency-0,,,dependency-1... 51 | dependency-n = dependency-type::dependent-node 52 | 53 | ## Experiments 54 | To reproduce the results in our paper, the script ./scripts/batch_run_lstm.sh runs all the cross-validation folds for both the drug-gene-variant ternary relation and the drug-variant binary relation. 55 | 56 | The script ./scripts/batch_run_multitask.sh runs all the multi-task learning experiments.
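### Example usage and data parsing

As a usage note, assuming `LOCAL_HOME` in the run scripts has been edited to point at your own checkout, invoking `./scripts/batch_run_lstm.sh WeightedGraphLSTM` should run the `WeightedGraphLSTMRelation` circuit on the GPU for all five cross-validation folds of both tasks.

The following is a minimal, illustrative sketch of how one line of "sentences_2nd" and the corresponding line of "graph_arcs" can be parsed. The function names `parse_sentence_line` and `parse_graph_line` are not part of the released code; the separators follow what `gen_graph_from_json` writes and what `read_file` / `read_graph_dependencies` in theano_src/data_process.py consume.

    # Illustrative parsing sketch (not part of the released code).
    def parse_sentence_line(line):
        # Tab-separated: tokens, one space-separated index list per entity, relation label.
        fields = line.rstrip('\n').split('\t')
        tokens = fields[0].split(' ')
        entity_indices = [[int(i) for i in f.split(' ')] for f in fields[1:-1]]
        relation = fields[-1]
        return tokens, entity_indices, relation

    def parse_graph_line(line):
        # One space-separated entry per token; each entry is a ',,,'-separated list
        # of arcs of the form 'dependency-type::dependent-node-index'.
        arcs = []
        for node_entry in line.rstrip('\n').split(' '):
            node_arcs = []
            for dep in node_entry.split(',,,'):
                if not dep:
                    continue
                dep_type, dep_node = dep.rsplit('::', 1)
                node_arcs.append((dep_type, int(dep_node)))
            arcs.append(node_arcs)
        return arcs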
57 | -------------------------------------------------------------------------------- /scripts/batch_run_lstm.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | for i in {0..4}; do 4 | echo ${i} 5 | run_lstm.sh ${i} ${1}Relation gpu #WeightedGraphLSTM 6 | run_drug_var.sh ${i} ${1}Relation gpu #WeightedGraphLSTM 7 | done 8 | -------------------------------------------------------------------------------- /scripts/batch_run_multitask.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | for i in {0..4}; do 4 | echo ${i} 5 | run_triple_multitask.sh ${i} ${1}Relation_multitask gpu #qsub -l h_rt=36:00:00,gpu=1 -q gpu.q 6 | done 7 | -------------------------------------------------------------------------------- /scripts/run_drug_var.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | LOCAL_HOME=/home/npeng/graphLSTM/release # where the theano_src directory resides 4 | 5 | cd ${LOCAL_HOME} 6 | 7 | DATA_DIR=${LOCAL_HOME}/data # The data directory 8 | PP_DIR=${LOCAL_HOME}/results/Nary_param_and_predictions # The directory for the prediction files 9 | OUT_DIR=${LOCAL_HOME}/results/Nary_results # The log output directory 10 | 11 | THEANO_FLAGS=mode=FAST_RUN,device=$3,floatX=float32,nvcc.flags=-use_fast_math,exception_verbosity=high time python theano_src/lstm_RE.py --setting run_single_corpus --data_dir ${DATA_DIR}/drug_var/ --emb_dir ${DATA_DIR}/glove/glove.6B.100d.txt --total_fold 5 --dev_fold $1 --test_fold $1 --num_entity 2 --circuit $2 --batch_size 8 --lr 0.02 --lstm_type_dim 2 --content_file sentences_2nd --dependent_file graph_arcs --parameters_file ${PP_DIR}/all_drug_var_best_params_$2.cv$1.lr0.02.bt8 --prediction_file ${PP_DIR}/all_drug_var_$2.cv$1.predictions > ${OUT_DIR}/all_drug_var.accuracy.$2.cv$1.lr0.02.bt8.noName 12 | -------------------------------------------------------------------------------- /scripts/run_lstm.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | LOCAL_HOME=/home/npeng/graphLSTM/release # where the theano_src directory resides 4 | 5 | cd ${LOCAL_HOME} 6 | 7 | DATA_DIR=${LOCAL_HOME}/data # The data directory 8 | PP_DIR=${LOCAL_HOME}/results/Nary_param_and_predictions # The directory for the prediction files 9 | OUT_DIR=${LOCAL_HOME}/results/Nary_results # The log output directory 10 | 11 | THEANO_FLAGS=mode=FAST_RUN,device=$3,floatX=float32,nvcc.flags=-use_fast_math,exception_verbosity=high time python theano_src/lstm_RE.py --setting run_single_corpus --data_dir ${DATA_DIR}/drug_gene_var/ --emb_dir ${DATA_DIR}/glove/glove.6B.100d.txt --total_fold 5 --dev_fold $1 --test_fold $1 --num_entity 3 --circuit $2 --batch_size 8 --lr 0.02 --lstm_type_dim 2 --content_file sentences_2nd --dependent_file graph_arcs --parameters_file ${PP_DIR}/all_triple_best_params_$2.cv$1.lr0.02.bt8 --prediction_file ${PP_DIR}/all_triple_$2.cv$1.predictions > ${OUT_DIR}/all_triple.accuracy.$2.cv$1.lr0.02.bt8.noName 12 | -------------------------------------------------------------------------------- /scripts/run_triple_multitask.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | LOCAL_HOME=/home/npeng/graphLSTM/release # where the theano_src directory resides 4 | 5 | cd ${LOCAL_HOME} 6 | 7 | DATA_DIR=${LOCAL_HOME}/data # The data directory 8 | PP_DIR=${LOCAL_HOME}/results/Nary_multitask_param_and_predictions # The
directory for the prediction files 9 | OUT_DIR=${LOCAL_HOME}/results/Nary_multitask_results # The log output directory 10 | 11 | THEANO_FLAGS=mode=FAST_RUN,device=$3,floatX=float32,nvcc.flags=-use_fast_math,exception_verbosity=high time python theano_src/lstm_RE.py --setting run_corpora_multitask --drug_gene_dir ${DATA_DIR}/drug_gene/ --drug_variant_dir ${DATA_DIR}/drug_var/ --drug_gene_variant_dir ${DATA_DIR}/drug_gene_var/ --emb_dir ${DATA_DIR}/glove/glove.6B.100d.txt --total_fold 5 --dev_fold $1 --test_fold $1 --circuit $2 --batch_size 8 --lr 0.02 --lstm_type_dim 2 --content_file sentences_2nd --dependent_file graph_arcs --parameters_file ${PP_DIR}/all_triple_best_params_$2.cv$1.lr0.02.bt8 --drug_gene_prediction_file ${PP_DIR}/drug_gene_$2.cv$1.predictions --drug_var_prediction_file ${PP_DIR}/drug_var_$2.cv$1.predictions --triple_prediction_file ${PP_DIR}/triple_$2.cv$1.predictions --print_prediction True --sample_coef 1.0 > ${OUT_DIR}/all_triple.accuracy.$2.cv$1.lr0.02.bt8 12 | -------------------------------------------------------------------------------- /theano_src/data_process.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | import codecs as cs 5 | import random 6 | import re, sys, os 7 | import numpy 8 | import theano 9 | from collections import defaultdict 10 | from edmonds_mst import * #edmonds import mst 11 | 12 | OOV = '_OOV_' 13 | SEG = 'Segmentation' 14 | feature_thresh = 0 15 | name_len_thresh = 5 16 | 17 | 18 | # Dirty statistics for single-sentence results; copied from lstm_RE.py 19 | def eval_logitReg_accuracy(predictions, goldens): 20 | assert len(predictions) == len(goldens) 21 | correct = 0.0 22 | for p, g in zip(predictions, goldens): 23 | #print 'in eval_logitReg_accuracy,', p, g 24 | if p == g: 25 | correct += 1.0 26 | return correct/len(predictions) 27 | 28 | # Generate annotations that contain the article id, entity names and the number of sentences 29 | def quick_gen_anno_from_json(infile, outfile): 30 | import json 31 | with open(infile) as inf, cs.open(outfile, 'w', encoding='utf-8') as outf: 32 | for line in inf: 33 | content = json.loads(line) #, encoding='utf-8') 34 | for item in content: 35 | local_map = {} 36 | sentences = item['sentences'] 37 | pmid = item['article'] 38 | entities = item['entities'] 39 | relation = item['relationLabel'] 40 | entity_arry = [pmid, str(len(sentences))] 41 | for entity in entities: 42 | if 'indices' not in entity or len(entity['indices']) == 0 : 43 | sys.stderr.write('WARNING: entity mention '+entity['mention'].encode('utf-8')+' does not have index!\n') 44 | entity_arry.append(entity['type']+':'+entity['id']) 45 | outf.write('\t'.join(entity_arry)+'\n') 46 | 47 | 48 | # Sample high-confidence examples for the PubMed-scale extraction 49 | def sample_high_conf_predictions_PubMed(sent_file, pred_file, anno_file, num_samples): 50 | all_instances = load_high_conf_predictions(sent_file, anno_file, pred_file) 51 | for i, ins in enumerate(random.sample(all_instances, int(num_samples))): 52 | print '\n'.join(gen_html_for_ins(ins, i)) 53 | 54 | 55 | def statistics_open_extraction(sent_file, anno_file, pred_file, thresh): 56 | all_instances = load_high_conf_predictions(sent_file, anno_file, pred_file, float(thresh)) 57 | single_sent_set = set() 58 | multi_sent_set = set() 59 | multi_drug_set = set() 60 | multi_gene_set = set() 61 | multi_variant_set = set() 62 | single_drug_set = set() 63 | single_gene_set = set() 64 | single_variant_set = set() 65
| for ins in all_instances: 66 | anno = ins[1].split('\t') 67 | multi_sent_set.add(tuple(anno[2:])) 68 | multi_drug_set.add(anno[2]) 69 | multi_gene_set.add(anno[3]) 70 | if len(anno) > 4: 71 | multi_variant_set.add(anno[4]) 72 | if int(anno[1]) == 1: 73 | single_sent_set.add(tuple(anno[2:])) 74 | single_drug_set.add(anno[2]) 75 | single_gene_set.add(anno[3]) 76 | if len(anno) > 4: 77 | single_variant_set.add(anno[4]) 78 | print 'high confident instances numbers:', thresh, len(single_sent_set), len(multi_sent_set) 79 | print 'single sentence distinct entities:', len(single_drug_set), len(single_gene_set), len(single_variant_set) 80 | print 'multi-sentence distinct entities:', len(multi_drug_set), len(multi_gene_set), len(multi_variant_set) 81 | 82 | # Sample high-confident examples 83 | def sample_high_conf_predictions(sent_dir, pred_dir, num_folds, sent_file_name, anno_file_name, pred_file_prefix, thresh, num_samples): 84 | all_instances = [] 85 | accuracies = [] 86 | for i in range(int(num_folds)): 87 | sentence_file = os.path.join(sent_dir, str(i), sent_file_name) 88 | annotation_file = os.path.join(sent_dir, str(i), anno_file_name) 89 | pred_file = os.path.join(pred_dir, pred_file_prefix+str(i)+'.predictions') 90 | #all_instances.extend(load_high_conf_predictions(sentence_file, annotation_file, pred_file, float(thresh))) 91 | all_instances = load_high_conf_predictions(sentence_file, annotation_file, pred_file, float(thresh)) 92 | #for i, ins in enumerate(random.sample(all_instances, int(num_samples))): 93 | # print '\n'.join(gen_html_for_ins(ins, i)) 94 | pred_array = [] 95 | gold_array = [] 96 | for ins in all_instances: 97 | if ins[1].split('\t')[1] != '1': 98 | continue 99 | if float(ins[0].split('\t')[-1]) > 0.5: 100 | pred_array.append(1) 101 | else: 102 | pred_array.append(0) 103 | if ins[2].split('\t')[-1] == 'None': 104 | gold_array.append(0) 105 | else: 106 | gold_array.append(1) 107 | print len(pred_array), len(gold_array) 108 | accuracies.append(eval_logitReg_accuracy(pred_array, gold_array)) 109 | print numpy.mean(accuracies) 110 | 111 | 112 | def gen_html_for_ins(instance, num): 113 | content_arry = ['
'] 114 | content_arry.append(' ('+str(num)+') '+instance[1]+' ') 115 | content_arry.append(' p = '+instance[0].split('\t')[1]+' ') 116 | content_arry.append(' ') 117 | content_arry.append(' • ') 118 | content_arry.append(instance[2].split('\t')[0]+' • ') 119 | content_arry.append(' ') 120 | content_arry.append('
') 121 | return content_arry 122 | 123 | def load_high_conf_predictions(sentence_file, annotation_file, pred_file, thresh=0.5): 124 | instances = [] 125 | single_sent_anno_set = set() 126 | multi_sent_anno_set = set() 127 | with open(sentence_file) as stf, open(annotation_file) as anf, open(pred_file) as pf: 128 | for sl, al, pl in zip(stf, anf, pf): 129 | anno = al.strip().split('\t') 130 | multi_sent_anno_set.add(tuple(anno[2:])) 131 | if int(anno[1]) == 1: 132 | single_sent_anno_set.add(tuple(anno[2:])) 133 | if float(pl.strip().split('\t')[-1]) > thresh: 134 | instances.append([pl.strip(), al.strip(), sl.strip()]) 135 | #for rel in single_sent_anno_set: 136 | # print rel 137 | print 'total distinct candidates for single sentence:', len(single_sent_anno_set), 'multiple sentences:', len(multi_sent_anno_set) 138 | return instances 139 | 140 | # Generate chain-structure for the tree-LSTM implementation 141 | def quick_chain(sentfile, outdepfile): 142 | with open(sentfile) as sf, open(outdepfile, 'w') as odf: 143 | for line in sf: 144 | content = line.strip().split('\t')[0] 145 | dummy_dep = [(i+1) for i in xrange(len(content.lower().split(' ')))] 146 | dummy_dep[-1] = -1 147 | odf.write(' '.join(map(str,dummy_dep))+'\n') 148 | 149 | 150 | # Sample the pos/neg examples to be similar size as the other. 151 | def quick_sample(sentfile, depfile): 152 | with open(sentfile) as sf, open(depfile) as df, open(sentfile+'.balanced', 'w') as osf, open(depfile+'.balanced', 'w') as odf: 153 | contents = [] 154 | deplabels = [] 155 | line_count = 0 156 | pos = [] 157 | neg = [] 158 | while True: 159 | sent_line = sf.readline() 160 | contents.append(sent_line) 161 | if not sent_line: 162 | assert not df.readline() 163 | break 164 | deplabels.append(df.readline()) 165 | elems = sent_line.strip().split('\t') 166 | if elems[-1] == '+': 167 | pos.append(line_count) 168 | else: 169 | neg.append(line_count) 170 | line_count += 1 171 | size = len(neg) 172 | pos_new = random.sample(pos, size) 173 | for p, n in zip(pos_new, neg): 174 | print p, n 175 | osf.write(contents[p]) 176 | osf.write(contents[n]) 177 | odf.write(deplabels[p]) 178 | odf.write(deplabels[n]) 179 | 180 | 181 | # A quick check on how many dev instances are in the training instances. 182 | def quick_check(train_file, dev_file): 183 | train = load_text(train_file) 184 | dev = load_text(dev_file) 185 | train_set = set(train) 186 | count = 0 187 | print train[0] 188 | print dev[0] 189 | for line in dev: 190 | if line in train_set: 191 | count += 1 192 | print count 193 | 194 | def load_text(filename): 195 | content = [] 196 | with open(filename) as inf: 197 | for line in inf: 198 | #print line.split('\t')[0] 199 | content.append(line.strip().split('\t')[0]) 200 | print len(content) 201 | return content 202 | 203 | # generate path and remove circle 204 | def gen_path(path_dict): 205 | start = path_dict['from'] 206 | end = path_dict['to'] 207 | path_info = path_dict['steps'] 208 | path = [] 209 | pre, prepre = -1, -1 210 | for i, item in enumerate(path_info): 211 | if item['from'] == -1: 212 | assert i == 0 213 | item['from'] = start 214 | if item['to'] == -1: 215 | assert i == len(path_info)-1 216 | item['to'] = end 217 | if len(path) > 0 and path[-1][0] == item['to']: 218 | path.pop() 219 | continue 220 | if len(path) > 0: 221 | assert path[-1][1] == item['from'] 222 | path.append((item['from'], item['to'], item['label'])) 223 | #print 'path!! 
type:', path_dict['name'], start, end, path 224 | return path 225 | 226 | 227 | def gen_graph_from_paths(paths): 228 | path_graph = dict() 229 | node_pair_set = set() 230 | print 'generate graph!!!' 231 | for path in paths: 232 | for node in path: 233 | pair = (node[0], node[1]) 234 | if pair in node_pair_set: 235 | assert (node[1], node[0]) in node_pair_set 236 | continue 237 | node_pair_set.add(pair) 238 | node_pair_set.add((node[1], node[0])) 239 | if node[0] not in path_graph: 240 | path_graph[node[0]] = [] 241 | value = (node[1], node[2]) 242 | if value not in path_graph[node[0]]: 243 | path_graph[node[0]].append(value) 244 | print path_graph 245 | return path_graph 246 | 247 | 248 | def topolgical_sort(graph_unsorted): 249 | """ 250 | Repeatedly go through all of the nodes in the graph, moving each of 251 | the nodes that has all its edges resolved, onto a sequence that 252 | forms our sorted graph. A node has all of its edges resolved and 253 | can be moved once all the nodes its edges point to, have been moved 254 | from the unsorted graph onto the sorted one. 255 | """ 256 | 257 | # This is the list we'll return, that stores each node/edges pair 258 | # in topological order. 259 | graph_sorted = [] 260 | # Convert the unsorted graph into a hash table. This gives us 261 | # constant-time lookup for checking if edges are unresolved, and 262 | # for removing nodes from the unsorted graph. 263 | 264 | # Run until the unsorted graph is empty. 265 | while graph_unsorted: 266 | # Go through each of the node/edges pairs in the unsorted 267 | # graph. If a set of edges doesn't contain any nodes that 268 | # haven't been resolved, that is, that are still in the 269 | # unsorted graph, remove the pair from the unsorted graph, 270 | # and append it to the sorted graph. Note here that by using 271 | # using the items() method for iterating, a copy of the 272 | # unsorted graph is used, allowing us to modify the unsorted 273 | # graph as we move through it. We also keep a flag for 274 | # checking that that graph is acyclic, which is true if any 275 | # nodes are resolved during each pass through the graph. If 276 | # not, we need to bail out as the graph therefore can't be 277 | # sorted. 278 | acyclic = False 279 | sorted_nodes = set() 280 | for node, edges in graph_unsorted.items(): 281 | print 'processing node:', node, edges 282 | for edge, label in edges: 283 | print 'processing edge:', node, edge 284 | if edge in graph_unsorted: 285 | break 286 | else: 287 | acyclic = True 288 | del graph_unsorted[node] 289 | graph_sorted.append((node, edges)) 290 | sorted_nodes.add(node) 291 | print 'graph_sorted:', graph_sorted 292 | if not acyclic: 293 | # Uh oh, we've passed through all the unsorted nodes and 294 | # weren't able to resolve any of them, which means there 295 | # are nodes with cyclic edges that will never be resolved, 296 | # so we bail out with an error. 
297 | raise RuntimeError("A cyclic dependency occurred") 298 | '''print 'A cyclic dependency occurred' 299 | for node, edges in graph_unsorted.items(): 300 | print 'processing node:', node, edges 301 | removed_edge = False 302 | for edge, label in edges: 303 | if edge in sorted_nodes: 304 | graph_unsorted[node].remove((edge, label)) 305 | removed_edge = True 306 | break 307 | if removed_edge: 308 | break 309 | ''' 310 | return graph_sorted 311 | 312 | 313 | def gen_chain_shortest_paths(infile, outfile): 314 | import json 315 | with open(infile) as inf, cs.open(outfile, 'w', encoding='utf-8') as outf: 316 | ignored_item = 0 317 | for line in inf: 318 | content = json.loads(line) #, encoding='utf-8') 319 | for item in content: 320 | local_map = {} 321 | sentences = item['sentences'] 322 | entities = item['entities'] 323 | relation = item['relationLabel'] 324 | paths = item['paths'] 325 | origin_text = [] 326 | dep_arcs = [] 327 | pre_root = -1 328 | #print 'a new instance!!' 329 | # Get the original text 330 | for sentence in sentences: 331 | #print 'sentence root:', root 332 | for node in sentence['nodes']: 333 | origin_text.append(re.sub('[ \t]', '', node['label'].strip())) 334 | indices = [] 335 | # Substitute the Entities to special symbols 336 | # Double check the entity indices at the same time. 337 | for entity in entities: 338 | if 'indices' not in entity or len(entity['indices']) == 0 : 339 | sys.stderr.write('WARNING: entity mention '+entity['mention'].encode('utf-8')+' does not have index!\n') 340 | indices.append([0]) 341 | continue 342 | indices.append(entity['indices']) 343 | start = entity['indices'][0] 344 | end = entity['indices'][-1] + 1 345 | try: 346 | assert entity['mention'] in ' '.join(origin_text[start:end]).strip() 347 | except: 348 | sys.stderr.write('WARNING: entity mention does not match! '+entity['mention'].encode('utf-8')+' v.s. '+' '.join(origin_text[start:end]).encode('utf-8') +'\n') 349 | sys.stderr.write('===== Instance information: PMID: '+str(item['article']) + ', Sentences:') 350 | for sentence in sentences: 351 | sys.stderr.write(' Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+',') 352 | sys.stderr.write('\n') 353 | sys.stderr.write('Original Text: '+' '.join(origin_text).encode('utf-8')+'\n') 354 | sys.stderr.write('Original entity indices: ' + str(entity['indices']).encode('utf-8') + '\t Converted indices: '+ str(range(start, end)) + '\n') 355 | for eidx in range(start, end): 356 | origin_text[eidx] = '' 357 | # Get the paths, construct the directed graph and run a topological sort. 358 | '''path_array = [] 359 | pre_name = '' 360 | for path in paths: 361 | path_name = path['name'] 362 | if path_name.startswith('drug_'): 363 | if path_name == pre_name and path['to'] != path_array[-1][-1][1] 364 | path_array.append(gen_path(path)) 365 | path_graph = gen_graph_from_paths(path_array) 366 | print path_graph 367 | print topolgical_sort(path_graph) 368 | ''' 369 | if len(paths) == 0: 370 | print 'no path!!!' 
371 | ignored_item += 1 372 | continue 373 | path = gen_path(paths[0]) 374 | clean_path = [item[0] for item in path] 375 | clean_path.append(path[-1][1]) 376 | assert len(indices) == 2 377 | #print indices[0], clean_path[0] 378 | #print indices[-1], clean_path[-1] 379 | assert clean_path[0] in indices[0] 380 | assert clean_path[-1] in indices[-1] 381 | # augment the first entity 382 | for idx in indices[0]: 383 | if idx not in clean_path: 384 | clean_path.insert(0, idx) 385 | # augment the last entity 386 | for idx in indices[-1]: 387 | if idx not in clean_path: 388 | clean_path.append(idx) 389 | outf.write(re.sub('\t', ' ', ' '.join(map(lambda x: origin_text[x], clean_path)))+'\t'+'\t'.join([' '.join(map(str, map(lambda x: clean_path.index(x),idx))) for idx in indices])+'\t'+relation+'\n') 390 | #grf.write(' '.join(dep_arcs)+'\n') 391 | print 'Ignored', ignored_item, 'items!' 392 | 393 | def filter_sentence_json(infile, outfile): 394 | import json 395 | with open(infile) as inf, cs.open(outfile, 'w', encoding='utf-8') as outf: 396 | total_ins = 0 397 | single_sent = 0 398 | filtered_array = [] 399 | for line in inf: 400 | content = json.loads(line) #, encoding='utf-8') 401 | for item in content: 402 | sentences = item['sentences'] 403 | total_ins += 1 404 | if len(sentences) == 1: 405 | single_sent += 1 406 | filtered_array.append(item) 407 | outf.write(unicode(json.dumps(filtered_array, ensure_ascii=False))) 408 | print 'total instances = ', total_ins, 'single sentence instances = ', single_sent 409 | 410 | def gen_graph_from_json(infile, outfile, graphfile): 411 | import json 412 | with open(infile) as inf, cs.open(outfile, 'w', encoding='utf-8') as outf, cs.open(graphfile, 'w', encoding='utf-8') as grf: 413 | for line in inf: 414 | content = json.loads(line) #, encoding='utf-8') 415 | for item in content: 416 | local_map = {} 417 | sentences = item['sentences'] 418 | entities = item['entities'] 419 | relation = item['relationLabel'] 420 | origin_text = [] 421 | dep_arcs = [] 422 | pre_root = -1 423 | #print 'a new instance!!' 424 | for sentence in sentences: 425 | root = sentence['root'] 426 | #print 'sentence root:', root 427 | for node in sentence['nodes']: 428 | origin_text.append(re.sub('[ \t]', '', node['label'].strip())) 429 | arc_list = node['arcs'] 430 | node_idx = node['index']#-prev_sent_length 431 | arcs = [] 432 | for arc in arc_list: 433 | try: 434 | assert arc['toIndex'] != node['index'] 435 | arcs.append(re.sub(' ', '_', arc['label'])+'::'+str(arc['toIndex'])) 436 | #arc_map[arc['toIndex']-prev_sent_length] = 1 437 | except: 438 | sys.stderr.write('arc point to self!!!! Node:'+str(node['index'])+','+node['label']+'\t'+str(arc_list)+'. Printing out the \n') 439 | sys.stderr.write('===== Instance information: PMID: '+str(item['article']) + 440 | ', Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+', paraSent:'+str(sentence['paragraphSentence'])+'\n') 441 | for node in sentence['nodes']: 442 | sys.stderr.write(node['label']+'\t'+str(node['arcs'])+'\n') 443 | #if temp_to not in arc_map: 444 | # arc_map[temp_to] = 100 445 | dep_arcs.append(',,,'.join(arcs)) 446 | try: 447 | assert root > -1 448 | except: 449 | root = 0 450 | sys.stderr.write('Sentence NO ROOT! 
===== Instance information: PMID: '+str(item['article']) + 451 | ', Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+'\n') 452 | #print 'root deps:', dep_arcs[root] 453 | if pre_root != -1: 454 | dep_arcs[pre_root] += ',,,nextsent:next::'+str(root) 455 | dep_arcs[root] += ',,,prevsent:prev::'+str(pre_root) 456 | pre_root = root 457 | assert len(dep_arcs) == len(origin_text) #len(text) 458 | indices = [] 459 | for entity in entities: 460 | if 'indices' not in entity or len(entity['indices']) == 0 : 461 | sys.stderr.write('WARNING: entity mention '+entity['mention'].encode('utf-8')+' does not have index!\n') 462 | indices.append([0]) 463 | continue 464 | indices.append(entity['indices']) 465 | start = entity['indices'][0] 466 | end = entity['indices'][-1] + 1 467 | try: 468 | assert entity['mention'] in ' '.join(origin_text[start:end]).strip() 469 | except: 470 | sys.stderr.write('WARNING: entity mention does not match! '+entity['mention'].encode('utf-8')+' v.s. '+' '.join(origin_text[start:end]).encode('utf-8') +'\n') 471 | sys.stderr.write('===== Instance information: PMID: '+str(item['article']) + ', Sentences:') 472 | for sentence in sentences: 473 | sys.stderr.write(' Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+',') 474 | sys.stderr.write('\n') 475 | sys.stderr.write('Original Text: '+' '.join(origin_text).encode('utf-8')+'\n') 476 | sys.stderr.write('Original entity indices: ' + str(entity['indices']).encode('utf-8') + '\t Converted indices: '+ str(range(start, end)) + '\n') 477 | for eidx in range(start, end): 478 | # Note: change this line to get or remove the entity name. 479 | origin_text[eidx] += '' 480 | outf.write(re.sub('\t', ' ', ' '.join(origin_text))+'\t'+'\t'.join([' '.join(map(str,idx)) for idx in indices])+'\t'+relation+'\n') 481 | grf.write(' '.join(dep_arcs)+'\n') 482 | 483 | # helper function for revert sentence such that bi-directional LSTM can capture semantic order. 484 | def reverse_sent(indices, dep_arcs): 485 | sent_len = len(dep_arcs) 486 | new_indices = [[sent_len-1-i for i in idx] for idx in indices] 487 | new_deps = [',,,'.join(map(lambda x: '::'.join([str(sent_len-1-int(x[0])), x[1]]), [arc.split('::') for arc in item.split(',,,')])) for item in dep_arcs] 488 | return new_indices, new_deps 489 | 490 | 491 | def gen_MST_from_json(infile, outfile, depfile): 492 | import json 493 | with open(infile) as inf, cs.open(outfile, 'w', encoding='utf-8') as outf, cs.open(depfile, 'w', encoding='utf-8') as depf: 494 | for line in inf: 495 | content = json.loads(line) #, encoding='utf-8') 496 | #print len(content) 497 | count = 0 498 | for item in content: 499 | local_map = {} 500 | sentences = item['sentences'] 501 | entities = item['entities'] 502 | relation = item['relationLabel'] 503 | origin_text = [] 504 | dep_arcs = [] 505 | #prev_sent_length = 0 506 | pre_root = -1 507 | for sentence in sentences: 508 | dep_graph = [] #dict() 509 | root = sentence['root'] 510 | for node in sentence['nodes']: 511 | origin_text.append(node['label']) 512 | arc_list = node['arcs'] 513 | node_idx = node['index']#-prev_sent_length 514 | for arc in arc_list: 515 | if arc['label'].startswith('deparc'): 516 | try: 517 | assert arc['toIndex'] != node['index'] 518 | dep_graph.append(Arc(arc['toIndex'], 1, node_idx)) 519 | #arc_map[arc['toIndex']-prev_sent_length] = 1 520 | except: 521 | sys.stderr.write('dependency arc point to self!!!! Node:'+str(node['index'])+','+node['label']+'\t'+str(arc_list)+'. 
Printing out the \n') 522 | sys.stderr.write('===== Instance information: PMID: '+str(item['article']) + 523 | ', Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+', paraSent:'+str(sentence['paragraphSentence'])+'\n') 524 | for node in sentence['nodes']: 525 | sys.stderr.write(node['label']+'\t'+str(node['arcs'])+'\n') 526 | elif arc['label'].startswith('adjtok'): 527 | temp_to = arc['toIndex'] 528 | dep_graph.append(Arc(temp_to, 100, node_idx)) 529 | try: 530 | assert root > -1 531 | except: 532 | root = 0 533 | sys.stderr.write('Sentence NO ROOT! ===== Instance information: PMID: '+str(item['article']) + 534 | ', Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+'\n') 535 | tree = min_spanning_arborescence(dep_graph, root) #mst(root-prev_sent_length, dep_graph) 536 | temp_dep_tree = [0] * (len(tree)+1) 537 | dep_arcs.extend(temp_dep_tree) 538 | for k in tree: 539 | dep_arcs[k] = tree[k].head 540 | assert dep_arcs[root] == 0 #prev_sent_length 541 | dep_arcs[root] = -1 542 | #prev_sent_length += len(dep_graph) 543 | if pre_root != -1: 544 | dep_arcs[pre_root] = root 545 | pre_root = root 546 | assert len(dep_arcs) == len(origin_text) #len(text) 547 | indices = [] 548 | for entity in entities: 549 | if 'indices' not in entity or len(entity['indices']) == 0 : 550 | sys.stderr.write('WARNING: entity mention '+entity['mention'].encode('utf-8')+' does not have index!\n') 551 | indices.append([0]) 552 | continue 553 | indices.append(entity['indices']) 554 | start = entity['indices'][0] 555 | end = entity['indices'][-1] + 1 556 | try: 557 | assert entity['mention'] in ' '.join(origin_text[start:end]).strip() 558 | except: 559 | sys.stderr.write('WARNING: entity mention does not match! '+entity['mention'].encode('utf-8')+' v.s. '+' '.join(origin_text[start:end]).encode('utf-8') +'\n') 560 | sys.stderr.write('===== Instance information: PMID: '+str(item['article']) + ', Sentences:') 561 | for sentence in sentences: 562 | sys.stderr.write(' Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+',') 563 | sys.stderr.write('\n') 564 | sys.stderr.write('Original Text: '+' '.join(origin_text).encode('utf-8')+'\n') 565 | sys.stderr.write('Original entity indices: ' + str(entity['indices']).encode('utf-8') + '\t Converted indices: '+ str(range(start, end)) + '\n') 566 | for eidx in range(start, end): 567 | origin_text[eidx] = '' 568 | outf.write(re.sub('\t', ' ', ' '.join(origin_text))+'\t'+'\t'.join([' '.join(map(str,idx)) for idx in indices])+'\t'+relation+'\n') 569 | depf.write(' '.join(map(str, [a if a not in local_map else local_map[a] for a in dep_arcs ]))+'\n') 570 | count += 1 571 | 572 | 573 | ''' Stale version of tree construction using heurestics. 
Changing to MST version''' 574 | def gen_data_from_json(infile, outfile, depfile): 575 | import json 576 | with open(infile) as inf, cs.open(outfile, 'w', encoding='utf-8') as outf, cs.open(depfile, 'w', encoding='utf-8') as depf: 577 | count = 1 578 | for line in inf: 579 | sys.stderr.write('processing line '+str(count) + '\n') 580 | content = json.loads(line) #, encoding='utf-8') 581 | #print len(content) 582 | icount = 0 583 | for item in content: 584 | local_map = {} 585 | missing_count = 0 586 | sentences = item['sentences'] 587 | entities = item['entities'] 588 | relation = item['relationLabel'] 589 | text = [] 590 | dep_arcs = [] 591 | origin_text = [] 592 | origin_dep = [] 593 | for sentence in sentences: 594 | for node in sentence['nodes']: 595 | origin_text.append(node['label']) 596 | origin_dep.append(node['arcs']) 597 | arc_list = node['arcs'] 598 | for arc in arc_list: 599 | if arc['label'].startswith('depinv'): 600 | dep_arcs.append(arc['toIndex']) 601 | text.append(node['label']) 602 | break 603 | try: 604 | assert (arc['label'].startswith('depinv') or node['index'] == sentence['root']) 605 | except: 606 | #print arc_list 607 | missing_count += 1 608 | if node['index'] < len(sentence['nodes'])-1: 609 | dep_arcs.append(node['index']+1) 610 | else: 611 | dep_arcs.append(node['index']-1) 612 | #print node['index'], sentence['root'] 613 | if node['index'] == sentence['root']: 614 | if arc['label'].startswith('depinv'): 615 | dep_arcs[-1] = -1 616 | else: 617 | dep_arcs.append(-1) 618 | text.append(node['label']) 619 | local_map[node['index']] = node['index'] - missing_count 620 | assert len(dep_arcs) == len(origin_text) #len(text) 621 | indices = [] 622 | for entity in entities: 623 | if 'indices' not in entity or len(entity['indices']) == 0 : 624 | sys.stderr.write('WARNING: entity mention '+entity['mention'].encode('utf-8')+' does not have index!\n') 625 | indices.append([0]) 626 | continue 627 | indices.append(entity['indices']) 628 | start = entity['indices'][0] 629 | end = entity['indices'][-1] + 1 630 | try: 631 | assert entity['mention'] in ' '.join(origin_text[start:end]).strip() 632 | #assert entity['mention'] in ' '.join(text[start:end]).strip() 633 | except: 634 | sys.stderr.write('WARNING: entity mention does not match! '+entity['mention'].encode('utf-8')+' v.s. 
'+' '.join(origin_text[start:end]).encode('utf-8') +'\n') 635 | sys.stderr.write('===== Instance information: PMID: '+str(item['article']) + ', Sentences:') 636 | for sentence in sentences: 637 | sys.stderr.write(' Paragraph '+str(sentence['paragraph'])+', sentence '+str(sentence['sentence'])+',') 638 | sys.stderr.write('\n') 639 | sys.stderr.write('Collapsed Text: '+' '.join(text).encode('utf-8')+'\n') 640 | sys.stderr.write('Original Text: '+' '.join(origin_text).encode('utf-8')+'\n') 641 | sys.stderr.write('Original entity indices: ' + str(entity['indices']).encode('utf-8') + '\t Converted indices: '+ str(range(start, end)) + '\n') 642 | sys.stderr.write('The original sentence with dependency arcs:\n') 643 | for t, a in zip(origin_text, origin_dep): 644 | sys.stderr.write(t.encode('utf-8')+'\t'+str(a).encode('utf-8')+'\n') 645 | outf.write(re.sub('\t', ' ', ' '.join(origin_text))+'\t'+'\t'.join([' '.join(map(str,idx)) for idx in indices])+'\t'+relation+'\n') 646 | depf.write(' '.join(map(str, [a if a not in local_map else local_map[a] for a in dep_arcs ]))+'\n') 647 | icount += 1 648 | sys.stderr.write('instances in line '+str(count)+': ' +str(icount) + '\n') 649 | count += 1 650 | 651 | 652 | def quick_split(infile, fold): 653 | with open(infile) as inf: 654 | content = [] 655 | for line in inf: 656 | content.append(line) 657 | beam = len(content) / fold 658 | random.shuffle(content) 659 | for i in range(fold): 660 | with open(infile+'_split_'+str(i)+'.train', 'w') as trainf, open(infile+'_split_'+str(i)+'.test', 'w') as testf: 661 | for line in content[0:i*beam]: 662 | trainf.write(line) 663 | for line in content[i*beam: (i+1)*beam]: 664 | testf.write(line) 665 | for line in content[(i+1)*beam:]: 666 | trainf.write(line) 667 | 668 | 669 | def prepare_data(seqs, eidxs, mask=None, maxlen=None): 670 | """Create the matrices from the datasets. 671 | 672 | This pad each sequence to the same lenght: the lenght of the 673 | longuest sequence or maxlen. 674 | 675 | if maxlen is set, we will cut all sequence to this maximum 676 | lenght. 677 | 678 | This swap the axis! 679 | """ 680 | # x: a list of sentences 681 | lengths = [len(s) for s in seqs] 682 | 683 | # This part: suspeciously wrong. 684 | if maxlen is not None: 685 | new_seqs = [] 686 | new_lengths = [] 687 | for l, s in zip(lengths, seqs): 688 | if l < maxlen: 689 | new_seqs.append(s) 690 | new_lengths.append(l) 691 | else: 692 | new_seqs.append(s[:maxlen]) 693 | new_lengths.append(maxlen) 694 | lengths = new_lengths 695 | seqs = new_seqs 696 | 697 | if len(lengths) < 1: 698 | return None, None, None 699 | 700 | n_samples = len(seqs) 701 | maxlen = numpy.max(lengths) 702 | assert seqs[0].ndim == 2 703 | x = numpy.zeros((maxlen, n_samples, seqs[0].shape[1])).astype('int32') 704 | if mask is not None: 705 | x_mask = numpy.zeros((maxlen, n_samples, maxlen, mask[0].shape[-1])).astype(theano.config.floatX) 706 | else: 707 | x_mask = numpy.zeros((maxlen, n_samples)).astype(theano.config.floatX) 708 | num_entities = len(eidxs[0]) 709 | np_eidxs = [numpy.zeros((maxlen, n_samples)).astype(theano.config.floatX) for i in range(num_entities)] 710 | for idx, s in enumerate(seqs): 711 | x[:lengths[idx], idx, :] = s 712 | if mask is not None: 713 | x_mask[:lengths[idx], idx, :lengths[idx], :] = mask[idx][:lengths[idx], :lengths[idx], :] 714 | else: 715 | x_mask[:lengths[idx], idx] = 1. 716 | for i in range(num_entities): 717 | if numpy.all(numpy.array(eidxs[idx][i]) < maxlen ): 718 | np_eidxs[i][eidxs[idx][i], idx] = 1. 
719 | else: 720 | np_eidxs[i][maxlen-1, idx] = 1. 721 | return x, x_mask, np_eidxs 722 | 723 | 724 | def check_entity(x, idx): 725 | for i, (ins, ii) in enumerate(zip(x, idx)): 726 | sent_len = len(ins) 727 | for item in ii: 728 | for li in item: 729 | assert li < sent_len 730 | try: 731 | assert ins[li].startswith(' 0 or i == 0) 776 | except: 777 | #print i, elem 778 | pass 779 | dep_graph.append(local_dep) 780 | return dep_graph 781 | 782 | # prepare input for the type-multiply strategy (each type has its own parameters) 783 | def gen_child_mask_from_dep(dependency, num_arc_type): 784 | sent_len = len(dependency) 785 | child_exist = numpy.zeros([sent_len, sent_len, num_arc_type]).astype(theano.config.floatX) 786 | for ii, elem in enumerate(dependency): 787 | if ii != 0: 788 | child_exist[ii, ii-1, 0] = 1 789 | # Here I add the below line to support bi-direction 790 | if ii != sent_len - 1: 791 | child_exist[ii, ii+1, 0] = 1 792 | for jj, el in enumerate(elem): 793 | child_exist[ii, el[0], el[1]] = 1 794 | return child_exist 795 | 796 | # prepare input for the type-add strategy (dependency concatenate with type, and add them together in hidden-to-hidden transformation) 797 | def gen_child_mask_from_dep_add(dependency, num_arc_type): 798 | sent_len = len(dependency) 799 | child_exist = numpy.zeros([sent_len, sent_len, 2]).astype(theano.config.floatX) 800 | for ii, elem in enumerate(dependency): 801 | if ii != 0: 802 | child_exist[ii, ii-1, 0] = 1 803 | child_exist[ii, ii-1, 1] = 0#1 804 | # Here I add the below line to support bi-direction 805 | if ii != sent_len - 1: 806 | child_exist[ii, ii+1, 0] = 1 807 | child_exist[ii, ii+1, 1] = 0#1 808 | for jj, el in enumerate(elem): 809 | child_exist[ii, el[0], 0] = 1 810 | child_exist[ii, el[0], 1] = el[1]#+1 811 | assert numpy.all(child_exist[:,:,0] <= 1.) 
812 | assert numpy.all(child_exist[:,:,1] < num_arc_type) 813 | return child_exist 814 | 815 | 816 | def collect_data(corpus_x, corpus_y, corpus_idx, corpus_dep, x, y, idx, dependencies, words2idx, arc_type_dict, dep, add): 817 | corpus_x.extend( [[words2idx.get(w, 0) for w in sent] for sent in x] ) 818 | corpus_y.extend(y) 819 | corpus_idx.extend(idx) 820 | if dep: 821 | if add: 822 | child_exists = [gen_child_mask_from_dep_add(dependency, len(arc_type_dict)) for dependency in dependencies] 823 | else: 824 | child_exists = [gen_child_mask_from_dep(dependency, len(arc_type_dict)) for dependency in dependencies] 825 | for i, (a,b,dep) in enumerate(zip(x, child_exists, dependencies)): 826 | assert len(a) == len(b) 827 | corpus_dep.extend(child_exists) 828 | assert len(corpus_idx) == len(corpus_dep) 829 | assert len(corpus_x) == len(corpus_y) 830 | assert len(corpus_y) == len(corpus_idx) 831 | 832 | 833 | def load_data_cv(data_dir, folds, dev_fold, test_fold=None, arc_type_dict=dict(), num_entities=2, dep=False, content_fname='sentences', dep_fname='graph_arcs', add=True): 834 | corpus = [] 835 | for i in range(folds): 836 | sub_corpus = read_file(os.path.join(data_dir, str(i), content_fname), num_entities) 837 | if dep: 838 | dependencies, arc_type_dict = read_graph_dependencies(os.path.join(data_dir, str(i), dep_fname), arc_type_dict, add) 839 | corpus.append((sub_corpus, dependencies)) 840 | else: 841 | corpus.append((sub_corpus, None)) 842 | 843 | # get word dict 844 | words = [w for sub_corpus, _ in corpus for sent in sub_corpus[0] for w in sent ] 845 | words2idx = {OOV: 0} 846 | for w in words: 847 | if w not in words2idx: 848 | words2idx[w] = len(words2idx) 849 | print 'voc_size:', len(words2idx) 850 | if dep: 851 | print 'arc_type_dict:', len(arc_type_dict), arc_type_dict 852 | train_set_x, train_set_y, train_set_idx = [], [], [] 853 | valid_set_x, valid_set_y, valid_set_idx = [], [], [] 854 | test_set_x, test_set_y, test_set_idx = [], [], [] 855 | 856 | if dep: 857 | train_set_dep = [] 858 | valid_set_dep = [] 859 | test_set_dep = [] 860 | print 'load dependencies as well!!!' 
861 | else: 862 | train_set_dep, valid_set_dep, test_set_dep = None, None, None 863 | for i, (sub_corpus, dependencies) in enumerate(corpus): 864 | x, y, idx = sub_corpus 865 | print 'check entity of subcorpus', i 866 | check_entity(x, idx) 867 | if i == dev_fold: 868 | collect_data(valid_set_x, valid_set_y, valid_set_idx, valid_set_dep, x, y, idx, dependencies, words2idx, arc_type_dict, dep, add) 869 | if test_fold is not None and i == test_fold: 870 | collect_data(test_set_x, test_set_y, test_set_idx, test_set_dep, x, y, idx, dependencies, words2idx, arc_type_dict, dep, add) 871 | elif i != dev_fold: 872 | collect_data(train_set_x, train_set_y, train_set_idx, train_set_dep, x, y, idx, dependencies, words2idx, arc_type_dict, dep, add) 873 | print 'after word to index, sizes:', len(train_set_x), len(valid_set_x), len(test_set_x) if test_fold is not None else 0 874 | print 'arc_type_dict:', len(arc_type_dict) 875 | train = [train_set_x, train_set_y, train_set_idx] 876 | valid = [valid_set_x, valid_set_y, valid_set_idx] 877 | if test_fold is not None: 878 | test = [test_set_x, test_set_y, test_set_idx] 879 | if dep: 880 | train.append(train_set_dep) 881 | valid.append(valid_set_dep) 882 | if test_fold is not None: 883 | test.append(test_set_dep) 884 | else: 885 | train.append(None) 886 | valid.append(None) 887 | if test_fold is not None: 888 | test.append(None) 889 | labels2idx = {'+':1, '-':0} 890 | dics = {'words2idx': words2idx, 'labels2idx': labels2idx} 891 | if dep: 892 | dics['arcs2idx'] = arc_type_dict 893 | if test_fold is not None: 894 | return [train, valid, test, dics] 895 | return [train, valid, dics] 896 | 897 | 898 | 899 | def read_file(filename, num_entities=2, labeled=True): 900 | corpus_x = [] 901 | corpus_y = [] 902 | corpus_idx = [] 903 | # urlStr = 'http[s]?://(?:[a-zA-Z]|[1-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+' 904 | with cs.open(filename, 'r', encoding='utf-8') as inf: 905 | line_count = 0 906 | for line in inf: 907 | line_count += 1 908 | line = line.strip() 909 | if len(line) == 0: 910 | continue 911 | #sentence, entity_ids_1, entity_ids_2, label = line.split('\t') 912 | elems = line.split('\t') 913 | entity_id_arry = [] 914 | for ett in elems[1:1+num_entities]: 915 | entity_id = map(int, ett.split(' ')) 916 | entity_id_arry.append(entity_id) 917 | assert len(entity_id_arry) == num_entities 918 | assert len(elems) == num_entities + 2 919 | x = elems[0].lower().split(' ') 920 | label = elems[-1] 921 | try: 922 | for i in range(num_entities): 923 | assert entity_id_arry[i][-1] < len(x) 924 | except: 925 | sys.stderr.write('abnormal entity ids:'+str(entity_id_arry)+', sentence length:'+str(len(x))+'\n') 926 | continue 927 | #sentence = stringQ2B(sentence) 928 | if len(x) < 1: 929 | print x 930 | continue 931 | if label == 'None': 932 | y = 0 933 | else: 934 | y = 1 935 | corpus_x.append(x) 936 | corpus_y.append(y) 937 | corpus_idx.append(entity_id_arry) 938 | print 'read file', filename, len(corpus_x), len(corpus_y), len(corpus_idx) 939 | return corpus_x, corpus_y, corpus_idx 940 | 941 | 942 | 943 | def load_data(train_path=None, valid_path=None, test_path=None, num_entities=2, dep=False, train_dep=None, valid_dep=None, test_dep=None, add=True): 944 | print 'loading training data from', train_path, 'loading valid data from', valid_path, 'loading test data from', test_path 945 | corpus = [] 946 | arc_type_dict = dict() 947 | # load training data 948 | #train_corpus = read_file(os.path.join(train_path, content_fname+'.train'), num_entities) 949 | 
train_corpus = read_file(train_path, num_entities) 950 | if dep: 951 | assert train_dep is not None 952 | dependencies, arc_type_dict = read_graph_dependencies(train_dep, arc_type_dict, add) 953 | corpus.append((train_corpus, dependencies)) 954 | else: 955 | corpus.append((train_corpus, None)) 956 | # load dev data 957 | dev_corpus = read_file(valid_path, num_entities) 958 | if dep: 959 | assert valid_dep is not None 960 | dependencies, arc_type_dict = read_graph_dependencies(valid_dep, arc_type_dict, add) 961 | corpus.append((dev_corpus, dependencies)) 962 | else: 963 | corpus.append((dev_corpus, None)) 964 | # get word dict 965 | words = [w for sub_corpus, _ in corpus for sent in sub_corpus[0] for w in sent ] 966 | # Special treatment for the final PubMed experiments 967 | #words = [w for sent in train_corpus[0] for w in sent ] 968 | words2idx = {OOV: 0} 969 | for w in words: 970 | if w not in words2idx: 971 | words2idx[w] = len(words2idx) 972 | print 'voc_size:', len(words2idx) 973 | if dep: 974 | print 'arc_type_dict:', len(arc_type_dict), arc_type_dict 975 | train_set_x, train_set_y, train_set_idx = [], [], [] 976 | valid_set_x, valid_set_y, valid_set_idx = [], [], [] 977 | test_set_x, test_set_y, test_set_idx = [], [], [] 978 | 979 | if dep: 980 | train_set_dep = [] 981 | valid_set_dep = [] 982 | test_set_dep = [] 983 | print 'load dependencies as well!!!' 984 | else: 985 | train_set_dep, valid_set_dep, test_set_dep = None, None, None 986 | #train_set_dep, valid_set_dep = None, None 987 | train_corpus, dev_corpus = corpus 988 | corp, dependencies = train_corpus 989 | x, y, idx = corp 990 | print 'check entity of training data' 991 | check_entity(x, idx) 992 | collect_data(train_set_x, train_set_y, train_set_idx, train_set_dep, x, y, idx, dependencies, words2idx, arc_type_dict, dep, add) 993 | corp, dependencies = dev_corpus 994 | x, y, idx = corp 995 | print 'check entity of dev data' 996 | check_entity(x, idx) 997 | collect_data(valid_set_x, valid_set_y, valid_set_idx, valid_set_dep, x, y, idx, dependencies, words2idx, arc_type_dict, dep, add) 998 | train = [train_set_x, train_set_y, train_set_idx] 999 | valid = [valid_set_x, valid_set_y, valid_set_idx] 1000 | if test_path is not None: 1001 | test = [test_set_x, test_set_y, test_set_idx] 1002 | if dep: 1003 | train.append(train_set_dep) 1004 | valid.append(valid_set_dep) 1005 | if test_path is not None: 1006 | test.append(test_set_dep) 1007 | else: 1008 | train.append(None) 1009 | valid.append(None) 1010 | if test_path is not None: 1011 | test.append(None) 1012 | labels2idx = {'+':1, '-':0} 1013 | dics = {'words2idx': words2idx, 'labels2idx': labels2idx} 1014 | if dep: 1015 | dics['arcs2idx'] = arc_type_dict 1016 | if test_path is not None: 1017 | return [train, valid, test, dics] 1018 | return [train, valid, valid, dics] 1019 | 1020 | 1021 | if __name__ == '__main__': 1022 | #eval(sys.argv[1])(sys.argv[2], sys.argv[3]) 1023 | #quick_sample(sys.argv[1], sys.argv[2]) 1024 | #quick_check(sys.argv[1], sys.argv[2]) 1025 | #gen_MST_from_json(sys.argv[1], sys.argv[2], sys.argv[3]) 1026 | eval(sys.argv[1])(*sys.argv[2:]) 1027 | #quick_split(sys.argv[1], 5) 1028 | exit(0) 1029 | train, valid, test, dics = load_data(train_path=sys.argv[1], valid_path=sys.argv[2], test_path=sys.argv[3]) 1030 | idx2word = dict((k, v) for v, k in dics['words2idx'].iteritems()) 1031 | for k,v in idx2word.iteritems(): 1032 | print k,v 1033 | -------------------------------------------------------------------------------- /theano_src/edmonds_mst.py: 
-------------------------------------------------------------------------------- 1 | from collections import defaultdict, namedtuple 2 | 3 | 4 | Arc = namedtuple('Arc', ('tail', 'weight', 'head')) 5 | 6 | 7 | def min_spanning_arborescence(arcs, sink): 8 | good_arcs = [] 9 | quotient_map = {arc.tail: arc.tail for arc in arcs} 10 | quotient_map[sink] = sink 11 | while True: 12 | min_arc_by_tail_rep = {} 13 | successor_rep = {} 14 | for arc in arcs: 15 | if arc.tail == sink: 16 | continue 17 | tail_rep = quotient_map[arc.tail] 18 | head_rep = quotient_map[arc.head] 19 | if tail_rep == head_rep: 20 | continue 21 | if tail_rep not in min_arc_by_tail_rep or min_arc_by_tail_rep[tail_rep].weight > arc.weight: 22 | min_arc_by_tail_rep[tail_rep] = arc 23 | successor_rep[tail_rep] = head_rep 24 | cycle_reps = find_cycle(successor_rep, sink) 25 | if cycle_reps is None: 26 | good_arcs.extend(min_arc_by_tail_rep.values()) 27 | return spanning_arborescence(good_arcs, sink) 28 | good_arcs.extend(min_arc_by_tail_rep[cycle_rep] for cycle_rep in cycle_reps) 29 | cycle_rep_set = set(cycle_reps) 30 | cycle_rep = cycle_rep_set.pop() 31 | quotient_map = {node: cycle_rep if node_rep in cycle_rep_set else node_rep for node, node_rep in quotient_map.items()} 32 | 33 | 34 | def find_cycle(successor, sink): 35 | visited = {sink} 36 | for node in successor: 37 | cycle = [] 38 | while node not in visited: 39 | visited.add(node) 40 | cycle.append(node) 41 | node = successor[node] 42 | if node in cycle: 43 | return cycle[cycle.index(node):] 44 | return None 45 | 46 | 47 | def spanning_arborescence(arcs, sink): 48 | arcs_by_head = defaultdict(list) 49 | for arc in arcs: 50 | if arc.tail == sink: 51 | continue 52 | arcs_by_head[arc.head].append(arc) 53 | solution_arc_by_tail = {} 54 | stack = arcs_by_head[sink] 55 | while stack: 56 | arc = stack.pop() 57 | if arc.tail in solution_arc_by_tail: 58 | continue 59 | solution_arc_by_tail[arc.tail] = arc 60 | stack.extend(arcs_by_head[arc.tail]) 61 | return solution_arc_by_tail 62 | 63 | 64 | def quick_parse(infile): 65 | with open(infile) as inf: 66 | line = inf.readline() 67 | import ast 68 | line_dict = ast.literal_eval(line) 69 | graph = [] 70 | for k,v in line_dict.iteritems(): 71 | for to, weight in v.iteritems(): 72 | graph.append(Arc(to, weight, k)) 73 | print min_spanning_arborescence(graph, 3) 74 | 75 | if __name__ == '__main__': 76 | import sys 77 | quick_parse(sys.argv[1]) 78 | #print(min_spanning_arborescence([Arc(0, 17, 0), Arc(2, 16, 0), Arc(3, 19, 0), Arc(4, 16, 0), Arc(5, 16, 0), Arc(6, 18, 0), Arc(2, 3, 1), Arc(3, 3, 1), Arc(4, 11, 1), Arc(5, 10, 1), Arc(6, 12, 1), Arc(1, 3, 2), Arc(3, 4, 2), Arc(4, 8, 2), Arc(5, 8, 2), Arc(6, 11, 2), Arc(1, 3, 3), Arc(2, 4, 3), Arc(4, 12, 3), Arc(5, 11, 3), Arc(6, 14, 3), Arc(1, 11, 4), Arc(2, 8, 4), Arc(3, 12, 4), Arc(5, 6, 4), Arc(6, 10, 4), Arc(1, 10, 5), Arc(2, 8, 5), Arc(3, 11, 5), Arc(4, 6, 5), Arc(6, 4, 5), Arc(1, 12, 6), Arc(2, 11, 6), Arc(3, 14, 6), Arc(4, 10, 6), Arc(5, 4, 6)], 0)) 79 | -------------------------------------------------------------------------------- /theano_src/lstm_RE.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | 3 | import argparse, os, random, subprocess, sys, time 4 | import theano, numpy 5 | import theano.tensor as T 6 | from copy import copy 7 | from neural_lib import StackConfig, ArrayInit 8 | from neural_architectures import CNNRelation, LSTMRelation, LSTMRelation_multitask, GraphLSTMRelation, WeightedGraphLSTMRelation, 
WeightedAddGraphLSTMRelation, WeightedGraphLSTMRelation_multitask, WeightedAddGraphLSTMRelation_multitask 9 | from train_util import dict_from_argparse, shuffle, create_relation_circuit, convert_id_to_word, add_arg, add_arg_to_L, conv_data_graph, create_multitask_relation_circuit 10 | from train_util import sgd, adadelta, rmsprop, read_matrix_from_gzip, read_matrix_from_file, read_matrix_and_idmap_from_file, batch_run_func, get_minibatches_idx, save_parameters, load_params 11 | from data_process import load_data, load_data_cv, prepare_data 12 | from theano.tensor.nnet import sigmoid 13 | from theano.tensor import tanh 14 | 15 | 16 | ''' Convert the entity index: from indexing to dense-vector(but many zero entries) multiplication.''' 17 | def conv_idxs(idxs, length): 18 | new_idxs = [numpy.zeros(length).astype(theano.config.floatX) for i in range(len(idxs))] 19 | for i, idx in enumerate(idxs): 20 | new_idxs[i][idx] = 1.0 21 | return new_idxs 22 | 23 | ''' For prediction, both batch version and non-batch version''' 24 | def predict(_args, lex_test, idxs_test, f_classify, groundtruth_test, batchsize=1, graph=False, dep=None, weighted=False, print_prediction=False, prediction_file=None): 25 | ''' On the test set predict the labels using f_classify. 26 | Compare those labels against groundtruth. 27 | 28 | It returns a dictionary 'results' that contains 29 | f1 : F1 or Accuracy 30 | p : Precision 31 | r : Recall 32 | ''' 33 | predictions_test = [] 34 | if print_prediction: 35 | assert prediction_file is not None 36 | pred_file = open(prediction_file, 'w') 37 | if batchsize > 1: 38 | nb_idxs = get_minibatches_idx(len(lex_test), batchsize, shuffle=False) 39 | for i, tr_idxs in enumerate(nb_idxs): 40 | words = [lex_test[ii] for ii in tr_idxs] 41 | eidxs = [idxs_test[ii] for ii in tr_idxs] 42 | #labels = [groundtruth_test[ii] for ii in tr_idxs] 43 | orig_eidxs = eidxs 44 | if graph: 45 | assert dep is not None 46 | masks = [dep[ii] for ii in tr_idxs] 47 | else: 48 | masks = None 49 | x, masks, eidxs = prepare_data(words, eidxs, masks, maxlen=200) 50 | if weighted or not graph: 51 | pred_all = f_classify(x, masks, *eidxs) 52 | predictions_test.extend(list(numpy.argmax(pred_all, axis=1))) #[0])) 53 | else: 54 | pred_all = f_classify(x, masks.sum(axis=-1), *eidxs) 55 | predictions_test.extend(list(numpy.argmax(pred_all, axis=1))) 56 | if print_prediction: 57 | for idx, p in zip(tr_idxs, pred_all): 58 | pred_file.write(str(idx) + '\t' + str(p[1]) + '\n') 59 | else: 60 | for i, (word, idxs) in enumerate(zip(lex_test, idxs_test)): 61 | idxs = conv_idxs(idxs, len(word)) 62 | if graph: 63 | assert dep is not None 64 | if weighted: 65 | predictions_test.append(f_classify(word, dep[i], *idxs)) #.sum(axis=-1) 66 | else: 67 | predictions_test.append(f_classify(word, dep[i].sum(axis=-1), *idxs)) #.sum(axis=-1) 68 | else: 69 | predictions_test.append(f_classify(word, *idxs)) 70 | print 'in predict,', len(predictions_test), len(groundtruth_test) 71 | if print_prediction: 72 | pred_file.close() 73 | #results = eval_logitReg_F1(predictions_test, groundtruth_test) 74 | results = eval_logitReg_accuracy(predictions_test, groundtruth_test) 75 | return results, predictions_test 76 | 77 | def eval_logitReg_accuracy(predictions, goldens): 78 | assert len(predictions) == len(goldens) 79 | correct = 0.0 80 | for p, g in zip(predictions, goldens): 81 | if p == g: 82 | correct += 1.0 83 | return correct/len(predictions) 84 | 85 | def eval_logitReg_F1(predictions, goldens): 86 | assert len(predictions) == len(goldens) 87 | tp, 
fp, fn = 0.0, 0.0, 0.0 88 | for p, g in zip(predictions, goldens): 89 | if p == 1: 90 | if g == 1: 91 | tp += 1.0 92 | else: 93 | fp += 1.0 94 | else: 95 | if g == 1: 96 | fn += 1.0 97 | prec = tp / (tp + fp) if (tp != 0 or fp != 0) else 0 98 | recall = tp / (tp + fn) if (tp != 0 or fn != 0) else 0 99 | #print 'precision:', prec, 'recall:', recall 100 | F1 = 2*prec*recall / (prec + recall) if (prec != 0 or recall != 0) else 0 101 | return prec, recall, F1 102 | 103 | ''' For training on single-task setting, both batch and non-batch version''' 104 | def train_single(train_lex, train_idxs, train_y, _args, f_cost, f_update, epoch_id, learning_rate, nsentences, batchsize=1, dep=None, weighted=False): 105 | ''' This function is called from the main method. and it is primarily responsible for updating the 106 | parameters. Because of the way that create_relation_circuit works that creates f_cost, f_update etc. this function 107 | needs to be flexible and can't be put in a lib. 108 | Look at lstm_dependency_parsing_simplification.py for more pointers. 109 | ''' 110 | # None-batched version 111 | def train_instance(words, idxs, label, learning_rate, f_cost, f_update): 112 | ' Since function is called only for side effects, it is likely useless anywhere else' 113 | if words.shape[0] < 2: 114 | return 0.0 115 | inputs = idxs + [words, label] 116 | iter_cost = f_cost(*inputs) #words, id1, id2, labels) 117 | f_update(learning_rate) 118 | return iter_cost 119 | 120 | # Mini-batch version 121 | def train_batch(words, masks, idxs, label, learning_rate, f_cost, f_update): 122 | if words.shape[0] < 2: 123 | return 0.0 124 | inputs = idxs + [words, masks, label] 125 | iter_cost = f_cost(*inputs) #words, id1, id2, labels) 126 | f_update(learning_rate) 127 | return iter_cost 128 | 129 | ## main body of train 130 | if dep: 131 | shuffle([train_lex, train_idxs, train_y, dep], _args.seed) 132 | else: 133 | shuffle([train_lex, train_idxs, train_y], _args.seed) 134 | if nsentences < len(train_lex): 135 | train_lex = train_lex[:nsentences] 136 | train_idxs = train_idxs[:nsentences] 137 | train_y = train_y[:nsentences] 138 | tic = time.time() 139 | aggregate_cost = 0.0 140 | temp_cost_arr = [0.0] * 2 141 | 142 | # make the judge on whether use mini-batch or not. 143 | # No mini-batch 144 | if batchsize == 1: 145 | for i, (words, idxs, label) in enumerate(zip(train_lex, train_idxs, train_y)): 146 | if len(words) < 2: 147 | continue 148 | #assert len(words) == len(labels) #+ 2 149 | idxs = conv_idxs(idxs, len(words)) 150 | if _args.graph: 151 | assert dep is not None 152 | if weighted: 153 | aggregate_cost += train_batch(words, dep[i], idxs, label, learning_rate, f_cost, f_update) 154 | else: 155 | aggregate_cost += train_batch(words, dep[i].sum(axis=-1), idxs, label, learning_rate, f_cost, f_update) 156 | else: 157 | aggregate_cost += train_instance(words, idxs, label, learning_rate, f_cost, f_update) 158 | if _args.verbose == 2 and i % 10 == 0: 159 | print '[learning] epoch %i >> %2.2f%%' % (epoch_id, (i + 1) * 100. / nsentences), 160 | print 'completed in %.2f (sec). 
<< avg loss: %.2f <<\r' % (time.time() - tic, aggregate_cost/(i+1)), 161 | sys.stdout.flush() 162 | # Mini-batch 163 | else: 164 | nb_idxs = get_minibatches_idx(len(train_lex), batchsize, shuffle=False) 165 | nbatches = len(nb_idxs) 166 | for i, tr_idxs in enumerate(nb_idxs): 167 | words = [train_lex[ii] for ii in tr_idxs] 168 | eidxs = [train_idxs[ii] for ii in tr_idxs] 169 | labels = [train_y[ii] for ii in tr_idxs] 170 | orig_eidxs = eidxs 171 | if _args.graph: 172 | assert dep is not None 173 | masks = [dep[ii] for ii in tr_idxs] 174 | else: 175 | masks = None 176 | x, masks, eidxs = prepare_data(words, eidxs, masks, maxlen=200) 177 | 178 | #print 'mask shape:', masks.shape 179 | if weighted or dep is None: 180 | iter_cost = train_batch(x, masks, eidxs, labels, learning_rate, f_cost, f_update) 181 | aggregate_cost += iter_cost#[0] 182 | else: 183 | aggregate_cost += train_batch(x, masks.sum(axis=-1), eidxs, labels, learning_rate, f_cost, f_update) 184 | if _args.verbose == 2 : 185 | print '[learning] epoch %i >> %2.2f%%' % (epoch_id, (i + 1) * 100. / nbatches), 186 | print 'completed in %.2f (sec). << avg loss: %.2f <<\r' % (time.time() - tic, aggregate_cost/(i+1)), 187 | #print 'completed in %.2f (sec). << avg loss: %.2f <<%%' % (time.time() - tic, aggregate_cost/(i+1)), 188 | #print 'average cost for each part: (%.2f, %.2f) <<\r' %(temp_cost_arr[0]/(i+1), temp_cost_arr[1]/(i+1)), 189 | sys.stdout.flush() 190 | if _args.verbose == 2: 191 | print '\n>> Epoch completed in %.2f (sec) <<' % (time.time() - tic), 'training cost: %.2f' % (aggregate_cost) 192 | 193 | 194 | ''' For training on multi-task setting, both batch and non-batch version''' 195 | def train_alternative(_args, f_costs_and_updates, epoch_id, learning_rate_arr, nsentences_arr, words_arr, label_arr, idx_arr, dep_mask_arr, batch_size): 196 | num_tasks = len(f_costs_and_updates) 197 | print 'num_tasks:', num_tasks 198 | for i in range(num_tasks): 199 | f_cost, f_update = f_costs_and_updates[i] 200 | nsent = nsentences_arr[i] 201 | if nsent < len(words_arr[i]): 202 | if epoch_id == 0: 203 | if dep_mask_arr[0] is not None: 204 | shuffle([words_arr[i], idx_arr[i], label_arr[i], dep_mask_arr[i]], _args.seed) 205 | else: 206 | shuffle([words_arr[i], idx_arr[i], label_arr[i]], _args.seed) 207 | if _args.graph: 208 | train_single(words_arr[i][epoch_id*nsent:(epoch_id+1)*nsent], idx_arr[i][epoch_id*nsent:(epoch_id+1)*nsent], label_arr[i][epoch_id*nsent:(epoch_id+1)*nsent], _args, f_cost, f_update, epoch_id, learning_rate_arr[i], nsentences_arr[i], batch_size, dep_mask_arr[i][epoch_id*nsent:(epoch_id+1)*nsent], _args.weighted) 209 | else: 210 | train_single(words_arr[i][epoch_id*nsent:(epoch_id+1)*nsent], idx_arr[i][epoch_id*nsent:(epoch_id+1)*nsent], label_arr[i][epoch_id*nsent:(epoch_id+1)*nsent], _args, f_cost, f_update, epoch_id, learning_rate_arr[i], nsentences_arr[i], batch_size, None, _args.weighted) 211 | else: 212 | if _args.graph: 213 | train_single(words_arr[i], idx_arr[i], label_arr[i], _args, f_cost, f_update, epoch_id, learning_rate_arr[i], nsentences_arr[i], batch_size, dep_mask_arr[i], _args.weighted) 214 | else: 215 | train_single(words_arr[i], idx_arr[i], label_arr[i], _args, f_cost, f_update, epoch_id, learning_rate_arr[i], nsentences_arr[i], batch_size, None, _args.weighted) 216 | 217 | ''' Initialize some parameters for training and prediction''' 218 | def prepare_corpus(_args): 219 | numpy.random.seed(_args.seed) 220 | random.seed(_args.seed) 221 | if _args.win_r or _args.win_l: 222 | _args.wemb1_win = _args.win_r 
- _args.win_l + 1 223 | word2idx = _args.dicts['words2idx'] 224 | _args.label2idx = _args.dicts['labels2idx'] 225 | #_args.idx2word = dict((k, v) for v, k in word2idx.iteritems()) 226 | _args.nsentences = len(_args.train_set[1]) 227 | _args.voc_size = len(word2idx) 228 | _args.y_dim = len(_args.label2idx) 229 | if 'arcs2idx' in _args.dicts: 230 | _args.lstm_arc_types = len(_args.dicts['arcs2idx']) 231 | print 'lstm arc types =', _args.lstm_arc_types 232 | #del _args.dicts 233 | #_args.groundtruth_valid = convert_id_to_word(_args.valid_set[-1], _args.idx2label) 234 | #_args.groundtruth_test = convert_id_to_word(_args.test_set[-1], _args.idx2label) 235 | _args.logistic_regression_out_dim = len(_args.label2idx) 236 | eval_args(_args) 237 | _args.lstm_go_backwards = True #False 238 | try: 239 | print 'Circuit:', _args.circuit.__name__ 240 | print 'Chkpt1', len(_args.label2idx), _args.nsentences, _args.train_set[1][0], _args.voc_size, _args.train_set[2][0], _args.valid_set[1][0], _args.valid_set[2][0] 241 | print 'Chkpt2', _args.wemb1_T_initializer.matrix.shape, _args.wemb1_T_initializer.matrix[0] 242 | except AttributeError: 243 | pass 244 | 245 | ''' Compile the architecture.''' 246 | def compile_circuit(_args): 247 | ### build circuits. ### 248 | (_args.f_cost, _args.f_update, _args.f_classify, cargs) = create_relation_circuit(_args, StackConfig) 249 | _args.train_func = train_single 250 | print "Finished Compiling" 251 | return cargs 252 | 253 | def convert_args(_args, prefix): 254 | from types import StringType 255 | for a in _args.__dict__: #TOPO_PARAM + TRAIN_PARAM: 256 | try: 257 | if type(a) is StringType and a.startswith(prefix): 258 | _args.__dict__[a[len(prefix)+1:]] = _args.__dict__[a] 259 | del _args.__dict__[a] 260 | except: 261 | pass 262 | 263 | def eval_args(_args): 264 | for a in _args.__dict__: #TOPO_PARAM + TRAIN_PARAM: 265 | try: 266 | _args.__dict__[a] = eval(_args.__dict__[a]) 267 | except: 268 | pass 269 | 270 | def run_wild_prediction(_args): 271 | best_f1 = -numpy.inf 272 | param = dict(clr = _args.lr, ce = 0, be = 0, epoch_id = -1) 273 | cargs = compile_circuit(_args) 274 | while param['epoch_id']+1 < _args.nepochs: 275 | param['epoch_id'] += 1 276 | run_training(_args, param) 277 | train_lex, train_y, train_idxs, train_dep = _args.train_set 278 | valid_lex, valid_y, valid_idxs, valid_dep = _args.valid_set 279 | res_train, _ = predict(_args, train_lex, train_idxs, _args.f_classify, train_y, _args.batch_size, _args.graph, train_dep, _args.weighted) 280 | res_valid, _ = predict(_args, valid_lex, valid_idxs, _args.f_classify, valid_y, _args.batch_size, _args.graph, valid_dep, _args.weighted, _args.print_prediction, _args.prediction_file) 281 | if _args.verbose: 282 | print('TEST: epoch', param['epoch_id'], 283 | 'train performances' , res_train, 284 | 'valid performances' , res_valid) 285 | print('Training accuracy', res_train, 286 | ) 287 | 288 | 289 | def run_training(_args, param): 290 | train_lex, train_y, train_idxs, train_dep = _args.train_set 291 | _args.train_func(train_lex, train_idxs, train_y, _args, _args.f_cost, _args.f_update, param['epoch_id'], param['clr'], _args.nsentences, _args.batch_size, train_dep, _args.weighted) 292 | 293 | 294 | def run_epochs(_args, test_data=True): 295 | best_f1 = -numpy.inf 296 | param = dict(clr = _args.lr, ce = 0, be = 0, epoch_id = -1) 297 | cargs = compile_circuit(_args) 298 | while param['epoch_id']+1 < _args.nepochs: 299 | param['epoch_id'] += 1 300 | run_training(_args, param) 301 | train_lex, train_y, train_idxs, 
train_dep = _args.train_set 302 | valid_lex, valid_y, valid_idxs, valid_dep = _args.valid_set 303 | if test_data: 304 | test_lex, test_y, test_idxs, test_dep = _args.test_set 305 | res_train, _ = predict(_args, train_lex, train_idxs, _args.f_classify, train_y, _args.batch_size, _args.graph, train_dep, _args.weighted) 306 | res_valid, _ = predict(_args, valid_lex, valid_idxs, _args.f_classify, valid_y, _args.batch_size, _args.graph, valid_dep, _args.weighted) 307 | if _args.verbose: 308 | print('TEST: epoch', param['epoch_id'], 309 | 'train performances' , res_train, 310 | 'valid performances' , res_valid) 311 | # If this update created a 'new best' model then save it. 312 | if type(res_valid) is tuple: 313 | curr_f1 = res_valid[-1] 314 | else: 315 | curr_f1 = res_valid 316 | if curr_f1 > best_f1: 317 | best_f1 = curr_f1 318 | param['be'] = param['epoch_id'] 319 | param['last_decay'] = param['be'] 320 | param['vf1'] = res_valid 321 | param['best_classifier'] = _args.f_classify 322 | if test_data: 323 | res_test, _ = predict(_args, test_lex, test_idxs, _args.f_classify, test_y, _args.batch_size, _args.graph, test_dep, _args.weighted, _args.print_prediction, _args.prediction_file) 324 | # get the prediction, convert and write to concrete. 325 | param['tf1'] = res_test 326 | print '\nEpoch:%d'%param['be'], 'Test accuracy:', res_test, '\n' 327 | ############## Test load parameters!!!! ######## 328 | #cargs = {} 329 | #print "loading parameters!" 330 | #load_params(_args.parameters_file, cargs) 331 | #f_classify = cargs['f_classify'] 332 | #res_test, _ = predict(_args, test_lex, test_idxs, f_classify, test_y, _args.batch_size, _args.graph, test_dep, _args.weighted) 333 | #print 'Load parameter test accuracy:', res_test, '\n' 334 | ############## End Test ############## 335 | if _args.decay and (param['epoch_id'] - param['last_decay']) >= _args.decay_epochs: 336 | print 'learning rate decay at epoch', param['epoch_id'], '! Previous best epoch number:', param['be'] 337 | param['last_decay'] = param['epoch_id'] 338 | param['clr'] *= 0.5 339 | # If learning rate goes down to minimum then break. 340 | if param['clr'] < _args.minimum_lr: 341 | print "\nLearning rate became too small, breaking out of training" 342 | break 343 | print('BEST RESULT: epoch', param['be'], 344 | 'valid accuracy', param['vf1'], 345 | ) 346 | if test_data: 347 | print('best test accuracy', param['tf1'], 348 | ) 349 | 350 | 351 | def run_multi_task(_args, cargs, num_domains, num_tasks, mode='alternative', test_data=False): 352 | param = dict(epoch_id = -1) 353 | for i in range(num_domains*num_tasks): 354 | param['vf1'+str(i)] = -numpy.inf 355 | while param['epoch_id']+1 < _args.nepochs: 356 | param['epoch_id'] += 1 357 | # Four sets of training in _args.train, each contains feature, lex and label, so the resulted train_data grouped feature, lex and labels together. 
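# For illustration (hypothetical per-task names; each _args.trainSet[i] is assumed to hold
# [lex_i, y_i, idxs_i, dep_i], matching the unpacking further below), zip(*) transposes the
# per-task datasets into per-field lists:
#   _args.trainSet = [[lex_0, y_0, idxs_0, dep_0], [lex_1, y_1, idxs_1, dep_1], ...]
#   train_data     = [[lex_0, lex_1, ...], [y_0, y_1, ...], [idxs_0, idxs_1, ...], [dep_0, dep_1, ...]]
# so train_data[0..3] line up with the words_arr, label_arr, idx_arr and dep_mask_arr
# arguments of train_alternative.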
358 | train_data = [list(group) for group in zip(*_args.trainSet)]
359 | if mode == 'alternative':
360 | train_alternative(_args, _args.f_costs_and_updates, param['epoch_id'], _args.lr_arr, _args.nsentences_arr, train_data[0], train_data[1], train_data[2], train_data[3], _args.batch_size)
361 | else:
362 | raise NotImplementedError
363 | for i, dev in enumerate(_args.devSet):
364 | train_lex, train_y, train_idxs, train_dep = _args.trainSet[i]
365 | valid_lex, valid_y, valid_idxs, valid_dep = dev
366 | if i == 0:
367 | train_lex, train_y, train_idxs = train_lex[:_args.nsentences_arr[i]], train_y[:_args.nsentences_arr[i]], train_idxs[:_args.nsentences_arr[i]]
368 | sample_idx = random.sample(range(len(valid_y)), 3000)
369 | valid_lex = [valid_lex[idx] for idx in sample_idx]
370 | valid_idxs = [valid_idxs[idx] for idx in sample_idx]
371 | valid_y = [valid_y[idx] for idx in sample_idx]
372 | if _args.graph:
373 | train_dep = train_dep[:_args.nsentences_arr[i]]
374 | valid_dep = [valid_dep[idx] for idx in sample_idx]
375 | res_train, _ = predict(_args, train_lex, train_idxs, _args.f_classifies[i], train_y, _args.batch_size, _args.graph, train_dep, _args.weighted)
376 | res_valid, _ = predict(_args, valid_lex, valid_idxs, _args.f_classifies[i], valid_y, _args.batch_size, _args.graph, valid_dep, _args.weighted, _args.print_prediction, _args.prediction_files[i])
377 | if _args.verbose:
378 | print('TEST: epoch', param['epoch_id'], 'task:', i,
379 | 'train F1' , res_train,
380 | 'valid F1' , res_valid)
381 | if res_valid > param['vf1'+str(i)]:
382 | param['be'+str(i)] = param['epoch_id']
383 | param['last_decay'+str(i)] = param['be'+str(i)]
384 | param['vf1'+str(i)] = res_valid
385 | #cargs['f_classify_'+str(i)] = _args.f_classifies[i]
386 | #save_parameters(_args.parameters_file, cargs)
387 | if test_data:
388 | if i == 0:
389 | tws, tls, tis, tds = valid_lex, valid_y, valid_idxs, valid_dep
390 | else:
391 | tws, tls, tis, tds = _args.testSet[i]
392 | pred_res, _ = predict(_args, tws, tis, _args.f_classifies[i], tls, _args.batch_size, _args.graph, tds, _args.weighted)
393 | # also store test performance here.
394 | param['tf1'+str(i)] = pred_res
395 | param['best_classifier'+str(i)] = _args.f_classifies[i]
396 | if _args.verbose:
397 | print('test F1' , pred_res)
398 | if _args.decay and (param['epoch_id'] - param['last_decay'+str(i)]) >= _args.decay_epochs:
399 | print 'learning rate decay at epoch', param['epoch_id'], ', for dataset', i, '! Previous best epoch number:', param['be'+str(i)], 'current learning rates:', _args.lr_arr
400 | param['last_decay'+str(i)] = param['epoch_id']
401 | _args.lr_arr[i] *= 0.5
402 | # If learning rate goes down to minimum then break.
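# Rough sketch of the schedule, assuming the defaults from create_arg_parser below
# (decay_epochs=5, minimum_lr=1e-5, a per-task rate such as dgv_lr=0.005): once a task has gone
# decay_epochs epochs since its last best epoch (or last decay), its rate is halved,
# 0.005 -> 0.0025 -> 0.00125 -> ...; after roughly nine halvings it drops below minimum_lr and
# the check below breaks out of training.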
403 | if _args.lr_arr[i] < _args.minimum_lr:
404 | print "\nLearning rate became too small, breaking out of training"
405 | break
406 | for i in range(num_domains*num_tasks):
407 | print 'DATASET', i, 'BEST RESULT: epoch', param['be'+str(i)], 'valid F1', param['vf1'+str(i)],
408 | if test_data:
409 | print 'best test F1', param['tf1'+str(i)]
410 |
411 |
412 | def prepare_params_shareLSTM(_args):
413 | numpy.random.seed(_args.seed)
414 | random.seed(_args.seed)
415 | # Do not load data if we want to debug topology
416 | _args.voc_size = len(_args.global_word_map)
417 | #_args.idx2word = dict((k, v) for v, k in _args.global_word_map.iteritems())
418 | print 'sentence size array:', _args.nsentences_arr
419 | #_args.nsentences = min(nsentences_arr)
420 | _args.wemb1_win = _args.win_r - _args.win_l + 1
421 | _args.lstm_go_backwards = False
422 | #del _args.global_word_map
423 | eval_args(_args)
424 | print 'Chkpt1: datasets label sizes:',
425 | print [len(label_dict) for label_dict in _args.idx2label_dicts],
426 | print 'training datasets size:',
427 | print [len(ds[0]) for ds in _args.trainSet],
428 | print 'vocabulary size:', _args.voc_size
429 |
430 |
431 | def combine_word_dicts(dict1, dict2):
432 | print 'the sizes of the two dictionaries are:', len(dict1), len(dict2)
433 | combine_dict = dict1.copy()
434 | for k, v in dict2.items():
435 | if k not in combine_dict:
436 | combine_dict[k] = len(combine_dict)
437 | print 'the size of the combined dictionary is:', len(combine_dict)
438 | return combine_dict
439 |
440 |
441 | def load_all_data_multitask(_args):
442 | # load 3 corpora
443 | _args.loaddata = load_data_cv
444 | dataSets = []
445 | dataset_map = dict()
446 | lr_arr = []
447 | _args.num_entity_d0 = 2
448 | arc_type_dict = dict()
449 | _args.prediction_files = [_args.drug_gene_prediction_file, _args.drug_var_prediction_file, _args.triple_prediction_file]
450 | dataSets.append(_args.loaddata(_args.drug_gene_dir, _args.total_fold, _args.dev_fold, _args.test_fold, arc_type_dict, _args.num_entity_d0, dep=_args.graph, content_fname=_args.content_file, dep_fname=_args.dependent_file, add=_args.add))
451 | dataset_map['drug_gene'] = len(dataset_map)
452 | lr_arr.append(_args.dg_lr)
453 | _args.num_entity_d1 = 2
454 | dataSets.append(_args.loaddata(_args.drug_variant_dir, _args.total_fold, _args.dev_fold, _args.test_fold, arc_type_dict, _args.num_entity_d1, dep=_args.graph, content_fname=_args.content_file, dep_fname=_args.dependent_file, add=_args.add))
455 | dataset_map['drug_variant'] = len(dataset_map)
456 | lr_arr.append(_args.dv_lr)
457 | _args.num_entity_d2 = 3
458 | dataSets.append(_args.loaddata(_args.drug_gene_variant_dir, _args.total_fold, _args.dev_fold, _args.test_fold, arc_type_dict, _args.num_entity_d2, dep=_args.graph, content_fname=_args.content_file, dep_fname=_args.dependent_file, add=_args.add))
459 | dataset_map['drug_gene_variant'] = len(dataset_map)
460 | lr_arr.append(_args.dgv_lr)
461 | # load embedding
462 | _args.global_word_map = dict()
463 | for ds in dataSets:
464 | _args.global_word_map = combine_word_dicts(_args.global_word_map, ds[-1]['words2idx'])
465 | if _args.emb_dir != 'RANDOM':
466 | print 'started loading embeddings from file', _args.emb_dir
467 | M_emb, _ = read_matrix_from_file(_args.emb_dir, _args.global_word_map)
468 | print 'global map size:', len(M_emb), len(_args.global_word_map)
469 | ## load pretrained embeddings
470 | _args.emb_matrix = theano.shared(M_emb, name='emb_matrix')
471 | _args.emb_dim = len(M_emb[0])
472 | _args.wemb1_out_dim =
_args.emb_dim 473 | if _args.fine_tuning : 474 | print 'fine tuning!!!!!' 475 | _args.emb_matrix.is_regularizable = True 476 | print 'loading data dataset map:', dataset_map 477 | return dataSets, lr_arr, dataset_map 478 | 479 | ### convert the old word idx to the new one ### 480 | def convert_word_idx(corpus_word, idx2word_old, word2idx_new): 481 | if type(corpus_word[0]) is int: 482 | return [word2idx_new[idx2word_old[idx]] for idx in corpus_word] 483 | else: 484 | return [convert_word_idx(line, idx2word_old, word2idx_new) for line in corpus_word] 485 | 486 | 487 | def data_prep_shareLSTM(_args): 488 | _args.rng = numpy.random.RandomState(_args.seed) 489 | dataSets, _args.lr_arr, dataset_map = load_all_data_multitask(_args) 490 | if 'arcs2idx' in dataSets[0][-1]: 491 | _args.lstm_arc_types = len(dataSets[0][-1]['arcs2idx']) 492 | print 'lstm arc types =', _args.lstm_arc_types 493 | ## re-map words in the news cws dataset 494 | idx2word_dicts = [dict((k, v) for v, k in ds[-1]['words2idx'].iteritems()) for ds in dataSets] 495 | _args.idx2label_dicts = [dict((k, v) for v, k in ds[-1]['labels2idx'].iteritems()) for ds in dataSets] 496 | for i, ds in enumerate(dataSets): 497 | # ds structure: train_set, valid_set, test_set, dicts 498 | print len(ds[0]), len(ds[0][0][-1]), ds[0][1][-1], ds[0][2][-1] 499 | print len(ds[1]), len(ds[1][0][-1]), ds[1][1][-1], ds[1][2][-1] 500 | print len(ds[2]), len(ds[2][0][-1]), ds[2][1][-1], ds[2][2][-1] 501 | ds[0][0], ds[1][0], ds[2][0] = batch_run_func((ds[0][0], ds[1][0], ds[2][0]), convert_word_idx, idx2word_dicts[i], _args.global_word_map) 502 | ## convert word, feature and label for array to numpy array 503 | ds[0], ds[1], ds[2] = batch_run_func((ds[0], ds[1], ds[2]), conv_data_graph, _args.win_l, _args.win_r) 504 | check_input(ds[0][:3], len(_args.global_word_map)) 505 | check_input(ds[1][:3], len(_args.global_word_map)) 506 | '''Probably want to move part of the below code to the common part.''' 507 | _args.trainSet = [ds[0] for ds in dataSets] 508 | _args.devSet = [ds[1] for ds in dataSets] 509 | _args.testSet = [ds[2] for ds in dataSets] 510 | _args.nsentences_arr = [len(ds[0]) for ds in _args.trainSet] 511 | if _args.sample_coef != 0: 512 | _args.nsentences_arr[0] = int(_args.sample_coef * _args.nsentences_arr[-1]) 513 | 514 | 515 | def check_input(dataset, voc_size): 516 | for i, (x, y, idx) in enumerate(zip(*dataset)): 517 | sent_len = len(x) 518 | for ii in idx: 519 | try: 520 | assert numpy.all(numpy.array(ii) < sent_len) and numpy.all(ii > -1) 521 | except: 522 | print 'abnormal index:', ii, 'at instance', i, 'sentence length:', sent_len 523 | try: 524 | assert (y == 0 or y == 1) 525 | except: 526 | print 'abnormal label:', y 527 | try: 528 | assert numpy.all(numpy.array(x) < voc_size) 529 | except: 530 | print 'abnormal input:', x 531 | 532 | def run_wild_test(_args): 533 | _args.rng = numpy.random.RandomState(_args.seed) 534 | _args.loaddata = load_data 535 | if 'Graph' in _args.circuit: 536 | _args.graph = True 537 | if 'Add' in _args.circuit: 538 | _args.add = True 539 | if 'Weighted' in _args.circuit: 540 | _args.weighted = True 541 | _args.train_set, _args.valid_set, _args.test_set, _args.dicts = _args.loaddata(_args.train_path, _args.valid_path, num_entities=_args.num_entity, dep=_args.graph, train_dep=_args.train_graph, valid_dep=_args.valid_graph, add=_args.add) 542 | # convert the data from array to numpy arrays 543 | _args.train_set, _args.valid_set, _args.test_set = batch_run_func((_args.train_set, _args.valid_set, _args.test_set), 
conv_data_graph, _args.win_l, _args.win_r) 544 | print 'word dict size:', len(_args.dicts['words2idx']) 545 | print 'checking training data!' 546 | check_input(_args.train_set[:3], len(_args.dicts['words2idx'])) 547 | print 'checking test data!' 548 | check_input(_args.valid_set[:3], len(_args.dicts['words2idx'])) 549 | print 'finish check inputs!!!' 550 | word2idx = _args.dicts['words2idx'] 551 | prepare_corpus(_args) 552 | if _args.emb_dir != 'RANDOM': 553 | print 'started loading embeddings from file', _args.emb_dir 554 | M_emb, _ = read_matrix_from_file(_args.emb_dir, word2idx) 555 | #M_emb, _ = read_matrix_from_gzip(_args.emb_dir, word2idx) 556 | print 'global map size:', len(M_emb) #, count, 'of them are initialized from glove' 557 | emb_var = theano.shared(M_emb, name='emb_matrix') 558 | _args.emb_matrix = emb_var 559 | _args.emb_dim = len(M_emb[0]) 560 | _args.wemb1_out_dim = _args.emb_dim 561 | if _args.fine_tuning : 562 | print 'fine tuning!!!!!' 563 | _args.emb_matrix.is_regularizable = True 564 | run_wild_prediction(_args) 565 | 566 | 567 | def run_single_corpus(_args): 568 | _args.rng = numpy.random.RandomState(_args.seed) 569 | _args.loaddata = load_data_cv 570 | if 'Graph' in _args.circuit: 571 | _args.graph = True 572 | if 'Add' in _args.circuit: 573 | _args.add = True 574 | if 'Weighted' in _args.circuit: 575 | _args.weighted = True 576 | # For the GENIA experiment 577 | #_args.train_set, _args.valid_set, _args.test_set, _args.dicts = _args.loaddata(_args.data_dir, _args.data_dir, num_entities=_args.num_entity, dep=_args.graph, content_fname=_args.content_file, dep_fname=_args.dependent_file, add=_args.add) 578 | # For the n-ary experiments 579 | _args.train_set, _args.valid_set, _args.test_set, _args.dicts = _args.loaddata(_args.data_dir, _args.total_fold, _args.dev_fold, _args.test_fold, num_entities=_args.num_entity, dep=_args.graph, content_fname=_args.content_file, dep_fname=_args.dependent_file, add=_args.add) 580 | # convert the data from array to numpy arrays 581 | _args.train_set, _args.valid_set, _args.test_set = batch_run_func((_args.train_set, _args.valid_set, _args.test_set), conv_data_graph, _args.win_l, _args.win_r) 582 | print 'word dict size:', len(_args.dicts['words2idx']) 583 | print 'checking training data!' 584 | check_input(_args.train_set[:3], len(_args.dicts['words2idx'])) 585 | print 'checking test data!' 586 | check_input(_args.valid_set[:3], len(_args.dicts['words2idx'])) 587 | print 'finish check inputs!!!' 588 | word2idx = _args.dicts['words2idx'] 589 | prepare_corpus(_args) 590 | #for k, v in word2idx.iteritems(): 591 | # print k, v 592 | if _args.emb_dir != 'RANDOM': 593 | print 'started loading embeddings from file', _args.emb_dir 594 | M_emb, _ = read_matrix_from_file(_args.emb_dir, word2idx) 595 | #M_emb, _ = read_matrix_from_gzip(_args.emb_dir, word2idx) 596 | print 'global map size:', len(M_emb) #, count, 'of them are initialized from glove' 597 | emb_var = theano.shared(M_emb, name='emb_matrix') 598 | _args.emb_matrix = emb_var 599 | _args.emb_dim = len(M_emb[0]) 600 | _args.wemb1_out_dim = _args.emb_dim 601 | if _args.fine_tuning : 602 | print 'fine tuning!!!!!' 
603 | _args.emb_matrix.is_regularizable = True 604 | run_epochs(_args) 605 | 606 | 607 | def run_corpora_multitask(_args): 608 | if 'Graph' in _args.circuit: 609 | _args.graph = True 610 | if 'Add' in _args.circuit: 611 | _args.add = True 612 | if 'Weighted' in _args.circuit: 613 | _args.weighted = True 614 | data_prep_shareLSTM(_args) 615 | prepare_params_shareLSTM(_args) 616 | _args.f_costs_and_updates, _args.f_classifies, cargs = create_multitask_relation_circuit(_args, StackConfig, len(_args.trainSet)) 617 | print "Finished Compiling" 618 | run_multi_task(_args, cargs, 1, len(_args.trainSet), mode=_args.train_mode) #, test_data=True) 619 | 620 | 621 | def create_arg_parser(args=None): 622 | _arg_parser = argparse.ArgumentParser(description='LSTM') 623 | add_arg.arg_parser = _arg_parser 624 | add_arg('--setting' , 'run_single_corpus', help='Choosing between running single corpus or multi-task') 625 | ## File IO 626 | # For single task 627 | add_arg('--data_dir' , '.') 628 | # For multitask 629 | add_arg('--drug_gene_dir' , '.') 630 | add_arg('--drug_variant_dir' , '.') 631 | add_arg('--drug_gene_variant_dir' , '.') 632 | # End for multitask 633 | # For wild prediction 634 | add_arg('--train_path' , '.') 635 | add_arg('--valid_path' , '.') 636 | add_arg('--train_graph' , '.') 637 | add_arg('--valid_graph' , '.') 638 | add_arg('--content_file' , 'sentences') 639 | add_arg('--dependent_file' , 'graph_arcs') 640 | add_arg('--parameters_file' , 'best_parameters') 641 | add_arg('--prediction_file' , 'prediction') 642 | add_arg('--drug_gene_prediction_file' , '.') 643 | add_arg('--drug_var_prediction_file' , '.') 644 | add_arg('--triple_prediction_file' , '.') 645 | add_arg('--num_entity' , 2) 646 | add_arg('--total_fold' , 10) 647 | add_arg('--dev_fold' , 0) 648 | add_arg('--test_fold' , 1) 649 | add_arg('--circuit' , 'LSTMRelation') 650 | add_arg('--emb_dir' , '../treelstm/data', help='The initial embedding file name for cws') 651 | add_arg('--wemb1_dropout_rate' , 0.0, help='Dropout rate for the input embedding layer') 652 | add_arg('--lstm_dropout_rate' , 0.0, help='Dropout rate for the lstm output embedding layer') 653 | add_arg('--representation' , 'charpos', help='Use which representation') 654 | add_arg('--fine_tuning' , True) 655 | add_arg('--feature_thresh' , 0) 656 | add_arg('--graph' , False) 657 | add_arg('--weighted' , False) 658 | add_arg('--add' , False) 659 | add_arg('--print_prediction' , True) 660 | ## Task 661 | add_arg('--task' , 'news_cws') 662 | add_arg('--oovthresh' , 0 , help="The minimum count (upto and including) OOV threshold for NER") # Maybe 1 ? 
663 | ## Training 664 | add_arg_to_L(TRAIN_PARAM, '--cost_coef' , 0.0) 665 | add_arg_to_L(TRAIN_PARAM, '--sample_coef' , 0.0) 666 | add_arg_to_L(TRAIN_PARAM, '--batch_size' , 1) 667 | add_arg_to_L(TRAIN_PARAM, '--train_mode' , 'alternative') 668 | add_arg_to_L(TRAIN_PARAM, '--lr' , 0.01) 669 | add_arg_to_L(TRAIN_PARAM, '--dg_lr' , 0.005) 670 | add_arg_to_L(TRAIN_PARAM, '--dv_lr' , 0.005) 671 | add_arg_to_L(TRAIN_PARAM, '--dgv_lr' , 0.005) 672 | add_arg_to_L(TRAIN_PARAM, '--nepochs' , 30) 673 | add_arg_to_L(TRAIN_PARAM, '--optimizer' , 'sgd', help='sgd or adadelta') 674 | add_arg_to_L(TRAIN_PARAM, '--seed' , 1) #int(random.getrandbits(10))) 675 | add_arg_to_L(TRAIN_PARAM, '--decay' , True, help='whether learning rate decay') 676 | add_arg_to_L(TRAIN_PARAM, '--decay_epochs' , 5) 677 | add_arg_to_L(TRAIN_PARAM, '--minimum_lr' , 1e-5) 678 | ## Topology 679 | add_arg_to_L(TOPO_PARAM, '--emission_trans_out_dim', -1) 680 | add_arg_to_L(TOPO_PARAM, '--crf_viterbi', False) 681 | add_arg_to_L(TOPO_PARAM, '--lstm_win_size', 5) 682 | add_arg_to_L(TOPO_PARAM, '--wemb1_out_dim', 300) 683 | add_arg_to_L(TOPO_PARAM, '--lstm_out_dim', 150) 684 | add_arg_to_L(TOPO_PARAM, '--CNN_out_dim', 500) 685 | add_arg_to_L(TOPO_PARAM, '--lstm_type_dim', 50) 686 | add_arg_to_L(TOPO_PARAM, '--MLP_hidden_out_dim', 1000) 687 | add_arg_to_L(TOPO_PARAM, '--MLP_activation_fn', 'tanh') 688 | add_arg_to_L(TOPO_PARAM, '--L2Reg_reg_weight', 0.0) 689 | add_arg_to_L(TOPO_PARAM, '--win_l', 0) 690 | add_arg_to_L(TOPO_PARAM, '--win_r', 0) 691 | ## DEBUG 692 | add_arg('--verbose' , 2) 693 | 694 | return _arg_parser 695 | 696 | 697 | if __name__ == "__main__": 698 | ####################################################################################### 699 | ## PARSE ARGUMENTS, BUILD CIRCUIT, TRAIN, TEST 700 | ####################################################################################### 701 | TOPO_PARAM = [] 702 | TRAIN_PARAM = [] 703 | _arg_parser = create_arg_parser() 704 | args = _arg_parser.parse_args() 705 | #run_wild_test(args) 706 | #run_corpora_multitask(args) 707 | eval(args.setting)(args) 708 | -------------------------------------------------------------------------------- /theano_src/neural_architectures.py: -------------------------------------------------------------------------------- 1 | import theano.tensor as T 2 | from neural_lib import * 3 | 4 | # Note: need to refactor many places to return regularizable params lists for the optimization. 5 | 6 | def calculate_params_needed(chips): 7 | l = [] 8 | for c, _ in chips: 9 | l += c.needed_key() 10 | return l 11 | 12 | ''' Automatically create(initialize) layers and hook them together''' 13 | def stackLayers(chips, current_chip, params, feature_size=0, entity_size=2): 14 | instantiated_chips = [] 15 | print 'stack layers!!!' 16 | for e in chips: 17 | previous_chip = current_chip 18 | if e[1].endswith('feature_emission_trans'): 19 | current_chip = e[0](e[1], params).prepend(previous_chip, feature_size) 20 | elif e[1].endswith('target_columns'): 21 | current_chip = e[0](e[1], params).prepend(previous_chip, entity_size) 22 | else: 23 | current_chip = e[0](e[1], params).prepend(previous_chip) 24 | instantiated_chips.append((current_chip, e[1])) 25 | print 'current chip:', e[1], "In_dim:", current_chip.in_dim, "Out_dim:", current_chip.out_dim 26 | print 'needed keys:' 27 | for e in current_chip.needed_key(): 28 | print (e, params[e]) 29 | return instantiated_chips 30 | 31 | ''' Compute the initialized layers by feed in the inputs. 
''' 32 | def computeLayers(instantiated_chips, current_chip, params, feature_input=None, entities_input=None, mask=None): 33 | print 'compute layers!!!' 34 | regularizable_params = [] 35 | for e in instantiated_chips: 36 | previous_chip = current_chip 37 | current_chip = e[0] 38 | print 'current chip:', e[1], "In_dim:", current_chip.in_dim, "Out_dim:", current_chip.out_dim 39 | if e[1].endswith('feature_emission_trans'): 40 | internal_params = current_chip.parameters 41 | current_chip.compute(previous_chip.output_tv, feature_input) 42 | elif e[1].endswith('target_columns') or e[1].endswith('Entity_Att'): 43 | internal_params = current_chip.parameters 44 | current_chip.compute(previous_chip.output_tv, entities_input) 45 | elif e[1].endswith('lstm'): 46 | internal_params = current_chip.parameters 47 | current_chip.compute(previous_chip.output_tv, mask) 48 | else: 49 | internal_params = current_chip.parameters 50 | current_chip.compute(previous_chip.output_tv) 51 | assert current_chip.output_tv is not None 52 | for k in internal_params: 53 | print 'internal_params:', k.name 54 | assert k.is_regularizable 55 | params[k.name] = k 56 | regularizable_params.append(k) 57 | return regularizable_params 58 | 59 | ''' Compile the architectures for single-task learning: stack the layers, compute the forward pass.''' 60 | def RelationStackMaker(chips, params, graph=False, weighted=False, batched=False): 61 | if batched: 62 | emb_input = T.itensor3('emb_input') 63 | entities_tv = [T.fmatrix('enidx_'+str(i)).astype(theano.config.floatX) for i in range(params['num_entity'])] 64 | if graph: 65 | if weighted: 66 | masks = T.ftensor4('child_mask') 67 | else: 68 | masks = T.ftensor3('child_mask') 69 | else: 70 | masks = T.fmatrix('batch_mask') 71 | else: 72 | emb_input = T.imatrix('emb_input') 73 | entities_tv = [T.fvector('enidx_'+str(i)).astype(theano.config.floatX) for i in range(params['num_entity'])] 74 | if graph: 75 | if weighted: 76 | masks = T.ftensor3('child_mask') 77 | else: 78 | masks = T.fmatrix('child_mask') 79 | else: 80 | masks = None 81 | #print masks, type(masks), masks.ndim 82 | current_chip = Start(params['voc_size'], emb_input) 83 | print '\n', 'Building Stack now', '\n', 'Start: ', params['voc_size'], 'out_tv dim:', current_chip.output_tv.ndim 84 | instantiated_chips = stackLayers(chips, current_chip, params, entity_size=params['num_entity']) 85 | regularizable_params = computeLayers(instantiated_chips, current_chip, params, entities_input=entities_tv, mask=masks) 86 | ### Debug use: Get the attention co-efficiency and visualize. 
### 87 | for c in instantiated_chips: 88 | if c[1].endswith('Entity_Att'): 89 | assert hasattr(c[0], 'att_wt_arry') 90 | assert hasattr(c[0], 'entity_tvs') 91 | attention_weights = c[0].att_wt_arry 92 | entity_tvs = c[0].entity_tvs 93 | 94 | current_chip = instantiated_chips[-1][0] 95 | if current_chip.output_tv.ndim == 2: 96 | pred_y = current_chip.output_tv #T.argmax(current_chip.output_tv, axis=1) 97 | else: 98 | pred_y = current_chip.output_tv #T.argmax(current_chip.output_tv) #, axis=1) 99 | gold_y = (current_chip.gold_y 100 | if hasattr(current_chip, 'gold_y') 101 | else None) 102 | # Show all parameters that would be needed in this system 103 | params_needed = calculate_params_needed(instantiated_chips) 104 | print "Parameters Needed", params_needed 105 | for k in params_needed: 106 | assert k in params, k 107 | print k, params[k] 108 | assert hasattr(current_chip, 'score') 109 | cost = current_chip.score #/ params['nsentences'] 110 | cost_arr = [cost] 111 | for layer in instantiated_chips[:-1]: 112 | if hasattr(layer[0], 'score'): 113 | print layer[1] 114 | cost += params['cost_coef'] * layer[0].score 115 | cost_arr.append(params['cost_coef'] * layer[0].score) 116 | 117 | grads = T.grad(cost, 118 | wrt=regularizable_params) 119 | #[params[k] for k in params if (hasattr(params[k], 'is_regularizable') and params[k].is_regularizable)]) 120 | print 'Regularizable parameters:' 121 | for k, v in params.items(): 122 | if hasattr(v, 'is_regularizable'): 123 | print k, v, v.is_regularizable 124 | if graph or batched: 125 | #return (emb_input, masks, entities_tv, attention_weights, entity_tvs, gold_y, pred_y, cost, grads, regularizable_params) 126 | return (emb_input, masks, entities_tv, gold_y, pred_y, cost, grads, regularizable_params) 127 | else: 128 | return (emb_input, entities_tv, gold_y, pred_y, cost, grads, regularizable_params) 129 | 130 | 131 | ''' Compile the architectures for multi-task learning: stack the layers, compute the forward pass.''' 132 | def MultitaskRelationStackMaker(Shared, Classifiers, params, num_tasks, graph=False, weighted=False, batched=False): 133 | if batched: 134 | emb_inputs = [T.itensor3('emb_input_'+str(i)) for i in range(num_tasks)] 135 | entities_tv = [[T.fmatrix('enidx_'+str(j)+'_t_'+str(i)) 136 | for j in range(params['num_entity_d'+str(i)])] 137 | for i in range(num_tasks)] 138 | if graph: 139 | if weighted: 140 | masks = [T.ftensor4('child_mask_d'+str(i)) for i in range(num_tasks)] 141 | else: 142 | masks = [T.ftensor3('child_mask_d'+str(i)) for i in range(num_tasks)] 143 | else: 144 | masks = [T.fmatrix('batch_mask_d'+str(i)) for i in range(num_tasks)] 145 | else: 146 | emb_inputs = [T.imatrix('emb_input_'+str(i)) for i in range(num_tasks)] 147 | entities_tv = [[T.fvector('enidx_'+str(j)+'_t_'+str(i)) 148 | for j in range(params['num_entity_d'+str(i)])] 149 | for i in range(num_tasks)] 150 | if graph: 151 | if weighted: 152 | masks = [T.ftensor3('child_mask_d'+str(i)) for i in range(num_tasks)] 153 | else: 154 | masks = [T.fmatrix('child_mask_d'+str(i)) for i in range(num_tasks)] 155 | else: 156 | masks = None 157 | current_chip = Start(params['voc_size'], None) 158 | instantiated_chips = stackLayers(Shared, current_chip, params) 159 | print 'Building Classifiers for tasks, input dim:', current_chip.out_dim 160 | pred_ys = [] 161 | gold_ys = [] 162 | costs_arr = [] 163 | grads_arr = [] 164 | regularizable_param_arr = [] 165 | global_regularizable_params = [] 166 | for i, clsfier in enumerate(Classifiers): 167 | #feature_size = 
len(params['features2idx_dicts'][i]) #params['feature_size_'+str(i)] 168 | current_chip = instantiated_chips[-1][0] 169 | decoder_chips = stackLayers(clsfier, current_chip, params, entity_size=params['num_entity_d'+str(i)]) 170 | ## Note: this implementation only uses the LSTM hidden layer 171 | temp_chips = instantiated_chips + decoder_chips 172 | init_chip = Start(params['voc_size'], emb_inputs[i]) 173 | if batched: 174 | regularizable_params = computeLayers(temp_chips, init_chip, params, entities_input=entities_tv[i], mask=masks[i]) 175 | else: 176 | regularizable_params = computeLayers(temp_chips, init_chip, params, entities_input=entities_tv[i]) 177 | global_regularizable_params.extend(regularizable_params) 178 | regularizable_param_arr.append(regularizable_params) 179 | #task_chips.append(temp_chips) 180 | current_chip = temp_chips[-1][0] 181 | if current_chip.output_tv.ndim == 2: 182 | pred_ys.append(current_chip.output_tv) #T.argmax(current_chip.output_tv, axis=1)) 183 | else: 184 | pred_ys.append(current_chip.output_tv) #T.argmax(current_chip.output_tv, axis=0)) 185 | gold_ys.append(current_chip.gold_y) 186 | assert hasattr(current_chip, 'score') 187 | cost = current_chip.score 188 | costs_arr.append(cost) #/params['nsentences'] 189 | grads_arr.append( T.grad(cost, 190 | wrt=regularizable_params) ) 191 | # Show all parameters that would be needed in this system 192 | params_needed = ['voc_size', 'feature_size_'+str(i)] 193 | params_needed += calculate_params_needed(temp_chips) 194 | #cost = sum(costs_arr) 195 | #global_regularizable_params = list(set(global_regularizable_params)) 196 | #grads = T.grad(cost, 197 | # wrt=global_regularizable_params) 198 | print 'The joint model regularizable parameters:' 199 | for k, v in params.items(): 200 | if hasattr(v, 'is_regularizable'): 201 | print k, v, v.is_regularizable 202 | #return (emb_inputs, entities_tv, gold_ys, pred_ys, costs_arr, cost, grads_arr, grads, regularizable_param_arr, global_regularizable_params) 203 | if batched or graph: 204 | return (emb_inputs, entities_tv, masks, gold_ys, pred_ys, costs_arr, grads_arr, regularizable_param_arr) 205 | else: 206 | return (emb_inputs, entities_tv, gold_ys, pred_ys, costs_arr, grads_arr, regularizable_param_arr) 207 | 208 | 209 | ''' Single task architectures, suitable for CNN, BiLSTM (with/without input attention). 
210 | The major difference between this and the graphLSTM single task architecture is the final line: the paremeters given to RelationStackMaker function''' 211 | def LSTMRelation(params): 212 | chips = [ 213 | (Embedding ,'wemb1'), 214 | (BiLSTM ,'lstm'), 215 | (TargetHidden ,'get_target_columns'), 216 | #(Entity_attention ,'hidden_Entity_Att'), 217 | #(BiasedLinear ,'MLP_hidden'), 218 | #(Activation ,'MLP_activation'), 219 | (LogitRegression ,'logistic_regression'), 220 | (L2Reg, 'L2Reg'), 221 | ] 222 | return RelationStackMaker(chips, params, batched=(params['batch_size']>1)) 223 | 224 | def CNNRelation(params): 225 | chips = [ 226 | (Embedding ,'wemb1'), 227 | (Entity_attention ,'input_Entity_Att'), 228 | (Convolutional_NN ,'CNN'), 229 | (LogitRegression ,'logistic_regression'), 230 | (L2Reg, 'L2Reg'), 231 | ] 232 | return RelationStackMaker(chips, params, batched=(params['batch_size']>1)) 233 | 234 | 235 | 236 | def GraphLSTMRelation(params): 237 | chips = [ 238 | (Embedding ,'wemb1'), 239 | (BiGraphLSTM ,'lstm'), 240 | (TargetHidden ,'get_target_columns'), 241 | (LogitRegression ,'logistic_regression') 242 | ] 243 | return RelationStackMaker(chips, params, graph=True, batched=(params['batch_size']>1)) 244 | 245 | 246 | def WeightedGraphLSTMRelation(params): 247 | chips = [ 248 | (Embedding ,'wemb1'), 249 | (BiGraphLSTM_Wtd ,'lstm'), 250 | (TargetHidden ,'get_target_columns'), 251 | (LogitRegression ,'logistic_regression'), 252 | ] 253 | return RelationStackMaker(chips, params, graph=True, weighted=True, batched=(params['batch_size']>1)) 254 | 255 | 256 | def WeightedAddGraphLSTMRelation(params): 257 | chips = [ 258 | (Embedding ,'wemb1'), 259 | (BiGraphLSTM_WtdEmbMult ,'lstm'), 260 | (TargetHidden ,'get_target_columns'), 261 | (LogitRegression ,'logistic_regression'), 262 | ] 263 | return RelationStackMaker(chips, params, graph=True, weighted=True, batched=(params['batch_size']>1)) 264 | 265 | 266 | ''' Multitask learning architectures''' 267 | def LSTMRelation_multitask(params, num_tasks): 268 | Shared = [ 269 | (Embedding ,'wemb1'), 270 | (BiLSTM ,'lstm'), 271 | ] 272 | Classifiers = [[ 273 | (TargetHidden ,'t'+str(i)+'_get_target_columns'), 274 | (LogitRegression ,'t'+str(i)+'_logistic_regression') 275 | ] for i in range(num_tasks)] 276 | return MultitaskRelationStackMaker(Shared, Classifiers, params, num_tasks, batched=(params['batch_size']>1)) 277 | 278 | 279 | def WeightedGraphLSTMRelation_multitask(params, num_tasks): 280 | Shared = [ 281 | (Embedding ,'wemb1'), 282 | (BiGraphLSTM_Wtd ,'lstm'), 283 | ] 284 | Classifiers = [[ 285 | (TargetHidden ,'t'+str(i)+'_get_target_columns'), 286 | (LogitRegression ,'t'+str(i)+'_logistic_regression') 287 | ] for i in range(num_tasks)] 288 | return MultitaskRelationStackMaker(Shared, Classifiers, params, num_tasks, graph=True, weighted=True, batched=(params['batch_size']>1)) 289 | 290 | 291 | def WeightedAddGraphLSTMRelation_multitask(params, num_tasks): 292 | Shared = [ 293 | (Embedding ,'wemb1'), 294 | (BiGraphLSTM_WtdAdd ,'lstm'), 295 | ] 296 | Classifiers = [[ 297 | (TargetHidden ,'t'+str(i)+'_get_target_columns'), 298 | (LogitRegression ,'t'+str(i)+'_logistic_regression') 299 | ] for i in range(num_tasks)] 300 | return MultitaskRelationStackMaker(Shared, Classifiers, params, num_tasks, graph=True, weighted=True, batched=(params['batch_size']>1)) 301 | 302 | -------------------------------------------------------------------------------- /theano_src/neural_lib.py: 
-------------------------------------------------------------------------------- 1 | import os 2 | import theano.tensor as T 3 | import theano 4 | import time 5 | from theano import config 6 | import numpy as np 7 | import collections 8 | #theano.config.compute_test_value = 'off' 9 | #theano.config.profile=True 10 | #theano.config.profile_memory=True 11 | 12 | 13 | ''' This is the major file that defines neural classes (individual components in neural architectures) and several helper functions to facilitate configuring the neural classes. 14 | The basic classes include: 15 | Embedding 16 | Entity_attention 17 | (Bi)LSTM 18 | (Bi)GraphLSTM (and several variants) 19 | TargetHidden 20 | LogitRegression''' 21 | 22 | 23 | np.random.seed(1) 24 | def name_tv(*params): 25 | """ 26 | Helper function to generate names 27 | Join the params as string using '_' 28 | and also add a unique id, since every node in a theano 29 | graph should have a unique name 30 | """ 31 | if not hasattr(name_tv, "uid"): 32 | name_tv.uid = 0 33 | name_tv.uid += 1 34 | tmp = "_".join(params) 35 | return "_".join(['tparam', tmp, str(name_tv.uid)]) 36 | 37 | 38 | def np_floatX(data): 39 | return np.asarray(data, dtype=config.floatX) 40 | 41 | def tparams_make_name(*params): 42 | tmp = make_name(*params) 43 | return "_".join(['tparam', tmp]) 44 | 45 | def make_name(*params): 46 | """ 47 | Join the params as string using '_' 48 | and also add a unique id, since every node in a theano 49 | graph should have a unique name 50 | """ 51 | return "_".join(params) 52 | 53 | def reverse(tensor): 54 | rev, _ = theano.scan(lambda itm: itm, 55 | sequences=tensor, 56 | go_backwards=True, 57 | strict=True, 58 | name='reverse_rand%d'%np.random.randint(1000)) 59 | return rev 60 | 61 | 62 | def read_matrix_from_file(fn, dic): 63 | ''' 64 | Assume that the file contains words in first column, 65 | and embeddings in the rest and that dic maps words to indices. 66 | ''' 67 | _data = open(fn).read().strip().split('\n') 68 | _data = [e.strip().split() for e in _data] 69 | dim = len(_data[0]) - 1 70 | data = {} 71 | # NOTE: The norm of onesided_uniform rv is sqrt(n)/sqrt(3) 72 | # Since the expected value of X^2 = 1/3 where X ~ U[0, 1] 73 | # => sum(X_i^2) = dim/3 74 | # => norm = sqrt(dim/3) 75 | # => norm/dim = sqrt(1/3dim) 76 | multiplier = np.sqrt(1.0/(3*dim)) 77 | for e in _data: 78 | r = np.array([float(_e) for _e in e[1:]]) 79 | data[e[0]] = (r/np.linalg.norm(r)) * multiplier 80 | M = ArrayInit(ArrayInit.onesided_uniform, multiplier=1.0/dim).initialize(len(data), dim) 81 | for word, idx in dic.iteritems(): 82 | if word in data: 83 | M[idx] = data[word] 84 | return M 85 | 86 | ''' Dropout. 
Can be used in different places.''' 87 | def _dropout_from_layer(rng, layer, p): 88 | """p is the probablity of dropping a unit 89 | """ 90 | srng = theano.tensor.shared_randomstreams.RandomStreams( 91 | rng.randint(999999)) 92 | # p=1-p because 1's indicate keep and p is prob of dropping 93 | mask = srng.binomial(n=1, p=1-p, size=layer.shape) 94 | # The cast is important because 95 | output = layer * T.cast(mask, theano.config.floatX) 96 | return output 97 | 98 | ''' The class for initializing parameters matrixs.''' 99 | class ArrayInit(object): 100 | normal = 'normal' 101 | onesided_uniform = 'onesided_uniform' 102 | twosided_uniform = 'twosided_uniform' 103 | ortho = 'ortho' 104 | zero = 'zero' 105 | unit = 'unit' 106 | ones = 'ones' 107 | fromfile = 'fromfile' 108 | def __init__(self, option, 109 | multiplier=0.01, 110 | matrix=None, 111 | word2idx=None): 112 | self.option = option 113 | self.multiplier = multiplier 114 | self.matrix_filename = None 115 | self.matrix = self._matrix_reader(matrix, word2idx) 116 | if self.matrix is not None: 117 | self.multiplier = 1 118 | return 119 | 120 | def _matrix_reader(self, matrix, word2idx): 121 | if type(matrix) is str: 122 | self.matrix_filename = matrix 123 | assert os.path.exists(matrix), "File %s not found"%matrix 124 | matrix = read_matrix_from_file(matrix, word2idx) 125 | return matrix 126 | else: 127 | return None 128 | 129 | def initialize(self, *xy, **kwargs): 130 | if self.option == ArrayInit.normal: 131 | M = np.random.randn(*xy) 132 | elif self.option == ArrayInit.onesided_uniform: 133 | M = np.random.rand(*xy) 134 | elif self.option == ArrayInit.twosided_uniform: 135 | M = np.random.uniform(-1.0, 1.0, xy) 136 | elif self.option == ArrayInit.ortho: 137 | f = lambda dim: np.linalg.svd(np.random.randn(dim, dim))[0] 138 | if int(xy[1]/xy[0]) < 1 and xy[1]%xy[0] != 0: 139 | raise ValueError(str(xy)) 140 | M = np.concatenate(tuple(f(xy[0]) for _ in range(int(xy[1]/xy[0]))), 141 | axis=1) 142 | assert M.shape == xy 143 | elif self.option == ArrayInit.zero: 144 | M = np.zeros(xy) 145 | elif self.option in [ArrayInit.unit, ArrayInit.ones]: 146 | M = np.ones(xy) 147 | elif self.option == ArrayInit.fromfile: 148 | assert isinstance(self.matrix, np.ndarray) 149 | M = self.matrix 150 | else: 151 | raise NotImplementedError 152 | #self.multiplier = (kwargs['multiplier'] 153 | multiplier = (kwargs['multiplier'] 154 | if ('multiplier' in kwargs 155 | and kwargs['multiplier'] is not None) 156 | else self.multiplier) 157 | #return (M*self.multiplier).astype(config.floatX) 158 | return (M*multiplier).astype(config.floatX) 159 | 160 | def __repr__(self): 161 | mults = ', multiplier=%s'%((('%.3f'%self.multiplier) 162 | if type(self.multiplier) is float 163 | else str(self.multiplier))) 164 | mats = ((', matrix="%s"'%self.matrix_filename) 165 | if self.matrix_filename is not None 166 | else '') 167 | return "ArrayInit(ArrayInit.%s%s%s)"%(self.option, mults, mats) 168 | 169 | 170 | class SerializableLambda(object): 171 | def __init__(self, s): 172 | self.s = s 173 | self.f = eval(s) 174 | return 175 | 176 | def __repr__(self): 177 | return "SerializableLambda('%s')"%self.s 178 | 179 | def __call__(self, *args, **kwargs): 180 | return self.f(*args, **kwargs) 181 | 182 | 183 | class StackConfig(collections.MutableMapping): 184 | """A dictionary like object that would automatically recognize 185 | keys that end with the following pattern and return appropriate keys. 
186 | _out_dim : 187 | _initializer : 188 | The actions to take are stored in a list for easy composition. 189 | """ 190 | actions = [ 191 | (lambda key: key.endswith('_out_dim') , lambda x: x), 192 | (lambda key: key.endswith('_T_initializer') , ArrayInit(ArrayInit.onesided_uniform)), 193 | (lambda key: key.endswith('_U_initializer') , ArrayInit(ArrayInit.ortho, multiplier=1)), 194 | (lambda key: key.endswith('_W_initializer') , ArrayInit(ArrayInit.twosided_uniform, multiplier=1)), 195 | (lambda key: key.endswith('_N_initializer') , ArrayInit(ArrayInit.normal)), 196 | (lambda key: key.endswith('_b_initializer') , ArrayInit(ArrayInit.zero)), 197 | (lambda key: key.endswith('_p_initializer') , ArrayInit(ArrayInit.twosided_uniform, multiplier=1)), 198 | (lambda key: key.endswith('_c_initializer') , ArrayInit(ArrayInit.twosided_uniform, multiplier=1)), 199 | (lambda key: key.endswith('_reg_weight') , 0), 200 | (lambda key: key.endswith('_viterbi') , False), 201 | (lambda key: key.endswith('_begin') , 1), 202 | (lambda key: key.endswith('_end') , -1), 203 | #(lambda key: key.endswith('_activation_fn') , lambda x: x + theano.tensor.abs_(x)), 204 | #(lambda key: key.endswith('_v_initializer') , ArrayInit(ArrayInit.ones, multiplier=NotImplemented)), 205 | ] 206 | def __init__(self, dictionary): 207 | self.store = collections.OrderedDict() 208 | self.store.update(dictionary) 209 | 210 | def __getitem__(self, key): 211 | if key in self.store: 212 | return self.store[key] 213 | for (predicate, retval) in self.actions: 214 | if predicate(key): 215 | return retval 216 | raise KeyError(key) 217 | 218 | def __setitem__(self, key, value): 219 | self.store[key] = value 220 | 221 | def __delitem__(self, key): 222 | del self.store[key] 223 | 224 | def __iter__(self): 225 | return iter(self.store) 226 | 227 | def __len__(self): 228 | return len(self.store) 229 | 230 | def reset(self): 231 | for k in self.store: 232 | if k.startswith('tparam_'): 233 | del self.store[k] 234 | return 235 | 236 | 237 | class Chip(object): 238 | """ The abstract class for neural chips. 239 | A Chip object requires name and a param dictionary 240 | that contains param[name+'_'+out_dim] (This can be a function that depends on the input_dim) 241 | Other than that it must also contain appropriate initializers for all the parameters. 242 | 243 | The params dictionary is updated to contain 'tparam__uid' 244 | """ 245 | def __init__(self, name, params=None): 246 | """ I set the output dimension of every node in the parameters. 247 | The input dimension would be set when prepend is called. 
248 | (Since prepend method receives the previous chip) 249 | """ 250 | self.name = name 251 | if params is not None: 252 | print 'current chip:', name, 'out dimension:', self.kn('out_dim') 253 | self.out_dim = params[self.kn('out_dim')] 254 | print 'init chip:', self.name, 'out dim:', self.out_dim 255 | self.params = params 256 | return 257 | 258 | def prepend(self, previous_chip): 259 | """ Note that my input_dim of self = output_dim of previous_chip 260 | Also we keep track of absolute_input (the first input) to the layer 261 | """ 262 | #if hasattr(previous_chip, 'kn') and previous_chip.kn('win') in previous_chip.params: 263 | # print 'window size:', previous_chip.params[previous_chip.kn('win')] 264 | self.in_dim = previous_chip.out_dim * previous_chip.params[previous_chip.kn('win')] if (hasattr(previous_chip, 'kn') and previous_chip.kn('win') in previous_chip.params) else previous_chip.out_dim 265 | #print 'previous_chip out_dim:', previous_chip.out_dim, 'previous_chip window:', previous_chip.params[previous_chip.kn('win')] 266 | if hasattr(self.out_dim, '__call__'): 267 | self.out_dim = self.out_dim(self.in_dim) 268 | print 'in prepend, chip', self.name, 'in dim =', self.in_dim, 'out dim =', self.out_dim 269 | self.parameters = [] 270 | return self 271 | 272 | def compute(self, input_tv): 273 | """ Note that input_tv = previous_chip.output_tv 274 | This method returns a dictionary of internal weight params 275 | and This method sets self.output_tv 276 | """ 277 | raise NotImplementedError 278 | 279 | def regularizable_variables(self): 280 | """ If a value stored in the dictionary has the attribute 281 | is_regularizable then that value is regularizable 282 | """ 283 | return [k for k in self.params 284 | if hasattr(self.params[k], 'is_regularizable') 285 | and self.params[k].is_regularizable] 286 | 287 | def kn(self, thing): 288 | if len(thing) == 1: # It is probably ['U', 'W', 'b', 'T', 'N'] or some such Matrix 289 | keyname_suffix = '_initializer' 290 | else: 291 | keyname_suffix = '' 292 | return self.name + '_' + thing + keyname_suffix 293 | 294 | def _declare_mat(self, name, *dim, **kwargs): 295 | multiplier = (kwargs['multiplier'] 296 | if 'multiplier' in kwargs 297 | else None) 298 | var = theano.shared( 299 | self.params[self.kn(name)].initialize(*dim, multiplier=multiplier), 300 | name=tparams_make_name(self.name, name) 301 | ) 302 | if 'is_regularizable' not in kwargs: 303 | var.is_regularizable = True # Default 304 | else: 305 | var.is_regularizable = kwargs['is_regularizable'] 306 | return var 307 | 308 | def needed_key(self): 309 | return self._needed_key_impl() 310 | 311 | def _needed_key_impl(self, *things): 312 | return [self.kn(e) for e in ['out_dim'] + list(things)] 313 | 314 | class Start(object): 315 | """ A start object which has all the necessary attributes that 316 | any chip object that would call it would need. 317 | """ 318 | def __init__(self, out_dim, output_tv): 319 | self.out_dim = out_dim 320 | self.output_tv = output_tv 321 | 322 | # Note: should make changes here to pass pre_defined embeddings as parameters. 323 | class Embedding(Chip): 324 | def prepend(self, previous_chip): 325 | self = super(Embedding, self).prepend(previous_chip) 326 | if 'emb_matrix' in self.params: 327 | print 'pre_trained embedding!!' 
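# Roughly: when a pretrained 'emb_matrix' shared variable was put into params (as
# run_single_corpus and load_all_data_multitask do after reading the --emb_dir file), it is
# reused here as the lookup table T_, so circuits built from these params share one embedding
# table; whether it is fine-tuned is controlled by the --fine_tuning flag, which marks the
# shared matrix as regularizable. Otherwise a fresh T matrix is declared from the T_initializer.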
328 | self.T_ = self.params['emb_matrix'] 329 | print self.T_, type(self.T_) 330 | else: 331 | self.T_ = self._declare_mat('T', self.params['voc_size'], self.out_dim) 332 | self.params['emb_dim'] = self.out_dim 333 | self.parameters = [self.T_] 334 | return self 335 | 336 | """ An embedding converts one-hot-vectors to dense vectors. 337 | We never take dot products with one-hot-vectors. 338 | This requires a T_initializer 339 | """ 340 | def compute(self, input_tv): 341 | print input_tv, type(input_tv) 342 | n_timesteps = input_tv.shape[0] 343 | window_size = 1 344 | if input_tv.ndim == 2: 345 | window_size = input_tv.shape[1] 346 | elif input_tv.ndim == 3: 347 | batch_size = input_tv.shape[1] 348 | window_size = input_tv.shape[2] 349 | print 'input_tv dimension:', input_tv.ndim 350 | print 'window size = ', window_size 351 | self.params[self.kn('win')] = window_size 352 | if input_tv.ndim < 3: 353 | self.output_tv = self.T_[input_tv.flatten()].reshape([n_timesteps, window_size * self.out_dim], ndim=2) 354 | else: 355 | self.output_tv = self.T_[input_tv.flatten()].reshape([n_timesteps, batch_size, window_size * self.out_dim], ndim=3) 356 | if self.params.get(self.kn('dropout_rate'), 0.0) != 0.0: 357 | print 'DROP OUT!!! at circuite', self.name, 'Drop out rate: ', self.params[self.kn('dropout_rate')] 358 | self.output_tv = _dropout_from_layer(self.params['rng'], self.output_tv, self.params[self.kn('dropout_rate')]) 359 | # Note: when we import the pre-defined emb_matrix, we do not add it to internal parameters because we already defined it as a regularizable parameter. 360 | #if 'emb_matrix' in self.params: 361 | # return tuple() 362 | #else: 363 | 364 | def needed_key(self): 365 | return self._needed_key_impl('T') 366 | 367 | 368 | # compute attention according to the similarity between the token with the entities 369 | class Entity_attention(Chip): 370 | ''' input_tv shape: (sent_len, batch_size, tv_dim)''' 371 | def get_att_weights(self, input_tv, i, entity_idxs): 372 | if input_tv.ndim == 3: 373 | ''' entities_tv shape: (batch_size, tv_dim)''' 374 | entity_tv = T.sum(input_tv * entity_idxs[:, :, None], axis=0) 375 | self.entity_tvs = T.set_subtensor(self.entity_tvs[i], entity_tv) 376 | ''' attention_weights shape: (sent_len, batch_size)''' 377 | #input_tv = input_tv.dimshuffle(1,0,2) 378 | #attention_weights = T.nnet.softmax(T.batched_dot(input_tv/input_tv.norm(2, axis=2, keepdims=True), entity_tv/entity_tv.norm(2, axis=1, keepdims=True))).T 379 | attention_weights = T.nnet.softmax(T.batched_dot(input_tv.dimshuffle(1,0,2), entity_tv)).T 380 | #attention_score = T.exp(T.sum(input_tv * entity_tv[None, :, :], axis=2)) 381 | #attention_weights = attention_score / attention_score.sum(axis=0) 382 | return attention_weights 383 | else: 384 | entity_tv = T.sum(input_tv * entity_idxs.dimshuffle(0, 'x'), axis=0) 385 | attention_weights = T.nnet.softmax(T.dot(input_tv, entity_tv)) 386 | return attention_weights 387 | 388 | def compute(self, input_tv, entities_tv): 389 | print 'in Entity_attention layer, input dimension:', input_tv.ndim 390 | if input_tv.ndim == 3: 391 | self.attention_weights = T.zeros_like(input_tv[:, :, 0]) 392 | self.att_wt_arry = T.zeros_like(input_tv[:,:,:3]).dimshuffle(2,0,1) 393 | self.entity_tvs = T.zeros_like(input_tv[:3, :, :]) 394 | else: 395 | self.attention_weights = T.zeros_like(input_tv[:, 0]) 396 | self.att_wt_arry = T.zeros_like(input_tv[:,:3]).T 397 | for i, enidx in enumerate(entities_tv): 398 | temp_weight = self.get_att_weights(input_tv, i, enidx) 399 | 
self.att_wt_arry = T.set_subtensor(self.att_wt_arry[i], temp_weight) 400 | self.attention_weights += temp_weight 401 | self.attention_weights = self.attention_weights / len(entities_tv) 402 | print 'attention weight dimensions:', self.attention_weights.ndim 403 | if input_tv.ndim == 3: 404 | self.output_tv = input_tv * self.attention_weights[:, :, None] 405 | else: 406 | self.output_tv = input_tv * self.attention_weights[:, None] 407 | 408 | 409 | class Activation(Chip): 410 | """ This requires a (activation)_fn parameter 411 | """ 412 | def compute(self, input_tv): 413 | self.output_tv = self.params[self.kn('fn')](input_tv) 414 | 415 | def needed_key(self): 416 | return self._needed_key_impl('fn') 417 | 418 | class TargetHidden(Chip): 419 | def prepend(self, previous_chip, entity_num): 420 | self.previous_chip = previous_chip 421 | super(TargetHidden, self).prepend(previous_chip) 422 | self.out_dim = entity_num*self.in_dim 423 | return self 424 | 425 | ''' If input_tv.ndim == 2: shape == (maxlen, in_dim), 426 | output_tv.shape == (num_entity * in_dim) 427 | If input_tv.ndim == 3: shape == (maxlen, n_samples, in_dim) 428 | output_tv.shape == (n_samples, num_entity * in_dim) 429 | ''' 430 | def compute(self, input_tv, entities_tv): 431 | if input_tv.ndim == 3: 432 | self.output_tv = T.sum(input_tv * entities_tv[0][:, :, None], axis=0) 433 | for enidx in entities_tv[1:]: 434 | self.output_tv= T.concatenate([self.output_tv, T.sum(input_tv * enidx[:, :, None], axis=0)], axis=1) 435 | 436 | else: 437 | self.output_tv = T.sum(input_tv * entities_tv[0].dimshuffle(0, 'x'), axis=0) 438 | #print 'in TargetHidden, variable types:', input_tv.dtype, entities_tv[0].dtype 439 | for enidx in entities_tv[1:]: 440 | self.output_tv = T.concatenate([self.output_tv, T.sum(input_tv * enidx.dimshuffle(0, 'x'), axis=0)]) 441 | 442 | 443 | class LogitRegression(Chip): 444 | def prepend(self, previous_chip): 445 | self.previous_chip = previous_chip 446 | super(LogitRegression, self).prepend(previous_chip) 447 | self.W = self._declare_mat('W', self.in_dim, self.out_dim) 448 | self.b = self._declare_mat('b', self.out_dim) 449 | self.parameters = [self.W, self.b] 450 | return self 451 | 452 | def compute(self, input_tv): 453 | if input_tv.ndim == 3: 454 | self.output_tv = T.nnet.softmax(T.dot(input_tv.max(axis=0), self.W) + self.b) 455 | self.gold_y = T.ivector(make_name(self.name, 'gold_y')).astype('int32') 456 | self.score = -T.mean(T.log(self.output_tv)[T.arange(self.gold_y.shape[0]), self.gold_y]) 457 | elif input_tv.ndim == 2: 458 | self.output_tv = T.nnet.softmax(T.dot(input_tv, self.W) + self.b) 459 | self.gold_y = T.ivector(make_name(self.name, 'gold_y')).astype('int32') 460 | self.score = -T.mean(T.log(self.output_tv)[T.arange(self.gold_y.shape[0]), self.gold_y]) 461 | else: 462 | assert input_tv.ndim == 1 463 | self.output_tv = T.nnet.softmax(T.dot(input_tv, self.W) + self.b).dimshuffle(1,) 464 | #print 'in LogitRegression, variable types:', input_tv.dtype, self.output_tv.dtype 465 | self.gold_y = T.iscalar(make_name(self.name, 'gold_y')).astype('int32') 466 | self.score = -T.log(self.output_tv[self.gold_y]) 467 | 468 | 469 | def needed_key(self): 470 | return self._needed_key_impl('W', 'b') 471 | 472 | 473 | class Linear(Chip): 474 | def prepend(self, previous_chip): 475 | self = super(Linear, self).prepend(previous_chip) 476 | self.N = self._declare_mat('N', self.in_dim, self.out_dim) 477 | self.parameters = [self.N] 478 | return self 479 | 480 | """ A Linear Chip is a matrix Multiplication 481 | It 
requires a U_initializer 482 | """ 483 | def compute(self, input_tv): 484 | self.output_tv = T.dot(input_tv, self.N) 485 | 486 | def needed_key(self): 487 | return self._needed_key_impl('N') 488 | 489 | class Bias(Chip): 490 | def prepend(self, previous_chip): 491 | self = super(Bias, self).prepend(previous_chip) 492 | self.b = self._declare_mat('b', self.out_dim) #, is_regularizable=False) 493 | self.parameters = [self.b] 494 | return self 495 | 496 | """ A Bias Chip adds a vector to the input 497 | It requires a b_initializer 498 | """ 499 | def compute(self, input_tv): 500 | self.output_tv = input_tv + self.b 501 | 502 | def needed_key(self): 503 | return self._needed_key_impl('b') 504 | 505 | class BiasedLinear(Chip): 506 | def __init__(self, name, params=None): 507 | super(BiasedLinear, self).__init__(name, params) 508 | self.params[self.name+"_linear_out_dim"] = params[self.kn('out_dim')] 509 | self.params[self.name+"_bias_out_dim"] = params[self.kn('out_dim')] 510 | self.Linear = Linear(name+'_linear', self.params) 511 | self.Bias = Bias(name+'_bias', self.params) 512 | 513 | def prepend(self, previous_chip): 514 | self.Bias.prepend(self.Linear.prepend(previous_chip)) 515 | self.in_dim = self.Linear.in_dim 516 | self.parameters = self.Linear.parameters + self.Bias.parameters 517 | return self 518 | """ Composition of Linear and Bias 519 | It requires a U_initializer and a b_initializer 520 | """ 521 | def compute(self, input_tv): 522 | self.Linear.compute(input_tv) 523 | self.Bias.compute(self.Linear.output_tv) 524 | self.output_tv = self.Bias.output_tv 525 | 526 | def needed_key(self): 527 | return self.Linear.needed_key() + self.Bias.needed_key() 528 | 529 | 530 | class Convolutional_NN(Chip): 531 | def prepend(self, previous_chip): 532 | self = super(Convolutional_NN, self).prepend(previous_chip) 533 | self.W = self._declare_mat('W', self.in_dim, self.out_dim) 534 | self.b = self._declare_mat('b', self.out_dim) #, is_regularizable = False) 535 | self.parameters = [self.W, self.b] 536 | return self 537 | 538 | """ This requires W, U and b initializer 539 | """ 540 | def compute(self, input_tv): 541 | self.output_tv = T.dot(input_tv, self.W).astype(config.floatX) + self.b 542 | 543 | def needed_key(self): 544 | return self._needed_key_impl('W', 'b') 545 | 546 | 547 | class LSTM(Chip): 548 | def prepend(self, previous_chip): 549 | self = super(LSTM, self).prepend(previous_chip) 550 | print 'lstm in dim:', self.in_dim, 'out dim:', self.out_dim 551 | self.go_backwards = self.params[self.kn('go_backwards')] 552 | self.W = self._declare_mat('W', self.in_dim, 4*self.out_dim) 553 | self.U = self._declare_mat('U', self.out_dim, 4*self.out_dim) 554 | self.b = self._declare_mat('b', 4*self.out_dim) #, is_regularizable = False) 555 | #self.p = self._declare_mat('p', 3*self.out_dim) 556 | self.parameters = [self.W, self.U, self.b] #, self.p] 557 | return self 558 | 559 | """ This requires W, U and b initializer 560 | """ 561 | def compute(self, input_tv, mask=None): 562 | n_steps = input_tv.shape[0] 563 | if input_tv.ndim == 3: 564 | n_samples = input_tv.shape[1] 565 | else: 566 | n_samples = 1 567 | 568 | def __slice(matrix, row_idx, stride): 569 | if matrix.ndim == 3: 570 | return matrix[:, :, row_idx * stride:(row_idx + 1) * stride] 571 | elif matrix.ndim == 2: 572 | return matrix[:, row_idx * stride:(row_idx + 1) * stride] 573 | else: 574 | return matrix[row_idx*stride: (row_idx+1)*stride] 575 | 576 | def __step(x_, h_prev, c_prev, U): #, p): 577 | """ 578 | x = Transformed and Bias 
incremented Input (This is basically a matrix) 579 | We do the precomputation for efficiency. 580 | h_prev = previous output of the LSTM (Left output of this function) 581 | c_prev = previous cell value of the LSTM (Right output of this function) 582 | 583 | This is the vanilla version of the LSTM without peephole connections 584 | See: Section 2, "LSTM: A Search Space Odyssey", Klaus et. al, ArXiv(2015) 585 | http://arxiv.org/pdf/1503.04069v1.pdf for details. 586 | """ 587 | preact = T.dot(h_prev, U) + x_ 588 | i = T.nnet.sigmoid(__slice(preact, 0, self.out_dim) )#+ __slice(p, 0, self.out_dim)*c_prev) # Input gate 589 | f = T.nnet.sigmoid(__slice(preact, 1, self.out_dim) )#+ __slice(p, 1, self.out_dim)*c_prev) # Forget gate 590 | z = T.tanh(__slice(preact, 3, self.out_dim)) # block input 591 | c = f * c_prev + i * z # cell state 592 | o = T.nnet.sigmoid(__slice(preact, 2, self.out_dim) )#+ __slice(p, 2, self.out_dim) * c) # output gate 593 | h = o * T.tanh(c) # block output 594 | return h, c 595 | 596 | def __step_batch(x_, m_, h_prev, c_prev, U): #, p): 597 | # Shape: (4*out_dim, num_sample) + (num_sample, 4*out_dim).T 598 | preact = T.dot(h_prev, U) + x_ 599 | i = T.nnet.sigmoid(__slice(preact, 0, self.out_dim) )#+ __slice(p, 0, self.out_dim)*c_prev) # Input gate 600 | f = T.nnet.sigmoid(__slice(preact, 1, self.out_dim) )#+ __slice(p, 1, self.out_dim)*c_prev) # Forget gate 601 | z = T.tanh(__slice(preact, 3, self.out_dim)) # block input 602 | c = f * c_prev + i * z # cell state 603 | o = T.nnet.sigmoid(__slice(preact, 2, self.out_dim) )#+ __slice(p, 2, self.out_dim) * c) # output gate 604 | h = o * T.tanh(c) # block output 605 | c = m_[:, None] * c + (1. - m_)[:, None] * c_prev 606 | h = m_[:, None] * h + (1. - m_)[:, None] * h_prev 607 | return h, c 608 | 609 | x_in = T.dot(input_tv, self.W).astype(config.floatX) + self.b 610 | seq_in = [x_in] 611 | lstm_step = __step 612 | h_init = T.alloc(np_floatX(0.), self.out_dim) 613 | c_init = T.alloc(np_floatX(0.), self.out_dim) 614 | if mask is not None: 615 | seq_in.append(mask) 616 | lstm_step = __step_batch 617 | h_init = T.alloc(np_floatX(0.), n_samples, self.out_dim) 618 | c_init = T.alloc(np_floatX(0.), n_samples, self.out_dim) 619 | print 'lstm step:', lstm_step 620 | rval, _ = theano.scan(lstm_step, 621 | sequences=seq_in, 622 | outputs_info=[h_init, c_init], 623 | non_sequences=[self.U], #, self.p], 624 | go_backwards=self.go_backwards, 625 | name=name_tv(self.name, 'LSTM_layer'), 626 | n_steps=n_steps, 627 | strict=True, 628 | ) 629 | self.output_tv = reverse(rval[0]) if self.go_backwards else rval[0] 630 | if self.params.get(self.kn('dropout_rate'), 0.0) != 0.0: 631 | print 'DROP OUT!!! 
at circuite', self.name, 'Drop out rate: ', self.params[self.kn('dropout_rate')] 632 | self.output_tv = _dropout_from_layer(self.params['rng'], self.output_tv, self.params[self.kn('dropout_rate')]) 633 | 634 | def needed_key(self): 635 | return self._needed_key_impl('W', 'U', 'b', 'go_backwards') 636 | 637 | 638 | class GraphLSTM(Chip): 639 | def prepend(self, previous_chip): 640 | self = super(GraphLSTM, self).prepend(previous_chip) 641 | print 'graph lstm in dim:', self.in_dim, 'out dim:', self.out_dim 642 | self.go_backwards = self.params[self.kn('go_backwards')] 643 | self.W = self._declare_mat('W', self.in_dim, 4*self.out_dim) 644 | self.U = self._declare_mat('U', self.out_dim, 4*self.out_dim) 645 | self.b = self._declare_mat('b', 4*self.out_dim) #, is_regularizable = False) 646 | #self.p = self._declare_mat('p', 3*self.out_dim) 647 | self.parameters = [self.W, self.U, self.b] #, self.p) 648 | return self 649 | 650 | # Shapes: x_= (4*out_dim,), child_h, child_c = (out_dim, sent_len), child_exists = (sent_len) 651 | def recursive_unit(self, x_, child_h, child_c, child_exists, U): 652 | h_tilde = T.sum(child_h, axis=-1) / child_exists.sum() #T.cast(, theano.config.floatX) 653 | preact = T.dot(h_tilde, U) + x_ 654 | i = T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 655 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 656 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim).dimshuffle('x', 0) + T.dot(child_h.T, self.slice(U, 1, self.out_dim)) ) 657 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 658 | c = (f.T * child_c).sum(axis=-1) / child_exists.sum() + i * z # cell state 659 | o = T.nnet.sigmoid(self.slice(preact, 2, self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 660 | h = o * T.tanh(c) # block output 661 | return h, c 662 | 663 | 664 | # Shapes: x_= (batch_size, 4*out_dim,), child_h, child_c = (batch_size, sent_len, out_dim), 665 | # child_exists = (batch_size, sent_len) 666 | def recursive_unit_batch(self, x_, child_h, child_c, child_exists, U): 667 | h_tilde = T.sum(child_h, axis=1) / child_exists[:, None] #T.cast(, theano.config.floatX) 668 | preact = T.dot(h_tilde, U) + x_ 669 | i = T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 670 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 671 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim)[:, None, :] + T.dot(child_h, self.slice(U, 1, self.out_dim)) ) #/ child_exists[:, :, None] 672 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 673 | c = (f * child_c).sum(axis=1) / child_exists[:, None] + i * z # cell state 674 | o = T.nnet.sigmoid(self.slice(preact, 2, self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 675 | h = o * T.tanh(c) # block output 676 | return h, c 677 | 678 | def slice(self, matrix, row_idx, stride): 679 | if matrix.ndim == 3: 680 | return matrix[:, :, row_idx * stride:(row_idx + 1) * stride] 681 | elif matrix.ndim == 2: 682 | return matrix[:, row_idx * stride:(row_idx + 1) * stride] 683 | elif matrix.ndim == 1: 684 | return matrix[row_idx*stride: (row_idx+1)*stride] 685 | else: 686 | raise NotImplementedError 687 | 688 | 689 | """ This requires W, U and b initializer 690 | """ 691 | def compute(self, input_tv, mask=None): 692 | n_steps = input_tv.shape[0] 693 | if 
input_tv.ndim == 3: 694 | n_samples = input_tv.shape[1] 695 | else: 696 | n_samples = 1 697 | 698 | 699 | def __step(x_, m_, t, node_h, node_c, U): 700 | child_h = node_h * m_ 701 | child_c = node_c * m_ 702 | if self.go_backwards: 703 | valid_mask = m_[t:] 704 | else: 705 | valid_mask = m_[:t+1] 706 | curr_h, curr_c = self.recursive_unit(x_, child_h, child_c, valid_mask, U) 707 | node_h = T.set_subtensor(node_h[:,t], curr_h) 708 | node_c = T.set_subtensor(node_c[:,t], curr_c) 709 | return node_h, node_c 710 | 711 | # Shapes: x_: (batch_size, 4*out_dim), children_mask: (batch_size, sent_len) 712 | # t: (1,), node_h: (batch_size, sent_len, out_dim), U: (out_dim, 4*out_dim) 713 | def __step_batch(x_, children_mask, t, node_h, node_c, U): 714 | child_h = node_h * children_mask[:, :, None] 715 | child_c = node_c * children_mask[:, :, None] 716 | if self.go_backwards: 717 | valid_mask = children_mask[:, t+1:].sum(axis=1) 718 | else: 719 | valid_mask = children_mask[:, :t].sum(axis=1) 720 | convert_mask = T.zeros_like(valid_mask) 721 | diff_mask = T.cast(T.eq(valid_mask, convert_mask), dtype=theano.config.floatX) 722 | valid_mask += diff_mask 723 | curr_h, curr_c = self.recursive_unit_batch(x_, child_h, child_c, valid_mask, U) 724 | node_h = T.set_subtensor(node_h[:, t, :], curr_h) 725 | node_c = T.set_subtensor(node_c[:, t, :], curr_c) 726 | return node_h, node_c 727 | 728 | x_in = T.dot(input_tv, self.W).astype(config.floatX) + self.b 729 | seq_in = [x_in, mask, T.arange(n_steps)] 730 | if input_tv.ndim == 3: 731 | lstm_step = __step_batch 732 | h_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 733 | c_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 734 | else: 735 | lstm_step = __step 736 | h_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 737 | c_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 738 | print 'lstm step:', lstm_step 739 | rval, _ = theano.scan(lstm_step, 740 | sequences=seq_in, 741 | outputs_info=[h_init, c_init], 742 | non_sequences=[self.U], #, self.p], 743 | go_backwards=self.go_backwards, 744 | name=name_tv(self.name, 'GraphLSTM_layer'), 745 | n_steps=n_steps, 746 | strict=True, 747 | ) 748 | if input_tv.ndim == 3: 749 | self.output_tv = rval[0][-1].dimshuffle(1,0,2) 750 | #self.output_tv = reverse(rval[0][-1].dimshuffle(1,0,2)) if self.go_backwards else rval[0][-1].dimshuffle(1,0,2) 751 | else: 752 | self.output_tv = rval[0][-1].T 753 | #self.output_tv = reverse(rval[0][-1].T) if self.go_backwards else rval[0][-1].T 754 | #print 'in GraphLSTM, variable types:', input_tv.dtype, self.output_tv.dtype 755 | if self.params.get(self.kn('dropout_rate'), 0.0) != 0.0: 756 | print 'DROP OUT!!! at circuite', self.name, 'Drop out rate: ', self.params[self.kn('dropout_rate')] 757 | self.output_tv = _dropout_from_layer(self.params['rng'], self.output_tv, self.params[self.kn('dropout_rate')]) 758 | 759 | def needed_key(self): 760 | return self._needed_key_impl('W', 'U', 'b', 'go_backwards') 761 | 762 | # A special BiasedLinear layer that takes advatages of the fact that the input is a one-hot vector. 
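# (Sketch of the trick, for readers of this layer: if x is a one-hot row vector, then
# x.dot(T) simply selects one row of T, so compute() below indexes T with the integer
# ids directly instead of materialising the one-hot matrix and doing a dense matmul.)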
763 | class Onehot_Linear(Chip): 764 | def prepend(self, previous_chip): 765 | self = super(Onehot_Linear, self).prepend(previous_chip) 766 | self.T = self._declare_mat('T', self.in_dim, self.out_dim) 767 | #self.b = self._declare_mat('b', self.out_dim) 768 | self.parameters = [self.T] 769 | return self 770 | """ Composition of Linear and Bias 771 | It requires a U_initializer and a b_initializer 772 | """ 773 | def compute(self, input_tv): 774 | linear_trans = self.T[input_tv.flatten()].reshape((input_tv.shape[0], input_tv.shape[1], self.out_dim)) 775 | self.output_tv = linear_trans #+ self.b 776 | 777 | def needed_key(self): 778 | return self._needed_key_impl('T') #, 'b') 779 | 780 | # A special autoencoder that takes advantages of the input being a one-hot vector 781 | class AutoEncoder(Chip): 782 | def __init__(self, name, params=None): 783 | super(AutoEncoder, self).__init__(name, params) 784 | self.params[self.name+"_encoder_out_dim"] = params[self.kn('hidden_dim')] 785 | self.params[self.name+"_decoder_out_dim"] = params[self.kn('out_dim')] 786 | self.Encoder = Onehot_Linear(name+'_encoder', self.params) 787 | self.Decoder = BiasedLinear(name+'_decoder', self.params) 788 | 789 | def prepend(self, previous_chip): 790 | self.Decoder.prepend(self.Encoder.prepend(previous_chip)) 791 | self.in_dim = self.Encoder.in_dim 792 | self.parameters = self.Encoder.parameters + self.Decoder.parameters 793 | return self 794 | 795 | # input_tv shape: (batch_size, sent_len, 1), the last dimension indicate which type it is, the value < num_arc_type 796 | def compute(self, input_tv): 797 | self.Encoder.compute(input_tv) 798 | self.Decoder.compute( T.nnet.sigmoid(self.Encoder.output_tv) ) 799 | self.output_tv = T.nnet.sigmoid(self.Decoder.output_tv) 800 | 801 | def needed_key(self): 802 | return self.Encoder.needed_key() + self.Decoder.needed_key() 803 | 804 | 805 | class GraphLSTM_WtdEmbMult(Chip): 806 | def prepend(self, previous_chip): 807 | self = super(GraphLSTM_WtdEmbMult, self).prepend(previous_chip) 808 | print 'graph lstm in dim:', self.in_dim, 'out dim:', self.out_dim 809 | self.go_backwards = self.params[self.kn('go_backwards')] 810 | self.W = self._declare_mat('W', self.in_dim, 4*self.out_dim) 811 | num_arc_types = self.params[self.kn('type_dim')] 812 | temp_U = np.zeros((num_arc_types, self.out_dim, 4*self.out_dim)).astype(theano.config.floatX) 813 | for i in range(num_arc_types): 814 | temp_U[i] = ArrayInit(ArrayInit.ortho).initialize(self.out_dim, 4*self.out_dim, multiplier=1) 815 | self.U = theano.shared(temp_U, name=tparams_make_name(self.name, 'U')) 816 | self.U.is_regularizable = True 817 | self.U = theano.shared(temp_U, name=tparams_make_name(self.name, 'U')) 818 | self.U.is_regularizable = True 819 | self.b = self._declare_mat('b', 4*self.out_dim) #, is_regularizable = False) 820 | #self.p = self._declare_mat('p', 3*self.out_dim) 821 | #type_matrix = T.eye(self.params[self.kn('arc_types')]+1, dtype=theano.config.floatX) 822 | self.T = self._declare_mat('T', self.params[self.kn('arc_types')], num_arc_types) 823 | self.parameters = [self.W, self.U, self.T, self.b] #, self.p) 824 | return self 825 | 826 | # Shapes: x_ = (4*out_dim,), child_h = (out_dim, n_steps, arc_types), child_c = (out_dim, n_steps), child_exists = (n_steps, arc_types) 827 | def recursive_unit(self, x_, child_h, child_c, child_exists, U): 828 | h_tilde = T.sum(child_h, axis=1) / child_exists.sum() #T.cast(, theano.config.floatX) 829 | preact = T.tensordot(h_tilde.T, U, [[0,1],[0,1]]) + x_ 830 | i = 
T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 831 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 832 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim).dimshuffle('x', 0) + T.tensordot(child_h, self.slice(U, 1, self.out_dim), [[0, 2],[1, 0]]) ) 833 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 834 | c = (f.T * child_c).sum(axis=1) / child_exists.sum() + i * z # cell state 835 | o = T.nnet.sigmoid(self.slice(preact, 2, self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 836 | h = o * T.tanh(c) # block output 837 | return h, c 838 | 839 | # Shapes: x_= (batch_size, 4*out_dim,), child_h = (batch_size, sent_len,out_dim+arc_types+1), 840 | # child_c = (batch_size, sent_len, out_dim), child_exists = (batch_size, sent_len) 841 | def recursive_unit_batch(self, x_, child_h, child_c, child_exists, U): 842 | # Result shape = (batch_size, arc_types, out_dim) 843 | h_tilde = T.sum(child_h, axis=1) / child_exists[:, None, None] #T.cast(, theano.config.floatX) 844 | preact = T.tensordot(h_tilde, U, [[1,2],[0,1]]) + x_ 845 | i = T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 846 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 847 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim)[:, None, :] + T.tensordot(child_h, self.slice(U, 1, self.out_dim), [[2,3],[0,1]]) ) #/ child_exists[:, :, None] 848 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 849 | c = (f * child_c).sum(axis=1) / child_exists[:, None] + i * z # cell state 850 | o = T.nnet.sigmoid(self.slice(preact, 2, self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 851 | h = o * T.tanh(c) # block output 852 | return h, c 853 | 854 | def slice(self, matrix, row_idx, stride): 855 | if matrix.ndim == 3: 856 | return matrix[:, :, row_idx * stride:(row_idx + 1) * stride] 857 | elif matrix.ndim == 2: 858 | return matrix[:, row_idx * stride:(row_idx + 1) * stride] 859 | else: 860 | return matrix[row_idx*stride: (row_idx+1)*stride] 861 | 862 | 863 | """ This requires W, U and b initializer 864 | """ 865 | def compute(self, input_tv, mask=None): 866 | n_steps = input_tv.shape[0] 867 | if input_tv.ndim == 3: 868 | n_samples = input_tv.shape[1] 869 | else: 870 | n_samples = 1 871 | 872 | # Shapes: m_ = (n_steps, arc_type), node_h = (out_dim, n_steps) 873 | def __step(x_, m_, t, node_h, node_c, U): 874 | child_h = node_h[:,:,None] * m_ 875 | child_c = node_c * m_.sum(axis=1) 876 | if self.go_backwards: 877 | valid_mask = m_[t+1:, :] 878 | else: 879 | valid_mask = m_[:t, :] 880 | curr_h, curr_c = self.recursive_unit(x_, child_h, child_c, valid_mask, U) 881 | node_h = T.set_subtensor(node_h[:,t], curr_h) 882 | node_c = T.set_subtensor(node_c[:,t], curr_c) 883 | return node_h, node_c 884 | 885 | # Shapes: x_: (batch_size, 4*out_dim), m_: (batch_size, sent_len, 2) 886 | # t: (1,), node_h: (batch_size, sent_len, out_dim), U: (out_dim, 4*out_dim) 887 | def __step_batch(x_, m_, t, node_h, node_c, U, type_matrix): 888 | child_mask = m_[:, :, 0] 889 | dep_type = T.cast(m_[:, :, 1], dtype='int32') 890 | arc_types = type_matrix[dep_type.flatten()].reshape((node_h.shape[0], node_h.shape[1], self.params[self.kn('type_dim')])) 891 | child_h = node_h[:, :, :, None] * arc_types[:, :, None, :] * child_mask[:, :, None, None] 892 | 
#child_h = T.concatenate((node_h, arc_types), axis=2) * child_mask[:, :, None] 893 | child_c = node_c * child_mask[:, :, None] 894 | if self.go_backwards: 895 | valid_mask = child_mask[:, t+1:].sum(axis=1) 896 | else: 897 | valid_mask = child_mask[:, :t].sum(axis=1) 898 | convert_mask = T.zeros_like(valid_mask) 899 | diff_mask = T.cast(T.eq(valid_mask, convert_mask), dtype=theano.config.floatX) 900 | valid_mask += diff_mask 901 | curr_h, curr_c = self.recursive_unit_batch(x_, child_h, child_c, valid_mask, U) 902 | node_h = T.set_subtensor(node_h[:, t, :], curr_h) 903 | node_c = T.set_subtensor(node_c[:, t, :], curr_c) 904 | return node_h, node_c 905 | ################################################################ 906 | 907 | 908 | x_in = T.dot(input_tv, self.W).astype(config.floatX) + self.b 909 | seq_in = [x_in, mask, T.arange(n_steps)] 910 | if input_tv.ndim == 3: 911 | lstm_step = __step_batch 912 | h_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 913 | c_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 914 | else: 915 | lstm_step = __step 916 | h_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 917 | c_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 918 | print 'lstm step:', lstm_step 919 | rval, _ = theano.scan(lstm_step, 920 | sequences=seq_in, 921 | outputs_info=[h_init, c_init], 922 | non_sequences=[self.U, self.T], #A], #, self.p], 923 | go_backwards=self.go_backwards, 924 | name=name_tv(self.name, 'WeightedAddGraphLSTM_layer'), 925 | n_steps=n_steps, 926 | strict=True, 927 | ) 928 | if input_tv.ndim == 3: 929 | self.output_tv = rval[0][-1].dimshuffle(1,0,2) 930 | #self.output_tv = reverse(rval[0][-1].dimshuffle(1,0,2)) if self.go_backwards else rval[0][-1].dimshuffle(1,0,2) 931 | else: 932 | self.output_tv = rval[0][-1].T 933 | #self.output_tv = reverse(rval[0][-1].T) if self.go_backwards else rval[0][-1].T 934 | #print 'in GraphLSTM, variable types:', input_tv.dtype, self.output_tv.dtype 935 | if self.params.get(self.kn('dropout_rate'), 0.0) != 0.0: 936 | print 'DROP OUT!!! at circuite', self.name, 'Drop out rate: ', self.params[self.kn('dropout_rate')] 937 | self.output_tv = _dropout_from_layer(self.params['rng'], self.output_tv, self.params[self.kn('dropout_rate')]) 938 | 939 | def needed_key(self): 940 | return self._needed_key_impl('W', 'U', 'b', 'T', 'go_backwards', 'arc_types') 941 | 942 | 943 | class GraphLSTM_WtdAdd(Chip): 944 | def prepend(self, previous_chip): 945 | self = super(GraphLSTM_WtdAdd, self).prepend(previous_chip) 946 | print 'graph lstm in dim:', self.in_dim, 'out dim:', self.out_dim 947 | self.go_backwards = self.params[self.kn('go_backwards')] 948 | self.W = self._declare_mat('W', self.in_dim, 4*self.out_dim) 949 | num_arc_types = self.params[self.kn('type_dim')] 950 | temp_U = np.zeros((num_arc_types+self.out_dim, 4*self.out_dim)).astype(theano.config.floatX) 951 | self.U = theano.shared(temp_U, name=tparams_make_name(self.name, 'U')) 952 | self.U.is_regularizable = True 953 | self.b = self._declare_mat('b', 4*self.out_dim) #, is_regularizable = False) 954 | #self.p = self._declare_mat('p', 3*self.out_dim) 955 | #type_matrix = T.eye(self.params[self.kn('arc_types')]+1, dtype=theano.config.floatX) 956 | self.T = self._declare_mat('T', self.params[self.kn('arc_types')], num_arc_types) 957 | #self.TA = T.concatenate([T.zeros([1, self.params[self.kn('type_dim')]], dtype=theano.config.floatX), self.T], axis=0) 958 | #self.T = T.set_subtensor(self.T[0], 0.) 
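# Note on shapes as used below: U is (type_dim + out_dim, 4*out_dim) because the batch
# step concatenates each child hidden state with its learned arc-type embedding from T
# before the recurrent transform, i.e. the arc type is folded additively into the
# preactivation rather than selecting a separate U per arc type (contrast GraphLSTM_Wtd).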
959 | self.parameters = [self.W, self.U, self.T, self.b] #, self.p) 960 | return self 961 | 962 | # Shapes: x_ = (4*out_dim,), child_h = (out_dim, n_steps, arc_types), child_c = (out_dim, n_steps), child_exists = (n_steps, arc_types) 963 | def recursive_unit(self, x_, child_h, child_c, child_exists, U): 964 | h_tilde = T.sum(child_h, axis=1) / child_exists.sum() #T.cast(, theano.config.floatX) 965 | preact = T.tensordot(h_tilde.T, U, [[0,1],[0,1]]) + x_ 966 | i = T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 967 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 968 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim).dimshuffle('x', 0) + T.tensordot(child_h, self.slice(U, 1, self.out_dim), [[0, 2],[1, 0]]) ) 969 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 970 | c = (f.T * child_c).sum(axis=1) / child_exists.sum() + i * z # cell state 971 | o = T.nnet.sigmoid(self.slice(preact, 2, self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 972 | h = o * T.tanh(c) # block output 973 | return h, c 974 | 975 | # Shapes: x_= (batch_size, 4*out_dim,), child_h = (batch_size, sent_len,out_dim+arc_types+1), 976 | # child_c = (batch_size, sent_len, out_dim), child_exists = (batch_size, sent_len) 977 | def recursive_unit_batch(self, x_, child_h, child_c, child_exists, U): 978 | # Result shape = (batch_size, arc_types, out_dim) 979 | h_tilde = T.sum(child_h, axis=1) / child_exists[:, None] #T.cast(, theano.config.floatX) 980 | preact = T.dot(h_tilde, U) + x_ 981 | i = T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 982 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 983 | #f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim)[:, None, :] + T.tensordot(child_h, self.slice(U, 1, self.out_dim), [[2,3],[0,1]]) ) #/ child_exists[:, :, None] 984 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim)[:, None, :] + T.dot(child_h, self.slice(U, 1, self.out_dim)) ) 985 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 986 | c = (f * child_c).sum(axis=1) / child_exists[:, None] + i * z # cell state 987 | o = T.nnet.sigmoid(self.slice(preact, 2, self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 988 | h = o * T.tanh(c) # block output 989 | return h, c 990 | 991 | def slice(self, matrix, row_idx, stride): 992 | if matrix.ndim == 3: 993 | return matrix[:, :, row_idx * stride:(row_idx + 1) * stride] 994 | elif matrix.ndim == 2: 995 | return matrix[:, row_idx * stride:(row_idx + 1) * stride] 996 | else: 997 | return matrix[row_idx*stride: (row_idx+1)*stride] 998 | 999 | 1000 | """ This requires W, U and b initializer 1001 | """ 1002 | def compute(self, input_tv, mask=None): 1003 | n_steps = input_tv.shape[0] 1004 | if input_tv.ndim == 3: 1005 | n_samples = input_tv.shape[1] 1006 | else: 1007 | n_samples = 1 1008 | 1009 | # Shapes: m_ = (n_steps, arc_type), node_h = (out_dim, n_steps) 1010 | def __step(x_, m_, t, node_h, node_c, U): 1011 | child_h = node_h[:,:,None] * m_ 1012 | child_c = node_c * m_.sum(axis=1) 1013 | if self.go_backwards: 1014 | valid_mask = m_[t+1:, :] 1015 | else: 1016 | valid_mask = m_[:t, :] 1017 | curr_h, curr_c = self.recursive_unit(x_, child_h, child_c, valid_mask, U) 1018 | node_h = T.set_subtensor(node_h[:,t], curr_h) 1019 | node_c = 
T.set_subtensor(node_c[:,t], curr_c) 1020 | return node_h, node_c 1021 | 1022 | # Shapes: x_: (batch_size, 4*out_dim), m_: (batch_size, sent_len, 2) 1023 | # t: (1,), node_h: (batch_size, sent_len, out_dim), U: (out_dim, 4*out_dim) 1024 | def __step_batch(x_, m_, t, node_h, node_c, U, type_matrix): 1025 | child_mask = m_[:, :, 0] 1026 | dep_type = T.cast(m_[:, :, 1], dtype='int32') 1027 | arc_types = type_matrix[dep_type.flatten()].reshape((node_h.shape[0], node_h.shape[1], self.params[self.kn('type_dim')])) 1028 | child_h = T.concatenate((node_h, arc_types), axis=2) * child_mask[:, :, None] 1029 | child_c = node_c * child_mask[:, :, None] 1030 | if self.go_backwards: 1031 | valid_mask = child_mask[:, t+1:].sum(axis=1) 1032 | else: 1033 | valid_mask = child_mask[:, :t].sum(axis=1) 1034 | convert_mask = T.zeros_like(valid_mask) 1035 | diff_mask = T.cast(T.eq(valid_mask, convert_mask), dtype=theano.config.floatX) 1036 | valid_mask += diff_mask 1037 | curr_h, curr_c = self.recursive_unit_batch(x_, child_h, child_c, valid_mask, U) 1038 | node_h = T.set_subtensor(node_h[:, t, :], curr_h) 1039 | node_c = T.set_subtensor(node_c[:, t, :], curr_c) 1040 | return node_h, node_c 1041 | 1042 | x_in = T.dot(input_tv, self.W).astype(config.floatX) + self.b 1043 | seq_in = [x_in, mask, T.arange(n_steps)] 1044 | if input_tv.ndim == 3: 1045 | lstm_step = __step_batch 1046 | h_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 1047 | c_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 1048 | else: 1049 | lstm_step = __step 1050 | h_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 1051 | c_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 1052 | print 'lstm step:', lstm_step 1053 | rval, _ = theano.scan(lstm_step, 1054 | sequences=seq_in, 1055 | outputs_info=[h_init, c_init], 1056 | non_sequences=[self.U, self.T], #A], #, self.p], 1057 | go_backwards=self.go_backwards, 1058 | name=name_tv(self.name, 'WeightedAddGraphLSTM_layer'), 1059 | n_steps=n_steps, 1060 | strict=True, 1061 | ) 1062 | if input_tv.ndim == 3: 1063 | self.output_tv = rval[0][-1].dimshuffle(1,0,2) 1064 | #self.output_tv = reverse(rval[0][-1].dimshuffle(1,0,2)) if self.go_backwards else rval[0][-1].dimshuffle(1,0,2) 1065 | else: 1066 | self.output_tv = rval[0][-1].T 1067 | #self.output_tv = reverse(rval[0][-1].T) if self.go_backwards else rval[0][-1].T 1068 | #print 'in GraphLSTM, variable types:', input_tv.dtype, self.output_tv.dtype 1069 | if self.params.get(self.kn('dropout_rate'), 0.0) != 0.0: 1070 | print 'DROP OUT!!! 
at circuite', self.name, 'Drop out rate: ', self.params[self.kn('dropout_rate')] 1071 | self.output_tv = _dropout_from_layer(self.params['rng'], self.output_tv, self.params[self.kn('dropout_rate')]) 1072 | 1073 | def needed_key(self): 1074 | return self._needed_key_impl('W', 'U', 'b', 'T', 'go_backwards', 'arc_types') 1075 | 1076 | 1077 | class GraphLSTM_Wtd(Chip): 1078 | def prepend(self, previous_chip): 1079 | self = super(GraphLSTM_Wtd, self).prepend(previous_chip) 1080 | print 'graph lstm in dim:', self.in_dim, 'out dim:', self.out_dim 1081 | self.go_backwards = self.params[self.kn('go_backwards')] 1082 | self.W = self._declare_mat('W', self.in_dim, 4*self.out_dim) 1083 | num_arc_types = self.params[self.kn('arc_types')] 1084 | temp_U = np.zeros((num_arc_types, self.out_dim, 4*self.out_dim)).astype(theano.config.floatX) 1085 | for i in range(num_arc_types): 1086 | temp_U[i] = ArrayInit(ArrayInit.ortho).initialize(self.out_dim, 4*self.out_dim, multiplier=1) 1087 | self.U = theano.shared(temp_U, name=tparams_make_name(self.name, 'U')) 1088 | self.U.is_regularizable = True 1089 | self.b = self._declare_mat('b', 4*self.out_dim) #, is_regularizable = False) 1090 | #self.p = self._declare_mat('p', 3*self.out_dim) 1091 | self.parameters = [self.W, self.U, self.b] #, self.p) 1092 | return self 1093 | 1094 | # Shapes: x_ = (4*out_dim,), child_h = (out_dim, n_steps, arc_types), child_c = (out_dim, n_steps), child_exists = (n_steps, arc_types) 1095 | def recursive_unit(self, x_, child_h, child_c, child_exists, U): 1096 | h_tilde = T.sum(child_h, axis=1) / child_exists.sum() #T.cast(, theano.config.floatX) 1097 | preact = T.tensordot(h_tilde.T, U, [[0,1],[0,1]]) + x_ 1098 | i = T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 1099 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 1100 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim).dimshuffle('x', 0) + T.tensordot(child_h, self.slice(U, 1, self.out_dim), [[0, 2],[1, 0]]) ) 1101 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 1102 | c = (f.T * child_c).sum(axis=1) / child_exists.sum() + i * z # cell state 1103 | o = T.nnet.sigmoid(self.slice(preact, 2, self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 1104 | h = o * T.tanh(c) # block output 1105 | return h, c 1106 | 1107 | # Shapes: x_= (batch_size, 4*out_dim,), child_h = (batch_size, sent_len, arc_types, out_dim), 1108 | # child_c = (batch_size, sent_len, out_dim), child_exists = (batch_size, sent_len, arc_types) 1109 | def recursive_unit_batch(self, x_, child_h, child_c, child_exists, U): 1110 | # Result shape = (batch_size, arc_types, out_dim) 1111 | h_tilde = T.sum(child_h, axis=1) / child_exists[:, None, None] #T.cast(, theano.config.floatX) 1112 | preact = T.tensordot(h_tilde, U, [[1,2],[0,1]]) + x_ 1113 | i = T.nnet.sigmoid(self.slice(preact, 0, self.out_dim) ) #+ self.slice(self.p, 0, self.out_dim)*c_prev) # Input gate 1114 | #f = T.nnet.sigmoid(self.slice(preact, 1, self.out_dim) )#+ self.slice(self.p, 1, self.out_dim)*c_prev) # Forget gate 1115 | f = T.nnet.sigmoid(self.slice(x_, 1, self.out_dim)[:, None, :] + T.tensordot(child_h, self.slice(U, 1, self.out_dim), [[2,3],[0,1]]) ) #/ child_exists[:, :, None] 1116 | z = T.tanh(self.slice(preact, 3, self.out_dim)) # block input 1117 | c = (f * child_c).sum(axis=1) / child_exists[:, None] + i * z # cell state 1118 | o = T.nnet.sigmoid(self.slice(preact, 2, 
self.out_dim) )#+ self.slice(self.p, 2, self.out_dim) * c) # output gate 1119 | h = o * T.tanh(c) # block output 1120 | return h, c 1121 | 1122 | def slice(self, matrix, row_idx, stride): 1123 | if matrix.ndim == 3: 1124 | return matrix[:, :, row_idx * stride:(row_idx + 1) * stride] 1125 | elif matrix.ndim == 2: 1126 | return matrix[:, row_idx * stride:(row_idx + 1) * stride] 1127 | else: 1128 | return matrix[row_idx*stride: (row_idx+1)*stride] 1129 | 1130 | 1131 | """ This requires W, U and b initializer 1132 | """ 1133 | def compute(self, input_tv, mask=None): 1134 | n_steps = input_tv.shape[0] 1135 | if input_tv.ndim == 3: 1136 | n_samples = input_tv.shape[1] 1137 | else: 1138 | n_samples = 1 1139 | 1140 | # Shapes: m_ = (n_steps, arc_type), node_h = (out_dim, n_steps) 1141 | def __step(x_, m_, t, node_h, node_c, U): 1142 | child_h = node_h[:,:,None] * m_ 1143 | child_c = node_c * m_.sum(axis=1) 1144 | if self.go_backwards: 1145 | valid_mask = m_[t+1:, :] 1146 | else: 1147 | valid_mask = m_[:t, :] 1148 | curr_h, curr_c = self.recursive_unit(x_, child_h, child_c, valid_mask, U) 1149 | node_h = T.set_subtensor(node_h[:,t], curr_h) 1150 | node_c = T.set_subtensor(node_c[:,t], curr_c) 1151 | return node_h, node_c 1152 | 1153 | # Shapes: x_: (batch_size, 4*out_dim), m_: (batch_size, sent_len, arc_type) 1154 | # t: (1,), node_h: (batch_size, sent_len, out_dim), U: (out_dim, 4*out_dim) 1155 | def __step_batch(x_, m_, t, node_h, node_c, U): 1156 | child_h = node_h[:, :, None, :] * m_[:, :, :, None] 1157 | child_c = node_c * m_.sum(axis=2)[:, :, None] 1158 | if self.go_backwards: 1159 | valid_mask = m_[:, t+1:, :].sum(axis=(1,2)) 1160 | else: 1161 | valid_mask = m_[:, :t, :].sum(axis=(1,2)) 1162 | convert_mask = T.zeros_like(valid_mask) 1163 | diff_mask = T.cast(T.eq(valid_mask, convert_mask), dtype=theano.config.floatX) 1164 | valid_mask += diff_mask 1165 | curr_h, curr_c = self.recursive_unit_batch(x_, child_h, child_c, valid_mask, U) 1166 | node_h = T.set_subtensor(node_h[:, t, :], curr_h) 1167 | node_c = T.set_subtensor(node_c[:, t, :], curr_c) 1168 | return node_h, node_c 1169 | 1170 | x_in = T.dot(input_tv, self.W).astype(config.floatX) + self.b 1171 | seq_in = [x_in, mask, T.arange(n_steps)] 1172 | if input_tv.ndim == 3: 1173 | lstm_step = __step_batch 1174 | h_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 1175 | c_init = T.alloc(np_floatX(0.), n_samples, n_steps, self.out_dim) 1176 | else: 1177 | lstm_step = __step 1178 | h_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 1179 | c_init = T.alloc(np_floatX(0.), self.out_dim, n_steps) 1180 | print 'lstm step:', lstm_step 1181 | rval, _ = theano.scan(lstm_step, 1182 | sequences=seq_in, 1183 | outputs_info=[h_init, c_init], 1184 | non_sequences=[self.U], #, self.p], 1185 | go_backwards=self.go_backwards, 1186 | name=name_tv(self.name, 'WeightedGraphLSTM_layer'), 1187 | n_steps=n_steps, 1188 | strict=True, 1189 | ) 1190 | if input_tv.ndim == 3: 1191 | self.output_tv = rval[0][-1].dimshuffle(1,0,2) 1192 | #self.output_tv = reverse(rval[0][-1].dimshuffle(1,0,2)) if self.go_backwards else rval[0][-1].dimshuffle(1,0,2) 1193 | else: 1194 | self.output_tv = rval[0][-1].T 1195 | #self.output_tv = reverse(rval[0][-1].T) if self.go_backwards else rval[0][-1].T 1196 | #print 'in GraphLSTM, variable types:', input_tv.dtype, self.output_tv.dtype 1197 | if self.params.get(self.kn('dropout_rate'), 0.0) != 0.0: 1198 | print 'DROP OUT!!! 
at circuite', self.name, 'Drop out rate: ', self.params[self.kn('dropout_rate')] 1199 | self.output_tv = _dropout_from_layer(self.params['rng'], self.output_tv, self.params[self.kn('dropout_rate')]) 1200 | 1201 | def needed_key(self): 1202 | return self._needed_key_impl('W', 'U', 'b', 'go_backwards', 'arc_types') 1203 | 1204 | 1205 | class BiLSTM(Chip): 1206 | def __init__(self, name, params=None): 1207 | super(BiLSTM, self).__init__(name, params) 1208 | print 'print bilstm parameters:', self.params 1209 | self.params[self.name+"_forward_go_backwards"] = False 1210 | self.params[self.name+"_backward_go_backwards"] = True 1211 | self.params[self.name+"_forward_out_dim"] = params[self.kn('out_dim')] 1212 | self.params[self.name+"_backward_out_dim"] = params[self.kn('out_dim')] 1213 | self.forward_chip = LSTM(self.name+"_forward", self.params) # 1214 | self.backward_chip = LSTM(self.name+"_backward", self.params) # 1215 | 1216 | def prepend(self, previous_chip): 1217 | self.forward_chip.prepend(previous_chip) 1218 | self.backward_chip.prepend(previous_chip) 1219 | self.in_dim = self.forward_chip.in_dim 1220 | self.out_dim = self.forward_chip.out_dim + self.backward_chip.out_dim 1221 | self.parameters = self.forward_chip.parameters + self.backward_chip.parameters 1222 | return self 1223 | 1224 | """ This requires W, U and b initializer 1225 | """ 1226 | def compute(self, input_tv, mask=None): 1227 | # Before creating the sub LSTM's set the out_dim to half 1228 | # Basically this setting would be used by the sub LSTMs 1229 | self.forward_chip.compute(input_tv, mask) 1230 | self.backward_chip.compute(input_tv, mask) 1231 | # Without mini-batch, the output shape is (sent_len, hidden_dim) 1232 | # With mini-batch, the shape is (sent_len, num_sample, hidden_dim) 1233 | self.output_tv = T.concatenate([self.forward_chip.output_tv, self.backward_chip.output_tv], axis=-1) 1234 | if self.params.get(self.kn('dropout_rate'), 0.0) != 0.0: 1235 | print 'DROP OUT!!! 
at circuite', self.name, 'Drop out rate: ', self.params[self.kn('dropout_rate')] 1236 | self.output_tv = _dropout_from_layer(self.params['rng'], self.output_tv, self.params[self.kn('dropout_rate')]) 1237 | #self.out_dim = self.forward_chip.out_dim + self.backward_chip.out_dim 1238 | 1239 | def needed_key(self): 1240 | return self.forward_chip.needed_key() + self.backward_chip.needed_key() 1241 | 1242 | 1243 | class BiGraphLSTM(BiLSTM): 1244 | def __init__(self, name, params=None): 1245 | super(BiGraphLSTM, self).__init__(name, params) 1246 | print 'print bi-graph-lstm parameters:', self.params 1247 | self.params[self.name+"_forward_go_backwards"] = False 1248 | self.params[self.name+"_backward_go_backwards"] = True 1249 | self.params[self.name+"_forward_out_dim"] = params[self.kn('out_dim')] 1250 | self.params[self.name+"_backward_out_dim"] = params[self.kn('out_dim')] 1251 | self.forward_chip = GraphLSTM(self.name+"_forward", self.params) # 1252 | self.backward_chip = GraphLSTM(self.name+"_backward", self.params) # 1253 | #self.params[self.kn('win')] = 2 1254 | 1255 | 1256 | class BiGraphLSTM_Wtd(BiLSTM): 1257 | def __init__(self, name, params=None): 1258 | super(BiGraphLSTM_Wtd, self).__init__(name, params) 1259 | print 'print bi-weighted-graph-lstm parameters:', self.params 1260 | self.params[self.name+"_forward_go_backwards"] = False 1261 | self.params[self.name+"_backward_go_backwards"] = True 1262 | self.params[self.name+"_forward_out_dim"] = params[self.kn('out_dim')] 1263 | self.params[self.name+"_backward_out_dim"] = params[self.kn('out_dim')] 1264 | self.params[self.name+"_forward_arc_types"] = params[self.kn('arc_types')] 1265 | self.params[self.name+"_backward_arc_types"] = params[self.kn('arc_types')] 1266 | self.forward_chip = GraphLSTM_Wtd(self.name+"_forward", self.params) # 1267 | self.backward_chip = GraphLSTM_Wtd(self.name+"_backward", self.params) # 1268 | #self.params[self.kn('win')] = 2 1269 | 1270 | 1271 | class BiGraphLSTM_WtdAdd(BiLSTM): 1272 | def __init__(self, name, params=None): 1273 | super(BiGraphLSTM_WtdAdd, self).__init__(name, params) 1274 | print 'print bi-weighted-graph-lstm parameters:', self.params 1275 | self.params[self.name+"_forward_go_backwards"] = False 1276 | self.params[self.name+"_backward_go_backwards"] = True 1277 | self.params[self.name+"_forward_out_dim"] = params[self.kn('out_dim')] 1278 | self.params[self.name+"_backward_out_dim"] = params[self.kn('out_dim')] 1279 | self.params[self.name+"_forward_arc_types"] = params[self.kn('arc_types')] 1280 | self.params[self.name+"_backward_arc_types"] = params[self.kn('arc_types')] 1281 | self.params[self.name+"_forward_type_dim"] = params[self.kn('type_dim')] 1282 | self.params[self.name+"_backward_type_dim"] = params[self.kn('type_dim')] 1283 | self.forward_chip = GraphLSTM_WtdAdd(self.name+"_forward", self.params) # 1284 | self.backward_chip = GraphLSTM_WtdAdd(self.name+"_backward", self.params) # 1285 | #self.params[self.kn('win')] = 2 1286 | 1287 | 1288 | class BiGraphLSTM_WtdEmbMult(BiLSTM): 1289 | def __init__(self, name, params=None): 1290 | super(BiGraphLSTM_WtdEmbMult, self).__init__(name, params) 1291 | print 'print bi-weighted-graph-lstm parameters:', self.params 1292 | self.params[self.name+"_forward_go_backwards"] = False 1293 | self.params[self.name+"_backward_go_backwards"] = True 1294 | self.params[self.name+"_forward_out_dim"] = params[self.kn('out_dim')] 1295 | self.params[self.name+"_backward_out_dim"] = params[self.kn('out_dim')] 1296 | 
self.params[self.name+"_forward_arc_types"] = params[self.kn('arc_types')] 1297 | self.params[self.name+"_backward_arc_types"] = params[self.kn('arc_types')] 1298 | self.params[self.name+"_forward_type_dim"] = params[self.kn('type_dim')] 1299 | self.params[self.name+"_backward_type_dim"] = params[self.kn('type_dim')] 1300 | self.forward_chip = GraphLSTM_WtdEmbMult(self.name+"_forward", self.params) 1301 | self.backward_chip = GraphLSTM_WtdEmbMult(self.name+"_backward", self.params) # 1302 | #self.params[self.kn('win')] = 2 1303 | 1304 | 1305 | 1306 | class L2Reg(Chip): 1307 | """ This supposes that the previous chip would have a score attribute. 1308 | And it literally only changes the score attribute by adding the regularization term 1309 | on top of it. 1310 | """ 1311 | def prepend(self, previous_chip): 1312 | self.previous_chip = previous_chip 1313 | super(L2Reg, self).prepend(previous_chip) 1314 | return self 1315 | 1316 | def compute(self, input_tv): 1317 | L2 = T.sum(T.stack([T.sum(self.params[k]*self.params[k]) 1318 | for k 1319 | in self.regularizable_variables()])) 1320 | L2.name = tparams_make_name(self.name, 'L2') 1321 | self.score = self.previous_chip.score + self.params[self.kn('reg_weight')] * L2 1322 | 1323 | def __getattr__(self, item): 1324 | """ Inherit all the attributes of the previous chip. 1325 | At present I can only see this functionality being useful 1326 | for the case of the Slice and Regularization chip. Maybe we would move 1327 | this down in case it is found necessary later on, but there is 1328 | chance of abuse. 1329 | """ 1330 | try: 1331 | return getattr(self.previous_chip, item) 1332 | except KeyError: 1333 | raise AttributeError(item) 1334 | 1335 | def needed_key(self): 1336 | return self._needed_key_impl('reg_weight') 1337 | 1338 | -------------------------------------------------------------------------------- /theano_src/train_util.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """ 4 | The implementation of the shared model training code. 5 | """ 6 | 7 | import sys 8 | import datetime 9 | import time 10 | import codecs 11 | import theano 12 | import theano.tensor as T 13 | import numpy as np 14 | import random 15 | import warnings 16 | import cPickle as pickle 17 | from theano import config 18 | from neural_lib import ArrayInit 19 | 20 | def np_floatX(data): 21 | return np.asarray(data, dtype=config.floatX) 22 | 23 | def dict_from_argparse(apobj): 24 | return dict(apobj._get_kwargs()) 25 | 26 | def get_minibatches_idx(n, minibatch_size, shuffle=False): 27 | """ 28 | Used to shuffle the dataset at each iteration. 29 | """ 30 | idx_list = np.arange(n, dtype="int32") 31 | if shuffle: 32 | np.random.shuffle(idx_list) 33 | minibatches = [] 34 | minibatch_start = 0 35 | for i in range(n // minibatch_size): 36 | minibatches.append(idx_list[minibatch_start: minibatch_start + minibatch_size]) 37 | minibatch_start += minibatch_size 38 | if (minibatch_start != n): 39 | # Make a minibatch out of what is left 40 | minibatches.append(idx_list[minibatch_start:]) 41 | 42 | return minibatches 43 | 44 | 45 | def shuffle(lol, seed): 46 | ''' 47 | lol :: list of list as input 48 | shuffle inplace each list in the same order by ensuring that we use the same state for every run of shuffle. 
49 | ''' 50 | random.seed(seed) 51 | state = random.getstate() 52 | for l in lol: 53 | random.setstate(state) 54 | random.shuffle(l) 55 | 56 | def convert_id_to_word(corpus, idx2label): 57 | return [[idx2label[word] for word in sentence] 58 | for sentence 59 | in corpus] 60 | 61 | def add_arg(name, default=None, **kwarg): 62 | assert hasattr(add_arg, 'arg_parser'), "You must register an arg_parser with add_arg before calling it" 63 | import ast 64 | if 'action' in kwarg: 65 | add_arg.arg_parser.add_argument(name, default=default, **kwarg) 66 | elif type(default)==bool: 67 | add_arg.arg_parser.add_argument(name, default=default, type=ast.literal_eval, **kwarg) 68 | else: 69 | add_arg.arg_parser.add_argument(name, default=default, type=type(default), **kwarg) 70 | return 71 | 72 | def add_arg_to_L(L, name, default=None, **kwarg): 73 | L.append(name[2:]) 74 | add_arg(name, default, **kwarg) 75 | return 76 | 77 | def read_matrix_from_pkl(fn, dic): 78 | ''' 79 | Assume that the file contains words in first column, 80 | and embeddings in the rest and that dic maps words to indices. 81 | ''' 82 | voc, data = pickle.load(codecs.open(fn)) 83 | dim = len(data[0]) 84 | # NOTE: The norm of onesided_uniform rv is sqrt(n)/sqrt(3) 85 | # Since the expected value of X^2 = 1/3 where X ~ U[0, 1] 86 | # => sum(X_i^2) = dim/3 87 | # => norm = sqrt(dim/3) 88 | # => norm/dim = sqrt(1/3dim) 89 | multiplier = np.sqrt(1.0/(3*dim)) 90 | M = ArrayInit(ArrayInit.onesided_uniform, multiplier=1.0/dim).initialize(len(dic), dim) 91 | not_in_dict = 0 92 | for i, e in enumerate(data): 93 | r = np.array([float(_e) for _e in e]) 94 | if voc[i] in dic: 95 | idx = dic[voc[i]] 96 | M[idx] = (r/np.linalg.norm(r)) * multiplier 97 | else: 98 | not_in_dict += 1 99 | print 'load embedding! %d words, %d not in the dictionary. Dictionary size: %d' %(len(voc), not_in_dict, len(dic)) 100 | return M 101 | 102 | def batch_run_func(corpora, func, *parameters ): 103 | converted_corpora = [] 104 | for corpus in corpora: 105 | converted_corpora.append(func(corpus, *parameters)) 106 | return converted_corpora 107 | 108 | 109 | def read_matrix_from_gzip(fn, dic): 110 | import gzip 111 | not_in_dict = 0 112 | with gzip.open(fn) as inf: 113 | row, column = inf.readline().rstrip().split() 114 | dim = int(column) 115 | multiplier = np.sqrt(1.0/3) 116 | #print row, column 117 | idx_map = dict() 118 | line_count = 0 119 | M = ArrayInit(ArrayInit.onesided_uniform, multiplier=1.0/dim).initialize(len(dic)+2, dim) 120 | for line in inf: 121 | elems = line.rstrip().lower().split(' ') 122 | if elems[0] in dic: 123 | idx = dic[elems[0]] 124 | vec_elem = elems[1:] #.split(',') 125 | r = np.array([float(_e) for _e in vec_elem]) 126 | M[idx] = (r/np.linalg.norm(r)) * multiplier 127 | idx_map[idx] = line_count 128 | else: 129 | not_in_dict += 1 130 | line_count += 1 131 | print 'load embedding! %s words, %d not in the dictionary. 
Dictionary size: %d' %(row, not_in_dict, len(dic)) 132 | print 'embedding matrix shape:', M.shape, 'word map size:', len(idx_map) 133 | return M, idx_map 134 | 135 | 136 | def read_matrix_from_file(fn, dic, ecd='utf-8'): 137 | not_in_dict = 0 138 | with codecs.open(fn, encoding=ecd, errors='ignore') as inf: 139 | row, column = inf.readline().rstrip().split() 140 | dim = int(column) 141 | multiplier = np.sqrt(1.0/3) 142 | #print row, column 143 | idx_map = dict() 144 | line_count = 0 145 | M = ArrayInit(ArrayInit.onesided_uniform, multiplier=1.0/dim).initialize(len(dic)+2, dim) 146 | for line in inf: 147 | elems = line.rstrip().split(' ') 148 | if elems[0] in dic: 149 | idx = dic[elems[0]] 150 | vec_elem = elems[1:] #.split(',') 151 | r = np.array([float(_e) for _e in vec_elem]) 152 | M[idx] = (r/np.linalg.norm(r)) * multiplier 153 | idx_map[idx] = line_count 154 | else: 155 | not_in_dict += 1 156 | line_count += 1 157 | print 'load embedding! %s words, %d not in the dictionary. Dictionary size: %d' %(row, not_in_dict, len(dic)) 158 | print 'embedding matrix shape:', M.shape, 'word map size:', len(idx_map) 159 | return M, idx_map 160 | 161 | def read_matrix_and_idmap_from_file(fn, dic=dict(), ecd='utf-8'): 162 | idx_map = dic.copy() 163 | with codecs.open(fn, encoding=ecd, errors='ignore') as inf: 164 | row, column = inf.readline().rstrip().split(' ') 165 | dim = int(column) 166 | multiplier = 1 #np.sqrt(1.0/3) #(2.0/dim) 167 | #print row, column 168 | # note: here +3 include OOV, BOS and EOS 169 | content = [] 170 | not_in_dict = 0 171 | for line in inf: 172 | elems = line.rstrip().split(' ') 173 | content.append(elems) 174 | if elems[0] not in idx_map: 175 | idx_map[elems[0]] = len(idx_map) 176 | not_in_dict += 1 177 | print 'load embedding! %s words, %d not in the dictionary. Dictionary size: %d' %(row, not_in_dict, len(dic)) 178 | assert len(idx_map) == len(dic) + not_in_dict 179 | M = ArrayInit(ArrayInit.onesided_uniform, multiplier=1.0/dim).initialize(len(idx_map)+2, dim) #*2) 180 | for elems in content: 181 | if elems[0] in idx_map: 182 | idx = idx_map[elems[0]] 183 | vec_elem = elems[1:] #.split(',') 184 | #print len(vec_elem), elems 185 | r = np.array([float(_e) for _e in vec_elem]) 186 | M[idx,:dim] = (r/np.linalg.norm(r)) * multiplier 187 | else: 188 | print 'error in load embedding matrix!!' 
189 | print 'embedding matrix shape:', M.shape, 'word map size:', len(idx_map) 190 | return M, idx_map 191 | 192 | 193 | def read_idxmap_from_file(fn): 194 | idx_map = dict() 195 | with open(fn, 'r') as inf: 196 | for line in inf: 197 | elems = line.strip().split() 198 | idx_map[elems[0]] = elems[1] 199 | 200 | 201 | def write_matrix_to_file(fn, matrix, idx_map): 202 | with open(fn, 'w') as outf: #, open(fn+'.idx', 'w') as idxf: 203 | dim = matrix.shape 204 | #assert matrix.shape[0] == len(idx_map) 205 | outf.write(str(len(idx_map)) + ' ' + str(dim[1]) + '\n') 206 | for i, row in enumerate(matrix): 207 | if i in idx_map: 208 | #print idx_map[i] 209 | outf.write(str(idx_map[i])) 210 | for el in row: 211 | outf.write(' ' + str(el)) 212 | outf.write('\n') 213 | #for k, v in idx_map.items(): 214 | # idxf.write(str(k)+' '+str(v)+'\n') 215 | 216 | def conv_emb(lex_arry, M_emb, win_size): 217 | lexv = [] 218 | for x in lex_arry: 219 | embs = _conv_x(x, M_emb, win_size) 220 | lexv.append(embs) 221 | return lexv 222 | 223 | def conv_data(corpus, win_l, win_r): 224 | lexv = [] 225 | # labv = [] 226 | idxv = [] 227 | temp_lex, temp_y, temp_idx = corpus 228 | for i, (x,y,idx) in enumerate(zip(temp_lex, temp_y, temp_idx)): 229 | words = conv_x(x, win_l, win_r) 230 | assert len(words) == len(x) 231 | lexv.append(words) 232 | npidx = [] 233 | for ii in idx: 234 | npidx.append(np.array(ii).astype('int32')) 235 | idxv.append(npidx) 236 | assert len(lexv) == len(idxv) 237 | assert len(lexv) == len(temp_y) 238 | return [lexv, temp_y, idxv] 239 | 240 | def conv_data_graph(corpus, win_l, win_r): 241 | lexv = [] 242 | # labv = [] 243 | idxv = [] 244 | temp_lex, temp_y, temp_idx, temp_dep = corpus 245 | for i, (x,y,idx) in enumerate(zip(temp_lex, temp_y, temp_idx)): 246 | words = conv_x(x, win_l, win_r) 247 | assert len(words) == len(x) 248 | lexv.append(words) 249 | npidx = [] 250 | for ii in idx: 251 | npidx.append(np.array(ii).astype('int32')) 252 | idxv.append(npidx) 253 | assert len(lexv) == len(idxv) 254 | assert len(lexv) == len(temp_y) 255 | return [lexv, temp_y, idxv, temp_dep] 256 | 257 | def _contextwin(l, win_l, win_r): 258 | ''' 259 | win :: int corresponding to the size of the window 260 | given a list of indexes composing a sentence 261 | 262 | l :: array containing the word indexes 263 | 264 | it will return a list of list of indexes corresponding 265 | to context windows surrounding each word in the sentence 266 | ''' 267 | assert win_l <= 0 268 | assert win_r >= 0 269 | l = list(l) 270 | 271 | win_size = win_r - win_l + 1 272 | lpadded = -win_l * [-2] + l + win_r * [-1] 273 | out = [lpadded[i:(i + win_size)] for i in range(len(l))] 274 | 275 | assert len(out) == len(l) 276 | return out 277 | 278 | def _conv_x(x, M_emb, window_size): 279 | x = M_emb[x] 280 | emb_dim = M_emb.shape[1] 281 | for line in x: 282 | assert len(line) == emb_dim 283 | cwords = _contextwin(x, -window_size // 2, window_size // 2) 284 | words = np.ndarray((len(x), emb_dim*window_size)).astype(theano.config.floatX) 285 | for i, win in enumerate(cwords): 286 | #print emb_dim, window_size, len(win[0]) 287 | #print win 288 | #assert len(win[0]) == emb_dim*window_size 289 | words[i] = win[0] 290 | return words 291 | 292 | def _conv_y(y, labelsize): 293 | labels = np.ndarray(len(y)).astype('int32') 294 | for i, label in enumerate(y): 295 | labels[i] = label 296 | #labels[i+1] = label 297 | assert len(labels) == len(y) #+ 2 298 | return labels 299 | 300 | def conv_x(x, window_l, window_r): 301 | #print 'in conv_x, window_l=', 
window_l, 'window_r=', window_r 302 | #x = list(x) 303 | #x = [vocsize] + x + [vocsize + 1] 304 | cwords = _contextwin(x, window_l, window_r) 305 | words = np.ndarray((len(x), window_r-window_l+1)).astype('int32') 306 | for i, win in enumerate(cwords): 307 | words[i] = win 308 | return words 309 | 310 | def _make_temp_storage(name, tparams): 311 | return [theano.shared(p.get_value() * 0., name= (name%(p.name))) for p in tparams] 312 | 313 | def sgd(lr, tparams, grads, cost, prefix, *input_params): #x_f, x_w, y, 314 | #def sgd(lr, tparams, grads, cost, att_wts, etv, prefix, *input_params): #x_f, x_w, y, 315 | # First we make shared variables from everything in tparams 316 | true_grad = _make_temp_storage('%s_grad', tparams) 317 | #print 'in sgd, input:', input_params 318 | #print true_grad 319 | assert len(true_grad) == len(grads) 320 | print 'prefix=', prefix, 'input params:', input_params 321 | f_cost = theano.function(list(input_params), 322 | cost, #att_wts, etv], 323 | updates=[(tg, g) for tg, g in zip(true_grad, grads)], 324 | on_unused_input='warn', 325 | name=prefix+'_sgd_f_cost') 326 | f_update = theano.function([lr], 327 | [], 328 | updates=[(p, p - lr * g) 329 | for p, g 330 | in zip(tparams, true_grad)], 331 | on_unused_input='warn', 332 | name=prefix+'_sgd_f_update') 333 | return f_cost, f_update 334 | 335 | 336 | def adadelta(lr, tparams, grads, cost, *input_params): # x_f, x_w, y, 337 | """ 338 | An adaptive learning rate optimizer 339 | 340 | Parameters 341 | ---------- 342 | lr : Theanono SharedVariable 343 | Initial learning rate 344 | tpramas: Theano SharedVariable 345 | Model parameters 346 | grads: Theano variable 347 | Gradients of cost w.r.t to parameres 348 | x: Theano variable 349 | Model inputs 350 | mask: Theano variable 351 | Sequence mask 352 | y: Theano variable 353 | Targets 354 | cost: Theano variable 355 | Objective fucntion to minimize 356 | 357 | Notes 358 | ----- 359 | For more information, see [ADADELTA]_. 360 | 361 | .. [ADADELTA] Matthew D. Zeiler, *ADADELTA: An Adaptive Learning 362 | Rate Method*, arXiv:1212.5701. 
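    As implemented below (decay 0.95, epsilon 1e-6):
        E[g^2]_t  = 0.95 * E[g^2]_{t-1}  + 0.05 * g_t^2
        dx_t      = - sqrt(E[dx^2]_{t-1} + 1e-6) / sqrt(E[g^2]_t + 1e-6) * g_t
        E[dx^2]_t = 0.95 * E[dx^2]_{t-1} + 0.05 * dx_t^2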
363 | """ 364 | zipped_grads = [theano.shared(p.get_value() * np_floatX(0.), 365 | name='%s_grad' % k) 366 | for k, p in tparams] #.iteritems()] 367 | running_up2 = [theano.shared(p.get_value() * np_floatX(0.), 368 | name='%s_rup2' % k) 369 | for k, p in tparams] #.iteritems()] 370 | running_grads2 = [theano.shared(p.get_value() * np_floatX(0.), 371 | name='%s_rgrad2' % k) 372 | for k, p in tparams] #.iteritems()] 373 | 374 | zgup = [(zg, g) for zg, g in zip(zipped_grads, grads)] 375 | rg2up = [(rg2, 0.95 * rg2 + 0.05 * (g ** 2)) 376 | for rg2, g in zip(running_grads2, grads)] 377 | 378 | f_cost = theano.function(input_params, 379 | cost, 380 | updates=zgup+rg2up, #[(tg, g) for tg, g in zip(true_grad, grads)], 381 | on_unused_input='warn', 382 | name='adadelta_f_cost') 383 | 384 | #f_grad_shared = theano.function([x, mask, y], cost, updates=zgup + rg2up, 385 | # name='adadelta_f_grad_shared') 386 | 387 | updir = [-T.sqrt(ru2 + 1e-6) / T.sqrt(rg2 + 1e-6) * zg 388 | for zg, ru2, rg2 in zip(zipped_grads, 389 | running_up2, 390 | running_grads2)] 391 | ru2up = [(ru2, 0.95 * ru2 + 0.05 * (ud ** 2)) 392 | for ru2, ud in zip(running_up2, updir)] 393 | param_up = [(p, p + ud) for (_,p), ud in zip(tparams, updir)] 394 | 395 | f_update = theano.function([lr], [], updates=ru2up + param_up, 396 | on_unused_input='ignore', 397 | name='adadelta_f_update') 398 | 399 | return f_cost, f_update 400 | 401 | 402 | #def rmsprop(lr, tparams, grads, x, mask, y, cost): 403 | def rmsprop(lr, tparams, grads, cost, *input_params): # x_f, x_w, y, 404 | """ 405 | A variant of SGD that scales the step size by running average of the 406 | recent step norms. 407 | 408 | Parameters 409 | ---------- 410 | lr : Theano SharedVariable 411 | Initial learning rate 412 | tpramas: Theano SharedVariable 413 | Model parameters 414 | grads: Theano variable 415 | Gradients of cost w.r.t to parameres 416 | x: Theano variable 417 | Model inputs 418 | mask: Theano variable 419 | Sequence mask 420 | y: Theano variable 421 | Targets 422 | cost: Theano variable 423 | Objective fucntion to minimize 424 | 425 | Notes 426 | ----- 427 | For more information, see [Hint2014]_. 428 | 429 | .. 
402 | #def rmsprop(lr, tparams, grads, x, mask, y, cost):
403 | def rmsprop(lr, tparams, grads, cost, *input_params): # x_f, x_w, y,
404 |     """
405 |     A variant of SGD that scales the step size by a running average of the
406 |     magnitudes of recent gradients.
407 | 
408 |     Parameters
409 |     ----------
410 |     lr : Theano SharedVariable
411 |         Initial learning rate
412 |     tparams: list of (name, Theano SharedVariable) pairs
413 |         Model parameters
414 |     grads: Theano variable
415 |         Gradients of cost w.r.t. the parameters
416 |     x: Theano variable
417 |         Model inputs
418 |     mask: Theano variable
419 |         Sequence mask
420 |     y: Theano variable
421 |         Targets
422 |     cost: Theano variable
423 |         Objective function to minimize
424 | 
425 |     Notes
426 |     -----
427 |     For more information, see [Hint2014]_.
428 | 
429 |     .. [Hint2014] Geoff Hinton, *Neural Networks for Machine Learning*,
430 |        lecture 6a,
431 |        http://cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
432 |     """
433 |     zipped_grads = [theano.shared(p.get_value() * np_floatX(0.),
434 |                                   name='%s_grad' % k)
435 |                     for k, p in tparams] #.iteritems()]
436 |     running_grads = [theano.shared(p.get_value() * np_floatX(0.),
437 |                                    name='%s_rgrad' % k)
438 |                      for k, p in tparams] #.iteritems()]
439 |     running_grads2 = [theano.shared(p.get_value() * np_floatX(0.),
440 |                                     name='%s_rgrad2' % k)
441 |                       for k, p in tparams] #.iteritems()]
442 | 
443 |     zgup = [(zg, g) for zg, g in zip(zipped_grads, grads)]
444 |     rgup = [(rg, 0.95 * rg + 0.05 * g) for rg, g in zip(running_grads, grads)]
445 |     rg2up = [(rg2, 0.95 * rg2 + 0.05 * (g ** 2))
446 |              for rg2, g in zip(running_grads2, grads)]
447 | 
448 |     f_cost = theano.function(input_params,
449 |                              cost,
450 |                              updates=zgup+rg2up+rgup, #[(tg, g) for tg, g in zip(true_grad, grads)],
451 |                              on_unused_input='warn',
452 |                              name='rmsprop_f_cost')
453 | 
454 |     #f_grad_shared = theano.function([x, mask, y], cost,
455 |     #                                updates=zgup + rgup + rg2up,
456 |     #                                name='rmsprop_f_grad_shared')
457 | 
458 |     updir = [theano.shared(p.get_value() * np_floatX(0.),
459 |                            name='%s_updir' % k)
460 |              for k, p in tparams] #.iteritems()]
461 | 
462 |     updir_new = [(ud, 0.9 * ud - 1e-4 * zg / T.sqrt(rg2 - rg ** 2 + 1e-4))
463 |                  for ud, zg, rg, rg2 in zip(updir, zipped_grads, running_grads,
464 |                                             running_grads2)]
465 |     param_up = [(p, p + udn[1])
466 |                 for (_, p), udn in zip(tparams, updir_new)]
467 |     f_update = theano.function([lr], [], updates=updir_new + param_up,
468 |                                on_unused_input='ignore',
469 |                                name='rmsprop_f_update')
470 | 
471 |     return f_cost, f_update
472 | 
473 | 
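# For reference: the updates compiled above implement an RMSProp variant with gradient
# momentum 0.9 and base step size 1e-4 hard-coded:
#     E[g]_t   = 0.95 * E[g]_{t-1}   + 0.05 * g_t                                (running_grads)
#     E[g^2]_t = 0.95 * E[g^2]_{t-1} + 0.05 * g_t^2                              (running_grads2)
#     d_t      = 0.9 * d_{t-1} - 1e-4 * g_t / sqrt(E[g^2]_t - E[g]_t^2 + 1e-4)   (updir_new)
#     p_t      = p_{t-1} + d_t                                                   (param_up)
# As with adadelta(), the learning-rate argument of f_update is not actually used.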
474 | def build_optimizer(lr, grads, cost, regularizable_params, prefix, optimizer, *input_params):
475 | #def build_optimizer(lr, grads, cost, att_wts, etv, regularizable_params, prefix, optimizer, *input_params):
476 |     '''
477 |     lr: Learning Rate.
478 |     grads, cost, regularizable_params: gradients, cost and regularizable parameters.
479 |     prefix: name prefix identifying which data/task this optimizer is built for.
480 |     optimizer: one of the optimizer functions above (e.g. sgd or adadelta); note that, as written, only sgd() expects the extra prefix argument passed below.
481 | 
482 |     In my case I need to interface between the inside-outside code
483 |     and the theano functions.
484 |     '''
485 |     #f_cost, f_update = optimizer(lr, regularizable_params, grads, cost, att_wts, etv, prefix, *input_params)
486 |     f_cost, f_update = optimizer(lr, regularizable_params, grads, cost, prefix, *input_params)
487 |     return f_cost, f_update
488 | 
489 | 
490 | def create_relation_circuit(_args, StackConfig):
491 |     cargs = dict_from_argparse(_args)
492 |     cargs = StackConfig(cargs)
493 |     if _args.graph or _args.batch_size > 1:
494 |         #x_w, mask, idx_arr, att_wt, etv, y, y_pred, cost, grads, regularizable_params = _args.circuit(cargs)
495 |         x_w, mask, idx_arr, y, y_pred, cost, grads, regularizable_params = _args.circuit(cargs)
496 |         inputs = idx_arr + [x_w, mask, y]
497 |         pred_inputs = [x_w, mask] + idx_arr
498 |     else:
499 |         x_w, idx_arr, y, y_pred, cost, grads, regularizable_params = _args.circuit(cargs)
500 |         inputs = idx_arr + [x_w, y]
501 |         pred_inputs = [x_w] + idx_arr
502 |     if _args.verbose == 2:
503 |         print "\n", "Printing Configuration after StackConfig:"
504 |         print_args(cargs)
505 |     lr = T.scalar('lr')
506 |     # need to refactor: all the build_optimizer and optimizers
507 |     #f_cost, f_update = build_optimizer(lr, grads, cost, att_wt, etv, regularizable_params, 'global', _args.optimizer,
508 |     f_cost, f_update = build_optimizer(lr, grads, cost, regularizable_params, 'global', _args.optimizer,
509 |                                        *inputs)
510 |     f_classify = theano.function(inputs=pred_inputs,
511 |                                  outputs=y_pred, #att_wt],
512 |                                  on_unused_input='warn',
513 |                                  name='f_classify')
514 |     return (f_cost, f_update, f_classify, cargs)
515 | 
516 | 
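# A minimal usage sketch for the (f_cost, f_update, f_classify, cargs) tuple returned by
# create_relation_circuit() above. The actual training loop in this codebase is launched
# via theano_src/lstm_RE.py (the entry point invoked by the run scripts); the helper and
# the names train_set, eval_set and learning_rate below are illustrative only.
def _example_train_and_classify(f_cost, f_update, f_classify, train_set, eval_set, learning_rate=0.02):
    for inputs in train_set:       # each item ordered as idx_arr + [x_w, (mask,) y], matching `inputs` above
        f_cost(*inputs)            # one forward/backward pass; gradient statistics are updated
        f_update(learning_rate)    # apply the parameter update for this step
    # prediction inputs are ordered as [x_w, (mask)] + idx_arr, i.e. without the labels y
    return [f_classify(*inputs) for inputs in eval_set]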
517 | def create_multitask_relation_circuit(_args, StackConfig, num_tasks):
518 |     cargs = dict_from_argparse(_args)
519 |     cargs = StackConfig(cargs)
520 |     for i in range(num_tasks):
521 |         cargs['t'+str(i)+'_logistic_regression_out_dim'] = 2
522 |     if _args.verbose == 2:
523 |         print "\n", "Printing Configuration after StackConfig:"
524 |         print_args(cargs)
525 |     return build_relation_obj_grad_and_classifier(_args, cargs, num_tasks)
526 | 
527 | 
528 | def build_relation_obj_grad_and_classifier(_args, cargs, num_tasks):
529 |     if cargs['batch_size'] > 1:
530 |         x_arr, idx_arr, masks, y_arr, pred_y_arr, costs_arr, grads_arr, regularizable_param_arr = _args.circuit(cargs, num_tasks)
531 |     else:
532 |         x_arr, idx_arr, y_arr, pred_y_arr, costs_arr, grads_arr, regularizable_param_arr = _args.circuit(cargs, num_tasks)
533 |     lr = T.scalar('lr')
534 |     f_costs_and_updates = []
535 |     f_classifies = []
536 |     for i in range(len(costs_arr)):
537 |         if cargs['batch_size'] > 1:
538 |             inputs = idx_arr[i] + [x_arr[i], masks[i], y_arr[i]]
539 |         else:
540 |             inputs = idx_arr[i] + [x_arr[i], y_arr[i]]
541 |         f_costs_and_updates.append(build_optimizer(lr, grads_arr[i], costs_arr[i], regularizable_param_arr[i], 'task'+str(i), _args.optimizer, *inputs))
542 |         if cargs['batch_size'] > 1:
543 |             cls_input = [x_arr[i], masks[i]] + idx_arr[i]
544 |         else:
545 |             cls_input = [x_arr[i]] + idx_arr[i]
546 |         f_classifies.append(theano.function(inputs=cls_input,
547 |                                              outputs=pred_y_arr[i],
548 |                                              on_unused_input='warn',
549 |                                              name='f_classify_'+str(i)))
550 |     return (f_costs_and_updates, f_classifies, cargs)
551 | 
552 | 
553 | def print_args(args):
554 |     for k in args:
555 |         if type(args[k]) == dict or type(args[k]) == list or type(args[k]) == tuple:
556 |             print k, 'container size:', len(args[k])
557 |         else:
558 |             print k, args[k]
559 | 
560 | def save_parameters(path, params):
561 |     #savable_params={k : v.get_value() for k, v in params.items() }
562 |     pickle.dump(params, open(path, 'wb'))
563 | 
564 | def load_params(path, params):
565 |     pp = pickle.load(open(path, 'rb'))
566 |     for kk, vv in pp.iteritems():
567 |         #print "loading parameters:", kk, "type:", type(vv)
568 |         #if kk not in params:
569 |         #    warnings.warn('%s is not in the archive' % kk)
570 |         print 'updating parameter:', kk
571 |         params[kk] = pp[kk]
572 |     return params
573 | 
574 | 
--------------------------------------------------------------------------------