├── LICENSE
├── README.md
├── code
│   ├── README.md
│   ├── config
│   │   ├── Config.py
│   │   ├── EviConfig.py
│   │   └── __init__.py
│   ├── evaluation.py
│   ├── gen_data.py
│   ├── models
│   │   ├── BiLSTM.py
│   │   ├── CNN3.py
│   │   ├── ContextAware.py
│   │   ├── LSTM.py
│   │   ├── LSTM_SP.py
│   │   └── __init__.py
│   ├── prepro_data
│   │   └── README.md
│   ├── requirements.txt
│   ├── test.py
│   ├── test_sp.py
│   ├── train.py
│   └── train_sp.py
└── data
    └── README.md

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2017 THUNLP
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DocRED
2 | Dataset and baseline code for [DocRED: A Large-Scale Document-Level Relation Extraction Dataset](https://arxiv.org/abs/1906.06127v3)
3 | 
4 | Multiple entities in a document generally exhibit complex inter-sentence relations that cannot be well handled by existing relation extraction (RE) methods, which typically focus on extracting intra-sentence relations for single entity pairs. To accelerate research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features:
5 | 
6 | + DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text.
7 | + DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document.
8 | + Along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios.
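Each document in the dataset is a JSON object. As a rough sketch of the format (the field names below are the ones consumed by `gen_data.py` and `evaluation.py` in this repository; the concrete values are invented for illustration):

```json
{
  "title": "Example title",
  "sents": [["Alice", "lives", "in", "Paris", "."],
            ["She", "was", "born", "there", "."]],
  "vertexSet": [
    [{"name": "Alice", "sent_id": 0, "pos": [0, 1], "type": "PER"},
     {"name": "She", "sent_id": 1, "pos": [0, 1], "type": "PER"}],
    [{"name": "Paris", "sent_id": 0, "pos": [3, 4], "type": "LOC"}]
  ],
  "labels": [{"h": 0, "t": 1, "r": "P19", "evidence": [0, 1]}]
}
```

`vertexSet[k]` lists all mentions of entity *k* (coreferent mentions share one entry), `pos` is a token span within the mention's sentence, and each label links head entity `h` to tail entity `t` under relation `r`, with `evidence` giving the indices of the supporting sentences.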
9 | 
10 | ## Codalab
11 | If you are interested in our dataset, you are welcome to join the Codalab competition at [DocRED](https://competitions.codalab.org/competitions/20717).
12 | 
13 | 
14 | ## Cite
15 | If you use the dataset or the code, please cite this paper:
16 | ```
17 | @inproceedings{yao2019DocRED,
18 |   title={{DocRED}: A Large-Scale Document-Level Relation Extraction Dataset},
19 |   author={Yao, Yuan and Ye, Deming and Li, Peng and Han, Xu and Lin, Yankai and Liu, Zhenghao and Liu, Zhiyuan and Huang, Lixin and Zhou, Jie and Sun, Maosong},
20 |   booktitle={Proceedings of ACL 2019},
21 |   year={2019}
22 | }
23 | ```
24 | 
--------------------------------------------------------------------------------
/code/README.md:
--------------------------------------------------------------------------------
1 | # Baseline code
2 | 
3 | ## Requirements and Installation
4 | python3
5 | 
6 | pytorch>=1.0
7 | 
8 | ```
9 | pip3 install -r requirements.txt
10 | ```
11 | 
12 | ## Preprocessing data
13 | Download the metadata for the baseline methods from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/d/99e1c0805eb64736af95/) or [GoogleDrive](https://drive.google.com/drive/folders/1Ri3LIILKKBi3aBJjUVCOBpGX5PpONHRK) and put it into the prepro_data folder.
14 | 
15 | 
16 | ```
17 | python3 gen_data.py --in_path ../data --out_path prepro_data
18 | ```
19 | 
20 | ## Relation extraction
21 | 
22 | Training:
23 | ```
24 | CUDA_VISIBLE_DEVICES=0 python3 train.py --model_name BiLSTM --save_name checkpoint_BiLSTM --train_prefix dev_train --test_prefix dev_dev
25 | ```
26 | 
27 | Testing (--test_prefix dev_dev for the dev set, dev_test for the test set):
28 | ```
29 | CUDA_VISIBLE_DEVICES=0 python3 test.py --model_name BiLSTM --save_name checkpoint_BiLSTM --train_prefix dev_train --test_prefix dev_dev --input_theta 0.3601
30 | ```
31 | 
32 | ## Evidence extraction
33 | 
34 | Training:
35 | ```
36 | CUDA_VISIBLE_DEVICES=0 python3 train_sp.py --model_name LSTM_SP --save_name checkpoint_BiLSTMSP --train_prefix dev_train --test_prefix dev_dev
37 | ```
38 | 
39 | Testing:
40 | ```
41 | CUDA_VISIBLE_DEVICES=0 python3 test_sp.py --model_name LSTM_SP --save_name checkpoint_BiLSTMSP --train_prefix dev_train --test_prefix dev_dev --input_theta 0.4619
42 | ```
43 | 
44 | ## Evaluation
45 | 
46 | The dev result can be evaluated with:
47 | ```
48 | python3 evaluation.py result.json ../data/dev.json
49 | ```
50 | 
51 | Test results should be submitted to Codalab.
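For reference, `evaluation.py` reads the submission as a JSON list of predicted triples. A minimal sketch of one entry (these are exactly the fields the script accesses; `evidence` may be omitted, and the values here are illustrative):

```json
[
  {"title": "Example title", "h_idx": 0, "t_idx": 1, "r": "P19", "evidence": [0, 1]}
]
```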
52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /code/config/Config.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import torch 3 | import torch.nn as nn 4 | from torch.autograd import Variable 5 | import torch.optim as optim 6 | import numpy as np 7 | import os 8 | import time 9 | import datetime 10 | import json 11 | import sys 12 | import sklearn.metrics 13 | from tqdm import tqdm 14 | import matplotlib 15 | matplotlib.use('Agg') 16 | import matplotlib.pyplot as plt 17 | import random 18 | from collections import defaultdict 19 | import torch.nn.functional as F 20 | 21 | 22 | IGNORE_INDEX = -100 23 | is_transformer = False 24 | 25 | class Accuracy(object): 26 | def __init__(self): 27 | self.correct = 0 28 | self.total = 0 29 | def add(self, is_correct): 30 | self.total += 1 31 | if is_correct: 32 | self.correct += 1 33 | def get(self): 34 | if self.total == 0: 35 | return 0.0 36 | else: 37 | return float(self.correct) / self.total 38 | def clear(self): 39 | self.correct = 0 40 | self.total = 0 41 | 42 | class Config(object): 43 | def __init__(self, args): 44 | self.acc_NA = Accuracy() 45 | self.acc_not_NA = Accuracy() 46 | self.acc_total = Accuracy() 47 | self.data_path = './prepro_data' 48 | self.use_bag = False 49 | self.use_gpu = True 50 | self.is_training = True 51 | self.max_length = 512 52 | self.pos_num = 2 * self.max_length 53 | self.entity_num = self.max_length 54 | self.relation_num = 97 55 | 56 | self.coref_size = 20 57 | self.entity_type_size = 20 58 | self.max_epoch = 20 59 | self.opt_method = 'Adam' 60 | self.optimizer = None 61 | 62 | self.checkpoint_dir = './checkpoint' 63 | self.fig_result_dir = './fig_result' 64 | self.test_epoch = 5 65 | self.pretrain_model = None 66 | 67 | 68 | self.word_size = 100 69 | self.epoch_range = None 70 | self.cnn_drop_prob = 0.5 # for cnn 71 | self.keep_prob = 0.8 # for lstm 72 | 73 | self.period = 50 74 | 75 | self.batch_size = 40 76 | self.h_t_limit = 1800 77 | 78 | self.test_batch_size = self.batch_size 79 | self.test_relation_limit = 1800 80 | self.char_limit = 16 81 | self.sent_limit = 25 82 | self.dis2idx = np.zeros((512), dtype='int64') 83 | self.dis2idx[1] = 1 84 | self.dis2idx[2:] = 2 85 | self.dis2idx[4:] = 3 86 | self.dis2idx[8:] = 4 87 | self.dis2idx[16:] = 5 88 | self.dis2idx[32:] = 6 89 | self.dis2idx[64:] = 7 90 | self.dis2idx[128:] = 8 91 | self.dis2idx[256:] = 9 92 | self.dis_size = 20 93 | 94 | self.train_prefix = args.train_prefix 95 | self.test_prefix = args.test_prefix 96 | 97 | 98 | if not os.path.exists("log"): 99 | os.mkdir("log") 100 | 101 | def set_data_path(self, data_path): 102 | self.data_path = data_path 103 | def set_max_length(self, max_length): 104 | self.max_length = max_length 105 | self.pos_num = 2 * self.max_length 106 | def set_num_classes(self, num_classes): 107 | self.num_classes = num_classes 108 | def set_window_size(self, window_size): 109 | self.window_size = window_size 110 | def set_word_size(self, word_size): 111 | self.word_size = word_size 112 | def set_max_epoch(self, max_epoch): 113 | self.max_epoch = max_epoch 114 | def set_batch_size(self, batch_size): 115 | self.batch_size = batch_size 116 | def set_opt_method(self, opt_method): 117 | self.opt_method = opt_method 118 | def set_drop_prob(self, drop_prob): 119 | self.drop_prob = drop_prob 120 | def set_checkpoint_dir(self, checkpoint_dir): 121 | self.checkpoint_dir = checkpoint_dir 122 | def set_test_epoch(self, test_epoch): 
123 | self.test_epoch = test_epoch 124 | def set_pretrain_model(self, pretrain_model): 125 | self.pretrain_model = pretrain_model 126 | def set_is_training(self, is_training): 127 | self.is_training = is_training 128 | def set_use_bag(self, use_bag): 129 | self.use_bag = use_bag 130 | def set_use_gpu(self, use_gpu): 131 | self.use_gpu = use_gpu 132 | def set_epoch_range(self, epoch_range): 133 | self.epoch_range = epoch_range 134 | 135 | def load_train_data(self): 136 | print("Reading training data...") 137 | prefix = self.train_prefix 138 | 139 | print ('train', prefix) 140 | self.data_train_word = np.load(os.path.join(self.data_path, prefix+'_word.npy')) 141 | self.data_train_pos = np.load(os.path.join(self.data_path, prefix+'_pos.npy')) 142 | self.data_train_ner = np.load(os.path.join(self.data_path, prefix+'_ner.npy')) 143 | self.data_train_char = np.load(os.path.join(self.data_path, prefix+'_char.npy')) 144 | self.train_file = json.load(open(os.path.join(self.data_path, prefix+'.json'))) 145 | 146 | print("Finish reading") 147 | 148 | self.train_len = ins_num = self.data_train_word.shape[0] 149 | assert(self.train_len==len(self.train_file)) 150 | 151 | self.train_order = list(range(ins_num)) 152 | self.train_batches = ins_num // self.batch_size 153 | if ins_num % self.batch_size != 0: 154 | self.train_batches += 1 155 | 156 | def load_test_data(self): 157 | print("Reading testing data...") 158 | self.data_word_vec = np.load(os.path.join(self.data_path, 'vec.npy')) 159 | self.data_char_vec = np.load(os.path.join(self.data_path, 'char_vec.npy')) 160 | self.rel2id = json.load(open(os.path.join(self.data_path, 'rel2id.json'))) 161 | self.id2rel = {v: k for k,v in self.rel2id.items()} 162 | 163 | prefix = self.test_prefix 164 | print (prefix) 165 | self.is_test = ('dev_test' == prefix) 166 | self.data_test_word = np.load(os.path.join(self.data_path, prefix+'_word.npy')) 167 | self.data_test_pos = np.load(os.path.join(self.data_path, prefix+'_pos.npy')) 168 | self.data_test_ner = np.load(os.path.join(self.data_path, prefix+'_ner.npy')) 169 | self.data_test_char = np.load(os.path.join(self.data_path, prefix+'_char.npy')) 170 | self.test_file = json.load(open(os.path.join(self.data_path, prefix+'.json'))) 171 | 172 | 173 | self.test_len = self.data_test_word.shape[0] 174 | assert(self.test_len==len(self.test_file)) 175 | 176 | 177 | print("Finish reading") 178 | 179 | self.test_batches = self.data_test_word.shape[0] // self.test_batch_size 180 | if self.data_test_word.shape[0] % self.test_batch_size != 0: 181 | self.test_batches += 1 182 | 183 | self.test_order = list(range(self.test_len)) 184 | self.test_order.sort(key=lambda x: np.sum(self.data_test_word[x] > 0), reverse=True) 185 | 186 | 187 | def get_train_batch(self): 188 | random.shuffle(self.train_order) 189 | 190 | context_idxs = torch.LongTensor(self.batch_size, self.max_length).cuda() 191 | context_pos = torch.LongTensor(self.batch_size, self.max_length).cuda() 192 | h_mapping = torch.Tensor(self.batch_size, self.h_t_limit, self.max_length).cuda() 193 | t_mapping = torch.Tensor(self.batch_size, self.h_t_limit, self.max_length).cuda() 194 | relation_multi_label = torch.Tensor(self.batch_size, self.h_t_limit, self.relation_num).cuda() 195 | relation_mask = torch.Tensor(self.batch_size, self.h_t_limit).cuda() 196 | 197 | pos_idx = torch.LongTensor(self.batch_size, self.max_length).cuda() 198 | 199 | context_ner = torch.LongTensor(self.batch_size, self.max_length).cuda() 200 | context_char_idxs = torch.LongTensor(self.batch_size, 
self.max_length, self.char_limit).cuda() 201 | 202 | relation_label = torch.LongTensor(self.batch_size, self.h_t_limit).cuda() 203 | 204 | 205 | ht_pair_pos = torch.LongTensor(self.batch_size, self.h_t_limit).cuda() 206 | 207 | for b in range(self.train_batches): 208 | start_id = b * self.batch_size 209 | cur_bsz = min(self.batch_size, self.train_len - start_id) 210 | cur_batch = list(self.train_order[start_id: start_id + cur_bsz]) 211 | cur_batch.sort(key=lambda x: np.sum(self.data_train_word[x]>0) , reverse = True) 212 | 213 | for mapping in [h_mapping, t_mapping]: 214 | mapping.zero_() 215 | 216 | for mapping in [relation_multi_label, relation_mask, pos_idx]: 217 | mapping.zero_() 218 | 219 | ht_pair_pos.zero_() 220 | 221 | 222 | relation_label.fill_(IGNORE_INDEX) 223 | 224 | max_h_t_cnt = 1 225 | 226 | 227 | for i, index in enumerate(cur_batch): 228 | context_idxs[i].copy_(torch.from_numpy(self.data_train_word[index, :])) 229 | context_pos[i].copy_(torch.from_numpy(self.data_train_pos[index, :])) 230 | context_char_idxs[i].copy_(torch.from_numpy(self.data_train_char[index, :])) 231 | context_ner[i].copy_(torch.from_numpy(self.data_train_ner[index, :])) 232 | 233 | for j in range(self.max_length): 234 | if self.data_train_word[index, j]==0: 235 | break 236 | pos_idx[i, j] = j+1 237 | 238 | ins = self.train_file[index] 239 | labels = ins['labels'] 240 | idx2label = defaultdict(list) 241 | 242 | for label in labels: 243 | idx2label[(label['h'], label['t'])].append(label['r']) 244 | 245 | 246 | 247 | train_tripe = list(idx2label.keys()) 248 | for j, (h_idx, t_idx) in enumerate(train_tripe): 249 | hlist = ins['vertexSet'][h_idx] 250 | tlist = ins['vertexSet'][t_idx] 251 | 252 | for h in hlist: 253 | h_mapping[i, j, h['pos'][0]:h['pos'][1]] = 1.0 / len(hlist) / (h['pos'][1] - h['pos'][0]) 254 | 255 | for t in tlist: 256 | t_mapping[i, j, t['pos'][0]:t['pos'][1]] = 1.0 / len(tlist) / (t['pos'][1] - t['pos'][0]) 257 | 258 | label = idx2label[(h_idx, t_idx)] 259 | 260 | delta_dis = hlist[0]['pos'][0] - tlist[0]['pos'][0] 261 | if delta_dis < 0: 262 | ht_pair_pos[i, j] = -int(self.dis2idx[-delta_dis]) 263 | else: 264 | ht_pair_pos[i, j] = int(self.dis2idx[delta_dis]) 265 | 266 | 267 | for r in label: 268 | relation_multi_label[i, j, r] = 1 269 | 270 | relation_mask[i, j] = 1 271 | rt = np.random.randint(len(label)) 272 | relation_label[i, j] = label[rt] 273 | 274 | 275 | 276 | lower_bound = len(ins['na_triple']) 277 | # random.shuffle(ins['na_triple']) 278 | # lower_bound = max(20, len(train_tripe)*3) 279 | 280 | 281 | for j, (h_idx, t_idx) in enumerate(ins['na_triple'][:lower_bound], len(train_tripe)): 282 | hlist = ins['vertexSet'][h_idx] 283 | tlist = ins['vertexSet'][t_idx] 284 | 285 | for h in hlist: 286 | h_mapping[i, j, h['pos'][0]:h['pos'][1]] = 1.0 / len(hlist) / (h['pos'][1] - h['pos'][0]) 287 | 288 | for t in tlist: 289 | t_mapping[i, j, t['pos'][0]:t['pos'][1]] = 1.0 / len(tlist) / (t['pos'][1] - t['pos'][0]) 290 | 291 | relation_multi_label[i, j, 0] = 1 292 | relation_label[i, j] = 0 293 | relation_mask[i, j] = 1 294 | delta_dis = hlist[0]['pos'][0] - tlist[0]['pos'][0] 295 | if delta_dis < 0: 296 | ht_pair_pos[i, j] = -int(self.dis2idx[-delta_dis]) 297 | else: 298 | ht_pair_pos[i, j] = int(self.dis2idx[delta_dis]) 299 | 300 | max_h_t_cnt = max(max_h_t_cnt, len(train_tripe) + lower_bound) 301 | 302 | 303 | input_lengths = (context_idxs[:cur_bsz] > 0).long().sum(dim=1) 304 | max_c_len = int(input_lengths.max()) 305 | 306 | yield {'context_idxs': context_idxs[:cur_bsz, 
:max_c_len].contiguous(), 307 | 'context_pos': context_pos[:cur_bsz, :max_c_len].contiguous(), 308 | 'h_mapping': h_mapping[:cur_bsz, :max_h_t_cnt, :max_c_len], 309 | 't_mapping': t_mapping[:cur_bsz, :max_h_t_cnt, :max_c_len], 310 | 'relation_label': relation_label[:cur_bsz, :max_h_t_cnt].contiguous(), 311 | 'input_lengths' : input_lengths, 312 | 'pos_idx': pos_idx[:cur_bsz, :max_c_len].contiguous(), 313 | 'relation_multi_label': relation_multi_label[:cur_bsz, :max_h_t_cnt], 314 | 'relation_mask': relation_mask[:cur_bsz, :max_h_t_cnt], 315 | 'context_ner': context_ner[:cur_bsz, :max_c_len].contiguous(), 316 | 'context_char_idxs': context_char_idxs[:cur_bsz, :max_c_len].contiguous(), 317 | 'ht_pair_pos': ht_pair_pos[:cur_bsz, :max_h_t_cnt], 318 | } 319 | 320 | def get_test_batch(self): 321 | context_idxs = torch.LongTensor(self.test_batch_size, self.max_length).cuda() 322 | context_pos = torch.LongTensor(self.test_batch_size, self.max_length).cuda() 323 | h_mapping = torch.Tensor(self.test_batch_size, self.test_relation_limit, self.max_length).cuda() 324 | t_mapping = torch.Tensor(self.test_batch_size, self.test_relation_limit, self.max_length).cuda() 325 | context_ner = torch.LongTensor(self.test_batch_size, self.max_length).cuda() 326 | context_char_idxs = torch.LongTensor(self.test_batch_size, self.max_length, self.char_limit).cuda() 327 | relation_mask = torch.Tensor(self.test_batch_size, self.h_t_limit).cuda() 328 | ht_pair_pos = torch.LongTensor(self.test_batch_size, self.h_t_limit).cuda() 329 | 330 | for b in range(self.test_batches): 331 | start_id = b * self.test_batch_size 332 | cur_bsz = min(self.test_batch_size, self.test_len - start_id) 333 | cur_batch = list(self.test_order[start_id : start_id + cur_bsz]) 334 | 335 | for mapping in [h_mapping, t_mapping, relation_mask]: 336 | mapping.zero_() 337 | 338 | 339 | ht_pair_pos.zero_() 340 | 341 | max_h_t_cnt = 1 342 | 343 | cur_batch.sort(key=lambda x: np.sum(self.data_test_word[x]>0) , reverse = True) 344 | 345 | labels = [] 346 | 347 | L_vertex = [] 348 | titles = [] 349 | indexes = [] 350 | for i, index in enumerate(cur_batch): 351 | context_idxs[i].copy_(torch.from_numpy(self.data_test_word[index, :])) 352 | context_pos[i].copy_(torch.from_numpy(self.data_test_pos[index, :])) 353 | context_char_idxs[i].copy_(torch.from_numpy(self.data_test_char[index, :])) 354 | context_ner[i].copy_(torch.from_numpy(self.data_test_ner[index, :])) 355 | 356 | 357 | 358 | idx2label = defaultdict(list) 359 | ins = self.test_file[index] 360 | 361 | for label in ins['labels']: 362 | idx2label[(label['h'], label['t'])].append(label['r']) 363 | 364 | 365 | 366 | L = len(ins['vertexSet']) 367 | titles.append(ins['title']) 368 | 369 | j = 0 370 | for h_idx in range(L): 371 | for t_idx in range(L): 372 | if h_idx != t_idx: 373 | hlist = ins['vertexSet'][h_idx] 374 | tlist = ins['vertexSet'][t_idx] 375 | 376 | for h in hlist: 377 | h_mapping[i, j, h['pos'][0]:h['pos'][1]] = 1.0 / len(hlist) / (h['pos'][1] - h['pos'][0]) 378 | for t in tlist: 379 | t_mapping[i, j, t['pos'][0]:t['pos'][1]] = 1.0 / len(tlist) / (t['pos'][1] - t['pos'][0]) 380 | 381 | relation_mask[i, j] = 1 382 | 383 | delta_dis = hlist[0]['pos'][0] - tlist[0]['pos'][0] 384 | if delta_dis < 0: 385 | ht_pair_pos[i, j] = -int(self.dis2idx[-delta_dis]) 386 | else: 387 | ht_pair_pos[i, j] = int(self.dis2idx[delta_dis]) 388 | j += 1 389 | 390 | 391 | max_h_t_cnt = max(max_h_t_cnt, j) 392 | label_set = {} 393 | for label in ins['labels']: 394 | label_set[(label['h'], label['t'], label['r'])] = 
label['in'+self.train_prefix] 395 | 396 | labels.append(label_set) 397 | 398 | 399 | L_vertex.append(L) 400 | indexes.append(index) 401 | 402 | 403 | 404 | input_lengths = (context_idxs[:cur_bsz] > 0).long().sum(dim=1) 405 | max_c_len = int(input_lengths.max()) 406 | 407 | 408 | yield {'context_idxs': context_idxs[:cur_bsz, :max_c_len].contiguous(), 409 | 'context_pos': context_pos[:cur_bsz, :max_c_len].contiguous(), 410 | 'h_mapping': h_mapping[:cur_bsz, :max_h_t_cnt, :max_c_len], 411 | 't_mapping': t_mapping[:cur_bsz, :max_h_t_cnt, :max_c_len], 412 | 'labels': labels, 413 | 'L_vertex': L_vertex, 414 | 'input_lengths': input_lengths, 415 | 'context_ner': context_ner[:cur_bsz, :max_c_len].contiguous(), 416 | 'context_char_idxs': context_char_idxs[:cur_bsz, :max_c_len].contiguous(), 417 | 'relation_mask': relation_mask[:cur_bsz, :max_h_t_cnt], 418 | 'titles': titles, 419 | 'ht_pair_pos': ht_pair_pos[:cur_bsz, :max_h_t_cnt], 420 | 'indexes': indexes 421 | } 422 | 423 | def train(self, model_pattern, model_name): 424 | 425 | ori_model = model_pattern(config = self) 426 | if self.pretrain_model != None: 427 | ori_model.load_state_dict(torch.load(self.pretrain_model)) 428 | ori_model.cuda() 429 | model = nn.DataParallel(ori_model) 430 | 431 | optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters())) 432 | # nll_average = nn.CrossEntropyLoss(size_average=True, ignore_index=IGNORE_INDEX) 433 | BCE = nn.BCEWithLogitsLoss(reduction='none') 434 | 435 | if not os.path.exists(self.checkpoint_dir): 436 | os.mkdir(self.checkpoint_dir) 437 | 438 | best_auc = 0.0 439 | best_f1 = 0.0 440 | best_epoch = 0 441 | 442 | model.train() 443 | 444 | global_step = 0 445 | total_loss = 0 446 | start_time = time.time() 447 | 448 | def logging(s, print_=True, log_=True): 449 | if print_: 450 | print(s) 451 | if log_: 452 | with open(os.path.join(os.path.join("log", model_name)), 'a+') as f_log: 453 | f_log.write(s + '\n') 454 | 455 | plt.xlabel('Recall') 456 | plt.ylabel('Precision') 457 | plt.ylim(0.3, 1.0) 458 | plt.xlim(0.0, 0.4) 459 | plt.title('Precision-Recall') 460 | plt.grid(True) 461 | 462 | for epoch in range(self.max_epoch): 463 | 464 | self.acc_NA.clear() 465 | self.acc_not_NA.clear() 466 | self.acc_total.clear() 467 | 468 | for data in self.get_train_batch(): 469 | 470 | context_idxs = data['context_idxs'] 471 | context_pos = data['context_pos'] 472 | h_mapping = data['h_mapping'] 473 | t_mapping = data['t_mapping'] 474 | relation_label = data['relation_label'] 475 | input_lengths = data['input_lengths'] 476 | relation_multi_label = data['relation_multi_label'] 477 | relation_mask = data['relation_mask'] 478 | context_ner = data['context_ner'] 479 | context_char_idxs = data['context_char_idxs'] 480 | ht_pair_pos = data['ht_pair_pos'] 481 | 482 | 483 | dis_h_2_t = ht_pair_pos+10 484 | dis_t_2_h = -ht_pair_pos+10 485 | 486 | 487 | predict_re = model(context_idxs, context_pos, context_ner, context_char_idxs, input_lengths, h_mapping, t_mapping, relation_mask, dis_h_2_t, dis_t_2_h) 488 | loss = torch.sum(BCE(predict_re, relation_multi_label)*relation_mask.unsqueeze(2)) / (self.relation_num * torch.sum(relation_mask)) 489 | 490 | 491 | output = torch.argmax(predict_re, dim=-1) 492 | output = output.data.cpu().numpy() 493 | 494 | optimizer.zero_grad() 495 | loss.backward() 496 | optimizer.step() 497 | 498 | relation_label = relation_label.data.cpu().numpy() 499 | 500 | for i in range(output.shape[0]): 501 | for j in range(output.shape[1]): 502 | label = relation_label[i][j] 503 | if label<0: 
504 | break 505 | 506 | if label == 0: 507 | self.acc_NA.add(output[i][j] == label) 508 | else: 509 | self.acc_not_NA.add(output[i][j] == label) 510 | 511 | self.acc_total.add(output[i][j] == label) 512 | 513 | global_step += 1 514 | total_loss += loss.item() 515 | 516 | if global_step % self.period == 0 : 517 | cur_loss = total_loss / self.period 518 | elapsed = time.time() - start_time 519 | logging('| epoch {:2d} | step {:4d} | ms/b {:5.2f} | train loss {:5.3f} | NA acc: {:4.2f} | not NA acc: {:4.2f} | tot acc: {:4.2f} '.format(epoch, global_step, elapsed * 1000 / self.period, cur_loss, self.acc_NA.get(), self.acc_not_NA.get(), self.acc_total.get())) 520 | total_loss = 0 521 | start_time = time.time() 522 | 523 | 524 | 525 | if (epoch+1) % self.test_epoch == 0: 526 | logging('-' * 89) 527 | eval_start_time = time.time() 528 | model.eval() 529 | f1, auc, pr_x, pr_y = self.test(model, model_name) 530 | model.train() 531 | logging('| epoch {:3d} | time: {:5.2f}s'.format(epoch, time.time() - eval_start_time)) 532 | logging('-' * 89) 533 | 534 | 535 | if f1 > best_f1: 536 | best_f1 = f1 537 | best_auc = auc 538 | best_epoch = epoch 539 | path = os.path.join(self.checkpoint_dir, model_name) 540 | torch.save(ori_model.state_dict(), path) 541 | 542 | plt.plot(pr_x, pr_y, lw=2, label=str(epoch)) 543 | plt.legend(loc="upper right") 544 | plt.savefig(os.path.join("fig_result", model_name)) 545 | 546 | print("Finish training") 547 | print("Best epoch = %d | auc = %f" % (best_epoch, best_auc)) 548 | print("Storing best result...") 549 | print("Finish storing") 550 | 551 | def test(self, model, model_name, output=False, input_theta=-1): 552 | data_idx = 0 553 | eval_start_time = time.time() 554 | # test_result_ignore = [] 555 | total_recall_ignore = 0 556 | 557 | test_result = [] 558 | total_recall = 0 559 | top1_acc = have_label = 0 560 | 561 | def logging(s, print_=True, log_=True): 562 | if print_: 563 | print(s) 564 | if log_: 565 | with open(os.path.join(os.path.join("log", model_name)), 'a+') as f_log: 566 | f_log.write(s + '\n') 567 | 568 | 569 | 570 | for data in self.get_test_batch(): 571 | with torch.no_grad(): 572 | context_idxs = data['context_idxs'] 573 | context_pos = data['context_pos'] 574 | h_mapping = data['h_mapping'] 575 | t_mapping = data['t_mapping'] 576 | labels = data['labels'] 577 | L_vertex = data['L_vertex'] 578 | input_lengths = data['input_lengths'] 579 | context_ner = data['context_ner'] 580 | context_char_idxs = data['context_char_idxs'] 581 | relation_mask = data['relation_mask'] 582 | ht_pair_pos = data['ht_pair_pos'] 583 | 584 | titles = data['titles'] 585 | indexes = data['indexes'] 586 | 587 | dis_h_2_t = ht_pair_pos+10 588 | dis_t_2_h = -ht_pair_pos+10 589 | 590 | predict_re = model(context_idxs, context_pos, context_ner, context_char_idxs, input_lengths, 591 | h_mapping, t_mapping, relation_mask, dis_h_2_t, dis_t_2_h) 592 | 593 | predict_re = torch.sigmoid(predict_re) 594 | 595 | predict_re = predict_re.data.cpu().numpy() 596 | 597 | for i in range(len(labels)): 598 | label = labels[i] 599 | index = indexes[i] 600 | 601 | 602 | total_recall += len(label) 603 | for l in label.values(): 604 | if not l: 605 | total_recall_ignore += 1 606 | 607 | L = L_vertex[i] 608 | j = 0 609 | 610 | for h_idx in range(L): 611 | for t_idx in range(L): 612 | if h_idx != t_idx: 613 | r = np.argmax(predict_re[i, j]) 614 | if (h_idx, t_idx, r) in label: 615 | top1_acc += 1 616 | 617 | flag = False 618 | 619 | for r in range(1, self.relation_num): 620 | intrain = False 621 | 622 | if 
(h_idx, t_idx, r) in label: 623 | flag = True 624 | if label[(h_idx, t_idx, r)]==True: 625 | intrain = True 626 | 627 | 628 | # if not intrain: 629 | # test_result_ignore.append( ((h_idx, t_idx, r) in label, float(predict_re[i,j,r]), titles[i], self.id2rel[r], index, h_idx, t_idx, r) ) 630 | 631 | test_result.append( ((h_idx, t_idx, r) in label, float(predict_re[i,j,r]), intrain, titles[i], self.id2rel[r], index, h_idx, t_idx, r) ) 632 | 633 | if flag: 634 | have_label += 1 635 | 636 | j += 1 637 | 638 | 639 | data_idx += 1 640 | 641 | if data_idx % self.period == 0: 642 | print('| step {:3d} | time: {:5.2f}'.format(data_idx // self.period, (time.time() - eval_start_time))) 643 | eval_start_time = time.time() 644 | 645 | # test_result_ignore.sort(key=lambda x: x[1], reverse=True) 646 | test_result.sort(key = lambda x: x[1], reverse=True) 647 | 648 | print ('total_recall', total_recall) 649 | # plt.xlabel('Recall') 650 | # plt.ylabel('Precision') 651 | # plt.ylim(0.2, 1.0) 652 | # plt.xlim(0.0, 0.6) 653 | # plt.title('Precision-Recall') 654 | # plt.grid(True) 655 | 656 | pr_x = [] 657 | pr_y = [] 658 | correct = 0 659 | w = 0 660 | 661 | if total_recall == 0: 662 | total_recall = 1 # for test 663 | 664 | for i, item in enumerate(test_result): 665 | correct += item[0] 666 | pr_y.append(float(correct) / (i + 1)) 667 | pr_x.append(float(correct) / total_recall) 668 | if item[1] > input_theta: 669 | w = i 670 | 671 | 672 | pr_x = np.asarray(pr_x, dtype='float32') 673 | pr_y = np.asarray(pr_y, dtype='float32') 674 | f1_arr = (2 * pr_x * pr_y / (pr_x + pr_y + 1e-20)) 675 | f1 = f1_arr.max() 676 | f1_pos = f1_arr.argmax() 677 | theta = test_result[f1_pos][1] 678 | 679 | if input_theta==-1: 680 | w = f1_pos 681 | input_theta = theta 682 | 683 | auc = sklearn.metrics.auc(x = pr_x, y = pr_y) 684 | if not self.is_test: 685 | logging('ALL : Theta {:3.4f} | F1 {:3.4f} | AUC {:3.4f}'.format(theta, f1, auc)) 686 | else: 687 | logging('ma_f1 {:3.4f} | input_theta {:3.4f} test_result F1 {:3.4f} | AUC {:3.4f}'.format(f1, input_theta, f1_arr[w], auc)) 688 | 689 | if output: 690 | # output = [x[-4:] for x in test_result[:w+1]] 691 | output = [{'index': x[-4], 'h_idx': x[-3], 't_idx': x[-2], 'r_idx': x[-1], 'r': x[-5], 'title': x[-6]} for x in test_result[:w+1]] 692 | json.dump(output, open(self.test_prefix + "_index.json", "w")) 693 | 694 | # plt.plot(pr_x, pr_y, lw=2, label=model_name) 695 | # plt.legend(loc="upper right") 696 | if not os.path.exists(self.fig_result_dir): 697 | os.mkdir(self.fig_result_dir) 698 | # plt.savefig(os.path.join(self.fig_result_dir, model_name)) 699 | 700 | pr_x = [] 701 | pr_y = [] 702 | correct = correct_in_train = 0 703 | w = 0 704 | for i, item in enumerate(test_result): 705 | correct += item[0] 706 | if item[0] & item[2]: 707 | correct_in_train += 1 708 | if correct_in_train==correct: 709 | p = 0 710 | else: 711 | p = float(correct - correct_in_train) / (i + 1 - correct_in_train) 712 | pr_y.append(p) 713 | pr_x.append(float(correct) / total_recall) 714 | if item[1] > input_theta: 715 | w = i 716 | 717 | pr_x = np.asarray(pr_x, dtype='float32') 718 | pr_y = np.asarray(pr_y, dtype='float32') 719 | f1_arr = (2 * pr_x * pr_y / (pr_x + pr_y + 1e-20)) 720 | f1 = f1_arr.max() 721 | 722 | auc = sklearn.metrics.auc(x = pr_x, y = pr_y) 723 | 724 | logging('Ignore ma_f1 {:3.4f} | input_theta {:3.4f} test_result F1 {:3.4f} | AUC {:3.4f}'.format(f1, input_theta, f1_arr[w], auc)) 725 | 726 | return f1, auc, pr_x, pr_y 727 | 728 | 729 | 730 | def testall(self, model_pattern, model_name, 
input_theta):#, ignore_input_theta): 731 | model = model_pattern(config = self) 732 | 733 | model.load_state_dict(torch.load(os.path.join(self.checkpoint_dir, model_name))) 734 | model.cuda() 735 | model.eval() 736 | f1, auc, pr_x, pr_y = self.test(model, model_name, True, input_theta) 737 | -------------------------------------------------------------------------------- /code/config/EviConfig.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import torch 3 | import torch.nn as nn 4 | from torch.autograd import Variable 5 | import torch.optim as optim 6 | import numpy as np 7 | import os 8 | import time 9 | import datetime 10 | import json 11 | import sys 12 | import sklearn.metrics 13 | from tqdm import tqdm 14 | import matplotlib 15 | matplotlib.use('Agg') 16 | import matplotlib.pyplot as plt 17 | import random 18 | from collections import defaultdict 19 | import torch.nn.functional as F 20 | 21 | 22 | IGNORE_INDEX = -100 23 | TRAIN_LIMIT = 3600 24 | test_evidence = False 25 | 26 | class Accuracy(object): 27 | def __init__(self): 28 | self.correct = 0 29 | self.total = 0 30 | def add(self, is_correct): 31 | self.total += 1 32 | if is_correct: 33 | self.correct += 1 34 | def get(self): 35 | if self.total == 0: 36 | return 0.0 37 | else: 38 | return float(self.correct) / self.total 39 | def clear(self): 40 | self.correct = 0 41 | self.total = 0 42 | 43 | class EviConfig(object): 44 | def __init__(self, args): 45 | self.acc_NA = Accuracy() 46 | self.acc_not_NA = Accuracy() 47 | self.acc_total = Accuracy() 48 | self.data_path = './prepro_data' 49 | self.use_bag = False 50 | self.use_gpu = True 51 | self.is_training = True 52 | self.max_length = 512 53 | self.pos_num = 2 * self.max_length 54 | self.entity_num = self.max_length 55 | self.relation_num = 97 56 | self.coref_size = 20 57 | self.entity_type_size = 20 58 | self.max_epoch = 20 59 | self.opt_method = 'Adam' 60 | self.optimizer = None 61 | self.drop_prob = 0.5 # for cnn 62 | self.keep_prob = 0.8 # for lstm 63 | self.checkpoint_dir = './checkpoint' 64 | self.test_result_dir = './test_result' 65 | self.test_epoch = 5 66 | self.pretrain_model = None 67 | 68 | 69 | self.word_size = 100 70 | self.epoch_range = None 71 | self.dropout = 0.5 72 | self.period = 50 73 | 74 | self.ins_batch_size = 40 75 | self.test_ins_batch_size = self.ins_batch_size 76 | self.batch_size = 4000 77 | 78 | 79 | self.char_limit = 16 80 | self.sent_limit = 25 81 | self.dis2idx = np.zeros((512), dtype='int64') 82 | self.dis2idx[1] = 1 83 | self.dis2idx[2:] = 2 84 | self.dis2idx[4:] = 3 85 | self.dis2idx[8:] = 4 86 | self.dis2idx[16:] = 5 87 | self.dis2idx[32:] = 6 88 | self.dis2idx[64:] = 7 89 | self.dis2idx[128:] = 8 90 | self.dis2idx[256:] = 9 91 | self.dis_size = 20 92 | 93 | self.train_prefix = args.train_prefix 94 | self.test_prefix = args.test_prefix 95 | self.output_file = args.output_file 96 | 97 | 98 | def set_data_path(self, data_path): 99 | self.data_path = data_path 100 | def set_max_length(self, max_length): 101 | self.max_length = max_length 102 | self.pos_num = 2 * self.max_length 103 | def set_num_classes(self, num_classes): 104 | self.num_classes = num_classes 105 | def set_window_size(self, window_size): 106 | self.window_size = window_size 107 | def set_pos_size(self, pos_size): 108 | self.pos_size = pos_size 109 | def set_word_size(self, word_size): 110 | self.word_size = word_size 111 | def set_max_epoch(self, max_epoch): 112 | self.max_epoch = max_epoch 113 | def set_batch_size(self, 
batch_size): 114 | self.batch_size = batch_size 115 | def set_opt_method(self, opt_method): 116 | self.opt_method = opt_method 117 | def set_drop_prob(self, drop_prob): 118 | self.drop_prob = drop_prob 119 | def set_checkpoint_dir(self, checkpoint_dir): 120 | self.checkpoint_dir = checkpoint_dir 121 | def set_test_epoch(self, test_epoch): 122 | self.test_epoch = test_epoch 123 | def set_pretrain_model(self, pretrain_model): 124 | self.pretrain_model = pretrain_model 125 | def set_is_training(self, is_training): 126 | self.is_training = is_training 127 | def set_use_bag(self, use_bag): 128 | self.use_bag = use_bag 129 | def set_use_gpu(self, use_gpu): 130 | self.use_gpu = use_gpu 131 | def set_epoch_range(self, epoch_range): 132 | self.epoch_range = epoch_range 133 | 134 | def load_train_data(self): 135 | print("Reading training data...") 136 | 137 | prefix = 'dev_train' 138 | self.data_train_word = np.load(os.path.join(self.data_path, prefix+'_word.npy')) 139 | self.data_train_pos = np.load(os.path.join(self.data_path, prefix+'_pos.npy')) 140 | self.data_train_ner = np.load(os.path.join(self.data_path, prefix+'_ner.npy')) 141 | self.data_train_char = np.load(os.path.join(self.data_path, prefix+'_char.npy')) 142 | self.train_file = json.load(open(os.path.join(self.data_path, prefix+'.json'))) 143 | 144 | print("Finish reading") 145 | 146 | self.train_len = ins_num = self.data_train_word.shape[0] 147 | assert(self.train_len==len(self.train_file)) 148 | 149 | self.train_order = list(range(ins_num)) 150 | self.train_batches = ins_num // self.ins_batch_size 151 | if ins_num % self.ins_batch_size != 0: 152 | self.train_batches += 1 153 | 154 | def load_test_data(self): 155 | print("Reading testing data...") 156 | 157 | self.data_char_vec = np.load(os.path.join(self.data_path, 'char_vec.npy')) 158 | self.data_word_vec = np.load(os.path.join(self.data_path, 'vec.npy')) 159 | self.rel2id = json.load(open(os.path.join(self.data_path, 'rel2id.json'))) 160 | self.id2rel = {v: k for k,v in self.rel2id.items()} 161 | 162 | prefix = self.test_prefix 163 | print (prefix) 164 | self.data_test_word = np.load(os.path.join(self.data_path, prefix+'_word.npy')) 165 | self.data_test_pos = np.load(os.path.join(self.data_path, prefix+'_pos.npy')) 166 | self.data_test_ner = np.load(os.path.join(self.data_path, prefix+'_ner.npy')) 167 | self.data_test_char = np.load(os.path.join(self.data_path, prefix+'_char.npy')) 168 | self.test_file = json.load(open(os.path.join(self.data_path, prefix+'.json'))) 169 | 170 | self.test_len = self.data_test_word.shape[0] 171 | assert(self.test_len==len(self.test_file)) 172 | 173 | 174 | self.test_index = json.load(open(prefix+"_index.json")) 175 | 176 | 177 | self.total_evidence_recall = 0 178 | for ins in self.test_file: 179 | for label in ins['labels']: 180 | evidence = [int(e) for e in label['evidence']] 181 | self.total_evidence_recall += len(evidence) 182 | 183 | print ("total_evidence_recall:", self.total_evidence_recall) 184 | print ("Finish reading") 185 | 186 | self.test_batches = self.data_test_word.shape[0] // self.test_ins_batch_size 187 | if self.data_test_word.shape[0] % self.test_ins_batch_size != 0: 188 | self.test_batches += 1 189 | 190 | 191 | cur_batch = list(range(self.test_len)) 192 | cur_batch.sort(key=lambda x: len(self.test_file[x]['vertexSet'])) 193 | i = 0 194 | j = self.test_len-1 195 | # small vertexSet + big vertexSet as a pair 196 | self.test_order = [] 197 | while i <= j: 198 | self.test_order.append(cur_batch[i]) 199 | i += 1 200 | if i>j: 201 | break 
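            # Added illustration (not in the original file): with test_len = 5
            # and cur_batch sorted by vertexSet size in ascending order, this
            # loop emits the order [0, 4, 1, 3, 2], pairing a small-vertexSet
            # document with a large one, presumably to balance the number of
            # entity pairs across test batches.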
202 | 203 | self.test_order.append(cur_batch[j]) 204 | j -= 1 205 | 206 | assert(len(self.test_order)==self.test_len) 207 | 208 | 209 | def get_N2_train_batch(self): 210 | random.shuffle(self.train_order) 211 | 212 | context_idxs = torch.LongTensor(self.batch_size, self.max_length).cuda() 213 | context_pos = torch.LongTensor(self.batch_size, self.max_length).cuda() 214 | 215 | context_ner = torch.LongTensor(self.batch_size, self.max_length).cuda() 216 | context_char_idxs = torch.LongTensor(self.batch_size, self.max_length, self.char_limit).cuda() 217 | 218 | relation_label = torch.LongTensor(self.batch_size).cuda() 219 | evidence_label = torch.Tensor(self.batch_size, self.sent_limit).cuda() 220 | sent_mask = torch.Tensor(self.batch_size, self.sent_limit).cuda() 221 | 222 | sent_h_mapping = torch.Tensor(self.batch_size, self.sent_limit, self.max_length).cuda() 223 | sent_t_mapping = torch.Tensor(self.batch_size, self.sent_limit, self.max_length).cuda() 224 | 225 | 226 | for b in range(self.train_batches): 227 | start_id = b * self.ins_batch_size 228 | cur_bsz = min(self.ins_batch_size, self.train_len - start_id) 229 | cur_batch = list(self.train_order[start_id: start_id + cur_bsz]) 230 | cur_batch.sort(key=lambda x: np.sum(self.data_train_word[x]>0) , reverse = True) 231 | 232 | for mapping in [sent_h_mapping, sent_t_mapping, sent_mask, evidence_label, context_pos, relation_label]: 233 | mapping.zero_() 234 | 235 | max_sents = 0 236 | i = 0 237 | for w, index in enumerate(cur_batch): 238 | ins = self.train_file[index] 239 | Ls = ins['Ls'] 240 | max_sents = max(max_sents, len(Ls) - 1) 241 | random.shuffle(ins['labels']) 242 | for label in ins['labels']: 243 | context_idxs[i].copy_(torch.from_numpy(self.data_train_word[index, :])) 244 | context_char_idxs[i].copy_(torch.from_numpy(self.data_train_char[index, :])) 245 | context_ner[i].copy_(torch.from_numpy(self.data_train_ner[index, :])) 246 | relation_label[i] = label['r'] 247 | 248 | h_idx = label['h'] 249 | t_idx = label['t'] 250 | 251 | hlist = ins['vertexSet'][h_idx] 252 | tlist = ins['vertexSet'][t_idx] 253 | 254 | for h in hlist: 255 | context_pos[i, h['pos'][0]:h['pos'][1]] = 1 256 | 257 | for t in tlist: 258 | context_pos[i, t['pos'][0]:t['pos'][1]] = 2 259 | 260 | for e in label['evidence']: 261 | evidence_label[i, int(e)] = 1 262 | 263 | for j in range(len(Ls) - 1): 264 | sent_h_mapping[i, j, Ls[j]] = 1 265 | sent_t_mapping[i, j, Ls[j + 1] - 1] = 1 266 | sent_mask[i, j] = 1 267 | 268 | 269 | i += 1 270 | if i == self.batch_size: 271 | break 272 | if i == self.batch_size: 273 | break 274 | 275 | 276 | 277 | cur_bsz = i 278 | input_lengths = (context_idxs[:cur_bsz] > 0).long().sum(dim=1) 279 | max_c_len = int(input_lengths.max()) 280 | 281 | yield {'context_idxs': context_idxs[:cur_bsz, :max_c_len].contiguous(), 282 | 'context_pos': context_pos[:cur_bsz, :max_c_len].contiguous(), 283 | 'relation_label': relation_label[:cur_bsz].contiguous(), 284 | 'input_lengths' : input_lengths, 285 | 'context_ner': context_ner[:cur_bsz, :max_c_len].contiguous(), 286 | 'context_char_idxs': context_char_idxs[:cur_bsz, :max_c_len].contiguous(), 287 | 'sent_h_mapping': sent_h_mapping[:cur_bsz, :max_sents, :max_c_len], 288 | 'sent_t_mapping': sent_t_mapping[:cur_bsz, :max_sents, :max_c_len], 289 | 'sent_mask': sent_mask[:cur_bsz, :max_sents], 290 | 'evidence_label': evidence_label[:cur_bsz, :max_sents] 291 | } 292 | 293 | 294 | def get_real_test_batch(self): 295 | 296 | 297 | self.test_len = len(self.test_index) 298 | 299 | self.test_order = 
list(range(self.test_len)) 300 | self.test_batches = self.test_len // self.batch_size 301 | if self.test_len % self.batch_size != 0: 302 | self.test_batches += 1 303 | 304 | context_idxs = torch.LongTensor(self.batch_size, self.max_length).cuda() 305 | context_pos = torch.LongTensor(self.batch_size, self.max_length).cuda() 306 | 307 | context_ner = torch.LongTensor(self.batch_size, self.max_length).cuda() 308 | context_char_idxs = torch.LongTensor(self.batch_size, self.max_length, self.char_limit).cuda() 309 | 310 | relation_label = torch.LongTensor(self.batch_size).cuda() 311 | 312 | sent_mask = torch.Tensor(self.batch_size, self.sent_limit).cuda() 313 | 314 | sent_h_mapping = torch.Tensor(self.batch_size, self.sent_limit, self.max_length).cuda() 315 | sent_t_mapping = torch.Tensor(self.batch_size, self.sent_limit, self.max_length).cuda() 316 | 317 | 318 | for b in range(self.test_batches): 319 | start_id = b * self.batch_size 320 | cur_bsz = min(self.batch_size, self.test_len - start_id) 321 | cur_batch = list(self.test_order[start_id : start_id + cur_bsz]) 322 | 323 | cur_batch.sort(key=lambda x: np.sum(self.data_test_word[self.test_index[x]['index']]>0) , reverse = True) 324 | 325 | for mapping in [sent_h_mapping, sent_t_mapping, sent_mask, context_pos, relation_label]: 326 | mapping.zero_() 327 | 328 | max_sents = 0 329 | evidences = [] 330 | sents_num = [] 331 | infos = [] 332 | 333 | 334 | for i, t_index in enumerate(cur_batch): 335 | pos_ins = self.test_index[t_index] 336 | index = pos_ins['index'] 337 | h_idx = pos_ins['h_idx'] 338 | t_idx = pos_ins['t_idx'] 339 | r = pos_ins['r_idx'] 340 | 341 | ins = self.test_file[index] 342 | Ls = ins['Ls'] 343 | max_sents = max(max_sents, len(Ls) - 1) 344 | infos.append((ins['title'], h_idx, t_idx, self.id2rel[r])) 345 | 346 | 347 | context_idxs[i].copy_(torch.from_numpy(self.data_test_word[index, :])) 348 | context_char_idxs[i].copy_(torch.from_numpy(self.data_test_char[index, :])) 349 | context_ner[i].copy_(torch.from_numpy(self.data_test_ner[index, :])) 350 | relation_label[i] = r 351 | 352 | hlist = ins['vertexSet'][h_idx] 353 | tlist = ins['vertexSet'][t_idx] 354 | 355 | for h in hlist: 356 | context_pos[i, h['pos'][0]:h['pos'][1]] = 1 357 | 358 | for t in tlist: 359 | context_pos[i, t['pos'][0]:t['pos'][1]] = 2 360 | 361 | 362 | evidence = [] 363 | for label in ins['labels']: 364 | if (label['h'], label['t'], label['r']) == (h_idx, t_idx, r): 365 | evidence = [int(e) for e in label['evidence']] 366 | 367 | evidences.append(evidence) 368 | 369 | for j in range(len(Ls) - 1): 370 | sent_h_mapping[i, j, Ls[j]] = 1 371 | sent_t_mapping[i, j, Ls[j + 1] - 1] = 1 372 | sent_mask[i, j] = 1 373 | 374 | sents_num.append(len(Ls)-1) 375 | 376 | input_lengths = (context_idxs[:cur_bsz] > 0).long().sum(dim=1) 377 | max_c_len = int(input_lengths.max()) 378 | 379 | yield {'context_idxs': context_idxs[:cur_bsz, :max_c_len].contiguous(), 380 | 'context_pos': context_pos[:cur_bsz, :max_c_len].contiguous(), 381 | 'relation_label': relation_label[:cur_bsz].contiguous(), 382 | 'input_lengths' : input_lengths, 383 | 'context_ner': context_ner[:cur_bsz, :max_c_len].contiguous(), 384 | 'context_char_idxs': context_char_idxs[:cur_bsz, :max_c_len].contiguous(), 385 | 'sent_h_mapping': sent_h_mapping[:cur_bsz, :max_sents, :max_c_len], 386 | 'sent_t_mapping': sent_t_mapping[:cur_bsz, :max_sents, :max_c_len], 387 | 'sent_mask': sent_mask[:cur_bsz, :max_sents], 388 | 'evidences': evidences, 389 | 'sents_num': sents_num, 390 | 'infos': infos 391 | } 392 | 393 | 394 | 
def train(self, model_pattern, model_name): 395 | ori_model = model_pattern(config = self) 396 | if self.pretrain_model != None: 397 | ori_model.load_state_dict(torch.load(self.pretrain_model)) 398 | ori_model.cuda() 399 | model = nn.DataParallel(ori_model) 400 | 401 | optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters())) 402 | 403 | BCE = nn.BCEWithLogitsLoss(reduction='none') 404 | 405 | if not os.path.exists(self.checkpoint_dir): 406 | os.mkdir(self.checkpoint_dir) 407 | 408 | best_auc = 0.0 409 | best_f1 = 0.0 410 | best_epoch = 0 411 | 412 | model.train() 413 | 414 | global_step = 0 415 | total_loss = 0 416 | start_time = time.time() 417 | 418 | def logging(s, print_=True, log_=True): 419 | if print_: 420 | print(s) 421 | if log_: 422 | with open(os.path.join(os.path.join("log", model_name)), 'a+') as f_log: 423 | f_log.write(s + '\n') 424 | 425 | for epoch in range(self.max_epoch): 426 | 427 | self.acc_NA.clear() 428 | self.acc_not_NA.clear() 429 | self.acc_total.clear() 430 | 431 | for data in self.get_N2_train_batch(): 432 | 433 | context_idxs = data['context_idxs'] 434 | context_pos = data['context_pos'] 435 | relation_label = data['relation_label'] 436 | input_lengths = data['input_lengths'] 437 | context_ner = data['context_ner'] 438 | context_char_idxs = data['context_char_idxs'] 439 | sent_h_mapping = data['sent_h_mapping'] 440 | sent_t_mapping = data['sent_t_mapping'] 441 | sent_mask = data['sent_mask'] 442 | evidence_label = data['evidence_label'] 443 | 444 | predict_sent = model(context_idxs, context_pos, context_ner, context_char_idxs, input_lengths, sent_h_mapping, sent_t_mapping, relation_label) 445 | loss = torch.sum(BCE(predict_sent, evidence_label) * sent_mask) / torch.sum(sent_mask) 446 | 447 | 448 | optimizer.zero_grad() 449 | loss.backward() 450 | optimizer.step() 451 | 452 | global_step += 1 453 | total_loss += loss.item() 454 | 455 | if global_step % self.period == 0 : 456 | cur_loss = total_loss / self.period 457 | elapsed = time.time() - start_time 458 | logging('| epoch {:2d} | step {:4d} | ms/b {:5.2f} | train loss {:5.3f} '.format(epoch, global_step, elapsed * 1000 / self.period, cur_loss)) 459 | total_loss = 0 460 | start_time = time.time() 461 | 462 | 463 | 464 | if (epoch + 1) % self.test_epoch == 0: 465 | logging('-' * 89) 466 | eval_start_time = time.time() 467 | model.eval() 468 | f1 = self.test(model, model_name) 469 | model.train() 470 | logging('| epoch {:3d} | time: {:5.2f}s | F1 {:.4f}'.format(epoch, time.time() - eval_start_time, f1)) 471 | logging('-' * 89) 472 | 473 | 474 | if f1 > best_f1: 475 | best_f1 = f1 476 | best_epoch = epoch 477 | path = os.path.join(self.checkpoint_dir, model_name) 478 | torch.save(ori_model.state_dict(), path) 479 | 480 | print("Finish training") 481 | print("Best epoch = %d | auc = %f" % (best_epoch, best_auc)) 482 | print("Storing best result...") 483 | print("Finish storing") 484 | 485 | def test(self, model, model_name, output=False, input_theta=-1): 486 | test_evidence_result = [] 487 | 488 | def logging(s, print_=True, log_=True): 489 | if print_: 490 | print(s) 491 | if log_: 492 | with open(os.path.join(os.path.join("log", model_name)), 'a+') as f_log: 493 | f_log.write(s + '\n') 494 | 495 | for data in self.get_real_test_batch(): 496 | with torch.no_grad(): 497 | context_idxs = data['context_idxs'] 498 | context_pos = data['context_pos'] 499 | relation_label = data['relation_label'] 500 | input_lengths = data['input_lengths'] 501 | context_ner = data['context_ner'] 502 | 
context_char_idxs = data['context_char_idxs'] 503 | sent_h_mapping = data['sent_h_mapping'] 504 | sent_t_mapping = data['sent_t_mapping'] 505 | evidences = data['evidences'] 506 | sents_num = data['sents_num'] 507 | infos = data['infos'] 508 | 509 | 510 | predict_sent = model(context_idxs, context_pos, context_ner, context_char_idxs, input_lengths, sent_h_mapping, sent_t_mapping, relation_label) 511 | 512 | predict_sent = torch.sigmoid(predict_sent) 513 | 514 | 515 | predict_sent = predict_sent.data.cpu().numpy() 516 | 517 | for i in range(len(evidences)): 518 | evi = evidences[i] 519 | for j in range(sents_num[i]): 520 | test_evidence_result.append( (j in evi, float(predict_sent[i, j]), infos[i], j) ) 521 | 522 | 523 | 524 | test_evidence_result.sort(key = lambda x: x[1], reverse=True) 525 | 526 | total_evidence_recall = self.total_evidence_recall 527 | if total_evidence_recall==0: # for test 528 | total_evidence_recall = 1 529 | 530 | pr_x = [] 531 | pr_y = [] 532 | correct = 0 533 | w = 0 534 | 535 | for i, item in enumerate(test_evidence_result): 536 | correct += item[0] 537 | pr_y.append(float(correct) / (i + 1)) 538 | pr_x.append(float(correct) / total_evidence_recall) 539 | if item[1] > input_theta: 540 | w = i 541 | 542 | 543 | pr_x = np.asarray(pr_x, dtype='float32') 544 | pr_y = np.asarray(pr_y, dtype='float32') 545 | f1_arr = (2 * pr_x * pr_y / (pr_x + pr_y + 1e-20)) 546 | f1_pos = f1_arr.argmax() 547 | evidence_f1 = f1_arr.max() 548 | auc = sklearn.metrics.auc(x = pr_x, y = pr_y) 549 | 550 | if input_theta==-1: 551 | w = f1_pos 552 | input_theta = test_evidence_result[w][1] 553 | 554 | logging('ma_f1{:3.4f} | input_theta {:3.4f} test_evidence_result F1 {:3.4f} | AUC {:3.4f}'.format(evidence_f1, input_theta, f1_arr[w], auc)) 555 | 556 | if output: 557 | info2evi = {} 558 | 559 | for x in self.test_index: 560 | info2evi[(x['title'], x['h_idx'], x['t_idx'], x['r'])] = [] 561 | 562 | 563 | for i in range(w+1): 564 | info = test_evidence_result[i][-2] 565 | sent_id = test_evidence_result[i][-1] 566 | info2evi[info].append(sent_id) 567 | 568 | 569 | output = [] 570 | for u, v in info2evi.items(): 571 | title = u[0] 572 | h_idx = u[1] 573 | t_idx = u[2] 574 | r = u[3] 575 | evidence = v 576 | output.append({'title':title, 'h_idx': h_idx, 't_idx': t_idx, 'r': r, 'evidence': evidence}) 577 | 578 | json.dump(output, open(self.output_file, "w")) 579 | 580 | return evidence_f1 581 | 582 | 583 | 584 | def testall(self, model_pattern, model_name, input_theta=-1): 585 | model = model_pattern(config = self) 586 | 587 | model.load_state_dict(torch.load(os.path.join(self.checkpoint_dir, model_name))) 588 | model.cuda() 589 | model.eval() 590 | self.test(model, model_name, True, input_theta) 591 | 592 | -------------------------------------------------------------------------------- /code/config/__init__.py: -------------------------------------------------------------------------------- 1 | from .Config import Config 2 | from .EviConfig import EviConfig 3 | -------------------------------------------------------------------------------- /code/evaluation.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import os.path 5 | import json 6 | 7 | def gen_train_facts(data_file_name, truth_dir): 8 | fact_file_name = data_file_name[data_file_name.find("train_"):] 9 | fact_file_name = os.path.join(truth_dir, fact_file_name.replace(".json", ".fact")) 10 | 11 | if os.path.exists(fact_file_name): 12 | fact_in_train 
= set([]) 13 | triples = json.load(open(fact_file_name)) 14 | for x in triples: 15 | fact_in_train.add(tuple(x)) 16 | return fact_in_train 17 | 18 | fact_in_train = set([]) 19 | ori_data = json.load(open(data_file_name)) 20 | for data in ori_data: 21 | vertexSet = data['vertexSet'] 22 | for label in data['labels']: 23 | rel = label['r'] 24 | for n1 in vertexSet[label['h']]: 25 | for n2 in vertexSet[label['t']]: 26 | fact_in_train.add((n1['name'], n2['name'], rel)) 27 | 28 | json.dump(list(fact_in_train), open(fact_file_name, "w")) 29 | 30 | return fact_in_train 31 | 32 | input_dir = sys.argv[1] 33 | output_dir = sys.argv[2] 34 | 35 | submit_dir = os.path.join(input_dir, 'res') 36 | truth_dir = os.path.join(input_dir, 'ref') 37 | 38 | if not os.path.isdir(submit_dir): 39 | print ("%s doesn't exist" % submit_dir) 40 | 41 | if os.path.isdir(submit_dir) and os.path.isdir(truth_dir): 42 | if not os.path.exists(output_dir): 43 | os.makedirs(output_dir) 44 | 45 | fact_in_train_annotated = gen_train_facts("../data/train_annotated.json", truth_dir) 46 | fact_in_train_distant = gen_train_facts("../data/train_distant.json", truth_dir) 47 | 48 | output_filename = os.path.join(output_dir, 'scores.txt') 49 | output_file = open(output_filename, 'w') 50 | 51 | truth_file = os.path.join(truth_dir, "dev_test.json") 52 | truth = json.load(open(truth_file)) 53 | 54 | std = {} 55 | tot_evidences = 0 56 | titleset = set([]) 57 | 58 | title2vectexSet = {} 59 | 60 | for x in truth: 61 | title = x['title'] 62 | titleset.add(title) 63 | 64 | vertexSet = x['vertexSet'] 65 | title2vectexSet[title] = vertexSet 66 | 67 | for label in x['labels']: 68 | r = label['r'] 69 | 70 | h_idx = label['h'] 71 | t_idx = label['t'] 72 | std[(title, r, h_idx, t_idx)] = set(label['evidence']) 73 | tot_evidences += len(label['evidence']) 74 | 75 | tot_relations = len(std) 76 | 77 | submission_answer_file = os.path.join(submit_dir, "result.json") 78 | tmp = json.load(open(submission_answer_file)) 79 | tmp.sort(key=lambda x: (x['title'], x['h_idx'], x['t_idx'], x['r'])) 80 | submission_answer = [tmp[0]] 81 | for i in range(1, len(tmp)): 82 | x = tmp[i] 83 | y = tmp[i-1] 84 | if (x['title'], x['h_idx'], x['t_idx'], x['r']) != (y['title'], y['h_idx'], y['t_idx'], y['r']): 85 | submission_answer.append(tmp[i]) 86 | 87 | correct_re = 0 88 | correct_evidence = 0 89 | pred_evi = 0 90 | 91 | correct_in_train_annotated = 0 92 | correct_in_train_distant = 0 93 | titleset2 = set([]) 94 | for x in submission_answer: 95 | title = x['title'] 96 | h_idx = x['h_idx'] 97 | t_idx = x['t_idx'] 98 | r = x['r'] 99 | titleset2.add(title) 100 | if title not in title2vectexSet: 101 | continue 102 | vertexSet = title2vectexSet[title] 103 | 104 | if 'evidence' in x: 105 | evi = set(x['evidence']) 106 | else: 107 | evi = set([]) 108 | pred_evi += len(evi) 109 | 110 | if (title, r, h_idx, t_idx) in std: 111 | correct_re += 1 112 | stdevi = std[(title, r, h_idx, t_idx)] 113 | correct_evidence += len(stdevi & evi) 114 | in_train_annotated = in_train_distant = False 115 | for n1 in vertexSet[h_idx]: 116 | for n2 in vertexSet[t_idx]: 117 | if (n1['name'], n2['name'], r) in fact_in_train_annotated: 118 | in_train_annotated = True 119 | if (n1['name'], n2['name'], r) in fact_in_train_distant: 120 | in_train_distant = True 121 | 122 | if in_train_annotated: 123 | correct_in_train_annotated += 1 124 | if in_train_distant: 125 | correct_in_train_distant += 1 126 | 127 | re_p = 1.0 * correct_re / len(submission_answer) 128 | re_r = 1.0 * correct_re / tot_relations 129 | if 
re_p+re_r == 0: 130 | re_f1 = 0 131 | else: 132 | re_f1 = 2.0 * re_p * re_r / (re_p + re_r) 133 | 134 | evi_p = 1.0 * correct_evidence / pred_evi if pred_evi>0 else 0 135 | evi_r = 1.0 * correct_evidence / tot_evidences 136 | if evi_p+evi_r == 0: 137 | evi_f1 = 0 138 | else: 139 | evi_f1 = 2.0 * evi_p * evi_r / (evi_p + evi_r) 140 | 141 | re_p_ignore_train_annotated = 1.0 * (correct_re-correct_in_train_annotated) / (len(submission_answer)-correct_in_train_annotated) 142 | re_p_ignore_train = 1.0 * (correct_re-correct_in_train_distant) / (len(submission_answer)-correct_in_train_distant) 143 | 144 | if re_p_ignore_train_annotated+re_r == 0: 145 | re_f1_ignore_train_annotated = 0 146 | else: 147 | re_f1_ignore_train_annotated = 2.0 * re_p_ignore_train_annotated * re_r / (re_p_ignore_train_annotated + re_r) 148 | 149 | if re_p_ignore_train+re_r == 0: 150 | re_f1_ignore_train = 0 151 | else: 152 | re_f1_ignore_train = 2.0 * re_p_ignore_train * re_r / (re_p_ignore_train + re_r) 153 | 154 | 155 | 156 | print ('RE_F1:', re_f1) 157 | print ('Evi_F1:', evi_f1) 158 | print ('RE_ignore_annotated_F1:', re_f1_ignore_train_annotated) 159 | print ('RE_ignore_distant_F1:', re_f1_ignore_train) 160 | 161 | output_file.write("RE_F1: %f\n" % re_f1) 162 | output_file.write("Evi_F1: %f\n" % evi_f1) 163 | 164 | output_file.write("RE_ignore_annotated_F1: %f\n" % re_f1_ignore_train_annotated) 165 | output_file.write("RE_ignore_distant_F1: %f\n" % re_f1_ignore_train) 166 | 167 | 168 | output_file.close() 169 | 170 | -------------------------------------------------------------------------------- /code/gen_data.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | import json 4 | from nltk.tokenize import WordPunctTokenizer 5 | import argparse 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument('--in_path', type = str, default = "../data") 8 | parser.add_argument('--out_path', type = str, default = "prepro_data") 9 | 10 | args = parser.parse_args() 11 | in_path = args.in_path 12 | out_path = args.out_path 13 | case_sensitive = False 14 | 15 | char_limit = 16 16 | train_distant_file_name = os.path.join(in_path, 'train_distant.json') 17 | train_annotated_file_name = os.path.join(in_path, 'train_annotated.json') 18 | dev_file_name = os.path.join(in_path, 'dev.json') 19 | test_file_name = os.path.join(in_path, 'test.json') 20 | 21 | rel2id = json.load(open(os.path.join(out_path, 'rel2id.json'), "r")) 22 | id2rel = {v:u for u,v in rel2id.items()} 23 | json.dump(id2rel, open(os.path.join(out_path, 'id2rel.json'), "w")) 24 | fact_in_train = set([]) 25 | fact_in_dev_train = set([]) 26 | 27 | def init(data_file_name, rel2id, max_length = 512, is_training = True, suffix=''): 28 | 29 | ori_data = json.load(open(data_file_name)) 30 | 31 | 32 | Ma = 0 33 | Ma_e = 0 34 | data = [] 35 | intrain = notintrain = notindevtrain = indevtrain = 0 36 | for i in range(len(ori_data)): 37 | Ls = [0] 38 | L = 0 39 | for x in ori_data[i]['sents']: 40 | L += len(x) 41 | Ls.append(L) 42 | 43 | vertexSet = ori_data[i]['vertexSet'] 44 | # point position added with sent start position 45 | for j in range(len(vertexSet)): 46 | for k in range(len(vertexSet[j])): 47 | vertexSet[j][k]['sent_id'] = int(vertexSet[j][k]['sent_id']) 48 | 49 | sent_id = vertexSet[j][k]['sent_id'] 50 | dl = Ls[sent_id] 51 | pos1 = vertexSet[j][k]['pos'][0] 52 | pos2 = vertexSet[j][k]['pos'][1] 53 | vertexSet[j][k]['pos'] = (pos1+dl, pos2+dl) 54 | 55 | ori_data[i]['vertexSet'] = vertexSet 56 | 
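        # Added illustration (not in the original file): Ls holds cumulative
        # sentence lengths, e.g. sents = [["Alice", "lives", "in", "Paris", "."],
        # ["She", "was", "born", "there", "."]] gives Ls = [0, 5, 10], so a
        # mention in sentence 1 with sentence-local pos = (0, 1) ("She") is
        # shifted by dl = Ls[1] = 5 to the document-level span (5, 6).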
57 | item = {} 58 | item['vertexSet'] = vertexSet 59 | labels = ori_data[i].get('labels', []) 60 | 61 | train_triple = set([]) 62 | new_labels = [] 63 | for label in labels: 64 | rel = label['r'] 65 | assert(rel in rel2id) 66 | label['r'] = rel2id[label['r']] 67 | 68 | train_triple.add((label['h'], label['t'])) 69 | 70 | 71 | if suffix=='_train': 72 | for n1 in vertexSet[label['h']]: 73 | for n2 in vertexSet[label['t']]: 74 | fact_in_dev_train.add((n1['name'], n2['name'], rel)) 75 | 76 | 77 | if is_training: 78 | for n1 in vertexSet[label['h']]: 79 | for n2 in vertexSet[label['t']]: 80 | fact_in_train.add((n1['name'], n2['name'], rel)) 81 | 82 | else: 83 | # fix a bug here 84 | label['intrain'] = False 85 | label['indev_train'] = False 86 | 87 | for n1 in vertexSet[label['h']]: 88 | for n2 in vertexSet[label['t']]: 89 | if (n1['name'], n2['name'], rel) in fact_in_train: 90 | label['intrain'] = True 91 | 92 | if suffix == '_dev' or suffix == '_test': 93 | if (n1['name'], n2['name'], rel) in fact_in_dev_train: 94 | label['indev_train'] = True 95 | 96 | 97 | new_labels.append(label) 98 | 99 | item['labels'] = new_labels 100 | item['title'] = ori_data[i]['title'] 101 | 102 | na_triple = [] 103 | for j in range(len(vertexSet)): 104 | for k in range(len(vertexSet)): 105 | if (j != k): 106 | if (j, k) not in train_triple: 107 | na_triple.append((j, k)) 108 | 109 | item['na_triple'] = na_triple 110 | item['Ls'] = Ls 111 | item['sents'] = ori_data[i]['sents'] 112 | data.append(item) 113 | 114 | Ma = max(Ma, len(vertexSet)) 115 | Ma_e = max(Ma_e, len(item['labels'])) 116 | 117 | 118 | print ('data_len:', len(ori_data)) 119 | # print ('Ma_V', Ma) 120 | # print ('Ma_e', Ma_e) 121 | # print (suffix) 122 | # print ('fact_in_train', len(fact_in_train)) 123 | # print (intrain, notintrain) 124 | # print ('fact_in_devtrain', len(fact_in_dev_train)) 125 | # print (indevtrain, notindevtrain) 126 | 127 | 128 | # saving 129 | print("Saving files") 130 | if is_training: 131 | name_prefix = "train" 132 | else: 133 | name_prefix = "dev" 134 | 135 | json.dump(data , open(os.path.join(out_path, name_prefix + suffix + '.json'), "w")) 136 | 137 | char2id = json.load(open(os.path.join(out_path, "char2id.json"))) 138 | # id2char= {v:k for k,v in char2id.items()} 139 | # json.dump(id2char, open("data/id2char.json", "w")) 140 | 141 | word2id = json.load(open(os.path.join(out_path, "word2id.json"))) 142 | ner2id = json.load(open(os.path.join(out_path, "ner2id.json"))) 143 | 144 | sen_tot = len(ori_data) 145 | sen_word = np.zeros((sen_tot, max_length), dtype = np.int64) 146 | sen_pos = np.zeros((sen_tot, max_length), dtype = np.int64) 147 | sen_ner = np.zeros((sen_tot, max_length), dtype = np.int64) 148 | sen_char = np.zeros((sen_tot, max_length, char_limit), dtype = np.int64) 149 | 150 | for i in range(len(ori_data)): 151 | item = ori_data[i] 152 | words = [] 153 | for sent in item['sents']: 154 | words += sent 155 | 156 | for j, word in enumerate(words): 157 | word = word.lower() 158 | 159 | if j < max_length: 160 | if word in word2id: 161 | sen_word[i][j] = word2id[word] 162 | else: 163 | sen_word[i][j] = word2id['UNK'] 164 | 165 | for c_idx, k in enumerate(list(word)): 166 | if c_idx>=char_limit: 167 | break 168 | sen_char[i,j,c_idx] = char2id.get(k, char2id['UNK']) 169 | 170 | for j in range(j + 1, max_length): 171 | sen_word[i][j] = word2id['BLANK'] 172 | 173 | vertexSet = item['vertexSet'] 174 | 175 | for idx, vertex in enumerate(vertexSet, 1): 176 | for v in vertex: 177 | sen_pos[i][v['pos'][0]:v['pos'][1]] = idx 
178 | sen_ner[i][v['pos'][0]:v['pos'][1]] = ner2id[v['type']] 179 | 180 | print("Finishing processing") 181 | np.save(os.path.join(out_path, name_prefix + suffix + '_word.npy'), sen_word) 182 | np.save(os.path.join(out_path, name_prefix + suffix + '_pos.npy'), sen_pos) 183 | np.save(os.path.join(out_path, name_prefix + suffix + '_ner.npy'), sen_ner) 184 | np.save(os.path.join(out_path, name_prefix + suffix + '_char.npy'), sen_char) 185 | print("Finish saving") 186 | 187 | 188 | 189 | init(train_distant_file_name, rel2id, max_length = 512, is_training = True, suffix='') 190 | init(train_annotated_file_name, rel2id, max_length = 512, is_training = False, suffix='_train') 191 | init(dev_file_name, rel2id, max_length = 512, is_training = False, suffix='_dev') 192 | init(test_file_name, rel2id, max_length = 512, is_training = False, suffix='_test') 193 | 194 | 195 | -------------------------------------------------------------------------------- /code/models/BiLSTM.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as autograd 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.autograd import Variable 7 | from torch import nn 8 | import numpy as np 9 | import math 10 | from torch.nn import init 11 | from torch.nn.utils import rnn 12 | 13 | 14 | class BiLSTM(nn.Module): 15 | def __init__(self, config): 16 | super(BiLSTM, self).__init__() 17 | self.config = config 18 | 19 | word_vec_size = config.data_word_vec.shape[0] 20 | self.word_emb = nn.Embedding(word_vec_size, config.data_word_vec.shape[1]) 21 | self.word_emb.weight.data.copy_(torch.from_numpy(config.data_word_vec)) 22 | 23 | self.word_emb.weight.requires_grad = False 24 | self.use_entity_type = True 25 | self.use_coreference = True 26 | self.use_distance = True 27 | 28 | # performance is similar with char_embed 29 | # self.char_emb = nn.Embedding(config.data_char_vec.shape[0], config.data_char_vec.shape[1]) 30 | # self.char_emb.weight.data.copy_(torch.from_numpy(config.data_char_vec)) 31 | 32 | # char_dim = config.data_char_vec.shape[1] 33 | # char_hidden = 100 34 | # self.char_cnn = nn.Conv1d(char_dim, char_hidden, 5) 35 | 36 | hidden_size = 128 37 | input_size = config.data_word_vec.shape[1] 38 | if self.use_entity_type: 39 | input_size += config.entity_type_size 40 | self.ner_emb = nn.Embedding(7, config.entity_type_size, padding_idx=0) 41 | 42 | if self.use_coreference: 43 | input_size += config.coref_size 44 | # self.coref_embed = nn.Embedding(config.max_length, config.coref_size, padding_idx=0) 45 | self.entity_embed = nn.Embedding(config.max_length, config.coref_size, padding_idx=0) 46 | 47 | # input_size += char_hidden 48 | 49 | self.rnn = EncoderLSTM(input_size, hidden_size, 1, True, True, 1 - config.keep_prob, False) 50 | self.linear_re = nn.Linear(hidden_size*2, hidden_size) 51 | 52 | if self.use_distance: 53 | self.dis_embed = nn.Embedding(20, config.dis_size, padding_idx=10) 54 | self.bili = torch.nn.Bilinear(hidden_size+config.dis_size, hidden_size+config.dis_size, config.relation_num) 55 | else: 56 | self.bili = torch.nn.Bilinear(hidden_size, hidden_size, config.relation_num) 57 | 58 | def forward(self, context_idxs, pos, context_ner, context_char_idxs, context_lens, h_mapping, t_mapping, 59 | relation_mask, dis_h_2_t, dis_t_2_h): 60 | # para_size, char_size, bsz = context_idxs.size(1), context_char_idxs.size(2), context_idxs.size(0) 61 | # context_ch = 
self.char_emb(context_char_idxs.contiguous().view(-1, char_size)).view(bsz * para_size, char_size, -1) 62 | # context_ch = self.char_cnn(context_ch.permute(0, 2, 1).contiguous()).max(dim=-1)[0].view(bsz, para_size, -1) 63 | 64 | sent = self.word_emb(context_idxs) 65 | if self.use_coreference: 66 | sent = torch.cat([sent, self.entity_embed(pos)], dim=-1) 67 | 68 | if self.use_entity_type: 69 | sent = torch.cat([sent, self.ner_emb(context_ner)], dim=-1) 70 | 71 | # sent = torch.cat([sent, context_ch], dim=-1) 72 | context_output = self.rnn(sent, context_lens) 73 | 74 | context_output = torch.relu(self.linear_re(context_output)) 75 | 76 | 77 | start_re_output = torch.matmul(h_mapping, context_output) 78 | end_re_output = torch.matmul(t_mapping, context_output) 79 | 80 | 81 | if self.use_distance: 82 | s_rep = torch.cat([start_re_output, self.dis_embed(dis_h_2_t)], dim=-1) 83 | t_rep = torch.cat([end_re_output, self.dis_embed(dis_t_2_h)], dim=-1) 84 | predict_re = self.bili(s_rep, t_rep) 85 | else: 86 | predict_re = self.bili(start_re_output, end_re_output) 87 | 88 | return predict_re 89 | 90 | 91 | class LockedDropout(nn.Module): 92 | def __init__(self, dropout): 93 | super().__init__() 94 | self.dropout = dropout 95 | 96 | def forward(self, x): 97 | dropout = self.dropout 98 | if not self.training: 99 | return x 100 | m = x.data.new(x.size(0), 1, x.size(2)).bernoulli_(1 - dropout) 101 | mask = Variable(m.div_(1 - dropout), requires_grad=False) 102 | mask = mask.expand_as(x) 103 | return mask * x 104 | 105 | class EncoderRNN(nn.Module): 106 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 107 | super().__init__() 108 | self.rnns = [] 109 | for i in range(nlayers): 110 | if i == 0: 111 | input_size_ = input_size 112 | output_size_ = num_units 113 | else: 114 | input_size_ = num_units if not bidir else num_units * 2 115 | output_size_ = num_units 116 | self.rnns.append(nn.GRU(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 117 | self.rnns = nn.ModuleList(self.rnns) 118 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 119 | self.dropout = LockedDropout(dropout) 120 | self.concat = concat 121 | self.nlayers = nlayers 122 | self.return_last = return_last 123 | 124 | # self.reset_parameters() 125 | 126 | def reset_parameters(self): 127 | for rnn in self.rnns: 128 | for name, p in rnn.named_parameters(): 129 | if 'weight' in name: 130 | p.data.normal_(std=0.1) 131 | else: 132 | p.data.zero_() 133 | 134 | def get_init(self, bsz, i): 135 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous() 136 | 137 | def forward(self, input, input_lengths=None): 138 | bsz, slen = input.size(0), input.size(1) 139 | output = input 140 | outputs = [] 141 | if input_lengths is not None: 142 | lens = input_lengths.data.cpu().numpy() 143 | for i in range(self.nlayers): 144 | hidden = self.get_init(bsz, i) 145 | output = self.dropout(output) 146 | if input_lengths is not None: 147 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 148 | 149 | output, hidden = self.rnns[i](output, hidden) 150 | 151 | 152 | if input_lengths is not None: 153 | output, _ = rnn.pad_packed_sequence(output, batch_first=True) 154 | if output.size(1) < slen: # used for parallel 155 | padding = Variable(output.data.new(1, 1, 1).zero_()) 156 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 157 | if self.return_last: 158 | 
outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 159 | else: 160 | outputs.append(output) 161 | if self.concat: 162 | return torch.cat(outputs, dim=2) 163 | return outputs[-1] 164 | 165 | 166 | 167 | 168 | class EncoderLSTM(nn.Module): 169 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 170 | super().__init__() 171 | self.rnns = [] 172 | for i in range(nlayers): 173 | if i == 0: 174 | input_size_ = input_size 175 | output_size_ = num_units 176 | else: 177 | input_size_ = num_units if not bidir else num_units * 2 178 | output_size_ = num_units 179 | self.rnns.append(nn.LSTM(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 180 | self.rnns = nn.ModuleList(self.rnns) 181 | 182 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 183 | self.init_c = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 184 | 185 | self.dropout = LockedDropout(dropout) 186 | self.concat = concat 187 | self.nlayers = nlayers 188 | self.return_last = return_last 189 | 190 | # self.reset_parameters() 191 | 192 | def reset_parameters(self): 193 | for rnn in self.rnns: 194 | for name, p in rnn.named_parameters(): 195 | if 'weight' in name: 196 | p.data.normal_(std=0.1) 197 | else: 198 | p.data.zero_() 199 | 200 | def get_init(self, bsz, i): 201 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous(), self.init_c[i].expand(-1, bsz, -1).contiguous() 202 | 203 | def forward(self, input, input_lengths=None): 204 | bsz, slen = input.size(0), input.size(1) 205 | output = input 206 | outputs = [] 207 | if input_lengths is not None: 208 | lens = input_lengths.data.cpu().numpy() 209 | 210 | for i in range(self.nlayers): 211 | hidden, c = self.get_init(bsz, i) 212 | 213 | output = self.dropout(output) 214 | if input_lengths is not None: 215 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 216 | 217 | output, hidden = self.rnns[i](output, (hidden, c)) 218 | 219 | 220 | if input_lengths is not None: 221 | output, _ = rnn.pad_packed_sequence(output, batch_first=True) 222 | if output.size(1) < slen: # used for parallel 223 | padding = Variable(output.data.new(1, 1, 1).zero_()) 224 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 225 | if self.return_last: 226 | outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 227 | else: 228 | outputs.append(output) 229 | if self.concat: 230 | return torch.cat(outputs, dim=2) 231 | return outputs[-1] 232 | 233 | class BiAttention(nn.Module): 234 | def __init__(self, input_size, dropout): 235 | super().__init__() 236 | self.dropout = LockedDropout(dropout) 237 | self.input_linear = nn.Linear(input_size, 1, bias=False) 238 | self.memory_linear = nn.Linear(input_size, 1, bias=False) 239 | 240 | self.dot_scale = nn.Parameter(torch.Tensor(input_size).uniform_(1.0 / (input_size ** 0.5))) 241 | 242 | def forward(self, input, memory, mask): 243 | bsz, input_len, memory_len = input.size(0), input.size(1), memory.size(1) 244 | 245 | input = self.dropout(input) 246 | memory = self.dropout(memory) 247 | 248 | input_dot = self.input_linear(input) 249 | memory_dot = self.memory_linear(memory).view(bsz, 1, memory_len) 250 | cross_dot = torch.bmm(input * self.dot_scale, memory.permute(0, 2, 1).contiguous()) 251 | att = input_dot + memory_dot + cross_dot 252 | att = att - 1e30 * (1 - mask[:,None]) 
253 | 254 | weight_one = F.softmax(att, dim=-1) 255 | output_one = torch.bmm(weight_one, memory) 256 | weight_two = F.softmax(att.max(dim=-1)[0], dim=-1).view(bsz, 1, input_len) 257 | output_two = torch.bmm(weight_two, input) 258 | 259 | return torch.cat([input, output_one, input*output_one, output_two*output_one], dim=-1) 260 | -------------------------------------------------------------------------------- /code/models/CNN3.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as autograd 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.autograd import Variable 7 | 8 | class CNN3(nn.Module): 9 | def __init__(self, config): 10 | super(CNN3, self).__init__() 11 | self.config = config 12 | self.word_emb = nn.Embedding(config.data_word_vec.shape[0], config.data_word_vec.shape[1]) 13 | self.word_emb.weight.data.copy_(torch.from_numpy(config.data_word_vec)) 14 | self.word_emb.weight.requires_grad = False 15 | 16 | 17 | # self.char_emb = nn.Embedding(config.data_char_vec.shape[0], config.data_char_vec.shape[1]) 18 | # self.char_emb.weight.data.copy_(torch.from_numpy(config.data_char_vec)) 19 | # char_dim = config.data_char_vec.shape[1] 20 | # char_hidden = 100 21 | # self.char_cnn = nn.Conv1d(char_dim, char_hidden, 5) 22 | 23 | self.coref_embed = nn.Embedding(config.max_length, config.coref_size, padding_idx=0) 24 | self.ner_emb = nn.Embedding(7, config.entity_type_size, padding_idx=0) 25 | 26 | input_size = config.data_word_vec.shape[1] + config.coref_size + config.entity_type_size #+ char_hidden 27 | 28 | self.out_channels = 200 29 | self.in_channels = input_size 30 | 31 | self.kernel_size = 3 32 | self.stride = 1 33 | self.padding = int((self.kernel_size - 1) / 2) 34 | 35 | self.cnn_1 = nn.Conv1d(self.in_channels, self.out_channels, self.kernel_size, self.stride, self.padding) 36 | self.cnn_2 = nn.Conv1d(self.out_channels, self.out_channels, self.kernel_size, self.stride, self.padding) 37 | self.cnn_3 = nn.Conv1d(self.out_channels, self.out_channels, self.kernel_size, self.stride, self.padding) 38 | self.max_pooling = nn.MaxPool1d(self.kernel_size, stride=self.stride, padding=self.padding) 39 | self.relu = nn.ReLU() 40 | 41 | self.dropout = nn.Dropout(config.cnn_drop_prob) 42 | 43 | self.bili = torch.nn.Bilinear(self.out_channels+config.dis_size, self.out_channels+config.dis_size, config.relation_num) 44 | self.dis_embed = nn.Embedding(20, config.dis_size, padding_idx=10) 45 | 46 | 47 | def forward(self, context_idxs, pos, context_ner, context_char_idxs, context_lens, h_mapping, t_mapping, relation_mask, dis_h_2_t, dis_t_2_h): 48 | # para_size, char_size, bsz = context_idxs.size(1), context_char_idxs.size(2), context_idxs.size(0) 49 | # context_ch = self.char_emb(context_char_idxs.contiguous().view(-1, char_size)).view(bsz * para_size, char_size, -1) 50 | # context_ch = self.char_cnn(context_ch.permute(0, 2, 1).contiguous()).max(dim=-1)[0].view(bsz, para_size, -1) 51 | 52 | sent = torch.cat([self.word_emb(context_idxs), self.coref_embed(pos), self.ner_emb(context_ner)], dim=-1) 53 | 54 | sent = sent.permute(0, 2, 1) 55 | 56 | # batch * embedding_size * max_len 57 | x = self.cnn_1(sent) 58 | x = self.max_pooling(x) 59 | x = self.relu(x) 60 | x = self.dropout(x) 61 | 62 | x = self.cnn_2(x) 63 | x = self.max_pooling(x) 64 | x = self.relu(x) 65 | x = self.dropout(x) 66 | 67 | x = self.cnn_3(x) 68 | x = self.max_pooling(x) 69 | x = self.relu(x) 70 | x = 
self.dropout(x) 71 | 72 | context_output = x.permute(0, 2, 1) 73 | start_re_output = torch.matmul(h_mapping, context_output) 74 | end_re_output = torch.matmul(t_mapping, context_output) 75 | 76 | s_rep = torch.cat([start_re_output, self.dis_embed(dis_h_2_t)], dim=-1) 77 | t_rep = torch.cat([end_re_output, self.dis_embed(dis_t_2_h)], dim=-1) 78 | 79 | predict_re = self.bili(s_rep, t_rep) 80 | 81 | return predict_re 82 | -------------------------------------------------------------------------------- /code/models/ContextAware.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as autograd 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.autograd import Variable 7 | from torch import nn 8 | import numpy as np 9 | import math 10 | from torch.nn import init 11 | from torch.nn.utils import rnn 12 | 13 | 14 | class ContextAware(nn.Module): 15 | def __init__(self, config): 16 | super(ContextAware, self).__init__() 17 | self.config = config 18 | self.word_emb = nn.Embedding(config.data_word_vec.shape[0], config.data_word_vec.shape[1]) 19 | self.word_emb.weight.data.copy_(torch.from_numpy(config.data_word_vec)) 20 | self.word_emb.weight.requires_grad = False 21 | 22 | self.ner_emb = nn.Embedding(7, config.entity_type_size, padding_idx=0) 23 | 24 | self.coref_embed = nn.Embedding(config.max_length, config.coref_size, padding_idx=0) 25 | 26 | # self.char_emb = nn.Embedding(config.data_char_vec.shape[0], config.data_char_vec.shape[1]) 27 | # self.char_emb.weight.data.copy_(torch.from_numpy(config.data_char_vec)) 28 | # char_dim = config.data_char_vec.shape[1] 29 | # char_hidden = 100 30 | # self.char_cnn = nn.Conv1d(char_dim, char_hidden, 5) 31 | 32 | hidden_size = 128 33 | input_size = config.data_word_vec.shape[1] + config.coref_size + config.entity_type_size #+ char_hidden 34 | 35 | 36 | self.rnn = EncoderLSTM(input_size, hidden_size, 1, True, True, 1 - config.keep_prob, False) 37 | 38 | 39 | self.linear_re = nn.Linear(hidden_size * 2, hidden_size) 40 | self.bili = torch.nn.Bilinear(hidden_size, hidden_size, hidden_size) 41 | 42 | self.self_att = SelfAttention(hidden_size, 1.0) 43 | 44 | self.bili = torch.nn.Bilinear(hidden_size+config.dis_size, hidden_size+config.dis_size, hidden_size) 45 | self.dis_embed = nn.Embedding(20, config.dis_size, padding_idx=10) 46 | 47 | self.linear_output = nn.Linear(hidden_size * 2, config.relation_num) 48 | 49 | 50 | def forward(self, context_idxs, pos, context_ner, context_char_idxs, context_lens, h_mapping, t_mapping, relation_mask, dis_h_2_t, dis_t_2_h): 51 | # para_size, char_size, bsz = context_idxs.size(1), context_char_idxs.size(2), context_idxs.size(0) 52 | # context_ch = self.char_emb(context_char_idxs.contiguous().view(-1, char_size)).view(bsz * para_size, char_size, -1) 53 | # context_ch = self.char_cnn(context_ch.permute(0, 2, 1).contiguous()).max(dim=-1)[0].view(bsz, para_size, -1) 54 | 55 | sent = torch.cat([self.word_emb(context_idxs), self.coref_embed(pos), self.ner_emb(context_ner)], dim=-1) 56 | context_output = self.rnn(sent, context_lens) 57 | 58 | 59 | context_output = torch.relu(self.linear_re(context_output)) 60 | 61 | start_re_output = torch.matmul(h_mapping, context_output) 62 | end_re_output = torch.matmul(t_mapping, context_output) 63 | 64 | s_rep = torch.cat([start_re_output, self.dis_embed(dis_h_2_t)], dim=-1) 65 | t_rep = torch.cat([end_re_output, self.dis_embed(dis_t_2_h)], dim=-1) 66 | 67 | 68 | re_rep 
= self.bili(s_rep, t_rep) 69 | re_rep = self.self_att(re_rep, re_rep, relation_mask) 70 | 71 | 72 | return self.linear_output(re_rep) 73 | 74 | 75 | class LockedDropout(nn.Module): 76 | def __init__(self, dropout): 77 | super().__init__() 78 | self.dropout = dropout 79 | 80 | def forward(self, x): 81 | dropout = self.dropout 82 | if not self.training: 83 | return x 84 | m = x.data.new(x.size(0), 1, x.size(2)).bernoulli_(1 - dropout) 85 | mask = Variable(m.div_(1 - dropout), requires_grad=False) 86 | mask = mask.expand_as(x) 87 | return mask * x 88 | 89 | class EncoderRNN(nn.Module): 90 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 91 | super().__init__() 92 | self.rnns = [] 93 | for i in range(nlayers): 94 | if i == 0: 95 | input_size_ = input_size 96 | output_size_ = num_units 97 | else: 98 | input_size_ = num_units if not bidir else num_units * 2 99 | output_size_ = num_units 100 | self.rnns.append(nn.GRU(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 101 | self.rnns = nn.ModuleList(self.rnns) 102 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 103 | self.dropout = LockedDropout(dropout) 104 | self.concat = concat 105 | self.nlayers = nlayers 106 | self.return_last = return_last 107 | 108 | # self.reset_parameters() 109 | 110 | def reset_parameters(self): 111 | for rnn in self.rnns: 112 | for name, p in rnn.named_parameters(): 113 | if 'weight' in name: 114 | p.data.normal_(std=0.1) 115 | else: 116 | p.data.zero_() 117 | 118 | def get_init(self, bsz, i): 119 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous() 120 | 121 | def forward(self, input, input_lengths=None): 122 | bsz, slen = input.size(0), input.size(1) 123 | output = input 124 | outputs = [] 125 | if input_lengths is not None: 126 | lens = input_lengths.data.cpu().numpy() 127 | for i in range(self.nlayers): 128 | hidden = self.get_init(bsz, i) 129 | output = self.dropout(output) 130 | if input_lengths is not None: 131 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 132 | 133 | output, hidden = self.rnns[i](output, hidden) 134 | 135 | 136 | if input_lengths is not None: 137 | output, _ = rnn.pad_packed_sequence(output, batch_first=True) 138 | if output.size(1) < slen: # used for parallel 139 | padding = Variable(output.data.new(1, 1, 1).zero_()) 140 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 141 | if self.return_last: 142 | outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 143 | else: 144 | outputs.append(output) 145 | if self.concat: 146 | return torch.cat(outputs, dim=2) 147 | return outputs[-1] 148 | 149 | 150 | 151 | class EncoderLSTM(nn.Module): 152 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 153 | super().__init__() 154 | self.rnns = [] 155 | for i in range(nlayers): 156 | if i == 0: 157 | input_size_ = input_size 158 | output_size_ = num_units 159 | else: 160 | input_size_ = num_units if not bidir else num_units * 2 161 | output_size_ = num_units 162 | self.rnns.append(nn.LSTM(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 163 | self.rnns = nn.ModuleList(self.rnns) 164 | 165 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 166 | self.init_c = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, 
num_units).zero_()) for _ in range(nlayers)]) 167 | 168 | self.dropout = LockedDropout(dropout) 169 | self.concat = concat 170 | self.nlayers = nlayers 171 | self.return_last = return_last 172 | 173 | # self.reset_parameters() 174 | 175 | def reset_parameters(self): 176 | for rnn in self.rnns: 177 | for name, p in rnn.named_parameters(): 178 | if 'weight' in name: 179 | p.data.normal_(std=0.1) 180 | else: 181 | p.data.zero_() 182 | 183 | def get_init(self, bsz, i): 184 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous(), self.init_c[i].expand(-1, bsz, -1).contiguous() 185 | 186 | def forward(self, input, input_lengths=None): 187 | bsz, slen = input.size(0), input.size(1) 188 | output = input 189 | outputs = [] 190 | if input_lengths is not None: 191 | lens = input_lengths.data.cpu().numpy() 192 | 193 | for i in range(self.nlayers): 194 | hidden, c = self.get_init(bsz, i) 195 | 196 | output = self.dropout(output) 197 | if input_lengths is not None: 198 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 199 | 200 | output, hidden = self.rnns[i](output, (hidden, c)) 201 | 202 | 203 | if input_lengths is not None: 204 | output, _ = rnn.pad_packed_sequence(output, batch_first=True) 205 | if output.size(1) < slen: # used for parallel 206 | padding = Variable(output.data.new(1, 1, 1).zero_()) 207 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 208 | if self.return_last: 209 | outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 210 | else: 211 | outputs.append(output) 212 | if self.concat: 213 | return torch.cat(outputs, dim=2) 214 | return outputs[-1] 215 | 216 | class SelfAttention(nn.Module): 217 | def __init__(self, input_size, dropout): 218 | super().__init__() 219 | # self.dropout = LockedDropout(dropout) 220 | self.input_linear = nn.Linear(input_size, 1, bias=False) 221 | self.dot_scale = nn.Parameter(torch.Tensor(input_size).uniform_(1.0 / (input_size ** 0.5))) 222 | 223 | def forward(self, input, memory, mask): 224 | 225 | # input = self.dropout(input) 226 | # memory = self.dropout(memory) 227 | 228 | input_dot = self.input_linear(input) 229 | cross_dot = torch.bmm(input * self.dot_scale, memory.permute(0, 2, 1).contiguous()) 230 | att = input_dot + cross_dot 231 | att = att - 1e30 * (1 - mask[:,None]) 232 | 233 | weight_one = F.softmax(att, dim=-1) 234 | output_one = torch.bmm(weight_one, memory) 235 | 236 | return torch.cat([input, output_one], dim=-1) 237 | 238 | 239 | class BiAttention(nn.Module): 240 | def __init__(self, input_size, dropout): 241 | super().__init__() 242 | self.dropout = LockedDropout(dropout) 243 | self.input_linear = nn.Linear(input_size, 1, bias=False) 244 | self.memory_linear = nn.Linear(input_size, 1, bias=False) 245 | 246 | self.dot_scale = nn.Parameter(torch.Tensor(input_size).uniform_(1.0 / (input_size ** 0.5))) 247 | 248 | def forward(self, input, memory, mask): 249 | bsz, input_len, memory_len = input.size(0), input.size(1), memory.size(1) 250 | 251 | input = self.dropout(input) 252 | memory = self.dropout(memory) 253 | 254 | input_dot = self.input_linear(input) 255 | memory_dot = self.memory_linear(memory).view(bsz, 1, memory_len) 256 | cross_dot = torch.bmm(input * self.dot_scale, memory.permute(0, 2, 1).contiguous()) 257 | att = input_dot + memory_dot + cross_dot 258 | att = att - 1e30 * (1 - mask[:,None]) 259 | 260 | weight_one = F.softmax(att, dim=-1) 261 | output_one = torch.bmm(weight_one, memory) 262 | weight_two = F.softmax(att.max(dim=-1)[0], 
dim=-1).view(bsz, 1, input_len) 263 | output_two = torch.bmm(weight_two, input) 264 | 265 | return torch.cat([input, output_one, input*output_one, output_two*output_one], dim=-1) 266 | -------------------------------------------------------------------------------- /code/models/LSTM.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as autograd 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.autograd import Variable 7 | from torch import nn 8 | import numpy as np 9 | import math 10 | from torch.nn import init 11 | from torch.nn.utils import rnn 12 | 13 | 14 | class LSTM(nn.Module): 15 | def __init__(self, config): 16 | super(LSTM, self).__init__() 17 | self.config = config 18 | 19 | word_vec_size = config.data_word_vec.shape[0] 20 | self.word_emb = nn.Embedding(word_vec_size, config.data_word_vec.shape[1]) 21 | self.word_emb.weight.data.copy_(torch.from_numpy(config.data_word_vec)) 22 | self.word_emb.weight.requires_grad = False 23 | 24 | # self.char_emb = nn.Embedding(config.data_char_vec.shape[0], config.data_char_vec.shape[1]) 25 | # self.char_emb.weight.data.copy_(torch.from_numpy(config.data_char_vec)) 26 | # char_dim = config.data_char_vec.shape[1] 27 | # char_hidden = 100 28 | # self.char_cnn = nn.Conv1d(char_dim, char_hidden, 5) 29 | self.coref_embed = nn.Embedding(config.max_length, config.coref_size, padding_idx=0) 30 | self.ner_emb = nn.Embedding(7, config.entity_type_size, padding_idx=0) 31 | 32 | input_size = config.data_word_vec.shape[1] + config.coref_size + config.entity_type_size #+ char_hidden 33 | hidden_size = 128 34 | 35 | # self.rnn = EncoderLSTM(input_size, hidden_size, 1, True, True, 1 - config.keep_prob, False) 36 | # self.linear_re = nn.Linear(hidden_size*2, hidden_size) # *4 for 2layer 37 | 38 | self.rnn = EncoderLSTM(input_size, hidden_size, 1, True, False, 1 - config.keep_prob, False) 39 | self.linear_re = nn.Linear(hidden_size, hidden_size) # *4 for 2layer 40 | 41 | self.bili = torch.nn.Bilinear(hidden_size+config.dis_size, hidden_size+config.dis_size, config.relation_num) 42 | 43 | 44 | 45 | self.dis_embed = nn.Embedding(20, config.dis_size, padding_idx=10) 46 | 47 | 48 | 49 | def forward(self, context_idxs, pos, context_ner, context_char_idxs, context_lens, h_mapping, t_mapping, 50 | relation_mask, dis_h_2_t, dis_t_2_h): 51 | # para_size, char_size, bsz = context_idxs.size(1), context_char_idxs.size(2), context_idxs.size(0) 52 | # context_ch = self.char_emb(context_char_idxs.contiguous().view(-1, char_size)).view(bsz * para_size, char_size, -1) 53 | # context_ch = self.char_cnn(context_ch.permute(0, 2, 1).contiguous()).max(dim=-1)[0].view(bsz, para_size, -1) 54 | 55 | sent = torch.cat([self.word_emb(context_idxs) , self.coref_embed(pos), self.ner_emb(context_ner)], dim=-1) 56 | # sent = torch.cat([self.word_emb(context_idxs), context_ch], dim=-1) 57 | 58 | # context_mask = (context_idxs > 0).float() 59 | context_output = self.rnn(sent, context_lens) 60 | 61 | context_output = torch.relu(self.linear_re(context_output)) 62 | 63 | 64 | start_re_output = torch.matmul(h_mapping, context_output) 65 | end_re_output = torch.matmul(t_mapping, context_output) 66 | # predict_re = self.bili(start_re_output, end_re_output) 67 | 68 | s_rep = torch.cat([start_re_output, self.dis_embed(dis_h_2_t)], dim=-1) 69 | t_rep = torch.cat([end_re_output, self.dis_embed(dis_t_2_h)], dim=-1) 70 | predict_re = self.bili(s_rep, t_rep) 71 | 72 | 73 | 74 
| return predict_re 75 | 76 | 77 | class LockedDropout(nn.Module): 78 | def __init__(self, dropout): 79 | super().__init__() 80 | self.dropout = dropout 81 | 82 | def forward(self, x): 83 | dropout = self.dropout 84 | if not self.training: 85 | return x 86 | m = x.data.new(x.size(0), 1, x.size(2)).bernoulli_(1 - dropout) 87 | mask = Variable(m.div_(1 - dropout), requires_grad=False) 88 | mask = mask.expand_as(x) 89 | return mask * x 90 | 91 | class EncoderRNN(nn.Module): 92 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 93 | super().__init__() 94 | self.rnns = [] 95 | for i in range(nlayers): 96 | if i == 0: 97 | input_size_ = input_size 98 | output_size_ = num_units 99 | else: 100 | input_size_ = num_units if not bidir else num_units * 2 101 | output_size_ = num_units 102 | self.rnns.append(nn.GRU(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 103 | self.rnns = nn.ModuleList(self.rnns) 104 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 105 | self.dropout = LockedDropout(dropout) 106 | self.concat = concat 107 | self.nlayers = nlayers 108 | self.return_last = return_last 109 | 110 | # self.reset_parameters() 111 | 112 | def reset_parameters(self): 113 | for rnn in self.rnns: 114 | for name, p in rnn.named_parameters(): 115 | if 'weight' in name: 116 | p.data.normal_(std=0.1) 117 | else: 118 | p.data.zero_() 119 | 120 | def get_init(self, bsz, i): 121 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous() 122 | 123 | def forward(self, input, input_lengths=None): 124 | bsz, slen = input.size(0), input.size(1) 125 | output = input 126 | outputs = [] 127 | if input_lengths is not None: 128 | lens = input_lengths.data.cpu().numpy() 129 | for i in range(self.nlayers): 130 | hidden = self.get_init(bsz, i) 131 | output = self.dropout(output) 132 | if input_lengths is not None: 133 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 134 | 135 | output, hidden = self.rnns[i](output, hidden) 136 | 137 | 138 | if input_lengths is not None: 139 | output, _ = rnn.pad_packed_sequence(output, batch_first=True) 140 | if output.size(1) < slen: # used for parallel 141 | padding = Variable(output.data.new(1, 1, 1).zero_()) 142 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 143 | if self.return_last: 144 | outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 145 | else: 146 | outputs.append(output) 147 | if self.concat: 148 | return torch.cat(outputs, dim=2) 149 | return outputs[-1] 150 | 151 | 152 | 153 | 154 | class EncoderLSTM(nn.Module): 155 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 156 | super().__init__() 157 | self.rnns = [] 158 | for i in range(nlayers): 159 | if i == 0: 160 | input_size_ = input_size 161 | output_size_ = num_units 162 | else: 163 | input_size_ = num_units if not bidir else num_units * 2 164 | output_size_ = num_units 165 | self.rnns.append(nn.LSTM(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 166 | self.rnns = nn.ModuleList(self.rnns) 167 | 168 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 169 | self.init_c = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 170 | 171 | self.dropout = LockedDropout(dropout) 172 | 
self.concat = concat 173 | self.nlayers = nlayers 174 | self.return_last = return_last 175 | 176 | # self.reset_parameters() 177 | 178 | def reset_parameters(self): 179 | for rnn in self.rnns: 180 | for name, p in rnn.named_parameters(): 181 | if 'weight' in name: 182 | p.data.normal_(std=0.1) 183 | else: 184 | p.data.zero_() 185 | 186 | def get_init(self, bsz, i): 187 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous(), self.init_c[i].expand(-1, bsz, -1).contiguous() 188 | 189 | def forward(self, input, input_lengths=None): 190 | bsz, slen = input.size(0), input.size(1) 191 | output = input 192 | outputs = [] 193 | if input_lengths is not None: 194 | lens = input_lengths.data.cpu().numpy() 195 | 196 | for i in range(self.nlayers): 197 | hidden, c = self.get_init(bsz, i) 198 | 199 | output = self.dropout(output) 200 | if input_lengths is not None: 201 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 202 | 203 | output, hidden = self.rnns[i](output, (hidden, c)) 204 | 205 | 206 | if input_lengths is not None: 207 | output, _ = rnn.pad_packed_sequence(output, batch_first=True) 208 | if output.size(1) < slen: # used for parallel 209 | padding = Variable(output.data.new(1, 1, 1).zero_()) 210 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 211 | if self.return_last: 212 | outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 213 | else: 214 | outputs.append(output) 215 | if self.concat: 216 | return torch.cat(outputs, dim=2) 217 | return outputs[-1] 218 | 219 | class BiAttention(nn.Module): 220 | def __init__(self, input_size, dropout): 221 | super().__init__() 222 | self.dropout = LockedDropout(dropout) 223 | self.input_linear = nn.Linear(input_size, 1, bias=False) 224 | self.memory_linear = nn.Linear(input_size, 1, bias=False) 225 | 226 | self.dot_scale = nn.Parameter(torch.Tensor(input_size).uniform_(1.0 / (input_size ** 0.5))) 227 | 228 | def forward(self, input, memory, mask): 229 | bsz, input_len, memory_len = input.size(0), input.size(1), memory.size(1) 230 | 231 | input = self.dropout(input) 232 | memory = self.dropout(memory) 233 | 234 | input_dot = self.input_linear(input) 235 | memory_dot = self.memory_linear(memory).view(bsz, 1, memory_len) 236 | cross_dot = torch.bmm(input * self.dot_scale, memory.permute(0, 2, 1).contiguous()) 237 | att = input_dot + memory_dot + cross_dot 238 | att = att - 1e30 * (1 - mask[:,None]) 239 | 240 | weight_one = F.softmax(att, dim=-1) 241 | output_one = torch.bmm(weight_one, memory) 242 | weight_two = F.softmax(att.max(dim=-1)[0], dim=-1).view(bsz, 1, input_len) 243 | output_two = torch.bmm(weight_two, input) 244 | 245 | return torch.cat([input, output_one, input*output_one, output_two*output_one], dim=-1) 246 | -------------------------------------------------------------------------------- /code/models/LSTM_SP.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as autograd 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.autograd import Variable 7 | from torch import nn 8 | import numpy as np 9 | import math 10 | from torch.nn import init 11 | from torch.nn.utils import rnn 12 | 13 | 14 | class LSTM_SP(nn.Module): 15 | def __init__(self, config): 16 | super(LSTM_SP, self).__init__() 17 | self.config = config 18 | 19 | word_vec_size = config.data_word_vec.shape[0] 20 | self.word_emb = nn.Embedding(word_vec_size, 
config.data_word_vec.shape[1]) 21 | self.word_emb.weight.data.copy_(torch.from_numpy(config.data_word_vec)) 22 | self.word_emb.weight.requires_grad = False 23 | 24 | # char_size = config.data_char_vec.shape[0] 25 | # self.char_emb = nn.Embedding(char_size, config.data_char_vec.shape[1]) 26 | # self.char_emb.weight.data.copy_(torch.from_numpy(config.data_char_vec)) 27 | # char_dim = config.data_char_vec.shape[1] 28 | # char_hidden = 100 29 | # self.char_cnn = nn.Conv1d(char_dim, char_hidden, 5) 30 | 31 | self.coref_embed = nn.Embedding(config.max_length, config.coref_size, padding_idx=0) 32 | self.ner_emb = nn.Embedding(7, config.entity_type_size, padding_idx=0) 33 | 34 | 35 | hidden_size = 128 36 | input_size = config.data_word_vec.shape[1] + config.coref_size + config.entity_type_size# + char_hidden 37 | 38 | self.rnn = EncoderLSTM(input_size, hidden_size, 1, True, True, 1 - config.keep_prob, False) 39 | 40 | self.relation_embed = nn.Embedding(config.relation_num, hidden_size, padding_idx=0) 41 | 42 | 43 | self.linear_t = nn.Linear(hidden_size*2, hidden_size) # *4 for 2layer 44 | # self.bili = torch.nn.Bilinear(hidden_size, hidden_size, hidden_size) 45 | 46 | self.linear_re = nn.Linear(hidden_size*3, 1) 47 | 48 | 49 | def forward(self, context_idxs, pos, context_ner, context_char_idxs, context_lens, sent_h_mapping, sent_t_mapping, relation_label): 50 | # para_size, char_size, bsz = context_idxs.size(1), context_char_idxs.size(2), context_idxs.size(0) 51 | # context_ch = self.char_emb(context_char_idxs.contiguous().view(-1, char_size)).view(bsz * para_size, char_size, -1) 52 | # context_ch = self.char_cnn(context_ch.permute(0, 2, 1).contiguous()).max(dim=-1)[0].view(bsz, para_size, -1) 53 | 54 | sent = torch.cat([self.word_emb(context_idxs) , self.coref_embed(pos), self.ner_emb(context_ner)], dim=-1) 55 | 56 | el = sent_h_mapping.size(1) 57 | re_embed = (self.relation_embed(relation_label).unsqueeze(1)).expand(-1, el, -1) 58 | 59 | context_output = self.rnn(sent, context_lens) 60 | context_output = torch.relu(self.linear_t(context_output)) 61 | start_re_output = torch.matmul(sent_h_mapping, context_output) 62 | end_re_output = torch.matmul(sent_t_mapping, context_output) 63 | 64 | sent_output = torch.cat([start_re_output, end_re_output, re_embed], dim=-1) 65 | predict_sent = self.linear_re(sent_output).squeeze(2) 66 | 67 | # predict_sent = torch.sum(self.bili(start_re_output, end_re_output)*re_embed, dim=-1) 68 | 69 | return predict_sent 70 | 71 | 72 | class LockedDropout(nn.Module): 73 | def __init__(self, dropout): 74 | super().__init__() 75 | self.dropout = dropout 76 | 77 | def forward(self, x): 78 | dropout = self.dropout 79 | if not self.training: 80 | return x 81 | m = x.data.new(x.size(0), 1, x.size(2)).bernoulli_(1 - dropout) 82 | mask = Variable(m.div_(1 - dropout), requires_grad=False) 83 | mask = mask.expand_as(x) 84 | return mask * x 85 | 86 | class EncoderRNN(nn.Module): 87 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 88 | super().__init__() 89 | self.rnns = [] 90 | for i in range(nlayers): 91 | if i == 0: 92 | input_size_ = input_size 93 | output_size_ = num_units 94 | else: 95 | input_size_ = num_units if not bidir else num_units * 2 96 | output_size_ = num_units 97 | self.rnns.append(nn.GRU(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 98 | self.rnns = nn.ModuleList(self.rnns) 99 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in 
range(nlayers)]) 100 | self.dropout = LockedDropout(dropout) 101 | self.concat = concat 102 | self.nlayers = nlayers 103 | self.return_last = return_last 104 | 105 | # self.reset_parameters() 106 | 107 | def reset_parameters(self): 108 | for rnn in self.rnns: 109 | for name, p in rnn.named_parameters(): 110 | if 'weight' in name: 111 | p.data.normal_(std=0.1) 112 | else: 113 | p.data.zero_() 114 | 115 | def get_init(self, bsz, i): 116 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous() 117 | 118 | def forward(self, input, input_lengths=None): 119 | bsz, slen = input.size(0), input.size(1) 120 | output = input 121 | outputs = [] 122 | if input_lengths is not None: 123 | lens = input_lengths.data.cpu().numpy() 124 | for i in range(self.nlayers): 125 | hidden = self.get_init(bsz, i) 126 | output = self.dropout(output) 127 | if input_lengths is not None: 128 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 129 | 130 | output, hidden = self.rnns[i](output, hidden) 131 | 132 | 133 | if input_lengths is not None: 134 | output, _ = rnn.pad_packed_sequence(output, batch_first=True) 135 | if output.size(1) < slen: # used for parallel 136 | padding = Variable(output.data.new(1, 1, 1).zero_()) 137 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 138 | if self.return_last: 139 | outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 140 | else: 141 | outputs.append(output) 142 | if self.concat: 143 | return torch.cat(outputs, dim=2) 144 | return outputs[-1] 145 | 146 | 147 | 148 | 149 | class EncoderLSTM(nn.Module): 150 | def __init__(self, input_size, num_units, nlayers, concat, bidir, dropout, return_last): 151 | super().__init__() 152 | self.rnns = [] 153 | for i in range(nlayers): 154 | if i == 0: 155 | input_size_ = input_size 156 | output_size_ = num_units 157 | else: 158 | input_size_ = num_units if not bidir else num_units * 2 159 | output_size_ = num_units 160 | self.rnns.append(nn.LSTM(input_size_, output_size_, 1, bidirectional=bidir, batch_first=True)) 161 | self.rnns = nn.ModuleList(self.rnns) 162 | 163 | self.init_hidden = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 164 | self.init_c = nn.ParameterList([nn.Parameter(torch.Tensor(2 if bidir else 1, 1, num_units).zero_()) for _ in range(nlayers)]) 165 | 166 | self.dropout = LockedDropout(dropout) 167 | self.concat = concat 168 | self.nlayers = nlayers 169 | self.return_last = return_last 170 | 171 | # self.reset_parameters() 172 | 173 | def reset_parameters(self): 174 | for rnn in self.rnns: 175 | for name, p in rnn.named_parameters(): 176 | if 'weight' in name: 177 | p.data.normal_(std=0.1) 178 | else: 179 | p.data.zero_() 180 | 181 | def get_init(self, bsz, i): 182 | return self.init_hidden[i].expand(-1, bsz, -1).contiguous(), self.init_c[i].expand(-1, bsz, -1).contiguous() 183 | 184 | def forward(self, input, input_lengths=None): 185 | bsz, slen = input.size(0), input.size(1) 186 | output = input 187 | outputs = [] 188 | if input_lengths is not None: 189 | lens = input_lengths.data.cpu().numpy() 190 | 191 | for i in range(self.nlayers): 192 | hidden, c = self.get_init(bsz, i) 193 | 194 | output = self.dropout(output) 195 | if input_lengths is not None: 196 | output = rnn.pack_padded_sequence(output, lens, batch_first=True) 197 | 198 | output, hidden = self.rnns[i](output, (hidden, c)) 199 | 200 | 201 | if input_lengths is not None: 202 | output, _ = 
rnn.pad_packed_sequence(output, batch_first=True) 203 | if output.size(1) < slen: # used for parallel 204 | padding = Variable(output.data.new(1, 1, 1).zero_()) 205 | output = torch.cat([output, padding.expand(output.size(0), slen-output.size(1), output.size(2))], dim=1) 206 | if self.return_last: 207 | outputs.append(hidden.permute(1, 0, 2).contiguous().view(bsz, -1)) 208 | else: 209 | outputs.append(output) 210 | if self.concat: 211 | return torch.cat(outputs, dim=2) 212 | return outputs[-1] 213 | 214 | class BiAttention(nn.Module): 215 | def __init__(self, input_size, dropout): 216 | super().__init__() 217 | self.dropout = LockedDropout(dropout) 218 | self.input_linear = nn.Linear(input_size, 1, bias=False) 219 | self.memory_linear = nn.Linear(input_size, 1, bias=False) 220 | 221 | self.dot_scale = nn.Parameter(torch.Tensor(input_size).uniform_(1.0 / (input_size ** 0.5))) 222 | 223 | def forward(self, input, memory, mask): 224 | bsz, input_len, memory_len = input.size(0), input.size(1), memory.size(1) 225 | 226 | input = self.dropout(input) 227 | memory = self.dropout(memory) 228 | 229 | input_dot = self.input_linear(input) 230 | memory_dot = self.memory_linear(memory).view(bsz, 1, memory_len) 231 | cross_dot = torch.bmm(input * self.dot_scale, memory.permute(0, 2, 1).contiguous()) 232 | att = input_dot + memory_dot + cross_dot 233 | att = att - 1e30 * (1 - mask[:,None]) 234 | 235 | weight_one = F.softmax(att, dim=-1) 236 | output_one = torch.bmm(weight_one, memory) 237 | weight_two = F.softmax(att.max(dim=-1)[0], dim=-1).view(bsz, 1, input_len) 238 | output_two = torch.bmm(weight_two, input) 239 | 240 | return torch.cat([input, output_one, input*output_one, output_two*output_one], dim=-1) 241 | -------------------------------------------------------------------------------- /code/models/__init__.py: -------------------------------------------------------------------------------- 1 | from .CNN3 import CNN3 2 | from .LSTM import LSTM 3 | from .BiLSTM import BiLSTM 4 | from .ContextAware import ContextAware 5 | from .LSTM_SP import LSTM_SP 6 | -------------------------------------------------------------------------------- /code/prepro_data/README.md: -------------------------------------------------------------------------------- 1 | # Metadata 2 | 3 | Metadata for baseline model can be downloaded from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/d/99e1c0805eb64736af95/) or [GoogleDrive](https://drive.google.com/drive/folders/1Ri3LIILKKBi3aBJjUVCOBpGX5PpONHRK). 
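After `gen_data.py` has been run (see `code/README.md`), its preprocessed output is written into this folder as well. Judging from the save calls in `gen_data.py`, and assuming the default `--out_path prepro_data`, the folder then roughly contains:

```
prepro_data/
├── rel2id.json / word2id.json / ner2id.json / char2id.json   (downloaded metadata)
├── id2rel.json                                               (written by gen_data.py)
├── train.json      + train_{word,pos,ner,char}.npy           (distantly supervised training set)
├── dev_train.json  + dev_train_{word,pos,ner,char}.npy       (annotated training set)
├── dev_dev.json    + dev_dev_{word,pos,ner,char}.npy         (dev set)
└── dev_test.json   + dev_test_{word,pos,ner,char}.npy        (test set)
```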
4 | 
--------------------------------------------------------------------------------
/code/requirements.txt:
--------------------------------------------------------------------------------
1 | matplotlib>=3.0.2
2 | nltk>=3.4
3 | tqdm>=4.29.1
4 | torch>=1.0.0
5 | numpy>=1.16.0
6 | scikit_learn>=0.21.2
7 | 
--------------------------------------------------------------------------------
/code/test.py:
--------------------------------------------------------------------------------
1 | import config
2 | import models
3 | import numpy as np
4 | import os
5 | import time
6 | import datetime
7 | import json
8 | from sklearn.metrics import average_precision_score
9 | import sys
10 | import os
11 | import argparse
12 | # import IPython
13 | 
14 | # sys.excepthook = IPython.core.ultratb.FormattedTB(mode='Verbose', color_scheme='Linux', call_pdb=1)
15 | 
16 | 
17 | parser = argparse.ArgumentParser()
18 | parser.add_argument('--model_name', type = str, default = 'LSTM', help = 'name of the model')
19 | parser.add_argument('--save_name', type = str)
20 | 
21 | parser.add_argument('--train_prefix', type = str, default = 'train')
22 | parser.add_argument('--test_prefix', type = str, default = 'dev_dev')
23 | parser.add_argument('--input_theta', type = float, default = -1)
24 | # parser.add_argument('--ignore_input_theta', type = float, default = -1)
25 | 
26 | 
27 | args = parser.parse_args()
28 | model = {
29 |     'CNN3': models.CNN3,
30 |     'LSTM': models.LSTM,
31 |     'BiLSTM': models.BiLSTM,
32 |     'ContextAware': models.ContextAware,
33 |     # 'LSTM_SP': models.LSTM_SP
34 | }
35 | 
36 | con = config.Config(args)
37 | #con.load_train_data()
38 | con.load_test_data()
39 | # con.set_train_model()
40 | con.testall(model[args.model_name], args.save_name, args.input_theta)#, args.ignore_input_theta)
41 | 
--------------------------------------------------------------------------------
/code/test_sp.py:
--------------------------------------------------------------------------------
1 | import config
2 | import models
3 | import numpy as np
4 | import os
5 | import time
6 | import datetime
7 | import json
8 | from sklearn.metrics import average_precision_score
9 | import sys
10 | import os
11 | import argparse
12 | # import IPython
13 | 
14 | # sys.excepthook = IPython.core.ultratb.FormattedTB(mode='Verbose', color_scheme='Linux', call_pdb=1)
15 | 
16 | 
17 | parser = argparse.ArgumentParser()
18 | parser.add_argument('--model_name', type = str, default = 'LSTM_SP', help = 'name of the model')
19 | parser.add_argument('--save_name', type = str)
20 | 
21 | parser.add_argument('--train_prefix', type = str, default = 'train')
22 | parser.add_argument('--test_prefix', type = str, default = 'dev_dev')
23 | parser.add_argument('--input_theta', type = float, default = -1)
24 | parser.add_argument('--output_file', type = str, default = "result.json")
25 | 
26 | 
27 | args = parser.parse_args()
28 | model = {
29 |     'LSTM_SP': models.LSTM_SP
30 | }
31 | 
32 | con = config.EviConfig(args)
33 | #con.load_train_data()
34 | con.load_test_data()
35 | # con.set_train_model()
36 | con.testall(model[args.model_name], args.save_name, args.input_theta)
37 | 
--------------------------------------------------------------------------------
/code/train.py:
--------------------------------------------------------------------------------
1 | import config
2 | import models
3 | import numpy as np
4 | import os
5 | import time
6 | import datetime
7 | import json
8 | from sklearn.metrics import average_precision_score
9 | import sys
10 | import os
11 | import argparse
12 | # import IPython
13 | 
14 | # sys.excepthook = IPython.core.ultratb.FormattedTB(mode='Verbose', color_scheme='Linux', call_pdb=1)
15 | 
16 | 
17 | parser = argparse.ArgumentParser()
18 | parser.add_argument('--model_name', type = str, default = 'BiLSTM', help = 'name of the model')
19 | parser.add_argument('--save_name', type = str)
20 | 
21 | parser.add_argument('--train_prefix', type = str, default = 'dev_train')
22 | parser.add_argument('--test_prefix', type = str, default = 'dev_dev')
23 | 
24 | 
25 | args = parser.parse_args()
26 | model = {
27 |     'CNN3': models.CNN3,
28 |     'LSTM': models.LSTM,
29 |     'BiLSTM': models.BiLSTM,
30 |     'ContextAware': models.ContextAware,
31 | }
32 | 
33 | con = config.Config(args)
34 | con.set_max_epoch(200)
35 | con.load_train_data()
36 | con.load_test_data()
37 | # con.set_train_model()
38 | con.train(model[args.model_name], args.save_name)
39 | 
--------------------------------------------------------------------------------
/code/train_sp.py:
--------------------------------------------------------------------------------
1 | import config
2 | import models
3 | import numpy as np
4 | import os
5 | import time
6 | import datetime
7 | import json
8 | from sklearn.metrics import average_precision_score
9 | import sys
10 | import os
11 | import argparse
12 | # import IPython
13 | 
14 | # sys.excepthook = IPython.core.ultratb.FormattedTB(mode='Verbose', color_scheme='Linux', call_pdb=1)
15 | 
16 | 
17 | parser = argparse.ArgumentParser()
18 | parser.add_argument('--model_name', type = str, default = 'LSTM_SP', help = 'name of the model')
19 | parser.add_argument('--save_name', type = str)
20 | 
21 | parser.add_argument('--train_prefix', type = str, default = 'dev_train')
22 | parser.add_argument('--test_prefix', type = str, default = 'dev_dev')
23 | parser.add_argument('--output_file', type = str, default = "result.json")
24 | 
25 | 
26 | args = parser.parse_args()
27 | model = {
28 |     # 'CNN3': models.CNN3,
29 |     # 'LSTM': models.LSTM,
30 |     # 'BiLSTM': models.BiLSTM,
31 |     # 'ContextAware': models.ContextAware,
32 |     'LSTM_SP': models.LSTM_SP
33 | }
34 | 
35 | con = config.EviConfig(args)
36 | con.set_max_epoch(200)
37 | con.load_train_data()
38 | con.load_test_data()
39 | # con.set_train_model()
40 | con.train(model[args.model_name], args.save_name)
41 | 
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | # Data
2 | 
3 | Data can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw?usp=sharing).
4 | 
5 | The relation information file has also been uploaded.
6 | 
7 | 
8 | ```
9 | Data Format:
10 | {
11 |   'title',
12 |   'sents': [
13 |     [word in sent 0],
14 |     [word in sent 1]
15 |   ],
16 |   'vertexSet': [
17 |     [
18 |       { 'name': mention_name,
19 |         'sent_id': id of the sentence containing the mention,
20 |         'pos': position of the mention within its sentence,
21 |         'type': NER_type },
22 |       { another mention }
23 |     ],
24 |     [ another entity ]
25 |   ],
26 |   'labels': [
27 |     {
28 |       'h': idx of head entity in vertexSet,
29 |       't': idx of tail entity in vertexSet,
30 |       'r': relation,
31 |       'evidence': ids of the evidence sentences
32 |     }
33 |   ]
34 | }
35 | ```
36 | 
37 | Please submit the test set result to Codalab.
38 | 
--------------------------------------------------------------------------------
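For reference, `code/evaluation.py` parses the submitted `result.json` as a flat JSON list of predicted triples, each keyed by `title`, `h_idx`, `t_idx`, and `r`, plus an optional `evidence` list that is only needed for the evidence-extraction score. A minimal sketch of writing such a file follows; every value in it is a placeholder, not real model output:

```
import json

# Each entry identifies a document by its title and an entity pair by their
# indices in that document's vertexSet; 'r' is the predicted relation and
# 'evidence' optionally lists the ids of the supporting sentences.
predictions = [
    {
        "title": "Some Document Title",  # placeholder
        "h_idx": 0,                      # head entity index in vertexSet
        "t_idx": 1,                      # tail entity index in vertexSet
        "r": "P17",                      # placeholder relation id
        "evidence": [0, 2],              # optional supporting sentence ids
    },
]

with open("result.json", "w") as f:
    json.dump(predictions, f)
```

The evaluation script sorts the predictions and keeps only the first copy of each `(title, h_idx, t_idx, r)` tuple, so duplicate predictions are dropped rather than penalized.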