├── CoMPILE_github ├── data.rar ├── ensembling │ ├── blend.py │ ├── compute_auc.py │ ├── compute_rank_metrics.py │ ├── get_ensemble_predictions.sh │ ├── get_kge_ensemble.sh │ └── score_triplets_kge.py ├── experiments │ ├── compile_nell_v4_ind │ │ ├── best_graph_classifier.pth │ │ ├── graph_classifier_chk.pth │ │ ├── log_rank_test_1642565763.9744015.txt │ │ ├── log_train.txt │ │ ├── params.json │ │ └── test_nell_v4_ind_0 │ │ │ └── log_test.txt │ └── compile_nell_v4_ind2 │ │ ├── best_graph_classifier.pth │ │ ├── graph_classifier_chk.pth │ │ ├── log_rank_test_1642603455.548011.txt │ │ ├── log_rank_test_1642647880.451746.txt │ │ ├── log_rank_test_1642679948.7119958.txt │ │ ├── log_rank_test_1642733457.0139189.txt │ │ ├── log_rank_test_1642832066.1408617.txt │ │ ├── log_rank_test_1642841084.211995.txt │ │ ├── log_rank_test_1642850740.4142146.txt │ │ ├── log_rank_test_1642855466.9402952.txt │ │ ├── log_rank_test_1642859913.6085432.txt │ │ ├── log_rank_test_1642908893.5786257.txt │ │ ├── log_train.txt │ │ ├── params.json │ │ └── test_nell_v4_ind_0 │ │ └── log_test.txt ├── kge │ ├── README.md │ ├── dataloader.py │ ├── model.py │ ├── run.py │ └── run.sh ├── managers │ ├── __pycache__ │ │ ├── evaluator.cpython-36.pyc │ │ └── trainer.cpython-36.pyc │ ├── evaluator.py │ └── trainer.py ├── model │ └── dgl │ │ ├── aggregators.py │ │ ├── graph_classifier.py │ │ ├── layers.py │ │ └── rgcn_model.py ├── requirements.txt ├── subgraph_extraction │ ├── __pycache__ │ │ ├── datasets.cpython-36.pyc │ │ └── graph_sampler.cpython-36.pyc │ ├── datasets.py │ └── graph_sampler.py ├── test_auc.py ├── test_ranking.py ├── train.py └── utils │ ├── __pycache__ │ ├── data_utils.cpython-36.pyc │ ├── dgl_utils.cpython-36.pyc │ ├── graph_utils.cpython-36.pyc │ └── initialization_utils.cpython-36.pyc │ ├── clean_data.py │ ├── data_utils.py │ ├── dgl_utils.py │ ├── graph_utils.py │ ├── initialization_utils.py │ └── prepare_meta_data.py ├── CoMPILE_v2 ├── README.md ├── data │ ├── FB15k-237-inductive-v1 │ │ ├── relation2id.json │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── FB15k-237-inductive-v2 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── FB15k-237-inductive-v3 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── FB15k-237-v1_3_hop_new_data.pickle │ ├── FB15k-237-v1_3_hop_total_test_head2.pickle │ ├── FB15k-237-v1_3_hop_total_test_tail2.pickle │ ├── FB15k-237-v1_3_total_train_data.pickle │ ├── FB15k-237-v1_3_total_train_label.pickle │ ├── FB15k-237-v1_3_total_val_data.pickle │ ├── FB15k-237-v1_3_total_val_label.pickle │ ├── nell-inductive-v1 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── nell-inductive-v2 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ └── nell-inductive-v3 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt ├── log │ └── FB15k-237-v1_error_now.log └── train │ ├── __pycache__ │ ├── create_batch_inductive2.cpython-37.pyc │ ├── layers.cpython-37.pyc │ ├── layers2.cpython-37.pyc │ ├── models4.cpython-37.pyc │ ├── preprocess2.cpython-37.pyc │ ├── 
preprocess_inductive.cpython-37.pyc │ └── utils.cpython-37.pyc │ ├── create_batch.pyc │ ├── create_batch2.pyc │ ├── create_batch_inductive2.py │ ├── create_dataset_files.py │ ├── inductive_subgraph8.py │ ├── layers.py │ ├── layers.pyc │ ├── layers2.py │ ├── layers2.pyc │ ├── main_compile.py │ ├── models.pyc │ ├── models4.py │ ├── models4.pyc │ ├── preprocess.pyc │ ├── preprocess2.py │ ├── preprocess2.pyc │ ├── preprocess_inductive.py │ ├── utils.py │ └── utils.pyc ├── LICENSE └── README.md /CoMPILE_github/data.rar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/data.rar -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/blend.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | 5 | import torch 6 | import torch.nn as nn 7 | import torch.optim as optim 8 | 9 | 10 | def read_scores(path): 11 | with open(path) as f: 12 | scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 13 | return scores 14 | 15 | 16 | def get_triplets(path): 17 | with open(path) as f: 18 | triplets = [line.split()[:-1] for line in f.read().split('\n')[:-1]] 19 | return triplets 20 | 21 | 22 | def train(params): 23 | ''' 24 | Train and save a linear layer model. 25 | ''' 26 | ens_model_1_pos_scores_path = os.path.join('../data/{}/{}_valid_predictions.txt'.format(params.dataset, params.ensemble_model_1)) 27 | ens_model_1_neg_scores_path = os.path.join('../data/{}/{}_neg_valid_0_predictions.txt'.format(params.dataset, params.ensemble_model_1)) 28 | ens_model_2_pos_scores_path = os.path.join('../data/{}/{}_valid_predictions.txt'.format(params.dataset, params.ensemble_model_2)) 29 | ens_model_2_neg_scores_path = os.path.join('../data/{}/{}_neg_valid_0_predictions.txt'.format(params.dataset, params.ensemble_model_2)) 30 | 31 | assert get_triplets(ens_model_1_pos_scores_path) == get_triplets(ens_model_2_pos_scores_path) 32 | assert get_triplets(ens_model_1_neg_scores_path) == get_triplets(ens_model_2_neg_scores_path) 33 | 34 | pos_scores = torch.Tensor(list(zip(read_scores(ens_model_1_pos_scores_path), read_scores(ens_model_2_pos_scores_path)))) 35 | neg_scores = torch.Tensor(list(zip(read_scores(ens_model_1_neg_scores_path), read_scores(ens_model_2_neg_scores_path)))) 36 | 37 | # scores = pos_scores + neg_scores 38 | # targets = [1] * len(pos_scores) + [0] * len(neg_scores) 39 | 40 | model = nn.Linear(in_features=2, out_features=1) 41 | criterion = nn.MarginRankingLoss(10, reduction='sum') 42 | optimizer = optim.Adam(model.parameters(), lr=0.1, weight_decay=5e-4) 43 | 44 | for e in range(params.num_epochs): 45 | pos_out = model(pos_scores) 46 | neg_out = model(neg_scores) 47 | 48 | loss = criterion(pos_out, neg_out.view(len(pos_out), -1).mean(dim=1), torch.Tensor([1])) 49 | print('Loss at epoch {} : {}'.format(e, loss)) 50 | optimizer.zero_grad() 51 | loss.backward() 52 | optimizer.step() 53 | 54 | torch.save(model, os.path.join('../experiments', f'{params.ensemble_model_1}_{params.ensemble_model_2}_{params.dataset}_ensemble.pth')) 55 | 56 | 57 | def score_triplets(params): 58 | ''' 59 | Load the saved model and save scores of given set of triplets. 
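Each output line is written as head, relation, tail and score, separated by tabs.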
60 | ''' 61 | print('Loading model..') 62 | model = torch.load(os.path.join('../experiments', f'{params.ensemble_model_1}_{params.ensemble_model_2}_{params.dataset}_ensemble.pth')) 63 | print('Model loaded successfully!') 64 | 65 | ens_model_1_scores_path = os.path.join('../data/{}/{}_{}_predictions.txt'.format(params.dataset, params.ensemble_model_1, params.file_to_score)) 66 | ens_model_2_scores_path = os.path.join('../data/{}/{}_{}_predictions.txt'.format(params.dataset, params.ensemble_model_2, params.file_to_score)) 67 | 68 | scores = torch.Tensor(list(zip(read_scores(ens_model_1_scores_path), read_scores(ens_model_2_scores_path)))) 69 | ens_scores = model(scores) 70 | 71 | ens_model_1_triplets = get_triplets(ens_model_1_scores_path) 72 | ens_model_2_triplets = get_triplets(ens_model_2_scores_path) 73 | 74 | assert ens_model_1_triplets == ens_model_2_triplets 75 | 76 | file_path = os.path.join('../', 'data/{}/{}_with_{}_{}_predictions.txt'.format(params.dataset, params.ensemble_model_1, params.ensemble_model_2, params.file_to_score)) 77 | with open(file_path, "w") as f: 78 | for ([s, r, o], score) in zip(ens_model_1_triplets, ens_scores): 79 | f.write('\t'.join([s, r, o, str(score.item())]) + '\n') 80 | 81 | 82 | if __name__ == '__main__': 83 | parser = argparse.ArgumentParser(description='Model blender script') 84 | 85 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 86 | parser.add_argument('--ensemble_model_1', '-em1', default='grail', type=str) 87 | parser.add_argument('--ensemble_model_2', '-em2', default='TransE', type=str) 88 | parser.add_argument('--do_train', action='store_true') 89 | parser.add_argument("--num_epochs", "-ne", type=int, default=500, 90 | help="Number of training iterations") 91 | parser.add_argument('--do_scoring', action='store_true') 92 | parser.add_argument('--file_to_score', '-f', default='valid', type=str) 93 | 94 | params = parser.parse_args() 95 | 96 | if params.do_train: 97 | train(params) 98 | elif params.do_scoring: 99 | score_triplets(params) 100 | -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/compute_auc.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from sklearn import metrics 3 | 4 | if __name__ == '__main__': 5 | parser = argparse.ArgumentParser(description='Compute AUC from scored positive and negative triplets') 6 | 7 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 8 | parser.add_argument('--model', '-m', default='ens', type=str) 9 | parser.add_argument('--test_file', '-t', default='test', type=str) 10 | 11 | params = parser.parse_args() 12 | 13 | # load pos and neg prediction scores of the test_file of the dataset for the given model. 
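# Each prediction file is assumed to hold one scored triplet per line (head, relation, tail and score, tab-separated), so the score is recovered below as the last whitespace-separated token.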
14 | with open('../data/{}/{}_{}_predictions.txt'.format(params.dataset, params.model, params.test_file)) as f: 15 | pos_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 16 | with open('../data/{}/{}_neg_{}_0_predictions.txt'.format(params.dataset, params.model, params.test_file)) as f: 17 | neg_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 18 | 19 | # compute auc score 20 | scores = pos_scores + neg_scores 21 | labels = [1] * len(pos_scores) + [0] * len(neg_scores) 22 | 23 | auc = metrics.roc_auc_score(labels, scores) 24 | auc_pr = metrics.average_precision_score(labels, scores) 25 | 26 | with open('../data/{}/{}_{}_auc.txt'.format(params.dataset, params.model, params.test_file), "w") as f: 27 | f.write('AUC : {}, AUC_PR : {}\n'.format(auc, auc_pr)) 28 | -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/compute_rank_metrics.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | from scipy.stats import rankdata 6 | 7 | 8 | def get_ranks(scores): 9 | ''' 10 | Given scores of head/tail substituted triplets, return ranks of each triplet. 11 | Assumes a fixed number of negative samples (50) 12 | ''' 13 | ranks = [] 14 | for i in range(len(scores) // 50): 15 | # rank = np.argwhere(np.argsort(scores[50 * i: 50 * (i + 1)])[::-1] == 0) + 1 16 | rank = 50 - rankdata(scores[50 * i: 50 * (i + 1)], method='min')[0] + 1 17 | ranks.append(rank) 18 | return ranks 19 | 20 | 21 | if __name__ == '__main__': 22 | parser = argparse.ArgumentParser(description='Compute ranking metrics from scored head/tail replaced triplets') 23 | 24 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 25 | parser.add_argument('--model', '-m', default='ens', type=str) 26 | 27 | params = parser.parse_args() 28 | 29 | # load head and tail prediction scores of the test file of the dataset for the given model. 30 | with open('../data/{}/{}_ranking_head_predictions.txt'.format(params.dataset, params.model)) as f: 31 | head_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 32 | with open('../data/{}/{}_ranking_tail_predictions.txt'.format(params.dataset, params.model)) as f: 33 | tail_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 34 | 35 | # compute both ranks from the prediction scores 36 | head_ranks = get_ranks(head_scores) 37 | tail_ranks = get_ranks(tail_scores) 38 | 39 | ranks = head_ranks + tail_ranks 40 | 41 | isHit1List = [x for x in ranks if x <= 1] 42 | isHit5List = [x for x in ranks if x <= 5] 43 | isHit10List = [x for x in ranks if x <= 10] 44 | hits_1 = len(isHit1List) / len(ranks) 45 | hits_5 = len(isHit5List) / len(ranks) 46 | hits_10 = len(isHit10List) / len(ranks) 47 | 48 | mrr = np.mean(1 / np.array(ranks)) 49 | 50 | with open('../data/{}/{}_ranking_metrics.txt'.format(params.dataset, params.model), "w") as f: 51 | f.write(f'MRR | Hits@1 | Hits@5 | Hits@10 : {mrr} | {hits_1} | {hits_5} | {hits_10}\n') 52 | -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/get_ensemble_predictions.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # This script assumes GraIL prediction scores on the validation and test set are already saved. 4 | # It also assumes that scored head/tail replaced triplets are also stored.
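# (Both kinds of score files are expected under data/<DATASET>/ with names of the form <model>_<file>_predictions.txt, e.g. grail_valid_predictions.txt.)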
5 | # If any of those is not present, run the corresponding script from the following setup commands. 6 | ##################### SET UP ##################### 7 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t valid 8 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t test 9 | 10 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t valid 11 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t test 12 | 13 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t valid 14 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t test 15 | 16 | # python test_ranking.py -d WN18RR -e saved_grail_exp_name --hop 3 17 | 18 | # python test_ranking.py -d NELL-995 -e saved_grail_exp_name --hop 2 19 | 20 | # python test_ranking.py -d FB15K237 -e saved_grail_exp_name --hop 1 21 | ################################################## 22 | 23 | 24 | # Arguments 25 | # Dataset 26 | DATASET=$1 27 | # KGE model to be used in ensemble 28 | KGE_MODEL=$2 29 | KGE_SAVED_MODEL_PATH="../experiments/kge_baselines/${KGE_MODEL}_${DATASET}" 30 | 31 | # score pos validation triplets with KGE model 32 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f valid -init $KGE_SAVED_MODEL_PATH 33 | # score neg validation triplets with KGE model 34 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f neg_valid_0 -init $KGE_SAVED_MODEL_PATH 35 | 36 | # train the ensemble model 37 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_train -ne 500 38 | 39 | # Score the test pos and neg triplets with KGE model 40 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f test -init $KGE_SAVED_MODEL_PATH 41 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f neg_test_0 -init $KGE_SAVED_MODEL_PATH 42 | # Score the test pos and neg triplets with ensemble model 43 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f test 44 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f neg_test_0 45 | # Compute auc with the ensemble model scored pos and neg test files 46 | python compute_auc.py -d $DATASET -m grail_with_${KGE_MODEL} 47 | # Compute auc with the KGE model scored pos and neg test files 48 | python compute_auc.py -d $DATASET -m $KGE_MODEL 49 | 50 | # Score head/tail replaced samples with KGE model 51 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f ranking_head -init $KGE_SAVED_MODEL_PATH 52 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f ranking_tail -init $KGE_SAVED_MODEL_PATH 53 | # Score head/tail replaced samples with ensemble model 54 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f ranking_head 55 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f ranking_tail 56 | # Compute ranking metrics for ensemble model with the scored head/tail replaced samples 57 | python compute_rank_metrics.py -d $DATASET -m grail_with_${KGE_MODEL} 58 | # Compute ranking metrics for KGE model with the scored head/tail replaced samples 59 | python compute_rank_metrics.py -d $DATASET -m $KGE_MODEL -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/get_kge_ensemble.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # This script assumes that head/tail replaced negative triplets are already stored while evaluating GraIL.
4 | # This assumption is made in order to make fair evaluations of all the methods on the same negative samples. 5 | # If any of those is not present, run the corresponding script from the following setup commands. These will 6 | # evaluate GraIL and save the negative samples along the way. 7 | ##################### SET UP ##################### 8 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t valid 9 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t test 10 | 11 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t valid 12 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t test 13 | 14 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t valid 15 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t test 16 | 17 | # python test_ranking.py -d WN18RR -e saved_grail_exp_name --hop 3 18 | 19 | # python test_ranking.py -d NELL-995 -e saved_grail_exp_name --hop 2 20 | 21 | # python test_ranking.py -d FB15K237 -e saved_grail_exp_name --hop 1 22 | ################################################## 23 | 24 | 25 | # Arguments 26 | # Dataset 27 | DATASET=$1 28 | # KGE model to be used in ensemble 29 | KGE_MODEL_1=$2 30 | KGE_SAVED_MODEL_PATH_1="../experiments/kge_baselines/${KGE_MODEL_1}_${DATASET}" 31 | 32 | KGE_MODEL_2=$3 33 | KGE_SAVED_MODEL_PATH_2="../experiments/kge_baselines/${KGE_MODEL_2}_${DATASET}" 34 | 35 | # score pos validation triplets with KGE model 36 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f valid -init $KGE_SAVED_MODEL_PATH_1 37 | # score neg validation triplets with KGE model 38 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f neg_valid_0 -init $KGE_SAVED_MODEL_PATH_1 39 | 40 | # score pos validation triplets with KGE model 41 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f valid -init $KGE_SAVED_MODEL_PATH_2 42 | # score neg validation triplets with KGE model 43 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f neg_valid_0 -init $KGE_SAVED_MODEL_PATH_2 44 | 45 | # train the ensemble model 46 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_train -ne 500 47 | 48 | # Score the test pos and neg triplets with KGE model 49 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f test -init $KGE_SAVED_MODEL_PATH_1 50 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f neg_test_0 -init $KGE_SAVED_MODEL_PATH_1 51 | # Score the test pos and neg triplets with KGE model 52 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f test -init $KGE_SAVED_MODEL_PATH_2 53 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f neg_test_0 -init $KGE_SAVED_MODEL_PATH_2 54 | 55 | 56 | # Score the test pos and neg triplets with ensemble model 57 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f test 58 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f neg_test_0 59 | # Compute auc with the ensemble model scored pos and neg test files 60 | python compute_auc.py -d $DATASET -m ${KGE_MODEL_1}_with_${KGE_MODEL_2} 61 | 62 | # Score head/tail replaced samples with KGE model 63 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f ranking_head -init $KGE_SAVED_MODEL_PATH_1 64 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f ranking_tail -init $KGE_SAVED_MODEL_PATH_1 65 | # Score head/tail replaced samples with KGE model 66 | python score_triplets_kge.py -d
$DATASET --model $KGE_MODEL_2 -f ranking_head -init $KGE_SAVED_MODEL_PATH_2 67 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f ranking_tail -init $KGE_SAVED_MODEL_PATH_2 68 | 69 | 70 | # Score head/tail replaced samples with ensemble model 71 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f ranking_head 72 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f ranking_tail 73 | # Compute ranking metrics for ensemble model with the scored head/tail replaced samples 74 | python compute_rank_metrics.py -d $DATASET -m ${KGE_MODEL_1}_with_${KGE_MODEL_2} -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/score_triplets_kge.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | 7 | import sys 8 | sys.path.insert(1, '../') 9 | 10 | import argparse 11 | import json 12 | import logging 13 | import os 14 | 15 | import torch 16 | 17 | from kge.model import KGEModel 18 | 19 | from utils.data_utils import process_files 20 | 21 | 22 | def parse_args(args=None): 23 | parser = argparse.ArgumentParser( 24 | description='Training and Testing Knowledge Graph Embedding Models', 25 | usage='train.py [<args>] [-h | --help]' 26 | ) 27 | 28 | parser.add_argument('--cuda', action='store_true', help='use GPU') 29 | 30 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 31 | parser.add_argument('--model', '-m', default='TransE', type=str) 32 | parser.add_argument('--file_to_score', '-f', default='test', type=str) 33 | parser.add_argument('--init_checkpoint', '-init', default=None, type=str) 34 | 35 | return parser.parse_args(args) 36 | 37 | 38 | def override_config(args): 39 | ''' 40 | Override model and data configuration 41 | ''' 42 | 43 | with open(os.path.join(args.init_checkpoint, 'config.json'), 'r') as fjson: 44 | argparse_dict = json.load(fjson) 45 | 46 | args.countries = argparse_dict['countries'] 47 | if args.dataset is None: 48 | args.dataset = argparse_dict['dataset'] 49 | args.model = argparse_dict['model'] 50 | args.double_entity_embedding = argparse_dict['double_entity_embedding'] 51 | args.double_relation_embedding = argparse_dict['double_relation_embedding'] 52 | args.hidden_dim = argparse_dict['hidden_dim'] 53 | args.test_batch_size = argparse_dict['test_batch_size'] 54 | args.gamma = argparse_dict['gamma'] 55 | 56 | 57 | def set_logger(args): 58 | ''' 59 | Write logs to checkpoint and console 60 | ''' 61 | log_file = os.path.join(args.init_checkpoint, 'score_{}.log'.format(args.file_to_score)) 62 | 63 | logging.basicConfig( 64 | format='%(asctime)s %(levelname)-8s %(message)s', 65 | level=logging.INFO, 66 | datefmt='%Y-%m-%d %H:%M:%S', 67 | filename=log_file, 68 | filemode='w' 69 | ) 70 | console = logging.StreamHandler() 71 | console.setLevel(logging.INFO) 72 | formatter = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s') 73 | console.setFormatter(formatter) 74 | logging.getLogger('').addHandler(console) 75 | 76 | 77 | def read_triple(file_path, entity2id, relation2id): 78 | ''' 79 | Read triples and map them into ids.
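Each input line is expected to be tab-separated: head, relation, tail.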
80 | ''' 81 | triples = [] 82 | with open(file_path) as fin: 83 | for line in fin: 84 | h, r, t = line.strip().split('\t') 85 | triples.append((entity2id[h], relation2id[r], entity2id[t])) 86 | return triples 87 | 88 | 89 | def main(args): 90 | if args.init_checkpoint: 91 | override_config(args) 92 | elif args.dataset is None: 93 | raise ValueError('one of init_checkpoint/dataset must be chosen.') 94 | 95 | # Write logs to checkpoint and console 96 | set_logger(args) 97 | 98 | main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))), '.') 99 | 100 | with open(os.path.join(main_dir, 'data/{}/entities.dict'.format(args.dataset))) as fin: 101 | entity2id = dict() 102 | for line in fin: 103 | eid, entity = line.strip().split('\t') 104 | entity2id[entity] = int(eid) 105 | 106 | with open(os.path.join(main_dir, 'data/{}/relations.dict'.format(args.dataset))) as fin: 107 | relation2id = dict() 108 | for line in fin: 109 | rid, relation = line.strip().split('\t') 110 | relation2id[relation] = int(rid) 111 | 112 | # test_triples = to_kge_format(triplets['to_score']) 113 | test_triples = read_triple(os.path.join(main_dir, 'data/{}/{}.txt'.format(args.dataset, args.file_to_score)), entity2id, relation2id) 114 | 115 | nentity = len(entity2id) 116 | nrelation = len(relation2id) 117 | args.nentity = nentity 118 | args.nrelation = nrelation 119 | 120 | logging.info('Model: %s' % args.model) 121 | logging.info('Data Path: %s' % args.dataset) 122 | logging.info('#entity: %d' % nentity) 123 | logging.info('#relation: %d' % nrelation) 124 | 125 | kge_model = KGEModel( 126 | model_name=args.model, 127 | nentity=nentity, 128 | nrelation=nrelation, 129 | hidden_dim=args.hidden_dim, 130 | gamma=args.gamma, 131 | double_entity_embedding=args.double_entity_embedding, 132 | double_relation_embedding=args.double_relation_embedding 133 | ) 134 | 135 | logging.info('Model Parameter Configuration:') 136 | for name, param in kge_model.named_parameters(): 137 | logging.info('Parameter %s: %s, requires_grad = %s' % (name, str(param.size()), str(param.requires_grad))) 138 | 139 | if args.cuda: 140 | kge_model = kge_model.cuda() 141 | 142 | # Restore model from checkpoint directory 143 | logging.info('Loading checkpoint %s...'
% args.init_checkpoint) 144 | checkpoint = torch.load(os.path.join(args.init_checkpoint, 'checkpoint')) 145 | kge_model.load_state_dict(checkpoint['model_state_dict']) 146 | logging.info('Scoring the triplets in {}.txt file'.format(args.file_to_score)) 147 | scores = kge_model.score_triplets(kge_model, test_triples, args) 148 | 149 | with open(os.path.join(main_dir, 'data/{}/{}.txt'.format(args.dataset, args.file_to_score))) as f: 150 | triplets = [line.split() for line in f.read().split('\n')[:-1]] 151 | file_path = os.path.join(main_dir, 'data/{}/{}_{}_predictions.txt'.format(args.dataset, args.model, args.file_to_score)) 152 | with open(file_path, "w") as f: 153 | for ([s, r, o], score) in zip(triplets, scores): 154 | f.write('\t'.join([s, r, o, str(score)]) + '\n') 155 | 156 | 157 | if __name__ == '__main__': 158 | main(parse_args()) 159 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/best_graph_classifier.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind/best_graph_classifier.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/graph_classifier_chk.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind/graph_classifier_chk.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/log_rank_test_1642565763.9744015.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.5919081837688219 | 0.5314637482900136 | 0.6354309165526676 | 0.6470588235294118 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/log_train.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_ht_emb: True 3 | add_traspose_rels: False 4 | attn_rel_emb_dim: 32 5 | batch_size: 16 6 | clip: 1000 7 | constrained_neg_prob: 0.0 8 | dataset: nell_v4 9 | disable_cuda: False 10 | dropout: 0 11 | early_stop: 100 12 | edge_dropout: 0.5 13 | emb_dim: 32 14 | enclosing_sub_graph: True 15 | eval_every: 1 16 | eval_every_iter: 455 17 | exp_dir: utils/../experiments/compile_nell_v4_ind 18 | experiment_name: compile_nell_v4_ind 19 | gnn_agg_type: sum 20 | gpu: 0 21 | has_attn: True 22 | hop: 3 23 | kge_model: TransE 24 | l2: 0.0005 25 | load_model: False 26 | lr: 0.001 27 | main_dir: utils/.. 
28 | margin: 10 29 | max_links: 10000000 30 | max_nodes_per_hop: None 31 | model_type: dgl 32 | num_bases: 4 33 | num_epochs: 30 34 | num_gcn_layers: 3 35 | num_neg_samples_per_link: 1 36 | num_workers: 0 37 | optimizer: Adam 38 | rel_emb_dim: 32 39 | save_every: 10 40 | train_file: train 41 | use_kge_embeddings: False 42 | valid_file: valid 43 | ============================================ 44 | Sampling negative links for train 45 | Sampling negative links for valid 46 | Extracting enclosing subgraphs for positive links in train set 47 | Extracting enclosing subgraphs for negative links in train set 48 | Extracting enclosing subgraphs for positive links in valid set 49 | Extracting enclosing subgraphs for negative links in valid set 50 | Max distance from sub : 3, Max distance from obj : 3 51 | Max distance from sub : 3, Max distance from obj : 3 52 | No existing model found. Initializing new model.. 53 | Device: cuda:0 54 | Input dim : 8, # Relations : 76, # Augmented relations : 76 55 | Total number of parameters: 36609 56 | Starting training with full batch... 57 | ============ Initialized logger ============ 58 | add_ht_emb: True 59 | add_traspose_rels: False 60 | attn_rel_emb_dim: 32 61 | batch_size: 16 62 | clip: 1000 63 | constrained_neg_prob: 0.0 64 | dataset: nell_v4 65 | disable_cuda: False 66 | dropout: 0 67 | early_stop: 100 68 | edge_dropout: 0.5 69 | emb_dim: 32 70 | enclosing_sub_graph: True 71 | eval_every: 1 72 | eval_every_iter: 455 73 | exp_dir: utils/../experiments/compile_nell_v4_ind 74 | experiment_name: compile_nell_v4_ind 75 | gnn_agg_type: sum 76 | gpu: 0 77 | has_attn: True 78 | hop: 3 79 | kge_model: TransE 80 | l2: 0.0005 81 | load_model: False 82 | lr: 0.001 83 | main_dir: utils/.. 84 | margin: 10 85 | max_links: 10000000 86 | max_nodes_per_hop: None 87 | model_type: dgl 88 | num_bases: 4 89 | num_epochs: 30 90 | num_gcn_layers: 3 91 | num_neg_samples_per_link: 1 92 | num_workers: 0 93 | optimizer: Adam 94 | rel_emb_dim: 32 95 | save_every: 10 96 | train_file: train 97 | use_kge_embeddings: False 98 | valid_file: valid 99 | ============================================ 100 | Max distance from sub : 3, Max distance from obj : 3 101 | Max distance from sub : 3, Max distance from obj : 3 102 | No existing model found. Initializing new model.. 103 | Device: cuda:0 104 | Input dim : 8, # Relations : 76, # Augmented relations : 76 105 | Total number of parameters: 36609 106 | Starting training with full batch... 107 | ============ Initialized logger ============ 108 | add_ht_emb: True 109 | add_traspose_rels: False 110 | attn_rel_emb_dim: 32 111 | batch_size: 16 112 | clip: 1000 113 | constrained_neg_prob: 0.0 114 | dataset: nell_v4 115 | disable_cuda: False 116 | dropout: 0 117 | early_stop: 100 118 | edge_dropout: 0.5 119 | emb_dim: 32 120 | enclosing_sub_graph: True 121 | eval_every: 1 122 | eval_every_iter: 455 123 | exp_dir: utils/../experiments/compile_nell_v4_ind 124 | experiment_name: compile_nell_v4_ind 125 | gnn_agg_type: sum 126 | gpu: 0 127 | has_attn: True 128 | hop: 3 129 | kge_model: TransE 130 | l2: 0.0005 131 | load_model: False 132 | lr: 0.001 133 | main_dir: utils/.. 
134 | margin: 10 135 | max_links: 10000000 136 | max_nodes_per_hop: None 137 | model_type: dgl 138 | num_bases: 4 139 | num_epochs: 30 140 | num_gcn_layers: 3 141 | num_neg_samples_per_link: 1 142 | num_workers: 0 143 | optimizer: Adam 144 | rel_emb_dim: 32 145 | save_every: 10 146 | train_file: train 147 | use_kge_embeddings: False 148 | valid_file: valid 149 | ============================================ 150 | Max distance from sub : 3, Max distance from obj : 3 151 | Max distance from sub : 3, Max distance from obj : 3 152 | No existing model found. Initializing new model.. 153 | Device: cuda:0 154 | Input dim : 8, # Relations : 76, # Augmented relations : 76 155 | Total number of parameters: 36609 156 | Starting training with full batch... 157 | 158 | Performance:{'auc': 0.7120642292696149, 'auc_pr': 0.708260517596693}in 98.48620629310608 159 | Better models found w.r.t accuracy. Saved it! 160 | Epoch 1 with loss: 1002472.75, training auc: 0.6504949166272641, training auc_pr: 0.6519165979079444, best validation AUC: 0.7120642292696149, weight_norm: 134.85137939453125 in 1290.5363817214966 161 | 162 | Performance:{'auc': 0.5829391328370968, 'auc_pr': 0.6081275506340519}in 97.76176166534424 163 | Epoch 2 with loss: 997751.6875, training auc: 0.6105630321149288, training auc_pr: 0.6389048512854649, best validation AUC: 0.7120642292696149, weight_norm: 136.5711212158203 in 1282.1537063121796 164 | 165 | Performance:{'auc': 0.5657422176351619, 'auc_pr': 0.5924276115981146}in 107.78150796890259 166 | Epoch 3 with loss: 1016724.5625, training auc: 0.5967820286130567, training auc_pr: 0.6396891847079792, best validation AUC: 0.7120642292696149, weight_norm: 136.8701629638672 in 1322.7062730789185 167 | 168 | Performance:{'auc': 0.5725980484143367, 'auc_pr': 0.5958905418148995}in 99.76627349853516 169 | Epoch 4 with loss: 999020.875, training auc: 0.6082831027916138, training auc_pr: 0.6467284701144813, best validation AUC: 0.7120642292696149, weight_norm: 137.06948852539062 in 1289.740923166275 170 | 171 | Performance:{'auc': 0.5727479097600133, 'auc_pr': 0.5951300420013876}in 105.44012331962585 172 | Epoch 5 with loss: 983860.6875, training auc: 0.6148082993614077, training auc_pr: 0.6538119348223812, best validation AUC: 0.7120642292696149, weight_norm: 137.42681884765625 in 1385.456704378128 173 | 174 | Performance:{'auc': 0.5731636121014991, 'auc_pr': 0.5945714192134619}in 116.94251847267151 175 | Epoch 6 with loss: 987564.5, training auc: 0.6118899761294434, training auc_pr: 0.6499089548029302, best validation AUC: 0.7120642292696149, weight_norm: 137.3944549560547 in 1562.5355093479156 176 | 177 | Performance:{'auc': 0.5721022028314673, 'auc_pr': 0.5969215515297234}in 106.19926738739014 178 | Epoch 7 with loss: 979331.1875, training auc: 0.6161806930392261, training auc_pr: 0.6558125930596259, best validation AUC: 0.7120642292696149, weight_norm: 137.98033142089844 in 1366.781834602356 179 | 180 | Performance:{'auc': 0.6194636006338483, 'auc_pr': 0.6389913633601614}in 111.00783586502075 181 | Epoch 8 with loss: 984802.125, training auc: 0.614314569904638, training auc_pr: 0.6482952953290476, best validation AUC: 0.7120642292696149, weight_norm: 139.932373046875 in 1472.37526345253 182 | 183 | Performance:{'auc': 0.6010345645420238, 'auc_pr': 0.6249457745996215}in 110.55558323860168 184 | Epoch 9 with loss: 962426.0625, training auc: 0.6265743619362512, training auc_pr: 0.6618949848228293, best validation AUC: 0.7120642292696149, weight_norm: 143.48548889160156 in 1377.1080858707428 185 | 
186 | Performance:{'auc': 0.6019467640374471, 'auc_pr': 0.6253301646208065}in 107.27267694473267 187 | Epoch 10 with loss: 925264.25, training auc: 0.6459990966967227, training auc_pr: 0.6786478813874655, best validation AUC: 0.7120642292696149, weight_norm: 145.2030792236328 in 1457.4731812477112 188 | 189 | Performance:{'auc': 0.6948881643418611, 'auc_pr': 0.7064584155217968}in 106.54831171035767 190 | Epoch 11 with loss: 902373.0625, training auc: 0.6599780029249352, training auc_pr: 0.6897035802476682, best validation AUC: 0.7120642292696149, weight_norm: 146.06576538085938 in 1403.814304113388 191 | 192 | Performance:{'auc': 0.703386605783866, 'auc_pr': 0.716869067098012}in 108.78307700157166 193 | Epoch 12 with loss: 833329.1875, training auc: 0.6917684355108966, training auc_pr: 0.713560964922975, best validation AUC: 0.7120642292696149, weight_norm: 148.72451782226562 in 1410.622745513916 194 | 195 | Performance:{'auc': 0.8097190946810953, 'auc_pr': 0.7716038474767217}in 108.13592767715454 196 | Better models found w.r.t accuracy. Saved it! 197 | Epoch 13 with loss: 643480.5625, training auc: 0.7710449414981346, training auc_pr: 0.7585973968940547, best validation AUC: 0.8097190946810953, weight_norm: 151.1417999267578 in 1416.138186454773 198 | 199 | Performance:{'auc': 0.845273373157357, 'auc_pr': 0.8319535924806791}in 105.47019243240356 200 | Better models found w.r.t accuracy. Saved it! 201 | Epoch 14 with loss: 479894.96875, training auc: 0.8336343173478133, training auc_pr: 0.8193526666786017, best validation AUC: 0.845273373157357, weight_norm: 153.6695098876953 in 1397.8663086891174 202 | 203 | Performance:{'auc': 0.8688928243781405, 'auc_pr': 0.8582676820005574}in 109.96139717102051 204 | Better models found w.r.t accuracy. Saved it! 205 | Epoch 15 with loss: 414557.96875, training auc: 0.8583448145832866, training auc_pr: 0.8458281804029446, best validation AUC: 0.8688928243781405, weight_norm: 156.20025634765625 in 1466.916127204895 206 | 207 | Performance:{'auc': 0.8716541825650007, 'auc_pr': 0.8625972041473648}in 103.59730458259583 208 | Better models found w.r.t accuracy. Saved it! 209 | Epoch 16 with loss: 382601.21875, training auc: 0.8699039213786858, training auc_pr: 0.8544261442598972, best validation AUC: 0.8716541825650007, weight_norm: 158.39100646972656 in 1363.205931186676 210 | 211 | Performance:{'auc': 0.8838620180980379, 'auc_pr': 0.8694152015825152}in 112.29348659515381 212 | Better models found w.r.t accuracy. Saved it! 213 | Epoch 17 with loss: 353920.5, training auc: 0.8784802359645364, training auc_pr: 0.8658363370816708, best validation AUC: 0.8838620180980379, weight_norm: 160.42938232421875 in 1494.4573781490326 214 | 215 | Performance:{'auc': 0.8958926263005359, 'auc_pr': 0.8877860159074005}in 93.33963990211487 216 | Better models found w.r.t accuracy. Saved it! 217 | Epoch 18 with loss: 322911.84375, training auc: 0.8895199995728996, training auc_pr: 0.8727785515088582, best validation AUC: 0.8958926263005359, weight_norm: 162.52545166015625 in 1376.4243819713593 218 | 219 | Performance:{'auc': 0.8975241602552072, 'auc_pr': 0.8959679434812348}in 107.65798211097717 220 | Better models found w.r.t accuracy. Saved it! 
221 | Epoch 19 with loss: 294608.5, training auc: 0.9001592072904352, training auc_pr: 0.8890395179549839, best validation AUC: 0.8975241602552072, weight_norm: 163.7848663330078 in 1473.4786217212677 222 | 223 | Performance:{'auc': 0.9035537989199558, 'auc_pr': 0.8979020537472038}in 100.02894830703735 224 | Better models found w.r.t accuracy. Saved it! 225 | Epoch 20 with loss: 286375.5625, training auc: 0.903554883699791, training auc_pr: 0.8917630850211133, best validation AUC: 0.9035537989199558, weight_norm: 165.7347412109375 in 1376.8097817897797 226 | 227 | Performance:{'auc': 0.903766862659244, 'auc_pr': 0.8974102582621495}in 118.61936497688293 228 | Better models found w.r.t accuracy. Saved it! 229 | Epoch 21 with loss: 274164.8125, training auc: 0.9069193705411298, training auc_pr: 0.8950499766698242, best validation AUC: 0.903766862659244, weight_norm: 167.73663330078125 in 1468.8919966220856 230 | 231 | Performance:{'auc': 0.9028162204707992, 'auc_pr': 0.8927616991355903}in 100.0911705493927 232 | Epoch 22 with loss: 270166.03125, training auc: 0.9080979182438529, training auc_pr: 0.8953932248409231, best validation AUC: 0.903766862659244, weight_norm: 169.07098388671875 in 1393.7504432201385 233 | 234 | Performance:{'auc': 0.8992964335606013, 'auc_pr': 0.8864098973641945}in 110.24582862854004 235 | Epoch 23 with loss: 264222.8125, training auc: 0.9097743926481412, training auc_pr: 0.8971297397725044, best validation AUC: 0.903766862659244, weight_norm: 170.19969177246094 in 1416.868691444397 236 | 237 | Performance:{'auc': 0.9011188778382434, 'auc_pr': 0.8905536584299034}in 98.31412649154663 238 | Epoch 24 with loss: 239050.1875, training auc: 0.9202242150607819, training auc_pr: 0.9086160677131943, best validation AUC: 0.903766862659244, weight_norm: 171.57135009765625 in 1421.4703109264374 239 | 240 | Performance:{'auc': 0.9133117272367131, 'auc_pr': 0.9062772620611281}in 102.3911018371582 241 | Better models found w.r.t accuracy. Saved it! 242 | Epoch 25 with loss: 255216.4375, training auc: 0.9136345231708636, training auc_pr: 0.8996050033809619, best validation AUC: 0.9133117272367131, weight_norm: 173.8502960205078 in 1353.9066956043243 243 | 244 | Performance:{'auc': 0.902582958028398, 'auc_pr': 0.8942434171019488}in 101.21613311767578 245 | Epoch 26 with loss: 227124.1875, training auc: 0.9224140792379405, training auc_pr: 0.9105422612268373, best validation AUC: 0.9133117272367131, weight_norm: 174.82058715820312 in 1476.4799265861511 246 | 247 | Performance:{'auc': 0.9132941348178728, 'auc_pr': 0.9092162368302779}in 113.04683709144592 248 | 249 | Performance:{'auc': 0.9172960843185087, 'auc_pr': 0.9147312479735876}in 109.55985236167908 250 | Better models found w.r.t accuracy. Saved it! 
251 | Epoch 27 with loss: 220791.1875, training auc: 0.9252795066484709, training auc_pr: 0.9150551158030051, best validation AUC: 0.9172960843185087, weight_norm: 176.0113525390625 in 1464.3590109348297 252 | 253 | Performance:{'auc': 0.8895691290840474, 'auc_pr': 0.8776626693024772}in 112.88955640792847 254 | Epoch 28 with loss: 264310.0625, training auc: 0.9106737796677594, training auc_pr: 0.8971130010733881, best validation AUC: 0.9172960843185087, weight_norm: 178.27793884277344 in 1506.5027270317078 255 | 256 | Performance:{'auc': 0.8989615260315674, 'auc_pr': 0.8947127708678383}in 111.65631628036499 257 | Epoch 29 with loss: 278754.125, training auc: 0.9065422682922426, training auc_pr: 0.8967538493248908, best validation AUC: 0.9172960843185087, weight_norm: 179.66348266601562 in 1368.8761911392212 258 | 259 | Performance:{'auc': 0.8913655105189633, 'auc_pr': 0.8794615376574819}in 106.69048309326172 260 | Epoch 30 with loss: 405169.40625, training auc: 0.8634110558869994, training auc_pr: 0.8523589774032612, best validation AUC: 0.9172960843185087, weight_norm: 181.33078002929688 in 1464.6133909225464 261 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/params.json: -------------------------------------------------------------------------------- 1 | {"experiment_name": "compile_nell_v4_ind", "dataset": "nell_v4_ind", "train_file": "train", "test_file": "test", "runs": 1, "gpu": 0, "disable_cuda": false, "max_links": 100000, "hop": 3, "max_nodes_per_hop": null, "use_kge_embeddings": false, "kge_model": "TransE", "model_type": "dgl", "constrained_neg_prob": 0, "num_neg_samples_per_link": 1, "batch_size": 16, "num_workers": 0, "add_traspose_rels": false, "enclosing_sub_graph": true, "main_dir": "utils/..", "exp_dir": "utils/../experiments/compile_nell_v4_ind", "test_exp_dir": "utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0"} -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/test_nell_v4_ind_0/log_test.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | batch_size: 16 4 | constrained_neg_prob: 0 5 | dataset: nell_v4_ind 6 | disable_cuda: False 7 | enclosing_sub_graph: True 8 | exp_dir: utils/../experiments/compile_nell_v4_ind 9 | experiment_name: compile_nell_v4_ind 10 | gpu: 0 11 | hop: 3 12 | kge_model: TransE 13 | main_dir: utils/.. 
14 | max_links: 100000 15 | max_nodes_per_hop: None 16 | model_type: dgl 17 | num_neg_samples_per_link: 1 18 | num_workers: 0 19 | runs: 1 20 | test_exp_dir: utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0 21 | test_file: test 22 | train_file: train 23 | use_kge_embeddings: False 24 | ============================================ 25 | Loading existing model from utils/../experiments/compile_nell_v4_ind/best_graph_classifier.pth 26 | Device: cuda:0 27 | Sampling negative links for test 28 | Extracting enclosing subgraphs for positive links in test set 29 | Extracting enclosing subgraphs for negative links in test set 30 | Max distance from sub : 3, Max distance from obj : 3 31 | 32 | Test Set Performance:{'auc': 0.78092244755886, 'auc_pr': 0.8432636131804843} 33 | 34 | Avg test Set Performance -- mean auc :0.78092244755886 std auc: 0.0 35 | 36 | Avg test Set Performance -- mean auc_pr :0.8432636131804843 std auc_pr: 0.0 37 | ============ Initialized logger ============ 38 | add_traspose_rels: False 39 | batch_size: 16 40 | constrained_neg_prob: 0 41 | dataset: nell_v4_ind 42 | disable_cuda: False 43 | enclosing_sub_graph: True 44 | exp_dir: utils/../experiments/compile_nell_v4_ind 45 | experiment_name: compile_nell_v4_ind 46 | gpu: 0 47 | hop: 3 48 | kge_model: TransE 49 | main_dir: utils/.. 50 | max_links: 100000 51 | max_nodes_per_hop: None 52 | model_type: dgl 53 | num_neg_samples_per_link: 1 54 | num_workers: 0 55 | runs: 1 56 | test_exp_dir: utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0 57 | test_file: test 58 | train_file: train 59 | use_kge_embeddings: False 60 | ============================================ 61 | Loading existing model from utils/../experiments/compile_nell_v4_ind/best_graph_classifier.pth 62 | Device: cuda:0 63 | Sampling negative links for test 64 | Extracting enclosing subgraphs for positive links in test set 65 | Extracting enclosing subgraphs for negative links in test set 66 | Max distance from sub : 3, Max distance from obj : 3 67 | 68 | Test Set Performance:{'auc': 0.7682933447613131, 'auc_pr': 0.8343463225340576} 69 | 70 | Avg test Set Performance -- mean auc :0.7682933447613131 std auc: 0.0 71 | 72 | Avg test Set Performance -- mean auc_pr :0.8343463225340576 std auc_pr: 0.0 73 | ============ Initialized logger ============ 74 | add_traspose_rels: False 75 | batch_size: 16 76 | constrained_neg_prob: 0 77 | dataset: nell_v4_ind 78 | disable_cuda: False 79 | enclosing_sub_graph: True 80 | exp_dir: utils/../experiments/compile_nell_v4_ind 81 | experiment_name: compile_nell_v4_ind 82 | gpu: 0 83 | hop: 3 84 | kge_model: TransE 85 | main_dir: utils/.. 
86 | max_links: 100000 87 | max_nodes_per_hop: None 88 | model_type: dgl 89 | num_neg_samples_per_link: 1 90 | num_workers: 0 91 | runs: 1 92 | test_exp_dir: utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0 93 | test_file: test 94 | train_file: train 95 | use_kge_embeddings: False 96 | ============================================ 97 | Loading existing model from utils/../experiments/compile_nell_v4_ind/best_graph_classifier.pth 98 | Device: cuda:0 99 | Sampling negative links for test 100 | Extracting enclosing subgraphs for positive links in test set 101 | Extracting enclosing subgraphs for negative links in test set 102 | Max distance from sub : 3, Max distance from obj : 3 103 | 104 | Test Set Performance:{'auc': 0.7790912884735225, 'auc_pr': 0.8423223582566522} 105 | 106 | Avg test Set Performance -- mean auc :0.7790912884735225 std auc: 0.0 107 | 108 | Avg test Set Performance -- mean auc_pr :0.8423223582566522 std auc_pr: 0.0 109 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/best_graph_classifier.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind2/best_graph_classifier.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/graph_classifier_chk.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind2/graph_classifier_chk.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642603455.548011.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642647880.451746.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.154622204580616 | 0.11696306429548564 | 0.1238030095759234 | 0.14227086183310533 15 | 
-------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642679948.7119958.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.465616634820716 | 0.393296853625171 | 0.5136798905608755 | 0.5430916552667578 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642733457.0139189.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.3413417829091137 | 0.23529411764705882 | 0.4425444596443228 | 0.4781121751025992 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642832066.1408617.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6370628492990844 | 0.5690834473324213 | 0.7024623803009576 | 0.7373461012311902 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642841084.211995.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 
| Hits@5 | Hits@10 : 0.6319985352481888 | 0.5601915184678523 | 0.6969904240766074 | 0.7352941176470589 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642850740.4142146.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6328534341849693 | 0.5588235294117647 | 0.7065663474692202 | 0.7435020519835841 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642855466.9402952.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6372589530665589 | 0.5663474692202463 | 0.70109439124487 | 0.7346101231190151 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642859913.6085432.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6326340381356135 | 0.5601915184678523 | 0.7051983584131327 | 0.7373461012311902 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642908893.5786257.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: 
./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6416702961382977 | 0.5725034199726402 | 0.7079343365253078 | 0.7387140902872777 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/params.json: -------------------------------------------------------------------------------- 1 | {"experiment_name": "compile_nell_v4_ind2", "dataset": "nell_v4", "gpu": 0, "disable_cuda": false, "load_model": false, "train_file": "train", "valid_file": "valid", "num_epochs": 30, "eval_every": 1, "eval_every_iter": 455, "save_every": 10, "early_stop": 100, "optimizer": "Adam", "lr": 0.001, "clip": 1000, "l2": 0.0005, "margin": 10, "max_links": 10000000, "hop": 3, "max_nodes_per_hop": null, "use_kge_embeddings": false, "kge_model": "TransE", "model_type": "dgl", "constrained_neg_prob": 0.0, "batch_size": 16, "num_neg_samples_per_link": 1, "num_workers": 0, "add_traspose_rels": false, "enclosing_sub_graph": true, "rel_emb_dim": 32, "attn_rel_emb_dim": 32, "emb_dim": 32, "num_gcn_layers": 3, "num_bases": 4, "dropout": 0, "edge_dropout": 0.5, "gnn_agg_type": "sum", "add_ht_emb": true, "has_attn": true, "main_dir": "utils/..", "exp_dir": "utils/../experiments/compile_nell_v4_ind2"} -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/test_nell_v4_ind_0/log_test.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | batch_size: 32 4 | constrained_neg_prob: 0 5 | dataset: nell_v4_ind 6 | disable_cuda: False 7 | enclosing_sub_graph: True 8 | exp_dir: utils/../experiments/compile_nell_v4_ind2 9 | experiment_name: compile_nell_v4_ind2 10 | gpu: 0 11 | hop: 3 12 | kge_model: TransE 13 | main_dir: utils/.. 14 | max_links: 100000 15 | max_nodes_per_hop: None 16 | model_type: dgl 17 | num_neg_samples_per_link: 1 18 | num_workers: 0 19 | runs: 1 20 | test_exp_dir: utils/../experiments/compile_nell_v4_ind2/test_nell_v4_ind_0 21 | test_file: test 22 | train_file: train 23 | use_kge_embeddings: False 24 | ============================================ 25 | -------------------------------------------------------------------------------- /CoMPILE_github/kge/README.md: -------------------------------------------------------------------------------- 1 | The files here are largely taken from RotatE's official code available [here](https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding). 
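For orientation, the score functions in `model.py` follow the usual conventions of that codebase; as a rough sketch (illustrative only -- `transe_score` below is not the repository's actual function, and the default `gamma` is just a placeholder), a TransE-style score can be computed as:

```python
import torch

def transe_score(head, relation, tail, gamma=12.0):
    # Margin-based TransE plausibility: gamma - ||h + r - t||_1.
    # head/relation/tail: (batch, dim) embedding tensors; gamma corresponds
    # to the GAMMA positional argument of run.sh. Higher score = more plausible.
    return gamma - torch.norm(head + relation - tail, p=1, dim=-1)
```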
2 | -------------------------------------------------------------------------------- /CoMPILE_github/kge/dataloader.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | 7 | import numpy as np 8 | import torch 9 | 10 | from torch.utils.data import Dataset 11 | 12 | class TrainDataset(Dataset): 13 | def __init__(self, triples, nentity, nrelation, negative_sample_size, mode): 14 | self.len = len(triples) 15 | self.triples = triples 16 | self.triple_set = set(triples) 17 | self.nentity = nentity 18 | self.nrelation = nrelation 19 | self.negative_sample_size = negative_sample_size 20 | self.mode = mode 21 | self.count = self.count_frequency(triples) 22 | self.true_head, self.true_tail = self.get_true_head_and_tail(self.triples) 23 | 24 | def __len__(self): 25 | return self.len 26 | 27 | def __getitem__(self, idx): 28 | positive_sample = self.triples[idx] 29 | 30 | head, relation, tail = positive_sample 31 | 32 | subsampling_weight = self.count[(head, relation)] + self.count[(tail, -relation-1)] 33 | subsampling_weight = torch.sqrt(1 / torch.Tensor([subsampling_weight])) 34 | 35 | negative_sample_list = [] 36 | negative_sample_size = 0 37 | 38 | while negative_sample_size < self.negative_sample_size: 39 | negative_sample = np.random.randint(self.nentity, size=self.negative_sample_size*2) 40 | if self.mode == 'head-batch': 41 | mask = np.in1d( 42 | negative_sample, 43 | self.true_head[(relation, tail)], 44 | assume_unique=True, 45 | invert=True 46 | ) 47 | elif self.mode == 'tail-batch': 48 | mask = np.in1d( 49 | negative_sample, 50 | self.true_tail[(head, relation)], 51 | assume_unique=True, 52 | invert=True 53 | ) 54 | else: 55 | raise ValueError('Training batch mode %s not supported' % self.mode) 56 | negative_sample = negative_sample[mask] 57 | negative_sample_list.append(negative_sample) 58 | negative_sample_size += negative_sample.size 59 | 60 | negative_sample = np.concatenate(negative_sample_list)[:self.negative_sample_size] 61 | 62 | negative_sample = torch.from_numpy(negative_sample) 63 | 64 | positive_sample = torch.LongTensor(positive_sample) 65 | 66 | return positive_sample, negative_sample, subsampling_weight, self.mode 67 | 68 | @staticmethod 69 | def collate_fn(data): 70 | positive_sample = torch.stack([_[0] for _ in data], dim=0) 71 | negative_sample = torch.stack([_[1] for _ in data], dim=0) 72 | subsample_weight = torch.cat([_[2] for _ in data], dim=0) 73 | mode = data[0][3] 74 | return positive_sample, negative_sample, subsample_weight, mode 75 | 76 | @staticmethod 77 | def count_frequency(triples, start=4): 78 | ''' 79 | Get frequency of a partial triple like (head, relation) or (relation, tail) 80 | The frequency will be used for subsampling like word2vec 81 | ''' 82 | count = {} 83 | for head, relation, tail in triples: 84 | if (head, relation) not in count: 85 | count[(head, relation)] = start 86 | else: 87 | count[(head, relation)] += 1 88 | 89 | if (tail, -relation-1) not in count: 90 | count[(tail, -relation-1)] = start 91 | else: 92 | count[(tail, -relation-1)] += 1 93 | return count 94 | 95 | @staticmethod 96 | def get_true_head_and_tail(triples): 97 | ''' 98 | Build a dictionary of true triples that will 99 | be used to filter these true triples for negative sampling 100 | ''' 101 | 102 | true_head = {} 103 | true_tail = {} 104 | 105 | for head, relation, tail in triples: 106 | if 
(head, relation) not in true_tail: 107 | true_tail[(head, relation)] = [] 108 | true_tail[(head, relation)].append(tail) 109 | if (relation, tail) not in true_head: 110 | true_head[(relation, tail)] = [] 111 | true_head[(relation, tail)].append(head) 112 | 113 | for relation, tail in true_head: 114 | true_head[(relation, tail)] = np.array(list(set(true_head[(relation, tail)]))) 115 | for head, relation in true_tail: 116 | true_tail[(head, relation)] = np.array(list(set(true_tail[(head, relation)]))) 117 | 118 | return true_head, true_tail 119 | 120 | 121 | class TestDataset(Dataset): 122 | def __init__(self, triples, all_true_triples, nentity, nrelation, mode): 123 | self.len = len(triples) 124 | self.triple_set = set(all_true_triples) 125 | self.triples = triples 126 | self.nentity = nentity 127 | self.nrelation = nrelation 128 | self.mode = mode 129 | 130 | def __len__(self): 131 | return self.len 132 | 133 | def __getitem__(self, idx): 134 | head, relation, tail = self.triples[idx] 135 | 136 | if self.mode == 'head-batch': 137 | tmp = [(0, rand_head) if (rand_head, relation, tail) not in self.triple_set 138 | else (-1, head) for rand_head in range(self.nentity)] 139 | tmp[head] = (0, head) 140 | elif self.mode == 'tail-batch': 141 | tmp = [(0, rand_tail) if (head, relation, rand_tail) not in self.triple_set 142 | else (-1, tail) for rand_tail in range(self.nentity)] 143 | tmp[tail] = (0, tail) 144 | else: 145 | raise ValueError('negative batch mode %s not supported' % self.mode) 146 | 147 | tmp = torch.LongTensor(tmp) 148 | filter_bias = tmp[:, 0].float() 149 | negative_sample = tmp[:, 1] 150 | 151 | positive_sample = torch.LongTensor((head, relation, tail)) 152 | 153 | return positive_sample, negative_sample, filter_bias, self.mode 154 | 155 | @staticmethod 156 | def collate_fn(data): 157 | positive_sample = torch.stack([_[0] for _ in data], dim=0) 158 | negative_sample = torch.stack([_[1] for _ in data], dim=0) 159 | filter_bias = torch.stack([_[2] for _ in data], dim=0) 160 | mode = data[0][3] 161 | return positive_sample, negative_sample, filter_bias, mode 162 | 163 | class BidirectionalOneShotIterator(object): 164 | def __init__(self, dataloader_head, dataloader_tail): 165 | self.iterator_head = self.one_shot_iterator(dataloader_head) 166 | self.iterator_tail = self.one_shot_iterator(dataloader_tail) 167 | self.step = 0 168 | 169 | def __next__(self): 170 | self.step += 1 171 | if self.step % 2 == 0: 172 | data = next(self.iterator_head) 173 | else: 174 | data = next(self.iterator_tail) 175 | return data 176 | 177 | @staticmethod 178 | def one_shot_iterator(dataloader): 179 | ''' 180 | Transform a PyTorch Dataloader into python iterator 181 | ''' 182 | while True: 183 | for data in dataloader: 184 | yield data -------------------------------------------------------------------------------- /CoMPILE_github/kge/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python -u -c 'import torch; print(torch.__version__)' 4 | 5 | CODE_PATH=. 
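# Example invocation with hypothetical hyper-parameter values (illustration only):
#   sh run.sh train TransE nell_v4 0 0 512 128 500 12.0 1.0 0.0005 100000 16
# positional order: MODE MODEL DATASET GPU_DEVICE SAVE_ID BATCH_SIZE
#   NEGATIVE_SAMPLE_SIZE HIDDEN_DIM GAMMA ALPHA LEARNING_RATE MAX_STEPS TEST_BATCH_SIZE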
6 | DATA_PATH=../data 7 | SAVE_PATH=../experiments/kge_baselines 8 | 9 | #The first four parameters must be provided 10 | MODE=$1 11 | MODEL=$2 12 | DATASET=$3 13 | GPU_DEVICE=$4 14 | SAVE_ID=$5 15 | 16 | FULL_DATA_PATH=$DATA_PATH/$DATASET 17 | SAVE=$SAVE_PATH/"$MODEL"_"$DATASET" 18 | 19 | #Only used in training 20 | BATCH_SIZE=$6 21 | NEGATIVE_SAMPLE_SIZE=$7 22 | HIDDEN_DIM=$8 23 | GAMMA=$9 24 | ALPHA=${10} 25 | LEARNING_RATE=${11} 26 | MAX_STEPS=${12} 27 | TEST_BATCH_SIZE=${13} 28 | 29 | if [ $MODE == "train" ] 30 | then 31 | 32 | echo "Start Training......" 33 | 34 | CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \ 35 | --cuda \ 36 | --do_valid \ 37 | --do_test \ 38 | --data_path $FULL_DATA_PATH \ 39 | --model $MODEL \ 40 | -n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \ 41 | -g $GAMMA -a $ALPHA \ 42 | -lr $LEARNING_RATE --max_steps $MAX_STEPS \ 43 | -save $SAVE --test_batch_size $TEST_BATCH_SIZE \ 44 | ${14} ${15} ${16} ${17} ${18} ${19} ${20} 45 | 46 | elif [ $MODE == "valid" ] 47 | then 48 | 49 | echo "Start Evaluation on Valid Data Set......" 50 | 51 | CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_valid --cuda -init $SAVE 52 | 53 | elif [ $MODE == "test" ] 54 | then 55 | 56 | echo "Start Evaluation on Test Data Set......" 57 | 58 | CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_test --cuda -init $SAVE 59 | 60 | else 61 | echo "Unknown MODE" $MODE 62 | fi 63 | -------------------------------------------------------------------------------- /CoMPILE_github/managers/__pycache__/evaluator.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/managers/__pycache__/evaluator.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/managers/__pycache__/trainer.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/managers/__pycache__/trainer.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/managers/evaluator.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import torch 4 | import pdb 5 | from sklearn import metrics 6 | import torch.nn.functional as F 7 | from torch.utils.data import DataLoader 8 | 9 | 10 | class Evaluator(): 11 | def __init__(self, params, graph_classifier, data): 12 | self.params = params 13 | self.graph_classifier = graph_classifier 14 | self.data = data 15 | 16 | def eval(self, save=False): 17 | pos_scores = [] 18 | pos_labels = [] 19 | neg_scores = [] 20 | neg_labels = [] 21 | dataloader = DataLoader(self.data, batch_size=self.params.batch_size, shuffle=False, num_workers=self.params.num_workers, collate_fn=self.params.collate_fn) 22 | 23 | self.graph_classifier.eval() 24 | with torch.no_grad(): 25 | for b_idx, batch in enumerate(dataloader): 26 | (graphs_pos, r_labels_pos), g_labels_pos, (graph_neg, r_labels_neg), g_labels_neg = batch 27 | # data_pos, targets_pos, data_neg, targets_neg = self.params.move_batch_to_device(batch, self.params.device) 28 | # print([self.data.id2relation[r.item()] for r in data_pos[1]]) 29 | # pdb.set_trace() 30 | 31 | g_labels_pos = 
torch.LongTensor(g_labels_pos).to(device=self.params.device) 32 | r_labels_pos = torch.LongTensor(r_labels_pos).to(device=self.params.device) 33 | 34 | g_labels_neg = torch.LongTensor(g_labels_neg).to(device=self.params.device) 35 | r_labels_neg = torch.LongTensor(r_labels_neg).to(device=self.params.device) 36 | 37 | score_pos = self.graph_classifier(graphs_pos) 38 | score_neg = self.graph_classifier(graph_neg) 39 | 40 | # preds += torch.argmax(logits.detach().cpu(), dim=1).tolist() 41 | pos_scores += score_pos.squeeze(1).detach().cpu().tolist() 42 | neg_scores += score_neg.squeeze(1).detach().cpu().tolist() 43 | pos_labels += g_labels_pos.tolist() 44 | neg_labels += g_labels_neg.tolist() 45 | 46 | # acc = metrics.accuracy_score(labels, preds) 47 | auc = metrics.roc_auc_score(pos_labels + neg_labels, pos_scores + neg_scores) 48 | auc_pr = metrics.average_precision_score(pos_labels + neg_labels, pos_scores + neg_scores) 49 | 50 | if save: 51 | pos_test_triplets_path = os.path.join(self.params.main_dir, 'data/{}/{}.txt'.format(self.params.dataset, self.data.file_name)) 52 | with open(pos_test_triplets_path) as f: 53 | pos_triplets = [line.split() for line in f.read().split('\n')[:-1]] 54 | pos_file_path = os.path.join(self.params.main_dir, 'data/{}/grail_{}_predictions.txt'.format(self.params.dataset, self.data.file_name)) 55 | with open(pos_file_path, "w") as f: 56 | for ([s, r, o], score) in zip(pos_triplets, pos_scores): 57 | f.write('\t'.join([s, r, o, str(score)]) + '\n') 58 | 59 | neg_test_triplets_path = os.path.join(self.params.main_dir, 'data/{}/neg_{}_0.txt'.format(self.params.dataset, self.data.file_name)) 60 | with open(neg_test_triplets_path) as f: 61 | neg_triplets = [line.split() for line in f.read().split('\n')[:-1]] 62 | neg_file_path = os.path.join(self.params.main_dir, 'data/{}/grail_neg_{}_{}_predictions.txt'.format(self.params.dataset, self.data.file_name, self.params.constrained_neg_prob)) 63 | with open(neg_file_path, "w") as f: 64 | for ([s, r, o], score) in zip(neg_triplets, neg_scores): 65 | f.write('\t'.join([s, r, o, str(score)]) + '\n') 66 | 67 | return {'auc': auc, 'auc_pr': auc_pr} 68 | -------------------------------------------------------------------------------- /CoMPILE_github/managers/trainer.py: -------------------------------------------------------------------------------- 1 | import statistics 2 | import timeit 3 | import os 4 | import logging 5 | import pdb 6 | import numpy as np 7 | import time 8 | 9 | import torch 10 | import torch.nn as nn 11 | import torch.optim as optim 12 | import torch.nn.functional as F 13 | from torch.utils.data import DataLoader 14 | 15 | from sklearn import metrics 16 | 17 | 18 | class Trainer(): 19 | def __init__(self, params, graph_classifier, train, valid_evaluator=None): 20 | self.graph_classifier = graph_classifier 21 | self.valid_evaluator = valid_evaluator 22 | self.params = params 23 | self.train_data = train 24 | 25 | self.updates_counter = 0 26 | 27 | model_params = list(self.graph_classifier.parameters()) 28 | logging.info('Total number of parameters: %d' % sum(map(lambda x: x.numel(), model_params))) 29 | 30 | if params.optimizer == "SGD": 31 | self.optimizer = optim.SGD(model_params, lr=params.lr, momentum=params.momentum, weight_decay=self.params.l2) 32 | if params.optimizer == "Adam": 33 | self.optimizer = optim.Adam(model_params, lr=params.lr, weight_decay=self.params.l2) 34 | 35 | self.criterion = nn.MarginRankingLoss(self.params.margin, reduction='sum') 36 | 37 | self.reset_training_state() 38 | 39 | def 
reset_training_state(self): 40 | self.best_metric = 0 41 | self.last_metric = 0 42 | self.not_improved_count = 0 43 | 44 | def train_epoch(self): 45 | total_loss = 0 46 | all_preds = [] 47 | all_labels = [] 48 | all_scores = [] 49 | 50 | dataloader = DataLoader(self.train_data, batch_size=self.params.batch_size, shuffle=True, num_workers=self.params.num_workers, collate_fn=self.params.collate_fn) 51 | # dataloader = DataLoader(self.train_data, batch_size=self.params.batch_size, shuffle=True, num_workers=self.params.num_workers) 52 | self.graph_classifier.train() 53 | model_params = list(self.graph_classifier.parameters()) 54 | for b_idx, batch in enumerate(dataloader): 55 | (graphs_pos, r_labels_pos), g_labels_pos, (graph_neg, r_labels_neg), g_labels_neg = batch 56 | 57 | g_labels_pos = torch.LongTensor(g_labels_pos).to(device=self.params.device) 58 | r_labels_pos = torch.LongTensor(r_labels_pos).to(device=self.params.device) 59 | 60 | g_labels_neg = torch.LongTensor(g_labels_neg).to(device=self.params.device) 61 | r_labels_neg = torch.LongTensor(r_labels_neg).to(device=self.params.device) 62 | 63 | self.graph_classifier.train() 64 | # data_pos, targets_pos, data_neg, targets_neg = self.params.move_batch_to_device(batch, self.params.device) 65 | self.optimizer.zero_grad() 66 | # print('batch size ', len(targets_pos), ' ', len(targets_neg)) 67 | # print('r label pos ', len(data_pos[1]), ' r label neg ', len(data_neg[1])) 68 | score_pos = self.graph_classifier(graphs_pos) 69 | score_neg = self.graph_classifier(graph_neg) 70 | loss = self.criterion(score_pos, score_neg.view(len(score_pos), -1).mean(dim=1), torch.Tensor([1]).to(device=self.params.device)) 71 | # print(score_pos, score_neg, loss) 72 | loss.backward() 73 | self.optimizer.step() 74 | self.updates_counter += 1 75 | 76 | with torch.no_grad(): 77 | # print(score_pos.shape, score_neg.shape) 78 | # print(score_pos) 79 | all_scores += score_pos.squeeze(1).detach().cpu().tolist() + score_neg.squeeze(1).detach().cpu().tolist() 80 | all_labels += g_labels_pos.tolist() + g_labels_neg.tolist() 81 | total_loss += loss 82 | 83 | if self.valid_evaluator and self.params.eval_every_iter and self.updates_counter % self.params.eval_every_iter == 0: 84 | tic = time.time() 85 | result = self.valid_evaluator.eval() 86 | logging.info('\nPerformance:' + str(result) + 'in ' + str(time.time() - tic)) 87 | 88 | if result['auc'] >= self.best_metric: 89 | self.save_classifier() 90 | self.best_metric = result['auc'] 91 | self.not_improved_count = 0 92 | 93 | else: 94 | self.not_improved_count += 1 95 | if self.not_improved_count > self.params.early_stop: 96 | logging.info(f"Validation performance didn\'t improve for {self.params.early_stop} epochs. 
Training stops.") 97 | break 98 | self.last_metric = result['auc'] 99 | 100 | auc = metrics.roc_auc_score(all_labels, all_scores) 101 | auc_pr = metrics.average_precision_score(all_labels, all_scores) 102 | 103 | weight_norm = sum(map(lambda x: torch.norm(x), model_params)) 104 | 105 | return total_loss, auc, auc_pr, weight_norm 106 | 107 | def train(self): 108 | self.reset_training_state() 109 | 110 | for epoch in range(1, self.params.num_epochs + 1): 111 | time_start = time.time() 112 | loss, auc, auc_pr, weight_norm = self.train_epoch() 113 | time_elapsed = time.time() - time_start 114 | logging.info(f'Epoch {epoch} with loss: {loss}, training auc: {auc}, training auc_pr: {auc_pr}, best validation AUC: {self.best_metric}, weight_norm: {weight_norm} in {time_elapsed}') 115 | 116 | # if self.valid_evaluator and epoch % self.params.eval_every == 0: 117 | # result = self.valid_evaluator.eval() 118 | # logging.info('\nPerformance:' + str(result)) 119 | 120 | # if result['auc'] >= self.best_metric: 121 | # self.save_classifier() 122 | # self.best_metric = result['auc'] 123 | # self.not_improved_count = 0 124 | 125 | # else: 126 | # self.not_improved_count += 1 127 | # if self.not_improved_count > self.params.early_stop: 128 | # logging.info(f"Validation performance didn\'t improve for {self.params.early_stop} epochs. Training stops.") 129 | # break 130 | # self.last_metric = result['auc'] 131 | 132 | if epoch % self.params.save_every == 0: 133 | torch.save(self.graph_classifier, os.path.join(self.params.exp_dir, 'graph_classifier_chk.pth')) 134 | 135 | def save_classifier(self): 136 | torch.save(self.graph_classifier, os.path.join(self.params.exp_dir, 'best_graph_classifier.pth')) # Does it overwrite or fuck with the existing file? 137 | logging.info('Better models found w.r.t accuracy. 
Saved it!') 138 | -------------------------------------------------------------------------------- /CoMPILE_github/model/dgl/aggregators.py: -------------------------------------------------------------------------------- 1 | import abc 2 | import torch.nn as nn 3 | import torch 4 | import torch.nn.functional as F 5 | 6 | 7 | class Aggregator(nn.Module): 8 | def __init__(self, emb_dim): 9 | super(Aggregator, self).__init__() 10 | 11 | def forward(self, node): 12 | curr_emb = node.mailbox['curr_emb'][:, 0, :] # (B, F) 13 | # print('curr_emb 2 ', curr_emb.shape) 14 | nei_msg = torch.bmm(node.mailbox['alpha'].transpose(1, 2), node.mailbox['msg']).squeeze(1) # (B, F) 15 | # print('nei_msg 2 ', nei_msg.shape) 16 | # nei_msg, _ = torch.max(node.mailbox['msg'], 1) # (B, F) 17 | 18 | new_emb = self.update_embedding(curr_emb, nei_msg) 19 | 20 | return {'h': new_emb} 21 | 22 | @abc.abstractmethod 23 | def update_embedding(self, curr_emb, nei_msg): 24 | raise NotImplementedError 25 | 26 | 27 | class SumAggregator(Aggregator): 28 | def __init__(self, emb_dim): 29 | super(SumAggregator, self).__init__(emb_dim) 30 | 31 | def update_embedding(self, curr_emb, nei_msg): 32 | new_emb = nei_msg + curr_emb 33 | # print(new_emb.shape, 'new embed') 34 | return new_emb 35 | 36 | 37 | class MLPAggregator(Aggregator): 38 | def __init__(self, emb_dim): 39 | super(MLPAggregator, self).__init__(emb_dim) 40 | self.linear = nn.Linear(2 * emb_dim, emb_dim) 41 | 42 | def update_embedding(self, curr_emb, nei_msg): 43 | inp = torch.cat((nei_msg, curr_emb), 1) 44 | new_emb = F.relu(self.linear(inp)) 45 | 46 | return new_emb 47 | 48 | 49 | class GRUAggregator(Aggregator): 50 | def __init__(self, emb_dim): 51 | super(GRUAggregator, self).__init__(emb_dim) 52 | self.gru = nn.GRUCell(emb_dim, emb_dim) 53 | 54 | def update_embedding(self, curr_emb, nei_msg): 55 | new_emb = self.gru(nei_msg, curr_emb) 56 | 57 | return new_emb 58 | -------------------------------------------------------------------------------- /CoMPILE_github/model/dgl/layers.py: -------------------------------------------------------------------------------- 1 | """ 2 | File based off of dgl tutorial on RGCN 3 | Source: https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn 4 | """ 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | 9 | 10 | class Identity(nn.Module): 11 | """A placeholder identity operator that is argument-insensitive.
12 | (Identity has already been supported by PyTorch 1.2, we will directly 13 | import torch.nn.Identity in the future) 14 | """ 15 | 16 | def __init__(self): 17 | super(Identity, self).__init__() 18 | 19 | def forward(self, x): 20 | """Return input""" 21 | return x 22 | 23 | 24 | class RGCNLayer(nn.Module): 25 | def __init__(self, inp_dim, out_dim, aggregator, bias=None, activation=None, dropout=0.0, edge_dropout=0.0, is_input_layer=False): 26 | super(RGCNLayer, self).__init__() 27 | self.bias = bias 28 | self.activation = activation 29 | 30 | if self.bias: 31 | self.bias = nn.Parameter(torch.Tensor(out_dim)) 32 | nn.init.xavier_uniform_(self.bias, 33 | gain=nn.init.calculate_gain('relu')) 34 | 35 | self.aggregator = aggregator 36 | 37 | if dropout: 38 | self.dropout = nn.Dropout(dropout) 39 | else: 40 | self.dropout = None 41 | 42 | if edge_dropout: 43 | self.edge_dropout = nn.Dropout(edge_dropout) 44 | else: 45 | self.edge_dropout = Identity() 46 | 47 | # define how propagation is done in subclass 48 | def propagate(self, g): 49 | raise NotImplementedError 50 | 51 | def forward(self, g, attn_rel_emb=None): 52 | 53 | self.propagate(g, attn_rel_emb) 54 | 55 | # apply bias and activation 56 | node_repr = g.ndata['h'] 57 | if self.bias: 58 | node_repr = node_repr + self.bias 59 | if self.activation: 60 | node_repr = self.activation(node_repr) 61 | if self.dropout: 62 | node_repr = self.dropout(node_repr) 63 | 64 | g.ndata['h'] = node_repr 65 | 66 | if self.is_input_layer: 67 | g.ndata['repr'] = g.ndata['h'].unsqueeze(1) 68 | else: 69 | g.ndata['repr'] = torch.cat([g.ndata['repr'], g.ndata['h'].unsqueeze(1)], dim=1) 70 | 71 | 72 | class RGCNBasisLayer(RGCNLayer): 73 | def __init__(self, inp_dim, out_dim, aggregator, attn_rel_emb_dim, num_rels, num_bases=-1, bias=None, 74 | activation=None, dropout=0.0, edge_dropout=0.0, is_input_layer=False, has_attn=False): 75 | super( 76 | RGCNBasisLayer, 77 | self).__init__( 78 | inp_dim, 79 | out_dim, 80 | aggregator, 81 | bias, 82 | activation, 83 | dropout=dropout, 84 | edge_dropout=edge_dropout, 85 | is_input_layer=is_input_layer) 86 | self.inp_dim = inp_dim 87 | self.out_dim = out_dim 88 | self.attn_rel_emb_dim = attn_rel_emb_dim 89 | self.num_rels = num_rels 90 | self.num_bases = num_bases 91 | self.is_input_layer = is_input_layer 92 | self.has_attn = has_attn 93 | 94 | if self.num_bases <= 0 or self.num_bases > self.num_rels: 95 | self.num_bases = self.num_rels 96 | 97 | # add basis weights 98 | # self.weight = basis_weights 99 | self.weight = nn.Parameter(torch.Tensor(self.num_bases, self.inp_dim, self.out_dim)) 100 | self.w_comp = nn.Parameter(torch.Tensor(self.num_rels, self.num_bases)) 101 | 102 | if self.has_attn: 103 | self.A = nn.Linear(2 * self.inp_dim + 2 * self.attn_rel_emb_dim, inp_dim) 104 | self.B = nn.Linear(inp_dim, 1) 105 | 106 | self.self_loop_weight = nn.Parameter(torch.Tensor(self.inp_dim, self.out_dim)) 107 | 108 | nn.init.xavier_uniform_(self.self_loop_weight, gain=nn.init.calculate_gain('relu')) 109 | nn.init.xavier_uniform_(self.weight, gain=nn.init.calculate_gain('relu')) 110 | nn.init.xavier_uniform_(self.w_comp, gain=nn.init.calculate_gain('relu')) 111 | 112 | def propagate(self, g, attn_rel_emb=None): 113 | # generate all weights from bases 114 | weight = self.weight.view(self.num_bases, 115 | self.inp_dim * self.out_dim) 116 | weight = torch.matmul(self.w_comp, weight).view( 117 | self.num_rels, self.inp_dim, self.out_dim) 118 | 119 | g.edata['w'] = self.edge_dropout(torch.ones(g.number_of_edges(), 1).to(weight.device)) 
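# NOTE: edge dropout here is a per-edge mask -- nn.Dropout applied to a vector of
# ones zeroes out roughly `edge_dropout` of the edges and rescales the survivors
# by 1/(1 - p); msg_func below multiplies each message by its edge's mask entry,
# so dropped edges contribute an all-zero message.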
120 | print('number of edge ', g.number_of_edges()) 121 | input_ = 'feat' if self.is_input_layer else 'h' 122 | 123 | def msg_func(edges): 124 | w = weight.index_select(0, edges.data['type']) 125 | msg = edges.data['w'] * torch.bmm(edges.src[input_].unsqueeze(1), w).squeeze(1) 126 | curr_emb = torch.mm(edges.dst[input_], self.self_loop_weight) # (B, F) 127 | # print('curren embed ', curr_emb.shape, 'msg ',msg.shape) 128 | if self.has_attn: 129 | e = torch.cat([edges.src[input_], edges.dst[input_], attn_rel_emb(edges.data['type']), attn_rel_emb(edges.data['label'])], dim=1) 130 | a = torch.sigmoid(self.B(F.relu(self.A(e)))) 131 | else: 132 | a = torch.ones((len(edges), 1)).to(device=w.device) 133 | 134 | return {'curr_emb': curr_emb, 'msg': msg, 'alpha': a} 135 | 136 | g.update_all(msg_func, self.aggregator, None) 137 | -------------------------------------------------------------------------------- /CoMPILE_github/model/dgl/rgcn_model.py: -------------------------------------------------------------------------------- 1 | """ 2 | File based off of dgl tutorial on RGCN 3 | Source: https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn 4 | """ 5 | 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from .layers import RGCNBasisLayer as RGCNLayer 10 | 11 | from .aggregators import SumAggregator, MLPAggregator, GRUAggregator 12 | 13 | 14 | class RGCN(nn.Module): 15 | def __init__(self, params): 16 | super(RGCN, self).__init__() 17 | 18 | self.max_label_value = params.max_label_value 19 | self.inp_dim = params.inp_dim 20 | self.emb_dim = params.emb_dim 21 | self.attn_rel_emb_dim = params.attn_rel_emb_dim 22 | self.num_rels = params.num_rels 23 | self.aug_num_rels = params.aug_num_rels 24 | self.num_bases = params.num_bases 25 | self.num_hidden_layers = params.num_gcn_layers 26 | self.dropout = params.dropout 27 | self.edge_dropout = params.edge_dropout 28 | # self.aggregator_type = params.gnn_agg_type 29 | self.has_attn = params.has_attn 30 | 31 | self.device = params.device 32 | 33 | if self.has_attn: 34 | self.attn_rel_emb = nn.Embedding(self.num_rels, self.attn_rel_emb_dim, sparse=False) 35 | else: 36 | self.attn_rel_emb = None 37 | 38 | # initialize aggregators for input and hidden layers 39 | if params.gnn_agg_type == "sum": 40 | self.aggregator = SumAggregator(self.emb_dim) 41 | elif params.gnn_agg_type == "mlp": 42 | self.aggregator = MLPAggregator(self.emb_dim) 43 | elif params.gnn_agg_type == "gru": 44 | self.aggregator = GRUAggregator(self.emb_dim) 45 | 46 | # initialize basis weights for input and hidden layers 47 | # self.input_basis_weights = nn.Parameter(torch.Tensor(self.num_bases, self.inp_dim, self.emb_dim)) 48 | # self.basis_weights = nn.Parameter(torch.Tensor(self.num_bases, self.emb_dim, self.emb_dim)) 49 | 50 | # create rgcn layers 51 | self.build_model() 52 | 53 | # create initial features 54 | self.features = self.create_features() 55 | 56 | def create_features(self): 57 | features = torch.arange(self.inp_dim).to(device=self.device) 58 | return features 59 | 60 | def build_model(self): 61 | self.layers = nn.ModuleList() 62 | # i2h 63 | i2h = self.build_input_layer() 64 | if i2h is not None: 65 | self.layers.append(i2h) 66 | # h2h 67 | for idx in range(self.num_hidden_layers - 1): 68 | h2h = self.build_hidden_layer(idx) 69 | self.layers.append(h2h) 70 | 71 | def build_input_layer(self): 72 | return RGCNLayer(self.inp_dim, 73 | self.emb_dim, 74 | # self.input_basis_weights, 75 | self.aggregator, 76 | self.attn_rel_emb_dim, 77 | 
self.aug_num_rels, 78 | self.num_bases, 79 | activation=F.relu, 80 | dropout=self.dropout, 81 | edge_dropout=self.edge_dropout, 82 | is_input_layer=True, 83 | has_attn=self.has_attn) 84 | 85 | def build_hidden_layer(self, idx): 86 | return RGCNLayer(self.emb_dim, 87 | self.emb_dim, 88 | # self.basis_weights, 89 | self.aggregator, 90 | self.attn_rel_emb_dim, 91 | self.aug_num_rels, 92 | self.num_bases, 93 | activation=F.relu, 94 | dropout=self.dropout, 95 | edge_dropout=self.edge_dropout, 96 | has_attn=self.has_attn) 97 | 98 | def forward(self, g): 99 | for layer in self.layers: 100 | layer(g, self.attn_rel_emb) 101 | return g.ndata.pop('h') 102 | -------------------------------------------------------------------------------- /CoMPILE_github/requirements.txt: -------------------------------------------------------------------------------- 1 | The requirements are the same as those in GraIL: 2 | dgl==0.4.2 3 | lmdb==0.98 4 | networkx==2.4 5 | scikit-learn==0.22.1 6 | torch==1.4.0 7 | tqdm==4.43.0 8 | -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/__pycache__/datasets.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/subgraph_extraction/__pycache__/datasets.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/__pycache__/graph_sampler.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/subgraph_extraction/__pycache__/graph_sampler.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/datasets.py: -------------------------------------------------------------------------------- 1 | from torch.utils.data import Dataset 2 | import timeit 3 | import os 4 | import logging 5 | import lmdb 6 | import numpy as np 7 | import json 8 | import pickle 9 | import dgl 10 | from utils.graph_utils import ssp_multigraph_to_dgl, incidence_matrix 11 | from utils.data_utils import process_files, save_to_file, plot_rel_dist 12 | from .graph_sampler import * 13 | import pdb 14 | 15 | 16 | def generate_subgraph_datasets(params, splits=['train', 'valid'], saved_relation2id=None, max_label_value=None): 17 | 18 | testing = 'test' in splits 19 | adj_list, triplets, entity2id, relation2id, id2entity, id2relation = process_files(params.file_paths, saved_relation2id) 20 | 21 | # plot_rel_dist(adj_list, os.path.join(params.main_dir, f'data/{params.dataset}/rel_dist.png')) 22 | 23 | data_path = os.path.join(params.main_dir, f'data/{params.dataset}/relation2id.json') 24 | if not os.path.isfile(data_path) and not testing: 25 | with open(data_path, 'w') as f: 26 | json.dump(relation2id, f) 27 | 28 | graphs = {} 29 | 30 | for split_name in splits: 31 | graphs[split_name] = {'triplets': triplets[split_name], 'max_size': params.max_links} 32 | 33 | # Sample train and valid/test links 34 | for split_name, split in graphs.items(): 35 | logging.info(f"Sampling negative links for {split_name}") 36 | split['pos'], split['neg'] = sample_neg(adj_list, split['triplets'], params.num_neg_samples_per_link, max_size=split['max_size'],
constrained_neg_prob=params.constrained_neg_prob) 37 | 38 | if testing: 39 | directory = os.path.join(params.main_dir, 'data/{}/'.format(params.dataset)) 40 | save_to_file(directory, f'neg_{params.test_file}_{params.constrained_neg_prob}.txt', graphs['test']['neg'], id2entity, id2relation) 41 | 42 | links2subgraphs(adj_list, graphs, params, max_label_value) 43 | 44 | 45 | def get_kge_embeddings(dataset, kge_model): 46 | 47 | path = './experiments/kge_baselines/{}_{}'.format(kge_model, dataset) 48 | node_features = np.load(os.path.join(path, 'entity_embedding.npy')) 49 | with open(os.path.join(path, 'id2entity.json')) as json_file: 50 | kge_id2entity = json.load(json_file) 51 | kge_entity2id = {v: int(k) for k, v in kge_id2entity.items()} 52 | 53 | return node_features, kge_entity2id 54 | 55 | 56 | class SubgraphDataset(Dataset): 57 | """Extracted, labeled, subgraph dataset -- DGL Only""" 58 | 59 | def __init__(self, db_path, db_name_pos, db_name_neg, raw_data_paths, included_relations=None, add_traspose_rels=False, num_neg_samples_per_link=1, use_kge_embeddings=False, dataset='', kge_model='', file_name=''): 60 | 61 | self.main_env = lmdb.open(db_path, readonly=True, max_dbs=3, lock=False) 62 | self.db_pos = self.main_env.open_db(db_name_pos.encode()) 63 | self.db_neg = self.main_env.open_db(db_name_neg.encode()) 64 | self.node_features, self.kge_entity2id = get_kge_embeddings(dataset, kge_model) if use_kge_embeddings else (None, None) 65 | self.num_neg_samples_per_link = num_neg_samples_per_link 66 | self.file_name = file_name 67 | 68 | ssp_graph, __, __, __, id2entity, id2relation = process_files(raw_data_paths, included_relations) 69 | self.num_rels = len(ssp_graph) 70 | 71 | # Add transpose matrices to handle both directions of relations. 72 | if add_traspose_rels: 73 | ssp_graph_t = [adj.T for adj in ssp_graph] 74 | ssp_graph += ssp_graph_t 75 | 76 | # the effective number of relations after adding symmetric adjacency matrices and/or self connections 77 | self.aug_num_rels = len(ssp_graph) 78 | self.graph = ssp_multigraph_to_dgl(ssp_graph) 79 | self.ssp_graph = ssp_graph 80 | self.id2entity = id2entity 81 | self.id2relation = id2relation 82 | 83 | self.max_n_label = np.array([0, 0]) 84 | with self.main_env.begin() as txn: 85 | self.max_n_label[0] = int.from_bytes(txn.get('max_n_label_sub'.encode()), byteorder='little') 86 | self.max_n_label[1] = int.from_bytes(txn.get('max_n_label_obj'.encode()), byteorder='little') 87 | 88 | self.avg_subgraph_size = struct.unpack('f', txn.get('avg_subgraph_size'.encode())) 89 | self.min_subgraph_size = struct.unpack('f', txn.get('min_subgraph_size'.encode())) 90 | self.max_subgraph_size = struct.unpack('f', txn.get('max_subgraph_size'.encode())) 91 | self.std_subgraph_size = struct.unpack('f', txn.get('std_subgraph_size'.encode())) 92 | 93 | self.avg_enc_ratio = struct.unpack('f', txn.get('avg_enc_ratio'.encode())) 94 | self.min_enc_ratio = struct.unpack('f', txn.get('min_enc_ratio'.encode())) 95 | self.max_enc_ratio = struct.unpack('f', txn.get('max_enc_ratio'.encode())) 96 | self.std_enc_ratio = struct.unpack('f', txn.get('std_enc_ratio'.encode())) 97 | 98 | self.avg_num_pruned_nodes = struct.unpack('f', txn.get('avg_num_pruned_nodes'.encode())) 99 | self.min_num_pruned_nodes = struct.unpack('f', txn.get('min_num_pruned_nodes'.encode())) 100 | self.max_num_pruned_nodes = struct.unpack('f', txn.get('max_num_pruned_nodes'.encode())) 101 | self.std_num_pruned_nodes = struct.unpack('f', txn.get('std_num_pruned_nodes'.encode())) 102 | 103 | 
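# NOTE: struct.unpack always returns a tuple, so each statistic read above is a
# 1-tuple; e.g. self.avg_subgraph_size[0] holds the actual float value.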
logging.info(f"Max distance from sub : {self.max_n_label[0]}, Max distance from obj : {self.max_n_label[1]}") 104 | 105 | # logging.info('=====================') 106 | # logging.info(f"Subgraph size stats: \n Avg size {self.avg_subgraph_size}, \n Min size {self.min_subgraph_size}, \n Max size {self.max_subgraph_size}, \n Std {self.std_subgraph_size}") 107 | 108 | # logging.info('=====================') 109 | # logging.info(f"Enclosed nodes ratio stats: \n Avg size {self.avg_enc_ratio}, \n Min size {self.min_enc_ratio}, \n Max size {self.max_enc_ratio}, \n Std {self.std_enc_ratio}") 110 | 111 | # logging.info('=====================') 112 | # logging.info(f"# of pruned nodes stats: \n Avg size {self.avg_num_pruned_nodes}, \n Min size {self.min_num_pruned_nodes}, \n Max size {self.max_num_pruned_nodes}, \n Std {self.std_num_pruned_nodes}") 113 | 114 | with self.main_env.begin(db=self.db_pos) as txn: 115 | self.num_graphs_pos = int.from_bytes(txn.get('num_graphs'.encode()), byteorder='little') 116 | with self.main_env.begin(db=self.db_neg) as txn: 117 | self.num_graphs_neg = int.from_bytes(txn.get('num_graphs'.encode()), byteorder='little') 118 | 119 | self.__getitem__(0) 120 | 121 | def __getitem__(self, index): 122 | with self.main_env.begin(db=self.db_pos) as txn: 123 | # print('index ', index) 124 | str_id = '{:08}'.format(index).encode('ascii') 125 | nodes_pos, r_label_pos, g_label_pos, n_labels_pos = deserialize(txn.get(str_id)).values() 126 | # print('nodes_pos shape ', len(nodes_pos)) 127 | # print('nodes_pos ',nodes_pos) #############nodes 128 | # print('r_label_pos shape ',r_label_pos.shape) #####relation target shape 1 129 | # print('r_label_pos ',r_label_pos) 130 | # print('g_label_pos shape ',g_label_pos.shape) ###########graph label 0 or 1 shape 1 131 | # print('g_label_pos ',g_label_pos) 132 | subgraph_pos = self._prepare_subgraphs(nodes_pos, r_label_pos, n_labels_pos) 133 | # print('subgraph_pos ', len(subgraph_pos)) 134 | subgraphs_neg = [] 135 | r_labels_neg = [] 136 | g_labels_neg = [] 137 | with self.main_env.begin(db=self.db_neg) as txn: 138 | for i in range(self.num_neg_samples_per_link): 139 | str_id = '{:08}'.format(index + i * (self.num_graphs_pos)).encode('ascii') 140 | nodes_neg, r_label_neg, g_label_neg, n_labels_neg = deserialize(txn.get(str_id)).values() 141 | subgraphs_neg.append(self._prepare_subgraphs(nodes_neg, r_label_neg, n_labels_neg)) 142 | r_labels_neg.append(r_label_neg) 143 | g_labels_neg.append(g_label_neg) 144 | 145 | return subgraph_pos, g_label_pos, r_label_pos, subgraphs_neg, g_labels_neg, r_labels_neg 146 | 147 | def __len__(self): 148 | return self.num_graphs_pos 149 | 150 | def _prepare_subgraphs(self, nodes, r_label, n_labels): 151 | subgraph = dgl.DGLGraph(self.graph.subgraph(nodes)) 152 | subgraph.edata['type'] = self.graph.edata['type'][self.graph.subgraph(nodes).parent_eid] 153 | subgraph.edata['label'] = torch.tensor(r_label * np.ones(subgraph.edata['type'].shape), dtype=torch.long) 154 | 155 | edges_btw_roots = subgraph.edge_id(0, 1) 156 | rel_link = np.nonzero(subgraph.edata['type'][edges_btw_roots] == r_label) 157 | if rel_link.squeeze().nelement() == 0: 158 | subgraph.add_edge(0, 1) 159 | subgraph.edata['type'][-1] = torch.tensor(r_label).type(torch.LongTensor) 160 | subgraph.edata['label'][-1] = torch.tensor(r_label).type(torch.LongTensor) 161 | # print('all edges length ', len(subgraph.edges()[0])) #########source: subgraph.edges()[0] target: subgraph.edges()[1] 162 | # map the id read by GraIL to the entity IDs as registered by the 
KGE embeddings 163 | kge_nodes = [self.kge_entity2id[self.id2entity[n]] for n in nodes] if self.kge_entity2id else None 164 | n_feats = self.node_features[kge_nodes] if self.node_features is not None else None 165 | subgraph = self._prepare_features_new(subgraph, n_labels, n_feats) 166 | 167 | return subgraph 168 | 169 | def _prepare_features(self, subgraph, n_labels, n_feats=None): 170 | # One hot encode the node label feature and concat to n_feats 171 | n_nodes = subgraph.number_of_nodes() 172 | label_feats = np.zeros((n_nodes, self.max_n_label[0] + 1)) 173 | label_feats[np.arange(n_nodes), n_labels] = 1 174 | label_feats[np.arange(n_nodes), self.max_n_label[0] + 1 + n_labels[:, 1]] = 1 175 | n_feats = np.concatenate((label_feats, n_feats), axis=1) if n_feats is not None else label_feats 176 | subgraph.ndata['feat'] = torch.FloatTensor(n_feats) 177 | self.n_feat_dim = n_feats.shape[1] # Find cleaner way to do this -- i.e. set the n_feat_dim 178 | return subgraph 179 | 180 | def _prepare_features_new(self, subgraph, n_labels, n_feats=None): 181 | # One hot encode the node label feature and concat to n_feats 182 | n_nodes = subgraph.number_of_nodes() 183 | label_feats = np.zeros((n_nodes, self.max_n_label[0] + 1 + self.max_n_label[1] + 1)) 184 | label_feats[np.arange(n_nodes), n_labels[:, 0]] = 1 185 | label_feats[np.arange(n_nodes), self.max_n_label[0] + 1 + n_labels[:, 1]] = 1 186 | # label_feats = np.zeros((n_nodes, self.max_n_label[0] + 1 + self.max_n_label[1] + 1)) 187 | # label_feats[np.arange(n_nodes), 0] = 1 188 | # label_feats[np.arange(n_nodes), self.max_n_label[0] + 1] = 1 189 | n_feats = np.concatenate((label_feats, n_feats), axis=1) if n_feats is not None else label_feats 190 | subgraph.ndata['feat'] = torch.FloatTensor(n_feats) 191 | 192 | head_id = np.argwhere([label[0] == 0 and label[1] == 1 for label in n_labels]) ################### 193 | tail_id = np.argwhere([label[0] == 1 and label[1] == 0 for label in n_labels]) ############## 194 | n_ids = np.zeros(n_nodes) 195 | n_ids[head_id] = 1 # head 196 | n_ids[tail_id] = 2 # tail 197 | subgraph.ndata['id'] = torch.FloatTensor(n_ids) 198 | # subgraph.ndata['head_id'] = head_id 199 | # subgraph.ndata['tail_id'] = tail_id 200 | self.n_feat_dim = n_feats.shape[1] # Find cleaner way to do this -- i.e.
set the n_feat_dim 201 | return subgraph 202 | -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/graph_sampler.py: -------------------------------------------------------------------------------- 1 | import os 2 | import math 3 | import struct 4 | import logging 5 | import random 6 | import pickle as pkl 7 | import pdb 8 | from tqdm import tqdm 9 | import lmdb 10 | import multiprocessing as mp 11 | import numpy as np 12 | import scipy.io as sio 13 | import scipy.sparse as ssp 14 | import sys 15 | import torch 16 | from scipy.special import softmax 17 | from utils.dgl_utils import _bfs_relational 18 | from utils.graph_utils import incidence_matrix, remove_nodes, ssp_to_torch, serialize, deserialize, get_edge_count, diameter, radius 19 | import networkx as nx 20 | 21 | 22 | def sample_neg(adj_list, edges, num_neg_samples_per_link=1, max_size=1000000, constrained_neg_prob=0): 23 | pos_edges = edges 24 | neg_edges = [] 25 | 26 | # if max_size is set, randomly sample train links 27 | if max_size < len(pos_edges): 28 | perm = np.random.permutation(len(pos_edges))[:max_size] 29 | pos_edges = pos_edges[perm] 30 | 31 | # sample negative links for train/test 32 | n, r = adj_list[0].shape[0], len(adj_list) 33 | 34 | # distribution of edges across relations 35 | theta = 0.001 36 | edge_count = get_edge_count(adj_list) 37 | rel_dist = np.zeros(edge_count.shape) 38 | idx = np.nonzero(edge_count) 39 | rel_dist[idx] = softmax(theta * edge_count[idx]) 40 | 41 | # possible heads and tails for each relation 42 | valid_heads = [adj.tocoo().row.tolist() for adj in adj_list] 43 | valid_tails = [adj.tocoo().col.tolist() for adj in adj_list] 44 | 45 | pbar = tqdm(total=len(pos_edges)) 46 | while len(neg_edges) < num_neg_samples_per_link * len(pos_edges): 47 | neg_head, neg_tail, rel = pos_edges[pbar.n % len(pos_edges)][0], pos_edges[pbar.n % len(pos_edges)][1], pos_edges[pbar.n % len(pos_edges)][2] 48 | if np.random.uniform() < constrained_neg_prob: 49 | if np.random.uniform() < 0.5: 50 | neg_head = np.random.choice(valid_heads[rel]) 51 | else: 52 | neg_tail = np.random.choice(valid_tails[rel]) 53 | else: 54 | if np.random.uniform() < 0.5: 55 | neg_head = np.random.choice(n) 56 | else: 57 | neg_tail = np.random.choice(n) 58 | 59 | if neg_head != neg_tail and adj_list[rel][neg_head, neg_tail] == 0: 60 | neg_edges.append([neg_head, neg_tail, rel]) 61 | pbar.update(1) 62 | 63 | pbar.close() 64 | 65 | neg_edges = np.array(neg_edges) 66 | return pos_edges, neg_edges 67 | 68 | 69 | def links2subgraphs(A, graphs, params, max_label_value=None): 70 | ''' 71 | extract enclosing subgraphs, write map mode + named dbs 72 | ''' 73 | max_n_label = {'value': np.array([0, 0])} 74 | subgraph_sizes = [] 75 | enc_ratios = [] 76 | num_pruned_nodes = [] 77 | 78 | BYTES_PER_DATUM = get_average_subgraph_size(100, list(graphs.values())[0]['pos'], A, params) * 1.5 79 | links_length = 0 80 | for split_name, split in graphs.items(): 81 | links_length += (len(split['pos']) + len(split['neg'])) * 2 82 | map_size = links_length * BYTES_PER_DATUM 83 | 84 | env = lmdb.open(params.db_path, map_size=map_size, max_dbs=6) 85 | 86 | def extraction_helper(A, links, g_labels, split_env): 87 | 88 | with env.begin(write=True, db=split_env) as txn: 89 | txn.put('num_graphs'.encode(), (len(links)).to_bytes(int.bit_length(len(links)), byteorder='little')) 90 | 91 | with mp.Pool(processes=None, initializer=intialize_worker, initargs=(A, params, max_label_value)) as p: 92 | args_ =
zip(range(len(links)), links, g_labels) 93 | for (str_id, datum) in tqdm(p.imap(extract_save_subgraph, args_), total=len(links)): 94 | max_n_label['value'] = np.maximum(np.max(datum['n_labels'], axis=0), max_n_label['value']) 95 | subgraph_sizes.append(datum['subgraph_size']) 96 | enc_ratios.append(datum['enc_ratio']) 97 | num_pruned_nodes.append(datum['num_pruned_nodes']) 98 | 99 | with env.begin(write=True, db=split_env) as txn: 100 | txn.put(str_id, serialize(datum)) 101 | 102 | for split_name, split in graphs.items(): 103 | logging.info(f"Extracting enclosing subgraphs for positive links in {split_name} set") 104 | labels = np.ones(len(split['pos'])) 105 | db_name_pos = split_name + '_pos' 106 | split_env = env.open_db(db_name_pos.encode()) 107 | extraction_helper(A, split['pos'], labels, split_env) 108 | 109 | logging.info(f"Extracting enclosing subgraphs for negative links in {split_name} set") 110 | labels = np.zeros(len(split['neg'])) 111 | db_name_neg = split_name + '_neg' 112 | split_env = env.open_db(db_name_neg.encode()) 113 | extraction_helper(A, split['neg'], labels, split_env) 114 | 115 | max_n_label['value'] = max_label_value if max_label_value is not None else max_n_label['value'] 116 | 117 | with env.begin(write=True) as txn: 118 | bit_len_label_sub = int.bit_length(int(max_n_label['value'][0])) 119 | bit_len_label_obj = int.bit_length(int(max_n_label['value'][1])) 120 | txn.put('max_n_label_sub'.encode(), (int(max_n_label['value'][0])).to_bytes(bit_len_label_sub, byteorder='little')) 121 | txn.put('max_n_label_obj'.encode(), (int(max_n_label['value'][1])).to_bytes(bit_len_label_obj, byteorder='little')) 122 | 123 | txn.put('avg_subgraph_size'.encode(), struct.pack('f', float(np.mean(subgraph_sizes)))) 124 | txn.put('min_subgraph_size'.encode(), struct.pack('f', float(np.min(subgraph_sizes)))) 125 | txn.put('max_subgraph_size'.encode(), struct.pack('f', float(np.max(subgraph_sizes)))) 126 | txn.put('std_subgraph_size'.encode(), struct.pack('f', float(np.std(subgraph_sizes)))) 127 | 128 | txn.put('avg_enc_ratio'.encode(), struct.pack('f', float(np.mean(enc_ratios)))) 129 | txn.put('min_enc_ratio'.encode(), struct.pack('f', float(np.min(enc_ratios)))) 130 | txn.put('max_enc_ratio'.encode(), struct.pack('f', float(np.max(enc_ratios)))) 131 | txn.put('std_enc_ratio'.encode(), struct.pack('f', float(np.std(enc_ratios)))) 132 | 133 | txn.put('avg_num_pruned_nodes'.encode(), struct.pack('f', float(np.mean(num_pruned_nodes)))) 134 | txn.put('min_num_pruned_nodes'.encode(), struct.pack('f', float(np.min(num_pruned_nodes)))) 135 | txn.put('max_num_pruned_nodes'.encode(), struct.pack('f', float(np.max(num_pruned_nodes)))) 136 | txn.put('std_num_pruned_nodes'.encode(), struct.pack('f', float(np.std(num_pruned_nodes)))) 137 | 138 | 139 | def get_average_subgraph_size(sample_size, links, A, params): 140 | total_size = 0 141 | for (n1, n2, r_label) in links[np.random.choice(len(links), sample_size)]: 142 | nodes, n_labels, subgraph_size, enc_ratio, num_pruned_nodes = subgraph_extraction_labeling((n1, n2), r_label, A, params.hop, params.enclosing_sub_graph, params.max_nodes_per_hop) 143 | datum = {'nodes': nodes, 'r_label': r_label, 'g_label': 0, 'n_labels': n_labels, 'subgraph_size': subgraph_size, 'enc_ratio': enc_ratio, 'num_pruned_nodes': num_pruned_nodes} 144 | total_size += len(serialize(datum)) 145 | return total_size / sample_size 146 | 147 | 148 | def intialize_worker(A, params, max_label_value): 149 | global A_, params_, max_label_value_ 150 | A_, params_, max_label_value_ = A, 
params, max_label_value 151 | 152 | 153 | def extract_save_subgraph(args_): 154 | idx, (n1, n2, r_label), g_label = args_ 155 | nodes, n_labels, subgraph_size, enc_ratio, num_pruned_nodes = subgraph_extraction_labeling((n1, n2), r_label, A_, params_.hop, params_.enclosing_sub_graph, params_.max_nodes_per_hop) 156 | 157 | # max_label_value_ is to set the maximum possible value of node label while doing double-radius labelling. 158 | if max_label_value_ is not None: 159 | n_labels = np.array([np.minimum(label, max_label_value_).tolist() for label in n_labels]) 160 | 161 | datum = {'nodes': nodes, 'r_label': r_label, 'g_label': g_label, 'n_labels': n_labels, 'subgraph_size': subgraph_size, 'enc_ratio': enc_ratio, 'num_pruned_nodes': num_pruned_nodes} 162 | str_id = '{:08}'.format(idx).encode('ascii') 163 | 164 | return (str_id, datum) 165 | 166 | 167 | def get_neighbor_nodes(roots, adj, h=1, max_nodes_per_hop=None): 168 | bfs_generator = _bfs_relational(adj, roots, max_nodes_per_hop) 169 | lvls = list() 170 | for _ in range(h): 171 | try: 172 | lvls.append(next(bfs_generator)) 173 | except StopIteration: 174 | pass 175 | return set().union(*lvls) 176 | 177 | 178 | def subgraph_extraction_labeling(ind, rel, A_list, h=1, enclosing_sub_graph=False, max_nodes_per_hop=None, max_node_label_value=None): 179 | # extract the h-hop enclosing subgraphs around link 'ind' 180 | A_incidence = incidence_matrix(A_list) 181 | A_incidence += A_incidence.T 182 | 183 | root1_nei = get_neighbor_nodes(set([ind[0]]), A_incidence, h, max_nodes_per_hop) 184 | root2_nei = get_neighbor_nodes(set([ind[1]]), A_incidence, h, max_nodes_per_hop) 185 | 186 | subgraph_nei_nodes_int = root1_nei.intersection(root2_nei) 187 | subgraph_nei_nodes_un = root1_nei.union(root2_nei) 188 | 189 | # Extract subgraph | Roots being in the front is essential for labelling and the model to work properly. 
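# (node_label() below hard-codes roots = [0, 1], so the target head and tail
#  must occupy the first two positions of subgraph_nodes)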
190 | if enclosing_sub_graph: 191 | subgraph_nodes = list(ind) + list(subgraph_nei_nodes_int) 192 | else: 193 | subgraph_nodes = list(ind) + list(subgraph_nei_nodes_un) 194 | 195 | subgraph = [adj[subgraph_nodes, :][:, subgraph_nodes] for adj in A_list] 196 | 197 | labels, enclosing_subgraph_nodes = node_label(incidence_matrix(subgraph), max_distance=h) 198 | 199 | pruned_subgraph_nodes = np.array(subgraph_nodes)[enclosing_subgraph_nodes].tolist() 200 | pruned_labels = labels[enclosing_subgraph_nodes] 201 | # pruned_subgraph_nodes = subgraph_nodes 202 | # pruned_labels = labels 203 | 204 | if max_node_label_value is not None: 205 | pruned_labels = np.array([np.minimum(label, max_node_label_value).tolist() for label in pruned_labels]) 206 | 207 | subgraph_size = len(pruned_subgraph_nodes) 208 | enc_ratio = len(subgraph_nei_nodes_int) / (len(subgraph_nei_nodes_un) + 1e-3) 209 | num_pruned_nodes = len(subgraph_nodes) - len(pruned_subgraph_nodes) 210 | 211 | return pruned_subgraph_nodes, pruned_labels, subgraph_size, enc_ratio, num_pruned_nodes 212 | 213 | 214 | def node_label(subgraph, max_distance=1): 215 | # implementation of the node labeling scheme described in the paper 216 | roots = [0, 1] 217 | sgs_single_root = [remove_nodes(subgraph, [root]) for root in roots] 218 | dist_to_roots = [np.clip(ssp.csgraph.dijkstra(sg, indices=[0], directed=False, unweighted=True, limit=1e6)[:, 1:], 0, 1e7) for r, sg in enumerate(sgs_single_root)] 219 | dist_to_roots = np.array(list(zip(dist_to_roots[0][0], dist_to_roots[1][0])), dtype=int) 220 | 221 | target_node_labels = np.array([[0, 1], [1, 0]]) 222 | labels = np.concatenate((target_node_labels, dist_to_roots)) if dist_to_roots.size else target_node_labels 223 | 224 | enclosing_subgraph_nodes = np.where(np.max(labels, axis=1) <= max_distance)[0] 225 | return labels, enclosing_subgraph_nodes 226 | -------------------------------------------------------------------------------- /CoMPILE_github/test_auc.py: -------------------------------------------------------------------------------- 1 | # from comet_ml import Experiment 2 | import pdb 3 | import os 4 | import argparse 5 | import logging 6 | import torch 7 | from scipy.sparse import SparseEfficiencyWarning 8 | import numpy as np 9 | 10 | from subgraph_extraction.datasets import SubgraphDataset, generate_subgraph_datasets 11 | from utils.initialization_utils import initialize_experiment, initialize_model 12 | from utils.graph_utils import collate_dgl, move_batch_to_device_dgl, collate_dgl2 13 | from managers.evaluator import Evaluator 14 | 15 | from warnings import simplefilter 16 | 17 | 18 | def main(params): 19 | simplefilter(action='ignore', category=UserWarning) 20 | simplefilter(action='ignore', category=SparseEfficiencyWarning) 21 | 22 | graph_classifier = initialize_model(params, None, load_model=True) 23 | 24 | logging.info(f"Device: {params.device}") 25 | 26 | all_auc = [] 27 | auc_mean = 0 28 | 29 | all_auc_pr = [] 30 | auc_pr_mean = 0 31 | for r in range(1, params.runs + 1): 32 | 33 | params.db_path = os.path.join(params.main_dir, f'data/{params.dataset}/test_subgraphs_{params.experiment_name}_{params.constrained_neg_prob}_en_{params.enclosing_sub_graph}') 34 | 35 | generate_subgraph_datasets(params, splits=['test'], 36 | saved_relation2id=graph_classifier.relation2id, 37 | max_label_value=graph_classifier.max_label_value) 38 | 39 | test = SubgraphDataset(params.db_path, 'test_pos', 'test_neg', params.file_paths, graph_classifier.relation2id, 40 | add_traspose_rels=params.add_traspose_rels, 
41 | num_neg_samples_per_link=params.num_neg_samples_per_link, 42 | use_kge_embeddings=params.use_kge_embeddings, dataset=params.dataset, 43 | kge_model=params.kge_model, file_name=params.test_file) 44 | 45 | test_evaluator = Evaluator(params, graph_classifier, test) 46 | 47 | result = test_evaluator.eval(save=True) 48 | logging.info('\nTest Set Performance:' + str(result)) 49 | all_auc.append(result['auc']) 50 | auc_mean = auc_mean + (result['auc'] - auc_mean) / r 51 | 52 | all_auc_pr.append(result['auc_pr']) 53 | auc_pr_mean = auc_pr_mean + (result['auc_pr'] - auc_pr_mean) / r 54 | 55 | auc_std = np.std(all_auc) 56 | auc_pr_std = np.std(all_auc_pr) 57 | 58 | logging.info('\nAvg test Set Performance -- mean auc :' + str(np.mean(all_auc)) + ' std auc: ' + str(np.std(all_auc))) 59 | logging.info('\nAvg test Set Performance -- mean auc_pr :' + str(np.mean(all_auc_pr)) + ' std auc_pr: ' + str(np.std(all_auc_pr))) 60 | 61 | 62 | if __name__ == '__main__': 63 | 64 | logging.basicConfig(level=logging.INFO) 65 | 66 | parser = argparse.ArgumentParser(description='TransE model') 67 | 68 | # Experiment setup params 69 | parser.add_argument("--experiment_name", "-e", type=str, default="default", 70 | help="A folder with this name would be created to dump saved models and log files") 71 | parser.add_argument("--dataset", "-d", type=str, default="Toy", 72 | help="Dataset string") 73 | parser.add_argument("--train_file", "-tf", type=str, default="train", 74 | help="Name of file containing training triplets") 75 | parser.add_argument("--test_file", "-t", type=str, default="test", 76 | help="Name of file containing test triplets") 77 | parser.add_argument("--runs", type=int, default=1, 78 | help="How many runs to perform for mean and std?") 79 | parser.add_argument("--gpu", type=int, default=0, 80 | help="Which GPU to use?") 81 | parser.add_argument('--disable_cuda', action='store_true', 82 | help='Disable CUDA') 83 | 84 | # Data processing pipeline params 85 | parser.add_argument("--max_links", type=int, default=100000, 86 | help="Set maximum number of links (to fit into memory)") 87 | parser.add_argument("--hop", type=int, default=3, 88 | help="Enclosing subgraph hop number") 89 | parser.add_argument("--max_nodes_per_hop", "-max_h", type=int, default=None, 90 | help="if > 0, upper bound the # nodes per hop by subsampling") 91 | parser.add_argument("--use_kge_embeddings", "-kge", type=bool, default=False, 92 | help='whether to use pretrained KGE embeddings') 93 | parser.add_argument("--kge_model", type=str, default="TransE", 94 | help="Which KGE model to load entity embeddings from") 95 | parser.add_argument('--model_type', '-m', type=str, choices=['dgl'], default='dgl', 96 | help='what format to store subgraphs in for model') 97 | parser.add_argument('--constrained_neg_prob', '-cn', type=float, default=0, 98 | help='with what probability to sample constrained heads/tails while neg sampling') 99 | parser.add_argument("--num_neg_samples_per_link", '-neg', type=int, default=1, 100 | help="Number of negative examples to sample per positive link") 101 | parser.add_argument("--batch_size", type=int, default=16, 102 | help="Batch size") 103 | parser.add_argument("--num_workers", type=int, default=0, 104 | help="Number of dataloading processes") 105 | parser.add_argument('--add_traspose_rels', '-tr', type=bool, default=False, 106 | help='whether to append adj matrix list with symmetric relations') 107 | parser.add_argument('--enclosing_sub_graph', '-en', type=bool, default=True, 108 | help='whether to only 
consider enclosing subgraph') 109 | 110 | params = parser.parse_args() 111 | initialize_experiment(params, __file__) 112 | 113 | params.file_paths = { 114 | 'train': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.train_file)), 115 | 'test': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.test_file)) 116 | } 117 | 118 | if not params.disable_cuda and torch.cuda.is_available(): 119 | params.device = torch.device('cuda:%d' % params.gpu) 120 | else: 121 | params.device = torch.device('cpu') 122 | 123 | params.collate_fn = collate_dgl2 124 | params.move_batch_to_device = move_batch_to_device_dgl 125 | 126 | main(params) 127 | -------------------------------------------------------------------------------- /CoMPILE_github/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import logging 4 | import torch 5 | from scipy.sparse import SparseEfficiencyWarning 6 | 7 | from subgraph_extraction.datasets import SubgraphDataset, generate_subgraph_datasets 8 | from utils.initialization_utils import initialize_experiment, initialize_model 9 | from utils.graph_utils import collate_dgl, move_batch_to_device_dgl, collate_dgl2 10 | 11 | from model.dgl.graph_classifier import GraphClassifier as dgl_model 12 | 13 | from managers.evaluator import Evaluator 14 | from managers.trainer import Trainer 15 | 16 | from warnings import simplefilter 17 | #os.environ["CUDA_VISIBLE_DEVICES"]="1" ##############2 18 | 19 | def main(params): 20 | simplefilter(action='ignore', category=UserWarning) 21 | simplefilter(action='ignore', category=SparseEfficiencyWarning) 22 | 23 | params.db_path = os.path.join(params.main_dir, f'data/{params.dataset}/subgraphs_en_{params.enclosing_sub_graph}_neg_{params.num_neg_samples_per_link}_hop_{params.hop}') 24 | 25 | if not os.path.isdir(params.db_path): 26 | generate_subgraph_datasets(params) 27 | 28 | train = SubgraphDataset(params.db_path, 'train_pos', 'train_neg', params.file_paths, 29 | add_traspose_rels=params.add_traspose_rels, 30 | num_neg_samples_per_link=params.num_neg_samples_per_link, 31 | use_kge_embeddings=params.use_kge_embeddings, dataset=params.dataset, 32 | kge_model=params.kge_model, file_name=params.train_file) 33 | valid = SubgraphDataset(params.db_path, 'valid_pos', 'valid_neg', params.file_paths, 34 | add_traspose_rels=params.add_traspose_rels, 35 | num_neg_samples_per_link=params.num_neg_samples_per_link, 36 | use_kge_embeddings=params.use_kge_embeddings, dataset=params.dataset, 37 | kge_model=params.kge_model, file_name=params.valid_file) 38 | 39 | params.num_rels = train.num_rels 40 | params.aug_num_rels = train.aug_num_rels 41 | params.inp_dim = train.n_feat_dim 42 | 43 | # Log the max label value to save it in the model. This will be used to cap the labels generated on test set. 
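 | # A quick illustration of that cap (hypothetical numbers, not taken from the repo):
 | # the subgraph extraction code applies np.minimum(label, max_label_value) to each
 | # node's [dist_to_head, dist_to_tail] label, so with max_n_label = [2, 2] a
 | # test-time label of [4, 1] would be clipped to [2, 1], keeping test features
 | # inside the label range the classifier saw during training.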
44 | params.max_label_value = train.max_n_label
45 | 
46 | graph_classifier = initialize_model(params, dgl_model, params.load_model)
47 | 
48 | logging.info(f"Device: {params.device}")
49 | logging.info(f"Input dim : {params.inp_dim}, # Relations : {params.num_rels}, # Augmented relations : {params.aug_num_rels}")
50 | 
51 | valid_evaluator = Evaluator(params, graph_classifier, valid)
52 | 
53 | trainer = Trainer(params, graph_classifier, train, valid_evaluator)
54 | 
55 | logging.info('Starting training with full batch...')
56 | 
57 | trainer.train()
58 | 
59 | 
60 | if __name__ == '__main__':
61 | 
62 | logging.basicConfig(level=logging.INFO)
63 | 
64 | parser = argparse.ArgumentParser(description='Train a graph classifier for inductive link prediction')
65 | 
66 | # Experiment setup params
67 | parser.add_argument("--experiment_name", "-e", type=str, default="default",
68 | help="A folder with this name will be created to dump saved models and log files")
69 | parser.add_argument("--dataset", "-d", type=str,
70 | help="Dataset string")
71 | parser.add_argument("--gpu", type=int, default=0,
72 | help="Which GPU to use?")
73 | parser.add_argument('--disable_cuda', action='store_true',
74 | help='Disable CUDA')
75 | parser.add_argument('--load_model', action='store_true',
76 | help='Load existing model?')
77 | parser.add_argument("--train_file", "-tf", type=str, default="train",
78 | help="Name of file containing training triplets")
79 | parser.add_argument("--valid_file", "-vf", type=str, default="valid",
80 | help="Name of file containing validation triplets")
81 | 
82 | # Training regime params
83 | parser.add_argument("--num_epochs", "-ne", type=int, default=30, #########30
84 | help="Number of training epochs")
85 | parser.add_argument("--eval_every", type=int, default=1,
86 | help="Interval of epochs at which to evaluate the model")
87 | parser.add_argument("--eval_every_iter", type=int, default=455,
88 | help="Interval of iterations at which to evaluate the model")
89 | parser.add_argument("--save_every", type=int, default=10,
90 | help="Interval of epochs at which to save a model checkpoint")
91 | parser.add_argument("--early_stop", type=int, default=100,
92 | help="Early stopping patience")
93 | parser.add_argument("--optimizer", type=str, default="Adam",
94 | help="Which optimizer to use?")
95 | parser.add_argument("--lr", type=float, default=0.001,
96 | help="Learning rate of the optimizer")
97 | parser.add_argument("--clip", type=int, default=1000,
98 | help="Maximum gradient norm allowed")
99 | parser.add_argument("--l2", type=float, default=5e-4,
100 | help="Regularization constant for GNN weights")
101 | parser.add_argument("--margin", type=float, default=10,
102 | help="The margin between positive and negative samples in the max-margin loss")
103 | 
104 | # Data processing pipeline params
105 | parser.add_argument("--max_links", type=int, default=10000000, #10000000
106 | help="Set maximum number of train links (to fit into memory)")
107 | parser.add_argument("--hop", type=int, default=3,
108 | help="Enclosing subgraph hop number")
109 | parser.add_argument("--max_nodes_per_hop", "-max_h", type=int, default=None,
110 | help="if > 0, upper bound the # nodes per hop by subsampling")
111 | parser.add_argument("--use_kge_embeddings", "-kge", type=bool, default=False,
112 | help='whether to use pretrained KGE embeddings')
113 | parser.add_argument("--kge_model", type=str, default="TransE",
114 | help="Which KGE model to load entity embeddings from")
115 | parser.add_argument('--model_type', '-m', type=str, choices=['ssp', 'dgl'],
default='dgl', 116 | help='what format to store subgraphs in for model') 117 | parser.add_argument('--constrained_neg_prob', '-cn', type=float, default=0.0, 118 | help='with what probability to sample constrained heads/tails while neg sampling') 119 | parser.add_argument("--batch_size", type=int, default=16, 120 | help="Batch size") 121 | parser.add_argument("--num_neg_samples_per_link", '-neg', type=int, default=1, 122 | help="Number of negative examples to sample per positive link") 123 | parser.add_argument("--num_workers", type=int, default=0, 124 | help="Number of dataloading processes") 125 | parser.add_argument('--add_traspose_rels', '-tr', type=bool, default=False, 126 | help='whether to append adj matrix list with symmetric relations') 127 | parser.add_argument('--enclosing_sub_graph', '-en', type=bool, default=True, 128 | help='whether to only consider enclosing subgraph') 129 | 130 | # Model params 131 | parser.add_argument("--rel_emb_dim", "-r_dim", type=int, default=32, 132 | help="Relation embedding size") 133 | parser.add_argument("--attn_rel_emb_dim", "-ar_dim", type=int, default=32, 134 | help="Relation embedding size for attention") 135 | parser.add_argument("--emb_dim", "-dim", type=int, default=32, 136 | help="Entity embedding size") 137 | parser.add_argument("--num_gcn_layers", "-l", type=int, default=3, 138 | help="Number of GCN layers") 139 | parser.add_argument("--num_bases", "-b", type=int, default=4, 140 | help="Number of basis functions to use for GCN weights") 141 | parser.add_argument("--dropout", type=float, default=0, 142 | help="Dropout rate in GNN layers") 143 | parser.add_argument("--edge_dropout", type=float, default=0.5, 144 | help="Dropout rate in edges of the subgraphs") 145 | parser.add_argument('--gnn_agg_type', '-a', type=str, choices=['sum', 'mlp', 'gru'], default='sum', 146 | help='what type of aggregation to do in gnn msg passing') 147 | parser.add_argument('--add_ht_emb', '-ht', type=bool, default=True, 148 | help='whether to concatenate head/tail embedding with pooled graph representation') 149 | parser.add_argument('--has_attn', '-attn', type=bool, default=True, 150 | help='whether to have attn in model or not') 151 | 152 | params = parser.parse_args() 153 | initialize_experiment(params, __file__) 154 | 155 | params.file_paths = { 156 | 'train': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.train_file)), 157 | 'valid': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.valid_file)) 158 | } 159 | 160 | if not params.disable_cuda and torch.cuda.is_available(): 161 | params.device = torch.device('cuda:%d' % params.gpu) 162 | else: 163 | params.device = torch.device('cpu') 164 | 165 | params.collate_fn = collate_dgl2 166 | params.move_batch_to_device = move_batch_to_device_dgl 167 | 168 | main(params) 169 | -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/data_utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/data_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/dgl_utils.cpython-36.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/dgl_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/graph_utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/graph_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/initialization_utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/initialization_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/clean_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import numpy as np 4 | 5 | 6 | def write_to_file(file_name, data): 7 | with open(file_name, "w") as f: 8 | for s, r, o in data: 9 | f.write('\t'.join([s, r, o]) + '\n') 10 | 11 | 12 | def main(params): 13 | with open(os.path.join(params.main_dir, 'data', params.dataset, 'train.txt')) as f: 14 | train_data = [line.split() for line in f.read().split('\n')[:-1]] 15 | with open(os.path.join(params.main_dir, 'data', params.dataset, 'valid.txt')) as f: 16 | valid_data = [line.split() for line in f.read().split('\n')[:-1]] 17 | with open(os.path.join(params.main_dir, 'data', params.dataset, 'test.txt')) as f: 18 | test_data = [line.split() for line in f.read().split('\n')[:-1]] 19 | 20 | train_tails = set([d[2] for d in train_data]) 21 | train_heads = set([d[0] for d in train_data]) 22 | train_ent = train_tails.union(train_heads) 23 | train_rels = set([d[1] for d in train_data]) 24 | 25 | filtered_valid_data = [] 26 | for d in valid_data: 27 | if d[0] in train_ent and d[1] in train_rels and d[2] in train_ent: 28 | filtered_valid_data.append(d) 29 | else: 30 | train_data.append(d) 31 | train_ent = train_ent.union(set([d[0], d[2]])) 32 | train_rels = train_rels.union(set([d[1]])) 33 | 34 | filtered_test_data = [] 35 | for d in test_data: 36 | if d[0] in train_ent and d[1] in train_rels and d[2] in train_ent: 37 | filtered_test_data.append(d) 38 | else: 39 | train_data.append(d) 40 | train_ent = train_ent.union(set([d[0], d[2]])) 41 | train_rels = train_rels.union(set([d[1]])) 42 | 43 | data_dir = os.path.join(params.main_dir, 'data/{}'.format(params.dataset)) 44 | write_to_file(os.path.join(data_dir, 'train.txt'), train_data) 45 | write_to_file(os.path.join(data_dir, 'valid.txt'), filtered_valid_data) 46 | write_to_file(os.path.join(data_dir, 'test.txt'), filtered_test_data) 47 | 48 | with open(os.path.join(params.main_dir, 'data', params.dataset + '_meta', 'train.txt')) as f: 49 | meta_train_data = [line.split() for line in f.read().split('\n')[:-1]] 50 | with open(os.path.join(params.main_dir, 'data', params.dataset + '_meta', 'valid.txt')) as f: 51 | meta_valid_data = [line.split() for line in f.read().split('\n')[:-1]] 52 | with open(os.path.join(params.main_dir, 'data', params.dataset + '_meta', 'test.txt')) as f: 53 | meta_test_data = [line.split() for line in 
f.read().split('\n')[:-1]] 54 | 55 | meta_train_tails = set([d[2] for d in meta_train_data]) 56 | meta_train_heads = set([d[0] for d in meta_train_data]) 57 | meta_train_ent = meta_train_tails.union(meta_train_heads) 58 | meta_train_rels = set([d[1] for d in meta_train_data]) 59 | 60 | filtered_meta_valid_data = [] 61 | for d in meta_valid_data: 62 | if d[0] in meta_train_ent and d[1] in meta_train_rels and d[2] in meta_train_ent: 63 | filtered_meta_valid_data.append(d) 64 | else: 65 | meta_train_data.append(d) 66 | meta_train_ent = meta_train_ent.union(set([d[0], d[2]])) 67 | meta_train_rels = meta_train_rels.union(set([d[1]])) 68 | 69 | filtered_meta_test_data = [] 70 | for d in meta_test_data: 71 | if d[0] in meta_train_ent and d[1] in meta_train_rels and d[2] in meta_train_ent: 72 | filtered_meta_test_data.append(d) 73 | else: 74 | meta_train_data.append(d) 75 | meta_train_ent = meta_train_ent.union(set([d[0], d[2]])) 76 | meta_train_rels = meta_train_rels.union(set([d[1]])) 77 | 78 | meta_data_dir = os.path.join(params.main_dir, 'data/{}_meta'.format(params.dataset)) 79 | write_to_file(os.path.join(meta_data_dir, 'train.txt'), meta_train_data) 80 | write_to_file(os.path.join(meta_data_dir, 'valid.txt'), filtered_meta_valid_data) 81 | write_to_file(os.path.join(meta_data_dir, 'test.txt'), filtered_meta_test_data) 82 | 83 | 84 | if __name__ == '__main__': 85 | parser = argparse.ArgumentParser(description='Move new entities from test/valid to train') 86 | 87 | parser.add_argument("--dataset", "-d", type=str, default="fb237_v1_copy", 88 | help="Dataset string") 89 | params = parser.parse_args() 90 | 91 | params.main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.abspath(__file__))), '..') 92 | 93 | main(params) 94 | -------------------------------------------------------------------------------- /CoMPILE_github/utils/data_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pdb 3 | import numpy as np 4 | from scipy.sparse import csc_matrix 5 | import matplotlib.pyplot as plt 6 | 7 | 8 | def plot_rel_dist(adj_list, filename): 9 | rel_count = [] 10 | for adj in adj_list: 11 | rel_count.append(adj.count_nonzero()) 12 | 13 | fig = plt.figure(figsize=(12, 8)) 14 | plt.plot(rel_count) 15 | fig.savefig(filename, dpi=fig.dpi) 16 | 17 | 18 | def process_files(files, saved_relation2id=None): 19 | ''' 20 | files: Dictionary map of file paths to read the triplets from. 21 | saved_relation2id: Saved relation2id (mostly passed from a trained model) which can be used to map relations to pre-defined indices and filter out the unknown ones. 
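 |     A minimal usage sketch (the 'Toy' dataset and paths below are hypothetical):
 |         files = {'train': 'data/Toy/train.txt', 'valid': 'data/Toy/valid.txt'}
 |         adj_list, triplets, entity2id, relation2id, id2entity, id2relation = process_files(files)
 |         # triplets['train'] is then an (N, 3) int array of [head_id, tail_id, rel_id] rows,
 |         # and adj_list[r] is a csc_matrix holding every train edge of relation r.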
22 | '''
23 | entity2id = {}
24 | relation2id = {} if saved_relation2id is None else saved_relation2id
25 | 
26 | triplets = {}
27 | 
28 | ent = 0
29 | rel = 0
30 | 
31 | for file_type, file_path in files.items():
32 | 
33 | data = []
34 | with open(file_path) as f:
35 | file_data = [line.split() for line in f.read().split('\n')[:-1]]
36 | 
37 | for triplet in file_data:
38 | if triplet[0] not in entity2id:
39 | entity2id[triplet[0]] = ent
40 | ent += 1
41 | if triplet[2] not in entity2id:
42 | entity2id[triplet[2]] = ent
43 | ent += 1
44 | if not saved_relation2id and triplet[1] not in relation2id:
45 | relation2id[triplet[1]] = rel
46 | rel += 1
47 | 
48 | # Save the triplets corresponding to only the known relations
49 | if triplet[1] in relation2id:
50 | data.append([entity2id[triplet[0]], entity2id[triplet[2]], relation2id[triplet[1]]])
51 | 
52 | triplets[file_type] = np.array(data)
53 | 
54 | id2entity = {v: k for k, v in entity2id.items()}
55 | id2relation = {v: k for k, v in relation2id.items()}
56 | 
57 | # Construct the list of adjacency matrices, one per relation. Note that these are constructed only from the train data.
58 | adj_list = []
59 | for i in range(len(relation2id)):
60 | idx = np.argwhere(triplets['train'][:, 2] == i)
61 | adj_list.append(csc_matrix((np.ones(len(idx), dtype=np.uint8), (triplets['train'][:, 0][idx].squeeze(1), triplets['train'][:, 1][idx].squeeze(1))), shape=(len(entity2id), len(entity2id))))
62 | 
63 | return adj_list, triplets, entity2id, relation2id, id2entity, id2relation
64 | 
65 | 
66 | def save_to_file(directory, file_name, triplets, id2entity, id2relation):
67 | file_path = os.path.join(directory, file_name)
68 | with open(file_path, "w") as f:
69 | for s, o, r in triplets:
70 | f.write('\t'.join([id2entity[s], id2relation[r], id2entity[o]]) + '\n')
71 | 
-------------------------------------------------------------------------------- /CoMPILE_github/utils/dgl_utils.py: --------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.sparse as ssp
3 | import random
4 | 
5 | """All functions in this file are from dgl.contrib.data.knowledge_graph"""
6 | 
7 | 
8 | def _bfs_relational(adj, roots, max_nodes_per_hop=None):
9 | """
10 | BFS for graphs.
11 | Modified from dgl.contrib.data.knowledge_graph to accommodate node sampling
12 | """
13 | visited = set()
14 | current_lvl = set(roots)
15 | 
16 | next_lvl = set()
17 | 
18 | while current_lvl:
19 | 
20 | for v in current_lvl:
21 | visited.add(v)
22 | 
23 | next_lvl = _get_neighbors(adj, current_lvl)
24 | next_lvl -= visited # set difference
25 | 
26 | if max_nodes_per_hop and max_nodes_per_hop < len(next_lvl):
27 | next_lvl = set(random.sample(next_lvl, max_nodes_per_hop))
28 | 
29 | yield next_lvl
30 | 
31 | current_lvl = set.union(next_lvl)
32 | 
33 | 
34 | def _get_neighbors(adj, nodes):
35 | """Takes a set of nodes and a graph adjacency matrix and returns a set of neighbors.
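 |     Sketch of the mechanism (toy numbers): the node set becomes a 1 x N sparse
 |     indicator row vector, and sp_nodes.dot(adj) marks every column reachable by
 |     one directed edge, so nodes={0} with a single edge 0->3 would return {3}.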
36 | Directly copied from dgl.contrib.data.knowledge_graph""" 37 | sp_nodes = _sp_row_vec_from_idx_list(list(nodes), adj.shape[1]) 38 | sp_neighbors = sp_nodes.dot(adj) 39 | neighbors = set(ssp.find(sp_neighbors)[1]) # convert to set of indices 40 | return neighbors 41 | 42 | 43 | def _sp_row_vec_from_idx_list(idx_list, dim): 44 | """Create sparse vector of dimensionality dim from a list of indices.""" 45 | shape = (1, dim) 46 | data = np.ones(len(idx_list)) 47 | row_ind = np.zeros(len(idx_list)) 48 | col_ind = list(idx_list) 49 | return ssp.csr_matrix((data, (row_ind, col_ind)), shape=shape) 50 | -------------------------------------------------------------------------------- /CoMPILE_github/utils/graph_utils.py: -------------------------------------------------------------------------------- 1 | import statistics 2 | import numpy as np 3 | import scipy.sparse as ssp 4 | import torch 5 | import networkx as nx 6 | import dgl 7 | import pickle 8 | 9 | 10 | def serialize(data): 11 | data_tuple = tuple(data.values()) 12 | return pickle.dumps(data_tuple) 13 | 14 | 15 | def deserialize(data): 16 | data_tuple = pickle.loads(data) 17 | keys = ('nodes', 'r_label', 'g_label', 'n_label') 18 | return dict(zip(keys, data_tuple)) 19 | 20 | 21 | def get_edge_count(adj_list): 22 | count = [] 23 | for adj in adj_list: 24 | count.append(len(adj.tocoo().row.tolist())) 25 | return np.array(count) 26 | 27 | 28 | def incidence_matrix(adj_list): 29 | ''' 30 | adj_list: List of sparse adjacency matrices 31 | ''' 32 | 33 | rows, cols, dats = [], [], [] 34 | dim = adj_list[0].shape 35 | for adj in adj_list: 36 | adjcoo = adj.tocoo() 37 | rows += adjcoo.row.tolist() 38 | cols += adjcoo.col.tolist() 39 | dats += adjcoo.data.tolist() 40 | row = np.array(rows) 41 | col = np.array(cols) 42 | data = np.array(dats) 43 | return ssp.csc_matrix((data, (row, col)), shape=dim) 44 | 45 | 46 | def remove_nodes(A_incidence, nodes): 47 | idxs_wo_nodes = list(set(range(A_incidence.shape[1])) - set(nodes)) 48 | return A_incidence[idxs_wo_nodes, :][:, idxs_wo_nodes] 49 | 50 | 51 | def ssp_to_torch(A, device, dense=False): 52 | ''' 53 | A : Sparse adjacency matrix 54 | ''' 55 | idx = torch.LongTensor([A.tocoo().row, A.tocoo().col]) 56 | dat = torch.FloatTensor(A.tocoo().data) 57 | A = torch.sparse.FloatTensor(idx, dat, torch.Size([A.shape[0], A.shape[1]])).to(device=device) 58 | return A 59 | 60 | 61 | def ssp_multigraph_to_dgl(graph, n_feats=None): 62 | """ 63 | Converting ssp multigraph (i.e. list of adjs) to dgl multigraph. 
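 |     Each relation r contributes the nonzeros of graph[r] as directed edges tagged
 |     with an integer edge attribute 'type' = r (built as nx_triplets below); e.g.
 |     two adjacency matrices over the same node set become one multigraph whose
 |     edges carry types 0 and 1 (a toy illustration).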
64 | """ 65 | 66 | g_nx = nx.MultiDiGraph() 67 | g_nx.add_nodes_from(list(range(graph[0].shape[0]))) 68 | # Add edges 69 | for rel, adj in enumerate(graph): 70 | # Convert adjacency matrix to tuples for nx0 71 | nx_triplets = [] 72 | for src, dst in list(zip(adj.tocoo().row, adj.tocoo().col)): 73 | nx_triplets.append((src, dst, {'type': rel})) 74 | g_nx.add_edges_from(nx_triplets) 75 | 76 | # make dgl graph 77 | g_dgl = dgl.DGLGraph(multigraph=True) 78 | g_dgl.from_networkx(g_nx, edge_attrs=['type']) 79 | # add node features 80 | if n_feats is not None: 81 | g_dgl.ndata['feat'] = torch.tensor(n_feats) 82 | 83 | return g_dgl 84 | 85 | 86 | def collate_dgl(samples): 87 | # The input `samples` is a list of pairs 88 | graphs_pos, g_labels_pos, r_labels_pos, graphs_negs, g_labels_negs, r_labels_negs = map(list, zip(*samples)) 89 | batched_graph_pos = dgl.batch(graphs_pos) 90 | print('batched_graph_pos ', len(batched_graph_pos)) 91 | 92 | graphs_neg = [item for sublist in graphs_negs for item in sublist] 93 | g_labels_neg = [item for sublist in g_labels_negs for item in sublist] 94 | r_labels_neg = [item for sublist in r_labels_negs for item in sublist] 95 | 96 | batched_graph_neg = dgl.batch(graphs_neg) 97 | return (batched_graph_pos, r_labels_pos), g_labels_pos, (batched_graph_neg, r_labels_neg), g_labels_neg 98 | 99 | def collate_dgl2(samples): 100 | # The input `samples` is a list of pairs 101 | graphs_pos, g_labels_pos, r_labels_pos, graphs_negs, g_labels_negs, r_labels_negs = map(list, zip(*samples)) 102 | 103 | # graphs_pos = [item for sublist in graphs_pos for item in sublist] 104 | # g_labels_pos = [item for sublist in g_labels_pos for item in sublist] 105 | # r_labels_pos = [item for sublist in r_labels_pos for item in sublist] 106 | 107 | # batched_graph_pos = dgl.batch(graphs_pos) 108 | 109 | graphs_neg = [item for sublist in graphs_negs for item in sublist] 110 | g_labels_neg = [item for sublist in g_labels_negs for item in sublist] 111 | r_labels_neg = [item for sublist in r_labels_negs for item in sublist] 112 | #print('neg for each pos ', len(graphs_neg)) 113 | # batched_graph_neg = dgl.batch(graphs_neg) 114 | return (graphs_pos, r_labels_pos), g_labels_pos, (graphs_neg, r_labels_neg), g_labels_neg 115 | 116 | 117 | def move_batch_to_device_dgl(batch, device): 118 | ((g_dgl_pos, r_labels_pos), targets_pos, (g_dgl_neg, r_labels_neg), targets_neg) = batch 119 | 120 | targets_pos = torch.LongTensor(targets_pos).to(device=device) 121 | r_labels_pos = torch.LongTensor(r_labels_pos).to(device=device) 122 | # print() 123 | 124 | targets_neg = torch.LongTensor(targets_neg).to(device=device) 125 | r_labels_neg = torch.LongTensor(r_labels_neg).to(device=device) 126 | 127 | g_dgl_pos = send_graph_to_device(g_dgl_pos, device) 128 | g_dgl_neg = send_graph_to_device(g_dgl_neg, device) 129 | 130 | return ((g_dgl_pos, r_labels_pos), targets_pos, (g_dgl_neg, r_labels_neg), targets_neg) 131 | 132 | 133 | def send_graph_to_device(g, device): 134 | # nodes 135 | labels = g.node_attr_schemes() 136 | # print('node labels shape ', labels[1].shape) 137 | i = 0 138 | for l in labels.keys(): 139 | g.ndata[l] = g.ndata.pop(l).to(device) 140 | # if i == 0: 141 | # print('node l shape ', g.ndata[l].shape) 142 | # i += 1 143 | #if l == 144 | # print('node l ', l) 145 | 146 | # edges 147 | labels = g.edge_attr_schemes() 148 | # print('edge labels shape ', labels[1].shape) 149 | for l in labels.keys(): 150 | g.edata[l] = g.edata.pop(l).to(device) 151 | # print('edge l ', l) 152 | return g 153 | 154 | # The 
following three functions are modified from networkx source code to
155 | # accommodate diameter and radius for directed graphs
156 | 
157 | 
158 | def eccentricity(G):
159 | e = {}
160 | for n in G.nbunch_iter():
161 | length = nx.single_source_shortest_path_length(G, n)
162 | e[n] = max(length.values())
163 | return e
164 | 
165 | 
166 | def radius(G):
167 | e = eccentricity(G)
168 | e = np.where(np.array(list(e.values())) > 0, list(e.values()), np.inf)
169 | return min(e)
170 | 
171 | 
172 | def diameter(G):
173 | e = eccentricity(G)
174 | return max(e.values())
175 | 
-------------------------------------------------------------------------------- /CoMPILE_github/utils/initialization_utils.py: --------------------------------------------------------------------------------
1 | import os
2 | import logging
3 | import json
4 | import torch
5 | 
6 | 
7 | def initialize_experiment(params, file_name):
8 | '''
9 | Makes the experiment directory, sets standard paths and initializes the logger
10 | '''
11 | params.main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.abspath(__file__))), '..')
12 | exps_dir = os.path.join(params.main_dir, 'experiments')
13 | if not os.path.exists(exps_dir):
14 | os.makedirs(exps_dir)
15 | 
16 | params.exp_dir = os.path.join(exps_dir, params.experiment_name)
17 | 
18 | if not os.path.exists(params.exp_dir):
19 | os.makedirs(params.exp_dir)
20 | 
21 | if file_name == 'test_auc.py':
22 | params.test_exp_dir = os.path.join(params.exp_dir, f"test_{params.dataset}_{params.constrained_neg_prob}")
23 | if not os.path.exists(params.test_exp_dir):
24 | os.makedirs(params.test_exp_dir)
25 | file_handler = logging.FileHandler(os.path.join(params.test_exp_dir, f"log_test.txt"))
26 | else:
27 | file_handler = logging.FileHandler(os.path.join(params.exp_dir, "log_train.txt"))
28 | logger = logging.getLogger()
29 | logger.addHandler(file_handler)
30 | 
31 | logger.info('============ Initialized logger ============')
32 | logger.info('\n'.join('%s: %s' % (k, str(v)) for k, v
33 | in sorted(dict(vars(params)).items())))
34 | logger.info('============================================')
35 | 
36 | with open(os.path.join(params.exp_dir, "params.json"), 'w') as fout:
37 | json.dump(vars(params), fout)
38 | 
39 | 
40 | def initialize_model(params, model, load_model=False):
41 | '''
42 | relation2id: the relation-to-id mapping; this is stored in the model and used when testing
43 | model: the type of model to initialize/load
44 | load_model: flag which decides whether to initialize a new model or load a saved one
45 | '''
46 | 
47 | if load_model and os.path.exists(os.path.join(params.exp_dir, 'best_graph_classifier.pth')):
48 | logging.info('Loading existing model from %s' % os.path.join(params.exp_dir, 'best_graph_classifier.pth'))
49 | graph_classifier = torch.load(os.path.join(params.exp_dir, 'best_graph_classifier.pth')).to(device=params.device)
50 | else:
51 | relation2id_path = os.path.join(params.main_dir, f'data/{params.dataset}/relation2id.json')
52 | with open(relation2id_path) as f:
53 | relation2id = json.load(f)
54 | 
55 | logging.info('No existing model found.
Initializing new model..')
56 | graph_classifier = model(params, relation2id).to(device=params.device)
57 | 
58 | return graph_classifier
59 | 
-------------------------------------------------------------------------------- /CoMPILE_github/utils/prepare_meta_data.py: --------------------------------------------------------------------------------
1 | import pdb
2 | import os
3 | import math
4 | import random
5 | import argparse
6 | import numpy as np
7 | 
8 | from graph_utils import incidence_matrix, get_edge_count
9 | from dgl_utils import _bfs_relational
10 | from data_utils import process_files, save_to_file
11 | 
12 | 
13 | def get_active_relations(adj_list):
14 | act_rels = []
15 | for r, adj in enumerate(adj_list):
16 | if len(adj.tocoo().row.tolist()) > 0:
17 | act_rels.append(r)
18 | return act_rels
19 | 
20 | 
21 | def get_avg_degree(adj_list):
22 | adj_mat = incidence_matrix(adj_list)
23 | degree = []
24 | for node in range(adj_list[0].shape[0]):
25 | degree.append(np.sum(adj_mat[node, :]))
26 | return np.mean(degree)
27 | 
28 | 
29 | def get_splits(adj_list, nodes, valid_rels=None, valid_ratio=0.1, test_ratio=0.1):
30 | '''
31 | Get train/valid/test splits of the sub-graph defined by the given set of nodes. The relations in this subgraph are limited to the given valid_rels.
32 | '''
33 | 
34 | # Extract the subgraph
35 | subgraph = [adj[nodes, :][:, nodes] for adj in adj_list]
36 | 
37 | # Get the relations that are allowed to be sampled
38 | active_rels = get_active_relations(subgraph)
39 | common_rels = list(set(active_rels).intersection(set(valid_rels)))
40 | 
41 | print('Average degree : ', get_avg_degree(subgraph))
42 | print('Nodes: ', len(nodes))
43 | print('Links: ', np.sum(get_edge_count(subgraph)))
44 | print('Active relations: ', len(common_rels))
45 | 
46 | # get all the triplets satisfying the given constraints
47 | all_triplets = []
48 | for r in common_rels:
49 | # print(r, len(subgraph[r].tocoo().row))
50 | for (i, j) in zip(subgraph[r].tocoo().row, subgraph[r].tocoo().col):
51 | all_triplets.append([nodes[i], nodes[j], r])
52 | all_triplets = np.array(all_triplets)
53 | 
54 | # delete the triplets which correspond to self connections
55 | ind = np.argwhere(all_triplets[:, 0] == all_triplets[:, 1])
56 | all_triplets = np.delete(all_triplets, ind, axis=0)
57 | print('Links after deleting self connections : %d' % len(all_triplets))
58 | 
59 | # get the splits according to the given ratio
60 | np.random.shuffle(all_triplets)
61 | train_split = int(math.ceil(len(all_triplets) * (1 - valid_ratio - test_ratio)))
62 | valid_split = int(math.ceil(len(all_triplets) * (1 - test_ratio)))
63 | 
64 | train_triplets = all_triplets[:train_split]
65 | valid_triplets = all_triplets[train_split: valid_split]
66 | test_triplets = all_triplets[valid_split:]
67 | 
68 | return train_triplets, valid_triplets, test_triplets, common_rels
69 | 
70 | 
71 | def get_subgraph(adj_list, hops, max_nodes_per_hop):
72 | '''
73 | Samples a subgraph around randomly chosen root nodes, going up to `hops` hops out, with the number of nodes selected per hop capped by max_nodes_per_hop
74 | '''
75 | 
76 | # collapse the list of adjacency matrices into a single matrix
77 | A_incidence = incidence_matrix(adj_list)
78 | 
79 | # choose a set of random root nodes
80 | idx = np.random.choice(range(len(A_incidence.tocoo().row)), size=params.n_roots, replace=False)
81 | roots = set([A_incidence.tocoo().row[id] for id in idx] + [A_incidence.tocoo().col[id] for id in idx])
82 | 
83 | # get the neighbor nodes within a limit of hops
84 | bfs_generator
= _bfs_relational(A_incidence, roots, max_nodes_per_hop)
85 | lvls = list()
86 | for _ in range(hops):
87 | lvls.append(next(bfs_generator))
88 | 
89 | nodes = list(roots) + list(set().union(*lvls))
90 | 
91 | return nodes
92 | 
93 | 
94 | def mask_nodes(adj_list, nodes):
95 | '''
96 | mask a set of nodes from a given graph
97 | '''
98 | 
99 | masked_adj_list = [adj.copy() for adj in adj_list]
100 | for node in nodes:
101 | for adj in masked_adj_list:
102 | adj.data[adj.indptr[node]:adj.indptr[node + 1]] = 0
103 | adj = adj.tocsr()
104 | adj.data[adj.indptr[node]:adj.indptr[node + 1]] = 0
105 | adj = adj.tocsc()
106 | for adj in masked_adj_list:
107 | adj.eliminate_zeros()
108 | return masked_adj_list
109 | 
110 | 
111 | def main(params):
112 | 
113 | adj_list, triplets, entity2id, relation2id, id2entity, id2relation = process_files(files)
114 | 
115 | meta_train_nodes = get_subgraph(adj_list, params.hops, params.max_nodes_per_hop) # list(range(750, 8500)) #
116 | 
117 | masked_adj_list = mask_nodes(adj_list, meta_train_nodes)
118 | 
119 | meta_test_nodes = get_subgraph(masked_adj_list, params.hops_test + 1, params.max_nodes_per_hop_test) # list(range(0, 750)) #
120 | 
121 | print('Common nodes among the two disjoint datasets (should ideally be zero): ', set(meta_train_nodes).intersection(set(meta_test_nodes)))
122 | tmp = [adj[meta_train_nodes, :][:, meta_train_nodes] for adj in masked_adj_list]
123 | print('Residual edges (should be zero) : ', np.sum(get_edge_count(tmp)))
124 | 
125 | print("================")
126 | print("Train graph stats")
127 | print("================")
128 | train_triplets, valid_triplets, test_triplets, train_active_rels = get_splits(adj_list, meta_train_nodes, range(len(adj_list)))
129 | print("================")
130 | print("Meta-test graph stats")
131 | print("================")
132 | meta_train_triplets, meta_valid_triplets, meta_test_triplets, meta_active_rels = get_splits(adj_list, meta_test_nodes, train_active_rels)
133 | 
134 | print("================")
135 | print('Extra rels (should be empty): ', set(meta_active_rels) - set(train_active_rels))
136 | 
137 | # TODO: ABSTRACT THIS INTO A METHOD
138 | data_dir = os.path.join(params.main_dir, 'data/{}'.format(params.new_dataset))
139 | if not os.path.exists(data_dir):
140 | os.makedirs(data_dir)
141 | 
142 | save_to_file(data_dir, 'train.txt', train_triplets, id2entity, id2relation)
143 | save_to_file(data_dir, 'valid.txt', valid_triplets, id2entity, id2relation)
144 | save_to_file(data_dir, 'test.txt', test_triplets, id2entity, id2relation)
145 | 
146 | meta_data_dir = os.path.join(params.main_dir, 'data/{}'.format(params.new_dataset + '_meta'))
147 | if not os.path.exists(meta_data_dir):
148 | os.makedirs(meta_data_dir)
149 | 
150 | save_to_file(meta_data_dir, 'train.txt', meta_train_triplets, id2entity, id2relation)
151 | save_to_file(meta_data_dir, 'valid.txt', meta_valid_triplets, id2entity, id2relation)
152 | save_to_file(meta_data_dir, 'test.txt', meta_test_triplets, id2entity, id2relation)
153 | 
154 | 
155 | if __name__ == '__main__':
156 | 
157 | parser = argparse.ArgumentParser(description='Save adjacency matrices and triplets')
158 | 
159 | parser.add_argument("--dataset", "-d", type=str, default="FB15K237",
160 | help="Dataset string")
161 | parser.add_argument("--new_dataset", "-nd", type=str, default="fb_v3",
162 | help="Dataset string")
163 | parser.add_argument("--n_roots", "-n", type=int, default="1",
164 | help="Number of roots to sample the neighborhood from")
165 | parser.add_argument("--hops", "-H", type=int,
default="3", 166 | help="Number of hops to sample the neighborhood") 167 | parser.add_argument("--max_nodes_per_hop", "-m", type=int, default="2500", 168 | help="Number of nodes in the neighborhood") 169 | parser.add_argument("--hops_test", "-HT", type=int, default="3", 170 | help="Number of hops to sample the neighborhood") 171 | parser.add_argument("--max_nodes_per_hop_test", "-mt", type=int, default="2500", 172 | help="Number of nodes in the neighborhood") 173 | parser.add_argument("--seed", "-s", type=int, default="28", 174 | help="Numpy random seed") 175 | 176 | params = parser.parse_args() 177 | 178 | np.random.seed(params.seed) 179 | random.seed(params.seed) 180 | 181 | params.main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.abspath(__file__))), '..') 182 | 183 | files = { 184 | 'train': os.path.join(params.main_dir, 'data/{}/train.txt'.format(params.dataset)), 185 | 'valid': os.path.join(params.main_dir, 'data/{}/valid.txt'.format(params.dataset)), 186 | 'test': os.path.join(params.main_dir, 'data/{}/test.txt'.format(params.dataset)) 187 | } 188 | 189 | main(params) 190 | -------------------------------------------------------------------------------- /CoMPILE_v2/README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | To generate new dataset and extract subgraph: 4 | 5 | python3 ./train/inductive_subgraph8.py --prefix FB15k-237-v1 --data ./data/FB15k-237-inductive-v1/ --dataset FB15k-237-v1 --hop 3 6 | 7 | 8 | To train the model: 9 | 10 | python3 ./train/main_compile.py --prefix FB15k-237-v1 --data ./data/FB15k-237-inductive-v1 --dataset FB15k-237-v1 --hop 3 --train True --pretrained_emb False --epochs_conv 20 --direct True --output_folder ./train/checkpoints/FB15k-237-v1/out/ --batch_size_conv 128 --test True 11 | 12 | 13 | To test the model: 14 | 15 | python3 ./train/main_compile.py --prefix FB15k-237-v1 --data ./data/FB15k-237-inductive-v1 --dataset FB15k-237-v1 --hop 3 --train False --pretrained_emb False --epochs_conv 20 --direct True --output_folder ./train/checkpoints/FB15k-237-v1/out/ --batch_size_conv 128 --test True 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-inductive-v1/relation2id.json: -------------------------------------------------------------------------------- 1 | {"/award/award_winning_work/awards_won./award/award_honor/award_winner": 0, "/film/film/genre": 1, "/people/person/profession": 2, "/sports/sports_team_location/teams": 3, "/education/educational_degree/people_with_this_degree./education/education/institution": 4, "/organization/organization_founder/organizations_founded": 5, "/award/award_category/nominees./award/award_nomination/nominated_for": 6, "/film/film/release_date_s./film/film_regional_release_date/film_release_region": 7, "/people/person/gender": 8, "/film/actor/film./film/performance/film": 9, "/olympics/olympic_sport/athletes./olympics/olympic_athlete_affiliation/country": 10, "/time/event/instance_of_recurring_event": 11, "/award/award_category/winners./award/award_honor/award_winner": 12, "/soccer/football_player/current_team./sports/sports_team_roster/team": 13, "/media_common/netflix_genre/titles": 14, "/people/person/spouse_s./people/marriage/type_of_union": 15, "/award/award_nominee/award_nominations./award/award_nomination/award_nominee": 16, "/award/award_category/winners./award/award_honor/ceremony": 17, 
"/award/award_ceremony/awards_presented./award/award_honor/award_winner": 18, "/film/film/language": 19, "/base/culturalevent/event/entity_involved": 20, "/music/performance_role/track_performances./music/track_contribution/role": 21, "/food/food/nutrients./food/nutrition_fact/nutrient": 22, "/sports/sports_position/players./sports/sports_team_roster/team": 23, "/tv/tv_program/genre": 24, "/people/person/spouse_s./people/marriage/location_of_ceremony": 25, "/award/award_winner/awards_won./award/award_honor/award_winner": 26, "/base/saturdaynightlive/snl_cast_member/seasons./base/saturdaynightlive/snl_season_tenure/cast_members": 27, "/people/person/nationality": 28, "/people/person/languages": 29, "/organization/organization/headquarters./location/mailing_address/state_province_region": 30, "/music/genre/artists": 31, "/tv/tv_program/languages": 32, "/base/localfood/seasonal_month/produce_available./base/localfood/produce_availability/seasonal_months": 33, "/education/educational_institution/students_graduates./education/education/student": 34, "/baseball/baseball_team/team_stats./baseball/baseball_team_stats/season": 35, "/sports/sports_team/roster./baseball/baseball_roster_position/position": 36, "/award/award_nominee/award_nominations./award/award_nomination/award": 37, "/base/popstra/celebrity/friendship./base/popstra/friendship/participant": 38, "/film/film/other_crew./film/film_crew_gig/film_crew_role": 39, "/base/biblioness/bibs_location/country": 40, "/location/location/contains": 41, "/location/statistical_region/religions./location/religion_percentage/religion": 42, "/tv/tv_program/regular_cast./tv/regular_tv_appearance/actor": 43, "/olympics/olympic_participating_country/athletes./olympics/olympic_athlete_affiliation/olympics": 44, "/people/person/religion": 45, "/award/award_nominee/award_nominations./award/award_nomination/nominated_for": 46, "/award/award_winning_work/awards_won./award/award_honor/award": 47, "/location/country/form_of_government": 48, "/film/film/produced_by": 49, "/music/group_member/membership./music/group_membership/role": 50, "/film/film/prequel": 51, "/base/popstra/celebrity/canoodled./base/popstra/canoodled/participant": 52, "/language/human_language/countries_spoken_in": 53, "/people/person/places_lived./people/place_lived/location": 54, "/organization/organization/headquarters./location/mailing_address/citytown": 55, "/film/film/executive_produced_by": 56, "/government/government_office_category/officeholders./government/government_position_held/jurisdiction_of_office": 57, "/film/actor/dubbing_performances./film/dubbing_performance/language": 58, "/sports/professional_sports_team/draft_picks./sports/sports_league_draft_pick/draft": 59, "/film/film_distributor/films_distributed./film/film_film_distributor_relationship/film": 60, "/olympics/olympic_participating_country/medals_won./olympics/olympic_medal_honor/olympics": 61, "/education/educational_degree/people_with_this_degree./education/education/student": 62, "/government/politician/government_positions_held./government/government_position_held/basic_title": 63, "/location/location/time_zones": 64, "/base/schemastaging/organization_extra/phone_number./base/schemastaging/phone_sandbox/service_language": 65, "/people/marriage_union_type/unions_of_this_type./people/marriage/location_of_ceremony": 66, "/people/person/spouse_s./people/marriage/spouse": 67, "/film/director/film": 68, "/tv/tv_program/program_creator": 69, "/sports/sports_league_draft/picks./sports/sports_league_draft_pick/school": 70, 
"/award/award_ceremony/awards_presented./award/award_honor/honored_for": 71, "/film/film/film_format": 72, "/music/instrument/instrumentalists": 73, "/award/award_nominated_work/award_nominations./award/award_nomination/nominated_for": 74, "/base/x2010fifaworldcupsouthafrica/world_cup_squad/current_world_cup_squad./base/x2010fifaworldcupsouthafrica/current_world_cup_squad/current_club": 75, "/education/educational_degree/people_with_this_degree./education/education/major_field_of_study": 76, "/government/legislative_session/members./government/government_position_held/district_represented": 77, "/film/film/written_by": 78, "/sports/sports_team/colors": 79, "/sports/sports_team/roster./american_football/football_historical_roster_position/position_s": 80, "/music/artist/track_contributions./music/track_contribution/role": 81, "/people/ethnicity/people": 82, "/base/marchmadness/ncaa_basketball_tournament/seeds./base/marchmadness/ncaa_tournament_seed/team": 83, "/location/location/adjoin_s./location/adjoining_relationship/adjoins": 84, "/film/film/runtime./film/film_cut/film_release_region": 85, "/award/ranked_item/appears_in_ranked_lists./award/ranking/list": 86, "/music/performance_role/regular_performances./music/group_membership/role": 87, "/education/field_of_study/students_majoring./education/education/student": 88, "/location/statistical_region/places_exported_to./location/imports_and_exports/exported_to": 89, "/base/popstra/location/vacationers./base/popstra/vacation_choice/vacationer": 90, "/tv/tv_producer/programs_produced./tv/tv_producer_term/program": 91, "/music/record_label/artist": 92, "/location/location/partially_contains": 93, "/base/popstra/celebrity/dated./base/popstra/dated/participant": 94, "/music/performance_role/guest_performances./music/recording_contribution/performance_role": 95, "/film/film/film_festivals": 96, "/travel/travel_destination/climate./travel/travel_destination_monthly_climate/month": 97, "/sports/professional_sports_team/draft_picks./sports/sports_league_draft_pick/school": 98, "/location/capital_of_administrative_division/capital_of./location/administrative_division_capital_relationship/administrative_division": 99, "/music/genre/parent_genre": 100, "/music/performance_role/regular_performances./music/group_membership/group": 101, "/influence/influence_node/influenced_by": 102, "/film/film/country": 103, "/military/military_combatant/military_conflicts./military/military_combatant_group/combatants": 104, "/award/award_winning_work/awards_won./award/award_honor/honored_for": 105, "/film/film/production_companies": 106, "/military/military_conflict/combatants./military/military_combatant_group/combatants": 107, "/sports/pro_athlete/teams./sports/sports_team_roster/team": 108, "/location/hud_county_place/county": 109, "/base/biblioness/bibs_location/state": 110, "/location/administrative_division/country": 111, "/education/educational_institution/students_graduates./education/education/major_field_of_study": 112, "/award/hall_of_fame/inductees./award/hall_of_fame_induction/inductee": 113, "/film/film/story_by": 114, "/business/business_operation/industry": 115, "/dataworld/gardening_hint/split_to": 116, "/broadcast/content/artist": 117, "/people/person/sibling_s./people/sibling_relationship/sibling": 118, "/organization/organization_member/member_of./organization/organization_membership/organization": 119, "/people/cause_of_death/people": 120, "/people/deceased_person/place_of_burial": 121, 
"/business/job_title/people_with_this_title./business/employment_tenure/company": 122, "/film/film/music": 123, "/film/film/other_crew./film/film_crew_gig/crewmember": 124, "/music/instrument/family": 125, "/film/film_subject/films": 126, "/government/legislative_session/members./government/government_position_held/legislative_sessions": 127, "/people/person/place_of_birth": 128, "/film/film/release_date_s./film/film_regional_release_date/film_regional_debut_venue": 129, "/education/educational_institution/colors": 130, "/sports/sports_team/sport": 131, "/travel/travel_destination/how_to_get_here./travel/transportation/mode_of_transportation": 132, "/influence/influence_node/peers./influence/peer_relationship/peers": 133, "/award/award_category/disciplines_or_subjects": 134, "/people/ethnicity/languages_spoken": 135, "/medicine/disease/risk_factors": 136, "/education/field_of_study/students_majoring./education/education/major_field_of_study": 137, "/location/country/official_language": 138, "/education/educational_institution/school_type": 139, "/base/aareas/schema/administrative_area/administrative_parent": 140, "/government/political_party/politicians_in_this_party./government/political_party_tenure/politician": 141, "/award/award_category/category_of": 142, "/celebrities/celebrity/celebrity_friends./celebrities/friendship/friend": 143, "/film/film/distributors./film/film_film_distributor_relationship/film_distribution_medium": 144, "/base/aareas/schema/administrative_area/capital": 145, "/location/us_county/county_seat": 146, "/people/ethnicity/geographic_distribution": 147, "/film/film/featured_film_locations": 148, "/film/film/dubbing_performances./film/dubbing_performance/actor": 149, "/location/administrative_division/first_level_division_of": 150, "/user/ktrueman/default_domain/international_organization/member_states": 151, "/base/popstra/celebrity/breakup./base/popstra/breakup/participant": 152, "/film/film/personal_appearances./film/personal_film_appearance/person": 153, "/time/event/locations": 154, "/user/alexander/philosophy/philosopher/interests": 155, "/user/jg/default_domain/olympic_games/sports": 156, "/olympics/olympic_sport/athletes./olympics/olympic_athlete_affiliation/olympics": 157, "/olympics/olympic_games/sports": 158, "/celebrities/celebrity/sexual_relationships./celebrities/romantic_relationship/celebrity": 159, "/tv/tv_personality/tv_regular_appearances./tv/tv_regular_personal_appearance/program": 160, "/people/person/employment_history./business/employment_tenure/company": 161, "/tv/tv_writer/tv_programs./tv/tv_program_writer_relationship/tv_program": 162, "/music/artist/origin": 163, "/film/film_set_designer/film_sets_designed": 164, "/music/artist/contribution./music/recording_contribution/performance_role": 165, "/organization/organization/child./organization/organization_relationship/child": 166, "/film/film/costume_design_by": 167, "/film/film/edited_by": 168, "/location/country/capital": 169, "/base/schemastaging/organization_extra/phone_number./base/schemastaging/phone_sandbox/service_location": 170, "/organization/organization/place_founded": 171, "/people/profession/specialization_of": 172, "/music/group_member/membership./music/group_membership/group": 173, "/film/film/film_production_design_by": 174, "/sports/sport/pro_athletes./sports/pro_sports_played/athlete": 175, "/film/film/cinematography": 176, "/tv/tv_network/programs./tv/tv_network_duration/program": 177, 
"/government/politician/government_positions_held./government/government_position_held/legislative_sessions": 178, "/soccer/football_team/current_roster./soccer/football_roster_position/position": 179} -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_hop_new_data.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_hop_new_data.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_head2.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_head2.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_tail2.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_tail2.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_train_data.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_train_data.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_train_label.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_train_label.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_val_data.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_val_data.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_val_label.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_val_label.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/nell-inductive-v1/test_inductive.txt: -------------------------------------------------------------------------------- 1 | concept:televisionstation:wqec concept:subpartof concept:company:pbs 2 | concept:televisionstation:kusm concept:agentbelongstoorganization concept:company:pbs 3 | concept:televisionstation:kqsd_tv concept:agentcollaborateswithagent concept:company:pbs 4 | concept:company:pbs concept:agentcontrols concept:televisionstation:kufm_tv 5 | concept:company:pbs concept:acquired 
concept:company:kues 6 | concept:company:pbs concept:agentcontrols concept:televisionstation:kyne_tv 7 | concept:company:pbs concept:agentcontrols concept:televisionstation:kcos_tv 8 | concept:televisionstation:wxxi_tv concept:agentbelongstoorganization concept:company:pbs 9 | concept:televisionstation:wvta concept:subpartof concept:company:pbs 10 | concept:televisionstation:wunp_tv concept:agentcollaborateswithagent concept:company:pbs 11 | concept:televisionstation:koac_tv concept:agentbelongstoorganization concept:company:pbs 12 | concept:televisionstation:wsec concept:subpartof concept:company:pbs 13 | concept:televisionstation:wvta concept:agentbelongstoorganization concept:company:pbs 14 | concept:televisionstation:kepb_tv concept:agentbelongstoorganization concept:company:pbs 15 | concept:televisionstation:wedn concept:subpartof concept:company:pbs 16 | concept:televisionstation:wunf_tv concept:agentbelongstoorganization concept:company:pbs 17 | concept:televisionstation:ktwu concept:agentbelongstoorganization concept:company:pbs 18 | concept:televisionstation:kood concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 19 | concept:televisionstation:wvut concept:subpartof concept:company:pbs 20 | concept:televisionstation:kfts concept:subpartof concept:company:pbs 21 | concept:televisionstation:wunm_tv concept:agentcollaborateswithagent concept:company:pbs 22 | concept:televisionstation:whla_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 23 | concept:televisionstation:whtj concept:agentbelongstoorganization concept:company:pbs 24 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmed_tv 25 | concept:televisionstation:kpsd_tv concept:subpartof concept:company:pbs 26 | concept:televisionstation:wkyu_tv concept:agentcollaborateswithagent concept:company:pbs 27 | concept:televisionstation:wkon concept:agentcollaborateswithagent concept:company:pbs 28 | concept:televisionstation:kuid_tv concept:subpartof concept:company:pbs 29 | concept:televisionstation:ktsd_tv concept:agentcollaborateswithagent concept:company:pbs 30 | concept:televisionstation:wgiq concept:agentcollaborateswithagent concept:company:pbs 31 | concept:televisionstation:wlef_tv concept:subpartof concept:company:pbs 32 | concept:televisionstation:whro_tv concept:subpartof concept:company:pbs 33 | concept:televisionstation:wung_tv concept:subpartoforganization concept:televisionnetwork:pbs 34 | concept:televisionstation:wkoh concept:agentbelongstoorganization concept:company:pbs 35 | concept:televisionstation:wetp_tv concept:agentbelongstoorganization concept:company:pbs 36 | concept:televisionstation:wfiq concept:agentcollaborateswithagent concept:company:pbs 37 | concept:televisionstation:kopb_tv concept:subpartof concept:company:pbs 38 | concept:televisionstation:wfiq concept:agentbelongstoorganization concept:company:pbs 39 | concept:televisionstation:wmpt concept:subpartof concept:company:pbs 40 | concept:televisionstation:kbyu_tv concept:agentcollaborateswithagent concept:company:pbs 41 | concept:televisionstation:wmeb_tv concept:subpartof concept:company:pbs 42 | concept:company:pbs concept:agentcontrols concept:televisionstation:krma_tv 43 | concept:televisionstation:wntv concept:agentbelongstoorganization concept:company:pbs 44 | concept:televisionstation:ktsc_tv concept:subpartoforganization concept:televisionnetwork:pbs 45 | concept:televisionstation:wmae_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 46 | 
concept:televisionstation:wbcc_tv concept:agentbelongstoorganization concept:company:pbs 47 | concept:televisionstation:kcet concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 48 | concept:televisionstation:kwse concept:agentbelongstoorganization concept:company:pbs 49 | concept:televisionstation:ktci_tv concept:agentbelongstoorganization concept:company:pbs 50 | concept:televisionstation:wmae_tv concept:agentcollaborateswithagent concept:company:pbs 51 | concept:televisionstation:wunj_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 52 | concept:televisionstation:kufm_tv concept:agentbelongstoorganization concept:company:pbs 53 | concept:televisionstation:kufm_tv concept:subpartoforganization concept:televisionnetwork:pbs 54 | concept:televisionstation:wpbt concept:subpartof concept:company:pbs 55 | concept:televisionstation:whmc concept:subpartoforganization concept:televisionnetwork:pbs 56 | concept:company:pbs concept:agentcontrols concept:televisionstation:kcts_tv 57 | concept:televisionstation:whtj concept:subpartoforganization concept:televisionnetwork:pbs 58 | concept:company:pbs concept:agentcontrols concept:televisionstation:wlpb_tv 59 | concept:televisionstation:wkno_tv concept:agentbelongstoorganization concept:company:pbs 60 | concept:televisionstation:ktsd_tv concept:agentbelongstoorganization concept:company:pbs 61 | concept:televisionstation:wviz_tv concept:agentbelongstoorganization concept:company:pbs 62 | concept:company:pbs concept:agentcontrols concept:televisionstation:wgvu_tv 63 | concept:company:pbs concept:agentcontrols concept:televisionstation:wbiq_tv 64 | concept:company:pbs concept:agentcontrols concept:televisionstation:woub_tv 65 | concept:televisionstation:wenh concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 66 | concept:televisionstation:kbyu_tv concept:subpartof concept:company:pbs 67 | concept:televisionstation:wunm_tv concept:agentbelongstoorganization concept:company:pbs 68 | concept:televisionstation:klru concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 69 | concept:televisionstation:wgby_tv concept:subpartof concept:company:pbs 70 | concept:company:pbs concept:agentcontrols concept:televisionstation:wdse_tv 71 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmpt 72 | concept:televisionstation:kacv_tv concept:subpartof concept:company:pbs 73 | concept:televisionstation:kaet concept:subpartof concept:company:pbs 74 | concept:televisionstation:ketc concept:agentcollaborateswithagent concept:company:pbs 75 | concept:televisionstation:kbhe_tv concept:subpartof concept:company:pbs 76 | concept:televisionstation:kwse concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 77 | concept:televisionstation:krwg_tv concept:subpartof concept:company:pbs 78 | concept:televisionstation:weta_tv concept:subpartof concept:company:pbs 79 | concept:company:pbs concept:agentcontrols concept:televisionstation:kisu_tv 80 | concept:televisionstation:wfwa concept:subpartof concept:company:pbs 81 | concept:televisionstation:weta_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 82 | concept:televisionstation:kcwc_tv concept:subpartof concept:company:pbs 83 | concept:televisionstation:ktne_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 84 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmsy_tv 85 | concept:company:pbs concept:agentcontrols concept:televisionstation:kopb_tv 86 | 
concept:televisionstation:wund_tv concept:subpartof concept:company:pbs 87 | concept:company:pbs concept:agentcontrols concept:televisionstation:kamu_tv 88 | concept:company:pbs concept:agentcontrols concept:televisionstation:wkzt_tv 89 | concept:company:pbs concept:agentcontrols concept:televisionstation:wpby_tv 90 | concept:televisionstation:kedt concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 91 | concept:televisionstation:ketg concept:subpartof concept:company:pbs 92 | concept:televisionstation:whro_tv concept:agentcollaborateswithagent concept:company:pbs 93 | concept:televisionstation:wneo concept:agentcollaborateswithagent concept:company:pbs 94 | concept:televisionstation:wpbt concept:subpartoforganization concept:televisionnetwork:pbs 95 | concept:televisionstation:kcwc_tv concept:agentbelongstoorganization concept:company:pbs 96 | concept:televisionstation:kopb_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 97 | concept:televisionstation:wlrn_tv concept:agentcollaborateswithagent concept:company:pbs 98 | concept:televisionstation:weta_tv concept:agentcollaborateswithagent concept:company:pbs 99 | concept:televisionstation:kera_tv concept:agentbelongstoorganization concept:company:pbs 100 | concept:company:pbs concept:agentcontrols concept:televisionstation:wkha 101 | -------------------------------------------------------------------------------- /CoMPILE_v2/data/nell-inductive-v1/valid_inductive.txt: -------------------------------------------------------------------------------- 1 | concept:televisionstation:wune_tv concept:agentbelongstoorganization concept:company:pbs 2 | concept:televisionstation:wsbn_tv concept:subpartof concept:company:pbs 3 | concept:televisionstation:wnin_tv concept:subpartof concept:company:pbs 4 | concept:televisionstation:wgte_tv concept:subpartoforganization concept:televisionnetwork:pbs 5 | concept:televisionstation:wedn concept:subpartoforganization concept:televisionnetwork:pbs 6 | concept:televisionstation:wmed_tv concept:subpartoforganization concept:televisionnetwork:pbs 7 | concept:televisionstation:wfiq concept:subpartoforganization concept:televisionnetwork:pbs 8 | concept:televisionstation:wbra_tv concept:agentbelongstoorganization concept:company:pbs 9 | concept:televisionstation:wkpi concept:subpartoforganization concept:televisionnetwork:pbs 10 | concept:televisionstation:wmsy_tv concept:subpartof concept:company:pbs 11 | concept:televisionstation:krne_tv concept:agentcollaborateswithagent concept:company:pbs 12 | concept:televisionstation:kteh concept:agentbelongstoorganization concept:company:pbs 13 | concept:televisionstation:wkpc_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 14 | concept:televisionstation:wyes_tv concept:subpartof concept:company:pbs 15 | concept:televisionstation:wnin_tv concept:agentbelongstoorganization concept:company:pbs 16 | concept:televisionstation:wkon concept:agentbelongstoorganization concept:company:pbs 17 | concept:company:pbs concept:agentcontrols concept:televisionstation:kmbh_tv 18 | concept:company:pbs concept:agentcontrols concept:televisionstation:wcfe_tv 19 | concept:televisionstation:wntv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 20 | concept:televisionstation:wmec concept:agentbelongstoorganization concept:company:pbs 21 | concept:company:pbs concept:agentcontrols concept:televisionstation:kixe_tv 22 | concept:televisionstation:kusd_tv concept:agentbelongstoorganization concept:company:pbs 23 | 
concept:company:pbs concept:agentcontrols concept:televisionstation:wund_tv 24 | concept:televisionstation:wlvt_tv concept:agentbelongstoorganization concept:company:pbs 25 | concept:televisionstation:kcsd_tv concept:subpartof concept:company:pbs 26 | concept:televisionstation:wmea_tv concept:subpartof concept:company:pbs 27 | concept:televisionstation:witf_tv concept:agentcollaborateswithagent concept:company:pbs 28 | concept:televisionstation:kued concept:agentbelongstoorganization concept:company:pbs 29 | concept:televisionstation:wnpb_tv concept:subpartoforganization concept:televisionnetwork:pbs 30 | concept:televisionstation:wunf_tv concept:subpartof concept:company:pbs 31 | concept:televisionstation:kcdt_tv concept:subpartof concept:company:pbs 32 | concept:company:pbs concept:agentcontrols concept:televisionstation:krwg_tv 33 | concept:televisionstation:wlef_tv concept:agentcollaborateswithagent concept:company:pbs 34 | concept:televisionstation:kozk concept:subpartoforganization concept:televisionnetwork:pbs 35 | concept:televisionstation:wkyu_tv concept:subpartoforganization concept:televisionnetwork:pbs 36 | concept:televisionstation:ketc concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 37 | concept:televisionstation:wbiq_tv concept:subpartof concept:company:pbs 38 | concept:televisionstation:kcos_tv concept:subpartof concept:company:pbs 39 | concept:company:pbs concept:agentcontrols concept:televisionstation:kwet 40 | concept:company:pbs concept:agentcontrols concept:televisionstation:ktci_tv 41 | concept:company:pbs concept:agentcontrols concept:televisionstation:wntv 42 | concept:televisionstation:wkha concept:agentcollaborateswithagent concept:company:pbs 43 | concept:company:pbs concept:agentcontrols concept:televisionstation:wnsc_tv 44 | concept:televisionstation:wtiu_tv concept:agentbelongstoorganization concept:company:pbs 45 | concept:televisionstation:wgbx_tv concept:subpartoforganization concept:televisionnetwork:pbs 46 | concept:televisionstation:wkmu concept:subpartof concept:company:pbs 47 | concept:televisionstation:wyes_tv concept:agentbelongstoorganization concept:company:pbs 48 | concept:televisionstation:wkpi concept:subpartof concept:company:pbs 49 | concept:televisionstation:wunl_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 50 | concept:company:pbs concept:agentcontrols concept:televisionstation:wpto 51 | concept:televisionstation:kcts_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 52 | concept:televisionstation:wouc_tv concept:agentbelongstoorganization concept:company:pbs 53 | concept:televisionstation:ksps_tv concept:subpartof concept:company:pbs 54 | concept:televisionstation:kbyu_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 55 | concept:televisionstation:wung_tv concept:agentbelongstoorganization concept:company:pbs 56 | concept:televisionstation:wnjs concept:agentcollaborateswithagent concept:company:pbs 57 | concept:televisionstation:wha__tv concept:subpartof concept:company:pbs 58 | concept:televisionstation:wunu concept:agentbelongstoorganization concept:company:pbs 59 | concept:televisionstation:wkso_tv concept:agentcollaborateswithagent concept:company:pbs 60 | concept:televisionstation:klrn_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 61 | concept:televisionstation:krwg_tv concept:agentcollaborateswithagent concept:company:pbs 62 | concept:televisionstation:wqec concept:agentcollaborateswithagent concept:company:pbs 
63 | concept:company:pbs concept:agentcontrols concept:televisionstation:kbyu_tv 64 | concept:televisionstation:wptd concept:agentbelongstoorganization concept:company:pbs 65 | concept:company:pbs concept:agentcontrols concept:televisionstation:wvpy 66 | concept:company:pbs concept:agentcontrols concept:televisionstation:wvut 67 | concept:televisionstation:wgby_tv concept:subpartoforganization concept:televisionnetwork:pbs 68 | concept:televisionstation:wwpb concept:subpartoforganization concept:televisionnetwork:pbs 69 | concept:televisionstation:wund_tv concept:agentcollaborateswithagent concept:company:pbs 70 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmvs_tv 71 | concept:televisionstation:kepb_tv concept:agentcollaborateswithagent concept:company:pbs 72 | concept:company:pbs concept:agentcontrols concept:televisionstation:kaft 73 | concept:televisionstation:wnjt concept:agentbelongstoorganization concept:company:pbs 74 | concept:company:pbs concept:agentcontrols concept:televisionstation:wswp_tv 75 | concept:televisionstation:kqsd_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 76 | concept:televisionstation:kmos_tv concept:agentbelongstoorganization concept:company:pbs 77 | concept:televisionstation:wpbo_tv concept:subpartof concept:company:pbs 78 | concept:televisionstation:kltm_tv concept:subpartoforganization concept:televisionnetwork:pbs 79 | concept:company:pbs concept:agentcontrols concept:televisionstation:kpne_tv 80 | concept:televisionstation:wmau_tv concept:subpartof concept:company:pbs 81 | concept:televisionstation:kued concept:subpartoforganization concept:televisionnetwork:pbs 82 | concept:company:pbs concept:agentcontrols concept:televisionstation:wkpi 83 | concept:company:pbs concept:agentcontrols concept:televisionstation:wune_tv 84 | concept:televisionstation:kltm_tv concept:agentbelongstoorganization concept:company:pbs 85 | concept:televisionstation:wmaw_tv concept:agentcollaborateswithagent concept:company:pbs 86 | concept:televisionstation:wunj_tv concept:agentbelongstoorganization concept:company:pbs 87 | concept:televisionstation:kuht concept:agentbelongstoorganization concept:company:pbs 88 | concept:televisionstation:wunf_tv concept:subpartoforganization concept:televisionnetwork:pbs 89 | concept:televisionstation:wvpy concept:subpartof concept:company:pbs 90 | concept:televisionstation:wmsy_tv concept:agentcollaborateswithagent concept:company:pbs 91 | concept:televisionstation:kwet concept:subpartoforganization concept:televisionnetwork:pbs 92 | concept:televisionstation:wmau_tv concept:subpartoforganization concept:televisionnetwork:pbs 93 | concept:televisionstation:wund_tv concept:subpartoforganization concept:televisionnetwork:pbs 94 | concept:televisionstation:wmaw_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 95 | concept:televisionstation:wnjn concept:agentcollaborateswithagent concept:company:pbs 96 | concept:televisionstation:wmah_tv concept:subpartof concept:company:pbs 97 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmpb_tv 98 | concept:televisionstation:ksmq_tv concept:subpartof concept:company:pbs 99 | concept:company:pbs concept:agentcontrols concept:televisionstation:weta_tv 100 | concept:televisionstation:wmpn_tv concept:agentcollaborateswithagent concept:company:pbs 101 | concept:televisionstation:wune_tv concept:agentcollaborateswithagent concept:company:pbs 102 | 
-------------------------------------------------------------------------------- /CoMPILE_v2/log/FB15k-237-v1_error_now.log: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/log/FB15k-237-v1_error_now.log -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/create_batch_inductive2.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/create_batch_inductive2.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/layers.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/layers.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/layers2.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/layers2.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/models4.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/models4.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/preprocess2.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/preprocess2.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/preprocess_inductive.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/preprocess_inductive.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/utils.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/create_batch.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/create_batch.pyc -------------------------------------------------------------------------------- 
/CoMPILE_v2/train/create_batch2.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/create_batch2.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/create_dataset_files.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import argparse 4 | 5 | def parse_args(): 6 | args = argparse.ArgumentParser() 7 | # network arguments 8 | args.add_argument("--data", help="data directory") 9 | args = args.parse_args() 10 | return args 11 | 12 | args = parse_args() 13 | 14 | def getID(): 15 | folder = './data/%s/attentionEmb/'%args.data 16 | lstEnts = {} 17 | lstRels = {} 18 | with open(folder + 'train_inductive.txt') as f, open(folder + 'train_marked_inductive.txt', 'w') as f2: 19 | count = 0 20 | for line in f: 21 | line = line.strip().split('\t') 22 | line = [i.strip() for i in line] 23 | # print(line[0], line[1], line[2]) 24 | if line[0] not in lstEnts: 25 | lstEnts[line[0]] = len(lstEnts) 26 | if line[1] not in lstRels: 27 | lstRels[line[1]] = len(lstRels) 28 | if line[2] not in lstEnts: 29 | lstEnts[line[2]] = len(lstEnts) 30 | count += 1 31 | f2.write(str(line[0]) + '\t' + str(line[1]) + 32 | '\t' + str(line[2]) + '\n') 33 | print("Size of train_marked set ", count) 34 | 35 | with open(folder + 'valid_inductive.txt') as f, open(folder + 'valid_marked_inductive.txt', 'w') as f2: 36 | count = 0 37 | for line in f: 38 | line = line.strip().split('\t') 39 | line = [i.strip() for i in line] 40 | # print(line[0], line[1], line[2]) 41 | if line[0] not in lstEnts: 42 | lstEnts[line[0]] = len(lstEnts) 43 | if line[1] not in lstRels: 44 | lstRels[line[1]] = len(lstRels) 45 | if line[2] not in lstEnts: 46 | lstEnts[line[2]] = len(lstEnts) 47 | count += 1 48 | f2.write(str(line[0]) + '\t' + str(line[1]) + 49 | '\t' + str(line[2]) + '\n') 50 | print("Size of valid_marked set ", count) 51 | 52 | with open(folder + 'test_inductive.txt') as f, open(folder + 'test_marked_inductive.txt', 'w') as f2: 53 | count = 0 54 | for line in f: 55 | line = line.strip().split('\t') 56 | line = [i.strip() for i in line] 57 | # print(line[0], line[1], line[2]) 58 | if line[0] not in lstEnts: 59 | lstEnts[line[0]] = len(lstEnts) 60 | if line[1] not in lstRels: 61 | lstRels[line[1]] = len(lstRels) 62 | if line[2] not in lstEnts: 63 | lstEnts[line[2]] = len(lstEnts) 64 | count += 1 65 | f2.write(str(line[0]) + '\t' + str(line[1]) + 66 | '\t' + str(line[2]) + '\n') 67 | print("Size of test_marked set ", count) 68 | 69 | wri = open(folder + 'entity2id_inductive.txt', 'w') 70 | for entity in lstEnts: 71 | wri.write(entity + '\t' + str(lstEnts[entity])) 72 | wri.write('\n') 73 | wri.close() 74 | 75 | wri = open(folder + 'relation2id_inductive.txt', 'w') 76 | for entity in lstRels: 77 | wri.write(entity + '\t' + str(lstRels[entity])) 78 | wri.write('\n') 79 | wri.close() 80 | 81 | entity_df = pd.read_csv(folder+'entity2id_inductive.txt',sep='\t',header=None,names=['entity','index']) 82 | feature_df = pd.read_csv('./data/%s/feature.csv'%args.data) 83 | feature_df = feature_df.drop_duplicates(['entity']).reset_index(drop=True) 84 | for col in feature_df.columns: 85 | if col not in ['entity','index','type']: 86 | feature_df[col] = (feature_df[col]-feature_df[col].mean()) / feature_df[col].std() 87 | feature_df = 
pd.concat([feature_df.drop('type',axis=1),pd.get_dummies(feature_df['type'])],axis=1) 88 | entity_df = entity_df.merge(feature_df,how='left',on='entity') 89 | entity_df[[col for col in entity_df.columns if col not in ['entity','index','type']]].to_csv(folder+'entity2vec_inductive.txt',sep='\t',header=None,index=False) 90 | relation_df = pd.read_csv(folder+'relation2id_inductive.txt',sep='\t',header=None,names=['entity','index']) 91 | #relation_embeddings = np.random.randn(len(relation_df), 50) 92 | relation_embeddings = np.concatenate([np.random.randn(len(relation_df), 50),np.eye(len(relation_df))],axis=1) 93 | np.savetxt(folder+'relation2vec_inductive.txt', relation_embeddings,fmt='%f',delimiter='\t') 94 | 95 | getID() 96 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import time 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | 9 | CUDA = torch.cuda.is_available() 10 | 11 | 12 | class ConvKB(nn.Module): 13 | def __init__(self, input_dim, input_seq_len, in_channels, out_channels, drop_prob, alpha_leaky): 14 | super().__init__() 15 | 16 | self.conv_layer = nn.Conv2d( 17 | in_channels, out_channels, (1, input_seq_len)) # kernel size -> 1*input_seq_length(i.e. 2) 18 | self.dropout = nn.Dropout(drop_prob) 19 | self.non_linearity = nn.ReLU() 20 | self.fc_layer = nn.Linear((input_dim) * out_channels, 1) 21 | 22 | nn.init.xavier_uniform_(self.fc_layer.weight, gain=1.414) 23 | nn.init.xavier_uniform_(self.conv_layer.weight, gain=1.414) 24 | 25 | def forward(self, conv_input): 26 | 27 | batch_size, length, dim = conv_input.size() 28 | # assuming inputs are of the form -> 29 | conv_input = conv_input.transpose(1, 2) 30 | # batch * length(which is 3 here -> entity,relation,entity) * dim 31 | # To make tensor of size 4, where second dim is for input channels 32 | conv_input = conv_input.unsqueeze(1) 33 | 34 | out_conv = self.dropout( 35 | self.non_linearity(self.conv_layer(conv_input))) 36 | 37 | input_fc = out_conv.squeeze(-1).view(batch_size, -1) 38 | output = self.fc_layer(input_fc) 39 | return output 40 | 41 | 42 | class SpecialSpmmFunctionFinal(torch.autograd.Function): 43 | """Special function for sparse-region backpropagation only.""" 44 | @staticmethod 45 | def forward(ctx, edge, edge_w, N, E, out_features): 46 | # assert indices.requires_grad == False 47 | a = torch.sparse_coo_tensor( 48 | edge, edge_w, torch.Size([N, N, out_features])) 49 | b = torch.sparse.sum(a, dim=1) 50 | ctx.N = b.shape[0] 51 | ctx.outfeat = b.shape[1] 52 | ctx.E = E 53 | ctx.indices = a._indices()[0, :] 54 | 55 | return b.to_dense() 56 | 57 | @staticmethod 58 | def backward(ctx, grad_output): 59 | grad_values = None 60 | if ctx.needs_input_grad[1]: 61 | edge_sources = ctx.indices 62 | 63 | if(CUDA): 64 | edge_sources = edge_sources.cuda() 65 | 66 | grad_values = grad_output[edge_sources] 67 | # grad_values = grad_values.view(ctx.E, ctx.outfeat) 68 | # print("Grad Outputs-> ", grad_output) 69 | # print("Grad values-> ", grad_values) 70 | return None, grad_values, None, None, None 71 | 72 | 73 | class SpecialSpmmFinal(nn.Module): 74 | def forward(self, edge, edge_w, N, E, out_features): 75 | return SpecialSpmmFunctionFinal.apply(edge, edge_w, N, E, out_features) 76 | 77 | 78 | class SpGraphAttentionLayer(nn.Module): 79 | """ 80 | Sparse version GAT layer, similar to 
https://arxiv.org/abs/1710.10903 81 | """ 82 | 83 | def __init__(self, num_nodes, in_features, out_features, nrela_dim, dropout, alpha, concat=True): 84 | super(SpGraphAttentionLayer, self).__init__() 85 | self.in_features = in_features 86 | self.out_features = out_features 87 | self.num_nodes = num_nodes 88 | self.alpha = alpha 89 | self.concat = concat 90 | self.nrela_dim = nrela_dim 91 | 92 | self.a = nn.Parameter(torch.zeros( 93 | size=(out_features, 2 * in_features + nrela_dim))) 94 | nn.init.xavier_normal_(self.a.data, gain=1.414) 95 | self.a_2 = nn.Parameter(torch.zeros(size=(1, out_features))) 96 | nn.init.xavier_normal_(self.a_2.data, gain=1.414) 97 | 98 | self.dropout = nn.Dropout(dropout) 99 | self.leakyrelu = nn.LeakyReLU(self.alpha) 100 | self.special_spmm_final = SpecialSpmmFinal() 101 | 102 | def forward(self, input, edge, edge_embed, edge_list_nhop, edge_embed_nhop): 103 | N = input.size()[0] 104 | 105 | # Self-attention on the nodes - Shared attention mechanism 106 | edge = torch.cat((edge[:, :], edge_list_nhop[:, :]), dim=1) 107 | edge_embed = torch.cat( 108 | (edge_embed[:, :], edge_embed_nhop[:, :]), dim=0) 109 | 110 | edge_h = torch.cat( 111 | (input[edge[0, :], :], input[edge[1, :], :], edge_embed[:, :]), dim=1).t() 112 | # edge_h: (2*in_dim + nrela_dim) x E 113 | 114 | edge_m = self.a.mm(edge_h) 115 | # edge_m: D * E 116 | 117 | # to be checked later 118 | powers = -self.leakyrelu(self.a_2.mm(edge_m).squeeze()) 119 | edge_e = torch.exp(powers).unsqueeze(1) 120 | # assert not torch.isnan(edge_e).any() 121 | # edge_e: E 122 | 123 | e_rowsum = self.special_spmm_final( 124 | edge, edge_e, N, edge_e.shape[0], 1) 125 | e_rowsum[e_rowsum == 0.0] = 1e-12 126 | 127 | e_rowsum = e_rowsum 128 | # e_rowsum: N x 1 129 | edge_e = edge_e.squeeze(1) 130 | 131 | edge_e = self.dropout(edge_e) 132 | # edge_e: E 133 | 134 | edge_w = (edge_e * edge_m).t() 135 | # edge_w: E * D 136 | 137 | h_prime = self.special_spmm_final( 138 | edge, edge_w, N, edge_w.shape[0], self.out_features) 139 | 140 | # assert not torch.isnan(h_prime).any() 141 | # h_prime: N x out 142 | h_prime = h_prime.div(e_rowsum) 143 | # h_prime: N x out 144 | 145 | # assert not torch.isnan(h_prime).any() 146 | if self.concat: 147 | # if this layer is not last layer, 148 | return F.elu(h_prime) 149 | else: 150 | # if this layer is last layer, 151 | return h_prime 152 | 153 | def __repr__(self): 154 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 155 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/layers.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers2.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import time 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | 9 | CUDA = torch.cuda.is_available() 10 | 11 | 12 | class ConvKB(nn.Module): 13 | def __init__(self, input_dim, input_seq_len, in_channels, out_channels, drop_prob, alpha_leaky): 14 | super().__init__() 15 | 16 | self.conv_layer = nn.Conv2d( 17 | in_channels, out_channels, (1, input_seq_len)) # kernel size -> 1*input_seq_length(i.e. 
2) 18 | self.dropout = nn.Dropout(drop_prob) 19 | self.non_linearity = nn.ReLU() 20 | self.fc_layer = nn.Linear((input_dim) * out_channels, 1) 21 | 22 | nn.init.xavier_uniform_(self.fc_layer.weight, gain=1.414) 23 | nn.init.xavier_uniform_(self.conv_layer.weight, gain=1.414) 24 | 25 | def forward(self, conv_input): 26 | 27 | batch_size, length, dim = conv_input.size() 28 | # assuming inputs are of the form -> 29 | conv_input = conv_input.transpose(1, 2) 30 | # batch * length(which is 3 here -> entity,relation,entity) * dim 31 | # To make tensor of size 4, where second dim is for input channels 32 | conv_input = conv_input.unsqueeze(1) 33 | 34 | out_conv = self.dropout( 35 | self.non_linearity(self.conv_layer(conv_input))) 36 | 37 | input_fc = out_conv.squeeze(-1).view(batch_size, -1) 38 | output = self.fc_layer(input_fc) 39 | return output 40 | 41 | 42 | class SpecialSpmmFunctionFinal(torch.autograd.Function): 43 | """Special function for sparse-region backpropagation only.""" 44 | @staticmethod 45 | def forward(ctx, edge, edge_w, N, E, out_features): 46 | # assert indices.requires_grad == False 47 | # print('edge', edge.shape, 'edge_w', edge_w.shape, N, E, out_features) 48 | a = torch.sparse_coo_tensor( 49 | edge, edge_w, torch.Size([N, N, out_features])) #####edge (2, 298431) edge_w (298431, embeddings) N:272115 E:298431 out_features:1 50 | b = torch.sparse.sum(a, dim=1) #####[N 1] 51 | ctx.N = b.shape[0] 52 | ctx.outfeat = b.shape[1] 53 | ctx.E = E 54 | ctx.indices = a._indices()[0, :] 55 | 56 | return b.to_dense() 57 | 58 | @staticmethod 59 | def backward(ctx, grad_output): 60 | grad_values = None 61 | if ctx.needs_input_grad[1]: 62 | edge_sources = ctx.indices 63 | 64 | if(CUDA): 65 | edge_sources = edge_sources.cuda() 66 | 67 | grad_values = grad_output[edge_sources] 68 | # grad_values = grad_values.view(ctx.E, ctx.outfeat) 69 | # print("Grad Outputs-> ", grad_output) 70 | # print("Grad values-> ", grad_values) 71 | return None, grad_values, None, None, None 72 | 73 | 74 | class SpecialSpmmFinal(nn.Module): 75 | def forward(self, edge, edge_w, N, E, out_features): 76 | return SpecialSpmmFunctionFinal.apply(edge, edge_w, N, E, out_features) 77 | 78 | 79 | 80 | 81 | 82 | class OurSpGraphAttentionLayer(nn.Module): 83 | """ 84 | Sparse version GAT layer, similar to https://arxiv.org/abs/1710.10903 85 | """ 86 | 87 | def __init__(self, num_nodes, in_features, out_features, nrela_dim, dropout, alpha, concat=True): 88 | super(OurSpGraphAttentionLayer, self).__init__() 89 | self.in_features = in_features 90 | self.out_features = out_features 91 | self.num_nodes = num_nodes 92 | self.alpha = alpha 93 | self.concat = concat 94 | self.nrela_dim = nrela_dim 95 | 96 | self.a = nn.Parameter(torch.zeros( 97 | size=(out_features, 2 * in_features + nrela_dim))) 98 | nn.init.xavier_normal_(self.a.data, gain=1.414) 99 | self.a_2 = nn.Parameter(torch.zeros(size=(1, out_features))) 100 | nn.init.xavier_normal_(self.a_2.data, gain=1.414) 101 | 102 | self.dropout = nn.Dropout(dropout) 103 | self.leakyrelu = nn.LeakyReLU(self.alpha) 104 | self.special_spmm_final = SpecialSpmmFinal() 105 | 106 | def forward(self, entity_embeddings, entity_list, relation_embed, target_relation_embed): 107 | N = entity_embeddings.size()[0] #####input = entity_embedding, edge = edge_list (bz, 2) 108 | 109 | # print('edge', edge.shape, 'edge_embed', edge_embed.shape, 'edge_embed_nhop', edge_embed_nhop.shape, 'edge_list_nhop', edge_list_nhop.shape) 110 | edge = entity_list ######[bz, 2] 111 | edge_embed = relation_embed 
####[bz, 100] 112 | 113 | edge_h = torch.cat((entity_embeddings[edge[:, 0], :], entity_embeddings[edge[:, 1], :], edge_embed[:, :]), dim=1).t() ####[300, bz] [source target relation] embedding 114 | # edge_h: (2*in_dim + nrela_dim) x E 115 | # print('edge_h', edge_h.shape, 'a', self.a.shape) 116 | edge_m = self.a.mm(edge_h) ######[out_features, bz] 117 | 118 | powers = -self.leakyrelu(self.a_2.mm(edge_m).squeeze(0)) 119 | edge_e = torch.exp(powers).unsqueeze(1) ####[bz, 1] 120 | # assert not torch.isnan(edge_e).any() 121 | # edge_e: E 122 | # print('edge after concat ', edge.shape, 'edge_e ', edge_e.shape) 123 | e_rowsum = self.special_spmm_final( 124 | edge.t(), edge_e, N, edge_e.shape[0], 1) 125 | e_rowsum[e_rowsum == 0.0] = 1e-12 126 | 127 | e_rowsum = e_rowsum 128 | # e_rowsum: N x 1 129 | edge_e = edge_e.squeeze(1) #####[298431] 130 | 131 | edge_e = self.dropout(edge_e) 132 | # edge_e: E 133 | 134 | edge_w = (edge_e * edge_m).t() #####[bz, out_features] 135 | # edge_w: E * D 136 | 137 | h_prime = self.special_spmm_final( 138 | edge.t(), edge_w, N, edge_w.shape[0], self.out_features) 139 | 140 | # assert not torch.isnan(h_prime).any() 141 | # h_prime: N x out 142 | h_prime = h_prime.div(e_rowsum) 143 | # h_prime: N x out 144 | 145 | # assert not torch.isnan(h_prime).any() 146 | if self.concat: 147 | # if this layer is not last layer, 148 | return F.elu(h_prime) 149 | else: 150 | # if this layer is last layer, 151 | return h_prime 152 | 153 | def __repr__(self): 154 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | class SpGraphAttentionLayer(nn.Module): 163 | """ 164 | Sparse version GAT layer, similar to https://arxiv.org/abs/1710.10903 165 | """ 166 | 167 | def __init__(self, num_nodes, in_features, out_features, nrela_dim, dropout, alpha, concat=True): 168 | super(SpGraphAttentionLayer, self).__init__() 169 | self.in_features = in_features 170 | self.out_features = out_features 171 | self.num_nodes = num_nodes 172 | self.alpha = alpha 173 | self.concat = concat 174 | self.nrela_dim = nrela_dim 175 | 176 | self.a = nn.Parameter(torch.zeros( 177 | size=(out_features, 2 * in_features + nrela_dim))) 178 | nn.init.xavier_normal_(self.a.data, gain=1.414) 179 | self.a_2 = nn.Parameter(torch.zeros(size=(1, out_features))) 180 | nn.init.xavier_normal_(self.a_2.data, gain=1.414) 181 | 182 | self.dropout = nn.Dropout(dropout) 183 | self.leakyrelu = nn.LeakyReLU(self.alpha) 184 | self.special_spmm_final = SpecialSpmmFinal() 185 | 186 | def forward(self, input, edge, edge_embed, edge_list_nhop, edge_embed_nhop): 187 | N = input.size()[0] #####input = entity_embedding, edge = edge_list (2, 272115), edge_embed_nhop (26316, 100) 188 | 189 | # Self-attention on the nodes - Shared attention mechanism 190 | # print('edge', edge.shape, 'edge_embed', edge_embed.shape, 'edge_embed_nhop', edge_embed_nhop.shape, 'edge_list_nhop', edge_list_nhop.shape) 191 | edge = torch.cat((edge[:, :], edge_list_nhop[:, :]), dim=1) ######[2, 298431] 192 | edge_embed = torch.cat((edge_embed[:, :], edge_embed_nhop[:, :]), dim=0) ####[298431, 100] 193 | 194 | edge_h = torch.cat((input[edge[0, :], :], input[edge[1, :], :], edge_embed[:, :]), dim=1).t() ####[300, 298431] [source target relation] embedding 195 | # edge_h: (2*in_dim + nrela_dim) x E 196 | # print('edge_h', edge_h.shape) 197 | edge_m = self.a.mm(edge_h) ######[out_features, 298431] 198 | # edge_m: D * E 199 | 200 | # to be checked later 201 | powers = 
-self.leakyrelu(self.a_2.mm(edge_m).squeeze()) 202 | edge_e = torch.exp(powers).unsqueeze(1) ####[298431, 1] 203 | assert not torch.isnan(edge_e).any() 204 | # edge_e: E 205 | # print('edge after concat ', edge.shape, 'edge_e ', edge_e.shape) 206 | e_rowsum = self.special_spmm_final( 207 | edge, edge_e, N, edge_e.shape[0], 1) 208 | e_rowsum[e_rowsum == 0.0] = 1e-12 209 | 210 | e_rowsum = e_rowsum 211 | # e_rowsum: N x 1 212 | edge_e = edge_e.squeeze(1) #####[298431] 213 | 214 | edge_e = self.dropout(edge_e) 215 | # edge_e: E 216 | 217 | edge_w = (edge_e * edge_m).t() #####[298431, out_features] 218 | # edge_w: E * D 219 | 220 | h_prime = self.special_spmm_final( 221 | edge, edge_w, N, edge_w.shape[0], self.out_features) 222 | 223 | assert not torch.isnan(h_prime).any() 224 | # h_prime: N x out 225 | h_prime = h_prime.div(e_rowsum) 226 | # h_prime: N x out 227 | 228 | assert not torch.isnan(h_prime).any() 229 | if self.concat: 230 | # if this layer is not last layer, 231 | return F.elu(h_prime) 232 | else: 233 | # if this layer is last layer, 234 | return h_prime 235 | 236 | def __repr__(self): 237 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 238 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers2.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/layers2.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/models.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/models.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/models4.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/models4.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/preprocess.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess2.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import os 3 | import numpy as np 4 | 5 | 6 | def read_entity_from_id(filename='./data/WN18RR/entity2id_inductive.txt'): 7 | entity2id = {} 8 | with open(filename, 'r') as f: 9 | for line in f: 10 | if len(line.strip().split('\t')) > 1: 11 | entity, entity_id = line.strip().split('\t' 12 | )[0].strip(), line.strip().split('\t')[1].strip() 13 | entity2id[entity] = int(entity_id) 14 | return entity2id 15 | 16 | 17 | def read_relation_from_id(filename='./data/WN18RR/relation2id_inductive.txt'): 18 | relation2id = {} 19 | with open(filename, 'r') as f: 20 | for line in f: 21 | if len(line.strip().split('\t')) > 1: 22 | relation, relation_id = line.strip().split('\t' 23 | )[0].strip(), line.strip().split('\t')[1].strip() 24 | relation2id[relation] 
= int(relation_id) 25 | return relation2id 26 | 27 | 28 | def init_embeddings(entity_file, relation_file): 29 | entity_emb, relation_emb = [], [] 30 | 31 | with open(entity_file) as f: 32 | for line in f: 33 | entity_emb.append([float(val) for val in line.strip().split('\t')]) 34 | 35 | with open(relation_file) as f: 36 | for line in f: 37 | relation_emb.append([float(val) for val in line.strip().split('\t')]) 38 | 39 | return np.array(entity_emb, dtype=np.float32), np.array(relation_emb, dtype=np.float32) 40 | 41 | 42 | def parse_line(line): 43 | line = line.strip().split('\t') 44 | e1, relation, e2 = line[0].strip(), line[1].strip(), line[2].strip() 45 | return e1, relation, e2 46 | 47 | 48 | def load_data(filename, entity2id, relation2id, is_unweigted=False, directed=True): 49 | with open(filename) as f: 50 | lines = f.readlines() 51 | 52 | # this is the list of relation triples 53 | triples_data = [] 54 | 55 | # for sparse tensor, rows list contains corresponding row of sparse tensor, cols list contains corresponding 56 | # column of sparse tensor, data contains the type of relation 57 | # Adjacency matrix of entities is undirected, as the source and tail entities should know the relation 58 | # type they are connected with 59 | rows, cols, data = [], [], [] 60 | unique_entities = set() 61 | for line in lines: 62 | e1, relation, e2 = parse_line(line) 63 | unique_entities.add(e1) 64 | unique_entities.add(e2) 65 | triples_data.append( 66 | (entity2id[e1], relation2id[relation], entity2id[e2])) 67 | if not directed: 68 | # Connecting source and tail entity 69 | rows.append(entity2id[e1]) 70 | cols.append(entity2id[e2]) 71 | if is_unweigted: 72 | data.append(1) 73 | else: 74 | data.append(relation2id[relation]) 75 | 76 | # Connecting tail and source entity 77 | rows.append(entity2id[e2]) 78 | cols.append(entity2id[e1]) 79 | if is_unweigted: 80 | data.append(1) 81 | else: 82 | data.append(relation2id[relation]) 83 | 84 | print("number of unique_entities ->", len(unique_entities)) 85 | return triples_data, (rows, cols, data), list(unique_entities) 86 | 87 | 88 | def build_data(path='./data/WN18RR/', is_unweigted=False, directed=True): 89 | entity2id = read_entity_from_id(path + 'entity2id_inductive.txt') 90 | relation2id = read_relation_from_id(path + 'relation2id_inductive.txt') 91 | 92 | # Adjacency matrix only required for training phase 93 | # Currently creating as unweighted, undirected 94 | train_triples, train_adjacency_mat, unique_entities_train = load_data(os.path.join( 95 | path, 'train_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 96 | validation_triples, valid_adjacency_mat, unique_entities_validation = load_data( 97 | os.path.join(path, 'valid_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 98 | test_triples, test_adjacency_mat, unique_entities_test = load_data(os.path.join( 99 | path, 'test_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 100 | # all_triples, all_adjacency_mat, unique_entities_all = load_data(os.path.join( 101 | # path, 'all_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 102 | 103 | id2entity = {v: k for k, v in entity2id.items()} 104 | id2relation = {v: k for k, v in relation2id.items()} 105 | left_entity, right_entity = {}, {} 106 | 107 | with open(os.path.join(path, 'train_inductive.txt')) as f: 108 | lines = f.readlines() 109 | 110 | for line in lines: 111 | e1, relation, e2 = parse_line(line) 112 | 113 | # Count number of occurrences for each (e1, relation) 114 | if relation2id[relation] 
not in left_entity: 115 | left_entity[relation2id[relation]] = {} 116 | if entity2id[e1] not in left_entity[relation2id[relation]]: 117 | left_entity[relation2id[relation]][entity2id[e1]] = 0 118 | left_entity[relation2id[relation]][entity2id[e1]] += 1 119 | 120 | # Count number of occurrences for each (relation, e2) 121 | if relation2id[relation] not in right_entity: 122 | right_entity[relation2id[relation]] = {} 123 | if entity2id[e2] not in right_entity[relation2id[relation]]: 124 | right_entity[relation2id[relation]][entity2id[e2]] = 0 125 | right_entity[relation2id[relation]][entity2id[e2]] += 1 126 | 127 | left_entity_avg = {} 128 | for i in range(len(relation2id)): 129 | left_entity_avg[i] = sum( 130 | left_entity[i].values()) * 1.0 / len(left_entity[i]) 131 | 132 | right_entity_avg = {} 133 | for i in range(len(relation2id)): 134 | right_entity_avg[i] = sum( 135 | right_entity[i].values()) * 1.0 / len(right_entity[i]) 136 | 137 | headTailSelector = {} 138 | for i in range(len(relation2id)): 139 | headTailSelector[i] = 1000 * right_entity_avg[i] / \ 140 | (right_entity_avg[i] + left_entity_avg[i]) 141 | 142 | return (train_triples, train_adjacency_mat), (validation_triples, valid_adjacency_mat), (test_triples, test_adjacency_mat), \ 143 | entity2id, relation2id, headTailSelector, unique_entities_train, unique_entities_validation, unique_entities_test 144 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess2.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/preprocess2.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess_inductive.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import os 3 | import numpy as np 4 | from random import sample 5 | 6 | def read_entity_from_id(filename='./data/WN18RR/entity2id.txt'): 7 | entity2id = {} 8 | with open(filename, 'r') as f: 9 | for line in f: 10 | if len(line.strip().split('\t')) > 1: 11 | entity, entity_id = line.strip().split('\t' 12 | )[0].strip(), line.strip().split('\t')[1].strip() 13 | entity2id[entity] = int(entity_id) 14 | return entity2id 15 | 16 | 17 | def read_relation_from_id(filename='./data/WN18RR/relation2id.txt'): 18 | relation2id = {} 19 | with open(filename, 'r') as f: 20 | for line in f: 21 | if len(line.strip().split('\t')) > 1: 22 | relation, relation_id = line.strip().split('\t' 23 | )[0].strip(), line.strip().split('\t')[1].strip() 24 | relation2id[relation] = int(relation_id) 25 | return relation2id 26 | 27 | 28 | def init_embeddings(entity_file, relation_file): 29 | entity_emb, relation_emb = [], [] 30 | 31 | with open(entity_file) as f: 32 | for line in f: 33 | entity_emb.append([float(val) for val in line.strip().split('\t')]) 34 | 35 | with open(relation_file) as f: 36 | for line in f: 37 | relation_emb.append([float(val) for val in line.strip().split('\t')]) 38 | 39 | return np.array(entity_emb, dtype=np.float32), np.array(relation_emb, dtype=np.float32) 40 | 41 | 42 | def parse_line(line): 43 | line = line.strip().split('\t') 44 | e1, relation, e2 = line[0].strip(), line[1].strip(), line[2].strip() 45 | return e1, relation, e2 46 | 47 | 48 | def get_id(filename, is_unweigted=False, directed=True, saved_relation2id=None): 49 | 50 | entity2id = {} 51 | relation2id = {} if 
saved_relation2id is None else saved_relation2id 52 | 53 | triples_data = {} 54 | rows, cols, data = [], [], [] 55 | unique_entities = set() 56 | 57 | ent = 0 58 | rel = 0 59 | 60 | for filename1 in filename: 61 | 62 | data = [] 63 | with open(filename1) as f: 64 | file_data = [line.split() for line in f.read().split('\n')[:-1]] 65 | 66 | for triplet in file_data: 67 | if triplet[0] not in entity2id: 68 | entity2id[triplet[0]] = ent 69 | ent += 1 70 | if triplet[2] not in entity2id: 71 | entity2id[triplet[2]] = ent 72 | ent += 1 73 | if not saved_relation2id and triplet[1] not in relation2id: 74 | relation2id[triplet[1]] = rel 75 | rel += 1 76 | 77 | # Save the triplets corresponding to only the known relations 78 | if triplet[1] in relation2id: 79 | data.append([entity2id[triplet[0]], entity2id[triplet[2]], relation2id[triplet[1]]]) 80 | 81 | # triplets[file_type] = np.array(data) 82 | 83 | id2entity = {v: k for k, v in entity2id.items()} 84 | id2relation = {v: k for k, v in relation2id.items()} 85 | return entity2id, relation2id, rel 86 | 87 | 88 | def load_data(filename, entity2id, relation2id, is_unweigted=False, directed=True): 89 | with open(filename) as f: 90 | lines = [line.split() for line in f.read().split('\n')[:-1]] 91 | 92 | triples_data = [] 93 | 94 | rows, cols, data = [], [], [] 95 | unique_entities = set() 96 | for line in lines: 97 | e1, relation, e2 = line[0], line[1], line[2] 98 | unique_entities.add(e1) 99 | unique_entities.add(e2) 100 | triples_data.append( 101 | (entity2id[e1], relation2id[relation], entity2id[e2])) 102 | if not directed: 103 | # Connecting source and tail entity 104 | rows.append(entity2id[e1]) 105 | cols.append(entity2id[e2]) 106 | if is_unweigted: 107 | data.append(1) 108 | else: 109 | data.append(relation2id[relation]) 110 | 111 | # Connecting tail and source entity 112 | rows.append(entity2id[e2]) 113 | cols.append(entity2id[e1]) 114 | if is_unweigted: 115 | data.append(1) 116 | else: 117 | data.append(relation2id[relation]) 118 | 119 | print("number of unique_entities ->", len(unique_entities)) 120 | return triples_data, (rows, cols, data), list(unique_entities) 121 | 122 | 123 | def build_data(path='./data/FB15k-237-inductive-v3/', is_unweigted=False, directed=True): 124 | # entity2id = read_entity_from_id(path + 'entity2id.txt') 125 | # relation2id = read_relation_from_id(path + 'relation2id.txt') 126 | entity2id, relation2id, rel = get_id([os.path.join(path, 'train.txt'), os.path.join(path, 'valid.txt'), os.path.join(path, 'train_inductive.txt')]) 127 | 128 | train_triples, train_adjacency_mat, unique_entities_train = load_data(os.path.join(path, 'train.txt'), entity2id, relation2id, is_unweigted, directed) 129 | validation_triples, valid_adjacency_mat, unique_entities_validation = load_data(os.path.join(path, 'valid.txt'), entity2id, relation2id, is_unweigted, directed) 130 | _, test_adjacency_mat, unique_entities_test = load_data(os.path.join(path, 'train_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 131 | 132 | test_links, _, _ = load_data(os.path.join(path, 'test_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 133 | 134 | test_triples = test_links 135 | 136 | id2entity = {v: k for k, v in entity2id.items()} 137 | id2relation = {v: k for k, v in relation2id.items()} 138 | left_entity, right_entity = {}, {} 139 | 140 | return (train_triples, train_adjacency_mat), (validation_triples, valid_adjacency_mat), (test_triples, test_adjacency_mat), \ 141 | entity2id, relation2id, rel, unique_entities_train, 
unique_entities_validation, unique_entities_test 142 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/utils.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/utils.pyc -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 TmacMai 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | News: version 2 has been uploaded. 2 | 3 | 4 | Our CoMPILE has two versions. 5 | 6 | 7 | 8 | #################################version 1######################################### 9 | 10 | The first version is implemented on top of GraIL (https://github.com/kkteru/grail); we evaluate our message-passing model on the original inductive datasets proposed by the GraIL authors, and we thank them very much for sharing their code. 11 | 12 | To run the code, first unrar data.rar and place the extracted folder under CoMPILE_github. 13 | 14 | To train the model (taking the FB15k-237 inductive v4 dataset as an example): 15 | 16 | python train.py -d fb237_v4 -e compile_fb_v4_ind 17 | 18 | 19 | To evaluate the AUC score of the trained model: 20 | 21 | python test_auc.py -d fb237_v4_ind -e compile_fb_v4_ind 22 | 23 | 24 | 25 | To evaluate the Hits@10 score of the trained model: 26 | 27 | python test_ranking.py -d fb237_v4_ind -e compile_fb_v4_ind 28 | 29 | 30 | #################################version 2######################################### 31 | 32 | In version 2, we implement our full inductive learning system, including data filtering, directed subgraph extraction, and the message-passing mechanism. 33 | 34 | To be updated...
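35 | 
36 | Until the version 2 code is fully documented, the sketch below shows what one step of directed subgraph extraction can look like. It is a minimal illustration only, not the implementation used in this repository: the function name, the networkx dependency, and the 3-hop default are assumptions made for the example.
37 | 
38 |     import networkx as nx
39 | 
40 |     def directed_enclosing_subgraph(graph, head, tail, k=3):
41 |         # `graph` is assumed to be an nx.MultiDiGraph built from the training triplets.
42 |         # Nodes reachable from `head` within k hops, following edge direction:
43 |         fwd = nx.single_source_shortest_path_length(graph, head, cutoff=k)
44 |         # Nodes that can reach `tail` within k hops (searched on the reversed view):
45 |         bwd = nx.single_source_shortest_path_length(graph.reverse(copy=False), tail, cutoff=k)
46 |         # The directed enclosing subgraph is induced by the intersection of the two node sets:
47 |         nodes = (set(fwd) & set(bwd)) | {head, tail}
48 |         return graph.subgraph(nodes).copy()
49 | 
50 | The message passing model then scores a candidate triplet on the subgraph extracted around it.
51 | 
--------------------------------------------------------------------------------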