├── CoMPILE_github ├── data.rar ├── ensembling │ ├── blend.py │ ├── compute_auc.py │ ├── compute_rank_metrics.py │ ├── get_ensemble_predictions.sh │ ├── get_kge_ensemble.sh │ └── score_triplets_kge.py ├── experiments │ ├── compile_nell_v4_ind │ │ ├── best_graph_classifier.pth │ │ ├── graph_classifier_chk.pth │ │ ├── log_rank_test_1642565763.9744015.txt │ │ ├── log_train.txt │ │ ├── params.json │ │ └── test_nell_v4_ind_0 │ │ │ └── log_test.txt │ └── compile_nell_v4_ind2 │ │ ├── best_graph_classifier.pth │ │ ├── graph_classifier_chk.pth │ │ ├── log_rank_test_1642603455.548011.txt │ │ ├── log_rank_test_1642647880.451746.txt │ │ ├── log_rank_test_1642679948.7119958.txt │ │ ├── log_rank_test_1642733457.0139189.txt │ │ ├── log_rank_test_1642832066.1408617.txt │ │ ├── log_rank_test_1642841084.211995.txt │ │ ├── log_rank_test_1642850740.4142146.txt │ │ ├── log_rank_test_1642855466.9402952.txt │ │ ├── log_rank_test_1642859913.6085432.txt │ │ ├── log_rank_test_1642908893.5786257.txt │ │ ├── log_train.txt │ │ ├── params.json │ │ └── test_nell_v4_ind_0 │ │ └── log_test.txt ├── kge │ ├── README.md │ ├── dataloader.py │ ├── model.py │ ├── run.py │ └── run.sh ├── managers │ ├── __pycache__ │ │ ├── evaluator.cpython-36.pyc │ │ └── trainer.cpython-36.pyc │ ├── evaluator.py │ └── trainer.py ├── model │ └── dgl │ │ ├── aggregators.py │ │ ├── graph_classifier.py │ │ ├── layers.py │ │ └── rgcn_model.py ├── requirements.txt ├── subgraph_extraction │ ├── __pycache__ │ │ ├── datasets.cpython-36.pyc │ │ └── graph_sampler.cpython-36.pyc │ ├── datasets.py │ └── graph_sampler.py ├── test_auc.py ├── test_ranking.py ├── train.py └── utils │ ├── __pycache__ │ ├── data_utils.cpython-36.pyc │ ├── dgl_utils.cpython-36.pyc │ ├── graph_utils.cpython-36.pyc │ └── initialization_utils.cpython-36.pyc │ ├── clean_data.py │ ├── data_utils.py │ ├── dgl_utils.py │ ├── graph_utils.py │ ├── initialization_utils.py │ └── prepare_meta_data.py ├── CoMPILE_v2 ├── README.md ├── data │ ├── FB15k-237-inductive-v1 │ │ ├── relation2id.json │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── FB15k-237-inductive-v2 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── FB15k-237-inductive-v3 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── FB15k-237-v1_3_hop_new_data.pickle │ ├── FB15k-237-v1_3_hop_total_test_head2.pickle │ ├── FB15k-237-v1_3_hop_total_test_tail2.pickle │ ├── FB15k-237-v1_3_total_train_data.pickle │ ├── FB15k-237-v1_3_total_train_label.pickle │ ├── FB15k-237-v1_3_total_val_data.pickle │ ├── FB15k-237-v1_3_total_val_label.pickle │ ├── nell-inductive-v1 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ ├── nell-inductive-v2 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt │ └── nell-inductive-v3 │ │ ├── test.txt │ │ ├── test_inductive.txt │ │ ├── train.txt │ │ ├── train_inductive.txt │ │ ├── valid.txt │ │ └── valid_inductive.txt ├── log │ └── FB15k-237-v1_error_now.log └── train │ ├── __pycache__ │ ├── create_batch_inductive2.cpython-37.pyc │ ├── layers.cpython-37.pyc │ ├── layers2.cpython-37.pyc │ ├── models4.cpython-37.pyc │ ├── preprocess2.cpython-37.pyc │ ├── 
preprocess_inductive.cpython-37.pyc │ └── utils.cpython-37.pyc │ ├── create_batch.pyc │ ├── create_batch2.pyc │ ├── create_batch_inductive2.py │ ├── create_dataset_files.py │ ├── inductive_subgraph8.py │ ├── layers.py │ ├── layers.pyc │ ├── layers2.py │ ├── layers2.pyc │ ├── main_compile.py │ ├── models.pyc │ ├── models4.py │ ├── models4.pyc │ ├── preprocess.pyc │ ├── preprocess2.py │ ├── preprocess2.pyc │ ├── preprocess_inductive.py │ ├── utils.py │ └── utils.pyc ├── LICENSE └── README.md /CoMPILE_github/data.rar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/data.rar -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/blend.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | 5 | import torch 6 | import torch.nn as nn 7 | import torch.optim as optim 8 | 9 | 10 | def read_scores(path): 11 | with open(path) as f: 12 | scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 13 | return scores 14 | 15 | 16 | def get_triplets(path): 17 | with open(path) as f: 18 | triplets = [line.split()[:-1] for line in f.read().split('\n')[:-1]] 19 | return triplets 20 | 21 | 22 | def train(params): 23 | ''' 24 | Train and save a linear layer model. 25 | ''' 26 | ens_model_1_pos_scores_path = os.path.join('../data/{}/{}_valid_predictions.txt'.format(params.dataset, params.ensemble_model_1)) 27 | ens_model_1_neg_scores_path = os.path.join('../data/{}/{}_neg_valid_0_predictions.txt'.format(params.dataset, params.ensemble_model_1)) 28 | ens_model_2_pos_scores_path = os.path.join('../data/{}/{}_valid_predictions.txt'.format(params.dataset, params.ensemble_model_2)) 29 | ens_model_2_neg_scores_path = os.path.join('../data/{}/{}_neg_valid_0_predictions.txt'.format(params.dataset, params.ensemble_model_2)) 30 | 31 | assert get_triplets(ens_model_1_pos_scores_path) == get_triplets(ens_model_2_pos_scores_path) 32 | assert get_triplets(ens_model_1_neg_scores_path) == get_triplets(ens_model_2_neg_scores_path) 33 | 34 | pos_scores = torch.Tensor(list(zip(read_scores(ens_model_1_pos_scores_path), read_scores(ens_model_2_pos_scores_path)))) 35 | neg_scores = torch.Tensor(list(zip(read_scores(ens_model_1_neg_scores_path), read_scores(ens_model_2_neg_scores_path)))) 36 | 37 | # scores = pos_scores + neg_scores 38 | # targets = [1] * len(pos_scores) + [0] * len(neg_scores) 39 | 40 | model = nn.Linear(in_features=2, out_features=1) 41 | criterion = nn.MarginRankingLoss(10, reduction='sum') 42 | optimizer = optim.Adam(model.parameters(), lr=0.1, weight_decay=5e-4) 43 | 44 | for e in range(params.num_epochs): 45 | pos_out = model(pos_scores) 46 | neg_out = model(neg_scores) 47 | 48 | loss = criterion(pos_out, neg_out.view(len(pos_out), -1).mean(dim=1), torch.Tensor([1])) 49 | print('Loss at epoch {} : {}'.format(e, loss)) 50 | optimizer.zero_grad() 51 | loss.backward() 52 | optimizer.step() 53 | 54 | torch.save(model, os.path.join('../experiments', f'{params.ensemble_model_1}_{params.ensemble_model_2}_{params.dataset}_ensemble.pth')) 55 | 56 | 57 | def score_triplets(params): 58 | ''' 59 | Load the saved model and save scores of given set of triplets. 
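Each output line is written as head, relation, tail and score, separated by tabs.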
60 | ''' 61 | print('Loading model..') 62 | model = torch.load(os.path.join('../experiments', f'{params.ensemble_model_1}_{params.ensemble_model_2}_{params.dataset}_ensemble.pth')) 63 | print('Model loaded successfully!') 64 | 65 | ens_model_1_scores_path = os.path.join('../data/{}/{}_{}_predictions.txt'.format(params.dataset, params.ensemble_model_1, params.file_to_score)) 66 | ens_model_2_scores_path = os.path.join('../data/{}/{}_{}_predictions.txt'.format(params.dataset, params.ensemble_model_2, params.file_to_score)) 67 | 68 | scores = torch.Tensor(list(zip(read_scores(ens_model_1_scores_path), read_scores(ens_model_2_scores_path)))) 69 | ens_scores = model(scores) 70 | 71 | ens_model_1_triplets = get_triplets(ens_model_1_scores_path) 72 | ens_model_2_triplets = get_triplets(ens_model_2_scores_path) 73 | 74 | assert ens_model_1_triplets == ens_model_2_triplets 75 | 76 | file_path = os.path.join('../', 'data/{}/{}_with_{}_{}_predictions.txt'.format(params.dataset, params.ensemble_model_1, params.ensemble_model_2, params.file_to_score)) 77 | with open(file_path, "w") as f: 78 | for ([s, r, o], score) in zip(ens_model_1_triplets, ens_scores): 79 | f.write('\t'.join([s, r, o, str(score.item())]) + '\n') 80 | 81 | 82 | if __name__ == '__main__': 83 | parser = argparse.ArgumentParser(description='Model blender script') 84 | 85 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 86 | parser.add_argument('--ensemble_model_1', '-em1', default='grail', type=str) 87 | parser.add_argument('--ensemble_model_2', '-em2', default='TransE', type=str) 88 | parser.add_argument('--do_train', action='store_true') 89 | parser.add_argument("--num_epochs", "-ne", type=int, default=500, 90 | help="Number of training iterations") 91 | parser.add_argument('--do_scoring', action='store_true') 92 | parser.add_argument('--file_to_score', '-f', default='valid', type=str) 93 | 94 | params = parser.parse_args() 95 | 96 | if params.do_train: 97 | train(params) 98 | elif params.do_scoring: 99 | score_triplets(params) 100 | -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/compute_auc.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from sklearn import metrics 3 | 4 | if __name__ == '__main__': 5 | parser = argparse.ArgumentParser(description='Compute AUC from scored positive and negative triplets') 6 | 7 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 8 | parser.add_argument('--model', '-m', default='ens', type=str) 9 | parser.add_argument('--test_file', '-t', default='test', type=str) 10 | 11 | params = parser.parse_args() 12 | 13 | # load pos and neg prediction scores of the test_file of the dataset for the given model. 
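# Each prediction file is assumed to hold one scored triplet per line (head, relation, tail and score, tab-separated), so the score is recovered below as the last whitespace-separated token.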
14 | with open('../data/{}/{}_{}_predictions.txt'.format(params.dataset, params.model, params.test_file)) as f: 15 | pos_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 16 | with open('../data/{}/{}_neg_{}_0_predictions.txt'.format(params.dataset, params.model, params.test_file)) as f: 17 | neg_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 18 | 19 | # compute auc score 20 | scores = pos_scores + neg_scores 21 | labels = [1] * len(pos_scores) + [0] * len(neg_scores) 22 | 23 | auc = metrics.roc_auc_score(labels, scores) 24 | auc_pr = metrics.average_precision_score(labels, scores) 25 | 26 | with open('../data/{}/{}_{}_auc.txt'.format(params.dataset, params.model, params.test_file), "w") as f: 27 | f.write('AUC : {}, AUC_PR : {}\n'.format(auc, auc_pr)) 28 | -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/compute_rank_metrics.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | from scipy.stats import rankdata 6 | 7 | 8 | def get_ranks(scores): 9 | ''' 10 | Given scores of head/tail substituted triplets, return ranks of each triplet. 11 | Assumes a fixed number of negative samples (50) 12 | ''' 13 | ranks = [] 14 | for i in range(len(scores) // 50): 15 | # rank = np.argwhere(np.argsort(scores[50 * i: 50 * (i + 1)])[::-1] == 0) + 1 16 | rank = 50 - rankdata(scores[50 * i: 50 * (i + 1)], method='min')[0] + 1 17 | ranks.append(rank) 18 | return ranks 19 | 20 | 21 | if __name__ == '__main__': 22 | parser = argparse.ArgumentParser(description='Compute ranking metrics from scored head/tail replaced triplets') 23 | 24 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 25 | parser.add_argument('--model', '-m', default='ens', type=str) 26 | 27 | params = parser.parse_args() 28 | 29 | # load head and tail prediction scores of the test file of the dataset for the given model. 30 | with open('../data/{}/{}_ranking_head_predictions.txt'.format(params.dataset, params.model)) as f: 31 | head_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 32 | with open('../data/{}/{}_ranking_tail_predictions.txt'.format(params.dataset, params.model)) as f: 33 | tail_scores = [float(line.split()[-1]) for line in f.read().split('\n')[:-1]] 34 | 35 | # compute both ranks from the prediction scores 36 | head_ranks = get_ranks(head_scores) 37 | tail_ranks = get_ranks(tail_scores) 38 | 39 | ranks = head_ranks + tail_ranks 40 | 41 | isHit1List = [x for x in ranks if x <= 1] 42 | isHit5List = [x for x in ranks if x <= 5] 43 | isHit10List = [x for x in ranks if x <= 10] 44 | hits_1 = len(isHit1List) / len(ranks) 45 | hits_5 = len(isHit5List) / len(ranks) 46 | hits_10 = len(isHit10List) / len(ranks) 47 | 48 | mrr = np.mean(1 / np.array(ranks)) 49 | 50 | with open('../data/{}/{}_ranking_metrics.txt'.format(params.dataset, params.model), "w") as f: 51 | f.write(f'MRR | Hits@1 | Hits@5 | Hits@10 : {mrr} | {hits_1} | {hits_5} | {hits_10}\n') 52 | -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/get_ensemble_predictions.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # This script assumes GraIL prediction scores on the validation and test set are already saved. 4 | # It also assumes that scored head/tail replaced triplets are also stored.
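# (Both kinds of score files are expected under data/<DATASET>/ with names of the form <model>_<file>_predictions.txt, e.g. grail_valid_predictions.txt.)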
5 | # If any of those is not present, run the corresponding script from the following setup commands. 6 | ##################### SET UP ##################### 7 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t valid 8 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t test 9 | 10 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t valid 11 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t test 12 | 13 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t valid 14 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t test 15 | 16 | # python test_ranking.py -d WN18RR -e saved_grail_exp_name --hop 3 17 | 18 | # python test_ranking.py -d NELL-995 -e saved_grail_exp_name --hop 2 19 | 20 | # python test_ranking.py -d FB15K237 -e saved_grail_exp_name --hop 1 21 | ################################################## 22 | 23 | 24 | # Arguments 25 | # Dataset 26 | DATASET=$1 27 | # KGE model to be used in ensemble 28 | KGE_MODEL=$2 29 | KGE_SAVED_MODEL_PATH="../experiments/kge_baselines/${KGE_MODEL}_${DATASET}" 30 | 31 | # score pos validation triplets with KGE model 32 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f valid -init $KGE_SAVED_MODEL_PATH 33 | # score neg validation triplets with KGE model 34 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f neg_valid_0 -init $KGE_SAVED_MODEL_PATH 35 | 36 | # train the ensemble model 37 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_train -ne 500 38 | 39 | # Score the test pos and neg triplets with KGE model 40 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f test -init $KGE_SAVED_MODEL_PATH 41 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f neg_test_0 -init $KGE_SAVED_MODEL_PATH 42 | # Score the test pos and neg triplets with ensemble model 43 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f test 44 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f neg_test_0 45 | # Compute auc with the ensemble model scored pos and neg test files 46 | python compute_auc.py -d $DATASET -m grail_with_${KGE_MODEL} 47 | # Compute auc with the KGE model scored pos and neg test files 48 | python compute_auc.py -d $DATASET -m $KGE_MODEL 49 | 50 | # Score head/tail replaced samples with KGE model 51 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f ranking_head -init $KGE_SAVED_MODEL_PATH 52 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL -f ranking_tail -init $KGE_SAVED_MODEL_PATH 53 | # Score head/tail replaced samples with ensemble model 54 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f ranking_head 55 | python blend.py -d $DATASET -em2 $KGE_MODEL --do_scoring -f ranking_tail 56 | # Compute ranking metrics for ensemble model with the scored head/tail replaced samples 57 | python compute_rank_metrics.py -d $DATASET -m grail_with_${KGE_MODEL} 58 | # Compute ranking metrics for KGE model with the scored head/tail replaced samples 59 | python compute_rank_metrics.py -d $DATASET -m $KGE_MODEL -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/get_kge_ensemble.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # This script assumes that head/tail replaced negative triplets are already stored while evaluating GraIL.
4 | # This assumption is made in order to make fair evaluations of all the methods on the same negative samples. 5 | # If any of those is not present, run the corresponding script from the following setup commands. These will 6 | # evaluate GraIL and save the negative samples along the way. 7 | ##################### SET UP ##################### 8 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t valid 9 | # python test_auc.py -d WN18RR -e saved_grail_exp_name --hop 3 -t test 10 | 11 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t valid 12 | # python test_auc.py -d NELL-995 -e saved_grail_exp_name --hop 2 -t test 13 | 14 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t valid 15 | # python test_auc.py -d FB15K237 -e saved_grail_exp_name --hop 1 -t test 16 | 17 | # python test_ranking.py -d WN18RR -e saved_grail_exp_name --hop 3 18 | 19 | # python test_ranking.py -d NELL-995 -e saved_grail_exp_name --hop 2 20 | 21 | # python test_ranking.py -d FB15K237 -e saved_grail_exp_name --hop 1 22 | ################################################## 23 | 24 | 25 | # Arguments 26 | # Dataset 27 | DATASET=$1 28 | # KGE model to be used in ensemble 29 | KGE_MODEL_1=$2 30 | KGE_SAVED_MODEL_PATH_1="../experiments/kge_baselines/${KGE_MODEL_1}_${DATASET}" 31 | 32 | KGE_MODEL_2=$3 33 | KGE_SAVED_MODEL_PATH_2="../experiments/kge_baselines/${KGE_MODEL_2}_${DATASET}" 34 | 35 | # score pos validation triplets with KGE model 36 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f valid -init $KGE_SAVED_MODEL_PATH_1 37 | # score neg validation triplets with KGE model 38 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f neg_valid_0 -init $KGE_SAVED_MODEL_PATH_1 39 | 40 | # score pos validation triplets with KGE model 41 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f valid -init $KGE_SAVED_MODEL_PATH_2 42 | # score neg validation triplets with KGE model 43 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f neg_valid_0 -init $KGE_SAVED_MODEL_PATH_2 44 | 45 | # train the ensemble model 46 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_train -ne 500 47 | 48 | # Score the test pos and neg triplets with KGE model 49 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f test -init $KGE_SAVED_MODEL_PATH_1 50 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f neg_test_0 -init $KGE_SAVED_MODEL_PATH_1 51 | # Score the test pos and neg triplets with KGE model 52 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f test -init $KGE_SAVED_MODEL_PATH_2 53 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f neg_test_0 -init $KGE_SAVED_MODEL_PATH_2 54 | 55 | 56 | # Score the test pos and neg triplets with ensemble model 57 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f test 58 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f neg_test_0 59 | # Compute auc with the ensemble model scored pos and neg test files 60 | python compute_auc.py -d $DATASET -m ${KGE_MODEL_1}_with_${KGE_MODEL_2} 61 | 62 | # Score head/tail replaced samples with KGE model 63 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f ranking_head -init $KGE_SAVED_MODEL_PATH_1 64 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_1 -f ranking_tail -init $KGE_SAVED_MODEL_PATH_1 65 | # Score head/tail replaced samples with KGE model 66 | python score_triplets_kge.py -d
$DATASET --model $KGE_MODEL_2 -f ranking_head -init $KGE_SAVED_MODEL_PATH_2 67 | python score_triplets_kge.py -d $DATASET --model $KGE_MODEL_2 -f ranking_tail -init $KGE_SAVED_MODEL_PATH_2 68 | 69 | 70 | # Score head/tail replaced samples with ensemble model 71 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f ranking_head 72 | python blend.py -d $DATASET -em1 $KGE_MODEL_1 -em2 $KGE_MODEL_2 --do_scoring -f ranking_tail 73 | # Compute ranking metrics for ensemble model with the scored head/tail replaced samples 74 | python compute_rank_metrics.py -d $DATASET -m ${KGE_MODEL_1}_with_${KGE_MODEL_2} -------------------------------------------------------------------------------- /CoMPILE_github/ensembling/score_triplets_kge.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | 7 | import sys 8 | sys.path.insert(1, '../') 9 | 10 | import argparse 11 | import json 12 | import logging 13 | import os 14 | 15 | import torch 16 | 17 | from kge.model import KGEModel 18 | 19 | from utils.data_utils import process_files 20 | 21 | 22 | def parse_args(args=None): 23 | parser = argparse.ArgumentParser( 24 | description='Training and Testing Knowledge Graph Embedding Models', 25 | usage='train.py [<args>] [-h | --help]' 26 | ) 27 | 28 | parser.add_argument('--cuda', action='store_true', help='use GPU') 29 | 30 | parser.add_argument('--dataset', '-d', type=str, default='Toy') 31 | parser.add_argument('--model', '-m', default='TransE', type=str) 32 | parser.add_argument('--file_to_score', '-f', default='test', type=str) 33 | parser.add_argument('--init_checkpoint', '-init', default=None, type=str) 34 | 35 | return parser.parse_args(args) 36 | 37 | 38 | def override_config(args): 39 | ''' 40 | Override model and data configuration 41 | ''' 42 | 43 | with open(os.path.join(args.init_checkpoint, 'config.json'), 'r') as fjson: 44 | argparse_dict = json.load(fjson) 45 | 46 | args.countries = argparse_dict['countries'] 47 | if args.dataset is None: 48 | args.dataset = argparse_dict['dataset'] 49 | args.model = argparse_dict['model'] 50 | args.double_entity_embedding = argparse_dict['double_entity_embedding'] 51 | args.double_relation_embedding = argparse_dict['double_relation_embedding'] 52 | args.hidden_dim = argparse_dict['hidden_dim'] 53 | args.test_batch_size = argparse_dict['test_batch_size'] 54 | args.gamma = argparse_dict['gamma'] 55 | 56 | 57 | def set_logger(args): 58 | ''' 59 | Write logs to checkpoint and console 60 | ''' 61 | log_file = os.path.join(args.init_checkpoint, 'score_{}.log'.format(args.file_to_score)) 62 | 63 | logging.basicConfig( 64 | format='%(asctime)s %(levelname)-8s %(message)s', 65 | level=logging.INFO, 66 | datefmt='%Y-%m-%d %H:%M:%S', 67 | filename=log_file, 68 | filemode='w' 69 | ) 70 | console = logging.StreamHandler() 71 | console.setLevel(logging.INFO) 72 | formatter = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s') 73 | console.setFormatter(formatter) 74 | logging.getLogger('').addHandler(console) 75 | 76 | 77 | def read_triple(file_path, entity2id, relation2id): 78 | ''' 79 | Read triples and map them into ids.
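Each input line is expected to be tab-separated: head, relation, tail.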
80 | ''' 81 | triples = [] 82 | with open(file_path) as fin: 83 | for line in fin: 84 | h, r, t = line.strip().split('\t') 85 | triples.append((entity2id[h], relation2id[r], entity2id[t])) 86 | return triples 87 | 88 | 89 | def main(args): 90 | if args.init_checkpoint: 91 | override_config(args) 92 | elif args.dataset is None: 93 | raise ValueError('one of init_checkpoint/dataset must be chosen.') 94 | 95 | # Write logs to checkpoint and console 96 | set_logger(args) 97 | 98 | main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))), '.') 99 | 100 | with open(os.path.join(main_dir, 'data/{}/entities.dict'.format(args.dataset))) as fin: 101 | entity2id = dict() 102 | for line in fin: 103 | eid, entity = line.strip().split('\t') 104 | entity2id[entity] = int(eid) 105 | 106 | with open(os.path.join(main_dir, 'data/{}/relations.dict'.format(args.dataset))) as fin: 107 | relation2id = dict() 108 | for line in fin: 109 | rid, relation = line.strip().split('\t') 110 | relation2id[relation] = int(rid) 111 | 112 | # test_triples = to_kge_format(triplets['to_score']) 113 | test_triples = read_triple(os.path.join(main_dir, 'data/{}/{}.txt'.format(args.dataset, args.file_to_score)), entity2id, relation2id) 114 | 115 | nentity = len(entity2id) 116 | nrelation = len(relation2id) 117 | args.nentity = nentity 118 | args.nrelation = nrelation 119 | 120 | logging.info('Model: %s' % args.model) 121 | logging.info('Data Path: %s' % args.dataset) 122 | logging.info('#entity: %d' % nentity) 123 | logging.info('#relation: %d' % nrelation) 124 | 125 | kge_model = KGEModel( 126 | model_name=args.model, 127 | nentity=nentity, 128 | nrelation=nrelation, 129 | hidden_dim=args.hidden_dim, 130 | gamma=args.gamma, 131 | double_entity_embedding=args.double_entity_embedding, 132 | double_relation_embedding=args.double_relation_embedding 133 | ) 134 | 135 | logging.info('Model Parameter Configuration:') 136 | for name, param in kge_model.named_parameters(): 137 | logging.info('Parameter %s: %s, requires_grad = %s' % (name, str(param.size()), str(param.requires_grad))) 138 | 139 | if args.cuda: 140 | kge_model = kge_model.cuda() 141 | 142 | # Restore model from checkpoint directory 143 | logging.info('Loading checkpoint %s...'
% args.init_checkpoint) 144 | checkpoint = torch.load(os.path.join(args.init_checkpoint, 'checkpoint')) 145 | kge_model.load_state_dict(checkpoint['model_state_dict']) 146 | logging.info('Scoring the triplets in {}.txt file'.format(args.file_to_score)) 147 | scores = kge_model.score_triplets(kge_model, test_triples, args) 148 | 149 | with open(os.path.join(main_dir, 'data/{}/{}.txt'.format(args.dataset, args.file_to_score))) as f: 150 | triplets = [line.split() for line in f.read().split('\n')[:-1]] 151 | file_path = os.path.join(main_dir, 'data/{}/{}_{}_predictions.txt'.format(args.dataset, args.model, args.file_to_score)) 152 | with open(file_path, "w") as f: 153 | for ([s, r, o], score) in zip(triplets, scores): 154 | f.write('\t'.join([s, r, o, str(score)]) + '\n') 155 | 156 | 157 | if __name__ == '__main__': 158 | main(parse_args()) 159 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/best_graph_classifier.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind/best_graph_classifier.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/graph_classifier_chk.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind/graph_classifier_chk.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/log_rank_test_1642565763.9744015.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.5919081837688219 | 0.5314637482900136 | 0.6354309165526676 | 0.6470588235294118 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/log_train.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_ht_emb: True 3 | add_traspose_rels: False 4 | attn_rel_emb_dim: 32 5 | batch_size: 16 6 | clip: 1000 7 | constrained_neg_prob: 0.0 8 | dataset: nell_v4 9 | disable_cuda: False 10 | dropout: 0 11 | early_stop: 100 12 | edge_dropout: 0.5 13 | emb_dim: 32 14 | enclosing_sub_graph: True 15 | eval_every: 1 16 | eval_every_iter: 455 17 | exp_dir: utils/../experiments/compile_nell_v4_ind 18 | experiment_name: compile_nell_v4_ind 19 | gnn_agg_type: sum 20 | gpu: 0 21 | has_attn: True 22 | hop: 3 23 | kge_model: TransE 24 | l2: 0.0005 25 | load_model: False 26 | lr: 0.001 27 | main_dir: utils/.. 
28 | margin: 10 29 | max_links: 10000000 30 | max_nodes_per_hop: None 31 | model_type: dgl 32 | num_bases: 4 33 | num_epochs: 30 34 | num_gcn_layers: 3 35 | num_neg_samples_per_link: 1 36 | num_workers: 0 37 | optimizer: Adam 38 | rel_emb_dim: 32 39 | save_every: 10 40 | train_file: train 41 | use_kge_embeddings: False 42 | valid_file: valid 43 | ============================================ 44 | Sampling negative links for train 45 | Sampling negative links for valid 46 | Extracting enclosing subgraphs for positive links in train set 47 | Extracting enclosing subgraphs for negative links in train set 48 | Extracting enclosing subgraphs for positive links in valid set 49 | Extracting enclosing subgraphs for negative links in valid set 50 | Max distance from sub : 3, Max distance from obj : 3 51 | Max distance from sub : 3, Max distance from obj : 3 52 | No existing model found. Initializing new model.. 53 | Device: cuda:0 54 | Input dim : 8, # Relations : 76, # Augmented relations : 76 55 | Total number of parameters: 36609 56 | Starting training with full batch... 57 | ============ Initialized logger ============ 58 | add_ht_emb: True 59 | add_traspose_rels: False 60 | attn_rel_emb_dim: 32 61 | batch_size: 16 62 | clip: 1000 63 | constrained_neg_prob: 0.0 64 | dataset: nell_v4 65 | disable_cuda: False 66 | dropout: 0 67 | early_stop: 100 68 | edge_dropout: 0.5 69 | emb_dim: 32 70 | enclosing_sub_graph: True 71 | eval_every: 1 72 | eval_every_iter: 455 73 | exp_dir: utils/../experiments/compile_nell_v4_ind 74 | experiment_name: compile_nell_v4_ind 75 | gnn_agg_type: sum 76 | gpu: 0 77 | has_attn: True 78 | hop: 3 79 | kge_model: TransE 80 | l2: 0.0005 81 | load_model: False 82 | lr: 0.001 83 | main_dir: utils/.. 84 | margin: 10 85 | max_links: 10000000 86 | max_nodes_per_hop: None 87 | model_type: dgl 88 | num_bases: 4 89 | num_epochs: 30 90 | num_gcn_layers: 3 91 | num_neg_samples_per_link: 1 92 | num_workers: 0 93 | optimizer: Adam 94 | rel_emb_dim: 32 95 | save_every: 10 96 | train_file: train 97 | use_kge_embeddings: False 98 | valid_file: valid 99 | ============================================ 100 | Max distance from sub : 3, Max distance from obj : 3 101 | Max distance from sub : 3, Max distance from obj : 3 102 | No existing model found. Initializing new model.. 103 | Device: cuda:0 104 | Input dim : 8, # Relations : 76, # Augmented relations : 76 105 | Total number of parameters: 36609 106 | Starting training with full batch... 107 | ============ Initialized logger ============ 108 | add_ht_emb: True 109 | add_traspose_rels: False 110 | attn_rel_emb_dim: 32 111 | batch_size: 16 112 | clip: 1000 113 | constrained_neg_prob: 0.0 114 | dataset: nell_v4 115 | disable_cuda: False 116 | dropout: 0 117 | early_stop: 100 118 | edge_dropout: 0.5 119 | emb_dim: 32 120 | enclosing_sub_graph: True 121 | eval_every: 1 122 | eval_every_iter: 455 123 | exp_dir: utils/../experiments/compile_nell_v4_ind 124 | experiment_name: compile_nell_v4_ind 125 | gnn_agg_type: sum 126 | gpu: 0 127 | has_attn: True 128 | hop: 3 129 | kge_model: TransE 130 | l2: 0.0005 131 | load_model: False 132 | lr: 0.001 133 | main_dir: utils/.. 
134 | margin: 10 135 | max_links: 10000000 136 | max_nodes_per_hop: None 137 | model_type: dgl 138 | num_bases: 4 139 | num_epochs: 30 140 | num_gcn_layers: 3 141 | num_neg_samples_per_link: 1 142 | num_workers: 0 143 | optimizer: Adam 144 | rel_emb_dim: 32 145 | save_every: 10 146 | train_file: train 147 | use_kge_embeddings: False 148 | valid_file: valid 149 | ============================================ 150 | Max distance from sub : 3, Max distance from obj : 3 151 | Max distance from sub : 3, Max distance from obj : 3 152 | No existing model found. Initializing new model.. 153 | Device: cuda:0 154 | Input dim : 8, # Relations : 76, # Augmented relations : 76 155 | Total number of parameters: 36609 156 | Starting training with full batch... 157 | 158 | Performance:{'auc': 0.7120642292696149, 'auc_pr': 0.708260517596693}in 98.48620629310608 159 | Better models found w.r.t accuracy. Saved it! 160 | Epoch 1 with loss: 1002472.75, training auc: 0.6504949166272641, training auc_pr: 0.6519165979079444, best validation AUC: 0.7120642292696149, weight_norm: 134.85137939453125 in 1290.5363817214966 161 | 162 | Performance:{'auc': 0.5829391328370968, 'auc_pr': 0.6081275506340519}in 97.76176166534424 163 | Epoch 2 with loss: 997751.6875, training auc: 0.6105630321149288, training auc_pr: 0.6389048512854649, best validation AUC: 0.7120642292696149, weight_norm: 136.5711212158203 in 1282.1537063121796 164 | 165 | Performance:{'auc': 0.5657422176351619, 'auc_pr': 0.5924276115981146}in 107.78150796890259 166 | Epoch 3 with loss: 1016724.5625, training auc: 0.5967820286130567, training auc_pr: 0.6396891847079792, best validation AUC: 0.7120642292696149, weight_norm: 136.8701629638672 in 1322.7062730789185 167 | 168 | Performance:{'auc': 0.5725980484143367, 'auc_pr': 0.5958905418148995}in 99.76627349853516 169 | Epoch 4 with loss: 999020.875, training auc: 0.6082831027916138, training auc_pr: 0.6467284701144813, best validation AUC: 0.7120642292696149, weight_norm: 137.06948852539062 in 1289.740923166275 170 | 171 | Performance:{'auc': 0.5727479097600133, 'auc_pr': 0.5951300420013876}in 105.44012331962585 172 | Epoch 5 with loss: 983860.6875, training auc: 0.6148082993614077, training auc_pr: 0.6538119348223812, best validation AUC: 0.7120642292696149, weight_norm: 137.42681884765625 in 1385.456704378128 173 | 174 | Performance:{'auc': 0.5731636121014991, 'auc_pr': 0.5945714192134619}in 116.94251847267151 175 | Epoch 6 with loss: 987564.5, training auc: 0.6118899761294434, training auc_pr: 0.6499089548029302, best validation AUC: 0.7120642292696149, weight_norm: 137.3944549560547 in 1562.5355093479156 176 | 177 | Performance:{'auc': 0.5721022028314673, 'auc_pr': 0.5969215515297234}in 106.19926738739014 178 | Epoch 7 with loss: 979331.1875, training auc: 0.6161806930392261, training auc_pr: 0.6558125930596259, best validation AUC: 0.7120642292696149, weight_norm: 137.98033142089844 in 1366.781834602356 179 | 180 | Performance:{'auc': 0.6194636006338483, 'auc_pr': 0.6389913633601614}in 111.00783586502075 181 | Epoch 8 with loss: 984802.125, training auc: 0.614314569904638, training auc_pr: 0.6482952953290476, best validation AUC: 0.7120642292696149, weight_norm: 139.932373046875 in 1472.37526345253 182 | 183 | Performance:{'auc': 0.6010345645420238, 'auc_pr': 0.6249457745996215}in 110.55558323860168 184 | Epoch 9 with loss: 962426.0625, training auc: 0.6265743619362512, training auc_pr: 0.6618949848228293, best validation AUC: 0.7120642292696149, weight_norm: 143.48548889160156 in 1377.1080858707428 185 | 
186 | Performance:{'auc': 0.6019467640374471, 'auc_pr': 0.6253301646208065}in 107.27267694473267 187 | Epoch 10 with loss: 925264.25, training auc: 0.6459990966967227, training auc_pr: 0.6786478813874655, best validation AUC: 0.7120642292696149, weight_norm: 145.2030792236328 in 1457.4731812477112 188 | 189 | Performance:{'auc': 0.6948881643418611, 'auc_pr': 0.7064584155217968}in 106.54831171035767 190 | Epoch 11 with loss: 902373.0625, training auc: 0.6599780029249352, training auc_pr: 0.6897035802476682, best validation AUC: 0.7120642292696149, weight_norm: 146.06576538085938 in 1403.814304113388 191 | 192 | Performance:{'auc': 0.703386605783866, 'auc_pr': 0.716869067098012}in 108.78307700157166 193 | Epoch 12 with loss: 833329.1875, training auc: 0.6917684355108966, training auc_pr: 0.713560964922975, best validation AUC: 0.7120642292696149, weight_norm: 148.72451782226562 in 1410.622745513916 194 | 195 | Performance:{'auc': 0.8097190946810953, 'auc_pr': 0.7716038474767217}in 108.13592767715454 196 | Better models found w.r.t accuracy. Saved it! 197 | Epoch 13 with loss: 643480.5625, training auc: 0.7710449414981346, training auc_pr: 0.7585973968940547, best validation AUC: 0.8097190946810953, weight_norm: 151.1417999267578 in 1416.138186454773 198 | 199 | Performance:{'auc': 0.845273373157357, 'auc_pr': 0.8319535924806791}in 105.47019243240356 200 | Better models found w.r.t accuracy. Saved it! 201 | Epoch 14 with loss: 479894.96875, training auc: 0.8336343173478133, training auc_pr: 0.8193526666786017, best validation AUC: 0.845273373157357, weight_norm: 153.6695098876953 in 1397.8663086891174 202 | 203 | Performance:{'auc': 0.8688928243781405, 'auc_pr': 0.8582676820005574}in 109.96139717102051 204 | Better models found w.r.t accuracy. Saved it! 205 | Epoch 15 with loss: 414557.96875, training auc: 0.8583448145832866, training auc_pr: 0.8458281804029446, best validation AUC: 0.8688928243781405, weight_norm: 156.20025634765625 in 1466.916127204895 206 | 207 | Performance:{'auc': 0.8716541825650007, 'auc_pr': 0.8625972041473648}in 103.59730458259583 208 | Better models found w.r.t accuracy. Saved it! 209 | Epoch 16 with loss: 382601.21875, training auc: 0.8699039213786858, training auc_pr: 0.8544261442598972, best validation AUC: 0.8716541825650007, weight_norm: 158.39100646972656 in 1363.205931186676 210 | 211 | Performance:{'auc': 0.8838620180980379, 'auc_pr': 0.8694152015825152}in 112.29348659515381 212 | Better models found w.r.t accuracy. Saved it! 213 | Epoch 17 with loss: 353920.5, training auc: 0.8784802359645364, training auc_pr: 0.8658363370816708, best validation AUC: 0.8838620180980379, weight_norm: 160.42938232421875 in 1494.4573781490326 214 | 215 | Performance:{'auc': 0.8958926263005359, 'auc_pr': 0.8877860159074005}in 93.33963990211487 216 | Better models found w.r.t accuracy. Saved it! 217 | Epoch 18 with loss: 322911.84375, training auc: 0.8895199995728996, training auc_pr: 0.8727785515088582, best validation AUC: 0.8958926263005359, weight_norm: 162.52545166015625 in 1376.4243819713593 218 | 219 | Performance:{'auc': 0.8975241602552072, 'auc_pr': 0.8959679434812348}in 107.65798211097717 220 | Better models found w.r.t accuracy. Saved it! 
221 | Epoch 19 with loss: 294608.5, training auc: 0.9001592072904352, training auc_pr: 0.8890395179549839, best validation AUC: 0.8975241602552072, weight_norm: 163.7848663330078 in 1473.4786217212677 222 | 223 | Performance:{'auc': 0.9035537989199558, 'auc_pr': 0.8979020537472038}in 100.02894830703735 224 | Better models found w.r.t accuracy. Saved it! 225 | Epoch 20 with loss: 286375.5625, training auc: 0.903554883699791, training auc_pr: 0.8917630850211133, best validation AUC: 0.9035537989199558, weight_norm: 165.7347412109375 in 1376.8097817897797 226 | 227 | Performance:{'auc': 0.903766862659244, 'auc_pr': 0.8974102582621495}in 118.61936497688293 228 | Better models found w.r.t accuracy. Saved it! 229 | Epoch 21 with loss: 274164.8125, training auc: 0.9069193705411298, training auc_pr: 0.8950499766698242, best validation AUC: 0.903766862659244, weight_norm: 167.73663330078125 in 1468.8919966220856 230 | 231 | Performance:{'auc': 0.9028162204707992, 'auc_pr': 0.8927616991355903}in 100.0911705493927 232 | Epoch 22 with loss: 270166.03125, training auc: 0.9080979182438529, training auc_pr: 0.8953932248409231, best validation AUC: 0.903766862659244, weight_norm: 169.07098388671875 in 1393.7504432201385 233 | 234 | Performance:{'auc': 0.8992964335606013, 'auc_pr': 0.8864098973641945}in 110.24582862854004 235 | Epoch 23 with loss: 264222.8125, training auc: 0.9097743926481412, training auc_pr: 0.8971297397725044, best validation AUC: 0.903766862659244, weight_norm: 170.19969177246094 in 1416.868691444397 236 | 237 | Performance:{'auc': 0.9011188778382434, 'auc_pr': 0.8905536584299034}in 98.31412649154663 238 | Epoch 24 with loss: 239050.1875, training auc: 0.9202242150607819, training auc_pr: 0.9086160677131943, best validation AUC: 0.903766862659244, weight_norm: 171.57135009765625 in 1421.4703109264374 239 | 240 | Performance:{'auc': 0.9133117272367131, 'auc_pr': 0.9062772620611281}in 102.3911018371582 241 | Better models found w.r.t accuracy. Saved it! 242 | Epoch 25 with loss: 255216.4375, training auc: 0.9136345231708636, training auc_pr: 0.8996050033809619, best validation AUC: 0.9133117272367131, weight_norm: 173.8502960205078 in 1353.9066956043243 243 | 244 | Performance:{'auc': 0.902582958028398, 'auc_pr': 0.8942434171019488}in 101.21613311767578 245 | Epoch 26 with loss: 227124.1875, training auc: 0.9224140792379405, training auc_pr: 0.9105422612268373, best validation AUC: 0.9133117272367131, weight_norm: 174.82058715820312 in 1476.4799265861511 246 | 247 | Performance:{'auc': 0.9132941348178728, 'auc_pr': 0.9092162368302779}in 113.04683709144592 248 | 249 | Performance:{'auc': 0.9172960843185087, 'auc_pr': 0.9147312479735876}in 109.55985236167908 250 | Better models found w.r.t accuracy. Saved it! 
251 | Epoch 27 with loss: 220791.1875, training auc: 0.9252795066484709, training auc_pr: 0.9150551158030051, best validation AUC: 0.9172960843185087, weight_norm: 176.0113525390625 in 1464.3590109348297 252 | 253 | Performance:{'auc': 0.8895691290840474, 'auc_pr': 0.8776626693024772}in 112.88955640792847 254 | Epoch 28 with loss: 264310.0625, training auc: 0.9106737796677594, training auc_pr: 0.8971130010733881, best validation AUC: 0.9172960843185087, weight_norm: 178.27793884277344 in 1506.5027270317078 255 | 256 | Performance:{'auc': 0.8989615260315674, 'auc_pr': 0.8947127708678383}in 111.65631628036499 257 | Epoch 29 with loss: 278754.125, training auc: 0.9065422682922426, training auc_pr: 0.8967538493248908, best validation AUC: 0.9172960843185087, weight_norm: 179.66348266601562 in 1368.8761911392212 258 | 259 | Performance:{'auc': 0.8913655105189633, 'auc_pr': 0.8794615376574819}in 106.69048309326172 260 | Epoch 30 with loss: 405169.40625, training auc: 0.8634110558869994, training auc_pr: 0.8523589774032612, best validation AUC: 0.9172960843185087, weight_norm: 181.33078002929688 in 1464.6133909225464 261 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/params.json: -------------------------------------------------------------------------------- 1 | {"experiment_name": "compile_nell_v4_ind", "dataset": "nell_v4_ind", "train_file": "train", "test_file": "test", "runs": 1, "gpu": 0, "disable_cuda": false, "max_links": 100000, "hop": 3, "max_nodes_per_hop": null, "use_kge_embeddings": false, "kge_model": "TransE", "model_type": "dgl", "constrained_neg_prob": 0, "num_neg_samples_per_link": 1, "batch_size": 16, "num_workers": 0, "add_traspose_rels": false, "enclosing_sub_graph": true, "main_dir": "utils/..", "exp_dir": "utils/../experiments/compile_nell_v4_ind", "test_exp_dir": "utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0"} -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind/test_nell_v4_ind_0/log_test.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | batch_size: 16 4 | constrained_neg_prob: 0 5 | dataset: nell_v4_ind 6 | disable_cuda: False 7 | enclosing_sub_graph: True 8 | exp_dir: utils/../experiments/compile_nell_v4_ind 9 | experiment_name: compile_nell_v4_ind 10 | gpu: 0 11 | hop: 3 12 | kge_model: TransE 13 | main_dir: utils/.. 
14 | max_links: 100000 15 | max_nodes_per_hop: None 16 | model_type: dgl 17 | num_neg_samples_per_link: 1 18 | num_workers: 0 19 | runs: 1 20 | test_exp_dir: utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0 21 | test_file: test 22 | train_file: train 23 | use_kge_embeddings: False 24 | ============================================ 25 | Loading existing model from utils/../experiments/compile_nell_v4_ind/best_graph_classifier.pth 26 | Device: cuda:0 27 | Sampling negative links for test 28 | Extracting enclosing subgraphs for positive links in test set 29 | Extracting enclosing subgraphs for negative links in test set 30 | Max distance from sub : 3, Max distance from obj : 3 31 | 32 | Test Set Performance:{'auc': 0.78092244755886, 'auc_pr': 0.8432636131804843} 33 | 34 | Avg test Set Performance -- mean auc :0.78092244755886 std auc: 0.0 35 | 36 | Avg test Set Performance -- mean auc_pr :0.8432636131804843 std auc_pr: 0.0 37 | ============ Initialized logger ============ 38 | add_traspose_rels: False 39 | batch_size: 16 40 | constrained_neg_prob: 0 41 | dataset: nell_v4_ind 42 | disable_cuda: False 43 | enclosing_sub_graph: True 44 | exp_dir: utils/../experiments/compile_nell_v4_ind 45 | experiment_name: compile_nell_v4_ind 46 | gpu: 0 47 | hop: 3 48 | kge_model: TransE 49 | main_dir: utils/.. 50 | max_links: 100000 51 | max_nodes_per_hop: None 52 | model_type: dgl 53 | num_neg_samples_per_link: 1 54 | num_workers: 0 55 | runs: 1 56 | test_exp_dir: utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0 57 | test_file: test 58 | train_file: train 59 | use_kge_embeddings: False 60 | ============================================ 61 | Loading existing model from utils/../experiments/compile_nell_v4_ind/best_graph_classifier.pth 62 | Device: cuda:0 63 | Sampling negative links for test 64 | Extracting enclosing subgraphs for positive links in test set 65 | Extracting enclosing subgraphs for negative links in test set 66 | Max distance from sub : 3, Max distance from obj : 3 67 | 68 | Test Set Performance:{'auc': 0.7682933447613131, 'auc_pr': 0.8343463225340576} 69 | 70 | Avg test Set Performance -- mean auc :0.7682933447613131 std auc: 0.0 71 | 72 | Avg test Set Performance -- mean auc_pr :0.8343463225340576 std auc_pr: 0.0 73 | ============ Initialized logger ============ 74 | add_traspose_rels: False 75 | batch_size: 16 76 | constrained_neg_prob: 0 77 | dataset: nell_v4_ind 78 | disable_cuda: False 79 | enclosing_sub_graph: True 80 | exp_dir: utils/../experiments/compile_nell_v4_ind 81 | experiment_name: compile_nell_v4_ind 82 | gpu: 0 83 | hop: 3 84 | kge_model: TransE 85 | main_dir: utils/.. 
86 | max_links: 100000 87 | max_nodes_per_hop: None 88 | model_type: dgl 89 | num_neg_samples_per_link: 1 90 | num_workers: 0 91 | runs: 1 92 | test_exp_dir: utils/../experiments/compile_nell_v4_ind/test_nell_v4_ind_0 93 | test_file: test 94 | train_file: train 95 | use_kge_embeddings: False 96 | ============================================ 97 | Loading existing model from utils/../experiments/compile_nell_v4_ind/best_graph_classifier.pth 98 | Device: cuda:0 99 | Sampling negative links for test 100 | Extracting enclosing subgraphs for positive links in test set 101 | Extracting enclosing subgraphs for negative links in test set 102 | Max distance from sub : 3, Max distance from obj : 3 103 | 104 | Test Set Performance:{'auc': 0.7790912884735225, 'auc_pr': 0.8423223582566522} 105 | 106 | Avg test Set Performance -- mean auc :0.7790912884735225 std auc: 0.0 107 | 108 | Avg test Set Performance -- mean auc_pr :0.8423223582566522 std auc_pr: 0.0 109 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/best_graph_classifier.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind2/best_graph_classifier.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/graph_classifier_chk.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/experiments/compile_nell_v4_ind2/graph_classifier_chk.pth -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642603455.548011.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642647880.451746.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.154622204580616 | 0.11696306429548564 | 0.1238030095759234 | 0.14227086183310533 15 | 
-------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642679948.7119958.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.465616634820716 | 0.393296853625171 | 0.5136798905608755 | 0.5430916552667578 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642733457.0139189.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.3413417829091137 | 0.23529411764705882 | 0.4425444596443228 | 0.4781121751025992 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642832066.1408617.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6370628492990844 | 0.5690834473324213 | 0.7024623803009576 | 0.7373461012311902 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642841084.211995.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 
| Hits@5 | Hits@10 : 0.6319985352481888 | 0.5601915184678523 | 0.6969904240766074 | 0.7352941176470589 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642850740.4142146.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6328534341849693 | 0.5588235294117647 | 0.7065663474692202 | 0.7435020519835841 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642855466.9402952.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6372589530665589 | 0.5663474692202463 | 0.70109439124487 | 0.7346101231190151 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642859913.6085432.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: ./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6326340381356135 | 0.5601915184678523 | 0.7051983584131327 | 0.7373461012311902 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/log_rank_test_1642908893.5786257.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | dataset: nell_v4_ind 4 | enclosing_sub_graph: True 5 | experiment_name: compile_nell_v4_ind2 6 | file_paths: {'graph': './data/nell_v4_ind/train.txt', 'links': './data/nell_v4_ind/test.txt'} 7 | hop: 3 8 | kge_model: TransE 9 | mode: sample 10 | model_path: experiments/compile_nell_v4_ind2/best_graph_classifier.pth 11 | ruleN_pred_path: 
./data/nell_v4_ind/pos_predictions.txt 12 | use_kge_embeddings: False 13 | ============================================ 14 | MRR | Hits@1 | Hits@5 | Hits@10 : 0.6416702961382977 | 0.5725034199726402 | 0.7079343365253078 | 0.7387140902872777 15 | -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/params.json: -------------------------------------------------------------------------------- 1 | {"experiment_name": "compile_nell_v4_ind2", "dataset": "nell_v4", "gpu": 0, "disable_cuda": false, "load_model": false, "train_file": "train", "valid_file": "valid", "num_epochs": 30, "eval_every": 1, "eval_every_iter": 455, "save_every": 10, "early_stop": 100, "optimizer": "Adam", "lr": 0.001, "clip": 1000, "l2": 0.0005, "margin": 10, "max_links": 10000000, "hop": 3, "max_nodes_per_hop": null, "use_kge_embeddings": false, "kge_model": "TransE", "model_type": "dgl", "constrained_neg_prob": 0.0, "batch_size": 16, "num_neg_samples_per_link": 1, "num_workers": 0, "add_traspose_rels": false, "enclosing_sub_graph": true, "rel_emb_dim": 32, "attn_rel_emb_dim": 32, "emb_dim": 32, "num_gcn_layers": 3, "num_bases": 4, "dropout": 0, "edge_dropout": 0.5, "gnn_agg_type": "sum", "add_ht_emb": true, "has_attn": true, "main_dir": "utils/..", "exp_dir": "utils/../experiments/compile_nell_v4_ind2"} -------------------------------------------------------------------------------- /CoMPILE_github/experiments/compile_nell_v4_ind2/test_nell_v4_ind_0/log_test.txt: -------------------------------------------------------------------------------- 1 | ============ Initialized logger ============ 2 | add_traspose_rels: False 3 | batch_size: 32 4 | constrained_neg_prob: 0 5 | dataset: nell_v4_ind 6 | disable_cuda: False 7 | enclosing_sub_graph: True 8 | exp_dir: utils/../experiments/compile_nell_v4_ind2 9 | experiment_name: compile_nell_v4_ind2 10 | gpu: 0 11 | hop: 3 12 | kge_model: TransE 13 | main_dir: utils/.. 14 | max_links: 100000 15 | max_nodes_per_hop: None 16 | model_type: dgl 17 | num_neg_samples_per_link: 1 18 | num_workers: 0 19 | runs: 1 20 | test_exp_dir: utils/../experiments/compile_nell_v4_ind2/test_nell_v4_ind_0 21 | test_file: test 22 | train_file: train 23 | use_kge_embeddings: False 24 | ============================================ 25 | -------------------------------------------------------------------------------- /CoMPILE_github/kge/README.md: -------------------------------------------------------------------------------- 1 | The files here are largely taken from RotatE's official code available [here](https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding). 
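For orientation, the score functions in `model.py` follow the usual conventions of that codebase; as a rough sketch (illustrative only -- `transe_score` below is not the repository's actual function, and the default `gamma` is just a placeholder), a TransE-style score can be computed as:

```python
import torch

def transe_score(head, relation, tail, gamma=12.0):
    # Margin-based TransE plausibility: gamma - ||h + r - t||_1.
    # head/relation/tail: (batch, dim) embedding tensors; gamma corresponds
    # to the GAMMA positional argument of run.sh. Higher score = more plausible.
    return gamma - torch.norm(head + relation - tail, p=1, dim=-1)
```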
2 | -------------------------------------------------------------------------------- /CoMPILE_github/kge/dataloader.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | 7 | import numpy as np 8 | import torch 9 | 10 | from torch.utils.data import Dataset 11 | 12 | class TrainDataset(Dataset): 13 | def __init__(self, triples, nentity, nrelation, negative_sample_size, mode): 14 | self.len = len(triples) 15 | self.triples = triples 16 | self.triple_set = set(triples) 17 | self.nentity = nentity 18 | self.nrelation = nrelation 19 | self.negative_sample_size = negative_sample_size 20 | self.mode = mode 21 | self.count = self.count_frequency(triples) 22 | self.true_head, self.true_tail = self.get_true_head_and_tail(self.triples) 23 | 24 | def __len__(self): 25 | return self.len 26 | 27 | def __getitem__(self, idx): 28 | positive_sample = self.triples[idx] 29 | 30 | head, relation, tail = positive_sample 31 | 32 | subsampling_weight = self.count[(head, relation)] + self.count[(tail, -relation-1)] 33 | subsampling_weight = torch.sqrt(1 / torch.Tensor([subsampling_weight])) 34 | 35 | negative_sample_list = [] 36 | negative_sample_size = 0 37 | 38 | while negative_sample_size < self.negative_sample_size: 39 | negative_sample = np.random.randint(self.nentity, size=self.negative_sample_size*2) 40 | if self.mode == 'head-batch': 41 | mask = np.in1d( 42 | negative_sample, 43 | self.true_head[(relation, tail)], 44 | assume_unique=True, 45 | invert=True 46 | ) 47 | elif self.mode == 'tail-batch': 48 | mask = np.in1d( 49 | negative_sample, 50 | self.true_tail[(head, relation)], 51 | assume_unique=True, 52 | invert=True 53 | ) 54 | else: 55 | raise ValueError('Training batch mode %s not supported' % self.mode) 56 | negative_sample = negative_sample[mask] 57 | negative_sample_list.append(negative_sample) 58 | negative_sample_size += negative_sample.size 59 | 60 | negative_sample = np.concatenate(negative_sample_list)[:self.negative_sample_size] 61 | 62 | negative_sample = torch.from_numpy(negative_sample) 63 | 64 | positive_sample = torch.LongTensor(positive_sample) 65 | 66 | return positive_sample, negative_sample, subsampling_weight, self.mode 67 | 68 | @staticmethod 69 | def collate_fn(data): 70 | positive_sample = torch.stack([_[0] for _ in data], dim=0) 71 | negative_sample = torch.stack([_[1] for _ in data], dim=0) 72 | subsample_weight = torch.cat([_[2] for _ in data], dim=0) 73 | mode = data[0][3] 74 | return positive_sample, negative_sample, subsample_weight, mode 75 | 76 | @staticmethod 77 | def count_frequency(triples, start=4): 78 | ''' 79 | Get frequency of a partial triple like (head, relation) or (relation, tail) 80 | The frequency will be used for subsampling like word2vec 81 | ''' 82 | count = {} 83 | for head, relation, tail in triples: 84 | if (head, relation) not in count: 85 | count[(head, relation)] = start 86 | else: 87 | count[(head, relation)] += 1 88 | 89 | if (tail, -relation-1) not in count: 90 | count[(tail, -relation-1)] = start 91 | else: 92 | count[(tail, -relation-1)] += 1 93 | return count 94 | 95 | @staticmethod 96 | def get_true_head_and_tail(triples): 97 | ''' 98 | Build a dictionary of true triples that will 99 | be used to filter these true triples for negative sampling 100 | ''' 101 | 102 | true_head = {} 103 | true_tail = {} 104 | 105 | for head, relation, tail in triples: 106 | if 
(head, relation) not in true_tail: 107 | true_tail[(head, relation)] = [] 108 | true_tail[(head, relation)].append(tail) 109 | if (relation, tail) not in true_head: 110 | true_head[(relation, tail)] = [] 111 | true_head[(relation, tail)].append(head) 112 | 113 | for relation, tail in true_head: 114 | true_head[(relation, tail)] = np.array(list(set(true_head[(relation, tail)]))) 115 | for head, relation in true_tail: 116 | true_tail[(head, relation)] = np.array(list(set(true_tail[(head, relation)]))) 117 | 118 | return true_head, true_tail 119 | 120 | 121 | class TestDataset(Dataset): 122 | def __init__(self, triples, all_true_triples, nentity, nrelation, mode): 123 | self.len = len(triples) 124 | self.triple_set = set(all_true_triples) 125 | self.triples = triples 126 | self.nentity = nentity 127 | self.nrelation = nrelation 128 | self.mode = mode 129 | 130 | def __len__(self): 131 | return self.len 132 | 133 | def __getitem__(self, idx): 134 | head, relation, tail = self.triples[idx] 135 | 136 | if self.mode == 'head-batch': 137 | tmp = [(0, rand_head) if (rand_head, relation, tail) not in self.triple_set 138 | else (-1, head) for rand_head in range(self.nentity)] 139 | tmp[head] = (0, head) 140 | elif self.mode == 'tail-batch': 141 | tmp = [(0, rand_tail) if (head, relation, rand_tail) not in self.triple_set 142 | else (-1, tail) for rand_tail in range(self.nentity)] 143 | tmp[tail] = (0, tail) 144 | else: 145 | raise ValueError('negative batch mode %s not supported' % self.mode) 146 | 147 | tmp = torch.LongTensor(tmp) 148 | filter_bias = tmp[:, 0].float() 149 | negative_sample = tmp[:, 1] 150 | 151 | positive_sample = torch.LongTensor((head, relation, tail)) 152 | 153 | return positive_sample, negative_sample, filter_bias, self.mode 154 | 155 | @staticmethod 156 | def collate_fn(data): 157 | positive_sample = torch.stack([_[0] for _ in data], dim=0) 158 | negative_sample = torch.stack([_[1] for _ in data], dim=0) 159 | filter_bias = torch.stack([_[2] for _ in data], dim=0) 160 | mode = data[0][3] 161 | return positive_sample, negative_sample, filter_bias, mode 162 | 163 | class BidirectionalOneShotIterator(object): 164 | def __init__(self, dataloader_head, dataloader_tail): 165 | self.iterator_head = self.one_shot_iterator(dataloader_head) 166 | self.iterator_tail = self.one_shot_iterator(dataloader_tail) 167 | self.step = 0 168 | 169 | def __next__(self): 170 | self.step += 1 171 | if self.step % 2 == 0: 172 | data = next(self.iterator_head) 173 | else: 174 | data = next(self.iterator_tail) 175 | return data 176 | 177 | @staticmethod 178 | def one_shot_iterator(dataloader): 179 | ''' 180 | Transform a PyTorch Dataloader into python iterator 181 | ''' 182 | while True: 183 | for data in dataloader: 184 | yield data -------------------------------------------------------------------------------- /CoMPILE_github/kge/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python -u -c 'import torch; print(torch.__version__)' 4 | 5 | CODE_PATH=. 
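# Example invocation with hypothetical hyper-parameter values (illustration only):
#   sh run.sh train TransE nell_v4 0 0 512 128 500 12.0 1.0 0.0005 100000 16
# positional order: MODE MODEL DATASET GPU_DEVICE SAVE_ID BATCH_SIZE
#   NEGATIVE_SAMPLE_SIZE HIDDEN_DIM GAMMA ALPHA LEARNING_RATE MAX_STEPS TEST_BATCH_SIZE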
6 | DATA_PATH=../data 7 | SAVE_PATH=../experiments/kge_baselines 8 | 9 | #The first four parameters must be provided 10 | MODE=$1 11 | MODEL=$2 12 | DATASET=$3 13 | GPU_DEVICE=$4 14 | SAVE_ID=$5 15 | 16 | FULL_DATA_PATH=$DATA_PATH/$DATASET 17 | SAVE=$SAVE_PATH/"$MODEL"_"$DATASET" 18 | 19 | #Only used in training 20 | BATCH_SIZE=$6 21 | NEGATIVE_SAMPLE_SIZE=$7 22 | HIDDEN_DIM=$8 23 | GAMMA=$9 24 | ALPHA=${10} 25 | LEARNING_RATE=${11} 26 | MAX_STEPS=${12} 27 | TEST_BATCH_SIZE=${13} 28 | 29 | if [ $MODE == "train" ] 30 | then 31 | 32 | echo "Start Training......" 33 | 34 | CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \ 35 | --cuda \ 36 | --do_valid \ 37 | --do_test \ 38 | --data_path $FULL_DATA_PATH \ 39 | --model $MODEL \ 40 | -n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \ 41 | -g $GAMMA -a $ALPHA \ 42 | -lr $LEARNING_RATE --max_steps $MAX_STEPS \ 43 | -save $SAVE --test_batch_size $TEST_BATCH_SIZE \ 44 | ${14} ${15} ${16} ${17} ${18} ${19} ${20} 45 | 46 | elif [ $MODE == "valid" ] 47 | then 48 | 49 | echo "Start Evaluation on Valid Data Set......" 50 | 51 | CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_valid --cuda -init $SAVE 52 | 53 | elif [ $MODE == "test" ] 54 | then 55 | 56 | echo "Start Evaluation on Test Data Set......" 57 | 58 | CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_test --cuda -init $SAVE 59 | 60 | else 61 | echo "Unknown MODE" $MODE 62 | fi 63 | -------------------------------------------------------------------------------- /CoMPILE_github/managers/__pycache__/evaluator.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/managers/__pycache__/evaluator.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/managers/__pycache__/trainer.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/managers/__pycache__/trainer.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/managers/evaluator.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import torch 4 | import pdb 5 | from sklearn import metrics 6 | import torch.nn.functional as F 7 | from torch.utils.data import DataLoader 8 | 9 | 10 | class Evaluator(): 11 | def __init__(self, params, graph_classifier, data): 12 | self.params = params 13 | self.graph_classifier = graph_classifier 14 | self.data = data 15 | 16 | def eval(self, save=False): 17 | pos_scores = [] 18 | pos_labels = [] 19 | neg_scores = [] 20 | neg_labels = [] 21 | dataloader = DataLoader(self.data, batch_size=self.params.batch_size, shuffle=False, num_workers=self.params.num_workers, collate_fn=self.params.collate_fn) 22 | 23 | self.graph_classifier.eval() 24 | with torch.no_grad(): 25 | for b_idx, batch in enumerate(dataloader): 26 | (graphs_pos, r_labels_pos), g_labels_pos, (graph_neg, r_labels_neg), g_labels_neg = batch 27 | # data_pos, targets_pos, data_neg, targets_neg = self.params.move_batch_to_device(batch, self.params.device) 28 | # print([self.data.id2relation[r.item()] for r in data_pos[1]]) 29 | # pdb.set_trace() 30 | 31 | g_labels_pos = 
torch.LongTensor(g_labels_pos).to(device=self.params.device) 32 | r_labels_pos = torch.LongTensor(r_labels_pos).to(device=self.params.device) 33 | 34 | g_labels_neg = torch.LongTensor(g_labels_neg).to(device=self.params.device) 35 | r_labels_neg = torch.LongTensor(r_labels_neg).to(device=self.params.device) 36 | 37 | score_pos = self.graph_classifier(graphs_pos) 38 | score_neg = self.graph_classifier(graph_neg) 39 | 40 | # preds += torch.argmax(logits.detach().cpu(), dim=1).tolist() 41 | pos_scores += score_pos.squeeze(1).detach().cpu().tolist() 42 | neg_scores += score_neg.squeeze(1).detach().cpu().tolist() 43 | pos_labels += g_labels_pos.tolist() 44 | neg_labels += g_labels_neg.tolist() 45 | 46 | # acc = metrics.accuracy_score(labels, preds) 47 | auc = metrics.roc_auc_score(pos_labels + neg_labels, pos_scores + neg_scores) 48 | auc_pr = metrics.average_precision_score(pos_labels + neg_labels, pos_scores + neg_scores) 49 | 50 | if save: 51 | pos_test_triplets_path = os.path.join(self.params.main_dir, 'data/{}/{}.txt'.format(self.params.dataset, self.data.file_name)) 52 | with open(pos_test_triplets_path) as f: 53 | pos_triplets = [line.split() for line in f.read().split('\n')[:-1]] 54 | pos_file_path = os.path.join(self.params.main_dir, 'data/{}/grail_{}_predictions.txt'.format(self.params.dataset, self.data.file_name)) 55 | with open(pos_file_path, "w") as f: 56 | for ([s, r, o], score) in zip(pos_triplets, pos_scores): 57 | f.write('\t'.join([s, r, o, str(score)]) + '\n') 58 | 59 | neg_test_triplets_path = os.path.join(self.params.main_dir, 'data/{}/neg_{}_0.txt'.format(self.params.dataset, self.data.file_name)) 60 | with open(neg_test_triplets_path) as f: 61 | neg_triplets = [line.split() for line in f.read().split('\n')[:-1]] 62 | neg_file_path = os.path.join(self.params.main_dir, 'data/{}/grail_neg_{}_{}_predictions.txt'.format(self.params.dataset, self.data.file_name, self.params.constrained_neg_prob)) 63 | with open(neg_file_path, "w") as f: 64 | for ([s, r, o], score) in zip(neg_triplets, neg_scores): 65 | f.write('\t'.join([s, r, o, str(score)]) + '\n') 66 | 67 | return {'auc': auc, 'auc_pr': auc_pr} 68 | -------------------------------------------------------------------------------- /CoMPILE_github/managers/trainer.py: -------------------------------------------------------------------------------- 1 | import statistics 2 | import timeit 3 | import os 4 | import logging 5 | import pdb 6 | import numpy as np 7 | import time 8 | 9 | import torch 10 | import torch.nn as nn 11 | import torch.optim as optim 12 | import torch.nn.functional as F 13 | from torch.utils.data import DataLoader 14 | 15 | from sklearn import metrics 16 | 17 | 18 | class Trainer(): 19 | def __init__(self, params, graph_classifier, train, valid_evaluator=None): 20 | self.graph_classifier = graph_classifier 21 | self.valid_evaluator = valid_evaluator 22 | self.params = params 23 | self.train_data = train 24 | 25 | self.updates_counter = 0 26 | 27 | model_params = list(self.graph_classifier.parameters()) 28 | logging.info('Total number of parameters: %d' % sum(map(lambda x: x.numel(), model_params))) 29 | 30 | if params.optimizer == "SGD": 31 | self.optimizer = optim.SGD(model_params, lr=params.lr, momentum=params.momentum, weight_decay=self.params.l2) 32 | if params.optimizer == "Adam": 33 | self.optimizer = optim.Adam(model_params, lr=params.lr, weight_decay=self.params.l2) 34 | 35 | self.criterion = nn.MarginRankingLoss(self.params.margin, reduction='sum') 36 | 37 | self.reset_training_state() 38 | 39 | def 
reset_training_state(self): 40 | self.best_metric = 0 41 | self.last_metric = 0 42 | self.not_improved_count = 0 43 | 44 | def train_epoch(self): 45 | total_loss = 0 46 | all_preds = [] 47 | all_labels = [] 48 | all_scores = [] 49 | 50 | dataloader = DataLoader(self.train_data, batch_size=self.params.batch_size, shuffle=True, num_workers=self.params.num_workers, collate_fn=self.params.collate_fn) 51 | # dataloader = DataLoader(self.train_data, batch_size=self.params.batch_size, shuffle=True, num_workers=self.params.num_workers) 52 | self.graph_classifier.train() 53 | model_params = list(self.graph_classifier.parameters()) 54 | for b_idx, batch in enumerate(dataloader): 55 | (graphs_pos, r_labels_pos), g_labels_pos, (graph_neg, r_labels_neg), g_labels_neg = batch 56 | 57 | g_labels_pos = torch.LongTensor(g_labels_pos).to(device=self.params.device) 58 | r_labels_pos = torch.LongTensor(r_labels_pos).to(device=self.params.device) 59 | 60 | g_labels_neg = torch.LongTensor(g_labels_neg).to(device=self.params.device) 61 | r_labels_neg = torch.LongTensor(r_labels_neg).to(device=self.params.device) 62 | 63 | self.graph_classifier.train() 64 | # data_pos, targets_pos, data_neg, targets_neg = self.params.move_batch_to_device(batch, self.params.device) 65 | self.optimizer.zero_grad() 66 | # print('batch size ', len(targets_pos), ' ', len(targets_neg)) 67 | # print('r label pos ', len(data_pos[1]), ' r label neg ', len(data_neg[1])) 68 | score_pos = self.graph_classifier(graphs_pos) 69 | score_neg = self.graph_classifier(graph_neg) 70 | loss = self.criterion(score_pos, score_neg.view(len(score_pos), -1).mean(dim=1), torch.Tensor([1]).to(device=self.params.device)) 71 | # print(score_pos, score_neg, loss) 72 | loss.backward() 73 | self.optimizer.step() 74 | self.updates_counter += 1 75 | 76 | with torch.no_grad(): 77 | # print(score_pos.shape, score_neg.shape) 78 | # print(score_pos) 79 | all_scores += score_pos.squeeze(1).detach().cpu().tolist() + score_neg.squeeze(1).detach().cpu().tolist() 80 | all_labels += g_labels_pos.tolist() + g_labels_neg.tolist() 81 | total_loss += loss 82 | 83 | if self.valid_evaluator and self.params.eval_every_iter and self.updates_counter % self.params.eval_every_iter == 0: 84 | tic = time.time() 85 | result = self.valid_evaluator.eval() 86 | logging.info('\nPerformance:' + str(result) + 'in ' + str(time.time() - tic)) 87 | 88 | if result['auc'] >= self.best_metric: 89 | self.save_classifier() 90 | self.best_metric = result['auc'] 91 | self.not_improved_count = 0 92 | 93 | else: 94 | self.not_improved_count += 1 95 | if self.not_improved_count > self.params.early_stop: 96 | logging.info(f"Validation performance didn\'t improve for {self.params.early_stop} epochs. 
Training stops.") 97 | break 98 | self.last_metric = result['auc'] 99 | 100 | auc = metrics.roc_auc_score(all_labels, all_scores) 101 | auc_pr = metrics.average_precision_score(all_labels, all_scores) 102 | 103 | weight_norm = sum(map(lambda x: torch.norm(x), model_params)) 104 | 105 | return total_loss, auc, auc_pr, weight_norm 106 | 107 | def train(self): 108 | self.reset_training_state() 109 | 110 | for epoch in range(1, self.params.num_epochs + 1): 111 | time_start = time.time() 112 | loss, auc, auc_pr, weight_norm = self.train_epoch() 113 | time_elapsed = time.time() - time_start 114 | logging.info(f'Epoch {epoch} with loss: {loss}, training auc: {auc}, training auc_pr: {auc_pr}, best validation AUC: {self.best_metric}, weight_norm: {weight_norm} in {time_elapsed}') 115 | 116 | # if self.valid_evaluator and epoch % self.params.eval_every == 0: 117 | # result = self.valid_evaluator.eval() 118 | # logging.info('\nPerformance:' + str(result)) 119 | 120 | # if result['auc'] >= self.best_metric: 121 | # self.save_classifier() 122 | # self.best_metric = result['auc'] 123 | # self.not_improved_count = 0 124 | 125 | # else: 126 | # self.not_improved_count += 1 127 | # if self.not_improved_count > self.params.early_stop: 128 | # logging.info(f"Validation performance didn\'t improve for {self.params.early_stop} epochs. Training stops.") 129 | # break 130 | # self.last_metric = result['auc'] 131 | 132 | if epoch % self.params.save_every == 0: 133 | torch.save(self.graph_classifier, os.path.join(self.params.exp_dir, 'graph_classifier_chk.pth')) 134 | 135 | def save_classifier(self): 136 | torch.save(self.graph_classifier, os.path.join(self.params.exp_dir, 'best_graph_classifier.pth')) # Does it overwrite or fuck with the existing file? 137 | logging.info('Better models found w.r.t accuracy. 
Saved it!') 138 | -------------------------------------------------------------------------------- /CoMPILE_github/model/dgl/aggregators.py: -------------------------------------------------------------------------------- 1 | import abc 2 | import torch.nn as nn 3 | import torch 4 | import torch.nn.functional as F 5 | 6 | 7 | class Aggregator(nn.Module): 8 | def __init__(self, emb_dim): 9 | super(Aggregator, self).__init__() 10 | 11 | def forward(self, node): 12 | curr_emb = node.mailbox['curr_emb'][:, 0, :] # (B, F) 13 | # print('curr_emb 2 ', curr_emb.shape) 14 | nei_msg = torch.bmm(node.mailbox['alpha'].transpose(1, 2), node.mailbox['msg']).squeeze(1) # (B, F) 15 | # print('nei_msg 2 ', nei_msg.shape) 16 | # nei_msg, _ = torch.max(node.mailbox['msg'], 1) # (B, F) 17 | 18 | new_emb = self.update_embedding(curr_emb, nei_msg) 19 | 20 | return {'h': new_emb} 21 | 22 | @abc.abstractmethod 23 | def update_embedding(self, curr_emb, nei_msg): 24 | raise NotImplementedError 25 | 26 | 27 | class SumAggregator(Aggregator): 28 | def __init__(self, emb_dim): 29 | super(SumAggregator, self).__init__(emb_dim) 30 | 31 | def update_embedding(self, curr_emb, nei_msg): 32 | new_emb = nei_msg + curr_emb 33 | # print(new_emb.shape, 'new embed') 34 | return new_emb 35 | 36 | 37 | class MLPAggregator(Aggregator): 38 | def __init__(self, emb_dim): 39 | super(MLPAggregator, self).__init__(emb_dim) 40 | self.linear = nn.Linear(2 * emb_dim, emb_dim) 41 | 42 | def update_embedding(self, curr_emb, nei_msg): 43 | inp = torch.cat((nei_msg, curr_emb), 1) 44 | new_emb = F.relu(self.linear(inp)) 45 | 46 | return new_emb 47 | 48 | 49 | class GRUAggregator(Aggregator): 50 | def __init__(self, emb_dim): 51 | super(GRUAggregator, self).__init__(emb_dim) 52 | self.gru = nn.GRUCell(emb_dim, emb_dim) 53 | 54 | def update_embedding(self, curr_emb, nei_msg): 55 | new_emb = self.gru(nei_msg, curr_emb) 56 | 57 | return new_emb 58 | -------------------------------------------------------------------------------- /CoMPILE_github/model/dgl/layers.py: -------------------------------------------------------------------------------- 1 | """ 2 | File based off of dgl tutorial on RGCN 3 | Source: https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn 4 | """ 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | 9 | 10 | class Identity(nn.Module): 11 | """A placeholder identity operator that is argument-insensitive.
12 | (Identity has already been supported by PyTorch 1.2, we will directly 13 | import torch.nn.Identity in the future) 14 | """ 15 | 16 | def __init__(self): 17 | super(Identity, self).__init__() 18 | 19 | def forward(self, x): 20 | """Return input""" 21 | return x 22 | 23 | 24 | class RGCNLayer(nn.Module): 25 | def __init__(self, inp_dim, out_dim, aggregator, bias=None, activation=None, dropout=0.0, edge_dropout=0.0, is_input_layer=False): 26 | super(RGCNLayer, self).__init__() 27 | self.bias = bias 28 | self.activation = activation 29 | 30 | if self.bias: 31 | self.bias = nn.Parameter(torch.Tensor(out_dim)) 32 | nn.init.xavier_uniform_(self.bias, 33 | gain=nn.init.calculate_gain('relu')) 34 | 35 | self.aggregator = aggregator 36 | 37 | if dropout: 38 | self.dropout = nn.Dropout(dropout) 39 | else: 40 | self.dropout = None 41 | 42 | if edge_dropout: 43 | self.edge_dropout = nn.Dropout(edge_dropout) 44 | else: 45 | self.edge_dropout = Identity() 46 | 47 | # define how propagation is done in subclass 48 | def propagate(self, g): 49 | raise NotImplementedError 50 | 51 | def forward(self, g, attn_rel_emb=None): 52 | 53 | self.propagate(g, attn_rel_emb) 54 | 55 | # apply bias and activation 56 | node_repr = g.ndata['h'] 57 | if self.bias: 58 | node_repr = node_repr + self.bias 59 | if self.activation: 60 | node_repr = self.activation(node_repr) 61 | if self.dropout: 62 | node_repr = self.dropout(node_repr) 63 | 64 | g.ndata['h'] = node_repr 65 | 66 | if self.is_input_layer: 67 | g.ndata['repr'] = g.ndata['h'].unsqueeze(1) 68 | else: 69 | g.ndata['repr'] = torch.cat([g.ndata['repr'], g.ndata['h'].unsqueeze(1)], dim=1) 70 | 71 | 72 | class RGCNBasisLayer(RGCNLayer): 73 | def __init__(self, inp_dim, out_dim, aggregator, attn_rel_emb_dim, num_rels, num_bases=-1, bias=None, 74 | activation=None, dropout=0.0, edge_dropout=0.0, is_input_layer=False, has_attn=False): 75 | super( 76 | RGCNBasisLayer, 77 | self).__init__( 78 | inp_dim, 79 | out_dim, 80 | aggregator, 81 | bias, 82 | activation, 83 | dropout=dropout, 84 | edge_dropout=edge_dropout, 85 | is_input_layer=is_input_layer) 86 | self.inp_dim = inp_dim 87 | self.out_dim = out_dim 88 | self.attn_rel_emb_dim = attn_rel_emb_dim 89 | self.num_rels = num_rels 90 | self.num_bases = num_bases 91 | self.is_input_layer = is_input_layer 92 | self.has_attn = has_attn 93 | 94 | if self.num_bases <= 0 or self.num_bases > self.num_rels: 95 | self.num_bases = self.num_rels 96 | 97 | # add basis weights 98 | # self.weight = basis_weights 99 | self.weight = nn.Parameter(torch.Tensor(self.num_bases, self.inp_dim, self.out_dim)) 100 | self.w_comp = nn.Parameter(torch.Tensor(self.num_rels, self.num_bases)) 101 | 102 | if self.has_attn: 103 | self.A = nn.Linear(2 * self.inp_dim + 2 * self.attn_rel_emb_dim, inp_dim) 104 | self.B = nn.Linear(inp_dim, 1) 105 | 106 | self.self_loop_weight = nn.Parameter(torch.Tensor(self.inp_dim, self.out_dim)) 107 | 108 | nn.init.xavier_uniform_(self.self_loop_weight, gain=nn.init.calculate_gain('relu')) 109 | nn.init.xavier_uniform_(self.weight, gain=nn.init.calculate_gain('relu')) 110 | nn.init.xavier_uniform_(self.w_comp, gain=nn.init.calculate_gain('relu')) 111 | 112 | def propagate(self, g, attn_rel_emb=None): 113 | # generate all weights from bases 114 | weight = self.weight.view(self.num_bases, 115 | self.inp_dim * self.out_dim) 116 | weight = torch.matmul(self.w_comp, weight).view( 117 | self.num_rels, self.inp_dim, self.out_dim) 118 | 119 | g.edata['w'] = self.edge_dropout(torch.ones(g.number_of_edges(), 1).to(weight.device)) 
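# NOTE: edge dropout here is a per-edge mask -- nn.Dropout applied to a vector of
# ones zeroes out roughly `edge_dropout` of the edges and rescales the survivors
# by 1/(1 - p); msg_func below multiplies each message by its edge's mask entry,
# so dropped edges contribute an all-zero message.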
120 | print('number of edge ', g.number_of_edges()) 121 | input_ = 'feat' if self.is_input_layer else 'h' 122 | 123 | def msg_func(edges): 124 | w = weight.index_select(0, edges.data['type']) 125 | msg = edges.data['w'] * torch.bmm(edges.src[input_].unsqueeze(1), w).squeeze(1) 126 | curr_emb = torch.mm(edges.dst[input_], self.self_loop_weight) # (B, F) 127 | # print('curren embed ', curr_emb.shape, 'msg ',msg.shape) 128 | if self.has_attn: 129 | e = torch.cat([edges.src[input_], edges.dst[input_], attn_rel_emb(edges.data['type']), attn_rel_emb(edges.data['label'])], dim=1) 130 | a = torch.sigmoid(self.B(F.relu(self.A(e)))) 131 | else: 132 | a = torch.ones((len(edges), 1)).to(device=w.device) 133 | 134 | return {'curr_emb': curr_emb, 'msg': msg, 'alpha': a} 135 | 136 | g.update_all(msg_func, self.aggregator, None) 137 | -------------------------------------------------------------------------------- /CoMPILE_github/model/dgl/rgcn_model.py: -------------------------------------------------------------------------------- 1 | """ 2 | File based off of dgl tutorial on RGCN 3 | Source: https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn 4 | """ 5 | 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from .layers import RGCNBasisLayer as RGCNLayer 10 | 11 | from .aggregators import SumAggregator, MLPAggregator, GRUAggregator 12 | 13 | 14 | class RGCN(nn.Module): 15 | def __init__(self, params): 16 | super(RGCN, self).__init__() 17 | 18 | self.max_label_value = params.max_label_value 19 | self.inp_dim = params.inp_dim 20 | self.emb_dim = params.emb_dim 21 | self.attn_rel_emb_dim = params.attn_rel_emb_dim 22 | self.num_rels = params.num_rels 23 | self.aug_num_rels = params.aug_num_rels 24 | self.num_bases = params.num_bases 25 | self.num_hidden_layers = params.num_gcn_layers 26 | self.dropout = params.dropout 27 | self.edge_dropout = params.edge_dropout 28 | # self.aggregator_type = params.gnn_agg_type 29 | self.has_attn = params.has_attn 30 | 31 | self.device = params.device 32 | 33 | if self.has_attn: 34 | self.attn_rel_emb = nn.Embedding(self.num_rels, self.attn_rel_emb_dim, sparse=False) 35 | else: 36 | self.attn_rel_emb = None 37 | 38 | # initialize aggregators for input and hidden layers 39 | if params.gnn_agg_type == "sum": 40 | self.aggregator = SumAggregator(self.emb_dim) 41 | elif params.gnn_agg_type == "mlp": 42 | self.aggregator = MLPAggregator(self.emb_dim) 43 | elif params.gnn_agg_type == "gru": 44 | self.aggregator = GRUAggregator(self.emb_dim) 45 | 46 | # initialize basis weights for input and hidden layers 47 | # self.input_basis_weights = nn.Parameter(torch.Tensor(self.num_bases, self.inp_dim, self.emb_dim)) 48 | # self.basis_weights = nn.Parameter(torch.Tensor(self.num_bases, self.emb_dim, self.emb_dim)) 49 | 50 | # create rgcn layers 51 | self.build_model() 52 | 53 | # create initial features 54 | self.features = self.create_features() 55 | 56 | def create_features(self): 57 | features = torch.arange(self.inp_dim).to(device=self.device) 58 | return features 59 | 60 | def build_model(self): 61 | self.layers = nn.ModuleList() 62 | # i2h 63 | i2h = self.build_input_layer() 64 | if i2h is not None: 65 | self.layers.append(i2h) 66 | # h2h 67 | for idx in range(self.num_hidden_layers - 1): 68 | h2h = self.build_hidden_layer(idx) 69 | self.layers.append(h2h) 70 | 71 | def build_input_layer(self): 72 | return RGCNLayer(self.inp_dim, 73 | self.emb_dim, 74 | # self.input_basis_weights, 75 | self.aggregator, 76 | self.attn_rel_emb_dim, 77 | 
self.aug_num_rels, 78 | self.num_bases, 79 | activation=F.relu, 80 | dropout=self.dropout, 81 | edge_dropout=self.edge_dropout, 82 | is_input_layer=True, 83 | has_attn=self.has_attn) 84 | 85 | def build_hidden_layer(self, idx): 86 | return RGCNLayer(self.emb_dim, 87 | self.emb_dim, 88 | # self.basis_weights, 89 | self.aggregator, 90 | self.attn_rel_emb_dim, 91 | self.aug_num_rels, 92 | self.num_bases, 93 | activation=F.relu, 94 | dropout=self.dropout, 95 | edge_dropout=self.edge_dropout, 96 | has_attn=self.has_attn) 97 | 98 | def forward(self, g): 99 | for layer in self.layers: 100 | layer(g, self.attn_rel_emb) 101 | return g.ndata.pop('h') 102 | -------------------------------------------------------------------------------- /CoMPILE_github/requirements.txt: -------------------------------------------------------------------------------- 1 | The requirements are the same as those in GraIL: 2 | dgl==0.4.2 3 | lmdb==0.98 4 | networkx==2.4 5 | scikit-learn==0.22.1 6 | torch==1.4.0 7 | tqdm==4.43.0 8 | -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/__pycache__/datasets.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/subgraph_extraction/__pycache__/datasets.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/__pycache__/graph_sampler.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/subgraph_extraction/__pycache__/graph_sampler.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/datasets.py: -------------------------------------------------------------------------------- 1 | from torch.utils.data import Dataset 2 | import timeit 3 | import os 4 | import logging 5 | import lmdb 6 | import numpy as np 7 | import json 8 | import pickle 9 | import dgl 10 | from utils.graph_utils import ssp_multigraph_to_dgl, incidence_matrix 11 | from utils.data_utils import process_files, save_to_file, plot_rel_dist 12 | from .graph_sampler import * 13 | import pdb 14 | 15 | 16 | def generate_subgraph_datasets(params, splits=['train', 'valid'], saved_relation2id=None, max_label_value=None): 17 | 18 | testing = 'test' in splits 19 | adj_list, triplets, entity2id, relation2id, id2entity, id2relation = process_files(params.file_paths, saved_relation2id) 20 | 21 | # plot_rel_dist(adj_list, os.path.join(params.main_dir, f'data/{params.dataset}/rel_dist.png')) 22 | 23 | data_path = os.path.join(params.main_dir, f'data/{params.dataset}/relation2id.json') 24 | if not os.path.isfile(data_path) and not testing: 25 | with open(data_path, 'w') as f: 26 | json.dump(relation2id, f) 27 | 28 | graphs = {} 29 | 30 | for split_name in splits: 31 | graphs[split_name] = {'triplets': triplets[split_name], 'max_size': params.max_links} 32 | 33 | # Sample train and valid/test links 34 | for split_name, split in graphs.items(): 35 | logging.info(f"Sampling negative links for {split_name}") 36 | split['pos'], split['neg'] = sample_neg(adj_list, split['triplets'], params.num_neg_samples_per_link, max_size=split['max_size'],
constrained_neg_prob=params.constrained_neg_prob) 37 | 38 | if testing: 39 | directory = os.path.join(params.main_dir, 'data/{}/'.format(params.dataset)) 40 | save_to_file(directory, f'neg_{params.test_file}_{params.constrained_neg_prob}.txt', graphs['test']['neg'], id2entity, id2relation) 41 | 42 | links2subgraphs(adj_list, graphs, params, max_label_value) 43 | 44 | 45 | def get_kge_embeddings(dataset, kge_model): 46 | 47 | path = './experiments/kge_baselines/{}_{}'.format(kge_model, dataset) 48 | node_features = np.load(os.path.join(path, 'entity_embedding.npy')) 49 | with open(os.path.join(path, 'id2entity.json')) as json_file: 50 | kge_id2entity = json.load(json_file) 51 | kge_entity2id = {v: int(k) for k, v in kge_id2entity.items()} 52 | 53 | return node_features, kge_entity2id 54 | 55 | 56 | class SubgraphDataset(Dataset): 57 | """Extracted, labeled, subgraph dataset -- DGL Only""" 58 | 59 | def __init__(self, db_path, db_name_pos, db_name_neg, raw_data_paths, included_relations=None, add_traspose_rels=False, num_neg_samples_per_link=1, use_kge_embeddings=False, dataset='', kge_model='', file_name=''): 60 | 61 | self.main_env = lmdb.open(db_path, readonly=True, max_dbs=3, lock=False) 62 | self.db_pos = self.main_env.open_db(db_name_pos.encode()) 63 | self.db_neg = self.main_env.open_db(db_name_neg.encode()) 64 | self.node_features, self.kge_entity2id = get_kge_embeddings(dataset, kge_model) if use_kge_embeddings else (None, None) 65 | self.num_neg_samples_per_link = num_neg_samples_per_link 66 | self.file_name = file_name 67 | 68 | ssp_graph, __, __, __, id2entity, id2relation = process_files(raw_data_paths, included_relations) 69 | self.num_rels = len(ssp_graph) 70 | 71 | # Add transpose matrices to handle both directions of relations. 72 | if add_traspose_rels: 73 | ssp_graph_t = [adj.T for adj in ssp_graph] 74 | ssp_graph += ssp_graph_t 75 | 76 | # the effective number of relations after adding symmetric adjacency matrices and/or self connections 77 | self.aug_num_rels = len(ssp_graph) 78 | self.graph = ssp_multigraph_to_dgl(ssp_graph) 79 | self.ssp_graph = ssp_graph 80 | self.id2entity = id2entity 81 | self.id2relation = id2relation 82 | 83 | self.max_n_label = np.array([0, 0]) 84 | with self.main_env.begin() as txn: 85 | self.max_n_label[0] = int.from_bytes(txn.get('max_n_label_sub'.encode()), byteorder='little') 86 | self.max_n_label[1] = int.from_bytes(txn.get('max_n_label_obj'.encode()), byteorder='little') 87 | 88 | self.avg_subgraph_size = struct.unpack('f', txn.get('avg_subgraph_size'.encode())) 89 | self.min_subgraph_size = struct.unpack('f', txn.get('min_subgraph_size'.encode())) 90 | self.max_subgraph_size = struct.unpack('f', txn.get('max_subgraph_size'.encode())) 91 | self.std_subgraph_size = struct.unpack('f', txn.get('std_subgraph_size'.encode())) 92 | 93 | self.avg_enc_ratio = struct.unpack('f', txn.get('avg_enc_ratio'.encode())) 94 | self.min_enc_ratio = struct.unpack('f', txn.get('min_enc_ratio'.encode())) 95 | self.max_enc_ratio = struct.unpack('f', txn.get('max_enc_ratio'.encode())) 96 | self.std_enc_ratio = struct.unpack('f', txn.get('std_enc_ratio'.encode())) 97 | 98 | self.avg_num_pruned_nodes = struct.unpack('f', txn.get('avg_num_pruned_nodes'.encode())) 99 | self.min_num_pruned_nodes = struct.unpack('f', txn.get('min_num_pruned_nodes'.encode())) 100 | self.max_num_pruned_nodes = struct.unpack('f', txn.get('max_num_pruned_nodes'.encode())) 101 | self.std_num_pruned_nodes = struct.unpack('f', txn.get('std_num_pruned_nodes'.encode())) 102 | 103 | 
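# NOTE: struct.unpack always returns a tuple, so each statistic read above is a
# 1-tuple; e.g. self.avg_subgraph_size[0] holds the actual float value.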
logging.info(f"Max distance from sub : {self.max_n_label[0]}, Max distance from obj : {self.max_n_label[1]}") 104 | 105 | # logging.info('=====================') 106 | # logging.info(f"Subgraph size stats: \n Avg size {self.avg_subgraph_size}, \n Min size {self.min_subgraph_size}, \n Max size {self.max_subgraph_size}, \n Std {self.std_subgraph_size}") 107 | 108 | # logging.info('=====================') 109 | # logging.info(f"Enclosed nodes ratio stats: \n Avg size {self.avg_enc_ratio}, \n Min size {self.min_enc_ratio}, \n Max size {self.max_enc_ratio}, \n Std {self.std_enc_ratio}") 110 | 111 | # logging.info('=====================') 112 | # logging.info(f"# of pruned nodes stats: \n Avg size {self.avg_num_pruned_nodes}, \n Min size {self.min_num_pruned_nodes}, \n Max size {self.max_num_pruned_nodes}, \n Std {self.std_num_pruned_nodes}") 113 | 114 | with self.main_env.begin(db=self.db_pos) as txn: 115 | self.num_graphs_pos = int.from_bytes(txn.get('num_graphs'.encode()), byteorder='little') 116 | with self.main_env.begin(db=self.db_neg) as txn: 117 | self.num_graphs_neg = int.from_bytes(txn.get('num_graphs'.encode()), byteorder='little') 118 | 119 | self.__getitem__(0) 120 | 121 | def __getitem__(self, index): 122 | with self.main_env.begin(db=self.db_pos) as txn: 123 | # print('index ', index) 124 | str_id = '{:08}'.format(index).encode('ascii') 125 | nodes_pos, r_label_pos, g_label_pos, n_labels_pos = deserialize(txn.get(str_id)).values() 126 | # print('nodes_pos shape ', len(nodes_pos)) 127 | # print('nodes_pos ',nodes_pos) #############nodes 128 | # print('r_label_pos shape ',r_label_pos.shape) #####relation target shape 1 129 | # print('r_label_pos ',r_label_pos) 130 | # print('g_label_pos shape ',g_label_pos.shape) ###########graph label 0 or 1 shape 1 131 | # print('g_label_pos ',g_label_pos) 132 | subgraph_pos = self._prepare_subgraphs(nodes_pos, r_label_pos, n_labels_pos) 133 | # print('subgraph_pos ', len(subgraph_pos)) 134 | subgraphs_neg = [] 135 | r_labels_neg = [] 136 | g_labels_neg = [] 137 | with self.main_env.begin(db=self.db_neg) as txn: 138 | for i in range(self.num_neg_samples_per_link): 139 | str_id = '{:08}'.format(index + i * (self.num_graphs_pos)).encode('ascii') 140 | nodes_neg, r_label_neg, g_label_neg, n_labels_neg = deserialize(txn.get(str_id)).values() 141 | subgraphs_neg.append(self._prepare_subgraphs(nodes_neg, r_label_neg, n_labels_neg)) 142 | r_labels_neg.append(r_label_neg) 143 | g_labels_neg.append(g_label_neg) 144 | 145 | return subgraph_pos, g_label_pos, r_label_pos, subgraphs_neg, g_labels_neg, r_labels_neg 146 | 147 | def __len__(self): 148 | return self.num_graphs_pos 149 | 150 | def _prepare_subgraphs(self, nodes, r_label, n_labels): 151 | subgraph = dgl.DGLGraph(self.graph.subgraph(nodes)) 152 | subgraph.edata['type'] = self.graph.edata['type'][self.graph.subgraph(nodes).parent_eid] 153 | subgraph.edata['label'] = torch.tensor(r_label * np.ones(subgraph.edata['type'].shape), dtype=torch.long) 154 | 155 | edges_btw_roots = subgraph.edge_id(0, 1) 156 | rel_link = np.nonzero(subgraph.edata['type'][edges_btw_roots] == r_label) 157 | if rel_link.squeeze().nelement() == 0: 158 | subgraph.add_edge(0, 1) 159 | subgraph.edata['type'][-1] = torch.tensor(r_label).type(torch.LongTensor) 160 | subgraph.edata['label'][-1] = torch.tensor(r_label).type(torch.LongTensor) 161 | # print('all edges length ', len(subgraph.edges()[0])) #########source: subgraph.edges()[0] target: subgraph.edges()[1] 162 | # map the id read by GraIL to the entity IDs as registered by the 
KGE embeddings 163 | kge_nodes = [self.kge_entity2id[self.id2entity[n]] for n in nodes] if self.kge_entity2id else None 164 | n_feats = self.node_features[kge_nodes] if self.node_features is not None else None 165 | subgraph = self._prepare_features_new(subgraph, n_labels, n_feats) 166 | 167 | return subgraph 168 | 169 | def _prepare_features(self, subgraph, n_labels, n_feats=None): 170 | # One hot encode the node label feature and concat to n_feats 171 | n_nodes = subgraph.number_of_nodes() 172 | label_feats = np.zeros((n_nodes, self.max_n_label[0] + 1)) 173 | label_feats[np.arange(n_nodes), n_labels] = 1 174 | label_feats[np.arange(n_nodes), self.max_n_label[0] + 1 + n_labels[:, 1]] = 1 175 | n_feats = np.concatenate((label_feats, n_feats), axis=1) if n_feats is not None else label_feats 176 | subgraph.ndata['feat'] = torch.FloatTensor(n_feats) 177 | self.n_feat_dim = n_feats.shape[1] # Find cleaner way to do this -- i.e. set the n_feat_dim 178 | return subgraph 179 | 180 | def _prepare_features_new(self, subgraph, n_labels, n_feats=None): 181 | # One hot encode the node label feature and concat to n_feats 182 | n_nodes = subgraph.number_of_nodes() 183 | label_feats = np.zeros((n_nodes, self.max_n_label[0] + 1 + self.max_n_label[1] + 1)) 184 | label_feats[np.arange(n_nodes), n_labels[:, 0]] = 1 185 | label_feats[np.arange(n_nodes), self.max_n_label[0] + 1 + n_labels[:, 1]] = 1 186 | # label_feats = np.zeros((n_nodes, self.max_n_label[0] + 1 + self.max_n_label[1] + 1)) 187 | # label_feats[np.arange(n_nodes), 0] = 1 188 | # label_feats[np.arange(n_nodes), self.max_n_label[0] + 1] = 1 189 | n_feats = np.concatenate((label_feats, n_feats), axis=1) if n_feats is not None else label_feats 190 | subgraph.ndata['feat'] = torch.FloatTensor(n_feats) 191 | 192 | head_id = np.argwhere([label[0] == 0 and label[1] == 1 for label in n_labels]) ################### 193 | tail_id = np.argwhere([label[0] == 1 and label[1] == 0 for label in n_labels]) ############## 194 | n_ids = np.zeros(n_nodes) 195 | n_ids[head_id] = 1 # head 196 | n_ids[tail_id] = 2 # tail 197 | subgraph.ndata['id'] = torch.FloatTensor(n_ids) 198 | # subgraph.ndata['head_id'] = head_id 199 | # subgraph.ndata['tail_id'] = tail_id 200 | self.n_feat_dim = n_feats.shape[1] # Find cleaner way to do this -- i.e.
set the n_feat_dim 201 | return subgraph 202 | -------------------------------------------------------------------------------- /CoMPILE_github/subgraph_extraction/graph_sampler.py: -------------------------------------------------------------------------------- 1 | import os 2 | import math 3 | import struct 4 | import logging 5 | import random 6 | import pickle as pkl 7 | import pdb 8 | from tqdm import tqdm 9 | import lmdb 10 | import multiprocessing as mp 11 | import numpy as np 12 | import scipy.io as sio 13 | import scipy.sparse as ssp 14 | import sys 15 | import torch 16 | from scipy.special import softmax 17 | from utils.dgl_utils import _bfs_relational 18 | from utils.graph_utils import incidence_matrix, remove_nodes, ssp_to_torch, serialize, deserialize, get_edge_count, diameter, radius 19 | import networkx as nx 20 | 21 | 22 | def sample_neg(adj_list, edges, num_neg_samples_per_link=1, max_size=1000000, constrained_neg_prob=0): 23 | pos_edges = edges 24 | neg_edges = [] 25 | 26 | # if max_size is set, randomly sample train links 27 | if max_size < len(pos_edges): 28 | perm = np.random.permutation(len(pos_edges))[:max_size] 29 | pos_edges = pos_edges[perm] 30 | 31 | # sample negative links for train/test 32 | n, r = adj_list[0].shape[0], len(adj_list) 33 | 34 | # distribution of edges across relations 35 | theta = 0.001 36 | edge_count = get_edge_count(adj_list) 37 | rel_dist = np.zeros(edge_count.shape) 38 | idx = np.nonzero(edge_count) 39 | rel_dist[idx] = softmax(theta * edge_count[idx]) 40 | 41 | # possible heads and tails for each relation 42 | valid_heads = [adj.tocoo().row.tolist() for adj in adj_list] 43 | valid_tails = [adj.tocoo().col.tolist() for adj in adj_list] 44 | 45 | pbar = tqdm(total=len(pos_edges)) 46 | while len(neg_edges) < num_neg_samples_per_link * len(pos_edges): 47 | neg_head, neg_tail, rel = pos_edges[pbar.n % len(pos_edges)][0], pos_edges[pbar.n % len(pos_edges)][1], pos_edges[pbar.n % len(pos_edges)][2] 48 | if np.random.uniform() < constrained_neg_prob: 49 | if np.random.uniform() < 0.5: 50 | neg_head = np.random.choice(valid_heads[rel]) 51 | else: 52 | neg_tail = np.random.choice(valid_tails[rel]) 53 | else: 54 | if np.random.uniform() < 0.5: 55 | neg_head = np.random.choice(n) 56 | else: 57 | neg_tail = np.random.choice(n) 58 | 59 | if neg_head != neg_tail and adj_list[rel][neg_head, neg_tail] == 0: 60 | neg_edges.append([neg_head, neg_tail, rel]) 61 | pbar.update(1) 62 | 63 | pbar.close() 64 | 65 | neg_edges = np.array(neg_edges) 66 | return pos_edges, neg_edges 67 | 68 | 69 | def links2subgraphs(A, graphs, params, max_label_value=None): 70 | ''' 71 | extract enclosing subgraphs, write map mode + named dbs 72 | ''' 73 | max_n_label = {'value': np.array([0, 0])} 74 | subgraph_sizes = [] 75 | enc_ratios = [] 76 | num_pruned_nodes = [] 77 | 78 | BYTES_PER_DATUM = get_average_subgraph_size(100, list(graphs.values())[0]['pos'], A, params) * 1.5 79 | links_length = 0 80 | for split_name, split in graphs.items(): 81 | links_length += (len(split['pos']) + len(split['neg'])) * 2 82 | map_size = links_length * BYTES_PER_DATUM 83 | 84 | env = lmdb.open(params.db_path, map_size=map_size, max_dbs=6) 85 | 86 | def extraction_helper(A, links, g_labels, split_env): 87 | 88 | with env.begin(write=True, db=split_env) as txn: 89 | txn.put('num_graphs'.encode(), (len(links)).to_bytes(int.bit_length(len(links)), byteorder='little')) 90 | 91 | with mp.Pool(processes=None, initializer=intialize_worker, initargs=(A, params, max_label_value)) as p: 92 | args_ =
zip(range(len(links)), links, g_labels) 93 | for (str_id, datum) in tqdm(p.imap(extract_save_subgraph, args_), total=len(links)): 94 | max_n_label['value'] = np.maximum(np.max(datum['n_labels'], axis=0), max_n_label['value']) 95 | subgraph_sizes.append(datum['subgraph_size']) 96 | enc_ratios.append(datum['enc_ratio']) 97 | num_pruned_nodes.append(datum['num_pruned_nodes']) 98 | 99 | with env.begin(write=True, db=split_env) as txn: 100 | txn.put(str_id, serialize(datum)) 101 | 102 | for split_name, split in graphs.items(): 103 | logging.info(f"Extracting enclosing subgraphs for positive links in {split_name} set") 104 | labels = np.ones(len(split['pos'])) 105 | db_name_pos = split_name + '_pos' 106 | split_env = env.open_db(db_name_pos.encode()) 107 | extraction_helper(A, split['pos'], labels, split_env) 108 | 109 | logging.info(f"Extracting enclosing subgraphs for negative links in {split_name} set") 110 | labels = np.zeros(len(split['neg'])) 111 | db_name_neg = split_name + '_neg' 112 | split_env = env.open_db(db_name_neg.encode()) 113 | extraction_helper(A, split['neg'], labels, split_env) 114 | 115 | max_n_label['value'] = max_label_value if max_label_value is not None else max_n_label['value'] 116 | 117 | with env.begin(write=True) as txn: 118 | bit_len_label_sub = int.bit_length(int(max_n_label['value'][0])) 119 | bit_len_label_obj = int.bit_length(int(max_n_label['value'][1])) 120 | txn.put('max_n_label_sub'.encode(), (int(max_n_label['value'][0])).to_bytes(bit_len_label_sub, byteorder='little')) 121 | txn.put('max_n_label_obj'.encode(), (int(max_n_label['value'][1])).to_bytes(bit_len_label_obj, byteorder='little')) 122 | 123 | txn.put('avg_subgraph_size'.encode(), struct.pack('f', float(np.mean(subgraph_sizes)))) 124 | txn.put('min_subgraph_size'.encode(), struct.pack('f', float(np.min(subgraph_sizes)))) 125 | txn.put('max_subgraph_size'.encode(), struct.pack('f', float(np.max(subgraph_sizes)))) 126 | txn.put('std_subgraph_size'.encode(), struct.pack('f', float(np.std(subgraph_sizes)))) 127 | 128 | txn.put('avg_enc_ratio'.encode(), struct.pack('f', float(np.mean(enc_ratios)))) 129 | txn.put('min_enc_ratio'.encode(), struct.pack('f', float(np.min(enc_ratios)))) 130 | txn.put('max_enc_ratio'.encode(), struct.pack('f', float(np.max(enc_ratios)))) 131 | txn.put('std_enc_ratio'.encode(), struct.pack('f', float(np.std(enc_ratios)))) 132 | 133 | txn.put('avg_num_pruned_nodes'.encode(), struct.pack('f', float(np.mean(num_pruned_nodes)))) 134 | txn.put('min_num_pruned_nodes'.encode(), struct.pack('f', float(np.min(num_pruned_nodes)))) 135 | txn.put('max_num_pruned_nodes'.encode(), struct.pack('f', float(np.max(num_pruned_nodes)))) 136 | txn.put('std_num_pruned_nodes'.encode(), struct.pack('f', float(np.std(num_pruned_nodes)))) 137 | 138 | 139 | def get_average_subgraph_size(sample_size, links, A, params): 140 | total_size = 0 141 | for (n1, n2, r_label) in links[np.random.choice(len(links), sample_size)]: 142 | nodes, n_labels, subgraph_size, enc_ratio, num_pruned_nodes = subgraph_extraction_labeling((n1, n2), r_label, A, params.hop, params.enclosing_sub_graph, params.max_nodes_per_hop) 143 | datum = {'nodes': nodes, 'r_label': r_label, 'g_label': 0, 'n_labels': n_labels, 'subgraph_size': subgraph_size, 'enc_ratio': enc_ratio, 'num_pruned_nodes': num_pruned_nodes} 144 | total_size += len(serialize(datum)) 145 | return total_size / sample_size 146 | 147 | 148 | def intialize_worker(A, params, max_label_value): 149 | global A_, params_, max_label_value_ 150 | A_, params_, max_label_value_ = A, 
params, max_label_value 151 | 152 | 153 | def extract_save_subgraph(args_): 154 | idx, (n1, n2, r_label), g_label = args_ 155 | nodes, n_labels, subgraph_size, enc_ratio, num_pruned_nodes = subgraph_extraction_labeling((n1, n2), r_label, A_, params_.hop, params_.enclosing_sub_graph, params_.max_nodes_per_hop) 156 | 157 | # max_label_value_ is to set the maximum possible value of node label while doing double-radius labelling. 158 | if max_label_value_ is not None: 159 | n_labels = np.array([np.minimum(label, max_label_value_).tolist() for label in n_labels]) 160 | 161 | datum = {'nodes': nodes, 'r_label': r_label, 'g_label': g_label, 'n_labels': n_labels, 'subgraph_size': subgraph_size, 'enc_ratio': enc_ratio, 'num_pruned_nodes': num_pruned_nodes} 162 | str_id = '{:08}'.format(idx).encode('ascii') 163 | 164 | return (str_id, datum) 165 | 166 | 167 | def get_neighbor_nodes(roots, adj, h=1, max_nodes_per_hop=None): 168 | bfs_generator = _bfs_relational(adj, roots, max_nodes_per_hop) 169 | lvls = list() 170 | for _ in range(h): 171 | try: 172 | lvls.append(next(bfs_generator)) 173 | except StopIteration: 174 | pass 175 | return set().union(*lvls) 176 | 177 | 178 | def subgraph_extraction_labeling(ind, rel, A_list, h=1, enclosing_sub_graph=False, max_nodes_per_hop=None, max_node_label_value=None): 179 | # extract the h-hop enclosing subgraphs around link 'ind' 180 | A_incidence = incidence_matrix(A_list) 181 | A_incidence += A_incidence.T 182 | 183 | root1_nei = get_neighbor_nodes(set([ind[0]]), A_incidence, h, max_nodes_per_hop) 184 | root2_nei = get_neighbor_nodes(set([ind[1]]), A_incidence, h, max_nodes_per_hop) 185 | 186 | subgraph_nei_nodes_int = root1_nei.intersection(root2_nei) 187 | subgraph_nei_nodes_un = root1_nei.union(root2_nei) 188 | 189 | # Extract subgraph | Roots being in the front is essential for labelling and the model to work properly. 
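# (node_label() below hard-codes roots = [0, 1], so the target head and tail
#  must occupy the first two positions of subgraph_nodes)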
190 | if enclosing_sub_graph: 191 | subgraph_nodes = list(ind) + list(subgraph_nei_nodes_int) 192 | else: 193 | subgraph_nodes = list(ind) + list(subgraph_nei_nodes_un) 194 | 195 | subgraph = [adj[subgraph_nodes, :][:, subgraph_nodes] for adj in A_list] 196 | 197 | labels, enclosing_subgraph_nodes = node_label(incidence_matrix(subgraph), max_distance=h) 198 | 199 | pruned_subgraph_nodes = np.array(subgraph_nodes)[enclosing_subgraph_nodes].tolist() 200 | pruned_labels = labels[enclosing_subgraph_nodes] 201 | # pruned_subgraph_nodes = subgraph_nodes 202 | # pruned_labels = labels 203 | 204 | if max_node_label_value is not None: 205 | pruned_labels = np.array([np.minimum(label, max_node_label_value).tolist() for label in pruned_labels]) 206 | 207 | subgraph_size = len(pruned_subgraph_nodes) 208 | enc_ratio = len(subgraph_nei_nodes_int) / (len(subgraph_nei_nodes_un) + 1e-3) 209 | num_pruned_nodes = len(subgraph_nodes) - len(pruned_subgraph_nodes) 210 | 211 | return pruned_subgraph_nodes, pruned_labels, subgraph_size, enc_ratio, num_pruned_nodes 212 | 213 | 214 | def node_label(subgraph, max_distance=1): 215 | # implementation of the node labeling scheme described in the paper 216 | roots = [0, 1] 217 | sgs_single_root = [remove_nodes(subgraph, [root]) for root in roots] 218 | dist_to_roots = [np.clip(ssp.csgraph.dijkstra(sg, indices=[0], directed=False, unweighted=True, limit=1e6)[:, 1:], 0, 1e7) for r, sg in enumerate(sgs_single_root)] 219 | dist_to_roots = np.array(list(zip(dist_to_roots[0][0], dist_to_roots[1][0])), dtype=int) 220 | 221 | target_node_labels = np.array([[0, 1], [1, 0]]) 222 | labels = np.concatenate((target_node_labels, dist_to_roots)) if dist_to_roots.size else target_node_labels 223 | 224 | enclosing_subgraph_nodes = np.where(np.max(labels, axis=1) <= max_distance)[0] 225 | return labels, enclosing_subgraph_nodes 226 | -------------------------------------------------------------------------------- /CoMPILE_github/test_auc.py: -------------------------------------------------------------------------------- 1 | # from comet_ml import Experiment 2 | import pdb 3 | import os 4 | import argparse 5 | import logging 6 | import torch 7 | from scipy.sparse import SparseEfficiencyWarning 8 | import numpy as np 9 | 10 | from subgraph_extraction.datasets import SubgraphDataset, generate_subgraph_datasets 11 | from utils.initialization_utils import initialize_experiment, initialize_model 12 | from utils.graph_utils import collate_dgl, move_batch_to_device_dgl, collate_dgl2 13 | from managers.evaluator import Evaluator 14 | 15 | from warnings import simplefilter 16 | 17 | 18 | def main(params): 19 | simplefilter(action='ignore', category=UserWarning) 20 | simplefilter(action='ignore', category=SparseEfficiencyWarning) 21 | 22 | graph_classifier = initialize_model(params, None, load_model=True) 23 | 24 | logging.info(f"Device: {params.device}") 25 | 26 | all_auc = [] 27 | auc_mean = 0 28 | 29 | all_auc_pr = [] 30 | auc_pr_mean = 0 31 | for r in range(1, params.runs + 1): 32 | 33 | params.db_path = os.path.join(params.main_dir, f'data/{params.dataset}/test_subgraphs_{params.experiment_name}_{params.constrained_neg_prob}_en_{params.enclosing_sub_graph}') 34 | 35 | generate_subgraph_datasets(params, splits=['test'], 36 | saved_relation2id=graph_classifier.relation2id, 37 | max_label_value=graph_classifier.max_label_value) 38 | 39 | test = SubgraphDataset(params.db_path, 'test_pos', 'test_neg', params.file_paths, graph_classifier.relation2id, 40 | add_traspose_rels=params.add_traspose_rels, 
41 | num_neg_samples_per_link=params.num_neg_samples_per_link, 42 | use_kge_embeddings=params.use_kge_embeddings, dataset=params.dataset, 43 | kge_model=params.kge_model, file_name=params.test_file) 44 | 45 | test_evaluator = Evaluator(params, graph_classifier, test) 46 | 47 | result = test_evaluator.eval(save=True) 48 | logging.info('\nTest Set Performance:' + str(result)) 49 | all_auc.append(result['auc']) 50 | auc_mean = auc_mean + (result['auc'] - auc_mean) / r 51 | 52 | all_auc_pr.append(result['auc_pr']) 53 | auc_pr_mean = auc_pr_mean + (result['auc_pr'] - auc_pr_mean) / r 54 | 55 | auc_std = np.std(all_auc) 56 | auc_pr_std = np.std(all_auc_pr) 57 | 58 | logging.info('\nAvg test Set Performance -- mean auc :' + str(np.mean(all_auc)) + ' std auc: ' + str(np.std(all_auc))) 59 | logging.info('\nAvg test Set Performance -- mean auc_pr :' + str(np.mean(all_auc_pr)) + ' std auc_pr: ' + str(np.std(all_auc_pr))) 60 | 61 | 62 | if __name__ == '__main__': 63 | 64 | logging.basicConfig(level=logging.INFO) 65 | 66 | parser = argparse.ArgumentParser(description='TransE model') 67 | 68 | # Experiment setup params 69 | parser.add_argument("--experiment_name", "-e", type=str, default="default", 70 | help="A folder with this name would be created to dump saved models and log files") 71 | parser.add_argument("--dataset", "-d", type=str, default="Toy", 72 | help="Dataset string") 73 | parser.add_argument("--train_file", "-tf", type=str, default="train", 74 | help="Name of file containing training triplets") 75 | parser.add_argument("--test_file", "-t", type=str, default="test", 76 | help="Name of file containing test triplets") 77 | parser.add_argument("--runs", type=int, default=1, 78 | help="How many runs to perform for mean and std?") 79 | parser.add_argument("--gpu", type=int, default=0, 80 | help="Which GPU to use?") 81 | parser.add_argument('--disable_cuda', action='store_true', 82 | help='Disable CUDA') 83 | 84 | # Data processing pipeline params 85 | parser.add_argument("--max_links", type=int, default=100000, 86 | help="Set maximum number of links (to fit into memory)") 87 | parser.add_argument("--hop", type=int, default=3, 88 | help="Enclosing subgraph hop number") 89 | parser.add_argument("--max_nodes_per_hop", "-max_h", type=int, default=None, 90 | help="if > 0, upper bound the # nodes per hop by subsampling") 91 | parser.add_argument("--use_kge_embeddings", "-kge", type=bool, default=False, 92 | help='whether to use pretrained KGE embeddings') 93 | parser.add_argument("--kge_model", type=str, default="TransE", 94 | help="Which KGE model to load entity embeddings from") 95 | parser.add_argument('--model_type', '-m', type=str, choices=['dgl'], default='dgl', 96 | help='what format to store subgraphs in for model') 97 | parser.add_argument('--constrained_neg_prob', '-cn', type=float, default=0, 98 | help='with what probability to sample constrained heads/tails while neg sampling') 99 | parser.add_argument("--num_neg_samples_per_link", '-neg', type=int, default=1, 100 | help="Number of negative examples to sample per positive link") 101 | parser.add_argument("--batch_size", type=int, default=16, 102 | help="Batch size") 103 | parser.add_argument("--num_workers", type=int, default=0, 104 | help="Number of dataloading processes") 105 | parser.add_argument('--add_traspose_rels', '-tr', type=bool, default=False, 106 | help='whether to append adj matrix list with symmetric relations') 107 | parser.add_argument('--enclosing_sub_graph', '-en', type=bool, default=True, 108 | help='whether to only 
consider enclosing subgraph') 109 | 110 | params = parser.parse_args() 111 | initialize_experiment(params, __file__) 112 | 113 | params.file_paths = { 114 | 'train': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.train_file)), 115 | 'test': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.test_file)) 116 | } 117 | 118 | if not params.disable_cuda and torch.cuda.is_available(): 119 | params.device = torch.device('cuda:%d' % params.gpu) 120 | else: 121 | params.device = torch.device('cpu') 122 | 123 | params.collate_fn = collate_dgl2 124 | params.move_batch_to_device = move_batch_to_device_dgl 125 | 126 | main(params) 127 | -------------------------------------------------------------------------------- /CoMPILE_github/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import logging 4 | import torch 5 | from scipy.sparse import SparseEfficiencyWarning 6 | 7 | from subgraph_extraction.datasets import SubgraphDataset, generate_subgraph_datasets 8 | from utils.initialization_utils import initialize_experiment, initialize_model 9 | from utils.graph_utils import collate_dgl, move_batch_to_device_dgl, collate_dgl2 10 | 11 | from model.dgl.graph_classifier import GraphClassifier as dgl_model 12 | 13 | from managers.evaluator import Evaluator 14 | from managers.trainer import Trainer 15 | 16 | from warnings import simplefilter 17 | #os.environ["CUDA_VISIBLE_DEVICES"]="1" ##############2 18 | 19 | def main(params): 20 | simplefilter(action='ignore', category=UserWarning) 21 | simplefilter(action='ignore', category=SparseEfficiencyWarning) 22 | 23 | params.db_path = os.path.join(params.main_dir, f'data/{params.dataset}/subgraphs_en_{params.enclosing_sub_graph}_neg_{params.num_neg_samples_per_link}_hop_{params.hop}') 24 | 25 | if not os.path.isdir(params.db_path): 26 | generate_subgraph_datasets(params) 27 | 28 | train = SubgraphDataset(params.db_path, 'train_pos', 'train_neg', params.file_paths, 29 | add_traspose_rels=params.add_traspose_rels, 30 | num_neg_samples_per_link=params.num_neg_samples_per_link, 31 | use_kge_embeddings=params.use_kge_embeddings, dataset=params.dataset, 32 | kge_model=params.kge_model, file_name=params.train_file) 33 | valid = SubgraphDataset(params.db_path, 'valid_pos', 'valid_neg', params.file_paths, 34 | add_traspose_rels=params.add_traspose_rels, 35 | num_neg_samples_per_link=params.num_neg_samples_per_link, 36 | use_kge_embeddings=params.use_kge_embeddings, dataset=params.dataset, 37 | kge_model=params.kge_model, file_name=params.valid_file) 38 | 39 | params.num_rels = train.num_rels 40 | params.aug_num_rels = train.aug_num_rels 41 | params.inp_dim = train.n_feat_dim 42 | 43 | # Log the max label value to save it in the model. This will be used to cap the labels generated on test set. 
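 | # A quick illustration of that cap (hypothetical numbers, not taken from the repo):
 | # the subgraph extraction code applies np.minimum(label, max_label_value) to each
 | # node's [dist_to_head, dist_to_tail] label, so with max_n_label = [2, 2] a
 | # test-time label of [4, 1] would be clipped to [2, 1], keeping test features
 | # inside the label range the classifier saw during training.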
44 | params.max_label_value = train.max_n_label
45 | 
46 | graph_classifier = initialize_model(params, dgl_model, params.load_model)
47 | 
48 | logging.info(f"Device: {params.device}")
49 | logging.info(f"Input dim : {params.inp_dim}, # Relations : {params.num_rels}, # Augmented relations : {params.aug_num_rels}")
50 | 
51 | valid_evaluator = Evaluator(params, graph_classifier, valid)
52 | 
53 | trainer = Trainer(params, graph_classifier, train, valid_evaluator)
54 | 
55 | logging.info('Starting training with full batch...')
56 | 
57 | trainer.train()
58 | 
59 | 
60 | if __name__ == '__main__':
61 | 
62 | logging.basicConfig(level=logging.INFO)
63 | 
64 | parser = argparse.ArgumentParser(description='Train a graph classifier for inductive link prediction')
65 | 
66 | # Experiment setup params
67 | parser.add_argument("--experiment_name", "-e", type=str, default="default",
68 | help="A folder with this name will be created to dump saved models and log files")
69 | parser.add_argument("--dataset", "-d", type=str,
70 | help="Dataset string")
71 | parser.add_argument("--gpu", type=int, default=0,
72 | help="Which GPU to use?")
73 | parser.add_argument('--disable_cuda', action='store_true',
74 | help='Disable CUDA')
75 | parser.add_argument('--load_model', action='store_true',
76 | help='Load existing model?')
77 | parser.add_argument("--train_file", "-tf", type=str, default="train",
78 | help="Name of file containing training triplets")
79 | parser.add_argument("--valid_file", "-vf", type=str, default="valid",
80 | help="Name of file containing validation triplets")
81 | 
82 | # Training regime params
83 | parser.add_argument("--num_epochs", "-ne", type=int, default=30, #########30
84 | help="Number of training epochs")
85 | parser.add_argument("--eval_every", type=int, default=1,
86 | help="Interval of epochs at which to evaluate the model")
87 | parser.add_argument("--eval_every_iter", type=int, default=455,
88 | help="Interval of iterations at which to evaluate the model")
89 | parser.add_argument("--save_every", type=int, default=10,
90 | help="Interval of epochs at which to save a model checkpoint")
91 | parser.add_argument("--early_stop", type=int, default=100,
92 | help="Early stopping patience")
93 | parser.add_argument("--optimizer", type=str, default="Adam",
94 | help="Which optimizer to use?")
95 | parser.add_argument("--lr", type=float, default=0.001,
96 | help="Learning rate of the optimizer")
97 | parser.add_argument("--clip", type=int, default=1000,
98 | help="Maximum gradient norm allowed")
99 | parser.add_argument("--l2", type=float, default=5e-4,
100 | help="Regularization constant for GNN weights")
101 | parser.add_argument("--margin", type=float, default=10,
102 | help="The margin between positive and negative samples in the max-margin loss")
103 | 
104 | # Data processing pipeline params
105 | parser.add_argument("--max_links", type=int, default=10000000, #10000000
106 | help="Set maximum number of train links (to fit into memory)")
107 | parser.add_argument("--hop", type=int, default=3,
108 | help="Enclosing subgraph hop number")
109 | parser.add_argument("--max_nodes_per_hop", "-max_h", type=int, default=None,
110 | help="if > 0, upper bound the # nodes per hop by subsampling")
111 | parser.add_argument("--use_kge_embeddings", "-kge", type=bool, default=False,
112 | help='whether to use pretrained KGE embeddings')
113 | parser.add_argument("--kge_model", type=str, default="TransE",
114 | help="Which KGE model to load entity embeddings from")
115 | parser.add_argument('--model_type', '-m', type=str, choices=['ssp', 'dgl'],
default='dgl', 116 | help='what format to store subgraphs in for model') 117 | parser.add_argument('--constrained_neg_prob', '-cn', type=float, default=0.0, 118 | help='with what probability to sample constrained heads/tails while neg sampling') 119 | parser.add_argument("--batch_size", type=int, default=16, 120 | help="Batch size") 121 | parser.add_argument("--num_neg_samples_per_link", '-neg', type=int, default=1, 122 | help="Number of negative examples to sample per positive link") 123 | parser.add_argument("--num_workers", type=int, default=0, 124 | help="Number of dataloading processes") 125 | parser.add_argument('--add_traspose_rels', '-tr', type=bool, default=False, 126 | help='whether to append adj matrix list with symmetric relations') 127 | parser.add_argument('--enclosing_sub_graph', '-en', type=bool, default=True, 128 | help='whether to only consider enclosing subgraph') 129 | 130 | # Model params 131 | parser.add_argument("--rel_emb_dim", "-r_dim", type=int, default=32, 132 | help="Relation embedding size") 133 | parser.add_argument("--attn_rel_emb_dim", "-ar_dim", type=int, default=32, 134 | help="Relation embedding size for attention") 135 | parser.add_argument("--emb_dim", "-dim", type=int, default=32, 136 | help="Entity embedding size") 137 | parser.add_argument("--num_gcn_layers", "-l", type=int, default=3, 138 | help="Number of GCN layers") 139 | parser.add_argument("--num_bases", "-b", type=int, default=4, 140 | help="Number of basis functions to use for GCN weights") 141 | parser.add_argument("--dropout", type=float, default=0, 142 | help="Dropout rate in GNN layers") 143 | parser.add_argument("--edge_dropout", type=float, default=0.5, 144 | help="Dropout rate in edges of the subgraphs") 145 | parser.add_argument('--gnn_agg_type', '-a', type=str, choices=['sum', 'mlp', 'gru'], default='sum', 146 | help='what type of aggregation to do in gnn msg passing') 147 | parser.add_argument('--add_ht_emb', '-ht', type=bool, default=True, 148 | help='whether to concatenate head/tail embedding with pooled graph representation') 149 | parser.add_argument('--has_attn', '-attn', type=bool, default=True, 150 | help='whether to have attn in model or not') 151 | 152 | params = parser.parse_args() 153 | initialize_experiment(params, __file__) 154 | 155 | params.file_paths = { 156 | 'train': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.train_file)), 157 | 'valid': os.path.join(params.main_dir, 'data/{}/{}.txt'.format(params.dataset, params.valid_file)) 158 | } 159 | 160 | if not params.disable_cuda and torch.cuda.is_available(): 161 | params.device = torch.device('cuda:%d' % params.gpu) 162 | else: 163 | params.device = torch.device('cpu') 164 | 165 | params.collate_fn = collate_dgl2 166 | params.move_batch_to_device = move_batch_to_device_dgl 167 | 168 | main(params) 169 | -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/data_utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/data_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/dgl_utils.cpython-36.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/dgl_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/graph_utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/graph_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/__pycache__/initialization_utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_github/utils/__pycache__/initialization_utils.cpython-36.pyc -------------------------------------------------------------------------------- /CoMPILE_github/utils/clean_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import numpy as np 4 | 5 | 6 | def write_to_file(file_name, data): 7 | with open(file_name, "w") as f: 8 | for s, r, o in data: 9 | f.write('\t'.join([s, r, o]) + '\n') 10 | 11 | 12 | def main(params): 13 | with open(os.path.join(params.main_dir, 'data', params.dataset, 'train.txt')) as f: 14 | train_data = [line.split() for line in f.read().split('\n')[:-1]] 15 | with open(os.path.join(params.main_dir, 'data', params.dataset, 'valid.txt')) as f: 16 | valid_data = [line.split() for line in f.read().split('\n')[:-1]] 17 | with open(os.path.join(params.main_dir, 'data', params.dataset, 'test.txt')) as f: 18 | test_data = [line.split() for line in f.read().split('\n')[:-1]] 19 | 20 | train_tails = set([d[2] for d in train_data]) 21 | train_heads = set([d[0] for d in train_data]) 22 | train_ent = train_tails.union(train_heads) 23 | train_rels = set([d[1] for d in train_data]) 24 | 25 | filtered_valid_data = [] 26 | for d in valid_data: 27 | if d[0] in train_ent and d[1] in train_rels and d[2] in train_ent: 28 | filtered_valid_data.append(d) 29 | else: 30 | train_data.append(d) 31 | train_ent = train_ent.union(set([d[0], d[2]])) 32 | train_rels = train_rels.union(set([d[1]])) 33 | 34 | filtered_test_data = [] 35 | for d in test_data: 36 | if d[0] in train_ent and d[1] in train_rels and d[2] in train_ent: 37 | filtered_test_data.append(d) 38 | else: 39 | train_data.append(d) 40 | train_ent = train_ent.union(set([d[0], d[2]])) 41 | train_rels = train_rels.union(set([d[1]])) 42 | 43 | data_dir = os.path.join(params.main_dir, 'data/{}'.format(params.dataset)) 44 | write_to_file(os.path.join(data_dir, 'train.txt'), train_data) 45 | write_to_file(os.path.join(data_dir, 'valid.txt'), filtered_valid_data) 46 | write_to_file(os.path.join(data_dir, 'test.txt'), filtered_test_data) 47 | 48 | with open(os.path.join(params.main_dir, 'data', params.dataset + '_meta', 'train.txt')) as f: 49 | meta_train_data = [line.split() for line in f.read().split('\n')[:-1]] 50 | with open(os.path.join(params.main_dir, 'data', params.dataset + '_meta', 'valid.txt')) as f: 51 | meta_valid_data = [line.split() for line in f.read().split('\n')[:-1]] 52 | with open(os.path.join(params.main_dir, 'data', params.dataset + '_meta', 'test.txt')) as f: 53 | meta_test_data = [line.split() for line in 
f.read().split('\n')[:-1]] 54 | 55 | meta_train_tails = set([d[2] for d in meta_train_data]) 56 | meta_train_heads = set([d[0] for d in meta_train_data]) 57 | meta_train_ent = meta_train_tails.union(meta_train_heads) 58 | meta_train_rels = set([d[1] for d in meta_train_data]) 59 | 60 | filtered_meta_valid_data = [] 61 | for d in meta_valid_data: 62 | if d[0] in meta_train_ent and d[1] in meta_train_rels and d[2] in meta_train_ent: 63 | filtered_meta_valid_data.append(d) 64 | else: 65 | meta_train_data.append(d) 66 | meta_train_ent = meta_train_ent.union(set([d[0], d[2]])) 67 | meta_train_rels = meta_train_rels.union(set([d[1]])) 68 | 69 | filtered_meta_test_data = [] 70 | for d in meta_test_data: 71 | if d[0] in meta_train_ent and d[1] in meta_train_rels and d[2] in meta_train_ent: 72 | filtered_meta_test_data.append(d) 73 | else: 74 | meta_train_data.append(d) 75 | meta_train_ent = meta_train_ent.union(set([d[0], d[2]])) 76 | meta_train_rels = meta_train_rels.union(set([d[1]])) 77 | 78 | meta_data_dir = os.path.join(params.main_dir, 'data/{}_meta'.format(params.dataset)) 79 | write_to_file(os.path.join(meta_data_dir, 'train.txt'), meta_train_data) 80 | write_to_file(os.path.join(meta_data_dir, 'valid.txt'), filtered_meta_valid_data) 81 | write_to_file(os.path.join(meta_data_dir, 'test.txt'), filtered_meta_test_data) 82 | 83 | 84 | if __name__ == '__main__': 85 | parser = argparse.ArgumentParser(description='Move new entities from test/valid to train') 86 | 87 | parser.add_argument("--dataset", "-d", type=str, default="fb237_v1_copy", 88 | help="Dataset string") 89 | params = parser.parse_args() 90 | 91 | params.main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.abspath(__file__))), '..') 92 | 93 | main(params) 94 | -------------------------------------------------------------------------------- /CoMPILE_github/utils/data_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pdb 3 | import numpy as np 4 | from scipy.sparse import csc_matrix 5 | import matplotlib.pyplot as plt 6 | 7 | 8 | def plot_rel_dist(adj_list, filename): 9 | rel_count = [] 10 | for adj in adj_list: 11 | rel_count.append(adj.count_nonzero()) 12 | 13 | fig = plt.figure(figsize=(12, 8)) 14 | plt.plot(rel_count) 15 | fig.savefig(filename, dpi=fig.dpi) 16 | 17 | 18 | def process_files(files, saved_relation2id=None): 19 | ''' 20 | files: Dictionary map of file paths to read the triplets from. 21 | saved_relation2id: Saved relation2id (mostly passed from a trained model) which can be used to map relations to pre-defined indices and filter out the unknown ones. 
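 |     A minimal usage sketch (the 'Toy' dataset and paths below are hypothetical):
 |         files = {'train': 'data/Toy/train.txt', 'valid': 'data/Toy/valid.txt'}
 |         adj_list, triplets, entity2id, relation2id, id2entity, id2relation = process_files(files)
 |         # triplets['train'] is then an (N, 3) int array of [head_id, tail_id, rel_id] rows,
 |         # and adj_list[r] is a csc_matrix holding every train edge of relation r.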
22 | '''
23 | entity2id = {}
24 | relation2id = {} if saved_relation2id is None else saved_relation2id
25 | 
26 | triplets = {}
27 | 
28 | ent = 0
29 | rel = 0
30 | 
31 | for file_type, file_path in files.items():
32 | 
33 | data = []
34 | with open(file_path) as f:
35 | file_data = [line.split() for line in f.read().split('\n')[:-1]]
36 | 
37 | for triplet in file_data:
38 | if triplet[0] not in entity2id:
39 | entity2id[triplet[0]] = ent
40 | ent += 1
41 | if triplet[2] not in entity2id:
42 | entity2id[triplet[2]] = ent
43 | ent += 1
44 | if not saved_relation2id and triplet[1] not in relation2id:
45 | relation2id[triplet[1]] = rel
46 | rel += 1
47 | 
48 | # Save the triplets corresponding to only the known relations
49 | if triplet[1] in relation2id:
50 | data.append([entity2id[triplet[0]], entity2id[triplet[2]], relation2id[triplet[1]]])
51 | 
52 | triplets[file_type] = np.array(data)
53 | 
54 | id2entity = {v: k for k, v in entity2id.items()}
55 | id2relation = {v: k for k, v in relation2id.items()}
56 | 
57 | # Construct the list of adjacency matrices, one per relation. Note that these are constructed only from the train data.
58 | adj_list = []
59 | for i in range(len(relation2id)):
60 | idx = np.argwhere(triplets['train'][:, 2] == i)
61 | adj_list.append(csc_matrix((np.ones(len(idx), dtype=np.uint8), (triplets['train'][:, 0][idx].squeeze(1), triplets['train'][:, 1][idx].squeeze(1))), shape=(len(entity2id), len(entity2id))))
62 | 
63 | return adj_list, triplets, entity2id, relation2id, id2entity, id2relation
64 | 
65 | 
66 | def save_to_file(directory, file_name, triplets, id2entity, id2relation):
67 | file_path = os.path.join(directory, file_name)
68 | with open(file_path, "w") as f:
69 | for s, o, r in triplets:
70 | f.write('\t'.join([id2entity[s], id2relation[r], id2entity[o]]) + '\n')
71 | 
-------------------------------------------------------------------------------- /CoMPILE_github/utils/dgl_utils.py: --------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.sparse as ssp
3 | import random
4 | 
5 | """All functions in this file are from dgl.contrib.data.knowledge_graph"""
6 | 
7 | 
8 | def _bfs_relational(adj, roots, max_nodes_per_hop=None):
9 | """
10 | BFS for graphs.
11 | Modified from dgl.contrib.data.knowledge_graph to accommodate node sampling
12 | """
13 | visited = set()
14 | current_lvl = set(roots)
15 | 
16 | next_lvl = set()
17 | 
18 | while current_lvl:
19 | 
20 | for v in current_lvl:
21 | visited.add(v)
22 | 
23 | next_lvl = _get_neighbors(adj, current_lvl)
24 | next_lvl -= visited # set difference
25 | 
26 | if max_nodes_per_hop and max_nodes_per_hop < len(next_lvl):
27 | next_lvl = set(random.sample(next_lvl, max_nodes_per_hop))
28 | 
29 | yield next_lvl
30 | 
31 | current_lvl = set.union(next_lvl)
32 | 
33 | 
34 | def _get_neighbors(adj, nodes):
35 | """Takes a set of nodes and a graph adjacency matrix and returns a set of neighbors.
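 |     Sketch of the mechanism (toy numbers): the node set becomes a 1 x N sparse
 |     indicator row vector, and sp_nodes.dot(adj) marks every column reachable by
 |     one directed edge, so nodes={0} with a single edge 0->3 would return {3}.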
36 | Directly copied from dgl.contrib.data.knowledge_graph""" 37 | sp_nodes = _sp_row_vec_from_idx_list(list(nodes), adj.shape[1]) 38 | sp_neighbors = sp_nodes.dot(adj) 39 | neighbors = set(ssp.find(sp_neighbors)[1]) # convert to set of indices 40 | return neighbors 41 | 42 | 43 | def _sp_row_vec_from_idx_list(idx_list, dim): 44 | """Create sparse vector of dimensionality dim from a list of indices.""" 45 | shape = (1, dim) 46 | data = np.ones(len(idx_list)) 47 | row_ind = np.zeros(len(idx_list)) 48 | col_ind = list(idx_list) 49 | return ssp.csr_matrix((data, (row_ind, col_ind)), shape=shape) 50 | -------------------------------------------------------------------------------- /CoMPILE_github/utils/graph_utils.py: -------------------------------------------------------------------------------- 1 | import statistics 2 | import numpy as np 3 | import scipy.sparse as ssp 4 | import torch 5 | import networkx as nx 6 | import dgl 7 | import pickle 8 | 9 | 10 | def serialize(data): 11 | data_tuple = tuple(data.values()) 12 | return pickle.dumps(data_tuple) 13 | 14 | 15 | def deserialize(data): 16 | data_tuple = pickle.loads(data) 17 | keys = ('nodes', 'r_label', 'g_label', 'n_label') 18 | return dict(zip(keys, data_tuple)) 19 | 20 | 21 | def get_edge_count(adj_list): 22 | count = [] 23 | for adj in adj_list: 24 | count.append(len(adj.tocoo().row.tolist())) 25 | return np.array(count) 26 | 27 | 28 | def incidence_matrix(adj_list): 29 | ''' 30 | adj_list: List of sparse adjacency matrices 31 | ''' 32 | 33 | rows, cols, dats = [], [], [] 34 | dim = adj_list[0].shape 35 | for adj in adj_list: 36 | adjcoo = adj.tocoo() 37 | rows += adjcoo.row.tolist() 38 | cols += adjcoo.col.tolist() 39 | dats += adjcoo.data.tolist() 40 | row = np.array(rows) 41 | col = np.array(cols) 42 | data = np.array(dats) 43 | return ssp.csc_matrix((data, (row, col)), shape=dim) 44 | 45 | 46 | def remove_nodes(A_incidence, nodes): 47 | idxs_wo_nodes = list(set(range(A_incidence.shape[1])) - set(nodes)) 48 | return A_incidence[idxs_wo_nodes, :][:, idxs_wo_nodes] 49 | 50 | 51 | def ssp_to_torch(A, device, dense=False): 52 | ''' 53 | A : Sparse adjacency matrix 54 | ''' 55 | idx = torch.LongTensor([A.tocoo().row, A.tocoo().col]) 56 | dat = torch.FloatTensor(A.tocoo().data) 57 | A = torch.sparse.FloatTensor(idx, dat, torch.Size([A.shape[0], A.shape[1]])).to(device=device) 58 | return A 59 | 60 | 61 | def ssp_multigraph_to_dgl(graph, n_feats=None): 62 | """ 63 | Converting ssp multigraph (i.e. list of adjs) to dgl multigraph. 
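 |     Each relation r contributes the nonzeros of graph[r] as directed edges tagged
 |     with an integer edge attribute 'type' = r (built as nx_triplets below); e.g.
 |     two adjacency matrices over the same node set become one multigraph whose
 |     edges carry types 0 and 1 (a toy illustration).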
64 | """ 65 | 66 | g_nx = nx.MultiDiGraph() 67 | g_nx.add_nodes_from(list(range(graph[0].shape[0]))) 68 | # Add edges 69 | for rel, adj in enumerate(graph): 70 | # Convert adjacency matrix to tuples for nx0 71 | nx_triplets = [] 72 | for src, dst in list(zip(adj.tocoo().row, adj.tocoo().col)): 73 | nx_triplets.append((src, dst, {'type': rel})) 74 | g_nx.add_edges_from(nx_triplets) 75 | 76 | # make dgl graph 77 | g_dgl = dgl.DGLGraph(multigraph=True) 78 | g_dgl.from_networkx(g_nx, edge_attrs=['type']) 79 | # add node features 80 | if n_feats is not None: 81 | g_dgl.ndata['feat'] = torch.tensor(n_feats) 82 | 83 | return g_dgl 84 | 85 | 86 | def collate_dgl(samples): 87 | # The input `samples` is a list of pairs 88 | graphs_pos, g_labels_pos, r_labels_pos, graphs_negs, g_labels_negs, r_labels_negs = map(list, zip(*samples)) 89 | batched_graph_pos = dgl.batch(graphs_pos) 90 | print('batched_graph_pos ', len(batched_graph_pos)) 91 | 92 | graphs_neg = [item for sublist in graphs_negs for item in sublist] 93 | g_labels_neg = [item for sublist in g_labels_negs for item in sublist] 94 | r_labels_neg = [item for sublist in r_labels_negs for item in sublist] 95 | 96 | batched_graph_neg = dgl.batch(graphs_neg) 97 | return (batched_graph_pos, r_labels_pos), g_labels_pos, (batched_graph_neg, r_labels_neg), g_labels_neg 98 | 99 | def collate_dgl2(samples): 100 | # The input `samples` is a list of pairs 101 | graphs_pos, g_labels_pos, r_labels_pos, graphs_negs, g_labels_negs, r_labels_negs = map(list, zip(*samples)) 102 | 103 | # graphs_pos = [item for sublist in graphs_pos for item in sublist] 104 | # g_labels_pos = [item for sublist in g_labels_pos for item in sublist] 105 | # r_labels_pos = [item for sublist in r_labels_pos for item in sublist] 106 | 107 | # batched_graph_pos = dgl.batch(graphs_pos) 108 | 109 | graphs_neg = [item for sublist in graphs_negs for item in sublist] 110 | g_labels_neg = [item for sublist in g_labels_negs for item in sublist] 111 | r_labels_neg = [item for sublist in r_labels_negs for item in sublist] 112 | #print('neg for each pos ', len(graphs_neg)) 113 | # batched_graph_neg = dgl.batch(graphs_neg) 114 | return (graphs_pos, r_labels_pos), g_labels_pos, (graphs_neg, r_labels_neg), g_labels_neg 115 | 116 | 117 | def move_batch_to_device_dgl(batch, device): 118 | ((g_dgl_pos, r_labels_pos), targets_pos, (g_dgl_neg, r_labels_neg), targets_neg) = batch 119 | 120 | targets_pos = torch.LongTensor(targets_pos).to(device=device) 121 | r_labels_pos = torch.LongTensor(r_labels_pos).to(device=device) 122 | # print() 123 | 124 | targets_neg = torch.LongTensor(targets_neg).to(device=device) 125 | r_labels_neg = torch.LongTensor(r_labels_neg).to(device=device) 126 | 127 | g_dgl_pos = send_graph_to_device(g_dgl_pos, device) 128 | g_dgl_neg = send_graph_to_device(g_dgl_neg, device) 129 | 130 | return ((g_dgl_pos, r_labels_pos), targets_pos, (g_dgl_neg, r_labels_neg), targets_neg) 131 | 132 | 133 | def send_graph_to_device(g, device): 134 | # nodes 135 | labels = g.node_attr_schemes() 136 | # print('node labels shape ', labels[1].shape) 137 | i = 0 138 | for l in labels.keys(): 139 | g.ndata[l] = g.ndata.pop(l).to(device) 140 | # if i == 0: 141 | # print('node l shape ', g.ndata[l].shape) 142 | # i += 1 143 | #if l == 144 | # print('node l ', l) 145 | 146 | # edges 147 | labels = g.edge_attr_schemes() 148 | # print('edge labels shape ', labels[1].shape) 149 | for l in labels.keys(): 150 | g.edata[l] = g.edata.pop(l).to(device) 151 | # print('edge l ', l) 152 | return g 153 | 154 | # The 
following three functions are modified from networkx source code to
155 | # accommodate diameter and radius for directed graphs
156 | 
157 | 
158 | def eccentricity(G):
159 | e = {}
160 | for n in G.nbunch_iter():
161 | length = nx.single_source_shortest_path_length(G, n)
162 | e[n] = max(length.values())
163 | return e
164 | 
165 | 
166 | def radius(G):
167 | e = eccentricity(G)
168 | e = np.where(np.array(list(e.values())) > 0, list(e.values()), np.inf)
169 | return min(e)
170 | 
171 | 
172 | def diameter(G):
173 | e = eccentricity(G)
174 | return max(e.values())
175 | 
-------------------------------------------------------------------------------- /CoMPILE_github/utils/initialization_utils.py: --------------------------------------------------------------------------------
1 | import os
2 | import logging
3 | import json
4 | import torch
5 | 
6 | 
7 | def initialize_experiment(params, file_name):
8 | '''
9 | Makes the experiment directory, sets standard paths and initializes the logger
10 | '''
11 | params.main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.abspath(__file__))), '..')
12 | exps_dir = os.path.join(params.main_dir, 'experiments')
13 | if not os.path.exists(exps_dir):
14 | os.makedirs(exps_dir)
15 | 
16 | params.exp_dir = os.path.join(exps_dir, params.experiment_name)
17 | 
18 | if not os.path.exists(params.exp_dir):
19 | os.makedirs(params.exp_dir)
20 | 
21 | if file_name == 'test_auc.py':
22 | params.test_exp_dir = os.path.join(params.exp_dir, f"test_{params.dataset}_{params.constrained_neg_prob}")
23 | if not os.path.exists(params.test_exp_dir):
24 | os.makedirs(params.test_exp_dir)
25 | file_handler = logging.FileHandler(os.path.join(params.test_exp_dir, f"log_test.txt"))
26 | else:
27 | file_handler = logging.FileHandler(os.path.join(params.exp_dir, "log_train.txt"))
28 | logger = logging.getLogger()
29 | logger.addHandler(file_handler)
30 | 
31 | logger.info('============ Initialized logger ============')
32 | logger.info('\n'.join('%s: %s' % (k, str(v)) for k, v
33 | in sorted(dict(vars(params)).items())))
34 | logger.info('============================================')
35 | 
36 | with open(os.path.join(params.exp_dir, "params.json"), 'w') as fout:
37 | json.dump(vars(params), fout)
38 | 
39 | 
40 | def initialize_model(params, model, load_model=False):
41 | '''
42 | relation2id: the relation-to-id mapping; this is stored in the model and used when testing
43 | model: the type of model to initialize/load
44 | load_model: flag which decides whether to initialize a new model or load a saved one
45 | '''
46 | 
47 | if load_model and os.path.exists(os.path.join(params.exp_dir, 'best_graph_classifier.pth')):
48 | logging.info('Loading existing model from %s' % os.path.join(params.exp_dir, 'best_graph_classifier.pth'))
49 | graph_classifier = torch.load(os.path.join(params.exp_dir, 'best_graph_classifier.pth')).to(device=params.device)
50 | else:
51 | relation2id_path = os.path.join(params.main_dir, f'data/{params.dataset}/relation2id.json')
52 | with open(relation2id_path) as f:
53 | relation2id = json.load(f)
54 | 
55 | logging.info('No existing model found.
Initializing new model..')
56 | graph_classifier = model(params, relation2id).to(device=params.device)
57 | 
58 | return graph_classifier
59 | 
-------------------------------------------------------------------------------- /CoMPILE_github/utils/prepare_meta_data.py: --------------------------------------------------------------------------------
1 | import pdb
2 | import os
3 | import math
4 | import random
5 | import argparse
6 | import numpy as np
7 | 
8 | from graph_utils import incidence_matrix, get_edge_count
9 | from dgl_utils import _bfs_relational
10 | from data_utils import process_files, save_to_file
11 | 
12 | 
13 | def get_active_relations(adj_list):
14 | act_rels = []
15 | for r, adj in enumerate(adj_list):
16 | if len(adj.tocoo().row.tolist()) > 0:
17 | act_rels.append(r)
18 | return act_rels
19 | 
20 | 
21 | def get_avg_degree(adj_list):
22 | adj_mat = incidence_matrix(adj_list)
23 | degree = []
24 | for node in range(adj_list[0].shape[0]):
25 | degree.append(np.sum(adj_mat[node, :]))
26 | return np.mean(degree)
27 | 
28 | 
29 | def get_splits(adj_list, nodes, valid_rels=None, valid_ratio=0.1, test_ratio=0.1):
30 | '''
31 | Get train/valid/test splits of the sub-graph defined by the given set of nodes. The relations in this subgraph are limited to the given valid_rels.
32 | '''
33 | 
34 | # Extract the subgraph
35 | subgraph = [adj[nodes, :][:, nodes] for adj in adj_list]
36 | 
37 | # Get the relations that are allowed to be sampled
38 | active_rels = get_active_relations(subgraph)
39 | common_rels = list(set(active_rels).intersection(set(valid_rels)))
40 | 
41 | print('Average degree : ', get_avg_degree(subgraph))
42 | print('Nodes: ', len(nodes))
43 | print('Links: ', np.sum(get_edge_count(subgraph)))
44 | print('Active relations: ', len(common_rels))
45 | 
46 | # get all the triplets satisfying the given constraints
47 | all_triplets = []
48 | for r in common_rels:
49 | # print(r, len(subgraph[r].tocoo().row))
50 | for (i, j) in zip(subgraph[r].tocoo().row, subgraph[r].tocoo().col):
51 | all_triplets.append([nodes[i], nodes[j], r])
52 | all_triplets = np.array(all_triplets)
53 | 
54 | # delete the triplets which correspond to self connections
55 | ind = np.argwhere(all_triplets[:, 0] == all_triplets[:, 1])
56 | all_triplets = np.delete(all_triplets, ind, axis=0)
57 | print('Links after deleting self connections : %d' % len(all_triplets))
58 | 
59 | # get the splits according to the given ratio
60 | np.random.shuffle(all_triplets)
61 | train_split = int(math.ceil(len(all_triplets) * (1 - valid_ratio - test_ratio)))
62 | valid_split = int(math.ceil(len(all_triplets) * (1 - test_ratio)))
63 | 
64 | train_triplets = all_triplets[:train_split]
65 | valid_triplets = all_triplets[train_split: valid_split]
66 | test_triplets = all_triplets[valid_split:]
67 | 
68 | return train_triplets, valid_triplets, test_triplets, common_rels
69 | 
70 | 
71 | def get_subgraph(adj_list, hops, max_nodes_per_hop):
72 | '''
73 | Samples a subgraph around randomly chosen root nodes, going up to `hops` hops out, with the number of nodes selected per hop capped by max_nodes_per_hop
74 | '''
75 | 
76 | # collapse the list of adjacency matrices into a single matrix
77 | A_incidence = incidence_matrix(adj_list)
78 | 
79 | # choose a set of random root nodes
80 | idx = np.random.choice(range(len(A_incidence.tocoo().row)), size=params.n_roots, replace=False)
81 | roots = set([A_incidence.tocoo().row[id] for id in idx] + [A_incidence.tocoo().col[id] for id in idx])
82 | 
83 | # get the neighbor nodes within a limit of hops
84 | bfs_generator
= _bfs_relational(A_incidence, roots, max_nodes_per_hop)
85 | lvls = list()
86 | for _ in range(hops):
87 | lvls.append(next(bfs_generator))
88 | 
89 | nodes = list(roots) + list(set().union(*lvls))
90 | 
91 | return nodes
92 | 
93 | 
94 | def mask_nodes(adj_list, nodes):
95 | '''
96 | mask a set of nodes from a given graph
97 | '''
98 | 
99 | masked_adj_list = [adj.copy() for adj in adj_list]
100 | for node in nodes:
101 | for adj in masked_adj_list:
102 | adj.data[adj.indptr[node]:adj.indptr[node + 1]] = 0
103 | adj = adj.tocsr()
104 | adj.data[adj.indptr[node]:adj.indptr[node + 1]] = 0
105 | adj = adj.tocsc()
106 | for adj in masked_adj_list:
107 | adj.eliminate_zeros()
108 | return masked_adj_list
109 | 
110 | 
111 | def main(params):
112 | 
113 | adj_list, triplets, entity2id, relation2id, id2entity, id2relation = process_files(files)
114 | 
115 | meta_train_nodes = get_subgraph(adj_list, params.hops, params.max_nodes_per_hop) # list(range(750, 8500)) #
116 | 
117 | masked_adj_list = mask_nodes(adj_list, meta_train_nodes)
118 | 
119 | meta_test_nodes = get_subgraph(masked_adj_list, params.hops_test + 1, params.max_nodes_per_hop_test) # list(range(0, 750)) #
120 | 
121 | print('Common nodes among the two disjoint datasets (should ideally be zero): ', set(meta_train_nodes).intersection(set(meta_test_nodes)))
122 | tmp = [adj[meta_train_nodes, :][:, meta_train_nodes] for adj in masked_adj_list]
123 | print('Residual edges (should be zero) : ', np.sum(get_edge_count(tmp)))
124 | 
125 | print("================")
126 | print("Train graph stats")
127 | print("================")
128 | train_triplets, valid_triplets, test_triplets, train_active_rels = get_splits(adj_list, meta_train_nodes, range(len(adj_list)))
129 | print("================")
130 | print("Meta-test graph stats")
131 | print("================")
132 | meta_train_triplets, meta_valid_triplets, meta_test_triplets, meta_active_rels = get_splits(adj_list, meta_test_nodes, train_active_rels)
133 | 
134 | print("================")
135 | print('Extra rels (should be empty): ', set(meta_active_rels) - set(train_active_rels))
136 | 
137 | # TODO: ABSTRACT THIS INTO A METHOD
138 | data_dir = os.path.join(params.main_dir, 'data/{}'.format(params.new_dataset))
139 | if not os.path.exists(data_dir):
140 | os.makedirs(data_dir)
141 | 
142 | save_to_file(data_dir, 'train.txt', train_triplets, id2entity, id2relation)
143 | save_to_file(data_dir, 'valid.txt', valid_triplets, id2entity, id2relation)
144 | save_to_file(data_dir, 'test.txt', test_triplets, id2entity, id2relation)
145 | 
146 | meta_data_dir = os.path.join(params.main_dir, 'data/{}'.format(params.new_dataset + '_meta'))
147 | if not os.path.exists(meta_data_dir):
148 | os.makedirs(meta_data_dir)
149 | 
150 | save_to_file(meta_data_dir, 'train.txt', meta_train_triplets, id2entity, id2relation)
151 | save_to_file(meta_data_dir, 'valid.txt', meta_valid_triplets, id2entity, id2relation)
152 | save_to_file(meta_data_dir, 'test.txt', meta_test_triplets, id2entity, id2relation)
153 | 
154 | 
155 | if __name__ == '__main__':
156 | 
157 | parser = argparse.ArgumentParser(description='Save adjacency matrices and triplets')
158 | 
159 | parser.add_argument("--dataset", "-d", type=str, default="FB15K237",
160 | help="Dataset string")
161 | parser.add_argument("--new_dataset", "-nd", type=str, default="fb_v3",
162 | help="Dataset string")
163 | parser.add_argument("--n_roots", "-n", type=int, default="1",
164 | help="Number of roots to sample the neighborhood from")
165 | parser.add_argument("--hops", "-H", type=int,
default="3", 166 | help="Number of hops to sample the neighborhood") 167 | parser.add_argument("--max_nodes_per_hop", "-m", type=int, default="2500", 168 | help="Number of nodes in the neighborhood") 169 | parser.add_argument("--hops_test", "-HT", type=int, default="3", 170 | help="Number of hops to sample the neighborhood") 171 | parser.add_argument("--max_nodes_per_hop_test", "-mt", type=int, default="2500", 172 | help="Number of nodes in the neighborhood") 173 | parser.add_argument("--seed", "-s", type=int, default="28", 174 | help="Numpy random seed") 175 | 176 | params = parser.parse_args() 177 | 178 | np.random.seed(params.seed) 179 | random.seed(params.seed) 180 | 181 | params.main_dir = os.path.join(os.path.relpath(os.path.dirname(os.path.abspath(__file__))), '..') 182 | 183 | files = { 184 | 'train': os.path.join(params.main_dir, 'data/{}/train.txt'.format(params.dataset)), 185 | 'valid': os.path.join(params.main_dir, 'data/{}/valid.txt'.format(params.dataset)), 186 | 'test': os.path.join(params.main_dir, 'data/{}/test.txt'.format(params.dataset)) 187 | } 188 | 189 | main(params) 190 | -------------------------------------------------------------------------------- /CoMPILE_v2/README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | To generate new dataset and extract subgraph: 4 | 5 | python3 ./train/inductive_subgraph8.py --prefix FB15k-237-v1 --data ./data/FB15k-237-inductive-v1/ --dataset FB15k-237-v1 --hop 3 6 | 7 | 8 | To train the model: 9 | 10 | python3 ./train/main_compile.py --prefix FB15k-237-v1 --data ./data/FB15k-237-inductive-v1 --dataset FB15k-237-v1 --hop 3 --train True --pretrained_emb False --epochs_conv 20 --direct True --output_folder ./train/checkpoints/FB15k-237-v1/out/ --batch_size_conv 128 --test True 11 | 12 | 13 | To test the model: 14 | 15 | python3 ./train/main_compile.py --prefix FB15k-237-v1 --data ./data/FB15k-237-inductive-v1 --dataset FB15k-237-v1 --hop 3 --train False --pretrained_emb False --epochs_conv 20 --direct True --output_folder ./train/checkpoints/FB15k-237-v1/out/ --batch_size_conv 128 --test True 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-inductive-v1/relation2id.json: -------------------------------------------------------------------------------- 1 | {"/award/award_winning_work/awards_won./award/award_honor/award_winner": 0, "/film/film/genre": 1, "/people/person/profession": 2, "/sports/sports_team_location/teams": 3, "/education/educational_degree/people_with_this_degree./education/education/institution": 4, "/organization/organization_founder/organizations_founded": 5, "/award/award_category/nominees./award/award_nomination/nominated_for": 6, "/film/film/release_date_s./film/film_regional_release_date/film_release_region": 7, "/people/person/gender": 8, "/film/actor/film./film/performance/film": 9, "/olympics/olympic_sport/athletes./olympics/olympic_athlete_affiliation/country": 10, "/time/event/instance_of_recurring_event": 11, "/award/award_category/winners./award/award_honor/award_winner": 12, "/soccer/football_player/current_team./sports/sports_team_roster/team": 13, "/media_common/netflix_genre/titles": 14, "/people/person/spouse_s./people/marriage/type_of_union": 15, "/award/award_nominee/award_nominations./award/award_nomination/award_nominee": 16, "/award/award_category/winners./award/award_honor/ceremony": 17, 
"/award/award_ceremony/awards_presented./award/award_honor/award_winner": 18, "/film/film/language": 19, "/base/culturalevent/event/entity_involved": 20, "/music/performance_role/track_performances./music/track_contribution/role": 21, "/food/food/nutrients./food/nutrition_fact/nutrient": 22, "/sports/sports_position/players./sports/sports_team_roster/team": 23, "/tv/tv_program/genre": 24, "/people/person/spouse_s./people/marriage/location_of_ceremony": 25, "/award/award_winner/awards_won./award/award_honor/award_winner": 26, "/base/saturdaynightlive/snl_cast_member/seasons./base/saturdaynightlive/snl_season_tenure/cast_members": 27, "/people/person/nationality": 28, "/people/person/languages": 29, "/organization/organization/headquarters./location/mailing_address/state_province_region": 30, "/music/genre/artists": 31, "/tv/tv_program/languages": 32, "/base/localfood/seasonal_month/produce_available./base/localfood/produce_availability/seasonal_months": 33, "/education/educational_institution/students_graduates./education/education/student": 34, "/baseball/baseball_team/team_stats./baseball/baseball_team_stats/season": 35, "/sports/sports_team/roster./baseball/baseball_roster_position/position": 36, "/award/award_nominee/award_nominations./award/award_nomination/award": 37, "/base/popstra/celebrity/friendship./base/popstra/friendship/participant": 38, "/film/film/other_crew./film/film_crew_gig/film_crew_role": 39, "/base/biblioness/bibs_location/country": 40, "/location/location/contains": 41, "/location/statistical_region/religions./location/religion_percentage/religion": 42, "/tv/tv_program/regular_cast./tv/regular_tv_appearance/actor": 43, "/olympics/olympic_participating_country/athletes./olympics/olympic_athlete_affiliation/olympics": 44, "/people/person/religion": 45, "/award/award_nominee/award_nominations./award/award_nomination/nominated_for": 46, "/award/award_winning_work/awards_won./award/award_honor/award": 47, "/location/country/form_of_government": 48, "/film/film/produced_by": 49, "/music/group_member/membership./music/group_membership/role": 50, "/film/film/prequel": 51, "/base/popstra/celebrity/canoodled./base/popstra/canoodled/participant": 52, "/language/human_language/countries_spoken_in": 53, "/people/person/places_lived./people/place_lived/location": 54, "/organization/organization/headquarters./location/mailing_address/citytown": 55, "/film/film/executive_produced_by": 56, "/government/government_office_category/officeholders./government/government_position_held/jurisdiction_of_office": 57, "/film/actor/dubbing_performances./film/dubbing_performance/language": 58, "/sports/professional_sports_team/draft_picks./sports/sports_league_draft_pick/draft": 59, "/film/film_distributor/films_distributed./film/film_film_distributor_relationship/film": 60, "/olympics/olympic_participating_country/medals_won./olympics/olympic_medal_honor/olympics": 61, "/education/educational_degree/people_with_this_degree./education/education/student": 62, "/government/politician/government_positions_held./government/government_position_held/basic_title": 63, "/location/location/time_zones": 64, "/base/schemastaging/organization_extra/phone_number./base/schemastaging/phone_sandbox/service_language": 65, "/people/marriage_union_type/unions_of_this_type./people/marriage/location_of_ceremony": 66, "/people/person/spouse_s./people/marriage/spouse": 67, "/film/director/film": 68, "/tv/tv_program/program_creator": 69, "/sports/sports_league_draft/picks./sports/sports_league_draft_pick/school": 70, 
"/award/award_ceremony/awards_presented./award/award_honor/honored_for": 71, "/film/film/film_format": 72, "/music/instrument/instrumentalists": 73, "/award/award_nominated_work/award_nominations./award/award_nomination/nominated_for": 74, "/base/x2010fifaworldcupsouthafrica/world_cup_squad/current_world_cup_squad./base/x2010fifaworldcupsouthafrica/current_world_cup_squad/current_club": 75, "/education/educational_degree/people_with_this_degree./education/education/major_field_of_study": 76, "/government/legislative_session/members./government/government_position_held/district_represented": 77, "/film/film/written_by": 78, "/sports/sports_team/colors": 79, "/sports/sports_team/roster./american_football/football_historical_roster_position/position_s": 80, "/music/artist/track_contributions./music/track_contribution/role": 81, "/people/ethnicity/people": 82, "/base/marchmadness/ncaa_basketball_tournament/seeds./base/marchmadness/ncaa_tournament_seed/team": 83, "/location/location/adjoin_s./location/adjoining_relationship/adjoins": 84, "/film/film/runtime./film/film_cut/film_release_region": 85, "/award/ranked_item/appears_in_ranked_lists./award/ranking/list": 86, "/music/performance_role/regular_performances./music/group_membership/role": 87, "/education/field_of_study/students_majoring./education/education/student": 88, "/location/statistical_region/places_exported_to./location/imports_and_exports/exported_to": 89, "/base/popstra/location/vacationers./base/popstra/vacation_choice/vacationer": 90, "/tv/tv_producer/programs_produced./tv/tv_producer_term/program": 91, "/music/record_label/artist": 92, "/location/location/partially_contains": 93, "/base/popstra/celebrity/dated./base/popstra/dated/participant": 94, "/music/performance_role/guest_performances./music/recording_contribution/performance_role": 95, "/film/film/film_festivals": 96, "/travel/travel_destination/climate./travel/travel_destination_monthly_climate/month": 97, "/sports/professional_sports_team/draft_picks./sports/sports_league_draft_pick/school": 98, "/location/capital_of_administrative_division/capital_of./location/administrative_division_capital_relationship/administrative_division": 99, "/music/genre/parent_genre": 100, "/music/performance_role/regular_performances./music/group_membership/group": 101, "/influence/influence_node/influenced_by": 102, "/film/film/country": 103, "/military/military_combatant/military_conflicts./military/military_combatant_group/combatants": 104, "/award/award_winning_work/awards_won./award/award_honor/honored_for": 105, "/film/film/production_companies": 106, "/military/military_conflict/combatants./military/military_combatant_group/combatants": 107, "/sports/pro_athlete/teams./sports/sports_team_roster/team": 108, "/location/hud_county_place/county": 109, "/base/biblioness/bibs_location/state": 110, "/location/administrative_division/country": 111, "/education/educational_institution/students_graduates./education/education/major_field_of_study": 112, "/award/hall_of_fame/inductees./award/hall_of_fame_induction/inductee": 113, "/film/film/story_by": 114, "/business/business_operation/industry": 115, "/dataworld/gardening_hint/split_to": 116, "/broadcast/content/artist": 117, "/people/person/sibling_s./people/sibling_relationship/sibling": 118, "/organization/organization_member/member_of./organization/organization_membership/organization": 119, "/people/cause_of_death/people": 120, "/people/deceased_person/place_of_burial": 121, 
"/business/job_title/people_with_this_title./business/employment_tenure/company": 122, "/film/film/music": 123, "/film/film/other_crew./film/film_crew_gig/crewmember": 124, "/music/instrument/family": 125, "/film/film_subject/films": 126, "/government/legislative_session/members./government/government_position_held/legislative_sessions": 127, "/people/person/place_of_birth": 128, "/film/film/release_date_s./film/film_regional_release_date/film_regional_debut_venue": 129, "/education/educational_institution/colors": 130, "/sports/sports_team/sport": 131, "/travel/travel_destination/how_to_get_here./travel/transportation/mode_of_transportation": 132, "/influence/influence_node/peers./influence/peer_relationship/peers": 133, "/award/award_category/disciplines_or_subjects": 134, "/people/ethnicity/languages_spoken": 135, "/medicine/disease/risk_factors": 136, "/education/field_of_study/students_majoring./education/education/major_field_of_study": 137, "/location/country/official_language": 138, "/education/educational_institution/school_type": 139, "/base/aareas/schema/administrative_area/administrative_parent": 140, "/government/political_party/politicians_in_this_party./government/political_party_tenure/politician": 141, "/award/award_category/category_of": 142, "/celebrities/celebrity/celebrity_friends./celebrities/friendship/friend": 143, "/film/film/distributors./film/film_film_distributor_relationship/film_distribution_medium": 144, "/base/aareas/schema/administrative_area/capital": 145, "/location/us_county/county_seat": 146, "/people/ethnicity/geographic_distribution": 147, "/film/film/featured_film_locations": 148, "/film/film/dubbing_performances./film/dubbing_performance/actor": 149, "/location/administrative_division/first_level_division_of": 150, "/user/ktrueman/default_domain/international_organization/member_states": 151, "/base/popstra/celebrity/breakup./base/popstra/breakup/participant": 152, "/film/film/personal_appearances./film/personal_film_appearance/person": 153, "/time/event/locations": 154, "/user/alexander/philosophy/philosopher/interests": 155, "/user/jg/default_domain/olympic_games/sports": 156, "/olympics/olympic_sport/athletes./olympics/olympic_athlete_affiliation/olympics": 157, "/olympics/olympic_games/sports": 158, "/celebrities/celebrity/sexual_relationships./celebrities/romantic_relationship/celebrity": 159, "/tv/tv_personality/tv_regular_appearances./tv/tv_regular_personal_appearance/program": 160, "/people/person/employment_history./business/employment_tenure/company": 161, "/tv/tv_writer/tv_programs./tv/tv_program_writer_relationship/tv_program": 162, "/music/artist/origin": 163, "/film/film_set_designer/film_sets_designed": 164, "/music/artist/contribution./music/recording_contribution/performance_role": 165, "/organization/organization/child./organization/organization_relationship/child": 166, "/film/film/costume_design_by": 167, "/film/film/edited_by": 168, "/location/country/capital": 169, "/base/schemastaging/organization_extra/phone_number./base/schemastaging/phone_sandbox/service_location": 170, "/organization/organization/place_founded": 171, "/people/profession/specialization_of": 172, "/music/group_member/membership./music/group_membership/group": 173, "/film/film/film_production_design_by": 174, "/sports/sport/pro_athletes./sports/pro_sports_played/athlete": 175, "/film/film/cinematography": 176, "/tv/tv_network/programs./tv/tv_network_duration/program": 177, 
"/government/politician/government_positions_held./government/government_position_held/legislative_sessions": 178, "/soccer/football_team/current_roster./soccer/football_roster_position/position": 179} -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_hop_new_data.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_hop_new_data.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_head2.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_head2.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_tail2.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_hop_total_test_tail2.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_train_data.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_train_data.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_train_label.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_train_label.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_val_data.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_val_data.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/FB15k-237-v1_3_total_val_label.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/data/FB15k-237-v1_3_total_val_label.pickle -------------------------------------------------------------------------------- /CoMPILE_v2/data/nell-inductive-v1/test_inductive.txt: -------------------------------------------------------------------------------- 1 | concept:televisionstation:wqec concept:subpartof concept:company:pbs 2 | concept:televisionstation:kusm concept:agentbelongstoorganization concept:company:pbs 3 | concept:televisionstation:kqsd_tv concept:agentcollaborateswithagent concept:company:pbs 4 | concept:company:pbs concept:agentcontrols concept:televisionstation:kufm_tv 5 | concept:company:pbs concept:acquired 
concept:company:kues 6 | concept:company:pbs concept:agentcontrols concept:televisionstation:kyne_tv 7 | concept:company:pbs concept:agentcontrols concept:televisionstation:kcos_tv 8 | concept:televisionstation:wxxi_tv concept:agentbelongstoorganization concept:company:pbs 9 | concept:televisionstation:wvta concept:subpartof concept:company:pbs 10 | concept:televisionstation:wunp_tv concept:agentcollaborateswithagent concept:company:pbs 11 | concept:televisionstation:koac_tv concept:agentbelongstoorganization concept:company:pbs 12 | concept:televisionstation:wsec concept:subpartof concept:company:pbs 13 | concept:televisionstation:wvta concept:agentbelongstoorganization concept:company:pbs 14 | concept:televisionstation:kepb_tv concept:agentbelongstoorganization concept:company:pbs 15 | concept:televisionstation:wedn concept:subpartof concept:company:pbs 16 | concept:televisionstation:wunf_tv concept:agentbelongstoorganization concept:company:pbs 17 | concept:televisionstation:ktwu concept:agentbelongstoorganization concept:company:pbs 18 | concept:televisionstation:kood concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 19 | concept:televisionstation:wvut concept:subpartof concept:company:pbs 20 | concept:televisionstation:kfts concept:subpartof concept:company:pbs 21 | concept:televisionstation:wunm_tv concept:agentcollaborateswithagent concept:company:pbs 22 | concept:televisionstation:whla_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 23 | concept:televisionstation:whtj concept:agentbelongstoorganization concept:company:pbs 24 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmed_tv 25 | concept:televisionstation:kpsd_tv concept:subpartof concept:company:pbs 26 | concept:televisionstation:wkyu_tv concept:agentcollaborateswithagent concept:company:pbs 27 | concept:televisionstation:wkon concept:agentcollaborateswithagent concept:company:pbs 28 | concept:televisionstation:kuid_tv concept:subpartof concept:company:pbs 29 | concept:televisionstation:ktsd_tv concept:agentcollaborateswithagent concept:company:pbs 30 | concept:televisionstation:wgiq concept:agentcollaborateswithagent concept:company:pbs 31 | concept:televisionstation:wlef_tv concept:subpartof concept:company:pbs 32 | concept:televisionstation:whro_tv concept:subpartof concept:company:pbs 33 | concept:televisionstation:wung_tv concept:subpartoforganization concept:televisionnetwork:pbs 34 | concept:televisionstation:wkoh concept:agentbelongstoorganization concept:company:pbs 35 | concept:televisionstation:wetp_tv concept:agentbelongstoorganization concept:company:pbs 36 | concept:televisionstation:wfiq concept:agentcollaborateswithagent concept:company:pbs 37 | concept:televisionstation:kopb_tv concept:subpartof concept:company:pbs 38 | concept:televisionstation:wfiq concept:agentbelongstoorganization concept:company:pbs 39 | concept:televisionstation:wmpt concept:subpartof concept:company:pbs 40 | concept:televisionstation:kbyu_tv concept:agentcollaborateswithagent concept:company:pbs 41 | concept:televisionstation:wmeb_tv concept:subpartof concept:company:pbs 42 | concept:company:pbs concept:agentcontrols concept:televisionstation:krma_tv 43 | concept:televisionstation:wntv concept:agentbelongstoorganization concept:company:pbs 44 | concept:televisionstation:ktsc_tv concept:subpartoforganization concept:televisionnetwork:pbs 45 | concept:televisionstation:wmae_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 46 | 
concept:televisionstation:wbcc_tv concept:agentbelongstoorganization concept:company:pbs 47 | concept:televisionstation:kcet concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 48 | concept:televisionstation:kwse concept:agentbelongstoorganization concept:company:pbs 49 | concept:televisionstation:ktci_tv concept:agentbelongstoorganization concept:company:pbs 50 | concept:televisionstation:wmae_tv concept:agentcollaborateswithagent concept:company:pbs 51 | concept:televisionstation:wunj_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 52 | concept:televisionstation:kufm_tv concept:agentbelongstoorganization concept:company:pbs 53 | concept:televisionstation:kufm_tv concept:subpartoforganization concept:televisionnetwork:pbs 54 | concept:televisionstation:wpbt concept:subpartof concept:company:pbs 55 | concept:televisionstation:whmc concept:subpartoforganization concept:televisionnetwork:pbs 56 | concept:company:pbs concept:agentcontrols concept:televisionstation:kcts_tv 57 | concept:televisionstation:whtj concept:subpartoforganization concept:televisionnetwork:pbs 58 | concept:company:pbs concept:agentcontrols concept:televisionstation:wlpb_tv 59 | concept:televisionstation:wkno_tv concept:agentbelongstoorganization concept:company:pbs 60 | concept:televisionstation:ktsd_tv concept:agentbelongstoorganization concept:company:pbs 61 | concept:televisionstation:wviz_tv concept:agentbelongstoorganization concept:company:pbs 62 | concept:company:pbs concept:agentcontrols concept:televisionstation:wgvu_tv 63 | concept:company:pbs concept:agentcontrols concept:televisionstation:wbiq_tv 64 | concept:company:pbs concept:agentcontrols concept:televisionstation:woub_tv 65 | concept:televisionstation:wenh concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 66 | concept:televisionstation:kbyu_tv concept:subpartof concept:company:pbs 67 | concept:televisionstation:wunm_tv concept:agentbelongstoorganization concept:company:pbs 68 | concept:televisionstation:klru concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 69 | concept:televisionstation:wgby_tv concept:subpartof concept:company:pbs 70 | concept:company:pbs concept:agentcontrols concept:televisionstation:wdse_tv 71 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmpt 72 | concept:televisionstation:kacv_tv concept:subpartof concept:company:pbs 73 | concept:televisionstation:kaet concept:subpartof concept:company:pbs 74 | concept:televisionstation:ketc concept:agentcollaborateswithagent concept:company:pbs 75 | concept:televisionstation:kbhe_tv concept:subpartof concept:company:pbs 76 | concept:televisionstation:kwse concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 77 | concept:televisionstation:krwg_tv concept:subpartof concept:company:pbs 78 | concept:televisionstation:weta_tv concept:subpartof concept:company:pbs 79 | concept:company:pbs concept:agentcontrols concept:televisionstation:kisu_tv 80 | concept:televisionstation:wfwa concept:subpartof concept:company:pbs 81 | concept:televisionstation:weta_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 82 | concept:televisionstation:kcwc_tv concept:subpartof concept:company:pbs 83 | concept:televisionstation:ktne_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 84 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmsy_tv 85 | concept:company:pbs concept:agentcontrols concept:televisionstation:kopb_tv 86 | 
concept:televisionstation:wund_tv concept:subpartof concept:company:pbs 87 | concept:company:pbs concept:agentcontrols concept:televisionstation:kamu_tv 88 | concept:company:pbs concept:agentcontrols concept:televisionstation:wkzt_tv 89 | concept:company:pbs concept:agentcontrols concept:televisionstation:wpby_tv 90 | concept:televisionstation:kedt concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 91 | concept:televisionstation:ketg concept:subpartof concept:company:pbs 92 | concept:televisionstation:whro_tv concept:agentcollaborateswithagent concept:company:pbs 93 | concept:televisionstation:wneo concept:agentcollaborateswithagent concept:company:pbs 94 | concept:televisionstation:wpbt concept:subpartoforganization concept:televisionnetwork:pbs 95 | concept:televisionstation:kcwc_tv concept:agentbelongstoorganization concept:company:pbs 96 | concept:televisionstation:kopb_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 97 | concept:televisionstation:wlrn_tv concept:agentcollaborateswithagent concept:company:pbs 98 | concept:televisionstation:weta_tv concept:agentcollaborateswithagent concept:company:pbs 99 | concept:televisionstation:kera_tv concept:agentbelongstoorganization concept:company:pbs 100 | concept:company:pbs concept:agentcontrols concept:televisionstation:wkha 101 | -------------------------------------------------------------------------------- /CoMPILE_v2/data/nell-inductive-v1/valid_inductive.txt: -------------------------------------------------------------------------------- 1 | concept:televisionstation:wune_tv concept:agentbelongstoorganization concept:company:pbs 2 | concept:televisionstation:wsbn_tv concept:subpartof concept:company:pbs 3 | concept:televisionstation:wnin_tv concept:subpartof concept:company:pbs 4 | concept:televisionstation:wgte_tv concept:subpartoforganization concept:televisionnetwork:pbs 5 | concept:televisionstation:wedn concept:subpartoforganization concept:televisionnetwork:pbs 6 | concept:televisionstation:wmed_tv concept:subpartoforganization concept:televisionnetwork:pbs 7 | concept:televisionstation:wfiq concept:subpartoforganization concept:televisionnetwork:pbs 8 | concept:televisionstation:wbra_tv concept:agentbelongstoorganization concept:company:pbs 9 | concept:televisionstation:wkpi concept:subpartoforganization concept:televisionnetwork:pbs 10 | concept:televisionstation:wmsy_tv concept:subpartof concept:company:pbs 11 | concept:televisionstation:krne_tv concept:agentcollaborateswithagent concept:company:pbs 12 | concept:televisionstation:kteh concept:agentbelongstoorganization concept:company:pbs 13 | concept:televisionstation:wkpc_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 14 | concept:televisionstation:wyes_tv concept:subpartof concept:company:pbs 15 | concept:televisionstation:wnin_tv concept:agentbelongstoorganization concept:company:pbs 16 | concept:televisionstation:wkon concept:agentbelongstoorganization concept:company:pbs 17 | concept:company:pbs concept:agentcontrols concept:televisionstation:kmbh_tv 18 | concept:company:pbs concept:agentcontrols concept:televisionstation:wcfe_tv 19 | concept:televisionstation:wntv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 20 | concept:televisionstation:wmec concept:agentbelongstoorganization concept:company:pbs 21 | concept:company:pbs concept:agentcontrols concept:televisionstation:kixe_tv 22 | concept:televisionstation:kusd_tv concept:agentbelongstoorganization concept:company:pbs 23 | 
concept:company:pbs concept:agentcontrols concept:televisionstation:wund_tv 24 | concept:televisionstation:wlvt_tv concept:agentbelongstoorganization concept:company:pbs 25 | concept:televisionstation:kcsd_tv concept:subpartof concept:company:pbs 26 | concept:televisionstation:wmea_tv concept:subpartof concept:company:pbs 27 | concept:televisionstation:witf_tv concept:agentcollaborateswithagent concept:company:pbs 28 | concept:televisionstation:kued concept:agentbelongstoorganization concept:company:pbs 29 | concept:televisionstation:wnpb_tv concept:subpartoforganization concept:televisionnetwork:pbs 30 | concept:televisionstation:wunf_tv concept:subpartof concept:company:pbs 31 | concept:televisionstation:kcdt_tv concept:subpartof concept:company:pbs 32 | concept:company:pbs concept:agentcontrols concept:televisionstation:krwg_tv 33 | concept:televisionstation:wlef_tv concept:agentcollaborateswithagent concept:company:pbs 34 | concept:televisionstation:kozk concept:subpartoforganization concept:televisionnetwork:pbs 35 | concept:televisionstation:wkyu_tv concept:subpartoforganization concept:televisionnetwork:pbs 36 | concept:televisionstation:ketc concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 37 | concept:televisionstation:wbiq_tv concept:subpartof concept:company:pbs 38 | concept:televisionstation:kcos_tv concept:subpartof concept:company:pbs 39 | concept:company:pbs concept:agentcontrols concept:televisionstation:kwet 40 | concept:company:pbs concept:agentcontrols concept:televisionstation:ktci_tv 41 | concept:company:pbs concept:agentcontrols concept:televisionstation:wntv 42 | concept:televisionstation:wkha concept:agentcollaborateswithagent concept:company:pbs 43 | concept:company:pbs concept:agentcontrols concept:televisionstation:wnsc_tv 44 | concept:televisionstation:wtiu_tv concept:agentbelongstoorganization concept:company:pbs 45 | concept:televisionstation:wgbx_tv concept:subpartoforganization concept:televisionnetwork:pbs 46 | concept:televisionstation:wkmu concept:subpartof concept:company:pbs 47 | concept:televisionstation:wyes_tv concept:agentbelongstoorganization concept:company:pbs 48 | concept:televisionstation:wkpi concept:subpartof concept:company:pbs 49 | concept:televisionstation:wunl_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 50 | concept:company:pbs concept:agentcontrols concept:televisionstation:wpto 51 | concept:televisionstation:kcts_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 52 | concept:televisionstation:wouc_tv concept:agentbelongstoorganization concept:company:pbs 53 | concept:televisionstation:ksps_tv concept:subpartof concept:company:pbs 54 | concept:televisionstation:kbyu_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 55 | concept:televisionstation:wung_tv concept:agentbelongstoorganization concept:company:pbs 56 | concept:televisionstation:wnjs concept:agentcollaborateswithagent concept:company:pbs 57 | concept:televisionstation:wha__tv concept:subpartof concept:company:pbs 58 | concept:televisionstation:wunu concept:agentbelongstoorganization concept:company:pbs 59 | concept:televisionstation:wkso_tv concept:agentcollaborateswithagent concept:company:pbs 60 | concept:televisionstation:klrn_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 61 | concept:televisionstation:krwg_tv concept:agentcollaborateswithagent concept:company:pbs 62 | concept:televisionstation:wqec concept:agentcollaborateswithagent concept:company:pbs 
63 | concept:company:pbs concept:agentcontrols concept:televisionstation:kbyu_tv 64 | concept:televisionstation:wptd concept:agentbelongstoorganization concept:company:pbs 65 | concept:company:pbs concept:agentcontrols concept:televisionstation:wvpy 66 | concept:company:pbs concept:agentcontrols concept:televisionstation:wvut 67 | concept:televisionstation:wgby_tv concept:subpartoforganization concept:televisionnetwork:pbs 68 | concept:televisionstation:wwpb concept:subpartoforganization concept:televisionnetwork:pbs 69 | concept:televisionstation:wund_tv concept:agentcollaborateswithagent concept:company:pbs 70 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmvs_tv 71 | concept:televisionstation:kepb_tv concept:agentcollaborateswithagent concept:company:pbs 72 | concept:company:pbs concept:agentcontrols concept:televisionstation:kaft 73 | concept:televisionstation:wnjt concept:agentbelongstoorganization concept:company:pbs 74 | concept:company:pbs concept:agentcontrols concept:televisionstation:wswp_tv 75 | concept:televisionstation:kqsd_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 76 | concept:televisionstation:kmos_tv concept:agentbelongstoorganization concept:company:pbs 77 | concept:televisionstation:wpbo_tv concept:subpartof concept:company:pbs 78 | concept:televisionstation:kltm_tv concept:subpartoforganization concept:televisionnetwork:pbs 79 | concept:company:pbs concept:agentcontrols concept:televisionstation:kpne_tv 80 | concept:televisionstation:wmau_tv concept:subpartof concept:company:pbs 81 | concept:televisionstation:kued concept:subpartoforganization concept:televisionnetwork:pbs 82 | concept:company:pbs concept:agentcontrols concept:televisionstation:wkpi 83 | concept:company:pbs concept:agentcontrols concept:televisionstation:wune_tv 84 | concept:televisionstation:kltm_tv concept:agentbelongstoorganization concept:company:pbs 85 | concept:televisionstation:wmaw_tv concept:agentcollaborateswithagent concept:company:pbs 86 | concept:televisionstation:wunj_tv concept:agentbelongstoorganization concept:company:pbs 87 | concept:televisionstation:kuht concept:agentbelongstoorganization concept:company:pbs 88 | concept:televisionstation:wunf_tv concept:subpartoforganization concept:televisionnetwork:pbs 89 | concept:televisionstation:wvpy concept:subpartof concept:company:pbs 90 | concept:televisionstation:wmsy_tv concept:agentcollaborateswithagent concept:company:pbs 91 | concept:televisionstation:kwet concept:subpartoforganization concept:televisionnetwork:pbs 92 | concept:televisionstation:wmau_tv concept:subpartoforganization concept:televisionnetwork:pbs 93 | concept:televisionstation:wund_tv concept:subpartoforganization concept:televisionnetwork:pbs 94 | concept:televisionstation:wmaw_tv concept:televisionstationaffiliatedwith concept:televisionnetwork:pbs 95 | concept:televisionstation:wnjn concept:agentcollaborateswithagent concept:company:pbs 96 | concept:televisionstation:wmah_tv concept:subpartof concept:company:pbs 97 | concept:company:pbs concept:agentcontrols concept:televisionstation:wmpb_tv 98 | concept:televisionstation:ksmq_tv concept:subpartof concept:company:pbs 99 | concept:company:pbs concept:agentcontrols concept:televisionstation:weta_tv 100 | concept:televisionstation:wmpn_tv concept:agentcollaborateswithagent concept:company:pbs 101 | concept:televisionstation:wune_tv concept:agentcollaborateswithagent concept:company:pbs 102 | 
-------------------------------------------------------------------------------- /CoMPILE_v2/log/FB15k-237-v1_error_now.log: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/log/FB15k-237-v1_error_now.log -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/create_batch_inductive2.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/create_batch_inductive2.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/layers.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/layers.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/layers2.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/layers2.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/models4.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/models4.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/preprocess2.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/preprocess2.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/preprocess_inductive.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/preprocess_inductive.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/__pycache__/utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/__pycache__/utils.cpython-37.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/create_batch.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/create_batch.pyc -------------------------------------------------------------------------------- 
/CoMPILE_v2/train/create_batch2.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/create_batch2.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/create_dataset_files.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import argparse 4 | 5 | def parse_args(): 6 | args = argparse.ArgumentParser() 7 | # network arguments 8 | args.add_argument("--data", help="data directory") 9 | args = args.parse_args() 10 | return args 11 | 12 | args = parse_args() 13 | 14 | def getID(): 15 | folder = './data/%s/attentionEmb/'%args.data 16 | lstEnts = {} 17 | lstRels = {} 18 | with open(folder + 'train_inductive.txt') as f, open(folder + 'train_marked_inductive.txt', 'w') as f2: 19 | count = 0 20 | for line in f: 21 | line = line.strip().split('\t') 22 | line = [i.strip() for i in line] 23 | # print(line[0], line[1], line[2]) 24 | if line[0] not in lstEnts: 25 | lstEnts[line[0]] = len(lstEnts) 26 | if line[1] not in lstRels: 27 | lstRels[line[1]] = len(lstRels) 28 | if line[2] not in lstEnts: 29 | lstEnts[line[2]] = len(lstEnts) 30 | count += 1 31 | f2.write(str(line[0]) + '\t' + str(line[1]) + 32 | '\t' + str(line[2]) + '\n') 33 | print("Size of train_marked set ", count) 34 | 35 | with open(folder + 'valid_inductive.txt') as f, open(folder + 'valid_marked_inductive.txt', 'w') as f2: 36 | count = 0 37 | for line in f: 38 | line = line.strip().split('\t') 39 | line = [i.strip() for i in line] 40 | # print(line[0], line[1], line[2]) 41 | if line[0] not in lstEnts: 42 | lstEnts[line[0]] = len(lstEnts) 43 | if line[1] not in lstRels: 44 | lstRels[line[1]] = len(lstRels) 45 | if line[2] not in lstEnts: 46 | lstEnts[line[2]] = len(lstEnts) 47 | count += 1 48 | f2.write(str(line[0]) + '\t' + str(line[1]) + 49 | '\t' + str(line[2]) + '\n') 50 | print("Size of valid_marked set ", count) 51 | 52 | with open(folder + 'test_inductive.txt') as f, open(folder + 'test_marked_inductive.txt', 'w') as f2: 53 | count = 0 54 | for line in f: 55 | line = line.strip().split('\t') 56 | line = [i.strip() for i in line] 57 | # print(line[0], line[1], line[2]) 58 | if line[0] not in lstEnts: 59 | lstEnts[line[0]] = len(lstEnts) 60 | if line[1] not in lstRels: 61 | lstRels[line[1]] = len(lstRels) 62 | if line[2] not in lstEnts: 63 | lstEnts[line[2]] = len(lstEnts) 64 | count += 1 65 | f2.write(str(line[0]) + '\t' + str(line[1]) + 66 | '\t' + str(line[2]) + '\n') 67 | print("Size of test_marked set ", count) 68 | 69 | wri = open(folder + 'entity2id_inductive.txt', 'w') 70 | for entity in lstEnts: 71 | wri.write(entity + '\t' + str(lstEnts[entity])) 72 | wri.write('\n') 73 | wri.close() 74 | 75 | wri = open(folder + 'relation2id_inductive.txt', 'w') 76 | for entity in lstRels: 77 | wri.write(entity + '\t' + str(lstRels[entity])) 78 | wri.write('\n') 79 | wri.close() 80 | 81 | entity_df = pd.read_csv(folder+'entity2id_inductive.txt',sep='\t',header=None,names=['entity','index']) 82 | feature_df = pd.read_csv('./data/%s/feature.csv'%args.data) 83 | feature_df = feature_df.drop_duplicates(['entity']).reset_index(drop=True) 84 | for col in feature_df.columns: 85 | if col not in ['entity','index','type']: 86 | feature_df[col] = (feature_df[col]-feature_df[col].mean()) / feature_df[col].std() 87 | feature_df = 
pd.concat([feature_df.drop('type',axis=1),pd.get_dummies(feature_df['type'])],axis=1) 88 | entity_df = entity_df.merge(feature_df,how='left',on='entity') 89 | entity_df[[col for col in entity_df.columns if col not in ['entity','index','type']]].to_csv(folder+'entity2vec_inductive.txt',sep='\t',header=None,index=False) 90 | relation_df = pd.read_csv(folder+'relation2id_inductive.txt',sep='\t',header=None,names=['entity','index']) 91 | #relation_embeddings = np.random.randn(len(relation_df), 50) 92 | relation_embeddings = np.concatenate([np.random.randn(len(relation_df), 50),np.eye(len(relation_df))],axis=1) 93 | np.savetxt(folder+'relation2vec_inductive.txt', relation_embeddings,fmt='%f',delimiter='\t') 94 | 95 | getID() 96 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import time 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | 9 | CUDA = torch.cuda.is_available() 10 | 11 | 12 | class ConvKB(nn.Module): 13 | def __init__(self, input_dim, input_seq_len, in_channels, out_channels, drop_prob, alpha_leaky): 14 | super().__init__() 15 | 16 | self.conv_layer = nn.Conv2d( 17 | in_channels, out_channels, (1, input_seq_len)) # kernel size -> 1*input_seq_length(i.e. 2) 18 | self.dropout = nn.Dropout(drop_prob) 19 | self.non_linearity = nn.ReLU() 20 | self.fc_layer = nn.Linear((input_dim) * out_channels, 1) 21 | 22 | nn.init.xavier_uniform_(self.fc_layer.weight, gain=1.414) 23 | nn.init.xavier_uniform_(self.conv_layer.weight, gain=1.414) 24 | 25 | def forward(self, conv_input): 26 | 27 | batch_size, length, dim = conv_input.size() 28 | # assuming inputs are of the form -> 29 | conv_input = conv_input.transpose(1, 2) 30 | # batch * length(which is 3 here -> entity,relation,entity) * dim 31 | # To make tensor of size 4, where second dim is for input channels 32 | conv_input = conv_input.unsqueeze(1) 33 | 34 | out_conv = self.dropout( 35 | self.non_linearity(self.conv_layer(conv_input))) 36 | 37 | input_fc = out_conv.squeeze(-1).view(batch_size, -1) 38 | output = self.fc_layer(input_fc) 39 | return output 40 | 41 | 42 | class SpecialSpmmFunctionFinal(torch.autograd.Function): 43 | """Special function for sparse-region backpropagation only.""" 44 | @staticmethod 45 | def forward(ctx, edge, edge_w, N, E, out_features): 46 | # assert indices.requires_grad == False 47 | a = torch.sparse_coo_tensor( 48 | edge, edge_w, torch.Size([N, N, out_features])) 49 | b = torch.sparse.sum(a, dim=1) 50 | ctx.N = b.shape[0] 51 | ctx.outfeat = b.shape[1] 52 | ctx.E = E 53 | ctx.indices = a._indices()[0, :] 54 | 55 | return b.to_dense() 56 | 57 | @staticmethod 58 | def backward(ctx, grad_output): 59 | grad_values = None 60 | if ctx.needs_input_grad[1]: 61 | edge_sources = ctx.indices 62 | 63 | if(CUDA): 64 | edge_sources = edge_sources.cuda() 65 | 66 | grad_values = grad_output[edge_sources] 67 | # grad_values = grad_values.view(ctx.E, ctx.outfeat) 68 | # print("Grad Outputs-> ", grad_output) 69 | # print("Grad values-> ", grad_values) 70 | return None, grad_values, None, None, None 71 | 72 | 73 | class SpecialSpmmFinal(nn.Module): 74 | def forward(self, edge, edge_w, N, E, out_features): 75 | return SpecialSpmmFunctionFinal.apply(edge, edge_w, N, E, out_features) 76 | 77 | 78 | class SpGraphAttentionLayer(nn.Module): 79 | """ 80 | Sparse version GAT layer, similar to 
https://arxiv.org/abs/1710.10903 81 | """ 82 | 83 | def __init__(self, num_nodes, in_features, out_features, nrela_dim, dropout, alpha, concat=True): 84 | super(SpGraphAttentionLayer, self).__init__() 85 | self.in_features = in_features 86 | self.out_features = out_features 87 | self.num_nodes = num_nodes 88 | self.alpha = alpha 89 | self.concat = concat 90 | self.nrela_dim = nrela_dim 91 | 92 | self.a = nn.Parameter(torch.zeros( 93 | size=(out_features, 2 * in_features + nrela_dim))) 94 | nn.init.xavier_normal_(self.a.data, gain=1.414) 95 | self.a_2 = nn.Parameter(torch.zeros(size=(1, out_features))) 96 | nn.init.xavier_normal_(self.a_2.data, gain=1.414) 97 | 98 | self.dropout = nn.Dropout(dropout) 99 | self.leakyrelu = nn.LeakyReLU(self.alpha) 100 | self.special_spmm_final = SpecialSpmmFinal() 101 | 102 | def forward(self, input, edge, edge_embed, edge_list_nhop, edge_embed_nhop): 103 | N = input.size()[0] 104 | 105 | # Self-attention on the nodes - Shared attention mechanism 106 | edge = torch.cat((edge[:, :], edge_list_nhop[:, :]), dim=1) 107 | edge_embed = torch.cat( 108 | (edge_embed[:, :], edge_embed_nhop[:, :]), dim=0) 109 | 110 | edge_h = torch.cat( 111 | (input[edge[0, :], :], input[edge[1, :], :], edge_embed[:, :]), dim=1).t() 112 | # edge_h: (2*in_dim + nrela_dim) x E 113 | 114 | edge_m = self.a.mm(edge_h) 115 | # edge_m: D * E 116 | 117 | # to be checked later 118 | powers = -self.leakyrelu(self.a_2.mm(edge_m).squeeze()) 119 | edge_e = torch.exp(powers).unsqueeze(1) 120 | # assert not torch.isnan(edge_e).any() 121 | # edge_e: E 122 | 123 | e_rowsum = self.special_spmm_final( 124 | edge, edge_e, N, edge_e.shape[0], 1) 125 | e_rowsum[e_rowsum == 0.0] = 1e-12 126 | 127 | e_rowsum = e_rowsum 128 | # e_rowsum: N x 1 129 | edge_e = edge_e.squeeze(1) 130 | 131 | edge_e = self.dropout(edge_e) 132 | # edge_e: E 133 | 134 | edge_w = (edge_e * edge_m).t() 135 | # edge_w: E * D 136 | 137 | h_prime = self.special_spmm_final( 138 | edge, edge_w, N, edge_w.shape[0], self.out_features) 139 | 140 | # assert not torch.isnan(h_prime).any() 141 | # h_prime: N x out 142 | h_prime = h_prime.div(e_rowsum) 143 | # h_prime: N x out 144 | 145 | # assert not torch.isnan(h_prime).any() 146 | if self.concat: 147 | # if this layer is not last layer, 148 | return F.elu(h_prime) 149 | else: 150 | # if this layer is last layer, 151 | return h_prime 152 | 153 | def __repr__(self): 154 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 155 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/layers.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers2.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import time 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | 9 | CUDA = torch.cuda.is_available() 10 | 11 | 12 | class ConvKB(nn.Module): 13 | def __init__(self, input_dim, input_seq_len, in_channels, out_channels, drop_prob, alpha_leaky): 14 | super().__init__() 15 | 16 | self.conv_layer = nn.Conv2d( 17 | in_channels, out_channels, (1, input_seq_len)) # kernel size -> 1*input_seq_length(i.e. 
2) 18 | self.dropout = nn.Dropout(drop_prob) 19 | self.non_linearity = nn.ReLU() 20 | self.fc_layer = nn.Linear((input_dim) * out_channels, 1) 21 | 22 | nn.init.xavier_uniform_(self.fc_layer.weight, gain=1.414) 23 | nn.init.xavier_uniform_(self.conv_layer.weight, gain=1.414) 24 | 25 | def forward(self, conv_input): 26 | 27 | batch_size, length, dim = conv_input.size() 28 | # assuming inputs are of the form -> 29 | conv_input = conv_input.transpose(1, 2) 30 | # batch * length(which is 3 here -> entity,relation,entity) * dim 31 | # To make tensor of size 4, where second dim is for input channels 32 | conv_input = conv_input.unsqueeze(1) 33 | 34 | out_conv = self.dropout( 35 | self.non_linearity(self.conv_layer(conv_input))) 36 | 37 | input_fc = out_conv.squeeze(-1).view(batch_size, -1) 38 | output = self.fc_layer(input_fc) 39 | return output 40 | 41 | 42 | class SpecialSpmmFunctionFinal(torch.autograd.Function): 43 | """Special function for sparse-region backpropagation only.""" 44 | @staticmethod 45 | def forward(ctx, edge, edge_w, N, E, out_features): 46 | # assert indices.requires_grad == False 47 | # print('edge', edge.shape, 'edge_w', edge_w.shape, N, E, out_features) 48 | a = torch.sparse_coo_tensor( 49 | edge, edge_w, torch.Size([N, N, out_features])) #####edge (2, 298431) edge_w (298431, embeddings) N:272115 E:298431 out_features:1 50 | b = torch.sparse.sum(a, dim=1) #####[N 1] 51 | ctx.N = b.shape[0] 52 | ctx.outfeat = b.shape[1] 53 | ctx.E = E 54 | ctx.indices = a._indices()[0, :] 55 | 56 | return b.to_dense() 57 | 58 | @staticmethod 59 | def backward(ctx, grad_output): 60 | grad_values = None 61 | if ctx.needs_input_grad[1]: 62 | edge_sources = ctx.indices 63 | 64 | if(CUDA): 65 | edge_sources = edge_sources.cuda() 66 | 67 | grad_values = grad_output[edge_sources] 68 | # grad_values = grad_values.view(ctx.E, ctx.outfeat) 69 | # print("Grad Outputs-> ", grad_output) 70 | # print("Grad values-> ", grad_values) 71 | return None, grad_values, None, None, None 72 | 73 | 74 | class SpecialSpmmFinal(nn.Module): 75 | def forward(self, edge, edge_w, N, E, out_features): 76 | return SpecialSpmmFunctionFinal.apply(edge, edge_w, N, E, out_features) 77 | 78 | 79 | 80 | 81 | 82 | class OurSpGraphAttentionLayer(nn.Module): 83 | """ 84 | Sparse version GAT layer, similar to https://arxiv.org/abs/1710.10903 85 | """ 86 | 87 | def __init__(self, num_nodes, in_features, out_features, nrela_dim, dropout, alpha, concat=True): 88 | super(OurSpGraphAttentionLayer, self).__init__() 89 | self.in_features = in_features 90 | self.out_features = out_features 91 | self.num_nodes = num_nodes 92 | self.alpha = alpha 93 | self.concat = concat 94 | self.nrela_dim = nrela_dim 95 | 96 | self.a = nn.Parameter(torch.zeros( 97 | size=(out_features, 2 * in_features + nrela_dim))) 98 | nn.init.xavier_normal_(self.a.data, gain=1.414) 99 | self.a_2 = nn.Parameter(torch.zeros(size=(1, out_features))) 100 | nn.init.xavier_normal_(self.a_2.data, gain=1.414) 101 | 102 | self.dropout = nn.Dropout(dropout) 103 | self.leakyrelu = nn.LeakyReLU(self.alpha) 104 | self.special_spmm_final = SpecialSpmmFinal() 105 | 106 | def forward(self, entity_embeddings, entity_list, relation_embed, target_relation_embed): 107 | N = entity_embeddings.size()[0] #####input = entity_embedding, edge = edge_list (bz, 2) 108 | 109 | # print('edge', edge.shape, 'edge_embed', edge_embed.shape, 'edge_embed_nhop', edge_embed_nhop.shape, 'edge_list_nhop', edge_list_nhop.shape) 110 | edge = entity_list ######[bz, 2] 111 | edge_embed = relation_embed 
####[bz, 100] 112 | 113 | edge_h = torch.cat((entity_embeddings[edge[:, 0], :], entity_embeddings[edge[:, 1], :], edge_embed[:, :]), dim=1).t() ####[300, bz] [source target relation] embedding 114 | # edge_h: (2*in_dim + nrela_dim) x E 115 | # print('edge_h', edge_h.shape, 'a', self.a.shape) 116 | edge_m = self.a.mm(edge_h) ######[out_features, bz] 117 | 118 | powers = -self.leakyrelu(self.a_2.mm(edge_m).squeeze(0)) 119 | edge_e = torch.exp(powers).unsqueeze(1) ####[bz, 1] 120 | # assert not torch.isnan(edge_e).any() 121 | # edge_e: E 122 | # print('edge after concat ', edge.shape, 'edge_e ', edge_e.shape) 123 | e_rowsum = self.special_spmm_final( 124 | edge.t(), edge_e, N, edge_e.shape[0], 1) 125 | e_rowsum[e_rowsum == 0.0] = 1e-12 126 | 127 | e_rowsum = e_rowsum 128 | # e_rowsum: N x 1 129 | edge_e = edge_e.squeeze(1) #####[298431] 130 | 131 | edge_e = self.dropout(edge_e) 132 | # edge_e: E 133 | 134 | edge_w = (edge_e * edge_m).t() #####[bz, out_features] 135 | # edge_w: E * D 136 | 137 | h_prime = self.special_spmm_final( 138 | edge.t(), edge_w, N, edge_w.shape[0], self.out_features) 139 | 140 | # assert not torch.isnan(h_prime).any() 141 | # h_prime: N x out 142 | h_prime = h_prime.div(e_rowsum) 143 | # h_prime: N x out 144 | 145 | # assert not torch.isnan(h_prime).any() 146 | if self.concat: 147 | # if this layer is not last layer, 148 | return F.elu(h_prime) 149 | else: 150 | # if this layer is last layer, 151 | return h_prime 152 | 153 | def __repr__(self): 154 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | class SpGraphAttentionLayer(nn.Module): 163 | """ 164 | Sparse version GAT layer, similar to https://arxiv.org/abs/1710.10903 165 | """ 166 | 167 | def __init__(self, num_nodes, in_features, out_features, nrela_dim, dropout, alpha, concat=True): 168 | super(SpGraphAttentionLayer, self).__init__() 169 | self.in_features = in_features 170 | self.out_features = out_features 171 | self.num_nodes = num_nodes 172 | self.alpha = alpha 173 | self.concat = concat 174 | self.nrela_dim = nrela_dim 175 | 176 | self.a = nn.Parameter(torch.zeros( 177 | size=(out_features, 2 * in_features + nrela_dim))) 178 | nn.init.xavier_normal_(self.a.data, gain=1.414) 179 | self.a_2 = nn.Parameter(torch.zeros(size=(1, out_features))) 180 | nn.init.xavier_normal_(self.a_2.data, gain=1.414) 181 | 182 | self.dropout = nn.Dropout(dropout) 183 | self.leakyrelu = nn.LeakyReLU(self.alpha) 184 | self.special_spmm_final = SpecialSpmmFinal() 185 | 186 | def forward(self, input, edge, edge_embed, edge_list_nhop, edge_embed_nhop): 187 | N = input.size()[0] #####input = entity_embedding, edge = edge_list (2, 272115), edge_embed_nhop (26316, 100) 188 | 189 | # Self-attention on the nodes - Shared attention mechanism 190 | # print('edge', edge.shape, 'edge_embed', edge_embed.shape, 'edge_embed_nhop', edge_embed_nhop.shape, 'edge_list_nhop', edge_list_nhop.shape) 191 | edge = torch.cat((edge[:, :], edge_list_nhop[:, :]), dim=1) ######[2, 298431] 192 | edge_embed = torch.cat((edge_embed[:, :], edge_embed_nhop[:, :]), dim=0) ####[298431, 100] 193 | 194 | edge_h = torch.cat((input[edge[0, :], :], input[edge[1, :], :], edge_embed[:, :]), dim=1).t() ####[300, 298431] [source target relation] embedding 195 | # edge_h: (2*in_dim + nrela_dim) x E 196 | # print('edge_h', edge_h.shape) 197 | edge_m = self.a.mm(edge_h) ######[out_features, 298431] 198 | # edge_m: D * E 199 | 200 | # to be checked later 201 | powers = 
-self.leakyrelu(self.a_2.mm(edge_m).squeeze()) 202 | edge_e = torch.exp(powers).unsqueeze(1) ####[298431, 1] 203 | assert not torch.isnan(edge_e).any() 204 | # edge_e: E 205 | # print('edge after concat ', edge.shape, 'edge_e ', edge_e.shape) 206 | e_rowsum = self.special_spmm_final( 207 | edge, edge_e, N, edge_e.shape[0], 1) 208 | e_rowsum[e_rowsum == 0.0] = 1e-12 209 | 210 | e_rowsum = e_rowsum 211 | # e_rowsum: N x 1 212 | edge_e = edge_e.squeeze(1) #####[298431] 213 | 214 | edge_e = self.dropout(edge_e) 215 | # edge_e: E 216 | 217 | edge_w = (edge_e * edge_m).t() #####[298431, out_features] 218 | # edge_w: E * D 219 | 220 | h_prime = self.special_spmm_final( 221 | edge, edge_w, N, edge_w.shape[0], self.out_features) 222 | 223 | assert not torch.isnan(h_prime).any() 224 | # h_prime: N x out 225 | h_prime = h_prime.div(e_rowsum) 226 | # h_prime: N x out 227 | 228 | assert not torch.isnan(h_prime).any() 229 | if self.concat: 230 | # if this layer is not last layer, 231 | return F.elu(h_prime) 232 | else: 233 | # if this layer is last layer, 234 | return h_prime 235 | 236 | def __repr__(self): 237 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 238 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/layers2.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/layers2.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/models.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/models.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/models4.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/models4.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/preprocess.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess2.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import os 3 | import numpy as np 4 | 5 | 6 | def read_entity_from_id(filename='./data/WN18RR/entity2id_inductive.txt'): 7 | entity2id = {} 8 | with open(filename, 'r') as f: 9 | for line in f: 10 | if len(line.strip().split('\t')) > 1: 11 | entity, entity_id = line.strip().split('\t' 12 | )[0].strip(), line.strip().split('\t')[1].strip() 13 | entity2id[entity] = int(entity_id) 14 | return entity2id 15 | 16 | 17 | def read_relation_from_id(filename='./data/WN18RR/relation2id_inductive.txt'): 18 | relation2id = {} 19 | with open(filename, 'r') as f: 20 | for line in f: 21 | if len(line.strip().split('\t')) > 1: 22 | relation, relation_id = line.strip().split('\t' 23 | )[0].strip(), line.strip().split('\t')[1].strip() 24 | relation2id[relation] 
= int(relation_id) 25 | return relation2id 26 | 27 | 28 | def init_embeddings(entity_file, relation_file): 29 | entity_emb, relation_emb = [], [] 30 | 31 | with open(entity_file) as f: 32 | for line in f: 33 | entity_emb.append([float(val) for val in line.strip().split('\t')]) 34 | 35 | with open(relation_file) as f: 36 | for line in f: 37 | relation_emb.append([float(val) for val in line.strip().split('\t')]) 38 | 39 | return np.array(entity_emb, dtype=np.float32), np.array(relation_emb, dtype=np.float32) 40 | 41 | 42 | def parse_line(line): 43 | line = line.strip().split('\t') 44 | e1, relation, e2 = line[0].strip(), line[1].strip(), line[2].strip() 45 | return e1, relation, e2 46 | 47 | 48 | def load_data(filename, entity2id, relation2id, is_unweigted=False, directed=True): 49 | with open(filename) as f: 50 | lines = f.readlines() 51 | 52 | # this is the list of relation triples 53 | triples_data = [] 54 | 55 | # for sparse tensor, rows list contains corresponding row of sparse tensor, cols list contains corresponding 56 | # column of sparse tensor, data contains the type of relation 57 | # Adjacency matrix of entities is undirected, as the source and tail entities should know the relation 58 | # type they are connected with 59 | rows, cols, data = [], [], [] 60 | unique_entities = set() 61 | for line in lines: 62 | e1, relation, e2 = parse_line(line) 63 | unique_entities.add(e1) 64 | unique_entities.add(e2) 65 | triples_data.append( 66 | (entity2id[e1], relation2id[relation], entity2id[e2])) 67 | if not directed: 68 | # Connecting source and tail entity 69 | rows.append(entity2id[e1]) 70 | cols.append(entity2id[e2]) 71 | if is_unweigted: 72 | data.append(1) 73 | else: 74 | data.append(relation2id[relation]) 75 | 76 | # Connecting tail and source entity 77 | rows.append(entity2id[e2]) 78 | cols.append(entity2id[e1]) 79 | if is_unweigted: 80 | data.append(1) 81 | else: 82 | data.append(relation2id[relation]) 83 | 84 | print("number of unique_entities ->", len(unique_entities)) 85 | return triples_data, (rows, cols, data), list(unique_entities) 86 | 87 | 88 | def build_data(path='./data/WN18RR/', is_unweigted=False, directed=True): 89 | entity2id = read_entity_from_id(path + 'entity2id_inductive.txt') 90 | relation2id = read_relation_from_id(path + 'relation2id_inductive.txt') 91 | 92 | # Adjacency matrix only required for training phase 93 | # Currently creating as unweighted, undirected 94 | train_triples, train_adjacency_mat, unique_entities_train = load_data(os.path.join( 95 | path, 'train_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 96 | validation_triples, valid_adjacency_mat, unique_entities_validation = load_data( 97 | os.path.join(path, 'valid_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 98 | test_triples, test_adjacency_mat, unique_entities_test = load_data(os.path.join( 99 | path, 'test_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 100 | # all_triples, all_adjacency_mat, unique_entities_all = load_data(os.path.join( 101 | # path, 'all_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 102 | 103 | id2entity = {v: k for k, v in entity2id.items()} 104 | id2relation = {v: k for k, v in relation2id.items()} 105 | left_entity, right_entity = {}, {} 106 | 107 | with open(os.path.join(path, 'train_inductive.txt')) as f: 108 | lines = f.readlines() 109 | 110 | for line in lines: 111 | e1, relation, e2 = parse_line(line) 112 | 113 | # Count number of occurrences for each (e1, relation) 114 | if relation2id[relation] 
not in left_entity: 115 | left_entity[relation2id[relation]] = {} 116 | if entity2id[e1] not in left_entity[relation2id[relation]]: 117 | left_entity[relation2id[relation]][entity2id[e1]] = 0 118 | left_entity[relation2id[relation]][entity2id[e1]] += 1 119 | 120 | # Count number of occurrences for each (relation, e2) 121 | if relation2id[relation] not in right_entity: 122 | right_entity[relation2id[relation]] = {} 123 | if entity2id[e2] not in right_entity[relation2id[relation]]: 124 | right_entity[relation2id[relation]][entity2id[e2]] = 0 125 | right_entity[relation2id[relation]][entity2id[e2]] += 1 126 | 127 | left_entity_avg = {} 128 | for i in range(len(relation2id)): 129 | left_entity_avg[i] = sum( 130 | left_entity[i].values()) * 1.0 / len(left_entity[i]) 131 | 132 | right_entity_avg = {} 133 | for i in range(len(relation2id)): 134 | right_entity_avg[i] = sum( 135 | right_entity[i].values()) * 1.0 / len(right_entity[i]) 136 | 137 | headTailSelector = {} 138 | for i in range(len(relation2id)): 139 | headTailSelector[i] = 1000 * right_entity_avg[i] / \ 140 | (right_entity_avg[i] + left_entity_avg[i]) 141 | 142 | return (train_triples, train_adjacency_mat), (validation_triples, valid_adjacency_mat), (test_triples, test_adjacency_mat), \ 143 | entity2id, relation2id, headTailSelector, unique_entities_train, unique_entities_validation, unique_entities_test 144 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess2.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/preprocess2.pyc -------------------------------------------------------------------------------- /CoMPILE_v2/train/preprocess_inductive.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import os 3 | import numpy as np 4 | from random import sample 5 | 6 | def read_entity_from_id(filename='./data/WN18RR/entity2id.txt'): 7 | entity2id = {} 8 | with open(filename, 'r') as f: 9 | for line in f: 10 | if len(line.strip().split('\t')) > 1: 11 | entity, entity_id = line.strip().split('\t' 12 | )[0].strip(), line.strip().split('\t')[1].strip() 13 | entity2id[entity] = int(entity_id) 14 | return entity2id 15 | 16 | 17 | def read_relation_from_id(filename='./data/WN18RR/relation2id.txt'): 18 | relation2id = {} 19 | with open(filename, 'r') as f: 20 | for line in f: 21 | if len(line.strip().split('\t')) > 1: 22 | relation, relation_id = line.strip().split('\t' 23 | )[0].strip(), line.strip().split('\t')[1].strip() 24 | relation2id[relation] = int(relation_id) 25 | return relation2id 26 | 27 | 28 | def init_embeddings(entity_file, relation_file): 29 | entity_emb, relation_emb = [], [] 30 | 31 | with open(entity_file) as f: 32 | for line in f: 33 | entity_emb.append([float(val) for val in line.strip().split('\t')]) 34 | 35 | with open(relation_file) as f: 36 | for line in f: 37 | relation_emb.append([float(val) for val in line.strip().split('\t')]) 38 | 39 | return np.array(entity_emb, dtype=np.float32), np.array(relation_emb, dtype=np.float32) 40 | 41 | 42 | def parse_line(line): 43 | line = line.strip().split('\t') 44 | e1, relation, e2 = line[0].strip(), line[1].strip(), line[2].strip() 45 | return e1, relation, e2 46 | 47 | 48 | def get_id(filename, is_unweigted=False, directed=True, saved_relation2id=None): 49 | 50 | entity2id = {} 51 | relation2id = {} if 
saved_relation2id is None else saved_relation2id 52 | 53 | triples_data = {} 54 | rows, cols, data = [], [], [] 55 | unique_entities = set() 56 | 57 | ent = 0 58 | rel = 0 59 | 60 | for filename1 in filename: 61 | 62 | data = [] 63 | with open(filename1) as f: 64 | file_data = [line.split() for line in f.read().split('\n')[:-1]] 65 | 66 | for triplet in file_data: 67 | if triplet[0] not in entity2id: 68 | entity2id[triplet[0]] = ent 69 | ent += 1 70 | if triplet[2] not in entity2id: 71 | entity2id[triplet[2]] = ent 72 | ent += 1 73 | if not saved_relation2id and triplet[1] not in relation2id: 74 | relation2id[triplet[1]] = rel 75 | rel += 1 76 | 77 | # Save the triplets corresponding to only the known relations 78 | if triplet[1] in relation2id: 79 | data.append([entity2id[triplet[0]], entity2id[triplet[2]], relation2id[triplet[1]]]) 80 | 81 | # triplets[file_type] = np.array(data) 82 | 83 | id2entity = {v: k for k, v in entity2id.items()} 84 | id2relation = {v: k for k, v in relation2id.items()} 85 | return entity2id, relation2id, rel 86 | 87 | 88 | def load_data(filename, entity2id, relation2id, is_unweigted=False, directed=True): 89 | with open(filename) as f: 90 | lines = [line.split() for line in f.read().split('\n')[:-1]] 91 | 92 | triples_data = [] 93 | 94 | rows, cols, data = [], [], [] 95 | unique_entities = set() 96 | for line in lines: 97 | e1, relation, e2 = line[0], line[1], line[2] 98 | unique_entities.add(e1) 99 | unique_entities.add(e2) 100 | triples_data.append( 101 | (entity2id[e1], relation2id[relation], entity2id[e2])) 102 | if not directed: 103 | # Connecting source and tail entity 104 | rows.append(entity2id[e1]) 105 | cols.append(entity2id[e2]) 106 | if is_unweigted: 107 | data.append(1) 108 | else: 109 | data.append(relation2id[relation]) 110 | 111 | # Connecting tail and source entity 112 | rows.append(entity2id[e2]) 113 | cols.append(entity2id[e1]) 114 | if is_unweigted: 115 | data.append(1) 116 | else: 117 | data.append(relation2id[relation]) 118 | 119 | print("number of unique_entities ->", len(unique_entities)) 120 | return triples_data, (rows, cols, data), list(unique_entities) 121 | 122 | 123 | def build_data(path='./data/FB15k-237-inductive-v3/', is_unweigted=False, directed=True): 124 | # entity2id = read_entity_from_id(path + 'entity2id.txt') 125 | # relation2id = read_relation_from_id(path + 'relation2id.txt') 126 | entity2id, relation2id, rel = get_id([os.path.join(path, 'train.txt'), os.path.join(path, 'valid.txt'), os.path.join(path, 'train_inductive.txt')]) 127 | 128 | train_triples, train_adjacency_mat, unique_entities_train = load_data(os.path.join(path, 'train.txt'), entity2id, relation2id, is_unweigted, directed) 129 | validation_triples, valid_adjacency_mat, unique_entities_validation = load_data(os.path.join(path, 'valid.txt'), entity2id, relation2id, is_unweigted, directed) 130 | _, test_adjacency_mat, unique_entities_test = load_data(os.path.join(path, 'train_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 131 | 132 | test_links, _, _ = load_data(os.path.join(path, 'test_inductive.txt'), entity2id, relation2id, is_unweigted, directed) 133 | 134 | test_triples = test_links 135 | 136 | id2entity = {v: k for k, v in entity2id.items()} 137 | id2relation = {v: k for k, v in relation2id.items()} 138 | left_entity, right_entity = {}, {} 139 | 140 | return (train_triples, train_adjacency_mat), (validation_triples, valid_adjacency_mat), (test_triples, test_adjacency_mat), \ 141 | entity2id, relation2id, rel, unique_entities_train, 
unique_entities_validation, unique_entities_test 142 | -------------------------------------------------------------------------------- /CoMPILE_v2/train/utils.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TmacMai/CoMPILE_Inductive_Knowledge_Graph/523db77f8178f89faf5ec06c677ca66b84ef35eb/CoMPILE_v2/train/utils.pyc -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 TmacMai 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | News: version 2 has been uploaded. 2 | 3 | 4 | Our CoMPILE has two versions. 5 | 6 | 7 | 8 | #################################version 1######################################### 9 | 10 | The first version is implemented on top of GraIL (https://github.com/kkteru/grail); we evaluate our message-passing model on the original inductive datasets proposed by the GraIL authors, and we thank them very much for sharing their code. 11 | 12 | To run the code, first unrar data.rar and place the extracted folder under CoMPILE_github. 13 | 14 | To train the model (taking the FB15k-237 inductive v4 dataset as an example): 15 | 16 | python train.py -d fb237_v4 -e compile_fb_v4_ind 17 | 18 | 19 | To evaluate the AUC score of the trained model: 20 | 21 | python test_auc.py -d fb237_v4_ind -e compile_fb_v4_ind 22 | 23 | 24 | 25 | To evaluate the Hits@10 score of the trained model: 26 | 27 | python test_ranking.py -d fb237_v4_ind -e compile_fb_v4_ind 28 | 29 | 30 | #################################version 2######################################### 31 | 32 | In version 2, we implement our full inductive learning system, including data filtering, directed subgraph extraction, and the message-passing mechanism. 33 | 34 | To be updated...
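35 | 
36 | Until the version 2 code is fully documented, the sketch below shows what one step of directed subgraph extraction can look like. It is a minimal illustration only, not the implementation used in this repository: the function name, the networkx dependency, and the 3-hop default are assumptions made for the example.
37 | 
38 |     import networkx as nx
39 | 
40 |     def directed_enclosing_subgraph(graph, head, tail, k=3):
41 |         # `graph` is assumed to be an nx.MultiDiGraph built from the training triplets.
42 |         # Nodes reachable from `head` within k hops, following edge direction:
43 |         fwd = nx.single_source_shortest_path_length(graph, head, cutoff=k)
44 |         # Nodes that can reach `tail` within k hops (searched on the reversed view):
45 |         bwd = nx.single_source_shortest_path_length(graph.reverse(copy=False), tail, cutoff=k)
46 |         # The directed enclosing subgraph is induced by the intersection of the two node sets:
47 |         nodes = (set(fwd) & set(bwd)) | {head, tail}
48 |         return graph.subgraph(nodes).copy()
49 | 
50 | The message passing model then scores a candidate triplet on the subgraph extracted around it.
51 | 
--------------------------------------------------------------------------------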