├── .idea ├── .gitignore ├── Graph_Learning.iml ├── inspectionProfiles │ └── profiles_settings.xml ├── misc.xml ├── modules.xml └── vcs.xml ├── LICENSE ├── README.md ├── __init__.py ├── __pycache__ └── __init__.cpython-36.pyc ├── data └── blog │ ├── blog-label.txt │ ├── blog-net.txt │ ├── blog-vocab.txt │ └── node_map_dic.pkl ├── deepwalk ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-36.pyc │ ├── build_graph.cpython-36.pyc │ ├── dataset.cpython-36.pyc │ ├── model.cpython-36.pyc │ └── utils.cpython-36.pyc ├── build_graph.py ├── dataset.py ├── main.py ├── model.py ├── node_classfication.py ├── test │ ├── __init__.py │ └── test_hetergraph.py └── utils.py ├── gat ├── __init__.py ├── __pycache__ │ └── model.cpython-38.pyc ├── model.py └── train.py ├── gcn ├── build_graph.py ├── gcn_model.py ├── test_api │ └── nx_to_scipy_sparse_matrix_test.py ├── train.py └── utils.py ├── graphsage ├── __init__.py ├── link_prediction │ ├── __init__.py │ ├── __pycache__ │ │ ├── dataloader.cpython-38.pyc │ │ └── model.cpython-38.pyc │ ├── dataloader.py │ ├── model.py │ └── train.py ├── node_classification │ ├── __init__.py │ ├── __pycache__ │ │ ├── dataloader.cpython-38.pyc │ │ └── model.cpython-38.pyc │ ├── dataloader.py │ ├── model.py │ └── train.py └── test_api │ ├── __init__.py │ └── edge_dataloader.py ├── node2vec ├── __init__.py ├── main.py ├── model.py └── sample_walks.py ├── out └── blog_deepwalk_ckpt ├── pictures ├── GCN_AD2.png ├── graphSAGE_link_pre.png └── node_classification.png └── requirements.txt /.idea/.gitignore: -------------------------------------------------------------------------------- 1 | # Default ignored files 2 | /shelf/ 3 | /workspace.xml 4 | -------------------------------------------------------------------------------- /.idea/Graph_Learning.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 12 | -------------------------------------------------------------------------------- /.idea/inspectionProfiles/profiles_settings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 6 | -------------------------------------------------------------------------------- /.idea/misc.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | -------------------------------------------------------------------------------- /.idea/modules.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /.idea/vcs.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Princeton Natural Language Processing 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 
all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ### Graph Models in Practice 2 | Hands-on implementations and paper reproductions of graph models (GCN, GAT, GraphSAGE, DeepWalk, node2vec). Continuously updated; stars and discussion are welcome. 3 | 4 | #### 1. Environment Setup 5 | Based mainly on [dgl](https://github.com/dmlc/dgl) and PyTorch. 6 | >pip install -r requirements.txt 7 | 8 | #### 2. Data 9 | >Download the dataset and put it in ./data/ 10 | 11 | The blog dataset has already been uploaded. 12 | 13 | #### 3. Graph model code walkthroughs: 14 | Notes on the models are written up here: 15 | 1. [Random-walk graph models: DeepWalk on homogeneous graphs explained](https://zhuanlan.zhihu.com/p/397710211) 16 | 2. [Random-walk graph models: a look at Node2Vec](https://zhuanlan.zhihu.com/p/400849086) 17 | 3. [Graph convolutions: from GCN to GAT and GraphSAGE](https://zhuanlan.zhihu.com/p/404826711) 18 | 4. [How to build a GCN? Just these four steps](https://zhuanlan.zhihu.com/p/422380707) 19 | 5. [How to build a solid GraphSAGE? Follow these three steps](https://zhuanlan.zhihu.com/p/429147607) 20 | 6. [Link prediction: building an unsupervised GraphSAGE](https://zhuanlan.zhihu.com/p/435766657) 21 | #### How to run 22 | ##### DeepWalk 23 | ①. How to run the DeepWalk model for graph embedding 24 | >cd deepwalk 25 | >python main.py 26 | 27 | ②. Node classification task 28 | >python node_classification.py 29 | 30 | ##### Node2Vec 31 | ①. How to run the Node2Vec model 32 | >cd node2vec 33 | >python main.py 34 | 35 | ②. Node classification task (change the checkpoint path in node_classification.py to the node2vec checkpoint). 36 | >python node_classification.py 37 | 38 | ##### GCN 39 | ①. How to run the GCN model 40 | >python train.py 41 | 42 | Cora dataset node classification (the Cora dataset will be downloaded to ~/.dgl/ automatically). 43 | Test accuracy ~0.806 (0.793-0.819) ([paper](https://arxiv.org/abs/1609.02907): 0.815). 44 | 45 | ##### GraphSAGE 46 | 47 | ###### Node Classification 48 | 49 | ①. How to run the GraphSAGE model 50 | >cd graphsage/node_classification 51 | >python train.py 52 | 53 | Cora dataset node classification (the Cora dataset will be downloaded to ~/.dgl/ automatically). 54 | Test accuracy ~0.781 (0.762-0.801) ([paper](https://arxiv.org/abs/1609.02907): 0.815). 55 | 56 | ###### Link Prediction 57 | 58 | ①. How to run the GraphSAGE model 59 | >cd graphsage/link_prediction 60 | >python train.py 61 | 62 | Test F1: ~0.630 (0.612-0.648) (Cora dataset) 63 | 64 | ##### GAT 65 | ①. How to run the GAT model 66 | >python train.py 67 | 68 | Cora dataset node classification (the Cora dataset will be downloaded to ~/.dgl/ automatically). 69 | Test accuracy ~0.810 (0.792-0.820) ([paper](https://arxiv.org/pdf/1710.10903.pdf): 0.830). 
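70 | 71 | ##### Reusing the trained embeddings 72 | A minimal sketch (not one of the repo's scripts; it assumes the default `save_path` from `deepwalk/main.py`) of loading the trained node embeddings for a downstream task. The checkpoint is a plain `state_dict`, so the embedding matrix sits under the `embed_nodes.weight` key, exactly as `node_classfication.py` loads it: 73 | ```python 74 | import torch 75 | 76 | # the checkpoint saved by main.py is SkipGramModel's state_dict 77 | param = torch.load("out/blog_deepwalk_ckpt", map_location="cpu") 78 | emb = param["embed_nodes.weight"]  # (num_nodes, embed_dim) 79 | print(emb.shape) 80 | ```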
-------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/__init__.py -------------------------------------------------------------------------------- /__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /data/blog/node_map_dic.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/data/blog/node_map_dic.pkl -------------------------------------------------------------------------------- /deepwalk/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__init__.py -------------------------------------------------------------------------------- /deepwalk/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /deepwalk/__pycache__/build_graph.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/build_graph.cpython-36.pyc -------------------------------------------------------------------------------- /deepwalk/__pycache__/dataset.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/dataset.cpython-36.pyc -------------------------------------------------------------------------------- /deepwalk/__pycache__/model.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/model.cpython-36.pyc -------------------------------------------------------------------------------- /deepwalk/__pycache__/utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/utils.cpython-36.pyc -------------------------------------------------------------------------------- /deepwalk/build_graph.py: -------------------------------------------------------------------------------- 1 | import os 2 | import copy 3 | import numpy as np 4 | import scipy.sparse as sp 5 | import pickle 6 | import torch 7 | from torch.utils.data import DataLoader 8 | from dgl.data.utils import download, _get_dgl_url, get_download_dir, extract_archive 9 | import random 10 | import time 11 | import dgl 12 | 13 | 14 | def make_undirected(G): 15 | 
G.add_edges(G.edges()[1], G.edges()[0])  # add the reverse of every edge 16 | return G 17 | 18 | 19 | def find_connected_nodes(G): 20 | nodes = G.out_degrees().nonzero().squeeze(-1) 21 | return nodes 22 | 23 | 24 | import time 25 | import pandas as pd 26 | 27 | 28 | class Build_Graph(object): 29 | def __init__(self, data_dir, walk_length=5, self_loop=True, undirected=True): 30 | self.edge_file_path = data_dir + "blog-net.txt" 31 | self.map_dict_save_path = data_dir + "node_map_dic.pkl" 32 | self.self_loop = self_loop 33 | self.undirected = undirected 34 | self.walk_length = walk_length 35 | 36 | self.edges, self.nodes, self.node2id, self.id2node = self.get_edges_and_mapdict(self.edge_file_path, 37 | self_loop=self.self_loop, 38 | undirected=self.undirected) 39 | self.save_dict(self.node2id, self.map_dict_save_path) 40 | 41 | self.graph = self.build_graph(self.edges) 42 | 43 | print("total nodes number: %d" % self.graph.num_nodes()) 44 | print("total edges number: %d" % len(self.edges[0])) 45 | def get_edges_and_mapdict(self, file_path, self_loop=True, undirected=True): 46 | df_net = pd.read_csv(file_path, header=None, sep=" ", names=["src", "dst", "weight"]) 47 | 48 | nodes = sorted(set(df_net.src.to_list() + df_net.dst.to_list()))  # dedup first, then sort, so the id mapping is deterministic 49 | node2id = dict(zip(nodes, range(len(nodes)))) 50 | id2node = dict(zip(range(len(nodes)), nodes)) 51 | 52 | src = df_net.src.map(node2id).to_list() 53 | dst = df_net.dst.map(node2id).to_list() 54 | 55 | if undirected: 56 | tmp = copy.deepcopy(src) 57 | src.extend(dst) 58 | dst.extend(tmp) 59 | 60 | if self_loop:  # self-loops must use the remapped ids (0..n-1), not the raw node labels 61 | src.extend(list(range(len(nodes)))) 62 | dst.extend(list(range(len(nodes)))) 63 | 64 | assert max(node2id.values()) == len(nodes) - 1, "error reading net, quit" 65 | 66 | return (src, dst), nodes, node2id, id2node 67 | 68 | def build_graph(self, edges): 69 | start = time.time() 70 | G = dgl.graph((torch.tensor(edges[0]), torch.tensor(edges[1]))) 71 | t = time.time() - start 72 | print("Building DGLGraph in %.2fs" % t) 73 | return G 74 | 75 | def save_dict(self, map_dic, save_path): 76 | a_file = open(save_path, "wb") 77 | pickle.dump(map_dic, a_file) 78 | a_file.close() 79 | 80 | 81 | if __name__ == "__main__": 82 | # net, node2id, id2node, sm = ReadTxtNet(file_path="youtube", undirected=True) 83 | file_path = "D:/Learn_Project/graph_work/data/blog/" 84 | GraphSet = Build_Graph(file_path, walk_length=5, self_loop=True, undirected=True) 85 | 86 | # Walk_Sampler = DeepwalkSampler(G, seeds, walk_length) 87 | -------------------------------------------------------------------------------- /deepwalk/dataset.py: -------------------------------------------------------------------------------- 1 | import random 2 | import numpy as np 3 | 4 | import torch 5 | from torch.utils.data import Dataset, DataLoader 6 | 7 | import dgl 8 | 9 | 10 | class Collate_Func(object): 11 | def __init__(self, graph, config, walk_mode="random_walk"): 12 | self.walk_mode = config.walk_mode 13 | self.p = config.p 14 | self.q = config.q 15 | self.walk_length = config.walk_length 16 | self.half_win_size = config.win_size // 2 17 | self.walk_num_per_node = config.walk_num_per_node 18 | self.graph = graph 19 | self.neg_num = config.neg_num 20 | self.nodes = graph.nodes().tolist() 21 | 22 | def sample_walks(self, graph, seed_nodes, walk_length, walk_mode): 23 | # DeepwalkSampler(self.G, self.seeds[i], self.walk_length) 24 | if walk_mode == "random_walk": 25 | walks = dgl.sampling.random_walk(graph, seed_nodes, length=walk_length) 26 | elif walk_mode == "node2vec_random_walk": 27 | walks = dgl.sampling.node2vec_random_walk(graph, seed_nodes, 
self.p, self.q, length=walk_length) 28 | else: 29 | raise ValueError('walk mode should be defined explicit.') 30 | return walks 31 | 32 | def skip_gram_gen_pairs(self, walk, half_win_size=2): 33 | src, dst = list(), list() 34 | 35 | l = len(walk) 36 | # rnd = np.random.randint(1, half_win_size+1, dtype=np.int64, size=l) 37 | for i in range(l): 38 | real_win_size = half_win_size 39 | left = i - real_win_size 40 | if left < 0: 41 | left = 0 42 | right = i + real_win_size 43 | if right >= l: 44 | right = l - 1 45 | for j in range(left, right + 1): 46 | if walk[i] == walk[j]: 47 | continue 48 | src.append(walk[i]) 49 | dst.append(walk[j]) 50 | return src, dst 51 | 52 | def __call__(self, batch_nodes): 53 | batch_src, batch_dst = list(), list() 54 | 55 | walks_list = list() 56 | for i in range(self.walk_num_per_node): 57 | walks = self.sample_walks(self.graph, batch_nodes, self.walk_length, self.walk_mode) 58 | walks_list += walks[0].tolist() 59 | for walk in walks_list: 60 | src, dst = self.skip_gram_gen_pairs(walk, self.half_win_size) 61 | batch_src += src 62 | batch_dst += dst 63 | 64 | # shuffle pair 65 | batch_tmp = list(set(zip(batch_src, batch_dst))) 66 | random.shuffle(batch_tmp) 67 | batch_src, batch_dst = zip(*batch_tmp) 68 | 69 | batch_src = torch.from_numpy(np.array(batch_src)) 70 | batch_dst = torch.from_numpy(np.array(batch_dst)) 71 | return batch_src, batch_dst 72 | 73 | 74 | class NodesDataset(Dataset): 75 | def __init__(self, nodes): 76 | self.nodes = nodes 77 | 78 | def __len__(self): 79 | return len(self.nodes) 80 | 81 | def __getitem__(self, index): 82 | return self.nodes[index] 83 | 84 | 85 | class Word2VecWalkset(object): 86 | def __init__(self, graph, seed_nodes, walk_length): 87 | print() 88 | 89 | def __iter__(self, graph, seed_nodes, walk_length): 90 | # DeepwalkSampler(self.G, self.seeds[i], self.walk_length) 91 | walks = dgl.sampling.random_walk(graph, seed_nodes, length=walk_length) 92 | yield walks 93 | 94 | # self.w2v_model = Word2Vec(walks, sg=1, hs=1) 95 | def forward(self): 96 | print() 97 | 98 | 99 | if __name__ == "__main__": 100 | from build_graph import Build_Graph 101 | 102 | file_path = "../data/blog/" 103 | GraphSet = Build_Graph(file_path, undirected=True) 104 | graph = GraphSet.graph 105 | print(GraphSet.id2node[0], GraphSet.id2node[1]) 106 | print(random.sample(graph.nodes().tolist(), 5)) 107 | 108 | nodes_dataset = NodesDataset(graph.nodes()) 109 | 110 | 111 | class ConfigClass(object): 112 | def __init__(self, lr=0.05, gpu="0"): 113 | self.lr = 0.005 114 | self.gpu = "0" 115 | self.epochs = 32 116 | self.embed_dim = 64 117 | self.batch_size = 10 118 | self.walk_num_per_node = 6 119 | self.walk_length = 12 120 | self.win_size = 6 121 | self.neg_num = 5 122 | self.save_path = "../out/blog_deepwalk_ckpt" 123 | self.file_path = "../data/blog/" 124 | 125 | 126 | config = ConfigClass() 127 | pair_generate_func = Collate_Func(graph, config) 128 | 129 | pair_loader = DataLoader(nodes_dataset, batch_size=1, shuffle=True, num_workers=4, 130 | collate_fn=pair_generate_func) 131 | 132 | pair = set() 133 | for i, (batch_src, batch_dst) in enumerate(pair_loader): 134 | print(batch_src.shape) 135 | print(batch_dst.shape) 136 | for i, j in zip(batch_src.tolist(), batch_dst.tolist()): 137 | pair.add((i, j)) 138 | print(len(pair)) 139 | break 140 | -------------------------------------------------------------------------------- /deepwalk/main.py: -------------------------------------------------------------------------------- 1 | import time 2 | import os 3 | import 
numpy as np 4 | from tqdm import tqdm 5 | 6 | import torch 7 | from torch.utils.data import Dataset, DataLoader 8 | 9 | from model import SkipGramModel 10 | from build_graph import Build_Graph 11 | from dataset import NodesDataset, Collate_Func 12 | 13 | 14 | def main(config): 15 | print(torch.cuda.device_count(), torch.cuda.is_available()) 16 | os.environ["CUDA_VISIBLE_DEVICES"] = config.gpu 17 | # device = torch.device(config.gpu) 18 | torch.backends.cudnn.benchmark = True 19 | 20 | GraphSet = Build_Graph(config.file_path, walk_length=config.walk_length, self_loop=True, undirected=True) 21 | graph = GraphSet.graph 22 | 23 | model = SkipGramModel(graph.num_nodes(), embed_dim=config.embed_dim) 24 | model.cuda() 25 | optimizer = torch.optim.SparseAdam(model.parameters(), lr=float(config.lr)) 26 | 27 | nodes_dataset = NodesDataset(graph.nodes()) 28 | pair_generate_func = Collate_Func(graph, config) 29 | 30 | pair_loader = DataLoader(nodes_dataset, batch_size=config.batch_size, shuffle=True, num_workers=4, 31 | collate_fn=pair_generate_func) 32 | 33 | top_loss = float("inf")  # best (lowest) epoch loss seen so far; starting at +inf lets the first epoch save a checkpoint 34 | for epoch in range(config.epochs): 35 | start_time = time.time() 36 | model.train() 37 | 38 | loss_total = list() 39 | tqdm_bar = tqdm(pair_loader, desc="Training epoch {epoch}".format(epoch=epoch)) 40 | for i, (batch_src, batch_dst) in enumerate(tqdm_bar): 41 | batch_src = batch_src.cuda().long() 42 | batch_dst = batch_dst.cuda().long() 43 | 44 | batch_neg = np.random.randint(0, graph.num_nodes(), size=(batch_src.shape[0], config.neg_num)) 45 | batch_neg = torch.from_numpy(batch_neg).cuda().long()  # sample config.neg_num negatives per positive pair 46 | 47 | model.zero_grad() 48 | loss = model(batch_src, batch_dst, batch_neg) 49 | loss.backward() 50 | optimizer.step() 51 | loss_total.append(loss.detach().item()) 52 | 53 | if top_loss > np.mean(loss_total): 54 | top_loss = np.mean(loss_total) 55 | torch.save(model.state_dict(), config.save_path) 56 | print("Epoch: %03d; loss = %.4f saved path: %s" % (epoch, top_loss, config.save_path)) 57 | print("Epoch: %03d; loss = %.4f cost time %.4f" % (epoch, np.mean(loss_total), time.time() - start_time)) 58 | 59 | 60 | if __name__ == "__main__": 61 | class ConfigClass(): 62 | def __init__(self): 63 | self.lr = 0.005 64 | self.gpu = "0" 65 | self.epochs = 32 66 | self.embed_dim = 64 67 | self.batch_size = 10 68 | self.walk_mode = "random_walk"  # node2vec_random_walk 69 | self.p = 1.0 70 | self.q = 1.0 71 | self.walk_num_per_node = 6 72 | self.walk_length = 12 73 | self.win_size = 6 74 | self.neg_num = 5 75 | self.save_path = "../out/blog_deepwalk_ckpt" 76 | self.file_path = "../data/blog/" 77 | 78 | config = ConfigClass() 79 | main(config) 80 | 81 | # import argparse 82 | # from utils import load_config 83 | # 84 | # parser = argparse.ArgumentParser(description='bert classification') 85 | # parser.add_argument("-c", "--config", type=str, default="./config.yaml") 86 | # args = parser.parse_args() 87 | # config = load_config(args.config) 88 | 89 | -------------------------------------------------------------------------------- /deepwalk/model.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | import random 6 | import numpy as np 7 | 8 | 9 | # from gensim.models import Word2Vec 10 | 11 | 12 | class SkipGramModel(nn.Module): 13 | def __init__(self, num_nodes, embed_dim): 14 | super(SkipGramModel, self).__init__() 15 | self.num_nodes = num_nodes 16 | self.emb_dimension = embed_dim 17 | 18 | 
self.embed_nodes = nn.Embedding(self.num_nodes, self.emb_dimension, sparse=True) 19 | nn.init.xavier_uniform_(self.embed_nodes.weight) 20 | self.loss = nn.BCEWithLogitsLoss() 21 | 22 | def forward(self, src, pos, neg): 23 | embed_src = self.embed_nodes(src)  # (B, d) 24 | embed_pos = self.embed_nodes(pos)  # (B, d) 25 | embed_neg = self.embed_nodes(neg)  # (B, neg_num, d) 26 | # print(embed_src.shape, embed_pos.shape, embed_neg.shape) 27 | 28 | # pos_score = torch.sum(torch.matmul(embed_src, embed_pos.transpose(0, 1)), 1) 29 | # pos_score = -F.logsigmoid(pos_score) 30 | # 31 | # neg_score = torch.sum(torch.matmul(embed_src, embed_neg.transpose(1, 2)), (1, 2)) 32 | # neg_score = -F.logsigmoid(-neg_score) 33 | 34 | # per-pair positive scores: dot product of each src node with its own context node 35 | pos_logits = torch.sum(embed_src * embed_pos, dim=1)  # (B,) 36 | ones_label = torch.ones_like(pos_logits) 37 | pos_loss = self.loss(pos_logits, ones_label) 38 | 39 | # per-pair negative scores: each src node against its own neg_num sampled negatives 40 | neg_logits = torch.matmul(embed_neg, embed_src.unsqueeze(2)).squeeze(2)  # (B, neg_num) 41 | zeros_label = torch.zeros_like(neg_logits) 42 | neg_loss = self.loss(neg_logits, zeros_label) 43 | 44 | loss = (pos_loss + neg_loss) / 2 45 | return loss 46 | 47 | 48 | 49 | 50 | 51 | def skip_gram_model_test(): 52 | model = SkipGramModel(1000, embed_dim=32) 53 | model.cuda() 54 | 55 | src = np.random.randint(0, 100, size=10) 56 | src = torch.from_numpy(src).cuda().long() 57 | 58 | dst = np.random.randint(0, 100, size=10) 59 | dst = torch.from_numpy(dst).cuda().long() 60 | 61 | neg = np.random.randint(0, 100, size=(10, 5)) 62 | neg = torch.from_numpy(neg).cuda().long() 63 | 64 | print(src.shape, dst.shape, neg.shape) 65 | 66 | print(model(src, dst, neg)) 67 | 68 | 69 | 70 | 71 | if __name__ == "__main__": 72 | skip_gram_model_test() 73 | -------------------------------------------------------------------------------- /deepwalk/node_classfication.py: -------------------------------------------------------------------------------- 1 | import time 2 | import os 3 | import pickle 4 | import numpy as np 5 | import pandas as pd 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | from torch.utils.data import Dataset, DataLoader 11 | 12 | from sklearn import metrics 13 | from sklearn.model_selection import train_test_split 14 | 15 | 16 | def get_map_dict(dict_path): 17 | a_file = open(dict_path, "rb") 18 | map_dict = pickle.load(a_file) 19 | return map_dict 20 | 21 | 22 | def label_data(config, mode="train"): 23 | node_map_dict = get_map_dict(config.file_path + "node_map_dic.pkl") 24 | df = pd.read_csv(config.file_path + "blog-label.txt", header=None, sep="\t", names=["nodes", "label"]) 25 | df.nodes = df.nodes.map(node_map_dict) 26 | 27 | df_label = pd.crosstab(df.nodes, df.label).gt(0).astype(int) 28 | df_label = df_label.reset_index() 29 | 30 | train, test = train_test_split(df_label, test_size=0.1, random_state=123, shuffle=True) 31 | if mode == "train": 32 | node = train.nodes.to_list() 33 | label = train.drop('nodes', axis=1).values.tolist() 34 | else: 35 | node = test.nodes.to_list() 36 | label = test.drop('nodes', axis=1).values.tolist() 37 | 38 | return np.array(node), np.array(label) 39 | 40 | 41 | 42 | class NodesDataset(Dataset): 43 | def __init__(self, config, mode="train"): 44 | self.nodes, self.labels = label_data(config, mode=mode) 45 | 46 | def __len__(self): 47 | return len(self.nodes) 48 | 49 | def __getitem__(self, index): 50 | return self.nodes[index], self.labels[index] 51 | 52 | 53 | class 
NodeClassification(nn.Module): 54 | def __init__(self, emb, num_class=38): 55 | super(NodeClassification, self).__init__() 56 | self.emb = nn.Embedding.from_pretrained(emb, freeze=True) 57 | self.size = emb.shape[1] 58 | self.num_class = num_class 59 | self.fc = nn.Linear(self.size, self.num_class) 60 | 61 | def forward(self, node): 62 | # node = torch.tensor(node).to(torch.int64) 63 | node_emb = self.emb(node) 64 | prob = self.fc(node_emb) 65 | 66 | return prob 67 | 68 | 69 | def evaluate(test_nodes_loader, model): 70 | model.eval() 71 | 72 | predict_all = np.array([], dtype=int) 73 | labels_all = np.array([], dtype=int) 74 | 75 | for i, (batch_nodes, batch_labels) in enumerate(test_nodes_loader): 76 | batch_nodes = batch_nodes.cuda().long() 77 | batch_labels = batch_labels.cuda().float() 78 | 79 | logit = model(batch_nodes) 80 | probs = torch.sigmoid(logit) 81 | 82 | label = torch.max(batch_labels.data, 1)[1].cpu().numpy()  # argmax: only the top-1 label of the multi-label target is scored 83 | pred = torch.max(probs.data, 1)[1].cpu().numpy() 84 | # predic = torch.max(probs.data, 1)[1].cpu().numpy() 85 | print(pred.size, label.size) 86 | 87 | labels_all = np.append(labels_all, label) 88 | predict_all = np.append(predict_all, pred) 89 | print(labels_all.size, predict_all.size) 90 | f1_score = metrics.f1_score(labels_all, predict_all, average="macro") 91 | return f1_score 92 | 93 | 94 | def main(config): 95 | os.environ["CUDA_VISIBLE_DEVICES"] = config.gpu 96 | # device = torch.device(config.gpu) 97 | torch.backends.cudnn.benchmark = True 98 | 99 | param = torch.load(config.save_path) 100 | emb = param["embed_nodes.weight"] 101 | print(emb.shape) 102 | 103 | train_nodes_dataset = NodesDataset(config, "train") 104 | train_nodes_loader = DataLoader(train_nodes_dataset, batch_size=config.batch_size, shuffle=True, num_workers=4) 105 | test_nodes_dataset = NodesDataset(config, "test") 106 | test_nodes_loader = DataLoader(test_nodes_dataset, batch_size=config.batch_size, shuffle=False, num_workers=4) 107 | print("--", len(train_nodes_loader), len(test_nodes_loader)) 108 | 109 | model = NodeClassification(emb, num_class=config.num_class) 110 | model.cuda() 111 | optimizer = torch.optim.AdamW(model.parameters(), lr=float(config.lr), betas=(0.9, 0.999), eps=1e-08, 112 | weight_decay=0.01, amsgrad=False) 113 | loss_func = nn.BCEWithLogitsLoss() 114 | 115 | start_time = time.time() 116 | for epoch in range(config.epochs): 117 | loss_total = list() 118 | for i, (batch_nodes, batch_labels) in enumerate(train_nodes_loader): 119 | batch_nodes = batch_nodes.cuda().long() 120 | batch_labels = batch_labels.cuda().float() 121 | 122 | model.zero_grad() 123 | logit = model(batch_nodes) 124 | # BCEWithLogitsLoss applies the sigmoid internally, so feed it the raw logits 125 | loss = loss_func(logit, batch_labels) 126 | loss.backward() 127 | optimizer.step() 128 | 129 | loss_total.append(loss.detach().item()) 130 | print("Epoch: %03d; loss = %.4f cost time %.4f" % (epoch, np.mean(loss_total), time.time() - start_time)) 131 | f1 = evaluate(test_nodes_loader, model) 132 | print("Epoch: %03d; f1 = %.4f" % (epoch, f1)) 133 | 134 | 135 | if __name__ == "__main__": 136 | class ConfigClass(): 137 | def __init__(self): 138 | self.lr = 0.05 139 | self.gpu = "0" 140 | self.epochs = 32 141 | self.batch_size = 256 142 | self.num_class = 39 143 | self.save_path = "../out/blog_deepwalk_ckpt" 144 | self.file_path = "../data/blog/" 145 | 146 | 147 | config = ConfigClass() 148 | main(config) 149 | -------------------------------------------------------------------------------- /deepwalk/test/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/test/__init__.py -------------------------------------------------------------------------------- /deepwalk/test/test_hetergraph.py: -------------------------------------------------------------------------------- 1 | import dgl 2 | import torch 3 | 4 | 5 | g = dgl.heterograph({('user', 'plays', 'game'): (torch.tensor([0]), torch.tensor([1])), 6 | ('developer', 'develops', 'game'): (torch.tensor([1]), torch.tensor([2]))}) 7 | 8 | g.dstnodes('game') 9 | 10 | g.dstnodes['game'].data['h'] = torch.ones(3, 1) 11 | print(g.dstnodes['game'].data['h']) 12 | 13 | 14 | g = dgl.heterograph({('user', 'follows', 'user'): (torch.tensor([0]), torch.tensor([1])), 15 | ('developer', 'develops', 'game'): (torch.tensor([1]), torch.tensor([2]))}) 16 | 17 | 18 | g.dstnodes('developer') 19 | 20 | g.dstnodes['developer'].data['h'] = torch.ones(2, 1) 21 | print(g.dstnodes['developer'].data['h']) -------------------------------------------------------------------------------- /deepwalk/utils.py: -------------------------------------------------------------------------------- 1 | import yaml 2 | 3 | 4 | class AttrDict(dict): 5 | """Attr dict: make value private 6 | """ 7 | 8 | def __init__(self, d): 9 | self.dict = d 10 | 11 | def __getattr__(self, attr): 12 | value = self.dict[attr] 13 | if isinstance(value, dict): 14 | return AttrDict(value) 15 | else: 16 | return value 17 | 18 | def __str__(self): 19 | return str(self.dict) 20 | 21 | 22 | def load_config(config_file): 23 | """Load config file""" 24 | with open(config_file) as f: 25 | if hasattr(yaml, 'FullLoader'): 26 | config = yaml.load(f, Loader=yaml.FullLoader) 27 | else: 28 | config = yaml.load(f) 29 | print(config) 30 | return AttrDict(config) 31 | 32 | 33 | def skip_gram_gen_pairs(walk, half_win_size=2): 34 | src, dst = list(), list() 35 | 36 | l = len(walk) 37 | # rnd = np.random.randint(1, half_win_size+1, dtype=np.int64, size=l) 38 | for i in range(l): 39 | real_win_size = half_win_size 40 | left = i - real_win_size 41 | if left < 0: 42 | left = 0 43 | right = i + real_win_size 44 | if right >= l: 45 | right = l - 1 46 | for j in range(left, right + 1): 47 | if walk[i] == walk[j]: 48 | continue 49 | src.append(walk[i]) 50 | dst.append(walk[j]) 51 | return src, dst 52 | 53 | 54 | if __name__ == "__main__": 55 | import argparse 56 | 57 | parser = argparse.ArgumentParser(description='text classification') 58 | parser.add_argument("-c", "--config", type=str, default="./config.yaml") 59 | args = parser.parse_args() 60 | config = load_config(args.config) 61 | -------------------------------------------------------------------------------- /gat/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/gat/__init__.py -------------------------------------------------------------------------------- /gat/__pycache__/model.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/gat/__pycache__/model.cpython-38.pyc -------------------------------------------------------------------------------- /gat/model.py: -------------------------------------------------------------------------------- 1 | import 
torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class GATLayer(nn.Module): 7 | def __init__(self, g, in_feats, out_feats, 8 | feat_drop=0.6, attn_drop=0.6, 9 | negative_slope=0.2, residual=False, activation=None): 10 | ''' A single GAT layer. 11 | You can tune the dropout rates and negative_slope to get better results 12 | on the Cora dataset, since Cora is small and overfits easily. 13 | ''' 14 | super(GATLayer, self).__init__() 15 | self.g = g 16 | self.dropout_feat = nn.Dropout(feat_drop) 17 | self.fc = nn.Linear(in_feats, out_feats, bias=False) 18 | self.dropout_attn = nn.Dropout(attn_drop) 19 | self.attention_func = nn.Linear(2 * out_feats, 1, bias=False) 20 | self.activation = activation 21 | self.leaky_relu = nn.LeakyReLU(negative_slope) 22 | 23 | def edge_attention(self, edges): 24 | concat_z = torch.cat([edges.src['z'], edges.dst['z']], dim=1) 25 | src_e = self.attention_func(concat_z) 26 | src_e = self.leaky_relu(src_e) 27 | return {'e': src_e} 28 | 29 | def message_func(self, edges): 30 | return {'z': edges.src['z'], 'e': edges.data['e']} 31 | 32 | def reduce_func(self, nodes): 33 | alpha = F.softmax(nodes.mailbox['e'], dim=1) 34 | alpha = self.dropout_attn(alpha)  # add attention dropout 35 | h = torch.sum(alpha * nodes.mailbox['z'], dim=1) 36 | return {'h': h} 37 | 38 | def forward(self, h): 39 | h = self.dropout_feat(h)  # add feat dropout 40 | z = self.fc(h) 41 | self.g.ndata['z'] = z 42 | self.g.apply_edges(self.edge_attention) 43 | self.g.update_all(self.message_func, self.reduce_func) 44 | return self.g.ndata.pop('h') 45 | 46 | 47 | class MultiHeadGATLayer(nn.Module): 48 | def __init__(self, g, in_dim, out_dim, num_heads, merge='cat'): 49 | super(MultiHeadGATLayer, self).__init__() 50 | self.heads = nn.ModuleList() 51 | for i in range(num_heads): 52 | self.heads.append(GATLayer(g, in_dim, out_dim)) 53 | self.merge = merge 54 | 55 | def forward(self, h): 56 | head_outs = [attn_head(h) for attn_head in self.heads] 57 | if self.merge == 'cat': 58 | return torch.cat(head_outs, dim=1) 59 | else: 60 | return torch.mean(torch.stack(head_outs), dim=0)  # average over heads; without dim=0 this would reduce to a scalar 61 | 62 | 63 | class GATModel(nn.Module): 64 | def __init__(self, g, in_dim, hidden_dim, out_dim, num_heads): 65 | super(GATModel, self).__init__() 66 | self.in_dim = in_dim 67 | self.hidden_dim = hidden_dim 68 | self.out_dim = out_dim 69 | self.num_heads = num_heads 70 | 71 | self.layer1 = MultiHeadGATLayer(g, in_dim, hidden_dim, num_heads) 72 | # input dimension: hidden_dim * num_heads 73 | # output dimension: out_dim * 1 (layer1 concatenates num_heads heads, so this layer uses a single head) 74 | self.layer2 = MultiHeadGATLayer(g, hidden_dim * num_heads, out_dim, 1) 75 | 76 | def forward(self, h): 77 | h = self.layer1(h) 78 | h = F.elu(h) 79 | h = self.layer2(h) 80 | return h 81 | -------------------------------------------------------------------------------- /gat/train.py: -------------------------------------------------------------------------------- 1 | import time 2 | import argparse 3 | import numpy as np 4 | 5 | import torch 6 | import torch.nn.functional as F 7 | 8 | import dgl 9 | from dgl import DGLGraph 10 | from dgl.data import CoraGraphDataset 11 | 12 | from model import GATModel 13 | 14 | 15 | def load_cora_data(args): 16 | if args.dataset == 'cora': 17 | data = CoraGraphDataset() 18 | else: 19 | data = None 20 | raise NotImplementedError 21 | 22 | g = data[0] 23 | features = g.ndata['feat'] 24 | labels = g.ndata['label'] 25 | train_mask = g.ndata['train_mask'] 26 | val_mask = g.ndata['val_mask'] 27 | test_mask = 
g.ndata['test_mask'] 28 | # num_feats = features.shape[1] 29 | # n_classes = data.num_labels 30 | # n_edges = data.graph.number_of_edges() 31 | return g, features, labels, train_mask, val_mask, test_mask 32 | 33 | 34 | def accuracy(logits, labels): 35 | _, indices = torch.max(logits, dim=1) 36 | correct = torch.sum(indices == labels) 37 | return correct.item() * 1.0 / len(labels) 38 | 39 | 40 | def evaluate(model, features, labels, mask): 41 | model.eval() 42 | with torch.no_grad(): 43 | logits = model(features) 44 | logits = logits[mask] 45 | labels = labels[mask] 46 | return accuracy(logits, labels) 47 | 48 | 49 | def main(args): 50 | g, features, labels, train_mask, val_mask, test_mask = load_cora_data(args) 51 | 52 | if args.add_self_loop: 53 | # add self loop 54 | g = dgl.remove_self_loop(g) 55 | g = dgl.add_self_loop(g) 56 | 57 | model = GATModel(g, 58 | in_dim=features.size()[1], 59 | hidden_dim=8, 60 | out_dim=7, 61 | num_heads=8) 62 | # print(model) 63 | 64 | optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay) 65 | 66 | top_val_acc, top_test_acc = 0, 0 67 | cost_time = [] 68 | for epoch in range(args.epochs): 69 | t0 = time.time() 70 | 71 | logits = model(features) 72 | logp = F.log_softmax(logits, 1) 73 | loss = F.nll_loss(logp[train_mask], labels[train_mask]) 74 | 75 | optimizer.zero_grad() 76 | loss.backward() 77 | optimizer.step() 78 | 79 | val_acc = evaluate(model, features, labels, val_mask) 80 | 81 | print("Epoch {:03d} | val acc {:.4f}| Loss {:.4f} | Time(s) {:.8f}".format( 82 | epoch, val_acc, loss.item(), time.time() - t0)) 83 | 84 | if top_val_acc <= val_acc: 85 | top_val_acc = val_acc 86 | acc = evaluate(model, features, labels, test_mask) 87 | top_test_acc = max(top_test_acc, acc) 88 | print("Test Accuracy {:.4f}".format(acc)) 89 | print(f"Top Test Acc: {top_test_acc}") 90 | 91 | if __name__ == '__main__': 92 | parser = argparse.ArgumentParser(description='GAT') 93 | parser.add_argument("--dataset", type=str, default='cora', 94 | help="which dataset to use.") 95 | parser.add_argument("--gpu", type=int, default=-1, 96 | help="which GPU to use. 
Set -1 to use CPU.") 97 | parser.add_argument("--epochs", type=int, default=200, 98 | help="number of training epochs") 99 | parser.add_argument("--num-heads", type=int, default=8, 100 | help="number of hidden attention heads") 101 | parser.add_argument("--num-out-heads", type=int, default=1, 102 | help="number of output attention heads") 103 | parser.add_argument("--num-layers", type=int, default=2, 104 | help="number of hidden layers") 105 | parser.add_argument("--num-hidden", type=int, default=8, 106 | help="number of hidden units") 107 | parser.add_argument("--residual", action="store_true", default=False, 108 | help="use residual connection") 109 | parser.add_argument("--add_self_loop", type=bool, default=True, 110 | help="add self loop") 111 | parser.add_argument("--in-drop", type=float, default=.6, 112 | help="input feature dropout") 113 | parser.add_argument("--attn-drop", type=float, default=.6, 114 | help="attention dropout") 115 | parser.add_argument("--lr", type=float, default=0.005, 116 | help="learning rate") 117 | parser.add_argument('--weight-decay', type=float, default=5e-4, 118 | help="weight decay") 119 | parser.add_argument('--negative-slope', type=float, default=0.2, 120 | help="the negative slope of leaky relu") 121 | args = parser.parse_args() 122 | print(args) 123 | 124 | main(args) 125 | -------------------------------------------------------------------------------- /gcn/build_graph.py: -------------------------------------------------------------------------------- 1 | import math 2 | import random 3 | import scipy.sparse as sp 4 | from scipy.sparse import coo_matrix, csr_matrix 5 | import numpy as np 6 | 7 | import dgl 8 | import torch 9 | 10 | from utils import preprocess_adj 11 | from dgl.data import CoraGraphDataset 12 | 13 | 14 | class GraphBuild(object): 15 | def __init__(self): 16 | # self.graph = self.build_graph_test() 17 | self.graph = self.build_graph_cora() 18 | self.adj = self.get_adj(self.graph) 19 | self.features = self.init_node_feat(self.graph) 20 | 21 | def build_graph_test(self): 22 | """a demo graph: just for graph test 23 | """ 24 | src_nodes = torch.tensor([0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 6]) 25 | dst_nodes = torch.tensor([1, 2, 0, 2, 0, 1, 3, 4, 5, 6, 2, 3, 3, 3]) 26 | graph = dgl.graph((src_nodes, dst_nodes)) 27 | # edges weights if edges has else 1 28 | graph.edata["w"] = torch.ones(graph.num_edges()) 29 | return graph 30 | 31 | def build_graph_cora(self): 32 | # Default: ~/.dgl/ 33 | data = CoraGraphDataset() 34 | graph = data[0] 35 | 36 | return graph 37 | 38 | def convert_symmetric(self, X, sparse=True): 39 | # add symmetric edges 40 | if sparse: 41 | X += X.T - sp.diags(X.diagonal()) 42 | else: 43 | X += X.T - np.diag(X.diagonal()) 44 | return X 45 | 46 | def add_self_loop(self, graph): 47 | # add self loop 48 | graph = dgl.remove_self_loop(graph) 49 | graph = dgl.add_self_loop(graph) 50 | return graph 51 | 52 | def get_adj(self, graph): 53 | graph = self.add_self_loop(graph) 54 | # edges weights if edges has weights else 1 55 | graph.edata["w"] = torch.ones(graph.num_edges()) 56 | adj = coo_matrix((graph.edata["w"], (graph.edges()[0], graph.edges()[1])), 57 | shape=(graph.num_nodes(), graph.num_nodes())) 58 | 59 | # add symmetric edges 60 | adj = self.convert_symmetric(adj, sparse=True) 61 | # adj normalize and transform matrix to torch tensor type 62 | adj = preprocess_adj(adj, is_sparse=True) 63 | 64 | return adj 65 | 66 | def init_node_feat(self, graph): 67 | # init graph node features 68 | self.nfeat_dim = 
graph.number_of_nodes() 69 | 70 | row = list(range(self.nfeat_dim)) 71 | col = list(range(self.nfeat_dim)) 72 | indices = torch.from_numpy( 73 | np.vstack((row, col)).astype(np.int64)) 74 | values = torch.ones(self.nfeat_dim) 75 | 76 | features = torch.sparse.FloatTensor(indices, values, 77 | (self.nfeat_dim, self.nfeat_dim)) 78 | return features 79 | 80 | 81 | if __name__ == "__main__": 82 | GraphSet = GraphBuild() 83 | graph = GraphSet.graph 84 | graph = GraphSet.add_self_loop(graph) 85 | print(graph) 86 | # print(graph.ndata['feat'].shape) 87 | features = GraphSet.init_node_feat(graph) # (num_nodes, num_nodes) 88 | adj = GraphSet.get_adj(graph) 89 | print(features.shape, adj.shape) 90 | print(adj.shape) # (10556, 10556) 91 | print(graph.edges()) 92 | -------------------------------------------------------------------------------- /gcn/gcn_model.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | import torch.nn as nn 5 | from torch.nn.parameter import Parameter 6 | from torch.nn.modules.module import Module 7 | 8 | 9 | class GraphConvolution(Module): 10 | """ 11 | Simple GCN layer, similar to https://arxiv.org/abs/1609.02907 12 | """ 13 | 14 | def __init__(self, in_features_dim, out_features_dim, activation=None, bias=True): 15 | super(GraphConvolution, self).__init__() 16 | self.in_features = in_features_dim 17 | self.out_features = out_features_dim 18 | self.activation = activation 19 | self.weight = Parameter(torch.FloatTensor(in_features_dim, out_features_dim)) 20 | if bias: 21 | self.bias = Parameter(torch.FloatTensor(out_features_dim)) 22 | else: 23 | self.register_parameter('bias', None) 24 | self.reset_parameters() 25 | 26 | def reset_parameters(self): 27 | stdv = 1. 
/ math.sqrt(self.weight.size(1)) 28 | # self.weight.data.uniform_(-stdv, stdv) 29 | nn.init.xavier_uniform_(self.weight) 30 | if self.bias is not None: 31 | # self.bias.data.uniform_(-stdv, stdv) 32 | nn.init.zeros_(self.bias) 33 | 34 | def forward(self, infeatn, adj): 35 | ''' 36 | infeatn: input node features (H) 37 | adj: normalized adjacency matrix (A) 38 | ''' 39 | support = torch.spmm(infeatn, self.weight)  # H*W  # (num_nodes, in_feat_dim) * (in_feat_dim, out_dim) 40 | output = torch.spmm(adj, support)  # A*H*W  # (num_nodes, num_nodes) * (num_nodes, out_dim) 41 | if self.bias is not None: 42 | output = output + self.bias 43 | 44 | if self.activation is not None: 45 | output = self.activation(output) 46 | 47 | return output 48 | 49 | def __repr__(self): 50 | return self.__class__.__name__ + ' (' \ 51 | + str(self.in_features) + ' -> ' \ 52 | + str(self.out_features) + ')' 53 | 54 | 55 | class GCN(Module): 56 | def __init__(self, nfeat, nhid, nclass, n_layers, activation, dropout): 57 | super(GCN, self).__init__() 58 | self.layers = nn.ModuleList() 59 | # input layer 60 | self.layers.append(GraphConvolution(nfeat, nhid, activation=activation)) 61 | # hidden layers 62 | for i in range(n_layers - 1): 63 | self.layers.append(GraphConvolution(nhid, nhid, activation=activation)) 64 | # output layer 65 | self.layers.append(GraphConvolution(nhid, nclass)) 66 | self.dropout = torch.nn.Dropout(p=dropout) 67 | 68 | def forward(self, x, adj): 69 | 70 | h = x 71 | for i, layer in enumerate(self.layers): 72 | if i != 0: 73 | h = self.dropout(h) 74 | h = layer(h, adj) 75 | return h 76 | -------------------------------------------------------------------------------- /gcn/test_api/nx_to_scipy_sparse_matrix_test.py: -------------------------------------------------------------------------------- 1 | from scipy.sparse import coo_matrix, csr_matrix 2 | import numpy as np 3 | import networkx as nx 4 | import scipy as sp 5 | 6 | def nx_test_to_scipy_sparse_matrix1(): 7 | G = nx.Graph([(1, 1)]) 8 | A = nx.to_scipy_sparse_matrix(G) 9 | print(A.todense()) 10 | 11 | A.setdiag(A.diagonal() * 2) 12 | print(A.todense()) 13 | 14 | def nx_test_to_scipy_sparse_matrix2(): 15 | G = nx.MultiDiGraph() 16 | G.add_edge(0, 1, weight=2) 17 | G.add_edge(1, 0) 18 | G.add_edge(2, 2, weight=3) 19 | G.add_edge(2, 2) 20 | S = nx.to_scipy_sparse_matrix(G, nodelist=[0, 1, 2]) 21 | print(S.shape) 22 | print(S) 23 | print(S.todense()) 24 | 25 | def test_sp_csr_matrix(): 26 | indptr = np.array([0, 2, 3, 6]) 27 | indices = np.array([0, 2, 2, 0, 1, 2]) 28 | data = np.array([1, 2, 3, 4, 5, 6]) 29 | csr = csr_matrix((data, indices, indptr), shape=(3, 3)) 30 | print(csr) 31 | print(csr.toarray()) 32 | 33 | 34 | def test_sp_coo_matrix(): 35 | row = np.array([0, 0, 1, 3, 1, 0, 0]) 36 | col = np.array([0, 2, 1, 3, 1, 0, 0]) 37 | data = np.array([1, 1, 1, 1, 1, 1, 1]) 38 | coo = coo_matrix((data, (row, col)), shape=(4, 4)) 39 | print(coo) 40 | print(coo.toarray()) 41 | 42 | 43 | test_sp_coo_matrix() -------------------------------------------------------------------------------- /gcn/train.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import time 3 | import argparse 4 | import numpy as np 5 | 6 | import torch 7 | import torch.nn.functional as F 8 | from torch.utils.data import Dataset, DataLoader 9 | 10 | import dgl 11 | 12 | from build_graph import GraphBuild 13 | from gcn_model import GCN 14 | 15 | 16 | def evaluate(model, features, adj, labels, mask): 17 | model.eval() 18 | with torch.no_grad(): 19 | logits = 
model(features, adj) 20 | logits = logits[mask] 21 | labels = labels[mask] 22 | _, indices = torch.max(logits, dim=1) 23 | correct = torch.sum(indices == labels) 24 | return correct.item() * 1.0 / len(labels) 25 | 26 | 27 | def main(args): 28 | if args.gpu < 0: 29 | cuda = False 30 | else: 31 | cuda = True 32 | GraphSet = GraphBuild() 33 | 34 | graph = GraphSet.build_graph_cora() 35 | labels = graph.ndata['label'] 36 | train_mask = graph.ndata['train_mask'] 37 | val_mask = graph.ndata['val_mask'] 38 | test_mask = graph.ndata['test_mask'] 39 | print(graph.nodes()) 40 | 41 | features = graph.ndata['feat'] # shape [2708, 1433] 42 | # features = GraphSet.init_node_feat(graph) # (num_nodes, num_nodes) Test accuracy 66.20% 43 | adj = GraphSet.get_adj(graph) 44 | print(features.shape, adj.shape) # [2708, 1433], [2708, 2708] 45 | # sys.exit() 46 | 47 | model = GCN(nfeat=features.shape[1], nhid=args.dim, nclass=7, n_layers=args.n_layers, activation=F.relu, 48 | dropout=args.dropout) 49 | if cuda: 50 | model.cuda() 51 | 52 | loss_fcn = torch.nn.CrossEntropyLoss() 53 | # use optimizer 54 | optimizer = torch.optim.Adam(model.parameters(), 55 | lr=args.lr, 56 | weight_decay=args.weight_decay) 57 | # scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=0.96) # lr decay 58 | 59 | dur = list() 60 | for epoch in range(args.epochs): 61 | model.train() 62 | if epoch >= 3: 63 | t0 = time.time() 64 | # forward 65 | logits = model(features, adj) 66 | loss = loss_fcn(logits[train_mask], labels[train_mask]) 67 | 68 | optimizer.zero_grad() 69 | loss.backward() 70 | optimizer.step() 71 | 72 | if epoch >= 3: 73 | dur.append(time.time() - t0) 74 | acc = evaluate(model, features, adj, labels, val_mask) 75 | print("Epoch {:05d} | Time(s) {:.4f} | Loss {:.4f} | Accuracy {:.4f} | " 76 | "ETputs(KTEPS) {:.2f}".format(epoch, np.mean(dur), loss.item(), 77 | acc, graph.number_of_edges() / np.mean(dur) / 1000)) 78 | # scheduler.step() 79 | 80 | print() 81 | acc = evaluate(model, features, adj, labels, test_mask) 82 | print("Test accuracy {:.2%}".format(acc)) # Test accuracy ~0.806 (0.793-0.819) (paper: 0.815) 83 | 84 | 85 | if __name__ == "__main__": 86 | parser = argparse.ArgumentParser(description='Graph Parameters Set.') 87 | parser.add_argument('--gpu', metavar='N', type=int, default=-1, 88 | help='an integer for the accumulator') 89 | parser.add_argument('--batch_size', metavar='N', type=int, default=128, 90 | help='an integer for the accumulator') 91 | parser.add_argument('--num_workers', metavar='N', type=int, default=6, 92 | help='an integer for the accumulator') 93 | parser.add_argument("--epochs", type=int, default=200, 94 | help="number of training epochs") 95 | parser.add_argument("--n_layers", type=int, default=1, 96 | help="number of training epochs") 97 | parser.add_argument('--dim', metavar='N', type=int, default=128, 98 | help='an integer for the accumulator') 99 | parser.add_argument("--lr", type=float, default=1e-2, 100 | help="learning rate") 101 | parser.add_argument("--weight-decay", type=float, default=5e-4, 102 | help="Weight for L2 loss") 103 | parser.add_argument("--dropout", type=float, default=0.1, 104 | help="dropout probability") 105 | 106 | args = parser.parse_args() 107 | main(args) 108 | -------------------------------------------------------------------------------- /gcn/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import scipy.sparse as sp 4 | 5 | 6 | def preprocess_adj(adj, 
is_sparse=False): 7 | """Preprocessing of adjacency matrix for simple pygGCN model and conversion to 8 | tuple representation.""" 9 | adj_normalized = normalize_adj(adj + sp.eye(adj.shape[0])) 10 | if is_sparse: 11 | adj_normalized = sparse_mx_to_torch_sparse_tensor(adj_normalized) 12 | return adj_normalized 13 | else: 14 | return torch.from_numpy(adj_normalized.A).float() 15 | 16 | 17 | def sparse_mx_to_torch_sparse_tensor(sparse_mx): 18 | """Convert a scipy sparse matrix to a torch sparse tensor.""" 19 | sparse_mx = sparse_mx.tocoo().astype(np.float32) 20 | indices = torch.from_numpy( 21 | np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64)) 22 | values = torch.from_numpy(sparse_mx.data) 23 | shape = torch.Size(sparse_mx.shape) 24 | return torch.sparse.FloatTensor(indices, values, shape) 25 | 26 | 27 | def normalize_adj(adj): 28 | """Symmetrically normalize adjacency matrix.""" 29 | adj = sp.coo_matrix(adj) 30 | # print(f"sparse adj: {adj}") 31 | rowsum = np.array(adj.sum(1)) 32 | # print(f"rowsum: {rowsum.shape}") 33 | d_inv_sqrt = np.power(rowsum, -0.5).flatten() 34 | d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0. 35 | d_mat_inv_sqrt = sp.diags(d_inv_sqrt) 36 | # print(d_mat_inv_sqrt) 37 | # D^(-1/2)AD^(-1/2) 38 | return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo() 39 | # return d_mat_inv_sqrt.dot(adj).transpose().dot(d_mat_inv_sqrt).tocoo() 40 | -------------------------------------------------------------------------------- /graphsage/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/__init__.py -------------------------------------------------------------------------------- /graphsage/link_prediction/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/link_prediction/__init__.py -------------------------------------------------------------------------------- /graphsage/link_prediction/__pycache__/dataloader.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/link_prediction/__pycache__/dataloader.cpython-38.pyc -------------------------------------------------------------------------------- /graphsage/link_prediction/__pycache__/model.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/link_prediction/__pycache__/model.cpython-38.pyc -------------------------------------------------------------------------------- /graphsage/link_prediction/dataloader.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import argparse 3 | import torch 4 | from torch.utils.data import IterableDataset, Dataset, DataLoader 5 | 6 | import dgl 7 | from dgl.data import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset, CoraGraphDataset 8 | 9 | import random 10 | from loguru import logger 11 | random.seed(123) 12 | 13 | 14 | class NodesSet(Dataset): 15 | def __init__(self, g, neg_num=1): 16 | # only load masked node for training/testing 17 | self.g = g 18 | self.nodes = g.nodes().tolist() 19 | self.neg_num = neg_num # wait to 
be completed: for now a single negative is drawn per head 20 | 21 | def __len__(self): 22 | return len(self.nodes) 23 | 24 | def __getitem__(self, index): 25 | heads = self.nodes[index] 26 | pos_nodes = dgl.sampling.random_walk(self.g, 27 | heads, 28 | length=1)[0][:, 1].tolist()[0] 29 | 30 | neg_nodes = random.sample(self.nodes, k=self.neg_num)[0] 31 | # logger.info(f"heads: {heads}") 32 | # logger.info(f"pos_nodes: {pos_nodes}") 33 | # logger.info(f"neg_nodes: {neg_nodes}") 34 | 35 | return heads, pos_nodes, neg_nodes 36 | 37 | 38 | class NodesGraphCollactor(object): 39 | """ 40 | Select the neighbors of heads/tails/neg_tails for aggregation. 41 | """ 42 | 43 | def __init__(self, g, neighbors_every_layer=[5, 1]): 44 | self.g = g 45 | self.nodes = g.nodes().tolist() 46 | self.neighbors_every_layer = neighbors_every_layer 47 | 48 | 49 | def __call__(self, batch): 50 | # logger.info(f"batch: {batch}") 51 | # pos_nodes, neg_nodes = self.sample_pos_neg_nodes(batch) 52 | # heads: [2569, 741] 53 | # pos_nodes: tensor([2268, 1423]) 54 | # neg_nodes: [[1827, 1051, 1862, 477, 1595], [1907, 634, 88, 495, 2697]] 55 | heads = [b[0] for b in batch] 56 | tails = [b[1] for b in batch] 57 | neg_tails = [b[2] for b in batch] 58 | 59 | heads, tails, neg_tails = torch.tensor(heads), torch.tensor(tails), torch.tensor(neg_tails) 60 | # logger.info(heads, tails, neg_tails) 61 | pos_graph, neg_graph, blocks, all_seeds = self.sample_from_item_pairs(heads, tails, neg_tails) 62 | 63 | return pos_graph, neg_graph, blocks, set(all_seeds) 64 | 65 | def sample_from_item_pairs(self, heads, tails, neg_tails): 66 | # Create a graph with positive connections only and another graph with negative 67 | # connections only. 68 | pos_graph = dgl.graph( 69 | (heads, tails), 70 | num_nodes=self.g.number_of_nodes()) 71 | neg_graph = dgl.graph( 72 | (heads, neg_tails), 73 | num_nodes=self.g.number_of_nodes()) 74 | pos_graph, neg_graph = dgl.compact_graphs([pos_graph, neg_graph]) 75 | seeds = pos_graph.ndata[dgl.NID] 76 | 77 | blocks, all_seeds = self.sample_blocks(seeds, heads, tails, neg_tails) 78 | return pos_graph, neg_graph, blocks, all_seeds 79 | 80 | def sample_blocks(self, seeds, heads=None, tails=None, neg_tails=None): 81 | blocks, all_seeds = [], [] 82 | for n_neighbors in self.neighbors_every_layer: 83 | frontier = dgl.sampling.sample_neighbors( 84 | self.g, 85 | seeds, 86 | fanout=n_neighbors, 87 | edge_dir='in') 88 | if heads is not None: 89 | eids = frontier.edge_ids(torch.cat([heads, heads]), torch.cat([tails, neg_tails]), return_uv=True)[2] 90 | if len(eids) > 0: 91 | old_frontier = frontier 92 | frontier = dgl.remove_edges(old_frontier, eids) 93 | block = self.compact_and_copy(frontier, seeds) 94 | seeds = block.srcdata[dgl.NID]  # the src nodes of this layer; they become the seeds for the next hop 95 | # logger.info(f"seeds: {seeds}") 96 | all_seeds += seeds.tolist() 97 | blocks.insert(0, block) 98 | return blocks, all_seeds 99 | 100 | def compact_and_copy(self, frontier, seeds): 101 | # compress the frontier together with the dst-node seeds into a block 102 | # the seeds become the block's output nodes; every other node is an input node 103 | block = dgl.to_block(frontier, seeds) 104 | for col, data in frontier.edata.items(): 105 | if col == dgl.EID: 106 | continue 107 | block.edata[col] = data[block.edata[dgl.EID]] 108 | return block 109 | 110 | 111 | def build_cora_dataset(add_symmetric_edges=True, add_self_loop=True): 112 | dataset = CoraGraphDataset() 113 | graph = dataset[0] 114 | 115 | train_mask = graph.ndata['train_mask'] 116 | val_mask = graph.ndata['val_mask'] 117 | test_mask = graph.ndata['test_mask'] 118 | labels = graph.ndata['label'] 119 | feat = graph.ndata['feat'] 
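# Cora's citation edges are directed; the block below adds each edge's reverse so the graph becomes symmetric before self-loops are (re-)added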
121 | if add_symmetric_edges: 122 | edges = graph.edges() 123 | graph.add_edges(edges[1], edges[0]) 124 | 125 | graph = dgl.remove_self_loop(graph) 126 | if add_self_loop: 127 | graph = dgl.add_self_loop(graph) 128 | return graph 129 | 130 | 131 | if __name__ == "__main__": 132 | parser = argparse.ArgumentParser(description="parameter set") 133 | parser.add_argument('--dataset', type=str, default='aifb') 134 | 135 | args = parser.parse_args() 136 | graph = build_cora_dataset() 137 | train_mask = graph.ndata['train_mask'] 138 | 139 | batch_sampler = NodesSet(graph)  # NodesSet takes (g, neg_num); the mask is not an argument 140 | collator = NodesGraphCollactor(graph, neighbors_every_layer=[5, 2]) 141 | dataloader = DataLoader( 142 | batch_sampler, 143 | batch_size=2, 144 | shuffle=True, 145 | num_workers=1, 146 | collate_fn=collator 147 | ) 148 | # for step, (input_nodes, pos_graph, neg_graph, blocks) in enumerate(dataloader): 149 | for step, (pos_graph, neg_graph, blocks, all_seeds) in enumerate(dataloader): 150 | logger.info(f"---------step: {step}") 151 | logger.info(f"pos_graph: {pos_graph}") 152 | logger.info(f"neg_graph: {neg_graph}") 153 | logger.info(f"blocks: {blocks}") 154 | logger.info(f"all_seeds: {all_seeds}") 155 | 156 | break 157 | -------------------------------------------------------------------------------- /graphsage/link_prediction/model.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torch.nn.parameter import Parameter 5 | 6 | import dgl 7 | import dgl.nn.pytorch as dglnn 8 | import dgl.function as fn 9 | 10 | 11 | class WeightedSAGEConv(nn.Module): 12 | def __init__(self, input_dims, output_dims, act=F.relu, dropout=0.5, bias=True): 13 | super().__init__() 14 | 15 | self.act = act 16 | self.dropout = nn.Dropout(dropout) 17 | self.Q = nn.Linear(input_dims, output_dims) 18 | self.W = nn.Linear(input_dims + output_dims, output_dims) 19 | if bias: 20 | self.bias = Parameter(torch.FloatTensor(output_dims)) 21 | else: 22 | self.register_parameter('bias', None) 23 | # self.dropout = nn.Dropout(dropout) 24 | self.reset_parameters() 25 | 26 | def reset_parameters(self): 27 | gain = nn.init.calculate_gain('relu') 28 | nn.init.xavier_uniform_(self.Q.weight, gain=gain) 29 | nn.init.xavier_uniform_(self.W.weight, gain=gain) 30 | nn.init.constant_(self.Q.bias, 0) 31 | nn.init.constant_(self.W.bias, 0) 32 | if self.bias is not None: 33 | nn.init.zeros_(self.bias) 34 | 35 | def forward(self, g, h, weights=None): 36 | """ 37 | g : graph 38 | h : node features 39 | weights : scalar edge weights 40 | """ 41 | h_src, h_dst = h 42 | with g.local_scope(): 43 | if weights is not None:  # a tensor's truth value is ambiguous, so test against None explicitly 44 | g.srcdata['n'] = self.act(self.Q(self.dropout(h_src))) 45 | g.edata['w'] = weights.float() 46 | g.update_all(fn.u_mul_e('n', 'w', 'm'), fn.sum('m', 'n')) 47 | g.update_all(fn.copy_e('w', 'm'), fn.sum('m', 'ws')) 48 | n = g.dstdata['n'] 49 | ws = g.dstdata['ws'].unsqueeze(1).clamp(min=1) 50 | z = self.act(self.W(self.dropout(torch.cat([n / ws, h_dst], 1)))) 51 | z_norm = z.norm(2, 1, keepdim=True) 52 | z_norm = torch.where(z_norm == 0, torch.tensor(1.).to(z_norm), z_norm) 53 | z = z / z_norm 54 | else: 55 | g.srcdata['n'] = self.Q(h_src) 56 | g.update_all(fn.copy_src('n', 'm'), fn.mean('m', 'neigh'))  # aggregation 57 | n = g.dstdata['neigh'] 58 | z = self.act(self.W(torch.cat([n, h_dst], 1))) + self.bias 59 | z_norm = z.norm(2, 1, keepdim=True) 60 | z_norm = torch.where(z_norm == 0, torch.tensor(1.).to(z_norm), z_norm) 61 | z = z / z_norm 
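# L2-normalize the output embedding (as in Algorithm 1 of the GraphSAGE paper); the torch.where guard above keeps zero vectors from being divided by zero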
--------------------------------------------------------------------------------
/graphsage/link_prediction/train.py:
--------------------------------------------------------------------------------
import os
import argparse
from tqdm import tqdm
from loguru import logger
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torch.utils.data import IterableDataset, Dataset, DataLoader

import dgl.function as fn
import sklearn.linear_model as lm
import sklearn.metrics as skm

from dataloader import build_cora_dataset, NodesSet, NodesGraphCollactor
from model import SAGENet


class CrossEntropyLoss(nn.Module):
    # despite the class name, this is a binary cross-entropy over edge scores:
    # positive (observed) edges are labeled 1, sampled negative edges 0
    def forward(self, block_outputs, pos_graph, neg_graph):
        with pos_graph.local_scope():
            pos_graph.ndata['h'] = block_outputs
            pos_graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            pos_score = pos_graph.edata['score']
        with neg_graph.local_scope():
            neg_graph.ndata['h'] = block_outputs
            neg_graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            neg_score = neg_graph.edata['score']

        score = torch.cat([pos_score, neg_score])
        label = torch.cat([torch.ones_like(pos_score), torch.zeros_like(neg_score)])
        loss = F.binary_cross_entropy_with_logits(score, label)
        return loss
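
# Toy illustration (not part of the repo) of the u_dot_v edge scoring used
# above, assuming dgl>=0.7:
#
#   import dgl, torch
#   import dgl.function as fn
#   emb = torch.randn(4, 8)
#   pos_g = dgl.graph(([0, 1], [1, 2]), num_nodes=4)
#   with pos_g.local_scope():
#       pos_g.ndata['h'] = emb
#       pos_g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
#       print(pos_g.edata['score'].shape)   # (2, 1): one dot product per edge
#
# The BCE loss then pushes positive-edge scores up and negative-edge scores down.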
52 | """ 53 | 54 | train_mask = graph.ndata['train_mask'] 55 | test_mask = graph.ndata['test_mask'] 56 | val_mask = graph.ndata['val_mask'] 57 | train_nids = torch.LongTensor(np.nonzero(train_mask)).squeeze().cpu().numpy() 58 | val_nids = torch.LongTensor(np.nonzero(val_mask)).squeeze().cpu().numpy() 59 | test_nids = torch.LongTensor(np.nonzero(test_mask)).squeeze().cpu().numpy() 60 | 61 | emb = emb.cpu().detach().numpy() 62 | labels = graph.ndata['label'].cpu().numpy() 63 | train_labels = labels[train_nids] 64 | val_labels = labels[val_nids] 65 | test_labels = labels[test_nids] 66 | 67 | emb = (emb - emb.mean(0, keepdims=True)) / emb.std(0, keepdims=True) 68 | 69 | lr = lm.LogisticRegression(multi_class='multinomial', max_iter=1000) 70 | lr.fit(emb[train_nids], train_labels) 71 | 72 | pred = lr.predict(emb) 73 | f1_micro_train = skm.f1_score(train_labels, pred[train_nids], average='micro') 74 | f1_micro_eval = skm.f1_score(val_labels, pred[val_nids], average='micro') 75 | f1_micro_test = skm.f1_score(test_labels, pred[test_nids], average='micro') 76 | return f1_micro_train, f1_micro_eval, f1_micro_test 77 | 78 | 79 | def train(args, graph): 80 | # os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu 81 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 82 | cudnn.benchmark = True 83 | graph.to(device) 84 | 85 | features = graph.ndata['feat'] 86 | in_feats = features.shape[1] 87 | n_classes = 7 88 | 89 | collator = NodesGraphCollactor(graph, neighbors_every_layer=args.neighbors_every_layer) 90 | 91 | batch_sampler = NodesSet(graph) 92 | data_loader = DataLoader( 93 | batch_sampler, 94 | batch_size=512, 95 | shuffle=True, 96 | num_workers=6, 97 | collate_fn=collator 98 | ) 99 | 100 | # should aggregate while testing. 101 | test_collator = NodesGraphCollactor(graph, neighbors_every_layer=[10000]) 102 | test_data_loader = DataLoader( 103 | batch_sampler, 104 | batch_size=10000, 105 | shuffle=False, 106 | num_workers=6, 107 | collate_fn=test_collator 108 | ) 109 | 110 | # Define model and optimizer 111 | model = SAGENet(in_feats, args.num_hidden, n_classes, 112 | args.num_layers, F.relu, args.dropout) 113 | model.cuda() 114 | 115 | # loss_fcn = nn.CrossEntropyLoss() 116 | loss_fcn = CrossEntropyLoss() 117 | optimizer = optim.AdamW(model.parameters(), lr=args.lr, betas=(0.9, 0.999), eps=1e-08, 118 | weight_decay=0.1, amsgrad=False) 119 | top_acc, top_f1 = 0, 0 120 | for epoch in range(args.num_epochs): 121 | acc_cnt = 0 122 | for step, (pos_graph, neg_graph, blocks, all_seeds) in enumerate(data_loader): 123 | # pos_nodes, neg_nodes_batch = collator.sample_pos_neg_nodes(batch) 124 | # logger.info(len(batch), len(all_seeds)) 125 | # logger.info(len(pos_nodes), len(neg_nodes_batch), torch.tensor(neg_nodes_batch).shape) 126 | feats = load_subtensor(features, all_seeds, device=device) 127 | # pos_feats = load_subtensor(features, pos_nodes, device=device) 128 | # neg_feats = load_subtensor(features, neg_nodes_batch, device=device) 129 | # logger.info(heads_feats.shape, pos_feats.shape, neg_feats.shape) 130 | blocks = [b.to(device) for b in blocks] 131 | pos_graph = pos_graph.to(device) 132 | neg_graph = neg_graph.to(device) 133 | bacth_pred = model(blocks, feats) 134 | loss = loss_fcn(bacth_pred, pos_graph, neg_graph) 135 | optimizer.zero_grad() 136 | loss.backward() 137 | optimizer.step() 138 | # batch_acc_cnt = (torch.argmax(bacth_pred, dim=1) == batch_labels.long()).float().sum() 139 | # acc_cnt += int(batch_acc_cnt) 140 | logger.info(f"Train Epoch:{epoch}, Loss:{loss}") 141 | 142 
        # evaluation
        model.eval()
        with torch.no_grad():
            for step, (pos_graph, neg_graph, blocks, all_seeds) in enumerate(test_data_loader):
                # batch_size exceeds Cora's node count, so this is a single full-graph pass
                feats = load_subtensor(features, all_seeds, device=device)
                blocks = [b.to(device) for b in blocks]
                batch_pred = model(blocks, feats)

                f1_micro_train, f1_micro_eval, f1_micro_test = compute_acc_unsupervised(batch_pred, graph)
                if top_f1 < f1_micro_test:
                    top_f1 = f1_micro_test
                logger.info(
                    f"train f1:{f1_micro_train}, Val micro F1: {f1_micro_eval}, Test micro F1:{f1_micro_test}, TOP micro F1:{top_f1}")
        model.train()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="parameter set")
    parser.add_argument('--num_epochs', type=int, default=64)
    parser.add_argument('--num_hidden', type=int, default=256)
    parser.add_argument('--dropout', type=float, default=0.0)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--num_layers', type=int, default=2)
    # TODO: multiple negative nodes
    parser.add_argument('--neighbors_every_layer', type=int, nargs='+', default=[10],
                        help="per-layer fanout, e.g. --neighbors_every_layer 10 5")
    parser.add_argument("--gpu", type=str, default='0',
                        help="gpu or cpu")
    args = parser.parse_args()
    graph = build_cora_dataset(add_symmetric_edges=True, add_self_loop=True)

    train(args, graph)
--------------------------------------------------------------------------------
/graphsage/node_classification/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/node_classification/__init__.py
--------------------------------------------------------------------------------
/graphsage/node_classification/__pycache__/dataloader.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/node_classification/__pycache__/dataloader.cpython-38.pyc
--------------------------------------------------------------------------------
/graphsage/node_classification/__pycache__/model.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/node_classification/__pycache__/model.cpython-38.pyc
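
The node-classification variant below reuses the same sampling pattern: neighbors_every_layer is a per-layer fanout list, so [10] samples up to 10 in-neighbors for a single layer while [10, 5] adds a second hop. A minimal standalone sketch of the sample-then-compact step it implements (illustrative only, assuming dgl>=0.7):

import dgl
import torch

g = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))
seeds = torch.tensor([1])
frontier = dgl.sampling.sample_neighbors(g, seeds, fanout=2, edge_dir='in')
block = dgl.to_block(frontier, seeds)
print(block.num_src_nodes(), block.num_dst_nodes())  # 2 1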
--------------------------------------------------------------------------------
/graphsage/node_classification/dataloader.py:
--------------------------------------------------------------------------------
import argparse
import torch
from torch.utils.data import IterableDataset, Dataset, DataLoader

import dgl
from dgl.data import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset, CoraGraphDataset

import random

random.seed(123)


class HomoNodesSet(Dataset):
    def __init__(self, g, mask):
        # only load masked nodes for training/testing
        self.g = g
        self.nodes = g.nodes()[mask].tolist()

    def __len__(self):
        return len(self.nodes)

    def __getitem__(self, index):
        heads = self.nodes[index]
        return heads


class NodesGraphCollactor(object):
    """
    select the neighbors of heads/tails/neg_tails for aggregation
    """

    def __init__(self, g, neighbors_every_layer=[5, 1]):
        self.g = g
        self.neighbors_every_layer = neighbors_every_layer

    def __call__(self, batch):
        blocks, seeds = self.sample_blocks(batch)
        return batch, seeds, blocks

    def sample_blocks(self, seeds):
        blocks = []
        for n_neighbors in self.neighbors_every_layer:
            frontier = dgl.sampling.sample_neighbors(
                self.g,
                seeds,
                fanout=n_neighbors,
                edge_dir='in')
            block = self.compact_and_copy(frontier, seeds)
            seeds = block.srcdata[dgl.NID]  # this block's src nodes seed the next layer's sampling

            blocks.insert(0, block)
        return blocks, seeds

    def compact_and_copy(self, frontier, seeds):
        # compact the sampled frontier and this round's dst nodes into a block,
        # with the seeds as output nodes and everything else as input nodes
        block = dgl.to_block(frontier, seeds)
        for col, data in frontier.edata.items():
            if col == dgl.EID:
                continue
            block.edata[col] = data[block.edata[dgl.EID]]
        return block


def build_graph(args):
    # load graph data
    if args.dataset == 'aifb':
        dataset = AIFBDataset()
    elif args.dataset == 'mutag':
        dataset = MUTAGDataset()
    elif args.dataset == 'bgs':
        dataset = BGSDataset()
    elif args.dataset == 'am':
        dataset = AMDataset()
    elif args.dataset == 'cora':
        dataset = CoraGraphDataset()
    else:
        raise ValueError()

    g = dataset[0]
    category = dataset.predict_category
    num_classes = dataset.num_classes
    train_mask = g.nodes[category].data.pop('train_mask')
    test_mask = g.nodes[category].data.pop('test_mask')
    train_idx = torch.nonzero(train_mask, as_tuple=False).squeeze()
    test_idx = torch.nonzero(test_mask, as_tuple=False).squeeze()
    labels = g.nodes[category].data.pop('labels')
    print(g)
    print(len(g.etypes), g.etypes)
    return g


def build_cora_dataset(add_symmetric_edges=True, add_self_loop=True):
    dataset = CoraGraphDataset()
    graph = dataset[0]

    train_mask = graph.ndata['train_mask']
    val_mask = graph.ndata['val_mask']
    test_mask = graph.ndata['test_mask']
    labels = graph.ndata['label']
    feat = graph.ndata['feat']

    if add_symmetric_edges:
        edges = graph.edges()
        graph.add_edges(edges[1], edges[0])

    graph = dgl.remove_self_loop(graph)
    if add_self_loop:
        graph = dgl.add_self_loop(graph)
    return graph


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="parameter set")
    parser.add_argument('--dataset', type=str, default='aifb')

    args = parser.parse_args()
    # graph = build_graph(args)
    graph = build_cora_dataset()
    train_mask = graph.ndata['train_mask']

    batch_sampler = HomoNodesSet(graph, train_mask)
    collator = NodesGraphCollactor(graph, neighbors_every_layer=[5, 2])
    dataloader = DataLoader(
        batch_sampler,
        batch_size=2,
        shuffle=True,
        num_workers=1,
        collate_fn=collator
    )

    for step, (seed, blocks_nodes, blocks) in enumerate(dataloader):
        print("------------------------")
        print(seed)
        print(blocks_nodes)
        print(blocks)
        for b in blocks:
            print(b.edges())
        break
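
One invariant that the next file relies on: dgl.to_block places the destination (seed) nodes first in the block's source-node ordering, which is what lets SAGENet.forward recover h_dst by slicing h[:num_dst]. A quick standalone check (illustrative only, assuming dgl>=0.7):

import dgl
import torch

g = dgl.graph(([1, 2], [0, 0]))
block = dgl.to_block(g, torch.tensor([0]))
print(block.srcdata[dgl.NID])  # tensor([0, 1, 2]): the dst node 0 comes first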
--------------------------------------------------------------------------------
/graphsage/node_classification/model.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter

import dgl
import dgl.nn.pytorch as dglnn
import dgl.function as fn


class WeightedSAGEConv(nn.Module):
    def __init__(self, input_dims, output_dims, act=F.relu, dropout=0.5, bias=True):
        super().__init__()

        self.act = act
        self.dropout = nn.Dropout(dropout)  # required by the weighted branch in forward()
        self.Q = nn.Linear(input_dims, output_dims)
        self.W = nn.Linear(input_dims + output_dims, output_dims)
        if bias:
            self.bias = Parameter(torch.FloatTensor(output_dims))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        gain = nn.init.calculate_gain('relu')
        nn.init.xavier_uniform_(self.Q.weight, gain=gain)
        nn.init.xavier_uniform_(self.W.weight, gain=gain)
        nn.init.constant_(self.Q.bias, 0)
        nn.init.constant_(self.W.bias, 0)
        if self.bias is not None:
            nn.init.zeros_(self.bias)

    def forward(self, g, h, weights=None):
        """
        g : graph
        h : (src, dst) node features
        weights : scalar edge weights (optional)
        """
        h_src, h_dst = h
        with g.local_scope():
            if weights is not None:  # `if weights:` is ambiguous for tensors
                g.srcdata['n'] = self.act(self.Q(self.dropout(h_src)))
                g.edata['w'] = weights.float()
                g.update_all(fn.u_mul_e('n', 'w', 'm'), fn.sum('m', 'n'))
                g.update_all(fn.copy_e('w', 'm'), fn.sum('m', 'ws'))
                n = g.dstdata['n']
                ws = g.dstdata['ws'].unsqueeze(1).clamp(min=1)
                z = self.act(self.W(self.dropout(torch.cat([n / ws, h_dst], 1))))
            else:
                g.srcdata['n'] = self.Q(h_src)
                g.update_all(fn.copy_src('n', 'm'), fn.mean('m', 'neigh'))  # mean aggregation (fn.copy_u in newer DGL)
                n = g.dstdata['neigh']
                z = self.act(self.W(torch.cat([n, h_dst], 1)))
                if self.bias is not None:  # guard: bias may be disabled
                    z = z + self.bias
            z_norm = z.norm(2, 1, keepdim=True)
            z_norm = torch.where(z_norm == 0, torch.tensor(1.).to(z_norm), z_norm)
            z = z / z_norm
        return z


class SAGENet(nn.Module):
    def __init__(self, input_dim, hidden_dims, output_dims,
                 n_layers, act=F.relu, dropout=0.5):
        super().__init__()
        self.convs = nn.ModuleList()
        self.convs.append(WeightedSAGEConv(input_dim, hidden_dims, act, dropout))
        for _ in range(n_layers - 2):
            self.convs.append(WeightedSAGEConv(hidden_dims, hidden_dims,
                                               act, dropout))
        self.convs.append(WeightedSAGEConv(hidden_dims, output_dims,
                                           act, dropout))
        self.dropout = nn.Dropout(dropout)
        self.act = act

    def forward(self, blocks, h):
        for l, (layer, block) in enumerate(zip(self.convs, blocks)):
            # take only the dst slice: aggregation runs bottom-up toward the head nodes
            h_dst = h[:block.number_of_nodes('DST/' + block.ntypes[0])]
            h = layer(block, (h, h_dst))
            if l != len(self.convs) - 1:
                h = self.dropout(h)
        return h
--------------------------------------------------------------------------------
/graphsage/node_classification/train.py:
--------------------------------------------------------------------------------
import os
import argparse
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torch.utils.data import IterableDataset, Dataset, DataLoader

from dataloader import build_cora_dataset, HomoNodesSet, NodesGraphCollactor
from model import SAGENet
from sklearn.metrics import f1_score


def load_subtensor(nfeat, labels, seeds, input_nodes, device):
    """
    Extracts features and labels for a subset of nodes.
    Note: `seeds` are the blocks' input (src) nodes, while `input_nodes`
    are the batch's head nodes whose labels are being predicted.
    """
    batch_inputs = nfeat[seeds].to(device)
    batch_labels = labels[input_nodes].to(device)
    return batch_inputs, batch_labels


def evaluation(features, labels, test_mask,
               model, test_data_loader, loss_fcn, device='cpu'):
    model.eval()
    with torch.no_grad():
        acc_cnt = 0
        for step, (input_nodes, seeds, blocks) in enumerate(test_data_loader):
            batch_feat, batch_labels = load_subtensor(features, labels, seeds, input_nodes, device='cpu')
            blocks = [b.to(device) for b in blocks]
            batch_feat = batch_feat.to(device)
            batch_labels = batch_labels.to(device)
            batch_pred = model(blocks, batch_feat)
            loss = loss_fcn(batch_pred, batch_labels)
            batch_acc_cnt = (torch.argmax(batch_pred, dim=1) == batch_labels.long()).float().sum()
            acc_cnt += int(batch_acc_cnt)
            # note: f1 reflects the last batch only
            f1 = f1_score(batch_labels.detach().cpu(), torch.argmax(batch_pred, dim=1).detach().cpu(), average='macro')
        print(f"Test: Loss:{loss}, cnt:{acc_cnt}, {torch.nonzero(test_mask).shape[0]},"
              f"Acc:{int(acc_cnt) / torch.nonzero(test_mask).shape[0]}")
        return int(acc_cnt) / torch.nonzero(test_mask).shape[0], f1


def train(args, graph):
    # os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    cudnn.benchmark = True
    # the graph stays on CPU so the sampling workers can access it;
    # per-batch tensors are moved to `device` below

    features = graph.ndata['feat']
    labels = graph.ndata['label']
    train_mask = graph.ndata['train_mask']
    test_mask = graph.ndata['test_mask']
    val_mask = graph.ndata['val_mask']
    in_feats = features.shape[1]
    n_classes = 7  # Cora has 7 classes

    collator = NodesGraphCollactor(graph, neighbors_every_layer=args.neighbors_every_layer)

    batch_sampler = HomoNodesSet(graph, train_mask)
    data_loader = DataLoader(
        batch_sampler,
        batch_size=512,
        shuffle=True,
        num_workers=6,
        collate_fn=collator
    )

    test_batch_sampler = HomoNodesSet(graph, test_mask)
    test_data_loader = DataLoader(
        test_batch_sampler,
        batch_size=1000,
        shuffle=False,
        num_workers=6,
        collate_fn=collator
    )

    # Define model and optimizer
    model = SAGENet(in_feats, args.num_hidden, n_classes,
                    args.num_layers, F.relu, args.dropout)
    model = model.to(device)

    loss_fcn = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=args.lr, betas=(0.9, 0.999), eps=1e-08,
                            weight_decay=0.1, amsgrad=False)
    top_acc, top_f1 = 0, 0
    for epoch in range(args.num_epochs):
        model.train()  # evaluation() below switches the model to eval mode
        acc_cnt = 0
        for step, (input_nodes, seeds, blocks) in enumerate(data_loader):
            batch_feat, batch_labels = load_subtensor(features, labels, seeds, input_nodes, device=device)
            blocks = [b.to(device) for b in blocks]
            batch_feat = batch_feat.to(device)
            batch_labels = batch_labels.to(device)
            batch_pred = model(blocks, batch_feat)
            loss = loss_fcn(batch_pred, batch_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            batch_acc_cnt = (torch.argmax(batch_pred, dim=1) == batch_labels.long()).float().sum()
            acc_cnt += int(batch_acc_cnt)
        print(f"Train Epoch:{epoch}, Loss:{loss}, Acc:{int(acc_cnt) / torch.nonzero(train_mask).shape[0]}")

        acc, f1 = evaluation(features, labels, test_mask, model, test_data_loader, loss_fcn, device)
        if top_f1 < f1:
            top_acc, top_f1 = acc, f1
        print(f"Test Top Acc: {top_acc}, F1:{top_f1}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="parameter set")
    parser.add_argument('--num_hidden', type=int, default=256)
    parser.add_argument('--dropout', type=float, default=0.8)
    parser.add_argument('--lr', type=float, default=0.01)
    parser.add_argument('--num_layers', type=int, default=2)
    parser.add_argument('--neighbors_every_layer', type=int, nargs='+', default=[10],
                        help="per-layer fanout, e.g. --neighbors_every_layer 10 5")
    parser.add_argument('--num_epochs', type=int, default=200)
    parser.add_argument("--gpu", type=str, default='0',
                        help="gpu or cpu")
    args = parser.parse_args()
    graph = build_cora_dataset(add_symmetric_edges=True, add_self_loop=True)

    train(args, graph)
--------------------------------------------------------------------------------
/graphsage/test_api/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/test_api/__init__.py
--------------------------------------------------------------------------------
/graphsage/test_api/edge_dataloader.py:
--------------------------------------------------------------------------------
import dgl

import torch

src = torch.tensor([1, 3, 5, 7, 9])
dst = torch.tensor([2, 4, 6, 8, 10])
g = dgl.graph((torch.cat([src, dst]), torch.cat([dst, src])))

E = len(src)
reverse_eids = torch.cat([torch.arange(E, 2 * E), torch.arange(0, E)])

train_eid = torch.arange(0, E)  # the first E edge ids, i.e. the src -> dst direction
sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10, 5])
dataloader = dgl.dataloading.EdgeDataLoader(
    g, train_eid, sampler, exclude='reverse_id',
    reverse_eids=reverse_eids,
    batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
for input_nodes, pair_graph, blocks in dataloader:
    print(input_nodes)
    print(pair_graph)
    print(blocks)
--------------------------------------------------------------------------------
/node2vec/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/node2vec/__init__.py
--------------------------------------------------------------------------------
/node2vec/main.py:
--------------------------------------------------------------------------------
import time
import os
import random
import numpy as np
from tqdm import tqdm

import torch
from torch.utils.data import Dataset, DataLoader

import dgl

# run from the repo root so the deepwalk package resolves
from deepwalk.model import SkipGramModel
from deepwalk.build_graph import Build_Graph
from deepwalk.dataset import NodesDataset
from deepwalk.utils import skip_gram_gen_pairs


class Call_Func():
    def __init__(self, g, half_win_size, walk_length=4, p=1, q=1):
        self.g = g
        self.p = p
        self.q = q
        self.walk_length = walk_length
        self.half_win_size = half_win_size

    def __call__(self, nodes):
        batch_src, batch_dst = list(), list()

        walks_list = list()
        walks = dgl.sampling.node2vec_random_walk(self.g, nodes, p=self.p, q=self.q,
                                                  walk_length=self.walk_length)
        walks_list += walks.tolist()
        for walk in walks_list:
            src, dst = skip_gram_gen_pairs(walk, self.half_win_size)
            batch_src += src
            batch_dst += dst

        # shuffle the pairs
        batch_tmp = list(zip(batch_src, batch_dst))
        random.shuffle(batch_tmp)
        batch_src, batch_dst = zip(*batch_tmp)

        batch_src = torch.from_numpy(np.array(batch_src))
        batch_dst = torch.from_numpy(np.array(batch_dst))
        return batch_src, batch_dst

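# skip_gram_gen_pairs comes from deepwalk/utils.py (not shown in this section).
# A hypothetical stand-in, for illustration only, of what it is expected to do:
# pair each walk position with every node within half_win_size on either side.
#
#   def _skip_gram_pairs(walk, half_win):
#       src, dst = [], []
#       for i, center in enumerate(walk):
#           lo, hi = max(0, i - half_win), min(len(walk), i + half_win + 1)
#           for j in range(lo, hi):
#               if j != i:
#                   src.append(center)
#                   dst.append(walk[j])
#       return src, dst
#
#   _skip_gram_pairs([0, 5, 9], 1)  # -> ([0, 5, 5, 9], [5, 0, 9, 5])
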
def main(config):
    print(torch.cuda.device_count(), torch.cuda.is_available())
    os.environ["CUDA_VISIBLE_DEVICES"] = config.gpu
    # device = torch.device(config.gpu)
    torch.backends.cudnn.benchmark = True

    GraphSet = Build_Graph(config.file_path, walk_length=config.walk_length, self_loop=True, undirected=True)
    graph = GraphSet.graph

    model = SkipGramModel(graph.num_nodes(), embed_dim=config.embed_dim)
    model.cuda()
    optimizer = torch.optim.SparseAdam(model.parameters(), lr=float(config.lr))

    nodes_dataset = NodesDataset(graph.nodes())
    # half window: context on each side of the center node
    pair_generate_func = Call_Func(graph, config.win_size // 2, config.walk_length, config.p, config.q)

    pair_loader = DataLoader(nodes_dataset, batch_size=config.batch_size, shuffle=True, num_workers=4,
                             collate_fn=pair_generate_func)

    top_loss = float("inf")  # lowest mean epoch loss seen so far
    for epoch in range(config.epochs):
        start_time = time.time()
        model.train()

        loss_total = list()
        tqdm_bar = tqdm(pair_loader, desc="Training epoch{epoch}".format(epoch=epoch))
        for i, (batch_src, batch_dst) in enumerate(tqdm_bar):
            batch_src = batch_src.cuda().long()
            batch_dst = batch_dst.cuda().long()

            # uniform negatives, config.neg_num per positive pair
            batch_neg = np.random.randint(0, graph.num_nodes(), size=(batch_src.shape[0], config.neg_num))
            batch_neg = torch.from_numpy(batch_neg).cuda().long()

            model.zero_grad()
            loss = model.forward(batch_src, batch_dst, batch_neg)
            loss.backward()
            optimizer.step()
            loss_total.append(loss.detach().item())

        if np.mean(loss_total) < top_loss:
            top_loss = np.mean(loss_total)
            torch.save(model.state_dict(), config.save_path)
            print("Epoch: %03d; loss = %.4f saved path: %s" % (epoch, top_loss, config.save_path))
        print("Epoch: %03d; loss = %.4f cost time %.4f" % (epoch, np.mean(loss_total), time.time() - start_time))


if __name__ == "__main__":
    class ConfigClass():
        def __init__(self):
            self.lr = 0.005
            self.gpu = "0"
            self.epochs = 32
            self.embed_dim = 64
            self.batch_size = 10

            self.p = 1
            self.q = 1
            self.walk_num_per_node = 6
            self.walk_length = 12
            self.win_size = 6
            self.neg_num = 5
            self.save_path = "../out/blog_deepwalk_ckpt"
            self.file_path = "../data/blog/"


    config = ConfigClass()
    main(config)
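
A note on the uniform negatives drawn in main() above: np.random.randint can occasionally return true neighbors (false negatives), which word2vec-style trainers mitigate by sampling nodes proportional to degree^0.75. A hedged sketch of that alternative (illustrative only; the degree values are made up):

import numpy as np

degrees = np.array([3., 1., 2., 5.])      # hypothetical per-node degrees
probs = degrees ** 0.75                   # the unigram^0.75 trick from word2vec
probs /= probs.sum()
neg = np.random.choice(len(degrees), size=(8, 5), p=probs)  # (batch, neg_num)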
--------------------------------------------------------------------------------
/node2vec/model.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F

import random
import numpy as np


# from gensim.models import Word2Vec


class SkipGramModel(nn.Module):
    def __init__(self, num_nodes, embed_dim):
        super(SkipGramModel, self).__init__()
        self.num_nodes = num_nodes
        self.emb_dimension = embed_dim

        self.embed_nodes = nn.Embedding(self.num_nodes, self.emb_dimension, sparse=True)
        nn.init.xavier_uniform_(self.embed_nodes.weight)
        self.loss = nn.BCEWithLogitsLoss()

    def forward(self, src, pos, neg):
        embed_src = self.embed_nodes(src)  # (B, d)
        embed_pos = self.embed_nodes(pos)  # (B, d)
        embed_neg = self.embed_nodes(neg)  # (B, neg_num, d)

        # score each (src, pos) pair with an element-wise dot product;
        # a full (B, B) matmul would wrongly label every cross pair as positive
        pos_logits = (embed_src * embed_pos).sum(dim=1)  # (B,)
        ones_label = torch.ones_like(pos_logits)
        pos_loss = self.loss(pos_logits, ones_label)

        # batched dot products of each src against its sampled negatives
        neg_logits = torch.bmm(embed_neg, embed_src.unsqueeze(2)).squeeze(2)  # (B, neg_num)
        zeros_label = torch.zeros_like(neg_logits)
        neg_loss = self.loss(neg_logits, zeros_label)

        loss = (pos_loss + neg_loss) / 2
        return loss


def skip_gram_model_test():
    model = SkipGramModel(1000, embed_dim=32)
    model.cuda()

    src = np.random.randint(0, 100, size=10)
    src = torch.from_numpy(src).cuda().long()

    dst = np.random.randint(0, 100, size=10)
    dst = torch.from_numpy(dst).cuda().long()

    neg = np.random.randint(0, 100, size=(10, 5))
    neg = torch.from_numpy(neg).cuda().long()

    print(src.shape, dst.shape, neg.shape)

    print(model(src, dst, neg))


if __name__ == "__main__":
    skip_gram_model_test()
--------------------------------------------------------------------------------
/node2vec/sample_walks.py:
--------------------------------------------------------------------------------
import numpy as np
import random
import pgl  # these walkers rely on PGL's array-based graph.successor() API


def random_walk(g, nodes, max_depth):
    walk_paths = []
    # init
    for node in nodes:
        walk_paths.append([node])

    cur_walk_ids = np.arange(0, len(nodes))
    cur_nodes = np.array(nodes)
    for l in range(max_depth - 1):
        # keep only the walks that have not ended
        cur_succs = g.successor(cur_nodes)
        mask = [len(succ) > 0 for succ in cur_succs]

        if np.any(mask):
            cur_walk_ids = cur_walk_ids[mask]
            cur_nodes = cur_nodes[mask]
            cur_succs = cur_succs[mask]
        else:
            # stop when all nodes have no successor
            break

        outdegree = [len(cur_succ) for cur_succ in cur_succs]
        sample_index = np.floor(
            np.random.rand(cur_succs.shape[0]) * outdegree).astype("int64")

        nxt_cur_nodes = []
        for s, ind, walk_id in zip(cur_succs, sample_index, cur_walk_ids):
            walk_paths[walk_id].append(s[ind])
            nxt_cur_nodes.append(s[ind])
        cur_nodes = np.array(nxt_cur_nodes)
    return walk_paths


def node2vec_walk(graph, nodes, max_depth, p=1.0, q=1.0):
    if p == 1.0 and q == 1.0:
        return random_walk(graph, nodes, max_depth)

    walk = []
    # init
    for node in nodes:
        walk.append([node])

    cur_walk_ids = np.arange(0, len(nodes))
    cur_nodes = np.array(nodes)
    prev_nodes = np.array([-1] * len(nodes), dtype="int64")
    prev_succs = np.array([[]] * len(nodes), dtype="int64")
    for l in range(max_depth):
        # keep only the walks that have not ended
        cur_succs = graph.successor(cur_nodes)  # all the successors

        mask = [len(succ) > 0 for succ in cur_succs]
        if np.any(mask):
            cur_walk_ids = cur_walk_ids[mask]
            cur_nodes = cur_nodes[mask]
            prev_nodes = prev_nodes[mask]
            prev_succs = prev_succs[mask]
            cur_succs = cur_succs[mask]
        else:
            # stop when all nodes have no successor
            break
        num_nodes = cur_nodes.shape[0]
        nxt_nodes = np.zeros(num_nodes, dtype="int64")

        for idx, (
                succ, prev_succ, walk_id, prev_node
        ) in enumerate(zip(cur_succs, prev_succs, cur_walk_ids, prev_nodes)):
            sampled_succ = node2vec_sample(succ, prev_succ,
                                           prev_node, p, q)
            walk[walk_id].append(sampled_succ)
            nxt_nodes[idx] = sampled_succ

        prev_nodes, prev_succs = cur_nodes, cur_succs
        cur_nodes = nxt_nodes
    return walk
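
# The sampler below biases each step on the previous node t (walk step t -> v):
# a candidate x gets unnormalized weight 1/p if x == t (return), 1 if x is also
# a neighbor of t (distance 1), and 1/q otherwise (distance 2). Worked example
# with the p=4, q=0.25 used in the __main__ block of this file:
#   weights = {return: 1/4 = 0.25, distance 1: 1.0, distance 2: 1/0.25 = 4.0}
#   prob_sum = 5.25 -> P(return) ~ 0.048, P(dist 1) ~ 0.190, P(dist 2) ~ 0.762
# i.e. this setting strongly favors outward, DFS-like exploration.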
def node2vec_sample(succ, prev_succ, prev_node, p, q):
    """Fast implementation of node2vec sampling
    """
    succ_len = len(succ)
    prev_succ_set = set(prev_succ)

    probs = list()
    prob_sum = 0

    for i in range(succ_len):
        if succ[i] == prev_node:
            prob = 1. / p  # return to the previous node
        elif succ[i] in prev_succ_set:
            prob = 1.      # distance 1: also a neighbor of the previous node
        else:
            prob = 1. / q  # distance 2: move outward
        probs.append(prob)
        prob_sum += prob

    rand_num = random.uniform(0, 1) * prob_sum

    for i in range(succ_len):
        rand_num -= probs[i]
        if rand_num <= 0:
            return succ[i]
    return succ[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    import dgl
    import torch

    # g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))
    # g.edata['weight'] = torch.FloatTensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.5])
    # sg = dgl.sampling.select_topk(g, k=1, nodes=[0], weight='weight', edge_dir="in")
    # sg2 = dgl.sampling.sample_neighbors(g, [0, 1], 1, prob='weight', edge_dir="out")
    # print(sg2.edges(order='eid'))
    #
    # cur_nodes = np.array([0, 1, 2])
    # cur_succs = dgl.sampling.sample_neighbors(g, cur_nodes, 1, edge_dir="out")
    # print(cur_succs.edges()[1].tolist())
    # mask = [1 for succ in cur_succs.edges()[1].tolist() if succ]

    cur_nodes = np.array([0])
    g1 = dgl.graph(([0, 1, 1, 2, 3, 3, 4], [1, 2, 3, 0, 0, 4, 2]))
    print(dgl.sampling.sample_neighbors(g1, cur_nodes, 4, edge_dir="out"))
    # note: node2vec_walk expects a PGL-style graph exposing successor();
    # the call below will fail on a plain DGLGraph and is kept as a usage sketch
    print(node2vec_walk(g1, cur_nodes, max_depth=3, p=4, q=0.25))
--------------------------------------------------------------------------------
/out/blog_deepwalk_ckpt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/out/blog_deepwalk_ckpt
--------------------------------------------------------------------------------
/pictures/GCN_AD2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/pictures/GCN_AD2.png
--------------------------------------------------------------------------------
/pictures/graphSAGE_link_pre.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/pictures/graphSAGE_link_pre.png
--------------------------------------------------------------------------------
/pictures/node_classification.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/pictures/node_classification.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
torch==1.8.2
dgl==0.7.1
tqdm==4.61.0
scikit-learn==0.24.1
numpy==1.21.2
pandas==1.3.3
# imported by the training scripts but previously missing (versions unpinned):
loguru
pgl
--------------------------------------------------------------------------------