├── .idea
│   ├── .gitignore
│   ├── Graph_Learning.iml
│   ├── inspectionProfiles
│   │   └── profiles_settings.xml
│   ├── misc.xml
│   ├── modules.xml
│   └── vcs.xml
├── LICENSE
├── README.md
├── __init__.py
├── __pycache__
│   └── __init__.cpython-36.pyc
├── data
│   └── blog
│       ├── blog-label.txt
│       ├── blog-net.txt
│       ├── blog-vocab.txt
│       └── node_map_dic.pkl
├── deepwalk
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-36.pyc
│   │   ├── build_graph.cpython-36.pyc
│   │   ├── dataset.cpython-36.pyc
│   │   ├── model.cpython-36.pyc
│   │   └── utils.cpython-36.pyc
│   ├── build_graph.py
│   ├── dataset.py
│   ├── main.py
│   ├── model.py
│   ├── node_classfication.py
│   ├── test
│   │   ├── __init__.py
│   │   └── test_hetergraph.py
│   └── utils.py
├── gat
│   ├── __init__.py
│   ├── __pycache__
│   │   └── model.cpython-38.pyc
│   ├── model.py
│   └── train.py
├── gcn
│   ├── build_graph.py
│   ├── gcn_model.py
│   ├── test_api
│   │   └── nx_to_scipy_sparse_matrix_test.py
│   ├── train.py
│   └── utils.py
├── graphsage
│   ├── __init__.py
│   ├── link_prediction
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── dataloader.cpython-38.pyc
│   │   │   └── model.cpython-38.pyc
│   │   ├── dataloader.py
│   │   ├── model.py
│   │   └── train.py
│   ├── node_classification
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── dataloader.cpython-38.pyc
│   │   │   └── model.cpython-38.pyc
│   │   ├── dataloader.py
│   │   ├── model.py
│   │   └── train.py
│   └── test_api
│       ├── __init__.py
│       └── edge_dataloader.py
├── node2vec
│   ├── __init__.py
│   ├── main.py
│   ├── model.py
│   └── sample_walks.py
├── out
│   └── blog_deepwalk_ckpt
├── pictures
│   ├── GCN_AD2.png
│   ├── graphSAGE_link_pre.png
│   └── node_classification.png
└── requirements.txt
/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Princeton Natural Language Processing
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ### Graph Models in Practice
2 | Hands-on graph model projects (GCN, GAT, GraphSAGE, DeepWalk, node2vec): implementation details and paper reproduction, continuously updated. Stars and discussion are welcome.
3 |
4 | #### 1. Environment Setup
5 | Based mainly on [dgl](https://github.com/dmlc/dgl) and PyTorch.
6 | >pip install -r requirements.txt
7 |
8 | #### 2. Data
9 | >Download a dataset and put it in ./data/
10 |
11 | The blog dataset is already included.
12 |
13 | #### 3. Graph Model Code Walkthroughs:
14 | Model notes are written up here (in Chinese):
15 | 1. [Random-walk graph models: DeepWalk on homogeneous graphs](https://zhuanlan.zhihu.com/p/397710211)
16 | 2. [Random-walk graph models: a look at Node2Vec](https://zhuanlan.zhihu.com/p/400849086)
17 | 3. [Graph convolution: from GCN to GAT and GraphSAGE](https://zhuanlan.zhihu.com/p/404826711)
18 | 4. [How to build a GCN? Just these four steps](https://zhuanlan.zhihu.com/p/422380707)
19 | 5. [How to build a good GraphSAGE? Follow these three steps](https://zhuanlan.zhihu.com/p/429147607)
20 | 6. [Link prediction: building an unsupervised GraphSAGE](https://zhuanlan.zhihu.com/p/435766657)
21 | #### How to run
22 | ##### DeepWalk
23 | ①. How to run the DeepWalk model for graph embedding:
24 | >cd deepwalk
25 | >python main.py
26 |
27 | ②. Node classification task:
28 | >python node_classfication.py
29 |
30 | ##### Node2Vec
31 | ①. How to run the Node2Vec model:
32 | >cd node2vec
33 | >python main.py
34 |
35 | ②. Node classification task (change the checkpoint path in node_classfication.py to the node2vec one first):
36 | >python node_classfication.py
37 |
38 | ##### GCN
39 | ①. How to run the GCN model:
40 | >python train.py
41 |
42 | Cora dataset node classification (the Cora dataset is downloaded to ~/.dgl/ automatically).
43 | Test accuracy ~0.806 (0.793-0.819) ([paper](https://arxiv.org/abs/1609.02907): 0.815).
44 |
45 | ##### GraphSAGE
46 |
47 | ###### Node Classification
48 |
49 | ①. How to run the GraphSAGE model:
50 | >cd graphsage/node_classification
51 | >python train.py
52 |
53 | Cora dataset node classification (the Cora dataset is downloaded to ~/.dgl/ automatically).
54 | Test accuracy ~0.781 (0.762-0.801) ([paper](https://arxiv.org/abs/1609.02907): 0.815).
55 |
56 | ###### Link Prediction
57 |
58 | ①. How to run the GraphSAGE model:
59 | >cd graphsage/link_prediction
60 | >python train.py
61 |
62 | Test F1: ~0.630 (0.612-0.648) (Cora dataset)
63 |
64 | ##### GAT
65 | ①. How to run the GAT model:
66 | >python train.py
67 |
68 | Cora dataset node classification (the Cora dataset is downloaded to ~/.dgl/ automatically).
69 | Test accuracy ~0.810 (0.792-0.820) ([paper](https://arxiv.org/pdf/1710.10903.pdf): 0.830).
--------------------------------------------------------------------------------
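
To make the DeepWalk to node-classification hand-off above concrete, here is a minimal sketch of pulling the trained embedding table out of the checkpoint, the same way deepwalk/node_classfication.py reads param["embed_nodes.weight"] (paths assume the defaults in the configs):

```python
import torch

# Checkpoint written by deepwalk/main.py (save_path in its ConfigClass).
param = torch.load("out/blog_deepwalk_ckpt", map_location="cpu")
emb = param["embed_nodes.weight"]  # (num_nodes, embed_dim) lookup table
print(emb.shape)

node_vec = emb[0]  # embedding of the node with remapped id 0
```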
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/__init__.py
--------------------------------------------------------------------------------
/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/data/blog/node_map_dic.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/data/blog/node_map_dic.pkl
--------------------------------------------------------------------------------
/deepwalk/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__init__.py
--------------------------------------------------------------------------------
/deepwalk/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/deepwalk/__pycache__/build_graph.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/build_graph.cpython-36.pyc
--------------------------------------------------------------------------------
/deepwalk/__pycache__/dataset.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/dataset.cpython-36.pyc
--------------------------------------------------------------------------------
/deepwalk/__pycache__/model.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/model.cpython-36.pyc
--------------------------------------------------------------------------------
/deepwalk/__pycache__/utils.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/__pycache__/utils.cpython-36.pyc
--------------------------------------------------------------------------------
/deepwalk/build_graph.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import pickle
3 | import time
4 |
5 | import pandas as pd
6 | import torch
7 | import dgl
8 |
9 |
10 | def make_undirected(G):
11 |     G.add_edges(G.edges()[1], G.edges()[0])
12 |     return G
13 |
14 |
15 | def find_connected_nodes(G):
16 |     nodes = G.out_degrees().nonzero().squeeze(-1)
17 |     return nodes
18 |
19 |
20 | class Build_Graph(object):
21 |     def __init__(self, data_dir, walk_length=5, self_loop=True, undirected=True):
22 |         self.edge_file_path = data_dir + "blog-net.txt"
23 |         self.map_dict_save_path = data_dir + "node_map_dic.pkl"
24 |         self.self_loop = self_loop
25 |         self.undirected = undirected
26 |         self.walk_length = walk_length
27 |
28 |         self.edges, self.nodes, self.node2id, self.id2node = self.get_edges_and_mapdict(self.edge_file_path,
29 |                                                                                         self_loop=self.self_loop,
30 |                                                                                         undirected=self.undirected)
31 |         self.save_dict(self.node2id, self.map_dict_save_path)
32 |
33 |         self.graph = self.build_graph(self.edges)
34 |
35 |         print("total nodes number: %d" % self.graph.num_nodes())
36 |         print("total edges number: %d" % len(self.edges[0]))
37 |
38 |     def get_edges_and_mapdict(self, file_path, self_loop=True, undirected=True):
39 |         df_net = pd.read_csv(file_path, header=None, sep=" ", names=["src", "dst", "weight"])
40 |
41 |         # sort the deduplicated nodes so ids are assigned deterministically
42 |         nodes = sorted(set(df_net.src.to_list() + df_net.dst.to_list()))
43 |         node2id = dict(zip(nodes, range(len(nodes))))
44 |         id2node = dict(zip(range(len(nodes)), nodes))
45 |
46 |         src = df_net.src.map(node2id).to_list()
47 |         dst = df_net.dst.map(node2id).to_list()
48 |
49 |         if undirected:
50 |             tmp = copy.deepcopy(src)
51 |             src.extend(dst)
52 |             dst.extend(tmp)
53 |
54 |         if self_loop:
55 |             # self loops must use the remapped ids, not the raw node labels
56 |             src.extend(range(len(nodes)))
57 |             dst.extend(range(len(nodes)))
58 |
59 |         assert max(node2id.values()) == len(nodes) - 1, "error reading net, quit"
60 |
61 |         return (src, dst), nodes, node2id, id2node
62 |
63 |     def build_graph(self, edges):
64 |         start = time.time()
65 |         G = dgl.graph((torch.tensor(edges[0]), torch.tensor(edges[1])))
66 |         t = time.time() - start
67 |         print("Building DGLGraph in %.2fs" % t)
68 |         return G
69 |
70 |     def save_dict(self, map_dic, save_path):
71 |         with open(save_path, "wb") as a_file:
72 |             pickle.dump(map_dic, a_file)
73 |
74 |
75 | if __name__ == "__main__":
76 |     # net, node2id, id2node, sm = ReadTxtNet(file_path="youtube", undirected=True)
77 |     file_path = "../data/blog/"
78 |     GraphSet = Build_Graph(file_path, walk_length=5, self_loop=True, undirected=True)
79 |
80 |     # Walk_Sampler = DeepwalkSampler(G, seeds, walk_length)
81 |
--------------------------------------------------------------------------------
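
The expected input is a space-separated edge list ("src dst weight" per row, as in data/blog/blog-net.txt). A toy sketch of the same recipe Build_Graph applies (remap ids, mirror edges, add self loops), with made-up edges:

```python
import torch
import dgl

# Three toy edges standing in for blog-net.txt rows.
edges = [(0, 1, 1), (1, 2, 1), (2, 0, 1)]
src = [s for s, d, w in edges]
dst = [d for s, d, w in edges]

# Mirror edges for an undirected graph, then add one self loop per node.
src, dst = src + dst + [0, 1, 2], dst + src + [0, 1, 2]
g = dgl.graph((torch.tensor(src), torch.tensor(dst)))
print(g)  # Graph(num_nodes=3, num_edges=9)
```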
/deepwalk/dataset.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 |
4 | import torch
5 | from torch.utils.data import Dataset, DataLoader
6 |
7 | import dgl
8 |
9 |
10 | class Collate_Func(object):
11 | def __init__(self, graph, config, walk_mode="random_walk"):
12 | self.walk_mode = config.walk_mode
13 | self.p = config.p
14 | self.q = config.q
15 | self.walk_length = config.walk_length
16 | self.half_win_size = config.win_size // 2
17 | self.walk_num_per_node = config.walk_num_per_node
18 | self.graph = graph
19 | self.neg_num = config.neg_num
20 | self.nodes = graph.nodes().tolist()
21 |
22 | def sample_walks(self, graph, seed_nodes, walk_length, walk_mode):
23 | # DeepwalkSampler(self.G, self.seeds[i], self.walk_length)
24 | if walk_mode == "random_walk":
25 | walks = dgl.sampling.random_walk(graph, seed_nodes, length=walk_length)
26 | elif walk_mode == "node2vec_random_walk":
27 | walks = dgl.sampling.node2vec_random_walk(graph, seed_nodes, self.p, self.q, length=walk_length)
28 | else:
29 |             raise ValueError("walk_mode must be 'random_walk' or 'node2vec_random_walk'.")
30 | return walks
31 |
32 | def skip_gram_gen_pairs(self, walk, half_win_size=2):
33 | src, dst = list(), list()
34 |
35 | l = len(walk)
36 | # rnd = np.random.randint(1, half_win_size+1, dtype=np.int64, size=l)
37 | for i in range(l):
38 | real_win_size = half_win_size
39 | left = i - real_win_size
40 | if left < 0:
41 | left = 0
42 | right = i + real_win_size
43 | if right >= l:
44 | right = l - 1
45 | for j in range(left, right + 1):
46 | if walk[i] == walk[j]:
47 | continue
48 | src.append(walk[i])
49 | dst.append(walk[j])
50 | return src, dst
51 |
52 | def __call__(self, batch_nodes):
53 | batch_src, batch_dst = list(), list()
54 |
55 | walks_list = list()
56 | for i in range(self.walk_num_per_node):
57 | walks = self.sample_walks(self.graph, batch_nodes, self.walk_length, self.walk_mode)
58 | walks_list += walks[0].tolist()
59 | for walk in walks_list:
60 | src, dst = self.skip_gram_gen_pairs(walk, self.half_win_size)
61 | batch_src += src
62 | batch_dst += dst
63 |
64 | # shuffle pair
65 | batch_tmp = list(set(zip(batch_src, batch_dst)))
66 | random.shuffle(batch_tmp)
67 | batch_src, batch_dst = zip(*batch_tmp)
68 |
69 | batch_src = torch.from_numpy(np.array(batch_src))
70 | batch_dst = torch.from_numpy(np.array(batch_dst))
71 | return batch_src, batch_dst
72 |
73 |
74 | class NodesDataset(Dataset):
75 | def __init__(self, nodes):
76 | self.nodes = nodes
77 |
78 | def __len__(self):
79 | return len(self.nodes)
80 |
81 | def __getitem__(self, index):
82 | return self.nodes[index]
83 |
84 |
85 | class Word2VecWalkset(object):
86 |     """Unused scratch class: iterate random walks for a gensim Word2Vec pipeline."""
87 |
88 |     def __init__(self, graph, seed_nodes, walk_length):
89 |         self.graph = graph
90 |         self.seed_nodes = seed_nodes
91 |         self.walk_length = walk_length
92 |
93 |     def __iter__(self):
94 |         # self.w2v_model = Word2Vec(walks, sg=1, hs=1)
95 |         walks = dgl.sampling.random_walk(self.graph, self.seed_nodes, length=self.walk_length)
96 |         yield walks
97 |
98 |
99 | if __name__ == "__main__":
100 | from build_graph import Build_Graph
101 |
102 | file_path = "../data/blog/"
103 | GraphSet = Build_Graph(file_path, undirected=True)
104 | graph = GraphSet.graph
105 | print(GraphSet.id2node[0], GraphSet.id2node[1])
106 | print(random.sample(graph.nodes().tolist(), 5))
107 |
108 | nodes_dataset = NodesDataset(graph.nodes())
109 |
110 |
111 |     class ConfigClass(object):
112 |         def __init__(self, lr=0.05, gpu="0"):
113 |             self.lr = 0.005
114 |             self.gpu = "0"
115 |             self.epochs = 32
116 |             self.embed_dim = 64
117 |             self.batch_size = 10
118 |             self.walk_mode = "random_walk"  # or "node2vec_random_walk"
119 |             self.p = 1.0
120 |             self.q = 1.0
121 |             self.walk_num_per_node = 6
122 |             self.walk_length = 12
123 |             self.win_size = 6
124 |             self.neg_num = 5
125 |             self.save_path = "../out/blog_deepwalk_ckpt"
126 |             self.file_path = "../data/blog/"
127 |
128 |
129 |     config = ConfigClass()
130 |     pair_generate_func = Collate_Func(graph, config)
131 |
132 |     pair_loader = DataLoader(nodes_dataset, batch_size=1, shuffle=True, num_workers=4,
133 |                              collate_fn=pair_generate_func)
134 |
135 |     pair = set()
136 |     for i, (batch_src, batch_dst) in enumerate(pair_loader):
137 |         print(batch_src.shape)
138 |         print(batch_dst.shape)
139 |         for s, d in zip(batch_src.tolist(), batch_dst.tolist()):
140 |             pair.add((s, d))
141 |         print(len(pair))
142 |         break
143 |
--------------------------------------------------------------------------------
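
The core of the collate function is the window-pair generation. A condensed, dependency-free rerun of skip_gram_gen_pairs on a toy walk shows exactly which (center, context) pairs come out:

```python
def skip_gram_gen_pairs(walk, half_win_size=2):
    # For every position, pair the center node with each distinct
    # neighbor inside the window (same logic as Collate_Func's method).
    src, dst = [], []
    for i in range(len(walk)):
        left = max(i - half_win_size, 0)
        right = min(i + half_win_size, len(walk) - 1)
        for j in range(left, right + 1):
            if walk[i] != walk[j]:
                src.append(walk[i])
                dst.append(walk[j])
    return src, dst


print(skip_gram_gen_pairs([10, 20, 30], half_win_size=1))
# ([10, 20, 20, 30], [20, 10, 30, 20])
```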
/deepwalk/main.py:
--------------------------------------------------------------------------------
1 | import time
2 | import os
3 | import numpy as np
4 | from tqdm import tqdm
5 |
6 | import torch
7 | from torch.utils.data import Dataset, DataLoader
8 |
9 | from model import SkipGramModel
10 | from build_graph import Build_Graph
11 | from dataset import NodesDataset, Collate_Func
12 |
13 |
14 | def main(config):
15 | print(torch.cuda.device_count(), torch.cuda.is_available())
16 | os.environ["CUDA_VISIBLE_DEVICES"] = config.gpu
17 | # device = torch.device(config.gpu)
18 | torch.backends.cudnn.benchmark = True
19 |
20 | GraphSet = Build_Graph(config.file_path, walk_length=config.walk_length, self_loop=True, undirected=True)
21 | graph = GraphSet.graph
22 |
23 | model = SkipGramModel(graph.num_nodes(), embed_dim=config.embed_dim)
24 | model.cuda()
25 | optimizer = torch.optim.SparseAdam(model.parameters(), lr=float(config.lr))
26 |
27 | nodes_dataset = NodesDataset(graph.nodes())
28 | pair_generate_func = Collate_Func(graph, config)
29 |
30 | pair_loader = DataLoader(nodes_dataset, batch_size=config.batch_size, shuffle=True, num_workers=4,
31 | collate_fn=pair_generate_func)
32 |
33 |     top_loss = float("inf")  # best (lowest) mean loss so far; starting at 0 would never trigger a save
34 |     for epoch in range(config.epochs):
35 |         start_time = time.time()
36 |         model.train()
37 |
38 |         loss_total = list()
39 | tqdm_bar = tqdm(pair_loader, desc="Training epoch{epoch}".format(epoch=epoch))
40 | for i, (batch_src, batch_dst) in enumerate(tqdm_bar):
41 | batch_src = batch_src.cuda().long()
42 | batch_dst = batch_dst.cuda().long()
43 |
44 | batch_neg = np.random.randint(0, graph.num_nodes(), size=(batch_src.shape[0], config.neg_num))
45 | batch_neg = torch.from_numpy(batch_neg).cuda().long() # change multi neg_num
46 |
47 | model.zero_grad()
48 | loss = model.forward(batch_src, batch_dst, batch_neg)
49 | loss.backward()
50 | optimizer.step()
51 | loss_total.append(loss.detach().item())
52 |
53 | if top_loss > np.mean(loss_total):
54 | top_loss = np.mean(loss_total)
55 | torch.save(model.state_dict(), config.save_path)
56 | print("Epoch: %03d; loss = %.4f saved path: %s" % (epoch, top_loss, config.save_path))
57 | print("Epoch: %03d; loss = %.4f cost time %.4f" % (epoch, np.mean(loss_total), time.time() - start_time))
58 |
59 |
60 | if __name__ == "__main__":
61 | class ConfigClass():
62 | def __init__(self):
63 | self.lr = 0.005
64 | self.gpu = "0"
65 | self.epochs = 32
66 | self.embed_dim = 64
67 | self.batch_size = 10
68 | self.walk_mode = "random_walk" # node2vec_random_walk
69 | self.p = 1.0
70 | self.q = 1.0
71 | self.walk_num_per_node = 6
72 | self.walk_length = 12
73 | self.win_size = 6
74 | self.neg_num = 5
75 | self.save_path = "../out/blog_deepwalk_ckpt"
76 | self.file_path = "../data/blog/"
77 |
78 | config = ConfigClass()
79 | main(config)
80 |
81 | # import argparse
82 | # from utils import load_config
83 | #
84 | # parser = argparse.ArgumentParser(description='bert classification')
85 | # parser.add_argument("-c", "--config", type=str, default="./config.yaml")
86 | # args = parser.parse_args()
87 | # config = load_config(args.config)
88 |
89 |
--------------------------------------------------------------------------------
/deepwalk/model.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 | import random
6 | import numpy as np
7 |
8 |
9 | # from gensim.models import Word2Vec
10 |
11 |
12 | class SkipGramModel(nn.Module):
13 | def __init__(self, num_nodes, embed_dim):
14 | super(SkipGramModel, self).__init__()
15 | self.num_nodes = num_nodes
16 | self.emb_dimension = embed_dim
17 |
18 | self.embed_nodes = nn.Embedding(self.num_nodes, self.emb_dimension, sparse=True)
19 | nn.init.xavier_uniform_(self.embed_nodes.weight)
20 | self.loss = nn.BCEWithLogitsLoss()
21 |
22 | def forward(self, src, pos, neg):
23 | embed_src = self.embed_nodes(src) # (B, d)
24 | embed_pos = self.embed_nodes(pos) # (B, d)
25 | embed_neg = self.embed_nodes(neg) # (B, neg_num, d)
26 | # print(embed_src.shape, embed_pos.shape, embed_neg.shape)
27 |
28 | # pos_socre = torch.sum(torch.matmul(embed_src, embed_pos.transpose(0, 1)), 1)
29 | # pos_socre = -F.logsigmoid(pos_socre)
30 | #
31 | # neg_socre = torch.sum(torch.matmul(embed_src, embed_neg.transpose(1, 2)), (1, 2))
32 | # neg_socre = -F.logsigmoid(-neg_socre)
33 |
34 |         pos_logits = torch.sum(embed_src * embed_pos, dim=1)  # (B,) one score per (src, pos) pair
35 |         ones_label = torch.ones_like(pos_logits)
36 |         # print(pos_logits.shape, ones_label.shape)
37 |         pos_loss = self.loss(pos_logits, ones_label)
38 |
39 |         neg_logits = torch.bmm(embed_neg, embed_src.unsqueeze(2)).squeeze(2)  # (B, neg_num)
40 |         zeros_label = torch.zeros_like(neg_logits)
41 |         # print(neg_logits.shape, zeros_label.shape)
42 |         neg_loss = self.loss(neg_logits, zeros_label)
43 |
44 | loss = (pos_loss + neg_loss) / 2
45 | return loss
46 |
47 |
48 |
49 |
50 |
51 | def skip_gram_model_test():
52 | model = SkipGramModel(1000, embed_dim=32)
53 | model.cuda()
54 |
55 | src = np.random.randint(0, 100, size=10)
56 | src = torch.from_numpy(src).cuda().long()
57 |
58 | dst = np.random.randint(0, 100, size=10)
59 | dst = torch.from_numpy(dst).cuda().long()
60 |
61 | neg = np.random.randint(0, 100, size=(10, 5))
62 | neg = torch.from_numpy(neg).cuda().long()
63 |
64 | print(src.shape, dst.shape, neg.shape)
65 |
66 | print(model(src, dst, neg))
67 |
68 |
69 |
70 |
71 | if __name__ == "__main__":
72 | skip_gram_model_test()
73 |
--------------------------------------------------------------------------------
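
For reference, the objective this BCE formulation approximates is the standard skip-gram negative-sampling loss from the DeepWalk/word2vec literature (stated here for context, not quoted from this repo): for a source node, a context node, and $K$ sampled negatives,

$$\mathcal{L} = -\log \sigma\left(\mathbf{z}_{src}^{\top}\mathbf{z}_{pos}\right) - \sum_{k=1}^{K} \log \sigma\left(-\mathbf{z}_{src}^{\top}\mathbf{z}_{neg_k}\right)$$

which is what the per-pair positive term and the (B, neg_num) negative term compute under BCE-with-logits.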
/deepwalk/node_classfication.py:
--------------------------------------------------------------------------------
1 | import time
2 | import os
3 | import pickle
4 | import numpy as np
5 | import pandas as pd
6 |
7 | import torch
8 | import torch.nn as nn
9 | import torch.nn.functional as F
10 | from torch.utils.data import Dataset, DataLoader
11 |
12 | from sklearn import metrics
13 | from sklearn.model_selection import train_test_split
14 |
15 |
16 | def get_map_dict(dict_path):
17 | a_file = open(dict_path, "rb")
18 | map_dict = pickle.load(a_file)
19 | return map_dict
20 |
21 |
22 | def label_data(config, mode="train"):
23 | node_map_dict = get_map_dict(config.file_path + "node_map_dic.pkl")
24 | df = pd.read_csv(config.file_path + "blog-label.txt", header=None, sep="\t", names=["nodes", "label"])
25 | df.nodes = df.nodes.map(node_map_dict)
26 |
27 | df_label = pd.crosstab(df.nodes, df.label).gt(0).astype(int)
28 | df_label = df_label.reset_index()
29 |
30 | train, test = train_test_split(df_label, test_size=0.1, random_state=123, shuffle=True)
31 | if mode == "train":
32 | node = train.nodes.to_list()
33 | label = train.drop('nodes', axis=1).values.tolist()
34 | else:
35 | node = test.nodes.to_list()
36 | label = test.drop('nodes', axis=1).values.tolist()
37 |
38 | return np.array(node), np.array(label)
39 |
40 |
41 |
42 | class NodesDataset(Dataset):
43 | def __init__(self, config, mode="train"):
44 | self.nodes, self.labels = label_data(config, mode=mode)
45 |
46 | def __len__(self):
47 | return len(self.nodes)
48 |
49 | def __getitem__(self, index):
50 | return self.nodes[index], self.labels[index]
51 |
52 |
53 | class NodeClassification(nn.Module):
54 | def __init__(self, emb, num_class=38):
55 | super(NodeClassification, self).__init__()
56 | self.emb = nn.Embedding.from_pretrained(emb, freeze=True)
57 | self.size = emb.shape[1]
58 | self.num_class = num_class
59 | self.fc = nn.Linear(self.size, self.num_class)
60 |
61 | def forward(self, node):
62 | # node = torch.tensor(node).to(torch.int64)
63 | node_emb = self.emb(node)
64 | prob = self.fc(node_emb)
65 |
66 | return prob
67 |
68 |
69 | def evaluate(test_nodes_loader, model):
70 | model.eval()
71 |
72 | predict_all = np.array([], dtype=int)
73 | labels_all = np.array([], dtype=int)
74 |
75 | for i, (batch_nodes, batch_labels) in enumerate(test_nodes_loader):
76 | batch_nodes = batch_nodes.cuda().long()
77 | batch_labels = batch_labels.cuda().float()
78 |
79 | logit = model(batch_nodes)
80 | probs = torch.sigmoid(logit)
81 |
82 | label = torch.max(batch_labels.data, 1)[1].cpu().numpy()
83 | pred = torch.max(probs.data, 1)[1].cpu().numpy()
84 |         # predic = torch.max(probs.data, 1)[1].cpu().numpy()
85 | print(pred.size, label.size)
86 |
87 | labels_all = np.append(labels_all, label)
88 | predict_all = np.append(predict_all, pred)
89 | print(labels_all.size, predict_all.size)
90 | f1_score = metrics.f1_score(labels_all, predict_all, average="macro")
91 | return f1_score
92 |
93 |
94 | def main(config):
95 | os.environ["CUDA_VISIBLE_DEVICES"] = config.gpu
96 | # device = torch.device(config.gpu)
97 | torch.backends.cudnn.benchmark = True
98 |
99 | param = torch.load(config.save_path)
100 | emb = param["embed_nodes.weight"]
101 | print(emb.shape)
102 |
103 | train_nodes_dataset = NodesDataset(config, "train")
104 | train_nodes_loader = DataLoader(train_nodes_dataset, batch_size=config.batch_size, shuffle=True, num_workers=4)
105 | test_nodes_dataset = NodesDataset(config, "test")
106 | test_nodes_loader = DataLoader(test_nodes_dataset, batch_size=config.batch_size, shuffle=False, num_workers=4)
107 | print("--", len(train_nodes_loader), len(test_nodes_loader))
108 |
109 | model = NodeClassification(emb, num_class=config.num_class)
110 | model.cuda()
111 | optimizer = torch.optim.AdamW(model.parameters(), lr=float(config.lr), betas=(0.9, 0.999), eps=1e-08,
112 | weight_decay=0.01, amsgrad=False)
113 | loss_func = nn.BCEWithLogitsLoss()
114 |
115 | start_time = time.time()
116 | for epoch in range(config.epochs):
117 | loss_total = list()
118 | for i, (batch_nodes, batch_labels) in enumerate(train_nodes_loader):
119 | batch_nodes = batch_nodes.cuda().long()
120 | batch_labels = batch_labels.cuda().float()
121 |
122 | model.zero_grad()
123 | logit = model(batch_nodes)
124 |             # BCEWithLogitsLoss applies sigmoid internally, so feed it raw logits
125 |             loss = loss_func(logit, batch_labels)
126 | loss.backward()
127 | optimizer.step()
128 |
129 | loss_total.append(loss.detach().item())
130 | print("Epoch: %03d; loss = %.4f cost time %.4f" % (epoch, np.mean(loss_total), time.time() - start_time))
131 | f1 = evaluate(test_nodes_loader, model)
132 | print("Epoch: %03d; f1 = %.4f" % (epoch, f1))
133 |
134 |
135 | if __name__ == "__main__":
136 | class ConfigClass():
137 | def __init__(self):
138 | self.lr = 0.05
139 | self.gpu = "0"
140 | self.epochs = 32
141 | self.batch_size = 256
142 | self.num_class = 39
143 | self.save_path = "../out/blog_deepwalk_ckpt"
144 | self.file_path = "../data/blog/"
145 |
146 |
147 | config = ConfigClass()
148 | main(config)
149 |
--------------------------------------------------------------------------------
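
The label file holds one (node, label) row per assignment, and a node can carry several labels; pd.crosstab(...).gt(0) turns that into a multi-hot matrix. A tiny sketch with made-up rows:

```python
import pandas as pd

# Two labels for node 0, one for node 1.
df = pd.DataFrame({"nodes": [0, 0, 1], "label": [2, 5, 2]})
df_label = pd.crosstab(df.nodes, df.label).gt(0).astype(int)
print(df_label)
# label  2  5
# nodes
# 0      1  1
# 1      1  0
```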
/deepwalk/test/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/deepwalk/test/__init__.py
--------------------------------------------------------------------------------
/deepwalk/test/test_hetergraph.py:
--------------------------------------------------------------------------------
1 | import dgl
2 | import torch
3 |
4 |
5 | g = dgl.heterograph({('user', 'plays', 'game'): (torch.tensor([0]), torch.tensor([1])),
6 | ('developer', 'develops', 'game'): (torch.tensor([1]), torch.tensor([2]))})
7 |
8 | g.dstnodes('game')
9 |
10 | g.dstnodes['game'].data['h'] = torch.ones(3, 1)
11 | print(g.dstnodes['game'].data['h'])
12 |
13 |
14 | g = dgl.heterograph({('user', 'follows', 'user'): (torch.tensor([0]), torch.tensor([1])),
15 | ('developer', 'develops', 'game'): (torch.tensor([1]), torch.tensor([2]))})
16 |
17 |
18 | g.dstnodes('developer')
19 |
20 | g.dstnodes['developer'].data['h'] = torch.ones(2, 1)
21 | print(g.dstnodes['developer'].data['h'])
--------------------------------------------------------------------------------
/deepwalk/utils.py:
--------------------------------------------------------------------------------
1 | import yaml
2 |
3 |
4 | class AttrDict(dict):
5 | """Attr dict: make value private
6 | """
7 |
8 | def __init__(self, d):
9 | self.dict = d
10 |
11 | def __getattr__(self, attr):
12 | value = self.dict[attr]
13 | if isinstance(value, dict):
14 | return AttrDict(value)
15 | else:
16 | return value
17 |
18 | def __str__(self):
19 | return str(self.dict)
20 |
21 |
22 | def load_config(config_file):
23 | """Load config file"""
24 | with open(config_file) as f:
25 | if hasattr(yaml, 'FullLoader'):
26 | config = yaml.load(f, Loader=yaml.FullLoader)
27 | else:
28 | config = yaml.load(f)
29 | print(config)
30 | return AttrDict(config)
31 |
32 |
33 | def skip_gram_gen_pairs(walk, half_win_size=2):
34 | src, dst = list(), list()
35 |
36 | l = len(walk)
37 | # rnd = np.random.randint(1, half_win_size+1, dtype=np.int64, size=l)
38 | for i in range(l):
39 | real_win_size = half_win_size
40 | left = i - real_win_size
41 | if left < 0:
42 | left = 0
43 | right = i + real_win_size
44 | if right >= l:
45 | right = l - 1
46 | for j in range(left, right + 1):
47 | if walk[i] == walk[j]:
48 | continue
49 | src.append(walk[i])
50 | dst.append(walk[j])
51 | return src, dst
52 |
53 |
54 | if __name__ == "__main__":
55 | import argparse
56 |
57 | parser = argparse.ArgumentParser(description='text classification')
58 | parser.add_argument("-c", "--config", type=str, default="./config.yaml")
59 | args = parser.parse_args()
60 | config = load_config(args.config)
61 |
--------------------------------------------------------------------------------
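
A quick sketch of AttrDict in use (run from the deepwalk/ directory so the import resolves): nested dicts become chained attribute lookups.

```python
from utils import AttrDict

config = AttrDict({"lr": 0.005, "model": {"embed_dim": 64}})
print(config.lr)               # 0.005
print(config.model.embed_dim)  # 64 (nested dicts are wrapped on access)
```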
/gat/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/gat/__init__.py
--------------------------------------------------------------------------------
/gat/__pycache__/model.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/gat/__pycache__/model.cpython-38.pyc
--------------------------------------------------------------------------------
/gat/model.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class GATLayer(nn.Module):
7 | def __init__(self, g, in_feats, out_feats,
8 | feat_drop=0.6, attn_drop=0.6,
9 | negative_slope=0.2, residual=False, activation=None):
10 |         ''' define a GAT layer:
11 |         you can adjust the dropout rates and negative_slope to get better results
12 |         on the Cora dataset, because Cora is small and easy to overfit.
13 |         '''
14 | super(GATLayer, self).__init__()
15 | self.g = g
16 | self.dropout_feat = nn.Dropout(feat_drop)
17 | self.fc = nn.Linear(in_feats, out_feats, bias=False)
18 | self.dropout_attn = nn.Dropout(attn_drop)
19 | self.attention_func = nn.Linear(2 * out_feats, 1, bias=False)
20 | self.activation = activation
21 | self.leaky_relu = nn.LeakyReLU(negative_slope)
22 |
23 | def edge_attention(self, edges):
24 | concat_z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
25 | src_e = self.attention_func(concat_z)
26 | src_e = self.leaky_relu(src_e)
27 | return {'e': src_e}
28 |
29 | def message_func(self, edges):
30 | return {'z': edges.src['z'], 'e': edges.data['e']}
31 |
32 | def reduce_func(self, nodes):
33 | alpha = F.softmax(nodes.mailbox['e'], dim=1)
34 | alpha = self.dropout_attn(alpha) # add attention dropout
35 | h = torch.sum(alpha * nodes.mailbox['z'], dim=1)
36 | return {'h': h}
37 |
38 | def forward(self, h):
39 | h = self.dropout_feat(h) # add feat dropout
40 | z = self.fc(h)
41 | self.g.ndata['z'] = z
42 | self.g.apply_edges(self.edge_attention)
43 | self.g.update_all(self.message_func, self.reduce_func)
44 | return self.g.ndata.pop('h')
45 |
46 |
47 | class MultiHeadGATLayer(nn.Module):
48 | def __init__(self, g, in_dim, out_dim, num_heads, merge='cat'):
49 | super(MultiHeadGATLayer, self).__init__()
50 | self.heads = nn.ModuleList()
51 | for i in range(num_heads):
52 | self.heads.append(GATLayer(g, in_dim, out_dim))
53 | self.merge = merge
54 |
55 | def forward(self, h):
56 | head_outs = [attn_head(h) for attn_head in self.heads]
57 | if self.merge == 'cat':
58 | return torch.cat(head_outs, dim=1)
59 | else:
60 |             return torch.mean(torch.stack(head_outs), dim=0)  # average over heads, keeping (N, out_dim)
61 |
62 |
63 | class GATModel(nn.Module):
64 | def __init__(self, g, in_dim, hidden_dim, out_dim, num_heads):
65 | super(GATModel, self).__init__()
66 | self.in_dim = in_dim
67 | self.hidden_dim = hidden_dim
68 | self.out_dim = out_dim
69 | self.num_heads = num_heads
70 |
71 | self.layer1 = MultiHeadGATLayer(g, in_dim, hidden_dim, num_heads)
72 | # input dimension: hidden_dim * num_heads
73 |         # output: out_dim * 1 (a single head, since the previous layer already concatenated num_heads)
74 | self.layer2 = MultiHeadGATLayer(g, hidden_dim * num_heads, out_dim, 1)
75 |
76 | def forward(self, h):
77 | h = self.layer1(h)
78 | h = F.elu(h)
79 | h = self.layer2(h)
80 | return h
81 |
--------------------------------------------------------------------------------
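
The per-edge attention GATLayer computes, in the notation of the GAT paper (fc is $\mathbf{W}$, attention_func is $\mathbf{a}$, reduce_func does the softmax and the weighted sum):

$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}, \qquad \mathbf{h}_i' = \sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j$$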
/gat/train.py:
--------------------------------------------------------------------------------
1 | import time
2 | import argparse
3 | import numpy as np
4 |
5 | import torch
6 | import torch.nn.functional as F
7 |
8 | import dgl
9 | from dgl import DGLGraph
10 | from dgl.data import CoraGraphDataset
11 |
12 | from model import GATModel
13 |
14 |
15 | def load_cora_data(args):
16 | if args.dataset == 'cora':
17 | data = CoraGraphDataset()
18 | else:
19 | data = None
20 | raise NotImplementedError
21 |
22 | g = data[0]
23 | features = g.ndata['feat']
24 | labels = g.ndata['label']
25 | train_mask = g.ndata['train_mask']
26 | val_mask = g.ndata['val_mask']
27 | test_mask = g.ndata['test_mask']
28 | # num_feats = features.shape[1]
29 | # n_classes = data.num_labels
30 | # n_edges = data.graph.number_of_edges()
31 | return g, features, labels, train_mask, val_mask, test_mask
32 |
33 |
34 | def accuracy(logits, labels):
35 | _, indices = torch.max(logits, dim=1)
36 | correct = torch.sum(indices == labels)
37 | return correct.item() * 1.0 / len(labels)
38 |
39 |
40 | def evaluate(model, features, labels, mask):
41 | model.eval()
42 | with torch.no_grad():
43 | logits = model(features)
44 | logits = logits[mask]
45 | labels = labels[mask]
46 | return accuracy(logits, labels)
47 |
48 |
49 | def main(args):
50 | g, features, labels, train_mask, val_mask, test_mask = load_cora_data(args)
51 |
52 | if args.add_self_loop:
53 | # add self loop
54 | g = dgl.remove_self_loop(g)
55 | g = dgl.add_self_loop(g)
56 |
57 | model = GATModel(g,
58 | in_dim=features.size()[1],
59 | hidden_dim=8,
60 | out_dim=7,
61 | num_heads=8)
62 | # print(model)
63 |
64 | optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
65 |
66 | top_val_acc, top_test_acc = 0, 0
67 | cost_time = []
68 | for epoch in range(args.epochs):
69 | t0 = time.time()
70 |
71 | logits = model(features)
72 | logp = F.log_softmax(logits, 1)
73 | loss = F.nll_loss(logp[train_mask], labels[train_mask])
74 |
75 | optimizer.zero_grad()
76 | loss.backward()
77 | optimizer.step()
78 |
79 | val_acc = evaluate(model, features, labels, val_mask)
80 |
81 |         print("Epoch {:03d} | val acc {:.4f} | Loss {:.4f} | Time(s) {:.8f}".format(
82 |             epoch, val_acc, loss.item(), time.time() - t0))
83 |
84 | if top_val_acc <= val_acc:
85 | top_val_acc = val_acc
86 | acc = evaluate(model, features, labels, test_mask)
87 | top_test_acc = max(top_test_acc, acc)
88 | print("Test Accuracy {:.4f}".format(acc))
89 | print(f"Top Test Acc: {top_test_acc}")
90 |
91 | if __name__ == '__main__':
92 | parser = argparse.ArgumentParser(description='GAT')
93 | parser.add_argument("--dataset", type=str, default='cora',
94 | help="which dataset to use.")
95 | parser.add_argument("--gpu", type=int, default=-1,
96 | help="which GPU to use. Set -1 to use CPU.")
97 | parser.add_argument("--epochs", type=int, default=200,
98 | help="number of training epochs")
99 | parser.add_argument("--num-heads", type=int, default=8,
100 | help="number of hidden attention heads")
101 | parser.add_argument("--num-out-heads", type=int, default=1,
102 | help="number of output attention heads")
103 | parser.add_argument("--num-layers", type=int, default=2,
104 | help="number of hidden layers")
105 | parser.add_argument("--num-hidden", type=int, default=8,
106 | help="number of hidden units")
107 | parser.add_argument("--residual", action="store_true", default=False,
108 | help="use residual connection")
109 | parser.add_argument("--add_self_loop", type=bool, default=True,
110 | help="add self loop")
111 | parser.add_argument("--in-drop", type=float, default=.6,
112 | help="input feature dropout")
113 | parser.add_argument("--attn-drop", type=float, default=.6,
114 | help="attention dropout")
115 | parser.add_argument("--lr", type=float, default=0.005,
116 | help="learning rate")
117 | parser.add_argument('--weight-decay', type=float, default=5e-4,
118 | help="weight decay")
119 | parser.add_argument('--negative-slope', type=float, default=0.2,
120 | help="the negative slope of leaky relu")
121 | args = parser.parse_args()
122 | print(args)
123 |
124 | main(args)
125 |
--------------------------------------------------------------------------------
/gcn/build_graph.py:
--------------------------------------------------------------------------------
1 | import math
2 | import random
3 | import scipy.sparse as sp
4 | from scipy.sparse import coo_matrix, csr_matrix
5 | import numpy as np
6 |
7 | import dgl
8 | import torch
9 |
10 | from utils import preprocess_adj
11 | from dgl.data import CoraGraphDataset
12 |
13 |
14 | class GraphBuild(object):
15 | def __init__(self):
16 | # self.graph = self.build_graph_test()
17 | self.graph = self.build_graph_cora()
18 | self.adj = self.get_adj(self.graph)
19 | self.features = self.init_node_feat(self.graph)
20 |
21 | def build_graph_test(self):
22 | """a demo graph: just for graph test
23 | """
24 | src_nodes = torch.tensor([0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 6])
25 | dst_nodes = torch.tensor([1, 2, 0, 2, 0, 1, 3, 4, 5, 6, 2, 3, 3, 3])
26 | graph = dgl.graph((src_nodes, dst_nodes))
27 |         # edge weights: use existing edge weights if present, else 1
28 | graph.edata["w"] = torch.ones(graph.num_edges())
29 | return graph
30 |
31 | def build_graph_cora(self):
32 | # Default: ~/.dgl/
33 | data = CoraGraphDataset()
34 | graph = data[0]
35 |
36 | return graph
37 |
38 | def convert_symmetric(self, X, sparse=True):
39 | # add symmetric edges
40 | if sparse:
41 | X += X.T - sp.diags(X.diagonal())
42 | else:
43 | X += X.T - np.diag(X.diagonal())
44 | return X
45 |
46 | def add_self_loop(self, graph):
47 | # add self loop
48 | graph = dgl.remove_self_loop(graph)
49 | graph = dgl.add_self_loop(graph)
50 | return graph
51 |
52 | def get_adj(self, graph):
53 | graph = self.add_self_loop(graph)
54 |         # edge weights: use existing edge weights if present, else 1
55 | graph.edata["w"] = torch.ones(graph.num_edges())
56 | adj = coo_matrix((graph.edata["w"], (graph.edges()[0], graph.edges()[1])),
57 | shape=(graph.num_nodes(), graph.num_nodes()))
58 |
59 | # add symmetric edges
60 | adj = self.convert_symmetric(adj, sparse=True)
61 | # adj normalize and transform matrix to torch tensor type
62 | adj = preprocess_adj(adj, is_sparse=True)
63 |
64 | return adj
65 |
66 | def init_node_feat(self, graph):
67 | # init graph node features
68 | self.nfeat_dim = graph.number_of_nodes()
69 |
70 | row = list(range(self.nfeat_dim))
71 | col = list(range(self.nfeat_dim))
72 | indices = torch.from_numpy(
73 | np.vstack((row, col)).astype(np.int64))
74 | values = torch.ones(self.nfeat_dim)
75 |
76 | features = torch.sparse.FloatTensor(indices, values,
77 | (self.nfeat_dim, self.nfeat_dim))
78 | return features
79 |
80 |
81 | if __name__ == "__main__":
82 | GraphSet = GraphBuild()
83 | graph = GraphSet.graph
84 | graph = GraphSet.add_self_loop(graph)
85 | print(graph)
86 | # print(graph.ndata['feat'].shape)
87 | features = GraphSet.init_node_feat(graph) # (num_nodes, num_nodes)
88 | adj = GraphSet.get_adj(graph)
89 | print(features.shape, adj.shape)
90 | print(adj.shape) # (10556, 10556)
91 | print(graph.edges())
92 |
--------------------------------------------------------------------------------
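
When a graph has no real node features, init_node_feat falls back to identity (one-hot) features: with $X = I$, the first layer computes $\hat{A} I W = \hat{A} W$, so the rows of $W$ act as learned per-node embeddings. The dense equivalent of the sparse construction above:

```python
import torch

n = 4
features = torch.eye(n)  # one-hot feature per node, dense version of init_node_feat
print(features.shape)    # torch.Size([4, 4])
```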
/gcn/gcn_model.py:
--------------------------------------------------------------------------------
1 | import math
2 |
3 | import torch
4 | import torch.nn as nn
5 | from torch.nn.parameter import Parameter
6 | from torch.nn.modules.module import Module
7 |
8 |
9 | class GraphConvolution(Module):
10 | """
11 | Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
12 | """
13 |
14 | def __init__(self, in_features_dim, out_features_dim, activation=None, bias=True):
15 | super(GraphConvolution, self).__init__()
16 | self.in_features = in_features_dim
17 | self.out_features = out_features_dim
18 | self.activation = activation
19 | self.weight = Parameter(torch.FloatTensor(in_features_dim, out_features_dim))
20 | if bias:
21 | self.bias = Parameter(torch.FloatTensor(out_features_dim))
22 | else:
23 | self.register_parameter('bias', None)
24 | self.reset_parameters()
25 |
26 | def reset_parameters(self):
27 | stdv = 1. / math.sqrt(self.weight.size(1))
28 |         # self.weight.data.uniform_(-stdv, stdv)
29 | nn.init.xavier_uniform_(self.weight)
30 | if self.bias is not None:
31 |             # self.bias.data.uniform_(-stdv, stdv)
32 | nn.init.zeros_(self.bias)
33 |
34 | def forward(self, infeatn, adj):
35 | '''
36 | infeatn: init feature(H)
37 | adj: A
38 | '''
39 |         support = torch.spmm(infeatn, self.weight)  # H*W: (N, in_dim) x (in_dim, out_dim)
40 |         output = torch.spmm(adj, support)  # A*H*W: (N, N) x (N, out_dim)
41 | if self.bias is not None:
42 | output = output + self.bias
43 |
44 | if self.activation is not None:
45 | output = self.activation(output)
46 |
47 | return output
48 |
49 | def __repr__(self):
50 | return self.__class__.__name__ + ' (' \
51 | + str(self.in_features) + ' -> ' \
52 | + str(self.out_features) + ')'
53 |
54 |
55 | class GCN(Module):
56 | def __init__(self, nfeat, nhid, nclass, n_layers, activation, dropout):
57 | super(GCN, self).__init__()
58 | self.layers = nn.ModuleList()
59 | # input layer
60 | self.layers.append(GraphConvolution(nfeat, nhid, activation=activation))
61 | # hidden layers
62 | for i in range(n_layers - 1):
63 | self.layers.append(GraphConvolution(nhid, nhid, activation=activation))
64 | # output layer
65 | self.layers.append(GraphConvolution(nhid, nclass))
66 | self.dropout = torch.nn.Dropout(p=dropout)
67 |
68 | def forward(self, x, adj):
69 |
70 | h = x
71 | for i, layer in enumerate(self.layers):
72 | if i != 0:
73 | h = self.dropout(h)
74 | h = layer(h, adj)
75 | return h
76 |
--------------------------------------------------------------------------------
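
The layer above is the propagation rule of Kipf & Welling (https://arxiv.org/abs/1609.02907): the symmetric normalization is done once in gcn/utils.py, and GraphConvolution.forward then computes $HW$ followed by $\hat{A}(HW)$:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right), \qquad \tilde{A} = A + I_N$$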
/gcn/test_api/nx_to_scipy_sparse_matrix_test.py:
--------------------------------------------------------------------------------
1 | from scipy.sparse import coo_matrix, csr_matrix
2 | import numpy as np
3 | import networkx as nx
4 | import scipy as sp
5 |
6 | def nx_test_to_scipy_sparse_matrix1():
7 | G = nx.Graph([(1, 1)])
8 | A = nx.to_scipy_sparse_matrix(G)
9 | print(A.todense())
10 |
11 | A.setdiag(A.diagonal() * 2)
12 | print(A.todense())
13 |
14 | def nx_test_to_scipy_sparse_matrix2():
15 | G = nx.MultiDiGraph()
16 | G.add_edge(0, 1, weight=2)
17 | G.add_edge(1, 0)
18 | G.add_edge(2, 2, weight=3)
19 | G.add_edge(2, 2)
20 | S = nx.to_scipy_sparse_matrix(G, nodelist=[0, 1, 2])
21 | print(S.shape)
22 | print(S)
23 | print(S.todense())
24 |
25 | def test_sp_csr_matrix():
26 | indptr = np.array([0, 2, 3, 6])
27 | indices = np.array([0, 2, 2, 0, 1, 2])
28 | data = np.array([1, 2, 3, 4, 5, 6])
29 | csr = csr_matrix((data, indices, indptr), shape=(3, 3))
30 | print(csr)
31 | print(csr.toarray())
32 |
33 |
34 | def test_sp_coo_matrix():
35 | row = np.array([0, 0, 1, 3, 1, 0, 0])
36 | col = np.array([0, 2, 1, 3, 1, 0, 0])
37 | data = np.array([1, 1, 1, 1, 1, 1, 1])
38 | coo = coo_matrix((data, (row, col)), shape=(4, 4))
39 | print(coo)
40 | print(coo.toarray())
41 |
42 |
43 | test_sp_coo_matrix()
--------------------------------------------------------------------------------
/gcn/train.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import argparse
4 | import numpy as np
5 |
6 | import torch
7 | import torch.nn.functional as F
8 | from torch.utils.data import Dataset, DataLoader
9 |
10 | import dgl
11 |
12 | from build_graph import GraphBuild
13 | from gcn_model import GCN
14 |
15 |
16 | def evaluate(model, features, adj, labels, mask):
17 | model.eval()
18 | with torch.no_grad():
19 | logits = model(features, adj)
20 | logits = logits[mask]
21 | labels = labels[mask]
22 | _, indices = torch.max(logits, dim=1)
23 | correct = torch.sum(indices == labels)
24 | return correct.item() * 1.0 / len(labels)
25 |
26 |
27 | def main(args):
28 | if args.gpu < 0:
29 | cuda = False
30 | else:
31 | cuda = True
32 | GraphSet = GraphBuild()
33 |
34 | graph = GraphSet.build_graph_cora()
35 | labels = graph.ndata['label']
36 | train_mask = graph.ndata['train_mask']
37 | val_mask = graph.ndata['val_mask']
38 | test_mask = graph.ndata['test_mask']
39 | print(graph.nodes())
40 |
41 | features = graph.ndata['feat'] # shape [2708, 1433]
42 | # features = GraphSet.init_node_feat(graph) # (num_nodes, num_nodes) Test accuracy 66.20%
43 | adj = GraphSet.get_adj(graph)
44 | print(features.shape, adj.shape) # [2708, 1433], [2708, 2708]
45 | # sys.exit()
46 |
47 | model = GCN(nfeat=features.shape[1], nhid=args.dim, nclass=7, n_layers=args.n_layers, activation=F.relu,
48 | dropout=args.dropout)
49 | if cuda:
50 | model.cuda()
51 |
52 | loss_fcn = torch.nn.CrossEntropyLoss()
53 | # use optimizer
54 | optimizer = torch.optim.Adam(model.parameters(),
55 | lr=args.lr,
56 | weight_decay=args.weight_decay)
57 | # scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=0.96) # lr decay
58 |
59 | dur = list()
60 | for epoch in range(args.epochs):
61 | model.train()
62 | if epoch >= 3:
63 | t0 = time.time()
64 | # forward
65 | logits = model(features, adj)
66 | loss = loss_fcn(logits[train_mask], labels[train_mask])
67 |
68 | optimizer.zero_grad()
69 | loss.backward()
70 | optimizer.step()
71 |
72 | if epoch >= 3:
73 | dur.append(time.time() - t0)
74 | acc = evaluate(model, features, adj, labels, val_mask)
75 | print("Epoch {:05d} | Time(s) {:.4f} | Loss {:.4f} | Accuracy {:.4f} | "
76 | "ETputs(KTEPS) {:.2f}".format(epoch, np.mean(dur), loss.item(),
77 | acc, graph.number_of_edges() / np.mean(dur) / 1000))
78 | # scheduler.step()
79 |
80 | print()
81 | acc = evaluate(model, features, adj, labels, test_mask)
82 | print("Test accuracy {:.2%}".format(acc)) # Test accuracy ~0.806 (0.793-0.819) (paper: 0.815)
83 |
84 |
85 | if __name__ == "__main__":
86 | parser = argparse.ArgumentParser(description='Graph Parameters Set.')
87 |     parser.add_argument('--gpu', metavar='N', type=int, default=-1,
88 |                         help='which GPU to use, -1 for CPU')
89 |     parser.add_argument('--batch_size', metavar='N', type=int, default=128,
90 |                         help='batch size')
91 |     parser.add_argument('--num_workers', metavar='N', type=int, default=6,
92 |                         help='number of dataloader workers')
93 | parser.add_argument("--epochs", type=int, default=200,
94 | help="number of training epochs")
95 |     parser.add_argument("--n_layers", type=int, default=1,
96 |                         help="number of hidden layers")
97 |     parser.add_argument('--dim', metavar='N', type=int, default=128,
98 |                         help='hidden dimension')
99 | parser.add_argument("--lr", type=float, default=1e-2,
100 | help="learning rate")
101 | parser.add_argument("--weight-decay", type=float, default=5e-4,
102 | help="Weight for L2 loss")
103 | parser.add_argument("--dropout", type=float, default=0.1,
104 | help="dropout probability")
105 |
106 | args = parser.parse_args()
107 | main(args)
108 |
--------------------------------------------------------------------------------
/gcn/utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 | import scipy.sparse as sp
4 |
5 |
6 | def preprocess_adj(adj, is_sparse=False):
7 | """Preprocessing of adjacency matrix for simple pygGCN model and conversion to
8 | tuple representation."""
9 | adj_normalized = normalize_adj(adj + sp.eye(adj.shape[0]))
10 | if is_sparse:
11 | adj_normalized = sparse_mx_to_torch_sparse_tensor(adj_normalized)
12 | return adj_normalized
13 | else:
14 | return torch.from_numpy(adj_normalized.A).float()
15 |
16 |
17 | def sparse_mx_to_torch_sparse_tensor(sparse_mx):
18 | """Convert a scipy sparse matrix to a torch sparse tensor."""
19 | sparse_mx = sparse_mx.tocoo().astype(np.float32)
20 | indices = torch.from_numpy(
21 | np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
22 | values = torch.from_numpy(sparse_mx.data)
23 | shape = torch.Size(sparse_mx.shape)
24 | return torch.sparse.FloatTensor(indices, values, shape)
25 |
26 |
27 | def normalize_adj(adj):
28 | """Symmetrically normalize adjacency matrix."""
29 | adj = sp.coo_matrix(adj)
30 | # print(f"sparse adj: {adj}")
31 | rowsum = np.array(adj.sum(1))
32 | # print(f"rowsum: {rowsum.shape}")
33 | d_inv_sqrt = np.power(rowsum, -0.5).flatten()
34 | d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
35 | d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
36 | # print(d_mat_inv_sqrt)
37 | # D^(-1/2)AD^(-1/2)
38 | return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()
39 | # return d_mat_inv_sqrt.dot(adj).transpose().dot(d_mat_inv_sqrt).tocoo()
40 |
--------------------------------------------------------------------------------
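
A toy check of the normalization (run from the gcn/ directory so the import resolves): a single undirected edge plus self loops should normalize to 0.5 everywhere.

```python
import numpy as np
import scipy.sparse as sp
from utils import normalize_adj

# A = [[0,1],[1,0]]; A + I has row sums of 2, so D^(-1/2) (A+I) D^(-1/2) = 0.5.
adj = sp.coo_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))
norm = normalize_adj(adj + sp.eye(2))
print(norm.toarray())
# [[0.5 0.5]
#  [0.5 0.5]]
```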
/graphsage/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/__init__.py
--------------------------------------------------------------------------------
/graphsage/link_prediction/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/link_prediction/__init__.py
--------------------------------------------------------------------------------
/graphsage/link_prediction/__pycache__/dataloader.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/link_prediction/__pycache__/dataloader.cpython-38.pyc
--------------------------------------------------------------------------------
/graphsage/link_prediction/__pycache__/model.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/link_prediction/__pycache__/model.cpython-38.pyc
--------------------------------------------------------------------------------
/graphsage/link_prediction/dataloader.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import argparse
3 | import torch
4 | from torch.utils.data import IterableDataset, Dataset, DataLoader
5 |
6 | import dgl
7 | from dgl.data import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset, CoraGraphDataset
8 |
9 | import random
10 | from loguru import logger
11 | random.seed(123)
12 |
13 |
14 | class NodesSet(Dataset):
15 | def __init__(self, g, neg_num=1):
16 | # only load masked node for training/testing
17 | self.g = g
18 | self.nodes = g.nodes().tolist()
19 | self.neg_num = neg_num # wait to complement
20 |
21 | def __len__(self):
22 | return len(self.nodes)
23 |
24 | def __getitem__(self, index):
25 | heads = self.nodes[index]
26 | pos_nodes = dgl.sampling.random_walk(self.g,
27 | heads,
28 | length=1)[0][:, 1].tolist()[0]
29 |
30 | neg_nodes = random.sample(self.nodes, k=self.neg_num)[0]
31 | # logger.info(f"heads: {heads}")
32 | # logger.info(f"pos_nodes: {pos_nodes}")
33 | # logger.info(f"neg_nodes: {neg_nodes}")
34 |
35 | return heads, pos_nodes, neg_nodes
36 |
37 |
38 | class NodesGraphCollactor(object):
39 | """
40 |     select the neighbors of heads/tails/neg_tails for aggregation
41 | """
42 |
43 | def __init__(self, g, neighbors_every_layer=[5, 1]):
44 | self.g = g
45 | self.nodes = g.nodes().tolist()
46 | self.neighbors_every_layer = neighbors_every_layer
47 |
48 |
49 | def __call__(self, batch):
50 | # logger.info(f"batch: {batch}")
51 | # pos_nodes, neg_nodes = self.sample_pos_neg_nodes(batch)
52 | # heads: [2569, 741]
53 | # pos_nodes: tensor([2268, 1423])
54 | # neg_nodes: [[1827, 1051, 1862, 477, 1595], [1907, 634, 88, 495, 2697]]
55 | heads = [b[0] for b in batch]
56 | tails = [b[1] for b in batch]
57 | neg_tails = [b[2] for b in batch]
58 |
59 | heads, tails, neg_tails = torch.tensor(heads), torch.tensor(tails), torch.tensor(neg_tails)
60 | # logger.info(heads, tails, neg_tails)
61 | pos_graph, neg_graph, blocks, all_seeds = self.sample_from_item_pairs(heads, tails, neg_tails)
62 |
63 | return pos_graph, neg_graph, blocks, set(all_seeds)
64 |
65 | def sample_from_item_pairs(self, heads, tails, neg_tails):
66 | # Create a graph with positive connections only and another graph with negative
67 | # connections only.
68 | pos_graph = dgl.graph(
69 | (heads, tails),
70 | num_nodes=self.g.number_of_nodes())
71 | neg_graph = dgl.graph(
72 | (heads, neg_tails),
73 | num_nodes=self.g.number_of_nodes())
74 | pos_graph, neg_graph = dgl.compact_graphs([pos_graph, neg_graph])
75 | seeds = pos_graph.ndata[dgl.NID]
76 |
77 | blocks, all_seeds = self.sample_blocks(seeds, heads, tails, neg_tails)
78 | return pos_graph, neg_graph, blocks, all_seeds
79 |
80 | def sample_blocks(self, seeds, heads=None, tails=None, neg_tails=None):
81 | blocks, all_seeds = [], []
82 | for n_neighbors in self.neighbors_every_layer:
83 | frontier = dgl.sampling.sample_neighbors(
84 | self.g,
85 | seeds,
86 | fanout=n_neighbors,
87 | edge_dir='in')
88 | if heads is not None:
89 | eids = frontier.edge_ids(torch.cat([heads, heads]), torch.cat([tails, neg_tails]), return_uv=True)[2]
90 | if len(eids) > 0:
91 | old_frontier = frontier
92 | frontier = dgl.remove_edges(old_frontier, eids)
93 | block = self.compact_and_copy(frontier, seeds)
94 |             seeds = block.srcdata[dgl.NID]  # return this layer's src nodes as the next seeds
95 | # logger.info(f"seeds: {seeds}")
96 | all_seeds += seeds.tolist()
97 | blocks.insert(0, block)
98 | return blocks, all_seeds
99 |
100 | def compact_and_copy(self, frontier, seeds):
101 |         # compact the frontier and the current seeds into a block,
102 |         # with seeds as output nodes and everything else as input nodes
103 | block = dgl.to_block(frontier, seeds)
104 | for col, data in frontier.edata.items():
105 | if col == dgl.EID:
106 | continue
107 | block.edata[col] = data[block.edata[dgl.EID]]
108 | return block
109 |
110 |
111 | def build_cora_dataset(add_symmetric_edges=True, add_self_loop=True):
112 | dataset = CoraGraphDataset()
113 | graph = dataset[0]
114 |
115 | train_mask = graph.ndata['train_mask']
116 | val_mask = graph.ndata['val_mask']
117 | test_mask = graph.ndata['test_mask']
118 | labels = graph.ndata['label']
119 | feat = graph.ndata['feat']
120 |
121 | if add_symmetric_edges:
122 | edges = graph.edges()
123 | graph.add_edges(edges[1], edges[0])
124 |
125 | graph = dgl.remove_self_loop(graph)
126 | if add_self_loop:
127 | graph = dgl.add_self_loop(graph)
128 | return graph
129 |
130 |
131 | if __name__ == "__main__":
132 | parser = argparse.ArgumentParser(description="parameter set")
133 | parser.add_argument('--dataset', type=str, default='aifb')
134 |
135 | args = parser.parse_args()
136 | graph = build_cora_dataset()
137 | train_mask = graph.ndata['train_mask']
138 |
139 | batch_sampler = NodesSet(graph, train_mask)
140 | collator = NodesGraphCollactor(graph, neighbors_every_layer=[5, 2])
141 | dataloader = DataLoader(
142 | batch_sampler,
143 | batch_size=2,
144 | shuffle=True,
145 | num_workers=1,
146 | collate_fn=collator
147 | )
148 | # for step, (input_nodes, pos_graph, neg_graph, blocks) in enumerate(dataloader):
149 |     for step, (pos_graph, neg_graph, blocks, all_seeds) in enumerate(dataloader):
150 |         logger.info(f"---------step: {step}")
151 |         logger.info(f"pos_graph: {pos_graph}")
152 |         logger.info(f"neg_graph: {neg_graph}")
153 |         logger.info(f"blocks: {blocks}")
154 |         logger.info(f"all_seeds: {all_seeds}")
155 |
156 | break
157 |
--------------------------------------------------------------------------------
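The sampling loop in NodesGraphCollactor above is the core pattern of this file: sample a frontier of in-neighbors around the seeds, compact it into a bipartite block, and reuse the block's source nodes as the next layer's seeds. A minimal, self-contained sketch of just that loop (assuming dgl==0.7.1 as pinned in requirements.txt; the toy graph and fanouts are illustrative):

import dgl
import torch

g = dgl.graph((torch.tensor([0, 1, 2, 3, 4]), torch.tensor([1, 2, 3, 4, 0])))
seeds = torch.tensor([0, 2])
blocks = []
for fanout in [5, 2]:  # one entry per GNN layer, like neighbors_every_layer
    # sample up to `fanout` in-neighbors of the current seeds
    frontier = dgl.sampling.sample_neighbors(g, seeds, fanout=fanout, edge_dir='in')
    # compact into a bipartite block; the seeds become its dst (output) nodes
    block = dgl.to_block(frontier, seeds)
    # the block's src (input) nodes seed the next, outer layer
    seeds = block.srcdata[dgl.NID]
    blocks.insert(0, block)  # outermost layer first, matching SAGENet.forward
print([(b.number_of_src_nodes(), b.number_of_dst_nodes()) for b in blocks])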
/graphsage/link_prediction/model.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | from torch.nn.parameter import Parameter
5 |
6 | import dgl
7 | import dgl.nn.pytorch as dglnn
8 | import dgl.function as fn
9 |
10 |
11 | class WeightedSAGEConv(nn.Module):
12 | def __init__(self, input_dims, output_dims, act=F.relu, dropout=0.5, bias=True):
13 | super().__init__()
14 |
15 | self.act = act
16 | self.dropout = nn.Dropout(dropout)
17 | self.Q = nn.Linear(input_dims, output_dims)
18 | self.W = nn.Linear(input_dims + output_dims, output_dims)
19 | if bias:
20 | self.bias = Parameter(torch.FloatTensor(output_dims))
21 | else:
22 | self.register_parameter('bias', None)
23 | # self.dropout = nn.Dropout(dropout)
24 | self.reset_parameters()
25 |
26 | def reset_parameters(self):
27 | gain = nn.init.calculate_gain('relu')
28 | nn.init.xavier_uniform_(self.Q.weight, gain=gain)
29 | nn.init.xavier_uniform_(self.W.weight, gain=gain)
30 | nn.init.constant_(self.Q.bias, 0)
31 | nn.init.constant_(self.W.bias, 0)
32 | if self.bias is not None:
33 | nn.init.zeros_(self.bias)
34 |
35 | def forward(self, g, h, weights=None):
36 | """
37 | g : graph
38 | h : node features
39 | weights : scalar edge weights
40 | """
41 | h_src, h_dst = h
42 | with g.local_scope():
43 |             if weights is not None:
44 | g.srcdata['n'] = self.act(self.Q(self.dropout(h_src)))
45 | g.edata['w'] = weights.float()
46 | g.update_all(fn.u_mul_e('n', 'w', 'm'), fn.sum('m', 'n'))
47 | g.update_all(fn.copy_e('w', 'm'), fn.sum('m', 'ws'))
48 | n = g.dstdata['n']
49 | ws = g.dstdata['ws'].unsqueeze(1).clamp(min=1)
50 | z = self.act(self.W(self.dropout(torch.cat([n / ws, h_dst], 1))))
51 | z_norm = z.norm(2, 1, keepdim=True)
52 | z_norm = torch.where(z_norm == 0, torch.tensor(1.).to(z_norm), z_norm)
53 | z = z / z_norm
54 | else:
55 | g.srcdata['n'] = self.Q(h_src)
56 | g.update_all(fn.copy_src('n', 'm'), fn.mean('m', 'neigh')) # aggregation
57 | n = g.dstdata['neigh']
58 |                 z = self.act(self.W(torch.cat([n, h_dst], 1))) + (self.bias if self.bias is not None else 0)
59 | z_norm = z.norm(2, 1, keepdim=True)
60 | z_norm = torch.where(z_norm == 0, torch.tensor(1.).to(z_norm), z_norm)
61 | z = z / z_norm
62 | return z
63 |
64 |
65 | class SAGENet(nn.Module):
66 | def __init__(self, input_dim, hidden_dims, output_dims,
67 | n_layers, act=F.relu, dropout=0.5):
68 | super().__init__()
69 | self.convs = nn.ModuleList()
70 | self.convs.append(WeightedSAGEConv(input_dim, hidden_dims, act, dropout))
71 | for _ in range(n_layers - 2):
72 | self.convs.append(WeightedSAGEConv(hidden_dims, hidden_dims,
73 | act, dropout))
74 | self.convs.append(WeightedSAGEConv(hidden_dims, output_dims,
75 | act, dropout))
76 | self.dropout = nn.Dropout(dropout)
77 | # self.act = act
78 |
79 | def forward(self, blocks, h):
80 | for l, (layer, block) in enumerate(zip(self.convs, blocks)):
81 |             h_dst = h[:block.number_of_nodes('DST/' + block.ntypes[0])]  # take only the dst nodes; aggregate bottom-up to obtain the head nodes
82 | h = layer(block, (h, h_dst))
83 | if l != len(self.convs) - 1:
84 | h = self.dropout(h)
85 | return h
86 |
--------------------------------------------------------------------------------
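A quick, hypothetical shape check for WeightedSAGEConv above (not part of the repo; it assumes the script sits next to this model.py so the class can be imported, dgl==0.7.1):

import dgl
import torch
from model import WeightedSAGEConv  # the layer defined above

conv = WeightedSAGEConv(input_dims=8, output_dims=16)
g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([3, 3, 4])))
block = dgl.to_block(g, dst_nodes=torch.tensor([3, 4]))
h_src = torch.randn(block.number_of_src_nodes(), 8)
h_dst = h_src[:block.number_of_dst_nodes()]  # dst nodes come first among src nodes
out = conv(block, (h_src, h_dst))            # weights=None takes the mean branch
print(out.shape)                             # torch.Size([2, 16]), rows L2-normalized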
/graphsage/link_prediction/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | import argparse
3 | from tqdm import tqdm
4 | from loguru import logger
5 | import numpy as np
6 |
7 | import torch
8 | import torch.nn as nn
9 | import torch.nn.functional as F
10 | import torch.optim as optim
11 | import torch.backends.cudnn as cudnn
12 | from torch.utils.data import IterableDataset, Dataset, DataLoader
13 |
14 | import dgl.function as fn
15 | import sklearn.linear_model as lm
16 | import sklearn.metrics as skm
17 |
18 | from dataloader import build_cora_dataset, NodesSet, NodesGraphCollactor
19 | from model import SAGENet
20 |
21 |
22 | class CrossEntropyLoss(nn.Module):
23 | def forward(self, block_outputs, pos_graph, neg_graph):
24 | with pos_graph.local_scope():
25 | pos_graph.ndata['h'] = block_outputs
26 | pos_graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
27 | pos_score = pos_graph.edata['score']
28 | with neg_graph.local_scope():
29 | neg_graph.ndata['h'] = block_outputs
30 | neg_graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
31 | neg_score = neg_graph.edata['score']
32 |
33 | score = torch.cat([pos_score, neg_score])
34 |         label = torch.cat([torch.ones_like(pos_score), torch.zeros_like(neg_score)])
35 |         loss = F.binary_cross_entropy_with_logits(score, label)
36 | return loss
37 |
38 |
39 | def load_subtensor(nfeat, seeds, device='cpu'):
40 | """
41 |     Extracts features for a subset of nodes
42 | """
43 | # logger.info(len(seeds))
44 | seeds_feats = nfeat[list(seeds)].to(device)
45 | # batch_labels = labels[input_nodes].to(device)
46 | return seeds_feats
47 |
48 |
49 | def compute_acc_unsupervised(emb, graph):
50 | """
51 |     Fit a logistic-regression probe on the embeddings and report micro-F1 on the train/val/test splits.
52 | """
53 |
54 | train_mask = graph.ndata['train_mask']
55 | test_mask = graph.ndata['test_mask']
56 | val_mask = graph.ndata['val_mask']
57 | train_nids = torch.LongTensor(np.nonzero(train_mask)).squeeze().cpu().numpy()
58 | val_nids = torch.LongTensor(np.nonzero(val_mask)).squeeze().cpu().numpy()
59 | test_nids = torch.LongTensor(np.nonzero(test_mask)).squeeze().cpu().numpy()
60 |
61 | emb = emb.cpu().detach().numpy()
62 | labels = graph.ndata['label'].cpu().numpy()
63 | train_labels = labels[train_nids]
64 | val_labels = labels[val_nids]
65 | test_labels = labels[test_nids]
66 |
67 | emb = (emb - emb.mean(0, keepdims=True)) / emb.std(0, keepdims=True)
68 |
69 | lr = lm.LogisticRegression(multi_class='multinomial', max_iter=1000)
70 | lr.fit(emb[train_nids], train_labels)
71 |
72 | pred = lr.predict(emb)
73 | f1_micro_train = skm.f1_score(train_labels, pred[train_nids], average='micro')
74 | f1_micro_eval = skm.f1_score(val_labels, pred[val_nids], average='micro')
75 | f1_micro_test = skm.f1_score(test_labels, pred[test_nids], average='micro')
76 | return f1_micro_train, f1_micro_eval, f1_micro_test
77 |
78 |
79 | def train(args, graph):
80 | # os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
81 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
82 | cudnn.benchmark = True
83 |     # graph.to(device) is not in-place; keep the graph on CPU for sampling and move blocks/features per batch
84 |
85 | features = graph.ndata['feat']
86 | in_feats = features.shape[1]
87 |     n_classes = 7  # Cora has 7 classes
88 |
89 | collator = NodesGraphCollactor(graph, neighbors_every_layer=args.neighbors_every_layer)
90 |
91 | batch_sampler = NodesSet(graph)
92 | data_loader = DataLoader(
93 | batch_sampler,
94 | batch_size=512,
95 | shuffle=True,
96 | num_workers=6,
97 | collate_fn=collator
98 | )
99 |
100 |     # aggregate over (effectively) all neighbors while testing
101 | test_collator = NodesGraphCollactor(graph, neighbors_every_layer=[10000])
102 | test_data_loader = DataLoader(
103 | batch_sampler,
104 | batch_size=10000,
105 | shuffle=False,
106 | num_workers=6,
107 | collate_fn=test_collator
108 | )
109 |
110 | # Define model and optimizer
111 | model = SAGENet(in_feats, args.num_hidden, n_classes,
112 | args.num_layers, F.relu, args.dropout)
113 |     model.to(device)
114 |
115 | # loss_fcn = nn.CrossEntropyLoss()
116 | loss_fcn = CrossEntropyLoss()
117 | optimizer = optim.AdamW(model.parameters(), lr=args.lr, betas=(0.9, 0.999), eps=1e-08,
118 | weight_decay=0.1, amsgrad=False)
119 | top_acc, top_f1 = 0, 0
120 | for epoch in range(args.num_epochs):
121 | acc_cnt = 0
122 | for step, (pos_graph, neg_graph, blocks, all_seeds) in enumerate(data_loader):
123 | # pos_nodes, neg_nodes_batch = collator.sample_pos_neg_nodes(batch)
124 | # logger.info(len(batch), len(all_seeds))
125 | # logger.info(len(pos_nodes), len(neg_nodes_batch), torch.tensor(neg_nodes_batch).shape)
126 | feats = load_subtensor(features, all_seeds, device=device)
127 | # pos_feats = load_subtensor(features, pos_nodes, device=device)
128 | # neg_feats = load_subtensor(features, neg_nodes_batch, device=device)
129 | # logger.info(heads_feats.shape, pos_feats.shape, neg_feats.shape)
130 | blocks = [b.to(device) for b in blocks]
131 | pos_graph = pos_graph.to(device)
132 | neg_graph = neg_graph.to(device)
133 |             batch_pred = model(blocks, feats)
134 |             loss = loss_fcn(batch_pred, pos_graph, neg_graph)
135 |             optimizer.zero_grad()
136 |             loss.backward()
137 |             optimizer.step()
138 |             # batch_acc_cnt = (torch.argmax(batch_pred, dim=1) == batch_labels.long()).float().sum()
139 | # acc_cnt += int(batch_acc_cnt)
140 | logger.info(f"Train Epoch:{epoch}, Loss:{loss}")
141 |
142 | # evaluation
143 | for step, (pos_graph, neg_graph, blocks, all_seeds) in enumerate(test_data_loader):
144 | feats = load_subtensor(features, all_seeds, device=device)
145 | blocks = [b.to(device) for b in blocks]
146 |             batch_pred = model(blocks, feats)
147 |
148 |         f1_micro_train, f1_micro_eval, f1_micro_test = compute_acc_unsupervised(batch_pred, graph)
149 | if top_f1 < f1_micro_test:
150 | top_f1 = f1_micro_test
151 | logger.info(
152 | f" train f1:{f1_micro_train}, Val micro F1: {f1_micro_eval}, Test micro F1:{f1_micro_test}, TOP micro F1:{top_f1}")
153 |
154 |
155 | if __name__ == "__main__":
156 | parser = argparse.ArgumentParser(description="parameter set")
157 | parser.add_argument('--num_epochs', type=int, default=64)
158 | parser.add_argument('--num_hidden', type=int, default=256)
159 | parser.add_argument('--dropout', type=float, default=0.0)
160 | parser.add_argument('--lr', type=float, default=0.001)
161 | parser.add_argument('--num_layers', type=int, default=2)
162 | # TODO: multiple negative nodes
163 |     parser.add_argument('--neighbors_every_layer', type=int, nargs='+', default=[10], help="fanout per layer, e.g. 10 5")
164 | parser.add_argument("--gpu", type=str, default='0',
165 | help="gpu or cpu")
166 | args = parser.parse_args()
167 | graph = build_cora_dataset(add_symmetric_edges=True, add_self_loop=True)
168 |
169 | train(args, graph)
170 |
--------------------------------------------------------------------------------
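Despite its name, the CrossEntropyLoss class in this train.py is a binary cross-entropy over dot-product edge scores: edges of pos_graph are labeled 1, edges of neg_graph 0. The same computation without DGL, as a standalone sketch with made-up indices and dimensions:

import torch
import torch.nn.functional as F

emb = torch.randn(6, 16)  # stand-in for SAGENet's output embeddings
pos_src, pos_dst = torch.tensor([0, 1]), torch.tensor([2, 3])  # observed edges
neg_src, neg_dst = torch.tensor([0, 1]), torch.tensor([4, 5])  # sampled non-edges

pos_score = (emb[pos_src] * emb[pos_dst]).sum(dim=1)  # u_dot_v, per edge
neg_score = (emb[neg_src] * emb[neg_dst]).sum(dim=1)
score = torch.cat([pos_score, neg_score])
label = torch.cat([torch.ones_like(pos_score), torch.zeros_like(neg_score)])
print(F.binary_cross_entropy_with_logits(score, label))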
/graphsage/node_classification/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/node_classification/__init__.py
--------------------------------------------------------------------------------
/graphsage/node_classification/__pycache__/dataloader.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/node_classification/__pycache__/dataloader.cpython-38.pyc
--------------------------------------------------------------------------------
/graphsage/node_classification/__pycache__/model.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/node_classification/__pycache__/model.cpython-38.pyc
--------------------------------------------------------------------------------
/graphsage/node_classification/dataloader.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import torch
3 | from torch.utils.data import IterableDataset, Dataset, DataLoader
4 |
5 | import dgl
6 | from dgl.data import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset, CoraGraphDataset
7 |
8 | import random
9 |
10 | random.seed(123)
11 |
12 |
13 | class HomoNodesSet(Dataset):
14 | def __init__(self, g, mask):
15 | # only load masked node for training/testing
16 | self.g = g
17 | self.nodes = g.nodes()[mask].tolist()
18 |
19 | def __len__(self):
20 | return len(self.nodes)
21 |
22 | def __getitem__(self, index):
23 | heads = self.nodes[index]
24 | return heads
25 |
26 |
27 | class NodesGraphCollactor(object):
28 | """
29 |     Select the seed nodes' neighbors for multi-layer aggregation.
30 | """
31 |
32 | def __init__(self, g, neighbors_every_layer=[5, 1]):
33 | self.g = g
34 | self.neighbors_every_layer = neighbors_every_layer
35 |
36 | def __call__(self, batch):
37 | blocks, seeds = self.sample_blocks(batch)
38 | return batch, seeds, blocks
39 |
40 | def sample_blocks(self, seeds):
41 | blocks = []
42 | for n_neighbors in self.neighbors_every_layer:
43 | frontier = dgl.sampling.sample_neighbors(
44 | self.g,
45 | seeds,
46 | fanout=n_neighbors,
47 | edge_dir='in')
48 | block = self.compact_and_copy(frontier, seeds)
49 |             seeds = block.srcdata[dgl.NID]  # this should return this layer's src nodes
50 |
51 | blocks.insert(0, block)
52 | return blocks, seeds
53 |
54 | def compact_and_copy(self, frontier, seeds):
55 |         # compact this round's dst (seed) nodes and the frontier into a block,
56 |         # marking the seeds as output nodes; all other nodes become input nodes
57 | block = dgl.to_block(frontier, seeds)
58 | for col, data in frontier.edata.items():
59 | if col == dgl.EID:
60 | continue
61 | block.edata[col] = data[block.edata[dgl.EID]]
62 | return block
63 |
64 |
65 | def build_graph(args):
66 |     # load graph dataset
67 | if args.dataset == 'aifb':
68 | dataset = AIFBDataset()
69 | elif args.dataset == 'mutag':
70 | dataset = MUTAGDataset()
71 | elif args.dataset == 'bgs':
72 | dataset = BGSDataset()
73 | elif args.dataset == 'am':
74 | dataset = AMDataset()
75 |     elif args.dataset == 'cora':
76 |         dataset = CoraGraphDataset()
77 | else:
78 |         raise ValueError(f"unknown dataset: {args.dataset}")
79 |
80 | g = dataset[0]
81 | category = dataset.predict_category
82 | num_classes = dataset.num_classes
83 | train_mask = g.nodes[category].data.pop('train_mask')
84 | test_mask = g.nodes[category].data.pop('test_mask')
85 | train_idx = torch.nonzero(train_mask, as_tuple=False).squeeze()
86 | test_idx = torch.nonzero(test_mask, as_tuple=False).squeeze()
87 | labels = g.nodes[category].data.pop('labels')
88 | print(g)
89 | print(len(g.etypes), g.etypes)
90 | return g
91 |
92 |
93 | def build_cora_dataset(add_symmetric_edges=True, add_self_loop=True):
94 | dataset = CoraGraphDataset()
95 | graph = dataset[0]
96 |
97 | train_mask = graph.ndata['train_mask']
98 | val_mask = graph.ndata['val_mask']
99 | test_mask = graph.ndata['test_mask']
100 | labels = graph.ndata['label']
101 | feat = graph.ndata['feat']
102 |
103 | if add_symmetric_edges:
104 | edges = graph.edges()
105 | graph.add_edges(edges[1], edges[0])
106 |
107 | graph = dgl.remove_self_loop(graph)
108 | if add_self_loop:
109 | graph = dgl.add_self_loop(graph)
110 | return graph
111 |
112 |
113 | if __name__ == "__main__":
114 | parser = argparse.ArgumentParser(description="parameter set")
115 | parser.add_argument('--dataset', type=str, default='aifb')
116 |
117 | args = parser.parse_args()
118 | # graph = build_graph(args)
119 | graph = build_cora_dataset()
120 | train_mask = graph.ndata['train_mask']
121 |
122 | batch_sampler = HomoNodesSet(graph, train_mask)
123 | collator = NodesGraphCollactor(graph, neighbors_every_layer=[5, 2])
124 | dataloader = DataLoader(
125 | batch_sampler,
126 | batch_size=2,
127 | shuffle=True,
128 | num_workers=1,
129 | collate_fn=collator
130 | )
131 |
132 |     for step, (batch, seeds, blocks) in enumerate(dataloader):
133 |         print("------------------------")
134 |         print(batch)
135 |         print(seeds)
136 |         print(blocks)
137 | for b in blocks:
138 | print(b.edges())
139 | break
140 |
--------------------------------------------------------------------------------
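For comparison, dgl 0.7 ships this pipeline as a library API; the sketch below (hypothetical usage, importing the build_cora_dataset helper defined above) produces the same (input_nodes, output_nodes, blocks) triples as HomoNodesSet plus NodesGraphCollactor:

import dgl
import torch
from dataloader import build_cora_dataset  # the helper defined above

graph = build_cora_dataset()
train_nids = torch.nonzero(graph.ndata['train_mask'], as_tuple=False).squeeze()
sampler = dgl.dataloading.MultiLayerNeighborSampler([5, 2])  # fanout per layer
loader = dgl.dataloading.NodeDataLoader(
    graph, train_nids, sampler, batch_size=2, shuffle=True, num_workers=0)
input_nodes, output_nodes, blocks = next(iter(loader))
print(input_nodes.shape, output_nodes.shape, blocks)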
/graphsage/node_classification/model.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | from torch.nn.parameter import Parameter
5 |
6 | import dgl
7 | import dgl.nn.pytorch as dglnn
8 | import dgl.function as fn
9 |
10 |
11 | class WeightedSAGEConv(nn.Module):
12 | def __init__(self, input_dims, output_dims, act=F.relu, dropout=0.5, bias=True):
13 | super().__init__()
14 |
15 | self.act = act
16 | self.Q = nn.Linear(input_dims, output_dims)
17 | self.W = nn.Linear(input_dims + output_dims, output_dims)
18 | if bias:
19 | self.bias = Parameter(torch.FloatTensor(output_dims))
20 | else:
21 | self.register_parameter('bias', None)
22 |         self.dropout = nn.Dropout(dropout)  # needed by the weighted branch in forward
23 | self.reset_parameters()
24 |
25 | def reset_parameters(self):
26 | gain = nn.init.calculate_gain('relu')
27 | nn.init.xavier_uniform_(self.Q.weight, gain=gain)
28 | nn.init.xavier_uniform_(self.W.weight, gain=gain)
29 | nn.init.constant_(self.Q.bias, 0)
30 | nn.init.constant_(self.W.bias, 0)
31 | if self.bias is not None:
32 | nn.init.zeros_(self.bias)
33 |
34 | def forward(self, g, h, weights=None):
35 | """
36 | g : graph
37 | h : node features
38 | weights : scalar edge weights
39 | """
40 | h_src, h_dst = h
41 | with g.local_scope():
42 |         if weights is not None:
43 | g.srcdata['n'] = self.act(self.Q(self.dropout(h_src)))
44 | g.edata['w'] = weights.float()
45 | g.update_all(fn.u_mul_e('n', 'w', 'm'), fn.sum('m', 'n'))
46 | g.update_all(fn.copy_e('w', 'm'), fn.sum('m', 'ws'))
47 | n = g.dstdata['n']
48 | ws = g.dstdata['ws'].unsqueeze(1).clamp(min=1)
49 | z = self.act(self.W(self.dropout(torch.cat([n / ws, h_dst], 1))))
50 | z_norm = z.norm(2, 1, keepdim=True)
51 | z_norm = torch.where(z_norm == 0, torch.tensor(1.).to(z_norm), z_norm)
52 | z = z / z_norm
53 | else:
54 | # a= self.Q(h_src)
55 | g.srcdata['n'] = self.Q(h_src)
56 | g.update_all(fn.copy_src('n', 'm'), fn.mean('m', 'neigh')) # aggregation
57 | n = g.dstdata['neigh']
58 |                 z = self.act(self.W(torch.cat([n, h_dst], 1))) + (self.bias if self.bias is not None else 0)
59 | z_norm = z.norm(2, 1, keepdim=True)
60 | z_norm = torch.where(z_norm == 0, torch.tensor(1.).to(z_norm), z_norm)
61 | z = z / z_norm
62 | return z
63 |
64 |
65 | class SAGENet(nn.Module):
66 | def __init__(self, input_dim, hidden_dims, output_dims,
67 | n_layers, act=F.relu, dropout=0.5):
68 | super().__init__()
69 | self.convs = nn.ModuleList()
70 | self.convs.append(WeightedSAGEConv(input_dim, hidden_dims, act, dropout))
71 | for _ in range(n_layers - 2):
72 | self.convs.append(WeightedSAGEConv(hidden_dims, hidden_dims,
73 | act, dropout))
74 | self.convs.append(WeightedSAGEConv(hidden_dims, output_dims,
75 | act, dropout))
76 | self.dropout = nn.Dropout(dropout)
77 | self.act = act
78 |
79 | def forward(self, blocks, h):
80 | for l, (layer, block) in enumerate(zip(self.convs, blocks)):
81 |             h_dst = h[:block.number_of_nodes('DST/' + block.ntypes[0])]  # take only the dst nodes; aggregate bottom-up to obtain the head nodes
82 | h = layer(block, (h, h_dst))
83 | if l != len(self.convs) - 1:
84 | h = self.dropout(h)
85 | return h
86 |
--------------------------------------------------------------------------------
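In its unweighted branch, WeightedSAGEConv above is the standard GraphSAGE mean aggregator followed by L2 normalization; in the notation of the code (projection Q, weight W, bias b, activation \sigma):

h_{\mathcal{N}(v)} = \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} Q\,h_u,
\qquad
z_v = \sigma\big(W\,[\,h_{\mathcal{N}(v)} \,\|\, h_v\,]\big) + b,
\qquad
z_v \leftarrow z_v / \lVert z_v \rVert_2

The weighted branch replaces the plain mean with a weighted average: the weighted message sum is divided by the per-node weight total ws, clamped to at least 1.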
/graphsage/node_classification/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | import argparse
3 | from tqdm import tqdm
4 |
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 | import torch.optim as optim
9 | import torch.backends.cudnn as cudnn
10 | from torch.utils.data import IterableDataset, Dataset, DataLoader
11 |
12 | from dataloader import build_cora_dataset, HomoNodesSet, NodesGraphCollactor
13 | from model import SAGENet
14 | from sklearn.metrics import f1_score
15 |
16 | def load_subtensor(nfeat, labels, seeds, input_nodes, device):
17 | """
18 | Extracts features and labels for a subset of nodes
19 | """
20 | batch_inputs = nfeat[seeds].to(device)
21 | batch_labels = labels[input_nodes].to(device)
22 | return batch_inputs, batch_labels
23 |
24 |
25 | def evaluation(features, labels, test_mask,
26 | model, test_data_loader, loss_fcn, device='cpu'):
27 | model.eval()
28 | with torch.no_grad():
29 | acc_cnt = 0
30 | for step, (input_nodes, seeds, blocks) in enumerate(test_data_loader):
31 | batch_feat, batch_labels = load_subtensor(features, labels, seeds, input_nodes, device='cpu')
32 | blocks = [b.to(device) for b in blocks]
33 | batch_feat = batch_feat.to(device)
34 | batch_labels = batch_labels.to(device)
35 |             batch_pred = model(blocks, batch_feat)
36 |             loss = loss_fcn(batch_pred, batch_labels)
37 |             batch_acc_cnt = (torch.argmax(batch_pred, dim=1) == batch_labels.long()).float().sum()
38 |             acc_cnt += int(batch_acc_cnt)
39 |             f1 = f1_score(batch_labels.detach().cpu(), torch.argmax(batch_pred, dim=1).detach().cpu(), average='macro')
40 |         print(f"Test: Loss:{loss}, cnt:{acc_cnt}, {torch.nonzero(test_mask).shape[0]}, "
41 |               f"Acc:{int(acc_cnt) / torch.nonzero(test_mask).shape[0]}")
42 |     return int(acc_cnt) / torch.nonzero(test_mask).shape[0], f1
43 |
44 |
45 | def train(args, graph):
46 | # os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
47 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
48 | cudnn.benchmark = True
49 |     # graph.to(device) is not in-place; keep the graph on CPU for sampling and move blocks/features per batch
50 |
51 | features = graph.ndata['feat']
52 | labels = graph.ndata['label']
53 | train_mask = graph.ndata['train_mask']
54 | test_mask = graph.ndata['test_mask']
55 | val_mask = graph.ndata['val_mask']
56 | in_feats = features.shape[1]
57 |     n_classes = 7  # Cora has 7 classes
58 |
59 | collator = NodesGraphCollactor(graph, neighbors_every_layer=args.neighbors_every_layer)
60 |
61 | batch_sampler = HomoNodesSet(graph, train_mask)
62 | data_loader = DataLoader(
63 | batch_sampler,
64 | batch_size=512,
65 | shuffle=True,
66 | num_workers=6,
67 | collate_fn=collator
68 | )
69 |
70 | test_batch_sampler = HomoNodesSet(graph, test_mask)
71 | test_data_loader = DataLoader(
72 | test_batch_sampler,
73 | batch_size=1000,
74 | shuffle=False,
75 | num_workers=6,
76 | collate_fn=collator
77 | )
78 |
79 | # Define model and optimizer
80 | model = SAGENet(in_feats, args.num_hidden, n_classes,
81 | args.num_layers, F.relu, args.dropout)
82 |     model.to(device)
83 |
84 | loss_fcn = nn.CrossEntropyLoss()
85 | optimizer = optim.AdamW(model.parameters(), lr=args.lr, betas=(0.9, 0.999), eps=1e-08,
86 | weight_decay=0.1, amsgrad=False)
87 | top_acc, top_f1 = 0, 0
88 | for epoch in range(args.num_epochs):
89 | acc_cnt = 0
90 | for step, (input_nodes, seeds, blocks) in enumerate(data_loader):
91 | batch_feat, batch_labels = load_subtensor(features, labels, seeds, input_nodes, device=device)
92 | blocks = [b.to(device) for b in blocks]
93 | batch_feat = batch_feat.to(device)
94 | batch_labels = batch_labels.to(device)
95 |             batch_pred = model(blocks, batch_feat)
96 |             loss = loss_fcn(batch_pred, batch_labels)
97 |             optimizer.zero_grad()
98 |             loss.backward()
99 |             optimizer.step()
100 |             batch_acc_cnt = (torch.argmax(batch_pred, dim=1) == batch_labels.long()).float().sum()
101 | acc_cnt += int(batch_acc_cnt)
102 | print(f"Train Epoch:{epoch}, Loss:{loss}, Acc:{int(acc_cnt) / torch.nonzero(train_mask).shape[0]}")
103 |
104 | # evaluation()
105 |
106 | acc, f1 = evaluation(features, labels, test_mask, model, test_data_loader, loss_fcn, device)
107 | if top_f1 < f1:
108 | top_acc, top_f1 = acc, f1
109 | print(f"Test Top Acc: {top_acc}, F1:{top_f1}")
110 |
111 |
112 | if __name__ == "__main__":
113 | parser = argparse.ArgumentParser(description="parameter set")
114 | parser.add_argument('--num_hidden', type=int, default=256)
115 | parser.add_argument('--dropout', type=float, default=0.8)
116 | parser.add_argument('--lr', type=float, default=0.01)
117 | parser.add_argument('--num_layers', type=int, default=2)
118 |     parser.add_argument('--neighbors_every_layer', type=int, nargs='+', default=[10], help="fanout per layer, e.g. 10 5")
119 | parser.add_argument('--num_epochs', type=int, default=200)
120 | parser.add_argument("--gpu", type=str, default='0',
121 | help="gpu or cpu")
122 | args = parser.parse_args()
123 | graph = build_cora_dataset(add_symmetric_edges=True, add_self_loop=True)
124 |
125 | train(args, graph)
126 |
--------------------------------------------------------------------------------
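With the nargs='+' form of --neighbors_every_layer used above, a typical invocation (the parameter values are illustrative, not tuned) is:

python train.py --num_layers 2 --neighbors_every_layer 10 5 --num_epochs 200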
/graphsage/test_api/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/graphsage/test_api/__init__.py
--------------------------------------------------------------------------------
/graphsage/test_api/edge_dataloader.py:
--------------------------------------------------------------------------------
1 | import dgl
2 |
3 | import torch
4 |
5 | src = torch.tensor([1, 3, 5, 7, 9])
6 | dst = torch.tensor([2, 4, 6, 8, 10])
7 | g = dgl.graph((torch.cat([src, dst]), torch.cat([dst, src])))
8 |
9 | E = len(src)
10 | reverse_eids = torch.cat([torch.arange(E, 2 * E), torch.arange(0, E)])
11 |
12 | train_eid = torch.arange(E)  # train on the forward edges (eids 0..E-1), not the src node IDs
13 | sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10, 5])
14 | dataloader = dgl.dataloading.EdgeDataLoader(
15 | g, train_eid, sampler, exclude='reverse_id',
16 | reverse_eids=reverse_eids,
17 | batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
18 | for input_nodes, pair_graph, blocks in dataloader:
19 | print(input_nodes)
20 | print(pair_graph)
21 | # print(neg_graph)
22 | print(blocks)
23 |
--------------------------------------------------------------------------------
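The commented-out neg_graph above becomes available once a negative sampler is passed to EdgeDataLoader. A sketch continuing the same script (dgl==0.7.1; Uniform(5) draws five uniform negatives per positive edge):

neg_sampler = dgl.dataloading.negative_sampler.Uniform(5)
dataloader = dgl.dataloading.EdgeDataLoader(
    g, train_eid, sampler, exclude='reverse_id',
    reverse_eids=reverse_eids, negative_sampler=neg_sampler,
    batch_size=1024, shuffle=True, drop_last=False, num_workers=0)
for input_nodes, pos_graph, neg_graph, blocks in dataloader:
    print(pos_graph.num_edges(), neg_graph.num_edges())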
/node2vec/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/node2vec/__init__.py
--------------------------------------------------------------------------------
/node2vec/main.py:
--------------------------------------------------------------------------------
1 | import time
2 | import os
3 | import random
4 | import numpy as np
5 | from tqdm import tqdm
6 |
7 | import torch
8 | from torch.utils.data import Dataset, DataLoader
9 |
10 | import dgl
11 |
12 | from deepwalk.model import SkipGramModel
13 | from deepwalk.build_graph import Build_Graph
14 | from deepwalk.dataset import NodesDataset
15 | from deepwalk.utils import skip_gram_gen_pairs
16 |
17 |
18 | class Call_Func():
19 | def __init__(self, g, half_win_size, walk_length=4, p=1, q=1):
20 | self.g = g
21 | self.p = p
22 | self.q = q
23 | self.walk_length = walk_length
24 | self.half_win_size = half_win_size
25 |
26 | def __call__(self, nodes):
27 | batch_src, batch_dst = list(), list()
28 |
29 | walks_list = list()
30 |         walks = dgl.sampling.node2vec_random_walk(self.g, nodes, p=self.p, q=self.q, walk_length=self.walk_length)
31 | walks_list += walks.tolist()
32 | for walk in walks_list:
33 | src, dst = skip_gram_gen_pairs(walk, self.half_win_size)
34 | batch_src += src
35 | batch_dst += dst
36 |
37 | # shuffle pair
38 | batch_tmp = list(zip(batch_src, batch_dst))
39 | random.shuffle(batch_tmp)
40 | batch_src, batch_dst = zip(*batch_tmp)
41 |
42 | batch_src = torch.from_numpy(np.array(batch_src))
43 | batch_dst = torch.from_numpy(np.array(batch_dst))
44 | return batch_src, batch_dst
45 |
46 |
47 | def main(config):
48 | print(torch.cuda.device_count(), torch.cuda.is_available())
49 | os.environ["CUDA_VISIBLE_DEVICES"] = config.gpu
50 | # device = torch.device(config.gpu)
51 | torch.backends.cudnn.benchmark = True
52 |
53 | GraphSet = Build_Graph(config.file_path, walk_length=config.walk_length, self_loop=True, undirected=True)
54 | graph = GraphSet.graph
55 |
56 | model = SkipGramModel(graph.num_nodes(), embed_dim=config.embed_dim)
57 | model.cuda()
58 | optimizer = torch.optim.SparseAdam(model.parameters(), lr=float(config.lr))
59 |
60 | nodes_dataset = NodesDataset(graph.nodes())
61 | pair_generate_func = Call_Func(graph, config.half_win_size, config.walk_length, config.p, config.q)
62 |
63 | pair_loader = DataLoader(nodes_dataset, batch_size=config.batch_size, shuffle=True, num_workers=4,
64 | collate_fn=pair_generate_func)
65 |
66 |     top_loss = float("inf")  # best (lowest) mean epoch loss so far
67 |     for epoch in range(config.epochs):
68 |         start_time = time.time()
69 |         model.train()
70 |
71 |         loss_total = list()
72 |         tqdm_bar = tqdm(pair_loader, desc="Training epoch {epoch}".format(epoch=epoch))
73 | for i, (batch_src, batch_dst) in enumerate(tqdm_bar):
74 | batch_src = batch_src.cuda().long()
75 | batch_dst = batch_dst.cuda().long()
76 |
77 | batch_neg = np.random.randint(0, graph.num_nodes(), size=(batch_src.shape[0], config.neg_num))
78 | batch_neg = torch.from_numpy(batch_neg).cuda().long() # change multi neg_num
79 |
80 | model.zero_grad()
81 | loss = model.forward(batch_src, batch_dst, batch_neg)
82 | loss.backward()
83 | optimizer.step()
84 | loss_total.append(loss.detach().item())
85 |
86 | if top_loss > np.mean(loss_total):
87 | top_loss = np.mean(loss_total)
88 | torch.save(model.state_dict(), config.save_path)
89 | print("Epoch: %03d; loss = %.4f saved path: %s" % (epoch, top_loss, config.save_path))
90 | print("Epoch: %03d; loss = %.4f cost time %.4f" % (epoch, np.mean(loss_total), time.time() - start_time))
91 |
92 |
93 | if __name__ == "__main__":
94 | class ConfigClass():
95 | def __init__(self):
96 | self.lr = 0.005
97 | self.gpu = "0"
98 | self.epochs = 32
99 | self.embed_dim = 64
100 | self.batch_size = 10
101 |
102 | self.p = 1
103 | self.q = 1
104 | self.walk_num_per_node = 6
105 | self.walk_length = 12
106 |             self.half_win_size = 3  # half of a 6-node context window; main() reads config.half_win_size
107 | self.neg_num = 5
108 | self.save_path = "../out/blog_deepwalk_ckpt"
109 | self.file_path = "../data/blog/"
110 |
111 |
112 | config = ConfigClass()
113 |     main(config)
114 |
115 | # import argparse
116 | # from utils import load_config
117 | #
118 | # parser = argparse.ArgumentParser(description='bert classification')
119 | # parser.add_argument("-c", "--config", type=str, default="./config.yaml")
120 | # args = parser.parse_args()
121 | # config = load_config(args.config)
122 |
--------------------------------------------------------------------------------
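skip_gram_gen_pairs is imported from deepwalk/utils.py and not shown in this section. A minimal sketch of what such a helper typically does (a hypothetical re-implementation, not the repo's code): for each position in a walk, emit one (center, context) pair per neighbor within half_win_size steps on either side.

def skip_gram_pairs_sketch(walk, half_win_size):
    src, dst = [], []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - half_win_size), min(len(walk), i + half_win_size + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center itself
                src.append(center)
                dst.append(walk[j])
    return src, dst

print(skip_gram_pairs_sketch([0, 1, 2, 3], half_win_size=1))
# ([0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2])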
/node2vec/model.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 | import random
6 | import numpy as np
7 |
8 |
9 | # from gensim.models import Word2Vec
10 |
11 |
12 | class SkipGramModel(nn.Module):
13 | def __init__(self, num_nodes, embed_dim):
14 | super(SkipGramModel, self).__init__()
15 | self.num_nodes = num_nodes
16 | self.emb_dimension = embed_dim
17 |
18 | self.embed_nodes = nn.Embedding(self.num_nodes, self.emb_dimension, sparse=True)
19 | nn.init.xavier_uniform_(self.embed_nodes.weight)
20 | self.loss = nn.BCEWithLogitsLoss()
21 |
22 | def forward(self, src, pos, neg):
23 | embed_src = self.embed_nodes(src) # (B, d)
24 | embed_pos = self.embed_nodes(pos) # (B, d)
25 | embed_neg = self.embed_nodes(neg) # (B, neg_num, d)
26 | # print(embed_src.shape, embed_pos.shape, embed_neg.shape)
27 |
28 |         pos_logits = torch.sum(embed_src * embed_pos, dim=1)  # (B,) one dot product per (src, pos) pair
29 |         ones_label = torch.ones_like(pos_logits)
30 |         # print(pos_logits.shape, ones_label.shape)
31 |         pos_loss = self.loss(pos_logits, ones_label)
32 |
33 |         neg_logits = torch.bmm(embed_neg, embed_src.unsqueeze(2)).squeeze(2)  # (B, neg_num)
34 |         zeros_label = torch.zeros_like(neg_logits)
35 |         # print(neg_logits.shape, zeros_label.shape)
36 |         neg_loss = self.loss(neg_logits, zeros_label)
37 |
38 | loss = (pos_loss + neg_loss) / 2
39 | return loss
40 |
41 |
42 | def skip_gram_model_test():
43 | model = SkipGramModel(1000, embed_dim=32)
44 | model.cuda()
45 |
46 | src = np.random.randint(0, 100, size=10)
47 | src = torch.from_numpy(src).cuda().long()
48 |
49 | dst = np.random.randint(0, 100, size=10)
50 | dst = torch.from_numpy(dst).cuda().long()
51 |
52 | neg = np.random.randint(0, 100, size=(10, 5))
53 | neg = torch.from_numpy(neg).cuda().long()
54 |
55 | print(src.shape, dst.shape, neg.shape)
56 |
57 | print(model(src, dst, neg))
58 |
59 |
60 | if __name__ == "__main__":
61 | skip_gram_model_test()
62 |
--------------------------------------------------------------------------------
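With the per-pair logits above, SkipGramModel minimizes the standard skip-gram negative-sampling objective; for a source embedding u_{src}, positive context u_{pos} and k sampled negatives u_{neg_i}, with \sigma the logistic function:

\mathcal{L} = -\log \sigma\big(u_{src}^{\top} u_{pos}\big) - \sum_{i=1}^{k} \log \sigma\big(-u_{src}^{\top} u_{neg_i}\big)

BCEWithLogitsLoss with all-ones labels yields the first term and with all-zeros labels the second, up to the mean reductions and the final averaging of the two losses.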
/node2vec/sample_walks.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import random
3 | import pgl
4 |
5 | def random_walk(g, nodes, max_depth):
6 | walk_paths = []
7 | # init
8 | for node in nodes:
9 | walk_paths.append([node])
10 |
11 | cur_walk_ids = np.arange(0, len(nodes))
12 | cur_nodes = np.array(nodes)
13 | for l in range(max_depth - 1):
14 | # select the walks not end
15 | cur_succs = g.successor(cur_nodes)
16 | mask = [len(succ) > 0 for succ in cur_succs]
17 |
18 | if np.any(mask):
19 | cur_walk_ids = cur_walk_ids[mask]
20 | cur_nodes = cur_nodes[mask]
21 | cur_succs = cur_succs[mask]
22 | else:
23 | # stop when all nodes have no successor
24 | break
25 |
26 | outdegree = [len(cur_succ) for cur_succ in cur_succs]
27 | sample_index = np.floor(
28 | np.random.rand(cur_succs.shape[0]) * outdegree).astype("int64")
29 |
30 | nxt_cur_nodes = []
31 | for s, ind, walk_id in zip(cur_succs, sample_index, cur_walk_ids):
32 | walk_paths[walk_id].append(s[ind])
33 | nxt_cur_nodes.append(s[ind])
34 | cur_nodes = np.array(nxt_cur_nodes)
35 | return walk_paths
36 |
37 |
38 | def node2vec_walk(graph, nodes, max_depth, p=1.0, q=1.0):
39 | if p == 1.0 and q == 1.0:
40 | return random_walk(graph, nodes, max_depth)
41 |
42 | walk = []
43 | # init
44 | for node in nodes:
45 | walk.append([node])
46 |
47 | cur_walk_ids = np.arange(0, len(nodes))
48 | cur_nodes = np.array(nodes)
49 | prev_nodes = np.array([-1] * len(nodes), dtype="int64")
50 | prev_succs = np.array([[]] * len(nodes), dtype="int64")
51 | for l in range(max_depth):
52 | # select the walks not end
53 | cur_succs = graph.successor(cur_nodes)
54 |
55 | mask = [len(succ) > 0 for succ in cur_succs]
56 | if np.any(mask):
57 | cur_walk_ids = cur_walk_ids[mask]
58 | cur_nodes = cur_nodes[mask]
59 | prev_nodes = prev_nodes[mask]
60 | prev_succs = prev_succs[mask]
61 | cur_succs = cur_succs[mask]
62 | else:
63 | # stop when all nodes have no successor
64 | break
65 | num_nodes = cur_nodes.shape[0]
66 | nxt_nodes = np.zeros(num_nodes, dtype="int64")
67 |
68 | for idx, (
69 | succ, prev_succ, walk_id, prev_node
70 | ) in enumerate(zip(cur_succs, prev_succs, cur_walk_ids, prev_nodes)):
71 | sampled_succ = node2vec_sample(succ, prev_succ,
72 | prev_node, p, q)
73 | walk[walk_id].append(sampled_succ)
74 | nxt_nodes[idx] = sampled_succ
75 |
76 | prev_nodes, prev_succs = cur_nodes, cur_succs
77 | cur_nodes = nxt_nodes
78 | return walk
79 |
80 |
81 | def node2vec_sample(succ, prev_succ, prev_node, p, q):
82 |     """Fast implementation of node2vec sampling
83 |     """
84 |     succ_len = len(succ)
85 |
86 |     probs = list()
87 |     prob_sum = 0
88 |
89 |     prev_succ_set = set(prev_succ)  # successors of the previous node
90 |
91 |     for i in range(succ_len):
92 |         if succ[i] == prev_node:
93 |             prob = 1. / p  # return to the previous node (distance 0)
94 |         elif succ[i] in prev_succ_set:
95 |             prob = 1.  # neighbor of the previous node (distance 1)
96 |         else:
97 |             prob = 1. / q  # move further away (distance 2)
98 |         probs.append(prob)
99 |         prob_sum += prob
100 |
101 |     rand_num = random.uniform(0, 1) * prob_sum
102 |
103 |     for i in range(succ_len):
104 |         rand_num -= probs[i]
105 |         if rand_num <= 0:
106 |             return succ[i]
107 |     return succ[-1]  # guard against floating-point underflow
108 |
109 |
110 | if __name__ == "__main__":
111 |     import dgl
112 |     import torch
113 |
114 |     # g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))
115 |     # g.edata['weight'] = torch.FloatTensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.5])
116 |     # sg = dgl.sampling.select_topk(g, k=1, nodes=[0], weight='weight', edge_dir="in")
117 |     # sg2 = dgl.sampling.sample_neighbors(g, [0, 1], 1, prob='weight', edge_dir="out")
118 |     # print(sg2.edges(order='eid'))
119 |     #
120 |     # cur_nodes = np.array([0, 1, 2])
121 |     # cur_succs = dgl.sampling.sample_neighbors(g, cur_nodes, 1, edge_dir="out")
122 |     # print(cur_succs.edges()[1].tolist())
123 |     # mask = [1 for succ in cur_succs.edges()[1].tolist() if succ]
124 |
125 |     cur_nodes = np.array([0])
126 |     g1 = dgl.graph(([0, 1, 1, 2, 3, 3, 4], [1, 2, 3, 0, 0, 4, 2]))
127 |     print(dgl.sampling.sample_neighbors(g1, cur_nodes, 4, edge_dir="out"))
128 |     # NOTE: random_walk/node2vec_walk above expect a pgl-style graph exposing
129 |     # g.successor(nodes); calling them with a dgl graph raises AttributeError.
130 |     # print(node2vec_walk(g1, cur_nodes, max_depth=3, p=4, q=0.25))
131 |
--------------------------------------------------------------------------------
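The probabilities in node2vec_sample implement the node2vec search bias from Grover and Leskovec: for a walker that just stepped from node t to node v, the unnormalized probability of moving on to a successor x depends on the distance d_{tx} between t and x:

\alpha_{pq}(t, x) =
\begin{cases}
1/p & d_{tx} = 0 \quad (x = t, \text{ i.e. } succ[i] == prev\_node) \\
1   & d_{tx} = 1 \quad (succ[i] \in prev\_succ) \\
1/q & d_{tx} = 2 \quad (\text{otherwise})
\end{cases}

A large p discourages immediately backtracking to t; a small q pushes the walk outward, making it more DFS-like.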
/out/blog_deepwalk_ckpt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/out/blog_deepwalk_ckpt
--------------------------------------------------------------------------------
/pictures/GCN_AD2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/pictures/GCN_AD2.png
--------------------------------------------------------------------------------
/pictures/graphSAGE_link_pre.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/pictures/graphSAGE_link_pre.png
--------------------------------------------------------------------------------
/pictures/node_classification.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xinyi-code/Graph-Learning/fb8ff32e2e1ab4dbebb60ce0acd75ae595d806b8/pictures/node_classification.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | torch==1.8.2
2 | dgl==0.7.1
3 | tqdm==4.61.0
4 | scikit-learn==0.24.1
5 | numpy==1.21.2
6 | pandas==1.3.3
--------------------------------------------------------------------------------
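The pins above target the torch 1.8 LTS line. A plain CPU setup is:

pip install -r requirements.txt

For GPU training, dgl 0.7.1 is shipped as separate CUDA wheels (e.g. the dgl-cu111 package for CUDA 11.1); which wheel applies depends on the local CUDA version, which this file does not encode.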