├── .gitignore
├── GNN_overview.pptx
├── README.md
├── _legacy
    ├── advanced_apps
    │   └── rec
    │   │   ├── Recommendation-fism.ipynb
    │   │   ├── Recommendation.ipynb
    │   │   ├── graphsage.py
    │   │   ├── movielens.py
    │   │   ├── sageconv.py
    │   │   └── slim_load.py
    └── basic_apps
    │   ├── BasicTasks_mxnet.ipynb
    │   ├── BasicTasks_pytorch.ipynb
    │   ├── data_loading.ipynb
    │   ├── graph_classification_tutorial.ipynb
    │   └── synthetic_data.ipynb
├── applications
    ├── assets
    │   ├── example_bipartite.png
    │   ├── example_bipartite_train.png
    │   └── example_bipartite_train_removed.png
    ├── document.pdf
    ├── fraud.ipynb
    ├── recsys.ipynb
    ├── u.item
    ├── u.user
    ├── ua.base
    └── ua.test
├── asset
    ├── dgl-mp.png
    ├── dgl-query.png
    ├── dgl_logo.png
    ├── enzymes.png
    ├── gnn_ep0.png
    ├── gnn_ep_anime.gif
    ├── karat_club.png
    ├── sagemaker.pdf
    └── sagemaker.pptx
├── basic_tasks
    ├── 1_load_data.ipynb
    ├── 2_gnn.ipynb
    ├── 3_link_predict.ipynb
    ├── 4_message_passing.ipynb
    ├── data
    │   ├── edges.csv
    │   ├── gen_data.py
    │   └── nodes.csv
    ├── slides.pptx
    └── tutorial_utils.py
├── basic_tasks_tf
    ├── 1_load_data-CN.ipynb
    ├── 1_load_data.ipynb
    ├── 2_gnn-CN.ipynb
    ├── 2_gnn.ipynb
    ├── 3_link_predict-CN.ipynb
    ├── 3_link_predict.ipynb
    ├── 4_message_passing-CN.ipynb
    ├── 4_message_passing.ipynb
    ├── data
    │   ├── edges.csv
    │   ├── gen_data.py
    │   └── nodes.csv
    └── tutorial_utils.py
├── dgl_api
    ├── dgl-www-zz.pptx
    ├── graph-1.png
    ├── graph-2.png
    ├── graph-3.png
    ├── graph-4.png
    ├── nodeflow.png
    ├── nodeflow2.png
    ├── slides.pdf
    └── slides.tex
├── images
    ├── GNN.png
    ├── Link_predict.png
    ├── link_predict1.png
    ├── link_predict2.png
    ├── negative_edges.png
    ├── node_classify1.png
    └── node_classify2.png
└── large_graphs
    ├── assets
        ├── block1.png
        ├── block2.png
        ├── block_with_self1.png
        ├── block_with_self2.png
        ├── graph.png
        ├── graph_1layer_46.png
        ├── graph_2layer_46.png
        ├── in_subgraph_1.png
        └── in_subgraph_2.png
    ├── large_graphs.ipynb
    └── sampling.pptx


/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 | __pycache__
3 | 


--------------------------------------------------------------------------------
/GNN_overview.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/GNN_overview.pptx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | Learning Graph Neural Networks with Deep Graph Library -- WWW 2020 Hands-on Tutorial
 2 | ===
 3 | 
 4 | **This tutorial was developed for DGL 0.4.3, so some of its content may be outdated.**
 5 | 
 6 | 
 7 | Presenters: George Karypis, Zheng Zhang, Minjie Wang, Da Zheng, Quan Gan
 8 | 
 9 | Time: (UTC/GMT +8) 09:00-16:30, April, 20, Monday
10 | 
11 | Abstract
12 | ---
13 | Learning from graph and relational data plays a major role in many applications
14 | including social network analysis, marketing, e-commerce, information retrieval,
15 | knowledge modeling, medical and biological sciences, engineering, and others. In the
16 | last few years, Graph Neural Networks (GNNs) have emerged as a promising new supervised
17 | learning framework capable of bringing the power of deep representation learning to
18 | graph and relational data. This ever-growing body of research has shown that GNNs
19 | achieve state-of-the-art performance for problems such as link prediction, fraud
20 | detection, target-ligand binding activity prediction, knowledge-graph completion,
21 | and product recommendations.
22 | 
23 | The objective of this tutorial is twofold. First, it will provide an overview of the
24 | theory behind GNNs, discuss the types of problems that GNNs are well suited for, and
25 | introduce some of the most widely used GNN model architectures and the problems/applications
26 | that they are designed to solve. Second, it will introduce the Deep Graph Library (DGL), a
27 | new software framework that simplifies the development of efficient GNN-based training
28 | and inference programs. To make things concrete, the tutorial will provide hands-on
29 | sessions using DGL. This hands-on part will cover both basic graph applications (e.g.,
30 | node classification and link prediction), as well as more advanced topics including
31 | training GNNs on large graphs and in a distributed setting. In addition, it will provide
32 | hands-on tutorials on using GNNs and DGL for real-world applications such as recommendation
33 | and fraud detection.
34 | 
35 | Prerequisite
36 | ---
37 | 
38 | Attendees should have some experience with deep learning and have used deep learning
39 | frameworks such as MXNet, PyTorch, or TensorFlow. Experience with the problems and
40 | techniques that arise in graph learning and analysis is helpful, but
41 | it is not required.
42 | 
43 | Agenda
44 | ---
45 | 
46 | | Time | Session | Material | Presenter |
47 | |:----:|:-------:|:--------:|:---------:|
48 | | 9:00-9:45 | Overview of Graph Neural Networks | [slides](https://github.com/zheng-da/dgl-tutorial-full/blob/master/GNN_overview.pptx) | George Karypis |
49 | | 9:45-10:30 | Overview of Deep Graph Library (DGL) | [slides](https://github.com/zheng-da/dgl-tutorial-full/blob/master/dgl_api/dgl-www-zz.pptx) | Zheng Zhang |
50 | | 10:30-11:00 | Virtual Coffee Break | | |
51 | | 11:00-12:30 | (Hands-on) GNN models for basic graph tasks | [notebook](https://github.com/dglai/WWW20-Hands-on-Tutorial/blob/master/_legacy/basic_apps/BasicTasks_pytorch.ipynb) | Minjie Wang |
52 | | 12:30-14:00 | Virtual Lunch Break | | |
53 | | 14:00-15:30 | (Hands-on) GNN training on large graphs | [notebook](https://github.com/dglai/WWW20-Hands-on-Tutorial/blob/master/large_graphs) | Da Zheng |
54 | | 15:30-16:00 | Virtual Coffee Break | | |
55 | | 16:00-17:30 | (Hands-on) GNN models for real-world applications | [notebook](https://github.com/dglai/WWW20-Hands-on-Tutorial/blob/master/_legacy/advanced_apps/rec/Recommendation.ipynb) | Quan Gan |
56 | 
57 | Section Content
58 | ---
59 | 
60 | * **Section 1: Overview of Graph Neural Networks.** This section describes how graph
61 |   neural networks operate, their underlying theory, and their advantages over alternative
62 |   graph learning approaches. In addition, it describes various learning problems on graphs
63 |   and shows how GNNs can be used to solve them.
64 | * **Section 2: Overview of Deep Graph Library (DGL).** This section describes the different
65 |   abstractions and APIs that DGL provides, which are designed to simplify the implementation
66 |   of GNN models, and explains how DGL interfaces with MXNet, PyTorch, and TensorFlow.
67 |   It then proceeds to introduce DGL’s message-passing API that can be used to develop
68 |   arbitrarily complex GNNs and the pre-defined GNN nn modules that it provides.
69 | * **Section 3: GNN models for basic graph tasks.** This section demonstrates how to use
70 |   GNNs to solve four key graph learning tasks: node classification, link prediction, graph
71 |   classification, and network embedding pre-training. It will show how GraphSAGE, a popular
72 |   GNN model, can be implemented with DGL’s nn module and show how the node embeddings
73 |   computed by GraphSAGE can be used in different types of downstream tasks. In addition,
74 |   it will demonstrate the implementation of a customized GNN model with DGL’s message passing
75 |   interface.
76 | * **Section 4: GNN training on large graphs.** This section uses some of the models described
77 |   in Section 3 to demonstrate mini-batch training, multi-GPU training, and distributed
78 |   training in DGL. It starts by describing how the concept of mini-batch training applies to
79 |   GNNs and how mini-batch computations can be sped up by using various sampling techniques.
80 |   It then proceeds to illustrate how one such sampling technique, called neighbor sampling,
81 |   can be implemented in DGL using a Jupyter notebook. This notebook is then extended to show
82 |   multi-GPU training and distributed training.
83 | * **Section 5: GNN models for real-world applications.** This section uses the techniques
84 |   described in the earlier sections to show how GNNs can be used to develop scalable solutions
85 |   for recommendation and fraud detection. For recommendation, it develops a nearest-neighbor
86 |   item-based recommendation method that employs a GNN model to learn item embeddings by
87 |   following an end-to-end learning approach. For fraud detection, it extends the node
88 |   classification model in the previous section to work on heterogeneous graphs and addresses
89 |   the scenario where there exist few labelled samples.
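
The neighbor sampling idea mentioned in Section 4 can be illustrated without any DGL-specific API. The following is a minimal sketch in plain Python, not the tutorial's implementation: the function name `sample_neighbors`, the `adj` dictionary format, and the `fanouts` argument are all assumptions made for illustration.

```python
import random


def sample_neighbors(adj, seeds, fanouts, rng=None):
    """Sample a multi-layer computation graph around the seed nodes.

    adj      -- dict mapping each node to a list of its in-neighbors (hypothetical format)
    seeds    -- nodes whose outputs the mini-batch needs
    fanouts  -- fanouts[0] caps the neighbors kept per node in the hop closest to the seeds
    Returns one list of (src, dst) edges per layer, ordered from input layer to output layer.
    """
    rng = rng or random.Random(0)
    layers = []
    frontier = list(seeds)
    for fanout in fanouts:
        edges = []
        next_frontier = set(frontier)  # keep the destination nodes themselves
        for dst in frontier:
            neighbors = adj.get(dst, [])
            # Keep every neighbor when there are few; otherwise sample without replacement.
            if len(neighbors) <= fanout:
                picked = list(neighbors)
            else:
                picked = rng.sample(neighbors, fanout)
            for src in picked:
                edges.append((src, dst))
                next_frontier.add(src)
        layers.append(edges)
        frontier = sorted(next_frontier)
    # Layers were collected from the seeds outward; reverse them so that
    # message passing can run from the input layer toward the seeds.
    return layers[::-1]
```

Each layer touches at most `fanout` neighbors per destination node, so the cost of a mini-batch is bounded by the fanouts rather than by the degree of the nodes involved.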
90 | 
91 | ## Community
92 | 
93 | Join our [Slack channel "WWW20-tutorial"](https://join.slack.com/t/deep-graph-library/shared_invite/zt-docxzmw2-9yMsL7rv9a2tpjzlLlVptg) for discussion.
94 | 


--------------------------------------------------------------------------------
/_legacy/advanced_apps/rec/graphsage.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | 
  5 | import dgl
  6 | from dgl import DGLGraph
  7 | 
  8 | # Load PyTorch as the backend
  9 | dgl.load_backend('pytorch')
 10 | 
 11 | import numpy as np
 12 | import pandas as pd
 13 | from scipy import stats
 14 | from scipy import sparse as spsp
 15 | 
 16 | import pickle
 17 | user_movie_spm = pickle.load(open('bx/bx_train.pkl', 'rb'))
 18 | abstracts = pickle.load(open('bx/bx_book_abstract.pkl', 'rb')).asnumpy()
 19 | titles = pickle.load(open('bx/bx_book_title.pkl', 'rb')).asnumpy()
 20 | features = torch.tensor(np.concatenate((titles, abstracts), 1), dtype=torch.float32)
 21 | valid_set, test_set = pickle.load(open('bx/bx_eval.pkl', 'rb'))
 22 | neg_valid, neg_test = pickle.load(open('bx/bx_neg.pkl', 'rb'))
 23 | 
 24 | num_users = user_movie_spm.shape[0]
 25 | num_movies = user_movie_spm.shape[1]
 26 | 
 27 | users_valid = np.arange(num_users)
 28 | movies_valid = valid_set
 29 | users_test = np.arange(num_users)
 30 | movies_test = test_set
 31 | 
 32 | #features = torch.cat([features, movie_popularity, v], 1)
 33 | #one_hot = torch.tensor(np.diag(np.ones(shape=(num_movies))), dtype=torch.float32)
 34 | #features = torch.cat([features, one_hot], 1)
 35 | in_feats = features.shape[1]
 36 | print('#feats:', in_feats)
 37 | 
 38 | user_deg = user_movie_spm.dot(np.ones((num_movies)))
 39 | print(len(user_deg))
 40 | user_deg1 = np.nonzero(user_deg == 1)[0]
 41 | user_deg2 = np.nonzero(user_deg == 2)[0]
 42 | user_deg3 = np.nonzero(user_deg == 3)[0]
 43 | user_deg4 = np.nonzero(user_deg == 4)[0]
 44 | user_deg5 = np.nonzero(user_deg == 5)[0]
 45 | user_deg6 = np.nonzero(user_deg == 6)[0]
 46 | user_deg7 = np.nonzero(user_deg == 7)[0]
 47 | user_deg8 = np.nonzero(user_deg == 8)[0]
 48 | user_deg9 = np.nonzero(user_deg == 9)[0]
 49 | user_deg10 = np.nonzero(np.logical_and(10 <= user_deg, user_deg < 20))[0]
 50 | user_deg20 = np.nonzero(np.logical_and(20 <= user_deg, user_deg < 30))[0]
 51 | user_deg30 = np.nonzero(np.logical_and(30 <= user_deg, user_deg < 40))[0]
 52 | user_deg40 = np.nonzero(np.logical_and(40 <= user_deg, user_deg < 50))[0]
 53 | user_deg50 = np.nonzero(np.logical_and(50 <= user_deg, user_deg < 60))[0]
 54 | user_deg60 = np.nonzero(np.logical_and(60 <= user_deg, user_deg < 70))[0]
 55 | user_deg70 = np.nonzero(np.logical_and(70 <= user_deg, user_deg < 80))[0]
 56 | user_deg80 = np.nonzero(np.logical_and(80 <= user_deg, user_deg < 90))[0]
 57 | user_deg90 = np.nonzero(np.logical_and(90 <= user_deg, user_deg < 100))[0]
 58 | user_deg100 = np.nonzero(user_deg >= 100)[0]
 59 | print(len(user_deg1))
 60 | print(len(user_deg2))
 61 | print(len(user_deg3))
 62 | print(len(user_deg4))
 63 | print(len(user_deg5))
 64 | print(len(user_deg6))
 65 | print(len(user_deg7))
 66 | print(len(user_deg8))
 67 | print(len(user_deg9))
 68 | print(len(user_deg10))
 69 | print(len(user_deg20))
 70 | print(len(user_deg30))
 71 | print(len(user_deg40))
 72 | print(len(user_deg50))
 73 | print(len(user_deg60))
 74 | print(len(user_deg70))
 75 | print(len(user_deg80))
 76 | print(len(user_deg90))
 77 | print(len(user_deg100))
 78 | 
 79 | movie_deg = user_movie_spm.transpose().dot(np.ones((num_users)))
 80 | test_deg = np.zeros((num_users))
 81 | for i in range(num_users):
 82 |     movie = int(movies_test[i])
 83 |     test_deg[i] = movie_deg[movie]
 84 | test_deg_dict = {}
 85 | for i in range(1, 10):
 86 |     test_deg_dict[i] = np.nonzero(test_deg == i)[0]
 87 | for i in range(1, 10):
 88 |     test_deg_dict[i*10] = np.nonzero(np.logical_and(i*10 <= test_deg, test_deg < (i+1)*10))[0]
 89 | test_deg_dict[100] = np.nonzero(test_deg >= 100)[0]
 90 | tmp = 0
 91 | for key, deg in test_deg_dict.items():
 92 |     print(key, len(deg))
 93 |     tmp += len(deg)
 94 | print(num_users, tmp)
 95 | 
 96 | from SLIM import SLIM, SLIMatrix
 97 | model = SLIM()
 98 | params = {'algo': 'cd', 'nthreads': 16, 'l1r': 2, 'l2r': 1}
 99 | trainmat = SLIMatrix(user_movie_spm.tocsr())
100 | model.train(params, trainmat)
101 | model.save_model(modelfname='slim_model.csr', mapfname='slim_map.csr')
102 | 
103 | from slim_load import read_csr
104 | 
105 | movie_spm = read_csr('slim_model.csr')
106 | print('#edges:', movie_spm.nnz)
107 | print('most similar:', np.max(movie_spm.data))
108 | print('least similar:', np.min(movie_spm.data))
109 | 
110 | deg = movie_spm.dot(np.ones((num_movies)))
111 | print(np.sum(deg == 0))
112 | print(len(deg))
113 | print(movie_spm.sum(0))
114 | 
115 | g = dgl.DGLGraph(movie_spm, readonly=True)
116 | g.edata['similarity'] = torch.tensor(movie_spm.data, dtype=torch.float32)
117 | 
118 | user_id = user_movie_spm.row
119 | movie_id = user_movie_spm.col
120 | movie_deg = user_movie_spm.transpose().dot(np.ones((num_users,)))
121 | movie_ratio = movie_deg / np.sum(movie_deg)
122 | # 1e-5 is the subsampling threshold, a hyperparameter for this dataset.
123 | movie_sample_prob = 1 - np.maximum(1 - np.sqrt(1e-5 / movie_ratio), 0)
124 | sample_prob = movie_sample_prob[movie_id]
125 | sample = np.random.uniform(size=(len(movie_id),))
126 | user_id = user_id[sample_prob > sample]
127 | movie_id = movie_id[sample_prob > sample]
128 | print('#samples:', len(user_id))
129 | spm = spsp.coo_matrix((np.ones((len(user_id),)), (user_id, movie_id)))
130 | print(spm.shape)
131 | movie_deg = spm.transpose().dot(np.ones((num_users,)))
132 | print(np.sum(movie_deg == 0))
133 | 
134 | movie_spm = np.dot(spm.transpose(), spm)
135 | print(movie_spm.nnz)
136 | dense_movie = np.sort(movie_spm.todense())
137 | topk_movie = dense_movie[:,-50]
138 | topk_movie_spm = movie_spm > topk_movie
139 | 
140 | from sklearn.metrics.pairwise import cosine_similarity
141 | movie_spm = cosine_similarity(spm.transpose(),dense_output=False)
142 | 
143 | dense_movie = np.sort(movie_spm.todense())
144 | topk_movie = dense_movie[:,-20]
145 | topk_movie_spm = movie_spm > topk_movie
146 | 
147 | movie_spm = spsp.csr_matrix(topk_movie_spm)
148 | print(movie_spm.nnz)
149 | g = dgl.DGLGraph(movie_spm, readonly=True)
150 | 
151 | #from sageconv import SAGEConv
152 | from dgl.nn.pytorch import conv as dgl_conv
153 | 
154 | class GraphSAGEModel(nn.Module):
155 |     def __init__(self,
156 |                  in_feats,
157 |                  n_hidden,
158 |                  out_dim,
159 |                  n_layers,
160 |                  activation,
161 |                  dropout,
162 |                  aggregator_type):
163 |         super(GraphSAGEModel, self).__init__()
164 |         self.layers = nn.ModuleList()
165 |         if n_layers == 1:
166 |             self.layers.append(dgl_conv.SAGEConv(in_feats, n_hidden, aggregator_type,
167 |                                         feat_drop=dropout, activation=None))
168 |         elif n_layers > 1:
169 |             # input layer
170 |             self.layers.append(dgl_conv.SAGEConv(in_feats, n_hidden, aggregator_type,
171 |                                         feat_drop=dropout, activation=activation))
172 |             # hidden layer
173 |             for i in range(n_layers - 2):
174 |                 self.layers.append(dgl_conv.SAGEConv(n_hidden, n_hidden, aggregator_type,
175 |                                             feat_drop=dropout, activation=activation))
176 |             # output layer
177 |             self.layers.append(dgl_conv.SAGEConv(n_hidden, out_dim, aggregator_type,
178 |                                         feat_drop=dropout, activation=None))
179 | 
180 |     def forward(self, g, features):
181 |         h = features
182 |         for layer in self.layers:
183 |             h = layer(g, h)
184 |             #h = layer(g, prev_h, g.edata['similarity'])
185 |             #h = tmp + prev_h
186 |             #prev_h = h
187 |         return h
188 | 
189 | class EncodeLayer(nn.Module):
190 |     def __init__(self, in_feats, num_hidden, device):
191 |         super(EncodeLayer, self).__init__()
192 |         self.proj = nn.Linear(in_feats, int(num_hidden))
193 |         #self.emb = nn.Embedding(num_movies, int(num_hidden))
194 |         #self.nid = torch.arange(num_movies).to(device)
195 | 
196 |     def forward(self, feats):
197 |         #return torch.cat([self.proj(feats), self.emb(self.nid)], 1)
198 |         return self.proj(feats)
199 |         #return self.emb(self.nid)
200 | 
201 | beta = 0
202 | gamma = 0
203 | 
204 | class FISM(nn.Module):
205 |     def __init__(self, user_movie_spm, gconv_p, gconv_q, in_feats, num_hidden, device):
206 |         super(FISM, self).__init__()
207 |         self.num_users = user_movie_spm.shape[0]
208 |         self.num_movies = user_movie_spm.shape[1]
209 |         self.b_u = nn.Parameter(torch.zeros(self.num_users))
210 |         self.b_i = nn.Parameter(torch.zeros(self.num_movies))
211 |         self.user_deg = torch.tensor(user_movie_spm.dot(np.ones(self.num_movies)), dtype=torch.float32).to(device)
212 |         values = user_movie_spm.data
213 |         indices = np.vstack((user_movie_spm.row, user_movie_spm.col))
214 |         indices = torch.LongTensor(indices)
215 |         values = torch.FloatTensor(values)
216 |         self.user_item_spm = torch.sparse_coo_tensor(indices, values, user_movie_spm.shape).to(device)
217 |         self.users = user_movie_spm.row
218 |         self.movies = user_movie_spm.col
219 |         self.ratings = user_movie_spm.data
220 |         user_movie_csr = user_movie_spm.tocsr()
221 |         self.neg_train = [
222 |                 np.setdiff1d(np.arange(self.num_movies),
223 |                     user_movie_csr[i].nonzero()[1])
224 |                 for i in range(self.num_users)
225 |                 ]
226 |         self.encode_p = EncodeLayer(in_feats, num_hidden, device)
227 |         self.encode_q = EncodeLayer(in_feats, num_hidden, device)
228 |         self.gconv_p = gconv_p
229 |         self.gconv_q = gconv_q
230 | 
231 |     def _est_rating(self, P, Q, user_idx, item_idx):
232 |         bu = self.b_u[user_idx]
233 |         bi = self.b_i[item_idx]
234 |         user_emb = torch.sparse.mm(self.user_item_spm, P)
235 |         user_emb = (user_emb[user_idx] - P[item_idx]) / \
236 |                 (torch.unsqueeze(self.user_deg[user_idx], 1) - 1).clamp(min=1)
237 |         tmp = torch.mul(user_emb, Q[item_idx])
238 |         r_ui = bu + bi + torch.sum(tmp, 1)
239 |         return r_ui
240 |     
241 |     def est_rating(self, g, features, user_idx, item_idx, neg_item_idx):
242 |         P = self.gconv_p(g, self.encode_p(features))
243 |         Q = self.gconv_q(g, self.encode_q(features))
244 |         r = self._est_rating(P, Q, user_idx, item_idx)
245 |         neg_sample_size = len(neg_item_idx) // len(user_idx)
246 |         neg_r = self._est_rating(P, Q, np.repeat(user_idx, neg_sample_size), neg_item_idx)
247 |         return torch.unsqueeze(r, 1), neg_r.reshape((-1, int(neg_sample_size)))
248 | 
249 |     def loss(self, P, Q, r_ui, neg_r_ui):
250 |         diff = 1 - (r_ui - neg_r_ui)
251 |         return torch.sum(torch.mul(diff, diff)/2) \
252 |             + beta/2 * torch.sum(torch.mul(P, P) + torch.mul(Q, Q)) \
253 |             + gamma/2 * (torch.sum(torch.mul(self.b_u, self.b_u)) + torch.sum(torch.mul(self.b_i, self.b_i)))
254 | 
255 |     def forward(self, g, features, neg_sample_size):
256 |         P = self.gconv_p(g, self.encode_p(features))
257 |         Q = self.gconv_q(g, self.encode_q(features))
258 |         tot = len(self.users)
259 |         pos_idx = np.random.choice(tot, 1024)
260 |         user_idx = self.users[pos_idx]
261 |         item_idx = self.movies[pos_idx]
262 |         neg_item_idx = np.array([
263 |             np.random.choice(self.neg_train[i], neg_sample_size)
264 |             for i in user_idx
265 |             ]).flatten()
266 |         neg_item_idx = np.random.choice(self.num_movies, len(pos_idx) * neg_sample_size)  # NOTE: replaces the per-user negatives sampled above with uniform samples over all movies
267 |         r_ui = self._est_rating(P, Q, user_idx, item_idx)
268 |         neg_r_ui = self._est_rating(P, Q, np.repeat(user_idx, neg_sample_size), neg_item_idx)
269 |         r_ui = torch.unsqueeze(r_ui, 1)
270 |         neg_r_ui = neg_r_ui.reshape((-1, int(neg_sample_size)))
271 |         return self.loss(P, Q, r_ui, neg_r_ui)
272 | 
273 | def RecValid(model, g, features):
274 |     model.eval()
275 |     with torch.no_grad():
276 |         neg_movies_eval = neg_valid[users_valid]
277 |         r, neg_r = model.est_rating(g, features, users_valid, movies_valid, neg_movies_eval.flatten())
278 |         hits10 = (torch.sum(neg_r > r, 1) < 10).cpu().numpy()
279 |         return np.mean(hits10)
280 |     
281 | def RecTest(model, g, features):
282 |     model.eval()
283 |     with torch.no_grad():
284 |         neg_movies_eval = neg_test[users_test]
285 |         r, neg_r = model.est_rating(g, features, users_test, movies_test, neg_movies_eval.flatten())
286 |         hits10 = (torch.sum(neg_r > r, 1) < 10).cpu().numpy()
287 |         for popularity, users in test_deg_dict.items():
288 |             print(popularity, np.mean(hits10[users]))
289 |         return np.mean(hits10)
290 | 
291 | if torch.cuda.is_available():
292 |     device = torch.device('cuda:0')
293 | else:
294 |     device = torch.device('cpu')
295 | 
296 | 
297 | #Model hyperparameters
298 | n_hidden = 16
299 | n_layers = 0
300 | dropout = 0
301 | aggregator_type = 'gcn'
302 | 
303 | # create GraphSAGE model
304 | gconv_p = GraphSAGEModel(n_hidden,
305 |                          n_hidden,
306 |                          n_hidden,
307 |                          n_layers,
308 |                          F.relu,
309 |                          dropout,
310 |                          aggregator_type)
311 | 
312 | gconv_q = GraphSAGEModel(n_hidden,
313 |                          n_hidden,
314 |                          n_hidden,
315 |                          n_layers,
316 |                          F.relu,
317 |                          dropout,
318 |                          aggregator_type)
319 | 
320 | model = FISM(user_movie_spm, gconv_p, gconv_q, in_feats, n_hidden, device).to(device)
321 | g.to(device)
322 | features = features.to(device)
323 | 
324 | # Training hyperparameters
325 | weight_decay = 1e-3
326 | n_epochs = 30000
327 | lr = 3e-5
328 | neg_sample_size = 20
329 | 
330 | # use optimizer
331 | optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
332 | 
333 | # training loop
334 | dur = []
335 | prev_acc = 0
336 | for epoch in range(n_epochs):
337 |     model.train()
338 |     loss = model(g, features, neg_sample_size)
339 |     optimizer.zero_grad()
340 |     loss.backward()
341 |     optimizer.step()
342 |     
343 |     if epoch % 10 == 0:
344 |         hits10 = RecValid(model, g, features)
345 |         print("Epoch {:05d} | Loss {:.4f} | HITS@10:{:.4f}".format(epoch, loss.item(), np.mean(hits10)))
346 | 
347 | print()
348 | # Evaluate the trained model on the test set.
349 | hits10 = RecTest(model, g, features)
350 | print("Test HITS@10:{:.4f}".format(np.mean(hits10)))
351 | 
352 | # use optimizer
353 | lr = 1e-4
354 | weight_decay = 1e-2
355 | 
356 | optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
357 | 
358 | # training loop (second stage, with a new learning rate and weight decay)
359 | dur = []
360 | prev_acc = 0
361 | for epoch in range(1000):
362 |     model.train()
363 |     loss = model(g, features, neg_sample_size)
364 |     optimizer.zero_grad()
365 |     loss.backward()
366 |     optimizer.step()
367 |     
368 |     if epoch % 10 == 0:
369 |         hits10 = RecValid(model, g, features)
370 |         print("Epoch {:05d} | Loss {:.4f} | HITS@10:{:.4f}".format(epoch, loss.item(), np.mean(hits10)))
371 | 
372 | print()
373 | # Evaluate the trained model on the test set.
374 | hits10 = RecTest(model, g, features)
375 | print("Test HITS@10:{:.4f}".format(np.mean(hits10)))
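
The rating estimate computed in `FISM._est_rating` can be hard to follow through the sparse-tensor operations. The sketch below restates the same arithmetic for a single (user, item) pair with dense NumPy arrays. It is an illustration under assumptions rather than the script's code: the function name `fism_score` and the dense matrix `R` are hypothetical stand-ins for the sparse `user_item_spm` used above.

```python
import numpy as np


def fism_score(R, P, Q, b_u, b_i, user, item):
    """Estimate r_ui from the other items the user interacted with.

    R        -- dense user-item interaction matrix (hypothetical stand-in for the sparse one)
    P, Q     -- item embedding matrices of shape (num_items, dim)
    b_u, b_i -- user and item bias vectors
    """
    # Sum the P embeddings of every item the user interacted with.
    pooled = R[user] @ P
    # Remove the target item's own embedding so it cannot predict itself.
    # As in the script, this is done unconditionally, which assumes the
    # (user, item) pair is an observed interaction.
    pooled = pooled - P[item]
    # Average over the remaining items, guarding against division by zero.
    degree = R[user].sum()
    pooled = pooled / max(degree - 1, 1)
    return b_u[user] + b_i[item] + pooled @ Q[item]
```

In the script, `P` and `Q` come from the two GraphSAGE encoders, so each item embedding already mixes in information from similar items before the pooling step.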
376 | 
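
The HITS@10 metric reported by `RecValid` and `RecTest` ranks one held-out item against a fixed set of sampled negatives per user. A minimal NumPy sketch of that metric follows. The helper name `hits_at_k` is hypothetical, and the rank convention (rank = 1 + number of negatives scored strictly higher than the positive) is an assumption stated here rather than something the script documents.

```python
import numpy as np


def hits_at_k(pos_scores, neg_scores, k=10):
    """Fraction of users whose held-out item ranks within the top k.

    pos_scores -- shape (num_users,), score of each user's held-out item
    neg_scores -- shape (num_users, num_negatives), scores of sampled negatives
    """
    pos_scores = np.asarray(pos_scores, dtype=float).reshape(-1, 1)
    neg_scores = np.asarray(neg_scores, dtype=float)
    # Rank of the positive item among itself plus its negatives (1 = best).
    ranks = 1 + (neg_scores > pos_scores).sum(axis=1)
    return float((ranks <= k).mean())
```

For example, with three negatives per user, a positive item that outscores all of them has rank 1, and one that is outscored by all of them has rank 4.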


--------------------------------------------------------------------------------
/_legacy/advanced_apps/rec/movielens.py:
--------------------------------------------------------------------------------
  1 | 
  2 | import pandas as pd
  3 | import dgl
  4 | import os
  5 | import torch
  6 | import numpy as np
  7 | import scipy.sparse as sp
  8 | import time
  9 | from functools import partial
 10 | import stanfordnlp
 11 | import re
 12 | import tqdm
 13 | import string
 14 | 
 15 | def _download_and_extract(url, path, filename):
 16 |     import zipfile, tarfile
 17 |     import requests
 18 | 
 19 |     fn = os.path.join(path, filename)
 20 | 
 21 |     while True:
 22 |         try:
 23 |             tar = tarfile.open(fn, 'r:gz')
 24 |             tar.extractall(path=path)  # extract next to the archive, not into the working directory
 25 |             tar.close()
 26 |             break
 27 |         except Exception:
 28 |             os.makedirs(path, exist_ok=True)
 29 |             f_remote = requests.get(url, stream=True)
 30 |             sz = f_remote.headers.get('content-length')
 31 |             assert f_remote.status_code == 200, 'fail to open {}'.format(url)
 32 |             with open(fn, 'wb') as writer:
 33 |                 for chunk in f_remote.iter_content(chunk_size=1024*1024):
 34 |                     writer.write(chunk)
 35 |             print('Download finished. Unzipping the file...')
 36 | 
 37 | class MovieLens(object):
 38 |     def __init__(self, directory, neg_size=99):
 39 |         '''
 40 |         directory: path to movielens directory which should have the three
 41 |                    files:
 42 |                    users.dat
 43 |                    movies.dat
 44 |                    ratings.dat
 45 |         '''
 46 |         url = 'https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/movielens.tar.gz'
 47 |         if not os.path.exists(os.path.join(directory, 'movielens', 'users.dat')):
 48 |             print('File not found. Downloading from', url)
 49 |             _download_and_extract(url, directory, 'movielens.tar.gz')
 50 |         directory = os.path.join(directory, 'movielens')
 51 | 
 52 |         users = []
 53 |         movies = []
 54 |         ratings = []
 55 | 
 56 |         # read users
 57 |         with open(os.path.join(directory, 'users.dat')) as f:
 58 |             for l in f:
 59 |                 id_, gender, age, occupation, zip_ = l.strip().split('::')
 60 |                 users.append({
 61 |                     'id': int(id_),
 62 |                     'gender': gender,
 63 |                     'age': age,
 64 |                     'occupation': occupation,
 65 |                     'zip': zip_,
 66 |                     })
 67 |         self.users = pd.DataFrame(users)
 68 | 
 69 |         # read movies
 70 |         with open(os.path.join(directory, 'movies.dat'), encoding='latin1') as f:
 71 |             for l in f:
 72 |                 id_, title, genres = l.strip().split('::')
 73 |                 genres_set = set(genres.split('|'))
 74 | 
 75 |                 # extract year
 76 |                 assert re.match(r'.*\([0-9]{4}\)$', title)
 77 |                 year = title[-5:-1]
 78 |                 title = title[:-6].strip()
 79 | 
 80 |                 data = {'id': int(id_), 'title': title, 'year': year}
 81 |                 for g in genres_set:
 82 |                     data[g] = True
 83 |                 movies.append(data)
 84 |         self.movies = (
 85 |                 pd.DataFrame(movies)
 86 |                 .fillna(False)
 87 |                 .astype({'year': 'category'}))
 88 |         self.genres = self.movies.columns[self.movies.dtypes == bool]
 89 | 
 90 |         # read ratings
 91 |         with open(os.path.join(directory, 'ratings.dat')) as f:
 92 |             for l in f:
 93 |                 user_id, movie_id, rating, timestamp = [int(_) for _ in l.split('::')]
 94 |                 ratings.append({
 95 |                     'user_id': user_id,
 96 |                     'movie_id': movie_id,
 97 |                     'rating': rating,
 98 |                     'timestamp': timestamp,
 99 |                     })
100 |         ratings = pd.DataFrame(ratings)
101 |         movie_count = ratings['movie_id'].value_counts()
102 |         movie_count.name = 'movie_count'
103 |         ratings = ratings.join(movie_count, on='movie_id')
104 |         self.ratings = ratings
105 | 
106 |         # determine test and validation set
107 |         self.ratings['timerank'] = self.ratings.groupby('user_id')['timestamp'].rank().astype('int')
108 |         self.ratings['test_mask'] = (self.ratings['timerank'] == 1)
109 |         self.ratings['valid_mask'] = (self.ratings['timerank'] == 2)
110 | 
111 |         # remove movies that only appear in validation and test set
112 |         movies_selected = self.ratings[self.ratings['timerank'] > 2]['movie_id'].unique()
113 |         self.ratings = self.ratings[self.ratings['movie_id'].isin(movies_selected)].copy()
114 | 
115 |         # drop users and movies which do not exist in ratings
116 |         self.users = self.users[self.users['id'].isin(self.ratings['user_id'])]
117 |         self.movies = self.movies[self.movies['id'].isin(self.ratings['movie_id'])]
118 |         self.user_ids_invmap = {u: i for i, u in enumerate(self.users['id'])}
119 |         self.movie_ids_invmap = {m: i for i, m in enumerate(self.movies['id'])}
120 | 
121 |         self.ratings['user_idx'] = self.ratings['user_id'].apply(lambda x: self.user_ids_invmap[x])
122 |         self.ratings['movie_idx'] = self.ratings['movie_id'].apply(lambda x: self.movie_ids_invmap[x])
123 | 
124 |         # parse movie features
125 |         movie_data = {}
126 |         num_movies = len(self.movies)
127 |         num_users = len(self.users)
128 | 
129 |         movie_genres = torch.from_numpy(self.movies[self.genres].values.astype('float32'))
130 |         movie_data['genre'] = movie_genres
131 |         movie_data['year'] = \
132 |                 torch.LongTensor(self.movies['year'].cat.codes.values.astype('int64') + 1)
133 | 
134 |         nlp = stanfordnlp.Pipeline(use_gpu=False, processors='tokenize,lemma')
135 |         vocab = set()
136 |         title_words = []
137 |         for t in tqdm.tqdm(self.movies['title'].values):
138 |             doc = nlp(t)
139 |             words = set()
140 |             for s in doc.sentences:
141 |                 words.update(w.lemma.lower() for w in s.words
142 |                              if not re.fullmatch(r'['+string.punctuation+']+', w.lemma))
143 |             vocab.update(words)
144 |             title_words.append(words)
145 |         vocab = list(vocab)
146 |         vocab_invmap = {w: i for i, w in enumerate(vocab)}
147 |         # bag-of-words
148 |         movie_data['title'] = torch.zeros(num_movies, len(vocab))
149 |         for i, tw in enumerate(tqdm.tqdm(title_words)):
150 |             movie_data['title'][i, [vocab_invmap[w] for w in tw]] = 1
151 |         self.vocab = vocab
152 |         self.vocab_invmap = vocab_invmap
153 | 
154 |         self.movie_data = movie_data
155 | 
156 |         # unobserved items for each user in training set
157 |         self.neg_train = [None] * len(self.users)
158 |         # negative examples for validation and test for evaluating ranking
159 |         self.neg_valid = np.zeros((len(self.users), neg_size), dtype='int64')
160 |         self.neg_test = np.zeros((len(self.users), neg_size), dtype='int64')
161 |         rating_groups = self.ratings.groupby('user_idx')
162 | 
163 |         for u in range(len(self.users)):
164 |             interacted_movies = self.ratings['movie_idx'][rating_groups.indices[u]]
165 |             timerank = self.ratings['timerank'][rating_groups.indices[u]]
166 | 
167 |             interacted_movies_valid = interacted_movies[timerank > 2]
168 |             neg_samples = np.setdiff1d(np.arange(len(self.movies)), interacted_movies_valid)
169 |             self.neg_train[u] = neg_samples
170 |             self.neg_valid[u] = np.random.choice(neg_samples, neg_size)
171 | 
172 |             interacted_movies_test = interacted_movies[timerank > 1]
173 |             neg_samples = np.setdiff1d(np.arange(len(self.movies)), interacted_movies_test)
174 |             self.neg_test[u] = np.random.choice(neg_samples, neg_size)
175 | 
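
The negative-sampling loop at the end of `__init__` can be sketched in isolation. The following is a minimal, standard-library-only illustration, not the code used by the class: `sample_negatives` is a hypothetical helper name, and `random.choices` stands in for `np.random.choice` (both sample with replacement by default).

```python
import random


def sample_negatives(num_items, interacted, neg_size, rng=None):
    """Return (candidates, sampled) for a single user.

    `sample_negatives` is a hypothetical helper, not part of movielens.py.
    """
    rng = rng or random.Random()
    interacted = set(interacted)
    # Same role as np.setdiff1d(np.arange(num_items), interacted):
    # every item index the user has not interacted with, in sorted order.
    candidates = [m for m in range(num_items) if m not in interacted]
    # Draw neg_size negatives with replacement, as np.random.choice does by default.
    sampled = rng.choices(candidates, k=neg_size)
    return candidates, sampled


candidates, sampled = sample_negatives(6, [1, 4], 3, random.Random(0))
print(candidates)  # [0, 2, 3, 5]
print(len(sampled))  # 3
```

In the class itself, the validation negatives exclude only the interactions with `timerank > 2`, while the test negatives exclude those with `timerank > 1`, so the two candidate pools differ.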


--------------------------------------------------------------------------------
/_legacy/advanced_apps/rec/sageconv.py:
--------------------------------------------------------------------------------
  1 | """Torch Module for GraphSAGE layer"""
  2 | # pylint: disable= no-member, arguments-differ, invalid-name
  3 | from torch import nn
  4 | from torch.nn import functional as F
  5 | 
  6 | from dgl import function as fn
  7 | 
  8 | 
  9 | class SAGEConv(nn.Module):
 10 |     r"""GraphSAGE layer from paper `Inductive Representation Learning on
 11 |     Large Graphs <https://arxiv.org/pdf/1706.02216.pdf>`__.
 12 | 
 13 |     .. math::
 14 |         h_{\mathcal{N}(i)}^{(l+1)} & = \mathrm{aggregate}
 15 |         \left(\{h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)
 16 | 
 17 |         h_{i}^{(l+1)} & = \sigma \left(W \cdot \mathrm{concat}
 18 |         (h_{i}^{l}, h_{\mathcal{N}(i)}^{(l+1)}) + b \right)
 19 | 
 20 |         h_{i}^{(l+1)} & = \mathrm{norm}(h_{i}^{(l+1)})
 21 | 
 22 |     Parameters
 23 |     ----------
 24 |     in_feats : int
 25 |         Input feature size.
 26 |     out_feats : int
 27 |         Output feature size.
 28 |     feat_drop : float
 29 |         Dropout rate on features, default: ``0``.
 30 |     aggregator_type : str
 31 |         Aggregator type to use (``sum``, ``mean``, ``gcn``, ``pool``, ``lstm``).
 32 |     bias : bool
 33 |         If True, adds a learnable bias to the output. Default: ``True``.
 34 |     norm : callable activation function/layer or None, optional
 35 |         If not None, applies normalization to the updated node features.
 36 |     activation : callable activation function/layer or None, optional
 37 |         If not None, applies an activation function to the updated node features.
 38 |         Default: ``None``.
 39 |     """
 40 |     def __init__(self,
 41 |                  in_feats,
 42 |                  out_feats,
 43 |                  aggregator_type,
 44 |                  feat_drop=0.,
 45 |                  bias=True,
 46 |                  norm=None,
 47 |                  activation=None):
 48 |         super(SAGEConv, self).__init__()
 49 |         self._in_feats = in_feats
 50 |         self._out_feats = out_feats
 51 |         self._aggre_type = aggregator_type
 52 |         self.norm = norm
 53 |         self.feat_drop = nn.Dropout(feat_drop)
 54 |         self.activation = activation
 55 |         # aggregator type: sum/mean/gcn/pool/lstm
 56 |         if aggregator_type == 'pool':
 57 |             self.fc_pool = nn.Linear(in_feats, in_feats)
 58 |         if aggregator_type == 'lstm':
 59 |             self.lstm = nn.LSTM(in_feats, in_feats, batch_first=True)
 60 |         if aggregator_type != 'gcn':
 61 |             self.fc_self = nn.Linear(in_feats, out_feats, bias=bias)
 62 |         self.fc_neigh = nn.Linear(in_feats, out_feats, bias=bias)
 63 |         self.reset_parameters()
 64 | 
 65 |     def reset_parameters(self):
 66 |         """Reinitialize learnable parameters."""
 67 |         gain = nn.init.calculate_gain('relu')
 68 |         if self._aggre_type == 'pool':
 69 |             nn.init.xavier_uniform_(self.fc_pool.weight, gain=gain)
 70 |         if self._aggre_type == 'lstm':
 71 |             self.lstm.reset_parameters()
 72 |         if self._aggre_type != 'gcn':
 73 |             nn.init.xavier_uniform_(self.fc_self.weight, gain=gain)
 74 |         nn.init.xavier_uniform_(self.fc_neigh.weight, gain=gain)
 75 | 
 76 |     def _lstm_reducer(self, nodes):
 77 |         """LSTM reducer
 78 |         NOTE(zihao): lstm reducer with default schedule (degree bucketing)
 79 |         is slow, we could accelerate this with degree padding in the future.
 80 |         """
 81 |         m = nodes.mailbox['m'] # (B, L, D)
 82 |         batch_size = m.shape[0]
 83 |         h = (m.new_zeros((1, batch_size, self._in_feats)),
 84 |              m.new_zeros((1, batch_size, self._in_feats)))
 85 |         _, (rst, _) = self.lstm(m, h)
 86 |         return {'neigh': rst.squeeze(0)}
 87 | 
 88 |     def forward(self, graph, feat, e_feat):
 89 |         r"""Compute GraphSAGE layer, scaling each message by the edge feature ``e_feat``.
 90 | 
 91 |         Parameters
 92 |         ----------
 93 |         graph : DGLGraph
 94 |             The graph.
 95 |         feat : torch.Tensor
 96 |             The input feature of shape :math:`(N, D_{in})` where :math:`D_{in}`
 97 |             is size of input feature, :math:`N` is the number of nodes.
 98 | 
 99 |         Returns
100 |         -------
101 |         torch.Tensor
102 |             The output feature of shape :math:`(N, D_{out})` where :math:`D_{out}`
103 |             is size of output feature.
104 |         """
105 |         graph = graph.local_var()
106 |         feat = self.feat_drop(feat)
107 |         h_self = feat
108 |         graph.edata['e'] = e_feat
109 |         if self._aggre_type == 'sum':
110 |             graph.ndata['h'] = feat
111 |             graph.update_all(fn.u_mul_e('h', 'e', 'm'), fn.sum('m', 'neigh'))
112 |             h_neigh = graph.ndata['neigh']
113 |         elif self._aggre_type == 'mean':
114 |             graph.ndata['h'] = feat
115 |             graph.update_all(fn.u_mul_e('h', 'e', 'm'), fn.mean('m', 'neigh'))
116 |             h_neigh = graph.ndata['neigh']
117 |         elif self._aggre_type == 'gcn':
118 |             graph.ndata['h'] = feat
119 |             graph.update_all(fn.u_mul_e('h', 'e', 'm'), fn.sum('m', 'neigh'))
120 |             # divide in_degrees
121 |             degs = graph.in_degrees().float()
122 |             degs = degs.to(feat.device)
123 |             h_neigh = (graph.ndata['neigh'] + graph.ndata['h']) / (degs.unsqueeze(-1) + 1)
124 |         elif self._aggre_type == 'pool':
125 |             graph.ndata['h'] = F.relu(self.fc_pool(feat))
126 |             graph.update_all(fn.u_mul_e('h', 'e', 'm'), fn.max('m', 'neigh'))
127 |             h_neigh = graph.ndata['neigh']
128 |         elif self._aggre_type == 'lstm':
129 |             graph.ndata['h'] = feat
130 |             graph.update_all(fn.u_mul_e('h', 'e', 'm'), self._lstm_reducer)
131 |             h_neigh = graph.ndata['neigh']
132 |         else:
133 |             raise KeyError('Aggregator type {} not recognized.'.format(self._aggre_type))
134 |         # GraphSAGE GCN does not require fc_self.
135 |         if self._aggre_type == 'gcn':
136 |             rst = self.fc_neigh(h_neigh)
137 |         else:
138 |             rst = self.fc_self(h_self) + self.fc_neigh(h_neigh)
139 |         # activation
140 |         if self.activation is not None:
141 |             rst = self.activation(rst)
142 |         # normalization
143 |         if self.norm is not None:
144 |             rst = self.norm(rst)
145 |         return rst
146 | 
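
The `mean` branch of `forward` combines `fn.u_mul_e('h', 'e', 'm')` with `fn.mean('m', 'neigh')`. A minimal pure-Python sketch of that computation follows. It does not use DGL; `weighted_mean_aggregate` is a hypothetical name, and the zero output for nodes without incoming edges is an assumption about the built-in reducer.

```python
def weighted_mean_aggregate(num_nodes, feats, edges):
    """Edge-weighted mean aggregation.

    feats: list of per-node feature lists (all of the same length).
    edges: list of (src, dst, weight) triples.
    Hypothetical helper for illustration; not part of sageconv.py.
    """
    dim = len(feats[0])
    sums = [[0.0] * dim for _ in range(num_nodes)]
    counts = [0] * num_nodes
    for src, dst, w in edges:
        # Message on the edge: source feature scaled by the edge weight (u_mul_e).
        for k in range(dim):
            sums[dst][k] += feats[src][k] * w
        counts[dst] += 1
    # Mean over the incoming messages. The divisor is the number of messages,
    # not the sum of the weights. Nodes with no in-edges keep zeros (assumption).
    return [
        [s / counts[v] for s in sums[v]] if counts[v] else sums[v]
        for v in range(num_nodes)
    ]


feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 2, 0.5), (1, 2, 2.0), (2, 0, 1.0)]
neigh = weighted_mean_aggregate(3, feats, edges)
print(neigh)  # [[5.0, 6.0], [0.0, 0.0], [3.25, 4.5]]
```

Because the mean divides by the in-degree rather than by the total edge weight, scaling every edge weight by a constant scales `h_neigh` by the same constant.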


--------------------------------------------------------------------------------
/_legacy/advanced_apps/rec/slim_load.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | from scipy import sparse as spsp
 3 | 
 4 | def read_csr(filename):
 5 |     all_rows = []
 6 |     all_cols = []
 7 |     all_vals = []
 8 |     with open(filename, 'r') as f:  # close the file even if parsing fails
 9 |         for i, line in enumerate(f):
10 |             strs = line.split(' ')  # first token is skipped; (col, val) pairs follow
11 |             cols = [int(s) for s in strs[1::2]]
12 |             vals = [float(s) for s in strs[2::2]]
13 |             all_cols.extend(cols)
14 |             all_vals.extend(vals)
15 |             all_rows.extend([i for _ in cols])
16 |     all_rows = np.array(all_rows, dtype=np.int64)
17 |     all_cols = np.array(all_cols, dtype=np.int64)
18 |     all_vals = np.array(all_vals, dtype=np.float32)
19 |     mat = spsp.coo_matrix((all_vals, (all_rows, all_cols)))
20 |     return mat
21 | 
22 | 
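
The per-line parsing inside `read_csr` can be checked on a single line. The sketch below isolates that slicing logic in a hypothetical helper, `parse_row`. The sample line is made up, and the source does not say what the skipped first token holds.

```python
def parse_row(row_index, line):
    """Parse one line of the form '<first> <col> <val> <col> <val> ...'.

    Hypothetical helper mirroring the slicing used in read_csr.
    """
    strs = line.split(' ')
    cols = [int(s) for s in strs[1::2]]    # tokens at odd positions are column indices
    vals = [float(s) for s in strs[2::2]]  # tokens at even positions >= 2 are values
    rows = [row_index] * len(cols)         # every entry on this line shares the row index
    return rows, cols, vals


rows, cols, vals = parse_row(0, "2 0 1.5 3 2.0\n")
print(rows, cols, vals)  # [0, 0] [0, 3] [1.5, 2.0]
```

The trailing newline is harmless here because `int` and `float` both ignore surrounding whitespace.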


--------------------------------------------------------------------------------
/_legacy/basic_apps/data_loading.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": null,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "from scipy import sparse as spsp\n",
 11 |     "import pandas as pd\n",
 12 |     "import dgl\n",
 13 |     "import torch"
 14 |    ]
 15 |   },
 16 |   {
 17 |    "cell_type": "markdown",
 18 |    "metadata": {},
 19 |    "source": [
 20 |     "## Synthesize data\n",
 21 |     "\n",
 22 |     "Please use the 'synthetic_data.ipynb' notebook to synthesize data in this tutorial."
 23 |    ]
 24 |   },
 25 |   {
 26 |    "cell_type": "markdown",
 27 |    "metadata": {},
 28 |    "source": [
 29 |     "## Construct a homogeneous graph\n",
 30 |     "\n",
 31 |     "'edges.csv' stores a graph with multiple edge types. Although the graph has multiple edge types and node types, we can still construct it as a homogeneous graph. In this case, we create a `DGLGraph` for this graph and store node types and edge types as node data and edge data. In this graph, each node has two features."
 32 |    ]
 33 |   },
 34 |   {
 35 |    "cell_type": "code",
 36 |    "execution_count": null,
 37 |    "metadata": {},
 38 |    "outputs": [],
 39 |    "source": [
 40 |     "node_data = pd.read_csv('node_data.csv', header=None)\n",
 41 |     "edges = pd.read_csv('edges.csv', header=None)\n",
 42 |     "\n",
 43 |     "num_edges = edges.shape[0]\n",
 44 |     "spm = spsp.coo_matrix((np.ones(num_edges), (edges[0], edges[1])))\n",
 45 |     "g = dgl.DGLGraph(spm, readonly=True)\n",
 46 |     "ndata1 = np.expand_dims(np.array(node_data[1]), 1)\n",
 47 |     "ndata2 = np.expand_dims(np.array(node_data[2]), 1)\n",
 48 |     "ndata = np.concatenate([ndata1, ndata2], 1)\n",
 49 |     "g.ndata['feats'] = torch.tensor(ndata)\n",
 50 |     "g.edata['rel'] = torch.tensor(np.array(edges[2]))"
 51 |    ]
 52 |   },
 53 |   {
 54 |    "cell_type": "markdown",
 55 |    "metadata": {},
 56 |    "source": [
 57 |     "## Construct a heterogeneous graph with one node type\n",
 58 |     "\n",
 59 |     "For the graph above, we can construct a heterogeneous graph with one node type. We first need to create an adjacency matrix for each edge type and call `heterograph` to create `DGLHeteroGraph`."
 60 |    ]
 61 |   },
 62 |   {
 63 |    "cell_type": "code",
 64 |    "execution_count": null,
 65 |    "metadata": {},
 66 |    "outputs": [],
 67 |    "source": [
 68 |     "node_data = pd.read_csv('node_data.csv', header=None)\n",
 69 |     "edges = pd.read_csv('edges.csv', header=None)\n",
 70 |     "\n",
 71 |     "rel = np.array(edges[2])\n",
 72 |     "spm_dict = {}\n",
 73 |     "src = edges[0][rel==0]\n",
 74 |     "dst = edges[1][rel==0]\n",
 75 |     "spm_dict['a', '0', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
 76 |     "\n",
 77 |     "src = edges[0][rel==1]\n",
 78 |     "dst = edges[1][rel==1]\n",
 79 |     "spm_dict['a', '1', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
 80 |     "\n",
 81 |     "src = edges[0][rel==2]\n",
 82 |     "dst = edges[1][rel==2]\n",
 83 |     "spm_dict['a', '2', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
 84 |     "\n",
 85 |     "g = dgl.heterograph(spm_dict)\n",
 86 |     "\n",
 87 |     "ndata1 = np.expand_dims(np.array(node_data[1]), 1)\n",
 88 |     "ndata2 = np.expand_dims(np.array(node_data[2]), 1)\n",
 89 |     "ndata = np.concatenate([ndata1, ndata2], 1)\n",
 90 |     "g.ndata['feats'] = torch.tensor(ndata)"
 91 |    ]
 92 |   },
 93 |   {
 94 |    "cell_type": "markdown",
 95 |    "metadata": {},
 96 |    "source": [
 97 |     "## Construct a heterogeneous graph with multiple node types"
 98 |    ]
 99 |   },
100 |   {
101 |    "cell_type": "markdown",
102 |    "metadata": {},
103 |    "source": [
104 |     "'edges1.csv' stores a graph with 3 edge types and 2 node types. We can construct a heterogeneous graph in a similar way as above. We first need to create an adjacency matrix for each edge type and call `heterograph` to create `DGLHeteroGraph`. Note: The third edge type connects nodes of type 'a' and 'b'.\n",
105 |     "\n",
106 |     "After constructing the heterogeneous graph, we need to assign node data to each node type. We need to assign data to the nodes of type 'a' and 'b' separately."
107 |    ]
108 |   },
109 |   {
110 |    "cell_type": "code",
111 |    "execution_count": null,
112 |    "metadata": {},
113 |    "outputs": [],
114 |    "source": [
115 |     "edges = pd.read_csv('edges1.csv', header=None)\n",
116 |     "node_data = pd.read_csv('node_data.csv', header=None)\n",
117 |     "node_data1 = pd.read_csv('node_data1.csv', header=None)\n",
118 |     "\n",
119 |     "rel = np.array(edges[2])\n",
120 |     "spm_dict = {}\n",
121 |     "src = edges[0][rel==0]\n",
122 |     "dst = edges[1][rel==0]\n",
123 |     "spm_dict['a', '0', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
124 |     "\n",
125 |     "src = edges[0][rel==1]\n",
126 |     "dst = edges[1][rel==1]\n",
127 |     "spm_dict['a', '1', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
128 |     "\n",
129 |     "src = edges[0][rel==2]\n",
130 |     "dst = edges[1][rel==2]\n",
131 |     "spm_dict['a', '2', 'b'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
132 |     "\n",
133 |     "g = dgl.heterograph(spm_dict)\n",
134 |     "\n",
135 |     "ndata = np.concatenate([np.expand_dims(np.array(node_data[1]), 1),\n",
136 |     "                        np.expand_dims(np.array(node_data[2]), 1)], 1)\n",
137 |     "ndata1 = np.concatenate([np.expand_dims(np.array(node_data1[1]), 1),\n",
138 |     "                         np.expand_dims(np.array(node_data1[2]), 1),\n",
139 |     "                         np.expand_dims(np.array(node_data1[3]), 1)], 1)\n",
140 |     "g.nodes['a'].data['feats'] = torch.tensor(ndata)\n",
141 |     "g.nodes['b'].data['feats'] = torch.tensor(ndata1)"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "markdown",
146 |    "metadata": {},
147 |    "source": [
148 |     "## Construct a heterogeneous graph with non-contiguous node IDs"
149 |    ]
150 |   },
151 |   {
152 |    "cell_type": "code",
153 |    "execution_count": null,
154 |    "metadata": {},
155 |    "outputs": [],
156 |    "source": []
157 |   }
158 |  ],
159 |  "metadata": {
160 |   "kernelspec": {
161 |    "display_name": "Python 3",
162 |    "language": "python",
163 |    "name": "python3"
164 |   },
165 |   "language_info": {
166 |    "codemirror_mode": {
167 |     "name": "ipython",
168 |     "version": 3
169 |    },
170 |    "file_extension": ".py",
171 |    "mimetype": "text/x-python",
172 |    "name": "python",
173 |    "nbconvert_exporter": "python",
174 |    "pygments_lexer": "ipython3",
175 |    "version": "3.7.5"
176 |   }
177 |  },
178 |  "nbformat": 4,
179 |  "nbformat_minor": 2
180 | }
181 | 
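
The last section of the notebook above, on non-contiguous node IDs, has an empty code cell. One common approach is to remap the raw IDs to contiguous indices before building the adjacency matrices, in the same spirit as `user_ids_invmap` in `movielens.py`. This is an assumption, not the authors' solution. The sketch below uses only the standard library, and `remap_ids` is a hypothetical name.

```python
def remap_ids(src_ids, dst_ids):
    """Map arbitrary node IDs to contiguous indices 0..N-1.

    Hypothetical helper; assumes source and destination nodes share one ID space
    (a single node type).
    """
    unique = sorted(set(src_ids) | set(dst_ids))
    id_map = {raw: idx for idx, raw in enumerate(unique)}
    new_src = [id_map[s] for s in src_ids]
    new_dst = [id_map[d] for d in dst_ids]
    return id_map, new_src, new_dst


id_map, src, dst = remap_ids([100, 7, 100], [42, 100, 7])
print(id_map)  # {7: 0, 42: 1, 100: 2}
print(src)     # [2, 0, 2]
print(dst)     # [1, 2, 0]
```

With two node types, a separate map per type would be needed, and node features would have to be reordered to match the new indices.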


--------------------------------------------------------------------------------
/_legacy/basic_apps/graph_classification_tutorial.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "\n",
  8 |     "# Graph Classification with DGL"
  9 |    ]
 10 |   },
 11 |   {
 12 |    "cell_type": "markdown",
 13 |    "metadata": {},
 14 |    "source": [
 15 |     "Here we demonstrate how to use DGL to perform graph classification tasks."
 16 |    ]
 17 |   },
 18 |   {
 19 |    "cell_type": "code",
 20 |    "execution_count": null,
 21 |    "metadata": {},
 22 |    "outputs": [],
 23 |    "source": [
 24 |     "import dgl\n",
 25 |     "from dgl.data import TUDataset\n",
 26 |     "from dgl.data.utils import split_dataset\n",
 27 |     "from dgl.nn.pytorch import conv"
 28 |    ]
 29 |   },
 30 |   {
 31 |    "cell_type": "code",
 32 |    "execution_count": null,
 33 |    "metadata": {},
 34 |    "outputs": [],
 35 |    "source": [
 36 |     "import numpy as np\n",
 37 |     "import networkx as nx\n",
 38 |     "\n",
 39 |     "import torch\n",
 40 |     "import torch.nn as nn\n",
 41 |     "from torch.nn import BCEWithLogitsLoss\n",
 42 |     "from torch.optim import Adam\n",
 43 |     "from torch.utils.data import DataLoader\n",
 44 |     "import torch.nn.functional as F"
 45 |    ]
 46 |   },
 47 |   {
 48 |    "cell_type": "markdown",
 49 |    "metadata": {},
 50 |    "source": [
 51 |     "## Load Dataset"
 52 |    ]
 53 |   },
 54 |   {
 55 |    "cell_type": "markdown",
 56 |    "metadata": {},
 57 |    "source": [
 58 |     "<img src=\"./asset/enzymes.png\" width=\"500\"/>"
 59 |    ]
 60 |   },
 61 |   {
 62 |    "cell_type": "markdown",
 63 |    "metadata": {},
 64 |    "source": [
 65 |     "Here we use the ENZYMES dataset, which constructs graphs from enzymes based on group functions. Nodes represent structural elements and edges represent the connections between them. Each graph has a label from 0 to 5 indicating the type of the enzyme."
 66 |    ]
 67 |   },
 68 |   {
 69 |    "cell_type": "code",
 70 |    "execution_count": null,
 71 |    "metadata": {},
 72 |    "outputs": [],
 73 |    "source": [
 74 |     "dataset = TUDataset(\"ENZYMES\")\n",
 75 |     "\n",
 76 |     "dataset.graph_labels=torch.tensor(dataset.graph_labels)\n",
 77 |     "for i in range(len(dataset)):\n",
 78 |     "    dataset[i][0].ndata['node_attr']=(dataset[i][0].ndata['node_attr']).float()"
 79 |    ]
 80 |   },
 81 |   {
 82 |    "cell_type": "code",
 83 |    "execution_count": null,
 84 |    "metadata": {},
 85 |    "outputs": [],
 86 |    "source": [
 87 |     "graph, label= dataset[0]\n",
 88 |     "print(graph)\n",
 89 |     "print(label)"
 90 |    ]
 91 |   },
 92 |   {
 93 |    "cell_type": "code",
 94 |    "execution_count": null,
 95 |    "metadata": {},
 96 |    "outputs": [],
 97 |    "source": [
 98 |     "nx.draw_spring(graph.to_networkx())"
 99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "markdown",
103 |    "metadata": {},
104 |    "source": [
105 |     "### Split dataset into train and val"
106 |    ]
107 |   },
108 |   {
109 |    "cell_type": "code",
110 |    "execution_count": null,
111 |    "metadata": {},
112 |    "outputs": [],
113 |    "source": [
114 |     "trainset, valset = split_dataset(dataset, [0.8, 0.2], shuffle=True, random_state=42)"
115 |    ]
116 |   },
117 |   {
118 |    "cell_type": "markdown",
119 |    "metadata": {},
120 |    "source": [
121 |     "## Prepare Dataloader"
122 |    ]
123 |   },
124 |   {
125 |    "cell_type": "markdown",
126 |    "metadata": {},
127 |    "source": [
128 |     "DGL can batch multiple small graphs together to accelerate computation. Details of batching can be found [here](https://docs.dgl.ai/tutorials/basics/4_batch.html)."
129 |    ]
130 |   },
131 |   {
132 |    "cell_type": "markdown",
133 |    "metadata": {},
134 |    "source": [
135 |     "<img src=\"https://s3.us-east-2.amazonaws.com/dgl.ai/tutorial/batch/batch.png\" width=\"500\"/>"
136 |    ]
137 |   },
138 |   {
139 |    "cell_type": "code",
140 |    "execution_count": null,
141 |    "metadata": {},
142 |    "outputs": [],
143 |    "source": [
144 |     "def collate_molgraphs_for_classification(data):\n",
145 |     "    \"\"\"Batching a list of datapoints for dataloader in classification tasks.\"\"\"\n",
146 |     "    graphs, labels = map(list, zip(*data))\n",
147 |     "    bg = dgl.batch(graphs)\n",
148 |     "    labels = torch.stack(labels, dim=0)\n",
149 |     "    return bg, labels\n",
150 |     "\n",
151 |     "train_loader = DataLoader(trainset, batch_size=512,\n",
152 |     "                          collate_fn=collate_molgraphs_for_classification)\n",
153 |     "val_loader = DataLoader(valset, batch_size=512,\n",
154 |     "                        collate_fn=collate_molgraphs_for_classification)"
155 |    ]
156 |   },
157 |   {
158 |    "cell_type": "markdown",
159 |    "metadata": {},
160 |    "source": [
161 |     "## Prepare Model and Optimizer"
162 |    ]
163 |   },
164 |   {
165 |    "cell_type": "markdown",
166 |    "metadata": {},
167 |    "source": [
168 |     "Here we use a three-layer Graph Convolutional Network to classify the graphs. Detailed source code can be found [here](https://github.com/dmlc/dgl/blob/master/python/dgl/model_zoo/chem/classifiers.py#L111)."
169 |    ]
170 |   },
171 |   {
172 |    "cell_type": "markdown",
173 |    "metadata": {},
174 |    "source": [
175 |     "We use a structure similar to the one introduced before: a 3-layer GNN learns the node-level representations. Then we use the built-in readout function `dgl.sum_nodes`, summing all the node (vertex) representations to get the graph representation: $$h_g=\\sum_{v}{h_v}$$  \n",
176 |     "Then we use a linear classifier to classify the graph based on its representation."
177 |    ]
178 |   },
179 |   {
180 |    "cell_type": "code",
181 |    "execution_count": null,
182 |    "metadata": {},
183 |    "outputs": [],
184 |    "source": [
185 |     "class GCNModel(nn.Module):\n",
186 |     "    def __init__(self,\n",
187 |     "                 in_feats,\n",
188 |     "                 n_hidden,\n",
189 |     "                 out_feats):\n",
190 |     "        super().__init__()\n",
191 |     "        self.layers = nn.ModuleList([\n",
192 |     "            conv.GraphConv(in_feats, n_hidden, activation=F.relu),\n",
193 |     "            conv.GraphConv(n_hidden, n_hidden, activation=F.relu),\n",
194 |     "            conv.GraphConv(n_hidden, n_hidden, activation=F.relu)\n",
195 |     "        ])\n",
196 |     "        \n",
197 |     "        self.classifier = nn.Linear(n_hidden, out_feats)\n",
198 |     "\n",
199 |     "    def forward(self, g, features):\n",
200 |     "        h = features\n",
201 |     "        for layer in self.layers:\n",
202 |     "            h = layer(g, h)\n",
203 |     "        with g.local_scope():\n",
204 |     "            g.ndata['feat'] = h\n",
205 |     "            h_g = dgl.sum_nodes(g, 'feat')\n",
206 |     "        return self.classifier(h_g)"
207 |    ]
208 |   },
209 |   {
210 |    "cell_type": "markdown",
211 |    "metadata": {},
212 |    "source": [
213 |     "## Training"
214 |    ]
215 |   },
216 |   {
217 |    "cell_type": "code",
218 |    "execution_count": null,
219 |    "metadata": {},
220 |    "outputs": [],
221 |    "source": [
222 |     "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
223 |     "epochs = 500 if torch.cuda.is_available() else 50\n",
224 |     "model = GCNModel(in_feats=18, n_hidden=64, out_feats=6).to(device)\n",
225 |     "loss_criterion = torch.nn.CrossEntropyLoss()\n",
226 |     "optimizer = Adam(model.parameters())\n",
227 |     "print(device)\n",
228 |     "print(model)"
229 |    ]
230 |   },
231 |   {
232 |    "cell_type": "code",
233 |    "execution_count": null,
234 |    "metadata": {
235 |     "scrolled": false
236 |    },
237 |    "outputs": [],
238 |    "source": [
239 |     "model.train()\n",
240 |     "for i in range(epochs):\n",
241 |     "    loss_list = []\n",
242 |     "    true_samples = 0\n",
243 |     "    num_samples = 0\n",
244 |     "    for batch_id, batch_data in enumerate(train_loader):\n",
245 |     "        bg, labels = batch_data\n",
246 |     "        atom_feats = bg.ndata.pop('node_attr').float()\n",
247 |     "        atom_feats, labels = atom_feats.to(device), \\\n",
248 |     "                                   labels.to(device).squeeze(-1)\n",
249 |     "        logits = model(bg, atom_feats)\n",
250 |     "        loss = loss_criterion(logits, labels)\n",
251 |     "        true_samples += (logits.argmax(1)==labels.long()).float().sum().item()\n",
252 |     "        num_samples += len(labels)\n",
253 |     "        loss_list.append(loss.item())\n",
254 |     "        optimizer.zero_grad()\n",
255 |     "        loss.backward()\n",
256 |     "        optimizer.step()\n",
257 |     "    print(\"Epoch {:05d} | Loss: {:.4f} | Accuracy: {:.4f}\".format(i, np.mean(loss_list), true_samples/num_samples))"
258 |    ]
259 |   },
260 |   {
261 |    "cell_type": "markdown",
262 |    "metadata": {},
263 |    "source": [
264 |     "## Validation"
265 |    ]
266 |   },
267 |   {
268 |    "cell_type": "code",
269 |    "execution_count": null,
270 |    "metadata": {},
271 |    "outputs": [],
272 |    "source": [
273 |     "model.eval()\n",
274 |     "true_samples = 0\n",
275 |     "num_samples = 0\n",
276 |     "with torch.no_grad():\n",
277 |     "    for batch_id, batch_data in enumerate(val_loader):\n",
278 |     "        bg, labels = batch_data\n",
279 |     "        atom_feats = bg.ndata.pop('node_attr').float()\n",
280 |     "        atom_feats, labels = atom_feats.to(device), \\\n",
281 |     "                                   labels.to(device).squeeze(-1)\n",
282 |     "        logits = model(bg, atom_feats)\n",
283 |     "        logits.argmax()\n",
284 |     "        num_samples += len(labels)\n",
285 |     "        true_samples += (logits.argmax(1)==labels.long()).float().sum().item()\n",
286 |     "print(\"Validation Accuracy: {:.4f}\".format(true_samples/num_samples))"
287 |    ]
288 |   },
289 |   {
290 |    "cell_type": "markdown",
291 |    "metadata": {},
292 |    "source": [
293 |     "## Exercise"
294 |    ]
295 |   },
296 |   {
297 |    "cell_type": "markdown",
298 |    "metadata": {},
299 |    "source": [
300 |     "There are other built-in readout functions, such as `max_nodes` and `mean_nodes`. Docs can be found [here](https://docs.dgl.ai/api/python/batch.html#graph-readout). You can try replacing `sum_nodes` with other functions to see whether you can achieve better performance.\n",
301 |     "You can also change the network structure, as in the exercise of `BasicTask.ipynb`."
302 |    ]
303 |   },
304 |   {
305 |    "cell_type": "code",
306 |    "execution_count": null,
307 |    "metadata": {},
308 |    "outputs": [],
309 |    "source": []
310 |   }
311 |  ],
312 |  "metadata": {
313 |   "kernelspec": {
314 |    "display_name": "Python 3",
315 |    "language": "python",
316 |    "name": "python3"
317 |   },
318 |   "language_info": {
319 |    "codemirror_mode": {
320 |     "name": "ipython",
321 |     "version": 3
322 |    },
323 |    "file_extension": ".py",
324 |    "mimetype": "text/x-python",
325 |    "name": "python",
326 |    "nbconvert_exporter": "python",
327 |    "pygments_lexer": "ipython3",
328 |    "version": "3.6.8"
329 |   }
330 |  },
331 |  "nbformat": 4,
332 |  "nbformat_minor": 2
333 | }
334 | 
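
The sum readout used in `GCNModel.forward` (`dgl.sum_nodes` on a batched graph) reduces the node features of each graph in the batch to one vector per graph. The pure-Python sketch below illustrates the idea without DGL. `sum_readout` is a hypothetical name, and the code assumes the batched graph stores each graph's nodes contiguously, in batch order.

```python
def sum_readout(node_feats, nodes_per_graph):
    """Sum node features per graph in a batch.

    node_feats: list of per-node feature lists for the whole batch.
    nodes_per_graph: number of nodes of each graph, in batch order.
    Hypothetical helper for illustration; not a DGL API.
    """
    dim = len(node_feats[0])
    out = []
    start = 0
    for n in nodes_per_graph:
        segment = node_feats[start:start + n]  # nodes belonging to this graph
        out.append([sum(row[k] for row in segment) for k in range(dim)])
        start += n
    return out


feats = [[1, 0], [2, 1], [0, 3], [4, 4], [1, 1]]
graph_repr = sum_readout(feats, [2, 3])
print(graph_repr)  # [[3, 1], [5, 8]]
```

Replacing the inner `sum` with `max`, or dividing by `n`, gives the analogues of the `max_nodes` and `mean_nodes` readouts mentioned in the exercise.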


--------------------------------------------------------------------------------
/_legacy/basic_apps/synthetic_data.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": null,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "from scipy import sparse as spsp"
 11 |    ]
 12 |   },
 13 |   {
 14 |    "cell_type": "markdown",
 15 |    "metadata": {},
 16 |    "source": [
 17 |     "## Construct a heterogeneous graph with one node type"
 18 |    ]
 19 |   },
 20 |   {
 21 |    "cell_type": "code",
 22 |    "execution_count": null,
 23 |    "metadata": {},
 24 |    "outputs": [],
 25 |    "source": [
 26 |     "num_nodes = 1000\n",
 27 |     "num_edges = 3000\n",
 28 |     "\n",
 29 |     "edata = []\n",
 30 |     "for i in range(3):\n",
 31 |     "    src = np.random.randint(0, num_nodes, num_edges)\n",
 32 |     "    dst = np.random.randint(0, num_nodes, num_edges)\n",
 33 |     "    etype = np.ones(num_edges) * i\n",
 34 |     "\n",
 35 |     "    src = np.expand_dims(src, 1)\n",
 36 |     "    dst = np.expand_dims(dst, 1)\n",
 37 |     "    etype = np.expand_dims(etype, 1)\n",
 38 |     "    edges = np.concatenate([src, dst, etype], 1)\n",
 39 |     "    edata.append(edges)\n",
 40 |     "edata = np.concatenate(edata, 0)\n",
 41 |     "np.savetxt('edges.csv', edata, fmt='%d', delimiter=',')"
 42 |    ]
 43 |   },
 44 |   {
 45 |    "cell_type": "code",
 46 |    "execution_count": null,
 47 |    "metadata": {},
 48 |    "outputs": [],
 49 |    "source": [
 50 |     "node_id = np.arange(num_nodes)\n",
 51 |     "ndata = np.random.randint(0, 10, num_nodes * 2).reshape(num_nodes, 2)\n",
 52 |     "node_id = np.expand_dims(node_id, 1)\n",
 53 |     "ndata = np.concatenate([node_id, ndata], 1)\n",
 54 |     "np.savetxt('node_data.csv', ndata, fmt='%d', delimiter=',')"
 55 |    ]
 56 |   },
 57 |   {
 58 |    "cell_type": "markdown",
 59 |    "metadata": {},
 60 |    "source": [
 61 |     "## Construct a heterogeneous graph with two node types"
 62 |    ]
 63 |   },
 64 |   {
 65 |    "cell_type": "code",
 66 |    "execution_count": null,
 67 |    "metadata": {},
 68 |    "outputs": [],
 69 |    "source": [
 70 |     "num_nodes = 1000\n",
 71 |     "num_nodes1 = 100\n",
 72 |     "num_edges = 3000\n",
 73 |     "\n",
 74 |     "edata = []\n",
 75 |     "for i in range(2):\n",
 76 |     "    src = np.random.randint(0, num_nodes, num_edges)\n",
 77 |     "    dst = np.random.randint(0, num_nodes, num_edges)\n",
 78 |     "    etype = np.ones(num_edges) * i\n",
 79 |     "    src = np.expand_dims(src, 1)\n",
 80 |     "    dst = np.expand_dims(dst, 1)\n",
 81 |     "    etype = np.expand_dims(etype, 1)\n",
 82 |     "    edges = np.concatenate([src, dst, etype], 1)\n",
 83 |     "    edata.append(edges)\n",
 84 |     "    \n",
 85 |     "src = np.random.randint(0, num_nodes, num_edges)\n",
 86 |     "dst = np.random.randint(0, num_nodes1, num_edges)\n",
 87 |     "etype = np.ones(num_edges) * 2\n",
 88 |     "src = np.expand_dims(src, 1)\n",
 89 |     "dst = np.expand_dims(dst, 1)\n",
 90 |     "etype = np.expand_dims(etype, 1)\n",
 91 |     "edges = np.concatenate([src, dst, etype], 1)\n",
 92 |     "edata.append(edges)\n",
 93 |     "\n",
 94 |     "edata = np.concatenate(edata, 0)\n",
 95 |     "np.savetxt('edges1.csv', edata, fmt='%d', delimiter=',')"
 96 |    ]
 97 |   },
 98 |   {
 99 |    "cell_type": "code",
100 |    "execution_count": null,
101 |    "metadata": {},
102 |    "outputs": [],
103 |    "source": [
104 |     "node_id = np.arange(num_nodes1)\n",
105 |     "ndata = np.random.randint(0, 10, num_nodes1 * 3).reshape(num_nodes1, 3)\n",
106 |     "node_id = np.expand_dims(node_id, 1)\n",
107 |     "ndata = np.concatenate([node_id, ndata], 1)\n",
108 |     "np.savetxt('node_data1.csv', ndata, fmt='%d', delimiter=',')"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "code",
113 |    "execution_count": null,
114 |    "metadata": {},
115 |    "outputs": [],
116 |    "source": []
117 |   }
118 |  ],
119 |  "metadata": {
120 |   "kernelspec": {
121 |    "display_name": "Python 3",
122 |    "language": "python",
123 |    "name": "python3"
124 |   },
125 |   "language_info": {
126 |    "codemirror_mode": {
127 |     "name": "ipython",
128 |     "version": 3
129 |    },
130 |    "file_extension": ".py",
131 |    "mimetype": "text/x-python",
132 |    "name": "python",
133 |    "nbconvert_exporter": "python",
134 |    "pygments_lexer": "ipython3",
135 |    "version": "3.7.5"
136 |   }
137 |  },
138 |  "nbformat": 4,
139 |  "nbformat_minor": 2
140 | }
141 | 


--------------------------------------------------------------------------------
/applications/assets/example_bipartite.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/applications/assets/example_bipartite.png


--------------------------------------------------------------------------------
/applications/assets/example_bipartite_train.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/applications/assets/example_bipartite_train.png


--------------------------------------------------------------------------------
/applications/assets/example_bipartite_train_removed.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/applications/assets/example_bipartite_train_removed.png


--------------------------------------------------------------------------------
/applications/document.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/applications/document.pdf


--------------------------------------------------------------------------------
/applications/fraud.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/applications/fraud.ipynb


--------------------------------------------------------------------------------
/applications/u.item:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/applications/u.item


--------------------------------------------------------------------------------
/asset/dgl-mp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/dgl-mp.png


--------------------------------------------------------------------------------
/asset/dgl-query.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/dgl-query.png


--------------------------------------------------------------------------------
/asset/dgl_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/dgl_logo.png


--------------------------------------------------------------------------------
/asset/enzymes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/enzymes.png


--------------------------------------------------------------------------------
/asset/gnn_ep0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/gnn_ep0.png


--------------------------------------------------------------------------------
/asset/gnn_ep_anime.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/gnn_ep_anime.gif


--------------------------------------------------------------------------------
/asset/karat_club.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/karat_club.png


--------------------------------------------------------------------------------
/asset/sagemaker.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/sagemaker.pdf


--------------------------------------------------------------------------------
/asset/sagemaker.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/asset/sagemaker.pptx


--------------------------------------------------------------------------------
/basic_tasks/1_load_data.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Load and Process Graph Data\n",
  8 |     "\n",
  9 |     "In this session, you will learn how to:\n",
 10 |     "\n",
 11 |     "* Load graph data stored in CSV files.\n",
 12 |     "* Construct a graph in DGL.\n",
 13 |     "* Query structural information of a DGL graph.\n",
 14 |     "* Load and pre-process node and edge features.\n"
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "markdown",
 19 |    "metadata": {},
 20 |    "source": [
 21 |     "## Load graph data from CSV"
 22 |    ]
 23 |   },
 24 |   {
 25 |    "cell_type": "markdown",
 26 |    "metadata": {},
 27 |    "source": [
 28 |     "Comma-Separated Values (CSV) is a widely used format for storing relational data. In this tutorial, we have prepared two CSV files that store [Zachary's Karate Club network](https://en.wikipedia.org/wiki/Zachary%27s_karate_club).\n",
 29 |     "* `nodes.csv` stores every club member and their attributes.\n",
 30 |     "* `edges.csv` stores the pairwise interactions between club members."
 31 |    ]
 32 |   },
 33 |   {
 34 |    "cell_type": "code",
 35 |    "execution_count": null,
 36 |    "metadata": {},
 37 |    "outputs": [],
 38 |    "source": [
 39 |     "!ls -lh 'data'"
 40 |    ]
 41 |   },
 42 |   {
 43 |    "cell_type": "markdown",
 44 |    "metadata": {},
 45 |    "source": [
 46 |     "We can use `pandas` to load the two CSV files."
 47 |    ]
 48 |   },
 49 |   {
 50 |    "cell_type": "code",
 51 |    "execution_count": null,
 52 |    "metadata": {},
 53 |    "outputs": [],
 54 |    "source": [
 55 |     "import pandas as pd\n",
 56 |     "\n",
 57 |     "nodes_data = pd.read_csv('data/nodes.csv')\n",
 58 |     "print(nodes_data)"
 59 |    ]
 60 |   },
 61 |   {
 62 |    "cell_type": "code",
 63 |    "execution_count": null,
 64 |    "metadata": {},
 65 |    "outputs": [],
 66 |    "source": [
 67 |     "edges_data = pd.read_csv('data/edges.csv')\n",
 68 |     "print(edges_data)"
 69 |    ]
 70 |   },
 71 |   {
 72 |    "cell_type": "markdown",
 73 |    "metadata": {},
 74 |    "source": [
 75 |     "We then construct a graph where each node is a club member and each edge represents an interaction between two members. In DGL, **nodes are consecutive integers starting from zero**. Thus, when preparing the data, it is important to re-label or reorder the rows so that the first row corresponds to the first node, the second row to the second node, and so on.\n",
 76 |     "\n",
 77 |     "In this example, we have already prepared the data in the correct order, so we can create the graph from the `'Src'` and `'Dst'` columns of the `edges.csv` table."
 78 |    ]
 79 |   },
 80 |   {
 81 |    "cell_type": "code",
 82 |    "execution_count": null,
 83 |    "metadata": {},
 84 |    "outputs": [],
 85 |    "source": [
 86 |     "import dgl\n",
 87 |     "\n",
 88 |     "src = edges_data['Src'].to_numpy()\n",
 89 |     "dst = edges_data['Dst'].to_numpy()\n",
 90 |     "\n",
 91 |     "# Create a DGL graph from a pair of numpy arrays\n",
 92 |     "g = dgl.graph((src, dst))\n",
 93 |     "\n",
 94 |     "# Printing a graph shows some meta information, such as the number of nodes and edges.\n",
 95 |     "print(g)"
 96 |    ]
 97 |   },
 98 |   {
 99 |    "cell_type": "markdown",
100 |    "metadata": {},
101 |    "source": [
102 |     "A DGL graph can be converted to a `networkx` graph to take advantage of its rich functionality, such as visualization."
103 |    ]
104 |   },
105 |   {
106 |    "cell_type": "code",
107 |    "execution_count": null,
108 |    "metadata": {},
109 |    "outputs": [],
110 |    "source": [
111 |     "import networkx as nx\n",
112 |     "# Since the actual graph is undirected, we convert it for visualization\n",
113 |     "# purposes.\n",
114 |     "nx_g = g.to_networkx().to_undirected()\n",
115 |     "# The Kamada-Kawai layout usually looks good for arbitrary graphs\n",
116 |     "pos = nx.kamada_kawai_layout(nx_g)\n",
117 |     "nx.draw(nx_g, pos, with_labels=True, node_color=[[.7, .7, .7]])"
118 |    ]
119 |   },
120 |   {
121 |    "cell_type": "markdown",
122 |    "metadata": {},
123 |    "source": [
124 |     "## Query graph structures"
125 |    ]
126 |   },
127 |   {
128 |    "cell_type": "markdown",
129 |    "metadata": {},
130 |    "source": [
131 |     "Let's print out how many nodes and edges there are in this graph."
132 |    ]
133 |   },
134 |   {
135 |    "cell_type": "code",
136 |    "execution_count": null,
137 |    "metadata": {},
138 |    "outputs": [],
139 |    "source": [
140 |     "print('#Nodes', g.number_of_nodes())\n",
141 |     "print('#Edges', g.number_of_edges())"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "markdown",
146 |    "metadata": {},
147 |    "source": [
148 |     "We can also perform queries on the graph structures."
149 |    ]
150 |   },
151 |   {
152 |    "cell_type": "markdown",
153 |    "metadata": {},
154 |    "source": [
155 |     "Get the in-degree of node 0:"
156 |    ]
157 |   },
158 |   {
159 |    "cell_type": "code",
160 |    "execution_count": null,
161 |    "metadata": {},
162 |    "outputs": [],
163 |    "source": [
164 |     "g.in_degree(0)"
165 |    ]
166 |   },
167 |   {
168 |    "cell_type": "markdown",
169 |    "metadata": {},
170 |    "source": [
171 |     "Get the successors of node 0:"
172 |    ]
173 |   },
174 |   {
175 |    "cell_type": "code",
176 |    "execution_count": null,
177 |    "metadata": {},
178 |    "outputs": [],
179 |    "source": [
180 |     "g.successors(0)"
181 |    ]
182 |   },
183 |   {
184 |    "cell_type": "markdown",
185 |    "metadata": {},
186 |    "source": [
187 |     "DGL provides APIs for querying structural information. See the [API documentation](https://docs.dgl.ai/api/python/heterograph.html#querying-graph-structure)."
188 |    ]
189 |   },
190 |   {
191 |    "cell_type": "markdown",
192 |    "metadata": {},
193 |    "source": [
194 |     "![image.png](../asset/dgl-query.png)"
195 |    ]
196 |   },
197 |   {
198 |    "cell_type": "markdown",
199 |    "metadata": {},
200 |    "source": [
201 |     "## Load node and edge features"
202 |    ]
203 |   },
204 |   {
205 |    "cell_type": "markdown",
206 |    "metadata": {},
207 |    "source": [
208 |     "Nodes and edges in many graph datasets have attributes. Although these attributes can be of arbitrary types, a DGL graph only accepts attributes stored in tensors (with numerical contents). Deep learning offers many ways to vectorize various types of attributes into numerical features. Here are some general suggestions:\n",
209 |     "* For categorical attributes (e.g. gender, occupation), consider converting them to integers or a one-hot encoding.\n",
210 |     "* For variable-length text (e.g. news articles, quotes), consider applying a language model.\n",
211 |     "* For images, consider applying a vision model such as CNNs.\n",
212 |     "\n",
213 |     "Our data set has the following attribute columns:\n",
214 |     "* `Age` is already an integer attribute.\n",
215 |     "* `Club` is a categorical attribute representing which community each member belongs to.\n",
216 |     "* `Weight` is a floating-point number indicating the strength of each interaction."
217 |    ]
218 |   },
219 |   {
220 |    "cell_type": "code",
221 |    "execution_count": null,
222 |    "metadata": {},
223 |    "outputs": [],
224 |    "source": [
225 |     "import torch\n",
226 |     "import torch.nn.functional as F\n",
227 |     "\n",
228 |     "# Prepare the age node feature\n",
229 |     "age = torch.tensor(nodes_data['Age'].to_numpy()).float() / 100\n",
230 |     "print(age)"
231 |    ]
232 |   },
233 |   {
234 |    "cell_type": "code",
235 |    "execution_count": null,
236 |    "metadata": {},
237 |    "outputs": [],
238 |    "source": [
239 |     "# Get the features of nodes 0 and 10\n",
240 |     "age[[0, 10]]"
241 |    ]
242 |   },
243 |   {
244 |    "cell_type": "markdown",
245 |    "metadata": {},
246 |    "source": [
247 |     "Use `g.ndata` to attach the age features to the graph."
248 |    ]
249 |   },
250 |   {
251 |    "cell_type": "code",
252 |    "execution_count": null,
253 |    "metadata": {},
254 |    "outputs": [],
255 |    "source": [
256 |     "# Feed the features to the graph\n",
257 |     "g.ndata['age'] = age\n",
258 |     "print(g)"
259 |    ]
260 |   },
261 |   {
262 |    "cell_type": "code",
263 |    "execution_count": null,
264 |    "metadata": {},
265 |    "outputs": [],
266 |    "source": [
267 |     "# The \"Club\" column represents which community each node belongs to.\n",
268 |     "# The values are strings, so we must convert them to either categorical\n",
269 |     "# integer values or a one-hot encoding.\n",
270 |     "\n",
271 |     "club = nodes_data['Club'].to_list()\n",
272 |     "# Convert to categorical integer values with 0 for 'Mr. Hi', 1 for 'Officer'.\n",
273 |     "club = torch.tensor([c == 'Officer' for c in club]).long()\n",
274 |     "# We can also convert it to one-hot encoding.\n",
275 |     "club_onehot = F.one_hot(club)\n",
276 |     "print(club_onehot)\n",
277 |     "\n",
278 |     "# Use `g.ndata` like a normal dictionary\n",
279 |     "g.ndata.update({'club' : club, 'club_onehot' : club_onehot})\n",
280 |     "# Remove some features using del\n",
281 |     "del g.ndata['age']\n",
282 |     "\n",
283 |     "print(g)"
284 |    ]
285 |   },
286 |   {
287 |    "cell_type": "markdown",
288 |    "metadata": {},
289 |    "source": [
290 |     "Feeding edge features to a DGL graph is similar."
291 |    ]
292 |   },
293 |   {
294 |    "cell_type": "code",
295 |    "execution_count": null,
296 |    "metadata": {},
297 |    "outputs": [],
298 |    "source": [
299 |     "# Get edge features from the DataFrame and feed them to the graph.\n",
300 |     "edge_weight = torch.tensor(edges_data['Weight'].to_numpy())\n",
301 |     "# Similarly, use `g.edata` for getting/setting edge features.\n",
302 |     "g.edata['weight'] = edge_weight\n",
303 |     "print(g)"
304 |    ]
305 |   },
306 |   {
307 |    "cell_type": "code",
308 |    "execution_count": null,
309 |    "metadata": {},
310 |    "outputs": [],
311 |    "source": []
312 |   }
313 |  ],
314 |  "metadata": {
315 |   "kernelspec": {
316 |    "display_name": "Python 3",
317 |    "language": "python",
318 |    "name": "python3"
319 |   },
320 |   "language_info": {
321 |    "codemirror_mode": {
322 |     "name": "ipython",
323 |     "version": 3
324 |    },
325 |    "file_extension": ".py",
326 |    "mimetype": "text/x-python",
327 |    "name": "python",
328 |    "nbconvert_exporter": "python",
329 |    "pygments_lexer": "ipython3",
330 |    "version": "3.7.5"
331 |   }
332 |  },
333 |  "nbformat": 4,
334 |  "nbformat_minor": 4
335 | }
336 | 
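
The notebook above points out that DGL expects node IDs to be consecutive integers starting from zero. When the raw data uses arbitrary identifiers, one possible way to relabel them is sketched below with `pandas.factorize`. The column names `Src` and `Dst` follow the tutorial's `edges.csv`; the string IDs in the toy table are made up for illustration and are not part of the tutorial data.

```python
import pandas as pd

# Hypothetical edge list whose endpoints are arbitrary string IDs.
raw_edges = pd.DataFrame({
    'Src': ['alice', 'bob', 'carol', 'alice'],
    'Dst': ['bob', 'carol', 'alice', 'carol'],
})

# Factorize the concatenation of both columns so that a given ID maps to
# the same integer whether it appears as a source or as a destination.
codes, uniques = pd.factorize(pd.concat([raw_edges['Src'], raw_edges['Dst']]))
num_edges = len(raw_edges)
src = codes[:num_edges]
dst = codes[num_edges:]

print(src, dst)       # integer endpoints in the range [0, len(uniques))
print(list(uniques))  # position i holds the original ID of node i
```

The resulting `src` and `dst` arrays could then be passed to `dgl.graph((src, dst))` as in the notebook, with the rows of any node feature table reordered to match `uniques`.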


--------------------------------------------------------------------------------
/basic_tasks/2_gnn.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Semi-supervised Community Detection using Graph Neural Networks\n",
  8 |     "\n",
  9 |     "Almost every Computer Science 101 class starts with a \"Hello World\" example. Just as deep learning has MNIST, the graph domain has Zachary's Karate Club problem. The karate club is a social network of 34 members that documents pairwise links between members who interact outside the club. The club later splits into two communities led by the instructor (node 0) and the club president (node 33). The network is visualized below, with colors indicating the community.\n",
 10 |     "<img src='../asset/karat_club.png' align='center' width=\"400px\" height=\"300px\" />\n",
 11 |     "\n",
 12 |     "In this tutorial, you will learn how to:\n",
 13 |     "\n",
 14 |     "* Formulate the community detection problem as a semi-supervised node classification task.\n",
 15 |     "* Build a GraphSAGE model, a popular Graph Neural Network architecture proposed by [Hamilton et al.](https://arxiv.org/abs/1706.02216)\n",
 16 |     "* Train the model and understand the result."
 17 |    ]
 18 |   },
 19 |   {
 20 |    "cell_type": "code",
 21 |    "execution_count": null,
 22 |    "metadata": {},
 23 |    "outputs": [],
 24 |    "source": [
 25 |     "import dgl\n",
 26 |     "import torch\n",
 27 |     "import torch.nn as nn\n",
 28 |     "import torch.nn.functional as F\n",
 29 |     "import itertools"
 30 |    ]
 31 |   },
 32 |   {
 33 |    "cell_type": "markdown",
 34 |    "metadata": {},
 35 |    "source": [
 36 |     "## Community detection as node classification\n",
 37 |     "\n",
 38 |     "The study of community structure in graphs has a long history. Many proposed methods are *unsupervised* (or *self-supervised* by recent definition), where the model predicts the community labels from connectivity alone. More recently, [Kipf and Welling](https://arxiv.org/abs/1609.02907) proposed formulating community detection as a semi-supervised node classification task. With the help of only a small portion of labeled nodes, a GNN can accurately predict the community labels of the others.\n",
 39 |     "\n",
 40 |     "In this tutorial, we apply Kipf and Welling's setting to Zachary's Karate Club network to predict community membership, using the labels of only a few nodes."
 41 |    ]
 42 |   },
 43 |   {
 44 |    "cell_type": "markdown",
 45 |    "metadata": {},
 46 |    "source": [
 47 |     "We first load the graph and node labels as covered in the [last session](./1_load_data.ipynb). Here, we provide a function for loading the data."
 48 |    ]
 49 |   },
 50 |   {
 51 |    "cell_type": "code",
 52 |    "execution_count": null,
 53 |    "metadata": {},
 54 |    "outputs": [],
 55 |    "source": [
 56 |     "from tutorial_utils import load_zachery\n",
 57 |     "\n",
 58 |     "# ----------- 0. load graph -------------- #\n",
 59 |     "g = load_zachery()\n",
 60 |     "print(g)"
 61 |    ]
 62 |   },
 63 |   {
 64 |    "cell_type": "markdown",
 65 |    "metadata": {},
 66 |    "source": [
 67 |     "In the original Zachary's Karate Club graph, nodes are featureless. (The `'Age'` attribute is an artificial one added mainly for tutorial purposes.) For a featureless graph, a common practice is to give every node a learnable embedding that is updated during training.\n",
 68 |     "\n",
 69 |     "We can use PyTorch's `Embedding` module to achieve this."
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "code",
 74 |    "execution_count": null,
 75 |    "metadata": {},
 76 |    "outputs": [],
 77 |    "source": [
 78 |     "# ----------- 1. node features -------------- #\n",
 79 |     "node_embed = nn.Embedding(g.number_of_nodes(), 5)  # Every node has an embedding of size 5.\n",
 80 |     "inputs = node_embed.weight                         # Use the embedding weight as the node features.\n",
 81 |     "nn.init.xavier_uniform_(inputs)\n",
 82 |     "print(inputs)"
 83 |    ]
 84 |   },
 85 |   {
 86 |    "cell_type": "markdown",
 87 |    "metadata": {},
 88 |    "source": [
 89 |     "The community label is stored in the `'club'` node feature (0 for instructor, 1 for club president). Only nodes 0 and 33 are labeled."
 90 |    ]
 91 |   },
 92 |   {
 93 |    "cell_type": "code",
 94 |    "execution_count": null,
 95 |    "metadata": {},
 96 |    "outputs": [],
 97 |    "source": [
 98 |     "labels = g.ndata['club']\n",
 99 |     "labeled_nodes = [0, 33]\n",
100 |     "print('Labels', labels[labeled_nodes])"
101 |    ]
102 |   },
103 |   {
104 |    "cell_type": "markdown",
105 |    "metadata": {},
106 |    "source": [
107 |     "## Define a GraphSAGE model\n",
108 |     "\n",
109 |     "Our model consists of two layers, each of which computes new node representations by aggregating neighbor information. The equations are:\n",
110 |     "\n",
111 |     "$$\n",
112 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
113 |     "$$\n",
114 |     "\n",
115 |     "$$\n",
116 |     "h_v^k\\leftarrow \\sigma\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
117 |     "$$\n",
118 |     "\n",
119 |     "DGL provides implementations of many popular neighbor aggregation modules, each of which can be invoked with one line of code. See the full list of supported [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)."
120 |    ]
121 |   },
122 |   {
123 |    "cell_type": "code",
124 |    "execution_count": null,
125 |    "metadata": {},
126 |    "outputs": [],
127 |    "source": [
128 |     "from dgl.nn import SAGEConv\n",
129 |     "\n",
130 |     "# ----------- 2. create model -------------- #\n",
131 |     "# build a two-layer GraphSAGE model\n",
132 |     "class GraphSAGE(nn.Module):\n",
133 |     "    def __init__(self, in_feats, h_feats, num_classes):\n",
134 |     "        super(GraphSAGE, self).__init__()\n",
135 |     "        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')\n",
136 |     "        self.conv2 = SAGEConv(h_feats, num_classes, 'mean')\n",
137 |     "    \n",
138 |     "    def forward(self, g, in_feat):\n",
139 |     "        h = self.conv1(g, in_feat)\n",
140 |     "        h = F.relu(h)\n",
141 |     "        h = self.conv2(g, h)\n",
142 |     "        return h\n",
143 |     "    \n",
144 |     "# Create the model with given dimensions \n",
145 |     "# input layer dimension: 5, node embeddings\n",
146 |     "# hidden layer dimension: 16\n",
147 |     "# output layer dimension: 2, the two classes, 0 and 1\n",
148 |     "net = GraphSAGE(5, 16, 2)"
149 |    ]
150 |   },
151 |   {
152 |    "cell_type": "code",
153 |    "execution_count": null,
154 |    "metadata": {},
155 |    "outputs": [],
156 |    "source": [
157 |     "# ----------- 3. set up loss and optimizer -------------- #\n",
158 |     "# in this case, the loss is computed inside the training loop\n",
159 |     "optimizer = torch.optim.Adam(itertools.chain(net.parameters(), node_embed.parameters()), lr=0.01)\n",
160 |     "\n",
161 |     "# ----------- 4. training -------------------------------- #\n",
162 |     "all_logits = []\n",
163 |     "for e in range(100):\n",
164 |     "    # forward\n",
165 |     "    logits = net(g, inputs)\n",
166 |     "    \n",
167 |     "    # compute loss\n",
168 |     "    logp = F.log_softmax(logits, 1)\n",
169 |     "    loss = F.nll_loss(logp[labeled_nodes], labels[labeled_nodes])\n",
170 |     "    \n",
171 |     "    # backward\n",
172 |     "    optimizer.zero_grad()\n",
173 |     "    loss.backward()\n",
174 |     "    optimizer.step()\n",
175 |     "    all_logits.append(logits.detach())\n",
176 |     "    \n",
177 |     "    if e % 5 == 0:\n",
178 |     "        print('In epoch {}, loss: {}'.format(e, loss))"
179 |    ]
180 |   },
181 |   {
182 |    "cell_type": "code",
183 |    "execution_count": null,
184 |    "metadata": {},
185 |    "outputs": [],
186 |    "source": [
187 |     "# ----------- 5. check results ------------------------ #\n",
188 |     "pred = torch.argmax(logits, dim=1)\n",
189 |     "print('Accuracy', (pred == labels).sum().item() / len(pred))"
190 |    ]
191 |   },
192 |   {
193 |    "cell_type": "markdown",
194 |    "metadata": {},
195 |    "source": [
196 |     "## Visualize the result\n",
197 |     "\n",
198 |     "Since the GNN produces a logit vector of size 2 for each node, we can plot the nodes on a 2-D plane.\n",
199 |     "\n",
200 |     "<img src='../asset/gnn_ep0.png' align='center' width=\"400px\" height=\"300px\"/>\n",
201 |     "<img src='../asset/gnn_ep_anime.gif' align='center' width=\"400px\" height=\"300px\"/>"
202 |    ]
203 |   },
204 |   {
205 |    "cell_type": "markdown",
206 |    "metadata": {},
207 |    "source": [
208 |     "Run the following code to visualize the result. This requires ffmpeg."
209 |    ]
210 |   },
211 |   {
212 |    "cell_type": "code",
213 |    "execution_count": null,
214 |    "metadata": {},
215 |    "outputs": [],
216 |    "source": [
217 |     "# A bit of setup, just ignore this cell\n",
218 |     "import matplotlib.pyplot as plt\n",
219 |     "\n",
220 |     "# for auto-reloading external modules\n",
221 |     "%load_ext autoreload\n",
222 |     "%autoreload 2\n",
223 |     "\n",
224 |     "%matplotlib inline\n",
225 |     "plt.rcParams['figure.figsize'] = (4.0, 3.0) # set default size of plots\n",
226 |     "plt.rcParams['image.interpolation'] = 'nearest'\n",
227 |     "plt.rcParams['image.cmap'] = 'gray'\n",
228 |     "plt.rcParams['animation.html'] = 'html5'"
229 |    ]
230 |   },
231 |   {
232 |    "cell_type": "code",
233 |    "execution_count": null,
234 |    "metadata": {},
235 |    "outputs": [],
236 |    "source": [
237 |     "# Visualize the node classification using the logits output. Requires ffmpeg.\n",
238 |     "import networkx as nx\n",
239 |     "import numpy as np\n",
240 |     "import matplotlib.animation as animation\n",
241 |     "from IPython.display import HTML\n",
242 |     "\n",
243 |     "fig = plt.figure(dpi=150)\n",
244 |     "fig.clf()\n",
245 |     "ax = fig.subplots()\n",
246 |     "nx_G = g.to_networkx()\n",
247 |     "def draw(i):\n",
248 |     "    cls1color = '#00FFFF'\n",
249 |     "    cls2color = '#FF00FF'\n",
250 |     "    pos = {}\n",
251 |     "    colors = []\n",
252 |     "    pred = all_logits[i].numpy()  # logits of all nodes at epoch i\n",
253 |     "    for v in range(34):\n",
254 |     "        pos[v] = pred[v]\n",
255 |     "        cls = labels[v]\n",
256 |     "        colors.append(cls1color if cls else cls2color)\n",
257 |     "    ax.cla()\n",
258 |     "    ax.axis('off')\n",
259 |     "    ax.set_title('Epoch: %d' % i)\n",
260 |     "    nx.draw(nx_G.to_undirected(), pos, node_color=colors, with_labels=True, node_size=200)\n",
261 |     "\n",
262 |     "ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)\n",
263 |     "HTML(ani.to_html5_video())"
264 |    ]
265 |   },
266 |   {
267 |    "cell_type": "markdown",
268 |    "metadata": {},
269 |    "source": [
270 |     "## Exercise\n",
271 |     "\n",
272 |     "Play with the GNN models by using other [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)."
273 |    ]
274 |   },
275 |   {
276 |    "cell_type": "code",
277 |    "execution_count": null,
278 |    "metadata": {},
279 |    "outputs": [],
280 |    "source": []
281 |   }
282 |  ],
283 |  "metadata": {
284 |   "kernelspec": {
285 |    "display_name": "Python 3",
286 |    "language": "python",
287 |    "name": "python3"
288 |   },
289 |   "language_info": {
290 |    "codemirror_mode": {
291 |     "name": "ipython",
292 |     "version": 3
293 |    },
294 |    "file_extension": ".py",
295 |    "mimetype": "text/x-python",
296 |    "name": "python",
297 |    "nbconvert_exporter": "python",
298 |    "pygments_lexer": "ipython3",
299 |    "version": "3.7.5"
300 |   }
301 |  },
302 |  "nbformat": 4,
303 |  "nbformat_minor": 4
304 | }
305 | 
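
The two GraphSAGE equations in the notebook above (aggregate the neighbors, then concatenate, apply a linear map, and apply a nonlinearity) can be traced by hand on a tiny graph. The NumPy sketch below assumes a mean aggregator and ReLU as the nonlinearity; the function name `sage_mean_layer`, the dense adjacency matrix, and the toy inputs are illustrative choices, not DGL's `SAGEConv` implementation.

```python
import numpy as np

def sage_mean_layer(adj, h, weight):
    """One GraphSAGE-style layer with a mean aggregator and ReLU.

    adj:    (N, N) 0/1 matrix, adj[v, u] = 1 if u is a neighbor of v.
    h:      (N, D) input node features.
    weight: (2 * D, D_out) matrix applied to CONCAT(h_v, h_N(v)).
    """
    deg = adj.sum(axis=1, keepdims=True)
    # AGGREGATE: mean over neighbors; isolated nodes get a zero vector.
    h_neigh = (adj @ h) / np.maximum(deg, 1)
    # CONCAT, linear map, then ReLU.
    return np.maximum(np.concatenate([h, h_neigh], axis=1) @ weight, 0)

# Toy path graph 0 - 1 - 2 with 2-dimensional node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
rng = np.random.default_rng(0)
weight = rng.standard_normal((4, 3))

out = sage_mean_layer(adj, h, weight)
print(out.shape)  # one 3-dimensional representation per node
```

Stacking two such layers, with the second producing one output per class, mirrors the shape of the `GraphSAGE` module defined in the notebook.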


--------------------------------------------------------------------------------
/basic_tasks/3_link_predict.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Link Prediction using Graph Neural Networks\n",
  8 |     "\n",
  9 |     "GNNs are powerful tools for many machine learning tasks on graphs. This tutorial teaches the basic workflow of using GNNs for link prediction. We again use Zachary's Karate Club graph, but this time try to predict interactions between members.\n",
 10 |     "\n",
 11 |     "In this tutorial, you will learn how to:\n",
 12 |     "* Prepare training and testing sets for a link prediction task.\n",
 13 |     "* Build a GNN-based link prediction model.\n",
 14 |     "* Train the model and verify the result."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": null,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "import dgl\n",
 24 |     "import torch\n",
 25 |     "import torch.nn as nn\n",
 26 |     "import torch.nn.functional as F\n",
 27 |     "import itertools\n",
 28 |     "import numpy as np\n",
 29 |     "import scipy.sparse as sp"
 30 |    ]
 31 |   },
 32 |   {
 33 |    "cell_type": "markdown",
 34 |    "metadata": {},
 35 |    "source": [
 36 |     "## Load graph and features\n",
 37 |     "\n",
 38 |     "Following the [last session](./2_gnn.ipynb), we first load Zachary's Karate Club graph and create node embeddings."
 39 |    ]
 40 |   },
 41 |   {
 42 |    "cell_type": "code",
 43 |    "execution_count": null,
 44 |    "metadata": {},
 45 |    "outputs": [],
 46 |    "source": [
 47 |     "from tutorial_utils import load_zachery\n",
 48 |     "\n",
 49 |     "# ----------- 0. load graph -------------- #\n",
 50 |     "g = load_zachery()\n",
 51 |     "print(g)\n",
 52 |     "\n",
 53 |     "# ----------- 1. node features -------------- #\n",
 54 |     "node_embed = nn.Embedding(g.number_of_nodes(), 5)  # Every node has an embedding of size 5.\n",
 55 |     "inputs = node_embed.weight                         # Use the embedding weight as the node features.\n",
 56 |     "nn.init.xavier_uniform_(inputs)"
 57 |    ]
 58 |   },
 59 |   {
 60 |    "cell_type": "markdown",
 61 |    "metadata": {},
 62 |    "source": [
 63 |     "## Prepare training and testing sets"
 64 |    ]
 65 |   },
 66 |   {
 67 |    "cell_type": "markdown",
 68 |    "metadata": {},
 69 |    "source": [
 70 |     "In general, a link prediction data set contains two types of edges, *positive* and *negative edges*. Positive edges are usually drawn from the existing edges in the graph. In this example, we randomly pick 50 edges for testing and leave the rest for training."
 71 |    ]
 72 |   },
 73 |   {
 74 |    "cell_type": "code",
 75 |    "execution_count": null,
 76 |    "metadata": {},
 77 |    "outputs": [],
 78 |    "source": [
 79 |     "# Split edge set for training and testing\n",
 80 |     "u, v = g.edges()\n",
 81 |     "eids = np.arange(g.number_of_edges())\n",
 82 |     "eids = np.random.permutation(eids)\n",
 83 |     "test_pos_u, test_pos_v = u[eids[:50]], v[eids[:50]]\n",
 84 |     "train_pos_u, train_pos_v = u[eids[50:]], v[eids[50:]]"
 85 |    ]
 86 |   },
 87 |   {
 88 |    "cell_type": "markdown",
 89 |    "metadata": {},
 90 |    "source": [
 91 |     "Because the number of negative edges is usually large, they are typically sampled. Choosing a good negative sampling algorithm is a widely studied topic and is outside the scope of this tutorial. Since our example graph is small (only 34 nodes), we enumerate all the missing edges and randomly pick 50 for testing and 150 for training."
 92 |    ]
 93 |   },
 94 |   {
 95 |    "cell_type": "code",
 96 |    "execution_count": null,
 97 |    "metadata": {},
 98 |    "outputs": [],
 99 |    "source": [
100 |     "# Find all negative edges and split them for training and testing\n",
101 |     "adj = sp.coo_matrix((np.ones(len(u)), (u.numpy(), v.numpy())))\n",
102 |     "adj_neg = 1 - adj.todense() - np.eye(34)\n",
103 |     "neg_u, neg_v = np.where(adj_neg != 0)\n",
104 |     "neg_eids = np.random.choice(len(neg_u), 200, replace=False)\n",
105 |     "test_neg_u, test_neg_v = neg_u[neg_eids[:50]], neg_v[neg_eids[:50]]\n",
106 |     "train_neg_u, train_neg_v = neg_u[neg_eids[50:]], neg_v[neg_eids[50:]]"
107 |    ]
108 |   },
109 |   {
110 |    "cell_type": "markdown",
111 |    "metadata": {},
112 |    "source": [
113 |     "Put positive and negative edges together and form training and testing sets."
114 |    ]
115 |   },
116 |   {
117 |    "cell_type": "code",
118 |    "execution_count": null,
119 |    "metadata": {},
120 |    "outputs": [],
121 |    "source": [
122 |     "# Create training set (label 0 = positive edge, label 1 = negative edge).\n",
123 |     "train_u = torch.cat([torch.as_tensor(train_pos_u), torch.as_tensor(train_neg_u)])\n",
124 |     "train_v = torch.cat([torch.as_tensor(train_pos_v), torch.as_tensor(train_neg_v)])\n",
125 |     "train_label = torch.cat([torch.zeros(len(train_pos_u)), torch.ones(len(train_neg_u))])\n",
126 |     "\n",
127 |     "# Create testing set (label 0 = positive edge, label 1 = negative edge).\n",
128 |     "test_u = torch.cat([torch.as_tensor(test_pos_u), torch.as_tensor(test_neg_u)])\n",
129 |     "test_v = torch.cat([torch.as_tensor(test_pos_v), torch.as_tensor(test_neg_v)])\n",
130 |     "test_label = torch.cat([torch.zeros(len(test_pos_u)), torch.ones(len(test_neg_u))])"
131 |    ]
132 |   },
133 |   {
134 |    "cell_type": "markdown",
135 |    "metadata": {},
136 |    "source": [
137 |     "## Define a GraphSAGE model\n",
138 |     "\n",
139 |     "Our model consists of two layers, each of which computes new node representations by aggregating neighbor information. The equations are:\n",
140 |     "\n",
141 |     "$$\n",
142 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
143 |     "$$\n",
144 |     "\n",
145 |     "$$\n",
146 |     "h_v^k\\leftarrow \\text{ReLU}\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
147 |     "$$\n",
148 |     "\n",
149 |     "DGL provides implementations of many popular neighbor aggregation modules, each of which can be invoked with a single line of code. See the full list of supported [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)."
150 |    ]
151 |   },
152 |   {
153 |    "cell_type": "code",
154 |    "execution_count": null,
155 |    "metadata": {},
156 |    "outputs": [],
157 |    "source": [
158 |     "from dgl.nn import SAGEConv\n",
159 |     "\n",
160 |     "# ----------- 2. create model -------------- #\n",
161 |     "# build a two-layer GraphSAGE model\n",
162 |     "class GraphSAGE(nn.Module):\n",
163 |     "    def __init__(self, in_feats, h_feats):\n",
164 |     "        super(GraphSAGE, self).__init__()\n",
165 |     "        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')\n",
166 |     "        self.conv2 = SAGEConv(h_feats, h_feats, 'mean')\n",
167 |     "    \n",
168 |     "    def forward(self, g, in_feat):\n",
169 |     "        h = self.conv1(g, in_feat)\n",
170 |     "        h = F.relu(h)\n",
171 |     "        h = self.conv2(g, h)\n",
172 |     "        return h\n",
173 |     "    \n",
174 |     "# Create the model with given dimensions \n",
175 |     "# input layer dimension: 5, node embeddings\n",
176 |     "# hidden layer dimension: 16\n",
177 |     "net = GraphSAGE(5, 16)"
178 |    ]
179 |   },
180 |   {
181 |    "cell_type": "markdown",
182 |    "metadata": {},
183 |    "source": [
184 |     "We then optimize the model using the following loss function.\n",
185 |     "\n",
186 |     "$$\n",
187 |     "\\hat{y}_{u\\sim v} = \\sigma(h_u^T h_v)\n",
188 |     "$$\n",
189 |     "\n",
190 |     "$$\n",
191 |     "\\mathcal{L} = -\\sum_{u\\sim v\\in \\mathcal{D}}\\left( y_{u\\sim v}\\log(\\hat{y}_{u\\sim v}) + (1-y_{u\\sim v})\\log(1-\\hat{y}_{u\\sim v}) \\right)\n",
192 |     "$$\n",
193 |     "\n",
194 |     "Essentially, the model scores each edge by taking the dot product of the representations of its two endpoints. It then computes a binary cross-entropy loss against the target $y$, which in this notebook is 0 for a positive edge and 1 for a negative edge."
195 |    ]
196 |   },
197 |   {
198 |    "cell_type": "code",
199 |    "execution_count": null,
200 |    "metadata": {},
201 |    "outputs": [],
202 |    "source": [
203 |     "# ----------- 3. set up loss and optimizer -------------- #\n",
204 |     "# in this case, the loss is computed inside the training loop\n",
205 |     "optimizer = torch.optim.Adam(itertools.chain(net.parameters(), node_embed.parameters()), lr=0.01)\n",
206 |     "\n",
207 |     "# ----------- 4. training -------------------------------- #\n",
208 |     "all_logits = []\n",
209 |     "for e in range(100):\n",
210 |     "    # forward\n",
211 |     "    logits = net(g, inputs)\n",
212 |     "    pred = torch.sigmoid((logits[train_u] * logits[train_v]).sum(dim=1))\n",
213 |     "    \n",
214 |     "    # compute loss\n",
215 |     "    loss = F.binary_cross_entropy(pred, train_label)\n",
216 |     "    \n",
217 |     "    # backward\n",
218 |     "    optimizer.zero_grad()\n",
219 |     "    loss.backward()\n",
220 |     "    optimizer.step()\n",
221 |     "    all_logits.append(logits.detach())\n",
222 |     "    \n",
223 |     "    if e % 5 == 0:\n",
224 |     "        print('In epoch {}, loss: {}'.format(e, loss))"
225 |    ]
226 |   },
227 |   {
228 |    "cell_type": "code",
229 |    "execution_count": null,
230 |    "metadata": {},
231 |    "outputs": [],
232 |    "source": [
233 |     "# ----------- 5. check results ------------------------ #\n",
234 |     "pred = torch.sigmoid((logits[test_u] * logits[test_v]).sum(dim=1))\n",
235 |     "print('Accuracy', ((pred >= 0.5) == test_label).sum().item() / len(pred))"
236 |    ]
237 |   },
238 |   {
239 |    "cell_type": "code",
240 |    "execution_count": null,
241 |    "metadata": {},
242 |    "outputs": [],
243 |    "source": []
244 |   }
245 |  ],
246 |  "metadata": {
247 |   "kernelspec": {
248 |    "display_name": "Python 3",
249 |    "language": "python",
250 |    "name": "python3"
251 |   },
252 |   "language_info": {
253 |    "codemirror_mode": {
254 |     "name": "ipython",
255 |     "version": 3
256 |    },
257 |    "file_extension": ".py",
258 |    "mimetype": "text/x-python",
259 |    "name": "python",
260 |    "nbconvert_exporter": "python",
261 |    "pygments_lexer": "ipython3",
262 |    "version": "3.7.5"
263 |   }
264 |  },
265 |  "nbformat": 4,
266 |  "nbformat_minor": 4
267 | }
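The negative-edge split in `3_link_predict.ipynb` can also be sketched without NumPy or DGL. This is a minimal, framework-free sketch, not the notebook's code: `split_negative_edges`, its parameters, and the toy edge list are hypothetical names introduced for illustration. It enumerates the ordered node pairs that are absent from the graph and draws the test and training samples without replacement, so the two sets cannot overlap.

```python
import random


def split_negative_edges(num_nodes, pos_edges, num_test, num_train, seed=0):
    """Enumerate absent ordered pairs and draw disjoint test/train samples."""
    pos = set(pos_edges)
    # Every ordered pair (u, v) with u != v that is not an existing edge.
    neg = [(u, v)
           for u in range(num_nodes)
           for v in range(num_nodes)
           if u != v and (u, v) not in pos]
    rng = random.Random(seed)
    # random.sample draws without replacement, so no pair is picked twice.
    picked = rng.sample(neg, num_test + num_train)
    return picked[:num_test], picked[num_test:]


# Toy directed graph with 4 nodes and 4 edges (hypothetical data).
pos_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
test_neg, train_neg = split_negative_edges(4, pos_edges, num_test=2, num_train=4)
print(test_neg, train_neg)
```

Sampling without replacement matters here because a negative edge that appears in both the training and the testing set would leak information into the evaluation.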
268 | 
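The edge score and loss used in `3_link_predict.ipynb` can be written out by hand to see what the training loop computes. The sketch below is plain Python under stated assumptions: `edge_score` and `bce_loss` are hypothetical helper names, the loss is averaged over edges (the default reduction of `F.binary_cross_entropy`, whereas the formula in the notebook is written as a sum), and predictions are assumed to lie strictly between 0 and 1.

```python
import math


def edge_score(h_u, h_v):
    """Sigmoid of the dot product of two node representations."""
    dot = sum(a * b for a, b in zip(h_u, h_v))
    return 1.0 / (1.0 + math.exp(-dot))


def bce_loss(preds, labels):
    """Mean binary cross entropy over (prediction, label) pairs."""
    total = 0.0
    for p, y in zip(preds, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)


# Orthogonal representations have a zero dot product, so the score is 0.5.
p = edge_score([1.0, 0.0], [0.0, 1.0])
# A prediction of 0.5 costs log(2) regardless of the label.
loss = bce_loss([p, p], [0, 1])
print(p, loss)
```

The helpers take whatever labels the caller supplies, so they work with either labeling convention for positive and negative edges.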


--------------------------------------------------------------------------------
/basic_tasks/4_message_passing.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Customize Graph Convolution using Message Passing APIs\n",
  8 |     "\n",
  9 |     "In previous sessions, we learned how to use the built-in [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv) to build a multi-layer graph neural network. Sometimes, however, you may want to invent a new way of aggregating neighbor information. DGL's message passing APIs are designed for this scenario.\n",
 10 |     "\n",
 11 |     "In this tutorial, you will learn:\n",
 12 |     "* What is under the hood of DGL's `SAGEConv` module.\n",
 13 |     "* How to use DGL's message passing APIs.\n",
 14 |     "* How to design a new graph convolution module."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": null,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "import dgl\n",
 24 |     "import torch\n",
 25 |     "import torch.nn as nn\n",
 26 |     "import torch.nn.functional as F"
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "markdown",
 31 |    "metadata": {},
 32 |    "source": [
 33 |     "## A gentle explanation of the `SAGEConv` module\n",
 34 |     "\n",
 35 |     "Recall that a `SAGEConv` module aggregates neighbor information and generates new node representations as follows:\n",
 36 |     "\n",
 37 |     "\n",
 38 |     "$$\n",
 39 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
 40 |     "$$\n",
 41 |     "\n",
 42 |     "$$\n",
 43 |     "h_v^k\\leftarrow \\text{ReLU}\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
 44 |     "$$\n",
 45 |     "\n",
 46 |     "Here is its implementation in DGL."
 47 |    ]
 48 |   },
 49 |   {
 50 |    "cell_type": "code",
 51 |    "execution_count": null,
 52 |    "metadata": {},
 53 |    "outputs": [],
 54 |    "source": [
 55 |     "import dgl.function as fn\n",
 56 |     "\n",
 57 |     "class SAGEConv(nn.Module):\n",
 58 |     "    \"\"\"Graph convolution module used by the GraphSAGE model.\n",
 59 |     "    \n",
 60 |     "    Parameters\n",
 61 |     "    ----------\n",
 62 |     "    in_feat : int\n",
 63 |     "        Input feature size.\n",
 64 |     "    out_feat : int\n",
 65 |     "        Output feature size.\n",
 66 |     "    \"\"\"\n",
 67 |     "    def __init__(self, in_feat, out_feat):\n",
 68 |     "        super(SAGEConv, self).__init__()\n",
 69 |     "        # A linear submodule for projecting the input and neighbor feature to the output.\n",
 70 |     "        self.linear = nn.Linear(in_feat * 2, out_feat)\n",
 71 |     "    \n",
 72 |     "    def forward(self, g, h):\n",
 73 |     "        \"\"\"Forward computation\n",
 74 |     "        \n",
 75 |     "        Parameters\n",
 76 |     "        ----------\n",
 77 |     "        g : Graph\n",
 78 |     "            The input graph.\n",
 79 |     "        h : Tensor\n",
 80 |     "            The input node feature.\n",
 81 |     "        \"\"\"\n",
 82 |     "        # Any `ndata` set within a local scope is discarded automatically on exit.\n",
 83 |     "        with g.local_scope():\n",
 84 |     "            g.ndata['h'] = h\n",
 85 |     "            # update_all is a message passing API.\n",
 86 |     "            g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))\n",
 87 |     "            h_neigh = g.ndata['h_neigh']\n",
 88 |     "            h_total = torch.cat([h, h_neigh], dim=1)\n",
 89 |     "            return F.relu(self.linear(h_total))"
 90 |    ]
 91 |   },
 92 |   {
 93 |    "cell_type": "markdown",
 94 |    "metadata": {},
 95 |    "source": [
 96 |     "The central piece in this code is the `g.update_all` function, which gathers and averages the neighbor features. There are three concepts here:\n",
 97 |     "* Message function `fn.copy_u('h', 'm')` that copies the node feature under name `'h'` as *messages* sent to neighbors.\n",
 98 |     "* Reduce function `fn.mean('m', 'h_neigh')` that averages all the received messages under name `'m'` and saves the result as a new node feature `'h_neigh'`.\n",
 99 |     "* `update_all` tells DGL to trigger the message and reduce functions for all the nodes and edges."
100 |    ]
101 |   },
102 |   {
103 |    "cell_type": "markdown",
104 |    "metadata": {},
105 |    "source": [
106 |     "## Message passing and GNNs\n",
107 |     "\n",
108 |     "`update_all` is one of the **message passing APIs** in DGL, inspired by the Message Passing Neural Network proposed by [Gilmer et al.](https://arxiv.org/abs/1704.01212). Essentially, they found that many GNN models fit into the following framework:\n",
109 |     "\n",
110 |     "$$\n",
111 |     "m_{u\\sim v}^{(l)} = M^{(l)}\\left(h_v^{(l-1)}, h_u^{(l-1)}, e_{u\\sim v}^{(l-1)}\\right)\n",
112 |     "$$\n",
113 |     "\n",
114 |     "$$\n",
115 |     "m_{v}^{(l)} = \\sum_{u\\in\\mathcal{N}(v)}m_{u\\sim v}^{(l)}\n",
116 |     "$$\n",
117 |     "\n",
118 |     "$$\n",
119 |     "h_v^{(l)} = U^{(l)}\\left(h_v^{(l-1)}, m_v^{(l)}\\right)\n",
120 |     "$$\n",
121 |     "\n",
122 |     "where $M^{(l)}$ is called the message function, $\\sum$ is the reduce function, and $U^{(l)}$ is the update function. DGL provides many built-in message and reduce functions in the `dgl.function` package.\n",
123 |     "\n",
124 |     "![api](../asset/dgl-mp.png)\n",
125 |     "\n",
126 |     "You can find more details in [the API doc](https://docs.dgl.ai/api/python/function.html)."
127 |    ]
128 |   },
129 |   {
130 |    "cell_type": "markdown",
131 |    "metadata": {},
132 |    "source": [
133 |     "DGL's message passing APIs allow one to quickly implement new graph convolution modules. For example, the following implements a new `SAGEConv` that scales each neighbor representation by its edge weight before averaging."
134 |    ]
135 |   },
136 |   {
137 |    "cell_type": "code",
138 |    "execution_count": null,
139 |    "metadata": {},
140 |    "outputs": [],
141 |    "source": [
142 |     "class SAGEConv(nn.Module):\n",
143 |     "    \"\"\"Graph convolution module used by the GraphSAGE model.\n",
144 |     "    \n",
145 |     "    Parameters\n",
146 |     "    ----------\n",
147 |     "    in_feat : int\n",
148 |     "        Input feature size.\n",
149 |     "    out_feat : int\n",
150 |     "        Output feature size.\n",
151 |     "    \"\"\"\n",
152 |     "    def __init__(self, in_feat, out_feat):\n",
153 |     "        super(SAGEConv, self).__init__()\n",
154 |     "        # A linear submodule for projecting the input and neighbor feature to the output.\n",
155 |     "        self.linear = nn.Linear(in_feat * 2, out_feat)\n",
156 |     "    \n",
157 |     "    def forward(self, g, h, w):\n",
158 |     "        \"\"\"Forward computation\n",
159 |     "        \n",
160 |     "        Parameters\n",
161 |     "        ----------\n",
162 |     "        g : Graph\n",
163 |     "            The input graph.\n",
164 |     "        h : Tensor\n",
165 |     "            The input node feature.\n",
166 |     "        w : Tensor\n",
167 |     "            The edge weight.\n",
168 |     "        \"\"\"\n",
169 |     "        # Any `ndata` or `edata` set within a local scope is discarded automatically on exit.\n",
170 |     "        with g.local_scope():\n",
171 |     "            g.ndata['h'] = h\n",
172 |     "            g.edata['w'] = w\n",
173 |     "            # update_all is a message passing API.\n",
174 |     "            g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.mean('m', 'h_neigh'))\n",
175 |     "            h_neigh = g.ndata['h_neigh']\n",
176 |     "            h_total = torch.cat([h, h_neigh], dim=1)\n",
177 |     "            return F.relu(self.linear(h_total))"
178 |    ]
179 |   },
180 |   {
181 |    "cell_type": "markdown",
182 |    "metadata": {},
183 |    "source": [
184 |     "## Even more customization by user-defined function\n",
185 |     "\n",
186 |     "DGL allows user-defined message and reduce functions for maximal expressiveness. Here is a user-defined message function that is equivalent to `fn.u_mul_e('h', 'w', 'm')`."
187 |    ]
188 |   },
189 |   {
190 |    "cell_type": "code",
191 |    "execution_count": null,
192 |    "metadata": {},
193 |    "outputs": [],
194 |    "source": [
195 |     "def u_mul_e_udf(edges):\n",
196 |     "    return {'m' : edges.src['h'] * edges.data['w']}"
197 |    ]
198 |   },
199 |   {
200 |    "cell_type": "markdown",
201 |    "metadata": {},
202 |    "source": [
203 |     "## Recap\n",
204 |     "\n",
205 |     "* `dgl.nn` provides many popular modules for a quick start.\n",
206 |     "* Use the built-in message and reduce functions in `dgl.function` to build a custom NN module.\n",
207 |     "* User-defined functions provide even more flexibility."
208 |    ]
209 |   },
210 |   {
211 |    "cell_type": "code",
212 |    "execution_count": null,
213 |    "metadata": {},
214 |    "outputs": [],
215 |    "source": []
216 |   }
217 |  ],
218 |  "metadata": {
219 |   "kernelspec": {
220 |    "display_name": "Python 3",
221 |    "language": "python",
222 |    "name": "python3"
223 |   },
224 |   "language_info": {
225 |    "codemirror_mode": {
226 |     "name": "ipython",
227 |     "version": 3
228 |    },
229 |    "file_extension": ".py",
230 |    "mimetype": "text/x-python",
231 |    "name": "python",
232 |    "nbconvert_exporter": "python",
233 |    "pygments_lexer": "ipython3",
234 |    "version": "3.7.5"
235 |   }
236 |  },
237 |  "nbformat": 4,
238 |  "nbformat_minor": 2
239 | }
240 | 
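To make the message and reduce steps in `4_message_passing.ipynb` concrete, here is a framework-free sketch of what `g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))` computes on a tiny graph. It is an illustration in plain Python, not DGL's implementation: `mean_aggregate` and the toy data are hypothetical, and leaving zeros for nodes that receive no messages is an assumption of this sketch.

```python
def mean_aggregate(num_nodes, edges, h):
    """For each node v, average the features of its in-neighbors u (edges u -> v)."""
    dim = len(h[0])
    sums = [[0.0] * dim for _ in range(num_nodes)]
    counts = [0] * num_nodes
    for u, v in edges:
        # Message step: the source feature h[u] is sent along edge u -> v.
        for i in range(dim):
            sums[v][i] += h[u][i]
        counts[v] += 1
    # Reduce step: divide by the in-degree; nodes without messages keep zeros.
    return [[s / counts[v] for s in sums[v]] if counts[v] else sums[v]
            for v in range(num_nodes)]


# Toy graph: edges 0 -> 2, 1 -> 2, 2 -> 0 (hypothetical data).
edges = [(0, 2), (1, 2), (2, 0)]
h = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
h_neigh = mean_aggregate(3, edges, h)
print(h_neigh)
```

Node 2 receives messages from nodes 0 and 1, so its aggregated feature is their element-wise average; node 1 has no incoming edges and stays at zero in this sketch.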


--------------------------------------------------------------------------------
/basic_tasks/data/edges.csv:
--------------------------------------------------------------------------------
  1 | Src,Dst,Weight
  2 | 0,1,0.31845103596456925
  3 | 0,2,0.5512145529252186
  4 | 0,3,0.22741585224191552
  5 | 0,4,0.2669188689251851
  6 | 0,5,0.47544947326394815
  7 | 0,6,0.8862627361494558
  8 | 0,7,0.16042605375040297
  9 | 0,8,0.7459807864037868
 10 | 0,10,0.5892903561029029
 11 | 0,11,0.47815888753487035
 12 | 0,12,0.782470682321906
 13 | 0,13,0.4179567052866201
 14 | 0,17,0.3685358669980614
 15 | 0,19,0.9551929987199811
 16 | 0,21,0.9613567741407174
 17 | 0,31,0.726227171317607
 18 | 1,0,0.6481096579934986
 19 | 1,2,0.37891536859801955
 20 | 1,3,0.24363852899322458
 21 | 1,7,0.41483571945746534
 22 | 1,13,0.6155209924586298
 23 | 1,17,0.2680276146366586
 24 | 1,19,0.11295441102477333
 25 | 1,21,0.08552253827088485
 26 | 1,30,0.5935657977437264
 27 | 2,0,0.5362144558950368
 28 | 2,1,0.5768112213837137
 29 | 2,3,0.43712621359822357
 30 | 2,7,0.13078796894587796
 31 | 2,8,0.08022424811393303
 32 | 2,9,0.14176452681346274
 33 | 2,13,0.8180530705514824
 34 | 2,27,0.8320016262502158
 35 | 2,28,0.409540145234772
 36 | 2,32,0.2642001276499689
 37 | 3,0,0.04615295242576234
 38 | 3,1,0.7049149713067023
 39 | 3,2,0.8540307600498042
 40 | 3,7,0.46208027528965967
 41 | 3,12,0.6784210067516615
 42 | 3,13,0.3301297641821943
 43 | 4,0,0.8832405252248015
 44 | 4,6,0.6669881098899139
 45 | 4,10,0.9640899941433938
 46 | 5,0,0.4268133485450748
 47 | 5,6,0.8149795460997955
 48 | 5,10,0.9267811195268474
 49 | 5,16,0.565166945765339
 50 | 6,0,0.026799684543950875
 51 | 6,4,0.9402764505035385
 52 | 6,5,0.6631523621900474
 53 | 6,16,0.28025742709759016
 54 | 7,0,0.9254762282755662
 55 | 7,1,0.796690523489831
 56 | 7,2,0.0979476899619427
 57 | 7,3,0.7074291143964807
 58 | 8,0,0.7761087565695074
 59 | 8,2,0.3073975630293794
 60 | 8,30,0.7605817165692309
 61 | 8,32,0.04947830225770011
 62 | 8,33,0.6309335405401543
 63 | 9,2,0.17380258005907812
 64 | 9,33,0.9785414932201859
 65 | 10,0,0.7191944186343044
 66 | 10,4,0.6595607436981878
 67 | 10,5,0.5389153328170596
 68 | 11,0,0.8284252928975201
 69 | 12,0,0.08159504164928544
 70 | 12,3,0.026621551124401566
 71 | 13,0,0.37654143271899876
 72 | 13,1,0.698648970574733
 73 | 13,2,0.1974780964394499
 74 | 13,3,0.45021972444903435
 75 | 13,33,0.9722036402765037
 76 | 14,32,0.009557409279240536
 77 | 14,33,0.589593597849638
 78 | 15,32,0.9878091453185213
 79 | 15,33,0.056667558149276376
 80 | 16,5,0.9045359763970754
 81 | 16,6,0.16879835545426658
 82 | 17,0,0.7730618928346192
 83 | 17,1,0.612815141007156
 84 | 18,32,0.4986886436717537
 85 | 18,33,0.39903509029811446
 86 | 19,0,0.08989463618875349
 87 | 19,1,0.7528198786718245
 88 | 19,33,0.8144615833348146
 89 | 20,32,0.6850036047325682
 90 | 20,33,0.10859338317785638
 91 | 21,0,0.20571853793912853
 92 | 21,1,0.8687748452451053
 93 | 22,32,0.008113164327674838
 94 | 22,33,0.36145242064640726
 95 | 23,25,0.19801959093221744
 96 | 23,27,0.7132375875281998
 97 | 23,29,0.8363094707133548
 98 | 23,32,0.28537615612136547
 99 | 23,33,0.0772935150077827
100 | 24,25,0.26813609940254624
101 | 24,27,0.22638821516538454
102 | 24,31,0.642997701810025
103 | 25,23,0.14459300691102495
104 | 25,24,0.7946476714989169
105 | 25,31,0.7388561092944019
106 | 26,29,0.445934837683155
107 | 26,33,0.4916511327260056
108 | 27,2,0.9527176446503433
109 | 27,23,0.7422628198042871
110 | 27,24,0.23101654883380685
111 | 27,33,0.9550339587184191
112 | 28,2,0.3339314258188022
113 | 28,31,0.34149893586394486
114 | 28,33,0.8180157469491468
115 | 29,23,0.4771935478203284
116 | 29,26,0.12938838154434495
117 | 29,32,0.1215136458344257
118 | 29,33,0.019569167078249627
119 | 30,1,0.6393264342401126
120 | 30,8,0.6646644798466513
121 | 30,32,0.1479691524369151
122 | 30,33,0.6403112524880046
123 | 31,0,0.1394065169246964
124 | 31,24,0.134245083586921
125 | 31,25,0.6243484303605552
126 | 31,28,0.3482911865356624
127 | 31,32,0.23331307519961453
128 | 31,33,0.4593599814263031
129 | 32,2,0.8839177811841961
130 | 32,8,0.5539934876068489
131 | 32,14,0.41970743621036855
132 | 32,15,0.8168582822265494
133 | 32,18,0.30481639228312607
134 | 32,20,0.07279882286966943
135 | 32,22,0.3978619031804904
136 | 32,23,0.20690915689699585
137 | 32,29,0.2338575632865122
138 | 32,30,0.8955881108049515
139 | 32,31,0.7316583958942351
140 | 32,33,0.0033742797158278215
141 | 33,8,0.3631579670659171
142 | 33,9,0.5292689687034333
143 | 33,13,0.7535800037841369
144 | 33,14,0.6738394218089896
145 | 33,15,0.8140789052125266
146 | 33,18,0.8968652515555932
147 | 33,19,0.45952115221569223
148 | 33,20,0.6897437506770194
149 | 33,22,0.5883557598002018
150 | 33,23,0.004996264899124525
151 | 33,26,0.04515847947583995
152 | 33,27,0.8556199432433349
153 | 33,28,0.2664787836457956
154 | 33,29,0.2799011634702968
155 | 33,30,0.6521544693031561
156 | 33,31,0.8285364872414698
157 | 33,32,0.8426561777783549
158 | 


--------------------------------------------------------------------------------
/basic_tasks/data/gen_data.py:
--------------------------------------------------------------------------------
 1 | import networkx as nx
 2 | import torch
 3 | import scipy.sparse as sp
 4 | import pandas as pd
 5 | import numpy as np
 6 | import random
 7 | 
 8 | g = nx.karate_club_graph().to_undirected().to_directed()
 9 | ids = []
10 | clubs = []
11 | ages = []
12 | for nid, attr in g.nodes(data=True):
13 |     ids.append(nid)
14 |     clubs.append(attr['club'])
15 |     ages.append(random.randint(30, 50))
16 | nodes = pd.DataFrame({'Id' : ids, 'Club' : clubs, 'Age' : ages})
17 | print(nodes)
18 | src = []
19 | dst = []
20 | weight = []
21 | for u, v in g.edges():
22 |     src.append(u)
23 |     dst.append(v)
24 |     weight.append(random.random())
25 | edges = pd.DataFrame({'Src' : src, 'Dst' : dst, 'Weight' : weight})
26 | print(edges)
27 | 
28 | nodes.to_csv('nodes.csv', index=False)
29 | edges.to_csv('edges.csv', index=False)
30 | 
31 | #with open('edges.txt', 'w') as f:
32 | #    for u, v in zip(src, dst):
33 | #        f.write('{} {}\n'.format(u, v))
34 | #
35 | #torch.save(torch.tensor(src), 'src.pt')
36 | #torch.save(torch.tensor(dst), 'dst.pt')
37 | #
38 | #spmat = nx.to_scipy_sparse_matrix(g)
39 | #print(spmat)
40 | #sp.save_npz('scipy_adj.npz', spmat)
41 | #
42 | #from networkx.readwrite import json_graph
43 | #import json
44 | #
45 | #with open('adj.json', 'w') as f:
46 | #    json.dump(json_graph.adjacency_data(g), f)
47 | #
48 | #node_feat = torch.randn((34, 5)) / 10.
49 | #edge_feat = torch.ones((156,))
50 | #torch.save(node_feat, 'node_feat.pt')
51 | #torch.save(edge_feat, 'edge_feat.pt')
52 | 


--------------------------------------------------------------------------------
/basic_tasks/data/nodes.csv:
--------------------------------------------------------------------------------
 1 | Id,Club,Age
 2 | 0,Mr. Hi,45
 3 | 1,Mr. Hi,33
 4 | 2,Mr. Hi,36
 5 | 3,Mr. Hi,31
 6 | 4,Mr. Hi,41
 7 | 5,Mr. Hi,42
 8 | 6,Mr. Hi,48
 9 | 7,Mr. Hi,41
10 | 8,Mr. Hi,30
11 | 9,Officer,35
12 | 10,Mr. Hi,38
13 | 11,Mr. Hi,44
14 | 12,Mr. Hi,37
15 | 13,Mr. Hi,39
16 | 14,Officer,36
17 | 15,Officer,38
18 | 16,Mr. Hi,47
19 | 17,Mr. Hi,45
20 | 18,Officer,41
21 | 19,Mr. Hi,31
22 | 20,Officer,31
23 | 21,Mr. Hi,44
24 | 22,Officer,42
25 | 23,Officer,32
26 | 24,Officer,30
27 | 25,Officer,50
28 | 26,Officer,30
29 | 27,Officer,43
30 | 28,Officer,48
31 | 29,Officer,40
32 | 30,Officer,39
33 | 31,Officer,45
34 | 32,Officer,47
35 | 33,Officer,33
36 | 


--------------------------------------------------------------------------------
/basic_tasks/slides.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/basic_tasks/slides.pptx


--------------------------------------------------------------------------------
/basic_tasks/tutorial_utils.py:
--------------------------------------------------------------------------------
 1 | import dgl
 2 | import pandas as pd
 3 | import torch
 4 | import torch.nn.functional as F
 5 | 
 6 | def load_zachery():
 7 |     nodes_data = pd.read_csv('data/nodes.csv')
 8 |     edges_data = pd.read_csv('data/edges.csv')
 9 |     src = edges_data['Src'].to_numpy()
10 |     dst = edges_data['Dst'].to_numpy()
11 |     g = dgl.graph((src, dst))
12 |     club = nodes_data['Club'].to_list()
13 |     # Convert to categorical integer values with 0 for 'Mr. Hi', 1 for 'Officer'.
14 |     club = torch.tensor([c == 'Officer' for c in club]).long()
15 |     # We can also convert it to one-hot encoding.
16 |     club_onehot = F.one_hot(club)
17 |     g.ndata.update({'club' : club, 'club_onehot' : club_onehot})
18 |     return g
19 | 
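The categorical encoding done in `load_zachery` (integer ids, then one-hot vectors) can be illustrated without PyTorch. This is a minimal sketch in plain Python; `encode_categories` is a hypothetical helper, and assigning ids in sorted order of the class names is an assumption that happens to give 0 for 'Mr. Hi' and 1 for 'Officer'.

```python
def encode_categories(values):
    """Map each distinct value to an integer id and build one-hot rows."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    ids = [index[v] for v in values]
    # One row per value, with a single 1 at the position of its class id.
    onehot = [[1 if i == k else 0 for k in range(len(classes))] for i in ids]
    return classes, ids, onehot


classes, ids, onehot = encode_categories(['Mr. Hi', 'Officer', 'Mr. Hi'])
print(classes, ids, onehot)
```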


--------------------------------------------------------------------------------
/basic_tasks_tf/1_load_data.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Load and Process Graph Data\n",
  8 |     "\n",
  9 |     "In this session, you will learn:\n",
 10 |     "\n",
 11 |     "* Load graph data stored in CSV files.\n",
 12 |     "* Construct a graph in DGL.\n",
 13 |     "* Query structural information of a DGL graph.\n",
 14 |     "* Load and pre-process node and edge features.\n"
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": null,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "from tutorial_utils import setup_tf\n",
 24 |     "setup_tf()"
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "markdown",
 29 |    "metadata": {},
 30 |    "source": [
 31 |     "## Load graph data from CSV"
 32 |    ]
 33 |   },
 34 |   {
 35 |    "cell_type": "markdown",
 36 |    "metadata": {},
 37 |    "source": [
 38 |     "Comma-separated values (CSV) is a widely used format for storing relational data. In this tutorial, we have prepared two CSV files that store [Zachary's Karate Club network](https://en.wikipedia.org/wiki/Zachary%27s_karate_club).\n",
 39 |     "* `nodes.csv` stores every club member and their attributes.\n",
 40 |     "* `edges.csv` stores the pairwise interactions between club members."
 41 |    ]
 42 |   },
 43 |   {
 44 |    "cell_type": "code",
 45 |    "execution_count": null,
 46 |    "metadata": {},
 47 |    "outputs": [],
 48 |    "source": [
 49 |     "!ls -lh 'data'"
 50 |    ]
 51 |   },
 52 |   {
 53 |    "cell_type": "markdown",
 54 |    "metadata": {},
 55 |    "source": [
 56 |     "We can use `pandas` to load the two CSV files."
 57 |    ]
 58 |   },
 59 |   {
 60 |    "cell_type": "code",
 61 |    "execution_count": null,
 62 |    "metadata": {},
 63 |    "outputs": [],
 64 |    "source": [
 65 |     "import pandas as pd\n",
 66 |     "\n",
 67 |     "nodes_data = pd.read_csv('data/nodes.csv')\n",
 68 |     "print(nodes_data)"
 69 |    ]
 70 |   },
 71 |   {
 72 |    "cell_type": "code",
 73 |    "execution_count": null,
 74 |    "metadata": {},
 75 |    "outputs": [],
 76 |    "source": [
 77 |     "edges_data = pd.read_csv('data/edges.csv')\n",
 78 |     "print(edges_data)"
 79 |    ]
 80 |   },
 81 |   {
 82 |    "cell_type": "markdown",
 83 |    "metadata": {},
 84 |    "source": [
 85 |     "We then construct a graph where each node is a club member and each edge represents their interactions. In DGL, **nodes are consecutive integers starting from zero**. Thus, when preparing the data, it is important to re-label or re-shuffle the row order so that the first row corresponds to the first node, the second row to the second node, and so on.\n",
 86 |     "\n",
 87 |     "In this example, we have already prepared the data in the correct order, so we can create the graph by the `'Src'` and `'Dst'` columns from the `edges.csv` table."
 88 |    ]
 89 |   },
 90 |   {
 91 |    "cell_type": "code",
 92 |    "execution_count": null,
 93 |    "metadata": {},
 94 |    "outputs": [],
 95 |    "source": [
 96 |     "import dgl\n",
 97 |     "\n",
 98 |     "src = edges_data['Src'].to_numpy()\n",
 99 |     "dst = edges_data['Dst'].to_numpy()\n",
100 |     "\n",
101 |     "# Create a DGL graph from a pair of numpy arrays\n",
102 |     "g = dgl.graph((src, dst))\n",
103 |     "\n",
104 |     "# Printing a graph shows some meta information such as the number of nodes and edges.\n",
105 |     "print(g)"
106 |    ]
107 |   },
108 |   {
109 |    "cell_type": "markdown",
110 |    "metadata": {},
111 |    "source": [
112 |     "A DGL graph can be converted to a `networkx` graph to take advantage of its rich functionality, such as visualization."
113 |    ]
114 |   },
115 |   {
116 |    "cell_type": "code",
117 |    "execution_count": null,
118 |    "metadata": {},
119 |    "outputs": [],
120 |    "source": [
121 |     "import networkx as nx\n",
122 |     "# Since the actual graph is undirected, we convert it for visualization\n",
123 |     "# purpose.\n",
124 |     "nx_g = g.to_networkx().to_undirected()\n",
125 |     "# The Kamada-Kawai layout usually looks good for arbitrary graphs\n",
126 |     "pos = nx.kamada_kawai_layout(nx_g)\n",
127 |     "nx.draw(nx_g, pos, with_labels=True, node_color=[[.7, .7, .7]])"
128 |    ]
129 |   },
130 |   {
131 |    "cell_type": "markdown",
132 |    "metadata": {},
133 |    "source": [
134 |     "## Query graph structures"
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "markdown",
139 |    "metadata": {},
140 |    "source": [
141 |     "Let's print out how many nodes and edges there are in this graph."
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "code",
146 |    "execution_count": null,
147 |    "metadata": {},
148 |    "outputs": [],
149 |    "source": [
150 |     "print('#Nodes', g.number_of_nodes())\n",
151 |     "print('#Edges', g.number_of_edges())"
152 |    ]
153 |   },
154 |   {
155 |    "cell_type": "markdown",
156 |    "metadata": {},
157 |    "source": [
158 |     "We can also perform queries on the graph structures."
159 |    ]
160 |   },
161 |   {
162 |    "cell_type": "markdown",
163 |    "metadata": {},
164 |    "source": [
165 |     "Get the in-degree of node 0:"
166 |    ]
167 |   },
168 |   {
169 |    "cell_type": "code",
170 |    "execution_count": null,
171 |    "metadata": {},
172 |    "outputs": [],
173 |    "source": [
174 |     "g.in_degree(0)"
175 |    ]
176 |   },
177 |   {
178 |    "cell_type": "markdown",
179 |    "metadata": {},
180 |    "source": [
181 |     "Get the successors of node 0:"
182 |    ]
183 |   },
184 |   {
185 |    "cell_type": "code",
186 |    "execution_count": null,
187 |    "metadata": {},
188 |    "outputs": [],
189 |    "source": [
190 |     "g.successors(0)"
191 |    ]
192 |   },
193 |   {
194 |    "cell_type": "markdown",
195 |    "metadata": {},
196 |    "source": [
197 |     "DGL provides APIs for querying structural information. See the API document [here](https://docs.dgl.ai/api/python/heterograph.html#querying-graph-structure)."
198 |    ]
199 |   },
200 |   {
201 |    "cell_type": "markdown",
202 |    "metadata": {},
203 |    "source": [
204 |     "![image.png](../asset/dgl-query.png)"
205 |    ]
206 |   },
207 |   {
208 |    "cell_type": "markdown",
209 |    "metadata": {},
210 |    "source": [
211 |     "## Load node and edge features"
212 |    ]
213 |   },
214 |   {
215 |    "cell_type": "markdown",
216 |    "metadata": {},
217 |    "source": [
218 |     "In many graph datasets, nodes and edges carry attributes. Although these attributes can have arbitrary types, a DGL graph only accepts attributes stored in tensors (with numerical contents). Deep learning offers many ways to vectorize various types of attributes into numerical features. Here are some general suggestions:\n",
219 |     "* For categorical attributes (e.g. gender, occupation), consider converting them to integers or a one-hot encoding.\n",
220 |     "* For variable-length text (e.g. news articles, quotes), consider applying a language model.\n",
221 |     "* For images, consider applying a vision model such as a CNN.\n",
222 |     "\n",
223 |     "Our data set has the following attribute columns:\n",
224 |     "* `Age` is already an integer attribute.\n",
225 |     "* `Club` is a categorical attribute representing which community each member belongs to.\n",
226 |     "* `Weight` is a floating-point number indicating the strength of each interaction."
227 |    ]
228 |   },
229 |   {
230 |    "cell_type": "code",
231 |    "execution_count": null,
232 |    "metadata": {},
233 |    "outputs": [],
234 |    "source": [
235 |     "import tensorflow as tf\n",
236 |     "import numpy as np\n",
237 |     "\n",
238 |     "# Prepare the age node feature\n",
239 |     "age = tf.constant(nodes_data['Age'].to_numpy(), dtype=tf.float32) / 100\n",
240 |     "print(age)"
241 |    ]
242 |   },
243 |   {
244 |    "cell_type": "code",
245 |    "execution_count": null,
246 |    "metadata": {},
247 |    "outputs": [],
248 |    "source": [
249 |     "# Get the features of nodes 0 and 10\n",
250 |     "tf.gather(age, [0, 10])"
251 |    ]
252 |   },
253 |   {
254 |    "cell_type": "markdown",
255 |    "metadata": {},
256 |    "source": [
257 |     "Use `g.ndata` to attach the age features to the graph."
258 |    ]
259 |   },
260 |   {
261 |    "cell_type": "code",
262 |    "execution_count": null,
263 |    "metadata": {},
264 |    "outputs": [],
265 |    "source": [
266 |     "# Feed the features to the graph\n",
267 |     "g.ndata['age'] = age\n",
268 |     "print(g)"
269 |    ]
270 |   },
271 |   {
272 |    "cell_type": "code",
273 |    "execution_count": null,
274 |    "metadata": {},
275 |    "outputs": [],
276 |    "source": [
277 |     "# The \"Club\" column records which community each node belongs to.\n",
278 |     "# The values are strings, so we must convert them to either categorical\n",
279 |     "# integer values or a one-hot encoding.\n",
280 |     "\n",
281 |     "club = nodes_data['Club'].to_list()\n",
282 |     "# Convert to categorical integer values with 0 for 'Mr. Hi', 1 for 'Officer'.\n",
283 |     "club = tf.constant([c == 'Officer' for c in club], dtype=tf.int64)\n",
284 |     "# We can also convert it to one-hot encoding.\n",
285 |     "club_onehot = tf.one_hot(club, np.max(club)+1)\n",
286 |     "print(club_onehot)\n",
287 |     "\n",
288 |     "# Use `g.ndata` like a normal dictionary\n",
289 |     "g.ndata.update({'club' : club, 'club_onehot' : club_onehot})\n",
290 |     "# Remove some features using del\n",
291 |     "del g.ndata['age']\n",
292 |     "\n",
293 |     "print(g)"
294 |    ]
295 |   },
296 |   {
297 |    "cell_type": "markdown",
298 |    "metadata": {},
299 |    "source": [
300 |     "Feeding edge features to a DGL graph is similar."
301 |    ]
302 |   },
303 |   {
304 |    "cell_type": "code",
305 |    "execution_count": null,
306 |    "metadata": {},
307 |    "outputs": [],
308 |    "source": [
309 |     "# Get edge features from the DataFrame and feed them to the graph.\n",
310 |     "edge_weight = tf.constant(edges_data['Weight'].to_numpy())\n",
311 |     "# Similarly, use `g.edata` for getting/setting edge features.\n",
312 |     "g.edata['weight'] = edge_weight\n",
313 |     "print(g)"
314 |    ]
315 |   },
316 |   {
317 |    "cell_type": "code",
318 |    "execution_count": null,
319 |    "metadata": {},
320 |    "outputs": [],
321 |    "source": []
322 |   }
323 |  ],
324 |  "metadata": {
325 |   "kernelspec": {
326 |    "display_name": "3.6.8",
327 |    "language": "python",
328 |    "name": "3.6.8"
329 |   },
330 |   "language_info": {
331 |    "codemirror_mode": {
332 |     "name": "ipython",
333 |     "version": 3
334 |    },
335 |    "file_extension": ".py",
336 |    "mimetype": "text/x-python",
337 |    "name": "python",
338 |    "nbconvert_exporter": "python",
339 |    "pygments_lexer": "ipython3",
340 |    "version": "3.6.8"
341 |   }
342 |  },
343 |  "nbformat": 4,
344 |  "nbformat_minor": 4
345 | }
346 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/2_gnn.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Semi-supervised Community Detection using Graph Neural Networks\n",
  8 |     "\n",
  9 |     "Almost every introductory computing class starts with a \"Hello World\" example. What MNIST is to deep learning, Zachary's Karate Club is to the graph domain. The karate club is a social network of 34 members that documents pairwise links between members who interact outside the club. The club later divides into two communities led by the instructor (node 0) and the club president (node 33). The network is visualized below, with the color indicating the community.\n",
 10 |     "<img src='../asset/karat_club.png' align='center' width=\"400px\" height=\"300px\" />\n",
 11 |     "\n",
 12 |     "In this tutorial, you will learn how to:\n",
 13 |     "\n",
 14 |     "* Formulate the community detection problem as a semi-supervised node classification task.\n",
 15 |     "* Build a GraphSAGE model, a popular Graph Neural Network architecture proposed by [Hamilton et al.](https://arxiv.org/abs/1706.02216)\n",
 16 |     "* Train the model and understand the result."
 17 |    ]
 18 |   },
 19 |   {
 20 |    "cell_type": "code",
 21 |    "execution_count": null,
 22 |    "metadata": {},
 23 |    "outputs": [],
 24 |    "source": [
 25 |     "from tutorial_utils import setup_tf\n",
 26 |     "setup_tf()"
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "code",
 31 |    "execution_count": null,
 32 |    "metadata": {},
 33 |    "outputs": [],
 34 |    "source": [
 35 |     "import dgl\n",
 36 |     "import tensorflow as tf\n",
 37 |     "import itertools"
 38 |    ]
 39 |   },
 40 |   {
 41 |    "cell_type": "markdown",
 42 |    "metadata": {},
 43 |    "source": [
 44 |     "## Community detection as node classification\n",
 45 |     "\n",
 46 |     "The study of community structure in graphs has a long history. Many proposed methods are *unsupervised* (or *self-supervised* by recent definition), where the model predicts the community labels from connectivity alone. Recently, [Kipf et al.](https://arxiv.org/abs/1609.02907) proposed formulating the community detection problem as a semi-supervised node classification task. With the help of only a small portion of labeled nodes, a GNN can accurately predict the community labels of the others.\n",
 47 |     "\n",
 48 |     "In this tutorial, we apply Kipf's setting to Zachary's Karate Club network to predict community membership, using the labels of only a few nodes."
 49 |    ]
 50 |   },
 51 |   {
 52 |    "cell_type": "markdown",
 53 |    "metadata": {},
 54 |    "source": [
 55 |     "We first load the graph and node labels, as covered in the [last session](./1_load_data.ipynb). Here, we provide a function for loading the data."
 56 |    ]
 57 |   },
 58 |   {
 59 |    "cell_type": "code",
 60 |    "execution_count": null,
 61 |    "metadata": {},
 62 |    "outputs": [],
 63 |    "source": [
 64 |     "from tutorial_utils import load_zachery\n",
 65 |     "\n",
 66 |     "# ----------- 0. load graph -------------- #\n",
 67 |     "g = load_zachery()\n",
 68 |     "print(g)"
 69 |    ]
 70 |   },
 71 |   {
 72 |    "cell_type": "markdown",
 73 |    "metadata": {},
 74 |    "source": [
 75 |     "In the original Zachary's Karate Club graph, nodes are featureless. (The `'Age'` attribute is an artificial one, added mainly for tutorial purposes.) For a featureless graph, a common practice is to give every node an embedding that is updated during training.\n",
 76 |     "\n",
 77 |     "We can use Keras's `Embedding` module to achieve this."
 78 |    ]
 79 |   },
 80 |   {
 81 |    "cell_type": "code",
 82 |    "execution_count": null,
 83 |    "metadata": {},
 84 |    "outputs": [],
 85 |    "source": [
 86 |     "# ----------- 1. node features -------------- #\n",
 87 |     "node_embed = tf.keras.layers.Embedding(g.number_of_nodes(), 5,\n",
 88 |     "                                       embeddings_initializer='glorot_uniform')  # Every node has an embedding of size 5.\n",
 89 |     "node_embed(1) # initialize embedding layer\n",
 90 |     "inputs = node_embed.embeddings # the embedding matrix\n",
 91 |     "print(inputs)"
 92 |    ]
 93 |   },
 94 |   {
 95 |    "cell_type": "markdown",
 96 |    "metadata": {},
 97 |    "source": [
 98 |     "The community label is stored in the `'club'` node feature (0 for instructor, 1 for club president). Only nodes 0 and 33 are labeled."
 99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "code",
103 |    "execution_count": null,
104 |    "metadata": {},
105 |    "outputs": [],
106 |    "source": [
107 |     "labels = g.ndata['club']\n",
108 |     "labeled_nodes = [0, 33]\n",
109 |     "print('Labels', tf.gather(labels, labeled_nodes))"
110 |    ]
111 |   },
112 |   {
113 |    "cell_type": "markdown",
114 |    "metadata": {},
115 |    "source": [
116 |     "## Define a GraphSAGE model\n",
117 |     "\n",
118 |     "Our model consists of two layers, each of which computes new node representations by aggregating neighbor information. The equations are:\n",
119 |     "\n",
120 |     "$$\n",
121 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
122 |     "$$\n",
123 |     "\n",
124 |     "$$\n",
125 |     "h_v^k\\leftarrow \\sigma\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
126 |     "$$\n",
127 |     "\n",
128 |     "DGL provides implementations of many popular neighbor aggregation modules. Each can be invoked with one line of code. See the full list of supported [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)."
129 |    ]
130 |   },
131 |   {
132 |    "cell_type": "code",
133 |    "execution_count": null,
134 |    "metadata": {},
135 |    "outputs": [],
136 |    "source": [
137 |     "from dgl.nn import SAGEConv\n",
138 |     "\n",
139 |     "# ----------- 2. create model -------------- #\n",
140 |     "# build a two-layer GraphSAGE model\n",
141 |     "class GraphSAGE(tf.keras.layers.Layer):\n",
142 |     "    def __init__(self, in_feats, h_feats, num_classes):\n",
143 |     "        super(GraphSAGE, self).__init__()\n",
144 |     "        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')\n",
145 |     "        self.conv2 = SAGEConv(h_feats, num_classes, 'mean')\n",
146 |     "    \n",
147 |     "    def call(self, g, in_feat):\n",
148 |     "        h = self.conv1(g, in_feat)\n",
149 |     "        h = tf.nn.relu(h)\n",
150 |     "        h = self.conv2(g, h)\n",
151 |     "        return h\n",
152 |     "    \n",
153 |     "# Create the model with given dimensions \n",
154 |     "# input layer dimension: 5, node embeddings\n",
155 |     "# hidden layer dimension: 16\n",
156 |     "# output layer dimension: 2, the two classes, 0 and 1\n",
157 |     "net = GraphSAGE(5, 16, 2)"
158 |    ]
159 |   },
160 |   {
161 |    "cell_type": "code",
162 |    "execution_count": null,
163 |    "metadata": {},
164 |    "outputs": [],
165 |    "source": [
166 |     "# ----------- 3. set up loss and optimizer -------------- #\n",
167 |     "optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n",
168 |     "loss_fcn = tf.keras.losses.SparseCategoricalCrossentropy(\n",
169 |     "    from_logits=True)\n",
170 |     "\n",
171 |     "# ----------- 4. training -------------------------------- #\n",
172 |     "all_logits = []\n",
173 |     "for e in range(100):\n",
174 |     "    \n",
175 |     "    with tf.GradientTape() as tape:\n",
176 |     "        tape.watch(inputs) # optimize embedding layer also\n",
177 |     "        \n",
178 |     "        # forward\n",
179 |     "        logits = net(g, inputs)\n",
180 |     "\n",
181 |     "        # compute loss\n",
182 |     "        loss = loss_fcn(tf.gather(labels, labeled_nodes), \n",
183 |     "                        tf.gather(logits, labeled_nodes))\n",
184 |     "\n",
185 |     "    # backward (gradients are computed after leaving the tape context)\n",
186 |     "    grads = tape.gradient(loss, net.trainable_weights + node_embed.trainable_weights)\n",
187 |     "    optimizer.apply_gradients(zip(grads, net.trainable_weights + node_embed.trainable_weights))\n",
188 |     "    all_logits.append(logits.numpy())\n",
189 |     "    \n",
190 |     "    if e % 5 == 0:\n",
191 |     "        print('In epoch {}, loss: {}'.format(e, loss))"
192 |    ]
193 |   },
194 |   {
195 |    "cell_type": "code",
196 |    "execution_count": null,
197 |    "metadata": {},
198 |    "outputs": [],
199 |    "source": [
200 |     "# ----------- 5. check results ------------------------ #\n",
201 |     "pred = tf.argmax(logits, axis=1).numpy()\n",
202 |     "print('Accuracy', (pred == labels.numpy()).sum().item() / len(pred))"
203 |    ]
204 |   },
205 |   {
206 |    "cell_type": "markdown",
207 |    "metadata": {},
208 |    "source": [
209 |     "## Visualize the result\n",
210 |     "\n",
211 |     "Since the GNN produces a logit vector of size 2 for each node, we can plot the nodes on a 2-D plane.\n",
212 |     "\n",
213 |     "<img src='../asset/gnn_ep0.png' align='center' width=\"400px\" height=\"300px\"/>\n",
214 |     "<img src='../asset/gnn_ep_anime.gif' align='center' width=\"400px\" height=\"300px\"/>"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "markdown",
219 |    "metadata": {},
220 |    "source": [
221 |     "Run the following code to visualize the result. This requires ffmpeg."
222 |    ]
223 |   },
224 |   {
225 |    "cell_type": "code",
226 |    "execution_count": null,
227 |    "metadata": {},
228 |    "outputs": [],
229 |    "source": [
230 |     "# A bit of setup, just ignore this cell\n",
231 |     "import matplotlib.pyplot as plt\n",
232 |     "\n",
233 |     "# for auto-reloading external modules\n",
234 |     "%load_ext autoreload\n",
235 |     "%autoreload 2\n",
236 |     "\n",
237 |     "%matplotlib inline\n",
238 |     "plt.rcParams['figure.figsize'] = (4.0, 3.0) # set default size of plots\n",
239 |     "plt.rcParams['image.interpolation'] = 'nearest'\n",
240 |     "plt.rcParams['image.cmap'] = 'gray'\n",
241 |     "plt.rcParams['animation.html'] = 'html5'"
242 |    ]
243 |   },
244 |   {
245 |    "cell_type": "code",
246 |    "execution_count": null,
247 |    "metadata": {},
248 |    "outputs": [],
249 |    "source": [
250 |     "# Visualize the node classification using the logits output. Requires ffmpeg.\n",
251 |     "import networkx as nx\n",
252 |     "import numpy as np\n",
253 |     "import matplotlib.animation as animation\n",
254 |     "from IPython.display import HTML\n",
255 |     "\n",
256 |     "fig = plt.figure(dpi=150)\n",
257 |     "fig.clf()\n",
258 |     "ax = fig.subplots()\n",
259 |     "nx_G = g.to_networkx()\n",
260 |     "def draw(i):\n",
261 |     "    cls1color = '#00FFFF'\n",
262 |     "    cls2color = '#FF00FF'\n",
263 |     "    pos = {}\n",
264 |     "    colors = []\n",
265 |     "    for v in range(34):\n",
266 |     "        pred = all_logits[i]  # already a NumPy array (stored via .numpy() during training)\n",
267 |     "        pos[v] = pred[v]\n",
268 |     "        cls = labels[v]\n",
269 |     "        colors.append(cls1color if cls else cls2color)\n",
270 |     "    ax.cla()\n",
271 |     "    ax.axis('off')\n",
272 |     "    ax.set_title('Epoch: %d' % i)\n",
273 |     "    nx.draw(nx_G.to_undirected(), pos, node_color=colors, with_labels=True, node_size=200)\n",
274 |     "\n",
275 |     "ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)\n",
276 |     "HTML(ani.to_html5_video())"
277 |    ]
278 |   },
279 |   {
280 |    "cell_type": "markdown",
281 |    "metadata": {},
282 |    "source": [
283 |     "## Exercise\n",
284 |     "\n",
285 |     "Experiment with the GNN model by swapping in other [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)."
286 |    ]
287 |   },
288 |   {
289 |    "cell_type": "code",
290 |    "execution_count": null,
291 |    "metadata": {},
292 |    "outputs": [],
293 |    "source": []
294 |   }
295 |  ],
296 |  "metadata": {
297 |   "kernelspec": {
298 |    "display_name": "Python [conda env:dgl]",
299 |    "language": "python",
300 |    "name": "conda-env-dgl-py"
301 |   },
302 |   "language_info": {
303 |    "codemirror_mode": {
304 |     "name": "ipython",
305 |     "version": 3
306 |    },
307 |    "file_extension": ".py",
308 |    "mimetype": "text/x-python",
309 |    "name": "python",
310 |    "nbconvert_exporter": "python",
311 |    "pygments_lexer": "ipython3",
312 |    "version": "3.6.10"
313 |   }
314 |  },
315 |  "nbformat": 4,
316 |  "nbformat_minor": 4
317 | }
318 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/3_link_predict-CN.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Link Prediction using Graph Neural Networks\n",
  8 |     "\n",
  9 |     "GNNs are powerful tools for many machine learning tasks on graphs. This tutorial teaches the basic workflow of using GNNs for link prediction. We again use Zachary's Karate Club graph, but this time we try to predict interactions between two members.\n",
 10 |     "\n",
 11 |     "In this tutorial, you will learn how to:\n",
 12 |     "* Prepare training and testing sets for a link prediction task.\n",
 13 |     "* Build a GNN-based link prediction model.\n",
 14 |     "* Train the model and verify the result."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": 1,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "from tutorial_utils import setup_tf\n",
 24 |     "setup_tf()"
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "code",
 29 |    "execution_count": 2,
 30 |    "metadata": {},
 31 |    "outputs": [],
 32 |    "source": [
 33 |     "import dgl\n",
 34 |     "import tensorflow as tf\n",
 35 |     "import itertools\n",
 36 |     "import numpy as np\n",
 37 |     "import scipy.sparse as sp"
 38 |    ]
 39 |   },
 40 |   {
 41 |    "cell_type": "markdown",
 42 |    "metadata": {},
 43 |    "source": [
 44 |     "## Load graph and features\n",
 45 |     "\n",
 46 |     "Following the last [session](./2_gnn-CN.ipynb), we first load Zachary's Karate Club graph and create node embeddings."
 47 |    ]
 48 |   },
 49 |   {
 50 |    "cell_type": "code",
 51 |    "execution_count": 3,
 52 |    "metadata": {},
 53 |    "outputs": [
 54 |     {
 55 |      "name": "stdout",
 56 |      "output_type": "stream",
 57 |      "text": [
 58 |       "Graph(num_nodes=34, num_edges=156,\n",
 59 |       "      ndata_schemes={'club': Scheme(shape=(), dtype=tf.int64), 'club_onehot': Scheme(shape=(2,), dtype=tf.float32)}\n",
 60 |       "      edata_schemes={})\n",
 61 |       "<tf.Variable 'embedding/embeddings:0' shape=(34, 5) dtype=float32, numpy=\n",
 62 |       "array([[-0.38376212,  0.02762738,  0.2652063 ,  0.32293776,  0.04524353],\n",
 63 |       "       [ 0.30032715,  0.02468556, -0.1900916 , -0.04702508, -0.30461857],\n",
 64 |       "       [-0.1929569 ,  0.24494252, -0.38531214, -0.08113599,  0.06808767],\n",
 65 |       "       [ 0.02020505,  0.26825735, -0.3504967 ,  0.2496821 , -0.34984836],\n",
 66 |       "       [ 0.0258891 , -0.15108845, -0.35368958,  0.37582836, -0.29545236],\n",
 67 |       "       [ 0.2677717 ,  0.08830869,  0.28496173,  0.02015099,  0.05002049],\n",
 68 |       "       [-0.00256091,  0.10553828, -0.10098866, -0.25102186,  0.20928678],\n",
 69 |       "       [ 0.23899326, -0.27900234,  0.23708245, -0.20309108, -0.11720824],\n",
 70 |       "       [-0.07901543,  0.31122229,  0.01442784,  0.03468132, -0.16346353],\n",
 71 |       "       [ 0.00062189,  0.32725772,  0.22976199, -0.09203568,  0.0605621 ],\n",
 72 |       "       [ 0.18852326, -0.22114322,  0.01407388,  0.02442989, -0.07274896],\n",
 73 |       "       [-0.19398539, -0.1850395 , -0.28360528, -0.12099096, -0.17968921],\n",
 74 |       "       [-0.00870264, -0.29916045, -0.0631997 ,  0.15708569,  0.07021001],\n",
 75 |       "       [ 0.3636075 , -0.00296146,  0.00969708,  0.32759282,  0.13714948],\n",
 76 |       "       [ 0.03305024,  0.10574755, -0.2602409 ,  0.29580548, -0.11125207],\n",
 77 |       "       [-0.11671582,  0.11565533,  0.3492187 ,  0.293992  ,  0.37837824],\n",
 78 |       "       [ 0.3312573 , -0.28587145, -0.03252721,  0.26750275,  0.28610566],\n",
 79 |       "       [ 0.12475982, -0.30301076, -0.24551296,  0.29956558, -0.18915391],\n",
 80 |       "       [-0.11880317, -0.24317536,  0.2580764 ,  0.22547969,  0.22258344],\n",
 81 |       "       [ 0.2329677 ,  0.10412478, -0.28541106,  0.37384805, -0.02423584],\n",
 82 |       "       [ 0.2116668 , -0.22576325,  0.07006133, -0.19694842, -0.30447578],\n",
 83 |       "       [-0.38416442,  0.07963702, -0.22657318,  0.2628081 , -0.20528664],\n",
 84 |       "       [-0.2577273 , -0.23448086,  0.32212886, -0.24628036, -0.3447023 ],\n",
 85 |       "       [-0.3161682 , -0.19704661, -0.19434355, -0.01621708, -0.30884323],\n",
 86 |       "       [ 0.37599108,  0.0297989 ,  0.06864455,  0.03377843,  0.23777828],\n",
 87 |       "       [ 0.08998618,  0.3604851 , -0.099262  ,  0.10451549, -0.07155886],\n",
 88 |       "       [ 0.22283366, -0.3278796 ,  0.1046724 , -0.09897768,  0.23841599],\n",
 89 |       "       [-0.1946501 ,  0.11716357, -0.32857564,  0.10770017, -0.1645889 ],\n",
 90 |       "       [ 0.17030624, -0.36210936,  0.01169559,  0.10552621,  0.36919817],\n",
 91 |       "       [ 0.19632813,  0.36338618,  0.36978415,  0.20446762, -0.18749607],\n",
 92 |       "       [-0.27377343,  0.00047234,  0.18350473,  0.37348035,  0.23951712],\n",
 93 |       "       [-0.26527306,  0.1634542 , -0.17902365,  0.38443735, -0.17161882],\n",
 94 |       "       [-0.15571484,  0.08100843,  0.26142022,  0.02388862,  0.22822693],\n",
 95 |       "       [ 0.19441465,  0.07608762, -0.24678141, -0.22402386, -0.19314057]],\n",
 96 |       "      dtype=float32)>\n"
 97 |      ]
 98 |     }
 99 |    ],
100 |    "source": [
101 |     "from tutorial_utils import load_zachery\n",
102 |     "\n",
103 |     "# ----------- 0. load graph -------------- #\n",
104 |     "g = load_zachery()\n",
105 |     "print(g)\n",
106 |     "\n",
107 |     "# ----------- 1. node features -------------- #\n",
108 |     "node_embed = tf.keras.layers.Embedding(g.number_of_nodes(), 5,\n",
109 |     "                                       embeddings_initializer='glorot_uniform')  # Every node has an embedding of size 5.\n",
110 |     "node_embed(1) # initialize embedding layer\n",
111 |     "inputs = node_embed.embeddings # Use the embedding weight as the node features.\n",
112 |     "print(inputs)"
113 |    ]
114 |   },
115 |   {
116 |    "cell_type": "markdown",
117 |    "metadata": {},
118 |    "source": [
119 |     "## Prepare training and testing sets"
120 |    ]
121 |   },
122 |   {
123 |    "cell_type": "markdown",
124 |    "metadata": {},
125 |    "source": [
126 |     "In general, a link prediction data set contains two types of edges, *positive* and *negative edges*. Positive edges are usually drawn from the existing edges in the graph. In this example, we randomly pick 50 edges for testing and leave the rest for training."
127 |    ]
128 |   },
129 |   {
130 |    "cell_type": "code",
131 |    "execution_count": 4,
132 |    "metadata": {},
133 |    "outputs": [],
134 |    "source": [
135 |     "# Split edge set for training and testing\n",
136 |     "u, v = g.edges()\n",
137 |     "u, v = u.numpy(), v.numpy()\n",
138 |     "eids = np.arange(g.number_of_edges())\n",
139 |     "eids = np.random.permutation(eids)\n",
140 |     "test_pos_u, test_pos_v = u[eids[:50]], v[eids[:50]]\n",
141 |     "train_pos_u, train_pos_v = u[eids[50:]], v[eids[50:]]"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "markdown",
146 |    "metadata": {},
147 |    "source": [
148 |     "Because negative edges far outnumber positive ones, they are usually sampled. Choosing a proper negative sampling algorithm is a widely studied topic and is beyond the scope of this tutorial. Since our example graph is quite small (only 34 nodes), we enumerate all the missing edges and randomly pick 50 for testing and 150 for training."
149 |    ]
150 |   },
151 |   {
152 |    "cell_type": "code",
153 |    "execution_count": 5,
154 |    "metadata": {},
155 |    "outputs": [],
156 |    "source": [
157 |     "# Find all negative edges and split them for training and testing\n",
158 |     "adj = sp.coo_matrix((np.ones(len(u)), (u, v)))\n",
159 |     "adj_neg = 1 - adj.todense() - np.eye(34)\n",
160 |     "neg_u, neg_v = np.where(adj_neg != 0)\n",
161 |     "neg_eids = np.random.choice(len(neg_u), 200, replace=False)  # sample without replacement so train/test negatives do not overlap\n",
162 |     "test_neg_u, test_neg_v = neg_u[neg_eids[:50]], neg_v[neg_eids[:50]]\n",
163 |     "train_neg_u, train_neg_v = neg_u[neg_eids[50:]], neg_v[neg_eids[50:]]"
164 |    ]
165 |   },
166 |   {
167 |    "cell_type": "markdown",
168 |    "metadata": {},
169 |    "source": [
170 |     "Put positive and negative edges together and form training and testing sets."
171 |    ]
172 |   },
173 |   {
174 |    "cell_type": "code",
175 |    "execution_count": 6,
176 |    "metadata": {},
177 |    "outputs": [],
178 |    "source": [
179 |     "# Create training set.\n",
180 |     "train_u = tf.concat([train_pos_u, train_neg_u], axis=0)\n",
181 |     "train_v = tf.concat([train_pos_v, train_neg_v], axis=0)\n",
182 |     "train_label = tf.concat([tf.ones(len(train_pos_u)), tf.zeros(len(train_neg_u))], axis=0)  # 1 = positive edge, 0 = negative edge\n",
183 |     "\n",
184 |     "# Create testing set.\n",
185 |     "test_u = tf.concat([test_pos_u, test_neg_u], axis=0)\n",
186 |     "test_v = tf.concat([test_pos_v, test_neg_v], axis=0)\n",
187 |     "test_label = tf.concat([tf.ones(len(test_pos_u)), tf.zeros(len(test_neg_u))], axis=0)"
188 |    ]
189 |   },
190 |   {
191 |    "cell_type": "markdown",
192 |    "metadata": {},
193 |    "source": [
194 |     "## Define a GraphSAGE model\n",
195 |     "\n",
196 |     "Our model consists of two layers, each of which computes new node representations by aggregating neighbor information. The equations are:\n",
197 |     "\n",
198 |     "$$\n",
199 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
200 |     "$$\n",
201 |     "\n",
202 |     "$$\n",
203 |     "h_v^k\\leftarrow \\text{ReLU}\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
204 |     "$$\n",
205 |     "\n",
206 |     "DGL provides implementations of many popular neighbor aggregation modules. Each can be invoked with one line of code. See the full list of supported [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)."
207 |    ]
208 |   },
209 |   {
210 |    "cell_type": "code",
211 |    "execution_count": 7,
212 |    "metadata": {},
213 |    "outputs": [],
214 |    "source": [
215 |     "from dgl.nn import SAGEConv\n",
216 |     "\n",
217 |     "# ----------- 2. create model -------------- #\n",
218 |     "# build a two-layer GraphSAGE model\n",
219 |     "class GraphSAGE(tf.keras.layers.Layer):\n",
220 |     "    def __init__(self, in_feats, h_feats):\n",
221 |     "        super(GraphSAGE, self).__init__()\n",
222 |     "        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')\n",
223 |     "        self.conv2 = SAGEConv(h_feats, h_feats, 'mean')\n",
224 |     "    \n",
225 |     "    def call(self, g, in_feat):\n",
226 |     "        h = self.conv1(g, in_feat)\n",
227 |     "        h = tf.nn.relu(h)\n",
228 |     "        h = self.conv2(g, h)\n",
229 |     "        return h\n",
230 |     "    \n",
231 |     "# Create the model with given dimensions \n",
232 |     "# input layer dimension: 5, node embeddings\n",
233 |     "# hidden layer dimension: 16\n",
234 |     "net = GraphSAGE(5, 16)"
235 |    ]
236 |   },
237 |   {
238 |    "cell_type": "markdown",
239 |    "metadata": {},
240 |    "source": [
241 |     "## Use a loss function tailored to link prediction\n",
242 |     "\n",
243 |     "We then optimize the model using the following loss function.\n",
244 |     "\n",
245 |     "$$\n",
246 |     "\\hat{y}_{u\\sim v} = \\sigma(h_u^T h_v)\n",
247 |     "$$\n",
248 |     "\n",
249 |     "$$\n",
250 |     "\\mathcal{L} = -\\sum_{u\\sim v\\in \\mathcal{D}}\\left( y_{u\\sim v}\\log(\\hat{y}_{u\\sim v}) + (1-y_{u\\sim v})\\log(1-\\hat{y}_{u\\sim v}) \\right)\n",
251 |     "$$\n",
252 |     "\n",
253 |     "Essentially, the model predicts a score for each edge by taking the dot product of the representations of its two end-points. It then computes a binary cross-entropy loss, with the target $y$ being 0 or 1 to indicate whether the edge is a positive one or not."
254 |    ]
255 |   },
256 |   {
257 |    "cell_type": "code",
258 |    "execution_count": 8,
259 |    "metadata": {},
260 |    "outputs": [
261 |     {
262 |      "name": "stdout",
263 |      "output_type": "stream",
264 |      "text": [
265 |       "WARNING:tensorflow:From /Users/jamezhan/anaconda3/envs/dgl/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py:644: _EagerTensorBase.cpu (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n",
266 |       "Instructions for updating:\n",
267 |       "Use tf.identity instead.\n",
268 |       "In epoch 0, loss: 0.6808468699455261\n",
269 |       "In epoch 5, loss: 0.5773298740386963\n",
270 |       "In epoch 10, loss: 0.44662487506866455\n",
271 |       "In epoch 15, loss: 0.39926254749298096\n",
272 |       "In epoch 20, loss: 0.3401777446269989\n",
273 |       "In epoch 25, loss: 0.32041066884994507\n",
274 |       "In epoch 30, loss: 0.2868993878364563\n",
275 |       "In epoch 35, loss: 0.2574978470802307\n",
276 |       "In epoch 40, loss: 0.23335880041122437\n",
277 |       "In epoch 45, loss: 0.20876789093017578\n",
278 |       "In epoch 50, loss: 0.1869775354862213\n",
279 |       "In epoch 55, loss: 0.16242516040802002\n",
280 |       "In epoch 60, loss: 0.1356896162033081\n",
281 |       "In epoch 65, loss: 0.11280933022499084\n",
282 |       "In epoch 70, loss: 0.08605014532804489\n",
283 |       "In epoch 75, loss: 0.054293327033519745\n",
284 |       "In epoch 80, loss: 0.02977687492966652\n",
285 |       "In epoch 85, loss: 0.013111919164657593\n",
286 |       "In epoch 90, loss: 0.005503080785274506\n",
287 |       "In epoch 95, loss: 0.0021488661877810955\n"
288 |      ]
289 |     }
290 |    ],
291 |    "source": [
292 |     "# ----------- 3. set up loss and optimizer -------------- #\n",
293 |     "optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n",
294 |     "loss_fcn = tf.keras.losses.BinaryCrossentropy(\n",
295 |     "    from_logits=False)\n",
296 |     "\n",
297 |     "# ----------- 4. training -------------------------------- #\n",
298 |     "all_logits = []\n",
299 |     "for e in range(100):\n",
300 |     "    \n",
301 |     "    with tf.GradientTape() as tape:\n",
302 |     "        tape.watch(inputs) # optimize embedding layer also\n",
303 |     "        # forward\n",
304 |     "        logits = net(g, inputs)\n",
305 |     "        pred = tf.sigmoid(tf.reduce_sum(tf.gather(logits, train_u) *\n",
306 |     "                                        tf.gather(logits, train_v), axis=1))\n",
307 |     "\n",
308 |     "        # compute loss\n",
309 |     "        loss = loss_fcn(train_label, pred)\n",
310 |     "\n",
311 |     "        # backward\n",
312 |     "        grads = tape.gradient(loss, net.trainable_weights + node_embed.trainable_weights)        \n",
313 |     "        optimizer.apply_gradients(zip(grads, net.trainable_weights + node_embed.trainable_weights))\n",
314 |     "        all_logits.append(logits.numpy())\n",
315 |     "\n",
316 |     "    if e % 5 == 0:\n",
317 |     "        print('In epoch {}, loss: {}'.format(e, loss))"
318 |    ]
319 |   },
320 |   {
321 |    "cell_type": "code",
322 |    "execution_count": 9,
323 |    "metadata": {},
324 |    "outputs": [
325 |     {
326 |      "name": "stdout",
327 |      "output_type": "stream",
328 |      "text": [
329 |       "Accuracy 0.8\n"
330 |      ]
331 |     }
332 |    ],
333 |    "source": [
334 |     "# ----------- 5. check results ------------------------ #\n",
335 |     "pred = tf.sigmoid(tf.reduce_sum(tf.gather(logits, test_u) *\n",
336 |     "                                tf.gather(logits, test_v), axis=1)).numpy()\n",
337 |     "print('Accuracy', ((pred >= 0.5) == test_label.numpy()).sum().item() / len(pred))"
338 |    ]
339 |   },
340 |   {
341 |    "cell_type": "code",
342 |    "execution_count": null,
343 |    "metadata": {},
344 |    "outputs": [],
345 |    "source": []
346 |   }
347 |  ],
348 |  "metadata": {
349 |   "kernelspec": {
350 |    "display_name": "Python [conda env:dgl]",
351 |    "language": "python",
352 |    "name": "conda-env-dgl-py"
353 |   },
354 |   "language_info": {
355 |    "codemirror_mode": {
356 |     "name": "ipython",
357 |     "version": 3
358 |    },
359 |    "file_extension": ".py",
360 |    "mimetype": "text/x-python",
361 |    "name": "python",
362 |    "nbconvert_exporter": "python",
363 |    "pygments_lexer": "ipython3",
364 |    "version": "3.6.10"
365 |   }
366 |  },
367 |  "nbformat": 4,
368 |  "nbformat_minor": 4
369 | }
370 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/3_link_predict.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Link Prediction using Graph Neural Networks\n",
  8 |     "\n",
  9 |     "GNNs are powerful tools for many machine learning tasks on graphs. This tutorial teaches the basic workflow of using GNNs for link prediction. We again use Zachary's Karate Club graph, but this time we try to predict whether two members interact.\n",
 10 |     "\n",
 11 |     "In this tutorial, you will learn how to:\n",
 12 |     "* Prepare training and testing sets for a link prediction task.\n",
 13 |     "* Build a GNN-based link prediction model.\n",
 14 |     "* Train the model and verify the result."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": null,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "from tutorial_utils import setup_tf\n",
 24 |     "setup_tf()"
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "code",
 29 |    "execution_count": null,
 30 |    "metadata": {},
 31 |    "outputs": [],
 32 |    "source": [
 33 |     "import dgl\n",
 34 |     "import tensorflow as tf\n",
 35 |     "import itertools\n",
 36 |     "import numpy as np\n",
 37 |     "import scipy.sparse as sp"
 38 |    ]
 39 |   },
 40 |   {
 41 |    "cell_type": "markdown",
 42 |    "metadata": {},
 43 |    "source": [
 44 |     "## Load graph and features\n",
 45 |     "\n",
 46 |     "As in the previous [session](./2_gnn.ipynb), we first load Zachary's Karate Club graph and create node embeddings."
 47 |    ]
 48 |   },
 49 |   {
 50 |    "cell_type": "code",
 51 |    "execution_count": null,
 52 |    "metadata": {},
 53 |    "outputs": [],
 54 |    "source": [
 55 |     "from tutorial_utils import load_zachery\n",
 56 |     "\n",
 57 |     "# ----------- 0. load graph -------------- #\n",
 58 |     "g = load_zachery()\n",
 59 |     "print(g)\n",
 60 |     "\n",
 61 |     "# ----------- 1. node features -------------- #\n",
 62 |     "node_embed = tf.keras.layers.Embedding(g.number_of_nodes(), 5,\n",
 63 |     "                                       embeddings_initializer='glorot_uniform')  # Every node has an embedding of size 5.\n",
 64 |     "node_embed(1)  # Call the layer once so that its weights are created.\n",
 65 |     "inputs = node_embed.embeddings  # Use the embedding weights as the node features.\n",
 66 |     "print(inputs)"
 67 |    ]
 68 |   },
 69 |   {
 70 |    "cell_type": "markdown",
 71 |    "metadata": {},
 72 |    "source": [
 73 |     "## Prepare training and testing sets"
 74 |    ]
 75 |   },
 76 |   {
 77 |    "cell_type": "markdown",
 78 |    "metadata": {},
 79 |    "source": [
 80 |     "In general, a link prediction dataset contains two types of edges: *positive* edges and *negative* edges. Positive edges are usually drawn from the edges that exist in the graph. In this example, we randomly pick 50 edges for testing and leave the rest for training."
 81 |    ]
 82 |   },
 83 |   {
 84 |    "cell_type": "code",
 85 |    "execution_count": null,
 86 |    "metadata": {},
 87 |    "outputs": [],
 88 |    "source": [
 89 |     "# Split edge set for training and testing\n",
 90 |     "u, v = g.edges()\n",
 91 |     "u, v = u.numpy(), v.numpy()\n",
 92 |     "eids = np.arange(g.number_of_edges())\n",
 93 |     "eids = np.random.permutation(eids)\n",
 94 |     "test_pos_u, test_pos_v = u[eids[:50]], v[eids[:50]]\n",
 95 |     "train_pos_u, train_pos_v = u[eids[50:]], v[eids[50:]]"
 96 |    ]
 97 |   },
 98 |   {
 99 |    "cell_type": "markdown",
100 |    "metadata": {},
101 |    "source": [
102 |     "Negative edges are node pairs that are not connected in the graph. Because there are usually far more of them than positive edges, they are typically sampled. Choosing a good negative sampling algorithm is a widely studied topic that is beyond the scope of this tutorial. Since our example graph is small (only 34 nodes), we enumerate all the missing edges and randomly pick 50 for testing and 150 for training."
103 |    ]
104 |   },
105 |   {
106 |    "cell_type": "code",
107 |    "execution_count": null,
108 |    "metadata": {},
109 |    "outputs": [],
110 |    "source": [
111 |     "# Find all negative edges and split them for training and testing\n",
112 |     "adj = sp.coo_matrix((np.ones(len(u)), (u, v)))\n",
113 |     "adj_neg = 1 - adj.todense() - np.eye(34)\n",
114 |     "neg_u, neg_v = np.where(adj_neg != 0)\n",
115 |     "neg_eids = np.random.choice(len(neg_u), 200, replace=False)  # sample without replacement so that no pair lands in both sets\n",
116 |     "test_neg_u, test_neg_v = neg_u[neg_eids[:50]], neg_v[neg_eids[:50]]\n",
117 |     "train_neg_u, train_neg_v = neg_u[neg_eids[50:]], neg_v[neg_eids[50:]]"
118 |    ]
119 |   },
120 |   {
121 |    "cell_type": "markdown",
122 |    "metadata": {},
123 |    "source": [
124 |     "Put positive and negative edges together and form training and testing sets."
125 |    ]
126 |   },
127 |   {
128 |    "cell_type": "code",
129 |    "execution_count": null,
130 |    "metadata": {},
131 |    "outputs": [],
132 |    "source": [
133 |     "# Create training set.\n",
134 |     "train_u = tf.concat([train_pos_u, train_neg_u], axis=0)\n",
135 |     "train_v = tf.concat([train_pos_v, train_neg_v], axis=0)\n",
136 |     "train_label = tf.concat([tf.ones(len(train_pos_u)), tf.zeros(len(train_neg_u))], axis=0)  # 1 = positive edge, 0 = negative edge\n",
137 |     "\n",
138 |     "# Create testing set.\n",
139 |     "test_u = tf.concat([test_pos_u, test_neg_u], axis=0)\n",
140 |     "test_v = tf.concat([test_pos_v, test_neg_v], axis=0)\n",
141 |     "test_label = tf.concat([tf.ones(len(test_pos_u)), tf.zeros(len(test_neg_u))], axis=0)  # same convention as the training labels"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "markdown",
146 |    "metadata": {},
147 |    "source": [
148 |     "## Define a GraphSAGE model\n",
149 |     "\n",
150 |     "Our model consists of two layers, each of which computes new node representations by aggregating neighbor information. The equations are:\n",
151 |     "\n",
152 |     "$$\n",
153 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
154 |     "$$\n",
155 |     "\n",
156 |     "$$\n",
157 |     "h_v^k\\leftarrow \\text{ReLU}\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
158 |     "$$\n",
159 |     "\n",
160 |     "DGL provides implementations of many popular neighbor aggregation modules, each of which can be invoked with a single line of code. See the full list of supported [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)."
161 |    ]
162 |   },
163 |   {
164 |    "cell_type": "code",
165 |    "execution_count": null,
166 |    "metadata": {},
167 |    "outputs": [],
168 |    "source": [
169 |     "from dgl.nn import SAGEConv\n",
170 |     "\n",
171 |     "# ----------- 2. create model -------------- #\n",
172 |     "# build a two-layer GraphSAGE model\n",
173 |     "class GraphSAGE(tf.keras.layers.Layer):\n",
174 |     "    def __init__(self, in_feats, h_feats):\n",
175 |     "        super(GraphSAGE, self).__init__()\n",
176 |     "        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')\n",
177 |     "        self.conv2 = SAGEConv(h_feats, h_feats, 'mean')\n",
178 |     "    \n",
179 |     "    def call(self, g, in_feat):\n",
180 |     "        h = self.conv1(g, in_feat)\n",
181 |     "        h = tf.nn.relu(h)\n",
182 |     "        h = self.conv2(g, h)\n",
183 |     "        return h\n",
184 |     "    \n",
185 |     "# Create the model with given dimensions \n",
186 |     "# input layer dimension: 5, node embeddings\n",
187 |     "# hidden layer dimension: 16\n",
188 |     "net = GraphSAGE(5, 16)"
189 |    ]
190 |   },
191 |   {
192 |    "cell_type": "markdown",
193 |    "metadata": {},
194 |    "source": [
195 |     "We then optimize the model using the following loss function.\n",
196 |     "\n",
197 |     "$$\n",
198 |     "\\hat{y}_{u\\sim v} = \\sigma(h_u^T h_v)\n",
199 |     "$$\n",
200 |     "\n",
201 |     "$$\n",
202 |     "\\mathcal{L} = -\\sum_{u\\sim v\\in \\mathcal{D}}\\left( y_{u\\sim v}\\log(\\hat{y}_{u\\sim v}) + (1-y_{u\\sim v})\\log(1-\\hat{y}_{u\\sim v}) \\right)\n",
203 |     "$$\n",
204 |     "\n",
205 |     "Essentially, the model scores each edge by taking the dot product of the representations of its two endpoints. It then computes a binary cross-entropy loss against the target $y \\in \\{0, 1\\}$, which indicates whether the edge is positive or negative."
206 |    ]
207 |   },
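    |   {
    |    "cell_type": "markdown",
    |    "metadata": {},
    |    "source": [
    |     "As a quick hand-worked check of these formulas, consider a single edge whose endpoint representations happen to have dot product $h_u^T h_v = 2$ (an arbitrary value chosen for illustration, not taken from the model):\n",
    |     "\n",
    |     "$$\n",
    |     "\\hat{y}_{u\\sim v} = \\sigma(2) = \\frac{1}{1+e^{-2}} \\approx 0.881\n",
    |     "$$\n",
    |     "\n",
    |     "If the target is $y_{u\\sim v}=1$, this edge contributes $-\\log(0.881)\\approx 0.127$ to the loss; if the target is $y_{u\\sim v}=0$, it contributes $-\\log(1-0.881)\\approx 2.127$. A confident prediction that agrees with the target is therefore cheap, while the same confidence on the wrong side is penalized heavily."
    |    ]
    |   },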
208 |   {
209 |    "cell_type": "code",
210 |    "execution_count": null,
211 |    "metadata": {},
212 |    "outputs": [],
213 |    "source": [
214 |     "# ----------- 3. set up loss and optimizer -------------- #\n",
215 |     "optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n",
216 |     "loss_fcn = tf.keras.losses.BinaryCrossentropy(\n",
217 |     "    from_logits=False)\n",
218 |     "\n",
219 |     "# ----------- 4. training -------------------------------- #\n",
220 |     "all_logits = []\n",
221 |     "for e in range(100):\n",
222 |     "    \n",
223 |     "    with tf.GradientTape() as tape:\n",
224 |     "        tape.watch(inputs) # optimize embedding layer also\n",
225 |     "        # forward\n",
226 |     "        logits = net(g, inputs)\n",
227 |     "        pred = tf.sigmoid(tf.reduce_sum(tf.gather(logits, train_u) *\n",
228 |     "                                        tf.gather(logits, train_v), axis=1))\n",
229 |     "\n",
230 |     "        # compute loss\n",
231 |     "        loss = loss_fcn(train_label, pred)\n",
232 |     "\n",
233 |     "        # backward\n",
234 |     "        grads = tape.gradient(loss, net.trainable_weights + node_embed.trainable_weights)        \n",
235 |     "        optimizer.apply_gradients(zip(grads, net.trainable_weights + node_embed.trainable_weights))\n",
236 |     "        all_logits.append(logits.numpy())\n",
237 |     "\n",
238 |     "    if e % 5 == 0:\n",
239 |     "        print('In epoch {}, loss: {}'.format(e, loss))"
240 |    ]
241 |   },
242 |   {
243 |    "cell_type": "code",
244 |    "execution_count": null,
245 |    "metadata": {},
246 |    "outputs": [],
247 |    "source": [
248 |     "# ----------- 5. check results ------------------------ #\n",
249 |     "pred = tf.sigmoid(tf.reduce_sum(tf.gather(logits, test_u) *\n",
250 |     "                                tf.gather(logits, test_v), axis=1)).numpy()\n",
251 |     "print('Accuracy', ((pred >= 0.5) == test_label.numpy()).sum().item() / len(pred))"
252 |    ]
253 |   },
254 |   {
255 |    "cell_type": "code",
256 |    "execution_count": null,
257 |    "metadata": {},
258 |    "outputs": [],
259 |    "source": []
260 |   }
261 |  ],
262 |  "metadata": {
263 |   "kernelspec": {
264 |    "display_name": "3.6.8",
265 |    "language": "python",
266 |    "name": "3.6.8"
267 |   },
268 |   "language_info": {
269 |    "codemirror_mode": {
270 |     "name": "ipython",
271 |     "version": 3
272 |    },
273 |    "file_extension": ".py",
274 |    "mimetype": "text/x-python",
275 |    "name": "python",
276 |    "nbconvert_exporter": "python",
277 |    "pygments_lexer": "ipython3",
278 |    "version": "3.6.8"
279 |   }
280 |  },
281 |  "nbformat": 4,
282 |  "nbformat_minor": 4
283 | }
284 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/4_message_passing-CN.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Customize Graph Convolution using Message Passing APIs\n",
  8 |     "\n",
  9 |     "In previous sessions, we learned to use the built-in [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv) to build a multi-layer graph neural network. Sometimes, however, you may want to invent a new way of aggregating neighbor information. DGL's message passing APIs are designed for this scenario.\n",
 10 |     "\n",
 11 |     "In this tutorial, you will learn:\n",
 12 |     "* What is under the hood of the `nn.SAGEConv` module in DGL.\n",
 13 |     "* How to use DGL's message passing APIs.\n",
 14 |     "* How to design a new graph convolution module."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": 1,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "from tutorial_utils import setup_tf\n",
 24 |     "setup_tf()"
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "code",
 29 |    "execution_count": 2,
 30 |    "metadata": {},
 31 |    "outputs": [],
 32 |    "source": [
 33 |     "import dgl\n",
 34 |     "import tensorflow as tf"
 35 |    ]
 36 |   },
 37 |   {
 38 |    "cell_type": "markdown",
 39 |    "metadata": {},
 40 |    "source": [
 41 |     "## A gentle explanation of the `SAGEConv` module\n",
 42 |     "\n",
 43 |     "Recall that a `SAGEConv` module aggregates neighbor information and generates new node representations as follows:\n",
 44 |     "\n",
 45 |     "\n",
 46 |     "$$\n",
 47 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
 48 |     "$$\n",
 49 |     "\n",
 50 |     "$$\n",
 51 |     "h_v^k\\leftarrow \\text{ReLU}\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
 52 |     "$$\n",
 53 |     "\n",
 54 |     "Here is its implementation in DGL."
 55 |    ]
 56 |   },
 57 |   {
 58 |    "cell_type": "code",
 59 |    "execution_count": 3,
 60 |    "metadata": {},
 61 |    "outputs": [],
 62 |    "source": [
 63 |     "import dgl.function as fn\n",
 64 |     "from tensorflow.keras import layers\n",
 65 |     "\n",
 66 |     "class SAGEConv(layers.Layer):\n",
 67 |     "    \"\"\"Graph convolution module used by the GraphSAGE model.\n",
 68 |     "    \n",
 69 |     "    Parameters\n",
 70 |     "    ----------\n",
 71 |     "    in_feat : int\n",
 72 |     "        Input feature size.\n",
 73 |     "    out_feat : int\n",
 74 |     "        Output feature size.\n",
 75 |     "    \"\"\"\n",
 76 |     "    def __init__(self, in_feat, out_feat):\n",
 77 |     "        super(SAGEConv, self).__init__()\n",
 78 |     "        # A linear submodule for projecting the input and neighbor feature to the output.\n",
 79 |     "        self.linear = layers.Dense(out_feat)\n",
 80 |     "    \n",
 81 |     "    def call(self, g, h):\n",
 82 |     "        \"\"\"Forward computation\n",
 83 |     "        \n",
 84 |     "        Parameters\n",
 85 |     "        ----------\n",
 86 |     "        g : Graph\n",
 87 |     "            The input graph.\n",
 88 |     "        h : Tensor\n",
 89 |     "            The input node feature.\n",
 90 |     "        \"\"\"\n",
 91 |     "        # All the `ndata` set within a local scope will be automatically popped out.\n",
 92 |     "        with g.local_scope():\n",
 93 |     "            g.ndata['h'] = h\n",
 94 |     "            # update_all is a message passing API.\n",
 95 |     "            g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))\n",
 96 |     "            h_neigh = g.ndata['h_neigh']\n",
 97 |     "            h_total = tf.concat([h, h_neigh], axis=1)  # tf.concat takes `axis`, not `dim`\n",
 98 |     "            return tf.nn.relu(self.linear(h_total))"
 99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "markdown",
103 |    "metadata": {},
104 |    "source": [
105 |     "The central piece in this code is the `g.update_all` function, which gathers and averages the neighbor features. There are three concepts here:\n",
106 |     "* Message function `fn.copy_u('h', 'm')` that copies the node feature under name `'h'` as *messages* sent to neighbors.\n",
107 |     "* Reduce function `fn.mean('m', 'h_neigh')` that averages all the received messages under name `'m'` and saves the result as a new node feature `'h_neigh'`.\n",
108 |     "* `update_all` tells DGL to trigger the message and reduce functions for all the nodes and edges."
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "markdown",
113 |    "metadata": {},
114 |    "source": [
115 |     "## Message passing and GNNs\n",
116 |     "\n",
117 |     "`update_all` is one of the **message passing APIs** in DGL, inspired by the Message Passing Neural Network framework proposed by [Gilmer et al.](https://arxiv.org/abs/1704.01212). Essentially, they found that many GNN models fit into the following framework:\n",
118 |     "\n",
119 |     "$$\n",
120 |     "m_{u\\sim v}^{(l)} = M^{(l)}\\left(h_v^{(l-1)}, h_u^{(l-1)}, e_{u\\sim v}^{(l-1)}\\right)\n",
121 |     "$$\n",
122 |     "\n",
123 |     "$$\n",
124 |     "m_{v}^{(l)} = \\sum_{u\\in\\mathcal{N}(v)}m_{u\\sim v}^{(l)}\n",
125 |     "$$\n",
126 |     "\n",
127 |     "$$\n",
128 |     "h_v^{(l)} = U^{(l)}\\left(h_v^{(l-1)}, m_v^{(l)}\\right)\n",
129 |     "$$\n",
130 |     "\n",
131 |     "Here, $M^{(l)}$ is the message function, $\\sum$ is the reduce function, and $U^{(l)}$ is the update function. DGL provides many built-in message and reduce functions in the `dgl.function` package.\n",
132 |     "\n",
133 |     "![api](../asset/dgl-mp.png)\n",
134 |     "\n",
135 |     "You can find more details in [the API doc](https://docs.dgl.ai/api/python/function.html)."
136 |    ]
137 |   },
138 |   {
139 |    "cell_type": "markdown",
140 |    "metadata": {},
141 |    "source": [
142 |     "DGL's message passing APIs allow one to quickly implement new graph convolution modules. For example, the following implements a new `SAGEConv` that aggregates neighbor representations using a weighted average."
143 |    ]
144 |   },
145 |   {
146 |    "cell_type": "code",
147 |    "execution_count": 4,
148 |    "metadata": {},
149 |    "outputs": [],
150 |    "source": [
151 |     "class SAGEConv(layers.Layer):\n",
152 |     "    \"\"\"Graph convolution module used by the GraphSAGE model.\n",
153 |     "    \n",
154 |     "    Parameters\n",
155 |     "    ----------\n",
156 |     "    in_feat : int\n",
157 |     "        Input feature size.\n",
158 |     "    out_feat : int\n",
159 |     "        Output feature size.\n",
160 |     "    \"\"\"\n",
161 |     "    def __init__(self, in_feat, out_feat):\n",
162 |     "        super(SAGEConv, self).__init__()\n",
163 |     "        # A linear submodule for projecting the input and neighbor feature to the output.\n",
164 |     "        self.linear = layers.Dense(out_feat)\n",
165 |     "    \n",
166 |     "    def call(self, g, h, w):  # Keras layers run `call` when invoked\n",
167 |     "        \"\"\"Forward computation\n",
168 |     "        \n",
169 |     "        Parameters\n",
170 |     "        ----------\n",
171 |     "        g : Graph\n",
172 |     "            The input graph.\n",
173 |     "        h : Tensor\n",
174 |     "            The input node feature.\n",
175 |     "        w : Tensor\n",
176 |     "            The edge weight.\n",
177 |     "        \"\"\"\n",
178 |     "        # All the `ndata` set within a local scope will be automatically popped out.\n",
179 |     "        with g.local_scope():\n",
180 |     "            g.ndata['h'] = h\n",
181 |     "            g.edata['w'] = w\n",
182 |     "            # update_all is a message passing API.\n",
183 |     "            g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.mean('m', 'h_neigh'))\n",
184 |     "            h_neigh = g.ndata['h_neigh']\n",
185 |     "            h_total = tf.concat([h, h_neigh], axis=1)\n",
186 |     "            return tf.nn.relu(self.linear(h_total))"
187 |    ]
188 |   },
189 |   {
190 |    "cell_type": "markdown",
191 |    "metadata": {},
192 |    "source": [
193 |     "## Even more customization by user-defined function\n",
194 |     "\n",
195 |     "DGL also accepts user-defined message and reduce functions for maximal expressiveness. Here is a user-defined message function that is equivalent to `fn.u_mul_e('h', 'w', 'm')`."
196 |    ]
197 |   },
198 |   {
199 |    "cell_type": "code",
200 |    "execution_count": 1,
201 |    "metadata": {},
202 |    "outputs": [],
203 |    "source": [
204 |     "def u_mul_e_udf(edges):\n",
205 |     "    return {'m' : edges.src['h'] * edges.data['w']}"
206 |    ]
207 |   },
208 |   {
209 |    "cell_type": "markdown",
210 |    "metadata": {},
211 |    "source": [
212 |     "## Recap\n",
213 |     "\n",
214 |     "* `dgl.nn` provides many popular modules to get started quickly.\n",
215 |     "* The built-in message and reduce functions in `dgl.function` let you build a custom NN module.\n",
216 |     "* User-defined functions provide even more flexibility."
217 |    ]
218 |   },
219 |   {
220 |    "cell_type": "code",
221 |    "execution_count": null,
222 |    "metadata": {},
223 |    "outputs": [],
224 |    "source": []
225 |   }
226 |  ],
227 |  "metadata": {
228 |   "kernelspec": {
229 |    "display_name": "Python [conda env:dgl]",
230 |    "language": "python",
231 |    "name": "conda-env-dgl-py"
232 |   },
233 |   "language_info": {
234 |    "codemirror_mode": {
235 |     "name": "ipython",
236 |     "version": 3
237 |    },
238 |    "file_extension": ".py",
239 |    "mimetype": "text/x-python",
240 |    "name": "python",
241 |    "nbconvert_exporter": "python",
242 |    "pygments_lexer": "ipython3",
243 |    "version": "3.6.10"
244 |   }
245 |  },
246 |  "nbformat": 4,
247 |  "nbformat_minor": 2
248 | }
249 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/4_message_passing.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Customize Graph Convolution using Message Passing APIs\n",
  8 |     "\n",
  9 |     "In previous sessions, we learned to use the built-in [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv) to build a multi-layer graph neural network. Sometimes, however, you may want to invent a new way of aggregating neighbor information. DGL's message passing APIs are designed for this scenario.\n",
 10 |     "\n",
 11 |     "In this tutorial, you will learn:\n",
 12 |     "* What is under the hood of the `nn.SAGEConv` module in DGL.\n",
 13 |     "* How to use DGL's message passing APIs.\n",
 14 |     "* How to design a new graph convolution module."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": null,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "from tutorial_utils import setup_tf\n",
 24 |     "setup_tf()"
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "code",
 29 |    "execution_count": null,
 30 |    "metadata": {},
 31 |    "outputs": [],
 32 |    "source": [
 33 |     "import dgl\n",
 34 |     "import tensorflow as tf"
 35 |    ]
 36 |   },
 37 |   {
 38 |    "cell_type": "markdown",
 39 |    "metadata": {},
 40 |    "source": [
 41 |     "## A gentle explanation of the `SAGEConv` module\n",
 42 |     "\n",
 43 |     "Recall that a `SAGEConv` module aggregates neighbor information and generates new node representations as follows:\n",
 44 |     "\n",
 45 |     "\n",
 46 |     "$$\n",
 47 |     "h_{\\mathcal{N}(v)}^k\\leftarrow \\text{AGGREGATE}_k\\{h_u^{k-1},\\forall u\\in\\mathcal{N}(v)\\}\n",
 48 |     "$$\n",
 49 |     "\n",
 50 |     "$$\n",
 51 |     "h_v^k\\leftarrow \\text{ReLU}\\left(W^k\\cdot \\text{CONCAT}(h_v^{k-1}, h_{\\mathcal{N}(v)}^k) \\right)\n",
 52 |     "$$\n",
 53 |     "\n",
 54 |     "Here is its implementation in DGL."
 55 |    ]
 56 |   },
 57 |   {
 58 |    "cell_type": "code",
 59 |    "execution_count": null,
 60 |    "metadata": {},
 61 |    "outputs": [],
 62 |    "source": [
 63 |     "import dgl.function as fn\n",
 64 |     "from tensorflow.keras import layers\n",
 65 |     "\n",
 66 |     "class SAGEConv(layers.Layer):\n",
 67 |     "    \"\"\"Graph convolution module used by the GraphSAGE model.\n",
 68 |     "    \n",
 69 |     "    Parameters\n",
 70 |     "    ----------\n",
 71 |     "    in_feat : int\n",
 72 |     "        Input feature size.\n",
 73 |     "    out_feat : int\n",
 74 |     "        Output feature size.\n",
 75 |     "    \"\"\"\n",
 76 |     "    def __init__(self, in_feat, out_feat):\n",
 77 |     "        super(SAGEConv, self).__init__()\n",
 78 |     "        # A linear submodule for projecting the input and neighbor feature to the output.\n",
 79 |     "        self.linear = layers.Dense(out_feat)\n",
 80 |     "    \n",
 81 |     "    def call(self, g, h):\n",
 82 |     "        \"\"\"Forward computation\n",
 83 |     "        \n",
 84 |     "        Parameters\n",
 85 |     "        ----------\n",
 86 |     "        g : Graph\n",
 87 |     "            The input graph.\n",
 88 |     "        h : Tensor\n",
 89 |     "            The input node feature.\n",
 90 |     "        \"\"\"\n",
 91 |     "        # All the `ndata` set within a local scope will be automatically popped out.\n",
 92 |     "        with g.local_scope():\n",
 93 |     "            g.ndata['h'] = h\n",
 94 |     "            # update_all is a message passing API.\n",
 95 |     "            g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))\n",
 96 |     "            h_neigh = g.ndata['h_neigh']\n",
 97 |     "            h_total = tf.concat([h, h_neigh], axis=1)  # tf.concat takes `axis`, not `dim`\n",
 98 |     "            return tf.nn.relu(self.linear(h_total))"
 99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "markdown",
103 |    "metadata": {},
104 |    "source": [
105 |     "The central piece in this code is the `g.update_all` function, which gathers and averages the neighbor features. There are three concepts here:\n",
106 |     "* Message function `fn.copy_u('h', 'm')` that copies the node feature under name `'h'` as *messages* sent to neighbors.\n",
107 |     "* Reduce function `fn.mean('m', 'h_neigh')` that averages all the received messages under name `'m'` and saves the result as a new node feature `'h_neigh'`.\n",
108 |     "* `update_all` tells DGL to trigger the message and reduce functions for all the nodes and edges."
109 |    ]
110 |   },
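    |   {
    |    "cell_type": "markdown",
    |    "metadata": {},
    |    "source": [
    |     "To make the message and reduce steps concrete, here is a small NumPy sketch of what `update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))` computes. It is an illustration only, not how DGL implements message passing; the toy graph and the names `src`, `dst`, `h_sum` and `in_deg` are made up for this example."
    |    ]
    |   },
    |   {
    |    "cell_type": "code",
    |    "execution_count": null,
    |    "metadata": {},
    |    "outputs": [],
    |    "source": [
    |     "import numpy as np\n",
    |     "\n",
    |     "# Hypothetical toy graph (not the Karate Club graph): 3 nodes, directed edges src -> dst.\n",
    |     "src = np.array([0, 1, 2, 2])\n",
    |     "dst = np.array([1, 2, 0, 1])\n",
    |     "h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # one 2-d feature per node\n",
    |     "\n",
    |     "# Message step, analogous to fn.copy_u('h', 'm'): each edge carries its source node's feature.\n",
    |     "m = h[src]\n",
    |     "\n",
    |     "# Reduce step, analogous to fn.mean('m', 'h_neigh'): average the messages arriving at each node.\n",
    |     "h_sum = np.zeros_like(h)\n",
    |     "np.add.at(h_sum, dst, m)                     # sum incoming messages per destination node\n",
    |     "in_deg = np.bincount(dst, minlength=len(h))  # number of incoming edges per node\n",
    |     "h_neigh = h_sum / np.maximum(in_deg, 1)[:, None]\n",
    |     "\n",
    |     "# Node 1 receives messages from nodes 0 and 2, so its row is the average of h[0] and h[2].\n",
    |     "print(h_neigh)"
    |    ]
    |   },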
111 |   {
112 |    "cell_type": "markdown",
113 |    "metadata": {},
114 |    "source": [
115 |     "## Message passing and GNNs\n",
116 |     "\n",
117 |     "`update_all` is one of the **message passing APIs** in DGL, inspired by the Message Passing Neural Network framework proposed by [Gilmer et al.](https://arxiv.org/abs/1704.01212). Essentially, they found that many GNN models fit into the following framework:\n",
118 |     "\n",
119 |     "$$\n",
120 |     "m_{u\\sim v}^{(l)} = M^{(l)}\\left(h_v^{(l-1)}, h_u^{(l-1)}, e_{u\\sim v}^{(l-1)}\\right)\n",
121 |     "$$\n",
122 |     "\n",
123 |     "$$\n",
124 |     "m_{v}^{(l)} = \\sum_{u\\in\\mathcal{N}(v)}m_{u\\sim v}^{(l)}\n",
125 |     "$$\n",
126 |     "\n",
127 |     "$$\n",
128 |     "h_v^{(l)} = U^{(l)}\\left(h_v^{(l-1)}, m_v^{(l)}\\right)\n",
129 |     "$$\n",
130 |     "\n",
131 |     "Here, $M^{(l)}$ is the message function, $\\sum$ is the reduce function, and $U^{(l)}$ is the update function. DGL provides many built-in message and reduce functions in the `dgl.function` package.\n",
132 |     "\n",
133 |     "![api](../asset/dgl-mp.png)\n",
134 |     "\n",
135 |     "You can find more details in [the API doc](https://docs.dgl.ai/api/python/function.html)."
136 |    ]
137 |   },
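    |   {
    |    "cell_type": "markdown",
    |    "metadata": {},
    |    "source": [
    |     "Reading the `SAGEConv` code from the previous section against this framework gives one possible instantiation. This is a restatement of that code rather than an additional DGL feature; the sum in the reduce step is replaced by a mean, and the bias term of the dense layer is ignored:\n",
    |     "\n",
    |     "$$\n",
    |     "M^{(l)}\\left(h_v^{(l-1)}, h_u^{(l-1)}, e_{u\\sim v}^{(l-1)}\\right) = h_u^{(l-1)}\n",
    |     "$$\n",
    |     "\n",
    |     "$$\n",
    |     "m_{v}^{(l)} = \\frac{1}{|\\mathcal{N}(v)|}\\sum_{u\\in\\mathcal{N}(v)}m_{u\\sim v}^{(l)}\n",
    |     "$$\n",
    |     "\n",
    |     "$$\n",
    |     "U^{(l)}\\left(h_v^{(l-1)}, m_v^{(l)}\\right) = \\text{ReLU}\\left(W^{(l)}\\cdot \\text{CONCAT}(h_v^{(l-1)}, m_v^{(l)}) \\right)\n",
    |     "$$\n",
    |     "\n",
    |     "The first line corresponds to `fn.copy_u('h', 'm')`, the second to `fn.mean('m', 'h_neigh')`, and the third to the dense layer and ReLU applied after `update_all`."
    |    ]
    |   },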
138 |   {
139 |    "cell_type": "markdown",
140 |    "metadata": {},
141 |    "source": [
142 |     "DGL's message passing APIs allow one to quickly implement new graph convolution modules. For example, the following implements a new `SAGEConv` that aggregates neighbor representations using a weighted average."
143 |    ]
144 |   },
145 |   {
146 |    "cell_type": "code",
147 |    "execution_count": null,
148 |    "metadata": {},
149 |    "outputs": [],
150 |    "source": [
151 |     "class SAGEConv(layers.Layer):\n",
152 |     "    \"\"\"Graph convolution module used by the GraphSAGE model.\n",
153 |     "    \n",
154 |     "    Parameters\n",
155 |     "    ----------\n",
156 |     "    in_feat : int\n",
157 |     "        Input feature size.\n",
158 |     "    out_feat : int\n",
159 |     "        Output feature size.\n",
160 |     "    \"\"\"\n",
161 |     "    def __init__(self, in_feat, out_feat):\n",
162 |     "        super(SAGEConv, self).__init__()\n",
163 |     "        # A linear submodule for projecting the input and neighbor feature to the output.\n",
164 |     "        self.linear = layers.Dense(out_feat)\n",
165 |     "    \n",
166 |     "    def call(self, g, h, w):  # Keras layers run `call` when invoked\n",
167 |     "        \"\"\"Forward computation\n",
168 |     "        \n",
169 |     "        Parameters\n",
170 |     "        ----------\n",
171 |     "        g : Graph\n",
172 |     "            The input graph.\n",
173 |     "        h : Tensor\n",
174 |     "            The input node feature.\n",
175 |     "        w : Tensor\n",
176 |     "            The edge weight.\n",
177 |     "        \"\"\"\n",
178 |     "        # All the `ndata` set within a local scope will be automatically popped out.\n",
179 |     "        with g.local_scope():\n",
180 |     "            g.ndata['h'] = h\n",
181 |     "            g.edata['w'] = w\n",
182 |     "            # update_all is a message passing API.\n",
183 |     "            g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.mean('m', 'h_neigh'))\n",
184 |     "            h_neigh = g.ndata['h_neigh']\n",
185 |     "            h_total = tf.concat([h, h_neigh], axis=1)\n",
186 |     "            return tf.nn.relu(self.linear(h_total))"
187 |    ]
188 |   },
189 |   {
190 |    "cell_type": "markdown",
191 |    "metadata": {},
192 |    "source": [
193 |     "## Even more customization with user-defined functions\n",
194 |     "\n",
195 |     "DGL allows user-defined message and reduce functions for maximal expressiveness. Here is a user-defined message function that is equivalent to `fn.u_mul_e('h', 'w', 'm')`."
196 |    ]
197 |   },
198 |   {
199 |    "cell_type": "code",
200 |    "execution_count": null,
201 |    "metadata": {},
202 |    "outputs": [],
203 |    "source": [
204 |     "def u_mul_e_udf(edges):\n",
205 |     "    return {'m' : edges.src['h'] * edges.data['w']}"
206 |    ]
207 |   },
208 |   {
209 |    "cell_type": "markdown",
210 |    "metadata": {},
211 |    "source": [
212 |     "## Recap\n",
213 |     "\n",
214 |     "* `dgl.nn` provides many popular modules for quick bootstrapping.\n",
215 |     "* Use the built-in message and reduce functions in `dgl.function` to build a custom NN module.\n",
216 |     "* User-defined functions provide even more flexibility."
217 |    ]
218 |   },
219 |   {
220 |    "cell_type": "code",
221 |    "execution_count": null,
222 |    "metadata": {},
223 |    "outputs": [],
224 |    "source": []
225 |   }
226 |  ],
227 |  "metadata": {
228 |   "kernelspec": {
229 |    "display_name": "3.6.8",
230 |    "language": "python",
231 |    "name": "3.6.8"
232 |   },
233 |   "language_info": {
234 |    "codemirror_mode": {
235 |     "name": "ipython",
236 |     "version": 3
237 |    },
238 |    "file_extension": ".py",
239 |    "mimetype": "text/x-python",
240 |    "name": "python",
241 |    "nbconvert_exporter": "python",
242 |    "pygments_lexer": "ipython3",
243 |    "version": "3.6.8"
244 |   }
245 |  },
246 |  "nbformat": 4,
247 |  "nbformat_minor": 2
248 | }
249 | 
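
To make the semantics of `update_all(fn.u_mul_e('h', 'w', 'm'), fn.mean('m', 'h_neigh'))` concrete, here is a minimal framework-free sketch in plain Python. The function name `weighted_mean_aggregate` and the list-based inputs are illustrative assumptions, not part of DGL. The sketch also assumes scalar edge weights and leaves nodes without incoming edges at zero.

```python
def weighted_mean_aggregate(src, dst, weight, h):
    """Plain-Python analogue of a u_mul_e message followed by a mean reduce.

    src, dst : lists of edge endpoints (edge i goes src[i] -> dst[i])
    weight   : list of scalar edge weights
    h        : list of node feature vectors (lists of floats)
    """
    n_nodes = len(h)
    dim = len(h[0])
    sums = [[0.0] * dim for _ in range(n_nodes)]
    counts = [0] * n_nodes
    for u, v, w in zip(src, dst, weight):
        # Message function: source feature times edge weight.
        for k in range(dim):
            sums[v][k] += h[u][k] * w
        counts[v] += 1
    # Reduce function: mean over the messages each node received.
    # Nodes with no incoming edge keep an all-zero vector (an assumption here).
    return [
        [s / counts[v] for s in sums[v]] if counts[v] else sums[v]
        for v in range(n_nodes)
    ]
```

For example, with edges `0->2`, `1->2`, `2->0`, node 2 averages two weighted messages while node 1 receives none.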


--------------------------------------------------------------------------------
/basic_tasks_tf/data/edges.csv:
--------------------------------------------------------------------------------
  1 | Src,Dst,Weight
  2 | 0,1,0.31845103596456925
  3 | 0,2,0.5512145529252186
  4 | 0,3,0.22741585224191552
  5 | 0,4,0.2669188689251851
  6 | 0,5,0.47544947326394815
  7 | 0,6,0.8862627361494558
  8 | 0,7,0.16042605375040297
  9 | 0,8,0.7459807864037868
 10 | 0,10,0.5892903561029029
 11 | 0,11,0.47815888753487035
 12 | 0,12,0.782470682321906
 13 | 0,13,0.4179567052866201
 14 | 0,17,0.3685358669980614
 15 | 0,19,0.9551929987199811
 16 | 0,21,0.9613567741407174
 17 | 0,31,0.726227171317607
 18 | 1,0,0.6481096579934986
 19 | 1,2,0.37891536859801955
 20 | 1,3,0.24363852899322458
 21 | 1,7,0.41483571945746534
 22 | 1,13,0.6155209924586298
 23 | 1,17,0.2680276146366586
 24 | 1,19,0.11295441102477333
 25 | 1,21,0.08552253827088485
 26 | 1,30,0.5935657977437264
 27 | 2,0,0.5362144558950368
 28 | 2,1,0.5768112213837137
 29 | 2,3,0.43712621359822357
 30 | 2,7,0.13078796894587796
 31 | 2,8,0.08022424811393303
 32 | 2,9,0.14176452681346274
 33 | 2,13,0.8180530705514824
 34 | 2,27,0.8320016262502158
 35 | 2,28,0.409540145234772
 36 | 2,32,0.2642001276499689
 37 | 3,0,0.04615295242576234
 38 | 3,1,0.7049149713067023
 39 | 3,2,0.8540307600498042
 40 | 3,7,0.46208027528965967
 41 | 3,12,0.6784210067516615
 42 | 3,13,0.3301297641821943
 43 | 4,0,0.8832405252248015
 44 | 4,6,0.6669881098899139
 45 | 4,10,0.9640899941433938
 46 | 5,0,0.4268133485450748
 47 | 5,6,0.8149795460997955
 48 | 5,10,0.9267811195268474
 49 | 5,16,0.565166945765339
 50 | 6,0,0.026799684543950875
 51 | 6,4,0.9402764505035385
 52 | 6,5,0.6631523621900474
 53 | 6,16,0.28025742709759016
 54 | 7,0,0.9254762282755662
 55 | 7,1,0.796690523489831
 56 | 7,2,0.0979476899619427
 57 | 7,3,0.7074291143964807
 58 | 8,0,0.7761087565695074
 59 | 8,2,0.3073975630293794
 60 | 8,30,0.7605817165692309
 61 | 8,32,0.04947830225770011
 62 | 8,33,0.6309335405401543
 63 | 9,2,0.17380258005907812
 64 | 9,33,0.9785414932201859
 65 | 10,0,0.7191944186343044
 66 | 10,4,0.6595607436981878
 67 | 10,5,0.5389153328170596
 68 | 11,0,0.8284252928975201
 69 | 12,0,0.08159504164928544
 70 | 12,3,0.026621551124401566
 71 | 13,0,0.37654143271899876
 72 | 13,1,0.698648970574733
 73 | 13,2,0.1974780964394499
 74 | 13,3,0.45021972444903435
 75 | 13,33,0.9722036402765037
 76 | 14,32,0.009557409279240536
 77 | 14,33,0.589593597849638
 78 | 15,32,0.9878091453185213
 79 | 15,33,0.056667558149276376
 80 | 16,5,0.9045359763970754
 81 | 16,6,0.16879835545426658
 82 | 17,0,0.7730618928346192
 83 | 17,1,0.612815141007156
 84 | 18,32,0.4986886436717537
 85 | 18,33,0.39903509029811446
 86 | 19,0,0.08989463618875349
 87 | 19,1,0.7528198786718245
 88 | 19,33,0.8144615833348146
 89 | 20,32,0.6850036047325682
 90 | 20,33,0.10859338317785638
 91 | 21,0,0.20571853793912853
 92 | 21,1,0.8687748452451053
 93 | 22,32,0.008113164327674838
 94 | 22,33,0.36145242064640726
 95 | 23,25,0.19801959093221744
 96 | 23,27,0.7132375875281998
 97 | 23,29,0.8363094707133548
 98 | 23,32,0.28537615612136547
 99 | 23,33,0.0772935150077827
100 | 24,25,0.26813609940254624
101 | 24,27,0.22638821516538454
102 | 24,31,0.642997701810025
103 | 25,23,0.14459300691102495
104 | 25,24,0.7946476714989169
105 | 25,31,0.7388561092944019
106 | 26,29,0.445934837683155
107 | 26,33,0.4916511327260056
108 | 27,2,0.9527176446503433
109 | 27,23,0.7422628198042871
110 | 27,24,0.23101654883380685
111 | 27,33,0.9550339587184191
112 | 28,2,0.3339314258188022
113 | 28,31,0.34149893586394486
114 | 28,33,0.8180157469491468
115 | 29,23,0.4771935478203284
116 | 29,26,0.12938838154434495
117 | 29,32,0.1215136458344257
118 | 29,33,0.019569167078249627
119 | 30,1,0.6393264342401126
120 | 30,8,0.6646644798466513
121 | 30,32,0.1479691524369151
122 | 30,33,0.6403112524880046
123 | 31,0,0.1394065169246964
124 | 31,24,0.134245083586921
125 | 31,25,0.6243484303605552
126 | 31,28,0.3482911865356624
127 | 31,32,0.23331307519961453
128 | 31,33,0.4593599814263031
129 | 32,2,0.8839177811841961
130 | 32,8,0.5539934876068489
131 | 32,14,0.41970743621036855
132 | 32,15,0.8168582822265494
133 | 32,18,0.30481639228312607
134 | 32,20,0.07279882286966943
135 | 32,22,0.3978619031804904
136 | 32,23,0.20690915689699585
137 | 32,29,0.2338575632865122
138 | 32,30,0.8955881108049515
139 | 32,31,0.7316583958942351
140 | 32,33,0.0033742797158278215
141 | 33,8,0.3631579670659171
142 | 33,9,0.5292689687034333
143 | 33,13,0.7535800037841369
144 | 33,14,0.6738394218089896
145 | 33,15,0.8140789052125266
146 | 33,18,0.8968652515555932
147 | 33,19,0.45952115221569223
148 | 33,20,0.6897437506770194
149 | 33,22,0.5883557598002018
150 | 33,23,0.004996264899124525
151 | 33,26,0.04515847947583995
152 | 33,27,0.8556199432433349
153 | 33,28,0.2664787836457956
154 | 33,29,0.2799011634702968
155 | 33,30,0.6521544693031561
156 | 33,31,0.8285364872414698
157 | 33,32,0.8426561777783549
158 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/data/gen_data.py:
--------------------------------------------------------------------------------
 1 | import networkx as nx
 2 | import torch
 3 | import scipy.sparse as sp
 4 | import pandas as pd
 5 | import numpy as np
 6 | import random
 7 | 
 8 | g = nx.karate_club_graph().to_undirected().to_directed()
 9 | ids = []
10 | clubs = []
11 | ages = []
12 | for nid, attr in g.nodes(data=True):
13 |     ids.append(nid)
14 |     clubs.append(attr['club'])
15 |     ages.append(random.randint(30, 50))
16 | nodes = pd.DataFrame({'Id' : ids, 'Club' : clubs, 'Age' : ages})
17 | print(nodes)
18 | src = []
19 | dst = []
20 | weight = []
21 | for u, v in g.edges():
22 |     src.append(u)
23 |     dst.append(v)
24 |     weight.append(random.random())
25 | edges = pd.DataFrame({'Src' : src, 'Dst' : dst, 'Weight' : weight})
26 | print(edges)
27 | 
28 | nodes.to_csv('nodes.csv', index=False)
29 | edges.to_csv('edges.csv', index=False)
30 | 
31 | #with open('edges.txt', 'w') as f:
32 | #    for u, v in zip(src, dst):
33 | #        f.write('{} {}\n'.format(u, v))
34 | #
35 | #torch.save(torch.tensor(src), 'src.pt')
36 | #torch.save(torch.tensor(dst), 'dst.pt')
37 | #
38 | #spmat = nx.to_scipy_sparse_matrix(g)
39 | #print(spmat)
40 | #sp.save_npz('scipy_adj.npz', spmat)
41 | #
42 | #from networkx.readwrite import json_graph
43 | #import json
44 | #
45 | #with open('adj.json', 'w') as f:
46 | #    json.dump(json_graph.adjacency_data(g), f)
47 | #
48 | #node_feat = torch.randn((34, 5)) / 10.
49 | #edge_feat = torch.ones((156,))
50 | #torch.save(node_feat, 'node_feat.pt')
51 | #torch.save(edge_feat, 'edge_feat.pt')
52 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/data/nodes.csv:
--------------------------------------------------------------------------------
 1 | Id,Club,Age
 2 | 0,Mr. Hi,45
 3 | 1,Mr. Hi,33
 4 | 2,Mr. Hi,36
 5 | 3,Mr. Hi,31
 6 | 4,Mr. Hi,41
 7 | 5,Mr. Hi,42
 8 | 6,Mr. Hi,48
 9 | 7,Mr. Hi,41
10 | 8,Mr. Hi,30
11 | 9,Officer,35
12 | 10,Mr. Hi,38
13 | 11,Mr. Hi,44
14 | 12,Mr. Hi,37
15 | 13,Mr. Hi,39
16 | 14,Officer,36
17 | 15,Officer,38
18 | 16,Mr. Hi,47
19 | 17,Mr. Hi,45
20 | 18,Officer,41
21 | 19,Mr. Hi,31
22 | 20,Officer,31
23 | 21,Mr. Hi,44
24 | 22,Officer,42
25 | 23,Officer,32
26 | 24,Officer,30
27 | 25,Officer,50
28 | 26,Officer,30
29 | 27,Officer,43
30 | 28,Officer,48
31 | 29,Officer,40
32 | 30,Officer,39
33 | 31,Officer,45
34 | 32,Officer,47
35 | 33,Officer,33
36 | 


--------------------------------------------------------------------------------
/basic_tasks_tf/tutorial_utils.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | import os
 3 | 
 4 | def setup_tf():
 5 |     os.environ['USE_OFFICIAL_TFDLPACK']='true'
 6 |     os.environ['DGLBACKEND']='tensorflow'
 7 |     os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
 8 | 
 9 | def load_zachery():
10 |     import dgl
11 |     import numpy as np    
12 |     import tensorflow as tf
13 |     nodes_data = pd.read_csv('data/nodes.csv')
14 |     edges_data = pd.read_csv('data/edges.csv')
15 |     src = edges_data['Src'].to_numpy()
16 |     dst = edges_data['Dst'].to_numpy()
17 |     g = dgl.graph((src, dst))
18 |     club = nodes_data['Club'].to_list()
19 |     # Convert to categorical integer values with 0 for 'Mr. Hi', 1 for 'Officer'.
20 |     club = tf.constant([c == 'Officer' for c in club], dtype=tf.int64)
21 |     # We can also convert it to one-hot encoding.
22 |     club_onehot = tf.one_hot(club, np.max(club) + 1)
23 |     g.ndata.update({'club' : club, 'club_onehot' : club_onehot})
24 |     return g
25 | 
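
The label handling in `load_zachery` (integer labels with 0 for 'Mr. Hi' and 1 for 'Officer', then a one-hot encoding) can be illustrated without TensorFlow. The helper name `encode_clubs` below is hypothetical and only mirrors the logic of the comments above; it is a sketch, not a replacement for the tensor-based code.

```python
def encode_clubs(clubs):
    """Return (integer labels, one-hot rows) for a list of club names.

    Hypothetical helper: 'Mr. Hi' maps to 0 and 'Officer' maps to 1.
    """
    labels = [int(c == 'Officer') for c in clubs]
    # Like np.max(club) + 1 in load_zachery, the number of classes is
    # inferred from the largest label that actually occurs.
    n_classes = max(labels) + 1
    onehot = [[1.0 if k == y else 0.0 for k in range(n_classes)]
              for y in labels]
    return labels, onehot
```

Note that inferring the class count from the data yields a single column if every node belongs to 'Mr. Hi'; passing the class count explicitly would avoid that.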


--------------------------------------------------------------------------------
/dgl_api/dgl-www-zz.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/dgl-www-zz.pptx


--------------------------------------------------------------------------------
/dgl_api/graph-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/graph-1.png


--------------------------------------------------------------------------------
/dgl_api/graph-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/graph-2.png


--------------------------------------------------------------------------------
/dgl_api/graph-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/graph-3.png


--------------------------------------------------------------------------------
/dgl_api/graph-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/graph-4.png


--------------------------------------------------------------------------------
/dgl_api/nodeflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/nodeflow.png


--------------------------------------------------------------------------------
/dgl_api/nodeflow2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/nodeflow2.png


--------------------------------------------------------------------------------
/dgl_api/slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/dgl_api/slides.pdf


--------------------------------------------------------------------------------
/dgl_api/slides.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[10pt,aspectratio=169]{beamer}
  2 | \usepackage[utf8]{inputenc}
  3 | \usepackage[T1]{fontenc}
  4 | \usepackage{lmodern}
  5 | \usepackage{amsmath}
  6 | \usepackage{amsfonts}
  7 | \usepackage{amssymb}
  8 | \usepackage{graphicx}
  9 | \usepackage{hyperref}
 10 | \usepackage{listings}
 11 | \usepackage{xcolor}
 12 | \usepackage{soul}
 13 | \usepackage[11pt]{moresize}
 14 | \usepackage{multirow}
 15 | 
 16 | \definecolor{codegreen}{rgb}{0,0.6,0}
 17 | \definecolor{codegray}{rgb}{0.5,0.5,0.5}
 18 | \definecolor{codepurple}{rgb}{0.58,0,0.82}
 19 | \definecolor{backcolour}{rgb}{0.95,0.95,0.92}
 20 | 
 21 | \lstdefinestyle{standard}{
 22 | 	backgroundcolor=\color{backcolour},   
 23 | 	commentstyle=\color{codegreen},
 24 | 	keywordstyle=\color{magenta},
 25 | 	numberstyle=\color{codegray},
 26 | 	stringstyle=\color{codepurple},
 27 | 	basicstyle=\ttfamily\scriptsize,
 28 | 	breakatwhitespace=false,         
 29 | 	breaklines=true,                 
 30 | 	captionpos=b,                    
 31 | 	keepspaces=true,                 
 32 | 	showspaces=false,                
 33 | 	showstringspaces=false,
 34 | 	showtabs=false,                  
 35 | 	framesep=0mm,
 36 | 	tabsize=4
 37 | }
 38 | \lstset{style=standard}
 39 | \lstset{escapeinside={(*@}{@*)}}
 40 | 
 41 | \AtBeginSection[]{
 42 | 	\begin{frame}
 43 | 		\vfill
 44 | 		\centering
 45 | 		\begin{beamercolorbox}[sep=8pt,center,shadow=true,rounded=true]{title}
 46 | 			\usebeamerfont{title}\insertsectionhead\par%
 47 | 		\end{beamercolorbox}
 48 | 		\vfill
 49 | 	\end{frame}
 50 | }
 51 | 
 52 | \usetheme{Singapore}
 53 | \begin{document}
 54 | 	\author{Quan Gan}
 55 | 	\title{Introduction to DGL}
 56 | 	%\subtitle{}
 57 | 	%\logo{}
 58 | 	\institute{AWS Shanghai AI Lab}
 59 | 	%\date{}
 60 | 	%\subject{}
 61 | 	%\setbeamercovered{transparent}
 62 | 	\setbeamertemplate{navigation symbols}{}
 63 | 	\begin{frame}
 64 | 		\begin{center}
 65 | 			\centering
 66 | 			If you have yet to do this...
 67 | 			
 68 | 			~
 69 | 			
 70 | 			\Large Please send an email to
 71 | 			
 72 | 			\Huge \url{dgl-devday@request-nb.mxnet.io}
 73 | 			
 74 | 			~
 75 | 			
 76 | 			\normalsize Content does not matter
 77 | 		\end{center}
 78 | 	\end{frame}
 79 | 	
 80 | 	\begin{frame}[plain]
 81 | 		\maketitle
 82 | 	\end{frame}
 83 | 
 84 | 	\begin{frame}
 85 | 		\frametitle{What does DGL provide?}
 86 | 		\begin{itemize}
 87 | 			\item From top-level to bottom-level:
 88 | 			\begin{itemize}
 89 | 				\item \emph{Plug-and-play model zoo}, to run an existing model on your data directly
 90 | 				\item \emph{Easy-to-use graph neural network layer modules}, to plug popular GNN layers into your model.
 91 | 				\item \emph{Flexible and efficient message passing APIs}, to design your own message passing (not necessarily full-graph!) from scratch.
 92 | 			\end{itemize}
 93 | 		\end{itemize}
 94 | 	\end{frame}
 95 | 	
 96 | 	\begin{frame}[fragile]
 97 | 		\frametitle{Model Zoo?}
 98 | 		\begin{itemize}
 99 | 			\item Get a (pretrained) model that works on molecules immediately:
100 | \begin{lstlisting}[language=Python]
101 | from dgl.model_zoo.chem import load_pretrained
102 | model = load_pretrained('MPNN_Alchemy')
103 | result = model(molecule, atom_features, bond_features)
104 | \end{lstlisting}
105 | 			\item We just released a subpackage \emph{DGL-KE} for training embeddings on large-scale knowledge graphs such as Freebase.
106 | 			\item We are shooting for a model zoo for recommender systems in 0.5.
107 | 		\end{itemize}
108 | 	\end{frame}
109 | 
110 | 	\begin{frame}[fragile]
111 | 		\frametitle{Graph Neural Network Layer Modules?}
112 | 		\begin{itemize}
113 | 			\item Use DGL GNN Modules to build a bigger network:
114 | \begin{lstlisting}[language=Python]
115 | from torch import nn
116 | from dgl.nn.pytorch import SAGEConv
117 | class NodeClassifier(nn.Module):  # one-layer GraphSAGE
118 |     def __init__(self, in_dim, n_classes):
119 |         super().__init__()
120 |         self.gnn = SAGEConv(in_dim, in_dim, 'mean')
121 |         self.cls = nn.Linear(in_dim, n_classes)
122 |     def forward(self, g, x):
123 |         h = self.gnn(g, x)
124 |         return self.cls(h)
125 | \end{lstlisting}
126 | 			\item We have lots of popular GNN modules implemented.
127 | 			\begin{itemize}
128 | 				\item For example, GCN (for embedding learning), GraphSAGE (for inductive learning), EdgeConv (for point clouds), etc.
129 | 				\item The list is growing!
130 | 			\end{itemize}
131 | 		\end{itemize}
132 | 	\end{frame}
133 | 	
134 | 	\begin{frame}
135 | 		\frametitle{Custom Graph Neural Network Layer?}
136 | 		$$
137 | 		h^{(k)}_v = \phi\left(
138 | 		h^{(k-1)}_v,
139 | 		h^{(k)}_{\mathcal{N}(v)}
140 | 		\right) \qquad h^{(k)}_{\mathcal{N}(v)} = f\left(
141 | 		\left\lbrace
142 | 		h^{(k-1)}_u : u \in \mathcal{N}(v)
143 | 		\right\rbrace
144 | 		\right) \footnote{Xu et al., \emph{How Powerful Are Graph Neural Networks?}, ICLR 2019}
145 | 		$$
146 | 		\begin{center}
147 | 			\centering
148 | 			\only<1>{\includegraphics[width=0.4\textwidth]{graph-1.png}}
149 | 			\only<2>{\includegraphics[width=0.4\textwidth]{graph-2.png}}
150 | 			\only<3>{\includegraphics[width=0.4\textwidth]{graph-3.png}}
151 | 			\only<4>{\includegraphics[width=0.4\textwidth]{graph-4.png}}
152 | 		\end{center}
153 | 	\end{frame}
154 | 
155 | 	\begin{frame}[fragile]
156 | 		\frametitle{Aggregation: Average Pooling\footnote{Hamilton et al., \emph{Inductive Representation Learning on Large Graphs}, NIPS 2017}}
157 | 		\begin{minipage}{0.5\textwidth}
158 | 			Sparse matrix multiplication, very well-known:
159 | \begin{lstlisting}[language=Python]
160 | # code: PyTorch
161 | # src: edge source node IDs (n_edges,)
162 | # dst: edge destination node IDs (n_edges,)
163 | # H: node repr matrix (n_nodes, in_dim)
164 | # W: weights (in_dim * 2, out_dim)
165 | A = torch.sparse_coo_tensor(
166 |     torch.stack([dst, src], 0),
167 |     torch.ones(len(src)),
168 |     (n_nodes, n_nodes))
169 | in_deg = torch.sparse.sum(A, 1).to_dense()
170 | H_N = A @ H / in_deg.unsqueeze(1)
171 | H = torch.relu(torch.cat([H_N, H], 1) @ W)
172 | \end{lstlisting}
173 | 		\end{minipage}%
174 | 		\begin{minipage}{0.5\textwidth}
175 | \begin{lstlisting}[language=Python]
176 | # code: PyTorch + DGL
177 | # G: DGL Graph
178 | # H: node repr matrix (n_nodes, in_dim)
179 | # W: weights (in_dim * 2, out_dim)
180 | import dgl.function as fn
181 | G.ndata['h'] = H
182 | G.update_all(
183 |     fn.copy_u('h', 'm'),
184 |     fn.mean('m', 'h_n'))
185 | H_N = G.ndata['h_n']
186 | H = torch.relu(torch.cat([H_N, H], 1) @ W)
187 | \end{lstlisting}
188 | 		\end{minipage}
189 | 	\end{frame}
190 | 
191 | 	\begin{frame}[fragile]
192 | 		\frametitle{How about max pooling?}
193 | 		
194 | 			Not possible in vanilla PyTorch \& MXNet.  Not memory-efficient in TensorFlow.
195 | \begin{minipage}{0.5\textwidth}
196 | \begin{lstlisting}[language=Python]
197 | # code: Tensorflow 2
198 | # src: edge source node IDs (n_edges,)
199 | # dst: edge destination node IDs (n_edges,)
200 | # H: node repr matrix (n_nodes, in_dim)
201 | # W: weights (in_dim * 2, out_dim)
202 | 
203 | # Broadcast source features to edges
204 | H_src = tf.gather(H, src)
205 | H_N = (*@\textit{\textcolor{red}{tf.math.unsorted\_segment\_max}}@*)(
206 |     H_src, dst, n_nodes)
207 | H = tf.nn.relu(tf.concat([H_N, H], 1) @ W)
208 | \end{lstlisting}
209 | \end{minipage}%
210 | \begin{minipage}{0.5\textwidth}
211 | \begin{lstlisting}[language=Python]
212 | # code: PyTorch + DGL
213 | # G: DGL Graph
214 | # H: node repr matrix (n_nodes, in_dim)
215 | # W: weights (in_dim * 2, out_dim)
216 | import dgl.function as fn
217 | G.ndata['h'] = H
218 | G.update_all(
219 |     fn.copy_u('h', 'm'),
220 |     fn.(*@\textit{\textcolor{red}{max}}@*)('m', 'h_n'))
221 | H_N = G.ndata['h_n']
222 | H = torch.relu(torch.cat([H_N, H], 1) @ W)
223 | \end{lstlisting}
224 | \end{minipage}
225 | 	\end{frame}
226 | 
227 | 	\begin{frame}[fragile]
228 | 		\frametitle{With attention?\footnote{Velickovic et al., \emph{Graph Attention Networks}, ICLR 2018}}
229 | 		Can't do it easily with vanilla PyTorch/MXNet.  Possible in TensorFlow.
230 | 		\begin{minipage}{0.5\textwidth}
231 | \begin{lstlisting}[language=Python]
232 | # code: Tensorflow 2
233 | # src: edge source node IDs (n_edges,)
234 | # dst: edge destination node IDs (n_edges,)
235 | # H: node repr matrix (n_nodes, in_dim)
236 | # W: weights (in_dim * 2, out_dim)
237 | # Only one attention head is considered
238 | H_src = tf.gather(H, src)
239 | H_dst = tf.gather(H, dst)
240 | alpha_hat = MLP(tf.concat([H_dst, H_src], 1))
241 | alpha_hat_sp = tf.sparse.SparseTensor(
242 |     tf.stack([dst, src], 1),
243 |     alpha_hat,
244 |     (n_nodes, n_nodes))
245 | alpha = (*@\textit{\textcolor{red}{tf.sparse.softmax}}@*)(alpha_hat_sp)
246 | H_N = tf.sparse.sparse_dense_matmul(
247 |     alpha, H)
248 | H = tf.nn.relu(tf.concat([H_N, H], 1) @ W)
249 | \end{lstlisting}
250 | 		\end{minipage}%
251 | 		\begin{minipage}{0.5\textwidth}
252 | \begin{lstlisting}[language=Python]
253 | # code: PyTorch + DGL
254 | # G: DGL Graph
255 | # H: node repr matrix (n_nodes, in_dim)
256 | # W: weights (in_dim * 2, out_dim)
257 | 
258 | import dgl.function as fn
259 | G.ndata['h'] = H
260 | G.update_all((*@\textit{\textcolor{red}{msg\_func}}@*), (*@\textit{\textcolor{red}{reduce\_func}}@*))
261 | H_N = G.ndata['h_n']
262 | H = torch.relu(torch.cat([H_N, H], 1) @ W)
263 | \end{lstlisting}
264 | 		\end{minipage}
265 | 	\end{frame}
266 | 
267 | 	\begin{frame}[fragile]
268 | 		\frametitle{With attention?}
269 | 		Can't do it easily with vanilla PyTorch/MXNet.  Possible in TensorFlow.
270 | 		\begin{minipage}{0.5\textwidth}
271 | \begin{lstlisting}[language=Python]
272 | # code: Tensorflow 2
273 | # src: edge source node IDs (n_edges,)
274 | # dst: edge destination node IDs (n_edges,)
275 | # H: node repr matrix (n_nodes, in_dim)
276 | # W: weights (in_dim * 2, out_dim)
277 | # Only one attention head is considered
278 | H_src = tf.gather(H, src)
279 | H_dst = tf.gather(H, dst)
280 | alpha_hat = MLP(tf.concat([H_dst, H_src], 1))
281 | alpha_hat_sp = tf.sparse.SparseTensor(
282 |     tf.stack([dst, src], 1),
283 |     alpha_hat,
284 |     (n_nodes, n_nodes))
285 | alpha = (*@\textit{\textcolor{red}{tf.sparse.softmax}}@*)(alpha_hat_sp)
286 | H_N = tf.sparse.sparse_dense_matmul(
287 |     alpha, H)
288 | H = tf.nn.relu(tf.concat([H_N, H], 1) @ W)
289 | \end{lstlisting}
290 | 		\end{minipage}%
291 | 		\begin{minipage}{0.5\textwidth}
292 | \begin{lstlisting}[language=Python]
293 | def msg_func(edges):
294 |     h_src = edges.src['h']
295 |     h_dst = edges.dst['h']
296 |     alpha_hat = MLP(
297 |         torch.cat([h_dst, h_src], 1))
298 |     return {'m': h_src, 'alpha_hat': alpha_hat}
299 | 
300 | def reduce_func(nodes):
301 |     # Incoming messages are batched along
302 |     # 2nd axis.
303 |     m = nodes.mailbox['m']
304 |     alpha_hat = nodes.mailbox['alpha_hat']
305 |     alpha = torch.softmax(alpha_hat, 1)
306 |     return {'h_n':
307 |         (m * alpha[:, :, None]).sum(1)}
308 | \end{lstlisting}
309 | 		\end{minipage}
310 | 	\end{frame}
311 | 
312 | 	\begin{frame}[fragile]
313 | 		\frametitle{How about LSTM\footnote{Fan et al., \emph{Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation}, KDD 2019}?}
314 | 		\begin{minipage}{0.5\textwidth}
315 | \begin{lstlisting}[language=Python]
316 | # code: PyTorch
317 | # src: edge source node IDs (n_edges,)
318 | # dst: edge destination node IDs (n_edges,)
319 | # t: timestamp of edges.
320 | #    LSTM goes through messages in the order
321 | #    of timestamps
322 | # H: node repr matrix (n_nodes, in_dim)
323 | # lstm: LSTM module
324 | # W: weights (in_dim * 2, out_dim)
325 | from torch.nn.utils.rnn import pack_sequence
326 | # Build adjacency list
327 | adjlist = []
328 | for v in range(n_nodes): 
329 |     v_mask = (dst == v) 
330 |     t_v = t[v_mask] 
331 |     N_v = src[v_mask] 
332 |     indices = t_v.argsort() 
333 |     adjlist.append(N_v[indices])
334 | # Pack input sequence
335 | seqs = [H[u] for u in adjlist]
336 | packed_seq = pack_sequence(seqs, False)
337 | # Run LSTM and compute the new H
338 | _, (H_N, _) = lstm(packed_seq)
339 | H = torch.relu(torch.cat([H_N, H], 1) @ W)
340 | \end{lstlisting}
341 | 		\end{minipage}%
342 | 		\begin{minipage}{0.5\textwidth}
343 | \begin{lstlisting}[language=Python]
344 | # code: PyTorch + DGL
345 | # G: DGL Graph
346 | # H: node repr matrix (n_nodes, in_dim)
347 | # W: weights (in_dim * 2, out_dim)
348 | # lstm: LSTM module
349 | 
350 | import dgl.function as fn
351 | G.ndata['h'] = H
352 | G.update_all(fn.copy_u('h', 'm'), (*@\textit{\textcolor{red}{reduce\_func}}@*))
353 | H_N = G.ndata['h_n']
354 | H = torch.relu(torch.cat([H_N, H], 1) @ W)
355 | \end{lstlisting}
356 | 		\end{minipage}
357 | 	\end{frame}
358 | 
359 | 	\begin{frame}[fragile]
360 | 		\frametitle{How about LSTM?}
361 | 		\begin{minipage}{0.5\textwidth}
362 | \begin{lstlisting}[language=Python]
363 | # code: PyTorch
364 | # src: edge source node IDs (n_edges,)
365 | # dst: edge destination node IDs (n_edges,)
366 | # t: timestamp of edges.
367 | #    LSTM goes through messages in the order
368 | #    of timestamps
369 | # H: node repr matrix (n_nodes, in_dim)
370 | # lstm: LSTM module
371 | # W: weights (in_dim * 2, out_dim)
372 | from torch.nn.utils.rnn import pack_sequence
373 | # Build adjacency list
374 | adjlist = []
375 | for v in range(n_nodes):
376 |     v_mask = (dst == v)
377 |     t_v = t[v_mask]
378 |     N_v = src[v_mask]
379 |     indices = t_v.argsort()
380 |     adjlist.append(N_v[indices])
381 | # Pack input sequence
382 | seqs = [H[u] for u in adjlist]
383 | packed_seq = pack_sequence(seqs, False)
384 | # Run LSTM and compute the new H
385 | _, (H_N, _) = lstm(packed_seq)
386 | H = torch.relu(torch.cat([H_N, H], 1) @ W)
387 | \end{lstlisting}
388 | 		\end{minipage}%
389 | 		\begin{minipage}{0.5\textwidth}
390 | \begin{lstlisting}[language=Python]
391 | def reduce_func(nodes):
392 |     indices = nodes.mailbox['t'].argsort(1)
393 |     m = nodes.mailbox['m']
394 |     m_ordered = m.gather(1, indices[:, :, None].expand_as(m))
395 |     _, (h_n, _) = lstm(m_ordered)  # assumes a batch_first LSTM
396 |     return {'h_n': h_n[-1]}
397 | \end{lstlisting}
398 | 		\end{minipage}
399 | 	\end{frame}
400 | 
401 | 	\begin{frame}[fragile]
402 | 		\frametitle{How about updating partially\footnote{Trivedi et al., \emph{Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs}, ICML 2017}\footnote{Tai et al., \emph{Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks}  (TreeLSTM), ACL 2015}?}
403 | 		DGL is not confined to full-graph updates; one can send and receive messages along \emph{some of} the edges at a time.
404 | 		\begin{center}
405 | 			\centering
406 | 			\begin{minipage}{0.5\textwidth}
407 | \begin{lstlisting}[language=Python]
408 | # code: PyTorch + DGL
409 | # messages are sent/received in the order of
410 | # edge timestamps.
411 | # H: node repr matrix (n_nodes, in_dim)
412 | # T: numpy array of edge timestamps
413 | G.ndata['h'] = H
414 | distinct_T = np.sort(np.unique(T))
415 | for t in distinct_T:
416 |     eid = np.where(T == t)[0]
417 |     G.(*@\textit{\textcolor{red}{send\_and\_recv}}@*)(eid, msg_func, reduce_func)
418 | H_output = G.ndata['h']
419 | \end{lstlisting}
420 | 			\end{minipage}
421 | 		\end{center}
422 | 	\end{frame}
423 | 
424 | 	\begin{frame}[fragile]
425 | 		\frametitle{How about heterogeneous graphs\footnote{Schlichtkrull et al., \emph{Modeling Relational Data with Graph Convolutional Networks}, ESWC 2018}?}
426 | 		\begin{itemize}
427 | 			\item DGL supports heterogeneous graphs whose nodes and edges are typed and may have type-specific features.
428 | 			\item One can perform message passing on one edge type at a time.
429 | 		\end{itemize}
430 | \begin{lstlisting}[language=Python]
431 | # code: PyTorch + DGL
432 | # xs: node features for each node type
433 | # ws: weights for each edge type
434 | # g: DGL heterogeneous graph
435 | for i, ntype in enumerate(g.ntypes):
436 |     g.nodes[ntype].data['x'] = xs[i]
437 | 
438 | # intra-type aggregation
439 | for i, (srctype, etype, dsttype) in enumerate(g.canonical_etypes):
440 |     g.nodes[srctype].data['h'] = g.nodes[srctype].data['x'] @ ws[etype]
441 |     g(*@\textit{\textcolor{red}{[srctype, etype, dsttype]}}@*).update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_%d' % i))
442 | \end{lstlisting}
443 | 	\end{frame}
444 | 
445 | 	\begin{frame}[fragile]
446 | 		\frametitle{How about heterogeneous graphs?}
447 | 		\begin{itemize}
448 | 			\item One can also perform message passing on multiple edge types, further aggregating the outcome of per-edge-type aggregation with a \emph{cross-type reducer}.
449 | 		\end{itemize}
450 | \begin{lstlisting}[language=Python]
451 | # code: PyTorch + DGL
452 | # xs: node features for each node type
453 | # ws: weights for each edge type
454 | # g: DGL heterogeneous graph
455 | for i, ntype in enumerate(g.ntypes):
456 |     g.nodes[ntype].data['x'] = xs[i]
457 | 
458 | funcs = {}
459 | for i, (srctype, etype, dsttype) in enumerate(g.canonical_etypes):
460 |     g.nodes[srctype].data['h%d' % i] = g.nodes[srctype].data['x'] @ ws[etype]
461 |     funcs[(srctype, etype, dsttype)] = (
462 |         fn.copy_u('h%d' % i, 'm'), fn.mean('m', 'h'))
463 | 
464 | # message passing
465 | g.(*@\textit{\textcolor{red}{multi\_update\_all(funcs, cross\_reducer='sum')}}@*)
466 | \end{lstlisting}
467 | 	\end{frame}
468 | 
469 | 	\begin{frame}
470 | 		\frametitle{Comparison of flexibility}
471 | 		\begin{tabular}{|c|ccc|}
472 | 			\hline
473 | 			Computation & Tensorflow & PyTorch/MXNet & DGL \\
474 | 			\hline
475 | 			Average pooling & Sparse matmul & Sparse matmul & \multirow{6}{*}{Message Passing API} \\
476 | 			Max pooling & Segment-max & N/A & \\
477 | 			Attention pooling & Sparse softmax & N/A & \\
478 | 			LSTM pooling & Sequence padding & Sequence padding & \\
479 | 			Partial graph computation & Manual labor & Manual labor & \\
480 | 			Heterogeneous graph & Manual labor & Manual labor & \\
481 | 			\hline
482 | 		\end{tabular}
483 | 	\end{frame}
484 | 
485 | 	\begin{frame}
486 | 		\frametitle{Is it efficient?}
487 | 		\begin{center}
488 | 			\centering
489 | 			\begin{tabular}{|p{0.3\textwidth}p{0.2\textwidth}p{0.2\textwidth}p{0.2\textwidth}|}
490 | 				\hline
491 | 				Model & Train time/epoch (Original) & Train time/epoch (DGL) & Speedup \\
492 | 				\hline
493 | 				Graph Convolutional Networks & 0.0051s (TF) & 0.0031s & 1.64x \\
494 | 				\hline
495 | 				Graph Attention Networks & 0.0982s (TF) & 0.0113s & 8.69x \\
496 | 				\hline
497 | 				Relational GCN (classification) & 0.2853s (Theano) & 0.0075s & 38.2x \\
498 | 				\hline
499 | 				Relational GCN (link prediction) & 2.204s (TF) & 0.453s & 4.86x \\
500 | 				\hline
501 | 				Graph Convolutional Matrix Completion (MovieLens-100k) & 0.1008s (TF) & 0.0246s (MXNet) & 4.09x \\
502 | 				\hline
503 | 				TreeLSTM & 14.02s (DyNet) & 3.18s & 4.3x \\
504 | 				\hline
505 | 				Junction Tree Variational Autoencoder & 1826s (PyTorch) & 743s & 2.5x \\
506 | 				\hline
507 | 			\end{tabular}
508 | 			And many more examples\ldots
509 | 		\end{center}
510 | 	\end{frame}
511 | 
512 | 	\begin{frame}
513 | 		\frametitle{What's more?}
514 | 		\begin{itemize}
515 | 			\item Check out our repository: \url{https://github.com/dmlc/dgl}
516 | 			\begin{itemize}
517 | 				\item We have lots of PyTorch and MXNet examples!
518 | 				\item In 0.4 we also released DGL-KE, a subpackage for training knowledge graph embeddings.
519 | 			\end{itemize}
520 | 			\item Check out our documentation: \url{https://docs.dgl.ai}
521 | 			\item Discussion forum: \url{https://discuss.dgl.ai}
522 | 			\item Stay tuned for 0.5, which will include better support for large-scale \& distributed GNN training!
523 | 		\end{itemize}
524 | 	\end{frame}
525 | \end{document}
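
Note on the cross-type reducer from the "How about heterogeneous graphs?" frame: the sketch below is a dependency-free illustration of what a per-edge-type mean followed by a cross-type sum computes. It is not DGL's implementation. The helper names (`mean_vectors`, `sum_vectors`, `multi_etype_aggregate`), the toy node names, and the edge types are all hypothetical, and the per-edge-type weight projection (`ws`) from the slide is left out for brevity.

```python
from collections import defaultdict


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sum_vectors(vectors):
    """Element-wise sum of a non-empty list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


def multi_etype_aggregate(edges_by_etype, features):
    """Mean-aggregate messages per edge type, then sum across edge types.

    edges_by_etype: {etype: [(src, dst), ...]}  (hypothetical toy format)
    features:       {node: [float, ...]}
    Only destination nodes that receive at least one message appear
    in the returned dict.
    """
    per_etype = defaultdict(list)  # dst -> one mean vector per edge type
    for etype, edges in edges_by_etype.items():
        inbox = defaultdict(list)
        for src, dst in edges:
            # Message step: copy the source feature (the role of fn.copy_u).
            inbox[dst].append(features[src])
        for dst, msgs in inbox.items():
            # Per-edge-type reduce step (the role of fn.mean).
            per_etype[dst].append(mean_vectors(msgs))
    # Cross-type reduce step (the role of cross_reducer='sum').
    return {dst: sum_vectors(parts) for dst, parts in per_etype.items()}


# Toy example: two edge types pointing at item nodes i0 and i1.
features = {"u0": [1.0, 0.0], "u1": [3.0, 2.0], "t0": [10.0, 10.0]}
edges_by_etype = {
    "follows": [("u0", "i0"), ("u1", "i0")],
    "tagged": [("t0", "i0"), ("t0", "i1")],
}
result = multi_etype_aggregate(edges_by_etype, features)
print(result)
```

In this toy graph, `i0` receives the mean of `u0` and `u1` over "follows" plus the single "tagged" message from `t0`, while `i1` receives only the "tagged" message.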


--------------------------------------------------------------------------------
/images/GNN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/images/GNN.png


--------------------------------------------------------------------------------
/images/Link_predict.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/images/Link_predict.png


--------------------------------------------------------------------------------
/images/link_predict1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/images/link_predict1.png


--------------------------------------------------------------------------------
/images/link_predict2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/images/link_predict2.png


--------------------------------------------------------------------------------
/images/negative_edges.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/images/negative_edges.png


--------------------------------------------------------------------------------
/images/node_classify1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/images/node_classify1.png


--------------------------------------------------------------------------------
/images/node_classify2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/images/node_classify2.png


--------------------------------------------------------------------------------
/large_graphs/assets/block1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/block1.png


--------------------------------------------------------------------------------
/large_graphs/assets/block2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/block2.png


--------------------------------------------------------------------------------
/large_graphs/assets/block_with_self1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/block_with_self1.png


--------------------------------------------------------------------------------
/large_graphs/assets/block_with_self2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/block_with_self2.png


--------------------------------------------------------------------------------
/large_graphs/assets/graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/graph.png


--------------------------------------------------------------------------------
/large_graphs/assets/graph_1layer_46.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/graph_1layer_46.png


--------------------------------------------------------------------------------
/large_graphs/assets/graph_2layer_46.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/graph_2layer_46.png


--------------------------------------------------------------------------------
/large_graphs/assets/in_subgraph_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/in_subgraph_1.png


--------------------------------------------------------------------------------
/large_graphs/assets/in_subgraph_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/assets/in_subgraph_2.png


--------------------------------------------------------------------------------
/large_graphs/sampling.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/8fd0bbf9aca2bfcb95074e4124e01a2b074be300/large_graphs/sampling.pptx


--------------------------------------------------------------------------------