├── README.md ├── code ├── ad_hoc_functions.py ├── embedding_explore.ipynb ├── node2vec_clean_implementation.ipynb ├── node2vec_experiment.ipynb ├── node2vec_network_preprosess.ipynb ├── pre_compute_walks.py ├── random_walk.ipynb ├── train_node2vec.py └── train_node2vec_symmetric.py ├── data ├── co-author-index.json ├── co-author-matrix.npz ├── co-author-original-ids.json └── readme.txt ├── results └── .gitignore └── work └── .gitignore /README.md: -------------------------------------------------------------------------------- 1 | # Node2vec with tensorflow 2 | This repo contains an ad hoc implementation of node2vec using tensorflow. I call it ad hoc because the code is not particularly clean or efficient. However, it is applicable to large networks; I tested it on a network with 74,530 nodes. Also, the input network needs to be represented as a scipy.sparse.csr_matrix. 3 | 4 | The main reference appeared at KDD 2016: [node2vec: Scalable Feature Learning for Networks](http://arxiv.org/abs/1607.00653) 5 | 6 | The first author of the paper has also open-sourced an implementation: https://github.com/aditya-grover/node2vec 7 | That version is likely more efficient, so please try it first. This repo is for people who want to use tensorflow for some reason. 8 | 9 | ## Requirements 10 | I recommend installing [Anaconda](https://www.continuum.io/downloads) and then tensorflow. 11 | - [tensorflow 0.9](http://tensorflow.org) 12 | - and some other libraries... 13 | 14 | ## How to use 15 | I constructed a co-author network from the [Microsoft academic graph](https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/). It is a co-author network built only from SIGMOD, VLDB, ICDE, ICML, NIPS, IJCAI, AAAI, ECMLPAKDD, ICCV, CVPR, ECCV, ACL, NAACL, EMNLP, KDD, ICDM, WSDM, WWW, CIKM, and ISWC, and it has 74,530 nodes. I'll use this network as the running example. The data is in the ./data directory. 
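If you want to use your own network instead, serialize it the same way before passing it to the scripts. Here is a minimal sketch mirroring the save_sparse_csr/load_sparse_csr helpers in code/ad_hoc_functions.py; the file name my-network.npz and the tiny 3-node matrix are placeholders for illustration.

```python
# Sketch: serialize a network in the .npz format the scripts expect.
# Mirrors save_sparse_csr/load_sparse_csr in code/ad_hoc_functions.py.
import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # Store the three CSR component arrays plus the shape in one .npz file
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])

# A tiny weighted co-author network: 3 authors, symmetric co-author counts
adj = csr_matrix(np.array([[0, 2, 1],
                           [2, 0, 0],
                           [1, 0, 0]], dtype=np.int32))
save_sparse_csr('my-network.npz', adj)
restored = load_sparse_csr('my-network.npz')
assert (restored - adj).nnz == 0  # round-trips exactly
```

The resulting .npz file can then be passed as the --graph argument shown below.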
16 | 17 | First, prepare sample node sequences by random walks parameterized with p (the return parameter) and q (the in-out parameter). The input network has to be a scipy.sparse.csr_matrix, serialized as noted [here](http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format) 18 | ``` 19 | cd code #make sure you are in the code directory 20 | python pre_compute_walks.py --graph ../data/co-author-matrix.npz --walk ../work/random_walks.npz --p 1.0 --q 0.5 21 | ``` 22 | Then, learn embeddings using the random walks: 23 | ``` 24 | cd code #make sure you are in the code directory 25 | python train_node2vec.py --graph ../data/co-author-matrix.npz --walk ../work/random_walks.npz --log ../log1/ --save ../results/node_embeddings.npz 26 | ``` 27 | 28 | ## Important Notes 29 | The current implementation hard-codes several parameters: 30 | - The number of dimensions d = 200 31 | - The number of epochs = 1 32 | - The number of walks per node r = 1 33 | - Random walk length l = 100 34 | - Context size k = 16 35 | You should modify these parameters as well as p and q. In particular, r should be increased for real applications. However, my implementation is not efficient and takes hours on the example network (74,530 nodes), so I restrict r = 1. 36 | 37 | The experimental settings in the original paper are: d=128, epochs=1, r=10, l=80, and k=10. 38 | 39 | ## Download random walks and embeddings 40 | I have made sample random walks and learned embeddings available, because generating them takes time. You can download them as below: 41 | - https://googledrive.com/host/0B046sNk0DhCDZ3pla3BKdnllcEE/random_walks.npz 42 | - https://googledrive.com/host/0B046sNk0DhCDZ3pla3BKdnllcEE/node_embeddings.npz 43 | Put random_walks.npz into ./work and node_embeddings.npz into ./results. 44 | 45 | ## Vector Examples 46 | The embeddings are learned with p=1.0, q=0.5, d=200, epochs=1, r=1, l=100, and k=16. 
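Similarity queries like the ones below can be run with a few lines of numpy. This is a sketch of the top_k helper in code/embedding_explore.ipynb, using plain numpy normalization and random vectors so it runs stand-alone; with the real embeddings you would load ../results/node_embeddings.npz instead.

```python
# Sketch of a cosine-similarity nearest-neighbor query over node embeddings
# (mirrors top_k in code/embedding_explore.ipynb; random vectors stand in
#  for the downloaded node_embeddings.npz).
import numpy as np

def top_k(embeddings, node_id, topk=3):
    """Return (node index, cosine similarity) for the topk nearest nodes."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms                # L2-normalize each row
    sims = unit @ unit[node_id]              # cosine similarity to every node
    nearest = (-sims).argsort()[1:topk + 1]  # position 0 is the query node itself
    return [(int(i), float(sims[i])) for i in nearest]

rng = np.random.RandomState(0)
embeddings = rng.uniform(-1.0, 1.0, size=(100, 200))  # 100 fake nodes, d=200
for idx, sim in top_k(embeddings, node_id=0):
    print(idx, round(sim, 4))
```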
47 | 48 | Top 3 cosine similar authors to Jure Leskovec: 49 | julian mcauley 0.459304 50 | jon kleinberg 0.438476 51 | jaewon yang 0.423793 52 | Top 3 cosine similar authors to Ying Ding: 53 | xin shuai 0.438184 54 | jie tang 0.424988 55 | jerome r busemeyer 0.395817 56 | 57 | *Note that Ying does not publish many papers at these conferences (I only used top conferences), so the results might not be intuitive. 58 | See example code in the IPython notebook: https://github.com/apple2373/node2vec/blob/master/code/embedding_explore.ipynb 59 | 60 | ## To do list 61 | 1. Make the code more flexible using command line arguments (e.g. the dimension of embeddings) 62 | 2. Use multi-processing for computing transition probabilities and random walks. 63 | 3. Use asynchronous SGD (currently using the Adam optimizer in a single process). 64 | PRs are welcome, especially for 2 and 3. 65 | -------------------------------------------------------------------------------- /code/ad_hoc_functions.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | __author__ = "satoshi tsutsui" 5 | 6 | import numpy as np 7 | from scipy.sparse import csr_matrix 8 | import multiprocessing as mp 9 | import json 10 | 11 | #ref http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format 12 | def save_sparse_csr(filename,array): 13 | np.savez(filename,data=array.data,indices=array.indices, 14 | indptr=array.indptr,shape=array.shape) 15 | 16 | def load_sparse_csr(filename): 17 | loader = np.load(filename) 18 | return csr_matrix((loader['data'], loader['indices'], loader['indptr']),shape=loader['shape']) 19 | 20 | def alpha(p,q,t,x,adj_mat_csr_sparse):#t is the previous node, x is a candidate next node 21 | if t==x: 22 | return 1.0/p 23 | elif adj_mat_csr_sparse[t,x]>0: 24 | return 1.0 25 | else: 26 | return 1.0/q 27 | 28 | def compute_transition_prob(adj_mat_csr_sparse,p,q): 29 | transition={} 30 | num_nodes=adj_mat_csr_sparse.shape[0] 31
| indices=adj_mat_csr_sparse.indices 32 | indptr=adj_mat_csr_sparse.indptr 33 | data=adj_mat_csr_sparse.data 34 | #Precompute the transition probabilities in advance 35 | for t in xrange(num_nodes):#t is row index 36 | for v in indices[indptr[t]:indptr[t+1]]:#i.e. possible next nodes from t 37 | pi_vx_indices=indices[indptr[v]:indptr[v+1]]#i.e. possible next nodes from v 38 | pi_vx_values = np.array([alpha(p,q,t,x,adj_mat_csr_sparse) for x in pi_vx_indices]) 39 | pi_vx_values=pi_vx_values*data[indptr[v]:indptr[v+1]] 40 | #This is equivalent to the following 41 | # pi_vx_values=[] 42 | # for x in pi_vx_indices: 43 | # pi_vx=alpha(p,q,t,x)*adj_mat_csr_sparse[v,x] 44 | # pi_vx_values.append(pi_vx) 45 | pi_vx_values=pi_vx_values/np.sum(pi_vx_values) 46 | #now we have normalized transition probabilities for v traversed from t 47 | #the probabilities are stored as a sparse vector. 48 | transition[t,v]=(pi_vx_indices,pi_vx_values) 49 | 50 | return transition 51 | 52 | 53 | def generate_random_walks(adj_mat_csr_sparse,transition,random_walk_length): 54 | random_walks=[] 55 | num_nodes=adj_mat_csr_sparse.shape[0] 56 | indices=adj_mat_csr_sparse.indices 57 | indptr=adj_mat_csr_sparse.indptr 58 | data=adj_mat_csr_sparse.data 59 | #get random walks 60 | for u in xrange(num_nodes): 61 | if len(indices[indptr[u]:indptr[u+1]]) !=0: 62 | #the first move depends only on edge weights 63 | possible_next_node=indices[indptr[u]:indptr[u+1]] 64 | weight_for_next_move=data[indptr[u]:indptr[u+1]]#i.e. possible next nodes from u 65 | weight_for_next_move=weight_for_next_move.astype(np.float32)/np.sum(weight_for_next_move) 66 | first_walk=np.random.choice(possible_next_node, 1, p=weight_for_next_move) 67 | random_walk=[u,first_walk[0]] 68 | for i in xrange(random_walk_length-2): 69 | cur_node = random_walk[-1] 70 | previous_node=random_walk[-2] 71 | (pi_vx_indices,pi_vx_values)=transition[previous_node,cur_node] 72 | next_node=np.random.choice(pi_vx_indices, 1, p=pi_vx_values) 73 | 
random_walk.append(next_node[0]) 74 | random_walks.append(random_walk) 75 | 76 | return random_walks 77 | -------------------------------------------------------------------------------- /code/embedding_explore.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 69, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "%matplotlib inline\n", 12 | "import numpy as np\n", 13 | "import json" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 70, 19 | "metadata": { 20 | "collapsed": true 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "#ref http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format\n", 25 | "def save_sparse_csr(filename,array):\n", 26 | " np.savez(filename,data = array.data ,indices=array.indices,\n", 27 | " indptr =array.indptr, shape=array.shape )\n", 28 | "\n", 29 | "def load_sparse_csr(filename):\n", 30 | " loader = np.load(filename)\n", 31 | " return csr_matrix(( loader['data'], loader['indices'], loader['indptr']),shape = loader['shape'])" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 71, 37 | "metadata": { 38 | "collapsed": false 39 | }, 40 | "outputs": [], 41 | "source": [ 42 | "co_author_matrix=load_sparse_csr(\"../data/co-author-matrix.npz\")\n", 43 | "with open('../data/co-author-index.json', 'r') as f:\n", 44 | " aid2aname=json.load(f)\n", 45 | "aid2aname=dict((int(k), v) for k, v in aid2aname.iteritems())" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 77, 51 | "metadata": { 52 | "collapsed": true 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "from sklearn.preprocessing import normalize\n", 57 | "\n", 58 | "\n", 59 | "def top_k(C,c_id,cid2cname,topk=5):\n", 60 | " C_norm = normalize(C)\n", 61 | " c_vec=C_norm[c_id]\n", 62 | " sim = np.dot(C_norm,c_vec)\n", 63 | " nearest = 
(-sim).argsort()[1:topk+1]\n", 64 | " results=[(cid2cname[nearest[k]],sim[nearest[k]]) for k in xrange(topk)]\n", 65 | " return results\n", 66 | "\n", 67 | "def top_k_vec(C,vec,cid2cname,topk=5):\n", 68 | " C_norm=normalize(C)\n", 69 | " vec_norm=vec/np.linalg.norm(vec)\n", 70 | " sim = np.dot(C_norm,vec_norm)\n", 71 | " nearest = (-sim).argsort()[0:topk]\n", 72 | " results=[(cid2cname[nearest[k]],sim[nearest[k]]) for k in xrange(topk)]\n", 73 | " return results\n", 74 | "\n", 75 | "def print_top(results):\n", 76 | " for pair in results:\n", 77 | " print pair[0],pair[1]" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 80, 83 | "metadata": { 84 | "collapsed": true 85 | }, 86 | "outputs": [], 87 | "source": [ 88 | "np_node_embeddings=np.load('../results/node_embeddings.npz')['arr_0']" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 81, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "jure leskovec\n", 103 | "julian mcauley 0.459304\n", 104 | "jon kleinberg 0.438476\n", 105 | "jaewon yang 0.423793\n", 106 | "cristian danescuniculescumizil 0.359046\n", 107 | "caroline suen 0.358588\n" 108 | ] 109 | } 110 | ], 111 | "source": [ 112 | "print aid2aname[10937]\n", 113 | "print_top(top_k(np_node_embeddings,10937,aid2aname))" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 82, 119 | "metadata": { 120 | "collapsed": false 121 | }, 122 | "outputs": [ 123 | { 124 | "name": "stdout", 125 | "output_type": "stream", 126 | "text": [ 127 | "ying ding\n", 128 | "xin shuai 0.438184\n", 129 | "jie tang 0.424988\n", 130 | "jerome r busemeyer 0.395817\n", 131 | "martin klein 0.377127\n", 132 | "herbert van de sompel 0.355121\n" 133 | ] 134 | } 135 | ], 136 | "source": [ 137 | "print aid2aname[52753]\n", 138 | "print_top(top_k(np_node_embeddings,52753,aid2aname))" 139 | ] 140 | }, 141 | { 142 | "cell_type": 
"code", 143 | "execution_count": 76, 144 | "metadata": { 145 | "collapsed": false 146 | }, 147 | "outputs": [ 148 | { 149 | "ename": "NameError", 150 | "evalue": "global name 'normalize' is not defined", 151 | "output_type": "error", 152 | "traceback": [ 153 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 154 | "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", 155 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mtop_k_vec\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mprint_top\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtop_k_vec\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnp_node_embeddings\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mnp_node_embeddings\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m52753\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0maid2aname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 156 | "\u001b[1;32m\u001b[0m in \u001b[0;36mtop_k_vec\u001b[1;34m(C, vec, cid2cname, topk)\u001b[0m\n\u001b[0;32m 9\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 10\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mtop_k_vec\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mC\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mvec\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mcid2cname\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mtopk\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 11\u001b[1;33m \u001b[0mC_norm\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mC\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 12\u001b[0m 
\u001b[0mvec_norm\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mvec\u001b[0m\u001b[1;33m/\u001b[0m\u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mlinalg\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mnorm\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mvec\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 13\u001b[0m \u001b[0msim\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mC_norm\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mvec_norm\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 157 | "\u001b[1;31mNameError\u001b[0m: global name 'normalize' is not defined" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "top_k_vec\n", 163 | "print_top(top_k_vec(np_node_embeddings,np_node_embeddings[52753],aid2aname))" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": { 170 | "collapsed": true 171 | }, 172 | "outputs": [], 173 | "source": [ 174 | "#If you want to restore from ckpt files, this might be helpful....\n", 175 | "\n", 176 | "\n", 177 | "# from gensim import corpora\n", 178 | "# import numpy as np\n", 179 | "# import unicodecsv as csv\n", 180 | "# import tensorflow as tf\n", 181 | "# import math\n", 182 | "# import os,sys\n", 183 | "# import random\n", 184 | "# from scipy.sparse import csr_matrix\n", 185 | "# from tqdm import tqdm\n", 186 | "# import json\n", 187 | "\n", 188 | "# #Computational Graph Definition\n", 189 | "# tf.reset_default_graph()#remove this if not ipython notebook\n", 190 | "\n", 191 | "# num_nodes=adj_mat_csr_sparse.shape[0]\n", 192 | "# context_size=16\n", 193 | "# batch_size = None\n", 194 | "# embedding_size = 200 # Dimension of the embedding vector.\n", 195 | "# num_sampled = 64 # Number of negative examples to sample.\n", 196 | "\n", 197 | "# global_step = tf.Variable(0, name='global_step', trainable=False)\n", 198 | "\n", 199 | "# # Parameters to learn\n", 200 | "# node_embeddings = 
tf.Variable(tf.random_uniform([num_nodes, embedding_size], -1.0, 1.0))\n", 201 | "\n", 202 | "# #Fixedones\n", 203 | "# biases=tf.zeros([num_nodes])\n", 204 | "\n", 205 | "# # Input data and re-orgenize size.\n", 206 | "# with tf.name_scope(\"context_node\") as scope:\n", 207 | "# #context nodes to each input node in the batch (e.g [[1,2],[4,6],[5,7]] where batch_size = 3,context_size=3)\n", 208 | "# train_context_node= tf.placeholder(tf.int32, shape=[batch_size,context_size],name=\"context_node\")\n", 209 | "# #orgenize prediction labels (skip-gram model predicts context nodes (i.e labels) given a input node)\n", 210 | "# #i.e make [[1,2,4,6,5,7]] given context above. The redundant dimention is just for restriction on tensorflow API.\n", 211 | "# train_context_node_flat=tf.reshape(train_context_node,[-1,1])\n", 212 | "# with tf.name_scope(\"input_node\") as scope:\n", 213 | "# #batch input node to the network(e.g [2,1,3] where batch_size = 3)\n", 214 | "# train_input_node= tf.placeholder(tf.int32, shape=[batch_size],name=\"input_node\")\n", 215 | "# #orgenize input as flat. 
i.e we want to make [2,2,2,1,1,1,3,3,3] given the input nodes above\n", 216 | "# input_ones=tf.ones_like(train_context_node)\n", 217 | "# train_input_node_flat=tf.reshape(tf.mul(input_ones,tf.reshape(train_input_node,[-1,1])),[-1])\n", 218 | "\n", 219 | "# # Model.\n", 220 | "# with tf.name_scope(\"loss\") as scope:\n", 221 | "# # Look up embeddings for words.\n", 222 | "# node_embed = tf.nn.embedding_lookup(node_embeddings, train_input_node_flat)\n", 223 | "# # Compute the softmax loss, using a sample of the negative labels each time.\n", 224 | "# loss_node2vec = tf.reduce_mean(tf.nn.sampled_softmax_loss(node_embeddings,biases,node_embed,train_context_node_flat, num_sampled, num_nodes))\n", 225 | "# loss_node2vec_summary = tf.scalar_summary(\"loss_node2vec\", loss_node2vec)\n", 226 | "\n", 227 | "# # Initializing the variables\n", 228 | "# init = tf.initialize_all_variables()\n", 229 | "\n", 230 | "# # Add ops to save and restore all the variables.\n", 231 | "# saver = tf.train.Saver(max_to_keep=20)\n", 232 | "\n", 233 | "# merged = tf.merge_all_summaries()\n", 234 | "\n", 235 | "# with tf.Session() as sess:\n", 236 | "# # Restore variables from disk.\n", 237 | "# log_dir=\"../log1/\"\n", 238 | "# global_step=30001\n", 239 | "# model_path=log_dir+\"model.ckpt-%d\"%global_step\n", 240 | "# saver.restore(sess, model_path)\n", 241 | "# print(\"Model restored.\")\n", 242 | "# node_embeddings_=sess.run(node_embeddings)" 243 | ] 244 | } 245 | ], 246 | "metadata": { 247 | "kernelspec": { 248 | "display_name": "Python 2", 249 | "language": "python", 250 | "name": "python2" 251 | }, 252 | "language_info": { 253 | "codemirror_mode": { 254 | "name": "ipython", 255 | "version": 2 256 | }, 257 | "file_extension": ".py", 258 | "mimetype": "text/x-python", 259 | "name": "python", 260 | "nbconvert_exporter": "python", 261 | "pygments_lexer": "ipython2", 262 | "version": "2.7.11" 263 | } 264 | }, 265 | "nbformat": 4, 266 | "nbformat_minor": 0 267 | } 268 | 
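As a quick sanity check of the second-order walk logic in code/ad_hoc_functions.py, the biased transition probabilities can be worked out by hand on a toy graph. This is a Python 3 sketch; the alpha logic mirrors the repo's, while the 4-node graph (a triangle 0-1-2 plus a pendant node 3 attached to node 2) is made up for illustration.

```python
# Toy illustration of node2vec's second-order transition probabilities,
# re-implementing the alpha bias from code/ad_hoc_functions.py.
import numpy as np
from scipy.sparse import csr_matrix

def alpha(p, q, t, x, adj):
    if t == x:
        return 1.0 / p      # stepping back to the previous node t
    elif adj[t, x] > 0:
        return 1.0          # x is also a neighbor of t (stays close)
    else:
        return 1.0 / q      # x moves the walk away from t

# Triangle 0-1-2 plus pendant node 3 attached to node 2 (unit weights)
rows = [0, 0, 1, 1, 2, 2, 2, 3]
cols = [1, 2, 0, 2, 0, 1, 3, 2]
adj = csr_matrix((np.ones(8), (rows, cols)), shape=(4, 4))

p, q = 1.0, 0.5   # same p and q as the README example
t, v = 0, 2       # the walk just moved from t=0 to v=2
neighbors = adj.indices[adj.indptr[v]:adj.indptr[v + 1]]
probs = np.array([alpha(p, q, t, x, adj) for x in neighbors])
probs *= adj.data[adj.indptr[v]:adj.indptr[v + 1]]  # scale by edge weights
probs /= probs.sum()
print(dict(zip(neighbors.tolist(), probs.round(2).tolist())))
# -> {0: 0.25, 1: 0.25, 3: 0.5}
```

With q < 1 the unexplored node 3 gets twice the probability of returning toward node 0's neighborhood, which is exactly the outward-exploring bias the p/q parameters control.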
-------------------------------------------------------------------------------- /code/node2vec_clean_implementation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 4, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "from gensim import corpora\n", 12 | "import numpy as np\n", 13 | "import unicodecsv as csv\n", 14 | "import tensorflow as tf\n", 15 | "import math\n", 16 | "import os,sys\n", 17 | "import random\n", 18 | "from scipy.sparse import csr_matrix\n", 19 | "from tqdm import tqdm\n", 20 | "import json" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 29, 26 | "metadata": { 27 | "collapsed": true 28 | }, 29 | "outputs": [], 30 | "source": [ 31 | "#ref http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format\n", 32 | "def save_sparse_csr(filename,array):\n", 33 | " np.savez(filename,data = array.data ,indices=array.indices,\n", 34 | " indptr =array.indptr, shape=array.shape )\n", 35 | "\n", 36 | "def load_sparse_csr(filename):\n", 37 | " loader = np.load(filename)\n", 38 | " return csr_matrix(( loader['data'], loader['indices'], loader['indptr']),shape = loader['shape'])" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 34, 44 | "metadata": { 45 | "collapsed": false 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "co_author_matrix=load_sparse_csr(\"../data/co-author-matrix.npz\")\n", 50 | "with open('../data/co-author-index.json', 'r') as f:\n", 51 | " aid2aname=json.load(f)\n", 52 | "aid2aname=dict((int(k), v) for k, v in aid2aname.iteritems())" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 35, 58 | "metadata": { 59 | "collapsed": false 60 | }, 61 | "outputs": [ 62 | { 63 | "name": "stderr", 64 | "output_type": "stream", 65 | "text": [ 66 | "100%|██████████| 74530/74530 [11:21<00:00, 
109.31it/s]\n" 67 | ] 68 | } 69 | ], 70 | "source": [ 71 | "adj_mat_csr_sparse=co_author_matrix\n", 72 | "\n", 73 | "def alpha(p,q,t,x):\n", 74 | " if t==x:\n", 75 | " return 1.0/p\n", 76 | " elif adj_mat_csr_sparse[t,x]>0:\n", 77 | " return 1.0\n", 78 | " else:\n", 79 | " return 1.0/q\n", 80 | " \n", 81 | "p=1.0\n", 82 | "q=0.5\n", 83 | " \n", 84 | "transition={}\n", 85 | "\n", 86 | "num_nodes=adj_mat_csr_sparse.shape[0]\n", 87 | "indices=adj_mat_csr_sparse.indices\n", 88 | "indptr=adj_mat_csr_sparse.indptr\n", 89 | "data=adj_mat_csr_sparse.data\n", 90 | "\n", 91 | "#Precompute the transition matrix in advance\n", 92 | "for t in tqdm(xrange(num_nodes)):#t is row index\n", 93 | " for v in indices[indptr[t]:indptr[t+1]]:#i.e possible next ndoes from t\n", 94 | " pi_vx_indices=indices[indptr[v]:indptr[v+1]]#i.e possible next ndoes from v\n", 95 | " pi_vx_values = np.array([alpha(p,q,t,x) for x in pi_vx_indices])\n", 96 | " pi_vx_values=pi_vx_values*data[indptr[v]:indptr[v+1]]\n", 97 | " #This is eqilvalent to the following\n", 98 | "# pi_vx_values=[]\n", 99 | "# for x in pi_vx_indices:\n", 100 | "# pi_vx=alpha(p,q,t,x)*adj_mat_csr_sparse[v,x]\n", 101 | "# pi_vx_values.append(pi_vx)\n", 102 | " pi_vx_values=pi_vx_values/np.sum(pi_vx_values)\n", 103 | " #now, we have normalzied transion probabilities for v traversed from t\n", 104 | " #the probabilities are stored as a sparse vector. 
\n", 105 | " transition[t,v]=(pi_vx_indices,pi_vx_values)" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 36, 111 | "metadata": { 112 | "collapsed": false 113 | }, 114 | "outputs": [], 115 | "source": [ 116 | "adj_mat_csr_sparse=co_author_matrix\n", 117 | "indices=adj_mat_csr_sparse.indices\n", 118 | "indptr=adj_mat_csr_sparse.indptr\n", 119 | "data=adj_mat_csr_sparse.data\n", 120 | "random_walk_length=100\n", 121 | " \n", 122 | "def get_random_walk(p):\n", 123 | " random_walks=[]\n", 124 | " #get random walks\n", 125 | " for u in tqdm(xrange(num_nodes)):\n", 126 | " if len(indices[indptr[u]:indptr[u+1]]) !=0:\n", 127 | " #first move is just depends on weight\n", 128 | " possible_next_node=indices[indptr[u]:indptr[u+1]]\n", 129 | " weight_for_next_move=data[indptr[u]:indptr[u+1]]#i.e possible next ndoes from u\n", 130 | " weight_for_next_move=weight_for_next_move.astype(np.float32)/np.sum(weight_for_next_move)\n", 131 | " first_walk=np.random.choice(possible_next_node, 1, p=weight_for_next_move)\n", 132 | " random_walk=[u,first_walk[0]]\n", 133 | " for i in xrange(random_walk_length-2):\n", 134 | " cur_node = random_walk[-1]\n", 135 | " precious_node=random_walk[-2]\n", 136 | " (pi_vx_indices,pi_vx_values)=transition[precious_node,cur_node]\n", 137 | " next_node=np.random.choice(pi_vx_indices, 1, p=pi_vx_values)\n", 138 | " random_walk.append(next_node[0])\n", 139 | " random_walks.append(random_walk)\n", 140 | " \n", 141 | " return random_walks\n", 142 | "\n", 143 | "# random_walks=[]\n", 144 | "# adj_mat_csr_sparse=co_author_matrix\n", 145 | "# indices=adj_mat_csr_sparse.indices\n", 146 | "# indptr=adj_mat_csr_sparse.indptr\n", 147 | "# data=adj_mat_csr_sparse.data\n", 148 | "# random_walk_length=100\n", 149 | "\n", 150 | "# #get random walks\n", 151 | "# for u in tqdm(xrange(num_nodes)):\n", 152 | "# if len(indices[indptr[u]:indptr[u+1]]) !=0:\n", 153 | "# #first move is just depends on weight\n", 154 | "# 
possible_next_node=indices[indptr[u]:indptr[u+1]]\n", 155 | "# weight_for_next_move=data[indptr[u]:indptr[u+1]]#i.e possible next ndoes from u\n", 156 | "# weight_for_next_move=weight_for_next_move.astype(np.float32)/np.sum(weight_for_next_move)\n", 157 | "# first_walk=np.random.choice(possible_next_node, 1, p=weight_for_next_move)\n", 158 | "# random_walk=[u,first_walk[0]]\n", 159 | "# for i in xrange(random_walk_length-2):\n", 160 | "# cur_node = random_walk[-1]\n", 161 | "# precious_node=random_walk[-2]\n", 162 | "# (pi_vx_indices,pi_vx_values)=transition[precious_node,cur_node]\n", 163 | "# next_node=np.random.choice(pi_vx_indices, 1, p=pi_vx_values)\n", 164 | "# random_walk.append(next_node[0])\n", 165 | "# random_walks.append(random_walk)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 37, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [ 175 | { 176 | "name": "stderr", 177 | "output_type": "stream", 178 | "text": [ 179 | "100%|██████████| 74530/74530 [07:58<00:00, 155.87it/s]\n", 180 | "100%|██████████| 74530/74530 [07:58<00:00, 155.71it/s]\n", 181 | "100%|██████████| 74530/74530 [08:01<00:00, 154.75it/s]\n", 182 | " 99%|█████████▉| 74151/74530 [08:03<00:02, 158.70it/s]\n", 183 | "100%|██████████| 74530/74530 [08:03<00:00, 154.04it/s]\n", 184 | "100%|██████████| 74530/74530 [08:03<00:00, 154.02it/s]\n", 185 | "100%|██████████| 74530/74530 [08:05<00:00, 153.47it/s]\n", 186 | "100%|██████████| 74530/74530 [08:05<00:00, 153.45it/s]\n", 187 | "100%|██████████| 74530/74530 [08:05<00:00, 153.37it/s]\n", 188 | "100%|██████████| 74530/74530 [08:06<00:00, 153.23it/s]\n", 189 | "100%|██████████| 74530/74530 [08:08<00:00, 152.63it/s]\n", 190 | "100%|██████████| 74530/74530 [08:08<00:00, 152.51it/s]\n", 191 | "100%|██████████| 74530/74530 [08:08<00:00, 152.42it/s]\n", 192 | "100%|██████████| 74530/74530 [08:11<00:00, 151.61it/s]\n", 193 | "100%|██████████| 74530/74530 [08:13<00:00, 150.97it/s]\n", 194 | 
"100%|██████████| 74530/74530 [08:14<00:00, 150.81it/s]\n", 195 | "100%|██████████| 74530/74530 [08:16<00:00, 150.02it/s]\n", 196 | "100%|██████████| 74530/74530 [08:18<00:00, 149.51it/s]\n", 197 | "100%|██████████| 74530/74530 [08:24<00:00, 147.80it/s]\n", 198 | "100%|██████████| 74530/74530 [08:26<00:00, 147.18it/s]\n" 199 | ] 200 | }, 201 | { 202 | "name": "stdout", 203 | "output_type": "stream", 204 | "text": [ 205 | "elapsed_time:1350.17567492[sec]\n", 206 | "elapsed_time:1369.05380893[sec]\n" 207 | ] 208 | } 209 | ], 210 | "source": [ 211 | "import time\n", 212 | "start = time.time()\n", 213 | "elapsed_time = time.time() - start\n", 214 | "\n", 215 | "import multiprocessing as mp\n", 216 | "proc = 20 \n", 217 | "pool = mp.Pool(proc)\n", 218 | "callback = pool.map(get_random_walk, range(20))\n", 219 | "pool.close()\n", 220 | "\n", 221 | "elapsed_time = time.time() - start\n", 222 | "print (\"elapsed_time:{0}\".format(elapsed_time)) + \"[sec]\"\n", 223 | "\n", 224 | "random_walks=[]\n", 225 | "for temp in callback:\n", 226 | " random_walks.extend(temp)\n", 227 | "del callback\n", 228 | "np_random_walks=np.array(random_walks,dtype=np.int32)\n", 229 | "del random_walks\n", 230 | "np.savez('../work/random_walks.npz',np_random_walks)\n", 231 | "\n", 232 | "elapsed_time = time.time() - start\n", 233 | "print (\"elapsed_time:{0}\".format(elapsed_time)) + \"[sec]\"" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 46, 239 | "metadata": { 240 | "collapsed": false 241 | }, 242 | "outputs": [], 243 | "source": [ 244 | "#Computational Graph Definition\n", 245 | "tf.reset_default_graph()#remove this if not ipython notebook\n", 246 | "\n", 247 | "num_nodes=adj_mat_csr_sparse.shape[0]\n", 248 | "context_size=16\n", 249 | "batch_size = None\n", 250 | "embedding_size = 200 # Dimension of the embedding vector.\n", 251 | "num_sampled = 64 # Number of negative examples to sample.\n", 252 | "\n", 253 | "global_step = tf.Variable(0, name='global_step', 
trainable=False)\n", 254 | "\n", 255 | "# Parameters to learn\n", 256 | "node_embeddings = tf.Variable(tf.random_uniform([num_nodes, embedding_size], -1.0, 1.0))\n", 257 | "\n", 258 | "#Fixedones\n", 259 | "biases=tf.zeros([num_nodes])\n", 260 | "\n", 261 | "# Input data and re-orgenize size.\n", 262 | "with tf.name_scope(\"context_node\") as scope:\n", 263 | " #context nodes to each input node in the batch (e.g [[1,2],[4,6],[5,7]] where batch_size = 3,context_size=3)\n", 264 | " train_context_node= tf.placeholder(tf.int32, shape=[batch_size,context_size],name=\"context_node\")\n", 265 | " #orgenize prediction labels (skip-gram model predicts context nodes (i.e labels) given a input node)\n", 266 | " #i.e make [[1,2,4,6,5,7]] given context above. The redundant dimention is just for restriction on tensorflow API.\n", 267 | " train_context_node_flat=tf.reshape(train_context_node,[-1,1])\n", 268 | "with tf.name_scope(\"input_node\") as scope:\n", 269 | " #batch input node to the network(e.g [2,1,3] where batch_size = 3)\n", 270 | " train_input_node= tf.placeholder(tf.int32, shape=[batch_size],name=\"input_node\")\n", 271 | " #orgenize input as flat. 
i.e we want to make [2,2,2,1,1,1,3,3,3] given the input nodes above\n", 272 | " input_ones=tf.ones_like(train_context_node)\n", 273 | " train_input_node_flat=tf.reshape(tf.mul(input_ones,tf.reshape(train_input_node,[-1,1])),[-1])\n", 274 | "\n", 275 | "# Model.\n", 276 | "with tf.name_scope(\"loss\") as scope:\n", 277 | " # Look up embeddings for words.\n", 278 | " node_embed = tf.nn.embedding_lookup(node_embeddings, train_input_node_flat)\n", 279 | " # Compute the softmax loss, using a sample of the negative labels each time.\n", 280 | " loss_node2vec = tf.reduce_mean(tf.nn.sampled_softmax_loss(node_embeddings,biases,node_embed,train_context_node_flat, num_sampled, num_nodes))\n", 281 | " loss_node2vec_summary = tf.scalar_summary(\"loss_node2vec\", loss_node2vec)\n", 282 | "\n", 283 | "# Initializing the variables\n", 284 | "init = tf.initialize_all_variables()\n", 285 | "\n", 286 | "# Add ops to save and restore all the variables.\n", 287 | "saver = tf.train.Saver(max_to_keep=20)\n", 288 | "\n", 289 | "# Optimizer.\n", 290 | "update_loss = tf.train.AdamOptimizer().minimize(loss_node2vec,global_step=global_step)\n", 291 | "\n", 292 | "merged = tf.merge_all_summaries()" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": null, 298 | "metadata": { 299 | "collapsed": false 300 | }, 301 | "outputs": [ 302 | { 303 | "name": "stdout", 304 | "output_type": "stream", 305 | "text": [ 306 | "\u001b[0m\u001b[01;34mcode\u001b[0m/ \u001b[01;34mdata\u001b[0m/ \u001b[01;34mlog_node2vec1\u001b[0m/ \u001b[01;34mresults\u001b[0m/ \u001b[01;34mwork\u001b[0m/\r\n" 307 | ] 308 | } 309 | ], 310 | "source": [ 311 | "%ls ../" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": null, 317 | "metadata": { 318 | "collapsed": false, 319 | "scrolled": false 320 | }, 321 | "outputs": [ 322 | { 323 | "name": "stdout", 324 | "output_type": "stream", 325 | "text": [ 326 | "Model saved in file: ../log0/model.ckpt-1\n" 327 | ] 328 | } 329 | ], 330 | 
"source": [ 331 | "# hyper parameters\n", 332 | "num_random_walks=np_random_walks.shape[0]\n", 333 | "\n", 334 | "# Launch the graph\n", 335 | "# Initializing the variables\n", 336 | "init = tf.initialize_all_variables()\n", 337 | "\n", 338 | "with tf.Session() as sess:\n", 339 | " log_dir=\"../log0/\"\n", 340 | " writer = tf.train.SummaryWriter(log_dir, sess.graph)\n", 341 | " sess.run(init)\n", 342 | " for i in xrange(0,num_random_walks):\n", 343 | " a_random_walk=np_random_walks[i]\n", 344 | " train_input_batch = np.array([a_random_walk[j] for j in xrange(random_walk_length-context_size)])\n", 345 | " train_context_batch = np.array([a_random_walk[j+1:j+1+context_size] for j in xrange(random_walk_length-context_size)])\n", 346 | " feed_dict={train_input_node:train_input_batch,\n", 347 | " train_context_node:train_context_batch,\n", 348 | " } \n", 349 | " _,loss_value,summary_str=sess.run([update_loss,loss_node2vec,merged], feed_dict)\n", 350 | " writer.add_summary(summary_str,i)\n", 351 | "\n", 352 | " with open(log_dir+\"loss_value.txt\", \"a\") as f:\n", 353 | " f.write(str(loss_value)+'\\n') \n", 354 | " \n", 355 | " # Save the variables to disk.\n", 356 | " if i%10000==0:\n", 357 | " model_path=log_dir+\"model.ckpt\"\n", 358 | " save_path = saver.save(sess, model_path,global_step)\n", 359 | " print(\"Model saved in file: %s\" % save_path)" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "tensorboard --logdir=./log0" 367 | ] 368 | } 369 | ], 370 | "metadata": { 371 | "kernelspec": { 372 | "display_name": "Python 2", 373 | "language": "python", 374 | "name": "python2" 375 | }, 376 | "language_info": { 377 | "codemirror_mode": { 378 | "name": "ipython", 379 | "version": 2 380 | }, 381 | "file_extension": ".py", 382 | "mimetype": "text/x-python", 383 | "name": "python", 384 | "nbconvert_exporter": "python", 385 | "pygments_lexer": "ipython2", 386 | "version": "2.7.11" 387 | } 388 | }, 389 | "nbformat": 4, 390 
| "nbformat_minor": 0 391 | } 392 | -------------------------------------------------------------------------------- /code/node2vec_experiment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "from gensim import corpora\n", 12 | "import numpy as np\n", 13 | "import unicodecsv as csv\n", 14 | "import tensorflow as tf\n", 15 | "import math\n", 16 | "import os,sys\n", 17 | "import random\n", 18 | "from scipy.sparse import csr_matrix\n", 19 | "from tqdm import tqdm" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 2, 25 | "metadata": { 26 | "collapsed": false 27 | }, 28 | "outputs": [ 29 | { 30 | "name": "stdout", 31 | "output_type": "stream", 32 | "text": [ 33 | "The number of papers:88675\n", 34 | "The number of authors:102587\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "tsv = csv.reader(file('../data/MajorPapers.txt'), delimiter = '\\t')\n", 40 | "dbpid2pid={}\n", 41 | "#pid2title={}\n", 42 | "pid=0\n", 43 | "for row in tsv:\n", 44 | " #row[0] pid \n", 45 | " #row[1] title\n", 46 | " dbpid=row[0]\n", 47 | " dbpid2pid[dbpid]=pid\n", 48 | " #pid2title[pid]=nltk.word_tokenize(row[1])\n", 49 | " pid+=1\n", 50 | "\n", 51 | "print \"The number of papers:%d\"%len(dbpid2pid)\n", 52 | "\n", 53 | "#authors\n", 54 | "tsv = csv.reader(file('../data/MajorAuthors.txt'), delimiter = '\\t')\n", 55 | "dbaid2aid={}\n", 56 | "aid2aname={}\n", 57 | "aid=0\n", 58 | "for row in tsv:\n", 59 | " #row[0] dbaid\n", 60 | " #row[1] author name\n", 61 | " dbaid=row[0]\n", 62 | " aname=row[1]\n", 63 | " dbaid2aid[dbaid]=aid\n", 64 | " aid2aname[aid]=aname\n", 65 | " aid+=1\n", 66 | "\n", 67 | "#author-paper\n", 68 | "tsv = csv.reader(file('../data/MajorPaperAuthor.txt'), delimiter = '\\t')\n", 69 | "aid2pids={}\n", 70 | "#iitialize aid2pids\n", 71 | "for aid in 
aid2aname:\n", 72 | " aid2pids[aid]=[]\n", 73 | "#collect aids\n", 74 | "for row in tsv:\n", 75 | " #row[0] dbpid \n", 76 | " #row[1] dbaid\n", 77 | " dbpid=row[0]\n", 78 | " pid=dbpid2pid[dbpid]\n", 79 | " dbaid=row[1]\n", 80 | " aid=dbaid2aid[dbaid]\n", 81 | " aid2pids[aid].append(pid)\n", 82 | "\n", 83 | "author_paper_indices=[]\n", 84 | "author_paper_values=[]\n", 85 | "author_paper_shape=(len(aid2aname), len(dbpid2pid))\n", 86 | "for aid in aid2pids:\n", 87 | " for pid in aid2pids[aid]:\n", 88 | " author_paper_indices.append([aid,pid])\n", 89 | " author_paper_values.append(1)\n", 90 | "indeces=np.array(author_paper_indices).T\n", 91 | "author_paper=csr_matrix((author_paper_values, indeces), shape=author_paper_shape, dtype=np.int32)\n", 92 | "\n", 93 | "print \"The number of authors:%d\"%author_paper.shape[0]" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 3, 99 | "metadata": { 100 | "collapsed": false 101 | }, 102 | "outputs": [], 103 | "source": [ 104 | "co_author_matrix=np.dot(author_paper,author_paper.T)\n", 105 | "for i in xrange(co_author_matrix.shape[0]):\n", 106 | " co_author_matrix[i,i]=0\n", 107 | "co_author_matrix.eliminate_zeros()" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": { 114 | "collapsed": false 115 | }, 116 | "outputs": [ 117 | { 118 | "name": "stdout", 119 | "output_type": "stream", 120 | "text": [ 121 | "ying ding\n", 122 | "jie tang\n", 123 | "0\n", 124 | "7\n", 125 | "7\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "#print dbaid2aid[\"80CF3CF9\"]Ying\n", 131 | "print aid2aname[71929]\n", 132 | "print aid2aname[70282]\n", 133 | "#print co_author_matrix[71929]\n", 134 | "print co_author_matrix[70282,70282]\n", 135 | "print co_author_matrix[71929,70282]\n", 136 | "print co_author_matrix[70282,71929]" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "The column indices for row i are stored in 
indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. " 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 6, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "name": "stderr", 155 | "output_type": "stream", 156 | "text": [ 157 | "100%|██████████| 102587/102587 [10:51<00:00, 157.42it/s]\n" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "adj_mat_csr_sparse=co_author_matrix\n", 163 | "\n", 164 | "def alpha(p,q,t,x):\n", 165 | " if t==x:\n", 166 | " return 1.0/p\n", 167 | " elif adj_mat_csr_sparse[t,x]>0:\n", 168 | " return 1.0\n", 169 | " else:\n", 170 | " return 1.0/q\n", 171 | " \n", 172 | "p=1.0\n", 173 | "q=0.5\n", 174 | " \n", 175 | "transition={}\n", 176 | "\n", 177 | "num_nodes=adj_mat_csr_sparse.shape[0]\n", 178 | "indices=adj_mat_csr_sparse.indices\n", 179 | "indptr=adj_mat_csr_sparse.indptr\n", 180 | "data=adj_mat_csr_sparse.data\n", 181 | "\n", 182 | "#Precompute the transition matrix in advance\n", 183 | "for t in tqdm(xrange(num_nodes)):#t is row index\n", 184 | " for v in indices[indptr[t]:indptr[t+1]]:#i.e possible next nodes from t\n", 185 | " pi_vx_indices=indices[indptr[v]:indptr[v+1]]#i.e possible next nodes from v\n", 186 | " pi_vx_values = np.array([alpha(p,q,t,x) for x in pi_vx_indices])\n", 187 | " pi_vx_values=pi_vx_values*data[indptr[v]:indptr[v+1]]\n", 188 | " #This is equivalent to the following\n", 189 | "# pi_vx_values=[]\n", 190 | "# for x in pi_vx_indices:\n", 191 | "# pi_vx=alpha(p,q,t,x)*adj_mat_csr_sparse[v,x]\n", 192 | "# pi_vx_values.append(pi_vx)\n", 193 | " pi_vx_values=pi_vx_values/np.sum(pi_vx_values)\n", 194 | " #now, we have normalized transition probabilities for v traversed from t\n", 195 | " #the probabilities are stored as a sparse vector. 
\n", 196 | " transition[t,v]=(pi_vx_indices,pi_vx_values)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 80, 202 | "metadata": { 203 | "collapsed": false 204 | }, 205 | "outputs": [], 206 | "source": [ 207 | "adj_mat_csr_sparse=co_author_matrix\n", 208 | "indices=adj_mat_csr_sparse.indices\n", 209 | "indptr=adj_mat_csr_sparse.indptr\n", 210 | "data=adj_mat_csr_sparse.data\n", 211 | "random_walk_length=100\n", 212 | " \n", 213 | "def get_random_walk(p):#p is a dummy argument so the function can be passed to pool.map\n", 214 | " random_walks=[]\n", 215 | " #get random walks\n", 216 | " for u in tqdm(xrange(num_nodes)):\n", 217 | " if len(indices[indptr[u]:indptr[u+1]]) !=0:\n", 218 | " #the first move just depends on edge weight\n", 219 | " possible_next_node=indices[indptr[u]:indptr[u+1]]\n", 220 | " weight_for_next_move=data[indptr[u]:indptr[u+1]]#i.e weights to possible next nodes from u\n", 221 | " weight_for_next_move=weight_for_next_move.astype(np.float32)/np.sum(weight_for_next_move)\n", 222 | " first_walk=np.random.choice(possible_next_node, 1, p=weight_for_next_move)\n", 223 | " random_walk=[u,first_walk[0]]\n", 224 | " for i in xrange(random_walk_length-2):\n", 225 | " cur_node = random_walk[-1]\n", 226 | " previous_node=random_walk[-2]\n", 227 | " (pi_vx_indices,pi_vx_values)=transition[previous_node,cur_node]\n", 228 | " next_node=np.random.choice(pi_vx_indices, 1, p=pi_vx_values)\n", 229 | " random_walk.append(next_node[0])\n", 230 | " random_walks.append(random_walk)\n", 231 | " \n", 232 | " return random_walks\n", 233 | "\n", 234 | "# random_walks=[]\n", 235 | "# adj_mat_csr_sparse=co_author_matrix\n", 236 | "# indices=adj_mat_csr_sparse.indices\n", 237 | "# indptr=adj_mat_csr_sparse.indptr\n", 238 | "# data=adj_mat_csr_sparse.data\n", 239 | "# random_walk_length=100\n", 240 | "\n", 241 | "# #get random walks\n", 242 | "# for u in tqdm(xrange(num_nodes)):\n", 243 | "# if len(indices[indptr[u]:indptr[u+1]]) !=0:\n", 244 | "# #the first move just depends on edge weight\n", 245 | "# 
possible_next_node=indices[indptr[u]:indptr[u+1]]\n", 246 | "# weight_for_next_move=data[indptr[u]:indptr[u+1]]#i.e possible next ndoes from u\n", 247 | "# weight_for_next_move=weight_for_next_move.astype(np.float32)/np.sum(weight_for_next_move)\n", 248 | "# first_walk=np.random.choice(possible_next_node, 1, p=weight_for_next_move)\n", 249 | "# random_walk=[u,first_walk[0]]\n", 250 | "# for i in xrange(random_walk_length-2):\n", 251 | "# cur_node = random_walk[-1]\n", 252 | "# precious_node=random_walk[-2]\n", 253 | "# (pi_vx_indices,pi_vx_values)=transition[precious_node,cur_node]\n", 254 | "# next_node=np.random.choice(pi_vx_indices, 1, p=pi_vx_values)\n", 255 | "# random_walk.append(next_node[0])\n", 256 | "# random_walks.append(random_walk)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 8, 262 | "metadata": { 263 | "collapsed": false 264 | }, 265 | "outputs": [ 266 | { 267 | "name": "stderr", 268 | "output_type": "stream", 269 | "text": [ 270 | "100%|██████████| 102587/102587 [09:57<00:00, 171.67it/s]\n", 271 | "100%|█████████▉| 102211/102587 [10:00<00:02, 182.13it/s]\n", 272 | "100%|██████████| 102587/102587 [10:01<00:00, 170.51it/s]\n", 273 | "100%|██████████| 102587/102587 [10:02<00:00, 170.35it/s]\n", 274 | "100%|██████████| 102587/102587 [10:02<00:00, 170.20it/s]\n", 275 | "100%|██████████| 102587/102587 [10:04<00:00, 169.81it/s]\n", 276 | "100%|██████████| 102587/102587 [10:04<00:00, 169.79it/s]\n", 277 | " 99%|█████████▉| 101988/102587 [10:04<00:03, 195.94it/s]\n", 278 | "100%|██████████| 102587/102587 [10:05<00:00, 169.37it/s]\n", 279 | "100%|██████████| 102587/102587 [10:06<00:00, 169.12it/s]\n", 280 | "100%|██████████| 102587/102587 [10:07<00:00, 188.57it/s]\n", 281 | "100%|██████████| 102587/102587 [10:08<00:00, 168.63it/s]\n", 282 | "100%|██████████| 102587/102587 [10:12<00:00, 167.59it/s]\n", 283 | "100%|██████████| 102587/102587 [10:12<00:00, 167.36it/s]\n", 284 | "100%|██████████| 102587/102587 [10:16<00:00, 
166.35it/s]\n", 285 | "\n", 286 | "100%|██████████| 102587/102587 [10:18<00:00, 165.79it/s]\n", 287 | "100%|██████████| 102587/102587 [10:19<00:00, 165.55it/s]\n", 288 | "100%|██████████| 102587/102587 [10:22<00:00, 164.83it/s]\n", 289 | "100%|██████████| 102587/102587 [10:27<00:00, 163.50it/s]\n" 290 | ] 291 | } 292 | ], 293 | "source": [ 294 | "import multiprocessing as mp\n", 295 | "proc = 20 \n", 296 | "pool = mp.Pool(proc)\n", 297 | "callback = pool.map(get_random_walk, range(20))\n", 298 | "pool.close()\n", 299 | "random_walks=[]\n", 300 | "for temp in callback:\n", 301 | " random_walks.extend(temp)\n", 302 | "del callback" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 68, 308 | "metadata": { 309 | "collapsed": false 310 | }, 311 | "outputs": [], 312 | "source": [ 313 | "np_random_walks=np.array(random_walks,dtype=np.int32)\n", 314 | "del random_walks\n", 315 | "np.save('../work/random_walks.npz',np_random_walks)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 141, 321 | "metadata": { 322 | "collapsed": false 323 | }, 324 | "outputs": [], 325 | "source": [ 326 | "#Computational Graph Definition\n", 327 | "tf.reset_default_graph()#remove this if not ipython notebook\n", 328 | "\n", 329 | "num_nodes=adj_mat_csr_sparse.shape[0]\n", 330 | "context_size=16\n", 331 | "batch_size = None\n", 332 | "embedding_size = 200 # Dimension of the embedding vector.\n", 333 | "num_sampled = 64 # Number of negative examples to sample.\n", 334 | "\n", 335 | "global_step = tf.Variable(0, name='global_step', trainable=False)\n", 336 | "\n", 337 | "# Parameters to learn\n", 338 | "node_embeddings = tf.Variable(tf.random_uniform([num_nodes, embedding_size], -1.0, 1.0))\n", 339 | "\n", 340 | "#Fixedones\n", 341 | "biases=tf.zeros([num_nodes])\n", 342 | "\n", 343 | "# Input data and re-orgenize size.\n", 344 | "with tf.name_scope(\"context_node\") as scope:\n", 345 | " #context nodes to each input node in the batch (e.g 
[[1,2],[4,6],[5,7]] where batch_size = 3,context_size=2)\n", 346 | " train_context_node= tf.placeholder(tf.int32, shape=[batch_size,context_size],name=\"context_node\")\n", 347 | " #organize prediction labels (the skip-gram model predicts context nodes (i.e labels) given an input node)\n", 348 | " #i.e make [[1],[2],[4],[6],[5],[7]] given the context above. The redundant dimension is just a restriction of the tensorflow API.\n", 349 | " train_context_node_flat=tf.reshape(train_context_node,[-1,1])\n", 350 | "with tf.name_scope(\"input_node\") as scope:\n", 351 | " #batch input node to the network (e.g [2,1,3] where batch_size = 3)\n", 352 | " train_input_node= tf.placeholder(tf.int32, shape=[batch_size],name=\"input_node\")\n", 353 | " #organize input as flat. i.e we want to make [2,2,1,1,3,3] given the input nodes above\n", 354 | " input_ones=tf.ones_like(train_context_node)\n", 355 | " train_input_node_flat=tf.reshape(tf.mul(input_ones,tf.reshape(train_input_node,[-1,1])),[-1])\n", 356 | "\n", 357 | "# Model.\n", 358 | "with tf.name_scope(\"loss\") as scope:\n", 359 | " # Look up embeddings for nodes.\n", 360 | " node_embed = tf.nn.embedding_lookup(node_embeddings, train_input_node_flat)\n", 361 | " # Compute the softmax loss, using a sample of the negative labels each time.\n", 362 | " loss_node2vec = tf.reduce_mean(tf.nn.sampled_softmax_loss(node_embeddings,biases,node_embed,train_context_node_flat, num_sampled, num_nodes))\n", 363 | " loss_node2vec_summary = tf.scalar_summary(\"loss_node2vec\", loss_node2vec)\n", 364 | "\n", 365 | "#logits_cp=tf.matmul(node_embed,node_embeddings_T)\n", 366 | "#loss_ww=tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits_cp,train_cp_labels,name='xentropy'))\n", 367 | "#loss_ww_summary = tf.scalar_summary(\"loss_cp\", loss_cp)\n", 368 | "\n", 369 | "# Initializing the variables\n", 370 | "init = tf.initialize_all_variables()\n", 371 | "\n", 372 | "# Add ops to save and restore all the variables.\n", 373 | "saver = 
tf.train.Saver(max_to_keep=20)\n", 374 | "\n", 375 | "# Optimizer.\n", 376 | "update_loss = tf.train.AdamOptimizer().minimize(loss_node2vec,global_step=global_step)\n", 377 | "\n", 378 | "merged = tf.merge_all_summaries()" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": null, 384 | "metadata": { 385 | "collapsed": false 386 | }, 387 | "outputs": [ 388 | { 389 | "name": "stdout", 390 | "output_type": "stream", 391 | "text": [ 392 | "\u001b[0m\u001b[01;34mcode\u001b[0m/ \u001b[01;34mdata\u001b[0m/ \u001b[01;34mlog_node2vec1\u001b[0m/ \u001b[01;34mresults\u001b[0m/ \u001b[01;34mwork\u001b[0m/\r\n" 393 | ] 394 | } 395 | ], 396 | "source": [ 397 | "%ls ../\n", 398 | "%rm -rf ../log_node2vec1" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": null, 404 | "metadata": { 405 | "collapsed": false, 406 | "scrolled": false 407 | }, 408 | "outputs": [ 409 | { 410 | "name": "stdout", 411 | "output_type": "stream", 412 | "text": [ 413 | "Model saved in file: ../log_node2vec1/model.ckpt-1\n" 414 | ] 415 | } 416 | ], 417 | "source": [ 418 | "# hyper parameters\n", 419 | "num_random_walks=np_random_walks.shape[0]\n", 420 | "\n", 421 | "# Launch the graph\n", 422 | "# Initializing the variables\n", 423 | "init = tf.initialize_all_variables()\n", 424 | "\n", 425 | "with tf.Session() as sess:\n", 426 | " log_dir=\"../log_node2vec1/\"\n", 427 | " writer = tf.train.SummaryWriter(log_dir, sess.graph)\n", 428 | " sess.run(init)\n", 429 | " for i in xrange(0,num_random_walks):\n", 430 | " a_random_walk=np_random_walks[i]\n", 431 | " train_input_batch = np.array([a_random_walk[j] for j in xrange(random_walk_length-context_size)])\n", 432 | " train_context_batch = np.array([a_random_walk[j+1:j+1+context_size] for j in xrange(random_walk_length-context_size)])\n", 433 | " feed_dict={train_input_node:train_input_batch,\n", 434 | " train_context_node:train_context_batch,\n", 435 | " } \n", 436 | " 
_,loss_value,summary_str=sess.run([update_loss,loss_node2vec,merged], feed_dict)\n", 437 | " writer.add_summary(summary_str,i)\n", 438 | "\n", 439 | " with open(log_dir+\"loss_value.txt\", \"a\") as f:\n", 440 | " f.write(str(loss_value)+'\\n') \n", 441 | " \n", 442 | " # Save the variables to disk.\n", 443 | " if i%10000==0:\n", 444 | " model_path=log_dir+\"model.ckpt\"\n", 445 | " save_path = saver.save(sess, model_path,global_step)\n", 446 | " print(\"Model saved in file: %s\" % save_path)" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "tensorboard --logdir=./log_node2vec1" 454 | ] 455 | } 456 | ], 457 | "metadata": { 458 | "kernelspec": { 459 | "display_name": "Python 2", 460 | "language": "python", 461 | "name": "python2" 462 | }, 463 | "language_info": { 464 | "codemirror_mode": { 465 | "name": "ipython", 466 | "version": 2 467 | }, 468 | "file_extension": ".py", 469 | "mimetype": "text/x-python", 470 | "name": "python", 471 | "nbconvert_exporter": "python", 472 | "pygments_lexer": "ipython2", 473 | "version": "2.7.11" 474 | } 475 | }, 476 | "nbformat": 4, 477 | "nbformat_minor": 0 478 | } 479 | -------------------------------------------------------------------------------- /code/node2vec_network_preprosess.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 36, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "from gensim import corpora\n", 12 | "import numpy as np\n", 13 | "import unicodecsv as csv\n", 14 | "import tensorflow as tf\n", 15 | "import math\n", 16 | "import os,sys\n", 17 | "import random\n", 18 | "from scipy.sparse import csr_matrix\n", 19 | "from tqdm import tqdm\n", 20 | "import networkx as nx\n", 21 | "import json" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 16, 27 | "metadata": { 28 | "collapsed": false 29 | 
}, 30 | "outputs": [ 31 | { 32 | "name": "stdout", 33 | "output_type": "stream", 34 | "text": [ 35 | "The number of papers:88675\n", 36 | "The number of authors:102587\n" 37 | ] 38 | } 39 | ], 40 | "source": [ 41 | "tsv = csv.reader(file('../data/MajorPapers.txt'), delimiter = '\\t')\n", 42 | "dbpid2pid={}\n", 43 | "#pid2dbpid={}\n", 44 | "#pid2title={}\n", 45 | "pid=0\n", 46 | "for row in tsv:\n", 47 | " #row[0] pid \n", 48 | " #row[1] title\n", 49 | " dbpid=row[0]\n", 50 | " dbpid2pid[dbpid]=pid\n", 51 | " #pid2title[pid]=nltk.word_tokenize(row[1])\n", 52 | " pid+=1\n", 53 | "\n", 54 | "print \"The number of papers:%d\"%len(dbpid2pid)\n", 55 | "\n", 56 | "#authors\n", 57 | "tsv = csv.reader(file('../data/MajorAuthors.txt'), delimiter = '\\t')\n", 58 | "dbaid2aid={}\n", 59 | "aid2aname={}\n", 60 | "aid=0\n", 61 | "for row in tsv:\n", 62 | " #row[0] dbaid\n", 63 | " #row[1] author name\n", 64 | " dbaid=row[0]\n", 65 | " aname=row[1]\n", 66 | " dbaid2aid[dbaid]=aid\n", 67 | " aid2aname[aid]=aname\n", 68 | " aid+=1\n", 69 | "\n", 70 | "#author-paper\n", 71 | "tsv = csv.reader(file('../data/MajorPaperAuthor.txt'), delimiter = '\\t')\n", 72 | "aid2pids={}\n", 73 | "#iitialize aid2pids\n", 74 | "for aid in aid2aname:\n", 75 | " aid2pids[aid]=[]\n", 76 | "#collect aids\n", 77 | "for row in tsv:\n", 78 | " #row[0] dbpid \n", 79 | " #row[1] dbaid\n", 80 | " dbpid=row[0]\n", 81 | " pid=dbpid2pid[dbpid]\n", 82 | " dbaid=row[1]\n", 83 | " aid=dbaid2aid[dbaid]\n", 84 | " aid2pids[aid].append(pid)\n", 85 | "\n", 86 | "author_paper_indices=[]\n", 87 | "author_paper_values=[]\n", 88 | "author_paper_shape=(len(aid2aname), len(dbpid2pid))\n", 89 | "for aid in aid2pids:\n", 90 | " for pid in aid2pids[aid]:\n", 91 | " author_paper_indices.append([aid,pid])\n", 92 | " author_paper_values.append(1)\n", 93 | "indeces=np.array(author_paper_indices).T\n", 94 | "author_paper=csr_matrix((author_paper_values, indeces), shape=author_paper_shape, dtype=np.int32)\n", 95 | "\n", 96 | "print 
\"The number of authors:%d\"%author_paper.shape[0]" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 3, 102 | "metadata": { 103 | "collapsed": false 104 | }, 105 | "outputs": [], 106 | "source": [ 107 | "co_author_matrix=np.dot(author_paper,author_paper.T)\n", 108 | "for i in xrange(co_author_matrix.shape[0]):\n", 109 | " co_author_matrix[i,i]=0\n", 110 | "co_author_matrix.eliminate_zeros()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 26, 116 | "metadata": { 117 | "collapsed": false 118 | }, 119 | "outputs": [ 120 | { 121 | "name": "stdout", 122 | "output_type": "stream", 123 | "text": [ 124 | "ying ding\n", 125 | "jie tang\n", 126 | "0\n", 127 | "7\n", 128 | "7\n" 129 | ] 130 | } 131 | ], 132 | "source": [ 133 | "#print dbaid2aid[\"80CF3CF9\"]Ying\n", 134 | "print aid2aname[71929]\n", 135 | "print aid2aname[70282]\n", 136 | "#print co_author_matrix[71929]\n", 137 | "print co_author_matrix[70282,70282]\n", 138 | "print co_author_matrix[71929,70282]\n", 139 | "print co_author_matrix[70282,71929]" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 29, 145 | "metadata": { 146 | "collapsed": false 147 | }, 148 | "outputs": [], 149 | "source": [ 150 | "G = nx.from_scipy_sparse_matrix(co_author_matrix)\n", 151 | "G_max=max(nx.connected_component_subgraphs(G), key=len)\n", 152 | "new_co_author_matrix=nx.adjacency_matrix(G_max)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 31, 158 | "metadata": { 159 | "collapsed": false 160 | }, 161 | "outputs": [], 162 | "source": [ 163 | "new_aid2old_aid={}\n", 164 | "new_aid2name={}\n", 165 | "for new_aid,old_aid in enumerate(G_max.nodes()):#aidがそのまま残っている。インデックスとしてはバラバラ\n", 166 | " new_aid2old_aid[new_aid]=old_aid\n", 167 | " new_aid2name[new_aid]=aid2aname[old_aid]" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 43, 173 | "metadata": { 174 | "collapsed": false 175 | }, 176 | "outputs": [], 
177 | "source": [ 178 | "new_aid2dabid={}\n", 179 | "aid2dbaid=dict((v, k) for k, v in dbaid2aid.iteritems())\n", 180 | "for new_aid,old_aid in new_aid2old_aid.iteritems():\n", 181 | " dbaid=aid2dbaid[old_aid]\n", 182 | " new_aid2dabid[new_aid]=dbaid" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 38, 188 | "metadata": { 189 | "collapsed": true 190 | }, 191 | "outputs": [], 192 | "source": [ 193 | "#ref http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format\n", 194 | "\n", 195 | "def save_sparse_csr(filename,array):\n", 196 | " np.savez(filename,data = array.data ,indices=array.indices,\n", 197 | " indptr =array.indptr, shape=array.shape )\n", 198 | "\n", 199 | "def load_sparse_csr(filename):\n", 200 | " loader = np.load(filename)\n", 201 | " return csr_matrix(( loader['data'], loader['indices'], loader['indptr']),\n", 202 | " shape = loader['shape'])" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 40, 208 | "metadata": { 209 | "collapsed": false 210 | }, 211 | "outputs": [], 212 | "source": [ 213 | "save_sparse_csr(\"../data/co-author-matrix.npz\",new_co_author_matrix)\n", 214 | "with open('../data/co-author-index.json', 'w') as f:\n", 215 | " json.dump(new_aid2name, f, sort_keys=True, indent=4)" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 44, 221 | "metadata": { 222 | "collapsed": true 223 | }, 224 | "outputs": [], 225 | "source": [ 226 | "with open('../data/co-author-original-ids.json', 'w') as f:\n", 227 | " json.dump(new_aid2dabid, f, sort_keys=True, indent=4)" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 41, 233 | "metadata": { 234 | "collapsed": false 235 | }, 236 | "outputs": [ 237 | { 238 | "data": { 239 | "text/plain": [ 240 | "(74530, 74530)" 241 | ] 242 | }, 243 | "execution_count": 41, 244 | "metadata": {}, 245 | "output_type": "execute_result" 246 | } 247 | ], 248 | "source": [ 249 | 
"new_co_author_matrix.shape" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "collapsed": true 257 | }, 258 | "outputs": [], 259 | "source": [] 260 | } 261 | ], 262 | "metadata": { 263 | "kernelspec": { 264 | "display_name": "Python 2", 265 | "language": "python", 266 | "name": "python2" 267 | }, 268 | "language_info": { 269 | "codemirror_mode": { 270 | "name": "ipython", 271 | "version": 2 272 | }, 273 | "file_extension": ".py", 274 | "mimetype": "text/x-python", 275 | "name": "python", 276 | "nbconvert_exporter": "python", 277 | "pygments_lexer": "ipython2", 278 | "version": "2.7.11" 279 | } 280 | }, 281 | "nbformat": 4, 282 | "nbformat_minor": 0 283 | } 284 | -------------------------------------------------------------------------------- /code/pre_compute_walks.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | #!/usr/bin/env python 3 | 4 | __author__ = "satoshi tsutsui" 5 | 6 | import numpy as np 7 | import ad_hoc_functions 8 | import argparse 9 | 10 | parser = argparse.ArgumentParser() 11 | parser.add_argument('--graph', type=str, default="../data/co-author-matrix.npz",help=u"numpy serialized scipy.sparse.csr_matrix. 
See http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format") 12 | parser.add_argument('--walks', type=str, default='../work/random_walks.npz' ,help=u"path to save numpy serialized random walks.") 13 | parser.add_argument('--p', type=float, default=1.0 ,help=u"Node2vec parameter p") 14 | parser.add_argument('--q', type=float, default=0.5 ,help=u"Node2vec parameter q") 15 | args = parser.parse_args() 16 | 17 | print("loading adjacency matrix") 18 | 19 | file_csr_matrix=args.graph 20 | adj_mat_csr_sparse=ad_hoc_functions.load_sparse_csr(file_csr_matrix) 21 | p=args.p 22 | q=args.q 23 | 24 | print("computing transition probabilities") 25 | transition = ad_hoc_functions.compute_transition_prob(adj_mat_csr_sparse,p,q) 26 | 27 | print("generating random walks") 28 | random_walk_length=100 29 | random_walks = ad_hoc_functions.generate_random_walks(adj_mat_csr_sparse,transition,random_walk_length) 30 | 31 | #This is only for one epoch. If you want to generate two epochs: 32 | # random_walks1 = ad_hoc_functions.generate_random_walks(adj_mat_csr_sparse,transition,random_walk_length) 33 | # random_walks2 = ad_hoc_functions.generate_random_walks(adj_mat_csr_sparse,transition,random_walk_length) 34 | # random_walks = random_walks1 + random_walks2 #note: list.extend() returns None, so concatenate with + 35 | 36 | np_random_walks=np.array(random_walks,dtype=np.int32) 37 | np.savez(args.walks,np_random_walks) 38 | -------------------------------------------------------------------------------- /code/random_walk.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 316, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "%matplotlib inline\n", 12 | "import numpy as np\n", 13 | "import random\n", 14 | "import networkx as nx\n", 15 | "import matplotlib.pylab as plt" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | 
"source": [ 22 | "# Random Walk prectice" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 317, 28 | "metadata": { 29 | "collapsed": false 30 | }, 31 | "outputs": [ 32 | { 33 | "name": "stdout", 34 | "output_type": "stream", 35 | "text": [ 36 | "[[0 1 0 0]\n", 37 | " [1 0 3 1]\n", 38 | " [0 3 0 2]\n", 39 | " [0 1 2 0]]\n" 40 | ] 41 | }, 42 | { 43 | "data": { 44 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAADtCAYAAAAcNaZ2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlU1PX+x/HnDJtSSmySEgK/FCQIBdTr7kn0ZCluabZc\nWzStsCxTNPNWmpWilJE5pql1zVJMMNGLZuaKdk1wBXeBCNzYNdkG5vP7w8skibjhDMv7cQ7H/C7M\n5+s5vfjw/n4WjVIKIYQQpqE1dwOEEKIhkdAVQggTktAVQggTktAVQggTktAVQggTktAVQggTsqzu\npEajkfFkQghxG5RSmqqOVxu6/7ux5lsjhBD1mEZTZd4CUl4QQgiTktAVQggTktAVQggTktAVQggT\nktAVQggTktAVQggTktAVQggTktAVQggTktAVQggTktAVQggTktAVQggTktAVQggTktAVQggTktAV\nQggTktAVQggTktAVQggTktAVQggTktAVQggTktAVQggTuuEeafVRQUEBOTk5ADg6OmJnZ2fmFgkh\nGooG09MtKSlhxYoVdO/eHVdXV4KDgwkODsbV1ZXu3buzYsUKSktLzd1MIUQ91yBCNyoqCnd3d5Yu\nXcpbb71Ffn4+qamppKamkpeXx/jx41myZAktW7YkKirK3M0VQtRjmuq2WNdoNKqub8H++eefExER\nwZo1awgKCqr22sTERAYPHszEiRMZN26ciVoohKhvNBoNSqkq92Gv1z3dqKgoIiIiiI+PNwZuRkYG\nQ4cO5b777sPOzo4nnniCP/74A4CgoCDi4+OJiIiQHq8Q4q6otz3dkpIS3N3diYuLIzAwEICioiL8\n/f1p3LgxH330EQBTp06lqKiIQ4cO0bhxY+BKj7dfv36kp6djbW1ttmcQQtRNDbKnGxMTg5+fnzFw\nARYtWkRaWhpr164lJCSEkJAQYmNjSUtLY+HChcbrgoKC8PX1JSYmxhxNF0LUY/U2dHU6HaGhoZWO\nrVu3jk6dOuHp6Wk85uHhQdeuXVm7dm2la0NDQ9HpdCZpqxCi4aiXoVtQUMD+/fsZMGBApePJycn4\n+fldc72vry9HjhypdGzAgAHs27ePgoKCu9pWIUTDUi9DNycnB2dnZywtK8/9yM3Nxd7e/prrHRwc\nyMvLq3TMysoKJycncnNz72pbhRANS70MXSGEqK3qZeg6OjqSlZWFXq+vdNze3v6aHi1U3QPW6/Vk\nZ2fj4OBwV9sqhGhY6mXo2tnZERAQwLp16yod9/X1JTk5+Zrrjxw5wkMPPVTpWGxsLIGBgbIugxCi\nRtXL0IWqRx8MGDCA//73v6SlpRmPpaWlsWvXLgYOHFjp2vnz518z+kEIIe5Ug5ocUVhYSLt27Wjc\nuDEzZswA4L333uPy5cscPHgQW1tb4MrkiB49ehAWFkZYWBj33HOP2Z5DCFH3NMjJETY2NkRGRjJo\n0CDS09MBsLW1ZcuWLXh5efHcc88xYsQIHnzwQX755Rdj4KanpzN48GBmzpzJsWPHaN26NQsXLrym\nPiyEELdFKXXdryun67bIyEjl5uamEhISbnhtQ
kKCcnNzU5GRkcZje/fuVb169VJeXl4qOjpaGQyG\nu9lcIUQ98L/srDpXr3dC1ZPQVUqplStXKhcXFxUcHKyio6OVXq83nistLVWrV69WvXr1Ui4uLmrl\nypXX3G8wGNTGjRtV27ZtVadOndSOHTtM2XwhRB1TXejW25ru35WWlhITE4NOp2Pfvn04OTkBkJ2d\nTWBgIKGhoQwZMqTaBW4MBgPff/89//rXv/D392fmzJn4+vqa6hGEEHVEdTXdBhO6VysoKDDONHNw\ncLjlYWElJSXodDpmzpxJSEgI06dP54EHHrgbTRVC1EEN8kVadezs7PD09MTT0/O2xuHa2Ngwfvx4\nTp48iYuLC23btmXy5MlVTrwQQoirNcjQrSl2dnZ8/PHHHDp0iNzcXLy9vYmIiKC4uNjcTRNC1FIS\nujXA1dWVr776im3bthEfH4+3tzfLli2jvLzc3E0TQtQyDbKme7fFx8czefJkLl26RHh4OH379kWj\nqbK8I4Soh+RFmhkopVi7di1Tpkzh/vvvZ/bs2XTo0MHczRJCmIC8SDMDjUbDoEGDOHz4MM888wyD\nBg3iySef5NSpU+ZumhDCjCR07zJLS0tGjx7NyZMnadeuHZ06dWLs2LGcP3/e3E0TQpiBhK6J2Nra\n8s4773Ds2DGsra156KGHmDZtGpcuXTJ304QQJiSha2JOTk7MnTuXhIQETp06hZeXF/Pnz7/pBXWk\nxi5E3Sahayaenp4sX76cuLg4YmNjeeihh1i1atUNQ1VGQQhRt0nomllAQAA//fQTCxYsIDw8nFGj\nRl13fG9ZWRnh4eEEBgYSFxdn4pYKIWqCDBmrRQwGA/n5+dx3331otZV/Hur1etavX8+nn37KpEmT\ncHR0pEuXLpw9e5bmzZubqcVCiKrIkLE6QqvV4uDgcE3gAmRkZBAXF8ezzz5LSEgIXbp0IT8/n3Hj\nxrFr1y4ztFYIcTsszd0AUb1ff/2VnTt3UlhYyMWLFxk1apTx3LfffkurVq1o2bIlcKWnXFVgCyFq\nD/k/tJZzcXFBp9Nx77338sEHH2BlZWWs+bZo0YK+fftiZWUFYAxcg8FgtvYKIaonNd06YMaMGTRr\n1oyXX365yvNffvklqamphIeHm7hlQoiqSE23DktKSmL//v24ubkBf43TvfqH4bJlywgODqa4uJix\nY8cSHx9f5XVCCPOT0K2lKsJyz549WFpa8vjjjwN/jdOt+HPnzp34+PjQtm1bxo8fz9dff01hYSEa\njYbU1NSKn7iVvqcQwnwkdGupilBVSrF//37Onz9f5fjd3Nxc/P39mTRpEuXl5Xz88ce0aNGCbdu2\n0bVrV/Lz8ysFtcFgkPAVwowkdGu5l156iUWLFmFpaYmFhYXxeMXLsiNHjqDT6bCxseGDDz4gNzeX\n2NhYNm7cyJIlS7C0tCQ2NpZJkyaRlZWFVqs1hvDly5fN8kxCNGQSunXAI488gqOjY6VjFSMVdu/e\nTUBAAJ999hnJycksXbqUtLQ0evXqRXBwMI888giXL1/G09OTRx55hLVr1wJXQjsqKorY2Fjp+Qph\nQjJ6oR4oKirCxsaGZ555hh9//JEff/yRvn378tFHH/HRRx/Ro0cPpk+fDlypEY8bN44zZ87QokUL\nMjMzcXV1RSkl6zoIUUOqG70gkyPqsPLyciwsLGjcuDEAzz77LP7+/vTt25fy8nKWLVvG/v37KSgo\nYOrUqRw+fJg+ffpQVlbGRx99RNu2bY2TLSoCVyZYCHF3SejWYVfXeAFCQkIICQkBrryAe+yxx7hw\n4QLdu3fn559/Zv369XTq1Int27cD0L59ewB+++03Ll26RHBwMFqt1hjmQoiaJ12aeuTqUpClpSU+\nPj6MGzeOiIgI0tPT6d+/P8XFxWzcuBEvLy8CAgLo3bs3Cxcu5LXXXmPq1Kmkp6djYWGBwWBgw4YN\nHD582IxPJ
[base64 PNG image data omitted]\n", 45 | "text/plain": [ 46 | "" 47 | ] 48 | }, 49 | "metadata": {}, 50 | "output_type": "display_data" 51 | } 52 | ], 53 | "source": [ 54 | "adj_mat=np.array([[0,1,0,0],\n", 55 | " [1,0,3,1],\n", 56 | " [0,3,0,2],\n", 57 | " [0,1,2,0]])\n", 58 | "\n", 59 | "print(adj_mat)\n", 60 | "\n", 61 | "G = nx.Graph(adj_mat)\n", 62 | "\n", 63 | "pos = nx.spring_layout(G)\n", 64 | "\n", 65 | "nx.draw_networkx_nodes(G, pos, 
node_color=\"w\")\n", 66 | "nx.draw_networkx_edges(G, pos, width=1)\n", 67 | "nx.draw_networkx_edge_labels(G, pos)\n", 68 | "nx.draw_networkx_labels(G, pos, font_size=16, font_color=\"black\")\n", 69 | "\n", 70 | "plt.xticks([])\n", 71 | "plt.yticks([])\n", 72 | "plt.show()" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 318, 78 | "metadata": { 79 | "collapsed": false 80 | }, 81 | "outputs": [ 82 | { 83 | "data": { 84 | "text/plain": [ 85 | "array([[ 0. , 1. , 0. , 0. ],\n", 86 | " [ 0.30151134, 0. , 0.90453403, 0.30151134],\n", 87 | " [ 0. , 0.83205029, 0. , 0.5547002 ],\n", 88 | " [ 0. , 0.4472136 , 0.89442719, 0. ]])" 89 | ] 90 | }, 91 | "execution_count": 318, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "# normalize each row of the matrix by its L2 norm\n", 98 | "\n", 99 | "norms = np.apply_along_axis(np.linalg.norm, 1, adj_mat)  # row-wise L2 norms\n", 100 | "adj_mat_norm = adj_mat / norms.reshape(-1,1)\n", 101 | "adj_mat_norm" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 319, 107 | "metadata": { 108 | "collapsed": false 109 | }, 110 | "outputs": [ 111 | { 112 | "data": { 113 | "text/plain": [ 114 | "array([[ 0. , 1. , 0. , 0. ],\n", 115 | " [ 0.2 , 0. , 0.6 , 0.2 ],\n", 116 | " [ 0. , 0.6 , 0. , 0.4 ],\n", 117 | " [ 0. , 0.33333333, 0.66666667, 0. 
]])" 118 | ] 119 | }, 120 | "execution_count": 319, 121 | "metadata": {}, 122 | "output_type": "execute_result" 123 | } 124 | ], 125 | "source": [ 126 | "# normalize each row of the matrix by its L1 norm\n", 127 | "# this gives the transition matrix of a simple random walk\n", 128 | "\n", 129 | "sums = np.sum(adj_mat,axis=1).astype(np.float32)\n", 130 | "adj_mat_trans = adj_mat / sums.reshape(-1,1)\n", 131 | "adj_mat_trans" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 334, 137 | "metadata": { 138 | "collapsed": false 139 | }, 140 | "outputs": [ 141 | { 142 | "data": { 143 | "text/plain": [ 144 | "[1, 3, 1, 2, 1, 2, 3, 1, 2, 3]" 145 | ] 146 | }, 147 | "execution_count": 334, 148 | "metadata": {}, 149 | "output_type": "execute_result" 150 | } 151 | ], 152 | "source": [ 153 | "# generate a pure random walk\n", 154 | "\n", 155 | "random_walk=[np.random.choice([0,1,2,3])]\n", 156 | "random_walk_length=10\n", 157 | "for i in range(random_walk_length-1):\n", 158 | " cur_node = random_walk[-1]\n", 159 | " next_node=np.random.choice([0,1,2,3], 1, p=adj_mat_trans[int(cur_node)])\n", 160 | " random_walk.append(next_node[0])\n", 161 | "\n", 162 | "random_walk" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 335, 168 | "metadata": { 169 | "collapsed": false 170 | }, 171 | "outputs": [ 172 | { 173 | "data": { 174 | "text/plain": [ 175 | "[0, 1, 3, 2, 0, 1, 3, 2, 3, 2]" 176 | ] 177 | }, 178 | "execution_count": 335, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "# generate a random walk with restart\n", 185 | "\n", 186 | "random_walk=[np.random.choice([0,1,2,3])]\n", 187 | "random_walk_length=10\n", 188 | "restart_prob=0.2\n", 189 | "start=random_walk[-1]\n", 190 | "for i in range(random_walk_length-1):\n", 191 | " cur_node = random_walk[-1]\n", 192 | " if np.random.choice([True,False], 1, p=[restart_prob, 1-restart_prob])[0]:\n", 193 | " next_node=start\n", 194 | " else:\n", 195 | " next_node=np.random.choice([0,1,2,3], 1, 
p=adj_mat_trans[int(cur_node)])\n", 196 | " random_walk.append(int(next_node))\n", 197 | " \n", 198 | "random_walk" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": { 204 | "collapsed": true 205 | }, 206 | "source": [ 207 | "# node2vec walk" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "node2vec has a parameterized strategy to generate random walks (see Section 3.2.2 in the node2vec paper). \n", 215 | "\n", 216 | "Suppose the walk just traversed the edge t -> v, i.e., it moved from node t to node v. \n", 217 | "The unnormalized transition probabilities are $ \\pi_{vx} = \\alpha_{pq}(t,x)w_{vx} $ where \n", 218 | "$\n", 219 | " \\alpha_{pq}(t,x) = \\left\\{ \\begin{array}{ll}\n", 220 | " 1/p & if \\ d_{t,x}=0 \\\\\n", 221 | " 1 & if \\ d_{t,x}=1 \\\\\n", 222 | " 1/q & if \\ d_{t,x}=2 \\\\\n", 223 | " \\end{array} \\right.\n", 224 | "$\n", 225 | " \n", 226 | "Here $d_{t,x}$ is the shortest-path distance between nodes t and x, so $d_{t,x} \\in \\{0,1,2\\}$. Note that the transition probability for a random walker at node v now depends on the previous node t, in addition to v's connections.\n", 227 | " \n", 228 | "Now, let's consider the following example." 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 322, 234 | "metadata": { 235 | "collapsed": false 236 | }, 237 | "outputs": [ 238 | { 239 | "data": { 240 | "image/png": 
"[base64 PNG image data omitted]
dkZaWppSVcspGdbpK\nsHTpUvj6+gJ4s8Pu48ePkZGRgW+//RZ9+/bF4sWLYWdnh5s3b+Kzzz7DoEGDcOvWLRw4cIB7D/mD\nNoCW8JLaqVWrVti4cSOSk5Ohra2NNm3aVJlwgb+3hXp7OyiBQAAbGxuEhYXVaLy1UmXzDozmdD9Y\namoqa9++vUID8cePHytcs3TpUrZs2TLWs2dP9uDBA5adnc2Cg4NpvpbUWU5OTiw0NLTCc1KplIlE\nIvbw4UPm6enJzM3N2atXrxSuOXz4MHN2dlZGqEoHmtOtWTKZDKNHj0avXr0wa9YshXPyfa7y8vJw\n4cIFDBkyBL/99hvGjRun8Hoa2ZK6RCgUwtzcHPn5+RVuaNqjRw/ExsYCANq1a4eIiAh06NBB4Rqx\nWAxDQ0NkZmZCX19fKXErC00v1DAej4dJkybh3LlziI+PB1B+h11DQ0MMHjwY58+fV0i4AE0lkLon\nJycHJiYmle4gvX//fty4cQNBQUHQ09ODm5sb0tLSFK7R1NSEsbExcnNzlRFyrUHf7dWAx+PB3t4e\nPXr0wKFDh7hjFXFxcQFQfhsWQtRJhw4d0KNHDwwfPhxnzpxBYWEh/vvf/6o6rFqBkm41MTQ0RGlp\nKbS0tPDq1SsAFe8BJU/GNLoldZmRkRFevXoFsVj83mv19fXRtm1brg2pnFgsRnZ2Nvegrb6g7/x/\nQZ5UGWNIS0tDVFQUHjx4gClTpiAhIQE8Hq9aOzgRUlvo6+uje/fu3LZRVcnKykJSUhLatm2rcDwi\nIgL29vZqN5/7PvQg7SOdPXsWmZmZGD58OBo2bMgdLy4uxs8//4wnT57A398fL1++pI5gRC0FBQUh\nMDAQZ86c4Y55enrC3t4eXbt2hZ6eHh48eIDNmzfj5cuXuHHjhkLidXV1xZQpU7jdi9UJPUirRvn5\n+Zg8eTImTJiAZs2aKSRcANDW1sbIkSMhEonQsWNHnDt3TkWRElKzPD09cffuXcTFxXHHHB0dcfTo\nUYwfPx5ff/01Nm/ejM8//xzx8fEKCTc2Nhb37t2Dp6enKkJXrcpqyRjV6ZZz5MgRZm5uzr777rv3\nNp8pKipiaWlpSoqMENUIDg5mFhYWLDU19YNfk5qayiwsLFhwcHANRqZaqKJOlzam/AAvX77EzJkz\nER8fjwMHDnAVCFVp1KgRGjVqpIToCFGd4cOHIysrC05OTggPD4dAIKjy+tjYWAwePBjz5s1T274L\n70PTC1VgjGH//v2wtbVFq1atcPv27Q9KuITUJ97e3vD19YW7uzvc3NwQFhYGiUTCnReLxQgNDYWr\nqyvc3d3h6+vL9Y2uj+hBWiXS09Mxbdo0ZGRkIDAwEA4ODqoOiZBaTSQSISwsDP7+/oiLi4OxsTEA\nIDs7G/b29vDy8oKnp6daNrh5F3UZ+wdkMhl++eUXLFu2DLNmzcL3339fL/6REFKdhEIht9KsadOm\n9a8srIqkq5ZzukKhEDk5OQDeFHF/6F/4o0ePMHnyZIhEIly8eBGdO3euyTAJUVv6+vr1LtF+KLWZ\n0y0rK0NQUBCcnZ1hbm4OV1dXuLq6wtzcHM7OzggKCoJIJKrwtRKJBL6+vnB0dISnpyeio6Mp4RJC\naoRaJN2QkBBYWVlh9+7d8PHxQX5+PpKTk5GcnIy8vDzMmTMHgYGBXKf7t925cweOjo6IiorCzZs3\nMWvWLK6ROCGEVLvKaslYHanTfXt/JsYYS09PZ0OGDGH6+vpMT0+PeXp6cvWyb+/PVFpaypYuXcqM\njY3Zr7/+ymQymSpvgxCiRqCu/XTf3Z+ppKQEXbt2hY6ODlavXg0AWLx4MUpKSnDnzh3o6OggLS0N\nn376KQDAwcEB/v7+aNGihSpvgxCiZtRyC/bS0lJmamrKYmNjuWObN29mDRo0YE+fPuWOJScnswYN\nGrBNmzZxx2JiYljTpk1ZaWmpUmMmhNQ
PqGKkW2fndMPCwtClSxeF/ZmOHTuGTz75BK1bt+aOtWrV\nCr1798bRo0e5YwKBAHZ2dggPD1dqzIQQUmeTrr+/P7y8vBSO3bt3D126dCl3rY2NDRITExWOeXl5\nwd/fv0ZjJISQd9XJpCsUChEfHw8PDw+F47m5uTA0NCx3fdOmTZGXl6dwzMPDA3FxcRAKhTUaKyGE\nvK1OJt337c/0Ierr/kyEENWqk0m3MoaGhuVGtEDlI2BCCFG2Opl0K9ufycbGBvfu3St3fWJiYrkV\nZvV1fyZCiGrVyaRb2f5MHh4euH79OlJSUrhjKSkpuHLlCgYNGqRwbX3dn4kQolp1dnFERfszFRcX\nw87ODjo6Oli5ciUAYNmyZSgqKsLt27cVmoqr8/5MhBDVUsvWjmVlZbCyssLJkycVanUzMjIwZ84c\nnD59GowxuLm5YdOmTbC0tOSuiY2Nhbu7O9LS0qhtIyGk2qll0gXKLwP+EGlpaXBycoKvr2+93S6E\nEFKz1HY34OHDh2PevHlwcnJCbGzse6+PjY2Fk5NTvd6fiRCiWnV6pCsXEhKCWbNmoUuXLvDy8oKH\nhwdXwysWixEREQF/f3/cu3cPfn5+lHAJITVKbacX3kb7MxFCaot6kXTfVt/3ZyKEqFa9S7qEEKJK\navsgjRBC6hpKuoQQokSUdAkhRIko6RJCiBJR0iWEECWipEsIIUpESZcQQpSIki4hhCgRJV1CCFEi\nSrqEEKJElHQJIUSJKOkSQogSUdIlhBAloqRLCCFKREmXEEKUiJIuIYQoESVdQghRIkq6hBCiRJR0\nCSFEiSjpEkKIElHSJYQQJaKkSwghSkRJlxBClIiSLiGEKBElXUIIUSJKuoQQokQN3ncBj8dTRhyE\nEFIv8Bhjqo6BEELqDZpeIIQQJaKkSwghSkRJlxBClIiSLiGEKBElXUIIUaL/B7CYbXBPOgvbAAAA\nAElFTkSuQmCC\n", 241 | "text/plain": [ 242 | "" 243 | ] 244 | }, 245 | "metadata": {}, 246 | "output_type": "display_data" 247 | } 248 | ], 249 | "source": [ 250 | "adj_mat=np.array([[0,1,0,0,0],\n", 251 | " [1,0,1,1,1],\n", 252 | " [0,1,0,1,0],\n", 253 | " [0,1,1,0,0],\n", 254 | " [0,1,0,0,0]])\n", 255 | "\n", 256 | "G = nx.Graph(adj_mat)\n", 257 | "\n", 258 | "pos = nx.spring_layout(G)\n", 259 | "\n", 260 | "nx.draw_networkx_nodes(G, pos, node_color=\"w\")\n", 261 | "nx.draw_networkx_edges(G, pos, width=1)\n", 262 | "nx.draw_networkx_edge_labels(G, pos)\n", 263 | "nx.draw_networkx_labels(G, pos ,font_size=16, font_color=\"black\")\n", 264 | "\n", 265 | "plt.xticks([])\n", 266 | "plt.yticks([])\n", 267 | "plt.show()" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "This examples corresponds the condition that v=1, t=2, x1=3, x2=0, x3=4\n", 275 | "\n", 276 | "let's think about the edge 2 -> 1. i.e random walk jsut traversed from node 2 to node 1. 
\n", 277 | "Then, the unnormalized transition probability is $ \pi_{1x} = \alpha_{pq}(2,x)w_{1x} $, where \n", 278 | "$\n", 279 | " \alpha_{pq}(2,x) = \left\{ \begin{array}{ll}\n", 280 | " 1/p & if \ x=2 \ \ (i.e. \ \ d_{2,x}=0) \\\n", 281 | " 1 & if \ x=3 \ \ (i.e. \ \ d_{2,x}=1) \\\n", 282 | " 1/q & if \ x=0 \ or \ 4 \ \ (i.e. \ \ d_{2,x}=2) \\\n", 283 | " \end{array} \right.\n", 284 | "$\n", 285 | "

\n", 286 | "This can be implemented as follows" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 323, 292 | "metadata": { 293 | "collapsed": false 294 | }, 295 | "outputs": [ 296 | { 297 | "name": "stdout", 298 | "output_type": "stream", 299 | "text": [ 300 | "0.5\n", 301 | "1.0\n", 302 | "2.0\n", 303 | "2.0\n" 304 | ] 305 | } 306 | ], 307 | "source": [ 308 | "def alpha(p,q,t,x):\n", 309 | "    if t==x:\n", 310 | "        return 1.0/p\n", 311 | "    elif adj_mat[t,x]>0:\n", 312 | "        return 1.0\n", 313 | "    else:\n", 314 | "        return 1.0/q\n", 315 | "\n", 316 | "#Test\n", 317 | " \n", 318 | "v=1\n", 319 | "t=2\n", 320 | "p=2\n", 321 | "q=0.5 \n", 322 | "\n", 323 | "print alpha(p,q,t,2)\n", 324 | "print alpha(p,q,t,3)\n", 325 | "print alpha(p,q,t,0)\n", 326 | "print alpha(p,q,t,4)" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 354, 332 | "metadata": { 333 | "collapsed": false 334 | }, 335 | "outputs": [], 336 | "source": [ 337 | "#Let's make pi for each edge using the function alpha\n", 338 | "#The pi is stored in the following Python dictionary keyed by edge\n", 339 | "transition={}\n", 340 | "#e.g. transition = {(1,2):[0.1,0,0.5,0,0.4], (1,3):[0.1,0,0.7,0,0.2], ...}\n", 341 | "#transition[t,v] is the distribution over next nodes for a walker at v that arrived from t; e.g. for a walker at 1 that came from 2, use transition[2,1]\n", 342 | "\n", 343 | "p=2\n", 344 | "q=0.5\n", 345 | "\n", 346 | "for t in xrange(adj_mat.shape[0]):\n", 347 | "    for v in xrange(adj_mat.shape[1]):\n", 348 | "        #if edge t to v exists\n", 349 | "        if adj_mat[t,v]>0:\n", 350 | "            num_nodes=adj_mat.shape[0]\n", 351 | "            pi=np.zeros(num_nodes)\n", 352 | "            for x in xrange(num_nodes):\n", 353 | "                #if edge v to x exists. 
i.e. the possible next nodes from v\n", 354 | "                if adj_mat[v,x]>0:\n", 355 | "                    pi[x]=alpha(p,q,t,x)*adj_mat[v,x]\n", 356 | "                    #pi[x] is the unnormalized transition probability from v to x, where v was reached from t\n", 357 | "            pi=pi/np.sum(pi)\n", 358 | "            #now, pi holds the normalized transition probabilities for v reached from t\n", 359 | "            transition[t,v]=pi" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 355, 365 | "metadata": { 366 | "collapsed": false 367 | }, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/plain": [ 372 | "[3, 1, 3, 2, 1, 0, 1, 4, 1, 3]" 373 | ] 374 | }, 375 | "execution_count": 355, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "#Now we have pre-computed random walk transition probabilities (i.e. the Python dict transition)\n", 382 | "#So let's use this to generate a random walk for a given node\n", 383 | "\n", 384 | "#get random walk \n", 385 | "all_nodes=[0,1,2,3,4]\n", 386 | "random_edge_index=np.random.choice(len(transition))\n", 387 | "random_edge=transition.keys()[random_edge_index]\n", 388 | "random_walk=[random_edge[0],random_edge[1]]\n", 389 | "random_walk_length=10\n", 390 | "for i in xrange(random_walk_length-2):\n", 391 | "    cur_node = random_walk[-1]\n", 392 | "    previous_node=random_walk[-2]\n", 393 | "    next_node=np.random.choice(all_nodes, 1, p=transition[previous_node,cur_node])\n", 394 | "    random_walk.append(next_node[0])\n", 395 | "    \n", 396 | "random_walk" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": 361, 402 | "metadata": { 403 | "collapsed": false 404 | }, 405 | "outputs": [], 406 | "source": [ 407 | "#let's use sparse vectors to store the transition probabilities\n", 408 | "\n", 409 | "transition_sparse={}\n", 410 | "\n", 411 | "#Let's make pi for each edge using the function alpha\n", 412 | "#The pi is stored in the following Python dictionary keyed by edge\n", 413 | "\n", 414 | "p=2\n", 415 | "q=0.5\n", 416 | "\n", 417 | 
"for t in xrange(adj_mat.shape[0]):\n", 418 | "    for v in xrange(adj_mat.shape[1]):\n", 419 | "        #if edge t to v exists\n", 420 | "        if adj_mat[t,v]>0:\n", 421 | "            pi_vx_indices=[]\n", 422 | "            pi_vx_values=[]\n", 423 | "            for x in xrange(num_nodes):\n", 424 | "                #if edge v to x exists. i.e. the possible next nodes from v\n", 425 | "                if adj_mat[v,x]>0:\n", 426 | "                    pi_vx_indices.append(x)\n", 427 | "                    pi_vx_values.append(alpha(p,q,t,x)*adj_mat[v,x])\n", 428 | "                    #each value is the unnormalized transition probability from v to x, where v was reached from t\n", 429 | "            pi_vx_values=np.array(pi_vx_values)/sum(pi_vx_values)\n", 430 | "            #now, the values are normalized transition probabilities for v reached from t\n", 431 | "            transition_sparse[t,v]=(pi_vx_indices,pi_vx_values)" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 369, 437 | "metadata": { 438 | "collapsed": false 439 | }, 440 | "outputs": [ 441 | { 442 | "name": "stdout", 443 | "output_type": "stream", 444 | "text": [ 445 | "[ 0.07692308 0. 
0.30769231 0.30769231 0.30769231]\n", 446 | "([0, 2, 3, 4], array([ 0.07692308, 0.30769231, 0.30769231, 0.30769231]))\n" 447 | ] 448 | } 449 | ], 450 | "source": [ 451 | "# see that they are the same\n", 452 | "print transition[0,1]\n", 453 | "print transition_sparse[0,1]" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 370, 459 | "metadata": { 460 | "collapsed": false 461 | }, 462 | "outputs": [ 463 | { 464 | "data": { 465 | "text/plain": [ 466 | "[1, 0, 1, 4, 1, 0, 1, 4, 1, 0]" 467 | ] 468 | }, 469 | "execution_count": 370, 470 | "metadata": {}, 471 | "output_type": "execute_result" 472 | } 473 | ], 474 | "source": [ 475 | "#get random walk \n", 476 | "all_nodes=[0,1,2,3,4]\n", 477 | "random_edge_index=np.random.choice(len(transition))\n", 478 | "random_edge=transition.keys()[random_edge_index]\n", 479 | "random_walk=[random_edge[0],random_edge[1]]\n", 480 | "random_walk_length=10\n", 481 | "for i in xrange(random_walk_length-2):\n", 482 | "    cur_node = random_walk[-1]\n", 483 | "    previous_node=random_walk[-2]\n", 484 | "    (pi_vx_indices,pi_vx_values)=transition_sparse[previous_node,cur_node]\n", 485 | "    next_node=np.random.choice(pi_vx_indices, 1, p=pi_vx_values)\n", 486 | "    random_walk.append(next_node[0])\n", 487 | "    \n", 488 | "random_walk" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 371, 494 | "metadata": { 495 | "collapsed": true 496 | }, 497 | "outputs": [], 498 | "source": [ 499 | "#next steps\n", 500 | "#implement the case where the adjacency matrix is a scipy sparse matrix\n", 501 | "#implement multiprocessing code to generate random walks" 502 | ] 503 | } 504 | ], 505 | "metadata": { 506 | "kernelspec": { 507 | "display_name": "Python 2", 508 | "language": "python", 509 | "name": "python2" 510 | }, 511 | "language_info": { 512 | "codemirror_mode": { 513 | "name": "ipython", 514 | "version": 2 515 | }, 516 | "file_extension": ".py", 517 | "mimetype": "text/x-python", 518 | "name": "python", 519 | 
"nbconvert_exporter": "python", 520 | "pygments_lexer": "ipython2", 521 | "version": "2.7.11" 522 | } 523 | }, 524 | "nbformat": 4, 525 | "nbformat_minor": 0 526 | } 527 | -------------------------------------------------------------------------------- /code/train_node2vec.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | __author__ = "satoshi tsutsui" 5 | 6 | import numpy as np 7 | import tensorflow as tf 8 | import ad_hoc_functions 9 | import argparse 10 | import math 11 | 12 | parser = argparse.ArgumentParser() 13 | parser.add_argument('--graph', type=str, default="../data/co-author-matrix.npz",help=u"numpy serialized scipy.sparse.csr_matrix. See http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format") 14 | parser.add_argument('--walks', type=str, default='../work/random_walks.npz', help=u"numpy serialized random walks. Use code/pre_compute_walks.py to generate this file") 15 | parser.add_argument('--log', type=str, default="../log1/", help=u"directory to save tensorflow logs") 16 | parser.add_argument('--save', type=str, default='../results/node_embeddings.npz', help=u"path to save the final embeddings") 17 | args = parser.parse_args() 18 | 19 | print("loading adjacency matrix") 20 | 21 | file_csr_matrix=args.graph 22 | adj_mat_csr_sparse=ad_hoc_functions.load_sparse_csr(file_csr_matrix) 23 | 24 | print("loading pre-computed random walks") 25 | random_walk_files=args.walks 26 | np_random_walks=np.load(random_walk_files)['arr_0'] 27 | np.random.shuffle(np_random_walks) 28 | 29 | print("defining computational graph") 30 | #Computational Graph Definition 31 | num_nodes=adj_mat_csr_sparse.shape[0] 32 | context_size=16 33 | batch_size = None 34 | embedding_size = 200 # Dimension of the embedding vector. 35 | num_sampled = 64 # Number of negative examples to sample. 
36 | 37 | global_step = tf.Variable(0, name='global_step', trainable=False) 38 | 39 | # Parameters to learn 40 | node_embeddings = tf.Variable(tf.random_uniform([num_nodes, embedding_size], -1.0, 1.0)) 41 | softmax_weights = tf.Variable(tf.truncated_normal([num_nodes, embedding_size],stddev=1.0 / math.sqrt(embedding_size))) 42 | softmax_biases = tf.Variable(tf.zeros([num_nodes])) 43 | 44 | # Input data and re-organize size. 45 | with tf.name_scope("context_node") as scope: 46 |     #context nodes for each input node in the batch (e.g. [[1,2],[4,6],[5,7]] where batch_size = 3, context_size = 2) 47 |     train_context_node= tf.placeholder(tf.int32, shape=[batch_size,context_size],name="context_node") 48 |     #organize prediction labels (the skip-gram model predicts context nodes (i.e. labels) given an input node) 49 |     #i.e. make [[1],[2],[4],[6],[5],[7]] given the context above. The extra dimension is required by the tensorflow API. 50 |     train_context_node_flat=tf.reshape(train_context_node,[-1,1]) 51 | with tf.name_scope("input_node") as scope: 52 |     #batch of input nodes to the network (e.g. [2,1,3] where batch_size = 3) 53 |     train_input_node= tf.placeholder(tf.int32, shape=[batch_size],name="input_node") 54 |     #organize input as flat, i.e. we want to make [2,2,1,1,3,3] given the input nodes above 55 |     input_ones=tf.ones_like(train_context_node) 56 |     train_input_node_flat=tf.reshape(tf.mul(input_ones,tf.reshape(train_input_node,[-1,1])),[-1]) 57 | 58 | # Model. 59 | with tf.name_scope("loss") as scope: 60 |     # Look up embeddings for nodes. 61 |     node_embed = tf.nn.embedding_lookup(node_embeddings, train_input_node_flat) 62 |     # Compute the softmax loss, using a sample of the negative labels each time. 
63 | loss_node2vec = tf.reduce_mean(tf.nn.sampled_softmax_loss(softmax_weights,softmax_biases,node_embed,train_context_node_flat, num_sampled, num_nodes)) 64 | loss_node2vec_summary = tf.scalar_summary("loss_node2vec", loss_node2vec) 65 | 66 | # Initializing the variables 67 | init = tf.initialize_all_variables() 68 | 69 | # Add ops to save and restore all the variables. 70 | saver = tf.train.Saver(max_to_keep=20) 71 | 72 | # Optimizer. 73 | update_loss = tf.train.AdamOptimizer().minimize(loss_node2vec,global_step=global_step) 74 | 75 | merged = tf.merge_all_summaries() 76 | 77 | num_random_walks=np_random_walks.shape[0] 78 | random_walk_length=np_random_walks.shape[1] 79 | 80 | # Launch the graph 81 | # Initializing the variables 82 | init = tf.initialize_all_variables() 83 | 84 | print("Optimizing") 85 | with tf.Session() as sess: 86 | log_dir=args.log# tensorboard --logdir=./log1 87 | writer = tf.train.SummaryWriter(log_dir, sess.graph) 88 | sess.run(init) 89 | for i in xrange(0,num_random_walks): 90 | a_random_walk=np_random_walks[i] 91 | train_input_batch = np.array([a_random_walk[j] for j in xrange(random_walk_length-context_size)]) 92 | train_context_batch = np.array([a_random_walk[j+1:j+1+context_size] for j in xrange(random_walk_length-context_size)]) 93 | feed_dict={train_input_node:train_input_batch, 94 | train_context_node:train_context_batch,} 95 | _,loss_value,summary_str=sess.run([update_loss,loss_node2vec,merged], feed_dict) 96 | writer.add_summary(summary_str,i) 97 | 98 | with open(log_dir+"loss_value.txt", "a") as f: 99 | f.write(str(loss_value)+'\n') 100 | 101 | # Save the variables to disk. 
102 | if i%10000==0: 103 | model_path=log_dir+"model.ckpt" 104 | save_path = saver.save(sess, model_path,global_step) 105 | print("Model saved in file: %s" % save_path) 106 | 107 | model_path=log_dir+"model.ckpt" 108 | save_path = saver.save(sess, model_path,global_step) 109 | print("Model saved in file: %s" % save_path) 110 | 111 | print("Save final embeddings as numpy array") 112 | np_node_embeddings=sess.run(node_embeddings) 113 | np.savez(args.save,np_node_embeddings) 114 | -------------------------------------------------------------------------------- /code/train_node2vec_symmetric.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | #!/usr/bin/env python 3 | 4 | __author__ = "satoshi tsutsui" 5 | 6 | import numpy as np 7 | import tensorflow as tf 8 | import ad_hoc_functions 9 | import argparse 10 | 11 | parser = argparse.ArgumentParser() 12 | parser.add_argument('--graph', type=str, default="../data/co-author-matrix.npz",help=u"numpy serialized scipy.sparse.csr_matrix. See http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format") 13 | parser.add_argument('--walks', type=str, default='../work/random_walks.npz' ,help=u"numpy serialized random walks. 
Use code/pre_compute_walks.py to generate this file") 14 | parser.add_argument('--log', type=str, default="../log1/", help=u"directory to save tensorflow logs") 15 | parser.add_argument('--save', type=str, default='../results/node_embeddings.npz', help=u"path to save the final embeddings") 16 | args = parser.parse_args() 17 | 18 | print("loading adjacency matrix") 19 | 20 | file_csr_matrix=args.graph 21 | adj_mat_csr_sparse=ad_hoc_functions.load_sparse_csr(file_csr_matrix) 22 | 23 | print("loading pre-computed random walks") 24 | random_walk_files=args.walks 25 | np_random_walks=np.load(random_walk_files)['arr_0'] 26 | np.random.shuffle(np_random_walks) 27 | 28 | print("defining computational graph") 29 | #Computational Graph Definition 30 | num_nodes=adj_mat_csr_sparse.shape[0] 31 | context_size=16 32 | batch_size = None 33 | embedding_size = 200 # Dimension of the embedding vector. 34 | num_sampled = 64 # Number of negative examples to sample. 35 | 36 | global_step = tf.Variable(0, name='global_step', trainable=False) 37 | 38 | # Parameters to learn 39 | node_embeddings = tf.Variable(tf.random_uniform([num_nodes, embedding_size], -1.0, 1.0)) 40 | 41 | # Fixed ones 42 | biases=tf.zeros([num_nodes]) 43 | 44 | # Input data and re-organize size. 45 | with tf.name_scope("context_node") as scope: 46 |     #context nodes for each input node in the batch (e.g. [[1,2],[4,6],[5,7]] where batch_size = 3, context_size = 2) 47 |     train_context_node= tf.placeholder(tf.int32, shape=[batch_size,context_size],name="context_node") 48 |     #organize prediction labels (the skip-gram model predicts context nodes (i.e. labels) given an input node) 49 |     #i.e. make [[1],[2],[4],[6],[5],[7]] given the context above. The extra dimension is required by the tensorflow API. 
50 |     train_context_node_flat=tf.reshape(train_context_node,[-1,1]) 51 | with tf.name_scope("input_node") as scope: 52 |     #batch of input nodes to the network (e.g. [2,1,3] where batch_size = 3) 53 |     train_input_node= tf.placeholder(tf.int32, shape=[batch_size],name="input_node") 54 |     #organize input as flat, i.e. we want to make [2,2,1,1,3,3] given the input nodes above 55 |     input_ones=tf.ones_like(train_context_node) 56 |     train_input_node_flat=tf.reshape(tf.mul(input_ones,tf.reshape(train_input_node,[-1,1])),[-1]) 57 | 58 | # Model. 59 | with tf.name_scope("loss") as scope: 60 |     # Look up embeddings for nodes. 61 |     node_embed = tf.nn.embedding_lookup(node_embeddings, train_input_node_flat) 62 |     # Compute the softmax loss, using a sample of the negative labels each time. 63 |     loss_node2vec = tf.reduce_mean(tf.nn.sampled_softmax_loss(node_embeddings,biases,node_embed,train_context_node_flat, num_sampled, num_nodes)) 64 |     loss_node2vec_summary = tf.scalar_summary("loss_node2vec", loss_node2vec) 65 | 66 | # Initializing the variables 67 | init = tf.initialize_all_variables() 68 | 69 | # Add ops to save and restore all the variables. 70 | saver = tf.train.Saver(max_to_keep=20) 71 | 72 | # Optimizer. 
73 | update_loss = tf.train.AdamOptimizer().minimize(loss_node2vec,global_step=global_step) 74 | 75 | merged = tf.merge_all_summaries() 76 | 77 | num_random_walks=np_random_walks.shape[0] 78 | random_walk_length=np_random_walks.shape[1] 79 | 80 | # Launch the graph 81 | # Initializing the variables 82 | init = tf.initialize_all_variables() 83 | 84 | print("Optimizing") 85 | with tf.Session() as sess: 86 | log_dir=args.log# tensorboard --logdir=./log1 87 | writer = tf.train.SummaryWriter(log_dir, sess.graph) 88 | sess.run(init) 89 | for i in xrange(0,num_random_walks): 90 | a_random_walk=np_random_walks[i] 91 | train_input_batch = np.array([a_random_walk[j] for j in xrange(random_walk_length-context_size)]) 92 | train_context_batch = np.array([a_random_walk[j+1:j+1+context_size] for j in xrange(random_walk_length-context_size)]) 93 | feed_dict={train_input_node:train_input_batch, 94 | train_context_node:train_context_batch,} 95 | _,loss_value,summary_str=sess.run([update_loss,loss_node2vec,merged], feed_dict) 96 | writer.add_summary(summary_str,i) 97 | 98 | with open(log_dir+"loss_value.txt", "a") as f: 99 | f.write(str(loss_value)+'\n') 100 | 101 | # Save the variables to disk. 
102 |         if i%10000==0: 103 |             model_path=log_dir+"model.ckpt" 104 |             save_path = saver.save(sess, model_path,global_step) 105 |             print("Model saved in file: %s" % save_path) 106 | 107 |     model_path=log_dir+"model.ckpt" 108 |     save_path = saver.save(sess, model_path,global_step) 109 |     print("Model saved in file: %s" % save_path) 110 | 111 |     print("Save final embeddings as numpy array") 112 |     np_node_embeddings=sess.run(node_embeddings) 113 |     np.savez(args.save,np_node_embeddings) 114 | -------------------------------------------------------------------------------- /data/co-author-matrix.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apple2373/node2vec/c1d4b593441c9aa2bdaf50f8ce458ed74c11c765/data/co-author-matrix.npz -------------------------------------------------------------------------------- /data/readme.txt: -------------------------------------------------------------------------------- 1 | co-author-matrix.npz: Scipy.sparse.csr_matrix for the co-author network 2 | co-author-index.json: mapping from matrix row (or column) id to author name. 3 | co-author-original-ids.json: mapping from matrix row (or column) id to its original id in the Microsoft Academic Graph -------------------------------------------------------------------------------- /results/.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore everything in this directory 2 | * 3 | # Except this file 4 | !.gitignore 5 | -------------------------------------------------------------------------------- /work/.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore everything in this directory 2 | * 3 | # Except this file 4 | !.gitignore 5 | --------------------------------------------------------------------------------
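The notebook and training scripts above are written for Python 2 and TensorFlow 0.x. As a compact reference, here is a Python 3 / NumPy-only re-sketch of the biased-walk logic demonstrated in random_walk.ipynb (the function names `precompute_transitions` and `random_walk` are my own, not names from this repo):

```python
import numpy as np

# The 5-node example graph from random_walk.ipynb (symmetric adjacency matrix).
adj_mat = np.array([[0, 1, 0, 0, 0],
                    [1, 0, 1, 1, 1],
                    [0, 1, 0, 1, 0],
                    [0, 1, 1, 0, 0],
                    [0, 1, 0, 0, 0]])

def alpha(p, q, t, x):
    # Search bias alpha_pq(t, x) for a walker that moved t -> v and considers x next.
    if t == x:                 # d(t, x) == 0: step back to the previous node
        return 1.0 / p
    elif adj_mat[t, x] > 0:    # d(t, x) == 1: x is also a neighbor of t
        return 1.0
    else:                      # d(t, x) == 2
        return 1.0 / q

def precompute_transitions(p, q):
    # transition[(t, v)]: normalized distribution over the next node x,
    # for a walker currently at v that arrived from t.
    n = adj_mat.shape[0]
    transition = {}
    for t in range(n):
        for v in range(n):
            if adj_mat[t, v] > 0:
                # adj_mat[v, x] == 0 zeroes out non-neighbors of v.
                pi = np.array([alpha(p, q, t, x) * adj_mat[v, x] for x in range(n)],
                              dtype=float)
                transition[(t, v)] = pi / pi.sum()
    return transition

def random_walk(transition, start_edge, length, rng):
    # Second-order walk: each step conditions on the previous edge (t, v).
    walk = list(start_edge)
    while len(walk) < length:
        t, v = walk[-2], walk[-1]
        walk.append(rng.choice(len(adj_mat), p=transition[(t, v)]))
    return walk

transition = precompute_transitions(p=2, q=0.5)
print(transition[(0, 1)])   # matches the notebook: [1/13, 0, 4/13, 4/13, 4/13]
walk = random_walk(transition, (0, 1), 10, np.random.default_rng(0))
```

Keys are (previous, current) edges, matching the notebook's `transition[t,v]` convention; for the full co-author network the same table would be built from the scipy CSR matrix instead of a dense array.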