├── README.md
├── Region2Vec_Workflow.jpg
├── code
│   ├── analytics.py
│   ├── clustering.py
│   ├── layers.py
│   ├── models.py
│   ├── run_clustering.sh
│   ├── train.ipynb
│   ├── train.py
│   └── utils.py
├── data.zip
└── requirements.txt

/README.md:
--------------------------------------------------------------------------------
# Region2vec

**Region2vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions**

![Region2vec](https://github.com/GeoDS/Region2vec/blob/master/Region2Vec_Workflow.jpg)

**Abstract:**
Community detection algorithms are used to detect densely connected components in complex networks and to reveal underlying relationships among components. As a special type of network, spatial networks are usually generated by the connections among geographic regions. Identifying spatial network communities can help reveal spatial interaction patterns, understand hidden regional structures, and support regional development decision-making. Given the recent development of Graph Convolutional Networks (GCN) and their powerful performance in identifying multi-scale spatial interactions, we propose an unsupervised GCN-based community detection method, "region2vec", on spatial networks. Our method first generates node embeddings for regions that share common attributes and have intense spatial interactions, and then applies clustering algorithms to detect communities based on embedding similarity and geographic adjacency. Experimental results show that while existing methods trade off attribute similarities and spatial interactions against each other, "region2vec" maintains a good balance between the two and performs best when one wants to maximize both attribute similarities and spatial interactions within communities.

**Region2vec-GAT**: An improved version of the region2vec approach with Graph Attention Networks (GAT):
[https://github.com/GeoDS/region2vec-GAT/](https://github.com/GeoDS/region2vec-GAT/)

## Paper

If you find our code useful for your research, you may cite our paper:

*Liang, Y., Zhu, J., Ye, W., and Gao, S. (2022). [Region2vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions](https://arxiv.org/abs/2210.08041). In Proceedings of the 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2022), November 1-4, 2022, Seattle, WA, USA. DOI: https://doi.org/10.1145/3557915.3560974*

```
@inproceedings{liang2022regions2vec,
  title={Region2vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions},
  author={Liang, Yunlei and Zhu, Jiawei and Ye, Wen and Gao, Song},
  booktitle={Proceedings of the 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2022), November 1-4, 2022, Seattle, WA, USA},
  year={2022},
  pages={1--4},
  doi={10.1145/3557915.3560974}
}
```

## Requirements

Region2vec uses the following packages with Python 3.7:

numpy==1.19.5

pandas==0.24.1

scikit_learn==1.1.2

scipy==1.3.1

torch==1.4.0 (torch>=2.2.0 is suggested to address a security advisory)

## Usage

1. Run train.py to generate the embeddings.
```
python train.py
```
2. Run clustering.py to generate the clustering result.
```
python clustering.py --filename your_filename
```
Here 'your_filename' should be replaced with the name of the embedding file generated in step 1.

3. Alternatively, to generate the clustering results for all generated files, run the provided bash script:

```
bash run_clustering.sh
```
Note: the final results (e.g., metric values) may vary depending on the platform and package versions.
The reported results were obtained on Ubuntu with the package versions listed in requirements.txt.

## Data
The data files used in our method are listed below with detailed descriptions (a minimal loading sketch is given at the end of this README).

Flow_matrix.csv: The visitor flow matrix between Census Tracts in Wisconsin (the spatial flow interaction matrix).

Spatial_matrix.csv: The adjacency matrix generated based on the geographic adjacency relationship.

Spatial_matrix_rook.csv: The adjacency matrix generated based on the geographic adjacency relationship with the rook-type contiguity relationship.

Spatial_distance_matrix.csv: The hop distance calculated based on the spatial adjacency matrix.

flow_reID.csv: The visitor flows with updated IDs of Census Tracts.

feature_matrix_f1.csv: The features of nodes (Census Tracts).

feature_matrix_lwinc.csv: The low-income population feature of nodes, used for generating the homogeneity scores.

## Acknowledgement
We acknowledge the funding support from the County Health Rankings and Roadmaps program of the University of Wisconsin Population Health Institute, Wisconsin Department of Health Services, and the National Science Foundation-funded AI institute [Grant No. 2112606] for [Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE)](https://icicle.ai/). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funders.
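
## Example: loading the data

The following sketch is an editorial illustration and is not part of the original repository. It assumes data.zip has been extracted into a local data/ folder (the path is an assumption), and it mirrors how code/utils.py and code/analytics.py read these files (comma-delimited matrices, with the first column of the feature files being an ID column).

```
import numpy as np
import pandas as pd

# Paths are illustrative; adjust to wherever data.zip was extracted.
flow = np.loadtxt('data/Flow_matrix.csv', delimiter=',')                    # tract-to-tract visitor flow matrix
adj = np.loadtxt('data/Spatial_matrix.csv', delimiter=',')                  # geographic adjacency matrix
adj_rook = np.loadtxt('data/Spatial_matrix_rook.csv', delimiter=',')        # rook-contiguity adjacency matrix
hops = np.loadtxt('data/Spatial_distance_matrix.csv', delimiter=',')        # hop distances on the adjacency graph
features = np.loadtxt('data/feature_matrix_f1.csv', delimiter=',')[:, 1:]   # node features (first column is an ID)
flows_df = pd.read_csv('data/flow_reID.csv')                                # visitor-flow edge list with re-indexed tract IDs

print(flow.shape, adj.shape, features.shape)
```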
--------------------------------------------------------------------------------
/Region2Vec_Workflow.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GeoDS/Region2vec/8d265088babf1cb74139b119913737fb3eaac6bb/Region2Vec_Workflow.jpg
--------------------------------------------------------------------------------
/code/analytics.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import math
from sklearn.metrics.pairwise import euclidean_distances, cosine_similarity
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn import metrics


EPS = 1e-15


def run_aggclustering(path, file_name, affinity, n_clusters, linkage='ward'):
    # print(n_clusters)
    X = np.loadtxt(path + file_name, delimiter=' ')
    if 'csv' in file_name:
        file_name = file_name[:-4]

    adj = np.loadtxt('../data/Spatial_matrix_rook.csv', delimiter=',')
    model = AgglomerativeClustering(linkage=linkage, n_clusters=n_clusters, connectivity=adj, affinity=affinity)

    model.fit(X)
    labels = model.labels_

    homo_score = lwinc_purity(labels)

    total_ratio = intra_inter_idx(labels, n_clusters)
    median_ineq = community_inequality(labels, file_name, path, n_clusters)

    median_sim, median_dist = similarity(labels, file_name, path, n_clusters)

    return labels, total_ratio, median_ineq, median_sim, median_dist, homo_score


# generate the homogeneity score
def lwinc_purity(labels, lwinc_file="../data/feature_matrix_lwinc.csv"):
    X_lwinc = np.loadtxt(lwinc_file, delimiter=',')
    X_lwinc = X_lwinc[:, 1:]
    n_thres = 5
    threshold = np.arange(0, 1 + 1/n_thres, 1/n_thres)
    lwinc_classes = [np.quantile(X_lwinc, q) for q in threshold]  # quantile thresholds for the low-income percentage classes
    lwinc_classes[-1] = 1 + EPS  # make the upper limit larger than any existing value

    X_classes = np.array([next(i-1 for i, t in enumerate(lwinc_classes) if t > v) for v in X_lwinc])

    homo_score = metrics.homogeneity_score(X_classes, labels)
    print("The homogeneity score is {:.3f}".format(homo_score))

    return homo_score


def intra_inter_idx(labels, k):
    CIDs = labels

    # generate ID to Community ID mapping
    UID = range(0, len(CIDs))
    ID_dict = dict(zip(UID, CIDs))

    flow = pd.read_csv('../data/flow_reID.csv')
    if 'Unnamed: 0' in flow.columns:
        flow = flow.drop(columns='Unnamed: 0')

    flow['From'] = flow['From'].map(ID_dict)
    flow['To'] = flow['To'].map(ID_dict)

    # group flows into communities
    flow_com = flow.groupby(['From', 'To']).sum(['visitor_flows', 'pop_flows']).reset_index()

    ComIDs = list(flow_com.From.unique())
    intra_flows = list(flow_com[flow_com['From'] == flow_com['To']]['visitor_flows'].values)
    inter_flows = list(flow_com[flow_com['From'] != flow_com['To']].groupby(['From']).sum(['visitor_flows']).reset_index()['visitor_flows'])
    d = {'CID': ComIDs, 'intra': intra_flows, 'inter': inter_flows}
    df = pd.DataFrame(d)
    df['intra_inter'] = df['intra'] / df['inter']

    total_ratio = sum(df['intra']) / sum(df['inter'])
    print("The total intra/inter ratio is {:.3f}".format(total_ratio))

    return total_ratio


def similarity(labels, file_name, path, n_clusters, savefig=True,
               feature_path='../data/feature_matrix_f1.csv'):
    features = np.loadtxt(feature_path, delimiter=',')
    X = features[:, 1:]

    # calculate cosine similarity for all features
    cossim_mx = cosine_similarity(X)

    sim_dict = {}
    for c in range(n_clusters):
        ct_com = np.where(labels == c)[0]
        cossim_com = cossim_mx[ct_com[:, None], ct_com[None, :]]  # slice the matrix so all the included values are for this community
        cossim = cossim_com[np.triu_indices(len(ct_com), k=0)]
        sim_dict[c] = np.mean(cossim)

    median_sim = np.median(list(sim_dict.values()))

    # calculate the euclidean distance for all features
    eucdist_mx = euclidean_distances(X)

    dist_dict = {}
    for c in range(n_clusters):
        ct_com = np.where(labels == c)[0]
        eucdist_com = eucdist_mx[ct_com[:, None], ct_com[None, :]]  # slice the matrix so all the included values are for this community
        eucdist = eucdist_com[np.triu_indices(len(ct_com), k=0)]
        dist_dict[c] = np.mean(eucdist)

    median_dist = np.median(list(dist_dict.values()))

    print("The median cosine similarity is {:.3f}".format(median_sim))
    print("The median euclidean distance is {:.3f}".format(median_dist))

    return median_sim, median_dist


def cal_inequality(values):
    mean = np.mean(values)
    std = np.std(values)
    ineq = std / math.sqrt(mean * (1 - mean))
    return ineq


def community_inequality(labels, file_name, path, k=13):
    features = np.loadtxt('../data/feature_matrix_f1.csv', delimiter=',')  # use updated features
    features = features[:, 1:]
    pdist = np.linalg.norm(features[:, None] - features, ord=2, axis=2)

    ineq_dict = {}
    for c in range(k):
        ct_com = np.where(labels == c)[0]
        if len(ct_com) < 2:
            continue
        else:
            pdist_com = pdist[ct_com[:, None], ct_com[None, :]]  # slice pdist so all the included values are for this community
            dist = pdist_com[np.triu_indices(len(ct_com), k=1)]

            # calculate the inequality
            ineq = cal_inequality(dist)
            ineq_dict[c] = ineq

    median_ineq = np.median(list(ineq_dict.values()))
    print("The median inequality is {:.3f}".format(median_ineq))

    return median_ineq
--------------------------------------------------------------------------------
/code/clustering.py:
--------------------------------------------------------------------------------
import os
from analytics import run_aggclustering
import csv
import argparse


parser = argparse.ArgumentParser()
parser.add_argument('--n_clusters', type=int, default=14,
                    help='Number of clusters.')
parser.add_argument('--affinity', type=str, default='euclidean',
                    help='Affinity metric.')
parser.add_argument('--filename', type=str, default='Epoch_378_dropout_0.1_hop_5.0_losstype_divreg_mod_False.csv',
                    help='Name of the embedding file to cluster.')

args = parser.parse_args()


if '../' in args.filename:
    args.filename = args.filename.split('/')[-1]

linkage = 'ward'
path = '../result/'

labels, total_ratio, median_ineq, median_cossim, median_dist, homo_score = run_aggclustering(path, args.filename, args.affinity, args.n_clusters, linkage)
csv_data = [args.filename, args.n_clusters, linkage, args.affinity, total_ratio, median_ineq, median_cossim, median_dist, homo_score]
result_csv = 'cluster_result.csv'

if not os.path.exists(os.path.join(path, result_csv)):
    with open(os.path.join(path, result_csv), 'w') as f:
        csv_write = csv.writer(f)
        csv_head = ['file_name', 'n_clusters', 'linkage', 'distance', 'total_ratio', 'median_ineq', 'median_cossim', 'median_dist', 'homo_score']
        csv_write.writerow(csv_head)
        f.close()

with open(os.path.join(path, result_csv), mode='a', newline='') as f1:
    csv_write = csv.writer(f1)
    csv_write.writerow(csv_data)
--------------------------------------------------------------------------------
/code/layers.py:
--------------------------------------------------------------------------------
import math

import torch

from torch.nn.parameter import Parameter
from torch.nn.modules.module import Module


class GraphConvolution(Module):
    """
    Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
    """

    def __init__(self, in_features, out_features, bias=True):
        super(GraphConvolution, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.FloatTensor(in_features, out_features))
        if bias:
            self.bias = Parameter(torch.FloatTensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input, adj):
        support = torch.mm(input, self.weight)
        output = torch.spmm(adj, support)
        if self.bias is not None:
            return output + self.bias
        else:
            return output

    def __repr__(self):
        return self.__class__.__name__ + ' (' \
               + str(self.in_features) + ' -> ' \
               + str(self.out_features) + ')'
--------------------------------------------------------------------------------
/code/models.py:
--------------------------------------------------------------------------------
import torch.nn as nn
import torch.nn.functional as F
from layers import GraphConvolution


class GCN(nn.Module):
    def __init__(self, nfeat, nhid, nout, dropout):
        super(GCN, self).__init__()

        self.gc1 = GraphConvolution(nfeat, nhid)
        self.gc2 = GraphConvolution(nhid, nout)
        self.dropout = dropout

    def forward(self, x, adj):
        x = F.relu(self.gc1(x, adj))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.gc2(x, adj)
        x_softmax = F.softmax(x, dim=1)  # also return the softmax result
        return x, x_softmax
--------------------------------------------------------------------------------
/code/run_clustering.sh:
--------------------------------------------------------------------------------
#!/bin/bash

source D:/Applications/anaconda3/etc/profile.d/conda.sh
conda activate D:/Applications/envs/gcn

nclu=14
find ../result -maxdepth 1 -type f -name 'Epoch*.csv' -exec python clustering.py --filename {} --n_clusters $nclu \;

# nclu=20
# for nclu in 20 30 50 80
# do
#     python clustering.py --n_clusters $nclu --filename algo_line_output_14_epochs_5_new.csv;
#     python clustering.py --filename Epoch_378_dropout_0.1_hop_5_losstype_divreg_mod_True.csv --n_clusters $nclu;
#     python clustering.py --filename algo_node2vec_output_13_epochs_5.csv --n_clusters $nclu --nbr none;
#     python clustering.py --filename algo_deepwalk_output_13_epochs_5.csv --n_clusters $nclu --nbr none;
#     python clustering.py --n_clusters $nclu --algo kmeans;
# done
--------------------------------------------------------------------------------
/code/train.py:
--------------------------------------------------------------------------------
from __future__ import division
from __future__ import print_function

import time
import argparse
import numpy as np

import torch
import torch.nn.functional as F
import torch.optim as optim

from utils import load_widata, purge
from models import GCN
import csv
import os
import math

# Training settings
parser = argparse.ArgumentParser()
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='Disables CUDA training.')
parser.add_argument('--seed', type=int, default=42, help='Random seed.')
parser.add_argument('--epochs', type=int, default=500,
                    help='Number of epochs to train.')
parser.add_argument('--patience', type=int, default=50,
                    help='Early stopping control.')
parser.add_argument('--ltype', type=str, default='divreg',
                    help="Loss type: 'div' or 'divreg' (div with regularization).")
parser.add_argument('--lr', type=float, default=0.001,
                    help='Initial learning rate.')
parser.add_argument('--weight_decay', type=float, default=5e-4,
                    help='Weight decay (L2 loss on parameters).')
parser.add_argument('--hidden', type=int, default=16,
                    help='Number of hidden units.')
parser.add_argument('--output', type=int, default=14,
                    help='Output dim.')
parser.add_argument('--dropout', type=float, default=0.1,
                    help='Dropout rate (1 - keep probability).')
parser.add_argument('--hops', type=float, default=5,
                    help='Constrain with hops.')


args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)


# Load data
adj, features, labels, neg_mask, pos_mask, hops_m, intensity_m_norm, strength = load_widata()
print(args.ltype)

# Model and optimizer
model = GCN(nfeat=features.shape[1],
            nhid=args.hidden,
            nout=args.output,
            dropout=args.dropout)
optimizer = optim.Adam(model.parameters(),
                       lr=args.lr, weight_decay=args.weight_decay)


EPS = 1e-15

if args.cuda:
    model.cuda()
    features = features.cuda()
    adj = adj.cuda()
    labels = labels.cuda()
    neg_mask = neg_mask.cuda()
    pos_mask = pos_mask.cuda()
    hops_m = hops_m.cuda()

N_pos = sum(sum(pos_mask))
N_neg = sum(sum(neg_mask))

# Train model
t_total = time.time()
loss_list = []
for epoch in range(args.epochs):
    t = time.time()
    model.train()
    optimizer.zero_grad()
    output, output_sfx = model(features, adj)

    pdist = torch.norm(output[:, None] - output, dim=2, p=2)
    inner_pro = torch.mm(output, output.T)
    loss_hops = 0

    # The div loss uses the flow strength directly: it minimizes the ratio of W_pos*Dist
    # (distances between pairs with positive flow, weighted by flow strength) to
    # W_zero*Dist (distances between pairs with no flow). A larger strength (W_pos) pulls
    # the corresponding embeddings closer together, while W_zero marks node pairs with no
    # flow, whose embeddings should end up as far apart as possible.
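    # Note (comment added for clarity; not in the original source): with
    # pdist[i, j] = ||z_i - z_j||_2, labels holding the log flow intensities,
    # pos_mask/neg_mask marking pairs with positive/zero flow, and hops_m giving a
    # positive weight only to pairs beyond the hop threshold used in load_widata,
    # the two branches below roughly implement
    #   div:    sum(pdist * labels * pos_mask) / (sum(pdist * neg_mask) + sum(pdist * hops_m) + EPS)
    #   divreg: the same ratio rescaled by N_neg / N_pos to balance the pair counts.
    # Minimizing either pulls strongly interacting pairs together and pushes zero-flow
    # and spatially distant pairs apart.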
    if args.ltype == 'div':
        if args.hops > 1:
            loss_hops = torch.sum(pdist.mul(hops_m)) + EPS
        loss_train = torch.sum(pdist.mul(labels).mul(pos_mask)) / ((torch.sum(pdist.mul(neg_mask)) + EPS) + loss_hops)

    elif args.ltype == 'divreg':
        if args.hops > 1:
            loss_hops = torch.sum(pdist.mul(hops_m)) + EPS
        loss_train = torch.sum(pdist.mul(labels).mul(pos_mask)) * N_neg / (N_pos * ((torch.sum(pdist.mul(neg_mask)) + EPS) + loss_hops))

    loss_train.backward()
    optimizer.step()

    loss_list.append(loss_train.item())

    print('Epoch: {:04d}'.format(epoch),
          'loss_train: {:.5f}'.format(loss_train.item()),
          'time: {:.4f}s'.format(time.time() - t))

    result_path = '../result/'
    if not os.path.exists(result_path):
        os.mkdir(result_path)

    if epoch >= 200:
        save_name = 'lr_{}_dropout_{}_hidden_{}_output_{}_patience_{}_hos_{}_losstype_{}_seed_{}.csv'.format(args.lr, args.dropout, args.hidden, args.output, args.patience, args.hops, args.ltype, args.seed)

        np.savetxt(result_path + 'Epoch_{}_'.format(epoch) + save_name, output.detach().cpu().numpy())

    if epoch > 200 + args.patience and loss_train > np.average(loss_list[-args.patience:]):
        best_epoch = loss_list.index(min(loss_list))
        print('Lose patience, stop training...')
        print('Best epoch: {}'.format(best_epoch))
        purge(result_path, save_name, best_epoch, epoch - best_epoch)
        break

    if epoch == args.epochs - 1:
        print('Last epoch, saving...')
        best_epoch = epoch
        purge(result_path, save_name, best_epoch, 0)


print("Optimization Finished!")
print("Total time elapsed: {:.4f}s".format(time.time() - t_total))


result_csv = 'result.csv'
if not os.path.exists(os.path.join(result_path, result_csv)):
    with open(os.path.join(result_path, result_csv), 'w') as f:
        csv_write = csv.writer(f)
        csv_head = ['epoch', 'losstype', 'hops', 'inertia', 'total_ratio', 'global_ineq', 'output', 'hidden', 'lr', 'dropout', 'patience', 'median_ineq', 'n_clu']
        csv_write.writerow(csv_head)
        f.close()
--------------------------------------------------------------------------------
/code/utils.py:
--------------------------------------------------------------------------------
import numpy as np
import scipy.sparse as sp
import torch
import torch.nn.functional as F
import os
import re

EPS = 1e-15


def encode_onehot(labels):
    classes = set(labels)
    classes_dict = {c: np.identity(len(classes))[i, :] for i, c in
                    enumerate(classes)}
    labels_onehot = np.array(list(map(classes_dict.get, labels)),
                             dtype=np.int32)
    return labels_onehot


def load_widata(path="../data/", dataset="wi", hops=5):
    print('Loading {} dataset...'.format(dataset))

    features = np.loadtxt(path + 'feature_matrix_f1.csv', delimiter=',')
    features = torch.FloatTensor(features[:, 1:])

    intensity_m = np.loadtxt(path + 'Flow_matrix.csv', delimiter=',')
    intensity_neg = np.zeros([len(intensity_m), len(intensity_m)])
    intensity_neg[intensity_m == 0] = 1
    intensity_pos = np.zeros([len(intensity_m), len(intensity_m)])
    intensity_pos[intensity_m > 0] = 1

    intensity_m = np.log(intensity_m + EPS)
    intensity_m_norm = mx_normalize(intensity_m)
    intensity_m_norm = torch.FloatTensor(intensity_m_norm)
    strength = torch.sum(intensity_m_norm, axis=0)
    strength = torch.FloatTensor(strength)

    hops_m = np.loadtxt(path + 'Spatial_distance_matrix.csv', delimiter=',')
    zero_entries = hops_m < hops
    hops_m = 1 / (np.log(hops_m + EPS) + 1)
    hops_m[zero_entries] = 0
    hops_m = torch.FloatTensor(hops_m)

    intensity_m = torch.FloatTensor(intensity_m)
    intensity_neg = torch.FloatTensor(intensity_neg)
    intensity_pos = torch.FloatTensor(intensity_pos)

    adj = np.loadtxt(path + 'Spatial_matrix.csv', delimiter=',')
    adj = normalize(adj + sp.eye(adj.shape[0]))
    adj = torch.FloatTensor(adj)

    return adj, features, intensity_m, intensity_neg, intensity_pos, hops_m, intensity_m_norm, strength


def normalize(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx


def mx_normalize(mx):
    """Element-wise min-max normalization"""
    mx_std = (mx - mx.min()) / (mx.max() - mx.min())
    return mx_std


def accuracy(output, labels):
    preds = output.max(1)[1].type_as(labels)
    correct = preds.eq(labels).double()
    correct = correct.sum()
    return correct / len(labels)


def sparse_mx_to_torch_sparse_tensor(sparse_mx):
    """Convert a scipy sparse matrix to a torch sparse tensor."""
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(
        np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    return torch.sparse.FloatTensor(indices, values, shape)


def purge(dir, filename, best_epoch, spill_num):
    del_list = ['Epoch_{}_'.format(i) + filename for i in range(0, best_epoch)]
    if spill_num > 0:
        tmp = ['Epoch_{}_'.format(j) + filename for j in range(best_epoch + 1, best_epoch + spill_num + 1)]
        del_list.extend(tmp)
    for f in os.listdir(dir):
        if f in del_list:
            os.remove(os.path.join(dir, f))
--------------------------------------------------------------------------------
/data.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GeoDS/Region2vec/8d265088babf1cb74139b119913737fb3eaac6bb/data.zip
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
community==1.0.0b1
geopandas==0.9.0
matplotlib==3.0.3
networkx==2.6.2
numpy~=1.21
pandas==0.24.1
pyproj==3.1.0
python_louvain==0.16
scikit_learn==1.5.1
scipy==1.3.1
torch>=2.2.0
--------------------------------------------------------------------------------