├── README.md
├── Region2Vec_Workflow.jpg
├── code
│   ├── analytics.py
│   ├── clustering.py
│   ├── layers.py
│   ├── models.py
│   ├── run_clustering.sh
│   ├── train.ipynb
│   ├── train.py
│   └── utils.py
├── data.zip
└── requirements.txt

/README.md:
--------------------------------------------------------------------------------
# Region2vec

**Region2vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions**

![Region2vec](https://github.com/GeoDS/Region2vec/blob/master/Region2Vec_Workflow.jpg)

**Abstract:**
Community detection algorithms are used to detect densely connected components in complex networks and to reveal underlying relationships among components. As a special type of network, spatial networks are usually generated by the connections among geographic regions. Identifying spatial network communities can help reveal spatial interaction patterns, understand hidden regional structures, and support regional development decision-making. Given the recent development of Graph Convolutional Networks (GCN) and their powerful performance in identifying multi-scale spatial interactions, we propose an unsupervised GCN-based community detection method, "region2vec", on spatial networks. Our method first generates node embeddings for regions that share common attributes and have intense spatial interactions, and then applies clustering algorithms to detect communities based on embedding similarity and geographic adjacency. Experimental results show that while existing methods trade off attribute similarities and spatial interactions against each other, "region2vec" maintains a good balance between the two and performs best when one wants to maximize both attribute similarities and spatial interactions within communities.

**Region2vec-GAT**: An improved version of the region2vec approach with Graph Attention Networks (GAT):
[https://github.com/GeoDS/region2vec-GAT/](https://github.com/GeoDS/region2vec-GAT/)

## Paper

If you find our code useful for your research, you may cite our paper:

*Liang, Y., Zhu, J., Ye, W., and Gao, S. (2022). [Region2vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions](https://arxiv.org/abs/2210.08041). In Proceedings of the 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2022), November 1-4, 2022, Seattle, WA, USA. DOI: https://doi.org/10.1145/3557915.3560974*

```
@inproceedings{liang2022regions2vec,
  title={Region2vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions},
  author={Liang, Yunlei and Zhu, Jiawei and Ye, Wen and Gao, Song},
  booktitle={Proceedings of the 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2022), November 1-4, 2022, Seattle, WA, USA},
  year={2022},
  pages={1--4},
  doi={10.1145/3557915.3560974}
}
```

## Requirements

Region2vec uses the following packages with Python 3.7:

numpy==1.19.5

pandas==0.24.1

scikit_learn==1.1.2

scipy==1.3.1

torch==1.4.0 (torch>=2.2.0 is suggested to address a security advisory)

## Usage

1. Run train.py to generate the embeddings.
```
python train.py
```
2. Run clustering.py to generate the clustering result.
```
python clustering.py --filename your_filename
```
Here 'your_filename' should be replaced with the name of the embedding file generated in step 1.

3. Alternatively, to generate the clustering results for all generated files, run the provided bash script:

```
bash run_clustering.sh
```
Note: the final results (e.g., metric values) may vary depending on the platform and package versions.
The reported results were obtained on Ubuntu with the package versions listed in requirements.txt.

## Data
The data files used in our method are listed below with detailed descriptions (a minimal loading sketch is given at the end of this README).

Flow_matrix.csv: The visitor flow matrix between Census Tracts in Wisconsin (the spatial flow interaction matrix).

Spatial_matrix.csv: The adjacency matrix generated based on the geographic adjacency relationship.

Spatial_matrix_rook.csv: The adjacency matrix generated based on the geographic adjacency relationship with the rook-type contiguity relationship.

Spatial_distance_matrix.csv: The hop distance calculated based on the spatial adjacency matrix.

flow_reID.csv: The visitor flows with updated IDs of Census Tracts.

feature_matrix_f1.csv: The features of nodes (Census Tracts).

feature_matrix_lwinc.csv: The low-income population feature of nodes, used for generating the homogeneity scores.

## Acknowledgement
We acknowledge the funding support from the County Health Rankings and Roadmaps program of the University of Wisconsin Population Health Institute, Wisconsin Department of Health Services, and the National Science Foundation-funded AI institute [Grant No. 2112606] for [Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE)](https://icicle.ai/). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funders.
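
## Example: loading the data

The following sketch is an editorial illustration and is not part of the original repository. It assumes data.zip has been extracted into a local data/ folder (the path is an assumption), and it mirrors how code/utils.py and code/analytics.py read these files (comma-delimited matrices, with the first column of the feature files being an ID column).

```
import numpy as np
import pandas as pd

# Paths are illustrative; adjust to wherever data.zip was extracted.
flow = np.loadtxt('data/Flow_matrix.csv', delimiter=',')                    # tract-to-tract visitor flow matrix
adj = np.loadtxt('data/Spatial_matrix.csv', delimiter=',')                  # geographic adjacency matrix
adj_rook = np.loadtxt('data/Spatial_matrix_rook.csv', delimiter=',')        # rook-contiguity adjacency matrix
hops = np.loadtxt('data/Spatial_distance_matrix.csv', delimiter=',')        # hop distances on the adjacency graph
features = np.loadtxt('data/feature_matrix_f1.csv', delimiter=',')[:, 1:]   # node features (first column is an ID)
flows_df = pd.read_csv('data/flow_reID.csv')                                # visitor-flow edge list with re-indexed tract IDs

print(flow.shape, adj.shape, features.shape)
```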
--------------------------------------------------------------------------------
/Region2Vec_Workflow.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GeoDS/Region2vec/8d265088babf1cb74139b119913737fb3eaac6bb/Region2Vec_Workflow.jpg
--------------------------------------------------------------------------------
/code/analytics.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import math
from sklearn.metrics.pairwise import euclidean_distances, cosine_similarity
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn import metrics


EPS = 1e-15


def run_aggclustering(path, file_name, affinity, n_clusters, linkage='ward'):
    # print(n_clusters)
    X = np.loadtxt(path + file_name, delimiter=' ')
    if 'csv' in file_name:
        file_name = file_name[:-4]

    adj = np.loadtxt('../data/Spatial_matrix_rook.csv', delimiter=',')
    model = AgglomerativeClustering(linkage=linkage, n_clusters=n_clusters, connectivity=adj, affinity=affinity)

    model.fit(X)
    labels = model.labels_

    homo_score = lwinc_purity(labels)

    total_ratio = intra_inter_idx(labels, n_clusters)
    median_ineq = community_inequality(labels, file_name, path, n_clusters)

    median_sim, median_dist = similarity(labels, file_name, path, n_clusters)

    return labels, total_ratio, median_ineq, median_sim, median_dist, homo_score


# generate the homogeneity score
def lwinc_purity(labels, lwinc_file="../data/feature_matrix_lwinc.csv"):
    X_lwinc = np.loadtxt(lwinc_file, delimiter=',')
    X_lwinc = X_lwinc[:, 1:]
    n_thres = 5
    threshold = np.arange(0, 1 + 1/n_thres, 1/n_thres)
    lwinc_classes = [np.quantile(X_lwinc, q) for q in threshold]  # quantile thresholds for the low-income percentage classes
    lwinc_classes[-1] = 1 + EPS  # make the upper limit larger than any existing value

    X_classes = np.array([next(i-1 for i, t in enumerate(lwinc_classes) if t > v) for v in X_lwinc])

    homo_score = metrics.homogeneity_score(X_classes, labels)
    print("The homogeneity score is {:.3f}".format(homo_score))

    return homo_score


def intra_inter_idx(labels, k):
    CIDs = labels

    # generate ID to Community ID mapping
    UID = range(0, len(CIDs))
    ID_dict = dict(zip(UID, CIDs))

    flow = pd.read_csv('../data/flow_reID.csv')
    if 'Unnamed: 0' in flow.columns:
        flow = flow.drop(columns='Unnamed: 0')

    flow['From'] = flow['From'].map(ID_dict)
    flow['To'] = flow['To'].map(ID_dict)

    # group flows into communities
    flow_com = flow.groupby(['From', 'To']).sum(['visitor_flows', 'pop_flows']).reset_index()

    ComIDs = list(flow_com.From.unique())
    intra_flows = list(flow_com[flow_com['From'] == flow_com['To']]['visitor_flows'].values)
    inter_flows = list(flow_com[flow_com['From'] != flow_com['To']].groupby(['From']).sum(['visitor_flows']).reset_index()['visitor_flows'])
    d = {'CID': ComIDs, 'intra': intra_flows, 'inter': inter_flows}
    df = pd.DataFrame(d)
    df['intra_inter'] = df['intra'] / df['inter']

    total_ratio = sum(df['intra']) / sum(df['inter'])
    print("The total intra/inter ratio is {:.3f}".format(total_ratio))

    return total_ratio


def similarity(labels, file_name, path, n_clusters, savefig=True,
               feature_path='../data/feature_matrix_f1.csv'):
    features = np.loadtxt(feature_path, delimiter=',')
    X = features[:, 1:]

    # calculate cosine similarity for all features
    cossim_mx = cosine_similarity(X)

    sim_dict = {}
    for c in range(n_clusters):
        ct_com = np.where(labels == c)[0]
        cossim_com = cossim_mx[ct_com[:, None], ct_com[None, :]]  # slice the matrix so all the included values are for this community
        cossim = cossim_com[np.triu_indices(len(ct_com), k=0)]
        sim_dict[c] = np.mean(cossim)

    median_sim = np.median(list(sim_dict.values()))

    # calculate the euclidean distance for all features
    eucdist_mx = euclidean_distances(X)

    dist_dict = {}
    for c in range(n_clusters):
        ct_com = np.where(labels == c)[0]
        eucdist_com = eucdist_mx[ct_com[:, None], ct_com[None, :]]  # slice the matrix so all the included values are for this community
        eucdist = eucdist_com[np.triu_indices(len(ct_com), k=0)]
        dist_dict[c] = np.mean(eucdist)

    median_dist = np.median(list(dist_dict.values()))

    print("The median cosine similarity is {:.3f}".format(median_sim))
    print("The median euclidean distance is {:.3f}".format(median_dist))

    return median_sim, median_dist


def cal_inequality(values):
    mean = np.mean(values)
    std = np.std(values)
    ineq = std / math.sqrt(mean * (1 - mean))
    return ineq


def community_inequality(labels, file_name, path, k=13):
    features = np.loadtxt('../data/feature_matrix_f1.csv', delimiter=',')  # use updated features
    features = features[:, 1:]
    pdist = np.linalg.norm(features[:, None] - features, ord=2, axis=2)

    ineq_dict = {}
    for c in range(k):
        ct_com = np.where(labels == c)[0]
        if len(ct_com) < 2:
            continue
        else:
            pdist_com = pdist[ct_com[:, None], ct_com[None, :]]  # slice pdist so all the included values are for this community
            dist = pdist_com[np.triu_indices(len(ct_com), k=1)]

            # calculate the inequality
            ineq = cal_inequality(dist)
            ineq_dict[c] = ineq

    median_ineq = np.median(list(ineq_dict.values()))
    print("The median inequality is {:.3f}".format(median_ineq))

    return median_ineq
--------------------------------------------------------------------------------
/code/clustering.py:
--------------------------------------------------------------------------------
import os
from analytics import run_aggclustering
import csv
import argparse


parser = argparse.ArgumentParser()
parser.add_argument('--n_clusters', type=int, default=14,
                    help='Number of clusters.')
parser.add_argument('--affinity', type=str, default='euclidean',
                    help='Affinity metric.')
parser.add_argument('--filename', type=str, default='Epoch_378_dropout_0.1_hop_5.0_losstype_divreg_mod_False.csv',
                    help='Name of the embedding file to cluster.')

args = parser.parse_args()


if '../' in args.filename:
    args.filename = args.filename.split('/')[-1]

linkage = 'ward'
path = '../result/'

labels, total_ratio, median_ineq, median_cossim, median_dist, homo_score = run_aggclustering(path, args.filename, args.affinity, args.n_clusters, linkage)
csv_data = [args.filename, args.n_clusters, linkage, args.affinity, total_ratio, median_ineq, median_cossim, median_dist, homo_score]
result_csv = 'cluster_result.csv'

if not os.path.exists(os.path.join(path, result_csv)):
    with open(os.path.join(path, result_csv), 'w') as f:
        csv_write = csv.writer(f)
        csv_head = ['file_name', 'n_clusters', 'linkage', 'distance', 'total_ratio', 'median_ineq', 'median_cossim', 'median_dist', 'homo_score']
        csv_write.writerow(csv_head)
        f.close()

with open(os.path.join(path, result_csv), mode='a', newline='') as f1:
    csv_write = csv.writer(f1)
    csv_write.writerow(csv_data)
--------------------------------------------------------------------------------
/code/layers.py:
--------------------------------------------------------------------------------
import math

import torch

from torch.nn.parameter import Parameter
from torch.nn.modules.module import Module


class GraphConvolution(Module):
    """
    Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
    """

    def __init__(self, in_features, out_features, bias=True):
        super(GraphConvolution, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.FloatTensor(in_features, out_features))
        if bias:
            self.bias = Parameter(torch.FloatTensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input, adj):
        support = torch.mm(input, self.weight)
        output = torch.spmm(adj, support)
        if self.bias is not None:
            return output + self.bias
        else:
            return output

    def __repr__(self):
        return self.__class__.__name__ + ' (' \
               + str(self.in_features) + ' -> ' \
               + str(self.out_features) + ')'
--------------------------------------------------------------------------------
/code/models.py:
--------------------------------------------------------------------------------
import torch.nn as nn
import torch.nn.functional as F
from layers import GraphConvolution


class GCN(nn.Module):
    def __init__(self, nfeat, nhid, nout, dropout):
        super(GCN, self).__init__()

        self.gc1 = GraphConvolution(nfeat, nhid)
        self.gc2 = GraphConvolution(nhid, nout)
        self.dropout = dropout

    def forward(self, x, adj):
        x = F.relu(self.gc1(x, adj))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.gc2(x, adj)
        x_softmax = F.softmax(x, dim=1)  # also return the softmax result
        return x, x_softmax
--------------------------------------------------------------------------------
/code/run_clustering.sh:
--------------------------------------------------------------------------------
#!/bin/bash

source D:/Applications/anaconda3/etc/profile.d/conda.sh
conda activate D:/Applications/envs/gcn

nclu=14
find ../result -maxdepth 1 -type f -name 'Epoch*.csv' -exec python clustering.py --filename {} --n_clusters $nclu \;

# nclu=20
# for nclu in 20 30 50 80
# do
#     python clustering.py --n_clusters $nclu --filename algo_line_output_14_epochs_5_new.csv;
#     python clustering.py --filename Epoch_378_dropout_0.1_hop_5_losstype_divreg_mod_True.csv --n_clusters $nclu;
#     python clustering.py --filename algo_node2vec_output_13_epochs_5.csv --n_clusters $nclu --nbr none;
#     python clustering.py --filename algo_deepwalk_output_13_epochs_5.csv --n_clusters $nclu --nbr none;
#     python clustering.py --n_clusters $nclu --algo kmeans;
# done
--------------------------------------------------------------------------------
/code/train.py:
--------------------------------------------------------------------------------
from __future__ import division
from __future__ import print_function

import time
import argparse
import numpy as np

import torch
import torch.nn.functional as F
import torch.optim as optim

from utils import load_widata, purge
from models import GCN
import csv
import os
import math

# Training settings
parser = argparse.ArgumentParser()
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='Disables CUDA training.')
parser.add_argument('--seed', type=int, default=42, help='Random seed.')
parser.add_argument('--epochs', type=int, default=500,
                    help='Number of epochs to train.')
parser.add_argument('--patience', type=int, default=50,
                    help='Early stopping control.')
parser.add_argument('--ltype', type=str, default='divreg',
                    help="Loss type: 'div' or 'divreg' (div with regularization).")
parser.add_argument('--lr', type=float, default=0.001,
                    help='Initial learning rate.')
parser.add_argument('--weight_decay', type=float, default=5e-4,
                    help='Weight decay (L2 loss on parameters).')
parser.add_argument('--hidden', type=int, default=16,
                    help='Number of hidden units.')
parser.add_argument('--output', type=int, default=14,
                    help='Output dim.')
parser.add_argument('--dropout', type=float, default=0.1,
                    help='Dropout rate (1 - keep probability).')
parser.add_argument('--hops', type=float, default=5,
                    help='Constrain with hops.')


args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)


# Load data
adj, features, labels, neg_mask, pos_mask, hops_m, intensity_m_norm, strength = load_widata()
print(args.ltype)

# Model and optimizer
model = GCN(nfeat=features.shape[1],
            nhid=args.hidden,
            nout=args.output,
            dropout=args.dropout)
optimizer = optim.Adam(model.parameters(),
                       lr=args.lr, weight_decay=args.weight_decay)


EPS = 1e-15

if args.cuda:
    model.cuda()
    features = features.cuda()
    adj = adj.cuda()
    labels = labels.cuda()
    neg_mask = neg_mask.cuda()
    pos_mask = pos_mask.cuda()
    hops_m = hops_m.cuda()

N_pos = sum(sum(pos_mask))
N_neg = sum(sum(neg_mask))

# Train model
t_total = time.time()
loss_list = []
for epoch in range(args.epochs):
    t = time.time()
    model.train()
    optimizer.zero_grad()
    output, output_sfx = model(features, adj)

    pdist = torch.norm(output[:, None] - output, dim=2, p=2)
    inner_pro = torch.mm(output, output.T)
    loss_hops = 0

    # The div loss uses the flow strength directly: it minimizes the ratio of W_pos*Dist
    # (distances between pairs with positive flow, weighted by flow strength) to
    # W_zero*Dist (distances between pairs with no flow). A larger strength (W_pos) pulls
    # the corresponding embeddings closer together, while W_zero marks node pairs with no
    # flow, whose embeddings should end up as far apart as possible.
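    # Note (comment added for clarity; not in the original source): with
    # pdist[i, j] = ||z_i - z_j||_2, labels holding the log flow intensities,
    # pos_mask/neg_mask marking pairs with positive/zero flow, and hops_m giving a
    # positive weight only to pairs beyond the hop threshold used in load_widata,
    # the two branches below roughly implement
    #   div:    sum(pdist * labels * pos_mask) / (sum(pdist * neg_mask) + sum(pdist * hops_m) + EPS)
    #   divreg: the same ratio rescaled by N_neg / N_pos to balance the pair counts.
    # Minimizing either pulls strongly interacting pairs together and pushes zero-flow
    # and spatially distant pairs apart.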
    if args.ltype == 'div':
        if args.hops > 1:
            loss_hops = torch.sum(pdist.mul(hops_m)) + EPS
        loss_train = torch.sum(pdist.mul(labels).mul(pos_mask)) / ((torch.sum(pdist.mul(neg_mask)) + EPS) + loss_hops)

    elif args.ltype == 'divreg':
        if args.hops > 1:
            loss_hops = torch.sum(pdist.mul(hops_m)) + EPS
        loss_train = torch.sum(pdist.mul(labels).mul(pos_mask)) * N_neg / (N_pos * ((torch.sum(pdist.mul(neg_mask)) + EPS) + loss_hops))

    loss_train.backward()
    optimizer.step()

    loss_list.append(loss_train.item())

    print('Epoch: {:04d}'.format(epoch),
          'loss_train: {:.5f}'.format(loss_train.item()),
          'time: {:.4f}s'.format(time.time() - t))

    result_path = '../result/'
    if not os.path.exists(result_path):
        os.mkdir(result_path)

    if epoch >= 200:
        save_name = 'lr_{}_dropout_{}_hidden_{}_output_{}_patience_{}_hos_{}_losstype_{}_seed_{}.csv'.format(args.lr, args.dropout, args.hidden, args.output, args.patience, args.hops, args.ltype, args.seed)

        np.savetxt(result_path + 'Epoch_{}_'.format(epoch) + save_name, output.detach().cpu().numpy())

    if epoch > 200 + args.patience and loss_train > np.average(loss_list[-args.patience:]):
        best_epoch = loss_list.index(min(loss_list))
        print('Lose patience, stop training...')
        print('Best epoch: {}'.format(best_epoch))
        purge(result_path, save_name, best_epoch, epoch - best_epoch)
        break

    if epoch == args.epochs - 1:
        print('Last epoch, saving...')
        best_epoch = epoch
        purge(result_path, save_name, best_epoch, 0)


print("Optimization Finished!")
print("Total time elapsed: {:.4f}s".format(time.time() - t_total))


result_csv = 'result.csv'
if not os.path.exists(os.path.join(result_path, result_csv)):
    with open(os.path.join(result_path, result_csv), 'w') as f:
        csv_write = csv.writer(f)
        csv_head = ['epoch', 'losstype', 'hops', 'inertia', 'total_ratio', 'global_ineq', 'output', 'hidden', 'lr', 'dropout', 'patience', 'median_ineq', 'n_clu']
        csv_write.writerow(csv_head)
        f.close()
--------------------------------------------------------------------------------
/code/utils.py:
--------------------------------------------------------------------------------
import numpy as np
import scipy.sparse as sp
import torch
import torch.nn.functional as F
import os
import re

EPS = 1e-15


def encode_onehot(labels):
    classes = set(labels)
    classes_dict = {c: np.identity(len(classes))[i, :] for i, c in
                    enumerate(classes)}
    labels_onehot = np.array(list(map(classes_dict.get, labels)),
                             dtype=np.int32)
    return labels_onehot


def load_widata(path="../data/", dataset="wi", hops=5):
    print('Loading {} dataset...'.format(dataset))

    features = np.loadtxt(path + 'feature_matrix_f1.csv', delimiter=',')
    features = torch.FloatTensor(features[:, 1:])

    intensity_m = np.loadtxt(path + 'Flow_matrix.csv', delimiter=',')
    intensity_neg = np.zeros([len(intensity_m), len(intensity_m)])
    intensity_neg[intensity_m == 0] = 1
    intensity_pos = np.zeros([len(intensity_m), len(intensity_m)])
    intensity_pos[intensity_m > 0] = 1

    intensity_m = np.log(intensity_m + EPS)
    intensity_m_norm = mx_normalize(intensity_m)
    intensity_m_norm = torch.FloatTensor(intensity_m_norm)
    strength = torch.sum(intensity_m_norm, axis=0)
    strength = torch.FloatTensor(strength)

    hops_m = np.loadtxt(path + 'Spatial_distance_matrix.csv', delimiter=',')
    zero_entries = hops_m < hops
    hops_m = 1 / (np.log(hops_m + EPS) + 1)
    hops_m[zero_entries] = 0
    hops_m = torch.FloatTensor(hops_m)

    intensity_m = torch.FloatTensor(intensity_m)
    intensity_neg = torch.FloatTensor(intensity_neg)
    intensity_pos = torch.FloatTensor(intensity_pos)

    adj = np.loadtxt(path + 'Spatial_matrix.csv', delimiter=',')
    adj = normalize(adj + sp.eye(adj.shape[0]))
    adj = torch.FloatTensor(adj)

    return adj, features, intensity_m, intensity_neg, intensity_pos, hops_m, intensity_m_norm, strength


def normalize(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx


def mx_normalize(mx):
    """Element-wise min-max normalization"""
    mx_std = (mx - mx.min()) / (mx.max() - mx.min())
    return mx_std


def accuracy(output, labels):
    preds = output.max(1)[1].type_as(labels)
    correct = preds.eq(labels).double()
    correct = correct.sum()
    return correct / len(labels)


def sparse_mx_to_torch_sparse_tensor(sparse_mx):
    """Convert a scipy sparse matrix to a torch sparse tensor."""
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(
        np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    return torch.sparse.FloatTensor(indices, values, shape)


def purge(dir, filename, best_epoch, spill_num):
    del_list = ['Epoch_{}_'.format(i) + filename for i in range(0, best_epoch)]
    if spill_num > 0:
        tmp = ['Epoch_{}_'.format(j) + filename for j in range(best_epoch + 1, best_epoch + spill_num + 1)]
        del_list.extend(tmp)
    for f in os.listdir(dir):
        if f in del_list:
            os.remove(os.path.join(dir, f))
--------------------------------------------------------------------------------
/data.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GeoDS/Region2vec/8d265088babf1cb74139b119913737fb3eaac6bb/data.zip
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
community==1.0.0b1
geopandas==0.9.0
matplotlib==3.0.3
networkx==2.6.2
numpy~=1.21
pandas==0.24.1
pyproj==3.1.0
python_louvain==0.16
scikit_learn==1.5.1
scipy==1.3.1
torch>=2.2.0
--------------------------------------------------------------------------------