├── .pydevproject
├── .project
├── LICENSE
├── README.md
└── src
    ├── network_models.py
    ├── draw_plots.py
    ├── three_pass_benchmarks.py
    └── FARZ.py
/.pydevproject:
--------------------------------------------------------------------------------
1 |
2 |
3 | Default
4 | python 2.7
5 |
6 |
--------------------------------------------------------------------------------
/.project:
--------------------------------------------------------------------------------
1 |
2 |
3 | FARZ
4 |
5 |
6 |
7 |
8 |
9 | org.python.pydev.PyDevBuilder
10 |
11 |
12 |
13 |
14 |
15 | org.python.pydev.pythonNature
16 |
17 |
18 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2015 rabbanyk
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
23 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # FARZ
2 | ## Benchmarks for Community Detection Algorithms
3 |
4 | FARZ is a generator/simulator for networks with built-in community structure.
5 | It creates graphs/networks with community labels, which can be used for evaluating community detection algorithms.
6 |
7 | ### Generator Parameters
8 | * main parameters
9 |     + `n`: number of nodes
10 |     + `m`: number of edges created per node (i.e. half the average degree of nodes)
11 |     + `k`: number of communities
12 | * control parameters
13 |     + `beta`: the strength of community structure, i.e. the probability of edges being formed within communities, default (0.8)
14 |     + `alpha`: the strength of the common neighbors' effect on edge formation, default (0.5)
15 |     + `gamma`: the strength of the degree similarity effect on edge formation, default (0.5), can be negative for networks with negative degree correlation
16 | * overlap parameters
17 |     + `r`: the maximum number of communities each node can belong to, default (1, which results in disjoint communities)
18 |     + `q`: the probability of a node belonging to multiple communities, default (0.5, has an effect only if r>1)
19 | * config parameters
20 |     + `phi`: the constant added to all community sizes; a higher number makes the communities more balanced in size, default (1), which results in a power-law distribution for community sizes
21 |     + `epsilon`: the probability of noisy/random edges, default (0.0000001)
22 |     + `t`: the probability of also connecting to the neighbors of each node that a node connects to. The default value is (0), but it can be increased to a small number to achieve a higher clustering coefficient.
23 |
24 | ### How to run
25 | The source code is in Python 2.7.
26 | You can generate FARZ benchmarks from FARZ.py in src.
27 | See `python FARZ.py -h` to examine the usage and options; or try the following examples.
28 |
29 | ### Examples
30 | * example 1: generate a network with 1000 nodes and about 5x1000 edges (m=5), with 4 communities, where 90% of edges fall within communities (beta=0.9)
31 |
32 |     `python FARZ.py -n 1000 -m 5 -k 4 --beta 0.9`
33 | * example 2: generate a network with properties of example 1, where alpha = 0.2 and gamma = -0.8
34 |
35 |     `python FARZ.py -n 1000 -m 5 -k 4 --beta 0.9 --alpha 0.2 --gamma -0.8`
36 | * example 3: generate 10 sample networks with properties of example 1 and save them into ./data
37 |
38 |     `python FARZ.py --path ./data -s 10 -n 1000 -m 5 -k 4 --beta 0.9`
39 | * example 4: repeat example 2, for beta that varies from 0.5 to 1 with 0.05 increments
40 |
41 |     `python FARZ.py --path ./data -s 10 -v beta -c [0.5,1,0.05] -n 1000 -m 5 -k 4`
42 | * example 5: generate overlapping communities, where each node belongs to at most 3 communities and the portion of overlapping nodes varies; the overlapping communities are saved in list format (one community per line)
43 |
44 |     `python FARZ.py -r 3 -v q --path ./data -s 5 --format list`
45 |
46 | ### Supplementary Materials
47 |
48 |
49 | ### Support or Contact
50 | Reihaneh Rabbany, rabbanyk@ualberta.ca
51 |
--------------------------------------------------------------------------------
/src/network_models.py:
--------------------------------------------------------------------------------
1 | import igraph as ig
2 | import networkx as nx
3 | import numpy as np
4 |
5 |
6 | class Graph:
7 |     def __init__(self, n, edge_list, directed=False):
8 |         self.edge_list = edge_list
9 |         self.n = n
10 |         self.directed=directed
11 |         # self.C = None
12 |         # self.Cid = None
13 |         self.rewirings_count = 0
14 |
15 |         self.deg = [0]* n
16 |         self.neigh = [[] for i in range(n)]
17 |         for e in edge_list:
18 |             u , v = e
19 |             self.neigh[u].append(v)
20 |             self.deg[v]+=1
21 |             if not self.directed: #if directed deg is indegree,
outdegree = len(negh) 22 | self.neigh[v].append(u) 23 | self.deg[u]+=1 24 | return 25 | 26 | def add_edge(self, u, v): 27 | self.edge_list.append((u,v) if u hence works with any input, even if not a valid degree sequence 74 | # todo: add direction and weights 75 | ''' 76 | edge_list = [] 77 | max_itr = len(S) 78 | itr = 0 79 | Q = [i for i in range(0, len(S)) if S[i]!=0 ] 80 | while len(Q)>(1 if not selfloop else 0): 81 | itr+=1 82 | i, j = np.random.choice(Q, size =2, replace = False if not selfloop else True) 83 | e = (i,j) if imax_itr: 86 | edge_list.append(e) 87 | S[i]-=1 88 | S[j]-=1 89 | itr = 0 90 | if S[i]==0: Q.remove(i) 91 | if S[j]==0: Q.remove(j) 92 | 93 | return edge_list if multilinks else list(set(edge_list)) 94 | 95 | -------------------------------------------------------------------------------- /src/draw_plots.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib 3 | import matplotlib.pyplot as plt 4 | from scipy.stats import gaussian_kde 5 | from scipy.interpolate import UnivariateSpline 6 | from scipy import interpolate 7 | from sklearn.neighbors.kde import KernelDensity 8 | from scipy.stats import norm 9 | from sklearn.neighbors import KernelDensity 10 | from matplotlib.patches import Polygon 11 | 12 | matplotlib.rcParams.update({'font.size': 14, 13 | 'font.family': 'Times New Roman', 14 | # 'legend.frameon': False 15 | }) 16 | colrs= ["#429617", 17 | "#4A76BD", 18 | "#E04927", 19 | "#EEC958", 20 | "#B962CE", 21 | "#57445A", 22 | "#716035", 23 | "#C6617E", 24 | "#92C4AE"] 25 | markers=['o','>','s','d','*','v','<','o','>','s','d','*','v','<'] 26 | 27 | 28 | 29 | def rewirings(data): 30 | fig,axess = plt.subplots(2, 2, True,True, squeeze=False, figsize=(8,8)) 31 | plt.subplots_adjust(left=0.1, bottom=0.1, right=0.98, top=0.95, 32 | wspace=0.02, hspace=0.14) 33 | 34 | network_models = ['CF', 'BA', 'FF', 'FF+BA'] 35 | assign_methods = ["LFR","LFR-CN","LFR-NE"] 36 | 37 | 
for i in {0,1}: 38 | for j in {0,1}: 39 | mean, var = data[i][j] 40 | k = i+j 41 | ax = axess[i,j] 42 | for m in range(0, len(mean)): 43 | ax.errorbar(np.arange(1,len(mean[m])+1)*0.1, mean[m], yerr=var[m], 44 | label =assign_methods[m], c=colrs[m], marker=markers[m],linewidth=1.2)#, xerr=0.4) 45 | ax.legend() 46 | ax.set_title(network_models[k]) 47 | 48 | # axess[1,0].set_xlabel("mixing parameter $\mu$") 49 | # axess[0,0].set_ylabel("percentage of rewirings") 50 | axess[0,0].set_xlim(0, (len(mean[i])+1)*0.1) 51 | 52 | fig.text(0.5, 0.05, 'mixing parameter $\mu$', ha='center', va='center', size=22) 53 | fig.text(0.025, 0.5, 'percentage of rewirings', ha='center', va='center', rotation='vertical', size=22) 54 | 55 | # plt.tight_layout() 56 | plt.show() 57 | 58 | 59 | # fakeMean = [[.7,.6,.3,.2,.3,.2],[.4,.3,.2,.4,.35,.25],[.2,.3,.4,.5,.55,.44]] 60 | # fakevar = [[.05,.05,.05,.05,.05,.05],[.05,.05,.05,.05,.05,.05],[.05,.05,.05,.05,.05,.05]] 61 | # fakeOnePlotData = (fakeMean,fakevar) 62 | # rewirings( [[fakeOnePlotData,fakeOnePlotData],[fakeOnePlotData,fakeOnePlotData]] ) 63 | # 64 | # 65 | 66 | 67 | #sequences for degrees, shortest_pathes, clustering_coeffients, degree_corelations 68 | def basic_properties_freq( sequences , axess=None, labl = None, logscale=False, markr=None, clr=None,offset=0): 69 | if axess is None: 70 | fig,axess = plt.subplots( 3,len(sequences),'col',False, squeeze=False, figsize=(10,8)) 71 | plt.subplots_adjust(left=0.04, bottom=0.05, right=0.98, top=0.94, wspace=0.28, hspace=0.1) 72 | 73 | for i in range(0,len(sequences)): 74 | ax = axess[offset,i] 75 | seq = sequences[i] 76 | seq = [f for f in seq if f>0] 77 | smax =max(seq) 78 | smin =min(seq) 79 | 80 | #print seq 81 | freqs , bin_edges = np.histogram(seq, smax+1 if smax>1 else 100, range = (0,smax+1) if smax>1 else (0,smax))#, normed = True, density=True) 82 | bin_centers = (bin_edges[:-1] + bin_edges[1:])/2. 
83 | vals = range(0,smax+1) if smax>1 else bin_centers 84 | freqs=freqs*1.0/sum(freqs) 85 | 86 | # fplot = ax.loglog if lplot else ax.plot 87 | 88 | # his, = ax.plot(vals, freqs,lw=0, label=labl, alpha =0.8, color = clr , marker =markr) 89 | # if lplot: 90 | # his, = ax.loglog(vals, freqs,'.', marker ='.', label=labl, alpha =0.5) 91 | # his, = ax.loglog(vals, freqs,'.', marker ='.', label=labl, alpha =0.5) 92 | # else : 93 | # his, = ax.plot(vals, freqs,'.', marker ='.', label=labl, alpha =0.5) 94 | # his, = ax.loglog(vals, freqs,'.', marker ='.', label=labl, alpha =0.5) 95 | 96 | 97 | # x = bin_centers 98 | # f = UnivariateSpline(x, freqs)#, s=0.1*len(freq)) 99 | # ax.plot(x, f(x),c= his.get_color(), alpha=0.5) 100 | # print len(freqs) #, freqs 101 | # print bin_edges 102 | # print bin_centers 103 | 104 | #remove zeros 105 | y = np.array(freqs) 106 | nz_indexes = np.nonzero(y) 107 | y = y[nz_indexes] 108 | x = np.array(vals)[nz_indexes] 109 | 110 | # ax.plot(x, y,':',c= his.get_color(),alpha=0.8) 111 | ax.plot(x, y,':', label=labl, alpha =0.8, color = clr , marker =markr) 112 | # f = interpolate.interp1d(x, y, kind='linear') 113 | # f = interpolate.UnivariateSpline(x, y, k=2) 114 | # 115 | # xs = np.linspace(min(x),max(x),200) 116 | # ys = f(xs) # use interpolation function returned by `interp1d` 117 | # ax.plot(xs, ys,'-',c= his.get_color(),) 118 | # 119 | # density = gaussian_kde(y) 120 | # xs = np.linspace(min(x),max(x),200) 121 | # # density.covariance_factor = lambda : .5 122 | # # density._compute_covariance() 123 | # ax.plot(xs,density(xs),c= his.get_color(), alpha=0.5) 124 | if len(logscale)==len(sequences): 125 | if 'x' in logscale[i] : ax.set_xscale('log') 126 | if 'y' in logscale[i] : ax.set_yscale('log') 127 | # ax.legend() 128 | # plt.show() 129 | return axess 130 | 131 | 132 | 133 | #sequences for degrees, shortest_pathes, clustering_coeffients, degree_corelations 134 | def basic_properties( sequences , axess=None, labl = None, logscale=[False], 
markr='.', clr='k',offset=0, alfa = 0.8, 135 | distir = [False,False,False, False], bandwidths = [3, 0.1,0.01,1], limits = [(1,50),(0,1),(0,1),(1,25)] ): 136 | if axess is None: 137 | fig,axess = plt.subplots( 3, len(sequences),False,False, squeeze=False,figsize=(len(sequences)*3,8))#'col' 138 | plt.subplots_adjust(left=0.12, bottom=0.05, right=0.95, top=0.94, wspace=0.28, hspace=0.1) 139 | plt.subplots_adjust(left=0.45, bottom=0.05, right=0.95, top=0.94, wspace=0.28, hspace=1.2) 140 | 141 | for i in range(0,len(sequences)): 142 | ax = axess[offset][i] 143 | seq = sequences[i] 144 | smax =max(seq) 145 | smin =min(seq) 146 | 147 | if distir[i]==0: 148 | #print seq 149 | freqs , bin_edges = np.histogram(seq, smax+1 if smax>1 else 100, range = (0,smax+1) if smax>1 else (0,smax))#, normed = True, density=True) 150 | bin_centers = (bin_edges[:-1] + bin_edges[1:])/2. 151 | vals = range(0,smax+1) if smax>1 else bin_centers 152 | freqs=freqs*1.0/sum(freqs) 153 | #remove zeros 154 | y = np.array(freqs) 155 | nz_indexes = np.nonzero(y) 156 | y = y[nz_indexes] 157 | x = np.array(vals)[nz_indexes] 158 | ax.plot(x, y,':', label=labl, alpha =alfa, color = clr , marker ='.') 159 | else : 160 | X = np.array(seq) 161 | X = [ x for x in X if x>=limits[i][0] and x<=limits[i][1]] 162 | # X= (np.abs(X)) 163 | # print len(X) 164 | X = np.random.choice(X, size=min(10000, len(X))) 165 | X = X[:, np.newaxis] 166 | kde = KernelDensity(kernel = 'gaussian', bandwidth=bandwidths[i]).fit(X)#,atol=atols[i],kernel = 'tophat'kernel='gaussian' 167 | # if 'x' in logscale[i] : 168 | # X_plot = np.logspace( limits[i][0], limits[i][1], 1000)[:, np.newaxis] 169 | # else : 170 | X_plot = np.linspace(limits[i][0], limits[i][1], 1000)[:, np.newaxis] 171 | 172 | log_dens = kde.score_samples(X_plot) # 173 | # ax.fill(X_plot[:, 0], np.exp(log_dens), alpha =0.5, label=labl) 174 | Y = np.exp(log_dens) 175 | if distir[i]==2: Y = np.cumsum(Y) 176 | ax.plot(X_plot[:, 0],Y, '-',label=labl, alpha =alfa, color = clr 
,markersize=2, marker ='') 177 | 178 | verts = [(limits[i][0]-1e-6, 0)] + list(zip(X_plot[:, 0],Y)) + [(limits[i][1]+1e-6, 0)] 179 | poly = Polygon(verts, facecolor=clr, alpha =alfa ) #, edgecolor='0.5') 180 | ax.add_patch(poly) 181 | # ax.set_yticks([]) 182 | # ax.set_ylim(bottom=-0.02) 183 | ax.set_xlim(limits[i][0],limits[i][1]) 184 | 185 | if len(logscale)==len(sequences): 186 | if 'x' in logscale[i] : 187 | ax.set_xscale('log') 188 | if 'y' in logscale[i] : 189 | ax.set_yscale('log') 190 | if i<3: ax.set_ylim(bottom=0.001) 191 | # ax.legend() 192 | # plt.show(block=False) 193 | return axess 194 | 195 | def test_density_plot(): 196 | fig, ax = plt.subplots(2, 2, sharex=True, sharey=True) 197 | 198 | N=20 199 | X = np.concatenate((np.random.normal(0, 1, 0.3 * N), 200 | np.random.normal(5, 1, 0.7 * N)))[:, np.newaxis] 201 | 202 | print np.shape(X) 203 | X_plot = np.linspace(-5, 10, 1000)[:, np.newaxis] 204 | print np.shape(X_plot) 205 | kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(X) 206 | log_dens = kde.score_samples(X_plot) 207 | ax[0,0].fill(X_plot[:, 0], np.exp(log_dens), fc='#AAAAFF') 208 | ax[0,0].text(-3.5, 0.31, "Gaussian Kernel Density") 209 | ax[0,0].plot(X[:, 0], np.zeros(X.shape[0]) - 0.01, '+k') 210 | 211 | plt.show() 212 | 213 | -------------------------------------------------------------------------------- /src/three_pass_benchmarks.py: -------------------------------------------------------------------------------- 1 | import random 2 | import bisect 3 | import numpy as np 4 | from network_models import * 5 | 6 | def generalize_three_pass(network_model, assign_nodes, overlay_communities, g_params, c_params): 7 | G = network_model(g_params) 8 | # print_seq_stats( '\t\t network_generated', G.deg) 9 | return generalize_three_pass_network(G, assign_nodes, overlay_communities, c_params ) 10 | 11 | def generalize_three_pass_network(G, assign_nodes, overlay_communities, c_params): 12 | C = assign_nodes(G, c_params) 13 | 
print_seq_stats('\t\t node_assigned', [len(c) for c in C[0]]) 14 | return overlay_communities(G, C, c_params) 15 | 16 | def print_seq_stats(msg, S): 17 | print msg, ':::\t len: ',len(S),' min: ', np.min(S) ,' avg: ',np.mean(S),' max: ', np.max(S),' sum: ', np.sum(S) 18 | 19 | 20 | 21 | def assign_CN(G, c_params): 22 | def cn_prob(G, v, C, ec, Cid): 23 | p = [0.1]*(len(C)) 24 | for u in G.neigh[v]: 25 | if Cid[u]>=0: p[Cid[u]]+=1 26 | p= [p[i] for i in ec] #remove communities that are full 27 | p= [i/sum(p) for i in p]#np.divide(p, np.sum(p)) 28 | # print p 29 | return p 30 | return assign_LFR(G, c_params, cn_prob) 31 | 32 | def random_choice(values, weights=None, size = 1, replace = True): 33 | if weights is None: 34 | i = int(random.random() * len(values)) 35 | # i = random.randrange(0,len(values)) 36 | # res = random.choice(values) 37 | else : 38 | total = 0 39 | cum_weights = [] 40 | for w in weights: 41 | total += w 42 | cum_weights.append(total) 43 | x = random.random() * total 44 | i = bisect.bisect(cum_weights, x) 45 | # print weights 46 | #res = values[i] 47 | 48 | if size <=1: return values[i] 49 | else: 50 | # print i, values 51 | cval = [values[j] for j in range(len(values)) if replace or i<>j] 52 | if weights is None: cwei=None 53 | else: cwei = [weights[j] for j in range(len(weights)) if replace or i<>j] 54 | 55 | # if not replace : del values[i] 56 | return values[i], random_choice(cval, cwei, size-1, replace) 57 | 58 | def assign_first_pass_original(G,mu, s, c, cid, prob): 59 | # assign nodes to communities 60 | for v in range(G.n): 61 | # pick a community at random 62 | ec = [i for i in range( len(c)) if s[i]>len(c[i])] 63 | 64 | # if prob is None: 65 | # p = None 66 | # else: 67 | # print prob 68 | # print ec 69 | # p = prob(G,v,c, cid) 70 | # p = [p[i] for i in ec] 71 | # p = np.divide(p, np.sum(p)) 72 | i = random_choice(ec, None if prob is None else prob(G,v,c,ec, cid)) 73 | # assign to community if fits 74 | if s[i] >= (1- mu)*G.deg[v] : 75 | 
c[i].append(v) 76 | cid[v] = i 77 | 78 | def assign_first_pass_NE(G,mu, s, c, cid, prob): 79 | for v in range(G.n): 80 | if cid[v]==-1: 81 | # pick a community at random 82 | ec = [i for i in range( len(c)) if s[i]>len(c[i])] 83 | i = random_choice(ec, None if prob is None else prob(G,v,c,ec, cid)) 84 | 85 | # i = np.random_choice([i for i in range( len(c)) if s[i]>len(c[i])], 86 | # p = None if prob is None else prob(G,v,c, cid) ) 87 | to_add = [v] 88 | marked = [0]*G.n 89 | marked[v] =1 90 | while len(to_add)>0 and len(c[i])< s[i]: 91 | v = to_add.pop(0) 92 | # assign to community if fits 93 | if s[i] >= (1- mu)*G.deg[v] : 94 | c[i].append(v) 95 | cid[v] = i 96 | for u in G.neigh[v]: 97 | if marked[u]==0 and cid[u]==-1: 98 | to_add.append(u) 99 | marked[u]=1 100 | 101 | 102 | def assign_LFR(G, c_params, prob=None, first_pass= assign_first_pass_original, max_itr = 1000): 103 | c_params['s_sum'] = G.n 104 | mu = c_params['mu'] 105 | # determine capacity of communities 106 | d_max = np.max(G.deg) 107 | # print G.n, c_params['s_max'], d_max, (1-mu) *d_max 108 | 109 | if c_params['s_max']< (1-mu) *d_max : c_params['s_max'] =(1-mu) * d_max 110 | # print c_params['s_max'], d_max 111 | 112 | s = sample_power_law(**c_params) 113 | c_max = max(s) 114 | # print_seq_stats('community_sizes_sampeled',s) 115 | 116 | # initialize the communities and community ids 117 | c = [[] for i in range(len(s))] 118 | cid = [-1] * G.n 119 | first_pass(G,mu, s, c, cid, prob) 120 | 121 | # print_seq_stats('1... 
',[len(l) for l in c]) 122 | # initialize the homeless queue 123 | H = [v for v in range(G.n) if cid[v]==-1] 124 | 125 | itr = 0 126 | # assign homeless nodes to communities 127 | while len(H)>0 and max_itr>itr: 128 | itr+=1 129 | # print itr 130 | # pick a community at random 131 | v = random_choice(H) 132 | ec = [i for i in range( len(c))] 133 | i = random_choice(ec, None if prob is None else prob(G, v,c,ec, cid) ) 134 | if s[i] >= min((1- mu)*G.deg[v], c_max) : 135 | c[i].append(v) 136 | cid[v]=i 137 | H.remove(v) 138 | # itr=0 139 | # kick out a random node 140 | if len(c[i])> s[i]: 141 | u = random_choice(c[i]) 142 | c[i].remove(u) 143 | cid[u] = -1 144 | H.append(u) 145 | 146 | if len(H)>0: print "Failed in 2nd run" 147 | for v in H: 148 | # pick a community at random 149 | ec =[i for i in range( len(c))] 150 | i = random_choice(ec, None if prob is None else prob(G, v,c,ec, cid) ) 151 | c[i].append(v) 152 | cid[v]=i 153 | 154 | return c, cid 155 | 156 | 157 | 158 | def assign_NE(G, c_params, prob=None): 159 | return assign_LFR(G, c_params, prob=None, first_pass= assign_first_pass_NE) 160 | 161 | def overlay_LFR(G, C, c_params): 162 | mu = c_params['mu'] 163 | n= G.n 164 | deg = G.deg #degree_seq(n, edge_list) 165 | 166 | C, Cid = C 167 | # determine degree of each node and its between/outlink degree, 168 | # i.e. 
number of edges that go outside its community 169 | db = [0]* n 170 | # d = [0] * n 171 | # neigh = [[] for i in range(0, n)] 172 | for e in G.edge_list: 173 | u , v = e 174 | if Cid[u] != Cid[v]: 175 | db[u]+=1 176 | db[v]+=1 177 | # determine desired between changes 178 | for v in range(n): 179 | db[v] = np.floor(mu*G.deg[v] - db[v]) 180 | dw = np.multiply(db, -1) 181 | # rewire edges within communities 182 | for c in C: 183 | I = [v for v in c if dw[v]>0] 184 | # add internal edges 185 | while len(I)>=2: 186 | u, v = random_choice(I, size =2, replace = False) 187 | G.add_edge(u,v) 188 | dw[u]-=1 189 | dw[v]-=1 190 | if dw[u] ==0: I.remove(u) 191 | if dw[v] ==0: I.remove(v) 192 | # remove excess edges 193 | for v in c: 194 | if dw[v]<0: 195 | I = [u for u in G.neigh[v] if u in c and dw[u]<0] 196 | while len(I)>=1 and dw[v]<0: 197 | u = random_choice(I) 198 | G.remove_edge(u,v) 199 | dw[u]+=1 200 | dw[v]+=1 201 | I.remove(u) 202 | 203 | # rewire edges between communities 204 | for c in C: 205 | I = [v for v in c if db[v]>0] 206 | O = [v for v in range(n) if v not in c and db[v]>0] 207 | # add internal edges 208 | while len(I)>=1 and len(O)>=1: 209 | v = random_choice(I) 210 | u = random_choice(O) 211 | G.add_edge(u,v) 212 | db[u]-=1 213 | db[v]-=1 214 | if db[v] ==0: I.remove(v) 215 | if db[u] ==0: O.remove(u) 216 | # remove excess edges 217 | for v in c: 218 | if db[v]<0: 219 | O = [u for u in G.neigh[v] if u not in c and db[u]<0] 220 | while len(O)>=1 and db[v]<0: 221 | u = random_choice(O) 222 | G.remove_edge(u,v) 223 | db[u]+=1 224 | db[v]+=1 225 | O.remove(u) 226 | 227 | return G, C 228 | 229 | def configuration_model(params): 230 | S = sample_power_law(**params) 231 | # print_seq_stats('degree_sampled',S) 232 | return Graph(len(S),configuration_model_from_sequence(S)) 233 | 234 | 235 | def sample_power_law( s_exp=2, n=None, s_avg=None, s_max=None, s_min=1, s_sum=None, discrete = True, **kwargs): 236 | S = None 237 | if n is not None: # number of samples is 
fixed 238 | if s_avg is None and s_sum is not None: s_avg = s_sum*1.0/n 239 | # 1.0/np.random.power(exp+1, size=n) 240 | S = np.array([]) 241 | c = None 242 | while (len(S)=s_min: 259 | S.append(tmp) 260 | else: 261 | shift = np.ceil(tmp*1.0/len(S)) 262 | for i in range(0,len(S)): 263 | if tmp>0: 264 | S[i]+=shift 265 | tmp-=shift 266 | else: break 267 | 268 | # print S, np.sum(S), s_sum 269 | return S 270 | 271 | 272 | 273 | def scale_truncate(S, max=None, min=None, avg=None, c= None): 274 | # print c 275 | if c==None: 276 | c = 1 277 | if avg is not None: 278 | itr =0 279 | max_itr = 100 280 | while (itrmin] 286 | if max is not None: S = S[Smin] 290 | if max is not None: S = S[Si: return values[i] 19 | else: return None 20 | else: 21 | cval = [values[j] for j in range(len(values)) if replace or i<>j] 22 | if weights is None: cwei=None 23 | else: cwei = [weights[j] for j in range(len(weights)) if replace or i<>j] 24 | tmp= random_choice(cval, cwei, size-1, replace) 25 | if not isinstance(tmp,list): tmp = [tmp] 26 | tmp.append(values[i]) 27 | return tmp 28 | 29 | class Comms: 30 | def __init__(self, k): 31 | self.k = k 32 | self.groups = [[] for i in range(k)] 33 | self.memberships = {} 34 | 35 | def add(self, cluster_id, i, s = 1): 36 | if i not in [m[0] for m in self.groups[cluster_id]]: 37 | self.groups[cluster_id].append((i,s)) 38 | if i in self.memberships: 39 | self.memberships[i].append((cluster_id,s)) 40 | else: 41 | self.memberships[i] =[(cluster_id,s)] 42 | def write_groups(self, path): 43 | with open(path, 'w') as f: 44 | for g in self.groups: 45 | for i,s in g: 46 | f.write(str(i) + ' ') 47 | f.write('\n') 48 | 49 | 50 | 51 | class Graph: 52 | def __init__(self,directed=False, weighted=False): 53 | self.n = 0 54 | self.counter = 0 55 | self.max_degree = 0 56 | self.directed = directed 57 | self.weighted = weighted 58 | self.edge_list = [] 59 | self.edge_time = [] 60 | self.deg = [] 61 | self.neigh = [[]] 62 | return 63 | 64 | def add_node(self): 65 | 
self.deg.append(0) 66 | self.neigh.append([]) 67 | self.n+=1 68 | 69 | def weight(self, u, v): 70 | for i,w in self.neigh[u]: 71 | if i == v: return w 72 | return 0 73 | 74 | def is_neigh(self, u, v): 75 | for i,_ in self.neigh[u]: 76 | if i == v: return True 77 | return False 78 | 79 | def add_edge(self, u, v, w=1): 80 | if u==v: return 81 | if not self.weighted : w =1 82 | self.edge_list.append((u,v,w) if uself.max_degree: self.max_degree = self.deg[v] 88 | 89 | if not self.directed: #if directed deg is indegree, outdegree = len(negh) 90 | self.neigh[v].append((u,w)) 91 | self.deg[u]+=w 92 | if self.deg[u]>self.max_degree: self.max_degree = self.deg[u] 93 | 94 | return 95 | 96 | 97 | def to_nx(self): 98 | import networkx as nx 99 | G=nx.Graph() 100 | for u,v, w in self.edge_list: 101 | G.add_edge(u, v) 102 | # G.add_edges_from(self.edge_list) 103 | return G 104 | 105 | def to_nx(self, C): 106 | import networkx as nx 107 | G=nx.Graph() 108 | for i in range(self.n): 109 | G.add_node(i, {'c':str(sorted(C.memberships[i]))}) 110 | # G.add_node(i, {'c':int(C.memberships[i][0][0])}) 111 | for i in range(len(self.edge_list)): 112 | # for u,v, w in self.edge_list: 113 | u,v, w = self.edge_list[i] 114 | G.add_edge(u, v, weight=w, capacity=self.edge_time[i]) 115 | # G.add_edges_from(self.edge_list) 116 | return G 117 | 118 | def to_ig(self): 119 | G=ig.Graph() 120 | G.add_edges(self.edge_list) 121 | return G 122 | 123 | 124 | def write_edgelist(self, path): 125 | with open(path, 'w') as f: 126 | for i,j,w in self.edge_list: 127 | f.write(str(i) + '\t'+str(j) + '\n') 128 | 129 | 130 | def Q(G, C): 131 | q = 0.0 132 | m = 2 * len(G.edge_list) 133 | for c in C.groups: 134 | for i,_ in c: 135 | for j,_ in c: 136 | q+= G.weight(i,j) - (G.deg[i]*G.deg[j]/(2*m)) 137 | q /= 2*m 138 | return q 139 | 140 | def common_neighbour(i, G, normalize=True): 141 | p = {} 142 | for k,wik in G.neigh[i]: 143 | for j,wjk in G.neigh[k]: 144 | if j in p: p[j]+=(wik * wjk) 145 | else: p[j]= (wik * 
wjk) 146 | if len(p)<=0 or not normalize: return p 147 | maxp = p[max(p, key = lambda i: p[i])] 148 | for j in p: p[j] = p[j]*1.0 / maxp 149 | return p 150 | 151 | def choose_community(i, G, C, alpha, beta, gamma, epsilon): 152 | mids =[k for k,uik in C.memberships[i]] 153 | if random.random()< beta: #inside 154 | cids = mids 155 | else: 156 | cids = [j for j in range(len(C.groups)) if j not in mids] #: cids.append(j) 157 | 158 | return cids[ int(random.random()*len(cids))] if len(cids)>0 else None 159 | 160 | def degree_similarity(i, ids, G, gamma, normalize = True): 161 | p = [0]*len(ids) 162 | for ij,j in enumerate(ids): 163 | p[ij] = (G.deg[j] -G.deg[i])**2 164 | if len(p)<=0 or not normalize: return p 165 | maxp = max(p) 166 | if maxp==0: return p 167 | p = [pi*1.0/maxp if gamma<0 else 1-pi*1.0/maxp for pi in p] 168 | return p 169 | 170 | def combine (a,b,alpha,gamma): 171 | return (a**alpha) / ((b+1)**gamma) 172 | 173 | def choose_node(i,c, G, C, alpha, beta, gamma, epsilon): 174 | ids = [j for j,_ in C.groups[c] if j !=i ] 175 | # also remove nodes that are already connected from the candidate list 176 | for k,_ in G.neigh[i]: 177 | if k in ids: ids.remove(k) 178 | 179 | norma = False 180 | cn = common_neighbour(i, G, normalize=norma) 181 | trim_ids = [id for id in ids if id in cn] 182 | dd = degree_similarity(i, trim_ids, G, gamma, normalize=norma) 183 | 184 | if random.random()beta)): 206 | G.add_edge(i,k,wjk*pj) 207 | 208 | def connect(i, b, G, C, alpha, beta, gamma, epsilon): 209 | #Choose community 210 | c = choose_community(i, G, C, alpha, beta, gamma, epsilon) 211 | if c is None: return 212 | #Choose node within community 213 | tmp = choose_node(i, c, G, C, alpha, beta, gamma, epsilon) 214 | if tmp is None: return 215 | j, pj = tmp 216 | G.add_edge(i,j,pj) 217 | connect_neighbor(i, j, pj , c, b, G, C, beta) 218 | 219 | def select_node(G, method = 'uniform'): 220 | if method=='uniform': 221 | return int(random.random() * G.n) # uniform 222 | else: 223 
| if method == 'older_less_active': p = [(i+1) for i in range(G.n)] # older less active 224 | elif method == 'younger_less_active' : p = [G.n-i for i in range(G.n)] # younger less active 225 | else: p = [1 for i in range(G.n)] # uniform 226 | return random_choice(range(len(p)), p ) #, size=1, replace = False)[0] 227 | 228 | def assign(i, C, e=1, r=1, q = 0.5): 229 | p = [e +len(c) for c in C.groups] 230 | id = random_choice(range(C.k),p ) 231 | C.add(id, i) 232 | for j in range(1,r): #todo add strength for fuzzy 233 | if (random.random()1 else '') 314 | write_to_file(G,C,path,name,format,farz_params) 315 | return 316 | if arange ==None: 317 | arange = default_ranges[vari] 318 | for i,var in enumerate(get_range(arange[0],arange[1],arange[2])): 319 | for r in range(repeat): 320 | farz_params[vari] = var 321 | print 's',i+1, r+1, str(farz_params) 322 | G, C =realize(**farz_params) 323 | name = 'S'+str(i+1)+'-'+net_name+ (str(r+1) if repeat>1 else '') 324 | write_to_file(G,C,path,name,format,farz_params) 325 | 326 | 327 | import sys 328 | def main(argv): 329 | import getopt 330 | FARZsetting = default_FARZ_setting.copy() 331 | batch_setting= default_batch_setting.copy() 332 | try: 333 | opts, args = getopt.getopt(argv,"ho:s:v:c:f:n:k:m:a:b:g:p:r:q:t:e:dw",\ 334 | ["output=","path=","repeat=","vary=",'range=','format=',"alpha=","beta=","gamma=",'phi=','overlap=','oProb=','epsilon=','cneigh=','directed','weighted']) 335 | except getopt.GetoptError: 336 | print 'invalid command, try -h to see usage and options' 337 | sys.exit(2) 338 | for opt, arg in opts: 339 | if opt == '-h': 340 | print '*** examples:' 341 | print '+ example 1: generate a network with 1000 nodes and about 5x1000 edges (m=5), with 4 communities, where 90% of edges fall within communities (beta=0.9)' 342 | print '> python FARZ.py -n 1000 -m 5 -k 4 --beta 0.9\n' 343 | print '+ example 2: generate a network with properties of example 1, where alpha = 0.2 and gamma = -0.8' 344 | print '> python FARZ.py -n 
1000 -m 5 -k 4 --beta 0.9 --alpha 0.2 --gamma -0.8 \n'
345 |             print '+ example 3: generate 10 sample networks with properties of example 1 and save them into ./data'
346 |             print '> python FARZ.py --path ./data -s 10 -n 1000 -m 5 -k 4 --beta 0.9\n'
347 |             print '+ example 4: repeat example 2, for beta that varies from 0.5 to 1 with 0.05 increments'
348 |             print '> python FARZ.py --path ./data -s 10 -v beta -c [0.5,1,0.05] -n 1000 -m 5 -k 4 \n'
349 |             print '+ example 5: generate overlapping communities, where each node belongs to at most 3 communities and the portion of overlapping nodes varies'
350 |             print '> python FARZ.py -r 3 -v q --path ./datavrq -s 5 --format list\n'
351 |
352 |             print '*** parameters:'
353 |             print '-n: number of nodes, default (1000)'
354 |             print '-m: half the average degree of nodes, default (5)'
355 |             print '-k: number of communities, default (4)'
356 |             print '-b [or --beta]: the strength of community structure, i.e. the probability of edges being formed within communities, default (0.8)'
357 |             print '-a [or --alpha]: the strength of the common neighbors\' effect on edge formation, default (0.5)'
358 |             print '-g [or --gamma]: the strength of the degree similarity effect on edge formation, default (0.5), can be negative for networks with negative degree correlation'
359 |             print '-p [or --phi]: the constant added to all community sizes; a higher number makes the communities more balanced in size, default (1), which results in a power-law distribution for community sizes'
360 |             print '-r: the maximum number of communities each node can belong to, default (1)'
361 |             print '-q: the probability of a node belonging to multiple communities, default (0.5)'
362 |             print '-e [or --epsilon]: the probability of noisy/random edges, default (0.0000001)'
363 |             print '-t: the probability of also connecting to the neighbors of each node that a node connects to. The default value is (0), but it can be increased to a small number to achieve a higher clustering coefficient.
\n' 364 | 365 | print '*** batch parameters:' 366 | print '-s: the number of networks to be sampled with the given properties, default (1)' 367 | print '-o: the name of the output network, default (network)' 368 | print '--path : the path to write the network(s) to, default (.)' 369 | print '-f [or --format]: the format of output, list or gml, default (gml)' 370 | print '-v: the parameter to vary and sample networks for, default (None)' 371 | print '-c: the range to change the given parameter, should be in format of [s,e,inc]' 372 | #print 'default FARZ parameters are :\n', default_FARZ_setting 373 | #print 'default batch generator parameters are :\n', default_batch_setting 374 | 375 | sys.exit() 376 | 377 | elif opt in ("-o", "--output"): 378 | batch_setting['net_name'] = arg 379 | elif opt in ("--path"): 380 | batch_setting['path'] = arg 381 | elif opt in ("-f", "--format"): 382 | if arg in supported_formats: 383 | batch_setting['format'] = arg 384 | else: 385 | print 'Format not supported , choose from ',supported_formats,' or try -h to see the usage and options' 386 | sys.exit(2) 387 | elif opt in ("-s","--repeat"): 388 | try: batch_setting['repeat'] = int(arg) 389 | except ValueError: 390 | print 'Invalid Number , try -h to see the usage and options' 391 | sys.exit(2) 392 | elif opt in ("-v", "--vary"): 393 | if (arg in default_ranges.keys()): 394 | batch_setting['vari'] = arg 395 | else: 396 | print 'Invalid variable, choose form :', default_ranges.keys(), ', try -h to see the usage and options' 397 | sys.exit(2) 398 | elif opt in ("-c", "--range"): 399 | try: 400 | arange = [float(s) for s in arg[1:-1].split(',')] 401 | batch_setting['arange'] = arange 402 | except Error: 403 | print 'Invalid range, should have the following form : [start,end,incrementBy], try -h to see the usage and options ' 404 | sys.exit(2) 405 | elif opt in ("-n"): 406 | try: FARZsetting['n'] = int(arg) 407 | except ValueError: 408 | print 'Invalid Number , try -h to see the usage and 
options' 409 | sys.exit(2) 410 | elif opt in ("-k"): 411 | try: FARZsetting['k'] = int(arg) 412 | except ValueError: 413 | print 'Invalid Number , try -h to see the usage and options' 414 | sys.exit(2) 415 | elif opt in ("-m"): 416 | try: FARZsetting['m'] = int(arg) 417 | except ValueError: 418 | print 'Invalid Number , try -h to see the usage and options' 419 | sys.exit(2) 420 | elif opt in ("-a","--alpha"): 421 | try: FARZsetting['alpha'] = float(arg) 422 | except ValueError: 423 | print 'Invalid Number , try -h to see the usage and options' 424 | sys.exit(2) 425 | elif opt in ("-b","--beta"): 426 | try: FARZsetting['beta'] = float(arg) 427 | except ValueError: 428 | print 'Invalid Number , try -h to see the usage and options' 429 | sys.exit(2) 430 | elif opt in ("-g","--gamma"): 431 | try: FARZsetting['gamma'] = float(arg) 432 | except ValueError: 433 | print 'Invalid Number , try -h to see the usage and options' 434 | sys.exit(2) 435 | elif opt in ("-p","--phi"): 436 | try: FARZsetting['phi'] = int(arg) 437 | except ValueError: 438 | print 'Invalid Number , try -h to see the usage and options' 439 | sys.exit(2) 440 | elif opt in ("-r","--overlap"): 441 | try: FARZsetting['o'] = int(arg) 442 | except ValueError: 443 | print 'Invalid Number , try -h to see the usage and options' 444 | sys.exit(2) 445 | elif opt in ("-q","--oProb"): 446 | try: FARZsetting['q'] = float(arg) 447 | except ValueError: 448 | print 'Invalid Number , try -h to see the usage and options' 449 | sys.exit(2) 450 | elif opt in ("-d","--directed"): 451 | FARZsetting['directed'] = True 452 | elif opt in ("-w","--wighted"): 453 | FARZsetting['weighted'] = True 454 | elif opt in ("-t","--cneigh"): 455 | try: FARZsetting['b'] = float(arg) 456 | except ValueError: 457 | print 'Invalid Number , try -h to see the usage and options' 458 | sys.exit(2) 459 | elif opt in ("-e","--epsilon"): 460 | try: FARZsetting['epsilon'] = float(arg) 461 | except ValueError: 462 | print 'Invalid Number , try -h to see 
the usage and options' 463 | sys.exit(2) 464 | 465 | batch_setting['farz_params'] = FARZsetting 466 | print 'generating FARZ benchmark(s) ... ' 467 | generate( **batch_setting) 468 | 469 | 470 | if __name__ == "__main__": 471 | main(sys.argv[1:]) 472 | 473 | 474 | # python FARZ.py --path ./dataVb55 -s 10 -v beta 475 | # python FARZ.py --path ./dataVb82 -s 10 -v beta --alpha 0.8 --gamma 0.2 476 | # python FARZ.py --path ./dataVb5-5 -s 10 -v beta --alpha 0.5 --gamma -0.5 477 | # python FARZ.py --path ./dataVb2-8 -s 10 -v beta --alpha 0.2 --gamma -0.8 478 | --------------------------------------------------------------------------------