├── .pydevproject
├── .project
├── LICENSE
├── README.md
└── src
    ├── network_models.py
    ├── draw_plots.py
    ├── three_pass_benchmarks.py
    └── FARZ.py
/.pydevproject:
--------------------------------------------------------------------------------
1 |
2 |
3 | Default
4 | python 2.7
5 |
6 |
--------------------------------------------------------------------------------
/.project:
--------------------------------------------------------------------------------
1 |
2 |
3 | FARZ
4 |
5 |
6 |
7 |
8 |
9 | org.python.pydev.PyDevBuilder
10 |
11 |
12 |
13 |
14 |
15 | org.python.pydev.pythonNature
16 |
17 |
18 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2015 rabbanyk
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
23 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # FARZ
2 | ## Benchmarks for Community Detection Algorithms
3 |
4 | FARZ is a generator/simulator for networks with built-in community structure.
5 | It creates graphs/networks with community labels, which can be used for evaluating community detection algorithms.
6 |
7 | ### Generator Parameters
8 | * main parameters
9 | + `n`: number of nodes
10 | + `m`: number of edges created per node (i.e. half the average degree of nodes)
11 | + `k`: number of communities
12 | * control parameters
13 | + `beta`: the strength of community structure, i.e. the probability of edges to be formed within communities, default (0.8)
14 | + `alpha`: the strength of the common-neighbor effect on edge formation, default (0.5)
15 | + `gamma`: the strength of degree similarity effect on edge formation, default (0.5), can be negative for networks with negative degree correlation
16 | * overlap parameters
17 | + `r`: the maximum number of communities each node can belong to, default (1, which results in disjoint communities)
18 | + `q`: the probability of a node belonging to multiple communities, default (0.5, has an effect only if r>1)
19 | * config parameters
20 | + `phi`: the constant added to all community sizes; a higher value makes the communities more balanced in size, default (1), which results in a power-law distribution of community sizes
21 | + `epsilon`: the probability of noisy/random edges, default (0.0000001)
22 | + `t`: the probability of also connecting to the neighbors of each node that a node connects to. The default value is (0), but it can be increased to a small number to achieve a higher clustering coefficient.
23 |
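As a rough illustration of what `beta` controls, the sketch below mirrors the community-selection step in `src/FARZ.py` (`choose_community`): with probability `beta` a new edge targets one of the node's own communities, otherwise one of the remaining communities. The name `pick_community` and its arguments are hypothetical and chosen only for this example; it is a simplified sketch, not the generator itself.

```python
import random

def pick_community(member_of, k, beta, rng=random):
    """Return a community id for a new edge, or None if no candidate exists.

    member_of: ids of the communities the node already belongs to
    k:         total number of communities
    beta:      probability that the edge stays inside the node's communities
    (all names here are illustrative assumptions, not the FARZ API)
    """
    if rng.random() < beta:
        # intra-community edge: choose among the node's own communities
        candidates = list(member_of)
    else:
        # inter-community edge: choose among all other communities
        candidates = [c for c in range(k) if c not in member_of]
    if not candidates:
        return None
    return candidates[int(rng.random() * len(candidates))]

# Empirical check: roughly a fraction beta of the picks should fall
# inside the node's own community (community 2 in this toy setup).
rng = random.Random(0)
picks = [pick_community([2], k=4, beta=0.9, rng=rng) for _ in range(10000)]
inside_fraction = sum(1 for c in picks if c == 2) / float(len(picks))
print(round(inside_fraction, 2))
```

With `r>1` a node may belong to several communities, in which case `member_of` holds more than one id and the intra-community pick is uniform over them.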
24 | ### How to run
25 | The source code is written in Python 2.7.
26 | You can generate FARZ benchmarks from FARZ.py in src.
27 | See ` python FARZ.py -h ` to examine the usage and options; or try the following examples.
28 |
29 | ### Examples
30 | * example 1: generate a network with 1000 nodes and about 5x1000 edges (m=5), with 4 communities, where 90% of edges fall within communities (beta=0.9)
31 |
32 | ` python FARZ.py -n 1000 -m 5 -k 4 --beta 0.9`
33 | * example 2: generate a network with properties of example 1, where alpha = 0.2 and gamma = -0.8
34 |
35 | ` python FARZ.py -n 1000 -m 5 -k 4 --beta 0.9 --alpha 0.2 --gamma -0.8 `
36 | * example 3: generate 10 sample networks with properties of example 1 and save them into ./data
37 |
38 | ` python FARZ.py --path ./data -s 10 -n 1000 -m 5 -k 4 --beta 0.9`
39 | * example 4: repeat example 3, for beta that varies from 0.5 to 1 with 0.05 increments
40 |
41 | ` python FARZ.py --path ./data -s 10 -v beta -c [0.5,1,0.05] -n 1000 -m 5 -k 4 `
42 | * example 5: generate overlapping communities, where each node belongs to at most 3 communities and the portion of overlapping nodes varies; the overlapping communities are saved in list format (one community per line)
43 |
44 | ` python FARZ.py -r 3 -v q --path ./data -s 5 --format list`
45 |
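To consume the `list` output, something along these lines may help. It assumes the layouts written by `write_edgelist` (one tab-separated `u v` pair per line) and `write_groups` (one community per line, node ids separated by spaces) in `src/FARZ.py`. The helper names and the file names below are placeholders, since the actual output file names are not shown here.

```python
import os
import tempfile

def read_edgelist(path):
    """Parse one whitespace/tab-separated 'u v' edge per line."""
    edges = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                edges.append((int(parts[0]), int(parts[1])))
    return edges

def read_communities(path):
    """Parse one community per line; line index is the community id."""
    with open(path) as f:
        # keep empty lines so community ids stay aligned with line numbers
        return [[int(tok) for tok in line.split()] for line in f]

# Self-contained demo on files written in the assumed layout
# (the file names are made up for this example).
tmp_dir = tempfile.mkdtemp()
edge_path = os.path.join(tmp_dir, "example_edges.txt")
comm_path = os.path.join(tmp_dir, "example_comms.txt")
with open(edge_path, "w") as f:
    f.write("0\t1\n1\t2\n")
with open(comm_path, "w") as f:
    f.write("0 1 \n2 \n")

edges = read_edgelist(edge_path)
communities = read_communities(comm_path)
print(edges)
print(communities)
```

Because overlapping nodes appear on more than one line, a node-to-communities mapping should be built as a dictionary of lists rather than a single label per node.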
46 | ### Supplementary Materials
47 |
48 |
49 | ### Support or Contact
50 | Reihaneh Rabbany, rabbanyk@ualberta.ca
51 |
--------------------------------------------------------------------------------
/src/network_models.py:
--------------------------------------------------------------------------------
1 | import igraph as ig
2 | import networkx as nx
3 | import numpy as np
4 |
5 |
6 | class Graph:
7 | def __init__(self, n, edge_list, directed=False):
8 | self.edge_list = edge_list
9 | self.n = n
10 | self.directed=directed
11 | # self.C = None
12 | # self.Cid = None
13 | self.rewirings_count = 0
14 |
15 | self.deg = [0]* n
16 | self.neigh = [[] for i in range(n)]
17 | for e in edge_list:
18 | u , v = e
19 | self.neigh[u].append(v)
20 | self.deg[v]+=1
21 | if not self.directed: # if directed, deg is the in-degree and the out-degree is len(neigh)
22 | self.neigh[v].append(u)
23 | self.deg[u]+=1
24 | return
25 |
26 | def add_edge(self, u, v):
27 | self.edge_list.append((u,v) if u hence works with any input, even if not a valid degree sequence
74 | # todo: add direction and weights
75 | '''
76 | edge_list = []
77 | max_itr = len(S)
78 | itr = 0
79 | Q = [i for i in range(0, len(S)) if S[i]!=0 ]
80 | while len(Q)>(1 if not selfloop else 0):
81 | itr+=1
82 | i, j = np.random.choice(Q, size =2, replace = False if not selfloop else True)
83 | e = (i,j) if imax_itr:
86 | edge_list.append(e)
87 | S[i]-=1
88 | S[j]-=1
89 | itr = 0
90 | if S[i]==0: Q.remove(i)
91 | if S[j]==0: Q.remove(j)
92 |
93 | return edge_list if multilinks else list(set(edge_list))
94 |
95 |
--------------------------------------------------------------------------------
/src/draw_plots.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib
3 | import matplotlib.pyplot as plt
4 | from scipy.stats import gaussian_kde
5 | from scipy.interpolate import UnivariateSpline
6 | from scipy import interpolate
7 | from sklearn.neighbors.kde import KernelDensity
8 | from scipy.stats import norm
9 | from sklearn.neighbors import KernelDensity
10 | from matplotlib.patches import Polygon
11 |
12 | matplotlib.rcParams.update({'font.size': 14,
13 | 'font.family': 'Times New Roman',
14 | # 'legend.frameon': False
15 | })
16 | colrs= ["#429617",
17 | "#4A76BD",
18 | "#E04927",
19 | "#EEC958",
20 | "#B962CE",
21 | "#57445A",
22 | "#716035",
23 | "#C6617E",
24 | "#92C4AE"]
25 | markers=['o','>','s','d','*','v','<','o','>','s','d','*','v','<']
26 |
27 |
28 |
29 | def rewirings(data):
30 | fig,axess = plt.subplots(2, 2, True,True, squeeze=False, figsize=(8,8))
31 | plt.subplots_adjust(left=0.1, bottom=0.1, right=0.98, top=0.95,
32 | wspace=0.02, hspace=0.14)
33 |
34 | network_models = ['CF', 'BA', 'FF', 'FF+BA']
35 | assign_methods = ["LFR","LFR-CN","LFR-NE"]
36 |
37 | for i in {0,1}:
38 | for j in {0,1}:
39 | mean, var = data[i][j]
40 | k = i+j
41 | ax = axess[i,j]
42 | for m in range(0, len(mean)):
43 | ax.errorbar(np.arange(1,len(mean[m])+1)*0.1, mean[m], yerr=var[m],
44 | label =assign_methods[m], c=colrs[m], marker=markers[m],linewidth=1.2)#, xerr=0.4)
45 | ax.legend()
46 | ax.set_title(network_models[k])
47 |
48 | # axess[1,0].set_xlabel("mixing parameter $\mu$")
49 | # axess[0,0].set_ylabel("percentage of rewirings")
50 | axess[0,0].set_xlim(0, (len(mean[i])+1)*0.1)
51 |
52 | fig.text(0.5, 0.05, 'mixing parameter $\mu$', ha='center', va='center', size=22)
53 | fig.text(0.025, 0.5, 'percentage of rewirings', ha='center', va='center', rotation='vertical', size=22)
54 |
55 | # plt.tight_layout()
56 | plt.show()
57 |
58 |
59 | # fakeMean = [[.7,.6,.3,.2,.3,.2],[.4,.3,.2,.4,.35,.25],[.2,.3,.4,.5,.55,.44]]
60 | # fakevar = [[.05,.05,.05,.05,.05,.05],[.05,.05,.05,.05,.05,.05],[.05,.05,.05,.05,.05,.05]]
61 | # fakeOnePlotData = (fakeMean,fakevar)
62 | # rewirings( [[fakeOnePlotData,fakeOnePlotData],[fakeOnePlotData,fakeOnePlotData]] )
63 | #
64 | #
65 |
66 |
67 | # sequences for degrees, shortest paths, clustering coefficients, degree correlations
68 | def basic_properties_freq( sequences , axess=None, labl = None, logscale=False, markr=None, clr=None,offset=0):
69 | if axess is None:
70 | fig,axess = plt.subplots( 3,len(sequences),'col',False, squeeze=False, figsize=(10,8))
71 | plt.subplots_adjust(left=0.04, bottom=0.05, right=0.98, top=0.94, wspace=0.28, hspace=0.1)
72 |
73 | for i in range(0,len(sequences)):
74 | ax = axess[offset,i]
75 | seq = sequences[i]
76 | seq = [f for f in seq if f>0]
77 | smax =max(seq)
78 | smin =min(seq)
79 |
80 | #print seq
81 | freqs , bin_edges = np.histogram(seq, smax+1 if smax>1 else 100, range = (0,smax+1) if smax>1 else (0,smax))#, normed = True, density=True)
82 | bin_centers = (bin_edges[:-1] + bin_edges[1:])/2.
83 | vals = range(0,smax+1) if smax>1 else bin_centers
84 | freqs=freqs*1.0/sum(freqs)
85 |
86 | # fplot = ax.loglog if lplot else ax.plot
87 |
88 | # his, = ax.plot(vals, freqs,lw=0, label=labl, alpha =0.8, color = clr , marker =markr)
89 | # if lplot:
90 | # his, = ax.loglog(vals, freqs,'.', marker ='.', label=labl, alpha =0.5)
91 | # his, = ax.loglog(vals, freqs,'.', marker ='.', label=labl, alpha =0.5)
92 | # else :
93 | # his, = ax.plot(vals, freqs,'.', marker ='.', label=labl, alpha =0.5)
94 | # his, = ax.loglog(vals, freqs,'.', marker ='.', label=labl, alpha =0.5)
95 |
96 |
97 | # x = bin_centers
98 | # f = UnivariateSpline(x, freqs)#, s=0.1*len(freq))
99 | # ax.plot(x, f(x),c= his.get_color(), alpha=0.5)
100 | # print len(freqs) #, freqs
101 | # print bin_edges
102 | # print bin_centers
103 |
104 | #remove zeros
105 | y = np.array(freqs)
106 | nz_indexes = np.nonzero(y)
107 | y = y[nz_indexes]
108 | x = np.array(vals)[nz_indexes]
109 |
110 | # ax.plot(x, y,':',c= his.get_color(),alpha=0.8)
111 | ax.plot(x, y,':', label=labl, alpha =0.8, color = clr , marker =markr)
112 | # f = interpolate.interp1d(x, y, kind='linear')
113 | # f = interpolate.UnivariateSpline(x, y, k=2)
114 | #
115 | # xs = np.linspace(min(x),max(x),200)
116 | # ys = f(xs) # use interpolation function returned by `interp1d`
117 | # ax.plot(xs, ys,'-',c= his.get_color(),)
118 | #
119 | # density = gaussian_kde(y)
120 | # xs = np.linspace(min(x),max(x),200)
121 | # # density.covariance_factor = lambda : .5
122 | # # density._compute_covariance()
123 | # ax.plot(xs,density(xs),c= his.get_color(), alpha=0.5)
124 | if logscale and len(logscale)==len(sequences): # guard: the default logscale=False has no len()
125 | if 'x' in logscale[i] : ax.set_xscale('log')
126 | if 'y' in logscale[i] : ax.set_yscale('log')
127 | # ax.legend()
128 | # plt.show()
129 | return axess
130 |
131 |
132 |
133 | # sequences for degrees, shortest paths, clustering coefficients, degree correlations
134 | def basic_properties( sequences , axess=None, labl = None, logscale=[False], markr='.', clr='k',offset=0, alfa = 0.8,
135 | distir = [False,False,False, False], bandwidths = [3, 0.1,0.01,1], limits = [(1,50),(0,1),(0,1),(1,25)] ):
136 | if axess is None:
137 | fig,axess = plt.subplots( 3, len(sequences),False,False, squeeze=False,figsize=(len(sequences)*3,8))#'col'
138 | plt.subplots_adjust(left=0.12, bottom=0.05, right=0.95, top=0.94, wspace=0.28, hspace=0.1)
139 | plt.subplots_adjust(left=0.45, bottom=0.05, right=0.95, top=0.94, wspace=0.28, hspace=1.2)
140 |
141 | for i in range(0,len(sequences)):
142 | ax = axess[offset][i]
143 | seq = sequences[i]
144 | smax =max(seq)
145 | smin =min(seq)
146 |
147 | if distir[i]==0:
148 | #print seq
149 | freqs , bin_edges = np.histogram(seq, smax+1 if smax>1 else 100, range = (0,smax+1) if smax>1 else (0,smax))#, normed = True, density=True)
150 | bin_centers = (bin_edges[:-1] + bin_edges[1:])/2.
151 | vals = range(0,smax+1) if smax>1 else bin_centers
152 | freqs=freqs*1.0/sum(freqs)
153 | #remove zeros
154 | y = np.array(freqs)
155 | nz_indexes = np.nonzero(y)
156 | y = y[nz_indexes]
157 | x = np.array(vals)[nz_indexes]
158 | ax.plot(x, y,':', label=labl, alpha =alfa, color = clr , marker ='.')
159 | else :
160 | X = np.array(seq)
161 | X = [ x for x in X if x>=limits[i][0] and x<=limits[i][1]]
162 | # X= (np.abs(X))
163 | # print len(X)
164 | X = np.random.choice(X, size=min(10000, len(X)))
165 | X = X[:, np.newaxis]
166 | kde = KernelDensity(kernel = 'gaussian', bandwidth=bandwidths[i]).fit(X)#,atol=atols[i],kernel = 'tophat'kernel='gaussian'
167 | # if 'x' in logscale[i] :
168 | # X_plot = np.logspace( limits[i][0], limits[i][1], 1000)[:, np.newaxis]
169 | # else :
170 | X_plot = np.linspace(limits[i][0], limits[i][1], 1000)[:, np.newaxis]
171 |
172 | log_dens = kde.score_samples(X_plot) #
173 | # ax.fill(X_plot[:, 0], np.exp(log_dens), alpha =0.5, label=labl)
174 | Y = np.exp(log_dens)
175 | if distir[i]==2: Y = np.cumsum(Y)
176 | ax.plot(X_plot[:, 0],Y, '-',label=labl, alpha =alfa, color = clr ,markersize=2, marker ='')
177 |
178 | verts = [(limits[i][0]-1e-6, 0)] + list(zip(X_plot[:, 0],Y)) + [(limits[i][1]+1e-6, 0)]
179 | poly = Polygon(verts, facecolor=clr, alpha =alfa ) #, edgecolor='0.5')
180 | ax.add_patch(poly)
181 | # ax.set_yticks([])
182 | # ax.set_ylim(bottom=-0.02)
183 | ax.set_xlim(limits[i][0],limits[i][1])
184 |
185 | if len(logscale)==len(sequences):
186 | if 'x' in logscale[i] :
187 | ax.set_xscale('log')
188 | if 'y' in logscale[i] :
189 | ax.set_yscale('log')
190 | if i<3: ax.set_ylim(bottom=0.001)
191 | # ax.legend()
192 | # plt.show(block=False)
193 | return axess
194 |
195 | def test_density_plot():
196 | fig, ax = plt.subplots(2, 2, sharex=True, sharey=True)
197 |
198 | N=20
199 | X = np.concatenate((np.random.normal(0, 1, int(round(0.3 * N))), # sample sizes must be integers
200 | np.random.normal(5, 1, int(round(0.7 * N)))))[:, np.newaxis]
201 |
202 | print np.shape(X)
203 | X_plot = np.linspace(-5, 10, 1000)[:, np.newaxis]
204 | print np.shape(X_plot)
205 | kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(X)
206 | log_dens = kde.score_samples(X_plot)
207 | ax[0,0].fill(X_plot[:, 0], np.exp(log_dens), fc='#AAAAFF')
208 | ax[0,0].text(-3.5, 0.31, "Gaussian Kernel Density")
209 | ax[0,0].plot(X[:, 0], np.zeros(X.shape[0]) - 0.01, '+k')
210 |
211 | plt.show()
212 |
213 |
--------------------------------------------------------------------------------
/src/three_pass_benchmarks.py:
--------------------------------------------------------------------------------
1 | import random
2 | import bisect
3 | import numpy as np
4 | from network_models import *
5 |
6 | def generalize_three_pass(network_model, assign_nodes, overlay_communities, g_params, c_params):
7 | G = network_model(g_params)
8 | # print_seq_stats( '\t\t network_generated', G.deg)
9 | return generalize_three_pass_network(G, assign_nodes, overlay_communities, c_params )
10 |
11 | def generalize_three_pass_network(G, assign_nodes, overlay_communities, c_params):
12 | C = assign_nodes(G, c_params)
13 | print_seq_stats('\t\t node_assigned', [len(c) for c in C[0]])
14 | return overlay_communities(G, C, c_params)
15 |
16 | def print_seq_stats(msg, S):
17 | print msg, ':::\t len: ',len(S),' min: ', np.min(S) ,' avg: ',np.mean(S),' max: ', np.max(S),' sum: ', np.sum(S)
18 |
19 |
20 |
21 | def assign_CN(G, c_params):
22 | def cn_prob(G, v, C, ec, Cid):
23 | p = [0.1]*(len(C))
24 | for u in G.neigh[v]:
25 | if Cid[u]>=0: p[Cid[u]]+=1
26 | p= [p[i] for i in ec] #remove communities that are full
27 | p= [i/sum(p) for i in p]#np.divide(p, np.sum(p))
28 | # print p
29 | return p
30 | return assign_LFR(G, c_params, cn_prob)
31 |
32 | def random_choice(values, weights=None, size = 1, replace = True):
33 | if weights is None:
34 | i = int(random.random() * len(values))
35 | # i = random.randrange(0,len(values))
36 | # res = random.choice(values)
37 | else :
38 | total = 0
39 | cum_weights = []
40 | for w in weights:
41 | total += w
42 | cum_weights.append(total)
43 | x = random.random() * total
44 | i = bisect.bisect(cum_weights, x)
45 | # print weights
46 | #res = values[i]
47 |
48 | if size <=1: return values[i]
49 | else:
50 | # print i, values
51 | cval = [values[j] for j in range(len(values)) if replace or i<>j]
52 | if weights is None: cwei=None
53 | else: cwei = [weights[j] for j in range(len(weights)) if replace or i<>j]
54 |
55 | # if not replace : del values[i]
56 | return values[i], random_choice(cval, cwei, size-1, replace)
57 |
58 | def assign_first_pass_original(G,mu, s, c, cid, prob):
59 | # assign nodes to communities
60 | for v in range(G.n):
61 | # pick a community at random
62 | ec = [i for i in range( len(c)) if s[i]>len(c[i])]
63 |
64 | # if prob is None:
65 | # p = None
66 | # else:
67 | # print prob
68 | # print ec
69 | # p = prob(G,v,c, cid)
70 | # p = [p[i] for i in ec]
71 | # p = np.divide(p, np.sum(p))
72 | i = random_choice(ec, None if prob is None else prob(G,v,c,ec, cid))
73 | # assign to community if fits
74 | if s[i] >= (1- mu)*G.deg[v] :
75 | c[i].append(v)
76 | cid[v] = i
77 |
78 | def assign_first_pass_NE(G,mu, s, c, cid, prob):
79 | for v in range(G.n):
80 | if cid[v]==-1:
81 | # pick a community at random
82 | ec = [i for i in range( len(c)) if s[i]>len(c[i])]
83 | i = random_choice(ec, None if prob is None else prob(G,v,c,ec, cid))
84 |
85 | # i = np.random_choice([i for i in range( len(c)) if s[i]>len(c[i])],
86 | # p = None if prob is None else prob(G,v,c, cid) )
87 | to_add = [v]
88 | marked = [0]*G.n
89 | marked[v] =1
90 | while len(to_add)>0 and len(c[i])< s[i]:
91 | v = to_add.pop(0)
92 | # assign to community if fits
93 | if s[i] >= (1- mu)*G.deg[v] :
94 | c[i].append(v)
95 | cid[v] = i
96 | for u in G.neigh[v]:
97 | if marked[u]==0 and cid[u]==-1:
98 | to_add.append(u)
99 | marked[u]=1
100 |
101 |
102 | def assign_LFR(G, c_params, prob=None, first_pass= assign_first_pass_original, max_itr = 1000):
103 | c_params['s_sum'] = G.n
104 | mu = c_params['mu']
105 | # determine capacity of communities
106 | d_max = np.max(G.deg)
107 | # print G.n, c_params['s_max'], d_max, (1-mu) *d_max
108 |
109 | if c_params['s_max']< (1-mu) *d_max : c_params['s_max'] =(1-mu) * d_max
110 | # print c_params['s_max'], d_max
111 |
112 | s = sample_power_law(**c_params)
113 | c_max = max(s)
114 | # print_seq_stats('community_sizes_sampeled',s)
115 |
116 | # initialize the communities and community ids
117 | c = [[] for i in range(len(s))]
118 | cid = [-1] * G.n
119 | first_pass(G,mu, s, c, cid, prob)
120 |
121 | # print_seq_stats('1... ',[len(l) for l in c])
122 | # initialize the homeless queue
123 | H = [v for v in range(G.n) if cid[v]==-1]
124 |
125 | itr = 0
126 | # assign homeless nodes to communities
127 | while len(H)>0 and max_itr>itr:
128 | itr+=1
129 | # print itr
130 | # pick a community at random
131 | v = random_choice(H)
132 | ec = [i for i in range( len(c))]
133 | i = random_choice(ec, None if prob is None else prob(G, v,c,ec, cid) )
134 | if s[i] >= min((1- mu)*G.deg[v], c_max) :
135 | c[i].append(v)
136 | cid[v]=i
137 | H.remove(v)
138 | # itr=0
139 | # kick out a random node
140 | if len(c[i])> s[i]:
141 | u = random_choice(c[i])
142 | c[i].remove(u)
143 | cid[u] = -1
144 | H.append(u)
145 |
146 | if len(H)>0: print "Failed in 2nd run"
147 | for v in H:
148 | # pick a community at random
149 | ec =[i for i in range( len(c))]
150 | i = random_choice(ec, None if prob is None else prob(G, v,c,ec, cid) )
151 | c[i].append(v)
152 | cid[v]=i
153 |
154 | return c, cid
155 |
156 |
157 |
158 | def assign_NE(G, c_params, prob=None):
159 | return assign_LFR(G, c_params, prob=prob, first_pass= assign_first_pass_NE) # forward the caller's prob instead of discarding it
160 |
161 | def overlay_LFR(G, C, c_params):
162 | mu = c_params['mu']
163 | n= G.n
164 | deg = G.deg #degree_seq(n, edge_list)
165 |
166 | C, Cid = C
167 | # determine degree of each node and its between/outlink degree,
168 | # i.e. number of edges that go outside its community
169 | db = [0]* n
170 | # d = [0] * n
171 | # neigh = [[] for i in range(0, n)]
172 | for e in G.edge_list:
173 | u , v = e
174 | if Cid[u] != Cid[v]:
175 | db[u]+=1
176 | db[v]+=1
177 | # determine desired between changes
178 | for v in range(n):
179 | db[v] = np.floor(mu*G.deg[v] - db[v])
180 | dw = np.multiply(db, -1)
181 | # rewire edges within communities
182 | for c in C:
183 | I = [v for v in c if dw[v]>0]
184 | # add internal edges
185 | while len(I)>=2:
186 | u, v = random_choice(I, size =2, replace = False)
187 | G.add_edge(u,v)
188 | dw[u]-=1
189 | dw[v]-=1
190 | if dw[u] ==0: I.remove(u)
191 | if dw[v] ==0: I.remove(v)
192 | # remove excess edges
193 | for v in c:
194 | if dw[v]<0:
195 | I = [u for u in G.neigh[v] if u in c and dw[u]<0]
196 | while len(I)>=1 and dw[v]<0:
197 | u = random_choice(I)
198 | G.remove_edge(u,v)
199 | dw[u]+=1
200 | dw[v]+=1
201 | I.remove(u)
202 |
203 | # rewire edges between communities
204 | for c in C:
205 | I = [v for v in c if db[v]>0]
206 | O = [v for v in range(n) if v not in c and db[v]>0]
207 | # add between-community edges
208 | while len(I)>=1 and len(O)>=1:
209 | v = random_choice(I)
210 | u = random_choice(O)
211 | G.add_edge(u,v)
212 | db[u]-=1
213 | db[v]-=1
214 | if db[v] ==0: I.remove(v)
215 | if db[u] ==0: O.remove(u)
216 | # remove excess edges
217 | for v in c:
218 | if db[v]<0:
219 | O = [u for u in G.neigh[v] if u not in c and db[u]<0]
220 | while len(O)>=1 and db[v]<0:
221 | u = random_choice(O)
222 | G.remove_edge(u,v)
223 | db[u]+=1
224 | db[v]+=1
225 | O.remove(u)
226 |
227 | return G, C
228 |
229 | def configuration_model(params):
230 | S = sample_power_law(**params)
231 | # print_seq_stats('degree_sampled',S)
232 | return Graph(len(S),configuration_model_from_sequence(S))
233 |
234 |
235 | def sample_power_law( s_exp=2, n=None, s_avg=None, s_max=None, s_min=1, s_sum=None, discrete = True, **kwargs):
236 | S = None
237 | if n is not None: # number of samples is fixed
238 | if s_avg is None and s_sum is not None: s_avg = s_sum*1.0/n
239 | # 1.0/np.random.power(exp+1, size=n)
240 | S = np.array([])
241 | c = None
242 | while (len(S)=s_min:
259 | S.append(tmp)
260 | else:
261 | shift = np.ceil(tmp*1.0/len(S))
262 | for i in range(0,len(S)):
263 | if tmp>0:
264 | S[i]+=shift
265 | tmp-=shift
266 | else: break
267 |
268 | # print S, np.sum(S), s_sum
269 | return S
270 |
271 |
272 |
273 | def scale_truncate(S, max=None, min=None, avg=None, c= None):
274 | # print c
275 | if c==None:
276 | c = 1
277 | if avg is not None:
278 | itr =0
279 | max_itr = 100
280 | while (itrmin]
286 | if max is not None: S = S[Smin]
290 | if max is not None: S = S[Si: return values[i]
19 | else: return None
20 | else:
21 | cval = [values[j] for j in range(len(values)) if replace or i<>j]
22 | if weights is None: cwei=None
23 | else: cwei = [weights[j] for j in range(len(weights)) if replace or i<>j]
24 | tmp= random_choice(cval, cwei, size-1, replace)
25 | if not isinstance(tmp,list): tmp = [tmp]
26 | tmp.append(values[i])
27 | return tmp
28 |
29 | class Comms:
30 | def __init__(self, k):
31 | self.k = k
32 | self.groups = [[] for i in range(k)]
33 | self.memberships = {}
34 |
35 | def add(self, cluster_id, i, s = 1):
36 | if i not in [m[0] for m in self.groups[cluster_id]]:
37 | self.groups[cluster_id].append((i,s))
38 | if i in self.memberships:
39 | self.memberships[i].append((cluster_id,s))
40 | else:
41 | self.memberships[i] =[(cluster_id,s)]
42 | def write_groups(self, path):
43 | with open(path, 'w') as f:
44 | for g in self.groups:
45 | for i,s in g:
46 | f.write(str(i) + ' ')
47 | f.write('\n')
48 |
49 |
50 |
51 | class Graph:
52 | def __init__(self,directed=False, weighted=False):
53 | self.n = 0
54 | self.counter = 0
55 | self.max_degree = 0
56 | self.directed = directed
57 | self.weighted = weighted
58 | self.edge_list = []
59 | self.edge_time = []
60 | self.deg = []
61 | self.neigh = [[]]
62 | return
63 |
64 | def add_node(self):
65 | self.deg.append(0)
66 | self.neigh.append([])
67 | self.n+=1
68 |
69 | def weight(self, u, v):
70 | for i,w in self.neigh[u]:
71 | if i == v: return w
72 | return 0
73 |
74 | def is_neigh(self, u, v):
75 | for i,_ in self.neigh[u]:
76 | if i == v: return True
77 | return False
78 |
79 | def add_edge(self, u, v, w=1):
80 | if u==v: return
81 | if not self.weighted : w =1
82 | self.edge_list.append((u,v,w) if uself.max_degree: self.max_degree = self.deg[v]
88 |
89 | if not self.directed: # if directed, deg is the in-degree and the out-degree is len(neigh)
90 | self.neigh[v].append((u,w))
91 | self.deg[u]+=w
92 | if self.deg[u]>self.max_degree: self.max_degree = self.deg[u]
93 |
94 | return
95 |
96 |
97 | def to_nx(self):
98 | import networkx as nx
99 | G=nx.Graph()
100 | for u,v, w in self.edge_list:
101 | G.add_edge(u, v)
102 | # G.add_edges_from(self.edge_list)
103 | return G
104 |
105 | def to_nx(self, C):
106 | import networkx as nx
107 | G=nx.Graph()
108 | for i in range(self.n):
109 | G.add_node(i, {'c':str(sorted(C.memberships[i]))})
110 | # G.add_node(i, {'c':int(C.memberships[i][0][0])})
111 | for i in range(len(self.edge_list)):
112 | # for u,v, w in self.edge_list:
113 | u,v, w = self.edge_list[i]
114 | G.add_edge(u, v, weight=w, capacity=self.edge_time[i])
115 | # G.add_edges_from(self.edge_list)
116 | return G
117 |
118 | def to_ig(self):
119 | G=ig.Graph()
120 | G.add_edges(self.edge_list)
121 | return G
122 |
123 |
124 | def write_edgelist(self, path):
125 | with open(path, 'w') as f:
126 | for i,j,w in self.edge_list:
127 | f.write(str(i) + '\t'+str(j) + '\n')
128 |
129 |
130 | def Q(G, C):
131 | q = 0.0
132 | m = 2 * len(G.edge_list)
133 | for c in C.groups:
134 | for i,_ in c:
135 | for j,_ in c:
136 | q+= G.weight(i,j) - (G.deg[i]*G.deg[j]/(2*m))
137 | q /= 2*m
138 | return q
139 |
140 | def common_neighbour(i, G, normalize=True):
141 | p = {}
142 | for k,wik in G.neigh[i]:
143 | for j,wjk in G.neigh[k]:
144 | if j in p: p[j]+=(wik * wjk)
145 | else: p[j]= (wik * wjk)
146 | if len(p)<=0 or not normalize: return p
147 | maxp = p[max(p, key = lambda i: p[i])]
148 | for j in p: p[j] = p[j]*1.0 / maxp
149 | return p
150 |
151 | def choose_community(i, G, C, alpha, beta, gamma, epsilon):
152 | mids =[k for k,uik in C.memberships[i]]
153 | if random.random()< beta: #inside
154 | cids = mids
155 | else:
156 | cids = [j for j in range(len(C.groups)) if j not in mids] #: cids.append(j)
157 |
158 | return cids[ int(random.random()*len(cids))] if len(cids)>0 else None
159 |
160 | def degree_similarity(i, ids, G, gamma, normalize = True):
161 | p = [0]*len(ids)
162 | for ij,j in enumerate(ids):
163 | p[ij] = (G.deg[j] -G.deg[i])**2
164 | if len(p)<=0 or not normalize: return p
165 | maxp = max(p)
166 | if maxp==0: return p
167 | p = [pi*1.0/maxp if gamma<0 else 1-pi*1.0/maxp for pi in p]
168 | return p
169 |
170 | def combine (a,b,alpha,gamma):
171 | return (a**alpha) / ((b+1)**gamma)
172 |
173 | def choose_node(i,c, G, C, alpha, beta, gamma, epsilon):
174 | ids = [j for j,_ in C.groups[c] if j !=i ]
175 | # also remove nodes that are already connected from the candidate list
176 | for k,_ in G.neigh[i]:
177 | if k in ids: ids.remove(k)
178 |
179 | norma = False
180 | cn = common_neighbour(i, G, normalize=norma)
181 | trim_ids = [id for id in ids if id in cn]
182 | dd = degree_similarity(i, trim_ids, G, gamma, normalize=norma)
183 |
184 | if random.random()beta)):
206 | G.add_edge(i,k,wjk*pj)
207 |
208 | def connect(i, b, G, C, alpha, beta, gamma, epsilon):
209 | #Choose community
210 | c = choose_community(i, G, C, alpha, beta, gamma, epsilon)
211 | if c is None: return
212 | #Choose node within community
213 | tmp = choose_node(i, c, G, C, alpha, beta, gamma, epsilon)
214 | if tmp is None: return
215 | j, pj = tmp
216 | G.add_edge(i,j,pj)
217 | connect_neighbor(i, j, pj , c, b, G, C, beta)
218 |
219 | def select_node(G, method = 'uniform'):
220 | if method=='uniform':
221 | return int(random.random() * G.n) # uniform
222 | else:
223 | if method == 'older_less_active': p = [(i+1) for i in range(G.n)] # older less active
224 | elif method == 'younger_less_active' : p = [G.n-i for i in range(G.n)] # younger less active
225 | else: p = [1 for i in range(G.n)] # uniform
226 | return random_choice(range(len(p)), p ) #, size=1, replace = False)[0]
227 |
228 | def assign(i, C, e=1, r=1, q = 0.5):
229 | p = [e +len(c) for c in C.groups]
230 | id = random_choice(range(C.k),p )
231 | C.add(id, i)
232 | for j in range(1,r): #todo add strength for fuzzy
233 | if (random.random()1 else '')
314 | write_to_file(G,C,path,name,format,farz_params)
315 | return
316 | if arange ==None:
317 | arange = default_ranges[vari]
318 | for i,var in enumerate(get_range(arange[0],arange[1],arange[2])):
319 | for r in range(repeat):
320 | farz_params[vari] = var
321 | print 's',i+1, r+1, str(farz_params)
322 | G, C =realize(**farz_params)
323 | name = 'S'+str(i+1)+'-'+net_name+ (str(r+1) if repeat>1 else '')
324 | write_to_file(G,C,path,name,format,farz_params)
325 |
326 |
327 | import sys
328 | def main(argv):
329 | import getopt
330 | FARZsetting = default_FARZ_setting.copy()
331 | batch_setting= default_batch_setting.copy()
332 | try:
333 | opts, args = getopt.getopt(argv,"ho:s:v:c:f:n:k:m:a:b:g:p:r:q:t:e:dw",\
334 | ["output=","path=","repeat=","vary=",'range=','format=',"alpha=","beta=","gamma=",'phi=','overlap=','oProb=','epsilon=','cneigh=','directed','weighted'])
335 | except getopt.GetoptError:
336 | print 'invalid command, try -h to see usage and options'
337 | sys.exit(2)
338 | for opt, arg in opts:
339 | if opt == '-h':
340 | print '*** examples:'
341 | print '+ example 1: generate a network with 1000 nodes and about 5x1000 edges (m=5), with 4 communities, where 90% of edges fall within communities (beta=0.9)'
342 | print '> python FARZ.py -n 1000 -m 5 -k 4 --beta 0.9\n'
343 | print '+ example 2: generate a network with properties of example 1, where alpha = 0.2 and gamma = -0.8'
344 | print '> python FARZ.py -n 1000 -m 5 -k 4 --beta 0.9 --alpha 0.2 --gamma -0.8 \n'
345 | print '+ example 3: generate 10 sample networks with properties of example 1 and save them into ./data'
346 | print '> python FARZ.py --path ./data -s 10 -n 1000 -m 5 -k 4 --beta 0.9\n'
347 | print '+ example 4: repeat example 3, for beta that varies from 0.5 to 1 with 0.05 increments'
348 | print '> python FARZ.py --path ./data -s 10 -v beta -c [0.5,1,0.05] -n 1000 -m 5 -k 4 \n'
349 | print '+ example 5: generate overlapping communities, where each node belongs to at most 3 communities and the portion of overlapping nodes varies'
350 | print '> python FARZ.py -r 3 -v q --path ./datavrq -s 5 --format list\n'
351 |
352 | print '*** parameters:'
353 | print '-n: number of nodes, default (1000)'
354 | print '-m: half the average degree of nodes, default (5)'
355 | print '-k: number of communities, default (4)'
356 | print '-b [or --beta]: the strength of community structure, i.e. the probability of edges to be formed within communities, default (0.8)'
357 | print '-a [or --alpha]: the strength of the common-neighbor effect on edge formation, default (0.5)'
358 | print '-g [or --gamma]: the strength of degree similarity effect on edge formation, default (0.5), can be negative for networks with negative degree correlation'
359 | print '-p [or --phi]: the constant added to all community sizes, higher number makes the communities more balanced in size, default (1), which results in power law distribution for community sizes'
360 | print '-r: the number of communities each node can belong to, default (1)'
361 | print '-q: the probability of a node belonging to multiple communities, default (0.5)'
362 | print '-e [or --epsilon]: the probability of noisy/random edges, default (0.0000001)'
363 | print '-t: the probability of also connecting to the neighbors of each node that a node connects to. The default value is (0), but it can be increased to a small number to achieve a higher clustering coefficient. \n'
364 |
365 | print '*** batch parameters:'
366 | print '-s: the number of networks to be sampled with the given properties, default (1)'
367 | print '-o: the name of the output network, default (network)'
368 | print '--path : the path to write the network(s) to, default (.)'
369 | print '-f [or --format]: the format of output, list or gml, default (gml)'
370 | print '-v: the parameter to vary and sample networks for, default (None)'
371 | print '-c: the range over which to vary the given parameter, in the format [start,end,increment]'
372 | #print 'default FARZ parameters are :\n', default_FARZ_setting
373 | #print 'default batch generator parameters are :\n', default_batch_setting
374 |
375 | sys.exit()
376 |
377 | elif opt in ("-o", "--output"):
378 | batch_setting['net_name'] = arg
379 | elif opt == "--path":  # exact match: ("--path") is a plain string, so "in" was a substring test that also caught -p
380 | batch_setting['path'] = arg
381 | elif opt in ("-f", "--format"):
382 | if arg in supported_formats:
383 | batch_setting['format'] = arg
384 | else:
385 | print 'Format not supported, choose from', supported_formats, 'or try -h to see the usage and options'
386 | sys.exit(2)
387 | elif opt in ("-s","--repeat"):
388 | try: batch_setting['repeat'] = int(arg)
389 | except ValueError:
390 | print 'Invalid number, try -h to see the usage and options'
391 | sys.exit(2)
392 | elif opt in ("-v", "--vary"):
393 | if (arg in default_ranges.keys()):
394 | batch_setting['vari'] = arg
395 | else:
396 | print 'Invalid variable, choose from:', default_ranges.keys(), ', try -h to see the usage and options'
397 | sys.exit(2)
398 | elif opt in ("-c", "--range"):
399 | try:
400 | arange = [float(s) for s in arg[1:-1].split(',')]
401 | batch_setting['arange'] = arange
402 | except ValueError:  # float() raises ValueError on a malformed entry; "Error" is not a defined name
403 | print 'Invalid range, should have the following form: [start,end,incrementBy], try -h to see the usage and options'
404 | sys.exit(2)
405 | elif opt == "-n":
406 | try: FARZsetting['n'] = int(arg)
407 | except ValueError:
408 | print 'Invalid number, try -h to see the usage and options'
409 | sys.exit(2)
410 | elif opt == "-k":
411 | try: FARZsetting['k'] = int(arg)
412 | except ValueError:
413 | print 'Invalid number, try -h to see the usage and options'
414 | sys.exit(2)
415 | elif opt == "-m":
416 | try: FARZsetting['m'] = int(arg)
417 | except ValueError:
418 | print 'Invalid number, try -h to see the usage and options'
419 | sys.exit(2)
420 | elif opt in ("-a","--alpha"):
421 | try: FARZsetting['alpha'] = float(arg)
422 | except ValueError:
423 | print 'Invalid number, try -h to see the usage and options'
424 | sys.exit(2)
425 | elif opt in ("-b","--beta"):
426 | try: FARZsetting['beta'] = float(arg)
427 | except ValueError:
428 | print 'Invalid number, try -h to see the usage and options'
429 | sys.exit(2)
430 | elif opt in ("-g","--gamma"):
431 | try: FARZsetting['gamma'] = float(arg)
432 | except ValueError:
433 | print 'Invalid number, try -h to see the usage and options'
434 | sys.exit(2)
435 | elif opt in ("-p","--phi"):
436 | try: FARZsetting['phi'] = int(arg)
437 | except ValueError:
438 | print 'Invalid number, try -h to see the usage and options'
439 | sys.exit(2)
440 | elif opt in ("-r","--overlap"):
441 | try: FARZsetting['o'] = int(arg)
442 | except ValueError:
443 | print 'Invalid number, try -h to see the usage and options'
444 | sys.exit(2)
445 | elif opt in ("-q","--oProb"):
446 | try: FARZsetting['q'] = float(arg)
447 | except ValueError:
448 | print 'Invalid number, try -h to see the usage and options'
449 | sys.exit(2)
450 | elif opt in ("-d","--directed"):
451 | FARZsetting['directed'] = True
452 | elif opt in ("-w","--weighted","--wighted"):  # "--wighted" is kept in case the getopt option list uses that spelling
453 | FARZsetting['weighted'] = True
454 | elif opt in ("-t","--cneigh"):
455 | try: FARZsetting['b'] = float(arg)
456 | except ValueError:
457 | print 'Invalid number, try -h to see the usage and options'
458 | sys.exit(2)
459 | elif opt in ("-e","--epsilon"):
460 | try: FARZsetting['epsilon'] = float(arg)
461 | except ValueError:
462 | print 'Invalid number, try -h to see the usage and options'
463 | sys.exit(2)
464 |
465 | batch_setting['farz_params'] = FARZsetting
466 | print 'generating FARZ benchmark(s) ... '
467 | generate( **batch_setting)
468 |
469 |
470 | if __name__ == "__main__":
471 | main(sys.argv[1:])
472 |
473 |
474 | # python FARZ.py --path ./dataVb55 -s 10 -v beta
475 | # python FARZ.py --path ./dataVb82 -s 10 -v beta --alpha 0.8 --gamma 0.2
476 | # python FARZ.py --path ./dataVb5-5 -s 10 -v beta --alpha 0.5 --gamma -0.5
477 | # python FARZ.py --path ./dataVb2-8 -s 10 -v beta --alpha 0.2 --gamma -0.8
478 |
--------------------------------------------------------------------------------