├── .gitignore
├── .nojekyll
├── README.md
├── images
├── 389px-Network_Community_Structure.svg.png
├── ILLUSTRATION3.PNG.png
├── Konigsberg_bridges.png
├── Multi-pseudograph.png
├── Multi-pseudograph.svg
├── Pseudoforest.png
├── UKfaculty.gexf
├── bowtie-page.png
├── communities1.png
├── communities2.png
├── graph6.png
├── scc.jpg
└── twitter-facebook-branding2.png
├── tweaks.css
├── user-2015.R
├── user-2015.Rmd
├── user-2015.html
├── user-2015.md
├── user-2015.pdf
└── user-2015_files
└── figure-html
├── unnamed-chunk-1-1.png
├── unnamed-chunk-100-1.png
├── unnamed-chunk-101-1.png
├── unnamed-chunk-102-1.png
├── unnamed-chunk-103-1.png
├── unnamed-chunk-104-1.png
├── unnamed-chunk-105-1.png
├── unnamed-chunk-106-1.png
├── unnamed-chunk-107-1.png
├── unnamed-chunk-108-1.png
├── unnamed-chunk-109-1.png
├── unnamed-chunk-110-1.png
├── unnamed-chunk-111-1.png
├── unnamed-chunk-112-1.png
├── unnamed-chunk-113-1.png
├── unnamed-chunk-114-1.png
├── unnamed-chunk-115-1.png
├── unnamed-chunk-116-1.png
├── unnamed-chunk-117-1.png
├── unnamed-chunk-118-1.png
├── unnamed-chunk-119-1.png
├── unnamed-chunk-120-1.png
├── unnamed-chunk-121-1.png
├── unnamed-chunk-122-1.png
├── unnamed-chunk-123-1.png
├── unnamed-chunk-124-1.png
├── unnamed-chunk-125-1.png
├── unnamed-chunk-126-1.png
├── unnamed-chunk-127-1.png
├── unnamed-chunk-22-1.png
├── unnamed-chunk-24-1.png
├── unnamed-chunk-25-1.png
├── unnamed-chunk-26-1.png
├── unnamed-chunk-27-1.png
├── unnamed-chunk-28-1.png
├── unnamed-chunk-29-1.png
├── unnamed-chunk-3-1.png
├── unnamed-chunk-30-1.png
├── unnamed-chunk-31-1.png
├── unnamed-chunk-32-1.png
├── unnamed-chunk-33-1.png
├── unnamed-chunk-34-1.png
├── unnamed-chunk-35-1.png
├── unnamed-chunk-36-1.png
├── unnamed-chunk-37-1.png
├── unnamed-chunk-38-1.png
├── unnamed-chunk-39-1.png
├── unnamed-chunk-4-1.png
├── unnamed-chunk-40-1.png
├── unnamed-chunk-41-1.png
├── unnamed-chunk-42-1.png
├── unnamed-chunk-43-1.png
├── unnamed-chunk-44-1.png
├── unnamed-chunk-45-1.png
├── unnamed-chunk-46-1.png
├── unnamed-chunk-47-1.png
├── unnamed-chunk-48-1.png
├── unnamed-chunk-49-1.png
├── unnamed-chunk-50-1.png
├── unnamed-chunk-51-1.png
├── unnamed-chunk-52-1.png
├── unnamed-chunk-53-1.png
├── unnamed-chunk-54-1.png
├── unnamed-chunk-55-1.png
├── unnamed-chunk-56-1.png
├── unnamed-chunk-57-1.png
├── unnamed-chunk-58-1.png
├── unnamed-chunk-59-1.png
├── unnamed-chunk-6-1.png
├── unnamed-chunk-60-1.png
├── unnamed-chunk-61-1.png
├── unnamed-chunk-62-1.png
├── unnamed-chunk-63-1.png
├── unnamed-chunk-64-1.png
├── unnamed-chunk-65-1.png
├── unnamed-chunk-66-1.png
├── unnamed-chunk-67-1.png
├── unnamed-chunk-68-1.png
├── unnamed-chunk-69-1.png
├── unnamed-chunk-70-1.png
├── unnamed-chunk-71-1.png
├── unnamed-chunk-72-1.png
├── unnamed-chunk-73-1.png
├── unnamed-chunk-74-1.png
├── unnamed-chunk-75-1.png
├── unnamed-chunk-76-1.png
├── unnamed-chunk-77-1.png
├── unnamed-chunk-78-1.png
├── unnamed-chunk-80-1.png
├── unnamed-chunk-81-1.png
├── unnamed-chunk-83-1.png
├── unnamed-chunk-84-1.png
├── unnamed-chunk-85-1.png
├── unnamed-chunk-86-1.png
├── unnamed-chunk-87-1.png
├── unnamed-chunk-88-1.png
├── unnamed-chunk-89-1.png
├── unnamed-chunk-90-1.png
├── unnamed-chunk-91-1.png
├── unnamed-chunk-92-1.png
├── unnamed-chunk-93-1.png
├── unnamed-chunk-94-1.png
├── unnamed-chunk-95-1.png
├── unnamed-chunk-96-1.png
├── unnamed-chunk-97-1.png
├── unnamed-chunk-98-1.png
└── unnamed-chunk-99-1.png
/.gitignore:
--------------------------------------------------------------------------------
1 | /tags
2 |
--------------------------------------------------------------------------------
/.nojekyll:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/.nojekyll
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Statistical analysis of network data
2 |
3 | > Tutorial at the useR! 2015 conference
4 |
5 | ## Following the tutorial
6 |
7 | * The formatted slides are available here:
8 | http://igraph.github.io/netuser15/user-2015.html#1
9 | * For running the R markdown source, download the
10 | [user-2015.Rmd](user-2015.Rmd) file
11 | (or clone this repo), and open it in R Studio or the R IDE of
12 | your choice.
13 | * Alternatively download only the R source from
14 | [user-2015.R](user-2015.R) and copy and paste it
15 | into your R session.
16 |
17 | ## Software needed
18 |
19 | You will need at least `igraph` version 1.0.0 and `igraphdata` version
20 | `1.0.0`. You will also need the `DiagrammeR` package. To install them
21 | from within R, run:
22 |
23 | ```r
24 | install.packages("igraph")
25 | install.packages("igraphdata")
26 | install.packages("DiagrammeR")
27 | ```
28 |
29 |
--------------------------------------------------------------------------------
/images/389px-Network_Community_Structure.svg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/389px-Network_Community_Structure.svg.png
--------------------------------------------------------------------------------
/images/ILLUSTRATION3.PNG.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/ILLUSTRATION3.PNG.png
--------------------------------------------------------------------------------
/images/Konigsberg_bridges.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/Konigsberg_bridges.png
--------------------------------------------------------------------------------
/images/Multi-pseudograph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/Multi-pseudograph.png
--------------------------------------------------------------------------------
/images/Multi-pseudograph.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
150 |
--------------------------------------------------------------------------------
/images/Pseudoforest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/Pseudoforest.png
--------------------------------------------------------------------------------
/images/bowtie-page.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/bowtie-page.png
--------------------------------------------------------------------------------
/images/communities1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/communities1.png
--------------------------------------------------------------------------------
/images/communities2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/communities2.png
--------------------------------------------------------------------------------
/images/graph6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/graph6.png
--------------------------------------------------------------------------------
/images/scc.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/scc.jpg
--------------------------------------------------------------------------------
/images/twitter-facebook-branding2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/igraph/netuser15/3a43bb2d397c0cf1b46d478f507e8ed57b146335/images/twitter-facebook-branding2.png
--------------------------------------------------------------------------------
/tweaks.css:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/user-2015.R:
--------------------------------------------------------------------------------
1 | ## ---- setup, echo = FALSE, message = FALSE-------------------------------
2 | library(knitr)
3 | library(igraph)
4 | opts_chunk$set(
5 | prompt = FALSE,
6 | comment = "#>",
7 | tidy = FALSE)
8 | options(width = 73)
9 | igraph_options(graph.margin = 0, margin = 0)
10 |
11 | ## ----echo = FALSE, results = "hide", message = FALSE---------------------
12 | set.seed(42)
13 | library(igraph)
14 | library(igraphdata)
15 | data(karate)
16 | par(mar=c(0,0,0,0))
17 | plot(karate, margin = 0)
18 |
19 | ## ----message = FALSE-----------------------------------------------------
20 | library(igraph)
21 |
22 | ## ------------------------------------------------------------------------
23 | toy1 <- make_graph(~ A - B, B - C - D, D - E:F:A, A:B - G:H)
24 | toy1
25 |
26 | ## ------------------------------------------------------------------------
27 | par(mar = c(0,0,0,0)); plot(toy1)
28 |
29 | ## ------------------------------------------------------------------------
30 | toy2 <- make_graph(~ A -+ B, B -+ C -+ D +- A:B)
31 | toy2
32 |
33 | ## ------------------------------------------------------------------------
34 | par(mar = c(0,0,0,0)); plot(toy2)
35 |
36 | ## ------------------------------------------------------------------------
37 | toy2
38 |
39 | ## ------------------------------------------------------------------------
40 | make_ring(5)
41 |
42 | ## ------------------------------------------------------------------------
43 | make_ring(5)
44 |
45 | ## ------------------------------------------------------------------------
46 | A <- matrix(sample(0:1, 100, replace = TRUE), nrow = 10)
47 | A
48 |
49 | ## ------------------------------------------------------------------------
50 | graph_from_adjacency_matrix(A)
51 |
52 | ## ------------------------------------------------------------------------
53 | L <- matrix(sample(1:10, 20, replace = TRUE), ncol = 2)
54 | L
55 |
56 | ## ------------------------------------------------------------------------
57 | graph_from_edgelist(L)
58 |
59 | ## ------------------------------------------------------------------------
60 | edges <- data.frame(
61 | stringsAsFactors = FALSE,
62 | from = c("BOS", "JFK", "LAX"),
63 | to = c("JFK", "LAX", "JFK"),
64 | Carrier = c("United", "Jetblue", "Virgin America"),
65 | Departures = c(30, 60, 121)
66 | )
67 | vertices <- data.frame(
68 | stringsAsFactors = FALSE,
69 | name = c("BOS", "JFK", "LAX"),
70 | City = c("Boston, MA", "New York City, NY",
71 | "Los Angeles, CA")
72 | )
73 |
74 | ## ------------------------------------------------------------------------
75 | edges
76 |
77 | ## ------------------------------------------------------------------------
78 | vertices
79 |
80 | ## ------------------------------------------------------------------------
81 | toy_air <- graph_from_data_frame(edges, vertices = vertices)
82 | toy_air
83 |
84 | ## ------------------------------------------------------------------------
85 | library(igraphdata)
86 | data(USairports)
87 | USairports
88 |
89 | ## ------------------------------------------------------------------------
90 | as_data_frame(toy_air, what = "edges")
91 |
92 | ## ------------------------------------------------------------------------
93 | as_data_frame(toy_air, what = "vertices")
94 |
95 | ## ------------------------------------------------------------------------
96 | as_long_data_frame(toy_air)
97 |
98 | ## ------------------------------------------------------------------------
99 | V(USairports)[[1:5]]
100 |
101 | ## ------------------------------------------------------------------------
102 | E(USairports)[[1:5]]
103 |
104 | ## ------------------------------------------------------------------------
105 | is_simple(USairports)
106 | sum(which_multiple(USairports))
107 | sum(which_loop(USairports))
108 |
109 | ## ------------------------------------------------------------------------
110 | air <- simplify(USairports, edge.attr.comb =
111 | list(Departures = "sum", Seats = "sum", Passengers = "sum", "ignore"))
112 | is_simple(air)
113 | summary(air)
114 |
115 | ## ----eval = FALSE--------------------------------------------------------
116 | ## BOS: JFK, LAX, EWR, MKE, PVD
117 | ## JFK: BGR, BOS, SFO, BNA, BUF, SRQ, RIC RDU, MSP
118 | ## LAX: DTW, MSY, LAS, FLL, STL,
119 | ## . . .
120 |
121 | ## ------------------------------------------------------------------------
122 | air["BOS", "JFK"]
123 | air["BOS", "ANC"]
124 |
125 | ## ------------------------------------------------------------------------
126 | air[c("BOS", "JFK", "ANC"), c("BOS", "JFK", "ANC")]
127 |
128 | ## ------------------------------------------------------------------------
129 | E(air)$weight <- E(air)$Passengers
130 | air["BOS", "JFK"]
131 |
132 | ## ------------------------------------------------------------------------
133 | air[["BOS"]]
134 |
135 | ## ------------------------------------------------------------------------
136 | air[[, "BOS"]]
137 |
138 | ## ------------------------------------------------------------------------
139 | air["BOS", "ANC"] <- TRUE
140 | air["BOS", "ANC"]
141 |
142 | ## ------------------------------------------------------------------------
143 | air["BOS", "ANC"] <- FALSE
144 | air["BOS", "ANC"]
145 |
146 | ## ------------------------------------------------------------------------
147 | g <- make_empty_graph(10)
148 | g[-1, 1] <- TRUE
149 | g
150 |
151 | ## ------------------------------------------------------------------------
152 | g <- make_ring(10) + 2
153 | par(mar = c(0,0,0,0)); plot(g)
154 |
155 | ## ------------------------------------------------------------------------
156 | g <- make_(ring(10), with_vertex_(color = "grey")) +
157 | vertices(2, color = "red")
158 | par(mar = c(0,0,0,0)); plot(g)
159 |
160 | ## ------------------------------------------------------------------------
161 | g <- make_(star(10), with_edge_(color = "grey")) +
162 | edge(5, 6, color = "red")
163 | par(mar = c(0,0,0,0)); plot(g)
164 |
165 | ## ------------------------------------------------------------------------
166 | g <- make_(empty_graph(5)) + path(1,2,3,4,5,1)
167 | g2 <- make_(empty_graph(5)) + path(1:5, 1)
168 | g
169 | g2
170 |
171 | ## ----echo = FALSE--------------------------------------------------------
172 | par(mar=c(0,0,0,0))
173 | plot(make_star(11, center = 11, mode = "undirected") + path(1:10, 1))
174 |
175 | ## ------------------------------------------------------------------------
176 | make_star(11, center = 11, mode = "undirected") + path(1:10, 1)
177 |
178 | ## ------------------------------------------------------------------------
179 | FL <- V(air)[grepl("FL$", City)]
180 | CA <- V(air)[grepl("CA$", City)]
181 |
182 | V(air)$color <- "grey"
183 | V(air)[FL]$color <- "blue"
184 | V(air)[CA]$color <- "blue"
185 |
186 | ## ------------------------------------------------------------------------
187 | E(air)[FL %--% CA]
188 | E(air)$color <- "grey"
189 | E(air)[FL %--% CA]$color <- "red"
190 |
191 | ## ------------------------------------------------------------------------
192 | V(air)[[1:5]]
193 |
194 | ## ------------------------------------------------------------------------
195 | E(air)[[1:5]]
196 |
197 | ## ----echo = FALSE--------------------------------------------------------
198 | set.seed(42)
199 | g <- sample_gnp(12, 0.25)
200 | l <- layout_nicely(g)
201 | par(mar=c(0,0,0,0))
202 | plot(g, margin = 0, layout = l)
203 |
204 | ## ----echo = FALSE--------------------------------------------------------
205 | pa <- V(g)[11, 2, 12, 8]
206 | V(g)[pa]$color <- 'green'
207 | E(g)$color <- 'grey'
208 | E(g, path = pa)$color <- 'red'
209 | E(g, path = pa)$width <- 3
210 | par(mar=c(0,0,0,0))
211 | plot(g, margin = 0, layout = l)
212 |
213 | ## ------------------------------------------------------------------------
214 | set.seed(42)
215 | g <- sample_gnp(12, 0.25)
216 |
217 | pa <- V(g)[11, 2, 12, 8]
218 |
219 | V(g)[pa]$color <- 'green'
220 | E(g)$color <- 'grey'
221 | E(g, path = pa)$color <- 'red'
222 | E(g, path = pa)$width <- 3
223 |
224 | ## ------------------------------------------------------------------------
225 | par(mar=c(0,0,0,0))
226 | plot(g, margin = 0, layout = layout_nicely)
227 |
228 | ## ----echo = FALSE--------------------------------------------------------
229 | set.seed(42)
230 | g <- sample_gnp(12, 0.25)
231 | pa <- V(g)[11, 2, 12, 8]
232 | V(g)[pa]$color <- 'green'
233 | E(g)$color <- 'grey'
234 | E(g, path = pa)$color <- 'red'
235 | E(g, path = pa)$width <- 3
236 | par(mar=c(0,0,0,0))
237 | plot(g, margin = 0, layout = layout_nicely)
238 |
239 | ## ------------------------------------------------------------------------
240 | air <- delete_edge_attr(air, "weight")
241 | distances(air, 'PBI', 'ANC')
242 |
243 | ## ------------------------------------------------------------------------
244 | sp <- shortest_paths(air, 'PBI', 'ANC', output = "both")
245 | sp
246 | air[[ sp$epath[[1]] ]]
247 |
248 | ## ------------------------------------------------------------------------
249 | all_shortest_paths(air, 'PBI', 'ANC')$res
250 |
251 | ## ------------------------------------------------------------------------
252 | wair <- simplify(USairports, edge.attr.comb =
253 | list(Departures = "sum", Seats = "sum", Passangers = "sum",
254 | Distance = "first", "ignore"))
255 | E(wair)$weight <- E(wair)$Distance
256 |
257 | ## ------------------------------------------------------------------------
258 | distances(wair, c('BOS', 'JFK', 'PBI', 'AZO'),
259 | c('BOS', 'JFK', 'PBI', 'AZO'))
260 |
261 | ## ------------------------------------------------------------------------
262 | shortest_paths(wair, from = 'BOS', to = 'AZO')$vpath
263 | all_shortest_paths(wair, from = 'BOS', to = 'AZO')$res
264 |
265 | ## ------------------------------------------------------------------------
266 | mean_distance(air)
267 | air_dist_hist <- distance_table(air)
268 | air_dist_hist
269 |
270 | ## ------------------------------------------------------------------------
271 | barplot(air_dist_hist$res, names.arg = seq_along(air_dist_hist$res))
272 |
273 | ## ------------------------------------------------------------------------
274 | co <- components(air, mode = "weak")
275 | co$csize
276 | groups(co)[[2]]
277 |
278 | ## ------------------------------------------------------------------------
279 | co <- components(air, mode = "strong")
280 | co$csize
281 |
282 | ## ------------------------------------------------------------------------
283 | largest_component <- function(graph) {
284 | comps <- components(graph, mode = "strong")
285 | gr <- groups(comps)
286 | sizes <- vapply(gr, length, 1L)
287 | induced_subgraph(graph, gr[[ which.max(sizes) ]])
288 | }
289 | sc_air <- largest_component(air)
290 |
291 | ## ------------------------------------------------------------------------
292 | table(distances(sc_air, "BOS"))
293 | table(distances(sc_air, "LAX"))
294 |
295 | ## ------------------------------------------------------------------------
296 | mean(as.vector(distances(sc_air, "BOS")))
297 | mean(as.vector(distances(sc_air, "LAX")))
298 |
299 | ## ------------------------------------------------------------------------
300 | D <- distances(sc_air)
301 | sort(rowMeans(D))[1:10]
302 |
303 | ## ------------------------------------------------------------------------
304 | sort(rowMeans(D), decreasing = TRUE)[1:10]
305 |
306 | ## ------------------------------------------------------------------------
307 | V(sc_air)[[names(sort(rowMeans(D), decreasing = TRUE)[1:10])]]
308 |
309 | ## ----echo = FALSE--------------------------------------------------------
310 | par(mar=c(0,0,0,0))
311 | plot(make_star(11))
312 |
313 | ## ----echo = FALSE--------------------------------------------------------
314 | data(kite)
315 | par(mar=c(0,0,0,0))
316 | plot(kite)
317 |
318 | ## ------------------------------------------------------------------------
319 | V(kite)$label.cex <- 2
320 | V(kite)$color <- V(kite)$frame.color <- "grey"
321 | V(kite)$size <- 30
322 | par(mar=c(0,0,0,0)) ; plot(kite)
323 |
324 | ## ------------------------------------------------------------------------
325 | d <- degree(kite)
326 | par(mar = c(0,0,0,0))
327 | plot(kite, vertex.size = 10 * d, vertex.label =
328 | paste0(V(kite)$name, ":", d))
329 |
330 | ## ------------------------------------------------------------------------
331 | cl <- closeness(kite)
332 |
333 | ## ------------------------------------------------------------------------
334 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 500 * cl)
335 |
336 | ## ------------------------------------------------------------------------
337 | btw <- betweenness(kite)
338 | btw
339 |
340 | ## ------------------------------------------------------------------------
341 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 3 * btw)
342 |
343 | ## ------------------------------------------------------------------------
344 | ec <- eigen_centrality(kite)$vector
345 | ec
346 | cor(ec, d)
347 |
348 | ## ------------------------------------------------------------------------
349 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 20 * ec)
350 |
351 | ## ------------------------------------------------------------------------
352 | page_rank(kite)$vector
353 |
354 | ## ------------------------------------------------------------------------
355 | graph <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
356 | F-G-H-I-F, J-F:G:H:I,
357 | K-L-M-N-K, O-K:L:M:N,
358 | P-Q-R-S-P, T-P:Q:R:S,
359 | B-F, E-J, C-I, L-T, O-T, M-S,
360 | C-P, C-L, I-L, I-P)
361 |
362 | ## ------------------------------------------------------------------------
363 | par(mar=c(0,0,0,0)); plot(graph)
364 |
365 | ## ------------------------------------------------------------------------
366 | flat_clustering <- make_clusters(
367 | graph,
368 | c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))
369 |
370 | ## ------------------------------------------------------------------------
371 | flat_clustering
372 |
373 | ## ------------------------------------------------------------------------
374 | flat_clustering[[1]]
375 | length(flat_clustering)
376 | sizes(flat_clustering)
377 |
378 | ## ------------------------------------------------------------------------
379 | induced_subgraph(graph, flat_clustering[[1]])
380 |
381 | ## ------------------------------------------------------------------------
382 | data(karate)
383 | karate
384 | karate <- delete_edge_attr(karate, "weight")
385 |
386 | ## ------------------------------------------------------------------------
387 | ground_truth <- make_clusters(karate, V(karate)$Faction)
388 | length(ground_truth)
389 | ground_truth
390 |
391 | ## ------------------------------------------------------------------------
392 | cluster_naive2 <- function(graph, center1, center2) {
393 | # ...
394 | }
395 |
396 | ## ------------------------------------------------------------------------
397 | cluster_naive2 <- function(graph, center1, center2) {
398 | dist <- distances(graph, c(center1, center2))
399 | cl <- apply(dist, 2, which.min)
400 | make_clusters(graph, cl)
401 | }
402 | dist_memb <- cluster_naive2(karate, 'John A', 'Mr Hi')
403 |
404 | ## ------------------------------------------------------------------------
405 | dist_memb
406 |
407 | ## ------------------------------------------------------------------------
408 | rand_index <- compare(ground_truth, dist_memb, method = "rand")
409 | rand_index
410 |
411 | ## ------------------------------------------------------------------------
412 | random_partition <- function(n, k = 2) { sample(k, n, replace = TRUE) }
413 | total <- numeric(100)
414 | for (i in seq_len(100)) {
415 | c1 <- random_partition(100)
416 | c2 <- random_partition(100)
417 | total[i] <- compare(c1, c2, method = "rand")
418 | }
419 | mean(total)
420 |
421 | ## ------------------------------------------------------------------------
422 | total <- numeric(100)
423 | for (i in seq_len(100)) {
424 | c1 <- random_partition(100)
425 | c2 <- random_partition(100)
426 | total[i] <- compare(c1, c2, method = "adjusted.rand")
427 | }
428 | mean(total)
429 |
430 | ## ------------------------------------------------------------------------
431 | compare(ground_truth, dist_memb, method = "adjusted.rand")
432 |
433 | ## ------------------------------------------------------------------------
434 | edge_density(karate)
435 | subgraph_density <- function(graph, vertices) {
436 | sg <- induced_subgraph(graph, vertices)
437 | edge_density(sg)
438 | }
439 |
440 | ## ------------------------------------------------------------------------
441 | subgraph_density(karate, ground_truth[[1]])
442 | subgraph_density(karate, ground_truth[[2]])
443 |
444 | ## ------------------------------------------------------------------------
445 | modularity(ground_truth)
446 | modularity(karate, membership(ground_truth))
447 |
448 | ## ------------------------------------------------------------------------
449 | modularity(karate, rep(1, gorder(karate)))
450 | modularity(karate, seq_len(gorder(karate)))
451 |
452 | ## ------------------------------------------------------------------------
453 | dendrogram <- cluster_edge_betweenness(karate)
454 | dendrogram
455 |
456 | ## ------------------------------------------------------------------------
457 | membership(dendrogram)
458 |
459 | ## ------------------------------------------------------------------------
460 | compare_all <- function(cl1, cl2) {
461 | methods <- eval(as.list(args(compare))$method)
462 | vapply(methods, compare, 1.0, comm1 = cl1, comm2 = cl2)
463 | }
464 | compare_all(dendrogram, ground_truth)
465 |
466 | ## ------------------------------------------------------------------------
467 | cluster_memb <- cut_at(dendrogram, no = 2)
468 | compare_all(cluster_memb, ground_truth)
469 | clustering <- make_clusters(karate, membership = cluster_memb)
470 |
471 | ## ------------------------------------------------------------------------
472 | V(karate)[Faction == 1]$shape <- "circle"
473 | V(karate)[Faction == 2]$shape <- "square"
474 | par(mar=c(0,0,0,0)); plot(clustering, karate)
475 |
476 | ## ------------------------------------------------------------------------
477 | par(mar=c(0,0,0,0)); plot_dendrogram(dendrogram, direction = "downwards")
478 |
479 | ## ------------------------------------------------------------------------
480 | optimal <- cluster_optimal(karate)
481 | modularity(clustering)
482 | modularity(optimal)
483 | modularity(ground_truth)
484 |
485 | ## ------------------------------------------------------------------------
486 | dend_fast <- cluster_fast_greedy(karate)
487 | compare_all(dend_fast, ground_truth)
488 |
489 | ## ------------------------------------------------------------------------
490 | par(mar = c(0,0,0,0)); plot_dendrogram(dend_fast, direction = "downwards")
491 |
492 | ## ------------------------------------------------------------------------
493 | igraph_options(edge.color = "black")
494 | data(karate) ; par(mar=c(0,0,0,0)); plot(karate)
495 |
496 | ## ----fig.width = 6-------------------------------------------------------
497 | V(karate)$color <- "DarkOliveGreen" ; E(karate)$color <- "grey"
498 | par(mar=c(0,0,0,0)) ; plot(karate)
499 |
500 | ## ----fig.width = 6-------------------------------------------------------
501 | par(mar = c(0,0,0,0))
502 | plot(karate, edge.color = "black", vertex.color = "#00B7FF",
503 | vertex.label.color = "black")
504 |
505 | ## ------------------------------------------------------------------------
506 | karate$palette <- categorical_pal(length(clustering))
507 | par(mar = c(0,0,0,0)); plot(karate, vertex.color = membership(clustering))
508 |
509 | ## ------------------------------------------------------------------------
510 | shapes()
511 |
512 | ## ----echo = FALSE--------------------------------------------------------
513 | shapes <- setdiff(shapes(), "")
514 | g <- make_ring(length(shapes))
515 |
516 | ## ----eval = FALSE--------------------------------------------------------
517 | ## plot(g, vertex.shape=shapes, vertex.label=shapes, vertex.label.dist=1,
518 | ## vertex.size=15, vertex.size2=15,
519 | ## vertex.pie=lapply(shapes, function(x) if (x=="pie") 2:6 else 0),
520 | ## vertex.pie.color=list(heat.colors(5)))
521 |
522 | ## ----echo = FALSE--------------------------------------------------------
523 | par(mar = c(0,0,0,0))
524 | plot(g, vertex.shape=shapes, vertex.label=shapes, vertex.label.dist=1,
525 | vertex.size=15, vertex.size2=15,
526 | vertex.pie=lapply(shapes, function(x) if (x=="pie") 2:6 else 0),
527 | vertex.pie.color=list(heat.colors(5)))
528 |
529 | ## ----echo = FALSE--------------------------------------------------------
530 | lat <- make_lattice(c(5,5))
531 | layout(rbind(1:2,3:4))
532 | par(mar=c(0,0,0,0))
533 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 1))
534 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 5))
535 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 10))
536 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 20))
537 |
538 | ## ------------------------------------------------------------------------
539 | tree <- make_tree(20, 3)
540 | par(mar = c(0,0,0,0)); plot(tree, layout=layout_as_tree)
541 |
542 | ## ------------------------------------------------------------------------
543 | l <- layout_as_tree(tree, circular = TRUE)
544 | par(mar = c(0,0,0,0)); plot(tree, layout = l)
545 |
546 | ## ----echo = FALSE--------------------------------------------------------
547 | ## Data taken from http://tehnick-8.narod.ru/dc_clients/
548 | DC <- graph_from_literal("DC++" -+
549 | "LinuxDC++":"BCDC++":"EiskaltDC++":"StrongDC++":"DiCe!++",
550 | "LinuxDC++" -+ "FreeDC++", "BCDC++" -+ "StrongDC++",
551 | "FreeDC++" -+ "BMDC++":"EiskaltDC++",
552 | "StrongDC++" -+ "AirDC++":"zK++":"ApexDC++":"TkDC++",
553 | "StrongDC++" -+ "StrongDC++ SQLite":"RSX++",
554 | "ApexDC++" -+ "FlylinkDC++ ver <= 4xx",
555 | "ApexDC++" -+ "ApexDC++ Speed-Mod":"DiCe!++",
556 | "StrongDC++ SQLite" -+ "FlylinkDC++ ver >= 5xx",
557 | "ApexDC++ Speed-Mod" -+ "FlylinkDC++ ver <= 4xx",
558 | "ApexDC++ Speed-Mod" -+ "GreylinkDC++",
559 | "FlylinkDC++ ver <= 4xx" -+ "FlylinkDC++ ver >= 5xx",
560 | "FlylinkDC++ ver <= 4xx" -+ AvaLink,
561 | "GreylinkDC++" -+ AvaLink:"RayLinkDC++":"SparkDC++":PeLink)
562 |
563 | ## Use edge types
564 | E(DC)$lty <- 1
565 | E(DC)["BCDC++" %->% "StrongDC++"]$lty <- 2
566 | E(DC)["FreeDC++" %->% "EiskaltDC++"]$lty <- 2
567 | E(DC)["ApexDC++" %->% "FlylinkDC++ ver <= 4xx"]$lty <- 2
568 | E(DC)["ApexDC++" %->% "DiCe!++"]$lty <- 2
569 | E(DC)["StrongDC++ SQLite" %->% "FlylinkDC++ ver >= 5xx"]$lty <- 2
570 | E(DC)["GreylinkDC++" %->% "AvaLink"]$lty <- 2
571 |
572 | ## Layers, as on the plot
573 | layers <- list(c("DC++"),
574 | c("LinuxDC++", "BCDC++"),
575 | c("FreeDC++", "StrongDC++"),
576 | c("BMDC++", "EiskaltDC++", "AirDC++", "zK++", "ApexDC++",
577 | "TkDC++", "RSX++"),
578 | c("StrongDC++ SQLite", "ApexDC++ Speed-Mod", "DiCe!++"),
579 | c("FlylinkDC++ ver <= 4xx", "GreylinkDC++"),
580 | c("FlylinkDC++ ver >= 5xx", "AvaLink", "RayLinkDC++",
581 | "SparkDC++", "PeLink"))
582 |
583 | ## Check that we have all nodes
584 | all(sort(unlist(layers)) == sort(V(DC)$name))
585 |
586 | ## Add some graphical parameters
587 | V(DC)$color <- "white"
588 | V(DC)$shape <- "rectangle"
589 | V(DC)$size <- 20
590 | V(DC)$size2 <- 10
591 | V(DC)$label <- lapply(V(DC)$name, function(x)
592 | paste(strwrap(x, 12), collapse="\n"))
593 | E(DC)$arrow.size <- 0.5
594 | invisible()
595 |
596 | ## ------------------------------------------------------------------------
597 | summary(DC)
598 | lay1 <- layout_with_sugiyama(DC, layers=apply(sapply(layers,
599 | function(x) V(DC)$name %in% x), 1, which))
600 |
601 | ## ------------------------------------------------------------------------
602 | par(mar = rep(0, 4))
603 | plot(DC, layout = lay1$layout, vertex.label.cex = 0.5)
604 |
605 | ## ------------------------------------------------------------------------
606 | par(mar = c(0,0,0,0)); plot(lay1$extd_graph, vertex.label.cex=0.5)
607 |
608 | ## ------------------------------------------------------------------------
609 | data(UKfaculty)
610 | UKfaculty
611 |
612 | ## ------------------------------------------------------------------------
613 | par(mar = c(0,0,0,0)); plot(UKfaculty, layout = layout_with_graphopt)
614 |
615 | ## ------------------------------------------------------------------------
616 | cl_uk <- cluster_louvain(as.undirected(UKfaculty))
617 | cl_gr <- contract(UKfaculty, mapping = cl_uk$membership)
618 | E(cl_gr)$weight <- count_multiple(cl_gr)
619 | cl_grs <- simplify(cl_gr)
620 | E(cl_grs)$weight
621 |
622 | ## ------------------------------------------------------------------------
623 | par(mar = c(0,0,0,0)); plot(cl_grs, edge.width=E(cl_grs)$weight / 200,
624 | edge.curved = .2, vertex.size = sizes(cl_uk) * 2)
625 |
626 | ## ------------------------------------------------------------------------
627 | subs <- lapply(groups(cl_uk), induced_subgraph, graph = UKfaculty)
628 | summary(subs[[1]])
629 |
630 | ## ------------------------------------------------------------------------
631 | par(mar=c(0,0,0,0)); plot(subs[[1]])
632 |
633 | ## ------------------------------------------------------------------------
634 | library(networkD3)
635 | d3_net <- simpleNetwork(as_data_frame(karate, what = "edges")[, 1:3])
636 | d3_net
637 |
638 | ## ------------------------------------------------------------------------
639 | library(DiagrammeR)
640 |
641 | ## ------------------------------------------------------------------------
642 | df_kar <- as_data_frame(karate, what = "both")
643 | df_kar$vertices <- cbind(node = rownames(df_kar$vertices),
644 | df_kar$vertices)
645 | dg <- create_graph(
646 | nodes_df = df_kar$vertices,
647 | edges_df = df_kar$edges
648 | )
649 | render_graph(dg, width = 800, height = 600)
650 |
651 | ## ------------------------------------------------------------------------
652 | library(rgexf)
653 | df_fac <- as_data_frame(UKfaculty, what = "both")
654 | df_fac$vertices <- cbind(seq_len(gorder(UKfaculty)), df_fac$vertices)
655 | output <- "images/UKfaculty.gexf"
656 | write.gexf(nodes = df_fac$vertices, edges = df_fac$edges[,1:2],
657 | edgesAtt = df_fac$edges[,-(1:2), drop = FALSE],
658 | output = output)
659 |
660 |
--------------------------------------------------------------------------------
/user-2015.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Statistical Analysis of Network Data"
3 | author: "Gábor Csárdi"
4 | date: "`r Sys.Date()`"
5 | output:
6 | ioslides_presentation:
7 | css: tweaks.css
8 | highlight: pygments
9 | keep_md: yes
10 | vignette: >
11 | %\VignetteIndexEntry{Statistical Analysis of Network Data}
12 | \usepackage[utf8]{inputenc}
13 | %\VignetteEngine{knitr::docco_classic}
14 | ---
15 |
16 | ```{r, setup, echo = FALSE, message = FALSE}
17 | library(knitr)
18 | library(igraph)
19 | opts_chunk$set(
20 | prompt = FALSE,
21 | comment = "#>",
22 | tidy = FALSE)
23 | options(width = 73)
24 | igraph_options(graph.margin = 0, margin = 0)
25 | ```
26 |
27 | ## How to follow this tutorial
28 |
29 | Go to https://github.com/igraph/netuser15
30 |
31 | You will need at least `igraph` version `1.0.0` and `igraphdata` version
32 | `1.0.0`. You will also need the `DiagrammeR` package. To install them
33 | from within R, type:
34 |
35 | ```r
36 | install.packages("igraph")
37 | install.packages("igraphdata")
38 | install.packages("DiagrammeR")
39 | ```
40 |
41 |
42 |
43 | ## Outline
44 |
45 | * Introduction
46 | * Manipulate network data
47 | * Questions
48 |
49 | ### BREAK
50 |
51 | * Classic graph theory: paths
52 | * Social network concepts: centrality, groups
53 | * Visualization
54 | * Questions
55 |
56 | ## Why networks?
57 |
58 | Sometimes connections are important, even more important than
59 | (the properties of) the things they connect.
60 |
61 | ## Example 1: Königsberg Bridges
62 |
63 | 
64 |
65 | -- Bogdan Giuşcă, CC BY-SA 3.0, Wikipedia
66 |
67 | ## Example 2: Page Rank
68 |
69 |
70 |
71 | http://computationalculture.net/article/what_is_in_pagerank
72 |
73 | ## Example 3: Matching Twitter to Facebook
74 |
75 | 
76 |
77 | http://morganlinton.com/wp-content/uploads/2013/12/twitter-facebook-branding2.png
78 |
79 | ## Example 4: Detection of groups
80 |
81 | 
82 |
83 | https://en.wikipedia.org/wiki/Community_structure#/
84 | media/File:Network_Community_Structure.svg
85 |
86 |
87 |
88 |
89 |
90 | ## About igraph
91 |
92 | * Network analysis library, written mostly in C/C++.
93 | * Interface to R and Python
94 | * https://github.com/igraph
95 | * http://igraph.org
96 | * Mailing list, stack overflow help.
97 | * Open GitHub issues for bugs
98 |
99 | # Creating and manipulating networks in R/igraph.
100 |
101 | ## What is a network or graph?
102 |
103 | ```{r echo = FALSE, results = "hide", message = FALSE}
104 | set.seed(42)
105 | library(igraph)
106 | library(igraphdata)
107 | data(karate)
108 | par(mar=c(0,0,0,0))
109 | plot(karate, margin = 0)
110 | ```
111 |
112 | ## More formally:
113 |
114 | * `V`: set of vertices
115 | * `E`: subset of ordered or unordered pairs of vertices. Multiset, really.
116 |
117 | ## Creating toy networks with `make_graph`
118 |
119 | ```{r message = FALSE}
120 | library(igraph)
121 | ```
122 |
123 | ```{r}
124 | toy1 <- make_graph(~ A - B, B - C - D, D - E:F:A, A:B - G:H)
125 | toy1
126 | ```
127 |
128 | ----
129 |
130 | ```{r}
131 | par(mar = c(0,0,0,0)); plot(toy1)
132 | ```
133 |
134 | ----
135 |
136 | ```{r}
137 | toy2 <- make_graph(~ A -+ B, B -+ C -+ D +- A:B)
138 | toy2
139 | ```
140 |
141 | ----
142 |
143 | ```{r}
144 | par(mar = c(0,0,0,0)); plot(toy2)
145 | ```
146 |
147 | ## Printout of a graph
148 |
149 | ```{r}
150 | toy2
151 | ```
152 |
153 | `IGRAPH` means this is a graph object. Next, comes a four letter
154 | code:
155 |
156 | * `U` or `D` for undirected or directed
157 | * `N` if the graph is named, always use named graphs for real data sets.
158 | * `W` if the graph is weighted (has a `weight` edge attribute).
159 | * `B` if the graph is bipartite (has a `type` vertex attribute).
160 |
161 | ## Attributes
162 |
163 | ```{r}
164 | make_ring(5)
165 | ```
166 |
167 | * Some graphs have a name (`name` graph attribute), that comes after
168 | the two dashes.
169 | * Then the various attributes are listed. Attributes
170 | are metadata that is attached to the vertices, edges, or the graph
171 | itself.
172 | * `(v/c)` means that `name` is a vertex attribute, and it is
173 | character.
174 | * `(e/.)` means an edge attribute, `(g/.)` means a graph attribute
175 |
176 | -----
177 |
178 | ```{r}
179 | make_ring(5)
180 | ```
181 | * Attribute types: `c` for character, `n` for numeric, `l` for
182 | logical and `x` (complex) for anything else.
183 | * igraph treats some attributes specially. Always start your non-special
184 | attributes with an uppercase letter.
185 |
186 | ## Real network data
187 |
188 | ## Adjacency matrices
189 |
190 | ```{r}
191 | A <- matrix(sample(0:1, 100, replace = TRUE), nrow = 10)
192 | A
193 | ```
194 |
195 | -----
196 |
197 | ```{r}
198 | graph_from_adjacency_matrix(A)
199 | ```
200 |
201 | ## List of edges
202 |
203 | ```{r}
204 | L <- matrix(sample(1:10, 20, replace = TRUE), ncol = 2)
205 | L
206 | ```
207 |
208 | -----
209 |
210 | ```{r}
211 | graph_from_edgelist(L)
212 | ```
213 |
214 | ## Two tables, one for vertices, one for edges
215 |
216 | ```{r}
217 | edges <- data.frame(
218 | stringsAsFactors = FALSE,
219 | from = c("BOS", "JFK", "LAX"),
220 | to = c("JFK", "LAX", "JFK"),
221 | Carrier = c("United", "Jetblue", "Virgin America"),
222 | Departures = c(30, 60, 121)
223 | )
224 | vertices <- data.frame(
225 | stringsAsFactors = FALSE,
226 | name = c("BOS", "JFK", "LAX"),
227 | City = c("Boston, MA", "New York City, NY",
228 | "Los Angeles, CA")
229 | )
230 | ```
231 |
232 | -----
233 |
234 | ```{r}
235 | edges
236 | ```
237 |
238 | -----
239 |
240 | ```{r}
241 | vertices
242 | ```
243 |
244 | -----
245 |
246 | ```{r}
247 | toy_air <- graph_from_data_frame(edges, vertices = vertices)
248 | toy_air
249 | ```
250 |
251 | ----
252 |
253 | The real US airports data set is in the `igraphdata` package:
254 |
255 | ```{r}
256 | library(igraphdata)
257 | data(USairports)
258 | USairports
259 | ```
260 |
261 | ----
262 |
263 | Converting it back to tables
264 |
265 | ```{r}
266 | as_data_frame(toy_air, what = "edges")
267 | ```
268 |
269 | -----
270 |
271 | ```{r}
272 | as_data_frame(toy_air, what = "vertices")
273 | ```
274 |
275 | -----
276 |
277 | Long data frames
278 |
279 | ```{r}
280 | as_long_data_frame(toy_air)
281 | ```
282 |
283 | -----
284 |
285 | Quickly look at the metadata, without conversion:
286 |
287 | ```{r}
288 | V(USairports)[[1:5]]
289 | ```
290 |
291 | ----
292 |
293 | ```{r}
294 | E(USairports)[[1:5]]
295 | ```
296 |
297 | ## Weighted graphs
298 |
299 | Numbers (usually real) assigned to edges. E.g. number of departures,
300 | or number of passengers.
301 |
302 | 
303 |
304 | http://web.cecs.pdx.edu/~sheard/course/Cs163/Doc/Graphs.html
305 |
306 | ## Multigraphs
307 |
308 | They have multiple (directed) edges between the
309 | same pair of vertices. A graph that has no multiple edges
310 | and no loop edges is a simple graph.
311 |
312 | 
313 |
314 | https://en.wikipedia.org/wiki/Multigraph
315 |
316 | Multi-graphs are nasty. Always check if your graph is a multi-graph.
317 |
318 | -----
319 |
320 | ```{r}
321 | is_simple(USairports)
322 | sum(which_multiple(USairports))
323 | sum(which_loop(USairports))
324 | ```
325 |
326 | -----
327 |
328 | `simplify()` creates a simple graph from a multigraph, in a flexible
329 | way: you can specify what it should do with the edge attributes.
330 |
331 | ```{r}
332 | air <- simplify(USairports, edge.attr.comb =
333 | list(Departures = "sum", Seats = "sum", Passengers = "sum", "ignore"))
334 | is_simple(air)
335 | summary(air)
336 | ```
337 |
338 | ## Querying and manipulating networks: the `[` and `[[` operators
339 |
340 | The `[` operator treats the graph as an adjacency matrix.
341 |
342 | ```
343 | BOS JFK ANC EWR . . .
344 | BOS . 1 . 1
345 | JFK 1 . 1 .
346 | ANC . 1 . .
347 | EWR 1 . 1 .
348 | . . .
349 | ```
350 | -----
351 |
352 | The `[[` operator treats the graph as an adjacency list.
353 |
354 | ```{r eval = FALSE}
355 | BOS: JFK, LAX, EWR, MKE, PVD
356 | JFK: BGR, BOS, SFO, BNA, BUF, SRQ, RIC RDU, MSP
357 | LAX: DTW, MSY, LAS, FLL, STL,
358 | . . .
359 | ```
360 |
361 | ## Queries
362 |
363 | Does an edge exist?
364 |
365 | ```{r}
366 | air["BOS", "JFK"]
367 | air["BOS", "ANC"]
368 | ```
369 |
370 | -----
371 |
372 | Convert the graph to an adjacency matrix, or just a part of it:
373 |
374 | ```{r}
375 | air[c("BOS", "JFK", "ANC"), c("BOS", "JFK", "ANC")]
376 | ```
377 |
378 | For weighted graphs, query the edge weight:
379 |
380 | ```{r}
381 | E(air)$weight <- E(air)$Passengers
382 | air["BOS", "JFK"]
383 | ```
384 |
385 | ----
386 |
387 | All adjacent vertices of a vertex:
388 |
389 | ```{r}
390 | air[["BOS"]]
391 | ```
392 |
393 | ----
394 |
395 | ```{r}
396 | air[[, "BOS"]]
397 | ```
398 |
399 | ## Manipulation
400 |
401 | Add an edge (and potentially set its weight):
402 | ```{r}
403 | air["BOS", "ANC"] <- TRUE
404 | air["BOS", "ANC"]
405 | ```
406 |
407 | Remove an edge:
408 | ```{r}
409 | air["BOS", "ANC"] <- FALSE
410 | air["BOS", "ANC"]
411 | ```
412 |
413 | ----
414 |
415 | Note that you can use all allowed indexing modes, e.g.
416 | ```{r}
417 | g <- make_empty_graph(10)
418 | g[-1, 1] <- TRUE
419 | g
420 | ```
421 | creates a star graph.
422 |
423 | ----
424 |
425 | Add vertices to a graph:
426 |
427 | ```{r}
428 | g <- make_ring(10) + 2
429 | par(mar = c(0,0,0,0)); plot(g)
430 | ```
431 |
432 | ----
433 |
434 | Add vertices with attributes:
435 |
436 | ```{r}
437 | g <- make_(ring(10), with_vertex_(color = "grey")) +
438 | vertices(2, color = "red")
439 | par(mar = c(0,0,0,0)); plot(g)
440 | ```
441 |
442 | ----
443 |
444 | Add an edge
445 |
446 | ```{r}
447 | g <- make_(star(10), with_edge_(color = "grey")) +
448 | edge(5, 6, color = "red")
449 | par(mar = c(0,0,0,0)); plot(g)
450 | ```
451 |
452 | ----
453 |
454 | Add a chain of edges
455 |
456 | ```{r}
457 | g <- make_(empty_graph(5)) + path(1,2,3,4,5,1)
458 | g2 <- make_(empty_graph(5)) + path(1:5, 1)
459 | g
460 | g2
461 | ```
462 |
463 | ## Exercise
464 |
465 | Create the wheel graph.
466 |
467 | ```{r echo = FALSE}
468 | par(mar=c(0,0,0,0))
469 | plot(make_star(11, center = 11, mode = "undirected") + path(1:10, 1))
470 | ```
471 |
472 | ## (A) solution
473 |
474 | ```{r}
475 | make_star(11, center = 11, mode = "undirected") + path(1:10, 1)
476 | ```
477 |
478 | ## Vertex sequences
479 |
480 | They are the key objects to manipulate graphs. Vertex sequences
481 | can be created in various ways. Most frequently used ones:
482 |
483 | |expression |result |
484 | |:--------------------------|:---------------------------------|
485 | |`V(air)` |All vertices. |
486 | |`V(air)[1,2:5]` |Vertices in these positions |
487 | |`V(air)[degree(air) < 2]` |Vertices satisfying condition |
488 | |`V(air)[nei('BOS')]` |Neighbors of a vertex |
489 | |`V(air)['BOS', 'JFK']` |Select given vertices |
490 |
491 | ## Edge sequences
492 |
493 | The same for edges:
494 |
495 | |expresssion |result |
496 | |:--------------------------|:--------------------------------------------|
497 | |`E(air)` |All edges. |
498 | |`E(air)[FL %--% CA]` |Edges between two vertex sets |
499 | |`E(air)[FL %->% CA]` |Edges between two vertex sets, directionally |
500 | |`E(air, path = P)` |Edges along a path |
501 | |`E(air)[to('BOS')]` |Incoming edges of a vertex |
502 | |`E(air)[from('BOS')]` |Outgoing edges of a vertex |
503 |
504 | ## Manipulate attributes via vertex and edge sequences
505 |
506 | ```{r}
507 | FL <- V(air)[grepl("FL$", City)]
508 | CA <- V(air)[grepl("CA$", City)]
509 |
510 | V(air)$color <- "grey"
511 | V(air)[FL]$color <- "blue"
512 | V(air)[CA]$color <- "blue"
513 | ```
514 |
515 | ----
516 |
517 | ```{r}
518 | E(air)[FL %--% CA]
519 | E(air)$color <- "grey"
520 | E(air)[FL %--% CA]$color <- "red"
521 | ```
522 |
523 | ## Quick look at metadata
524 |
525 | ```{r}
526 | V(air)[[1:5]]
527 | ```
528 |
529 | ----
530 |
531 | ```{r}
532 | E(air)[[1:5]]
533 | ```
534 |
535 | # BREAK
536 |
537 | ## Paths
538 |
539 |
540 | ```{r echo = FALSE}
541 | set.seed(42)
542 | g <- sample_gnp(12, 0.25)
543 | l <- layout_nicely(g)
544 | par(mar=c(0,0,0,0))
545 | plot(g, margin = 0, layout = l)
546 | ```
547 |
548 | ## Paths
549 |
550 | ```{r echo = FALSE}
551 | pa <- V(g)[11, 2, 12, 8]
552 | V(g)[pa]$color <- 'green'
553 | E(g)$color <- 'grey'
554 | E(g, path = pa)$color <- 'red'
555 | E(g, path = pa)$width <- 3
556 | par(mar=c(0,0,0,0))
557 | plot(g, margin = 0, layout = l)
558 | ```
559 |
560 | ## Define a path in igraph
561 |
562 | ```{r}
563 | set.seed(42)
564 | g <- sample_gnp(12, 0.25)
565 |
566 | pa <- V(g)[11, 2, 12, 8]
567 |
568 | V(g)[pa]$color <- 'green'
569 | E(g)$color <- 'grey'
570 | E(g, path = pa)$color <- 'red'
571 | E(g, path = pa)$width <- 3
572 | ```
573 |
574 | ----
575 |
576 | ```{r}
577 | par(mar=c(0,0,0,0))
578 | plot(g, margin = 0, layout = layout_nicely)
579 | ```
580 |
581 | ## Shortest paths
582 |
583 | ```{r echo = FALSE}
584 | set.seed(42)
585 | g <- sample_gnp(12, 0.25)
586 | pa <- V(g)[11, 2, 12, 8]
587 | V(g)[pa]$color <- 'green'
588 | E(g)$color <- 'grey'
589 | E(g, path = pa)$color <- 'red'
590 | E(g, path = pa)$width <- 3
591 | par(mar=c(0,0,0,0))
592 | plot(g, margin = 0, layout = layout_nicely)
593 | ```
594 |
595 | ----
596 |
597 | Length of the shortest path: distance.
598 | How many planes to get from `PBI` to `BDL`?
599 |
600 | ```{r}
601 | air <- delete_edge_attr(air, "weight")
602 | distances(air, 'PBI', 'ANC')
603 | ```
604 |
605 | ----
606 |
607 | ```{r}
608 | sp <- shortest_paths(air, 'PBI', 'ANC', output = "both")
609 | sp
610 | air[[ sp$epath[[1]] ]]
611 | ```
612 |
613 | ----
614 |
615 | ```{r}
616 | all_shortest_paths(air, 'PBI', 'ANC')$res
617 | ```
618 |
619 | ## Weighted paths
620 |
621 | ```{r}
622 | wair <- simplify(USairports, edge.attr.comb =
623 | list(Departures = "sum", Seats = "sum", Passangers = "sum",
624 | Distance = "first", "ignore"))
625 | E(wair)$weight <- E(wair)$Distance
626 | ```
627 |
628 | ## Weighted (shortest) paths
629 |
630 | ```{r}
631 | distances(wair, c('BOS', 'JFK', 'PBI', 'AZO'),
632 | c('BOS', 'JFK', 'PBI', 'AZO'))
633 | ```
634 |
635 | ----
636 |
637 | ```{r}
638 | shortest_paths(wair, from = 'BOS', to = 'AZO')$vpath
639 | all_shortest_paths(wair, from = 'BOS', to = 'AZO')$res
640 | ```
641 |
642 | ## Mean path length
643 |
644 | ```{r}
645 | mean_distance(air)
646 | air_dist_hist <- distance_table(air)
647 | air_dist_hist
648 | ```
649 |
650 | ----
651 |
652 | ```{r}
653 | barplot(air_dist_hist$res, names.arg = seq_along(air_dist_hist$res))
654 | ```
655 |
656 | ## Components
657 |
658 |
659 |
660 | David Eppstein, public domain
661 |
662 | ## Strongly connected components
663 |
664 |
665 |
666 | http://www.greatandlittle.com/studios/
667 |
668 | ----
669 |
670 | ```{r}
671 | co <- components(air, mode = "weak")
672 | co$csize
673 | groups(co)[[2]]
674 | ```
675 |
676 | ----
677 |
678 | ```{r}
679 | co <- components(air, mode = "strong")
680 | co$csize
681 | ```
682 |
683 | ## Bow-tie structure of a directed graph
684 |
685 |
686 |
687 | http://webdatacommons.org/hyperlinkgraph/2012-08/topology.html
688 |
689 | ## Exercise
690 |
691 | 1. Extract the large (strongly) connected component from the
692 | airport graph, as a separate graph.
693 | Hint: `components()`, `induced_subgraph()`.
694 | How many airports are not in this component?
695 |
696 | 1. In the large connected component, which airport is better
697 | connected, `LAX` or `BOS`? I.e. what is the mean number of
698 | plane changes that are required if traveling to a uniformly
699 | randomly picked airport?
700 |
701 | 1. Which airport is the best connected one? Which one is the
702 | worst (within the strongly connected component)?
703 |
704 | ## Solution
705 |
706 | ```{r}
707 | largest_component <- function(graph) {
708 | comps <- components(graph, mode = "strong")
709 | gr <- groups(comps)
710 | sizes <- vapply(gr, length, 1L)
711 | induced_subgraph(graph, gr[[ which.max(sizes) ]])
712 | }
713 | sc_air <- largest_component(air)
714 | ```
715 |
716 | ----
717 |
718 | ```{r}
719 | table(distances(sc_air, "BOS"))
720 | table(distances(sc_air, "LAX"))
721 | ```
722 |
723 | ----
724 |
725 | ```{r}
726 | mean(as.vector(distances(sc_air, "BOS")))
727 | mean(as.vector(distances(sc_air, "LAX")))
728 | ```
729 |
730 | ----
731 |
732 | ```{r}
733 | D <- distances(sc_air)
734 | sort(rowMeans(D))[1:10]
735 | ```
736 |
737 | ----
738 |
739 | ```{r}
740 | sort(rowMeans(D), decreasing = TRUE)[1:10]
741 | ```
742 |
743 | ----
744 |
745 | ```{r}
746 | V(sc_air)[[names(sort(rowMeans(D), decreasing = TRUE)[1:10])]]
747 | ```
748 |
749 | ## Centrality
750 |
751 | Finding important vertices in the network (family of concepts)
752 |
753 | ```{r echo = FALSE}
754 | par(mar=c(0,0,0,0))
755 | plot(make_star(11))
756 | ```
757 |
758 | ## Centrality
759 |
760 | ```{r echo = FALSE}
761 | data(kite)
762 | par(mar=c(0,0,0,0))
763 | plot(kite)
764 | ```
765 |
766 | ## Classic centrality measures: degree
767 |
768 | ```{r}
769 | V(kite)$label.cex <- 2
770 | V(kite)$color <- V(kite)$frame.color <- "grey"
771 | V(kite)$size <- 30
772 | par(mar=c(0,0,0,0)) ; plot(kite)
773 | ```
774 |
775 | -------
776 |
777 | ```{r}
778 | d <- degree(kite)
779 | par(mar = c(0,0,0,0))
780 | plot(kite, vertex.size = 10 * d, vertex.label =
781 | paste0(V(kite)$name, ":", d))
782 | ```
783 |
784 |
785 | ## Classic centrality measures: closeness
786 |
787 | 1 / How many steps do you need to get there?
788 |
789 | ```{r}
790 | cl <- closeness(kite)
791 | ```
792 |
793 | -----
794 |
795 | ```{r}
796 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 500 * cl)
797 | ```
798 |
799 | ## Classic centrality measures: betweenness
800 |
801 | How many shortest paths goes through me
802 |
803 | ```{r}
804 | btw <- betweenness(kite)
805 | btw
806 | ```
807 |
808 | -----
809 |
810 | ```{r}
811 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 3 * btw)
812 | ```
813 |
814 | ## Eigenvector centrality
815 |
816 | Typically for directed. Central vertex: it is cited by central vertices.
817 |
818 | ```{r}
819 | ec <- eigen_centrality(kite)$vector
820 | ec
821 | cor(ec, d)
822 | ```
823 |
824 | -----
825 |
826 | ```{r}
827 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 20 * ec)
828 | ```
829 |
830 | ## Page Rank
831 |
832 | Fixes the practical problems with eigenvector centrality
833 |
834 | ```{r}
835 | page_rank(kite)$vector
836 | ```
837 |
838 | ## Exercise
839 |
840 | Create a table that contains the top 10 most central
841 | airports according to all these centrality measures.
842 |
843 | # Clusters
844 |
845 | ## Why finding groups
846 |
847 | Finding groups in networks. Dimensionality reduction. Community detection.
848 |
849 | We want to find dense groups.
850 |
851 | -----
852 |
853 |
854 |
855 | ## Clusters by hand
856 |
857 | ```{r}
858 | graph <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
859 | F-G-H-I-F, J-F:G:H:I,
860 | K-L-M-N-K, O-K:L:M:N,
861 | P-Q-R-S-P, T-P:Q:R:S,
862 | B-F, E-J, C-I, L-T, O-T, M-S,
863 | C-P, C-L, I-L, I-P)
864 | ```
865 |
866 | ----
867 |
868 | ```{r}
869 | par(mar=c(0,0,0,0)); plot(graph)
870 | ```
871 |
872 | ----
873 |
874 | ```{r}
875 | flat_clustering <- make_clusters(
876 | graph,
877 | c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))
878 | ```
879 |
880 | -----
881 |
882 | ```{r}
883 | flat_clustering
884 | ```
885 |
886 | -----
887 |
888 | ```{r}
889 | flat_clustering[[1]]
890 | length(flat_clustering)
891 | sizes(flat_clustering)
892 | ```
893 |
894 | -----
895 |
896 | ```{r}
897 | induced_subgraph(graph, flat_clustering[[1]])
898 | ```
899 |
900 | ## Hierarchical community structure
901 |
902 | Typically produced by top-down or bottom-up clustering algorithms.
903 |
904 | The outcome can be represented as a *dendrogram*,
905 | a tree-like diagram that illustrates the order in which the clusters
906 | are merged (in the bottom-up case) or split (in the top-down case).
907 |
908 | -----
909 |
910 |
911 |
912 | ## Clustering quality measures
913 |
914 | - External quality measures: require ground truth
915 | - Internal quality measures: require assumption about *good*
916 | clusters.
917 |
918 | ## External quality measures
919 |
920 | Measure | Type | Range | igraph name
921 | ------------------------------|------------|------------|----------------
922 | Rand index | similarity | 0 to 1 | `rand`
923 | Adjusted Rand index | similarity | -0.5 to 1 | `adjusted.rand`
924 | Split-join distance | distance | 0 to 2n | `split.join`
925 | Variation of information | distance | 0 to log n | `vi` |
926 | Normalized mutual information | similarity | 0 to 1 | `nmi`
927 |
928 | ## External quality measures
929 |
930 | ```{r}
931 | data(karate)
932 | karate
933 | karate <- delete_edge_attr(karate, "weight")
934 | ```
935 |
936 | -----
937 |
938 | ```{r}
939 | ground_truth <- make_clusters(karate, V(karate)$Faction)
940 | length(ground_truth)
941 | ground_truth
942 | ```
943 |
944 | ## Exercise
945 |
946 | Write a naive clustering method that classifies vertices
947 | into two groups, based on two center vertices. Put the two
948 | centers in separate clusters, and other vertices in the
949 | cluster whose center is closer to it.
950 |
951 | ```{r}
952 | cluster_naive2 <- function(graph, center1, center2) {
953 | # ...
954 | }
955 | ```
956 |
957 | ## Solution
958 |
959 | ```{r}
960 | cluster_naive2 <- function(graph, center1, center2) {
961 | dist <- distances(graph, c(center1, center2))
962 | cl <- apply(dist, 2, which.min)
963 | make_clusters(graph, cl)
964 | }
965 | dist_memb <- cluster_naive2(karate, 'John A', 'Mr Hi')
966 | ```
967 |
968 | ----
969 |
970 | ```{r}
971 | dist_memb
972 | ```
973 |
974 | ## Rand index
975 |
976 | Check if pairs of vertices are classified correctly
977 |
978 | ```{r}
979 | rand_index <- compare(ground_truth, dist_memb, method = "rand")
980 | rand_index
981 | ```
982 |
983 | ## Rand index
984 |
985 | Random clusterings
986 |
987 | ```{r}
988 | random_partition <- function(n, k = 2) { sample(k, n, replace = TRUE) }
989 | total <- numeric(100)
990 | for (i in seq_len(100)) {
991 | c1 <- random_partition(100)
992 | c2 <- random_partition(100)
993 | total[i] <- compare(c1, c2, method = "rand")
994 | }
995 | mean(total)
996 | ```
997 |
998 | ## Adjusted Rand index
999 |
1000 | ```{r}
1001 | total <- numeric(100)
1002 | for (i in seq_len(100)) {
1003 | c1 <- random_partition(100)
1004 | c2 <- random_partition(100)
1005 | total[i] <- compare(c1, c2, method = "adjusted.rand")
1006 | }
1007 | mean(total)
1008 | ```
1009 |
1010 | ## Adjusted rand index
1011 |
1012 | ```{r}
1013 | compare(ground_truth, dist_memb, method = "adjusted.rand")
1014 | ```
1015 |
1016 | ## Internal quality metrics: density
1017 |
1018 | ```{r}
1019 | edge_density(karate)
1020 | subgraph_density <- function(graph, vertices) {
1021 | sg <- induced_subgraph(graph, vertices)
1022 | edge_density(sg)
1023 | }
1024 | ```
1025 |
1026 | ```{r}
1027 | subgraph_density(karate, ground_truth[[1]])
1028 | subgraph_density(karate, ground_truth[[2]])
1029 | ```
1030 |
1031 | ## Internal quality metrics: modularity
1032 |
1033 | Uses a null model
1034 |
1035 | $$Q(G) = \frac{1}{2m} \sum_{i=1}^n \sum_{j=1}^n \left( A_{ij} - p_{ij} \right) \delta_{ij}$$
1036 |
1037 | $A_{ij}$: Adjacency matrix
1038 |
1039 | $\delta_{ij}$: $i$ and $j$ are in the same cluster
1040 |
1041 | $p_{ij}$ expected value for an $(i,j)$ edge from the null model
1042 |
1043 | ## Modularity
1044 |
1045 | Common null model: degree-sequence (configuration) model
1046 |
1047 | $$Q(G) = \frac{1}{2m} \sum_{i=1}^n \sum_{j=1}^n \left( A_{ij} - \frac{k_i k_j}{2m} \right)
1048 | \delta_{ij}$$
1049 |
1050 | ## Modularity in igraph
1051 |
1052 | ```{r}
1053 | modularity(ground_truth)
1054 | modularity(karate, membership(ground_truth))
1055 | ```
1056 |
1057 | ----
1058 |
1059 | Well behaving:
1060 |
1061 | ```{r}
1062 | modularity(karate, rep(1, gorder(karate)))
1063 | modularity(karate, seq_len(gorder(karate)))
1064 | ```
1065 |
1066 | ## Heuristic algorithms
1067 |
1068 | Edge-betweenness clustering
1069 |
1070 | Exact modularity optimization
1071 |
1072 | Greedy agglomerative algorithm to maximize modularity
1073 |
1074 | ## Edge-betweenness clustering
1075 |
1076 | ```{r}
1077 | dendrogram <- cluster_edge_betweenness(karate)
1078 | dendrogram
1079 | ```
1080 |
1081 | -----
1082 |
1083 | ```{r}
1084 | membership(dendrogram)
1085 | ```
1086 |
1087 | -----
1088 |
1089 | ```{r}
1090 | compare_all <- function(cl1, cl2) {
1091 | methods <- eval(as.list(args(compare))$method)
1092 | vapply(methods, compare, 1.0, comm1 = cl1, comm2 = cl2)
1093 | }
1094 | compare_all(dendrogram, ground_truth)
1095 | ```
1096 |
1097 | -----
1098 |
1099 | ```{r}
1100 | cluster_memb <- cut_at(dendrogram, no = 2)
1101 | compare_all(cluster_memb, ground_truth)
1102 | clustering <- make_clusters(karate, membership = cluster_memb)
1103 | ```
1104 |
1105 | ----
1106 |
1107 | ```{r}
1108 | V(karate)[Faction == 1]$shape <- "circle"
1109 | V(karate)[Faction == 2]$shape <- "square"
1110 | par(mar=c(0,0,0,0)); plot(clustering, karate)
1111 | ```
1112 |
1113 | -----
1114 |
1115 | ```{r}
1116 | par(mar=c(0,0,0,0)); plot_dendrogram(dendrogram, direction = "downwards")
1117 | ```
1118 |
1119 | ## Exact modularity maximization
1120 |
1121 | ```{r}
1122 | optimal <- cluster_optimal(karate)
1123 | modularity(clustering)
1124 | modularity(optimal)
1125 | modularity(ground_truth)
1126 | ```
1127 |
1128 | ## Heuristic modularity optimization
1129 |
1130 | ```{r}
1131 | dend_fast <- cluster_fast_greedy(karate)
1132 | compare_all(dend_fast, ground_truth)
1133 | ```
1134 |
1135 | -----
1136 |
1137 | ```{r}
1138 | par(mar = c(0,0,0,0)); plot_dendrogram(dend_fast, direction = "downwards")
1139 | ```
1140 |
1141 | # Visualization
1142 |
1143 | ## Plotting parameters
1144 |
1145 | ----
1146 |
1147 | Globally
1148 |
1149 | ```{r}
1150 | igraph_options(edge.color = "black")
1151 | data(karate) ; par(mar=c(0,0,0,0)); plot(karate)
1152 | ```
1153 |
1154 | -----
1155 |
1156 | Graph parameter
1157 |
1158 | ```{r fig.width = 6}
1159 | V(karate)$color <- "DarkOliveGreen" ; E(karate)$color <- "grey"
1160 | par(mar=c(0,0,0,0)) ; plot(karate)
1161 | ```
1162 |
1163 | -----
1164 |
1165 | As an argument to `plot()`:
1166 | ```{r fig.width = 6}
1167 | par(mar = c(0,0,0,0))
1168 | plot(karate, edge.color = "black", vertex.color = "#00B7FF",
1169 | vertex.label.color = "black")
1170 | ```
1171 |
1172 | ## igraph color palettes
1173 |
1174 | ```{r}
1175 | karate$palette <- categorical_pal(length(clustering))
1176 | par(mar = c(0,0,0,0)); plot(karate, vertex.color = membership(clustering))
1177 | ```
1178 |
1179 | ----
1180 |
1181 | Others: `r_pal()`, `sequential_pal()`, `diverging_pal()`.
1182 |
1183 | ## Graphical parameters
1184 |
1185 | Vertices: `size`, `size`, `color`, `frame.color`, `shape` (circle, square, rectangle, pie,
1186 | raster, none), `label`, `label.family`, `label.font`, `label.cex`, `label.dist`,
1187 | `label.degree`, `label.color`.
1188 |
1189 | Edges: `color`, `width`, `arrow.size`, `arrow.width`, `lty`, `label`,
1190 | `label.family`, `label.font`, `label.cex`, `label.color`, `label.x`, `label.y`,
1191 | `curved`, `arrow.mode`, `loop.angle`, `loop.angle2`.
1192 |
1193 | Graph: `layout` (a numeric matrix), `margin`, `palette` (for vertex color),
1194 | `rescale`, `asp`, `frame`, `main` (title), `sub` (title), `xlab`, `ylab`.
1195 |
1196 | ## Vertex shapes
1197 |
1198 | ```{r}
1199 | shapes()
1200 | ```
1201 |
1202 | ----
1203 |
1204 | ```{r echo = FALSE}
1205 | shapes <- setdiff(shapes(), "")
1206 | g <- make_ring(length(shapes))
1207 | ```
1208 |
1209 | ```{r eval = FALSE}
1210 | plot(g, vertex.shape=shapes, vertex.label=shapes, vertex.label.dist=1,
1211 | vertex.size=15, vertex.size2=15,
1212 | vertex.pie=lapply(shapes, function(x) if (x=="pie") 2:6 else 0),
1213 | vertex.pie.color=list(heat.colors(5)))
1214 | ```
1215 |
1216 | ----
1217 |
1218 | ```{r echo = FALSE}
1219 | par(mar = c(0,0,0,0))
1220 | plot(g, vertex.shape=shapes, vertex.label=shapes, vertex.label.dist=1,
1221 | vertex.size=15, vertex.size2=15,
1222 | vertex.pie=lapply(shapes, function(x) if (x=="pie") 2:6 else 0),
1223 | vertex.pie.color=list(heat.colors(5)))
1224 | ```
1225 |
1226 | ## Layout algorithms
1227 |
1228 | Layout algorithm: place the vertices in a way, such that
1229 |
1230 | * nodes are distributed evenly
1231 | * edges have about the same length
1232 | * connected vertices are closer to each other
1233 | * edges are not crossing
1234 |
1235 | This is really hard, often impossible!
1236 |
1237 | ## Force-directed algorithms
1238 |
1239 | ```{r echo = FALSE}
1240 | lat <- make_lattice(c(5,5))
1241 | layout(rbind(1:2,3:4))
1242 | par(mar=c(0,0,0,0))
1243 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 1))
1244 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 5))
1245 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 10))
1246 | set.seed(42); plot(lat, layout = layout_with_fr(lat, niter = 20))
1247 | ```
1248 |
1249 | ## Trees
1250 |
1251 | ```{r}
1252 | tree <- make_tree(20, 3)
1253 | par(mar = c(0,0,0,0)); plot(tree, layout=layout_as_tree)
1254 | ```
1255 |
1256 | ----
1257 |
1258 | ```{r}
1259 | l <- layout_as_tree(tree, circular = TRUE)
1260 | par(mar = c(0,0,0,0)); plot(tree, layout = l)
1261 | ```
1262 |
1263 | ----
1264 |
1265 | ```{r echo = FALSE}
1266 | ## Data taken from http://tehnick-8.narod.ru/dc_clients/
1267 | DC <- graph_from_literal("DC++" -+
1268 | "LinuxDC++":"BCDC++":"EiskaltDC++":"StrongDC++":"DiCe!++",
1269 | "LinuxDC++" -+ "FreeDC++", "BCDC++" -+ "StrongDC++",
1270 | "FreeDC++" -+ "BMDC++":"EiskaltDC++",
1271 | "StrongDC++" -+ "AirDC++":"zK++":"ApexDC++":"TkDC++",
1272 | "StrongDC++" -+ "StrongDC++ SQLite":"RSX++",
1273 | "ApexDC++" -+ "FlylinkDC++ ver <= 4xx",
1274 | "ApexDC++" -+ "ApexDC++ Speed-Mod":"DiCe!++",
1275 | "StrongDC++ SQLite" -+ "FlylinkDC++ ver >= 5xx",
1276 | "ApexDC++ Speed-Mod" -+ "FlylinkDC++ ver <= 4xx",
1277 | "ApexDC++ Speed-Mod" -+ "GreylinkDC++",
1278 | "FlylinkDC++ ver <= 4xx" -+ "FlylinkDC++ ver >= 5xx",
1279 | "FlylinkDC++ ver <= 4xx" -+ AvaLink,
1280 | "GreylinkDC++" -+ AvaLink:"RayLinkDC++":"SparkDC++":PeLink)
1281 |
1282 | ## Use edge types
1283 | E(DC)$lty <- 1
1284 | E(DC)["BCDC++" %->% "StrongDC++"]$lty <- 2
1285 | E(DC)["FreeDC++" %->% "EiskaltDC++"]$lty <- 2
1286 | E(DC)["ApexDC++" %->% "FlylinkDC++ ver <= 4xx"]$lty <- 2
1287 | E(DC)["ApexDC++" %->% "DiCe!++"]$lty <- 2
1288 | E(DC)["StrongDC++ SQLite" %->% "FlylinkDC++ ver >= 5xx"]$lty <- 2
1289 | E(DC)["GreylinkDC++" %->% "AvaLink"]$lty <- 2
1290 |
1291 | ## Layers, as on the plot
1292 | layers <- list(c("DC++"),
1293 | c("LinuxDC++", "BCDC++"),
1294 | c("FreeDC++", "StrongDC++"),
1295 | c("BMDC++", "EiskaltDC++", "AirDC++", "zK++", "ApexDC++",
1296 | "TkDC++", "RSX++"),
1297 | c("StrongDC++ SQLite", "ApexDC++ Speed-Mod", "DiCe!++"),
1298 | c("FlylinkDC++ ver <= 4xx", "GreylinkDC++"),
1299 | c("FlylinkDC++ ver >= 5xx", "AvaLink", "RayLinkDC++",
1300 | "SparkDC++", "PeLink"))
1301 |
1302 | ## Check that we have all nodes
1303 | all(sort(unlist(layers)) == sort(V(DC)$name))
1304 |
1305 | ## Add some graphical parameters
1306 | V(DC)$color <- "white"
1307 | V(DC)$shape <- "rectangle"
1308 | V(DC)$size <- 20
1309 | V(DC)$size2 <- 10
1310 | V(DC)$label <- lapply(V(DC)$name, function(x)
1311 | paste(strwrap(x, 12), collapse="\n"))
1312 | E(DC)$arrow.size <- 0.5
1313 | invisible()
1314 | ```
1315 |
1316 | ```{r}
1317 | summary(DC)
1318 | lay1 <- layout_with_sugiyama(DC, layers=apply(sapply(layers,
1319 | function(x) V(DC)$name %in% x), 1, which))
1320 | ```
1321 |
1322 | ----
1323 |
1324 | ```{r}
1325 | par(mar = rep(0, 4))
1326 | plot(DC, layout = lay1$layout, vertex.label.cex = 0.5)
1327 | ```
1328 |
1329 | ----
1330 |
1331 | ```{r}
1332 | par(mar = c(0,0,0,0)); plot(lay1$extd_graph, vertex.label.cex=0.5)
1333 | ```
1334 |
1335 | ## Slightly bigger networks
1336 |
1337 | ```{r}
1338 | data(UKfaculty)
1339 | UKfaculty
1340 | ```
1341 |
1342 | ----
1343 |
1344 | ```{r}
1345 | par(mar = c(0,0,0,0)); plot(UKfaculty, layout = layout_with_graphopt)
1346 | ```
1347 |
1348 | ----
1349 |
1350 | ```{r}
1351 | cl_uk <- cluster_louvain(as.undirected(UKfaculty))
1352 | cl_gr <- contract(UKfaculty, mapping = cl_uk$membership)
1353 | E(cl_gr)$weight <- count_multiple(cl_gr)
1354 | cl_grs <- simplify(cl_gr)
1355 | E(cl_grs)$weight
1356 | ```
1357 |
1358 | ----
1359 |
1360 | ```{r}
1361 | par(mar = c(0,0,0,0)); plot(cl_grs, edge.width=E(cl_grs)$weight / 200,
1362 | edge.curved = .2, vertex.size = sizes(cl_uk) * 2)
1363 | ```
1364 |
1365 | ----
1366 |
1367 | ```{r}
1368 | subs <- lapply(groups(cl_uk), induced_subgraph, graph = UKfaculty)
1369 | summary(subs[[1]])
1370 | ```
1371 |
1372 | ----
1373 |
1374 | ```{r}
1375 | par(mar=c(0,0,0,0)); plot(subs[[1]])
1376 | ```
1377 |
1378 | ## Exercise
1379 |
1380 | A minimum spanning tree is a graph without cycle, that has the minimal
1381 | weight sum among all spanning trees of the graph.
1382 |
1383 | Try to visualize the airport network using the minimal spanning tree.
1384 | `mst()` calculates the (or a) minimum spanning tree. Hint: what will
1385 | you use as weight? Do you really want a minimum spanning tree, or a
1386 | maximum spanning tree?
1387 |
1388 | ## Exporting and importing graphs
1389 |
1390 | `read_graph()` and `write_graph()`.
1391 |
1392 | Imports: edge list, Pajek, GraphML, GML, DL, ...
1393 |
1394 | Exports: edge list, Pajek, GraphML, GML, DOT, Leda, ...
1395 |
1396 | Helpful packages: `rgexf`, `intergraph`, `DiagrammeR`, `networkD3`.
1397 |
1398 | ## The `networkD3` package
1399 |
1400 | ```{r}
1401 | library(networkD3)
1402 | d3_net <- simpleNetwork(as_data_frame(karate, what = "edges")[, 1:3])
1403 | d3_net
1404 | ```
1405 |
1406 | ## The `DiagrammeR` package
1407 |
1408 | ```{r}
1409 | library(DiagrammeR)
1410 | ```
1411 |
1412 | ----
1413 |
1414 | ```{r}
1415 | df_kar <- as_data_frame(karate, what = "both")
1416 | df_kar$vertices <- cbind(node = rownames(df_kar$vertices),
1417 | df_kar$vertices)
1418 | dg <- create_graph(
1419 | nodes_df = df_kar$vertices,
1420 | edges_df = df_kar$edges
1421 | )
1422 | render_graph(dg, width = 800, height = 600)
1423 | ```
1424 |
1425 | ## How to export to Gephi
1426 |
1427 | ```{r}
1428 | library(rgexf)
1429 | df_fac <- as_data_frame(UKfaculty, what = "both")
1430 | df_fac$vertices <- cbind(seq_len(gorder(UKfaculty)), df_fac$vertices)
1431 | output <- "images/UKfaculty.gexf"
1432 | write.gexf(nodes = df_fac$vertices, edges = df_fac$edges[,1:2],
1433 | edgesAtt = df_fac$edges[,-(1:2), drop = FALSE],
1434 | output = output)
1435 | ```
1436 |
1437 | ## A network viz tutorial
1438 |
1439 | Highly recommended:
1440 |
1441 | https://github.com/kateto/R-Network-Visualization-Workshop
1442 |
1443 | ## Questions?
1444 |
1445 | Ask a question:
1446 |
1447 | http://igraph.org/r/#help
1448 |
1449 | Report a bug:
1450 |
1451 | http://igraph.org/r/#contribute
1452 |
--------------------------------------------------------------------------------
/user-2015.md:
--------------------------------------------------------------------------------
1 | # Statistical Analysis of Network Data
2 | Gábor Csárdi
3 | `r Sys.Date()`
4 |
5 |
6 |
7 | ## How to follow this tutorial
8 |
9 | Go to https://github.com/igraph/netuser15
10 |
11 | You will need at least `igraph` version `1.0.0` and `igraphdata` version
12 | `1.0.0`. You will also need the `DiagrammeR` package. To install them
13 | from within R, type:
14 |
15 | ```r
16 | install.packages("igraph")
17 | install.packages("igraphdata")
18 | install.packages("DiagrammeR")
19 | ```
20 |
21 |
22 |
23 | ## Outline
24 |
25 | * Introduction
26 | * Manipulate network data
27 | * Questions
28 |
29 | ### BREAK
30 |
31 | * Classic graph theory: paths
32 | * Social network concepts: centrality, groups
33 | * Visualization
34 | * Questions
35 |
36 | ## Why networks?
37 |
38 | Sometimes connections are important, even more important than
39 | (the properties of) the things they connect.
40 |
41 | ## Example 1: Königsberg Bridges
42 |
43 | 
44 |
45 | -- Bogdan Giuşcă, CC BY-SA 3.0, Wikipedia
46 |
47 | ## Example 2: Page Rank
48 |
49 |
50 |
51 | http://computationalculture.net/article/what_is_in_pagerank
52 |
53 | ## Example 3: Matching Twitter to Facebook
54 |
55 | 
56 |
57 | http://morganlinton.com/wp-content/uploads/2013/12/twitter-facebook-branding2.png
58 |
59 | ## Example 4: Detection of groups
60 |
61 | 
62 |
63 | https://en.wikipedia.org/wiki/Community_structure#/
64 | media/File:Network_Community_Structure.svg
65 |
66 |
67 |
68 |
69 |
70 | ## About igraph
71 |
72 | * Network analysis library, written mostly in C/C++.
73 | * Interface to R and Python
74 | * https://github.com/igraph
75 | * http://igraph.org
76 | * Mailing list, stack overflow help.
77 | * Open GitHub issues for bugs
78 |
79 | # Creating and manipulating networks in R/igraph.
80 |
81 | ## What is a network or graph?
82 |
83 | 
84 |
85 | ## More formally:
86 |
87 | * `V`: set of vertices
88 | * `E`: subset of ordered or unordered pairs of vertices. Multiset, really.
89 |
90 | ## Creating toy networks with `make_graph`
91 |
92 |
93 | ```r
94 | library(igraph)
95 | ```
96 |
97 |
98 | ```r
99 | toy1 <- make_graph(~ A - B, B - C - D, D - E:F:A, A:B - G:H)
100 | toy1
101 | ```
102 |
103 | ```
104 | #> IGRAPH UN-- 8 10 --
105 | #> + attr: name (v/c)
106 | #> + edges (vertex names):
107 | #> [1] A--B A--D A--G A--H B--C B--G B--H C--D D--E D--F
108 | ```
109 |
110 | ----
111 |
112 |
113 | ```r
114 | par(mar = c(0,0,0,0)); plot(toy1)
115 | ```
116 |
117 | 
118 |
119 | ----
120 |
121 |
122 | ```r
123 | toy2 <- make_graph(~ A -+ B, B -+ C -+ D +- A:B)
124 | toy2
125 | ```
126 |
127 | ```
128 | #> IGRAPH DN-- 4 5 --
129 | #> + attr: name (v/c)
130 | #> + edges (vertex names):
131 | #> [1] A->B A->D B->C B->D C->D
132 | ```
133 |
134 | ----
135 |
136 |
137 | ```r
138 | par(mar = c(0,0,0,0)); plot(toy2)
139 | ```
140 |
141 | 
142 |
143 | ## Printout of a graph
144 |
145 |
146 | ```r
147 | toy2
148 | ```
149 |
150 | ```
151 | #> IGRAPH DN-- 4 5 --
152 | #> + attr: name (v/c)
153 | #> + edges (vertex names):
154 | #> [1] A->B A->D B->C B->D C->D
155 | ```
156 |
157 | `IGRAPH` means this is a graph object. Next, comes a four letter
158 | code:
159 |
160 | * `U` or `D` for undirected or directed
161 | * `N` if the graph is named, always use named graphs for real data sets.
162 | * `W` if the graph is weighted (has a `weight` edge attribute).
163 | * `B` if the graph is bipartite (has a `type` vertex attribute).
164 |
165 | ## Attributes
166 |
167 |
168 | ```r
169 | make_ring(5)
170 | ```
171 |
172 | ```
173 | #> IGRAPH U--- 5 5 -- Ring graph
174 | #> + attr: name (g/c), mutual (g/l), circular (g/l)
175 | #> + edges:
176 | #> [1] 1--2 2--3 3--4 4--5 1--5
177 | ```
178 |
179 | * Some graphs have a name (`name` graph attribute), that comes after
180 | the two dashes.
181 | * Then the various attributes are listed. Attributes
182 | are metadata that is attached to the vertices, edges, or the graph
183 | itself.
184 | * `(v/c)` means that `name` is a vertex attribute, and it is
185 | character.
186 | * `(e/.)` means an edge attribute, `(g/.)` means a graph attribute
187 |
188 | -----
189 |
190 |
191 | ```r
192 | make_ring(5)
193 | ```
194 |
195 | ```
196 | #> IGRAPH U--- 5 5 -- Ring graph
197 | #> + attr: name (g/c), mutual (g/l), circular (g/l)
198 | #> + edges:
199 | #> [1] 1--2 2--3 3--4 4--5 1--5
200 | ```
201 | * Attribute types: `c` for character, `n` for numeric, `l` for
202 | logical and `x` (complex) for anything else.
203 | * igraph treats some attributes specially. Always start your non-special
204 | attributes with an uppercase letter.
205 |
206 | ## Real network data
207 |
208 | ## Adjacency matrices
209 |
210 |
211 | ```r
212 | A <- matrix(sample(0:1, 100, replace = TRUE), nrow = 10)
213 | A
214 | ```
215 |
216 | ```
217 | #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
218 | #> [1,] 1 0 1 1 0 0 1 0 1 1
219 | #> [2,] 1 1 0 1 0 0 1 0 0 0
220 | #> [3,] 0 1 1 0 0 0 1 0 0 0
221 | #> [4,] 1 0 1 1 1 1 1 0 1 1
222 | #> [5,] 1 0 0 0 0 0 1 0 1 1
223 | #> [6,] 1 1 1 1 1 1 0 1 1 1
224 | #> [7,] 1 1 0 0 1 1 0 0 0 0
225 | #> [8,] 0 0 1 0 1 0 1 0 0 1
226 | #> [9,] 1 0 0 1 1 0 1 1 0 1
227 | #> [10,] 1 1 1 1 1 1 0 0 0 1
228 | ```
229 |
230 | -----
231 |
232 |
233 | ```r
234 | graph_from_adjacency_matrix(A)
235 | ```
236 |
237 | ```
238 | #> IGRAPH D--- 10 55 --
239 | #> + edges:
240 | #> [1] 1-> 1 1-> 3 1-> 4 1-> 7 1-> 9 1->10 2-> 1 2-> 2 2-> 4
241 | #> [10] 2-> 7 3-> 2 3-> 3 3-> 7 4-> 1 4-> 3 4-> 4 4-> 5 4-> 6
242 | #> [19] 4-> 7 4-> 9 4->10 5-> 1 5-> 7 5-> 9 5->10 6-> 1 6-> 2
243 | #> [28] 6-> 3 6-> 4 6-> 5 6-> 6 6-> 8 6-> 9 6->10 7-> 1 7-> 2
244 | #> [37] 7-> 5 7-> 6 8-> 3 8-> 5 8-> 7 8->10 9-> 1 9-> 4 9-> 5
245 | #> [46] 9-> 7 9-> 8 9->10 10-> 1 10-> 2 10-> 3 10-> 4 10-> 5 10-> 6
246 | #> [55] 10->10
247 | ```
248 |
249 | ## List of edges
250 |
251 |
252 | ```r
253 | L <- matrix(sample(1:10, 20, replace = TRUE), ncol = 2)
254 | L
255 | ```
256 |
257 | ```
258 | #> [,1] [,2]
259 | #> [1,] 7 7
260 | #> [2,] 3 9
261 | #> [3,] 3 8
262 | #> [4,] 4 5
263 | #> [5,] 10 6
264 | #> [6,] 10 6
265 | #> [7,] 8 1
266 | #> [8,] 8 4
267 | #> [9,] 6 7
268 | #> [10,] 1 9
269 | ```
270 |
271 | -----
272 |
273 |
274 | ```r
275 | graph_from_edgelist(L)
276 | ```
277 |
278 | ```
279 | #> IGRAPH D--- 10 10 --
280 | #> + edges:
281 | #> [1] 7->7 3->9 3->8 4->5 10->6 10->6 8->1 8->4 6->7 1->9
282 | ```
283 |
284 | ## Two tables, one for vertices, one for edges
285 |
286 |
287 | ```r
288 | edges <- data.frame(
289 | stringsAsFactors = FALSE,
290 | from = c("BOS", "JFK", "LAX"),
291 | to = c("JFK", "LAX", "JFK"),
292 | Carrier = c("United", "Jetblue", "Virgin America"),
293 | Departures = c(30, 60, 121)
294 | )
295 | vertices <- data.frame(
296 | stringsAsFactors = FALSE,
297 | name = c("BOS", "JFK", "LAX"),
298 | City = c("Boston, MA", "New York City, NY",
299 | "Los Angeles, CA")
300 | )
301 | ```
302 |
303 | -----
304 |
305 |
306 | ```r
307 | edges
308 | ```
309 |
310 | ```
311 | #> from to Carrier Departures
312 | #> 1 BOS JFK United 30
313 | #> 2 JFK LAX Jetblue 60
314 | #> 3 LAX JFK Virgin America 121
315 | ```
316 |
317 | -----
318 |
319 |
320 | ```r
321 | vertices
322 | ```
323 |
324 | ```
325 | #> name City
326 | #> 1 BOS Boston, MA
327 | #> 2 JFK New York City, NY
328 | #> 3 LAX Los Angeles, CA
329 | ```
330 |
331 | -----
332 |
333 |
334 | ```r
335 | toy_air <- graph_from_data_frame(edges, vertices = vertices)
336 | toy_air
337 | ```
338 |
339 | ```
340 | #> IGRAPH DN-- 3 3 --
341 | #> + attr: name (v/c), City (v/c), Carrier (e/c), Departures (e/n)
342 | #> + edges (vertex names):
343 | #> [1] BOS->JFK JFK->LAX LAX->JFK
344 | ```
345 |
346 | ----
347 |
348 | The real US airports data set is in the `igraphdata` package:
349 |
350 |
351 | ```r
352 | library(igraphdata)
353 | data(USairports)
354 | USairports
355 | ```
356 |
357 | ```
358 | #> IGRAPH DN-- 755 23473 -- US airports
359 | #> + attr: name (g/c), name (v/c), City (v/c), Position (v/c),
360 | #> | Carrier (e/c), Departures (e/n), Seats (e/n), Passengers
361 | #> | (e/n), Aircraft (e/n), Distance (e/n)
362 | #> + edges (vertex names):
363 | #> [1] BGR->JFK BGR->JFK BOS->EWR ANC->JFK JFK->ANC LAS->LAX MIA->JFK
364 | #> [8] EWR->ANC BJC->MIA MIA->BJC TEB->ANC JFK->LAX LAX->JFK LAX->SFO
365 | #> [15] AEX->LAS BFI->SBA ELM->PIT GEG->SUN ICT->PBI LAS->LAX LAS->PBI
366 | #> [22] LAS->SFO LAX->LAS PBI->AEX PBI->ICT PIT->VCT SFO->LAX VCT->DWH
367 | #> [29] IAD->JFK ABE->CLT ABE->HPN AGS->CLT AGS->CLT AVL->CLT AVL->CLT
368 | #> [36] AVP->CLT AVP->PHL BDL->CLT BHM->CLT BHM->CLT BNA->CLT BNA->CLT
369 | #> + ... omitted several edges
370 | ```
371 |
372 | ----
373 |
374 | Converting it back to tables
375 |
376 |
377 | ```r
378 | as_data_frame(toy_air, what = "edges")
379 | ```
380 |
381 | ```
382 | #> from to Carrier Departures
383 | #> 1 BOS JFK United 30
384 | #> 2 JFK LAX Jetblue 60
385 | #> 3 LAX JFK Virgin America 121
386 | ```
387 |
388 | -----
389 |
390 |
391 | ```r
392 | as_data_frame(toy_air, what = "vertices")
393 | ```
394 |
395 | ```
396 | #> name City
397 | #> BOS BOS Boston, MA
398 | #> JFK JFK New York City, NY
399 | #> LAX LAX Los Angeles, CA
400 | ```
401 |
402 | -----
403 |
404 | Long data frames
405 |
406 |
407 | ```r
408 | as_long_data_frame(toy_air)
409 | ```
410 |
411 | ```
412 | #> from to Carrier Departures from_name from_City to_name
413 | #> 1 1 2 United 30 BOS Boston, MA JFK
414 | #> 2 2 3 Jetblue 60 JFK New York City, NY LAX
415 | #> 3 3 2 Virgin America 121 LAX Los Angeles, CA JFK
416 | #> to_City
417 | #> 1 New York City, NY
418 | #> 2 Los Angeles, CA
419 | #> 3 New York City, NY
420 | ```
421 |
422 | -----
423 |
424 | Quickly look at the metadata, without conversion:
425 |
426 |
427 | ```r
428 | V(USairports)[[1:5]]
429 | ```
430 |
431 | ```
432 | #> + 5/755 vertices, named:
433 | #> name City Position
434 | #> 1 BGR Bangor, ME N444827 W0684941
435 | #> 2 BOS Boston, MA N422152 W0710019
436 | #> 3 ANC Anchorage, AK N611028 W1495947
437 | #> 4 JFK New York, NY N403823 W0734644
438 | #> 5 LAS Las Vegas, NV N360449 W1150908
439 | ```
440 |
441 | ----
442 |
443 |
444 | ```r
445 | E(USairports)[[1:5]]
446 | ```
447 |
448 | ```
449 | #> + 5/23473 edges (vertex names):
450 | #> tail head tid hid Carrier Departures Seats Passengers
451 | #> 1 JFK BGR 4 1 British Airways Plc 1 226 193
452 | #> 2 JFK BGR 4 1 British Airways Plc 1 299 253
453 | #> 3 EWR BOS 7 2 British Airways Plc 1 216 141
454 | #> 4 JFK ANC 4 3 China Airlines Ltd. 13 5161 3135
455 | #> 5 ANC JFK 3 4 China Airlines Ltd. 13 5161 4097
456 | #> Aircraft Distance
457 | #> 1 627 382
458 | #> 2 819 382
459 | #> 3 627 200
460 | #> 4 819 3386
461 | #> 5 819 3386
462 | ```
463 |
464 | ## Weighted graphs
465 |
466 | Numbers (usually real) assigned to edges. E.g. number of departures,
467 | or number of passengers.
468 |
469 | 
470 |
471 | http://web.cecs.pdx.edu/~sheard/course/Cs163/Doc/Graphs.html
472 |
473 | ## Multigraphs
474 |
475 | They have multiple (directed) edges between the
476 | same pair of vertices. A graph that has no multiple edges
477 | and no loop edges is a simple graph.
478 |
479 | 
480 |
481 | https://en.wikipedia.org/wiki/Multigraph
482 |
483 | Multi-graphs are nasty. Always check if your graph is a multi-graph.
484 |
485 | -----
486 |
487 |
488 | ```r
489 | is_simple(USairports)
490 | ```
491 |
492 | ```
493 | #> [1] FALSE
494 | ```
495 |
496 | ```r
497 | sum(which_multiple(USairports))
498 | ```
499 |
500 | ```
501 | #> [1] 15208
502 | ```
503 |
504 | ```r
505 | sum(which_loop(USairports))
506 | ```
507 |
508 | ```
509 | #> [1] 53
510 | ```
511 |
512 | -----
513 |
514 | `simplify()` creates a simple graph from a multigraph, in a flexible
515 | way: you can specify what it should do with the edge attributes.
516 |
517 |
518 | ```r
519 | air <- simplify(USairports, edge.attr.comb =
520 | list(Departures = "sum", Seats = "sum", Passengers = "sum", "ignore"))
521 | is_simple(air)
522 | ```
523 |
524 | ```
525 | #> [1] TRUE
526 | ```
527 |
528 | ```r
529 | summary(air)
530 | ```
531 |
532 | ```
533 | #> IGRAPH DN-- 755 8228 -- US airports
534 | #> + attr: name (g/c), name (v/c), City (v/c), Position (v/c),
535 | #> | Departures (e/n), Seats (e/n), Passengers (e/n)
536 | ```
537 |
538 | ## Querying and manipulating networks: the `[` and `[[` operators
539 |
540 | The `[` operator treats the graph as an adjacency matrix.
541 |
542 | ```
543 | BOS JFK ANC EWR . . .
544 | BOS . 1 . 1
545 | JFK 1 . 1 .
546 | ANC . 1 . .
547 | EWR 1 . 1 .
548 | . . .
549 | ```
550 | -----
551 |
552 | The `[[` operator treats the graph as an adjacency list.
553 |
554 |
555 | ```r
556 | BOS: JFK, LAX, EWR, MKE, PVD
557 | JFK: BGR, BOS, SFO, BNA, BUF, SRQ, RIC RDU, MSP
558 | LAX: DTW, MSY, LAS, FLL, STL,
559 | . . .
560 | ```
561 |
562 | ## Queries
563 |
564 | Does an edge exist?
565 |
566 |
567 | ```r
568 | air["BOS", "JFK"]
569 | ```
570 |
571 | ```
572 | #> [1] 1
573 | ```
574 |
575 | ```r
576 | air["BOS", "ANC"]
577 | ```
578 |
579 | ```
580 | #> [1] 0
581 | ```
582 |
583 | -----
584 |
585 | Convert the graph to an adjacency matrix, or just a part of it:
586 |
587 |
588 | ```r
589 | air[c("BOS", "JFK", "ANC"), c("BOS", "JFK", "ANC")]
590 | ```
591 |
592 | ```
593 | #> 3 x 3 sparse Matrix of class "dgCMatrix"
594 | #> BOS JFK ANC
595 | #> BOS . 1 .
596 | #> JFK 1 . 1
597 | #> ANC . 1 .
598 | ```
599 |
600 | For weighted graphs, query the edge weight:
601 |
602 |
603 | ```r
604 | E(air)$weight <- E(air)$Passengers
605 | air["BOS", "JFK"]
606 | ```
607 |
608 | ```
609 | #> [1] 31426
610 | ```
611 |
612 | ----
613 |
614 | All adjacent vertices of a vertex:
615 |
616 |
617 | ```r
618 | air[["BOS"]]
619 | ```
620 |
621 | ```
622 | #> $BOS
623 | #> + 79/755 vertices, named:
624 | #> [1] BGR JFK LAS MIA EWR LAX PBI PIT SFO IAD BDL BUF BWI CAK CLE CLT CMH
625 | #> [18] CVG DCA DTW GSO IND LGA MDT MKE MSP MSY MYR ORF PHF PHL RDU RIC SRQ
626 | #> [35] STL SYR ALB PVD ROC SCE FLL MCO TPA BHB IAH ORD PBG PQI MCI ATL AUS
627 | #> [52] DEN DFW MDW PDX PHX RSW SAN SEA SLC ACY JAX MEM SJU STT SJC LGB FRG
628 | #> [69] IAG ACK LEB MVY PVC BMG AUG HYA RKD RUT SLK
629 | ```
630 |
631 | ----
632 |
633 |
634 | ```r
635 | air[[, "BOS"]]
636 | ```
637 |
638 | ```
639 | #> $BOS
640 | #> + 79/755 vertices, named:
641 | #> [1] BGR JFK LAS MIA EWR LAX PBI PIT SFO IAD BDL BUF BWI CAK CLE CLT CMH
642 | #> [18] CVG DCA DTW IND LGA MDT MKE MSP MSY MYR PHF PHL RDU RIC SRQ STL SYR
643 | #> [35] XNA ALB MHT PVD ROC SCE FLL MCO TPA BHB IAH ORD PBG PQI MCI ATL AUS
644 | #> [52] DEN DFW MDW PDX PHX RSW SAN SEA SLC ACY JAX MEM SJU STT SJC LGB FRG
645 | #> [69] PTK PGD ACK LEB MVY PVC AUG HYA RKD RUT SLK
646 | ```
647 |
648 | ## Manipulation
649 |
650 | Add an edge (and potentially set its weight):
651 |
652 | ```r
653 | air["BOS", "ANC"] <- TRUE
654 | air["BOS", "ANC"]
655 | ```
656 |
657 | ```
658 | #> [1] 1
659 | ```
660 |
661 | Remove an edge:
662 |
663 | ```r
664 | air["BOS", "ANC"] <- FALSE
665 | air["BOS", "ANC"]
666 | ```
667 |
668 | ```
669 | #> [1] 0
670 | ```
671 |
672 | ----
673 |
674 | Note that you can use all allowed indexing modes, e.g.
675 |
676 | ```r
677 | g <- make_empty_graph(10)
678 | g[-1, 1] <- TRUE
679 | g
680 | ```
681 |
682 | ```
683 | #> IGRAPH D--- 10 9 --
684 | #> + edges:
685 | #> [1] 2->1 3->1 4->1 5->1 6->1 7->1 8->1 9->1 10->1
686 | ```
687 | creates a star graph.
688 |
689 | ----
690 |
691 | Add vertices to a graph:
692 |
693 |
694 | ```r
695 | g <- make_ring(10) + 2
696 | par(mar = c(0,0,0,0)); plot(g)
697 | ```
698 |
699 | 
700 |
701 | ----
702 |
703 | Add vertices with attributes:
704 |
705 |
706 | ```r
707 | g <- make_(ring(10), with_vertex_(color = "grey")) +
708 | vertices(2, color = "red")
709 | par(mar = c(0,0,0,0)); plot(g)
710 | ```
711 |
712 | 
713 |
714 | ----
715 |
716 | Add an edge
717 |
718 |
719 | ```r
720 | g <- make_(star(10), with_edge_(color = "grey")) +
721 | edge(5, 6, color = "red")
722 | par(mar = c(0,0,0,0)); plot(g)
723 | ```
724 |
725 | 
726 |
727 | ----
728 |
729 | Add a chain of edges
730 |
731 |
732 | ```r
733 | g <- make_(empty_graph(5)) + path(1,2,3,4,5,1)
734 | g2 <- make_(empty_graph(5)) + path(1:5, 1)
735 | g
736 | ```
737 |
738 | ```
739 | #> IGRAPH D--- 5 5 --
740 | #> + edges:
741 | #> [1] 1->2 2->3 3->4 4->5 5->1
742 | ```
743 |
744 | ```r
745 | g2
746 | ```
747 |
748 | ```
749 | #> IGRAPH D--- 5 5 --
750 | #> + edges:
751 | #> [1] 1->2 2->3 3->4 4->5 5->1
752 | ```
753 |
754 | ## Exercise
755 |
756 | Create the wheel graph.
757 |
758 | 
759 |
760 | ## (A) solution
761 |
762 |
763 | ```r
764 | make_star(11, center = 11, mode = "undirected") + path(1:10, 1)
765 | ```
766 |
767 | ```
768 | #> IGRAPH U--- 11 20 -- Star
769 | #> + attr: name (g/c), mode (g/c), center (g/n)
770 | #> + edges:
771 | #> [1] 1--11 2--11 3--11 4--11 5--11 6--11 7--11 8--11 9--11
772 | #> [10] 10--11 1-- 2 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 7-- 8 8-- 9
773 | #> [19] 9--10 1--10
774 | ```
775 |
776 | ## Vertex sequences
777 |
778 | They are the key objects to manipulate graphs. Vertex sequences
779 | can be created in various ways. Most frequently used ones:
780 |
781 | |expression |result |
782 | |:--------------------------|:---------------------------------|
783 | |`V(air)` |All vertices. |
784 | |`V(air)[1,2:5]` |Vertices in these positions |
785 | |`V(air)[degree(air) < 2]` |Vertices satisfying condition |
786 | |`V(air)[nei('BOS')]` |Neighbors of a vertex |
787 | |`V(air)['BOS', 'JFK']` |Select given vertices |
788 |
789 | ## Edge sequences
790 |
791 | The same for edges:
792 |
793 | |expresssion |result |
794 | |:--------------------------|:--------------------------------------------|
795 | |`E(air)` |All edges. |
796 | |`E(air)[FL %--% CA]` |Edges between two vertex sets |
797 | |`E(air)[FL %->% CA]` |Edges between two vertex sets, directionally |
798 | |`E(air, path = P)` |Edges along a path |
799 | |`E(air)[to('BOS')]` |Incoming edges of a vertex |
800 | |`E(air)[from('BOS')]` |Outgoing edges of a vertex |
801 |
802 | ## Manipulate attributes via vertex and edge sequences
803 |
804 |
805 | ```r
806 | FL <- V(air)[grepl("FL$", City)]
807 | CA <- V(air)[grepl("CA$", City)]
808 |
809 | V(air)$color <- "grey"
810 | V(air)[FL]$color <- "blue"
811 | V(air)[CA]$color <- "blue"
812 | ```
813 |
814 | ----
815 |
816 |
817 | ```r
818 | E(air)[FL %--% CA]
819 | ```
820 |
821 | ```
822 | #> + 21/8228 edges (vertex names):
823 | #> [1] MIA->LAX MIA->SFO MIA->SJC LAX->MIA LAX->FLL LAX->MCO LAX->TPA
824 | #> [8] SFO->MIA SFO->FLL SFO->MCO FLL->LAX FLL->SFO FLL->LGB MCO->LAX
825 | #> [15] MCO->SFO TPA->LAX SMF->MIA JAX->OAK OAK->JAX LGB->FLL VNY->ORL
826 | ```
827 |
828 | ```r
829 | E(air)$color <- "grey"
830 | E(air)[FL %--% CA]$color <- "red"
831 | ```
832 |
833 | ## Quick look at metadata
834 |
835 |
836 | ```r
837 | V(air)[[1:5]]
838 | ```
839 |
840 | ```
841 | #> + 5/755 vertices, named:
842 | #> name City Position color
843 | #> 1 BGR Bangor, ME N444827 W0684941 grey
844 | #> 2 BOS Boston, MA N422152 W0710019 grey
845 | #> 3 ANC Anchorage, AK N611028 W1495947 grey
846 | #> 4 JFK New York, NY N403823 W0734644 grey
847 | #> 5 LAS Las Vegas, NV N360449 W1150908 grey
848 | ```
849 |
850 | ----
851 |
852 |
853 | ```r
854 | E(air)[[1:5]]
855 | ```
856 |
857 | ```
858 | #> + 5/8228 edges (vertex names):
859 | #> tail head tid hid Departures Seats Passengers weight color
860 | #> 1 BOS BGR 2 1 1 34 6 6 grey
861 | #> 2 JFK BGR 4 1 2 525 446 446 grey
862 | #> 3 MIA BGR 6 1 1 12 4 4 grey
863 | #> 4 EWR BGR 7 1 4 758 680 680 grey
864 | #> 5 DCA BGR 43 1 4 200 116 116 grey
865 | ```
866 |
867 | # BREAK
868 |
869 | ## Paths
870 |
871 |
872 | 
873 |
874 | ## Paths
875 |
876 | 
877 |
878 | ## Define a path in igraph
879 |
880 |
881 | ```r
882 | set.seed(42)
883 | g <- sample_gnp(12, 0.25)
884 |
885 | pa <- V(g)[11, 2, 12, 8]
886 |
887 | V(g)[pa]$color <- 'green'
888 | E(g)$color <- 'grey'
889 | E(g, path = pa)$color <- 'red'
890 | E(g, path = pa)$width <- 3
891 | ```
892 |
893 | ----
894 |
895 |
896 | ```r
897 | par(mar=c(0,0,0,0))
898 | plot(g, margin = 0, layout = layout_nicely)
899 | ```
900 |
901 | 
902 |
903 | ## Shortest paths
904 |
905 | 
906 |
907 | ----
908 |
909 | Length of the shortest path: distance.
910 | How many planes to get from `PBI` to `BDL`?
911 |
912 |
913 | ```r
914 | air <- delete_edge_attr(air, "weight")
915 | distances(air, 'PBI', 'ANC')
916 | ```
917 |
918 | ```
919 | #> ANC
920 | #> PBI 2
921 | ```
922 |
923 | ----
924 |
925 |
926 | ```r
927 | sp <- shortest_paths(air, 'PBI', 'ANC', output = "both")
928 | sp
929 | ```
930 |
931 | ```
932 | #> $vpath
933 | #> $vpath[[1]]
934 | #> + 3/755 vertices, named:
935 | #> [1] PBI JFK ANC
936 | #>
937 | #>
938 | #> $epath
939 | #> $epath[[1]]
940 | #> + 2/8228 edges (vertex names):
941 | #> [1] PBI->JFK JFK->ANC
942 | #>
943 | #>
944 | #> $predecessors
945 | #> NULL
946 | #>
947 | #> $inbound_edges
948 | #> NULL
949 | ```
950 |
951 | ```r
952 | air[[ sp$epath[[1]] ]]
953 | ```
954 |
955 | ```
956 | #> $MSL
957 | #> + 2/755 vertices, named:
958 | #> [1] ATL DLH
959 | #>
960 | #> $OKC
961 | #> + 34/755 vertices, named:
962 | #> [1] JFK LAS EWR LAX ELM PIT IAD BWI CLE CLT CMH DTW MSP SDF STL IAH ORD
963 | #> [18] MCI ABQ ATL DEN DFW HOU MDW PHX SAT SLC SMF TUS MEM GJT DAL NYL LUK
964 | ```
965 |
966 | ----
967 |
968 |
969 | ```r
970 | all_shortest_paths(air, 'PBI', 'ANC')$res
971 | ```
972 |
973 | ```
974 | #> [[1]]
975 | #> + 3/755 vertices, named:
976 | #> [1] PBI ORD ANC
977 | #>
978 | #> [[2]]
979 | #> + 3/755 vertices, named:
980 | #> [1] PBI EWR ANC
981 | #>
982 | #> [[3]]
983 | #> + 3/755 vertices, named:
984 | #> [1] PBI JFK ANC
985 | ```
986 |
987 | ## Weighted paths
988 |
989 |
990 | ```r
991 | wair <- simplify(USairports, edge.attr.comb =
992 | list(Departures = "sum", Seats = "sum", Passangers = "sum",
993 | Distance = "first", "ignore"))
994 | E(wair)$weight <- E(wair)$Distance
995 | ```
996 |
997 | ## Weighted (shortest) paths
998 |
999 |
1000 | ```r
1001 | distances(wair, c('BOS', 'JFK', 'PBI', 'AZO'),
1002 | c('BOS', 'JFK', 'PBI', 'AZO'))
1003 | ```
1004 |
1005 | ```
1006 | #> BOS JFK PBI AZO
1007 | #> BOS 0 187 1197 745
1008 | #> JFK 187 0 1028 621
1009 | #> PBI 1197 1028 0 1116
1010 | #> AZO 745 621 1116 0
1011 | ```
1012 |
1013 | ----
1014 |
1015 |
1016 | ```r
1017 | shortest_paths(wair, from = 'BOS', to = 'AZO')$vpath
1018 | ```
1019 |
1020 | ```
1021 | #> [[1]]
1022 | #> + 3/755 vertices, named:
1023 | #> [1] BOS DTW AZO
1024 | ```
1025 |
1026 | ```r
1027 | all_shortest_paths(wair, from = 'BOS', to = 'AZO')$res
1028 | ```
1029 |
1030 | ```
1031 | #> [[1]]
1032 | #> + 3/755 vertices, named:
1033 | #> [1] BOS DTW AZO
1034 | ```
1035 |
1036 | ## Mean path length
1037 |
1038 |
1039 | ```r
1040 | mean_distance(air)
1041 | ```
1042 |
1043 | ```
1044 | #> [1] 3.52743
1045 | ```
1046 |
1047 | ```r
1048 | air_dist_hist <- distance_table(air)
1049 | air_dist_hist
1050 | ```
1051 |
1052 | ```
1053 | #> $res
1054 | #> [1] 8228 94912 166335 163830 86263 15328 2793 291 27
1055 | #>
1056 | #> $unconnected
1057 | #> [1] 31263
1058 | ```
1059 |
1060 | ----
1061 |
1062 |
1063 | ```r
1064 | barplot(air_dist_hist$res, names.arg = seq_along(air_dist_hist$res))
1065 | ```
1066 |
1067 | 
1068 |
1069 | ## Components
1070 |
1071 |
1072 |
1073 | David Eppstein, public domain
1074 |
1075 | ## Strongly connected components
1076 |
1077 |
1078 |
1079 | http://www.greatandlittle.com/studios/
1080 |
1081 | ----
1082 |
1083 |
1084 | ```r
1085 | co <- components(air, mode = "weak")
1086 | co$csize
1087 | ```
1088 |
1089 | ```
1090 | #> [1] 745 2 2 3 2 1
1091 | ```
1092 |
1093 | ```r
1094 | groups(co)[[2]]
1095 | ```
1096 |
1097 | ```
1098 | #> [1] "GKN" "MXY"
1099 | ```
1100 |
1101 | ----
1102 |
1103 |
1104 | ```r
1105 | co <- components(air, mode = "strong")
1106 | co$csize
1107 | ```
1108 |
1109 | ```
1110 | #> [1] 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 2 1
1111 | #> [18] 1 1 1 1 1 1 1 723 1 1 1 1 1
1112 | ```
1113 |
1114 | ## Bow-tie structure of a directed graph
1115 |
1116 |
1117 |
1118 | http://webdatacommons.org/hyperlinkgraph/2012-08/topology.html
1119 |
1120 | ## Exercise
1121 |
1122 | 1. Extract the large (strongly) connected component from the
1123 | airport graph, as a separate graph.
1124 | Hint: `components()`, `induced_subgraph()`.
1125 | How many airports are not in this component?
1126 |
1127 | 1. In the large connected component, which airport is better
1128 | connected, `LAX` or `BOS`? I.e. what is the mean number of
1129 | plane changes that are required if traveling to a uniformly
1130 | randomly picked airport?
1131 |
1132 | 1. Which airport is the best connected one? Which one is the
1133 | worst (within the strongly connected component)?
1134 |
1135 | ## Solution
1136 |
1137 |
1138 | ```r
1139 | largest_component <- function(graph) {
1140 | comps <- components(graph, mode = "strong")
1141 | gr <- groups(comps)
1142 | sizes <- vapply(gr, length, 1L)
1143 | induced_subgraph(graph, gr[[ which.max(sizes) ]])
1144 | }
1145 | sc_air <- largest_component(air)
1146 | ```
1147 |
1148 | ----
1149 |
1150 |
1151 | ```r
1152 | table(distances(sc_air, "BOS"))
1153 | ```
1154 |
1155 | ```
1156 | #>
1157 | #> 0 1 2 3 4 5
1158 | #> 1 83 355 135 147 2
1159 | ```
1160 |
1161 | ```r
1162 | table(distances(sc_air, "LAX"))
1163 | ```
1164 |
1165 | ```
1166 | #>
1167 | #> 0 1 2 3 4 5
1168 | #> 1 109 394 195 22 2
1169 | ```
1170 |
1171 | ----
1172 |
1173 |
1174 | ```r
1175 | mean(as.vector(distances(sc_air, "BOS")))
1176 | ```
1177 |
1178 | ```
1179 | #> [1] 2.484094
1180 | ```
1181 |
1182 | ```r
1183 | mean(as.vector(distances(sc_air, "LAX")))
1184 | ```
1185 |
1186 | ```
1187 | #> [1] 2.185339
1188 | ```
1189 |
1190 | ----
1191 |
1192 |
1193 | ```r
1194 | D <- distances(sc_air)
1195 | sort(rowMeans(D))[1:10]
1196 | ```
1197 |
1198 | ```
1199 | #> ORD MSP SEA DTW LAX PHX EWR ANC
1200 | #> 2.117566 2.146611 2.149378 2.170124 2.185339 2.218534 2.224066 2.230982
1201 | #> SLC JFK
1202 | #> 2.235131 2.275242
1203 | ```
1204 |
1205 | ----
1206 |
1207 |
1208 | ```r
1209 | sort(rowMeans(D), decreasing = TRUE)[1:10]
1210 | ```
1211 |
1212 | ```
1213 | #> DQR SDX BLD TIQ TCL CPX AFK WHD
1214 | #> 6.147994 6.147994 5.150761 5.135546 4.889350 4.872752 4.820194 4.799447
1215 | #> ZXH DOF
1216 | #> 4.799447 4.798064
1217 | ```
1218 |
1219 | ----
1220 |
1221 |
1222 | ```r
1223 | V(sc_air)[[names(sort(rowMeans(D), decreasing = TRUE)[1:10])]]
1224 | ```
1225 |
1226 | ```
1227 | #> + 10/723 vertices, named:
1228 | #> name City Position color
1229 | #> 567 DQR Peach Springs, AZ N355919 W1134836 grey
1230 | #> 570 SDX Sedona, AZ N345055 W1114718 grey
1231 | #> 566 BLD Boulder City, NV N355651 W1145140 grey
1232 | #> 180 TIQ Tinian, TT N145949 E1453705 grey
1233 | #> 688 TCL Tuscaloosa, AL N331314 W0873641 grey
1234 | #> 722 CPX Culebra, PR N181848 W651816 grey
1235 | #> 670 AFK Nebraska, NE N403620 W955204 grey
1236 | #> 418 WHD Hyder, AK N555412 W1300024 grey
1237 | #> 420 ZXH Chomondely Sound, AK N551421 W1320651 grey
1238 | #> 410 DOF Dora Bay, AK N551400 W1321300 grey
1239 | ```
1240 |
1241 | ## Centrality
1242 |
1243 | Finding important vertices in the network (family of concepts)
1244 |
1245 | 
1246 |
1247 | ## Centrality
1248 |
1249 | 
1250 |
1251 | ## Classic centrality measures: degree
1252 |
1253 |
1254 | ```r
1255 | V(kite)$label.cex <- 2
1256 | V(kite)$color <- V(kite)$frame.color <- "grey"
1257 | V(kite)$size <- 30
1258 | par(mar=c(0,0,0,0)) ; plot(kite)
1259 | ```
1260 |
1261 | 
1262 |
1263 | -------
1264 |
1265 |
1266 | ```r
1267 | d <- degree(kite)
1268 | par(mar = c(0,0,0,0))
1269 | plot(kite, vertex.size = 10 * d, vertex.label =
1270 | paste0(V(kite)$name, ":", d))
1271 | ```
1272 |
1273 | 
1274 |
1275 |
1276 | ## Classic centrality measures: closeness
1277 |
1278 | 1 / How many steps do you need to get there?
1279 |
1280 |
1281 | ```r
1282 | cl <- closeness(kite)
1283 | ```
1284 |
1285 | -----
1286 |
1287 |
1288 | ```r
1289 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 500 * cl)
1290 | ```
1291 |
1292 | 
1293 |
1294 | ## Classic centrality measures: betweenness
1295 |
1296 | How many shortest paths goes through me
1297 |
1298 |
1299 | ```r
1300 | btw <- betweenness(kite)
1301 | btw
1302 | ```
1303 |
1304 | ```
1305 | #> A B C D E F
1306 | #> 0.8333333 0.8333333 0.0000000 3.6666667 0.0000000 8.3333333
1307 | #> G H I J
1308 | #> 8.3333333 14.0000000 8.0000000 0.0000000
1309 | ```
1310 |
1311 | -----
1312 |
1313 |
1314 | ```r
1315 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 3 * btw)
1316 | ```
1317 |
1318 | 
1319 |
1320 | ## Eigenvector centrality
1321 |
1322 | Typically for directed. Central vertex: it is cited by central vertices.
1323 |
1324 |
1325 | ```r
1326 | ec <- eigen_centrality(kite)$vector
1327 | ec
1328 | ```
1329 |
1330 | ```
1331 | #> A B C D E F
1332 | #> 0.73221232 0.73221232 0.59422577 1.00000000 0.59422577 0.82676381
1333 | #> G H I J
1334 | #> 0.82676381 0.40717690 0.09994054 0.02320742
1335 | ```
1336 |
1337 | ```r
1338 | cor(ec, d)
1339 | ```
1340 |
1341 | ```
1342 | #> [1] 0.9542561
1343 | ```
1344 |
1345 | -----
1346 |
1347 |
1348 | ```r
1349 | par(mar=c(0,0,0,0)); plot(kite, vertex.size = 20 * ec)
1350 | ```
1351 |
1352 | 
1353 |
1354 | ## Page Rank
1355 |
1356 | Fixes the practical problems with eigenvector centrality
1357 |
1358 |
1359 | ```r
1360 | page_rank(kite)$vector
1361 | ```
1362 |
1363 | ```
1364 | #> A B C D E F
1365 | #> 0.10191991 0.10191991 0.07941811 0.14714792 0.07941811 0.12890693
1366 | #> G H I J
1367 | #> 0.12890693 0.09524829 0.08569396 0.05141993
1368 | ```
1369 |
1370 | ## Exercise
1371 |
1372 | Create a table that contains the top 10 most central
1373 | airports according to all these centrality measures.
1374 |
1375 | # Clusters
1376 |
1377 | ## Why finding groups
1378 |
1379 | Finding groups in networks. Dimensionality reduction. Community detection.
1380 |
1381 | We want to find dense groups.
1382 |
1383 | -----
1384 |
1385 |
1386 |
1387 | ## Clusters by hand
1388 |
1389 |
1390 | ```r
1391 | graph <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
1392 | F-G-H-I-F, J-F:G:H:I,
1393 | K-L-M-N-K, O-K:L:M:N,
1394 | P-Q-R-S-P, T-P:Q:R:S,
1395 | B-F, E-J, C-I, L-T, O-T, M-S,
1396 | C-P, C-L, I-L, I-P)
1397 | ```
1398 |
1399 | ----
1400 |
1401 |
1402 | ```r
1403 | par(mar=c(0,0,0,0)); plot(graph)
1404 | ```
1405 |
1406 | 
1407 |
1408 | ----
1409 |
1410 |
1411 | ```r
1412 | flat_clustering <- make_clusters(
1413 | graph,
1414 | c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))
1415 | ```
1416 |
1417 | -----
1418 |
1419 |
1420 | ```r
1421 | flat_clustering
1422 | ```
1423 |
1424 | ```
1425 | #> IGRAPH clustering unknown, groups: 4, mod: 0.51
1426 | #> + groups:
1427 | #> $`1`
1428 | #> [1] 1 2 3 4 5
1429 | #>
1430 | #> $`2`
1431 | #> [1] 6 7 8 9 10
1432 | #>
1433 | #> $`3`
1434 | #> [1] 11 12 13 14 15
1435 | #>
1436 | #> $`4`
1437 | #> + ... omitted several groups/vertices
1438 | ```
1439 |
1440 | -----
1441 |
1442 |
1443 | ```r
1444 | flat_clustering[[1]]
1445 | ```
1446 |
1447 | ```
1448 | #> [1] 1 2 3 4 5
1449 | ```
1450 |
1451 | ```r
1452 | length(flat_clustering)
1453 | ```
1454 |
1455 | ```
1456 | #> [1] 4
1457 | ```
1458 |
1459 | ```r
1460 | sizes(flat_clustering)
1461 | ```
1462 |
1463 | ```
1464 | #> Community sizes
1465 | #> 1 2 3 4
1466 | #> 5 5 5 5
1467 | ```
1468 |
1469 | -----
1470 |
1471 |
1472 | ```r
1473 | induced_subgraph(graph, flat_clustering[[1]])
1474 | ```
1475 |
1476 | ```
1477 | #> IGRAPH UN-- 5 8 --
1478 | #> + attr: name (v/c)
1479 | #> + edges (vertex names):
1480 | #> [1] A--B A--D A--E B--C B--E C--D C--E D--E
1481 | ```
1482 |
1483 | ## Hierarchical community structure
1484 |
1485 | Typically produced by top-down or bottom-up clustering algorithms.
1486 |
1487 | The outcome can be represented as a *dendrogram*,
1488 | a tree-like diagram that illustrates the order in which the clusters
1489 | are merged (in the bottom-up case) or split (in the top-down case).
1490 |
1491 | -----
1492 |
1493 |
1494 |
1495 | ## Clustering quality measures
1496 |
1497 | - External quality measures: require ground truth
1498 | - Internal quality measures: require assumption about *good*
1499 | clusters.
1500 |
1501 | ## External quality measures
1502 |
1503 | Measure | Type | Range | igraph name
1504 | ------------------------------|------------|------------|----------------
1505 | Rand index | similarity | 0 to 1 | `rand`
1506 | Adjusted Rand index | similarity | -0.5 to 1 | `adjusted.rand`
1507 | Split-join distance | distance | 0 to 2n | `split.join`
1508 | Variation of information | distance | 0 to log n | `vi` |
1509 | Normalized mutual information | similarity | 0 to 1 | `nmi`
1510 |
1511 | ## External quality measures
1512 |
1513 |
1514 | ```r
1515 | data(karate)
1516 | karate
1517 | ```
1518 |
1519 | ```
1520 | #> IGRAPH UNW- 34 78 -- Zachary's karate club network
1521 | #> + attr: name (g/c), Citation (g/c), Author (g/c), Faction (v/n),
1522 | #> | name (v/c), label (v/c), color (v/n), weight (e/n)
1523 | #> + edges (vertex names):
1524 | #> [1] Mr Hi --Actor 2 Mr Hi --Actor 3 Mr Hi --Actor 4
1525 | #> [4] Mr Hi --Actor 5 Mr Hi --Actor 6 Mr Hi --Actor 7
1526 | #> [7] Mr Hi --Actor 8 Mr Hi --Actor 9 Mr Hi --Actor 11
1527 | #> [10] Mr Hi --Actor 12 Mr Hi --Actor 13 Mr Hi --Actor 14
1528 | #> [13] Mr Hi --Actor 18 Mr Hi --Actor 20 Mr Hi --Actor 22
1529 | #> [16] Mr Hi --Actor 32 Actor 2--Actor 3 Actor 2--Actor 4
1530 | #> [19] Actor 2--Actor 8 Actor 2--Actor 14 Actor 2--Actor 18
1531 | #> + ... omitted several edges
1532 | ```
1533 |
1534 | ```r
1535 | karate <- delete_edge_attr(karate, "weight")
1536 | ```
1537 |
1538 | -----
1539 |
1540 |
1541 | ```r
1542 | ground_truth <- make_clusters(karate, V(karate)$Faction)
1543 | length(ground_truth)
1544 | ```
1545 |
1546 | ```
1547 | #> [1] 2
1548 | ```
1549 |
1550 | ```r
1551 | ground_truth
1552 | ```
1553 |
1554 | ```
1555 | #> IGRAPH clustering unknown, groups: 2, mod: 0.37
1556 | #> + groups:
1557 | #> $`1`
1558 | #> [1] 1 2 3 4 5 6 7 8 11 12 13 14 17 18 20 22
1559 | #>
1560 | #> $`2`
1561 | #> [1] 9 10 15 16 19 21 23 24 25 26 27 28 29 30 31 32 33 34
1562 | #>
1563 | ```
1564 |
1565 | ## Exercise
1566 |
1567 | Write a naive clustering method that classifies vertices
1568 | into two groups, based on two center vertices. Put the two
1569 | centers in separate clusters, and other vertices in the
1570 | cluster whose center is closer to it.
1571 |
1572 |
1573 | ```r
1574 | cluster_naive2 <- function(graph, center1, center2) {
1575 | # ...
1576 | }
1577 | ```
1578 |
1579 | ## Solution
1580 |
1581 |
1582 | ```r
1583 | cluster_naive2 <- function(graph, center1, center2) {
1584 | dist <- distances(graph, c(center1, center2))
1585 | cl <- apply(dist, 2, which.min)
1586 | make_clusters(graph, cl)
1587 | }
1588 | dist_memb <- cluster_naive2(karate, 'John A', 'Mr Hi')
1589 | ```
1590 |
1591 | ----
1592 |
1593 |
1594 | ```r
1595 | dist_memb
1596 | ```
1597 |
1598 | ```
1599 | #> IGRAPH clustering unknown, groups: 2, mod: 0.31
1600 | #> + groups:
1601 | #> $`1`
1602 | #> [1] "Actor 9" "Actor 10" "Actor 14" "Actor 15" "Actor 16" "Actor 19"
1603 | #> [7] "Actor 20" "Actor 21" "Actor 23" "Actor 24" "Actor 25" "Actor 26"
1604 | #> [13] "Actor 27" "Actor 28" "Actor 29" "Actor 30" "Actor 31" "Actor 32"
1605 | #> [19] "Actor 33" "John A"
1606 | #>
1607 | #> $`2`
1608 | #> [1] "Mr Hi" "Actor 2" "Actor 3" "Actor 4" "Actor 5" "Actor 6"
1609 | #> [7] "Actor 7" "Actor 8" "Actor 11" "Actor 12" "Actor 13" "Actor 17"
1610 | #> [13] "Actor 18" "Actor 22"
1611 | #> + ... omitted several groups/vertices
1612 | ```
1613 |
1614 | ## Rand index
1615 |
1616 | Check if pairs of vertices are classified correctly
1617 |
1618 |
1619 | ```r
1620 | rand_index <- compare(ground_truth, dist_memb, method = "rand")
1621 | rand_index
1622 | ```
1623 |
1624 | ```
1625 | #> [1] 0.885918
1626 | ```
1627 |
1628 | ## Rand index
1629 |
1630 | Random clusterings
1631 |
1632 |
1633 | ```r
1634 | random_partition <- function(n, k = 2) { sample(k, n, replace = TRUE) }
1635 | total <- numeric(100)
1636 | for (i in seq_len(100)) {
1637 | c1 <- random_partition(100)
1638 | c2 <- random_partition(100)
1639 | total[i] <- compare(c1, c2, method = "rand")
1640 | }
1641 | mean(total)
1642 | ```
1643 |
1644 | ```
1645 | #> [1] 0.5017414
1646 | ```
1647 |
1648 | ## Adjusted Rand index
1649 |
1650 |
1651 | ```r
1652 | total <- numeric(100)
1653 | for (i in seq_len(100)) {
1654 | c1 <- random_partition(100)
1655 | c2 <- random_partition(100)
1656 | total[i] <- compare(c1, c2, method = "adjusted.rand")
1657 | }
1658 | mean(total)
1659 | ```
1660 |
1661 | ```
1662 | #> [1] 0.00168767
1663 | ```
1664 |
1665 | ## Adjusted rand index
1666 |
1667 |
1668 | ```r
1669 | compare(ground_truth, dist_memb, method = "adjusted.rand")
1670 | ```
1671 |
1672 | ```
1673 | #> [1] 0.7718469
1674 | ```
1675 |
1676 | ## Internal quality metrics: density
1677 |
1678 |
1679 | ```r
1680 | edge_density(karate)
1681 | ```
1682 |
1683 | ```
1684 | #> [1] 0.1390374
1685 | ```
1686 |
1687 | ```r
1688 | subgraph_density <- function(graph, vertices) {
1689 | sg <- induced_subgraph(graph, vertices)
1690 | edge_density(sg)
1691 | }
1692 | ```
1693 |
1694 |
1695 | ```r
1696 | subgraph_density(karate, ground_truth[[1]])
1697 | ```
1698 |
1699 | ```
1700 | #> [1] 0.275
1701 | ```
1702 |
1703 | ```r
1704 | subgraph_density(karate, ground_truth[[2]])
1705 | ```
1706 |
1707 | ```
1708 | #> [1] 0.2287582
1709 | ```
1710 |
1711 | ## Internal quality metrics: modularity
1712 |
1713 | Uses a null model
1714 |
1715 | $$Q(G) = \frac{1}{2m} \sum_{i=1}^n \sum_{j=1}^n \left( A_{ij} - p_{ij} \right) \delta_{ij}$$
1716 |
1717 | $A_{ij}$: Adjacency matrix
1718 |
1719 | $\delta_{ij}$: $i$ and $j$ are in the same cluster
1720 |
1721 | $p_{ij}$ expected value for an $(i,j)$ edge from the null model
1722 |
1723 | ## Modularity
1724 |
1725 | Common null model: degree-sequence (configuration) model
1726 |
1727 | $$Q(G) = \frac{1}{2m} \sum_{i=1}^n \sum_{j=1}^n \left( A_{ij} - \frac{k_i k_j}{2m} \right)
1728 | \delta_{ij}$$
1729 |
1730 | ## Modularity in igraph
1731 |
1732 |
1733 | ```r
1734 | modularity(ground_truth)
1735 | ```
1736 |
1737 | ```
1738 | #> [1] 0.3714661
1739 | ```
1740 |
1741 | ```r
1742 | modularity(karate, membership(ground_truth))
1743 | ```
1744 |
1745 | ```
1746 | #> [1] 0.3714661
1747 | ```
1748 |
1749 | ----
1750 |
1751 | Well behaving:
1752 |
1753 |
1754 | ```r
1755 | modularity(karate, rep(1, gorder(karate)))
1756 | ```
1757 |
1758 | ```
1759 | #> [1] 0
1760 | ```
1761 |
1762 | ```r
1763 | modularity(karate, seq_len(gorder(karate)))
1764 | ```
1765 |
1766 | ```
1767 | #> [1] -0.04980276
1768 | ```
1769 |
1770 | ## Heuristic algorithms
1771 |
1772 | Edge-betweenness clustering
1773 |
1774 | Exact modularity optimization
1775 |
1776 | Greedy agglomerative algorithm to maximize modularity
1777 |
1778 | ## Edge-betweenness clustering
1779 |
1780 |
1781 | ```r
1782 | dendrogram <- cluster_edge_betweenness(karate)
1783 | dendrogram
1784 | ```
1785 |
1786 | ```
1787 | #> IGRAPH clustering edge betweenness, groups: 5, mod: 0.4
1788 | #> + groups:
1789 | #> $`1`
1790 | #> [1] "Mr Hi" "Actor 2" "Actor 4" "Actor 8" "Actor 12" "Actor 13"
1791 | #> [7] "Actor 14" "Actor 18" "Actor 20" "Actor 22"
1792 | #>
1793 | #> $`2`
1794 | #> [1] "Actor 3" "Actor 25" "Actor 26" "Actor 28" "Actor 29" "Actor 32"
1795 | #>
1796 | #> $`3`
1797 | #> [1] "Actor 5" "Actor 6" "Actor 7" "Actor 11" "Actor 17"
1798 | #>
1799 | #> + ... omitted several groups/vertices
1800 | ```
1801 |
1802 | -----
1803 |
1804 |
1805 | ```r
1806 | membership(dendrogram)
1807 | ```
1808 |
1809 | ```
1810 | #> Mr Hi Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7 Actor 8
1811 | #> 1 1 2 1 3 3 3 1
1812 | #> Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14 Actor 15 Actor 16
1813 | #> 4 5 3 1 1 1 4 4
1814 | #> Actor 17 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22 Actor 23 Actor 24
1815 | #> 3 1 4 1 4 1 4 4
1816 | #> Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32
1817 | #> 2 2 4 2 2 4 4 2
1818 | #> Actor 33 John A
1819 | #> 4 4
1820 | ```
1821 |
1822 | -----
1823 |
1824 |
1825 | ```r
1826 | compare_all <- function(cl1, cl2) {
1827 | methods <- eval(as.list(args(compare))$method)
1828 | vapply(methods, compare, 1.0, comm1 = cl1, comm2 = cl2)
1829 | }
1830 | compare_all(dendrogram, ground_truth)
1831 | ```
1832 |
1833 | ```
1834 | #> vi nmi split.join rand adjusted.rand
1835 | #> 0.8868344 0.5798278 13.0000000 0.7379679 0.4686165
1836 | ```
1837 |
1838 | -----
1839 |
1840 |
1841 | ```r
1842 | cluster_memb <- cut_at(dendrogram, no = 2)
1843 | compare_all(cluster_memb, ground_truth)
1844 | ```
1845 |
1846 | ```
1847 | #> vi nmi split.join rand adjusted.rand
1848 | #> 0.2252446 0.8364981 2.0000000 0.9411765 0.8823025
1849 | ```
1850 |
1851 | ```r
1852 | clustering <- make_clusters(karate, membership = cluster_memb)
1853 | ```
1854 |
1855 | ----
1856 |
1857 |
1858 | ```r
1859 | V(karate)[Faction == 1]$shape <- "circle"
1860 | V(karate)[Faction == 2]$shape <- "square"
1861 | par(mar=c(0,0,0,0)); plot(clustering, karate)
1862 | ```
1863 |
1864 | 
1865 |
1866 | -----
1867 |
1868 |
1869 | ```r
1870 | par(mar=c(0,0,0,0)); plot_dendrogram(dendrogram, direction = "downwards")
1871 | ```
1872 |
1873 | 
1874 |
1875 | ## Exact modularity maximization
1876 |
1877 |
1878 | ```r
1879 | optimal <- cluster_optimal(karate)
1880 | modularity(clustering)
1881 | ```
1882 |
1883 | ```
1884 | #> [1] 0.3599606
1885 | ```
1886 |
1887 | ```r
1888 | modularity(optimal)
1889 | ```
1890 |
1891 | ```
1892 | #> [1] 0.4197896
1893 | ```
1894 |
1895 | ```r
1896 | modularity(ground_truth)
1897 | ```
1898 |
1899 | ```
1900 | #> [1] 0.3714661
1901 | ```
1902 |
1903 | ## Heuristic modularity optimization
1904 |
1905 |
1906 | ```r
1907 | dend_fast <- cluster_fast_greedy(karate)
1908 | compare_all(dend_fast, ground_truth)
1909 | ```
1910 |
1911 | ```
1912 | #> vi nmi split.join rand adjusted.rand
1913 | #> 0.5321150 0.6924673 10.0000000 0.8413547 0.6802559
1914 | ```
1915 |
1916 | -----
1917 |
1918 |
1919 | ```r
1920 | par(mar = c(0,0,0,0)); plot_dendrogram(dend_fast, direction = "downwards")
1921 | ```
1922 |
1923 | 
1924 |
1925 | # Visualization
1926 |
1927 | ## Plotting parameters
1928 |
1929 | ----
1930 |
1931 | Globally
1932 |
1933 |
1934 | ```r
1935 | igraph_options(edge.color = "black")
1936 | data(karate) ; par(mar=c(0,0,0,0)); plot(karate)
1937 | ```
1938 |
1939 | 
1940 |
1941 | -----
1942 |
1943 | Graph parameter
1944 |
1945 |
1946 | ```r
1947 | V(karate)$color <- "DarkOliveGreen" ; E(karate)$color <- "grey"
1948 | par(mar=c(0,0,0,0)) ; plot(karate)
1949 | ```
1950 |
1951 | 
1952 |
1953 | -----
1954 |
1955 | As an argument to `plot()`:
1956 |
1957 | ```r
1958 | par(mar = c(0,0,0,0))
1959 | plot(karate, edge.color = "black", vertex.color = "#00B7FF",
1960 | vertex.label.color = "black")
1961 | ```
1962 |
1963 | 
1964 |
1965 | ## igraph color palettes
1966 |
1967 |
1968 | ```r
1969 | karate$palette <- categorical_pal(length(clustering))
1970 | par(mar = c(0,0,0,0)); plot(karate, vertex.color = membership(clustering))
1971 | ```
1972 |
1973 | 
1974 |
1975 | ----
1976 |
1977 | Others: `r_pal()`, `sequential_pal()`, `diverging_pal()`.
1978 |
1979 | ## Graphical parameters
1980 |
1981 | Vertices: `size`, `size`, `color`, `frame.color`, `shape` (circle, square, rectangle, pie,
1982 | raster, none), `label`, `label.family`, `label.font`, `label.cex`, `label.dist`,
1983 | `label.degree`, `label.color`.
1984 |
1985 | Edges: `color`, `width`, `arrow.size`, `arrow.width`, `lty`, `label`,
1986 | `label.family`, `label.font`, `label.cex`, `label.color`, `label.x`, `label.y`,
1987 | `curved`, `arrow.mode`, `loop.angle`, `loop.angle2`.
1988 |
1989 | Graph: `layout` (a numeric matrix), `margin`, `palette` (for vertex color),
1990 | `rescale`, `asp`, `frame`, `main` (title), `sub` (title), `xlab`, `ylab`.
1991 |
1992 | ## Vertex shapes
1993 |
1994 |
1995 | ```r
1996 | shapes()
1997 | ```
1998 |
1999 | ```
2000 | #> [1] "circle" "crectangle" "csquare" "none" "pie"
2001 | #> [6] "raster" "rectangle" "sphere" "square" "vrectangle"
2002 | ```
2003 |
2004 | ----
2005 |
2006 |
2007 |
2008 |
2009 | ```r
2010 | plot(g, vertex.shape=shapes, vertex.label=shapes, vertex.label.dist=1,
2011 | vertex.size=15, vertex.size2=15,
2012 | vertex.pie=lapply(shapes, function(x) if (x=="pie") 2:6 else 0),
2013 | vertex.pie.color=list(heat.colors(5)))
2014 | ```
2015 |
2016 | ----
2017 |
2018 | 
2019 |
2020 | ## Layout algorithms
2021 |
2022 | Layout algorithm: place the vertices in a way, such that
2023 |
2024 | * nodes are distributed evenly
2025 | * edges have about the same length
2026 | * connected vertices are closer to each other
2027 | * edges are not crossing
2028 |
2029 | This is really hard, often impossible!
2030 |
2031 | ## Force-directed algorithms
2032 |
2033 | 
2034 |
2035 | ## Trees
2036 |
2037 |
2038 | ```r
2039 | tree <- make_tree(20, 3)
2040 | par(mar = c(0,0,0,0)); plot(tree, layout=layout_as_tree)
2041 | ```
2042 |
2043 | 
2044 |
2045 | ----
2046 |
2047 |
2048 | ```r
2049 | l <- layout_as_tree(tree, circular = TRUE)
2050 | par(mar = c(0,0,0,0)); plot(tree, layout = l)
2051 | ```
2052 |
2053 | 
2054 |
2055 | ----
2056 |
2057 |
2058 | ```
2059 | #> [1] TRUE
2060 | ```
2061 |
2062 |
2063 | ```r
2064 | summary(DC)
2065 | ```
2066 |
2067 | ```
2068 | #> IGRAPH DN-- 22 27 --
2069 | #> + attr: name (v/c), color (v/c), shape (v/c), size (v/n), size2
2070 | #> | (v/n), label (v/x), lty (e/n), arrow.size (e/n)
2071 | ```
2072 |
2073 | ```r
2074 | lay1 <- layout_with_sugiyama(DC, layers=apply(sapply(layers,
2075 | function(x) V(DC)$name %in% x), 1, which))
2076 | ```
2077 |
2078 | ----
2079 |
2080 |
2081 | ```r
2082 | par(mar = rep(0, 4))
2083 | plot(DC, layout = lay1$layout, vertex.label.cex = 0.5)
2084 | ```
2085 |
2086 | 
2087 |
2088 | ----
2089 |
2090 |
2091 | ```r
2092 | par(mar = c(0,0,0,0)); plot(lay1$extd_graph, vertex.label.cex=0.5)
2093 | ```
2094 |
2095 | 
2096 |
2097 | ## Slightly bigger networks
2098 |
2099 |
2100 | ```r
2101 | data(UKfaculty)
2102 | UKfaculty
2103 | ```
2104 |
2105 | ```
2106 | #> IGRAPH D-W- 81 817 --
2107 | #> + attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c),
2108 | #> | Group (v/n), weight (e/n)
2109 | #> + edges:
2110 | #> [1] 57->52 76->42 12->69 43->34 28->47 58->51 7->29 40->71 5->37
2111 | #> [10] 48->55 6->58 21-> 8 28->69 43->21 67->58 65->42 5->67 52->75
2112 | #> [19] 37->64 4->36 12->49 19->46 37-> 9 74->36 62-> 1 15-> 2 72->49
2113 | #> [28] 46->62 2->29 40->12 22->29 71->69 4-> 3 37->69 5-> 6 77->13
2114 | #> [37] 23->49 52->35 20->14 62->70 34->35 76->72 7->42 37->42 51->80
2115 | #> [46] 38->45 62->64 36->53 62->77 17->61 7->68 46->29 44->53 18->58
2116 | #> [55] 12->16 72->42 52->32 58->21 38->17 15->51 22-> 7 22->69 5->13
2117 | #> + ... omitted several edges
2118 | ```
2119 |
2120 | ----
2121 |
2122 |
2123 | ```r
2124 | par(mar = c(0,0,0,0)); plot(UKfaculty, layout = layout_with_graphopt)
2125 | ```
2126 |
2127 | 
2128 |
2129 | ----
2130 |
2131 |
2132 | ```r
2133 | cl_uk <- cluster_louvain(as.undirected(UKfaculty))
2134 | cl_gr <- contract(UKfaculty, mapping = cl_uk$membership)
2135 | E(cl_gr)$weight <- count_multiple(cl_gr)
2136 | cl_grs <- simplify(cl_gr)
2137 | E(cl_grs)$weight
2138 | ```
2139 |
2140 | ```
2141 | #> [1] 289 1 49 256 289 1296 16 256 144 16 4 729 784
2142 | #> [14] 256 1 81 121 169
2143 | ```
2144 |
2145 | ----
2146 |
2147 |
2148 | ```r
2149 | par(mar = c(0,0,0,0)); plot(cl_grs, edge.width=E(cl_grs)$weight / 200,
2150 | edge.curved = .2, vertex.size = sizes(cl_uk) * 2)
2151 | ```
2152 |
2153 | 
2154 |
2155 | ----
2156 |
2157 |
2158 | ```r
2159 | subs <- lapply(groups(cl_uk), induced_subgraph, graph = UKfaculty)
2160 | summary(subs[[1]])
2161 | ```
2162 |
2163 | ```
2164 | #> IGRAPH D-W- 6 29 --
2165 | #> + attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c),
2166 | #> | Group (v/n), weight (e/n)
2167 | ```
2168 |
2169 | ----
2170 |
2171 |
2172 | ```r
2173 | par(mar=c(0,0,0,0)); plot(subs[[1]])
2174 | ```
2175 |
2176 | 
2177 |
2178 | ## Exercise
2179 |
2180 | A minimum spanning tree is a graph without cycle, that has the minimal
2181 | weight sum among all spanning trees of the graph.
2182 |
2183 | Try to visualize the airport network using the minimal spanning tree.
2184 | `mst()` calculates the (or a) minimum spanning tree. Hint: what will
2185 | you use as weight? Do you really want a minimum spanning tree, or a
2186 | maximum spanning tree?
2187 |
2188 | ## Exporting and importing graphs
2189 |
2190 | `read_graph()` and `write_graph()`.
2191 |
2192 | Imports: edge list, Pajek, GraphML, GML, DL, ...
2193 |
2194 | Exports: edge list, Pajek, GraphML, GML, DOT, Leda, ...
2195 |
2196 | Helpful packages: `rgexf`, `intergraph`, `DiagrammeR`, `networkD3`.
2197 |
2198 | ## The `networkD3` package
2199 |
2200 |
2201 | ```r
2202 | library(networkD3)
2203 | d3_net <- simpleNetwork(as_data_frame(karate, what = "edges")[, 1:3])
2204 | d3_net
2205 | ```
2206 |
2207 |