├── .gitignore ├── Basics.ipynb ├── DAG-Julia-Pkgs.ipynb ├── README.md ├── Talk-of-the-network.ipynb ├── TheTippingPoint.ipynb └── Watts-Model.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | -------------------------------------------------------------------------------- /DAG-Julia-Pkgs.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# The DAG of Julia packages\n", 8 | "\n", 9 | "## Problem statement\n", 10 | "\n", 11 | "In this tutorial, we show how LG in conjunction with other utility packages can be used for extracting the most recent directed acyclic graph (DAG) of the Julia package system. This information can be used for interactive data visualization with [D3](https://d3js.org/) like in the following links:\n", 12 | "\n", 13 | "- **The DAG of Julia packages:** https://juliohm.github.io/dataviz/DAG-of-Julia-packages\n", 14 | "- **Where are the Julians?** https://juliohm.github.io/dataviz/where-are-the-julians\n", 15 | "\n", 16 | "All the packages used in this notebook can be installed with:" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "name": "stderr", 26 | "output_type": "stream", 27 | "text": [ 28 | "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPackage HTTP is already installed\n", 29 | "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPackage JSON is already installed\n", 30 | "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPackage GitHub is already installed\n", 31 | "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPackage LightGraphs is already installed\n", 32 | "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPackage ProgressMeter is already installed\n", 33 | "\u001b[39m" 34 | ] 35 
| } 36 | ], 37 | "source": [ 38 | "for dep in [\"HTTP\",\"JSON\",\"GitHub\",\"LightGraphs\",\"ProgressMeter\"]\n", 39 | " Pkg.add(dep)\n", 40 | "end" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "To query information from GitHub without being mistaken for a malicious robot, you need to [create a personal token](https://github.com/settings/tokens) in your GitHub settings. Since this token is private, we ask you to save it as an environment variable in your operating system (e.g. set `GITHUB_AUTH` in your `.bashrc` file). This variable will be read in Julia and used for authentication as follows:" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 2, 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "name": "stderr", 57 | "output_type": "stream", 58 | "text": [ 59 | "\n", 60 | "WARNING: deprecated syntax \"abstract GitHubType\" at /home/juliohm/.julia/v0.6/GitHub/src/utils/GitHubType.jl:20.\n", 61 | "Use \"abstract type GitHubType end\" instead.\n", 62 | "\n", 63 | "WARNING: deprecated syntax \"typealias GitHubString Compat.UTF8String\" at /home/juliohm/.julia/v0.6/GitHub/src/utils/GitHubType.jl:22.\n", 64 | "Use \"const GitHubString = Compat.UTF8String\" instead.\n", 65 | "\n", 66 | "WARNING: deprecated syntax \"abstract Authorization\" at /home/juliohm/.julia/v0.6/GitHub/src/utils/auth.jl:6.\n", 67 | "Use \"abstract type Authorization end\" instead.\n" 68 | ] 69 | }, 70 | { 71 | "data": { 72 | "text/plain": [ 73 | "GitHub.OAuth2(8cda0d**********************************)" 74 | ] 75 | }, 76 | "execution_count": 2, 77 | "metadata": {}, 78 | "output_type": "execute_result" 79 | } 80 | ], 81 | "source": [ 82 | "using HTTP\n", 83 | "using JSON\n", 84 | "using GitHub\n", 85 | "using LightGraphs\n", 86 | "using ProgressMeter\n", 87 | "\n", 88 | "# authenticate with GitHub to increase query limits\n", 89 | "mytoken = ENV[\"GITHUB_AUTH\"]\n", 90 | "myauth = 
GitHub.authenticate(mytoken)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "After successful authentication, we are now ready to start coding. First, we extract the names of all registered packages in METADATA and assign to each of them a unique integer id:" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 3, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "Dict{String,Int64} with 1500 entries:\n", 109 | " \"Levenshtein\" => 724\n", 110 | " \"ReadStat\" => 1141\n", 111 | " \"Discretizers\" => 326\n", 112 | " \"SchumakerSpline\" => 1209\n", 113 | " \"FredData\" => 455\n", 114 | " \"GaussQuadrature\" => 475\n", 115 | " \"RecurrenceAnalysis\" => 1147\n", 116 | " \"MKLSparse\" => 843\n", 117 | " \"AnsiColor\" => 20\n", 118 | " \"ProximalOperators\" => 1075\n", 119 | " \"Luxor\" => 776\n", 120 | " \"RobustLeastSquares\" => 1186\n", 121 | " \"Temporal\" => 1353\n", 122 | " \"Robotlib\" => 1184\n", 123 | " \"PiecewiseLinearOpt\" => 1026\n", 124 | " \"JLDArchives\" => 665\n", 125 | " \"MatrixDepot\" => 803\n", 126 | " \"CodeTools\" => 168\n", 127 | " \"NumericSuffixes\" => 935\n", 128 | " \"COBRA\" => 162\n", 129 | " \"Crypto\" => 234\n", 130 | " \"Mongo\" => 857\n", 131 | " \"ROOT\" => 1194\n", 132 | " \"MNIST\" => 849\n", 133 | " \"RandomMatrices\" => 1123\n", 134 | " ⋮ => ⋮" 135 | ] 136 | }, 137 | "execution_count": 3, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "# find all packages in METADATA\n", 144 | "pkgs = readdir(Pkg.dir(\"METADATA\"))\n", 145 | "filterfunc = p -> isdir(joinpath(Pkg.dir(\"METADATA\"), p)) && p ∉ [\".git\",\".test\"]\n", 146 | "pkgs = filter(filterfunc, pkgs)\n", 147 | "\n", 148 | "# assign each package an id\n", 149 | "pkgdict = Dict{String,Int}()\n", 150 | "for (i,pkg) in enumerate(pkgs)\n", 151 | " push!(pkgdict, pkg => i)\n", 152 | "end\n", 153 | "pkgdict" 154 | ] 155 
| }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "Using the ids, we can easily build the DAG of packages with LG:" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 4, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "name": "stderr", 170 | "output_type": "stream", 171 | "text": [ 172 | "Building graph...100% Time: 0:04:02\n" 173 | ] 174 | } 175 | ], 176 | "source": [ 177 | "# build DAG\n", 178 | "DAG = DiGraph(length(pkgs))\n", 179 | "@showprogress 1 \"Building graph...\" for pkg in pkgs\n", 180 | " children = Pkg.dependents(pkg)\n", 181 | " for c in children\n", 182 | " add_edge!(DAG, pkgdict[pkg], pkgdict[c])\n", 183 | " end\n", 184 | "end" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "We are interested in finding all the descendents of a package. In other words, we are interested in finding all packages that are influenced by a given package. In this context, we further want to save the level of dependency (or geodesic distance) from descendents to the package being queried. This is a straightforward operation in LG:" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 5, 197 | "metadata": { 198 | "collapsed": true 199 | }, 200 | "outputs": [], 201 | "source": [ 202 | "# find (indirect) descendents\n", 203 | "descendents = []\n", 204 | "for pkg in pkgs\n", 205 | " gdists = gdistances(DAG, pkgdict[pkg])\n", 206 | " desc = [Dict(\"id\"=>pkgs[v], \"level\"=>gdists[v]) for v in find(gdists .> 0)]\n", 207 | " push!(descendents, desc)\n", 208 | "end" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "For each package, we also want to save information about who has contributed to the project. This task is easy to implement with the awesome [GitHub.jl](https://github.com/JuliaWeb/GitHub.jl) API. 
However, some of the packages registered in METADATA are hosted on other websites such as GitLab, for which an API is missing. We simply skip them and ask authors to migrate their code to GitHub if possible:" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 6, 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "name": "stderr", 225 | "output_type": "stream", 226 | "text": [ 227 | "Finding contributors...100% Time: 0:12:27\n" 228 | ] 229 | } 230 | ], 231 | "source": [ 232 | "# find contributors\n", 233 | "pkgcontributors = []\n", 234 | "hostnames = []\n", 235 | "@showprogress 1 \"Finding contributors...\" for pkg in pkgs\n", 236 | " url = Pkg.Read.url(pkg)\n", 237 | " m = match(r\".*://([a-z.]*)/(.*)\\.git.*\", url)\n", 238 | " hostname = m[1]; reponame = m[2]\n", 239 | " if hostname == \"github.com\"\n", 240 | " users, _ = contributors(reponame, auth=myauth)\n", 241 | " usersdata = map(u -> (u[\"contributor\"].login, u[\"contributions\"]), users)\n", 242 | " pkgcontrib = [Dict(\"id\"=>u, \"contributions\"=>c) for (u,c) in usersdata]\n", 243 | " push!(pkgcontributors, pkgcontrib)\n", 244 | " push!(hostnames, hostname)\n", 245 | " else\n", 246 | " push!(pkgcontributors, [])\n", 247 | " push!(hostnames, hostname)\n", 248 | " end\n", 249 | "end" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "We also extract the Julia version required in the last tag of a package. 
Both the lower and upper bounds are saved as well as a \"cleaned\" `major.minor` string for the lower bound, which is useful for data visualization:" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 7, 262 | "metadata": { 263 | "collapsed": true 264 | }, 265 | "outputs": [], 266 | "source": [ 267 | "# find required Julia version\n", 268 | "juliaversion = []\n", 269 | "for pkg in pkgs\n", 270 | " versiondir = joinpath(Pkg.dir(\"METADATA\"), pkg, \"versions\")\n", 271 | " if isdir(versiondir)\n", 272 | " latestversion = readdir(versiondir)[end]\n", 273 | " reqfile = joinpath(versiondir, latestversion, \"requires\")\n", 274 | " reqs = Pkg.Reqs.parse(reqfile)\n", 275 | " if \"julia\" ∈ keys(reqs)\n", 276 | " vinterval = reqs[\"julia\"].intervals[1]\n", 277 | " vmin = vinterval.lower\n", 278 | " vmax = vinterval.upper\n", 279 | " majorminor = \"v$(vmin.major).$(vmin.minor)\"\n", 280 | " push!(juliaversion, Dict(\"min\"=>string(vinterval.lower),\n", 281 | " \"max\"=>string(vinterval.upper),\n", 282 | " \"majorminor\"=>majorminor))\n", 283 | " else\n", 284 | " push!(juliaversion, Dict(\"min\"=>\"NA\", \"max\"=>\"NA\", \"majorminor\"=>\"NA\"))\n", 285 | " end\n", 286 | " else\n", 287 | " push!(juliaversion, Dict(\"min\"=>\"BOGUS\", \"max\"=>\"BOGUS\", \"majorminor\"=>\"BOGUS\"))\n", 288 | " end\n", 289 | "end" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "Finally, we save the data in a JSON file:" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 8, 302 | "metadata": { 303 | "collapsed": true 304 | }, 305 | "outputs": [], 306 | "source": [ 307 | "# construct JSON\n", 308 | "nodes = [Dict(\"id\"=>pkgs[v],\n", 309 | " \"indegree\"=>indegree(DAG,v),\n", 310 | " \"outdegree\"=>outdegree(DAG,v),\n", 311 | " \"juliaversion\"=>juliaversion[v],\n", 312 | " \"descendents\"=>descendents[v],\n", 313 | " \"contributors\"=>pkgcontributors[v]) for v in vertices(DAG)]\n", 314 | 
"\n", 315 | "links = [Dict(\"source\"=>pkgs[src(e)], \"target\"=>pkgs[dst(e)]) for e in edges(DAG)]\n", 316 | "\n", 317 | "data = Dict(\"nodes\"=>nodes, \"links\"=>links)\n", 318 | "\n", 319 | "# write to file\n", 320 | "open(\"DAG-Julia-Pkgs.json\", \"w\") do f\n", 321 | " JSON.print(f, data, 2)\n", 322 | "end" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "## Where are the Julians?\n", 330 | "\n", 331 | "Having extracted and saved the DAG of Julia packages, we take this opportunity to find out the Julians responsible for this amazing package system.\n", 332 | "\n", 333 | "We use LG again to build this social network:" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 9, 339 | "metadata": {}, 340 | "outputs": [ 341 | { 342 | "data": { 343 | "text/plain": [ 344 | "Dict{String,Int64} with 1558 entries:\n", 345 | " \"credentiality\" => 496\n", 346 | " \"ZacCranko\" => 238\n", 347 | " \"Snnappie\" => 206\n", 348 | " \"benhamner\" => 387\n", 349 | " \"lynyus\" => 991\n", 350 | " \"iraikov\" => 781\n", 351 | " \"nstiurca\" => 1143\n", 352 | " \"pearlzli\" => 1180\n", 353 | " \"GunnarFarneback\" => 90\n", 354 | " \"njwilson23\" => 1135\n", 355 | " \"gustafsson\" => 719\n", 356 | " \"cgoldammer\" => 449\n", 357 | " \"garrison\" => 680\n", 358 | " \"lobingera\" => 977\n", 359 | " \"randyzwitch\" => 1232\n", 360 | " \"JonathanAnderson\" => 123\n", 361 | " \"madanim\" => 999\n", 362 | " \"Armavica\" => 20\n", 363 | " \"Matt5sean3\" => 148\n", 364 | " \"slangangular\" => 1356\n", 365 | " \"raphapr\" => 1236\n", 366 | " \"kuldeepdhaka\" => 946\n", 367 | " \"jdrugo\" => 818\n", 368 | " \"J-Revell\" => 103\n", 369 | " \"fserra\" => 668\n", 370 | " ⋮ => ⋮" 371 | ] 372 | }, 373 | "execution_count": 9, 374 | "metadata": {}, 375 | "output_type": "execute_result" 376 | } 377 | ], 378 | "source": [ 379 | "# find Julians on Github\n", 380 | "julians = []\n", 381 | "for pkgcontrib in pkgcontributors\n", 
382 | " append!(julians, [julian[\"id\"].value for julian in pkgcontrib])\n", 383 | "end\n", 384 | "julians = sort(unique(julians))\n", 385 | "\n", 386 | "# assign each Julian an id\n", 387 | "juliandict = Dict{String,Int}()\n", 388 | "for (i,julian) in enumerate(julians)\n", 389 | " push!(juliandict, julian => i)\n", 390 | "end\n", 391 | "juliandict" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 10, 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "name": "stderr", 401 | "output_type": "stream", 402 | "text": [ 403 | "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36m1558 Julians and 43978 connections\n", 404 | "\u001b[39m" 405 | ] 406 | } 407 | ], 408 | "source": [ 409 | "# build the social network\n", 410 | "socialnet = Graph(length(julians))\n", 411 | "contribdict = Dict{String,Int}()\n", 412 | "for pkgcontrib in pkgcontributors\n", 413 | " ids = [julian[\"id\"].value for julian in pkgcontrib]\n", 414 | " contribs = [julian[\"contributions\"] for julian in pkgcontrib]\n", 415 | " for i=1:length(ids)\n", 416 | " contribdict[ids[i]] = get(contribdict, ids[i], 0) + contribs[i]\n", 417 | " end\n", 418 | " for i=1:length(ids), j=1:i-1\n", 419 | " add_edge!(socialnet, juliandict[ids[i]], juliandict[ids[j]])\n", 420 | " end\n", 421 | "end\n", 422 | "\n", 423 | "njulians = nv(socialnet)\n", 424 | "nconnections = ne(socialnet)\n", 425 | "\n", 426 | "info(\"$njulians Julians and $nconnections connections\")" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": {}, 432 | "source": [ 433 | "For each node of the social network, we use the GitHub API to retrieve user information:" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": 11, 439 | "metadata": { 440 | "scrolled": false 441 | }, 442 | "outputs": [ 443 | { 444 | "name": "stderr", 445 | "output_type": "stream", 446 | "text": [ 447 | "Retrieving Julian info...100% Time: 0:04:44\n" 448 | ] 449 | } 450 | ], 451 | "source": [ 452 | "# HTTP 
requests on https://api.github.com\n", 453 | "juliansinfo = []\n", 454 | "@showprogress 1 \"Retrieving Julian info...\" for julian in julians\n", 455 | " resp = HTTP.get(\"https://api.github.com/users/$julian?access_token=$mytoken\")\n", 456 | " htmlbody = identity(String(resp.body))\n", 457 | " push!(juliansinfo, JSON.Parser.parse(htmlbody))\n", 458 | "end" 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "If the user has entered an address on their profile, we find an approximate latitude/longitude with the Google Maps geocoding API:" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 12, 471 | "metadata": { 472 | "scrolled": false 473 | }, 474 | "outputs": [ 475 | { 476 | "name": "stderr", 477 | "output_type": "stream", 478 | "text": [ 479 | "Geocoding Julian address...100% Time: 0:04:37\n" 480 | ] 481 | } 482 | ], 483 | "source": [ 484 | "locnames = []\n", 485 | "latitudes = []\n", 486 | "longitudes = []\n", 487 | "countries = []\n", 488 | "@showprogress 1 \"Geocoding Julian address...\" for julian in juliansinfo\n", 489 | " address = julian[\"location\"]\n", 490 | " if address ≠ nothing\n", 491 | " address = replace(address, \"–\", \"\")\n", 492 | " address = replace(address, \" \", \"+\")\n", 493 | " resp = HTTP.get(\"http://maps.google.com/maps/api/geocode/json?address=$address\")\n", 494 | " htmlbody = identity(String(resp.body))\n", 495 | " results = JSON.Parser.parse(htmlbody)[\"results\"]\n", 496 | " if length(results) > 0\n", 497 | " geoinfo = results[1]\n", 498 | " locname = geoinfo[\"formatted_address\"]\n", 499 | " loccoords = geoinfo[\"geometry\"][\"location\"]\n", 500 | " push!(locnames, locname)\n", 501 | " push!(latitudes, loccoords[\"lat\"])\n", 502 | " push!(longitudes, loccoords[\"lng\"])\n", 503 | " for comp in geoinfo[\"address_components\"]\n", 504 | " if \"country\" ∈ comp[\"types\"]\n", 505 | " push!(countries, comp[\"long_name\"])\n", 506 | " end\n", 507 | " end\n", 
508 | " else\n", 509 | " push!(locnames, nothing)\n", 510 | " push!(latitudes, nothing)\n", 511 | " push!(longitudes, nothing)\n", 512 | " push!(countries, nothing)\n", 513 | " end\n", 514 | " else\n", 515 | " push!(locnames, nothing)\n", 516 | " push!(latitudes, nothing)\n", 517 | " push!(longitudes, nothing)\n", 518 | " push!(countries, nothing)\n", 519 | " end\n", 520 | "end" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": {}, 526 | "source": [ 527 | "Finally, we use JSON again to save the data:" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 13, 533 | "metadata": { 534 | "collapsed": true, 535 | "scrolled": false 536 | }, 537 | "outputs": [], 538 | "source": [ 539 | "# construct JSON\n", 540 | "usernodes = [Dict(\"id\"=>julian[\"login\"],\n", 541 | " \"name\"=>julian[\"name\"],\n", 542 | " \"avatar_url\"=>julian[\"avatar_url\"],\n", 543 | " \"contributions\"=>contribdict[julian[\"login\"]],\n", 544 | " \"location\"=>locnames[i],\n", 545 | " \"latitude\"=>latitudes[i],\n", 546 | " \"longitude\"=>longitudes[i],\n", 547 | " \"country\"=>countries[i]) for (i,julian) in enumerate(juliansinfo)]\n", 548 | "\n", 549 | "userlinks = [Dict(\"source\"=>julians[src(e)], \"target\"=>julians[dst(e)]) for e in edges(socialnet)]\n", 550 | "\n", 551 | "userdata = Dict(\"nodes\"=>usernodes, \"links\"=>userlinks)\n", 552 | "\n", 553 | "# write to file\n", 554 | "open(\"Julians.json\", \"w\") do f\n", 555 | " JSON.print(f, userdata, 2)\n", 556 | "end" 557 | ] 558 | } 559 | ], 560 | "metadata": { 561 | "kernelspec": { 562 | "display_name": "Julia 0.6.2", 563 | "language": "julia", 564 | "name": "julia-0.6" 565 | }, 566 | "language_info": { 567 | "file_extension": ".jl", 568 | "mimetype": "application/julia", 569 | "name": "julia", 570 | "version": "0.6.2" 571 | } 572 | }, 573 | "nbformat": 4, 574 | "nbformat_minor": 2 575 | } 576 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # JuliaGraphsTutorials 2 | [![Slack](https://img.shields.io/badge/Join%20Our%20Community-Slack-blue)](https://app.slack.com/client/T68168MUP/C759LHXV0) 3 | 4 | Tutorials in the form of Jupyter notebooks for the JuliaGraphs ecosystem. 5 | 6 | ## Notebooks 7 | 8 | The notebooks are better viewed on nbviewer: 9 | 10 | - [Basic usage](http://nbviewer.jupyter.org/github/JuliaGraphs/JuliaGraphsTutorials/blob/master/Basics.ipynb) 11 | - [DAG of Julia packages](http://nbviewer.jupyter.org/github/JuliaGraphs/JuliaGraphsTutorials/blob/master/DAG-Julia-Pkgs.ipynb) 12 | - [Watts model](http://nbviewer.jupyter.org/github/JuliaGraphs/JuliaGraphsTutorials/blob/master/Watts-Model.ipynb) 13 | - [Tipping Point](http://nbviewer.jupyter.org/github/JuliaGraphs/JuliaGraphsTutorials/blob/master/TheTippingPoint.ipynb) 14 | - [Talk of the network](http://nbviewer.jupyter.org/github/JuliaGraphs/JuliaGraphsTutorials/blob/master/Talk-of-the-network.ipynb) 15 | 16 | ## Contributing 17 | 18 | Please let us know if any of the notebooks are outdated or not functioning with the current release of `Graphs.jl`. Contributions are very welcome, please submit a pull request or open an issue with an example that you feel is missing. 19 | -------------------------------------------------------------------------------- /Talk-of-the-network.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "*Note: In this workbook, we try to replicate the results from the classic paper \"Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth\", Goldenberg, Libai and Muller (2001). 
This is a self-didactic attempt.*" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stderr", 17 | "output_type": "stream", 18 | "text": [ 19 | "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m registry at `C:\\Users\\Thibaut\\.julia\\registries\\General`\n", 20 | "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m git-repo `https://github.com/JuliaRegistries/General.git`\n", 21 | "\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n", 22 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m StatsFuns ─────────────── v1.0.1\n", 23 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m DualNumbers ───────────── v0.6.8\n", 24 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m PDMats ────────────────── v0.11.16\n", 25 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m HypergeometricFunctions ─ v0.3.11\n", 26 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m QuadGK ────────────────── v2.6.0\n", 27 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m StatsModels ───────────── v0.6.33\n", 28 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m FillArrays ────────────── v0.13.5\n", 29 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m GLM ───────────────────── v1.8.1\n", 30 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m ShiftedArrays ─────────── v2.0.0\n", 31 | "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m Distributions ─────────── v0.25.78\n", 32 | "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m `C:\\Users\\Thibaut\\.julia\\environments\\v1.8\\Project.toml`\n", 33 | " \u001b[90m [31c24e10] \u001b[39m\u001b[92m+ Distributions v0.25.78\u001b[39m\n", 34 | " \u001b[90m [38e38edf] \u001b[39m\u001b[92m+ GLM v1.8.1\u001b[39m\n", 35 | "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m `C:\\Users\\Thibaut\\.julia\\environments\\v1.8\\Manifest.toml`\n", 36 | " \u001b[90m [49dc2e85] \u001b[39m\u001b[92m+ Calculus v0.5.1\u001b[39m\n", 37 | " \u001b[90m [b429d917] 
\u001b[39m\u001b[92m+ DensityInterface v0.4.0\u001b[39m\n", 38 | " \u001b[90m [31c24e10] \u001b[39m\u001b[92m+ Distributions v0.25.78\u001b[39m\n", 39 | " \u001b[90m [fa6b7ba4] \u001b[39m\u001b[92m+ DualNumbers v0.6.8\u001b[39m\n", 40 | " \u001b[90m [1a297f60] \u001b[39m\u001b[92m+ FillArrays v0.13.5\u001b[39m\n", 41 | " \u001b[90m [38e38edf] \u001b[39m\u001b[92m+ GLM v1.8.1\u001b[39m\n", 42 | " \u001b[90m [34004b35] \u001b[39m\u001b[92m+ HypergeometricFunctions v0.3.11\u001b[39m\n", 43 | " \u001b[90m [90014a1f] \u001b[39m\u001b[92m+ PDMats v0.11.16\u001b[39m\n", 44 | " \u001b[90m [1fd47b50] \u001b[39m\u001b[92m+ QuadGK v2.6.0\u001b[39m\n", 45 | " \u001b[90m [79098fc4] \u001b[39m\u001b[92m+ Rmath v0.7.0\u001b[39m\n", 46 | " \u001b[90m [1277b4bf] \u001b[39m\u001b[92m+ ShiftedArrays v2.0.0\u001b[39m\n", 47 | " \u001b[90m [4c63d2b9] \u001b[39m\u001b[92m+ StatsFuns v1.0.1\u001b[39m\n", 48 | " \u001b[90m [3eaba693] \u001b[39m\u001b[92m+ StatsModels v0.6.33\u001b[39m\n", 49 | " \u001b[90m [f50d1b31] \u001b[39m\u001b[92m+ Rmath_jll v0.3.0+0\u001b[39m\n", 50 | " \u001b[90m [4607b0f0] \u001b[39m\u001b[92m+ SuiteSparse\u001b[39m\n", 51 | "\u001b[32m\u001b[1mPrecompiling\u001b[22m\u001b[39m project...\n", 52 | "\u001b[32m ✓ \u001b[39m\u001b[90mShiftedArrays\u001b[39m\n", 53 | "\u001b[32m ✓ \u001b[39m\u001b[90mPDMats\u001b[39m\n", 54 | "\u001b[32m ✓ \u001b[39m\u001b[90mRmath_jll\u001b[39m\n", 55 | "\u001b[32m ✓ \u001b[39m\u001b[90mDensityInterface\u001b[39m\n", 56 | "\u001b[32m ✓ \u001b[39m\u001b[90mFillArrays\u001b[39m\n", 57 | "\u001b[32m ✓ \u001b[39m\u001b[90mQuadGK\u001b[39m\n", 58 | "\u001b[32m ✓ \u001b[39m\u001b[90mDualNumbers\u001b[39m\n", 59 | "\u001b[32m ✓ \u001b[39m\u001b[90mRmath\u001b[39m\n", 60 | "\u001b[32m ✓ \u001b[39m\u001b[90mHypergeometricFunctions\u001b[39m\n", 61 | "\u001b[32m ✓ \u001b[39m\u001b[90mStatsFuns\u001b[39m\n", 62 | "\u001b[32m ✓ \u001b[39m\u001b[90mStatsModels\u001b[39m\n", 63 | "\u001b[32m ✓ \u001b[39mDistributions\n", 64 | "\u001b[32m ✓ 
\u001b[39mGLM\n", 65 | " 13 dependencies successfully precompiled in 15 seconds. 224 already precompiled. 1 skipped during auto due to previous errors.\n" 66 | ] 67 | } 68 | ], 69 | "source": [ 70 | "] add Graphs Distributions DataFrames GLM ProgressMeter" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 2, 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [ 79 | "using Graphs\n", 80 | "\n", 81 | "using Distributions, DataFrames, GLM, ProgressMeter\n", 82 | "using Dates\n", 83 | "using Random: shuffle, seed!" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 3, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "seed!(20130810);" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "# 1. Introduction \n", 100 | "\n", 101 | "In [Talk of the Network](https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/3391/TalkofNetworks.pdf), the authors explore the pattern of personal communication between an individual's core friends group (strong ties) and a wider set of acquaintances (weak ties). This remarkable study is one of the first in marketing to explore the influence of social networks on the diffusion of marketing messages. The key questions investigated in this paper are:\n", 102 | "\n", 103 | "- What matters more - strong ties or weak ties?\n", 104 | "- What effect does the size of an average individual's network have?\n", 105 | "- How does advertising interact with the diffusion through weak ties and that through strong ties?\n", 106 | "\n", 107 | "In this workbook, we focus on replicating the efforts of the authors to answer the first question: do strong ties or weak ties matter more for the speed of information dissemination in a network?" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "# 2. 
Initializing the network" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "This study employs a large number of synthetic networks as substrates to study the diffusion of information. To quote the authors' logic for creating and initializing the networks:\n", 122 | "\n", 123 | "> *\"Each individual belongs to a single personal network. Each network consists of individuals who are connected by strong ties. In each period, individuals also conduct a finite number of weak tie interactions outside their personal networks... We divide the entire market equally into personal networks, in which each individual can belong to one network. In addition, in each period, every individual conducts random meetings with individuals external to his personal network.\"*\n", 124 | "\n", 125 | "Given this specification, we utilize the built-in complete graph generator from [Graphs.jl](https://juliagraphs.org/Graphs.jl/dev/core_functions/simplegraphs_generators/) to build several mini-regular networks and then allow individuals in each of these mini-networks to mingle. Our final data structure is hence a vector of several complete networks that are built based on the number of strong ties for each individual. Note that each individual in the network has a fixed number of strong ties ($s$) and weak ties ($w$)." 
126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 4, 131 | "metadata": {}, 132 | "outputs": [ 133 | { 134 | "data": { 135 | "text/plain": [ 136 | "initialize_network (generic function with 1 method)" 137 | ] 138 | }, 139 | "execution_count": 4, 140 | "metadata": {}, 141 | "output_type": "execute_result" 142 | } 143 | ], 144 | "source": [ 145 | "function initialize_network(n_nodes::Int, n_strong_ties::Int)\n", 146 | " G = [complete_graph(n_strong_ties) for g in 1:floor(Int, n_nodes/n_strong_ties)]\n", 147 | " return G\n", 148 | "end" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "# 3. Model\n", 156 | "\n", 157 | "## 3.1 Assumptions\n", 158 | "\n", 159 | "A node can be activated, i.e., an uninformed individual can become informed, in three ways: through a strong tie with probability $\\beta_s$, through a weak tie with probability $\\beta_w$ or through external marketing efforts with probability $\\alpha$. In line with conventional wisdom, the authors assume $\\alpha < \\beta_w < \\beta_s$. \n", 160 | "\n", 161 | "At timestep $t$, if an individual is connected to $m$ active strong ties and $j$ active weak ties, the probability of the individual being informed in this time step is:\n", 162 | "\n", 163 | "$$\n", 164 | "p(t) = 1 - (1- \\alpha)(1 - \\beta_w)^j(1 - \\beta_s)^m\n", 165 | "$$\n", 166 | "\n", 167 | "The outcome variable of interest is the number of time steps elapsed until 95% of the network engages." 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "## 3.2 Execution\n", 175 | "\n", 176 | "Following our earlier discussion on the construction of substrate networks, each node in the network belongs to a complete sub-network. 
In addition, at each time step each node interacts with a fixed number of weak ties chosen at random from sub-networks other than its own.\n", 177 | "\n", 178 | "*Step 1:* At $t = 0$, the status of all nodes is set to `false`\n", 179 | "\n", 180 | "*Step 2:* For each node, the probability $p(t)$ of being informed is calculated using the above equation. A random draw $U$ is made from a standard uniform distribution and compared with $p(t)$. If $U < p(t)$ the status of the node is changed to `true`\n", 181 | "\n", 182 | "*Step 3:* In each successive time step, Step 2 is repeated until 95% of the total network (of size 3000) engages" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "We now look at several helper functions that execute the above logic.\n", 190 | "\n", 191 | "### 3.2.1 Reset node status\n", 192 | "\n", 193 | "The node status is stored as a vector of `BitVector`s. At the beginning of each simulation run, we call the following function to set the status of all the nodes to `false`. " 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 5, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "reset_node_status (generic function with 1 method)" 205 | ] 206 | }, 207 | "execution_count": 5, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "function reset_node_status(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}})\n", 214 | " node_status = [falses(nv(g)) for g in G]\n", 215 | " return node_status\n", 216 | "end" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "### 3.2.2 Updating status of the nodes\n", 224 | "\n", 225 | "At each time step, we execute two tasks. First, we allow the nodes to mingle randomly with their strong ties and with weak ties from other sub-networks. 
At this point, we count the number of active strong and weak ties for each node. Then, we use this information to update the status of all the nodes in the network.\n", 226 | "\n", 227 | "The first function counts the number of active strong ties within the node's sub-network. The second function executes the \"random meetings\" with weak ties as discussed in the paper. For each node we generate a random sample (without replacement) of size $w$ from sub-networks other than its own. We then count the number of active ties in its own sub-network and among the random sample taken from the rest of the network." 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 6, 233 | "metadata": {}, 234 | "outputs": [ 235 | { 236 | "data": { 237 | "text/plain": [ 238 | "count_active_str_ties (generic function with 1 method)" 239 | ] 240 | }, 241 | "execution_count": 6, 242 | "metadata": {}, 243 | "output_type": "execute_result" 244 | } 245 | ], 246 | "source": [ 247 | "function count_active_str_ties(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}},\n", 248 | " node_network_id::Int,\n", 249 | " node::Int,\n", 250 | " node_status::Vector{BitVector})\n", 251 | " n_active_str_ties = sum([node_status[node_network_id][nbr] for nbr in neighbors(G[node_network_id], node)])\n", 252 | " return n_active_str_ties\n", 253 | "end" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 7, 259 | "metadata": {}, 260 | "outputs": [ 261 | { 262 | "data": { 263 | "text/plain": [ 264 | "random_meetings (generic function with 1 method)" 265 | ] 266 | }, 267 | "execution_count": 7, 268 | "metadata": {}, 269 | "output_type": "execute_result" 270 | } 271 | ], 272 | "source": [ 273 | "function random_meetings(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}},\n", 274 | " node_network_id::Int,\n", 275 | " node::Int,\n", 276 | " node_status::Vector{BitVector},\n", 277 | " n_weak_ties::Int)\n", 278 | " # Choose a random sample of size `n_weak_ties` from the other 
sub-networks and query\n", 279 | " # their status. We first sample the network id, and use this to sample a random node\n", 280 | " # in the sub-network defined by this id.\n", 281 | "\n", 282 | " all_network_ids = 1:length(G)\n", 283 | "\n", 284 | " other_network_ids = all_network_ids[all_network_ids .!= node_network_id]\n", 285 | " possible_weak_ties = Tuple{Int, Int}[] # (network id, node id) pairs drawn so far\n", 286 | " nsamples = 0 # start at 0 so that exactly `n_weak_ties` distinct ties are drawn\n", 287 | "\n", 288 | " while nsamples < n_weak_ties\n", 289 | " rand_network_id = sample(other_network_ids)\n", 290 | " rand_nbr = sample(vertices(G[rand_network_id]))\n", 291 | " if !((rand_network_id, rand_nbr) in possible_weak_ties)\n", 292 | " push!(possible_weak_ties, (rand_network_id, rand_nbr))\n", 293 | " nsamples += 1\n", 294 | " end\n", 295 | " end\n", 296 | "\n", 297 | " n_active_wk_ties = sum([node_status[network_id][weak_tie] for (network_id, weak_tie) in possible_weak_ties])\n", 298 | " return n_active_wk_ties\n", 299 | "end" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Finally, the function below updates the status of all the nodes at each time step by calculating each node's probability of activation. " 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 8, 312 | "metadata": {}, 313 | "outputs": [ 314 | { 315 | "data": { 316 | "text/plain": [ 317 | "update_status! 
(generic function with 1 method)" 318 | ] 319 | }, 320 | "execution_count": 8, 321 | "metadata": {}, 322 | "output_type": "execute_result" 323 | } 324 | ], 325 | "source": [ 326 | "function update_status!(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}},\n", 327 | " node_status::Vector{BitVector},\n", 328 | " n_weak_ties::Int,\n", 329 | " alpha::Float64, beta_w::Float64, beta_s::Float64)\n", 330 | " # assuming that the nodes update in random order\n", 331 | "\n", 332 | " for node_network_id in shuffle(1:length(G))\n", 333 | " for node in shuffle(vertices(G[node_network_id]))\n", 334 | " n_active_str_ties = count_active_str_ties(G, node_network_id, node, node_status)\n", 335 | " n_active_wk_ties = random_meetings(G, node_network_id, node, node_status, n_weak_ties)\n", 336 | "\n", 337 | " activation_prob = 1 - (1 - alpha) * (1 - beta_w)^n_active_wk_ties * (1 - beta_s)^n_active_str_ties\n", 338 | "\n", 339 | " if rand(Uniform()) < activation_prob\n", 340 | " node_status[node_network_id][node] = true\n", 341 | " end\n", 342 | " end\n", 343 | " end\n", 344 | "\n", 345 | " return nothing\n", 346 | "end" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "### 3.2.3 Simulation on the parameter space\n", 354 | "\n", 355 | "The function `execute_simulation` puts together the scaffolding to set up the parameter space $(s, w, \\alpha, \\beta_w, \\beta_s)$ and execute diffusion along the network. From what I can gather from the paper, one simulation was carried out at each point on the parameter space. No further details regarding the execution are mentioned except that since each parameter has 7 levels, a total of $7^5 = 16,807$ simulations were executed in a factorial design. 
In this workbook, we work on a smaller parameter space using 3 levels for each parameter.\n", 356 | "\n", 357 | "Also, I am assuming that the network is drawn at random for each run of the simulation.\n", 358 | "\n", 359 | "One more interesting thing to note: the authors mention that their simulations were written in C, so it would be interesting to compare the execution times with Julia. This is a non-standard problem that tests both the robustness of Julia types and its execution speed (maybe this will prompt someone to make a pull request!)." 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 9, 365 | "metadata": {}, 366 | "outputs": [ 367 | { 368 | "name": "stdout", 369 | "output_type": "stream", 370 | "text": [ 371 | "Number of strong ties per node (s): [5, 17, 29]\n", 372 | "Number of weak ties per node (w): [5, 17, 29]\n", 373 | "Effect of advertising (α): [0.0005, 0.00525, 0.01]\n", 374 | "Effect of weak ties (β_w): [0.005, 0.01, 0.015]\n", 375 | "Effect of strong ties (β_s): [0.01, 0.04, 0.07]\n" 376 | ] 377 | } 378 | ], 379 | "source": [ 380 | "println(\"Number of strong ties per node (s): \", floor.(Int, range(5, stop=29, length=3)))\n", 381 | "println(\"Number of weak ties per node (w): \", floor.(Int, range(5, stop=29, length=3)))\n", 382 | "println(\"Effect of advertising (α): \", collect(range(0.0005, stop=0.01, length=3)))\n", 383 | "println(\"Effect of weak ties (β_w): \", collect(range(0.005, stop=0.015, length=3)))\n", 384 | "println(\"Effect of strong ties (β_s): \", collect(range(0.01, stop=0.07, length=3)))" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 10, 390 | "metadata": {}, 391 | "outputs": [ 392 | { 393 | "data": { 394 | "text/plain": [ 395 | "((3, 3, 3, 3, 3), 243)" 396 | ] 397 | }, 398 | "execution_count": 10, 399 | "metadata": {}, 400 | "output_type": "execute_result" 401 | } 402 | ], 403 | "source": [ 404 | "parameter_space = [(s, w, alpha, beta_w, beta_s) for s in floor.(Int, 
range(5, stop=29, length=3)), \n", 405 | " w in floor.(Int, range(5, stop=29, length=3)),\n", 406 | " alpha in range(0.0005, stop=0.01, length=3),\n", 407 | " beta_w in range(0.005, stop=0.015, length=3),\n", 408 | " beta_s in range(0.01, stop=0.07, length=3)]\n", 409 | "\n", 410 | "size(parameter_space), length(parameter_space)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 11, 416 | "metadata": {}, 417 | "outputs": [ 418 | { 419 | "data": { 420 | "text/plain": [ 421 | "execute_simulation (generic function with 1 method)" 422 | ] 423 | }, 424 | "execution_count": 11, 425 | "metadata": {}, 426 | "output_type": "execute_result" 427 | } 428 | ], 429 | "source": [ 430 | "function execute_simulation(parameter_space, n_nodes::Int)\n", 431 | " # n_nodes dictates how big the network will be\n", 432 | " # We cannot pre-allocate the output since we do not know for how many time steps the simulation will\n", 433 | " # run at each setting\n", 434 | "\n", 435 | " output = DataFrame(s = Int[], w = Int[], alpha = Float64[],\n", 436 | " beta_w = Float64[], beta_s = Float64[],\n", 437 | " t = Int[], num_engaged = Int[])\n", 438 | "\n", 439 | " println(\"Beginning simulation at : \", Dates.format(now(), \"HH:MM\"))\n", 440 | " println(\"You might want to grab a cup of coffee while Julia brews the simulation...\")\n", 441 | "\n", 442 | " @showprogress 1 \"Crunching numbers while you munch...\" for (s, w, alpha, beta_w, beta_s) in parameter_space[1:end]\n", 443 | " G = initialize_network(n_nodes, s)\n", 444 | " node_status = reset_node_status(G)\n", 445 | " num_engaged = sum(sum(node_status))\n", 446 | "\n", 447 | " # Continue updates at each setting till 95% of the network engages\n", 448 | " t = 1\n", 449 | " while num_engaged < floor(Int, 0.95 * n_nodes)\n", 450 | " update_status!(G, node_status, w, alpha, beta_w, beta_s)\n", 451 | " num_engaged = sum(sum(node_status))\n", 452 | " push!(output, [s, w, alpha, beta_w, beta_s, t, num_engaged])\n", 453 | 
" t += 1\n", 454 | " end\n", 455 | " end\n", 456 | "\n", 457 | " return output\n", 458 | "end" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": 12, 464 | "metadata": {}, 465 | "outputs": [ 466 | { 467 | "name": "stdout", 468 | "output_type": "stream", 469 | "text": [ 470 | "Beginning simulation at : 16:05\n", 471 | "You might want to grab a cup of coffee while Julia brews the simulation...\n" 472 | ] 473 | }, 474 | { 475 | "name": "stderr", 476 | "output_type": "stream", 477 | "text": [ 478 | "\u001b[32mCrunching numbers while you munch... 100%|███████████████| Time: 0:02:44\u001b[39m\n" 479 | ] 480 | }, 481 | { 482 | "data": { 483 | "text/html": [ 484 | "
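<div>5654×7 DataFrame</div><div>5629 rows omitted</div><table>
<thead><tr><th>Row</th><th>s</th><th>w</th><th>alpha</th><th>beta_w</th><th>beta_s</th><th>t</th><th>num_engaged</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Float64</th><th>Float64</th><th>Float64</th><th>Int64</th><th>Int64</th></tr></thead>
<tbody><tr><td>1</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>1</td><td>1</td></tr>
<tr><td>2</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>2</td><td>6</td></tr>
<tr><td>3</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>3</td><td>9</td></tr>
<tr><td>4</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>4</td><td>11</td></tr>
<tr><td>5</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>5</td><td>11</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>6</td><td>11</td></tr>
<tr><td>7</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>7</td><td>14</td></tr>
<tr><td>8</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>8</td><td>16</td></tr>
<tr><td>9</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>9</td><td>19</td></tr>
<tr><td>10</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>10</td><td>21</td></tr>
<tr><td>11</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>11</td><td>23</td></tr>
<tr><td>12</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>12</td><td>29</td></tr>
<tr><td>13</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>13</td><td>31</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>5643</td><td>5</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>10</td><td>2781</td></tr>
<tr><td>5644</td><td>5</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>11</td><td>2890</td></tr>
<tr><td>5645</td><td>17</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>1</td><td>63</td></tr>
<tr><td>5646</td><td>17</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>2</td><td>340</td></tr>
<tr><td>5647</td><td>17</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>3</td><td>1002</td></tr>
<tr><td>5648</td><td>17</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>4</td><td>1949</td></tr>
<tr><td>5649</td><td>17</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>5</td><td>2613</td></tr>
<tr><td>5650</td><td>17</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>6</td><td>2902</td></tr>
<tr><td>5651</td><td>29</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>1</td><td>238</td></tr>
<tr><td>5652</td><td>29</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>2</td><td>1146</td></tr>
<tr><td>5653</td><td>29</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>3</td><td>2373</td></tr>
<tr><td>5654</td><td>29</td><td>29</td><td>0.01</td><td>0.015</td><td>0.07</td><td>4</td><td>2921</td></tr></tbody></table>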
" 485 | ], 486 | "text/latex": [ 487 | "\\begin{tabular}{r|ccccccc}\n", 488 | "\t& s & w & alpha & beta\\_w & beta\\_s & t & num\\_engaged\\\\\n", 489 | "\t\\hline\n", 490 | "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Int64 & Int64\\\\\n", 491 | "\t\\hline\n", 492 | "\t1 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 1 & 1 \\\\\n", 493 | "\t2 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 2 & 6 \\\\\n", 494 | "\t3 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 3 & 9 \\\\\n", 495 | "\t4 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 4 & 11 \\\\\n", 496 | "\t5 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 5 & 11 \\\\\n", 497 | "\t6 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 6 & 11 \\\\\n", 498 | "\t7 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 7 & 14 \\\\\n", 499 | "\t8 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 8 & 16 \\\\\n", 500 | "\t9 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 9 & 19 \\\\\n", 501 | "\t10 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 10 & 21 \\\\\n", 502 | "\t11 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 11 & 23 \\\\\n", 503 | "\t12 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 12 & 29 \\\\\n", 504 | "\t13 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 13 & 31 \\\\\n", 505 | "\t14 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 14 & 33 \\\\\n", 506 | "\t15 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 15 & 35 \\\\\n", 507 | "\t16 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 16 & 40 \\\\\n", 508 | "\t17 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 17 & 45 \\\\\n", 509 | "\t18 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 18 & 47 \\\\\n", 510 | "\t19 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 19 & 52 \\\\\n", 511 | "\t20 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 20 & 52 \\\\\n", 512 | "\t21 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 21 & 57 \\\\\n", 513 | "\t22 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 22 & 61 \\\\\n", 514 | "\t23 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 23 & 68 \\\\\n", 515 | "\t24 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 24 & 74 \\\\\n", 516 | "\t25 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 25 & 79 \\\\\n", 517 | "\t26 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 26 & 85 \\\\\n", 518 | "\t27 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 27 & 91 \\\\\n", 519 | "\t28 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 28 & 
97 \\\\\n", 520 | "\t29 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 29 & 102 \\\\\n", 521 | "\t30 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 30 & 111 \\\\\n", 522 | "\t$\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ \\\\\n", 523 | "\\end{tabular}\n" 524 | ], 525 | "text/plain": [ 526 | "\u001b[1m5654×7 DataFrame\u001b[0m\n", 527 | "\u001b[1m Row \u001b[0m│\u001b[1m s \u001b[0m\u001b[1m w \u001b[0m\u001b[1m alpha \u001b[0m\u001b[1m beta_w \u001b[0m\u001b[1m beta_s \u001b[0m\u001b[1m t \u001b[0m\u001b[1m num_engaged \u001b[0m\n", 528 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n", 529 | "──────┼─────────────────────────────────────────────────────────────\n", 530 | " 1 │ 5 5 0.0005 0.005 0.01 1 1\n", 531 | " 2 │ 5 5 0.0005 0.005 0.01 2 6\n", 532 | " 3 │ 5 5 0.0005 0.005 0.01 3 9\n", 533 | " 4 │ 5 5 0.0005 0.005 0.01 4 11\n", 534 | " 5 │ 5 5 0.0005 0.005 0.01 5 11\n", 535 | " 6 │ 5 5 0.0005 0.005 0.01 6 11\n", 536 | " 7 │ 5 5 0.0005 0.005 0.01 7 14\n", 537 | " 8 │ 5 5 0.0005 0.005 0.01 8 16\n", 538 | " 9 │ 5 5 0.0005 0.005 0.01 9 19\n", 539 | " 10 │ 5 5 0.0005 0.005 0.01 10 21\n", 540 | " 11 │ 5 5 0.0005 0.005 0.01 11 23\n", 541 | " ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮\n", 542 | " 5645 │ 17 29 0.01 0.015 0.07 1 63\n", 543 | " 5646 │ 17 29 0.01 0.015 0.07 2 340\n", 544 | " 5647 │ 17 29 0.01 0.015 0.07 3 1002\n", 545 | " 5648 │ 17 29 0.01 0.015 0.07 4 1949\n", 546 | " 5649 │ 17 29 0.01 0.015 0.07 5 2613\n", 547 | " 5650 │ 17 29 0.01 0.015 0.07 6 2902\n", 548 | " 5651 │ 29 29 0.01 0.015 0.07 1 238\n", 549 | " 5652 │ 29 29 0.01 0.015 0.07 2 1146\n", 550 | " 5653 │ 29 29 0.01 0.015 0.07 3 2373\n", 551 | " 5654 │ 29 29 0.01 0.015 0.07 4 2921\n", 552 | "\u001b[36m 5633 rows omitted\u001b[0m" 553 | ] 554 | }, 555 | "execution_count": 12, 556 | "metadata": {}, 557 | "output_type": "execute_result" 558 | } 559 | ], 560 | "source": 
[ 561 | "results = execute_simulation(parameter_space, 3000)" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": {}, 567 | "source": [ 568 | "# 4. Discussion\n", 569 | "\n", 570 | "To answer the research questions, the authors resort to simple linear regression. \n", 571 | "\n", 572 | "Since our focus in this workbook is on highlighting the strengths of the JuliaGraphs ecosystem, we keep the regression modeling at the most basic level.\n", 573 | "\n", 574 | "As discussed earlier, the outcome is the time taken for 95% of the network to engage with the message. The features used to predict this outcome are $s$, $w$, $\\alpha$, $\\beta_w$ and $\\beta_s$. " 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": 13, 580 | "metadata": {}, 581 | "outputs": [ 582 | { 583 | "data": { 584 | "text/html": [ 585 | "
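<div>10×7 DataFrame</div><table>
<thead><tr><th>Row</th><th>s</th><th>w</th><th>alpha</th><th>beta_w</th><th>beta_s</th><th>t</th><th>num_engaged</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Float64</th><th>Float64</th><th>Float64</th><th>Int64</th><th>Int64</th></tr></thead>
<tbody><tr><td>1</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>1</td><td>1</td></tr>
<tr><td>2</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>2</td><td>6</td></tr>
<tr><td>3</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>3</td><td>9</td></tr>
<tr><td>4</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>4</td><td>11</td></tr>
<tr><td>5</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>5</td><td>11</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>6</td><td>11</td></tr>
<tr><td>7</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>7</td><td>14</td></tr>
<tr><td>8</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>8</td><td>16</td></tr>
<tr><td>9</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>9</td><td>19</td></tr>
<tr><td>10</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>10</td><td>21</td></tr></tbody></table>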
" 586 | ], 587 | "text/latex": [ 588 | "\\begin{tabular}{r|ccccccc}\n", 589 | "\t& s & w & alpha & beta\\_w & beta\\_s & t & num\\_engaged\\\\\n", 590 | "\t\\hline\n", 591 | "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Int64 & Int64\\\\\n", 592 | "\t\\hline\n", 593 | "\t1 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 1 & 1 \\\\\n", 594 | "\t2 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 2 & 6 \\\\\n", 595 | "\t3 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 3 & 9 \\\\\n", 596 | "\t4 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 4 & 11 \\\\\n", 597 | "\t5 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 5 & 11 \\\\\n", 598 | "\t6 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 6 & 11 \\\\\n", 599 | "\t7 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 7 & 14 \\\\\n", 600 | "\t8 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 8 & 16 \\\\\n", 601 | "\t9 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 9 & 19 \\\\\n", 602 | "\t10 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 10 & 21 \\\\\n", 603 | "\\end{tabular}\n" 604 | ], 605 | "text/plain": [ 606 | "\u001b[1m10×7 DataFrame\u001b[0m\n", 607 | "\u001b[1m Row \u001b[0m│\u001b[1m s \u001b[0m\u001b[1m w \u001b[0m\u001b[1m alpha \u001b[0m\u001b[1m beta_w \u001b[0m\u001b[1m beta_s \u001b[0m\u001b[1m t \u001b[0m\u001b[1m num_engaged \u001b[0m\n", 608 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n", 609 | "─────┼─────────────────────────────────────────────────────────────\n", 610 | " 1 │ 5 5 0.0005 0.005 0.01 1 1\n", 611 | " 2 │ 5 5 0.0005 0.005 0.01 2 6\n", 612 | " 3 │ 5 5 0.0005 0.005 0.01 3 9\n", 613 | " 4 │ 5 5 0.0005 0.005 0.01 4 11\n", 614 | " 5 │ 5 5 0.0005 0.005 0.01 5 11\n", 615 | " 6 │ 5 5 0.0005 0.005 0.01 6 11\n", 616 | " 7 │ 5 5 0.0005 0.005 0.01 7 14\n", 617 | " 8 │ 5 5 0.0005 0.005 0.01 8 16\n", 618 | " 9 │ 5 5 0.0005 0.005 0.01 9 19\n", 619 | " 10 │ 5 5 0.0005 0.005 0.01 10 21" 620 | ] 621 | }, 622 | "execution_count": 13, 623 | "metadata": {}, 624 | "output_type": "execute_result" 
625 | } 626 | ], 627 | "source": [ 628 | "first(results, 10)" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": {}, 634 | "source": [ 635 | "To build the data required for the linear modeling, we group the data by each parameter setting and calculate the time the network takes to reach 95% activation." 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 14, 641 | "metadata": {}, 642 | "outputs": [ 643 | { 644 | "data": { 645 | "text/html": [ 646 | "
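<div>10×6 DataFrame</div><table>
<thead><tr><th>Row</th><th>s</th><th>w</th><th>alpha</th><th>beta_w</th><th>beta_s</th><th>T95</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Float64</th><th>Float64</th><th>Float64</th><th>Int64</th></tr></thead>
<tbody><tr><td>1</td><td>5</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>154</td></tr>
<tr><td>2</td><td>17</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>64</td></tr>
<tr><td>3</td><td>29</td><td>5</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>47</td></tr>
<tr><td>4</td><td>5</td><td>17</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>78</td></tr>
<tr><td>5</td><td>17</td><td>17</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>43</td></tr>
<tr><td>6</td><td>29</td><td>17</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>32</td></tr>
<tr><td>7</td><td>5</td><td>29</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>47</td></tr>
<tr><td>8</td><td>17</td><td>29</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>33</td></tr>
<tr><td>9</td><td>29</td><td>29</td><td>0.0005</td><td>0.005</td><td>0.01</td><td>25</td></tr>
<tr><td>10</td><td>5</td><td>5</td><td>0.00525</td><td>0.005</td><td>0.01</td><td>93</td></tr></tbody></table>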
" 647 | ], 648 | "text/latex": [ 649 | "\\begin{tabular}{r|cccccc}\n", 650 | "\t& s & w & alpha & beta\\_w & beta\\_s & T95\\\\\n", 651 | "\t\\hline\n", 652 | "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Int64\\\\\n", 653 | "\t\\hline\n", 654 | "\t1 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 154 \\\\\n", 655 | "\t2 & 17 & 5 & 0.0005 & 0.005 & 0.01 & 64 \\\\\n", 656 | "\t3 & 29 & 5 & 0.0005 & 0.005 & 0.01 & 47 \\\\\n", 657 | "\t4 & 5 & 17 & 0.0005 & 0.005 & 0.01 & 78 \\\\\n", 658 | "\t5 & 17 & 17 & 0.0005 & 0.005 & 0.01 & 43 \\\\\n", 659 | "\t6 & 29 & 17 & 0.0005 & 0.005 & 0.01 & 32 \\\\\n", 660 | "\t7 & 5 & 29 & 0.0005 & 0.005 & 0.01 & 47 \\\\\n", 661 | "\t8 & 17 & 29 & 0.0005 & 0.005 & 0.01 & 33 \\\\\n", 662 | "\t9 & 29 & 29 & 0.0005 & 0.005 & 0.01 & 25 \\\\\n", 663 | "\t10 & 5 & 5 & 0.00525 & 0.005 & 0.01 & 93 \\\\\n", 664 | "\\end{tabular}\n" 665 | ], 666 | "text/plain": [ 667 | "\u001b[1m10×6 DataFrame\u001b[0m\n", 668 | "\u001b[1m Row \u001b[0m│\u001b[1m s \u001b[0m\u001b[1m w \u001b[0m\u001b[1m alpha \u001b[0m\u001b[1m beta_w \u001b[0m\u001b[1m beta_s \u001b[0m\u001b[1m T95 \u001b[0m\n", 669 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Int64 \u001b[0m\n", 670 | "─────┼────────────────────────────────────────────────\n", 671 | " 1 │ 5 5 0.0005 0.005 0.01 154\n", 672 | " 2 │ 17 5 0.0005 0.005 0.01 64\n", 673 | " 3 │ 29 5 0.0005 0.005 0.01 47\n", 674 | " 4 │ 5 17 0.0005 0.005 0.01 78\n", 675 | " 5 │ 17 17 0.0005 0.005 0.01 43\n", 676 | " 6 │ 29 17 0.0005 0.005 0.01 32\n", 677 | " 7 │ 5 29 0.0005 0.005 0.01 47\n", 678 | " 8 │ 17 29 0.0005 0.005 0.01 33\n", 679 | " 9 │ 29 29 0.0005 0.005 0.01 25\n", 680 | " 10 │ 5 5 0.00525 0.005 0.01 93" 681 | ] 682 | }, 683 | "execution_count": 14, 684 | "metadata": {}, 685 | "output_type": "execute_result" 686 | } 687 | ], 688 | "source": [ 689 | "all_engaged = combine(groupby(results, [:s, :w, :alpha, :beta_w, :beta_s]), df 
-> DataFrame(T95 = maximum(df[!,:t])));\n", 690 | "first(all_engaged, 10)" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": {}, 696 | "source": [ 697 | "We then run a simple linear model on the data" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": 15, 703 | "metadata": {}, 704 | "outputs": [ 705 | { 706 | "data": { 707 | "text/plain": [ 708 | "StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}\n", 709 | "\n", 710 | "T95 ~ 1 + s + w + alpha + beta_s + beta_w\n", 711 | "\n", 712 | "Coefficients:\n", 713 | "────────────────────────────────────────────────────────────────────────────────────\n", 714 | " Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%\n", 715 | "────────────────────────────────────────────────────────────────────────────────────\n", 716 | "(Intercept) 84.3588 3.04531 27.70 <1e-75 78.3594 90.3581\n", 717 | "s -1.01132 0.0742558 -13.62 <1e-30 -1.1576 -0.865031\n", 718 | "w -0.824588 0.0742558 -11.10 <1e-22 -0.970874 -0.678303\n", 719 | "alpha -1374.92 187.594 -7.33 <1e-11 -1744.48 -1005.35\n", 720 | "beta_s -292.798 29.7023 -9.86 <1e-18 -351.313 -234.284\n", 721 | "beta_w -1095.06 178.214 -6.14 <1e-08 -1446.15 -743.976\n", 722 | "────────────────────────────────────────────────────────────────────────────────────" 723 | ] 724 | }, 725 | "execution_count": 15, 726 | "metadata": {}, 727 | "output_type": "execute_result" 728 | } 729 | ], 730 | "source": [ 731 | "ols = lm(@formula(T95 ~ s + w + alpha + beta_s + beta_w), all_engaged)" 732 | ] 733 | }, 734 | { 735 | "cell_type": "code", 736 | "execution_count": 16, 737 | "metadata": {}, 738 | "outputs": [ 739 | { 740 | "data": { 741 | "text/plain": [ 742 | "0.6773101916389891" 743 | ] 744 | }, 745 | "execution_count": 16, 746 | "metadata": {}, 747 | "output_type": "execute_result" 748 | } 749 | ], 750 | "source": [ 751 
| "r2(ols)" 752 | ] 753 | }, 754 | { 755 | "cell_type": "markdown", 756 | "metadata": {}, 757 | "source": [ 758 | "This is a rather strong finding. The speed of information diffusion is affected about as strongly by weak ties as by strong ties: the fitted coefficients on $s$ and $w$ are of similar magnitude ($-1.01$ and $-0.82$). As the authors note, the surprising aspect of this study is that the effect of weak ties is rather strong even though the model assumes $\\beta_w < \\beta_s$." 759 | ] 760 | } 761 | ], 762 | "metadata": { 763 | "kernelspec": { 764 | "display_name": "Julia 1.8.2", 765 | "language": "julia", 766 | "name": "julia-1.8" 767 | }, 768 | "language_info": { 769 | "file_extension": ".jl", 770 | "mimetype": "application/julia", 771 | "name": "julia", 772 | "version": "1.8.2" 773 | } 774 | }, 775 | "nbformat": 4, 776 | "nbformat_minor": 2 777 | } 778 | --------------------------------------------------------------------------------