A graph embedding is a representation of graph vertices in a low-dimensional space that approximately preserves properties such as distances between nodes. Vertex sequence-based embedding procedures use features extracted from linear sequences of nodes to create embeddings with a neural network. In this paper, we propose diffusion graphs as a method to rapidly generate vertex sequences for network embedding. The method is computationally more efficient than previous approaches because its sequence generation is simpler, and it produces more accurate results. In experiments, we found that its performance relative to other methods improves as the edge density of the graph increases. In a community detection task, clustering nodes in the resulting embedding space gives better results than doing so with other sequence-based embedding methods.
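The abstract describes diffusion graphs only at a high level. The sketch below shows one plausible reading of the sequence generator: a tree is grown from a source node by repeatedly attaching an unvisited neighbour of a randomly chosen tree node, and the tree is then linearized with an Euler tour of its doubled edges. It assumes an undirected graph given as a Python dict of neighbour lists; `diffuse`, `euler_tour`, and `diffusion_sequence` are hypothetical names, and this is a simplified illustration, not the repository's `SubGraphComponents` code.

```python
import random
from collections import defaultdict


def diffuse(adj, source, cardinality, rng):
    """Grow a diffusion tree from `source` until it holds `cardinality` nodes
    or the source's connected component is exhausted (illustrative sketch)."""
    nodes = [source]
    in_tree = {source}
    tree = defaultdict(list)  # adjacency lists of the diffusion tree
    while len(nodes) < cardinality:
        # Only tree nodes that still have a neighbour outside the tree can grow.
        expandable = [u for u in nodes if any(v not in in_tree for v in adj[u])]
        if not expandable:
            break  # component is smaller than the requested cardinality
        u = rng.choice(expandable)
        v = rng.choice([w for w in adj[u] if w not in in_tree])
        nodes.append(v)
        in_tree.add(v)
        tree[u].append(v)
        tree[v].append(u)
    return tree


def euler_tour(tree, root):
    """Linearize a tree by walking every edge once in each direction,
    i.e. an Euler circuit of the tree with each edge doubled."""
    tour = [root]

    def visit(u, parent):
        for v in tree[u]:
            if v != parent:
                tour.append(v)
                visit(v, u)
                tour.append(u)  # walk back along the doubled edge

    visit(root, None)
    return tour


def diffusion_sequence(adj, source, cardinality, seed=None):
    """Hypothetical helper: one vertex sequence as a list of string tokens."""
    rng = random.Random(seed)
    tree = diffuse(adj, source, cardinality, rng)
    return [str(n) for n in euler_tour(tree, source)]
```

Node IDs are returned as strings because word2vec-style models are normally trained on string tokens; under the assumptions above, repeating the call for every source node would yield a corpus of the kind the script later passes to `Word2Vec`.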
The code takes an input graph in a CSV file. Every row indicates an edge between two nodes, separated by a comma. The first row is a header. A sample graph for the `Facebook Restaurants` dataset is included in the `data/` directory.
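A minimal standard-library sketch of reading this format into an undirected adjacency structure, assuming integer node IDs as in the sample file; `read_edge_list` is a hypothetical helper, not the repository's own loader.

```python
import csv
import io
from collections import defaultdict


def read_edge_list(lines):
    """Build an undirected adjacency dict from edge-list CSV lines.

    The first row is treated as a header (e.g. `node_1,node_2`), and every
    following row holds the two endpoints of one edge.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    adj = defaultdict(set)
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        u, v = int(row[0]), int(row[1])
        adj[u].add(v)
        adj[v].add(u)  # edges are undirected
    return adj


# Tiny in-memory sample in the same format as data/restaurant_edges.csv.
sample = io.StringIO("node_1,node_2\n0,276\n0,58\n1,265\n")
adj = read_edge_list(sample)
```

For the real file, `with open("data/restaurant_edges.csv", newline="") as f: adj = read_edge_list(f)` would work the same way. The sample file also contains a few self-loops (for example `265,265`); this sketch keeps them as-is.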
### Options

Learning of the embedding is handled by the `src/diffusion_2_vec.py` script, which provides the following command-line arguments.
#### Input and output options

```
  --input    STR    Path to the edge list csv.         Default is `data/restaurant_edges.csv`.
  --output   STR    Path to the embedding features.    Default is `emb/restaurant.csv`.
```

#### Model options

```
  --model                    STR      Embedding procedure.                    Default is `non-pooled`.
  --dimensions               INT      Number of embedding dimensions.         Default is 128.
  --vertex-set-cardinality   INT      Number of nodes per diffusion tree.     Default is 80.
  --num-diffusions           INT      Number of diffusions per source node.   Default is 10.
  --window-size              INT      Context size for optimization.          Default is 10.
  --iter                     INT      Number of ASGD iterations.              Default is 1.
  --workers                  INT      Number of cores.                        Default is 4.
  --alpha                    FLOAT    Initial learning rate.                  Default is 0.025.
```

### Examples

The following commands learn a graph embedding and write it to disk. The first column in the embedding file is the node ID.
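The embedding file written by these commands can be read back for downstream tasks such as clustering or node similarity. A minimal standard-library sketch, assuming the file has a single header row followed by one row per node (node ID first, then the coordinates); `load_embedding`, `cosine`, and the sample column names `id,x_0,x_1` are hypothetical.

```python
import csv
import io
import math


def load_embedding(lines, has_header=True):
    """Map node ID -> embedding vector from an embedding CSV.

    Assumes the first column is the node ID and the remaining columns are
    the embedding coordinates.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the column-name row
    embedding = {}
    for row in reader:
        if not row:
            continue
        node_id = int(float(row[0]))  # tolerate IDs written as e.g. "3.0"
        embedding[node_id] = [float(x) for x in row[1:]]
    return embedding


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical two-node, two-dimensional file for illustration.
sample = io.StringIO("id,x_0,x_1\n0,1.0,0.0\n1,0.0,2.0\n")
emb = load_embedding(sample)
```

Pass `has_header=False` if the output file turns out to have no header row.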
Creating an embedding of the default dataset with the default hyperparameter settings.

```
python src/diffusion_2_vec.py
```

Creating an embedding of another dataset, `Facebook Politicians`.

```
python src/diffusion_2_vec.py --input data/politician_edges.csv --output output/politician.csv
```

Creating an embedding of the default dataset in 32 dimensions, with 5 sequences per source node and a maximal vertex set cardinality of 40.
```
python src/diffusion_2_vec.py --dimensions 32 --num-diffusions 5 --vertex-set-cardinality 40
```

-----------------------------------------------------------------

**License**

- [GNU](https://github.com/benedekrozemberczki/diff2vec/blob/master/LICENSE)

-----------------------------------------------------------------

--------------------------------------------------------------------------------
/data/restaurant_edges.csv:
--------------------------------------------------------------------------------
1 | node_1,node_2 2 | 0,276 3 | 0,58 4 | 0,132 5 | 0,603 6 | 0,398 7 | 0,555 8 | 1,265 9 | 1,611 10 | 2,265 11 | 2,182 12 | 2,345 13 | 3,608 14 | 3,377 15 | 3,40 16 | 3,352 17 | 3,450 18 | 3,484 19 | 3,299 20 | 3,65 21 | 3,185 22 | 3,228 23 | 557,227 24 | 557,182 25 | 557,90 26 | 557,151 27 | 557,394 28 | 4,265 29 | 4,287 30 | 4,254 31 | 4,336 32 | 5,503 33 | 6,429 34 | 6,163 35 | 6,486 36 | 6,478 37 | 6,518 38 | 7,293 39 | 7,265 40 | 7,305 41 | 7,339 42 | 7,35 43 | 7,444 44 | 7,583 45 | 8,278 46 | 8,434 47 | 8,73 48 | 8,98 49 | 8,369 50 | 9,317 51 | 9,288 52 | 9,206 53 | 9,289 54 | 9,249 55 | 9,593 56 | 9,434 57 | 9,254 58 | 9,550 59 | 9,131 60 | 9,446 61 | 9,89 62 | 9,90 63 | 9,265 64 | 9,15 65 | 9,340 66 | 9,248 67 | 9,229 68 | 9,56 69 | 9,570 70 | 9,189 71 | 9,611 72 | 9,107 73 | 9,23 74 | 9,465 75 | 9,67 76 | 9,351 77 | 9,498 78 | 10,393 79 | 10,258 80 | 11,212 81 | 11,485 82 | 11,264 83 | 11,380 84 | 11,174 85 | 11,259 86 | 12,548 87 | 12,618 88 | 12,310 89 | 13,352 90 | 14,264 91 | 14,326 92 | 15,265 93 | 15,70 94 | 15,340 95 | 15,67 96 | 15,454 97 | 15,505 98 | 15,43 99 | 15,432 100 | 15,56 101 | 16,516 102 | 16,119 103 | 16,400 104 | 16,545 105 | 16,334 106 | 16,237 107 | 16,374 108 | 16,117 109 | 16,274 110 | 16,613 111 | 16,466 112 | 16,75 113 | 17,185 114 | 17,608 115 | 17,450 116 | 17,299 117 | 17,65 118 | 18,265 119 | 408,265 120 | 408,418 121 | 408,581 122 | 408,143 123 | 408,599
124 | 20,324 125 | 20,355 126 | 20,481 127 | 20,238 128 | 20,242 129 | 20,160 130 | 21,516 131 | 21,441 132 | 21,329 133 | 21,334 134 | 22,270 135 | 22,187 136 | 22,104 137 | 22,22 138 | 22,420 139 | 23,434 140 | 23,58 141 | 23,128 142 | 23,257 143 | 23,288 144 | 23,277 145 | 23,182 146 | 23,340 147 | 23,157 148 | 23,227 149 | 23,343 150 | 23,54 151 | 23,67 152 | 24,547 153 | 24,230 154 | 24,502 155 | 24,306 156 | 24,501 157 | 24,591 158 | 25,363 159 | 26,230 160 | 27,424 161 | 28,147 162 | 28,230 163 | 28,527 164 | 29,49 165 | 30,126 166 | 31,285 167 | 31,424 168 | 31,543 169 | 31,357 170 | 31,507 171 | 31,508 172 | 31,491 173 | 31,169 174 | 31,595 175 | 31,41 176 | 31,63 177 | 31,327 178 | 31,266 179 | 31,48 180 | 31,179 181 | 31,546 182 | 31,518 183 | 31,142 184 | 31,449 185 | 31,269 186 | 31,524 187 | 31,488 188 | 31,515 189 | 31,193 190 | 31,315 191 | 32,89 192 | 32,265 193 | 32,183 194 | 32,611 195 | 32,364 196 | 32,581 197 | 32,159 198 | 32,300 199 | 357,543 200 | 357,504 201 | 357,118 202 | 357,430 203 | 357,524 204 | 357,508 205 | 357,164 206 | 357,165 207 | 357,127 208 | 357,395 209 | 357,596 210 | 357,79 211 | 357,515 212 | 357,48 213 | 357,448 214 | 357,179 215 | 357,518 216 | 357,335 217 | 357,307 218 | 357,491 219 | 357,572 220 | 357,313 221 | 357,63 222 | 357,45 223 | 357,532 224 | 526,552 225 | 526,70 226 | 526,517 227 | 526,181 228 | 526,558 229 | 526,519 230 | 526,352 231 | 526,462 232 | 526,224 233 | 526,235 234 | 526,536 235 | 526,403 236 | 526,363 237 | 526,195 238 | 526,388 239 | 526,498 240 | 526,87 241 | 517,181 242 | 517,403 243 | 517,224 244 | 517,235 245 | 34,89 246 | 34,419 247 | 34,254 248 | 34,265 249 | 34,373 250 | 34,208 251 | 35,536 252 | 35,383 253 | 35,289 254 | 35,389 255 | 35,131 256 | 35,597 257 | 35,338 258 | 35,87 259 | 35,89 260 | 35,90 261 | 35,265 262 | 35,601 263 | 35,56 264 | 35,58 265 | 35,611 266 | 35,439 267 | 35,615 268 | 35,195 269 | 35,198 270 | 35,618 271 | 35,243 272 | 36,490 273 | 37,389 274 | 38,518 275 | 
411,449 276 | 40,344 277 | 40,333 278 | 40,352 279 | 40,538 280 | 40,484 281 | 40,240 282 | 41,434 283 | 41,265 284 | 41,594 285 | 41,397 286 | 41,62 287 | 41,465 288 | 41,193 289 | 41,109 290 | 41,611 291 | 41,321 292 | 41,121 293 | 41,282 294 | 42,467 295 | 43,253 296 | 43,102 297 | 43,265 298 | 43,360 299 | 43,577 300 | 43,39 301 | 43,340 302 | 43,229 303 | 43,83 304 | 43,505 305 | 43,67 306 | 44,265 307 | 180,496 308 | 180,331 309 | 180,209 310 | 180,520 311 | 46,417 312 | 46,550 313 | 46,86 314 | 46,317 315 | 46,578 316 | 46,110 317 | 46,67 318 | 47,475 319 | 47,500 320 | 47,508 321 | 48,518 322 | 48,524 323 | 48,164 324 | 49,293 325 | 49,512 326 | 49,71 327 | 49,196 328 | 49,188 329 | 50,544 330 | 50,70 331 | 50,171 332 | 50,483 333 | 50,131 334 | 50,289 335 | 50,182 336 | 50,340 337 | 50,183 338 | 50,562 339 | 50,578 340 | 50,389 341 | 50,67 342 | 50,330 343 | 52,113 344 | 52,494 345 | 53,122 346 | 54,265 347 | 54,275 348 | 54,89 349 | 54,536 350 | 55,210 351 | 55,158 352 | 55,154 353 | 55,437 354 | 55,252 355 | 55,78 356 | 55,452 357 | 55,214 358 | 56,68 359 | 56,383 360 | 56,116 361 | 56,70 362 | 56,597 363 | 56,505 364 | 56,389 365 | 56,479 366 | 56,359 367 | 56,128 368 | 56,439 369 | 56,130 370 | 56,131 371 | 56,217 372 | 56,338 373 | 56,584 374 | 56,87 375 | 56,89 376 | 56,90 377 | 56,265 378 | 56,601 379 | 56,182 380 | 56,340 381 | 56,375 382 | 56,603 383 | 56,229 384 | 56,198 385 | 56,102 386 | 56,345 387 | 56,289 388 | 56,611 389 | 56,60 390 | 56,235 391 | 56,577 392 | 56,350 393 | 56,465 394 | 56,238 395 | 56,195 396 | 56,616 397 | 56,617 398 | 56,244 399 | 56,351 400 | 56,67 401 | 56,151 402 | 57,264 403 | 57,148 404 | 58,89 405 | 58,90 406 | 58,265 407 | 58,601 408 | 58,434 409 | 58,537 410 | 58,350 411 | 58,603 412 | 58,611 413 | 58,616 414 | 58,302 415 | 58,87 416 | 59,265 417 | 59,116 418 | 60,265 419 | 60,350 420 | 60,439 421 | 60,389 422 | 60,198 423 | 61,244 424 | 62,392 425 | 62,285 426 | 62,265 427 | 62,266 428 | 62,169 429 | 62,449 430 | 
62,269 431 | 62,193 432 | 62,611 433 | 62,282 434 | 63,139 435 | 63,448 436 | 63,572 437 | 63,518 438 | 63,504 439 | 63,602 440 | 63,45 441 | 63,164 442 | 64,498 443 | 65,608 444 | 65,333 445 | 65,450 446 | 65,299 447 | 65,185 448 | 65,240 449 | 66,141 450 | 67,245 451 | 67,419 452 | 67,317 453 | 67,68 454 | 67,70 455 | 67,291 456 | 67,249 457 | 67,505 458 | 67,107 459 | 67,389 460 | 67,593 461 | 67,253 462 | 67,434 463 | 67,323 464 | 67,550 465 | 67,131 466 | 67,440 467 | 67,445 468 | 67,198 469 | 67,89 470 | 67,90 471 | 67,181 472 | 67,265 473 | 67,288 474 | 67,338 475 | 67,372 476 | 67,340 477 | 67,183 478 | 67,603 479 | 67,248 480 | 67,325 481 | 67,229 482 | 67,254 483 | 67,146 484 | 67,570 485 | 67,189 486 | 67,102 487 | 67,289 488 | 67,611 489 | 67,352 490 | 67,577 491 | 67,410 492 | 67,465 493 | 67,578 494 | 67,531 495 | 67,469 496 | 67,351 497 | 67,416 498 | 67,498 499 | 68,265 500 | 68,288 501 | 68,70 502 | 68,340 503 | 68,603 504 | 68,505 505 | 68,198 506 | 68,242 507 | 69,253 508 | 69,385 509 | 69,498 510 | 70,317 511 | 70,383 512 | 70,205 513 | 70,288 514 | 70,249 515 | 70,72 516 | 70,505 517 | 70,107 518 | 70,593 519 | 70,434 520 | 70,323 521 | 70,536 522 | 70,550 523 | 70,439 524 | 70,170 525 | 70,473 526 | 70,116 527 | 70,599 528 | 70,89 529 | 70,90 530 | 70,181 531 | 70,265 532 | 70,601 533 | 70,269 534 | 70,403 535 | 70,340 536 | 70,248 537 | 70,342 538 | 70,229 539 | 70,275 540 | 70,570 541 | 70,102 542 | 70,611 543 | 70,235 544 | 70,577 545 | 70,465 546 | 70,189 547 | 70,254 548 | 70,616 549 | 70,498 550 | 70,244 551 | 71,265 552 | 71,495 553 | 72,269 554 | 72,240 555 | 419,570 556 | 419,89 557 | 419,254 558 | 419,265 559 | 419,155 560 | 419,128 561 | 419,288 562 | 419,465 563 | 419,343 564 | 485,525 565 | 485,212 566 | 485,115 567 | 485,264 568 | 485,145 569 | 485,174 570 | 485,259 571 | 485,241 572 | 485,380 573 | 75,516 574 | 75,119 575 | 75,400 576 | 75,334 577 | 75,237 578 | 75,117 579 | 75,274 580 | 75,613 581 | 75,75 582 | 77,379 583 | 
78,331 584 | 78,210 585 | 78,154 586 | 78,437 587 | 78,252 588 | 78,214 589 | 78,452 590 | 78,158 591 | 79,546 592 | 79,518 593 | 79,507 594 | 79,595 595 | 79,414 596 | 79,85 597 | 79,491 598 | 80,138 599 | 81,290 600 | 81,563 601 | 82,261 602 | 82,584 603 | 82,409 604 | 82,171 605 | 82,451 606 | 82,389 607 | 83,89 608 | 83,254 609 | 83,317 610 | 83,265 611 | 83,288 612 | 83,465 613 | 84,374 614 | 84,545 615 | 84,329 616 | 85,518 617 | 85,335 618 | 85,507 619 | 85,292 620 | 86,578 621 | 87,289 622 | 87,389 623 | 87,479 624 | 87,257 625 | 87,131 626 | 87,597 627 | 87,338 628 | 87,175 629 | 87,89 630 | 87,90 631 | 87,217 632 | 87,265 633 | 87,601 634 | 87,182 635 | 87,340 636 | 87,375 637 | 87,227 638 | 87,229 639 | 87,611 640 | 87,576 641 | 87,350 642 | 87,439 643 | 87,616 644 | 87,580 645 | 87,198 646 | 87,151 647 | 88,434 648 | 88,550 649 | 88,460 650 | 88,265 651 | 88,474 652 | 88,498 653 | 89,245 654 | 89,317 655 | 89,155 656 | 89,288 657 | 89,597 658 | 89,505 659 | 89,432 660 | 89,254 661 | 89,324 662 | 89,128 663 | 89,300 664 | 89,343 665 | 89,136 666 | 89,116 667 | 89,446 668 | 89,90 669 | 89,558 670 | 89,601 671 | 89,221 672 | 89,182 673 | 89,340 674 | 89,248 675 | 89,569 676 | 89,275 677 | 89,570 678 | 89,584 679 | 89,217 680 | 89,350 681 | 89,465 682 | 89,351 683 | 89,151 684 | 90,317 685 | 90,473 686 | 90,116 687 | 90,206 688 | 90,355 689 | 90,157 690 | 90,505 691 | 90,389 692 | 90,479 693 | 90,434 694 | 90,324 695 | 90,550 696 | 90,130 697 | 90,131 698 | 90,584 699 | 90,208 700 | 90,597 701 | 90,265 702 | 90,601 703 | 90,372 704 | 90,182 705 | 90,340 706 | 90,603 707 | 90,227 708 | 90,342 709 | 90,229 710 | 90,275 711 | 90,198 712 | 90,289 713 | 90,611 714 | 90,217 715 | 90,350 716 | 90,439 717 | 90,373 718 | 90,238 719 | 90,413 720 | 90,195 721 | 90,616 722 | 90,580 723 | 90,351 724 | 90,242 725 | 90,151 726 | 91,253 727 | 91,563 728 | 92,186 729 | 93,146 730 | 93,103 731 | 93,567 732 | 93,94 733 | 93,614 734 | 94,103 735 | 95,492 736 | 95,523 737 | 
96,537 738 | 96,379 739 | 96,211 740 | 97,265 741 | 98,74 742 | 98,295 743 | 98,471 744 | 98,216 745 | 98,99 746 | 98,605 747 | 99,142 748 | 99,431 749 | 99,108 750 | 100,518 751 | 101,124 752 | 101,545 753 | 101,311 754 | 101,374 755 | 101,216 756 | 101,329 757 | 101,152 758 | 232,146 759 | 232,253 760 | 232,563 761 | 232,134 762 | 232,498 763 | 424,164 764 | 424,480 765 | 424,256 766 | 424,518 767 | 424,348 768 | 424,118 769 | 424,491 770 | 104,253 771 | 104,290 772 | 104,563 773 | 104,187 774 | 104,469 775 | 105,548 776 | 105,521 777 | 106,305 778 | 106,583 779 | 106,444 780 | 107,340 781 | 107,550 782 | 107,383 783 | 107,265 784 | 107,317 785 | 107,291 786 | 107,505 787 | 107,416 788 | 107,498 789 | 108,147 790 | 108,547 791 | 108,230 792 | 108,527 793 | 109,181 794 | 109,169 795 | 109,320 796 | 110,578 797 | 111,265 798 | 111,389 799 | 111,451 800 | 111,229 801 | 112,265 802 | 112,498 803 | 113,559 804 | 113,358 805 | 113,494 806 | 113,276 807 | 113,319 808 | 113,222 809 | 113,522 810 | 113,398 811 | 113,555 812 | 114,200 813 | 114,138 814 | 114,201 815 | 114,143 816 | 114,397 817 | 114,134 818 | 114,184 819 | 114,389 820 | 115,145 821 | 116,536 822 | 116,593 823 | 116,434 824 | 116,366 825 | 116,446 826 | 116,558 827 | 116,265 828 | 116,372 829 | 116,182 830 | 116,248 831 | 116,342 832 | 116,275 833 | 116,611 834 | 116,616 835 | 116,198 836 | 116,243 837 | 116,151 838 | 429,163 839 | 429,535 840 | 429,220 841 | 429,362 842 | 429,427 843 | 429,478 844 | 118,518 845 | 118,507 846 | 120,265 847 | 120,375 848 | 121,227 849 | 121,433 850 | 121,553 851 | 121,265 852 | 123,524 853 | 123,179 854 | 123,518 855 | 123,347 856 | 123,491 857 | 123,292 858 | 124,374 859 | 124,545 860 | 124,613 861 | 125,565 862 | 125,308 863 | 125,423 864 | 529,412 865 | 529,253 866 | 529,474 867 | 127,518 868 | 127,347 869 | 127,524 870 | 128,446 871 | 128,570 872 | 128,254 873 | 128,373 874 | 128,265 875 | 128,576 876 | 128,131 877 | 128,289 878 | 128,465 879 | 128,404 880 | 128,208 881 
| 128,135 882 | 130,217 883 | 130,151 884 | 130,288 885 | 130,355 886 | 130,340 887 | 130,597 888 | 130,136 889 | 130,432 890 | 131,122 891 | 131,157 892 | 131,288 893 | 131,597 894 | 131,505 895 | 131,510 896 | 131,217 897 | 131,343 898 | 131,265 899 | 131,601 900 | 131,182 901 | 131,340 902 | 131,248 903 | 131,325 904 | 131,229 905 | 131,238 906 | 131,351 907 | 131,151 908 | 132,198 909 | 133,397 910 | 133,407 911 | 134,143 912 | 134,563 913 | 134,184 914 | 134,456 915 | 134,251 916 | 135,446 917 | 135,544 918 | 135,317 919 | 135,360 920 | 135,350 921 | 135,458 922 | 135,386 923 | 135,227 924 | 135,321 925 | 135,281 926 | 135,359 927 | 136,422 928 | 136,208 929 | 136,536 930 | 136,558 931 | 136,265 932 | 136,155 933 | 136,576 934 | 136,439 935 | 136,611 936 | 136,568 937 | 136,389 938 | 136,229 939 | 239,616 940 | 138,418 941 | 208,446 942 | 208,601 943 | 208,288 944 | 208,340 945 | 208,455 946 | 208,343 947 | 140,265 948 | 140,581 949 | 141,461 950 | 141,51 951 | 141,280 952 | 141,574 953 | 142,164 954 | 142,518 955 | 142,491 956 | 142,596 957 | 142,524 958 | 143,407 959 | 143,265 960 | 143,397 961 | 143,182 962 | 143,184 963 | 143,172 964 | 143,456 965 | 143,580 966 | 143,581 967 | 143,151 968 | 144,150 969 | 144,311 970 | 144,374 971 | 144,545 972 | 144,441 973 | 145,525 974 | 145,264 975 | 145,380 976 | 145,241 977 | 146,253 978 | 146,354 979 | 146,193 980 | 146,505 981 | 146,567 982 | 146,498 983 | 147,501 984 | 147,547 985 | 147,230 986 | 147,502 987 | 147,438 988 | 147,126 989 | 147,527 990 | 147,279 991 | 147,591 992 | 148,265 993 | 148,253 994 | 148,563 995 | 148,498 996 | 149,494 997 | 149,276 998 | 149,319 999 | 149,522 1000 | 149,555 1001 | 150,545 1002 | 150,311 1003 | 150,374 1004 | 150,613 1005 | 150,329 1006 | 244,368 1007 | 244,352 1008 | 244,235 1009 | 244,224 1010 | 244,472 1011 | 152,545 1012 | 152,382 1013 | 152,311 1014 | 152,441 1015 | 152,137 1016 | 152,374 1017 | 152,329 1018 | 153,412 1019 | 154,331 1020 | 154,210 1021 | 154,158 1022 | 
154,129 1023 | 154,437 1024 | 154,252 1025 | 154,214 1026 | 154,452 1027 | 154,223 1028 | 154,533 1029 | 155,570 1030 | 155,254 1031 | 155,265 1032 | 155,576 1033 | 155,293 1034 | 156,483 1035 | 157,585 1036 | 157,288 1037 | 157,257 1038 | 157,289 1039 | 157,603 1040 | 157,604 1041 | 157,510 1042 | 158,210 1043 | 158,437 1044 | 158,252 1045 | 158,452 1046 | 158,214 1047 | 158,223 1048 | 159,265 1049 | 159,360 1050 | 159,397 1051 | 159,375 1052 | 159,458 1053 | 159,432 1054 | 160,383 1055 | 161,265 1056 | 162,610 1057 | 162,323 1058 | 162,498 1059 | 163,535 1060 | 163,561 1061 | 163,486 1062 | 163,478 1063 | 164,543 1064 | 164,507 1065 | 164,508 1066 | 164,256 1067 | 164,524 1068 | 164,595 1069 | 164,327 1070 | 164,176 1071 | 164,515 1072 | 164,179 1073 | 164,546 1074 | 164,518 1075 | 164,337 1076 | 164,307 1077 | 164,491 1078 | 164,488 1079 | 164,315 1080 | 165,165 1081 | 165,179 1082 | 165,518 1083 | 165,596 1084 | 165,524 1085 | 166,265 1086 | 166,182 1087 | 166,418 1088 | 166,183 1089 | 166,611 1090 | 167,601 1091 | 168,299 1092 | 169,181 1093 | 169,265 1094 | 169,397 1095 | 169,611 1096 | 169,282 1097 | 170,383 1098 | 170,340 1099 | 171,406 1100 | 171,202 1101 | 171,459 1102 | 172,265 1103 | 172,611 1104 | 172,581 1105 | 172,599 1106 | 173,369 1107 | 174,212 1108 | 174,264 1109 | 174,318 1110 | 174,259 1111 | 174,380 1112 | 176,220 1113 | 176,362 1114 | 176,427 1115 | 176,301 1116 | 176,497 1117 | 176,478 1118 | 176,216 1119 | 177,350 1120 | 177,510 1121 | 177,389 1122 | 178,495 1123 | 178,539 1124 | 179,491 1125 | 179,518 1126 | 179,347 1127 | 179,596 1128 | 179,605 1129 | 179,524 1130 | 179,292 1131 | 181,352 1132 | 181,363 1133 | 181,235 1134 | 181,552 1135 | 181,476 1136 | 181,340 1137 | 181,320 1138 | 181,258 1139 | 181,248 1140 | 182,246 1141 | 182,317 1142 | 182,473 1143 | 182,289 1144 | 182,389 1145 | 182,350 1146 | 182,300 1147 | 182,265 1148 | 182,463 1149 | 182,375 1150 | 182,603 1151 | 182,456 1152 | 182,229 1153 | 182,345 1154 | 182,611 1155 | 
182,439 1156 | 182,195 1157 | 182,198 1158 | 182,451 1159 | 183,459 1160 | 183,265 1161 | 183,202 1162 | 184,397 1163 | 184,456 1164 | 184,251 1165 | 185,377 1166 | 185,333 1167 | 185,538 1168 | 185,356 1169 | 185,240 1170 | 185,432 1171 | 186,202 1172 | 186,459 1173 | 186,414 1174 | 187,420 1175 | 187,270 1176 | 187,217 1177 | 187,514 1178 | 189,340 1179 | 189,288 1180 | 190,393 1181 | 191,518 1182 | 191,524 1183 | 191,45 1184 | 445,340 1185 | 445,229 1186 | 445,250 1187 | 445,576 1188 | 192,559 1189 | 192,371 1190 | 193,433 1191 | 193,285 1192 | 193,449 1193 | 193,273 1194 | 193,282 1195 | 194,619 1196 | 194,226 1197 | 194,398 1198 | 195,217 1199 | 195,257 1200 | 195,350 1201 | 195,340 1202 | 195,597 1203 | 195,227 1204 | 195,175 1205 | 195,389 1206 | 195,151 1207 | 198,245 1208 | 198,317 1209 | 198,355 1210 | 198,505 1211 | 198,432 1212 | 198,597 1213 | 198,217 1214 | 198,558 1215 | 198,265 1216 | 198,601 1217 | 198,372 1218 | 198,340 1219 | 198,603 1220 | 198,229 1221 | 198,102 1222 | 198,577 1223 | 198,350 1224 | 198,238 1225 | 198,580 1226 | 199,518 1227 | 199,316 1228 | 199,566 1229 | 202,426 1230 | 202,288 1231 | 202,340 1232 | 202,343 1233 | 202,555 1234 | 203,435 1235 | 203,345 1236 | 204,540 1237 | 204,215 1238 | 206,473 1239 | 207,372 1240 | 139,518 1241 | 209,33 1242 | 210,252 1243 | 210,437 1244 | 210,452 1245 | 210,223 1246 | 544,404 1247 | 212,264 1248 | 212,380 1249 | 212,259 1250 | 213,502 1251 | 214,534 1252 | 214,223 1253 | 214,533 1254 | 215,540 1255 | 215,278 1256 | 215,405 1257 | 216,618 1258 | 216,402 1259 | 217,245 1260 | 217,317 1261 | 217,473 1262 | 217,585 1263 | 217,289 1264 | 217,589 1265 | 217,510 1266 | 217,597 1267 | 217,265 1268 | 217,601 1269 | 217,404 1270 | 217,603 1271 | 217,229 1272 | 217,102 1273 | 217,611 1274 | 217,576 1275 | 217,577 1276 | 217,350 1277 | 217,439 1278 | 217,413 1279 | 217,617 1280 | 217,580 1281 | 218,521 1282 | 220,535 1283 | 220,518 1284 | 220,441 1285 | 220,427 1286 | 220,478 1287 | 221,265 1288 | 
222,365 1289 | 222,494 1290 | 222,522 1291 | 223,252 1292 | 223,542 1293 | 223,452 1294 | 224,262 1295 | 224,258 1296 | 224,352 1297 | 224,235 1298 | 224,552 1299 | 224,363 1300 | 224,376 1301 | 224,248 1302 | 224,443 1303 | 224,388 1304 | 224,470 1305 | 224,569 1306 | 225,253 1307 | 225,493 1308 | 225,584 1309 | 225,290 1310 | 225,563 1311 | 225,469 1312 | 225,498 1313 | 226,431 1314 | 226,371 1315 | 227,436 1316 | 227,265 1317 | 227,370 1318 | 227,352 1319 | 227,577 1320 | 227,442 1321 | 227,453 1322 | 227,102 1323 | 228,392 1324 | 229,317 1325 | 229,289 1326 | 229,597 1327 | 229,505 1328 | 229,389 1329 | 229,509 1330 | 229,436 1331 | 229,549 1332 | 229,396 1333 | 229,361 1334 | 229,440 1335 | 229,554 1336 | 229,446 1337 | 229,265 1338 | 229,340 1339 | 229,576 1340 | 229,151 1341 | 230,547 1342 | 230,501 1343 | 230,502 1344 | 230,438 1345 | 230,306 1346 | 230,279 1347 | 230,527 1348 | 230,591 1349 | 231,425 1350 | 102,554 1351 | 233,265 1352 | 234,618 1353 | 235,368 1354 | 235,352 1355 | 235,354 1356 | 235,552 1357 | 235,363 1358 | 235,578 1359 | 235,258 1360 | 235,472 1361 | 236,282 1362 | 237,516 1363 | 237,119 1364 | 237,400 1365 | 237,545 1366 | 237,334 1367 | 237,117 1368 | 237,374 1369 | 237,274 1370 | 237,613 1371 | 238,550 1372 | 238,265 1373 | 238,449 1374 | 238,289 1375 | 238,603 1376 | 238,611 1377 | 238,616 1378 | 238,325 1379 | 137,117 1380 | 137,311 1381 | 137,441 1382 | 137,329 1383 | 240,608 1384 | 240,377 1385 | 240,450 1386 | 240,484 1387 | 240,299 1388 | 240,563 1389 | 241,264 1390 | 242,434 1391 | 242,323 1392 | 242,611 1393 | 242,248 1394 | 242,265 1395 | 242,498 1396 | 243,265 1397 | 243,521 1398 | 151,345 1399 | 151,265 1400 | 151,389 1401 | 151,289 1402 | 151,611 1403 | 151,375 1404 | 151,603 1405 | 151,597 1406 | 151,580 1407 | 151,601 1408 | 245,593 1409 | 245,509 1410 | 245,322 1411 | 245,340 1412 | 245,473 1413 | 245,265 1414 | 245,291 1415 | 245,611 1416 | 245,616 1417 | 245,580 1418 | 245,416 1419 | 246,265 1420 | 247,601 1421 | 
453,597 1422 | 593,323 1423 | 593,558 1424 | 593,576 1425 | 593,288 1426 | 593,340 1427 | 593,597 1428 | 593,505 1429 | 593,343 1430 | 593,275 1431 | 249,340 1432 | 249,288 1433 | 250,39 1434 | 250,440 1435 | 251,456 1436 | 251,560 1437 | 252,437 1438 | 252,452 1439 | 253,354 1440 | 253,563 1441 | 253,505 1442 | 253,469 1443 | 253,323 1444 | 253,498 1445 | 254,570 1446 | 254,558 1447 | 254,265 1448 | 254,288 1449 | 254,465 1450 | 254,340 1451 | 254,505 1452 | 255,518 1453 | 256,518 1454 | 257,265 1455 | 257,616 1456 | 258,363 1457 | 258,403 1458 | 258,476 1459 | 259,264 1460 | 284,374 1461 | 284,545 1462 | 261,265 1463 | 261,397 1464 | 261,436 1465 | 261,451 1466 | 262,470 1467 | 262,368 1468 | 263,311 1469 | 263,613 1470 | 264,525 1471 | 264,318 1472 | 264,380 1473 | 265,317 1474 | 265,476 1475 | 265,320 1476 | 265,479 1477 | 265,322 1478 | 265,324 1479 | 265,271 1480 | 265,175 1481 | 265,338 1482 | 265,340 1483 | 265,341 1484 | 265,312 1485 | 265,345 1486 | 265,346 1487 | 265,404 1488 | 265,349 1489 | 265,350 1490 | 265,351 1491 | 265,505 1492 | 265,321 1493 | 265,360 1494 | 265,361 1495 | 265,364 1496 | 265,343 1497 | 265,372 1498 | 265,375 1499 | 265,439 1500 | 265,531 1501 | 265,383 1502 | 265,384 1503 | 265,386 1504 | 265,510 1505 | 265,549 1506 | 265,397 1507 | 265,458 1508 | 265,558 1509 | 265,265 1510 | 265,569 1511 | 265,275 1512 | 265,570 1513 | 265,277 1514 | 265,352 1515 | 265,576 1516 | 265,578 1517 | 265,584 1518 | 265,288 1519 | 265,426 1520 | 265,588 1521 | 265,432 1522 | 265,433 1523 | 265,436 1524 | 265,594 1525 | 265,300 1526 | 265,597 1527 | 265,302 1528 | 265,446 1529 | 265,599 1530 | 265,601 1531 | 265,449 1532 | 265,248 1533 | 265,454 1534 | 265,456 1535 | 265,309 1536 | 265,611 1537 | 265,464 1538 | 265,465 1539 | 265,615 1540 | 265,616 1541 | 267,548 1542 | 268,456 1543 | 268,463 1544 | 269,599 1545 | 269,410 1546 | 269,248 1547 | 269,505 1548 | 270,420 1549 | 271,482 1550 | 271,603 1551 | 272,299 1552 | 274,516 1553 | 274,119 1554 | 
274,400 1555 | 274,545 1556 | 274,334 1557 | 274,117 1558 | 274,374 1559 | 274,613 1560 | 274,466 1561 | 275,611 1562 | 275,434 1563 | 275,584 1564 | 276,276 1565 | 276,296 1566 | 276,619 1567 | 276,494 1568 | 276,398 1569 | 276,598 1570 | 276,555 1571 | 554,451 1572 | 554,389 1573 | 554,577 1574 | 460,582 1575 | 460,587 1576 | 460,571 1577 | 460,385 1578 | 460,498 1579 | 281,404 1580 | 282,392 1581 | 282,285 1582 | 282,449 1583 | 282,341 1584 | 282,364 1585 | 283,350 1586 | 260,386 1587 | 260,294 1588 | 260,432 1589 | 260,458 1590 | 286,499 1591 | 286,368 1592 | 287,303 1593 | 288,536 1594 | 288,289 1595 | 288,434 1596 | 288,550 1597 | 288,340 1598 | 288,248 1599 | 288,455 1600 | 288,342 1601 | 288,323 1602 | 288,570 1603 | 288,459 1604 | 288,611 1605 | 288,576 1606 | 288,528 1607 | 288,350 1608 | 288,465 1609 | 288,317 1610 | 288,616 1611 | 288,351 1612 | 289,317 1613 | 289,122 1614 | 289,597 1615 | 289,505 1616 | 289,510 1617 | 289,343 1618 | 289,601 1619 | 289,340 1620 | 289,248 1621 | 289,325 1622 | 289,351 1623 | 290,563 1624 | 290,469 1625 | 290,498 1626 | 291,248 1627 | 292,421 1628 | 292,335 1629 | 292,612 1630 | 292,518 1631 | 292,328 1632 | 292,307 1633 | 293,570 1634 | 297,518 1635 | 297,448 1636 | 298,386 1637 | 298,360 1638 | 298,389 1639 | 299,377 1640 | 299,575 1641 | 299,538 1642 | 299,577 1643 | 299,299 1644 | 299,356 1645 | 300,303 1646 | 300,564 1647 | 300,611 1648 | 300,581 1649 | 302,536 1650 | 303,336 1651 | 304,369 1652 | 305,339 1653 | 305,583 1654 | 306,547 1655 | 307,518 1656 | 307,335 1657 | 307,524 1658 | 308,555 1659 | 618,393 1660 | 618,471 1661 | 312,389 1662 | 313,518 1663 | 313,524 1664 | 313,488 1665 | 314,446 1666 | 315,518 1667 | 315,524 1668 | 316,518 1669 | 317,550 1670 | 317,584 1671 | 317,505 1672 | 317,434 1673 | 317,324 1674 | 317,446 1675 | 317,372 1676 | 317,340 1677 | 317,603 1678 | 317,343 1679 | 317,611 1680 | 317,352 1681 | 317,576 1682 | 317,350 1683 | 317,404 1684 | 317,617 1685 | 318,447 1686 | 319,559 1687 | 
319,494 1688 | 319,619 1689 | 319,522 1690 | 319,398 1691 | 319,365 1692 | 361,550 1693 | 361,576 1694 | 361,603 1695 | 361,611 1696 | 361,616 1697 | 321,404 1698 | 321,389 1699 | 323,434 1700 | 323,550 1701 | 323,340 1702 | 323,610 1703 | 323,498 1704 | 324,422 1705 | 324,372 1706 | 324,611 1707 | 324,616 1708 | 325,505 1709 | 327,518 1710 | 327,480 1711 | 328,508 1712 | 328,518 1713 | 328,524 1714 | 328,596 1715 | 328,491 1716 | 329,609 1717 | 329,545 1718 | 329,382 1719 | 329,311 1720 | 329,441 1721 | 329,374 1722 | 330,436 1723 | 332,494 1724 | 570,340 1725 | 570,505 1726 | 570,343 1727 | 335,518 1728 | 335,414 1729 | 337,498 1730 | 364,360 1731 | 364,397 1732 | 364,458 1733 | 364,432 1734 | 339,583 1735 | 339,444 1736 | 340,473 1737 | 340,355 1738 | 340,505 1739 | 340,342 1740 | 340,434 1741 | 340,550 1742 | 340,440 1743 | 340,343 1744 | 340,446 1745 | 340,601 1746 | 340,603 1747 | 340,248 1748 | 340,568 1749 | 340,459 1750 | 340,351 1751 | 340,39 1752 | 340,465 1753 | 340,611 1754 | 340,616 1755 | 340,581 1756 | 340,498 1757 | 342,372 1758 | 342,597 1759 | 342,343 1760 | 342,351 1761 | 343,536 1762 | 343,434 1763 | 343,558 1764 | 343,372 1765 | 343,248 1766 | 343,459 1767 | 343,611 1768 | 343,465 1769 | 343,616 1770 | 343,550 1771 | 383,442 1772 | 383,611 1773 | 344,377 1774 | 345,391 1775 | 345,435 1776 | 345,451 1777 | 345,531 1778 | 345,580 1779 | 346,599 1780 | 347,488 1781 | 347,518 1782 | 347,543 1783 | 347,524 1784 | 348,518 1785 | 350,389 1786 | 350,434 1787 | 350,446 1788 | 350,463 1789 | 350,404 1790 | 350,603 1791 | 350,611 1792 | 350,413 1793 | 575,436 1794 | 353,374 1795 | 353,311 1796 | 353,441 1797 | 353,545 1798 | 354,498 1799 | 355,550 1800 | 355,603 1801 | 355,616 1802 | 356,377 1803 | 358,467 1804 | 358,494 1805 | 358,555 1806 | 359,404 1807 | 360,479 1808 | 360,404 1809 | 360,568 1810 | 360,389 1811 | 362,478 1812 | 363,401 1813 | 363,388 1814 | 363,368 1815 | 338,505 1816 | 338,607 1817 | 365,398 1818 | 365,522 1819 | 365,555 1820 | 
367,539 1821 | 369,73 1822 | 45,518 1823 | 45,461 1824 | 45,524 1825 | 372,550 1826 | 372,465 1827 | 372,603 1828 | 372,611 1829 | 372,616 1830 | 372,389 1831 | 374,382 1832 | 374,119 1833 | 374,117 1834 | 374,441 1835 | 374,516 1836 | 374,400 1837 | 374,311 1838 | 374,334 1839 | 374,609 1840 | 374,613 1841 | 375,404 1842 | 375,432 1843 | 377,608 1844 | 377,538 1845 | 377,450 1846 | 377,563 1847 | 381,248 1848 | 382,545 1849 | 385,587 1850 | 385,571 1851 | 385,352 1852 | 385,498 1853 | 386,404 1854 | 386,458 1855 | 386,432 1856 | 387,425 1857 | 387,530 1858 | 387,390 1859 | 389,432 1860 | 389,458 1861 | 389,446 1862 | 389,451 1863 | 389,454 1864 | 389,531 1865 | 395,518 1866 | 395,508 1867 | 395,524 1868 | 397,600 1869 | 397,581 1870 | 398,522 1871 | 398,598 1872 | 398,555 1873 | 399,197 1874 | 399,597 1875 | 400,516 1876 | 400,119 1877 | 400,545 1878 | 400,334 1879 | 400,117 1880 | 400,613 1881 | 400,466 1882 | 404,446 1883 | 404,458 1884 | 405,540 1885 | 175,576 1886 | 175,611 1887 | 19,572 1888 | 410,505 1889 | 39,509 1890 | 39,576 1891 | 39,603 1892 | 413,597 1893 | 414,518 1894 | 415,393 1895 | 416,248 1896 | 417,578 1897 | 418,440 1898 | 418,506 1899 | 421,546 1900 | 421,518 1901 | 421,524 1902 | 421,595 1903 | 421,491 1904 | 423,555 1905 | 103,567 1906 | 426,459 1907 | 426,611 1908 | 427,478 1909 | 427,535 1910 | 428,524 1911 | 117,516 1912 | 117,119 1913 | 117,545 1914 | 117,334 1915 | 117,613 1916 | 117,466 1917 | 430,518 1918 | 432,479 1919 | 432,611 1920 | 432,454 1921 | 434,505 1922 | 434,584 1923 | 434,597 1924 | 434,352 1925 | 436,611 1926 | 436,531 1927 | 436,489 1928 | 437,521 1929 | 437,452 1930 | 438,527 1931 | 378,604 1932 | 441,545 1933 | 441,311 1934 | 442,129 1935 | 443,590 1936 | 76,483 1937 | 446,558 1938 | 446,584 1939 | 446,603 1940 | 446,611 1941 | 446,616 1942 | 446,505 1943 | 447,606 1944 | 448,572 1945 | 448,518 1946 | 448,585 1947 | 448,602 1948 | 448,487 1949 | 449,608 1950 | 449,450 1951 | 449,611 1952 | 449,553 1953 | 450,538 1954 
| 248,499 1955 | 248,550 1956 | 248,505 1957 | 248,599 1958 | 248,611 1959 | 248,352 1960 | 248,498 1961 | 457,518 1962 | 457,524 1963 | 458,479 1964 | 458,611 1965 | 459,555 1966 | 279,501 1967 | 461,574 1968 | 465,505 1969 | 466,516 1970 | 466,119 1971 | 466,334 1972 | 466,613 1973 | 467,494 1974 | 467,555 1975 | 468,499 1976 | 469,563 1977 | 469,498 1978 | 469,505 1979 | 470,352 1980 | 470,470 1981 | 470,368 1982 | 474,498 1983 | 474,550 1984 | 476,368 1985 | 477,521 1986 | 478,535 1987 | 478,486 1988 | 478,497 1989 | 480,518 1990 | 487,518 1991 | 490,33 1992 | 491,508 1993 | 491,566 1994 | 491,612 1995 | 491,579 1996 | 491,515 1997 | 494,522 1998 | 494,598 1999 | 494,555 2000 | 495,539 2001 | 498,505 2002 | 498,587 2003 | 498,571 2004 | 498,352 2005 | 498,530 2006 | 498,563 2007 | 498,511 2008 | 500,513 2009 | 501,547 2010 | 501,502 2011 | 501,126 2012 | 501,527 2013 | 501,605 2014 | 502,547 2015 | 502,502 2016 | 502,126 2017 | 503,311 2018 | 504,518 2019 | 505,592 2020 | 505,550 2021 | 505,599 2022 | 505,611 2023 | 507,546 2024 | 508,518 2025 | 515,518 2026 | 515,612 2027 | 516,334 2028 | 516,119 2029 | 516,545 2030 | 516,613 2031 | 518,543 2032 | 518,585 2033 | 518,595 2034 | 518,596 2035 | 518,546 2036 | 518,602 2037 | 518,524 2038 | 518,566 2039 | 518,572 2040 | 518,573 2041 | 518,612 2042 | 518,532 2043 | 518,579 2044 | 519,584 2045 | 522,559 2046 | 522,619 2047 | 522,598 2048 | 522,555 2049 | 523,605 2050 | 524,586 2051 | 524,541 2052 | 524,543 2053 | 524,551 2054 | 524,596 2055 | 524,488 2056 | 524,566 2057 | 524,579 2058 | 497,535 2059 | 119,545 2060 | 119,334 2061 | 119,613 2062 | 527,547 2063 | 368,352 2064 | 536,558 2065 | 538,608 2066 | 545,311 2067 | 545,334 2068 | 545,609 2069 | 545,613 2070 | 219,548 2071 | 546,546 2072 | 546,595 2073 | 547,591 2074 | 548,556 2075 | 555,565 2076 | 555,619 2077 | 555,598 2078 | 558,603 2079 | 558,616 2080 | 334,613 2081 | 571,587 2082 | 572,572 2083 | 352,616 2084 | 576,584 2085 | 576,601 2086 | 576,603 2087 | 
578,550
578,588
584,550
584,603
584,611
488,596
589,601
591,605
556,556
615,611
597,601
597,603
597,611
601,603
601,616
603,616
311,613

--------------------------------------------------------------------------------
/diff2vec.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benedekrozemberczki/diff2vec/12d915539e43143092d8295183f3473d83fec2bf/diff2vec.jpeg
--------------------------------------------------------------------------------
/src/diffusion_2_vec.py:
--------------------------------------------------------------------------------
"""Diff2Vec model."""

import logging
import pandas as pd
from tqdm import tqdm
from joblib import Parallel, delayed
from gensim.models import Word2Vec, Doc2Vec
import numpy.distutils.system_info as sysinfo
from subgraphcomponents import SubGraphComponents
from helper import parameter_parser, result_processing
from helper import process_non_pooled_model_data, argument_printer

sysinfo.get_info("atlas")
logging.basicConfig(format="%(asctime)s : %(levelname)s : %(message)s", level=logging.INFO)


def create_features(seeding, edge_list_path, vertex_set_cardinality):
    """
    Create one diffusion sequence for every node of the graph.
    :param seeding: Random seed.
    :param edge_list_path: Path to edge list csv.
    :param vertex_set_cardinality: Number of unique nodes per diffusion tree.
    :return: Sequences, graph read time, sequence generation time and node count.
    """
    sub_graphs = SubGraphComponents(edge_list_path, seeding, vertex_set_cardinality)
    return sub_graphs.paths, sub_graphs.read_time, sub_graphs.generation_time, sub_graphs.counts

def run_parallel_feature_creation(edge_list_path,
                                  vertex_set_card,
                                  replicates,
                                  workers):
    """
    Create linear node sequences for every node several times, in parallel.
    :param edge_list_path: Path to edge list csv.
    :param vertex_set_card: Number of unique nodes per diffusion tree.
    :param replicates: Number of diffusions per source node.
    :param workers: Number of cores used.
    :return walk_results: Flat list of diffusion sequences from all replicates.
    :return counts: List of node counts (number of nodes plus one), one per replicate.
    """
    # Each replicate uses its index as the random seed.
    results = Parallel(n_jobs=workers)(
        delayed(create_features)(i, edge_list_path, vertex_set_card)
        for i in tqdm(range(replicates)))
    walk_results, counts = result_processing(results)
    return walk_results, counts

def learn_pooled_embeddings(walks, counts, args):
    """
    Learn a pooled embedding with Word2Vec (skip-gram) and save it.
    :param walks: Linear vertex sequences.
    :param counts: List of node counts, one per replicate.
    :param args: Arguments object.
    """
    model = Word2Vec(walks,
                     size=args.dimensions,
                     window=args.window_size,
                     min_count=1,
                     sg=1,
                     workers=args.workers,
                     iter=args.iter,
                     alpha=args.alpha)

    save_embedding(args, model, counts)

def learn_non_pooled_embeddings(walks, counts, args):
    """
    Learn a non-pooled embedding with Doc2Vec (PV-DBOW) and save it.
    :param walks: Linear vertex sequences.
    :param counts: List of node counts, one per replicate.
    :param args: Arguments object.
    """
    walks = process_non_pooled_model_data(walks, counts, args)
    model = Doc2Vec(walks,
                    size=args.dimensions,
                    window=0,
                    dm=0,
                    alpha=args.alpha,
                    iter=args.iter,
                    workers=args.workers)

    save_embedding(args, model, counts)

def save_embedding(args, model, counts):
    """
    Save the embedding as a csv with one row per node.
    :param args: Arguments object.
    :param model: The embedding model object.
    :param counts: List of node counts; counts[0] is the number of nodes plus one.
    """
    out = []
    # Node IDs are assumed to be consecutive integers starting at 0.
    for node in range(counts[0] - 1):
        if args.model == "non-pooled":
            # Documents are tagged with the string node ID.
            vector = model.docvecs[str(node)]
        else:
            vector = model.wv[str(node)]
        out.append([node] + list(vector))
    columns = ["node"] + ["x_" + str(x) for x in range(args.dimensions)]
    out = pd.DataFrame(out, columns=columns)
    out = out.sort_values(["node"])
    out.to_csv(args.output, index=False)

def main(args):
    """
    Main method for creating sequences and learning the embedding.
    :param args: Arguments object.
    """
    argument_printer(args)
    print("\n----------\nFeature extraction starts.\n----------\n\n")
    walks, counts = run_parallel_feature_creation(args.input,
                                                  args.vertex_set_cardinality,
                                                  args.num_diffusions,
                                                  args.workers)
    print("\n----------\nLearning starts.\n----------\n")

    if args.model == "non-pooled":
        learn_non_pooled_embeddings(walks, counts, args)
    else:
        learn_pooled_embeddings(walks, counts, args)

if __name__ == "__main__":
    args = parameter_parser()
    main(args)

--------------------------------------------------------------------------------
/src/diffusiontrees.py:
--------------------------------------------------------------------------------
"""Eulerian Diffuser."""

import random
import networkx as nx

class EulerianDiffuser:
    """
    Class to make diffusions for a given graph.
    """
    def __init__(self, graph, number_of_nodes):
        """
        Initialize a diffusion object and run a diffusion from every node.
        :param graph: Graph of interest.
        :param number_of_nodes: Number of unique nodes per diffusion tree.
        """
        self.graph = graph
        self.number_of_nodes = number_of_nodes
        self.nodes = graph.nodes()
        self.run_diffusions()

    def run_diffusion_process(self, node):
        """
        Generate a diffusion tree from a given source node and
        linearize it with an Eulerian tour.
        :param node: Source of diffusion.
        :return euler: Eulerian linear node sequence.
        """
        infected = [node]
        sub_graph = nx.DiGraph()
        sub_graph.add_node(node)
        while len(infected) < self.number_of_nodes:
            # Pick a random infected node and try to infect one of its neighbors.
            end_point = random.choice(infected)
            neighbors = list(self.graph.neighbors(end_point))
            sample = random.choice(neighbors)
            if sample not in infected:
                infected.append(sample)
                # Adding both arc directions makes the tree Eulerian.
                sub_graph.add_edges_from([(end_point, sample), (sample, end_point)])
        euler = [str(u) for u, v in nx.eulerian_circuit(sub_graph, infected[0])]
        if len(euler) == 0:
            # A single-node tree has no edges, so its sequence is the source alone.
            euler = [str(node)]
        return euler

    def run_diffusions(self):
        """
        Run a diffusion from every node.
        """
        self.diffusions = {node: self.run_diffusion_process(node) for node in self.nodes}

--------------------------------------------------------------------------------
/src/helper.py:
--------------------------------------------------------------------------------
"""Helper functions."""

import argparse
import numpy as np
from tqdm import tqdm
from texttable import Texttable
from gensim.models.doc2vec import TaggedDocument

def parameter_parser():
    """
    A method to parse the command line parameters.
    By default it gives an embedding of the Facebook Restaurants network.
    The default hyperparameters already give a high-quality representation without a grid search.
    :return: Object with hyperparameters.
    """

    parser = argparse.ArgumentParser(description="Run diffusion2vec.")

    parser.add_argument("--input",
                        nargs="?",
                        default="./data/restaurant_edges.csv",
                        help="Input graph path.")

    parser.add_argument("--output",
                        nargs="?",
                        default="./output/restaurant.csv",
                        help="Embeddings path.")

    parser.add_argument("--model",
                        nargs="?",
                        default="pooled",
                        help="Model type: pooled or non-pooled. Default is pooled.")

    parser.add_argument("--dimensions",
                        type=int,
                        default=128,
                        help="Number of dimensions. Default is 128.")

    parser.add_argument("--vertex-set-cardinality",
                        type=int,
                        default=40,
                        help="Number of unique nodes per diffusion tree; the closed Eulerian tour visits 2*cardinality-1 nodes. Default is 40.")

    parser.add_argument("--num-diffusions",
                        type=int,
                        default=10,
                        help="Number of diffusions per source. Default is 10.")

    parser.add_argument("--window-size",
                        type=int,
                        default=10,
                        help="Context size for optimization. Default is 10.")

    parser.add_argument("--iter",
                        default=1,
                        type=int,
                        help="Number of epochs in ASGD. Default is 1.")

    parser.add_argument("--workers",
                        type=int,
                        default=4,
                        help="Number of cores. Default is 4.")

    parser.add_argument("--alpha",
                        type=float,
                        default=0.025,
                        help="Initial learning rate. Default is 0.025.")

    return parser.parse_args()


def argument_printer(args):
    """
    Function to print the arguments in a nice tabular format.
    :param args: Parameters used for the model.
    """
    args = vars(args)
    keys = sorted(args.keys())
    t = Texttable()
    # One add_rows call, so that only the first row is used as the table header.
    t.add_rows([["Parameter", "Value"]] +
               [[k.replace("_", " ").capitalize(), args[k]] for k in keys])
    print(t.draw())

def generation_tab_printer(read_times, generation_times):
    """
    Function to print the time logs in a nice tabular format.
    :param read_times: List of reading times.
    :param generation_times: List of generation times.
    """
    t = Texttable()
    t.add_rows([["Metric", "Value"],
                ["Mean graph read time", np.mean(read_times)],
                ["Standard deviation of read time", np.std(read_times)]])
    print(t.draw())
    t = Texttable()
    t.add_rows([["Metric", "Value"],
                ["Mean sequence generation time", np.mean(generation_times)],
                ["Standard deviation of generation time", np.std(generation_times)]])
    print(t.draw())

def result_processing(results):
    """
    Function to separate the sequences from the time measurements and process them.
    :param results: List of 4-tuples (sequences, read time, generation time, node count), one per replicate.
    :return walk_results: Flat list of diffusion sequences.
    :return counts: List of node counts, one per replicate.
    """
    walk_results = [res[0] for res in results]
    read_time_results = [res[1] for res in results]
    generation_time_results = [res[2] for res in results]
    counts = [res[3] for res in results]
    generation_tab_printer(read_time_results, generation_time_results)
    walk_results = [walk for walks in walk_results for walk in walks]
    return walk_results, counts


def process_non_pooled_model_data(walks, counts, args):
    """
    Build one tagged document per node from windowed co-occurrence tokens.
    :param walks: Diffusion sequences.
    :param counts: List of node counts; counts[0] is the number of nodes plus one.
    :param args: Arguments object.
    :return docs: One TaggedDocument per node.
    """
    print("Run feature extraction across windows.")
    # Node IDs are assumed to be consecutive integers starting at 0.
    features = {str(node): [] for node in range(counts[0])}
    for walk in tqdm(walks):
        for i in range(len(walk) - args.window_size):
            for j in range(1, args.window_size + 1):
                # Record a forward (+j) and a backward (_j) context token for the pair.
                features[walk[i]].append("+" + str(j) + "_" + walk[i + j])
                features[walk[i + j]].append("_" + str(j) + "_" + walk[i])

    docs = [TaggedDocument(words=v, tags=[str(k)]) for k, v in features.items()]
    return docs

--------------------------------------------------------------------------------
/src/subgraphcomponents.py:
--------------------------------------------------------------------------------
"""Subgraph components module."""

import time
import random
import pandas as pd
import networkx as nx
from diffusiontrees import EulerianDiffuser

class SubGraphComponents:
    """
    Split the graph into connected components and run a diffusion from every node of each component.
    """
    def __init__(self, edge_list_path, seeding, vertex_set_cardinality):
        """
        Initialize the object with the main parameters.
        :param edge_list_path: Path to the csv with edges.
        :param seeding: Random seed.
        :param vertex_set_cardinality: Number of unique nodes per tree.
        """
        self.seed = seeding
        self.vertex_set_cardinality = vertex_set_cardinality
        self.read_start_time = time.time()
        self.graph = nx.from_edgelist(pd.read_csv(edge_list_path, index_col=None).values.tolist())
        self.counts = len(self.graph.nodes()) + 1
        self.separate_subcomponents()
        self.single_feature_generation_run()

    def separate_subcomponents(self):
        """
        Find the connected components, largest first.
        """
        comps = [self.graph.subgraph(c) for c in nx.connected_components(self.graph)]
        self.graph = sorted(comps, key=len, reverse=True)
        self.read_time = time.time() - self.read_start_time

    def single_feature_generation_run(self):
        """
        Run one round of diffusions and measure the sequence generation time.
        """
        random.seed(self.seed)
        self.generation_start_time = time.time()
        self.paths = {}
        for sub_graph in self.graph:
            # Components are sorted by size, so the cardinality only shrinks and
            # never exceeds the number of nodes in the current component.
            self.vertex_set_cardinality = min(self.vertex_set_cardinality, len(sub_graph.nodes()))
            diffuser = EulerianDiffuser(sub_graph, self.vertex_set_cardinality)
            self.paths.update(diffuser.diffusions)
        self.paths = list(self.paths.values())
        self.generation_time = time.time() - self.generation_start_time

--------------------------------------------------------------------------------
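The diffusion step performed by `EulerianDiffuser.run_diffusion_process` can be illustrated in isolation. The following is a minimal standalone sketch, not code from the repository: `diffusion_sequence` is a hypothetical name, the Zachary karate club graph bundled with NetworkX stands in for the restaurant network, randomness comes from a local `random.Random` instead of the global seed, and `cardinality` is assumed not to exceed the size of the source node's connected component (otherwise the loop never terminates).

```python
import random

import networkx as nx


def diffusion_sequence(graph, source, cardinality, seed=None):
    """Grow a random diffusion tree from `source` and linearize it with an Eulerian circuit.

    Hypothetical helper mirroring EulerianDiffuser.run_diffusion_process; assumes
    `cardinality` <= size of the connected component that contains `source`.
    """
    rng = random.Random(seed)
    infected = [source]
    infected_set = {source}
    tree = nx.DiGraph()
    tree.add_node(source)
    while len(infected) < cardinality:
        # Pick an already infected node, then one of its neighbors in the full graph.
        end_point = rng.choice(infected)
        neighbor = rng.choice(list(graph.neighbors(end_point)))
        if neighbor not in infected_set:
            infected.append(neighbor)
            infected_set.add(neighbor)
            # Storing both arc directions gives every node in-degree == out-degree.
            tree.add_edges_from([(end_point, neighbor), (neighbor, end_point)])
    if tree.number_of_edges() == 0:
        # A one-node tree has no circuit; the sequence is just the source.
        return [str(source)]
    return [str(u) for u, _ in nx.eulerian_circuit(tree, source=source)]


if __name__ == "__main__":
    G = nx.karate_club_graph()
    print(diffusion_sequence(G, 0, 10, seed=42))
```

Because every tree edge is stored in both directions, the doubled tree is Eulerian. With k infected nodes its circuit has 2(k-1) arcs, so a 10-node diffusion yields 18 node IDs, the sequence starts at the source, and each pair of consecutive IDs (wrapping around to the start) is an edge of the input graph.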