├── .DS_Store ├── README.md ├── data ├── .DS_Store ├── CONTROL_fmt │ ├── 0040013.preprocess_v1.csv │ ├── 0040014.preprocess_v1.csv │ ├── 0040017.preprocess_v1.csv │ ├── 0040018.preprocess_v1.csv │ ├── 0040019.preprocess_v1.csv │ ├── 0040020.preprocess_v1.csv │ ├── 0040023.preprocess_v1.csv │ ├── 0040024.preprocess_v1.csv │ ├── 0040026.preprocess_v1.csv │ ├── 0040027.preprocess_v1.csv │ ├── 0040030.preprocess_v1.csv │ ├── 0040031.preprocess_v1.csv │ ├── 0040033.preprocess_v1.csv │ ├── 0040035.preprocess_v1.csv │ ├── 0040036.preprocess_v1.csv │ ├── 0040038.preprocess_v1.csv │ ├── 0040043.preprocess_v1.csv │ ├── 0040045.preprocess_v1.csv │ ├── 0040048.preprocess_v1.csv │ ├── 0040050.preprocess_v1.csv │ ├── 0040051.preprocess_v1.csv │ ├── 0040052.preprocess_v1.csv │ ├── 0040053.preprocess_v1.csv │ ├── 0040054.preprocess_v1.csv │ ├── 0040055.preprocess_v1.csv │ ├── 0040056.preprocess_v1.csv │ ├── 0040057.preprocess_v1.csv │ ├── 0040058.preprocess_v1.csv │ ├── 0040061.preprocess_v1.csv │ ├── 0040062.preprocess_v1.csv │ ├── 0040063.preprocess_v1.csv │ ├── 0040065.preprocess_v1.csv │ ├── 0040066.preprocess_v1.csv │ ├── 0040067.preprocess_v1.csv │ ├── 0040068.preprocess_v1.csv │ ├── 0040069.preprocess_v1.csv │ ├── 0040074.preprocess_v1.csv │ ├── 0040076.preprocess_v1.csv │ ├── 0040086.preprocess_v1.csv │ ├── 0040087.preprocess_v1.csv │ ├── 0040090.preprocess_v1.csv │ ├── 0040091.preprocess_v1.csv │ ├── 0040093.preprocess_v1.csv │ ├── 0040095.preprocess_v1.csv │ ├── 0040102.preprocess_v1.csv │ ├── 0040104.preprocess_v1.csv │ ├── 0040107.preprocess_v1.csv │ ├── 0040111.preprocess_v1.csv │ ├── 0040113.preprocess_v1.csv │ ├── 0040114.preprocess_v1.csv │ ├── 0040115.preprocess_v1.csv │ ├── 0040116.preprocess_v1.csv │ ├── 0040118.preprocess_v1.csv │ ├── 0040119.preprocess_v1.csv │ ├── 0040120.preprocess_v1.csv │ ├── 0040121.preprocess_v1.csv │ ├── 0040123.preprocess_v1.csv │ ├── 0040124.preprocess_v1.csv │ ├── 0040125.preprocess_v1.csv │ ├── 0040127.preprocess_v1.csv │ ├── 0040128.preprocess_v1.csv │ ├── 0040129.preprocess_v1.csv │ ├── 0040130.preprocess_v1.csv │ ├── 0040131.preprocess_v1.csv │ ├── 0040134.preprocess_v1.csv │ ├── 0040135.preprocess_v1.csv │ ├── 0040136.preprocess_v1.csv │ ├── 0040138.preprocess_v1.csv │ ├── 0040139.preprocess_v1.csv │ ├── 0040140.preprocess_v1.csv │ ├── 0040141.preprocess_v1.csv │ ├── 0040144.preprocess_v1.csv │ ├── 0040146.preprocess_v1.csv │ └── 0040147.preprocess_v1.csv ├── power_atlas_info.csv └── test │ ├── 0040013.preprocess_v1.csv │ └── 0040014.preprocess_v1.csv ├── mn2vec_toy.png ├── multi_node2vec.py ├── requirements.txt ├── results └── test │ ├── .DS_Store │ └── r0.25 │ ├── mltn2v_control.csv │ ├── mltn2v_control.emb │ ├── mltn2v_results.csv │ └── mltn2v_results.emb └── src ├── __init__.py ├── __pycache__ ├── __init__.cpython-36.pyc ├── mltn2v_utils.cpython-36.pyc ├── multinode2vec.cpython-36.pyc └── nbrhd_gen_walk_nx.cpython-36.pyc ├── mltn2v_utils.py ├── multinode2vec.py ├── nbrhd_gen_walk.py ├── nbrhd_gen_walk_nx.py └── utils.py /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/.DS_Store -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # multi-node2vec 2 | This is Python source code for the multi-node2vec algorithm. 
Multi-node2vec is a fast network embedding method for multilayer networks
3 | that identifies a continuous, low-dimensional representation of the unique nodes in the network.
4 |
5 | Details of the algorithm can be found in the paper: *Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI*
6 | by JD Wilson, M Baybay, R Sankar, and P Stillman.
7 |
8 | **Preprint**: https://arxiv.org/pdf/1809.06437.pdf
9 |
10 | __Contributors__:
11 | - Melanie Baybay
12 | University of San Francisco, Department of Computer Science
13 | - Rishi Sankar
14 | Henry M. Gunn High School
15 | - James D. Wilson (maintainer)
16 | University of San Francisco, Department of Mathematics and Statistics
17 |
18 | **Questions or Bugs?** Contact James D. Wilson at jdwilson4@usfca.edu
19 |
20 | # Description
21 |
22 | ## The Mathematical Objective
23 |
24 | A multilayer network of length *m* is a collection of networks or graphs {G1, ..., Gm}, where the graph Gj models the relational structure of the *j*th layer of the network. Each layer Gj = (Vj, Wj) is described by the vertex set Vj, which contains the units, or actors, of the layer, and the edge weights Wj, which describe the strength of relationship between the nodes. Layers in the multilayer sequence may be heterogeneous across vertices, edges, and size. Denote the set of unique nodes in {G1, ..., Gm} by **N**, and let
25 | *N* = |**N**| denote the number of nodes in that set.
26 |
27 | The aim of **multi-node2vec** is to learn an interpretable low-dimensional feature representation of **N**. In particular, it seeks a *D*-dimensional representation
28 |
29 | **F**: **N** --> R^*D*,
30 |
31 | where *D* << *N*. The function **F** can be viewed as an *N* x *D* matrix whose rows {**f**v: v = 1, ..., N} represent the feature space of each node in **N**.
32 |
33 | ## The Algorithm
34 | The **multi-node2vec** algorithm estimates **F** through maximum likelihood estimation and relies upon two core steps:
35 |
36 | 1) __NeighborhoodSearch__: a collection of vertex neighborhoods from the observed multilayer graph, also known as a *BagofNodes*, is identified. This is done through a second-order random walk on the multilayer network.
37 |
38 | 2) __Optimization__: given a *BagofNodes*, **F** is then estimated through the maximization of the log-likelihood of **F** | **N**. This is done through the application of stochastic gradient descent on a two-layer Skip-Gram neural network model.
39 |
40 | The following image provides a schematic:
41 |
42 | ![multi-node2vec schematic](https://github.com/jdwilson4/multi-node2vec/blob/master/mn2vec_toy.png)
43 |
44 | # Running multi-node2vec
45 |
46 | ## Requirements
47 | This package requires Python 3.6 with the following libraries:
48 | - numpy==1.12.1
49 | - pandas==0.24.0
50 | - gensim==2.3.0
51 | - networkx==2.5.1
52 |
53 | You can install these libraries by running the command
54 |
55 | ```
56 | pip install -r requirements.txt
57 | ```
58 |
59 | from this project's root directory.
60 |
61 |
62 | ## Usage
63 | ```
64 | python3 multi_node2vec.py [--dir [DIR]] [--output [OUTPUT]] [--d [D]] [--walk_length [WALK_LENGTH]] [--window_size [WINDOW_SIZE]] [--n_samples [N_SAMPLES]] [--thresh [THRESH]] [--w2v_workers [W2V_WORKERS]] [--rvals [RVALS]] [--pvals [PVALS]] [--qvals [QVALS]]
65 | ```
66 |
67 | ***Arguments***
68 |
69 | - --dir [directory name] : Absolute path to a directory of correlation/adjacency matrix files in .csv format. Each .csv should contain an adjacency matrix with rows and columns labeled by node ID (see the input sketch after this list).
70 | - --output [directory name] : Absolute path to the output directory. Results for a walk parameter r are written to output/r{r}/mltn2v_results.csv and .emb.
71 | - --d [dimensions] : Dimensionality of the feature space. Default is 100.
72 | - --walk_length [n] : Length of each random walk used to identify multilayer neighborhoods. Default is 100.
73 | - --window_size [w] : Size of the context window used for Skip-Gram optimization. Default is 10.
74 | - --n_samples [samples] : Number of walks sampled per node per layer. Default is 1.
75 | - --thresh [thresh] : Threshold for converting a weighted network to an unweighted one. All weights less than or equal to thresh are set to 0 and all others to 1. Default is 0.5. Use None if the network is unweighted.
76 | - --w2v_workers [workers] : Number of parallel worker threads. Default is 8.
77 | - --rvals [layer walk prob] : The unnormalized walk probability for traversing layers. Default is 0.25.
78 | - --pvals [return prob] : The unnormalized walk probability of returning to a previously seen node. Default is 1.
79 | - --qvals [explore prob] : The unnormalized walk probability of exploring new nodes. Default is 0.50.
80 |
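Before running, it may help to see the expected input format. Below is a minimal, illustrative sketch (not part of the repository; the directory and file names are hypothetical) that writes one layer as a square adjacency matrix whose rows and columns are labeled by node ID — exactly what the parser in src/mltn2v_utils.py reads back with `pd.read_csv(..., index_col=0)`:

```python
import os
import pandas as pd

# Hypothetical toy layer with three nodes ("1", "2", "3").
nodes = ["1", "2", "3"]
A = pd.DataFrame([[0.0, 0.8, 0.1],
                  [0.8, 0.0, 0.6],
                  [0.1, 0.6, 0.0]],
                 index=nodes, columns=nodes)

# One .csv per layer inside the --dir folder; node IDs are written
# as both the row index and the column header.
os.makedirs("data/my_network", exist_ok=True)
A.to_csv("data/my_network/layer_01.csv")
```

Each additional layer is another such file in the same directory; layers may differ in which nodes they contain.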
81 | ### Examples
82 |
83 | __Quick Test Example__
84 |
85 | This example runs **multi-node2vec** on a small test multilayer network with 2 layers and 264 nodes in each layer. It takes about 2 minutes to run on a personal computer using 8 cores.
86 | ```
87 | python3 multi_node2vec.py --dir data/test --output results/test --d 100 --window_size 2 --n_samples 1 --thresh 0.5 --rvals 0.25
88 | ```
89 |
90 | __fMRI Case Study__
91 |
92 | This example runs **multi-node2vec** on the multilayer network representing group fMRI of 74 healthy controls, as analyzed in the paper *Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI*. The model will
93 | generate 100 features for each of 264 unique nodes using a walk parameter *r = 0.25*. The values of *p* (=1) and *q* (=0.50) match the defaults of the original **node2vec** specification. It takes about an hour to run on a personal computer using 8 cores.
94 | ``` 95 | python3 multi_node2vec.py --dir data/CONTROL_fmt --output results/control --d 100 --window_size 10 --n_samples 1 --rvals 0.25 --pvals 1 --thresh 0.5 --qvals 0.5 96 | ``` 97 | 98 | 99 | 100 | -------------------------------------------------------------------------------- /data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/data/.DS_Store -------------------------------------------------------------------------------- /data/power_atlas_info.csv: -------------------------------------------------------------------------------- 1 | ROI,X,Y,Z,MasterAssignment,SuggestedSystem,color,network_revised,roi_name_unique,color_updated 2 | 1,-25,-98,-12,-1,Uncertain,White,uncertain,roi_1_uncertain,white 3 | 2,27,-97,-13,-1,Uncertain,White,uncertain,roi_2_uncertain,white 4 | 3,24,32,-18,-1,Uncertain,White,uncertain,roi_3_uncertain,white 5 | 4,-56,-45,-24,-1,Uncertain,White,uncertain,roi_4_uncertain,white 6 | 5,8,41,-24,-1,Uncertain,White,uncertain,roi_5_uncertain,white 7 | 6,-21,-22,-20,-1,Uncertain,White,uncertain,roi_6_uncertain,white 8 | 7,17,-28,-17,-1,Uncertain,White,uncertain,roi_7_uncertain,white 9 | 8,-37,-29,-26,-1,Uncertain,White,uncertain,roi_8_uncertain,white 10 | 9,65,-24,-19,-1,Uncertain,White,uncertain,roi_9_uncertain,white 11 | 10,52,-34,-27,-1,Uncertain,White,uncertain,roi_10_uncertain,white 12 | 11,55,-31,-17,-1,Uncertain,White,uncertain,roi_11_uncertain,white 13 | 12,34,38,-12,-1,Uncertain,White,uncertain,roi_12_uncertain,white 14 | 13,-7,-52,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_13_hand,cyan 15 | 14,-14,-18,40,1,Sensory/somatomotor Hand,Cyan,hand,roi_14_hand,cyan 16 | 15,0,-15,47,1,Sensory/somatomotor Hand,Cyan,hand,roi_15_hand,cyan 17 | 16,10,-2,45,1,Sensory/somatomotor Hand,Cyan,hand,roi_16_hand,cyan 18 | 17,-7,-21,65,1,Sensory/somatomotor Hand,Cyan,hand,roi_17_hand,cyan 19 | 18,-7,-33,72,1,Sensory/somatomotor Hand,Cyan,hand,roi_18_hand,cyan 20 | 19,13,-33,75,1,Sensory/somatomotor Hand,Cyan,hand,roi_19_hand,cyan 21 | 20,-54,-23,43,1,Sensory/somatomotor Hand,Cyan,hand,roi_20_hand,cyan 22 | 21,29,-17,71,1,Sensory/somatomotor Hand,Cyan,hand,roi_21_hand,cyan 23 | 22,10,-46,73,1,Sensory/somatomotor Hand,Cyan,hand,roi_22_hand,cyan 24 | 23,-23,-30,72,1,Sensory/somatomotor Hand,Cyan,hand,roi_23_hand,cyan 25 | 24,-40,-19,54,1,Sensory/somatomotor Hand,Cyan,hand,roi_24_hand,cyan 26 | 25,29,-39,59,1,Sensory/somatomotor Hand,Cyan,hand,roi_25_hand,cyan 27 | 26,50,-20,42,1,Sensory/somatomotor Hand,Cyan,hand,roi_26_hand,cyan 28 | 27,-38,-27,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_27_hand,cyan 29 | 28,20,-29,60,1,Sensory/somatomotor Hand,Cyan,hand,roi_28_hand,cyan 30 | 29,44,-8,57,1,Sensory/somatomotor Hand,Cyan,hand,roi_29_hand,cyan 31 | 30,-29,-43,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_30_hand,cyan 32 | 31,10,-17,74,1,Sensory/somatomotor Hand,Cyan,hand,roi_31_hand,cyan 33 | 32,22,-42,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_32_hand,cyan 34 | 33,-45,-32,47,1,Sensory/somatomotor Hand,Cyan,hand,roi_33_hand,cyan 35 | 34,-21,-31,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_34_hand,cyan 36 | 35,-13,-17,75,1,Sensory/somatomotor Hand,Cyan,hand,roi_35_hand,cyan 37 | 36,42,-20,55,1,Sensory/somatomotor Hand,Cyan,hand,roi_36_hand,cyan 38 | 37,-38,-15,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_37_hand,cyan 39 | 38,-16,-46,73,1,Sensory/somatomotor Hand,Cyan,hand,roi_38_hand,cyan 40 | 39,2,-28,60,1,Sensory/somatomotor 
Hand,Cyan,hand,roi_39_hand,cyan 41 | 40,3,-17,58,1,Sensory/somatomotor Hand,Cyan,hand,roi_40_hand,cyan 42 | 41,38,-17,45,1,Sensory/somatomotor Hand,Cyan,hand,roi_41_hand,cyan 43 | 42,-49,-11,35,2,Sensory/somatomotor Mouth,Orange,mouth,roi_42_mouth,orange 44 | 43,36,-9,14,2,Sensory/somatomotor Mouth,Orange,mouth,roi_43_mouth,orange 45 | 44,51,-6,32,2,Sensory/somatomotor Mouth,Orange,mouth,roi_44_mouth,orange 46 | 45,-53,-10,24,2,Sensory/somatomotor Mouth,Orange,mouth,roi_45_mouth,orange 47 | 46,66,-8,25,2,Sensory/somatomotor Mouth,Orange,mouth,roi_46_mouth,orange 48 | 47,-3,2,53,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_47_cing_oper_task_control,purple 49 | 48,54,-28,34,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_48_cing_oper_task_control,purple 50 | 49,19,-8,64,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_49_cing_oper_task_control,purple 51 | 50,-16,-5,71,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_50_cing_oper_task_control,purple 52 | 51,-10,-2,42,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_51_cing_oper_task_control,purple 53 | 52,37,1,-4,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_52_cing_oper_task_control,purple 54 | 53,13,-1,70,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_53_cing_oper_task_control,purple 55 | 54,7,8,51,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_54_cing_oper_task_control,purple 56 | 55,-45,0,9,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_55_cing_oper_task_control,purple 57 | 56,49,8,-1,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_56_cing_oper_task_control,purple 58 | 57,-34,3,4,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_57_cing_oper_task_control,purple 59 | 58,-51,8,-2,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_58_cing_oper_task_control,purple 60 | 59,-5,18,34,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_59_cing_oper_task_control,purple 61 | 60,36,10,1,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_60_cing_oper_task_control,purple 62 | 61,32,-26,13,4,Auditory,Pink,auditory,roi_61_auditory,pink 63 | 62,65,-33,20,4,Auditory,Pink,auditory,roi_62_auditory,pink 64 | 63,58,-16,7,4,Auditory,Pink,auditory,roi_63_auditory,pink 65 | 64,-38,-33,17,4,Auditory,Pink,auditory,roi_64_auditory,pink 66 | 65,-60,-25,14,4,Auditory,Pink,auditory,roi_65_auditory,pink 67 | 66,-49,-26,5,4,Auditory,Pink,auditory,roi_66_auditory,pink 68 | 67,43,-23,20,4,Auditory,Pink,auditory,roi_67_auditory,pink 69 | 68,-50,-34,26,4,Auditory,Pink,auditory,roi_68_auditory,pink 70 | 69,-53,-22,23,4,Auditory,Pink,auditory,roi_69_auditory,pink 71 | 70,-55,-9,12,4,Auditory,Pink,auditory,roi_70_auditory,pink 72 | 71,56,-5,13,4,Auditory,Pink,auditory,roi_71_auditory,pink 73 | 72,59,-17,29,4,Auditory,Pink,auditory,roi_72_auditory,pink 74 | 73,-30,-27,12,4,Auditory,Pink,auditory,roi_73_auditory,pink 75 | 74,-41,-75,26,5,Default mode,Red,dmn,roi_74_dmn,red 76 | 75,6,67,-4,5,Default mode,Red,dmn,roi_75_dmn,red 77 | 76,8,48,-15,5,Default mode,Red,dmn,roi_76_dmn,red 78 | 77,-13,-40,1,5,Default mode,Red,dmn,roi_77_dmn,red 79 | 78,-18,63,-9,5,Default mode,Red,dmn,roi_78_dmn,red 80 | 79,-46,-61,21,5,Default mode,Red,dmn,roi_79_dmn,red 81 | 80,43,-72,28,5,Default mode,Red,dmn,roi_80_dmn,red 82 | 81,-44,12,-34,5,Default mode,Red,dmn,roi_81_dmn,red 83 | 82,46,16,-30,5,Default mode,Red,dmn,roi_82_dmn,red 84 | 
83,-68,-23,-16,5,Default mode,Red,dmn,roi_83_dmn,red 85 | 84,-58,-26,-15,-1,Uncertain,White,uncertain,roi_84_uncertain,white 86 | 85,27,16,-17,-1,Uncertain,White,uncertain,roi_85_uncertain,white 87 | 86,-44,-65,35,5,Default mode,Red,dmn,roi_86_dmn,red 88 | 87,-39,-75,44,5,Default mode,Red,dmn,roi_87_dmn,red 89 | 88,-7,-55,27,5,Default mode,Red,dmn,roi_88_dmn,red 90 | 89,6,-59,35,5,Default mode,Red,dmn,roi_89_dmn,red 91 | 90,-11,-56,16,5,Default mode,Red,dmn,roi_90_dmn,red 92 | 91,-3,-49,13,5,Default mode,Red,dmn,roi_91_dmn,red 93 | 92,8,-48,31,5,Default mode,Red,dmn,roi_92_dmn,red 94 | 93,15,-63,26,5,Default mode,Red,dmn,roi_93_dmn,red 95 | 94,-2,-37,44,5,Default mode,Red,dmn,roi_94_dmn,red 96 | 95,11,-54,17,5,Default mode,Red,dmn,roi_95_dmn,red 97 | 96,52,-59,36,5,Default mode,Red,dmn,roi_96_dmn,red 98 | 97,23,33,48,5,Default mode,Red,dmn,roi_97_dmn,red 99 | 98,-10,39,52,5,Default mode,Red,dmn,roi_98_dmn,red 100 | 99,-16,29,53,5,Default mode,Red,dmn,roi_99_dmn,red 101 | 100,-35,20,51,5,Default mode,Red,dmn,roi_100_dmn,red 102 | 101,22,39,39,5,Default mode,Red,dmn,roi_101_dmn,red 103 | 102,13,55,38,5,Default mode,Red,dmn,roi_102_dmn,red 104 | 103,-10,55,39,5,Default mode,Red,dmn,roi_103_dmn,red 105 | 104,-20,45,39,5,Default mode,Red,dmn,roi_104_dmn,red 106 | 105,6,54,16,5,Default mode,Red,dmn,roi_105_dmn,red 107 | 106,6,64,22,5,Default mode,Red,dmn,roi_106_dmn,red 108 | 107,-7,51,-1,5,Default mode,Red,dmn,roi_107_dmn,red 109 | 108,9,54,3,5,Default mode,Red,dmn,roi_108_dmn,red 110 | 109,-3,44,-9,5,Default mode,Red,dmn,roi_109_dmn,red 111 | 110,8,42,-5,5,Default mode,Red,dmn,roi_110_dmn,red 112 | 111,-11,45,8,5,Default mode,Red,dmn,roi_111_dmn,red 113 | 112,-2,38,36,5,Default mode,Red,dmn,roi_112_dmn,red 114 | 113,-3,42,16,5,Default mode,Red,dmn,roi_113_dmn,red 115 | 114,-20,64,19,5,Default mode,Red,dmn,roi_114_dmn,red 116 | 115,-8,48,23,5,Default mode,Red,dmn,roi_115_dmn,red 117 | 116,65,-12,-19,5,Default mode,Red,dmn,roi_116_dmn,red 118 | 117,-56,-13,-10,5,Default mode,Red,dmn,roi_117_dmn,red 119 | 118,-58,-30,-4,5,Default mode,Red,dmn,roi_118_dmn,red 120 | 119,65,-31,-9,5,Default mode,Red,dmn,roi_119_dmn,red 121 | 120,-68,-41,-5,5,Default mode,Red,dmn,roi_120_dmn,red 122 | 121,13,30,59,5,Default mode,Red,dmn,roi_121_dmn,red 123 | 122,12,36,20,5,Default mode,Red,dmn,roi_122_dmn,red 124 | 123,52,-2,-16,5,Default mode,Red,dmn,roi_123_dmn,red 125 | 124,-26,-40,-8,5,Default mode,Red,dmn,roi_124_dmn,red 126 | 125,27,-37,-13,5,Default mode,Red,dmn,roi_125_dmn,red 127 | 126,-34,-38,-16,5,Default mode,Red,dmn,roi_126_dmn,red 128 | 127,28,-77,-32,5,Default mode,Red,dmn,roi_127_dmn,red 129 | 128,52,7,-30,5,Default mode,Red,dmn,roi_128_dmn,red 130 | 129,-53,3,-27,5,Default mode,Red,dmn,roi_129_dmn,red 131 | 130,47,-50,29,5,Default mode,Red,dmn,roi_130_dmn,red 132 | 131,-49,-42,1,5,Default mode,Red,dmn,roi_131_dmn,red 133 | 132,-31,19,-19,-1,Uncertain,White,uncertain,roi_132_uncertain,white 134 | 133,-2,-35,31,6,Memory retrieval?,Gray,mem_retr,roi_133_mem_retr,gray 135 | 134,-7,-71,42,6,Memory retrieval?,Gray,mem_retr,roi_134_mem_retr,gray 136 | 135,11,-66,42,6,Memory retrieval?,Gray,mem_retr,roi_135_mem_retr,gray 137 | 136,4,-48,51,6,Memory retrieval?,Gray,mem_retr,roi_136_mem_retr,gray 138 | 137,-46,31,-13,5,Default mode,Red,dmn,roi_137_dmn,red 139 | 138,-10,11,67,11,Ventral attention,Teal,ventral_attention,roi_138_ventral_attention,olivedrab 140 | 139,49,35,-12,5,Default mode,Red,dmn,roi_139_dmn,red 141 | 140,8,-91,-7,-1,Uncertain,White,uncertain,roi_140_uncertain,white 142 | 
141,17,-91,-14,-1,Uncertain,White,uncertain,roi_141_uncertain,white 143 | 142,-12,-95,-13,-1,Uncertain,White,uncertain,roi_142_uncertain,white 144 | 143,18,-47,-10,7,Visual,Blue,visual,roi_143_visual,blue 145 | 144,40,-72,14,7,Visual,Blue,visual,roi_144_visual,blue 146 | 145,8,-72,11,7,Visual,Blue,visual,roi_145_visual,blue 147 | 146,-8,-81,7,7,Visual,Blue,visual,roi_146_visual,blue 148 | 147,-28,-79,19,7,Visual,Blue,visual,roi_147_visual,blue 149 | 148,20,-66,2,7,Visual,Blue,visual,roi_148_visual,blue 150 | 149,-24,-91,19,7,Visual,Blue,visual,roi_149_visual,blue 151 | 150,27,-59,-9,7,Visual,Blue,visual,roi_150_visual,blue 152 | 151,-15,-72,-8,7,Visual,Blue,visual,roi_151_visual,blue 153 | 152,-18,-68,5,7,Visual,Blue,visual,roi_152_visual,blue 154 | 153,43,-78,-12,7,Visual,Blue,visual,roi_153_visual,blue 155 | 154,-47,-76,-10,7,Visual,Blue,visual,roi_154_visual,blue 156 | 155,-14,-91,31,7,Visual,Blue,visual,roi_155_visual,blue 157 | 156,15,-87,37,7,Visual,Blue,visual,roi_156_visual,blue 158 | 157,29,-77,25,7,Visual,Blue,visual,roi_157_visual,blue 159 | 158,20,-86,-2,7,Visual,Blue,visual,roi_158_visual,blue 160 | 159,15,-77,31,7,Visual,Blue,visual,roi_159_visual,blue 161 | 160,-16,-52,-1,7,Visual,Blue,visual,roi_160_visual,blue 162 | 161,42,-66,-8,7,Visual,Blue,visual,roi_161_visual,blue 163 | 162,24,-87,24,7,Visual,Blue,visual,roi_162_visual,blue 164 | 163,6,-72,24,7,Visual,Blue,visual,roi_163_visual,blue 165 | 164,-42,-74,0,7,Visual,Blue,visual,roi_164_visual,blue 166 | 165,26,-79,-16,7,Visual,Blue,visual,roi_165_visual,blue 167 | 166,-16,-77,34,7,Visual,Blue,visual,roi_166_visual,blue 168 | 167,-3,-81,21,7,Visual,Blue,visual,roi_167_visual,blue 169 | 168,-40,-88,-6,7,Visual,Blue,visual,roi_168_visual,blue 170 | 169,37,-84,13,7,Visual,Blue,visual,roi_169_visual,blue 171 | 170,6,-81,6,7,Visual,Blue,visual,roi_170_visual,blue 172 | 171,-26,-90,3,7,Visual,Blue,visual,roi_171_visual,blue 173 | 172,-33,-79,-13,7,Visual,Blue,visual,roi_172_visual,blue 174 | 173,37,-81,1,7,Visual,Blue,visual,roi_173_visual,blue 175 | 174,-44,2,46,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_174_fronto_parietal_task_control,yellow 176 | 175,48,25,27,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_175_fronto_parietal_task_control,yellow 177 | 176,-47,11,23,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_176_fronto_parietal_task_control,yellow 178 | 177,-53,-49,43,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_177_fronto_parietal_task_control,yellow 179 | 178,-23,11,64,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_178_fronto_parietal_task_control,yellow 180 | 179,58,-53,-14,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_179_fronto_parietal_task_control,yellow 181 | 180,24,45,-15,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_180_fronto_parietal_task_control,yellow 182 | 181,34,54,-13,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_181_fronto_parietal_task_control,yellow 183 | 182,-21,41,-20,-1,Uncertain,White,uncertain,roi_182_uncertain,white 184 | 183,-18,-76,-24,-1,Uncertain,White,uncertain,roi_183_uncertain,white 185 | 184,17,-80,-34,-1,Uncertain,White,uncertain,roi_184_uncertain,white 186 | 185,35,-67,-34,-1,Uncertain,White,uncertain,roi_185_uncertain,white 187 | 186,47,10,33,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_186_fronto_parietal_task_control,yellow 188 | 187,-41,6,33,8,Fronto-parietal 
Task Control,Yellow,fronto_parietal_task_control,roi_187_fronto_parietal_task_control,yellow 189 | 188,-42,38,21,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_188_fronto_parietal_task_control,yellow 190 | 189,38,43,15,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_189_fronto_parietal_task_control,yellow 191 | 190,49,-42,45,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_190_fronto_parietal_task_control,yellow 192 | 191,-28,-58,48,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_191_fronto_parietal_task_control,yellow 193 | 192,44,-53,47,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_192_fronto_parietal_task_control,yellow 194 | 193,32,14,56,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_193_fronto_parietal_task_control,yellow 195 | 194,37,-65,40,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_194_fronto_parietal_task_control,yellow 196 | 195,-42,-55,45,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_195_fronto_parietal_task_control,yellow 197 | 196,40,18,40,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_196_fronto_parietal_task_control,yellow 198 | 197,-34,55,4,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_197_fronto_parietal_task_control,yellow 199 | 198,-42,45,-2,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_198_fronto_parietal_task_control,yellow 200 | 199,33,-53,44,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_199_fronto_parietal_task_control,yellow 201 | 200,43,49,-2,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_200_fronto_parietal_task_control,yellow 202 | 201,-42,25,30,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_201_fronto_parietal_task_control,yellow 203 | 202,-3,26,44,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_202_fronto_parietal_task_control,yellow 204 | 203,11,-39,50,9,Salience,Black,salience,roi_203_salience,black 205 | 204,55,-45,37,9,Salience,Black,salience,roi_204_salience,black 206 | 205,42,0,47,9,Salience,Black,salience,roi_205_salience,black 207 | 206,31,33,26,9,Salience,Black,salience,roi_206_salience,black 208 | 207,48,22,10,9,Salience,Black,salience,roi_207_salience,black 209 | 208,-35,20,0,9,Salience,Black,salience,roi_208_salience,black 210 | 209,36,22,3,9,Salience,Black,salience,roi_209_salience,black 211 | 210,37,32,-2,9,Salience,Black,salience,roi_210_salience,black 212 | 211,34,16,-8,9,Salience,Black,salience,roi_211_salience,black 213 | 212,-11,26,25,9,Salience,Black,salience,roi_212_salience,black 214 | 213,-1,15,44,9,Salience,Black,salience,roi_213_salience,black 215 | 214,-28,52,21,9,Salience,Black,salience,roi_214_salience,black 216 | 215,0,30,27,9,Salience,Black,salience,roi_215_salience,black 217 | 216,5,23,37,9,Salience,Black,salience,roi_216_salience,black 218 | 217,10,22,27,9,Salience,Black,salience,roi_217_salience,black 219 | 218,31,56,14,9,Salience,Black,salience,roi_218_salience,black 220 | 219,26,50,27,9,Salience,Black,salience,roi_219_salience,black 221 | 220,-39,51,17,9,Salience,Black,salience,roi_220_salience,black 222 | 221,2,-24,30,6,Memory retrieval?,Gray,mem_retr,roi_221_mem_retr,gray 223 | 222,6,-24,0,10,Subcortical,Brown,subcortical,roi_222_subcortical,brown 224 | 223,-2,-13,12,10,Subcortical,Brown,subcortical,roi_223_subcortical,brown 225 | 
224,-10,-18,7,10,Subcortical,Brown,subcortical,roi_224_subcortical,brown 226 | 225,12,-17,8,10,Subcortical,Brown,subcortical,roi_225_subcortical,brown 227 | 226,-5,-28,-4,10,Subcortical,Brown,subcortical,roi_226_subcortical,brown 228 | 227,-22,7,-5,10,Subcortical,Brown,subcortical,roi_227_subcortical,brown 229 | 228,-15,4,8,10,Subcortical,Brown,subcortical,roi_228_subcortical,brown 230 | 229,31,-14,2,10,Subcortical,Brown,subcortical,roi_229_subcortical,brown 231 | 230,23,10,1,10,Subcortical,Brown,subcortical,roi_230_subcortical,brown 232 | 231,29,1,4,10,Subcortical,Brown,subcortical,roi_231_subcortical,brown 233 | 232,-31,-11,0,10,Subcortical,Brown,subcortical,roi_232_subcortical,brown 234 | 233,15,5,7,10,Subcortical,Brown,subcortical,roi_233_subcortical,brown 235 | 234,9,-4,6,10,Subcortical,Brown,subcortical,roi_234_subcortical,brown 236 | 235,54,-43,22,11,Ventral attention,Teal,ventral_attention,roi_235_ventral_attention,olivedrab 237 | 236,-56,-50,10,11,Ventral attention,Teal,ventral_attention,roi_236_ventral_attention,olivedrab 238 | 237,-55,-40,14,11,Ventral attention,Teal,ventral_attention,roi_237_ventral_attention,olivedrab 239 | 238,52,-33,8,11,Ventral attention,Teal,ventral_attention,roi_238_ventral_attention,olivedrab 240 | 239,51,-29,-4,11,Ventral attention,Teal,ventral_attention,roi_239_ventral_attention,olivedrab 241 | 240,56,-46,11,11,Ventral attention,Teal,ventral_attention,roi_240_ventral_attention,olivedrab 242 | 241,53,33,1,11,Ventral attention,Teal,ventral_attention,roi_241_ventral_attention,olivedrab 243 | 242,-49,25,-1,11,Ventral attention,Teal,ventral_attention,roi_242_ventral_attention,olivedrab 244 | 243,-16,-65,-20,13,Cerebellar,Pale blue,cerebellar,roi_243_cerebellar,lightslateblue 245 | 244,-32,-55,-25,13,Cerebellar,Pale blue,cerebellar,roi_244_cerebellar,lightslateblue 246 | 245,22,-58,-23,13,Cerebellar,Pale blue,cerebellar,roi_245_cerebellar,lightslateblue 247 | 246,1,-62,-18,13,Cerebellar,Pale blue,cerebellar,roi_246_cerebellar,lightslateblue 248 | 247,33,-12,-34,-1,Uncertain,White,uncertain,roi_247_uncertain,white 249 | 248,-31,-10,-36,-1,Uncertain,White,uncertain,roi_248_uncertain,white 250 | 249,49,-3,-38,-1,Uncertain,White,uncertain,roi_249_uncertain,white 251 | 250,-50,-7,-39,-1,Uncertain,White,uncertain,roi_250_uncertain,white 252 | 251,10,-62,61,12,Dorsal attention,Green,dorsal_attention,roi_251_dorsal_attention,green 253 | 252,-52,-63,5,12,Dorsal attention,Green,dorsal_attention,roi_252_dorsal_attention,green 254 | 253,-47,-51,-21,-1,Uncertain,White,uncertain,roi_253_uncertain,white 255 | 254,46,-47,-17,-1,Uncertain,White,uncertain,roi_254_uncertain,white 256 | 255,47,-30,49,1,Sensory/somatomotor Hand,Cyan,hand,roi_255_hand,cyan 257 | 256,22,-65,48,12,Dorsal attention,Green,dorsal_attention,roi_256_dorsal_attention,green 258 | 257,46,-59,4,12,Dorsal attention,Green,dorsal_attention,roi_257_dorsal_attention,green 259 | 258,25,-58,60,12,Dorsal attention,Green,dorsal_attention,roi_258_dorsal_attention,green 260 | 259,-33,-46,47,12,Dorsal attention,Green,dorsal_attention,roi_259_dorsal_attention,green 261 | 260,-27,-71,37,12,Dorsal attention,Green,dorsal_attention,roi_260_dorsal_attention,green 262 | 261,-32,-1,54,12,Dorsal attention,Green,dorsal_attention,roi_261_dorsal_attention,green 263 | 262,-42,-60,-9,12,Dorsal attention,Green,dorsal_attention,roi_262_dorsal_attention,green 264 | 263,-17,-59,64,12,Dorsal attention,Green,dorsal_attention,roi_263_dorsal_attention,green 265 | 264,29,-5,54,12,Dorsal 
attention,Green,dorsal_attention,roi_264_dorsal_attention,green 266 | -------------------------------------------------------------------------------- /mn2vec_toy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/mn2vec_toy.png -------------------------------------------------------------------------------- /multi_node2vec.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Wrapper for the multi-node2vec algorithm. 3 | 4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI" 5 | by JD Wilson, M Baybay, R Sankar, and P Stillman 6 | 7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf 8 | 9 | Contributors: 10 | - Melanie Baybay 11 | University of San Francisco, Department of Computer Science 12 | - Rishi Sankar 13 | Henry M. Gunn High School 14 | - James D. Wilson (maintainer) 15 | University of San Francisco, Department of Mathematics and Statistics 16 | 17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu 18 | ''' 19 | import os 20 | import src as mltn2v 21 | import argparse 22 | import time 23 | 24 | 25 | def parse_args(): 26 | parser = argparse.ArgumentParser(description="Run multi-node2vec on multilayer networks.") 27 | 28 | parser.add_argument('--dir', nargs='?', default='data/CONTROL_fmt', 29 | help='Absolute path to directory of correlation/adjacency matrix files (csv format). Note that rows and columns must be properly labeled by node ID in each .csv.') 30 | 31 | parser.add_argument('--output', nargs='?', default='new_results/', 32 | help='Absolute path to output directory (no extension).') 33 | 34 | #parser.add_argument('--filename', nargs='?', default='new_results/mltn2v_control', 35 | # help='output filename (no extension).') 36 | 37 | parser.add_argument('--d', type=int, default=100, 38 | help='Dimensionality. Default is 100.') 39 | 40 | parser.add_argument('--walk_length', type=int, default=100, 41 | help='Length of each random walk. Default is 100.') 42 | 43 | parser.add_argument('--window_size', type=int, default = 10, 44 | help='Size of context window used for Skip Gram optimization. Default is 10.') 45 | 46 | parser.add_argument('--n_samples', type=int, default=1, 47 | help='Number of walks per node per layer. Default is 1.') 48 | 49 | parser.add_argument('--thresh', type=float, default=0.5, 50 | help='Threshold for converting a weighted network to an unweighted one. All weights less than or equal to thresh will be considered 0 and all others 1. Default is 0.5. Use None if the network is unweighted.') 51 | 52 | # parser.add_argument('--w2v_iter', default=1, type=int, 53 | # help='Number of epochs in word2vec') 54 | 55 | parser.add_argument('--w2v_workers', type=int, default=8, 56 | help='Number of parallel worker threads. Default is 8.') 57 | 58 | parser.add_argument('--rvals', type=float, default=0.25, 59 | help='Layer walk parameter for neighborhood search. Default is 0.25') 60 | 61 | parser.add_argument('--pvals', type=float, default=1, 62 | help='Return walk parameter for neighborhood search. Default is 1') 63 | 64 | parser.add_argument('--qvals', type=float, default=0.5, 65 | help='Exploration walk parameter for neighborhood search. 
Default is 0.50')
66 |
67 |
68 |     return parser.parse_args()
69 |
70 |
71 | def main(args):
72 |     start = time.time()
73 |     # PARSE LAYERS -- THRESHOLD & CONVERT TO BINARY
74 |     layers = mltn2v.timed_invoke("parsing network layers",
75 |                                  lambda: mltn2v.parse_matrix_layers(args.dir, binary=True, thresh=args.thresh))
76 |     # check if layers were parsed
77 |     if layers:
78 |         # EXTRACT NEIGHBORHOODS
79 |         nbrhd_dict = mltn2v.timed_invoke("extracting neighborhoods",
80 |                                          lambda: mltn2v.extract_neighborhoods_walk(layers, args.walk_length, args.rvals, args.pvals, args.qvals))
81 |         # GENERATE FEATURES
82 |         out = mltn2v.clean_output(args.output)
83 |         for w in args.rvals:
84 |             out_path = os.path.join(out, 'r' + str(w) + '/mltn2v_results')
85 |             mltn2v.timed_invoke("generating features",
86 |                                 lambda: mltn2v.generate_features(nbrhd_dict[w], args.d, out_path, nbrhd_size=args.window_size,
87 |                                                                  w2v_iter=1, workers=args.w2v_workers))
88 |
89 |             print("\nCompleted Multilayer Network Embedding for r=" + str(w) + " in {:.2f} secs.\nSee results:".format(time.time() - start))
90 |             print("\t" + out_path + ".csv")
91 |         print("Completed Multilayer Network Embedding for all r values.")
92 |     else:
93 |         print("[ERROR] No network layers could be parsed from '{}'.".format(args.dir))
94 |
95 |
96 | if __name__ == '__main__':
97 |     args = parse_args()
98 |     args.rvals = [args.rvals]  # downstream code expects a list of r values
99 |     main(args)
100 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.12.1
2 | pandas==0.24.0
3 | gensim==2.3.0
4 | networkx==2.5.1
--------------------------------------------------------------------------------
/results/test/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/results/test/.DS_Store
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
1 | from .multinode2vec import *
2 | from .mltn2v_utils import *
--------------------------------------------------------------------------------
/src/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/mltn2v_utils.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/mltn2v_utils.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/multinode2vec.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/multinode2vec.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/nbrhd_gen_walk_nx.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/nbrhd_gen_walk_nx.cpython-36.pyc
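
For reference, a minimal sketch (not part of the repository) of loading the results that multi_node2vec.py writes; the path below is the one produced by the Quick Test example, and the lookup of node "1" assumes that node ID appears in the data:

```python
import pandas as pd
from gensim.models import KeyedVectors

# The .csv written by feature_matrix_to_csv(): no header row,
# node ID in the first column, one d-dimensional row per node.
ftrs = pd.read_csv("results/test/r0.25/mltn2v_results.csv",
                   header=None, index_col=0)
print(ftrs.shape)  # (number of unique nodes, d)

# The companion .emb file is standard word2vec text format.
emb = KeyedVectors.load_word2vec_format("results/test/r0.25/mltn2v_results.emb")
print(emb.most_similar("1", topn=5))  # node IDs are stored as strings
```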
--------------------------------------------------------------------------------
/src/mltn2v_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Helper functions for parsing multilayer graphs and layers.
3 |
4 | Details of multi-node2vec can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | """
19 |
20 | import os
21 | import pandas as pd
22 | from pandas.api.types import is_numeric_dtype
23 | import time
24 |
25 |
26 | # -------------------------------------------------------------------------------
27 | # PARSING AND CONVERSION FOR MULTILAYER GRAPHS
28 | # -------------------------------------------------------------------------------
29 | def parse_matrix_layers(network_dir, delim=',', binary=False, thresh=None):
30 |     """
31 |     Converts a directory of adjacency matrix files into pandas dataframes.
32 |     :param network_dir: Directory of adjacency matrix files
33 |     :param delim: separator for the adjacency matrix files
34 |     :param binary: whether to convert edge weights to binary (0/1)
35 |     :param thresh: threshold for edge weights; weights <= thresh are set to 0 and their edges dropped
36 |     :return: List of adjacency lists. Each adjacency list is one layer and is represented
37 |         as a pandas DataFrame with 'source', 'target', 'weight' columns.
38 |     """
39 |     # expand directory path
40 |     network_dir = expand_path(network_dir)
41 |
42 |     # iterate files and convert to pandas dataframes
43 |     layers = []
44 |     for network_file in os.listdir(network_dir):
45 |         file_path = os.path.join(network_dir, network_file)
46 |         try:
47 |             # read as pandas DataFrame, index=source, col=target
48 |             layer = pd.read_csv(file_path, sep=delim, index_col=0)
49 |             if layer.shape[0] != layer.shape[1]:
50 |                 print('[ERROR] Invalid adjacency matrix. Expecting a square matrix with index as source and columns as target.')
51 |                 return
52 |             if thresh is not None:
53 |                 layer[layer <= thresh] = 0
54 |             if binary:
55 |                 layer[layer != 0] = 1
56 |             # ensure that index (node name) is string, since word2vec will need it as str
57 |             if is_numeric_dtype(layer.index):
58 |                 layer.index = layer.index.map(str)
59 |             # replace all 0s with NaN
60 |             layer.replace(to_replace=0, value=pd.np.nan, inplace=True)
61 |             # convert matrix --> adjacency list with cols ["source", "target", "weight"]
62 |             layer = layer.stack(dropna=True).reset_index()
63 |             # rename columns
64 |             layer.columns = ["source", "target", "weight"]
65 |             layers.append(layer)
66 |         except Exception as e:
67 |             print('[ERROR] Could not read file "{}": {} '.format(file_path, e))
68 |     return layers
69 |
70 |
71 | def expand_path(path):
72 |     """
73 |     Expands a file path to handle the user's home directory and environment variables.
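For example (illustrative): expand_path('~/$PROJECT/data') expands both the home directory and the PROJECT environment variable.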
74 |     :param path: path to expand
75 |     :return: expanded path
76 |     """
77 |     new_path = os.path.expanduser(path)
78 |     return os.path.expandvars(new_path)
79 |
80 |
81 | # -------------------------------------------------------------------------------
82 | # OUTPUT
83 | # -------------------------------------------------------------------------------
84 | def feature_matrix_to_csv(ftrs, filename):
85 |     """
86 |     Convert feature matrix to csv.
87 |     :param ftrs: pandas DataFrame of features
88 |     :param filename: absolute path to output file (no extension)
89 |     (the written file has no header row; its first column carries the node ID)
90 |     :return:
91 |     """
92 |     out = filename + ".csv"
93 |     ftrs.to_csv(out, sep=',', header=False)
94 |     return
95 |
96 |
97 | def timed_invoke(action_desc, method):
98 |     """
99 |     Invokes a method with timing.
100 |     :param action_desc: The string describing the method action
101 |     :param method: The method to invoke
102 |     :return: The return object of the method
103 |     """
104 |     print('Started {}...'.format(action_desc))
105 |     start = time.time()
106 |     try:
107 |         output = method()
108 |         print('Finished {} in {} seconds'.format(action_desc, int(time.time() - start)))
109 |         return output
110 |     except Exception:
111 |         print('Exception while {} after {} seconds'.format(action_desc, int(time.time() - start)))
112 |         raise
113 |
114 |
115 | def clean_output(directory):
116 |     """
117 |     Checks if the output directory exists; otherwise it is created.
118 |     """
119 |     directory = expand_path(directory)
120 |     if os.path.isdir(directory):
121 |         return directory
122 |     else:
123 |         os.makedirs(directory)
124 |         print("[WARNING] Directory not found. Created {}".format(directory))
125 |         return directory
126 |
--------------------------------------------------------------------------------
/src/multinode2vec.py:
--------------------------------------------------------------------------------
1 | """
2 | Core functions of the multi-node2vec algorithm.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | """
19 | from gensim.models import word2vec as w2v
20 | from .mltn2v_utils import *
21 | from .nbrhd_gen_walk_nx import *
22 | import time
23 | import networkx as nx
24 |
25 |
26 | # -------------------------------------------------------------------------------
27 | # multinode2vec
28 | # -------------------------------------------------------------------------------
29 | def generate_features(nbrhds, d, out, nbrhd_size=-1, w2v_iter=1, workers=8, sg=1):
30 |     """
31 |     Generates d features for each unique node in a multilayer network based on
32 |     its neighborhood.
33 |
34 |     :param nbrhds: list of node neighborhoods (random walks), each a list of node IDs
35 |     :param d: feature dimensionality
36 |     :param out: absolute path for output file (no extension)
37 |     :param nbrhd_size: window size for Skip-Gram optimization
38 |     (each neighborhood plays the role of a word2vec sentence; nodes play the role of words)
39 |     :param w2v_iter: number of word2vec training epochs
40 |     :param workers: number of workers
41 |     :param sg: sets word2vec architecture:
1 for Skip-Gram, 0 for CBOW
42 |     :return: n x d network embedding
43 |     """
44 |     print("Total Neighborhoods: {}".format(len(nbrhds)))
45 |     w2v_model = w2v.Word2Vec(nbrhds, size=d, window=nbrhd_size, min_count=0,
46 |                              workers=workers, iter=w2v_iter, sg=sg)
47 |     embfile = out + ".emb"
48 |     splitpath = embfile.split('/')
49 |     if len(splitpath) > 1:
50 |         dirs = embfile[:-len(splitpath[-1])]
51 |         if not os.path.exists(dirs):
52 |             os.makedirs(dirs)
53 |     if not os.path.exists(embfile):
54 |         with open(embfile, 'w'): pass
55 |     w2v_model.wv.save_word2vec_format(embfile)
56 |     ftrs = emb_to_pandas(embfile)
57 |     feature_matrix_to_csv(ftrs, out)
58 |     return ftrs
59 |
60 |
61 | # -------------------------------------------------------------------------------
62 | # NEIGHBORHOODS
63 | # -------------------------------------------------------------------------------
64 | def extract_neighborhoods_walk(layers, nbrhd_size, wvals, p, q, is_directed=False, weighted=False):
65 |     nxg = []
66 |     for layer in layers:
67 |         nxg.append(nx.convert_matrix.from_pandas_edgelist(layer, edge_attr='weight'))
68 |
69 |     start = time.time()
70 |     nbrhd_gen = NeighborhoodGen(nxg, p, q, is_directed=is_directed, weighted=weighted)
71 |     print("Finished initialization of neighborhood generator in " + str(time.time() - start) + " seconds.")
72 |
73 |     neighborhood_dict = {}
74 |     for w in wvals:
75 |         neighborhoods = []
76 |         for i in range(len(nxg)):
77 |             layer = nxg[i]
78 |             for node in layer.nodes():
79 |                 for j in range(52):  # 52 walks per node per layer (hard-coded; not wired to --n_samples)
80 |                     neighborhoods.append(nbrhd_gen.multinode2vec_walk(w, nbrhd_size, node, i))
81 |         print("Finished nbrhd generation for r=" + str(w))
82 |         neighborhood_dict[w] = neighborhoods
83 |
84 |     return neighborhood_dict
85 |
86 | def extract_neighborhoods(layers, nbrhd_size, n_samples, weighted=False):
87 |     """
88 |     Extracts neighborhoods of length nbrhd_size for each node in each layer.
89 |     :param layers: list of adjacency lists as pandas DataFrames with columns ["source", "target", "weight"]
90 |     :param nbrhd_size: number of nodes per neighborhood
91 |     :param n_samples: number of samples per node
92 |     :param weighted: whether to select neighborhoods by highest weight
93 |     :return: list of neighborhoods, represented as lists
94 |     """
95 |     neighborhoods = []
96 |     if weighted:
97 |         for layer in layers:
98 |             for node in layer["source"].unique():
99 |                 neighbors = layer.loc[layer["source"] == node, ["target", "weight"]]
100 |                 neighbors = neighbors.sort_values(by="weight", ascending=False)["target"]  # targets ordered by edge weight
101 |                 neighborhoods.extend(
102 |                     extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples)
103 |                 )
104 |     else:
105 |         for layer in layers:
106 |             for node in layer["source"].unique():
107 |                 neighbors = layer.loc[layer["source"] == node, "target"]
108 |                 neighborhoods.extend(
109 |                     extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples)
110 |                 )
111 |     return neighborhoods
112 |
113 |
114 | def extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples):
115 |     if len(neighbors) < nbrhd_size:
116 |         print("[WARNING] Selected neighborhood size {} > node-{}'s degree {}. "
117 |               "Setting neighborhood size to {} for node-{}."
118 |               .format(nbrhd_size, node, len(neighbors), len(neighbors), node))
119 |         nbrhd_size = len(neighbors)
120 |     node_neighborhoods = []
121 |     n = 0
122 |     while n < n_samples:
123 |         nbrhd = [node]
124 |         nbrhd.extend(neighbors.sample(n=nbrhd_size-1).values)
125 |         node_neighborhoods.append(nbrhd)
126 |         n += 1
127 |     return node_neighborhoods
128 |
129 |
130 | # -------------------------------------------------------------------------------
131 | # HELPERS
132 | # -------------------------------------------------------------------------------
133 | def emb_to_pandas(emb_file):
134 |     """
135 |     Converts an embedding file, as extracted from a trained word2vec model, to a pandas DataFrame indexed by node ID.
136 |
137 |     :param emb_file: absolute path to word2vec embedding file
138 |     :return: pandas DataFrame, (N x d)
139 |     """
140 |     ftrs = pd.read_csv(emb_file, delim_whitespace=True, skiprows=1, header=None, index_col=0)
141 |     ftrs.sort_index(inplace=True)
142 |     return ftrs
143 |
--------------------------------------------------------------------------------
/src/nbrhd_gen_walk.py:
--------------------------------------------------------------------------------
1 | '''
2 | Neighborhood aliasing procedure used for fast random walks on multilayer networks.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | '''
19 |
20 |
21 | import numpy as np
22 | import networkx as nx
23 | import random
24 | #import multiprocessing
25 | import threading
26 | import time
27 |
28 | #is is_directed needed?
29 |
30 | class NeighborhoodGen():
31 |     def __init__(self, graph, p, q, thread_limit=1, is_directed=False, weighted=False):
32 |         self.G = graph
33 |         self.is_directed = is_directed
34 |         self.p = p
35 |         self.q = q
36 |         self.weighted = weighted
37 |         self.thread_limit = thread_limit
38 |
39 |         self.preprocess_transition_probs()
40 |
41 |     def multinode2vec_walk(self, w, walk_length, start_node, start_layer_id):
42 |         '''
43 |         Simulate a random walk starting from start_node.
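With probability w the walk jumps to a uniformly chosen different layer before each step; otherwise it continues within the current layer, taking node2vec-biased (p, q) steps.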
(Each walk yields one neighborhood.)
44 |         '''
45 |
46 |         G = self.G
47 |         alias_nodes = self.alias_nodes
48 |         alias_edges = self.alias_edges
49 |
50 |         walk = [start_node]  # the neighborhood under construction
51 |         cur_layer_id = start_layer_id
52 |         force_switch = False
53 |         while len(walk) < walk_length:
54 |             cur = walk[-1]
55 |             if not force_switch:
56 |                 prev_layer_id = cur_layer_id
57 |             # draw a fresh uniform variate each step to decide whether to switch layers
58 |             rval = random.random()
59 |             if rval < w or force_switch:  # switch to a uniformly chosen different layer
60 |                 total_layers = len(G)
61 |                 rlay = random.randint(0, total_layers - 2)
62 |                 if rlay >= cur_layer_id:
63 |                     rlay += 1
64 |                 cur_layer_id = rlay
65 |                 force_switch = False
66 |             cur_layer = G[cur_layer_id]
67 |             try:
68 |                 cur_nbrs = sorted(cur_layer.neighbors(cur))
69 |                 if len(cur_nbrs) > 0:
70 |                     if len(walk) == 1 or prev_layer_id != cur_layer_id:
71 |                         walk.append(cur_nbrs[alias_draw(alias_nodes[cur_layer_id][cur][0], alias_nodes[cur_layer_id][cur][1])])
72 |                     else:
73 |                         prev = walk[-2]
74 |                         next = cur_nbrs[alias_draw(alias_edges[cur_layer_id][(prev, cur)][0],
75 |                             alias_edges[cur_layer_id][(prev, cur)][1])]
76 |                         walk.append(next)
77 |                 else:
78 |                     force_switch = True
79 |                     continue
80 |             except Exception:
81 |                 force_switch = True
82 |                 continue
83 |
84 |         return walk
85 |
86 |     def simulate_walks(self, num_walks, walk_length):
87 |         '''
88 |         Repeatedly simulate random walks from each node of each layer.
89 |         '''
90 |         G = self.G
91 |         walks = {}
92 |         for layer in G:
93 |             walks[layer] = []
94 |             nodes = list(layer.nodes())
95 |             print('Walk iteration:')
96 |             for walk_iter in range(num_walks):
97 |                 print(str(walk_iter+1), '/', str(num_walks))
98 |                 random.shuffle(nodes)
99 |                 for node in nodes:
100 |                     walks[layer].append(self.multinode2vec_walk(0, walk_length, node, G.index(layer)))  # w=0 keeps the walk in this layer
101 |
102 |         return walks
103 |
104 |     def get_alias_edge(self, src, dst, layer):
105 |         '''
106 |         Get the alias edge setup lists for a given edge. The step from dst to its neighbor x is proportional to weight/p if x is the previous node src, to weight if x is also adjacent to src, and to weight/q otherwise.
107 |         '''
108 |         p = self.p
109 |         q = self.q
110 |
111 |         unnormalized_probs = []
112 |         for dst_nbr in sorted(layer.neighbors(dst)):
113 |             if dst_nbr == src:
114 |                 unnormalized_probs.append(layer[dst][dst_nbr]['weight']/p)
115 |             elif layer.has_edge(dst_nbr, src):
116 |                 unnormalized_probs.append(layer[dst][dst_nbr]['weight'])
117 |             else:
118 |                 unnormalized_probs.append(layer[dst][dst_nbr]['weight']/q)
119 |         norm_const = sum(unnormalized_probs)
120 |         normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
121 |
122 |         return alias_setup(normalized_probs)
123 |
124 |     def preprocess_transition_probs(self):
125 |         '''
126 |         Preprocessing of transition probabilities for guiding the random walks.
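Builds, for each layer, alias tables for first steps from a node (alias_nodes) and for subsequent steps along an edge (alias_edges); layers are processed in parallel threads when thread_limit > 1.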
127 | ''' 128 | G = self.G 129 | is_directed = self.is_directed 130 | 131 | self.alias_nodes = {} 132 | self.alias_edges = {} 133 | self.lock = threading.Lock() 134 | 135 | tlimit = self.thread_limit 136 | layer_count = len(self.G) 137 | counter = 0 138 | if tlimit == 1: 139 | for i in range(layer_count): 140 | self.preprocess_thread(self.G[i],i) 141 | else: 142 | while counter < layer_count: 143 | threads = [] 144 | rem = layer_count - counter 145 | if rem >= tlimit: 146 | for i in range(tlimit): 147 | thread = threading.Thread(target=self.preprocess_thread, args=(self.G[counter],counter,)) 148 | threads.append(thread) 149 | thread.start() 150 | counter += 1 151 | else: 152 | for i in range(rem): 153 | thread = threading.Thread(target=self.preprocess_thread, args=(self.G[counter],counter,)) 154 | threads.append(thread) 155 | thread.start() 156 | counter += 1 157 | for thread in threads: 158 | thread.join() 159 | 160 | return 161 | 162 | def preprocess_thread(self, layer, counter): 163 | start_time = time.time() 164 | print("Starting thread for layer " + str(counter)) 165 | alias_nodes = {} 166 | for node in layer.nodes(): 167 | unnormalized_probs = [layer[node][nbr]['weight'] for nbr in sorted(layer.neighbors(node))] 168 | norm_const = sum(unnormalized_probs) 169 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs] 170 | alias_nodes[node] = alias_setup(normalized_probs) 171 | 172 | alias_edges = {} 173 | triads = {} 174 | 175 | if self.is_directed: 176 | for edge in layer.edges(): 177 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer) 178 | else: 179 | for edge in layer.edges(): 180 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer) 181 | alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0], layer) 182 | 183 | self.lock.acquire() 184 | try: 185 | self.alias_nodes[counter] = alias_nodes 186 | self.alias_edges[counter] = alias_edges 187 | finally: 188 | self.lock.release() 189 | 190 | print("Finished thread for layer " + str(counter) + " in " + str(time.time() - start_time) + " seconds.") 191 | 192 | return 193 | 194 | def alias_setup(probs): 195 | ''' 196 | Compute utility lists for non-uniform sampling from discrete distributions. 197 | Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/ 198 | for details 199 | ''' 200 | K = len(probs) 201 | q = np.zeros(K) 202 | J = np.zeros(K, dtype=np.int) 203 | 204 | smaller = [] 205 | larger = [] 206 | for kk, prob in enumerate(probs): 207 | q[kk] = K*prob 208 | if q[kk] < 1.0: 209 | smaller.append(kk) 210 | else: 211 | larger.append(kk) 212 | 213 | while len(smaller) > 0 and len(larger) > 0: 214 | small = smaller.pop() 215 | large = larger.pop() 216 | 217 | J[small] = large 218 | q[large] = q[large] + q[small] - 1.0 219 | if q[large] < 1.0: 220 | smaller.append(large) 221 | else: 222 | larger.append(large) 223 | 224 | return J, q 225 | 226 | def alias_draw(J, q): 227 | ''' 228 | Draw sample from a non-uniform discrete distribution using alias sampling. 229 | ''' 230 | K = len(J) 231 | 232 | kk = int(np.floor(np.random.rand()*K)) 233 | if np.random.rand() < q[kk]: 234 | return kk 235 | else: 236 | return J[kk] 237 | -------------------------------------------------------------------------------- /src/nbrhd_gen_walk_nx.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Neighborhood aliasing procedure used for fast random walks on multilayer networks. 
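This module mirrors nbrhd_gen_walk.py; it is the networkx-based variant that src/multinode2vec.py actually imports.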
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | '''
19 |
20 |
21 | import numpy as np
22 | import networkx as nx
23 | import random
24 | #import multiprocessing
25 | import threading
26 | import time
27 |
28 | #is is_directed needed?
29 |
30 | class NeighborhoodGen():
31 |     def __init__(self, graph, p, q, thread_limit=1, is_directed=False, weighted=False):
32 |         self.G = graph
33 |         self.is_directed = is_directed
34 |         self.p = p
35 |         self.q = q
36 |         self.weighted = weighted
37 |         self.thread_limit = thread_limit
38 |
39 |         self.preprocess_transition_probs()
40 |
41 |     def multinode2vec_walk(self, w, walk_length, start_node, start_layer_id):
42 |         '''
43 |         Simulate a random walk starting from start_node; with probability w the walk switches to a uniformly chosen different layer before each step. (Each walk yields one neighborhood.)
44 |         '''
45 |
46 |         G = self.G
47 |         alias_nodes = self.alias_nodes
48 |         alias_edges = self.alias_edges
49 |
50 |         walk = [start_node]  # the neighborhood under construction
51 |         cur_layer_id = start_layer_id
52 |         force_switch = False
53 |         while len(walk) < walk_length:
54 |             cur = walk[-1]
55 |             if not force_switch:
56 |                 prev_layer_id = cur_layer_id
57 |             # draw a fresh uniform variate each step to decide whether to switch layers
58 |             rval = random.random()
59 |             if rval < w or force_switch:  # switch to a uniformly chosen different layer
60 |                 total_layers = len(G)
61 |                 rlay = random.randint(0, total_layers - 2)
62 |                 if rlay >= cur_layer_id:
63 |                     rlay += 1
64 |                 cur_layer_id = rlay
65 |                 force_switch = False
66 |             cur_layer = G[cur_layer_id]
67 |             try:
68 |                 cur_nbrs = sorted(cur_layer.neighbors(cur))
69 |                 if len(cur_nbrs) > 0:
70 |                     if len(walk) == 1 or prev_layer_id != cur_layer_id:
71 |                         walk.append(cur_nbrs[alias_draw(alias_nodes[cur_layer_id][cur][0], alias_nodes[cur_layer_id][cur][1])])
72 |                     else:
73 |                         prev = walk[-2]
74 |                         next = cur_nbrs[alias_draw(alias_edges[cur_layer_id][(prev, cur)][0],
75 |                             alias_edges[cur_layer_id][(prev, cur)][1])]
76 |                         walk.append(next)
77 |                 else:
78 |                     force_switch = True
79 |                     continue
80 |             except Exception:
81 |                 force_switch = True
82 |                 continue
83 |
84 |         return walk
85 |
86 |     def simulate_walks(self, num_walks, walk_length):
87 |         '''
88 |         Repeatedly simulate random walks from each node of each layer.
89 |         '''
90 |         G = self.G
91 |         walks = {}
92 |         for layer in G:
93 |             walks[layer] = []
94 |             nodes = list(layer.nodes())
95 |             print('Walk iteration:')
96 |             for walk_iter in range(num_walks):
97 |                 print(str(walk_iter+1), '/', str(num_walks))
98 |                 random.shuffle(nodes)
99 |                 for node in nodes:
100 |                     walks[layer].append(self.multinode2vec_walk(0, walk_length, node, G.index(layer)))  # w=0 keeps the walk in this layer
101 |
102 |         return walks
103 |
104 |     def get_alias_edge(self, src, dst, layer):
105 |         '''
106 |         Get the alias edge setup lists for a given edge.
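The step from dst to its neighbor x is proportional to weight/p if x is the previous node src, to weight if x is also adjacent to src, and to weight/q otherwise (node2vec's return and in-out biases).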
    def preprocess_transition_probs(self):
        '''
        Preprocessing of transition probabilities for guiding the random walks.
        '''
        G = self.G

        self.alias_nodes = {}
        self.alias_edges = {}
        self.lock = threading.Lock()

        tlimit = self.thread_limit
        layer_count = len(G)
        counter = 0
        if tlimit == 1:
            # single thread: preprocess each layer in turn
            for i in range(layer_count):
                self.preprocess_thread(G[i], i)
        else:
            # process layers in batches of at most tlimit threads
            while counter < layer_count:
                threads = []
                batch = min(tlimit, layer_count - counter)
                for i in range(batch):
                    thread = threading.Thread(target=self.preprocess_thread,
                                              args=(G[counter], counter))
                    threads.append(thread)
                    thread.start()
                    counter += 1
                for thread in threads:
                    thread.join()

        return

    def preprocess_thread(self, layer, counter):
        start_time = time.time()
        print("Starting thread for layer " + str(counter))
        alias_nodes = {}
        for node in layer.nodes():
            unnormalized_probs = [layer[node][nbr]['weight'] for nbr in sorted(layer.neighbors(node))]
            norm_const = sum(unnormalized_probs)
            normalized_probs = [float(u_prob) / norm_const for u_prob in unnormalized_probs]
            alias_nodes[node] = alias_setup(normalized_probs)

        alias_edges = {}

        if self.is_directed:
            for edge in layer.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
        else:
            # undirected: precompute alias tables for both orientations of each edge
            for edge in layer.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
                alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0], layer)

        # guard the shared dictionaries against concurrent writes
        self.lock.acquire()
        try:
            self.alias_nodes[counter] = alias_nodes
            self.alias_edges[counter] = alias_edges
        finally:
            self.lock.release()

        print("Finished thread for layer " + str(counter) + " in " + str(time.time() - start_time) + " seconds.")

        return
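# Usage sketch (hypothetical toy layers; the real pipeline builds the graph list
# from the CSV files in data/):
#
#   G1 = nx.Graph(); G1.add_edge('a', 'b', weight=1.0); G1.add_edge('b', 'c', weight=2.0)
#   G2 = nx.Graph(); G2.add_edge('a', 'c', weight=1.0); G2.add_edge('c', 'd', weight=0.5)
#   gen = NeighborhoodGen([G1, G2], p=1.0, q=0.5)
#   walk = gen.multinode2vec_walk(w=0.25, walk_length=10, start_node='a', start_layer_id=0)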
def alias_setup(probs):
    '''
    Compute utility lists for non-uniform sampling from discrete distributions.
    Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/
    for details.
    '''
    K = len(probs)
    q = np.zeros(K)
    J = np.zeros(K, dtype=int)

    # partition outcomes into those below and above the uniform weight 1/K
    smaller = []
    larger = []
    for kk, prob in enumerate(probs):
        q[kk] = K * prob
        if q[kk] < 1.0:
            smaller.append(kk)
        else:
            larger.append(kk)

    # pair each light outcome with a heavy one that donates its excess mass
    while len(smaller) > 0 and len(larger) > 0:
        small = smaller.pop()
        large = larger.pop()

        J[small] = large
        q[large] = q[large] + q[small] - 1.0
        if q[large] < 1.0:
            smaller.append(large)
        else:
            larger.append(large)

    return J, q


def alias_draw(J, q):
    '''
    Draw a sample from a non-uniform discrete distribution using alias sampling.
    '''
    K = len(J)

    # pick a column uniformly, then flip a biased coin between it and its alias
    kk = int(np.floor(np.random.rand() * K))
    if np.random.rand() < q[kk]:
        return kk
    else:
        return J[kk]
--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
"""
Helper functions for multi-node2vec (duplicate of mltn2v_utils.py, used for testing).
Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
by JD Wilson, M Baybay, R Sankar, and P Stillman

Preprint here: https://arxiv.org/pdf/1809.06437.pdf

Contributors:
- Melanie Baybay
  University of San Francisco, Department of Computer Science
- Rishi Sankar
  Henry M. Gunn High School
- James D. Wilson (maintainer)
  University of San Francisco, Department of Mathematics and Statistics

Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
"""

import os
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
import time


# -------------------------------------------------------------------------------
# PARSING AND CONVERSION FOR MULTILAYER GRAPHS
# -------------------------------------------------------------------------------
def parse_matrix_layers(network_dir, delim=',', binary=False, thresh=None):
    """
    Converts a directory of adjacency matrix files into pandas DataFrames.
    :param network_dir: directory of adjacency matrix files
    :param delim: separator for the adjacency matrix files
    :param binary: whether to convert edge weights to binary
    :param thresh: threshold for edge weights; weights <= thresh are set to 0,
                   so only edges with weight > thresh are kept
    :return: list of adjacency lists, one per layer, each a pandas DataFrame
             with 'source', 'target', 'weight' columns
    """
    # expand directory path
    network_dir = expand_path(network_dir)

    # iterate files and convert to pandas dataframes
    layers = []
    for network_file in os.listdir(network_dir):
        file_path = os.path.join(network_dir, network_file)
        try:
            # read as pandas DataFrame, index=source, col=target
            layer = pd.read_csv(file_path, sep=delim, index_col=0)
            if layer.shape[0] != layer.shape[1]:
                print('[ERROR] Invalid adjacency matrix. Expecting matrix with index as source and column as target.')
                return
            if thresh is not None:
                layer[layer <= thresh] = 0
            if binary:
                layer[layer != 0] = 1
            # ensure that index (node name) is string, since word2vec will need it as str
            if is_numeric_dtype(layer.index):
                layer.index = layer.index.map(str)
            # replace all 0s with NaN so they drop out of the adjacency list
            layer.replace(to_replace=0, value=np.nan, inplace=True)
            # convert matrix --> adjacency list with cols ["source", "target", "weight"]
            layer = layer.stack(dropna=True).reset_index()
            # rename columns
            layer.columns = ["source", "target", "weight"]
            layers.append(layer)
        except Exception as e:
            print('[ERROR] Could not read file "{}": {}'.format(file_path, e))
    return layers
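# Usage sketch (path and threshold are illustrative; any directory of square CSV
# adjacency matrices, such as data/test, works):
#
#   layers = parse_matrix_layers('data/test', thresh=0.25)
#   print(layers[0].head())   # source, target, weight rows of the first layer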
def expand_path(path):
    """
    Expands a file path, resolving the user home directory and environment variables.
    :param path: path to expand
    :return: expanded path
    """
    new_path = os.path.expanduser(path)
    return os.path.expandvars(new_path)


# -------------------------------------------------------------------------------
# OUTPUT
# -------------------------------------------------------------------------------
def feature_matrix_to_csv(ftrs, filename):
    """
    Write the feature matrix to csv.
    :param ftrs: pandas DataFrame of features
    :param filename: absolute path to output file (no extension)
    :return:
    """
    out = filename + ".csv"
    ftrs.to_csv(out, sep=',', header=False)
    return


def timed_invoke(action_desc, method):
    """
    Invokes a method with timing.
    :param action_desc: string describing the method action
    :param method: the method to invoke
    :return: the return object of the method
    """
    print('Started {}...'.format(action_desc))
    start = time.time()
    try:
        output = method()
        print('Finished {} in {} seconds'.format(action_desc, int(time.time() - start)))
        return output
    except Exception:
        print('Exception while {} after {} seconds'.format(action_desc, int(time.time() - start)))
        raise


def clean_output(directory):
    """
    Returns the output directory, creating it first if it does not exist.
    """
    directory = expand_path(directory)
    if os.path.isdir(directory):
        return directory
    else:
        os.makedirs(directory)
        print("[WARNING] Directory not found. Created {}".format(directory))
        return directory
--------------------------------------------------------------------------------