├── .DS_Store
├── README.md
├── data
│   ├── .DS_Store
│   ├── CONTROL_fmt
│   │   ├── 0040013.preprocess_v1.csv
│   │   ├── 0040014.preprocess_v1.csv
│   │   ├── 0040017.preprocess_v1.csv
│   │   ├── 0040018.preprocess_v1.csv
│   │   ├── 0040019.preprocess_v1.csv
│   │   ├── 0040020.preprocess_v1.csv
│   │   ├── 0040023.preprocess_v1.csv
│   │   ├── 0040024.preprocess_v1.csv
│   │   ├── 0040026.preprocess_v1.csv
│   │   ├── 0040027.preprocess_v1.csv
│   │   ├── 0040030.preprocess_v1.csv
│   │   ├── 0040031.preprocess_v1.csv
│   │   ├── 0040033.preprocess_v1.csv
│   │   ├── 0040035.preprocess_v1.csv
│   │   ├── 0040036.preprocess_v1.csv
│   │   ├── 0040038.preprocess_v1.csv
│   │   ├── 0040043.preprocess_v1.csv
│   │   ├── 0040045.preprocess_v1.csv
│   │   ├── 0040048.preprocess_v1.csv
│   │   ├── 0040050.preprocess_v1.csv
│   │   ├── 0040051.preprocess_v1.csv
│   │   ├── 0040052.preprocess_v1.csv
│   │   ├── 0040053.preprocess_v1.csv
│   │   ├── 0040054.preprocess_v1.csv
│   │   ├── 0040055.preprocess_v1.csv
│   │   ├── 0040056.preprocess_v1.csv
│   │   ├── 0040057.preprocess_v1.csv
│   │   ├── 0040058.preprocess_v1.csv
│   │   ├── 0040061.preprocess_v1.csv
│   │   ├── 0040062.preprocess_v1.csv
│   │   ├── 0040063.preprocess_v1.csv
│   │   ├── 0040065.preprocess_v1.csv
│   │   ├── 0040066.preprocess_v1.csv
│   │   ├── 0040067.preprocess_v1.csv
│   │   ├── 0040068.preprocess_v1.csv
│   │   ├── 0040069.preprocess_v1.csv
│   │   ├── 0040074.preprocess_v1.csv
│   │   ├── 0040076.preprocess_v1.csv
│   │   ├── 0040086.preprocess_v1.csv
│   │   ├── 0040087.preprocess_v1.csv
│   │   ├── 0040090.preprocess_v1.csv
│   │   ├── 0040091.preprocess_v1.csv
│   │   ├── 0040093.preprocess_v1.csv
│   │   ├── 0040095.preprocess_v1.csv
│   │   ├── 0040102.preprocess_v1.csv
│   │   ├── 0040104.preprocess_v1.csv
│   │   ├── 0040107.preprocess_v1.csv
│   │   ├── 0040111.preprocess_v1.csv
│   │   ├── 0040113.preprocess_v1.csv
│   │   ├── 0040114.preprocess_v1.csv
│   │   ├── 0040115.preprocess_v1.csv
│   │   ├── 0040116.preprocess_v1.csv
│   │   ├── 0040118.preprocess_v1.csv
│   │   ├── 0040119.preprocess_v1.csv
│   │   ├── 0040120.preprocess_v1.csv
│   │   ├── 0040121.preprocess_v1.csv
│   │   ├── 0040123.preprocess_v1.csv
│   │   ├── 0040124.preprocess_v1.csv
│   │   ├── 0040125.preprocess_v1.csv
│   │   ├── 0040127.preprocess_v1.csv
│   │   ├── 0040128.preprocess_v1.csv
│   │   ├── 0040129.preprocess_v1.csv
│   │   ├── 0040130.preprocess_v1.csv
│   │   ├── 0040131.preprocess_v1.csv
│   │   ├── 0040134.preprocess_v1.csv
│   │   ├── 0040135.preprocess_v1.csv
│   │   ├── 0040136.preprocess_v1.csv
│   │   ├── 0040138.preprocess_v1.csv
│   │   ├── 0040139.preprocess_v1.csv
│   │   ├── 0040140.preprocess_v1.csv
│   │   ├── 0040141.preprocess_v1.csv
│   │   ├── 0040144.preprocess_v1.csv
│   │   ├── 0040146.preprocess_v1.csv
│   │   └── 0040147.preprocess_v1.csv
│   ├── power_atlas_info.csv
│   └── test
│       ├── 0040013.preprocess_v1.csv
│       └── 0040014.preprocess_v1.csv
├── mn2vec_toy.png
├── multi_node2vec.py
├── requirements.txt
├── results
│   └── test
│       ├── .DS_Store
│       └── r0.25
│           ├── mltn2v_control.csv
│           ├── mltn2v_control.emb
│           ├── mltn2v_results.csv
│           └── mltn2v_results.emb
└── src
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-36.pyc
    │   ├── mltn2v_utils.cpython-36.pyc
    │   ├── multinode2vec.cpython-36.pyc
    │   └── nbrhd_gen_walk_nx.cpython-36.pyc
    ├── mltn2v_utils.py
    ├── multinode2vec.py
    ├── nbrhd_gen_walk.py
    ├── nbrhd_gen_walk_nx.py
    └── utils.py
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/.DS_Store
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # multi-node2vec
2 | This is Python source code for the multi-node2vec algorithm. Multi-node2vec is a fast network embedding method for multilayer networks
3 | that identifies a continuous and low-dimensional representation for the unique nodes in the network.
4 |
5 | Details of the algorithm can be found in the paper: *Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI*
6 | by JD Wilson, M Baybay, R Sankar, and P Stillman.
7 |
8 | **Preprint**: https://arxiv.org/pdf/1809.06437.pdf
9 |
10 | __Contributors__:
11 | - Melanie Baybay
12 | University of San Francisco, Department of Computer Science
13 | - Rishi Sankar
14 | Henry M. Gunn High School
15 | - James D. Wilson (maintainer)
16 | University of San Francisco, Department of Mathematics and Statistics
17 |
18 | **Questions or Bugs?** Contact James D. Wilson at jdwilson4@usfca.edu
19 |
20 | # Description
21 |
22 | ## The Mathematical Objective
23 |
24 | A multilayer network with *m* layers is a collection of networks or graphs {G1, ..., Gm}, where the graph Gj models the relational structure of the *j*th layer of the network. Each layer Gj = (Vj, Wj) is described by its vertex set Vj, which contains the units, or actors, of the layer, and its edge weights Wj, which describe the strength of the relationships between nodes. Layers in the multilayer sequence may be heterogeneous across vertices, edges, and size. Denote the set of unique nodes in {G1, ..., Gm} by **N**, and let
25 | *N* = |**N**| denote the number of nodes in that set.
26 | 
27 | The aim of **multi-node2vec** is to learn an interpretable low-dimensional feature representation of **N**. In particular, it seeks a *D*-dimensional representation
28 | 
29 | **F**: **N** --> R^*D*,
30 | 
31 | where *D* << *N*. The function **F** can be viewed as an *N* x *D* matrix whose rows {**f**_v : v = 1, ..., N} represent the feature space of each node in **N**.
32 |
33 | ## The Algorithm
34 | The **multi-node2vec** algorithm estimates **F** through maximum likelihood estimation and relies upon two core steps:
35 | 
36 | 1) __NeighborhoodSearch__: a collection of vertex neighborhoods from the observed multilayer graph, also known as a *BagofNodes*, is identified. This is done through a second-order random walk on the multilayer network.
37 | 
38 | 2) __Optimization__: given a *BagofNodes*, **F** is then estimated by maximizing the log-likelihood of **F** | **N**. This is done through the application of stochastic gradient descent to a two-layer Skip-Gram neural network model.
39 |
40 | The following image provides a schematic:
41 |
42 | 
43 |
44 | # Running multi-node2vec
45 |
46 | ## Requirements
47 | This package requires Python == 3.6 with the following libraries:
48 | - numpy==1.12.1
49 | - pandas==0.24.0
50 | - gensim==2.3.0
51 | - networkx==2.5.1
52 |
53 | You can install these libraries by running the command
54 |
55 | ```
56 | pip install -r requirements.txt
57 | ```
58 |
59 | from this project's root directory.
60 |
61 |
62 | ## Usage
63 | ```
64 | python3 multi_node2vec.py [--dir [DIR]] [--output [OUTPUT]] [--d [D]] [--walk_length [WALK_LENGTH]] [--window_size [WINDOW_SIZE]] [--n_samples [N_SAMPLES]] [--thresh [THRESH]] [--w2v_workers [W2V_WORKERS]] [--rvals [RVALS]] [--pvals [PVALS]] [--qvals [QVALS]]
65 | ```
66 |
67 | ***Arguments***
68 |
69 | - --dir [directory name] : Absolute path to directory of correlation/adjacency matrix files in csv format. Note that each .csv should contain an adjacency matrix with columns and rows labeled by the node ID.
70 | - --output [filename] : Absolute path to output file (no extension).
71 | - --d [dimensions] : Dimensionality. Default is 100.
72 | - --walk_length [n] : Length of each random walk for identifying multilayer neighborhoods. Default is 100.
73 | - --window_size [w] : Size of context window used for Skip-Gram optimization. Default is 10.
74 | - --n_samples [samples] : Number of walks sampled per node per layer. Default is 1.
75 | - --thresh [thresh] : Threshold for converting a weighted network to an unweighted one. All weights less than or equal to thresh will be considered 0 and all others 1. Default is 0.5. Use None if the network is unweighted.
76 | - --w2v_workers [workers] : Number of parallel worker threads. Default is 8.
77 | - --rvals [layer walk prob]: The unnormalized walk probability for traversing layers. Default is .25.
78 | - --pvals [return prob] : The unnormalized walk probability of returning to a previously seen node. Default is 1.
79 | - --qvals [explore prob] : The unnormalized walk probability of exploring new nodes. Default is 0.50.
80 |
81 | ### Examples
82 |
83 | __Quick Test example__
84 |
85 | This example runs **multi-node2vec** on a small test multilayer network with 2 layers and 264 nodes in each layer. It takes about 2 minutes to run on a personal computer using 8 cores.
86 | ```
87 | python3 multi_node2vec.py --dir data/test --output results/test --d 100 --window_size 2 --n_samples 1 --thresh 0.5 --rvals 0.25
88 | ```
89 |
90 | __fMRI Case Study__
91 |
92 | This example runs **multi-node2vec** on the multilayer network representing the group fMRI of 74 healthy controls analyzed in the paper *Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI*. The model will
93 | generate 100 features for each of 264 unique nodes using a walk parameter *r = 0.25*. The values of *p* (= 1) and *q* (= 0.50) are set to the defaults from the original **node2vec** specification. It takes about an hour to run on a personal computer using 8 cores.
94 | ```
95 | python3 multi_node2vec.py --dir data/CONTROL_fmt --output results/control --d 100 --window_size 10 --n_samples 1 --rvals 0.25 --pvals 1 --thresh 0.5 --qvals 0.5
96 | ```
97 |
98 |
99 |
100 |
--------------------------------------------------------------------------------
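
Each run writes two artifacts per r value: a gensim-format `.emb` file and a plain `.csv` holding the *N* x *D* feature matrix (one row per node ID, no header row). A minimal sketch for loading the quick-test output above, assuming it was produced by the quick-test command with `--d 100`:

```python
import pandas as pd

# Load the node-by-feature matrix written by multi-node2vec.
# Column 0 holds the node ID; the remaining d columns hold the features.
ftrs = pd.read_csv("results/test/r0.25/mltn2v_results.csv",
                   header=None, index_col=0)
print(ftrs.shape)   # expected (number of unique nodes, d), e.g. (264, 100)
print(ftrs.head())  # first few embedded nodes
```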
/data/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/data/.DS_Store
--------------------------------------------------------------------------------
/data/power_atlas_info.csv:
--------------------------------------------------------------------------------
1 | ROI,X,Y,Z,MasterAssignment,SuggestedSystem,color,network_revised,roi_name_unique,color_updated
2 | 1,-25,-98,-12,-1,Uncertain,White,uncertain,roi_1_uncertain,white
3 | 2,27,-97,-13,-1,Uncertain,White,uncertain,roi_2_uncertain,white
4 | 3,24,32,-18,-1,Uncertain,White,uncertain,roi_3_uncertain,white
5 | 4,-56,-45,-24,-1,Uncertain,White,uncertain,roi_4_uncertain,white
6 | 5,8,41,-24,-1,Uncertain,White,uncertain,roi_5_uncertain,white
7 | 6,-21,-22,-20,-1,Uncertain,White,uncertain,roi_6_uncertain,white
8 | 7,17,-28,-17,-1,Uncertain,White,uncertain,roi_7_uncertain,white
9 | 8,-37,-29,-26,-1,Uncertain,White,uncertain,roi_8_uncertain,white
10 | 9,65,-24,-19,-1,Uncertain,White,uncertain,roi_9_uncertain,white
11 | 10,52,-34,-27,-1,Uncertain,White,uncertain,roi_10_uncertain,white
12 | 11,55,-31,-17,-1,Uncertain,White,uncertain,roi_11_uncertain,white
13 | 12,34,38,-12,-1,Uncertain,White,uncertain,roi_12_uncertain,white
14 | 13,-7,-52,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_13_hand,cyan
15 | 14,-14,-18,40,1,Sensory/somatomotor Hand,Cyan,hand,roi_14_hand,cyan
16 | 15,0,-15,47,1,Sensory/somatomotor Hand,Cyan,hand,roi_15_hand,cyan
17 | 16,10,-2,45,1,Sensory/somatomotor Hand,Cyan,hand,roi_16_hand,cyan
18 | 17,-7,-21,65,1,Sensory/somatomotor Hand,Cyan,hand,roi_17_hand,cyan
19 | 18,-7,-33,72,1,Sensory/somatomotor Hand,Cyan,hand,roi_18_hand,cyan
20 | 19,13,-33,75,1,Sensory/somatomotor Hand,Cyan,hand,roi_19_hand,cyan
21 | 20,-54,-23,43,1,Sensory/somatomotor Hand,Cyan,hand,roi_20_hand,cyan
22 | 21,29,-17,71,1,Sensory/somatomotor Hand,Cyan,hand,roi_21_hand,cyan
23 | 22,10,-46,73,1,Sensory/somatomotor Hand,Cyan,hand,roi_22_hand,cyan
24 | 23,-23,-30,72,1,Sensory/somatomotor Hand,Cyan,hand,roi_23_hand,cyan
25 | 24,-40,-19,54,1,Sensory/somatomotor Hand,Cyan,hand,roi_24_hand,cyan
26 | 25,29,-39,59,1,Sensory/somatomotor Hand,Cyan,hand,roi_25_hand,cyan
27 | 26,50,-20,42,1,Sensory/somatomotor Hand,Cyan,hand,roi_26_hand,cyan
28 | 27,-38,-27,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_27_hand,cyan
29 | 28,20,-29,60,1,Sensory/somatomotor Hand,Cyan,hand,roi_28_hand,cyan
30 | 29,44,-8,57,1,Sensory/somatomotor Hand,Cyan,hand,roi_29_hand,cyan
31 | 30,-29,-43,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_30_hand,cyan
32 | 31,10,-17,74,1,Sensory/somatomotor Hand,Cyan,hand,roi_31_hand,cyan
33 | 32,22,-42,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_32_hand,cyan
34 | 33,-45,-32,47,1,Sensory/somatomotor Hand,Cyan,hand,roi_33_hand,cyan
35 | 34,-21,-31,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_34_hand,cyan
36 | 35,-13,-17,75,1,Sensory/somatomotor Hand,Cyan,hand,roi_35_hand,cyan
37 | 36,42,-20,55,1,Sensory/somatomotor Hand,Cyan,hand,roi_36_hand,cyan
38 | 37,-38,-15,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_37_hand,cyan
39 | 38,-16,-46,73,1,Sensory/somatomotor Hand,Cyan,hand,roi_38_hand,cyan
40 | 39,2,-28,60,1,Sensory/somatomotor Hand,Cyan,hand,roi_39_hand,cyan
41 | 40,3,-17,58,1,Sensory/somatomotor Hand,Cyan,hand,roi_40_hand,cyan
42 | 41,38,-17,45,1,Sensory/somatomotor Hand,Cyan,hand,roi_41_hand,cyan
43 | 42,-49,-11,35,2,Sensory/somatomotor Mouth,Orange,mouth,roi_42_mouth,orange
44 | 43,36,-9,14,2,Sensory/somatomotor Mouth,Orange,mouth,roi_43_mouth,orange
45 | 44,51,-6,32,2,Sensory/somatomotor Mouth,Orange,mouth,roi_44_mouth,orange
46 | 45,-53,-10,24,2,Sensory/somatomotor Mouth,Orange,mouth,roi_45_mouth,orange
47 | 46,66,-8,25,2,Sensory/somatomotor Mouth,Orange,mouth,roi_46_mouth,orange
48 | 47,-3,2,53,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_47_cing_oper_task_control,purple
49 | 48,54,-28,34,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_48_cing_oper_task_control,purple
50 | 49,19,-8,64,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_49_cing_oper_task_control,purple
51 | 50,-16,-5,71,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_50_cing_oper_task_control,purple
52 | 51,-10,-2,42,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_51_cing_oper_task_control,purple
53 | 52,37,1,-4,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_52_cing_oper_task_control,purple
54 | 53,13,-1,70,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_53_cing_oper_task_control,purple
55 | 54,7,8,51,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_54_cing_oper_task_control,purple
56 | 55,-45,0,9,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_55_cing_oper_task_control,purple
57 | 56,49,8,-1,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_56_cing_oper_task_control,purple
58 | 57,-34,3,4,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_57_cing_oper_task_control,purple
59 | 58,-51,8,-2,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_58_cing_oper_task_control,purple
60 | 59,-5,18,34,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_59_cing_oper_task_control,purple
61 | 60,36,10,1,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_60_cing_oper_task_control,purple
62 | 61,32,-26,13,4,Auditory,Pink,auditory,roi_61_auditory,pink
63 | 62,65,-33,20,4,Auditory,Pink,auditory,roi_62_auditory,pink
64 | 63,58,-16,7,4,Auditory,Pink,auditory,roi_63_auditory,pink
65 | 64,-38,-33,17,4,Auditory,Pink,auditory,roi_64_auditory,pink
66 | 65,-60,-25,14,4,Auditory,Pink,auditory,roi_65_auditory,pink
67 | 66,-49,-26,5,4,Auditory,Pink,auditory,roi_66_auditory,pink
68 | 67,43,-23,20,4,Auditory,Pink,auditory,roi_67_auditory,pink
69 | 68,-50,-34,26,4,Auditory,Pink,auditory,roi_68_auditory,pink
70 | 69,-53,-22,23,4,Auditory,Pink,auditory,roi_69_auditory,pink
71 | 70,-55,-9,12,4,Auditory,Pink,auditory,roi_70_auditory,pink
72 | 71,56,-5,13,4,Auditory,Pink,auditory,roi_71_auditory,pink
73 | 72,59,-17,29,4,Auditory,Pink,auditory,roi_72_auditory,pink
74 | 73,-30,-27,12,4,Auditory,Pink,auditory,roi_73_auditory,pink
75 | 74,-41,-75,26,5,Default mode,Red,dmn,roi_74_dmn,red
76 | 75,6,67,-4,5,Default mode,Red,dmn,roi_75_dmn,red
77 | 76,8,48,-15,5,Default mode,Red,dmn,roi_76_dmn,red
78 | 77,-13,-40,1,5,Default mode,Red,dmn,roi_77_dmn,red
79 | 78,-18,63,-9,5,Default mode,Red,dmn,roi_78_dmn,red
80 | 79,-46,-61,21,5,Default mode,Red,dmn,roi_79_dmn,red
81 | 80,43,-72,28,5,Default mode,Red,dmn,roi_80_dmn,red
82 | 81,-44,12,-34,5,Default mode,Red,dmn,roi_81_dmn,red
83 | 82,46,16,-30,5,Default mode,Red,dmn,roi_82_dmn,red
84 | 83,-68,-23,-16,5,Default mode,Red,dmn,roi_83_dmn,red
85 | 84,-58,-26,-15,-1,Uncertain,White,uncertain,roi_84_uncertain,white
86 | 85,27,16,-17,-1,Uncertain,White,uncertain,roi_85_uncertain,white
87 | 86,-44,-65,35,5,Default mode,Red,dmn,roi_86_dmn,red
88 | 87,-39,-75,44,5,Default mode,Red,dmn,roi_87_dmn,red
89 | 88,-7,-55,27,5,Default mode,Red,dmn,roi_88_dmn,red
90 | 89,6,-59,35,5,Default mode,Red,dmn,roi_89_dmn,red
91 | 90,-11,-56,16,5,Default mode,Red,dmn,roi_90_dmn,red
92 | 91,-3,-49,13,5,Default mode,Red,dmn,roi_91_dmn,red
93 | 92,8,-48,31,5,Default mode,Red,dmn,roi_92_dmn,red
94 | 93,15,-63,26,5,Default mode,Red,dmn,roi_93_dmn,red
95 | 94,-2,-37,44,5,Default mode,Red,dmn,roi_94_dmn,red
96 | 95,11,-54,17,5,Default mode,Red,dmn,roi_95_dmn,red
97 | 96,52,-59,36,5,Default mode,Red,dmn,roi_96_dmn,red
98 | 97,23,33,48,5,Default mode,Red,dmn,roi_97_dmn,red
99 | 98,-10,39,52,5,Default mode,Red,dmn,roi_98_dmn,red
100 | 99,-16,29,53,5,Default mode,Red,dmn,roi_99_dmn,red
101 | 100,-35,20,51,5,Default mode,Red,dmn,roi_100_dmn,red
102 | 101,22,39,39,5,Default mode,Red,dmn,roi_101_dmn,red
103 | 102,13,55,38,5,Default mode,Red,dmn,roi_102_dmn,red
104 | 103,-10,55,39,5,Default mode,Red,dmn,roi_103_dmn,red
105 | 104,-20,45,39,5,Default mode,Red,dmn,roi_104_dmn,red
106 | 105,6,54,16,5,Default mode,Red,dmn,roi_105_dmn,red
107 | 106,6,64,22,5,Default mode,Red,dmn,roi_106_dmn,red
108 | 107,-7,51,-1,5,Default mode,Red,dmn,roi_107_dmn,red
109 | 108,9,54,3,5,Default mode,Red,dmn,roi_108_dmn,red
110 | 109,-3,44,-9,5,Default mode,Red,dmn,roi_109_dmn,red
111 | 110,8,42,-5,5,Default mode,Red,dmn,roi_110_dmn,red
112 | 111,-11,45,8,5,Default mode,Red,dmn,roi_111_dmn,red
113 | 112,-2,38,36,5,Default mode,Red,dmn,roi_112_dmn,red
114 | 113,-3,42,16,5,Default mode,Red,dmn,roi_113_dmn,red
115 | 114,-20,64,19,5,Default mode,Red,dmn,roi_114_dmn,red
116 | 115,-8,48,23,5,Default mode,Red,dmn,roi_115_dmn,red
117 | 116,65,-12,-19,5,Default mode,Red,dmn,roi_116_dmn,red
118 | 117,-56,-13,-10,5,Default mode,Red,dmn,roi_117_dmn,red
119 | 118,-58,-30,-4,5,Default mode,Red,dmn,roi_118_dmn,red
120 | 119,65,-31,-9,5,Default mode,Red,dmn,roi_119_dmn,red
121 | 120,-68,-41,-5,5,Default mode,Red,dmn,roi_120_dmn,red
122 | 121,13,30,59,5,Default mode,Red,dmn,roi_121_dmn,red
123 | 122,12,36,20,5,Default mode,Red,dmn,roi_122_dmn,red
124 | 123,52,-2,-16,5,Default mode,Red,dmn,roi_123_dmn,red
125 | 124,-26,-40,-8,5,Default mode,Red,dmn,roi_124_dmn,red
126 | 125,27,-37,-13,5,Default mode,Red,dmn,roi_125_dmn,red
127 | 126,-34,-38,-16,5,Default mode,Red,dmn,roi_126_dmn,red
128 | 127,28,-77,-32,5,Default mode,Red,dmn,roi_127_dmn,red
129 | 128,52,7,-30,5,Default mode,Red,dmn,roi_128_dmn,red
130 | 129,-53,3,-27,5,Default mode,Red,dmn,roi_129_dmn,red
131 | 130,47,-50,29,5,Default mode,Red,dmn,roi_130_dmn,red
132 | 131,-49,-42,1,5,Default mode,Red,dmn,roi_131_dmn,red
133 | 132,-31,19,-19,-1,Uncertain,White,uncertain,roi_132_uncertain,white
134 | 133,-2,-35,31,6,Memory retrieval?,Gray,mem_retr,roi_133_mem_retr,gray
135 | 134,-7,-71,42,6,Memory retrieval?,Gray,mem_retr,roi_134_mem_retr,gray
136 | 135,11,-66,42,6,Memory retrieval?,Gray,mem_retr,roi_135_mem_retr,gray
137 | 136,4,-48,51,6,Memory retrieval?,Gray,mem_retr,roi_136_mem_retr,gray
138 | 137,-46,31,-13,5,Default mode,Red,dmn,roi_137_dmn,red
139 | 138,-10,11,67,11,Ventral attention,Teal,ventral_attention,roi_138_ventral_attention,olivedrab
140 | 139,49,35,-12,5,Default mode,Red,dmn,roi_139_dmn,red
141 | 140,8,-91,-7,-1,Uncertain,White,uncertain,roi_140_uncertain,white
142 | 141,17,-91,-14,-1,Uncertain,White,uncertain,roi_141_uncertain,white
143 | 142,-12,-95,-13,-1,Uncertain,White,uncertain,roi_142_uncertain,white
144 | 143,18,-47,-10,7,Visual,Blue,visual,roi_143_visual,blue
145 | 144,40,-72,14,7,Visual,Blue,visual,roi_144_visual,blue
146 | 145,8,-72,11,7,Visual,Blue,visual,roi_145_visual,blue
147 | 146,-8,-81,7,7,Visual,Blue,visual,roi_146_visual,blue
148 | 147,-28,-79,19,7,Visual,Blue,visual,roi_147_visual,blue
149 | 148,20,-66,2,7,Visual,Blue,visual,roi_148_visual,blue
150 | 149,-24,-91,19,7,Visual,Blue,visual,roi_149_visual,blue
151 | 150,27,-59,-9,7,Visual,Blue,visual,roi_150_visual,blue
152 | 151,-15,-72,-8,7,Visual,Blue,visual,roi_151_visual,blue
153 | 152,-18,-68,5,7,Visual,Blue,visual,roi_152_visual,blue
154 | 153,43,-78,-12,7,Visual,Blue,visual,roi_153_visual,blue
155 | 154,-47,-76,-10,7,Visual,Blue,visual,roi_154_visual,blue
156 | 155,-14,-91,31,7,Visual,Blue,visual,roi_155_visual,blue
157 | 156,15,-87,37,7,Visual,Blue,visual,roi_156_visual,blue
158 | 157,29,-77,25,7,Visual,Blue,visual,roi_157_visual,blue
159 | 158,20,-86,-2,7,Visual,Blue,visual,roi_158_visual,blue
160 | 159,15,-77,31,7,Visual,Blue,visual,roi_159_visual,blue
161 | 160,-16,-52,-1,7,Visual,Blue,visual,roi_160_visual,blue
162 | 161,42,-66,-8,7,Visual,Blue,visual,roi_161_visual,blue
163 | 162,24,-87,24,7,Visual,Blue,visual,roi_162_visual,blue
164 | 163,6,-72,24,7,Visual,Blue,visual,roi_163_visual,blue
165 | 164,-42,-74,0,7,Visual,Blue,visual,roi_164_visual,blue
166 | 165,26,-79,-16,7,Visual,Blue,visual,roi_165_visual,blue
167 | 166,-16,-77,34,7,Visual,Blue,visual,roi_166_visual,blue
168 | 167,-3,-81,21,7,Visual,Blue,visual,roi_167_visual,blue
169 | 168,-40,-88,-6,7,Visual,Blue,visual,roi_168_visual,blue
170 | 169,37,-84,13,7,Visual,Blue,visual,roi_169_visual,blue
171 | 170,6,-81,6,7,Visual,Blue,visual,roi_170_visual,blue
172 | 171,-26,-90,3,7,Visual,Blue,visual,roi_171_visual,blue
173 | 172,-33,-79,-13,7,Visual,Blue,visual,roi_172_visual,blue
174 | 173,37,-81,1,7,Visual,Blue,visual,roi_173_visual,blue
175 | 174,-44,2,46,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_174_fronto_parietal_task_control,yellow
176 | 175,48,25,27,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_175_fronto_parietal_task_control,yellow
177 | 176,-47,11,23,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_176_fronto_parietal_task_control,yellow
178 | 177,-53,-49,43,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_177_fronto_parietal_task_control,yellow
179 | 178,-23,11,64,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_178_fronto_parietal_task_control,yellow
180 | 179,58,-53,-14,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_179_fronto_parietal_task_control,yellow
181 | 180,24,45,-15,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_180_fronto_parietal_task_control,yellow
182 | 181,34,54,-13,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_181_fronto_parietal_task_control,yellow
183 | 182,-21,41,-20,-1,Uncertain,White,uncertain,roi_182_uncertain,white
184 | 183,-18,-76,-24,-1,Uncertain,White,uncertain,roi_183_uncertain,white
185 | 184,17,-80,-34,-1,Uncertain,White,uncertain,roi_184_uncertain,white
186 | 185,35,-67,-34,-1,Uncertain,White,uncertain,roi_185_uncertain,white
187 | 186,47,10,33,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_186_fronto_parietal_task_control,yellow
188 | 187,-41,6,33,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_187_fronto_parietal_task_control,yellow
189 | 188,-42,38,21,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_188_fronto_parietal_task_control,yellow
190 | 189,38,43,15,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_189_fronto_parietal_task_control,yellow
191 | 190,49,-42,45,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_190_fronto_parietal_task_control,yellow
192 | 191,-28,-58,48,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_191_fronto_parietal_task_control,yellow
193 | 192,44,-53,47,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_192_fronto_parietal_task_control,yellow
194 | 193,32,14,56,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_193_fronto_parietal_task_control,yellow
195 | 194,37,-65,40,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_194_fronto_parietal_task_control,yellow
196 | 195,-42,-55,45,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_195_fronto_parietal_task_control,yellow
197 | 196,40,18,40,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_196_fronto_parietal_task_control,yellow
198 | 197,-34,55,4,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_197_fronto_parietal_task_control,yellow
199 | 198,-42,45,-2,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_198_fronto_parietal_task_control,yellow
200 | 199,33,-53,44,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_199_fronto_parietal_task_control,yellow
201 | 200,43,49,-2,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_200_fronto_parietal_task_control,yellow
202 | 201,-42,25,30,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_201_fronto_parietal_task_control,yellow
203 | 202,-3,26,44,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_202_fronto_parietal_task_control,yellow
204 | 203,11,-39,50,9,Salience,Black,salience,roi_203_salience,black
205 | 204,55,-45,37,9,Salience,Black,salience,roi_204_salience,black
206 | 205,42,0,47,9,Salience,Black,salience,roi_205_salience,black
207 | 206,31,33,26,9,Salience,Black,salience,roi_206_salience,black
208 | 207,48,22,10,9,Salience,Black,salience,roi_207_salience,black
209 | 208,-35,20,0,9,Salience,Black,salience,roi_208_salience,black
210 | 209,36,22,3,9,Salience,Black,salience,roi_209_salience,black
211 | 210,37,32,-2,9,Salience,Black,salience,roi_210_salience,black
212 | 211,34,16,-8,9,Salience,Black,salience,roi_211_salience,black
213 | 212,-11,26,25,9,Salience,Black,salience,roi_212_salience,black
214 | 213,-1,15,44,9,Salience,Black,salience,roi_213_salience,black
215 | 214,-28,52,21,9,Salience,Black,salience,roi_214_salience,black
216 | 215,0,30,27,9,Salience,Black,salience,roi_215_salience,black
217 | 216,5,23,37,9,Salience,Black,salience,roi_216_salience,black
218 | 217,10,22,27,9,Salience,Black,salience,roi_217_salience,black
219 | 218,31,56,14,9,Salience,Black,salience,roi_218_salience,black
220 | 219,26,50,27,9,Salience,Black,salience,roi_219_salience,black
221 | 220,-39,51,17,9,Salience,Black,salience,roi_220_salience,black
222 | 221,2,-24,30,6,Memory retrieval?,Gray,mem_retr,roi_221_mem_retr,gray
223 | 222,6,-24,0,10,Subcortical,Brown,subcortical,roi_222_subcortical,brown
224 | 223,-2,-13,12,10,Subcortical,Brown,subcortical,roi_223_subcortical,brown
225 | 224,-10,-18,7,10,Subcortical,Brown,subcortical,roi_224_subcortical,brown
226 | 225,12,-17,8,10,Subcortical,Brown,subcortical,roi_225_subcortical,brown
227 | 226,-5,-28,-4,10,Subcortical,Brown,subcortical,roi_226_subcortical,brown
228 | 227,-22,7,-5,10,Subcortical,Brown,subcortical,roi_227_subcortical,brown
229 | 228,-15,4,8,10,Subcortical,Brown,subcortical,roi_228_subcortical,brown
230 | 229,31,-14,2,10,Subcortical,Brown,subcortical,roi_229_subcortical,brown
231 | 230,23,10,1,10,Subcortical,Brown,subcortical,roi_230_subcortical,brown
232 | 231,29,1,4,10,Subcortical,Brown,subcortical,roi_231_subcortical,brown
233 | 232,-31,-11,0,10,Subcortical,Brown,subcortical,roi_232_subcortical,brown
234 | 233,15,5,7,10,Subcortical,Brown,subcortical,roi_233_subcortical,brown
235 | 234,9,-4,6,10,Subcortical,Brown,subcortical,roi_234_subcortical,brown
236 | 235,54,-43,22,11,Ventral attention,Teal,ventral_attention,roi_235_ventral_attention,olivedrab
237 | 236,-56,-50,10,11,Ventral attention,Teal,ventral_attention,roi_236_ventral_attention,olivedrab
238 | 237,-55,-40,14,11,Ventral attention,Teal,ventral_attention,roi_237_ventral_attention,olivedrab
239 | 238,52,-33,8,11,Ventral attention,Teal,ventral_attention,roi_238_ventral_attention,olivedrab
240 | 239,51,-29,-4,11,Ventral attention,Teal,ventral_attention,roi_239_ventral_attention,olivedrab
241 | 240,56,-46,11,11,Ventral attention,Teal,ventral_attention,roi_240_ventral_attention,olivedrab
242 | 241,53,33,1,11,Ventral attention,Teal,ventral_attention,roi_241_ventral_attention,olivedrab
243 | 242,-49,25,-1,11,Ventral attention,Teal,ventral_attention,roi_242_ventral_attention,olivedrab
244 | 243,-16,-65,-20,13,Cerebellar,Pale blue,cerebellar,roi_243_cerebellar,lightslateblue
245 | 244,-32,-55,-25,13,Cerebellar,Pale blue,cerebellar,roi_244_cerebellar,lightslateblue
246 | 245,22,-58,-23,13,Cerebellar,Pale blue,cerebellar,roi_245_cerebellar,lightslateblue
247 | 246,1,-62,-18,13,Cerebellar,Pale blue,cerebellar,roi_246_cerebellar,lightslateblue
248 | 247,33,-12,-34,-1,Uncertain,White,uncertain,roi_247_uncertain,white
249 | 248,-31,-10,-36,-1,Uncertain,White,uncertain,roi_248_uncertain,white
250 | 249,49,-3,-38,-1,Uncertain,White,uncertain,roi_249_uncertain,white
251 | 250,-50,-7,-39,-1,Uncertain,White,uncertain,roi_250_uncertain,white
252 | 251,10,-62,61,12,Dorsal attention,Green,dorsal_attention,roi_251_dorsal_attention,green
253 | 252,-52,-63,5,12,Dorsal attention,Green,dorsal_attention,roi_252_dorsal_attention,green
254 | 253,-47,-51,-21,-1,Uncertain,White,uncertain,roi_253_uncertain,white
255 | 254,46,-47,-17,-1,Uncertain,White,uncertain,roi_254_uncertain,white
256 | 255,47,-30,49,1,Sensory/somatomotor Hand,Cyan,hand,roi_255_hand,cyan
257 | 256,22,-65,48,12,Dorsal attention,Green,dorsal_attention,roi_256_dorsal_attention,green
258 | 257,46,-59,4,12,Dorsal attention,Green,dorsal_attention,roi_257_dorsal_attention,green
259 | 258,25,-58,60,12,Dorsal attention,Green,dorsal_attention,roi_258_dorsal_attention,green
260 | 259,-33,-46,47,12,Dorsal attention,Green,dorsal_attention,roi_259_dorsal_attention,green
261 | 260,-27,-71,37,12,Dorsal attention,Green,dorsal_attention,roi_260_dorsal_attention,green
262 | 261,-32,-1,54,12,Dorsal attention,Green,dorsal_attention,roi_261_dorsal_attention,green
263 | 262,-42,-60,-9,12,Dorsal attention,Green,dorsal_attention,roi_262_dorsal_attention,green
264 | 263,-17,-59,64,12,Dorsal attention,Green,dorsal_attention,roi_263_dorsal_attention,green
265 | 264,29,-5,54,12,Dorsal attention,Green,dorsal_attention,roi_264_dorsal_attention,green
266 |
--------------------------------------------------------------------------------
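
The atlas file above maps each of the 264 Power-atlas ROIs (the node IDs used in the fMRI layers) to MNI coordinates and a suggested functional system. A short sketch, assuming the quick-test embedding exists and its node IDs coincide with the atlas `ROI` column, that attaches system labels to the embedded nodes:

```python
import pandas as pd

atlas = pd.read_csv("data/power_atlas_info.csv")
ftrs = pd.read_csv("results/test/r0.25/mltn2v_results.csv",
                   header=None, index_col=0)

# Node IDs in the embedding index correspond to the atlas ROI numbers.
labeled = ftrs.join(atlas.set_index("ROI")["SuggestedSystem"])
print(labeled.groupby("SuggestedSystem").size())  # embedded nodes per system
```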
/mn2vec_toy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/mn2vec_toy.png
--------------------------------------------------------------------------------
/multi_node2vec.py:
--------------------------------------------------------------------------------
1 | '''
2 | Wrapper for the multi-node2vec algorithm.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | '''
19 | import os
20 | import src as mltn2v
21 | import argparse
22 | import time
23 |
24 |
25 | def parse_args():
26 | parser = argparse.ArgumentParser(description="Run multi-node2vec on multilayer networks.")
27 |
28 | parser.add_argument('--dir', nargs='?', default='data/CONTROL_fmt',
29 | help='Absolute path to directory of correlation/adjacency matrix files (csv format). Note that rows and columns must be properly labeled by node ID in each .csv.')
30 |
31 | parser.add_argument('--output', nargs='?', default='new_results/',
32 | help='Absolute path to output directory (no extension).')
33 |
34 | #parser.add_argument('--filename', nargs='?', default='new_results/mltn2v_control',
35 | # help='output filename (no extension).')
36 |
37 | parser.add_argument('--d', type=int, default=100,
38 | help='Dimensionality. Default is 100.')
39 |
40 | parser.add_argument('--walk_length', type=int, default=100,
41 | help='Length of each random walk. Default is 100.')
42 |
43 | parser.add_argument('--window_size', type=int, default = 10,
44 | help='Size of context window used for Skip Gram optimization. Default is 10.')
45 |
46 | parser.add_argument('--n_samples', type=int, default=1,
47 | help='Number of walks per node per layer. Default is 1.')
48 |
49 | parser.add_argument('--thresh', type=float, default=0.5,
50 | help='Threshold for converting a weighted network to an unweighted one. All weights less than or equal to thresh will be considered 0 and all others 1. Default is 0.5. Use None if the network is unweighted.')
51 |
52 | # parser.add_argument('--w2v_iter', default=1, type=int,
53 | # help='Number of epochs in word2vec')
54 |
55 | parser.add_argument('--w2v_workers', type=int, default=8,
56 | help='Number of parallel worker threads. Default is 8.')
57 |
58 | parser.add_argument('--rvals', type=float, default=0.25,
59 | help='Layer walk parameter for neighborhood search. Default is 0.25')
60 |
61 | parser.add_argument('--pvals', type=float, default=1,
62 | help='Return walk parameter for neighborhood search. Default is 1')
63 |
64 | parser.add_argument('--qvals', type=float, default=0.5,
65 | help='Exploration walk parameter for neighborhood search. Default is 0.50')
66 |
67 |
68 | return parser.parse_args()
69 |
70 |
71 | def main(args):
72 | start = time.time()
73 | # PARSE LAYERS -- THRESHOLD & CONVERT TO BINARY
74 | layers = mltn2v.timed_invoke("parsing network layers",
75 | lambda: mltn2v.parse_matrix_layers(args.dir, binary=True, thresh=args.thresh))
76 | # check if layers were parsed
77 | if layers:
78 | # EXTRACT NEIGHBORHOODS
79 | nbrhd_dict = mltn2v.timed_invoke("extracting neighborhoods",
80 | lambda: mltn2v.extract_neighborhoods_walk(layers, args.walk_length, args.rvals, args.pvals, args.qvals))
81 | # GENERATE FEATURES
82 | out = mltn2v.clean_output(args.output)
83 | for w in args.rvals:
84 | out_path = os.path.join(out, 'r' + str(w) + '/mltn2v_results')
85 | mltn2v.timed_invoke("generating features",
86 | lambda: mltn2v.generate_features(nbrhd_dict[w], args.d, out_path, nbrhd_size=args.window_size,
87 | w2v_iter=1, workers=args.w2v_workers))
88 |
89 | print("\nCompleted Multilayer Network Embedding for r=" + str(w) + " in {:.2f} secs.\nSee results:".format(time.time() - start))
90 | print("\t" + out_path + ".csv")
91 | print("Completed Multilayer Network Embedding for all r values.")
92 | else:
93 | print("Whoops!")
94 |
95 |
96 | if __name__ == '__main__':
97 | args = parse_args()
98 |     args.rvals = [args.rvals]  # downstream code expects a list of r values
99 | main(args)
100 |
--------------------------------------------------------------------------------
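
The wrapper's main() chains three steps: parse the layers, generate multilayer walks, and fit the Skip-Gram model. The same pipeline can be driven from a Python session; a minimal sketch mirroring the README quick-test example (paths and parameter values are copied from that example):

```python
import os
import src as mltn2v

# 1) Threshold each adjacency matrix and convert it to a binary edge list.
layers = mltn2v.parse_matrix_layers("data/test", binary=True, thresh=0.5)

# 2) Run the second-order multilayer walks; the function expects a list of r values.
nbrhd_dict = mltn2v.extract_neighborhoods_walk(layers, 100, [0.25], 1.0, 0.5)

# 3) Fit word2vec on the BagofNodes for each r and write the embedding.
out = mltn2v.clean_output("results/test")
for r in [0.25]:
    out_path = os.path.join(out, "r" + str(r) + "/mltn2v_results")
    ftrs = mltn2v.generate_features(nbrhd_dict[r], 100, out_path,
                                    nbrhd_size=2, w2v_iter=1, workers=8)
    print(ftrs.shape)
```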
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.12.1
2 | pandas==0.24.0
3 | gensim==2.3.0
4 | networkx==2.5.1
5 |
--------------------------------------------------------------------------------
/results/test/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/results/test/.DS_Store
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
1 | from .multinode2vec import *
2 | from .mltn2v_utils import *
--------------------------------------------------------------------------------
/src/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/mltn2v_utils.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/mltn2v_utils.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/multinode2vec.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/multinode2vec.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/nbrhd_gen_walk_nx.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/nbrhd_gen_walk_nx.cpython-36.pyc
--------------------------------------------------------------------------------
/src/mltn2v_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Helper functions for parsing multilayer graphs and layers.
3 |
4 | Details of multi-node2vec can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | """
19 |
20 | import os
21 | import pandas as pd
22 | from pandas.api.types import is_numeric_dtype
23 | import time
24 |
25 |
26 | # -------------------------------------------------------------------------------
27 | # PARSING AND CONVERSION FOR MULTILAYER GRAPHS
28 | # -------------------------------------------------------------------------------
29 | def parse_matrix_layers(network_dir, delim=',', binary=False, thresh=None):
30 | """
31 | Converts directory of adjacency matrix files into pandas dataframes.
32 | :param network_dir: Directory of adjacency matrix files
33 | :param delim: separator for adjacency matrix
34 | :param binary: boolean of whether or not to convert edge weights to binary
35 |     :param thresh: threshold for edge weights; weights <= thresh are set to 0
36 |     :return: List of adjacency lists. Each adjacency list is one layer, represented
37 |              as a pandas DataFrame with 'source', 'target', 'weight' columns.
38 | """
39 | # expand directory path
40 | network_dir = expand_path(network_dir)
41 |
42 | # iterate files and convert to pandas dataframes
43 | layers = []
44 |     for network_file in sorted(f for f in os.listdir(network_dir) if f.endswith('.csv')):  # ignore non-csv files (e.g. .DS_Store)
45 | file_path = os.path.join(network_dir, network_file)
46 | try:
47 | # read as pandas DataFrame, index=source, col=target
48 | layer = pd.read_csv(file_path, index_col=0)
49 | if layer.shape[0] != layer.shape[1]:
50 | print('[ERROR] Invalid adjacency matrix. Expecting matrix with index as source and column as target.')
51 | return
52 | if thresh is not None:
53 | layer[layer <= thresh] = 0
54 | if binary:
55 | layer[layer != 0] = 1
56 | # ensure that index (node name) is string, since word2vec will need it as str
57 | if is_numeric_dtype(layer.index):
58 | layer.index = layer.index.map(str)
59 | # replace all 0s with NaN
60 | layer.replace(to_replace=0, value=pd.np.nan, inplace=True)
61 | # convert matrix --> adjacency list with cols ["source", "target", "weight"]
62 | layer = layer.stack(dropna=True).reset_index()
63 | # rename columns
64 | layer.columns = ["source", "target", "weight"]
65 | layers.append(layer)
66 | except Exception as e:
67 | print('[ERROR] Could not read file "{}": {} '.format(file_path, e))
68 | return layers
69 |
70 |
71 | def expand_path(path):
72 | """
73 | Expands a file path to handle user and environmental variables.
74 | :param path: path to expand
75 | :return: expanded path
76 | """
77 | new_path = os.path.expanduser(path)
78 | return os.path.expandvars(new_path)
79 |
80 |
81 | # -------------------------------------------------------------------------------
82 | # OUTPUT
83 | # -------------------------------------------------------------------------------
84 | def feature_matrix_to_csv(ftrs, filename):
85 | """
86 |     Write the feature matrix to a csv file.
87 |     :param ftrs: pandas DataFrame of features, indexed by node ID
88 |     :param filename: absolute path to output file (no extension);
89 |         the ".csv" extension is appended
90 | :return:
91 | """
92 | out = filename + ".csv"
93 | ftrs.to_csv(out, sep=',', header=False)
94 | return
95 |
96 |
97 | def timed_invoke(action_desc, method):
98 | """
99 | Invokes a method with timing.
100 | :param action_desc: The string describing the method action
101 | :param method: The method to invoke
102 | :return: The return object of the method
103 | """
104 | print('Started {}...'.format(action_desc))
105 | start = time.time()
106 | try:
107 | output = method()
108 | print('Finished {} in {} seconds'.format(action_desc, int(time.time() - start)))
109 | return output
110 | except Exception:
111 | print('Exception while {} after {} seconds'.format(action_desc, int(time.time() - start)))
112 | raise
113 |
114 |
115 | def clean_output(directory):
116 | """
117 |     Returns the expanded output directory path, creating the directory if it does not exist.
118 | """
119 | directory = expand_path(directory)
120 | if os.path.isdir(directory):
121 | return directory
122 | else:
123 | os.makedirs(directory)
124 | print("[WARNING] Directory not found. Created {}".format(directory))
125 | return directory
126 |
--------------------------------------------------------------------------------
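
parse_matrix_layers is easiest to see on a toy input. The sketch below builds a single 3-node weighted adjacency matrix, writes it to a temporary directory, and parses it; the node labels and weights are made up for illustration:

```python
import os
import tempfile
import pandas as pd
from src.mltn2v_utils import parse_matrix_layers

# A 3-node weighted adjacency matrix, rows and columns labeled by node ID.
adj = pd.DataFrame([[0.0, 0.9, 0.2],
                    [0.9, 0.0, 0.7],
                    [0.2, 0.7, 0.0]],
                   index=[1, 2, 3], columns=[1, 2, 3])
tmp = tempfile.mkdtemp()
adj.to_csv(os.path.join(tmp, "layer0.csv"))

# thresh=0.5 zeroes out weights <= 0.5; binary=True maps the survivors to 1.
layers = parse_matrix_layers(tmp, binary=True, thresh=0.5)
print(layers[0])  # edge list with source/target/weight; only the 1-2 and 2-3 edges survive
```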
/src/multinode2vec.py:
--------------------------------------------------------------------------------
1 | """
2 | Core functions of the multi-node2vec algorithm.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | """
19 | from gensim.models import word2vec as w2v
20 | from .mltn2v_utils import *
21 | from .nbrhd_gen_walk_nx import *
22 | import time
23 | import networkx as nx
24 |
25 |
26 | # -------------------------------------------------------------------------------
27 | # multinode2vec
28 | # -------------------------------------------------------------------------------
29 | def generate_features(nbrhds, d, out, nbrhd_size=-1, w2v_iter=1, workers=8, sg=1):
30 | """
31 | Generates d features for each unique node in a multilayer network based on
32 | its neighborhood.
33 |
34 | :param G_m: multilayer graph
35 | :param d: feature dimensionality
36 | :param out: absolute path for output file (no extension, file type)
37 | :param nbrhd_size: window size for Skip-Gram optimization
38 | :param n_samples: number of generated neighborhoods per node
39 | :param w2v_iter: number of word2vec training epochs
40 | :param workers: number of workers
41 | :param sg: sets word2vec architecture. 1 for Skip-Gram, 0 for CBOW
42 | :return: n x d network embedding
43 | """
44 | print("Total Neighborhoods: {}".format(len(nbrhds)))
45 | w2v_model = w2v.Word2Vec(nbrhds, size=d, window=nbrhd_size, min_count=0,
46 | workers=workers, iter=w2v_iter, sg=sg)
47 | embfile = out + ".emb"
48 | splitpath = embfile.split('/')
49 | if len(splitpath) > 1:
50 | dirs = embfile[:-len(splitpath[-1])]
51 | if not os.path.exists(dirs):
52 | os.makedirs(dirs)
53 | if not os.path.exists(embfile):
54 | with open(embfile, 'w'): pass
55 | w2v_model.wv.save_word2vec_format(embfile)
56 | ftrs = emb_to_pandas(embfile)
57 | feature_matrix_to_csv(ftrs, out)
58 | return ftrs
59 |
60 |
61 | # -------------------------------------------------------------------------------
62 | # NEIGHBORHOODS
63 | # -------------------------------------------------------------------------------
64 | def extract_neighborhoods_walk(layers, nbrhd_size, wvals, p, q, is_directed=False, weighted=False):
65 | nxg = []
66 | for layer in layers:
67 |         nxg.append(nx.convert_matrix.from_pandas_edgelist(layer, edge_attr='weight'))
68 |
69 | start = time.time()
70 | nbrhd_gen = NeighborhoodGen(nxg, p, q, is_directed=is_directed, weighted=weighted)
71 | print("Finished initialization of neighborhood generator in " + str(time.time() - start) + " seconds.")
72 |
73 | neighborhood_dict = {}
74 | for w in wvals:
75 | neighborhoods = []
76 | for i in range(len(nxg)):
77 | layer = nxg[i]
78 | for node in layer.nodes():
79 |                 for j in range(52):  # 52 walks per node per layer (hardcoded sample count)
80 | neighborhoods.append(nbrhd_gen.multinode2vec_walk(w, nbrhd_size, node, i))
81 | print("Finished nbrhd generation for r=" + str(w))
82 | neighborhood_dict[w] = neighborhoods
83 |
84 | return neighborhood_dict
85 |
86 | def extract_neighborhoods(layers, nbrhd_size, n_samples, weighted=False):
87 | """
88 | Extracts neighborhoods of length, nbrhd_size, for each node in each layer.
89 | :param layers: list of adjacency lists as pandas DataFrames with columns ["source", "target", "weight"]
90 | :param nbrhd_size: number of nodes per neighborhood
91 | :param n_samples: number of samples per node
92 | :param weighted: whether to select neighborhoods by highest weight
93 | :return: list of neighborhoods, represented as lists
94 | """
95 | neighborhoods = []
96 | if weighted:
97 | for layer in layers:
98 | for node in layer["source"].unique():
99 |                 neighbors = layer.loc[layer["source"] == node, ["target", "weight"]]
100 |                 neighbors = neighbors.sort_values(by="weight", ascending=False)["target"]
101 | neighborhoods.extend(
102 | extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples)
103 | )
104 | else:
105 | for layer in layers:
106 | for node in layer["source"].unique():
107 | neighbors = layer.loc[layer["source"] == node, "target"]
108 | neighborhoods.extend(
109 | extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples)
110 | )
111 | return neighborhoods
112 |
113 |
114 | def extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples):
115 | if len(neighbors) < nbrhd_size:
116 | print("[WARNING] Selected neighborhood size {} > node-{}'s degree {}. "
117 | "Setting neighborhood size to {} for node-{}."
118 | .format(nbrhd_size, node, len(neighbors), len(neighbors), node))
119 | nbrhd_size = len(neighbors)
120 | node_neighborhoods = []
121 | n = 0
122 | while n < n_samples:
123 | nbrhd = [node]
124 | nbrhd.extend(neighbors.sample(n=nbrhd_size-1).values)
125 | node_neighborhoods.append(nbrhd)
126 | n += 1
127 | return node_neighborhoods
128 |
129 |
130 | # -------------------------------------------------------------------------------
131 | # HELPERS
132 | # -------------------------------------------------------------------------------
133 | def emb_to_pandas(emb_file):
134 | """
135 |     Converts an embedding file, as written by a trained word2vec model, to a pandas DataFrame.
136 | 
137 |     :param emb_file: absolute path to word2vec embedding file
138 |     :return: pandas DataFrame, (N x d), sorted by node index
139 | """
140 | ftrs = pd.read_csv(emb_file, delim_whitespace=True, skiprows=1, header=None, index_col=0)
141 | ftrs.sort_index(inplace=True)
142 | return ftrs
143 |
--------------------------------------------------------------------------------
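
extract_neighborhoods_walk is a thin adapter: it converts each edge-list layer into a NetworkX graph and delegates the walking to NeighborhoodGen (defined in the next file). A toy sketch of that generator on two hand-built layers; the node names and unit weights are made up for illustration:

```python
import networkx as nx
from src.nbrhd_gen_walk_nx import NeighborhoodGen

# Two toy layers over a shared node set; the alias tables read each
# edge's 'weight' attribute, so every edge needs one.
g1 = nx.Graph(); g1.add_weighted_edges_from([("a", "b", 1), ("b", "c", 1)])
g2 = nx.Graph(); g2.add_weighted_edges_from([("a", "c", 1), ("c", "b", 1)])

gen = NeighborhoodGen([g1, g2], p=1.0, q=0.5)

# One neighborhood of length 10 starting from node "a" in layer 0;
# w=0.25 is the per-step probability of hopping to another layer.
walk = gen.multinode2vec_walk(0.25, 10, "a", 0)
print(walk)
```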
/src/nbrhd_gen_walk.py:
--------------------------------------------------------------------------------
1 | '''
2 | Neighborhood aliasing procedure used for fast random walks on multilayer networks.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | '''
19 |
20 |
21 | import numpy as np
22 | import networkx as nx
23 | import random
24 | #import multiprocessing
25 | import threading
26 | import time
27 |
28 | #is is_directed needed?
29 |
30 | class NeighborhoodGen():
31 | def __init__(self, graph, p, q, thread_limit=1, is_directed=False, weighted=False):
32 | self.G = graph
33 | self.is_directed = is_directed
34 | self.p = p
35 | self.q = q
36 | self.weighted = weighted
37 | self.thread_limit = thread_limit
38 |
39 | self.preprocess_transition_probs()
40 |
41 | def multinode2vec_walk(self, w, walk_length, start_node, start_layer_id):
42 | '''
43 | Simulate a random walk starting from start node. (Generate one neighborhood)
44 | '''
45 |
46 | G = self.G
47 | alias_nodes = self.alias_nodes
48 | alias_edges = self.alias_edges
49 |
50 | walk = [start_node] #nbrhd
51 | cur_layer_id = start_layer_id
52 | force_switch = False
53 | while len(walk) < walk_length:
54 | cur = walk[-1]
55 | if not force_switch:
56 | prev_layer_id = cur_layer_id
57 |             # layer-switch draw (re-seeding the RNG here every step would freeze this value)
58 |             rval = random.random()
59 | if rval < w or force_switch: #then switch layer
60 | total_layers = len(G)
61 | rlay = random.randint(0, total_layers - 2)
62 | if rlay >= cur_layer_id:
63 | rlay += 1
64 | cur_layer_id = rlay
65 | force_switch = False
66 | cur_layer = G[cur_layer_id]
67 | try:
68 | cur_nbrs = sorted(cur_layer.neighbors(cur))
69 | if len(cur_nbrs) > 0:
70 | if len(walk) == 1 or prev_layer_id != cur_layer_id:
71 | walk.append(cur_nbrs[alias_draw(alias_nodes[cur_layer_id][cur][0], alias_nodes[cur_layer_id][cur][1])])
72 | else:
73 | prev = walk[-2]
74 |                         nxt = cur_nbrs[alias_draw(alias_edges[cur_layer_id][(prev, cur)][0],
75 |                                                   alias_edges[cur_layer_id][(prev, cur)][1])]
76 |                         walk.append(nxt)
77 | else:
78 | force_switch = True
79 | continue
80 | except Exception as e:
81 | force_switch = True
82 | continue
83 |
84 | return walk
85 |
86 |     def simulate_walks(self, num_walks, walk_length, w=0.25):
87 |         '''
88 |         Repeatedly simulate random walks from each node in each layer; w is the layer-switch probability.
89 |         '''
90 |         G = self.G
91 |         walks = {}
92 |         for layer_id, layer in enumerate(G):
93 |             walks[layer_id] = []
94 |             nodes = list(layer.nodes())
95 |             print('Walk iteration:')
96 |             for walk_iter in range(num_walks):
97 |                 print(str(walk_iter+1), '/', str(num_walks))
98 |                 random.shuffle(nodes)
99 |                 for node in nodes:
100 |                     walks[layer_id].append(self.multinode2vec_walk(w, walk_length, node, layer_id))
101 | 
102 |         return walks
103 |
104 | def get_alias_edge(self, src, dst, layer):
105 | '''
106 | Get the alias edge setup lists for a given edge.
107 | '''
108 | p = self.p
109 | q = self.q
110 |
111 | unnormalized_probs = []
112 | for dst_nbr in sorted(layer.neighbors(dst)):
113 | if dst_nbr == src:
114 | unnormalized_probs.append(layer[dst][dst_nbr]['weight']/p)
115 | elif layer.has_edge(dst_nbr, src):
116 | unnormalized_probs.append(layer[dst][dst_nbr]['weight'])
117 | else:
118 | unnormalized_probs.append(layer[dst][dst_nbr]['weight']/q)
119 | norm_const = sum(unnormalized_probs)
120 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
121 |
122 | return alias_setup(normalized_probs)
123 |
124 | def preprocess_transition_probs(self):
125 | '''
126 | Preprocessing of transition probabilities for guiding the random walks.
127 | '''
128 | G = self.G
129 | is_directed = self.is_directed
130 |
131 | self.alias_nodes = {}
132 | self.alias_edges = {}
133 | self.lock = threading.Lock()
134 |
135 | tlimit = self.thread_limit
136 | layer_count = len(self.G)
137 | counter = 0
138 | if tlimit == 1:
139 | for i in range(layer_count):
140 | self.preprocess_thread(self.G[i],i)
141 | else:
142 | while counter < layer_count:
143 | threads = []
144 | rem = layer_count - counter
145 | if rem >= tlimit:
146 | for i in range(tlimit):
147 | thread = threading.Thread(target=self.preprocess_thread, args=(self.G[counter],counter,))
148 | threads.append(thread)
149 | thread.start()
150 | counter += 1
151 | else:
152 | for i in range(rem):
153 | thread = threading.Thread(target=self.preprocess_thread, args=(self.G[counter],counter,))
154 | threads.append(thread)
155 | thread.start()
156 | counter += 1
157 | for thread in threads:
158 | thread.join()
159 |
160 | return
161 |
162 | def preprocess_thread(self, layer, counter):
163 | start_time = time.time()
164 | print("Starting thread for layer " + str(counter))
165 | alias_nodes = {}
166 | for node in layer.nodes():
167 | unnormalized_probs = [layer[node][nbr]['weight'] for nbr in sorted(layer.neighbors(node))]
168 | norm_const = sum(unnormalized_probs)
169 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
170 | alias_nodes[node] = alias_setup(normalized_probs)
171 |
172 | alias_edges = {}
173 | triads = {}
174 |
175 | if self.is_directed:
176 | for edge in layer.edges():
177 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
178 | else:
179 | for edge in layer.edges():
180 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
181 | alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0], layer)
182 |
183 | self.lock.acquire()
184 | try:
185 | self.alias_nodes[counter] = alias_nodes
186 | self.alias_edges[counter] = alias_edges
187 | finally:
188 | self.lock.release()
189 |
190 | print("Finished thread for layer " + str(counter) + " in " + str(time.time() - start_time) + " seconds.")
191 |
192 | return
193 |
194 | def alias_setup(probs):
195 | '''
196 | Compute utility lists for non-uniform sampling from discrete distributions.
197 | Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/
198 | for details
199 | '''
200 | K = len(probs)
201 | q = np.zeros(K)
202 |     J = np.zeros(K, dtype=int)
203 |
204 | smaller = []
205 | larger = []
206 | for kk, prob in enumerate(probs):
207 | q[kk] = K*prob
208 | if q[kk] < 1.0:
209 | smaller.append(kk)
210 | else:
211 | larger.append(kk)
212 |
213 | while len(smaller) > 0 and len(larger) > 0:
214 | small = smaller.pop()
215 | large = larger.pop()
216 |
217 | J[small] = large
218 | q[large] = q[large] + q[small] - 1.0
219 | if q[large] < 1.0:
220 | smaller.append(large)
221 | else:
222 | larger.append(large)
223 |
224 | return J, q
225 |
226 | def alias_draw(J, q):
227 | '''
228 | Draw sample from a non-uniform discrete distribution using alias sampling.
229 | '''
230 | K = len(J)
231 |
232 | kk = int(np.floor(np.random.rand()*K))
233 | if np.random.rand() < q[kk]:
234 | return kk
235 | else:
236 | return J[kk]
237 |
--------------------------------------------------------------------------------
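
alias_setup and alias_draw above implement Walker's alias method: O(K) preprocessing of a discrete distribution, then O(1) per draw. A quick sanity check on a hypothetical three-outcome distribution:

```python
import numpy as np
from src.nbrhd_gen_walk import alias_setup, alias_draw

probs = [0.5, 0.3, 0.2]
J, q = alias_setup(probs)

# Empirical frequencies over many draws should approach probs.
np.random.seed(0)
draws = [alias_draw(J, q) for _ in range(100000)]
print(np.bincount(draws, minlength=len(probs)) / 100000.0)
```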
/src/nbrhd_gen_walk_nx.py:
--------------------------------------------------------------------------------
1 | '''
2 | Neighborhood aliasing procedure used for fast random walks on multilayer networks.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | '''
19 |
20 |
21 | import numpy as np
22 | import networkx as nx
23 | import random
24 | #import multiprocessing
25 | import threading
26 | import time
27 |
28 | #is is_directed needed?
29 |
30 | class NeighborhoodGen():
31 | def __init__(self, graph, p, q, thread_limit=1, is_directed=False, weighted=False):
32 | self.G = graph
33 | self.is_directed = is_directed
34 | self.p = p
35 | self.q = q
36 | self.weighted = weighted
37 | self.thread_limit = thread_limit
38 |
39 | self.preprocess_transition_probs()
40 |
41 | def multinode2vec_walk(self, w, walk_length, start_node, start_layer_id):
42 |         '''
43 |         Simulate one random walk (a sampled neighborhood) of length walk_length from start_node in layer start_layer_id.
44 |         With probability w, or whenever the current node has no neighbors in the current layer, the walk jumps to a uniformly chosen different layer.
45 |         '''
46 | G = self.G
47 | alias_nodes = self.alias_nodes
48 | alias_edges = self.alias_edges
49 |
50 | walk = [start_node] #nbrhd
51 | cur_layer_id = start_layer_id
52 | force_switch = False
53 | while len(walk) < walk_length:
54 | cur = walk[-1]
55 |             if not force_switch:
56 |                 prev_layer_id = cur_layer_id
57 |             rval = random.random()  # uniform draw deciding whether to switch layers
58 |             if rval < w or force_switch:  # switch with probability w, or when forced
59 |                 total_layers = len(G)
60 |                 # choose uniformly among the other layers (assumes at least two layers)
61 |                 rlay = random.randint(0, total_layers - 2)
62 |                 if rlay >= cur_layer_id:
63 |                     rlay += 1
64 |                 cur_layer_id = rlay
65 |                 force_switch = False
66 | cur_layer = G[cur_layer_id]
67 | try:
68 | cur_nbrs = sorted(cur_layer.neighbors(cur))
69 | if len(cur_nbrs) > 0:
70 | if len(walk) == 1 or prev_layer_id != cur_layer_id:
71 | walk.append(cur_nbrs[alias_draw(alias_nodes[cur_layer_id][cur][0], alias_nodes[cur_layer_id][cur][1])])
72 | else:
73 |                         prev = walk[-2]
74 |                         next_node = cur_nbrs[alias_draw(alias_edges[cur_layer_id][(prev, cur)][0],
75 |                                                         alias_edges[cur_layer_id][(prev, cur)][1])]
76 |                         walk.append(next_node)
77 | else:
78 | force_switch = True
79 | continue
80 |             except Exception:  # cur is absent from this layer (or has no alias entry): force a layer switch
81 |                 force_switch = True
82 |                 continue
83 |
84 | return walk
85 |
86 |     def simulate_walks(self, num_walks, walk_length, w=0.5):
87 |         '''
88 |         Repeatedly simulate multilayer walks from each node of each layer; w is the layer-switch probability passed to multinode2vec_walk.
89 |         '''
90 |         G = self.G
91 |         walks = {}
92 |         for layer_id, layer in enumerate(G):
93 |             walks[layer_id] = []
94 |             nodes = list(layer.nodes())
95 |             print('Walk iteration:')
96 |             for walk_iter in range(num_walks):
97 |                 print(str(walk_iter+1), '/', str(num_walks))
98 |                 random.shuffle(nodes)
99 |                 for node in nodes:
100 |                     # generate one multilayer walk rooted at this node and layer
101 |                     walks[layer_id].append(self.multinode2vec_walk(w, walk_length, node, layer_id))
102 |         return walks
103 |
104 | def get_alias_edge(self, src, dst, layer):
105 |         '''
106 |         Get the alias sampling lists for the edge (src, dst): each neighbor of dst is weighted 1/p (stepping back to src), 1 (a common neighbor of src and dst), or 1/q (moving outward), as in node2vec.
107 |         '''
108 | p = self.p
109 | q = self.q
110 |
111 | unnormalized_probs = []
112 | for dst_nbr in sorted(layer.neighbors(dst)):
113 | if dst_nbr == src:
114 | unnormalized_probs.append(layer[dst][dst_nbr]['weight']/p)
115 | elif layer.has_edge(dst_nbr, src):
116 | unnormalized_probs.append(layer[dst][dst_nbr]['weight'])
117 | else:
118 | unnormalized_probs.append(layer[dst][dst_nbr]['weight']/q)
119 | norm_const = sum(unnormalized_probs)
120 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
121 |
122 | return alias_setup(normalized_probs)
123 |
124 | def preprocess_transition_probs(self):
125 | '''
126 | Preprocessing of transition probabilities for guiding the random walks.
127 | '''
128 | G = self.G
129 | is_directed = self.is_directed
130 |
131 | self.alias_nodes = {}
132 | self.alias_edges = {}
133 | self.lock = threading.Lock()
134 |
135 | tlimit = self.thread_limit
136 | layer_count = len(self.G)
137 | counter = 0
138 | if tlimit == 1:
139 | for i in range(layer_count):
140 | self.preprocess_thread(self.G[i],i)
141 | else:
142 |             while counter < layer_count:
143 |                 threads = []
144 |                 # launch a batch of at most thread_limit preprocessing threads, one per layer
145 |                 batch = min(tlimit, layer_count - counter)
146 |                 for i in range(batch):
147 |                     thread = threading.Thread(target=self.preprocess_thread,
148 |                                               args=(self.G[counter], counter,))
149 |                     threads.append(thread)
150 |                     thread.start()
151 |                     counter += 1
152 |                 # wait for the batch to finish before launching the next; each
153 |                 # thread stores its results in the shared dicts under self.lock
154 |                 for thread in threads:
155 |                     thread.join()
159 |
160 | return
161 |
162 | def preprocess_thread(self, layer, counter):
163 | start_time = time.time()
164 | print("Starting thread for layer " + str(counter))
165 | alias_nodes = {}
166 | for node in layer.nodes():
167 | unnormalized_probs = [layer[node][nbr]['weight'] for nbr in sorted(layer.neighbors(node))]
168 | norm_const = sum(unnormalized_probs)
169 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
170 | alias_nodes[node] = alias_setup(normalized_probs)
171 |
172 | alias_edges = {}
173 | triads = {}
174 |
175 | if self.is_directed:
176 | for edge in layer.edges():
177 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
178 | else:
179 | for edge in layer.edges():
180 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
181 | alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0], layer)
182 |
183 | self.lock.acquire()
184 | try:
185 | self.alias_nodes[counter] = alias_nodes
186 | self.alias_edges[counter] = alias_edges
187 | finally:
188 | self.lock.release()
189 |
190 | print("Finished thread for layer " + str(counter) + " in " + str(time.time() - start_time) + " seconds.")
191 |
192 | return
193 |
194 | def alias_setup(probs):
195 | '''
196 | Compute utility lists for non-uniform sampling from discrete distributions.
197 | Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/
198 | for details
199 | '''
200 | K = len(probs)
201 | q = np.zeros(K)
202 |     J = np.zeros(K, dtype=int)  # use the builtin int: the np.int alias was removed in NumPy 1.24
203 |
204 | smaller = []
205 | larger = []
206 | for kk, prob in enumerate(probs):
207 | q[kk] = K*prob
208 | if q[kk] < 1.0:
209 | smaller.append(kk)
210 | else:
211 | larger.append(kk)
212 |
213 | while len(smaller) > 0 and len(larger) > 0:
214 | small = smaller.pop()
215 | large = larger.pop()
216 |
217 | J[small] = large
218 | q[large] = q[large] + q[small] - 1.0
219 | if q[large] < 1.0:
220 | smaller.append(large)
221 | else:
222 | larger.append(large)
223 |
224 | return J, q
225 |
226 | def alias_draw(J, q):
227 | '''
228 | Draw sample from a non-uniform discrete distribution using alias sampling.
229 | '''
230 | K = len(J)
231 |
232 | kk = int(np.floor(np.random.rand()*K))
233 | if np.random.rand() < q[kk]:
234 | return kk
235 | else:
236 | return J[kk]
237 |
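238 | # A minimal, illustrative sketch (not part of the pipeline): two small weighted
239 | # layers over partially shared node labels. Every edge needs a 'weight'
240 | # attribute, since the alias preprocessing reads layer[u][v]['weight'] directly.
241 | if __name__ == "__main__":
242 |     layer1 = nx.Graph()
243 |     layer1.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 1.0)])
244 |     layer2 = nx.Graph()
245 |     layer2.add_weighted_edges_from([("a", "b", 1.0), ("b", "d", 1.0)])
246 |     gen = NeighborhoodGen([layer1, layer2], p=1.0, q=0.5)
247 |     # one walk of length 10 from node "a" in layer 0, switching layers w.p. 0.3
248 |     print(gen.multinode2vec_walk(0.3, 10, "a", 0))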
--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Helper functions for multi-node2vec (a duplicate of mltn2v_utils.py, used for testing).
3 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
4 | by JD Wilson, M Baybay, R Sankar, and P Stillman
5 |
6 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
7 |
8 | Contributors:
9 | - Melanie Baybay
10 | University of San Francisco, Department of Computer Science
11 | - Rishi Sankar
12 | Henry M. Gunn High School
13 | - James D. Wilson (maintainer)
14 | University of San Francisco, Department of Mathematics and Statistics
15 |
16 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
17 | """
18 |
19 | import os
20 | import numpy as np
21 | import pandas as pd
22 | from pandas.api.types import is_numeric_dtype
23 | import time
24 |
25 | # -------------------------------------------------------------------------------
26 | # PARSING AND CONVERSION FOR MULTILAYER GRAPHS
27 | # -------------------------------------------------------------------------------
28 | def parse_matrix_layers(network_dir, delim=',', binary=False, thresh=None):
29 | """
30 | Converts directory of adjacency matrix files into pandas dataframes.
31 | :param network_dir: Directory of adjacency matrix files
32 | :param delim: separator for adjacency matrix
33 | :param binary: boolean of whether or not to convert edge weights to binary
34 |     :param thresh: threshold for edge weights; weights <= thresh are set to 0, so only edges with weight above thresh are kept
35 |     :return: List of adjacency lists. Each adjacency list is one layer, represented
36 |              as a pandas DataFrame with 'source', 'target', 'weight' columns.
37 | """
38 | # expand directory path
39 | network_dir = expand_path(network_dir)
40 |
41 | # iterate files and convert to pandas dataframes
42 | layers = []
43 | for network_file in os.listdir(network_dir):
44 | file_path = os.path.join(network_dir, network_file)
45 | try:
46 | # read as pandas DataFrame, index=source, col=target
47 |             layer = pd.read_csv(file_path, sep=delim, index_col=0)
48 | if layer.shape[0] != layer.shape[1]:
49 | print('[ERROR] Invalid adjacency matrix. Expecting matrix with index as source and column as target.')
50 | return
51 | if thresh is not None:
52 | layer[layer <= thresh] = 0
53 | if binary:
54 | layer[layer != 0] = 1
55 | # ensure that index (node name) is string, since word2vec will need it as str
56 | if is_numeric_dtype(layer.index):
57 | layer.index = layer.index.map(str)
58 | # replace all 0s with NaN
59 |             layer.replace(to_replace=0, value=np.nan, inplace=True)
60 | # convert matrix --> adjacency list with cols ["source", "target", "weight"]
61 | layer = layer.stack(dropna=True).reset_index()
62 | # rename columns
63 | layer.columns = ["source", "target", "weight"]
64 | layers.append(layer)
65 | except Exception as e:
66 | print('[ERROR] Could not read file "{}": {} '.format(file_path, e))
67 | return layers
68 |
69 |
70 | def expand_path(path):
71 | """
72 | Expands a file path to handle user and environmental variables.
73 | :param path: path to expand
74 | :return: expanded path
75 | """
76 | new_path = os.path.expanduser(path)
77 | return os.path.expandvars(new_path)
78 |
79 |
80 | # -------------------------------------------------------------------------------
81 | # OUTPUT
82 | # -------------------------------------------------------------------------------
83 | def feature_matrix_to_csv(ftrs, filename):
84 | """
85 |     Write the feature matrix to a csv file.
86 |     :param ftrs: pandas DataFrame of features, indexed by node id
87 |     :param filename: absolute path to the output file, without extension;
88 |                      ".csv" is appended automatically
89 |     :return:
90 | """
91 | out = filename + ".csv"
92 | ftrs.to_csv(out, sep=',', header=False)
93 | return
94 |
95 |
96 | def timed_invoke(action_desc, method):
97 | """
98 | Invokes a method with timing.
99 | :param action_desc: The string describing the method action
100 | :param method: The method to invoke
101 | :return: The return object of the method
102 | """
103 | print('Started {}...'.format(action_desc))
104 | start = time.time()
105 | try:
106 | output = method()
107 | print('Finished {} in {} seconds'.format(action_desc, int(time.time() - start)))
108 | return output
109 | except Exception:
110 | print('Exception while {} after {} seconds'.format(action_desc, int(time.time() - start)))
111 | raise
112 |
113 |
114 | def clean_output(directory):
115 | """
116 |     Returns the output directory path, creating the directory (with a warning) if it does not exist.
117 | """
118 | directory = expand_path(directory)
119 | if os.path.isdir(directory):
120 | return directory
121 | else:
122 | os.makedirs(directory)
123 | print("[WARNING] Directory not found. Created {}".format(directory))
124 | return directory
125 |
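126 | # A minimal, illustrative sketch: load the two-layer test data shipped with the
127 | # repository (paths assume the script is run from the repo root) and report the
128 | # size of each resulting adjacency list.
129 | if __name__ == "__main__":
130 |     layers = timed_invoke("parsing test layers",
131 |                           lambda: parse_matrix_layers("data/test", thresh=0.25))
132 |     for i, layer in enumerate(layers):
133 |         print("layer", i, "has", layer.shape[0], "edges")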
--------------------------------------------------------------------------------