├── .DS_Store ├── README.md ├── data ├── .DS_Store ├── CONTROL_fmt │ ├── 0040013.preprocess_v1.csv │ ├── 0040014.preprocess_v1.csv │ ├── 0040017.preprocess_v1.csv │ ├── 0040018.preprocess_v1.csv │ ├── 0040019.preprocess_v1.csv │ ├── 0040020.preprocess_v1.csv │ ├── 0040023.preprocess_v1.csv │ ├── 0040024.preprocess_v1.csv │ ├── 0040026.preprocess_v1.csv │ ├── 0040027.preprocess_v1.csv │ ├── 0040030.preprocess_v1.csv │ ├── 0040031.preprocess_v1.csv │ ├── 0040033.preprocess_v1.csv │ ├── 0040035.preprocess_v1.csv │ ├── 0040036.preprocess_v1.csv │ ├── 0040038.preprocess_v1.csv │ ├── 0040043.preprocess_v1.csv │ ├── 0040045.preprocess_v1.csv │ ├── 0040048.preprocess_v1.csv │ ├── 0040050.preprocess_v1.csv │ ├── 0040051.preprocess_v1.csv │ ├── 0040052.preprocess_v1.csv │ ├── 0040053.preprocess_v1.csv │ ├── 0040054.preprocess_v1.csv │ ├── 0040055.preprocess_v1.csv │ ├── 0040056.preprocess_v1.csv │ ├── 0040057.preprocess_v1.csv │ ├── 0040058.preprocess_v1.csv │ ├── 0040061.preprocess_v1.csv │ ├── 0040062.preprocess_v1.csv │ ├── 0040063.preprocess_v1.csv │ ├── 0040065.preprocess_v1.csv │ ├── 0040066.preprocess_v1.csv │ ├── 0040067.preprocess_v1.csv │ ├── 0040068.preprocess_v1.csv │ ├── 0040069.preprocess_v1.csv │ ├── 0040074.preprocess_v1.csv │ ├── 0040076.preprocess_v1.csv │ ├── 0040086.preprocess_v1.csv │ ├── 0040087.preprocess_v1.csv │ ├── 0040090.preprocess_v1.csv │ ├── 0040091.preprocess_v1.csv │ ├── 0040093.preprocess_v1.csv │ ├── 0040095.preprocess_v1.csv │ ├── 0040102.preprocess_v1.csv │ ├── 0040104.preprocess_v1.csv │ ├── 0040107.preprocess_v1.csv │ ├── 0040111.preprocess_v1.csv │ ├── 0040113.preprocess_v1.csv │ ├── 0040114.preprocess_v1.csv │ ├── 0040115.preprocess_v1.csv │ ├── 0040116.preprocess_v1.csv │ ├── 0040118.preprocess_v1.csv │ ├── 0040119.preprocess_v1.csv │ ├── 0040120.preprocess_v1.csv │ ├── 0040121.preprocess_v1.csv │ ├── 0040123.preprocess_v1.csv │ ├── 0040124.preprocess_v1.csv │ ├── 0040125.preprocess_v1.csv │ ├── 0040127.preprocess_v1.csv │ ├── 0040128.preprocess_v1.csv │ ├── 0040129.preprocess_v1.csv │ ├── 0040130.preprocess_v1.csv │ ├── 0040131.preprocess_v1.csv │ ├── 0040134.preprocess_v1.csv │ ├── 0040135.preprocess_v1.csv │ ├── 0040136.preprocess_v1.csv │ ├── 0040138.preprocess_v1.csv │ ├── 0040139.preprocess_v1.csv │ ├── 0040140.preprocess_v1.csv │ ├── 0040141.preprocess_v1.csv │ ├── 0040144.preprocess_v1.csv │ ├── 0040146.preprocess_v1.csv │ └── 0040147.preprocess_v1.csv ├── power_atlas_info.csv └── test │ ├── 0040013.preprocess_v1.csv │ └── 0040014.preprocess_v1.csv ├── mn2vec_toy.png ├── multi_node2vec.py ├── requirements.txt ├── results └── test │ ├── .DS_Store │ └── r0.25 │ ├── mltn2v_control.csv │ ├── mltn2v_control.emb │ ├── mltn2v_results.csv │ └── mltn2v_results.emb └── src ├── __init__.py ├── __pycache__ ├── __init__.cpython-36.pyc ├── mltn2v_utils.cpython-36.pyc ├── multinode2vec.cpython-36.pyc └── nbrhd_gen_walk_nx.cpython-36.pyc ├── mltn2v_utils.py ├── multinode2vec.py ├── nbrhd_gen_walk.py ├── nbrhd_gen_walk_nx.py └── utils.py /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/.DS_Store -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # multi-node2vec 2 | This is Python source code for the multi-node2vec algorithm. 
Multi-node2vec is a fast network embedding method for multilayer networks
3 | that identifies a continuous, low-dimensional representation of the unique nodes in the network.
4 |
5 | Details of the algorithm can be found in the paper: *Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI*
6 | by JD Wilson, M Baybay, R Sankar, and P Stillman.
7 |
8 | **Preprint**: https://arxiv.org/pdf/1809.06437.pdf
9 |
10 | __Contributors__:
11 | - Melanie Baybay
12 | University of San Francisco, Department of Computer Science
13 | - Rishi Sankar
14 | Henry M. Gunn High School
15 | - James D. Wilson (maintainer)
16 | University of San Francisco, Department of Mathematics and Statistics
17 |
18 | **Questions or Bugs?** Contact James D. Wilson at jdwilson4@usfca.edu
19 |
20 | # Description
21 |
22 | ## The Mathematical Objective
23 |
24 | A multilayer network of length *m* is a collection of networks or graphs {G1, ..., Gm}, where the graph Gj models the relational structure of the *j*th layer of the network. Each layer Gj = (Vj, Wj) is described by the vertex set Vj, which contains the units, or actors, of the layer, and the edge weights Wj, which describe the strength of relationship between the nodes. Layers in the multilayer sequence may be heterogeneous across vertices, edges, and size. Denote the set of unique nodes in {G1, ..., Gm} by **N**, and let
25 | *N* = |**N**| denote the number of nodes in that set.
26 |
27 | The aim of **multi-node2vec** is to learn an interpretable low-dimensional feature representation of **N**. In particular, it seeks a *D*-dimensional representation
28 |
29 | **F**: **N** --> R^*D*,
30 |
31 | where *D* << *N*. The function **F** can be viewed as an *N* x *D* matrix whose rows {**f**v: v = 1, ..., N} represent the feature space of each node in **N**.
32 |
33 | ## The Algorithm
34 | The **multi-node2vec** algorithm estimates **F** through maximum likelihood estimation and relies upon two core steps:
35 |
36 | 1) __NeighborhoodSearch__: a collection of vertex neighborhoods from the observed multilayer graph, also known as a *BagofNodes*, is identified. This is done through a second-order random walk on the multilayer network.
37 |
38 | 2) __Optimization__: given a *BagofNodes*, **F** is then estimated through the maximization of the log-likelihood of **F** | **N**. This is done through the application of stochastic gradient descent on a two-layer Skip-Gram neural network model.
39 |
40 | The following image provides a schematic:
41 |
42 | ![multi-node2vec schematic](https://github.com/jdwilson4/multi-node2vec/blob/master/mn2vec_toy.png)
43 |
44 | # Running multi-node2vec
45 |
46 | ## Requirements
47 | This package requires Python 3.6 with the following libraries:
48 | - numpy==1.12.1
49 | - pandas==0.24.0
50 | - gensim==2.3.0
51 | - networkx==2.5.1
52 |
53 | You can install these libraries by running the command
54 |
55 | ```
56 | pip install -r requirements.txt
57 | ```
58 |
59 | from this project's root directory.
60 |
61 |
62 | ## Usage
63 | ```
64 | python3 multi_node2vec.py [--dir [DIR]] [--output [OUTPUT]] [--d [D]] [--walk_length [WALK_LENGTH]] [--window_size [WINDOW_SIZE]] [--n_samples [N_SAMPLES]] [--thresh [THRESH]] [--w2v_workers [W2V_WORKERS]] [--rvals [RVALS]] [--pvals [PVALS]] [--qvals [QVALS]]
65 | ```
66 |
67 | ***Arguments***
68 |
69 | - --dir [directory name] : Absolute path to a directory of correlation/adjacency matrix files in .csv format. Each .csv should contain an adjacency matrix with rows and columns labeled by node ID (see the input sketch after this list).
70 | - --output [directory name] : Absolute path to the output directory. Results for a walk parameter r are written to output/r{r}/mltn2v_results.csv and .emb.
71 | - --d [dimensions] : Dimensionality of the feature space. Default is 100.
72 | - --walk_length [n] : Length of each random walk used to identify multilayer neighborhoods. Default is 100.
73 | - --window_size [w] : Size of the context window used for Skip-Gram optimization. Default is 10.
74 | - --n_samples [samples] : Number of walks sampled per node per layer. Default is 1.
75 | - --thresh [thresh] : Threshold for converting a weighted network to an unweighted one. All weights less than or equal to thresh are set to 0 and all others to 1. Default is 0.5. Use None if the network is unweighted.
76 | - --w2v_workers [workers] : Number of parallel worker threads. Default is 8.
77 | - --rvals [layer walk prob] : The unnormalized walk probability for traversing layers. Default is 0.25.
78 | - --pvals [return prob] : The unnormalized walk probability of returning to a previously seen node. Default is 1.
79 | - --qvals [explore prob] : The unnormalized walk probability of exploring new nodes. Default is 0.50.
80 |
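Before running, it may help to see the expected input format. Below is a minimal, illustrative sketch (not part of the repository; the directory and file names are hypothetical) that writes one layer as a square adjacency matrix whose rows and columns are labeled by node ID — exactly what the parser in src/mltn2v_utils.py reads back with `pd.read_csv(..., index_col=0)`:

```python
import os
import pandas as pd

# Hypothetical toy layer with three nodes ("1", "2", "3").
nodes = ["1", "2", "3"]
A = pd.DataFrame([[0.0, 0.8, 0.1],
                  [0.8, 0.0, 0.6],
                  [0.1, 0.6, 0.0]],
                 index=nodes, columns=nodes)

# One .csv per layer inside the --dir folder; node IDs are written
# as both the row index and the column header.
os.makedirs("data/my_network", exist_ok=True)
A.to_csv("data/my_network/layer_01.csv")
```

Each additional layer is another such file in the same directory; layers may differ in which nodes they contain.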
81 | ### Examples
82 |
83 | __Quick Test Example__
84 |
85 | This example runs **multi-node2vec** on a small test multilayer network with 2 layers and 264 nodes in each layer. It takes about 2 minutes to run on a personal computer using 8 cores.
86 | ```
87 | python3 multi_node2vec.py --dir data/test --output results/test --d 100 --window_size 2 --n_samples 1 --thresh 0.5 --rvals 0.25
88 | ```
89 |
90 | __fMRI Case Study__
91 |
92 | This example runs **multi-node2vec** on the multilayer network representing group fMRI of 74 healthy controls, as analyzed in the paper *Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI*. The model will
93 | generate 100 features for each of 264 unique nodes using a walk parameter *r = 0.25*. The values of *p* (=1) and *q* (=0.50) match the defaults of the original **node2vec** specification. It takes about an hour to run on a personal computer using 8 cores.
94 | ``` 95 | python3 multi_node2vec.py --dir data/CONTROL_fmt --output results/control --d 100 --window_size 10 --n_samples 1 --rvals 0.25 --pvals 1 --thresh 0.5 --qvals 0.5 96 | ``` 97 | 98 | 99 | 100 | -------------------------------------------------------------------------------- /data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/data/.DS_Store -------------------------------------------------------------------------------- /data/power_atlas_info.csv: -------------------------------------------------------------------------------- 1 | ROI,X,Y,Z,MasterAssignment,SuggestedSystem,color,network_revised,roi_name_unique,color_updated 2 | 1,-25,-98,-12,-1,Uncertain,White,uncertain,roi_1_uncertain,white 3 | 2,27,-97,-13,-1,Uncertain,White,uncertain,roi_2_uncertain,white 4 | 3,24,32,-18,-1,Uncertain,White,uncertain,roi_3_uncertain,white 5 | 4,-56,-45,-24,-1,Uncertain,White,uncertain,roi_4_uncertain,white 6 | 5,8,41,-24,-1,Uncertain,White,uncertain,roi_5_uncertain,white 7 | 6,-21,-22,-20,-1,Uncertain,White,uncertain,roi_6_uncertain,white 8 | 7,17,-28,-17,-1,Uncertain,White,uncertain,roi_7_uncertain,white 9 | 8,-37,-29,-26,-1,Uncertain,White,uncertain,roi_8_uncertain,white 10 | 9,65,-24,-19,-1,Uncertain,White,uncertain,roi_9_uncertain,white 11 | 10,52,-34,-27,-1,Uncertain,White,uncertain,roi_10_uncertain,white 12 | 11,55,-31,-17,-1,Uncertain,White,uncertain,roi_11_uncertain,white 13 | 12,34,38,-12,-1,Uncertain,White,uncertain,roi_12_uncertain,white 14 | 13,-7,-52,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_13_hand,cyan 15 | 14,-14,-18,40,1,Sensory/somatomotor Hand,Cyan,hand,roi_14_hand,cyan 16 | 15,0,-15,47,1,Sensory/somatomotor Hand,Cyan,hand,roi_15_hand,cyan 17 | 16,10,-2,45,1,Sensory/somatomotor Hand,Cyan,hand,roi_16_hand,cyan 18 | 17,-7,-21,65,1,Sensory/somatomotor Hand,Cyan,hand,roi_17_hand,cyan 19 | 18,-7,-33,72,1,Sensory/somatomotor Hand,Cyan,hand,roi_18_hand,cyan 20 | 19,13,-33,75,1,Sensory/somatomotor Hand,Cyan,hand,roi_19_hand,cyan 21 | 20,-54,-23,43,1,Sensory/somatomotor Hand,Cyan,hand,roi_20_hand,cyan 22 | 21,29,-17,71,1,Sensory/somatomotor Hand,Cyan,hand,roi_21_hand,cyan 23 | 22,10,-46,73,1,Sensory/somatomotor Hand,Cyan,hand,roi_22_hand,cyan 24 | 23,-23,-30,72,1,Sensory/somatomotor Hand,Cyan,hand,roi_23_hand,cyan 25 | 24,-40,-19,54,1,Sensory/somatomotor Hand,Cyan,hand,roi_24_hand,cyan 26 | 25,29,-39,59,1,Sensory/somatomotor Hand,Cyan,hand,roi_25_hand,cyan 27 | 26,50,-20,42,1,Sensory/somatomotor Hand,Cyan,hand,roi_26_hand,cyan 28 | 27,-38,-27,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_27_hand,cyan 29 | 28,20,-29,60,1,Sensory/somatomotor Hand,Cyan,hand,roi_28_hand,cyan 30 | 29,44,-8,57,1,Sensory/somatomotor Hand,Cyan,hand,roi_29_hand,cyan 31 | 30,-29,-43,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_30_hand,cyan 32 | 31,10,-17,74,1,Sensory/somatomotor Hand,Cyan,hand,roi_31_hand,cyan 33 | 32,22,-42,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_32_hand,cyan 34 | 33,-45,-32,47,1,Sensory/somatomotor Hand,Cyan,hand,roi_33_hand,cyan 35 | 34,-21,-31,61,1,Sensory/somatomotor Hand,Cyan,hand,roi_34_hand,cyan 36 | 35,-13,-17,75,1,Sensory/somatomotor Hand,Cyan,hand,roi_35_hand,cyan 37 | 36,42,-20,55,1,Sensory/somatomotor Hand,Cyan,hand,roi_36_hand,cyan 38 | 37,-38,-15,69,1,Sensory/somatomotor Hand,Cyan,hand,roi_37_hand,cyan 39 | 38,-16,-46,73,1,Sensory/somatomotor Hand,Cyan,hand,roi_38_hand,cyan 40 | 39,2,-28,60,1,Sensory/somatomotor 
Hand,Cyan,hand,roi_39_hand,cyan 41 | 40,3,-17,58,1,Sensory/somatomotor Hand,Cyan,hand,roi_40_hand,cyan 42 | 41,38,-17,45,1,Sensory/somatomotor Hand,Cyan,hand,roi_41_hand,cyan 43 | 42,-49,-11,35,2,Sensory/somatomotor Mouth,Orange,mouth,roi_42_mouth,orange 44 | 43,36,-9,14,2,Sensory/somatomotor Mouth,Orange,mouth,roi_43_mouth,orange 45 | 44,51,-6,32,2,Sensory/somatomotor Mouth,Orange,mouth,roi_44_mouth,orange 46 | 45,-53,-10,24,2,Sensory/somatomotor Mouth,Orange,mouth,roi_45_mouth,orange 47 | 46,66,-8,25,2,Sensory/somatomotor Mouth,Orange,mouth,roi_46_mouth,orange 48 | 47,-3,2,53,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_47_cing_oper_task_control,purple 49 | 48,54,-28,34,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_48_cing_oper_task_control,purple 50 | 49,19,-8,64,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_49_cing_oper_task_control,purple 51 | 50,-16,-5,71,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_50_cing_oper_task_control,purple 52 | 51,-10,-2,42,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_51_cing_oper_task_control,purple 53 | 52,37,1,-4,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_52_cing_oper_task_control,purple 54 | 53,13,-1,70,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_53_cing_oper_task_control,purple 55 | 54,7,8,51,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_54_cing_oper_task_control,purple 56 | 55,-45,0,9,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_55_cing_oper_task_control,purple 57 | 56,49,8,-1,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_56_cing_oper_task_control,purple 58 | 57,-34,3,4,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_57_cing_oper_task_control,purple 59 | 58,-51,8,-2,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_58_cing_oper_task_control,purple 60 | 59,-5,18,34,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_59_cing_oper_task_control,purple 61 | 60,36,10,1,3,Cingulo-opercular Task Control,Purple,cing_oper_task_control,roi_60_cing_oper_task_control,purple 62 | 61,32,-26,13,4,Auditory,Pink,auditory,roi_61_auditory,pink 63 | 62,65,-33,20,4,Auditory,Pink,auditory,roi_62_auditory,pink 64 | 63,58,-16,7,4,Auditory,Pink,auditory,roi_63_auditory,pink 65 | 64,-38,-33,17,4,Auditory,Pink,auditory,roi_64_auditory,pink 66 | 65,-60,-25,14,4,Auditory,Pink,auditory,roi_65_auditory,pink 67 | 66,-49,-26,5,4,Auditory,Pink,auditory,roi_66_auditory,pink 68 | 67,43,-23,20,4,Auditory,Pink,auditory,roi_67_auditory,pink 69 | 68,-50,-34,26,4,Auditory,Pink,auditory,roi_68_auditory,pink 70 | 69,-53,-22,23,4,Auditory,Pink,auditory,roi_69_auditory,pink 71 | 70,-55,-9,12,4,Auditory,Pink,auditory,roi_70_auditory,pink 72 | 71,56,-5,13,4,Auditory,Pink,auditory,roi_71_auditory,pink 73 | 72,59,-17,29,4,Auditory,Pink,auditory,roi_72_auditory,pink 74 | 73,-30,-27,12,4,Auditory,Pink,auditory,roi_73_auditory,pink 75 | 74,-41,-75,26,5,Default mode,Red,dmn,roi_74_dmn,red 76 | 75,6,67,-4,5,Default mode,Red,dmn,roi_75_dmn,red 77 | 76,8,48,-15,5,Default mode,Red,dmn,roi_76_dmn,red 78 | 77,-13,-40,1,5,Default mode,Red,dmn,roi_77_dmn,red 79 | 78,-18,63,-9,5,Default mode,Red,dmn,roi_78_dmn,red 80 | 79,-46,-61,21,5,Default mode,Red,dmn,roi_79_dmn,red 81 | 80,43,-72,28,5,Default mode,Red,dmn,roi_80_dmn,red 82 | 81,-44,12,-34,5,Default mode,Red,dmn,roi_81_dmn,red 83 | 82,46,16,-30,5,Default mode,Red,dmn,roi_82_dmn,red 84 | 
83,-68,-23,-16,5,Default mode,Red,dmn,roi_83_dmn,red 85 | 84,-58,-26,-15,-1,Uncertain,White,uncertain,roi_84_uncertain,white 86 | 85,27,16,-17,-1,Uncertain,White,uncertain,roi_85_uncertain,white 87 | 86,-44,-65,35,5,Default mode,Red,dmn,roi_86_dmn,red 88 | 87,-39,-75,44,5,Default mode,Red,dmn,roi_87_dmn,red 89 | 88,-7,-55,27,5,Default mode,Red,dmn,roi_88_dmn,red 90 | 89,6,-59,35,5,Default mode,Red,dmn,roi_89_dmn,red 91 | 90,-11,-56,16,5,Default mode,Red,dmn,roi_90_dmn,red 92 | 91,-3,-49,13,5,Default mode,Red,dmn,roi_91_dmn,red 93 | 92,8,-48,31,5,Default mode,Red,dmn,roi_92_dmn,red 94 | 93,15,-63,26,5,Default mode,Red,dmn,roi_93_dmn,red 95 | 94,-2,-37,44,5,Default mode,Red,dmn,roi_94_dmn,red 96 | 95,11,-54,17,5,Default mode,Red,dmn,roi_95_dmn,red 97 | 96,52,-59,36,5,Default mode,Red,dmn,roi_96_dmn,red 98 | 97,23,33,48,5,Default mode,Red,dmn,roi_97_dmn,red 99 | 98,-10,39,52,5,Default mode,Red,dmn,roi_98_dmn,red 100 | 99,-16,29,53,5,Default mode,Red,dmn,roi_99_dmn,red 101 | 100,-35,20,51,5,Default mode,Red,dmn,roi_100_dmn,red 102 | 101,22,39,39,5,Default mode,Red,dmn,roi_101_dmn,red 103 | 102,13,55,38,5,Default mode,Red,dmn,roi_102_dmn,red 104 | 103,-10,55,39,5,Default mode,Red,dmn,roi_103_dmn,red 105 | 104,-20,45,39,5,Default mode,Red,dmn,roi_104_dmn,red 106 | 105,6,54,16,5,Default mode,Red,dmn,roi_105_dmn,red 107 | 106,6,64,22,5,Default mode,Red,dmn,roi_106_dmn,red 108 | 107,-7,51,-1,5,Default mode,Red,dmn,roi_107_dmn,red 109 | 108,9,54,3,5,Default mode,Red,dmn,roi_108_dmn,red 110 | 109,-3,44,-9,5,Default mode,Red,dmn,roi_109_dmn,red 111 | 110,8,42,-5,5,Default mode,Red,dmn,roi_110_dmn,red 112 | 111,-11,45,8,5,Default mode,Red,dmn,roi_111_dmn,red 113 | 112,-2,38,36,5,Default mode,Red,dmn,roi_112_dmn,red 114 | 113,-3,42,16,5,Default mode,Red,dmn,roi_113_dmn,red 115 | 114,-20,64,19,5,Default mode,Red,dmn,roi_114_dmn,red 116 | 115,-8,48,23,5,Default mode,Red,dmn,roi_115_dmn,red 117 | 116,65,-12,-19,5,Default mode,Red,dmn,roi_116_dmn,red 118 | 117,-56,-13,-10,5,Default mode,Red,dmn,roi_117_dmn,red 119 | 118,-58,-30,-4,5,Default mode,Red,dmn,roi_118_dmn,red 120 | 119,65,-31,-9,5,Default mode,Red,dmn,roi_119_dmn,red 121 | 120,-68,-41,-5,5,Default mode,Red,dmn,roi_120_dmn,red 122 | 121,13,30,59,5,Default mode,Red,dmn,roi_121_dmn,red 123 | 122,12,36,20,5,Default mode,Red,dmn,roi_122_dmn,red 124 | 123,52,-2,-16,5,Default mode,Red,dmn,roi_123_dmn,red 125 | 124,-26,-40,-8,5,Default mode,Red,dmn,roi_124_dmn,red 126 | 125,27,-37,-13,5,Default mode,Red,dmn,roi_125_dmn,red 127 | 126,-34,-38,-16,5,Default mode,Red,dmn,roi_126_dmn,red 128 | 127,28,-77,-32,5,Default mode,Red,dmn,roi_127_dmn,red 129 | 128,52,7,-30,5,Default mode,Red,dmn,roi_128_dmn,red 130 | 129,-53,3,-27,5,Default mode,Red,dmn,roi_129_dmn,red 131 | 130,47,-50,29,5,Default mode,Red,dmn,roi_130_dmn,red 132 | 131,-49,-42,1,5,Default mode,Red,dmn,roi_131_dmn,red 133 | 132,-31,19,-19,-1,Uncertain,White,uncertain,roi_132_uncertain,white 134 | 133,-2,-35,31,6,Memory retrieval?,Gray,mem_retr,roi_133_mem_retr,gray 135 | 134,-7,-71,42,6,Memory retrieval?,Gray,mem_retr,roi_134_mem_retr,gray 136 | 135,11,-66,42,6,Memory retrieval?,Gray,mem_retr,roi_135_mem_retr,gray 137 | 136,4,-48,51,6,Memory retrieval?,Gray,mem_retr,roi_136_mem_retr,gray 138 | 137,-46,31,-13,5,Default mode,Red,dmn,roi_137_dmn,red 139 | 138,-10,11,67,11,Ventral attention,Teal,ventral_attention,roi_138_ventral_attention,olivedrab 140 | 139,49,35,-12,5,Default mode,Red,dmn,roi_139_dmn,red 141 | 140,8,-91,-7,-1,Uncertain,White,uncertain,roi_140_uncertain,white 142 | 
141,17,-91,-14,-1,Uncertain,White,uncertain,roi_141_uncertain,white 143 | 142,-12,-95,-13,-1,Uncertain,White,uncertain,roi_142_uncertain,white 144 | 143,18,-47,-10,7,Visual,Blue,visual,roi_143_visual,blue 145 | 144,40,-72,14,7,Visual,Blue,visual,roi_144_visual,blue 146 | 145,8,-72,11,7,Visual,Blue,visual,roi_145_visual,blue 147 | 146,-8,-81,7,7,Visual,Blue,visual,roi_146_visual,blue 148 | 147,-28,-79,19,7,Visual,Blue,visual,roi_147_visual,blue 149 | 148,20,-66,2,7,Visual,Blue,visual,roi_148_visual,blue 150 | 149,-24,-91,19,7,Visual,Blue,visual,roi_149_visual,blue 151 | 150,27,-59,-9,7,Visual,Blue,visual,roi_150_visual,blue 152 | 151,-15,-72,-8,7,Visual,Blue,visual,roi_151_visual,blue 153 | 152,-18,-68,5,7,Visual,Blue,visual,roi_152_visual,blue 154 | 153,43,-78,-12,7,Visual,Blue,visual,roi_153_visual,blue 155 | 154,-47,-76,-10,7,Visual,Blue,visual,roi_154_visual,blue 156 | 155,-14,-91,31,7,Visual,Blue,visual,roi_155_visual,blue 157 | 156,15,-87,37,7,Visual,Blue,visual,roi_156_visual,blue 158 | 157,29,-77,25,7,Visual,Blue,visual,roi_157_visual,blue 159 | 158,20,-86,-2,7,Visual,Blue,visual,roi_158_visual,blue 160 | 159,15,-77,31,7,Visual,Blue,visual,roi_159_visual,blue 161 | 160,-16,-52,-1,7,Visual,Blue,visual,roi_160_visual,blue 162 | 161,42,-66,-8,7,Visual,Blue,visual,roi_161_visual,blue 163 | 162,24,-87,24,7,Visual,Blue,visual,roi_162_visual,blue 164 | 163,6,-72,24,7,Visual,Blue,visual,roi_163_visual,blue 165 | 164,-42,-74,0,7,Visual,Blue,visual,roi_164_visual,blue 166 | 165,26,-79,-16,7,Visual,Blue,visual,roi_165_visual,blue 167 | 166,-16,-77,34,7,Visual,Blue,visual,roi_166_visual,blue 168 | 167,-3,-81,21,7,Visual,Blue,visual,roi_167_visual,blue 169 | 168,-40,-88,-6,7,Visual,Blue,visual,roi_168_visual,blue 170 | 169,37,-84,13,7,Visual,Blue,visual,roi_169_visual,blue 171 | 170,6,-81,6,7,Visual,Blue,visual,roi_170_visual,blue 172 | 171,-26,-90,3,7,Visual,Blue,visual,roi_171_visual,blue 173 | 172,-33,-79,-13,7,Visual,Blue,visual,roi_172_visual,blue 174 | 173,37,-81,1,7,Visual,Blue,visual,roi_173_visual,blue 175 | 174,-44,2,46,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_174_fronto_parietal_task_control,yellow 176 | 175,48,25,27,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_175_fronto_parietal_task_control,yellow 177 | 176,-47,11,23,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_176_fronto_parietal_task_control,yellow 178 | 177,-53,-49,43,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_177_fronto_parietal_task_control,yellow 179 | 178,-23,11,64,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_178_fronto_parietal_task_control,yellow 180 | 179,58,-53,-14,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_179_fronto_parietal_task_control,yellow 181 | 180,24,45,-15,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_180_fronto_parietal_task_control,yellow 182 | 181,34,54,-13,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_181_fronto_parietal_task_control,yellow 183 | 182,-21,41,-20,-1,Uncertain,White,uncertain,roi_182_uncertain,white 184 | 183,-18,-76,-24,-1,Uncertain,White,uncertain,roi_183_uncertain,white 185 | 184,17,-80,-34,-1,Uncertain,White,uncertain,roi_184_uncertain,white 186 | 185,35,-67,-34,-1,Uncertain,White,uncertain,roi_185_uncertain,white 187 | 186,47,10,33,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_186_fronto_parietal_task_control,yellow 188 | 187,-41,6,33,8,Fronto-parietal 
Task Control,Yellow,fronto_parietal_task_control,roi_187_fronto_parietal_task_control,yellow 189 | 188,-42,38,21,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_188_fronto_parietal_task_control,yellow 190 | 189,38,43,15,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_189_fronto_parietal_task_control,yellow 191 | 190,49,-42,45,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_190_fronto_parietal_task_control,yellow 192 | 191,-28,-58,48,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_191_fronto_parietal_task_control,yellow 193 | 192,44,-53,47,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_192_fronto_parietal_task_control,yellow 194 | 193,32,14,56,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_193_fronto_parietal_task_control,yellow 195 | 194,37,-65,40,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_194_fronto_parietal_task_control,yellow 196 | 195,-42,-55,45,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_195_fronto_parietal_task_control,yellow 197 | 196,40,18,40,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_196_fronto_parietal_task_control,yellow 198 | 197,-34,55,4,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_197_fronto_parietal_task_control,yellow 199 | 198,-42,45,-2,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_198_fronto_parietal_task_control,yellow 200 | 199,33,-53,44,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_199_fronto_parietal_task_control,yellow 201 | 200,43,49,-2,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_200_fronto_parietal_task_control,yellow 202 | 201,-42,25,30,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_201_fronto_parietal_task_control,yellow 203 | 202,-3,26,44,8,Fronto-parietal Task Control,Yellow,fronto_parietal_task_control,roi_202_fronto_parietal_task_control,yellow 204 | 203,11,-39,50,9,Salience,Black,salience,roi_203_salience,black 205 | 204,55,-45,37,9,Salience,Black,salience,roi_204_salience,black 206 | 205,42,0,47,9,Salience,Black,salience,roi_205_salience,black 207 | 206,31,33,26,9,Salience,Black,salience,roi_206_salience,black 208 | 207,48,22,10,9,Salience,Black,salience,roi_207_salience,black 209 | 208,-35,20,0,9,Salience,Black,salience,roi_208_salience,black 210 | 209,36,22,3,9,Salience,Black,salience,roi_209_salience,black 211 | 210,37,32,-2,9,Salience,Black,salience,roi_210_salience,black 212 | 211,34,16,-8,9,Salience,Black,salience,roi_211_salience,black 213 | 212,-11,26,25,9,Salience,Black,salience,roi_212_salience,black 214 | 213,-1,15,44,9,Salience,Black,salience,roi_213_salience,black 215 | 214,-28,52,21,9,Salience,Black,salience,roi_214_salience,black 216 | 215,0,30,27,9,Salience,Black,salience,roi_215_salience,black 217 | 216,5,23,37,9,Salience,Black,salience,roi_216_salience,black 218 | 217,10,22,27,9,Salience,Black,salience,roi_217_salience,black 219 | 218,31,56,14,9,Salience,Black,salience,roi_218_salience,black 220 | 219,26,50,27,9,Salience,Black,salience,roi_219_salience,black 221 | 220,-39,51,17,9,Salience,Black,salience,roi_220_salience,black 222 | 221,2,-24,30,6,Memory retrieval?,Gray,mem_retr,roi_221_mem_retr,gray 223 | 222,6,-24,0,10,Subcortical,Brown,subcortical,roi_222_subcortical,brown 224 | 223,-2,-13,12,10,Subcortical,Brown,subcortical,roi_223_subcortical,brown 225 | 
224,-10,-18,7,10,Subcortical,Brown,subcortical,roi_224_subcortical,brown 226 | 225,12,-17,8,10,Subcortical,Brown,subcortical,roi_225_subcortical,brown 227 | 226,-5,-28,-4,10,Subcortical,Brown,subcortical,roi_226_subcortical,brown 228 | 227,-22,7,-5,10,Subcortical,Brown,subcortical,roi_227_subcortical,brown 229 | 228,-15,4,8,10,Subcortical,Brown,subcortical,roi_228_subcortical,brown 230 | 229,31,-14,2,10,Subcortical,Brown,subcortical,roi_229_subcortical,brown 231 | 230,23,10,1,10,Subcortical,Brown,subcortical,roi_230_subcortical,brown 232 | 231,29,1,4,10,Subcortical,Brown,subcortical,roi_231_subcortical,brown 233 | 232,-31,-11,0,10,Subcortical,Brown,subcortical,roi_232_subcortical,brown 234 | 233,15,5,7,10,Subcortical,Brown,subcortical,roi_233_subcortical,brown 235 | 234,9,-4,6,10,Subcortical,Brown,subcortical,roi_234_subcortical,brown 236 | 235,54,-43,22,11,Ventral attention,Teal,ventral_attention,roi_235_ventral_attention,olivedrab 237 | 236,-56,-50,10,11,Ventral attention,Teal,ventral_attention,roi_236_ventral_attention,olivedrab 238 | 237,-55,-40,14,11,Ventral attention,Teal,ventral_attention,roi_237_ventral_attention,olivedrab 239 | 238,52,-33,8,11,Ventral attention,Teal,ventral_attention,roi_238_ventral_attention,olivedrab 240 | 239,51,-29,-4,11,Ventral attention,Teal,ventral_attention,roi_239_ventral_attention,olivedrab 241 | 240,56,-46,11,11,Ventral attention,Teal,ventral_attention,roi_240_ventral_attention,olivedrab 242 | 241,53,33,1,11,Ventral attention,Teal,ventral_attention,roi_241_ventral_attention,olivedrab 243 | 242,-49,25,-1,11,Ventral attention,Teal,ventral_attention,roi_242_ventral_attention,olivedrab 244 | 243,-16,-65,-20,13,Cerebellar,Pale blue,cerebellar,roi_243_cerebellar,lightslateblue 245 | 244,-32,-55,-25,13,Cerebellar,Pale blue,cerebellar,roi_244_cerebellar,lightslateblue 246 | 245,22,-58,-23,13,Cerebellar,Pale blue,cerebellar,roi_245_cerebellar,lightslateblue 247 | 246,1,-62,-18,13,Cerebellar,Pale blue,cerebellar,roi_246_cerebellar,lightslateblue 248 | 247,33,-12,-34,-1,Uncertain,White,uncertain,roi_247_uncertain,white 249 | 248,-31,-10,-36,-1,Uncertain,White,uncertain,roi_248_uncertain,white 250 | 249,49,-3,-38,-1,Uncertain,White,uncertain,roi_249_uncertain,white 251 | 250,-50,-7,-39,-1,Uncertain,White,uncertain,roi_250_uncertain,white 252 | 251,10,-62,61,12,Dorsal attention,Green,dorsal_attention,roi_251_dorsal_attention,green 253 | 252,-52,-63,5,12,Dorsal attention,Green,dorsal_attention,roi_252_dorsal_attention,green 254 | 253,-47,-51,-21,-1,Uncertain,White,uncertain,roi_253_uncertain,white 255 | 254,46,-47,-17,-1,Uncertain,White,uncertain,roi_254_uncertain,white 256 | 255,47,-30,49,1,Sensory/somatomotor Hand,Cyan,hand,roi_255_hand,cyan 257 | 256,22,-65,48,12,Dorsal attention,Green,dorsal_attention,roi_256_dorsal_attention,green 258 | 257,46,-59,4,12,Dorsal attention,Green,dorsal_attention,roi_257_dorsal_attention,green 259 | 258,25,-58,60,12,Dorsal attention,Green,dorsal_attention,roi_258_dorsal_attention,green 260 | 259,-33,-46,47,12,Dorsal attention,Green,dorsal_attention,roi_259_dorsal_attention,green 261 | 260,-27,-71,37,12,Dorsal attention,Green,dorsal_attention,roi_260_dorsal_attention,green 262 | 261,-32,-1,54,12,Dorsal attention,Green,dorsal_attention,roi_261_dorsal_attention,green 263 | 262,-42,-60,-9,12,Dorsal attention,Green,dorsal_attention,roi_262_dorsal_attention,green 264 | 263,-17,-59,64,12,Dorsal attention,Green,dorsal_attention,roi_263_dorsal_attention,green 265 | 264,29,-5,54,12,Dorsal 
attention,Green,dorsal_attention,roi_264_dorsal_attention,green 266 | -------------------------------------------------------------------------------- /mn2vec_toy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/mn2vec_toy.png -------------------------------------------------------------------------------- /multi_node2vec.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Wrapper for the multi-node2vec algorithm. 3 | 4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI" 5 | by JD Wilson, M Baybay, R Sankar, and P Stillman 6 | 7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf 8 | 9 | Contributors: 10 | - Melanie Baybay 11 | University of San Francisco, Department of Computer Science 12 | - Rishi Sankar 13 | Henry M. Gunn High School 14 | - James D. Wilson (maintainer) 15 | University of San Francisco, Department of Mathematics and Statistics 16 | 17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu 18 | ''' 19 | import os 20 | import src as mltn2v 21 | import argparse 22 | import time 23 | 24 | 25 | def parse_args(): 26 | parser = argparse.ArgumentParser(description="Run multi-node2vec on multilayer networks.") 27 | 28 | parser.add_argument('--dir', nargs='?', default='data/CONTROL_fmt', 29 | help='Absolute path to directory of correlation/adjacency matrix files (csv format). Note that rows and columns must be properly labeled by node ID in each .csv.') 30 | 31 | parser.add_argument('--output', nargs='?', default='new_results/', 32 | help='Absolute path to output directory (no extension).') 33 | 34 | #parser.add_argument('--filename', nargs='?', default='new_results/mltn2v_control', 35 | # help='output filename (no extension).') 36 | 37 | parser.add_argument('--d', type=int, default=100, 38 | help='Dimensionality. Default is 100.') 39 | 40 | parser.add_argument('--walk_length', type=int, default=100, 41 | help='Length of each random walk. Default is 100.') 42 | 43 | parser.add_argument('--window_size', type=int, default = 10, 44 | help='Size of context window used for Skip Gram optimization. Default is 10.') 45 | 46 | parser.add_argument('--n_samples', type=int, default=1, 47 | help='Number of walks per node per layer. Default is 1.') 48 | 49 | parser.add_argument('--thresh', type=float, default=0.5, 50 | help='Threshold for converting a weighted network to an unweighted one. All weights less than or equal to thresh will be considered 0 and all others 1. Default is 0.5. Use None if the network is unweighted.') 51 | 52 | # parser.add_argument('--w2v_iter', default=1, type=int, 53 | # help='Number of epochs in word2vec') 54 | 55 | parser.add_argument('--w2v_workers', type=int, default=8, 56 | help='Number of parallel worker threads. Default is 8.') 57 | 58 | parser.add_argument('--rvals', type=float, default=0.25, 59 | help='Layer walk parameter for neighborhood search. Default is 0.25') 60 | 61 | parser.add_argument('--pvals', type=float, default=1, 62 | help='Return walk parameter for neighborhood search. Default is 1') 63 | 64 | parser.add_argument('--qvals', type=float, default=0.5, 65 | help='Exploration walk parameter for neighborhood search. 
Default is 0.50')
66 |
67 |
68 |     return parser.parse_args()
69 |
70 |
71 | def main(args):
72 |     start = time.time()
73 |     # PARSE LAYERS -- THRESHOLD & CONVERT TO BINARY
74 |     layers = mltn2v.timed_invoke("parsing network layers",
75 |                                  lambda: mltn2v.parse_matrix_layers(args.dir, binary=True, thresh=args.thresh))
76 |     # check if layers were parsed
77 |     if layers:
78 |         # EXTRACT NEIGHBORHOODS
79 |         nbrhd_dict = mltn2v.timed_invoke("extracting neighborhoods",
80 |                                          lambda: mltn2v.extract_neighborhoods_walk(layers, args.walk_length, args.rvals, args.pvals, args.qvals))
81 |         # GENERATE FEATURES
82 |         out = mltn2v.clean_output(args.output)
83 |         for w in args.rvals:
84 |             out_path = os.path.join(out, 'r' + str(w) + '/mltn2v_results')
85 |             mltn2v.timed_invoke("generating features",
86 |                                 lambda: mltn2v.generate_features(nbrhd_dict[w], args.d, out_path, nbrhd_size=args.window_size,
87 |                                                                  w2v_iter=1, workers=args.w2v_workers))
88 |
89 |             print("\nCompleted Multilayer Network Embedding for r=" + str(w) + " in {:.2f} secs.\nSee results:".format(time.time() - start))
90 |             print("\t" + out_path + ".csv")
91 |         print("Completed Multilayer Network Embedding for all r values.")
92 |     else:
93 |         print("[ERROR] No network layers could be parsed from '{}'.".format(args.dir))
94 |
95 |
96 | if __name__ == '__main__':
97 |     args = parse_args()
98 |     args.rvals = [args.rvals]  # downstream code expects a list of r values
99 |     main(args)
100 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.12.1
2 | pandas==0.24.0
3 | gensim==2.3.0
4 | networkx==2.5.1
--------------------------------------------------------------------------------
/results/test/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/results/test/.DS_Store
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
1 | from .multinode2vec import *
2 | from .mltn2v_utils import *
--------------------------------------------------------------------------------
/src/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/mltn2v_utils.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/mltn2v_utils.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/multinode2vec.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/multinode2vec.cpython-36.pyc
--------------------------------------------------------------------------------
/src/__pycache__/nbrhd_gen_walk_nx.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jdwilson4/multi-node2vec/f6f86707e10227a7609bfcce5db4b21e03c932ea/src/__pycache__/nbrhd_gen_walk_nx.cpython-36.pyc
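
For reference, a minimal sketch (not part of the repository) of loading the results that multi_node2vec.py writes; the path below is the one produced by the Quick Test example, and the lookup of node "1" assumes that node ID appears in the data:

```python
import pandas as pd
from gensim.models import KeyedVectors

# The .csv written by feature_matrix_to_csv(): no header row,
# node ID in the first column, one d-dimensional row per node.
ftrs = pd.read_csv("results/test/r0.25/mltn2v_results.csv",
                   header=None, index_col=0)
print(ftrs.shape)  # (number of unique nodes, d)

# The companion .emb file is standard word2vec text format.
emb = KeyedVectors.load_word2vec_format("results/test/r0.25/mltn2v_results.emb")
print(emb.most_similar("1", topn=5))  # node IDs are stored as strings
```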
--------------------------------------------------------------------------------
/src/mltn2v_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Helper functions for parsing multilayer graphs and layers.
3 |
4 | Details of multi-node2vec can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | """
19 |
20 | import os
21 | import pandas as pd
22 | from pandas.api.types import is_numeric_dtype
23 | import time
24 |
25 |
26 | # -------------------------------------------------------------------------------
27 | # PARSING AND CONVERSION FOR MULTILAYER GRAPHS
28 | # -------------------------------------------------------------------------------
29 | def parse_matrix_layers(network_dir, delim=',', binary=False, thresh=None):
30 |     """
31 |     Converts a directory of adjacency matrix files into pandas dataframes.
32 |     :param network_dir: Directory of adjacency matrix files
33 |     :param delim: separator for the adjacency matrix files
34 |     :param binary: whether to convert edge weights to binary (0/1)
35 |     :param thresh: threshold for edge weights; weights <= thresh are set to 0 and their edges dropped
36 |     :return: List of adjacency lists. Each adjacency list is one layer and is represented
37 |         as a pandas DataFrame with 'source', 'target', 'weight' columns.
38 |     """
39 |     # expand directory path
40 |     network_dir = expand_path(network_dir)
41 |
42 |     # iterate files and convert to pandas dataframes
43 |     layers = []
44 |     for network_file in os.listdir(network_dir):
45 |         file_path = os.path.join(network_dir, network_file)
46 |         try:
47 |             # read as pandas DataFrame, index=source, col=target
48 |             layer = pd.read_csv(file_path, sep=delim, index_col=0)
49 |             if layer.shape[0] != layer.shape[1]:
50 |                 print('[ERROR] Invalid adjacency matrix. Expecting a square matrix with index as source and columns as target.')
51 |                 return
52 |             if thresh is not None:
53 |                 layer[layer <= thresh] = 0
54 |             if binary:
55 |                 layer[layer != 0] = 1
56 |             # ensure that index (node name) is string, since word2vec will need it as str
57 |             if is_numeric_dtype(layer.index):
58 |                 layer.index = layer.index.map(str)
59 |             # replace all 0s with NaN
60 |             layer.replace(to_replace=0, value=pd.np.nan, inplace=True)
61 |             # convert matrix --> adjacency list with cols ["source", "target", "weight"]
62 |             layer = layer.stack(dropna=True).reset_index()
63 |             # rename columns
64 |             layer.columns = ["source", "target", "weight"]
65 |             layers.append(layer)
66 |         except Exception as e:
67 |             print('[ERROR] Could not read file "{}": {} '.format(file_path, e))
68 |     return layers
69 |
70 |
71 | def expand_path(path):
72 |     """
73 |     Expands a file path to handle the user's home directory and environment variables.
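For example (illustrative): expand_path('~/$PROJECT/data') expands both the home directory and the PROJECT environment variable.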
74 |     :param path: path to expand
75 |     :return: expanded path
76 |     """
77 |     new_path = os.path.expanduser(path)
78 |     return os.path.expandvars(new_path)
79 |
80 |
81 | # -------------------------------------------------------------------------------
82 | # OUTPUT
83 | # -------------------------------------------------------------------------------
84 | def feature_matrix_to_csv(ftrs, filename):
85 |     """
86 |     Convert feature matrix to csv.
87 |     :param ftrs: pandas DataFrame of features
88 |     :param filename: absolute path to output file (no extension)
89 |     (the written file has no header row; its first column carries the node ID)
90 |     :return:
91 |     """
92 |     out = filename + ".csv"
93 |     ftrs.to_csv(out, sep=',', header=False)
94 |     return
95 |
96 |
97 | def timed_invoke(action_desc, method):
98 |     """
99 |     Invokes a method with timing.
100 |     :param action_desc: The string describing the method action
101 |     :param method: The method to invoke
102 |     :return: The return object of the method
103 |     """
104 |     print('Started {}...'.format(action_desc))
105 |     start = time.time()
106 |     try:
107 |         output = method()
108 |         print('Finished {} in {} seconds'.format(action_desc, int(time.time() - start)))
109 |         return output
110 |     except Exception:
111 |         print('Exception while {} after {} seconds'.format(action_desc, int(time.time() - start)))
112 |         raise
113 |
114 |
115 | def clean_output(directory):
116 |     """
117 |     Checks if the output directory exists; otherwise it is created.
118 |     """
119 |     directory = expand_path(directory)
120 |     if os.path.isdir(directory):
121 |         return directory
122 |     else:
123 |         os.makedirs(directory)
124 |         print("[WARNING] Directory not found. Created {}".format(directory))
125 |         return directory
126 |
--------------------------------------------------------------------------------
/src/multinode2vec.py:
--------------------------------------------------------------------------------
1 | """
2 | Core functions of the multi-node2vec algorithm.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | """
19 | from gensim.models import word2vec as w2v
20 | from .mltn2v_utils import *
21 | from .nbrhd_gen_walk_nx import *
22 | import time
23 | import networkx as nx
24 |
25 |
26 | # -------------------------------------------------------------------------------
27 | # multinode2vec
28 | # -------------------------------------------------------------------------------
29 | def generate_features(nbrhds, d, out, nbrhd_size=-1, w2v_iter=1, workers=8, sg=1):
30 |     """
31 |     Generates d features for each unique node in a multilayer network based on
32 |     its neighborhood.
33 |
34 |     :param nbrhds: list of node neighborhoods (random walks), each a list of node IDs
35 |     :param d: feature dimensionality
36 |     :param out: absolute path for output file (no extension)
37 |     :param nbrhd_size: window size for Skip-Gram optimization
38 |     (each neighborhood plays the role of a word2vec sentence; nodes play the role of words)
39 |     :param w2v_iter: number of word2vec training epochs
40 |     :param workers: number of workers
41 |     :param sg: sets word2vec architecture:
1 for Skip-Gram, 0 for CBOW
42 |     :return: n x d network embedding
43 |     """
44 |     print("Total Neighborhoods: {}".format(len(nbrhds)))
45 |     w2v_model = w2v.Word2Vec(nbrhds, size=d, window=nbrhd_size, min_count=0,
46 |                              workers=workers, iter=w2v_iter, sg=sg)
47 |     embfile = out + ".emb"
48 |     splitpath = embfile.split('/')
49 |     if len(splitpath) > 1:
50 |         dirs = embfile[:-len(splitpath[-1])]
51 |         if not os.path.exists(dirs):
52 |             os.makedirs(dirs)
53 |     if not os.path.exists(embfile):
54 |         with open(embfile, 'w'): pass
55 |     w2v_model.wv.save_word2vec_format(embfile)
56 |     ftrs = emb_to_pandas(embfile)
57 |     feature_matrix_to_csv(ftrs, out)
58 |     return ftrs
59 |
60 |
61 | # -------------------------------------------------------------------------------
62 | # NEIGHBORHOODS
63 | # -------------------------------------------------------------------------------
64 | def extract_neighborhoods_walk(layers, nbrhd_size, wvals, p, q, is_directed=False, weighted=False):
65 |     nxg = []
66 |     for layer in layers:
67 |         nxg.append(nx.convert_matrix.from_pandas_edgelist(layer, edge_attr='weight'))
68 |
69 |     start = time.time()
70 |     nbrhd_gen = NeighborhoodGen(nxg, p, q, is_directed=is_directed, weighted=weighted)
71 |     print("Finished initialization of neighborhood generator in " + str(time.time() - start) + " seconds.")
72 |
73 |     neighborhood_dict = {}
74 |     for w in wvals:
75 |         neighborhoods = []
76 |         for i in range(len(nxg)):
77 |             layer = nxg[i]
78 |             for node in layer.nodes():
79 |                 for j in range(52):  # 52 walks per node per layer (hard-coded; not wired to --n_samples)
80 |                     neighborhoods.append(nbrhd_gen.multinode2vec_walk(w, nbrhd_size, node, i))
81 |         print("Finished nbrhd generation for r=" + str(w))
82 |         neighborhood_dict[w] = neighborhoods
83 |
84 |     return neighborhood_dict
85 |
86 | def extract_neighborhoods(layers, nbrhd_size, n_samples, weighted=False):
87 |     """
88 |     Extracts neighborhoods of length nbrhd_size for each node in each layer.
89 |     :param layers: list of adjacency lists as pandas DataFrames with columns ["source", "target", "weight"]
90 |     :param nbrhd_size: number of nodes per neighborhood
91 |     :param n_samples: number of samples per node
92 |     :param weighted: whether to select neighborhoods by highest weight
93 |     :return: list of neighborhoods, represented as lists
94 |     """
95 |     neighborhoods = []
96 |     if weighted:
97 |         for layer in layers:
98 |             for node in layer["source"].unique():
99 |                 neighbors = layer.loc[layer["source"] == node, ["target", "weight"]]
100 |                 neighbors = neighbors.sort_values(by="weight", ascending=False)["target"]  # targets ordered by edge weight
101 |                 neighborhoods.extend(
102 |                     extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples)
103 |                 )
104 |     else:
105 |         for layer in layers:
106 |             for node in layer["source"].unique():
107 |                 neighbors = layer.loc[layer["source"] == node, "target"]
108 |                 neighborhoods.extend(
109 |                     extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples)
110 |                 )
111 |     return neighborhoods
112 |
113 |
114 | def extract_node_neighborhoods(node, neighbors, nbrhd_size, n_samples):
115 |     if len(neighbors) < nbrhd_size:
116 |         print("[WARNING] Selected neighborhood size {} > node-{}'s degree {}. "
117 |               "Setting neighborhood size to {} for node-{}."
118 |               .format(nbrhd_size, node, len(neighbors), len(neighbors), node))
119 |         nbrhd_size = len(neighbors)
120 |     node_neighborhoods = []
121 |     n = 0
122 |     while n < n_samples:
123 |         nbrhd = [node]
124 |         nbrhd.extend(neighbors.sample(n=nbrhd_size-1).values)
125 |         node_neighborhoods.append(nbrhd)
126 |         n += 1
127 |     return node_neighborhoods
128 |
129 |
130 | # -------------------------------------------------------------------------------
131 | # HELPERS
132 | # -------------------------------------------------------------------------------
133 | def emb_to_pandas(emb_file):
134 |     """
135 |     Converts an embedding file, as extracted from a trained word2vec model, to a pandas DataFrame indexed by node ID.
136 |
137 |     :param emb_file: absolute path to word2vec embedding file
138 |     :return: pandas DataFrame, (N x d)
139 |     """
140 |     ftrs = pd.read_csv(emb_file, delim_whitespace=True, skiprows=1, header=None, index_col=0)
141 |     ftrs.sort_index(inplace=True)
142 |     return ftrs
143 |
--------------------------------------------------------------------------------
/src/nbrhd_gen_walk.py:
--------------------------------------------------------------------------------
1 | '''
2 | Neighborhood aliasing procedure used for fast random walks on multilayer networks.
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | '''
19 |
20 |
21 | import numpy as np
22 | import networkx as nx
23 | import random
24 | #import multiprocessing
25 | import threading
26 | import time
27 |
28 | #is is_directed needed?
29 |
30 | class NeighborhoodGen():
31 |     def __init__(self, graph, p, q, thread_limit=1, is_directed=False, weighted=False):
32 |         self.G = graph
33 |         self.is_directed = is_directed
34 |         self.p = p
35 |         self.q = q
36 |         self.weighted = weighted
37 |         self.thread_limit = thread_limit
38 |
39 |         self.preprocess_transition_probs()
40 |
41 |     def multinode2vec_walk(self, w, walk_length, start_node, start_layer_id):
42 |         '''
43 |         Simulate a random walk starting from start_node.
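With probability w the walk jumps to a uniformly chosen different layer before each step; otherwise it continues within the current layer, taking node2vec-biased (p, q) steps.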
(Each walk yields one neighborhood.)
44 |         '''
45 |
46 |         G = self.G
47 |         alias_nodes = self.alias_nodes
48 |         alias_edges = self.alias_edges
49 |
50 |         walk = [start_node]  # the neighborhood under construction
51 |         cur_layer_id = start_layer_id
52 |         force_switch = False
53 |         while len(walk) < walk_length:
54 |             cur = walk[-1]
55 |             if not force_switch:
56 |                 prev_layer_id = cur_layer_id
57 |             # draw a fresh uniform variate each step to decide whether to switch layers
58 |             rval = random.random()
59 |             if rval < w or force_switch:  # switch to a uniformly chosen different layer
60 |                 total_layers = len(G)
61 |                 rlay = random.randint(0, total_layers - 2)
62 |                 if rlay >= cur_layer_id:
63 |                     rlay += 1
64 |                 cur_layer_id = rlay
65 |                 force_switch = False
66 |             cur_layer = G[cur_layer_id]
67 |             try:
68 |                 cur_nbrs = sorted(cur_layer.neighbors(cur))
69 |                 if len(cur_nbrs) > 0:
70 |                     if len(walk) == 1 or prev_layer_id != cur_layer_id:
71 |                         walk.append(cur_nbrs[alias_draw(alias_nodes[cur_layer_id][cur][0], alias_nodes[cur_layer_id][cur][1])])
72 |                     else:
73 |                         prev = walk[-2]
74 |                         next = cur_nbrs[alias_draw(alias_edges[cur_layer_id][(prev, cur)][0],
75 |                             alias_edges[cur_layer_id][(prev, cur)][1])]
76 |                         walk.append(next)
77 |                 else:
78 |                     force_switch = True
79 |                     continue
80 |             except Exception:
81 |                 force_switch = True
82 |                 continue
83 |
84 |         return walk
85 |
86 |     def simulate_walks(self, num_walks, walk_length):
87 |         '''
88 |         Repeatedly simulate random walks from each node of each layer.
89 |         '''
90 |         G = self.G
91 |         walks = {}
92 |         for layer in G:
93 |             walks[layer] = []
94 |             nodes = list(layer.nodes())
95 |             print('Walk iteration:')
96 |             for walk_iter in range(num_walks):
97 |                 print(str(walk_iter+1), '/', str(num_walks))
98 |                 random.shuffle(nodes)
99 |                 for node in nodes:
100 |                     walks[layer].append(self.multinode2vec_walk(0, walk_length, node, G.index(layer)))  # w=0 keeps the walk in this layer
101 |
102 |         return walks
103 |
104 |     def get_alias_edge(self, src, dst, layer):
105 |         '''
106 |         Get the alias edge setup lists for a given edge. The step from dst to its neighbor x is proportional to weight/p if x is the previous node src, to weight if x is also adjacent to src, and to weight/q otherwise.
107 |         '''
108 |         p = self.p
109 |         q = self.q
110 |
111 |         unnormalized_probs = []
112 |         for dst_nbr in sorted(layer.neighbors(dst)):
113 |             if dst_nbr == src:
114 |                 unnormalized_probs.append(layer[dst][dst_nbr]['weight']/p)
115 |             elif layer.has_edge(dst_nbr, src):
116 |                 unnormalized_probs.append(layer[dst][dst_nbr]['weight'])
117 |             else:
118 |                 unnormalized_probs.append(layer[dst][dst_nbr]['weight']/q)
119 |         norm_const = sum(unnormalized_probs)
120 |         normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
121 |
122 |         return alias_setup(normalized_probs)
123 |
124 |     def preprocess_transition_probs(self):
125 |         '''
126 |         Preprocessing of transition probabilities for guiding the random walks.
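Builds, for each layer, alias tables for first steps from a node (alias_nodes) and for subsequent steps along an edge (alias_edges); layers are processed in parallel threads when thread_limit > 1.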
127 | ''' 128 | G = self.G 129 | is_directed = self.is_directed 130 | 131 | self.alias_nodes = {} 132 | self.alias_edges = {} 133 | self.lock = threading.Lock() 134 | 135 | tlimit = self.thread_limit 136 | layer_count = len(self.G) 137 | counter = 0 138 | if tlimit == 1: 139 | for i in range(layer_count): 140 | self.preprocess_thread(self.G[i],i) 141 | else: 142 | while counter < layer_count: 143 | threads = [] 144 | rem = layer_count - counter 145 | if rem >= tlimit: 146 | for i in range(tlimit): 147 | thread = threading.Thread(target=self.preprocess_thread, args=(self.G[counter],counter,)) 148 | threads.append(thread) 149 | thread.start() 150 | counter += 1 151 | else: 152 | for i in range(rem): 153 | thread = threading.Thread(target=self.preprocess_thread, args=(self.G[counter],counter,)) 154 | threads.append(thread) 155 | thread.start() 156 | counter += 1 157 | for thread in threads: 158 | thread.join() 159 | 160 | return 161 | 162 | def preprocess_thread(self, layer, counter): 163 | start_time = time.time() 164 | print("Starting thread for layer " + str(counter)) 165 | alias_nodes = {} 166 | for node in layer.nodes(): 167 | unnormalized_probs = [layer[node][nbr]['weight'] for nbr in sorted(layer.neighbors(node))] 168 | norm_const = sum(unnormalized_probs) 169 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs] 170 | alias_nodes[node] = alias_setup(normalized_probs) 171 | 172 | alias_edges = {} 173 | triads = {} 174 | 175 | if self.is_directed: 176 | for edge in layer.edges(): 177 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer) 178 | else: 179 | for edge in layer.edges(): 180 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer) 181 | alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0], layer) 182 | 183 | self.lock.acquire() 184 | try: 185 | self.alias_nodes[counter] = alias_nodes 186 | self.alias_edges[counter] = alias_edges 187 | finally: 188 | self.lock.release() 189 | 190 | print("Finished thread for layer " + str(counter) + " in " + str(time.time() - start_time) + " seconds.") 191 | 192 | return 193 | 194 | def alias_setup(probs): 195 | ''' 196 | Compute utility lists for non-uniform sampling from discrete distributions. 197 | Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/ 198 | for details 199 | ''' 200 | K = len(probs) 201 | q = np.zeros(K) 202 | J = np.zeros(K, dtype=np.int) 203 | 204 | smaller = [] 205 | larger = [] 206 | for kk, prob in enumerate(probs): 207 | q[kk] = K*prob 208 | if q[kk] < 1.0: 209 | smaller.append(kk) 210 | else: 211 | larger.append(kk) 212 | 213 | while len(smaller) > 0 and len(larger) > 0: 214 | small = smaller.pop() 215 | large = larger.pop() 216 | 217 | J[small] = large 218 | q[large] = q[large] + q[small] - 1.0 219 | if q[large] < 1.0: 220 | smaller.append(large) 221 | else: 222 | larger.append(large) 223 | 224 | return J, q 225 | 226 | def alias_draw(J, q): 227 | ''' 228 | Draw sample from a non-uniform discrete distribution using alias sampling. 229 | ''' 230 | K = len(J) 231 | 232 | kk = int(np.floor(np.random.rand()*K)) 233 | if np.random.rand() < q[kk]: 234 | return kk 235 | else: 236 | return J[kk] 237 | -------------------------------------------------------------------------------- /src/nbrhd_gen_walk_nx.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Neighborhood aliasing procedure used for fast random walks on multilayer networks. 
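This module mirrors nbrhd_gen_walk.py; it is the networkx-based variant that src/multinode2vec.py actually imports.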
3 |
4 | Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
5 | by JD Wilson, M Baybay, R Sankar, and P Stillman
6 |
7 | Preprint here: https://arxiv.org/pdf/1809.06437.pdf
8 |
9 | Contributors:
10 | - Melanie Baybay
11 | University of San Francisco, Department of Computer Science
12 | - Rishi Sankar
13 | Henry M. Gunn High School
14 | - James D. Wilson (maintainer)
15 | University of San Francisco, Department of Mathematics and Statistics
16 |
17 | Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
18 | '''
19 |
20 |
21 | import numpy as np
22 | import networkx as nx
23 | import random
24 | #import multiprocessing
25 | import threading
26 | import time
27 |
28 | #is is_directed needed?
29 |
30 | class NeighborhoodGen():
31 |     def __init__(self, graph, p, q, thread_limit=1, is_directed=False, weighted=False):
32 |         self.G = graph
33 |         self.is_directed = is_directed
34 |         self.p = p
35 |         self.q = q
36 |         self.weighted = weighted
37 |         self.thread_limit = thread_limit
38 |
39 |         self.preprocess_transition_probs()
40 |
41 |     def multinode2vec_walk(self, w, walk_length, start_node, start_layer_id):
42 |         '''
43 |         Simulate a random walk starting from start_node; with probability w the walk switches to a uniformly chosen different layer before each step. (Each walk yields one neighborhood.)
44 |         '''
45 |
46 |         G = self.G
47 |         alias_nodes = self.alias_nodes
48 |         alias_edges = self.alias_edges
49 |
50 |         walk = [start_node]  # the neighborhood under construction
51 |         cur_layer_id = start_layer_id
52 |         force_switch = False
53 |         while len(walk) < walk_length:
54 |             cur = walk[-1]
55 |             if not force_switch:
56 |                 prev_layer_id = cur_layer_id
57 |             # draw a fresh uniform variate each step to decide whether to switch layers
58 |             rval = random.random()
59 |             if rval < w or force_switch:  # switch to a uniformly chosen different layer
60 |                 total_layers = len(G)
61 |                 rlay = random.randint(0, total_layers - 2)
62 |                 if rlay >= cur_layer_id:
63 |                     rlay += 1
64 |                 cur_layer_id = rlay
65 |                 force_switch = False
66 |             cur_layer = G[cur_layer_id]
67 |             try:
68 |                 cur_nbrs = sorted(cur_layer.neighbors(cur))
69 |                 if len(cur_nbrs) > 0:
70 |                     if len(walk) == 1 or prev_layer_id != cur_layer_id:
71 |                         walk.append(cur_nbrs[alias_draw(alias_nodes[cur_layer_id][cur][0], alias_nodes[cur_layer_id][cur][1])])
72 |                     else:
73 |                         prev = walk[-2]
74 |                         next = cur_nbrs[alias_draw(alias_edges[cur_layer_id][(prev, cur)][0],
75 |                             alias_edges[cur_layer_id][(prev, cur)][1])]
76 |                         walk.append(next)
77 |                 else:
78 |                     force_switch = True
79 |                     continue
80 |             except Exception:
81 |                 force_switch = True
82 |                 continue
83 |
84 |         return walk
85 |
86 |     def simulate_walks(self, num_walks, walk_length):
87 |         '''
88 |         Repeatedly simulate random walks from each node of each layer.
89 |         '''
90 |         G = self.G
91 |         walks = {}
92 |         for layer in G:
93 |             walks[layer] = []
94 |             nodes = list(layer.nodes())
95 |             print('Walk iteration:')
96 |             for walk_iter in range(num_walks):
97 |                 print(str(walk_iter+1), '/', str(num_walks))
98 |                 random.shuffle(nodes)
99 |                 for node in nodes:
100 |                     walks[layer].append(self.multinode2vec_walk(0, walk_length, node, G.index(layer)))  # w=0 keeps the walk in this layer
101 |
102 |         return walks
103 |
104 |     def get_alias_edge(self, src, dst, layer):
105 |         '''
106 |         Get the alias edge setup lists for a given edge.
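The step from dst to its neighbor x is proportional to weight/p if x is the previous node src, to weight if x is also adjacent to src, and to weight/q otherwise (node2vec's return and in-out biases).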
    def preprocess_transition_probs(self):
        '''
        Preprocessing of transition probabilities for guiding the random walks.
        '''
        G = self.G

        self.alias_nodes = {}
        self.alias_edges = {}
        self.lock = threading.Lock()

        tlimit = self.thread_limit
        layer_count = len(G)
        counter = 0
        if tlimit == 1:
            # single thread: preprocess each layer in turn
            for i in range(layer_count):
                self.preprocess_thread(G[i], i)
        else:
            # process layers in batches of at most tlimit threads
            while counter < layer_count:
                threads = []
                batch = min(tlimit, layer_count - counter)
                for i in range(batch):
                    thread = threading.Thread(target=self.preprocess_thread,
                                              args=(G[counter], counter))
                    threads.append(thread)
                    thread.start()
                    counter += 1
                for thread in threads:
                    thread.join()

        return

    def preprocess_thread(self, layer, counter):
        start_time = time.time()
        print("Starting thread for layer " + str(counter))
        alias_nodes = {}
        for node in layer.nodes():
            unnormalized_probs = [layer[node][nbr]['weight'] for nbr in sorted(layer.neighbors(node))]
            norm_const = sum(unnormalized_probs)
            normalized_probs = [float(u_prob) / norm_const for u_prob in unnormalized_probs]
            alias_nodes[node] = alias_setup(normalized_probs)

        alias_edges = {}

        if self.is_directed:
            for edge in layer.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
        else:
            # undirected: precompute alias tables for both orientations of each edge
            for edge in layer.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1], layer)
                alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0], layer)

        # guard the shared dictionaries against concurrent writes
        self.lock.acquire()
        try:
            self.alias_nodes[counter] = alias_nodes
            self.alias_edges[counter] = alias_edges
        finally:
            self.lock.release()

        print("Finished thread for layer " + str(counter) + " in " + str(time.time() - start_time) + " seconds.")

        return
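# Usage sketch (hypothetical toy layers; the real pipeline builds the graph list
# from the CSV files in data/):
#
#   G1 = nx.Graph(); G1.add_edge('a', 'b', weight=1.0); G1.add_edge('b', 'c', weight=2.0)
#   G2 = nx.Graph(); G2.add_edge('a', 'c', weight=1.0); G2.add_edge('c', 'd', weight=0.5)
#   gen = NeighborhoodGen([G1, G2], p=1.0, q=0.5)
#   walk = gen.multinode2vec_walk(w=0.25, walk_length=10, start_node='a', start_layer_id=0)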
def alias_setup(probs):
    '''
    Compute utility lists for non-uniform sampling from discrete distributions.
    Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/
    for details.
    '''
    K = len(probs)
    q = np.zeros(K)
    J = np.zeros(K, dtype=int)

    # partition outcomes into those below and above the uniform weight 1/K
    smaller = []
    larger = []
    for kk, prob in enumerate(probs):
        q[kk] = K * prob
        if q[kk] < 1.0:
            smaller.append(kk)
        else:
            larger.append(kk)

    # pair each light outcome with a heavy one that donates its excess mass
    while len(smaller) > 0 and len(larger) > 0:
        small = smaller.pop()
        large = larger.pop()

        J[small] = large
        q[large] = q[large] + q[small] - 1.0
        if q[large] < 1.0:
            smaller.append(large)
        else:
            larger.append(large)

    return J, q


def alias_draw(J, q):
    '''
    Draw a sample from a non-uniform discrete distribution using alias sampling.
    '''
    K = len(J)

    # pick a column uniformly, then flip a biased coin between it and its alias
    kk = int(np.floor(np.random.rand() * K))
    if np.random.rand() < q[kk]:
        return kk
    else:
        return J[kk]
--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
"""
Helper functions for multi-node2vec (duplicate of mltn2v_utils.py, used for testing).
Details can be found in the paper: "Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI"
by JD Wilson, M Baybay, R Sankar, and P Stillman

Preprint here: https://arxiv.org/pdf/1809.06437.pdf

Contributors:
- Melanie Baybay
  University of San Francisco, Department of Computer Science
- Rishi Sankar
  Henry M. Gunn High School
- James D. Wilson (maintainer)
  University of San Francisco, Department of Mathematics and Statistics

Questions or Bugs? Contact James D. Wilson at jdwilson4@usfca.edu
"""

import os
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
import time


# -------------------------------------------------------------------------------
# PARSING AND CONVERSION FOR MULTILAYER GRAPHS
# -------------------------------------------------------------------------------
def parse_matrix_layers(network_dir, delim=',', binary=False, thresh=None):
    """
    Converts a directory of adjacency matrix files into pandas DataFrames.
    :param network_dir: directory of adjacency matrix files
    :param delim: separator for the adjacency matrix files
    :param binary: whether to convert edge weights to binary
    :param thresh: threshold for edge weights; weights <= thresh are set to 0,
                   so only edges with weight > thresh are kept
    :return: list of adjacency lists, one per layer, each a pandas DataFrame
             with 'source', 'target', 'weight' columns
    """
    # expand directory path
    network_dir = expand_path(network_dir)

    # iterate files and convert to pandas dataframes
    layers = []
    for network_file in os.listdir(network_dir):
        file_path = os.path.join(network_dir, network_file)
        try:
            # read as pandas DataFrame, index=source, col=target
            layer = pd.read_csv(file_path, sep=delim, index_col=0)
            if layer.shape[0] != layer.shape[1]:
                print('[ERROR] Invalid adjacency matrix. Expecting matrix with index as source and column as target.')
                return
            if thresh is not None:
                layer[layer <= thresh] = 0
            if binary:
                layer[layer != 0] = 1
            # ensure that index (node name) is string, since word2vec will need it as str
            if is_numeric_dtype(layer.index):
                layer.index = layer.index.map(str)
            # replace all 0s with NaN so they drop out of the adjacency list
            layer.replace(to_replace=0, value=np.nan, inplace=True)
            # convert matrix --> adjacency list with cols ["source", "target", "weight"]
            layer = layer.stack(dropna=True).reset_index()
            # rename columns
            layer.columns = ["source", "target", "weight"]
            layers.append(layer)
        except Exception as e:
            print('[ERROR] Could not read file "{}": {}'.format(file_path, e))
    return layers
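# Usage sketch (path and threshold are illustrative; any directory of square CSV
# adjacency matrices, such as data/test, works):
#
#   layers = parse_matrix_layers('data/test', thresh=0.25)
#   print(layers[0].head())   # source, target, weight rows of the first layer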
def expand_path(path):
    """
    Expands a file path, resolving the user home directory and environment variables.
    :param path: path to expand
    :return: expanded path
    """
    new_path = os.path.expanduser(path)
    return os.path.expandvars(new_path)


# -------------------------------------------------------------------------------
# OUTPUT
# -------------------------------------------------------------------------------
def feature_matrix_to_csv(ftrs, filename):
    """
    Write the feature matrix to csv.
    :param ftrs: pandas DataFrame of features
    :param filename: absolute path to output file (no extension)
    :return:
    """
    out = filename + ".csv"
    ftrs.to_csv(out, sep=',', header=False)
    return


def timed_invoke(action_desc, method):
    """
    Invokes a method with timing.
    :param action_desc: string describing the method action
    :param method: the method to invoke
    :return: the return object of the method
    """
    print('Started {}...'.format(action_desc))
    start = time.time()
    try:
        output = method()
        print('Finished {} in {} seconds'.format(action_desc, int(time.time() - start)))
        return output
    except Exception:
        print('Exception while {} after {} seconds'.format(action_desc, int(time.time() - start)))
        raise


def clean_output(directory):
    """
    Returns the output directory, creating it first if it does not exist.
    """
    directory = expand_path(directory)
    if os.path.isdir(directory):
        return directory
    else:
        os.makedirs(directory)
        print("[WARNING] Directory not found. Created {}".format(directory))
        return directory
--------------------------------------------------------------------------------