├── README.md ├── SIMLR_PY ├── LICENSE.txt ├── MANIFEST ├── Makefile ├── README.md ├── SIMLR.egg-info │ ├── PKG-INFO │ ├── SOURCES.txt │ ├── dependency_links.txt │ └── top_level.txt ├── SIMLR │ ├── __init__.py │ ├── __init__.pyc │ ├── __pycache__ │ │ └── __init__.cpython-39.pyc │ ├── core.py │ ├── core.pyc │ ├── helper.py │ └── helper.pyc ├── build │ ├── lib.linux-x86_64-2.7 │ │ └── SIMLR │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ ├── helper.py │ │ │ └── helpers.py │ └── lib │ │ └── SIMLR │ │ ├── __init__.py │ │ ├── core.py │ │ └── helper.py ├── dist │ ├── SIMLR-0.0.0.tar.gz │ ├── SIMLR-0.1.0.tar.gz │ ├── SIMLR-0.1.1.tar.gz │ └── SIMLR-0.1.3.tar.gz ├── requirements.txt ├── setup.py └── tests │ ├── Zeisel.mat │ └── test_largescale.py ├── centrality.py ├── data_loader.py ├── fig1.png ├── fig2.png ├── main.py ├── model.py ├── plot.py ├── prediction.py └── requirements.txt /README.md: -------------------------------------------------------------------------------- 1 | # MultiGraphGAN 2 | MultiGraphGAN for jointly predicting multiple brain graphs from a single brain graph, coded up in Python by Alaa Bessadok. Please contact alaa.bessadok@gmail.Com for further inquiries. Thanks. 3 | 4 | This repository provides the official PyTorch implementation of the following paper: 5 | 6 |

7 | 8 |

9 | 10 | 11 | > **Topology-Aware Generative Adversarial Network for Joint Prediction of Multiple Brain Graphs from a Single Brain Graph** 12 | > [Alaa Bessadok](https://github.com/AlaaBessadok)1,2, [Mohamed Ali Mahjoub]2, [Islem Rekik](https://basira-lab.com/)1 13 | > 1BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey 14 | > 2University of Sousse, Higher Institute of Informatics and Communication Technologies, Sousse, Tunisia 15 | > 16 | > **Abstract:** *Multimodal medical datasets with incomplete observations present a barrier to large-scale neuroscience studies. Several works based on Generative Adversarial Networks (GAN) have been recently proposed to predict a set of medical images from a single modality (e.g, FLAIR 17 | MRI from T1 MRI). However, such frameworks are primarily designed to operate on images, limiting their generalizability to non-Euclidean geometric data such as brain graphs. While a growing number of connectomic studies has demonstrated the promise of including brain graphs for diagnosing neurological disorders, no geometric deep learning work was designed for multiple target brain graphs prediction from a source brain graph. Despite the momentum the field of graph generation has gained 18 | in the last two years, existing works have two critical drawbacks. First, the bulk of such works aims to learn one model for each target domain to generate from a source domain. Thus, they have a limited scalability in jointly predicting multiple target domains. Second, they merely consider the global topological scale of a graph (i.e., graph connectivity structure) and overlook the local topology at the node scale of a graph (e.g., how central a node is in the graph). To meet these challenges, we introduce MultiGraphGAN architecture, which not only predicts multiple brain graphs from a single brain graph but also preserves the topological structure of each target graph to predict. Its three core contributions lie in: (i) designing a graph adversarial auto-encoder for jointly predicting brain graphs from a single one, (ii) handling the mode collapse problem of GAN by clustering the encoded source graphs and proposing a cluster-specific decoder, (iii) introducing a topological loss to force the reconstruction of topologically sound target brain graphs. Our MultiGraphGAN significantly outperformed its variants thereby showing its great potential in multi-view brain graph generation from a single graph. Our code is available at https://github.com/basiralab/MultiGraphGAN.* 19 | 20 | This work is published in MICCAI 2020, Lima, Peru. MultiGraphGAN is a geometric deep learning framework for jointly predicting multiple brain graphs from a single graph. Using an end-to-end learning fashion, it preserves the topological structure of each target graph. Our MultiGraphGAN framework comprises two key steps (1) source graphs embedding and clustering and, (2) cluster-specific multi-target graph prediction. We have evaluated our method on ABIDE dataset. Detailed information can be found in the original paper and the video in the BASIRA Lab YouTube channel. In this repository, we release the code for training and testing MultiGraphGAN on a simulated dataset. 21 | 22 | # Installation 23 | 24 | The code has been tested with Python 3, PyTorch 1.3.1 on Ubuntu 16.04. GPU is required to run the code. 
You also need other dependencies (e.g., numpy, yaml, networkx, SIMLR) which can be installed via: 25 | 26 | ```bash 27 | pip install -r requirements.txt 28 | ``` 29 | 30 | # Training and testing MultiGraphGAN 31 | 32 | We provide a demo code for the usage of MultiGraphGAN for multiple target graphs prediction from a source graph. In main.py we train MultiGraphGAN on a simulated dataset with 280 subjects and test it on 30 subjects. Each sample has 6 brain graphs (one source graph and five target graphs). In this example, we used three input arguments (i.e., num_domains, nb_clusters and mode), you can add hyper-parameters (e.g., lambda_topology, lambda_rec) and vary their default values. 33 | 34 | You can train the program with the following command: 35 | 36 | ```bash 37 | python main.py --num_domains=6 --nb_clusters=2 --mode='train' 38 | ``` 39 | 40 | In this example, we simulated a training dataset with 280 samples and a testing set with 30 samples. If you want to test the code using the hyperparameters described in the paper, type in the terminal the following commande: 41 | 42 | ```bash 43 | python main.py --num_domains=6 --nb_clusters=2 --mode='test' 44 | ``` 45 | 46 | # Input data 47 | 48 | In order to use our framework, you need to provide: 49 | 50 | * a source_target_domains list where each element is a matrix of size (n * f). We denote n the total number of subjects in the dataset and f the number of features. Any element of the list can be considered as the source domain and the rest are the target domains. You need to include your data in the file main.py. So, just remove our simulated training and testing dataset and replace it with yours. 51 | 52 | # Output Data 53 | 54 | If you set the number of source and target domains to 3 using this argument --num_domains=3 , and keep the same size of our simulated data, the execution of main.py will produce saved csv files of the source and target data. Then, you can plot the brain graphs of any subject from the saved csv files. To do so, run the plot.py to get the following outputs especially when running the demo with default parameter setting: 55 | 56 |
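To illustrate what those saved outputs contain, here is a minimal sketch, assuming a 35-region parcellation (the same convention as the `to_2d` helper in `centrality.py`) and a hypothetical csv file name; `plot.py` in this repository is the reference script for producing the actual figures:

```python
import numpy as np
import matplotlib.pyplot as plt

def to_2d(vector, size=35):
    # Fold a vectorized brain graph (size*(size-1)/2 off-diagonal values)
    # back into a symmetric size x size matrix, as in centrality.py.
    x = np.zeros((size, size))
    c = 0
    for i in range(1, size):
        for j in range(i):
            x[i, j] = x[j, i] = vector[c]
            c += 1
    return x

# Hypothetical file name: adapt it to the csv actually written by main.py.
predicted = np.loadtxt('predicted_target_domain1.csv', delimiter=',')

plt.imshow(to_2d(predicted[0]), cmap='viridis')  # brain graph of the first subject
plt.colorbar()
plt.title('Predicted brain graph (subject 0)')
plt.show()
```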

57 | 58 |
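Similarly, the Input data section above only fixes the shape of the expected data. A minimal sketch of a `source_target_domains`-style list, using random matrices in place of real connectivity features (the numbers below mirror the simulated demo and are assumptions, not ABIDE statistics), looks like this:

```python
import numpy as np

n_subjects = 280    # training subjects in the simulated demo
n_features = 595    # 35*34/2 connections for a 35-region brain graph (assumption)
num_domains = 6     # one source domain and five target domains

# Each element is an (n x f) matrix; any element of the list can be taken as
# the source domain and the remaining ones as the target domains to predict.
source_target_domains = [np.random.normal(0.0, 0.5, (n_subjects, n_features))
                         for _ in range(num_domains)]
```

Replace this list with your own matrices inside `main.py`, as described in the Input data section.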

59 | 60 | # YouTube videos to install and run the code and understand how MultiGraphGAN works 61 | 62 | To install and run MultiGraphGAN, check the following YouTube video: 63 | 64 | https://youtu.be/JvT5XtAgbUk 65 | 66 | To learn about how MultiGraphGAN works, check the following YouTube videos: 67 | 68 | Short version (10mn): https://youtu.be/vEnzMQqbdHc 69 | 70 | Long version (20mn): https://youtu.be/yNx7H9NLzlE 71 | 72 | # Related references 73 | 74 | Multi-Marginal Wasserstein GAN (MWGAN): 75 | Cao, J., Mo, L., Zhang, Y., Jia, K., Shen, C., Tan, M.: Multi-marginal wasserstein gan. [https://arxiv.org/pdf/1911.00888.pdf] (2019) [https://github.com/caojiezhang/MWGAN]. 76 | 77 | Single‐cell Interpretation via Multi‐kernel LeaRning (SIMLR): 78 | Wang, B., Ramazzotti, D., De Sano, L., Zhu, J., Pierson, E., Batzoglou, S.: SIMLR: a tool for large-scale single-cell analysis by multi-kernel learning. [https://www.biorxiv.org/content/10.1101/052225v3] (2017) [https://github.com/bowang87/SIMLR_PY]. 79 | 80 | 81 | # Citation 82 | 83 | If our code is useful for your work please cite our paper: 84 | 85 | ```latex 86 | @inproceedings{bessadok2020, 87 | title={Topology-Aware Generative Adversarial Network for Joint Prediction of Multiple Brain Graphs from a Single Brain Graph}, 88 | author={Bessadok, Alaa and Mahjoub, Mohamed Ali and Rekik, Islem}, 89 | booktitle={ International Conference on Medical Image Computing and Computer Assisted Intervention}, 90 | year={2020}, 91 | organization={Springer} 92 | } 93 | ``` 94 | 95 | # MultiGraphGAN on arXiv 96 | 97 | https://arxiv.org/abs/2009.11058 98 | 99 | # Acknowledgement 100 | 101 | This project has been funded by the 2232 International Fellowship for Outstanding Researchers Program of TUBITAK (Project No:118C288, http://basira-lab.com/reprime/) supporting Dr. Islem Rekik. However, all scientific contributions made in this project are owned and approved solely by the authors. 102 | 103 | # License 104 | Our code is released under MIT License (see LICENSE file for details). 105 | 106 | 107 | -------------------------------------------------------------------------------- /SIMLR_PY/LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright <2017> 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
8 | -------------------------------------------------------------------------------- /SIMLR_PY/MANIFEST: -------------------------------------------------------------------------------- 1 | # file GENERATED by distutils, do NOT edit 2 | setup.py 3 | SIMLR/__init__.py 4 | SIMLR/core.py 5 | SIMLR/helper.py 6 | -------------------------------------------------------------------------------- /SIMLR_PY/Makefile: -------------------------------------------------------------------------------- 1 | init: 2 | pip install -r requirements.txt 3 | test: 4 | py.test tests 5 | .PHONY: init test 6 | -------------------------------------------------------------------------------- /SIMLR_PY/README.md: -------------------------------------------------------------------------------- 1 | SIMLR 2 | ============================================================ 3 | This is a python implementation of the paper published in Nature Methods titled as "Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning". 4 | 5 | 6 | OVERVIEW 7 | ============================================================ 8 | 9 | Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. We develop 10 | a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization. SIMLR is capable of separating known subpopulations more accurately in single-cell data sets than do existing dimension reduction methods. Additionally, SIMLR demonstrates high sensitivity and accuracy on high-throughput peripheral blood mononuclear cells 11 | (PBMC) data sets generated by the GemCode single-cell technology from 10x Genomics. 12 | 13 | IMPLEMENTATIONS 14 | ============================================================ 15 | We provide implementations of SIMLR for large scale single-cell RNA-seq data. With small dataset (e.g, dataset with less than 3,000 cells), we recommend the user to use the matlab package or R package from https://github.com/BatzoglouLabSU/SIMLR. For Large dataset (with more than 3,000 cells), we recommend the user to use the python function called "SIMLR_LARGE". 16 | 17 | This large-scale implementation uses approximate version of SIMLR to address the computational issue. 18 | 19 | DEMO 20 | ============================================================ 21 | We provide two demos for the usage of SIMLR in large scale. In test_largescale.py we run SIMLR on Zeisel dataset with 3005 cells in our paper. 22 | 23 | DEBUG 24 | ============================================================ 25 | Please feel free to send us emails if you have touble running our SIMLR. 
The correspondence email is bowang87@stanford.edu 26 | 27 | Requirements 28 | ============================================================ 29 | 30 | - `numpy>=1.8` 31 | - `scipy>=0.13.2` 32 | - `annoy>=1.8` 33 | - `sklearn>=0.17` 34 | - `fbpca>=1.0` 35 | 36 | Installation 37 | ============================================================ 38 | python setup.py install 39 | or 40 | pip install SIMLR 41 | 42 | 43 | Tutorial 44 | ============================================================ 45 | see tests/test_largescale.py 46 | 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 1.0 2 | Name: SIMLR 3 | Version: 0.1.0 4 | Summary: Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning 5 | Home-page: https://github.com/bowang87/SIMLR-PY 6 | Author: Bo Wang 7 | Author-email: bowang87@stanford.edu 8 | License: UNKNOWN 9 | Description: UNKNOWN 10 | Platform: UNKNOWN 11 | -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | setup.py 2 | SIMLR/__init__.py 3 | SIMLR/core.py 4 | SIMLR/helper.py 5 | SIMLR.egg-info/PKG-INFO 6 | SIMLR.egg-info/SOURCES.txt 7 | SIMLR.egg-info/dependency_links.txt 8 | SIMLR.egg-info/top_level.txt -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | SIMLR 2 | -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR/__init__.py: -------------------------------------------------------------------------------- 1 | from .core import SIMLR_LARGE 2 | -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/SIMLR/__init__.pyc -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/SIMLR/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR/core.py: -------------------------------------------------------------------------------- 1 | """ 2 | Functions for large scale SIMLR and accuracy checks 3 | 4 | --------------------------------------------------------------------- 5 | 6 | This module contains the following functions: 7 | 8 | save_sparse_csr 9 | save a sparse csr format input of single-cell RNA-seq data 10 | load_sparse_csr 11 | load a sparse csr format input of single-cell RNA-seq data 12 | nearest_neighbor_search 13 | Approximate Nearset Neighbor search for every cell 14 | NE_dn 15 | Row-normalization of a matrix 16 | mex_L2_distance 
17 | A fast way to calculate L2 distance 18 | Cal_distance_memory 19 | Calculate Kernels in a memory-saving mode 20 | mex_multipleK 21 | A fast way to calculate kernels 22 | Hbeta 23 | A simple LP method to solve linear weight 24 | euclidean_proj_simplex 25 | A fast way to calculate simplex projection 26 | fast_pca 27 | A fast randomized pca with sparse input 28 | fast_minibatch_kmeans 29 | A fast mini-batch version of k-means 30 | SIMLR_Large 31 | A large-scale implementation of our SIMLR 32 | --------------------------------------------------------------------- 33 | 34 | Copyright 2016 Bo Wang, Stanford University. 35 | All rights reserved. 36 | """ 37 | from __future__ import absolute_import 38 | from __future__ import division 39 | from __future__ import print_function 40 | from __future__ import unicode_literals 41 | from . import helper 42 | import numpy as np 43 | import sys 44 | import os 45 | from annoy import AnnoyIndex 46 | import scipy.io as sio 47 | from scipy.sparse import csr_matrix, csc_matrix, linalg 48 | from fbpca import svd, pca 49 | import time 50 | from sklearn.decomposition import TruncatedSVD 51 | from sklearn.cluster import MiniBatchKMeans, KMeans 52 | 53 | class SIMLR_LARGE(object): 54 | """A class for large-scale SIMLR. 55 | 56 | Attributes: 57 | num_of_rank: The rank hyper-parameter in SIMLR usually set to number of clusters. 58 | num_of_neighbors: the number of neighbors kept for each cell to approximate full cell similarities 59 | mode_of_memory: an indicator to open the memory-saving mode. This is helpful for datasets of millions of cells. It will sacrify a bit speed though. 60 | """ 61 | def __init__(self, num_of_rank, num_of_neighbor=30, mode_of_memory = False, max_iter = 5): 62 | self.num_of_rank = int(num_of_rank) 63 | self.num_of_neighbor = int(num_of_neighbor) 64 | self.mode_of_memory = mode_of_memory 65 | self.max_iter = int(max_iter) 66 | 67 | def nearest_neighbor_search(self, GE_csc): 68 | K = self.num_of_neighbor * 2 69 | n,d = GE_csc.shape 70 | t = AnnoyIndex(d) 71 | for i in range(n): 72 | t.add_item(i,GE_csc[i,:]) 73 | t.build(100) 74 | t.save('test.ann') 75 | u = AnnoyIndex(d) 76 | u.load('test.ann') 77 | os.remove('test.ann') 78 | val = np.zeros((n,K)) 79 | ind = np.zeros((n,K)) 80 | for i in range(n): 81 | tmp, tmp1 = u.get_nns_by_item(i,K, include_distances=True) 82 | ind[i,:] = tmp 83 | val[i,:] = tmp1 84 | return ind.astype('int'), val 85 | 86 | 87 | 88 | 89 | def mex_L2_distance(self, F, ind): 90 | m,n = ind.shape 91 | I = np.tile(np.arange(m), n) 92 | if self.mode_of_memory: 93 | temp = np.zeros((m,n)) 94 | for i in range(n): 95 | temptemp = np.take(F, np.arange(m), axis = 0) - np.take(F, ind[:,i],axis=0) 96 | temp[:,i] = (temptemp*temptemp).sum(axis=1) 97 | return temp 98 | else: 99 | temp = np.take(F, I, axis = 0) - np.take(F, ind.ravel(order = 'F'),axis=0) 100 | temp = (temp*temp).sum(axis=1) 101 | return temp.reshape((m,n),order = 'F') 102 | 103 | 104 | 105 | def Cal_distance_memory(self, S, alpha): 106 | NT = len(alpha) 107 | DD = alpha.copy() 108 | for i in range(NT): 109 | temp = np.load('Kernel_' + str(i)+'.npy') 110 | if i == 0: 111 | distX = alpha[0]*temp 112 | else: 113 | distX += alpha[i]*temp 114 | DD[i] = ((temp*S).sum(axis = 0)/(S.shape[0]+0.0)).mean(axis = 0) 115 | alphaK0 = helper.umkl_bo(DD, 1.0/len(DD)); 116 | alphaK0 = alphaK0/np.sum(alphaK0) 117 | return distX, alphaK0 118 | 119 | 120 | def mex_multipleK(self, val, ind): 121 | #val *= val 122 | KK = self.num_of_neighbor 123 | ismemory = self.mode_of_memory 124 | 
m,n=val.shape 125 | sigma = np.arange(1,2.1,.25) 126 | allK = (np.arange(np.ceil(KK/2.0), min(n,np.ceil(KK*1.5))+1, np.ceil(KK/10.0))).astype('int') 127 | if ismemory: 128 | D_kernels = [] 129 | alphaK = np.ones(len(allK)*len(sigma))/(0.0 + len(allK)*len(sigma)) 130 | else: 131 | D_kernels = np.zeros((m,n,len(allK)*len(sigma))) 132 | alphaK = np.ones(D_kernels.shape[2])/(0.0 + D_kernels.shape[2]) 133 | t = 0; 134 | for k in allK: 135 | temp = val[:,np.arange(k)].sum(axis=1)/(k+0.0) 136 | temp0 = .5*(temp[:,np.newaxis] + np.take(temp,ind)) 137 | temp = val/temp0 138 | temp*=temp 139 | for s in sigma: 140 | temp1 = np.exp(-temp/2.0/s/s)/np.sqrt(2*np.pi)/s/temp0 141 | temptemp = temp1[:, 0] 142 | temp1[:] = .5*(temptemp[:,np.newaxis] + temptemp[ind]) - temp1 143 | if ismemory: 144 | np.save('Kernel_' + str(t), temp1 - temp1.min()) 145 | else: 146 | D_kernels[:,:,t] = temp1 - temp1.min() 147 | t = t+1 148 | 149 | return D_kernels, alphaK 150 | 151 | 152 | def fast_eigens(self, val, ind): 153 | n,d = val.shape 154 | rows = np.tile(np.arange(n), d) 155 | cols = ind.ravel(order='F') 156 | A = csr_matrix((val.ravel(order='F'),(rows,cols)),shape = (n, n)) + csr_matrix((val.ravel(order='F'),(cols,rows)),shape = (n, n)) 157 | (d,V) = linalg.eigsh(A,self.num_of_rank,which='LM') 158 | d = -np.sort(-np.real(d)) 159 | return np.real(V),d/np.max(abs(d)) 160 | def fast_minibatch_kmeans(self, X,C): 161 | batchsize = int(min(1000, np.round(X.shape[0]/C/C))) 162 | cls = MiniBatchKMeans(init='k-means++',n_clusters=C, batch_size = batchsize, n_init = 100, max_iter = 100) 163 | return cls.fit_predict(X) 164 | 165 | def fit(self, X, beta = 0.8): 166 | K = self.num_of_neighbor 167 | is_memory = self.mode_of_memory 168 | c = self.num_of_rank 169 | NITER = self.max_iter 170 | n,d = X.shape 171 | if d > 500: 172 | print('SIMLR highly recommends you to perform PCA first on the data\n'); 173 | print('Please use the in-line function fast_pca on your input\n'); 174 | ind, val = self.nearest_neighbor_search(X) 175 | del X 176 | D_Kernels, alphaK = self.mex_multipleK(val, ind) 177 | del val 178 | if is_memory: 179 | distX,alphaK0 = self.Cal_distance_memory(np.ones((ind.shape[0], ind.shape[1])), alphaK) 180 | else: 181 | distX = D_Kernels.dot(alphaK) 182 | rr = (.5*(K*distX[:,K+2] - distX[:,np.arange(1,K+1)].sum(axis = 1))).mean() 183 | lambdar = rr 184 | S0 = distX.max() - distX 185 | S0[:] = helper.NE_dn(S0) 186 | F, evalues = self.fast_eigens(S0.copy(), ind.copy()) 187 | F = helper.NE_dn(F) 188 | F *= (1-beta)*d/(1-beta*d*d); 189 | F0 = F.copy() 190 | for iter in range(NITER): 191 | FF = self.mex_L2_distance(F, ind) 192 | FF[:] = (distX + lambdar*FF)/2.0/rr 193 | FF[:] = helper.euclidean_proj_simplex(-FF) 194 | S0[:] = (1-beta)*S0 + beta*FF 195 | 196 | F[:], evalues = self.fast_eigens(S0, ind) 197 | F *= (1-beta)*d/(1-beta*d*d); 198 | F[:] = helper.NE_dn(F) 199 | F[:] = (1-beta)*F0 + beta*F 200 | F0 = F.copy() 201 | lambdar = lambdar * 1.5 202 | rr = rr / 1.05 203 | if is_memory: 204 | distX, alphaK0 = self.Cal_distance_memory(S0, alphaK) 205 | alphaK = (1-beta)*alphaK + beta*alphaK0 206 | alphaK = alphaK/np.sum(alphaK) 207 | 208 | else: 209 | DD = ((D_Kernels*S0[:,:,np.newaxis]).sum(axis = 0)/(D_Kernels.shape[0]+0.0)).mean(axis = 0) 210 | alphaK0 = helper.umkl_bo(DD, 1.0/len(DD)); 211 | alphaK0 = alphaK0/np.sum(alphaK0) 212 | alphaK = (1-beta)*alphaK + beta*alphaK0 213 | alphaK = alphaK/np.sum(alphaK) 214 | distX = D_Kernels.dot(alphaK) 215 | 216 | if is_memory: 217 | for i in range(len(alphaK)): 218 | 
os.remove('Kernel_' + str(i) + '.npy') 219 | rows = np.tile(np.arange(n), S0.shape[1]) 220 | cols = ind.ravel(order='F') 221 | val = S0 222 | S0 = csr_matrix((S0.ravel(order='F'),(rows,cols)),shape = (n, n)) + csr_matrix((S0.ravel(order='F'),(cols,rows)), shape = (n, n)) 223 | return S0, F, val, ind 224 | 225 | 226 | 227 | -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR/core.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/SIMLR/core.pyc -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR/helper.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | from __future__ import unicode_literals 5 | 6 | import numpy as np 7 | from scipy.sparse import csr_matrix, csc_matrix, linalg 8 | from sklearn.decomposition import TruncatedSVD 9 | from fbpca import pca 10 | import time 11 | 12 | 13 | def save_sparse_csr(filename,array, label=[]): 14 | np.savez(filename,data = array.data ,indices=array.indices, 15 | indptr =array.indptr, shape=array.shape, label = label ) 16 | 17 | def load_sparse_csr(filename): 18 | loader = np.load(filename) 19 | if 'label' in loader.keys(): 20 | label = loader['label'] 21 | else: 22 | label = [] 23 | return csr_matrix(( np.log10(1.0+loader['data']), loader['indices'], loader['indptr']), 24 | shape = loader['shape']), label 25 | 26 | def NE_dn(A, type='ave'): 27 | m , n = A.shape 28 | diags = np.ones(m) 29 | diags[:] = abs(A).sum(axis=1).flatten() 30 | if type == 'ave': 31 | D = 1/(diags+np.finfo(float).eps) 32 | return A*D[:,np.newaxis] 33 | elif type == 'gph': 34 | D = 1/np.sqrt(diags+np.finfo(float).eps) 35 | return (A*D[:,np.newaxis])*D 36 | 37 | def Hbeta(D,beta): 38 | D = (D-D.min())/(D.max() - D.min() + np.finfo(float).eps) 39 | P = np.exp(-D*beta) 40 | sumP = P.sum() 41 | H = np.log(sumP) + beta*sum(D*P)/sumP 42 | P = P / sumP 43 | return H, P 44 | 45 | 46 | 47 | def umkl_bo(D, beta): 48 | tol = 1e-4 49 | u = 20 50 | logU = np.log(u) 51 | H, P = Hbeta(D,beta) 52 | betamin = -np.inf 53 | betamax = np.inf 54 | Hdiff = H - logU 55 | tries = 0 56 | while(abs(Hdiff)>tol)&(tries < 30): 57 | if Hdiff>0: 58 | betamin = beta 59 | if np.isinf(betamax): 60 | beta *= 2.0 61 | else: 62 | beta = .5*(beta + betamax) 63 | else: 64 | betamax = beta 65 | if np.isinf(betamin): 66 | beta /= 2.0 67 | else: 68 | beta = .5*(beta + betamin) 69 | H, P = Hbeta(D,beta) 70 | Hdiff = H - logU 71 | tries +=1 72 | return P 73 | 74 | def euclidean_proj_simplex(v, s=1): 75 | """ Compute the Euclidean projection on a positive simplex 76 | Solves the optimisation problem (using the algorithm from [1]): 77 | min_w 0.5 * || w - v ||_2^2 , s.t. 
\sum_i w_i = s, w_i >= 0 78 | Parameters 79 | ---------- 80 | v: (n,) numpy array, 81 | n-dimensional vector to project 82 | s: int, optional, default: 1, 83 | radius of the simplex 84 | Returns 85 | ------- 86 | w: (n,) numpy array, 87 | Euclidean projection of v on the simplex 88 | """ 89 | assert s > 0, "Radius s must be strictly positive (%d <= 0)" % s 90 | n,d = v.shape # will raise ValueError if v is not 1-D 91 | # get the array of cumulative sums of a sorted (decreasing) copy of v 92 | 93 | v -= (v.mean(axis = 1)[:,np.newaxis]-1.0/d) 94 | u = -np.sort(-v) 95 | cssv = np.cumsum(u,axis = 1) 96 | # get the number of > 0 components of the optimal solution 97 | temp = u * np.arange(1,d+1) - cssv +s 98 | temp[temp<0] = 'nan' 99 | rho = np.nanargmin(temp,axis = 1) 100 | #rho = np.nonzero(u * np.arange(1, n+1) > (cssv - s))[0][-1] 101 | # compute the Lagrange multiplier associated to the simplex constraint 102 | theta = (cssv[np.arange(n), rho] - s) / (rho + 1.0) 103 | # compute the projection by thresholding v using theta 104 | w = (v - theta[:,np.newaxis]).clip(min=0) 105 | return w 106 | 107 | 108 | def fast_pca(in_X, no_dim): 109 | (U, s, Va) = pca(in_X, no_dim, False, 8) 110 | del Va 111 | U[:] = U*np.sqrt(np.abs(s)) 112 | D = 1/(np.sqrt(np.sum(U*U,axis = 1)+np.finfo(float).eps)+np.finfo(float).eps) 113 | return U*D[:,np.newaxis] 114 | 115 | 116 | -------------------------------------------------------------------------------- /SIMLR_PY/SIMLR/helper.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/SIMLR/helper.pyc -------------------------------------------------------------------------------- /SIMLR_PY/build/lib.linux-x86_64-2.7/SIMLR/__init__.py: -------------------------------------------------------------------------------- 1 | from core import SIMLR_LARGE 2 | -------------------------------------------------------------------------------- /SIMLR_PY/build/lib.linux-x86_64-2.7/SIMLR/core.py: -------------------------------------------------------------------------------- 1 | """ 2 | Functions for large scale SIMLR and accuracy checks 3 | 4 | --------------------------------------------------------------------- 5 | 6 | This module contains the following functions: 7 | 8 | save_sparse_csr 9 | save a sparse csr format input of single-cell RNA-seq data 10 | load_sparse_csr 11 | load a sparse csr format input of single-cell RNA-seq data 12 | nearest_neighbor_search 13 | Approximate Nearset Neighbor search for every cell 14 | NE_dn 15 | Row-normalization of a matrix 16 | mex_L2_distance 17 | A fast way to calculate L2 distance 18 | Cal_distance_memory 19 | Calculate Kernels in a memory-saving mode 20 | mex_multipleK 21 | A fast way to calculate kernels 22 | Hbeta 23 | A simple LP method to solve linear weight 24 | euclidean_proj_simplex 25 | A fast way to calculate simplex projection 26 | fast_pca 27 | A fast randomized pca with sparse input 28 | fast_minibatch_kmeans 29 | A fast mini-batch version of k-means 30 | SIMLR_Large 31 | A large-scale implementation of our SIMLR 32 | --------------------------------------------------------------------- 33 | 34 | Copyright 2016 Bo Wang, Stanford University. 35 | All rights reserved. 36 | """ 37 | from __future__ import absolute_import 38 | from __future__ import division 39 | from __future__ import print_function 40 | from __future__ import unicode_literals 41 | from . 
import helper 42 | import numpy as np 43 | import sys 44 | import os 45 | from annoy import AnnoyIndex 46 | import scipy.io as sio 47 | from scipy.sparse import csr_matrix, csc_matrix, linalg 48 | from fbpca import svd, pca 49 | import time 50 | from sklearn.decomposition import TruncatedSVD 51 | from sklearn.cluster import MiniBatchKMeans, KMeans 52 | 53 | class SIMLR_LARGE(object): 54 | """A class for large-scale SIMLR. 55 | 56 | Attributes: 57 | num_of_rank: The rank hyper-parameter in SIMLR usually set to number of clusters. 58 | num_of_neighbors: the number of neighbors kept for each cell to approximate full cell similarities 59 | mode_of_memory: an indicator to open the memory-saving mode. This is helpful for datasets of millions of cells. It will sacrify a bit speed though. 60 | """ 61 | def __init__(self, num_of_rank, num_of_neighbor=30, mode_of_memory = False, max_iter = 5): 62 | self.num_of_rank = int(num_of_rank) 63 | self.num_of_neighbor = int(num_of_neighbor) 64 | self.mode_of_memory = mode_of_memory 65 | self.max_iter = int(max_iter) 66 | 67 | def nearest_neighbor_search(self, GE_csc): 68 | K = self.num_of_neighbor * 2 69 | n,d = GE_csc.shape 70 | t = AnnoyIndex(d) 71 | for i in xrange(n): 72 | t.add_item(i,GE_csc[i,:]) 73 | t.build(100) 74 | t.save('test.ann') 75 | u = AnnoyIndex(d) 76 | u.load('test.ann') 77 | os.remove('test.ann') 78 | val = np.zeros((n,K)) 79 | ind = np.zeros((n,K)) 80 | for i in xrange(n): 81 | tmp, tmp1 = u.get_nns_by_item(i,K, include_distances=True) 82 | ind[i,:] = tmp 83 | val[i,:] = tmp1 84 | return ind.astype('int'), val 85 | 86 | 87 | 88 | 89 | def mex_L2_distance(self, F, ind): 90 | m,n = ind.shape 91 | I = np.tile(np.arange(m), n) 92 | if self.mode_of_memory: 93 | temp = np.zeros((m,n)) 94 | for i in range(n): 95 | temptemp = np.take(F, np.arange(m), axis = 0) - np.take(F, ind[:,i],axis=0) 96 | temp[:,i] = (temptemp*temptemp).sum(axis=1) 97 | return temp 98 | else: 99 | temp = np.take(F, I, axis = 0) - np.take(F, ind.ravel(order = 'F'),axis=0) 100 | temp = (temp*temp).sum(axis=1) 101 | return temp.reshape((m,n),order = 'F') 102 | 103 | 104 | 105 | def Cal_distance_memory(self, S, alpha): 106 | NT = len(alpha) 107 | DD = alpha.copy() 108 | for i in xrange(NT): 109 | temp = np.load('Kernel_' + str(i)+'.npy') 110 | if i == 0: 111 | distX = alpha[0]*temp 112 | else: 113 | distX += alpha[i]*temp 114 | DD[i] = ((temp*S).sum(axis = 0)/(S.shape[0]+0.0)).mean(axis = 0) 115 | alphaK0 = helper.umkl_bo(DD, 1.0/len(DD)); 116 | alphaK0 = alphaK0/np.sum(alphaK0) 117 | return distX, alphaK0 118 | 119 | 120 | def mex_multipleK(self, val, ind): 121 | #val *= val 122 | KK = self.num_of_neighbor 123 | ismemory = self.mode_of_memory 124 | m,n=val.shape 125 | sigma = np.arange(1,2.1,.25) 126 | allK = (np.arange(np.ceil(KK/2.0), min(n,np.ceil(KK*1.5))+1, np.ceil(KK/10.0))).astype('int') 127 | if ismemory: 128 | D_kernels = [] 129 | alphaK = np.ones(len(allK)*len(sigma))/(0.0 + len(allK)*len(sigma)) 130 | else: 131 | D_kernels = np.zeros((m,n,len(allK)*len(sigma))) 132 | alphaK = np.ones(D_kernels.shape[2])/(0.0 + D_kernels.shape[2]) 133 | t = 0; 134 | for k in allK: 135 | temp = val[:,np.arange(k)].sum(axis=1)/(k+0.0) 136 | temp0 = .5*(temp[:,np.newaxis] + np.take(temp,ind)) 137 | temp = val/temp0 138 | temp*=temp 139 | for s in sigma: 140 | temp1 = np.exp(-temp/2.0/s/s)/np.sqrt(2*np.pi)/s/temp0 141 | temptemp = temp1[:, 0] 142 | temp1[:] = .5*(temptemp[:,np.newaxis] + temptemp[ind]) - temp1 143 | if ismemory: 144 | np.save('Kernel_' + str(t), temp1 - temp1.min()) 145 
| else: 146 | D_kernels[:,:,t] = temp1 - temp1.min() 147 | t = t+1 148 | 149 | return D_kernels, alphaK 150 | 151 | 152 | def fast_eigens(self, val, ind): 153 | n,d = val.shape 154 | rows = np.tile(np.arange(n), d) 155 | cols = ind.ravel(order='F') 156 | A = csr_matrix((val.ravel(order='F'),(rows,cols)),shape = (n, n)) + csr_matrix((val.ravel(order='F'),(cols,rows)),shape = (n, n)) 157 | (d,V) = linalg.eigsh(A,self.num_of_rank,which='LM') 158 | d = -np.sort(-np.real(d)) 159 | return np.real(V),d/np.max(abs(d)) 160 | def fast_minibatch_kmeans(self, X,C): 161 | batchsize = int(min(1000, np.round(X.shape[0]/C/C))) 162 | cls = MiniBatchKMeans(init='k-means++',n_clusters=C, batch_size = batchsize, n_init = 100, max_iter = 100) 163 | return cls.fit_predict(X) 164 | 165 | def fit(self, X, beta = 0.8): 166 | K = self.num_of_neighbor 167 | is_memory = self.mode_of_memory 168 | c = self.num_of_rank 169 | NITER = self.max_iter 170 | n,d = X.shape 171 | if d > 500: 172 | print('SIMLR highly recommends you to perform PCA first on the data\n'); 173 | print('Please use the in-line function fast_pca on your input\n'); 174 | ind, val = self.nearest_neighbor_search(X) 175 | del X 176 | D_Kernels, alphaK = self.mex_multipleK(val, ind) 177 | del val 178 | if is_memory: 179 | distX,alphaK0 = self.Cal_distance_memory(np.ones((ind.shape[0], ind.shape[1])), alphaK) 180 | else: 181 | distX = D_Kernels.dot(alphaK) 182 | rr = (.5*(K*distX[:,K+2] - distX[:,np.arange(1,K+1)].sum(axis = 1))).mean() 183 | lambdar = rr 184 | S0 = distX.max() - distX 185 | S0[:] = helper.NE_dn(S0) 186 | F, evalues = self.fast_eigens(S0.copy(), ind.copy()) 187 | F = helper.NE_dn(F) 188 | F *= (1-beta)*d/(1-beta*d*d); 189 | F0 = F.copy() 190 | for iter in range(NITER): 191 | FF = self.mex_L2_distance(F, ind) 192 | FF[:] = (distX + lambdar*FF)/2.0/rr 193 | FF[:] = helper.euclidean_proj_simplex(-FF) 194 | S0[:] = (1-beta)*S0 + beta*FF 195 | 196 | F[:], evalues = self.fast_eigens(S0, ind) 197 | F *= (1-beta)*d/(1-beta*d*d); 198 | F[:] = helper.NE_dn(F) 199 | F[:] = (1-beta)*F0 + beta*F 200 | F0 = F.copy() 201 | lambdar = lambdar * 1.5 202 | rr = rr / 1.05 203 | if is_memory: 204 | distX, alphaK0 = self.Cal_distance_memory(S0, alphaK) 205 | alphaK = (1-beta)*alphaK + beta*alphaK0 206 | alphaK = alphaK/np.sum(alphaK) 207 | 208 | else: 209 | DD = ((D_Kernels*S0[:,:,np.newaxis]).sum(axis = 0)/(D_Kernels.shape[0]+0.0)).mean(axis = 0) 210 | alphaK0 = helper.umkl_bo(DD, 1.0/len(DD)); 211 | alphaK0 = alphaK0/np.sum(alphaK0) 212 | alphaK = (1-beta)*alphaK + beta*alphaK0 213 | alphaK = alphaK/np.sum(alphaK) 214 | distX = D_Kernels.dot(alphaK) 215 | 216 | if is_memory: 217 | for i in xrange(len(alphaK)): 218 | os.remove('Kernel_' + str(i) + '.npy') 219 | rows = np.tile(np.arange(n), S0.shape[1]) 220 | cols = ind.ravel(order='F') 221 | val = S0 222 | S0 = csr_matrix((S0.ravel(order='F'),(rows,cols)),shape = (n, n)) + csr_matrix((S0.ravel(order='F'),(cols,rows)), shape = (n, n)) 223 | return S0, F, val, ind 224 | 225 | 226 | 227 | -------------------------------------------------------------------------------- /SIMLR_PY/build/lib.linux-x86_64-2.7/SIMLR/helper.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | from __future__ import unicode_literals 5 | 6 | import numpy as np 7 | from scipy.sparse import csr_matrix, csc_matrix, linalg 8 | from sklearn.decomposition import TruncatedSVD 9 | from fbpca 
import pca 10 | import time 11 | 12 | 13 | def save_sparse_csr(filename,array, label=[]): 14 | np.savez(filename,data = array.data ,indices=array.indices, 15 | indptr =array.indptr, shape=array.shape, label = label ) 16 | 17 | def load_sparse_csr(filename): 18 | loader = np.load(filename) 19 | if 'label' in loader.keys(): 20 | label = loader['label'] 21 | else: 22 | label = [] 23 | return csr_matrix(( np.log10(1.0+loader['data']), loader['indices'], loader['indptr']), 24 | shape = loader['shape']), label 25 | 26 | def NE_dn(A, type='ave'): 27 | m , n = A.shape 28 | diags = np.ones(m) 29 | diags[:] = abs(A).sum(axis=1).flatten() 30 | if type == 'ave': 31 | D = 1/(diags+np.finfo(float).eps) 32 | return A*D[:,np.newaxis] 33 | elif type == 'gph': 34 | D = 1/np.sqrt(diags+np.finfo(float).eps) 35 | return (A*D[:,np.newaxis])*D 36 | 37 | def Hbeta(D,beta): 38 | D = (D-D.min())/(D.max() - D.min() + np.finfo(float).eps) 39 | P = np.exp(-D*beta) 40 | sumP = P.sum() 41 | H = np.log(sumP) + beta*sum(D*P)/sumP 42 | P = P / sumP 43 | return H, P 44 | 45 | 46 | 47 | def umkl_bo(D, beta): 48 | tol = 1e-4 49 | u = 20 50 | logU = np.log(u) 51 | H, P = Hbeta(D,beta) 52 | betamin = -np.inf 53 | betamax = np.inf 54 | Hdiff = H - logU 55 | tries = 0 56 | while(abs(Hdiff)>tol)&(tries < 30): 57 | if Hdiff>0: 58 | betamin = beta 59 | if np.isinf(betamax): 60 | beta *= 2.0 61 | else: 62 | beta = .5*(beta + betamax) 63 | else: 64 | betamax = beta 65 | if np.isinf(betamin): 66 | beta /= 2.0 67 | else: 68 | beta = .5*(beta + betamin) 69 | H, P = Hbeta(D,beta) 70 | Hdiff = H - logU 71 | tries +=1 72 | return P 73 | 74 | def euclidean_proj_simplex(v, s=1): 75 | """ Compute the Euclidean projection on a positive simplex 76 | Solves the optimisation problem (using the algorithm from [1]): 77 | min_w 0.5 * || w - v ||_2^2 , s.t. 
\sum_i w_i = s, w_i >= 0 78 | Parameters 79 | ---------- 80 | v: (n,) numpy array, 81 | n-dimensional vector to project 82 | s: int, optional, default: 1, 83 | radius of the simplex 84 | Returns 85 | ------- 86 | w: (n,) numpy array, 87 | Euclidean projection of v on the simplex 88 | """ 89 | assert s > 0, "Radius s must be strictly positive (%d <= 0)" % s 90 | n,d = v.shape # will raise ValueError if v is not 1-D 91 | # get the array of cumulative sums of a sorted (decreasing) copy of v 92 | 93 | v -= (v.mean(axis = 1)[:,np.newaxis]-1.0/d) 94 | u = -np.sort(-v) 95 | cssv = np.cumsum(u,axis = 1) 96 | # get the number of > 0 components of the optimal solution 97 | temp = u * np.arange(1,d+1) - cssv +s 98 | temp[temp<0] = 'nan' 99 | rho = np.nanargmin(temp,axis = 1) 100 | #rho = np.nonzero(u * np.arange(1, n+1) > (cssv - s))[0][-1] 101 | # compute the Lagrange multiplier associated to the simplex constraint 102 | theta = (cssv[np.arange(n), rho] - s) / (rho + 1.0) 103 | # compute the projection by thresholding v using theta 104 | w = (v - theta[:,np.newaxis]).clip(min=0) 105 | return w 106 | 107 | 108 | def fast_pca(in_X, no_dim): 109 | (U, s, Va) = pca(csc_matrix(in_X), no_dim, False, 8) 110 | del Va 111 | U[:] = U*np.sqrt(np.abs(s)) 112 | D = 1/(np.sqrt(np.sum(U*U,axis = 1)+np.finfo(float).eps)+np.finfo(float).eps) 113 | return U*D[:,np.newaxis] 114 | 115 | 116 | -------------------------------------------------------------------------------- /SIMLR_PY/build/lib.linux-x86_64-2.7/SIMLR/helpers.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | from __future__ import unicode_literals 5 | 6 | import numpy as np 7 | from scipy.sparse import csr_matrix, csc_matrix, linalg 8 | from fbpca import svd, pca 9 | import time 10 | 11 | 12 | def save_sparse_csr(filename,array, label=[]): 13 | np.savez(filename,data = array.data ,indices=array.indices, 14 | indptr =array.indptr, shape=array.shape, label = label ) 15 | 16 | def load_sparse_csr(filename): 17 | loader = np.load(filename) 18 | if 'label' in loader.keys(): 19 | label = loader['label'] 20 | else: 21 | label = [] 22 | return csr_matrix(( np.log10(1.0+loader['data']), loader['indices'], loader['indptr']), 23 | shape = loader['shape']), label 24 | 25 | def NE_dn(A, type='ave'): 26 | m , n = A.shape 27 | diags = np.ones(m) 28 | diags[:] = abs(A).sum(axis=1).flatten() 29 | if type == 'ave': 30 | D = 1/(diags+np.finfo(float).eps) 31 | return A*D[:,np.newaxis] 32 | elif type == 'gph': 33 | D = 1/np.sqrt(diags+np.finfo(float).eps) 34 | return (A*D[:,np.newaxis])*D 35 | 36 | def Hbeta(D,beta): 37 | D = (D-D.min())/(D.max() - D.min() + np.finfo(float).eps) 38 | P = np.exp(-D*beta) 39 | sumP = P.sum() 40 | H = np.log(sumP) + beta*sum(D*P)/sumP 41 | P = P / sumP 42 | return H, P 43 | 44 | 45 | 46 | def umkl_bo(D, beta): 47 | tol = 1e-4 48 | u = 20 49 | logU = np.log(u) 50 | H, P = Hbeta(D,beta) 51 | betamin = -np.inf 52 | betamax = np.inf 53 | Hdiff = H - logU 54 | tries = 0 55 | while(abs(Hdiff)>tol)&(tries < 30): 56 | if Hdiff>0: 57 | betamin = beta 58 | if np.isinf(betamax): 59 | beta *= 2.0 60 | else: 61 | beta = .5*(beta + betamax) 62 | else: 63 | betamax = beta 64 | if np.isinf(betamin): 65 | beta /= 2.0 66 | else: 67 | beta = .5*(beta + betamin) 68 | H, P = Hbeta(D,beta) 69 | Hdiff = H - logU 70 | tries +=1 71 | return P 72 | 73 | def euclidean_proj_simplex(v, s=1): 74 | """ Compute the 
Euclidean projection on a positive simplex 75 | Solves the optimisation problem (using the algorithm from [1]): 76 | min_w 0.5 * || w - v ||_2^2 , s.t. \sum_i w_i = s, w_i >= 0 77 | Parameters 78 | ---------- 79 | v: (n,) numpy array, 80 | n-dimensional vector to project 81 | s: int, optional, default: 1, 82 | radius of the simplex 83 | Returns 84 | ------- 85 | w: (n,) numpy array, 86 | Euclidean projection of v on the simplex 87 | """ 88 | assert s > 0, "Radius s must be strictly positive (%d <= 0)" % s 89 | n,d = v.shape # will raise ValueError if v is not 1-D 90 | # get the array of cumulative sums of a sorted (decreasing) copy of v 91 | 92 | v -= (v.mean(axis = 1)[:,np.newaxis]-1.0/d) 93 | u = -np.sort(-v) 94 | cssv = np.cumsum(u,axis = 1) 95 | # get the number of > 0 components of the optimal solution 96 | temp = u * np.arange(1,d+1) - cssv +s 97 | temp[temp<0] = 'nan' 98 | rho = np.nanargmin(temp,axis = 1) 99 | #rho = np.nonzero(u * np.arange(1, n+1) > (cssv - s))[0][-1] 100 | # compute the Lagrange multiplier associated to the simplex constraint 101 | theta = (cssv[np.arange(n), rho] - s) / (rho + 1.0) 102 | # compute the projection by thresholding v using theta 103 | w = (v - theta[:,np.newaxis]).clip(min=0) 104 | return w 105 | 106 | 107 | def fast_pca(in_X, no_dim): 108 | (U, s, Va) = pca(csc_matrix(in_X), no_dim, True, 5) 109 | del Va 110 | U[:] = U*np.sqrt(np.abs(s)) 111 | D = 1/(np.sqrt(np.sum(U*U,axis = 1)+np.finfo(float).eps)+np.finfo(float).eps) 112 | return U*D[:,np.newaxis] 113 | 114 | 115 | 116 | 117 | -------------------------------------------------------------------------------- /SIMLR_PY/build/lib/SIMLR/__init__.py: -------------------------------------------------------------------------------- 1 | from .core import SIMLR_LARGE 2 | -------------------------------------------------------------------------------- /SIMLR_PY/build/lib/SIMLR/core.py: -------------------------------------------------------------------------------- 1 | """ 2 | Functions for large scale SIMLR and accuracy checks 3 | 4 | --------------------------------------------------------------------- 5 | 6 | This module contains the following functions: 7 | 8 | save_sparse_csr 9 | save a sparse csr format input of single-cell RNA-seq data 10 | load_sparse_csr 11 | load a sparse csr format input of single-cell RNA-seq data 12 | nearest_neighbor_search 13 | Approximate Nearset Neighbor search for every cell 14 | NE_dn 15 | Row-normalization of a matrix 16 | mex_L2_distance 17 | A fast way to calculate L2 distance 18 | Cal_distance_memory 19 | Calculate Kernels in a memory-saving mode 20 | mex_multipleK 21 | A fast way to calculate kernels 22 | Hbeta 23 | A simple LP method to solve linear weight 24 | euclidean_proj_simplex 25 | A fast way to calculate simplex projection 26 | fast_pca 27 | A fast randomized pca with sparse input 28 | fast_minibatch_kmeans 29 | A fast mini-batch version of k-means 30 | SIMLR_Large 31 | A large-scale implementation of our SIMLR 32 | --------------------------------------------------------------------- 33 | 34 | Copyright 2016 Bo Wang, Stanford University. 35 | All rights reserved. 36 | """ 37 | from __future__ import absolute_import 38 | from __future__ import division 39 | from __future__ import print_function 40 | from __future__ import unicode_literals 41 | from . 
import helper 42 | import numpy as np 43 | import sys 44 | import os 45 | from annoy import AnnoyIndex 46 | import scipy.io as sio 47 | from scipy.sparse import csr_matrix, csc_matrix, linalg 48 | from fbpca import svd, pca 49 | import time 50 | from sklearn.decomposition import TruncatedSVD 51 | from sklearn.cluster import MiniBatchKMeans, KMeans 52 | 53 | class SIMLR_LARGE(object): 54 | """A class for large-scale SIMLR. 55 | 56 | Attributes: 57 | num_of_rank: The rank hyper-parameter in SIMLR usually set to number of clusters. 58 | num_of_neighbors: the number of neighbors kept for each cell to approximate full cell similarities 59 | mode_of_memory: an indicator to open the memory-saving mode. This is helpful for datasets of millions of cells. It will sacrify a bit speed though. 60 | """ 61 | def __init__(self, num_of_rank, num_of_neighbor=30, mode_of_memory = False, max_iter = 5): 62 | self.num_of_rank = int(num_of_rank) 63 | self.num_of_neighbor = int(num_of_neighbor) 64 | self.mode_of_memory = mode_of_memory 65 | self.max_iter = int(max_iter) 66 | 67 | def nearest_neighbor_search(self, GE_csc): 68 | K = self.num_of_neighbor * 2 69 | n,d = GE_csc.shape 70 | t = AnnoyIndex(d) 71 | for i in range(n): 72 | t.add_item(i,GE_csc[i,:]) 73 | t.build(100) 74 | t.save('test.ann') 75 | u = AnnoyIndex(d) 76 | u.load('test.ann') 77 | os.remove('test.ann') 78 | val = np.zeros((n,K)) 79 | ind = np.zeros((n,K)) 80 | for i in range(n): 81 | tmp, tmp1 = u.get_nns_by_item(i,K, include_distances=True) 82 | ind[i,:] = tmp 83 | val[i,:] = tmp1 84 | return ind.astype('int'), val 85 | 86 | 87 | 88 | 89 | def mex_L2_distance(self, F, ind): 90 | m,n = ind.shape 91 | I = np.tile(np.arange(m), n) 92 | if self.mode_of_memory: 93 | temp = np.zeros((m,n)) 94 | for i in range(n): 95 | temptemp = np.take(F, np.arange(m), axis = 0) - np.take(F, ind[:,i],axis=0) 96 | temp[:,i] = (temptemp*temptemp).sum(axis=1) 97 | return temp 98 | else: 99 | temp = np.take(F, I, axis = 0) - np.take(F, ind.ravel(order = 'F'),axis=0) 100 | temp = (temp*temp).sum(axis=1) 101 | return temp.reshape((m,n),order = 'F') 102 | 103 | 104 | 105 | def Cal_distance_memory(self, S, alpha): 106 | NT = len(alpha) 107 | DD = alpha.copy() 108 | for i in range(NT): 109 | temp = np.load('Kernel_' + str(i)+'.npy') 110 | if i == 0: 111 | distX = alpha[0]*temp 112 | else: 113 | distX += alpha[i]*temp 114 | DD[i] = ((temp*S).sum(axis = 0)/(S.shape[0]+0.0)).mean(axis = 0) 115 | alphaK0 = helper.umkl_bo(DD, 1.0/len(DD)); 116 | alphaK0 = alphaK0/np.sum(alphaK0) 117 | return distX, alphaK0 118 | 119 | 120 | def mex_multipleK(self, val, ind): 121 | #val *= val 122 | KK = self.num_of_neighbor 123 | ismemory = self.mode_of_memory 124 | m,n=val.shape 125 | sigma = np.arange(1,2.1,.25) 126 | allK = (np.arange(np.ceil(KK/2.0), min(n,np.ceil(KK*1.5))+1, np.ceil(KK/10.0))).astype('int') 127 | if ismemory: 128 | D_kernels = [] 129 | alphaK = np.ones(len(allK)*len(sigma))/(0.0 + len(allK)*len(sigma)) 130 | else: 131 | D_kernels = np.zeros((m,n,len(allK)*len(sigma))) 132 | alphaK = np.ones(D_kernels.shape[2])/(0.0 + D_kernels.shape[2]) 133 | t = 0; 134 | for k in allK: 135 | temp = val[:,np.arange(k)].sum(axis=1)/(k+0.0) 136 | temp0 = .5*(temp[:,np.newaxis] + np.take(temp,ind)) 137 | temp = val/temp0 138 | temp*=temp 139 | for s in sigma: 140 | temp1 = np.exp(-temp/2.0/s/s)/np.sqrt(2*np.pi)/s/temp0 141 | temptemp = temp1[:, 0] 142 | temp1[:] = .5*(temptemp[:,np.newaxis] + temptemp[ind]) - temp1 143 | if ismemory: 144 | np.save('Kernel_' + str(t), temp1 - temp1.min()) 145 | 
else: 146 | D_kernels[:,:,t] = temp1 - temp1.min() 147 | t = t+1 148 | 149 | return D_kernels, alphaK 150 | 151 | 152 | def fast_eigens(self, val, ind): 153 | n,d = val.shape 154 | rows = np.tile(np.arange(n), d) 155 | cols = ind.ravel(order='F') 156 | A = csr_matrix((val.ravel(order='F'),(rows,cols)),shape = (n, n)) + csr_matrix((val.ravel(order='F'),(cols,rows)),shape = (n, n)) 157 | (d,V) = linalg.eigsh(A,self.num_of_rank,which='LM') 158 | d = -np.sort(-np.real(d)) 159 | return np.real(V),d/np.max(abs(d)) 160 | def fast_minibatch_kmeans(self, X,C): 161 | batchsize = int(min(1000, np.round(X.shape[0]/C/C))) 162 | cls = MiniBatchKMeans(init='k-means++',n_clusters=C, batch_size = batchsize, n_init = 100, max_iter = 100) 163 | return cls.fit_predict(X) 164 | 165 | def fit(self, X, beta = 0.8): 166 | K = self.num_of_neighbor 167 | is_memory = self.mode_of_memory 168 | c = self.num_of_rank 169 | NITER = self.max_iter 170 | n,d = X.shape 171 | if d > 500: 172 | print('SIMLR highly recommends you to perform PCA first on the data\n'); 173 | print('Please use the in-line function fast_pca on your input\n'); 174 | ind, val = self.nearest_neighbor_search(X) 175 | del X 176 | D_Kernels, alphaK = self.mex_multipleK(val, ind) 177 | del val 178 | if is_memory: 179 | distX,alphaK0 = self.Cal_distance_memory(np.ones((ind.shape[0], ind.shape[1])), alphaK) 180 | else: 181 | distX = D_Kernels.dot(alphaK) 182 | rr = (.5*(K*distX[:,K+2] - distX[:,np.arange(1,K+1)].sum(axis = 1))).mean() 183 | lambdar = rr 184 | S0 = distX.max() - distX 185 | S0[:] = helper.NE_dn(S0) 186 | F, evalues = self.fast_eigens(S0.copy(), ind.copy()) 187 | F = helper.NE_dn(F) 188 | F *= (1-beta)*d/(1-beta*d*d); 189 | F0 = F.copy() 190 | for iter in range(NITER): 191 | FF = self.mex_L2_distance(F, ind) 192 | FF[:] = (distX + lambdar*FF)/2.0/rr 193 | FF[:] = helper.euclidean_proj_simplex(-FF) 194 | S0[:] = (1-beta)*S0 + beta*FF 195 | 196 | F[:], evalues = self.fast_eigens(S0, ind) 197 | F *= (1-beta)*d/(1-beta*d*d); 198 | F[:] = helper.NE_dn(F) 199 | F[:] = (1-beta)*F0 + beta*F 200 | F0 = F.copy() 201 | lambdar = lambdar * 1.5 202 | rr = rr / 1.05 203 | if is_memory: 204 | distX, alphaK0 = self.Cal_distance_memory(S0, alphaK) 205 | alphaK = (1-beta)*alphaK + beta*alphaK0 206 | alphaK = alphaK/np.sum(alphaK) 207 | 208 | else: 209 | DD = ((D_Kernels*S0[:,:,np.newaxis]).sum(axis = 0)/(D_Kernels.shape[0]+0.0)).mean(axis = 0) 210 | alphaK0 = helper.umkl_bo(DD, 1.0/len(DD)); 211 | alphaK0 = alphaK0/np.sum(alphaK0) 212 | alphaK = (1-beta)*alphaK + beta*alphaK0 213 | alphaK = alphaK/np.sum(alphaK) 214 | distX = D_Kernels.dot(alphaK) 215 | 216 | if is_memory: 217 | for i in range(len(alphaK)): 218 | os.remove('Kernel_' + str(i) + '.npy') 219 | rows = np.tile(np.arange(n), S0.shape[1]) 220 | cols = ind.ravel(order='F') 221 | val = S0 222 | S0 = csr_matrix((S0.ravel(order='F'),(rows,cols)),shape = (n, n)) + csr_matrix((S0.ravel(order='F'),(cols,rows)), shape = (n, n)) 223 | return S0, F, val, ind 224 | 225 | 226 | 227 | -------------------------------------------------------------------------------- /SIMLR_PY/build/lib/SIMLR/helper.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | from __future__ import unicode_literals 5 | 6 | import numpy as np 7 | from scipy.sparse import csr_matrix, csc_matrix, linalg 8 | from sklearn.decomposition import TruncatedSVD 9 | from fbpca import pca 10 | 
import time 11 | 12 | 13 | def save_sparse_csr(filename,array, label=[]): 14 | np.savez(filename,data = array.data ,indices=array.indices, 15 | indptr =array.indptr, shape=array.shape, label = label ) 16 | 17 | def load_sparse_csr(filename): 18 | loader = np.load(filename) 19 | if 'label' in loader.keys(): 20 | label = loader['label'] 21 | else: 22 | label = [] 23 | return csr_matrix(( np.log10(1.0+loader['data']), loader['indices'], loader['indptr']), 24 | shape = loader['shape']), label 25 | 26 | def NE_dn(A, type='ave'): 27 | m , n = A.shape 28 | diags = np.ones(m) 29 | diags[:] = abs(A).sum(axis=1).flatten() 30 | if type == 'ave': 31 | D = 1/(diags+np.finfo(float).eps) 32 | return A*D[:,np.newaxis] 33 | elif type == 'gph': 34 | D = 1/np.sqrt(diags+np.finfo(float).eps) 35 | return (A*D[:,np.newaxis])*D 36 | 37 | def Hbeta(D,beta): 38 | D = (D-D.min())/(D.max() - D.min() + np.finfo(float).eps) 39 | P = np.exp(-D*beta) 40 | sumP = P.sum() 41 | H = np.log(sumP) + beta*sum(D*P)/sumP 42 | P = P / sumP 43 | return H, P 44 | 45 | 46 | 47 | def umkl_bo(D, beta): 48 | tol = 1e-4 49 | u = 20 50 | logU = np.log(u) 51 | H, P = Hbeta(D,beta) 52 | betamin = -np.inf 53 | betamax = np.inf 54 | Hdiff = H - logU 55 | tries = 0 56 | while(abs(Hdiff)>tol)&(tries < 30): 57 | if Hdiff>0: 58 | betamin = beta 59 | if np.isinf(betamax): 60 | beta *= 2.0 61 | else: 62 | beta = .5*(beta + betamax) 63 | else: 64 | betamax = beta 65 | if np.isinf(betamin): 66 | beta /= 2.0 67 | else: 68 | beta = .5*(beta + betamin) 69 | H, P = Hbeta(D,beta) 70 | Hdiff = H - logU 71 | tries +=1 72 | return P 73 | 74 | def euclidean_proj_simplex(v, s=1): 75 | """ Compute the Euclidean projection on a positive simplex 76 | Solves the optimisation problem (using the algorithm from [1]): 77 | min_w 0.5 * || w - v ||_2^2 , s.t. 
\sum_i w_i = s, w_i >= 0 78 | Parameters 79 | ---------- 80 | v: (n,) numpy array, 81 | n-dimensional vector to project 82 | s: int, optional, default: 1, 83 | radius of the simplex 84 | Returns 85 | ------- 86 | w: (n,) numpy array, 87 | Euclidean projection of v on the simplex 88 | """ 89 | assert s > 0, "Radius s must be strictly positive (%d <= 0)" % s 90 | n,d = v.shape # will raise ValueError if v is not 1-D 91 | # get the array of cumulative sums of a sorted (decreasing) copy of v 92 | 93 | v -= (v.mean(axis = 1)[:,np.newaxis]-1.0/d) 94 | u = -np.sort(-v) 95 | cssv = np.cumsum(u,axis = 1) 96 | # get the number of > 0 components of the optimal solution 97 | temp = u * np.arange(1,d+1) - cssv +s 98 | temp[temp<0] = 'nan' 99 | rho = np.nanargmin(temp,axis = 1) 100 | #rho = np.nonzero(u * np.arange(1, n+1) > (cssv - s))[0][-1] 101 | # compute the Lagrange multiplier associated to the simplex constraint 102 | theta = (cssv[np.arange(n), rho] - s) / (rho + 1.0) 103 | # compute the projection by thresholding v using theta 104 | w = (v - theta[:,np.newaxis]).clip(min=0) 105 | return w 106 | 107 | 108 | def fast_pca(in_X, no_dim): 109 | (U, s, Va) = pca(in_X, no_dim, False, 8) 110 | del Va 111 | U[:] = U*np.sqrt(np.abs(s)) 112 | D = 1/(np.sqrt(np.sum(U*U,axis = 1)+np.finfo(float).eps)+np.finfo(float).eps) 113 | return U*D[:,np.newaxis] 114 | 115 | 116 | -------------------------------------------------------------------------------- /SIMLR_PY/dist/SIMLR-0.0.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/dist/SIMLR-0.0.0.tar.gz -------------------------------------------------------------------------------- /SIMLR_PY/dist/SIMLR-0.1.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/dist/SIMLR-0.1.0.tar.gz -------------------------------------------------------------------------------- /SIMLR_PY/dist/SIMLR-0.1.1.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/dist/SIMLR-0.1.1.tar.gz -------------------------------------------------------------------------------- /SIMLR_PY/dist/SIMLR-0.1.3.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/dist/SIMLR-0.1.3.tar.gz -------------------------------------------------------------------------------- /SIMLR_PY/requirements.txt: -------------------------------------------------------------------------------- 1 | fbpca==1.0 2 | numpy==1.8.0 3 | scikit-learn==0.17.1 4 | scipy==0.13.2 5 | annoy==1.8.0 6 | -------------------------------------------------------------------------------- /SIMLR_PY/setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright (c) 2017, Stanford University. 3 | All rights reserved. 
4 | 5 | This source code is a Python implementation of SIMLR for the following paper published in Nature Methods: 6 | Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning 7 | """ 8 | from distutils.core import setup 9 | 10 | setup( 11 | name='SIMLR', 12 | version='0.1.3', 13 | author='Bo Wang', 14 | author_email='bowang87@stanford.edu', 15 | url='https://github.com/bowang87/SIMLR-PY', 16 | description='Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning', 17 | packages=['SIMLR'], 18 | install_requires=[ 19 | 'fbpca>=1.0', 20 | 'numpy>=1.8.0', 21 | 'scipy>=0.13.2', 22 | 'annoy>=1.8.0', 23 | 'scikit-learn>=0.17.1', 24 | ], 25 | classifiers=[]) 26 | -------------------------------------------------------------------------------- /SIMLR_PY/tests/Zeisel.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/SIMLR_PY/tests/Zeisel.mat -------------------------------------------------------------------------------- /SIMLR_PY/tests/test_largescale.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import scipy.io as sio 4 | sys.path.insert(0,os.path.abspath('..')) 5 | import time 6 | import numpy as np 7 | import SIMLR 8 | from sklearn.metrics.cluster import normalized_mutual_info_score as nmi 9 | from sklearn.metrics.cluster import adjusted_rand_score as ari 10 | from scipy.sparse import csr_matrix 11 | 12 | 13 | filename = 'Zeisel.mat' 14 | X = csr_matrix(sio.loadmat(filename)['X']) # load the single-cell RNA-seq data 15 | X.data = np.log10(1+X.data) ## take the log transform of the gene counts. This is very important since it makes the data more Gaussian 16 | label = sio.loadmat(filename)['true_labs'] # ground-truth labels used for validation 17 | c = label.max() # number of clusters 18 | ### if the number of genes is larger than 500, we recommend performing PCA first! 19 | print('Start to Run PCA on the RNA-seq data!\n') 20 | start_main = time.time() 21 | if X.shape[1]>500: 22 | X = SIMLR.helper.fast_pca(X,500) 23 | else: 24 | X = X.todense() 25 | print('Successfully Run PCA! PCA took %f seconds in total\n' % (time.time() - start_main)) 26 | print('Start to Run SIMLR!\n') 27 | start_main = time.time() 28 | simlr = SIMLR.SIMLR_LARGE(c, 30, 0) ### This is how we initialize a SIMLR object: the first input is the number of clusters (rank), the second is the number of neighbors, and the third is a binary flag for memory-saving mode. Turn it on when the number of cells is extremely large to save memory, at the cost of efficiency. 29 | S, F,val, ind = simlr.fit(X) 30 | print('Successfully Run SIMLR! 
SIMLR took %f seconds in total\n' % (time.time() - start_main)) 31 | y_pred = simlr.fast_minibatch_kmeans(F,c) 32 | print('NMI value is %f \n' % nmi(y_pred.flatten(),label.flatten())) 33 | print('ARI value is %f \n' % ari(y_pred.flatten(),label.flatten())) 34 | 35 | -------------------------------------------------------------------------------- /centrality.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import networkx as nx 3 | 4 | # put the feature vector back into a 2D symmetric array 5 | def to_2d(vector): 6 | size = 35 7 | x = np.zeros((size,size)) 8 | c = 0 9 | for i in range(1,size): 10 | for j in range(0,i): 11 | x[i][j] = vector[c] 12 | x[j][i] = vector[c] 13 | c = c + 1 14 | return x 15 | 16 | def topological_measures(data): 17 | # ROI is the number of brain regions (i.e., 35 in our case) 18 | ROI = to_2d(data[0]).shape[0] 19 | CC = np.empty((0,ROI), int) 20 | BC = np.empty((0,ROI), int) 21 | EC = np.empty((0,ROI), int) 22 | topology = [] 23 | for i in range(data.shape[0]): 24 | A = to_2d(data[i]) 25 | np.fill_diagonal(A, 0) 26 | 27 | # create a graph from the similarity matrix 28 | # G = nx.from_numpy_matrix(A) 29 | G = nx.DiGraph(np.array(A)) 30 | U = G.to_undirected() 31 | 32 | # Centrality # 33 | # compute closeness centrality and transform the output to a vector 34 | cc = nx.closeness_centrality(U) 35 | closeness_centrality = np.array([cc[g] for g in U]) 36 | # compute betweenness centrality and transform the output to a vector 37 | bc = nx.betweenness_centrality(U) 38 | betweenness_centrality = np.array([bc[g] for g in U]) 39 | # compute eigenvector centrality and transform the output to a vector 40 | ec = nx.eigenvector_centrality(U) 41 | eigenvector_centrality = np.array([ec[g] for g in U]) 42 | 43 | # stack the centralities of all subjects into matrices 44 | CC = np.vstack((CC, closeness_centrality)) 45 | BC = np.vstack((BC, betweenness_centrality)) 46 | EC = np.vstack((EC, eigenvector_centrality)) 47 | 48 | topology.append(CC)#0 49 | topology.append(BC)#1 50 | topology.append(EC)#2 51 | 52 | return topology -------------------------------------------------------------------------------- /data_loader.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data_utils 2 | import numpy as np 3 | import torch 4 | import SIMLR 5 | import os 6 | 7 | def get_loader(features, batch_size, num_workers=1): 8 | """ 9 | Build and return a data loader. 
10 | """ 11 | dataset = data_utils.TensorDataset(torch.Tensor(features)) 12 | loader = data_utils.DataLoader(dataset, 13 | batch_size=batch_size, 14 | shuffle = True, #set to True in case of training and False when testing the model 15 | num_workers=num_workers 16 | ) 17 | 18 | return loader 19 | 20 | def learn_adj(x): 21 | y = [] 22 | for t in x: 23 | b = t.cpu().numpy() 24 | y.append(b) 25 | 26 | x = np.array(y) 27 | batchsize = x.shape[0] 28 | simlr = SIMLR.SIMLR_LARGE(1, batchsize/3, 0) 29 | adj, _,_, _ = simlr.fit(x) 30 | array = adj.toarray() 31 | tensor = torch.Tensor(array).cuda() 32 | 33 | return tensor 34 | 35 | def to_tensor(x): 36 | y = [] 37 | for t in x: 38 | b = t.numpy() 39 | y.append(b) 40 | 41 | x = np.array(y) 42 | x = x[0] 43 | tensor = torch.Tensor(x) 44 | 45 | return tensor 46 | 47 | def create_dirs_if_not_exist(dir_list): 48 | if isinstance(dir_list, list): 49 | for dir in dir_list: 50 | if not os.path.exists(dir): 51 | os.makedirs(dir) 52 | else: 53 | if not os.path.exists(dir_list): 54 | os.makedirs(dir_list) -------------------------------------------------------------------------------- /fig1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/fig1.png -------------------------------------------------------------------------------- /fig2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basiralab/MultiGraphGAN/7b6d2160daaf07c2462b11a7a621e5ead04cef0b/fig2.png -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # -*- coding: utf8 -*- 3 | """ 4 | Main function of MultiGraphGAN framework 5 | for jointly predicting multiple target brain graphs from a single source graph. 6 | 7 | Details can be found in: 8 | (1) the original paper https://link.springer.com/ 9 | Alaa Bessadok, Mohamed Ali Mahjoub, and Islem Rekik. "Topology-Aware Generative Adversarial Network for Joint Prediction of 10 | Multiple Brain Graphs from a Single Brain Graph", MICCAI 2020, Lima, Peru. 11 | (2) the youtube channel of BASIRA Lab: https://www.youtube.com/watch?v=OJOtLy9Xd34 12 | --------------------------------------------------------------------- 13 | 14 | This file contains the implementation of two main steps of our MultiGraphGAN framework: 15 | (1) source graphs embedding and clustering, and 16 | (2) cluster-specific multi-target graph prediction. 17 | 18 | MultiGraphGAN(src_loader, tgt_loaders, nb_clusters, opts) 19 | Inputs: 20 | src_loader: a PyTorch dataloader returning elements from source dataset batch by batch 21 | tgt_loaders: a PyTorch dataloader returning elements from target dataset batch by batch 22 | nb_clusters: number of clusters used to cluster the source graph embeddings 23 | opts: a python object (parser) storing all arguments needed to run the code such as hyper-parameters 24 | Output: 25 | model: our MultiGraphGAN model 26 | 27 | To evaluate our framework we used 90% of the dataset as training set and 10% for testing. 
28 | 29 | Sample use for training: 30 | model = MultiGraphGAN(src_loader, tgt_loaders, opts.nb_clusters, opts) 31 | model.train() 32 | 33 | Sample use for testing: 34 | model = MultiGraphGAN(src_loader, tgt_loaders, opts.nb_clusters, opts) 35 | predicted_target_graphs, source_graphs = model.test() 36 | Output: 37 | predicted_target_graphs : a list of size num_domains-1, where num_domains is the total number of source and target domains. 38 | Each element is an (n × f) matrix stacking the predicted target graph feature vectors (of size f) of the n testing subjects 39 | source_graphs : a matrix of size (n × f) stacking the source graph feature vectors (of size f) of the n testing subjects 40 | --------------------------------------------------------------------- 41 | Copyright 2020 Alaa Bessadok, Sousse University. 42 | Please cite the above paper if you use this code. 43 | All rights reserved. 44 | """ 45 | import argparse 46 | import random 47 | import yaml 48 | import numpy as np 49 | from torch.backends import cudnn 50 | from prediction import MultiGraphGAN 51 | from data_loader import * 52 | 53 | parser = argparse.ArgumentParser() 54 | # initialisation 55 | # Basic opts. 56 | parser.add_argument('--num_domains', type=int, default=6, help='number of domains (including the source domain)') 57 | parser.add_argument('--mode', type=str, default='train', choices=['train', 'test']) 58 | parser.add_argument('--log_dir', type=str, default='logs/') 59 | parser.add_argument('--checkpoint_dir', type=str, default='models/') 60 | parser.add_argument('--sample_dir', type=str, default='samples/') 61 | parser.add_argument('--result_dir', type=str, default='results/') 62 | parser.add_argument('--result_root', type=str, default='result_MultiGraphGAN/') 63 | 64 | # GCN model opts 65 | parser.add_argument('--hidden1', type=int, default=32) 66 | parser.add_argument('--hidden2', type=int, default=16) 67 | parser.add_argument('--dropout', type=float, default=0.5) 68 | parser.add_argument('--in_feature', type=int, default=595) 69 | 70 | # Discriminator model opts. 71 | parser.add_argument('--cls_loss', type=str, default='BCE', choices=['LS','BCE'], help='least squares loss or binary cross-entropy loss') 72 | parser.add_argument('--lambda_cls', type=float, default=1, help='hyper-parameter for domain classification loss') 73 | parser.add_argument('--Lf', type=float, default=5, help='a constant with respect to the inter-domain constraint') 74 | parser.add_argument('--lambda_reg', type=float, default=0.1, help='a constant with respect to the gradient penalty') 75 | 76 | # Generator model opts. 77 | parser.add_argument('--lambda_idt', type=float, default=10, help='hyper-parameter for identity loss') 78 | parser.add_argument('--lambda_info', type=float, default=1, help='hyper-parameter for information maximization loss') 79 | parser.add_argument('--lambda_topology', type=float, default=0.1, help='hyper-parameter for topological constraint') 80 | parser.add_argument('--lambda_rec', type=float, default=0.01, help='hyper-parameter for graph reconstruction loss') 81 | parser.add_argument('--nb_clusters', type=int, default=2, help='number of clusters for MKML clustering') 82 | 83 | # Training opts. 
84 | parser.add_argument('--batch_size', type=int, default=70, help='mini-batch size') 85 | parser.add_argument('--num_iters', type=int, default=10, help='number of total iterations for training D') 86 | parser.add_argument('--g_lr', type=float, default=0.0001, help='learning rate for G') 87 | parser.add_argument('--d_lr', type=float, default=0.0001, help='learning rate for D') 88 | parser.add_argument('--n_critic', type=int, default=5, help='number of D updates per each G update') 89 | parser.add_argument('--beta1', type=float, default=0.5, help='beta1 for Adam optimizer') 90 | parser.add_argument('--beta2', type=float, default=0.999, help='beta2 for Adam optimizer') 91 | parser.add_argument('--resume_iters', type=int, default=None, help='resume training from this step') 92 | parser.add_argument('--num_workers', type=int, default=1, help='num_workers to load data.') 93 | parser.add_argument('--log_step', type=int, default=5) 94 | parser.add_argument('--sample_step', type=int, default=5) 95 | parser.add_argument('--model_save_step', type=int, default=10) 96 | 97 | # Test opts. 98 | parser.add_argument('--test_iters', type=int, default=10, help='test model from this step') 99 | 100 | opts = parser.parse_args() 101 | opts.log_dir = os.path.join(opts.result_root, opts.log_dir) 102 | opts.checkpoint_dir = os.path.join(opts.result_root, opts.checkpoint_dir) 103 | opts.sample_dir = os.path.join(opts.result_root, opts.sample_dir) 104 | opts.result_dir = os.path.join(opts.result_root, opts.result_dir) 105 | 106 | 107 | if __name__ == '__main__': 108 | import pandas as pd 109 | # For fast training. 110 | cudnn.benchmark = True 111 | 112 | if opts.mode == 'train': 113 | """ 114 | Training MultiGraphGAN 115 | """ 116 | # Create directories if not exist. 117 | create_dirs_if_not_exist([opts.log_dir, opts.checkpoint_dir, opts.sample_dir, opts.result_dir]) 118 | 119 | # log opts. 120 | with open(os.path.join(opts.result_root, 'opts.yaml'), 'w') as f: 121 | f.write(yaml.dump(vars(opts))) 122 | 123 | # Simulate graph data for easy test the code 124 | source_target_domains = [] 125 | for i in range(opts.num_domains): 126 | source_target_domains.append(np.random.normal(random.random(), random.random(), (280,595))) 127 | 128 | # Choose the source domain to be translated 129 | src_domain = 0 130 | 131 | # Load source and target TRAIN datasets 132 | tgt_loaders = [] 133 | for domain in range(0, opts.num_domains): 134 | if domain == src_domain: 135 | source_feature = source_target_domains[domain] 136 | src_loader = get_loader(source_feature, opts.batch_size, opts.num_workers) 137 | else: 138 | target_feature = source_target_domains[domain] 139 | tgt_loader = get_loader(target_feature, opts.batch_size, opts.num_workers) 140 | tgt_loaders.append(tgt_loader) 141 | 142 | # Train MultiGraphGAN 143 | model = MultiGraphGAN(src_loader, tgt_loaders, opts.nb_clusters, opts) 144 | model.train() 145 | 146 | elif opts.mode == 'test': 147 | """ 148 | Testing MultiGraphGAN 149 | """ 150 | # Create directories if not exist. 
151 | create_dirs_if_not_exist([opts.result_dir]) 152 | 153 | # Simulate graph data to easily test the code 154 | source_target_domains = [] 155 | for i in range(opts.num_domains): 156 | source_target_domains.append(np.random.normal(random.random(), random.random(), (30,595))) 157 | 158 | # Choose the source domain to be translated 159 | src_domain = 0 160 | 161 | # Load source and target TEST datasets 162 | tgt_loaders = [] 163 | for domain in range(0, opts.num_domains): 164 | if domain == src_domain: 165 | source_feature = source_target_domains[domain] 166 | src_loader = get_loader(source_feature, opts.batch_size, opts.num_workers) 167 | else: 168 | target_feature = source_target_domains[domain] 169 | tgt_loader = get_loader(target_feature, opts.batch_size, opts.num_workers) 170 | tgt_loaders.append(tgt_loader) 171 | 172 | # Test MultiGraphGAN 173 | model = MultiGraphGAN(src_loader, tgt_loaders, opts.nb_clusters, opts) 174 | predicted_target_graphs, source_graphs = model.test() 175 | 176 | # Save data into csv files 177 | print("saving source graphs into csv file...") 178 | f = source_graphs.cpu().numpy() 179 | dataframe = pd.DataFrame(data=f.astype(float)) 180 | dataframe.to_csv('source_graphs.csv', sep=' ', header=True, float_format='%.6f', index=False) 181 | 182 | print("saving predicted target graphs into csv files...") 183 | for idx in range(len(predicted_target_graphs)): 184 | f = predicted_target_graphs[idx].numpy() 185 | dataframe = pd.DataFrame(data=f.astype(float)) 186 | dataframe.to_csv('predicted_graphs_%d.csv'%(idx+1), sep=' ', header=True, float_format='%.6f', index=False) 187 | 188 | 189 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import math 4 | import torch.nn as nn 5 | from torch.nn.parameter import Parameter 6 | from torch.nn.modules.module import Module 7 | from torch.nn import functional as F 8 | 9 | class GCN(Module): 10 | """ 11 | Graph Convolutional Network 12 | """ 13 | def __init__(self, in_features, out_features, bias=True): 14 | super(GCN, self).__init__() 15 | self.in_features = in_features 16 | self.out_features = out_features 17 | self.weight = Parameter(torch.FloatTensor(in_features, out_features)) 18 | if bias: 19 | self.bias = Parameter(torch.FloatTensor(out_features)) 20 | else: 21 | self.register_parameter('bias', None) 22 | self.reset_parameters() 23 | 24 | def reset_parameters(self): 25 | stdv = 1. / math.sqrt(self.weight.size(1)) 26 | self.weight.data.uniform_(-stdv, stdv) 27 | if self.bias is not None: 28 | self.bias.data.uniform_(-stdv, stdv) 29 | 30 | def forward(self, input, adj): 31 | support = torch.mm(input, self.weight) 32 | output = torch.spmm(adj, support) 33 | if self.bias is not None: 34 | return output + self.bias 35 | else: 36 | return output 37 | 38 | def __repr__(self): 39 | return self.__class__.__name__ + ' (' \ 40 | + str(self.in_features) + ' -> ' \ 41 | + str(self.out_features) + ')' 42 | 43 | 44 | 45 | class GCNencoder(nn.Module): 46 | """ 47 | Encoder network. 
48 | """ 49 | def __init__(self, nfeat, nhid, nout, dropout): 50 | super(GCNencoder, self).__init__() 51 | 52 | self.gc1 = GCN(nfeat, nhid) 53 | self.gc2 = GCN(nhid, nout) 54 | self.dropout = dropout 55 | 56 | def forward(self, x, adj): 57 | x = F.relu(self.gc1(x, adj)) 58 | x = F.dropout(x, self.dropout, training=self.training) 59 | x = F.relu(self.gc2(x, adj)) 60 | return x 61 | 62 | 63 | class GCNdecoder(nn.Module): 64 | """ 65 | Decoder network. 66 | """ 67 | def __init__(self, nfeat, nhid, nout, dropout): 68 | super(GCNdecoder, self).__init__() 69 | 70 | self.gc1 = GCN(nfeat, nhid) 71 | self.gc2 = GCN(nhid, nout) 72 | self.dropout = dropout 73 | 74 | def forward(self, x, adj): 75 | x = F.relu(self.gc1(x, adj)) 76 | x = F.dropout(x, self.dropout, training=self.training) 77 | x = F.relu(self.gc2(x, adj)) 78 | return x 79 | 80 | class Discriminator(nn.Module): 81 | """ 82 | Discriminator network with GCN. 83 | """ 84 | def __init__(self, input_size, output_size, dropout): 85 | super(Discriminator, self).__init__() 86 | 87 | self.gc1 = GCN(input_size, 32) 88 | self.gc2 = GCN(32, 16) 89 | self.gc3 = GCN(16, 1) 90 | self.dropout = dropout 91 | 92 | def forward(self, x, adj): 93 | x = F.relu(self.gc1(x, adj)) 94 | x = F.dropout(x, self.dropout, training=self.training) 95 | x = F.relu(self.gc2(x, adj)) 96 | x = F.dropout(x, self.dropout, training=self.training) 97 | a = self.gc3(x, adj) 98 | x = a.view(a.shape[0]) 99 | return F.sigmoid(x), F.softmax(x, dim=0) 100 | -------------------------------------------------------------------------------- /plot.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.stats import ttest_rel 3 | from sklearn.metrics import mean_absolute_error 4 | from scipy.io import loadmat 5 | import matplotlib.pyplot as plt 6 | import pandas as pd 7 | 8 | # read csv file 9 | def readcsv(filename): 10 | data = pd.read_csv(filename) 11 | c = [] 12 | data = np.array(data) 13 | for i in range(0,data.shape[0]): 14 | a = data[i][0] 15 | b = np.array(list(a.split(" "))) 16 | c.append(b) 17 | 18 | return(np.array(c)) 19 | 20 | # Plot the connectomes 21 | def show_mtrx(m): 22 | fig, ax = plt.subplots(figsize = (20, 10)) 23 | 24 | min_val = round((m.min()), 6) 25 | max_val = round((m.max()), 6) 26 | 27 | cax = ax.matshow(m, cmap=plt.cm.Spectral) 28 | cbar = fig.colorbar(cax, ticks=[min_val, float((min_val + max_val)/2), max_val]) 29 | cbar.ax.set_yticklabels(['< %.2f'%(min_val), '%.2f'%(float((min_val + max_val)/2)), '> %.2f'%(max_val)]) 30 | plt.title(label="Source graph") 31 | plt.show() 32 | 33 | 34 | # put it back into a 2D symmetric array 35 | def to_2d(vector): 36 | size = 35 37 | x = np.zeros((size,size)) 38 | c = 0 39 | for i in range(1,size): 40 | for j in range(0,i): 41 | x[i][j] = vector[c] 42 | x[j][i] = vector[c] 43 | c = c + 1 44 | return x 45 | 46 | # Display the source matrix of the first subject 47 | pred = readcsv("source_graphs.csv") 48 | SG = to_2d(pred[0]) 49 | show_mtrx(SG) 50 | 51 | # Display the target graph in the domain 1 of the first subject 52 | pred = readcsv("predicted_graphs_1.csv") 53 | TG1 = to_2d(pred[0]) 54 | show_mtrx(TG1) 55 | 56 | # Display the target graph in the domain 2 of the first subject 57 | pred = readcsv("predicted_graphs_2.csv") 58 | TG2 = to_2d(pred[0]) 59 | show_mtrx(TG2) 60 | 61 | -------------------------------------------------------------------------------- /prediction.py: -------------------------------------------------------------------------------- 1 | 
import os 2 | import time 3 | import datetime 4 | import itertools 5 | import torch 6 | import SIMLR 7 | import torch.nn.functional as F 8 | from sklearn.metrics import mean_absolute_error 9 | from model import GCNencoder, GCNdecoder 10 | from model import Discriminator 11 | from data_loader import * 12 | from centrality import * 13 | import numpy as np 14 | 15 | class MultiGraphGAN(object): 16 | """ 17 | Build MultiGraphGAN model for training and testing. 18 | """ 19 | def __init__(self, src_loader, tgt_loaders, nb_clusters, opts): 20 | self.src_loader = src_loader 21 | self.tgt_loaders = tgt_loaders 22 | self.opts = opts 23 | # device 24 | self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 25 | # criterion function 26 | self.criterionIdt = torch.nn.L1Loss() 27 | # build models 28 | self.build_model() 29 | self.build_generators(nb_clusters) 30 | self.nb_clusters = nb_clusters 31 | 32 | def build_model(self): 33 | """ 34 | Build encoder and discriminator models and initialize optimizers. 35 | """ 36 | # build shared encoder 37 | self.E = GCNencoder(self.opts.in_feature, self.opts.hidden1, self.opts.hidden2, self.opts.dropout).to(self.device) 38 | 39 | # build discriminator( combined with the auxiliary classifier ) 40 | self.D = Discriminator(self.opts.in_feature, 1, self.opts.dropout).to(self.device) 41 | 42 | self.d_optimizer = torch.optim.Adam(self.D.parameters(), self.opts.d_lr, [self.opts.beta1, self.opts.beta2]) 43 | 44 | def build_generators(self,nb_clusters): 45 | """ 46 | Build cluster-specific generators models and initialize optimizers. 47 | """ 48 | self.Gs = [] 49 | param = [] 50 | for i in range(self.opts.num_domains - 1): 51 | inside_list=[] 52 | for i in range (nb_clusters): 53 | G_i = GCNdecoder(self.opts.hidden2, self.opts.hidden1, self.opts.in_feature, self.opts.dropout).to(self.device) 54 | inside_list.append(G_i) 55 | param.append(G_i) 56 | self.Gs.append(inside_list) 57 | 58 | # build optimizers 59 | param_list = [self.E.parameters()] + [G.parameters() for G in param] 60 | self.g_optimizer = torch.optim.Adam(itertools.chain(*param_list), 61 | self.opts.g_lr, [self.opts.beta1, self.opts.beta2]) 62 | 63 | def restore_model(self, resume_iters, nb_clusters): 64 | """ 65 | Restore the trained generators and discriminator. 66 | """ 67 | print('Loading the trained models from step {}...'.format(resume_iters)) 68 | 69 | E_path = os.path.join(self.opts.checkpoint_dir, '{}-E.ckpt'.format(resume_iters)) 70 | self.E.load_state_dict(torch.load(E_path, map_location=lambda storage, loc: storage)) 71 | 72 | for c in range(nb_clusters): 73 | for i in range(self.opts.num_domains - 1): 74 | G_i_path = os.path.join(self.opts.checkpoint_dir, '{}-G{}-{}.ckpt'.format(resume_iters, i+1, c)) 75 | print(G_i_path ) 76 | self.Gs[i][c].load_state_dict(torch.load(G_i_path, map_location=lambda storage, loc: storage)) 77 | 78 | D_path = os.path.join(self.opts.checkpoint_dir, '{}-D.ckpt'.format(resume_iters)) 79 | if os.path.exists(D_path): 80 | self.D.load_state_dict(torch.load(D_path, map_location=lambda storage, loc: storage)) 81 | 82 | 83 | def reset_grad(self): 84 | """ 85 | Reset the gradient buffers. 86 | """ 87 | self.g_optimizer.zero_grad() 88 | self.d_optimizer.zero_grad() 89 | 90 | 91 | def gradient_penalty(self, y, x, Lf): 92 | """ 93 | Compute gradient penalty. 
94 | """ 95 | weight = torch.ones(y.size()).to(self.device) 96 | dydx = torch.autograd.grad(outputs=y, 97 | inputs=x, 98 | grad_outputs=weight, 99 | retain_graph=True, 100 | create_graph=True, 101 | only_inputs=True)[0] 102 | 103 | dydx = dydx.view(dydx.size(0), -1) 104 | dydx_l2norm = torch.sqrt(torch.sum(dydx ** 2, dim=1)) 105 | 106 | ZERO = torch.zeros_like(dydx_l2norm).to(self.device) 107 | penalty = torch.max(dydx_l2norm - Lf, ZERO) 108 | return torch.mean(penalty) ** 2 109 | 110 | 111 | def classification_loss(self, logit, target, type='LS'): 112 | """ 113 | Compute classification loss. 114 | """ 115 | print(type) 116 | if type == 'BCE': 117 | return F.binary_cross_entropy_with_logits(logit, target) 118 | elif type == 'LS': 119 | return F.mse_loss(logit, target) 120 | else: 121 | assert False, '[*] classification loss not implemented.' 122 | 123 | 124 | def train(self): 125 | """ 126 | Train MultiGraphGAN 127 | """ 128 | nb_clusters = self.nb_clusters 129 | 130 | #fixed data for evaluating: generate samples. 131 | src_iter = iter(self.src_loader) 132 | 133 | x_src_fixed= next(src_iter) 134 | x_src_fixed = x_src_fixed[0].to(self.device) 135 | d = next(iter(self.src_loader)) 136 | 137 | tgt_iters = [] 138 | for loader in self.tgt_loaders: 139 | tgt_iters.append(iter(loader)) 140 | 141 | # label 142 | label_pos = torch.FloatTensor([1] * d[0].shape[0]).to(self.device) 143 | label_neg = torch.FloatTensor([0] * d[0].shape[0]).to(self.device) 144 | 145 | # Start training from scratch or resume training. 146 | start_iters = 0 147 | if self.opts.resume_iters: 148 | start_iters = self.opts.resume_iters 149 | self.restore_model(self.opts.resume_iters) 150 | 151 | # Start training. 152 | print('Start training MultiGraphGAN...') 153 | start_time = time.time() 154 | 155 | for i in range(start_iters, self.opts.num_iters): 156 | print("iteration",i) 157 | # =================================================================================== # 158 | # 1. Preprocess input data # 159 | # =================================================================================== # 160 | try: 161 | x_src = next(src_iter) 162 | except: 163 | src_iter = iter(self.src_loader) 164 | x_src = next(src_iter) 165 | 166 | x_src = x_src[0].to(self.device) 167 | 168 | x_tgts = [] 169 | for tgt_idx in range(len(tgt_iters)): 170 | try: 171 | x_tgt_i= next(tgt_iters[tgt_idx]) 172 | x_tgts.append(x_tgt_i) 173 | except: 174 | tgt_iters[tgt_idx] = iter(self.tgt_loaders[tgt_idx]) 175 | x_tgt_i= next(tgt_iters[tgt_idx]) 176 | x_tgts.append(x_tgt_i) 177 | 178 | for tgt_idx in range(len(x_tgts)): 179 | x_tgts[tgt_idx] = x_tgts[tgt_idx][0].to(self.device) 180 | print("x_tgts",x_tgts[tgt_idx].shape) 181 | 182 | # =================================================================================== # 183 | # 2. 
Train the discriminator # 184 | # =================================================================================== # 185 | 186 | embedding = self.E(x_src,learn_adj(x_src)).detach() 187 | ## Cluster the source graph embeddings using SIMLR 188 | simlr = SIMLR.SIMLR_LARGE(nb_clusters, embedding.shape[0]//2, 0) 189 | S, ff, val, ind = simlr.fit(embedding) 190 | y_pred = simlr.fast_minibatch_kmeans(ff,nb_clusters) 191 | y_pred = y_pred.tolist() 192 | get_indexes = lambda x, xs: [i for (y, i) in zip(xs, range(len(xs))) if x == y] 193 | 194 | x_fake_list = [] 195 | x_src_list = [] 196 | d_loss_cls = 0 197 | d_loss_fake = 0 198 | d_loss = 0 199 | 200 | print("Train the discriminator") 201 | for par in range(nb_clusters): 202 | print("================") 203 | print("cluster",par) 204 | print("================") 205 | cluster_index_list = get_indexes(par,y_pred) 206 | print(cluster_index_list) 207 | for idx in range(len(self.Gs)): 208 | x_fake_i = self.Gs[idx][par](embedding[cluster_index_list],learn_adj(x_tgts[idx][cluster_index_list])).detach() 209 | x_fake_list.append(x_fake_i) 210 | x_src_list.append(x_src[cluster_index_list]) 211 | 212 | out_fake_i, out_cls_fake_i = self.D(x_fake_i,learn_adj(x_fake_i)) 213 | _, out_cls_real_i = self.D(x_tgts[idx][cluster_index_list],learn_adj(x_tgts[idx][cluster_index_list])) 214 | 215 | ### Graph domain classification loss 216 | d_loss_cls_i = self.classification_loss(out_cls_real_i, label_pos[cluster_index_list], type=self.opts.cls_loss) \ 217 | + self.classification_loss(out_cls_fake_i, label_neg[cluster_index_list], type=self.opts.cls_loss) 218 | d_loss_cls += d_loss_cls_i 219 | 220 | # Part of adversarial loss 221 | d_loss_fake += torch.mean(out_fake_i) 222 | 223 | out_src, out_cls_src = self.D(x_src[cluster_index_list],learn_adj(x_src[cluster_index_list])) 224 | ### Adversarial loss 225 | d_loss_adv = torch.mean(out_src) - d_loss_fake / (self.opts.num_domains - 1) 226 | 227 | ### Gradient penalty loss 228 | x_fake_cat = torch.cat(x_fake_list) 229 | x_src_cat = torch.cat(x_src_list) 230 | 231 | alpha = torch.rand(x_src_cat.size(0), 1).to(self.device) 232 | x_hat = (alpha * x_src_cat.data + (1 - alpha) * x_fake_cat.data).requires_grad_(True) 233 | 234 | out_hat, _ = self.D(x_hat,learn_adj(x_hat.detach())) 235 | d_loss_reg = self.gradient_penalty(out_hat, x_hat, self.opts.Lf) 236 | 237 | # Cluster-based loss to update the discriminator 238 | d_loss_cluster = -1 * d_loss_adv + self.opts.lambda_cls * d_loss_cls + self.opts.lambda_reg * d_loss_reg 239 | 240 | ### Discriminator loss 241 | d_loss += d_loss_cluster 242 | 243 | 244 | print("d_loss",d_loss) 245 | self.reset_grad() 246 | d_loss.backward() 247 | self.d_optimizer.step() 248 | 249 | # Logging. 250 | loss = {} 251 | loss['D/loss_adv'] = d_loss_adv.item() 252 | loss['D/loss_cls'] = d_loss_cls.item() 253 | loss['D/loss_reg'] = d_loss_reg.item() 254 | 255 | # =================================================================================== # 256 | # 3. 
Train the cluster-specific generators # 257 | # =================================================================================== # 258 | print("Train the generators") 259 | if (i + 1) % self.opts.n_critic == 0: 260 | g_loss_info = 0 261 | g_loss_adv = 0 262 | g_loss_idt = 0 263 | g_loss_topo = 0 264 | g_loss_rec = 0 265 | g_loss = 0 266 | 267 | for par in range(nb_clusters): 268 | print("cluster",par) 269 | for idx in range(len(self.Gs)): 270 | # ========================= # 271 | # =====source-to-target==== # 272 | # ========================= # 273 | x_fake_i = self.Gs[idx][par](embedding[cluster_index_list],learn_adj(x_tgts[idx][cluster_index_list])) 274 | 275 | # Global topology loss 276 | global_topology = self.criterionIdt(x_fake_i, x_tgts[idx][cluster_index_list]) 277 | 278 | # Local topology loss 279 | real_topology = topological_measures(x_tgts[idx][cluster_index_list]) 280 | fake_topology = topological_measures(x_fake_i.detach()) 281 | # 0:closeness centrality 1:betweeness centrality 2:eginvector centrality 282 | local_topology = mean_absolute_error(fake_topology[0],real_topology[0]) 283 | 284 | ### Topology loss 285 | g_loss_topo += (local_topology + global_topology) 286 | 287 | if self.opts.lambda_idt > 0: 288 | x_fake_i_idt = self.Gs[idx][par](self.E(x_tgts[idx][cluster_index_list],learn_adj(x_tgts[idx][cluster_index_list])),learn_adj(x_tgts[idx][cluster_index_list])) 289 | g_loss_idt += self.criterionIdt(x_fake_i_idt, x_tgts[idx][cluster_index_list]) 290 | 291 | out_fake_i, out_cls_fake_i = self.D(x_fake_i,learn_adj(x_fake_i.detach())) 292 | 293 | ### Information maximization loss 294 | g_loss_info_i = F.binary_cross_entropy_with_logits(out_cls_fake_i, label_pos[cluster_index_list]) 295 | g_loss_info += g_loss_info_i 296 | 297 | ### Adversarial loss 298 | g_loss_adv -= torch.mean(out_fake_i) # opposed sign 299 | 300 | # ========================= # 301 | # =====target-to-source==== # 302 | # ========================= # 303 | x_reconst = self.Gs[idx][par](self.E(x_fake_i,learn_adj(x_fake_i.detach())),learn_adj(x_fake_i.detach())) 304 | 305 | # Reconstructed global topology loss 306 | reconstructed_global_topology = self.criterionIdt(x_src[cluster_index_list], x_reconst) 307 | 308 | # Reconstructed local topology loss 309 | real_topology = topological_measures(x_src[cluster_index_list]) 310 | fake_topology = topological_measures(x_reconst.detach()) 311 | # 0:closeness centrality 1:betweeness centrality 2:eginvector centrality 312 | reconstructed_local_topology = mean_absolute_error(fake_topology[0],real_topology[0]) 313 | 314 | ### Graph reconstruction loss 315 | g_loss_rec += (reconstructed_local_topology + reconstructed_global_topology) 316 | 317 | # Cluster-based loss to update the generators 318 | g_loss_cluster = g_loss_adv / (self.opts.num_domains - 1) + self.opts.lambda_info * g_loss_info + self.opts.lambda_idt * g_loss_idt + self.opts.lambda_topology * g_loss_topo + self.opts.lambda_rec * g_loss_rec 319 | 320 | ### Generator loss 321 | g_loss += g_loss_cluster 322 | 323 | print("g_loss",g_loss) 324 | self.reset_grad() 325 | g_loss.backward() 326 | self.g_optimizer.step() 327 | 328 | # Logging. 329 | loss['G/loss_adv'] = g_loss_adv.item() 330 | loss['G/loss_rec'] = g_loss_rec.item() 331 | loss['G/loss_cls'] = g_loss_info.item() 332 | if self.opts.lambda_idt > 0: 333 | loss['G/loss_idt'] = g_loss_idt.item() 334 | 335 | # =================================================================================== # 336 | # 4. 
Miscellaneous # 337 | # =================================================================================== # 338 | # print out training information. 339 | if (i + 1) % self.opts.log_step == 0: 340 | et = time.time() - start_time 341 | et = str(datetime.timedelta(seconds=et))[:-7] 342 | log = "Elapsed [{}], Iteration [{}/{}]".format(et, i + 1, self.opts.num_iters) 343 | for tag, value in loss.items(): 344 | log += ", {}: {:.4f}".format(tag, value) 345 | print(log) 346 | 347 | 348 | # save model checkpoints. 349 | if (i + 1) % self.opts.model_save_step == 0: 350 | E_path = os.path.join(self.opts.checkpoint_dir, '{}-E.ckpt'.format(i+1)) 351 | torch.save(self.E.state_dict(), E_path) 352 | 353 | D_path = os.path.join(self.opts.checkpoint_dir, '{}-D.ckpt'.format(i+1)) 354 | torch.save(self.D.state_dict(), D_path) 355 | 356 | for par in range(nb_clusters): 357 | for idx in range(len(self.Gs)): 358 | G_i_path = os.path.join(self.opts.checkpoint_dir, '{}-G{}-{}.ckpt'.format(i+1, idx+1, par)) 359 | print(G_i_path) 360 | torch.save(self.Gs[idx][par].state_dict(), G_i_path) 361 | 362 | print('Saved model checkpoints into {}...'.format(self.opts.checkpoint_dir)) 363 | 364 | print('=============================') 365 | print("End of Training") 366 | print('=============================') 367 | 368 | # =================================================================================== # 369 | # 5. Test with a new dataset # 370 | # =================================================================================== # 371 | def test(self): 372 | """ 373 | Test the trained MultiGraphGAN. 374 | """ 375 | self.restore_model(self.opts.test_iters,self.opts.nb_clusters) 376 | 377 | # Set data loader. 378 | src_loader = self.src_loader 379 | x_src = next(iter(self.src_loader)) 380 | x_src = x_src[0].to(self.device) 381 | 382 | tgt_iters = [] 383 | for loader in self.tgt_loaders: 384 | tgt_iters.append(iter(loader)) 385 | 386 | x_tgts = [] 387 | for tgt_idx in range(len(tgt_iters)): 388 | try: 389 | x_tgt_i= next(tgt_iters[tgt_idx]) 390 | x_tgts.append(x_tgt_i) 391 | except: 392 | tgt_iters[tgt_idx] = iter(self.tgt_loaders[tgt_idx]) 393 | x_tgt_i= next(tgt_iters[tgt_idx]) 394 | x_tgts.append(x_tgt_i) 395 | 396 | for tgt_idx in range(len(x_tgts)): 397 | x_tgts[tgt_idx] = x_tgts[tgt_idx][0].to(self.device) 398 | 399 | # return model.eval() 400 | for par in range(self.opts.nb_clusters): 401 | for idx in range(len(self.Gs)): 402 | self.Gs[idx][par].eval() 403 | 404 | with torch.no_grad(): 405 | embedding = self.E(x_src,learn_adj(x_src)) 406 | predicted_target_graphs = [] 407 | for idx in range(len(self.Gs)): 408 | sum_cluster_pred_graph = 0 409 | for par in range(self.opts.nb_clusters): 410 | x_fake_i = self.Gs[idx][par](embedding,learn_adj(x_src)) 411 | sum_cluster_pred_graph = np.add(sum_cluster_pred_graph,x_fake_i.cpu()) 412 | 413 | average_predicted_target_graph = sum_cluster_pred_graph / float(self.opts.nb_clusters) 414 | predicted_target_graphs.append(average_predicted_target_graph) 415 | 416 | 417 | print('=============================') 418 | print("End of Testing") 419 | print('=============================') 420 | 421 | return predicted_target_graphs, x_src -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | networkx 3 | annoy 4 | fbpca 5 | SIMLR 6 | yaml --------------------------------------------------------------------------------
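A quick numerical sanity check for the Euclidean simplex projection implemented in SIMLR/helper.py (`euclidean_proj_simplex`): every row of the returned array should be non-negative and sum to the radius `s`. This is a minimal sketch, not part of the repository; it assumes the bundled SIMLR_PY package has been installed (e.g. via its setup.py) so that `SIMLR.helper` is importable, as tests/test_largescale.py already assumes.

```python
import numpy as np
from SIMLR.helper import euclidean_proj_simplex  # helper module shipped in SIMLR_PY

# 5 vectors of dimension 10; the function expects a 2-D (n, d) float array
v = np.random.randn(5, 10)
# pass a copy: the function shifts its input in place before projecting
w = euclidean_proj_simplex(v.copy(), s=1)

assert np.all(w >= 0)                   # projection lies in the positive orthant
assert np.allclose(w.sum(axis=1), 1.0)  # every row sums to the simplex radius s
```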
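Throughout the repository each brain graph is handled as a 595-dimensional feature vector (`--in_feature 595` in main.py), which is exactly the number of off-diagonal connections of a 35-ROI symmetric connectome: 35 × 34 / 2 = 595. The sketch below reuses the `to_2d` logic from centrality.py/plot.py and adds a hypothetical inverse helper, `to_1d`, which is not in the repository, to show that the vector-to-matrix mapping is a lossless round trip.

```python
import numpy as np

SIZE = 35                        # number of ROIs (brain regions)
N_FEAT = SIZE * (SIZE - 1) // 2  # = 595, matching --in_feature in main.py

def to_2d(vector):
    # same logic as centrality.py / plot.py: fill the strict lower triangle row by row
    x = np.zeros((SIZE, SIZE))
    c = 0
    for i in range(1, SIZE):
        for j in range(0, i):
            x[i][j] = vector[c]
            x[j][i] = vector[c]
            c += 1
    return x

def to_1d(matrix):
    # hypothetical inverse helper (not in the repository): read the lower
    # triangle back in the order used by to_2d
    return np.array([matrix[i][j] for i in range(1, SIZE) for j in range(0, i)])

v = np.random.rand(N_FEAT)              # one simulated subject, as in main.py
assert np.allclose(to_1d(to_2d(v)), v)  # the mapping is a bijection
print(N_FEAT)                           # 595
```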
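The local topology term used in prediction.py can also be reproduced in isolation: the feature vectors are rebuilt into 35 × 35 matrices, node centralities are extracted with `centrality.topological_measures`, and the mean absolute error between the closeness-centrality matrices of the real and generated batches gives the loss. The sketch below is an assumption-level illustration with random stand-ins for `x_tgts[idx][cluster_index_list]` and `x_fake_i.detach()` (NumPy arrays here, whereas the training loop passes tensors); run it from the repository root so centrality.py is importable.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from centrality import topological_measures  # from this repository

n_subjects, n_feat = 4, 595
real_batch = np.random.rand(n_subjects, n_feat)  # stand-in for the real target graphs
fake_batch = np.random.rand(n_subjects, n_feat)  # stand-in for the generated graphs

real_topology = topological_measures(real_batch)  # [CC, BC, EC], each of shape (n_subjects, 35)
fake_topology = topological_measures(fake_batch)

# index 0 = closeness centrality, the measure used for the local topology loss in train()
local_topology = mean_absolute_error(fake_topology[0], real_topology[0])
print(local_topology)
```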