├── LICENSE
├── New_predicted_genes.png
├── README.md
├── SpaGE
    ├── dimensionality_reduction.py
    ├── main.py
    └── principal_vectors.py
├── SpaGE_Tutorial.ipynb
├── benchmark
    ├── MERFISH_Moffit
    │   ├── Liger
    │   │   └── LIGER.R
    │   ├── Performance_evaluation.py
    │   ├── Seurat
    │   │   ├── MERFISH.R
    │   │   ├── MERFISH_integration.R
    │   │   └── Moffit_RNA.R
    │   ├── SpaGE
    │   │   ├── Integration.py
    │   │   ├── MERFISH_data_preprocessing.py
    │   │   ├── Moffit_RNA_preprocessing.py
    │   │   ├── Precise_output.py
    │   │   ├── dimensionality_reduction.py
    │   │   └── principal_vectors.py
    │   └── gimVI
    │   │   └── gimVI.py
    ├── Performance_evaluation_osmFISH.py
    ├── Precise_output_all.py
    ├── STARmap_AllenVISp
    │   ├── DownSampling
    │   │   ├── DownSampling.py
    │   │   └── DownSampling_evaluation.py
    │   ├── Liger
    │   │   └── LIGER.R
    │   ├── Performance_evaluation.py
    │   ├── Seurat
    │   │   ├── allen_brain.R
    │   │   ├── impute_starmap.R
    │   │   ├── starmap.R
    │   │   └── starmap_integration.R
    │   ├── SpaGE
    │   │   ├── Allen_data_preprocessing.py
    │   │   ├── Integration.py
    │   │   ├── Precise_output.py
    │   │   ├── Starmap_data_preprocessing.py
    │   │   ├── dimensionality_reduction.py
    │   │   ├── principal_vectors.py
    │   │   └── viz.py
    │   ├── Starmap_plots.R
    │   └── gimVI
    │   │   └── gimVI.py
    ├── Timing_Evaluation.py
    ├── osmFISH_AllenSSp
    │   ├── Liger
    │   │   └── LIGER.R
    │   ├── Performance_evaluation.py
    │   ├── Seurat
    │   │   ├── allen_brain.R
    │   │   ├── impute_osmFISH.R
    │   │   ├── osmFISH.R
    │   │   └── osmFISH_integration.R
    │   ├── SpaGE
    │   │   ├── Allen_data_preprocessing.py
    │   │   ├── Integration.py
    │   │   ├── Precise_output.py
    │   │   ├── dimensionality_reduction.py
    │   │   ├── osmFISH_data_preprocessing.py
    │   │   └── principal_vectors.py
    │   └── gimVI
    │   │   └── gimVI.py
    ├── osmFISH_AllenVISp
    │   ├── Liger
    │   │   └── LIGER.R
    │   ├── Performance_evaluation.py
    │   ├── Seurat
    │   │   ├── allen_brain.R
    │   │   ├── impute_osmFISH.R
    │   │   ├── osmFISH.R
    │   │   └── osmFISH_integration.R
    │   ├── SpaGE
    │   │   ├── Allen_data_preprocessing.py
    │   │   ├── Integration.py
    │   │   ├── Precise_output.py
    │   │   ├── dimensionality_reduction.py
    │   │   ├── osmFISH_data_preprocessing.py
    │   │   └── principal_vectors.py
    │   └── gimVI
    │   │   └── gimVI.py
    ├── osmFISH_Ziesel
    │   ├── Liger
    │   │   └── LIGER.R
    │   ├── Performance_evaluation.py
    │   ├── Seurat
    │   │   ├── Zeisel_Cortex.R
    │   │   ├── impute_osmFISH.R
    │   │   ├── osmFISH.R
    │   │   └── osmFISH_integration.R
    │   ├── SpaGE
    │   │   ├── Integration.py
    │   │   ├── Linnarson_data_preprocessing.py
    │   │   ├── Precise_output.py
    │   │   ├── dimensionality_reduction.py
    │   │   └── principal_vectors.py
    │   └── gimVI
    │   │   └── gimVI.py
    └── seqFISH_AllenVISp
    │   ├── DownSampling
    │       ├── DownSampling.py
    │       └── DownSampling_evaluation.py
    │   ├── Performance_evaluation.py
    │   └── SpaGE
    │       ├── Integration.py
    │       ├── Precise_output.py
    │       ├── dimensionality_reduction.py
    │       ├── principal_vectors.py
    │       └── seqFISH_data_preprocessing.py
├── dimensionality_reduction.py
└── principal_vectors.py


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 tabdelaal
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/New_predicted_genes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tabdelaal/SpaGE/edd0fb7086eca791c6589020893eb4f034b195c7/New_predicted_genes.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # SpaGE
 2 | ## Predicting whole-transcriptome expression of spatial transcriptomics data through integration with scRNA-seq data 
 3 | 
 4 | ### Implementation description
 5 | 
 6 | The Python implementation can be found in the 'SpaGE' folder. The ```SpaGE``` function takes as input i) two single-cell datasets, spatial transcriptomics and scRNA-seq, ii) the number of principal vectors *(PVs)*, and iii) optionally, the set of genes unmeasured in the spatial data whose expression should be predicted from the scRNA-seq data. The function returns the predicted expression of these unmeasured genes across all spatial cells.
 7 | 
 8 | For a full description, please see the ```SpaGE``` function docstring in ```main.py```.
 9 | 
10 | ### Tutorial
11 | 
12 | The ```SpaGE_Tutorial``` notebook is a step-by-step example showing how to validate SpaGE on the spatially measured genes, and how to use SpaGE to predict new spatial gene patterns.
13 | 
14 | <p align="center">
15 |   <img src="New_predicted_genes.png" width="600">
16 | </p>
17 | 
18 | ### Experiments code description
19 | 
20 | The 'benchmark' folder contains the scripts to reproduce the results shown in the pre-print. The benchmark folder has five subfolders, each corresponding to one dataset pair and containing the scripts to run SpaGE, Seurat-v3, Liger and gimVI. Additionally, we provide evaluation scripts to calculate and compare the performance of all four methods, and to reproduce all the figures in the paper.
21 | 
22 | ### Datasets
23 | 
24 | All datasets used are publicly available. For convenience, they can be downloaded from Zenodo (https://doi.org/10.5281/zenodo.3967291).
25 | 
26 | For citation and further information please refer to:
27 | "SpaGE: Spatial Gene Enhancement using scRNA-seq", [NAR](https://academic.oup.com/nar/article/48/18/e107/5909530)
28 | 


--------------------------------------------------------------------------------
/SpaGE/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and they can be called directly
 5 | by their string name from the main method. All the methods
 6 | use the scikit-learn implementation.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Default linear dimensionality reduction method. For each method, return a
25 |     BaseEstimator instance corresponding to the method given as input.
26 | 	Attributes
27 |     -------
28 |     method: str, default to 'pca'
29 |     	Method used for dimensionality reduction.
30 |         Implemented: 'pca', 'ica', 'fa' (Factor Analysis),
31 |         'nmf' (Non-negative matrix factorisation), 'sparsepca' (Sparse PCA),
31 |         'pls' (Partial Least Squares, see the PLS class below).
32 |     
33 |     n_dim: int, default to 10
34 |     	Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Classifier, i.e. BaseEstimator instance
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Wrap PLSRegression so that it exposes the same interface as the other
68 |     dimensionality reduction methods
69 |     (simple class rewriting).
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------
/SpaGE/main.py:
--------------------------------------------------------------------------------
 1 | """ SpaGE [1]
 2 | @author: Tamim Abdelaal
 3 | This function integrates two single-cell datasets, spatial and scRNA-seq, and 
 4 | enhances the spatial data by predicting the expression of the spatially 
 5 | unmeasured genes from the scRNA-seq data.
 6 | The integration is performed using the domain adaptation method PRECISE [2]
 7 | 	
 8 | References
 9 | -------
10 |     [1] Abdelaal T., Mourragui S., Mahfouz A., Reinders M.J.T. (2020)
11 |     SpaGE: Spatial Gene Enhancement using scRNA-seq
12 |     [2] Mourragui S., Loog M., Reinders M.J.T., Wessels L.F.A. (2019)
13 |     PRECISE: A domain adaptation approach to transfer predictors of drug response
14 |     from pre-clinical models to tumors
15 | """
16 | 
17 | import numpy as np
18 | import pandas as pd
19 | import scipy.stats as st
20 | from sklearn.neighbors import NearestNeighbors
21 | from SpaGE.principal_vectors import PVComputation
22 | 
23 | def SpaGE(Spatial_data,RNA_data,n_pv,genes_to_predict=None):
24 |     """
25 |         @author: Tamim Abdelaal
26 |         This function integrates two single-cell datasets, spatial and scRNA-seq, 
27 |         and enhances the spatial data by predicting the expression of the spatially 
28 |         unmeasured genes from the scRNA-seq data.
29 |         
30 |         Parameters
31 |         -------
32 |         Spatial_data : Dataframe
33 |             Normalized Spatial data matrix (cells X genes).
34 |         RNA_data : Dataframe
35 |             Normalized scRNA-seq data matrix (cells X genes).
36 |         n_pv : int
37 |             Number of principal vectors to find from the independently computed
38 |             principal components, and used to align both datasets. This should
39 |             be <= number of shared genes between the two datasets.
40 |         genes_to_predict : str array 
41 |             list of gene names missing from the spatial data, to be predicted 
42 |             from the scRNA-seq data. Default is the set of different genes 
43 |             (columns) between scRNA-seq and spatial data.
44 |             
45 |         Return
46 |         -------
47 |         Imp_Genes: Dataframe
48 |             Matrix containing the predicted gene expressions for the spatial 
49 |             cells. Rows are equal to the number of spatial data rows (cells), 
50 |             and columns are equal to genes_to_predict.
51 |     """
52 |     
53 |     if genes_to_predict is None:
54 |         genes_to_predict = np.setdiff1d(RNA_data.columns,Spatial_data.columns)
55 |         
56 |     RNA_data_scaled = pd.DataFrame(data=st.zscore(RNA_data,axis=0),
57 |                                    index = RNA_data.index,columns=RNA_data.columns)
58 |     Spatial_data_scaled = pd.DataFrame(data=st.zscore(Spatial_data,axis=0),
59 |                                    index = Spatial_data.index,columns=Spatial_data.columns)
60 |     Common_data = RNA_data_scaled[np.intersect1d(Spatial_data_scaled.columns,RNA_data_scaled.columns)]
61 | 
62 |     Y_train = RNA_data[genes_to_predict]
63 |     
64 |     Imp_Genes = pd.DataFrame(np.zeros((Spatial_data.shape[0],len(genes_to_predict))),
65 |                                  columns=genes_to_predict)
66 |     
67 |     pv_Spatial_RNA = PVComputation(
68 |             n_factors = n_pv,
69 |             n_pv = n_pv,
70 |             dim_reduction = 'pca',
71 |             dim_reduction_target = 'pca'
72 |     )
73 |     
74 |     pv_Spatial_RNA.fit(Common_data,Spatial_data_scaled[Common_data.columns])
75 |     
76 |     S = pv_Spatial_RNA.source_components_.T
77 |         
78 |     Effective_n_pv = sum(np.diag(pv_Spatial_RNA.cosine_similarity_matrix_) > 0.3)
79 |     S = S[:,0:Effective_n_pv]
80 |     
81 |     Common_data_projected = Common_data.dot(S)
82 |     Spatial_data_projected = Spatial_data_scaled[Common_data.columns].dot(S)
83 |         
84 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',
85 |                             metric = 'cosine').fit(Common_data_projected)
86 |     distances, indices = nbrs.kneighbors(Spatial_data_projected)
87 |     
88 |     for j in range(0,Spatial_data.shape[0]):
89 |     
90 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
91 |         weights = weights/(len(weights)-1)
92 |         Imp_Genes.iloc[j,:] = np.dot(weights,Y_train.iloc[indices[j,:][distances[j,:] < 1]])
93 |         
94 |     return Imp_Genes
95 | 
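
The neighbor weighting inside the loop of `SpaGE()` can be easier to follow on a toy case. The block below is a minimal pure-Python sketch of that step, not the repository's code: the helper names `neighbor_weights` and `impute_gene`, the `cutoff` parameter, and the toy numbers are assumptions made for illustration. It drops neighbors whose cosine distance is 1 or more, weights the remaining `k` neighbors by `1 - d / sum(d)`, and divides by `k - 1`, which makes the weights sum to 1.

```python
def neighbor_weights(distances, cutoff=1.0):
    """Return (kept_positions, weights) for one spatial cell.

    Hypothetical helper mirroring the loop body in SpaGE(): neighbors with
    distance >= cutoff are dropped, the rest get weight 1 - d / sum(d),
    scaled by 1 / (k - 1) so that the k weights sum to 1.
    """
    kept = [pos for pos, d in enumerate(distances) if d < cutoff]
    total = sum(distances[pos] for pos in kept)
    k = len(kept)
    # Assumption of this sketch: fail loudly where the formula would divide by zero.
    if k < 2 or total == 0:
        raise ValueError("need at least two kept neighbors with non-zero total distance")
    weights = [(1 - distances[pos] / total) / (k - 1) for pos in kept]
    return kept, weights


def impute_gene(distances, neighbor_expression):
    """Weighted average of one gene over the kept scRNA-seq neighbors."""
    kept, weights = neighbor_weights(distances)
    return sum(w * neighbor_expression[pos] for pos, w in zip(kept, weights))


# Toy data (made up): four neighbors, the last one is beyond the cutoff.
toy_distances = [0.1, 0.3, 0.6, 1.2]
toy_expression = [2.0, 4.0, 6.0, 100.0]

kept, weights = neighbor_weights(toy_distances)
predicted = impute_gene(toy_distances, toy_expression)
print(kept, weights, predicted)
```

Raising an error for fewer than two kept neighbors is a choice of this sketch; the vectorized expression in `main.py` does not guard against that case.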


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/Liger/LIGER.R:
--------------------------------------------------------------------------------
 1 | setwd("MERFISH_Moffit/")
 2 | library(liger)
 3 | library(Seurat)
 4 | library(ggplot2)
 5 | 
 6 | # Moffit RNA
 7 | Moffit <- Read10X("data/Moffit_RNA/GSE113576/")
 8 | 
 9 | Moffit <- as.matrix(Moffit)
10 | Genes_count = rowSums(Moffit > 0)
11 | Moffit <- Moffit[Genes_count>=10,]
12 | 
13 | # MERFISH
14 | MERFISH <- read.csv(file = "data/MERFISH/Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv", header = TRUE)
15 | MERFISH_1 <- MERFISH[MERFISH['Animal_ID']==1,]
16 | MERFISH_1 <- MERFISH_1[MERFISH_1['Cell_class']!='Ambiguous',]
17 | MERFISH_meta <- MERFISH_1[,c(1:9)]
18 | MERFISH_data <- MERFISH_1[,c(10:170)]
19 | drops <- c('Blank_1','Blank_2','Blank_3','Blank_4','Blank_5','Fos')
20 | MERFISH_data <- MERFISH_data[ , !(colnames(MERFISH_data) %in% drops)]
21 | MERFISH_data <- t(MERFISH_data)
22 | 
23 | Gene_set <- intersect(rownames(MERFISH_data),rownames(Moffit))
24 | 
25 | #### New genes prediction
26 | Ligerex <- createLiger(list(MERFISH = MERFISH_data, Moffit_RNA = Moffit))
27 | Ligerex <- normalize(Ligerex)
28 | Ligerex@var.genes <- Gene_set
29 | Ligerex <- scaleNotCenter(Ligerex)
30 | 
31 | # suggestK(Ligerex) # K = 25
32 | # suggestLambda(Ligerex, k = 25)
33 | 
34 | Ligerex <- optimizeALS(Ligerex,k = 25)
35 | Ligerex <- quantileAlignSNF(Ligerex)
36 | 
37 | # leave-one-out validation
38 | genes.leaveout <- Gene_set
39 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(MERFISH_data)[2])
40 | rownames(Imp_genes) <- genes.leaveout
41 | colnames(Imp_genes) <- colnames(MERFISH_data)
42 | NMF_time <- vector(mode= "numeric")
43 | knn_time <- vector(mode= "numeric")
44 | 
45 | for(i in 1:length(genes.leaveout)) {
46 |   print(i)
47 |   start_time <- Sys.time()
48 |   Ligerex.leaveout <- createLiger(list(MERFISH = MERFISH_data[-which(rownames(MERFISH_data) %in% genes.leaveout[i]),], Moffit_RNA = Moffit))
49 |   Ligerex.leaveout <- normalize(Ligerex.leaveout)
50 |   Ligerex.leaveout@var.genes <- setdiff(Gene_set,genes.leaveout[i])
51 |   Ligerex.leaveout <- scaleNotCenter(Ligerex.leaveout)
52 |   Ligerex.leaveout <- optimizeALS(Ligerex.leaveout,k = 25)
53 |   Ligerex.leaveout <- quantileAlignSNF(Ligerex.leaveout)
54 |   end_time <- Sys.time()
55 |   NMF_time <- c(NMF_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
56 |   
57 |   start_time <- Sys.time()
58 |   Imputation <- imputeKNN(Ligerex.leaveout,reference = 'Moffit_RNA', queries = list('MERFISH'), norm = TRUE, scale = FALSE)
59 |   Imp_genes[genes.leaveout[[i]],] = as.vector(Imputation@norm.data$MERFISH[genes.leaveout[i],])
60 |   end_time <- Sys.time()
61 |   knn_time <- c(knn_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
62 | }
63 | write.csv(Imp_genes,file = 'Results/Liger_LeaveOneOut.csv')
64 | write.csv(NMF_time,file = 'Results/Liger_NMF_time.csv',row.names = FALSE)
65 | write.csv(knn_time,file = 'Results/Liger_knn_time.csv',row.names = FALSE)


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/Performance_evaluation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('MERFISH_Moffit/')
  3 | 
  4 | import numpy as np
  5 | import pandas as pd
  6 | import pickle
  7 | import matplotlib
  8 | matplotlib.use('qt5agg')
  9 | matplotlib.rcParams['pdf.fonttype'] = 42
 10 | matplotlib.rcParams['ps.fonttype'] = 42
 11 | import matplotlib.pyplot as plt
 12 | import scipy.stats as st
 13 | 
 14 | with open ('data/SpaGE_pkl/MERFISH.pkl', 'rb') as f:
 15 |     datadict = pickle.load(f)
 16 | 
 17 | MERFISH_data = datadict['MERFISH_data']
 18 | del datadict
 19 | 
 20 | with open ('data/SpaGE_pkl/Moffit_RNA.pkl', 'rb') as f:
 21 |     datadict = pickle.load(f)
 22 |     
 23 | RNA_data = datadict['RNA_data']
 24 | del datadict
 25 | 
 26 | Gene_Order = np.intersect1d(MERFISH_data.columns,RNA_data.columns)
 27 | 
 28 | ### SpaGE
 29 | SpaGE_imputed = pd.read_csv('Results/SpaGE_LeaveOneOut.csv',header=0,index_col=0,sep=',')
 30 | 
 31 | SpaGE_imputed = SpaGE_imputed.loc[:,Gene_Order]
 32 | 
 33 | SpaGE_Corr = pd.Series(index = Gene_Order, dtype = float)
 34 | for i in Gene_Order:
 35 |     SpaGE_Corr[i] = st.spearmanr(MERFISH_data[i],SpaGE_imputed[i])[0]
 36 |     
 37 | ### gimVI
 38 | gimVI_imputed = pd.read_csv('Results/gimVI_LeaveOneOut.csv',header=0,index_col=0,sep=',')
 39 | gimVI_imputed = gimVI_imputed.drop(columns='AVPR2')
 40 | 
 41 | gimVI_imputed = gimVI_imputed.loc[:,[x.upper() for x in np.array(Gene_Order,dtype='str')]]
 42 | 
 43 | gimVI_Corr = pd.Series(index = Gene_Order, dtype = float)
 44 | for i in Gene_Order:
 45 |     gimVI_Corr[i] = st.spearmanr(MERFISH_data[i],gimVI_imputed[str(i).upper()])[0]
 46 | gimVI_Corr[np.isnan(gimVI_Corr)] = 0
 47 | 
 48 | 
 49 | ### Seurat
 50 | Seurat_imputed = pd.read_csv('Results/Seurat_LeaveOneOut.csv',header=0,index_col=0,sep=',').T
 51 | 
 52 | Seurat_imputed = Seurat_imputed.loc[:,Gene_Order]
 53 | 
 54 | Seurat_Corr = pd.Series(index = Gene_Order, dtype = float)
 55 | for i in Gene_Order:
 56 |     Seurat_Corr[i] = st.spearmanr(MERFISH_data[i],Seurat_imputed[i])[0]
 57 | 
 58 | ### Liger
 59 | Liger_imputed = pd.read_csv('Results/Liger_LeaveOneOut.csv',header=0,index_col=0,sep=',').T
 60 | 
 61 | Liger_imputed = Liger_imputed.loc[:,Gene_Order]
 62 | 
 63 | Liger_Corr = pd.Series(index = Gene_Order, dtype = float)
 64 | for i in Gene_Order:
 65 |     Liger_Corr[i] = st.spearmanr(MERFISH_data[i],Liger_imputed[i])[0]
 66 | Liger_Corr[np.isnan(Liger_Corr)] = 0
 67 | 
 68 | ### Comparison plots
 69 | plt.style.use('ggplot')
 70 | fig, ax = plt.subplots(figsize=(3.7, 5.5))
 71 | ax.boxplot([SpaGE_Corr,Seurat_Corr, Liger_Corr,gimVI_Corr])
 72 | 
 73 | y = SpaGE_Corr
 74 | x = np.random.normal(1, 0.05, len(y))
 75 | plt.plot(x, y, 'g.', markersize=3, alpha=0.2)
 76 | y = Seurat_Corr
 77 | x = np.random.normal(2, 0.05, len(y))
 78 | plt.plot(x, y, 'g.', markersize=3, alpha=0.2)
 79 | y = Liger_Corr
 80 | x = np.random.normal(3, 0.05, len(y))
 81 | plt.plot(x, y, 'g.', markersize=3, alpha=0.2)
 82 | y = gimVI_Corr
 83 | x = np.random.normal(4, 0.05, len(y))
 84 | plt.plot(x, y, 'g.', markersize=3, alpha=0.2)
 85 | 
 86 | plt.xticks((1,2,3,4),('SpaGE', 'Seurat', 'Liger','gimVI'),size=12)
 87 | plt.yticks(size=8)
 88 | plt.gca().set_ylim([-0.5,1])
 89 | plt.ylabel('Spearman Correlation',size=12)
 90 | ax.set_aspect(aspect=3)
 91 | _,p_val = st.wilcoxon(SpaGE_Corr,Seurat_Corr)
 92 | plt.text(2,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
 93 | _,p_val = st.wilcoxon(SpaGE_Corr,Liger_Corr)
 94 | plt.text(3,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
 95 | _,p_val = st.wilcoxon(SpaGE_Corr,gimVI_Corr)
 96 | plt.text(4,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
 97 | plt.show()
 98 | 
 99 | def Compare_Correlations(X,Y):
100 |     fig, ax = plt.subplots(figsize=(4.5, 4.5))
101 |     ax.scatter(X, Y, s=5)        
102 |     ax.axvline(linestyle='--',color='gray')
103 |     ax.axhline(linestyle='--',color='gray') 
104 |     plt.gca().set_ylim([-0.5,1])
105 |     lims = [np.min([ax.get_xlim(), ax.get_ylim()]),  
106 |         np.max([ax.get_xlim(), ax.get_ylim()])]
107 |     ax.plot(lims, lims, 'k-')
108 |     ax.set_aspect('equal')
109 |     ax.set_xlim(lims)
110 |     ax.set_ylim(lims)
111 |     plt.xticks(size=8)
112 |     plt.yticks(size=8)
113 |     # No plt.show() here: the caller adds the axis labels first, then shows the figure
114 | 
115 | 
116 | Compare_Correlations(Seurat_Corr,SpaGE_Corr)
117 | plt.xlabel('Spearman Correlation Seurat',size=12)
118 | plt.ylabel('Spearman Correlation SpaGE',size=12)
119 | plt.show()
120 | 
121 | Compare_Correlations(Liger_Corr,SpaGE_Corr)
122 | plt.xlabel('Spearman Correlation Liger',size=12)
123 | plt.ylabel('Spearman Correlation SpaGE',size=12)
124 | plt.show()
125 | 
126 | Compare_Correlations(gimVI_Corr,SpaGE_Corr)
127 | plt.xlabel('Spearman Correlation gimVI',size=12)
128 | plt.ylabel('Spearman Correlation SpaGE',size=12)
129 | plt.show()
130 | 
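
The evaluation above scores each gene by the Spearman correlation between its measured and imputed expression across cells. As a minimal sketch of what that statistic computes, the block below implements it in pure Python as the Pearson correlation of average ranks. The function names and the toy gene values are assumptions for illustration; the script itself relies on `scipy.stats.spearmanr`.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for i in order[start:end + 1]:
            ranks[i] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (var_x * var_y) ** 0.5


# Toy data (made up): two genes measured and imputed in four cells.
measured = {"GeneA": [0.0, 1.0, 2.0, 3.0], "GeneB": [5.0, 1.0, 4.0, 2.0]}
imputed = {"GeneA": [0.1, 0.4, 0.9, 1.6], "GeneB": [1.0, 3.0, 2.0, 4.0]}

per_gene = {gene: spearman(measured[gene], imputed[gene]) for gene in measured}
print(per_gene)
```

A constant imputed vector yields NaN here, which is the case the script handles by setting NaN correlations to 0 for gimVI and Liger.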


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/Seurat/MERFISH.R:
--------------------------------------------------------------------------------
 1 | setwd("MERFISH_Moffit/")
 2 | library(Seurat)
 3 | library(Matrix)
 4 | 
 5 | MERFISH <- read.csv(file = "data/MERFISH/Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv", header = TRUE)
 6 | MERFISH_1 <- MERFISH[MERFISH['Animal_ID']==1,]
 7 | MERFISH_1 <- MERFISH_1[MERFISH_1['Cell_class']!='Ambiguous',]
 8 | MERFISH_meta <- MERFISH_1[,c(1:9)]
 9 | MERFISH_data <- MERFISH_1[,c(10:170)]
10 | drops <- c('Blank_1','Blank_2','Blank_3','Blank_4','Blank_5','Fos')
11 | MERFISH_data <- MERFISH_data[ , !(colnames(MERFISH_data) %in% drops)]
12 | MERFISH_data <- t(MERFISH_data)
13 | 
14 | MERFISH_seurat <- CreateSeuratObject(counts = MERFISH_data, project = 'MERFISH', assay = 'RNA', meta.data = MERFISH_meta ,min.cells = -1, min.features = -1)
15 | total.counts = colSums(x = as.matrix(MERFISH_seurat@assays$RNA@counts))
16 | MERFISH_seurat <- NormalizeData(MERFISH_seurat, scale.factor = median(x = total.counts))
17 | MERFISH_seurat <- ScaleData(MERFISH_seurat)
18 | saveRDS(object = MERFISH_seurat, file = 'data/seurat_objects/MERFISH.rds')


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/Seurat/MERFISH_integration.R:
--------------------------------------------------------------------------------
 1 | setwd("MERFISH_Moffit/")
 2 | library(Seurat)
 3 | library(ggplot2)
 4 | 
 5 | MERFISH <- readRDS("data/seurat_objects/MERFISH.rds")
 6 | Moffit <- readRDS("data/seurat_objects/Moffit_RNA.rds")
 7 | 
 8 | genes.leaveout <- intersect(rownames(MERFISH),rownames(Moffit))
 9 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(MERFISH@assays$RNA)[2])
10 | rownames(Imp_genes) <- genes.leaveout
11 | anchor_time <- vector(mode= "numeric")
12 | Transfer_time <- vector(mode= "numeric")
13 | 
14 | run_imputation <- function(ref.obj, query.obj, feature.remove) {
15 |   message(paste0('removing ', feature.remove))
16 |   features <- setdiff(rownames(query.obj), feature.remove)
17 |   DefaultAssay(ref.obj) <- 'RNA'
18 |   DefaultAssay(query.obj) <- 'RNA'
19 |   
20 |   start_time <- Sys.time()
21 |   anchors <- FindTransferAnchors(
22 |     reference = ref.obj,
23 |     query = query.obj,
24 |     features = features,
25 |     dims = 1:30,
26 |     reduction = 'cca'
27 |   )
28 |   end_time <- Sys.time()
29 |   anchor_time <<- c(anchor_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
30 |   
31 |   refdata <- GetAssayData(
32 |     object = ref.obj,
33 |     assay = 'RNA',
34 |     slot = 'data'
35 |   )
36 |   
37 |   start_time <- Sys.time()
38 |   imputation <- TransferData(
39 |     anchorset = anchors,
40 |     refdata = refdata,
41 |     weight.reduction = 'pca'
42 |   )
43 |   query.obj[['seq']] <- imputation
44 |   end_time <- Sys.time()
45 |   Transfer_time <<- c(Transfer_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
46 |   return(query.obj)
47 | }
48 | 
49 | for(i in 1:length(genes.leaveout)) {
50 |   imputed.ss2 <- run_imputation(ref.obj = Moffit, query.obj = MERFISH, feature.remove = genes.leaveout[[i]])
51 |   MERFISH[['ss2']] <- imputed.ss2[, colnames(MERFISH)][['seq']]
52 |   Imp_genes[genes.leaveout[[i]],] = as.vector(MERFISH@assays$ss2[genes.leaveout[i],])
53 | }
54 | write.csv(Imp_genes,file = 'Results/Seurat_LeaveOneOut.csv')
55 | write.csv(anchor_time,file = 'Results/Seurat_anchor_time.csv',row.names = FALSE)
56 | write.csv(Transfer_time,file = 'Results/Seurat_transfer_time.csv',row.names = FALSE)


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/Seurat/Moffit_RNA.R:
--------------------------------------------------------------------------------
 1 | setwd("MERFISH_Moffit/")
 2 | library(Seurat)
 3 | 
 4 | Moffit <- Read10X("data/Moffit_RNA/GSE113576/")
 5 | 
 6 | Mo <- CreateSeuratObject(counts = Moffit, project = 'POR', min.cells = 10)
 7 | Mo <- NormalizeData(object = Mo)
 8 | Mo <- FindVariableFeatures(object = Mo, nfeatures = 2000)
 9 | Mo <- ScaleData(object = Mo)
10 | Mo <- RunPCA(object = Mo, npcs = 50, verbose = FALSE)
11 | Mo <- RunUMAP(object = Mo, dims = 1:50, nneighbors = 5)
12 | saveRDS(object = Mo, file = paste0("data/seurat_objects/","Moffit_RNA.rds"))


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/SpaGE/Integration.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('MERFISH_Moffit/')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | from sklearn.neighbors import NearestNeighbors
 8 | import time as tm
 9 | 
10 | with open ('data/SpaGE_pkl/MERFISH.pkl', 'rb') as f:
11 |     datadict = pickle.load(f)
12 | 
13 | MERFISH_data = datadict['MERFISH_data']
14 | MERFISH_data_scaled = datadict['MERFISH_data_scaled']
15 | MERFISH_meta = datadict['MERFISH_meta']
16 | del datadict
17 | 
18 | with open ('data/SpaGE_pkl/Moffit_RNA.pkl', 'rb') as f:
19 |     datadict = pickle.load(f)
20 |     
21 | RNA_data = datadict['RNA_data']
22 | RNA_data_scaled = datadict['RNA_data_scaled']
23 | del datadict
24 | 
25 | #### Leave One Out Validation ####
26 | Common_data = RNA_data_scaled[np.intersect1d(MERFISH_data_scaled.columns,RNA_data_scaled.columns)]
27 | Imp_Genes = pd.DataFrame(columns=Common_data.columns)
28 | precise_time = []
29 | knn_time = []
30 | for i in Common_data.columns:
31 |     print(i)
32 |     start = tm.time()
33 |     from principal_vectors import PVComputation
34 | 
35 |     n_factors = 50
36 |     n_pv = 50
37 |     dim_reduction = 'pca'
38 |     dim_reduction_target = 'pca'
39 | 
40 |     pv_FISH_RNA = PVComputation(n_factors = n_factors,n_pv = n_pv,dim_reduction = dim_reduction,dim_reduction_target = dim_reduction_target)
41 | 
42 |     pv_FISH_RNA.fit(Common_data.drop(i,axis=1),MERFISH_data_scaled[Common_data.columns].drop(i,axis=1))
43 | 
44 |     S = pv_FISH_RNA.source_components_.T
45 |     
46 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
47 |     S = S[:,0:Effective_n_pv]
48 | 
49 |     Common_data_t = Common_data.drop(i,axis=1).dot(S)
50 |     FISH_exp_t = MERFISH_data_scaled[Common_data.columns].drop(i,axis=1).dot(S)
51 |     precise_time.append(tm.time()-start)
52 |     
53 |     start = tm.time()
54 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',metric = 'cosine').fit(Common_data_t)
55 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
56 | 
57 |     Imp_Gene = np.zeros(MERFISH_data.shape[0])
58 |     for j in range(0,MERFISH_data.shape[0]):
59 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
60 |         weights = weights/(len(weights)-1)
61 |         Imp_Gene[j] = np.sum(np.multiply(RNA_data[i].iloc[indices[j,:][distances[j,:] < 1]],weights))
62 |     Imp_Gene[np.isnan(Imp_Gene)] = 0
63 |     Imp_Genes[i] = Imp_Gene
64 |     knn_time.append(tm.time()-start)
65 | 
66 | Imp_Genes.to_csv('Results/SpaGE_LeaveOneOut.csv')
67 | precise_time = pd.DataFrame(precise_time)
68 | knn_time = pd.DataFrame(knn_time)
69 | precise_time.to_csv('Results/SpaGE_PreciseTime.csv', index = False)
70 | knn_time.to_csv('Results/SpaGE_knnTime.csv', index = False)


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/SpaGE/MERFISH_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('MERFISH_Moffit/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | 
 9 | MERFISH = pd.read_csv('data/MERFISH/Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv')
10 | # Select the 1st replicate (naive female)
11 | MERFISH_1 = MERFISH.loc[MERFISH['Animal_ID']==1,:]
12 | 
13 | # Remove 'Ambiguous' cells; Blank_1 to Blank_5 and Fos are dropped below --> 155 genes
14 | MERFISH_1 = MERFISH_1.loc[MERFISH_1['Cell_class']!='Ambiguous',:]
15 | MERFISH_meta = MERFISH_1.iloc[:,0:9]
16 | MERFISH_data = MERFISH_1.iloc[:,9:171]
17 | MERFISH_data = MERFISH_data.drop(columns = ['Blank_1','Blank_2','Blank_3','Blank_4','Blank_5','Fos'])
18 | del MERFISH, MERFISH_1
19 | 
20 | MERFISH_data = MERFISH_data.T
21 | 
22 | cell_count = np.sum(MERFISH_data,axis=0)
23 | def Log_Norm(x):
24 |     return np.log(((x/np.sum(x))*np.median(cell_count)) + 1)
25 | 
26 | MERFISH_data = MERFISH_data.apply(Log_Norm,axis=0)
27 | MERFISH_data_scaled = pd.DataFrame(data=st.zscore(MERFISH_data.T),index = MERFISH_data.columns,columns=MERFISH_data.index)
28 | 
29 | datadict = dict()
30 | datadict['MERFISH_data'] = MERFISH_data.T
31 | datadict['MERFISH_data_scaled'] = MERFISH_data_scaled
32 | datadict['MERFISH_meta'] = MERFISH_meta
33 | 
34 | with open('data/SpaGE_pkl/MERFISH.pkl','wb') as f:
35 |     pickle.dump(datadict, f)
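
The normalization used in the script above (rescale every cell to the median library size, log-transform, then z-score each gene) can be sketched on a toy matrix. This is an illustrative sketch only: the function name `log_norm_median` and the toy counts are assumptions, and the vectorized division replaces the script's per-column `apply`.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

def log_norm_median(counts):
    """counts: genes x cells DataFrame of raw counts (hypothetical input)."""
    lib_size = counts.sum(axis=0)                    # total counts per cell
    scaled = counts / lib_size * lib_size.median()   # bring every cell to the median depth
    return np.log(scaled + 1)

toy_counts = pd.DataFrame(
    [[1, 2, 0], [3, 2, 4], [0, 4, 4]],
    index=['g1', 'g2', 'g3'], columns=['c1', 'c2', 'c3'], dtype=float,
)
toy_normed = log_norm_median(toy_counts)

# z-score each gene across cells, giving a cells x genes table
toy_scaled = pd.DataFrame(
    st.zscore(toy_normed.T.to_numpy()),
    index=toy_normed.columns, columns=toy_normed.index,
)
```

The scaled table is cells by genes, the same orientation the script stores under `MERFISH_data_scaled`.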


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/SpaGE/Moffit_RNA_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('MERFISH_Moffit/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | import scipy.io as io
 9 | 
10 | genes = pd.read_csv('data/Moffit_RNA/GSE113576/genes.tsv',sep='\t',header=None)
11 | barcodes = pd.read_csv('data/Moffit_RNA/GSE113576/barcodes.tsv',sep='\t',header=None)
12 | 
13 | genes = np.array(genes.loc[:,1])
14 | barcodes = np.array(barcodes.loc[:,0])
15 | RNA_data = io.mmread('data/Moffit_RNA/GSE113576/matrix.mtx')
16 | RNA_data = RNA_data.todense()
17 | RNA_data = pd.DataFrame(RNA_data,index=genes,columns=barcodes)
18 | 
19 | # filter lowly expressed genes (detected in fewer than 10 cells)
20 | Genes_count = np.sum(RNA_data > 0, axis=1)
21 | RNA_data = RNA_data.loc[Genes_count >=10,:]
22 | del Genes_count
23 | 
24 | def Log_Norm(x):
25 |     return np.log(((x/np.sum(x))*1000000) + 1)
26 | 
27 | RNA_data = RNA_data.apply(Log_Norm,axis=0)
28 | RNA_data_scaled = pd.DataFrame(data=st.zscore(RNA_data.T),index = RNA_data.columns,columns=RNA_data.index)
29 | 
30 | datadict = dict()
31 | datadict['RNA_data'] = RNA_data.T
32 | datadict['RNA_data_scaled'] = RNA_data_scaled
33 | 
34 | with open('data/SpaGE_pkl/Moffit_RNA.pkl','wb') as f:
35 |     pickle.dump(datadict, f, protocol=4)


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/SpaGE/Precise_output.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('MERFISH_Moffit/')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | import matplotlib
 8 | matplotlib.rcParams['pdf.fonttype'] = 42
 9 | matplotlib.rcParams['ps.fonttype'] = 42
10 | import matplotlib.pyplot as plt
11 | import seaborn as sns
12 | import sys
13 | sys.path.insert(1,'SpaGE/')
14 | from principal_vectors import PVComputation
15 | 
16 | with open ('data/SpaGE_pkl/MERFISH.pkl', 'rb') as f:
17 |     datadict = pickle.load(f)
18 | 
19 | MERFISH_data = datadict['MERFISH_data']
20 | MERFISH_data_scaled = datadict['MERFISH_data_scaled']
21 | MERFISH_meta = datadict['MERFISH_meta']
22 | del datadict
23 | 
24 | with open ('data/SpaGE_pkl/Moffit_RNA.pkl', 'rb') as f:
25 |     datadict = pickle.load(f)
26 |     
27 | RNA_data = datadict['RNA_data']
28 | RNA_data_scaled = datadict['RNA_data_scaled']
29 | del datadict
30 | 
31 | Common_data = RNA_data_scaled[np.intersect1d(MERFISH_data_scaled.columns,RNA_data_scaled.columns)]
32 | 
33 | n_factors = 50
34 | n_pv = 50
35 | n_pv_display = 50
36 | dim_reduction = 'pca'
37 | dim_reduction_target = 'pca'
38 | 
39 | pv_FISH_RNA = PVComputation(
40 |         n_factors = n_factors,
41 |         n_pv = n_pv,
42 |         dim_reduction = dim_reduction,
43 |         dim_reduction_target = dim_reduction_target
44 | )
45 | 
46 | pv_FISH_RNA.fit(Common_data,MERFISH_data_scaled[Common_data.columns])
47 | 
48 | fig = plt.figure()
49 | sns.heatmap(pv_FISH_RNA.initial_cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
50 |             center=0, vmax=1., vmin=0)
51 | plt.xlabel('MERFISH',fontsize=18, color='black')
52 | plt.ylabel('scRNA-seq',fontsize=18, color='black')
53 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
54 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
55 | plt.gca().set_ylim([n_pv_display,0])
56 | plt.show()
57 | 
58 | plt.figure()
59 | sns.heatmap(pv_FISH_RNA.cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
60 |             center=0, vmax=1., vmin=0)
61 | for i in range(n_pv_display-1):
62 |     plt.text(i+1,i+.7,'%1.2f'%pv_FISH_RNA.cosine_similarity_matrix_[i,i], fontsize=14,color='black')
63 |     
64 | plt.xlabel('MERFISH',fontsize=18, color='black')
65 | plt.ylabel('scRNA-seq',fontsize=18, color='black')
66 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
67 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
68 | plt.gca().set_ylim([n_pv_display,0])
69 | plt.show()
70 | 
71 | Importance = pd.Series(np.sum(pv_FISH_RNA.source_components_**2,axis=0),index=Common_data.columns)
72 | Importance.sort_values(ascending=False,inplace=True)
73 | Importance.index[0:50]
74 | 
75 | ### Technology specific Processes
76 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
77 | 
78 | # explained variance RNA
79 | np.sum(pv_FISH_RNA.source_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
80 | # explained variance spatial
81 | np.sum(pv_FISH_RNA.target_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
82 | 
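
`PVComputation` is imported from `principal_vectors.py`, which is not shown here. As a rough illustration of what the cosine similarity matrix and the 0.3 threshold measure, the sketch below aligns two PCA bases through an SVD of their cross-product. This is the standard principal-angles construction and an assumption about the general idea, not the class's actual code; the function name and the random data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def principal_vector_similarity(source, target, n_factors):
    """Align the PCA subspaces of two datasets that share the same genes."""
    Ps = PCA(n_components=n_factors).fit(source).components_   # (n_factors, n_genes)
    Pt = PCA(n_components=n_factors).fit(target).components_
    # Singular values of Ps @ Pt.T are the cosines of the principal angles
    U, cosines, Vt = np.linalg.svd(Ps @ Pt.T)
    source_pvs = U.T @ Ps    # principal vectors on the source side
    target_pvs = Vt @ Pt     # principal vectors on the target side
    return source_pvs, target_pvs, cosines

rng = np.random.default_rng(0)
toy_source = rng.normal(size=(200, 20))   # stand-in for scaled scRNA-seq data
toy_target = rng.normal(size=(150, 20))   # stand-in for scaled spatial data
src_pv, tgt_pv, cos = principal_vector_similarity(toy_source, toy_target, n_factors=5)

# Keep only well-aligned directions, using the same 0.3 threshold as the script
effective = int(np.sum(cos > 0.3))
```

After alignment, the cross-similarity matrix `src_pv @ tgt_pv.T` is diagonal, which is why the script reads the similarities off the diagonal of `cosine_similarity_matrix_`.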


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/SpaGE/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and can be selected by
 5 | name (as a string) from the main method. All the methods
 6 | use the scikit-learn implementation.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Default linear dimensionality reduction method. For each method, return a
25 |     BaseEstimator instance corresponding to the method given as input.
26 | 	Attributes
27 |     -------
28 |     method: str, default to 'pca'
29 |     	Method used for dimensionality reduction.
30 |     	Implemented: 'pca', 'ica', 'fa' (Factor Analysis),
31 |     	'nmf' (Non-negative matrix factorisation), 'sparsepca' (Sparse PCA),
31 |     	'pls' (Partial Least Squares, via the PLS wrapper class below).
32 |     
33 |     n_dim: int, default to 10
34 |     	Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Classifier, i.e. BaseEstimator instance
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Implement PLS to make it compliant with the other dimensionality
68 |     reduction methodology.
69 |     (Simple class rewriting).
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------
/benchmark/MERFISH_Moffit/gimVI/gimVI.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('MERFISH_Moffit/')
 3 | 
 4 | from scvi.dataset import CsvDataset
 5 | from scvi.models import JVAE, Classifier
 6 | from scvi.inference import JVAETrainer
 7 | import numpy as np
 8 | import pandas as pd
 9 | import copy
10 | import torch
11 | import time as tm
12 | 
13 | ### MERFISH data
14 | MERFISH_data = CsvDataset('data/gimVI_data/MERFISH_data_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
15 | 
16 | ### Moffit scRNA-seq data
17 | RNA_data = CsvDataset('data/gimVI_data/Moffit_RNA_data_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
18 | 
19 | ### Leave-one-out validation
20 | Gene_set = np.intersect1d(MERFISH_data.gene_names,RNA_data.gene_names)
21 | MERFISH_data.gene_names = Gene_set
22 | MERFISH_data.X = MERFISH_data.X[:,np.reshape(np.vstack([np.argwhere(g==MERFISH_data.gene_names) for g in Gene_set]),-1)]
23 | Common_data = copy.deepcopy(RNA_data)
24 | Common_data.gene_names = Gene_set
25 | Common_data.X = Common_data.X[:,np.reshape(np.vstack([np.argwhere(g==RNA_data.gene_names) for g in Gene_set]),-1)]
26 | Imp_Genes = pd.DataFrame(columns=Gene_set)
27 | gimVI_time = []
28 | 
29 | for i in Gene_set:
30 |     print(i)
31 |     # Create copy of the fish dataset with hidden genes
32 |     data_spatial_partial = copy.deepcopy(MERFISH_data)
33 |     data_spatial_partial.filter_genes_by_attribute(np.setdiff1d(MERFISH_data.gene_names,i))
34 |     data_spatial_partial.batch_indices += Common_data.n_batches
35 |     
36 |     if(data_spatial_partial.X.shape[0] != MERFISH_data.X.shape[0]):
37 |         continue
38 |     
39 |     datasets = [Common_data, data_spatial_partial]
40 |     generative_distributions = ["zinb", "nb"]
41 |     gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(g==Common_data.gene_names) for g in data_spatial_partial.gene_names]),-1)]
42 |     n_inputs = [d.nb_genes for d in datasets]
43 |     total_genes = Common_data.nb_genes
44 |     n_batches = sum([d.n_batches for d in datasets])
45 |     
46 |     model_library_size = [True, False]
47 |     
48 |     n_latent = 8
49 |     kappa = 5
50 |     
51 |     start = tm.time()
52 |     torch.manual_seed(0)
53 |     
54 |     model = JVAE(
55 |         n_inputs,
56 |         total_genes,
57 |         gene_mappings,
58 |         generative_distributions,
59 |         model_library_size,
60 |         n_layers_decoder_individual=0,
61 |         n_layers_decoder_shared=0,
62 |         n_layers_encoder_individual=1,
63 |         n_layers_encoder_shared=1,
64 |         dim_hidden_encoder=64,
65 |         dim_hidden_decoder_shared=64,
66 |         dropout_rate_encoder=0.2,
67 |         dropout_rate_decoder=0.2,
68 |         n_batch=n_batches,
69 |         n_latent=n_latent,
70 |     )
71 |     discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
72 |     
73 |     trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
74 |     trainer.train(n_epochs=200)
75 |     _,Imputed = trainer.get_imputed_values(normalized=True)
76 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==Common_data.gene_names)[0]],-1)
77 |     Imp_Genes[i] = Imputed
78 |     gimVI_time.append(tm.time()-start)
79 |     
80 | Imp_Genes = Imp_Genes.fillna(0)    
81 | Imp_Genes.to_csv('Results/gimVI_LeaveOneOut.csv')
82 | gimVI_time = pd.DataFrame(gimVI_time)
83 | gimVI_time.to_csv('Results/gimVI_Time.csv', index = False)
84 | 


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/DownSampling/DownSampling.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('STARmap_AllenVISp/')
  3 | 
  4 | import pickle
  5 | import numpy as np
  6 | import pandas as pd
  7 | from sklearn.neighbors import NearestNeighbors
  8 | import sys
  9 | sys.path.insert(1,'Scripts/SpaGE/')
 10 | from principal_vectors import PVComputation
 11 | 
 12 | with open ('data/SpaGE_pkl/Starmap.pkl', 'rb') as f:
 13 |     datadict = pickle.load(f)
 14 | 
 15 | Starmap_data = datadict['Starmap_data']
 16 | Starmap_data_scaled = datadict['Starmap_data_scaled']
 17 | coords = datadict['coords']
 18 | del datadict
 19 | 
 20 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
 21 |     datadict = pickle.load(f)
 22 |     
 23 | RNA_data = datadict['RNA_data']
 24 | RNA_data_scaled = datadict['RNA_data_scaled']
 25 | del datadict
 26 | 
 27 | all_centroids  = np.vstack([c.mean(0) for c in coords])
 28 | 
 29 | def Moran_I(SpatialData,XYmap):
 30 |     XYnbrs = NearestNeighbors(n_neighbors=5, algorithm='auto',metric = 'euclidean').fit(XYmap)
 31 |     XYdistances, XYindices = XYnbrs.kneighbors(XYmap)
 32 |     W = np.zeros((SpatialData.shape[0],SpatialData.shape[0]))
 33 |     for i in range(0,SpatialData.shape[0]):
 34 |         W[i,XYindices[i,:]]=1
 35 | 
 36 |     for i in range(0,SpatialData.shape[0]):
 37 |         W[i,i]=0
 38 |     
 39 |     I = pd.Series(index=SpatialData.columns, dtype=float)
 40 |     for k in SpatialData.columns:
 41 |         X_minus_mean = np.array(SpatialData[k] - np.mean(SpatialData[k]))
 42 |         X_minus_mean = np.reshape(X_minus_mean,(len(X_minus_mean),1))
 43 |         Nom = np.sum(np.multiply(W,np.matmul(X_minus_mean,X_minus_mean.T)))
 44 |         Den = np.sum(np.multiply(X_minus_mean,X_minus_mean))
 45 |         I[k] = (len(SpatialData[k])/np.sum(W))*(Nom/Den)
 46 |     return(I)
 47 |     
 48 | Moran_Is = Moran_I(Starmap_data,all_centroids)
 49 | 
 50 | Gene_Order = np.intersect1d(Starmap_data.columns,RNA_data.columns)
 51 | Moran_Is = Moran_Is[Gene_Order]
 52 | 
 53 | Moran_Is.sort_values(ascending=False,inplace=True)
 54 | test_set = Moran_Is.index[0:50]
 55 | 
 56 | Common_data = RNA_data[np.intersect1d(Starmap_data.columns,RNA_data.columns)]
 57 | Common_data = Common_data.drop(columns=test_set)
 58 | 
 59 | Corr = np.corrcoef(Common_data.T)
 60 | removed_genes = []
 61 | for i in range(0,Corr.shape[0]):
 62 |     for j in range(i+1,Corr.shape[0]):
 63 |         if(np.abs(Corr[i,j]) > 0.7):
 64 |             Vi = np.var(Common_data.iloc[:,i])
 65 |             Vj = np.var(Common_data.iloc[:,j])
 66 |             if(Vi > Vj):
 67 |                 removed_genes.append(Common_data.columns[j])
 68 |             else:
 69 |                 removed_genes.append(Common_data.columns[i])
 70 | removed_genes= np.unique(removed_genes)
 71 | 
 72 | Common_data = Common_data.drop(columns=removed_genes)
 73 | Variance = np.var(Common_data)
 74 | Variance.sort_values(ascending=False,inplace=True)
 75 | Variance = pd.concat([Variance, pd.Series(0,index=removed_genes)])
 76 | 
 77 | genes_to_impute = test_set
 78 | for i in [10,30,50,100,200,500,len(Variance)]:
 79 |     Imp_New_Genes = pd.DataFrame(np.zeros((Starmap_data.shape[0],len(genes_to_impute))),columns=genes_to_impute)
 80 |     
 81 |     if(i>=50):
 82 |         n_factors = 50
 83 |         n_pv = 50
 84 |     else:
 85 |         n_factors = i
 86 |         n_pv = i
 87 |     
 88 |     dim_reduction = 'pca'
 89 |     dim_reduction_target = 'pca'
 90 |     
 91 |     pv_FISH_RNA = PVComputation(
 92 |             n_factors = n_factors,
 93 |             n_pv = n_pv,
 94 |             dim_reduction = dim_reduction,
 95 |             dim_reduction_target = dim_reduction_target
 96 |     )
 97 |     
 98 |     source_data = RNA_data_scaled[Variance.index[0:i]]
 99 |     target_data = Starmap_data_scaled[Variance.index[0:i]]
100 |     
101 |     pv_FISH_RNA.fit(source_data,target_data)
102 |     
103 |     S = pv_FISH_RNA.source_components_.T
104 |     
105 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
106 |     S = S[:,0:Effective_n_pv]
107 |     
108 |     RNA_data_t = source_data.dot(S)
109 |     FISH_exp_t = target_data.dot(S)
110 |         
111 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',
112 |                             metric = 'cosine').fit(RNA_data_t)
113 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
114 |      
115 |     for j in range(0,Starmap_data.shape[0]):
116 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
117 |         weights = weights/(len(weights)-1)
118 |         Imp_New_Genes.iloc[j,:] = np.dot(weights,RNA_data[genes_to_impute].iloc[indices[j,:][distances[j,:] < 1]])
119 |         
120 |     Imp_New_Genes.to_csv('Results/' + str(i) +'SpaGE_New_genes.csv')
121 | 
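
`Moran_I` above builds an N x N outer product for every gene. The numerator is the quadratic form `z.T @ W @ z`, so the statistic can be computed without the outer product. The sketch below is a hedged, single-gene rewrite with hypothetical names (`moran_i`, the toy coordinates); it keeps the script's convention that `n_neighbors` counts the cell itself, which is then removed from the weight matrix.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def moran_i(values, xy, n_neighbors=5):
    """Moran's I for one gene. values: (n_cells,), xy: (n_cells, 2)."""
    values = np.asarray(values, dtype=float)
    n = values.shape[0]
    _, idx = NearestNeighbors(n_neighbors=n_neighbors).fit(xy).kneighbors(xy)
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), idx.shape[1]), idx.ravel()] = 1   # binary kNN weights
    np.fill_diagonal(W, 0)                                      # a cell is not its own neighbor
    z = values - values.mean()
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Toy layout: 20 cells on a line
toy_xy = np.column_stack([np.arange(20.0), np.zeros(20)])
smooth = np.arange(20.0)                             # expression follows position
alternating = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)   # neighbors disagree

i_smooth = moran_i(smooth, toy_xy, n_neighbors=3)
i_alternating = moran_i(alternating, toy_xy, n_neighbors=3)
```

A spatially smooth pattern gives a positive statistic and an alternating pattern a negative one, which is the property the benchmark relies on when it ranks genes by Moran's I.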


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/DownSampling/DownSampling_evaluation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('STARmap_AllenVISp/')
  3 | 
  4 | import numpy as np
  5 | import pandas as pd
  6 | import matplotlib
  7 | matplotlib.use('qt5agg')
  8 | matplotlib.rcParams['pdf.fonttype'] = 42
  9 | matplotlib.rcParams['ps.fonttype'] = 42
 10 | import matplotlib.pyplot as plt
 11 | import scipy.stats as st
 12 | import pickle
 13 | 
 14 | with open ('data/SpaGE_pkl/Starmap.pkl', 'rb') as f:
 15 |     datadict = pickle.load(f)
 16 | 
 17 | Starmap_data = datadict['Starmap_data']
 18 | del datadict
 19 | 
 20 | test_set = pd.read_csv('Results/10SpaGE_New_genes.csv',header=0,index_col=0,sep=',').columns
 21 | Starmap_data = Starmap_data[test_set]
 22 | 
 23 | ### SpaGE
 24 | #10
 25 | SpaGE_imputed_10 = pd.read_csv('Results/10SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 26 | 
 27 | SpaGE_Corr_10 = pd.Series(index = test_set)
 28 | for i in test_set:
 29 |     SpaGE_Corr_10[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed_10[i])[0]
 30 |   
 31 | #30
 32 | SpaGE_imputed_30 = pd.read_csv('Results/30SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 33 | 
 34 | SpaGE_Corr_30 = pd.Series(index = test_set)
 35 | for i in test_set:
 36 |     SpaGE_Corr_30[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed_30[i])[0]
 37 | 
 38 | #50    
 39 | SpaGE_imputed_50 = pd.read_csv('Results/50SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 40 | 
 41 | SpaGE_Corr_50 = pd.Series(index = test_set)
 42 | for i in test_set:
 43 |     SpaGE_Corr_50[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed_50[i])[0]
 44 | 
 45 | #100    
 46 | SpaGE_imputed_100 = pd.read_csv('Results/100SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 47 | 
 48 | SpaGE_Corr_100 = pd.Series(index = test_set)
 49 | for i in test_set:
 50 |     SpaGE_Corr_100[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed_100[i])[0]
 51 | 
 52 | #200    
 53 | SpaGE_imputed_200 = pd.read_csv('Results/200SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 54 | 
 55 | SpaGE_Corr_200 = pd.Series(index = test_set)
 56 | for i in test_set:
 57 |     SpaGE_Corr_200[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed_200[i])[0]
 58 |     
 59 | #500    
 60 | SpaGE_imputed_500 = pd.read_csv('Results/500SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 61 | 
 62 | SpaGE_Corr_500 = pd.Series(index = test_set)
 63 | for i in test_set:
 64 |     SpaGE_Corr_500[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed_500[i])[0]
 65 | 
 66 | #944    
 67 | SpaGE_imputed_944 = pd.read_csv('Results/944SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 68 | 
 69 | SpaGE_Corr_944 = pd.Series(index = test_set)
 70 | for i in test_set:
 71 |     SpaGE_Corr_944[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed_944[i])[0]
 72 | 
 73 | ### Comparison plots
 74 | plt.style.use('ggplot')
 75 | plt.figure(figsize=(9, 3))
 76 | plt.boxplot([SpaGE_Corr_10, SpaGE_Corr_30, SpaGE_Corr_50,
 77 |              SpaGE_Corr_100,SpaGE_Corr_200,SpaGE_Corr_500,SpaGE_Corr_944])
 78 | 
 79 | y = SpaGE_Corr_10
 80 | x = np.random.normal(1, 0.05, len(y))
 81 | plt.plot(x, y, 'g.', alpha=0.2)
 82 | y = SpaGE_Corr_30
 83 | x = np.random.normal(2, 0.05, len(y))
 84 | plt.plot(x, y, 'g.', alpha=0.2)
 85 | y = SpaGE_Corr_50
 86 | x = np.random.normal(3, 0.05, len(y))
 87 | plt.plot(x, y, 'g.', alpha=0.2)
 88 | y = SpaGE_Corr_100
 89 | x = np.random.normal(4, 0.05, len(y))
 90 | plt.plot(x, y, 'g.', alpha=0.2)
 91 | y = SpaGE_Corr_200
 92 | x = np.random.normal(5, 0.05, len(y))
 93 | plt.plot(x, y, 'g.', alpha=0.2)
 94 | y = SpaGE_Corr_500
 95 | x = np.random.normal(6, 0.05, len(y))
 96 | plt.plot(x, y, 'g.', alpha=0.2)
 97 | y = SpaGE_Corr_944
 98 | x = np.random.normal(7, 0.05, len(y))
 99 | plt.plot(x, y, 'g.', alpha=0.2)
100 | 
101 | 
102 | plt.xticks((1,2,3,4,5,6,7),('10','30', '50','100','200','500','944'),size=10)
103 | plt.yticks(size=8)
104 | plt.xlabel('Number of shared genes',size=12)
105 | plt.gca().set_ylim([-0.3,0.8])
106 | plt.ylabel('Spearman Correlation',size=12)
107 | plt.show()
108 | 
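
The seven near-identical blocks above differ only in the number of shared genes. A small helper makes the pattern explicit. This is a sketch under assumptions: `per_gene_spearman` is a hypothetical name and the toy tables stand in for the measured and imputed cells x genes data.

```python
import pandas as pd
import scipy.stats as st

def per_gene_spearman(measured, imputed, genes):
    """Spearman correlation per gene between two cells x genes DataFrames."""
    corr = pd.Series(index=genes, dtype=float)
    for g in genes:
        corr[g] = st.spearmanr(measured[g], imputed[g])[0]
    return corr

# Toy data: gene A is imputed in the right order, gene B in reverse order
toy_measured = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0], 'B': [4.0, 3.0, 2.0, 1.0]})
toy_imputed = pd.DataFrame({'A': [0.1, 0.2, 0.3, 0.4], 'B': [1.0, 2.0, 3.0, 4.0]})
toy_corr = per_gene_spearman(toy_measured, toy_imputed, ['A', 'B'])
```

In the script, the helper could be called once per size (10, 30, 50, 100, 200, 500, 944) after reading the matching `Results/<size>SpaGE_New_genes.csv` file, with the results collected in a dictionary keyed by size.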


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/Liger/LIGER.R:
--------------------------------------------------------------------------------
 1 | setwd("STARmap_AllenVISp/")
 2 | library(liger)
 3 | library(Seurat)
 4 | library(ggplot2)
 5 | 
 6 | # allen VISp
 7 | allen <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_exon-matrix.csv",
 8 |                     row.names = 1, sep = ',', stringsAsFactors = FALSE, header = TRUE)
 9 | allen <- as.matrix(x = allen)
10 | genes <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_genes-rows.csv",
11 |                     sep = ',', stringsAsFactors = FALSE, header = TRUE)
12 | rownames(x = allen) <- make.unique(names = genes$gene_symbol)
13 | meta.data <- read.csv(file = "data/Allen_VISp/mouse_VISp_2018-06-14_samples-columns.csv",
14 |                       row.names = 1, stringsAsFactors = FALSE)
15 | 
16 | Genes_count = rowSums(allen > 0)
17 | allen <- allen[Genes_count>=10,]
18 | 
19 | low.q.cells <- rownames(x = meta.data[meta.data$class %in% c('Low Quality', 'No Class'), ])
20 | ok.cells <- rownames(x = meta.data)[!(rownames(x = meta.data) %in% low.q.cells)]
21 | allen <-allen[,ok.cells]
22 | 
23 | # STARmap
24 | counts <- read.table(file = "data/Starmap/visual_1020/20180505_BY3_1kgenes/cell_barcode_count.csv",
25 |                      sep = ",", stringsAsFactors = FALSE)
26 | gene.names <- read.table(file = "data/Starmap/visual_1020/20180505_BY3_1kgenes/genes.csv",
27 |                          sep = ",", stringsAsFactors = FALSE)
28 | 
29 | qhulls <- read.table(file = "data/Starmap/visual_1020/20180505_BY3_1kgenes/qhulls.tsv",
30 |                      sep = '\t', col.names = c('cell', 'y', 'x'), stringsAsFactors = FALSE)
31 | centroids <- read.table(file = "data/Starmap/visual_1020/20180505_BY3_1kgenes/centroids.tsv",
32 |                         sep = "\t", col.names = c("spatial2", "spatial1"), stringsAsFactors = FALSE)
33 | colnames(counts) <- gene.names$V1
34 | rownames(counts) <- paste0('starmap', seq(1:nrow(counts)))
35 | counts <- as.matrix(counts)
36 | rownames(centroids) <- rownames(counts)
37 | centroids <- as.matrix(centroids)
38 | counts <- t(counts)
39 | 
40 | Gene_set <- intersect(rownames(counts),rownames(allen))
41 | 
42 | #### New genes prediction
43 | Ligerex <- createLiger(list(STARmap = counts, AllenVISp = allen))
44 | Ligerex <- normalize(Ligerex)
45 | Ligerex@var.genes <- Gene_set
46 | Ligerex <- scaleNotCenter(Ligerex)
47 | 
48 | # suggestK(Ligerex) # K = 25
49 | # suggestLambda(Ligerex, k = 25) # Lambda = 35
50 | 
51 | Ligerex <- optimizeALS(Ligerex,k = 25, lambda = 35)
52 | Ligerex <- quantileAlignSNF(Ligerex)
53 | 
54 | Imputation <- imputeKNN(Ligerex,reference = 'AllenVISp', queries = list('STARmap'), norm = TRUE, scale = FALSE)
55 | 
56 | new.genes <- c('Tesc', 'Pvrl3', 'Sox10', 'Grm2', 'Tcrb')
57 | Imp_New_genes <- matrix(0,nrow= length(new.genes),ncol = dim(Imputation@norm.data$STARmap)[2])
58 | rownames(Imp_New_genes) <- new.genes
59 | 
60 | for (i in c(1:length(new.genes))){
61 |   Imp_New_genes[new.genes[[i]],] = as.vector(Imputation@norm.data$STARmap[new.genes[i],])
62 | }
63 | write.csv(Imp_New_genes,file = 'Results/Liger_New_genes.csv')
64 | 
65 | # leave-one-out validation
66 | genes.leaveout <- Gene_set
67 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(counts)[2])
68 | rownames(Imp_genes) <- genes.leaveout
69 | colnames(Imp_genes) <- colnames(counts)
70 | NMF_time <- vector(mode= "numeric")
71 | knn_time <- vector(mode= "numeric")
72 | 
73 | for(i in 1:length(genes.leaveout)) {
74 |   print(i)
75 |   start_time <- Sys.time()
76 |   Ligerex.leaveout <- createLiger(list(STARmap = counts[-which(rownames(counts) %in% genes.leaveout[i]),], AllenVISp = allen))
77 |   Ligerex.leaveout <- normalize(Ligerex.leaveout)
78 |   Ligerex.leaveout@var.genes <- setdiff(Gene_set,genes.leaveout[i])
79 |   Ligerex.leaveout <- scaleNotCenter(Ligerex.leaveout)
80 |   Ligerex.leaveout <- optimizeALS(Ligerex.leaveout,k = 25, lambda = 35)
81 |   Ligerex.leaveout <- quantileAlignSNF(Ligerex.leaveout)
82 |   end_time <- Sys.time()
83 |   NMF_time <- c(NMF_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
84 |   
85 |   start_time <- Sys.time()
86 |   Imputation <- imputeKNN(Ligerex.leaveout,reference = 'AllenVISp', queries = list('STARmap'), norm = TRUE, scale = FALSE)
87 |   Imp_genes[genes.leaveout[[i]],] = as.vector(Imputation@norm.data$STARmap[genes.leaveout[i],])
88 |   end_time <- Sys.time()
89 |   knn_time <- c(knn_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
90 | }
91 | write.csv(Imp_genes,file = 'Results/Liger_LeaveOneOut.csv')
92 | write.csv(NMF_time,file = 'Results/Liger_NMF_time.csv',row.names = FALSE)
93 | write.csv(knn_time,file = 'Results/Liger_knn_time.csv',row.names = FALSE)


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/Performance_evaluation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('STARmap_AllenVISp/')
  3 | 
  4 | import numpy as np
  5 | import pandas as pd
  6 | import pickle
  7 | import matplotlib
  8 | matplotlib.use('qt5agg')
  9 | matplotlib.rcParams['pdf.fonttype'] = 42
 10 | matplotlib.rcParams['ps.fonttype'] = 42
 11 | import matplotlib.pyplot as plt
 12 | import scipy.stats as st
 13 | from sklearn.neighbors import NearestNeighbors
 14 | from matplotlib import cm
 15 | 
 16 | with open ('data/SpaGE_pkl/Starmap.pkl', 'rb') as f:
 17 |     datadict = pickle.load(f)
 18 | 
 19 | coords = datadict['coords']
 20 | Starmap_data = datadict['Starmap_data']
 21 | del datadict
 22 | 
 23 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
 24 |     datadict = pickle.load(f)
 25 |     
 26 | RNA_data = datadict['RNA_data']
 27 | del datadict
 28 | 
 29 | all_centroids  = np.vstack([c.mean(0) for c in coords])
 30 | 
 31 | plt.style.use('dark_background')
 32 | cmap = plt.get_cmap('viridis',20)
 33 | 
 34 | def Moran_I(SpatialData,XYmap):
 35 |     XYnbrs = NearestNeighbors(n_neighbors=5, algorithm='auto',metric = 'euclidean').fit(XYmap)
 36 |     XYdistances, XYindices = XYnbrs.kneighbors(XYmap)
 37 |     W = np.zeros((SpatialData.shape[0],SpatialData.shape[0]))
 38 |     for i in range(0,SpatialData.shape[0]):
 39 |         W[i,XYindices[i,:]]=1
 40 | 
 41 |     for i in range(0,SpatialData.shape[0]):
 42 |         W[i,i]=0
 43 |     
 44 |     I = pd.Series(index=SpatialData.columns, dtype=float)
 45 |     for k in SpatialData.columns:
 46 |         X_minus_mean = np.array(SpatialData[k] - np.mean(SpatialData[k]))
 47 |         X_minus_mean = np.reshape(X_minus_mean,(len(X_minus_mean),1))
 48 |         Nom = np.sum(np.multiply(W,np.matmul(X_minus_mean,X_minus_mean.T)))
 49 |         Den = np.sum(np.multiply(X_minus_mean,X_minus_mean))
 50 |         I[k] = (len(SpatialData[k])/np.sum(W))*(Nom/Den)
 51 |     return(I)
 52 |     
 53 | Moran_Is = Moran_I(Starmap_data,all_centroids)
 54 |     
 55 | Gene_Order = np.intersect1d(Starmap_data.columns,RNA_data.columns)
 56 | 
 57 | Moran_Is = Moran_Is[Gene_Order]
 58 | 
 59 | ### SpaGE
 60 | SpaGE_imputed = pd.read_csv('Results/SpaGE_LeaveOneOut.csv',header=0,index_col=0,sep=',')
 61 | 
 62 | SpaGE_imputed = SpaGE_imputed.loc[:,Gene_Order]
 63 | 
 64 | SpaGE_Corr = pd.Series(index = Gene_Order)
 65 | for i in Gene_Order:
 66 |     SpaGE_Corr[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed[i])[0]
 67 |     
 68 | ### gimVI
 69 | gimVI_imputed = pd.read_csv('Results/gimVI_LeaveOneOut.csv',header=0,index_col=0,sep=',')
 70 | 
 71 | gimVI_imputed.columns = Gene_Order
 72 | 
 73 | gimVI_Corr = pd.Series(index = Gene_Order, dtype = float)
 74 | for i in Gene_Order:
 75 |     gimVI_Corr[i] = st.spearmanr(Starmap_data[i],gimVI_imputed[i])[0]
 76 | gimVI_Corr[np.isnan(gimVI_Corr)] = 0
 77 | 
 78 | ### Seurat
 79 | Seurat_imputed = pd.read_csv('Results/Seurat_LeaveOneOut.csv',header=0,index_col=0,sep=',').T
 80 | 
 81 | Seurat_imputed = Seurat_imputed.loc[:,Gene_Order]
 82 | Seurat_imputed.index = range(0,Seurat_imputed.shape[0])
 83 | cell_labels = pd.read_csv('data/Starmap/visual_1020/20180505_BY3_1kgenes/class_labels.csv',
 84 |                           header=0,sep=',')
 85 | Starmap_data_Seurat = Starmap_data.drop(np.where(cell_labels['ClusterName']=='HPC')[0],axis=0)
 86 | 
 87 | Seurat_Corr = pd.Series(index = Gene_Order)
 88 | for i in Gene_Order:
 89 |     Seurat_Corr[i] = st.spearmanr(Starmap_data_Seurat[i],Seurat_imputed[i])[0]
 90 | 
 91 | ### Liger
 92 | Liger_imputed = pd.read_csv('Results/Liger_LeaveOneOut.csv',header=0,index_col=0,sep=',').T
 93 | 
 94 | Liger_imputed = Liger_imputed.loc[:,Gene_Order]
 95 | Liger_imputed.index = range(0,Liger_imputed.shape[0])
 96 | 
 97 | Liger_Corr = pd.Series(index = Gene_Order, dtype = float)
 98 | for i in Gene_Order:
 99 |     Liger_Corr[i] = st.spearmanr(Starmap_data[i],Liger_imputed[i])[0]
100 | Liger_Corr[np.isnan(Liger_Corr)] = 0
101 | 
102 | ### Comparison plots
103 | plt.style.use('ggplot')
104 | fig, ax = plt.subplots(figsize=(3.7, 5.5))
105 | ax.boxplot([SpaGE_Corr,Seurat_Corr, Liger_Corr,gimVI_Corr])
106 | 
107 | y = SpaGE_Corr
108 | x = np.random.normal(1, 0.05, len(y))
109 | plt.plot(x, y, 'g.', markersize=1, alpha=0.2)
110 | y = Seurat_Corr
111 | x = np.random.normal(2, 0.05, len(y))
112 | plt.plot(x, y, 'g.', markersize=1, alpha=0.2)
113 | y = Liger_Corr
114 | x = np.random.normal(3, 0.05, len(y))
115 | plt.plot(x, y, 'g.', markersize=1, alpha=0.2)
116 | y = gimVI_Corr
117 | x = np.random.normal(4, 0.05, len(y))
118 | plt.plot(x, y, 'g.', markersize=1, alpha=0.2)
119 | 
120 | plt.xticks((1,2,3,4),('SpaGE', 'Seurat', 'Liger','gimVI'),size=12)
121 | plt.yticks(size=8)
122 | plt.gca().set_ylim([-0.5,1])
123 | plt.ylabel('Spearman Correlation',size=12)
124 | ax.set_aspect(aspect=3)
125 | _,p_val = st.wilcoxon(SpaGE_Corr,Seurat_Corr)
126 | plt.text(2,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
127 | _,p_val = st.wilcoxon(SpaGE_Corr,Liger_Corr)
128 | plt.text(3,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
129 | _,p_val = st.wilcoxon(SpaGE_Corr,gimVI_Corr)
130 | plt.text(4,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
131 | plt.show()
132 | 
133 | def Compare_Correlations(X,Y):
134 |     fig, ax = plt.subplots(figsize=(5.2, 5.2))
135 |     cmap = Moran_Is
136 |     ax.axvline(linestyle='--',color='gray')
137 |     ax.axhline(linestyle='--',color='gray')
138 |     im = ax.scatter(X, Y, s=1, c=cmap)
139 |     im.set_cmap('seismic')
140 |     plt.gca().set_ylim([-0.5,1])
141 |     lims = [np.min([ax.get_xlim(), ax.get_ylim()]),  
142 |         np.max([ax.get_xlim(), ax.get_ylim()])]
143 |     ax.plot(lims, lims, 'k-')
144 |     ax.set_aspect('equal')
145 |     ax.set_xlim(lims)
146 |     ax.set_ylim(lims)
147 |     plt.xticks((-0.4,-0.2,0,0.2,0.4,0.6,0.8,1),size=8)
148 |     plt.yticks((-0.4,-0.2,0,0.2,0.4,0.6,0.8,1),size=8)
149 |     cbar = plt.colorbar(im)
150 |     cbar.ax.tick_params(labelsize=8) 
151 |     cbar.ax.set_ylabel("Moran's I statistic",fontsize=12)
 152 |     # No plt.show() here: callers add the axis labels first, then call plt.show()
153 | 
154 | Compare_Correlations(Seurat_Corr,SpaGE_Corr)
155 | plt.xlabel('Spearman Correlation Seurat',size=12)
156 | plt.ylabel('Spearman Correlation SpaGE',size=12)
157 | plt.show()
158 | 
159 | Compare_Correlations(Liger_Corr,SpaGE_Corr)
160 | plt.xlabel('Spearman Correlation Liger',size=12)
161 | plt.ylabel('Spearman Correlation SpaGE',size=12)
162 | plt.show()
163 | 
164 | Compare_Correlations(gimVI_Corr,SpaGE_Corr)
165 | plt.xlabel('Spearman Correlation gimVI',size=12)
166 | plt.ylabel('Spearman Correlation SpaGE',size=12)
167 | plt.show()
168 | 
169 | def Correlation_vs_Moran(X,Y):
170 |     fig, ax = plt.subplots(figsize=(4.8, 4.8))
171 |     ax.scatter(X, Y, s=1)
172 |     Corr = st.spearmanr(X,Y)[0]
173 |     plt.text(np.mean(plt.gca().get_xlim()),np.min(plt.gca().get_ylim()),'%1.3f'%Corr,color='black',size=9)
174 |     plt.xticks(size=8)
175 |     plt.yticks(size=8)
176 |     plt.axis('scaled')
177 |     plt.gca().set_ylim([-0.5,1])
178 |     plt.gca().set_xlim([-0.2,1])
 179 |     # No plt.show() here: callers add the axis labels first, then call plt.show()
180 | 
181 | 
182 | Correlation_vs_Moran(Moran_Is,SpaGE_Corr)
183 | plt.xlabel("Moran's I",size=12)
184 | plt.ylabel('Spearman Correlation SpaGE',size=12)
185 | plt.show()
186 | 
187 | Correlation_vs_Moran(Moran_Is,Seurat_Corr)
188 | plt.xlabel("Moran's I",size=12)
189 | plt.ylabel('Spearman Correlation Seurat',size=12)
190 | plt.show()
191 | 
192 | Correlation_vs_Moran(Moran_Is,Liger_Corr)
193 | plt.xlabel("Moran's I",size=12)
194 | plt.ylabel('Spearman Correlation Liger',size=12)
195 | plt.show()
196 | 
197 | Correlation_vs_Moran(Moran_Is,gimVI_Corr)
198 | plt.xlabel("Moran's I",size=12)
199 | plt.ylabel('Spearman Correlation gimVI',size=12)
200 | plt.show()
201 | 
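The per-gene scoring used throughout this evaluation script can be summarised in one helper. The following is a minimal sketch, not part of the original script: the function name `per_gene_spearman` and the toy data are hypothetical, and mapping NaN to 0 mirrors what the script does for gimVI and Liger.

```python
import numpy as np
import pandas as pd
import scipy.stats as st


def per_gene_spearman(measured, imputed, genes):
    """Spearman correlation per gene between measured and imputed expression."""
    corr = pd.Series(index=genes, dtype=float)
    for gene in genes:
        corr[gene] = st.spearmanr(measured[gene], imputed[gene])[0]
    # A constant column gives an undefined (NaN) correlation; score it as 0.
    return corr.fillna(0)


# Hypothetical toy data: 3 genes measured in 6 cells.
measured = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                         'B': [5, 4, 3, 2, 1, 0],
                         'C': [1, 1, 1, 1, 1, 1]})
imputed = pd.DataFrame({'A': [0.1, 0.9, 2.2, 2.9, 4.5, 5.1],
                        'B': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                        'C': [0.3, 0.1, 0.2, 0.5, 0.4, 0.6]})
toy_corr = per_gene_spearman(measured, imputed, ['A', 'B', 'C'])
```

Gene `C` is constant in the measured data, so SciPy warns and returns NaN, which the helper converts to 0.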


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/Seurat/allen_brain.R:
--------------------------------------------------------------------------------
 1 | setwd("STARmap_AllenVISp/")
 2 | library(Seurat)
 3 | 
 4 | allen <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_exon-matrix.csv",
 5 |                     row.names = 1,sep = ',', stringsAsFactors = FALSE, header = TRUE)
 6 | allen <- as.matrix(x = allen)
 7 | genes <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_genes-rows.csv",
 8 |                     sep = ',', stringsAsFactors = FALSE, header = TRUE)
 9 | rownames(x = allen) <- make.unique(names = genes$gene_symbol)
10 | meta.data <- read.csv(file = "data/Allen_VISp/mouse_VISp_2018-06-14_samples-columns.csv",
11 |                       row.names = 1, stringsAsFactors = FALSE)
12 | 
13 | al <- CreateSeuratObject(counts = allen, project = 'VISp', meta.data = meta.data, min.cells = 10)
14 | low.q.cells <- rownames(x = meta.data[meta.data$class %in% c('Low Quality', 'No Class'), ])
15 | ok.cells <- rownames(x = meta.data)[!(rownames(x = meta.data) %in% low.q.cells)]
16 | al <- al[, ok.cells]
17 | al <- NormalizeData(object = al)
18 | al <- FindVariableFeatures(object = al, nfeatures = 2000)
19 | al <- ScaleData(object = al)
20 | al <- RunPCA(object = al, npcs = 50, verbose = FALSE)
21 | al <- RunUMAP(object = al, dims = 1:50, n.neighbors = 5)
22 | saveRDS(object = al, file = paste0("data/seurat_objects/","allen_brain.rds"))


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/Seurat/impute_starmap.R:
--------------------------------------------------------------------------------
 1 | setwd("STARmap_AllenVISp/")
 2 | library(Seurat)
 3 | 
 4 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
 5 | allen <- readRDS("data/seurat_objects/allen_brain.rds")
 6 | 
 7 | # remove HPC from starmap
 8 | class_labels <- read.table(
 9 |   file = "data/Starmap/visual_1020/20180505_BY3_1kgenes/class_labels.csv",
10 |   sep = ",",
11 |   header = TRUE,
12 |   stringsAsFactors = FALSE
13 | )
14 | 
15 | class_labels$cellname <- paste0('starmap', rownames(class_labels))
16 | 
17 | class_labels$ClusterName <- ifelse(is.na(class_labels$ClusterName), 'Other', class_labels$ClusterName)
18 | 
19 | hpc <- class_labels[class_labels$ClusterName == 'HPC', ]$cellname
20 | 
21 | accept.cells <- setdiff(colnames(starmap), hpc)
22 | 
23 | starmap <- starmap[, accept.cells]
24 | 
25 | starmap@misc$spatial <- starmap@misc$spatial[starmap@misc$spatial$cell %in% accept.cells, ]
26 | 
27 | #Project on allen labels
28 | i2 <- FindTransferAnchors(
29 |   reference = allen,
30 |   query = starmap,
31 |   features = rownames(starmap),
32 |   reduction = 'cca',
33 |   reference.assay = 'RNA',
34 |   query.assay = 'RNA'
35 | )
36 | 
37 | refdata <- GetAssayData(
38 |   object = allen,
39 |   assay = 'RNA',
40 |   slot = 'data'
41 | )
42 | 
43 | imputation <- TransferData(
44 |   anchorset = i2,
45 |   refdata = refdata,
46 |   weight.reduction = 'pca'
47 | )
48 | 
49 | starmap[['ss2']] <- imputation
50 | saveRDS(starmap, 'data/seurat_objects/20180505_BY3_1kgenes_imputed.rds')


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/Seurat/starmap.R:
--------------------------------------------------------------------------------
 1 | setwd("STARmap_AllenVISp/")
 2 | library(Seurat)
 3 | library(Matrix)
 4 | 
 5 | read_data <- function(base_path, project) {
 6 |   counts <- read.table(
 7 |     file = paste0(base_path, "cell_barcode_count.csv"),
 8 |     sep = ",",
 9 |     stringsAsFactors = FALSE
10 |   )
11 |   gene.names <- read.table(
12 |     file = paste0(base_path, "genes.csv"),
13 |     sep = ",",
14 |     stringsAsFactors = FALSE
15 |   )
16 |   qhulls <- read.table(
17 |     file = paste0(base_path, "qhulls.tsv"),
18 |     sep = '\t',
19 |     col.names = c('cell', 'y', 'x'),
20 |     stringsAsFactors = FALSE
21 |   )
22 |   centroids <- read.table(
23 |     file = paste0(base_path, "centroids.tsv"),
24 |     sep = "\t",
25 |     col.names = c("spatial2", "spatial1"),
26 |     stringsAsFactors = FALSE
27 |   )
28 |   colnames(x = counts) <- gene.names$V1
29 |   rownames(x = counts) <- paste0('starmap', seq_len(nrow(x = counts)))
30 |   counts <- as.matrix(x = counts)
31 |   rownames(x = centroids) <- rownames(x = counts)
32 |   centroids <- as.matrix(x = centroids)
33 |   total.counts = rowSums(x = counts)
34 |   
35 |   obj <- CreateSeuratObject(
36 |     counts = t(x = counts),
37 |     project = project,
38 |     min.cells = -1,
39 |     min.features = -1
40 |   )
41 |   obj <- NormalizeData(
42 |     object = obj,
43 |     scale.factor = median(x = total.counts)
44 |   )
45 |   obj <- ScaleData(
46 |     object = obj,
47 |     features = rownames(x = obj)
48 |   )
49 |   obj <- RunPCA(
50 |     object = obj,
51 |     features = rownames(x = obj),
52 |     npcs = 30
53 |   )
54 |   obj <- RunUMAP(
55 |     object = obj,
56 |     dims = 1:30
57 |   )
58 |   obj[['spatial']] <- CreateDimReducObject(
59 |     embeddings = centroids,
60 |     assay = "RNA",
61 |     key = 'spatial'
62 |   )
63 |   qhulls$cell <- paste0('starmap', qhulls$cell)
64 |   obj@misc[['spatial']] <- qhulls
65 |   return(obj)
66 | }
67 | 
68 | # experiments <- c("visual_1020/20180505_BY3_1kgenes/", "visual_1020/20180410-BY3_1kgenes/")
69 | experiments <- c("data/Starmap/visual_1020/20180505_BY3_1kgenes/")
70 | 
71 | for(i in 1:length(experiments)) {
72 |   project.name <- unlist(x = strsplit(x = experiments[[i]], "/"))[[4]]
73 |   dat <- read_data(base_path =  experiments[[i]],project = project.name)
74 |   saveRDS(object = dat, file = paste0("data/seurat_objects/", project.name, ".rds"))
75 | }


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/Seurat/starmap_integration.R:
--------------------------------------------------------------------------------
 1 | setwd("STARmap_AllenVISp/")
 2 | library(Seurat)
 3 | library(ggplot2)
 4 | 
 5 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
 6 | starmap.imputed <- readRDS("data/seurat_objects/20180505_BY3_1kgenes_imputed.rds")
 7 | allen <- readRDS("data/seurat_objects/allen_brain.rds")
 8 | 
 9 | # remove HPC from starmap
10 | class_labels <- read.table(
11 |   file = "data/Starmap/visual_1020/20180505_BY3_1kgenes/class_labels.csv",
12 |   sep = ",",
13 |   header = TRUE,
14 |   stringsAsFactors = FALSE
15 | )
16 | 
17 | class_labels$cellname <- paste0('starmap', rownames(class_labels))
18 | 
19 | class_labels$ClusterName <- ifelse(is.na(class_labels$ClusterName), 'Other', class_labels$ClusterName)
20 | 
21 | hpc <- class_labels[class_labels$ClusterName == 'HPC', ]$cellname
22 | 
23 | accept.cells <- setdiff(colnames(starmap), hpc)
24 | 
25 | starmap <- starmap[, accept.cells]
26 | 
27 | starmap@misc$spatial <- starmap@misc$spatial[starmap@misc$spatial$cell %in% accept.cells, ]
28 | 
29 | genes.leaveout <- intersect(rownames(starmap),rownames(allen))
30 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(starmap@assays$RNA)[2])
31 | rownames(Imp_genes) <- genes.leaveout
32 | anchor_time <- vector(mode= "numeric")
33 | Transfer_time <- vector(mode= "numeric")
34 | 
35 | run_imputation <- function(ref.obj, query.obj, feature.remove) {
36 |   message(paste0('removing ', feature.remove))
37 |   features <- setdiff(rownames(query.obj), feature.remove)
38 |   DefaultAssay(ref.obj) <- 'RNA'
39 |   DefaultAssay(query.obj) <- 'RNA'
40 |   
41 |   start_time <- Sys.time()
42 |   anchors <- FindTransferAnchors(
43 |     reference = ref.obj,
44 |     query = query.obj,
45 |     features = features,
46 |     dims = 1:30,
47 |     reduction = 'cca'
48 |   )
49 |   end_time <- Sys.time()
50 |   anchor_time <<- c(anchor_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
51 |   
52 |   refdata <- GetAssayData(
53 |     object = ref.obj,
54 |     assay = 'RNA',
55 |     slot = 'data'
56 |   )
57 |   
58 |   start_time <- Sys.time()
59 |   imputation <- TransferData(
60 |     anchorset = anchors,
61 |     refdata = refdata,
62 |     weight.reduction = 'pca'
63 |   )
64 |   query.obj[['seq']] <- imputation
65 |   end_time <- Sys.time()
66 |   Transfer_time <<- c(Transfer_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
67 |   return(query.obj)
68 | }
69 | 
70 | for(i in 1:length(genes.leaveout)) {
71 |   imputed.ss2 <- run_imputation(ref.obj = allen, query.obj = starmap, feature.remove = genes.leaveout[[i]])
72 |   starmap[['ss2']] <- imputed.ss2[, colnames(starmap)][['seq']]
73 |   Imp_genes[genes.leaveout[[i]],] = as.vector(starmap@assays$ss2[genes.leaveout[i],])
74 | }
75 | write.csv(Imp_genes,file = 'Results/Seurat_LeaveOneOut.csv')
76 | write.csv(anchor_time,file = 'Results/Seurat_anchor_time.csv',row.names = FALSE)
77 | write.csv(Transfer_time,file = 'Results/Seurat_transfer_time.csv',row.names = FALSE)
78 | 
79 | # show genes not in the starmap dataset
80 | DefaultAssay(starmap.imputed) <- "ss2"
81 | new.genes <- c('Tesc', 'Pvrl3', 'Sox10', 'Grm2', 'Tcrb')
82 | Imp_New_genes <- matrix(0,nrow = length(new.genes),ncol = dim(starmap.imputed@assays$ss2)[2])
83 | rownames(Imp_New_genes) <- new.genes
84 | for(i in 1:length(new.genes)) {
85 |   Imp_New_genes[new.genes[[i]],] = as.vector(starmap.imputed@assays$ss2[new.genes[i],])
86 | }
87 | write.csv(Imp_New_genes,file = 'Results/Seurat_New_genes.csv')


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/SpaGE/Allen_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('STARmap_AllenVISp/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | 
 9 | RNA_data = pd.read_csv('data/Allen_VISp/mouse_VISp_2018-06-14_exon-matrix.csv',
10 |                        header=0,index_col=0,sep=',')
11 | 
12 | Genes = pd.read_csv('data/Allen_VISp/mouse_VISp_2018-06-14_genes-rows.csv',
13 |                         header=0,sep=',')
14 | RNA_data.index = Genes.gene_symbol
15 | del Genes
16 | 
17 | # filter lowly expressed genes
18 | Genes_count = np.sum(RNA_data > 0, axis=1)
19 | RNA_data = RNA_data.loc[Genes_count >=10,:]
20 | del Genes_count
21 | 
22 | # filter low quality cells
23 | meta_data = pd.read_csv('data/Allen_VISp/mouse_VISp_2018-06-14_samples-columns.csv',
24 |                         header=0,sep=',')
25 | HighQualityCells = (meta_data['class'] != 'No Class') & (meta_data['class'] != 'Low Quality')
26 | RNA_data = RNA_data.iloc[:,np.where(HighQualityCells)[0]]
27 | del meta_data, HighQualityCells
28 | 
29 | def Log_Norm(x):
30 |     return np.log(((x/np.sum(x))*1000000) + 1)
31 | 
32 | RNA_data = RNA_data.apply(Log_Norm,axis=0)
33 | RNA_data_scaled = pd.DataFrame(data=st.zscore(RNA_data.T),index = RNA_data.columns,columns=RNA_data.index)
34 | 
35 | datadict = dict()
36 | datadict['RNA_data'] = RNA_data.T
37 | datadict['RNA_data_scaled'] = RNA_data_scaled
38 | 
39 | with open('data/SpaGE_pkl/Allen_VISp.pkl','wb') as f:
40 |     pickle.dump(datadict, f)
41 | 
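The normalization in this preprocessing script (counts per million per cell, `log(x + 1)`, then a per-gene z-score) can be checked on a small matrix. This is a minimal sketch under stated assumptions: the function name `log_cpm` and the toy count matrix are hypothetical and not part of the original script.

```python
import numpy as np
import pandas as pd
import scipy.stats as st


def log_cpm(counts):
    """Scale each cell (column) to one million counts, then apply log(x + 1)."""
    return np.log1p(counts / counts.sum(axis=0) * 1e6)


# Hypothetical genes x cells count matrix.
counts = pd.DataFrame([[10, 0, 5], [30, 20, 5], [60, 80, 90]],
                      index=['g1', 'g2', 'g3'], columns=['c1', 'c2', 'c3'])
normalized = log_cpm(counts)

# z-score each gene across cells; the result is cells x genes, like RNA_data_scaled.
scaled = pd.DataFrame(st.zscore(normalized.T.to_numpy(), axis=0),
                      index=normalized.columns, columns=normalized.index)
```

Undoing the log transform with `np.expm1(normalized)` should give columns that each sum to one million, which is a quick sanity check on the scaling step.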


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/SpaGE/Integration.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('STARmap_AllenVISp/')
  3 | 
  4 | import pickle
  5 | import numpy as np
  6 | import pandas as pd
  7 | from sklearn.neighbors import NearestNeighbors
  8 | import time as tm
  9 | 
 10 | with open ('data/SpaGE_pkl/Starmap.pkl', 'rb') as f:
 11 |     datadict = pickle.load(f)
 12 | 
 13 | Starmap_data = datadict['Starmap_data']
 14 | Starmap_data_scaled = datadict['Starmap_data_scaled']
 15 | labels = datadict['labels']
 16 | qhulls = datadict['qhulls']
 17 | coords = datadict['coords']
 18 | del datadict
 19 | 
 20 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
 21 |     datadict = pickle.load(f)
 22 |     
 23 | RNA_data = datadict['RNA_data']
 24 | RNA_data_scaled = datadict['RNA_data_scaled']
 25 | del datadict
 26 | 
 27 | #### Leave One Out Validation ####
 28 | Common_data = RNA_data_scaled[np.intersect1d(Starmap_data_scaled.columns,RNA_data_scaled.columns)]
 29 | Imp_Genes = pd.DataFrame(columns=Common_data.columns)
 30 | precise_time = []
 31 | knn_time = []
 32 | for i in Common_data.columns:
 33 |     print(i)
 34 |     start = tm.time()
 35 |     from principal_vectors import PVComputation
 36 | 
 37 |     n_factors = 50
 38 |     n_pv = 50
 39 |     dim_reduction = 'pca'
 40 |     dim_reduction_target = 'pca'
 41 | 
 42 |     pv_FISH_RNA = PVComputation(n_factors = n_factors,n_pv = n_pv,dim_reduction = dim_reduction,dim_reduction_target = dim_reduction_target)
 43 | 
 44 |     pv_FISH_RNA.fit(Common_data.drop(i,axis=1),Starmap_data_scaled[Common_data.columns].drop(i,axis=1))
 45 | 
 46 |     S = pv_FISH_RNA.source_components_.T
 47 |     
 48 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
 49 |     S = S[:,0:Effective_n_pv]
 50 | 
 51 |     Common_data_t = Common_data.drop(i,axis=1).dot(S)
 52 |     FISH_exp_t = Starmap_data_scaled[Common_data.columns].drop(i,axis=1).dot(S)
 53 |     precise_time.append(tm.time()-start)
 54 |     
 55 |     start = tm.time()
 56 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',metric = 'cosine').fit(Common_data_t)
 57 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
 58 | 
 59 |     Imp_Gene = np.zeros(Starmap_data.shape[0])
 60 |     for j in range(0,Starmap_data.shape[0]):
 61 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
 62 |         weights = weights/(len(weights)-1)
  63 |         Imp_Gene[j] = np.sum(np.multiply(RNA_data[i].iloc[indices[j,:][distances[j,:] < 1]],weights))
 64 |     Imp_Gene[np.isnan(Imp_Gene)] = 0
 65 |     Imp_Genes[i] = Imp_Gene
 66 |     knn_time.append(tm.time()-start)
 67 | 
 68 | Imp_Genes.to_csv('Results/SpaGE_LeaveOneOut.csv')
 69 | precise_time = pd.DataFrame(precise_time)
 70 | knn_time = pd.DataFrame(knn_time)
 71 | precise_time.to_csv('Results/SpaGE_PreciseTime.csv', index = False)
 72 | knn_time.to_csv('Results/SpaGE_knnTime.csv', index = False)
 73 | 
 74 | ##### Novel Genes Expression Patterns ####
 75 | Common_data = RNA_data_scaled[np.intersect1d(Starmap_data_scaled.columns,RNA_data_scaled.columns)]
 76 | #genes_to_impute = ["Tesc","Pvrl3","Sox10","Grm2","Tcrb"]
 77 | genes_to_impute = np.setdiff1d(RNA_data.columns,Starmap_data.columns)
 78 | Imp_New_Genes = pd.DataFrame(np.zeros((Starmap_data.shape[0],len(genes_to_impute))),columns=genes_to_impute)
 79 | 
 80 | from principal_vectors import PVComputation
 81 | 
 82 | n_factors = 50
 83 | n_pv = 50
 84 | dim_reduction = 'pca'
 85 | dim_reduction_target = 'pca'
 86 | 
 87 | pv_FISH_RNA = PVComputation(
 88 |         n_factors = n_factors,
 89 |         n_pv = n_pv,
 90 |         dim_reduction = dim_reduction,
 91 |         dim_reduction_target = dim_reduction_target
 92 | )
 93 | 
 94 | pv_FISH_RNA.fit(Common_data,Starmap_data_scaled[Common_data.columns])
 95 | 
 96 | S = pv_FISH_RNA.source_components_.T
 97 |     
 98 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
 99 | S = S[:,0:Effective_n_pv]
100 | 
101 | Common_data_t = Common_data.dot(S)
102 | FISH_exp_t = Starmap_data_scaled[Common_data.columns].dot(S)
103 |     
104 | nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',
105 |                         metric = 'cosine').fit(Common_data_t)
106 | distances, indices = nbrs.kneighbors(FISH_exp_t)
107 |  
108 | for j in range(0,Starmap_data.shape[0]):
109 |     weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
110 |     weights = weights/(len(weights)-1)
111 |     Imp_New_Genes.iloc[j,:] = np.dot(weights,RNA_data[genes_to_impute].iloc[indices[j,:][distances[j,:] < 1]])
112 |     
113 | Imp_New_Genes.to_csv('Results/SpaGE_New_genes.csv')
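The neighbor weighting used in both loops above can be isolated for a single spatial cell. This is a minimal sketch, assuming the weighting is exactly what the loops compute; the function name `knn_weighted_impute` and the example values are hypothetical. Neighbors with cosine distance of 1 or more are dropped, and the remaining weights `(1 - d / sum(d)) / (n - 1)` sum to 1.

```python
import numpy as np


def knn_weighted_impute(distances, neighbor_values):
    """Weighted average of neighbor expression for one spatial cell.

    distances: cosine distances to the k nearest reference cells.
    neighbor_values: expression of the gene in those k cells.
    """
    keep = distances < 1                 # drop neighbors with non-positive cosine similarity
    d = distances[keep]
    if d.size < 2 or d.sum() == 0:
        # Degenerate case: the leave-one-out loop ends up with 0 here
        # (an empty sum, or a NaN that is then mapped to 0).
        return 0.0
    weights = (1 - d / d.sum()) / (d.size - 1)   # closer neighbors get larger weights
    return float(np.dot(weights, neighbor_values[keep]))


# Hypothetical example: the fourth neighbor (distance 1.2) is ignored.
example = knn_weighted_impute(np.array([0.2, 0.3, 0.5, 1.2]),
                              np.array([1.0, 2.0, 3.0, 100.0]))
```

In the example the retained weights are 0.4, 0.35 and 0.25, so the imputed value is 1.85.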


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/SpaGE/Precise_output.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('STARmap_AllenVISp/')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | import matplotlib
 8 | matplotlib.rcParams['pdf.fonttype'] = 42
 9 | matplotlib.rcParams['ps.fonttype'] = 42
10 | import matplotlib.pyplot as plt
11 | import seaborn as sns
12 | import sys
13 | sys.path.insert(1,'SpaGE/')
14 | from principal_vectors import PVComputation
15 | 
16 | with open ('data/SpaGE_pkl/Starmap.pkl', 'rb') as f:
17 |     datadict = pickle.load(f)
18 | 
19 | Starmap_data = datadict['Starmap_data']
20 | Starmap_data_scaled = datadict['Starmap_data_scaled']
21 | labels = datadict['labels']
22 | qhulls = datadict['qhulls']
23 | coords = datadict['coords']
24 | del datadict
25 | 
26 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
27 |     datadict = pickle.load(f)
28 |     
29 | RNA_data = datadict['RNA_data']
30 | RNA_data_scaled = datadict['RNA_data_scaled']
31 | del datadict
32 | 
33 | Common_data = RNA_data_scaled[np.intersect1d(Starmap_data_scaled.columns,RNA_data_scaled.columns)]
34 | 
35 | n_factors = 50
36 | n_pv = 50
37 | n_pv_display = 50
38 | dim_reduction = 'pca'
39 | dim_reduction_target = 'pca'
40 | 
41 | pv_FISH_RNA = PVComputation(
42 |         n_factors = n_factors,
43 |         n_pv = n_pv,
44 |         dim_reduction = dim_reduction,
45 |         dim_reduction_target = dim_reduction_target
46 | )
47 | 
48 | pv_FISH_RNA.fit(Common_data,Starmap_data_scaled[Common_data.columns])
49 | 
50 | fig = plt.figure()
51 | sns.heatmap(pv_FISH_RNA.initial_cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
52 |             center=0, vmax=1., vmin=0)
53 | plt.xlabel('Starmap',fontsize=18, color='black')
54 | plt.ylabel('Allen_VISp',fontsize=18, color='black')
55 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
56 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
57 | plt.gca().set_ylim([n_pv_display,0])
58 | plt.show()
59 | 
60 | plt.figure()
61 | sns.heatmap(pv_FISH_RNA.cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
62 |             center=0, vmax=1., vmin=0)
63 | for i in range(n_pv_display-1):
64 |     plt.text(i+1,i+.7,'%1.2f'%pv_FISH_RNA.cosine_similarity_matrix_[i,i], fontsize=14,color='black')
65 |     
66 | plt.xlabel('Starmap',fontsize=18, color='black')
67 | plt.ylabel('Allen_VISp',fontsize=18, color='black')
68 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
69 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
70 | plt.gca().set_ylim([n_pv_display,0])
71 | plt.show()
72 | 
73 | Importance = pd.Series(np.sum(pv_FISH_RNA.source_components_**2,axis=0),index=Common_data.columns)
74 | Importance.sort_values(ascending=False,inplace=True)
75 | Importance.index[0:50]
76 | 
77 | ### Technology specific Processes
78 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
79 | 
80 | # explained variance RNA
81 | np.sum(pv_FISH_RNA.source_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
82 | # explained variance spatial
83 | np.sum(pv_FISH_RNA.target_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
84 | 
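The diagonal of `cosine_similarity_matrix_` is thresholded at 0.3 above to count the shared principal vectors. Since `principal_vectors.py` is not shown here, the following is only a sketch of the general technique (PCA per dataset, then an SVD of the cross-similarity matrix), not necessarily how `PVComputation` is implemented. All names and the random data are hypothetical.

```python
import numpy as np


def pca_components(X, n_factors):
    """Top principal axes (orthonormal rows) of a cells x genes matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_factors]


def principal_vectors(source, target, n_factors):
    """Return source/target principal vectors and the cosines of their principal angles."""
    Ps = pca_components(source, n_factors)
    Pt = pca_components(target, n_factors)
    # The SVD of the cross-similarity matrix aligns the two factor bases.
    U, cosines, Vt = np.linalg.svd(Ps @ Pt.T)
    return U.T @ Ps, Vt @ Pt, cosines


rng = np.random.default_rng(0)
source = rng.normal(size=(200, 20))   # hypothetical reference cells x shared genes
target = rng.normal(size=(150, 20))   # hypothetical spatial cells x shared genes
pv_s, pv_t, cosines = principal_vectors(source, target, n_factors=5)
effective_n_pv = int(np.sum(cosines > 0.3))   # same 0.3 cutoff as in the script above
```

After alignment, the similarity matrix between the two sets of principal vectors is diagonal, with the cosines sorted in decreasing order, which is why a simple cutoff on the diagonal selects the shared directions.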


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/SpaGE/Starmap_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('STARmap_AllenVISp/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | from viz import GetQHulls
 9 | 
10 | counts = np.load('data/Starmap/visual_1020/20180505_BY3_1kgenes/cell_barcode_count.npy')
11 | Genes = pd.read_csv('data/Starmap/visual_1020/20180505_BY3_1kgenes/genes.csv',header=None)
12 | Genes = (Genes.iloc[:,0])
13 | counts = pd.DataFrame(data=counts,columns=Genes)
14 | Starmap_data = counts.T
15 | del Genes, counts
16 | 
17 | cell_count = np.sum(Starmap_data,axis=0)
18 | def Log_Norm(x):
19 |     return np.log(((x/np.sum(x))*np.median(cell_count)) + 1)
20 | 
21 | Starmap_data = Starmap_data.apply(Log_Norm,axis=0)
22 | Starmap_data_scaled = pd.DataFrame(data=st.zscore(Starmap_data.T),index = Starmap_data.columns,columns=Starmap_data.index)
23 | 
24 | labels = np.load('data/Starmap/visual_1020/20180505_BY3_1kgenes/labels.npz')["labels"]
25 | qhulls,coords = GetQHulls(labels)
26 | 
27 | datadict = dict()
28 | datadict['Starmap_data'] = Starmap_data.T
29 | datadict['Starmap_data_scaled'] = Starmap_data_scaled
30 | datadict['labels'] = labels
31 | datadict['qhulls'] = qhulls
32 | datadict['coords'] = coords
33 | 
34 | with open('data/SpaGE_pkl/Starmap.pkl','wb') as f:
35 |     pickle.dump(datadict, f)
36 | 


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/SpaGE/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and can be selected by
 5 | string name from the main method. All the methods
 6 | use the scikit-learn implementation.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Default linear dimensionality reduction method. For each method, return a
25 |     BaseEstimator instance corresponding to the method given as input.
26 |     Parameters
27 |     -------
28 |     method: str, default to 'pca'
29 |         Method used for dimensionality reduction.
30 |         Implemented: 'pca', 'ica', 'fa' (Factor Analysis),
31 |         'nmf' (Non-negative matrix factorisation), 'sparsepca' (Sparse PCA),
32 |         'pls' (Partial Least Squares).
33 |     n_dim: int, default to 10
34 |         Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Estimator instance (scikit-learn style) for the chosen method
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Implement PLS to make it compliant with the other dimensionality
68 |     reduction methodology.
69 |     (Simple class rewriting).
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/SpaGE/viz.py:
--------------------------------------------------------------------------------
  1 | from __future__ import print_function
  2 | from scipy.spatial import ConvexHull
  3 | from skimage.transform import downscale_local_mean
  4 | from matplotlib.patches import Polygon
  5 | from matplotlib.collections import PatchCollection
  6 | from skimage.measure import regionprops
  7 | import numpy as np
  8 | import matplotlib.pyplot as plt
  9 | 
 10 | def plot_heatmap_with_labels(data_obj, degenes,cmap,show_axis=True, font_size=10):
 11 |     g = plt.GridSpec(2,1, wspace=0.01, hspace=0.01, height_ratios=[0.5,10])
 12 |     ax = plt.subplot(g[0])
 13 |     ax.imshow(np.expand_dims(np.sort(data_obj._clusts),1).T,aspect='auto',interpolation='none',cmap=cmap)
 14 |     ax.axis('off')
 15 |     ax = plt.subplot(g[1])
 16 |     data_obj.plot_heatmap(list(degenes), fontsize=font_size,use_imshow=False,ax=ax)
 17 |     if not show_axis:
 18 |         plt.axis('off')
 19 | 
 20 | def GetQHulls(labels):
 21 |     labels += 1
 22 |     Nlabels = labels.max()
 23 |     hulls = []
 24 |     coords = []
 25 |     num_cells = 0
  26 |     print('Computing convex hulls')
 27 |     for i in range(Nlabels):#enumerate(regionprops(labels)):    
 28 |         print(i,"/",Nlabels)
 29 |         curr_coords = np.argwhere(labels==i)
  30 |         # size threshold of > 1000 pixels and < 100000
 31 |         if curr_coords.shape[0] < 100000 and curr_coords.shape[0] > 1000:
 32 |             num_cells += 1
 33 |             hulls.append(ConvexHull(curr_coords))
 34 |             coords.append(curr_coords)
 35 |     print("Used %d / %d" % (num_cells, Nlabels))
 36 |     return hulls, coords 
 37 | 
 38 | 
 39 | def hull_to_polygon(hull):    
 40 |     cent = np.mean(hull.points, 0)
 41 |     pts = []
 42 |     for pt in hull.points[hull.simplices]:
 43 |         pts.append(pt[0].tolist())
 44 |         pts.append(pt[1].tolist())
 45 |     pts.sort(key=lambda p: np.arctan2(p[1] - cent[1],
 46 |                                     p[0] - cent[0]))
 47 |     pts = pts[0::2]  # Deleting duplicates
 48 |     pts.insert(len(pts), pts[0])
 49 |     k =1.1
 50 |     poly = Polygon(k*(np.array(pts)- cent) + cent,edgecolor='k', linewidth=1)
 51 |     #poly.set_capstyle('round')
 52 |     return poly
 53 | 
 54 | def plot_poly_cells_expression(nissl, hulls, expr, cmap, good_cells=None,width=2, height=9,figscale=10,alpha=1):
 55 |     figscale = 10
 56 |     plt.figure(figsize=(figscale*width/float(height),figscale))
 57 |     polys = [hull_to_polygon(h) for h in hulls]
 58 |     if good_cells is not None:
 59 |         polys = [p for i,p in enumerate(polys) if i in good_cells]
 60 |     p = PatchCollection(polys,alpha=alpha, cmap=cmap,linewidths=0)
 61 |     p.set_array(expr)
 62 |     p.set_clim(vmin=0, vmax=expr.max())        
 63 |     plt.gca().add_collection(p)
 64 |     plt.imshow(nissl.T, cmap=plt.cm.gray_r,alpha=0.15)
 65 |     plt.axis('off')
 66 | 
 67 | def plot_poly_cells_expression_99(nissl, hulls, expr, cmap, good_cells=None,width=2, height=9,figscale=10,alpha=1):
 68 |     figscale = 10
 69 |     plt.figure(figsize=(figscale*width/float(height),figscale))
 70 |     polys = [hull_to_polygon(h) for h in hulls]
 71 |     if good_cells is not None:
 72 |         polys = [p for i,p in enumerate(polys) if i in good_cells]
 73 |     p = PatchCollection(polys,alpha=alpha, cmap=cmap,linewidths=0)
 74 |     p.set_array(expr)
 75 |     p.set_clim(vmin=0, vmax=np.percentile(expr,99))        
 76 |     plt.gca().add_collection(p)
 77 |     plt.imshow(nissl.T, cmap=plt.cm.gray_r,alpha=0.15)
 78 |     plt.axis('off')
 79 |     
 80 | def plot_poly_cells_expression_40(nissl, hulls, expr, cmap, good_cells=None,width=2, height=9,figscale=10,alpha=1):
 81 |     figscale = 10
 82 |     plt.figure(figsize=(figscale*width/float(height),figscale))
 83 |     polys = [hull_to_polygon(h) for h in hulls]
 84 |     if good_cells is not None:
 85 |         polys = [p for i,p in enumerate(polys) if i in good_cells]
 86 |     p = PatchCollection(polys,alpha=alpha, cmap=cmap,linewidths=0)
 87 |     p.set_array(expr)
 88 |     p.set_clim(vmin=np.percentile(expr,40), vmax=np.percentile(expr,99))        
 89 |     plt.gca().add_collection(p)
 90 |     plt.imshow(nissl.T, cmap=plt.cm.gray_r,alpha=0.15)
 91 |     plt.axis('off')
 92 | 
 93 | def plot_poly_cells_cluster(nissl, hulls, colors, cmap, good_cells=None,width=2, height=9,figscale=10, rescale_colors=False,alpha=1,vmin=None,vmax=None):
 94 |     figscale = 10
 95 |     plt.figure(figsize=(figscale*width/float(height),figscale))
 96 |     polys = [hull_to_polygon(h) for h in hulls]
 97 |     if good_cells is not None:
 98 |         polys = [p for i,p in enumerate(polys) if i in good_cells]
 99 |     p = PatchCollection(polys,alpha=alpha, cmap=cmap,edgecolor='k', linewidth=0.5)
100 |     if vmin is not None or vmax is not None:
101 |         p.set_array(colors)
102 |         p.set_clim(vmin=vmin,vmax=vmax)
103 |     else:
104 |         if rescale_colors:
105 |             p.set_array(colors+1)
106 |             p.set_clim(vmin=0, vmax=max(colors+1))
107 |         else:
108 |             p.set_array(colors)
109 |             p.set_clim(vmin=0, vmax=max(colors))        
110 |     nissl = (nissl > 0).astype(int)
111 |     plt.imshow(nissl.T,cmap=plt.cm.gray_r,alpha=0.15)
112 |     plt.gca().add_collection(p)
113 |     plt.axis('off')
114 |     return polys
115 | 
116 | def plot_cells_cluster(nissl, coords, good_cells, colors, cmap, width=2, height=9,figscale=100, vmin=None,vmax=None):
117 |     figscale = 10
118 |     plt.figure(figsize=(figscale*width/float(height),figscale))
119 |     img = -1*np.ones_like(nissl)
120 |     curr_coords = [coords[k] for k in range(len(coords)) if k in good_cells]
121 |     for i,c in enumerate(curr_coords):
122 |         for k in c:
123 |             if k[0] < img.shape[0] and k[1] < img.shape[1]:
124 |                 img[k[0],k[1]] = colors[i]
125 |     plt.imshow(img.T,cmap=cmap,vmin=-1,vmax=colors.max())
126 |     plt.axis('off')
127 |                       
128 | def get_cells_and_clusts_for_experiment(analysis_obj, expt_id):
129 |     good_cells = analysis_obj._meta.index[(analysis_obj._meta["orig_ident"]==expt_id)].values
130 |     colors = analysis_obj._clusts[analysis_obj._meta["orig_ident"]==expt_id]
131 |     return good_cells, colors
132 | 
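
Note on hull_to_polygon: the function orders the hull's vertices by their angle around a centre point, scales them away from that centre by a factor k, and closes the ring by repeating the first vertex. The following is a minimal standard-library sketch of that idea only. The name `order_and_scale` is hypothetical, and unlike the original it takes a list of distinct vertices and uses their own mean as the centre (the original averages all of `hull.points` and removes duplicated simplex endpoints with `pts[0::2]`).

```python
import math

def order_and_scale(points, k=1.1):
    """Sort 2-D points by angle around their centroid, scale by k, close the ring."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    # atan2 gives the angle of each point relative to the centroid, in (-pi, pi].
    ordered = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Move every vertex away from (k > 1) or towards (k < 1) the centroid.
    scaled = [(k * (x - cx) + cx, k * (y - cy) + cy) for x, y in ordered]
    scaled.append(scaled[0])  # repeat the first vertex to close the outline
    return scaled

square = [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ring = order_and_scale(square, k=2.0)
```

With k = 2.0 the unit square doubles in size around (0.5, 0.5); the default k = 1.1 mirrors the 10% enlargement used in the plotting code.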


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/Starmap_plots.R:
--------------------------------------------------------------------------------
  1 | setwd("STARmap_AllenVISp/")
  2 | library(Seurat)
  3 | library(ggplot2)
  4 | 
  5 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
  6 | starmap.imputed <- readRDS("data/seurat_objects/20180505_BY3_1kgenes_imputed.rds")
  7 | allen <- readRDS("data/seurat_objects/allen_brain.rds")
  8 | 
  9 | DefaultAssay(starmap.imputed) <- "RNA"
 10 | genes.leaveout <- intersect(rownames(starmap),rownames(allen))
 11 | 
 12 | # Original
 13 | for(i in 1:length(genes.leaveout)) {
 14 |   if (genes.leaveout[[i]] %in% c('Bsg', 'Rab3c')) {
 15 |     qcut = 'q40'
 16 |   } else {
 17 |     qcut = 0
 18 |   }
 19 |   p <- PolyFeaturePlot(starmap.imputed, genes.leaveout[[i]], flip.coords = TRUE, min.cutoff = qcut, max.cutoff = 'q99') + SpatialTheme()
 20 |   ggsave(plot = p, filename = paste0('Figures/Original/', genes.leaveout[[i]],'.pdf'))
 21 | }
 22 | 
 23 | # Seurat_Predicted
 24 | Seurat_Predicted <- read.csv(file = 'Results/Seurat_LeaveOneOut.csv',header = TRUE, check.names = FALSE, row.names = 1)
 25 | Seurat_Predicted <- Seurat_Predicted[genes.leaveout,]
 26 | colnames(Seurat_Predicted) <- colnames(starmap.imputed@assays$ss2[,])
 27 | starmap.imputed[['ss2']] <- CreateAssayObject(data = as.matrix(Seurat_Predicted))
 28 | DefaultAssay(starmap.imputed) <- 'ss2'
 29 | for(i in 1:length(genes.leaveout)) {
 30 |   if (genes.leaveout[[i]] %in% c('Bsg', 'Rab3c')) {
 31 |     qcut = 'q40'
 32 |   } else {
 33 |     qcut = 0
 34 |   }
 35 |   p <- PolyFeaturePlot(starmap.imputed, genes.leaveout[[i]], flip.coords = TRUE, min.cutoff = qcut, max.cutoff = 'q99') + SpatialTheme()
 36 |   ggsave(plot = p, filename = paste0('Figures/Seurat_Predicted/', genes.leaveout[[i]],'.pdf'))
 37 | }
 38 | 
 39 | # SpaGE_Predicted
 40 | SpaGE_Predicted <- read.csv(file = 'Results/SpaGE_LeaveOneOut.csv',header = TRUE, check.names = FALSE, row.names = 1)
 41 | SpaGE_Predicted <- t(SpaGE_Predicted)
 42 | SpaGE_Predicted <- SpaGE_Predicted[genes.leaveout,]
 43 | colnames(SpaGE_Predicted) <- colnames(starmap@assays$RNA[,])
 44 | starmap[['ss2']] <- CreateAssayObject(data = as.matrix(SpaGE_Predicted))
 45 | DefaultAssay(starmap) <- 'ss2'
 46 | for(i in 1:length(genes.leaveout)) {
 47 |   if (genes.leaveout[[i]] %in% c('Bsg', 'Rab3c')) {
 48 |     qcut = 'q40'
 49 |   } else {
 50 |     qcut = 0
 51 |   }
 52 |   p <- PolyFeaturePlot(starmap, genes.leaveout[[i]], flip.coords = TRUE, min.cutoff = qcut, max.cutoff = 'q99') + SpatialTheme()
 53 |   ggsave(plot = p, filename = paste0('Figures/SpaGE_Predicted/', genes.leaveout[[i]],'.pdf'))
 54 | }
 55 | 
 56 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
 57 | 
 58 | # Liger_Predicted
 59 | Liger_Predicted <- read.csv(file = 'Results/Liger_LeaveOneOut.csv',header = TRUE, check.names = FALSE, row.names = 1)
 60 | Liger_Predicted <- Liger_Predicted[genes.leaveout,]
 61 | colnames(Liger_Predicted) <- colnames(starmap@assays$RNA[,])
 62 | starmap[['ss2']] <- CreateAssayObject(data = as.matrix(Liger_Predicted))
 63 | DefaultAssay(starmap) <- 'ss2'
 64 | for(i in 1:length(genes.leaveout)) {
 65 |   if (genes.leaveout[[i]] %in% c('Bsg', 'Rab3c')) {
 66 |     qcut = 'q40'
 67 |   } else {
 68 |     qcut = 0
 69 |   }
 70 |   p <- PolyFeaturePlot(starmap, genes.leaveout[[i]], flip.coords = TRUE, min.cutoff = qcut, max.cutoff = 'q99') + SpatialTheme()
 71 |   ggsave(plot = p, filename = paste0('Figures/Liger_Predicted/', genes.leaveout[[i]],'.pdf'))
 72 | }
 73 | 
 74 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
 75 | 
 76 | # gimVI_Predicted
 77 | gimVI_Predicted <- read.csv(file = 'Results/gimVI_LeaveOneOut.csv',header = TRUE, check.names = FALSE, row.names = 1)
 78 | gimVI_Predicted <- t(gimVI_Predicted)
 79 | gimVI_Predicted <- gimVI_Predicted[toupper(genes.leaveout),]
 80 | colnames(gimVI_Predicted) <- colnames(starmap@assays$RNA[,])
 81 | rownames(gimVI_Predicted) <- genes.leaveout
 82 | starmap[['ss2']] <- CreateAssayObject(data = as.matrix(gimVI_Predicted))
 83 | DefaultAssay(starmap) <- 'ss2'
 84 | for(i in 1:length(genes.leaveout)) {
 85 |   if (genes.leaveout[[i]] %in% c('Bsg', 'Rab3c')) {
 86 |     qcut = 'q40'
 87 |   } else {
 88 |     qcut = 0
 89 |   }
 90 |   p <- PolyFeaturePlot(starmap, genes.leaveout[[i]], flip.coords = TRUE, min.cutoff = qcut, max.cutoff = 'q99') + SpatialTheme()
 91 |   ggsave(plot = p, filename = paste0('Figures/gimVI_Predicted/', genes.leaveout[[i]],'.pdf'))
 92 | }
 93 | 
 94 | 
 95 | ### New genes
 96 | # show genes not in the starmap dataset
 97 | new.genes <- c('Tesc', 'Pvrl3', 'Sox10', 'Grm2', 'Tcrb')
 98 | starmap.imputed <- readRDS("data/seurat_objects/20180505_BY3_1kgenes_imputed.rds")
 99 | DefaultAssay(starmap.imputed) <- "ss2"
100 | 
101 | for(i in new.genes) {
102 |   p <- PolyFeaturePlot(starmap.imputed, i, flip.coords = TRUE, min.cutoff = 0, max.cutoff = 'q99') + SpatialTheme()
103 |   ggsave(plot = p, filename = paste0('Figures/Seurat_Predicted/New_', i,'.pdf'))
104 | }
105 | 
106 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
107 | #SpaGE
108 | SpaGE_New <- read.csv(file = 'Results/SpaGE_New_genes.csv',header = TRUE, check.names = FALSE, row.names = 1)
109 | SpaGE_New <- t(SpaGE_New)
110 | new.genes <- c('Tesc', 'Pvrl3', 'Sox10', 'Grm2', 'Tcrb','Ttyh2','Cldn11','Tmem88b')
111 | SpaGE_New2 <- SpaGE_New[new.genes,]
112 | colnames(SpaGE_New2) <- colnames(starmap@assays$RNA[,])
113 | starmap[['ss2']] <- CreateAssayObject(data = as.matrix(SpaGE_New2))
114 | DefaultAssay(starmap) <- 'ss2'
115 | for(i in new.genes) {
116 |   p <- PolyFeaturePlot(starmap, i, flip.coords = TRUE, min.cutoff = 0, max.cutoff = 'q99') + SpatialTheme()
117 |   ggsave(plot = p, filename = paste0('Figures/SpaGE_Predicted/New_', i,'.pdf'))
118 | }
119 | 
120 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
121 | # Liger
122 | Liger_New <- read.csv(file = 'Results/Liger_New_genes.csv',header = TRUE, check.names = FALSE, row.names = 1)
123 | Liger_New <- Liger_New[new.genes,]
124 | colnames(Liger_New) <- colnames(starmap@assays$RNA[,])
125 | starmap[['ss2']] <- CreateAssayObject(data = as.matrix(Liger_New))
126 | DefaultAssay(starmap) <- 'ss2'
127 | for(i in new.genes) {
128 |   p <- PolyFeaturePlot(starmap, i, flip.coords = TRUE, min.cutoff = 0, max.cutoff = 'q99') + SpatialTheme()
129 |   ggsave(plot = p, filename = paste0('Figures/Liger_Predicted/New_', i,'.pdf'))
130 | }
131 | 
132 | starmap <- readRDS("data/seurat_objects/20180505_BY3_1kgenes.rds")
133 | # gimVI
134 | gimVI_New <- read.csv(file = 'Results/gimVI_New_genes.csv',header = TRUE, check.names = FALSE, row.names = 1)
135 | gimVI_New <- t(gimVI_New)
136 | gimVI_New <- gimVI_New[toupper(new.genes),]
137 | colnames(gimVI_New) <- colnames(starmap@assays$RNA[,])
138 | rownames(gimVI_New) <- new.genes
139 | starmap[['ss2']] <- CreateAssayObject(data = as.matrix(gimVI_New))
140 | DefaultAssay(starmap) <- 'ss2'
141 | for(i in new.genes) {
142 |   p <- PolyFeaturePlot(starmap, i, flip.coords = TRUE, min.cutoff = 0, max.cutoff = 'q99') + SpatialTheme()
143 |   ggsave(plot = p, filename = paste0('Figures/gimVI_Predicted/New_', i,'.pdf'))
144 | }


--------------------------------------------------------------------------------
/benchmark/STARmap_AllenVISp/gimVI/gimVI.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('STARmap_AllenVISp/')
  3 | 
  4 | from scvi.dataset import CsvDataset
  5 | from scvi.models import JVAE, Classifier
  6 | from scvi.inference import JVAETrainer
  7 | import numpy as np
  8 | import pandas as pd
  9 | import copy
 10 | import torch
 11 | import time as tm
 12 | 
 13 | ### STARmap data
 14 | Starmap_data = CsvDataset('data/gimVI_data/STARmap_data_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 15 | 
 16 | ### AllenVISp
 17 | RNA_data = CsvDataset('data/gimVI_data/Allen_data_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 18 | 
 19 | ### Leave-one-out validation
 20 | Gene_set = np.intersect1d(Starmap_data.gene_names,RNA_data.gene_names)
 21 | Starmap_idx = np.reshape(np.vstack([np.argwhere(g==Starmap_data.gene_names) for g in Gene_set]),-1)  # look up columns before gene_names is overwritten
 22 | Starmap_data.gene_names, Starmap_data.X = Gene_set, Starmap_data.X[:,Starmap_idx]
 23 | Common_data = copy.deepcopy(RNA_data)
 24 | Common_data.gene_names = Gene_set
 25 | Common_data.X = Common_data.X[:,np.reshape(np.vstack([np.argwhere(g==RNA_data.gene_names) for g in Gene_set]),-1)]
 26 | Imp_Genes = pd.DataFrame(columns=Gene_set)
 27 | gimVI_time = []
 28 | 
 29 | for i in Gene_set:
 30 |     print(i)
 31 |     # Create copy of the fish dataset with hidden genes
 32 |     data_spatial_partial = copy.deepcopy(Starmap_data)
 33 |     data_spatial_partial.filter_genes_by_attribute(np.setdiff1d(Starmap_data.gene_names,i))
 34 |     data_spatial_partial.batch_indices += Common_data.n_batches
 35 |     
 36 |     datasets = [Common_data, data_spatial_partial]
 37 |     generative_distributions = ["zinb", "nb"]
 38 |     gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(g==Common_data.gene_names) for g in data_spatial_partial.gene_names]),-1)]
 39 |     n_inputs = [d.nb_genes for d in datasets]
 40 |     total_genes = Common_data.nb_genes
 41 |     n_batches = sum([d.n_batches for d in datasets])
 42 |     
 43 |     model_library_size = [True, False]
 44 |     
 45 |     n_latent = 8
 46 |     kappa = 5
 47 |     
 48 |     start = tm.time()
 49 |     torch.manual_seed(0)
 50 |     
 51 |     model = JVAE(
 52 |         n_inputs,
 53 |         total_genes,
 54 |         gene_mappings,
 55 |         generative_distributions,
 56 |         model_library_size,
 57 |         n_layers_decoder_individual=0,
 58 |         n_layers_decoder_shared=0,
 59 |         n_layers_encoder_individual=1,
 60 |         n_layers_encoder_shared=1,
 61 |         dim_hidden_encoder=64,
 62 |         dim_hidden_decoder_shared=64,
 63 |         dropout_rate_encoder=0.2,
 64 |         dropout_rate_decoder=0.2,
 65 |         n_batch=n_batches,
 66 |         n_latent=n_latent,
 67 |     )
 68 |     discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
 69 |     
 70 |     trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
 71 |     trainer.train(n_epochs=200)
 72 |     _,Imputed = trainer.get_imputed_values(normalized=True)
 73 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==Common_data.gene_names)[0]],-1)
 74 |     Imp_Genes[i] = Imputed
 75 |     gimVI_time.append(tm.time()-start)
 76 |     
 77 | Imp_Genes = Imp_Genes.fillna(0)    
 78 | Imp_Genes.to_csv('Results/gimVI_LeaveOneOut.csv')
 79 | gimVI_time = pd.DataFrame(gimVI_time)
 80 | gimVI_time.to_csv('Results/gimVI_Time.csv', index = False)
 81 | 
 82 | ### New genes
 83 | Imp_New_Genes = pd.DataFrame(columns=["TESC","PVRL3","SOX10","GRM2","TCRB"])
 84 | 
 85 | # Create a copy of the spatial dataset (no genes are hidden for new-gene prediction)
 86 | data_spatial_partial = copy.deepcopy(Starmap_data)
 87 | data_spatial_partial.filter_genes_by_attribute(Starmap_data.gene_names)
 88 | data_spatial_partial.batch_indices += RNA_data.n_batches
 89 | 
 90 | datasets = [RNA_data, data_spatial_partial]
 91 | generative_distributions = ["zinb", "nb"]
 92 | gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(g==RNA_data.gene_names) for g in data_spatial_partial.gene_names]),-1)]
 93 | n_inputs = [d.nb_genes for d in datasets]
 94 | total_genes = RNA_data.nb_genes
 95 | n_batches = sum([d.n_batches for d in datasets])
 96 | 
 97 | model_library_size = [True, False]
 98 | 
 99 | n_latent = 8
100 | kappa = 5
101 | 
102 | torch.manual_seed(0)
103 | 
104 | model = JVAE(
105 |     n_inputs,
106 |     total_genes,
107 |     gene_mappings,
108 |     generative_distributions,
109 |     model_library_size,
110 |     n_layers_decoder_individual=0,
111 |     n_layers_decoder_shared=0,
112 |     n_layers_encoder_individual=1,
113 |     n_layers_encoder_shared=1,
114 |     dim_hidden_encoder=64,
115 |     dim_hidden_decoder_shared=64,
116 |     dropout_rate_encoder=0.2,
117 |     dropout_rate_decoder=0.2,
118 |     n_batch=n_batches,
119 |     n_latent=n_latent,
120 | )
121 | discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
122 | 
123 | trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
124 | trainer.train(n_epochs=200)
125 |     
126 | for i in ["TESC","PVRL3","SOX10","GRM2","TCRB"]:
127 |     _,Imputed = trainer.get_imputed_values(normalized=True)
128 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==RNA_data.gene_names)[0]],-1)
129 |     Imp_New_Genes[i] = Imputed
130 |     
131 | Imp_New_Genes.to_csv('Results/gimVI_New_genes.csv')
132 | 
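
Note on gene_mappings: the second entry passed to JVAE is a list of column indices telling the model where each gene of the spatial dataset sits in the reference (scRNA-seq) gene list. The sketch below shows that lookup with plain Python lists. The helper name `gene_mapping` and the example gene lists are hypothetical and are not part of scVI or of this script.

```python
def gene_mapping(reference_genes, query_genes):
    """Return, for each query gene, its column index in the reference gene list."""
    position = {gene: idx for idx, gene in enumerate(reference_genes)}
    # A KeyError here means a query gene is missing from the reference.
    return [position[gene] for gene in query_genes]

reference = ["GAD1", "SLC17A7", "SOX10", "TESC"]
query = ["SOX10", "GAD1"]          # spatial genes, in their own column order
mapping = gene_mapping(reference, query)

# Leave-one-out: hide one gene from the spatial side and rebuild the mapping.
hidden = "GAD1"
loo_mapping = gene_mapping(reference, [g for g in query if g != hidden])
```

The dictionary lookup avoids scanning the reference list once per gene, which matters when the reference holds tens of thousands of genes.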


--------------------------------------------------------------------------------
/benchmark/Timing_Evaluation.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import pandas as pd
 3 | import matplotlib
 4 | matplotlib.rcParams['pdf.fonttype'] = 42
 5 | matplotlib.rcParams['ps.fonttype'] = 42
 6 | import matplotlib.pyplot as plt
 7 | from matplotlib.lines import Line2D
 8 | 
 9 | ### Timing
10 | experiments = ['STARmap_AllenVISp','osmFISH_Ziesel','osmFISH_AllenVISp',
11 |                'osmFISH_AllenSSp','MERFISH_Moffit']
12 | # SpaGE
13 | SpaGE_Avg_Time = pd.Series(index = experiments, dtype = float)
14 | for i in experiments:
15 |     SpaGE_Precise_Time = np.array(pd.read_csv(i + '/Results/SpaGE_PreciseTime.csv',
16 |                                           header=0,sep=',')).reshape(-1)
17 |     SpaGE_knn_Time = np.array(pd.read_csv(i +'/Results/SpaGE_knnTime.csv',
18 |                                       header=0,sep=',')).reshape(-1)
19 | 
20 |     SpaGE_total_time = SpaGE_Precise_Time + SpaGE_knn_Time
21 |     SpaGE_Avg_Time[i] = np.mean(SpaGE_total_time)
22 | del SpaGE_Precise_Time,SpaGE_knn_Time,SpaGE_total_time
23 | 
24 | # gimVI
25 | gimVI_Avg_Time = pd.Series(index = experiments, dtype = float)
26 | for i in experiments:
27 |     gimVI_Time = np.array(pd.read_csv(i+'/Results/gimVI_Time.csv',header=0,sep=',')).reshape(-1)
28 |     gimVI_Avg_Time[i] = np.mean(gimVI_Time)
29 | del gimVI_Time
30 | 
31 | # Seurat
32 | Seurat_Avg_Time = pd.Series(index = experiments, dtype = float)
33 | for i in experiments:
34 |     Seurat_anchor_Time = np.array(pd.read_csv(i+'/Results/Seurat_anchor_time.csv',
35 |                                               header=0,sep=',')).reshape(-1)
36 |     Seurat_transfer_Time = np.array(pd.read_csv(i+'/Results/Seurat_transfer_time.csv'
37 |                                                 ,header=0,sep=',')).reshape(-1)
38 |     
39 |     Seurat_total_time = Seurat_anchor_Time + Seurat_transfer_Time
40 |     Seurat_Avg_Time[i] = np.mean(Seurat_total_time)
41 | del Seurat_anchor_Time,Seurat_transfer_Time,Seurat_total_time
42 | 
43 | # Liger
44 | Liger_Avg_Time = pd.Series(index = experiments, dtype = float)
45 | for i in experiments:
46 |     Liger_NMF_Time = np.array(pd.read_csv(i+'/Results/Liger_NMF_time.csv',
47 |                                           header=0,sep=',')).reshape(-1)
48 |     Liger_knn_Time = np.array(pd.read_csv(i+'/Results/Liger_knn_time.csv',
49 |                                           header=0,sep=',')).reshape(-1)
50 |     
51 |     Liger_total_time = Liger_NMF_Time + Liger_knn_Time
52 |     Liger_Avg_Time[i] = np.mean(Liger_total_time)
53 | del Liger_NMF_Time,Liger_knn_Time,Liger_total_time
54 | 
55 | plt.style.use('ggplot')
56 | plt.figure(figsize=(9, 3))
57 | plt.plot([1,2,3,4,5],SpaGE_Avg_Time,color='black',marker='s',linewidth=3)
58 | plt.plot([1,2,3,4,5],Seurat_Avg_Time,color='blue',marker='s',linewidth=3)
59 | plt.plot([1,2,3,4,5],Liger_Avg_Time,color='red',marker='s',linewidth=3)
60 | plt.plot([1,2,3,4,5],gimVI_Avg_Time,color='purple',marker='s',linewidth=3)
61 | #plt.yscale('log')
62 | plt.xticks((1,2,3,4,5),('STARmap_AllenVISp\n(1549,14249)','osmFISH_Zeisel\n(3405,1691)',
63 |             'osmFISH_AllenVISp\n(3405,14249)','osmFISH_AllenSSp\n(3405,5613)',
64 |             'MERFISH_Moffit\n(64373,31299)'),size=10)
65 | plt.yticks(size=8)
66 | plt.ylabel('Average computation time (seconds)',size=12)
67 | colors = ['black','blue', 'red', 'purple']
68 | lines = [Line2D([0], [0], color=c, linewidth=3, marker='s') for c in colors]
69 | labels = ['SpaGE','Seurat', 'Liger','gimVI']
70 | plt.legend(lines, labels)
71 | plt.show()
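
Note on the timing summary: for each method the script adds the per-gene times of its two stages (for example PRECISE plus kNN for SpaGE) element-wise and then averages the totals. A minimal sketch of that calculation with the standard library, assuming both stages logged one value per left-out gene; the function name `average_total_time` and the sample numbers are illustrative only and are not measured results.

```python
from statistics import mean

def average_total_time(stage_one, stage_two):
    """Average of the per-gene total time across two stages of one method."""
    if len(stage_one) != len(stage_two):
        # The stages must be aligned gene by gene; a skipped gene breaks this.
        raise ValueError("stage timings must have the same length")
    return mean(a + b for a, b in zip(stage_one, stage_two))

precise_seconds = [1.0, 2.0, 3.0]   # made-up values for illustration
knn_seconds = [0.5, 0.5, 0.5]
avg = average_total_time(precise_seconds, knn_seconds)
```

The explicit length check is a design choice worth keeping in mind: a leave-one-out loop that skips a gene with `next` after recording only one stage would otherwise misalign the two timing vectors silently.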


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/Liger/LIGER.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenSSp/")
 2 | library(liger)
 3 | library(hdf5r)
 4 | library(methods)
 5 | 
 6 | # Allen SSp
 7 | allen <- read.table(file = "data/Allen_SSp/SSp_exons_matrix.csv",
 8 |                     row.names = 1, sep = ',', stringsAsFactors = FALSE, header = TRUE)
 9 | allen <- as.matrix(x = allen)
10 | 
11 | Genes_count = rowSums(allen > 0)
12 | allen <- allen[Genes_count>=10,]
13 | 
14 | meta.data <- read.csv(file = "data/Allen_SSp/AllenSSp_metadata.csv",
15 |                       row.names = 1, stringsAsFactors = FALSE)
16 | ok.cells <- colnames(allen[,meta.data$class_label != 'Exclude'])
17 | allen <- allen[,ok.cells]
18 | 
19 | # osmFISH
20 | osm <- H5File$new("data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom")
21 | mat <- osm[['matrix']][,]
22 | colnames(mat) <- osm[['row_attrs']][['Gene']][]
23 | rownames(mat) <- paste0('osm_', osm[['col_attrs']][['CellID']][])
24 | x_dim <- osm[['col_attrs']][['X']][]
25 | y_dim <- osm[['col_attrs']][['Y']][]
26 | region <- osm[['col_attrs']][['Region']][]
27 | cluster <- osm[['col_attrs']][['ClusterName']][]
28 | osm$close_all()
29 | spatial <- data.frame(spatial1 = x_dim, spatial2 = y_dim)
30 | rownames(spatial) <- rownames(mat)
31 | spatial <- as.matrix(spatial)
32 | osmFISH <- t(mat)
33 | osmFISH <- osmFISH[,!is.element(region,c('Excluded','Internal Capsule Caudoputamen','White matter','Hippocampus','Ventricle'))]
34 | 
35 | Gene_set <- intersect(rownames(osmFISH),rownames(allen))
36 | 
37 | #### New genes prediction
38 | Ligerex <- createLiger(list(SMSC_FISH = osmFISH, SMSC_RNA = allen))
39 | Ligerex <- normalize(Ligerex)
40 | Ligerex@var.genes <- Gene_set
41 | Ligerex <- scaleNotCenter(Ligerex)
42 | 
43 | # suggestK(Ligerex, k.test= seq(5,30,5)) # K = 20
44 | # suggestLambda(Ligerex, k = 20) # Lambda = 20
45 | 
46 | Ligerex <- optimizeALS(Ligerex,k = 20, lambda = 20)
47 | Ligerex <- quantileAlignSNF(Ligerex)
48 | 
49 | Imputation <- imputeKNN(Ligerex,reference = 'SMSC_RNA', queries = list('SMSC_FISH'), norm = TRUE, scale = FALSE)
50 | new.genes <- c('Tesc', 'Pvrl3', 'Grm2')
51 | Imp_New_genes <- matrix(0,nrow = length(new.genes),ncol = dim(Imputation@norm.data$SMSC_FISH)[2])
52 | rownames(Imp_New_genes) <- new.genes
53 | for(i in 1:length(new.genes)) {
54 |   Imp_New_genes[new.genes[[i]],] = as.vector(Imputation@norm.data$SMSC_FISH[new.genes[i],])
55 | }
56 | write.csv(Imp_New_genes,file = 'Results/Liger_New_genes.csv')
57 | 
58 | # leave-one-out validation
59 | genes.leaveout <- intersect(rownames(osmFISH),rownames(allen))
60 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(osmFISH)[2])
61 | rownames(Imp_genes) <- genes.leaveout
62 | colnames(Imp_genes) <- colnames(osmFISH)
63 | NMF_time <- vector(mode= "numeric")
64 | knn_time <- vector(mode= "numeric")
65 | 
66 | for(i in 1:length(genes.leaveout)) {
67 |   print(i)
68 |   start_time <- Sys.time()
69 |   Ligerex.leaveout <- createLiger(list(SMSC_FISH = osmFISH[-which(rownames(osmFISH) %in% genes.leaveout[i]),], SMSC_RNA = allen[rownames(osmFISH),]))
70 |   Ligerex.leaveout <- normalize(Ligerex.leaveout)
71 |   Ligerex.leaveout@var.genes <- setdiff(Gene_set,genes.leaveout[i])
72 |   Ligerex.leaveout <- scaleNotCenter(Ligerex.leaveout)
73 |   if(dim(Ligerex.leaveout@norm.data$SMSC_FISH)[2]!=dim(osmFISH)[2]){
74 |     next
75 |   }
76 |   Ligerex.leaveout <- optimizeALS(Ligerex.leaveout,k = 20, lambda = 20)
77 |   Ligerex.leaveout <- quantileAlignSNF(Ligerex.leaveout)
78 |   end_time <- Sys.time()
79 |   NMF_time <- c(NMF_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
80 |   
81 |   start_time <- Sys.time()
82 |   Imputation <- imputeKNN(Ligerex.leaveout,reference = 'SMSC_RNA', queries = list('SMSC_FISH'), norm = TRUE, scale = FALSE, knn_k = 30)
83 |   Imp_genes[genes.leaveout[[i]],] = as.vector(Imputation@norm.data$SMSC_FISH[genes.leaveout[i],])
84 |   end_time <- Sys.time()
85 |   knn_time <- c(knn_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
86 | }
87 | write.csv(Imp_genes,file = 'Results/Liger_LeaveOneOut.csv')
88 | write.csv(NMF_time,file = 'Results/Liger_NMF_time.csv',row.names = FALSE)
89 | write.csv(knn_time,file = 'Results/Liger_knn_time.csv',row.names = FALSE)
90 | 


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/Seurat/allen_brain.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenSSp/")
 2 | library(Seurat)
 3 | 
 4 | allen <- read.table(file = "data/Allen_SSp/SSp_exons_matrix.csv",
 5 |                     row.names = 1,sep = ',', stringsAsFactors = FALSE, header = TRUE)
 6 | allen <- as.matrix(x = allen)
 7 | 
 8 | meta.data <- read.csv(file = "data/Allen_SSp/AllenSSp_metadata.csv",
 9 |                       row.names = 1, stringsAsFactors = FALSE)
10 | 
11 | al <- CreateSeuratObject(counts = allen, project = 'SSp', min.cells = 10)
12 | ok.cells <- colnames(al[,meta.data$class_label != 'Exclude'])
13 | al <- al[, ok.cells]
14 | al <- NormalizeData(object = al)
15 | al <- FindVariableFeatures(object = al, nfeatures = 2000)
16 | al <- ScaleData(object = al)
17 | al <- RunPCA(object = al, npcs = 50, verbose = FALSE)
18 | al <- RunUMAP(object = al, dims = 1:50, n.neighbors = 5)
19 | saveRDS(object = al, file = paste0("data/seurat_objects/","allen_brain_SSp.rds"))
20 | 


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/Seurat/impute_osmFISH.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenSSp/")
 2 | library(Seurat)
 3 | 
 4 | osmFISH <- readRDS("data/seurat_objects/osmFISH_Cortex.rds")
 5 | allen <- readRDS("data/seurat_objects/allen_brain_SSp.rds")
 6 | 
 7 | #Project on allen labels
 8 | i2 <- FindTransferAnchors(
 9 |   reference = allen,
10 |   query = osmFISH,
11 |   features = rownames(osmFISH),
12 |   reduction = 'cca',
13 |   reference.assay = 'RNA',
14 |   query.assay = 'RNA'
15 | )
16 | 
17 | refdata <- GetAssayData(
18 |   object = allen,
19 |   assay = 'RNA',
20 |   slot = 'data'
21 | )
22 | 
23 | imputation <- TransferData(
24 |   anchorset = i2,
25 |   refdata = refdata,
26 |   weight.reduction = 'pca'
27 | )
28 | 
29 | osmFISH[['ss2']] <- imputation
30 | saveRDS(osmFISH, 'data/seurat_objects/osmFISH_Cortex_imputed.rds')


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/Seurat/osmFISH.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenSSp/")
 2 | library(Seurat)
 3 | library(hdf5r)
 4 | library(methods)
 5 | 
 6 | osm <- H5File$new("data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom")
 7 | mat <- osm[['matrix']][,]
 8 | colnames(mat) <- osm[['row_attrs']][['Gene']][]
 9 | rownames(mat) <- paste0('osm_', osm[['col_attrs']][['CellID']][])
10 | x_dim <- osm[['col_attrs']][['X']][]
11 | y_dim <- osm[['col_attrs']][['Y']][]
12 | region <- osm[['col_attrs']][['Region']][]
13 | cluster <- osm[['col_attrs']][['ClusterName']][]
14 | osm$close_all()
15 | spatial <- data.frame(spatial1 = x_dim, spatial2 = y_dim)
16 | rownames(spatial) <- rownames(mat)
17 | spatial <- as.matrix(spatial)
18 | mat <- t(mat)
19 | 
20 | osm_seurat <- CreateSeuratObject(counts = mat, project = 'osmFISH', assay = 'RNA', min.cells = -1, min.features = -1)
21 | names(region) <- colnames(osm_seurat)
22 | names(cluster) <- colnames(osm_seurat)
23 | osm_seurat <- AddMetaData(osm_seurat, region, col.name = 'region')
24 | osm_seurat <- AddMetaData(osm_seurat, cluster, col.name = 'cluster')
25 | osm_seurat[['spatial']] <- CreateDimReducObject(embeddings = spatial, key = 'spatial', assay = 'RNA')
26 | Idents(osm_seurat) <- 'region'
27 | osm_seurat <- SubsetData(osm_seurat, ident.remove = c('Excluded','Internal Capsule Caudoputamen','White matter','Hippocampus','Ventricle'))
28 | total.counts = colSums(x = as.matrix(osm_seurat@assays$RNA@counts))
29 | osm_seurat <- NormalizeData(osm_seurat, scale.factor = median(x = total.counts))
30 | osm_seurat <- ScaleData(osm_seurat)
31 | saveRDS(object = osm_seurat, file = 'data/seurat_objects/osmFISH_Cortex.rds')


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/Seurat/osmFISH_integration.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenSSp/")
 2 | library(Seurat)
 3 | library(ggplot2)
 4 | 
 5 | osmFISH <- readRDS("data/seurat_objects/osmFISH_Cortex.rds")
 6 | osmFISH.imputed <- readRDS("data/seurat_objects/osmFISH_Cortex_imputed.rds")
 7 | allen <- readRDS("data/seurat_objects/allen_brain_SSp.rds")
 8 | 
 9 | genes.leaveout <- intersect(rownames(osmFISH),rownames(allen))
10 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(osmFISH@assays$RNA)[2])
11 | rownames(Imp_genes) <- genes.leaveout
12 | anchor_time <- vector(mode= "numeric")
13 | Transfer_time <- vector(mode= "numeric")
14 | 
15 | run_imputation <- function(ref.obj, query.obj, feature.remove) {
16 |   message(paste0('removing ', feature.remove))
17 |   features <- setdiff(rownames(query.obj), feature.remove)
18 |   DefaultAssay(ref.obj) <- 'RNA'
19 |   DefaultAssay(query.obj) <- 'RNA'
20 |   
21 |   start_time <- Sys.time()
22 |   anchors <- FindTransferAnchors(
23 |     reference = ref.obj,
24 |     query = query.obj,
25 |     features = features,
26 |     dims = 1:30,
27 |     reduction = 'cca'
28 |   )
29 |   end_time <- Sys.time()
30 |   anchor_time <<- c(anchor_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
31 |   
32 |   refdata <- GetAssayData(
33 |     object = ref.obj,
34 |     assay = 'RNA',
35 |     slot = 'data'
36 |   )
37 |   
38 |   start_time <- Sys.time()
39 |   imputation <- TransferData(
40 |     anchorset = anchors,
41 |     refdata = refdata,
42 |     weight.reduction = 'pca'
43 |   )
44 |   query.obj[['seq']] <- imputation
45 |   end_time <- Sys.time()
46 |   Transfer_time <<- c(Transfer_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
47 |   return(query.obj)
48 | }
49 | 
50 | for(i in 1:length(genes.leaveout)) {
51 |   imputed.ss2 <- run_imputation(ref.obj = allen, query.obj = osmFISH, feature.remove = genes.leaveout[[i]])
52 |   osmFISH[['ss2']] <- imputed.ss2[, colnames(osmFISH)][['seq']]
53 |   Imp_genes[genes.leaveout[[i]],] = as.vector(osmFISH@assays$ss2[genes.leaveout[i],])
54 | }
55 | write.csv(Imp_genes,file = 'Results/Seurat_LeaveOneOut.csv')
56 | write.csv(anchor_time,file = 'Results/Seurat_anchor_time.csv',row.names = FALSE)
57 | write.csv(Transfer_time,file = 'Results/Seurat_transfer_time.csv',row.names = FALSE)
58 | 
59 | # show genes not in the osmFISH dataset
60 | DefaultAssay(osmFISH.imputed) <- "ss2"
61 | new.genes <- c('Tesc', 'Pvrl3', 'Grm2')
62 | Imp_New_genes <- matrix(0,nrow = length(new.genes),ncol = dim(osmFISH.imputed@assays$ss2)[2])
63 | rownames(Imp_New_genes) <- new.genes
64 | for(i in 1:length(new.genes)) {
65 |   Imp_New_genes[new.genes[[i]],] = as.vector(osmFISH.imputed@assays$ss2[new.genes[i],])
66 | }
67 | write.csv(Imp_New_genes,file = 'Results/Seurat_New_genes.csv')


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/SpaGE/Allen_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_AllenSSp/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | 
 9 | RNA_data = pd.read_csv('data/Allen_SSp/SSp_exons_matrix.csv',
10 |                        header=0,index_col=0,sep=',')
11 | 
12 | # filter lowly expressed genes
13 | Genes_count = np.sum(RNA_data > 0, axis=1)
14 | RNA_data = RNA_data.loc[Genes_count >=10,:]
15 | del Genes_count
16 | 
17 | # filter low quality cells
18 | meta_data = pd.read_csv('data/Allen_SSp/AllenSSp_metadata.csv',
19 |                         header=0,sep=',')
20 | HighQualityCells = meta_data['class_label'] != 'Exclude'
21 | RNA_data = RNA_data.iloc[:,np.where(HighQualityCells)[0]]
22 | del meta_data, HighQualityCells
23 | 
24 | def Log_Norm(x):
25 |     return np.log(((x/np.sum(x))*1000000) + 1)
26 | 
27 | RNA_data = RNA_data.apply(Log_Norm,axis=0)
28 | RNA_data_scaled = pd.DataFrame(data=st.zscore(RNA_data.T),index = RNA_data.columns,columns=RNA_data.index)
29 | 
30 | datadict = dict()
31 | datadict['RNA_data'] = RNA_data.T
32 | datadict['RNA_data_scaled'] = RNA_data_scaled
33 | 
34 | with open('data/SpaGE_pkl/Allen_SSp.pkl','wb') as f:
35 |     pickle.dump(datadict, f)
36 | 
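The normalization and scaling steps in the script above can be illustrated on a toy matrix. This is a minimal sketch, not the benchmark itself: the count values and gene/cell names are made up, and the z-score is written out with pandas (population standard deviation, `ddof=0`), which should match the default behaviour of `scipy.stats.zscore` used in the script.

```python
import numpy as np
import pandas as pd


def log_norm(x, scale=1_000_000):
    # Rescale one cell's counts to `scale` total counts, then take log(1 + value).
    return np.log((x / np.sum(x)) * scale + 1)


# Toy genes-by-cells count matrix (hypothetical values).
counts = pd.DataFrame(
    [[10.0, 0.0, 5.0], [0.0, 20.0, 5.0], [30.0, 20.0, 0.0]],
    index=["GeneA", "GeneB", "GeneC"],
    columns=["cell1", "cell2", "cell3"],
)

# axis=0 applies the function column by column, i.e. one cell at a time.
normalized = counts.apply(log_norm, axis=0)

# Z-score each gene across cells; the result is cells-by-genes,
# mirroring RNA_data_scaled in the script.
cells_by_genes = normalized.T
scaled = (cells_by_genes - cells_by_genes.mean(axis=0)) / cells_by_genes.std(axis=0, ddof=0)
```

After this step every cell sums to one million in linear space, and every gene has zero mean and unit variance across cells.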


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/SpaGE/Integration.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('osmFISH_AllenSSp/')
  3 | 
  4 | import pickle
  5 | import numpy as np
  6 | import pandas as pd
  7 | from sklearn.neighbors import NearestNeighbors
  8 | import time as tm
  9 | 
 10 | with open ('data/SpaGE_pkl/osmFISH_Cortex.pkl', 'rb') as f:
 11 |     datadict = pickle.load(f)
 12 | 
 13 | osmFISH_data = datadict['osmFISH_data']
 14 | osmFISH_data_scaled = datadict['osmFISH_data_scaled']
 15 | osmFISH_meta= datadict['osmFISH_meta']
 16 | del datadict
 17 | 
 18 | with open ('data/SpaGE_pkl/Allen_SSp.pkl', 'rb') as f:
 19 |     datadict = pickle.load(f)
 20 |     
 21 | RNA_data = datadict['RNA_data']
 22 | RNA_data_scaled = datadict['RNA_data_scaled']
 23 | del datadict
 24 | 
 25 | #### Leave One Out Validation ####
 26 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
 27 | Imp_Genes = pd.DataFrame(columns=Common_data.columns)
 28 | precise_time = []
 29 | knn_time = []
 30 | for i in Common_data.columns:
 31 |     print(i)
 32 |     start = tm.time()
 33 |     from principal_vectors import PVComputation
 34 | 
 35 |     n_factors = 30
 36 |     n_pv = 30
 37 |     dim_reduction = 'pca'
 38 |     dim_reduction_target = 'pca'
 39 | 
 40 |     pv_FISH_RNA = PVComputation(
 41 |             n_factors = n_factors,
 42 |             n_pv = n_pv,
 43 |             dim_reduction = dim_reduction,
 44 |             dim_reduction_target = dim_reduction_target
 45 |     )
 46 | 
 47 |     pv_FISH_RNA.fit(Common_data.drop(i,axis=1),osmFISH_data_scaled[Common_data.columns].drop(i,axis=1))
 48 | 
 49 |     S = pv_FISH_RNA.source_components_.T
 50 |     
 51 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
 52 |     S = S[:,0:Effective_n_pv]
 53 | 
 54 |     Common_data_t = Common_data.drop(i,axis=1).dot(S)
 55 |     FISH_exp_t = osmFISH_data_scaled[Common_data.columns].drop(i,axis=1).dot(S)
 56 |     precise_time.append(tm.time()-start)
 57 | 
 58 |     start = tm.time()    
 59 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',metric = 'cosine').fit(Common_data_t)
 60 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
 61 |     
 62 |     Imp_Gene = np.zeros(osmFISH_data.shape[0])
 63 |     for j in range(0,osmFISH_data.shape[0]):
 64 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
 65 |         weights = weights/(len(weights)-1)
 66 |         Imp_Gene[j] = np.sum(np.multiply(RNA_data[i].iloc[indices[j,:][distances[j,:] < 1]],weights))
 67 |     Imp_Gene[np.isnan(Imp_Gene)] = 0
 68 |     Imp_Genes[i] = Imp_Gene
 69 |     knn_time.append(tm.time()-start)
 70 |         
 71 | Imp_Genes.to_csv('Results/SpaGE_LeaveOneOut.csv')
 72 | precise_time = pd.DataFrame(precise_time)
 73 | knn_time = pd.DataFrame(knn_time)
 74 | precise_time.to_csv('Results/SpaGE_PreciseTime.csv', index = False)
 75 | knn_time.to_csv('Results/SpaGE_knnTime.csv', index = False)
 76 | 
 77 | #### Novel Genes Expression Patterns ####
 78 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
 79 | genes_to_impute = ["Tesc","Pvrl3","Grm2"]
 80 | Imp_New_Genes = pd.DataFrame(np.zeros((osmFISH_data.shape[0],len(genes_to_impute))),columns=genes_to_impute)
 81 | 
 82 | from principal_vectors import PVComputation
 83 | 
 84 | n_factors = 30
 85 | n_pv = 30
 86 | dim_reduction = 'pca'
 87 | dim_reduction_target = 'pca'
 88 | 
 89 | pv_FISH_RNA = PVComputation(
 90 |         n_factors = n_factors,
 91 |         n_pv = n_pv,
 92 |         dim_reduction = dim_reduction,
 93 |         dim_reduction_target = dim_reduction_target
 94 | )
 95 | 
 96 | pv_FISH_RNA.fit(Common_data,osmFISH_data_scaled[Common_data.columns])
 97 | 
 98 | S = pv_FISH_RNA.source_components_.T
 99 |     
100 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
101 | S = S[:,0:Effective_n_pv]
102 | 
103 | Common_data_t = Common_data.dot(S)
104 | FISH_exp_t = osmFISH_data_scaled[Common_data.columns].dot(S)
105 |     
106 | nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',
107 |                         metric = 'cosine').fit(Common_data_t)
108 | distances, indices = nbrs.kneighbors(FISH_exp_t)
109 | 
110 | for j in range(0,osmFISH_data.shape[0]):
111 | 
112 |     weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
113 |     weights = weights/(len(weights)-1)
114 |     Imp_New_Genes.iloc[j,:] = np.dot(weights,RNA_data[genes_to_impute].iloc[indices[j,:][distances[j,:] < 1]])
115 |     
116 | Imp_New_Genes.to_csv('Results/SpaGE_New_genes.csv')   
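The neighbor weighting used in both loops above can be checked in isolation. The sketch below is a standalone illustration with made-up distances and expression values; `knn_weights` is a hypothetical helper name, not a function from the repository. For k kept neighbors with distances d_i and D = sum(d_i), the terms 1 - d_i/D add up to k - 1, so dividing by k - 1 makes the weights sum to one.

```python
import numpy as np


def knn_weights(distances):
    # Keep only neighbors with cosine distance below 1 (positive similarity).
    kept = distances[distances < 1]
    # Closer neighbors get larger weights; the terms sum to len(kept) - 1.
    weights = 1 - kept / np.sum(kept)
    # Normalize so that the weights sum to 1.
    return weights / (len(weights) - 1)


# Toy distances from one spatial cell to four reference cells (hypothetical values).
d = np.array([0.1, 0.2, 0.3, 1.2])
w = knn_weights(d)

# Toy expression of one gene in the same four reference cells.
reference_expr = np.array([2.0, 4.0, 6.0, 100.0])

# Weighted average over the kept neighbors only; the fourth cell is excluded.
imputed = np.sum(reference_expr[d < 1] * w)
```

When only one neighbor passes the filter, `len(weights) - 1` is zero and the result is NaN; the leave-one-out loop replaces such values with 0 afterwards.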


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/SpaGE/Precise_output.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_AllenSSp/')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | import matplotlib
 8 | matplotlib.rcParams['pdf.fonttype'] = 42
 9 | matplotlib.rcParams['ps.fonttype'] = 42
10 | import matplotlib.pyplot as plt
11 | import seaborn as sns
12 | import sys
13 | sys.path.insert(1,'SpaGE/')
14 | from principal_vectors import PVComputation
15 | 
16 | with open ('data/SpaGE_pkl/osmFISH_Cortex.pkl', 'rb') as f:
17 |     datadict = pickle.load(f)
18 | 
19 | osmFISH_data = datadict['osmFISH_data']
20 | osmFISH_data_scaled = datadict['osmFISH_data_scaled']
21 | osmFISH_meta= datadict['osmFISH_meta']
22 | del datadict
23 | 
24 | with open ('data/SpaGE_pkl/Allen_SSp.pkl', 'rb') as f:
25 |     datadict = pickle.load(f)
26 |     
27 | RNA_data = datadict['RNA_data']
28 | RNA_data_scaled = datadict['RNA_data_scaled']
29 | del datadict
30 | 
31 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
32 | 
33 | n_factors = 30
34 | n_pv = 30
35 | n_pv_display = 30
36 | dim_reduction = 'pca'
37 | dim_reduction_target = 'pca'
38 | 
39 | pv_FISH_RNA = PVComputation(
40 |         n_factors = n_factors,
41 |         n_pv = n_pv,
42 |         dim_reduction = dim_reduction,
43 |         dim_reduction_target = dim_reduction_target
44 | )
45 | 
46 | pv_FISH_RNA.fit(Common_data,osmFISH_data_scaled[Common_data.columns])
47 | 
48 | fig = plt.figure()
49 | sns.heatmap(pv_FISH_RNA.initial_cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
50 |             center=0, vmax=1., vmin=0)
51 | plt.xlabel('osmFISH',fontsize=18, color='black')
52 | plt.ylabel('Allen_SSp',fontsize=18, color='black')
53 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
54 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
55 | plt.gca().set_ylim([n_pv_display,0])
56 | plt.show()
57 | 
58 | plt.figure()
59 | sns.heatmap(pv_FISH_RNA.cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
60 |             center=0, vmax=1., vmin=0)
61 | for i in range(n_pv_display-1):
62 |     plt.text(i+1,i+.7,'%1.2f'%pv_FISH_RNA.cosine_similarity_matrix_[i,i], fontsize=14,color='black')
63 |     
64 | plt.xlabel('osmFISH',fontsize=18, color='black')
65 | plt.ylabel('Allen_SSp',fontsize=18, color='black')
66 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
67 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
68 | plt.gca().set_ylim([n_pv_display,0])
69 | plt.show()
70 | 
71 | Importance = pd.Series(np.sum(pv_FISH_RNA.source_components_**2,axis=0),index=Common_data.columns)
72 | Importance.sort_values(ascending=False,inplace=True)
73 | Importance.index[0:30]
74 | 
75 | ### Technology specific Processes
76 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
77 | 
78 | # explained variance RNA
79 | np.sum(pv_FISH_RNA.source_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
80 | # explained variance spatial
81 | np.sum(pv_FISH_RNA.target_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
82 | 
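The `Effective_n_pv` threshold above counts how many principal-vector pairs have a cosine similarity greater than 0.3. The sketch below shows one common way such a diagonal similarity matrix arises, using an SVD of the product of two orthonormal factor matrices. It is an assumption about what `PVComputation` does internally, since `principal_vectors.py` is not shown here, and the toy bases are invented for illustration.

```python
import numpy as np

identity = np.eye(6)

# Two toy orthonormal bases (rows are factors) in a 6-gene space.
source_factors = identity[[0, 1, 2]]
target_factors = np.vstack([
    identity[0],                                # shared exactly with the source
    (identity[1] + identity[3]) / np.sqrt(2),   # partially shared (cosine about 0.71)
    identity[4],                                # orthogonal to every source factor
])

# SVD of the cross-product: singular values are cosines of the principal angles.
u, cosines, vt = np.linalg.svd(source_factors @ target_factors.T)

# Rotate each basis so that paired vectors are maximally aligned.
source_pvs = u.T @ source_factors
target_pvs = vt @ target_factors

# After rotation the similarity matrix is diagonal, sorted in decreasing order.
cosine_similarity_matrix = source_pvs @ target_pvs.T
effective_n_pv = int(np.sum(np.diag(cosine_similarity_matrix) > 0.3))
```

In this toy case two of the three pairs pass the 0.3 threshold, so only the first two source principal vectors would be kept for projection.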


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/SpaGE/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and can be selected
 5 | by string name from the main method. All the methods
 6 | use the scikit-learn implementation.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Select a linear dimensionality reduction method. Returns an unfitted
25 |     estimator instance corresponding to the method name given as input.
26 |     Parameters
27 |     -------
28 |     method: str, default to 'pca'
29 |         Method used for dimensionality reduction (case-insensitive).
30 |         Implemented: 'pca', 'ica', 'fa' (Factor Analysis),
31 |         'nmf' (Non-negative matrix factorisation), 'sparsepca' (Sparse PCA),
32 |         'pls' (Partial Least Squares, via the PLS wrapper below).
33 |     n_dim: int, default to 10
34 |         Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Estimator instance (scikit-learn estimator or the PLS wrapper)
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Wrap PLSRegression so that it exposes the same interface as the other
68 |     dimensionality reduction methods.
69 |     (Simple class rewriting.)
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/SpaGE/osmFISH_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_AllenSSp/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | import loompy
 9 | 
10 | ## Read loom data
11 | ds = loompy.connect('data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom')
12 | 
13 | FISH_Genes = ds.ra['Gene']
14 |     
15 | colAtr = ds.ca.keys()
16 | 
17 | df = pd.DataFrame()
18 | for i in colAtr:
19 |     df[i] = ds.ca[i]
20 | 
21 | osmFISH_meta = df.iloc[np.where(df.Valid == 1)[0], :]
22 | osmFISH_data = ds[:,:]
23 | osmFISH_data = osmFISH_data[:,np.where(df.Valid == 1)[0]]
24 | osmFISH_data = pd.DataFrame(data= osmFISH_data, index= FISH_Genes)
25 | 
26 | del ds, colAtr, i, df, FISH_Genes
27 | 
28 | Cortex_Regions = ['Layer 2-3 lateral', 'Layer 2-3 medial', 'Layer 3-4', 
29 |                   'Layer 4','Layer 5', 'Layer 6', 'Pia Layer 1']
30 | Cortical = np.array([i in Cortex_Regions for i in osmFISH_meta.Region])
31 | 
32 | osmFISH_meta = osmFISH_meta.iloc[Cortical,:]
33 | osmFISH_data = osmFISH_data.iloc[:,Cortical]
34 | 
35 | cell_count = np.sum(osmFISH_data,axis=0)
36 | def Log_Norm(x):
37 |     return np.log(((x/np.sum(x))*np.median(cell_count)) + 1)
38 | 
39 | osmFISH_data = osmFISH_data.apply(Log_Norm,axis=0)
40 | osmFISH_data_scaled = pd.DataFrame(data=st.zscore(osmFISH_data.T),index = osmFISH_data.columns,columns=osmFISH_data.index)
41 | 
42 | datadict = dict()
43 | datadict['osmFISH_data'] = osmFISH_data.T
44 | datadict['osmFISH_data_scaled'] = osmFISH_data_scaled
45 | datadict['osmFISH_meta'] = osmFISH_meta
46 | 
47 | with open('data/SpaGE_pkl/osmFISH_Cortex.pkl','wb') as f:
48 |     pickle.dump(datadict, f)
49 | 
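The region filter and the median-based normalization in the script above can be sketched on toy data. The region labels below come from the script, but the counts and gene names are invented, and `Series.isin` is used here as one way to build the boolean mask; it is a sketch under those assumptions rather than the script's exact code path.

```python
import numpy as np
import pandas as pd

# Toy per-cell metadata and a genes-by-cells count matrix (hypothetical values).
meta = pd.DataFrame({"Region": ["Layer 4", "White matter", "Layer 6", "Hippocampus"]})
counts = pd.DataFrame(
    [[4.0, 1.0, 2.0, 0.0], [0.0, 3.0, 6.0, 5.0]],
    index=["GeneA", "GeneB"],
)

cortex_regions = ["Layer 4", "Layer 6"]

# Boolean mask: True for cells whose region is in the cortical list.
cortical = meta["Region"].isin(cortex_regions).to_numpy()
meta = meta.iloc[cortical, :]
counts = counts.iloc[:, cortical]

# Scale every cell to the median total count across the kept cells,
# instead of the fixed one-million factor used for the scRNA-seq data.
cell_count = counts.sum(axis=0)
median_count = np.median(cell_count)
normalized = counts.apply(lambda x: np.log((x / np.sum(x)) * median_count + 1), axis=0)
```

Using the median total count keeps the scale factor close to the actual depth of the spatial data, which measures far fewer transcripts per cell than scRNA-seq.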


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenSSp/gimVI/gimVI.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('osmFISH_AllenSSp/')
  3 | 
  4 | from scvi.dataset import CsvDataset
  5 | from scvi.models import JVAE, Classifier
  6 | from scvi.inference import JVAETrainer
  7 | import numpy as np
  8 | import pandas as pd
  9 | import copy
 10 | import torch
 11 | import time as tm
 12 | 
 13 | ### osmFISH data
 14 | osmFISH_data = CsvDataset('data/gimVI_data/osmFISH_Cortex_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 15 | 
 16 | ### RNA
 17 | RNA_data = CsvDataset('data/gimVI_data/Allen_SSp_data_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 18 | 
 19 | ### Leave-one-out validation
 20 | Common_data = copy.deepcopy(RNA_data)
 21 | Common_data.gene_names = osmFISH_data.gene_names
 22 | Common_data.X = Common_data.X[:,np.reshape(np.vstack([np.argwhere(i==RNA_data.gene_names) for i in osmFISH_data.gene_names]),-1)]
 23 | Gene_set = np.intersect1d(osmFISH_data.gene_names,Common_data.gene_names)
 24 | Imp_Genes = pd.DataFrame(columns=Gene_set)
 25 | gimVI_time = []
 26 | 
 27 | for i in Gene_set:
 28 |     print(i)
 29 |     # Create copy of the fish dataset with hidden genes
 30 |     data_spatial_partial = copy.deepcopy(osmFISH_data)
 31 |     data_spatial_partial.filter_genes_by_attribute(np.setdiff1d(osmFISH_data.gene_names,i))
 32 |     data_spatial_partial.batch_indices += Common_data.n_batches
 33 |     
 34 |     if(data_spatial_partial.X.shape[0] != osmFISH_data.X.shape[0]):
 35 |         continue
 36 |     
 37 |     datasets = [Common_data, data_spatial_partial]
 38 |     generative_distributions = ["zinb", "nb"]
 39 |     gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(i==Common_data.gene_names) for i in data_spatial_partial.gene_names]),-1)]
 40 |     n_inputs = [d.nb_genes for d in datasets]
 41 |     total_genes = Common_data.nb_genes
 42 |     n_batches = sum([d.n_batches for d in datasets])
 43 |     
 44 |     model_library_size = [True, False]
 45 |     
 46 |     n_latent = 8
 47 |     kappa = 5
 48 |     
 49 |     start = tm.time()
 50 |     torch.manual_seed(0)
 51 |     
 52 |     model = JVAE(
 53 |         n_inputs,
 54 |         total_genes,
 55 |         gene_mappings,
 56 |         generative_distributions,
 57 |         model_library_size,
 58 |         n_layers_decoder_individual=0,
 59 |         n_layers_decoder_shared=0,
 60 |         n_layers_encoder_individual=1,
 61 |         n_layers_encoder_shared=1,
 62 |         dim_hidden_encoder=64,
 63 |         dim_hidden_decoder_shared=64,
 64 |         dropout_rate_encoder=0.2,
 65 |         dropout_rate_decoder=0.2,
 66 |         n_batch=n_batches,
 67 |         n_latent=n_latent,
 68 |     )
 69 |     discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
 70 |     
 71 |     trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
 72 |     trainer.train(n_epochs=200)
 73 |     _,Imputed = trainer.get_imputed_values(normalized=True)
 74 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==Common_data.gene_names)[0]],-1)
 75 |     Imp_Genes[i] = Imputed
 76 |     gimVI_time.append(tm.time()-start)
 77 |     
 78 | Imp_Genes = Imp_Genes.fillna(0)    
 79 | Imp_Genes.to_csv('Results/gimVI_LeaveOneOut.csv')
 80 | gimVI_time = pd.DataFrame(gimVI_time)
 81 | gimVI_time.to_csv('Results/gimVI_Time.csv', index = False)
 82 | 
 83 | ### New genes
 84 | Imp_New_Genes = pd.DataFrame(columns=["TESC","PVRL3","GRM2"])
 85 | 
 86 | # Create copy of the fish dataset with hidden genes
 87 | data_spatial_partial = copy.deepcopy(osmFISH_data)
 88 | data_spatial_partial.filter_genes_by_attribute(osmFISH_data.gene_names)
 89 | data_spatial_partial.batch_indices += RNA_data.n_batches
 90 | 
 91 | datasets = [RNA_data, data_spatial_partial]
 92 | generative_distributions = ["zinb", "nb"]
 93 | gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(i==RNA_data.gene_names) for i in data_spatial_partial.gene_names]),-1)]
 94 | n_inputs = [d.nb_genes for d in datasets]
 95 | total_genes = RNA_data.nb_genes
 96 | n_batches = sum([d.n_batches for d in datasets])
 97 | 
 98 | model_library_size = [True, False]
 99 | 
100 | n_latent = 8
101 | kappa = 5
102 | 
103 | torch.manual_seed(0)
104 | 
105 | model = JVAE(
106 |     n_inputs,
107 |     total_genes,
108 |     gene_mappings,
109 |     generative_distributions,
110 |     model_library_size,
111 |     n_layers_decoder_individual=0,
112 |     n_layers_decoder_shared=0,
113 |     n_layers_encoder_individual=1,
114 |     n_layers_encoder_shared=1,
115 |     dim_hidden_encoder=64,
116 |     dim_hidden_decoder_shared=64,
117 |     dropout_rate_encoder=0.2,
118 |     dropout_rate_decoder=0.2,
119 |     n_batch=n_batches,
120 |     n_latent=n_latent,
121 | )
122 | discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
123 | 
124 | trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
125 | trainer.train(n_epochs=200)
126 |     
127 | for i in ["TESC","PVRL3","GRM2"]:
128 |     _,Imputed = trainer.get_imputed_values(normalized=True)
129 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==RNA_data.gene_names)[0]],-1)
130 |     Imp_New_Genes[i] = Imputed
131 |     
132 | Imp_New_Genes.to_csv('Results/gimVI_New_genes.csv')
133 | 
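The `gene_mappings` expression above translates each spatial gene name into its column index in the reference dataset. The sketch below reproduces that lookup with NumPy only; the gene names are illustrative and no scVI objects are involved.

```python
import numpy as np

# Hypothetical gene name arrays standing in for the two datasets' gene_names.
reference_genes = np.array(["GAD1", "SLC17A7", "PVALB", "SST"])
spatial_genes = np.array(["SST", "GAD1"])

# For each spatial gene, np.argwhere returns a (1, 1) array holding its index
# in the reference; stacking and flattening yields one index per spatial gene.
mapping = np.reshape(
    np.vstack([np.argwhere(g == reference_genes) for g in spatial_genes]),
    -1,
)

# Indexing the reference with the mapping recovers the spatial gene order.
reordered = reference_genes[mapping]
```

A gene missing from the reference yields an empty `argwhere` result and silently shortens the mapping, so it is worth confirming that `len(mapping) == len(spatial_genes)` before the mapping is passed on.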


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/Liger/LIGER.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenVISp/")
 2 | library(liger)
 3 | library(hdf5r)
 4 | library(methods)
 5 | 
 6 | # allen VISp
 7 | allen <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_exon-matrix.csv",
 8 |                     row.names = 1, sep = ',', stringsAsFactors = FALSE, header = TRUE)
 9 | allen <- as.matrix(x = allen)
10 | genes <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_genes-rows.csv",
11 |                     sep = ',', stringsAsFactors = FALSE, header = TRUE)
12 | rownames(x = allen) <- make.unique(names = genes$gene_symbol)
13 | meta.data <- read.csv(file = "data/Allen_VISp/mouse_VISp_2018-06-14_samples-columns.csv",
14 |                       row.names = 1, stringsAsFactors = FALSE)
15 | 
16 | Genes_count = rowSums(allen > 0)
17 | allen <- allen[Genes_count>=10,]
18 | 
19 | low.q.cells <- rownames(x = meta.data[meta.data$class %in% c('Low Quality', 'No Class'), ])
20 | ok.cells <- rownames(x = meta.data)[!(rownames(x = meta.data) %in% low.q.cells)]
21 | allen <-allen[,ok.cells]
22 | 
23 | # osmFISH
24 | osm <- H5File$new("data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom")
25 | mat <- osm[['matrix']][,]
26 | colnames(mat) <- osm[['row_attrs']][['Gene']][]
27 | rownames(mat) <- paste0('osm_', osm[['col_attrs']][['CellID']][])
28 | x_dim <- osm[['col_attrs']][['X']][]
29 | y_dim <- osm[['col_attrs']][['Y']][]
30 | region <- osm[['col_attrs']][['Region']][]
31 | cluster <- osm[['col_attrs']][['ClusterName']][]
32 | osm$close_all()
33 | spatial <- data.frame(spatial1 = x_dim, spatial2 = y_dim)
34 | rownames(spatial) <- rownames(mat)
35 | spatial <- as.matrix(spatial)
36 | osmFISH <- t(mat)
37 | osmFISH <- osmFISH[,!is.element(region,c('Excluded','Internal Capsule Caudoputamen','White matter','Hippocampus','Ventricle'))]
38 | 
39 | Gene_set <- intersect(rownames(osmFISH),rownames(allen))
40 | 
41 | #### New genes prediction
42 | Ligerex <- createLiger(list(SMSC_FISH = osmFISH, SMSC_RNA = allen))
43 | Ligerex <- normalize(Ligerex)
44 | Ligerex@var.genes <- Gene_set
45 | Ligerex <- scaleNotCenter(Ligerex)
46 | 
47 | # suggestK(Ligerex, k.test= seq(5,30,5)) # K = 20
48 | # suggestLambda(Ligerex, k = 20) # Lambda = 20
49 | 
50 | Ligerex <- optimizeALS(Ligerex,k = 20, lambda = 20)
51 | Ligerex <- quantileAlignSNF(Ligerex)
52 | 
53 | Imputation <- imputeKNN(Ligerex,reference = 'SMSC_RNA', queries = list('SMSC_FISH'), norm = TRUE, scale = FALSE)
54 | new.genes <- c('Tesc', 'Pvrl3', 'Grm2')
55 | Imp_New_genes <- matrix(0,nrow = length(new.genes),ncol = dim(Imputation@norm.data$SMSC_FISH)[2])
56 | rownames(Imp_New_genes) <- new.genes
57 | for(i in 1:length(new.genes)) {
58 |   Imp_New_genes[new.genes[[i]],] = as.vector(Imputation@norm.data$SMSC_FISH[new.genes[i],])
59 | }
60 | write.csv(Imp_New_genes,file = 'Results/Liger_New_genes.csv')
61 | 
62 | # leave-one-out validation
63 | genes.leaveout <- intersect(rownames(osmFISH),rownames(allen))
64 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(osmFISH)[2])
65 | rownames(Imp_genes) <- genes.leaveout
66 | colnames(Imp_genes) <- colnames(osmFISH)
67 | NMF_time <- vector(mode= "numeric")
68 | knn_time <- vector(mode= "numeric")
69 | 
70 | for(i in 1:length(genes.leaveout)) {
71 |   print(i)
72 |   start_time <- Sys.time()
73 |   Ligerex.leaveout <- createLiger(list(SMSC_FISH = osmFISH[-which(rownames(osmFISH) %in% genes.leaveout[i]),], SMSC_RNA = allen[rownames(osmFISH),]))
74 |   Ligerex.leaveout <- normalize(Ligerex.leaveout)
75 |   Ligerex.leaveout@var.genes <- setdiff(Gene_set,genes.leaveout[i])
76 |   Ligerex.leaveout <- scaleNotCenter(Ligerex.leaveout)
77 |   if(dim(Ligerex.leaveout@norm.data$SMSC_FISH)[2]!=dim(osmFISH)[2]){
78 |     next
79 |   }
80 |   Ligerex.leaveout <- optimizeALS(Ligerex.leaveout,k = 20, lambda = 20)
81 |   Ligerex.leaveout <- quantileAlignSNF(Ligerex.leaveout)
82 |   end_time <- Sys.time()
83 |   NMF_time <- c(NMF_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
84 |   
85 |   start_time <- Sys.time()
86 |   Imputation <- imputeKNN(Ligerex.leaveout,reference = 'SMSC_RNA', queries = list('SMSC_FISH'), norm = TRUE, scale = FALSE, knn_k = 30)
87 |   Imp_genes[genes.leaveout[[i]],] = as.vector(Imputation@norm.data$SMSC_FISH[genes.leaveout[i],])
88 |   end_time <- Sys.time()
89 |   knn_time <- c(knn_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
90 | }
91 | write.csv(Imp_genes,file = 'Results/Liger_LeaveOneOut.csv')
92 | write.csv(NMF_time,file = 'Results/Liger_NMF_time.csv',row.names = FALSE)
93 | write.csv(knn_time,file = 'Results/Liger_knn_time.csv',row.names = FALSE)


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/Seurat/allen_brain.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenVISp/")
 2 | library(Seurat)
 3 | 
 4 | allen <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_exon-matrix.csv",
 5 |                     row.names = 1,sep = ',', stringsAsFactors = FALSE, header = TRUE)
 6 | allen <- as.matrix(x = allen)
 7 | genes <- read.table(file = "data/Allen_VISp/mouse_VISp_2018-06-14_genes-rows.csv",
 8 |                     sep = ',', stringsAsFactors = FALSE, header = TRUE)
 9 | rownames(x = allen) <- make.unique(names = genes$gene_symbol)
10 | meta.data <- read.csv(file = "data/Allen_VISp/mouse_VISp_2018-06-14_samples-columns.csv",
11 |                       row.names = 1, stringsAsFactors = FALSE)
12 | 
13 | al <- CreateSeuratObject(counts = allen, project = 'VISp', meta.data = meta.data, min.cells = 10)
14 | low.q.cells <- rownames(x = meta.data[meta.data$class %in% c('Low Quality', 'No Class'), ])
15 | ok.cells <- rownames(x = meta.data)[!(rownames(x = meta.data) %in% low.q.cells)]
16 | al <- al[, ok.cells]
17 | al <- NormalizeData(object = al)
18 | al <- FindVariableFeatures(object = al, nfeatures = 2000)
19 | al <- ScaleData(object = al)
20 | al <- RunPCA(object = al, npcs = 50, verbose = FALSE)
21 | al <- RunUMAP(object = al, dims = 1:50, n.neighbors = 5)
22 | saveRDS(object = al, file = paste0("data/seurat_objects/","allen_brain.rds"))


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/Seurat/impute_osmFISH.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenVISp/")
 2 | library(Seurat)
 3 | 
 4 | osmFISH <- readRDS("data/seurat_objects/osmFISH_Cortex.rds")
 5 | allen <- readRDS("data/seurat_objects/allen_brain.rds")
 6 | 
 7 | #Project on allen labels
 8 | i2 <- FindTransferAnchors(
 9 |   reference = allen,
10 |   query = osmFISH,
11 |   features = rownames(osmFISH),
12 |   reduction = 'cca',
13 |   reference.assay = 'RNA',
14 |   query.assay = 'RNA'
15 | )
16 | 
17 | refdata <- GetAssayData(
18 |   object = allen,
19 |   assay = 'RNA',
20 |   slot = 'data'
21 | )
22 | 
23 | imputation <- TransferData(
24 |   anchorset = i2,
25 |   refdata = refdata,
26 |   weight.reduction = 'pca'
27 | )
28 | 
29 | osmFISH[['ss2']] <- imputation
30 | saveRDS(osmFISH, 'data/seurat_objects/osmFISH_Cortex_imputed.rds')


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/Seurat/osmFISH.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenVISp/")
 2 | library(Seurat)
 3 | library(hdf5r)
 4 | library(methods)
 5 | 
 6 | osm <- H5File$new("data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom")
 7 | mat <- osm[['matrix']][,]
 8 | colnames(mat) <- osm[['row_attrs']][['Gene']][]
 9 | rownames(mat) <- paste0('osm_', osm[['col_attrs']][['CellID']][])
10 | x_dim <- osm[['col_attrs']][['X']][]
11 | y_dim <- osm[['col_attrs']][['Y']][]
12 | region <- osm[['col_attrs']][['Region']][]
13 | cluster <- osm[['col_attrs']][['ClusterName']][]
14 | osm$close_all()
15 | spatial <- data.frame(spatial1 = x_dim, spatial2 = y_dim)
16 | rownames(spatial) <- rownames(mat)
17 | spatial <- as.matrix(spatial)
18 | mat <- t(mat)
19 | 
20 | osm_seurat <- CreateSeuratObject(counts = mat, project = 'osmFISH', assay = 'RNA', min.cells = -1, min.features = -1)
21 | names(region) <- colnames(osm_seurat)
22 | names(cluster) <- colnames(osm_seurat)
23 | osm_seurat <- AddMetaData(osm_seurat, region, col.name = 'region')
24 | osm_seurat <- AddMetaData(osm_seurat, cluster, col.name = 'cluster')
25 | osm_seurat[['spatial']] <- CreateDimReducObject(embeddings = spatial, key = 'spatial', assay = 'RNA')
26 | Idents(osm_seurat) <- 'region'
27 | osm_seurat <- SubsetData(osm_seurat, ident.remove = c('Excluded','Internal Capsule Caudoputamen','White matter','Hippocampus','Ventricle'))
28 | total.counts = colSums(x = as.matrix(osm_seurat@assays$RNA@counts))
29 | osm_seurat <- NormalizeData(osm_seurat, scale.factor = median(x = total.counts))
30 | osm_seurat <- ScaleData(osm_seurat)
31 | saveRDS(object = osm_seurat, file = 'data/seurat_objects/osmFISH_Cortex.rds')


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/Seurat/osmFISH_integration.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_AllenVISp/")
 2 | library(Seurat)
 3 | library(ggplot2)
 4 | 
 5 | osmFISH <- readRDS("data/seurat_objects/osmFISH_Cortex.rds")
 6 | osmFISH.imputed <- readRDS("data/seurat_objects/osmFISH_Cortex_imputed.rds")
 7 | allen <- readRDS("data/seurat_objects/allen_brain.rds")
 8 | 
 9 | genes.leaveout <- intersect(rownames(osmFISH),rownames(allen))
10 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(osmFISH@assays$RNA)[2])
11 | rownames(Imp_genes) <- genes.leaveout
12 | anchor_time <- vector(mode= "numeric")
13 | Transfer_time <- vector(mode= "numeric")
14 | 
15 | run_imputation <- function(ref.obj, query.obj, feature.remove) {
16 |   message(paste0('removing ', feature.remove))
17 |   features <- setdiff(rownames(query.obj), feature.remove)
18 |   DefaultAssay(ref.obj) <- 'RNA'
19 |   DefaultAssay(query.obj) <- 'RNA'
20 |   
21 |   start_time <- Sys.time()
22 |   anchors <- FindTransferAnchors(
23 |     reference = ref.obj,
24 |     query = query.obj,
25 |     features = features,
26 |     dims = 1:30,
27 |     reduction = 'cca'
28 |   )
29 |   end_time <- Sys.time()
30 |   anchor_time <<- c(anchor_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
31 |   
32 |   refdata <- GetAssayData(
33 |     object = ref.obj,
34 |     assay = 'RNA',
35 |     slot = 'data'
36 |   )
37 |   
38 |   start_time <- Sys.time()
39 |   imputation <- TransferData(
40 |     anchorset = anchors,
41 |     refdata = refdata,
42 |     weight.reduction = 'pca'
43 |   )
44 |   query.obj[['seq']] <- imputation
45 |   end_time <- Sys.time()
46 |   Transfer_time <<- c(Transfer_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
47 |   return(query.obj)
48 | }
49 | 
50 | for(i in 1:length(genes.leaveout)) {
51 |   imputed.ss2 <- run_imputation(ref.obj = allen, query.obj = osmFISH, feature.remove = genes.leaveout[[i]])
52 |   osmFISH[['ss2']] <- imputed.ss2[, colnames(osmFISH)][['seq']]
53 |   Imp_genes[genes.leaveout[[i]],] = as.vector(osmFISH@assays$ss2[genes.leaveout[i],])
54 | }
55 | write.csv(Imp_genes,file = 'Results/Seurat_LeaveOneOut.csv')
56 | write.csv(anchor_time,file = 'Results/Seurat_anchor_time.csv',row.names = FALSE)
57 | write.csv(Transfer_time,file = 'Results/Seurat_transfer_time.csv',row.names = FALSE)
58 | 
59 | # show genes not in the osmFISH dataset
60 | DefaultAssay(osmFISH.imputed) <- "ss2"
61 | new.genes <- c('Tesc', 'Pvrl3', 'Grm2')
62 | Imp_New_genes <- matrix(0,nrow = length(new.genes),ncol = dim(osmFISH.imputed@assays$ss2)[2])
63 | rownames(Imp_New_genes) <- new.genes
64 | for(i in 1:length(new.genes)) {
65 |   Imp_New_genes[new.genes[[i]],] = as.vector(osmFISH.imputed@assays$ss2[new.genes[i],])
66 | }
67 | write.csv(Imp_New_genes,file = 'Results/Seurat_New_genes.csv')


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/SpaGE/Allen_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_AllenVISp/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | 
 9 | RNA_data = pd.read_csv('data/Allen_VISp/mouse_VISp_2018-06-14_exon-matrix.csv',
10 |                        header=0,index_col=0,sep=',')
11 | 
12 | Genes = pd.read_csv('data/Allen_VISp/mouse_VISp_2018-06-14_genes-rows.csv',
13 |                         header=0,sep=',')
14 | RNA_data.index = Genes.gene_symbol
15 | del Genes
16 | 
17 | # filter lowly expressed genes
18 | Genes_count = np.sum(RNA_data > 0, axis=1)
19 | RNA_data = RNA_data.loc[Genes_count >=10,:]
20 | del Genes_count
21 | 
22 | # filter low quality cells
23 | meta_data = pd.read_csv('data/Allen_VISp/mouse_VISp_2018-06-14_samples-columns.csv',
24 |                         header=0,sep=',')
25 | HighQualityCells = (meta_data['class'] != 'No Class') & (meta_data['class'] != 'Low Quality')
26 | RNA_data = RNA_data.iloc[:,np.where(HighQualityCells)[0]]
27 | del meta_data, HighQualityCells
28 | 
29 | def Log_Norm(x):
30 |     return np.log(((x/np.sum(x))*1000000) + 1)
31 | 
32 | RNA_data = RNA_data.apply(Log_Norm,axis=0)
33 | RNA_data_scaled = pd.DataFrame(data=st.zscore(RNA_data.T),index = RNA_data.columns,columns=RNA_data.index)
34 | 
35 | datadict = dict()
36 | datadict['RNA_data'] = RNA_data.T
37 | datadict['RNA_data_scaled'] = RNA_data_scaled
38 | 
39 | with open('data/SpaGE_pkl/Allen_VISp.pkl','wb') as f:
40 |     pickle.dump(datadict, f)


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/SpaGE/Integration.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('osmFISH_AllenVISp/')
  3 | 
  4 | import pickle
  5 | import numpy as np
  6 | import pandas as pd
  7 | from sklearn.neighbors import NearestNeighbors
  8 | import time as tm
  9 | 
 10 | with open ('data/SpaGE_pkl/osmFISH_Cortex.pkl', 'rb') as f:
 11 |     datadict = pickle.load(f)
 12 | 
 13 | osmFISH_data = datadict['osmFISH_data']
 14 | osmFISH_data_scaled = datadict['osmFISH_data_scaled']
 15 | osmFISH_meta= datadict['osmFISH_meta']
 16 | del datadict
 17 | 
 18 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
 19 |     datadict = pickle.load(f)
 20 |     
 21 | RNA_data = datadict['RNA_data']
 22 | RNA_data_scaled = datadict['RNA_data_scaled']
 23 | del datadict
 24 | 
 25 | #### Leave One Out Validation ####
 26 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
 27 | Imp_Genes = pd.DataFrame(columns=Common_data.columns)
 28 | precise_time = []
 29 | knn_time = []
 30 | for i in Common_data.columns:
 31 |     print(i)
 32 |     start = tm.time()
 33 |     from principal_vectors import PVComputation
 34 | 
 35 |     n_factors = 30
 36 |     n_pv = 30
 37 |     dim_reduction = 'pca'
 38 |     dim_reduction_target = 'pca'
 39 | 
 40 |     pv_FISH_RNA = PVComputation(
 41 |             n_factors = n_factors,
 42 |             n_pv = n_pv,
 43 |             dim_reduction = dim_reduction,
 44 |             dim_reduction_target = dim_reduction_target
 45 |     )
 46 | 
 47 |     pv_FISH_RNA.fit(Common_data.drop(i,axis=1),osmFISH_data_scaled[Common_data.columns].drop(i,axis=1))
 48 | 
 49 |     S = pv_FISH_RNA.source_components_.T
 50 |     
 51 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
 52 |     S = S[:,0:Effective_n_pv]
 53 | 
 54 |     Common_data_t = Common_data.drop(i,axis=1).dot(S)
 55 |     FISH_exp_t = osmFISH_data_scaled[Common_data.columns].drop(i,axis=1).dot(S)
 56 |     precise_time.append(tm.time()-start)
 57 |     
 58 |     start = tm.time()    
 59 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',metric = 'cosine').fit(Common_data_t)
 60 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
 61 |     
 62 |     Imp_Gene = np.zeros(osmFISH_data.shape[0])
 63 |     for j in range(0,osmFISH_data.shape[0]):
 64 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
 65 |         weights = weights/(len(weights)-1)
 66 |         Imp_Gene[j] = np.sum(np.multiply(RNA_data[i][indices[j,:][distances[j,:] < 1]],weights))
 67 |     Imp_Gene[np.isnan(Imp_Gene)] = 0
 68 |     Imp_Genes[i] = Imp_Gene
 69 |     knn_time.append(tm.time()-start)
 70 | 
 71 | Imp_Genes.to_csv('Results/SpaGE_LeaveOneOut.csv')
 72 | precise_time = pd.DataFrame(precise_time)
 73 | knn_time = pd.DataFrame(knn_time)
 74 | precise_time.to_csv('Results/SpaGE_PreciseTime.csv', index = False)
 75 | knn_time.to_csv('Results/SpaGE_knnTime.csv', index = False)
 76 | 
 77 | #### Novel Genes Expression Patterns ####
 78 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
 79 | genes_to_impute = ["Tesc","Pvrl3","Grm2"]
 80 | Imp_New_Genes = pd.DataFrame(np.zeros((osmFISH_data.shape[0],len(genes_to_impute))),columns=genes_to_impute)
 81 | 
 82 | from principal_vectors import PVComputation
 83 | 
 84 | n_factors = 30
 85 | n_pv = 30
 86 | dim_reduction = 'pca'
 87 | dim_reduction_target = 'pca'
 88 | 
 89 | pv_FISH_RNA = PVComputation(
 90 |         n_factors = n_factors,
 91 |         n_pv = n_pv,
 92 |         dim_reduction = dim_reduction,
 93 |         dim_reduction_target = dim_reduction_target
 94 | )
 95 | 
 96 | pv_FISH_RNA.fit(Common_data,osmFISH_data_scaled[Common_data.columns])
 97 | 
 98 | S = pv_FISH_RNA.source_components_.T
 99 |     
100 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
101 | S = S[:,0:Effective_n_pv]
102 | 
103 | Common_data_t = Common_data.dot(S)
104 | FISH_exp_t = osmFISH_data_scaled[Common_data.columns].dot(S)
105 |     
106 | nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',
107 |                         metric = 'cosine').fit(Common_data_t)
108 | distances, indices = nbrs.kneighbors(FISH_exp_t)
109 | 
110 | for j in range(0,osmFISH_data.shape[0]):
111 | 
112 |     weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
113 |     weights = weights/(len(weights)-1)
114 |     Imp_New_Genes.iloc[j,:] = np.dot(weights,RNA_data[genes_to_impute].iloc[indices[j,:][distances[j,:] < 1]])
115 | 
116 | Imp_New_Genes.to_csv('Results/SpaGE_New_genes.csv')   
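
The neighbour weighting used in both sections of the script above can be sketched in isolation. This is a minimal NumPy-only illustration, not part of the benchmark: the names `knn_weights` and `impute_cell` are hypothetical, and the uniform-weight fallback for fewer than two retained neighbours is an assumption (the original script yields NaN in that case and, in the leave-one-out section only, replaces it with 0).

```python
import numpy as np

def knn_weights(distances):
    """Weights for one spatial cell from cosine distances to its neighbours.

    Mirrors the scheme in the script: keep neighbours with distance < 1,
    weight each by 1 - d / sum(d), then divide by (k - 1) so that the
    weights of the k retained neighbours sum to 1.
    """
    d = np.asarray(distances, dtype=float)
    keep = d < 1
    d = d[keep]
    if d.size < 2 or d.sum() == 0:
        # Assumption (not in the original script): fall back to uniform
        # weights instead of producing NaN from a division by zero.
        w = np.full(d.size, 1.0 / d.size) if d.size else np.array([])
        return keep, w
    w = (1 - d / d.sum()) / (d.size - 1)
    return keep, w

def impute_cell(distances, neighbour_expression):
    """Weighted average of the neighbours' expression for one gene."""
    keep, w = knn_weights(distances)
    expr = np.asarray(neighbour_expression, dtype=float)[keep]
    return float(np.dot(w, expr))

# Toy example: the fourth neighbour has distance >= 1 and is ignored.
toy_distances = [0.1, 0.3, 0.6, 1.2]
toy_expression = [1.0, 2.0, 3.0, 100.0]
toy_value = impute_cell(toy_distances, toy_expression)
```

Closer neighbours receive larger weights, and a neighbour with negative cosine similarity (distance of 1 or more) contributes nothing to the imputed value.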


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/SpaGE/Precise_output.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_AllenVISp/')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | import matplotlib
 8 | matplotlib.rcParams['pdf.fonttype'] = 42
 9 | matplotlib.rcParams['ps.fonttype'] = 42
10 | import matplotlib.pyplot as plt
11 | from matplotlib import cm
12 | import seaborn as sns
13 | import sys
14 | sys.path.insert(1,'SpaGE/')
15 | from principal_vectors import PVComputation
16 | 
17 | with open ('data/SpaGE_pkl/osmFISH_Cortex.pkl', 'rb') as f:
18 |     datadict = pickle.load(f)
19 | 
20 | osmFISH_data = datadict['osmFISH_data']
21 | osmFISH_data_scaled = datadict['osmFISH_data_scaled']
22 | osmFISH_meta= datadict['osmFISH_meta']
23 | del datadict
24 | 
25 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
26 |     datadict = pickle.load(f)
27 |     
28 | RNA_data = datadict['RNA_data']
29 | RNA_data_scaled = datadict['RNA_data_scaled']
30 | del datadict
31 | 
32 | all_centroids  = osmFISH_meta[['X','Y']]
33 | cmap = cm.get_cmap('viridis',20)
34 | 
35 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
36 | 
37 | n_factors = 30
38 | n_pv = 30
39 | n_pv_display = 30
40 | dim_reduction = 'pca'
41 | dim_reduction_target = 'pca'
42 | 
43 | pv_FISH_RNA = PVComputation(
44 |         n_factors = n_factors,
45 |         n_pv = n_pv,
46 |         dim_reduction = dim_reduction,
47 |         dim_reduction_target = dim_reduction_target
48 | )
49 | 
50 | pv_FISH_RNA.fit(Common_data,osmFISH_data_scaled[Common_data.columns])
51 | 
52 | fig = plt.figure()
53 | sns.heatmap(pv_FISH_RNA.initial_cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
54 |             center=0, vmax=1., vmin=0)
55 | plt.xlabel('osmFISH',fontsize=18, color='black')
56 | plt.ylabel('Allen_VISp',fontsize=18, color='black')
57 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
58 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
59 | plt.gca().set_ylim([n_pv_display,0])
60 | plt.show()
61 | 
62 | plt.figure()
63 | sns.heatmap(pv_FISH_RNA.cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
64 |             center=0, vmax=1., vmin=0)
65 | for i in range(n_pv_display-1):
66 |     plt.text(i+1,i+.7,'%1.2f'%pv_FISH_RNA.cosine_similarity_matrix_[i,i], fontsize=14,color='black')
67 |     
68 | plt.xlabel('osmFISH',fontsize=18, color='black')
69 | plt.ylabel('Allen_VISp',fontsize=18, color='black')
70 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
71 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
72 | plt.gca().set_ylim([n_pv_display,0])
73 | plt.show()
74 | 
75 | Importance = pd.Series(np.sum(pv_FISH_RNA.source_components_**2,axis=0),index=Common_data.columns)
76 | Importance.sort_values(ascending=False,inplace=True)
77 | Importance.index[0:30]
78 | 
79 | ### Technology specific Processes
80 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
81 | 
82 | # explained variance RNA
83 | np.sum(pv_FISH_RNA.source_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
84 | # explained variance spatial
85 | np.sum(pv_FISH_RNA.target_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
86 | 


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/SpaGE/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and can be selected by their
 5 | string name from the main method. All the methods use their scikit-learn
 6 | implementations.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Default linear dimensionality reduction method. For each method, return a
25 |     BaseEstimator instance corresponding to the method given as input.
26 | 	Attributes
27 |     -------
28 |     method: str, default to 'pca'
29 |     	Method used for dimensionality reduction.
30 |         Implemented: 'pca', 'ica', 'fa' (Factor Analysis),
31 |         'nmf' (Non-negative matrix factorisation), 'sparsepca' (Sparse PCA),
31 |         'pls' (Partial Least Squares).
32 |     
33 |     n_dim: int, default to 10
34 |     	Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Classifier, i.e. BaseEstimator instance
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Implement PLS to make it compliant with the other dimensionality
68 |     reduction methodology.
69 |     (Simple class rewriting).
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/SpaGE/osmFISH_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_AllenVISp/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | import loompy
 9 | 
10 | ## Read loom data
11 | ds = loompy.connect('data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom')
12 | 
13 | FISH_Genes = ds.ra['Gene']
14 |     
15 | colAtr = ds.ca.keys()
16 | 
17 | df = pd.DataFrame()
18 | for i in colAtr:
19 |     df[i] = ds.ca[i]
20 | 
21 | osmFISH_meta = df.iloc[np.where(df.Valid == 1)[0], :]
22 | osmFISH_data = ds[:,:]
23 | osmFISH_data = osmFISH_data[:,np.where(df.Valid == 1)[0]]
24 | osmFISH_data = pd.DataFrame(data= osmFISH_data, index= FISH_Genes)
25 | 
26 | del ds, colAtr, i, df, FISH_Genes
27 | 
28 | Cortex_Regions = ['Layer 2-3 lateral', 'Layer 2-3 medial', 'Layer 3-4', 
29 |                   'Layer 4','Layer 5', 'Layer 6', 'Pia Layer 1']
30 | Cortical = np.array([i in Cortex_Regions for i in osmFISH_meta.Region])
31 | 
32 | osmFISH_meta = osmFISH_meta.iloc[Cortical,:]
33 | osmFISH_data = osmFISH_data.iloc[:,Cortical]
34 | 
35 | cell_count = np.sum(osmFISH_data,axis=0)
36 | def Log_Norm(x):
37 |     return np.log(((x/np.sum(x))*np.median(cell_count)) + 1)
38 | 
39 | osmFISH_data = osmFISH_data.apply(Log_Norm,axis=0)
40 | osmFISH_data_scaled = pd.DataFrame(data=st.zscore(osmFISH_data.T),index = osmFISH_data.columns,columns=osmFISH_data.index)
41 | 
42 | datadict = dict()
43 | datadict['osmFISH_data'] = osmFISH_data.T
44 | datadict['osmFISH_data_scaled'] = osmFISH_data_scaled
45 | datadict['osmFISH_meta'] = osmFISH_meta
46 | 
47 | with open('data/SpaGE_pkl/osmFISH_Cortex.pkl','wb') as f:
48 |     pickle.dump(datadict, f)
49 | 


--------------------------------------------------------------------------------
/benchmark/osmFISH_AllenVISp/gimVI/gimVI.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('osmFISH_AllenVISp/')
  3 | 
  4 | from scvi.dataset import CsvDataset
  5 | from scvi.models import JVAE, Classifier
  6 | from scvi.inference import JVAETrainer
  7 | import numpy as np
  8 | import pandas as pd
  9 | import copy
 10 | import torch
 11 | import time as tm
 12 | 
 13 | ### osmFISH data
 14 | osmFISH_data = CsvDataset('data/gimVI_data/osmFISH_Cortex_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 15 | 
 16 | ### RNA
 17 | RNA_data = CsvDataset('data/gimVI_data/Allen_data_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 18 | 
 19 | ### Leave-one-out validation
 20 | Common_data = copy.deepcopy(RNA_data)
 21 | Common_data.gene_names = osmFISH_data.gene_names
 22 | Common_data.X = Common_data.X[:,np.reshape(np.vstack([np.argwhere(i==RNA_data.gene_names) for i in osmFISH_data.gene_names]),-1)]
 23 | Gene_set = np.intersect1d(osmFISH_data.gene_names,Common_data.gene_names)
 24 | Imp_Genes = pd.DataFrame(columns=Gene_set)
 25 | gimVI_time = []
 26 | 
 27 | for i in Gene_set:
 28 |     print(i)
 29 |     # Create copy of the fish dataset with hidden genes
 30 |     data_spatial_partial = copy.deepcopy(osmFISH_data)
 31 |     data_spatial_partial.filter_genes_by_attribute(np.setdiff1d(osmFISH_data.gene_names,i))
 32 |     data_spatial_partial.batch_indices += Common_data.n_batches
 33 |     
 34 |     if(data_spatial_partial.X.shape[0] != osmFISH_data.X.shape[0]):
 35 |         continue
 36 |     
 37 |     datasets = [Common_data, data_spatial_partial]
 38 |     generative_distributions = ["zinb", "nb"]
 39 |     gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(i==Common_data.gene_names) for i in data_spatial_partial.gene_names]),-1)]
 40 |     n_inputs = [d.nb_genes for d in datasets]
 41 |     total_genes = Common_data.nb_genes
 42 |     n_batches = sum([d.n_batches for d in datasets])
 43 |     
 44 |     model_library_size = [True, False]
 45 |     
 46 |     n_latent = 8
 47 |     kappa = 5
 48 |     
 49 |     start = tm.time()
 50 |     torch.manual_seed(0)
 51 |     
 52 |     model = JVAE(
 53 |         n_inputs,
 54 |         total_genes,
 55 |         gene_mappings,
 56 |         generative_distributions,
 57 |         model_library_size,
 58 |         n_layers_decoder_individual=0,
 59 |         n_layers_decoder_shared=0,
 60 |         n_layers_encoder_individual=1,
 61 |         n_layers_encoder_shared=1,
 62 |         dim_hidden_encoder=64,
 63 |         dim_hidden_decoder_shared=64,
 64 |         dropout_rate_encoder=0.2,
 65 |         dropout_rate_decoder=0.2,
 66 |         n_batch=n_batches,
 67 |         n_latent=n_latent,
 68 |     )
 69 |     discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
 70 |     
 71 |     trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
 72 |     trainer.train(n_epochs=200)
 73 |     _,Imputed = trainer.get_imputed_values(normalized=True)
 74 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==Common_data.gene_names)[0]],-1)
 75 |     Imp_Genes[i] = Imputed
 76 |     gimVI_time.append(tm.time()-start)
 77 |     
 78 | Imp_Genes = Imp_Genes.fillna(0)    
 79 | Imp_Genes.to_csv('Results/gimVI_LeaveOneOut.csv')
 80 | gimVI_time = pd.DataFrame(gimVI_time)
 81 | gimVI_time.to_csv('Results/gimVI_Time.csv', index = False)
 82 | 
 83 | ### New genes
 84 | Imp_New_Genes = pd.DataFrame(columns=["TESC","PVRL3","GRM2"])
 85 | 
 86 | # Create a copy of the FISH dataset (all genes are kept; none are hidden here)
 87 | data_spatial_partial = copy.deepcopy(osmFISH_data)
 88 | data_spatial_partial.filter_genes_by_attribute(osmFISH_data.gene_names)
 89 | data_spatial_partial.batch_indices += RNA_data.n_batches
 90 | 
 91 | datasets = [RNA_data, data_spatial_partial]
 92 | generative_distributions = ["zinb", "nb"]
 93 | gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(i==RNA_data.gene_names) for i in data_spatial_partial.gene_names]),-1)]
 94 | n_inputs = [d.nb_genes for d in datasets]
 95 | total_genes = RNA_data.nb_genes
 96 | n_batches = sum([d.n_batches for d in datasets])
 97 | 
 98 | model_library_size = [True, False]
 99 | 
100 | n_latent = 8
101 | kappa = 5
102 | 
103 | torch.manual_seed(0)
104 | 
105 | model = JVAE(
106 |     n_inputs,
107 |     total_genes,
108 |     gene_mappings,
109 |     generative_distributions,
110 |     model_library_size,
111 |     n_layers_decoder_individual=0,
112 |     n_layers_decoder_shared=0,
113 |     n_layers_encoder_individual=1,
114 |     n_layers_encoder_shared=1,
115 |     dim_hidden_encoder=64,
116 |     dim_hidden_decoder_shared=64,
117 |     dropout_rate_encoder=0.2,
118 |     dropout_rate_decoder=0.2,
119 |     n_batch=n_batches,
120 |     n_latent=n_latent,
121 | )
122 | discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
123 | 
124 | trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
125 | trainer.train(n_epochs=200)
126 |     
127 | for i in ["TESC","PVRL3","GRM2"]:
128 |     _,Imputed = trainer.get_imputed_values(normalized=True)
129 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==RNA_data.gene_names)[0]],-1)
130 |     Imp_New_Genes[i] = Imputed
131 |     
132 | Imp_New_Genes.to_csv('Results/gimVI_New_genes.csv')
133 | 


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/Liger/LIGER.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_Ziesel/")
 2 | library(liger)
 3 | library(hdf5r)
 4 | library(methods)
 5 | 
 6 | # Zeisel SMSC
 7 | Zeisel <- read.delim(file = "data/Zeisel/expression_mRNA_17-Aug-2014.txt",header = FALSE)
 8 | meta.data <- Zeisel[1:10,]
 9 | rownames(meta.data) <- meta.data[,2]
10 | meta.data <- meta.data[,-c(1,2)]
11 | meta.data <- as.data.frame(t(meta.data))
12 | 
13 | Zeisel <- Zeisel[12:19983,]
14 | gene_names <- Zeisel[,1]
15 | Zeisel <- Zeisel[,-c(1,2)]
16 | Zeisel <- apply(Zeisel,2,as.numeric)
17 | Zeisel <- as.matrix(Zeisel)
18 | rownames(Zeisel) <- gene_names
19 | Zeisel <- Zeisel[,meta.data$tissue=='sscortex']
20 | meta.data <- meta.data[meta.data$tissue=='sscortex',]
21 | 
22 | colnames(Zeisel) <- paste0('SMSC_',c(1:dim(Zeisel)[2]))
23 | rownames(meta.data) <- paste0('SMSC_',c(1:dim(meta.data)[1]))
24 | 
25 | gene_count <- rowSums(Zeisel>0)
26 | Zeisel <- Zeisel[gene_count >= 10,]
27 | 
28 | # osmFISH
29 | osm <- H5File$new("data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom")
30 | mat <- osm[['matrix']][,]
31 | colnames(mat) <- osm[['row_attrs']][['Gene']][]
32 | rownames(mat) <- paste0('osm_', osm[['col_attrs']][['CellID']][])
33 | x_dim <- osm[['col_attrs']][['X']][]
34 | y_dim <- osm[['col_attrs']][['Y']][]
35 | region <- osm[['col_attrs']][['Region']][]
36 | cluster <- osm[['col_attrs']][['ClusterName']][]
37 | osm$close_all()
38 | spatial <- data.frame(spatial1 = x_dim, spatial2 = y_dim)
39 | rownames(spatial) <- rownames(mat)
40 | spatial <- as.matrix(spatial)
41 | osmFISH <- t(mat)
42 | osmFISH <- osmFISH[,!is.element(region,c('Excluded','Internal Capsule Caudoputamen','White matter','Hippocampus','Ventricle'))]
43 | 
44 | Gene_set <- intersect(rownames(osmFISH),rownames(Zeisel))
45 | 
46 | #### New genes prediction
47 | Ligerex <- createLiger(list(SMSC_FISH = osmFISH, SMSC_RNA = Zeisel))
48 | Ligerex <- normalize(Ligerex)
49 | Ligerex@var.genes <- Gene_set
50 | Ligerex <- scaleNotCenter(Ligerex)
51 | 
52 | # suggestK(Ligerex, k.test= seq(5,30,5)) # K = 20
53 | # suggestLambda(Ligerex, k = 20) # Lambda = 20
54 | 
55 | Ligerex <- optimizeALS(Ligerex,k = 20, lambda = 20)
56 | Ligerex <- quantileAlignSNF(Ligerex)
57 | 
58 | Imputation <- imputeKNN(Ligerex,reference = 'SMSC_RNA', queries = list('SMSC_FISH'), norm = TRUE, scale = FALSE)
59 | new.genes <- c('Tesc', 'Pvrl3', 'Grm2')
60 | Imp_New_genes <- matrix(0,nrow = length(new.genes),ncol = dim(Imputation@norm.data$SMSC_FISH)[2])
61 | rownames(Imp_New_genes) <- new.genes
62 | for(i in 1:length(new.genes)) {
63 |   Imp_New_genes[new.genes[[i]],] = as.vector(Imputation@norm.data$SMSC_FISH[new.genes[i],])
64 | }
65 | write.csv(Imp_New_genes,file = 'Results/Liger_New_genes.csv')
66 | 
67 | # leave-one-out validation
68 | genes.leaveout <- intersect(rownames(osmFISH),rownames(Zeisel))
69 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(osmFISH)[2])
70 | rownames(Imp_genes) <- genes.leaveout
71 | colnames(Imp_genes) <- colnames(osmFISH)
72 | NMF_time <- vector(mode= "numeric")
73 | knn_time <- vector(mode= "numeric")
74 | 
75 | for(i in 1:length(genes.leaveout)) {
76 |   print(i)
77 |   start_time <- Sys.time()
78 |   Ligerex.leaveout <- createLiger(list(SMSC_FISH = osmFISH[-which(rownames(osmFISH) %in% genes.leaveout[i]),], SMSC_RNA = Zeisel[rownames(osmFISH),]))
79 |   Ligerex.leaveout <- normalize(Ligerex.leaveout)
80 |   Ligerex.leaveout@var.genes <- setdiff(Gene_set,genes.leaveout[i])
81 |   Ligerex.leaveout <- scaleNotCenter(Ligerex.leaveout)
82 |   if(dim(Ligerex.leaveout@norm.data$SMSC_FISH)[2]!=dim(osmFISH)[2]){
83 |     next
84 |   }
85 |   Ligerex.leaveout <- optimizeALS(Ligerex.leaveout,k = 20, lambda = 20)
86 |   Ligerex.leaveout <- quantileAlignSNF(Ligerex.leaveout)
87 |   end_time <- Sys.time()
88 |   NMF_time <- c(NMF_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
89 |   
90 |   start_time <- Sys.time()
91 |   Imputation <- imputeKNN(Ligerex.leaveout,reference = 'SMSC_RNA', queries = list('SMSC_FISH'), norm = TRUE, scale = FALSE, knn_k = 30)
92 |   Imp_genes[genes.leaveout[[i]],] = as.vector(Imputation@norm.data$SMSC_FISH[genes.leaveout[i],])
93 |   end_time <- Sys.time()
94 |   knn_time <- c(knn_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
95 | }
96 | write.csv(Imp_genes,file = 'Results/Liger_LeaveOneOut.csv')
97 | write.csv(NMF_time,file = 'Results/Liger_NMF_time.csv',row.names = FALSE)
98 | write.csv(knn_time,file = 'Results/Liger_knn_time.csv',row.names = FALSE)


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/Seurat/Zeisel_Cortex.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_Ziesel/")
 2 | library(Seurat)
 3 | 
 4 | Zeisel <- read.delim(file = "data/Zeisel/expression_mRNA_17-Aug-2014.txt",header = FALSE)
 5 | 
 6 | meta.data <- Zeisel[1:10,]
 7 | rownames(meta.data) <- meta.data[,2]
 8 | meta.data <- meta.data[,-c(1,2)]
 9 | meta.data <- as.data.frame(t(meta.data))
10 | 
11 | Zeisel <- Zeisel[12:19983,]
12 | gene_names <- Zeisel[,1]
13 | Zeisel <- Zeisel[,-c(1,2)]
14 | Zeisel <- apply(Zeisel,2,as.numeric)
15 | Zeisel <- as.matrix(Zeisel)
16 | rownames(Zeisel) <- gene_names
17 | Zeisel <- Zeisel[,meta.data$tissue=='sscortex']
18 | meta.data <- meta.data[meta.data$tissue=='sscortex',]
19 | 
20 | colnames(Zeisel) <- paste0('SMSC_',c(1:dim(Zeisel)[2]))
21 | rownames(meta.data) <- paste0('SMSC_',c(1:dim(meta.data)[1]))
22 | 
23 | Zeisel <- CreateSeuratObject(counts = Zeisel, project = 'SMSC', meta.data = meta.data, min.cells = 10)
24 | Zeisel <- NormalizeData(object = Zeisel)
25 | Zeisel <- FindVariableFeatures(object = Zeisel, nfeatures = 2000)
26 | Zeisel <- ScaleData(object = Zeisel)
27 | Zeisel <- RunPCA(object = Zeisel, npcs = 50, verbose = FALSE)
28 | Zeisel <- RunUMAP(object = Zeisel, dims = 1:50, n.neighbors = 5)
29 | saveRDS(object = Zeisel, file = paste0("data/seurat_objects/","Zeisel_SMSC.rds"))


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/Seurat/impute_osmFISH.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_Ziesel/")
 2 | library(Seurat)
 3 | 
 4 | osmFISH <- readRDS("data/seurat_objects/osmFISH_Cortex.rds")
 5 | Zeisel <- readRDS("data/seurat_objects/Zeisel_SMSC.rds")
 6 | 
 7 | #Project on Zeisel labels
 8 | i2 <- FindTransferAnchors(
 9 |   reference = Zeisel,
10 |   query = osmFISH,
11 |   features = rownames(osmFISH),
12 |   reduction = 'cca',
13 |   reference.assay = 'RNA',
14 |   query.assay = 'RNA'
15 | )
16 | 
17 | refdata <- GetAssayData(
18 |   object = Zeisel,
19 |   assay = 'RNA',
20 |   slot = 'data'
21 | )
22 | 
23 | imputation <- TransferData(
24 |   anchorset = i2,
25 |   refdata = refdata,
26 |   weight.reduction = 'pca'
27 | )
28 | 
29 | osmFISH[['ss2']] <- imputation
30 | saveRDS(osmFISH, 'data/seurat_objects/osmFISH_Cortex_imputed.rds')


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/Seurat/osmFISH.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_Ziesel/")
 2 | library(Seurat)
 3 | library(hdf5r)
 4 | library(methods)
 5 | 
 6 | osm <- H5File$new("data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom")
 7 | mat <- osm[['matrix']][,]
 8 | colnames(mat) <- osm[['row_attrs']][['Gene']][]
 9 | rownames(mat) <- paste0('osm_', osm[['col_attrs']][['CellID']][])
10 | x_dim <- osm[['col_attrs']][['X']][]
11 | y_dim <- osm[['col_attrs']][['Y']][]
12 | region <- osm[['col_attrs']][['Region']][]
13 | cluster <- osm[['col_attrs']][['ClusterName']][]
14 | osm$close_all()
15 | spatial <- data.frame(spatial1 = x_dim, spatial2 = y_dim)
16 | rownames(spatial) <- rownames(mat)
17 | spatial <- as.matrix(spatial)
18 | mat <- t(mat)
19 | 
20 | osm_seurat <- CreateSeuratObject(counts = mat, project = 'osmFISH', assay = 'RNA', min.cells = -1, min.features = -1)
21 | names(region) <- colnames(osm_seurat)
22 | names(cluster) <- colnames(osm_seurat)
23 | osm_seurat <- AddMetaData(osm_seurat, region, col.name = 'region')
24 | osm_seurat <- AddMetaData(osm_seurat, cluster, col.name = 'cluster')
25 | osm_seurat[['spatial']] <- CreateDimReducObject(embeddings = spatial, key = 'spatial', assay = 'RNA')
26 | Idents(osm_seurat) <- 'region'
27 | osm_seurat <- SubsetData(osm_seurat, ident.remove = c('Excluded','Internal Capsule Caudoputamen','White matter','Hippocampus','Ventricle'))
28 | total.counts = colSums(x = as.matrix(osm_seurat@assays$RNA@counts))
29 | osm_seurat <- NormalizeData(osm_seurat, scale.factor = median(x = total.counts))
30 | osm_seurat <- ScaleData(osm_seurat)
31 | saveRDS(object = osm_seurat, file = 'data/seurat_objects/osmFISH_Cortex.rds')


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/Seurat/osmFISH_integration.R:
--------------------------------------------------------------------------------
 1 | setwd("osmFISH_Ziesel/")
 2 | library(Seurat)
 3 | library(ggplot2)
 4 | 
 5 | osmFISH <- readRDS("data/seurat_objects/osmFISH_Cortex.rds")
 6 | osmFISH.imputed <- readRDS("data/seurat_objects/osmFISH_Cortex_imputed.rds")
 7 | Zeisel <- readRDS("data/seurat_objects/Zeisel_SMSC.rds")
 8 | 
 9 | genes.leaveout <- intersect(rownames(osmFISH),rownames(Zeisel))
10 | Imp_genes <- matrix(0,nrow = length(genes.leaveout),ncol = dim(osmFISH@assays$RNA)[2])
11 | rownames(Imp_genes) <- genes.leaveout
12 | anchor_time <- vector(mode= "numeric")
13 | Transfer_time <- vector(mode= "numeric")
14 | 
15 | run_imputation <- function(ref.obj, query.obj, feature.remove) {
16 |   message(paste0('removing ', feature.remove))
17 |   features <- setdiff(rownames(query.obj), feature.remove)
18 |   DefaultAssay(ref.obj) <- 'RNA'
19 |   DefaultAssay(query.obj) <- 'RNA'
20 |   
21 |   start_time <- Sys.time()
22 |   anchors <- FindTransferAnchors(
23 |     reference = ref.obj,
24 |     query = query.obj,
25 |     features = features,
26 |     dims = 1:30,
27 |     reduction = 'cca'
28 |   )
29 |   end_time <- Sys.time()
30 |   anchor_time <<- c(anchor_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
31 |   
32 |   refdata <- GetAssayData(
33 |     object = ref.obj,
34 |     assay = 'RNA',
35 |     slot = 'data'
36 |   )
37 |   
38 |   start_time <- Sys.time()
39 |   imputation <- TransferData(
40 |     anchorset = anchors,
41 |     refdata = refdata,
42 |     weight.reduction = 'pca'
43 |   )
44 |   query.obj[['seq']] <- imputation
45 |   end_time <- Sys.time()
46 |   Transfer_time <<- c(Transfer_time,as.numeric(difftime(end_time,start_time,units = 'secs')))
47 |   return(query.obj)
48 | }
49 | 
50 | for(i in 1:length(genes.leaveout)) {
51 |   imputed.ss2 <- run_imputation(ref.obj = Zeisel, query.obj = osmFISH, feature.remove = genes.leaveout[[i]])
52 |   osmFISH[['ss2']] <- imputed.ss2[, colnames(osmFISH)][['seq']]
53 |   Imp_genes[genes.leaveout[[i]],] = as.vector(osmFISH@assays$ss2[genes.leaveout[i],])
54 | }
55 | write.csv(Imp_genes,file = 'Results/Seurat_LeaveOneOut.csv')
56 | write.csv(anchor_time,file = 'Results/Seurat_anchor_time.csv',row.names = FALSE)
57 | write.csv(Transfer_time,file = 'Results/Seurat_transfer_time.csv',row.names = FALSE)
58 | 
59 | # show genes not in the osmFISH dataset
60 | DefaultAssay(osmFISH.imputed) <- "ss2"
61 | new.genes <- c('Tesc', 'Pvrl3', 'Grm2')
62 | Imp_New_genes <- matrix(0,nrow = length(new.genes),ncol = dim(osmFISH.imputed@assays$ss2)[2])
63 | rownames(Imp_New_genes) <- new.genes
64 | for(i in 1:length(new.genes)) {
65 |   Imp_New_genes[new.genes[[i]],] = as.vector(osmFISH.imputed@assays$ss2[new.genes[i],])
66 | }
67 | write.csv(Imp_New_genes,file = 'Results/Seurat_New_genes.csv')


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/SpaGE/Integration.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('osmFISH_Ziesel/')
  3 | 
  4 | import pickle
  5 | import numpy as np
  6 | import pandas as pd
  7 | from sklearn.neighbors import NearestNeighbors
  8 | import time as tm
  9 | 
 10 | with open ('data/SpaGE_pkl/Ziesel.pkl', 'rb') as f:
 11 |     datadict = pickle.load(f)
 12 | 
 13 | RNA_data = datadict['RNA_data']
 14 | RNA_data_scaled = datadict['RNA_data_scaled']
 15 | del datadict
 16 | 
 17 | with open ('data/SpaGE_pkl/osmFISH_Cortex.pkl', 'rb') as f:
 18 |     datadict = pickle.load(f)
 19 | 
 20 | osmFISH_data = datadict['osmFISH_data']
 21 | osmFISH_data_scaled = datadict['osmFISH_data_scaled']
 22 | osmFISH_meta= datadict['osmFISH_meta']
 23 | del datadict
 24 | 
 25 | #### Leave One Out Validation ####
 26 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
 27 | Imp_Genes = pd.DataFrame(columns=Common_data.columns)
 28 | precise_time = []
 29 | knn_time = []
 30 | for i in Common_data.columns:
 31 |     print(i)
 32 |     start = tm.time()
 33 |     from principal_vectors import PVComputation
 34 | 
 35 |     n_factors = 30
 36 |     n_pv = 30
 37 |     dim_reduction = 'pca'
 38 |     dim_reduction_target = 'pca'
 39 | 
 40 |     pv_FISH_RNA = PVComputation(
 41 |             n_factors = n_factors,
 42 |             n_pv = n_pv,
 43 |             dim_reduction = dim_reduction,
 44 |             dim_reduction_target = dim_reduction_target
 45 |     )
 46 | 
 47 |     pv_FISH_RNA.fit(Common_data.drop(i,axis=1),osmFISH_data_scaled[Common_data.columns].drop(i,axis=1))
 48 | 
 49 |     S = pv_FISH_RNA.source_components_.T
 50 |     
 51 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
 52 |     S = S[:,0:Effective_n_pv]
 53 | 
 54 |     Common_data_t = Common_data.drop(i,axis=1).dot(S)
 55 |     FISH_exp_t = osmFISH_data_scaled[Common_data.columns].drop(i,axis=1).dot(S)
 56 |     precise_time.append(tm.time()-start)
 57 |         
 58 |     start = tm.time()
 59 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',metric = 'cosine').fit(Common_data_t)
 60 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
 61 |     
 62 |     Imp_Gene = np.zeros(osmFISH_data.shape[0])
 63 |     for j in range(0,osmFISH_data.shape[0]):
 64 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
 65 |         weights = weights/(len(weights)-1)
 66 |         Imp_Gene[j] = np.sum(np.multiply(RNA_data[i][indices[j,:][distances[j,:] < 1]],weights))
 67 |     Imp_Gene[np.isnan(Imp_Gene)] = 0
 68 |     Imp_Genes[i] = Imp_Gene
 69 |     knn_time.append(tm.time()-start)
 70 | 
 71 | Imp_Genes.to_csv('Results/SpaGE_LeaveOneOut.csv')
 72 | precise_time = pd.DataFrame(precise_time)
 73 | knn_time = pd.DataFrame(knn_time)
 74 | precise_time.to_csv('Results/SpaGE_PreciseTime.csv', index = False)
 75 | knn_time.to_csv('Results/SpaGE_knnTime.csv', index = False)
 76 | 
 77 | #### Novel Genes Expression Patterns ####
 78 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
 79 | genes_to_impute = ["Tesc","Pvrl3","Grm2"]
 80 | Imp_New_Genes = pd.DataFrame(np.zeros((osmFISH_data.shape[0],len(genes_to_impute))),columns=genes_to_impute)
 81 | 
 82 | from principal_vectors import PVComputation
 83 | 
 84 | n_factors = 30
 85 | n_pv = 30
 86 | dim_reduction = 'pca'
 87 | dim_reduction_target = 'pca'
 88 | 
 89 | pv_FISH_RNA = PVComputation(
 90 |         n_factors = n_factors,
 91 |         n_pv = n_pv,
 92 |         dim_reduction = dim_reduction,
 93 |         dim_reduction_target = dim_reduction_target
 94 | )
 95 | 
 96 | pv_FISH_RNA.fit(Common_data,osmFISH_data_scaled[Common_data.columns])
 97 | 
 98 | S = pv_FISH_RNA.source_components_.T
 99 |     
100 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
101 | S = S[:,0:Effective_n_pv]
102 | 
103 | Common_data_t = Common_data.dot(S)
104 | FISH_exp_t = osmFISH_data_scaled[Common_data.columns].dot(S)
105 |     
106 | nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',
107 |                         metric = 'cosine').fit(Common_data_t)
108 | distances, indices = nbrs.kneighbors(FISH_exp_t)
109 | 
110 | for j in range(0,osmFISH_data.shape[0]):
111 | 
112 |     weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
113 |     weights = weights/(len(weights)-1)
114 |     Imp_New_Genes.iloc[j,:] = np.dot(weights,RNA_data[genes_to_impute].iloc[indices[j,:][distances[j,:] < 1]])
115 | 
116 | Imp_New_Genes.to_csv('Results/SpaGE_New_genes.csv')
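
The neighbor weighting used in both loops above can be sketched in isolation. This is a minimal illustration on toy numbers, not part of the benchmark: the helper name `knn_weights` and the explicit guard for degenerate cases are assumptions introduced here (the leave-one-out loop instead replaces the resulting NaN values with 0).

```python
import numpy as np

def knn_weights(distances):
    """Return (mask, weights) for one spatial cell's neighbor distances."""
    d = np.asarray(distances, dtype=float)
    keep = d < 1                          # same filter as above: cosine distance below 1
    kept = d[keep]
    if kept.size < 2 or kept.sum() == 0:
        # Assumption: degenerate cases get zero weights, mirroring the
        # NaN -> 0 replacement in the leave-one-out loop.
        return keep, np.zeros(kept.size)
    weights = 1 - kept / kept.sum()       # closer neighbors receive larger weights
    weights = weights / (kept.size - 1)   # after this step the weights sum to 1
    return keep, weights

# Toy example: four neighbors, the last one is too far away and is dropped.
distances = np.array([0.1, 0.3, 0.6, 1.2])
reference_expression = np.array([2.0, 4.0, 10.0, 100.0])
keep, weights = knn_weights(distances)
imputed = float(np.dot(weights, reference_expression[keep]))
```

Dividing by `len(weights) - 1` is what normalizes the weights: the terms `1 - d_k / sum(d)` add up to `n - 1` over `n` retained neighbors.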


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/SpaGE/Linnarson_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_Ziesel/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | import loompy
 9 | 
10 | ## Read RNA seq data
11 | RNA_exp = pd.read_csv('data/Zeisel/expression_mRNA_17-Aug-2014.txt',header=None,sep='\t')
12 | 
13 | RNA_data = RNA_exp.iloc[11:19984,:]
14 | RNA_data = RNA_data.drop([1],axis=1)
15 | RNA_data.set_index(0,inplace=True)
16 | RNA_data.columns = range(0,RNA_data.shape[1])
17 | RNA_data = RNA_data.astype('float')
18 | 
19 | RNA_meta = RNA_exp.iloc[0:10,:]
20 | RNA_meta = RNA_meta.drop([0],axis=1)
21 | RNA_meta.set_index(1,inplace=True)
22 | RNA_meta.columns = range(0,RNA_meta.shape[1])
23 | RNA_meta = RNA_meta.T
24 | del RNA_exp
25 | 
26 | RNA_data = RNA_data.loc[:,RNA_meta['tissue']=='sscortex']
27 | RNA_meta = RNA_meta.loc[RNA_meta['tissue']=='sscortex',:]
28 | 
29 | RNA_data.columns = range(0,RNA_data.shape[1])
30 | RNA_meta.index = range(0,RNA_meta.shape[0])
31 | 
32 | # filter lowly expressed genes
33 | Genes_count = np.sum(RNA_data > 0, axis=1)
34 | RNA_data = RNA_data.loc[Genes_count >=10,:]
35 | del Genes_count
36 | 
37 | def Log_Norm_cpm(x):
38 |     return np.log(((x/np.sum(x))*1000000) + 1)
39 | 
40 | RNA_data = RNA_data.apply(Log_Norm_cpm,axis=0)
41 | RNA_data_scaled = pd.DataFrame(data=st.zscore(RNA_data.T),index = RNA_data.columns,columns=RNA_data.index)
42 | 
43 | datadict = dict()
44 | datadict['RNA_data'] = RNA_data.T
45 | datadict['RNA_data_scaled'] = RNA_data_scaled
46 | datadict['RNA_meta'] = RNA_meta
47 | 
48 | with open('data/SpaGE_pkl/Ziesel.pkl','wb') as f:
49 |     pickle.dump(datadict, f)
50 | 
51 | ## Read loom data
52 | ds = loompy.connect('data/osmFISH/osmFISH_SScortex_mouse_all_cells.loom')
53 | 
54 | FISH_Genes = ds.ra['Gene']
55 |     
56 | colAtr = ds.ca.keys()
57 | 
58 | df = pd.DataFrame()
59 | for i in colAtr:
60 |     df[i] = ds.ca[i]
61 | 
62 | osmFISH_meta = df.iloc[np.where(df.Valid == 1)[0], :]
63 | osmFISH_data = ds[:,:]
64 | osmFISH_data = osmFISH_data[:,np.where(df.Valid == 1)[0]]
65 | osmFISH_data = pd.DataFrame(data= osmFISH_data, index= FISH_Genes)
66 | 
67 | del ds, colAtr, i, df, FISH_Genes
68 | 
69 | Cortex_Regions = ['Layer 2-3 lateral', 'Layer 2-3 medial', 'Layer 3-4', 
70 |                   'Layer 4','Layer 5', 'Layer 6', 'Pia Layer 1']
71 | Cortical = np.array([i in Cortex_Regions for i in osmFISH_meta.Region])
72 | 
73 | osmFISH_meta = osmFISH_meta.iloc[Cortical,:]
74 | osmFISH_data = osmFISH_data.iloc[:,Cortical]
75 | 
76 | cell_count = np.sum(osmFISH_data,axis=0)
77 | def Log_Norm(x):
78 |     return np.log(((x/np.sum(x))*np.median(cell_count)) + 1)
79 | 
80 | osmFISH_data = osmFISH_data.apply(Log_Norm,axis=0)
81 | osmFISH_data_scaled = pd.DataFrame(data=st.zscore(osmFISH_data.T),index = osmFISH_data.columns,columns=osmFISH_data.index)
82 | 
83 | datadict = dict()
84 | datadict['osmFISH_data'] = osmFISH_data.T
85 | datadict['osmFISH_data_scaled'] = osmFISH_data_scaled
86 | datadict['osmFISH_meta'] = osmFISH_meta
87 | 
88 | with open('data/SpaGE_pkl/osmFISH_Cortex.pkl','wb') as f:
89 |     pickle.dump(datadict, f)
90 | 
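
The normalization and scaling steps of this script can be checked on a toy matrix. The sketch below assumes a small genes-by-cells count table with made-up values; `log_norm_cpm` simply restates `Log_Norm_cpm` from the script, and the gene and cell names are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

def log_norm_cpm(x):
    # Counts per million within one cell, followed by log(1 + x).
    return np.log((x / np.sum(x)) * 1_000_000 + 1)

# Toy counts: rows are genes, columns are cells (same layout as RNA_data).
counts = pd.DataFrame(
    {"cell_a": [1.0, 3.0, 6.0], "cell_b": [5.0, 5.0, 0.0], "cell_c": [2.0, 0.0, 2.0]},
    index=["g1", "g2", "g3"],
)

normalized = counts.apply(log_norm_cpm, axis=0)

# z-score each gene across cells; the result is cells-by-genes, like RNA_data_scaled.
scaled = pd.DataFrame(
    st.zscore(normalized.T.to_numpy()),
    index=normalized.columns,
    columns=normalized.index,
)
```

Undoing the log with `np.expm1` recovers one million counts per cell, which is a quick sanity check that the normalization ran per cell rather than per gene.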


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/SpaGE/Precise_output.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('osmFISH_Ziesel/')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | import matplotlib
 8 | matplotlib.rcParams['pdf.fonttype'] = 42
 9 | matplotlib.rcParams['ps.fonttype'] = 42
10 | import matplotlib.pyplot as plt
11 | import seaborn as sns
12 | import sys
13 | sys.path.insert(1,'SpaGE/')
14 | from principal_vectors import PVComputation
15 | 
16 | with open ('data/SpaGE_pkl/osmFISH_Cortex.pkl', 'rb') as f:
17 |     datadict = pickle.load(f)
18 | 
19 | osmFISH_data = datadict['osmFISH_data']
20 | osmFISH_data_scaled = datadict['osmFISH_data_scaled']
21 | osmFISH_meta= datadict['osmFISH_meta']
22 | del datadict
23 | 
24 | with open ('data/SpaGE_pkl/Ziesel.pkl', 'rb') as f:
25 |     datadict = pickle.load(f)
26 | 
27 | RNA_data = datadict['RNA_data']
28 | RNA_data_scaled = datadict['RNA_data_scaled']
29 | RNA_meta = datadict['RNA_meta']
30 | del datadict
31 | 
32 | Common_data = RNA_data_scaled[np.intersect1d(osmFISH_data_scaled.columns,RNA_data_scaled.columns)]
33 | 
34 | n_factors = 30
35 | n_pv = 30
36 | n_pv_display = 30
37 | dim_reduction = 'pca'
38 | dim_reduction_target = 'pca'
39 | 
40 | pv_FISH_RNA = PVComputation(
41 |         n_factors = n_factors,
42 |         n_pv = n_pv,
43 |         dim_reduction = dim_reduction,
44 |         dim_reduction_target = dim_reduction_target
45 | )
46 | 
47 | pv_FISH_RNA.fit(Common_data,osmFISH_data_scaled[Common_data.columns])
48 | 
49 | fig = plt.figure()
50 | sns.heatmap(pv_FISH_RNA.initial_cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
51 |             center=0, vmax=1., vmin=0)
52 | plt.xlabel('osmFISH',fontsize=18, color='black')
53 | plt.ylabel('Ziesel_RNA',fontsize=18, color='black')
54 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
55 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
56 | plt.gca().set_ylim([n_pv_display,0])
57 | plt.show()
58 | 
59 | plt.figure()
60 | sns.heatmap(pv_FISH_RNA.cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
61 |             center=0, vmax=1., vmin=0)
62 | for i in range(n_pv_display-1):
63 |     plt.text(i+1,i+.7,'%1.2f'%pv_FISH_RNA.cosine_similarity_matrix_[i,i], fontsize=14,color='black')
64 |     
65 | plt.xlabel('osmFISH',fontsize=18, color='black')
66 | plt.ylabel('Ziesel_RNA',fontsize=18, color='black')
67 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
68 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
69 | plt.gca().set_ylim([n_pv_display,0])
70 | plt.show()
71 | 
72 | Importance = pd.Series(np.sum(pv_FISH_RNA.source_components_**2,axis=0),index=Common_data.columns)
73 | Importance.sort_values(ascending=False,inplace=True)
74 | Importance.index[0:30]
75 | 
76 | ### Technology specific Processes
77 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
78 | 
79 | # explained variance RNA
80 | np.sum(pv_FISH_RNA.source_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
81 | # explained variance spatial
82 | np.sum(pv_FISH_RNA.target_explained_variance_ratio_[np.arange(Effective_n_pv)])*100
83 | 
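
The script above relies on `cosine_similarity_matrix_` and the 0.3 cutoff without showing where those similarities come from. The sketch below is one way to obtain principal vectors, assuming they are computed from an SVD of the product of the two PCA loading matrices; it uses synthetic data and is not necessarily what `PVComputation` does internally. All variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_genes, n_factors = 12, 5

# Synthetic source and target datasets sharing a 3-dimensional latent structure.
loadings = rng.normal(size=(3, n_genes))
source = rng.normal(size=(200, 3)) @ loadings + 0.1 * rng.normal(size=(200, n_genes))
target = rng.normal(size=(150, 3)) @ loadings + 0.1 * rng.normal(size=(150, n_genes))

# Domain-specific factors: rows of components_ are orthonormal.
Ps = PCA(n_components=n_factors).fit(source).components_
Pt = PCA(n_components=n_factors).fit(target).components_

# Cosine similarities between the factors, then align them with an SVD.
initial_similarity = Ps @ Pt.T
U, sigma, Vt = np.linalg.svd(initial_similarity)
source_pvs = U.T @ Ps
target_pvs = Vt @ Pt

# After alignment the similarity matrix is diagonal, with sigma on the diagonal.
similarity = source_pvs @ target_pvs.T
effective_n_pv = int(np.sum(np.diag(similarity) > 0.3))
```

Because both loading matrices have orthonormal rows, the singular values lie between 0 and 1 and can be read as cosines of the angles between matched directions, which is what makes a fixed threshold such as 0.3 meaningful.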


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/SpaGE/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and can be called directly
 5 | by string name from the main method. All the methods
 6 | use the scikit-learn implementation.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Default linear dimensionality reduction method. For each method, return a
25 |     BaseEstimator instance corresponding to the method given as input.
26 | 	Attributes
27 |     -------
28 |     method: str, default to 'pca'
29 |     	Method used for dimensionality reduction.
30 |     	Implemented: 'pca', 'ica', 'fa' (Factor Analysis), 
31 |     	'nmf' (Non-negative matrix factorisation), 'sparsepca' (Sparse PCA).
32 |     
33 |     n_dim: int, default to 10
34 |     	Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Classifier, i.e. BaseEstimator instance
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Implement PLS to make it compliant with the other dimensionality
68 |     reduction methodology.
69 |     (Simple class rewriting).
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------
/benchmark/osmFISH_Ziesel/gimVI/gimVI.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('osmFISH_Ziesel/')
  3 | 
  4 | from scvi.dataset import CsvDataset
  5 | from scvi.models import JVAE, Classifier
  6 | from scvi.inference import JVAETrainer
  7 | import numpy as np
  8 | import pandas as pd
  9 | import copy
 10 | import torch
 11 | import time as tm
 12 | 
 13 | ### osmFISH data
 14 | osmFISH_data = CsvDataset('data/gimVI_data/osmFISH_Cortex_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 15 | 
 16 | ### RNA
 17 | RNA_data = CsvDataset('data/gimVI_data/RNA_Cortex_scvi.csv', save_path = "", sep = ",", gene_by_cell = False)
 18 | 
 19 | ### Leave-one-out validation
 20 | Common_data = copy.deepcopy(RNA_data)
 21 | Common_data.gene_names = osmFISH_data.gene_names
 22 | Common_data.X = Common_data.X[:,np.reshape(np.vstack([np.argwhere(i==RNA_data.gene_names) for i in osmFISH_data.gene_names]),-1)]
 23 | Gene_set = np.intersect1d(osmFISH_data.gene_names,Common_data.gene_names)
 24 | Imp_Genes = pd.DataFrame(columns=Gene_set)
 25 | gimVI_time = []
 26 | 
 27 | for i in Gene_set:
 28 |     print(i)
 29 |     # Create copy of the fish dataset with hidden genes
 30 |     data_spatial_partial = copy.deepcopy(osmFISH_data)
 31 |     data_spatial_partial.filter_genes_by_attribute(np.setdiff1d(osmFISH_data.gene_names,i))
 32 |     data_spatial_partial.batch_indices += Common_data.n_batches
 33 |     
 34 |     if(data_spatial_partial.X.shape[0] != osmFISH_data.X.shape[0]):
 35 |         continue
 36 |     
 37 |     datasets = [Common_data, data_spatial_partial]
 38 |     generative_distributions = ["zinb", "nb"]
 39 |     gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(i==Common_data.gene_names) for i in data_spatial_partial.gene_names]),-1)]
 40 |     n_inputs = [d.nb_genes for d in datasets]
 41 |     total_genes = Common_data.nb_genes
 42 |     n_batches = sum([d.n_batches for d in datasets])
 43 |     
 44 |     model_library_size = [True, False]
 45 |     
 46 |     n_latent = 8
 47 |     kappa = 5
 48 | 
 49 |     start = tm.time()
 50 |     torch.manual_seed(0)
 51 |     
 52 |     model = JVAE(
 53 |         n_inputs,
 54 |         total_genes,
 55 |         gene_mappings,
 56 |         generative_distributions,
 57 |         model_library_size,
 58 |         n_layers_decoder_individual=0,
 59 |         n_layers_decoder_shared=0,
 60 |         n_layers_encoder_individual=1,
 61 |         n_layers_encoder_shared=1,
 62 |         dim_hidden_encoder=64,
 63 |         dim_hidden_decoder_shared=64,
 64 |         dropout_rate_encoder=0.2,
 65 |         dropout_rate_decoder=0.2,
 66 |         n_batch=n_batches,
 67 |         n_latent=n_latent,
 68 |     )
 69 |     discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
 70 |     
 71 |     trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
 72 |     trainer.train(n_epochs=200)
 73 |     _,Imputed = trainer.get_imputed_values(normalized=True)
 74 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==Common_data.gene_names)[0]],-1)
 75 |     Imp_Genes[i] = Imputed
 76 |     gimVI_time.append(tm.time()-start)
 77 |     
 78 | Imp_Genes = Imp_Genes.fillna(0)    
 79 | Imp_Genes.to_csv('Results/gimVI_LeaveOneOut.csv')
 80 | gimVI_time = pd.DataFrame(gimVI_time)
 81 | gimVI_time.to_csv('Results/gimVI_Time.csv', index = False)
 82 | 
 83 | ### New genes
 84 | Imp_New_Genes = pd.DataFrame(columns=["TESC","PVRL3","GRM2"])
 85 | 
 86 | # Create copy of the fish dataset (all genes kept, none hidden)
 87 | data_spatial_partial = copy.deepcopy(osmFISH_data)
 88 | data_spatial_partial.filter_genes_by_attribute(osmFISH_data.gene_names)
 89 | data_spatial_partial.batch_indices += RNA_data.n_batches
 90 | 
 91 | datasets = [RNA_data, data_spatial_partial]
 92 | generative_distributions = ["zinb", "nb"]
 93 | gene_mappings = [slice(None), np.reshape(np.vstack([np.argwhere(i==RNA_data.gene_names) for i in data_spatial_partial.gene_names]),-1)]
 94 | n_inputs = [d.nb_genes for d in datasets]
 95 | total_genes = RNA_data.nb_genes
 96 | n_batches = sum([d.n_batches for d in datasets])
 97 | 
 98 | model_library_size = [True, False]
 99 | 
100 | n_latent = 8
101 | kappa = 5
102 | 
103 | torch.manual_seed(0)
104 | 
105 | model = JVAE(
106 |     n_inputs,
107 |     total_genes,
108 |     gene_mappings,
109 |     generative_distributions,
110 |     model_library_size,
111 |     n_layers_decoder_individual=0,
112 |     n_layers_decoder_shared=0,
113 |     n_layers_encoder_individual=1,
114 |     n_layers_encoder_shared=1,
115 |     dim_hidden_encoder=64,
116 |     dim_hidden_decoder_shared=64,
117 |     dropout_rate_encoder=0.2,
118 |     dropout_rate_decoder=0.2,
119 |     n_batch=n_batches,
120 |     n_latent=n_latent,
121 | )
122 | discriminator = Classifier(n_latent, 32, 2, 3, logits=True)
123 | 
124 | trainer = JVAETrainer(model, discriminator, datasets, 0.95, frequency=1, kappa=kappa)
125 | trainer.train(n_epochs=200)
126 |     
127 | for i in ["TESC","PVRL3","GRM2"]:
128 |     _,Imputed = trainer.get_imputed_values(normalized=True)
129 |     Imputed = np.reshape(Imputed[:,np.argwhere(i==RNA_data.gene_names)[0]],-1)
130 |     Imp_New_Genes[i] = Imputed
131 |     
132 | Imp_New_Genes.to_csv('Results/gimVI_New_genes.csv')
133 | 


--------------------------------------------------------------------------------
/benchmark/seqFISH_AllenVISp/DownSampling/DownSampling.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('seqFISH_AllenVISp/')
  3 | 
  4 | import pickle
  5 | import numpy as np
  6 | import pandas as pd
  7 | from sklearn.neighbors import NearestNeighbors
  8 | import scipy.stats as st
  9 | import sys
 10 | sys.path.insert(1,'Scripts/SpaGE/')
 11 | from principal_vectors import PVComputation
 12 | 
 13 | with open ('data/SpaGE_pkl/seqFISH_Cortex.pkl', 'rb') as f:
 14 |     datadict = pickle.load(f)
 15 | 
 16 | seqFISH_data = datadict['seqFISH_data']
 17 | seqFISH_data_scaled = datadict['seqFISH_data_scaled']
 18 | seqFISH_meta= datadict['seqFISH_meta']
 19 | del datadict
 20 | 
 21 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
 22 |     datadict = pickle.load(f)
 23 | 
 24 | RNA_data = datadict['RNA_data']
 25 | RNA_data_scaled = datadict['RNA_data_scaled']
 26 | del datadict
 27 | 
 28 | Gene_Order = np.intersect1d(seqFISH_data.columns,RNA_data.columns)
 29 | 
 30 | ### SpaGE
 31 | SpaGE_imputed = pd.read_csv('Results/SpaGE_LeaveOneOut.csv',header=0,index_col=0,sep=',')
 32 | #SpaGE_New = pd.read_csv('Results/SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 33 | 
 34 | SpaGE_imputed = SpaGE_imputed.loc[:,Gene_Order]
 35 | 
 36 | SpaGE_seqCorr = pd.Series(index = Gene_Order, dtype = float)
 37 | for i in Gene_Order:
 38 |     SpaGE_seqCorr[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed[i])[0]
 39 | SpaGE_seqCorr[np.isnan(SpaGE_seqCorr)] = 0
 40 | 
 41 | SpaGE_seqCorr.sort_values(ascending=False,inplace=True)
 42 | test_set = SpaGE_seqCorr.index[0:50]
 43 | 
 44 | Common_data = RNA_data[Gene_Order]
 45 | Common_data = Common_data.drop(columns=test_set)
 46 | 
 47 | Corr = np.corrcoef(Common_data.T)
 48 | #for i in range(0,Corr.shape[0]):
 49 | #    Corr[i,i]=0
 50 |     
 51 | #plt.hist(np.abs(np.reshape(Corr,-1)),bins=np.arange(0,1.05,0.05))
 52 | #plt.show()   
 53 | # correlation threshold: 0.7
 54 | removed_genes = []
 55 | for i in range(0,Corr.shape[0]):
 56 |     for j in range(i+1,Corr.shape[0]):
 57 |         if(np.abs(Corr[i,j]) > 0.7):
 58 |             Vi = np.var(Common_data.iloc[:,i])
 59 |             Vj = np.var(Common_data.iloc[:,j])
 60 |             if(Vi > Vj):
 61 |                 removed_genes.append(Common_data.columns[j])
 62 |             else:
 63 |                 removed_genes.append(Common_data.columns[i])
 64 | removed_genes= np.unique(removed_genes)
 65 | 
 66 | Common_data = Common_data.drop(columns=removed_genes)
 67 | Variance = np.var(Common_data)
 68 | Variance.sort_values(ascending=False,inplace=True)
 69 | Variance = pd.concat([Variance, pd.Series(0,index=removed_genes)])
 70 | 
 71 | #### Novel Genes Expression Patterns ####
 72 | genes_to_impute = test_set
 73 | for i in [10,30,50,100,200,500,1000,2000,5000,7000,len(Variance)]:
 74 |     print(i)
 75 |     Imp_New_Genes = pd.DataFrame(np.zeros((seqFISH_data.shape[0],len(genes_to_impute))),columns=genes_to_impute)
 76 |     
 77 |     if(i>=50):
 78 |         n_factors = 50
 79 |         n_pv = 50
 80 |     else:
 81 |         n_factors = i
 82 |         n_pv = i
 83 |     
 84 |     dim_reduction = 'pca'
 85 |     dim_reduction_target = 'pca'
 86 |     
 87 |     pv_FISH_RNA = PVComputation(
 88 |             n_factors = n_factors,
 89 |             n_pv = n_pv,
 90 |             dim_reduction = dim_reduction,
 91 |             dim_reduction_target = dim_reduction_target
 92 |     )
 93 |     
 94 |     source_data = RNA_data_scaled[Variance.index[0:i]]
 95 |     target_data = seqFISH_data_scaled[Variance.index[0:i]]
 96 |     
 97 |     pv_FISH_RNA.fit(source_data,target_data)
 98 |     
 99 |     S = pv_FISH_RNA.source_components_.T
100 |     
101 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
102 |     S = S[:,0:Effective_n_pv]
103 |     
104 |     RNA_data_t = source_data.dot(S)
105 |     FISH_exp_t = target_data.dot(S)
106 |         
107 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',
108 |                             metric = 'cosine').fit(RNA_data_t)
109 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
110 |      
111 |     for j in range(0,seqFISH_data.shape[0]):
112 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
113 |         weights = weights/(len(weights)-1)
114 |         Imp_New_Genes.iloc[j,:] = np.dot(weights,RNA_data[genes_to_impute].iloc[indices[j,:][distances[j,:] < 1]])
115 |         
116 |     Imp_New_Genes.to_csv('Results/' + str(i) +'SpaGE_New_genes.csv')
117 | 
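
The pruning of highly correlated genes in this script can be sketched as a standalone function. The example assumes a small cells-by-genes table with invented values; the function name `prune_correlated` is hypothetical, while the 0.7 threshold and the rule of dropping the lower-variance gene of each pair follow the loop above.

```python
import numpy as np
import pandas as pd

def prune_correlated(data, threshold=0.7):
    """Return the genes to drop: the lower-variance member of each correlated pair."""
    corr = np.corrcoef(data.to_numpy().T)
    variances = data.var(axis=0, ddof=0).to_numpy()
    removed = set()
    n_genes = corr.shape[0]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if abs(corr[i, j]) > threshold:
                # Keep the gene with the larger variance, as in the script.
                loser = j if variances[i] > variances[j] else i
                removed.add(data.columns[loser])
    return sorted(removed)

# Toy data: 'b' is an exact multiple of 'a', 'c' is only weakly related to both.
expression = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
})
to_remove = prune_correlated(expression)
pruned = expression.drop(columns=to_remove)
```

As in the original loop, a gene that has already lost one comparison is still compared against the remaining genes, so the result does not depend on a greedy removal order.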


--------------------------------------------------------------------------------
/benchmark/seqFISH_AllenVISp/DownSampling/DownSampling_evaluation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('seqFISH_AllenVISp/')
  3 | 
  4 | import numpy as np
  5 | import pandas as pd
  6 | import matplotlib
  7 | matplotlib.use('qt5agg')
  8 | matplotlib.rcParams['pdf.fonttype'] = 42
  9 | matplotlib.rcParams['ps.fonttype'] = 42
 10 | import matplotlib.pyplot as plt
 11 | import scipy.stats as st
 12 | import pickle
 13 | 
 14 | ### Original data
 15 | with open ('data/SpaGE_pkl/seqFISH_Cortex.pkl', 'rb') as f:
 16 |     datadict = pickle.load(f)
 17 | 
 18 | seqFISH_data = datadict['seqFISH_data']
 19 | del datadict
 20 | 
 21 | test_set = pd.read_csv('Results/10SpaGE_New_genes.csv',header=0,index_col=0,sep=',').columns
 22 | seqFISH_data = seqFISH_data[test_set]
 23 | 
 24 | ### SpaGE
 25 | #10
 26 | SpaGE_imputed_10 = pd.read_csv('Results/10SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 27 | 
 28 | SpaGE_Corr_10 = pd.Series(index = test_set)
 29 | for i in test_set:
 30 |     SpaGE_Corr_10[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_10[i])[0]
 31 |   
 32 | #30
 33 | SpaGE_imputed_30 = pd.read_csv('Results/30SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 34 | 
 35 | SpaGE_Corr_30 = pd.Series(index = test_set)
 36 | for i in test_set:
 37 |     SpaGE_Corr_30[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_30[i])[0]
 38 | 
 39 | #50    
 40 | SpaGE_imputed_50 = pd.read_csv('Results/50SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 41 | 
 42 | SpaGE_Corr_50 = pd.Series(index = test_set)
 43 | for i in test_set:
 44 |     SpaGE_Corr_50[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_50[i])[0]
 45 | 
 46 | #100    
 47 | SpaGE_imputed_100 = pd.read_csv('Results/100SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 48 | 
 49 | SpaGE_Corr_100 = pd.Series(index = test_set)
 50 | for i in test_set:
 51 |     SpaGE_Corr_100[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_100[i])[0]
 52 | 
 53 | #200    
 54 | SpaGE_imputed_200 = pd.read_csv('Results/200SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 55 | 
 56 | SpaGE_Corr_200 = pd.Series(index = test_set)
 57 | for i in test_set:
 58 |     SpaGE_Corr_200[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_200[i])[0]
 59 | 
 60 | #500    
 61 | SpaGE_imputed_500 = pd.read_csv('Results/500SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 62 | 
 63 | SpaGE_Corr_500 = pd.Series(index = test_set)
 64 | for i in test_set:
 65 |     SpaGE_Corr_500[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_500[i])[0]
 66 | 
 67 | #1000    
 68 | SpaGE_imputed_1000 = pd.read_csv('Results/1000SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 69 | 
 70 | SpaGE_Corr_1000 = pd.Series(index = test_set)
 71 | for i in test_set:
 72 |     SpaGE_Corr_1000[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_1000[i])[0]
 73 | 
 74 | #2000    
 75 | SpaGE_imputed_2000 = pd.read_csv('Results/2000SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 76 | 
 77 | SpaGE_Corr_2000 = pd.Series(index = test_set)
 78 | for i in test_set:
 79 |     SpaGE_Corr_2000[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_2000[i])[0]
 80 | 
 81 | #5000    
 82 | SpaGE_imputed_5000 = pd.read_csv('Results/5000SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 83 | 
 84 | SpaGE_Corr_5000 = pd.Series(index = test_set)
 85 | for i in test_set:
 86 |     SpaGE_Corr_5000[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_5000[i])[0]
 87 | 
 88 | #7000    
 89 | SpaGE_imputed_7000 = pd.read_csv('Results/7000SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 90 | 
 91 | SpaGE_Corr_7000 = pd.Series(index = test_set)
 92 | for i in test_set:
 93 |     SpaGE_Corr_7000[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_7000[i])[0]
 94 | 
 95 | #9701   
 96 | SpaGE_imputed_9701 = pd.read_csv('Results/9701SpaGE_New_genes.csv',header=0,index_col=0,sep=',')
 97 | 
 98 | SpaGE_Corr_9701 = pd.Series(index = test_set)
 99 | for i in test_set:
100 |     SpaGE_Corr_9701[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed_9701[i])[0]
101 | 
102 | ### Comparison plots
103 | plt.style.use('ggplot')
104 | plt.figure(figsize=(9, 3))
105 | plt.boxplot([SpaGE_Corr_10, SpaGE_Corr_30, SpaGE_Corr_50,
106 |              SpaGE_Corr_100,SpaGE_Corr_200,SpaGE_Corr_500,
107 |              SpaGE_Corr_1000,SpaGE_Corr_2000,SpaGE_Corr_5000,
108 |              SpaGE_Corr_7000,SpaGE_Corr_9701])
109 | 
110 | y = SpaGE_Corr_10
111 | x = np.random.normal(1, 0.05, len(y))
112 | plt.plot(x, y, 'g.', alpha=0.2)
113 | y = SpaGE_Corr_30
114 | x = np.random.normal(2, 0.05, len(y))
115 | plt.plot(x, y, 'g.', alpha=0.2)
116 | y = SpaGE_Corr_50
117 | x = np.random.normal(3, 0.05, len(y))
118 | plt.plot(x, y, 'g.', alpha=0.2)
119 | y = SpaGE_Corr_100
120 | x = np.random.normal(4, 0.05, len(y))
121 | plt.plot(x, y, 'g.', alpha=0.2)
122 | y = SpaGE_Corr_200
123 | x = np.random.normal(5, 0.05, len(y))
124 | plt.plot(x, y, 'g.', alpha=0.2)
125 | y = SpaGE_Corr_500
126 | x = np.random.normal(6, 0.05, len(y))
127 | plt.plot(x, y, 'g.', alpha=0.2)
128 | y = SpaGE_Corr_1000
129 | x = np.random.normal(7, 0.05, len(y))
130 | plt.plot(x, y, 'g.', alpha=0.2)
131 | y = SpaGE_Corr_2000
132 | x = np.random.normal(8, 0.05, len(y))
133 | plt.plot(x, y, 'g.', alpha=0.2)
134 | y = SpaGE_Corr_5000
135 | x = np.random.normal(9, 0.05, len(y))
136 | plt.plot(x, y, 'g.', alpha=0.2)
137 | y = SpaGE_Corr_7000
138 | x = np.random.normal(10, 0.05, len(y))
139 | plt.plot(x, y, 'g.', alpha=0.2)
140 | y = SpaGE_Corr_9701
141 | x = np.random.normal(11, 0.05, len(y))
142 | plt.plot(x, y, 'g.', alpha=0.2)
143 | 
144 | 
145 | plt.xticks((1,2,3,4,5,6,7,8,9,10,11),
146 |            ('10','30','50','100','200','500','1000','2000','5000','7000','9701'),size=10)
147 | plt.yticks(size=8)
148 | plt.xlabel('Number of shared genes',size=12)
149 | plt.gca().set_ylim([-0.3,0.8])
150 | plt.ylabel('Spearman Correlation',size=12)
151 | plt.show()
152 | 
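
The script above repeats the same per-gene Spearman computation once per shared-gene count. The following is a minimal sketch of how that step could be factored into a loop, and of what a Spearman correlation computes (the Pearson correlation of the rank-transformed values). The helper names `average_ranks`, `spearman` and `correlations_by_size`, the dict-based inputs and the toy values are assumptions made for illustration; the script itself uses `scipy.stats.spearmanr` on pandas columns.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float('nan')  # undefined when one vector is constant
    return cov / (var_x * var_y) ** 0.5


def correlations_by_size(measured, imputed_by_size, genes):
    """Map each shared-gene count to a {gene: Spearman correlation} dict."""
    return {
        size: {g: spearman(measured[g], imputed[g]) for g in genes}
        for size, imputed in imputed_by_size.items()
    }


# Toy data (hypothetical values, not taken from the benchmark).
measured = {'GeneA': [0.0, 1.0, 2.0, 3.0], 'GeneB': [5.0, 1.0, 4.0, 2.0]}
imputed_by_size = {
    10: {'GeneA': [0.1, 0.4, 0.3, 0.9], 'GeneB': [2.0, 2.5, 3.0, 3.5]},
    30: {'GeneA': [0.1, 0.2, 0.3, 0.9], 'GeneB': [3.5, 2.0, 3.0, 2.5]},
}
corr = correlations_by_size(measured, imputed_by_size, ['GeneA', 'GeneB'])
```

With a structure like `corr`, the box plot input becomes one list of values per size, so adding or removing a shared-gene count only changes the dictionary of inputs.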


--------------------------------------------------------------------------------
/benchmark/seqFISH_AllenVISp/Performance_evaluation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | os.chdir('seqFISH_AllenVISp/')
  3 | 
  4 | 
  5 | import numpy as np
  6 | import pandas as pd
  7 | import pickle
  8 | import matplotlib
  9 | matplotlib.use('qt5agg')
 10 | matplotlib.rcParams['pdf.fonttype'] = 42
 11 | matplotlib.rcParams['ps.fonttype'] = 42
 12 | import matplotlib.pyplot as plt
 13 | #from matplotlib import cm
 14 | import scipy.stats as st
 15 | 
 16 | with open ('data/SpaGE_pkl/seqFISH_Cortex.pkl', 'rb') as f:
 17 |     datadict = pickle.load(f)
 18 | 
 19 | seqFISH_data = datadict['seqFISH_data']
 20 | seqFISH_meta= datadict['seqFISH_meta']
 21 | del datadict
 22 | 
 23 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
 24 |     datadict = pickle.load(f)
 25 |     
 26 | RNA_data = datadict['RNA_data']
 27 | del datadict
 28 | 
 29 | Gene_Order = np.intersect1d(seqFISH_data.columns,RNA_data.columns)
 30 | 
 31 | ### SpaGE
 32 | SpaGE_imputed = pd.read_csv('Results/SpaGE_LeaveOneOut.csv',header=0,index_col=0,sep=',')
 33 | 
 34 | SpaGE_imputed = SpaGE_imputed.loc[:,Gene_Order]
 35 | 
 36 | SpaGE_seqCorr = pd.Series(index = Gene_Order, dtype=float)
 37 | for i in Gene_Order:
 38 |     SpaGE_seqCorr[i] = st.spearmanr(seqFISH_data[i],SpaGE_imputed[i])[0]
 39 | SpaGE_seqCorr[np.isnan(SpaGE_seqCorr)] = 0
 40 | 
 41 | os.chdir('../STARmap_AllenVISp/')  # assumes the dataset folders are siblings
 42 | 
 43 | with open ('data/SpaGE_pkl/Starmap.pkl', 'rb') as f:
 44 |     datadict = pickle.load(f)
 45 | 
 46 | coords = datadict['coords']
 47 | Starmap_data = datadict['Starmap_data']
 48 | del datadict
 49 | 
 50 | Gene_Order = np.intersect1d(Starmap_data.columns,RNA_data.columns)
 51 | 
 52 | ### SpaGE
 53 | SpaGE_imputed = pd.read_csv('Results/SpaGE_LeaveOneOut_cutoff.csv',header=0,index_col=0,sep=',')
 54 | 
 55 | SpaGE_imputed = SpaGE_imputed.loc[:,Gene_Order]
 56 | 
 57 | SpaGE_starCorr = pd.Series(index = Gene_Order, dtype=float)
 58 | for i in Gene_Order:
 59 |     SpaGE_starCorr[i] = st.spearmanr(Starmap_data[i],SpaGE_imputed[i])[0]
 60 | 
 61 | def Compare_Correlations(X,Y):
 62 |     fig, ax = plt.subplots(figsize=(4.5, 4.5))
 63 |     ax.scatter(X, Y, s=1)        
 64 |     ax.axvline(linestyle='--',color='gray')
 65 |     ax.axhline(linestyle='--',color='gray')
 66 |     plt.gca().set_ylim([-0.5,1])
 67 |     lims = [np.min([ax.get_xlim(), ax.get_ylim()]),  
 68 |         np.max([ax.get_xlim(), ax.get_ylim()])]
 69 |     ax.plot(lims, lims, 'k-')
 70 |     ax.set_aspect('equal')
 71 |     ax.set_xlim(lims)
 72 |     ax.set_ylim(lims)
 73 |     plt.xticks(size=8)
 74 |     plt.yticks(size=8)
 75 |     plt.show()
 76 | 
 77 | Starmap_seq_genes = np.intersect1d(Starmap_data.columns,seqFISH_data.columns)
 78 | Compare_Correlations(SpaGE_starCorr[Starmap_seq_genes],SpaGE_seqCorr[Starmap_seq_genes])
 79 | plt.xlabel('Spearman Correlation STARmap',size=12)
 80 | plt.ylabel('Spearman Correlation seqFISH',size=12)
 81 | plt.show()
 82 | 
 83 | fig, ax = plt.subplots(figsize=(3.7, 4.5))
 84 | ax.boxplot([SpaGE_starCorr[Starmap_seq_genes],SpaGE_seqCorr[Starmap_seq_genes]],widths=0.5)
 85 | 
 86 | y = SpaGE_starCorr[Starmap_seq_genes]
 87 | x = np.random.normal(1, 0.05, len(y))
 88 | plt.plot(x, y, 'g.', markersize=1, alpha=0.2)
 89 | y = SpaGE_seqCorr[Starmap_seq_genes]
 90 | x = np.random.normal(2, 0.05, len(y))
 91 | plt.plot(x, y, 'g.', markersize=1, alpha=0.2)
 92 | 
 93 | plt.xticks((1,2),('STARmap','seqFISH'),size=12)
 94 | plt.yticks(size=8)
 95 | plt.gca().set_ylim([-0.4,0.8])
 96 | plt.ylabel('Spearman Correlation',size=12)
 97 | #ax.set_aspect(aspect=3)
 98 | _,p_val = st.wilcoxon(SpaGE_starCorr[Starmap_seq_genes],SpaGE_seqCorr[Starmap_seq_genes])
 99 | plt.text(2,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
100 | plt.show()
101 | 
102 | os.chdir('../osmFISH_AllenVISp/')  # assumes the dataset folders are siblings
103 | 
104 | with open ('data/SpaGE_pkl/osmFISH_Cortex.pkl', 'rb') as f:
105 |     datadict = pickle.load(f)
106 | 
107 | osmFISH_data = datadict['osmFISH_data']
108 | del datadict
109 | 
110 | Gene_Order = osmFISH_data.columns
111 | 
112 | ### SpaGE
113 | SpaGE_imputed = pd.read_csv('Results/SpaGE_LeaveOneOut_cutoff.csv',header=0,index_col=0,sep=',')
114 | 
115 | SpaGE_imputed = SpaGE_imputed.loc[:,Gene_Order]
116 | 
117 | SpaGE_osmCorr = pd.Series(index = Gene_Order, dtype=float)
118 | for i in Gene_Order:
119 |     SpaGE_osmCorr[i] = st.spearmanr(osmFISH_data[i],SpaGE_imputed[i])[0]
120 | 
121 | def Compare_Correlations(X,Y):
122 |     fig, ax = plt.subplots(figsize=(4.5, 4.5))
123 |     ax.scatter(X, Y, s=25)        
124 |     ax.axvline(linestyle='--',color='gray')
125 |     ax.axhline(linestyle='--',color='gray')
126 |     plt.gca().set_ylim([-0.5,1])
127 |     lims = [np.min([ax.get_xlim(), ax.get_ylim()]),  
128 |         np.max([ax.get_xlim(), ax.get_ylim()])]
129 |     ax.plot(lims, lims, 'k-')
130 |     ax.set_aspect('equal')
131 |     ax.set_xlim(lims)
132 |     ax.set_ylim(lims)
133 |     plt.xticks(size=8)
134 |     plt.yticks(size=8)
135 |     plt.show()
136 | 
137 | osm_seq_genes = np.intersect1d(osmFISH_data.columns,seqFISH_data.columns)
138 | Compare_Correlations(SpaGE_osmCorr[osm_seq_genes],SpaGE_seqCorr[osm_seq_genes])
139 | plt.xlabel('Spearman Correlation osmFISH',size=12)
140 | plt.ylabel('Spearman Correlation seqFISH',size=12)
141 | plt.show()
142 | 
143 | fig, ax = plt.subplots(figsize=(3.7, 4.5))
144 | ax.boxplot([SpaGE_osmCorr[osm_seq_genes],SpaGE_seqCorr[osm_seq_genes]],widths=0.5)
145 | 
146 | y = SpaGE_osmCorr[osm_seq_genes]
147 | x = np.random.normal(1, 0.05, len(y))
148 | plt.plot(x, y, 'g.', alpha=0.2)
149 | y = SpaGE_seqCorr[osm_seq_genes]
150 | x = np.random.normal(2, 0.05, len(y))
151 | plt.plot(x, y, 'g.', alpha=0.2)
152 | 
153 | plt.xticks((1,2),('osmFISH','seqFISH'),size=12)
154 | plt.yticks(size=8)
155 | plt.gca().set_ylim([-0.5,1])
156 | plt.ylabel('Spearman Correlation',size=12)
157 | #ax.set_aspect(aspect=3)
158 | _,p_val = st.wilcoxon(SpaGE_osmCorr[osm_seq_genes],SpaGE_seqCorr[osm_seq_genes])
159 | plt.text(2,np.max(plt.gca().get_ylim()),'%1.2e'%p_val,color='black',size=8)
160 | plt.show()
161 | 


--------------------------------------------------------------------------------
/benchmark/seqFISH_AllenVISp/SpaGE/Integration.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('seqFISH_AllenVISp/')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | from sklearn.neighbors import NearestNeighbors
 8 | import time as tm
 9 | 
10 | with open ('data/SpaGE_pkl/seqFISH_Cortex.pkl', 'rb') as f:
11 |     datadict = pickle.load(f)
12 | 
13 | seqFISH_data = datadict['seqFISH_data']
14 | seqFISH_data_scaled = datadict['seqFISH_data_scaled']
15 | seqFISH_meta= datadict['seqFISH_meta']
16 | del datadict
17 | 
18 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
19 |     datadict = pickle.load(f)
20 |     
21 | RNA_data = datadict['RNA_data']
22 | RNA_data_scaled = datadict['RNA_data_scaled']
23 | del datadict
24 | 
25 | #### Leave One Out Validation ####
26 | Common_data = RNA_data_scaled[np.intersect1d(seqFISH_data_scaled.columns,RNA_data_scaled.columns)]
27 | Imp_Genes = pd.DataFrame(columns=Common_data.columns)
28 | precise_time = []
29 | knn_time = []
30 | for i in Common_data.columns:
31 |     print(i)
32 |     start = tm.time()
33 |     from principal_vectors import PVComputation
34 | 
35 |     n_factors = 50
36 |     n_pv = 50
37 |     dim_reduction = 'pca'
38 |     dim_reduction_target = 'pca'
39 | 
40 |     pv_FISH_RNA = PVComputation(
41 |             n_factors = n_factors,
42 |             n_pv = n_pv,
43 |             dim_reduction = dim_reduction,
44 |             dim_reduction_target = dim_reduction_target
45 |     )
46 | 
47 |     pv_FISH_RNA.fit(Common_data.drop(i,axis=1),seqFISH_data_scaled[Common_data.columns].drop(i,axis=1))
48 | 
49 |     S = pv_FISH_RNA.source_components_.T
50 |     
51 |     Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
52 |     S = S[:,0:Effective_n_pv]
53 | 
54 |     Common_data_t = Common_data.drop(i,axis=1).dot(S)
55 |     FISH_exp_t = seqFISH_data_scaled[Common_data.columns].drop(i,axis=1).dot(S)
56 |     precise_time.append(tm.time()-start)
57 |     
58 |     start = tm.time()    
59 |     nbrs = NearestNeighbors(n_neighbors=50, algorithm='auto',metric = 'cosine').fit(Common_data_t)
60 |     distances, indices = nbrs.kneighbors(FISH_exp_t)
61 |     
62 |     Imp_Gene = np.zeros(seqFISH_data.shape[0])
63 |     for j in range(0,seqFISH_data.shape[0]):
64 |         weights = 1-(distances[j,:][distances[j,:]<1])/(np.sum(distances[j,:][distances[j,:]<1]))
65 |         weights = weights/(len(weights)-1)
66 |         Imp_Gene[j] = np.sum(np.multiply(RNA_data[i].iloc[indices[j,:][distances[j,:] < 1]],weights))
67 |     Imp_Gene[np.isnan(Imp_Gene)] = 0
68 |     Imp_Genes[i] = Imp_Gene
69 |     knn_time.append(tm.time()-start)
70 | 
71 | Imp_Genes.to_csv('Results/SpaGE_LeaveOneOut.csv')
72 | precise_time = pd.DataFrame(precise_time)
73 | knn_time = pd.DataFrame(knn_time)
74 | precise_time.to_csv('Results/SpaGE_PreciseTime.csv', index = False)
75 | knn_time.to_csv('Results/SpaGE_knnTime.csv', index = False)
76 | 
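
The weighting used in the leave-one-out loop above can be checked on a small example. Neighbours with cosine distance of 1 or more are dropped, each remaining neighbour gets `1 - d / sum(d)`, and dividing by `k - 1` makes the weights sum to 1 because the raw weights sum to `k - 1`. The sketch below mirrors that arithmetic; the function names `knn_weights` and `impute_value` and the toy numbers are assumptions for illustration, and the early return for degenerate cases reproduces the value the script ends up with after its NaN-to-zero step.

```python
import numpy as np


def knn_weights(distances):
    """Return the keep-mask and the normalised weights for one spatial cell."""
    d = np.asarray(distances, dtype=float)
    keep = d < 1                      # discard neighbours with cosine distance >= 1
    kept = d[keep]
    if kept.size < 2 or kept.sum() == 0:
        # Weights are undefined here; the script yields 0 for such cells.
        return keep, np.zeros(kept.size)
    weights = 1 - kept / kept.sum()   # closer neighbours get larger weights
    weights = weights / (kept.size - 1)  # raw weights sum to k - 1
    return keep, weights


def impute_value(distances, neighbor_values):
    """Weighted average of the held-out gene over the retained neighbours."""
    keep, weights = knn_weights(distances)
    values = np.asarray(neighbor_values, dtype=float)[keep]
    return float(np.dot(values, weights))


# Toy example (hypothetical numbers): the fourth neighbour is dropped.
toy_distances = [0.2, 0.3, 0.5, 1.2]
toy_expression = [10.0, 20.0, 30.0, 1000.0]
_, toy_weights = knn_weights(toy_distances)   # 0.4, 0.35, 0.25 up to rounding
toy_imputed = impute_value(toy_distances, toy_expression)
```

Note that the weights are linear in distance rather than an inverse-distance kernel, so the farthest retained neighbour still contributes unless only two neighbours remain and it holds most of the total distance.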


--------------------------------------------------------------------------------
/benchmark/seqFISH_AllenVISp/SpaGE/Precise_output.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('seqFISH_AllenVISp')
 3 | 
 4 | import pickle
 5 | import numpy as np
 6 | import pandas as pd
 7 | import matplotlib
 8 | matplotlib.rcParams['pdf.fonttype'] = 42
 9 | matplotlib.rcParams['ps.fonttype'] = 42
10 | import matplotlib.pyplot as plt
11 | from matplotlib import cm
12 | import seaborn as sns
13 | import sys
14 | sys.path.insert(1,'Scripts/SpaGE/')
15 | from principal_vectors import PVComputation
16 | 
17 | with open ('data/SpaGE_pkl/seqFISH_Cortex.pkl', 'rb') as f:
18 |     datadict = pickle.load(f)
19 | 
20 | seqFISH_data = datadict['seqFISH_data']
21 | seqFISH_data_scaled = datadict['seqFISH_data_scaled']
22 | seqFISH_meta= datadict['seqFISH_meta']
23 | del datadict
24 | 
25 | with open ('data/SpaGE_pkl/Allen_VISp.pkl', 'rb') as f:
26 |     datadict = pickle.load(f)
27 |     
28 | RNA_data = datadict['RNA_data']
29 | RNA_data_scaled = datadict['RNA_data_scaled']
30 | del datadict
31 | 
32 | Common_data = RNA_data_scaled[np.intersect1d(seqFISH_data_scaled.columns,RNA_data_scaled.columns)]
33 | 
34 | n_factors = 50
35 | n_pv = 50
36 | n_pv_display = 50
37 | dim_reduction = 'pca'
38 | dim_reduction_target = 'pca'
39 | 
40 | pv_FISH_RNA = PVComputation(
41 |         n_factors = n_factors,
42 |         n_pv = n_pv,
43 |         dim_reduction = dim_reduction,
44 |         dim_reduction_target = dim_reduction_target
45 | )
46 | 
47 | pv_FISH_RNA.fit(Common_data,seqFISH_data_scaled[Common_data.columns])
48 | 
49 | fig = plt.figure()
50 | sns.heatmap(pv_FISH_RNA.initial_cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
51 |             center=0, vmax=1., vmin=0)
52 | plt.xlabel('seqFISH',fontsize=18, color='black')
53 | plt.ylabel('Allen_VISp',fontsize=18, color='black')
54 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
55 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
56 | plt.gca().set_ylim([n_pv_display,0])
57 | plt.show()
58 | 
59 | plt.figure()
60 | sns.heatmap(pv_FISH_RNA.cosine_similarity_matrix_[:n_pv_display,:n_pv_display], cmap='seismic_r',
61 |             center=0, vmax=1., vmin=0)
62 | for i in range(n_pv_display-1):
63 |     plt.text(i+1,i+.7,'%1.2f'%pv_FISH_RNA.cosine_similarity_matrix_[i,i], fontsize=14,color='black')
64 |     
65 | plt.xlabel('seqFISH',fontsize=18, color='black')
66 | plt.ylabel('Allen_VISp',fontsize=18, color='black')
67 | plt.xticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12)
68 | plt.yticks(np.arange(n_pv_display)+0.5, range(1, n_pv_display+1), fontsize=12, rotation='horizontal')
69 | plt.gca().set_ylim([n_pv_display,0])
70 | plt.show()
71 | 
72 | Importance = pd.Series(np.sum(pv_FISH_RNA.source_components_**2,axis=0),index=Common_data.columns)
73 | Importance.sort_values(ascending=False,inplace=True)
74 | print(Importance.index[0:50])  # 50 genes with the largest summed squared loadings
75 | 
76 | ### Technology specific Processes
77 | Effective_n_pv = sum(np.diag(pv_FISH_RNA.cosine_similarity_matrix_) > 0.3)
78 | 
79 | # explained variance RNA (percent)
80 | print(np.sum(pv_FISH_RNA.source_explained_variance_ratio_[np.arange(Effective_n_pv)])*100)
81 | # explained variance spatial (percent)
82 | print(np.sum(pv_FISH_RNA.target_explained_variance_ratio_[np.arange(Effective_n_pv)])*100)
83 | 
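
The two heatmaps above show cosine similarities between source and target factors before and after alignment, and `Effective_n_pv` counts the aligned pairs whose similarity exceeds 0.3. `PVComputation` itself is not shown in this file, so the following is only a sketch of the general idea under stated assumptions: both factor sets are orthonormal (rows are factors, columns are genes), and the alignment is obtained from an SVD of their cosine similarity matrix. The function names `principal_vectors` and `effective_n_pv` and the toy matrices are hypothetical.

```python
import numpy as np


def principal_vectors(source_factors, target_factors):
    """Align two orthonormal factor sets (rows = factors, columns = genes).

    Returns the rotated source vectors, the rotated target vectors and the
    cosine similarity of each matched pair, sorted from most to least similar.
    """
    # Pairwise cosine similarities between source and target factors.
    initial_similarity = source_factors @ target_factors.T
    # The SVD rotates both bases so that matched pairs are maximally similar.
    u, cosines, vt = np.linalg.svd(initial_similarity, full_matrices=False)
    source_pvs = u.T @ source_factors
    target_pvs = vt @ target_factors
    return source_pvs, target_pvs, cosines


def effective_n_pv(cosines, threshold=0.3):
    """Count matched pairs whose cosine similarity exceeds the threshold."""
    return int(np.sum(np.asarray(cosines) > threshold))


# Toy example with 4 genes: the two subspaces share exactly one direction.
r = 1 / np.sqrt(2)
toy_source = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])
toy_target = np.array([[r, r, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.0]])
src_pvs, tgt_pvs, cos = principal_vectors(toy_source, toy_target)
n_shared = effective_n_pv(cos)
```

In the toy case neither source factor matches a target factor exactly before alignment (both similarities are about 0.71), yet after rotation one pair is perfectly aligned and the other is orthogonal, which is the contrast the two heatmaps are meant to display.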


--------------------------------------------------------------------------------
/benchmark/seqFISH_AllenVISp/SpaGE/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and they can be selected
 5 | directly by their string name from the main method. All the methods
 6 | use the scikit-learn implementation.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Default linear dimensionality reduction method. For each method, return a
25 |     BaseEstimator instance corresponding to the method given as input.
26 |     Attributes
27 |     -------
28 |     method: str, default to 'pca'
29 |         Method used for dimensionality reduction.
30 |         Implemented: 'pca', 'ica', 'fa' (Factor Analysis), 'nmf' (Non-negative
31 |         matrix factorisation), 'sparsepca' (Sparse PCA), 'pls' (Partial Least Squares).
32 |     
33 |     n_dim: int, default to 10
34 |         Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Classifier, i.e. BaseEstimator instance
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Wrap PLSRegression so that it is compliant with the other dimensionality
68 |     reduction methods.
69 |     (Simple class rewriting).
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------
/benchmark/seqFISH_AllenVISp/SpaGE/seqFISH_data_preprocessing.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | os.chdir('seqFISH_AllenVISp/')
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | import scipy.stats as st
 7 | import pickle
 8 | 
 9 | seqFISH_data = pd.read_csv('data/seqFISH/sourcedata/cortex_svz_counts.csv',header=0)
10 | seqFISH_meta = pd.read_csv('data/seqFISH/sourcedata/cortex_svz_cellcentroids.csv',header=0)
11 | 
12 | seqFISH_data = seqFISH_data.iloc[np.where(seqFISH_meta['Region'] == 'Cortex')[0],:]
13 | seqFISH_meta = seqFISH_meta.iloc[np.where(seqFISH_meta['Region'] == 'Cortex')[0],:]
14 | 
15 | seqFISH_data = seqFISH_data.T
16 | 
17 | cell_count = np.sum(seqFISH_data,axis=0)
18 | def Log_Norm(x):
19 |     return np.log(((x/np.sum(x))*np.median(cell_count)) + 1)
20 | 
21 | seqFISH_data = seqFISH_data.apply(Log_Norm,axis=0)
22 | seqFISH_data_scaled = pd.DataFrame(data=st.zscore(seqFISH_data.T),index = seqFISH_data.columns,columns=seqFISH_data.index)
23 | 
24 | datadict = dict()
25 | datadict['seqFISH_data'] = seqFISH_data.T
26 | datadict['seqFISH_data_scaled'] = seqFISH_data_scaled
27 | datadict['seqFISH_meta'] = seqFISH_meta
28 | 
29 | with open('data/SpaGE_pkl/seqFISH_Cortex.pkl','wb') as f:
30 |     pickle.dump(datadict, f)
31 | 
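
The preprocessing above applies two steps: each cell's counts are rescaled to the median total count across cells and passed through `log(x + 1)`, then each gene is z-scored across cells. The sketch below reproduces that arithmetic on plain Python lists so it can be checked by hand. The function names `log_normalize` and `zscore`, the list-of-lists layout and the toy counts are assumptions for illustration; the script works on pandas DataFrames and uses `scipy.stats.zscore`, whose default is the population standard deviation used here.

```python
import math
import statistics


def log_normalize(counts_by_cell):
    """Scale every cell to the median total count, then apply log(x + 1).

    counts_by_cell: one inner list of gene counts per cell.
    Cells with a total count of zero are not handled in this sketch.
    """
    totals = [sum(cell) for cell in counts_by_cell]
    median_total = statistics.median(totals)
    return [
        [math.log(c / total * median_total + 1) for c in cell]
        for cell, total in zip(counts_by_cell, totals)
    ]


def zscore(values):
    """Standardise one gene across cells (population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Toy counts (hypothetical): 3 cells x 2 genes. The first two cells have the
# same gene proportions but a tenfold difference in sequencing depth.
toy_counts = [[1, 3], [10, 30], [2, 2]]
toy_normalized = log_normalize(toy_counts)
toy_gene0_scaled = zscore([cell[0] for cell in toy_normalized])
```

After normalisation the first two toy cells are identical, which is the purpose of the depth scaling: differences in total counts no longer dominate the comparison between cells.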


--------------------------------------------------------------------------------
/dimensionality_reduction.py:
--------------------------------------------------------------------------------
 1 | """ Dimensionality Reduction
 2 | @author: Soufiane Mourragui
 3 | This module extracts the domain-specific factors from the high-dimensional omics
 4 | dataset. Several methods are implemented here and they can be selected
 5 | directly by their string name from the main method. All the methods
 6 | use the scikit-learn implementation.
 7 | Notes
 8 | -------
 9 | 	-
10 | 	
11 | References
12 | -------
13 | 	[1] Pedregosa, Fabian, et al. (2011) Scikit-learn: Machine learning in Python.
14 | 	Journal of Machine Learning Research
15 | """
16 | 
17 | import numpy as np
18 | from sklearn.decomposition import PCA, FastICA, FactorAnalysis, NMF, SparsePCA
19 | from sklearn.cross_decomposition import PLSRegression
20 | 
21 | 
22 | def process_dim_reduction(method='pca', n_dim=10):
23 |     """
24 |     Default linear dimensionality reduction method. For each method, return a
25 |     BaseEstimator instance corresponding to the method given as input.
26 |     Attributes
27 |     -------
28 |     method: str, default to 'pca'
29 |         Method used for dimensionality reduction.
30 |         Implemented: 'pca', 'ica', 'fa' (Factor Analysis), 'nmf' (Non-negative
31 |         matrix factorisation), 'sparsepca' (Sparse PCA), 'pls' (Partial Least Squares).
32 |     
33 |     n_dim: int, default to 10
34 |         Number of domain-specific factors to compute.
35 |     Return values
36 |     -------
37 |     Classifier, i.e. BaseEstimator instance
38 |     """
39 | 
40 |     if method.lower() == 'pca':
41 |         clf = PCA(n_components=n_dim)
42 | 
43 |     elif method.lower() == 'ica':
44 |         print('ICA')
45 |         clf = FastICA(n_components=n_dim)
46 | 
47 |     elif method.lower() == 'fa':
48 |         clf = FactorAnalysis(n_components=n_dim)
49 | 
50 |     elif method.lower() == 'nmf':
51 |         clf = NMF(n_components=n_dim)
52 | 
53 |     elif method.lower() == 'sparsepca':
54 |         clf = SparsePCA(n_components=n_dim, alpha=10., tol=1e-4, verbose=10, n_jobs=1)
55 | 
56 |     elif method.lower() == 'pls':
57 |         clf = PLS(n_components=n_dim)
58 | 		
59 |     else:
60 |         raise NameError('%s is not an implemented method'%(method))
61 | 
62 |     return clf
63 | 
64 | 
65 | class PLS():
66 |     """
67 |     Wrap PLSRegression so that it is compliant with the other dimensionality
68 |     reduction methods.
69 |     (Simple class rewriting).
70 |     """
71 |     def __init__(self, n_components=10):
72 |         self.clf = PLSRegression(n_components)
73 | 
74 |     def get_components_(self):
75 |         return self.clf.x_weights_.transpose()
76 | 
77 |     def set_components_(self, x):
78 |         pass
79 | 
80 |     components_ = property(get_components_, set_components_)
81 | 
82 |     def fit(self, X, y):
83 |         self.clf.fit(X,y)
84 |         return self
85 | 
86 |     def transform(self, X):
87 |         return self.clf.transform(X)
88 | 
89 |     def predict(self, X):
90 |         return self.clf.predict(X)


--------------------------------------------------------------------------------