├── 11scRNAseq_DatasetSource.txt
├── Codes
    ├── GenerateSimData.py
    └── GenerateSimData_Rare.py
├── ExamplePollen.zip
├── LICENSE.txt
├── OLMC_animation.gif
├── PanoView.jpg
├── PanoViewManual.pdf
├── PanoramicView
    ├── __init__.py
    └── scPanoView.py
├── README.md
└── setup.py


/11scRNAseq_DatasetSource.txt:
--------------------------------------------------------------------------------
 1 | ###################################################################################################
 2 | The following is the download information for 11 scRNA-seq datasets used in the PanoView manuscript.
 3 | You can find the expression values and original cluster assignments at these sources.
 4 | ###################################################################################################
 5 | 
 6 | ### Yan et al. ###
 7 | 1. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36552
 8 | 2. scRNA-Seq Datasets at Hemberg Group/Sanger Institute https://hemberg-lab.github.io/scRNA.seq.datasets/
 9 | 
10 | ### Goolam et al. ###
11 | 1. ArrayExpress at EMBL-EBI https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3321/
12 | 2. scRNA-Seq Datasets at Hemberg Group/Sanger Institute https://hemberg-lab.github.io/scRNA.seq.datasets/mouse/edev/
13 | 
14 | ### Deng et al. ###
15 | 1. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE45719
16 | 2. ArrayExpress at EMBL-EBI https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-45719/
17 | 3. scRNA-Seq Datasets at Hemberg Group/Sanger Institute https://hemberg-lab.github.io/scRNA.seq.datasets/mouse/edev/
18 | 
19 | ### Pollen et al. ###
20 | 1. scRNA-Seq Datasets at Hemberg Group/Sanger Institute https://hemberg-lab.github.io/scRNA.seq.datasets/human/tissues/
21 | 
22 | ### Patel et al. ###
23 | 1. GiniClust at github https://github.com/lanjiangboston/GiniClust
24 | 2. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE57872
25 | 
26 | ### Usoskin et al. ###
27 | 1. Linnarsson Lab http://linnarssonlab.org/drg/
28 | 2. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59739
29 | 
30 | ### Villani et al. ###
31 | 1. Single Cell Portal at Broad Institute https://singlecell.broadinstitute.org/single_cell
32 | 2. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94820
33 | 
34 | ### Zeisel et al. ###
35 | 1. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60361
36 | 2. Linnarsson Lab http://linnarssonlab.org/cortex/
37 | 
38 | ### Tirosh et al. ###
39 | 1. Single Cell Portal at Broad Institute https://singlecell.broadinstitute.org/single_cell
40 | 2. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72056
41 | 
42 | ### Baron et al. ###
43 | 1. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133
44 | 2. scRNA-Seq Datasets at Hemberg Group/Sanger Institute https://hemberg-lab.github.io/scRNA.seq.datasets/human/pancreas/
45 | 
46 | ### Campbell et al. ###
47 | 1. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93374
48 | 2. Single Cell Portal at Broad Institute https://singlecell.broadinstitute.org/single_cell


--------------------------------------------------------------------------------
/Codes/GenerateSimData.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import pandas as pd
 3 | import random
 4 | from sklearn import datasets
 5 | from sklearn.preprocessing import MinMaxScaler
 6 | 
 7 | def Skl_scale(data):
 8 |     newdata = data.copy()
 9 |     for i in newdata.index:
10 |         scaler = MinMaxScaler(feature_range=(0,10000))
11 |         scaler = scaler.fit(newdata.loc[i,:].values.reshape(len(newdata.columns),1))
12 |         newdata.loc[i,:] = scaler.transform(newdata.loc[i,:].values.reshape(len(newdata.columns),1)).reshape(1,len(newdata.columns))
13 |     return(newdata)
14 |     
15 | 
16 | R_number = random.sample(range(1000), k=20) # 20 distinct random seeds
17 | for i in range(20):
18 |     rnumber = R_number[i]
19 |     for j in range(3,23): # number of simulated clusters
20 |         blobs = datasets.make_blobs(n_samples=500,n_features=20000,random_state=rnumber,centers=j,cluster_std=1)
21 |         inputdf = pd.DataFrame(data=blobs[0])
22 |         blobs_cluster=pd.DataFrame(data=blobs[1],columns=['cluster']) # the ground truth
23 |         inputdf =Skl_scale(inputdf.transpose()) # expression data for simulation (genes x cells)
24 | 
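The `Skl_scale` function above refits a `MinMaxScaler` once per row, which is slow for 20,000 genes. The following is a minimal vectorized sketch of the same row-wise scaling, not part of the repository. The helper name `minmax_rows` and its `low`/`high` parameters are assumptions made for illustration, and the input is assumed to be a purely numeric DataFrame.

```python
import numpy as np
import pandas as pd


def minmax_rows(data, low=0.0, high=10000.0):
    """Hypothetical helper: scale every row of `data` to the range [low, high]."""
    values = data.to_numpy(dtype=float)
    row_min = values.min(axis=1, keepdims=True)
    row_range = values.max(axis=1, keepdims=True) - row_min
    # MinMaxScaler treats a zero range as 1, so a constant row maps to `low`.
    row_range[row_range == 0] = 1.0
    scaled = (values - row_min) / row_range * (high - low) + low
    return pd.DataFrame(scaled, index=data.index, columns=data.columns)


# Toy input: one varying row and one constant row.
demo = pd.DataFrame([[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]])
scaled_demo = minmax_rows(demo)
```

Calling `minmax_rows(inputdf.transpose())` should give the same values as `Skl_scale(inputdf.transpose())` up to floating-point rounding, under the assumptions stated above.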


--------------------------------------------------------------------------------
/Codes/GenerateSimData_Rare.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import pandas as pd
 3 | import random
 4 | from sklearn import datasets
 5 | from sklearn.preprocessing import MinMaxScaler
 6 | 
 7 | def Skl_scale(data):
 8 |     newdata = data.copy()
 9 |     for i in newdata.index:
10 |         scaler = MinMaxScaler(feature_range=(0,10000))
11 |         scaler = scaler.fit(newdata.loc[i,:].values.reshape(len(newdata.columns),1))
12 |         newdata.loc[i,:] = scaler.transform(newdata.loc[i,:].values.reshape(len(newdata.columns),1)).reshape(1,len(newdata.columns))
13 |     return(newdata)
14 |     
15 | 
16 | 
17 | R_number = random.sample(range(1000), k=20)
18 | 
19 | for r in R_number:
20 |     for j in range(3,16):
21 |       
22 |         blobs = datasets.make_blobs(n_samples=500,n_features=20000,random_state=r,centers=j,cluster_std=1)        
23 |         blobs_cluster=pd.DataFrame(data=blobs[1],columns=['cluster'])
24 | 
25 |         inputdf = pd.DataFrame(data=blobs[0])
26 |         inputdf=Skl_scale(inputdf.transpose()) # genes x cells
27 | 
28 |         clusternumber = random.sample(range(j), k=1)[0] # pick one cluster (as a scalar) to make rare
29 |         clustersize = len(blobs_cluster[blobs_cluster.cluster == clusternumber])         
30 |         randomnumber = random.sample(range(clustersize), k=round(clustersize*0.9))
31 |         RandomCell = blobs_cluster[blobs_cluster.cluster == clusternumber].index[randomnumber]
32 | 
33 |         inputdf2=inputdf.drop(labels=RandomCell,axis=1)
34 |         inputdf2.columns=range(len(inputdf2.columns)) # expression data for the simulation of rare cells
35 |         
36 |         blobs_cluster2 = blobs_cluster.drop(labels=RandomCell,axis=0)           
37 |         blobs_cluster2.index=range(len(blobs_cluster2.index))
38 |         blobs_cluster2=blobs_cluster2.replace(to_replace=clusternumber,value=999) # ground truth
39 | 
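The rare-cell simulation above removes about 90% of the cells in one randomly chosen cluster and relabels the survivors as 999. The following is a small sketch of that step on toy labels only, not the repository's code. The helper name `make_rare` and its `keep_fraction`, `rare_label`, and `seed` parameters are assumptions made for illustration.

```python
import random

import pandas as pd


def make_rare(labels, keep_fraction=0.1, rare_label=999, seed=0):
    """Hypothetical helper: downsample one random cluster and relabel it."""
    rng = random.Random(seed)
    target = rng.choice(sorted(labels.unique()))
    members = list(labels[labels == target].index)
    # Drop all but roughly `keep_fraction` of the target cluster's cells.
    n_drop = round(len(members) * (1 - keep_fraction))
    dropped = rng.sample(members, k=n_drop)
    kept = labels.drop(index=dropped).reset_index(drop=True)
    return kept.replace(target, rare_label), target, dropped


# Toy ground truth: three clusters of 20 cells each.
toy_labels = pd.Series([0] * 20 + [1] * 20 + [2] * 20, name="cluster")
rare_labels, target, dropped = make_rare(toy_labels)
```

The `dropped` indices can then be used to drop the matching cell columns from the expression matrix, as the script does with `inputdf.drop(labels=RandomCell,axis=1)`.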


--------------------------------------------------------------------------------
/ExamplePollen.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mhu10/scPanoView/84d2a1e5c12f146314d2b741d0e3f0e3f63cfb2b/ExamplePollen.zip


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) <year> <copyright holders>
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/OLMC_animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mhu10/scPanoView/84d2a1e5c12f146314d2b741d0e3f0e3f63cfb2b/OLMC_animation.gif


--------------------------------------------------------------------------------
/PanoView.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mhu10/scPanoView/84d2a1e5c12f146314d2b741d0e3f0e3f63cfb2b/PanoView.jpg


--------------------------------------------------------------------------------
/PanoViewManual.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mhu10/scPanoView/84d2a1e5c12f146314d2b741d0e3f0e3f63cfb2b/PanoViewManual.pdf


--------------------------------------------------------------------------------
/PanoramicView/__init__.py:
--------------------------------------------------------------------------------
1 | from PanoramicView import scPanoView
2 | 


--------------------------------------------------------------------------------
/PanoramicView/scPanoView.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import pandas as pd
  3 | import seaborn as sns
  4 | import matplotlib.pyplot as plt
  5 | import matplotlib as mpl
  6 | import matplotlib.gridspec as gridspec
  7 | from numpy import linalg as LA
  8 | from collections import Counter
  9 | from scipy.spatial import ConvexHull
 10 | from scipy import stats
 11 | from scipy.spatial import distance
 12 | from scipy.cluster.hierarchy import linkage
 13 | from scipy.cluster.hierarchy import dendrogram
 14 | from scipy.cluster.hierarchy import fcluster
 15 | from scipy.cluster import hierarchy
 16 | from sklearn.decomposition import PCA
 17 | from sklearn.neighbors import BallTree
 18 | from sklearn.manifold import TSNE
 19 | from sklearn.preprocessing import MinMaxScaler
 20 | from sklearn.preprocessing import normalize
 21 | from statsmodels.stats.multitest import multipletests
 22 | 
 23 | import warnings
 24 | warnings.filterwarnings("ignore", category=FutureWarning)
 25 | np.random.seed(1)
 26 | 
 27 | 
 28 | def RunPCA(data,n):
 29 |     pca = PCA(n_components=n)
 30 |     pca.fit(data)
 31 |     data_trans = pca.transform(data) ### new coordinates after pca transform
 32 |     return(data_trans,pca.explained_variance_ratio_)
 33 | 
 34 |         
 35 | def gini(data): # Gini coefficient: sum of all pairwise |i-j| divided by 2*n^2*mean
 36 |     total=0
 37 |     for i in data:
 38 |         for j in data:
 39 |             
 40 |             total = total + abs(i-j)
 41 |     result = total / (2*len(data)*len(data)*np.mean(data))
 42 |     return(result)
 43 | 
 44 | 
 45 | def Skl_scale(data): # min-max scale each row of the DataFrame to the range [-2, 2]
 46 |     newdata = data.copy()
 47 |     for i in newdata.index:
 48 |         scaler = MinMaxScaler(feature_range=(-2,2))
 49 |         scaler = scaler.fit(newdata.loc[i,:].values.reshape(len(newdata.columns),1))
 50 |         newdata.loc[i,:] = scaler.transform(newdata.loc[i,:].values.reshape(len(newdata.columns),1)).reshape(1,len(newdata.columns))
 51 |     return(newdata)
 52 | 
 53 | 
 54 | def OrderCell(data,radius):
 55 |     tree = BallTree(data,leaf_size=2)    
 56 |     Countnumber=[]
 57 |     for point in range(len(data)): 
 58 |         count = tree.query_radius(data[point].reshape(1,-1), r=radius, count_only = True) # counting the number of neighbors for each point
 59 |         Countnumber.append(count) # storing number of neighbors      
 60 |     CountnumberDf = pd.DataFrame(Countnumber,columns =['neighbors'])
 61 |     return(CountnumberDf)
 62 | 
 63 | 
 64 | def jitter(a_series, noise_reduction=1000000): # tiny zero-centered uniform noise, used to break ties before pd.qcut
 65 |     return (np.random.random(len(a_series))*a_series.std()/noise_reduction)-(a_series.std()/(2*noise_reduction))
 66 | 
 67 | 
 68 | def HighVarGene(data,z,meangene): 
 69 |     data=data.transpose()
 70 |     data=data.loc[:,(data!=0).any(axis=0)]
 71 |     data.loc[:,'average'] = np.mean(data,axis=1)     
 72 |     data.loc[:,'genegroup'] = pd.qcut(data.loc[:,'average'] + jitter(data.loc[:,'average']),20,labels=range(1,21)) 
 73 |     data.loc[:,'variance']= np.var(data.drop('genegroup',axis = 1),axis=1)
 74 |     data.loc[:,'dispersion']= data.drop('genegroup',axis =1).variance/data.drop('genegroup',axis=1).average
 75 |     pickdf=[]
 76 |     for group in range(1,21):
 77 |           
 78 |         data.loc[data[data.genegroup == group].index,'zscore'] = pd.DataFrame(stats.zscore(data[data.genegroup == group].dispersion)).fillna(0).values
 79 |         if len(data[(data.zscore>z)&(data.average>meangene)]) > 0:
 80 |             pickdf.append(data[(data.zscore>z)&(data.average>meangene)].index)
 81 |             
 82 |     if pickdf ==[]:
 83 |         for group in range(1,21):
 84 |             data.loc[data[data.genegroup == group].index,'zscore'] = pd.DataFrame(stats.zscore(data[data.genegroup == group].dispersion)).fillna(0).values
 85 |             if len(data[data.zscore>z]) >0:
 86 |                 pickdf.append(data[(data.zscore>z)].index)
 87 |         if pickdf==[]:      
 88 |             return([])    
 89 |         else:
 90 |              hvg=(np.unique([g for sublist in pickdf for g in sublist]))
 91 |              if len(hvg) <3:
 92 |                  return([])
 93 |              else:
 94 |                  return(hvg)
 95 |     else:
 96 |         hvg=(np.unique([g for sublist in pickdf for g in sublist]))
 97 |         if len(hvg) < 3:
 98 |             return([])
 99 |         else:
100 |             return(hvg)
101 |             
102 |             
103 | def Distohull(xcoord,point,clusthull,clusthullvertices):
104 |     disttovertices = [LA.norm(xcoord[i] - xcoord[point]) for i in clusthull.iloc[clusthullvertices,:].index ]
105 |     if np.min(disttovertices) < np.mean(distance.pdist(xcoord[clusthull.iloc[clusthullvertices,:].index])):
106 |        outcheck = 0 # assign to cluster
107 |        outvalue = np.min(disttovertices)
108 |     else:
109 |         outcheck = 1 # not belong to cluster
110 |         outvalue = 9999999999999
111 |     return(outcheck,outvalue)
112 | 
113 | 
114 | def Findlocalmax(countdataframe,xcoordinate,bin):
115 |     clusters = []
116 |     newdataframe = countdataframe
117 |     distohighest = [LA.norm(xcoordinate[point] - xcoordinate[np.argmax(newdataframe.neighbors)]) for point in newdataframe.index]
118 |     newdataframe = newdataframe.assign(dist = distohighest)
119 |     hist = np.histogram(distohighest,bin)
120 |     firstclust = newdataframe[newdataframe.dist < hist[1][1]]
121 |     clust_1 = list(firstclust.index)    
122 |     if len(clust_1) < 4:
123 |             return(False) 
124 |     hull = ConvexHull(xcoordinate[firstclust.index],qhull_options ='QJ')  
125 |     clusters.append([firstclust,clust_1,hull])
126 |     
127 |     tempclust = newdataframe[newdataframe.dist >= hist[1][1]]
128 |     checkpoint = len(tempclust)    
129 |     while checkpoint > 0:
130 |         checkpoint = 0
131 |         neighbornumber = np.unique(tempclust.neighbors)[::-1]
132 |         newcenter = []
133 |         for i in neighbornumber:
134 |             for cell in tempclust[tempclust.neighbors ==i].index:
135 |                 checkgroup = []
136 |                 checkvalue = []
137 |                 for k in range(len(clusters)):
138 |                     check = Distohull(xcoordinate,cell,clusters[k][0],clusters[k][2].vertices)
139 |                     checkgroup.append(check[0])
140 |                     checkvalue.append(check[1])
141 |                 if 0 not in checkgroup:
142 |                     newcenter.append(cell)
143 |                     break
144 |                 else:
145 |                     tempclust = tempclust.drop(cell)
146 |                     clusters[np.argmin(checkvalue)][1].append(cell)
147 |                     clusters[np.argmin(checkvalue)][0] = newdataframe.loc[clusters[np.argmin(checkvalue)][1],:]
148 |                     clusters[np.argmin(checkvalue)][2] = ConvexHull(xcoordinate[clusters[np.argmin(checkvalue)][1]],qhull_options ='QJ')   
149 |             
150 |             if newcenter !=[]:
151 |                 break
152 |             elif i == 1:
153 |                 return(clusters)
154 |         
155 |         distohighest = [LA.norm(xcoordinate[point] - xcoordinate[newcenter[0]]) for point in tempclust.index]
156 |         tempclust=tempclust.assign(dist=distohighest)
157 |         hist = np.histogram(distohighest,bin)
158 |         cluster2 = tempclust[tempclust.dist < hist[1][1]]
159 |         clust_2 = list(cluster2.index)
160 |         if len(clust_2) < 4:
161 |             return(clusters)
162 |         else:
163 |             hull2 = ConvexHull(xcoordinate[cluster2.index],qhull_options ='QJ')          
164 |             clusters.append([cluster2,clust_2,hull2])
165 |             tempclust = tempclust[tempclust.dist >= hist[1][1]]
166 |             checkpoint = len(tempclust)                        
167 |     return(clusters)
168 | 
169 | 
170 |     
171 | class Panoite:
172 |      
173 |     def __init__(self,expression):       
174 |         self.expression = expression
175 |         self.genespace=[]
176 |         self.membership = pd.DataFrame({'L1Cluster':0},index=list(range(len(self.expression))))
177 |         self.stopite = False
178 |     
179 |     def generate_clusters(self,lowgene,zscore):    
180 |         CellMaximumn=1000
181 |         ginicutoff = 0.05
182 |         Bc= 20
183 |         Bg=20
184 |         maxbb = 20
185 |         findvarg = HighVarGene(self.expression,zscore,lowgene)
186 |         if len(findvarg) > 0:
187 |             self.genespace.append(findvarg)
188 |             subdf = self.expression.loc[:,self.genespace[-1]]
189 |             
190 |         elif len(findvarg) == 0:
191 |             self.stopite = True
192 |             return()
193 |      
194 |         subdf=Skl_scale(subdf)
195 |         pcaspace = RunPCA(subdf.values.astype(float),3)[0]
196 |         Radius = np.histogram(distance.pdist(pcaspace),Bc)[1][1]
197 |         temppca = pcaspace    
198 |         Ordercell = OrderCell(temppca,Radius)
199 |         bb=0
200 |         opt_bins=True
201 |         lm_number = 1
202 |         while opt_bins == True:
203 |             bb = bb+1        
204 |             if len(temppca) >CellMaximumn:
205 |                 localmax = Findlocalmax(Ordercell,temppca,Bg)
206 |                 opt_bins = False
207 |             
208 |             else:
209 |                 localmax = Findlocalmax(Ordercell,temppca,5*bb)
210 |             
211 |             if localmax == False:
212 |                     if bb == 1:   
213 |                         self.membership.L1Cluster= 1
214 |                         self.CSIZE = len(self.membership)
215 |                         return()             
216 |                     else:
217 |                         localmax = Findlocalmax(Ordercell,temppca,5*(bb-1))
218 |                         opt_bins = False                           
219 |                     
220 |             lm_number_next = len(localmax)
221 |             if bb > maxbb and len(temppca) <CellMaximumn:
222 |                 opt_bins = False
223 |             elif lm_number_next >= lm_number:
224 |                lm_number = lm_number_next
225 |             elif lm_number_next < lm_number and len(temppca) <CellMaximumn :
226 |                 localmax = Findlocalmax(Ordercell,temppca,5*(bb-1))   
227 |                 opt_bins = False
228 |     
229 |         densepoint = []
230 |         for i in range(len(localmax)):
231 |             densepoint.append(np.argmax(localmax[i][0].neighbors))
232 |     
233 |         for j in Ordercell.index:
234 |             pairdist = distance.cdist([temppca[j]],temppca[densepoint])
235 |             Ordercell.loc[j,'cluster'] = np.argmin(pairdist)+np.max(self.membership.L1Cluster)+1
236 |             
237 |         Ordercell.cluster = Ordercell.cluster.astype(int)
238 |         for i in Ordercell.index:
239 |             self.membership.loc[i,'L1Cluster'] = Ordercell.loc[i,'cluster']      
240 | 
241 |         Eva=[]
242 |         Cluster_size=[]        
243 |         check_CLN = []
244 |         
245 |         for i in np.unique(Ordercell.cluster):
246 |             Eva.append(np.var(distance.pdist(self.expression.loc[self.membership[self.membership.L1Cluster ==i].index,self.genespace[-1]],'correlation')))
247 |             check_CLN.append(i)
248 |         
249 |         for i in np.unique(Ordercell.cluster):
250 |             Cluster_size.append(len(self.membership[self.membership.L1Cluster == i]))
251 |         
252 |         Eva=pd.DataFrame(Eva,index = check_CLN).fillna(0)
253 |         tempEva = np.copy(Eva)
254 |         tempEva.sort(axis=0)
255 |         ginivalue = []
256 |         
257 |         for j in range(len(Eva)):
258 |             accum = 0
259 |             lorenz=[0]
260 |             
261 |             if j == 0:
262 |                 pass
263 |             else:
264 |                 Evalist = tempEva[0:j+1]
265 |                 for i in Evalist:
266 |                     persent = 100*(i / sum(Evalist))
267 |                     accum = accum + persent
268 |                     lorenz.append(accum)
269 |                 ginivalue.append(gini(Evalist))
270 | 
271 |         Gini=pd.DataFrame(ginivalue)    
272 |         check_iteation = any(Gini.values > ginicutoff)
273 |         while check_iteation == True:
274 |             
275 |             if all(Gini.values < ginicutoff):
276 |                 check_iteation = False
277 | 
278 |             else:
279 |                 clust_pick_auto = list(set(np.unique(Ordercell.cluster)))
280 |                 clust_keep_auto = Eva.idxmin().values
281 |                 clust_pick_auto.remove(clust_keep_auto)
282 |  
283 |             nextdf = self.expression.loc[self.membership[self.membership.L1Cluster.isin(clust_pick_auto)].index,:]
284 |             
285 |             print('Percentage of cells being analyzed: '+str(round((1-(float(len(nextdf))/float(len(self.expression))))*100))+'%')
286 |             
287 |             findvarg = HighVarGene(nextdf,zscore,lowgene)
288 |             
289 |             if len(findvarg) > 0:
290 |                 self.genespace.append(findvarg)
291 |                 subdf = self.expression.loc[nextdf.index,self.genespace[-1]]
292 |     
293 |             elif len(findvarg) == 0:
294 |                 self.stopite = True
295 |                 return()
296 |      
297 |             subdf=Skl_scale(subdf)
298 |             pcaspace = RunPCA(subdf.values.astype(float),3)[0]
299 |             Radius = np.histogram(distance.pdist(pcaspace),Bc)[1][1]          
300 |             temppca = pcaspace    
301 |             Ordercell = OrderCell(temppca,Radius)
302 |             bb=0
303 |             opt_bins=True
304 |             lm_number = 1
305 |             while opt_bins == True:
306 |                 bb = bb+1
307 |                 
308 |                 if len(temppca) >CellMaximumn:
309 |                     localmax = Findlocalmax(Ordercell,temppca,Bg)
310 |                     opt_bins = False                         
311 |                 else:
312 |                     localmax = Findlocalmax(Ordercell,temppca,5*bb)
313 |                 
314 |                 if localmax == False:
315 |                     if bb == 1:
316 |                         self.membership.loc[nextdf.index,'L1Cluster'] = np.max(self.membership.L1Cluster)+1
317 |                         opt_bins = False
318 |                     else:
319 |                         localmax = Findlocalmax(Ordercell,temppca,5*(bb-1))
320 |                         opt_bins = False                           
321 |                 
322 |                 lm_number_next = len(localmax)
323 |                 if bb > maxbb and len(temppca) <CellMaximumn:
324 |                     opt_bins = False
325 | 
326 |                 elif lm_number_next >= lm_number:
327 |                     lm_number = lm_number_next
328 |                 elif lm_number_next < lm_number and len(temppca) <CellMaximumn:
329 |                     localmax = Findlocalmax(Ordercell,temppca,5*(bb-1))           
330 |                     opt_bins = False                
331 |         
332 |             densepoint = []
333 |             for i in range(len(localmax)):
334 |                 densepoint.append(np.argmax(localmax[i][0].neighbors))
335 |          
336 |             for j in Ordercell.index:
337 |     
338 |                 pairdist = distance.cdist([temppca[j]],temppca[densepoint])
339 |                 Ordercell.loc[j,'cluster'] = np.argmin(pairdist)+np.max(self.membership.L1Cluster)+1
340 |                        
341 |             Ordercell.cluster = Ordercell.cluster.astype(int)
342 |             
343 |             cellid_pick = self.membership[self.membership.L1Cluster.isin(clust_pick_auto)].index
344 |             for i in range(len(cellid_pick)):
345 |                 self.membership.loc[cellid_pick[i],'L1Cluster'] = Ordercell.loc[i,'cluster']               
346 |             
347 |             Eva=[]
348 |             Cluster_size=[]
349 |             check_CLN = [] 
350 |             for i in np.unique(Ordercell.cluster):
351 |                 Eva.append(np.var(distance.pdist(self.expression.loc[self.membership[self.membership.L1Cluster ==i].index,self.genespace[-1]],'correlation')))
352 |                 check_CLN.append(i)
353 |             for i in np.unique(Ordercell.cluster):
354 |                 Cluster_size.append(len(self.membership[self.membership.L1Cluster == i]))            
355 | 
356 |             Eva=pd.DataFrame(Eva,index = check_CLN).fillna(0)
357 |             tempEva = np.copy(Eva)
358 |             tempEva.sort(axis=0)
359 |             ginivalue = []
360 |             for j in range(len(Eva)):
361 |                 accum = 0
362 |                 lorenz=[0]
363 |             
364 |                 if j == 0:
365 | 
366 |                    pass        
367 |                 else:     
368 |                    Evalist = tempEva[0:j+1]
369 |                    for i in Evalist:
370 |                        persent = 100*(i / sum(Evalist))
371 |                        accum = accum + persent
372 |                        lorenz.append(accum)
373 |                    ginivalue.append(gini(Evalist))
374 | 
375 |             Gini=pd.DataFrame(ginivalue)
376 |             check_iteation = any(Gini.values > ginicutoff)
377 |  
378 | 
379 | class PanoView:
380 |      
381 |     def __init__(self,filename,annotation=None):
382 |         
383 |         expression=pd.read_csv(filename+'.csv',index_col=0)      
384 |         
385 |         if annotation is not None:
386 |             self.cell_anno = pd.read_csv(annotation+'.csv',index_col=0).values
387 |         
388 |         self.raw_exp = expression
389 |         self.log_exp =[]
390 |         self.cell_id =[]
391 |         self.vargene =[]
392 |         self.cell_clusters=[]        
393 |         self.cell_membership=[]
394 |         self.sim_matrix=[]
395 |         self.tsne2d=np.array([])
396 |         self.vg_stat=[]
397 |         self.L1cell_color=[]
398 |         self.L1cell_dendro_order=[]
399 |         self.L1cluster_color=[]
400 |         self.L2cell_color=[]
401 |         self.L2cell_dendro_order=[]
402 |         self.L2cluster_color=[]
403 |         
404 |         
405 |     def RunSearching(self,Normal=True,Log2=True,GeneLow='default',Zscore='default'):
406 |         
407 |         self.raw_exp.index = self.raw_exp.index.astype(str) # astype returns a new index; assign it so '_dp' can be appended below
408 |         generepeat = len(self.raw_exp.index) != len(np.unique(self.raw_exp.index))
409 |         while generepeat == True:
410 |             self.raw_exp.index = self.raw_exp.index.where(~self.raw_exp.index.duplicated(), self.raw_exp.index + '_dp')
411 |             generepeat = len(self.raw_exp.index) != len(np.unique(self.raw_exp.index))        
412 |         
413 |         if Normal == False:
414 |             if Log2 == False:
415 |                 self.log_exp = self.raw_exp.transpose()
416 |                 expression = self.log_exp.loc[:,(self.log_exp!=0).any(axis=0)]
417 |             else:
418 |                 self.log_exp = np.log2(1+self.raw_exp.transpose())
419 |                 expression = self.log_exp.loc[:,(self.log_exp!=0).any(axis=0)]
420 |         else:
421 |             
422 |             self.raw_exp=self.raw_exp.loc[(self.raw_exp!=0).any(axis=1),:]
423 |             raw_norm = pd.DataFrame(normalize(self.raw_exp,norm='l1',axis=0)*self.raw_exp.sum().median(),index=self.raw_exp.index,columns=self.raw_exp.columns)
424 |             self.log_exp = np.log2(1+raw_norm.transpose())
425 |             expression = self.log_exp
426 |             
427 |         self.cell_id = expression.index
428 |         self.gene_id = expression.columns
429 |         
430 |         expression.index=range(len(expression))
431 |         VarGene=[]
432 |         
433 |         if GeneLow != 'default':
434 |             GeneLow = GeneLow
435 |         else:
436 |             GeneLow = 0.5
437 |         
438 |         if Zscore != 'default':
439 |             Zscore = Zscore
440 |         else:
441 |             Zscore = 1.5
442 |             
443 |         result = Panoite(expression)
444 |         result.generate_clusters(GeneLow,Zscore)
445 | 
446 |         finalcluster=[]
447 |         VarGene.append(np.unique([gene for sublist in result.genespace for gene in sublist]))
448 |         for i in np.unique(result.membership.L1Cluster):
449 |             finalcluster.append(result.membership[result.membership.L1Cluster ==i].index.values)        
450 |         self.vargene= np.unique([gene for sublist in VarGene for gene in sublist])
451 |         self.cell_clusters = finalcluster
452 |         print("Clusters generated ")
453 | 
454 |         
455 |     def OutputResult(self,clust_merge='default',metric_dis='default',fclust_height= 'default', init='default',n_PCs='default'):    
456 | 
457 |         if clust_merge != 'default':
458 |             clust_merge = clust_merge
459 |         else:
460 |             clust_merge = 0.2
461 |         
462 |         if metric_dis != 'default':
463 |             metric_dis = 2 # any non-default value selects the Euclidean metric
464 |         else:
465 |             metric_dis = 1
466 |         
467 |         if fclust_height != 'default':
468 |             fclust_height = fclust_height
469 |         else:
470 |             fclust_height = 0.2
471 |             
472 |         if init != 'default':
473 |             init = 'random'
474 |         else:
475 |             init = 'pca'
476 |             
477 |         if n_PCs != 'default':
478 |             n_PCs = n_PCs
479 |         else:
480 |             n_PCs = 15
481 |             
482 |         expression= self.log_exp
483 |         expression.index=range(len(expression))
484 |         expression = Skl_scale(expression)
485 |                
486 |         cluster_list=[]
487 |         for i in range(len(self.cell_clusters)):
488 |             cluster_list.append(expression.loc[self.cell_clusters[i],self.vargene])            
489 |         
490 |         cluster_list_oroginal = cluster_list
491 |         
492 |         sim_mat = pd.DataFrame(0.0,index=np.arange(1,len(cluster_list)+1),columns=np.arange(1,len(cluster_list)+1)) # float dtype, since mean distances are assigned below
493 |         
494 |         for i in sim_mat.index:
495 |             ci = cluster_list[i-1]
496 |             for j in sim_mat.index:
497 |                 cj = cluster_list[j-1]
498 |                 if metric_dis == 1:
499 |                     
500 |                     sim_mat.loc[i,j] = np.mean(distance.cdist(ci,cj,metric='correlation'))
501 |                 elif metric_dis ==2:
502 |                     
503 |                     sim_mat.loc[i,j] = np.mean(distance.cdist(ci,cj,metric='euclidean'))
504 |                     
505 |         sim_mat_original = sim_mat 
506 |         clustdis_within = pd.DataFrame(np.diagonal(sim_mat),index=sim_mat.index,columns=['dis'])
507 |         
508 |         
509 |         linkage_matrix = linkage(sim_mat,method="ward")
510 |         cutree = pd.DataFrame(data=hierarchy.cut_tree(linkage_matrix , n_clusters=None, height=None)+1,index=range(1,len(cluster_list)+1))  
511 |         
512 |         init_lenth=0
513 |         Lgroups=[]
514 |         MergeCluster=[]
515 |     
516 |         for step in range(1,len(cutree.columns)):
517 |         
518 |             repeats=[item for item, count in Counter(cutree[step]).items() if count > 1]
519 |             check_lenth = len(repeats)
520 |         
521 |             lgroups=[]
522 |             for i in repeats:
523 |                 lgroups.append(cutree[step][cutree[step]==i].index.tolist())
524 |             Lgroups.append(lgroups)
525 |             
526 |             if check_lenth > init_lenth:
527 |                 cgroups=[]
528 |                 for i in repeats:
529 |                     cgroups.append(cutree[step][cutree[step]==i].index.tolist())
530 |             
531 |                 for g in cgroups:
532 |                     if len(g) ==2:
533 |                         pairdis = sim_mat.loc[g[0],g[1]]
534 |                     
535 |                         if abs(pairdis - clustdis_within.loc[g[0]].dis) <= clustdis_within.min().dis*clust_merge or abs(pairdis - clustdis_within.loc[g[1]].dis) <= clustdis_within.min().dis*clust_merge:                                        
536 |                             if g not in MergeCluster:                    
537 |                                 MergeCluster.append(g)            
538 |             
539 |             elif check_lenth < init_lenth:
540 |                 break
541 |             else:
542 |                 clust=[i for i in Lgroups[step-1] if i not in Lgroups[step-2]]
543 |                 clust=[i for sublist in clust for i in sublist]
544 |                 clust_to_merge = [s for s in MergeCluster if bool(set(s) & set(clust))]
545 |                 clust_to_merge=[i for sublist in clust_to_merge for i in sublist]
546 |                 clust_add = [s for s in clust if s not in clust_to_merge][0]
547 |             
548 |                 for c in clust_to_merge:
549 |                 
550 |                     pairdis = sim_mat.loc[c,clust_add]
551 |                
552 |                     if abs(pairdis - clustdis_within.loc[c].dis) <= clustdis_within.min().dis*clust_merge or abs(pairdis - clustdis_within.loc[clust_add].dis) <= clustdis_within.min().dis*clust_merge:
553 |                                    
554 |                         MergeCluster.remove(clust_to_merge)
555 |                         clust_to_merge.append(clust_add)
556 |                         MergeCluster.append(clust_to_merge)
557 |                         break
558 |             init_lenth = check_lenth
559 |         
560 |         mergelist = MergeCluster
561 |             
562 |         if len(mergelist) !=0:
563 |             for i in mergelist:
564 |                 frame = [cluster_list[s-1] for s in i ]
565 |                 cluster_list.append(pd.concat(frame))
566 |             
567 |             rmlist = [item for sublist in mergelist for item in sublist]
568 |             cluster_list_new = [v for i, v in enumerate(cluster_list) if i+1 not in rmlist]
569 |             
570 |             sim_mat = pd.DataFrame(0.0,index=np.arange(1,len(cluster_list_new)+1),columns=np.arange(1,len(cluster_list_new)+1))  # float dtype: cells receive mean distances
571 |             for i in sim_mat.index:
572 |                 ci = cluster_list_new[i-1]
573 |                 for j in sim_mat.index:                
574 |                     cj = cluster_list_new[j-1]
575 |                     
576 |                     if metric_dis == 1:
577 |                         sim_mat.loc[i,j] = np.mean(distance.cdist(ci,cj,metric='correlation'))
578 |                     elif metric_dis ==2:
579 |                         sim_mat.loc[i,j] = np.mean(distance.cdist(ci,cj,metric='euclidean'))
580 |                         
581 |             cluster_list = cluster_list_new
582 |         
583 |         if len(sim_mat) ==1:
584 |             cluster_list = cluster_list_oroginal
585 |             linkage_matrix = linkage(sim_mat_original,method="ward")
586 |             sim_mat=sim_mat_original
587 |             
588 |         else:
589 |             linkage_matrix = linkage(sim_mat,method="ward")
590 |        
591 |         dn1=dendrogram(linkage_matrix,distance_sort='descending',leaf_font_size=24,labels=sim_mat.index,color_threshold=0.01,no_plot=True)
592 |     
593 |         dfheat_order=[]
594 |         for i in dn1['leaves']:
595 |             dfheat_order.append(cluster_list[i])
596 |     
597 |         membership = pd.DataFrame({'L1Cluster':0},index=list(range(len(expression))))   
598 |         for i in range(len(dfheat_order)):
599 |             membership.loc[dfheat_order[i].index,'L1Cluster'] = dn1['leaves'][i]+1
600 |         
601 |         self.L1clust_dendro=dn1['leaves']
602 |         
603 |         membership=membership.astype(int)
604 |         membership.loc[:,'Cell_ID'] = self.cell_id
605 |         self.cell_membership = membership
606 |         dn1_linkage_matrix_dn1=linkage_matrix
607 |             
608 |         colormaps = sns.color_palette("hls", len(np.unique(self.cell_membership.L1Cluster)))
609 |         cellgroup=[]
610 |         heatcolor = []
611 |         cluster_color=[]
612 |         for i in self.L1clust_dendro:
613 |             cellgroup.append(self.cell_membership[self.cell_membership.L1Cluster == (i+1)].index)
614 |             cluster_color.append([(i+1),colormaps[i]])
615 |             for j in range(len(self.cell_membership[self.cell_membership.L1Cluster == (i+1)].index)):    
616 |                     heatcolor.append(colormaps[i])
617 |         
618 |         self.L1cell_color = heatcolor  
619 |         self.L1cell_dendro_order = [item for sublist in cellgroup for item in sublist]
620 |         self.L1cluster_color=cluster_color
621 |        
622 |         if fclust_height != 0.2:
623 |             fclust_height = fclust_height;
624 |         
625 |         assignments = fcluster(dn1_linkage_matrix_dn1,fclust_height,'distance')
626 |         Final_cluster = pd.DataFrame({'cluster':assignments})
627 |         Final_cluster.index=sim_mat.index
628 |         Final_cluster = Final_cluster.iloc[self.L1clust_dendro]
629 |         
630 |         for i in Final_cluster.index:
631 |             self.cell_membership.loc[self.cell_membership[self.cell_membership.L1Cluster ==i].index,'L2Cluster'] = Final_cluster.loc[i,'cluster']
632 |         self.cell_membership.loc[:,'L2Cluster'] =self.cell_membership.loc[:,'L2Cluster'].astype(int)
633 |         self.cell_membership=self.cell_membership[['Cell_ID','L1Cluster','L2Cluster']]
634 |         self.cell_membership.to_csv('Cell_Membership.csv')   
635 |         
636 |         print("Output Cell_Membership.csv ")
637 |         
638 |         
639 |         sns.set_style(style="white")
640 |         fig = plt.figure(figsize=(18, 12))
641 |         #plt.figure(figsize=(18,12))
642 |         gs = gridspec.GridSpec(2, 2)
643 |         gs.update(wspace=0.05)
644 |         
645 |         ax1 = plt.subplot(gs[:, 0])
646 |         dendrogram(dn1_linkage_matrix_dn1,distance_sort='descending',leaf_font_size=18,labels=sim_mat.index,color_threshold=0.01,above_threshold_color='grey')
647 |         ax1_1 = plt.gca()
648 |         xlbls = ax1_1.get_xmajorticklabels()
649 |         c=0
650 |         for lbl in xlbls:
651 |             lbl.set_color(self.L1cluster_color[c][1])
652 |             c=c+1
653 |         ax1.spines['top'].set_visible(False)
654 |         ax1.spines['right'].set_visible(False)
655 |         ax1.spines['bottom'].set_visible(False)
656 |         plt.axhline(y=fclust_height, color='grey', linestyle='--',linewidth=1.45)
657 |         
658 |         
659 |     
660 |         expression=self.log_exp
661 |         result=self.cell_membership
662 |         tsnedata= Skl_scale(expression.loc[:,self.vargene])
663 |         tmodel = TSNE(n_components=2,random_state=1,init=init)
664 |         if n_PCs != 15:
665 |             n_PCs=n_PCs;
666 |         tsnedata = RunPCA(tsnedata,n_PCs)[0]
667 |         tcoord = tmodel.fit_transform(tsnedata)
668 |         self.tsne2d = tcoord
669 |         
670 |         sns.set_style(style="white")
671 |         ax2 = plt.subplot(gs[0, 1])
672 |         cluster_colors = [i[1] for i in self.L1cluster_color]
673 |         j=0
674 |         
675 |         for i in [i[0] for i in self.L1cluster_color]:
676 |             
677 |             ax2.scatter(tcoord[result[result.L1Cluster ==i].index,0],tcoord[result[result.L1Cluster==i].index,1],color=cluster_colors[j],s=50,label=i)
678 |             j=j+1
679 |         
680 |         if len(np.unique(result.L1Cluster))>15:
681 |             ncol=2
682 |         else:
683 |             ncol=1
684 |             
685 |         plt.legend(prop={'size':11}, bbox_to_anchor=(0.95,1.05),ncol=ncol,loc='upper left',frameon=False)
686 |         plt.grid()
687 |         plt.title('L1 Cluster',fontsize=16)
688 |         plt.xticks([])
689 |         plt.yticks([])
690 |         ax2.spines['top'].set_visible(False)
691 |         ax2.spines['right'].set_visible(False)
692 |         
693 |         sns.set_style(style="white")
694 |         ax3 = plt.subplot(gs[1, 1])
695 |         cluster_colors = sns.color_palette("hls", len(np.unique(result.L2Cluster)))
696 |         # Unique L2 cluster ids in dendrogram order (first occurrence wins).
697 |         l2order = list(dict.fromkeys(Final_cluster.cluster))
698 |         j=0
699 |         
700 |         heatcolor2 = []
701 |         cluster_color2=[]
702 |         cellgroup2 = []
703 |         for i in l2order:
704 |             cluster_color2.append([i,cluster_colors[j]])
705 |             clabel =  str(i) +" "+ str([i for i in Final_cluster[Final_cluster.cluster==i].index])
706 |             ax3.scatter(tcoord[result[result.L2Cluster ==i].index,0],tcoord[result[result.L2Cluster==i].index,1],color=cluster_colors[j],s=50,label=clabel)
707 |             heatcolor2.append([cluster_colors[j] for k in range(len(result[result.L2Cluster ==i].index))])
708 |             cellgroup2.append(result[result.L2Cluster ==i].index)
709 |             j=j+1
710 |         
711 |         self.L2cell_color = [item for sublist in  heatcolor2 for item in sublist] 
712 |         self.L2cell_dendro_order = [item for sublist in cellgroup2 for item in sublist]
713 |         self.L2cluster_color=cluster_color2
714 |         
715 |         
716 |         if len(np.unique(result.L2Cluster))>15:
717 |             ncol=2
718 |         else:
719 |             ncol=1
720 |             
721 |         plt.legend(prop={'size':11}, bbox_to_anchor=(0.95,1.05),ncol=ncol,loc='upper left',frameon=False)
722 |         plt.grid()
723 |         plt.title('L2 Cluster (---)',fontsize=16, color ='grey')
724 |         plt.xticks([])
725 |         plt.yticks([])
726 |         ax3.spines['top'].set_visible(False)
727 |         ax3.spines['right'].set_visible(False)
728 |         plt.savefig('PanoView_output.png',dpi=fig.dpi)
729 |         print("Output PanoView_output.png")
730 | 
731 |     
732 |     
733 |     
734 |     def VisCluster(self,clevel,cnumber):     
735 |         result=self.cell_membership
736 |         tcoord=self.tsne2d
737 |         
738 |         clustnumber1 = np.unique(result.L1Cluster)
739 |         clustnumber2 = np.unique(result.L2Cluster)
740 |         
741 |         
742 |         sns.set_style(style="white")
743 |         fig = plt.figure(figsize=(10, 10))
744 |         
745 |         if clevel == 1:
746 |             
747 |             if cnumber not in clustnumber1:
748 |                 print('cnumber not found, please input a different cnumber')
749 |                 
750 |             else:    
751 |                 for i in clustnumber1:
752 |                     if i == cnumber:
753 |                         plt.scatter(tcoord[result[result.L1Cluster ==i].index,0],tcoord[result[result.L1Cluster==i].index,1],color='b',s=50,label=i)
754 |                     else:
755 |                         plt.scatter(tcoord[result[result.L1Cluster ==i].index,0],tcoord[result[result.L1Cluster==i].index,1],color='gray',s=50,label=i)
756 |             
757 |                 plt.legend(loc='upper left', prop={'size':16}, bbox_to_anchor=(0.99,1),ncol=1,frameon=False)
758 |         
759 |         elif clevel == 2:
760 |             
761 |             if cnumber not in clustnumber2:
762 |                 print('cnumber not found, please input a different cnumber')
763 |                 
764 |             else:
765 |                 for i in clustnumber2:
766 |                     if i == cnumber:
767 |                         plt.scatter(tcoord[result[result.L2Cluster ==i].index,0],tcoord[result[result.L2Cluster==i].index,1],color='b',s=50,label=i)
768 |                     else:
769 |                         plt.scatter(tcoord[result[result.L2Cluster ==i].index,0],tcoord[result[result.L2Cluster==i].index,1],color='gray',s=50,label=i)
770 |             
771 |                 plt.legend(loc='upper left', prop={'size':16}, bbox_to_anchor=(0.99,1),ncol=1,frameon=False)
772 |             
773 |         else:
774 |             print('clevel not found, please input a different clevel')
775 |         
776 |         plt.savefig('cluster_%s.png' % cnumber,dpi=fig.dpi)    
777 |     
778 |     
779 |     def VisClusterAnno(self):
780 |         annotation = pd.DataFrame(self.cell_anno,columns=['anno'])
781 |         cluster_id = np.unique(annotation)
782 |         tcoord=self.tsne2d
783 |         sns.set_style(style="white")
784 |         fig = plt.figure(figsize=(16, 10))
785 |         cluster_colors = sns.color_palette("hls", len(cluster_id))
786 |         j=0
787 |         for i in cluster_id:
788 |             plt.scatter(tcoord[annotation[annotation.anno ==i].index,0],tcoord[annotation[annotation.anno==i].index,1],color=cluster_colors[j],s=50,label=i)
789 |             j=j+1
790 |         
791 |         plt.legend(prop={'size':14}, bbox_to_anchor=(0.99,1),loc='upper left',frameon=False)
792 |         plt.grid()
793 |         plt.xticks([])
794 |         plt.yticks([])
795 |         plt.savefig('Clusters_Annotation',dpi=fig.dpi)
796 |  
797 |        
798 |     def VisGeneExp(self,genes):
799 |         markers=genes
800 |         markerdata = self.log_exp
801 |         markerdata = normalize(markerdata,norm='max') ## normalization
802 |         markerdata=pd.DataFrame(markerdata,index=self.log_exp.index ,columns=self.log_exp.columns)
803 |         
804 |         for i in range(len(markers)):
805 |             sns.set_style(style="white")
806 |             fig = plt.figure(figsize=(10, 10))
807 |             plt.suptitle(markers[i],fontsize=36)
808 |             plt.scatter(self.tsne2d[:,0],self.tsne2d[:,1],c=markerdata.loc[:,markers[i]],s=50,cmap='BuPu',edgecolor='gray',alpha=0.3)
809 |             plt.savefig('%s.png' % markers[i],dpi=fig.dpi)
810 |             
811 | 
812 |     def RunVGs(self,clevel):
813 |         pvalue=[]
814 |         logFD=[]
815 |         datafordeg = self.log_exp.loc[:,(self.log_exp!=0).any(axis=0)]
816 |         for i in datafordeg.columns:
817 |             groups=[]
818 |             fdgroups=[]
819 |             if clevel ==1:
820 |                 for j in np.unique(self.cell_membership.L1Cluster):
821 |                     groups.append(datafordeg.loc[self.cell_membership[self.cell_membership.L1Cluster == j].index,i])
822 |                     fdgroups.append(datafordeg.loc[self.cell_membership[self.cell_membership.L1Cluster == j].index,i].mean())
823 |                 if max(fdgroups)-min(fdgroups) >=1:
824 |                     pvalue.append([i,stats.kruskal(*groups)[1]])
825 |                     logFD.append(max(fdgroups)-min(fdgroups))
826 |             elif clevel ==2:
827 |                 for j in np.unique(self.cell_membership.L2Cluster):
828 |                     groups.append(datafordeg.loc[self.cell_membership[self.cell_membership.L2Cluster == j].index,i])
829 |                     fdgroups.append(datafordeg.loc[self.cell_membership[self.cell_membership.L2Cluster == j].index,i].mean())
830 |                 if max(fdgroups)-min(fdgroups) >=1:
831 |                     pvalue.append([i,stats.kruskal(*groups)[1]])
832 |                     logFD.append(max(fdgroups)-min(fdgroups))
833 |            
834 |         DEGstat=pd.DataFrame(pvalue,columns=['gene','pvalue']).fillna(1)
835 |         p_value_adj=multipletests(DEGstat.iloc[:,1],alpha=0.05,method='fdr_bh')
836 |         DEGstat.loc[:,'Padj'] = p_value_adj[1]
837 |         DEGstat.loc[:,'log2FD'] = logFD
838 |         self.vg_stat = DEGstat
839 |         
840 |     
841 |     def HeatMapVGs(self,pval,number,fd,clevel,genes_add=None):
842 |     
843 |         if clevel ==1:
844 |             cellcolor = self.L1cell_color
845 |             cellorder = self.L1cell_dendro_order
846 |         elif clevel ==2:
847 |             cellcolor = self.L2cell_color
848 |             cellorder = self.L2cell_dendro_order
849 |            
850 |         topgene = self.vg_stat.query("Padj <@pval & log2FD >@fd").sort_values(by='Padj')[:number].gene.tolist()
851 |         if genes_add is not None:
852 |             topgene = topgene + genes_add
853 |         topgene = list(set(topgene))    
854 |         df = self.log_exp.loc[cellorder,topgene]
855 |         
856 |         linkage_matrix = linkage(df.T,method='complete')
857 |         dn=dendrogram(linkage_matrix,show_leaf_counts=True,orientation='left',no_plot=True)
858 |         df = df.iloc[:,dn['leaves']]
859 |         
860 |         sns.set_style(style="white")
861 |         FigHeat = plt.figure(figsize=(12, 12))                  
862 |           
863 |         ax = FigHeat.add_subplot(111)
864 |         cax = ax.matshow(df,aspect='auto',cmap='BuPu')
865 |         cbr=plt.colorbar(cax,fraction=0.02, pad=0.05)
866 |         cbr.ax.set_title(r'$\log2$',fontsize=10)
867 |         cbr.outline.set_visible(False)
868 |         
869 |         ax.set_xticks(range(len(topgene)))
870 |         ax.set_xticklabels(df.columns,fontsize=12,rotation=90)
871 |         ax.set_yticks([])
872 |         ax.spines['top'].set_visible(False)
873 |         ax.spines['right'].set_visible(False)
874 |         ax.spines['bottom'].set_visible(False)
875 |         ax.spines['left'].set_visible(False)
876 |         
877 |         
878 |         axbar = FigHeat.add_axes([0.1, 0.11, 0.02, 0.770])   
879 |         cmap1 = mpl.colors.ListedColormap(cellcolor[::-1])
880 |         cbar=mpl.colorbar.ColorbarBase(axbar,cmap=cmap1, orientation='vertical',ticks=[])
881 |         cbar.outline.set_visible(False)
882 |         plt.savefig('HeatmapVGs',dpi=FigHeat.dpi)
883 |         
884 |         
885 | 
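
In `OutputResult` above, two clusters are merged when their mean between-cluster distance is close to either cluster's mean within-cluster distance, with a tolerance scaled by the smallest within-cluster distance. The comparison can be sketched on its own as follows. This is a minimal illustration that assumes plain floats in place of the `sim_mat` and `clustdis_within` lookups; the function name `should_merge` and the numbers are hypothetical and not part of PanoView.

```python
def should_merge(pair_dis, within_a, within_b, min_within, clust_merge=0.2):
    """Return True when two clusters are close enough to be merged.

    pair_dis    : mean distance between cells of cluster a and cluster b
    within_a/b  : mean distance among cells inside cluster a / cluster b
    min_within  : smallest within-cluster distance over all clusters
    clust_merge : tolerance factor (0.2 is the default in OutputResult)
    """
    tolerance = min_within * clust_merge
    return (abs(pair_dis - within_a) <= tolerance
            or abs(pair_dis - within_b) <= tolerance)


# Hypothetical distances, for illustration only.
close_pair = should_merge(pair_dis=0.52, within_a=0.50, within_b=0.70, min_within=0.40)
far_pair = should_merge(pair_dis=0.95, within_a=0.50, within_b=0.70, min_within=0.40)
print(close_pair, far_pair)
```

A larger `clust_merge` widens the tolerance, so more cluster pairs pass the test and are merged.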

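
`RunVGs` above passes the Kruskal-Wallis p-values to `multipletests(..., method='fdr_bh')`, which applies the Benjamini-Hochberg correction. The plain-Python sketch below shows the standard form of that procedure so the `Padj` column is easier to interpret. It is an illustration only, not the statsmodels implementation; the function name `bh_adjust` and the input p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
padj = bh_adjust([0.01, 0.04, 0.03, 0.20])
print(padj)
```

Genes whose adjusted value falls below the `pval` threshold given to `HeatMapVGs` are the ones that survive its `Padj` filter.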

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Single-cell Panoramic View Clustering (PanoView) #
 2 | 
 3 | **PanoView** is an iterative PCA-based method that integrates with a novel density-based clustering, ordering local maximum by convex hull (OLMC) algorithm, to identify cell subpopulations for single-cell RNA-sequencing. For details of the method, please see our paper at PLOS Computational Biology (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007040).
 4 | 
 5 | 
 6 | 
 7 | 
 8 | ![PanoView](https://github.com/mhu10/scPanoView/blob/master/PanoView.jpg)
 9 | <p align="center">
10 |   :heavy_plus_sign:
11 | </p>
12 | <p align="center">
13 |   <img width="350" height="350" src="https://github.com/mhu10/scPanoView/blob/master/OLMC_animation.gif">
14 | </p>
15 | 
16 | 
17 | ## Installation ##
18 | **PanoView** is a Python module that uses common Python libraries such as *numpy*, *scipy*, *pandas*, and *scikit-learn* to implement the proposed algorithm. Before installing **PanoView** from the GitHub repository, please make sure that Git is installed, or go to https://git-scm.com/ to install it.
19 | To install **PanoView** on your local computer, open a command prompt and type the following:
20 | 
21 | ```
22 | pip install git+https://github.com/mhu10/scPanoView.git#egg=scPanoView
23 | ```
24 | 
25 | This installs all the Python libraries required to run **PanoView**. To test the installation, open the Python interpreter or your preferred IDE (*Spyder*, *PyCharm*, *Jupyter*, etc.) and type the following:
26 | 
27 | ```
28 | from PanoramicView import scPanoView
29 | ```
30 | The import should complete without any error message.
31 | 
32 | Note: PanoView was implemented and tested with Python 3.6.
33 | 
34 | 
35 | 
36 | ## Tutorial ##
37 | 
38 | Please refer to the manual (*"PanoViewManual.pdf"*) for details on running the **PanoView** algorithm in Python.
39 | 
40 | To run the tutorial in the manual, please download the example dataset (*"ExamplePollen.zip"*) and unzip it into your Python working directory.
41 | 
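
If the test import described in the Installation section fails, it can help to see which module is missing. The sketch below checks whether each dependency can be located without importing it. It assumes the import names listed in `REQUIRED`, which are derived from `install_requires` in `setup.py` (`sklearn` is the import name of scikit-learn); the helper `missing_modules` is hypothetical and not part of PanoView.

```python
import importlib.util

# Import names for the packages listed in setup.py's install_requires.
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib", "seaborn", "sklearn", "statsmodels"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the dependencies plus the PanoView package itself.
missing = missing_modules(REQUIRED + ["PanoramicView"])
print("Missing modules:", missing if missing else "none")
```

An empty result means every module was found; any name that is printed can usually be installed with `pip`.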


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup, find_packages
 2 | 
 3 | setup(
 4 |     name='scPanoView',
 5 |     version='0.1',
 6 |     packages=find_packages(exclude=['tests*']),
 7 |     license='MIT',
 8 |     description='A single-cell clustering algorithm',
 9 |     long_description=open('README.md').read(),
10 |     install_requires=['numpy>=1.13','pandas>=0.20','scipy>=0.19','matplotlib','seaborn>=0.8','scikit-learn>=0.19','statsmodels>=0.8'],
11 |     url='https://github.com/mhu10/scPanoView',
12 |     author='Ming-Wen Hu & Jiang Qian',
13 |     author_email='mhu10@jhmi.edu,jiang.qian@jhmi.edu'
14 | )
15 | 


--------------------------------------------------------------------------------