├── ItClust_package ├── ItClust.egg-info │ ├── PKG-INFO │ ├── SOURCES.txt │ ├── dependency_links.txt │ ├── requires.txt │ └── top_level.txt ├── ItClust │ ├── DEC.py │ ├── ItClust.py │ ├── SAE.py │ ├── __init__.py │ └── preprocessing.py ├── License ├── README.md ├── build │ └── lib │ │ └── ItClust │ │ ├── DEC.py │ │ ├── ItClust.py │ │ ├── SAE.py │ │ ├── __init__.py │ │ └── preprocessing.py ├── dist │ ├── ItClust-0.0.5-py3-none-any.whl │ ├── ItClust-0.0.5.tar.gz │ ├── ItClust-1.1.0-py3-none-any.whl │ ├── ItClust-1.1.0.tar.gz │ ├── ItClust-1.2.0-py3-none-any.whl │ └── ItClust-1.2.0.tar.gz └── setup.py ├── README.md ├── docs └── asserts │ └── images │ ├── README.md │ └── workflow.jpg └── tutorial ├── data ├── pancreas │ ├── Bh.h5ad │ ├── Bh_smartseq2_results.h5ad │ └── smartseq2.h5ad └── pbmc │ ├── barcodes.tsv │ └── genes.tsv ├── output_15_1.png ├── output_17_0.png ├── tutorial.ipynb └── tutorial.md /ItClust_package/ItClust.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 2.1 2 | Name: ItClust 3 | Version: 1.2.0 4 | Summary: An Iterative Transfer learning algorithm for scRNA-seq Clustering 5 | Home-page: https://github.com/jianhuupenn/ItClust 6 | Author: Jian Hu 7 | Author-email: jianhu@pennmedicine.upenn.edu 8 | License: UNKNOWN 9 | Description: ItClust 10 | 11 | ItClust: Transfer learning improves clustering and cell type classification in single-cell RNA-seq analysis 12 | 13 | ItClust is an Iterative Transfer learning algorithm for scRNA-seq Clustering. It starts from building a training neural network to extract gene-expression signatures from a well-labeled source dataset. This step enables initializing the target network with parameters estimated from the training network. The target network then leverages information in the target dataset to iteratively fine-tune parameters in an unsupervised manner, so that the target-data-specific gene-expression signatures are captured. 
Once fine-tuning is finished, the target network returns clustered cells in the target data. 14 | ItClust has been shown to be a powerful tool for scRNA-seq clustering and cell type classification analysis. It can accurately extract information from source data and apply it to help cluster cells in the target data. It is robust to strong batch effects between source and target data, and is able to separate unseen cell types in the target data. Furthermore, it provides confidence scores that facilitate cell type assignment. With the increasing popularity of scRNA-seq in biomedical research, we expect ItClust will make better use of the vast amount of existing, well-annotated scRNA-seq datasets, and enable researchers to accurately cluster and annotate cells in scRNA-seq data. 15 | 16 | For more information, please visit our GitHub page: https://github.com/jianhuupenn/ItClust 17 | Platform: UNKNOWN 18 | Classifier: Programming Language :: Python :: 3 19 | Classifier: License :: OSI Approved :: MIT License 20 | Classifier: Operating System :: OS Independent 21 | Requires-Python: >=3.6 22 | Description-Content-Type: text/markdown 23 | -------------------------------------------------------------------------------- /ItClust_package/ItClust.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | README.md 2 | setup.py 3 | ItClust/DEC.py 4 | ItClust/ItClust.py 5 | ItClust/SAE.py 6 | ItClust/__init__.py 7 | ItClust/preprocessing.py 8 | ItClust.egg-info/PKG-INFO 9 | ItClust.egg-info/SOURCES.txt 10 | ItClust.egg-info/dependency_links.txt 11 | ItClust.egg-info/requires.txt 12 | ItClust.egg-info/top_level.txt -------------------------------------------------------------------------------- /ItClust_package/ItClust.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | --------------------------------------------------------------------------------
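The iterative unsupervised fine-tuning summarized in the PKG-INFO description above follows the DEC formulation used in `ItClust/DEC.py`: each embedded cell receives a soft cluster assignment from a Student's t kernel, and training pushes the assignments toward a sharpened auxiliary target distribution. A minimal NumPy sketch of those two formulas (illustrative only; the function names here are not part of the package's API):

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment:
    q_ij ~ (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha+1)/2), normalized over j."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary distribution p: square q, normalize per cluster, then per cell."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))    # 6 embedded cells in a 2-D latent space
mu = rng.normal(size=(3, 2))   # 3 cluster centroids
q = soft_assign(z, mu)         # soft labels, rows sum to 1
p = target_distribution(q)     # sharpened self-training target
```

Each fine-tuning iteration fits the network to `p` with KL-divergence loss and stops once the fraction of cells changing hard labels drops below a tolerance.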
/ItClust_package/ItClust.egg-info/requires.txt: -------------------------------------------------------------------------------- 1 | keras 2 | pandas 3 | numpy 4 | scipy 5 | scanpy 6 | anndata 7 | natsort 8 | sklearn 9 | -------------------------------------------------------------------------------- /ItClust_package/ItClust.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | ItClust 2 | -------------------------------------------------------------------------------- /ItClust_package/ItClust/DEC.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os 3 | 4 | from . SAE import SAE # load the stacked autoencoder 5 | from . preprocessing import change_to_continuous 6 | from time import time 7 | import numpy as np 8 | from keras.engine.topology import Layer, InputSpec 9 | from keras.callbacks import TensorBoard, ModelCheckpoint, EarlyStopping, ReduceLROnPlateau,History 10 | from keras.layers import Dense, Input 11 | from keras.models import Model 12 | from keras.optimizers import SGD 13 | from keras import callbacks 14 | from keras.initializers import VarianceScaling 15 | from sklearn.cluster import KMeans 16 | import scanpy.api as sc 17 | import pandas as pd 18 | from sklearn.metrics import normalized_mutual_info_score,adjusted_rand_score 19 | import keras.backend as K 20 | 21 | class ClusteringLayer(Layer): # Re-defines several built-in Keras Layer methods 22 | """ 23 | Clustering layer converts an input sample (feature) to a soft label, i.e. a vector that represents the probability of the 24 | sample belonging to each cluster. The probability is calculated with Student's t-distribution. 25 | 26 | # Example 27 | ``` 28 | model.add(ClusteringLayer(n_clusters=10)) 29 | ``` 30 | # Arguments 31 | n_clusters: number of clusters. 32 | weights: list containing a Numpy array with shape `(n_clusters, n_features)` which represents the initial cluster centers.
33 | alpha: parameter in Student's t-distribution. Defaults to 1.0. 34 | # Input shape 35 | 2D tensor with shape: `(n_samples, n_features)`. 36 | # Output shape 37 | 2D tensor with shape: `(n_samples, n_clusters)`. 38 | """ 39 | 40 | def __init__(self, n_clusters, weights=None, alpha=1.0, **kwargs): 41 | if 'input_shape' not in kwargs and 'input_dim' in kwargs: 42 | kwargs['input_shape'] = (kwargs.pop('input_dim'),) 43 | super(ClusteringLayer, self).__init__(**kwargs) 44 | self.n_clusters = n_clusters 45 | self.alpha = alpha 46 | self.initial_weights = weights 47 | self.input_spec = InputSpec(ndim=2) 48 | 49 | def build(self, input_shape): 50 | assert len(input_shape) == 2 51 | input_dim = input_shape[1] 52 | self.input_spec = InputSpec(dtype=K.floatx(), shape=(None, input_dim)) 53 | self.clusters = self.add_weight((self.n_clusters, input_dim), initializer='glorot_uniform', name='clustering') 54 | if self.initial_weights is not None: 55 | self.set_weights(self.initial_weights) 56 | del self.initial_weights 57 | self.built = True 58 | 59 | def call(self, inputs, **kwargs): # The activation function for the clustering layer 60 | """ Student's t-distribution, the same kernel used in the t-SNE algorithm. 61 | q_ij = (1 + dist(x_i, u_j)^2 / alpha)^(-(alpha+1)/2), then normalized over j. 62 | Arguments: 63 | inputs: the variable containing data, shape=(n_samples, n_features) 64 | Return: 65 | q: Student's t-distribution, or soft labels for each sample.
shape=(n_samples, n_clusters) 66 | """ 67 | q = 1.0 / (1.0 + (K.sum(K.square(K.expand_dims(inputs, axis=1) - self.clusters), axis=2) / self.alpha)) 68 | q **= (self.alpha + 1.0) / 2.0 69 | q = K.transpose(K.transpose(q) / K.sum(q, axis=1)) 70 | return q 71 | 72 | def compute_output_shape(self, input_shape): 73 | assert input_shape and len(input_shape) == 2 74 | return input_shape[0], self.n_clusters 75 | 76 | def get_config(self): 77 | config = {'n_clusters': self.n_clusters} 78 | base_config = super(ClusteringLayer, self).get_config() 79 | return dict(list(base_config.items()) + list(config.items())) 80 | 81 | class DEC(object): 82 | def __init__(self, 83 | dims, 84 | x, # input matrix, row sample, col predictors 85 | y=None, # if provided will trained with supervised 86 | alpha=1.0, 87 | init='glorot_uniform', #initialization method 88 | n_clusters=None, # Number of Clusters, if provided, the clusters center will be initialized by K-means, 89 | louvain_resolution=1.0, # resolution for louvain 90 | n_neighbors=10, # the 91 | pretrain_epochs=200, # epoch for autoencoder 92 | ae_weights=None, #ae_ 93 | actinlayer1="tanh",# activation for the last layer in encoder, and first layer in the decoder 94 | is_stacked=True, 95 | transfer_feature=None, 96 | model_weights=None, 97 | y_trans=None, 98 | softmax=False, 99 | ): 100 | 101 | super(DEC, self).__init__() 102 | self.dims = dims 103 | self.x=x #feature n*p, n:number of cells, p: number of genes 104 | self.y=y # for supervised 105 | self.y_trans=y_trans 106 | self.input_dim = dims[0] 107 | self.n_stacks = len(self.dims) - 1 108 | self.is_stacked=is_stacked 109 | self.resolution=louvain_resolution 110 | self.alpha = alpha 111 | self.actinlayer1=actinlayer1 112 | self.transfer_feature=transfer_feature 113 | self.model_weights=model_weights 114 | self.softmax=softmax 115 | self.pretrain_epochs=pretrain_epochs 116 | if self.transfer_feature is None: 117 | 
self.pretrain(n_neighbors=n_neighbors,epochs=self.pretrain_epochs,n_clusters=n_clusters) 118 | else: 119 | self.pretrain_transfer(n_neighbors=n_neighbors,model_weights=self.model_weights,features=transfer_feature,epochs=self.pretrain_epochs,n_clusters=n_clusters,y_trans=self.y_trans) 120 | 121 | def pretrain(self, optimizer='adam', epochs=200, n_neighbors=10,batch_size=256,n_clusters=None): 122 | print("Doing DEC: pretrain") 123 | sae=SAE(dims=self.dims,drop_rate=0.2,batch_size=batch_size,actinlayer1=self.actinlayer1)# batch_size 124 | print('...Pretraining...') 125 | # begin pretraining 126 | t0 = time() 127 | if self.is_stacked: 128 | sae.fit(self.x,epochs=epochs) 129 | else: 130 | sae.fit2(self.x,epochs=epochs) 131 | 132 | self.autoencoder=sae.autoencoders 133 | self.encoder=sae.encoder 134 | print('Pretraining time: ', time() - t0) 135 | self.pretrained = True 136 | 137 | #build dec model and initialize model 138 | features=self.extract_features(self.x) 139 | features=np.asarray(features) 140 | if self.y is None: # Training data not labeled 141 | if isinstance(n_clusters,int): # Number of clusters known, use k-means 142 | print("...number of clusters has been specified, initializing cluster centroids using K-Means") 143 | kmeans = KMeans(n_clusters=n_clusters, n_init=20) 144 | Y_pred_init = kmeans.fit_predict(features) 145 | self.init_pred= np.copy(Y_pred_init) 146 | self.n_clusters=n_clusters 147 | cluster_centers=kmeans.cluster_centers_ 148 | self.init_centroid=cluster_centers 149 | else: # Number of clusters unknown, use unsupervised method 150 | print("...number of clusters is not known, initializing cluster centroids using Louvain") 151 | adata=sc.AnnData(features) 152 | sc.pp.neighbors(adata, n_neighbors=n_neighbors) 153 | sc.tl.louvain(adata,resolution=self.resolution) 154 | Y_pred_init=adata.obs['louvain'] 155 | self.init_pred=np.asarray(Y_pred_init,dtype=int) 156 | features=pd.DataFrame(features,index=np.arange(0,features.shape[0])) 157 |
Group=pd.Series(self.init_pred,index=np.arange(0,features.shape[0]),name="Group") 158 | Mergefeature=pd.concat([features,Group],axis=1) 159 | cluster_centers=np.asarray(Mergefeature.groupby("Group").mean()) 160 | self.n_clusters=cluster_centers.shape[0] 161 | self.init_centroid=cluster_centers 162 | print("The shape of cluster_centers",cluster_centers.shape) 163 | else: # training data is labeled 164 | print("y known, initializing cluster centroids using y") 165 | # build dec model 166 | features=pd.DataFrame(features,index=np.arange(0,features.shape[0])) 167 | Group=pd.Series(self.y.values,index=np.arange(0,features.shape[0]),name="Group") 168 | Mergefeature=pd.concat([features,Group],axis=1) 169 | cluster_centers=np.asarray(Mergefeature.groupby("Group").mean()) 170 | self.n_clusters=cluster_centers.shape[0] 171 | self.init_centroid=cluster_centers 172 | print("The shape of cluster_centers is",cluster_centers.shape) 173 | if not self.softmax: # Use dec method to do clustering 174 | clustering_layer = ClusteringLayer(self.n_clusters, name='clustering')(self.encoder.output) 175 | else: # Use softmax to do clustering 176 | clustering_layer=Dense(self.n_clusters,kernel_initializer="glorot_uniform",name="clustering",activation='softmax')(self.encoder.output) 177 | self.model = Model(inputs=self.encoder.input, outputs=clustering_layer) 178 | 179 | 180 | 181 | def pretrain_transfer(self,features,model_weights,y_trans=None,optimizer="adam",n_neighbors=10,epochs=200,batch_size=32,n_clusters=None): 182 | #y_trans is a numpy array 183 | print("Doing DEC: pretrain_transfer") 184 | if isinstance(n_clusters,int): 185 | print("...number of clusters has been specified, initializing cluster centroids using K-Means") 186 | kmeans = KMeans(n_clusters=n_clusters, n_init=20) 187 | Y_pred_init = kmeans.fit_predict(features) 188 | self.init_pred= np.copy(Y_pred_init) 189 | self.n_clusters=n_clusters 190 | cluster_centers=kmeans.cluster_centers_ 191 | self.init_centroid=cluster_centers # store the array itself; it is wrapped in a list when passed to set_weights() 192 |
else: 193 | print("The shape of features is",features.shape) 194 | if y_trans is not None and y_trans.shape[0]==features.shape[0]: 195 | print("The shape of y_trans is",y_trans.shape) 196 | print("...predicted y_test known, use it to get n_clusters and init_centroid") 197 | self.init_pred=y_trans 198 | features=pd.DataFrame(features,index=np.arange(0,features.shape[0])) 199 | Group=pd.Series(y_trans,index=np.arange(0,features.shape[0]),name="Group") 200 | Mergefeature=pd.concat([features,Group],axis=1) 201 | cluster_centers=np.asarray(Mergefeature.groupby("Group").mean()) 202 | self.n_clusters=cluster_centers.shape[0] 203 | self.init_centroid=cluster_centers 204 | else: 205 | print("...number of clusters is not known, initializing cluster centroids using Louvain") 206 | #can be replaced by other clustering methods 207 | adata=sc.AnnData(features) 208 | sc.pp.neighbors(adata, n_neighbors=n_neighbors) #louvain step1 209 | sc.tl.louvain(adata,resolution=self.resolution) #louvain step2 210 | Y_pred_init=adata.obs['louvain'] 211 | self.init_pred=np.asarray(Y_pred_init,dtype=int) 212 | features=pd.DataFrame(features,index=np.arange(0,features.shape[0])) 213 | Group=pd.Series(self.init_pred,index=np.arange(0,features.shape[0]),name="Group") 214 | Mergefeature=pd.concat([features,Group],axis=1) 215 | cluster_centers=np.asarray(Mergefeature.groupby("Group").mean()) 216 | self.n_clusters=cluster_centers.shape[0] 217 | self.init_centroid=cluster_centers 218 | print("The shape of cluster_centers",cluster_centers.shape) 219 | 220 | sae=SAE(dims=self.dims,drop_rate=0.2,batch_size=batch_size,actinlayer1=self.actinlayer1)# batch_size 221 | self.autoencoder=sae.autoencoders 222 | self.encoder=sae.encoder 223 | clustering_layer=ClusteringLayer(self.n_clusters, name='clustering')(self.encoder.output) # use dec to do clustering 224 | self.model=Model(self.encoder.input,outputs=clustering_layer) 225 | print("The number of layers in self.model is",len(self.model.layers)) 226 | for i in
range(len(self.model.layers)-2): 227 | self.model.layers[i+1].set_weights(model_weights[i+1]) 228 | self.model.get_layer("clustering").set_weights([self.init_centroid]) 229 | #fine tunning 230 | 231 | def load_weights(self, weights): # load weights of DEC model 232 | self.model.load_weights(weights) 233 | 234 | def extract_features(self, x): 235 | return self.encoder.predict(x) 236 | 237 | def predict(self, x): # predict cluster labels using the output of clustering layer 238 | q = self.model.predict(x, verbose=0) 239 | return q.argmax(1) 240 | 241 | @staticmethod 242 | def target_distribution(q): 243 | weight = q ** 2 / q.sum(0) 244 | return (weight.T / weight.sum(1)).T 245 | 246 | def compile(self, optimizer='sgd', loss='kld'): 247 | self.model.compile(optimizer=optimizer, loss=loss) 248 | 249 | def fit(self,x, maxiter=2e3, epochs_fit=10,batch_size=256, tol=1e-3): # unsupervised 250 | print("Doing DEC: fit") 251 | #step1 initial weights by louvain,or Kmeans 252 | self.model.get_layer(name='clustering').set_weights([self.init_centroid]) 253 | y_pred_last = np.copy(self.init_pred) 254 | # Step 2: deep clustering 255 | # logging file 256 | #y_pred_last=self.init_pred 257 | loss = 0 258 | index = 0 259 | index_array = np.arange(x.shape[0]) 260 | for ite in range(int(maxiter)): 261 | q = self.model.predict(x, verbose=0) 262 | p = self.target_distribution(q) # update the auxiliary target distribution p 263 | # evaluate the clustering performance 264 | y_pred = q.argmax(1) 265 | 266 | # check stop criterion 267 | delta_label = np.sum(y_pred != y_pred_last).astype(np.float32) / y_pred.shape[0] 268 | y_pred_last = np.copy(y_pred) 269 | if ite > 0 and delta_label < tol: 270 | print('delta_label ', delta_label, '< tol ', tol) 271 | print('Reached tolerance threshold. 
Stopped training.') 272 | break 273 | print("The value of delta_label of current",str(ite+1),"th iteration is",delta_label,">= tol",tol) 274 | #train on whole dataset on prespecified batch_size 275 | callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=0,mode='auto')] 276 | self.model.fit(x=x,y=p,epochs=epochs_fit,batch_size=batch_size,callbacks=callbacks,shuffle=True,verbose=False) 277 | 278 | y0=pd.Series(y_pred) 279 | print("The final prediction cluster is:") 280 | print(y0.value_counts()) 281 | Embeded_z=self.encoder.predict(x) 282 | return Embeded_z,q 283 | 284 | #Show the trajectory of the centroid during iterations 285 | def fit_trajectory(self,x, maxiter=2e3, epochs_fit=10,batch_size=256, tol=1e-3): # unsupervised 286 | print("Doing DEC: fit_trajectory") 287 | #step1 initial weights by louvain,or Kmeans 288 | self.model.get_layer(name='clustering').set_weights([self.init_centroid]) 289 | y_pred_last = np.copy(self.init_pred) 290 | # Step 2: deep clustering 291 | # logging file 292 | #y_pred_last=self.init_pred 293 | loss = 0 294 | index = 0 295 | index_array = np.arange(x.shape[0]) 296 | trajectory_z=[] #trajectory embedding 297 | trajectory_l=[] #trajectory label 298 | for ite in range(int(maxiter)): 299 | q = self.model.predict(x, verbose=0) 300 | p = self.target_distribution(q) # update the auxiliary target distribution p 301 | # evaluate the clustering performance 302 | y_pred = q.argmax(1) 303 | 304 | # check stop criterion 305 | delta_label = np.sum(y_pred != y_pred_last).astype(np.float32) / y_pred.shape[0] 306 | y_pred_last = np.copy(y_pred) 307 | if ite > 0 and delta_label < tol: 308 | print('delta_label ', delta_label, '< tol ', tol) 309 | print('Reached tolerance threshold. 
Stopped training.') 310 | break 311 | print("The value of delta_label of current",str(ite+1),"th iteration is",delta_label,">= tol",tol) 312 | #train on whole dataset on prespecified batch_size 313 | callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=0,mode='auto')] 314 | self.model.fit(x=x,y=p,epochs=epochs_fit,batch_size=batch_size,callbacks=callbacks,shuffle=True,verbose=False) 315 | 316 | 317 | if ite % 10 ==0: 318 | print("This is the iteration of ", ite) 319 | Embeded_z=self.encoder.predict(x) # feature 320 | q_tmp=self.model.predict(x,verbose=0) # predicted clustering results 321 | l_tmp=change_to_continuous(q_tmp) 322 | trajectory_z.append(Embeded_z) 323 | trajectory_l.append(l_tmp) 324 | 325 | y0=pd.Series(y_pred) 326 | print("The final prediction cluster is:") 327 | print(y0.value_counts()) 328 | Embeded_z=self.encoder.predict(x) 329 | return trajectory_z, trajectory_l, Embeded_z, q 330 | #return ret 331 | 332 | 333 | def fit_supervise(self,x,y,epochs=2e3,batch_size=256): 334 | #y is 1-D array, Series, or a list, len(y)==x.shape[0] 335 | print("Doing DEC: fit_supervised") 336 | if self.softmax==False: # Only DEC clustering can set init_centroid 337 | self.model.get_layer(name='clustering').set_weights([self.init_centroid]) 338 | y0=pd.Series(y,dtype="category") #avoding y is string 339 | y0=y0.cat.rename_categories(range(len(y0.cat.categories))) 340 | y_true=pd.get_dummies(pd.Series(y0)).values# coded according to 0,1,...,3 341 | y_true=y_true+0.00005*np.random.random(y_true.shape)+0.00001 # add some disturb 342 | y_true=y_true/y_true.sum(axis=1)[:,None] 343 | callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=0,mode='auto')] 344 | self.model.fit(x=x,y=y_true,epochs=int(epochs),batch_size=batch_size,callbacks=callbacks,shuffle=True,verbose=False,validation_split=0.25) 345 | Embeded_z=self.encoder.predict(x) # feature 346 | q=self.model.predict(x,verbose=0) # predicted clustering results 347 | #return y0, 
representing the mapping reference for y 348 | return Embeded_z,q 349 | -------------------------------------------------------------------------------- /ItClust_package/ItClust/ItClust.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | from time import time 3 | #### 4 | from . DEC import DEC 5 | from . preprocessing import * 6 | #### 7 | from keras.models import Model 8 | import os,csv 9 | from keras.optimizers import SGD 10 | import pandas as pd 11 | import numpy as np 12 | from scipy.sparse import issparse 13 | import scanpy.api as sc 14 | from anndata import AnnData 15 | from natsort import natsorted 16 | from sklearn import cluster, datasets, mixture,metrics 17 | #os.environ["CUDA_VISIBLE_DEVICES"]="1" 18 | 19 | class transfer_learning_clf(object): 20 | ''' 21 | The transfer learning clustering and classification model. 22 | This class has following api: fit(), predict(), Umap(), tSNE() 23 | ''' 24 | def __init__(self): 25 | super(transfer_learning_clf, self).__init__() 26 | 27 | def fit(self, 28 | source_data, #adata 29 | target_data, #adata 30 | normalize=True, 31 | take_log=True, 32 | scale=True, 33 | batch_size=256, 34 | maxiter=1000, 35 | pretrain_epochs=300, 36 | epochs_fit=5, 37 | tol=[0.001], 38 | alpha=[1.0], 39 | resolution=[0.2,0.4,0.8,1.2,1.6], 40 | n_neighbors=20, 41 | softmax=False, 42 | init="glorot_uniform", 43 | save_atr="isy_trans_True" 44 | ): 45 | ''' 46 | Fit the transfer learning model using provided data. 47 | This function includes preprocessing steps. 48 | Input: source_data(anndata format), target_data(anndata format). 
49 | Source and target data can be in any form (UMI counts, TPM, or FPKM) 50 | Return: no return value 51 | ''' 52 | self.batch_size=batch_size 53 | self.maxiter=maxiter 54 | self.pretrain_epochs=pretrain_epochs 55 | self.epochs_fit=epochs_fit 56 | self.tol=tol 57 | self.alpha=alpha 58 | self.source_data=source_data 59 | self.target_data=target_data 60 | self.resolution=resolution 61 | self.n_neighbors=n_neighbors 62 | self.softmax=softmax 63 | self.init=init 64 | self.save_atr=save_atr 65 | dictionary={"alpha":alpha,"tol":tol,"resolution":resolution} 66 | df_expand=expand_grid(dictionary) 67 | # begin fitting 68 | adata_tmp=[] 69 | source_data.var_names_make_unique(join="-") 70 | source_data.obs_names_make_unique(join="-") 71 | 72 | # pre-processing 73 | # 1. pre-filter cells 74 | prefilter_cells(source_data,min_genes=100) 75 | # 2. pre-filter genes 76 | prefilter_genes(source_data,min_cells=10) # avoid genes that are all zeros 77 | # 3. prefilter special genes: MT and ERCC 78 | prefilter_specialgenes(source_data) 79 | # 4. normalization 80 | if normalize: 81 | sc.pp.normalize_per_cell(source_data) 82 | # 5. log-transform and scale 83 | if take_log: 84 | sc.pp.log1p(source_data) 85 | if scale: 86 | sc.pp.scale(source_data,zero_center=True,max_value=6) 87 | source_data.var_names=[i.upper() for i in list(source_data.var_names)]# avoid genes with lowercase letters 88 | adata_tmp.append(source_data) 89 | 90 | #Target data 91 | target_data.var_names_make_unique(join="-") 92 | target_data.obs_names_make_unique(join="-") 93 | # pre-processing 94 | # 1. pre-filter cells 95 | prefilter_cells(target_data,min_genes=100) 96 | # 2. pre-filter genes 97 | prefilter_genes(target_data,min_cells=10) # avoid genes that are all zeros 98 | # 3. prefilter special genes: MT and ERCC 99 | prefilter_specialgenes(target_data) 100 | # 4. normalization 101 | if normalize: 102 | sc.pp.normalize_per_cell(target_data) 103 | 104 | # select top genes 105 | if target_data.X.shape[0]<=1500: 106 | ng=500 107 |
elif 1500 encoder_0->act -> encoder_1 -> decoder_1->act -> decoder_0; 14 | stack_0 model: Input->dropout -> encoder_0->act->dropout -> decoder_0; 15 | stack_1 model: encoder_0->act->dropout -> encoder_1->dropout -> decoder_1->act; 16 | 17 | Usage: 18 | from SAE import SAE 19 | sae = SAE(dims=[784, 500, 10]) # define a SAE with 5 layers 20 | sae.fit(x, epochs=100) 21 | features = sae.extract_feature(x) 22 | 23 | Arguments: 24 | dims: list of number of units in each layer of encoder. dims[0] is input dim, dims[-1] is units in hidden layer. 25 | The decoder is symmetric with encoder. So number of layers of the auto-encoder is 2*len(dims)-1 26 | act: activation (default='relu'), not applied to Input, Hidden and Output layers. 27 | drop_rate: drop ratio of Dropout for constructing denoising autoencoder 'stack_i' during layer-wise pretraining 28 | batch_size: batch size 29 | """ 30 | def __init__(self, dims, act='relu', drop_rate=0.2, batch_size=32,actinlayer1="tanh",init="glorot_uniform"): #act relu 31 | self.dims = dims 32 | self.n_stacks = len(dims) - 1 33 | self.n_layers = 2*self.n_stacks # exclude input layer 34 | self.activation = act 35 | #self.actinlayer1="tanh" #linear 36 | self.actinlayer1=actinlayer1 #linear 37 | self.drop_rate = drop_rate 38 | self.init=init 39 | self.batch_size = batch_size 40 | self.stacks = [self.make_stack(i) for i in range(self.n_stacks)] 41 | self.autoencoders ,self.encoder= self.make_autoencoders() 42 | #plot_model(self.autoencoders, show_shapes=True, to_file='autoencoders.png') 43 | 44 | def make_autoencoders(self): 45 | """ Fully connected autoencoders model, symmetric. 
46 | """ 47 | # input 48 | x = Input(shape=(self.dims[0],), name='input') 49 | h = x 50 | 51 | # internal layers in encoder 52 | for i in range(self.n_stacks-1): 53 | h = Dense(self.dims[i + 1], kernel_initializer=self.init,activation=self.activation, name='encoder_%d' % i)(h) 54 | 55 | # hidden layer,default activation is linear 56 | h = Dense(self.dims[-1],kernel_initializer=self.init, name='encoder_%d' % (self.n_stacks - 1),activation=self.actinlayer1)(h) # features are extracted from here 57 | 58 | y=h 59 | # internal layers in decoder 60 | for i in range(self.n_stacks-1, 0, -1): #2,1 61 | y = Dense(self.dims[i], kernel_initializer=self.init,activation=self.activation, name='decoder_%d' % i)(y) 62 | 63 | # output 64 | y = Dense(self.dims[0], kernel_initializer=self.init,name='decoder_0',activation=self.actinlayer1)(y) 65 | 66 | return Model(inputs=x, outputs=y,name="AE"),Model(inputs=x,outputs=h,name="encoder") 67 | 68 | def make_stack(self, ith): 69 | """ 70 | Make the ith denoising autoencoder for layer-wise pretraining. It has single hidden layer. The input data is 71 | corrupted by Dropout(drop_rate) 72 | 73 | Arguments: 74 | ith: int, in [0, self.n_stacks) 75 | """ 76 | in_out_dim = self.dims[ith] 77 | hidden_dim = self.dims[ith+1] 78 | output_act = self.activation 79 | hidden_act = self.activation 80 | if ith == 0: 81 | output_act = self.actinlayer1# tanh, or linear 82 | if ith == self.n_stacks-1: 83 | hidden_act = self.actinlayer1 #tanh, or linear 84 | model = Sequential() 85 | model.add(Dropout(self.drop_rate, input_shape=(in_out_dim,))) 86 | model.add(Dense(units=hidden_dim, activation=hidden_act, name='encoder_%d' % ith)) 87 | model.add(Dropout(self.drop_rate)) 88 | model.add(Dense(units=in_out_dim, activation=output_act, name='decoder_%d' % ith)) 89 | 90 | #plot_model(model, to_file='stack_%d.png' % ith, show_shapes=True) 91 | return model 92 | 93 | def pretrain_stacks(self, x, epochs=200): 94 | """ 95 | Layer-wise pretraining. 
Each stack is trained for 'epochs' epochs using SGD with learning rate decaying by a factor of 10 96 | every 'epochs/3' epochs. 97 | 98 | Arguments: 99 | x: input data, shape=(n_samples, n_dims) 100 | epochs: epochs for each stack 101 | """ 102 | print("Doing SAE: pretrain_stacks") 103 | features = x 104 | for i in range(self.n_stacks): 105 | print( 'Pretraining the %dth layer...' % (i+1)) 106 | for j in range(3): # learning rate multiplies 0.1 every 'epochs/3' epochs 107 | print ('learning rate =', pow(10, -1-j)) 108 | self.stacks[i].compile(optimizer=SGD(pow(10, -1-j), momentum=0.9), loss='mse') 109 | #callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=1,mode='auto')] 110 | #self.stacks[i].fit(features, features, batch_size=self.batch_size, callbacks=callbacks,epochs=math.ceil(epochs/3)) 111 | self.stacks[i].fit(features, features, batch_size=self.batch_size,epochs=math.ceil(epochs/3),verbose=0) 112 | print ('The %dth layer has been pretrained.' % (i+1)) 113 | 114 | # update features to the inputs of the next layer 115 | feature_model = Model(inputs=self.stacks[i].input, outputs=self.stacks[i].get_layer('encoder_%d'%i).output) 116 | features = feature_model.predict(features) 117 | 118 | def pretrain_autoencoders(self, x, epochs=500): 119 | """ 120 | Fine-tune autoencoders end-to-end after layer-wise pretraining using 'pretrain_stacks()' 121 | Use SGD with learning rate = 0.1, decayed by a factor of 10 every 50 epochs 122 | 123 | :param x: input data, shape=(n_samples, n_dims) 124 | :param epochs: training epochs 125 | :return: 126 | """ 127 | print("Doing SAE: pretrain_autoencoders") 128 | print ('Copying layer-wise pretrained weights to deep autoencoders') 129 | for i in range(self.n_stacks): 130 | name = 'encoder_%d' % i 131 | self.autoencoders.get_layer(name).set_weights(self.stacks[i].get_layer(name).get_weights()) 132 | name = 'decoder_%d' % i 133 | self.autoencoders.get_layer(name).set_weights(self.stacks[i].get_layer(name).get_weights()) 134 | 135 |
print ('Fine-tuning autoencoder end-to-end') 136 | for j in range(math.ceil(epochs/50)): 137 | lr = 0.1*pow(10, -j) 138 | print ('learning rate =', lr) 139 | self.autoencoders.compile(optimizer=SGD(lr, momentum=0.9), loss='mse') 140 | #callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=1,mode='auto')] 141 | #self.autoencoders.fit(x=x, y=x, batch_size=self.batch_size, epochs=50,callbacks=callbacks) 142 | self.autoencoders.fit(x=x, y=x, batch_size=self.batch_size, epochs=50,verbose=0) 143 | 144 | def fit(self, x, epochs=200): 145 | self.pretrain_stacks(x, epochs=epochs/2) 146 | self.pretrain_autoencoders(x, epochs=epochs)# fine-tuning 147 | 148 | def fit2(self,x,epochs=200): # no stacking, train the autoencoder directly 149 | for j in range(math.ceil(epochs/50)): 150 | lr = 0.1*pow(10, -j) 151 | print ('learning rate =', lr) 152 | self.autoencoders.compile(optimizer=SGD(lr, momentum=0.9), loss='mse') 153 | self.autoencoders.fit(x=x, y=x, batch_size=self.batch_size, epochs=50) 154 | 155 | def extract_feature(self, x): 156 | """ 157 | Extract features from the middle layer of autoencoders. 158 | 159 | :param x: data 160 | :return: features 161 | """ 162 | hidden_layer = self.autoencoders.get_layer(name='encoder_%d' % (self.n_stacks - 1)) 163 | feature_model = Model(self.autoencoders.input, hidden_layer.output) 164 | return feature_model.predict(x, batch_size=self.batch_size) 165 | 166 | 167 | if __name__ == "__main__": 168 | """ 169 | An example of how to use the SAE model on the MNIST dataset. In a terminal, run 170 | python SAE.py 171 | to see the result. You may get NMI=0.77. 172 | """ 173 | import numpy as np 174 | 175 | def load_mnist(): 176 | # the data, shuffled and split between train and test sets 177 | from keras.datasets import mnist 178 | (x_train, y_train), (x_test, y_test) = mnist.load_data() 179 | x = np.concatenate((x_train, x_test)) 180 | y = np.concatenate((y_train, y_test)) 181 | x = x.reshape((x.shape[0], -1)) 182 | x = np.divide(x, 50.)
# normalize as it does in DEC paper 183 | print ('MNIST samples', x.shape) 184 | return x, y 185 | 186 | db = 'mnist' 187 | n_clusters = 10 188 | x, y = load_mnist() 189 | 190 | # define and train SAE model 191 | sae = SAE(dims=[x.shape[-1], 500, 500, 2000, 10]) 192 | sae.fit(x=x, epochs=400) 193 | sae.autoencoders.save_weights('weights_%s.h5' % db) 194 | 195 | # extract features 196 | print ('Finished training, extracting features using the trained SAE model') 197 | features = sae.extract_feature(x) 198 | 199 | print ('performing k-means clustering on the extracted features') 200 | from sklearn.cluster import KMeans 201 | km = KMeans(n_clusters, n_init=20) 202 | y_pred = km.fit_predict(features) 203 | 204 | from sklearn.metrics import normalized_mutual_info_score as nmi 205 | print ('K-means clustering result on extracted features: NMI =', nmi(y, y_pred)) 206 | -------------------------------------------------------------------------------- /ItClust_package/ItClust/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = '1.2.0' 2 | from . ItClust import transfer_learning_clf 3 | from . 
preprocessing import read_10X -------------------------------------------------------------------------------- /ItClust_package/ItClust/preprocessing.py: -------------------------------------------------------------------------------- 1 | import scanpy.api as sc 2 | import pandas as pd 3 | import numpy as np 4 | import scipy 5 | import os 6 | from anndata import AnnData,read_csv,read_text,read_mtx 7 | from scipy.sparse import issparse 8 | from natsort import natsorted 9 | from anndata import read_mtx 10 | from anndata.utils import make_index_unique 11 | 12 | 13 | def read_10X(data_path, var_names='gene_symbols'): 14 | adata = read_mtx(data_path + '/matrix.mtx').T 15 | genes = pd.read_csv(data_path + '/genes.tsv', header=None, sep='\t') 16 | adata.var['gene_ids'] = genes[0].values 17 | adata.var['gene_symbols'] = genes[1].values 18 | assert var_names == 'gene_symbols' or var_names == 'gene_ids', \ 19 | 'var_names must be "gene_symbols" or "gene_ids"' 20 | if var_names == 'gene_symbols': 21 | var_names = genes[1] 22 | else: 23 | var_names = genes[0] 24 | if not var_names.is_unique: 25 | var_names = make_index_unique(pd.Index(var_names)).tolist() 26 | print('var_names are not unique, "make_index_unique" has applied') 27 | adata.var_names = var_names 28 | cells = pd.read_csv(data_path + '/barcodes.tsv', header=None, sep='\t') 29 | adata.obs['barcode'] = cells[0].values 30 | adata.obs_names = cells[0] 31 | return adata 32 | 33 | 34 | 35 | def change_to_continuous(q): 36 | #y_trans=q.argmax(axis=1) 37 | y_pred=np.asarray(np.argmax(q,axis=1),dtype=int) 38 | unique_labels=np.unique(q.argmax(axis=1)) 39 | #turn to continuous clusters label,from 0,1,2,3,... 
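The remapping below mirrors what `change_to_continuous` does next: argmax over `q` can leave non-contiguous cluster ids (e.g. {0, 2, 5}), which are renamed to consecutive integers before being wrapped in a pandas Categorical. A self-contained illustration (the function name is ours, not the package's):

```python
import numpy as np

def relabel_contiguous(y_pred):
    # Map whatever cluster ids survive argmax (e.g. 0, 2, 5) onto
    # consecutive integers 0, 1, 2, preserving their sorted order.
    mapping = {old: new for new, old in enumerate(np.unique(y_pred))}
    return np.asarray([mapping[i] for i in y_pred], dtype=int)
```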
40 | test_c={} 41 | for ind, i in enumerate(unique_labels): 42 | test_c[i]=ind 43 | y_pred=np.asarray([test_c[i] for i in y_pred],dtype=int) 44 | ##turn to categories 45 | labels=y_pred.astype('U') 46 | labels=pd.Categorical(values=labels,categories=natsorted(np.unique(y_pred).astype('U'))) 47 | return labels 48 | 49 | def presettings(save_dir="result_scRNA",dpi=200,verbosity=3): 50 | if not os.path.exists(save_dir): 51 | print("Warning:"+str(save_dir)+"does not exists, so we will creat it automatically!!:\n") 52 | os.mkdir(save_dir) 53 | figure_dir=os.path.join(save_dir,"figures") 54 | if not os.path.exists(figure_dir): 55 | os.mkdir(figure_dir) 56 | sc.settings.figdir=figure_dir 57 | sc.settings.verbosity=verbosity 58 | sc.settings.set_figure_params(dpi=dpi) 59 | sc.logging.print_versions() 60 | 61 | def creatadata(datadir=None,exprmatrix=None,expermatrix_filename="matrix.mtx",is_mtx=True,cell_info=None,cell_info_filename="barcodes.tsv",gene_info=None,gene_info_filename="genes.tsv",project_name=None): 62 | """ 63 | Construct a anndata object 64 | 65 | Construct a anndata from data in memory or files on disk. 
If datadir is a directory, it must contain at least "matrix.mtx" or a data.txt (tab-separated, without column or row names), 66 | 67 | """ 68 | if (datadir is None and exprmatrix is None and expermatrix_filename is None): 69 | raise ValueError("Please provide either the expression matrix or the full path to the expression matrix!") 70 | # default cell/gene names are generated after the matrix x is loaded, because their lengths depend on x.shape 71 | 72 | 73 | if datadir is not None: 74 | cell_and_gene_file = [f for f in os.listdir(datadir) if os.path.isfile(os.path.join(datadir, f))] 75 | if (os.path.isdir(datadir) and is_mtx==True): #sparse 76 | print("Start to read expression data (matrix.mtx)") 77 | x=sc.read_mtx(os.path.join(datadir,expermatrix_filename)).X.T 78 | else: #nonsparse 79 | x=pd.read_csv(os.path.join(datadir,expermatrix_filename),sep="\t",header=None) 80 | 81 | #only matrix with row names and colnames 82 | if cell_info_filename in cell_and_gene_file: 83 | cell_info=pd.read_csv(os.path.join(datadir,cell_info_filename),sep="\t",header=0,na_filter=False) 84 | if gene_info_filename in cell_and_gene_file: 85 | gene_info=pd.read_csv(os.path.join(datadir,gene_info_filename),sep="\t",header=0,na_filter=False) 86 | else: 87 | x=exprmatrix # n*p matrix, cell*gene 88 | cell_info=pd.DataFrame(["cell_"+str(i) for i in range(1,x.shape[0]+1)],columns=["cellname"]) if cell_info is None else cell_info 89 | gene_info=pd.DataFrame(["gene_"+str(i) for i in range(1,x.shape[1]+1)],columns=["genename"]) if gene_info is None else gene_info 90 | adata=sc.AnnData(x,obs=cell_info,var=gene_info) 91 | adata.obs_names=adata.obs["cellname"] if "cellname" in adata.obs.keys() else adata.obs.index 92 | adata.var_names=adata.var["genename"] if "genename" in adata.var.keys() else adata.var.index 93 | adata.obs_names_make_unique(join="-") 94 | adata.var_names_make_unique(join="-") 95 | adata.uns["ProjectName"]="DEC_clust_algorithm" if project_name is None else project_name 96 | return adata 97 | def 
prefilter_cells(adata,min_counts=None,max_counts=None,min_genes=200,max_genes=None): 98 | if min_genes is None and min_counts is None and max_genes is None and max_counts is None: 99 | raise ValueError('Provide one of min_counts, min_genes, max_counts or max_genes.') 100 | id_tmp=np.asarray([True]*adata.shape[0],dtype=bool) 101 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,min_genes=min_genes)[0]) if min_genes is not None else id_tmp 102 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,max_genes=max_genes)[0]) if max_genes is not None else id_tmp 103 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,min_counts=min_counts)[0]) if min_counts is not None else id_tmp 104 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,max_counts=max_counts)[0]) if max_counts is not None else id_tmp 105 | adata._inplace_subset_obs(id_tmp) 106 | adata.raw=sc.pp.log1p(adata,copy=True) #check the rowname 107 | print("the var_names of adata.raw: adata.raw.var_names.is_unique=:",adata.raw.var_names.is_unique) 108 | 109 | def prefilter_genes(adata,min_counts=None,max_counts=None,min_cells=10,max_cells=None): 110 | if min_cells is None and min_counts is None and max_cells is None and max_counts is None: 111 | raise ValueError('Provide one of min_counts, min_genes, max_counts or max_genes.') 112 | id_tmp=np.asarray([True]*adata.shape[1],dtype=bool) 113 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,min_cells=min_cells)[0]) if min_cells is not None else id_tmp 114 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,max_cells=max_cells)[0]) if max_cells is not None else id_tmp 115 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,min_counts=min_counts)[0]) if min_counts is not None else id_tmp 116 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,max_counts=max_counts)[0]) if max_counts is not None else id_tmp 117 | adata._inplace_subset_var(id_tmp) 118 | 119 | def 
prefilter_specialgenes(adata,Gene1Pattern="ERCC",Gene2Pattern="MT-"): 120 | id_tmp1=np.asarray([not str(name).startswith(Gene1Pattern) for name in adata.var_names],dtype=bool) 121 | id_tmp2=np.asarray([not str(name).startswith(Gene2Pattern) for name in adata.var_names],dtype=bool) 122 | id_tmp=np.logical_and(id_tmp1,id_tmp2) 123 | adata._inplace_subset_var(id_tmp) 124 | 125 | def normalize_log1p_scale(adata,units="UMI",n_top_genes=1000): 126 | if units=="UMI" or units== "CPM": 127 | sc.pp.normalize_per_cell(adata,counts_per_cell_after=10e4) 128 | sc.pp.filter_genes_dispersion(adata,n_top_genes=n_top_genes) 129 | sc.pp.log1p(adata) 130 | sc.pp.scale(adata,zero_center=True,max_value=6) 131 | 132 | #creat DEC object 133 | def get_xinput(adata): 134 | if not isinstance(adata,AnnData): 135 | raise ValueError("adata must be an AnnData object") 136 | if issparse(adata.X): 137 | x=adata.X.toarray() 138 | else: 139 | x=adata.X 140 | return x 141 | 142 | def getdims(dim): 143 | x=dim 144 | assert len(x)==2 145 | n_sample=x[0] 146 | if n_sample>20000: 147 | dims=[x[1],128,32] 148 | elif n_sample>10000: 149 | dims=[x[1],64,32] 150 | elif n_sample>5000: 151 | dims=[x[1],32,16] 152 | elif n_sample>2000: 153 | dims=[x[1],128] 154 | elif n_sample>500: 155 | dims=[x[1],64] # 156 | else: 157 | dims=[x[1],16] # or 32 158 | return dims 159 | 160 | def OriginalClustering(adata,resolution=1.2,n_neighbors=20,n_comps=50,n_PC=20,n_job=4,dotsne=True,doumap=True,dolouvain=True): 161 | #Do PCA directly 162 | sc.tl.pca(adata,n_comps=n_comps) 163 | n_pcs=n_PC if n_PC 0 and delta_label < tol: 270 | print('delta_label ', delta_label, '< tol ', tol) 271 | print('Reached tolerance threshold. 
Stopped training.') 272 | break 273 | print("The value of delta_label of current",str(ite+1),"th iteration is",delta_label,">= tol",tol) 274 | #train on whole dataset on prespecified batch_size 275 | callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=0,mode='auto')] 276 | self.model.fit(x=x,y=p,epochs=epochs_fit,batch_size=batch_size,callbacks=callbacks,shuffle=True,verbose=False) 277 | 278 | y0=pd.Series(y_pred) 279 | print("The final prediction cluster is:") 280 | print(y0.value_counts()) 281 | Embeded_z=self.encoder.predict(x) 282 | return Embeded_z,q 283 | 284 | #Show the trajectory of the centroid during iterations 285 | def fit_trajectory(self,x, maxiter=2e3, epochs_fit=10,batch_size=256, tol=1e-3): # unsupervised 286 | print("Doing DEC: fit_trajectory") 287 | #step1 initial weights by louvain,or Kmeans 288 | self.model.get_layer(name='clustering').set_weights([self.init_centroid]) 289 | y_pred_last = np.copy(self.init_pred) 290 | # Step 2: deep clustering 291 | # logging file 292 | #y_pred_last=self.init_pred 293 | loss = 0 294 | index = 0 295 | index_array = np.arange(x.shape[0]) 296 | trajectory_z=[] #trajectory embedding 297 | trajectory_l=[] #trajectory label 298 | for ite in range(int(maxiter)): 299 | q = self.model.predict(x, verbose=0) 300 | p = self.target_distribution(q) # update the auxiliary target distribution p 301 | # evaluate the clustering performance 302 | y_pred = q.argmax(1) 303 | 304 | # check stop criterion 305 | delta_label = np.sum(y_pred != y_pred_last).astype(np.float32) / y_pred.shape[0] 306 | y_pred_last = np.copy(y_pred) 307 | if ite > 0 and delta_label < tol: 308 | print('delta_label ', delta_label, '< tol ', tol) 309 | print('Reached tolerance threshold. 
Stopped training.') 310 | break 311 | print("The value of delta_label of current",str(ite+1),"th iteration is",delta_label,">= tol",tol) 312 | #train on whole dataset on prespecified batch_size 313 | callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=0,mode='auto')] 314 | self.model.fit(x=x,y=p,epochs=epochs_fit,batch_size=batch_size,callbacks=callbacks,shuffle=True,verbose=False) 315 | 316 | 317 | if ite % 10 ==0: 318 | print("This is the iteration of ", ite) 319 | Embeded_z=self.encoder.predict(x) # feature 320 | q_tmp=self.model.predict(x,verbose=0) # predicted clustering results 321 | l_tmp=change_to_continuous(q_tmp) 322 | trajectory_z.append(Embeded_z) 323 | trajectory_l.append(l_tmp) 324 | 325 | y0=pd.Series(y_pred) 326 | print("The final prediction cluster is:") 327 | print(y0.value_counts()) 328 | Embeded_z=self.encoder.predict(x) 329 | return trajectory_z, trajectory_l, Embeded_z, q 330 | #return ret 331 | 332 | 333 | def fit_supervise(self,x,y,epochs=2e3,batch_size=256): 334 | #y is 1-D array, Series, or a list, len(y)==x.shape[0] 335 | print("Doing DEC: fit_supervised") 336 | if self.softmax==False: # Only DEC clustering can set init_centroid 337 | self.model.get_layer(name='clustering').set_weights([self.init_centroid]) 338 | y0=pd.Series(y,dtype="category") #avoding y is string 339 | y0=y0.cat.rename_categories(range(len(y0.cat.categories))) 340 | y_true=pd.get_dummies(pd.Series(y0)).values# coded according to 0,1,...,3 341 | y_true=y_true+0.00005*np.random.random(y_true.shape)+0.00001 # add some disturb 342 | y_true=y_true/y_true.sum(axis=1)[:,None] 343 | callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=0,mode='auto')] 344 | self.model.fit(x=x,y=y_true,epochs=int(epochs),batch_size=batch_size,callbacks=callbacks,shuffle=True,verbose=False,validation_split=0.25) 345 | Embeded_z=self.encoder.predict(x) # feature 346 | q=self.model.predict(x,verbose=0) # predicted clustering results 347 | #return y0, 
representing the mapping reference for y 348 | return Embeded_z,q 349 | -------------------------------------------------------------------------------- /ItClust_package/build/lib/ItClust/ItClust.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | from time import time 3 | #### 4 | from . DEC import DEC 5 | from . preprocessing import * 6 | #### 7 | from keras.models import Model 8 | import os,csv 9 | from keras.optimizers import SGD 10 | import pandas as pd 11 | import numpy as np 12 | from scipy.sparse import issparse 13 | import scanpy.api as sc 14 | from anndata import AnnData 15 | from natsort import natsorted 16 | from sklearn import cluster, datasets, mixture,metrics 17 | #os.environ["CUDA_VISIBLE_DEVICES"]="1" 18 | 19 | class transfer_learning_clf(object): 20 | ''' 21 | The transfer learning clustering and classification model. 22 | This class has following api: fit(), predict(), Umap(), tSNE() 23 | ''' 24 | def __init__(self): 25 | super(transfer_learning_clf, self).__init__() 26 | 27 | def fit(self, 28 | source_data, #adata 29 | target_data, #adata 30 | normalize=True, 31 | take_log=True, 32 | scale=True, 33 | batch_size=256, 34 | maxiter=1000, 35 | pretrain_epochs=300, 36 | epochs_fit=5, 37 | tol=[0.001], 38 | alpha=[1.0], 39 | resolution=[0.2,0.4,0.8,1.2,1.6], 40 | n_neighbors=20, 41 | softmax=False, 42 | init="glorot_uniform", 43 | save_atr="isy_trans_True" 44 | ): 45 | ''' 46 | Fit the transfer learning model using provided data. 47 | This function includes preprocessing steps. 48 | Input: source_data(anndata format), target_data(anndata format). 
49 | Source and target data can be in any form (UMI or TPM or FPKM) 50 | Retrun: No return 51 | ''' 52 | self.batch_size=batch_size 53 | self.maxiter=maxiter 54 | self.pretrain_epochs=pretrain_epochs 55 | self.epochs_fit=epochs_fit 56 | self.tol=tol 57 | self.alpha=alpha 58 | self.source_data=source_data 59 | self.target_data=target_data 60 | self.resolution=resolution 61 | self.n_neighbors=n_neighbors 62 | self.softmax=softmax 63 | self.init=init 64 | self.save_atr=save_atr 65 | dictionary={"alpha":alpha,"tol":tol,"resolution":resolution} 66 | df_expand=expand_grid(dictionary) 67 | #begin to conduct 68 | adata_tmp=[] 69 | source_data.var_names_make_unique(join="-") 70 | source_data.obs_names_make_unique(join="-") 71 | 72 | #pre-processiong 73 | #1.pre filter cells 74 | prefilter_cells(source_data,min_genes=100) 75 | #2 pre_filter genes 76 | prefilter_genes(source_data,min_cells=10) # avoiding all gene is zeros 77 | #3 prefilter_specialgene: MT and ERCC 78 | prefilter_specialgenes(source_data) 79 | #4 normalization,var.genes,log1p,scale 80 | if normalize: 81 | sc.pp.normalize_per_cell(source_data) 82 | #5 scale 83 | if take_log: 84 | sc.pp.log1p(source_data) 85 | if scale: 86 | sc.pp.scale(source_data,zero_center=True,max_value=6) 87 | source_data.var_names=[i.upper() for i in list(source_data.var_names)]#avoding some gene have lower letter 88 | adata_tmp.append(source_data) 89 | 90 | #Target data 91 | target_data.var_names_make_unique(join="-") 92 | target_data.obs_names_make_unique(join="-") 93 | #pre-processiong 94 | #1.pre filter cells 95 | prefilter_cells(target_data,min_genes=100) 96 | #2 pre_filter genes 97 | prefilter_genes(target_data,min_cells=10) # avoiding all gene is zeros 98 | #3 prefilter_specialgene: MT and ERCC 99 | prefilter_specialgenes(target_data) 100 | #4 normalization,var.genes,log1p,scale 101 | if normalize: 102 | sc.pp.normalize_per_cell(target_data) 103 | 104 | # select top genes 105 | if target_data.X.shape[0]<=1500: 106 | ng=500 107 | 
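Step 3 of the preprocessing above drops spike-in (ERCC) and mitochondrial (MT-) genes by prefix matching on the gene names. The mask logic can be sketched standalone (the helper name and signature here are illustrative, not the package API):

```python
import numpy as np

def special_gene_mask(var_names, patterns=("ERCC", "MT-")):
    # True for genes to KEEP: names that start with neither the
    # spike-in prefix "ERCC" nor the mitochondrial prefix "MT-".
    return np.array([not any(str(n).startswith(p) for p in patterns)
                     for n in var_names], dtype=bool)
```

The boolean mask is then used to subset the AnnData object along the gene axis, as `prefilter_specialgenes` does with `_inplace_subset_var`.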
elif 1500 encoder_0->act -> encoder_1 -> decoder_1->act -> decoder_0; 14 | stack_0 model: Input->dropout -> encoder_0->act->dropout -> decoder_0; 15 | stack_1 model: encoder_0->act->dropout -> encoder_1->dropout -> decoder_1->act; 16 | 17 | Usage: 18 | from SAE import SAE 19 | sae = SAE(dims=[784, 500, 10]) # define a SAE with 5 layers 20 | sae.fit(x, epochs=100) 21 | features = sae.extract_feature(x) 22 | 23 | Arguments: 24 | dims: list of number of units in each layer of encoder. dims[0] is input dim, dims[-1] is units in hidden layer. 25 | The decoder is symmetric with encoder. So number of layers of the auto-encoder is 2*len(dims)-1 26 | act: activation (default='relu'), not applied to Input, Hidden and Output layers. 27 | drop_rate: drop ratio of Dropout for constructing denoising autoencoder 'stack_i' during layer-wise pretraining 28 | batch_size: batch size 29 | """ 30 | def __init__(self, dims, act='relu', drop_rate=0.2, batch_size=32,actinlayer1="tanh",init="glorot_uniform"): #act relu 31 | self.dims = dims 32 | self.n_stacks = len(dims) - 1 33 | self.n_layers = 2*self.n_stacks # exclude input layer 34 | self.activation = act 35 | #self.actinlayer1="tanh" #linear 36 | self.actinlayer1=actinlayer1 #linear 37 | self.drop_rate = drop_rate 38 | self.init=init 39 | self.batch_size = batch_size 40 | self.stacks = [self.make_stack(i) for i in range(self.n_stacks)] 41 | self.autoencoders ,self.encoder= self.make_autoencoders() 42 | #plot_model(self.autoencoders, show_shapes=True, to_file='autoencoders.png') 43 | 44 | def make_autoencoders(self): 45 | """ Fully connected autoencoders model, symmetric. 
46 | """ 47 | # input 48 | x = Input(shape=(self.dims[0],), name='input') 49 | h = x 50 | 51 | # internal layers in encoder 52 | for i in range(self.n_stacks-1): 53 | h = Dense(self.dims[i + 1], kernel_initializer=self.init,activation=self.activation, name='encoder_%d' % i)(h) 54 | 55 | # hidden layer,default activation is linear 56 | h = Dense(self.dims[-1],kernel_initializer=self.init, name='encoder_%d' % (self.n_stacks - 1),activation=self.actinlayer1)(h) # features are extracted from here 57 | 58 | y=h 59 | # internal layers in decoder 60 | for i in range(self.n_stacks-1, 0, -1): #2,1 61 | y = Dense(self.dims[i], kernel_initializer=self.init,activation=self.activation, name='decoder_%d' % i)(y) 62 | 63 | # output 64 | y = Dense(self.dims[0], kernel_initializer=self.init,name='decoder_0',activation=self.actinlayer1)(y) 65 | 66 | return Model(inputs=x, outputs=y,name="AE"),Model(inputs=x,outputs=h,name="encoder") 67 | 68 | def make_stack(self, ith): 69 | """ 70 | Make the ith denoising autoencoder for layer-wise pretraining. It has single hidden layer. The input data is 71 | corrupted by Dropout(drop_rate) 72 | 73 | Arguments: 74 | ith: int, in [0, self.n_stacks) 75 | """ 76 | in_out_dim = self.dims[ith] 77 | hidden_dim = self.dims[ith+1] 78 | output_act = self.activation 79 | hidden_act = self.activation 80 | if ith == 0: 81 | output_act = self.actinlayer1# tanh, or linear 82 | if ith == self.n_stacks-1: 83 | hidden_act = self.actinlayer1 #tanh, or linear 84 | model = Sequential() 85 | model.add(Dropout(self.drop_rate, input_shape=(in_out_dim,))) 86 | model.add(Dense(units=hidden_dim, activation=hidden_act, name='encoder_%d' % ith)) 87 | model.add(Dropout(self.drop_rate)) 88 | model.add(Dense(units=in_out_dim, activation=output_act, name='decoder_%d' % ith)) 89 | 90 | #plot_model(model, to_file='stack_%d.png' % ith, show_shapes=True) 91 | return model 92 | 93 | def pretrain_stacks(self, x, epochs=200): 94 | """ 95 | Layer-wise pretraining. 
Each stack is trained for 'epochs' epochs using SGD with learning rate decaying 10 96 | times every 'ep ochs/3' epochs. 97 | 98 | Arguments: 99 | x: input data, shape=(n_samples, n_dims) 100 | epochs: epochs for each stack 101 | """ 102 | print("Doing SAE: pretrain_stacks") 103 | features = x 104 | for i in range(self.n_stacks): 105 | print( 'Pretraining the %dth layer...' % (i+1)) 106 | for j in range(3): # learning rate multiplies 0.1 every 'epochs/3' epochs 107 | print ('learning rate =', pow(10, -1-j)) 108 | self.stacks[i].compile(optimizer=SGD(pow(10, -1-j), momentum=0.9), loss='mse') 109 | #callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=1,mode='auto')] 110 | #self.stacks[i].fit(features, features, batch_size=self.batch_size, callbacks=callbacks,epochs=math.ceil(epochs/3)) 111 | self.stacks[i].fit(features, features, batch_size=self.batch_size,epochs=math.ceil(epochs/3),verbose=0) 112 | print ('The %dth layer has been pretrained.' % (i+1)) 113 | 114 | # update features to the inputs of the next layer 115 | feature_model = Model(inputs=self.stacks[i].input, outputs=self.stacks[i].get_layer('encoder_%d'%i).output) 116 | features = feature_model.predict(features) 117 | 118 | def pretrain_autoencoders(self, x, epochs=500): 119 | """ 120 | Fine tune autoendoers end-to-end after layer-wise pretraining using 'pretrain_stacks()' 121 | Use SGD with learning rate = 0.1, decayed 10 times every 80 epochs 122 | 123 | :param x: input data, shape=(n_samples, n_dims) 124 | :param epochs: training epochs 125 | :return: 126 | """ 127 | print("Doing SAE: pretrain_autoencoders") 128 | print ('Copying layer-wise pretrained weights to deep autoencoders') 129 | for i in range(self.n_stacks): 130 | name = 'encoder_%d' % i 131 | self.autoencoders.get_layer(name).set_weights(self.stacks[i].get_layer(name).get_weights()) 132 | name = 'decoder_%d' % i 133 | self.autoencoders.get_layer(name).set_weights(self.stacks[i].get_layer(name).get_weights()) 134 | 135 | 
print ('Fine-tuning autoencoder end-to-end') 136 | for j in range(math.ceil(epochs/50)): 137 | lr = 0.1*pow(10, -j) 138 | print ('learning rate =', lr) 139 | self.autoencoders.compile(optimizer=SGD(lr, momentum=0.9), loss='mse') 140 | #callbacks=[EarlyStopping(monitor='loss',min_delta=10e-4,patience=4,verbose=1,mode='auto')] 141 | #self.autoencoders.fit(x=x, y=x, batch_size=self.batch_size, epochs=50,callbacks=callbacks) 142 | self.autoencoders.fit(x=x, y=x, batch_size=self.batch_size, epochs=50,verbose=0) 143 | 144 | def fit(self, x, epochs=200): 145 | self.pretrain_stacks(x, epochs=epochs/2) 146 | self.pretrain_autoencoders(x, epochs=epochs)#fine tunning 147 | 148 | def fit2(self,x,epochs=200): #no stack directly traning 149 | for j in range(math.ceil(epochs/50)): 150 | lr = 0.1*pow(10, -j) 151 | print ('learning rate =', lr) 152 | self.autoencoders.compile(optimizer=SGD(lr, momentum=0.9), loss='mse') 153 | self.autoencoders.fit(x=x, y=x, batch_size=self.batch_size, epochs=50) 154 | 155 | def extract_feature(self, x): 156 | """ 157 | Extract features from the middle layer of autoencoders. 158 | 159 | :param x: data 160 | :return: features 161 | """ 162 | hidden_layer = self.autoencoders.get_layer(name='encoder_%d' % (self.n_stacks - 1)) 163 | feature_model = Model(self.autoencoders.input, hidden_layer.output) 164 | return feature_model.predict(x, batch_size=self.batch_size) 165 | 166 | 167 | if __name__ == "__main__": 168 | """ 169 | An example for how to use SAE model on MNIST dataset. In terminal run 170 | python SAE.py 171 | to see the result. You may get NMI=0.77. 172 | """ 173 | import numpy as np 174 | 175 | def load_mnist(): 176 | # the data, shuffled and split between train and test sets 177 | from keras.datasets import mnist 178 | (x_train, y_train), (x_test, y_test) = mnist.load_data() 179 | x = np.concatenate((x_train, x_test)) 180 | y = np.concatenate((y_train, y_test)) 181 | x = x.reshape((x.shape[0], -1)) 182 | x = np.divide(x, 50.) 
# normalize as it does in DEC paper 183 | print ('MNIST samples', x.shape) 184 | return x, y 185 | 186 | db = 'mnist' 187 | n_clusters = 10 188 | x, y = load_mnist() 189 | 190 | # define and train SAE model 191 | sae = SAE(dims=[x.shape[-1], 500, 500, 2000, 10]) 192 | sae.fit(x=x, epochs=400) 193 | sae.autoencoders.save_weights('weights_%s.h5' % db) 194 | 195 | # extract features 196 | print ('Finished training, extracting features using the trained SAE model') 197 | features = sae.extract_feature(x) 198 | 199 | print ('performing k-means clustering on the extracted features') 200 | from sklearn.cluster import KMeans 201 | km = KMeans(n_clusters, n_init=20) 202 | y_pred = km.fit_predict(features) 203 | 204 | from sklearn.metrics import normalized_mutual_info_score as nmi 205 | print ('K-means clustering result on extracted features: NMI =', nmi(y, y_pred)) 206 | -------------------------------------------------------------------------------- /ItClust_package/build/lib/ItClust/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = '1.2.0' 2 | from . ItClust import transfer_learning_clf 3 | from . 
preprocessing import read_10X -------------------------------------------------------------------------------- /ItClust_package/build/lib/ItClust/preprocessing.py: -------------------------------------------------------------------------------- 1 | import scanpy.api as sc 2 | import pandas as pd 3 | import numpy as np 4 | import scipy 5 | import os 6 | from anndata import AnnData,read_csv,read_text,read_mtx 7 | from scipy.sparse import issparse 8 | from natsort import natsorted 9 | from anndata import read_mtx 10 | from anndata.utils import make_index_unique 11 | 12 | 13 | def read_10X(data_path, var_names='gene_symbols'): 14 | adata = read_mtx(data_path + '/matrix.mtx').T 15 | genes = pd.read_csv(data_path + '/genes.tsv', header=None, sep='\t') 16 | adata.var['gene_ids'] = genes[0].values 17 | adata.var['gene_symbols'] = genes[1].values 18 | assert var_names == 'gene_symbols' or var_names == 'gene_ids', \ 19 | 'var_names must be "gene_symbols" or "gene_ids"' 20 | if var_names == 'gene_symbols': 21 | var_names = genes[1] 22 | else: 23 | var_names = genes[0] 24 | if not var_names.is_unique: 25 | var_names = make_index_unique(pd.Index(var_names)).tolist() 26 | print('var_names are not unique, "make_index_unique" has applied') 27 | adata.var_names = var_names 28 | cells = pd.read_csv(data_path + '/barcodes.tsv', header=None, sep='\t') 29 | adata.obs['barcode'] = cells[0].values 30 | adata.obs_names = cells[0] 31 | return adata 32 | 33 | 34 | 35 | def change_to_continuous(q): 36 | #y_trans=q.argmax(axis=1) 37 | y_pred=np.asarray(np.argmax(q,axis=1),dtype=int) 38 | unique_labels=np.unique(q.argmax(axis=1)) 39 | #turn to continuous clusters label,from 0,1,2,3,... 
40 | test_c={} 41 | for ind, i in enumerate(unique_labels): 42 | test_c[i]=ind 43 | y_pred=np.asarray([test_c[i] for i in y_pred],dtype=int) 44 | ##turn to categories 45 | labels=y_pred.astype('U') 46 | labels=pd.Categorical(values=labels,categories=natsorted(np.unique(y_pred).astype('U'))) 47 | return labels 48 | 49 | def presettings(save_dir="result_scRNA",dpi=200,verbosity=3): 50 | if not os.path.exists(save_dir): 51 | print("Warning:"+str(save_dir)+"does not exists, so we will creat it automatically!!:\n") 52 | os.mkdir(save_dir) 53 | figure_dir=os.path.join(save_dir,"figures") 54 | if not os.path.exists(figure_dir): 55 | os.mkdir(figure_dir) 56 | sc.settings.figdir=figure_dir 57 | sc.settings.verbosity=verbosity 58 | sc.settings.set_figure_params(dpi=dpi) 59 | sc.logging.print_versions() 60 | 61 | def creatadata(datadir=None,exprmatrix=None,expermatrix_filename="matrix.mtx",is_mtx=True,cell_info=None,cell_info_filename="barcodes.tsv",gene_info=None,gene_info_filename="genes.tsv",project_name=None): 62 | """ 63 | Construct a anndata object 64 | 65 | Construct a anndata from data in memory or files on disk. 
If datadir is a dir, there must be at least include "matrix.mtx" or data.txt(without anly columns name or rowname and sep="\t") , 66 | 67 | """ 68 | if (datadir is None and expermatrix is None and expermatrix_filename is None): 69 | raise ValueError("Please provide either the expression matrix or the ful path to the expression matrix!!") 70 | #something wrong 71 | cell_info=pd.DataFrame(["cell_"+str(i) for i in range(1,x.shape[0]+1)],columns=["cellname"]) if cell_info is not None else cell_info 72 | gene_info=pd.DataFrame(["gene_"+str(i) for i in range(1,x.shape[1]+1)],columns=["genename"]) if gene_info is not None else gene_info 73 | if datadir is not None: 74 | cell_and_gene_file = [f for f in os.listdir(datadir) if os.path.isfile(os.path.join(datadir, f))] 75 | if (os.path.isdir(datadir) and is_mtx==True): #sparse 76 | print("Start to read expression data (matrix.mtx)") 77 | x=sc.read_mtx(os.path.join(datadir,expermatrix_filename)).X.T 78 | else: #nonsparse 79 | x=pd.read_csv(os.path.join(datadir,expermatrix_filename),sep="\t",header=F) 80 | 81 | #only matrix with row names and colnames 82 | if cell_info_filename in cell_and_gene_file: 83 | cell_info=pd.read_csv(os.path.join(datadir,cell_info_filename),sep="\t",header=0,na_filter=False) 84 | if gene_info_filename in cell_and_gene_file: 85 | gene_info=pd.read_csv(os.path.join(datadir,gene_info_filename),sep="\t",header=0,na_filter=False) 86 | else: 87 | x=exprmatrix # n*p matrix, cell* gene 88 | 89 | adata=sc.AnnData(x,obs=cell_info,var=gene_info) 90 | a=adata.obs["cellname"] if "cellname" in adata.obs.keys() else adata.obs.index 91 | adata.var_names=adata.var["genename"] if "genename" in adata.var.keys() else adata.var.index 92 | adata.obs_names_make_unique(join="-") 93 | adata.var_names_make_unique(join="-") 94 | adata.uns["ProjectName"]="DEC_clust_algorithm" if project_name is None else project_name 95 | return adata 96 | 97 | def 
prefilter_cells(adata,min_counts=None,max_counts=None,min_genes=200,max_genes=None): 98 | if min_genes is None and min_counts is None and max_genes is None and max_counts is None: 99 | raise ValueError('Provide one of min_counts, min_genes, max_counts or max_genes.') 100 | id_tmp=np.asarray([True]*adata.shape[0],dtype=bool) 101 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,min_genes=min_genes)[0]) if min_genes is not None else id_tmp 102 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,max_genes=max_genes)[0]) if max_genes is not None else id_tmp 103 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,min_counts=min_counts)[0]) if min_counts is not None else id_tmp 104 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_cells(adata.X,max_counts=max_counts)[0]) if max_counts is not None else id_tmp 105 | adata._inplace_subset_obs(id_tmp) 106 | adata.raw=sc.pp.log1p(adata,copy=True) #check the rowname 107 | print("the var_names of adata.raw: adata.raw.var_names.is_unique=:",adata.raw.var_names.is_unique) 108 | 109 | def prefilter_genes(adata,min_counts=None,max_counts=None,min_cells=10,max_cells=None): 110 | if min_cells is None and min_counts is None and max_cells is None and max_counts is None: 111 | raise ValueError('Provide one of min_counts, min_genes, max_counts or max_genes.') 112 | id_tmp=np.asarray([True]*adata.shape[1],dtype=bool) 113 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,min_cells=min_cells)[0]) if min_cells is not None else id_tmp 114 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,max_cells=max_cells)[0]) if max_cells is not None else id_tmp 115 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,min_counts=min_counts)[0]) if min_counts is not None else id_tmp 116 | id_tmp=np.logical_and(id_tmp,sc.pp.filter_genes(adata.X,max_counts=max_counts)[0]) if max_counts is not None else id_tmp 117 | adata._inplace_subset_var(id_tmp) 118 | 119 | def 
prefilter_specialgenes(adata,Gene1Pattern="ERCC",Gene2Pattern="MT-"): 120 | id_tmp1=np.asarray([not str(name).startswith(Gene1Pattern) for name in adata.var_names],dtype=bool) 121 | id_tmp2=np.asarray([not str(name).startswith(Gene2Pattern) for name in adata.var_names],dtype=bool) 122 | id_tmp=np.logical_and(id_tmp1,id_tmp2) 123 | adata._inplace_subset_var(id_tmp) 124 | 125 | def normalize_log1p_scale(adata,units="UMI",n_top_genes=1000): 126 | if units=="UMI" or units=="CPM": 127 | sc.pp.normalize_per_cell(adata,counts_per_cell_after=10e4) 128 | sc.pp.filter_genes_dispersion(adata,n_top_genes=n_top_genes) 129 | sc.pp.log1p(adata) 130 | sc.pp.scale(adata,zero_center=True,max_value=6) 131 | 132 | #create DEC object 133 | def get_xinput(adata): 134 | if not isinstance(adata,AnnData): 135 | raise ValueError("adata must be an AnnData object") 136 | if issparse(adata.X): 137 | x=adata.X.toarray() 138 | else: 139 | x=adata.X 140 | return x 141 | 142 | def getdims(dim): 143 | x=dim 144 | assert len(x)==2 145 | n_sample=x[0] 146 | if n_sample>20000: 147 | dims=[x[1],128,32] 148 | elif n_sample>10000: 149 | dims=[x[1],64,32] 150 | elif n_sample>5000: 151 | dims=[x[1],32,16] 152 | elif n_sample>2000: 153 | dims=[x[1],128] 154 | elif n_sample>500: 155 | dims=[x[1],64] 156 | else: 157 | dims=[x[1],16] # or 32 158 | return dims 159 | 160 | def OriginalClustering(adata,resolution=1.2,n_neighbors=20,n_comps=50,n_PC=20,n_job=4,dotsne=True,doumap=True,dolouvain=True): 161 | #Do PCA directly 162 | sc.tl.pca(adata,n_comps=n_comps) 163 | n_pcs=n_PC if n_PC<n_comps else n_comps -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 13 | 14 | ## Usage 15 | 16 | The [**ItClust**](https://github.com/jianhuupenn/ItClust) package is an implementation of the Iterative Transfer learning algorithm for scRNA-seq Clustering. With ItClust, you can: 17 | 18 | - Preprocess single-cell gene expression data from various formats. 19 | - Build a network for target data clustering with prior knowledge learned from the source data. 20 | - Obtain soft-clustering assignments of cells. 
21 | - Obtain a cell type confidence score for each cluster to assist cell type assignment. 22 | - Visualize cell clustering/classification results and gene expression patterns. 23 | 24 | ## System Requirements 25 | Required Python packages: pandas, numpy, keras, scipy, scanpy, anndata, natsort, sklearn. 26 | 27 | ## Versions the software has been tested on 28 | Environment 1: 29 | - System: Mac OS 10.13.6 30 | - Python: 3.7.0 31 | - Python packages: pandas = 0.25.3, numpy = 1.18.1, keras = 2.2.4, tensorflow = 1.14.0, scipy = 1.4.1, scanpy = 1.4.4.post1, anndata = 0.6.22.post1, natsort = 7.0.1, sklearn = 0.22.1 32 | 33 | Environment 2: 34 | - System: Linux 3.10.0 35 | - Python: 3.7.5 36 | - Python packages: pandas = 0.25.3, numpy = 1.17.3, keras = 2.3.1, scipy = 1.4.1, scanpy = 1.4.4.post1, anndata = 0.6.22.post1, natsort = 6.0.0, sklearn = 0.21.3 37 | 38 | Environment 3: 39 | - System: Ubuntu 16.04.6 LTS 40 | - Python: 3.5.2 41 | - Python packages: pandas = 0.22.0, numpy = 1.16.4, keras = 2.2.4, scipy = 1.0.1, scanpy = 1.3.1+21.g1df151f, anndata = 0.6.20, natsort = 5.2.0, sklearn = 0.19.1 42 | 43 | ## Installation guide, Demo and Instructions 44 | Please refer to: https://github.com/jianhuupenn/ItClust/blob/master/tutorial/tutorial.md 45 | 46 | 47 | ## Contributing 48 | 49 | Source code: [Github](https://github.com/jianhuupenn/ItClust) 50 | Author email: jianhu@pennmedicine.upenn.edu 51 |
52 | We are continually adding new features. Bug reports and feature requests are welcome. 53 |
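The depth and width of the autoencoder ItClust builds depend on the size of the target data: the `getdims()` heuristic in `ItClust/preprocessing.py` maps the expression-matrix shape (cells, genes) to encoder layer sizes, so larger datasets get deeper networks. A standalone restatement of that heuristic (same shape-tuple input as the package version):

```python
def getdims(shape):
    """Map a (n_cells, n_genes) shape to [input_dim, hidden_layer_sizes...]."""
    assert len(shape) == 2
    n_sample, n_feature = shape
    if n_sample > 20000:        # very large datasets: two hidden layers, wide
        return [n_feature, 128, 32]
    elif n_sample > 10000:
        return [n_feature, 64, 32]
    elif n_sample > 5000:
        return [n_feature, 32, 16]
    elif n_sample > 2000:       # mid-size datasets: a single hidden layer
        return [n_feature, 128]
    elif n_sample > 500:
        return [n_feature, 64]
    return [n_feature, 16]      # small datasets: narrow single layer

print(getdims((30000, 1000)))  # [1000, 128, 32]
print(getdims((800, 1000)))    # [1000, 64]
```

The first entry is always the input dimension (number of genes after feature selection); the remaining entries are the hidden-layer widths of the stacked autoencoder.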
54 | 55 | ## Debugging 56 | 57 | If you see the following error: 58 | #### TypeError: add_weight() got multiple values for argument 'name' 59 | Please update your Keras and TensorFlow to the versions used in one of the tested environments above. 60 | 61 | ## Reference 62 | 63 | Please consider citing the following reference: 64 | - Hu, J., Li, X., Hu, G. et al. Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis. Nat Mach Intell (2020). https://doi.org/10.1038/s42256-020-00233-7 65 | - https://www.nature.com/articles/s42256-020-00233-7 66 | - If you do not have access to the link above, you can access the paper via https://rdcu.be/b78Ya 67 | - https://www.biorxiv.org/content/10.1101/2020.02.02.931139v1.full (preprint) 68 | -------------------------------------------------------------------------------- /docs/asserts/images/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /docs/asserts/images/workflow.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianhuupenn/ItClust/059b8d66d4f15bd79a2db5c9541f32c9685ba502/docs/asserts/images/workflow.jpg -------------------------------------------------------------------------------- /tutorial/data/pancreas/Bh.h5ad: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianhuupenn/ItClust/059b8d66d4f15bd79a2db5c9541f32c9685ba502/tutorial/data/pancreas/Bh.h5ad -------------------------------------------------------------------------------- /tutorial/data/pancreas/Bh_smartseq2_results.h5ad: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/jianhuupenn/ItClust/059b8d66d4f15bd79a2db5c9541f32c9685ba502/tutorial/data/pancreas/Bh_smartseq2_results.h5ad -------------------------------------------------------------------------------- /tutorial/data/pancreas/smartseq2.h5ad: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianhuupenn/ItClust/059b8d66d4f15bd79a2db5c9541f32c9685ba502/tutorial/data/pancreas/smartseq2.h5ad -------------------------------------------------------------------------------- /tutorial/data/pbmc/barcodes.tsv: -------------------------------------------------------------------------------- 1 | AAACATACAACCAC-1 2 | AAACATTGAGCTAC-1 3 | AAACATTGATCAGC-1 4 | AAACCGTGCTTCCG-1 5 | AAACCGTGTATGCG-1 6 | AAACGCACTGGTAC-1 7 | AAACGCTGACCAGT-1 8 | AAACGCTGGTTCTT-1 9 | AAACGCTGTAGCCA-1 10 | AAACGCTGTTTCTG-1 11 | AAACTTGAAAAACG-1 12 | AAACTTGATCCAGA-1 13 | AAAGAGACGAGATA-1 14 | AAAGAGACGCGAGA-1 15 | AAAGAGACGGACTT-1 16 | AAAGAGACGGCATT-1 17 | AAAGATCTGGGCAA-1 18 | AAAGCAGAAGCCAT-1 19 | AAAGCAGATATCGG-1 20 | AAAGCCTGTATGCG-1 21 | AAAGGCCTGTCTAG-1 22 | AAAGTTTGATCACG-1 23 | AAAGTTTGGGGTGA-1 24 | AAAGTTTGTAGAGA-1 25 | AAAGTTTGTAGCGT-1 26 | AAATCAACAATGCC-1 27 | AAATCAACACCAGT-1 28 | AAATCAACCAGGAG-1 29 | AAATCAACCCTATT-1 30 | AAATCAACGGAAGC-1 31 | AAATCAACTCGCAA-1 32 | AAATCATGACCACA-1 33 | AAATCCCTCCACAA-1 34 | AAATCCCTGCTATG-1 35 | AAATGTTGAACGAA-1 36 | AAATGTTGCCACAA-1 37 | AAATGTTGTGGCAT-1 38 | AAATTCGAAGGTTC-1 39 | AAATTCGAATCACG-1 40 | AAATTCGAGCTGAT-1 41 | AAATTCGAGGAGTG-1 42 | AAATTCGATTCTCA-1 43 | AAATTGACACGACT-1 44 | AAATTGACTCGCTC-1 45 | AACAAACTCATTTC-1 46 | AACAAACTTTCGTT-1 47 | AACAATACGACGAG-1 48 | AACACGTGCAGAGG-1 49 | AACACGTGGAAAGT-1 50 | AACACGTGGAACCT-1 51 | AACACGTGGCTACA-1 52 | AACACGTGTACGAC-1 53 | AACAGCACAAGAGT-1 54 | AACATTGATGGGAG-1 55 | AACCAGTGATACCG-1 56 | AACCCAGATCGCTC-1 57 | AACCGATGCTCCCA-1 58 | AACCGATGGTCATG-1 59 | AACCGATGTTCTAC-1 60 | AACCGCCTAGCGTT-1 61 | 
AACCGCCTCTACGA-1 62 | AACCTACTGTGAGG-1 63 | AACCTACTGTGTTG-1 64 | AACCTTACGAGACG-1 65 | AACCTTACGCGAGA-1 66 | AACCTTACTAACGC-1 67 | AACCTTTGGACGGA-1 68 | AACCTTTGTACGCA-1 69 | AACGCAACAAGTAG-1 70 | AACGCATGACCCAA-1 71 | AACGCATGCCTTCG-1 72 | AACGCATGTACTTC-1 73 | AACGCCCTCGGGAA-1 74 | AACGCCCTCGTACA-1 75 | AACGCCCTGCTTAG-1 76 | AACGCCCTGGCATT-1 77 | AACGTCGAGTATCG-1 78 | AACGTGTGAAAGCA-1 79 | AACGTGTGGCGGAA-1 80 | AACGTGTGTCCAAG-1 81 | AACGTGTGTGCTTT-1 82 | AACTACCTTAGAGA-1 83 | AACTCACTCAAGCT-1 84 | AACTCACTTGGAGG-1 85 | AACTCGGAAAGTGA-1 86 | AACTCGGAAGGTCT-1 87 | AACTCTTGCAGGAG-1 88 | AACTGTCTCCCTTG-1 89 | AACTTGCTACGCTA-1 90 | AACTTGCTGGGACA-1 91 | AAGAACGAGTGTTG-1 92 | AAGAAGACGTAGGG-1 93 | AAGACAGAAGTCTG-1 94 | AAGACAGAGGATCT-1 95 | AAGACAGATTACCT-1 96 | AAGAGATGGGTAGG-1 97 | AAGATGGAAAACAG-1 98 | AAGATGGAGAACTC-1 99 | AAGATGGAGATAAG-1 100 | AAGATTACAACCTG-1 101 | AAGATTACAGATCC-1 102 | AAGATTACCCGTTC-1 103 | AAGATTACCGCCTT-1 104 | AAGATTACCTCAAG-1 105 | AAGATTACTCCTCG-1 106 | AAGCAAGAGCGAGA-1 107 | AAGCAAGAGCTTAG-1 108 | AAGCAAGAGGTGTT-1 109 | AAGCACTGAGCAAA-1 110 | AAGCACTGCATACG-1 111 | AAGCACTGGTTCTT-1 112 | AAGCCAACGTGTTG-1 113 | AAGCCATGAACTGC-1 114 | AAGCCATGACACGT-1 115 | AAGCCATGCGTGAT-1 116 | AAGCCATGTCTCGC-1 117 | AAGCCTGACATGCA-1 118 | AAGCCTGACCGAAT-1 119 | AAGCGACTCCTCAC-1 120 | AAGCGACTGTGTCA-1 121 | AAGCGACTTACAGC-1 122 | AAGCGACTTTGACG-1 123 | AAGCGTACGTCTTT-1 124 | AAGGCTTGCGAACT-1 125 | AAGGTCACGGTTAC-1 126 | AAGGTCACTGTTTC-1 127 | AAGGTCACTTCCCG-1 128 | AAGGTCTGACAGTC-1 129 | AAGGTCTGCAGATC-1 130 | AAGGTCTGGTATGC-1 131 | AAGTAACTCTGAAC-1 132 | AAGTAACTGAGATA-1 133 | AAGTAGGATACAGC-1 134 | AAGTATACCGAACT-1 135 | AAGTCCGACTTGTT-1 136 | AAGTCCGATAGAAG-1 137 | AAGTCTCTAGTCGT-1 138 | AAGTCTCTCGGAGA-1 139 | AAGTGGCTTGGAGG-1 140 | AAGTTCCTCATTCT-1 141 | AAGTTCCTTCTTAC-1 142 | AATAAGCTCGAATC-1 143 | AATAAGCTCGTTGA-1 144 | AATACCCTGGACGA-1 145 | AATACCCTGGCATT-1 146 | AATACTGAAAGGGC-1 147 | AATACTGAATTGGC-1 148 | AATAGGGAACCCTC-1 149 | 
AATAGGGAGAATGA-1 150 | AATCAAACTATCGG-1 151 | AATCCGGAATGCTG-1 152 | AATCCTACCGGTAT-1 153 | AATCCTTGACGGGA-1 154 | AATCCTTGGTGAGG-1 155 | AATCGGTGGAACTC-1 156 | AATCGGTGTGCTTT-1 157 | AATCTAGAAAAGTG-1 158 | AATCTAGAATCGGT-1 159 | AATCTCACAGCCTA-1 160 | AATCTCACTCTAGG-1 161 | AATCTCTGAACAGA-1 162 | AATCTCTGCTTTAC-1 163 | AATGATACACCAAC-1 164 | AATGATACGGTCAT-1 165 | AATGCGTGACACCA-1 166 | AATGCGTGGACGGA-1 167 | AATGCGTGGCTATG-1 168 | AATGGAGAATCGTG-1 169 | AATGGAGATCCTTA-1 170 | AATGGCTGACACCA-1 171 | AATGGCTGCGTGAT-1 172 | AATGGCTGTAAAGG-1 173 | AATGGCTGTACTCT-1 174 | AATGGCTGTGAAGA-1 175 | AATGTAACGGTGGA-1 176 | AATGTAACGTTTGG-1 177 | AATGTCCTCTTCTA-1 178 | AATGTTGACAGTCA-1 179 | AATGTTGAGTTGAC-1 180 | AATGTTGATCTACT-1 181 | AATTACGAATTCCT-1 182 | AATTACGACTTCTA-1 183 | AATTACGAGTAGCT-1 184 | AATTACGAGTGAGG-1 185 | AATTACGATTGGCA-1 186 | AATTCCTGCTCAGA-1 187 | AATTGATGTCGCAA-1 188 | AATTGTGACTTGGA-1 189 | ACAAAGGAGGGTGA-1 190 | ACAAATTGATTCTC-1 191 | ACAAATTGCTCAGA-1 192 | ACAAATTGTTGCGA-1 193 | ACAACCGAGGGATG-1 194 | ACAACCGAGTTACG-1 195 | ACAAGAGAAGTCGT-1 196 | ACAAGAGACTTATC-1 197 | ACAAGAGAGTTGAC-1 198 | ACAATCCTAACCGT-1 199 | ACAATCCTTAGCGT-1 200 | ACAATTGACTGACA-1 201 | ACAATTGATGACTG-1 202 | ACACAGACACCTGA-1 203 | ACACAGACCATACG-1 204 | ACACCAGAGGGCAA-1 205 | ACACCCTGGTGTTG-1 206 | ACACGAACAGTTCG-1 207 | ACACGATGACGCAT-1 208 | ACACGATGATGTGC-1 209 | ACACGATGTCGTAG-1 210 | ACACGATGTGGTCA-1 211 | ACAGACACGGCATT-1 212 | ACAGACACGTTGTG-1 213 | ACAGCAACACCTAG-1 214 | ACAGCAACCTCAAG-1 215 | ACAGGTACCCCACT-1 216 | ACAGGTACGCTGTA-1 217 | ACAGGTACTGGTGT-1 218 | ACAGTCGACCCAAA-1 219 | ACAGTCGACCGATA-1 220 | ACAGTGACTCACCC-1 221 | ACAGTGACTCTATC-1 222 | ACAGTGTGGTCACA-1 223 | ACAGTGTGTTGCGA-1 224 | ACATCACTCTACTT-1 225 | ACATGGTGAAGCCT-1 226 | ACATGGTGCAACCA-1 227 | ACATGGTGCGAGTT-1 228 | ACATGGTGCGTTGA-1 229 | ACATTCTGGCATAC-1 230 | ACATTCTGGGAACG-1 231 | ACCAACGACATGCA-1 232 | ACCACAGAAAGTAG-1 233 | ACCACAGAGTTGGT-1 234 | ACCACCTGTGTGCA-1 235 | ACCACGCTACAGCT-1 236 
| ACCACGCTACCCAA-1 237 | ACCACGCTGCGAGA-1 238 | ACCACGCTGCTGTA-1 239 | ACCAGCCTGACAGG-1 240 | ACCAGTGAACGGTT-1 241 | ACCAGTGAATACCG-1 242 | ACCAGTGAGGGATG-1 243 | ACCAGTGATGACTG-1 244 | ACCATTACCTTCTA-1 245 | ACCATTACGAGATA-1 246 | ACCATTTGTCATTC-1 247 | ACCCAAGAACTGTG-1 248 | ACCCAAGAATTCCT-1 249 | ACCCAAGAGGACAG-1 250 | ACCCAAGATTCACT-1 251 | ACCCACTGCGCCTT-1 252 | ACCCACTGGACAGG-1 253 | ACCCACTGGTTCAG-1 254 | ACCCACTGTCGTAG-1 255 | ACCCAGCTCAGAAA-1 256 | ACCCAGCTGTTAGC-1 257 | ACCCAGCTTGCTTT-1 258 | ACCCGTTGATGACC-1 259 | ACCCGTTGCTGCAA-1 260 | ACCCGTTGCTTCTA-1 261 | ACCCTCGACCTATT-1 262 | ACCCTCGACGGTAT-1 263 | ACCCTCGATAAGGA-1 264 | ACCCTCGATCAAGC-1 265 | ACCGTGCTACCAGT-1 266 | ACCGTGCTGGAACG-1 267 | ACCTATTGCTGAGT-1 268 | ACCTATTGTGCCCT-1 269 | ACCTCCGAGTCCTC-1 270 | ACCTCCGATATGCG-1 271 | ACCTCCGATGCTGA-1 272 | ACCTCGTGAACCAC-1 273 | ACCTGAGATATCGG-1 274 | ACCTGGCTAAGTAG-1 275 | ACCTGGCTGTCTTT-1 276 | ACCTTTGACTCCCA-1 277 | ACCTTTGAGGAACG-1 278 | ACCTTTGAGGAAGC-1 279 | ACGAACACCTTGTT-1 280 | ACGAACTGGCTATG-1 281 | ACGAAGCTCTCCAC-1 282 | ACGAAGCTCTGAGT-1 283 | ACGACCCTATCTCT-1 284 | ACGACCCTGATGAA-1 285 | ACGACCCTTGACAC-1 286 | ACGACCCTTGACCA-1 287 | ACGAGGGACAGGAG-1 288 | ACGAGGGACGAACT-1 289 | ACGAGGGATGTAGC-1 290 | ACGAGTACCCTAAG-1 291 | ACGAGTACGAATCC-1 292 | ACGATCGAGGACTT-1 293 | ACGATCGAGTCACA-1 294 | ACGATGACAATGCC-1 295 | ACGATGACTGGTCA-1 296 | ACGATTCTACGGGA-1 297 | ACGCAATGGTTCAG-1 298 | ACGCACCTGTTAGC-1 299 | ACGCCACTGAACTC-1 300 | ACGCCGGAAACCAC-1 301 | ACGCCGGAAAGCCT-1 302 | ACGCCGGAAATGCC-1 303 | ACGCCTTGCTCCCA-1 304 | ACGCGGTGGCGAGA-1 305 | ACGCGGTGTGTGGT-1 306 | ACGCGGTGTTTGCT-1 307 | ACGCTCACAGTACC-1 308 | ACGCTCACCCTTGC-1 309 | ACGCTGCTGTTCTT-1 310 | ACGGAACTCAGATC-1 311 | ACGGAACTGTCGTA-1 312 | ACGGAGGACTCTTA-1 313 | ACGGATTGGGAGGT-1 314 | ACGGATTGGTTAGC-1 315 | ACGGCTCTGAGCAG-1 316 | ACGGCTCTTGCACA-1 317 | ACGGTAACCGCTAA-1 318 | ACGGTAACCTTCGC-1 319 | ACGGTAACGGTGGA-1 320 | ACGGTAACTCGCAA-1 321 | ACGGTATGAGTCGT-1 322 | ACGGTATGGGTATC-1 
323 | ACGGTATGGTTGTG-1 324 | ACGGTCCTAACGGG-1 325 | ACGGTCCTCGGGAA-1 326 | ACGTAGACAACCAC-1 327 | ACGTAGACTACAGC-1 328 | ACGTCAGAAACGAA-1 329 | ACGTCAGAGAGCTT-1 330 | ACGTCAGAGGGATG-1 331 | ACGTCCTGATAAGG-1 332 | ACGTCCTGTGAACC-1 333 | ACGTCGCTCCTGAA-1 334 | ACGTCGCTCTATTC-1 335 | ACGTCGCTTCTCAT-1 336 | ACGTGATGCCATGA-1 337 | ACGTGATGGGTCTA-1 338 | ACGTGATGTAACCG-1 339 | ACGTGATGTGACAC-1 340 | ACGTGCCTCCGTAA-1 341 | ACGTGCCTTCTATC-1 342 | ACGTTACTTTCCAT-1 343 | ACGTTGGAAAAGCA-1 344 | ACGTTGGAAACCTG-1 345 | ACGTTGGACCGTAA-1 346 | ACGTTGGAGCCAAT-1 347 | ACGTTGGATATGGC-1 348 | ACGTTGGATCAGGT-1 349 | ACGTTTACATCAGC-1 350 | ACTAAAACCCACAA-1 351 | ACTAAAACTCGACA-1 352 | ACTACGGAATTTCC-1 353 | ACTACGGACCTATT-1 354 | ACTACGGATCGCTC-1 355 | ACTACTACTAAGGA-1 356 | ACTAGGTGGAACCT-1 357 | ACTAGGTGGAACTC-1 358 | ACTATCACCTTGGA-1 359 | ACTATCACTGCCAA-1 360 | ACTCAGGACTGAAC-1 361 | ACTCAGGATCTATC-1 362 | ACTCAGGATTCGTT-1 363 | ACTCCTCTCAACTG-1 364 | ACTCGCACGAAAGT-1 365 | ACTCGCACTACGAC-1 366 | ACTCTCCTGACACT-1 367 | ACTCTCCTGCATAC-1 368 | ACTCTCCTGTTTGG-1 369 | ACTGAGACAACCAC-1 370 | ACTGAGACCCATAG-1 371 | ACTGAGACGTTGGT-1 372 | ACTGCCACACACGT-1 373 | ACTGCCACTCCGTC-1 374 | ACTGGCCTTCAGTG-1 375 | ACTGTGGACGTGTA-1 376 | ACTGTGGATCTAGG-1 377 | ACTGTTACCCACAA-1 378 | ACTGTTACTGCAGT-1 379 | ACTTAAGAACCACA-1 380 | ACTTAAGACCACAA-1 381 | ACTTAAGATTACTC-1 382 | ACTTAGCTGCGTAT-1 383 | ACTTAGCTGGGAGT-1 384 | ACTTCAACAAGCAA-1 385 | ACTTCAACGTAGGG-1 386 | ACTTCCCTTTCCGC-1 387 | ACTTCTGACATGCA-1 388 | ACTTGACTCCACAA-1 389 | ACTTGGGAGAAAGT-1 390 | ACTTGGGAGGTTTG-1 391 | ACTTGGGATGTGAC-1 392 | ACTTGGGATTGACG-1 393 | ACTTGTACCCGAAT-1 394 | ACTTGTACCTGTCC-1 395 | ACTTTGTGCGATAC-1 396 | ACTTTGTGGAAAGT-1 397 | ACTTTGTGGATAGA-1 398 | AGAAACGAAAGTAG-1 399 | AGAAAGTGCGCAAT-1 400 | AGAAAGTGGGGATG-1 401 | AGAACAGAAATGCC-1 402 | AGAACAGACGACTA-1 403 | AGAACAGAGACAAA-1 404 | AGAACGCTTTGCTT-1 405 | AGAAGATGTGACTG-1 406 | AGAATGGAAGAAGT-1 407 | AGAATTTGTAACCG-1 408 | AGAATTTGTAGAGA-1 409 | 
AGACACACTGTAGC-1 410 | AGACACTGTCAAGC-1 411 | AGACCTGAAGTAGA-1 412 | AGACCTGACCAACA-1 413 | AGACCTGAGGAAGC-1 414 | AGACGTACAGAGGC-1 415 | AGACGTACCCCTAC-1 416 | AGACGTACCTCTTA-1 417 | AGACGTACTCGTGA-1 418 | AGACTGACCATCAG-1 419 | AGACTGACCCTTTA-1 420 | AGACTTCTCATGCA-1 421 | AGAGATGACAGTCA-1 422 | AGAGATGACTGAAC-1 423 | AGAGATGAGGTTTG-1 424 | AGAGATGATCTCGC-1 425 | AGAGATGATTGTGG-1 426 | AGAGCGGAGGCAAG-1 427 | AGAGGTCTACAGCT-1 428 | AGAGTCTGGTCGTA-1 429 | AGAGTGCTCAGCTA-1 430 | AGAGTGCTCGAATC-1 431 | AGAGTGCTGTCATG-1 432 | AGAGTGCTGTCCTC-1 433 | AGAGTGCTGTGTTG-1 434 | AGATATACCCGTAA-1 435 | AGATATACGATGAA-1 436 | AGATATACTGTTCT-1 437 | AGATATTGCCTACC-1 438 | AGATATTGGCCAAT-1 439 | AGATCGTGTCTGGA-1 440 | AGATCGTGTTTGTC-1 441 | AGATCTCTATCACG-1 442 | AGATTAACGTTCTT-1 443 | AGATTCCTATCGTG-1 444 | AGATTCCTCACTTT-1 445 | AGATTCCTGACGAG-1 446 | AGATTCCTGTTCAG-1 447 | AGCAAAGATATGCG-1 448 | AGCACAACAGTCTG-1 449 | AGCACTGAGGGAGT-1 450 | AGCACTGATATGCG-1 451 | AGCACTGATGCTTT-1 452 | AGCACTGATTGCGA-1 453 | AGCATCGAAGATCC-1 454 | AGCATCGAAGGGTG-1 455 | AGCATCGAGCTTCC-1 456 | AGCATCGAGTGAGG-1 457 | AGCATCGATAACCG-1 458 | AGCATGACGATGAA-1 459 | AGCCAATGGGGAGT-1 460 | AGCCAATGTATCTC-1 461 | AGCCACCTGGATCT-1 462 | AGCCGGTGCCAATG-1 463 | AGCCGGTGTGTTTC-1 464 | AGCCGTCTCAATCG-1 465 | AGCCGTCTGAGAGC-1 466 | AGCCTCACGTTCGA-1 467 | AGCCTCACTGTCAG-1 468 | AGCCTCTGCAGTTG-1 469 | AGCCTCTGCCAATG-1 470 | AGCGAACTGGATCT-1 471 | AGCGAACTTACTGG-1 472 | AGCGATACGGAGCA-1 473 | AGCGATTGAGATCC-1 474 | AGCGCCGAATCTCT-1 475 | AGCGCCGACAGAGG-1 476 | AGCGCTCTACCTTT-1 477 | AGCGGCACCGGGAA-1 478 | AGCGGCTGATGTGC-1 479 | AGCGGGCTTGCCAA-1 480 | AGCGTAACATGCTG-1 481 | AGCGTAACTGAGAA-1 482 | AGCTCGCTACTGGT-1 483 | AGCTCGCTCTGCTC-1 484 | AGCTGAACCATACG-1 485 | AGCTGAACCTCTCG-1 486 | AGCTGCCTTGGGAG-1 487 | AGCTGCCTTTCATC-1 488 | AGCTGCCTTTCTGT-1 489 | AGCTGTGATCCAAG-1 490 | AGCTTTACAAGTAG-1 491 | AGCTTTACACCAAC-1 492 | AGCTTTACTCTCAT-1 493 | AGGAAATGAGGAGC-1 494 | AGGAACCTCTTAGG-1 495 | AGGAACCTTGCCTC-1 496 
| AGGAATGATAACGC-1 497 | AGGAATGATTTGTC-1 498 | AGGAGTCTGGTTTG-1 499 | AGGAGTCTTGTCAG-1 500 | AGGATAGACATTTC-1 501 | AGGATAGAGGATTC-1 502 | AGGATGCTACTAGC-1 503 | AGGATGCTTTAGGC-1 504 | AGGCAACTGAAGGC-1 505 | AGGCAGGAGTACCA-1 506 | AGGCCTCTAGTCGT-1 507 | AGGCCTCTCGGAGA-1 508 | AGGCCTCTCGTAAC-1 509 | AGGGACGACGTTGA-1 510 | AGGGACGAGTCAAC-1 511 | AGGGACGAGTTGTG-1 512 | AGGGACGATAGAGA-1 513 | AGGGACGATGCATG-1 514 | AGGGAGTGAGCCTA-1 515 | AGGGCCACCATACG-1 516 | AGGGCGCTAACCAC-1 517 | AGGGCGCTATGGTC-1 518 | AGGGTGGACAGTCA-1 519 | AGGGTGGACTCAAG-1 520 | AGGGTGGAGTTGCA-1 521 | AGGGTTTGTTCATC-1 522 | AGGTCATGAGTGTC-1 523 | AGGTCATGCTTATC-1 524 | AGGTCTGATTCTCA-1 525 | AGGTGGGAAGAATG-1 526 | AGGTGGGAAGTTCG-1 527 | AGGTGTTGGTTACG-1 528 | AGGTTCGAACCTCC-1 529 | AGGTTCGAACGTAC-1 530 | AGGTTCGAGGGTGA-1 531 | AGTAAGGAGTTTGG-1 532 | AGTAAGGATTCTTG-1 533 | AGTAATACATCACG-1 534 | AGTAATACCGAACT-1 535 | AGTAATTGTCCCAC-1 536 | AGTACGTGAGGGTG-1 537 | AGTACGTGCTGCAA-1 538 | AGTACGTGCTTGGA-1 539 | AGTACTCTACGTGT-1 540 | AGTACTCTCAACCA-1 541 | AGTACTCTCGGTAT-1 542 | AGTAGGCTTGCCTC-1 543 | AGTATAACTTGTCT-1 544 | AGTATCCTAGAACA-1 545 | AGTCACGATGAGCT-1 546 | AGTCAGACGAATAG-1 547 | AGTCAGACGCTTAG-1 548 | AGTCAGACTAGAGA-1 549 | AGTCAGACTGCACA-1 550 | AGTCCAGATATCTC-1 551 | AGTCCAGATTTCAC-1 552 | AGTCGAACCAACCA-1 553 | AGTCGCCTCCGTAA-1 554 | AGTCTACTAGGGTG-1 555 | AGTCTACTTGCATG-1 556 | AGTCTTACACCACA-1 557 | AGTCTTACTTCGCC-1 558 | AGTCTTACTTCGGA-1 559 | AGTGACTGCAACTG-1 560 | AGTGTTCTAACCTG-1 561 | AGTGTTCTATAAGG-1 562 | AGTGTTCTCACTTT-1 563 | AGTTAAACCACTTT-1 564 | AGTTATGAACAGTC-1 565 | AGTTATGACTGAGT-1 566 | AGTTATGAGTTCAG-1 567 | AGTTCTACCAGCTA-1 568 | AGTTCTTGAAGCCT-1 569 | AGTTGTCTACTACG-1 570 | AGTTTAGATGGTCA-1 571 | AGTTTCACGGTCTA-1 572 | AGTTTGCTACAGTC-1 573 | AGTTTGCTACTGGT-1 574 | AGTTTGCTCCAAGT-1 575 | ATAAACACAGTGCT-1 576 | ATAAACACCACCAA-1 577 | ATAACAACATGCTG-1 578 | ATAACAACGTCTAG-1 579 | ATAACAACTTTGTC-1 580 | ATAACATGTACTCT-1 581 | ATAACCCTGTTGGT-1 582 | ATAACCCTTGGTAC-1 
583 | ATAAGTACGAATGA-1 584 | ATAAGTTGGTACGT-1 585 | ATAAGTTGTCTAGG-1 586 | ATAATCGAGCTGAT-1 587 | ATAATCGATGGTTG-1 588 | ATAATGACCTACTT-1 589 | ATAATGACTCGTGA-1 590 | ATACAATGTTAGGC-1 591 | ATACCACTCGTACA-1 592 | ATACCACTCTAAGC-1 593 | ATACCACTGCCAAT-1 594 | ATACCGGAATGCTG-1 595 | ATACCGGACATTTC-1 596 | ATACCGGACTTCGC-1 597 | ATACCGGAGGTGTT-1 598 | ATACCGGATCTCGC-1 599 | ATACCTACGCATCA-1 600 | ATACCTTGGGGCAA-1 601 | ATACGGACAGACTC-1 602 | ATACGGACCTACTT-1 603 | ATACGGACGAGGTG-1 604 | ATACGGACTATGCG-1 605 | ATACGGACTCTGGA-1 606 | ATACGTCTTAACGC-1 607 | ATACTCTGCTTCGC-1 608 | ATACTCTGGTATGC-1 609 | ATAGATACCATGGT-1 610 | ATAGATACGACGAG-1 611 | ATAGATTGGTGTAC-1 612 | ATAGCCGAACGGAG-1 613 | ATAGCGTGCAGATC-1 614 | ATAGCGTGCCCTTG-1 615 | ATAGCGTGGTATCG-1 616 | ATAGCGTGTCTCTA-1 617 | ATAGCTCTCTGATG-1 618 | ATAGCTCTGAGGTG-1 619 | ATAGGAGAAACAGA-1 620 | ATAGGCTGTCAGAC-1 621 | ATAGTCCTAGTGTC-1 622 | ATAGTCCTTGCATG-1 623 | ATAGTCCTTGTCGA-1 624 | ATAGTTGACAACTG-1 625 | ATAGTTGACCCTCA-1 626 | ATAGTTGAGACGTT-1 627 | ATAGTTGATAAGCC-1 628 | ATATACGAAGCCAT-1 629 | ATATACGAATTGGC-1 630 | ATATAGTGGAATGA-1 631 | ATATGCCTAGATCC-1 632 | ATATGCCTGGACAG-1 633 | ATATGCCTTCTCTA-1 634 | ATATGCCTTGGTAC-1 635 | ATCAAATGAGCCTA-1 636 | ATCAAATGGGTAAA-1 637 | ATCAACCTAAACGA-1 638 | ATCAACCTGAGGAC-1 639 | ATCAACCTTCTCTA-1 640 | ATCAACCTTTGTCT-1 641 | ATCACACTTTGTCT-1 642 | ATCACGGATTGCTT-1 643 | ATCACGGATTTCGT-1 644 | ATCATCTGACACCA-1 645 | ATCATGCTAGAGTA-1 646 | ATCATGCTGAACCT-1 647 | ATCCAGGACGCTAA-1 648 | ATCCAGGATGGAAA-1 649 | ATCCATACTCCTTA-1 650 | ATCCATACTTCATC-1 651 | ATCCCGTGCAGTCA-1 652 | ATCCCGTGCATGCA-1 653 | ATCCCGTGGCTGAT-1 654 | ATCCGCACGCATCA-1 655 | ATCCTAACGACGGA-1 656 | ATCCTAACGCTACA-1 657 | ATCGACGAAACTGC-1 658 | ATCGACGAATGACC-1 659 | ATCGAGTGGACGTT-1 660 | ATCGCAGAATCTCT-1 661 | ATCGCAGAGTGTCA-1 662 | ATCGCCACTGAGGG-1 663 | ATCGCCTGGGTCAT-1 664 | ATCGCCTGTGGCAT-1 665 | ATCGCGCTCAGAGG-1 666 | ATCGCGCTGGGATG-1 667 | ATCGCGCTTTTCGT-1 668 | ATCGGAACCAGTCA-1 669 | 
ATCGGTGAGTCAAC-1 670 | ATCGGTGATTGCAG-1 671 | ATCGTTTGCCTACC-1 672 | ATCGTTTGGGTACT-1 673 | ATCGTTTGTGCCAA-1 674 | ATCTACACCCGCTT-1 675 | ATCTACACCGGGAA-1 676 | ATCTCAACAGGAGC-1 677 | ATCTCAACCTCGAA-1 678 | ATCTCAACCTTGTT-1 679 | ATCTGGGAAACCAC-1 680 | ATCTGGGAAGTGTC-1 681 | ATCTGGGATTCCGC-1 682 | ATCTGTTGAACGGG-1 683 | ATCTGTTGACCTCC-1 684 | ATCTGTTGCCTTCG-1 685 | ATCTGTTGGTTGCA-1 686 | ATCTTGACACCAAC-1 687 | ATCTTGACCTCCCA-1 688 | ATCTTTCTGCATCA-1 689 | ATCTTTCTGTTTCT-1 690 | ATCTTTCTTGTCCC-1 691 | ATGAAACTCTGTGA-1 692 | ATGAAACTGAGGCA-1 693 | ATGAAGGAACAGCT-1 694 | ATGAAGGACCTGTC-1 695 | ATGAAGGACCTTAT-1 696 | ATGAAGGACTAGTG-1 697 | ATGAAGGACTTGCC-1 698 | ATGACGTGACGACT-1 699 | ATGACGTGATCGGT-1 700 | ATGAGAGAAAGTGA-1 701 | ATGAGAGAACGCAT-1 702 | ATGAGAGAAGTAGA-1 703 | ATGAGCACACAGCT-1 704 | ATGAGCACATCTTC-1 705 | ATGATAACTTCACT-1 706 | ATGATATGAAACAG-1 707 | ATGATATGACTGGT-1 708 | ATGATATGAGCACT-1 709 | ATGATATGGTCATG-1 710 | ATGATATGGTGCTA-1 711 | ATGATATGTTGTCT-1 712 | ATGCACGAATGTCG-1 713 | ATGCACGACTGTAG-1 714 | ATGCACGAGAACCT-1 715 | ATGCACGAGTTCGA-1 716 | ATGCACGATTGGTG-1 717 | ATGCAGTGTTACCT-1 718 | ATGCAGTGTTCTAC-1 719 | ATGCCAGAACGACT-1 720 | ATGCCAGACAGTCA-1 721 | ATGCCGCTTGAACC-1 722 | ATGCGATGCTATGG-1 723 | ATGCGATGCTGAGT-1 724 | ATGCGATGGTTACG-1 725 | ATGCGCCTTCATTC-1 726 | ATGCTTTGCGAATC-1 727 | ATGCTTTGGGCGAA-1 728 | ATGCTTTGTAGTCG-1 729 | ATGGACACATCGGT-1 730 | ATGGACACGCATCA-1 731 | ATGGGTACAACCTG-1 732 | ATGGGTACATCGGT-1 733 | ATGGGTACTATTCC-1 734 | ATGGGTACTGGGAG-1 735 | ATGTAAACACCTCC-1 736 | ATGTAAACCCGCTT-1 737 | ATGTAAACGGGATG-1 738 | ATGTAAACTCTCCG-1 739 | ATGTAAACTTCACT-1 740 | ATGTACCTCAGTCA-1 741 | ATGTACCTTAGTCG-1 742 | ATGTACCTTTATCC-1 743 | ATGTACCTTTCACT-1 744 | ATGTCACTAATGCC-1 745 | ATGTCACTCTGCTC-1 746 | ATGTCGGAGGTGAG-1 747 | ATGTTCACAGTCTG-1 748 | ATGTTCACCGTAGT-1 749 | ATGTTGCTTTCAGG-1 750 | ATTAACGATGAGAA-1 751 | ATTAACGATGCAAC-1 752 | ATTAAGACTGCAGT-1 753 | ATTACCTGCCTTAT-1 754 | ATTACCTGGAGGAC-1 755 | ATTACCTGGGCATT-1 756 
| ATTAGATGTTTCAC-1 757 | ATTATGGAATCTCT-1 758 | ATTCAAGAACGGGA-1 759 | ATTCAAGACCTTTA-1 760 | ATTCAGCTCATTGG-1 761 | ATTCCAACCATTGG-1 762 | ATTCCAACTTAGGC-1 763 | ATTCGACTCACTAG-1 764 | ATTCGACTGAATAG-1 765 | ATTCGACTTTTGTC-1 766 | ATTCGGGAAAGGCG-1 767 | ATTCGGGATTAGGC-1 768 | ATTCTTCTGATACC-1 769 | ATTGAATGGACGGA-1 770 | ATTGATGAAGGTTC-1 771 | ATTGATGACTGAGT-1 772 | ATTGATGAGCGAAG-1 773 | ATTGATGATCTATC-1 774 | ATTGCACTGACGGA-1 775 | ATTGCACTGAGAGC-1 776 | ATTGCACTGGAGCA-1 777 | ATTGCACTTAGCCA-1 778 | ATTGCACTTGCTTT-1 779 | ATTGCTTGTTACTC-1 780 | ATTGGTCTGACTAC-1 781 | ATTGGTCTTGTCTT-1 782 | ATTGTAGATTCCCG-1 783 | ATTGTAGATTGCAG-1 784 | ATTGTCTGCGTACA-1 785 | ATTTAGGAACCATG-1 786 | ATTTAGGACAGAGG-1 787 | ATTTCCGAGATGAA-1 788 | ATTTCCGAGTGCTA-1 789 | ATTTCGTGTATGGC-1 790 | ATTTCTCTACTTTC-1 791 | ATTTCTCTAGCAAA-1 792 | ATTTCTCTCACTTT-1 793 | ATTTCTCTTCCCAC-1 794 | ATTTGCACAAGATG-1 795 | CAAAGCACAGCTCA-1 796 | CAAAGCACCGTAAC-1 797 | CAAAGCACGGTAAA-1 798 | CAAAGCTGAAAGTG-1 799 | CAAAGCTGTTGCTT-1 800 | CAAATATGTGACAC-1 801 | CAAATTGAGGGCAA-1 802 | CAAATTGATGGAGG-1 803 | CAACCAGAAAAGTG-1 804 | CAACCAGAAGTGCT-1 805 | CAACCAGAGTTCAG-1 806 | CAACCAGATAGAAG-1 807 | CAACCGCTGTTCAG-1 808 | CAACCGCTTTGAGC-1 809 | CAACGATGCGCAAT-1 810 | CAACGTGACTCCAC-1 811 | CAACGTGAGCCATA-1 812 | CAACGTGATCAAGC-1 813 | CAAGAAGACCACAA-1 814 | CAAGAAGACGTCTC-1 815 | CAAGAAGATTCTAC-1 816 | CAAGACTGACCTGA-1 817 | CAAGACTGAGTAGA-1 818 | CAAGCTGACCATAG-1 819 | CAAGCTGATCTATC-1 820 | CAAGGACTGTTCAG-1 821 | CAAGGACTTCTTTG-1 822 | CAAGGTTGCTCCAC-1 823 | CAAGGTTGTCATTC-1 824 | CAAGGTTGTCTGGA-1 825 | CAAGTCGAAACAGA-1 826 | CAAGTCGATAGCGT-1 827 | CAATAAACGCCATA-1 828 | CAATAATGAACTGC-1 829 | CAATATGACATGGT-1 830 | CAATATGACCTTCG-1 831 | CAATATGACGTTAG-1 832 | CAATATGAGGAGCA-1 833 | CAATCGGAGAAACA-1 834 | CAATCTACTGACTG-1 835 | CAATTCACCCAACA-1 836 | CAATTCACGATAGA-1 837 | CAATTCACTTGTGG-1 838 | CAATTCTGCTTGTT-1 839 | CAATTCTGGCGTAT-1 840 | CACAACGATACGAC-1 841 | CACAATCTTGTTCT-1 842 | CACAATCTTTCCAT-1 
843 | CACACCTGCTTGAG-1 844 | CACACCTGTATGGC-1 845 | CACAGAACCCTTGC-1 846 | CACAGAACCTGATG-1 847 | CACAGATGGGATTC-1 848 | CACAGATGGTTTCT-1 849 | CACAGCCTGATACC-1 850 | CACAGCCTTGCCAA-1 851 | CACAGCCTTGTAGC-1 852 | CACAGTGATGAAGA-1 853 | CACATACTACAGCT-1 854 | CACATGGAACACGT-1 855 | CACATGGAAGTCGT-1 856 | CACCACTGCCAACA-1 857 | CACCACTGGCGAAG-1 858 | CACCCATGTTCTGT-1 859 | CACCGGGAATCGAC-1 860 | CACCGGGACGAGAG-1 861 | CACCGGGACGTGTA-1 862 | CACCGGGACTTCTA-1 863 | CACCGGGACTTGCC-1 864 | CACCGGGATTCGGA-1 865 | CACCGTACTAAGGA-1 866 | CACCGTACTAGCGT-1 867 | CACCTGACACCCAA-1 868 | CACCTGACCAGAAA-1 869 | CACCTGACCTCAAG-1 870 | CACCTGACGAAAGT-1 871 | CACCTGACTCGTAG-1 872 | CACGAAACTTCCGC-1 873 | CACGACCTCGATAC-1 874 | CACGCTACAGAAGT-1 875 | CACGCTACTGTTCT-1 876 | CACGCTACTTGACG-1 877 | CACGGGACAGAGTA-1 878 | CACGGGACATAAGG-1 879 | CACGGGACGTAGGG-1 880 | CACGGGTGCTTCGC-1 881 | CACGGGTGGAGGAC-1 882 | CACGGGTGTGTTTC-1 883 | CACTAACTCCTAAG-1 884 | CACTAACTGAAAGT-1 885 | CACTAGGATGATGC-1 886 | CACTATACCCCGTT-1 887 | CACTATACGTTTGG-1 888 | CACTGAGACAGTCA-1 889 | CACTGCACTTCATC-1 890 | CACTGCTGAGACTC-1 891 | CACTGCTGGAAAGT-1 892 | CACTTAACCGAATC-1 893 | CACTTAACCGTACA-1 894 | CACTTTGACTCTAT-1 895 | CACTTTGAGCTGTA-1 896 | CAGAAGCTCTCAAG-1 897 | CAGACATGAACGGG-1 898 | CAGACATGTCGACA-1 899 | CAGACCCTAAGGTA-1 900 | CAGACCCTAATGCC-1 901 | CAGACCCTAGGAGC-1 902 | CAGACTGAGTATGC-1 903 | CAGATCGAATGTCG-1 904 | CAGATCGACCTGAA-1 905 | CAGATCGATATGGC-1 906 | CAGATGACATTCTC-1 907 | CAGCAATGCCTTCG-1 908 | CAGCAATGGAGGGT-1 909 | CAGCAATGGTGCTA-1 910 | CAGCAATGTCTACT-1 911 | CAGCAATGTGACCA-1 912 | CAGCAATGTGAGGG-1 913 | CAGCACCTAAGCCT-1 914 | CAGCACCTAGGCGA-1 915 | CAGCACCTGTAGGG-1 916 | CAGCATGACAACCA-1 917 | CAGCATGAGACGTT-1 918 | CAGCCTACCCAACA-1 919 | CAGCCTTGCTACCC-1 920 | CAGCCTTGGGGACA-1 921 | CAGCGGACACCCTC-1 922 | CAGCGGACCTTTAC-1 923 | CAGCGTCTAAAGCA-1 924 | CAGCGTCTTATCGG-1 925 | CAGCTAGATGTGAC-1 926 | CAGCTCTGAGGCGA-1 927 | CAGCTCTGCAAGCT-1 928 | CAGCTCTGTCGTAG-1 929 | 
CAGCTCTGTGTGGT-1 930 | CAGGAACTAACTGC-1 931 | CAGGAACTCTCAGA-1 932 | CAGGCCGAACACCA-1 933 | CAGGCCGAACACGT-1 934 | CAGGCCGAACGACT-1 935 | CAGGCCGAATCTCT-1 936 | CAGGCCGACTAGCA-1 937 | CAGGGCACCATACG-1 938 | CAGGGCACCCAACA-1 939 | CAGGGCACTCCCGT-1 940 | CAGGTAACAGACTC-1 941 | CAGGTATGAGTCGT-1 942 | CAGGTATGTGCTTT-1 943 | CAGGTTGAGGATCT-1 944 | CAGTGATGGACGGA-1 945 | CAGTGATGGCTAAC-1 946 | CAGTGATGGGACAG-1 947 | CAGTGATGTAAGGA-1 948 | CAGTGATGTACGCA-1 949 | CAGTGTGAACACGT-1 950 | CAGTGTGATGTCAG-1 951 | CAGTTACTAAGGTA-1 952 | CAGTTACTGATAGA-1 953 | CAGTTGGAAAGAGT-1 954 | CAGTTGGACATACG-1 955 | CAGTTTACACACGT-1 956 | CAGTTTACCCCAAA-1 957 | CATAAAACGGAGCA-1 958 | CATAAATGAACTGC-1 959 | CATAACCTTCTCCG-1 960 | CATACTACCTCGAA-1 961 | CATACTACCTGAAC-1 962 | CATACTACGTACCA-1 963 | CATACTTGGGTTAC-1 964 | CATAGTCTAATCGC-1 965 | CATAGTCTCACTTT-1 966 | CATATAGACTAAGC-1 967 | CATATAGATCAGGT-1 968 | CATCAACTAGAAGT-1 969 | CATCAACTCCCTCA-1 970 | CATCAGGACTTCCG-1 971 | CATCAGGATAGCCA-1 972 | CATCAGGATCCTAT-1 973 | CATCAGGATGCACA-1 974 | CATCAGGATTTCGT-1 975 | CATCATACCGCATA-1 976 | CATCATACGGAGCA-1 977 | CATCATACTCAAGC-1 978 | CATCGCTGGGATCT-1 979 | CATCGCTGTGGCAT-1 980 | CATCGGCTATGCTG-1 981 | CATCGGCTTTGGCA-1 982 | CATCTCCTATGTGC-1 983 | CATCTCCTCGAACT-1 984 | CATGAGACACGGGA-1 985 | CATGAGACGTTGAC-1 986 | CATGAGACTCGCCT-1 987 | CATGCCACGGGTGA-1 988 | CATGCCACTGCCAA-1 989 | CATGCGCTAGTCAC-1 990 | CATGCGCTCAGATC-1 991 | CATGCGCTTTGCAG-1 992 | CATGGCCTAGGGTG-1 993 | CATGGCCTGTGCAT-1 994 | CATGTACTATCGTG-1 995 | CATGTTACAGTCGT-1 996 | CATGTTACCTGAGT-1 997 | CATGTTTGGGGATG-1 998 | CATTACACACGGAG-1 999 | CATTACACCAACTG-1 1000 | CATTACACGGAGTG-1 1001 | CATTACACTACTCT-1 1002 | CATTAGCTCCACAA-1 1003 | CATTGACTAGCGGA-1 1004 | CATTGGGACTCGAA-1 1005 | CATTGTACAGCGTT-1 1006 | CATTGTACTCGATG-1 1007 | CATTGTACTTATCC-1 1008 | CATTGTACTTTGCT-1 1009 | CATTGTTGCTAGTG-1 1010 | CATTTCGACTCTAT-1 1011 | CATTTCGAGATACC-1 1012 | CATTTGACCACACA-1 1013 | CATTTGACCCTGAA-1 1014 | CATTTGTGACGACT-1 1015 | 
CATTTGTGCATTGG-1 1016 | CATTTGTGCGGAGA-1 1017 | CATTTGTGGGATCT-1 1018 | CCAAAGTGCTACGA-1 1019 | CCAAAGTGTGAGAA-1 1020 | CCAACCTGAAGTAG-1 1021 | CCAACCTGACGTAC-1 1022 | CCAACCTGTTCGCC-1 1023 | CCAAGAACCCAATG-1 1024 | CCAAGAACGTAGCT-1 1025 | CCAAGAACGTGTCA-1 1026 | CCAAGAACTACTGG-1 1027 | CCAAGAACTCCTAT-1 1028 | CCAAGATGTCATTC-1 1029 | CCAAGATGTTTCAC-1 1030 | CCAAGTGAGGAACG-1 1031 | CCAAGTGATCAAGC-1 1032 | CCAATGGAACAGCT-1 1033 | CCAATTTGAACGTC-1 1034 | CCACCATGAACGTC-1 1035 | CCACCATGATCGGT-1 1036 | CCACCATGGACGAG-1 1037 | CCACCATGGGGAGT-1 1038 | CCACCATGTCCTGC-1 1039 | CCACTGACCCGCTT-1 1040 | CCACTGTGGGAAGC-1 1041 | CCACTGTGTGTAGC-1 1042 | CCACTTCTCGGGAA-1 1043 | CCAGAAACCCTGTC-1 1044 | CCAGAAACGAACTC-1 1045 | CCAGAAACGGTCTA-1 1046 | CCAGACCTCTGAGT-1 1047 | CCAGACCTTGTGGT-1 1048 | CCAGCACTGCGATT-1 1049 | CCAGCGGAAAGGCG-1 1050 | CCAGCGGACGACTA-1 1051 | CCAGCGGATGGGAG-1 1052 | CCAGCTACACAGTC-1 1053 | CCAGCTACCAGCTA-1 1054 | CCAGGTCTACACCA-1 1055 | CCAGGTCTAGCATC-1 1056 | CCAGGTCTATGGTC-1 1057 | CCAGTCACACTGGT-1 1058 | CCAGTCACACTGTG-1 1059 | CCAGTCACGTTGTG-1 1060 | CCAGTCTGCGGAGA-1 1061 | CCAGTGCTAACCAC-1 1062 | CCAGTGCTCGTAGT-1 1063 | CCATCCGAAAGCAA-1 1064 | CCATCCGAACGACT-1 1065 | CCATCCGAAGGTTC-1 1066 | CCATCCGATTCGCC-1 1067 | CCATCGTGAACGGG-1 1068 | CCATCGTGCTAGAC-1 1069 | CCCAACACCTCGCT-1 1070 | CCCAACACGCATCA-1 1071 | CCCAACACTTTGTC-1 1072 | CCCAACTGCAATCG-1 1073 | CCCAGACTGCCTTC-1 1074 | CCCAGACTGGTTTG-1 1075 | CCCAGACTTTCGCC-1 1076 | CCCAGTTGCAGTTG-1 1077 | CCCAGTTGGGTACT-1 1078 | CCCAGTTGTCTATC-1 1079 | CCCGATTGTGTTTC-1 1080 | CCCGGAGAAGGGTG-1 1081 | CCCTACGAATTGGC-1 1082 | CCCTAGTGCAAAGA-1 1083 | CCCTCAGACACTTT-1 1084 | CCCTCAGACGAGAG-1 1085 | CCCTCAGAGGTCAT-1 1086 | CCCTGAACTAAAGG-1 1087 | CCCTGATGCAACCA-1 1088 | CCCTGATGCAAGCT-1 1089 | CCCTTACTAACCAC-1 1090 | CCCTTACTGCAGTT-1 1091 | CCGAAAACCTTGTT-1 1092 | CCGACACTGGTTTG-1 1093 | CCGACTACCCAGTA-1 1094 | CCGACTACCGTGTA-1 1095 | CCGACTACTGAGGG-1 1096 | CCGATAGACCTAAG-1 1097 | CCGATAGAGTTGGT-1 1098 | 
CCGCGAGACACACA-1 1099 | CCGCGAGAGGTTCA-1 1100 | CCGCTATGGGACGA-1 1101 | CCGCTATGTGCAAC-1 1102 | CCGCTATGTGCACA-1 1103 | CCGGTACTGTCCTC-1 1104 | CCGTACACAAGCAA-1 1105 | CCGTACACAGCGTT-1 1106 | CCGTACACGTCATG-1 1107 | CCGTACACGTTGGT-1 1108 | CCGTACACTAACGC-1 1109 | CCTAAACTTTCGTT-1 1110 | CCTAAGGACCCAAA-1 1111 | CCTAAGGACTAGCA-1 1112 | CCTAAGGAGGGCAA-1 1113 | CCTAAGGATGATGC-1 1114 | CCTAAGGATGTCAG-1 1115 | CCTACCGACTCTTA-1 1116 | CCTACCGAGGGATG-1 1117 | CCTAGAGAGGTGAG-1 1118 | CCTATAACCAAAGA-1 1119 | CCTATAACGAGACG-1 1120 | CCTATAACTCAGAC-1 1121 | CCTATAACTGCATG-1 1122 | CCTCGAACACTTTC-1 1123 | CCTCGAACCCGTAA-1 1124 | CCTCGAACGTATCG-1 1125 | CCTCGAACTTACTC-1 1126 | CCTCTACTCTTCGC-1 1127 | CCTCTACTGGCATT-1 1128 | CCTGACTGAAGTAG-1 1129 | CCTGACTGGGGAGT-1 1130 | CCTGACTGTGTCTT-1 1131 | CCTGCAACACGTTG-1 1132 | CCTGGACTCGTGAT-1 1133 | CCTTAATGCCCAAA-1 1134 | CCTTAATGTTCTAC-1 1135 | CCTTCACTACGACT-1 1136 | CCTTCACTCAGTCA-1 1137 | CCTTCACTGGAGTG-1 1138 | CCTTTAGATTCATC-1 1139 | CGAACATGCCCTAC-1 1140 | CGAACATGTCAGAC-1 1141 | CGAAGACTGGAACG-1 1142 | CGAAGACTGTTACG-1 1143 | CGAAGGGAAACCTG-1 1144 | CGAAGGGATCCGAA-1 1145 | CGAAGTACCAACTG-1 1146 | CGAATCGAGGAGCA-1 1147 | CGAATCGAGGAGGT-1 1148 | CGACAAACCCATAG-1 1149 | CGACAAACCGACAT-1 1150 | CGACCACTAAAGTG-1 1151 | CGACCACTGCCAAT-1 1152 | CGACCGGAAGGTCT-1 1153 | CGACCGGATGGAAA-1 1154 | CGACCTTGCTAGTG-1 1155 | CGACCTTGGCAAGG-1 1156 | CGACGTCTATCGTG-1 1157 | CGACGTCTCGTGTA-1 1158 | CGACGTCTGAGGCA-1 1159 | CGACTCACGTCGTA-1 1160 | CGACTCACGTTGCA-1 1161 | CGACTCTGTGTGAC-1 1162 | CGACTGCTTCCTCG-1 1163 | CGAGAACTAAGGCG-1 1164 | CGAGAACTACGTTG-1 1165 | CGAGAACTTGTTCT-1 1166 | CGAGATTGGACACT-1 1167 | CGAGATTGGCCATA-1 1168 | CGAGCCGAACACCA-1 1169 | CGAGCCGACGACAT-1 1170 | CGAGCCGAGGCGAA-1 1171 | CGAGCGTGCTCCAC-1 1172 | CGAGCGTGGATACC-1 1173 | CGAGCGTGTATGCG-1 1174 | CGAGGAGACCTCCA-1 1175 | CGAGGAGATGTCGA-1 1176 | CGAGGCACCTATGG-1 1177 | CGAGGCACTATGCG-1 1178 | CGAGGCACTCTTCA-1 1179 | CGAGGCTGACGCTA-1 1180 | CGAGGCTGGCAGTT-1 1181 | 
CGAGGGCTACGACT-1 1182 | CGAGGGCTCGAATC-1 1183 | CGAGTATGTCACCC-1 1184 | CGATACGAACAGTC-1 1185 | CGATACGACAGGAG-1 1186 | CGATACGATTCACT-1 1187 | CGATAGACCCGTAA-1 1188 | CGATAGACCGTACA-1 1189 | CGATAGACGTAGGG-1 1190 | CGATAGACTGTTCT-1 1191 | CGATCAGAAGAACA-1 1192 | CGATCAGAGAGGGT-1 1193 | CGATCAGAGGTACT-1 1194 | CGATCAGATGTGAC-1 1195 | CGATCCACCGGGAA-1 1196 | CGATCCACTTCCAT-1 1197 | CGCAAATGCTCGAA-1 1198 | CGCAACCTCCTTGC-1 1199 | CGCAACCTGGACGA-1 1200 | CGCACGGAGGACGA-1 1201 | CGCACGGATCTTTG-1 1202 | CGCACTACAGAATG-1 1203 | CGCACTACAGCCAT-1 1204 | CGCACTACATTGGC-1 1205 | CGCACTACTCGCCT-1 1206 | CGCACTACTCGTGA-1 1207 | CGCACTTGTCACGA-1 1208 | CGCAGGACAGATCC-1 1209 | CGCAGGACCTACTT-1 1210 | CGCAGGACTTGTCT-1 1211 | CGCAGGTGCACTGA-1 1212 | CGCAGGTGCCATAG-1 1213 | CGCAGGTGGGAACG-1 1214 | CGCATAGATCACGA-1 1215 | CGCCATACTGCAAC-1 1216 | CGCCATTGAGAGGC-1 1217 | CGCCATTGCTATGG-1 1218 | CGCCATTGGAGACG-1 1219 | CGCCATTGGAGCAG-1 1220 | CGCCATTGTACTGG-1 1221 | CGCCGAGAGCTTAG-1 1222 | CGCCTAACGAATGA-1 1223 | CGCCTAACTGCTCC-1 1224 | CGCGAGACACAGCT-1 1225 | CGCGAGACAGGTCT-1 1226 | CGCGAGACGCTACA-1 1227 | CGCGATCTCAGTCA-1 1228 | CGCGATCTGTTGAC-1 1229 | CGCGATCTTTCTTG-1 1230 | CGCGGATGGCCAAT-1 1231 | CGCTAAGAATGTCG-1 1232 | CGCTAAGACAACTG-1 1233 | CGCTAAGACCCTTG-1 1234 | CGCTACTGAACAGA-1 1235 | CGCTACTGAGAACA-1 1236 | CGCTACTGTGAGCT-1 1237 | CGCTACTGTTCCCG-1 1238 | CGCTCATGCATTTC-1 1239 | CGGAATTGCACTAG-1 1240 | CGGAATTGGTTTGG-1 1241 | CGGAATTGTGGAGG-1 1242 | CGGACCGATGCGTA-1 1243 | CGGACCGATGGGAG-1 1244 | CGGACTCTAAACAG-1 1245 | CGGACTCTCCAATG-1 1246 | CGGACTCTCCTCGT-1 1247 | CGGAGGCTATTCCT-1 1248 | CGGAGGCTTGGATC-1 1249 | CGGATAACAACGAA-1 1250 | CGGATAACAGCTCA-1 1251 | CGGATAACTCAGTG-1 1252 | CGGCACGAACTCAG-1 1253 | CGGCACGAAGGGTG-1 1254 | CGGCACGACTACGA-1 1255 | CGGCATCTTAGAAG-1 1256 | CGGCATCTTCGTAG-1 1257 | CGGCCAGAAAGGTA-1 1258 | CGGCCAGAGAGGCA-1 1259 | CGGCGAACCAGTCA-1 1260 | CGGCGAACGACAAA-1 1261 | CGGCGAACGGTCTA-1 1262 | CGGCGAACTACTTC-1 1263 | CGGGACTGCGTGTA-1 1264 | 
CGGGACTGGAATAG-1 1265 | CGGGCATGACCCAA-1 1266 | CGGGCATGTCTCTA-1 1267 | CGGGCATGTTGTGG-1 1268 | CGGTAAACTCGCAA-1 1269 | CGGTCACTGTTTGG-1 1270 | CGGTCACTTACTTC-1 1271 | CGTAACGAATCAGC-1 1272 | CGTAACGATCGCCT-1 1273 | CGTACCACACACAC-1 1274 | CGTACCACACGTTG-1 1275 | CGTACCACCTCATT-1 1276 | CGTACCACGCTACA-1 1277 | CGTACCACGGAGCA-1 1278 | CGTACCTGGACGAG-1 1279 | CGTACCTGGCATCA-1 1280 | CGTAGCCTCTCTCG-1 1281 | CGTAGCCTGCGAAG-1 1282 | CGTAGCCTGTATGC-1 1283 | CGTCAAGAAAGGTA-1 1284 | CGTCAAGAACGTGT-1 1285 | CGTCAAGACAGAGG-1 1286 | CGTCAAGACAGGAG-1 1287 | CGTCCATGCTCTTA-1 1288 | CGTCGACTTTCCGC-1 1289 | CGTGATGACGCTAA-1 1290 | CGTGATGAGGTTCA-1 1291 | CGTGCACTTATGGC-1 1292 | CGTGTAGAAAAACG-1 1293 | CGTGTAGACGATAC-1 1294 | CGTGTAGAGTTACG-1 1295 | CGTGTAGAGTTCAG-1 1296 | CGTGTAGATTCGGA-1 1297 | CGTTAGGAAACCAC-1 1298 | CGTTAGGATCATTC-1 1299 | CGTTATACCCTGAA-1 1300 | CGTTTAACTGGTCA-1 1301 | CTAAACCTCTGACA-1 1302 | CTAAACCTGTGCAT-1 1303 | CTAACACTAACGTC-1 1304 | CTAACACTAGTGCT-1 1305 | CTAACGGAACCGAT-1 1306 | CTAACGGATTTCTG-1 1307 | CTAACTACGGCAAG-1 1308 | CTAAGGACACCATG-1 1309 | CTAAGGACCGTTAG-1 1310 | CTAAGGACGCCATA-1 1311 | CTAAGGTGCCTAAG-1 1312 | CTAAGGTGTTGCAG-1 1313 | CTAAGGTGTTTCTG-1 1314 | CTAATAGAGCTATG-1 1315 | CTAATGCTTGTGGT-1 1316 | CTACAACTCCCGTT-1 1317 | CTACCTCTCAACCA-1 1318 | CTACGCACACCTAG-1 1319 | CTACGCACTCTCCG-1 1320 | CTACGCACTGGTCA-1 1321 | CTACGGCTTTCTTG-1 1322 | CTACTATGAACCAC-1 1323 | CTACTATGATGTGC-1 1324 | CTACTATGCTAAGC-1 1325 | CTACTATGTAAAGG-1 1326 | CTACTCCTATGTCG-1 1327 | CTACTCCTGCCATA-1 1328 | CTAGAGACACTTTC-1 1329 | CTAGAGACAGCATC-1 1330 | CTAGAGACTTTGGG-1 1331 | CTAGATCTCTCTAT-1 1332 | CTAGATCTTCGACA-1 1333 | CTAGGATGAGCCTA-1 1334 | CTAGGATGATCGTG-1 1335 | CTAGGCCTCTCAGA-1 1336 | CTAGGTGATGGTTG-1 1337 | CTAGTTACCAGAGG-1 1338 | CTAGTTACCGCATA-1 1339 | CTAGTTACGAAACA-1 1340 | CTAGTTTGAGTACC-1 1341 | CTATAAGATCGTTT-1 1342 | CTATACTGAGGTTC-1 1343 | CTATACTGCCAGTA-1 1344 | CTATACTGCGCTAA-1 1345 | CTATACTGCTACGA-1 1346 | CTATACTGTCTCAT-1 1347 | 
CTATACTGTTCGTT-1 1348 | CTATAGCTGTCACA-1 1349 | CTATAGCTTCGCTC-1 1350 | CTATAGCTTGCCTC-1 1351 | CTATCAACGAACTC-1 1352 | CTATCAACGCAGAG-1 1353 | CTATCAACTTTGGG-1 1354 | CTATCCCTCCACCT-1 1355 | CTATGTACGAGAGC-1 1356 | CTATGTACGCTTAG-1 1357 | CTATGTACTGTTTC-1 1358 | CTATGTTGAAAGCA-1 1359 | CTATGTTGTCCTCG-1 1360 | CTATGTTGTCTCGC-1 1361 | CTATTGACAAACGA-1 1362 | CTATTGACACTGGT-1 1363 | CTATTGACGGTGAG-1 1364 | CTATTGTGGCAAGG-1 1365 | CTCAATTGGTTCAG-1 1366 | CTCAATTGGTTGCA-1 1367 | CTCAGAGATAGAAG-1 1368 | CTCAGCACTCTAGG-1 1369 | CTCAGCACTGAACC-1 1370 | CTCAGCACTTGCAG-1 1371 | CTCAGCTGAACCTG-1 1372 | CTCAGCTGCAGTTG-1 1373 | CTCAGCTGTTTCTG-1 1374 | CTCAGGCTCGTTGA-1 1375 | CTCAGGCTGCTAAC-1 1376 | CTCATTGACCTTAT-1 1377 | CTCATTGATGCTTT-1 1378 | CTCATTGATTGCTT-1 1379 | CTCCACGAGAGATA-1 1380 | CTCCATCTCTTAGG-1 1381 | CTCCATCTGACGAG-1 1382 | CTCCGAACAAGTGA-1 1383 | CTCCTACTGCCTTC-1 1384 | CTCGAAGATGTGGT-1 1385 | CTCGAAGATTAGGC-1 1386 | CTCGACTGCTCTAT-1 1387 | CTCGACTGGGTGAG-1 1388 | CTCGACTGGTTGAC-1 1389 | CTCGAGCTCTGGAT-1 1390 | CTCGCATGACTTTC-1 1391 | CTCGCATGCTTAGG-1 1392 | CTCTAAACCTCGAA-1 1393 | CTCTAAACGGCGAA-1 1394 | CTCTAATGTCCAAG-1 1395 | CTGAACGACAGTCA-1 1396 | CTGAACGATGAGGG-1 1397 | CTGAAGACCCAACA-1 1398 | CTGAAGACGTGCAT-1 1399 | CTGAAGTGAAGCCT-1 1400 | CTGAAGTGCAGCTA-1 1401 | CTGAAGTGGCTATG-1 1402 | CTGAAGTGTCCAGA-1 1403 | CTGAATCTGAATAG-1 1404 | CTGACAGAATCGTG-1 1405 | CTGACCACAGCAAA-1 1406 | CTGAGAACCGGGAA-1 1407 | CTGAGAACGTAAAG-1 1408 | CTGATACTAGTAGA-1 1409 | CTGATTTGGTGTTG-1 1410 | CTGCAGCTAACCGT-1 1411 | CTGCAGCTGACACT-1 1412 | CTGCAGCTGGATTC-1 1413 | CTGCAGCTTGGCAT-1 1414 | CTGCCAACAGGAGC-1 1415 | CTGCCAACCAGCTA-1 1416 | CTGCCAACTAACCG-1 1417 | CTGCCAACTGCTCC-1 1418 | CTGCCAACTTGCAG-1 1419 | CTGCCAACTTGCTT-1 1420 | CTGCGACTCCACCT-1 1421 | CTGGAAACAAACGA-1 1422 | CTGGAAACATCGAC-1 1423 | CTGGATGACTGGAT-1 1424 | CTGGATGACTTGTT-1 1425 | CTGGATGATGTGAC-1 1426 | CTGGCACTCAAGCT-1 1427 | CTGGCACTGGACAG-1 1428 | CTGTAACTAACCAC-1 1429 | CTGTAACTAGCGTT-1 1430 | 
CTGTATACGTAAAG-1 1431 | CTGTATACGTACGT-1 1432 | CTGTATACGTTGGT-1 1433 | CTGTGAGACAACCA-1 1434 | CTGTGAGACCTTGC-1 1435 | CTGTGAGACGAACT-1 1436 | CTGTGAGACTGTAG-1 1437 | CTTAAAGAACCTGA-1 1438 | CTTAACACCTGTAG-1 1439 | CTTAACACGAGCTT-1 1440 | CTTAACACTATCGG-1 1441 | CTTAAGCTACCTAG-1 1442 | CTTAAGCTAGTACC-1 1443 | CTTAAGCTCATCAG-1 1444 | CTTAAGCTCCGCTT-1 1445 | CTTAAGCTTCCTCG-1 1446 | CTTACAACTAACGC-1 1447 | CTTACAACTCCCGT-1 1448 | CTTACTGACGTACA-1 1449 | CTTAGACTAAACGA-1 1450 | CTTAGGGACTTGCC-1 1451 | CTTAGGGAGAATCC-1 1452 | CTTATCGACTCATT-1 1453 | CTTCACCTACCTGA-1 1454 | CTTCATGAAGCATC-1 1455 | CTTCATGAAGTACC-1 1456 | CTTCATGACCGAAT-1 1457 | CTTGAACTACGCAT-1 1458 | CTTGATTGAGGTTC-1 1459 | CTTGATTGATCTTC-1 1460 | CTTGATTGCATTCT-1 1461 | CTTGATTGTTTCGT-1 1462 | CTTGTATGACACCA-1 1463 | CTTGTATGCGCAAT-1 1464 | CTTTACGAGCGAAG-1 1465 | CTTTAGACCGTGAT-1 1466 | CTTTAGACGAGACG-1 1467 | CTTTAGACGATACC-1 1468 | CTTTAGACGTTGGT-1 1469 | CTTTAGACTCATTC-1 1470 | CTTTAGTGACGGGA-1 1471 | CTTTAGTGGGTGGA-1 1472 | CTTTCAGAGAAACA-1 1473 | CTTTGATGAGCACT-1 1474 | CTTTGATGTCTAGG-1 1475 | CTTTGATGTGTCCC-1 1476 | CTTTGATGTGTGGT-1 1477 | GAAACAGAACTACG-1 1478 | GAAACAGAATCACG-1 1479 | GAAACAGACATTCT-1 1480 | GAAACCTGATCGTG-1 1481 | GAAACCTGATGCCA-1 1482 | GAAACCTGCTTATC-1 1483 | GAAACCTGGACTAC-1 1484 | GAAACCTGTGCTAG-1 1485 | GAAAGATGATTTCC-1 1486 | GAAAGATGCTGATG-1 1487 | GAAAGATGCTTCGC-1 1488 | GAAAGATGTAAGGA-1 1489 | GAAAGATGTTTGCT-1 1490 | GAAAGCCTACGTTG-1 1491 | GAAAGTGAAAGTGA-1 1492 | GAAAGTGACCACAA-1 1493 | GAAAGTGACTCAAG-1 1494 | GAAATACTACCAAC-1 1495 | GAAATACTCTTAGG-1 1496 | GAAATACTTCCTCG-1 1497 | GAACACACGTGCAT-1 1498 | GAACACACTGCCTC-1 1499 | GAACAGCTAACTGC-1 1500 | GAACAGCTCTCAGA-1 1501 | GAACCAACCACAAC-1 1502 | GAACCAACTTCCGC-1 1503 | GAACCTGAACGTGT-1 1504 | GAACCTGAGAGACG-1 1505 | GAACCTGATGAACC-1 1506 | GAACGGGATACTTC-1 1507 | GAACGTTGACGGAG-1 1508 | GAACTGTGACCTGA-1 1509 | GAACTGTGCCAGTA-1 1510 | GAAGAATGCAATCG-1 1511 | GAAGCGGACCTATT-1 1512 | GAAGCTACGAATGA-1 1513 | 
GAAGCTACGGTTTG-1 1514 | GAAGGGTGAAAGTG-1 1515 | GAAGGGTGCTTAGG-1 1516 | GAAGGTCTGAAAGT-1 1517 | GAAGGTCTGTTGCA-1 1518 | GAAGGTCTTAAAGG-1 1519 | GAAGTAGACTCCCA-1 1520 | GAAGTAGATCCAAG-1 1521 | GAAGTCACCCTCGT-1 1522 | GAAGTCACCCTGTC-1 1523 | GAAGTCTGTCGCAA-1 1524 | GAAGTCTGTTCTGT-1 1525 | GAAGTGCTAAACGA-1 1526 | GAAGTGCTCCGCTT-1 1527 | GAAGTGCTTAACCG-1 1528 | GAATGCACCCTAAG-1 1529 | GAATGCACCTTCGC-1 1530 | GAATGCTGCGGTAT-1 1531 | GAATGGCTAAGATG-1 1532 | GAATTAACGATAAG-1 1533 | GAATTAACGGTCAT-1 1534 | GAATTAACGTCGTA-1 1535 | GAATTAACTGAAGA-1 1536 | GACAACACAGGCGA-1 1537 | GACAACACATCGTG-1 1538 | GACAACACTCGCCT-1 1539 | GACAACTGAGGTTC-1 1540 | GACAGGGAAGAGTA-1 1541 | GACAGGGAATGCCA-1 1542 | GACAGTACGAGCTT-1 1543 | GACAGTACTTCGGA-1 1544 | GACAGTTGAGTAGA-1 1545 | GACATTCTCCACCT-1 1546 | GACCAAACGACTAC-1 1547 | GACCAAACGTATCG-1 1548 | GACCATGACTCTCG-1 1549 | GACCCTACTAAAGG-1 1550 | GACCTAGACCTCAC-1 1551 | GACCTAGACGAGAG-1 1552 | GACCTCACAAGGTA-1 1553 | GACCTCACGTACGT-1 1554 | GACCTCTGCATCAG-1 1555 | GACGAACTCCCACT-1 1556 | GACGATTGCCAATG-1 1557 | GACGCCGACCTTCG-1 1558 | GACGCTCTCTCTCG-1 1559 | GACGGCACACGGGA-1 1560 | GACGGCACGAGATA-1 1561 | GACGTAACCTATGG-1 1562 | GACGTAACCTGTGA-1 1563 | GACGTAACTATGGC-1 1564 | GACGTATGTTGACG-1 1565 | GACGTATGTTTGCT-1 1566 | GACGTCCTACGGAG-1 1567 | GACGTCCTCTCAAG-1 1568 | GACGTCCTGATAAG-1 1569 | GACTACGATGGTCA-1 1570 | GACTCCTGCTCGCT-1 1571 | GACTCCTGGGTTAC-1 1572 | GACTCCTGTTATCC-1 1573 | GACTCCTGTTGGTG-1 1574 | GACTGAACAACCGT-1 1575 | GACTGAACCAATCG-1 1576 | GACTGATGTGATGC-1 1577 | GACTTTACATGCCA-1 1578 | GACTTTACGACAGG-1 1579 | GAGAAATGTTCTCA-1 1580 | GAGATAGAAAAAGC-1 1581 | GAGATCACGACAAA-1 1582 | GAGATGCTCTGGAT-1 1583 | GAGATGCTGAATGA-1 1584 | GAGCAGGATTCCCG-1 1585 | GAGCATACTTTGCT-1 1586 | GAGCGCACGCGTAT-1 1587 | GAGCGCACGGTGAG-1 1588 | GAGCGCTGAAGATG-1 1589 | GAGCGCTGTCTTAC-1 1590 | GAGCGGCTGGGAGT-1 1591 | GAGGACGACTCAGA-1 1592 | GAGGATCTGAAAGT-1 1593 | GAGGCAGACTTGCC-1 1594 | GAGGGAACACCAGT-1 1595 | GAGGGAACGAGGGT-1 1596 | 
GAGGGATGGGAAAT-1 1597 | GAGGGCCTTCACCC-1 1598 | GAGGGTGAAGAGTA-1 1599 | GAGGTACTACGGTT-1 1600 | GAGGTACTACTCAG-1 1601 | GAGGTACTGACACT-1 1602 | GAGGTACTGGGAGT-1 1603 | GAGGTACTTAGCGT-1 1604 | GAGGTGGAGTACGT-1 1605 | GAGGTGGATCCTCG-1 1606 | GAGGTTACTCGTTT-1 1607 | GAGGTTTGTAAGCC-1 1608 | GAGTCAACCATTCT-1 1609 | GAGTCAACGGGAGT-1 1610 | GAGTCTGATCGTGA-1 1611 | GAGTCTGATTTGGG-1 1612 | GAGTGACTCAGCTA-1 1613 | GAGTGACTCGGTAT-1 1614 | GAGTGACTCTTGCC-1 1615 | GAGTGACTGACTAC-1 1616 | GAGTGACTGTCTAG-1 1617 | GAGTGGGAGTCTTT-1 1618 | GAGTGGGATGCCCT-1 1619 | GAGTGGGATGCTGA-1 1620 | GAGTGTTGCTGTAG-1 1621 | GAGTGTTGTGGTCA-1 1622 | GAGTTGTGCATGGT-1 1623 | GAGTTGTGCTGAGT-1 1624 | GAGTTGTGGCGAGA-1 1625 | GAGTTGTGGTAGCT-1 1626 | GAGTTGTGTATGCG-1 1627 | GATAAGGAGAAACA-1 1628 | GATAAGGATTCACT-1 1629 | GATACTCTATCGGT-1 1630 | GATACTCTTACTTC-1 1631 | GATACTCTTGACTG-1 1632 | GATAGAGAAGGGTG-1 1633 | GATAGAGACTGTGA-1 1634 | GATAGAGATCACGA-1 1635 | GATAGCACCCATAG-1 1636 | GATAGCACGAAGGC-1 1637 | GATAGCACTTGTCT-1 1638 | GATATAACAAGGTA-1 1639 | GATATAACACGCAT-1 1640 | GATATATGCTGGAT-1 1641 | GATATATGTCCGTC-1 1642 | GATATATGTGGAGG-1 1643 | GATATCCTAGAAGT-1 1644 | GATATCCTCCCGTT-1 1645 | GATATTGACAGGAG-1 1646 | GATATTGACGAGTT-1 1647 | GATATTGAGCCAAT-1 1648 | GATCCCTGACCTTT-1 1649 | GATCCCTGTGTAGC-1 1650 | GATCCGCTGGTCAT-1 1651 | GATCGAACCGAGAG-1 1652 | GATCGATGACTAGC-1 1653 | GATCGATGGTAAAG-1 1654 | GATCGATGTAAGGA-1 1655 | GATCGTGACACTAG-1 1656 | GATCGTGATTCACT-1 1657 | GATCTACTGGTGAG-1 1658 | GATCTACTTTGCAG-1 1659 | GATCTTACACCCAA-1 1660 | GATCTTACCCTACC-1 1661 | GATCTTACGAATAG-1 1662 | GATCTTACGAGATA-1 1663 | GATGCAACTCCAGA-1 1664 | GATGCCCTACGTAC-1 1665 | GATGCCCTCTCATT-1 1666 | GATGCCCTGGCAAG-1 1667 | GATGCCCTTTTGCT-1 1668 | GATTACCTTGTTCT-1 1669 | GATTCGGAACGACT-1 1670 | GATTCGGACAGGAG-1 1671 | GATTCGGAGAAGGC-1 1672 | GATTCTTGATTCGG-1 1673 | GATTCTTGCCGATA-1 1674 | GATTCTTGCGAGTT-1 1675 | GATTGGACCCGTTC-1 1676 | GATTGGACGGTGTT-1 1677 | GATTGGACTTTCGT-1 1678 | GATTGGTGTGTCAG-1 1679 | 
GATTTAGACACTCC-1 1680 | GATTTAGACTAAGC-1 1681 | GATTTAGATTCGTT-1 1682 | GATTTGCTAACGAA-1 1683 | GATTTGCTAACGGG-1 1684 | GCAACCCTCCTCGT-1 1685 | GCAACTGATTGCGA-1 1686 | GCAAGACTACTGGT-1 1687 | GCAAGACTAGGTCT-1 1688 | GCAAGACTCCCTTG-1 1689 | GCAATCGACTGCAA-1 1690 | GCAATCGAGACGTT-1 1691 | GCAATCGATCCTTA-1 1692 | GCAATTCTCGTGTA-1 1693 | GCAATTCTTCTCCG-1 1694 | GCACAAACAATGCC-1 1695 | GCACAAACGGTACT-1 1696 | GCACAATGGTGCAT-1 1697 | GCACACCTGTGCTA-1 1698 | GCACCACTCATGAC-1 1699 | GCACCACTGTTTGG-1 1700 | GCACCACTTCCTTA-1 1701 | GCACCACTTTCGGA-1 1702 | GCACCTACGCGATT-1 1703 | GCACCTTGGCTGTA-1 1704 | GCACCTTGGGGAGT-1 1705 | GCACGGACCAGCTA-1 1706 | GCACGGTGACCTCC-1 1707 | GCACGGTGCTATGG-1 1708 | GCACTAGAACGGGA-1 1709 | GCACTAGAAGATGA-1 1710 | GCACTAGACCTTTA-1 1711 | GCACTAGACGTAAC-1 1712 | GCACTAGAGTCGTA-1 1713 | GCACTAGATGCAAC-1 1714 | GCACTGCTGAGGCA-1 1715 | GCAGATACAGCGTT-1 1716 | GCAGATACGACGGA-1 1717 | GCAGATACGCAGAG-1 1718 | GCAGCCGACAGTCA-1 1719 | GCAGCGTGCACTCC-1 1720 | GCAGCTCTCAATCG-1 1721 | GCAGCTCTGTTTCT-1 1722 | GCAGGGCTAAGAAC-1 1723 | GCAGGGCTAAGGGC-1 1724 | GCAGGGCTATCGAC-1 1725 | GCAGGGCTTGGGAG-1 1726 | GCAGTCCTAACTGC-1 1727 | GCAGTCCTCTCTTA-1 1728 | GCATCAGATGCGTA-1 1729 | GCATGTGACAAGCT-1 1730 | GCATTGGAGAAGGC-1 1731 | GCCAAAACGAGGCA-1 1732 | GCCAAATGATCGAC-1 1733 | GCCAACCTACGGTT-1 1734 | GCCAACCTCGCCTT-1 1735 | GCCACGGAGGCGAA-1 1736 | GCCACGGATACTGG-1 1737 | GCCACTACCTACTT-1 1738 | GCCACTACGTCTTT-1 1739 | GCCCAACTACCGAT-1 1740 | GCCCAACTATGGTC-1 1741 | GCCCATACAGCGTT-1 1742 | GCCGACGAACTCTT-1 1743 | GCCGAGTGCGTTGA-1 1744 | GCCGGAACGAACTC-1 1745 | GCCGGAACGTTCTT-1 1746 | GCCGGAACTGCACA-1 1747 | GCCGGAACTTACTC-1 1748 | GCCGTACTACCTGA-1 1749 | GCCGTACTGGCAAG-1 1750 | GCCTACACAGTTCG-1 1751 | GCCTACACCACTGA-1 1752 | GCCTACACCTTGAG-1 1753 | GCCTAGCTACGGAG-1 1754 | GCCTAGCTCTATTC-1 1755 | GCCTAGCTTCTCAT-1 1756 | GCCTAGCTTCTCTA-1 1757 | GCCTCAACCATGGT-1 1758 | GCCTCAACTCTTTG-1 1759 | GCCTCATGTCTTAC-1 1760 | GCCTGACTCTCAAG-1 1761 | GCGAAGGAACTCTT-1 1762 | 
GCGAAGGAGAGCTT-1 1763 | GCGAAGGATGCCAA-1 1764 | GCGAGAGAGGGACA-1 1765 | GCGAGCACTGTCGA-1 1766 | GCGAGCACTTGACG-1 1767 | GCGAGCACTTGCTT-1 1768 | GCGATATGGTACGT-1 1769 | GCGATATGGTGTTG-1 1770 | GCGCACGAAGTCGT-1 1771 | GCGCACGACTTTAC-1 1772 | GCGCATCTAGGTCT-1 1773 | GCGCATCTGGTTAC-1 1774 | GCGCATCTTCGATG-1 1775 | GCGCATCTTGCTCC-1 1776 | GCGCATCTTTCTAC-1 1777 | GCGCGAACGTTCTT-1 1778 | GCGCGATGAACGGG-1 1779 | GCGCGATGGTGCAT-1 1780 | GCGGAGCTCCTGAA-1 1781 | GCGGCAACCCGATA-1 1782 | GCGGCAACGGAGGT-1 1783 | GCGGCAACTGTCGA-1 1784 | GCGTAAACACGGTT-1 1785 | GCGTAATGCACCAA-1 1786 | GCGTATGAACACCA-1 1787 | GCGTATGATGAGAA-1 1788 | GCTACAGAAAGGTA-1 1789 | GCTACAGAATCTTC-1 1790 | GCTACAGATCTTAC-1 1791 | GCTACCTGAGAAGT-1 1792 | GCTACCTGATCACG-1 1793 | GCTACGCTAGAATG-1 1794 | GCTACGCTAGCTAC-1 1795 | GCTACGCTCCCTAC-1 1796 | GCTAGAACAGAGGC-1 1797 | GCTAGAACGGATCT-1 1798 | GCTAGAACTCCCGT-1 1799 | GCTAGATGAGCTCA-1 1800 | GCTAGATGGCGATT-1 1801 | GCTATACTAAGGCG-1 1802 | GCTATACTAGCGTT-1 1803 | GCTATACTCTCTTA-1 1804 | GCTATACTGGACGA-1 1805 | GCTCAAGAACCATG-1 1806 | GCTCAAGAAGTCAC-1 1807 | GCTCAGCTGTCTAG-1 1808 | GCTCCATGAGAAGT-1 1809 | GCTCCATGCCGAAT-1 1810 | GCTCGACTCTAGTG-1 1811 | GCTGATGAGGTATC-1 1812 | GCTTAACTACAGTC-1 1813 | GCTTAACTACTGGT-1 1814 | GCTTAACTGCTGAT-1 1815 | GCTTAACTTAGACC-1 1816 | GCTTAACTTCAGTG-1 1817 | GGAACACTCACTTT-1 1818 | GGAACACTTCAGAC-1 1819 | GGAACTACTACTTC-1 1820 | GGAACTTGAAGGTA-1 1821 | GGAACTTGAGAATG-1 1822 | GGAACTTGCTCCAC-1 1823 | GGAACTTGGGTAGG-1 1824 | GGAAGGACATCGGT-1 1825 | GGAAGGACCACTAG-1 1826 | GGAAGGACGAGGGT-1 1827 | GGAAGGACGCGAAG-1 1828 | GGAAGGTGGCGAGA-1 1829 | GGAATCTGAAGGGC-1 1830 | GGAATCTGAGGAGC-1 1831 | GGAATCTGCTTAGG-1 1832 | GGAATCTGCTTGTT-1 1833 | GGAATCTGGGAGGT-1 1834 | GGAATGCTTTCTAC-1 1835 | GGACAGGAAAGGGC-1 1836 | GGACAGGAGTGCTA-1 1837 | GGACAGGATCTCGC-1 1838 | GGACCCGAAGCTAC-1 1839 | GGACCGTGCTTACT-1 1840 | GGACCGTGGGAACG-1 1841 | GGACCGTGTAACGC-1 1842 | GGACCTCTGTAAGA-1 1843 | GGACCTCTTTTCTG-1 1844 | GGACGAGAGTGTCA-1 1845 | 
GGACGCTGACGCAT-1 1846 | GGACGCTGCTAGCA-1 1847 | GGACGCTGTCCTCG-1 1848 | GGAGAGACGTGAGG-1 1849 | GGAGCAGATTCAGG-1 1850 | GGAGCCACCTTCTA-1 1851 | GGAGCGCTACGCAT-1 1852 | GGAGCGCTCCGAAT-1 1853 | GGAGGATGCCACCT-1 1854 | GGAGGATGGTTGAC-1 1855 | GGAGGATGTCAGTG-1 1856 | GGAGGCCTCGTTGA-1 1857 | GGAGGCCTTTCTTG-1 1858 | GGAGGTGATACGCA-1 1859 | GGAGGTGATCGCTC-1 1860 | GGATACTGCAGCTA-1 1861 | GGATACTGTCTAGG-1 1862 | GGATAGCTCGTCTC-1 1863 | GGATAGCTCTGAAC-1 1864 | GGATGTACCAAAGA-1 1865 | GGATGTACGCGAAG-1 1866 | GGATGTACGTCTTT-1 1867 | GGATGTACGTGTCA-1 1868 | GGATTTCTAGGTTC-1 1869 | GGATTTCTTTGTCT-1 1870 | GGCAAGGAAAAAGC-1 1871 | GGCAAGGAAGAAGT-1 1872 | GGCAAGGACTTGGA-1 1873 | GGCAAGGAGGACTT-1 1874 | GGCAATACGCTAAC-1 1875 | GGCAATACGGCATT-1 1876 | GGCAATACGTTTCT-1 1877 | GGCACGTGGCTTAG-1 1878 | GGCACGTGTGAGAA-1 1879 | GGCACTCTTTTGTC-1 1880 | GGCATATGCTTATC-1 1881 | GGCATATGGGGAGT-1 1882 | GGCATATGTGTGAC-1 1883 | GGCCACGACAGAGG-1 1884 | GGCCAGACTGGTTG-1 1885 | GGCCCAGAAAGTAG-1 1886 | GGCCGAACAACGAA-1 1887 | GGCCGAACGCAGAG-1 1888 | GGCCGAACGTAGGG-1 1889 | GGCCGAACTCTAGG-1 1890 | GGCCGATGCAGGAG-1 1891 | GGCCGATGCCGAAT-1 1892 | GGCCGATGTACTCT-1 1893 | GGCGACACTGCCCT-1 1894 | GGCGACTGCGTAAC-1 1895 | GGCGCATGCCTAAG-1 1896 | GGCGCATGCTCCAC-1 1897 | GGCGCATGTGGAAA-1 1898 | GGCGGACTAGAGGC-1 1899 | GGCGGACTAGGAGC-1 1900 | GGCGGACTCTGACA-1 1901 | GGCGGACTCTTGGA-1 1902 | GGCGGACTTACTGG-1 1903 | GGCGGACTTGAACC-1 1904 | GGCTAAACACCTGA-1 1905 | GGCTAAACTCTTAC-1 1906 | GGCTAATGAGCACT-1 1907 | GGCTAATGGTCTAG-1 1908 | GGCTCACTACTCAG-1 1909 | GGGAACGAAGCTCA-1 1910 | GGGAACGACACAAC-1 1911 | GGGAACGAGTGTCA-1 1912 | GGGAAGTGTTGAGC-1 1913 | GGGACCACACGTTG-1 1914 | GGGACCACAGAACA-1 1915 | GGGACCACGAATAG-1 1916 | GGGACCACGTCATG-1 1917 | GGGACCACTCAAGC-1 1918 | GGGACCACTCGTGA-1 1919 | GGGACCACTGCATG-1 1920 | GGGACCTGACCCTC-1 1921 | GGGACCTGCTTGCC-1 1922 | GGGACCTGTGGAGG-1 1923 | GGGATGGACGACAT-1 1924 | GGGATGGATACTTC-1 1925 | GGGATGGATGGTTG-1 1926 | GGGATTACGTCTAG-1 1927 | GGGCAAGATGCATG-1 1928 | 
GGGCACACGGTGAG-1 1929 | GGGCACACGTTGCA-1 1930 | GGGCAGCTTGGGAG-1 1931 | GGGCAGCTTTTCTG-1 1932 | GGGCCAACCTTGGA-1 1933 | GGGCCAACGCGTTA-1 1934 | GGGCCAACTACGCA-1 1935 | GGGCCAACTCCAAG-1 1936 | GGGCCATGATGGTC-1 1937 | GGGCCATGTTGACG-1 1938 | GGGTAACTCAGCTA-1 1939 | GGGTAACTCTAGTG-1 1940 | GGGTAACTCTGGAT-1 1941 | GGGTTAACGTGCAT-1 1942 | GGTAAAGAGCTAAC-1 1943 | GGTACAACTGCAAC-1 1944 | GGTACATGAAAGCA-1 1945 | GGTACATGAGCTCA-1 1946 | GGTACATGCGGTAT-1 1947 | GGTACATGGTTACG-1 1948 | GGTACATGTGGGAG-1 1949 | GGTACTGAACTCTT-1 1950 | GGTAGTACACCACA-1 1951 | GGTAGTACACTAGC-1 1952 | GGTAGTACCCTGTC-1 1953 | GGTAGTACGCCATA-1 1954 | GGTAGTACTGTCTT-1 1955 | GGTATCGAGACAAA-1 1956 | GGTATCGATGAACC-1 1957 | GGTCAAACCAAAGA-1 1958 | GGTCTAGAGAAACA-1 1959 | GGTCTAGATAGCGT-1 1960 | GGTGATACCGACTA-1 1961 | GGTGATACGACTAC-1 1962 | GGTGATACTGTTTC-1 1963 | GGTGGAGAAACGGG-1 1964 | GGTGGAGAAGTAGA-1 1965 | GGTGGAGACAGATC-1 1966 | GGTGGAGATCGATG-1 1967 | GGTGGAGATCTCTA-1 1968 | GGTGGAGATTACTC-1 1969 | GGTTTACTACGCAT-1 1970 | GTAACGTGACCTCC-1 1971 | GTAACGTGATCGGT-1 1972 | GTAACGTGCAGCTA-1 1973 | GTAACGTGGTTGAC-1 1974 | GTAAGCACAACGGG-1 1975 | GTAAGCACTCATTC-1 1976 | GTAAGCTGGTACCA-1 1977 | GTAATAACCTTCTA-1 1978 | GTAATAACGTTGTG-1 1979 | GTACCCTGACAGTC-1 1980 | GTACCCTGGAGCTT-1 1981 | GTACCCTGTCCTTA-1 1982 | GTACCCTGTGAACC-1 1983 | GTACGTGAACGTTG-1 1984 | GTACTTTGTCGACA-1 1985 | GTAGACTGAGATGA-1 1986 | GTAGACTGTATTCC-1 1987 | GTAGCAACAGTCGT-1 1988 | GTAGCAACCATTTC-1 1989 | GTAGCAACGGTAGG-1 1990 | GTAGCATGCACTCC-1 1991 | GTAGCATGTAAGCC-1 1992 | GTAGCCCTGACGTT-1 1993 | GTAGCTGAAGCTAC-1 1994 | GTAGCTGAATTCGG-1 1995 | GTAGGTACACGGGA-1 1996 | GTAGTGACCTCATT-1 1997 | GTAGTGTGAGCGGA-1 1998 | GTAGTGTGAGGCGA-1 1999 | GTAGTGTGTGGTTG-1 2000 | GTATCACTGGTAGG-1 2001 | GTATCTACAGAAGT-1 2002 | GTATCTACGACGAG-1 2003 | GTATCTACGTTACG-1 2004 | GTATTAGAAACAGA-1 2005 | GTATTAGAGGTCTA-1 2006 | GTATTCACACAGCT-1 2007 | GTATTCACGGGTGA-1 2008 | GTCAACGACACTGA-1 2009 | GTCAACGAGTGTAC-1 2010 | GTCAACGATCAGGT-1 2011 | 
GTCAATCTACACCA-1 2012 | GTCAATCTGTAGCT-1 2013 | GTCAATCTTGTGGT-1 2014 | GTCACCTGCCTCCA-1 2015 | GTCACCTGTCCCGT-1 2016 | GTCATACTAATCGC-1 2017 | GTCATACTGCGATT-1 2018 | GTCATACTTCGCCT-1 2019 | GTCATACTTTACCT-1 2020 | GTCATACTTTGACG-1 2021 | GTCCAAGAAAAACG-1 2022 | GTCCACTGACCTCC-1 2023 | GTCCACTGGGTACT-1 2024 | GTCCAGCTACGGGA-1 2025 | GTCCCATGTGGTGT-1 2026 | GTCGAATGAAGGCG-1 2027 | GTCGACCTGAATGA-1 2028 | GTCGACCTGTTCAG-1 2029 | GTCGCACTTGAGAA-1 2030 | GTCTAACTGGTCTA-1 2031 | GTCTAGGAGCTTCC-1 2032 | GTGAACACACTCTT-1 2033 | GTGAACACAGATCC-1 2034 | GTGAACACTCAGGT-1 2035 | GTGACCCTTAAGCC-1 2036 | GTGATGACAAGTGA-1 2037 | GTGATGACCTGAGT-1 2038 | GTGATGACGGTTTG-1 2039 | GTGATTCTCATTTC-1 2040 | GTGATTCTCTCTCG-1 2041 | GTGATTCTGGTTCA-1 2042 | GTGATTCTGTCGAT-1 2043 | GTGATTCTTAGCGT-1 2044 | GTGCCACTCAGGAG-1 2045 | GTGGATTGCACTAG-1 2046 | GTGGATTGCGGAGA-1 2047 | GTGGATTGTAACGC-1 2048 | GTGTACGATCAGTG-1 2049 | GTGTAGTGGGTACT-1 2050 | GTGTATCTAGCCTA-1 2051 | GTGTATCTAGTAGA-1 2052 | GTGTATCTGTTACG-1 2053 | GTGTCAGAAGCGTT-1 2054 | GTGTCAGAATGCTG-1 2055 | GTTAAAACCGAGAG-1 2056 | GTTAAAACTTCGCC-1 2057 | GTTAAATGCTCGAA-1 2058 | GTTAAATGTCGACA-1 2059 | GTTAACCTAGCTAC-1 2060 | GTTAACCTTGCTTT-1 2061 | GTTAGGTGCACTCC-1 2062 | GTTAGGTGCCAGTA-1 2063 | GTTAGGTGCCCAAA-1 2064 | GTTAGGTGGAACTC-1 2065 | GTTAGTCTAAGAAC-1 2066 | GTTATAGAGGACAG-1 2067 | GTTATGCTTTCATC-1 2068 | GTTCAACTGGGACA-1 2069 | GTTCAACTTATGCG-1 2070 | GTTGACGAGCCCTT-1 2071 | GTTGACGATATCGG-1 2072 | GTTGAGTGGTCTTT-1 2073 | GTTGAGTGTGCTTT-1 2074 | GTTGATCTGGGACA-1 2075 | GTTGATCTTTTCAC-1 2076 | GTTGGATGTTTACC-1 2077 | GTTGTACTATTCCT-1 2078 | GTTGTACTTTTGGG-1 2079 | GTTTAAGACCATGA-1 2080 | GTTTAAGACTGTCC-1 2081 | TAAACAACCAACCA-1 2082 | TAAACAACGAATCC-1 2083 | TAAAGACTCAGGAG-1 2084 | TAAATCGATGAGGG-1 2085 | TAACAATGTGCCCT-1 2086 | TAACACCTTCGCTC-1 2087 | TAACACCTTCGTAG-1 2088 | TAACACCTTGTTTC-1 2089 | TAACATGACACTAG-1 2090 | TAACCGGACTTACT-1 2091 | TAACGTCTCAACCA-1 2092 | TAACGTCTCATTGG-1 2093 | TAACTAGAATTTCC-1 2094 | 
TAACTAGACTTAGG-1 2095 | TAACTAGATCTGGA-1 2096 | TAACTCACGAGGAC-1 2097 | TAACTCACGTACAC-1 2098 | TAACTCACGTATCG-1 2099 | TAACTCACTCTACT-1 2100 | TAAGAACTGTGTCA-1 2101 | TAAGAGGACTAAGC-1 2102 | TAAGAGGACTTGTT-1 2103 | TAAGATACCCACAA-1 2104 | TAAGATACGGTTCA-1 2105 | TAAGATTGCGTAGT-1 2106 | TAAGATTGTTGCTT-1 2107 | TAAGCGTGAGGTTC-1 2108 | TAAGCGTGGACAAA-1 2109 | TAAGCGTGGGAAAT-1 2110 | TAAGCGTGTGCTCC-1 2111 | TAAGGCTGCCATGA-1 2112 | TAAGGCTGCTGCTC-1 2113 | TAAGGCTGTCTCGC-1 2114 | TAAGGGCTGCTGTA-1 2115 | TAAGGGCTTTACTC-1 2116 | TAAGTAACCGAGAG-1 2117 | TAAGTAACCTCCAC-1 2118 | TAAGTAACCTGTAG-1 2119 | TAAGTAACTTGTCT-1 2120 | TAATGATGAGCGGA-1 2121 | TAATGCCTCATGAC-1 2122 | TAATGCCTCGTCTC-1 2123 | TAATGTGAAGATGA-1 2124 | TAATGTGACTGCAA-1 2125 | TAATGTGATTACTC-1 2126 | TACAAATGGGTACT-1 2127 | TACAATGAAAACAG-1 2128 | TACAATGACTTAGG-1 2129 | TACAATGATGCTAG-1 2130 | TACACACTCACACA-1 2131 | TACACACTCTTACT-1 2132 | TACATAGAACGCAT-1 2133 | TACATCACACGGGA-1 2134 | TACATCACCTGTTT-1 2135 | TACATCACGCTAAC-1 2136 | TACATCACTGAACC-1 2137 | TACCATTGAGGTTC-1 2138 | TACCATTGCGGGAA-1 2139 | TACCATTGGGGATG-1 2140 | TACCATTGTGAGGG-1 2141 | TACCGGCTGTTGGT-1 2142 | TACGAGTGATCTCT-1 2143 | TACGAGTGATGCTG-1 2144 | TACGAGTGCGGAGA-1 2145 | TACGAGTGGTTGGT-1 2146 | TACGATCTAGTGTC-1 2147 | TACGATCTCACTGA-1 2148 | TACGATCTTACGAC-1 2149 | TACGCAGACGTCTC-1 2150 | TACGCAGAGAATCC-1 2151 | TACGCCACATTCCT-1 2152 | TACGCCACTCCCAC-1 2153 | TACGCCACTCCGAA-1 2154 | TACGCGCTCTTCTA-1 2155 | TACGGAACGCGTTA-1 2156 | TACGGCCTGGGACA-1 2157 | TACGGCCTGTCCTC-1 2158 | TACGTACTACGGAG-1 2159 | TACGTACTCAGTTG-1 2160 | TACGTACTCCCGTT-1 2161 | TACGTTACAGAAGT-1 2162 | TACGTTACCAAGCT-1 2163 | TACTAAGAAAGGTA-1 2164 | TACTAAGAATCACG-1 2165 | TACTAAGATGATGC-1 2166 | TACTAAGATTGCGA-1 2167 | TACTACACGAGAGC-1 2168 | TACTACACTTACCT-1 2169 | TACTACTGAACCTG-1 2170 | TACTACTGATGTCG-1 2171 | TACTACTGATTCTC-1 2172 | TACTACTGTATGGC-1 2173 | TACTCAACGGTCTA-1 2174 | TACTCAACTGCTAG-1 2175 | TACTCCCTCAGTTG-1 2176 | TACTCTGAATCGAC-1 2177 | 
TACTCTGACGAGTT-1 2178 | TACTCTGATTGACG-1 2179 | TACTGGGATCGATG-1 2180 | TACTGTTGAAAGCA-1 2181 | TACTGTTGAGGCGA-1 2182 | TACTGTTGCTGAAC-1 2183 | TACTTGACTCCTCG-1 2184 | TACTTGACTGGTGT-1 2185 | TACTTTCTTTTGGG-1 2186 | TAGAAACTAATCGC-1 2187 | TAGAAACTGCTTCC-1 2188 | TAGAAACTGGGATG-1 2189 | TAGAATTGCGACAT-1 2190 | TAGAATTGTATCGG-1 2191 | TAGACGTGCTTGAG-1 2192 | TAGACGTGTCGCTC-1 2193 | TAGAGCACCTTACT-1 2194 | TAGATTGACTTGTT-1 2195 | TAGATTGAGGCATT-1 2196 | TAGCATCTCAGCTA-1 2197 | TAGCATCTCCCTCA-1 2198 | TAGCATCTGCTGTA-1 2199 | TAGCATCTGGGACA-1 2200 | TAGCATCTTGTCGA-1 2201 | TAGCCCACAAAAGC-1 2202 | TAGCCCACAGCCAT-1 2203 | TAGCCCACAGCTAC-1 2204 | TAGCCCACCCACAA-1 2205 | TAGCCCTGCGGAGA-1 2206 | TAGCCGCTTACGAC-1 2207 | TAGCCGCTTACTTC-1 2208 | TAGCCGCTTTCCAT-1 2209 | TAGCTACTGAATAG-1 2210 | TAGCTACTGTAGCT-1 2211 | TAGCTACTTTTGCT-1 2212 | TAGGACTGTGCTGA-1 2213 | TAGGAGCTAAGGCG-1 2214 | TAGGAGCTGAGGGT-1 2215 | TAGGAGCTTGCATG-1 2216 | TAGGCAACCGTCTC-1 2217 | TAGGCATGCTCTCG-1 2218 | TAGGCATGGCGAGA-1 2219 | TAGGCTGATGCCTC-1 2220 | TAGGGACTGAACTC-1 2221 | TAGGTCGACACTGA-1 2222 | TAGGTCGAGGATCT-1 2223 | TAGGTGACACACTG-1 2224 | TAGGTGACACGTTG-1 2225 | TAGGTGTGTTCTGT-1 2226 | TAGGTTCTGAAGGC-1 2227 | TAGGTTCTTCTTAC-1 2228 | TAGGTTCTTGCTGA-1 2229 | TAGTAAACCTCGCT-1 2230 | TAGTAAACGTCACA-1 2231 | TAGTAATGAGATCC-1 2232 | TAGTACCTAAGAAC-1 2233 | TAGTATGATCTTAC-1 2234 | TAGTATGATTCTCA-1 2235 | TAGTCTTGGCTGTA-1 2236 | TAGTCTTGGGACTT-1 2237 | TAGTCTTGTGGAAA-1 2238 | TAGTGGTGAAGTGA-1 2239 | TAGTTAGAACCACA-1 2240 | TAGTTAGATGAACC-1 2241 | TATAAGACAACAGA-1 2242 | TATAAGACAGCTCA-1 2243 | TATAAGTGACACCA-1 2244 | TATAAGTGTATCGG-1 2245 | TATAAGTGTGGTGT-1 2246 | TATACAGAACCCTC-1 2247 | TATACAGAAGAACA-1 2248 | TATACAGATCCAGA-1 2249 | TATACCACCTGATG-1 2250 | TATACGCTACCAAC-1 2251 | TATAGATGGACGGA-1 2252 | TATAGATGTTCCGC-1 2253 | TATCACTGACTGTG-1 2254 | TATCCAACCAGCTA-1 2255 | TATCCAACTCTCTA-1 2256 | TATCGACTACTAGC-1 2257 | TATCGACTCGATAC-1 2258 | TATCGTACAGATGA-1 2259 | TATCGTACATTCCT-1 2260 | 
TATCTCGAGAGATA-1 2261 | TATCTGACAGGTTC-1 2262 | TATCTGACTGTTTC-1 2263 | TATCTTCTAAACAG-1 2264 | TATGAATGGAGGAC-1 2265 | TATGAATGTTTGCT-1 2266 | TATGCGGATAACCG-1 2267 | TATGGGTGCATCAG-1 2268 | TATGGGTGCTAGCA-1 2269 | TATGGTCTCTACCC-1 2270 | TATGTCACGGAACG-1 2271 | TATGTCACTAACCG-1 2272 | TATGTCACTTCTCA-1 2273 | TATGTGCTCCGATA-1 2274 | TATGTGCTGGATTC-1 2275 | TATTGCTGAAGAAC-1 2276 | TATTGCTGCCGTTC-1 2277 | TATTGCTGTCTGGA-1 2278 | TATTGCTGTGCACA-1 2279 | TATTTCCTATTGGC-1 2280 | TATTTCCTGGAGGT-1 2281 | TATTTCCTGGTGTT-1 2282 | TCAACACTGTTTGG-1 2283 | TCAAGGACAGCGTT-1 2284 | TCAAGGACATTCTC-1 2285 | TCAAGGACGGTGTT-1 2286 | TCAATCACACTCTT-1 2287 | TCAATCACAGTCGT-1 2288 | TCACAACTATGTGC-1 2289 | TCACAACTTTGCTT-1 2290 | TCACATACACTTTC-1 2291 | TCACATACAGGGTG-1 2292 | TCACCCGAGACGGA-1 2293 | TCACCGTGCTCGCT-1 2294 | TCACCTCTACGACT-1 2295 | TCACCTCTTCCAAG-1 2296 | TCACGAGAGGAGGT-1 2297 | TCACTATGGGGCAA-1 2298 | TCACTATGGTTGTG-1 2299 | TCAGACGACGCTAA-1 2300 | TCAGACGACGTTAG-1 2301 | TCAGAGACTCCAGA-1 2302 | TCAGCAGACTCCAC-1 2303 | TCAGCGCTCTAGTG-1 2304 | TCAGCGCTGGATCT-1 2305 | TCAGCGCTGGTATC-1 2306 | TCAGGATGAAGTAG-1 2307 | TCAGGATGCCTTTA-1 2308 | TCAGTGGAAGATCC-1 2309 | TCAGTTACCTACGA-1 2310 | TCAGTTACTAGAAG-1 2311 | TCATCAACCCGATA-1 2312 | TCATCAACTGTTCT-1 2313 | TCATCATGCAGTTG-1 2314 | TCATCCCTTACTGG-1 2315 | TCATTCGATACAGC-1 2316 | TCCACGTGGAAACA-1 2317 | TCCACTCTACACTG-1 2318 | TCCACTCTGAGCTT-1 2319 | TCCACTCTTACTTC-1 2320 | TCCATAACAAAGTG-1 2321 | TCCATAACCGTAGT-1 2322 | TCCATAACGATGAA-1 2323 | TCCATAACTACGCA-1 2324 | TCCATCCTCCCTAC-1 2325 | TCCCACGATCATTC-1 2326 | TCCCATCTCAAAGA-1 2327 | TCCCGAACACAGTC-1 2328 | TCCCGAACTTCGCC-1 2329 | TCCCGATGAGATCC-1 2330 | TCCCGATGCCTGAA-1 2331 | TCCCGATGCTGTGA-1 2332 | TCCCTACTCAACTG-1 2333 | TCCGAAGACAATCG-1 2334 | TCCGAAGACGTTAG-1 2335 | TCCGGACTGAGGTG-1 2336 | TCCGGACTGTACGT-1 2337 | TCCTAAACATCGAC-1 2338 | TCCTAAACCGAGAG-1 2339 | TCCTAAACCGCATA-1 2340 | TCCTAATGGTTTGG-1 2341 | TCCTACCTGTCGTA-1 2342 | TCCTATGAAAAGCA-1 2343 | 
TCGAATCTCTGGTA-1 2344 | TCGACCTGCCGATA-1 2345 | TCGACGCTTCTATC-1 2346 | TCGACGCTTTGACG-1 2347 | TCGAGAACGACAGG-1 2348 | TCGAGAACGTTAGC-1 2349 | TCGAGCCTATCAGC-1 2350 | TCGAGCCTGCGAGA-1 2351 | TCGAGCCTTGTGAC-1 2352 | TCGATACTATTCCT-1 2353 | TCGATACTTGCACA-1 2354 | TCGATTTGATGCCA-1 2355 | TCGATTTGCACTCC-1 2356 | TCGATTTGCAGCTA-1 2357 | TCGATTTGCCTACC-1 2358 | TCGATTTGTCGTGA-1 2359 | TCGCACACCATCAG-1 2360 | TCGCAGCTAGATCC-1 2361 | TCGCCATGAGACTC-1 2362 | TCGCCATGTGGTCA-1 2363 | TCGGACCTAACAGA-1 2364 | TCGGACCTATAAGG-1 2365 | TCGGACCTGTACAC-1 2366 | TCGGTAGAGTAGGG-1 2367 | TCGGTAGATCCCAC-1 2368 | TCGTAGGATCGACA-1 2369 | TCGTGAGAACTGTG-1 2370 | TCGTTATGGACAAA-1 2371 | TCTAACACCAGTTG-1 2372 | TCTAACACGAGCAG-1 2373 | TCTAACTGAACCAC-1 2374 | TCTAAGCTAATGCC-1 2375 | TCTAAGCTTAGTCG-1 2376 | TCTAAGCTTCTAGG-1 2377 | TCTAAGCTTGTTCT-1 2378 | TCTAAGCTTTCGCC-1 2379 | TCTACAACGACTAC-1 2380 | TCTAGACTTAGAAG-1 2381 | TCTAGTTGCACCAA-1 2382 | TCTATGTGAAGAGT-1 2383 | TCTATGTGAGTCTG-1 2384 | TCTCAAACCTAAGC-1 2385 | TCTCTAGAATTTCC-1 2386 | TCTGATACACGTGT-1 2387 | TCTGATACTCGCCT-1 2388 | TCTTACGAACCTGA-1 2389 | TCTTCAGAGCTACA-1 2390 | TCTTGATGCGGAGA-1 2391 | TGAAATTGGTGAGG-1 2392 | TGAACCGAAAACGA-1 2393 | TGAACCGACTACTT-1 2394 | TGAACCGATTCGGA-1 2395 | TGAAGCACTCACGA-1 2396 | TGAAGCTGAACGAA-1 2397 | TGAAGCTGAGACTC-1 2398 | TGAAGCTGCATGGT-1 2399 | TGAAGCTGCGTAAC-1 2400 | TGAATAACCACTTT-1 2401 | TGAATAACTCCCAC-1 2402 | TGACACGACCTTAT-1 2403 | TGACCAGACAACCA-1 2404 | TGACCAGAGGATTC-1 2405 | TGACCGCTAAAAGC-1 2406 | TGACCGCTCTGCAA-1 2407 | TGACGATGCAAAGA-1 2408 | TGACGCCTGTACCA-1 2409 | TGACGCCTTTACTC-1 2410 | TGACTGGAAGAGAT-1 2411 | TGACTGGACCGTAA-1 2412 | TGACTGGACGCAAT-1 2413 | TGACTGGAGGACAG-1 2414 | TGACTGGATTCTCA-1 2415 | TGACTTACACACCA-1 2416 | TGACTTACAGTCTG-1 2417 | TGACTTTGCGCATA-1 2418 | TGACTTTGTTTGTC-1 2419 | TGAGACACAAGGTA-1 2420 | TGAGACACTCAAGC-1 2421 | TGAGACACTGTGCA-1 2422 | TGAGCTGAATGCTG-1 2423 | TGAGCTGACTGGAT-1 2424 | TGAGCTGAGCGAGA-1 2425 | TGAGCTGATGCTAG-1 2426 | 
TGAGGACTCTCATT-1 2427 | TGAGGACTTCATTC-1 2428 | TGAGGTACGAACCT-1 2429 | TGAGTCGAGTTACG-1 2430 | TGAGTGACTGAGCT-1 2431 | TGATAAACGAATCC-1 2432 | TGATAAACTCCGTC-1 2433 | TGATAAACTTTCAC-1 2434 | TGATACCTCACTAG-1 2435 | TGATACCTGTTGGT-1 2436 | TGATACCTTATGCG-1 2437 | TGATACCTTGAAGA-1 2438 | TGATATGAACCTTT-1 2439 | TGATCACTAGCATC-1 2440 | TGATCACTCTCGCT-1 2441 | TGATCACTTCTACT-1 2442 | TGATCGGACTGACA-1 2443 | TGATCGGAGGAGCA-1 2444 | TGATCGGATATGCG-1 2445 | TGATTAGACATTGG-1 2446 | TGATTAGATGACTG-1 2447 | TGATTAGATGCTAG-1 2448 | TGATTCACTATGCG-1 2449 | TGATTCACTGTCAG-1 2450 | TGATTCTGCCGAAT-1 2451 | TGATTCTGCTCTTA-1 2452 | TGCAAGTGAGAACA-1 2453 | TGCAAGTGGGTAGG-1 2454 | TGCAATCTTCAGGT-1 2455 | TGCACAGACGACAT-1 2456 | TGCCAAGAGCAGTT-1 2457 | TGCCAAGATCTCTA-1 2458 | TGCCACTGAACGTC-1 2459 | TGCCACTGCGATAC-1 2460 | TGCCAGCTTGGCAT-1 2461 | TGCCCAACAGCAAA-1 2462 | TGCCCAACCGCATA-1 2463 | TGCCGACTCTCCCA-1 2464 | TGCGAAACAGTCAC-1 2465 | TGCGAAACGTTGCA-1 2466 | TGCGATGAACGGTT-1 2467 | TGCGATGACCTCGT-1 2468 | TGCGATGACTAGTG-1 2469 | TGCGATGACTGCTC-1 2470 | TGCGATGACTTGCC-1 2471 | TGCGATGAGTGCTA-1 2472 | TGCGCACTCTTGAG-1 2473 | TGCGTAGAATAAGG-1 2474 | TGCGTAGACGGGAA-1 2475 | TGCGTAGATGGTCA-1 2476 | TGCTAGGAAACCGT-1 2477 | TGCTAGGATAGTCG-1 2478 | TGCTATACGGTTCA-1 2479 | TGCTATACTGCTGA-1 2480 | TGCTGAGAGAGCAG-1 2481 | TGCTGAGATTATCC-1 2482 | TGGAAAGACTCTCG-1 2483 | TGGAAAGAGCGATT-1 2484 | TGGAAAGAGGTCAT-1 2485 | TGGAAAGATATGGC-1 2486 | TGGAACACAAACAG-1 2487 | TGGAACACGCTAAC-1 2488 | TGGAAGCTCAGATC-1 2489 | TGGACCCTACACTG-1 2490 | TGGACCCTCATGGT-1 2491 | TGGACCCTGGTACT-1 2492 | TGGACTGAGTATGC-1 2493 | TGGAGACTATCAGC-1 2494 | TGGAGACTGAAACA-1 2495 | TGGAGACTTCAAGC-1 2496 | TGGAGACTTGACCA-1 2497 | TGGAGGGACGGAGA-1 2498 | TGGAGGGAGCTATG-1 2499 | TGGATCGATAAAGG-1 2500 | TGGATGTGACCTAG-1 2501 | TGGATGTGATGTCG-1 2502 | TGGATGTGTGAAGA-1 2503 | TGGATTCTCATACG-1 2504 | TGGCAATGCTTGTT-1 2505 | TGGCAATGGAGGGT-1 2506 | TGGCACCTTCACGA-1 2507 | TGGCACCTTCAGTG-1 2508 | TGGGTATGAAGAGT-1 2509 | 
TGGGTATGCACAAC-1 2510 | TGGGTATGGTACGT-1 2511 | TGGGTATGTTTGGG-1 2512 | TGGTAGACATGCCA-1 2513 | TGGTAGACCCTCAC-1 2514 | TGGTAGACCTGATG-1 2515 | TGGTAGTGCACTGA-1 2516 | TGGTATCTAAACAG-1 2517 | TGGTATCTCTTCCG-1 2518 | TGGTCAGACCCAAA-1 2519 | TGGTCAGACCGTTC-1 2520 | TGGTTACTGACGTT-1 2521 | TGGTTACTGTTCTT-1 2522 | TGTAACCTAGAGGC-1 2523 | TGTAACCTTGCCTC-1 2524 | TGTAATGACACAAC-1 2525 | TGTAATGAGGTAAA-1 2526 | TGTACTTGCTCTAT-1 2527 | TGTAGGTGCGAGAG-1 2528 | TGTAGGTGCTATGG-1 2529 | TGTAGGTGCTCTAT-1 2530 | TGTAGGTGTGCTGA-1 2531 | TGTAGTCTTCCAGA-1 2532 | TGTAGTCTTGCACA-1 2533 | TGTATCTGTTAGGC-1 2534 | TGTATGCTCATGGT-1 2535 | TGTATGCTGTAGGG-1 2536 | TGTATGCTTTCATC-1 2537 | TGTCAGGAATACCG-1 2538 | TGTCAGGAGATGAA-1 2539 | TGTCTAACCCCTTG-1 2540 | TGTGACGATTCTCA-1 2541 | TGTGAGACTGTCAG-1 2542 | TGTGAGACTTGAGC-1 2543 | TGTGAGTGACCACA-1 2544 | TGTGAGTGAGTGCT-1 2545 | TGTGAGTGGAGATA-1 2546 | TGTGATCTCTCTAT-1 2547 | TGTGATCTGACACT-1 2548 | TGTGGATGGCCAAT-1 2549 | TGTTAAGACAAAGA-1 2550 | TGTTAAGATAAGGA-1 2551 | TGTTAAGATTGGCA-1 2552 | TGTTACACCGCATA-1 2553 | TGTTACACGACTAC-1 2554 | TGTTACTGGCTACA-1 2555 | TGTTACTGTAGTCG-1 2556 | TTAACCACCGTAAC-1 2557 | TTAACCACTAAGGA-1 2558 | TTAACCACTCAGAC-1 2559 | TTACACACGTGTTG-1 2560 | TTACACACTCCTAT-1 2561 | TTACCATGAATCGC-1 2562 | TTACCATGGTTGAC-1 2563 | TTACCATGTGTCTT-1 2564 | TTACCATGTTGTGG-1 2565 | TTACGACTGAGAGC-1 2566 | TTACGACTTGACAC-1 2567 | TTACGTACGTTCAG-1 2568 | TTACTCGAACGTTG-1 2569 | TTACTCGAAGAATG-1 2570 | TTACTCGACGCAAT-1 2571 | TTACTCGAGGGTGA-1 2572 | TTACTCGATCTACT-1 2573 | TTAGAATGTGGTGT-1 2574 | TTAGAATGTGTAGC-1 2575 | TTAGACCTCCTACC-1 2576 | TTAGACCTCCTTTA-1 2577 | TTAGCTACAACCGT-1 2578 | TTAGCTACTGTCCC-1 2579 | TTAGCTACTTTCGT-1 2580 | TTAGGGACGCGAAG-1 2581 | TTAGGGTGCTGGAT-1 2582 | TTAGGGTGTCCTGC-1 2583 | TTAGGTCTACTTTC-1 2584 | TTAGTCACCAGTTG-1 2585 | TTAGTCTGAAAGCA-1 2586 | TTAGTCTGCCAACA-1 2587 | TTAGTCTGTGCACA-1 2588 | TTATCCGACTAGTG-1 2589 | TTATCCGAGAAAGT-1 2590 | TTATGAGAGATAAG-1 2591 | TTATGCACGTCACA-1 2592 | 
TTATGGCTTATGGC-1 2593 | TTATTCCTATGCTG-1 2594 | TTATTCCTGGACAG-1 2595 | TTATTCCTGGTACT-1 2596 | TTATTCCTTCGTGA-1 2597 | TTCAAAGATAAAGG-1 2598 | TTCAACACAACAGA-1 2599 | TTCAACACCCCAAA-1 2600 | TTCAACACGGACGA-1 2601 | TTCAAGCTAAGAAC-1 2602 | TTCAAGCTAGATGA-1 2603 | TTCAAGCTGTTGAC-1 2604 | TTCAAGCTTCCAAG-1 2605 | TTCAAGCTTGATGC-1 2606 | TTCAAGCTTTCGCC-1 2607 | TTCACAACCCGTTC-1 2608 | TTCACAACGTCTGA-1 2609 | TTCAGACTACCCAA-1 2610 | TTCAGACTCTCGAA-1 2611 | TTCAGTACCGACTA-1 2612 | TTCAGTACTCAAGC-1 2613 | TTCAGTACTCCTAT-1 2614 | TTCAGTTGCCAAGT-1 2615 | TTCAGTTGTCCTTA-1 2616 | TTCAGTTGTCTAGG-1 2617 | TTCAGTTGTCTCGC-1 2618 | TTCATCGAGGTGGA-1 2619 | TTCATGTGTGGTGT-1 2620 | TTCATTCTATGTCG-1 2621 | TTCATTCTTCTCTA-1 2622 | TTCCAAACCTATGG-1 2623 | TTCCAAACCTCCCA-1 2624 | TTCCAAACTCCCAC-1 2625 | TTCCAAACTTGACG-1 2626 | TTCCATGACGAGAG-1 2627 | TTCCATGACTGTCC-1 2628 | TTCCCACTTGAGGG-1 2629 | TTCCCACTTGTCTT-1 2630 | TTCCTAGAAAGTGA-1 2631 | TTCCTAGACTAGTG-1 2632 | TTCGAGGACTCTAT-1 2633 | TTCGAGGAGGGCAA-1 2634 | TTCGAGGATAGAAG-1 2635 | TTCGATTGAGCATC-1 2636 | TTCGGAGAATGCCA-1 2637 | TTCGGAGATGTGCA-1 2638 | TTCGTATGAAAAGC-1 2639 | TTCGTATGGATAGA-1 2640 | TTCGTATGGTCTGA-1 2641 | TTCGTATGTCCTTA-1 2642 | TTCTACGAACGTAC-1 2643 | TTCTACGAGTTGGT-1 2644 | TTCTAGTGACACGT-1 2645 | TTCTAGTGCATGAC-1 2646 | TTCTAGTGGAGAGC-1 2647 | TTCTAGTGGTCACA-1 2648 | TTCTCAGAAGAGAT-1 2649 | TTCTCAGAAGCATC-1 2650 | TTCTCAGATGGAGG-1 2651 | TTCTGATGGAGACG-1 2652 | TTCTTACTCTGGAT-1 2653 | TTGAACCTCCTTGC-1 2654 | TTGAATGAACTACG-1 2655 | TTGAATGACTTACT-1 2656 | TTGAATGATCTCAT-1 2657 | TTGACACTCTGTAG-1 2658 | TTGACACTGATAAG-1 2659 | TTGAGGACAGAACA-1 2660 | TTGAGGACTACGCA-1 2661 | TTGAGGTGGACGGA-1 2662 | TTGCATTGAGCTAC-1 2663 | TTGCATTGCTAAGC-1 2664 | TTGCATTGTGACTG-1 2665 | TTGCTAACACCAAC-1 2666 | TTGCTAACACGCTA-1 2667 | TTGCTAACCACTCC-1 2668 | TTGCTATGGTACGT-1 2669 | TTGCTATGGTAGGG-1 2670 | TTGGAGACCAATCG-1 2671 | TTGGAGACGCTATG-1 2672 | TTGGAGACTATGGC-1 2673 | TTGGGAACTGAACC-1 2674 | TTGGTACTACTGGT-1 2675 | 
TTGGTACTCTTAGG-1 2676 | TTGGTACTGAATCC-1 2677 | TTGGTACTGGATTC-1 2678 | TTGTACACGTTGTG-1 2679 | TTGTACACTTGCAG-1 2680 | TTGTAGCTAGCTCA-1 2681 | TTGTAGCTCTCTTA-1 2682 | TTGTCATGGACGGA-1 2683 | TTTAGAGATCCTCG-1 2684 | TTTAGCTGATACCG-1 2685 | TTTAGCTGGATACC-1 2686 | TTTAGCTGTACTCT-1 2687 | TTTAGGCTCCTTTA-1 2688 | TTTATCCTGTTGTG-1 2689 | TTTCACGAGGTTCA-1 2690 | TTTCAGTGGAAGGC-1 2691 | TTTCAGTGTCACGA-1 2692 | TTTCAGTGTCTATC-1 2693 | TTTCAGTGTGCAGT-1 2694 | TTTCCAGAGGTGAG-1 2695 | TTTCGAACACCTGA-1 2696 | TTTCGAACTCTCAT-1 2697 | TTTCTACTGAGGCA-1 2698 | TTTCTACTTCCTCG-1 2699 | TTTGCATGAGAGGC-1 2700 | TTTGCATGCCTCAC-1 2701 | -------------------------------------------------------------------------------- /tutorial/output_15_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianhuupenn/ItClust/059b8d66d4f15bd79a2db5c9541f32c9685ba502/tutorial/output_15_1.png -------------------------------------------------------------------------------- /tutorial/output_17_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianhuupenn/ItClust/059b8d66d4f15bd79a2db5c9541f32c9685ba502/tutorial/output_17_0.png -------------------------------------------------------------------------------- /tutorial/tutorial.md: -------------------------------------------------------------------------------- 1 |

# ItClust Tutorial

2 | 3 | 4 |
Author: Jian Hu, Xiangjie Li, Gang Hu, Yafei Lyu, Katalin Susztak, Mingyao Li*
5 | 6 | 7 | ### 0. Installation 8 | 9 | To install the `ItClust` package, first make sure that your Python version is either `3.5.x` or `3.6.x`. If you don't know your Python version, you can check it with: 10 | ```python 11 | import platform 12 | platform.python_version() 13 | #3.7.0 14 | ``` 15 | **Note:** Because ItClust depends on `tensorflow`, make sure that your `tensorflow` version is lower than `2.0` if you want to reproduce the results in our paper: 16 | ```python 17 | import tensorflow as tf 18 | tf.__version__ 19 | #1.7.0 20 | ``` 21 | Now you can install the current release of `ItClust` in any of the following three ways. 22 | 23 | * PyPI 24 | Install the package directly from PyPI: 25 | 26 | ```bash 27 | pip3 install ItClust 28 | ``` 29 | **Note**: make sure that `pip` is the Python 3 version; otherwise, install ItClust with 30 | ```bash 31 | python3 -m pip install ItClust 32 | #or 33 | pip3 install ItClust 34 | ``` 35 | 36 | If you do not have permission (i.e., you get a permission denied error), install ItClust with 37 | 38 | ```bash 39 | pip3 install --user ItClust 40 | ``` 41 | 42 | * Github 43 | Download the package from [Github](https://github.com/jianhuupenn/ItClust) and install it locally: 44 | 45 | ```bash 46 | git clone https://github.com/jianhuupenn/ItClust 47 | cd ItClust/ItClust_package/ 48 | python3 setup.py install --user 49 | ``` 50 | 51 | * Anaconda 52 | 53 | If you do not have Python 3.5 or 3.6 installed, consider installing Anaconda (see [Installing Anaconda](https://docs.anaconda.com/anaconda/install/)).
After installing Anaconda, you can create a new environment, for example, `ItClust` (*you can change it to any name you like*): 54 | 55 | ```bash 56 | # create an environment called ItClust 57 | conda create -n ItClust python=3.7.0 58 | # activate your environment 59 | conda activate ItClust 60 | git clone https://github.com/jianhuupenn/ItClust 61 | cd ItClust/ItClust_package/ 62 | python3 setup.py build 63 | python3 setup.py install 64 | 65 | # now you can check whether `ItClust` installed successfully! 66 | ``` 67 | 68 | The installation should take less than 30 seconds. 69 |
70 | 71 | 72 | 73 | ### 1. Import Python modules 74 | 75 | 76 | ```python 77 | import ItClust as ic 78 | import scanpy.api as sc 79 | import os 80 | from numpy.random import seed 81 | from tensorflow import set_random_seed 82 | import pandas as pd 83 | import numpy as np 84 | import warnings 85 | os.environ["CUDA_VISIBLE_DEVICES"]="1" 86 | warnings.filterwarnings("ignore") 87 | #import sys 88 | #!{sys.executable} -m pip install 'scanpy==1.4.4.post1' 89 | # Set seeds 90 | seed(20180806) 91 | np.random.seed(10) 92 | set_random_seed(20180806) # note: on GPU, results may still differ slightly even with a fixed seed 93 | 94 | ``` 95 | 96 | Using TensorFlow backend. 97 | 98 | 99 | ### 2. Read in data 100 | The current version of ItClust works with an AnnData object. AnnData stores a data matrix `.X` together with annotations of observations `.obs`, variables `.var`, and unstructured annotations `.uns`. The ItClust package provides three ways to prepare an AnnData object for the following analysis. 101 |
102 | ItClust supports most forms of scRNA-seq data, including UMI counts, TPM, and FPKM.
104 |
105 | Important Note: For the source data, please store the true cell type label information in one column named "celltype". 106 | 107 | #### 2.1 Start from a 10X dataset 108 | Here we use the pbmc data as an example. 109 | Download the data, unzip it, and then move the data to data/pbmc/. 110 | 111 | 112 | ```python 113 | adata = ic.read_10X(data_path='./data/pbmc') 114 | ``` 115 | 116 | var_names are not unique, "make_index_unique" has applied 117 | 118 | 119 | #### 2.2 Start from *.mtx and *.tsv files 120 | When the expression data do not follow the standard 10X dataset format, we can import the data manually as follows. 121 | 122 | 123 | ```python 124 | #1 Read the expression matrix from the *.mtx file. 125 | # After transposing with .T, rows correspond to cells and columns to genes. 126 | adata = sc.read_mtx('./data/pbmc/matrix.mtx').T 127 | 128 | #2 Read the *.tsv file for gene annotations. Make sure the gene names are unique. 129 | genes = pd.read_csv('./data/pbmc/genes.tsv', header=None, sep='\t') 130 | adata.var['gene_ids'] = genes[0].values 131 | adata.var['gene_symbols'] = genes[1].values 132 | adata.var_names = adata.var['gene_symbols'] 133 | # Make sure the gene names are unique 134 | adata.var_names_make_unique(join="-") 135 | 136 | #3 Read the *.tsv file for cell annotations. Make sure the cell names are unique. 137 | cells = pd.read_csv('./data/pbmc/barcodes.tsv', header=None, sep='\t') 138 | adata.obs['barcode'] = cells[0].values 139 | adata.obs_names = cells[0] 140 | # Make sure the cell names are unique 141 | adata.obs_names_make_unique(join="-") 142 | ``` 143 | 144 | Variable names are not unique. To make them unique, call `.var_names_make_unique`. 145 | 146 | 147 | #### 2.3 Start from a *.h5ad file 148 | We will use human pancreas data as our example for transfer learning. 149 | The Baron et al. data is used as the source data and the Segerstolpe et al. data is treated as the target data.
We can use the following code to read data in from *.h5ad files: 150 | 151 | 152 | ```python 153 | adata_train=sc.read("./data/pancreas/Bh.h5ad") 154 | adata_test=sc.read("./data/pancreas/smartseq2.h5ad") 155 | ``` 156 | 157 | ### 3. Fit ItClust model 158 | ItClust includes preprocessing steps, i.e., filtering of cells/genes, normalization, scaling, and selection of highly variable genes. 159 | 160 | 161 | ```python 162 | clf=ic.transfer_learning_clf() 163 | clf.fit(adata_train, adata_test) 164 | ``` 165 | the var_names of adata.raw: adata.raw.var_names.is_unique=: True 166 | The number of training cell types is: 14 167 | Training the source network 168 | The layer numbers are[32, 16] 169 | The shape of xtrain is:8569:867 170 | The shape of xtest is:2394:867 171 | Doing DEC: pretrain 172 | ...Pretraining... 173 | Doing SAE: pretrain_stacks 174 | Pretraining the 1th layer... 175 | learning rate = 0.1 176 | 177 | learning rate = 0.01 178 | learning rate = 0.001 179 | The 1th layer has been pretrained. 180 | Pretraining the 2th layer... 181 | learning rate = 0.1 182 | learning rate = 0.01 183 | learning rate = 0.001 184 | The 2th layer has been pretrained. 185 | Doing SAE: pretrain_autoencoders 186 | Copying layer-wise pretrained weights to deep autoencoders 187 | Fine-tuning autoencoder end-to-end 188 | learning rate = 0.1 189 | learning rate = 0.010000000000000002 190 | learning rate = 0.001 191 | learning rate = 0.0001 192 | learning rate = 1e-05 193 | learning rate = 1.0000000000000002e-06 194 | Pretraining time: 158.4946711063385 195 | y known, initilize Cluster centroid using y 196 | The shape of cluster_center is (14, 16) 197 | Doing DEC: fit_supervised 198 | Training model finished! Start to fit target network!
199 | Doing DEC: pretrain_transfer 200 | The shape of features is (2394, 16) 201 | The shape of y_trans is (2394,) 202 | ...predicted y_test known, use it to get n_cliusters and init_centroid 203 | The length layers of self.model 4 204 | Doing DEC: fit_trajectory 205 | The value of delta_label of current 1 th iteration is 0.002506265664160401 >= tol [0.001] 206 | This is the iteration of 0 207 | The value of delta_label of current 2 th iteration is 0.004177109440267335 >= tol [0.001] 208 | The value of delta_label of current 3 th iteration is 0.001670843776106934 >= tol [0.001] 209 | delta_label 0.000835421888053467 < tol [0.001] 210 | Reached tolerance threshold. Stopped training. 211 | The final prediction cluster is: 212 | 2 988 213 | 5 457 214 | 3 323 215 | 8 210 216 | 0 182 217 | 4 126 218 | 1 56 219 | 6 22 220 | 10 8 221 | 9 7 222 | 7 7 223 | 11 6 224 | 12 2 225 | dtype: int64 226 | How many trajectories 1 227 | 228 | 229 | ### 4. Prediction 230 | The `predict()` function returns the cluster prediction, the clustering probability matrix, and the cell type confidence score. 231 | 232 | If the parameter `write==True` (default), it will also write the results to `save_dir`. 233 | 234 | The cluster prediction is written to `save_dir+"/clustering_results.csv"`. 235 | 236 | The cell type confidence score is written to `save_dir+"/cell type_assignment.txt"`. 237 | 238 | The clustering probability matrix is written to `save_dir+"/clustering_prob.csv"`. 239 | 240 | 241 | 242 | ```python 243 | pred, prob, cell_type_pred=clf.predict() 244 | pred.head() 245 | ``` 246 | 247 | 248 | 249 | 250 | 251 |
|   | cell_id | cluster |
| --- | --- | --- |
| 0 | AZ_A2-target | 8 |
| 1 | AZ_H5-target | 8 |
| 2 | AZ_G5-target | 8 |
| 3 | AZ_D8-target | 8 |
| 4 | AZ_D12-target | 8 |
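The confidence score is related to the clustering probability matrix returned by `predict()`. As a toy illustration (not ItClust's internal code), the row-wise maximum of a probability matrix yields both a hard cluster assignment and a simple per-cell confidence proxy; all values and names below are made up:

```python
import pandas as pd

# Toy clustering probability matrix: rows are cells, columns are clusters.
prob = pd.DataFrame(
    [[0.90, 0.05, 0.05],
     [0.20, 0.70, 0.10],
     [0.34, 0.33, 0.33]],
    index=["cell0", "cell1", "cell2"],
    columns=["cluster0", "cluster1", "cluster2"],
)
pred = prob.idxmax(axis=1)     # hard cluster assignment per cell
confidence = prob.max(axis=1)  # per-cell confidence proxy
print(pred.tolist())           # ['cluster0', 'cluster1', 'cluster0']
print(confidence.tolist())     # [0.9, 0.7, 0.34]
```

Note how `cell2` gets a low confidence: its probability mass is spread almost evenly across clusters.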
289 | 290 | 291 | 292 | ### 5. Visualization 293 | #### 5.1 t-SNE 294 | 295 | 296 | ```python 297 | import matplotlib 298 | matplotlib.use("Agg") 299 | matplotlib.rcParams['figure.dpi']= 300 300 | colors_use=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#bcbd22', '#17becf', '#aec7e8', '#ffbb78', '#98df8a', '#ff9896','#bec1d4','#bb7784','#4a6fe3','#FFFF00','#111010'] 301 | # Run t-SNE 302 | clf.adata_test.obsm['X_tsne']=clf.tSNE() 303 | num_celltype=len(clf.adata_test.obs["celltype"].unique()) 304 | clf.adata_test.uns["celltype_colors"]=list(colors_use[:num_celltype]) 305 | clf.adata_test.uns["decisy_trans_True_colors"]=list(colors_use[:num_celltype]) 306 | sc.pl.tsne(clf.adata_test,color=["decisy_trans_True","celltype"],title=["ItClust prediction","True cell type"],show=True,size=50000/clf.adata_test.shape[0]) 307 | ``` 308 | 309 | Doing t-SNE! 310 | WARNING: Consider installing the package MulticoreTSNE (https://github.com/DmitryUlyanov/Multicore-TSNE). Even for n_jobs=1 this speeds up the computation considerably and might yield better converged results. 311 | 312 | 313 | 314 | ![png](output_15_1.png) 315 | 316 | 317 | #### 5.2 UMAP 318 | 319 | 320 | ```python 321 | clf.adata_test.obsm['X_umap']=clf.Umap() 322 | sc.pl.umap(clf.adata_test,color=["decisy_trans_True","celltype"],show=True,save=None,title=["ItClust prediction","True cell type"],size=50000/clf.adata_test.shape[0]) 323 | ``` 324 | 325 | 326 | ![png](output_17_0.png) 327 | 328 | --------------------------------------------------------------------------------
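Finally, the clustering results that `predict()` writes to `save_dir+"/clustering_results.csv"` can be reloaded with pandas for downstream summaries such as cluster sizes. A sketch with a toy in-memory CSV (cell ids and cluster labels are illustrative; the `cell_id`/`cluster` columns follow the table shown above):

```python
import io
import pandas as pd

# Toy stand-in for the file written to save_dir + "/clustering_results.csv".
csv_text = """cell_id,cluster
AZ_A2-target,8
AZ_H5-target,8
AZ_D3-target,2
"""
results = pd.read_csv(io.StringIO(csv_text))
cluster_sizes = results["cluster"].value_counts()  # cells per cluster
print(int(cluster_sizes[8]), int(cluster_sizes[2]))  # 2 1
```

In practice, replace the in-memory CSV with `pd.read_csv(save_dir + "/clustering_results.csv")`.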