├── COVID19.csv ├── step0_map_raw_data_using_cellranger.sh ├── LICENSE ├── README.md ├── step2_integrate_patients_and_healthy_controls.R ├── step4_find_DEGs_between_disease_stages.ipynb ├── .ipynb_checkpoints ├── step4_find_DEGs_between_disease_stages-checkpoint.ipynb └── step1_detect_doublets_using_scrublet-checkpoint.ipynb └── step1_detect_doublets_using_scrublet.ipynb /COVID19.csv: -------------------------------------------------------------------------------- 1 | library_id,molecule_h5,batch 2 | P1-1r1,P1-1r1_outs/outs/molecule_info.h5,P1-1r1 3 | P1-1r2,P1-1r2_outs/outs/molecule_info.h5,P1-1r2 4 | P1-2r1,P1-2r1_outs/outs/molecule_info.h5,P1-2r1 5 | P1-2r2,P1-2r2_outs/outs/molecule_info.h5,P1-2r2 6 | P2-1,P2-1_outs/outs/molecule_info.h5,P2-1 7 | P2-2,P2-2_outs/outs/molecule_info.h5,P2-2 8 | P2-3,P2-3_outs/outs/molecule_info.h5,P2-3 9 | -------------------------------------------------------------------------------- /step0_map_raw_data_using_cellranger.sh: -------------------------------------------------------------------------------- 1 | cellranger count --id=P1-1r1_outs --transcriptome="$HOME/refdata-cellranger-GRCh38-3.0.0/" --fastqs=P1-1r1/ 2 | cellranger count --id=P1-1r2_outs --transcriptome="$HOME/refdata-cellranger-GRCh38-3.0.0/" --fastqs=P1-1r2/ 3 | cellranger count --id=P1-2r1_outs --transcriptome="$HOME/refdata-cellranger-GRCh38-3.0.0/" --fastqs=P1-2r1/ 4 | cellranger count --id=P1-2r2_outs --transcriptome="$HOME/refdata-cellranger-GRCh38-3.0.0/" --fastqs=P1-2r2/ 5 | cellranger count --id=P2-1_outs --transcriptome="$HOME/refdata-cellranger-GRCh38-3.0.0/" --fastqs=P2-1/ 6 | cellranger count --id=P2-2_outs --transcriptome="$HOME/refdata-cellranger-GRCh38-3.0.0/" --fastqs=P2-2/ 7 | cellranger count --id=P2-3_outs --transcriptome="$HOME/refdata-cellranger-GRCh38-3.0.0/" --fastqs=P2-3/ 8 | cellranger aggr --id=COVID19 --csv=COVID19.csv --normalize=mapped -------------------------------------------------------------------------------- /LICENSE: 
-------------------------------------------------------------------------------- 1 | BSD 2-Clause License 2 | 3 | Copyright (c) 2020, QuKunLab 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 17 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 18 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 19 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 20 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 21 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 22 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 23 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 24 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 25 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 26 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # COVID-19 2 | The R/Python scripts for the analysis of single-cell RNA-seq data from COVID-19 patients 3 | 4 | ## 1. Requirements 5 | We analyzed the scRNA-seq data on a Linux system with R (version 3.6.1) and Python (version 3.6.8) environments. 
The following software and packages are also required: 6 | 7 | software|version|environment 8 | -|-|- 9 | Cellranger|3.1.0|Linux 10 | Seurat|3.1.4|R 11 | dplyr|0.8.4|R 12 | patchwork|1.0.0|R 13 | scrublet|0.2.1|Python 14 | scipy|1.0.0|Python 15 | pandas|0.24.2|Python 16 | matplotlib|3.0.3|Python 17 | seaborn|0.9.0|Python 18 | 19 | ## 2. Installation 20 | Copy the scripts to the same path as the raw data folders (i.e. the path containing the "P1-1r1/", "P1-1r2/", "P1-2r1/", "P1-2r2/", "P2-1/", "P2-2/", and "P2-3/" folders), and run them in RStudio or Jupyter Notebook. 21 | 22 | ## 3. Step by step analysis 23 | 24 | ### 3.1 Map the raw sequencing data to the genome reference 25 | 26 | bash step0_map_raw_data_using_cellranger.sh 27 | 28 | ### 3.2 Detect doublets using Scrublet 29 | Run **step1_detect_doublets_using_scrublet.ipynb** in Jupyter Notebook. This script detects doublets separately among the cells of patients and of healthy controls. 30 | 31 | ### 3.3 Integrate patients with healthy controls 32 | Run **step2_integrate_patients_and_healthy_controls.R** in RStudio. This script also clusters the cells and performs UMAP analysis on the scRNA-seq data. 33 | 34 | ### 3.4 Plot UMAP diagram and marker gene expression violin plots 35 | Run **step3_plot_umap_and_marker_gene_expression.ipynb** in Jupyter Notebook. This script plots a UMAP diagram of the cells and violin plots of marker gene expression. All figures are displayed automatically in the Jupyter interface. 36 | 37 | ### 3.5 Generate DEGs between different disease stages 38 | Run **step4_find_DEGs_between_disease_stages.ipynb** in Jupyter Notebook. This script identifies DEGs between disease stages, separately for CD14 monocytes and effector CD8 T cells. It also saves the expression heatmaps of the DEGs as PNG files. 
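The per-gene filter applied by `find_de_genes` in step 4 can be sketched as below. This is a minimal illustration, not the notebook's exact code: the helper name `is_de_gene` is hypothetical, the default thresholds mirror the notebook's (`logfc=1`, `pvalue=0.001`, `pct=0.2`), and both the final `p < pvalue` cut-off and the explicit two-sided test are assumptions.

```python
import numpy as np
from scipy import stats


def is_de_gene(expr1, expr2, logfc=1.0, pvalue=0.001, pct=0.2):
    """Hypothetical helper: test one gene between two groups of cells.

    expr1, expr2: 1-D expression values of the gene in each group.
    Returns (is_de, log2_fc, p); p is None when the gene fails the
    fold-change / detection-rate pre-filter and no test is run.
    """
    expr1 = np.asarray(expr1, dtype=float)
    expr2 = np.asarray(expr2, dtype=float)

    # log2 fold change of the group means, with a small pseudocount
    log2_fc = np.log2((expr1.mean() + 1e-6) / (expr2.mean() + 1e-6))

    # fraction of cells in each group with non-zero expression
    pct1 = np.count_nonzero(expr1 > 0) / expr1.size
    pct2 = np.count_nonzero(expr2 > 0) / expr2.size

    # keep genes that are up in group 1 (and detected there) or
    # up in group 2 (and detected there)
    up = log2_fc > logfc and pct1 > pct
    down = log2_fc < -logfc and pct2 > pct
    if not (up or down):
        return False, log2_fc, None

    # Assumption: a two-sided Mann-Whitney U test; stated explicitly
    # because the SciPy default for `alternative` differs across versions.
    _, p = stats.mannwhitneyu(expr1, expr2, alternative="two-sided")
    return bool(p < pvalue), log2_fc, p


# Toy example: a gene highly expressed in group 1 and nearly absent in group 2
group1 = np.arange(11, 51)
group2 = np.concatenate([np.zeros(30), np.full(10, 0.5)])
flag, fc, p = is_de_gene(group1, group2)
print(flag, round(fc, 2))
```

Running the rank test only on genes that pass the fold-change and detection-rate pre-filter keeps the loop over all genes cheap, since most genes are rejected before any test is computed.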
39 | -------------------------------------------------------------------------------- /step2_integrate_patients_and_healthy_controls.R: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | library(patchwork) 4 | # 5 | # 6 | #### import scRNA-seq data of healthy controls #### 7 | healthy.data <- Read10X(data.dir = "cellranger_output_for_healthy_controls/") 8 | healthy <- CreateSeuratObject(counts = healthy.data, project = "healthy", min.cells = 3, min.features = 200) 9 | healthy[["percent.mt"]] <- PercentageFeatureSet(healthy, pattern = "^MT-") 10 | healthy_filtered <- subset(healthy, subset = nFeature_RNA > 300 & nFeature_RNA < 5000 & percent.mt < 10) 11 | healthy_doublet <- read.table('healthy_controls_doublets.txt', sep=',', check.names=F, row.names=1) 12 | healthy_doublet <- healthy_doublet[healthy_doublet$predicted_doublets=='True',] 13 | healthy_cells <- setdiff(colnames(healthy_filtered), rownames(healthy_doublet[1])) 14 | healthy_filtered <- SubsetData(healthy_filtered, cells=healthy_cells) 15 | healthy_filtered@meta.data['batch'] <- healthy_filtered@meta.data['orig.ident'] 16 | # 17 | # 18 | #### import scRNA-seq data of COVID-19 patients #### 19 | ncov.data <- Read10X(data.dir = "cellranger_output_for_COVID19_patients/") 20 | ncov <- CreateSeuratObject(counts = ncov.data, project = "covid19", min.cells = 3, min.features = 200) 21 | ncov[["percent.mt"]] <- PercentageFeatureSet(ncov, pattern = "^MT-") 22 | ncov_filtered <- subset(ncov, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & percent.mt < 10) 23 | cellinfo <- read.table("COVID19_cells_info.csv", sep='\t', header=TRUE, quote="", check.names=FALSE, row.names=1) 24 | ncov_filtered <- AddMetaData(ncov_filtered, metadata = cellinfo[colnames(ncov_filtered),], col.name=c('batch')) 25 | ncov_doublet <- read.table('COVID19_patients_doublets.txt', sep=',', check.names=F, row.names=1) 26 | ncov_doublet <- 
ncov_doublet[ncov_doublet$predicted_doublets=='True',] 27 | ncov_cells <- setdiff(colnames(ncov_filtered), rownames(ncov_doublet[1])) 28 | ncov_filtered <- SubsetData(ncov_filtered, cells=ncov_cells) 29 | # 30 | # 31 | #### integrated cells from healthy controls and COVID-19 patients #### 32 | overlap_genes <- intersect(rownames(ncov_filtered), rownames(healthy_filtered)) 33 | pbmc.list <- c(healthy_filtered, ncov_filtered) 34 | names(pbmc.list) = c('healthy', 'covid19') 35 | pbmc.anchors <- FindIntegrationAnchors(object.list=pbmc.list, dims = 1:40) 36 | pbmc.integrated <- IntegrateData(anchorset = pbmc.anchors, dims = 1:40, features.to.integrate=overlap_genes) 37 | # 38 | # 39 | #### normalization, clustering, UMAP, and plotting marker genes #### 40 | pbmc.integrated <- ScaleData(pbmc.integrated, features = rownames(pbmc.integrated)) 41 | pbmc.integrated <- RunPCA(pbmc.integrated, npcs=50, verbose=F) 42 | pbmc.integrated <- FindNeighbors(pbmc.integrated, dims = 1:50) 43 | pbmc.integrated <- FindClusters(pbmc.integrated, resolution = 0.3) 44 | pbmc.integrated <- RunUMAP(pbmc.integrated, reduction = "pca", dims = 1:50) 45 | DimPlot(pbmc.integrated, reduction = "umap", label=TRUE) 46 | FeaturePlot(pbmc.integrated, features=c('PTPRC', 'CD14', 'FCGR3A', 'CD3D', 'CD4', 'IL7R', 'CCR7', 'CD8A', 'PRDM1', 'MKI67', 47 | 'TRGC1', 'NKG7', 'CD79A', 'CD38', 'CD1C', 'CLEC4C', 'PPBP', 'CD34'), 48 | slot='data', min.cutoff=0, max.cutoff='q90') 49 | # 50 | # 51 | #### save data #### 52 | save(pbmc.integrated, file='integrated.allgenes.RData') 53 | write.table(pbmc.integrated[['RNA']]@data, file='RNA.data.csv',sep=',', quote=F) 54 | write.table(pbmc.integrated[['integrated']]@data, file='integrated.data.csv', sep=',', quote=F) 55 | write.table(pbmc.integrated@reductions$umap@cell.embeddings, file='umap.csv', sep=',', quote=F) 56 | write.table(pbmc.integrated@meta.data, file='meta_data.csv', sep=',') 57 | # 58 | # 
-------------------------------------------------------------------------------- /step4_find_DEGs_between_disease_stages.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "scrolled": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy,pandas,scipy.stats,seaborn\n", 12 | "import matplotlib.pyplot as plt\n", 13 | "#\n", 14 | "#\n", 15 | "def find_de_genes(data_df, meta_df, clusters, stage1, stage2, logfc=1, pvalue=0.001, pct=0.2):\n", 16 | " cells_df = meta_df.loc[meta_df['seurat_clusters'].isin(clusters)]\n", 17 | " cells1 = cells_df.loc[cells_df['patient stage']==stage1].index.values\n", 18 | " cells2 = cells_df.loc[cells_df['patient stage']==stage2].index.values\n", 19 | " degenes, matrix = [], []\n", 20 | " for igene,gene in enumerate(data_df.index.values):\n", 21 | " expr1 = data_df.loc[gene, cells1]\n", 22 | " expr2 = data_df.loc[gene, cells2]\n", 23 | " log2_fc = numpy.log2((expr1.mean()+1e-6) / (expr2.mean()+1e-6))\n", 24 | " pct1 = len(numpy.where(expr1>0)[0]) / len(expr1)\n", 25 | " pct2 = len(numpy.where(expr2>0)[0]) / len(expr2)\n", 26 | " if ((log2_fc>logfc)&(pct1>pct))|((log2_fc<-logfc)&(pct2>pct)):\n", 27 | " ww, pp = scipy.stats.mannwhitneyu(expr1, expr2)\n", 28 | " if pplogfc].index.values\n", 94 | " t2_t1 = df1.loc[df1['log2fc']<-logfc].index.values\n", 95 | " t1_h = df2.loc[df2['log2fc']>logfc].index.values\n", 96 | " h_t1 = df2.loc[df2['log2fc']<-logfc].index.values\n", 97 | " t2_h = df3.loc[df3['log2fc']>logfc].index.values\n", 98 | " h_t2 = df3.loc[df3['log2fc']<-logfc].index.values\n", 99 | " p321 = list(set(t1_t2).intersection(set(t2_h)))\n", 100 | " p311 = list(set(t1_t2).intersection(set(t1_h))-set(t2_h)-set(h_t2))\n", 101 | " p331 = list(set(t2_h).intersection(set(t1_h))-set(t1_t2)-set(t2_t1))\n", 102 | " p313 = list(set(t1_t2).intersection(set(h_t2)))\n", 103 | " p123 = 
list(set(t2_t1).intersection(set(h_t2)))\n", 104 | " p113 = list(set(h_t2).intersection(set(h_t1))-set(t1_t2)-set(t2_t1))\n", 105 | " p133 = list(set(t2_t1).intersection(set(h_t1))-set(h_t2)-set(t2_h))\n", 106 | " p131 = list(set(t2_t1).intersection(set(t2_h)))\n", 107 | " print(len(p321), len(p311), len(p331), len(p313), len(p123), len(p113), len(p133), len(p131))\n", 108 | " print(len(set(list(t1_t2)+list(t2_t1)+list(t1_h)+list(h_t1)+list(t2_h)+list(h_t2))))\n", 109 | " return p321, p311, p331, p313, p123, p113, p133, p131\n", 110 | "#\n", 111 | "def plot_de_genes(clusters, batches, scaled_df, meta_df, figsize, prefix, cmap='Reds'):\n", 112 | " for ibatch,batch in enumerate(batches):\n", 113 | " with open(prefix+'_geneset_'+str(ibatch)+'.txt', 'w') as outfile:\n", 114 | " for gg in batch:\n", 115 | " outfile.write(gg+'\\n')\n", 116 | " batch_length = numpy.array([len(x) for x in batches])\n", 117 | " bars = [batch_length[:ix].sum() for ix,x in enumerate(batch_length)]\n", 118 | "#\n", 119 | " cells_df = meta_df.loc[meta_df['seurat_clusters'].isin(clusters)]\n", 120 | " cells_df = cells_df.sample(frac=1)\n", 121 | " cells1 = cells_df.loc[cells_df['patient stage']=='severe stage'].index.values\n", 122 | " cells2 = cells_df.loc[cells_df['patient stage']=='remisson stage'].index.values\n", 123 | " cells3 = cells_df.loc[cells_df['patient stage']=='healthy people'].index.values\n", 124 | " cells = list(cells1) + list(cells2) + list(cells3)\n", 125 | " colors = ['darkorange']*len(cells1)+['yellowgreen']*len(cells2)+['steelblue']*len(cells3)\n", 126 | " genes = []\n", 127 | " for batch in batches:\n", 128 | " genes.extend(batch)\n", 129 | " sub_df = scaled_df.loc[genes, cells]\n", 130 | " zscore = scipy.stats.zscore(sub_df.values, axis=1)\n", 131 | " zscore_df = pandas.DataFrame(zscore, index=sub_df.index, columns=sub_df.columns)\n", 132 | " im = seaborn.clustermap(zscore_df, col_cluster=False, row_cluster=False, cmap=cmap, xticklabels=False,\n", 133 | " method='ward', 
vmin=0, vmax=1, col_colors=colors, figsize=figsize)\n", 134 | " im.ax_heatmap.hlines(bars, xmin=0, xmax=zscore_df.shape[1], colors='grey')\n", 135 | " im.ax_heatmap.vlines([len(cells1), len(cells1)+len(cells2)], ymin=0, ymax=zscore_df.shape[0], \n", 136 | " colors='grey')\n", 137 | " plt.savefig(prefix+'_heatmap.png', bbox_inches='tight')\n", 138 | " plt.close()\n", 139 | " return\n", 140 | "#\n", 141 | "#\n", 142 | "p321, p311, p331, p313, p123, p113, p133, p131 = organize_genes(\n", 143 | " 'DEgenes_cd14mono.severe_vs_remission.csv', \n", 144 | " 'DEgenes_cd14mono.severe_vs_healthy.csv',\n", 145 | " 'DEgenes_cd14mono.remission_vs_healthy.csv',\n", 146 | " meta_df, logfc=1)\n", 147 | "cluster_color = {1:'skyblue', 8:'crimson', 15:'royalblue', 12:'green'}\n", 148 | "plot_de_genes([1,8,15,12], [p321+p311,p331,p133,p131], integrated_df, meta_df, (10,20), \n", 149 | " 'cd14mono')\n", 150 | "#\n", 151 | "#\n", 152 | "p321, p311, p331, p313, p123, p113, p133, p131 = organize_genes(\n", 153 | " 'DEgenes_cd8effector.severe_vs_remission.csv', \n", 154 | " 'DEgenes_cd8effector.severe_vs_healthy.csv',\n", 155 | " 'DEgenes_cd8effector.remission_vs_healthy.csv',\n", 156 | " meta_df, logfc=1)\n", 157 | "plot_de_genes([5], [p321+p311,p331,p131], integrated_df, meta_df, (10,20), 'cd8effector')\n", 158 | "#\n", 159 | "#" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [] 168 | } 169 | ], 170 | "metadata": { 171 | "kernelspec": { 172 | "display_name": "Python 3", 173 | "language": "python", 174 | "name": "python3" 175 | }, 176 | "language_info": { 177 | "codemirror_mode": { 178 | "name": "ipython", 179 | "version": 3 180 | }, 181 | "file_extension": ".py", 182 | "mimetype": "text/x-python", 183 | "name": "python", 184 | "nbconvert_exporter": "python", 185 | "pygments_lexer": "ipython3", 186 | "version": "3.6.8" 187 | } 188 | }, 189 | "nbformat": 4, 190 | "nbformat_minor": 2 191 | } 192 | -------------------------------------------------------------------------------- /step1_detect_doublets_using_scrublet.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 9, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Counts matrix shape: 66457 rows, 33538 columns\n", 13 | "Number of genes in gene list: 33538\n" 14 | ] 15 | } 16 | ], 17 | "source": [ 18 | "import numpy,pandas,scipy.io,scipy.sparse\n", 19 | "import scrublet\n", 20 | "#\n", 21 | "#\n", 22 | "input_dir = 'cellranger_output_for_healthy_controls/'\n", 23 | "counts_matrix = scipy.io.mmread(input_dir + '/matrix.mtx').T.tocsc()\n", 24 | "genes = numpy.array(scrublet.load_genes(input_dir + '/features.tsv', delimiter='\\t', column=1))\n", 25 | "out_df = pandas.read_csv(input_dir + '/barcodes.tsv', header = None, index_col=None, names=['barcode'])\n", 26 | "print('Counts matrix shape: {} rows, {} columns'.format(counts_matrix.shape[0], counts_matrix.shape[1]))\n", 27 | "print('Number of genes in gene list: {}'.format(len(genes)))\n", 28 | "#" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 10, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "scrub = scrublet.Scrublet(counts_matrix, expected_doublet_rate=0.06)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 11, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "name": "stdout", 47 | "output_type": "stream", 48 | "text": [ 49 | "Preprocessing...\n", 50 | "Simulating doublets...\n", 51 | "Embedding transcriptomes using PCA...\n", 52 | "Calculating doublet scores...\n", 53 | "Automatically set threshold at doublet score = 0.35\n", 54 | "Detected doublet rate = 1.7%\n", 55 | "Estimated detectable doublet fraction = 
47.8%\n", 56 | "Overall doublet rate:\n", 57 | "\tExpected = 6.0%\n", 58 | "\tEstimated = 3.6%\n", 59 | "Elapsed time: 236.5 seconds\n" 60 | ] 61 | } 62 | ], 63 | "source": [ 64 | "doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, min_cells=3, min_gene_variability_pctl=85, \n", 65 | " n_prin_comps=30)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 12, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "name": "stdout", 75 | "output_type": "stream", 76 | "text": [ 77 | "Detected doublet rate = 2.2%\n", 78 | "Estimated detectable doublet fraction = 51.4%\n", 79 | "Overall doublet rate:\n", 80 | "\tExpected = 6.0%\n", 81 | "\tEstimated = 4.3%\n" 82 | ] 83 | }, 84 | { 85 | "data": { 86 | "text/plain": [ 87 | "(
,\n", 88 | " array([,\n", 89 | " ],\n", 90 | " dtype=object))" 91 | ] 92 | }, 93 | "execution_count": 12, 94 | "metadata": {}, 95 | "output_type": "execute_result" 96 | }, 97 | { 98 | "data": { 99 | "image/png": "[base64-encoded PNG image data omitted]
4GTikchuSBHwZ+FlErKi1Lw8GNLMBEXEJcEnFvM8XXn8a+HS74zJXTqx9OjHIuNoVDpMGKf8R4EDgMElH1io02gcDmpmZ2Qu6fpBxRJwGnFZPWUmzgFnTpk1rbVBmZlYXt9hYp3SigjPkFQ6N8tPEzczap1rlxVdBWbfoRAXn+SscSBWb2cC7m7Fht+CYmXWWW2ysW7T6MvHzgGuA3SStljQvItYDA1c43A5cGBErm7G/iFgSEfPHjx/fjM2ZmZlZj2r1VVRzasx/0RUOzeAWHDMzM4MeGGQ8HB6DY2bW29zFZc3iZ1GZmZlZ6ZSqgiNplqSF/f39nQ7FzMzMOqhUFRwPMjYzMzMo2RgcMxuan95sZqNBqVpw3EVlZmZmULIKjruozMzMDEpWwTEzMzODklVw3EVlZgCSZkq6Q9IqScdWWb6JpAvy8mslTW1/lGbWSqWq4LiLyswkjQXOAA4CpgNzJE2vKDYPeCQipgGnAqe0N0oza7VSVXDMzIB9gFURcWdEPAOcDxxaUeZQ4Oz8+kfAmySpjTGaWYspIjodQ9NJWgfcXUfRCcCDLQ5npHohRuiNOB1jcwwnxh0jYmIrg6kk6TBgZkS8P79/L7BvRBxVKHNrLrM6v/9DLvNgxbbmA/Pz292AO+oIoRd+h9AbcTrG5uiFGKH+OOvKK6W8D069CVXSsojoa3U8I9ELMUJvxOkYm6MXYmyWiFgILBzOOr3y+fRCnI6xOXohRmh+nO6iMrOyuReYXHi/Q55XtYykjYDxwENtic7M2sIVHDMrm+uBXSTtJGljYDawuKLMYuCI/Pow4NdRxv56s1GslF1UwzCspucO6YUYoTfidIzN0dUxRsR6SUcBvwDGAosiYqWkE4FlEbEYOBP4nqRVwMOkSlCzdPXnU9ALcTrG5uiFGKHJcZZykLGZmZmNbu6iMjMzs9JxBcfMzMxKZ1RUcHrhtu11xPhxSbdJulnSryTt2G0xFsq9Q1JI6shlifXEKemd+fNcKekH3RajpCmSLpd0Q/6dH9yBGBdJWpvvGVNtuSSdlo/hZkmvbXeMneS80r44C+U6llucV5oWY/vySkSUeiINMvwDsDOwMXATML2izIeABfn1bOCCLozxH4HN8+sPdmOMudxWwJXAUqCvS3/fuwA3AC/J71/ahTEuBD6YX08H7urAZ7k/8Frg1hrLDwZ+BgjYD7i23TF2anJeaW+cuVzHcovzSlPjbFteGQ0tOL1w2/YhY4yIyyPiqfx2KeneHu1Uz+cI8AXSc33+3M7gCuqJ8wPAGRHxCEBErO3CGAPYOr8eD9zXxvhSABFXkq4wquVQ4Jx8qK8OAAAFN0lEQVRIlgLbSNquPdF1nPNK8/RCbnFeaZJ25pXRUMGZBNxTeL86z6taJiLWA/3Atm2JrmL/WbUYi+aRarjtNGSMuSlxckT8tJ2BVajns9wV2FXS1ZKWSprZtuiSemI8Hjhc0mrgEuAj7QltWIb7d1smzivN0wu5xXmlfZqWV0b7fXB6jqTDgT7gHzodS5GkMcDXgLkdDqUeG5Gakw8gnbFeKWmPiHi0o1FtaA5wVkR8VdIM0j1bdo+Iv3Y6MCufbs0r0FO5xXmly4yGFpxeuG17PTEi6UDgs8BbI+IvbYptwFAxbgXsDlwh6S5S3+niDgwGrOezXA0sjohnI+KPwO9Iiald6olxHnAhQERcA2xKehBdN6nr77aknFeapxdyi/NK+zQvr7R7gFG7J1Kt+k5gJ14YePWqijIfZsPBgBd2YYx7kQaQ7dKtn2NF+SvozCDjej7LmcDZ+fUEUnPotl0W48+Aufn1K0l95erA5zmV2oMB38KGgwGva3d8nZqcV9obZ0X5tucW55Wmx9qWvNLWg+rURBqV/bv8Rf5snnci6YwFUi32h8Aq4Dpg5y6M8ZfAA8CNeVrcbTFWlG17EhrGZylSk/dtwC3A7C6McTpwdU5SNwL/3IEYzwPWAM+Szk7nAUcCRxY+xzPyMdz
Sqd93pybnlfbFWVG2I7nFeaVpMbYtr/hRDWZmZlY6o2EMjpmZmY0yruCYmZlZ6biCY2ZmZqXjCo6ZmZmVjis4ZmZmVjqu4NiLSHpO0o35ibg3STom30200e09UWP+WZIOG2LduZK2b3TfZtYdnFes3fyoBqvm6YjYE0DSS4EfkB7QdlwHYpkL3EobHgonaWxEPNfq/ZiNUs4r1lZuwbFBRXoi7nzgKCWbSvoPSbdIukHSP8LzZ0SnD6wn6WJJBxTen5rP3H4laWLlfiTtLek3kpZL+oWk7fJZWB9wbj7z26xinY9Kuk3SzZLOz/O2LMR3s6R35Plz8rxbJZ1S2MYTkr4q6SZgRrU4mvl5mpnzivNKe7iCY0OKiDuBscBLSbefj4jYg/TgtrMlbTrEJrYAlkXEq4DfUHHGJmkc8E3gsIjYG1gEfDEifgQsA94TEXtGxNMV2z0W2CsiXk26EybA/wP6I2KPPP/XuSn6FOCNwJ7A6yS9rRDbtRHxGuDaanHU+TGZ2TA4r1iruYvKhusNpC8rEfE/ku4Gdh1inb8CF+TX3wcuqli+G+lhepdJgpT01tQRy82ks7AfAz/O8w4kPfeHHOMjkvYHroiIdQCSzgX2z+s8B/znCOMws5FxXrGmcwXHhiRpZ9IXdu0gxdazYYvgYGdflc8HEbAyImYMM7S3kBLKLOCzkvYY5voAfy70jzcah5kNk/OKtZq7qGxQuV97AXB6pAeX/RZ4T162KzAFuAO4C9hT0hhJk4F9CpsZAwxc1fBu4KqK3dwBTJQ0I293nKRX5WWPA1tViWsMMDkiLgc+BYwHtgQuIzV3D5R7CelBh/8gaYKksaQm8N9UOdzB4jCzJnFecV5pB7fgWDWbSboRGEc6g/oe6Sm5AN8Cvi3plrxsbkT8RdLVwB9JT9K9HVhR2N6TwD6SPkc6W3tXcWcR8Uwe+HeapPGkv8uvAyuBs4AFkp4GZhT6y8cC38/lBZwWEY9KOgk4Q9KtpLPDEyLiIknHApfnsj+NiJ9UHvQQcZjZyDivOK+0lZ8mbmZmZqXjLiozMzMrHVdwzMzMrHRcwTEzM7PScQXHzMzMSscVHDMzMysdV3DMzMysdFzBMTMzs9L5/4A8ynQDT6l+AAAAAElFTkSuQmCC\n", 100 | "text/plain": [ 101 | "
" 102 | ] 103 | }, 104 | "metadata": { 105 | "needs_background": "light" 106 | }, 107 | "output_type": "display_data" 108 | } 109 | ], 110 | "source": [ 111 | "import matplotlib.pyplot as plt\n", 112 | "predicted_doublets = scrub.call_doublets(threshold=0.25)  # keep the calls made at the manual threshold, not the automatic one\n", 113 | "scrub.plot_histogram()" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 13, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "name": "stdout", 123 | "output_type": "stream", 124 | "text": [ 125 | "0.021908903501512256\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "print(scrub.detected_doublet_rate_)\n", 131 | "out_df['doublet_scores'] = doublet_scores\n", 132 | "out_df['predicted_doublets'] = predicted_doublets\n", 133 | "out_df.to_csv('healthy_controls_doublets.txt', index=False,header=True)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 14, 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "name": "stdout", 143 | "output_type": "stream", 144 | "text": [ 145 | "Counts matrix shape: 18752 rows, 33538 columns\n", 146 | "Number of genes in gene list: 33538\n", 147 | "Preprocessing...\n", 148 | "Simulating doublets...\n", 149 | "Embedding transcriptomes using PCA...\n", 150 | "Calculating doublet scores...\n", 151 | "Automatically set threshold at doublet score = 0.58\n", 152 | "Detected doublet rate = 0.4%\n", 153 | "Estimated detectable doublet fraction = 18.6%\n", 154 | "Overall doublet rate:\n", 155 | "\tExpected = 6.0%\n", 156 | "\tEstimated = 2.3%\n", 157 | "Elapsed time: 27.7 seconds\n", 158 | "Detected doublet rate = 3.7%\n", 159 | "Estimated detectable doublet fraction = 44.1%\n", 160 | "Overall doublet rate:\n", 161 | "\tExpected = 6.0%\n", 162 | "\tEstimated = 8.4%\n", 163 | "0.037116040955631396\n" 164 | ] 165 | }, 166 | { 167 | "data": { 168 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAjgAAADQCAYAAAAK/RswAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3XmcHHW57/HPl7CEJQQl0cOSEDDAMYCCjEAOykFEDQjiS7lKDnjFg+SqB8UreoWL9wKCR/EcBRE05goCsstxIYoCKhBBtoQ1EVFEkLAFUEJYZH3uH7/fQKWZnqmZ6e7qrvm+X69+TVd1LU/1TD3z1O9XiyICMzMzszpZpeoAzMzMzFrNBY6ZmZnVjgscMzMzqx0XOGZmZlY7LnDMzMysdlzgmJmZWe24wOlSko6WdFbVcQyHpAMlXVV1HKMlaX9Jl1Ydh1nV2rkvSDpd0nHtWPYA6xpWbpJ0t6TdW7DeQbdRUkiaPtr12MBc4FQk73C3SXpK0oOSvi1pvarjahdJ0/LOvGrVsQwlIs6OiHeWmbYuRZ2NXZLeIum3kpZL+qukqyW9GYa3L7Q5xiskfbTqOKoiaVdJS6uOo9e4wKmApMOA44HPAROBnYBNgMskrd7BOLqq2OiGeLohBrNOkbQu8FPgm8CrgY2AY4BnqozLrBVc4HRYTijHAJ+MiF9ExHMRcTfwAWAacEBh8vGSzpe0QtKNkt5YWM7nJd2XP7tD0tvz+FUkHS7pT5IelXSBpFfnz/pbUQ6S9Bfg15J+LumQhhhvkfS+/P4fJV2Wj+zukPSBwnTrS7pI0uOSrgdeN8imL8g/H5P0hKSZufXjakknSHoUOFrS6yT9Osf+iKSziy1buen4s5JuzUec50sanz+bJOmnkh7L8f5G0ir5symSfijp4bzsk/P4gWJYqVUmf2efknRXjuk/8vf8emAuMDNv02N5+omSzszrukfSFwpxFNf3WF7mP+Xx90paJunDhXWvIek/Jf1F0kOS5kpac6jtNStpC4CIODciXoiIpyPi0oi4FV7ZQpn3hU9I+mPOPcfmffa3OQ9c0H+Q1jhvYf5XdMlIelX+W35Y0t/y+43zZ18C3gqcnPez/n23VbkJSR/K++qjko5s+GwNSSdKuj+/TpS0xjC2cVKOc4WkKyVt0iSGAfd1SWsDPwc2zNv/hKQNJe0gaWHexockfX2wbRyTIsKvDr6AWcDzwKoDfHYGcG5+fzTwHLAvsBrwWeDP+f2WwL3AhnnaacDr8vtDgWuBjYE1gO8UljkNCOBMYG1gTeC/A1cXYpgBPJbnXTuv5yPAqsB2wCPAjDztecAFebqtgfuAq5psd/+6Vy2MOzB/F5/My18TmA68I69/MqkwOrEwz93A9cCGpCPO24GP5c++TCo4VsuvtwICxgG3ACfkWMcDbxkkhgOL25HjvjyvbyrwB+CjhfmvatjWM4GfABPydv8BOKhhfR/JcR0H/AU4JW/zO4EVwDp5+hOAi/K6JwDzgS8Ptr1V/4371TsvYF3gUVLu2QN4VcPnA+0LP8nzbUVq6fkVsBmpNfp3wIcHmrcw//T8/nTguPx+feD9wFr57/wHwI8L813Rv8/l4VbmphnAE8AueR/8et5Hd8+ff5GUU19Dykm/BY4dxjauKCz7GwN8n/3TDrav7wosbVjPNcCH8vt1gJ2q/nvqtlflAYy1F6mF5sEmn30FuCy/Pxq4tvDZKsAD+Z/YdGAZsDuwWsMybgfeXhjegFQorcrLRcZmhc8nAE8Cm+ThLwGn5fcfBH7TsPzvAEeR/jk/B/xj4bN/HySJ9K+7scD5yxDf13uBmwrDdwMHFIa/CszN779ISr7TG5YxE3iYgYvKV8TQmLRy3LMKw58AftVk2nHAs+REm8f9D+CKwvR/LHy2TV7+awvjHgW2JRVnT5KL18K2/Hmw7fXLr+G8gNeT/hEvJf1jv6j/77HJvrBzYXgR8PnC8NfIByS
N8xbmf0WBM0BM2wJ/KwxfwcoFTitz0/8FzisMr5334f4C50/AnoXP3wXcPYxtLC57HeAFYEpx2hL7+q68ssBZQOoNmFT131C3vtyc3XmPkJosBzrXY4P8eb97+99ExIukBLRhRNwJfJpUBC2TdJ6kDfOkmwA/yt0Wj5EKnheA1zZZ7grgZ8B+edRs4OzCsnbsX1Ze3v7AP5COZFYtLgu4p9xXsJLi/Eh6bd6e+yQ9DpwFTGqY58HC+6dISQPgP4A7gUtz18/hefwU4J6IeL5MDCWmuYfUgjSQSaTWlOJ3cQ/p3IZ+DxXePw0QEY3j1iF9x2sBiwrf/y/yeGi+vWalRcTtEXFgRGxMau3YEDhxkFka/1YH+tsdFklrSfpO7iZ6nPTPez1J45rM0srctCEr58QnSQcZxc8b9+dm+/9Aist+AvjrAPMPta8P5CBSF+PvJd0gaa9hxDQmuMDpvGtIzbrvK46UtA6pifhXhdFTCp+vQup2uh8gIs6JiLeQdvQgnbQMaWfaIyLWK7zGR8R9heU2PkL+XGC2pJmk7pvLC8u6smFZ60TEx0ktIs8XYyR13zTT7LH1jeP/PY/bJiLWJbV4aZDlvrygiBURcVhEbAa8B/iM0rlJ9wJTmxSVg8VW1Lid9zeZ9xHS0eMmDdPfx/A9QvqHsVXh+58YEevAoNtrNiIR8XtSq8PWLVjck6R/2gBI+odBpj2M1PW+Y97vd+mfrT+0hulbmZseYOVcuxapy6zf/bxyf+7f/8tsY3HZ65C6oO5vmGbQfZ0BclRE/DEiZpO6zo4HLszn61jmAqfDImI5qVnxm5JmSVpN0jRSf/FS4PuFybeX9L78j/nTpMLoWklbStotn+j2d9KO8WKeZy7wpf4T2SRNlrTPEGFdTNqBvwicn1uLIF1dsUU+AW+1/HqzpNdHxAvAD0kn5a4laQbw4UHW8XCOcbMhYplA6g9fLmkj0pVmpUjaS9J0SQKWk1quXiSds/MA8BVJa0saL2nnssvNPpdPhJxCOs/p/Dz+IWDj/hMr8/dyAel3MCH/Hj5Daokalvx7+H/ACZJek7dxI0nvGmJ7zUrJJ+oeppdP6J1CasW9tgWLvwXYStK2ShcCHD3ItBNIeewxpYsijmr4/CFWzh2tzE0XAnspXS6/OikPFv83ngt8IefSSaQurf79ucw27llY9rGkUw9WajUeal/P27++pIn980g6QNLkPO9jebT3/wIXOBWIiK8C/xv4T+Bx4DrSEcnbI6J4eeZPSH3NfwM+BLwvIp4jnaz2FVLV/yCpgj8iz/MNUh/6pZJWkBLVjkPE8wwpIewOnFMYv4J00ut+pCOOB0lHCmvkSQ4hNUc/SDrq+94g63iKdH7P1bkJdqcmkx4DvIn0D/tnOa6yNgd+SSqQrgG+FRGX54S3N6mv+y+kQvKDw1gupN/FIuDmHNepefyvgSXAg5L6uxc/STqyuwu4ivSdnjbM9fX7PKkb6trcdP9L0pEuNNneEa7HxqYVpPxwnaQnSfliMalFZVQi4g+kYuGXwB9J+0IzJ5JO8H8kx/CLhs+/AeyrdIXVSS3OTUuAfyPtpw+Q8m3xnjPHAQuBW4HbgBvzuLLbeA6pYPsrsD0rXylb1HRfzy1r5wJ35fy5IemClSWSnsjfz34R8XSz7RyLFFGmdd5s7JIUwOb53CczM+sBbsExMzOz2nGBY2ZmZrXjLiozMzOrHbfgmJmZWe3U8sGCkyZNimnTplUdRiUWLVrE9ttvX3UYZk0tWrTokYgY7AZmlXMOcQ6x7lU2h9Syi6qvry8WLlxYdRiVkEQdf6dWH5IWRURf1XEMxjnEOcS6V9kc4i4qMzMzqx0XOGZmZlY7LnDMzMysdlzgmJmZWe24wDEzM7PaqeVl4qNxzDHHrDR81FGND7U1Mxsd5xmz9nMLjpmZmdVO17fgSNoMOBKYGBH7dnr9PtIys9FozCFm1hmVtOBIOk3SMkmLG8bPknSHpDslHQ4QEXd
FxEFVxGlmZma9qaouqtOBWcURksYBpwB7ADOA2ZJmdD40MzMz63WVFDgRsQD4a8PoHYA7c4vNs8B5wD5llylpjqSFkhY+/PDDLYzWzMzMek03nWS8EXBvYXgpsJGk9SXNBbaTdESzmSNiXkT0RUTf5Mld/Rw/MzMza7OuP8k4Ih4FPlZmWkl7A3tPnz69vUGZmZlZV+umFpz7gCmF4Y3zuNIiYn5EzJk4cWJLAzMzM7Pe0k0Fzg3A5pI2lbQ6sB9wUcUxmZmZWQ+q6jLxc4FrgC0lLZV0UEQ8DxwCXALcDlwQEUuGudy9Jc1bvnx564M2MzOznlHJOTgRMbvJ+IuBi0ex3PnA/L6+voNHugwzMzPrfV1/knE7+Q6jZmNHvtfWQuC+iNir6njMrL266RycUXMXlZkN4lBS97eZjQG1KnB8FZWZDUTSxsC7ge9WHYuZdcaY7qIaCT9806wnnQj8L2BCswkkzQHmAEydOrVDYZlZu9SqBcddVGbWSNJewLKIWDTYdL4bulm91KrAcReVmQ1gZ+A9ku4mPeNuN0lnVRuSmbVbrQocM7NGEXFERGwcEdNINxD9dUQcUHFYZtZmLnDMzMysdmpV4PgcHDMbTERc4XvgmI0NtSpwfA6OmZmZQc0KHDMzMzNwgWNmZmY15ALHzMzMaqdWBY5PMjYzMzOoWYHjk4zNzMwMalbgmJmZmYEftjlqjQ/fBD+A08yGxw/xNWs9t+CYmZlZ7bjAMTMzs9qpVYHjq6jMzMwMalbg+CoqMzMzg5oVOGZmZmbgAsfMzMxqyAWOmZmZ1c6QBY6k9TsRiJnZUJyPzKysMi0410r6gaQ9JantEZmZNed8ZGallLmT8RbA7sC/AidJugA4PSL+0NbIepjvSmrWNs5HZlbKkC04kVwWEbOBg4EPA9dLulLSzLZHOAy+D45ZvfVSPjKzapU6B0fSoZIWAp8FPglMAg4DzmlzfMPi++CY1Vsv5SMzq1aZLqprgO8D742IpYXxCyXNbU9YZmYDcj4ys1LKnGT8hYg4tphMJP03gIg4vm2RmZm9kvORmZVSpsA5fIBxR7Q6EDOzEkaUjySNl3S9pFskLZF0zFDzmFlva9pFJWkPYE9gI0knFT5aF3i+3YGZmfVrQT56BtgtIp6QtBpwlaSfR8S1bQjXzLrAYOfg3A8sBN4DLCqMXwH8z3YGZWbWYFT5KCICeCIPrpZf0eIYzayLNC1wIuIW4BZJZ0eEW2zMrDKtyEeSxpGKo+nAKRFxXcPnc4A5AFOnTh1lxGZWtcG6qC6IiA8AN0kqHumIdED0hrZHZ2ZGa/JRRLwAbCtpPeBHkraOiMWFz+cB8wD6+vrcumPW4wbrojo0/9yrE4GYmQ2iZfkoIh6TdDkwC1g81PRm1puaXkUVEQ/kt48A90bEPcAawBtJ/eFmZh0x2nwkaXJuuUHSmsA7gN+3KVwz6wJlbvS3AHirpFcBlwI3AB8E9m9nYHXiZ1OZtcxI89EGwBn5PJxVgAsi4qdtjXQUGnMGOG+YDVeZAkcR8ZSkg4BvRcRXJd3c7sBeWrm0NvAt4Fngiog4u1PrNrOuM6J8FBG3Atu1Pzwz6xZlbvSn/BC7/YGf5XHjRrNSSadJWiZpccP4WZLukHSnpP4ber0PuDAiDiZdImpmY1fL85GZ1VOZAudQ0p1CfxQRSyRtBlw+yvWeTjrB7yW56fgUYA9gBjBb0gxgY+DePNkLo1yvmfW2duQjM6uhIbuoImIBqd+7f/gu4FOjWWlELJA0rWH0DsCdeflIOg/YB1hKKnJuZpCCrJfuYeH+dbORaUc+MrN6GrLAkbQF8FlgWnH6iNitxbFsxMstNZAKmx2Bk4CTJb0bmN9sZt/Dwqz+OpiPzKzHlTnJ+AfAXOC7VNBFFBFPAh8pM62kvYG9p0+f3t6gzKwqleYjs7qrUw9DmQLn+Yj4dtsjgfuAKYXhjfO40iJiPjC/r6/v4FYGZmZdo1P5yMx6XJmTjOd
L+oSkDSS9uv/VhlhuADaXtKmk1YH9gIvasB4z612dykdm1uPKtOB8OP/8XGFcAJuNdKWSzgV2BSZJWgocFRGnSjoEuIR02edpEbFkmMt1F5VZvbU8H5lZPZW5imrTVq80ImY3GX8xcPEolusuKrMaa0c+MrN6GrKLStJakr4gaV4e3lySH8BpZh3nfGRmZZU5B+d7pMck/FMevg84rm0RjYKkvSXNW758edWhmFl79Ew+MrNqlSlwXhcRXwWeA4iIpwC1NaoRioj5ETFn4sSJVYdiZu3RM/nIzKpV5iTjZyWtSTqRD0mvA55pa1RmZgPr+nw00H1EzNqh8W+tV+9X0y5lCpyjgF8AUySdDewMHNjOoEbKV1GZ1V7P5CMzq1aZq6guk3QjsBOpKfjQiHik7ZGNgK+iMqu3XspHZlatpgWOpDc1jHog/5wqaWpE3Ni+sMYeNzWaNed8ZGbDNVgLztfyz/FAH3AL6YjpDcBCYGZ7QzMze4nzkZkNS9OrqCLibRHxNtKR0psioi8itge2Y5jPiOoUXyZuVk+9mI/MrFplLhPfMiJu6x+IiMXA69sX0sj5MnGz2uuZfGRm1SpzFdWtkr4LnJWH9wdubV9IBj4nx6wJ5yMzK6VMgfMR4OPAoXl4AfDttkVkZtac85GZlVLmMvG/Ayfkl5lZZZyPzKysMi04PcM3+jOzunK3tdnw1KrA8Y3+zGwgkqYAZwKvJT3mYV5EfKPaqEbHBU9v8++v/WpV4JiZNfE8cFhE3ChpArBI0mUR8buqAzOz9ihzmfgrSJrT6kDMzEaiTD6KiAf673YcESuA24GN2h2bmVVnpC04amkUNqSBnlDsJk0zYJj5SNI00g0Cr2sYPweYAzB16tQWhWZmVRlRC05EfKfVgZiZjcRw8pGkdYD/Aj4dEY83LGdevkNy3+TJk1sdppl12JAtOJLWB44GdiadnHcV8MWIeLS9oQ3fWLuKyiep2VgzmnwkaTVScXN2RPywnXGaWfXKdFGdR7qZ1vvz8P7A+cDu7QpqpHwVlVntjSgfSRJwKnB7RHy9rRHamOODze5Upotqg4g4NiL+nF/HkS61NDPrtJHmo52BDwG7Sbo5v/Zsb6hmVqUyLTiXStoPuCAP7wtc0r6QbKT6jyL6f/oowmpoRPkoIq7CF0eYjSlNCxxJK0h93AI+zcsPt1sFeAL4bNujMzPD+cjMhq9pgRMREzoZiJlZM85HZjZcpe6DI+k9wC558IqI+Gn7QjIza875yMzKKHOZ+FeANwNn51GHSto5Io5oa2TWcr5ZoPU65yMzK6tMC86ewLYR8SKApDOAm4CuSyhj7T44ZmNQz+QjM6tW2TsZr1d4P7EdgbRCRMyPiDkTJ3ZtiGY2ej2Rj8ysWmVacL4M3CTpctIVDLsAh7c1KjOzgTkfmVkpgxY4+e6fVwE7kfq9AT4fEQ+2OzAzsyLnIzMbjkELnIgISRdHxDbARR2KyczsFZyPrBP82IX6KHMOzo2S3jz0ZGZmbed8ZGallDkHZ0fgAEl3A0+S+r0jIt7QzsDMzAbgfGRmpZQpcN7V9iisa7h51rqc85GZlTLYs6jGAx8DpgO3AadGxPOdCszMrJ/zkbWCb3Y6tgzWgnMG8BzwG2APYAZwaCeCMjNr4Hw0TP5nbmPdYAXOjHy1ApJOBa7vTEjWKgMluG7lrjEbgvPREDq1v3tftV4x2FVUz/W/cVOwmVXM+cjMhmWwFpw3Sno8vxewZh7uv2ph3bZHB0jaDDgSmBgR+3ZinWOJj/qsR3RFPjKz3tG0wImIcaNduKTTgL2AZRGxdWH8LOAbwDjguxHxlUHiuAs4SNKFo43Hhq9MAdRYrPRS15j1hlbkIzMbW8pcJj4apwMnA2f2j5A0DjgFeAewFLhB0kWkYufLDfP/a0Qsa3OMZmZmVjNtLXAiYoGkaQ2jdwDuzC0zSDoP2Ccivkxq7RkRSXOAOQBTp04d6WLMzKxHuSvciso8qqHVNgL
uLQwvzeMGJGl9SXOB7SQd0Wy6iJgXEX0R0Td58uTWRWtmZmY9p91dVKMWEY+SbvA1JEl7A3tPnz69vUFZJXx0ZmZmZVXRgnMfMKUwvHEeN2oRMT8i5kycOLEVizOzmpB0mqRlkhZXHYuZdUYVLTg3AJtL2pRU2OwH/EsFcZjZ2HE6DRc82Cv1t5IWW0u7paXULbg2XG0tcCSdC+wKTJK0FDgqIk6VdAhwCenKqdMiYkmL1ucuqjHMt6a3Zppc8GBmNdbuq6hmNxl/MXBxG9Y3H5jf19d3cKuXbWb1NhauxOyle1T1UqzWnao4B8fMrOv4SkyzeqlVgSNpb0nzli9fXnUoZmZmVqGuv0x8ONxFZcM1kvN2fLKjmVn3q1ULjpnZQPIFD9cAW0paKumgqmMys/aqVQuOr6Iys4E0u+DBquErHq0TalXguIvKzKz7+Iooq0KtChwzM6ueCxrrBi5wzCrgJnozs/aqVYHjc3DMzDrLrTXWrWp1FZUftmlmZmZQswLHzMzMDGrWRWVm1ml176Kp+/ZZfbkFx8zMzGqnVgWOn0VlZmZmULMCxycZm5mZGdSswDEzMzMDFzhmZmZWQy5wzMzMrHZc4JiZmVnt1KrA8VVUZmZmBjUrcHwVlZmZmUHNChwzMzMzcIFjZmZmNeQCx8zMzGrHD9s0q5HGByMeddRRFUViZlYtt+CYmZlZ7bjAMTMzs9pxgWNmZma1U6sCxzf6M7NmJM2SdIekOyUdXnU8ZtZetSpwfKM/MxuIpHHAKcAewAxgtqQZ1UZlZu1UqwLHzKyJHYA7I+KuiHgWOA/Yp+KYzKyNFBFVx9Bykh4G7ikx6STgkTaH0wq9Eif0TqyOs7WGE+cmETG5ncE0krQvMCsiPpqHPwTsGBGHFKaZA8zJg1sCd5RcfB1/R1VynK3VK3FC+VhL5ZBa3genbPKUtDAi+todz2j1SpzQO7E6ztbqlTgHExHzgHnDna9Xtt1xtpbjbL1Wx+ouKjMbC+4DphSGN87jzKymXOCY2VhwA7C5pE0lrQ7sB1xUcUxm1ka17KIahmE3R1ekV+KE3onVcbZWV8cZEc9LOgS4BBgHnBYRS1q0+K7e9gLH2VqOs/VaGmstTzI2MzOzsc1dVGZmZlY7LnDMzMysdsZEgTPULdolrSHp/Pz5dZKmdT7KUnF+RtLvJN0q6VeSNunGOAvTvV9SSKrkEsUycUr6QP5Ol0g6p9MxFuIY6nc/VdLlkm7Kv/89K4jxNEnLJC1u8rkknZS34VZJb+p0jO3kPNLZOAvTOY+U0As5JMfRuTwSEbV+kU4o/BOwGbA6cAswo2GaTwBz8/v9gPO7NM63AWvl9x/v1jjzdBOABcC1QF83xglsDtwEvCoPv6bTcQ4j1nnAx/P7GcDdFcS5C/AmYHGTz/cEfg4I2Am4rorvs8LfkfNIC+PM0zmPtC7OynNIXnfH8shYaMEpc4v2fYAz8vsLgbdLUgdjhBJxRsTlEfFUHryWdC+PTit7y/tjgeOBv3cyuIIycR4MnBIRfwOIiGUdjrFfmVgDWDe/nwjc38H4UgARC4C/DjLJPsCZkVwLrCdpg85E13bOI63lPNJaPZFDoLN5ZCwUOBsB9xaGl+ZxA04TEc8Dy4H1OxLdADFkA8VZdBCpyu20IePMTYpTIuJnnQysQZnvcwtgC0lXS7pW0qyORbeyMrEeDRwgaSlwMfDJzoQ2LMP9G+4lziOt5TzSWnXJIdDCPDLW74PTkyQdAPQB/1x1LI0krQJ8HTiw4lDKWJXUvLwr6Sh2gaRtIuKxSqMa2Gzg9Ij4mqSZwPclbR0RL1YdmPUm55GW6ZU8MuZyyFhowSlzi/aXppG0Kqn57tGORDdADNmAt5KXtDtwJPCeiHimQ7EVDRXnBGBr4ApJd5P6UC+q4ATBMt/nUuCiiHguIv4M/IGUqDqtTKwHARcARMQ1wHjSg+m6SZ0fh+A80lrOI61VlxwCrcwjVZxk1MkXqbq+C9i
Ul0++2qphmn9j5ZMDL+jSOLcjnUi2eTd/nw3TX0E1JweW+T5nAWfk95NIzaLrd2msPwcOzO9fT+o/VwWxTqP5yYHvZuWTA6/vdHwV/46cR1oYZ8P0ziOjj7Mrckhef0fySMc3rKIvc09SVf0n4Mg87oukoxdIlewPgDuB64HNujTOXwIPATfn10XdGGfDtJUkppLfp0jN4L8DbgP26+K/0RnA1Tlx3Qy8s4IYzwUeAJ4jHbUeBHwM+Fjh+zwlb8NtVf3eK/wdOY+0MM6GaZ1HRh9n5Tkkx9GxPOJHNZiZmVntjIVzcMzMzGyMcYFjZmZmteMCx8zMzGrHBY6ZmZnVjgscMzMzqx0XONaUpBck3ZyfkHuLpMPyHUZHurwnmow/XdK+Q8x7oKQNR7puM+s85xCrkh/VYIN5OiK2BZD0GuAc0sPajqoglgOBxXTgAXGSxkXEC+1ej9kY4BxilXELjpUS6Qm5c4BDlIyX9D1Jt0m6SdLb4KWjpJP755P0U0m7FoZPyEdzv5I0uXE9kraXdKWkRZIukbRBPjLrA87OR4NrNszzKUm/k3SrpPPyuHUK8d0q6f15/Ow8brGk4wvLeELS1yTdAswcKI5Wfp9mY41ziHNIp7nAsdIi4i5gHPAa0m3pIyK2IT3E7QxJ44dYxNrAwojYCriShqM4SasB3wT2jYjtgdOAL0XEhcBCYP+I2DYinm5Y7uHAdhHxBtIdMQH+D7A8IrbJ43+dm6ePB3YDtgXeLOm9hdiui4g3AtcNFEfJr8nMmnAOsU5yF5WN1FtIOzAR8XtJ9wBbDDHPi8D5+f1ZwA8bPt+S9IC9yyRBSoQPlIjlVtKR2Y+BH+dxu5OeB0SO8W+SdgGuiIiHASSdDeyS53kB+K9RxmFm5TmHWFu5wLHSJG1G2omXDTLZ86zcMjjYEVnjc0IELImImcMM7d2kJLM3cKSkbYY5P8DfC33mI43DzAbhHGKd5C4qKyX3dc8FTo70ALPfAPvnz7YApgJ3AHcD20paRdIUYIfCYlYB+q90+BfgqobV3AFMljQzL3c1SVvlz1YAEwaIaxVgSkRcDnwemAisA1xGagLvn+5VpAcg/rOkSZLGkZrFrxxgcweLw8wHgBcjAAAAyElEQVRGwDnEOaTT3IJjg1lT0s3AaqSjqu+TnpoL8C3g25Juy58dGBHPSLoa+DPpybq3AzcWlvcksIOkL5CO4D5YXFlEPJtPBjxJ0kTS3+eJwBLgdGCupKeBmYU+9HHAWXl6ASdFxGOSjgNOkbSYdMR4TET8UNLhwOV52p9FxE8aN3qIOMysPOcQ55DK+GniZmZmVjvuojIzM7PacYFjZmZmteMCx8zMzGrHBY6ZmZnVjgscMzMzqx0XOGZmZlY7LnDMzMysdv4/io6KjRAIDUwAAAAASUVORK5CYII=\n", 169 | "text/plain": [ 170 | "
" 171 | ] 172 | }, 173 | "metadata": { 174 | "needs_background": "light" 175 | }, 176 | "output_type": "display_data" 177 | } 178 | ], 179 | "source": [ 180 | "input_dir = 'cellranger_output_for_COVID19_patients/'\n", 181 | "counts_matrix = scipy.io.mmread(input_dir + '/matrix.mtx').T.tocsc()\n", 182 | "genes = numpy.array(scrublet.load_genes(input_dir + '/features.tsv', delimiter='\\t', column=1))\n", 183 | "out_df = pandas.read_csv(input_dir + '/barcodes.tsv', header = None, index_col=None, names=['barcode'])\n", 184 | "print('Counts matrix shape: {} rows, {} columns'.format(counts_matrix.shape[0], counts_matrix.shape[1]))\n", 185 | "print('Number of genes in gene list: {}'.format(len(genes)))\n", 186 | "scrub = scrublet.Scrublet(counts_matrix, expected_doublet_rate=0.06)\n", 187 | "doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, min_cells=3, min_gene_variability_pctl=85, \n", 188 | " n_prin_comps=30)\n", 189 | "predicted_doublets = scrub.call_doublets(threshold=0.25)  # keep the calls made at the manual threshold, not the automatic one\n", 190 | "scrub.plot_histogram()\n", 191 | "print(scrub.detected_doublet_rate_)\n", 192 | "out_df['doublet_scores'] = doublet_scores\n", 193 | "out_df['predicted_doublets'] = predicted_doublets\n", 194 | "out_df.to_csv('COVID19_patients_doublets.txt', index=False,header=True)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": null, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [] 203 | } 204 | ], 205 | "metadata": { 206 | "kernelspec": { 207 | "display_name": "Python 3", 208 | "language": "python", 209 | "name": "python3" 210 | }, 211 | "language_info": { 212 | "codemirror_mode": { 213 | "name": "ipython", 214 | "version": 3 215 | }, 216 | "file_extension": ".py", 217 | "mimetype": "text/x-python", 218 | "name": "python", 219 | "nbconvert_exporter": "python", 220 | "pygments_lexer": "ipython3", 221 | "version": "3.6.8" 222 | } 223 | }, 224 | "nbformat": 4, 225 | "nbformat_minor": 2 226 | } 227 | 
-------------------------------------------------------------------------------- /.ipynb_checkpoints/step1_detect_doublets_using_scrublet-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 9, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Counts matrix shape: 66457 rows, 33538 columns\n", 13 | "Number of genes in gene list: 33538\n" 14 | ] 15 | } 16 | ], 17 | "source": [ 18 | "import numpy,pandas,scipy.io,scipy.sparse\n", 19 | "import scrublet\n", 20 | "#\n", 21 | "#\n", 22 | "input_dir = 'cellranger_output_for_healthy_controls/'\n", 23 | "counts_matrix = scipy.io.mmread(input_dir + '/matrix.mtx').T.tocsc()\n", 24 | "genes = numpy.array(scrublet.load_genes(input_dir + '/features.tsv', delimiter='\\t', column=1))\n", 25 | "out_df = pandas.read_csv(input_dir + '/barcodes.tsv', header = None, index_col=None, names=['barcode'])\n", 26 | "print('Counts matrix shape: {} rows, {} columns'.format(counts_matrix.shape[0], counts_matrix.shape[1]))\n", 27 | "print('Number of genes in gene list: {}'.format(len(genes)))\n", 28 | "#" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 10, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "scrub = scrublet.Scrublet(counts_matrix, expected_doublet_rate=0.06)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 11, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "name": "stdout", 47 | "output_type": "stream", 48 | "text": [ 49 | "Preprocessing...\n", 50 | "Simulating doublets...\n", 51 | "Embedding transcriptomes using PCA...\n", 52 | "Calculating doublet scores...\n", 53 | "Automatically set threshold at doublet score = 0.35\n", 54 | "Detected doublet rate = 1.7%\n", 55 | "Estimated detectable doublet fraction = 47.8%\n", 56 | "Overall doublet rate:\n", 57 | "\tExpected = 6.0%\n", 58 | 
"\tEstimated = 3.6%\n", 59 | "Elapsed time: 236.5 seconds\n" 60 | ] 61 | } 62 | ], 63 | "source": [ 64 | "doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, min_cells=3, min_gene_variability_pctl=85, \n", 65 | " n_prin_comps=30)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 12, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "name": "stdout", 75 | "output_type": "stream", 76 | "text": [ 77 | "Detected doublet rate = 2.2%\n", 78 | "Estimated detectable doublet fraction = 51.4%\n", 79 | "Overall doublet rate:\n", 80 | "\tExpected = 6.0%\n", 81 | "\tEstimated = 4.3%\n" 82 | ] 83 | }, 84 | { 85 | "data": { 86 | "text/plain": [ 87 | "(
,\n", 88 | " array([,\n", 89 | " ],\n", 90 | " dtype=object))" 91 | ] 92 | }, 93 | "execution_count": 12, 94 | "metadata": {}, 95 | "output_type": "execute_result" 96 | }, 97 | { 98 | "data": { 99 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjgAAADQCAYAAAAK/RswAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3XuYHVWZ7/HvLyHcITgkKoSEgAHGCArSAhkdhlFmDEjERxlNFA/xRHNQUTyij3g5chFFnFEUQWNGMoAiFx1GE0QFFUQYAiThGhg0IkggkHBrrgrB9/yxVkNls3f37t37Wv37PE89vXfVqqq3dvd+e9Vaq6oUEZiZmZmVyZhOB2BmZmbWbK7gmJmZWem4gmNmZmal4wqOmZmZlY4rOGZmZlY6ruCYmZlZ6biC02UkHS/p+52OYzgkzZV0VafjGClJ75F0aafjMOs2rfxuSDpL0kmt2HaVfQ0rV0m6S9KBTdjvoMcoKSRNG+l+bEOu4LRZ/oLdIukpSfdL+rakbTodV6tImpq/vBt1OpahRMS5EfHP9ZQtS6XObICkN0j6b0n9kh6WdLWk18HwvhstjvEKSe/vdBydIukASas7HUevcAWnjSQdA5wCfBIYD+wH7AhcJmnjNsbRVZWNboinG2Iw6xRJWwMXA98E/gaYBJwA/KWTcZmNhCs4bZITyAnARyLi5xHxbETcBbwTmAocXii+qaQLJD0uaYWk1xS28ylJ9+Zld0h6U54/RtKxkv4g6SFJF0r6m7xsoBVlnqQ/Ab+W9DNJR1XEeJOkt+fXfyvpsnwmd4ekdxbKbStpsaTHJF0HvGKQQ78y/3xU0hOSZuTWj6slnSrpIeB4Sa+Q9Osc+4OSzi22bOWm4k9IujmfYV4gadO8bIKkiyU9muP9raQxedlkSRdJWpe3fXqeXy2GDVpl8mf2UUl35pj+NX/OrwQWADPyMT2ay4+XdE7e192SPleIo7i/R/M2/y7Pv0fSWklHFPa9iaR/k/QnSQ9IWiBps6GO16xBuwJExHkR8VxEPB0Rl0bEzfDiFsv83fiQpN/nXPSF/B3+75wXLhw4aatct7D+i7pkJL0k/22vk/RIfr1DXvZF4O+B0/P3buC73KxchaT35u/uQ5I+W7FsE0lfl3Rfnr4uaZNhHOOEHOfjkn4jaccaMVT97kvaAvgZsH0+/ickbS9pH0nL8jE+IOlrgx3jqBIRntowATOB9cBGVZadDZyXXx8PPAscBowDPgH8Mb/eDbgH2D6XnQq8Ir8+GlgK7ABsAnynsM2pQADnAFsAmwH/C7i6EMN04NG87hZ5P+8DNgL2Ah4Epuey5wMX5nK7A/cCV9U47oF9b1SYNzd/Fh/J298MmAb8U97/RFLF6OuFde4CrgO2J51h3g4cmZedTKpwjMvT3wMCxgI3AafmWDcF3jBIDHOLx5HjvjzvbwrwO+D9hfWvqjjWc4CfAFvl4/4dMK9if+/LcZ0E/Ak4Ix/zPwOPA1vm8qcCi/O+twKWACcPdryd/hv31LsTsDXwECkXHQS8pGJ5te/GT/J6ryK19PwK2JnUOn0bcES1dQvrT8uvzwJOyq+3Bd4BbJ7/7n8I/Liw3hUD38H8vpm5ajrwBLB//k5+LX9nD8zLTyTl2JeSctR/A18YxjE+Xtj2N6p8ngNlB/vuHwCsrtjPNcB78+stgf06/ffULVPHAxgtE6mF5v4ay74MXJZfHw8sLSwbA6zJ/8SmAWuBA4FxFdu4HXhT4f12pIrSRrxQydi5sHw
r4Elgx/z+i8Ci/PpdwG8rtv8d4DjSP+dngb8tLPvSIEljYN+VFZw/DfF5vQ24ofD+LuDwwvuvAAvy6xNJyXZaxTZmAOuoXql8UQyVSSrHPbPw/kPAr2qUHQs8Q06sed7/Aa4olP99YdkeefsvK8x7CNiTVDl7klx5LRzLHwc7Xk+eRjIBryT9I15N+se+eODvs8Z34/WF98uBTxXef5V8glK5bmH9F1VwqsS0J/BI4f0VbFjBaWau+jxwfuH9Fvk7PVDB+QNwcGH5m4G7hnGMxW1vCTwHTC6WreO7fwAvruBcSeodmNDpv6Fum9ys3T4Pkpooq4312C4vH3DPwIuI+Csp4WwfEauAj5EqQWslnS9p+1x0R+C/crfFo6QKz3PAy2ps93Hgp8DsPGsOcG5hW/sObCtv7z3Ay0lnLhsVtwXcXd9HsIHi+kh6WT6eeyU9BnwfmFCxzv2F10+RkgTAvwKrgEtz18+xef5k4O6IWF9PDHWUuZvUglTNBFJrSvGzuJs0lmHAA4XXTwNEROW8LUmf8ebA8sLn//M8H2ofr1nDIuL2iJgbETuQWju2B74+yCqVf7vV/paHRdLmkr6Tu4keI/3z3kbS2BqrNDNXbc+GOfJJ0klHcXnl97tWPqimuO0ngIerrD/Ud7+aeaQuxv+RdL2kQ4YRU6m5gtM+15Cacd9enClpS1KT8K8KsycXlo8hdTvdBxARP4iIN5C+2EEatAzpy3NQRGxTmDaNiHsL2618dPx5wBxJM0jdN5cXtvWbim1tGREfJLWIrC/GSOq+qaXW4+or538pz9sjIrYmtXhpkO2+sKGIxyPimIjYGXgr8HGlsUn3AFNqVCoHi62o8jjvq7Hug6SzxR0ryt/L8D1I+gfxqsLnPz4itoRBj9esKSLif0itDrs3YXNPkv5pAyDp5YOUPYbUFb9vzgP7D6w2EFpF+WbmqjVsmHs3J3WZDbiPF3+/B/JBPcdY3PaWpC6o+yrKDPrdp0rOiojfR8QcUtfZKcCP8nidUc8VnDaJiH5SM+I3Jc2UNE7SVFL/8Grge4Xie0t6e/7H/DFSxWippN0kvTEPbPsz6Yvw17zOAuCLAwPXJE2UdOgQYV1C+sKeCFyQW4sgXU2xax5wNy5Pr5P0yoh4DriINCh3c0nTgSMG2ce6HOPOQ8SyFan/u1/SJNKVZnWRdIikaZIE9JNarv5KGrOzBviypC0kbSrp9fVuN/tkHvg4mTTO6YI8/wFgh4GBlPlzuZD0O9gq/x4+TmqJGpb8e/h34FRJL83HOEnSm4c4XrOG5IG6x+iFAb2TSa26S5uw+ZuAV0naU+nCgOMHKbsVKa89qnSRxHEVyx9gw1zSzFz1I+AQpcvlNyblxeL/yPOAz+XcOoHUpTXw/a7nGA8ubPsLpKEIG7QiD/Xdz8e/raTxA+tIOlzSxLzuo3m28wGu4LRVRHwF+Azwb8BjwLWkM5A3RUTxcsyfkPqWHwHeC7w9Ip4lDU77MqmWfz+pxv7pvM43SH3ml0p6nJSY9h0inr+QEsCBwA8K8x8nDXqdTTrDuJ90ZrBJLnIUqfn5ftJZ3n8Mso+nSON7rs5NrvvVKHoC8FrSP+yf5rjqtQvwS1IF6RrgWxFxeU5ws0h9238iVSTfNYztQvpdLAduzHGdmef/GlgJ3C9poHvxI6QzuTuBq0if6aJh7m/Ap0jdUEtzU/0vSWe2UON4G9yPGaQBsPsC10p6kpQ/biW1qIxIRPyOVFn4JfB70nejlq+TBvw/mGP4ecXybwCHKV1hdVqTc9VK4MOk7+0aUv4t3nPmJGAZcDNwC7Aiz6v3GH9AqrA9DOzNhlfOFtX87ueWtfOAO3M+3Z50ActKSU/kz2d2RDxd6zhHE0XU00pvNvpICmCXPPbJzMx6iFtwzMzMrHRcwTEzM7PScReVmZmZlY5bcMzMzKx0SvmAwQkTJsTUqVM7HUbbLV++nL333rvTYZg9b/ny5Q9GxGA3KesZzitm3aH
evFLKLqq+vr5YtmxZp8NoO0mU8fdpvUvS8ojo63QczeC8YtYd6s0r7qIyMzOz0ilVBUfSLEkL+/v7Ox2KmZmZdVCpKjgRsSQi5o8fP37owmZmZlZapargmJmZmYErOGZmZlZCpbxMvF4nnHDCBu+PO67ywbVmZiPnXGPWfm7BMTMzs9JxBcfMzMxKxxUcMysVSZtKuk7STZJWSjqhSplNJF0gaZWkayVNbX+kZtZKXT8GR9LOwGeB8RFxWCv35X5ys1L4C/DGiHhC0jjgKkk/i4ilhTLzgEciYpqk2cApwLs6EayZtUZHWnAkLZK0VtKtFfNnSrojn1UdCxARd0bEvE7EaWa9J5In8ttxeap81sChwNn59Y+AN0lSm0I0szboVBfVWcDM4gxJY4EzgIOA6cAcSdPbH5qZ9TpJYyXdCKwFLouIayuKTALuAYiI9UA/sG17ozSzVupIBScirgQerpi9D7Aqt9g8A5xPOssyMxuWiHguIvYEdgD2kbR7I9uRNF/SMknL1q1b19wgzaylummQ8fNnVNlqYJKkbSUtAPaS9OlaKzsRmVmliHgUuJyKFmPgXmAygKSNgPHAQ1XWXxgRfRHRN3HixFaHa2ZN1PWDjCPiIeDIOsotlLQGmLXxxhvv3frIzKwbSZoIPBsRj0raDPgn0iDiosXAEcA1wGHAryOicpyOmfWwbmrBef6MKtshz6ubH7ZpZsB2wOWSbgauJ43BuVjSiZLemsucCWwraRXwceDYDsVqZi3STS041wO7SNqJVLGZDbx7OBuQNAuYNW3atBaEZ2a9ICJuBvaqMv/zhdd/Bv6lnXGZWXt16jLx80hNw7tJWi1pXr6S4SjgF8DtwIURsXI423ULjpmZmUGHWnAiYk6N+ZcAlzS63Wa34FTe+A988z8zM7Ne0E1jcEbMLThmZmYG3TUGx8xsVKjWOlzklmKzkStVC46kWZIW9vf3dzoUMzMz66BSteBExBJgSV9f3wc6HYuZWaP84F+zkStVC46ZmZkZlKyC4y4qMzMzA3dRDZubjs3MzLpfqVpwzMzMzMAVHDMzMyuhUlVwPAbHzMzMwGNwzMy6nh8bYzZ8pWrBMTMzMwNXcMysZCRNlnS5pNskrZR0dJUyB0jql3Rjnj7fiVjNrHVK1UXVCb5s3KzrrAeOiYgVkrYClku6LCJuqyj324g4pAPxmVkblKoFx4OMzSwi1kTEivz6ceB2YFJnozKzditVC44HGZtZkaSpwF7AtVUWz5B0E3Af8ImIWFll/fnAfIApU6a0LlCzzL0CzVOqFhwzswGStgT+E/hYRDxWsXgFsGNEvAb4JvDjatuIiIUR0RcRfRMnTmxtwGbWVK7gmFnpSBpHqtycGxEXVS6PiMci4on8+hJgnKQJbQ7TzFrIFRwzKxVJAs4Ebo+Ir9Uo8/JcDkn7kHLhQ+2L0sxarVRjcMzMgNcD7wVukXRjnvcZYApARCwADgM+KGk98DQwOyKiE8Ha6OHxNe1VqgqOpFnArGnTpnU6FDPrkIi4CtAQZU4HTm9PRGbWCaWq4HTDVVSuoZuZWav4f0z9hhyDI2nbdgRiZlbk3GNmI1HPIOOlkn4o6eCBQXlmZm3g3GNmDaungrMrsJA0aO/3kr4kadfWhmVm5txjZo0bsoITyWURMQf4AHAEcJ2k30ia0fIIzWxUcu4xs5EYcpBx7gc/nHQW9QDwEWAxsCfwQ2CnVgZoZqOTc4+ZjUQ9V1FdA3wPeFtErC7MXyZpQWvCMjNz7hmMr6YxG1w9FZzPRcSFxRmS/iUifhgRp7QoLjMz5x7raZWVUGuveio4xwIXVsz7NKmJuOUkbQF8C3gGuCIizm3HfpvFZ1lmDeto7jGz3lazgiPpIOBgYJKk0wqLtgbWj2SnkhYBhwBrI2L3wvyZwDeAscB3I+LLwNuBH0XEEkkXAD1VwTGz4Wll7jGz0WOwq6juA5YBfwaWF6bFwJtHuN+zgJnFGZLGAmcABwHTgTm
SpgM7APfkYs+NcL9m1v1amXvMbJSo2YITETcBN0k6NyKaetYUEVdKmloxex9gVUTcCSDpfOBQYDWpknMjg1TIJM0H5gNMmTKlmeGaWRu1MveY2egxWBfVhRHxTuAGScWn7Ip0i4pXNzmWSbzQUgOpYrMvcBpwuqS3AEtqrRwRC0k3BaOvr89PBTbrUR3IPWZWQoMNMj46/zykHYHUEhFPAu+rp6yfJm5WCiPKPZImA+cALwMCWBgR36goI9J4v4OBp4C5EbGi4YjNrOvU7PKJiDX55YPAPRFxN7AJ8BpSH3mz3QtMLrzfIc+rW0QsiYj548ePb2pgZtY+Tcg964FjImI6sB/w4Tyer+ggYJc8zQe+3YzYzax71PMsqiuBTSVNAi4l3VX0rBbEcj2wi6SdJG0MzCYNKqybpFmSFvb397cgPDNrs4ZyT0SsGWiNiYjHgdtJXeBFhwLn5MdBLAW2kbRdM4M3s86q5z44ioinJM0DvhURX5F040h2Kuk84ABggqTVwHERcaako4BfkC4TXxQRK4ez3YhYAizp6+v7wEjiM7OuMOLcky9m2Au4tmJRtTF/k4A1xUK+eMHKYLTej62uCk5+sN17gHl53tiR7DQ/PK/a/EuASxrdbi+Mwal2Z8vR8sdmNkwjyj2StgT+E/hYRDzWSAC+eMGsd9XTRXU06e6h/xURKyXtDFze2rAa4zE4ZqXScO6RNI5UuTk3Ii6qUmTEY/7MrLsN2YITEVeS+sIH3t8JfLSVQZmZNZp78hVSZwK3R8TXahRbDByV77e1L9BfGNxsZiUwZAVH0q7AJ4CpxfIR8cbWhdWYXuiiMrP6jCD3vJ40IPmWwpidzwBT8voLSF3hBwOrSJeJ13UrCjPrHfWMwfkhsAD4Ll3+qAQPMjYrlYZyT0RcRbop4GBlAvjwiKIzs65WTwVnfUT4HhFm1m7OPdbVRuvVSb2inkHGSyR9SNJ2kv5mYGp5ZA3wfXDMSqVnco+ZdZ96WnCOyD8/WZgXwM7ND2dk3EVlVio9k3u6gVsTzDZUz1VUO7UjEDOzIuceMxuJIbuoJG0u6XOSFub3u0jq6AM4a3EXlVl59FLuMbPuU08X1X8Ay4G/y+/vJV3dcHGrgmpUr3ZRuWnZrKqeyT1m1n3qGWT8ioj4CvAsQEQ8xRCXYJqZNYFzj5k1rJ4KzjOSNiMN7kPSK4C/tDQqMzPnHjMbgXq6qI4Dfg5MlnQu6S6hc1sZlJkZzj1mNgL1XEV1maQVwH6k5uGjI+LBlkfWAD+qwaw8ein3mFn3qVnBkfTailkDD6KbImlKRKxoXViN6dVBxmb2gl7MPWbWfQZrwflq/rkp0AfcRDqLejWwDJjR2tDMbJRy7jGzEatZwYmIfwSQdBHw2oi4Jb/fHTi+LdGNUr5s3EYz5x4za4Z6rqLabSDBAETErcArWxeSmRng3GNmI1BPBedmSd+VdECe/h24udWBmdmo11DukbRI0lpJt9ZYfoCkfkk35unzTY/czDqunsvE3wd8EDg6v78S+HbLIjIzSxrNPWcBpwPnDFLmtxHhxz6YlVg9l4n/GTg1T13Nl4mblUejuScirpQ0tRUxmVnvqKcFp2eU9TJxDzo2a7oZkm4C7gM+ERErqxWSNB+YDzBlypQ2hmfdpjIPl1m1Y+3F/zv1jMExMyuTFcCOEfEa4JvAj2sVjIiFEdEXEX0TJ05sW4BmNnKu4JjZqBIRj0XEE/n1JcA4SRM6HJaZNVlDFZzcbGtm1lbNyD2SXi5J+fU+pDz40Ei3a2bdpdExOGpqFGZm9Rky90g6DzgAmCBpNemhneMAImIBcBjwQUnrgaeB2RERLYvYzDqioQpORHyn2YGYmQ2lntwTEXOGWH466TJyMyuxISs4krYl3R799UAAVwEnRoSbdM2sZXo194ymq23Mulk9LTjnk26w9Y78/j3ABcCBrQqqSNLOwGeB8RFxWDv22e3Kcgmf2RA6mnt6nW8vYaNdPYOMt4u
[remainder of base64-encoded PNG omitted: output of scrub.plot_histogram()]\n", 100 | "text/plain": [ 101 | "
" 102 | ] 103 | }, 104 | "metadata": { 105 | "needs_background": "light" 106 | }, 107 | "output_type": "display_data" 108 | } 109 | ], 110 | "source": [ 111 | "import matplotlib.pyplot as plt\n", 112 | "scrub.call_doublets(threshold=0.25)\n", 113 | "scrub.plot_histogram()" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 13, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "name": "stdout", 123 | "output_type": "stream", 124 | "text": [ 125 | "0.021908903501512256\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "print(scrub.detected_doublet_rate_)\n", 131 | "out_df['doublet_scores'] = doublet_scores\n", 132 | "out_df['predicted_doublets'] = predicted_doublets\n", 133 | "out_df.to_csv('healthy_controls_doublets.txt', index=False,header=True)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 14, 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "name": "stdout", 143 | "output_type": "stream", 144 | "text": [ 145 | "Counts matrix shape: 18752 rows, 33538 columns\n", 146 | "Number of genes in gene list: 33538\n", 147 | "Preprocessing...\n", 148 | "Simulating doublets...\n", 149 | "Embedding transcriptomes using PCA...\n", 150 | "Calculating doublet scores...\n", 151 | "Automatically set threshold at doublet score = 0.58\n", 152 | "Detected doublet rate = 0.4%\n", 153 | "Estimated detectable doublet fraction = 18.6%\n", 154 | "Overall doublet rate:\n", 155 | "\tExpected = 6.0%\n", 156 | "\tEstimated = 2.3%\n", 157 | "Elapsed time: 27.7 seconds\n", 158 | "Detected doublet rate = 3.7%\n", 159 | "Estimated detectable doublet fraction = 44.1%\n", 160 | "Overall doublet rate:\n", 161 | "\tExpected = 6.0%\n", 162 | "\tEstimated = 8.4%\n", 163 | "0.037116040955631396\n" 164 | ] 165 | }, 166 | { 167 | "data": { 168 | "image/png": 
"[base64-encoded PNG omitted: output of scrub.plot_histogram()]\n", 169 | "text/plain": [ 170 | "
Ul0++2qphmn9j5ZMDL+jSOLcjnUi2eTd/nw3TX0E1JweW+T5nAWfk95NIzaLrd2msPwcOzO9fT+o/VwWxTqP5yYHvZuWTA6/vdHwV/46cR1oYZ8P0ziOjj7Mrckhef0fySMc3rKIvc09SVf0n4Mg87oukoxdIlewPgDuB64HNujTOXwIPATfn10XdGGfDtJUkppLfp0jN4L8DbgP26+K/0RnA1Tlx3Qy8s4IYzwUeAJ4jHbUeBHwM+Fjh+zwlb8NtVf3eK/wdOY+0MM6GaZ1HRh9n5Tkkx9GxPOJHNZiZmVntjIVzcMzMzGyMcYFjZmZmteMCx8zMzGrHBY6ZmZnVjgscMzMzqx0XONaUpBck3ZyfkHuLpMPyHUZHurwnmow/XdK+Q8x7oKQNR7puM+s85xCrkh/VYIN5OiK2BZD0GuAc0sPajqoglgOBxXTgAXGSxkXEC+1ej9kY4BxilXELjpUS6Qm5c4BDlIyX9D1Jt0m6SdLb4KWjpJP755P0U0m7FoZPyEdzv5I0uXE9kraXdKWkRZIukbRBPjLrA87OR4NrNszzKUm/k3SrpPPyuHUK8d0q6f15/Ow8brGk4wvLeELS1yTdAswcKI5Wfp9mY41ziHNIp7nAsdIi4i5gHPAa0m3pIyK2IT3E7QxJ44dYxNrAwojYCriShqM4SasB3wT2jYjtgdOAL0XEhcBCYP+I2DYinm5Y7uHAdhHxBtIdMQH+D7A8IrbJ43+dm6ePB3YDtgXeLOm9hdiui4g3AtcNFEfJr8nMmnAOsU5yF5WN1FtIOzAR8XtJ9wBbDDHPi8D5+f1ZwA8bPt+S9IC9yyRBSoQPlIjlVtKR2Y+BH+dxu5OeB0SO8W+SdgGuiIiHASSdDeyS53kB+K9RxmFm5TmHWFu5wLHSJG1G2omXDTLZ86zcMjjYEVnjc0IELImImcMM7d2kJLM3cKSkbYY5P8DfC33mI43DzAbhHGKd5C4qKyX3dc8FTo70ALPfAPvnz7YApgJ3AHcD20paRdIUYIfCYlYB+q90+BfgqobV3AFMljQzL3c1SVvlz1YAEwaIaxVgSkRcDnwemAisA1xGagLvn+5VpAcg/rOkSZLGkZrFrxxgcweLw8wHgBcjAAAAyElEQVRGwDnEOaTT3IJjg1lT0s3AaqSjqu+TnpoL8C3g25Juy58dGBHPSLoa+DPpybq3AzcWlvcksIOkL5CO4D5YXFlEPJtPBjxJ0kTS3+eJwBLgdGCupKeBmYU+9HHAWXl6ASdFxGOSjgNOkbSYdMR4TET8UNLhwOV52p9FxE8aN3qIOMysPOcQ55DK+GniZmZmVjvuojIzM7PacYFjZmZmteMCx8zMzGrHBY6ZmZnVjgscMzMzqx0XOGZmZlY7LnDMzMysdv4/io6KjRAIDUwAAAAASUVORK5CYII=\n", 169 | "text/plain": [ 170 | "
" 171 | ] 172 | }, 173 | "metadata": { 174 | "needs_background": "light" 175 | }, 176 | "output_type": "display_data" 177 | } 178 | ], 179 | "source": [ 180 | "input_dir = 'cellranger_output_for_COVID19_patients/'\n", 181 | "counts_matrix = scipy.io.mmread(input_dir + '/matrix.mtx').T.tocsc()\n", 182 | "genes = numpy.array(scrublet.load_genes(input_dir + '/features.tsv', delimiter='\\t', column=1))\n", 183 | "out_df = pandas.read_csv(input_dir + '/barcodes.tsv', header = None, index_col=None, names=['barcode'])\n", 184 | "print('Counts matrix shape: {} rows, {} columns'.format(counts_matrix.shape[0], counts_matrix.shape[1]))\n", 185 | "print('Number of genes in gene list: {}'.format(len(genes)))\n", 186 | "scrub = scrublet.Scrublet(counts_matrix, expected_doublet_rate=0.06)\n", 187 | "doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, min_cells=3, min_gene_variability_pctl=85, \n", 188 | " n_prin_comps=30)\n", 189 | "scrub.call_doublets(threshold=0.25)\n", 190 | "scrub.plot_histogram()\n", 191 | "print(scrub.detected_doublet_rate_)\n", 192 | "out_df['doublet_scores'] = doublet_scores\n", 193 | "out_df['predicted_doublets'] = predicted_doublets\n", 194 | "out_df.to_csv('COVID19_patients_doublets.txt', index=False,header=True)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": null, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [] 203 | } 204 | ], 205 | "metadata": { 206 | "kernelspec": { 207 | "display_name": "Python 3", 208 | "language": "python", 209 | "name": "python3" 210 | }, 211 | "language_info": { 212 | "codemirror_mode": { 213 | "name": "ipython", 214 | "version": 3 215 | }, 216 | "file_extension": ".py", 217 | "mimetype": "text/x-python", 218 | "name": "python", 219 | "nbconvert_exporter": "python", 220 | "pygments_lexer": "ipython3", 221 | "version": "3.6.8" 222 | } 223 | }, 224 | "nbformat": 4, 225 | "nbformat_minor": 2 226 | } 227 | 
--------------------------------------------------------------------------------
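Note on the notebook above: it calls `scrub.call_doublets(threshold=0.25)` after `scrub.scrub_doublets(...)`, but the `predicted_doublets` array written to the CSV is the one returned earlier by `scrub_doublets`. As far as I can tell, that array still reflects the automatically chosen threshold (0.58 in the patient run, 0.4% detected) rather than the manual one (3.7% detected). The sketch below is not the authors' code and does not call Scrublet. It uses a hypothetical helper, `call_doublets_at_threshold`, and made-up toy scores to illustrate how the same scores give different calls at the two thresholds, assuming a cell is flagged when its score is strictly greater than the threshold.

```python
def call_doublets_at_threshold(doublet_scores, threshold):
    """Flag cells whose doublet score is strictly above `threshold`.

    Hypothetical helper for illustration only; returns the list of
    boolean calls and the fraction of cells flagged.
    """
    predicted = [score > threshold for score in doublet_scores]
    detected_rate = sum(predicted) / len(predicted) if predicted else 0.0
    return predicted, detected_rate


# Toy scores standing in for the notebook's `doublet_scores` (invented values).
toy_scores = [0.05, 0.10, 0.30, 0.60, 0.20]

# Calls at the manual threshold used in the notebook (0.25) ...
flags_manual, rate_manual = call_doublets_at_threshold(toy_scores, 0.25)
# ... versus the automatic threshold reported in the patient run (0.58).
flags_auto, rate_auto = call_doublets_at_threshold(toy_scores, 0.58)

print(sum(flags_manual), rate_manual)
print(sum(flags_auto), rate_auto)
```

If the manual threshold is the intended one, saving the calls produced at that threshold (for example the value held in `scrub.predicted_doublets_` after `call_doublets`) would keep the CSV consistent with the printed doublet rate; whether downstream steps rely on the scores or on the boolean calls is not visible from this notebook.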