├── .gitignore
├── DESCRIPTION
├── GSE123813
    ├── GSE123813_Vignette.pdf
    └── Raw_Data
    │   └── bcc_annotation.tsv
├── GSE123813_Vignette_RISC_v1.0.pdf
├── GSE123813_Vignette_RISC_v1.6.pdf
├── LICENSE
├── NAMESPACE
├── R
    ├── AllClasses.R
    ├── AllGenerics.R
    ├── Cluster.R
    ├── ClusterMarker.R
    ├── Graph.R
    ├── Integrating.R
    ├── Preprocess.R
    ├── RcppExports.R
    ├── Reduce_Dimension.R
    └── Utilities.R
├── README.md
├── RISC_1.0.tar.gz
├── RISC_1.6.0.tar.gz
├── RISC_1.7.tar.gz
├── RISC_Supplementary
    ├── GSE110823
    │   ├── GSE110823.R
    │   ├── GSE110823_ann.tsv
    │   └── var.tsv
    ├── GSE111113
    │   ├── GSE111113.R
    │   └── var.tsv
    ├── GSE114727
    │   ├── GSE114727.R
    │   ├── GSE114727_Anno_All.tsv
    │   ├── var_all.tsv
    │   └── var_pos.tsv
    ├── GSE123813
    │   ├── GSE123813_Vignette.Rmd
    │   └── bcc_annotation.tsv
    ├── GSE125688
    │   ├── GSE125688.R
    │   └── GSE125688_Ann.tsv
    ├── GSE131181
    │   ├── GSE131181.R
    │   ├── GSE131181_Ann.tsv
    │   └── GSE131181_var.tsv
    ├── GSE132044
    │   ├── GSE132044.R
    │   ├── GSE132044_Anno.tsv
    │   └── var.tsv
    ├── GSE84133
    │   ├── GSE84133.R
    │   ├── GSE84133_Anno.tsv
    │   ├── GSE84133_Filter.tsv
    │   └── var.tsv
    ├── GSE85241_GSE81076_GSE83139_EMTAB_5061
    │   ├── GSE85241_GSE81076_GSE83139_EMTAB_5061.R
    │   ├── Pancreas_Annotation.tsv
    │   └── var.tsv
    └── GSE96583
    │   ├── Anno0.tsv
    │   ├── GSE96583.R
    │   ├── Var0.tsv
    │   └── Var0_Ori.tsv
├── Release.txt
├── Seurat_to_RISC_RISC_v1.0.pdf
├── build
    └── vignette.rds
├── data
    ├── datalist
    └── raw.mat.rda
├── inst
    └── doc
    │   ├── RISC_Vignette.R
    │   ├── RISC_Vignette.Rmd
    │   └── RISC_Vignette.html
├── man
    ├── AddFactor.Rd
    ├── All-Cluster-Marker.Rd
    ├── Cluster-Marker.Rd
    ├── Cluster.Rd
    ├── DimPlot.Rd
    ├── Disperse.Rd
    ├── Filter.Rd
    ├── FilterPlot.Rd
    ├── Heatmap.Rd
    ├── Import-10X-h5.Rd
    ├── Import-10X-mtx.Rd
    ├── Import-Matrix.Rd
    ├── InPlot.Rd
    ├── Integration-Algorithm-SIMPLS.Rd
    ├── MSC.Rd
    ├── Multiple-Integrating.Rd
    ├── Normalize.Rd
    ├── PCA.Rd
    ├── PCPlot.Rd
    ├── PLS-Integrating.Rd
    ├── Pairwise-DEGs.Rd
    ├── Scale.Rd
    ├── SingleCellData.Rd
    ├── Subset.Rd
    ├── UMAP.Rd
    ├── UMAPlot.Rd
    ├── Violin-Plot.Rd
    ├── raw.mat.Rd
    ├── setClass.Rd
    ├── setMethod.Rd
    └── tSNE.Rd
├── src
    ├── Makevars
    ├── Makevars.win
    ├── RcppArmadilloProcess.cpp
    └── RcppExports.cpp
├── tests
    ├── testthat.R
    └── testthat
    │   └── test-workflow.R
└── vignettes
    └── RISC_Vignette.Rmd


/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
.Rhistory

--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
Package: RISC
Type: Package
Title: Robust Integration of Single-Cell RNA-Seq Datasets
Version: 1.7
Date: 2022-1-10
Update: 2024-3-15
Authors@R: c(person("Yang", "Liu", role = c("aut", "cre"),
    email = "yanliurisc@gmail.com"),
    person("Deyou", "Zheng", role = "aut"),
    person("Tao", "Wang", role = "aut"))
Maintainer: Yang Liu
Description: The 'RISC' package can integrate single cell RNA sequencing data, correct batch effects, cluster cells, identify gene markers, and produce an integrated gene expression matrix. More details are available at the URLs below.
URL: https://www.biorxiv.org/content/10.1101/483297v1.article-info
    https://github.com/yangRISC/RISC
Depends: R (>= 4.0.0)
Imports: Rcpp, Matrix, sparseMatrixStats, MASS, pbapply, hdf5r,
    doParallel, foreach, irlba, Rtsne, umap, densityClust, FNN,
    igraph, RColorBrewer, ggplot2, gridExtra, pheatmap, methods,
    grDevices, stats, utils
LinkingTo: Rcpp, RcppArmadillo
LazyData: true
Encoding: UTF-8
RoxygenNote: 7.3.1
Suggests: testthat, usethis, knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: yes
License: GPL-3 | file LICENSE
Packaged: 2024-03-18 13:34:09 UTC; liuy128
Author: Yang Liu [aut, cre],
    Deyou Zheng [aut],
    Tao Wang [aut]

--------------------------------------------------------------------------------
/GSE123813/GSE123813_Vignette.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/GSE123813/GSE123813_Vignette.pdf
--------------------------------------------------------------------------------
/GSE123813_Vignette_RISC_v1.0.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/GSE123813_Vignette_RISC_v1.0.pdf
--------------------------------------------------------------------------------
/GSE123813_Vignette_RISC_v1.6.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/GSE123813_Vignette_RISC_v1.6.pdf
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/LICENSE
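All three importers in R/AllClasses.R (readsc, read10X_mtx and read10X_h5) attach the same per-cell QC columns to coldata: UMI (total counts per cell), nGene (genes with a non-zero count) and mito (fraction of counts from genes whose symbol starts with "mt-", matched case-insensitively). The sketch below reproduces that computation in base R on a small dense matrix. `cell_qc` is a hypothetical helper written for illustration only, not a RISC function, and it assumes a genes x cells matrix with gene symbols as row names.

```r
# Hypothetical helper (not part of RISC): per-cell QC metrics for a
# genes x cells count matrix, mirroring the UMI / nGene / mito columns
# that the RISC importers add to coldata.
cell_qc <- function(counts, mito_pattern = "^mt-") {
  stopifnot(is.matrix(counts), !is.null(rownames(counts)))
  # mitochondrial genes: symbol prefix "mt-", case-insensitive
  is_mito <- grepl(mito_pattern, rownames(counts), ignore.case = TRUE)
  umi <- colSums(counts)          # total counts per cell
  n_gene <- colSums(counts > 0)   # number of detected genes per cell
  # drop = FALSE keeps a matrix even when exactly one gene matches
  mito_umi <- colSums(counts[is_mito, , drop = FALSE])
  data.frame(UMI = umi, nGene = n_gene, mito = mito_umi / umi,
             row.names = colnames(counts))
}

# Toy data: 3 genes (one mitochondrial) x 3 cells
counts <- matrix(c(5, 0, 3,
                   1, 2, 0,
                   4, 4, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Actb", "mt-Co1", "Gapdh"),
                                 c("cell1", "cell2", "cell3")))
qc <- cell_qc(counts)
print(qc)
```

For this toy matrix the cells have 10, 6 and 4 UMIs, and mito is 0.1, 1/3 and 0. The importers do the same arithmetic on sparse `CsparseMatrix` objects via `Matrix::colSums`.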
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
# Generated by roxygen2: do not edit by hand

export(AllMarker)
export(DimPlot)
export(FilterPlot)
export(Heat)
export(InPlot)
export(SubSet)
export(ViolinPlot)
export(read10X_h5)
export(read10X_mtx)
export(readsc)
export(scCluster)
export(scDEG)
export(scDisperse)
export(scFilter)
export(scMarker)
export(scMultiIntegrate)
export(scNormalize)
export(scPCA)
export(scPLS)
export(scScale)
export(scTSNE)
export(scUMAP)
exportClasses(RISCdata)
import(RColorBrewer)
import(ggplot2)
importFrom(FNN,get.knn)
importFrom(MASS,glm.nb)
importFrom(Matrix,colMeans)
importFrom(Matrix,colSums)
importFrom(Matrix,mean)
importFrom(Matrix,readMM)
importFrom(Matrix,rowMeans)
importFrom(Matrix,rowSums)
importFrom(Matrix,spMatrix)
importFrom(Matrix,sparseMatrix)
importFrom(Rcpp,evalCpp)
importFrom(Rtsne,Rtsne)
importFrom(densityClust,densityClust)
importFrom(densityClust,findClusters)
importFrom(doParallel,registerDoParallel)
importFrom(foreach,"%dopar%")
importFrom(foreach,foreach)
importFrom(grDevices,col2rgb)
importFrom(grDevices,colorRampPalette)
importFrom(gridExtra,grid.arrange)
importFrom(hdf5r,H5File)
importFrom(igraph,"E<-")
importFrom(igraph,E)
importFrom(igraph,cluster_louvain)
importFrom(igraph,graph_from_data_frame)
importFrom(igraph,simplify)
importFrom(irlba,irlba)
importFrom(methods,as)
importFrom(methods,new)
importFrom(pbapply,pblapply)
importFrom(pbapply,pboptions)
importFrom(pheatmap,pheatmap)
importFrom(stats,"contrasts<-")
importFrom(stats,aggregate)
importFrom(stats,as.formula)
importFrom(stats,coef)
importFrom(stats,contr.sum)
importFrom(stats,cutree)
importFrom(stats,dist)
importFrom(stats,embed)
importFrom(stats,gaussian)
importFrom(stats,glm)
importFrom(stats,ks.test)
importFrom(stats,loess)
importFrom(stats,loess.smooth)
importFrom(stats,median)
importFrom(stats,model.matrix)
importFrom(stats,p.adjust)
importFrom(stats,pchisq)
importFrom(stats,poisson)
importFrom(stats,qnorm)
importFrom(stats,quantile)
importFrom(stats,quasipoisson)
importFrom(stats,reshape)
importFrom(stats,sd)
importFrom(stats,smooth)
importFrom(stats,var)
importFrom(stats,wilcox.test)
importFrom(umap,umap)
importFrom(utils,head)
importFrom(utils,read.table)
importFrom(utils,setTxtProgressBar)
importFrom(utils,txtProgressBar)
useDynLib(RISC)

--------------------------------------------------------------------------------
/R/AllClasses.R:
--------------------------------------------------------------------------------
####################################################################################
#' Import single cell data
####################################################################################
#'
#' Single cell RNA-seq (scRNA-seq) data can be imported in three ways. 10X
#' Genomics output can be read directly with "read10X_mtx" (a folder containing
#' matrix.mtx, barcodes.tsv and features.tsv) or "read10X_h5" (an h5 file); the
#' user only needs to provide the folder or file path. Alternatively, a count
#' matrix together with cell and gene tables can be supplied manually using
#' "readsc".
#'
#' @useDynLib RISC
#' @importFrom Rcpp evalCpp
#' @importFrom methods as new
#' @importFrom utils head read.table
#' @importFrom Matrix readMM colSums rowSums
#' @rdname SingleCellData
#' @return A RISCdata object.
#' @param assay The list of gene counts.
#' @param coldata The data.frame with cell information.
#' @param rowdata The data.frame with gene information.
#' @name SingleCellData

SingleCellData <- function(assay, coldata, rowdata){
  new(Class = 'RISCdata', assay = assay, coldata = coldata, rowdata = rowdata)
}



####################################################################################
#' Example data
####################################################################################
#'
#' @docType data
#' @usage data(raw.mat)
#' @format A list including a simulated cell-gene matrix (columns for cells,
#' rows for genes), together with cell group and batch information.
"raw.mat"



####################################################################################
#' Import data from matrix, cell and genes directly.
####################################################################################
#'
#' Import a dataset directly from a count matrix and cell and gene tables. Three
#' inputs are required: a matrix of raw counts/UMIs (rows for genes, columns for
#' cells), a cell data.frame whose row.names equal the colnames of the matrix, and
#' a gene data.frame whose row.names equal the rownames of the matrix. If the
#' rownames of the matrix are Ensembl IDs, the user needs to convert them to gene
#' symbols manually.
#'
#' @rdname Import-Matrix
#' @param count Matrix with raw counts/UMIs.
#' @param cell Data.frame with cell barcodes, whose row.names equal the
#' colnames of the matrix.
#' @param gene Data.frame with gene symbols, whose row.names equal the
#' rownames of the matrix.
#' @param is.filter Whether to remove genes not expressed in any cell.
#' @return RISC single cell dataset, including count, coldata, and rowdata.
#' @name readsc
#' @export
#' @examples
#' mat0 = as.matrix(raw.mat[[1]])
#' coldata0 = as.data.frame(raw.mat[[2]])
#' coldata.obj = coldata0[coldata0$Batch0 == 'Batch3',]
#' matrix.obj = mat0[,rownames(coldata.obj)]
#' obj0 = readsc(count = matrix.obj, cell = coldata.obj,
#'               gene = data.frame(Symbol = rownames(matrix.obj),
#'                                 row.names = rownames(matrix.obj)), is.filter = FALSE)

readsc <- function(
  count,
  cell,
  gene,
  is.filter = TRUE
) {

  # missing() detects omitted arguments; exists() is always TRUE for a formal argument
  if(!missing(count) && !missing(cell) && !missing(gene)){

    # identical() also rejects NULL or differently sized dimnames
    if(identical(colnames(count), rownames(cell)) && identical(rownames(count), rownames(gene))){

      run.matrix = as(count, 'CsparseMatrix')
      # run.matrix = as.matrix(count)

      mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
      run.cell = data.frame(scBarcode = rownames(cell), UMI = Matrix::colSums(run.matrix), nGene = Matrix::colSums(run.matrix > 0), row.names = rownames(cell), stringsAsFactors = FALSE)
      # drop = FALSE keeps a matrix when exactly one mitochondrial gene is present
      run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
      run.cell = cbind.data.frame(run.cell, cell)

      run.gene = data.frame(Symbol = rownames(run.matrix), RNA = "Gene Expression", row.names = rownames(run.matrix), stringsAsFactors = FALSE)
      run.gene$nCell = Matrix::rowSums(run.matrix > 0)
      if(is.filter){
        run.gene = run.gene[run.gene$nCell > 0,]
      }
      run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]

      SingleCellData(assay = list(count = as(run.matrix, 'CsparseMatrix')), rowdata = data.frame(run.gene, stringsAsFactors = FALSE), coldata = data.frame(run.cell, stringsAsFactors = FALSE))

    } else {
      stop('The colnames and rownames of "count" must match the rownames of "cell" and "gene"')
    }

  } else {
    stop('Please input "count", "cell" and "gene"')
  }

}



####################################################################################
#' Import data from 10X Genomics output (tsv-mtx).
####################################################################################
#'
#' Import data directly from 10X Genomics output, usually the filtered
#' feature-barcode matrix, which contains three files: matrix.mtx, barcodes.tsv
#' and features.tsv. The user only needs to provide the directory as "data.path".
#' If the files are not original 10X Genomics output, make sure that barcodes.tsv
#' and features.tsv have no column names, that barcodes.tsv contains at least one
#' column of cell barcodes, and that features.tsv has two columns (gene Ensembl ID
#' and symbol) or three (plus feature type).
#'
#' @rdname Import-10X-mtx
#' @param data.path Directory containing the filtered 10X Genomics output,
#' including three files: matrix.mtx, barcodes.tsv (without colnames) and
#' features.tsv (without colnames).
#' @param sep The field separator of barcodes.tsv and features.tsv; the default
#' is a tab.
#' @param is.filter Whether to remove genes not expressed in any cell.
#' @return RISC single cell dataset, including count, coldata, and rowdata.
#' @name read10X_mtx
#' @export

read10X_mtx <- function(
  data.path,
  sep = '\t',
  is.filter = TRUE
) {

  # missing() detects an omitted argument; exists() is always TRUE for a formal argument
  if(missing(data.path)){
    stop('Please input data.path')
  } else {
    data.path = as.character(data.path)
  }

  sep0 = sep
  files = list.files(path = data.path, full.names = TRUE)
  file.matrix = grep('matrix.mtx', files, ignore.case = TRUE, value = TRUE)
  file.gene = grep(pattern = 'features.tsv', files, ignore.case = TRUE, value = TRUE)
  file.cell = grep('barcodes.tsv', files, ignore.case = TRUE, value = TRUE)

  if(length(file.matrix) == 1 && length(file.gene) == 1 && length(file.cell) == 1){

    run.matrix = readMM(file = file.matrix)
    run.matrix = as(run.matrix, 'CsparseMatrix')

    run.gene = read.table(file = file.gene, header = FALSE, sep = sep0, stringsAsFactors = FALSE)
    run.gene = data.frame(run.gene, stringsAsFactors = FALSE)
    if(ncol(run.gene) > 2){
      colnames(run.gene) = c('Ensembl', 'Symbol', 'RNA')
    } else {
      colnames(run.gene) = c('Ensembl', 'Symbol')
      run.gene$RNA = "Gene Expression"
    }
    run.gene$nCell = Matrix::rowSums(run.matrix > 0)
    run.gene$Symbol = make.unique(run.gene$Symbol)
    rownames(run.matrix) = rownames(run.gene) = run.gene$Symbol
    if(is.filter){
      run.gene = run.gene[run.gene$nCell > 0,]
    }
    run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]

    mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
    run.cell = read.table(file = file.cell, header = FALSE, sep = sep0, stringsAsFactors = FALSE)
    # run.cell = sapply(run.cell$V1, function(x){strsplit(x, '-', fixed = T)[[1]][[1]]})
    run.cell = data.frame(scBarcode = as.character(run.cell$V1), UMI = Matrix::colSums(run.matrix), nGene = Matrix::colSums(run.matrix > 0), stringsAsFactors = FALSE)
    # drop = FALSE keeps a matrix when exactly one mitochondrial gene is present
    run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
    colnames(run.matrix) = rownames(run.cell) = run.cell$scBarcode

  } else {
    stop('The directory is invalid, please input the dir including files: "barcodes.tsv", "features.tsv", "matrix.mtx"')
  }

  SingleCellData(assay = list(count = as(run.matrix, 'CsparseMatrix')), rowdata = data.frame(run.gene, stringsAsFactors = FALSE), coldata = data.frame(run.cell, stringsAsFactors = FALSE))

}



####################################################################################
#' Import data from 10X Genomics output (h5).
####################################################################################
#'
#' Import data directly from 10X Genomics output, usually the filtered
#' feature-barcode matrix in h5 format. The user only needs to provide the path of
#' the h5 file as "file.path". For data that are not original 10X Genomics output,
#' use the "readsc" function instead.
#'
#' @rdname Import-10X-h5
#' @param file.path The path of the filtered 10X Genomics output (h5 file).
#' @param is.filter Whether to remove genes not expressed in any cell.
#' @importFrom hdf5r H5File
#' @return RISC single cell dataset, including count, coldata, and rowdata.
#' @name read10X_h5
#' @export

read10X_h5 <- function(
  file.path,
  is.filter = TRUE
) {

  # missing() is required: exists("file.path") is always TRUE because base::file.path exists
  if(missing(file.path)){
    stop('Please input file.path')
  } else {
    file.path = as.character(file.path)
  }

  files = H5File$new(filename = file.path, mode = "r")
  key = files$names

  if(!is.null(files[[key]])){

    if(key == 'matrix'){

      # Cell Ranger v3+ layout: features stored under matrix/features
      run.cell = data.frame(scBarcode = files[['matrix/barcodes']][], row.names = files[['matrix/barcodes']][])
      run.gene = data.frame(Symbol = make.unique(files[['matrix/features/name']][]), Ensembl = files[['matrix/features/id']][], row.names = make.unique(files[['matrix/features/name']][]))
      run.matrix = Matrix::sparseMatrix(dims = files[['matrix/shape']][], x = files[['matrix/data']][], i = files[['matrix/indices']][], p = files[['matrix/indptr']][], index1 = FALSE)
      files$close_all()

      run.matrix = as(run.matrix, 'CsparseMatrix')
      run.gene$RNA = "Gene Expression"
      run.gene$nCell = Matrix::rowSums(run.matrix > 0)
      rownames(run.matrix) = rownames(run.gene)

      if(is.filter){
        run.gene = run.gene[run.gene$nCell > 0,]
      }
      run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]

      mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
      run.cell$UMI = Matrix::colSums(run.matrix)
      run.cell$nGene = Matrix::colSums(run.matrix > 0)
      run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
      colnames(run.matrix) = rownames(run.cell)

    } else {

      # Cell Ranger v2 layout: one top-level group per genome, named by "key"
      run.cell = data.frame(scBarcode = files[[paste0(key, '/barcodes')]][], row.names = files[[paste0(key, '/barcodes')]][])
      run.gene = data.frame(Symbol = make.unique(files[[paste0(key, '/gene_names')]][]), Ensembl = files[[paste0(key, '/genes')]][], row.names = make.unique(files[[paste0(key, '/gene_names')]][]))
      run.matrix = Matrix::sparseMatrix(dims = files[[paste0(key, '/shape')]][], x = files[[paste0(key, '/data')]][], i = files[[paste0(key, '/indices')]][], p = files[[paste0(key, '/indptr')]][], index1 = FALSE)
      files$close_all()

      run.matrix = as(run.matrix, 'CsparseMatrix')
      run.gene$RNA = "Gene Expression"
      run.gene$nCell = Matrix::rowSums(run.matrix > 0)
      rownames(run.matrix) = rownames(run.gene)

      if(is.filter){
        run.gene = run.gene[run.gene$nCell > 0,]
      }
      run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]

      mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
      run.cell$UMI = Matrix::colSums(run.matrix)
      run.cell$nGene = Matrix::colSums(run.matrix > 0)
      run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
      colnames(run.matrix) = rownames(run.cell)

    }

  } else {
    # close the file before stop(); code after stop() is never reached
    files$close_all()
    stop('The h5 file is invalid, please input correct file path.')
  }

  SingleCellData(assay = list(count = as(run.matrix, 'CsparseMatrix')), rowdata = data.frame(run.gene, stringsAsFactors = FALSE), coldata = data.frame(run.cell, stringsAsFactors = FALSE))

}


--------------------------------------------------------------------------------
/R/AllGenerics.R:
--------------------------------------------------------------------------------
####################################################################################
#' RISC data
####################################################################################
#'
#' The RISC object contains all the basic information used in single cell RNA-seq
#' analysis, including raw counts/UMIs, normalized gene values, dimension reduction,
#' cell clustering, and so on. The RISC object is an S4 object consisting of the
#' slots assay, coldata, rowdata, metadata, vargene, cluster, and DimReduction.
#'
#' @rdname setClass
#' @return RISC object: an S4 object
#' @slot assay The list of gene counts/UMIs: raw and normalized counts
#' @slot coldata The data.frame with cell information, such as cell types, stages,
#' and other factors.
#' @slot rowdata The data.frame with gene information, such as coding or non-coding
#' genes.
#' @slot metadata The list with meta values.
#' @slot vargene The highly variable genes.
#' These genes are utilized in dimension reduction.
#' @slot cluster The cell cluster assignment (a factor) produced by "scCluster".
#' @slot DimReduction The values of dimension reduction.
#' @docType class
#' @exportClass RISCdata

setClass(
  'RISCdata', slots = list(
    assay = 'list',
    coldata = 'data.frame',
    rowdata = 'data.frame',
    metadata = 'list',
    cluster = 'factor',
    DimReduction = 'list',
    vargene = 'vector'
  )
)

#' RISC data
#'
#' Print a summary of a RISC object: the assay names, the number of cells and
#' genes, and the column names of the cell (coldata) and gene (rowdata) tables.
#'
#' @rdname setMethod
#' @name RISCdata
#' @aliases RISC object
#' @param object RISC object: an S4 object
#' @docType methods

.RISC_show <- function(object){
  cat(
    "SingleCell-Dataset", '\n',
    # report the installed version instead of a hard-coded string
    paste0("RISC v", utils::packageVersion("RISC")), '\n',
    c('assay:', names(object@assay)), '\n',
    c(paste0('colData: ', '(', nrow(object@coldata), ')'), colnames(object@coldata)), '\n',
    c(paste0('rowData: ', '(', nrow(object@rowdata), ')'), colnames(object@rowdata)), '\n',
    'DimReduction', '\n',
    'Cell-Clustering', '\n'
  )
}

setMethod(
  f = 'show',
  signature = 'RISCdata',
  definition = .RISC_show
)




--------------------------------------------------------------------------------
/R/Cluster.R:
--------------------------------------------------------------------------------
####################################################################################
#' Clustering cells
####################################################################################
#'
#' RISC provides two methods to cluster cells, both widely used in single cell
#' analysis. The "louvain" method runs Louvain community detection on a
#' k-nearest-neighbor graph built from the cell eigenvectors (PCA or PLS), while
#' the "density" method applies density-peak clustering in a low-dimensional
#' space such as UMAP.
#'
#' @rdname Cluster
#' @param object RISC object: a framework dataset.
#' @param method The method for clustering cells: "density" or "louvain".
#' The "density" method is based on the slot "cell.umap" or another
#' low-dimensional space, while "louvain" is based on "cell.pca" (individual
#' data) or "cell.pls" (integrated data).
#' @param slot The DimReduction slot used for cell clustering. The default is
#' "cell.pca"; the user can add a new dimension reduction under DimReduction and
#' use it here.
19 | #' @param neighbor The neighbor cells for "igraph" method. 20 | #' @param algorithm The algorithm for knn, the default is "kd_tree", all options: 21 | #' "kd_tree", "cover_tree", "CR", "brute". 22 | #' @param npc The number of PCA or PLS used for cell clustering. 23 | #' @param k The number of cluster searched for, works in "density" method. 24 | #' @param res The resolution of cluster searched for, works in "louvain" method. 25 | #' @param dc The distance used to generate random center points which affect 26 | #' clusters. If have no idea about this, do not input anything. Keep it as the 27 | #' default value for most users. Work for "density" method. 28 | #' @param redo Whether re-cluster the cells. 29 | #' @param random.seed The random seed, the default is 123. 30 | #' @return RISC single cell dataset, the cluster slot. 31 | #' @name scCluster 32 | #' @importFrom densityClust densityClust findClusters 33 | #' @importFrom FNN get.knn 34 | #' @importFrom igraph simplify graph_from_data_frame cluster_louvain E E<- 35 | #' @references Blondel et al., JSTAT (2008) 36 | #' @references Rodriguez et al., Sicence (2014) 37 | #' @export 38 | #' @examples 39 | #' # RISC object 40 | #' obj0 = raw.mat[[3]] 41 | #' obj0 = scPCA(obj0, npc = 10) 42 | #' obj0 = scUMAP(obj0, npc = 3) 43 | #' obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density') 44 | #' DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2) 45 | 46 | scCluster <- function( 47 | object, 48 | slot = "cell.pca", 49 | neighbor = 10, 50 | algorithm = "kd_tree", 51 | method = 'louvain', 52 | npc = 20, 53 | k = 10, 54 | res = 0.5, 55 | dc = NULL, 56 | redo = TRUE, 57 | random.seed = 123 58 | ) { 59 | 60 | set.seed(random.seed) 61 | k = as.integer(k) 62 | res = as.numeric(res) 63 | neighbor = as.integer(neighbor) 64 | algorithm = as.character(algorithm) 65 | npc = as.integer(npc) 66 | random.seed = as.integer(random.seed) 67 | 68 | if(is.null(dc)){ 69 | dc = object@metadata$dcluster$dc 70 
| if(isTRUE(redo)){ 71 | dc = NULL 72 | } else { 73 | dc = dc 74 | } 75 | } else { 76 | dc = dc 77 | } 78 | 79 | if(!is.null(object@vargene) & !is.null(object@DimReduction)){ 80 | 81 | slot0 = as.character(slot) 82 | dimReduce0 = object@DimReduction[[slot0]] 83 | if(is.null(dimReduce0)){ 84 | stop("Do not include this dimention_reduction slot, try another one") 85 | } else if(ncol(dimReduce0) >= npc){ 86 | count = as.matrix(dimReduce0[,1:npc]) 87 | } else { 88 | count = as.matrix(dimReduce0) 89 | } 90 | 91 | if(method == 'density'){ 92 | 93 | dist0 = dist(count) 94 | if(is.null(dc)){ 95 | dataClust = densityClust(dist0, gaussian = TRUE) 96 | } else { 97 | dataClust = densityClust(dist0, dc = dc, gaussian = TRUE) 98 | } 99 | 100 | delta.rho = data.frame(rho = dataClust$rho, delta = dataClust$delta, stringsAsFactors = FALSE) 101 | delta.rho = delta.rho[order(delta.rho$delta, decreasing = TRUE),] 102 | delta.cut = delta.rho$delta[k + 1L] 103 | clust0 = findClusters(dataClust, 0, delta.cut) 104 | # object@metadata$dcluster = clust0 105 | object@cluster = object@coldata$Cluster = as.factor(clust0$clusters) 106 | names(object@cluster) = names(object@coldata$Cluster) = rownames(count) 107 | object@metadata[['clustering']] = data.frame( 108 | Method = 'densityClust', Distance = as.numeric(clust0$dc), 109 | rho = as.numeric(clust0$threshold[1]), delta = as.numeric(clust0$threshold[2]), 110 | stringsAsFactors = F 111 | ) 112 | 113 | } else if(method == 'louvain') { 114 | 115 | clust0 = get.knn(count, k = neighbor, algorithm = algorithm) 116 | clust1 = data.frame(NodStar = rep(1L:nrow(count), neighbor), NodEnd = as.vector(clust0$nn.index), stringsAsFactors = FALSE) 117 | clust1 = graph_from_data_frame(clust1, directed = FALSE) 118 | E(clust1)$weight = 1/(1 + as.vector(clust0$nn.dist)) 119 | clust1 = simplify(clust1) 120 | clust1 = cluster_louvain(clust1, resolution = res) 121 | object@cluster = object@coldata$Cluster = as.factor(clust1$membership) 122 | names(object@cluster) 
= names(object@coldata$Cluster) = rownames(count) 123 | object@metadata[['clustering']] = data.frame(Method = 'louvain', PCs = npc, Neighbors = neighbor, stringsAsFactors = F) 124 | 125 | } else {stop('A new method later')} 126 | 127 | } else {stop('Please calculate dispersion and dimention reduction first')} 128 | 129 | return(object) 130 | 131 | } 132 | 133 | 134 | -------------------------------------------------------------------------------- /R/RcppExports.R: -------------------------------------------------------------------------------- 1 | # Generated by using Rcpp::compileAttributes() -> do not edit by hand 2 | # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 3 | 4 | sqrt_sp <- function(X) { 5 | .Call('_RISC_sqrt_sp', PACKAGE = 'RISC', X) 6 | } 7 | 8 | cent_sp_d <- function(X) { 9 | .Call('_RISC_cent_sp_d', PACKAGE = 'RISC', X) 10 | } 11 | 12 | multiply_sp_sp <- function(X, Y) { 13 | .Call('_RISC_multiply_sp_sp', PACKAGE = 'RISC', X, Y) 14 | } 15 | 16 | multiply_sp_d <- function(X, Y) { 17 | .Call('_RISC_multiply_sp_d', PACKAGE = 'RISC', X, Y) 18 | } 19 | 20 | multiply_d_d <- function(X, Y) { 21 | .Call('_RISC_multiply_d_d', PACKAGE = 'RISC', X, Y) 22 | } 23 | 24 | multiply_sp_d_sp <- function(X, Y) { 25 | .Call('_RISC_multiply_sp_d_sp', PACKAGE = 'RISC', X, Y) 26 | } 27 | 28 | multiply_sp_d_v <- function(X, Y) { 29 | .Call('_RISC_multiply_sp_d_v', PACKAGE = 'RISC', X, Y) 30 | } 31 | 32 | crossprod_sp_sp <- function(X, Y) { 33 | .Call('_RISC_crossprod_sp_sp', PACKAGE = 'RISC', X, Y) 34 | } 35 | 36 | crossprod_sp_d <- function(X, Y) { 37 | .Call('_RISC_crossprod_sp_d', PACKAGE = 'RISC', X, Y) 38 | } 39 | 40 | crossprod_d_d <- function(X, Y) { 41 | .Call('_RISC_crossprod_d_d', PACKAGE = 'RISC', X, Y) 42 | } 43 | 44 | tcrossprod_sp_sp <- function(X, Y) { 45 | .Call('_RISC_tcrossprod_sp_sp', PACKAGE = 'RISC', X, Y) 46 | } 47 | 48 | tcrossprod_d_d <- function(X, Y) { 49 | .Call('_RISC_tcrossprod_d_d', PACKAGE = 'RISC', X, Y) 50 | } 51 | 52 | winsorize_ 
<- function(x, y) { 53 | .Call('_RISC_winsorize_', PACKAGE = 'RISC', x, y) 54 | } 55 | 56 | lm_coef <- function(X, y) { 57 | .Call('_RISC_lm_coef', PACKAGE = 'RISC', X, y) 58 | } 59 | 60 | lm_ <- function(X, y) { 61 | .Call('_RISC_lm_', PACKAGE = 'RISC', X, y) 62 | } 63 | 64 | -------------------------------------------------------------------------------- /R/Reduce_Dimension.R: -------------------------------------------------------------------------------- 1 | #################################################################################### 2 | #' Dimension Reduction. 3 | #################################################################################### 4 | #' 5 | #' Based on highly variably expressed genes of the datasets, RISC calculates the 6 | #' principal components (PCs) of the cells using prcomp functions. The major PCs, 7 | #' which explain most gene expression variance, are used for dimension reduciton. 8 | #' 9 | #' @rdname PCA 10 | #' @param object RISC object: a framework dataset. 11 | #' @param npc The number of PCs will be generated based on highly variable genes 12 | #' (usually < 1,500), npc equal to the first 20 PCs as the default. 13 | #' @return RISC single cell dataset, the DimReduction slot. 14 | #' @references Jolliffe et al. 
(2016) 15 | #' @references Alter et al., PNAS (2000) 16 | #' @references Gonzalez et al., JSS (2008) 17 | #' @references Mevik et al., JSS (2007) 18 | #' @name scPCA 19 | #' @export 20 | #' @examples 21 | #' # RISC object 22 | #' obj0 = raw.mat[[3]] 23 | #' obj0 = scPCA(obj0, npc = 10) 24 | 25 | scPCA <- function(object, npc = 20){ 26 | 27 | if(!length(object@vargene) > 0){ 28 | stop('Please disperse object first') 29 | } else { 30 | 31 | count = object@assay$logcount 32 | var = count[object@vargene,] 33 | var = scale(var, center = TRUE, scale = TRUE) 34 | varpc = irlba(var, nv = npc) 35 | cell.pca = as.matrix(varpc$v) 36 | gene.pca = as.matrix(varpc$u) 37 | var.pca = varpc$d^2 / sum(varpc$d^2) 38 | rownames(cell.pca) = colnames(var) 39 | rownames(gene.pca) = rownames(var) 40 | colnames(cell.pca) = colnames(gene.pca) = names(var.pca) = paste0('PC', 1L:npc) 41 | object@DimReduction[['cell.pca']] = cell.pca 42 | object@DimReduction[['var.pca']] = var.pca 43 | object@DimReduction[['gene.pca']] = gene.pca 44 | return(object) 45 | 46 | } 47 | } 48 | 49 | 50 | 51 | #################################################################################### 52 | #' Dimension Reduction. 53 | #################################################################################### 54 | #' 55 | #' UMAP is calculated from the eigenvectors of the single cell dataset, and the 56 | #' user can select the eigenvectors manually. Note that the selected eigenvectors 57 | #' directly affect the UMAP result. 58 | #' For integrated data (the result of the "scMultiIntegrate" function), RISC uses 59 | #' the PCR output "PLS" to calculate the UMAP; therefore, the user has to set "PLS" 60 | #' in "use = ", instead of the default "PCA". 61 | #' 62 | #' @rdname UMAP 63 | #' @param object RISC object: a framework dataset. 64 | #' @param npc The number of PCs (or PLS components) used for UMAP; the default is 20, 65 | #' but it should be tuned by the user.
Use PCA for an individual dataset and PLS 66 | #' for integrated data. 67 | #' @param embedding The number of UMAP output dimensions. 68 | #' @param use Which components are used for UMAP: PCA or PLS. 69 | #' @param neighbors The n_neighbors parameter of UMAP. 70 | #' @param dist The min_dist parameter of UMAP. 71 | #' @param seed The random seed to keep the UMAP result reproducible. 72 | #' @return RISC single cell dataset, the DimReduction slot. 73 | #' @importFrom umap umap 74 | #' @references Becht et al., Nature Biotech. (2018) 75 | #' @name scUMAP 76 | #' @export 77 | #' @examples 78 | #' # RISC object 79 | #' obj0 = raw.mat[[3]] 80 | #' obj0 = scPCA(obj0, npc = 10) 81 | #' obj0 = scUMAP(obj0, npc = 3) 82 | #' DimPlot(obj0, slot = "cell.umap", colFactor = 'Group', size = 2) 83 | 84 | scUMAP <- function( 85 | object, npc = 20, 86 | embedding = 2, 87 | use = 'PCA', 88 | neighbors = 15, 89 | dist = 0.1, 90 | seed = 123, 91 | ... 92 | ) { 93 | 94 | if(use == 'PCA'){ 95 | 96 | if(length(object@DimReduction$cell.pca) == 0){ 97 | stop('Please run scPCA first') 98 | } else { 99 | pca0 = FALSE 100 | pca_center0 = FALSE 101 | pca_scale0 = FALSE 102 | cell.pc = object@DimReduction$cell.pca[,1:npc] 103 | } 104 | 105 | } else if(use == 'PLS'){ 106 | 107 | if(length(object@DimReduction$cell.pls) == 0){ 108 | stop('Please integrate objects first') 109 | } else { 110 | pca0 = TRUE 111 | pca_center0 = TRUE 112 | pca_scale0 = TRUE 113 | cell.pc0 = object@DimReduction$cell.pls 114 | 115 | if(npc <= ncol(cell.pc0)){ 116 | cell.pc = cell.pc0[,1:npc] 117 | } else { 118 | scale.beta = object@metadata$Beta 119 | scale.beta0 = irlba(scale.beta, nv = npc) 120 | cell.pc = rbind(scale.beta0$u, scale.beta0$v) 121 | rownames(cell.pc) = c(rownames(scale.beta), colnames(scale.beta)) 122 | colnames(cell.pc) = paste0('PC', 1L:npc) 123 | # cell.pc = scale.beta1[order(rownames(scale.beta1), decreasing = F),] 124 | } 125 | 126 | } 127 | 128 | } else { 129 | stop('use must be PCA or PLS') 130 | }
 131 | 132 | set.seed(as.numeric(seed)) 133 | embedding = as.integer(embedding) 134 | neighbor0 = as.integer(neighbors) 135 | dist0 = as.numeric(dist) 136 | umap0 = umap(as.matrix(cell.pc), method = 'naive', n_components = embedding, min_dist = dist0, n_neighbors = neighbor0, ...) 137 | cell.umap = as.matrix(umap0$layout) 138 | rownames(cell.umap) = rownames(cell.pc) 139 | colnames(cell.umap) = paste0('UMAP', 1L:embedding) 140 | object@DimReduction[['cell.umap']] = cell.umap 141 | return(object) 142 | 143 | } 144 | 145 | 146 | 147 | #################################################################################### 148 | #' Dimension Reduction. 149 | #################################################################################### 150 | #' 151 | #' t-SNE is calculated from the eigenvectors of the single cell dataset, and 152 | #' the user can select the eigenvectors manually. Note that the selected eigenvectors 153 | #' directly affect the t-SNE result. 154 | #' For integrated data (the result of the "scMultiIntegrate" function), RISC uses 155 | #' the PCR output "PLS" to calculate the t-SNE; therefore, the user has to set 156 | #' "PLS" in "use = ", instead of the default "PCA". 157 | #' 158 | #' @rdname tSNE 159 | #' @param object RISC object: a framework dataset. 160 | #' @param npc The number of PCs (or PLS components) used for t-SNE; the default is 20, 161 | #' but it should be tuned by the user. Use PCA for an individual dataset and 162 | #' PLS for integrated data. 163 | #' @param embedding The number of t-SNE output dimensions. 164 | #' @param use Which components are used for t-SNE: PCA or PLS. 165 | #' @param perplexity Perplexity parameter: if the number of cells is small, 166 | #' decrease this parameter; Rtsne stops with an error when 3 * perplexity exceeds the number of cells minus one. 167 | #' @param seed The random seed to keep the t-SNE result reproducible. 168 | #' @return RISC single cell dataset, the DimReduction slot.
 169 | #' @importFrom Rtsne Rtsne 170 | #' @references Laurens van der Maaten, JMLR (2014) 171 | #' @name scTSNE 172 | #' @export 173 | #' @examples 174 | #' # RISC object 175 | #' obj0 = raw.mat[[3]] 176 | #' obj0 = scPCA(obj0, npc = 10) 177 | #' obj0 = scTSNE(obj0, npc = 4, perplexity = 10) 178 | #' DimPlot(obj0, slot = "cell.tsne", colFactor = 'Group', size = 2) 179 | 180 | scTSNE <- function( 181 | object, 182 | npc = 20, 183 | embedding = 2, 184 | use = 'PCA', 185 | perplexity = 30, 186 | seed = 123, 187 | ... 188 | ) { 189 | 190 | npc = as.integer(npc) 191 | perplexity = as.integer(perplexity) 192 | 193 | if(use == 'PCA'){ 194 | 195 | if(length(object@DimReduction$cell.pca) == 0){ 196 | stop('Please run scPCA first') 197 | } else { 198 | pca0 = FALSE 199 | pca_center0 = FALSE 200 | pca_scale0 = FALSE 201 | cell.pc = object@DimReduction$cell.pca[,1:npc] 202 | } 203 | 204 | } else if(use == 'PLS'){ 205 | 206 | if(length(object@DimReduction$cell.pls) == 0){ 207 | stop('Please integrate objects first') 208 | } else { 209 | pca0 = TRUE 210 | pca_center0 = TRUE 211 | pca_scale0 = TRUE 212 | cell.pc0 = object@DimReduction$cell.pls 213 | 214 | if(npc <= ncol(cell.pc0)){ 215 | cell.pc = cell.pc0[,1:npc] 216 | } else { 217 | scale.beta = object@metadata$Beta 218 | scale.beta0 = irlba(scale.beta, nv = npc) 219 | cell.pc = rbind(scale.beta0$u, scale.beta0$v) 220 | rownames(cell.pc) = c(rownames(scale.beta), colnames(scale.beta)) 221 | colnames(cell.pc) = paste0('PC', 1L:npc) 222 | # cell.pc = scale.beta1[order(rownames(scale.beta1), decreasing = F),] 223 | } 224 | 225 | } 226 | 227 | } else { 228 | stop('use must be PCA or PLS') 229 | } 230 | 231 | set.seed(as.numeric(seed)) 232 | embedding = as.integer(embedding) 233 | tsne0 = Rtsne(as.matrix(cell.pc), dims = embedding, pca = pca0, pca_center = pca_center0, pca_scale = pca_scale0, perplexity = perplexity, ...)
 234 | cell.tsne = as.matrix(tsne0$Y) 235 | rownames(cell.tsne) = rownames(cell.pc) 236 | colnames(cell.tsne) = paste0('tSNE', 1L:embedding) 237 | object@DimReduction[['cell.tsne']] = cell.tsne 238 | return(object) 239 | 240 | } 241 | 242 | 243 | -------------------------------------------------------------------------------- /R/Utilities.R: -------------------------------------------------------------------------------- 1 | #################################################################################### 2 | #' Utilities Subset data 3 | #################################################################################### 4 | #' 5 | #' The "SubSet" function extracts a data subset from the full dataset: it subsets 6 | #' not only the coldata and rowdata, but also the raw counts/UMIs. After "SubSet", 7 | #' the RISC object needs to be normalized and scaled again. 8 | #' 9 | #' @rdname Subset 10 | #' @param object RISC object: a framework dataset. 11 | #' @param cells The cells (coldata row.names) to keep in the data subset. 12 | #' @param genes The genes (rowdata row.names) to keep in the data subset.
14 | #' @name SubSet 15 | #' @export 16 | #' @examples 17 | #' # RISC object 18 | #' obj0 = raw.mat[[5]] 19 | #' obj0 20 | #' cell1 = rownames(obj0@coldata)[1:15] 21 | #' obj1 = SubSet(obj0, cells = cell1) 22 | #' obj1 23 | 24 | SubSet <- function(object, cells = NULL, genes = NULL){ 25 | 26 | coldata0 = object@coldata 27 | rowdata0 = object@rowdata 28 | raw.assay = object@assay 29 | DimReduction0 = object@DimReduction 30 | 31 | if(is.null(object)){ 32 | stop('Please input a RISC object') 33 | } else if(!is.null(cells) & is.null(genes)){ 34 | 35 | coldata0 = coldata0[rownames(coldata0) %in% cells,] 36 | 37 | if('Integration' %in% names(object@metadata)){ 38 | 39 | name0 = 'logcount' 40 | raw.assay = raw.assay$logcount 41 | raw.assay = lapply(raw.assay, FUN = function(y){y[, colnames(y) %in% rownames(coldata0), drop = F]}) 42 | raw.assay[sapply(raw.assay, function(x){dim(x)[2] == 0})] = NULL 43 | rowsum0 = lapply(raw.assay, FUN = function(y){Matrix::rowSums(y > 0)}) 44 | rowsum0 = do.call(cbind, rowsum0) 45 | keep = Matrix::rowSums(rowsum0) > 0 46 | gene0 = rownames(object@rowdata)[keep] 47 | raw.assay = lapply(raw.assay, FUN = function(y){y[gene0, , drop = FALSE]}) 48 | raw.assay[sapply(raw.assay, function(x){dim(x)[1] == 0})] = NULL 49 | 50 | } else { 51 | 52 | name0 = names(raw.assay) 53 | raw.assay = raw.assay[[name0]] 54 | raw.assay = raw.assay[, rownames(coldata0), drop = FALSE] 55 | keep = Matrix::rowSums(raw.assay > 0) > 0 56 | gene0 = rownames(object@rowdata)[keep] 57 | raw.assay = raw.assay[gene0,] 58 | 59 | if(name0 == 'count'){ 60 | coldata0$UMI = Matrix::colSums(raw.assay) 61 | coldata0$nGene = Matrix::colSums(raw.assay > 0) 62 | } else { 63 | coldata0 = coldata0 64 | } 65 | 66 | } 67 | 68 | rowdata0 = rowdata0[gene0,] 69 | 70 | if(length(DimReduction0) > 0){ 71 | 72 | for(key0 in names(DimReduction0)){ 73 | if(!key0 %in% c("var.pca", "gene.pca")){ 74 | DimReduction0[[key0]] = DimReduction0[[key0]][rownames(DimReduction0[[key0]]) %in% cells,] 75 | } 76 
| else{ 77 | DimReduction0[[key0]] = DimReduction0[[key0]] 78 | } 79 | } 80 | 81 | } else { 82 | DimReduction0 = DimReduction0 83 | } 84 | 85 | } else if(!is.null(cells) & !is.null(genes)){ 86 | 87 | coldata0 = coldata0[rownames(coldata0) %in% cells,] 88 | rowdata0 = rowdata0[rownames(rowdata0) %in% genes,] 89 | gene0 = rownames(rowdata0) 90 | 91 | if('Integration' %in% names(object@metadata)){ 92 | 93 | name0 = 'logcount' 94 | raw.assay = raw.assay$logcount 95 | raw.assay = lapply(raw.assay, FUN = function(y){y[gene0, colnames(y) %in% rownames(coldata0), drop = FALSE]}) 96 | raw.assay[sapply(raw.assay, function(x){dim(x)[1] == 0})] = NULL 97 | raw.assay[sapply(raw.assay, function(x){dim(x)[2] == 0})] = NULL 98 | 99 | } else { 100 | 101 | name0 = names(raw.assay) 102 | raw.assay = raw.assay[[name0]] 103 | raw.assay = raw.assay[gene0, rownames(coldata0), drop = FALSE] 104 | 105 | if(name0 == 'count'){ 106 | coldata0$UMI = Matrix::colSums(raw.assay) 107 | coldata0$nGene = Matrix::colSums(raw.assay > 0) 108 | } else { 109 | coldata0 = coldata0 110 | } 111 | 112 | } 113 | 114 | if(length(DimReduction0) > 0){ 115 | 116 | for(key0 in names(DimReduction0)){ 117 | if(!key0 %in% c("var.pca", "gene.pca")){ 118 | DimReduction0[[key0]] = DimReduction0[[key0]][rownames(DimReduction0[[key0]]) %in% cells,] 119 | } 120 | else{ 121 | DimReduction0[[key0]] = DimReduction0[[key0]] 122 | } 123 | } 124 | 125 | } else { 126 | DimReduction0 = DimReduction0 127 | } 128 | 129 | } else if(is.null(cells) & !is.null(genes)){ 130 | 131 | rowdata0 = rowdata0[rownames(rowdata0) %in% genes,] 132 | gene0 = rownames(rowdata0) 133 | 134 | if('Integration' %in% names(object@metadata)){ 135 | 136 | name0 = 'logcount' 137 | raw.assay = raw.assay$logcount 138 | raw.assay = lapply(raw.assay, FUN = function(y){y[gene0, , drop = FALSE]}) 139 | raw.assay[sapply(raw.assay, function(x){dim(x)[1] == 0})] = NULL 140 | 141 | } else { 142 | 143 | name0 = names(raw.assay) 144 | raw.assay = raw.assay[[name0]] 145 | 
raw.assay = raw.assay[gene0, , drop = FALSE] 146 | 147 | if(name0 == 'count'){ 148 | coldata0$UMI = Matrix::colSums(raw.assay) 149 | coldata0$nGene = Matrix::colSums(raw.assay > 0) 150 | } else { 151 | coldata0 = coldata0 152 | } 153 | 154 | } 155 | 156 | } else {stop('Please provide cells and/or genes')} 157 | 158 | object@coldata = data.frame(coldata0) 159 | object@rowdata = data.frame(rowdata0) 160 | object@DimReduction = DimReduction0 161 | object@assay = list(raw.assay) 162 | names(object@assay) = name0 163 | object@cluster = factor() 164 | 165 | return(object) 166 | 167 | } 168 | 169 | 170 | 171 | #################################################################################### 172 | #' Utilities Add Factors 173 | #################################################################################### 174 | #' 175 | #' The "AddFactor" function adds one or more factors (columns) to the coldata 176 | #' or rowdata of a RISC object. The names/row.names of the input vector/data.frame 177 | #' should match the row.names of coldata (or rowdata), in the same order. 178 | #' 179 | #' @rdname AddFactor 180 | #' @param object RISC object: a framework dataset. 181 | #' @param colData The column name(s) to add to the coldata of the RISC object, 182 | #' given as characters. 183 | #' @param rowData The column name(s) to add to the rowdata of the RISC object, 184 | #' given as characters. 185 | #' @param value The factor vector or data.frame to add to coldata or rowdata; 186 | #' its names/row.names should equal the row.names of the coldata or rowdata 187 | #' of the RISC object. The input: vector or data.frame.
188 | #' @name AddFactor 189 | 190 | AddFactor <- function(object, colData = NULL, rowData = NULL, value = NULL){ 191 | 192 | coldata0 = as.data.frame(object@coldata) 193 | col.name0 = colnames(coldata0) 194 | rowdata0 = as.data.frame(object@rowdata) 195 | row.name0 = colnames(rowdata0) 196 | 197 | if(is.null(object)) { 198 | stop('Please input a RISC object') 199 | } else if(!is.null(colData)) { 200 | 201 | colData = as.character(colData) 202 | if(inherits(value, "data.frame")) { 203 | coldata1 = data.frame(coldata0, colData = value) 204 | colnames(coldata1) = c(col.name0, colData) 205 | object@coldata = coldata1 206 | } else { 207 | coldata1 = data.frame(coldata0, colData = value) 208 | colnames(coldata1) = c(col.name0, colData) 209 | object@coldata = coldata1 210 | } 211 | 212 | } else if(!is.null(rowData)){ 213 | 214 | rowData = as.character(rowData) 215 | if(inherits(value, "data.frame")) { 216 | rowdata1 = data.frame(rowdata0, colData = value) 217 | colnames(rowdata1) = c(row.name0, rowData) 218 | object@rowdata = rowdata1 219 | } else { 220 | rowdata1 = data.frame(rowdata0, colData = value) 221 | colnames(rowdata1) = c(row.name0, rowData) 222 | object@rowdata = rowdata1 223 | } 224 | 225 | } else { 226 | stop('Please specify colData or rowData; value should be a vector or data.frame whose names/row.names equal the row.names of coldata or rowdata') 227 | } 228 | 229 | return(object) 230 | 231 | } 232 | 233 | 234 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## RISC 2 | 3 | 4 | ### Overview 5 | Integrated analysis of single cell RNA-sequencing (scRNA-seq) data from multiple batches or studies is often necessary in order to learn functional changes in cellular states upon experimental perturbations or cell type relationships in a developmental lineage.
Here we introduce a new algorithm (RPCI) that uses the gene eigenvectors of a reference dataset to establish a global frame onto which all datasets are projected. This approach has a clear advantage in preserving genuine gene expression differences in matching cell types between samples, such as those between cells at distinct developmental stages or between perturbed and control conditions. This R package “RISC” (Robust Integration of Single Cell RNA-seq data) implements the RPCI algorithm, with additional functions for scRNA-seq analysis, such as clustering cells, identifying cluster marker genes, detecting differentially expressed genes between experimental conditions, and, importantly, outputting integrated gene expression values for downstream data analysis. 6 | 7 | #### RISC v1.7 update, Mar 26, 2024 8 | This version mainly fixes problems caused by updates to the dependent package igraph. We also added a clustering resolution parameter ('res') to the InPlot and scCluster (for the 'louvain' method) functions. Additionally, the mean expression and the percentage of expressing cells are reported for both groups in the marker and differential expression results. Installation note: the dependent package “sparseMatrixStats” is available only from Bioconductor (it cannot be installed from GitHub). 9 | 10 | #### RISC v1.6 update 11 | This version mainly fixes problems caused by updates to dependent packages. 12 | 13 | #### RISC v1.5 update 14 | Changes from the last release (v1.0):
15 | (1) Replace the dependency "RcppEigen" with "RcppArmadillo", with full sparse-matrix support in core functions.
16 | (2) Replace dependent "pbmcapply" with "pbapply"
17 | (3) Optimize the "scMultiIntegrate" function and reduce memory consumption; the new RISC release supports integration of datasets with >1.5 million cells and 10,000 genes.
18 | (4) During data integration, all genes expressed in the individual datasets are retained in the integrated data. Genes expressed in all samples are labeled in the "rowdata" of the RISC object.
19 | (5) Convert "logcounts" in the integrated RISC object ("object@assay$logcount") from one large matrix to a list of logcount matrices, one corrected matrix per individual dataset. To output the full integrated matrix: mat0 = do.call(cbind, object@assay$logcount)
20 | (6) Change function name "readscdata" -> "readsc"
21 | (7) Change function name "read10Xgenomics" -> "read10X_mtx"
22 | (8) Parameter names in some functions have changed.
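Item (5) rebuilds the full matrix by concatenating the per-dataset matrices column-wise. Below is a minimal sketch of that step, using two toy sparse matrices as stand-ins for `object@assay$logcount`; the toy data and the row-order check are illustrative assumptions, not RISC code. Because `cbind()` joins matrices by position rather than by row name, the sketch first confirms that every matrix lists the same genes in the same order.

```r
library(Matrix)

# Toy stand-ins for object@assay$logcount in an integrated RISC object:
# one corrected log-count matrix (genes x cells) per dataset.
genes <- c("GeneA", "GeneB", "GeneC")
set1 <- Matrix(matrix(c(0, 1.2, 0, 0.5, 0, 2.1), nrow = 3,
                      dimnames = list(genes, c("Set1_c1", "Set1_c2"))),
               sparse = TRUE)
set2 <- Matrix(matrix(c(0.8, 0, 0.4, 0, 1.5, 0, 0.2, 0, 0.3), nrow = 3,
                      dimnames = list(genes, c("Set2_c1", "Set2_c2", "Set2_c3"))),
               sparse = TRUE)
logcount <- list(set1, set2)

# cbind() joins by position, not by row name, so check that every matrix
# lists the same genes in the same order before concatenating.
same_rows <- vapply(logcount,
                    function(m) identical(rownames(m), rownames(logcount[[1]])),
                    logical(1))
stopifnot(all(same_rows))

# Full integrated matrix: genes x (cells from all datasets).
mat0 <- do.call(cbind, logcount)
print(dim(mat0))
```

If the check fails on a real object, the matrices would need to be reordered to a common gene list before concatenation.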
23 | 24 | Added new functions
25 | (1) In the "scMarker" and "AllMarker" functions, add Wilcoxon rank-sum and signed-rank tests.
26 | (2) In the "scMarker", "AllMarker" and "scDEG" functions, add a pseudo-cell option (binning cells into meta-cells) for detecting marker genes.
27 | (3) Add a "slot" parameter to the "DimPlot" function, so external dimension reduction results can be added to a RISC object and plotted, e.g. add PHATE results (phate0) to the RISC object with obj0@DimReduction$cell.phate = phate0; then DimPlot(obj0, slot = "cell.phate", colFactor = 'Group', size = 2, label = TRUE)
28 | (4) Add "read10X_h5" function for 10X Genomics h5 file.
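Item (3) relies on the external result having the same layout as the built-in reductions: in `scUMAP` and `scTSNE`, `cell.umap` and `cell.tsne` are cell-by-component matrices with cell barcodes as row names and columns named `UMAP1`, `UMAP2` (or `tSNE1`, `tSNE2`), and so on. Below is a minimal sketch of shaping an external embedding the same way. `as_cell_embedding()` is a hypothetical helper, not part of RISC, and matching the row order of `obj0@coldata` is a precaution rather than a documented requirement of `DimPlot`.

```r
# Hypothetical helper (not part of RISC): align an external embedding
# (cells x dimensions) to a given cell order and name its columns with a
# prefix, mirroring the cell.umap / cell.tsne matrices built by scUMAP/scTSNE.
as_cell_embedding <- function(emb, cells, prefix) {
  emb <- as.matrix(emb)
  if (is.null(rownames(emb))) {
    stop("The embedding needs cell barcodes as row names")
  }
  absent <- setdiff(cells, rownames(emb))
  if (length(absent) > 0) {
    stop(length(absent), " cell(s) have no coordinates in the embedding")
  }
  emb <- emb[cells, , drop = FALSE]
  colnames(emb) <- paste0(prefix, seq_len(ncol(emb)))
  emb
}

# Toy PHATE-like coordinates whose rows are in a different order than the cells.
phate0 <- matrix(c(0.1, 0.4, -0.2, 1.0, 0.3, -0.5), ncol = 2,
                 dimnames = list(c("c3", "c1", "c2"), NULL))
cells <- c("c1", "c2", "c3")  # with a real object: rownames(obj0@coldata)
emb <- as_cell_embedding(phate0, cells, prefix = "PHATE")
print(emb)

# With a real RISC object (not run here):
# obj0@DimReduction$cell.phate = emb
# DimPlot(obj0, slot = "cell.phate", colFactor = 'Group', size = 2)
```

Failing loudly on missing barcodes avoids silently plotting a partial or misaligned embedding.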
29 | 30 | Removed old functions
31 | (1) Delete the "readHTSeqdata" function.
32 | 33 | 34 | #### Install dependent packages: 35 | ``` 36 | install.packages(c("Matrix", "irlba", "doParallel", "foreach", "Rtsne", "umap", "MASS", "pbapply", "Rcpp", "RcppArmadillo", "densityClust", "FNN", "igraph", "RColorBrewer", "ggplot2", "gridExtra", "pheatmap", "hdf5r", "BiocManager", "devtools")) 37 | BiocManager::install("sparseMatrixStats") 38 | ``` 39 | 40 | #### Install RISC: 41 | ``` 42 | devtools::install_github("https://github.com/bioinfoDZ/RISC.git") 43 | ``` 44 | The RISC package can also be downloaded and installed manually: 45 | Link 46 | ``` 47 | install.packages("/Path/to/RISC_1.7.tar.gz", repos = NULL, type = "source") 48 | ``` 49 | 50 | 51 | ### Vignettes 52 | Here we provide a vignette that shows the key steps in analyzing example scRNA-seq datasets from basal cell carcinoma or squamous cell carcinoma patients before and after anti-PD-1 therapy (GSE123813). Please also check the RISC functions for reading data directly from h5 files. 53 | 54 | #### RISC v1.0 Link 55 | #### RISC v1.6 Link 56 | 57 | We also provide an example of how to convert a Seurat object to a RISC object (to use the new features, please reinstall the RISC package); similarly, one can convert a RISC object to a Seurat object. 58 | 59 | #### RISC v1.0 Link 60 | #### Note: the RISC v1.6 package was developed in R (v4.2.2), and this vignette was tested in the same R version. 61 | #### Note: the RISC v1.7 package was developed in R (v4.3.3). 62 | 63 | 64 | #### Contents: 65 | (1) RISC package: "RISC_1.7.tar.gz"
66 | (2) Vignette for GSE123813: "GSE123813_Vignette_RISC_v1.6.pdf"
67 | (3) The GSE123813 directory contains the cell-type, patient and treatment annotations. 68 | File location: "/GSE123813/Raw_Data/bcc_annotation.tsv"
69 | 70 | Old RISC version: "RISC_1.0.tar.gz" 71 | Old RISC version: "RISC_1.6.0.tar.gz" 72 | 73 | 74 | ### Citation: 75 | Liu Y, Tao W, Zhou B, Zheng D (2021) Robust integration of multiple single-cell RNA sequencing datasets using a single reference space. 76 | Nat Biotechnol 39(7):877-884. 77 | -------------------------------------------------------------------------------- /RISC_1.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/RISC_1.0.tar.gz -------------------------------------------------------------------------------- /RISC_1.6.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/RISC_1.6.0.tar.gz -------------------------------------------------------------------------------- /RISC_1.7.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/RISC_1.7.tar.gz -------------------------------------------------------------------------------- /RISC_Supplementary/GSE110823/GSE110823.R: -------------------------------------------------------------------------------- 1 | library(RISC) 2 | library(Matrix) 3 | library(R.matlab) 4 | 5 | 6 | PATH = "/Path to the data/GSE110823" 7 | 8 | ################################################################################### 9 | ### Prepare Data ### 10 | ################################################################################### 11 | ## All Cells 12 | dat0 = readMat(paste0(PATH, "/GSM3017261_150000_CNS_nuclei.mat")) 13 | mat0 = dat0$DGE 14 | coldata0 = data.frame(Barcode = paste0("Cell-", dat0$barcodes[1,]), Organ = dat0$sample.type[,1], Type = dat0$cluster.assignment[,1]) 15 | coldata0$Type = sapply(coldata0$Type, function(x){gsub(" ", "-", 
x, fixed = T)}) 16 | gene0 = data.frame(Symbol = dat0$genes[,1]) 17 | gene0$Symbol = sapply(gene0$Symbol, function(x){gsub(" ", "", x, fixed = T)}) 18 | colnames(mat0) = rownames(gene0) = gene0$Symbol 19 | rownames(mat0) = rownames(coldata0) = coldata0$Barcode 20 | 21 | # P2 Brain 22 | keep = coldata0$Organ == "p2_brain " & !coldata0$Type %in% c("53-Unresolved------------------", "54-Unresolved-Kcng1------------") 23 | coldata1 = coldata0[keep,] 24 | mat1 = t(mat0[keep,]) 25 | keep = Matrix::rowSums(mat1 > 0) > 0 26 | mat1 = mat1[keep,] 27 | rowdata1 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)) 28 | dat1 = readscdata(mat1, coldata1, rowdata1, is.filter = F) 29 | 30 | # P11 Brain 31 | keep = coldata0$Organ == "p11_brain" & !coldata0$Type %in% c("53-Unresolved------------------", "54-Unresolved-Kcng1------------") 32 | coldata1 = coldata0[keep,] 33 | mat1 = t(mat0[keep,]) 34 | keep = Matrix::rowSums(mat1 > 0) > 0 35 | mat1 = mat1[keep,] 36 | rowdata1 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)) 37 | dat2 = readscdata(mat1, coldata1, rowdata1, is.filter = F) 38 | 39 | 40 | ################################################################################### 41 | ### RISC Objects ### 42 | ################################################################################### 43 | process0 <- function(obj0){ 44 | obj0 = scFilter(obj0, min.UMI = 500, max.UMI = 5000, min.gene = 200, min.cell = 3) 45 | obj0 = scNormalize(obj0) 46 | obj0 = scDisperse(obj0) 47 | print(length(obj0@vargene)) 48 | return(obj0) 49 | } 50 | 51 | dat1 = process0(dat1) 52 | dat2 = process0(dat2) 53 | 54 | 55 | ################################################################################### 56 | ### Uncorrect ### 57 | ################################################################################### 58 | library(irlba) 59 | library(Rtsne) 60 | library(ggplot2) 61 | library(RColorBrewer) 62 | 63 | set.seed(1987) 64 | gene0 = intersect(dat1@rowdata$Symbol, 
dat2@rowdata$Symbol) 65 | ann0 = read.table(file = paste0(PATH, "/GSE110823_ann.tsv"), sep = "\t", header = T, stringsAsFactors = F) 66 | logmat0 = cbind(dat2@assay$logcount[gene0, ann0$Barcode[ann0$Set == "P11"]], dat1@assay$logcount[gene0, ann0$Barcode[ann0$Set == "P2"]]) 67 | pca0 = irlba(logmat0, nv = 50)$v 68 | tsne0 = Rtsne(pca0)$Y 69 | m0 = data.frame(tSNE1 = tsne0[,1], tSNE2 = tsne0[,2], Set = ann0$Set, CellType = ann0$CellType) 70 | m0$Set = factor(m0$Set, levels = c("P2", "P11")) 71 | m0$CellType = factor(m0$CellType, levels = paste0("C", 1:59)) 72 | color0 = c("#1B9E77", "#288E96", "#367EB6", "#419486", "#4BAC50", "#579055", "#636C63", "#667F49", "#669E26", "#6B9254", "#72789C", "#8064AD", "#9154A5", "#98649F", "#98899B", "#9C867A", "#A26643", "#A65D25", "#A66D20", "#B07117", "#C9650A", "#DB5206", "#E03013", "#E42F18", "#E5760B", "#E69B12", "#E65C54", "#E8318E", "#F05BA8", "#F780B3", "#FB7F56", "#FF8201", "#FFC01A", "#FFFF33", "#9E0142", "#AF1446", "#C0274A", "#D13A4E", "#DC494C", "#E65848", "#F06744", "#F57948", "#F88D51", "#FBA15B", "#FDB466", "#FDC373", "#FDD380", "#FEE18E", "#FEEB9E", "#FEF5AE", "#FFFFBF", "#F7FBB2", "#EFF8A6", "#E7F59A", "#D7EF9B", "#C4E79E", "#B2E0A2", "#9ED7A4", "#88CFA4", "#72C7A4", "#5FBAA8", "#4FA8AF", "#3F96B7", "#3484BB", "#4272B2", "#5060AA", "#5E4FA2") 73 | 74 | ggplot(m0, aes(tSNE1, tSNE2)) + 75 | geom_point(aes(color = Set), size = 0.5) + 76 | scale_color_manual(values = c("#FB8072", "#80B1D3")) + 77 | theme_bw(base_size = 12, base_line_size = 0) + 78 | labs(color = "Set") + 79 | guides(color = guide_legend(override.aes = list(size = 8), ncol = 1)) 80 | ggplot(m0, aes(tSNE1, tSNE2)) + 81 | geom_point(aes(color = CellType), size = 0.5) + 82 | scale_color_manual(values = color0) + 83 | theme_bw(base_size = 12, base_line_size = 0) + 84 | labs(color = "Cell Type") + 85 | theme(legend.text = element_text(size = 16), legend.title = element_text(size = 20)) + 86 | guides(color = guide_legend(override.aes = list(size = 8), ncol = 
4)) 87 | 88 | 89 | ################################################################################### 90 | ### Integration Data ### 91 | ################################################################################### 92 | ## Integration 93 | var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F) 94 | var0 = var0$V1 95 | data0 = list(dat2, dat1) 96 | InPlot(data0, var.gene = var0, ncore = 6) 97 | data0 = scMultiIntegrate(data0, eigens = 40, var.gene = var0, add.Id = c("P11", "P2"), adjust = F, ncore = 1) 98 | data0 = scTSNE(data0, npc = 42, use = "PLS") 99 | ann0 = read.table(file = paste0(PATH, "/GSE110823_ann.tsv"), sep = "\t", header = T, stringsAsFactors = F) 100 | ann0$scBarcode = paste0(ann0$Set, "_", ann0$Barcode) 101 | cell0 = intersect(rownames(data0@coldata), ann0$scBarcode) 102 | data0 = SubSet(data0, cells = cell0) 103 | data0@coldata$CellType = factor(ann0$CellType, levels = paste0("C", 1:59)) 104 | 105 | color0 = c("#1B9E77", "#288E96", "#367EB6", "#419486", "#4BAC50", "#579055", "#636C63", "#667F49", "#669E26", "#6B9254", "#72789C", "#8064AD", "#9154A5", "#98649F", "#98899B", "#9C867A", "#A26643", "#A65D25", "#A66D20", "#B07117", "#C9650A", "#DB5206", "#E03013", "#E42F18", "#E5760B", "#E69B12", "#E65C54", "#E8318E", "#F05BA8", "#F780B3", "#FB7F56", "#FF8201", "#FFC01A", "#FFFF33", "#9E0142", "#AF1446", "#C0274A", "#D13A4E", "#DC494C", "#E65848", "#F06744", "#F57948", "#F88D51", "#FBA15B", "#FDB466", "#FDC373", "#FDD380", "#FEE18E", "#FEEB9E", "#FEF5AE", "#FFFFBF", "#F7FBB2", "#EFF8A6", "#E7F59A", "#D7EF9B", "#C4E79E", "#B2E0A2", "#9ED7A4", "#88CFA4", "#72C7A4", "#5FBAA8", "#4FA8AF", "#3F96B7", "#3484BB", "#4272B2", "#5060AA", "#5E4FA2") 106 | DimPlot(data0, slot = "cell.tsne", colFactor = "Set") 107 | DimPlot(data0, slot = "cell.tsne", colFactor = "CellType", Colors = color0) 108 | 109 | DimPlot(data0, slot = "cell.tsne", genes = "Ebf3", size = 0.2) 110 | DimPlot(data0, slot = "cell.tsne", genes = "Fat2", 
size = 0.2) 111 | 112 | 113 | -------------------------------------------------------------------------------- /RISC_Supplementary/GSE111113/GSE111113.R: -------------------------------------------------------------------------------- 1 | library(RISC) 2 | library(ggplot2) 3 | library(RColorBrewer) 4 | library(irlba) 5 | library(umap) 6 | 7 | 8 | PATH = "/Path to the data/GSE111113" 9 | 10 | ################################################################################### 11 | ### RISC raw ### 12 | ################################################################################### 13 | mat0 = read.table(file = paste0(PATH, '/GSE111113_Table_S1_FilterNormal10xExpMatrix.txt'), sep = '\t', header = T, stringsAsFactors = F) 14 | mat1 = as.matrix(mat0[,-c(1:3)]) 15 | rownames(mat1) = mat0$gene_id 16 | Group = sapply(colnames(mat1), function(x){strsplit(x, "_")[[1]][1]}) 17 | Symbol0 = mat0[,c(1, 3)] 18 | colnames(Symbol0) = c('Ensembl', 'Symbol') 19 | 20 | 21 | ################################################################################### 22 | ### Uncorrected data ### 23 | ################################################################################### 24 | dat0 = readscdata(count = mat1, cell = data.frame(Time = Group, row.names = colnames(mat1)), gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))) 25 | dat0 = scFilter(dat0, min.UMI = 500, max.UMI = Inf, min.gene = 200, min.cell = 5) 26 | cell0 = rownames(dat0@coldata)[dat0@coldata$Time %in% c('E16', 'E18', 'P4', 'Adu1', 'Adu2')] 27 | dat0 = SubSet(dat0, cells = cell0) 28 | dat0 = scNormalize(dat0) 29 | dat0 = scDisperse(dat0) 30 | length(dat0@vargene) 31 | dat0 = scPCA(dat0) 32 | dat0 = scUMAP(dat0) 33 | 34 | UMAPlot(dat0, colFactor = 'Time', Colors = brewer.pal(5, 'Spectral')) 35 | 36 | 37 | ################################################################################### 38 | ### RISC integration ### 39 | 
###################################################################################
cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'E16']
dat1 = SubSet(dat0, cells = cell0)
cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'E18']
dat2 = SubSet(dat0, cells = cell0)
cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'P4']
dat3 = SubSet(dat0, cells = cell0)
cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'Adu1']
dat4 = SubSet(dat0, cells = cell0)
cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'Adu2']
dat5 = SubSet(dat0, cells = cell0)

dat1 = scDisperse(dat1)
length(dat1@vargene)
dat2 = scDisperse(dat2)
length(dat2@vargene)
dat3 = scDisperse(dat3)
length(dat3@vargene)
dat4 = scDisperse(dat4)
length(dat4@vargene)
dat5 = scDisperse(dat5)
length(dat5@vargene)


###################################################################################
### RISC integration ###
###################################################################################
### Integration All
var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
dat.all = list(dat4, dat5, dat3, dat2, dat1)
InPlot(dat.all, var.gene = var0)
dat.all = scMultiIntegrate(dat.all, eigens = 15, var.gene = var0, add.Id = c('Adult1', 'Adult2', 'P4', 'E18', 'E16'), ncore = 4)
dat.all = scUMAP(dat.all, npc = 15, use = 'PLS')
dat.all@coldata$Set = factor(dat.all@coldata$Set, levels = c('E16', 'E18', 'P4', 'Adult1', 'Adult2'))

DimPlot(dat.all, slot = "cell.umap", colFactor = 'Set')
UMAPlot(dat.all, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Wfdc18'])
UMAPlot(dat.all, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Sostdc1'])

## Integration Partial
dat.par = list(dat4, dat5, dat2, dat1)
dat.par = scMultiIntegrate(dat.par, eigens = 20, var.gene = var0, add.Id = c('Adult1', 'Adult2', 'E18', 'E16'), ncore = 4)
dat.par = scUMAP(dat.par, npc = 20, use = 'PLS')
dat.par@coldata$Set = factor(dat.par@coldata$Set, levels = c('E16', 'E18', 'Adult1', 'Adult2'))

DimPlot(dat.par, slot = "cell.umap", colFactor = 'Set')
UMAPlot(dat.par, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Wfdc18'])
UMAPlot(dat.par, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Sostdc1'])


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE114727/GSE114727.R:
--------------------------------------------------------------------------------
library(RISC)
library(irlba)
library(umap)
library(ggplot2)
library(RColorBrewer)


PATH = "/Path to the data/GSE114727"

###################################################################################
### Raw Data ###
###################################################################################
# Each sample keeps only barcodes that carry a full-length, productive V(D)J contig.
# colClasses = "character" keeps the 'True'/'False' flags as strings, so the
# == 'True' tests do not depend on read.csv's automatic type conversion.

# BC09 rep1
cell1 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148580_BC09_TUMOR1_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F, colClasses = "character")
cell1 = cell1[cell1$full_length == 'True' & cell1$productive == 'True',]
data1 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC09_Tech1"))
data1 = SubSet(data1, cells = cell1$barcode)

# BC09 rep2
cell2 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148581_BC09_TUMOR2_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F, colClasses = "character")
cell2 = cell2[cell2$full_length == 'True' & cell2$productive == 'True',]
data2 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC09_Tech2"))
data2 = SubSet(data2, cells = cell2$barcode)

# BC10
cell3 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148582_BC10_TUMOR1_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F, colClasses = "character")
cell3 = cell3[cell3$full_length == 'True' & cell3$productive == 'True',]
data3 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC10_Tech1"))
data3 = SubSet(data3, cells = cell3$barcode)

# BC11 rep1
cell4 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148583_BC11_TUMOR1_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F, colClasses = "character")
cell4 = cell4[cell4$full_length == 'True' & cell4$productive == 'True',]
data4 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC11_Tech1"))
data4 = SubSet(data4, cells = cell4$barcode)

# BC11 rep2
cell5 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148584_BC11_TUMOR2_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F, colClasses = "character")
cell5 = cell5[cell5$full_length == 'True' & cell5$productive == 'True',]
data5 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC11_Tech2"))
data5 = SubSet(data5, cells = cell5$barcode)

# GSE110686
data6 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/GSE110686"))


###################################################################################
### Prepare Data ###
###################################################################################
process0 <- function(obj0){
  obj0 = scFilter(obj0, min.UMI = 1000, max.UMI = 20000, min.gene = 500, min.cell = 5)
  obj0 = scNormalize(obj0, ncore = 4)
  obj0 = scDisperse(obj0)
  print(length(obj0@vargene))
  return(obj0)
}

data1 = process0(data1)
data2 = process0(data2)
data3 = process0(data3)
data4 = process0(data4)
data5 = process0(data5)
data6 = process0(data6)

FilterPlot(data1)
FilterPlot(data2)
FilterPlot(data3)
FilterPlot(data4)
FilterPlot(data5)
FilterPlot(data6)


###################################################################################
### Uncorrected Data ###
###################################################################################
var0 = Reduce(intersect, list(
  rownames(data1@assay$logcount), rownames(data2@assay$logcount), rownames(data3@assay$logcount),
  rownames(data4@assay$logcount), rownames(data5@assay$logcount), rownames(data6@assay$logcount)
))
logmat0 = cbind(
  as.matrix(data2@assay$logcount)[var0,], as.matrix(data1@assay$logcount)[var0,],
  as.matrix(data3@assay$logcount)[var0,], as.matrix(data4@assay$logcount)[var0,],
  as.matrix(data5@assay$logcount)[var0,], as.matrix(data6@assay$logcount)[var0,]
)
pca0 = irlba(logmat0, nv = 12)$v
umap0 = umap(pca0)$layout
ann0 = read.table(file = paste0(PATH, "/GSE114727_Anno_All.tsv"), sep = "\t", header = T, stringsAsFactors = F)
m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = ann0$Set, Type = ann0$Type)
m0$Type = factor(m0$Type, levels = c(
  "CD4+ sub1", "CD4+ sub1 TN", "CD4+ sub2", "CD4+ sub3", "CD4+ sub4",
  "CD4+ Treg", "CD8+ sub1", "CD8+ sub2", "CD8+ sub3", "CD8+ Trm"
))
m0$Set0 = factor(
  m0$Set,
  levels = c("BC_ER", "BC1_ER_PR", "BC2_ER_PR", "BC1_Her2", "BC2_Her2", "BC_TN"),
  labels = c("BC ER+", "BC ER+PR+", "BC ER+PR+", "BC Her2+", "BC Her2+", "BC TN")
)

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Set0, shape = Set), size = 1, alpha = 1) +
  scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) +
  scale_shape_manual(values = c(1, 2, 4, 5, 6, 8)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Source') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Type, shape = Set), size = 0.5, alpha = 1) +
  scale_color_manual(values = c("#E41A1C", "#377EB8", "#4DAF4A", "#6A3D9A", "#FF7F00", "#A65628", "#F781BF", "#984EA3", "#999999", "#E7298A")) +
  scale_shape_manual(values = c(1, 2, 4, 5, 6, 8)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Source') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))


###################################################################################
### Data Integration ###
###################################################################################
## BC positive
dat.pos = list(data2, data1, data3, data4, data5)
var0 = read.table(file = paste0(PATH, "/var_pos.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
InPlot(dat.pos, var.gene = var0)
dat.pos = scMultiIntegrate(dat.pos, eigens = 12, var.gene = var0, add.Id = c("BC2_ER_PR", "BC1_ER_PR", "BC_ER", "BC1_Her2", "BC2_Her2"), ncore = 4)
dat.pos = scUMAP(dat.pos, npc = 12, use = "PLS")

ann0 = read.table(file = paste0(PATH, "/GSE114727_Anno_All.tsv"), sep = "\t", header = T, stringsAsFactors = F)
ann0 = ann0[ann0$Set != "BC_TN",]
dat.pos@coldata$Type = as.character(ann0$Type)

DimPlot(dat.pos, slot = "cell.umap", colFactor = "Set", size = 0.5)
DimPlot(dat.pos, slot = "cell.umap", colFactor = "Type", size = 0.5)
DimPlot(dat.pos, slot = "cell.umap", genes = c("HAVCR2", "CD8A", "CD4", "FOXP3"), size = 0.2)


## BC positive and triple negative
dat.all = list(data2, data1, data3, data4, data5, data6)
var0 = read.table(file = paste0(PATH, "/var_all.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
InPlot(dat.all, var.gene = var0)
dat.all = scMultiIntegrate(dat.all, eigens = 12, var.gene = var0, add.Id = c("BC2_ER_PR", "BC1_ER_PR", "BC_ER", "BC1_Her2", "BC2_Her2", "BC_TN"), ncore = 4)
dat.all = scUMAP(dat.all, npc = 12, use = "PLS")

ann0 = read.table(file = paste0(PATH, "/GSE114727_Anno_All.tsv"), sep = "\t", header = T, stringsAsFactors = F)
dat.all@coldata$Type = as.character(ann0$Type)

DimPlot(dat.all, slot = "cell.umap", colFactor = "Set", size = 0.5)
DimPlot(dat.all, slot = "cell.umap", colFactor = "Type", size = 0.5)

UMAPlot(dat.all, genes = "HAVCR2", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "CD8A", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "CD4", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "FOXP3", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "CD40LG", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "DPP4", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "CHN1", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "KRT86", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "LINC00402", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "IKZF2", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "ZNF683", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "AIF1", size = 0.5, exp.col = "firebrick2")
UMAPlot(dat.all, genes = "PLEK", size = 0.5, exp.col = "firebrick2")


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE114727/var_all.tsv:
--------------------------------------------------------------------------------
MMP23B
TNFRSF25
TNFRSF9
AGTRAP
DHRS3
SPEN
ID3
RUNX3
STMN1
UBXN11
ZNF683
MAP3K6
FGR
MATN1-AS1
S100PBP
ZC3H12A
C1orf228
PLK3
CDKN2C
ZCCHC11
PDE4B
GADD45A
IFI44L
LMO4
GBP3
GBP1
GBP4
TGFBR3
DPYD
CDC14A
VCAM1
S1PR1
CSF1
CHI3L2
C1orf162
PTPN22
FAM46C
CD160
PDE4DIP
LINC00869
PBXIP1
LMNA
SEMA4A
FCRL3
FCRL6
CD84
SLAMF7
CD244
MPZ
HSPA6
FCGR3A
GPA33
XCL2
XCL1
ATP1B1
FASLG
TNFSF4
RABGAP1L
GLUL
RGS16
C1orf21
IVNS1ABP
RGS13
RGS2
ASPM
LAX1
IL10
CD55
G0S2
HHAT
TRAF5
ATF3
CENPF
TMEM63A
GALNT2
TTC13
GNG4
LYST
EFCAB2
NLRP3
RSAD2
AC092580.4
RRM2
FAM49A
RHOB
SLC30A3
EIF2AK2
AC006369.2
CDC42EP3
HNRNPLL
GALM
THADA
SPTBN1
PLEK
CAPG
GNLY
CD8A
CD8B
AC133644.2
RP11-1399P15.1
MAL
ANKRD36B
IL1R2
LIMS1
MIR4435-2HG
AC017002.1
ZC3H6
SLC20A1
PTPN4
HS6ST1
ZEB2
RBMS1
METTL8
ITGA6
CDCA7
GPR155
CHN1
TTN
AC104820.2
ITGA4
NAB1
SLC39A10
HSPD1
SPATS2L
CTLA4
ICOS
EPHA4
ITM2C
PASK
PDCD1
ITPR1
BHLHE40-AS1
BHLHE40
SRGAP3
SH3BP5
ANKRD28
RFTN1
NR1D2
SLC4A7
RP11-222K16.2
EOMES
CMC1
CMTM8
TRANK1
CSRNP1
ENTPD3-AS1
CCR2
CCR5
LRRC2
CISH
ATXN7
FRMD4B
SENP7
NFKBIZ
BBX
ZBED2
CD200
BTLA
SIDT1
RP11-553L6.2
ZNF80
GOLGB1
PARP9
PARP15
H1FX
SLC9A9
PLSCR1
GYG1
HPS3
GPR171
RAP2B
TIPARP
SMC4
SKIL
TNFSF10
KLHL24
TPRG1
CCDC50
SPON2
ZFYVE28
LYAR
BOD1L1
CD38
FGFBP2
SEL1L3
DTHD1
KLF3
ATP10D
TXK
HOPX
CENPC
RUFY3
AREG
CXCL13
ANTXR2
PLAC8
PTPN13
PPM1K
HERC5
PPP3CA
NFKB1
LEF1
TIFA
LARP7
TNIP3
KIAA1109
IL2
SPRY1
RP11-83A24.2
PRMT9
ANKRD37
TLR3
LPCAT1
PTGER4
ANXA2R
ITGA1
PDE4D
ENC1
F2R
ARRDC3
KIAA0825
SLF1
KIF3A
JADE2
TGFBI
EGR1
ARAP3
PPP2R2B
JAKMIP2
ADRB2
HAVCR2
PTTG1
MIR3142HG
N4BP3
C5orf45
IRF4
SERPINB9
GFOD1
CD83
MYLIP
ATXN1
SOX4
FAM65B
HIST1H1C
HIST1H1E
BTN3A3
TNF
NCR3
AIF1
HLA-DRA
HLA-DQA1
HLA-DQB1
HLA-DMB
HLA-DMA
HMGA1
CDKN1A
PIM1
CCDC167
RUNX2
CLIC5
PHIP
NT5E
PRDM1
AIM1
SCML4
SESN1
THEMIS
SAMD3
SGK1
AHI1
PDE7B
IFNGR1
RP11-356I2.4
SYNE1
SYTL3
CCR6
CHST12
ETV1
AHR
HDAC9
AOAH
ELMO1
TRGC2
TRGV10
TRGV9
TRG-AS1
TRGV5
TRGV3
TRGV2
MYO1G
IGFBP3
UPP1
AUTS2
GTF2I
FGL2
ABCB1
SAMD9
SAMD9L
PILRB
TRIM56
CLDN15
ATXN7L1
CDHR3
NAMPT
IFRD1
CAV1
KDM7A
TRBV28
EPHB6
EPHA1
TCAF2
GIMAP6
GIMAP1
GIMAP5
SMARCD3
P2RY8
SMS
MID1IP1
LINC01281
USP11
TIMP1
PIM2
FOXP3
TSPYL2
SMC1A
MAGEH1
CYSLTR1
ZMAT1
NGFRAP1
CXorf57
SH2D1A
CD40LG
GAB3
IL9R
TNFRSF10A
CLU
RP11-489E7.4
RP11-51J9.5
EIF4EBP1
CEBPD
TOX
RP11-25K19.1
MYBL1
NCOA2
MSC
ZBTB10
FABP5
GEM
FBXO32
FAM84B
MYC
TMEM71
PLEC
UHRF2
PLIN2
CDKN2A
DDX58
B4GALT1
AQP3
CEP78
TLE4
CTSL
CKS2
GADD45G
UNQ6494
TRMO
TRAF1
STOM
MVB12B
PRRC2B
SETX
RALGDS
SARDH
PHPT1
CLIC3
NPDC1
TUBB4B
IRF7
IFITM10
KCNQ1OT1
OSBPL5
RIC3
DKK3
PDE3B
NUCB2
CD59
TRIM44
PRR5L
FAM111A
MS4A6A
MS4A1
RP11-286N22.8
PLA2G16
PPP1R14B
RASGRP2
AP000769.1
CDK2AP2
PGM2L1
LRRC32
SYTL2
PRSS23
SMCO4
ANKRD49
SESN3
PDGFD
LAYN
USP28
CADM1
RNF214
FXYD2
JAML
UBE4A
CXCR5
BCL9L
H2AFX
CRTAM
IL2RA
PFKFB3
WAC-AS1
MAP3K8
ASAH2B
ANK3
ARID5B
RTKN2
EGR2
PRF1
RP11-338I21.1
IFIT2
IFIT3
KIF20B
PDLIM1
ENTPD1
FRAT2
PDCD4-AS1
ABLIM1
FAM160B1
PTPRE
MKI67
KDM5A
NINJ2
CCND2
CD9
PTMS
LAG3
CLSTN3
KLRG1
A2M-AS1
KLRB1
CLECL1
KLRF1
GABARAPL1
KLRD1
KLRK1
KLRC3
KLRC1
RP11-291B21.2
YBX3
LRMP
ITPR2
CAPRIN2
KIF21A
NELL2
GALNT6
NR4A1
KRT86
TESPA1
NXPH4
ARHGAP9
DDIT3
IFNG-AS1
IFNG
CPM
PHLDA1
PPP1R12A
CEP290
DUSP6
ATP2B1
LINC00936
RP11-796E2.4
GNPTAB
PMCH
C12orf75
OAS1
OASL
RILPL2
LINC00944
HSPH1
LHFP
TSC22D1
LPAR6
PHF11
KLF12
LINC00402
TBC1D4
MYCBP2
NDFIP2
SPRY2
MBNL2
GPR18
GPR183
TNFSF13B
RASA3
TRAV4
TRAV5
TRAV6
TRAV8-2
TRAV13-1
TRAV8-4
TRAV13-2
TRAV8-6
TRAV17
TRAV19
TRAV20
TRAV22
TRAV23DV6
TRDV1
TRAV26-1
TRAV27
TRAV29DV5
TRAV36DV7
TRAV38-2DV8
TRDC
PPP1R3E
REC8
GZMH
GZMB
MIS18BP1
RP11-596C23.2
PTGDR
PTGER2
LGALS3
ARID4A
RP11-902B17.1
HIF1A
AKAP5
GPR65
LGMN
IFI27
CRIP2
GCHFR
RP11-23P13.6
EIF3J-AS1
PATL2
GABPB1-AS1
RORA
DAPK2
DENND4A
TLE3
TBC1D2B
NMB
MIR9-3HG
MCTP2
TARSL2
HAGHL
SNHG9
NPW
MMP25-AS1
ZNF75A
NLRC3
ABCC1
GGA2
PRKCB
RP11-666O2.2
SPN
RP11-455F5.5
ADCY7
CHD9
AKTIP
CES1
MT1E
MT1F
CPNE2
ADGRG5
ADGRG1
KIFC3
TEPP
CTD-2012K14.6
LINC01229
PLCG2
GINS2
SLC7A5
CPNE7
ZNF276
ITGAE
TXNDC17
XAF1
TNK1
HS3ST3B1
LRRC75A
RASD1
RP11-47L3.1
SLFN12L
AC069363.1
CCL3
CCL4
CCL3L3
CCL4L2
MLLT6
TOP2A
IGFBP4
RP5-1028K7.2
TBX21
RP11-357H14.17
ABI3
TOB1
YPEL2
ERN1
PECAM1
BPTF
RAB37
TK1
SOCS3
TBCD
TYMS
YES1
TGIF1
DLGAP1-AS1
LDLRAD4
MAPRE2
PMAIP1
BCL2
NFATC1
SIRPG
CDC25B
PLCB1
ESF1
KIZ
NAPB
ZNF337
ID1
BCL2L1
SLA2
TOX2
CTSA
TSHZ2
ZNF217
BCAS1
ZBP1
HELZ2
MADCAM1
ABCA7
CTB-31O20.2
GADD45B
MATK
EBI3
TMIGD2
UHRF1
CD70
TNFSF14
MYO1F
S1PR5
ACP5
ZNF443
ASF1B
ADGRE5
ARRDC2
ZNF101
PLEKHF1
FXYD7
LSR
ZBTB32
NFKBID
TYROBP
CAPN12
SERTAD1
TGFB1
LINC01480
AC006129.2
CD79A
ATP1A3
FOSB
IGFL2
SLC1A5
ZNF331
MYADM
LAIR1
LENG8
LAIR2
KIR2DL3
KIR3DL2
NCR1
ZNF530
USP18
ZNF280B
KIAA1671
TPST2
MIAT
XBP1
OSM
RP4-539M6.22
MCM5
MAFF
APOBEC3C
APOBEC3D
APOBEC3H
GRAP2
EP300
RNU12
LDOC1L
CTA-29F11.1
PIM3
MIR155HG
TIAM1
LINC00649
MX2
ITGB2-AS1
COL6A2

--------------------------------------------------------------------------------
/RISC_Supplementary/GSE125688/GSE125688.R:
--------------------------------------------------------------------------------
library(RISC)
library(RColorBrewer)
library(ggplot2)


# Drop the last column of a raw count table, convert it to a matrix, and strip
# the "__"-delimited suffix from the gene names (row names).
colname <- function(x0){
  colname0 <- colnames(x0)[-ncol(x0)]
  x0 <- x0[,-ncol(x0)]
  colnames(x0) <- colname0
  x0 <- as.matrix(x0)
  rownames(x0) <- sapply(rownames(x0), function(x){strsplit(x, "__", fixed = T)[[1]][1]})
  return(x0)
}

PATH = "/Path to the data/GSE125688"

###################################################################################
### Prepare data ###
###################################################################################
Ctrl1 <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580367_bil-adult1.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)
Ctrl2 <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580368_bil-adult2.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)
Ctrl3 <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580369_bil-adult3.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)
DDC <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580370_bil-DDC1.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)

Ctrl1 <- colname(Ctrl1)
Ctrl2 <- colname(Ctrl2)
Ctrl3 <- colname(Ctrl3)
DDC <- colname(DDC)

Ctrl.dat1 <- readscdata(count = Ctrl1, cell = data.frame(Barcode = colnames(Ctrl1), row.names = colnames(Ctrl1)), gene = data.frame(Symbol = rownames(Ctrl1), row.names = rownames(Ctrl1)))
Ctrl.dat2 <- readscdata(count = Ctrl2, cell = data.frame(Barcode = colnames(Ctrl2), row.names = colnames(Ctrl2)), gene = data.frame(Symbol = rownames(Ctrl2), row.names = rownames(Ctrl2)))
Ctrl.dat3 <- readscdata(count = Ctrl3, cell = data.frame(Barcode = colnames(Ctrl3), row.names = colnames(Ctrl3)), gene = data.frame(Symbol = rownames(Ctrl3), row.names = rownames(Ctrl3)))
DDC.dat <- readscdata(count = DDC, cell = data.frame(Barcode = colnames(DDC), row.names = colnames(DDC)), gene = data.frame(Symbol = rownames(DDC), row.names = rownames(DDC)))


###################################################################################
### Process data ###
###################################################################################
Ctrl.dat1 = scFilter(Ctrl.dat1, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)
Ctrl.dat2 = scFilter(Ctrl.dat2, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)
Ctrl.dat3 = scFilter(Ctrl.dat3, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)
DDC.dat = scFilter(DDC.dat, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)

Ctrl.dat1 = scNormalize(Ctrl.dat1)
Ctrl.dat2 = scNormalize(Ctrl.dat2)
Ctrl.dat3 = scNormalize(Ctrl.dat3)
DDC.dat = scNormalize(DDC.dat)

Ctrl.dat1 = scDisperse(Ctrl.dat1)
Ctrl.dat2 = scDisperse(Ctrl.dat2)
Ctrl.dat3 = scDisperse(Ctrl.dat3)
DDC.dat = scDisperse(DDC.dat)

length(Ctrl.dat1@vargene)
length(Ctrl.dat2@vargene)
length(Ctrl.dat3@vargene)
length(DDC.dat@vargene)

Ctrl.dat1 = scPCA(Ctrl.dat1)
Ctrl.dat2 = scPCA(Ctrl.dat2)
Ctrl.dat3 = scPCA(Ctrl.dat3)
DDC.dat = scPCA(DDC.dat)


###################################################################################
### Uncorrected data ###
###################################################################################
## Uncorrected
library(irlba)
library(umap)

gene0 = Reduce(intersect, list(Ctrl.dat1@rowdata$Symbol, Ctrl.dat2@rowdata$Symbol, Ctrl.dat3@rowdata$Symbol, DDC.dat@rowdata$Symbol))
logmat0 = cbind(Ctrl.dat1@assay$logcount[gene0,], Ctrl.dat2@assay$logcount[gene0,], Ctrl.dat3@assay$logcount[gene0,], DDC.dat@assay$logcount[gene0,])
pca0 = irlba(logmat0, nv = 50)$v
umap0 = umap(pca0[,1:9])$layout
m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = c(rep("BEC1", nrow(Ctrl.dat1@coldata)), rep("BEC2", nrow(Ctrl.dat2@coldata)), rep("BEC3", nrow(Ctrl.dat3@coldata)), rep("DDC", nrow(DDC.dat@coldata))))
ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Set), shape = 19, size = 1, alpha = 1) +
  scale_color_manual(values = c("#9ECAE1", "#4292C6", "#08519C", "#EF3B2C")) +
  theme_bw(base_line_size = 0, base_size = 12) +
  labs(x = 'UMAP1', y = 'UMAP2', color = 'Set') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1))


###################################################################################
### Integration data ###
###################################################################################
## Integration
BEC = list(Ctrl.dat3, Ctrl.dat2, Ctrl.dat1, DDC.dat)
var0 = Reduce(intersect, list(
  rownames(Ctrl.dat1@rowdata), rownames(Ctrl.dat2@rowdata),
  rownames(Ctrl.dat3@rowdata), rownames(DDC.dat@rowdata)
))
BEC = scMultiIntegrate(BEC, eigens = 12, var.gene = var0, ncore = 4, add.Id = c('Ctrl3', 'Ctrl2', 'Ctrl1', 'DDC'))
BEC = scUMAP(BEC, npc = 12, use = 'PLS')

DimPlot(BEC, slot = "cell.umap", colFactor = "Set", size = 2, Colors = c("#9ECAE1", "#4292C6", "#08519C", "#EF3B2C"))


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE131181/GSE131181.R:
--------------------------------------------------------------------------------
library(RISC)
library(irlba)
library(umap)
library(ggplot2)
library(RColorBrewer)


PATH = "/Path to the data/GSE131181"

###################################################################################
### Prepare data ###
###################################################################################
library(data.table)

E10 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e10.5.raw.data.csv'), sep = ',')
E13 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e13.5.raw.data.csv'), sep = ',')
dE10 = E10[,-1]
gene1 = E10$V1
dE13 = E13[,-1]
gene2 = E13$V1
rm(E10, E13)

mE10 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e10.5.meta.data.csv'), sep = ',')
mE13 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e13.5.meta.data.csv'), sep = ',')
mdE10 = mE10[,c(1, 4, 9, 10)]
mdE13 = mE13[,c(1, 4, 10, 8)]
colnames(mdE10) = colnames(mdE13) = c('Sample', 'Experiment', 'Group', 'Set')
rm(mE10, mE13)

dE10 = dE10[, mdE10$Sample, with = F]
dE13 = dE13[, mdE13$Sample, with = F]

mdE10 = as.data.frame(mdE10)
mE10wt = mdE10[mdE10$Set == 'Control', , drop = F]
mE10pd = mdE10[mdE10$Set == 'PhD', , drop = F]
mdE13 = as.data.frame(mdE13)
mE13wt = mdE13[mdE13$Set == 'Control', , drop = F]
mE13pd = mdE13[mdE13$Set == 'PhD', , drop = F]

E10wt = as.matrix(dE10[, mE10wt$Sample, with = F])
E10pd = as.matrix(dE10[, mE10pd$Sample, with = F])
E13wt = as.matrix(dE13[, mE13wt$Sample, with = F])
E13pd = as.matrix(dE13[, mE13pd$Sample, with = F])
rm(dE10, dE13, mdE10, mdE13)
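# ---------------------------------------------------------------------------
# Hypothetical helper (illustration only; not used by the original analysis).
# The lines above split each count table into Control and PhD matrices by hand.
# A minimal base-R sketch of the same split, assuming a count matrix whose
# column names are sample IDs and a metadata data.frame with one row per
# sample: split() groups the sample IDs by a metadata column, and each group
# is used to subset the count columns, so every block stays aligned with its
# metadata rows. The name `split_counts_by` and its arguments are assumptions
# made for this sketch.
split_counts_by <- function(counts, meta, sample_col = "Sample", group_col = "Set") {
  # as.character() avoids indexing by factor codes if the IDs are factors
  ids <- as.character(meta[[sample_col]])
  stopifnot(all(ids %in% colnames(counts)))
  # split() keeps the original sample order within each group
  groups <- split(ids, meta[[group_col]])
  lapply(groups, function(s) counts[, s, drop = FALSE])
}
# Toy usage:
#   toy  <- matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), c("a", "b", "c")))
#   meta <- data.frame(Sample = c("a", "b", "c"), Set = c("Control", "PhD", "Control"))
#   split_counts_by(toy, meta)$Control   # columns for samples "a" and "c"
# ---------------------------------------------------------------------------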
rownames(E10wt) = rownames(E10pd) = gene1
rownames(E13wt) = rownames(E13pd) = gene2
rownames(mE10wt) = mE10wt$Sample
rownames(mE10pd) = mE10pd$Sample
rownames(mE13wt) = mE13wt$Sample
rownames(mE13pd) = mE13pd$Sample


###################################################################################
### Individual datasets ###
###################################################################################
dat1 = readscdata(count = E10wt, cell = mE10wt, gene = data.frame(Symbol = gene1, row.names = gene1))
dat2 = readscdata(count = E10pd, cell = mE10pd, gene = data.frame(Symbol = gene1, row.names = gene1))
dat3 = readscdata(count = E13wt, cell = mE13wt, gene = data.frame(Symbol = gene2, row.names = gene2))
dat4 = readscdata(count = E13pd, cell = mE13pd, gene = data.frame(Symbol = gene2, row.names = gene2))
rm(E10wt, E10pd, E13wt, E13pd, mE10wt, mE10pd, mE13wt, mE13pd, gene1, gene2)

process0 <- function(obj0){
  obj0 = scFilter(obj0, min.UMI = 1000, max.UMI = Inf, min.gene = 200, min.cell = 5)
  obj0 = scNormalize(obj0)
  obj0 = scDisperse(obj0)
  # print() is needed: a bare expression inside a function body is not auto-printed
  print(length(obj0@vargene))
  return(obj0)
}

dat1 = process0(dat1)
dat2 = process0(dat2)
dat3 = process0(dat3)
dat4 = process0(dat4)

FilterPlot(dat1)
FilterPlot(dat2)
FilterPlot(dat3)
FilterPlot(dat4)


###################################################################################
### Uncorrected Data ###
###################################################################################
# Uncorrected embedding (no integration)
ann0 = read.table(file = paste0(PATH, "/GSE131181_Ann.tsv"), sep = "\t", header = T, stringsAsFactors = F)
var0 = read.table(file = paste0(PATH, "/GSE131181_var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
logmat0 = cbind(dat3@assay$logcount[var0,], dat4@assay$logcount[var0,], dat1@assay$logcount[var0,], dat2@assay$logcount[var0,])
pca0 = irlba(as.matrix(logmat0), nv = 50)$v
umap0 = umap(pca0[,1:25])$layout

m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = ann0$Set, Type = ann0$Type)
ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Set, shape = Set), size = 0.1, alpha = 0.85) +
  scale_color_manual(values = c('#FB8072', '#80B1D3', '#8DD3C7', '#BC80BD')) +
  scale_shape_manual(values = c(3, 4, 2, 6)) +
  theme_bw(base_line_size = 0, base_size = 24) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Set') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 6), ncol = 1))

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Type, shape = Set), size = 0.1, alpha = 0.9) +
  scale_color_manual(values = colorRampPalette(brewer.pal(11, 'Spectral'))(18)) +
  scale_shape_manual(values = c(3, 4, 2, 6)) +
  theme_bw(base_line_size = 0, base_size = 24) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Set') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 6), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))


###################################################################################
### Data Integration ###
###################################################################################
# Integration
var0 = read.table(file = paste0(PATH, "/GSE131181_var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
dat.all = list(dat3, dat4, dat1, dat2)
InPlot(dat.all, var.gene = var0, nPC = 30, ncore = 4)
dat.all = scMultiIntegrate(dat.all, eigens = 20, var.gene = var0, ncore = 2, adjust = FALSE, add.Id = c('E13wt', 'E13pd', 'E10wt', 'E10pd'))
dat.all = scUMAP(dat.all, npc = 25, use = 'PLS')
dat.all = scTSNE(dat.all, npc = 25, use = 'PLS')

ann0 = read.table(file = paste0(PATH, "/GSE131181_Ann.tsv"), sep = "\t", header = T, stringsAsFactors = F)
dat.all@coldata$Type = ann0$Type

DimPlot(dat.all, slot = "cell.umap", colFactor = "Set", size = 0.2)
DimPlot(dat.all, slot = "cell.umap", colFactor = "Type", size = 0.2)


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE132044/GSE132044.R:
--------------------------------------------------------------------------------
library(RISC)
library(Matrix)
library(irlba)
library(umap)
library(ggplot2)
library(RColorBrewer)


PATH = "/Path to the data/GSE132044"


###################################################################################
### Prepare Data ###
###################################################################################
## All Cells
mat0 = readMM(file = paste0(PATH, "/Raw_Data/counts.umi.mtx"))
cell0 = read.table(file = paste0(PATH, "/Raw_Data/cells.umi.txt"), sep = "\t", header = F, stringsAsFactors = F)
colnames(cell0) = "Mix"
cell0$Sort = sapply(cell0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][1]})
cell0$Platform = sapply(cell0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][2]})
gene0 = read.table(file = paste0(PATH, "/Raw_Data/genes.umi.txt"), sep = "\t", header = F, stringsAsFactors = F)
colnames(gene0) = "Mix"
gene0$Ensembl = sapply(gene0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][1]})
gene0$Symbol = sapply(gene0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][2]})
keep = !duplicated(gene0$Symbol)
gene0 = gene0[keep,]
mat0 = mat0[keep,]
colnames(mat0) = cell0$Mix
rownames(mat0) = gene0$Symbol


## PBMC ("10xChromiumv2A", "Drop", "inDrops")
# 10X Chromium V2
keep = cell0$Sort == "pbmc2" & cell0$Platform == "10xChromiumv2"
cell1 = cell0[keep,]
cell1$Barcode = sapply(cell1$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][3]})
rownames(cell1) = cell1$Barcode
mat1 = mat0[,keep]
keep = rowSums(as.matrix(mat1) > 0) > 0
mat1 = mat1[keep,]
colnames(mat1) = rownames(cell1)
gene0 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
dat1 = readscdata(mat1, cell1, gene0, is.filter = F)

# Drop-seq
# keep = cell0$Sort == "pbmc2" & cell0$Platform == "Drop" & cell0$Mix %in% Ann0$Mix
keep = cell0$Sort == "pbmc2" & cell0$Platform == "Drop"
cell1 = cell0[keep,]
cell1$Barcode = sapply(cell1$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][4]})
rownames(cell1) = cell1$Barcode
mat1 = mat0[,keep]
keep = rowSums(as.matrix(mat1) > 0) > 0
mat1 = mat1[keep,]
colnames(mat1) = rownames(cell1)
gene0 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
dat2 = readscdata(mat1, cell1, gene0, is.filter = F)

# inDrops-seq
# keep = cell0$Sort == "pbmc2" & cell0$Platform == "inDrops" & cell0$Mix %in% Ann0$Mix
keep = cell0$Sort == "pbmc2" & cell0$Platform == "inDrops"
cell1 = cell0[keep,]
cell1$Barcode = sapply(cell1$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][4]})
rownames(cell1) = cell1$Barcode
mat1 = mat0[,keep]
keep = rowSums(as.matrix(mat1) > 0) > 0
mat1 = mat1[keep,]
colnames(mat1) = rownames(cell1)
gene0 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
dat3 = readscdata(mat1, cell1, gene0, is.filter = F)


###################################################################################
### RISC Objects ###
###################################################################################
process0 <- function(obj0){
  obj0 = scFilter(obj0, min.UMI = 100, max.UMI = 10000, min.gene = 100, min.cell = 3)
  obj0 = scNormalize(obj0, ncore = 4)
  obj0 = scDisperse(obj0)
  print(length(obj0@vargene))
  return(obj0)
}

dat1 = process0(dat1)
dat2 = process0(dat2)
dat3 = process0(dat3)


###################################################################################
### Uncorrected Data ###
###################################################################################
set.seed(123)
var0 = Reduce(intersect, list(dat1@rowdata$Symbol, dat2@rowdata$Symbol, dat3@rowdata$Symbol))
logmat1 = as.matrix(dat1@assay$logcount)[var0,]
logmat2 = as.matrix(dat2@assay$logcount)[var0,]
logmat3 = as.matrix(dat3@assay$logcount)[var0,]
logmat0 = cbind(logmat3, logmat1, logmat2)
pca0 = irlba(logmat0, nv = 50)$v
umap0 = umap(pca0[,1:15])$layout

ann0 = read.table(file = paste0(PATH, "/GSE132044_Anno.tsv"), sep = "\t", header = T, stringsAsFactors = F)
m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Plat = ann0$Set, Type = ann0$Type)
m0$Platform = factor(m0$Plat, levels = c("inDrops", "10X-Genomics", "Drop-seq"), labels = c("inDrops", "10xGenomics", "Drop-seq"))
m0$Type = factor(m0$Type, levels = c("B", "Platelet", "pDC", "CD16+ Mono", "CD4 Memory T", "secretory B", "CD4 Naive T", "CD8 T", "DC", "NK", "CD14+ Mono"))
color0 = brewer.pal(11, "Spectral")
color1 = c("#D9D9D9", "#FDB462", "#BC80BD")

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Platform, shape = Platform), size = 1, alpha = 1) +
  scale_color_manual(values = color1) +
  scale_shape_manual(values = c(1, 2, 6)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Platforms', shape = 'Set') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Type, shape = Platform), size = 1, alpha = 1) +
  scale_color_manual(values = color0) +
  scale_shape_manual(values = c(1, 2, 6)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Set') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))


###################################################################################
### RISC Integration ###
###################################################################################
## PBMC2
set.seed(123)
data0 = list(dat3, dat1, dat2)
var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
InPlot(data0, var.gene = var0, ncore = 4, minPC = 11, nPC = 15)
data0 = scMultiIntegrate(data0, eigens = 14, var.gene = var0, add.Id = c("inDrops", "10X-Genomics", "Drop-seq"), ncore = 4)
data0 = scUMAP(data0, npc = 15, use = "PLS")
ann0 = read.table(file = paste0(PATH, "/GSE132044_Anno.tsv"), sep = "\t", header = T, stringsAsFactors = F)
data0@coldata$Type = ann0$Type

DimPlot(data0, slot = "cell.umap", colFactor = "Set", size = 0.5)
DimPlot(data0, slot = "cell.umap", colFactor = "Type", size = 0.5, label = T)
DimPlot(data0, slot = "cell.umap", genes = "MS4A1", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "GNLY", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "RCAN3", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "CD8A", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "GZMK", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "CD14", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "FCGR3A", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "FCER1A", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "PPBP", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "SERPINF1", size = 0.2)
DimPlot(data0, slot = "cell.umap", genes = "JCHAIN", size = 0.2)


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE84133/GSE84133.R:
--------------------------------------------------------------------------------
library(RISC)
library(irlba)
library(umap)
library(ggplot2)
library(RColorBrewer)
library(biomaRt)


###################################################################################
### Prepare Data ###
###################################################################################
PATH = "/Path to the data/GSE84133"
Anno = read.table(file = paste0(PATH, "/GSE84133_Filter.tsv"), sep = "\t", header = T, stringsAsFactors = F)
ann0 = read.table(file = paste0(PATH, "/GSE84133_Anno.tsv"), sep = "\t", header = T, stringsAsFactors = F)

dat1 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230757_human1_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
mat1 = as.matrix(t(dat1[,-c(1:3)]))
cel1 = dat1[,c(1, 3)]
colnames(cel1) = c('Barcode', 'Origi')
cel1$Set = 'Donor1'
colnames(mat1) = rownames(cel1) = cel1$Barcode
cel1 = cel1[cel1$Barcode %in% Anno$Barcode[Anno$Set == "Donor1"],]
mat1 = mat1[,colnames(mat1) %in% rownames(cel1)]

dat2 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230758_human2_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
mat2 = as.matrix(t(dat2[,-c(1:3)]))
cel2 = dat2[,c(1, 3)]
colnames(cel2) = c('Barcode', 'Origi')
cel2$Set = 'Donor2'
colnames(mat2) = rownames(cel2) = cel2$Barcode
cel2 = cel2[cel2$Barcode %in% Anno$Barcode[Anno$Set == "Donor2"],]
mat2 = mat2[,colnames(mat2) %in% rownames(cel2)]

dat3 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230759_human3_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
mat3 = as.matrix(t(dat3[,-c(1:3)]))
cel3 = dat3[,c(1, 3)]
colnames(cel3) = c('Barcode', 'Origi')
cel3$Set = 'Donor3'
colnames(mat3) = rownames(cel3) = cel3$Barcode
cel3 = cel3[cel3$Barcode %in% Anno$Barcode[Anno$Set == "Donor3"],]
mat3 = mat3[,colnames(mat3) %in% rownames(cel3)]

dat4 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230760_human4_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
mat4 = as.matrix(t(dat4[,-c(1:3)]))
cel4 = dat4[,c(1, 3)]
colnames(cel4) = c('Barcode', 'Origi')
cel4$Set = 'Donor4'
colnames(mat4) = rownames(cel4) = cel4$Barcode
cel4 = cel4[cel4$Barcode %in% Anno$Barcode[Anno$Set == "Donor4"],]
mat4 = mat4[,colnames(mat4) %in% rownames(cel4)]

matrix1 = cbind(mat1, mat2, mat3, mat4)
# '>= 0' is always TRUE, so every gene is kept and all donors share one gene list
keep = rowSums(matrix1 > 0) >= 0
matrix1 = matrix1[keep,]
gene0 = rownames(matrix1)
gen1 = gen2 = gen3 = gen4 = data.frame(Symbol = gene0, row.names = gene0, stringsAsFactors = F)

dat1 = readscdata(mat1[gene0,], cel1, gen1, is.filter = F)
dat2 = readscdata(mat2[gene0,], cel2, gen2, is.filter = F)
dat3 = readscdata(mat3[gene0,], cel3, gen3, is.filter = F)
dat4 = readscdata(mat4[gene0,], cel4, gen4, is.filter = F)

mouse = useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
human = useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
symbol = getLDS(attributes = 'hgnc_symbol', filters = 'hgnc_symbol', values = gene0, mart = human, attributesL = 'mgi_symbol', martL = mouse, uniqueRows = T)
colnames(symbol) = c('HGNC', 'MGI')
# Keep only one-to-one human/mouse ortholog pairs
symbol0 = symbol[!duplicated(symbol$MGI),]
symbol0 = symbol0[!duplicated(symbol0$HGNC),]

dat5 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230761_mouse1_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
mat5 = as.matrix(t(dat5[,-c(1:3)]))
cel5 = dat5[,c(1, 3)]
colnames(cel5) = c('Barcode', 'Origi')
cel5$Set = 'Mouse5'
colnames(mat5) = rownames(cel5) = cel5$Barcode
gene0 = intersect(symbol0$MGI, rownames(mat5))
symbol1 = symbol0[symbol0$MGI %in% gene0,]
mat5 = mat5[symbol1$MGI,]
rownames(mat5) = symbol1$HGNC
cel5 = cel5[cel5$Barcode %in% Anno$Barcode[Anno$Set == "Mouse1"],]
mat5 = mat5[,colnames(mat5) %in% rownames(cel5)]

dat6 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230762_mouse2_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
mat6 = as.matrix(t(dat6[,-c(1:3)]))
cel6 = dat6[,c(1, 3)]
colnames(cel6) = c('Barcode', 'Origi')
cel6$Set = 'Mouse6'
colnames(mat6) = rownames(cel6) = cel6$Barcode
gene0 = intersect(symbol0$MGI, rownames(mat6))
symbol2 = symbol0[symbol0$MGI %in% gene0,]
mat6 = mat6[symbol2$MGI,]
rownames(mat6) = symbol2$HGNC
cel6 = cel6[cel6$Barcode %in% Anno$Barcode[Anno$Set == "Mouse2"],]
mat6 = mat6[,colnames(mat6) %in% rownames(cel6)]

matrix2 = cbind(mat5, mat6)
# '>= 0' is always TRUE, so every gene is kept
keep = rowSums(matrix2 > 0) >= 0
matrix2 = matrix2[keep,]
gene0 = rownames(matrix2)
gen5 = gen6 = data.frame(Symbol = gene0, row.names = gene0, stringsAsFactors = F)

dat5 = readscdata(mat5[gene0,], cel5, gen5, is.filter = F)
dat6 = readscdata(mat6[gene0,], cel6, gen6, is.filter = F)


###################################################################################
### RISC Objects ###
###################################################################################
process0 <- function(obj0){
  obj0 = scFilter(obj0, min.UMI = 1000, max.UMI = 20000, min.gene = 200, min.cell = 0, is.filter = F)
  obj0 = scNormalize(obj0)
  obj0 = scDisperse(obj0)
  print(length(obj0@vargene))
  return(obj0)
}

dat1 = process0(dat1)
dat2 = process0(dat2)
dat3 = process0(dat3)
dat4 = process0(dat4)
dat5 = process0(dat5)
dat6 = process0(dat6)


###################################################################################
### Uncorrected Data ###
###################################################################################
Color0 = data.frame(
  Type = c("activated_stellate", "acinar", "alpha", "beta", "delta", "ductal", "epsilon", "immune_other", "b_cell", "mast", "schwann", "t_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate"),
  Color = colorRampPalette(brewer.pal(11, 'Spectral'))(16), stringsAsFactors = F
)

coldata0 = rbind.data.frame(dat1@coldata, dat2@coldata, dat3@coldata, dat4@coldata)
var0 = Reduce(intersect, list(rownames(dat1@assay$logcount), rownames(dat2@assay$logcount), rownames(dat3@assay$logcount), rownames(dat4@assay$logcount)))
logmat1 = cbind(as.matrix(dat1@assay$logcount)[var0,], as.matrix(dat2@assay$logcount)[var0,], as.matrix(dat3@assay$logcount)[var0,], as.matrix(dat4@assay$logcount)[var0,])
pca0 = irlba(logmat1, nv = 10, center = T)$v
umap0 = umap(pca0)$layout
m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = coldata0$Set, Type = coldata0$Origi)
m0$Type = factor(m0$Type, levels = c("activated_stellate", "acinar", "alpha", "beta", "delta", "ductal", "epsilon", "mast", "schwann", "t_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate"))
color1 = Color0$Color[Color0$Type %in% as.character(m0$Type)]

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Set, shape = Set), size = 1, alpha = 1) +
  scale_color_manual(values = brewer.pal(4, 'Dark2')) +
  scale_shape_manual(values = c(1, 2, 6, 8)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Source') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Type, shape = Set), size = 1, alpha = 1) +
  scale_color_manual(values = color1) +
  scale_shape_manual(values = c(1, 2, 6, 8)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Source') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))

coldata0 = rbind.data.frame(dat5@coldata, dat6@coldata)
var0 = Reduce(intersect, list(rownames(dat5@assay$logcount), rownames(dat6@assay$logcount)))
logmat2 = cbind(as.matrix(dat5@assay$logcount)[var0,], as.matrix(dat6@assay$logcount)[var0,])
pca0 = irlba(logmat2, nv = 8, center = T)$v
umap0 = umap(pca0)$layout
m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = coldata0$Set, Type = coldata0$Origi)
m0$Set = factor(m0$Set, levels = c("Mouse5", "Mouse6"), labels = c("Mouse1", "Mouse2"))
m0$Type = factor(
  m0$Type,
  levels = c("activated_stellate", "beta", "ductal", "immune_other", "B_cell", "schwann", "T_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate"),
  labels = c("activated_stellate", "beta", "ductal", "immune_other", "b_cell", "schwann", "t_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate")
)
color2 = Color0$Color[Color0$Type %in% m0$Type]

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Set, shape = Set), size = 1, alpha = 1) +
  scale_color_manual(values = brewer.pal(4, 'Dark2')) +
  scale_shape_manual(values = c(4, 5)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Source') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Type, shape = Set), size = 1, alpha = 1) +
  scale_color_manual(values = color2) +
  scale_shape_manual(values = c(4, 5)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Source') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))


###################################################################################
### RISC Integration ###
###################################################################################
## Human
data1 = list(dat1, dat2, dat3, dat4)
var0 = Reduce(intersect, list(dat1@vargene, dat2@vargene, dat3@vargene, dat4@vargene))
InPlot(data1, var.gene = var0)
data1 = scMultiIntegrate(data1, eigens = 14, var.gene = var0, ncore = 4, add.Id = c("Donor1", "Donor2", "Donor3", "Donor4"))
data1 = scUMAP(data1, npc = 14, use = 'PLS')
data1@coldata$Type = as.factor(ann0$Type[ann0$Set %in% c("Donor1", "Donor2", "Donor3", "Donor4")])

DimPlot(data1, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(4, "Dark2"))
DimPlot(data1, slot = "cell.umap", colFactor = "Type")

## Mouse
data2 = list(dat6, dat5)
var0 = Reduce(intersect, list(dat5@vargene, dat6@vargene))
InPlot(data2, var.gene = var0)
data2 = scMultiIntegrate(data2, eigens = 14, var.gene = var0, ncore = 4, add.Id = c("Mouse2", "Mouse1"))
data2 = scUMAP(data2, npc = 14, use = 'PLS')
data2@coldata$Type = as.factor(ann0$Type[ann0$Set %in% c("Mouse2", "Mouse1")])

DimPlot(data2, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(4, "Dark2"))
DimPlot(data2, slot = "cell.umap", colFactor = "Type")

## Both species
data3 = list(dat1, dat2, dat3, dat4, dat5, dat6)
var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
InPlot(data3, var.gene = var0)
data3 = scMultiIntegrate(data3, eigens = 15, var.gene = var0, ncore = 4, add.Id = c("Donor1", "Donor2", "Donor3", "Donor4", "Mouse1", "Mouse2"))
data3 = scUMAP(data3, npc = 15, use = 'PLS')
data3@coldata$Type = as.factor(ann0$Type)

DimPlot(data3, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(6, "Dark2"))
DimPlot(data3, slot = "cell.umap", colFactor = "Type")

DimPlot(data3, slot = "cell.umap", genes = "GCG", size = 0.5)
DimPlot(data3, slot = "cell.umap", genes = "INS", size = 0.5, exp.range = c(6, 10))
DimPlot(data3, slot = "cell.umap", genes = "RESP18", size = 0.5)
DimPlot(data3, slot = "cell.umap", genes = "KRT19", size = 0.5)
DimPlot(data3, slot = "cell.umap", genes = "COL6A3", size = 0.5)
DimPlot(data3, slot = "cell.umap", genes = "SST", size = 0.5, exp.range = c(3, 11))
DimPlot(data3, slot = "cell.umap", genes = "RGS5", size = 0.5)
DimPlot(data3, slot = "cell.umap", genes = "PRSS1", size = 0.5)
DimPlot(data3, slot = "cell.umap", genes = "FLT1", size = 0.5)


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE85241_GSE81076_GSE83139_EMTAB_5061/GSE85241_GSE81076_GSE83139_EMTAB_5061.R:
--------------------------------------------------------------------------------
library(RISC)
library(irlba)
library(umap)
library(ggplot2)
library(RColorBrewer)


###################################################################################
### Prepare Data ###
###################################################################################
PATH = "/Path to the data/GSE85241_GSE81076_GSE83139_EMTAB_5061"

## GSE85241 CEL-Seq2
gse85241.df <- read.table(file = paste0(PATH, "/Raw_Counts/GSE85241_cellsystems_dataset_4donors_updated.csv"), sep = '\t', header = TRUE, row.names = 1, stringsAsFactors = FALSE)
dim(gse85241.df)
donor.names <- sub("^(D[0-9]+).*", "\\1", colnames(gse85241.df))
table(donor.names)
plate.id <- sub("^D[0-9]+\\.([0-9]+)_.*", "\\1", colnames(gse85241.df))
table(plate.id)
gene.symb <- gsub("__chr.*$", "", rownames(gse85241.df))
is.spike <- grepl("^ERCC-", gene.symb)
table(is.spike)
gse85241.df$Symbol = gene.symb
gse85241.df = gse85241.df[!duplicated(gse85241.df$Symbol),]
rownames(gse85241.df) = gse85241.df$Symbol
gse85241.df$Symbol = NULL  # drop the helper Symbol column

count0 = gse85241.df
cell = data.frame(Donor = donor.names, Plate = plate.id)
rownames(cell) = colnames(count0)
gene = data.frame(Symbol = rownames(count0))
rownames(gene) = rownames(count0)

data1 = readscdata(count = count0, cell = cell, gene = gene)
rm(cell, count0, donor.names, gene, gene.symb, gse85241.df, is.spike, plate.id)


## GSE81076 CEL-Seq
gse81076.df <- read.table(file = paste0(PATH, "/Raw_Counts/GSE81076_D2_3_7_10_17.txt"), sep = '\t', header = TRUE, stringsAsFactors = FALSE, row.names = 1)
donor.names <- sub("^(D[0-9]+).*", "\\1", colnames(gse81076.df))
table(donor.names)
plate.id <- sub("^D[0-9]+(.*)_.*", "\\1", colnames(gse81076.df))
table(plate.id)
gene.symb <- gsub("__chr.*$", "", rownames(gse81076.df))
is.spike <- grepl("^ERCC-", gene.symb)
table(is.spike)
gse81076.df$Symbol = gene.symb
gse81076.df = gse81076.df[!duplicated(gse81076.df$Symbol),]
rownames(gse81076.df) = gse81076.df$Symbol
gse81076.df$Symbol = NULL  # drop the helper Symbol column

count0 = gse81076.df
cell = data.frame(Donor = donor.names, Plate = plate.id)
rownames(cell) = colnames(count0)
gene = data.frame(Symbol = rownames(count0))
rownames(gene) = rownames(count0)

data2 = readscdata(count = count0, cell = cell, gene = gene)
rm(cell, count0, donor.names, gene, gene.symb, gse81076.df, is.spike, plate.id)


## E-MTAB-5061 Smart-Seq2
header <- read.table(file = paste0(PATH, "/Raw_Counts/pancreas_refseq_rpkms_counts_3514sc.txt"), nrow = 1, sep = "\t", comment.char = "", stringsAsFactors = FALSE)
ncells <- ncol(header) - 1L
col.types <- vector("list", ncells*2 + 2)
col.types[1:2] <- "character"
col.types[2+ncells + seq_len(ncells)] <- "integer"
e5601.df <- read.table(file = paste0(PATH, "/Raw_Counts/pancreas_refseq_rpkms_counts_3514sc.txt"), sep = "\t", colClasses = col.types)
gene.data <- e5601.df[,1:2]
e5601.df <- e5601.df[,-(1:2)]
colnames(e5601.df) <- as.character(header[1,-1])
dim(e5601.df)
is.spike <- grepl("^ERCC-", gene.data[,2])
table(is.spike)
e5601.df = data.frame(e5601.df, Symbol = gene.data$V1)
e5601.df = e5601.df[!duplicated(e5601.df$Symbol),]
rownames(e5601.df) = e5601.df$Symbol
e5601.df$Symbol = NULL  # drop the helper Symbol column
col.types[[1]] = sapply(colnames(e5601.df), function(x){strsplit(x, '_', fixed = T)[[1]][1]})
col.types[[2]] = sapply(colnames(e5601.df), function(x){strsplit(x, '_', fixed = T)[[1]][2]})

count0 = e5601.df
cell = data.frame(Donor = col.types[[1]], Plate = col.types[[2]])
rownames(cell) = colnames(count0)
gene = data.frame(Symbol = rownames(count0))
rownames(gene) = rownames(count0)

data3 = readscdata(count = count0, cell = cell, gene = gene)
rm(cell, count0, gene, e5601.df, is.spike, col.types, gene.data, header, ncells)


## GSE83139 SMARTer
gse83139.df <- read.table(file = paste0(PATH, "/Raw_Counts/GSE83139_tbx-v-f-norm-ntv-cpms.csv"), sep = "\t", header = T, stringsAsFactors = F)
gse83139.df <- gse83139.df[!duplicated(gse83139.df$gene),]
gse83139.df <- gse83139.df[!is.na(gse83139.df$gene),]
count0 = as.matrix(gse83139.df[,-c(1:7)])
rownames(count0) = as.character(gse83139.df$gene)
cell = data.frame(Sample = colnames(count0), row.names = colnames(count0), stringsAsFactors = F)
gene = data.frame(ID = rownames(count0), row.names = rownames(count0), stringsAsFactors = F)

data4 = readscdata(count = count0, cell = cell, gene = gene)
rm(count0, gse83139.df, cell, gene)


###################################################################################
### RISC Data ###
###################################################################################
process1 <- function(obj0){
  obj0 <- scFilter(obj0, min.UMI = 3000, max.UMI = 100000, min.gene = 1000, min.cell = 5)
  obj0 <- scNormalize(obj0)
  obj0 <- scDisperse(obj0)
  print(length(obj0@vargene))
  obj0 <- scPCA(obj0)
  return(obj0)
}

process2 <- function(obj0){
  obj0 <- scFilter(obj0)
  obj0 <- scNormalize(obj0)
  obj0 <- scDisperse(obj0)
  print(length(obj0@vargene))
  obj0 <- scPCA(obj0)
  return(obj0)
}

data1 = process1(data1)
data2 = process1(data2)
data3 = process1(data3)
data4 = process2(data4)


###################################################################################
### Uncorrected Data ###
###################################################################################
var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
logmat0 = cbind(
  as.matrix(data1@assay$logcount)[var0,],
  as.matrix(data2@assay$logcount)[var0,],
  as.matrix(data3@assay$logcount)[var0,],
  as.matrix(data4@assay$logcount)[var0,]
)
pca0 = irlba(logmat0, nv = 12, center = T)$v
umap0 = umap(pca0)$layout
ann0 = read.table(file = paste0(PATH, "/Pancreas_Annotation.tsv"), sep = "\t", header = T, stringsAsFactors = F)
m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = ann0$Set, Type = ann0$Type)
m0$Set = factor(m0$Set, levels = c("E-MTAB-5061", "GSE81076", "GSE85241", "GSE83139"))

ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Set, shape = Set), size = 2, alpha = 1) +
  scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) +
  scale_shape_manual(values = c(1, 2, 6, 8)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Set') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
ggplot(m0, aes(UMAP1, UMAP2)) +
  geom_point(aes(color = Type, shape = Set), size = 2, alpha = 1) +
  scale_color_manual(values = brewer.pal(8, 'Spectral')) +
  scale_shape_manual(values = c(1, 2, 6, 8)) +
  theme_bw(base_line_size = 0) +
  labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Set') +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))


###################################################################################
### Data Integration ###
###################################################################################
# Integrating
data0 = list(data1, data2, data3, data4)
var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
InPlot(data0, var.gene = var0, minPC = 8, nPC = 12)
data0 = scMultiIntegrate(data0, eigens = 8, var.gene = var0, ncore = 4, adjust = F, add.Id = c("GSE85241", "GSE81076", "E-MTAB-5061", "GSE83139"))
data0 = scUMAP(data0, npc = 11, use = 'PLS')
ann0 = read.table(file = paste0(PATH, "/Pancreas_Annotation.tsv"), sep = "\t", header = T, stringsAsFactors = F)
data0@coldata$Platform = as.character(ann0$Platform)
data0@coldata$Type = as.character(ann0$Type)

DimPlot(data0, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(4, "Dark2"))
DimPlot(data0, slot = "cell.umap", colFactor = "Platform", Colors = brewer.pal(4, "Dark2"))
DimPlot(data0, slot = "cell.umap", colFactor = "Type", label = T)
DimPlot(data0, slot = "cell.umap", genes = "SST", size = 1)
DimPlot(data0, slot = "cell.umap", genes = "PPY", size = 1)
DimPlot(data0, slot = "cell.umap", genes = "GCG", size = 1)
DimPlot(data0, slot = "cell.umap", genes = "INS", size = 1)
DimPlot(data0, slot = "cell.umap", genes = "COL1A1", size = 1)
DimPlot(data0, slot = "cell.umap", genes = "PRSS1", size = 1)
DimPlot(data0, slot = "cell.umap", genes = "KRT19", size = 1)
DimPlot(data0, slot = "cell.umap", genes = "FLT1", size = 1)


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE96583/GSE96583.R:
--------------------------------------------------------------------------------
library(RISC)
library(irlba)
library(umap)
library(Matrix)
library(matrixStats)
library(ggplot2)
library(RColorBrewer)


PATH = "/Path to the data/GSE96583_PBMC"

###################################################################################
### Preparing RISC Objects ###
###################################################################################
# Input Data
data1 = read10Xgenomics(paste0(PATH, "/PBMC_Raw_Counts/Control"))
data2 = read10Xgenomics(paste0(PATH, "/PBMC_Raw_Counts/Stimulate"))

# Processing
process0 <- function(obj0){
  obj0 = scFilter(obj0, min.UMI = 200, min.gene = 200, max.UMI = 8000, min.cell = 3)
  obj0 = scNormalize(obj0)
  obj0 = scDisperse(obj0)
  print(length(obj0@vargene))
  return(obj0)
}

data1 = process0(data1)
data2 = process0(data2)


###################################################################################
### Uncorrected Data ###
###################################################################################
## Plot
Interplot <- function(m0, color0, shape0, group0 = "Type"){
  if(group0 == "Type"){
    g0 = ggplot(m0, aes(UMAP1, UMAP2)) +
      geom_point(aes(color = CellType, shape = Group), size = 0.5, alpha = 1) +
      scale_color_manual(values = color0) +
      scale_shape_manual(values = shape0) +
      theme_bw(base_line_size = 0) +
      labs(color = "Cell-Type", shape = "Batch") +
      theme(plot.title = element_text(hjust = 0.5, size = 16)) +
      guides(color = guide_legend(override.aes = list(size = 6)), shape = guide_legend(override.aes = list(size = 5)))
  } else {
    g0 = ggplot(m0, aes(UMAP1, UMAP2)) +
      geom_point(aes(color = Group, shape = Group), size = 0.5, alpha = 1) +
      scale_color_manual(values = c("#FB8072", "#80B1D3")) +
      scale_shape_manual(values = shape0) +
      theme_bw(base_line_size = 0) +
      labs(color = "Batch", shape = "Batch") +
      theme(plot.title = element_text(hjust = 0.5, size = 16)) +
      guides(color = guide_legend(override.aes = list(size = 6)), shape = guide_legend(override.aes = list(size = 5)))
  }
  print(g0)
}

# Raw PCs
logmat1 = as.matrix(data1@assay$logcount)
logmat2 = as.matrix(data2@assay$logcount)
gene0 = intersect(rownames(logmat1), rownames(logmat2))
logmat0 = cbind(logmat1[gene0,], logmat2[gene0,])
pca0 = irlba(logmat0, nv = 50, center = T)$v
umap0 = umap(pca0[,1:16])$layout

# Plot
m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Group = c(rep("IFN-", ncol(logmat1)), rep("IFN+", ncol(logmat2))))
shape0 = c(1, 8)
# color0 is only used when group0 == "Type", so no cell-type palette is needed here
Interplot(m0, color0 = NULL, shape0 = shape0, group0 = "Set")


###################################################################################
### Data Integration with Full Cells ###
###################################################################################
## Integration
var0 = read.table(paste0(PATH, "/Var0_Ori.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
data3 = list(data1, data2)
InPlot(data3, var.gene = var0, nPC = 20)
data3 = scMultiIntegrate(data3, eigens = 15, var.gene = var0, add.Id = c('IFN-', 'IFN+'), adjust = FALSE, ncore = 4)
data3 = scUMAP(data3, npc = 18, use = 'PLS')

DimPlot(data3, slot = "cell.umap", colFactor = "Set", Colors = c("firebrick2", "skyblue"), size = 0.2)


###################################################################################
### Subset the Cells ###
###################################################################################
Anno0 = read.table(paste0(PATH, "/Anno0.tsv"), sep = "\t", header = T, stringsAsFactors = F)
cell1 = Anno0$scBarcode[Anno0$Set == "IFN-"]
cell2 = Anno0$scBarcode[Anno0$Set == "IFN+"]

data1 = SubSet(data1, cells = cell1)
data2 = SubSet(data2, cells = cell2)
data1 = process0(data1)
data2 = process0(data2)

FilterPlot(data1)
FilterPlot(data2)


###################################################################################
### Data Integration with Selected Cells ###
###################################################################################
## Integration
var0 = read.table(paste0(PATH, "/Var0.tsv"), sep = "\t", header = F, stringsAsFactors = F)
var0 = var0$V1
data3 = list(data1, data2)
InPlot(data3, var.gene = var0, nPC = 20)
data3 = scMultiIntegrate(data3, eigens = 16, var.gene = var0, add.Id = c('IFN-', 'IFN+'), adjust = TRUE, ncore = 4)
data3 = scUMAP(data3, npc = 16, use = 'PLS')
DimPlot(data3, slot = "cell.umap", colFactor = "Set", Colors = c("firebrick2", "skyblue"), size = 0.2)

# Cell Type
Anno0 = read.table(paste0(PATH, "/Anno0.tsv"), sep = "\t", header = T, stringsAsFactors = F)
data3@coldata$Type0 = Anno0$Type0
DimPlot(data3, slot = "cell.umap", colFactor = "Type0", size = 0.2, label = T)

# Marker Genes
DimPlot(data3, slot = "cell.umap", genes = "CD8A", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "FYN", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "CACYBP", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "IL1B", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "HES4", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "GNLY", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "AES", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "CD79A", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "CD79B", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "SMPDL3A", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "MIR155HG", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "CKB", size = 0.2)
DimPlot(data3, slot = "cell.umap", genes = "PPBP", size = 0.2)


--------------------------------------------------------------------------------
/Release.txt:
--------------------------------------------------------------------------------
RISC v1.6.0 release (03.22.2022)

Changes from the last release (v1.5)
This version mainly fixes problems caused by updates to dependent packages.

Changes from the last release (v1)
(1) Replace the "RcppEigen" dependency with "RcppArmadillo"; core functions now fully support sparse matrices.
(2) Replace the "pbmcapply" dependency with "pbapply".
(3) Optimize the "scMultiIntegrate" function to reduce memory consumption; the new RISC release can integrate datasets with more than 1.5 million cells and 10,000 genes.
(4) Store "logcount" in the integrated RISC object ("object@assay$logcount") as a list of corrected logcount matrices, one per input dataset, instead of a single large matrix. To output the full integrated matrix, use mat0 = do.call(cbind, object@assay$logcount)
(5) Rename function "readscdata" -> "readsc"
(6) Rename function "read10Xgenomics" -> "read10X_mtx"
(7) Parameter names in some functions have changed.

Added new functions
(1) Add Wilcoxon rank-sum and signed-rank tests to the "scMarker" and "AllMarker" functions.
(2) Add a pseudo-cell option (binning cells into meta-cells) for marker-gene detection in the "scMarker", "AllMarker" and "scDEG" functions.
(3) Add a "slot" parameter to the "DimPlot" function so that external dimension-reduction results stored in a RISC object can be plotted, e.g. store PHATE results (phate0) with obj0@DimReduction$cell.phate = phate0, then run DimPlot(obj0, slot = "cell.phate", colFactor = 'Group', size = 2, label = TRUE)
(4) Add the "read10X_h5" function for 10X Genomics h5 files.

Removed old functions
(1) Delete the "readHTSeqdata" function.
--------------------------------------------------------------------------------
/Seurat_to_RISC_RISC_v1.0.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/Seurat_to_RISC_RISC_v1.0.pdf
--------------------------------------------------------------------------------
/build/vignette.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/build/vignette.rds
--------------------------------------------------------------------------------
/data/datalist:
--------------------------------------------------------------------------------
raw.mat

--------------------------------------------------------------------------------
/data/raw.mat.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/data/raw.mat.rda
--------------------------------------------------------------------------------
/inst/doc/RISC_Vignette.R:
--------------------------------------------------------------------------------
## ----include = FALSE----------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

## ----setup--------------------------------------------------------------------
library(RISC)
library(RColorBrewer)

data("raw.mat")
mat0 = raw.mat[[1]]
coldata0 = raw.mat[[2]]

coldata1 = coldata0[coldata0$Batch0 == 'Batch1',]
coldata2 = coldata0[coldata0$Batch0 == 'Batch2',]
coldata3 = coldata0[coldata0$Batch0 == 'Batch3',]
coldata4 = coldata0[coldata0$Batch0 == 'Batch4',]
coldata5 = coldata0[coldata0$Batch0 == 'Batch5',]
coldata6 = coldata0[coldata0$Batch0 == 'Batch6',]
mat1 = mat0[,rownames(coldata1)]
mat2 = mat0[,rownames(coldata2)]
mat3 = mat0[,rownames(coldata3)]
mat4 = mat0[,rownames(coldata4)]
mat5 = mat0[,rownames(coldata5)]
mat6 = mat0[,rownames(coldata6)]

## -----------------------------------------------------------------------------
sce1 = readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
sce2 = readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)
sce3 = readsc(count = mat3, cell = coldata3, gene = data.frame(Symbol = rownames(mat3), row.names = rownames(mat3)), is.filter = FALSE)
sce4 = readsc(count = mat4, cell = coldata4, gene = data.frame(Symbol = rownames(mat4), row.names = rownames(mat4)), is.filter = FALSE)
sce5 = readsc(count = mat5, cell = coldata5, gene = data.frame(Symbol = rownames(mat5), row.names = rownames(mat5)), is.filter = FALSE)
sce6 = readsc(count = mat6, cell = coldata6, gene = data.frame(Symbol = rownames(mat6), row.names = rownames(mat6)), is.filter = FALSE)

## -----------------------------------------------------------------------------
process0 <- function(obj0){

  # filter cells
  obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3, is.filter = FALSE)

  # normalize data
  obj0 = scNormalize(obj0, method = 'robust')

  # find highly variable genes
  obj0 = scDisperse(obj0)

  # here, replace the highly variable genes with all genes for integration
  obj0@vargene = rownames(obj0@rowdata)

  return(obj0)

}

sce1 = process0(sce1)
sce2 = process0(sce2)
sce3 = process0(sce3)
sce4 = process0(sce4)
sce5 = process0(sce5)
sce6 = process0(sce6)

## -----------------------------------------------------------------------------
set.seed(1)
var.genes = rownames(sce1@assay$count)
pcr0 = list(sce1, sce2, sce3, sce4, sce5, sce6)
pcr0 = scMultiIntegrate(pcr0, eigens = 9, var.gene = var.genes, align = 'OLS', npc = 15)
# pcr0 = scLargeIntegrate(pcr0, var.gene = var.genes, align = 'Predict', npc = 8)
pcr0 = scUMAP(pcr0, npc = 9, use = 'PLS', dist = 0.001, neighbors = 15)

## -----------------------------------------------------------------------------
pcr0@coldata$Group = factor(pcr0@coldata$Group0, levels = c('Group1', 'Group2', 'Group2*', 'Group3', 'Group3*'), labels = c("a", "b", "b'", "c", "c'"))
pcr0@coldata$Set0 = factor(pcr0@coldata$Set, levels = c('Set1', 'Set2', 'Set3', 'Set4', 'Set5', 'Set6'), labels = c('Set1 rep.1', 'Set2 rep.1', 'Set3 rep.1', 'Set1 rep.2', 'Set2 rep.2', 'Set3 rep.2'))
pcr0 = scCluster(pcr0, slot = "cell.umap", k = 4, method = "density", dc = 0.3)

## ----fig.show="hold", out.width="48%", fig.dim=c(7, 5)------------------------
DimPlot(pcr0, colFactor = 'Set0', size = 2)
DimPlot(pcr0, colFactor = 'Group', size = 2, Colors = brewer.pal(5, "Set1"))
DimPlot(pcr0, colFactor = 'Cluster', size = 2, Colors = brewer.pal(6, "Dark2"))

## -----------------------------------------------------------------------------
sessionInfo()

--------------------------------------------------------------------------------
/inst/doc/RISC_Vignette.Rmd:
--------------------------------------------------------------------------------
---
title: "RISC: robust integration of single-cell RNA-seq datasets using a single reference space"
author:
  - name: Yang Liu, Tao Wang, Deyou Zheng
    affiliation: Albert Einstein College of Medicine, Bronx, NY, United States
date: "2021"
output: rmarkdown::html_vignette
package: RISC
vignette: >
  %\VignetteIndexEntry{RISC: robust integration of single-cell RNA-seq datasets using a single reference space}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

Single-cell RNA sequencing (scRNA-seq) has become an essential genomic technology for resolving gene expression heterogeneity among single cells and is widely used across many biological domains. It remains challenging to integrate scRNA-seq datasets with inter-sample heterogeneity, for example, different cell subpopulation compositions among datasets or gene expression differences within the same cell groups across datasets.

We find that the distortion in the integration of heterogeneous data is due to the lack of a consistent global reference space onto which all cells from individual datasets are projected. To overcome this issue, we developed a novel approach, named reference principal component integration (RPCI), and implemented it in a new scRNA-seq analysis package called “RISC” for robust integration of scRNA-seq data.
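
To make the idea of a shared reference space more concrete, the sketch below uses base R only and is not the RPCI algorithm implemented in RISC. It derives principal axes from one reference dataset with `prcomp()` and projects the cells of a second dataset onto those same axes, so that both datasets end up in a single coordinate system. The simulated Poisson counts, the choice of the first dataset as the reference, and the helper `project_to_reference()` are illustrative assumptions.

```r
# Minimal sketch (not RISC's RPCI): project two datasets into one reference PC space.
# Assumption: toy gene x cell count matrices; dataset 1 serves as the reference.
set.seed(1)
n_genes <- 50
ref <- matrix(rpois(n_genes * 40, lambda = 5), nrow = n_genes)  # 50 genes x 40 cells
qry <- matrix(rpois(n_genes * 30, lambda = 5), nrow = n_genes)  # 50 genes x 30 cells
rownames(ref) <- rownames(qry) <- paste0("Gene", seq_len(n_genes))

# Log-transform; cells are the observations, so transpose to cells x genes
ref_x <- t(log1p(ref))
qry_x <- t(log1p(qry))

# The principal axes are learned from the reference cells only
pca_ref <- prcomp(ref_x, center = TRUE, scale. = FALSE)

# Hypothetical helper: center cells with the *reference* gene means,
# then rotate them with the reference loadings
project_to_reference <- function(x, pca, npc) {
  centered <- sweep(x, 2, pca$center, "-")
  centered %*% pca$rotation[, seq_len(npc), drop = FALSE]
}

npc <- 5
emb_ref <- pca_ref$x[, seq_len(npc)]
emb_qry <- project_to_reference(qry_x, pca_ref, npc)

# Both embeddings now share the same 5-dimensional reference space
dim(emb_ref)  # 40 x 5
dim(emb_qry)  # 30 x 5
```

Applying `project_to_reference()` to the reference cells reproduces their own PC scores, which is what makes the space a common frame for all datasets; RPCI additionally calibrates cell similarity against the reference, which this sketch does not attempt.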

```{r setup}
library(RISC)
library(RColorBrewer)

data("raw.mat")
mat0 = raw.mat[[1]]
coldata0 = raw.mat[[2]]

coldata1 = coldata0[coldata0$Batch0 == 'Batch1',]
coldata2 = coldata0[coldata0$Batch0 == 'Batch2',]
coldata3 = coldata0[coldata0$Batch0 == 'Batch3',]
coldata4 = coldata0[coldata0$Batch0 == 'Batch4',]
coldata5 = coldata0[coldata0$Batch0 == 'Batch5',]
coldata6 = coldata0[coldata0$Batch0 == 'Batch6',]
mat1 = mat0[,rownames(coldata1)]
mat2 = mat0[,rownames(coldata2)]
mat3 = mat0[,rownames(coldata3)]
mat4 = mat0[,rownames(coldata4)]
mat5 = mat0[,rownames(coldata5)]
mat6 = mat0[,rownames(coldata6)]
```

# Heterogeneous Simulated data

To show the advantages of RPCI and evaluate its performance quantitatively, we start with simulated data in which we control the degree of gene expression difference between datasets in two of the three cell groups: there are more DE genes between c and c' than between b and b', and no DE genes in group a. As expected, the cell groups with more DE genes display a gradual reduction in cell similarity.

# Create RISC objects

We generate RISC objects with the "readsc" function from the gene-cell matrix (mat), the data frame of cells (coldata), and the data frame of genes. RISC objects can also be generated with the "read10X_mtx" or "read10X_h5" function for 10X Genomics data.
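
Before calling `readsc()`, it can help to confirm that the three inputs line up as they do in the vignette code: the column names of the count matrix should match the row names of the cell table, and the row names of the count matrix should match those of the gene table. The base-R sketch below builds toy inputs with that layout and checks the alignment, including after subsetting one batch. The toy data and the helper `check_inputs()` are hypothetical, and we do not claim these are the exact checks `readsc()` performs internally.

```r
# Toy inputs shaped like the ones passed to readsc() in this vignette:
# count (genes x cells), cell (one row per cell), gene (one row per gene).
set.seed(2)
count <- matrix(rpois(6 * 4, lambda = 3), nrow = 6,
                dimnames = list(paste0("Gene", 1:6), paste0("Cell", 1:4)))
cell <- data.frame(Batch0 = c("Batch1", "Batch1", "Batch2", "Batch2"),
                   row.names = colnames(count))
gene <- data.frame(Symbol = rownames(count), row.names = rownames(count))

# Hypothetical helper: stop if row/column names disagree across the three inputs
check_inputs <- function(count, cell, gene) {
  stopifnot(identical(colnames(count), rownames(cell)),
            identical(rownames(count), rownames(gene)))
  invisible(TRUE)
}
check_inputs(count, cell, gene)

# Subsetting one batch keeps the matrix and the cell table aligned,
# mirroring how mat1 and coldata1 are built above (drop = FALSE keeps a data.frame)
cell1 <- cell[cell$Batch0 == "Batch1", , drop = FALSE]
count1 <- count[, rownames(cell1), drop = FALSE]
check_inputs(count1, cell1, gene)

# With RISC installed, the aligned inputs would then be passed on as:
# sce1 <- readsc(count = count1, cell = cell1, gene = gene, is.filter = FALSE)
```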

```{r}
sce1 = readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
sce2 = readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)
sce3 = readsc(count = mat3, cell = coldata3, gene = data.frame(Symbol = rownames(mat3), row.names = rownames(mat3)), is.filter = FALSE)
sce4 = readsc(count = mat4, cell = coldata4, gene = data.frame(Symbol = rownames(mat4), row.names = rownames(mat4)), is.filter = FALSE)
sce5 = readsc(count = mat5, cell = coldata5, gene = data.frame(Symbol = rownames(mat5), row.names = rownames(mat5)), is.filter = FALSE)
sce6 = readsc(count = mat6, cell = coldata6, gene = data.frame(Symbol = rownames(mat6), row.names = rownames(mat6)), is.filter = FALSE)
```

# Processing RISC data

After creating the RISC objects, we process the data. The standard steps are:

1. Filter cells: remove cells with extremely low or high UMI counts and discard cells with an extremely low number of expressed genes. Because these are simulated data, no cells are filtered out here.
2. Normalize gene expression to remove the effect of sequencing depth.
3. Scale gene expression: the scaled counts contain only gene signal information for individual cells and have column-wise zero empirical mean, satisfying the requirement for PCA and SVD.
4. Find highly variable genes: identify them with a Quasi-Poisson model and use them for gene-cell matrix decomposition and data integration.

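
Steps 2 to 4 can be illustrated on a toy matrix with base R. This is a sketch under stated assumptions, not the implementation behind `scNormalize(method = 'robust')` or `scDisperse()`: it applies a minimal gene filter mirroring `min.cell = 3`, rescales each cell to 10,000 total counts before `log1p()`, scales each gene after transposing to a cells-by-genes matrix (so every column has zero mean), and keeps genes whose coefficient of variation exceeds 0.5, echoing the cutoff described in the `scDisperse` documentation but omitting its Quasi-Poisson/loess regression step. The scale factor, the negative-binomial toy counts, and the variable names are assumptions.

```r
# Sketch of filter -> normalize -> scale -> variable genes on toy counts.
# Assumptions: scale factor 1e4, log1p, and a plain CV > 0.5 cutoff; not RISC's code.
set.seed(3)
counts <- matrix(rnbinom(100 * 20, mu = 2, size = 1), nrow = 100,
                 dimnames = list(paste0("Gene", 1:100), paste0("Cell", 1:20)))

# (1) minimal gene filter: keep genes detected in at least 3 cells
counts <- counts[rowSums(counts > 0) >= 3, , drop = FALSE]

# (2) depth normalization: rescale each cell to 10,000 total counts, then log1p
depth <- colSums(counts)
norm_linear <- sweep(counts, 2, depth, "/") * 1e4
norm_log <- log1p(norm_linear)

# (3) scaling: orient as cells x genes so every gene column has mean 0 and sd 1
scaled <- scale(t(norm_log), center = TRUE, scale = TRUE)

# (4) variable genes: coefficient of variation C = sd / mean on the linear scale
cv <- apply(norm_linear, 1, sd) / rowMeans(norm_linear)
hvg <- names(cv)[cv > 0.5]
length(hvg)
```

In the vignette itself these steps are carried out by `scFilter()`, `scNormalize()` and `scDisperse()` inside `process0()`.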

```{r}
process0 <- function(obj0){

  # filter cells
  obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3, is.filter = FALSE)

  # normalize data
  obj0 = scNormalize(obj0, method = 'robust')

  # find highly variable genes
  obj0 = scDisperse(obj0)

  # here, replace the highly variable genes with all genes for integration
  obj0@vargene = rownames(obj0@rowdata)

  return(obj0)

}

sce1 = process0(sce1)
sce2 = process0(sce2)
sce3 = process0(sce3)
sce4 = process0(sce4)
sce5 = process0(sce5)
sce6 = process0(sce6)
```

# RPCI integration

The core principle of RPCI differs from that of existing methods: RPCI introduces an effective formula to calibrate cell similarity against a global reference and directly projects all cells into a reference RPCI space.

```{r}
set.seed(1)
var.genes = rownames(sce1@assay$count)
pcr0 = list(sce1, sce2, sce3, sce4, sce5, sce6)
pcr0 = scMultiIntegrate(pcr0, eigens = 9, var.gene = var.genes, align = 'OLS', npc = 15)
# pcr0 = scLargeIntegrate(pcr0, var.gene = var.genes, align = 'Predict', npc = 8)
pcr0 = scUMAP(pcr0, npc = 9, use = 'PLS', dist = 0.001, neighbors = 15)
```

```{r}
pcr0@coldata$Group = factor(pcr0@coldata$Group0, levels = c('Group1', 'Group2', 'Group2*', 'Group3', 'Group3*'), labels = c("a", "b", "b'", "c", "c'"))
pcr0@coldata$Set0 = factor(pcr0@coldata$Set, levels = c('Set1', 'Set2', 'Set3', 'Set4', 'Set5', 'Set6'), labels = c('Set1 rep.1', 'Set2 rep.1', 'Set3 rep.1', 'Set1 rep.2', 'Set2 rep.2', 'Set3 rep.2'))
pcr0 = scCluster(pcr0, slot = "cell.umap", k = 4, method = "density", dc = 0.3)
```

# UMAP plot

By design, the dissimilarity between c and c' is larger than that between b and b', and this cell-cell relationship is directly reflected in the UMAP plots of the RPCI-integrated data. Re-clustering the integrated data also separates c from c'.

The simulated data contain three sets; each set includes three cell groups and two replicates, with batch effects both among sets and between replicates.

```{r, fig.show="hold", out.width="48%", fig.dim=c(7, 5)}
DimPlot(pcr0, colFactor = 'Set0', size = 2)
DimPlot(pcr0, colFactor = 'Group', size = 2, Colors = brewer.pal(5, "Set1"))
DimPlot(pcr0, colFactor = 'Cluster', size = 2, Colors = brewer.pal(6, "Dark2"))
```

More details and tutorials on real scRNA-seq data are available at https://github.com/yangRISC/RISC

# Session Information
```{r}
sessionInfo()
```

--------------------------------------------------------------------------------
/man/AddFactor.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Utilities.R
\name{AddFactor}
\alias{AddFactor}
\title{Utilities Add Factors}
\usage{
AddFactor(object, colData = NULL, rowData = NULL, value = NULL)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{colData}{The name(s) to be added to the coldata of the RISC object,
given as characters; they become col.names of coldata.}

\item{rowData}{The name(s) to be added to the rowdata of the RISC object,
given as characters; they become col.names of rowdata.}

\item{value}{The factor vector or data.frame to be added to coldata or
rowdata; its names/row.names should match the row.names of the coldata or
rowdata of the RISC object.
The input: vector or data.frame.}
}
\description{
The "AddFactor" function adds one or more factors to the coldata of the full
dataset. The row.names/names of the factor data.frame/vector should match the
row.names of the coldata of the RISC object.
}
--------------------------------------------------------------------------------
/man/All-Cluster-Marker.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ClusterMarker.R
\name{AllMarker}
\alias{AllMarker}
\title{Find All Cluster Markers}
\usage{
AllMarker(
  object,
  positive = TRUE,
  frac = 0.25,
  log2FC = 0.5,
  Padj = 0.05,
  latent.factor = NULL,
  min.cells = 25L,
  method = "QP",
  ncore = 1
)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{positive}{Whether to output only cluster markers with positive log2FC.}

\item{frac}{A fraction cutoff: marker genes must be expressed in at least
this fraction of all the cells.}

\item{log2FC}{The log2 fold-change cutoff for differentially expressed marker
genes.}

\item{Padj}{The cutoff of the adjusted P-value.}

\item{latent.factor}{The latent factor from coldata, given as numeric values
or factors; only one latent factor can be supplied.}

\item{min.cells}{The minimum number of cells for a cluster to be considered valid.}

\item{method}{The method used to identify cluster markers, with three options: 'NB'
for the Negative Binomial model, 'QP' for the QuasiPoisson model, and 'Wilcox' for
the Wilcoxon Rank Sum and Signed Rank model.}

\item{ncore}{The number of cores for parallel computation.}
}
\description{
This function relies on the "scMarker" function, uses the same criteria, and
generates markers for all the clusters. With the default parameters, if a
cluster contains fewer cells than \code{min.cells} (25 by default), RISC skips
this cluster and does not detect its cluster markers.
}
\details{
Because log2 cannot handle counts with value 0, we use log1p to calculate average
values of counts and log2 to format fold-change.
}
--------------------------------------------------------------------------------
/man/Cluster-Marker.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ClusterMarker.R
\name{scMarker}
\alias{scMarker}
\title{Find Cluster Markers}
\usage{
scMarker(
  object,
  cluster = 1,
  positive = TRUE,
  frac = 0.25,
  log2FC = 0.5,
  Padj = 0.05,
  latent.factor = NULL,
  method = "QP",
  min.cells = 10,
  ncore = 1
)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{cluster}{The cluster for which marker genes are detected.}

\item{positive}{Whether to output only cluster markers with positive log2FC.}

\item{frac}{A fraction cutoff: marker genes must be expressed in at least
this fraction of all the cells.}

\item{log2FC}{The log2 fold-change cutoff for differentially expressed marker
genes.}

\item{Padj}{The cutoff of the adjusted P-value.}

\item{latent.factor}{The latent factor from coldata, given as numeric values
or factors; only one latent factor can be supplied.}

\item{method}{The method used to identify cluster markers, with three options: 'NB'
for the Negative Binomial model, 'QP' for the QuasiPoisson model, and 'Wilcox' for
the Wilcoxon Rank Sum and Signed Rank model.}

\item{min.cells}{The minimum number of cells a cluster must contain for its marker
genes to be calculated.}

\item{ncore}{The number of cores for parallel computation.}
}
\description{
This is a basic function in RISC that identifies cluster markers by comparing
cells in the selected cluster with cells in all the other clusters. Therefore,
a gene may be labeled as a marker for more than one cluster. Three models are
available in RISC: the Negative Binomial model, the QuasiPoisson model, and the
Wilcoxon Rank Sum and Signed Rank model.
}
\details{
Because log2 cannot handle counts with value 0, we use log1p to calculate average
values of counts and log2 to format fold-change.
}
\examples{
# RISC object
obj0 = raw.mat[[3]]
obj0 = scPCA(obj0, npc = 10)
obj0 = scUMAP(obj0, npc = 3)
obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density')
DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2)
marker1 = scMarker(obj0, cluster = 1, method = 'QP', min.cells = 3)
}
\references{
Paternoster et al., Criminology (1997)

Berk et al., Journal of Quantitative Criminology (2008)

Liu et al., Nature Biotech. (2021)
}
--------------------------------------------------------------------------------
/man/Cluster.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Cluster.R
\name{scCluster}
\alias{scCluster}
\title{Clustering cells}
\usage{
scCluster(
  object,
  slot = "cell.pca",
  neighbor = 10,
  algorithm = "kd_tree",
  method = "louvain",
  npc = 20,
  k = 10,
  res = 0.5,
  dc = NULL,
  redo = TRUE,
  random.seed = 123
)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{slot}{The dimension_reduction slot for cell clustering.
The default is "cell.pca" under the "DimReduction" item of the RISC object, but
users can add a new dimension_reduction result under DimReduction and use it.}

\item{neighbor}{The number of neighbor cells for the "igraph" method.}

\item{algorithm}{The algorithm for the kNN search; the default is "kd_tree", and
the options are "kd_tree", "cover_tree", "CR", and "brute".}

\item{method}{The method for clustering cells: "density" or "louvain".
The "density" method works on the slot "cell.umap" or another low dimensional
space, while "louvain" works on "cell.pca" (individual data) or "cell.pls"
(integrated data).}

\item{npc}{The number of PCs or PLS components used for cell clustering.}

\item{k}{The number of clusters to search for; used by the "density" method.}

\item{res}{The resolution of the cluster search; used by the "louvain" method.}

\item{dc}{The distance used to generate random center points, which affects the
clusters. If unsure, leave it unset; most users should keep the default value.
Used by the "density" method.}

\item{redo}{Whether to re-cluster the cells.}

\item{random.seed}{The random seed, the default is 123.}
}
\value{
RISC single cell dataset, the cluster slot.
}
\description{
RISC provides two methods for clustering cells, both widely used for single-cell
data. The first method, "louvain", is based on cell eigenvectors; the other,
"density", calculates cell clusters in a low dimensional space.
}
\examples{
# RISC object
obj0 = raw.mat[[3]]
obj0 = scPCA(obj0, npc = 10)
obj0 = scUMAP(obj0, npc = 3)
obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density')
DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2)
}
\references{
Blondel et al., JSTAT (2008)

Rodriguez et al., Science (2014)
}
--------------------------------------------------------------------------------
/man/DimPlot.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Graph.R
\name{tSNEPlot}
\alias{tSNEPlot}
\alias{DimPlot}
\title{Dimension Reduction Plots}
\usage{
DimPlot(
  object,
  slot = "cell.umap",
  colFactor = NULL,
  genes = NULL,
  legend = TRUE,
  Colors = NULL,
  size = 0.5,
  Alpha = 0.8,
  plot.ncol = NULL,
  exp.range = NULL,
  exp.col = "firebrick2",
  label = FALSE,
  adjust.label = 0.25,
  label.font = 5
)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{slot}{The dimension_reduction slot for drawing the plots. The default is
"cell.umap" under the "DimReduction" item of the RISC object for UMAP plots, but
users can add a new dimension_reduction result under DimReduction and use it.}

\item{colFactor}{The factor (column name) in the coldata used to make a
dimension_reduction plot; only one column name can be supplied at a time.}

\item{genes}{The gene expression values (gene symbols) used to make dimension
reduction plots; multiple genes can be supplied at once.}

\item{legend}{Whether to show a legend on the dimension_reduction plot.}

\item{Colors}{The users can use their own colors (color vector).
By default, the function assigns colors automatically.}

\item{size}{The size of the dots in the dimension_reduction plot; the default
size is 0.5.}

\item{Alpha}{The transparency of individual points; the default is 0.8.}

\item{plot.ncol}{If more than one gene is supplied, the number of columns used
to arrange the multiple dimension_reduction plots.}

\item{exp.range}{The gene expression range to plot, e.g. "c(0, 1.5)" for
expression levels between 0 and 1.5.}

\item{exp.col}{The gradient color for gene expression.}

\item{label}{Whether to label the clusters or cell populations in the plot.}

\item{adjust.label}{The adjustment of the label position.}

\item{label.font}{The font size for the label.}
}
\description{
Dimension reduction plots are widespread in scRNA-seq data analysis. Here,
the "DimPlot" function can plot not only factor labels of individual cells
but also the gene expression values of each cell.
}
\examples{
# RISC object
obj0 = raw.mat[[3]]
obj0 = scPCA(obj0, npc = 10)
obj0 = scUMAP(obj0, npc = 3)
DimPlot(obj0, slot = "cell.umap", colFactor = 'Group', size = 2, label = TRUE)
DimPlot(obj0, genes = c('Gene718', 'Gene325', 'Gene604'), size = 2)
}
\references{
Wickham, H. (2016)

Auguie, B. (2015)
}
--------------------------------------------------------------------------------
/man/Disperse.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Preprocess.R
\name{scDisperse}
\alias{scDisperse}
\title{Processing Data.}
\usage{
scDisperse(
  object,
  method = "loess",
  min.UMI = NULL,
  mean.cut = NULL,
  QP_bin = 100,
  lspan = 0.05,
  top.var = NULL,
  pval = 0.5
)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{method}{The method used to model dispersion; "QP" and "loess" are supported,
and the default method is "loess".}

\item{min.UMI}{A cutoff on the minimum UMIs of each gene; genes with expression
below min.UMI are excluded from the highly variable genes. The default value
is 100 when NULL is given.}

\item{mean.cut}{A cutoff on the average value of each gene; genes whose averages
fall outside the mean.cut range are excluded from the highly variable genes. The
input is a range vector, such as c(0.1, 5).}

\item{QP_bin}{The number of bins used to fit dispersion in the Quasi-Poisson
regression.}

\item{lspan}{The span parameter used to fit dispersion, controlling the degree
of smoothing in the loess model.}

\item{top.var}{The maximum number of highly variable genes; the default NULL
keeps all the highly variable genes.}

\item{pval}{The P-value cutoff for highly variable genes; the default is 0.5.}
}
\value{
RISC single cell dataset, the metadata slot.
}
\description{
After data scaling, RISC identifies highly variable genes based on a
Quasi-Poisson model, where the coefficient of variation C is calculated for each
gene (C > 0.5 as the cutoff for highly variable genes). Then, to control for
the relationship between S (standard deviation) and mean (average value),
Quasi-Poisson regression is used to further filter out genes whose
over-dispersion C is caused by a small mean. Lastly, RISC estimates the ratio r
between the observed C and the predicted C of each gene, with a threshold of
r > 1.
}
\examples{
# RISC object
obj0 = raw.mat[[5]]
obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3)
obj0 = scNormalize(obj0)
obj0 = scDisperse(obj0)
}
\references{
Liu et al., Nature Biotech. (2021)
}
--------------------------------------------------------------------------------
/man/Filter.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Preprocess.R
\name{scFilter}
\alias{scFilter}
\title{The Pre-processing Data.}
\usage{
scFilter(
  object,
  min.UMI = NULL,
  max.UMI = NULL,
  min.gene = NULL,
  min.cell = NULL,
  mitochon = 1,
  gene.ratio = 0.05,
  is.filter = TRUE
)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{min.UMI}{The minimum UMI count for valid cells. By default it is derived
from the UMI distribution, keeping cells above the 2.5th percentile and
discarding cells with too few UMIs. This parameter can be adjusted manually.}

\item{max.UMI}{The maximum UMI count for valid cells. By default it is derived
from the UMI distribution, keeping cells below the 97.5th percentile and
discarding cells with too many UMIs. This parameter can be adjusted manually.}

\item{min.gene}{The minimum number of expressed genes for valid cells. By default
it is derived from the distribution of expressed genes, above the 0.5th
percentile. This parameter can be adjusted manually.}

\item{min.cell}{The minimum number of cells in which a gene must be expressed to
be valid. By default it is derived from the cell distribution, above the 0.5th
percentile. This parameter can be adjusted manually.}

\item{mitochon}{The cutoff of the mitochondrial UMI proportion for valid cells.}

\item{gene.ratio}{The cutoff of the proportions of genes in UMIs.}

\item{is.filter}{Whether to filter the data.}
}
\value{
RISC single cell dataset, the coldata and rowdata slots.
}
\description{
After the data are imported, RISC filters the datasets preliminarily using three
criteria: first, discard cells with too low/high raw counts/UMIs based on
distribution analysis; second, remove cells with too few expressed genes; lastly,
filter out genes expressed in only a few cells.
}
\examples{
# RISC object
obj0 = raw.mat[[5]]
obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3)
}
\references{
Liu et al., Nature Biotech. (2021)
}
--------------------------------------------------------------------------------
/man/FilterPlot.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Graph.R
\name{FilterPlot}
\alias{FilterPlot}
\title{Processing Plot}
\usage{
FilterPlot(object, colFactor = NULL)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{colFactor}{The factor (column name) in the coldata used to make a processing
plot; only one column name can be supplied at a time.}
}
\description{
The "FilterPlot" function plots the UMIs and expressed genes of individual
cells. These plots are usually used to assess the data before and after
pre-processing, so users can visually select optimal parameters for filtering
the data.
}
\examples{
# RISC object
obj0 = raw.mat[[3]]
FilterPlot(obj0, colFactor = 'Group')
}
\references{
Wickham, H. (2016)

Auguie, B. (2015)

Liu et al., Nature Biotech. (2021)
}
--------------------------------------------------------------------------------
/man/Heatmap.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Graph.R
\name{Heat}
\alias{Heat}
\title{Heatmap}
\usage{
Heat(
  object,
  colFactor = NULL,
  genes = NULL,
  cells = NULL,
  gene.lab = FALSE,
  gene.cluster = 0,
  sample_bin = FALSE,
  ann_col = NULL,
  lim = NULL,
  smooth = "smooth",
  span = 0.75,
  degree = 1,
  palette = NULL,
  num = 50,
  con.bin = TRUE,
  cell.lab.size = 10,
  gene.lab.size = 5,
  value_only = FALSE,
  ...
27 | ) 28 | } 29 | \arguments{ 30 | \item{object}{RISC object: a framework dataset.} 31 | 32 | \item{colFactor}{Use the factor (column name) in the coldata to make the heatmap; the 33 | column must be a factor.} 34 | 35 | \item{genes}{The genes (gene symbols) whose expression values are shown in the heatmap; 36 | they must be supplied by the user.} 37 | 38 | \item{cells}{A subset of cells (from the coldata) to include in the heatmap; 39 | the default is NULL, which includes all the cells.} 40 | 41 | \item{gene.lab}{Whether to label gene names in the heatmap.} 42 | 43 | \item{gene.cluster}{The number of gene clusters in the heatmap. 44 | The default is 0, which disables gene clustering.} 45 | 46 | \item{sample_bin}{Whether to aggregate cells by sample; the default is FALSE.} 47 | 48 | \item{ann_col}{The annotation colors for colFactor, supplied as a list.} 49 | 50 | \item{lim}{The gene expression range shown in the heatmap.} 51 | 52 | \item{smooth}{The smoothing method used to adjust the heatmap; the default is "smooth" and the other 53 | choice is "loess".} 54 | 55 | \item{span}{The loess span.} 56 | 57 | \item{degree}{The loess degree.} 58 | 59 | \item{palette}{The color palette used for the heatmap. The default is 60 | brewer.pal(n = 7, name = "RdYlBu").} 61 | 62 | \item{num}{The number of cells in each bin span.} 63 | 64 | \item{con.bin}{Whether to use a consistent bin span.} 65 | 66 | \item{cell.lab.size}{The font size for column labels.} 67 | 68 | \item{gene.lab.size}{The font size for row labels.} 69 | 70 | \item{value_only}{Whether to return only the values.} 71 | } 72 | \description{ 73 | The "Heat" function makes a heatmap showing the gene expression patterns of single cells. 74 | By default, cells are grouped into clusters; the rows of the heatmap represent 75 | genes, while the columns represent the clusters of all the cells.
76 | } 77 | \examples{ 78 | # RISC object 79 | obj0 = raw.mat[[3]] 80 | gene0 = c('Gene718', 'Gene120', 'Gene313', 'Gene157', 'Gene30', 81 | 'Gene325', 'Gene415', 'Gene566', 'Gene990', 'Gene13', 82 | 'Gene604', 'Gene934', 'Gene231', 'Gene782', 'Gene10') 83 | Heat(obj0, colFactor = 'Group', genes = gene0, gene.lab = TRUE, gene.cluster = 3, 84 | sample_bin = TRUE, lim = 2, gene.lab.size = 8) 85 | } 86 | \references{ 87 | Kolde, R. (2015) 88 | } 89 | -------------------------------------------------------------------------------- /man/Import-10X-h5.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/AllClasses.R 3 | \name{read10X_h5} 4 | \alias{read10X_h5} 5 | \title{Import data from 10X Genomics output (h5).} 6 | \usage{ 7 | read10X_h5(file.path, is.filter = TRUE) 8 | } 9 | \arguments{ 10 | \item{file.path}{The path of the filtered 10X Genomics output (h5 file).} 11 | 12 | \item{is.filter}{Whether to remove genes that are not expressed.} 13 | } 14 | \value{ 15 | RISC single cell dataset, including count, coldata, and rowdata. 16 | } 17 | \description{ 18 | Import data directly from 10X Genomics output, usually the filtered gene 19 | matrix stored in an h5 file. 20 | The user only needs to supply the path of the h5 file to "file.path". If the data are not the original 21 | 10X Genomics output, the user can use the 'readsc' function.
22 | } 23 | -------------------------------------------------------------------------------- /man/Import-10X-mtx.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/AllClasses.R 3 | \name{read10X_mtx} 4 | \alias{read10X_mtx} 5 | \title{Import data from 10X Genomics output (tsv-mtx).} 6 | \usage{ 7 | read10X_mtx(data.path, sep = "\\t", is.filter = TRUE) 8 | } 9 | \arguments{ 10 | \item{data.path}{Directory containing the filtered 10X Genomics output, 11 | including three files: matrix.mtx, barcode.tsv (without colnames) and gene.tsv 12 | (without colnames).} 13 | 14 | \item{sep}{The field separator of the tsv files; the default is a tab, and it can be changed by the user.} 15 | 16 | \item{is.filter}{Whether to remove genes that are not expressed.} 17 | } 18 | \value{ 19 | RISC single cell dataset, including count, coldata, and rowdata. 20 | } 21 | \description{ 22 | Import data directly from 10X Genomics output, usually the filtered gene 23 | matrix directory, which contains three files: matrix.mtx, barcode.tsv and gene.tsv. 24 | The user only needs to supply the directory to "data.path". If the data are not the original 25 | 10X Genomics output, the user must make sure that barcode.tsv and gene.tsv have 26 | no column names, that barcode.tsv contains at least one column of cell 27 | barcodes, and that gene.tsv has two columns: gene Ensembl ID and Symbol.
28 | } 29 | -------------------------------------------------------------------------------- /man/Import-Matrix.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/AllClasses.R 3 | \name{readsc} 4 | \alias{readsc} 5 | \title{Import data directly from a matrix, cells and genes.} 6 | \usage{ 7 | readsc(count, cell, gene, is.filter = TRUE) 8 | } 9 | \arguments{ 10 | \item{count}{Matrix with raw counts/UMIs.} 11 | 12 | \item{cell}{Data.frame with cell barcodes, whose row names are equal to the 13 | column names of the matrix.} 14 | 15 | \item{gene}{Data.frame with gene symbols, whose row names are the same as the 16 | row names of the matrix.} 17 | 18 | \item{is.filter}{Whether to remove genes that are not expressed.} 19 | } 20 | \value{ 21 | RISC single cell dataset, including count, coldata, and rowdata. 22 | } 23 | \description{ 24 | Import a dataset directly from a matrix, cells and genes. The user needs three 25 | inputs: a matrix of gene expression values, i.e. raw counts/UMIs (rows for 26 | genes and columns for cells), a cell data.frame (whose row names are equal to the 27 | column names of the matrix), and a gene data.frame (whose row names are the same as the 28 | row names of the matrix). If the row names of the matrix are Ensembl IDs, the 29 | user needs to convert them to gene symbols manually.
30 | } 31 | \examples{ 32 | mat0 = as.matrix(raw.mat[[1]]) 33 | coldata0 = as.data.frame(raw.mat[[2]]) 34 | coldata.obj = coldata0[coldata0$Batch0 == 'Batch3',] 35 | matrix.obj = mat0[,rownames(coldata.obj)] 36 | obj0 = readsc(count = matrix.obj, cell = coldata.obj, 37 | gene = data.frame(Symbol = rownames(matrix.obj), 38 | row.names = rownames(matrix.obj)), is.filter = FALSE) 39 | } 40 | -------------------------------------------------------------------------------- /man/InPlot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Integrating.R 3 | \name{InPlot} 4 | \alias{InPlot} 5 | \title{Integration Plot} 6 | \usage{ 7 | InPlot( 8 | object = NULL, 9 | var.gene = NULL, 10 | Colors = NULL, 11 | nPC = 20, 12 | neighbor = 30, 13 | res = 1, 14 | method = "louvain", 15 | algorithm = "kd_tree", 16 | ncore = 1, 17 | minPC = 11, 18 | Std.cut = 0.95, 19 | bin = 5 20 | ) 21 | } 22 | \arguments{ 23 | \item{object}{A list of RISC objects.} 24 | 25 | \item{var.gene}{The highly variable genes.} 26 | 27 | \item{Colors}{The colors used to label the different datasets.} 28 | 29 | \item{nPC}{The number of PCs to calculate.} 30 | 31 | \item{neighbor}{The number of nearest neighbors.} 32 | 33 | \item{res}{The clustering resolution; only used by the "louvain" method.} 34 | 35 | \item{method}{The method of cell clustering for individual datasets.} 36 | 37 | \item{algorithm}{The algorithm for kNN; the default is "kd_tree", and all options are: 38 | "kd_tree", "cover_tree", "CR", "brute".} 39 | 40 | \item{ncore}{The number of cores used for testing.} 41 | 42 | \item{minPC}{The minimum number of PCs used for detecting cell clusters.} 43 | 44 | \item{Std.cut}{The cutoff for the standard deviation of the PCs.} 45 | 46 | \item{bin}{The number of bins for calculating cell clustering.} 47 | } 48 | \description{ 49 | The "InPlot" function plots how the PCs explain the variance 50 | for data
integration. This plot helps the user select the optimal reference 51 | and the number of PCs for data integration. 52 | } 53 | \references{ 54 | Liu et al., Nature Biotech. (2021) 55 | } 56 | -------------------------------------------------------------------------------- /man/Integration-Algorithm-SIMPLS.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Integrating.R 3 | \name{SIMPLS} 4 | \alias{SIMPLS} 5 | \title{Integration Algorithm SIMPLS} 6 | \usage{ 7 | SIMPLS(X, Y, npcs = 10, seed = 123) 8 | } 9 | \arguments{ 10 | \item{X}{The reference matrix, with rows for genes and columns for cells.} 11 | 12 | \item{Y}{The target matrix, with rows for genes and columns for cells.} 13 | 14 | \item{npcs}{The number of PCs used for data integration.} 15 | 16 | \item{seed}{The random seed for reproducible results.} 17 | } 18 | \description{ 19 | Partial least squares (PLS) with the SIMPLS algorithm is an extension of the 20 | multiple linear regression model and is a bilinear factor model. 21 | Instead of projecting the reference and target matrices onto a hyperplane 22 | of maximum variance, PLS uses a linear regression to project the 23 | reference and target matrices into a new space. The SIMPLS algorithm provides 24 | the regularization procedure for PLS. The matrices need to be centered before 25 | SIMPLS integration. 26 | } 27 | \references{ 28 | De-Jong et al.
(1993) 29 | } 30 | -------------------------------------------------------------------------------- /man/MSC.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Integrating.R 3 | \name{MSC} 4 | \alias{MSC} 5 | \title{Alignment of gene expression values} 6 | \usage{ 7 | MSC(X, Y) 8 | } 9 | \arguments{ 10 | \item{X}{The reference matrix, with rows for genes and columns for cells.} 11 | 12 | \item{Y}{The target matrix, with rows for genes and columns for cells.} 13 | } 14 | \description{ 15 | This function is not used in data integration itself but for adjusting its results; 16 | users usually do not need it. 17 | } 18 | \references{ 19 | Mevik et al., JSS (2007) 20 | } 21 | -------------------------------------------------------------------------------- /man/Multiple-Integrating.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Integrating.R 3 | \name{scMultiIntegrate} 4 | \alias{scMultiIntegrate} 5 | \title{Integrating Multiple Datasets} 6 | \usage{ 7 | scMultiIntegrate( 8 | objects, 9 | eigens = 10, 10 | add.Id = NULL, 11 | var.gene = NULL, 12 | align = "OLS", 13 | npc = 50, 14 | adjust = TRUE, 15 | ncore = 1, 16 | seed = 123 17 | ) 18 | } 19 | \arguments{ 20 | \item{objects}{The list of multiple RISC objects: 21 | list{object1, object2, object3, ...}. The first dataset is the reference used to generate 22 | gene eigenvectors.} 23 | 24 | \item{eigens}{The number of eigenvectors used for data integration.} 25 | 26 | \item{add.Id}{A character vector of IDs used to label the different datasets.} 27 | 28 | \item{var.gene}{Define the variable genes manually.
Supply a character vector of gene 29 | names to use as variable genes.} 30 | 31 | \item{align}{The method for aligning gene expression values: "Optimal" for 32 | alignment by experience, "Predict" for alignment by RPCI prediction, and "OLS" 33 | for alignment by ordinary least squares regression.} 34 | 35 | \item{npc}{The number of PCs returned by the "scMultiIntegrate" function; 36 | they are usually used for subsequent analyses, such as cell embedding and 37 | cell clustering.} 38 | 39 | \item{adjust}{Whether to adjust the number of eigenvectors.} 40 | 41 | \item{ncore}{The number of cores used for data integration.} 42 | 43 | \item{seed}{The random seed for reproducible results.} 44 | } 45 | \description{ 46 | The "scMultiIntegrate" function integrates multiple 47 | datasets. It is based on our new approach, RPCI (reference principal 48 | components integration), which decomposes all the target datasets based on the 49 | reference data. The output of this function is a RISC object, including the 50 | integrated eigenvectors and aligned gene expression values. 51 | } 52 | \examples{ 53 | obj1 = raw.mat[[3]] 54 | obj2 = raw.mat[[4]] 55 | obj0 = list(obj1, obj2) 56 | var0 = intersect(obj1@vargene, obj2@vargene) 57 | obj0 = scMultiIntegrate(obj0, eigens = 8, var.gene = var0, align = 'Predict', 58 | npc = 20, add.Id = c("Set1", "Set2"), ncore = 2) 59 | obj0 = scUMAP(obj0, npc = 8, use = "PLS", dist = 0.001, neighbors = 15) 60 | DimPlot(obj0, slot = "cell.umap", colFactor = "Set", size = 2) 61 | DimPlot(obj0, slot = "cell.umap", colFactor = "Group", size = 2, label = TRUE) 62 | } 63 | \references{ 64 | Liu et al., Nature Biotech.
(2021) 65 | } 66 | -------------------------------------------------------------------------------- /man/Normalize.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Preprocess.R 3 | \name{scNormalize} 4 | \alias{scNormalize} 5 | \title{Processing Data.} 6 | \usage{ 7 | scNormalize( 8 | object, 9 | method = "robust", 10 | libsize = 1e+06, 11 | remove.mito = FALSE, 12 | norm.dis = TRUE, 13 | large = TRUE, 14 | ncore = 1 15 | ) 16 | } 17 | \arguments{ 18 | \item{object}{RISC object: a framework dataset.} 19 | 20 | \item{method}{The method for dataset normalization, with two options: "cosine" and 21 | "robust".} 22 | 23 | \item{libsize}{The standard (target) total of UMIs in each cell after normalization.} 24 | 25 | \item{remove.mito}{Whether to exclude mitochondrial genes from the library size.} 26 | 27 | \item{norm.dis}{Whether to normalize the distribution of the count data.} 28 | 29 | \item{large}{Whether the dataset is large (more than 50,000 cells).} 30 | 31 | \item{ncore}{The number of cores for parallel computation.} 32 | } 33 | \value{ 34 | RISC single cell dataset, the assay and rowdata slots. 35 | } 36 | \description{ 37 | After data filtering, RISC normalizes the raw counts/UMIs using size 38 | factors calculated from the raw counts/UMIs of each cell, which 39 | removes sequencing-depth differences between cells. The output is log1p-transformed. The gene 40 | expression values of the RISC object used in subsequent analyses are based on the 41 | normalized data. Two kinds of normalization can be employed: one is based on 42 | least absolute deviations, while the other is based on least squares. 43 | } 44 | \examples{ 45 | # RISC object 46 | obj0 = raw.mat[[5]] 47 | obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3) 48 | obj0 = scNormalize(obj0) 49 | } 50 | \references{ 51 | Boscovich, R.J.
(1757) 52 | 53 | Thompson, W.J., Computers in Physics (1992) 54 | } 55 | -------------------------------------------------------------------------------- /man/PCA.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Reduce_Dimension.R 3 | \name{scPCA} 4 | \alias{scPCA} 5 | \title{Dimension Reduction.} 6 | \usage{ 7 | scPCA(object, npc = 20) 8 | } 9 | \arguments{ 10 | \item{object}{RISC object: a framework dataset.} 11 | 12 | \item{npc}{The number of PCs to generate from the highly variable genes 13 | (usually < 1,500); the default is the first 20 PCs.} 14 | } 15 | \value{ 16 | RISC single cell dataset, the DimReduction slot. 17 | } 18 | \description{ 19 | Based on the highly variable genes of the dataset, RISC calculates the 20 | principal components (PCs) of the cells using the prcomp function. The major PCs, 21 | which explain most of the gene expression variance, are used for dimension reduction. 22 | } 23 | \examples{ 24 | # RISC object 25 | obj0 = raw.mat[[3]] 26 | obj0 = scPCA(obj0, npc = 10) 27 | } 28 | \references{ 29 | Jolliffe et al. (2016) 30 | 31 | Alter et al., PNAS (2000) 32 | 33 | Gonzalez et al., JSS (2008) 34 | 35 | Mevik et al., JSS (2007) 36 | } 37 | -------------------------------------------------------------------------------- /man/PCPlot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Graph.R 3 | \name{PCPlot} 4 | \alias{PCPlot} 5 | \title{Processing Plot} 6 | \usage{ 7 | PCPlot(object) 8 | } 9 | \arguments{ 10 | \item{object}{RISC object: a framework dataset.} 11 | } 12 | \description{ 13 | The "PCPlot" function plots how the PCs explain the variance. 14 | This plot helps the user select the optimal PCs for dimension 15 | reduction and data integration.
16 | } 17 | \references{ 18 | Wickham, H. (2016) 19 | 20 | Auguie, B. (2015) 21 | 22 | Liu et al., Nature Biotech. (2021) 23 | } 24 | -------------------------------------------------------------------------------- /man/PLS-Integrating.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Integrating.R 3 | \name{scPLS} 4 | \alias{scPLS} 5 | \title{Integrating Multiple Large Datasets} 6 | \usage{ 7 | scPLS( 8 | objects, 9 | eigens = 10, 10 | add.Id = NULL, 11 | var.gene = NULL, 12 | npc = 100, 13 | adjust = TRUE, 14 | ncore = 1, 15 | seed = 123 16 | ) 17 | } 18 | \arguments{ 19 | \item{objects}{The list of multiple RISC objects: 20 | list{object1, object2, object3, ...}. The first dataset is the reference used to generate 21 | gene eigenvectors.} 22 | 23 | \item{eigens}{The number of eigenvectors used for data integration.} 24 | 25 | \item{add.Id}{A character vector of IDs used to label the different datasets.} 26 | 27 | \item{var.gene}{Define the variable genes manually. Supply a character vector of gene 28 | names to use as variable genes.} 29 | 30 | \item{npc}{The number of PCs returned by the "scPLS" function; 31 | they are usually used for subsequent analyses, such as cell embedding and 32 | cell clustering.} 33 | 34 | \item{adjust}{Whether to adjust the number of eigenvectors.} 35 | 36 | \item{ncore}{The number of cores used for data integration.} 37 | 38 | \item{seed}{The random seed for reproducible results.} 39 | } 40 | \description{ 41 | The "scPLS" function integrates multiple 42 | datasets. It is based on our new algorithm, reference principal 43 | components integration (RPCI), which decomposes all the target datasets based 44 | on the reference. The output of this function can be used for low-dimensional 45 | visualization.
46 | } 47 | \examples{ 48 | obj1 = raw.mat[[3]] 49 | obj2 = raw.mat[[4]] 50 | obj0 = list(obj1, obj2) 51 | var0 = intersect(obj1@vargene, obj2@vargene) 52 | PLS0 = scPLS(obj0, var.gene = var0, npc = 20, add.Id = c("Set1", "Set2"), ncore = 1) 53 | } 54 | \references{ 55 | Liu et al., Nature Biotech. (2021) 56 | } 57 | -------------------------------------------------------------------------------- /man/Pairwise-DEGs.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ClusterMarker.R 3 | \name{scDEG} 4 | \alias{scDEG} 5 | \title{Find Differentially Expressed Genes between Clusters} 6 | \usage{ 7 | scDEG( 8 | object, 9 | cell.ctrl = NULL, 10 | cell.sam = NULL, 11 | frac = 0.1, 12 | log2FC = 0.5, 13 | Padj = 0.01, 14 | latent.factor = NULL, 15 | method = "NB", 16 | min.cells = 10, 17 | ncore = 1 18 | ) 19 | } 20 | \arguments{ 21 | \item{object}{RISC object: a framework dataset.} 22 | 23 | \item{cell.ctrl}{The cells used as the reference (control) cells for detecting DEGs.} 24 | 25 | \item{cell.sam}{The cells used as the sample cells for detecting DEGs.} 26 | 27 | \item{frac}{A fraction cutoff: marker genes must be expressed in at least this 28 | fraction of the cluster cells.} 29 | 30 | \item{log2FC}{The cutoff of the log2 fold-change for differentially expressed marker 31 | genes.} 32 | 33 | \item{Padj}{The cutoff of the adjusted P-value. If Padj is NULL, a raw p-value 34 | < 0.05 is used as the threshold.
Set Padj to 1 to apply no cutoff.} 35 | 36 | \item{latent.factor}{The latent factor from the coldata, which can be numeric 37 | values or factors; only one latent factor can be supplied.} 38 | 39 | \item{method}{The method used to identify DEGs, with three options: 40 | 'NB' for the Negative Binomial model, 'QP' for the Quasi-Poisson model, and 'wil' for the 41 | Wilcoxon rank-sum test.} 42 | 43 | \item{min.cells}{The minimum number of cells in each cluster required to calculate marker genes.} 44 | 45 | \item{ncore}{The number of cores for parallel computation.} 46 | } 47 | \description{ 48 | This is a basic function in RISC that identifies the differentially expressed 49 | genes (DEGs) by comparing samples between the selected clusters. The criteria 50 | used for cluster markers also apply to DEGs. 51 | } 52 | \details{ 53 | Here RISC provides two model-based algorithms to detect DEGs. The primary one is the 54 | "Quasi-Poisson" model, which has an advantage in identifying DEGs from clusters with 55 | a small number of cells. RISC also provides an alternative: the 56 | "Negative Binomial" model. 57 | 58 | Because log2 cannot handle counts of 0, we use log1p to calculate the average 59 | values of counts and log2 to express the fold-change. 60 | } 61 | \examples{ 62 | # RISC object 63 | obj0 = raw.mat[[4]] 64 | obj0 = scPCA(obj0, npc = 10) 65 | obj0 = scUMAP(obj0, npc = 3) 66 | obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density') 67 | DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2) 68 | cell.ctrl = rownames(obj0@coldata)[obj0@coldata$Cluster == 1] 69 | cell.sam = rownames(obj0@coldata)[obj0@coldata$Cluster == 3] 70 | DEG0 = scDEG(obj0, cell.ctrl = cell.ctrl, cell.sam = cell.sam, 71 | min.cells = 3, method = 'QP') 72 | } 73 | \references{ 74 | Paternoster et al., Criminology (1997) 75 | 76 | Berk et al., Journal of Quantitative Criminology (2008) 77 | 78 | Liu et al., Nature Biotech.
(2021) 79 | } 80 | -------------------------------------------------------------------------------- /man/Scale.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Preprocess.R 3 | \name{scScale} 4 | \alias{scScale} 5 | \title{Processing Data.} 6 | \usage{ 7 | scScale(object, method = "scale", center = TRUE, scale = FALSE) 8 | } 9 | \arguments{ 10 | \item{object}{RISC object: a framework dataset.} 11 | 12 | \item{method}{The method used to scale the dataset; the default is root-mean-square 13 | scaling.} 14 | 15 | \item{center}{Whether to center the matrix.} 16 | 17 | \item{scale}{Whether to use the standard deviation to scale the matrix.} 18 | } 19 | \value{ 20 | The scaled matrix. 21 | } 22 | \description{ 23 | After data normalization, RISC performs root-mean-square scaling on the 24 | dataset and generates scaled counts that balance expression levels in each cell, 25 | with an empirical mean equal to 0. The scaled counts are therefore intended to retain mainly the biological signal 26 | of the data. RISC uses scaled counts for dimension reduction 27 | and data integration. 28 | } 29 | \examples{ 30 | # RISC object 31 | obj0 = raw.mat[[3]] 32 | scale.mat = scScale(obj0) 33 | } 34 | \references{ 35 | Juszczak et al., CiteSeer (2002) 36 | 37 | Jiawei et al.
(2011) 38 | } 39 | -------------------------------------------------------------------------------- /man/SingleCellData.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/AllClasses.R 3 | \name{SingleCellData} 4 | \alias{SingleCellData} 5 | \title{Import single cell data} 6 | \usage{ 7 | SingleCellData(assay, coldata, rowdata) 8 | } 9 | \arguments{ 10 | \item{assay}{The list of gene counts.} 11 | 12 | \item{coldata}{The data.frame with cell information.} 13 | 14 | \item{rowdata}{The data.frame with gene information.} 15 | } 16 | \value{ 17 | SingleCellData 18 | } 19 | \description{ 20 | The single cell RNA-seq (scRNA-seq) data can be imported in three different ways. 21 | Primarily, we could import from 10X Genomics output directly by using 22 | "read10Xgenomics". The user only need to provide the folder path. Secondly, 23 | we could read data from HT-seq output by "readHTSeqdata", the user have to input 24 | the folder path. Lastly, we could input matrix, cell and genes mannually 25 | using "readscdata". 
26 | } 27 | -------------------------------------------------------------------------------- /man/Subset.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Utilities.R 3 | \name{SubSet} 4 | \alias{SubSet} 5 | \title{Utilities: Subset data} 6 | \usage{ 7 | SubSet(object, cells = NULL, genes = NULL) 8 | } 9 | \arguments{ 10 | \item{object}{RISC object: a framework dataset.} 11 | 12 | \item{cells}{The cells to include in the data subset.} 13 | 14 | \item{genes}{The genes to include in the data subset.} 15 | } 16 | \description{ 17 | The "SubSet" function extracts a data subset from the full dataset. This 18 | function collects not only the subset of coldata and rowdata, but also the 19 | raw counts/UMIs. After the "SubSet" function, the RISC object needs to be 20 | normalized and scaled again. 21 | } 22 | \examples{ 23 | # RISC object 24 | obj0 = raw.mat[[5]] 25 | obj0 26 | cell1 = rownames(obj0@coldata)[1:15] 27 | obj1 = SubSet(obj0, cells = cell1) 28 | obj1 29 | } 30 | -------------------------------------------------------------------------------- /man/UMAP.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Reduce_Dimension.R 3 | \name{scUMAP} 4 | \alias{scUMAP} 5 | \title{Dimension Reduction.} 6 | \usage{ 7 | scUMAP( 8 | object, 9 | npc = 20, 10 | embedding = 2, 11 | use = "PCA", 12 | neighbors = 15, 13 | dist = 0.1, 14 | seed = 123, 15 | ... 16 | ) 17 | } 18 | \arguments{ 19 | \item{object}{RISC object: a framework dataset.} 20 | 21 | \item{npc}{The number of PCs (or PLS components) used for UMAP; the default is 20, 22 | but it should be adjusted by the user.
Use PCA for an individual dataset and PLS 23 | for integrated data.} 24 | 25 | \item{embedding}{The number of UMAP output components.} 26 | 27 | \item{use}{The components used for UMAP: PCA or PLS.} 28 | 29 | \item{neighbors}{The n_neighbors parameter of UMAP.} 30 | 31 | \item{dist}{The min_dist parameter of UMAP.} 32 | 33 | \item{seed}{The random seed for reproducible UMAP results.} 34 | } 35 | \value{ 36 | RISC single cell dataset, the DimReduction slot. 37 | } 38 | \description{ 39 | UMAP is calculated from the eigenvectors of the single cell dataset, and the 40 | user can select the eigenvectors manually. Note that the selected eigenvectors 41 | directly affect the UMAP values. 42 | For integrated data (the result of the "scMultiIntegrate" function), RISC uses 43 | the PCR output "PLS" to calculate the UMAP; therefore, the user has to set "PLS" 44 | in "use = " instead of the default "PCA". 45 | } 46 | \examples{ 47 | # RISC object 48 | obj0 = raw.mat[[3]] 49 | obj0 = scPCA(obj0, npc = 10) 50 | obj0 = scUMAP(obj0, npc = 3) 51 | DimPlot(obj0, slot = "cell.umap", colFactor = 'Group', size = 2) 52 | } 53 | \references{ 54 | Becht et al., Nature Biotech.
(2018) 55 | } 56 | -------------------------------------------------------------------------------- /man/UMAPlot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Graph.R 3 | \name{UMAPlot} 4 | \alias{UMAPlot} 5 | \title{UMAP Plots} 6 | \usage{ 7 | UMAPlot( 8 | object, 9 | colFactor = NULL, 10 | genes = NULL, 11 | legend = TRUE, 12 | Colors = NULL, 13 | size = 0.5, 14 | Alpha = 0.8, 15 | plot.ncol = NULL, 16 | raw.count = FALSE, 17 | exp.range = NULL, 18 | exp.col = "firebrick2" 19 | ) 20 | } 21 | \arguments{ 22 | \item{object}{RISC object: a framework dataset.} 23 | 24 | \item{colFactor}{Use the factor (column name) in the coldata to make a UMAP plot; 25 | only one column name can be supplied at a time.} 26 | 27 | \item{genes}{Use the gene expression values (gene symbols) to make UMAP plots; 28 | more than one gene can be supplied at a time.} 29 | 30 | \item{legend}{Whether to show a legend on the UMAP plot.} 31 | 32 | \item{Colors}{A user-defined color vector. By default, 33 | the "UMAPlot" function assigns colors automatically.} 34 | 35 | \item{size}{The size of the dots in UMAP plots; the default size is 0.5.} 36 | 37 | \item{Alpha}{The transparency of individual points; the default is 0.8.} 38 | 39 | \item{plot.ncol}{If the user supplies more than one gene, this parameter controls the arrangement of the 40 | multiple UMAP plots.} 41 | 42 | \item{raw.count}{Whether to use raw counts instead of normalized counts.} 43 | 44 | \item{exp.range}{The gene expression cutoff for the plot, e.g. "c(0, 1.5)" for 45 | expression levels between 0 and 1.5.} 46 | 47 | \item{exp.col}{The gradient color for gene expression.} 48 | } 49 | \description{ 50 | UMAP plots are widely used in scRNA-seq data analysis. Here, the "UMAPlot"
Here, the "UMAPlot" 51 | function not only can make plots for factor labels of individual cells but also 52 | can show gene expression values of each cell. 53 | } 54 | \references{ 55 | Wickham, H. (2016) 56 | 57 | Auguie, B. (2015) 58 | } 59 | -------------------------------------------------------------------------------- /man/Violin-Plot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Graph.R 3 | \name{ViolinPlot} 4 | \alias{ViolinPlot} 5 | \title{Violin Plot} 6 | \usage{ 7 | ViolinPlot( 8 | object, 9 | colFactor = NULL, 10 | genes = NULL, 11 | legend = TRUE, 12 | trim = TRUE, 13 | Colors = NULL, 14 | Alpha = 0.8, 15 | dots = TRUE, 16 | wid = "area" 17 | ) 18 | } 19 | \arguments{ 20 | \item{object}{RISC object: a framework dataset.} 21 | 22 | \item{colFactor}{Use the factor (column name) in the coldata to make heatmap, but 23 | each time only one column name can be inputted.} 24 | 25 | \item{genes}{The gene expression pattern: gene symbol} 26 | 27 | \item{legend}{Whether a legend shown at heatmap.} 28 | 29 | \item{trim}{Whether trim the violin plot} 30 | 31 | \item{Colors}{The users can use their own colors (color vector). The default of 32 | the "UMAPlot" funciton will assign colors automatically.} 33 | 34 | \item{Alpha}{Whether show transparency of individual points, the default is 0.8.} 35 | 36 | \item{dots}{Adding jitter dots to the violin plot. The default is TRUE.} 37 | 38 | \item{wid}{The scale format, options: "area", "width", "count". 39 | The default: "area"} 40 | } 41 | \description{ 42 | The "ViolinPlot" map makes plots to show gene expression patterns of the clusters 43 | or other factors. The default groups cells into the clusters, but the users can 44 | input the factor (column name) of coldata. 
45 | } 46 | \examples{ 47 | # RISC object 48 | obj0 = raw.mat[[3]] 49 | ViolinPlot(obj0, colFactor = 'Group', genes = 'Gene718') 50 | } 51 | \references{ 52 | Wickham, H. (2016) 53 | 54 | Auguie, B. (2015) 55 | } 56 | -------------------------------------------------------------------------------- /man/raw.mat.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/AllClasses.R 3 | \docType{data} 4 | \name{raw.mat} 5 | \alias{raw.mat} 6 | \title{Example data} 7 | \format{ 8 | A list including a simulated gene-by-cell matrix (columns for cells and 9 | rows for genes), cell group information and batch information. 10 | } 11 | \usage{ 12 | data(raw.mat) 13 | } 14 | \description{ 15 | Example data 16 | } 17 | \keyword{datasets} 18 | -------------------------------------------------------------------------------- /man/setClass.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/AllGenerics.R 3 | \docType{class} 4 | \name{RISCdata-class} 5 | \alias{RISCdata-class} 6 | \title{RISC data} 7 | \value{ 8 | RISC object: an S4 framework dataset 9 | } 10 | \description{ 11 | The RISC object contains all the basic information used in single cell RNA-seq 12 | analysis, including raw counts/UMIs, normalized gene values, dimension reduction, 13 | cell clustering, and so on. The RISC object is an S4 object, 14 | consisting of assay, coldata, rowdata, metadata, vargene, cluster, and 15 | DimReduction.
}
\section{Slots}{

\describe{
\item{\code{assay}}{The list of gene counts/UMIs: raw and normalized counts.}

\item{\code{coldata}}{The data.frame with cell information, such as cell types, stages,
and other factors.}

\item{\code{rowdata}}{The data.frame with gene information, such as coding or non-coding
genes.}

\item{\code{metadata}}{The data.frame with meta values.}

\item{\code{vargene}}{The highly variable genes.
These genes are used in dimension reduction.}

\item{\code{cluster}}{The cell clustering information, covering three clustering
algorithms.}

\item{\code{DimReduction}}{The values of dimension reduction.}
}}

--------------------------------------------------------------------------------
/man/setMethod.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/AllGenerics.R
\docType{methods}
\name{RISCdata}
\alias{RISCdata}
\alias{.RISC_show}
\alias{RISC}
\alias{object}
\title{RISC data}
\usage{
.RISC_show(object)
}
\arguments{
\item{object}{RISC object: an S4 framework dataset}
}
\description{
This shows the full information of a RISC object, including the number of cells,
the number of genes, and any biological or statistical information on cells or genes.
}

--------------------------------------------------------------------------------
/man/tSNE.Rd:
--------------------------------------------------------------------------------
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Reduce_Dimension.R
\name{scTSNE}
\alias{scTSNE}
\title{Dimension Reduction.}
\usage{
scTSNE(
  object,
  npc = 20,
  embedding = 2,
  use = "PCA",
  perplexity = 30,
  seed = 123,
  ...
)
}
\arguments{
\item{object}{RISC object: a framework dataset.}

\item{npc}{The number of PCs (or PLS components) used for t-SNE. The default is 20,
but it should be adjusted by the users. Use PCA for an individual dataset and
PLS for integrated data.}

\item{embedding}{The number of t-SNE output components.}

\item{use}{The components used for t-SNE: PCA or PLS.}

\item{perplexity}{Perplexity parameter: if the number of cells is small,
decrease this parameter; otherwise t-SNE cannot be calculated.}

\item{seed}{The random seed that keeps the t-SNE result reproducible.}
}
\value{
RISC single-cell dataset, the DimReduction slot.
}
\description{
The t-SNE is calculated based on the eigenvectors of a single-cell dataset, and
the user can select the eigenvectors manually. Of note, the selected eigenvectors
directly affect the t-SNE values.
For integrated data (the result of the "scMultiIntegrate" function), RISC uses
the PCR output "PLS" to calculate the t-SNE; therefore, the user has to set
use = "PLS" instead of the default "PCA".
}
\examples{
# RISC object
obj0 = raw.mat[[3]]
obj0 = scPCA(obj0, npc = 10)
obj0 = scTSNE(obj0, npc = 4, perplexity = 10)
DimPlot(obj0, slot = "cell.tsne", colFactor = 'Group', size = 2)
}
\references{
Laurens van der Maaten, JMLR (2014)
}

--------------------------------------------------------------------------------
/src/Makevars:
--------------------------------------------------------------------------------
## OpenMP support in Armadillo prefers C++11 support. However, for wider
## availability of the package we do not yet enforce this here. It is however
## recommended for client packages to set it.
##
## And with R 3.4.0, and RcppArmadillo 0.7.960.*, we turn C++11 on as OpenMP
## support within Armadillo prefers / requires it

PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)

--------------------------------------------------------------------------------
/src/Makevars.win:
--------------------------------------------------------------------------------
## OpenMP support in Armadillo prefers C++11 support. However, for wider
## availability of the package we do not yet enforce this here. It is however
## recommended for client packages to set it.
##
## And with R 3.4.0, and RcppArmadillo 0.7.960.*, we turn C++11 on as OpenMP
## support within Armadillo prefers / requires it

PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)

--------------------------------------------------------------------------------
/src/RcppArmadilloProcess.cpp:
--------------------------------------------------------------------------------
// Most of the C++ functions are based on Dirk Eddelbuettel's
// 'RcppArmadillo' examples. The RISC package rebuilds them to support
// calculations on sparse matrices.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;


// [[Rcpp::export]]
arma::sp_mat sqrt_sp(arma::sp_mat X) {
  return sqrt(X);
}

// [[Rcpp::export]]
arma::mat cent_sp_d(arma::sp_mat X) {
  arma::mat Y = conv_to<mat>::from(X);
  int c = Y.n_cols;
  rowvec meanx(c);
  meanx = mean(Y, 0);
  for(int j=0; j= y);
  x.elem(id).fill(y);
  return x;
}

// [[Rcpp::export]]
arma::colvec lm_coef(arma::mat X, arma::colvec y) {
  arma::colvec coef = arma::solve(X, y);
  return coef;
}

// [[Rcpp::export]]
Rcpp::List lm_(arma::mat X, arma::colvec y) {

  int n = X.n_rows, k = X.n_cols;
  arma::colvec coef = arma::solve(X, y);
  arma::colvec res = y - X * coef;
  double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0) / (n - k);
  arma::colvec std_err = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X) * X)));
  return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
                            Rcpp::Named("stderr") = std_err,
                            Rcpp::Named("df.residual") = n - k);
}



--------------------------------------------------------------------------------
/src/RcppExports.cpp:
--------------------------------------------------------------------------------
// Generated by using Rcpp::compileAttributes() -> do not edit by hand
// Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#include <RcppArmadillo.h>
#include <Rcpp.h>

using namespace Rcpp;

#ifdef RCPP_USE_GLOBAL_ROSTREAM
Rcpp::Rostream<true>&  Rcpp::Rcout = Rcpp::Rcpp_cout_get();
Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif

// sqrt_sp
arma::sp_mat sqrt_sp(arma::sp_mat X);
RcppExport SEXP _RISC_sqrt_sp(SEXP XSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    rcpp_result_gen = Rcpp::wrap(sqrt_sp(X));
    return rcpp_result_gen;
END_RCPP
}
// cent_sp_d
arma::mat cent_sp_d(arma::sp_mat X);
RcppExport SEXP _RISC_cent_sp_d(SEXP XSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    rcpp_result_gen = Rcpp::wrap(cent_sp_d(X));
    return rcpp_result_gen;
END_RCPP
}
// multiply_sp_sp
arma::sp_mat multiply_sp_sp(arma::sp_mat X, arma::sp_mat Y);
RcppExport SEXP _RISC_multiply_sp_sp(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(multiply_sp_sp(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// multiply_sp_d
arma::mat multiply_sp_d(arma::sp_mat X, arma::sp_mat Y);
RcppExport SEXP _RISC_multiply_sp_d(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(multiply_sp_d(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// multiply_d_d
arma::mat multiply_d_d(arma::mat X, arma::mat Y);
RcppExport SEXP _RISC_multiply_d_d(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(multiply_d_d(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// multiply_sp_d_sp
arma::sp_mat multiply_sp_d_sp(arma::sp_mat X, arma::mat Y);
RcppExport SEXP _RISC_multiply_sp_d_sp(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(multiply_sp_d_sp(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// multiply_sp_d_v
arma::vec multiply_sp_d_v(arma::sp_mat X, arma::mat Y);
RcppExport SEXP _RISC_multiply_sp_d_v(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(multiply_sp_d_v(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// crossprod_sp_sp
arma::sp_mat crossprod_sp_sp(arma::sp_mat X, arma::sp_mat Y);
RcppExport SEXP _RISC_crossprod_sp_sp(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(crossprod_sp_sp(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// crossprod_sp_d
arma::mat crossprod_sp_d(arma::sp_mat X, arma::sp_mat Y);
RcppExport SEXP _RISC_crossprod_sp_d(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(crossprod_sp_d(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// crossprod_d_d
arma::mat crossprod_d_d(arma::mat X, arma::mat Y);
RcppExport SEXP _RISC_crossprod_d_d(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(crossprod_d_d(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// tcrossprod_sp_sp
arma::sp_mat tcrossprod_sp_sp(arma::sp_mat X, arma::sp_mat Y);
RcppExport SEXP _RISC_tcrossprod_sp_sp(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(tcrossprod_sp_sp(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// tcrossprod_d_d
arma::mat tcrossprod_d_d(arma::mat X, arma::mat Y);
RcppExport SEXP _RISC_tcrossprod_d_d(SEXP XSEXP, SEXP YSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
    rcpp_result_gen = Rcpp::wrap(tcrossprod_d_d(X, Y));
    return rcpp_result_gen;
END_RCPP
}
// winsorize_
arma::rowvec winsorize_(arma::rowvec x, double y);
RcppExport SEXP _RISC_winsorize_(SEXP xSEXP, SEXP ySEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::rowvec >::type x(xSEXP);
    Rcpp::traits::input_parameter< double >::type y(ySEXP);
    rcpp_result_gen = Rcpp::wrap(winsorize_(x, y));
    return rcpp_result_gen;
END_RCPP
}
// lm_coef
arma::colvec lm_coef(arma::mat X, arma::colvec y);
RcppExport SEXP _RISC_lm_coef(SEXP XSEXP, SEXP ySEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::colvec >::type y(ySEXP);
    rcpp_result_gen = Rcpp::wrap(lm_coef(X, y));
    return rcpp_result_gen;
END_RCPP
}
// lm_
Rcpp::List lm_(arma::mat X, arma::colvec y);
RcppExport SEXP _RISC_lm_(SEXP XSEXP, SEXP ySEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
    Rcpp::traits::input_parameter< arma::colvec >::type y(ySEXP);
    rcpp_result_gen = Rcpp::wrap(lm_(X, y));
    return rcpp_result_gen;
END_RCPP
}

static const R_CallMethodDef CallEntries[] = {
    {"_RISC_sqrt_sp", (DL_FUNC) &_RISC_sqrt_sp, 1},
    {"_RISC_cent_sp_d", (DL_FUNC) &_RISC_cent_sp_d, 1},
    {"_RISC_multiply_sp_sp", (DL_FUNC) &_RISC_multiply_sp_sp, 2},
    {"_RISC_multiply_sp_d", (DL_FUNC) &_RISC_multiply_sp_d, 2},
    {"_RISC_multiply_d_d", (DL_FUNC) &_RISC_multiply_d_d, 2},
    {"_RISC_multiply_sp_d_sp", (DL_FUNC) &_RISC_multiply_sp_d_sp, 2},
    {"_RISC_multiply_sp_d_v", (DL_FUNC) &_RISC_multiply_sp_d_v, 2},
    {"_RISC_crossprod_sp_sp", (DL_FUNC) &_RISC_crossprod_sp_sp, 2},
    {"_RISC_crossprod_sp_d", (DL_FUNC) &_RISC_crossprod_sp_d, 2},
    {"_RISC_crossprod_d_d", (DL_FUNC) &_RISC_crossprod_d_d, 2},
    {"_RISC_tcrossprod_sp_sp", (DL_FUNC) &_RISC_tcrossprod_sp_sp, 2},
    {"_RISC_tcrossprod_d_d", (DL_FUNC) &_RISC_tcrossprod_d_d, 2},
    {"_RISC_winsorize_", (DL_FUNC) &_RISC_winsorize_, 2},
    {"_RISC_lm_coef", (DL_FUNC) &_RISC_lm_coef, 2},
    {"_RISC_lm_", (DL_FUNC) &_RISC_lm_, 2},
    {NULL, NULL, 0}
};

RcppExport void R_init_RISC(DllInfo *dll) {
    R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
}

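The exported helper `lm_` fits ordinary least squares with `arma::solve`, estimates the residual variance as s2 = RSS / (n - k), and returns standard errors as the square roots of the diagonal of s2 * (X'X)^-1 (computed with `arma::pinv`), together with the residual degrees of freedom n - k. As a dependency-free illustration of that arithmetic, the C++ sketch below assumes a two-column design matrix (an intercept plus one predictor), so that (X'X)^-1 has a closed form. It is not the package's implementation, and the names `OlsFit` and `ols_two_column` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical result type (not part of RISC) mirroring the three fields
// returned by lm_(): coefficients, standard errors, residual degrees of freedom.
struct OlsFit {
    double coef[2];   // intercept b0, slope b1
    double se[2];     // standard errors of b0 and b1
    int df_residual;  // n - k, with k = 2 design columns
};

// Hypothetical helper: fit y = b0 + b1 * x by ordinary least squares.
// Assumes the design matrix is X = [1, x], so X'X is a 2 x 2 matrix.
OlsFit ols_two_column(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n != y.size() || n < 3) throw std::invalid_argument("need n >= 3 paired values");

    // Entries of X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy].
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double det = n * sxx - sx * sx;
    if (det == 0) throw std::invalid_argument("x is constant, so X'X is singular");

    // Closed-form inverse: (X'X)^-1 = (1 / det) * [[sxx, -sx], [-sx, n]].
    const double inv00 = sxx / det, inv01 = -sx / det, inv11 = n / det;

    OlsFit fit{};
    fit.coef[0] = inv00 * sy + inv01 * sxy;  // b0 = row 0 of (X'X)^-1 times X'y
    fit.coef[1] = inv01 * sy + inv11 * sxy;  // b1 = row 1 of (X'X)^-1 times X'y

    // Residual variance s2 = RSS / (n - k), as in lm_().
    double rss = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double r = y[i] - fit.coef[0] - fit.coef[1] * x[i];
        rss += r * r;
    }
    fit.df_residual = static_cast<int>(n) - 2;
    const double s2 = rss / fit.df_residual;

    // Standard errors: square roots of the diagonal of s2 * (X'X)^-1.
    fit.se[0] = std::sqrt(s2 * inv00);
    fit.se[1] = std::sqrt(s2 * inv11);
    return fit;
}
```

For example, `ols_two_column({0, 1, 2, 3}, {1, 2, 2, 4})` gives an intercept and slope of about 0.9 each, with 2 residual degrees of freedom. For k > 2 columns a general linear solver is needed, which is what `arma::solve` and `arma::pinv` provide in the package code.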
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
library(testthat)
library(RISC)

test_check("RISC")

--------------------------------------------------------------------------------
/tests/testthat/test-workflow.R:
--------------------------------------------------------------------------------
context("test-workflow")

# Load raw data
# load("../testdata/testdata.rda")

mat0 = as.matrix(raw.mat[[1]])
coldata0 = as.data.frame(raw.mat[[2]])
coldata1 <- coldata0[coldata0$Batch0 == 'Batch1',]
coldata2 <- coldata0[coldata0$Batch0 == 'Batch4',]
mat1 <- mat0[,rownames(coldata1)]
mat2 <- mat0[,rownames(coldata2)]


#####################################################################################
context("Create RISC object")

sce1 <- readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
sce2 <- readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)

test_that("Whether objects are RISCdata objects", {
  expect_is(sce1, 'RISCdata')
  expect_is(sce2, 'RISCdata')
})

test_that("col.names of matrix in objects equal to row.names of coldata in objects", {
  expect_equal(colnames(sce1@assay$count), rownames(sce1@coldata))
})

test_that("row.names of matrix in objects equal to row.names of rowdata in objects", {
  expect_equal(rownames(sce2@assay$count), rownames(sce2@rowdata))
})


#####################################################################################
context("Preprocess data")

sce1 = scFilter(sce1, min.UMI = 0, max.UMI = Inf, min.gene = 0, min.cell = 0, is.filter = FALSE)
sce2 = scFilter(sce2, min.UMI = 0, max.UMI = Inf, min.gene = 0, min.cell = 0, is.filter = FALSE)

test_that("Select the correct cells", {
  expect_equal(min(sce1@coldata$nGene), 464)
  expect_equal(max(sce2@coldata$nGene), 522)
})

test_that("Select the correct genes", {
  expect_equal(sce1@rowdata$Symbol[1], "Gene1")
  expect_equal(max(sce2@rowdata$nCell), 25)
  expect_false(is.null(sce1@metadata$filter))
})

sce1 = scNormalize(sce1)
sce2 = scNormalize(sce2)

test_that("Normalize raw counts/UMIs", {
  expect_equal(min(sce1@assay$logcount), 0)
  expect_gt(max(sce2@assay$logcount), 6.4)
  expect_equal(length(sce1@metadata$normalise), 1)
})


#####################################################################################
context("Identify highly variable genes")

sce1 = scDisperse(sce1)
sce2 = scDisperse(sce2)

test_that("highly variable genes", {
  expect_false(is.null(sce1@vargene))
  expect_false(is.null(sce2@metadata$dispersion.var))
})


#####################################################################################
context("Dimension reduction")

sce1 = scPCA(sce1, npc = 5)
sce2 = scPCA(sce2, npc = 5)

test_that("PCA", {
  expect_false(is.null(sce1@DimReduction$cell.pca))
  expect_gt(max(sce2@DimReduction$var.pca), 0.30)
})

sce1 = scTSNE(sce1, npc = 5, perplexity = 5)
sce2 = scTSNE(sce2, npc = 5, perplexity = 5)

test_that("tSNE", {
  expect_false(is.null(sce1@DimReduction$cell.tsne))
})

sce1 = scUMAP(sce1, npc = 5)
sce2 = scUMAP(sce2, npc = 5)

test_that("UMAP", {
  expect_false(is.null(sce1@DimReduction$cell.umap))
})


#####################################################################################
context("Data integration")

var0 = intersect(rownames(sce1@assay$logcount),
                 rownames(sce2@assay$logcount))
idat = list(sce1, sce2)
idat = scMultiIntegrate(idat, eigens = 4, npc = 5, var.gene = var0, add.Id = c('Set1', 'Set2'))

test_that("Integrate datasets", {
  expect_false(is.null(idat@DimReduction$cell.pls))
  expect_equal(length(levels(idat@coldata$Set)), 2)
})


#####################################################################################
context("Cell clustering")

idat = scUMAP(idat, npc = 5, use = 'PLS')
idat = scCluster(idat, slot = 'cell.umap', k = 4, method = 'density', dc = 0.3253)

test_that("Cluster cells", {
  expect_false(is.null(idat@DimReduction$cell.umap))
  expect_equal(length(levels(idat@coldata$Cluster)), 4)
})



--------------------------------------------------------------------------------
/vignettes/RISC_Vignette.Rmd:
--------------------------------------------------------------------------------
---
title: "RISC: robust integration of single-cell RNA-seq datasets using a single reference space"
author:
  - name: Yang Liu, Tao Wang, Deyou Zheng
    affiliation: Albert Einstein College of Medicine, Bronx, NY, United States
date: "2021"
output: rmarkdown::html_vignette
package: RISC
vignette: >
  %\VignetteIndexEntry{RISC: robust integration of single-cell RNA-seq datasets using a single reference space}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

Single-cell RNA sequencing (scRNA-seq) has become an essential genomic technology for resolving gene expression heterogeneity in single cells and has been widely used in many biological domains.
It remains challenging to integrate scRNA-seq datasets with inter-sample heterogeneity, for example, different cell subpopulation compositions among datasets or gene expression differences in the same cell groups across datasets.

We find that the distortion in the integration of heterogeneous data is due to the lack of a consistent global reference space for projecting all cells from individual datasets. To overcome this issue, we developed a novel approach, named reference principal component integration (RPCI), and implemented it in a new scRNA-seq analysis package called “RISC” for robust integration of scRNA-seq data.


```{r setup}
library(RISC)
library(RColorBrewer)

data("raw.mat")
mat0 = raw.mat[[1]]
coldata0 = raw.mat[[2]]

coldata1 = coldata0[coldata0$Batch0 == 'Batch1',]
coldata2 = coldata0[coldata0$Batch0 == 'Batch2',]
coldata3 = coldata0[coldata0$Batch0 == 'Batch3',]
coldata4 = coldata0[coldata0$Batch0 == 'Batch4',]
coldata5 = coldata0[coldata0$Batch0 == 'Batch5',]
coldata6 = coldata0[coldata0$Batch0 == 'Batch6',]
mat1 = mat0[,rownames(coldata1)]
mat2 = mat0[,rownames(coldata2)]
mat3 = mat0[,rownames(coldata3)]
mat4 = mat0[,rownames(coldata4)]
mat5 = mat0[,rownames(coldata5)]
mat6 = mat0[,rownames(coldata6)]
```


# Heterogeneous Simulated data

To show the advantages of RPCI and evaluate its performance quantitatively, we start with simulated data in which we control the degree of gene expression difference between datasets in two of the three cell groups: more differentially expressed (DE) genes in c/c' than in b/b', and no DE genes in group a. As expected, cell similarity between paired groups decreases gradually as the number of DE genes increases.

# Create RISC objects

We generate RISC objects from the gene-cell matrix (mat), the data frame of the cells (coldata), and the data frame of the genes, using the "readsc" function.
The RISC objects can also be generated by the "read10X_mtx" or "read10X_h5" function for 10X Genomics data.


```{r}
sce1 = readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
sce2 = readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)
sce3 = readsc(count = mat3, cell = coldata3, gene = data.frame(Symbol = rownames(mat3), row.names = rownames(mat3)), is.filter = FALSE)
sce4 = readsc(count = mat4, cell = coldata4, gene = data.frame(Symbol = rownames(mat4), row.names = rownames(mat4)), is.filter = FALSE)
sce5 = readsc(count = mat5, cell = coldata5, gene = data.frame(Symbol = rownames(mat5), row.names = rownames(mat5)), is.filter = FALSE)
sce6 = readsc(count = mat6, cell = coldata6, gene = data.frame(Symbol = rownames(mat6), row.names = rownames(mat6)), is.filter = FALSE)
```


# Processing RISC data

After creating RISC objects, we process the RISC data. The standard steps are:
(1) filter cells: remove cells with extremely low or high UMI counts and discard cells with an extremely low number of expressed genes. Because the data here are simulated, we do not filter out any cells.
(2) normalize gene expression to remove the effect of RNA sequencing depth.
(3) scale gene expression: the scaled counts contain only gene signal information for individual cells and have a zero empirical mean in each column, satisfying the requirement for PCA and SVD.
(4) find highly variable genes: identify them with a quasi-Poisson model and use them for gene-cell matrix decomposition and data integration.
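
Step (3) asks for a zero empirical mean in each column of the matrix handed to PCA/SVD. In the package's C++ source, `cent_sp_d` appears to do this by converting a sparse matrix to a dense one and computing column means with `mean(Y, 0)`. The minimal C++ sketch below shows the same column centering on a small dense matrix. It assumes a row-major `std::vector` layout instead of Armadillo types, and `center_columns` is a hypothetical name, not a RISC function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dense matrix stored row by row; all rows must have the same length.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper (not part of RISC): subtract each column's mean so that
// every column of the returned matrix has zero empirical mean.
Matrix center_columns(Matrix m) {
    if (m.empty()) return m;
    const std::size_t nrow = m.size(), ncol = m[0].size();

    // Accumulate column sums, rejecting ragged input.
    std::vector<double> mean(ncol, 0.0);
    for (const auto& row : m) {
        if (row.size() != ncol) throw std::invalid_argument("ragged matrix");
        for (std::size_t j = 0; j < ncol; ++j) mean[j] += row[j];
    }
    for (double& v : mean) v /= static_cast<double>(nrow);

    // Subtract the column means in place on the local copy.
    for (auto& row : m)
        for (std::size_t j = 0; j < ncol; ++j) row[j] -= mean[j];
    return m;
}
```

For `{{1, 2}, {3, 6}}` the column means are 2 and 4, so the centered result is `{{-1, -2}, {1, 2}}`.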


```{r}
process0 <- function(obj0){

  # filter cells
  obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3, is.filter = FALSE)

  # normalize data
  obj0 = scNormalize(obj0, method = 'robust')

  # find highly variable genes
  obj0 = scDisperse(obj0)

  # here, replace the highly variable genes by all the genes of this object for integration
  obj0@vargene = rownames(obj0@rowdata)

  return(obj0)

}

sce1 = process0(sce1)
sce2 = process0(sce2)
sce3 = process0(sce3)
sce4 = process0(sce4)
sce5 = process0(sce5)
sce6 = process0(sce6)
```


# RPCI integration

The core principle of RPCI differs from that of existing methods: RPCI introduces an effective formula to calibrate cell similarity by a global reference and directly projects all cells into a reference RPCI space.


```{r}
set.seed(1)
var.genes = rownames(sce1@assay$count)
pcr0 = list(sce1, sce2, sce3, sce4, sce5, sce6)
pcr0 = scMultiIntegrate(pcr0, eigens = 9, var.gene = var.genes, align = 'OLS', npc = 15)
# pcr0 = scLargeIntegrate(pcr0, var.gene = var.genes, align = 'Predict', npc = 8)
pcr0 = scUMAP(pcr0, npc = 9, use = 'PLS', dist = 0.001, neighbors = 15)
```

```{r}
pcr0@coldata$Group = factor(pcr0@coldata$Group0, levels = c('Group1', 'Group2', 'Group2*', 'Group3', 'Group3*'), labels = c("a", "b", "b'", "c", "c'"))
pcr0@coldata$Set0 = factor(pcr0@coldata$Set, levels = c('Set1', 'Set2', 'Set3', 'Set4', 'Set5', 'Set6'), labels = c('Set1 rep.1', 'Set2 rep.1', 'Set3 rep.1', 'Set1 rep.2', 'Set2 rep.2', 'Set3 rep.2'))
pcr0 = scCluster(pcr0, slot = "cell.umap", k = 4, method = "density", dc = 0.3)
```


# UMAP plot

The dissimilarity in c/c' is larger than that in b/b' based on our original design, and this cell-cell relationship can be directly
reflected in the UMAP plots of the RPCI-integrated data. The difference between c and c' can also be recovered by re-clustering the integrated data.

Here the simulated data contain three sets; each set includes three cell groups and two replicates, with batch effects present both among sets and between replicates.


```{r, fig.show="hold", out.width="48%", fig.dim=c(7, 5)}
DimPlot(pcr0, colFactor = 'Set0', size = 2)
DimPlot(pcr0, colFactor = 'Group', size = 2, Colors = brewer.pal(5, "Set1"))
DimPlot(pcr0, colFactor = 'Cluster', size = 2, Colors = brewer.pal(6, "Dark2"))
```


More details and a tutorial on real scRNA-seq data are available at: https://github.com/yangRISC/RISC


# Session Information
```{r}
sessionInfo()
```

--------------------------------------------------------------------------------