├── .gitignore
├── DESCRIPTION
├── GSE123813
    ├── GSE123813_Vignette.pdf
    └── Raw_Data
    │   └── bcc_annotation.tsv
├── GSE123813_Vignette_RISC_v1.0.pdf
├── GSE123813_Vignette_RISC_v1.6.pdf
├── LICENSE
├── NAMESPACE
├── R
    ├── AllClasses.R
    ├── AllGenerics.R
    ├── Cluster.R
    ├── ClusterMarker.R
    ├── Graph.R
    ├── Integrating.R
    ├── Preprocess.R
    ├── RcppExports.R
    ├── Reduce_Dimension.R
    └── Utilities.R
├── README.md
├── RISC_1.0.tar.gz
├── RISC_1.6.0.tar.gz
├── RISC_1.7.tar.gz
├── RISC_Supplementary
    ├── GSE110823
    │   ├── GSE110823.R
    │   ├── GSE110823_ann.tsv
    │   └── var.tsv
    ├── GSE111113
    │   ├── GSE111113.R
    │   └── var.tsv
    ├── GSE114727
    │   ├── GSE114727.R
    │   ├── GSE114727_Anno_All.tsv
    │   ├── var_all.tsv
    │   └── var_pos.tsv
    ├── GSE123813
    │   ├── GSE123813_Vignette.Rmd
    │   └── bcc_annotation.tsv
    ├── GSE125688
    │   ├── GSE125688.R
    │   └── GSE125688_Ann.tsv
    ├── GSE131181
    │   ├── GSE131181.R
    │   ├── GSE131181_Ann.tsv
    │   └── GSE131181_var.tsv
    ├── GSE132044
    │   ├── GSE132044.R
    │   ├── GSE132044_Anno.tsv
    │   └── var.tsv
    ├── GSE84133
    │   ├── GSE84133.R
    │   ├── GSE84133_Anno.tsv
    │   ├── GSE84133_Filter.tsv
    │   └── var.tsv
    ├── GSE85241_GSE81076_GSE83139_EMTAB_5061
    │   ├── GSE85241_GSE81076_GSE83139_EMTAB_5061.R
    │   ├── Pancreas_Annotation.tsv
    │   └── var.tsv
    └── GSE96583
    │   ├── Anno0.tsv
    │   ├── GSE96583.R
    │   ├── Var0.tsv
    │   └── Var0_Ori.tsv
├── Release.txt
├── Seurat_to_RISC_RISC_v1.0.pdf
├── build
    └── vignette.rds
├── data
    ├── datalist
    └── raw.mat.rda
├── inst
    └── doc
    │   ├── RISC_Vignette.R
    │   ├── RISC_Vignette.Rmd
    │   └── RISC_Vignette.html
├── man
    ├── AddFactor.Rd
    ├── All-Cluster-Marker.Rd
    ├── Cluster-Marker.Rd
    ├── Cluster.Rd
    ├── DimPlot.Rd
    ├── Disperse.Rd
    ├── Filter.Rd
    ├── FilterPlot.Rd
    ├── Heatmap.Rd
    ├── Import-10X-h5.Rd
    ├── Import-10X-mtx.Rd
    ├── Import-Matrix.Rd
    ├── InPlot.Rd
    ├── Integration-Algorithm-SIMPLS.Rd
    ├── MSC.Rd
    ├── Multiple-Integrating.Rd
    ├── Normalize.Rd
    ├── PCA.Rd
    ├── PCPlot.Rd
    ├── PLS-Integrating.Rd
    ├── Pairwise-DEGs.Rd
    ├── Scale.Rd
    ├── SingleCellData.Rd
    ├── Subset.Rd
    ├── UMAP.Rd
    ├── UMAPlot.Rd
    ├── Violin-Plot.Rd
    ├── raw.mat.Rd
    ├── setClass.Rd
    ├── setMethod.Rd
    └── tSNE.Rd
├── src
    ├── Makevars
    ├── Makevars.win
    ├── RcppArmadilloProcess.cpp
    └── RcppExports.cpp
├── tests
    ├── testthat.R
    └── testthat
    │   └── test-workflow.R
└── vignettes
    └── RISC_Vignette.Rmd


/.gitignore:
--------------------------------------------------------------------------------
 1 | .DS_Store
 2 | .Rhistory
 3 | 


--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: RISC
 2 | Type: Package
 3 | Title: Robust Integration of Single-Cell RNA-Seq Datasets
 4 | Version: 1.7
 5 | Date: 2022-1-10
 6 | Update: 2024-3-15
 7 | Authors@R: c(person("Yang", "Liu", role = c("aut", "cre"), 
 8 |          email = "yanliurisc@gmail.com"), 
 9 |          person("Deyou", "Zheng", role = "aut"), 
10 |          person("Tao", "Wang", role = "aut"))
11 | Maintainer: Yang Liu <yanliurisc@gmail.com>
12 | Description: The 'RISC' package integrates single-cell RNA sequencing data sets, corrects batch effects, clusters cells, identifies gene markers, and produces an integrated gene expression matrix. See the URLs below for details.
13 | URL: https://www.biorxiv.org/content/10.1101/483297v1.article-info
14 |         https://github.com/yangRISC/RISC
15 | Depends: R (>= 4.0.0)
16 | Imports: Rcpp, Matrix, sparseMatrixStats, MASS, pbapply, hdf5r,
17 |         doParallel, foreach, irlba, Rtsne, umap, densityClust, FNN,
18 |         igraph, RColorBrewer, ggplot2, gridExtra, pheatmap, methods,
19 |         grDevices, stats, utils
20 | LinkingTo: Rcpp, RcppArmadillo
21 | LazyData: true
22 | Encoding: UTF-8
23 | RoxygenNote: 7.3.1
24 | Suggests: testthat, usethis, knitr, rmarkdown
25 | VignetteBuilder: knitr
26 | NeedsCompilation: yes
27 | License: GPL-3 | file LICENSE
28 | Packaged: 2024-03-18 13:34:09 UTC; liuy128
29 | Author: Yang Liu [aut, cre],
30 |   Deyou Zheng [aut],
31 |   Tao Wang [aut]
32 | 


--------------------------------------------------------------------------------
/GSE123813/GSE123813_Vignette.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/GSE123813/GSE123813_Vignette.pdf


--------------------------------------------------------------------------------
/GSE123813_Vignette_RISC_v1.0.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/GSE123813_Vignette_RISC_v1.0.pdf


--------------------------------------------------------------------------------
/GSE123813_Vignette_RISC_v1.6.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/GSE123813_Vignette_RISC_v1.6.pdf


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/LICENSE


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | export(AllMarker)
 4 | export(DimPlot)
 5 | export(FilterPlot)
 6 | export(Heat)
 7 | export(InPlot)
 8 | export(SubSet)
 9 | export(ViolinPlot)
10 | export(read10X_h5)
11 | export(read10X_mtx)
12 | export(readsc)
13 | export(scCluster)
14 | export(scDEG)
15 | export(scDisperse)
16 | export(scFilter)
17 | export(scMarker)
18 | export(scMultiIntegrate)
19 | export(scNormalize)
20 | export(scPCA)
21 | export(scPLS)
22 | export(scScale)
23 | export(scTSNE)
24 | export(scUMAP)
25 | exportClasses(RISCdata)
26 | import(RColorBrewer)
27 | import(ggplot2)
28 | importFrom(FNN,get.knn)
29 | importFrom(MASS,glm.nb)
30 | importFrom(Matrix,colMeans)
31 | importFrom(Matrix,colSums)
32 | importFrom(Matrix,mean)
33 | importFrom(Matrix,readMM)
34 | importFrom(Matrix,rowMeans)
35 | importFrom(Matrix,rowSums)
36 | importFrom(Matrix,spMatrix)
37 | importFrom(Matrix,sparseMatrix)
38 | importFrom(Rcpp,evalCpp)
39 | importFrom(Rtsne,Rtsne)
40 | importFrom(densityClust,densityClust)
41 | importFrom(densityClust,findClusters)
42 | importFrom(doParallel,registerDoParallel)
43 | importFrom(foreach,"%dopar%")
44 | importFrom(foreach,foreach)
45 | importFrom(grDevices,col2rgb)
46 | importFrom(grDevices,colorRampPalette)
47 | importFrom(gridExtra,grid.arrange)
48 | importFrom(hdf5r,H5File)
49 | importFrom(igraph,"E<-")
50 | importFrom(igraph,E)
51 | importFrom(igraph,cluster_louvain)
52 | importFrom(igraph,graph_from_data_frame)
53 | importFrom(igraph,simplify)
54 | importFrom(irlba,irlba)
55 | importFrom(methods,as)
56 | importFrom(methods,new)
57 | importFrom(pbapply,pblapply)
58 | importFrom(pbapply,pboptions)
59 | importFrom(pheatmap,pheatmap)
60 | importFrom(stats,"contrasts<-")
61 | importFrom(stats,aggregate)
62 | importFrom(stats,as.formula)
63 | importFrom(stats,coef)
64 | importFrom(stats,contr.sum)
65 | importFrom(stats,cutree)
66 | importFrom(stats,dist)
67 | importFrom(stats,embed)
68 | importFrom(stats,gaussian)
69 | importFrom(stats,glm)
70 | importFrom(stats,ks.test)
71 | importFrom(stats,loess)
72 | importFrom(stats,loess.smooth)
73 | importFrom(stats,median)
74 | importFrom(stats,model.matrix)
75 | importFrom(stats,p.adjust)
76 | importFrom(stats,pchisq)
77 | importFrom(stats,poisson)
78 | importFrom(stats,qnorm)
79 | importFrom(stats,quantile)
80 | importFrom(stats,quasipoisson)
81 | importFrom(stats,reshape)
82 | importFrom(stats,sd)
83 | importFrom(stats,smooth)
84 | importFrom(stats,var)
85 | importFrom(stats,wilcox.test)
86 | importFrom(umap,umap)
87 | importFrom(utils,head)
88 | importFrom(utils,read.table)
89 | importFrom(utils,setTxtProgressBar)
90 | importFrom(utils,txtProgressBar)
91 | useDynLib(RISC)
92 | 


--------------------------------------------------------------------------------
/R/AllClasses.R:
--------------------------------------------------------------------------------
  1 | ####################################################################################
  2 | #' Import single cell data
  3 | ####################################################################################
  4 | #' 
  5 | #' Single-cell RNA-seq (scRNA-seq) data can be imported in three different ways. 
  6 | #' First, 10X Genomics output in tsv/mtx format can be read directly with 
  7 | #' "read10X_mtx"; the user only needs to provide the folder path. Second, 
  8 | #' 10X Genomics output in h5 format can be read with "read10X_h5"; the user 
  9 | #' provides the path of the h5 file. Lastly, a count matrix with its cell and 
 10 | #' gene tables can be imported manually using "readsc".
 11 | #' 
 12 | #' @useDynLib RISC
 13 | #' @importFrom Rcpp evalCpp
 14 | #' @importFrom methods as new
 15 | #' @importFrom utils head read.table
 16 | #' @importFrom Matrix readMM colSums rowSums
 17 | #' @rdname SingleCellData
 18 | #' @return SingleCellData
 19 | #' @param assay The list of gene counts.
 20 | #' @param coldata The data.frame with cell information.
 21 | #' @param rowdata The data.frame with gene information.
 22 | #' @name SingleCellData
 23 | 
 24 | SingleCellData <- function(assay, coldata, rowdata){
 25 |   new(Class = 'RISCdata', assay = assay, coldata = coldata, rowdata = rowdata)
 26 | }
 27 | 
 28 | 
 29 | 
 30 | ####################################################################################
 31 | #' Example data
 32 | ####################################################################################
 33 | #' 
 34 | #' @docType data
 35 | #' @usage data(raw.mat)
 36 | #' @format A list including a simulated cell-gene matrix, columns for cells and 
 37 | #' rows for genes, a cell group and a batch information. 
 38 | "raw.mat"
 39 | 
 40 | 
 41 | 
 42 | ####################################################################################
 43 | #' Import data from matrix, cell and genes directly.
 44 | ####################################################################################
 45 | #' 
 46 | #' Import a data set directly from a count matrix plus cell and gene tables. The 
 47 | #' user needs three inputs: a matrix of gene expression values, raw counts/UMIs 
 48 | #' (rows for genes, columns for cells), a cell table whose row.names are equal to 
 49 | #' the col.names of the matrix, and a gene table whose row.names are the same as 
 50 | #' the row.names of the matrix. If the row.names of the matrix are Ensembl IDs, the 
 51 | #' user needs to convert them to gene symbols manually.
 52 | #' 
 53 | #' @rdname Import-Matrix
 54 | #' @param count Matrix with raw counts/UMIs.
 55 | #' @param cell Data.frame with cell barcodes, whose row.names are equal to the 
 56 | #' col.names of the matrix.
 57 | #' @param gene Data.frame with gene symbols, whose row.names are the same as the 
 58 | #' row.names of the matrix.
 59 | #' @param is.filter Whether to remove genes that are not expressed in any cell.
 60 | #' @return RISC single cell dataset, including count, coldata, and rowdata.
 61 | #' @name readsc
 62 | #' @export
 63 | #' @examples 
 64 | #' mat0 = as.matrix(raw.mat[[1]])
 65 | #' coldata0 = as.data.frame(raw.mat[[2]])
 66 | #' coldata.obj = coldata0[coldata0$Batch0 == 'Batch3',]
 67 | #' matrix.obj = mat0[,rownames(coldata.obj)]
 68 | #' obj0 = readsc(count = matrix.obj, cell = coldata.obj, 
 69 | #'        gene = data.frame(Symbol = rownames(matrix.obj), 
 70 | #'        row.names = rownames(matrix.obj)), is.filter = FALSE)
 71 | 
 72 | readsc <- function(
 73 |   count, 
 74 |   cell, 
 75 |   gene, 
 76 |   is.filter = TRUE
 77 | ) {
 78 |   
 79 |   if(!missing(count) && !missing(cell) && !missing(gene)){
 80 |     # exists() is always TRUE for formal arguments, so missing() is used instead
 81 |     if(all(colnames(count) == rownames(cell)) && all(rownames(count) == rownames(gene))){
 82 |       
 83 |       run.matrix = as(count, 'CsparseMatrix')
 84 |       # run.matrix = as.matrix(count)
 85 |       
 86 |       mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
 87 |       run.cell = data.frame(scBarcode = rownames(cell), UMI = Matrix::colSums(run.matrix), nGene = Matrix::colSums(run.matrix > 0), row.names = rownames(cell), stringsAsFactors = FALSE)
 88 |       run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
 89 |       run.cell = cbind.data.frame(run.cell, cell)
 90 |       
 91 |       run.gene = data.frame(Symbol = rownames(run.matrix), RNA = "Gene Expression", row.names = rownames(run.matrix), stringsAsFactors = FALSE)
 92 |       run.gene$nCell = Matrix::rowSums(run.matrix > 0)
 93 |       if(is.filter){
 94 |         run.gene = run.gene[run.gene$nCell > 0,]
 95 |       } else {
 96 |         run.gene = run.gene
 97 |       }
 98 |       run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]
 99 |       
100 |       SingleCellData(assay = list(count = as(run.matrix, 'CsparseMatrix')), rowdata = data.frame(run.gene, stringsAsFactors = FALSE), coldata = data.frame(run.cell, stringsAsFactors = FALSE))
101 |       
102 |     } else {
103 |       stop('Matrix colnames or rownames are not equal to cell name or gene name')
104 |     }
105 |     
106 |   } else {
107 |     stop('No matrix, cell or gene is found here')
108 |   }
109 |   
110 | }
111 | 
112 | 
113 | 
114 | ####################################################################################
115 | #' Import data from 10X Genomics output (tsv-mtx).
116 | ####################################################################################
117 | #' 
118 | #' Import data directly from 10X Genomics output, usually the filtered gene matrix 
119 | #' folder with three files: matrix.mtx, barcodes.tsv and features.tsv. The user 
120 | #' only needs to pass the directory to "data.path". If the files are not original 
121 | #' 10X Genomics output, the user has to make sure that barcodes.tsv and features.tsv 
122 | #' have no col.names, that barcodes.tsv has at least one column for the cell 
123 | #' barcode, and that features.tsv has two columns for gene Ensembl ID and Symbol.
124 | #' 
125 | #' @rdname Import-10X-mtx
126 | #' @param data.path Directory containing the filtered 10X Genomics output, 
127 | #' including three files: matrix.mtx, barcodes.tsv (without colnames) and 
128 | #' features.tsv (without colnames).
129 | #' @param sep The field separator of barcodes.tsv and features.tsv; the default is a tab.
130 | #' @param is.filter Whether to remove genes that are not expressed in any cell.
131 | #' @return RISC single cell dataset, including count, coldata, and rowdata.
132 | #' @name read10X_mtx
133 | #' @export
134 | 
135 | read10X_mtx <- function(
136 |   data.path, 
137 |   sep = '\t', 
138 |   is.filter = TRUE
139 |   ) {
140 |   
141 |   if(missing(data.path)){
142 |     stop('Please input data.path')
143 |   } else {
144 |     data.path = as.character(data.path)
145 |   }
146 |   
147 |   sep0 = sep
148 |   files = list.files(path = data.path, full.names = TRUE)
149 |   file.matrix = grep('matrix.mtx', files, ignore.case = TRUE, value = TRUE)
150 |   file.gene = grep(pattern = 'features.tsv', files, ignore.case = TRUE, value = TRUE)
151 |   file.cell = grep('barcodes.tsv', files, ignore.case = TRUE, value = TRUE)
152 |   
153 |   if(length(file.matrix) == 1 & length(file.gene) == 1 & length(file.cell) == 1){
154 |     
155 |     run.matrix = readMM(file = file.matrix)
156 |     run.matrix = as(run.matrix, 'CsparseMatrix')
157 |     
158 |     run.gene = read.table(file = file.gene, header = FALSE, sep = sep0, stringsAsFactors = FALSE)
159 |     run.gene = data.frame(run.gene, stringsAsFactors = FALSE)
160 |     if(ncol(run.gene) > 2){
161 |       colnames(run.gene) = c('Ensembl', 'Symbol', 'RNA')
162 |     } else {
163 |       colnames(run.gene) = c('Ensembl', 'Symbol')
164 |       run.gene$RNA = "Gene Expression"
165 |     }
166 |     run.gene$nCell = Matrix::rowSums(run.matrix > 0)
167 |     run.gene$Symbol = make.unique(run.gene$Symbol)
168 |     rownames(run.matrix) = rownames(run.gene) = run.gene$Symbol
169 |     if(is.filter){
170 |       run.gene = run.gene[run.gene$nCell > 0,]
171 |     } else {
172 |       run.gene = run.gene
173 |     }
174 |     run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]
175 |     
176 |     mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
177 |     run.cell = read.table(file = file.cell, header = FALSE, sep = sep0, stringsAsFactors = FALSE)
178 |     # run.cell = sapply(run.cell$V1, function(x){strsplit(x, '-', fixed = T)[[1]][[1]]})
179 |     run.cell = data.frame(scBarcode = as.character(run.cell$V1), UMI = Matrix::colSums(run.matrix), nGene = Matrix::colSums(run.matrix > 0), stringsAsFactors = FALSE)
180 |     run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
181 |     colnames(run.matrix) = rownames(run.cell) = run.cell$scBarcode
182 |     
183 |   } else {
184 |     stop('The directory is invalid, please input the dir including files: "barcodes.tsv", "features.tsv", "matrix.mtx"')
185 |   }
186 |   
187 |   SingleCellData(assay = list(count = as(run.matrix, 'CsparseMatrix')), rowdata = data.frame(run.gene, stringsAsFactors = FALSE), coldata = data.frame(run.cell, stringsAsFactors = FALSE))
188 |   
189 | }
190 | 
191 | 
192 | 
193 | ####################################################################################
194 | #' Import data from 10X Genomics output (h5).
195 | ####################################################################################
196 | #' 
197 | #' Import data directly from 10X Genomics output, usually the filtered gene 
198 | #' matrix stored in an h5 file. 
199 | #' The user only needs to pass the path of the h5 file to "file.path". If the data 
200 | #' are not original 10X Genomics output, the user can use the 'readsc' function.
201 | #' 
202 | #' @rdname Import-10X-h5
203 | #' @param file.path The path of the filtered 10X Genomics output (h5 file).
204 | #' @param is.filter Whether to remove genes that are not expressed in any cell.
205 | #' @importFrom hdf5r H5File
206 | #' @return RISC single cell dataset, including count, coldata, and rowdata.
207 | #' @name read10X_h5
208 | #' @export
209 | 
210 | read10X_h5 <- function(
211 |   file.path, 
212 |   is.filter = TRUE
213 | ) {
214 |   
215 |   if(missing(file.path)){
216 |     stop('Please input file.path')
217 |   } else {
218 |     file.path = as.character(file.path)
219 |   }
220 |   
221 |   files = H5File$new(filename = file.path, mode = "r")
222 |   key = files$names
223 |   
224 |   if(!is.null(files[[key]])){
225 |     
226 |     if(key == 'matrix'){
227 |       
228 |       run.cell = data.frame(scBarcode = files[['matrix/barcodes']][], row.names = files[['matrix/barcodes']][])
229 |       run.gene = data.frame(Symbol = make.unique(files[['matrix/features/name']][]), Ensembl = files[['matrix/features/id']][], row.names = make.unique(files[['matrix/features/name']][]))
230 |       run.matrix = Matrix::sparseMatrix(dims = files[['matrix/shape']][], x = files[['matrix/data']][], i = files[['matrix/indices']][], p = files[['matrix/indptr']][], index1 = FALSE)
231 |       files$close_all()
232 |       
233 |       run.matrix = as(run.matrix, 'CsparseMatrix')
234 |       run.gene$RNA = "Gene Expression"
235 |       run.gene$nCell = Matrix::rowSums(run.matrix > 0)
236 |       rownames(run.matrix) = rownames(run.gene)
237 |       
238 |       if(is.filter){
239 |         run.gene = run.gene[run.gene$nCell > 0,]
240 |       } else {
241 |         run.gene = run.gene
242 |       }
243 |       run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]
244 |       
245 |       mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
246 |       run.cell$UMI = Matrix::colSums(run.matrix)
247 |       run.cell$nGene = Matrix::colSums(run.matrix > 0)
248 |       run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
249 |       colnames(run.matrix) = rownames(run.cell)
250 |       
251 |     } else {
252 |       
253 |       run.cell = data.frame(scBarcode = files[[paste0(key, '/barcodes')]][], row.names = files[[paste0(key, '/barcodes')]][])
254 |       run.gene = data.frame(Symbol = make.unique(files[[paste0(key, '/gene_names')]][]), Ensembl = files[[paste0(key, '/genes')]][], row.names = make.unique(files[[paste0(key, '/gene_names')]][]))
255 |       run.matrix = Matrix::sparseMatrix(dims = files[[paste0(key, '/shape')]][], x = files[[paste0(key, '/data')]][], i = files[[paste0(key, '/indices')]][], p = files[[paste0(key, '/indptr')]][], index1 = FALSE)
256 |       files$close_all()
257 |       
258 |       run.matrix = as(run.matrix, 'CsparseMatrix')
259 |       run.gene$RNA = "Gene Expression"
260 |       run.gene$nCell = Matrix::rowSums(run.matrix > 0)
261 |       rownames(run.matrix) = rownames(run.gene)
262 |       
263 |       if(is.filter){
264 |         run.gene = run.gene[run.gene$nCell > 0,]
265 |       } else {
266 |         run.gene = run.gene
267 |       }
268 |       run.matrix = run.matrix[rownames(run.matrix) %in% run.gene$Symbol,]
269 |       
270 |       mito.gene = grep(pattern = '^mt-', x = rownames(run.matrix), ignore.case = TRUE, value = TRUE)
271 |       run.cell$UMI = Matrix::colSums(run.matrix)
272 |       run.cell$nGene = Matrix::colSums(run.matrix > 0)
273 |       run.cell$mito = Matrix::colSums(run.matrix[rownames(run.matrix) %in% mito.gene, , drop = FALSE]) / run.cell$UMI
274 |       colnames(run.matrix) = rownames(run.cell)
275 |       
276 |     }
277 |     
278 |   } else {
279 |     files$close_all()  # close the file handle before raising the error
280 |     stop('The h5 file is invalid, please input correct file path.')
281 |   }
282 |   
283 |   SingleCellData(assay = list(count = as(run.matrix, 'CsparseMatrix')), rowdata = data.frame(run.gene, stringsAsFactors = FALSE), coldata = data.frame(run.cell, stringsAsFactors = FALSE))
284 |   
285 | }
286 | 
287 | 
288 | 
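
The per-cell and per-gene summaries that the import functions above attach to the data (UMI, nGene, mito, nCell) can be reproduced outside the package. The following is a minimal sketch rather than the package's own code: it uses only the Matrix package, and the toy gene and cell names are made up for illustration.

```r
library(Matrix)

# Toy count matrix: 4 genes (rows) x 3 cells (columns). All names are hypothetical.
counts <- Matrix(
  c(5, 0, 2,
    0, 0, 1,
    3, 4, 0,
    0, 0, 0),
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(c("mt-Nd1", "Actb", "Gapdh", "Xist"), c("c1", "c2", "c3"))
)

# Genes whose symbol starts with "mt-" (case-insensitive) are treated as mitochondrial.
mito.gene <- grep("^mt-", rownames(counts), ignore.case = TRUE, value = TRUE)

# Per-cell summaries: total UMIs, number of detected genes, mitochondrial fraction.
qc <- data.frame(
  scBarcode = colnames(counts),
  UMI = Matrix::colSums(counts),
  nGene = Matrix::colSums(counts > 0),
  row.names = colnames(counts),
  stringsAsFactors = FALSE
)
# drop = FALSE keeps a one-row selection as a matrix, so colSums() still applies.
is.mito <- rownames(counts) %in% mito.gene
qc$mito <- Matrix::colSums(counts[is.mito, , drop = FALSE]) / qc$UMI

# Per-gene summary: number of cells expressing each gene; drop unexpressed genes.
nCell <- Matrix::rowSums(counts > 0)
kept <- counts[as.vector(nCell > 0), , drop = FALSE]

print(qc)
print(rownames(kept))
```

The `drop = FALSE` in the mitochondrial subset matters when exactly one gene matches the pattern, because a one-row selection would otherwise collapse to a plain vector.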


--------------------------------------------------------------------------------
/R/AllGenerics.R:
--------------------------------------------------------------------------------
 1 | ####################################################################################
 2 | #' RISC data
 3 | ####################################################################################
 4 | #' 
 5 | #' The RISC object contains all the basic information used in single cell RNA-seq 
 6 | #' analysis, including raw counts/UMIs, normalized gene values, dimension reduction, 
 7 | #' cell clustering, and so on. The framework of the RISC object is an S4 dataset, 
 8 | #' consisting of assay, coldata, rowdata, metadata, vargene, cluster, and 
 9 | #' DimReduction.
10 | #'
11 | #' @rdname setClass
12 | #' @return RISC object: an S4 framework dataset
13 | #' @slot assay The list of gene counts/UMIs: raw and normalized counts
14 | #' @slot coldata The data.frame with cell information, such as cell types, stages, 
15 | #' and other factors.
16 | #' @slot rowdata The data.frame with gene information, such as coding or non-coding 
17 | #' genes.
18 | #' @slot metadata The list of metadata values.
19 | #' @slot vargene The highly variable genes. 
20 | #' These genes are utilized in dimension reduction.
21 | #' @slot cluster The cell clustering information: a factor with the cluster of 
22 | #' each cell.
23 | #' @slot DimReduction The list of dimension reduction results.
24 | #' @docType class
25 | #' @exportClass RISCdata
26 | 
27 | setClass(
28 |   'RISCdata', slots = list(
29 |     assay = 'list',
30 |     coldata = 'data.frame',
31 |     rowdata = 'data.frame',
32 |     metadata = 'list',
33 |     cluster = 'factor',
34 |     DimReduction = 'list',
35 |     vargene = 'vector'
36 |   )
37 | )
38 | 
39 | #' RISC data
40 | #' 
41 | #' This will show the full information of RISC object, including the number of cells, 
42 | #' the number of genes, any biological or statistical information of cells or genes.
43 | #' 
44 | #' @rdname setMethod
45 | #' @name RISCdata
46 | #' @aliases RISC object
47 | #' @param object RISC object: an S4 framework dataset
48 | #' @docType methods
49 | 
50 | .RISC_show <- function(object){
51 |   cat(
52 |     "SingleCell-Dataset", '\n',
53 |     "RISC v1.7", '\n',
54 |     c('assay:', names(object@assay)), '\n',
55 |     c(paste0('colData: ', '(', nrow(object@coldata), ')'), colnames(object@coldata)), '\n',
56 |     c(paste0('rowData: ', '(', nrow(object@rowdata), ')'), colnames(object@rowdata)), '\n',
57 |     'DimReduction', '\n',
58 |     'Cell-Clustering', '\n'
59 |   )
60 | }
61 | 
62 | setMethod(
63 |   f = 'show',
64 |   signature = 'RISCdata',
65 |   definition = .RISC_show
66 | )
67 | 
68 | 
69 | 
70 | 
71 | 
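
The pattern used in this file, an S4 class with a custom `show` method, can be tried on a much smaller class. The sketch below is illustrative only: the class name `ToyData` and its two slots are hypothetical and are not part of RISC.

```r
library(methods)

# Hypothetical toy class with two slots, mirroring the idea of assay + coldata.
setClass("ToyData", slots = list(assay = "list", coldata = "data.frame"))

# Custom show(): print a compact summary instead of the full slot contents.
setMethod("show", "ToyData", function(object) {
  cat(
    "ToyData", "\n",
    "assay:", names(object@assay), "\n",
    paste0("colData: (", nrow(object@coldata), ")"), colnames(object@coldata), "\n"
  )
})

toy <- new(
  "ToyData",
  assay = list(count = matrix(1:6, nrow = 2)),
  coldata = data.frame(Batch = c("a", "b", "c"))
)

# Auto-printing an S4 object at top level dispatches to the show() method.
toy
```

Because printing goes through `show`, typing the object name at the console gives the summary rather than a dump of every slot.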


--------------------------------------------------------------------------------
/R/Cluster.R:
--------------------------------------------------------------------------------
  1 | ####################################################################################
  2 | #' Clustering cells
  3 | ####################################################################################
  4 | #' 
  5 | #' In RISC, two different methods are provided to cluster cells, both of which are 
  6 | #' widely used in single-cell analysis. The first method is "louvain", based on cell 
  7 | #' eigenvectors, and the other is "density", which calculates cell clusters in a 
  8 | #' low dimensional space. 
  9 | #' 
 10 | #' @rdname Cluster
 11 | #' @param object RISC object: a framework dataset.
 12 | #' @param method The methods for clustering cells, density and louvain. 
 13 | #' The "density" is based on the slot "cell.umap" or other low dimensional 
 14 | #' space; while "louvain" based on "cell.pca" (individual data) or "cell.pls" 
 15 | #' (for integration data).
 16 | #' @param slot The dimension_reduction slot for cell clustering. The default is 
 17 | #' "cell.pca" under the "DimReduction" item of the RISC object, but the user can 
 18 | #' add a new dimension_reduction result under DimReduction and use it.
 19 | #' @param neighbor The number of nearest neighbors used to build the graph for the "louvain" method.
 20 | #' @param algorithm The algorithm for knn, the default is "kd_tree", all options: 
 21 | #' "kd_tree", "cover_tree", "CR", "brute".
 22 | #' @param npc The number of PCA or PLS components used for cell clustering.
 23 | #' @param k The number of clusters searched for; used by the "density" method.
 24 | #' @param res The resolution of the clusters searched for; used by the "louvain" method.
 25 | #' @param dc The distance used to generate random center points, which affects the 
 26 | #' clusters. If unsure, do not input anything and keep the default value, as most 
 27 | #' users should. Used by the "density" method.
 28 | #' @param redo Whether to re-cluster the cells.
 29 | #' @param random.seed The random seed, the default is 123.
 30 | #' @return RISC single cell dataset, the cluster slot. 
 31 | #' @name scCluster
 32 | #' @importFrom densityClust densityClust findClusters
 33 | #' @importFrom FNN get.knn
 34 | #' @importFrom igraph simplify graph_from_data_frame cluster_louvain E E<-
 35 | #' @references Blondel et al., JSTAT (2008)
 36 | #' @references Rodriguez et al., Science (2014)
 37 | #' @export
 38 | #' @examples 
 39 | #' # RISC object
 40 | #' obj0 = raw.mat[[3]]
 41 | #' obj0 = scPCA(obj0, npc = 10)
 42 | #' obj0 = scUMAP(obj0, npc = 3)
 43 | #' obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density')
 44 | #' DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2)
 45 | 
 46 | scCluster <- function(
 47 |   object, 
 48 |   slot = "cell.pca", 
 49 |   neighbor = 10, 
 50 |   algorithm = "kd_tree", 
 51 |   method = 'louvain', 
 52 |   npc = 20, 
 53 |   k = 10, 
 54 |   res = 0.5, 
 55 |   dc = NULL, 
 56 |   redo = TRUE, 
 57 |   random.seed = 123
 58 |   ) {
 59 |   
 60 |   set.seed(random.seed)
 61 |   k = as.integer(k)
 62 |   res = as.numeric(res)
 63 |   neighbor = as.integer(neighbor)
 64 |   algorithm = as.character(algorithm)
 65 |   npc = as.integer(npc)
 66 |   random.seed = as.integer(random.seed)
 67 |   
 68 |   if(is.null(dc)){
 69 |     dc = object@metadata$dcluster$dc
 70 |     if(isTRUE(redo)){
 71 |       dc = NULL
 72 |     } else {
 73 |       dc = dc
 74 |     }
 75 |   } else {
 76 |     dc = dc
 77 |   }
 78 |   
 79 |   if(!is.null(object@vargene) & !is.null(object@DimReduction)){
 80 |     
 81 |     slot0 = as.character(slot)
 82 |     dimReduce0 = object@DimReduction[[slot0]]
 83 |     if(is.null(dimReduce0)){
 84 |       stop("This dimension_reduction slot does not exist, try another one")
 85 |     } else if(ncol(dimReduce0) >= npc){
 86 |       count = as.matrix(dimReduce0[,1:npc])
 87 |     } else {
 88 |       count = as.matrix(dimReduce0)
 89 |     }
 90 |     
 91 |     if(method == 'density'){
 92 |       
 93 |       dist0 = dist(count)
 94 |       if(is.null(dc)){
 95 |         dataClust = densityClust(dist0, gaussian = TRUE)
 96 |       } else {
 97 |         dataClust = densityClust(dist0, dc = dc, gaussian = TRUE)
 98 |       }
 99 |       
100 |       delta.rho = data.frame(rho = dataClust$rho, delta = dataClust$delta, stringsAsFactors = FALSE)
101 |       delta.rho = delta.rho[order(delta.rho$delta, decreasing = TRUE),]
102 |       delta.cut = delta.rho$delta[k + 1L]
103 |       clust0 = findClusters(dataClust, 0, delta.cut)
104 |       # object@metadata$dcluster = clust0
105 |       object@cluster = object@coldata$Cluster = as.factor(clust0$clusters)
106 |       names(object@cluster) = names(object@coldata$Cluster) = rownames(count)
107 |       object@metadata[['clustering']] = data.frame(
108 |         Method = 'densityClust', Distance = as.numeric(clust0$dc), 
109 |         rho = as.numeric(clust0$threshold[1]), delta = as.numeric(clust0$threshold[2]), 
110 |         stringsAsFactors = FALSE
111 |       )
112 |       
113 |     } else if(method == 'louvain') {
114 |       
115 |       clust0 = get.knn(count, k = neighbor, algorithm = algorithm)
116 |       clust1 = data.frame(NodStar = rep(1L:nrow(count), neighbor), NodEnd = as.vector(clust0$nn.index), stringsAsFactors = FALSE)
117 |       clust1 = graph_from_data_frame(clust1, directed = FALSE)
118 |       E(clust1)$weight = 1/(1 + as.vector(clust0$nn.dist))
119 |       clust1 = simplify(clust1)
120 |       clust1 = cluster_louvain(clust1, resolution = res)
121 |       object@cluster = object@coldata$Cluster = as.factor(clust1$membership)
122 |       names(object@cluster) = names(object@coldata$Cluster) = rownames(count)
123 |       object@metadata[['clustering']] = data.frame(Method = 'louvain', PCs = npc, Neighbors = neighbor, stringsAsFactors = FALSE)
124 |       
125 |     } else {stop('Unknown method: please use "density" or "louvain"')}
126 |     
127 |   } else {stop('Please calculate dispersion and dimension reduction first')}
128 |   
129 |   return(object)
130 |   
131 | }
132 | 
133 | 
134 | 
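
The "louvain" branch above builds a k-nearest-neighbor graph from the low dimensional embedding and runs Louvain community detection on it. The sketch below follows the same steps on simulated data, assuming the FNN and igraph packages are installed (igraph recent enough to support the `resolution` argument). The two point clouds, the variable names, and the choice of 5 neighbors are all illustrative assumptions.

```r
library(FNN)
library(igraph)

set.seed(123)
# Two well-separated toy groups of 20 points in 2 dimensions (simulated data).
n <- 20
emb <- rbind(
  matrix(rnorm(n * 2, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(n * 2, mean = 10, sd = 0.3), ncol = 2)
)
group <- rep(c("A", "B"), each = n)
neighbor <- 5

# k nearest neighbors of every point (the point itself is excluded).
knn <- get.knn(emb, k = neighbor, algorithm = "kd_tree")

# Edge list: point i to each of its neighbors. as.vector() walks nn.index
# column by column, which lines up with rep(1:N, neighbor).
edges <- data.frame(
  from = rep(seq_len(nrow(emb)), neighbor),
  to = as.vector(knn$nn.index)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Closer neighbors get larger weights; simplify() merges duplicated edges.
E(g)$weight <- 1 / (1 + as.vector(knn$nn.dist))
g <- simplify(g)

cl <- cluster_louvain(g, resolution = 0.5)

# Vertices are named by point index; reorder membership by that index to be safe.
member <- membership(cl)[as.character(seq_len(nrow(emb)))]
print(table(cluster = as.vector(member), group = group))
```

Since the two simulated groups are far apart, no point has a neighbor in the other group, so each detected cluster should contain points from only one group. The number of clusters found within a group depends on `neighbor` and `resolution`.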


--------------------------------------------------------------------------------
/R/RcppExports.R:
--------------------------------------------------------------------------------
 1 | # Generated by using Rcpp::compileAttributes() -> do not edit by hand
 2 | # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
 3 | 
 4 | sqrt_sp <- function(X) {
 5 |     .Call('_RISC_sqrt_sp', PACKAGE = 'RISC', X)
 6 | }
 7 | 
 8 | cent_sp_d <- function(X) {
 9 |     .Call('_RISC_cent_sp_d', PACKAGE = 'RISC', X)
10 | }
11 | 
12 | multiply_sp_sp <- function(X, Y) {
13 |     .Call('_RISC_multiply_sp_sp', PACKAGE = 'RISC', X, Y)
14 | }
15 | 
16 | multiply_sp_d <- function(X, Y) {
17 |     .Call('_RISC_multiply_sp_d', PACKAGE = 'RISC', X, Y)
18 | }
19 | 
20 | multiply_d_d <- function(X, Y) {
21 |     .Call('_RISC_multiply_d_d', PACKAGE = 'RISC', X, Y)
22 | }
23 | 
24 | multiply_sp_d_sp <- function(X, Y) {
25 |     .Call('_RISC_multiply_sp_d_sp', PACKAGE = 'RISC', X, Y)
26 | }
27 | 
28 | multiply_sp_d_v <- function(X, Y) {
29 |     .Call('_RISC_multiply_sp_d_v', PACKAGE = 'RISC', X, Y)
30 | }
31 | 
32 | crossprod_sp_sp <- function(X, Y) {
33 |     .Call('_RISC_crossprod_sp_sp', PACKAGE = 'RISC', X, Y)
34 | }
35 | 
36 | crossprod_sp_d <- function(X, Y) {
37 |     .Call('_RISC_crossprod_sp_d', PACKAGE = 'RISC', X, Y)
38 | }
39 | 
40 | crossprod_d_d <- function(X, Y) {
41 |     .Call('_RISC_crossprod_d_d', PACKAGE = 'RISC', X, Y)
42 | }
43 | 
44 | tcrossprod_sp_sp <- function(X, Y) {
45 |     .Call('_RISC_tcrossprod_sp_sp', PACKAGE = 'RISC', X, Y)
46 | }
47 | 
48 | tcrossprod_d_d <- function(X, Y) {
49 |     .Call('_RISC_tcrossprod_d_d', PACKAGE = 'RISC', X, Y)
50 | }
51 | 
52 | winsorize_ <- function(x, y) {
53 |     .Call('_RISC_winsorize_', PACKAGE = 'RISC', x, y)
54 | }
55 | 
56 | lm_coef <- function(X, y) {
57 |     .Call('_RISC_lm_coef', PACKAGE = 'RISC', X, y)
58 | }
59 | 
60 | lm_ <- function(X, y) {
61 |     .Call('_RISC_lm_', PACKAGE = 'RISC', X, y)
62 | }
63 | 
64 | 


--------------------------------------------------------------------------------
/R/Reduce_Dimension.R:
--------------------------------------------------------------------------------
  1 | ####################################################################################
  2 | #' Dimension Reduction.
  3 | ####################################################################################
  4 | #' 
  5 | #' Based on the highly variable genes of the dataset, RISC calculates the 
  6 | #' principal components (PCs) of the cells using a truncated SVD (irlba). The 
  7 | #' major PCs, which explain most gene expression variance, are used for dimension reduction. 
  8 | #' 
  9 | #' @rdname PCA
 10 | #' @param object RISC object: a framework dataset.
 11 | #' @param npc The number of PCs to compute from the highly variable genes 
 12 | #' (usually < 1,500); the default is 20.
 13 | #' @return RISC single cell dataset, the DimReduction slot.
 14 | #' @references Jolliffe et al. (2016)
 15 | #' @references Alter et al., PNAS (2000)
 16 | #' @references Gonzalez et al., JSS (2008)
 17 | #' @references Mevik et al., JSS (2007)
 18 | #' @name scPCA
 19 | #' @export
 20 | #' @examples 
 21 | #' # RISC object
 22 | #' obj0 = raw.mat[[3]]
 23 | #' obj0 = scPCA(obj0, npc = 10)
 24 | 
 25 | scPCA <- function(object, npc = 20){
 26 |   
 27 |   if(!length(object@vargene) > 0){
 28 |     stop('Please disperse object first')
 29 |   } else {
 30 |     
 31 |     count = object@assay$logcount
 32 |     var = count[object@vargene,]
 33 |     var = scale(var, center = TRUE, scale = TRUE)
 34 |     varpc = irlba(var, nv = npc)
 35 |     cell.pca = as.matrix(varpc$v)
 36 |     gene.pca = as.matrix(varpc$u)
 37 |     var.pca = varpc$d^2 / sum(varpc$d^2)
 38 |     rownames(cell.pca) = colnames(var)
 39 |     rownames(gene.pca) = rownames(var)
 40 |     colnames(cell.pca) = colnames(gene.pca) = names(var.pca) = paste0('PC', 1L:npc)
 41 |     object@DimReduction[['cell.pca']] = cell.pca
 42 |     object@DimReduction[['var.pca']] = var.pca
 43 |     object@DimReduction[['gene.pca']] = gene.pca
 44 |     return(object)
 45 |     
 46 |   }
 47 | }
 48 | 
 49 | 
 50 | 
 51 | ####################################################################################
 52 | #' Dimension Reduction.
 53 | ####################################################################################
 54 | #' 
 55 | #' The UMAP is calculated from the eigenvectors of the single-cell dataset, and the 
 56 | #' user can select the eigenvectors manually. Of note, the selected eigenvectors 
 57 | #' directly affect the UMAP values. 
 58 | #' For integrated data (the result of the "scMultiIntegrate" function), RISC uses
 59 | #' the PCR output "PLS" to calculate the UMAP; therefore, the user has to set 
 60 | #' use = "PLS" instead of the default "PCA".
 61 | #' 
 62 | #' @rdname UMAP
 63 | #' @param object RISC object: a framework dataset.
 64 | #' @param npc The number of PCs (or PLS components) used for UMAP. The default is 20, 
 65 | #' but it usually needs to be adjusted by the user. Use PCA for an individual dataset 
 66 | #' and PLS for integrated data.
 67 | #' @param embedding The number of components UMAP output.
 68 | #' @param use What components used for UMAP: PCA or PLS.
 69 | #' @param neighbors The n_neighbors parameter of UMAP.
 70 | #' @param dist The min_dist parameter of UMAP.
 71 | #' @param seed The random seed to keep the UMAP result consistent.
 72 | #' @return RISC single cell dataset, the DimReduction slot.
 73 | #' @importFrom umap umap
 74 | #' @references Becht et al., Nature Biotech. (2018)
 75 | #' @name scUMAP
 76 | #' @export
 77 | #' @examples 
 78 | #' # RISC object
 79 | #' obj0 = raw.mat[[3]]
 80 | #' obj0 = scPCA(obj0, npc = 10)
 81 | #' obj0 = scUMAP(obj0, npc = 3)
 82 | #' DimPlot(obj0, slot = "cell.umap", colFactor = 'Group', size = 2)
 83 | 
 84 | scUMAP <- function(
 85 |   object, npc = 20, 
 86 |   embedding = 2, 
 87 |   use = 'PCA', 
 88 |   neighbors = 15, 
 89 |   dist = 0.1, 
 90 |   seed = 123,
 91 |   ...
 92 |   ) {
 93 |   
 94 |   if(use == 'PCA'){
 95 |     
 96 |     if(length(object@DimReduction$cell.pca) == 0){
 97 |       stop('Please run scPCA first')
 98 |     } else {
 99 |       pca0 = FALSE
100 |       pca_center0 = FALSE
101 |       pca_scale0 = FALSE
102 |       cell.pc = object@DimReduction$cell.pca[,1:npc]
103 |     }
104 |     
105 |   } else if(use == 'PLS'){
106 |     
107 |     if(length(object@DimReduction$cell.pls) == 0){
108 |       stop('Please integrate objects first')
109 |     } else {
110 |       pca0 = TRUE
111 |       pca_center0 = TRUE
112 |       pca_scale0 = TRUE
113 |       cell.pc0 = object@DimReduction$cell.pls
114 |       
115 |       if(npc <= ncol(cell.pc0)){
116 |         cell.pc = cell.pc0[,1:npc]
117 |       } else {
118 |         scale.beta = object@metadata$Beta
119 |         scale.beta0 = irlba(scale.beta, nv = npc)
120 |         cell.pc = rbind(scale.beta0$u, scale.beta0$v)
121 |         rownames(cell.pc) = c(rownames(scale.beta), colnames(scale.beta))
122 |         colnames(cell.pc) = paste0('PC', 1L:npc)
123 |         # cell.pc = scale.beta1[order(rownames(scale.beta1), decreasing = F),]
124 |       }
125 |       
126 |     }
127 |     
128 |   } else {
129 |     stop('Input use, PCA or PLS')
130 |   }
131 |   
132 |   set.seed(as.numeric(seed))
133 |   embedding = as.integer(embedding)
134 |   neighbor0 = as.integer(neighbors)
135 |   dist0 = as.numeric(dist)
136 |   umap0 = umap(as.matrix(cell.pc), method = 'naive', n_components = embedding, min_dist = dist0, n_neighbors = neighbor0, ...)
137 |   cell.umap = as.matrix(umap0$layout)
138 |   rownames(cell.umap) = rownames(cell.pc)
139 |   colnames(cell.umap) = paste0('UMAP', 1L:embedding)
140 |   object@DimReduction[['cell.umap']] = cell.umap
141 |   return(object)
142 |   
143 | }
144 | 
145 | 
146 | 
147 | ####################################################################################
148 | #' Dimension Reduction.
149 | ####################################################################################
150 | #' 
151 | #' The t-SNE is calculated from the eigenvectors of the single-cell dataset, and 
152 | #' the user can select the eigenvectors manually. Of note, the selected eigenvectors 
153 | #' directly affect the t-SNE values. 
154 | #' For integrated data (the result of the "scMultiIntegrate" function), RISC uses
155 | #' the PCR output "PLS" to calculate the t-SNE; therefore, the user has to set 
156 | #' use = "PLS" instead of the default "PCA".
157 | #' 
158 | #' @rdname tSNE
159 | #' @param object RISC object: a framework dataset.
160 | #' @param npc The number of PCs (or PLS components) used for t-SNE. The default is 20, 
161 | #' but it usually needs to be adjusted by the user. Use PCA for an individual dataset 
162 | #' and PLS for integrated data.
163 | #' @param embedding The number of components t-SNE output.
164 | #' @param use What components used for t-SNE: PCA or PLS.
165 | #' @param perplexity Perplexity parameter: if the number of cells is small, 
166 | #' decrease this parameter; otherwise t-SNE cannot be calculated.
167 | #' @param seed The random seed to keep the t-SNE result consistent.
168 | #' @return RISC single cell dataset, the DimReduction slot.
169 | #' @importFrom Rtsne Rtsne
170 | #' @references Laurens van der Maaten, JMLR (2014)
171 | #' @name scTSNE
172 | #' @export
173 | #' @examples 
174 | #' # RISC object
175 | #' obj0 = raw.mat[[3]]
176 | #' obj0 = scPCA(obj0, npc = 10)
177 | #' obj0 = scTSNE(obj0, npc = 4, perplexity = 10)
178 | #' DimPlot(obj0, slot = "cell.tsne", colFactor = 'Group', size = 2)
179 | 
180 | scTSNE <- function(
181 |     object, 
182 |     npc = 20,
183 |     embedding = 2,
184 |     use = 'PCA',
185 |     perplexity = 30,
186 |     seed = 123,
187 |     ...
188 |     ) {
189 |   
190 |   npc = as.integer(npc)
191 |   perplexity = as.integer(perplexity)
192 |   
193 |   if(use == 'PCA'){
194 |     
195 |     if(length(object@DimReduction$cell.pca) == 0){
196 |       stop('Please run scPCA first')
197 |     } else {
198 |       pca0 = FALSE
199 |       pca_center0 = FALSE
200 |       pca_scale0 = FALSE
201 |       cell.pc = object@DimReduction$cell.pca[,1:npc]
202 |     }
203 |     
204 |   } else if(use == 'PLS'){
205 |     
206 |     if(length(object@DimReduction$cell.pls) == 0){
207 |       stop('Please integrate objects first')
208 |     } else {
209 |       pca0 = TRUE
210 |       pca_center0 = TRUE
211 |       pca_scale0 = TRUE
212 |       cell.pc0 = object@DimReduction$cell.pls
213 |       
214 |       if(npc <= ncol(cell.pc0)){
215 |         cell.pc = cell.pc0[,1:npc]
216 |       } else {
217 |         scale.beta = object@metadata$Beta
218 |         scale.beta0 = irlba(scale.beta, nv = npc)
219 |         cell.pc = rbind(scale.beta0$u, scale.beta0$v)
220 |         rownames(cell.pc) = c(rownames(scale.beta), colnames(scale.beta))
221 |         colnames(cell.pc) = paste0('PC', 1L:npc)
222 |         # cell.pc = scale.beta1[order(rownames(scale.beta1), decreasing = F),]
223 |       }
224 |       
225 |     }
226 |     
227 |   } else {
228 |     stop('Input use, PCA or PLS')
229 |   }
230 |   
231 |   set.seed(as.numeric(seed))
232 |   embedding = as.integer(embedding)
233 |   tsne0 = Rtsne(as.matrix(cell.pc), dims = embedding, pca = pca0, pca_center = pca_center0, pca_scale = pca_scale0, perplexity = perplexity, ...)
234 |   cell.tsne = as.matrix(tsne0$Y)
235 |   rownames(cell.tsne) = rownames(cell.pc)
236 |   colnames(cell.tsne) = paste0('tSNE', 1L:embedding)
237 |   object@DimReduction[['cell.tsne']] = cell.tsne
238 |   return(object)
239 |   
240 | }
241 | 
242 | 
243 | 


--------------------------------------------------------------------------------
/R/Utilities.R:
--------------------------------------------------------------------------------
  1 | ####################################################################################
  2 | #' Utilities Subset data
  3 | ####################################################################################
  4 | #' 
  5 | #' The "SubSet" function extracts a data subset from the full dataset. It 
  6 | #' subsets not only the coldata and rowdata, but also the raw counts/UMIs. 
  7 | #' After "SubSet", the RISC object needs to be normalized and scaled 
  8 | #' one more time.
  9 | #' 
 10 | #' @rdname Subset
 11 | #' @param object RISC object: a framework dataset.
 12 | #' @param cells The cells are directly used for collecting a data subset.
 13 | #' @param genes The genes are directly used for collecting a data subset.
 14 | #' @name SubSet
 15 | #' @export
 16 | #' @examples 
 17 | #' # RISC object
 18 | #' obj0 = raw.mat[[5]]
 19 | #' obj0
 20 | #' cell1 = rownames(obj0@coldata)[1:15]
 21 | #' obj1 = SubSet(obj0, cells = cell1)
 22 | #' obj1
 23 | 
 24 | SubSet <- function(object, cells = NULL, genes = NULL){
 25 |   
 26 |   coldata0 = object@coldata
 27 |   rowdata0 = object@rowdata
 28 |   raw.assay = object@assay
 29 |   DimReduction0 = object@DimReduction
 30 |   
 31 |   if(is.null(object)){
 32 |     stop('Please input a RISC object')
 33 |   } else if(!is.null(cells) & is.null(genes)){
 34 |     
 35 |     coldata0 = coldata0[rownames(coldata0) %in% cells,]
 36 |     
 37 |     if('Integration' %in% names(object@metadata)){
 38 |       
 39 |       name0 = 'logcount'
 40 |       raw.assay = raw.assay$logcount
 41 |       raw.assay = lapply(raw.assay, FUN = function(y){y[, colnames(y) %in% rownames(coldata0), drop = F]})
 42 |       raw.assay[sapply(raw.assay, function(x){dim(x)[2] == 0})] = NULL
 43 |       rowsum0 = lapply(raw.assay, FUN = function(y){Matrix::rowSums(y > 0)})
 44 |       rowsum0 = do.call(cbind, rowsum0)
 45 |       keep = Matrix::rowSums(rowsum0) > 0
 46 |       gene0 = rownames(object@rowdata)[keep]
 47 |       raw.assay = lapply(raw.assay, FUN = function(y){y[gene0, , drop = FALSE]})
 48 |       raw.assay[sapply(raw.assay, function(x){dim(x)[1] == 0})] = NULL
 49 |       
 50 |     } else {
 51 |       
 52 |       name0 = names(raw.assay)
 53 |       raw.assay = raw.assay[[name0]]
 54 |       raw.assay = raw.assay[, rownames(coldata0), drop = FALSE]
 55 |       keep = Matrix::rowSums(raw.assay > 0) > 0
 56 |       gene0 = rownames(object@rowdata)[keep]
 57 |       raw.assay = raw.assay[gene0,]
 58 |       
 59 |       if(name0 == 'count'){
 60 |         coldata0$UMI = Matrix::colSums(raw.assay)
 61 |         coldata0$nGene = Matrix::colSums(raw.assay > 0)
 62 |       } else {
 63 |         coldata0 = coldata0
 64 |       }
 65 |       
 66 |     }
 67 |     
 68 |     rowdata0 = rowdata0[gene0,]
 69 |     
 70 |     if(length(DimReduction0) > 0){
 71 |       
 72 |       for(key0 in names(DimReduction0)){
 73 |         if(!key0 %in% c("var.pca", "gene.pca")){
 74 |           DimReduction0[[key0]] = DimReduction0[[key0]][rownames(DimReduction0[[key0]]) %in% cells,]
 75 |         }
 76 |         else{
 77 |           DimReduction0[[key0]] = DimReduction0[[key0]]
 78 |         }
 79 |       }
 80 |       
 81 |     } else {
 82 |       DimReduction0 = DimReduction0
 83 |     }
 84 |     
 85 |   } else if(!is.null(cells) & !is.null(genes)){
 86 |     
 87 |     coldata0 = coldata0[rownames(coldata0) %in% cells,]
 88 |     rowdata0 = rowdata0[rownames(rowdata0) %in% genes,]
 89 |     gene0 = rownames(rowdata0)
 90 |     
 91 |     if('Integration' %in% names(object@metadata)){
 92 |       
 93 |       name0 = 'logcount'
 94 |       raw.assay = raw.assay$logcount
 95 |       raw.assay = lapply(raw.assay, FUN = function(y){y[gene0, colnames(y) %in% rownames(coldata0), drop = FALSE]})
 96 |       raw.assay[sapply(raw.assay, function(x){dim(x)[1] == 0})] = NULL
 97 |       raw.assay[sapply(raw.assay, function(x){dim(x)[2] == 0})] = NULL
 98 |       
 99 |     } else {
100 |       
101 |       name0 = names(raw.assay)
102 |       raw.assay = raw.assay[[name0]]
103 |       raw.assay = raw.assay[gene0, rownames(coldata0), drop = FALSE]
104 |       
105 |       if(name0 == 'count'){
106 |         coldata0$UMI = Matrix::colSums(raw.assay)
107 |         coldata0$nGene = Matrix::colSums(raw.assay > 0)
108 |       } else {
109 |         coldata0 = coldata0
110 |       }
111 |       
112 |     }
113 |     
114 |     if(length(DimReduction0) > 0){
115 |       
116 |       for(key0 in names(DimReduction0)){
117 |         if(!key0 %in% c("var.pca", "gene.pca")){
118 |           DimReduction0[[key0]] = DimReduction0[[key0]][rownames(DimReduction0[[key0]]) %in% cells,]
119 |         }
120 |         else{
121 |           DimReduction0[[key0]] = DimReduction0[[key0]]
122 |         }
123 |       }
124 |       
125 |     } else {
126 |       DimReduction0 = DimReduction0
127 |     }
128 |     
129 |   } else if(is.null(cells) & !is.null(genes)){
130 |   	
131 |     rowdata0 = rowdata0[rownames(rowdata0) %in% genes,]
132 |     gene0 = rownames(rowdata0)
133 |     
134 |     if('Integration' %in% names(object@metadata)){
135 |       
136 |       name0 = 'logcount'
137 |       raw.assay = raw.assay$logcount
138 |       raw.assay = lapply(raw.assay, FUN = function(y){y[gene0, , drop = FALSE]})
139 |       raw.assay[sapply(raw.assay, function(x){dim(x)[1] == 0})] = NULL
140 |       
141 |     } else {
142 |       
143 |       name0 = names(raw.assay)
144 |       raw.assay = raw.assay[[name0]]
145 |       raw.assay = raw.assay[gene0, , drop = FALSE]
146 |       
147 |       if(name0 == 'count'){
148 |         coldata0$UMI = Matrix::colSums(raw.assay)
149 |         coldata0$nGene = Matrix::colSums(raw.assay > 0)
150 |       } else {
151 |         coldata0 = coldata0
152 |       }
153 |       
154 |     }
155 |     
156 |   } else {stop('No parameters')}
157 |   
158 |   object@coldata = data.frame(coldata0)
159 |   object@rowdata = data.frame(rowdata0)
160 |   object@DimReduction = DimReduction0
161 |   object@assay = list(raw.assay)
162 |   names(object@assay) = name0
163 |   object@cluster = factor()
164 |   
165 |   return(object)
166 |   
167 | }
168 | 
169 | 
170 | 
171 | ####################################################################################
172 | #' Utilities Add Factors
173 | ####################################################################################
174 | #' 
175 | #' The "AddFactor" function adds one or more factors to the coldata (or rowdata) 
176 | #' of the full dataset. The row.names/names of the factor matrix/vector should 
177 | #' match the row.names of the coldata (or rowdata) of the RISC object.
178 | #' 
179 | #' @rdname AddFactor
180 | #' @param object RISC object: a framework dataset.
181 | #' @param colData Input the names that will be added into coldata of RISC object, 
182 | #' it should be characters, as the col.names of coldata.
183 | #' @param rowData Input the names that will be added into rowdata of RISC object,
184 | #' it should be characters, as the col.names of rowdata.
185 | #' @param value The factor vector or data.frame that will be added into coldata 
186 | #' or rowdata, the vector/data.frame should have equal names/row.names to the 
187 | #' row.names coldata or rowdata of RISC object. The input: vector or data.frame.
188 | #' @name AddFactor
189 | 
190 | AddFactor <- function(object, colData = NULL, rowData = NULL, value = NULL){
191 |   
192 |   coldata0 = as.data.frame(object@coldata)
193 |   col.name0 = colnames(coldata0)
194 |   rowdata0 = as.data.frame(object@rowdata)
195 |   row.name0 = colnames(rowdata0)
196 |   
197 |   if(is.null(object)) {
198 |     stop('Please input a RISC object')
199 |   } else if(!is.null(colData)) {
200 |     
201 |     colData = as.character(colData)
202 |     if(inherits(value, "data.frame")) {
203 |       coldata1 = data.frame(coldata0, colData = value)
204 |       colnames(coldata1) = c(col.name0, colData)
205 |       object@coldata = coldata1
206 |     } else {
207 |       coldata1 = data.frame(coldata0, colData = value)
208 |       colnames(coldata1) = c(col.name0, colData)
209 |       object@coldata = coldata1
210 |     }
211 |     
212 |   } else if(!is.null(rowData)){
213 |     
214 |     rowData = as.character(rowData)
215 |     if(inherits(value, "data.frame")) {
216 |       rowdata1 = data.frame(rowdata0, colData = value)
217 |       colnames(rowdata1) = c(row.name0, rowData)
218 |       object@rowdata = rowdata1
219 |     } else {
220 |       rowdata1 = data.frame(rowdata0, colData = value)
221 |       colnames(rowdata1) = c(row.name0, rowData)
222 |       object@rowdata = rowdata1
223 |     }
224 |     
225 |   } else {
226 |     stop('Value should be vector or data.frame, of which the name/row.names should be equal to row.names of coldata or rowdata')
227 |   }
228 |   
229 |   return(object)
230 |   
231 | }
232 | 
233 | 
234 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## RISC
 2 | 
 3 | 
 4 | ### Overview
 5 | Integrated analysis of single cell RNA-sequencing (scRNA-seq) data from multiple batches or studies is often necessary in order to learn functional changes in cellular states upon experimental perturbations or cell type relationships in a developmental lineage. Here we introduce a new algorithm (RPCI) that uses the gene-eigenvectors from a reference dataset to establish a global frame for projecting all datasets, with a clear advantage in preserving genuine gene expression differences in matching cell types between samples, such as those present in cells at distinct developmental stages or in perturbed vs control studies. This R package “RISC” (Robust Integration of Single Cell RNA-seq data) implements the RPCI algorithm, with additional functions for scRNA-seq analysis, such as clustering cells, identifying cluster marker genes, detecting differentially expressed genes between experimental conditions, and importantly outputting integrated gene expression values for downstream data analysis.
 6 | 
 7 | #### RISC v1.7 update, Mar 26, 2024
 8 | This version mainly solves the problems caused by the update of the dependent package igraph. We also add a clustering resolution parameter ('res') to the InPlot and scCluster (for the 'louvain' method) functions. Additionally, the mean expression and the percentage of expressing cells are included for both groups in the marker and differential expression results. Installation note: the dependent package “sparseMatrixStats” is available from Bioconductor only (it cannot be installed from GitHub).
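
The effect of a resolution parameter can be illustrated outside RISC. The following is a minimal sketch of Louvain clustering on a k-nearest-neighbor graph built with FNN and igraph, assuming toy data in place of a real PCA matrix. The names `pcs`, `k` and `res` are illustrative only, and the code is not a substitute for calling scCluster.

```r
library(igraph)

set.seed(1)
# Toy stand-in for a cell-by-PC matrix: 60 cells, 5 components, two distant groups
pcs <- rbind(matrix(rnorm(150, mean = 0), ncol = 5),
             matrix(rnorm(150, mean = 8), ncol = 5))
n <- nrow(pcs)
k <- 10     # number of nearest neighbors per cell (assumed value)
res <- 1    # Louvain resolution (assumed value)

# k-nearest-neighbor search; nn.index and nn.dist are n x k matrices
nn <- FNN::get.knn(pcs, k = k)

# One edge per (cell, neighbor) pair, weighted by inverse distance
edges <- data.frame(from = rep(seq_len(n), k), to = as.vector(nn$nn.index))
g <- graph_from_data_frame(edges, directed = FALSE)
E(g)$weight <- 1 / (1 + as.vector(nn$nn.dist))
g <- simplify(g)  # merge duplicated edges and drop self-loops

# Louvain community detection on the weighted graph
cl <- cluster_louvain(g, resolution = res)
clusters <- setNames(cl$membership, V(g)$name)[as.character(seq_len(n))]
table(clusters)
```

In igraph, a higher `resolution` tends to give more, smaller communities, and a lower value gives fewer, larger ones.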
 9 | 
10 | #### RISC v1.6 update
11 | This version mainly solves the problems caused by dependent package updates.
12 | 
13 | #### RISC v1.5 update
14 | Changes from the last release (v1.0) <br />
15 | (1) Replace dependent "RcppEigen" with "RcppArmadillo", fully support sparse matrix in core functions. <br />
16 | (2) Replace dependent "pbmcapply" with "pbapply" <br />
17 | (3) Optimize "scMultiIntegrate" function and reduce memory consumption; the new RISC release can support integration of datasets with >1.5 million cells and 10,000 genes. <br />
18 | (4) During data integration, all the genes expressed in individual datasets are retained in the integrated data. The genes expressed in common across samples are labeled in "rowdata" of the RISC object. <br />
19 | (5) Convert "logcounts" in the integrated RISC object "object@assay$logcount" from one large matrix to a list of logcount matrices, one corrected matrix per individual dataset. To output the full integrated matrix, use mat0 = do.call(cbind, object@assay$logcount) <br />
20 | (6) Change function name "readscdata" -> "readsc" <br />
21 | (7) Change function name "read10Xgenomics" -> "read10X_mtx" <br />
22 | (8) Parameter names in some functions are changed. <br />
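
As a minimal sketch of item (5), the snippet below binds a list of per-dataset sparse matrices into one matrix with `do.call(cbind, ...)`. The list `logcount` and its gene and cell names are hypothetical stand-ins for `object@assay$logcount`, assuming every matrix shares the same genes in the same row order.

```r
library(Matrix)

genes <- paste0("Gene", 1:4)

# Hypothetical stand-in for object@assay$logcount: one sparse matrix per dataset
logcount <- list(
  Matrix(c(0, 1.2, 0, 0, 0.5, 0, 2.1, 0, 0, 0, 0, 0.7), nrow = 4, sparse = TRUE,
         dimnames = list(genes, paste0("Set1_Cell", 1:3))),
  Matrix(c(0, 0, 1.1, 0, 0.9, 0, 0, 0), nrow = 4, sparse = TRUE,
         dimnames = list(genes, paste0("Set2_Cell", 1:2)))
)

# Column-bind all datasets into one genes x cells matrix
mat0 <- do.call(cbind, logcount)
dim(mat0)
colnames(mat0)
```

The result stays sparse, so this step is usually cheap compared with converting the data to a dense matrix.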
23 | 
24 | Added new functions <br />
25 | (1) In "scMarker" and "AllMarker" functions, add Wilcoxon Rank Sum and Signed Rank model. <br />
26 | (2) In "scMarker", "AllMarker" and "scDEG" functions, add pseudo-cell (bin cells to generate meta-cells) option to detect marker genes. <br />
27 | (3) Add "slot" parameter in "DimPlot" function, external dimension reduction results can be added in RISC object, e.g. add phate results (phate0) to RISC object obj0@DimReduction$cell.phate = phate0; DimPlot(obj0, slot = "cell.phate", colFactor = 'Group', size = 2, label = TRUE) <br />
28 | (4) Add "read10X_h5" function for 10X Genomics h5 file. <br />
29 | 
30 | Removed old functions <br />
31 | (1) delete "readHTSeqdata" function. <br />
32 | 
33 | 
34 | #### Install dependent packages:
35 | ```
36 | install.packages(c("Matrix", "irlba", "doParallel", "foreach", "Rtsne", "umap", "MASS", "pbapply", "Rcpp", "RcppArmadillo", "densityClust", "FNN", "igraph", "RColorBrewer", "ggplot2", "gridExtra", "pheatmap", "hdf5r"))
37 | BiocManager::install("sparseMatrixStats")
38 | ```
39 | 
40 | #### Install RISC:
41 | ```
42 | devtools::install_github("https://github.com/bioinfoDZ/RISC.git")  # requires the devtools package
43 | ```
44 | The RISC package can also be downloaded and installed manually
45 | <a href="https://github.com/bioinfoDZ/RISC/blob/master/RISC_1.7.tar.gz" download="RISC_1.7.tar.gz">Link</a>
46 | ```
47 | install.packages("/Path/to/RISC_1.7.tar.gz", repos = NULL, type = "source")
48 | ```
49 | 
50 | 
51 | ### Vignettes
52 | Here we provide a vignette which shows the key steps in analyzing example scRNA-seq datasets from the basal or squamous carcinoma patients before and after anti-PD-1 therapy (GSE123813). Please also check the RISC functions for reading data directly from h5 files. 
53 | 
54 | #### RISC   v1.0 <a href="https://github.com/bioinfoDZ/RISC/blob/master/GSE123813_Vignette_RISC_v1.0.pdf" download="GSE123813_Vignette_RISC_v1.0.pdf">Link</a>
55 | #### RISC   v1.6 <a href="https://github.com/bioinfoDZ/RISC/blob/master/GSE123813_Vignette_RISC_v1.6.pdf" download="GSE123813_Vignette_RISC_v1.6.pdf">Link</a>
56 | 
57 | We also provide an example of how to convert a Seurat object to a RISC object (to use the new features, please reinstall RISC package), similarly one can convert a RISC object to a Seurat object. 
58 | 
59 | #### RISC v1.0   <a href="https://github.com/bioinfoDZ/RISC/blob/master/Seurat_to_RISC_RISC_v1.0.pdf" download="Seurat_to_RISC_RISC_v1.0.pdf">Link</a>
60 | #### Notice, RISC v1.6 package is developed in R (v4.2.2), we test this vignette in the same R version.
61 | #### Notice, RISC v1.7 package is developed in R (v4.3.3)
62 | 
63 | 
64 | #### Contents:
65 | (1) RISC package: "RISC_1.7.tar.gz" <br />
66 | (2) Vignette for GSE123813: "GSE123813_Vignette_RISC_v1.6.pdf" <br />
67 | (3) The GSE123813 directory contains the cell-type, patient and treatment annotations.
68 | File location: "/GSE123813/Raw_Data/bcc_annotation.tsv" <br />
69 | 
70 | Old RISC version: "RISC_1.0.tar.gz" <br />
71 | Old RISC version: "RISC_1.6.0.tar.gz"
72 | 
73 | 
74 | ### Citation:
75 | Liu Y, Tao W, Zhou B, Zheng D (2021) Robust integration of multiple single-cell RNA sequencing datasets using a single reference space. 
76 |  <a href="https://doi.org/10.1038/s41587-021-00859-x">Nat Biotechnol 39(7):877-884.</a>
77 | 


--------------------------------------------------------------------------------
/RISC_1.0.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/RISC_1.0.tar.gz


--------------------------------------------------------------------------------
/RISC_1.6.0.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/RISC_1.6.0.tar.gz


--------------------------------------------------------------------------------
/RISC_1.7.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/RISC_1.7.tar.gz


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE110823/GSE110823.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(Matrix)
  3 | library(R.matlab)
  4 | 
  5 | 
  6 | PATH = "/Path to the data/GSE110823"
  7 | 
  8 | ###################################################################################
  9 | ### Prepare Data ###
 10 | ###################################################################################
 11 | ## All Cells
 12 | dat0 = readMat(paste0(PATH, "/GSM3017261_150000_CNS_nuclei.mat"))
 13 | mat0 = dat0$DGE
 14 | coldata0 = data.frame(Barcode = paste0("Cell-", dat0$barcodes[1,]), Organ = dat0$sample.type[,1], Type = dat0$cluster.assignment[,1])
 15 | coldata0$Type = sapply(coldata0$Type, function(x){gsub(" ", "-", x, fixed = T)})
 16 | gene0 = data.frame(Symbol = dat0$genes[,1])
 17 | gene0$Symbol = sapply(gene0$Symbol, function(x){gsub(" ", "", x, fixed = T)})
 18 | colnames(mat0) = rownames(gene0) = gene0$Symbol
 19 | rownames(mat0) = rownames(coldata0) = coldata0$Barcode
 20 | 
 21 | # P2 Brain
 22 | keep = coldata0$Organ == "p2_brain " & !coldata0$Type %in% c("53-Unresolved------------------", "54-Unresolved-Kcng1------------")
 23 | coldata1 = coldata0[keep,]
 24 | mat1 = t(mat0[keep,])
 25 | keep = Matrix::rowSums(mat1 > 0) > 0
 26 | mat1 = mat1[keep,]
 27 | rowdata1 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
 28 | dat1 = readscdata(mat1, coldata1, rowdata1, is.filter = F)
 29 | 
 30 | # P11 Brain
 31 | keep = coldata0$Organ == "p11_brain" & !coldata0$Type %in% c("53-Unresolved------------------", "54-Unresolved-Kcng1------------")
 32 | coldata1 = coldata0[keep,]
 33 | mat1 = t(mat0[keep,])
 34 | keep = Matrix::rowSums(mat1 > 0) > 0
 35 | mat1 = mat1[keep,]
 36 | rowdata1 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
 37 | dat2 = readscdata(mat1, coldata1, rowdata1, is.filter = F)
 38 | 
 39 | 
 40 | ###################################################################################
 41 | ### RISC Objects ###
 42 | ###################################################################################
 43 | process0 <- function(obj0){
 44 |   obj0 = scFilter(obj0, min.UMI = 500, max.UMI = 5000, min.gene = 200, min.cell = 3)
 45 |   obj0 = scNormalize(obj0)
 46 |   obj0 = scDisperse(obj0)
 47 |   print(length(obj0@vargene))
 48 |   return(obj0)
 49 | }
 50 | 
 51 | dat1 = process0(dat1)
 52 | dat2 = process0(dat2)
 53 | 
 54 | 
 55 | ###################################################################################
 56 | ### Uncorrected ###
 57 | ###################################################################################
 58 | library(irlba)
 59 | library(Rtsne)
 60 | library(ggplot2)
 61 | library(RColorBrewer)
 62 | 
 63 | set.seed(1987)
 64 | gene0 = intersect(dat1@rowdata$Symbol, dat2@rowdata$Symbol)
 65 | ann0 = read.table(file = paste0(PATH, "/GSE110823_ann.tsv"), sep = "\t", header = T, stringsAsFactors = F)
 66 | logmat0 = cbind(dat2@assay$logcount[gene0, ann0$Barcode[ann0$Set == "P11"]], dat1@assay$logcount[gene0, ann0$Barcode[ann0$Set == "P2"]])
 67 | pca0 = irlba(logmat0, nv = 50)$v
 68 | tsne0 = Rtsne(pca0)$Y
 69 | m0 = data.frame(tSNE1 = tsne0[,1], tSNE2 = tsne0[,2], Set = ann0$Set, CellType = ann0$CellType)
 70 | m0$Set = factor(m0$Set, levels = c("P2", "P11"))
 71 | m0$CellType = factor(m0$CellType, levels = paste0("C", 1:59))
 72 | color0 = c("#1B9E77", "#288E96", "#367EB6", "#419486", "#4BAC50", "#579055", "#636C63", "#667F49", "#669E26", "#6B9254", "#72789C", "#8064AD", "#9154A5", "#98649F", "#98899B", "#9C867A", "#A26643", "#A65D25", "#A66D20", "#B07117", "#C9650A", "#DB5206", "#E03013", "#E42F18", "#E5760B", "#E69B12", "#E65C54", "#E8318E", "#F05BA8", "#F780B3", "#FB7F56", "#FF8201", "#FFC01A", "#FFFF33", "#9E0142", "#AF1446", "#C0274A", "#D13A4E", "#DC494C", "#E65848", "#F06744", "#F57948", "#F88D51", "#FBA15B", "#FDB466", "#FDC373", "#FDD380", "#FEE18E", "#FEEB9E", "#FEF5AE", "#FFFFBF", "#F7FBB2", "#EFF8A6", "#E7F59A", "#D7EF9B", "#C4E79E", "#B2E0A2", "#9ED7A4", "#88CFA4", "#72C7A4", "#5FBAA8", "#4FA8AF", "#3F96B7", "#3484BB", "#4272B2", "#5060AA", "#5E4FA2")
 73 | 
 74 | ggplot(m0, aes(tSNE1, tSNE2)) + 
 75 |   geom_point(aes(color = Set), size = 0.5) + 
 76 |   scale_color_manual(values = c("#FB8072", "#80B1D3")) + 
 77 |   theme_bw(base_size = 12, base_line_size = 0) + 
 78 |   labs(color = "Set") + 
 79 |   guides(color = guide_legend(override.aes = list(size = 8), ncol = 1))
 80 | ggplot(m0, aes(tSNE1, tSNE2)) + 
 81 |   geom_point(aes(color = CellType), size = 0.5) + 
 82 |   scale_color_manual(values = color0) + 
 83 |   theme_bw(base_size = 12, base_line_size = 0) + 
 84 |   labs(color = "Cell Type") + 
 85 |   theme(legend.text = element_text(size = 16), legend.title = element_text(size = 20)) + 
 86 |   guides(color = guide_legend(override.aes = list(size = 8), ncol = 4))
 87 | 
 88 | 
 89 | ###################################################################################
 90 | ### Integration Data ###
 91 | ###################################################################################
 92 | ## Integration
 93 | var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
 94 | var0 = var0$V1
 95 | data0 = list(dat2, dat1)
 96 | InPlot(data0, var.gene = var0, ncore = 6)
 97 | data0 = scMultiIntegrate(data0, eigens = 40, var.gene = var0, add.Id = c("P11", "P2"), adjust = F, ncore = 1)
 98 | data0 = scTSNE(data0, npc = 42, use = "PLS")
 99 | ann0 = read.table(file = paste0(PATH, "/GSE110823_ann.tsv"), sep = "\t", header = T, stringsAsFactors = F)
100 | ann0$scBarcode = paste0(ann0$Set, "_", ann0$Barcode)
101 | cell0 = intersect(rownames(data0@coldata), ann0$scBarcode)
102 | data0 = SubSet(data0, cells = cell0)
103 | data0@coldata$CellType = factor(ann0$CellType[match(rownames(data0@coldata), ann0$scBarcode)], levels = paste0("C", 1:59))
104 | 
105 | color0 = c("#1B9E77", "#288E96", "#367EB6", "#419486", "#4BAC50", "#579055", "#636C63", "#667F49", "#669E26", "#6B9254", "#72789C", "#8064AD", "#9154A5", "#98649F", "#98899B", "#9C867A", "#A26643", "#A65D25", "#A66D20", "#B07117", "#C9650A", "#DB5206", "#E03013", "#E42F18", "#E5760B", "#E69B12", "#E65C54", "#E8318E", "#F05BA8", "#F780B3", "#FB7F56", "#FF8201", "#FFC01A", "#FFFF33", "#9E0142", "#AF1446", "#C0274A", "#D13A4E", "#DC494C", "#E65848", "#F06744", "#F57948", "#F88D51", "#FBA15B", "#FDB466", "#FDC373", "#FDD380", "#FEE18E", "#FEEB9E", "#FEF5AE", "#FFFFBF", "#F7FBB2", "#EFF8A6", "#E7F59A", "#D7EF9B", "#C4E79E", "#B2E0A2", "#9ED7A4", "#88CFA4", "#72C7A4", "#5FBAA8", "#4FA8AF", "#3F96B7", "#3484BB", "#4272B2", "#5060AA", "#5E4FA2")
106 | DimPlot(data0, slot = "cell.tsne", colFactor = "Set")
107 | DimPlot(data0, slot = "cell.tsne", colFactor = "CellType", Colors = color0)
108 | 
109 | DimPlot(data0, slot = "cell.tsne", genes = "Ebf3", size = 0.2)
110 | DimPlot(data0, slot = "cell.tsne", genes = "Fat2", size = 0.2)
111 | 
112 | 
113 | 
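Note on annotation alignment: the script attaches cell-type labels from the annotation table to the integrated object after subsetting. The sketch below, on toy data, shows one way to keep labels matched to cells by barcode rather than by row position. The barcodes, the `cells` vector, and the three cell-type levels are hypothetical; only the column names (`Set`, `Barcode`, `CellType`, `scBarcode`) follow the script.

```r
# Toy annotation table; column names mirror those used in the script.
ann <- data.frame(
  Set = c("P11", "P11", "P2", "P2"),
  Barcode = c("AAA", "CCC", "AAA", "GGG"),
  CellType = c("C1", "C2", "C3", "C1"),
  stringsAsFactors = FALSE
)
ann$scBarcode <- paste0(ann$Set, "_", ann$Barcode)

# Hypothetical cell order of an integrated object: it differs from the
# annotation order and contains one cell without an annotation.
cells <- c("P2_GGG", "P11_CCC", "P2_TTT", "P11_AAA")

# intersect() keeps the order of its first argument, so `keep` follows `cells`.
keep <- intersect(cells, ann$scBarcode)

# match() returns, for each kept cell, the row of `ann` that describes it.
idx <- match(keep, ann$scBarcode)
cell_type <- factor(ann$CellType[idx], levels = paste0("C", 1:3))
names(cell_type) <- keep
print(cell_type)
```

Indexing through `match()` means the labels stay correct even if the annotation file is sorted differently from the object's cell metadata.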


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE111113/GSE111113.R:
--------------------------------------------------------------------------------
 1 | library(RISC)
 2 | library(ggplot2)
 3 | library(RColorBrewer)
 4 | library(irlba)
 5 | library(umap)
 6 | 
 7 | 
 8 | PATH = "/Path to the data/GSE111113"
 9 | 
10 | ###################################################################################
11 | ### RISC raw ###
12 | ###################################################################################
13 | mat0 = read.table(file = paste0(PATH, '/GSE111113_Table_S1_FilterNormal10xExpMatrix.txt'), sep = '\t', header = T, stringsAsFactors = F)
14 | mat1 = as.matrix(mat0[,-c(1:3)])
15 | rownames(mat1) = mat0$gene_id
16 | Group = sapply(colnames(mat1), function(x){strsplit(x, "_")[[1]][1]})
17 | Symbol0 = mat0[,c(1, 3)]
18 | colnames(Symbol0) = c('Ensembl', 'Symbol')
19 | 
20 | 
21 | ###################################################################################
22 | ### Uncorrected data ###
23 | ###################################################################################
24 | dat0 = readscdata(count = mat1, cell = data.frame(Time = Group, row.names = colnames(mat1)), gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)))
25 | dat0 = scFilter(dat0, min.UMI = 500, max.UMI = Inf, min.gene = 200, min.cell = 5)
26 | cell0 = rownames(dat0@coldata)[dat0@coldata$Time %in% c('E16', 'E18', 'P4', 'Adu1', 'Adu2')]
27 | dat0 = SubSet(dat0, cells = cell0)
28 | dat0 = scNormalize(dat0)
29 | dat0 = scDisperse(dat0)
30 | length(dat0@vargene)
31 | dat0 = scPCA(dat0)
32 | dat0 = scUMAP(dat0)
33 | 
34 | UMAPlot(dat0, colFactor = 'Time', Colors = brewer.pal(5, 'Spectral'))
35 | 
36 | 
37 | ###################################################################################
38 | ### Split data by time point ###
39 | ###################################################################################
40 | cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'E16']
41 | dat1 = SubSet(dat0, cells = cell0)
42 | cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'E18']
43 | dat2 = SubSet(dat0, cells = cell0)
44 | cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'P4']
45 | dat3 = SubSet(dat0, cells = cell0)
46 | cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'Adu1']
47 | dat4 = SubSet(dat0, cells = cell0)
48 | cell0 = rownames(dat0@coldata)[dat0@coldata$Time == 'Adu2']
49 | dat5 = SubSet(dat0, cells = cell0)
50 | 
51 | dat1 = scDisperse(dat1)
52 | length(dat1@vargene)
53 | dat2 = scDisperse(dat2)
54 | length(dat2@vargene)
55 | dat3 = scDisperse(dat3)
56 | length(dat3@vargene)
57 | dat4 = scDisperse(dat4)
58 | length(dat4@vargene)
59 | dat5 = scDisperse(dat5)
60 | length(dat5@vargene)
61 | 
62 | 
63 | ###################################################################################
64 | ### RISC integration ###
65 | ###################################################################################
66 | ### Integration All
67 | var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
68 | var0 = var0$V1
69 | dat.all = list(dat4, dat5, dat3, dat2, dat1)
70 | InPlot(dat.all, var.gene = var0)
71 | dat.all = scMultiIntegrate(dat.all, eigens = 15, var.gene = var0, add.Id = c('Adult1', 'Adult2', 'P4', 'E18', 'E16'), ncore = 4)
72 | dat.all = scUMAP(dat.all, npc = 15, use = 'PLS')
73 | dat.all@coldata$Set = factor(dat.all@coldata$Set, levels = c('E16', 'E18', 'P4', 'Adult1', 'Adult2'))
74 | 
75 | DimPlot(dat.all, slot = "cell.umap", colFactor = 'Set')
76 | UMAPlot(dat.all, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Wfdc18'])
77 | UMAPlot(dat.all, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Sostdc1'])
78 | 
79 | ## Integration Partial
80 | dat.par = list(dat4, dat5, dat2, dat1)
81 | dat.par = scMultiIntegrate(dat.par, eigens = 20, var.gene = var0, add.Id = c('Adult1', 'Adult2', 'E18', 'E16'), ncore = 4)
82 | dat.par = scUMAP(dat.par, npc = 20, use = 'PLS')
83 | dat.par@coldata$Set = factor(dat.par@coldata$Set, levels = c('E16', 'E18', 'Adult1', 'Adult2'))
84 | 
85 | DimPlot(dat.par, slot = "cell.umap", colFactor = 'Set')
86 | UMAPlot(dat.par, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Wfdc18'])
87 | UMAPlot(dat.par, genes = Symbol0$Ensembl[Symbol0$Symbol == 'Sostdc1'])
88 | 
89 | 
90 | 
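Note on splitting by time point: the script builds one subset per time point with five near-identical pairs of lines. A minimal base-R sketch of the same grouping step is shown below on a toy metadata table. The cell names are hypothetical, and the commented-out `lapply()` line assumes that `SubSet()` and `scDisperse()` behave as they are used in the script.

```r
# Toy cell metadata with a Time column, standing in for dat0@coldata.
coldata <- data.frame(
  Time = c("E16", "E18", "E16", "P4", "Adu1", "Adu2", "E18"),
  row.names = paste0("cell", 1:7),
  stringsAsFactors = FALSE
)

# Fixing the factor levels controls the order of the resulting list.
time_points <- c("E16", "E18", "P4", "Adu1", "Adu2")
cells_by_time <- split(rownames(coldata), factor(coldata$Time, levels = time_points))
print(lengths(cells_by_time))

# With RISC loaded, each subset could then be built in one pass (assumption:
# the same SubSet()/scDisperse() calls as in the script):
# dat_list <- lapply(cells_by_time, function(cells) scDisperse(SubSet(dat0, cells = cells)))
```

Keeping the subsets in a named list also makes it harder to pair an object with the wrong label when assembling the input for integration.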


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE114727/GSE114727.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(irlba)
  3 | library(umap)
  4 | library(ggplot2)
  5 | library(RColorBrewer)
  6 | 
  7 | 
  8 | PATH = "/Path to the data/GSE114727"
  9 | 
 10 | ###################################################################################
 11 | ### Raw Data ###
 12 | ###################################################################################
 13 | # BC09 rep1
 14 | cell1 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148580_BC09_TUMOR1_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F)
 15 | cell1 = cell1[cell1$full_length == 'True' & cell1$productive == 'True',]
 16 | data1 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC09_Tech1"))
 17 | data1 = SubSet(data1, cells = cell1$barcode)
 18 | 
 19 | # BC09 rep2
 20 | cell2 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148581_BC09_TUMOR2_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F)
 21 | cell2 = cell2[cell2$full_length == 'True' & cell2$productive == 'True',]
 22 | data2 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC09_Tech2"))
 23 | data2 = SubSet(data2, cells = cell2$barcode)
 24 | 
 25 | # BC10
 26 | cell3 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148582_BC10_TUMOR1_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F)
 27 | cell3 = cell3[cell3$full_length == 'True' & cell3$productive == 'True',]
 28 | data3 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC10_Tech1"))
 29 | data3 = SubSet(data3, cells = cell3$barcode)
 30 | 
 31 | # BC11 rep1
 32 | cell4 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148583_BC11_TUMOR1_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F)
 33 | cell4 = cell4[cell4$full_length == 'True' & cell4$productive == 'True',]
 34 | data4 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC11_Tech1"))
 35 | data4 = SubSet(data4, cells = cell4$barcode)
 36 | 
 37 | # BC11 rep2
 38 | cell5 = read.csv(file = paste0(PATH, "/10X_Genomics/GSM3148584_BC11_TUMOR2_filtered_contig_annotations.csv"), sep = ",", header = T, stringsAsFactors = F)
 39 | cell5 = cell5[cell5$full_length == 'True' & cell5$productive == 'True',]
 40 | data5 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/BC11_Tech2"))
 41 | data5 = SubSet(data5, cells = cell5$barcode)
 42 | 
 43 | # GSE110686
 44 | data6 = read10Xgenomics(data.path = paste0(PATH, "/10X_Genomics/GSE110686"))
 45 | 
 46 | 
 47 | ###################################################################################
 48 | ### Prepare Data ###
 49 | ###################################################################################
 50 | process0 <- function(obj0){
 51 |   obj0 = scFilter(obj0, min.UMI = 1000, max.UMI = 20000, min.gene = 500, min.cell = 5)
 52 |   obj0 = scNormalize(obj0, ncore = 4)
 53 |   obj0 = scDisperse(obj0)
 54 |   print(length(obj0@vargene))
 55 |   return(obj0)
 56 | }
 57 | 
 58 | data1 = process0(data1)
 59 | data2 = process0(data2)
 60 | data3 = process0(data3)
 61 | data4 = process0(data4)
 62 | data5 = process0(data5)
 63 | data6 = process0(data6)
 64 | 
 65 | FilterPlot(data1)
 66 | FilterPlot(data2)
 67 | FilterPlot(data3)
 68 | FilterPlot(data4)
 69 | FilterPlot(data5)
 70 | FilterPlot(data6)
 71 | 
 72 | 
 73 | ###################################################################################
 74 | ### Uncorrected Data ###
 75 | ###################################################################################
 76 | var0 = Reduce(intersect, list(
 77 |   rownames(data1@assay$logcount), rownames(data2@assay$logcount), rownames(data3@assay$logcount), 
 78 |   rownames(data4@assay$logcount), rownames(data5@assay$logcount), rownames(data6@assay$logcount)
 79 | ))
 80 | logmat0 = cbind(
 81 |   as.matrix(data2@assay$logcount)[var0,], as.matrix(data1@assay$logcount)[var0,], 
 82 |   as.matrix(data3@assay$logcount)[var0,], as.matrix(data4@assay$logcount)[var0,], 
 83 |   as.matrix(data5@assay$logcount)[var0,], as.matrix(data6@assay$logcount)[var0,]
 84 | )
 85 | pca0 = irlba(logmat0, nv = 12)$v
 86 | umap0 = umap(pca0)$layout
 87 | ann0 = read.table(file = paste0(PATH, "/GSE114727_Anno_All.tsv"), sep = "\t", header = T, stringsAsFactors = F)
 88 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = ann0$Set, Type = ann0$Type)
 89 | m0$Type = factor(m0$Type, levels = c(
 90 |   "CD4+ sub1", "CD4+ sub1 TN", "CD4+ sub2", "CD4+ sub3", "CD4+ sub4", 
 91 |   "CD4+ Treg", "CD8+ sub1", "CD8+ sub2", "CD8+ sub3", "CD8+ Trm"
 92 | ))
 93 | m0$Set0 = factor(
 94 |   m0$Set, 
 95 |   levels = c("BC_ER", "BC1_ER_PR", "BC2_ER_PR", "BC1_Her2", "BC2_Her2", "BC_TN"), 
 96 |   labels = c("BC ER+", "BC ER+PR+", "BC ER+PR+", "BC Her2+", "BC Her2+", "BC TN")
 97 | )
 98 | 
 99 | ggplot(m0, aes(UMAP1, UMAP2)) + 
100 |   geom_point(aes(color = Set0, shape = Set), size = 1, alpha = 1) + 
101 |   scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) + 
102 |   scale_shape_manual(values = c(1, 2, 4, 5, 6, 8)) + 
103 |   theme_bw(base_line_size = 0) + 
104 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Source') + 
105 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
106 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
107 | 
108 | ggplot(m0, aes(UMAP1, UMAP2)) + 
109 |   geom_point(aes(color = Type, shape = Set), size = 0.5, alpha = 1) + 
110 |   scale_color_manual(values = c("#E41A1C", "#377EB8", "#4DAF4A", "#6A3D9A", "#FF7F00", "#A65628", "#F781BF", "#984EA3", "#999999", "#E7298A")) + 
111 |   scale_shape_manual(values = c(1, 2, 4, 5, 6, 8)) + 
112 |   theme_bw(base_line_size = 0) + 
113 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Type', shape = 'Source') + 
114 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
115 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
116 | 
117 | 
118 | ###################################################################################
119 | ### Data Integration ###
120 | ###################################################################################
121 | ## BC positive
122 | dat.pos = list(data2, data1, data3, data4, data5)
123 | var0 = read.table(file = paste0(PATH, "/var_pos.tsv"), sep = "\t", header = F, stringsAsFactors = F)
124 | var0 = var0$V1
125 | InPlot(dat.pos, var.gene = var0)
126 | dat.pos = scMultiIntegrate(dat.pos, eigens = 12, var.gene = var0, add.Id = c("BC2_ER_PR", "BC1_ER_PR", "BC_ER", "BC1_Her2", "BC2_Her2"), ncore = 4)
127 | dat.pos = scUMAP(dat.pos, npc = 12, use = "PLS")
128 | 
129 | ann0 = read.table(file = paste0(PATH, "/GSE114727_Anno_All.tsv"), sep = "\t", header = T, stringsAsFactors = F)
130 | ann0 = ann0[ann0$Set != "BC_TN",]
131 | dat.pos@coldata$Type = as.character(ann0$Type)
132 | 
133 | DimPlot(dat.pos, slot = "cell.umap", colFactor = "Set", size = 0.5)
134 | DimPlot(dat.pos, slot = "cell.umap", colFactor = "Type", size = 0.5)
135 | DimPlot(dat.pos, slot = "cell.umap", genes = c("HAVCR2", "CD8A", "CD4", "FOXP3"), size = 0.2)
136 | 
137 | 
138 | ## BC positive and triple negative
139 | dat.all = list(data2, data1, data3, data4, data5, data6)
140 | var0 = read.table(file = paste0(PATH, "/var_all.tsv"), sep = "\t", header = F, stringsAsFactors = F)
141 | var0 = var0$V1
142 | InPlot(dat.all, var.gene = var0)
143 | dat.all = scMultiIntegrate(dat.all, eigens = 12, var.gene = var0, add.Id = c("BC2_ER_PR", "BC1_ER_PR", "BC_ER", "BC1_Her2", "BC2_Her2", "BC_TN"), ncore = 4)
144 | dat.all = scUMAP(dat.all, npc = 12, use = "PLS")
145 | 
146 | ann0 = read.table(file = paste0(PATH, "/GSE114727_Anno_All.tsv"), sep = "\t", header = T, stringsAsFactors = F)
147 | dat.all@coldata$Type = as.character(ann0$Type)
148 | 
149 | DimPlot(dat.all, slot = "cell.umap", colFactor = "Set", size = 0.5)
150 | DimPlot(dat.all, slot = "cell.umap", colFactor = "Type", size = 0.5)
151 | 
152 | UMAPlot(dat.all, genes = "HAVCR2", size = 0.5, exp.col = "firebrick2")
153 | UMAPlot(dat.all, genes = "CD8A", size = 0.5, exp.col = "firebrick2")
154 | UMAPlot(dat.all, genes = "CD4", size = 0.5, exp.col = "firebrick2")
155 | UMAPlot(dat.all, genes = "FOXP3", size = 0.5, exp.col = "firebrick2")
156 | UMAPlot(dat.all, genes = "CD40LG", size = 0.5, exp.col = "firebrick2")
157 | UMAPlot(dat.all, genes = "DPP4", size = 0.5, exp.col = "firebrick2")
158 | UMAPlot(dat.all, genes = "CHN1", size = 0.5, exp.col = "firebrick2")
159 | UMAPlot(dat.all, genes = "KRT86", size = 0.5, exp.col = "firebrick2")
160 | UMAPlot(dat.all, genes = "LINC00402", size = 0.5, exp.col = "firebrick2")
161 | UMAPlot(dat.all, genes = "IKZF2", size = 0.5, exp.col = "firebrick2")
162 | UMAPlot(dat.all, genes = "ZNF683", size = 0.5, exp.col = "firebrick2")
163 | UMAPlot(dat.all, genes = "AIF1", size = 0.5, exp.col = "firebrick2")
164 | UMAPlot(dat.all, genes = "PLEK", size = 0.5, exp.col = "firebrick2")
165 | 
166 | 
167 | 
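Note on the uncorrected matrix: the script restricts six datasets to their shared genes, binds them column-wise, and then relies on an annotation file listing cells in the same order. The sketch below, on toy count matrices, derives the per-cell dataset label from the same list that is bound together, so labels and columns cannot drift apart. The helper `make_mat`, the dataset names, and the gene sets are hypothetical.

```r
set.seed(1)

# Hypothetical helper: a small genes x cells count matrix with dimnames.
make_mat <- function(genes, n_cells, prefix) {
  matrix(
    rpois(length(genes) * n_cells, lambda = 2),
    nrow = length(genes),
    dimnames = list(genes, paste0(prefix, "_", seq_len(n_cells)))
  )
}

mats <- list(
  BC1 = make_mat(c("CD4", "CD8A", "FOXP3", "PLEK"), 3, "BC1"),
  BC2 = make_mat(c("CD8A", "CD4", "PLEK", "AIF1"), 2, "BC2"),
  BC3 = make_mat(c("PLEK", "CD4", "CD8A"), 4, "BC3")
)

# Genes present in every dataset, in the order of the first dataset.
shared <- Reduce(intersect, lapply(mats, rownames))

# Reorder every matrix to the shared genes before binding columns.
combined <- do.call(cbind, unname(lapply(mats, function(m) m[shared, , drop = FALSE])))

# One label per column, generated from the same list and in the same order.
set_label <- rep(names(mats), times = vapply(mats, ncol, integer(1)))
print(table(set_label))
```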


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE114727/var_all.tsv:
--------------------------------------------------------------------------------
  1 | MMP23B
  2 | TNFRSF25
  3 | TNFRSF9
  4 | AGTRAP
  5 | DHRS3
  6 | SPEN
  7 | ID3
  8 | RUNX3
  9 | STMN1
 10 | UBXN11
 11 | ZNF683
 12 | MAP3K6
 13 | FGR
 14 | MATN1-AS1
 15 | S100PBP
 16 | ZC3H12A
 17 | C1orf228
 18 | PLK3
 19 | CDKN2C
 20 | ZCCHC11
 21 | PDE4B
 22 | GADD45A
 23 | IFI44L
 24 | LMO4
 25 | GBP3
 26 | GBP1
 27 | GBP4
 28 | TGFBR3
 29 | DPYD
 30 | CDC14A
 31 | VCAM1
 32 | S1PR1
 33 | CSF1
 34 | CHI3L2
 35 | C1orf162
 36 | PTPN22
 37 | FAM46C
 38 | CD160
 39 | PDE4DIP
 40 | LINC00869
 41 | PBXIP1
 42 | LMNA
 43 | SEMA4A
 44 | FCRL3
 45 | FCRL6
 46 | CD84
 47 | SLAMF7
 48 | CD244
 49 | MPZ
 50 | HSPA6
 51 | FCGR3A
 52 | GPA33
 53 | XCL2
 54 | XCL1
 55 | ATP1B1
 56 | FASLG
 57 | TNFSF4
 58 | RABGAP1L
 59 | GLUL
 60 | RGS16
 61 | C1orf21
 62 | IVNS1ABP
 63 | RGS13
 64 | RGS2
 65 | ASPM
 66 | LAX1
 67 | IL10
 68 | CD55
 69 | G0S2
 70 | HHAT
 71 | TRAF5
 72 | ATF3
 73 | CENPF
 74 | TMEM63A
 75 | GALNT2
 76 | TTC13
 77 | GNG4
 78 | LYST
 79 | EFCAB2
 80 | NLRP3
 81 | RSAD2
 82 | AC092580.4
 83 | RRM2
 84 | FAM49A
 85 | RHOB
 86 | SLC30A3
 87 | EIF2AK2
 88 | AC006369.2
 89 | CDC42EP3
 90 | HNRNPLL
 91 | GALM
 92 | THADA
 93 | SPTBN1
 94 | PLEK
 95 | CAPG
 96 | GNLY
 97 | CD8A
 98 | CD8B
 99 | AC133644.2
100 | RP11-1399P15.1
101 | MAL
102 | ANKRD36B
103 | IL1R2
104 | LIMS1
105 | MIR4435-2HG
106 | AC017002.1
107 | ZC3H6
108 | SLC20A1
109 | PTPN4
110 | HS6ST1
111 | ZEB2
112 | RBMS1
113 | METTL8
114 | ITGA6
115 | CDCA7
116 | GPR155
117 | CHN1
118 | TTN
119 | AC104820.2
120 | ITGA4
121 | NAB1
122 | SLC39A10
123 | HSPD1
124 | SPATS2L
125 | CTLA4
126 | ICOS
127 | EPHA4
128 | ITM2C
129 | PASK
130 | PDCD1
131 | ITPR1
132 | BHLHE40-AS1
133 | BHLHE40
134 | SRGAP3
135 | SH3BP5
136 | ANKRD28
137 | RFTN1
138 | NR1D2
139 | SLC4A7
140 | RP11-222K16.2
141 | EOMES
142 | CMC1
143 | CMTM8
144 | TRANK1
145 | CSRNP1
146 | ENTPD3-AS1
147 | CCR2
148 | CCR5
149 | LRRC2
150 | CISH
151 | ATXN7
152 | FRMD4B
153 | SENP7
154 | NFKBIZ
155 | BBX
156 | ZBED2
157 | CD200
158 | BTLA
159 | SIDT1
160 | RP11-553L6.2
161 | ZNF80
162 | GOLGB1
163 | PARP9
164 | PARP15
165 | H1FX
166 | SLC9A9
167 | PLSCR1
168 | GYG1
169 | HPS3
170 | GPR171
171 | RAP2B
172 | TIPARP
173 | SMC4
174 | SKIL
175 | TNFSF10
176 | KLHL24
177 | TPRG1
178 | CCDC50
179 | SPON2
180 | ZFYVE28
181 | LYAR
182 | BOD1L1
183 | CD38
184 | FGFBP2
185 | SEL1L3
186 | DTHD1
187 | KLF3
188 | ATP10D
189 | TXK
190 | HOPX
191 | CENPC
192 | RUFY3
193 | AREG
194 | CXCL13
195 | ANTXR2
196 | PLAC8
197 | PTPN13
198 | PPM1K
199 | HERC5
200 | PPP3CA
201 | NFKB1
202 | LEF1
203 | TIFA
204 | LARP7
205 | TNIP3
206 | KIAA1109
207 | IL2
208 | SPRY1
209 | RP11-83A24.2
210 | PRMT9
211 | ANKRD37
212 | TLR3
213 | LPCAT1
214 | PTGER4
215 | ANXA2R
216 | ITGA1
217 | PDE4D
218 | ENC1
219 | F2R
220 | ARRDC3
221 | KIAA0825
222 | SLF1
223 | KIF3A
224 | JADE2
225 | TGFBI
226 | EGR1
227 | ARAP3
228 | PPP2R2B
229 | JAKMIP2
230 | ADRB2
231 | HAVCR2
232 | PTTG1
233 | MIR3142HG
234 | N4BP3
235 | C5orf45
236 | IRF4
237 | SERPINB9
238 | GFOD1
239 | CD83
240 | MYLIP
241 | ATXN1
242 | SOX4
243 | FAM65B
244 | HIST1H1C
245 | HIST1H1E
246 | BTN3A3
247 | TNF
248 | NCR3
249 | AIF1
250 | HLA-DRA
251 | HLA-DQA1
252 | HLA-DQB1
253 | HLA-DMB
254 | HLA-DMA
255 | HMGA1
256 | CDKN1A
257 | PIM1
258 | CCDC167
259 | RUNX2
260 | CLIC5
261 | PHIP
262 | NT5E
263 | PRDM1
264 | AIM1
265 | SCML4
266 | SESN1
267 | THEMIS
268 | SAMD3
269 | SGK1
270 | AHI1
271 | PDE7B
272 | IFNGR1
273 | RP11-356I2.4
274 | SYNE1
275 | SYTL3
276 | CCR6
277 | CHST12
278 | ETV1
279 | AHR
280 | HDAC9
281 | AOAH
282 | ELMO1
283 | TRGC2
284 | TRGV10
285 | TRGV9
286 | TRG-AS1
287 | TRGV5
288 | TRGV3
289 | TRGV2
290 | MYO1G
291 | IGFBP3
292 | UPP1
293 | AUTS2
294 | GTF2I
295 | FGL2
296 | ABCB1
297 | SAMD9
298 | SAMD9L
299 | PILRB
300 | TRIM56
301 | CLDN15
302 | ATXN7L1
303 | CDHR3
304 | NAMPT
305 | IFRD1
306 | CAV1
307 | KDM7A
308 | TRBV28
309 | EPHB6
310 | EPHA1
311 | TCAF2
312 | GIMAP6
313 | GIMAP1
314 | GIMAP5
315 | SMARCD3
316 | P2RY8
317 | SMS
318 | MID1IP1
319 | LINC01281
320 | USP11
321 | TIMP1
322 | PIM2
323 | FOXP3
324 | TSPYL2
325 | SMC1A
326 | MAGEH1
327 | CYSLTR1
328 | ZMAT1
329 | NGFRAP1
330 | CXorf57
331 | SH2D1A
332 | CD40LG
333 | GAB3
334 | IL9R
335 | TNFRSF10A
336 | CLU
337 | RP11-489E7.4
338 | RP11-51J9.5
339 | EIF4EBP1
340 | CEBPD
341 | TOX
342 | RP11-25K19.1
343 | MYBL1
344 | NCOA2
345 | MSC
346 | ZBTB10
347 | FABP5
348 | GEM
349 | FBXO32
350 | FAM84B
351 | MYC
352 | TMEM71
353 | PLEC
354 | UHRF2
355 | PLIN2
356 | CDKN2A
357 | DDX58
358 | B4GALT1
359 | AQP3
360 | CEP78
361 | TLE4
362 | CTSL
363 | CKS2
364 | GADD45G
365 | UNQ6494
366 | TRMO
367 | TRAF1
368 | STOM
369 | MVB12B
370 | PRRC2B
371 | SETX
372 | RALGDS
373 | SARDH
374 | PHPT1
375 | CLIC3
376 | NPDC1
377 | TUBB4B
378 | IRF7
379 | IFITM10
380 | KCNQ1OT1
381 | OSBPL5
382 | RIC3
383 | DKK3
384 | PDE3B
385 | NUCB2
386 | CD59
387 | TRIM44
388 | PRR5L
389 | FAM111A
390 | MS4A6A
391 | MS4A1
392 | RP11-286N22.8
393 | PLA2G16
394 | PPP1R14B
395 | RASGRP2
396 | AP000769.1
397 | CDK2AP2
398 | PGM2L1
399 | LRRC32
400 | SYTL2
401 | PRSS23
402 | SMCO4
403 | ANKRD49
404 | SESN3
405 | PDGFD
406 | LAYN
407 | USP28
408 | CADM1
409 | RNF214
410 | FXYD2
411 | JAML
412 | UBE4A
413 | CXCR5
414 | BCL9L
415 | H2AFX
416 | CRTAM
417 | IL2RA
418 | PFKFB3
419 | WAC-AS1
420 | MAP3K8
421 | ASAH2B
422 | ANK3
423 | ARID5B
424 | RTKN2
425 | EGR2
426 | PRF1
427 | RP11-338I21.1
428 | IFIT2
429 | IFIT3
430 | KIF20B
431 | PDLIM1
432 | ENTPD1
433 | FRAT2
434 | PDCD4-AS1
435 | ABLIM1
436 | FAM160B1
437 | PTPRE
438 | MKI67
439 | KDM5A
440 | NINJ2
441 | CCND2
442 | CD9
443 | PTMS
444 | LAG3
445 | CLSTN3
446 | KLRG1
447 | A2M-AS1
448 | KLRB1
449 | CLECL1
450 | KLRF1
451 | GABARAPL1
452 | KLRD1
453 | KLRK1
454 | KLRC3
455 | KLRC1
456 | RP11-291B21.2
457 | YBX3
458 | LRMP
459 | ITPR2
460 | CAPRIN2
461 | KIF21A
462 | NELL2
463 | GALNT6
464 | NR4A1
465 | KRT86
466 | TESPA1
467 | NXPH4
468 | ARHGAP9
469 | DDIT3
470 | IFNG-AS1
471 | IFNG
472 | CPM
473 | PHLDA1
474 | PPP1R12A
475 | CEP290
476 | DUSP6
477 | ATP2B1
478 | LINC00936
479 | RP11-796E2.4
480 | GNPTAB
481 | PMCH
482 | C12orf75
483 | OAS1
484 | OASL
485 | RILPL2
486 | LINC00944
487 | HSPH1
488 | LHFP
489 | TSC22D1
490 | LPAR6
491 | PHF11
492 | KLF12
493 | LINC00402
494 | TBC1D4
495 | MYCBP2
496 | NDFIP2
497 | SPRY2
498 | MBNL2
499 | GPR18
500 | GPR183
501 | TNFSF13B
502 | RASA3
503 | TRAV4
504 | TRAV5
505 | TRAV6
506 | TRAV8-2
507 | TRAV13-1
508 | TRAV8-4
509 | TRAV13-2
510 | TRAV8-6
511 | TRAV17
512 | TRAV19
513 | TRAV20
514 | TRAV22
515 | TRAV23DV6
516 | TRDV1
517 | TRAV26-1
518 | TRAV27
519 | TRAV29DV5
520 | TRAV36DV7
521 | TRAV38-2DV8
522 | TRDC
523 | PPP1R3E
524 | REC8
525 | GZMH
526 | GZMB
527 | MIS18BP1
528 | RP11-596C23.2
529 | PTGDR
530 | PTGER2
531 | LGALS3
532 | ARID4A
533 | RP11-902B17.1
534 | HIF1A
535 | AKAP5
536 | GPR65
537 | LGMN
538 | IFI27
539 | CRIP2
540 | GCHFR
541 | RP11-23P13.6
542 | EIF3J-AS1
543 | PATL2
544 | GABPB1-AS1
545 | RORA
546 | DAPK2
547 | DENND4A
548 | TLE3
549 | TBC1D2B
550 | NMB
551 | MIR9-3HG
552 | MCTP2
553 | TARSL2
554 | HAGHL
555 | SNHG9
556 | NPW
557 | MMP25-AS1
558 | ZNF75A
559 | NLRC3
560 | ABCC1
561 | GGA2
562 | PRKCB
563 | RP11-666O2.2
564 | SPN
565 | RP11-455F5.5
566 | ADCY7
567 | CHD9
568 | AKTIP
569 | CES1
570 | MT1E
571 | MT1F
572 | CPNE2
573 | ADGRG5
574 | ADGRG1
575 | KIFC3
576 | TEPP
577 | CTD-2012K14.6
578 | LINC01229
579 | PLCG2
580 | GINS2
581 | SLC7A5
582 | CPNE7
583 | ZNF276
584 | ITGAE
585 | TXNDC17
586 | XAF1
587 | TNK1
588 | HS3ST3B1
589 | LRRC75A
590 | RASD1
591 | RP11-47L3.1
592 | SLFN12L
593 | AC069363.1
594 | CCL3
595 | CCL4
596 | CCL3L3
597 | CCL4L2
598 | MLLT6
599 | TOP2A
600 | IGFBP4
601 | RP5-1028K7.2
602 | TBX21
603 | RP11-357H14.17
604 | ABI3
605 | TOB1
606 | YPEL2
607 | ERN1
608 | PECAM1
609 | BPTF
610 | RAB37
611 | TK1
612 | SOCS3
613 | TBCD
614 | TYMS
615 | YES1
616 | TGIF1
617 | DLGAP1-AS1
618 | LDLRAD4
619 | MAPRE2
620 | PMAIP1
621 | BCL2
622 | NFATC1
623 | SIRPG
624 | CDC25B
625 | PLCB1
626 | ESF1
627 | KIZ
628 | NAPB
629 | ZNF337
630 | ID1
631 | BCL2L1
632 | SLA2
633 | TOX2
634 | CTSA
635 | TSHZ2
636 | ZNF217
637 | BCAS1
638 | ZBP1
639 | HELZ2
640 | MADCAM1
641 | ABCA7
642 | CTB-31O20.2
643 | GADD45B
644 | MATK
645 | EBI3
646 | TMIGD2
647 | UHRF1
648 | CD70
649 | TNFSF14
650 | MYO1F
651 | S1PR5
652 | ACP5
653 | ZNF443
654 | ASF1B
655 | ADGRE5
656 | ARRDC2
657 | ZNF101
658 | PLEKHF1
659 | FXYD7
660 | LSR
661 | ZBTB32
662 | NFKBID
663 | TYROBP
664 | CAPN12
665 | SERTAD1
666 | TGFB1
667 | LINC01480
668 | AC006129.2
669 | CD79A
670 | ATP1A3
671 | FOSB
672 | IGFL2
673 | SLC1A5
674 | ZNF331
675 | MYADM
676 | LAIR1
677 | LENG8
678 | LAIR2
679 | KIR2DL3
680 | KIR3DL2
681 | NCR1
682 | ZNF530
683 | USP18
684 | ZNF280B
685 | KIAA1671
686 | TPST2
687 | MIAT
688 | XBP1
689 | OSM
690 | RP4-539M6.22
691 | MCM5
692 | MAFF
693 | APOBEC3C
694 | APOBEC3D
695 | APOBEC3H
696 | GRAP2
697 | EP300
698 | RNU12
699 | LDOC1L
700 | CTA-29F11.1
701 | PIM3
702 | MIR155HG
703 | TIAM1
704 | LINC00649
705 | MX2
706 | ITGB2-AS1
707 | COL6A2
708 | 


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE125688/GSE125688.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(RColorBrewer)
  3 | library(ggplot2)
  4 | 
  5 | 
  6 | colname <- function(x0){
  7 |   colname0 <- colnames(x0)[-ncol(x0)]
  8 |   x0 <- x0[,-ncol(x0)]
  9 |   colnames(x0) <- colname0
 10 |   x0 <- as.matrix(x0)
 11 |   rownames(x0) <- sapply(rownames(x0), function(x){strsplit(x, "__", fixed = T)[[1]][1]})
 12 |   return(x0)
 13 | }
 14 | 
 15 | PATH = "/Path to the data/GSE125688"
 16 | 
 17 | ###################################################################################
 18 | ### Prepare data ###
 19 | ###################################################################################
 20 | Ctrl1 <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580367_bil-adult1.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)
 21 | Ctrl2 <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580368_bil-adult2.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)
 22 | Ctrl3 <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580369_bil-adult3.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)
 23 | DDC <- read.table(file = paste0(PATH, '/Raw_Counts/GSM3580370_bil-DDC1.coutb.csv'), sep = '\t', header = T, stringsAsFactors = F)
 24 | 
 25 | Ctrl1 <- colname(Ctrl1)
 26 | Ctrl2 <- colname(Ctrl2)
 27 | Ctrl3 <- colname(Ctrl3)
 28 | DDC <- colname(DDC)
 29 | 
 30 | Ctrl.dat1 <- readscdata(count = Ctrl1, cell = data.frame(Barcode = colnames(Ctrl1), row.names = colnames(Ctrl1)), gene = data.frame(Symbol = rownames(Ctrl1), row.names = rownames(Ctrl1)))
 31 | Ctrl.dat2 <- readscdata(count = Ctrl2, cell = data.frame(Barcode = colnames(Ctrl2), row.names = colnames(Ctrl2)), gene = data.frame(Symbol = rownames(Ctrl2), row.names = rownames(Ctrl2)))
 32 | Ctrl.dat3 <- readscdata(count = Ctrl3, cell = data.frame(Barcode = colnames(Ctrl3), row.names = colnames(Ctrl3)), gene = data.frame(Symbol = rownames(Ctrl3), row.names = rownames(Ctrl3)))
 33 | DDC.dat <- readscdata(count = DDC, cell = data.frame(Barcode = colnames(DDC), row.names = colnames(DDC)), gene = data.frame(Symbol = rownames(DDC), row.names = rownames(DDC)))
 34 | 
 35 | 
 36 | ###################################################################################
 37 | ### Process data ###
 38 | ###################################################################################
 39 | Ctrl.dat1 = scFilter(Ctrl.dat1, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)
 40 | Ctrl.dat2 = scFilter(Ctrl.dat2, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)
 41 | Ctrl.dat3 = scFilter(Ctrl.dat3, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)
 42 | DDC.dat = scFilter(DDC.dat, min.UMI = 500, max.UMI = 8000, min.gene = 500, min.cell = 5)
 43 | 
 44 | Ctrl.dat1 = scNormalize(Ctrl.dat1)
 45 | Ctrl.dat2 = scNormalize(Ctrl.dat2)
 46 | Ctrl.dat3 = scNormalize(Ctrl.dat3)
 47 | DDC.dat = scNormalize(DDC.dat)
 48 | 
 49 | Ctrl.dat1 = scDisperse(Ctrl.dat1)
 50 | Ctrl.dat2 = scDisperse(Ctrl.dat2)
 51 | Ctrl.dat3 = scDisperse(Ctrl.dat3)
 52 | DDC.dat = scDisperse(DDC.dat)
 53 | 
 54 | length(Ctrl.dat1@vargene)
 55 | length(Ctrl.dat2@vargene)
 56 | length(Ctrl.dat3@vargene)
 57 | length(DDC.dat@vargene)
 58 | 
 59 | Ctrl.dat1 = scPCA(Ctrl.dat1)
 60 | Ctrl.dat2 = scPCA(Ctrl.dat2)
 61 | Ctrl.dat3 = scPCA(Ctrl.dat3)
 62 | DDC.dat = scPCA(DDC.dat)
 63 | 
 64 | 
 65 | ###################################################################################
 66 | ### Uncorrected data ###
 67 | ###################################################################################
 68 | ## Uncorrected
 69 | library(irlba)
 70 | library(umap)
 71 | 
 72 | gene0 = Reduce(intersect, list(Ctrl.dat1@rowdata$Symbol, Ctrl.dat2@rowdata$Symbol, Ctrl.dat3@rowdata$Symbol, DDC.dat@rowdata$Symbol))
 73 | logmat0 = cbind(Ctrl.dat1@assay$logcount[gene0,], Ctrl.dat2@assay$logcount[gene0,], Ctrl.dat3@assay$logcount[gene0,], DDC.dat@assay$logcount[gene0,])
 74 | pca0 = irlba(logmat0, nv = 50)$v
 75 | umap0 = umap(pca0[,1:9])$layout
 76 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = c(rep("BEC1", nrow(Ctrl.dat1@coldata)), rep("BEC2", nrow(Ctrl.dat2@coldata)), rep("BEC3", nrow(Ctrl.dat3@coldata)), rep("DDC", nrow(DDC.dat@coldata))))
 77 | ggplot(m0, aes(UMAP1, UMAP2)) + 
 78 |   geom_point(aes(color = Set), shape = 19, size = 1, alpha = 1) + 
 79 |   scale_color_manual(values = c("#9ECAE1", "#4292C6", "#08519C", "#EF3B2C")) + 
 80 |   theme_bw(base_line_size = 0, base_size = 12) + 
 81 |   labs(x = 'UMAP1', y = 'UMAP2', color = 'Set') + 
 82 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
 83 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1))
 84 | 
 85 | 
 86 | ###################################################################################
 87 | ### Integration data ###
 88 | ###################################################################################
 89 | ## Integration
 90 | BEC = list(Ctrl.dat3, Ctrl.dat2, Ctrl.dat1, DDC.dat)
 91 | var0 = Reduce(intersect, list(
 92 |   rownames(Ctrl.dat1@rowdata), rownames(Ctrl.dat2@rowdata), 
 93 |   rownames(Ctrl.dat3@rowdata), rownames(DDC.dat@rowdata)
 94 | ))
 95 | BEC = scMultiIntegrate(BEC, eigens = 12, var.gene = var0, ncore = 4, add.Id = c('Ctrl3', 'Ctrl2', 'Ctrl1', 'DDC'))
 96 | BEC = scUMAP(BEC, npc = 12, use = 'PLS')
 97 | 
 98 | DimPlot(BEC, slot = "cell.umap", colFactor = "Set", size = 2, Colors = c("#9ECAE1", "#4292C6", "#08519C", "#EF3B2C"))
 99 | 
100 | 
101 | 
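Note on the uncorrected embedding: the script takes `$v` from `irlba()` on a genes x cells matrix, which gives one row of coordinates per cell, and as far as the call shows no centering is requested. The sketch below uses base R only (`svd()` and `prcomp()` as stand-ins, not the `irlba` call itself) on a random toy matrix to show the shape of that result and a centered alternative. The matrix size and the choice of three components are arbitrary.

```r
set.seed(42)

# Toy log-expression matrix: 20 genes (rows) x 6 cells (columns).
logmat <- matrix(rnorm(20 * 6), nrow = 20)

# Right singular vectors of the uncentered matrix: one row per cell.
sv <- svd(logmat, nu = 0, nv = 3)
print(dim(sv$v))

# Centered PCA on cells x genes, for comparison: one row of scores per cell.
pc <- prcomp(t(logmat), center = TRUE, scale. = FALSE, rank. = 3)
print(dim(pc$x))
```

Either matrix can be passed on to a 2D embedding step; the two differ in whether the gene means are removed first.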


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE131181/GSE131181.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(irlba)
  3 | library(umap)
  4 | library(ggplot2)
  5 | library(RColorBrewer)
  6 | 
  7 | 
  8 | PATH = "/Path to the data/GSE131181"
  9 | 
 10 | ###################################################################################
 11 | ### Prepare data ###
 12 | ###################################################################################
 13 | library(data.table)
 14 | 
 15 | E10 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e10.5.raw.data.csv'), sep = ',')
 16 | E13 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e13.5.raw.data.csv'), sep = ',')
 17 | dE10 = E10[,-1]
 18 | gene1 = E10$V1
 19 | dE13 = E13[,-1]
 20 | gene2 = E13$V1
 21 | rm(E10, E13)
 22 | 
 23 | mE10 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e10.5.meta.data.csv'), sep = ',')
 24 | mE13 = fread(file = paste0(PATH, '/Raw_Counts/GSE131181_e13.5.meta.data.csv'), sep = ',')
 25 | mdE10 = mE10[,c(1, 4, 9, 10)]
 26 | mdE13 = mE13[,c(1, 4, 10, 8)]
 27 | colnames(mdE10) = colnames(mdE13) = c('Sample', 'Experiment', 'Group', 'Set')
 28 | rm(mE10, mE13)
 29 | 
 30 | dE10 = dE10[, mdE10$Sample, with = F]
 31 | dE13 = dE13[, mdE13$Sample, with = F]
 32 | 
 33 | mdE10 = as.data.frame(mdE10)
 34 | mE10wt = mdE10[mdE10$Set == 'Control', , drop = F]
 35 | mE10pd = mdE10[mdE10$Set == 'PhD', , drop = F]
 36 | mdE13 = as.data.frame(mdE13)
 37 | mE13wt = mdE13[mdE13$Set == 'Control', , drop = F]
 38 | mE13pd = mdE13[mdE13$Set == 'PhD', , drop = F]
 39 | 
 40 | E10wt = as.matrix(dE10[, mE10wt$Sample, with = F])
 41 | E10pd = as.matrix(dE10[, mE10pd$Sample, with = F])
 42 | E13wt = as.matrix(dE13[, mE13wt$Sample, with = F])
 43 | E13pd = as.matrix(dE13[, mE13pd$Sample, with = F])
 44 | rm(dE10, dE13, mdE10, mdE13)
 45 | rownames(E10wt) = rownames(E10pd) = gene1
 46 | rownames(E13wt) = rownames(E13pd) = gene2
 47 | rownames(mE10wt) = mE10wt$Sample
 48 | rownames(mE10pd) = mE10pd$Sample
 49 | rownames(mE13wt) = mE13wt$Sample
 50 | rownames(mE13pd) = mE13pd$Sample
 51 | 
 52 | 
 53 | ###################################################################################
 54 | ### Individual datasets ###
 55 | ###################################################################################
 56 | dat1 = readscdata(count = E10wt, cell = mE10wt, gene = data.frame(Symbol = gene1, row.names = gene1))
 57 | dat2 = readscdata(count = E10pd, cell = mE10pd, gene = data.frame(Symbol = gene1, row.names = gene1))
 58 | dat3 = readscdata(count = E13wt, cell = mE13wt, gene = data.frame(Symbol = gene2, row.names = gene2))
 59 | dat4 = readscdata(count = E13pd, cell = mE13pd, gene = data.frame(Symbol = gene2, row.names = gene2))
 60 | rm(E10wt, E10pd, E13wt, E13pd, mE10wt, mE10pd, mE13wt, mE13pd, gene1, gene2)
 61 | 
 62 | process0 <- function(obj0){
 63 |   obj0 = scFilter(obj0, min.UMI = 1000, max.UMI = Inf, min.gene = 200, min.cell = 5)
 64 |   obj0 = scNormalize(obj0)
 65 |   obj0 = scDisperse(obj0)
 66 |   print(length(obj0@vargene))
 67 |   return(obj0)
 68 | }
 69 | 
 70 | dat1 = process0(dat1)
 71 | dat2 = process0(dat2)
 72 | dat3 = process0(dat3)
 73 | dat4 = process0(dat4)
 74 | 
 75 | FilterPlot(dat1)
 76 | FilterPlot(dat2)
 77 | FilterPlot(dat3)
 78 | FilterPlot(dat4)
 79 | 
 80 | 
 81 | ###################################################################################
 82 | ### Uncorrected Data ###
 83 | ###################################################################################
 84 | # Truncated SVD + UMAP on the concatenated log-counts, without batch correction
 85 | ann0 = read.table(file = paste0(PATH, "/GSE131181_Ann.tsv"), sep = "\t", header = T, stringsAsFactors = F)
 86 | var0 = read.table(file = paste0(PATH, "/GSE131181_var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
 87 | var0 = var0$V1
 88 | logmat0 = cbind(dat3@assay$logcount[var0,], dat4@assay$logcount[var0,], dat1@assay$logcount[var0,], dat2@assay$logcount[var0,])
 89 | pca0 = irlba(as.matrix(logmat0), nv = 50)$v
 90 | umap0 = umap(pca0[,1:25])$layout
 91 | 
 92 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = ann0$Set, Type = ann0$Type)
 93 | ggplot(m0, aes(UMAP1, UMAP2)) + 
 94 |   geom_point(aes(color = Set, shape = Set), size = 0.1, alpha = 0.85) + 
 95 |   scale_color_manual(values = c('#FB8072', '#80B1D3', '#8DD3C7', '#BC80BD')) + 
 96 |   scale_shape_manual(values = c(3, 4, 2, 6)) + 
 97 |   theme_bw(base_line_size = 0, base_size = 24) + 
 98 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Set') + 
 99 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
100 |   guides(colour = guide_legend(override.aes = list(size = 6), ncol = 1))
101 | 
102 | ggplot(m0, aes(UMAP1, UMAP2)) + 
103 |   geom_point(aes(color = Type, shape = Set), size = 0.1, alpha = 0.9) + 
104 |   scale_color_manual(values = colorRampPalette(brewer.pal(11, 'Spectral'))(18)) + 
105 |   scale_shape_manual(values = c(3, 4, 2, 6)) + 
106 |   theme_bw(base_line_size = 0, base_size = 24) + 
107 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Set') + 
108 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
109 |   guides(colour = guide_legend(override.aes = list(size = 6), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
110 | 
111 | 
112 | ###################################################################################
113 | ### Data Integration ###
114 | ###################################################################################
115 | # Integration
116 | var0 = read.table(file = paste0(PATH, "/GSE131181_var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
117 | var0 = var0$V1
118 | dat.all = list(dat3, dat4, dat1, dat2)
119 | InPlot(dat.all, var.gene = var0, nPC = 30, ncore = 4)
120 | dat.all = scMultiIntegrate(dat.all, eigens = 20, var.gene = var0, ncore = 2, adjust = FALSE, add.Id = c('E13wt', 'E13pd', 'E10wt', 'E10pd'))
121 | dat.all = scUMAP(dat.all, npc = 25, use = 'PLS')
122 | dat.all = scTSNE(dat.all, npc = 25, use = 'PLS')
123 | 
124 | ann0 = read.table(file = paste0(PATH, "/GSE131181_Ann.tsv"), sep = "\t", header = T, stringsAsFactors = F)
125 | dat.all@coldata$Type = ann0$Type
126 | 
127 | DimPlot(dat.all, slot = "cell.umap", colFactor = "Set", size = 0.2)
128 | DimPlot(dat.all, slot = "cell.umap", colFactor = "Type", size = 0.2)
129 | 
130 | 
131 | 


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE132044/GSE132044.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(Matrix)
  3 | library(irlba)
  4 | library(umap)
  5 | library(ggplot2)
  6 | library(RColorBrewer)
  7 | 
  8 | 
  9 | PATH = "/Path to the data/GSE132044"
 10 | 
 11 | 
 12 | ###################################################################################
 13 | ### Prepare Data ###
 14 | ###################################################################################
 15 | ## All Cells
 16 | mat0 = readMM(file = paste0(PATH, "/Raw_Data/counts.umi.mtx"))
 17 | cell0 = read.table(file = paste0(PATH, "/Raw_Data/cells.umi.txt"), sep = "\t", header = F, stringsAsFactors = F)
 18 | colnames(cell0) = "Mix"
 19 | cell0$Sort = sapply(cell0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][1]})
 20 | cell0$Platform = sapply(cell0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][2]})
 21 | gene0 = read.table(file = paste0(PATH, "/Raw_Data/genes.umi.txt"), sep = "\t", header = F, stringsAsFactors = F)
 22 | colnames(gene0) = "Mix"
 23 | gene0$Ensembl = sapply(gene0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][1]})
 24 | gene0$Symbol = sapply(gene0$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][2]})
 25 | keep = !duplicated(gene0$Symbol)
 26 | gene0 = gene0[keep,]
 27 | mat0 = mat0[keep,]
 28 | colnames(mat0) = cell0$Mix
 29 | rownames(mat0) = gene0$Symbol
 30 | 
 31 | 
 32 | ## PBMC ("10xChromiumv2", "Drop", "inDrops")
 33 | # 10X Chromium V2
 34 | keep = cell0$Sort == "pbmc2" & cell0$Platform == "10xChromiumv2"
 35 | cell1 = cell0[keep,]
 36 | cell1$Barcode = sapply(cell1$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][3]})
 37 | rownames(cell1) = cell1$Barcode
 38 | mat1 = mat0[,keep]
 39 | keep = rowSums(as.matrix(mat1) > 0) > 0
 40 | mat1 = mat1[keep,]
 41 | colnames(mat1) = rownames(cell1)
 42 | gene0 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
 43 | dat1 = readscdata(mat1, cell1, gene0, is.filter = F)
 44 | 
 45 | # Drop-seq
 46 | # keep = cell0$Sort == "pbmc2" & cell0$Platform == "Drop" & cell0$Mix %in% Ann0$Mix
 47 | keep = cell0$Sort == "pbmc2" & cell0$Platform == "Drop"
 48 | cell1 = cell0[keep,]
 49 | cell1$Barcode = sapply(cell1$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][4]})
 50 | rownames(cell1) = cell1$Barcode
 51 | mat1 = mat0[,keep]
 52 | keep = rowSums(as.matrix(mat1) > 0) > 0
 53 | mat1 = mat1[keep,]
 54 | colnames(mat1) = rownames(cell1)
 55 | gene0 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
 56 | dat2 = readscdata(mat1, cell1, gene0, is.filter = F)
 57 | 
 58 | # inDrops-seq
 59 | # keep = cell0$Sort == "pbmc2" & cell0$Platform == "inDrops" & cell0$Mix %in% Ann0$Mix
 60 | keep = cell0$Sort == "pbmc2" & cell0$Platform == "inDrops"
 61 | cell1 = cell0[keep,]
 62 | cell1$Barcode = sapply(cell1$Mix, function(x){strsplit(x, "_", fixed = T)[[1]][4]})
 63 | rownames(cell1) = cell1$Barcode
 64 | mat1 = mat0[,keep]
 65 | keep = rowSums(as.matrix(mat1) > 0) > 0
 66 | mat1 = mat1[keep,]
 67 | colnames(mat1) = rownames(cell1)
 68 | gene0 = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1))
 69 | dat3 = readscdata(mat1, cell1, gene0, is.filter = F)
 70 | 
 71 | 
 72 | ###################################################################################
 73 | ### RISC Objects ###
 74 | ###################################################################################
 75 | process0 <- function(obj0){
 76 |   obj0 = scFilter(obj0, min.UMI = 100, max.UMI = 10000, min.gene = 100, min.cell = 3)
 77 |   obj0 = scNormalize(obj0, ncore = 4)
 78 |   obj0 = scDisperse(obj0)
 79 |   print(length(obj0@vargene))
 80 |   return(obj0)
 81 | }
 82 | 
 83 | dat1 = process0(dat1)
 84 | dat2 = process0(dat2)
 85 | dat3 = process0(dat3)
 86 | 
 87 | 
 88 | ###################################################################################
 89 | ### Uncorrected Data ###
 90 | ###################################################################################
 91 | set.seed(123)
 92 | var0 = Reduce(intersect, list(dat1@rowdata$Symbol, dat2@rowdata$Symbol, dat3@rowdata$Symbol))
 93 | logmat1 = as.matrix(dat1@assay$logcount)[var0,]
 94 | logmat2 = as.matrix(dat2@assay$logcount)[var0,]
 95 | logmat3 = as.matrix(dat3@assay$logcount)[var0,]
 96 | logmat0 = cbind(logmat3, logmat1, logmat2)
 97 | pca0 = irlba(logmat0, nv = 50)$v
 98 | umap0 = umap(pca0[,1:15])$layout
 99 | 
100 | ann0 = read.table(file = paste0(PATH, "/GSE132044_Anno.tsv"), sep = "\t", header = T, stringsAsFactors = F)
101 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Plat = ann0$Set, Type = ann0$Type)
102 | m0$Platform = factor(m0$Plat, levels = c("inDrops", "10X-Genomics", "Drop-seq"), labels = c("inDrops", "10xGenomics", "Drop-seq"))
103 | m0$Type = factor(m0$Type, levels = c("B", "Platelet", "pDC", "CD16+ Mono", "CD4 Memory T", "secretory B", "CD4 Naive T", "CD8 T", "DC", "NK", "CD14+ Mono"))
104 | color0 = brewer.pal(11, "Spectral")
105 | color1 = c("#D9D9D9", "#FDB462", "#BC80BD")
106 | 
107 | ggplot(m0, aes(UMAP1, UMAP2)) + 
108 |   geom_point(aes(color = Platform, shape = Platform), size = 1, alpha = 1) + 
109 |   scale_color_manual(values = color1) + 
110 |   scale_shape_manual(values = c(1, 2, 6)) + 
111 |   theme_bw(base_line_size = 0) + 
112 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Platform', shape = 'Platform') + 
113 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
114 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
115 | 
116 | ggplot(m0, aes(UMAP1, UMAP2)) + 
117 |   geom_point(aes(color = Type, shape = Platform), size = 1, alpha = 1) + 
118 |   scale_color_manual(values = color0) + 
119 |   scale_shape_manual(values = c(1, 2, 6)) + 
120 |   theme_bw(base_line_size = 0) + 
121 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Platform') + 
122 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
123 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
124 | 
125 | 
126 | ###################################################################################
127 | ### RISC Integration ###
128 | ###################################################################################
129 | ## PBMC2
130 | set.seed(123)
131 | data0 = list(dat3, dat1, dat2)
132 | var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
133 | var0 = var0$V1
134 | InPlot(data0, var.gene = var0, ncore = 4, minPC = 11, nPC = 15)
135 | data0 = scMultiIntegrate(data0, eigens = 14, var.gene = var0, add.Id = c("inDrops", "10X-Genomics", "Drop-seq"), ncore = 4)
136 | data0 = scUMAP(data0, npc = 15, use = "PLS")
137 | ann0 = read.table(file = paste0(PATH, "/GSE132044_Anno.tsv"), sep = "\t", header = T, stringsAsFactors = F)
138 | data0@coldata$Type = ann0$Type
139 | 
140 | DimPlot(data0, slot = "cell.umap", colFactor = "Set", size = 0.5)
141 | DimPlot(data0, slot = "cell.umap", colFactor = "Type", size = 0.5, label = T)
142 | DimPlot(data0, slot = "cell.umap", genes = "MS4A1", size = 0.2)
143 | DimPlot(data0, slot = "cell.umap", genes = "GNLY", size = 0.2)
144 | DimPlot(data0, slot = "cell.umap", genes = "RCAN3", size = 0.2)
145 | DimPlot(data0, slot = "cell.umap", genes = "CD8A", size = 0.2)
146 | DimPlot(data0, slot = "cell.umap", genes = "GZMK", size = 0.2)
147 | DimPlot(data0, slot = "cell.umap", genes = "CD14", size = 0.2)
148 | DimPlot(data0, slot = "cell.umap", genes = "FCGR3A", size = 0.2)
149 | DimPlot(data0, slot = "cell.umap", genes = "FCER1A", size = 0.2)
150 | DimPlot(data0, slot = "cell.umap", genes = "PPBP",  size = 0.2)
151 | DimPlot(data0, slot = "cell.umap", genes = "SERPINF1", size = 0.2)
152 | DimPlot(data0, slot = "cell.umap", genes = "JCHAIN", size = 0.2)
153 | 
154 | 
155 | 


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE84133/GSE84133.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(irlba)
  3 | library(umap)
  4 | library(ggplot2)
  5 | library(RColorBrewer)
  6 | library(biomaRt)
  7 | 
  8 | 
  9 | ###################################################################################
 10 | ### Prepare Data ###
 11 | ###################################################################################
 12 | PATH = "/Path to the data/GSE84133"
 13 | Anno = read.table(file = paste0(PATH, "/GSE84133_Filter.tsv"), sep = "\t", header = T, stringsAsFactors = F)
 14 | ann0 = read.table(file = paste0(PATH, "/GSE84133_Anno.tsv"), sep = "\t", header = T, stringsAsFactors = F)
 15 | 
 16 | dat1 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230757_human1_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
 17 | mat1 = as.matrix(t(dat1[,-c(1:3)]))
 18 | cel1 = dat1[,c(1, 3)]
 19 | colnames(cel1) = c('Barcode', 'Origi')
 20 | cel1$Set = 'Donor1'
 21 | colnames(mat1) = rownames(cel1) = cel1$Barcode
 22 | cel1 = cel1[cel1$Barcode %in% Anno$Barcode[Anno$Set == "Donor1"],]
 23 | mat1 = mat1[,colnames(mat1) %in% rownames(cel1)]
 24 | 
 25 | dat2 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230758_human2_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
 26 | mat2 = as.matrix(t(dat2[,-c(1:3)]))
 27 | cel2 = dat2[,c(1, 3)]
 28 | colnames(cel2) = c('Barcode', 'Origi')
 29 | cel2$Set = 'Donor2'
 30 | colnames(mat2) = rownames(cel2) = cel2$Barcode
 31 | cel2 = cel2[cel2$Barcode %in% Anno$Barcode[Anno$Set == "Donor2"],]
 32 | mat2 = mat2[,colnames(mat2) %in% rownames(cel2)]
 33 | 
 34 | dat3 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230759_human3_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
 35 | mat3 = as.matrix(t(dat3[,-c(1:3)]))
 36 | cel3 = dat3[,c(1, 3)]
 37 | colnames(cel3) = c('Barcode', 'Origi')
 38 | cel3$Set = 'Donor3'
 39 | colnames(mat3) = rownames(cel3) = cel3$Barcode
 40 | cel3 = cel3[cel3$Barcode %in% Anno$Barcode[Anno$Set == "Donor3"],]
 41 | mat3 = mat3[,colnames(mat3) %in% rownames(cel3)]
 42 | 
 43 | dat4 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230760_human4_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
 44 | mat4 = as.matrix(t(dat4[,-c(1:3)]))
 45 | cel4 = dat4[,c(1, 3)]
 46 | colnames(cel4) = c('Barcode', 'Origi')
 47 | cel4$Set = 'Donor4'
 48 | colnames(mat4) = rownames(cel4) = cel4$Barcode
 49 | cel4 = cel4[cel4$Barcode %in% Anno$Barcode[Anno$Set == "Donor4"],]
 50 | mat4 = mat4[,colnames(mat4) %in% rownames(cel4)]
 51 | 
 52 | matrix1 = cbind(mat1, mat2, mat3, mat4)
 53 | keep = rowSums(matrix1 > 0) >= 0  # always TRUE, so every gene is kept; use '> 0' to drop all-zero genes
 54 | matrix1 = matrix1[keep,]
 55 | gene0 = rownames(matrix1)
 56 | gen1 = gen2 = gen3 = gen4 = data.frame(Symbol = gene0, row.names = gene0, stringsAsFactors = F)
 57 | 
 58 | dat1 = readscdata(mat1[gene0,], cel1, gen1, is.filter = F)
 59 | dat2 = readscdata(mat2[gene0,], cel2, gen2, is.filter = F)
 60 | dat3 = readscdata(mat3[gene0,], cel3, gen3, is.filter = F)
 61 | dat4 = readscdata(mat4[gene0,], cel4, gen4, is.filter = F)
 62 | 
 63 | mouse = useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
 64 | human = useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
 65 | symbol = getLDS(attributes = 'hgnc_symbol', filters = 'hgnc_symbol', values = gene0, mart = human, attributesL = 'mgi_symbol', martL = mouse, uniqueRows = T)
 66 | colnames(symbol) = c('HGNC', 'MGI')
 67 | symbol0 = symbol[!duplicated(symbol$MGI),]
 68 | symbol0 = symbol0[!duplicated(symbol0$HGNC),]
 69 | 
 70 | dat5 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230761_mouse1_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
 71 | mat5 = as.matrix(t(dat5[,-c(1:3)]))
 72 | cel5 = dat5[,c(1, 3)]
 73 | colnames(cel5) = c('Barcode', 'Origi')
 74 | cel5$Set = 'Mouse5'
 75 | colnames(mat5) = rownames(cel5) = cel5$Barcode
 76 | gene0 = intersect(symbol0$MGI, rownames(mat5))
 77 | symbol1 = symbol0[symbol0$MGI %in% gene0,]
 78 | mat5 = mat5[symbol1$MGI,]
 79 | rownames(mat5) = symbol1$HGNC
 80 | cel5 = cel5[cel5$Barcode %in% Anno$Barcode[Anno$Set == "Mouse1"],]
 81 | mat5 = mat5[,colnames(mat5) %in% rownames(cel5)]
 82 | 
 83 | dat6 = read.table(file = paste0(PATH, '/Raw_Counts/GSM2230762_mouse2_umifm_counts.csv'), sep = ',', header = T, stringsAsFactors = F, check.names = F)
 84 | mat6 = as.matrix(t(dat6[,-c(1:3)]))
 85 | cel6 = dat6[,c(1, 3)]
 86 | colnames(cel6) = c('Barcode', 'Origi')
 87 | cel6$Set = 'Mouse6'
 88 | colnames(mat6) = rownames(cel6) = cel6$Barcode
 89 | gene0 = intersect(symbol0$MGI, rownames(mat6))
 90 | symbol2 = symbol0[symbol0$MGI %in% gene0,]
 91 | mat6 = mat6[symbol2$MGI,]
 92 | rownames(mat6) = symbol2$HGNC
 93 | cel6 = cel6[cel6$Barcode %in% Anno$Barcode[Anno$Set == "Mouse2"],]
 94 | mat6 = mat6[,colnames(mat6) %in% rownames(cel6)]
 95 | 
 96 | matrix2 = cbind(mat5, mat6)
 97 | keep = rowSums(matrix2 > 0) >= 0  # always TRUE, so every gene is kept; use '> 0' to drop all-zero genes
 98 | matrix2 = matrix2[keep,]
 99 | gene0 = rownames(matrix2)
100 | gen5 = gen6 = data.frame(Symbol = gene0, row.names = gene0, stringsAsFactors = F)
101 | 
102 | dat5 = readscdata(mat5[gene0,], cel5, gen5, is.filter = F)
103 | dat6 = readscdata(mat6[gene0,], cel6, gen6, is.filter = F)
104 | 
105 | 
106 | ###################################################################################
107 | ### RISC Objects ###
108 | ###################################################################################
109 | process0 <- function(obj0){
110 |   obj0 = scFilter(obj0, min.UMI = 1000, max.UMI = 20000, min.gene = 200, min.cell = 0, is.filter = F)
111 |   obj0 = scNormalize(obj0)
112 |   obj0 = scDisperse(obj0)
113 |   print(length(obj0@vargene))
114 |   return(obj0)
115 | }
116 | 
117 | dat1 = process0(dat1)
118 | dat2 = process0(dat2)
119 | dat3 = process0(dat3)
120 | dat4 = process0(dat4)
121 | dat5 = process0(dat5)
122 | dat6 = process0(dat6)
123 | 
124 | 
125 | ###################################################################################
126 | ### Uncorrected Data ###
127 | ###################################################################################
128 | Color0 = data.frame(
129 |   Type = c("activated_stellate", "acinar", "alpha", "beta", "delta", "ductal", "epsilon", "immune_other", "b_cell", "mast", "schwann", "t_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate"), 
130 |   Color = colorRampPalette(brewer.pal(11, 'Spectral'))(16), stringsAsFactors = F
131 | )
132 | 
133 | coldata0 = rbind.data.frame(dat1@coldata, dat2@coldata, dat3@coldata, dat4@coldata)
134 | var0 = Reduce(intersect, list(rownames(dat1@assay$logcount), rownames(dat2@assay$logcount), rownames(dat3@assay$logcount), rownames(dat4@assay$logcount)))
135 | logmat1 = cbind(as.matrix(dat1@assay$logcount)[var0,], as.matrix(dat2@assay$logcount)[var0,], as.matrix(dat3@assay$logcount)[var0,], as.matrix(dat4@assay$logcount)[var0,])
136 | pca0 = irlba(logmat1, nv = 10, center = T)$v
137 | umap0 = umap(pca0)$layout
138 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = coldata0$Set, Type = coldata0$Origi)
139 | m0$Type = factor(m0$Type, levels = c("activated_stellate", "acinar", "alpha", "beta", "delta", "ductal", "epsilon", "mast", "schwann", "t_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate"))
140 | color1 = Color0$Color[Color0$Type %in% as.character(m0$Type)]
141 | 
142 | ggplot(m0, aes(UMAP1, UMAP2)) + 
143 |   geom_point(aes(color = Set, shape = Set), size = 1, alpha = 1) + 
144 |   scale_color_manual(values = brewer.pal(4, 'Dark2')) + 
145 |   scale_shape_manual(values = c(1, 2, 6, 8)) + 
146 |   theme_bw(base_line_size = 0) + 
147 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Source') + 
148 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
149 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
150 | ggplot(m0, aes(UMAP1, UMAP2)) + 
151 |   geom_point(aes(color = Type, shape = Set), size = 1, alpha = 1) + 
152 |   scale_color_manual(values = color1) + 
153 |   scale_shape_manual(values = c(1, 2, 6, 8)) + 
154 |   theme_bw(base_line_size = 0) + 
155 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Source') + 
156 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
157 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
158 | 
159 | coldata0 = rbind.data.frame(dat5@coldata, dat6@coldata)
160 | var0 = Reduce(intersect, list(rownames(dat5@assay$logcount), rownames(dat6@assay$logcount)))
161 | logmat2 = cbind(as.matrix(dat5@assay$logcount)[var0,], as.matrix(dat6@assay$logcount)[var0,])
162 | pca0 = irlba(logmat2, nv = 8, center = T)$v
163 | umap0 = umap(pca0)$layout
164 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = coldata0$Set, Type = coldata0$Origi)
165 | m0$Set = factor(m0$Set, levels = c("Mouse5", "Mouse6"), labels = c("Mouse1", "Mouse2"))
166 | m0$Type = factor(
167 |   m0$Type, 
168 |   levels = c("activated_stellate", "beta", "ductal", "immune_other", "B_cell", "schwann", "T_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate"), 
169 |   labels = c("activated_stellate", "beta", "ductal", "immune_other", "b_cell", "schwann", "t_cell", "gamma", "macrophage", "endothelial", "quiescent_stellate")
170 | )
171 | color2 = Color0$Color[Color0$Type %in% m0$Type]
172 | 
173 | ggplot(m0, aes(UMAP1, UMAP2)) + 
174 |   geom_point(aes(color = Set, shape = Set), size = 1, alpha = 1) + 
175 |   scale_color_manual(values = brewer.pal(4, 'Dark2')) + 
176 |   scale_shape_manual(values = c(4, 5)) + 
177 |   theme_bw(base_line_size = 0) + 
178 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Source') + 
179 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
180 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
181 | ggplot(m0, aes(UMAP1, UMAP2)) + 
182 |   geom_point(aes(color = Type, shape = Set), size = 1, alpha = 1) + 
183 |   scale_color_manual(values = color2) + 
184 |   scale_shape_manual(values = c(4, 5)) + 
185 |   theme_bw(base_line_size = 0) + 
186 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Source') + 
187 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
188 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
189 | 
190 | 
191 | ###################################################################################
192 | ### RISC Integration ###
193 | ###################################################################################
194 | ## Human
195 | data1 = list(dat1, dat2, dat3, dat4)
196 | var0 = Reduce(intersect, list(dat1@vargene, dat2@vargene, dat3@vargene, dat4@vargene))
197 | InPlot(data1, var.gene = var0)
198 | data1 = scMultiIntegrate(data1, eigens = 14, var.gene = var0, ncore = 4, add.Id = c("Donor1", "Donor2", "Donor3", "Donor4"))
199 | data1 = scUMAP(data1, npc = 14, use = 'PLS')
200 | data1@coldata$Type = as.factor(ann0$Type[ann0$Set %in% c("Donor1", "Donor2", "Donor3", "Donor4")])
201 | 
202 | DimPlot(data1, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(4, "Dark2"))
203 | DimPlot(data1, slot = "cell.umap", colFactor = "Type")
204 | 
205 | ## Mouse
206 | data2 = list(dat6, dat5)
207 | var0 = Reduce(intersect, list(dat5@vargene, dat6@vargene))
208 | InPlot(data2, var.gene = var0)
209 | data2 = scMultiIntegrate(data2, eigens = 14, var.gene = var0, ncore = 4, add.Id = c("Mouse2", "Mouse1"))
210 | data2 = scUMAP(data2, npc = 14, use = 'PLS')
211 | data2@coldata$Type = as.factor(ann0$Type[ann0$Set %in% c("Mouse2", "Mouse1")])
212 | 
213 | DimPlot(data2, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(4, "Dark2"))
214 | DimPlot(data2, slot = "cell.umap", colFactor = "Type")
215 | 
216 | ## Both species
217 | data3 = list(dat1, dat2, dat3, dat4, dat5, dat6)
218 | var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
219 | var0 = var0$V1
220 | InPlot(data3, var.gene = var0)
221 | data3 = scMultiIntegrate(data3, eigens = 15, var.gene = var0, ncore = 4, add.Id = c("Donor1", "Donor2", "Donor3", "Donor4", "Mouse1", "Mouse2"))
222 | data3 = scUMAP(data3, npc = 15, use = 'PLS')
223 | data3@coldata$Type = as.factor(ann0$Type)
224 | 
225 | DimPlot(data3, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(6, "Dark2"))
226 | DimPlot(data3, slot = "cell.umap", colFactor = "Type")
227 | 
228 | DimPlot(data3, slot = "cell.umap", genes = "GCG", size = 0.5)
229 | DimPlot(data3, slot = "cell.umap", genes = "INS", size = 0.5, exp.range = c(6, 10))
230 | DimPlot(data3, slot = "cell.umap", genes = "RESP18", size = 0.5)
231 | DimPlot(data3, slot = "cell.umap", genes = "KRT19", size = 0.5)
232 | DimPlot(data3, slot = "cell.umap", genes = "COL6A3", size = 0.5)
233 | DimPlot(data3, slot = "cell.umap", genes = "SST", size = 0.5, exp.range = c(3, 11))
234 | DimPlot(data3, slot = "cell.umap", genes = "RGS5", size = 0.5)
235 | DimPlot(data3, slot = "cell.umap", genes = "PRSS1", size = 0.5)
236 | DimPlot(data3, slot = "cell.umap", genes = "FLT1", size = 0.5)
237 | 
238 | 
239 | 


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE85241_GSE81076_GSE83139_EMTAB_5061/GSE85241_GSE81076_GSE83139_EMTAB_5061.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(irlba)
  3 | library(umap)
  4 | library(ggplot2)
  5 | library(RColorBrewer)
  6 | 
  7 | 
  8 | ###################################################################################
  9 | ### Prepare Data ###
 10 | ###################################################################################
 11 | PATH = "/Path to the data/GSE85241_GSE81076_GSE83139_EMTAB_5061"
 12 | 
 13 | ## GSE85241 CEL-Seq2
 14 | gse85241.df <- read.table(file = paste0(PATH, "/Raw_Counts/GSE85241_cellsystems_dataset_4donors_updated.csv"), sep = '\t', header = TRUE, row.names = 1, stringsAsFactors = FALSE)
 15 | dim(gse85241.df)
 16 | donor.names <- sub("^(D[0-9]+).*", "\\1", colnames(gse85241.df))
 17 | table(donor.names)
 18 | plate.id <- sub("^D[0-9]+\\.([0-9]+)_.*", "\\1", colnames(gse85241.df))
 19 | table(plate.id)
 20 | gene.symb <- gsub("__chr.*$", "", rownames(gse85241.df))
 21 | is.spike <- grepl("^ERCC-", gene.symb)
 22 | table(is.spike)
 23 | gse85241.df$Symbol = gene.symb
 24 | gse85241.df = gse85241.df[!duplicated(gse85241.df$Symbol),]
 25 | rownames(gse85241.df) = gse85241.df$Symbol
 26 | gse85241.df$Symbol = NULL  # drop the helper column by name rather than by position
 27 | 
 28 | count0 = gse85241.df
 29 | cell = data.frame(Donor = donor.names, Plate = plate.id)
 30 | rownames(cell) = colnames(count0)
 31 | gene = data.frame(Symbol = rownames(count0))
 32 | rownames(gene) = rownames(count0)
 33 | 
 34 | data1 = readscdata(count = count0, cell = cell, gene = gene)
 35 | rm(cell, count0, donor.names, gene, gene.symb, gse85241.df, is.spike, plate.id)
 36 | 
 37 | 
 38 | ## GSE81076 CEL-Seq
 39 | gse81076.df <- read.table(file = paste0(PATH, "/Raw_Counts/GSE81076_D2_3_7_10_17.txt"), sep='\t', header=TRUE, stringsAsFactors=FALSE, row.names=1)
 40 | donor.names <- sub("^(D[0-9]+).*", "\\1", colnames(gse81076.df))
 41 | table(donor.names)
 42 | plate.id <- sub("^D[0-9]+(.*)_.*", "\\1", colnames(gse81076.df))
 43 | table(plate.id)
 44 | gene.symb <- gsub("__chr.*$", "", rownames(gse81076.df))
 45 | is.spike <- grepl("^ERCC-", gene.symb)
 46 | table(is.spike)
 47 | gse81076.df$Symbol = gene.symb
 48 | gse81076.df = gse81076.df[!duplicated(gse81076.df$Symbol),]
 49 | rownames(gse81076.df) = gse81076.df$Symbol
 50 | gse81076.df$Symbol = NULL  # drop the helper column by name rather than by position
 51 | 
 52 | count0 = gse81076.df
 53 | cell = data.frame(Donor = donor.names, Plate = plate.id)
 54 | rownames(cell) = colnames(count0)
 55 | gene = data.frame(Symbol = rownames(count0))
 56 | rownames(gene) = rownames(count0)
 57 | 
 58 | data2 = readscdata(count = count0, cell = cell, gene = gene)
 59 | rm(cell, count0, donor.names, gene, gene.symb, gse81076.df, is.spike, plate.id)
 60 | 
 61 | 
 62 | ## E-MTAB-5061 Smart-Seq2
 63 | header <- read.table(file = paste0(PATH, "/Raw_Counts/pancreas_refseq_rpkms_counts_3514sc.txt"), nrows = 1, sep = "\t", comment.char = "", stringsAsFactors = FALSE)
 64 | ncells <- ncol(header) - 1L
 65 | col.types <- vector("list", ncells*2 + 2)
 66 | col.types[1:2] <- "character"
 67 | col.types[2+ncells + seq_len(ncells)] <- "integer"
 68 | e5601.df <- read.table(file = paste0(PATH, "/Raw_Counts/pancreas_refseq_rpkms_counts_3514sc.txt"), sep = "\t", colClasses = col.types)
 69 | gene.data <- e5601.df[,1:2]
 70 | e5601.df <- e5601.df[,-(1:2)]
 71 | colnames(e5601.df) <- as.character(header[1,-1])
 72 | dim(e5601.df)
 73 | is.spike <- grepl("^ERCC-", gene.data[,2])
 74 | table(is.spike)
 75 | e5601.df = data.frame(e5601.df, Symbol = gene.data$V1)
 76 | e5601.df = e5601.df[!duplicated(e5601.df$Symbol),]
 77 | rownames(e5601.df) = e5601.df$Symbol
 78 | e5601.df$Symbol = NULL  # drop the helper column by name rather than by position
 79 | col.types[[1]] = sapply(colnames(e5601.df), function(x){strsplit(x, '_', fixed = T)[[1]][1]})
 80 | col.types[[2]] = sapply(colnames(e5601.df), function(x){strsplit(x, '_', fixed = T)[[1]][2]})
 81 | 
 82 | count0 = e5601.df
 83 | cell = data.frame(Donor = col.types[[1]], Plate = col.types[[2]])
 84 | rownames(cell) = colnames(count0)
 85 | gene = data.frame(Symbol = rownames(count0))
 86 | rownames(gene) = rownames(count0)
 87 | 
 88 | data3 = readscdata(count = count0, cell = cell, gene = gene)
 89 | rm(cell, count0, gene, e5601.df, is.spike, col.types, gene.data, header, ncells)
 90 | 
 91 | 
 92 | ## GSE83139 SMARTer
 93 | gse83139.df <- read.table(file = paste0(PATH, "/Raw_Counts/GSE83139_tbx-v-f-norm-ntv-cpms.csv"), sep = "\t", header = T, stringsAsFactors = F)
 94 | gse83139.df <- gse83139.df[!duplicated(gse83139.df$gene),]
 95 | gse83139.df <- gse83139.df[!is.na(gse83139.df$gene),]
 96 | count0 = as.matrix(gse83139.df[,-c(1:7)])
 97 | rownames(count0) = as.character(gse83139.df$gene)
 98 | cell = data.frame(Sample = colnames(count0), row.names = colnames(count0), stringsAsFactors = F)
 99 | gene = data.frame(ID = rownames(count0), row.names = rownames(count0), stringsAsFactors = F)
100 | 
101 | data4 = readscdata(count = count0, cell = cell, gene = gene)
102 | rm(count0, gse83139.df, cell, gene)
103 | 
104 | 
105 | ###################################################################################
106 | ### RISC Data ###
107 | ###################################################################################
108 | process1 <- function(obj0){
109 |   obj0 <- scFilter(obj0, min.UMI = 3000, max.UMI = 100000, min.gene = 1000, min.cell = 5)
110 |   obj0 <- scNormalize(obj0)
111 |   obj0 <- scDisperse(obj0)
112 |   print(length(obj0@vargene))
113 |   obj0 <- scPCA(obj0)
114 |   return(obj0)
115 | }
116 | 
117 | process2 <- function(obj0){
118 |   obj0 <- scFilter(obj0)
119 |   obj0 <- scNormalize(obj0)
120 |   obj0 <- scDisperse(obj0)
121 |   print(length(obj0@vargene))
122 |   obj0 <- scPCA(obj0)
123 |   return(obj0)
124 | }
125 | 
126 | data1 = process1(data1)
127 | data2 = process1(data2)
128 | data3 = process1(data3)
129 | data4 = process2(data4)
130 | 
131 | 
132 | ###################################################################################
133 | ###                               Uncorrected Data                                ###
134 | ###################################################################################
135 | var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
136 | var0 = var0$V1
137 | logmat0 = cbind(
138 |   as.matrix(data1@assay$logcount)[var0,], 
139 |   as.matrix(data2@assay$logcount)[var0,], 
140 |   as.matrix(data3@assay$logcount)[var0,], 
141 |   as.matrix(data4@assay$logcount)[var0,]
142 | )
143 | pca0 = irlba(logmat0, nv = 12, center = T)$v
144 | umap0 = umap(pca0)$layout
145 | ann0 = read.table(file = paste0(PATH, "/Pancreas_Annotation.tsv"), sep = "\t", header = T, stringsAsFactors = F)
146 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Set = ann0$Set, Type = ann0$Type)
147 | m0$Set = factor(m0$Set, levels = c("E-MTAB-5061", "GSE81076", "GSE85241", "GSE83139"))
148 | 
149 | ggplot(m0, aes(UMAP1, UMAP2)) + 
150 |   geom_point(aes(color = Set, shape = Set), size = 2, alpha = 1) + 
151 |   scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) + 
152 |   scale_shape_manual(values = c(1, 2, 6, 8)) + 
153 |   theme_bw(base_line_size = 0) + 
154 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Set', shape = 'Set') + 
155 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
156 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
157 | ggplot(m0, aes(UMAP1, UMAP2)) + 
158 |   geom_point(aes(color = Type, shape = Set), size = 2, alpha = 1) + 
159 |   scale_color_manual(values = brewer.pal(8, 'Spectral')) + 
160 |   scale_shape_manual(values = c(1, 2, 6, 8)) + 
161 |   theme_bw(base_line_size = 0) + 
162 |   labs(x = 'UMAP-1', y = 'UMAP-2', color = 'Cell Type', shape = 'Set') + 
163 |   theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
164 |   guides(colour = guide_legend(override.aes = list(size = 5), ncol = 1), shape = guide_legend(override.aes = list(size = 5), ncol = 1))
165 | 
166 | 
167 | ###################################################################################
168 | ### Data Integration ###
169 | ###################################################################################
170 | # Integrating
171 | data0 = list(data1, data2, data3, data4)
172 | var0 = read.table(file = paste0(PATH, "/var.tsv"), sep = "\t", header = F, stringsAsFactors = F)
173 | var0 = var0$V1
174 | InPlot(data0, var.gene = var0, minPC = 8, nPC = 12)
175 | data0 = scMultiIntegrate(data0, eigens = 8, var.gene = var0, ncore = 4, adjust = F, add.Id = c("GSE85241", "GSE81076", "E-MTAB-5061", "GSE83139"))
176 | data0 = scUMAP(data0, npc = 11, use = 'PLS')
177 | ann0 = read.table(file = paste0(PATH, "/Pancreas_Annotation.tsv"), sep = "\t", header = T, stringsAsFactors = F)
178 | data0@coldata$Platform = as.character(ann0$Platform)
179 | data0@coldata$Type = as.character(ann0$Type)
180 | 
181 | DimPlot(data0, slot = "cell.umap", colFactor = "Set", Colors = brewer.pal(4, "Dark2"))
182 | DimPlot(data0, slot = "cell.umap", colFactor = "Platform", Colors = brewer.pal(4, "Dark2"))
183 | DimPlot(data0, slot = "cell.umap", colFactor = "Type", label = T)
184 | DimPlot(data0, slot = "cell.umap", genes = "SST", size = 1)
185 | DimPlot(data0, slot = "cell.umap", genes = "PPY", size = 1)
186 | DimPlot(data0, slot = "cell.umap", genes = "GCG", size = 1)
187 | DimPlot(data0, slot = "cell.umap", genes = "INS", size = 1)
188 | DimPlot(data0, slot = "cell.umap", genes = "COL1A1", size = 1)
189 | DimPlot(data0, slot = "cell.umap", genes = "PRSS1", size = 1)
190 | DimPlot(data0, slot = "cell.umap", genes = "KRT19", size = 1)
191 | DimPlot(data0, slot = "cell.umap", genes = "FLT1", size = 1)
192 | 
193 | 
194 | 


--------------------------------------------------------------------------------
/RISC_Supplementary/GSE96583/GSE96583.R:
--------------------------------------------------------------------------------
  1 | library(RISC)
  2 | library(irlba)
  3 | library(umap)
  4 | library(Matrix)
  5 | library(matrixStats)
  6 | library(ggplot2)
  7 | library(RColorBrewer)
  8 | 
  9 | 
 10 | PATH = "/Path to the data/GSE96583_PBMC"
 11 | 
 12 | ###################################################################################
 13 | ### Preparing RISC Objects ###
 14 | ###################################################################################
 15 | # Input Data
 16 | data1 = read10Xgenomics(paste0(PATH, "/PBMC_Raw_Counts/Control"))
 17 | data2 = read10Xgenomics(paste0(PATH, "/PBMC_Raw_Counts/Stimulate"))
 18 | 
 19 | # Processing
 20 | process0 <- function(obj0){
 21 |   obj0 = scFilter(obj0, min.UMI = 200, min.gene = 200, max.UMI = 8000, min.cell = 3)
 22 |   obj0 = scNormalize(obj0)
 23 |   obj0 = scDisperse(obj0)
 24 |   print(length(obj0@vargene))
 25 |   return(obj0)
 26 | }
 27 | 
 28 | data1 = process0(data1)
 29 | data2 = process0(data2)
 30 | 
 31 | 
 32 | ###################################################################################
 33 | ### Uncorrected Data ###
 34 | ###################################################################################
 35 | ## Plot
 36 | Interplot <- function(m0, color0, shape0, group0 = "Type"){
 37 |   if(group0 == "Type"){
 38 |     g0 = ggplot(m0, aes(UMAP1, UMAP2)) + 
 39 |       geom_point(aes(color = CellType, shape = Group), size = 0.5, alpha = 1) + 
 40 |       scale_color_manual(values = color0) + 
 41 |       scale_shape_manual(values = shape0) + 
 42 |       theme_bw(base_line_size = 0) + 
 43 |       labs(color = "Cell-Type", shape = "Batch") + 
 44 |       theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
 45 |       guides(color = guide_legend(override.aes = list(size = 6)), shape = guide_legend(override.aes = list(size = 5)))
 46 |   } else {
 47 |     g0 = ggplot(m0, aes(UMAP1, UMAP2)) + 
 48 |       geom_point(aes(color = Group, shape = Group), size = 0.5, alpha = 1) + 
 49 |       scale_color_manual(values = c("#FB8072", "#80B1D3")) + 
 50 |       scale_shape_manual(values = shape0) + 
 51 |       theme_bw(base_line_size = 0) + 
 52 |       labs(color = "Batch", shape = "Batch") + 
 53 |       theme(plot.title = element_text(hjust = 0.5, size = 16)) + 
 54 |       guides(color = guide_legend(override.aes = list(size = 6)), shape = guide_legend(override.aes = list(size = 5)))
 55 |   }
 56 |   print(g0)
 57 | }
 58 | 
 59 | # Raw PCs
 60 | logmat1 = as.matrix(data1@assay$logcount)
 61 | logmat2 = as.matrix(data2@assay$logcount)
 62 | gene0 = intersect(rownames(logmat1), rownames(logmat2))
 63 | logmat0 = cbind(logmat1[gene0,], logmat2[gene0,])
 64 | pca0 = irlba(logmat0, nv = 50, center = T)$v
 65 | umap0 = umap(pca0[,1:16])$layout
 66 | 
 67 | # Plot
 68 | m0 = data.frame(UMAP1 = umap0[,1], UMAP2 = umap0[,2], Group = c(rep("IFN-", ncol(logmat1)), rep("IFN+", ncol(logmat2))))
 69 | shape0 = c(1, 8)
 70 | Interplot(m0, color0, shape0, group0 = "Set")
 71 | 
 72 | 
 73 | ###################################################################################
 74 | ### Data Integration with Full Cells ###
 75 | ###################################################################################
 76 | ## Integration
 77 | var0 = read.table(paste0(PATH, "/Var0_Ori.tsv"), sep = "\t", header = F, stringsAsFactors = F)
 78 | var0 = var0$V1
 79 | data3 = list(data1, data2)
 80 | InPlot(data3, var.gene = var0, nPC = 20)
 81 | data3 = scMultiIntegrate(data3, eigens = 15, var.gene = var0, add.Id = c('IFN-', 'IFN+'), adjust = FALSE, ncore = 4)
 82 | data3 = scUMAP(data3, npc = 18, use = 'PLS')
 83 | 
 84 | DimPlot(data3, slot = "cell.umap", colFactor = "Set", Colors = c("firebrick2", "skyblue"), size = 0.2)
 85 | 
 86 | 
 87 | ###################################################################################
 88 | ### Subset the Cells ###
 89 | ###################################################################################
 90 | Anno0 = read.table(paste0(PATH, "/Anno0.tsv"), sep = "\t", header = T, stringsAsFactors = F)
 91 | cell1 = Anno0$scBarcode[Anno0$Set == "IFN-"]
 92 | cell2 = Anno0$scBarcode[Anno0$Set == "IFN+"]
 93 | 
 94 | data1 = SubSet(data1, cells = cell1)
 95 | data2 = SubSet(data2, cells = cell2)
 96 | data1 = process0(data1)
 97 | data2 = process0(data2)
 98 | 
 99 | FilterPlot(data1)
100 | FilterPlot(data2)
101 | 
102 | 
103 | ###################################################################################
104 | ### Data Integration with Selected Cells ###
105 | ###################################################################################
106 | ## Integration
107 | var0 = read.table(paste0(PATH, "/Var0.tsv"), sep = "\t", header = F, stringsAsFactors = F)
108 | var0 = var0$V1
109 | data3 = list(data1, data2)
110 | InPlot(data3, var.gene = var0, nPC = 20)
111 | data3 = scMultiIntegrate(data3, eigens = 16, var.gene = var0, add.Id = c('IFN-', 'IFN+'), adjust = TRUE, ncore = 4)
112 | data3 = scUMAP(data3, npc = 16, use = 'PLS')
113 | DimPlot(data3, slot = "cell.umap", colFactor = "Set", Colors = c("firebrick2", "skyblue"), size = 0.2)
114 | 
115 | # Cell Type
116 | Anno0 = read.table(paste0(PATH, "/Anno0.tsv"), sep = "\t", header = T, stringsAsFactors = F)
117 | data3@coldata$Type0 = Anno0$Type0
118 | DimPlot(data3, slot = "cell.umap", colFactor = "Type0", size = 0.2, label = T)
119 | 
120 | # Marker Genes
121 | DimPlot(data3, slot = "cell.umap", genes = "CD8A", size = 0.2)
122 | DimPlot(data3, slot = "cell.umap", genes = "FYN", size = 0.2)
123 | DimPlot(data3, slot = "cell.umap", genes = "CACYBP", size = 0.2)
124 | DimPlot(data3, slot = "cell.umap", genes = "IL1B", size = 0.2)
125 | DimPlot(data3, slot = "cell.umap", genes = "HES4", size = 0.2)
126 | DimPlot(data3, slot = "cell.umap", genes = "GNLY", size = 0.2)
127 | DimPlot(data3, slot = "cell.umap", genes = "AES", size = 0.2)
128 | DimPlot(data3, slot = "cell.umap", genes = "CD79A", size = 0.2)
129 | DimPlot(data3, slot = "cell.umap", genes = "CD79B", size = 0.2)
130 | DimPlot(data3, slot = "cell.umap", genes = "SMPDL3A", size = 0.2)
131 | DimPlot(data3, slot = "cell.umap", genes = "MIR155HG", size = 0.2)
132 | DimPlot(data3, slot = "cell.umap", genes = "CKB", size = 0.2)
133 | DimPlot(data3, slot = "cell.umap", genes = "PPBP", size = 0.2)
134 | 
135 | 
136 | 


--------------------------------------------------------------------------------
/Release.txt:
--------------------------------------------------------------------------------
 1 | RISC v1.6.0 release (03.22.2022)
 2 | 
 3 | Changes from the last release (v1.5)
 4 | This version mainly fixes problems caused by updates to dependent packages.
 5 | 
 6 | Changes from the last release (v1)
 7 | (1) Replace the dependency "RcppEigen" with "RcppArmadillo"; core functions now fully support sparse matrices.
 8 | (2) Replace the dependency "pbmcapply" with "pbapply".
 9 | (3) Optimize the "scMultiIntegrate" function and reduce memory consumption; the new RISC release can support integration of datasets with >1.5 million cells and 10,000 genes.
10 | (4) Convert the "logcount" slot of the integrated RISC object ("object@assay$logcount") from one large matrix to a list of corrected logcount matrices, one per individual data set. To output the full integrated matrix: mat0 = do.call(cbind, object@assay$logcount)
11 | (5) Change function name "readscdata" -> "readsc"
12 | (6) Change function name "read10Xgenomics" -> "read10X_mtx"
13 | (7) Parameter names in some functions are changed.
14 | 
15 | Added new functions
16 | (1) Add the Wilcoxon Rank Sum and Signed Rank model to the "scMarker" and "AllMarker" functions.
17 | (2) Add a pseudo-cell option (bin cells to generate meta-cells) for detecting marker genes to the "scMarker", "AllMarker" and "scDEG" functions.
18 | (3) Add a "slot" parameter to the "DimPlot" function so that external dimension reduction results can be added to a RISC object and plotted, e.g. add phate results (phate0) to a RISC object: obj0@DimReduction$cell.phate = phate0; DimPlot(obj0, slot = "cell.phate", colFactor = 'Group', size = 2, label = TRUE)
19 | (4) Add the "read10X_h5" function for 10X Genomics h5 files.
20 | 
21 | Removed old functions
22 | (1) Delete the "readHTSeqdata" function.
23 | 


--------------------------------------------------------------------------------
/Seurat_to_RISC_RISC_v1.0.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/Seurat_to_RISC_RISC_v1.0.pdf


--------------------------------------------------------------------------------
/build/vignette.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/build/vignette.rds


--------------------------------------------------------------------------------
/data/datalist:
--------------------------------------------------------------------------------
1 | raw.mat
2 | 


--------------------------------------------------------------------------------
/data/raw.mat.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bioinfoDZ/RISC/7f1ddf61711f0d24ef8ea0601e550370ff3ec2b5/data/raw.mat.rda


--------------------------------------------------------------------------------
/inst/doc/RISC_Vignette.R:
--------------------------------------------------------------------------------
 1 | ## ----include = FALSE----------------------------------------------------------
 2 | knitr::opts_chunk$set(
 3 |   collapse = TRUE,
 4 |   comment = "#>"
 5 | )
 6 | 
 7 | ## ----setup--------------------------------------------------------------------
 8 | library(RISC)
 9 | library(RColorBrewer)
10 | 
11 | data("raw.mat")
12 | mat0 = raw.mat[[1]]
13 | coldata0 = raw.mat[[2]]
14 | 
15 | coldata1 = coldata0[coldata0$Batch0 == 'Batch1',]
16 | coldata2 = coldata0[coldata0$Batch0 == 'Batch2',]
17 | coldata3 = coldata0[coldata0$Batch0 == 'Batch3',]
18 | coldata4 = coldata0[coldata0$Batch0 == 'Batch4',]
19 | coldata5 = coldata0[coldata0$Batch0 == 'Batch5',]
20 | coldata6 = coldata0[coldata0$Batch0 == 'Batch6',]
21 | mat1 = mat0[,rownames(coldata1)]
22 | mat2 = mat0[,rownames(coldata2)]
23 | mat3 = mat0[,rownames(coldata3)]
24 | mat4 = mat0[,rownames(coldata4)]
25 | mat5 = mat0[,rownames(coldata5)]
26 | mat6 = mat0[,rownames(coldata6)]
27 | 
28 | ## -----------------------------------------------------------------------------
29 | sce1 = readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
30 | sce2 = readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)
31 | sce3 = readsc(count = mat3, cell = coldata3, gene = data.frame(Symbol = rownames(mat3), row.names = rownames(mat3)), is.filter = FALSE)
32 | sce4 = readsc(count = mat4, cell = coldata4, gene = data.frame(Symbol = rownames(mat4), row.names = rownames(mat4)), is.filter = FALSE)
33 | sce5 = readsc(count = mat5, cell = coldata5, gene = data.frame(Symbol = rownames(mat5), row.names = rownames(mat5)), is.filter = FALSE)
34 | sce6 = readsc(count = mat6, cell = coldata6, gene = data.frame(Symbol = rownames(mat6), row.names = rownames(mat6)), is.filter = FALSE)
35 | 
36 | ## -----------------------------------------------------------------------------
37 | process0 <- function(obj0){
38 |   
39 |   # filter cells
40 |   obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3, is.filter = FALSE)
41 |   
42 |   # normalize data
43 |   obj0 = scNormalize(obj0, method = 'robust')
44 |   
45 |   # find highly variable genes
46 |   obj0 = scDisperse(obj0)
47 |   
48 |   # here replace highly variable genes by all the genes for integration
49 |   obj0@vargene = rownames(sce1@rowdata)
50 |   
51 |   return(obj0)
52 |   
53 | }
54 | 
55 | sce1 = process0(sce1)
56 | sce2 = process0(sce2)
57 | sce3 = process0(sce3)
58 | sce4 = process0(sce4)
59 | sce5 = process0(sce5)
60 | sce6 = process0(sce6)
61 | 
62 | ## -----------------------------------------------------------------------------
63 | set.seed(1)
64 | var.genes = rownames(sce1@assay$count)
65 | pcr0 = list(sce1, sce2, sce3, sce4, sce5, sce6)
66 | pcr0 = scMultiIntegrate(pcr0, eigens = 9, var.gene = var.genes, align = 'OLS', npc = 15)
67 | # pcr0 = scLargeIntegrate(pcr0, var.gene = var.genes, align = 'Predict', npc = 8)
68 | pcr0 =scUMAP(pcr0, npc = 9, use = 'PLS', dist = 0.001, neighbors = 15)
69 | 
70 | ## -----------------------------------------------------------------------------
71 | pcr0@coldata$Group = factor(pcr0@coldata$Group0, levels = c('Group1', 'Group2', 'Group2*', 'Group3', 'Group3*'), labels = c("a", "b", "b'", "c", "c'"))
72 | pcr0@coldata$Set0 = factor(pcr0@coldata$Set, levels = c('Set1', 'Set2', 'Set3', 'Set4', 'Set5', 'Set6'), labels = c('Set1 rep.1', 'Set2 rep.1', 'Set3 rep.1', 'Set1 rep.2', 'Set2 rep.2', 'Set3 rep.2'))
73 | pcr0 = scCluster(pcr0, slot = "cell.umap", k = 4, method = "density", dc = 0.3)
74 | 
75 | ## ----fig.show="hold", out.width="48", fig.dim=c(7, 5)-------------------------
76 | DimPlot(pcr0, colFactor = 'Set0', size = 2)
77 | DimPlot(pcr0, colFactor = 'Group', size = 2, Colors = brewer.pal(5, "Set1"))
78 | DimPlot(pcr0, colFactor = 'Cluster', size = 2, Colors = brewer.pal(6, "Dark2"))
79 | 
80 | ## -----------------------------------------------------------------------------
81 | sessionInfo()
82 | 
83 | 


--------------------------------------------------------------------------------
/inst/doc/RISC_Vignette.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "RISC: robust integration of single-cell RNA-seq datasets using a single reference space"
  3 | author: 
  4 | - name: Yang Liu, Tao Wang, Deyou Zheng
  5 |   affiliation: Albert Einstein College of Medicine, Bronx, NY, United States
  6 | date: "2021"
  7 | output: rmarkdown::html_vignette
  8 | package: RISC
  9 | vignette: >
 10 |   %\VignetteIndexEntry{RISC: robust integration of single-cell RNA-seq datasets using a single reference space}
 11 |   %\VignetteEngine{knitr::rmarkdown}
 12 |   %\VignetteEncoding{UTF-8}
 13 | ---
 14 | 
 15 | ```{r, include = FALSE}
 16 | knitr::opts_chunk$set(
 17 |   collapse = TRUE,
 18 |   comment = "#>"
 19 | )
 20 | ```
 21 | 
 22 | # Introduction
 23 | 
 24 | Single-cell RNA sequencing (scRNA-seq) has become an essential genomic technology for resolving gene expression heterogeneity in single cells and has been widely used in many biological domains. It remains challenging to integrate scRNA-seq datasets with inter-sample heterogeneity, for example, different cell subpopulation compositions among datasets or gene expression differences in the same cell groups across datasets.
 25 | 
 26 | We find that the distortion in the integration of heterogeneous data is due to the lack of a consistent global reference space for projecting all cells from individual datasets. To overcome this issue, we developed a novel approach, named reference principal component integration (RPCI), and implemented it in a new scRNA-seq analysis package called “RISC” for robust integration of scRNA-seq data.
 27 | 
 28 | 
 29 | ```{r setup}
 30 | library(RISC)
 31 | library(RColorBrewer)
 32 | 
 33 | data("raw.mat")
 34 | mat0 = raw.mat[[1]]
 35 | coldata0 = raw.mat[[2]]
 36 | 
 37 | coldata1 = coldata0[coldata0$Batch0 == 'Batch1',]
 38 | coldata2 = coldata0[coldata0$Batch0 == 'Batch2',]
 39 | coldata3 = coldata0[coldata0$Batch0 == 'Batch3',]
 40 | coldata4 = coldata0[coldata0$Batch0 == 'Batch4',]
 41 | coldata5 = coldata0[coldata0$Batch0 == 'Batch5',]
 42 | coldata6 = coldata0[coldata0$Batch0 == 'Batch6',]
 43 | mat1 = mat0[,rownames(coldata1)]
 44 | mat2 = mat0[,rownames(coldata2)]
 45 | mat3 = mat0[,rownames(coldata3)]
 46 | mat4 = mat0[,rownames(coldata4)]
 47 | mat5 = mat0[,rownames(coldata5)]
 48 | mat6 = mat0[,rownames(coldata6)]
 49 | ```
 50 | 
 51 | 
 52 | # Heterogeneous Simulated data
 53 | 
 54 | To show the advantages of RPCI and evaluate its performance quantitatively, we start with simulated data and control the degree of gene expression difference in two of the three cell groups between datasets: more DE genes in c/c' than in b/b', and no DE genes in group a. As expected, the cell groups with more DE genes display a gradual reduction in cell similarity.
 55 | 
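The design described above can be pictured with a small toy simulation. The sketch below is only an illustration under assumed settings (Poisson counts, 200 genes, a 4-fold shift on the DE genes); the names `simulate_group`, `few_de` and `many_de` are hypothetical, and this is not how the `raw.mat` data were generated. The fence is a plain `r` block, so it is not evaluated when the vignette is knitted.

```r
# Toy simulation; all names and parameter values are assumptions for illustration
set.seed(1)
n_genes <- 200
n_cells <- 50
base_mean <- rgamma(n_genes, shape = 2, rate = 0.2)  # baseline expression per gene

# Simulate one cell group; the first n_de genes are shifted by `fold`
simulate_group <- function(n_de, fold = 4) {
  mu <- base_mean
  de <- seq_len(n_de)
  mu[de] <- mu[de] * fold
  counts <- rpois(n_genes * n_cells, lambda = rep(mu, times = n_cells))
  matrix(counts, nrow = n_genes, ncol = n_cells)  # genes x cells
}

unperturbed <- simulate_group(n_de = 0)
few_de      <- simulate_group(n_de = 10)  # analogous to a b/b' contrast
many_de     <- simulate_group(n_de = 50)  # analogous to a c/c' contrast

# Similarity of mean expression profiles to the unperturbed group
similarity <- c(
  few_de  = cor(rowMeans(unperturbed), rowMeans(few_de)),
  many_de = cor(rowMeans(unperturbed), rowMeans(many_de))
)
print(similarity)
```

With more DE genes one would expect the second correlation to be the lower of the two, mirroring the reduced similarity described above.
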
 56 | # Create RISC objects
 57 | 
 58 | We generate RISC objects from the gene-cell matrix (mat), the data frame of the cells (coldata), and the data frame of the genes, using the "readsc" function. RISC objects can also be generated with the "read10X_mtx" or "read10X_h5" functions for 10X Genomics data.
 59 | 
 60 | 
 61 | ```{r}
 62 | sce1 = readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
 63 | sce2 = readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)
 64 | sce3 = readsc(count = mat3, cell = coldata3, gene = data.frame(Symbol = rownames(mat3), row.names = rownames(mat3)), is.filter = FALSE)
 65 | sce4 = readsc(count = mat4, cell = coldata4, gene = data.frame(Symbol = rownames(mat4), row.names = rownames(mat4)), is.filter = FALSE)
 66 | sce5 = readsc(count = mat5, cell = coldata5, gene = data.frame(Symbol = rownames(mat5), row.names = rownames(mat5)), is.filter = FALSE)
 67 | sce6 = readsc(count = mat6, cell = coldata6, gene = data.frame(Symbol = rownames(mat6), row.names = rownames(mat6)), is.filter = FALSE)
 68 | ```
 69 | 
 70 | 
 71 | # Processing RISC data
 72 | 
 73 | After creating the RISC objects, we process the data. The standard steps are:
 74 |    (1) Filter the cells: remove cells with extremely low or high UMI counts and discard cells with an extremely low number of expressed genes. Because we use simulated data here, we do not filter out any cells.
 75 |    (2) Normalize gene expression to remove the effect of RNA sequencing depth.
 76 |    (3) Scale gene expression: the scaled counts contain only gene signal information for individual cells and have zero empirical mean in each column, satisfying the requirement for PCA and SVD.
 77 |    (4) Find highly variable genes: identify highly variable genes with a quasi-Poisson model and use them for gene-cell matrix decomposition and data integration.
 78 | 
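To make steps (2) and (3) concrete, the following base-R sketch applies a simple library-size normalization, a log transform, and centering to a toy count matrix. It rests on assumed choices (rescaling to the median library size, `log1p`, centering each gene across cells in a genes-by-cells layout), and `toy_counts` is made-up data; it is not a description of what `scNormalize` does internally. The fence is a plain `r` block, so it is not evaluated when the vignette is knitted.

```r
# Toy genes-by-cells count matrix (hypothetical data, not taken from raw.mat)
set.seed(1)
toy_counts <- matrix(
  rpois(5 * 4, lambda = 10), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# (2) Depth normalization: rescale every cell to the median library size
lib_size <- colSums(toy_counts)
norm_counts <- sweep(toy_counts, 2, lib_size / median(lib_size), "/")

# Log-transform; log1p keeps zero counts at zero
log_counts <- log1p(norm_counts)

# (3) Centering: subtract each gene's mean so it has zero mean across cells
centered <- sweep(log_counts, 1, rowMeans(log_counts), "-")
```

After the normalization step every cell has the same total count, and after centering every gene has zero mean, which is the property PCA and SVD rely on.
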
 79 | 
 80 | ```{r}
 81 | process0 <- function(obj0){
 82 |   
 83 |   # filter cells
 84 |   obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3, is.filter = FALSE)
 85 |   
 86 |   # normalize data
 87 |   obj0 = scNormalize(obj0, method = 'robust')
 88 |   
 89 |   # find highly variable genes
 90 |   obj0 = scDisperse(obj0)
 91 |   
 92 |   # here replace highly variable genes by all the genes for integration
 93 |   obj0@vargene = rownames(sce1@rowdata)
 94 |   
 95 |   return(obj0)
 96 |   
 97 | }
 98 | 
 99 | sce1 = process0(sce1)
100 | sce2 = process0(sce2)
101 | sce3 = process0(sce3)
102 | sce4 = process0(sce4)
103 | sce5 = process0(sce5)
104 | sce6 = process0(sce6)
105 | ```
106 | 
107 | 
108 | # RPCI integration
109 | 
110 | The core principle of RPCI differs from that of existing methods: RPCI introduces an effective formula to calibrate cell similarity against a global reference and projects all cells directly into a reference RPCI space.
111 | 
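The general idea of a single reference space can be pictured with ordinary PCA: gene loadings are learned from one reference dataset and every other dataset is projected with those same loadings. The sketch below is a conceptual illustration only, using base R `prcomp` on made-up matrices (`ref`, `query` and the choice of 5 components are assumptions); it is not the RPCI formula and not what `scMultiIntegrate` computes. The fence is a plain `r` block, so it is not evaluated when the vignette is knitted.

```r
# Two toy genes-by-cells matrices sharing the same genes (hypothetical data)
set.seed(1)
n_genes <- 20
ref   <- matrix(rnorm(n_genes * 30), nrow = n_genes)  # reference dataset
query <- matrix(rnorm(n_genes * 25), nrow = n_genes)  # another dataset

# Learn gene loadings from the reference only (cells are the observations)
ref_pca  <- prcomp(t(ref), center = TRUE, scale. = FALSE)
loadings <- ref_pca$rotation[, 1:5]

# Project both datasets with the same loadings and the reference centering
ref_embed   <- scale(t(ref),   center = ref_pca$center, scale = FALSE) %*% loadings
query_embed <- scale(t(query), center = ref_pca$center, scale = FALSE) %*% loadings
```

Because both embeddings use the same loadings, cells from the two datasets share one coordinate system, whereas a separate PCA per dataset would give each dataset its own axes.
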
112 | 
113 | ```{r}
114 | set.seed(1)
115 | var.genes = rownames(sce1@assay$count)
116 | pcr0 = list(sce1, sce2, sce3, sce4, sce5, sce6)
117 | pcr0 = scMultiIntegrate(pcr0, eigens = 9, var.gene = var.genes, align = 'OLS', npc = 15)
118 | # pcr0 = scLargeIntegrate(pcr0, var.gene = var.genes, align = 'Predict', npc = 8)
119 | pcr0 =scUMAP(pcr0, npc = 9, use = 'PLS', dist = 0.001, neighbors = 15)
120 | ```
121 | 
122 | ```{r}
123 | pcr0@coldata$Group = factor(pcr0@coldata$Group0, levels = c('Group1', 'Group2', 'Group2*', 'Group3', 'Group3*'), labels = c("a", "b", "b'", "c", "c'"))
124 | pcr0@coldata$Set0 = factor(pcr0@coldata$Set, levels = c('Set1', 'Set2', 'Set3', 'Set4', 'Set5', 'Set6'), labels = c('Set1 rep.1', 'Set2 rep.1', 'Set3 rep.1', 'Set1 rep.2', 'Set2 rep.2', 'Set3 rep.2'))
125 | pcr0 = scCluster(pcr0, slot = "cell.umap", k = 4, method = "density", dc = 0.3)
126 | ```
127 | 
128 | 
129 | # UMAP plot
130 | 
131 | The dissimilarity in c/c' is larger than that in b/b' by design, and this cell-cell relationship is directly reflected in the UMAP plots of the RPCI-integrated data. The difference in c/c' can also be recovered by re-clustering the integrated data.
132 | 
133 | The simulated data contain three sets; each set includes three cell groups and two replicates, with batch effects among both sets and replicates.
134 | 
135 | 
136 | ```{r, fig.show="hold", out.width="48", fig.dim=c(7, 5)}
137 | DimPlot(pcr0, colFactor = 'Set0', size = 2)
138 | DimPlot(pcr0, colFactor = 'Group', size = 2, Colors = brewer.pal(5, "Set1"))
139 | DimPlot(pcr0, colFactor = 'Cluster', size = 2, Colors = brewer.pal(6, "Dark2"))
140 | ```
141 | 
142 | 
143 | More details and tutorials on real scRNA-seq data are available at: https://github.com/yangRISC/RISC
144 | 
145 | 
146 | # Session Information
147 | ```{r}
148 | sessionInfo()
149 | ```
150 | 


--------------------------------------------------------------------------------
/man/AddFactor.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Utilities.R
 3 | \name{AddFactor}
 4 | \alias{AddFactor}
 5 | \title{Utilities Add Factors}
 6 | \usage{
 7 | AddFactor(object, colData = NULL, rowData = NULL, value = NULL)
 8 | }
 9 | \arguments{
10 | \item{object}{RISC object: a framework dataset.}
11 | 
12 | \item{colData}{Input the names that will be added into coldata of RISC object, 
13 | it should be characters, as the col.names of coldata.}
14 | 
15 | \item{rowData}{Input the names that will be added into rowdata of RISC object,
16 | it should be characters, as the col.names of rowdata.}
17 | 
18 | \item{value}{The factor vector or data.frame to add to coldata or rowdata. 
19 | Its names (vector) or row.names (data.frame) must match the row.names of the 
20 | coldata or rowdata of the RISC object. The input: vector or data.frame.}
21 | }
22 | \description{
23 | The "AddFactor" function adds one or more factors to the coldata of the full 
24 | dataset. The row.names/names of the factor matrix/vector must be equal to 
25 | the row.names of the coldata of the RISC object.
26 | }
27 | 


--------------------------------------------------------------------------------
/man/All-Cluster-Marker.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/ClusterMarker.R
 3 | \name{AllMarker}
 4 | \alias{AllMarker}
 5 | \title{Find All Cluster Markers}
 6 | \usage{
 7 | AllMarker(
 8 |   object,
 9 |   positive = TRUE,
10 |   frac = 0.25,
11 |   log2FC = 0.5,
12 |   Padj = 0.05,
13 |   latent.factor = NULL,
14 |   min.cells = 25L,
15 |   method = "QP",
16 |   ncore = 1
17 | )
18 | }
19 | \arguments{
20 | \item{object}{RISC object: a framework dataset.}
21 | 
22 | \item{positive}{Whether to output only the cluster markers with positive log2FC.}
23 | 
24 | \item{frac}{A fraction cutoff: marker genes must be expressed in at least 
25 | this fraction of all the cells.}
26 | 
27 | \item{log2FC}{The cutoff of log2 Fold-change for differentially expressed marker 
28 | genes.}
29 | 
30 | \item{Padj}{The cutoff of the adjusted P-value.}
31 | 
32 | \item{latent.factor}{The latent factor from coldata, which holds numeric 
33 | values or factors; only one latent factor can be supplied.}
34 | 
35 | \item{min.cells}{The threshold for the cell number of valid clusters.}
36 | 
37 | \item{method}{Which method is used to identify cluster markers, three options: 'NB' 
38 | for Negative Binomial model, 'QP' for QuasiPoisson model, and 'Wilcox' for Wilcoxon 
39 | Rank Sum and Signed Rank model.}
40 | 
41 | \item{ncore}{The multiple cores for parallel calculating.}
42 | }
43 | \description{
44 | This function relies on the "scMarker" function, applies the same criteria, and 
45 | generates markers for all the clusters. Any cluster with fewer cells than 
46 | "min.cells" (25 in the usage above) is skipped, and no markers are detected 
47 | for it.
48 | }
49 | \details{
50 | Because log2 cannot handle counts with value 0, we use log1p to calculate average 
51 | values of counts and log2 to format fold-change.
52 | }
53 | 


--------------------------------------------------------------------------------
/man/Cluster-Marker.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/ClusterMarker.R
 3 | \name{scMarker}
 4 | \alias{scMarker}
 5 | \title{Find Cluster Markers}
 6 | \usage{
 7 | scMarker(
 8 |   object,
 9 |   cluster = 1,
10 |   positive = TRUE,
11 |   frac = 0.25,
12 |   log2FC = 0.5,
13 |   Padj = 0.05,
14 |   latent.factor = NULL,
15 |   method = "QP",
16 |   min.cells = 10,
17 |   ncore = 1
18 | )
19 | }
20 | \arguments{
21 | \item{object}{RISC object: a framework dataset.}
22 | 
23 | \item{cluster}{The cluster for which marker genes are detected.}
24 | 
25 | \item{positive}{Whether to output only the cluster markers with positive log2FC.}
26 | 
27 | \item{frac}{A fraction cutoff: marker genes must be expressed in at least 
28 | this fraction of all the cells.}
29 | 
30 | \item{log2FC}{The cutoff of log2 Fold-change for differentially expressed marker 
31 | genes.}
32 | 
33 | \item{Padj}{The cutoff of the adjusted P-value.}
34 | 
35 | \item{latent.factor}{The latent factor from coldata, which holds numeric 
36 | values or factors; only one latent factor can be supplied.}
37 | 
38 | \item{method}{Which method is used to identify cluster markers, three options: 'NB' 
39 | for Negative Binomial model, 'QP' for QuasiPoisson model, and 'Wilcox' for Wilcoxon 
40 | Rank Sum and Signed Rank model.}
41 | 
42 | \item{min.cells}{The minimum cells for each cluster to calculate marker genes.}
43 | 
44 | \item{ncore}{The multiple cores for parallel calculating.}
45 | }
46 | \description{
47 | This is the basic marker function in RISC. It identifies cluster markers by
48 | comparing samples in the selected cluster with samples in the remaining clusters,
49 | so one gene may be labeled as a marker in more than one cluster.
50 | Three methods are available: a Negative Binomial model, a QuasiPoisson model, 
51 | and a Wilcoxon model (see the "method" argument).
52 | }
53 | \details{
54 | Because log2 cannot handle counts with value 0, we use log1p to calculate average 
55 | values of counts and log2 to format fold-change.
56 | }
57 | \examples{
58 | # RISC object
59 | obj0 = raw.mat[[3]]
60 | obj0 = scPCA(obj0, npc = 10)
61 | obj0 = scUMAP(obj0, npc = 3)
62 | obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density')
63 | DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2)
64 | marker1 = scMarker(obj0, cluster = 1, method = 'QP', min.cells = 3)
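# Hedged sketch (base R only, not RISC's internal code): one way to read the
# note in Details. Counts are averaged on the log1p scale, so zeros are safe,
# and the difference of the averages is converted from natural log to log2.
# The vectors 'in.cluster' and 'rest' are made-up counts for a single gene.
in.cluster = c(0, 3, 5, 8)
rest = c(0, 0, 1, 2)
avg.in = mean(log1p(in.cluster))
avg.rest = mean(log1p(rest))
log2FC.sketch = (avg.in - avg.rest) / log(2)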
65 | }
66 | \references{
67 | Paternoster et al., Criminology (1997)
68 | 
69 | Berk et al., Journal of Quantitative Criminology (2008)
70 | 
71 | Liu et al., Nature Biotech. (2021)
72 | }
73 | 


--------------------------------------------------------------------------------
/man/Cluster.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Cluster.R
 3 | \name{scCluster}
 4 | \alias{scCluster}
 5 | \title{Clustering cells}
 6 | \usage{
 7 | scCluster(
 8 |   object,
 9 |   slot = "cell.pca",
10 |   neighbor = 10,
11 |   algorithm = "kd_tree",
12 |   method = "louvain",
13 |   npc = 20,
14 |   k = 10,
15 |   res = 0.5,
16 |   dc = NULL,
17 |   redo = TRUE,
18 |   random.seed = 123
19 | )
20 | }
21 | \arguments{
22 | \item{object}{RISC object: a framework dataset.}
23 | 
24 | \item{slot}{The dimension_reduction slot for cell clustering. The default is 
25 | "cell.pca" under RISC object "DimReduction" item, but the user can add a new 
26 | dimension_reduction method under DimReduction and use it.}
27 | 
28 | \item{neighbor}{The neighbor cells for "igraph" method.}
29 | 
30 | \item{algorithm}{The algorithm for knn, the default is "kd_tree", all options: 
31 | "kd_tree", "cover_tree", "CR", "brute".}
32 | 
33 | \item{method}{The method for clustering cells: "density" or "louvain". 
34 | The "density" method is based on the slot "cell.umap" or another low dimensional 
35 | space, while "louvain" is based on "cell.pca" (individual data) or "cell.pls" 
36 | (integrated data).}
37 | 
38 | \item{npc}{The number of PCA or PLS components used for cell clustering.}
39 | 
40 | \item{k}{The number of clusters searched for; used by the "density" method.}
41 | 
42 | \item{res}{The resolution of the cluster search; used by the "louvain" method.}
43 | 
44 | \item{dc}{The distance used to generate random center points, which affects the 
45 | clusters. Most users should keep the default value (NULL). Used by the 
46 | "density" method.}
47 | 
48 | \item{redo}{Whether to re-cluster the cells.}
49 | 
50 | \item{random.seed}{The random seed, the default is 123.}
51 | }
52 | \value{
53 | RISC single cell dataset, the cluster slot.
54 | }
55 | \description{
56 | In RISC, two different methods are provided to cluster cells; both are 
57 | widely used in single-cell analysis. The first method is "louvain", based on cell 
58 | eigenvectors, and the other is "density", which calculates cell clusters using 
59 | a low dimensional space.
60 | }
61 | \examples{
62 | # RISC object
63 | obj0 = raw.mat[[3]]
64 | obj0 = scPCA(obj0, npc = 10)
65 | obj0 = scUMAP(obj0, npc = 3)
66 | obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density')
67 | DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2)
68 | }
69 | \references{
70 | Blondel et al., JSTAT (2008)
71 | 
72 | Rodriguez et al., Science (2014)
73 | }
74 | 


--------------------------------------------------------------------------------
/man/DimPlot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Graph.R
 3 | \name{tSNEPlot}
 4 | \alias{tSNEPlot}
 5 | \alias{DimPlot}
 6 | \title{Dimension Reduction Plots}
 7 | \usage{
 8 | DimPlot(
 9 |   object,
10 |   slot = "cell.umap",
11 |   colFactor = NULL,
12 |   genes = NULL,
13 |   legend = TRUE,
14 |   Colors = NULL,
15 |   size = 0.5,
16 |   Alpha = 0.8,
17 |   plot.ncol = NULL,
18 |   exp.range = NULL,
19 |   exp.col = "firebrick2",
20 |   label = FALSE,
21 |   adjust.label = 0.25,
22 |   label.font = 5
23 | )
24 | }
25 | \arguments{
26 | \item{object}{RISC object: a framework dataset.}
27 | 
28 | \item{slot}{The dimension_reduction slot for drawing the plots. The default is 
29 | "cell.umap" under RISC object "DimReduction" item for UMAP plot, but the user 
30 | can add a new dimension_reduction method under DimReduction and use it.}
31 | 
32 | \item{colFactor}{Use the factor (column name) in the coldata to make a 
33 | dimension_reduction plot; only one column name can be input at a time.}
34 | 
35 | \item{genes}{Use the gene expression values (gene symbol) to make a dimension 
36 | reduction plot; more than one gene can be input at a time.}
37 | 
38 | \item{legend}{Whether a legend is shown on the dimension_reduction plot.}
39 | 
40 | \item{Colors}{The users can use their own colors (color vector). By default, 
41 | the "tSNEPlot" function assigns colors automatically.}
42 | 
43 | \item{size}{The size of dots in the dimension_reduction plot, the default 
44 | size is 0.5.}
45 | 
46 | \item{Alpha}{The transparency of individual points, the default is 0.8.}
47 | 
48 | \item{plot.ncol}{If the users input more than one gene, this parameter sets the 
49 | number of columns used to arrange the multiple dimension_reduction plots.}
50 | 
51 | \item{exp.range}{The gene expression cutoff for plot, e.g. "c(0, 1.5)" for 
52 | expression level between 0 and 1.5.}
53 | 
54 | \item{exp.col}{The gradient color for gene expression.}
55 | 
56 | \item{label}{Whether to label the clusters or cell populations in the plot.}
57 | 
58 | \item{adjust.label}{The adjustment of the label position.}
59 | 
60 | \item{label.font}{The font size for the label.}
61 | }
62 | \description{
63 | The Dimension Reduction plots are widespread in scRNA-seq data analysis. Here, 
64 | the "DimPlot" function not only can make plots for factor labels of individual 
65 | cells but also can show gene expression values of each cell.
66 | }
67 | \examples{
68 | # RISC object
69 | obj0 = raw.mat[[3]]
70 | obj0 = scPCA(obj0, npc = 10)
71 | obj0 = scUMAP(obj0, npc = 3)
72 | DimPlot(obj0, slot = "cell.umap", colFactor = 'Group', size = 2, label = TRUE)
73 | DimPlot(obj0, genes = c('Gene718', 'Gene325', 'Gene604'), size = 2)
74 | }
75 | \references{
76 | Wickham, H. (2016)
77 | 
78 | Auguie, B. (2015)
79 | }
80 | 


--------------------------------------------------------------------------------
/man/Disperse.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Preprocess.R
 3 | \name{scDisperse}
 4 | \alias{scDisperse}
 5 | \title{Processing Data.}
 6 | \usage{
 7 | scDisperse(
 8 |   object,
 9 |   method = "loess",
10 |   min.UMI = NULL,
11 |   mean.cut = NULL,
12 |   QP_bin = 100,
13 |   lspan = 0.05,
14 |   top.var = NULL,
15 |   pval = 0.5
16 | )
17 | }
18 | \arguments{
19 | \item{object}{RISC object: a framework dataset.}
20 | 
21 | \item{method}{The method used to define dispersion; "QP" and "loess" are supported, 
22 | and the default method is "loess".}
23 | 
24 | \item{min.UMI}{A cutoff for the minimum UMIs of each gene; genes with expression 
25 | below min.UMI are discarded from the highly variable genes. The value 100 is 
26 | used when the input is NULL.}
27 | 
28 | \item{mean.cut}{A cutoff for the average value of each gene; genes with expression 
29 | outside the mean.cut range are discarded from the highly variable genes. The 
30 | input is a range vector, like c(0.1, 5).}
31 | 
32 | \item{QP_bin}{The number of bins used to fit dispersion in the Quasi-Poisson 
33 | regression.}
34 | 
35 | \item{lspan}{The span parameter used to fit dispersion, controlling the degree 
36 | of smoothing in the loess model.}
37 | 
38 | \item{top.var}{The maximum number of highly variable genes, the default is NULL, 
39 | including all the highly variable genes.}
40 | 
41 | \item{pval}{The P-value cutoff for the highly variable genes, 
42 | the default is 0.5.}
43 | }
44 | \value{
45 | RISC single cell dataset, the metadata slot.
46 | }
47 | \description{
48 | After data scaling, RISC identifies highly variable genes based on a 
49 | Quasi-Poisson model, where the coefficient of variation C is calculated for each 
50 | gene (C > 0.5 as a cutoff for highly variable genes). Then, to control for 
51 | the relationship between S (standard deviation) and mean (average value), 
52 | Quasi-Poisson regression is used to further filter out the genes whose 
53 | over-dispersion C is caused by a small mean. Lastly, RISC estimates the ratio r 
54 | between the observed C and the predicted C of each gene, with a 
55 | threshold r > 1.
56 | }
57 | \examples{
58 | # RISC object
59 | obj0 = raw.mat[[5]]
60 | obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3)
61 | obj0 = scNormalize(obj0)
62 | obj0 = scDisperse(obj0)
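# Hedged sketch (base R only, not RISC's internal code) of the first step in
# the description: the coefficient of variation C = S / mean for each gene,
# with C > 0.5 as the cutoff. The matrix 'toy' is made up (genes in rows).
toy = rbind(GeneA = c(1, 1, 1, 1), GeneB = c(0, 0, 4, 8), GeneC = c(2, 3, 2, 3))
C = apply(toy, 1, sd) / rowMeans(toy)
high.var = names(C)[C > 0.5]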
63 | }
64 | \references{
65 | Liu et al., Nature Biotech. (2021)
66 | }
67 | 


--------------------------------------------------------------------------------
/man/Filter.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Preprocess.R
 3 | \name{scFilter}
 4 | \alias{scFilter}
 5 | \title{The Pre-processing Data.}
 6 | \usage{
 7 | scFilter(
 8 |   object,
 9 |   min.UMI = NULL,
10 |   max.UMI = NULL,
11 |   min.gene = NULL,
12 |   min.cell = NULL,
13 |   mitochon = 1,
14 |   gene.ratio = 0.05,
15 |   is.filter = TRUE
16 | )
17 | }
18 | \arguments{
19 | \item{object}{RISC object: a framework dataset.}
20 | 
21 | \item{min.UMI}{The minimum UMI count for valid cells. The default is based on the 
22 | distribution analysis (above the 2.5th percentile of the UMI distribution), 
23 | discarding the cells with too few UMIs. This parameter can be adjusted manually.}
24 | 
25 | \item{max.UMI}{The maximum UMI count for valid cells. The default is based on the 
26 | distribution analysis (below the 97.5th percentile of the UMI distribution), 
27 | discarding the cells with too many UMIs. This parameter can be adjusted manually.}
28 | 
29 | \item{min.gene}{The minimum number of expressed genes for valid cells. The default 
30 | is based on the distribution analysis (above the 0.5th percentile of the gene 
31 | distribution). This parameter can be adjusted manually.}
32 | 
33 | \item{min.cell}{The minimum number of cells for valid expressed genes. The default 
34 | is based on the distribution analysis (above the 0.5th percentile of the cell 
35 | distribution). This parameter can be adjusted manually.}
36 | 
37 | \item{mitochon}{The cutoff of the mitochondrial UMI proportion for valid cells.}
38 | 
39 | \item{gene.ratio}{The cutoff of the proportions of genes in UMIs.}
40 | 
41 | \item{is.filter}{Whether to filter the data.}
42 | }
43 | \value{
44 | RISC single cell dataset, the coldata and rowdata slots.
45 | }
46 | \description{
47 | After data input, RISC preliminarily filters datasets by using three criteria: 
48 | first, discard cells with too low/high raw counts/UMIs by distribution analysis. 
49 | Second, remove cells with too few expressed genes. Lastly, filter out the genes 
50 | expressed in only a few cells.
51 | }
52 | \examples{
53 | # RISC object
54 | obj0 = raw.mat[[5]]
55 | obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3)
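# Hedged sketch (base R only, not RISC's internal code): distribution-based
# UMI cutoffs read as the 2.5th and 97.5th percentiles of per-cell totals.
# 'umi.per.cell' is a made-up vector of total UMIs for 101 cells.
umi.per.cell = seq(1000, 2000, by = 10)
cut.umi = quantile(umi.per.cell, probs = c(0.025, 0.975))
keep = umi.per.cell >= cut.umi[1] & umi.per.cell <= cut.umi[2]
n.kept = sum(keep)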
56 | }
57 | \references{
58 | Liu et al., Nature Biotech. (2021)
59 | }
60 | 


--------------------------------------------------------------------------------
/man/FilterPlot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Graph.R
 3 | \name{FilterPlot}
 4 | \alias{FilterPlot}
 5 | \title{Processing Plot}
 6 | \usage{
 7 | FilterPlot(object, colFactor = NULL)
 8 | }
 9 | \arguments{
10 | \item{object}{RISC object: a framework dataset.}
11 | 
12 | \item{colFactor}{Use the factor (column name) in the coldata to make a processing 
13 | plot; only one column name can be input at a time.}
14 | }
15 | \description{
16 | The "FilterPlot" function makes the plots to show the UMIs and expressed genes 
17 | of individual cells. These plots are usually used to estimate the data before 
18 | and after pre-processing, so the users can visually select the optimal 
19 | parameters to filter data.
20 | }
21 | \examples{
22 | # RISC object
23 | obj0 = raw.mat[[3]]
24 | FilterPlot(obj0, colFactor = 'Group')
25 | }
26 | \references{
27 | Wickham, H. (2016)
28 | 
29 | Auguie, B. (2015)
30 | 
31 | Liu et al., Nature Biotech. (2021)
32 | }
33 | 


--------------------------------------------------------------------------------
/man/Heatmap.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Graph.R
 3 | \name{Heat}
 4 | \alias{Heat}
 5 | \title{Heatmap}
 6 | \usage{
 7 | Heat(
 8 |   object,
 9 |   colFactor = NULL,
10 |   genes = NULL,
11 |   cells = NULL,
12 |   gene.lab = FALSE,
13 |   gene.cluster = 0,
14 |   sample_bin = FALSE,
15 |   ann_col = NULL,
16 |   lim = NULL,
17 |   smooth = "smooth",
18 |   span = 0.75,
19 |   degree = 1,
20 |   palette = NULL,
21 |   num = 50,
22 |   con.bin = TRUE,
23 |   cell.lab.size = 10,
24 |   gene.lab.size = 5,
25 |   value_only = FALSE,
26 |   ...
27 | )
28 | }
29 | \arguments{
30 | \item{object}{RISC object: a framework dataset.}
31 | 
32 | \item{colFactor}{Use the factor (column name) in the coldata to make the heatmap; 
33 | the column must be a factor.}
34 | 
35 | \item{genes}{Use the gene expression values (gene symbol) to make the heatmap; 
36 | these need to be input by the users.}
37 | 
38 | \item{cells}{Use the subset cells of the whole coldata (cells) to make heatmap, 
39 | the default is NULL and including all the cells.}
40 | 
41 | \item{gene.lab}{Whether to label gene names in the heatmap.}
42 | 
43 | \item{gene.cluster}{The cluster numbers for gene clustering in the heatmap. 
44 | The default is 0, without clustering genes.}
45 | 
46 | \item{sample_bin}{Whether to aggregate cells in samples, the default is FALSE.}
47 | 
48 | \item{ann_col}{The annotation colors for colFactors, the input is a list.}
49 | 
50 | \item{lim}{The gene expression range shown at heat-maps.}
51 | 
52 | \item{smooth}{The smoothing method used to adjust the heatmap, the default is 
53 | "smooth" and the other choice is "loess".}
54 | 
55 | \item{span}{The loess span.}
56 | 
57 | \item{degree}{The loess degree.}
58 | 
59 | \item{palette}{The color palette used for heatmap. The default is 
60 | brewer.pal(n = 7, name = "RdYlBu").}
61 | 
62 | \item{num}{The number of cells in each bin span.}
63 | 
64 | \item{con.bin}{Whether to use a consistent bin span.}
65 | 
66 | \item{cell.lab.size}{The font size of the column (cell) labels.}
67 | 
68 | \item{gene.lab.size}{The font size of the row (gene) labels.}
69 | 
70 | \item{value_only}{Only return values.}
71 | }
72 | \description{
73 | The "Heat" function makes a heatmap to show gene expression patterns of single cells. 
74 | By default, cells are grouped into clusters; the rows of the heatmap represent 
75 | genes while the columns represent the clusters of all the cells.
76 | }
77 | \examples{
78 | # RISC object
79 | obj0 = raw.mat[[3]]
80 | gene0 = c('Gene718', 'Gene120', 'Gene313', 'Gene157', 'Gene30', 
81 |           'Gene325', 'Gene415', 'Gene566', 'Gene990', 'Gene13', 
82 |           'Gene604', 'Gene934', 'Gene231', 'Gene782', 'Gene10')
83 | Heat(obj0, colFactor = 'Group', genes = gene0, gene.lab = TRUE, gene.cluster = 3, 
84 | sample_bin = TRUE, lim = 2, gene.lab.size = 8)
85 | }
86 | \references{
87 | Kolde, R. (2015)
88 | }
89 | 


--------------------------------------------------------------------------------
/man/Import-10X-h5.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/AllClasses.R
 3 | \name{read10X_h5}
 4 | \alias{read10X_h5}
 5 | \title{Import data from 10X Genomics output (h5).}
 6 | \usage{
 7 | read10X_h5(file.path, is.filter = TRUE)
 8 | }
 9 | \arguments{
10 | \item{file.path}{The path of the filtered 10X Genomics output (h5 file).}
11 | 
12 | \item{is.filter}{Whether to remove genes that are not expressed.}
13 | }
14 | \value{
15 | RISC single cell dataset, including count, coldata, and rowdata.
16 | }
17 | \description{
18 | Import data directly from 10X Genomics output, usually using the filtered gene 
19 | matrices in an h5 file. 
20 | The user only needs to input the file path into "file.path". If the data are not 
21 | the original 10X Genomics output, the user can use the 'readsc' function.
22 | }
23 | 


--------------------------------------------------------------------------------
/man/Import-10X-mtx.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/AllClasses.R
 3 | \name{read10X_mtx}
 4 | \alias{read10X_mtx}
 5 | \title{Import data from 10X Genomics output (tsv-mtx).}
 6 | \usage{
 7 | read10X_mtx(data.path, sep = "\\t", is.filter = TRUE)
 8 | }
 9 | \arguments{
10 | \item{data.path}{Directory containing the filtered 10X Genomics output, 
11 | including three files: matrix.mtx, barcode.tsv (without colnames) and gene.tsv 
12 | (without colnames).}
13 | 
14 | \item{sep}{The field separator of the tsv files; it can be changed by the users.}
15 | 
16 | \item{is.filter}{Whether to remove genes that are not expressed.}
17 | }
18 | \value{
19 | RISC single cell dataset, including count, coldata, and rowdata.
20 | }
21 | \description{
22 | Import data directly from 10X Genomics output, usually using the filtered gene 
23 | matrices, which contain three files: matrix.mtx, barcode.tsv and gene.tsv. 
24 | The user only needs to input the directory into "data.path". If the data are not 
25 | the original 10X Genomics output, the user has to make sure that barcode.tsv and 
26 | gene.tsv have no col.names, that barcode.tsv contains at least one column for the 
27 | cell barcode, and that gene.tsv has two columns for gene Ensembl ID and Symbol.
28 | }
29 | 


--------------------------------------------------------------------------------
/man/Import-Matrix.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/AllClasses.R
 3 | \name{readsc}
 4 | \alias{readsc}
 5 | \title{Import data from matrix, cell and genes directly.}
 6 | \usage{
 7 | readsc(count, cell, gene, is.filter = TRUE)
 8 | }
 9 | \arguments{
10 | \item{count}{Matrix with raw counts/UMIs.}
11 | 
12 | \item{cell}{Data.frame with cell Barcode, whose row.names are equal to the 
13 | col.names of the matrix.}
14 | 
15 | \item{gene}{Data.frame with gene symbol, whose row.names are the same as the 
16 | row.names of the matrix.}
17 | 
18 | \item{is.filter}{Whether to remove genes that are not expressed.}
19 | }
20 | \value{
21 | RISC single cell dataset, including count, coldata, and rowdata.
22 | }
23 | \description{
24 | Import a data set from matrix, cell and gene data directly. The user needs three 
25 | inputs: a matrix of gene expression values, i.e. raw counts/UMIs (rows for 
26 | genes and columns for cells), a cell data.frame (whose row.names are equal to the 
27 | col.names of the matrix), and a gene data.frame whose row.names are the same as the 
28 | row.names of the matrix. If the row.names of the gene matrix are Ensembl IDs, the 
29 | user needs to convert them to gene symbols manually.
30 | }
31 | \examples{
32 | mat0 = as.matrix(raw.mat[[1]])
33 | coldata0 = as.data.frame(raw.mat[[2]])
34 | coldata.obj = coldata0[coldata0$Batch0 == 'Batch3',]
35 | matrix.obj = mat0[,rownames(coldata.obj)]
36 | obj0 = readsc(count = matrix.obj, cell = coldata.obj, 
37 |        gene = data.frame(Symbol = rownames(matrix.obj), 
38 |        row.names = rownames(matrix.obj)), is.filter = FALSE)
39 | }
40 | 


--------------------------------------------------------------------------------
/man/InPlot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Integrating.R
 3 | \name{InPlot}
 4 | \alias{InPlot}
 5 | \title{Integration Plot}
 6 | \usage{
 7 | InPlot(
 8 |   object = NULL,
 9 |   var.gene = NULL,
10 |   Colors = NULL,
11 |   nPC = 20,
12 |   neighbor = 30,
13 |   res = 1,
14 |   method = "louvain",
15 |   algorithm = "kd_tree",
16 |   ncore = 1,
17 |   minPC = 11,
18 |   Std.cut = 0.95,
19 |   bin = 5
20 | )
21 | }
22 | \arguments{
23 | \item{object}{A list of RISC objects.}
24 | 
25 | \item{var.gene}{The highly variable genes.}
26 | 
27 | \item{Colors}{The colors for labeling different data sets.}
28 | 
29 | \item{nPC}{The number of PCs to be calculated.}
30 | 
31 | \item{neighbor}{The number of nearest neighbors.}
32 | 
33 | \item{res}{The resolution of cluster searched for, works in "louvain" method.}
34 | 
35 | \item{method}{The method of cell clustering for individual datasets.}
36 | 
37 | \item{algorithm}{The algorithm for knn, the default is "kd_tree", all options: 
38 | "kd_tree", "cover_tree", "CR", "brute".}
39 | 
40 | \item{ncore}{The number of cores for testing.}
41 | 
42 | \item{minPC}{The minimal number of PCs for detecting cell clustering.}
43 | 
44 | \item{Std.cut}{The cutoff of standard deviation of the PCs.}
45 | 
46 | \item{bin}{The bin number for calculating cell clustering.}
47 | }
48 | \description{
49 | The "InPlot" function makes the plot to show how the PCs explain the variance 
50 | for data integration. This plot helps the users to select the optimal reference 
51 | and the PCs to perform data integration.
52 | }
53 | \references{
54 | Liu et al., Nature Biotech. (2021)
55 | }
56 | 


--------------------------------------------------------------------------------
/man/Integration-Algorithm-SIMPLS.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Integrating.R
 3 | \name{SIMPLS}
 4 | \alias{SIMPLS}
 5 | \title{Integration Algorithm SIMPLS}
 6 | \usage{
 7 | SIMPLS(X, Y, npcs = 10, seed = 123)
 8 | }
 9 | \arguments{
10 | \item{X}{The reference matrix, row for genes and column for cells.}
11 | 
12 | \item{Y}{The target matrix, row for genes and column for cells.}
13 | 
14 | \item{npcs}{The number of the PCs used for data integration.}
15 | 
16 | \item{seed}{The random seed to keep consistent result.}
17 | }
18 | \description{
19 | The partial least squares (PLS) method with the SIMPLS algorithm is an extension of 
20 | the multiple linear regression model and is considered a bilinear factor model. 
21 | Instead of embedding the reference and target matrices into a hyperplane 
22 | of maximum variance, PLS uses a linear regression to project the 
23 | reference and target matrices into a new space. The SIMPLS algorithm provides 
24 | the regularization procedure for PLS. The matrices need to be centered before 
25 | SIMPLS integration.
26 | }
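\examples{
# Hedged sketch (base R only; it does not call SIMPLS): centering two made-up
# matrices before integration. Centering each gene (row) on its own mean is an
# assumption here, it is not confirmed by this page.
X = matrix(c(1, 2, 3, 4, 5, 9), nrow = 2, byrow = TRUE)
Y = matrix(c(2, 4, 6, 1, 1, 4), nrow = 2, byrow = TRUE)
X.centered = sweep(X, 1, rowMeans(X), "-")
Y.centered = sweep(Y, 1, rowMeans(Y), "-")
}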
27 | \references{
28 | De-Jong et al. (1993)
29 | }
30 | 


--------------------------------------------------------------------------------
/man/MSC.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Integrating.R
 3 | \name{MSC}
 4 | \alias{MSC}
 5 | \title{Alignment of gene expression values}
 6 | \usage{
 7 | MSC(X, Y)
 8 | }
 9 | \arguments{
10 | \item{X}{The reference matrix, row for genes and column for cells.}
11 | 
12 | \item{Y}{The target matrix, row for genes and column for cells.}
13 | }
14 | \description{
15 | This function is not used in data integration but for adjusting the results; 
16 | usually the users do not need it.
17 | }
18 | \references{
19 | Mevik et al., JSS (2007)
20 | }
21 | 


--------------------------------------------------------------------------------
/man/Multiple-Integrating.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Integrating.R
 3 | \name{scMultiIntegrate}
 4 | \alias{scMultiIntegrate}
 5 | \title{Integrating Multiple Datasets}
 6 | \usage{
 7 | scMultiIntegrate(
 8 |   objects,
 9 |   eigens = 10,
10 |   add.Id = NULL,
11 |   var.gene = NULL,
12 |   align = "OLS",
13 |   npc = 50,
14 |   adjust = TRUE,
15 |   ncore = 1,
16 |   seed = 123
17 | )
18 | }
19 | \arguments{
20 | \item{objects}{The list of multiple RISC objects: 
21 | list(object1, object2, object3, ...). The first set is the reference to generate 
22 | gene-eigenvectors.}
23 | 
24 | \item{eigens}{The number of eigenvectors used for data integration.}
25 | 
26 | \item{add.Id}{Add a vector of Id to label different datasets, a character vector.}
27 | 
28 | \item{var.gene}{Define the variable genes manually. Here input a vector of gene 
29 | names as variable genes}
30 | 
31 | \item{align}{The method for alignment of gene expression values: "Optimal" for 
32 | alignment by experience, "Predict" for alignment by RPCI prediction, and "OLS" 
33 | for alignment by the ordinary linear regression.}
34 | 
35 | \item{npc}{The number of PCs returned by the "scMultiIntegrate" function; 
36 | they are usually used for the subsequent analyses, like cell embedding and 
37 | cell clustering.}
38 | 
39 | \item{adjust}{Whether to adjust the number of eigenvectors.}
40 | 
41 | \item{ncore}{The number of cores for data integration.}
42 | 
43 | \item{seed}{The random seed to keep consistent result.}
44 | }
45 | \description{
46 | The "scMultiIntegrate" function can be used for data integration of multiple 
47 | datasets. It is based on our new approach RPCI (reference principal 
48 | components integration), which decomposes all the target datasets based on the 
49 | reference data. The output of this function is a RISC object, including the 
50 | integrated eigenvectors and aligned gene expression values.
51 | }
52 | \examples{
53 | obj1 = raw.mat[[3]]
54 | obj2 = raw.mat[[4]]
55 | obj0 = list(obj1, obj2)
56 | var0 = intersect(obj1@vargene, obj2@vargene)
57 | obj0 = scMultiIntegrate(obj0, eigens = 8, var.gene = var0, align = 'Predict', 
58 |                         npc = 20, add.Id = c("Set1", "Set2"), ncore = 2)
59 | obj0 = scUMAP(obj0, npc = 8, use = "PLS", dist = 0.001, neighbors = 15)
60 | DimPlot(obj0, slot = "cell.umap", colFactor = "Set", size = 2)
61 | DimPlot(obj0, slot = "cell.umap", colFactor = "Group", size = 2, label = TRUE)
62 | }
63 | \references{
64 | Liu et al., Nature Biotech. (2021)
65 | }
66 | 


--------------------------------------------------------------------------------
/man/Normalize.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Preprocess.R
 3 | \name{scNormalize}
 4 | \alias{scNormalize}
 5 | \title{The Processing Data.}
 6 | \usage{
 7 | scNormalize(
 8 |   object,
 9 |   method = "robust",
10 |   libsize = 1e+06,
11 |   remove.mito = FALSE,
12 |   norm.dis = TRUE,
13 |   large = TRUE,
14 |   ncore = 1
15 | )
16 | }
17 | \arguments{
18 | \item{object}{RISC object: a framework dataset.}
19 | 
20 | \item{method}{A method for scdataset normalization, two options: "cosine" and 
21 | "robust".}
22 | 
23 | \item{libsize}{The standard sum of the UMI in each cell.}
24 | 
25 | \item{remove.mito}{Remove mitochondrial genes from library size.}
26 | 
27 | \item{norm.dis}{Normalize the distribution of count data.}
28 | 
29 | \item{large}{Whether the data set is large (ncell > 50,000).}
30 | 
31 | \item{ncore}{The number of cores for parallel calculation.}
32 | }
33 | \value{
34 | RISC single cell dataset, the assay and rowdata slots.
35 | }
36 | \description{
37 | After data filtration, RISC normalizes the raw counts/UMIs by using size 
38 | factors, which are calculated from the raw counts/UMIs of each cell, and 
39 | removes the sequencing depth batch effect. The output is transformed by log1p. The gene 
40 | expression values of the RISC object for the subsequent analyses are based on the 
41 | normalized data. Here two kinds of normalization can be employed: one is based on 
42 | the least absolute deviations, while the other is based on the least squares.
43 | }
44 | \examples{
45 | # RISC object
46 | obj0 = raw.mat[[5]]
47 | obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3)
48 | obj0 = scNormalize(obj0)
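# Hedged sketch (base R only, not RISC's internal code): size-factor scaling
# followed by log1p, as outlined in the description. 'toy' is a made-up count
# matrix (genes in rows, cells in columns); libsize mirrors the default 1e+06.
toy = matrix(c(1, 3, 0, 6, 2, 2), nrow = 3)
libsize = 1e+06
size.factor = colSums(toy) / libsize
norm = log1p(sweep(toy, 2, size.factor, "/"))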
49 | }
50 | \references{
51 | Boscovich, R.J. (1757)
52 | 
53 | Thompson, W.J., Computers in Physics (1992)
54 | }
55 | 


--------------------------------------------------------------------------------
/man/PCA.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Reduce_Dimension.R
 3 | \name{scPCA}
 4 | \alias{scPCA}
 5 | \title{Dimension Reduction.}
 6 | \usage{
 7 | scPCA(object, npc = 20)
 8 | }
 9 | \arguments{
10 | \item{object}{RISC object: a framework dataset.}
11 | 
12 | \item{npc}{The number of PCs to be generated based on highly variable genes 
13 | (usually < 1,500); the default is the first 20 PCs.}
14 | }
15 | \value{
16 | RISC single cell dataset, the DimReduction slot.
17 | }
18 | \description{
19 | Based on the highly variable genes of the datasets, RISC calculates the 
20 | principal components (PCs) of the cells using the prcomp function. The major PCs, 
21 | which explain most gene expression variance, are used for dimension reduction.
22 | }
23 | \examples{
24 | # RISC object
25 | obj0 = raw.mat[[3]]
26 | obj0 = scPCA(obj0, npc = 10)
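# Hedged sketch (base R only, not RISC's internal code): PCs of cells from
# prcomp on a made-up matrix with cells in rows and genes in columns, plus the
# share of variance explained by each PC.
set.seed(123)
toy = matrix(rnorm(200), nrow = 20, ncol = 10)
pca = prcomp(toy, center = TRUE, scale. = FALSE)
cell.pc = pca$x[, 1:3]
var.explained = pca$sdev^2 / sum(pca$sdev^2)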
27 | }
28 | \references{
29 | Jolliffe et al. (2016)
30 | 
31 | Alter et al., PNAS (2000)
32 | 
33 | Gonzalez et al., JSS (2008)
34 | 
35 | Mevik et al., JSS (2007)
36 | }
37 | 


--------------------------------------------------------------------------------
/man/PCPlot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Graph.R
 3 | \name{PCPlot}
 4 | \alias{PCPlot}
 5 | \title{Processing Plot}
 6 | \usage{
 7 | PCPlot(object)
 8 | }
 9 | \arguments{
10 | \item{object}{RISC object: a framework dataset.}
11 | }
12 | \description{
13 | The "PCPlot" function makes the plot to show how the PCs explain the variance. 
14 | This plot helps the users to select the optimal PCs to perform dimension 
15 | reduction and data integration.
16 | }
17 | \references{
18 | Wickham, H. (2016)
19 | 
20 | Auguie, B. (2015)
21 | 
22 | Liu et al., Nature Biotech. (2021)
23 | }
24 | 


--------------------------------------------------------------------------------
/man/PLS-Integrating.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Integrating.R
 3 | \name{scPLS}
 4 | \alias{scPLS}
 5 | \title{Integrating Multiple Large Datasets}
 6 | \usage{
 7 | scPLS(
 8 |   objects,
 9 |   eigens = 10,
10 |   add.Id = NULL,
11 |   var.gene = NULL,
12 |   npc = 100,
13 |   adjust = TRUE,
14 |   ncore = 1,
15 |   seed = 123
16 | )
17 | }
18 | \arguments{
19 | \item{objects}{The list of multiple RISC objects: 
20 | list(object1, object2, object3, ...). The first set is the reference to generate 
21 | gene-eigenvectors.}
22 | 
23 | \item{eigens}{The number of eigenvectors used for data integration.}
24 | 
25 | \item{add.Id}{Add a vector of Id to label different datasets, a character vector.}
26 | 
27 | \item{var.gene}{Define the variable genes manually. Here input a vector of gene 
28 | names as variable genes}
29 | 
30 | \item{npc}{The number of the PCs returns from "scMultiIntegrate" function, 
31 | they are usually used for the subsequent analyses, like cell embedding and 
32 | cell clustering.}
33 | 
34 | \item{adjust}{Whether adjust the number of eigenvectors.}
35 | 
36 | \item{ncore}{The number of multiple cores for data integration.}
37 | 
38 | \item{seed}{The random seed to keep consistent result.}
39 | }
40 | \description{
41 | The "scPLS" function integrates multiple datasets. It is based on our 
42 | algorithm, reference principal component integration (RPCI). RPCI 
43 | decomposes all the target datasets based on the reference. The output 
44 | of this function can be used for low-dimensional 
45 | visualization.
46 | }
47 | \examples{
48 | obj1 = raw.mat[[3]]
49 | obj2 = raw.mat[[4]]
50 | obj0 = list(obj1, obj2)
51 | var0 = intersect(obj1@vargene, obj2@vargene)
52 | PLS0 = scPLS(obj0, var.gene = var0, npc = 20, add.Id = c("Set1", "Set2"), ncore = 1)
53 | }
54 | \references{
55 | Liu et al., Nature Biotech. (2021)
56 | }
57 | 


--------------------------------------------------------------------------------
/man/Pairwise-DEGs.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/ClusterMarker.R
 3 | \name{scDEG}
 4 | \alias{scDEG}
 5 | \title{Find Differentially Expressed Genes between Clusters}
 6 | \usage{
 7 | scDEG(
 8 |   object,
 9 |   cell.ctrl = NULL,
10 |   cell.sam = NULL,
11 |   frac = 0.1,
12 |   log2FC = 0.5,
13 |   Padj = 0.01,
14 |   latent.factor = NULL,
15 |   method = "NB",
16 |   min.cells = 10,
17 |   ncore = 1
18 | )
19 | }
20 | \arguments{
21 | \item{object}{RISC object: a framework dataset.}
22 | 
23 | \item{cell.ctrl}{Select the cells as the reference cells for detecting DEGs.}
24 | 
25 | \item{cell.sam}{Select the cells as the sample cells for detecting DEGs.}
26 | 
27 | \item{frac}{A fraction cutoff: marker genes must be expressed in at least 
28 | this fraction of the cluster cells.}
29 | 
30 | \item{log2FC}{The cutoff of log2 fold-change for differentially expressed marker 
31 | genes.}
32 | 
33 | \item{Padj}{The cutoff of the adjusted P-value. If Padj is NULL, a p-value 
34 | < 0.05 is used as the threshold. Set Padj to 1 to apply no cutoff.}
35 | 
36 | \item{latent.factor}{The latent factor from coldata, holding numeric 
37 | values or factors; only one latent factor can be supplied.}
38 | 
39 | \item{method}{Which method is used to identify cluster markers, three options: 
40 | 'NB' for the Negative Binomial model, 'QP' for the QuasiPoisson model, and 'wil' 
41 | for the Wilcoxon Rank-Sum test.}
42 | 
43 | \item{min.cells}{The minimum number of cells for each cluster to calculate marker genes.}
44 | 
45 | \item{ncore}{The number of cores for parallel calculation.}
46 | }
47 | \description{
48 | This is a basic function in RISC. It identifies the differentially expressed 
49 | genes (DEGs) by comparing samples between the selected clusters. The criteria 
50 | used for the cluster markers also apply to DEGs.
51 | }
52 | \details{
53 | RISC provides two model-based algorithms to detect DEGs. The "Quasi-Poisson" 
54 | model has an advantage in identifying DEGs from clusters with a small 
55 | number of cells, while the "Negative Binomial" model is the default 
56 | (method = "NB"). A Wilcoxon rank-sum test (method = "wil") is also available.
57 | 
58 | Because log2 is undefined for counts of 0, we use log1p to calculate the average 
59 | values of counts and log2 to express the fold-change.
60 | }
61 | \examples{
62 | # RISC object
63 | obj0 = raw.mat[[4]]
64 | obj0 = scPCA(obj0, npc = 10)
65 | obj0 = scUMAP(obj0, npc = 3)
66 | obj0 = scCluster(obj0, slot = "cell.umap", k = 3, method = 'density')
67 | DimPlot(obj0, slot = "cell.umap", colFactor = 'Cluster', size = 2)
68 | cell.ctrl = rownames(obj0@coldata)[obj0@coldata$Cluster == 1]
69 | cell.sam = rownames(obj0@coldata)[obj0@coldata$Cluster == 3]
70 | DEG0 = scDEG(obj0, cell.ctrl = cell.ctrl, cell.sam = cell.sam, 
71 |              min.cells = 3, method = 'QP')
72 | }
73 | \references{
74 | Paternoster et al., Criminology (1997)
75 | 
76 | Berk et al., Journal of Quantitative Criminology (2008)
77 | 
78 | Liu et al., Nature Biotech. (2021)
79 | }
80 | 


--------------------------------------------------------------------------------
/man/Scale.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Preprocess.R
 3 | \name{scScale}
 4 | \alias{scScale}
 5 | \title{Processing Data.}
 6 | \usage{
 7 | scScale(object, method = "scale", center = TRUE, scale = FALSE)
 8 | }
 9 | \arguments{
10 | \item{object}{RISC object: a framework dataset.}
11 | 
12 | \item{method}{The method used to scale the dataset; the default is root-mean-square 
13 | scaling.}
14 | 
15 | \item{center}{Whether to center the matrix.}
16 | 
17 | \item{scale}{Whether to use the standard deviation to scale the matrix.}
18 | }
19 | \value{
20 | The scaled matrix.
21 | }
22 | \description{
23 | After data normalization, RISC applies root-mean-square scaling to the 
24 | dataset and generates scaled counts, which balance expression levels in each cell 
25 | with an empirical mean of 0. The aim is to preserve only the biological signal 
26 | in the scaled counts. RISC uses scaled counts for dimension reduction 
27 | and data integration.
28 | }
29 | \examples{
30 | # RISC object
31 | obj0 = raw.mat[[3]]
32 | scale.mat = scScale(obj0)
33 | }
34 | \references{
35 | Juszczak et al., CiteSeer (2002)
36 | 
37 | Jiawei et al. (2011)
38 | }
39 | 


--------------------------------------------------------------------------------
/man/SingleCellData.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/AllClasses.R
 3 | \name{SingleCellData}
 4 | \alias{SingleCellData}
 5 | \title{Import single cell data}
 6 | \usage{
 7 | SingleCellData(assay, coldata, rowdata)
 8 | }
 9 | \arguments{
10 | \item{assay}{The list of gene counts.}
11 | 
12 | \item{coldata}{The data.frame with cell information.}
13 | 
14 | \item{rowdata}{The data.frame with gene information.}
15 | }
16 | \value{
17 | SingleCellData
18 | }
19 | \description{
20 | Single cell RNA-seq (scRNA-seq) data can be imported in three different ways.
21 | First, data can be imported directly from 10X Genomics output by using 
22 | "read10Xgenomics"; the user only needs to provide the folder path. Second, 
23 | data can be read from HT-seq output with "readHTSeqdata"; the user has to input 
24 | the folder path. Last, the matrix, cells, and genes can be supplied manually 
25 | using "readscdata".
26 | }
27 | 


--------------------------------------------------------------------------------
/man/Subset.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Utilities.R
 3 | \name{SubSet}
 4 | \alias{SubSet}
 5 | \title{Utilities Subset data}
 6 | \usage{
 7 | SubSet(object, cells = NULL, genes = NULL)
 8 | }
 9 | \arguments{
10 | \item{object}{RISC object: a framework dataset.}
11 | 
12 | \item{cells}{The cells are directly used for collecting a data subset.}
13 | 
14 | \item{genes}{The genes are directly used for collecting a data subset.}
15 | }
16 | \description{
17 | The "SubSet" function extracts a data subset from the full dataset. It 
18 | collects not only the subset of coldata and rowdata but also the 
19 | raw counts/UMIs. After "SubSet", the RISC object needs to be 
20 | normalized and scaled again.
21 | }
22 | \examples{
23 | # RISC object
24 | obj0 = raw.mat[[5]]
25 | obj0
26 | cell1 = rownames(obj0@coldata)[1:15]
27 | obj1 = SubSet(obj0, cells = cell1)
28 | obj1
29 | }
30 | 


--------------------------------------------------------------------------------
/man/UMAP.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Reduce_Dimension.R
 3 | \name{scUMAP}
 4 | \alias{scUMAP}
 5 | \title{Dimension Reduction.}
 6 | \usage{
 7 | scUMAP(
 8 |   object,
 9 |   npc = 20,
10 |   embedding = 2,
11 |   use = "PCA",
12 |   neighbors = 15,
13 |   dist = 0.1,
14 |   seed = 123,
15 |   ...
16 | )
17 | }
18 | \arguments{
19 | \item{object}{RISC object: a framework dataset.}
20 | 
21 | \item{npc}{The number of PCs (or PLS components) used for UMAP. The default is 20, 
22 | but users should adjust it. Use PCA for an individual dataset and PLS 
23 | for the integrated data.}
24 | 
25 | \item{embedding}{The number of components in the UMAP output.}
26 | 
27 | \item{use}{Which components are used for UMAP: PCA or PLS.}
28 | 
29 | \item{neighbors}{The n_neighbors parameter of UMAP.}
30 | 
31 | \item{dist}{The min_dist parameter of UMAP.}
32 | 
33 | \item{seed}{The random seed to keep the UMAP result consistent.}
34 | }
35 | \value{
36 | RISC single cell dataset, the DimReduction slot.
37 | }
38 | \description{
39 | UMAP is calculated based on the eigenvectors of the single cell dataset, and the 
40 | user can select the eigenvectors manually. Of note, the selected eigenvectors 
41 | directly affect the UMAP values. 
42 | For the integrated data (the result of the "scMultiIntegrate" function), RISC uses
43 | the PCR output "PLS" to calculate the UMAP; therefore, the user has to set 
44 | use = "PLS" instead of the default "PCA".
45 | }
46 | \examples{
47 | # RISC object
48 | obj0 = raw.mat[[3]]
49 | obj0 = scPCA(obj0, npc = 10)
50 | obj0 = scUMAP(obj0, npc = 3)
51 | DimPlot(obj0, slot = "cell.umap", colFactor = 'Group', size = 2)
52 | }
53 | \references{
54 | Becht et al., Nature Biotech. (2018)
55 | }
56 | 


--------------------------------------------------------------------------------
/man/UMAPlot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Graph.R
 3 | \name{UMAPlot}
 4 | \alias{UMAPlot}
 5 | \title{UMAP Plots}
 6 | \usage{
 7 | UMAPlot(
 8 |   object,
 9 |   colFactor = NULL,
10 |   genes = NULL,
11 |   legend = TRUE,
12 |   Colors = NULL,
13 |   size = 0.5,
14 |   Alpha = 0.8,
15 |   plot.ncol = NULL,
16 |   raw.count = FALSE,
17 |   exp.range = NULL,
18 |   exp.col = "firebrick2"
19 | )
20 | }
21 | \arguments{
22 | \item{object}{RISC object: a framework dataset.}
23 | 
24 | \item{colFactor}{Use a factor (column name) in the coldata to make a UMAP plot; 
25 | only one column name can be supplied at a time.}
26 | 
27 | \item{genes}{Use gene expression values (gene symbols) to make UMAP plots; 
28 | more than one gene can be supplied at a time.}
29 | 
30 | \item{legend}{Whether to show a legend on the UMAP plot.}
31 | 
32 | \item{Colors}{The users can supply their own colors (a color vector). By default, 
33 | the "UMAPlot" function assigns colors automatically.}
34 | 
35 | \item{size}{The size of the dots in UMAP plots; the default size is 0.5.}
36 | 
37 | \item{Alpha}{The transparency of individual points; the default is 0.8.}
38 | 
39 | \item{plot.ncol}{If the users input more than one gene, the arrangement of 
40 | the multiple UMAP plots depends on this parameter.}
41 | 
42 | \item{raw.count}{Whether to use raw counts instead of normalized counts.}
43 | 
44 | \item{exp.range}{The gene expression cutoff for the plot, e.g. "c(0, 1.5)" for 
45 | expression levels between 0 and 1.5.}
46 | 
47 | \item{exp.col}{The gradient color for gene expression.}
48 | }
49 | \description{
50 | UMAP plots are widely used in scRNA-seq data analysis. The "UMAPlot" 
51 | function can plot the factor labels of individual cells and can also 
52 | show the gene expression values of each cell.
53 | }
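% Illustrative example, not part of the original documentation. A minimal sketch
% that mirrors the scUMAP example and assumes the bundled raw.mat data with a
% 'Group' column in coldata, as used by the other examples.
\examples{
# RISC object
obj0 = raw.mat[[3]]
obj0 = scPCA(obj0, npc = 10)
obj0 = scUMAP(obj0, npc = 3)
# Color the cells by the 'Group' column of coldata
UMAPlot(obj0, colFactor = 'Group', size = 2)
}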
54 | \references{
55 | Wickham, H. (2016)
56 | 
57 | Auguie, B. (2015)
58 | }
59 | 


--------------------------------------------------------------------------------
/man/Violin-Plot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Graph.R
 3 | \name{ViolinPlot}
 4 | \alias{ViolinPlot}
 5 | \title{Violin Plot}
 6 | \usage{
 7 | ViolinPlot(
 8 |   object,
 9 |   colFactor = NULL,
10 |   genes = NULL,
11 |   legend = TRUE,
12 |   trim = TRUE,
13 |   Colors = NULL,
14 |   Alpha = 0.8,
15 |   dots = TRUE,
16 |   wid = "area"
17 | )
18 | }
19 | \arguments{
20 | \item{object}{RISC object: a framework dataset.}
21 | 
22 | \item{colFactor}{Use a factor (column name) in the coldata to make the violin plot; 
23 | only one column name can be supplied at a time.}
24 | 
25 | \item{genes}{The gene expression pattern: gene symbol}
26 | 
27 | \item{legend}{Whether to show a legend on the violin plot.}
28 | 
29 | \item{trim}{Whether to trim the violin plot.}
30 | 
31 | \item{Colors}{The users can supply their own colors (a color vector). By default, 
32 | the "ViolinPlot" function assigns colors automatically.}
33 | 
34 | \item{Alpha}{The transparency of individual points; the default is 0.8.}
35 | 
36 | \item{dots}{Adding jitter dots to the violin plot. The default is TRUE.}
37 | 
38 | \item{wid}{The scale format, options: "area", "width", "count". 
39 | The default: "area"}
40 | }
41 | \description{
42 | The "ViolinPlot" function makes plots to show gene expression patterns of the clusters 
43 | or other factors. By default, cells are grouped into the clusters, but the users can 
44 | supply a factor (column name) of coldata instead.
45 | }
46 | \examples{
47 | # RISC object
48 | obj0 = raw.mat[[3]]
49 | ViolinPlot(obj0, colFactor = 'Group', genes = 'Gene718')
50 | }
51 | \references{
52 | Wickham, H. (2016)
53 | 
54 | Auguie, B. (2015)
55 | }
56 | 


--------------------------------------------------------------------------------
/man/raw.mat.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/AllClasses.R
 3 | \docType{data}
 4 | \name{raw.mat}
 5 | \alias{raw.mat}
 6 | \title{Example data}
 7 | \format{
 8 | A list including a simulated cell-gene matrix, columns for cells and 
 9 | rows for genes, a cell group and a batch information.
10 | }
11 | \usage{
12 | data(raw.mat)
13 | }
14 | \description{
15 | Example data
16 | }
17 | \keyword{datasets}
18 | 


--------------------------------------------------------------------------------
/man/setClass.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/AllGenerics.R
 3 | \docType{class}
 4 | \name{RISCdata-class}
 5 | \alias{RISCdata-class}
 6 | \title{RISC data}
 7 | \value{
 8 | RISC object: an S4 framework dataset
 9 | }
10 | \description{
11 | The RISC object contains all the basic information used in single cell RNA-seq 
12 | analysis, including raw counts/UMIs, normalized gene values, dimension reduction, 
13 | cell clustering, and so on. The framework of RISC object is an S4 dataset, 
14 | consisting of assay, coldata, rowdata, metadata, vargene, cluster, and 
15 | DimReduction.
16 | }
17 | \section{Slots}{
18 | 
19 | \describe{
20 | \item{\code{assay}}{The list of gene counts/UMIs: raw and normalized counts}
21 | 
22 | \item{\code{coldata}}{The data.frame with cell information, such as cell types, stages, 
23 | and other factors.}
24 | 
25 | \item{\code{rowdata}}{The data.frame with gene information, such as coding or non-coding 
26 | genes.}
27 | 
28 | \item{\code{metadata}}{The data.frame with meta value.}
29 | 
30 | \item{\code{vargene}}{The highly variable genes. 
31 | These genes are utilized in dimension reduction.}
32 | 
33 | \item{\code{cluster}}{The cell clustering information: include three algorithms for 
34 | clustering.}
35 | 
36 | \item{\code{DimReduction}}{The values of dimension reduction.}
37 | }}
38 | 
39 | 


--------------------------------------------------------------------------------
/man/setMethod.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/AllGenerics.R
 3 | \docType{methods}
 4 | \name{RISCdata}
 5 | \alias{RISCdata}
 6 | \alias{.RISC_show}
 7 | \alias{RISC}
 8 | \alias{object}
 9 | \title{RISC data}
10 | \usage{
11 | .RISC_show(object)
12 | }
13 | \arguments{
14 | \item{object}{RISC object: an S4 framework dataset}
15 | }
16 | \description{
17 | This will show the full information of RISC object, including the number of cells, 
18 | the number of genes, any biological or statistical information of cells or genes.
19 | }
20 | 


--------------------------------------------------------------------------------
/man/tSNE.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/Reduce_Dimension.R
 3 | \name{scTSNE}
 4 | \alias{scTSNE}
 5 | \title{Dimension Reduction.}
 6 | \usage{
 7 | scTSNE(
 8 |   object,
 9 |   npc = 20,
10 |   embedding = 2,
11 |   use = "PCA",
12 |   perplexity = 30,
13 |   seed = 123,
14 |   ...
15 | )
16 | }
17 | \arguments{
18 | \item{object}{RISC object: a framework dataset.}
19 | 
20 | \item{npc}{The number of PCs (or PLS components) used for t-SNE. The default is 20, 
21 | but users should adjust it. Use PCA for an individual dataset and 
22 | PLS for the integrated data.}
23 | 
24 | \item{embedding}{The number of components in the t-SNE output.}
25 | 
26 | \item{use}{Which components are used for t-SNE: PCA or PLS.}
27 | 
28 | \item{perplexity}{Perplexity parameter: if the number of cells is small, 
29 | decrease this parameter, otherwise t-SNE cannot be calculated.}
30 | 
31 | \item{seed}{The random seed to keep the t-SNE result consistent.}
32 | }
33 | \value{
34 | RISC single cell dataset, the DimReduction slot.
35 | }
36 | \description{
37 | t-SNE is calculated based on the eigenvectors of the single cell dataset, and 
38 | the user can select the eigenvectors manually. Of note, the selected eigenvectors 
39 | directly affect the t-SNE values. 
40 | For the integrated data (the result of the "scMultiIntegrate" function), RISC uses
41 | the PCR output "PLS" to calculate the t-SNE; therefore, the user has to set 
42 | use = "PLS" instead of the default "PCA".
43 | }
44 | \examples{
45 | # RISC object
46 | obj0 = raw.mat[[3]]
47 | obj0 = scPCA(obj0, npc = 10)
48 | obj0 = scTSNE(obj0, npc = 4, perplexity = 10)
49 | DimPlot(obj0, slot = "cell.tsne", colFactor = 'Group', size = 2)
50 | }
51 | \references{
52 | Laurens van der Maaten, JMLR (2014)
53 | }
54 | 


--------------------------------------------------------------------------------
/src/Makevars:
--------------------------------------------------------------------------------
 1 | ## OpenMP support in Armadillo prefers C++11 support. However, for wider
 2 | ## availability of the package we do not yet enforce this here.  It is however
 3 | ## recommended for client packages to set it.
 4 | ##
 5 | ## And with R 3.4.0, and RcppArmadillo 0.7.960.*, we turn C++11 on as OpenMP
 6 | ## support within Armadillo prefers / requires it
 7 | 
 8 | PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) 
 9 | PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
10 | 


--------------------------------------------------------------------------------
/src/Makevars.win:
--------------------------------------------------------------------------------
 1 | ## OpenMP support in Armadillo prefers C++11 support. However, for wider
 2 | ## availability of the package we do not yet enforce this here.  It is however
 3 | ## recommended for client packages to set it.
 4 | ##
 5 | ## And with R 3.4.0, and RcppArmadillo 0.7.960.*, we turn C++11 on as OpenMP
 6 | ## support within Armadillo prefers / requires it
 7 | 
 8 | PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) 
 9 | PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
10 | 


--------------------------------------------------------------------------------
/src/RcppArmadilloProcess.cpp:
--------------------------------------------------------------------------------
  1 | // Most of the C++ functions here are based on Dirk Eddelbuettel's 
  2 | // 'RcppArmadillo' examples. The RISC package adapts them to support 
  3 | // calculations on sparse matrices.
  4 | 
  5 | #include <RcppArmadillo.h>
  6 | // [[Rcpp::depends(RcppArmadillo)]]
  7 | 
  8 | using namespace Rcpp;
  9 | using namespace arma;
 10 | 
 11 | 
 12 | // [[Rcpp::export]]
 13 | arma::sp_mat sqrt_sp(arma::sp_mat X) {
 14 |   return sqrt(X);
 15 | }
 16 | 
 17 | // [[Rcpp::export]]
 18 | arma::mat cent_sp_d(arma::sp_mat X) {
 19 |   arma::mat Y = conv_to<arma::mat>::from(X);
 20 |   int c = Y.n_cols;
 21 |   rowvec meanx(c);
 22 |   meanx = mean(Y, 0);
 23 |   for(int j=0; j<c; j++){
 24 |     Y.col(j) = Y.col(j) - meanx(j);
 25 |   }
 26 |   return Y;
 27 | }
 28 | 
 29 | // [[Rcpp::export]]
 30 | arma::sp_mat multiply_sp_sp(arma::sp_mat X, arma::sp_mat Y) {
 31 |   return X * Y;
 32 | }
 33 | 
 34 | // [[Rcpp::export]]
 35 | arma::mat multiply_sp_d(arma::sp_mat X, arma::sp_mat Y) {
 36 |   arma::mat result(X * Y);
 37 |   return result;
 38 | }
 39 | 
 40 | // [[Rcpp::export]]
 41 | arma::mat multiply_d_d(arma::mat X, arma::mat Y) {
 42 |   return X * Y;
 43 | }
 44 | 
 45 | // [[Rcpp::export]]
 46 | arma::sp_mat multiply_sp_d_sp(arma::sp_mat X, arma::mat Y) {
 47 |   arma::sp_mat result(X * Y);
 48 |   return result;
 49 | }
 50 | 
 51 | // [[Rcpp::export]]
 52 | arma::vec multiply_sp_d_v(arma::sp_mat X, arma::mat Y) {
 53 |   arma::mat result(X * Y);
 54 |   arma::vec Z = result.as_col();
 55 |   return Z;
 56 | }
 57 | 
 58 | // [[Rcpp::export]]
 59 | arma::sp_mat crossprod_sp_sp(arma::sp_mat X, arma::sp_mat Y) {
 60 |   return trans(X) * Y;
 61 | }
 62 | 
 63 | // [[Rcpp::export]]
 64 | arma::mat crossprod_sp_d(arma::sp_mat X, arma::sp_mat Y) {
 65 |   arma::mat result(trans(X) * Y);
 66 |   return result;
 67 | }
 68 | 
 69 | // [[Rcpp::export]]
 70 | arma::mat crossprod_d_d(arma::mat X, arma::mat Y) {
 71 |   return trans(X) * Y;
 72 | }
 73 | 
 74 | // [[Rcpp::export]]
 75 | arma::sp_mat tcrossprod_sp_sp(arma::sp_mat X, arma::sp_mat Y) {
 76 |   return X * trans(Y);
 77 | }
 78 | 
 79 | // [[Rcpp::export]]
 80 | arma::mat tcrossprod_d_d(arma::mat X, arma::mat Y) {
 81 |   return X * trans(Y);
 82 | }
 83 | 
 84 | // [[Rcpp::export]]
 85 | arma::rowvec winsorize_(arma::rowvec x, double y) {
 86 |   arma::uvec id = find(x >= y);
 87 |   x.elem(id).fill(y);
 88 |   return x;
 89 | }
 90 | 
 91 | // [[Rcpp::export]]
 92 | arma::colvec lm_coef(arma::mat X, arma::colvec y) {
 93 |   arma::colvec coef = arma::solve(X, y);
 94 |   return coef;
 95 | }
 96 | 
 97 | // [[Rcpp::export]]
 98 | Rcpp::List lm_(arma::mat X, arma::colvec y) {
 99 |   
100 |   int n = X.n_rows, k = X.n_cols;
101 |   arma::colvec coef = arma::solve(X, y);
102 |   arma::colvec res = y - X * coef;
103 |   double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0) / (n - k);
104 |   arma::colvec std_err = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X) * X)));  
105 |   return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
106 |                             Rcpp::Named("stderr") = std_err,
107 |                             Rcpp::Named("df.residual") = n - k);
108 | }
109 | 
110 | 
111 | 


--------------------------------------------------------------------------------
/src/RcppExports.cpp:
--------------------------------------------------------------------------------
  1 | // Generated by using Rcpp::compileAttributes() -> do not edit by hand
  2 | // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
  3 | 
  4 | #include <RcppArmadillo.h>
  5 | #include <Rcpp.h>
  6 | 
  7 | using namespace Rcpp;
  8 | 
  9 | #ifdef RCPP_USE_GLOBAL_ROSTREAM
 10 | Rcpp::Rostream<true>&  Rcpp::Rcout = Rcpp::Rcpp_cout_get();
 11 | Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
 12 | #endif
 13 | 
 14 | // sqrt_sp
 15 | arma::sp_mat sqrt_sp(arma::sp_mat X);
 16 | RcppExport SEXP _RISC_sqrt_sp(SEXP XSEXP) {
 17 | BEGIN_RCPP
 18 |     Rcpp::RObject rcpp_result_gen;
 19 |     Rcpp::RNGScope rcpp_rngScope_gen;
 20 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
 21 |     rcpp_result_gen = Rcpp::wrap(sqrt_sp(X));
 22 |     return rcpp_result_gen;
 23 | END_RCPP
 24 | }
 25 | // cent_sp_d
 26 | arma::mat cent_sp_d(arma::sp_mat X);
 27 | RcppExport SEXP _RISC_cent_sp_d(SEXP XSEXP) {
 28 | BEGIN_RCPP
 29 |     Rcpp::RObject rcpp_result_gen;
 30 |     Rcpp::RNGScope rcpp_rngScope_gen;
 31 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
 32 |     rcpp_result_gen = Rcpp::wrap(cent_sp_d(X));
 33 |     return rcpp_result_gen;
 34 | END_RCPP
 35 | }
 36 | // multiply_sp_sp
 37 | arma::sp_mat multiply_sp_sp(arma::sp_mat X, arma::sp_mat Y);
 38 | RcppExport SEXP _RISC_multiply_sp_sp(SEXP XSEXP, SEXP YSEXP) {
 39 | BEGIN_RCPP
 40 |     Rcpp::RObject rcpp_result_gen;
 41 |     Rcpp::RNGScope rcpp_rngScope_gen;
 42 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
 43 |     Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
 44 |     rcpp_result_gen = Rcpp::wrap(multiply_sp_sp(X, Y));
 45 |     return rcpp_result_gen;
 46 | END_RCPP
 47 | }
 48 | // multiply_sp_d
 49 | arma::mat multiply_sp_d(arma::sp_mat X, arma::sp_mat Y);
 50 | RcppExport SEXP _RISC_multiply_sp_d(SEXP XSEXP, SEXP YSEXP) {
 51 | BEGIN_RCPP
 52 |     Rcpp::RObject rcpp_result_gen;
 53 |     Rcpp::RNGScope rcpp_rngScope_gen;
 54 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
 55 |     Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
 56 |     rcpp_result_gen = Rcpp::wrap(multiply_sp_d(X, Y));
 57 |     return rcpp_result_gen;
 58 | END_RCPP
 59 | }
 60 | // multiply_d_d
 61 | arma::mat multiply_d_d(arma::mat X, arma::mat Y);
 62 | RcppExport SEXP _RISC_multiply_d_d(SEXP XSEXP, SEXP YSEXP) {
 63 | BEGIN_RCPP
 64 |     Rcpp::RObject rcpp_result_gen;
 65 |     Rcpp::RNGScope rcpp_rngScope_gen;
 66 |     Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
 67 |     Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
 68 |     rcpp_result_gen = Rcpp::wrap(multiply_d_d(X, Y));
 69 |     return rcpp_result_gen;
 70 | END_RCPP
 71 | }
 72 | // multiply_sp_d_sp
 73 | arma::sp_mat multiply_sp_d_sp(arma::sp_mat X, arma::mat Y);
 74 | RcppExport SEXP _RISC_multiply_sp_d_sp(SEXP XSEXP, SEXP YSEXP) {
 75 | BEGIN_RCPP
 76 |     Rcpp::RObject rcpp_result_gen;
 77 |     Rcpp::RNGScope rcpp_rngScope_gen;
 78 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
 79 |     Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
 80 |     rcpp_result_gen = Rcpp::wrap(multiply_sp_d_sp(X, Y));
 81 |     return rcpp_result_gen;
 82 | END_RCPP
 83 | }
 84 | // multiply_sp_d_v
 85 | arma::vec multiply_sp_d_v(arma::sp_mat X, arma::mat Y);
 86 | RcppExport SEXP _RISC_multiply_sp_d_v(SEXP XSEXP, SEXP YSEXP) {
 87 | BEGIN_RCPP
 88 |     Rcpp::RObject rcpp_result_gen;
 89 |     Rcpp::RNGScope rcpp_rngScope_gen;
 90 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
 91 |     Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
 92 |     rcpp_result_gen = Rcpp::wrap(multiply_sp_d_v(X, Y));
 93 |     return rcpp_result_gen;
 94 | END_RCPP
 95 | }
 96 | // crossprod_sp_sp
 97 | arma::sp_mat crossprod_sp_sp(arma::sp_mat X, arma::sp_mat Y);
 98 | RcppExport SEXP _RISC_crossprod_sp_sp(SEXP XSEXP, SEXP YSEXP) {
 99 | BEGIN_RCPP
100 |     Rcpp::RObject rcpp_result_gen;
101 |     Rcpp::RNGScope rcpp_rngScope_gen;
102 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
103 |     Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
104 |     rcpp_result_gen = Rcpp::wrap(crossprod_sp_sp(X, Y));
105 |     return rcpp_result_gen;
106 | END_RCPP
107 | }
108 | // crossprod_sp_d
109 | arma::mat crossprod_sp_d(arma::sp_mat X, arma::sp_mat Y);
110 | RcppExport SEXP _RISC_crossprod_sp_d(SEXP XSEXP, SEXP YSEXP) {
111 | BEGIN_RCPP
112 |     Rcpp::RObject rcpp_result_gen;
113 |     Rcpp::RNGScope rcpp_rngScope_gen;
114 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
115 |     Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
116 |     rcpp_result_gen = Rcpp::wrap(crossprod_sp_d(X, Y));
117 |     return rcpp_result_gen;
118 | END_RCPP
119 | }
120 | // crossprod_d_d
121 | arma::mat crossprod_d_d(arma::mat X, arma::mat Y);
122 | RcppExport SEXP _RISC_crossprod_d_d(SEXP XSEXP, SEXP YSEXP) {
123 | BEGIN_RCPP
124 |     Rcpp::RObject rcpp_result_gen;
125 |     Rcpp::RNGScope rcpp_rngScope_gen;
126 |     Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
127 |     Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
128 |     rcpp_result_gen = Rcpp::wrap(crossprod_d_d(X, Y));
129 |     return rcpp_result_gen;
130 | END_RCPP
131 | }
132 | // tcrossprod_sp_sp
133 | arma::sp_mat tcrossprod_sp_sp(arma::sp_mat X, arma::sp_mat Y);
134 | RcppExport SEXP _RISC_tcrossprod_sp_sp(SEXP XSEXP, SEXP YSEXP) {
135 | BEGIN_RCPP
136 |     Rcpp::RObject rcpp_result_gen;
137 |     Rcpp::RNGScope rcpp_rngScope_gen;
138 |     Rcpp::traits::input_parameter< arma::sp_mat >::type X(XSEXP);
139 |     Rcpp::traits::input_parameter< arma::sp_mat >::type Y(YSEXP);
140 |     rcpp_result_gen = Rcpp::wrap(tcrossprod_sp_sp(X, Y));
141 |     return rcpp_result_gen;
142 | END_RCPP
143 | }
144 | // tcrossprod_d_d
145 | arma::mat tcrossprod_d_d(arma::mat X, arma::mat Y);
146 | RcppExport SEXP _RISC_tcrossprod_d_d(SEXP XSEXP, SEXP YSEXP) {
147 | BEGIN_RCPP
148 |     Rcpp::RObject rcpp_result_gen;
149 |     Rcpp::RNGScope rcpp_rngScope_gen;
150 |     Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
151 |     Rcpp::traits::input_parameter< arma::mat >::type Y(YSEXP);
152 |     rcpp_result_gen = Rcpp::wrap(tcrossprod_d_d(X, Y));
153 |     return rcpp_result_gen;
154 | END_RCPP
155 | }
156 | // winsorize_
157 | arma::rowvec winsorize_(arma::rowvec x, double y);
158 | RcppExport SEXP _RISC_winsorize_(SEXP xSEXP, SEXP ySEXP) {
159 | BEGIN_RCPP
160 |     Rcpp::RObject rcpp_result_gen;
161 |     Rcpp::RNGScope rcpp_rngScope_gen;
162 |     Rcpp::traits::input_parameter< arma::rowvec >::type x(xSEXP);
163 |     Rcpp::traits::input_parameter< double >::type y(ySEXP);
164 |     rcpp_result_gen = Rcpp::wrap(winsorize_(x, y));
165 |     return rcpp_result_gen;
166 | END_RCPP
167 | }
168 | // lm_coef
169 | arma::colvec lm_coef(arma::mat X, arma::colvec y);
170 | RcppExport SEXP _RISC_lm_coef(SEXP XSEXP, SEXP ySEXP) {
171 | BEGIN_RCPP
172 |     Rcpp::RObject rcpp_result_gen;
173 |     Rcpp::RNGScope rcpp_rngScope_gen;
174 |     Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
175 |     Rcpp::traits::input_parameter< arma::colvec >::type y(ySEXP);
176 |     rcpp_result_gen = Rcpp::wrap(lm_coef(X, y));
177 |     return rcpp_result_gen;
178 | END_RCPP
179 | }
180 | // lm_
181 | Rcpp::List lm_(arma::mat X, arma::colvec y);
182 | RcppExport SEXP _RISC_lm_(SEXP XSEXP, SEXP ySEXP) {
183 | BEGIN_RCPP
184 |     Rcpp::RObject rcpp_result_gen;
185 |     Rcpp::RNGScope rcpp_rngScope_gen;
186 |     Rcpp::traits::input_parameter< arma::mat >::type X(XSEXP);
187 |     Rcpp::traits::input_parameter< arma::colvec >::type y(ySEXP);
188 |     rcpp_result_gen = Rcpp::wrap(lm_(X, y));
189 |     return rcpp_result_gen;
190 | END_RCPP
191 | }
192 | 
193 | static const R_CallMethodDef CallEntries[] = {
194 |     {"_RISC_sqrt_sp", (DL_FUNC) &_RISC_sqrt_sp, 1},
195 |     {"_RISC_cent_sp_d", (DL_FUNC) &_RISC_cent_sp_d, 1},
196 |     {"_RISC_multiply_sp_sp", (DL_FUNC) &_RISC_multiply_sp_sp, 2},
197 |     {"_RISC_multiply_sp_d", (DL_FUNC) &_RISC_multiply_sp_d, 2},
198 |     {"_RISC_multiply_d_d", (DL_FUNC) &_RISC_multiply_d_d, 2},
199 |     {"_RISC_multiply_sp_d_sp", (DL_FUNC) &_RISC_multiply_sp_d_sp, 2},
200 |     {"_RISC_multiply_sp_d_v", (DL_FUNC) &_RISC_multiply_sp_d_v, 2},
201 |     {"_RISC_crossprod_sp_sp", (DL_FUNC) &_RISC_crossprod_sp_sp, 2},
202 |     {"_RISC_crossprod_sp_d", (DL_FUNC) &_RISC_crossprod_sp_d, 2},
203 |     {"_RISC_crossprod_d_d", (DL_FUNC) &_RISC_crossprod_d_d, 2},
204 |     {"_RISC_tcrossprod_sp_sp", (DL_FUNC) &_RISC_tcrossprod_sp_sp, 2},
205 |     {"_RISC_tcrossprod_d_d", (DL_FUNC) &_RISC_tcrossprod_d_d, 2},
206 |     {"_RISC_winsorize_", (DL_FUNC) &_RISC_winsorize_, 2},
207 |     {"_RISC_lm_coef", (DL_FUNC) &_RISC_lm_coef, 2},
208 |     {"_RISC_lm_", (DL_FUNC) &_RISC_lm_, 2},
209 |     {NULL, NULL, 0}
210 | };
211 | 
212 | RcppExport void R_init_RISC(DllInfo *dll) {
213 |     R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
214 |     R_useDynamicSymbols(dll, FALSE);
215 | }
216 | 


--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(RISC)
3 | 
4 | test_check("RISC")
5 | 


--------------------------------------------------------------------------------
/tests/testthat/test-workflow.R:
--------------------------------------------------------------------------------
  1 | context("test-workflow")
  2 | 
  3 | # Load raw data
  4 | # load("../testdata/testdata.rda")
  5 | 
  6 | mat0 = as.matrix(raw.mat[[1]])
  7 | coldata0 = as.data.frame(raw.mat[[2]])
  8 | coldata1 <- coldata0[coldata0$Batch0 == 'Batch1',]
  9 | coldata2 <- coldata0[coldata0$Batch0 == 'Batch4',]
 10 | mat1 <- mat0[,rownames(coldata1)]
 11 | mat2 <- mat0[,rownames(coldata2)]
 12 | 
 13 | 
 14 | #####################################################################################
 15 | context("Create RISC object")
 16 | 
 17 | sce1 <- readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
 18 | sce2 <- readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)
 19 | 
 20 | test_that("Objects are RISCdata objects", {
 21 |   expect_is(sce1, 'RISCdata')
 22 |   expect_is(sce2, 'RISCdata')
 23 | })
 24 | 
 25 | test_that("col.names of matrix in objects equal to row.names of coldata in objects", {
 26 |   expect_equal(colnames(sce1@assay$count), rownames(sce1@coldata))
 27 | })
 28 | 
 29 | test_that("row.names of matrix in objects equal to row.names of rowdata in objects", {
 30 |   expect_equal(rownames(sce2@assay$count), rownames(sce2@rowdata))
 31 | })
 32 | 
 33 | 
 34 | #####################################################################################
 35 | context("Preprocess data")
 36 | 
 37 | sce1 = scFilter(sce1, min.UMI = 0, max.UMI = Inf, min.gene = 0, min.cell = 0, is.filter = FALSE)
 38 | sce2 = scFilter(sce2, min.UMI = 0, max.UMI = Inf, min.gene = 0, min.cell = 0, is.filter = FALSE)
 39 | 
 40 | test_that("Select the correct cells", {
 41 |   expect_equal(min(sce1@coldata$nGene), 464)
 42 |   expect_equal(max(sce2@coldata$nGene), 522)
 43 | })
 44 | 
 45 | test_that("Select the correct genes", {
 46 |   expect_equal(sce1@rowdata$Symbol[1], "Gene1")
 47 |   expect_equal(max(sce2@rowdata$nCell), 25)
 48 |   expect_false(is.null(sce1@metadata$filter))
 49 | })
 50 | 
 51 | sce1 = scNormalize(sce1)
 52 | sce2 = scNormalize(sce2)
 53 | 
 54 | test_that("Normalize raw counts/UMIs", {
 55 |   expect_equal(min(sce1@assay$logcount), 0)
 56 |   expect_gt(max(sce2@assay$logcount), 6.4)
 57 |   expect_equal(length(sce1@metadata$normalise), 1)
 58 | })
 59 | 
 60 | 
 61 | #####################################################################################
 62 | context("Identify highly variable genes")
 63 | 
 64 | sce1 = scDisperse(sce1)
 65 | sce2 = scDisperse(sce2)
 66 | 
 67 | test_that("highly variable genes", {
 68 |   expect_false(is.null(sce1@vargene))
 69 |   expect_false(is.null(sce2@metadata$dispersion.var))
 70 | })
 71 | 
 72 | 
 73 | #####################################################################################
 74 | context("Dimension reduction")
 75 | 
 76 | sce1 = scPCA(sce1, npc = 5)
 77 | sce2 = scPCA(sce2, npc = 5)
 78 | 
 79 | test_that("PCA", {
 80 |   expect_false(is.null(sce1@DimReduction$cell.pca))
 81 |   expect_gt(max(sce2@DimReduction$var.pca), 0.30)
 82 | })
 83 | 
 84 | sce1 = scTSNE(sce1, npc = 5, perplexity = 5)
 85 | sce2 = scTSNE(sce2, npc = 5, perplexity = 5)
 86 | 
 87 | test_that("tSNE", {
 88 |   expect_false(is.null(sce1@DimReduction$cell.tsne))
 89 | })
 90 | 
 91 | sce1 = scUMAP(sce1, npc = 5)
 92 | sce2 = scUMAP(sce2, npc = 5)
 93 | 
 94 | test_that("UMAP", {
 95 |   expect_false(is.null(sce1@DimReduction$cell.umap))
 96 | })
 97 | 
 98 | 
 99 | #####################################################################################
100 | context("Data integration")
101 | 
102 | var0 = intersect(rownames(sce1@assay$logcount), rownames(sce2@assay$logcount))
103 | idat = list(sce1, sce2)
104 | idat = scMultiIntegrate(idat, eigens = 4, npc = 5, var.gene = var0, add.Id = c('Set1', 'Set2'))
105 | 
106 | test_that("Integrate datasets", {
107 |   expect_false(is.null(idat@DimReduction$cell.pls))
108 |   expect_equal(length(levels(idat@coldata$Set)), 2)
109 | })
110 | 
111 | 
112 | #####################################################################################
113 | context("Cell clustering")
114 | 
115 | idat = scUMAP(idat, npc = 5, use = 'PLS')
116 | idat = scCluster(idat, slot = 'cell.umap', k = 4, method = 'density', dc = 0.3253)
117 | 
118 | test_that("Cluster cells", {
119 |   expect_false(is.null(idat@DimReduction$cell.umap))
120 |   expect_equal(length(levels(idat@coldata$Cluster)), 4)
121 | })
122 | 
123 | 
124 | 


--------------------------------------------------------------------------------
/vignettes/RISC_Vignette.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "RISC: robust integration of single-cell RNA-seq datasets using a single reference space"
  3 | author: 
  4 | - name: Yang Liu, Tao Wang, Deyou Zheng
  5 |   affiliation: Albert Einstein College of Medicine, Bronx, NY, United States
  6 | date: "2021"
  7 | output: rmarkdown::html_vignette
  8 | package: RISC
  9 | vignette: >
 10 |   %\VignetteIndexEntry{RISC: robust integration of single-cell RNA-seq datasets using a single reference space}
 11 |   %\VignetteEngine{knitr::rmarkdown}
 12 |   %\VignetteEncoding{UTF-8}
 13 | ---
 14 | 
 15 | ```{r, include = FALSE}
 16 | knitr::opts_chunk$set(
 17 |   collapse = TRUE,
 18 |   comment = "#>"
 19 | )
 20 | ```
 21 | 
 22 | # Introduction
 23 | 
 24 | Single-cell RNA sequencing (scRNA-seq) has become an essential genomic technology for resolving gene expression heterogeneity in single cells and is widely used in many biological domains. It remains challenging, however, to integrate scRNA-seq datasets with inter-sample heterogeneity, for example, different cell subpopulation compositions among datasets or gene expression differences in the same cell groups across datasets.
 25 | 
 26 | We find that distortion in the integration of heterogeneous data arises from the lack of a consistent global reference space onto which all cells from the individual datasets can be projected. To overcome this issue, we developed a novel approach, reference principal component integration (RPCI), and implemented it in a new scRNA-seq analysis package called “RISC” for robust integration of scRNA-seq data.
 27 | 
 28 | 
 29 | ```{r setup}
 30 | library(RISC)
 31 | library(RColorBrewer)
 32 | 
 33 | data("raw.mat")
 34 | mat0 = raw.mat[[1]]
 35 | coldata0 = raw.mat[[2]]
 36 | 
 37 | coldata1 = coldata0[coldata0$Batch0 == 'Batch1',]
 38 | coldata2 = coldata0[coldata0$Batch0 == 'Batch2',]
 39 | coldata3 = coldata0[coldata0$Batch0 == 'Batch3',]
 40 | coldata4 = coldata0[coldata0$Batch0 == 'Batch4',]
 41 | coldata5 = coldata0[coldata0$Batch0 == 'Batch5',]
 42 | coldata6 = coldata0[coldata0$Batch0 == 'Batch6',]
 43 | mat1 = mat0[,rownames(coldata1)]
 44 | mat2 = mat0[,rownames(coldata2)]
 45 | mat3 = mat0[,rownames(coldata3)]
 46 | mat4 = mat0[,rownames(coldata4)]
 47 | mat5 = mat0[,rownames(coldata5)]
 48 | mat6 = mat0[,rownames(coldata6)]
 49 | ```
 50 | 
 51 | 
 52 | # Heterogeneous simulated data
 53 | 
 54 | To show the advantages of RPCI and evaluate its performance quantitatively, we start with simulated data in which we control the degree of gene expression difference in two of the three cell groups between datasets: there are more differentially expressed (DE) genes in c/c' than in b/b', and no DE genes in group a. As expected, cell similarity decreases gradually as the number of DE genes increases.
 55 | 
 56 | # Create RISC objects
 57 | 
 58 | We generate RISC objects from the gene-cell matrix (mat), the data frame of the cells (coldata), and the data frame of the genes, using the "readsc" function. For 10X Genomics data, RISC objects can also be generated with the "read10X_mtx" or "read10X_h5" function.
 59 | 
 60 | 
 61 | ```{r}
 62 | sce1 = readsc(count = mat1, cell = coldata1, gene = data.frame(Symbol = rownames(mat1), row.names = rownames(mat1)), is.filter = FALSE)
 63 | sce2 = readsc(count = mat2, cell = coldata2, gene = data.frame(Symbol = rownames(mat2), row.names = rownames(mat2)), is.filter = FALSE)
 64 | sce3 = readsc(count = mat3, cell = coldata3, gene = data.frame(Symbol = rownames(mat3), row.names = rownames(mat3)), is.filter = FALSE)
 65 | sce4 = readsc(count = mat4, cell = coldata4, gene = data.frame(Symbol = rownames(mat4), row.names = rownames(mat4)), is.filter = FALSE)
 66 | sce5 = readsc(count = mat5, cell = coldata5, gene = data.frame(Symbol = rownames(mat5), row.names = rownames(mat5)), is.filter = FALSE)
 67 | sce6 = readsc(count = mat6, cell = coldata6, gene = data.frame(Symbol = rownames(mat6), row.names = rownames(mat6)), is.filter = FALSE)
 68 | ```
 69 | 
 70 | 
 71 | # Processing RISC data
 72 | 
 73 | After creating the RISC objects, we process the data. The standard steps are:
 74 |    (1) Filter the cells: remove cells with extremely low or high UMI counts and discard cells with an extremely low number of expressed genes. Because we use simulated data here, we do not filter out any cells.
 75 |    (2) Normalize gene expression to remove the effect of RNA sequencing depth.
 76 |    (3) Scale gene expression: the scaled counts retain only the gene signal information for individual cells and have zero empirical mean in each column, satisfying the requirement for PCA and SVD.
 77 |    (4) Find highly variable genes: identify them with a quasi-Poisson model and use them for gene-cell matrix decomposition and data integration.
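Steps (2) and (3) can be illustrated with a minimal base-R sketch. This is not the code behind `scNormalize` or any other RISC function: the toy count matrix, the size-factor normalization, and the cells-by-genes layout (so that each column is a gene) are all assumptions made for illustration. It shows only why zero-mean columns matter: on a centered matrix, SVD reproduces PCA.

```r
# Illustrative sketch only; not RISC's internal implementation.
set.seed(1)

# Hypothetical toy count matrix: 20 genes (rows) x 6 cells (columns).
counts <- matrix(rpois(20 * 6, lambda = 5), nrow = 20, ncol = 6,
                 dimnames = list(paste0("Gene", 1:20), paste0("Cell", 1:6)))

# (2) Normalize: divide each cell by a size factor so that sequencing
#     depth no longer dominates, then log-transform.
size_factor <- colSums(counts) / mean(colSums(counts))
logcount <- log1p(sweep(counts, 2, size_factor, "/"))

# (3) Scale: arrange cells in rows and genes in columns (an assumed layout),
#     then center each column to zero empirical mean.
x <- t(logcount)
x_centered <- scale(x, center = TRUE, scale = FALSE)

# With centered columns, SVD and PCA agree: the singular values divided by
# sqrt(n - 1) equal the standard deviations reported by prcomp().
sv <- svd(x_centered)
pca <- prcomp(x, center = TRUE, scale. = FALSE)
svd_sdev <- sv$d / sqrt(nrow(x) - 1)
```

Comparing `svd_sdev` with `pca$sdev` via `all.equal()` confirms the equivalence on this toy matrix.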
 78 | 
 79 | 
 80 | ```{r}
 81 | process0 <- function(obj0){
 82 |   
 83 |   # filter cells
 84 |   obj0 = scFilter(obj0, min.UMI = 0, max.UMI = Inf, min.gene = 10, min.cell = 3, is.filter = FALSE)
 85 |   
 86 |   # normalize data
 87 |   obj0 = scNormalize(obj0, method = 'robust')
 88 |   
 89 |   # find highly variable genes
 90 |   obj0 = scDisperse(obj0)
 91 |   
 92 |   # here we replace the highly variable genes with all genes for integration
 93 |   obj0@vargene = rownames(sce1@rowdata)
 94 |   
 95 |   return(obj0)
 96 |   
 97 | }
 98 | 
 99 | sce1 = process0(sce1)
100 | sce2 = process0(sce2)
101 | sce3 = process0(sce3)
102 | sce4 = process0(sce4)
103 | sce5 = process0(sce5)
104 | sce6 = process0(sce6)
105 | ```
106 | 
107 | 
108 | # RPCI integration
109 | 
110 | The core principle of RPCI differs from that of existing methods: RPCI introduces an effective formula to calibrate cell similarity against a global reference and directly projects all cells into a reference RPCI space.
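The idea of projecting cells into a reference space can be sketched in base R as follows. This is a simplified illustration, not the RPCI algorithm implemented in `scMultiIntegrate`: the random matrices, the names `ref` and `tgt`, and the use of plain PCA loadings are all assumptions. It shows only that cells from a second dataset can be embedded in the principal component space of a reference dataset, so that both share one coordinate system.

```r
# Illustrative sketch only; RPCI itself is implemented in scMultiIntegrate().
set.seed(1)

# Hypothetical log-expression matrices (genes x cells) sharing the same genes.
genes <- paste0("Gene", 1:30)
ref <- matrix(rnorm(30 * 40), nrow = 30,
              dimnames = list(genes, paste0("Ref", 1:40)))
tgt <- matrix(rnorm(30 * 25, mean = 0.5), nrow = 30,
              dimnames = list(genes, paste0("Tgt", 1:25)))

npc <- 5

# Decompose the reference only: PCA on cells x genes, keeping npc components.
ref_pca <- prcomp(t(ref), center = TRUE, scale. = FALSE, rank. = npc)
loadings <- ref_pca$rotation   # genes x npc
ref_embed <- ref_pca$x         # reference cells in the reference space

# Project the target cells onto the same loadings, centering each gene
# by its reference mean so both datasets use one coordinate system.
gene_mean <- rowMeans(ref)
tgt_embed <- t(tgt - gene_mean) %*% loadings
```

The manual projection matches `predict(ref_pca, newdata = t(tgt))`, which is the usual way to place new observations in an existing `prcomp` space.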
111 | 
112 | 
113 | ```{r}
114 | set.seed(1)
115 | var.genes = rownames(sce1@assay$count)
116 | pcr0 = list(sce1, sce2, sce3, sce4, sce5, sce6)
117 | pcr0 = scMultiIntegrate(pcr0, eigens = 9, var.gene = var.genes, align = 'OLS', npc = 15)
118 | # pcr0 = scLargeIntegrate(pcr0, var.gene = var.genes, align = 'Predict', npc = 8)
119 | pcr0 = scUMAP(pcr0, npc = 9, use = 'PLS', dist = 0.001, neighbors = 15)
120 | ```
121 | 
122 | ```{r}
123 | pcr0@coldata$Group = factor(pcr0@coldata$Group0, levels = c('Group1', 'Group2', 'Group2*', 'Group3', 'Group3*'), labels = c("a", "b", "b'", "c", "c'"))
124 | pcr0@coldata$Set0 = factor(pcr0@coldata$Set, levels = c('Set1', 'Set2', 'Set3', 'Set4', 'Set5', 'Set6'), labels = c('Set1 rep.1', 'Set2 rep.1', 'Set3 rep.1', 'Set1 rep.2', 'Set2 rep.2', 'Set3 rep.2'))
125 | pcr0 = scCluster(pcr0, slot = "cell.umap", k = 4, method = "density", dc = 0.3)
126 | ```
127 | 
128 | 
129 | # UMAP plot
130 | 
131 | By design, the dissimilarity in c/c' is larger than that in b/b', and this cell-cell relationship is directly reflected in the UMAP plots of the RPCI-integrated data. The difference between c and c' can also be recovered by re-clustering the integrated data.
132 | 
133 | Here the simulated data contain three sets; each set includes three cell groups and two replicates, with batch effects present among both sets and replicates.
134 | 
135 | 
136 | ```{r, fig.show="hold", out.width="48%", fig.dim=c(7, 5)}
137 | DimPlot(pcr0, colFactor = 'Set0', size = 2)
138 | DimPlot(pcr0, colFactor = 'Group', size = 2, Colors = brewer.pal(5, "Set1"))
139 | DimPlot(pcr0, colFactor = 'Cluster', size = 2, Colors = brewer.pal(6, "Dark2"))
140 | ```
141 | 
142 | 
143 | More details and tutorials on real scRNA-seq data are available at: https://github.com/yangRISC/RISC
144 | 
145 | 
146 | # Session Information
147 | ```{r}
148 | sessionInfo()
149 | ```
150 | 


--------------------------------------------------------------------------------